
fix(git-integration): target commit from an outdated branch (CM-717)#3480

Merged
mbani01 merged 4 commits into main from
fix/target_commit_from_outdated_branch
Oct 7, 2025

@mbani01 mbani01 commented Oct 7, 2025

Changes proposed ✍️

This pull request introduces significant improvements to the repository cloning and processing workflow, primarily by tracking and responding to changes in the default branch of git repositories. The changes add a branch column to the database, update the models and migration scripts, and enhance the cloning logic to detect and handle default branch changes, ensuring accurate and efficient processing.

Database schema and model updates:

  • Added a new branch column to the git.repositories table to track the default branch for each repository, with corresponding migration scripts to add and remove the column. [1] [2]
  • Updated the Repository model to include the new branch field, and updated relevant SQL queries and return values to handle the branch information. [1] [2] [3] [4]

Cloning logic improvements:

  • Implemented logic to detect if the default branch has changed on the remote repository (has_default_branch_changed) and to determine the appropriate cloning strategy (determine_clone_strategy). Full clones are now triggered when the branch changes or for new repositories, while incremental (batched) clones are used otherwise. [1] [2]
  • Added a utility function get_remote_default_branch to determine the default branch of a remote repository without cloning it.
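One common way to read a remote's default branch without cloning is `git ls-remote --symref <remote> HEAD`, which prints a line such as `ref: refs/heads/main	HEAD`. The sketch below assumes that approach; the PR does not show the body of get_remote_default_branch, so the function names, the synchronous `subprocess` call, and the parsing are illustrative assumptions rather than the merged implementation.

```python
import subprocess
from typing import Optional


def parse_symref_head(ls_remote_output: str) -> Optional[str]:
    """Extract the branch name from `git ls-remote --symref <remote> HEAD` output.

    Hypothetical helper: returns None when no symbolic ref for HEAD is present.
    """
    prefix = "refs/heads/"
    for line in ls_remote_output.splitlines():
        # The symref line looks like: "ref: refs/heads/main<TAB>HEAD"
        if not line.startswith("ref: "):
            continue
        parts = line[len("ref: "):].split()
        if len(parts) == 2 and parts[1] == "HEAD" and parts[0].startswith(prefix):
            return parts[0][len(prefix):]
    return None


def remote_default_branch_sketch(remote: str) -> Optional[str]:
    """Ask the remote which branch HEAD points to, without cloning (assumed approach)."""
    result = subprocess.run(
        ["git", "ls-remote", "--symref", remote, "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_symref_head(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the branch extraction testable without network access.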

Clone and commit processing updates:

  • Refactored the clone service to use the new strategy logic, including renaming internal methods for clarity and propagating the clone_with_batches flag through the batch info and commit processing logic. [1] [2] [3] [4] [5]
  • Updated the repository worker to save the current default branch to the database after processing, ensuring branch tracking remains accurate.

These changes together make the system more robust against changes in repository configuration and improve the efficiency of repository processing.

Checklist ✅

  • Label appropriately with Feature, Improvement, or Bug.
  • Add screenshots to the PR description for relevant FE changes
  • New backend functionality has been unit-tested.
  • API documentation has been updated (if necessary) (see docs on API documentation).
  • Quality standards are met.

@mbani01 mbani01 self-assigned this Oct 7, 2025
@mbani01 mbani01 requested a review from Copilot October 7, 2025 17:07
Copilot AI left a comment

Pull Request Overview

Implements default branch tracking and adaptive cloning strategy for git repositories to improve correctness when the default branch changes and optimize incremental processing.

  • Adds branch column and model field to persist the tracked default branch.
  • Introduces branch-change detection and clone strategy selection (full vs minimal/batched).
  • Propagates clone mode via CloneBatchInfo instead of passing a separate parameter.
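The strategy selection summarized above can be sketched as follows. This is a minimal illustration, not the PR's code: the function signatures, the use of a missing last processed commit to mean "new repository", the handling of unknown branches, and the dataclass fields other than clone_with_batches are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneBatchInfoSketch:
    """Hypothetical stand-in for CloneBatchInfo; only clone_with_batches is named in the PR."""
    remote: str
    clone_with_batches: bool


def branch_changed(stored_branch: Optional[str], remote_branch: Optional[str]) -> bool:
    """Report a change only when both branch names are known and differ (assumption)."""
    if stored_branch is None or remote_branch is None:
        return False
    return stored_branch != remote_branch


def choose_clone_with_batches(
    last_processed_commit: Optional[str],
    stored_branch: Optional[str],
    remote_branch: Optional[str],
) -> bool:
    """Return True for an incremental (batched) clone, False for a full clone."""
    is_new_repo = last_processed_commit is None  # assumed definition of "new"
    if is_new_repo or branch_changed(stored_branch, remote_branch):
        return False
    return True


# The flag travels with the batch info instead of being passed as a separate argument.
info = CloneBatchInfoSketch(
    remote="https://example.invalid/repo.git",  # placeholder URL
    clone_with_batches=choose_clone_with_batches("abc123", "master", "main"),
)
```

Here `info.clone_with_batches` is False because the stored branch differs from the remote default branch, so a full clone would be chosen.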

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| repository_worker.py | Updates processing to remove explicit clone mode flag and persist detected default branch after final batch. |
| services/utils.py | Adds remote default branch discovery helper and local default branch retrieval. |
| commit_service.py | Refactors commit processing to read clone mode from batch info. |
| clone_service.py | Adds strategy logic, branch change detection, and refactors clone operations. |
| models/repository.py | Adds branch field to Repository model. |
| models/clone_batch.py | Adds clone_with_batches flag to batch info model. |
| database/crud.py | Extends update_last_processed_commit to (optionally) store branch (currently always overwrites). |
| migrations (add/remove branch) | Schema migration to add/drop branch column with documentation comment. |


async def _perform_full_clone(self, repo_path: str, remote: str):
    """Perform full repository clone"""
    self.logger.info(f"Performing full clone for repo {remote}...")
    await run_shell_command(["git", "clone", remote, repo_path], cwd=repo_path)
Copilot AI Oct 7, 2025

The full clone command clones into a destination path (repo_path) while also setting cwd to that same existing directory created by mkdtemp, causing git clone to fail because the destination already exists. Use a parent temp directory and let git create the target folder, or keep the temp directory as the working dir and clone into '.' (or omit the destination argument) so the repository contents populate the existing directory. Example fix: create temp_dir = tempfile.mkdtemp(...); then run_shell_command(["git", "clone", remote, "."], cwd=temp_dir) and treat temp_dir as repo_path.

Suggested change
- await run_shell_command(["git", "clone", remote, repo_path], cwd=repo_path)
+ await run_shell_command(["git", "clone", remote, "."], cwd=repo_path)
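Whether a clone into an already existing directory succeeds depends on that directory's contents: the git-clone documentation states that cloning into an existing directory is only allowed if the directory is empty. A hypothetical pre-check along those lines might look like the sketch below; the helper name is an assumption and is not part of the PR.

```python
import os
import tempfile


def can_clone_into(path: str) -> bool:
    """Return True if `path` does not exist yet or is an empty directory.

    Hypothetical helper mirroring git's documented rule for clone destinations.
    """
    if not os.path.exists(path):
        return True
    return os.path.isdir(path) and not os.listdir(path)


# A directory freshly created by tempfile is empty, so it passes the check.
with tempfile.TemporaryDirectory() as empty_dir:
    fresh_dir_ok = can_clone_into(empty_dir)
```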

Copilot uses AI. Check for mistakes.
Comment on lines 141 to +148
      sql_query = """
      UPDATE git.repositories
      SET "lastProcessedCommit" = $1,
+         "branch" = $2,
          "updatedAt" = NOW()
-     WHERE id = $2
+     WHERE id = $3
      """
-     result = await execute(sql_query, (commit_hash, repo_id))
+     result = await execute(sql_query, (commit_hash, branch, repo_id))
Copilot AI Oct 7, 2025

Passing no branch (None) will overwrite the existing branch value with NULL, contradicting the "optionally" wording and preventing persistence of a previously stored branch when only the commit updates. Implement conditional SQL (two variants) or use CASE to avoid clearing the branch when branch is None. Example: build query without updating "branch" if branch is None, or use SET "branch" = COALESCE($2, "branch") and pass branch (but only if you never need to intentionally clear it).
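The "two variants" option from the comment above could be sketched as a small query builder. This is an illustration only: the helper name and its return shape are assumptions, and the actual update_last_processed_commit in database/crud.py may be structured differently.

```python
from typing import Optional, Tuple


def build_last_processed_update(
    commit_hash: str, repo_id: str, branch: Optional[str] = None
) -> Tuple[str, tuple]:
    """Build the UPDATE statement and its parameters (hypothetical helper).

    When branch is None the "branch" column is left out of the SET clause,
    so a previously stored value is not overwritten with NULL.
    """
    if branch is None:
        sql = """
        UPDATE git.repositories
        SET "lastProcessedCommit" = $1,
            "updatedAt" = NOW()
        WHERE id = $2
        """
        return sql, (commit_hash, repo_id)

    sql = """
    UPDATE git.repositories
    SET "lastProcessedCommit" = $1,
        "branch" = $2,
        "updatedAt" = NOW()
    WHERE id = $3
    """
    return sql, (commit_hash, branch, repo_id)
```

The single-statement alternative mentioned in the comment, SET "branch" = COALESCE($2, "branch"), keeps one query but cannot be used to clear the column deliberately.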
@mbani01 mbani01 merged commit 2b6f2fe into main Oct 7, 2025
13 checks passed
@mbani01 mbani01 deleted the fix/target_commit_from_outdated_branch branch October 7, 2025 17:11