fix(git-integration): insertions deletions memory consumption [CM-724] #3505
    if CommitService.should_skip_commit(full_commit_text, edge_commit_hash):
        continue

    commit_text, numstats_text = full_commit_text.split(CommitService.NUMSTAT_SPLITTER)
Bug: Commit Splitting Fails on Incorrect NUMSTAT Splitter Usage
The split() operation on full_commit_text expects exactly two parts: commit metadata and numstat lines, separated by NUMSTAT_SPLITTER. If the splitter appears zero or multiple times (e.g., within a commit message), unpacking the result will raise a ValueError, causing commit processing to fail.
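One way to avoid the unpacking error is sketched below. It is only an illustration, not the fix adopted in this PR: the splitter value and the helper name `split_commit_text` are hypothetical, and it assumes the numstat block always follows the last occurrence of the splitter, so any earlier occurrence belongs to the commit message.

```python
from typing import Tuple

# Hypothetical marker value; the real constant is CommitService.NUMSTAT_SPLITTER.
NUMSTAT_SPLITTER = "--NUMSTAT--"


def split_commit_text(full_commit_text: str, splitter: str = NUMSTAT_SPLITTER) -> Tuple[str, str]:
    """Return (commit_text, numstats_text) without ever raising ValueError.

    str.rpartition always yields exactly three parts, so tuple unpacking
    cannot fail regardless of how many times the splitter appears.
    """
    head, sep, tail = full_commit_text.rpartition(splitter)
    if not sep:
        # Splitter missing: treat the whole text as metadata with no numstat.
        return full_commit_text, ""
    # Assumption: the last occurrence is the real boundary, so a splitter
    # quoted inside the commit message stays in the metadata part.
    return head, tail
```

With this helper, a commit whose message happens to contain the marker still yields two parts, and a commit with no numstat section yields an empty second part instead of an exception.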
    finally:
        del commit
        del commit_lines
        del numstats_text
Bug: Unconditional Deletion in finally Block
The finally block in process_commits_chunk attempts to del variables like commit, commit_lines, and numstats_text unconditionally. This can cause a NameError if an exception occurs before these variables are defined within an iteration, or if commit was already deleted in a previous loop iteration.
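A possible way to keep the memory-release intent without the NameError risk is to bind the names at the top of each iteration and rebind them to None in the finally block. The function below is a simplified, hypothetical stand-in (`process_chunk_sketch`, the splitter value, and the dictionary shape are all assumptions), not the actual `process_commits_chunk`.

```python
def process_chunk_sketch(raw_commits, splitter="--NUMSTAT--"):
    """Collect commit hashes from raw commit texts (illustrative only)."""
    hashes = []
    for full_commit_text in raw_commits:
        # Bind every name up front so the finally block can always touch them,
        # even if an exception is raised on the first statement of the try.
        commit = commit_lines = numstats_text = None
        try:
            commit_text, _sep, numstats_text = full_commit_text.partition(splitter)
            commit_lines = commit_text.splitlines()
            if not commit_lines:
                raise ValueError("empty commit text")
            commit = {"hash": commit_lines[0], "numstats": numstats_text}
            hashes.append(commit["hash"])
        except ValueError:
            # Skip malformed commits; the finally block still runs.
            continue
        finally:
            # Rebinding to None drops the references just as del would,
            # but cannot raise NameError for a name that was never assigned.
            commit = commit_lines = numstats_text = None
    return hashes
```

Rebinding releases the per-commit objects at the end of every iteration, including iterations that fail early, which is the case the comment above flags.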
This pull request refactors the commit processing pipeline in commit_service.py to streamline how commit metadata and numstat (insertions/deletions) data are extracted and handled. The main change is the unification of commit and numstat extraction into a single git log command and the corresponding update to downstream processing logic. Several memory optimizations and minor bug fixes are also included.

Commit and Numstat Extraction Refactor:
- Added COMMIT_START_SPLITTER and NUMSTAT_SPLITTER, and updated the git log formatting to include both metadata and numstat in a single command. (commit_service.py)
- Updated the downstream processing logic accordingly. (process_commits_chunk, _construct_commit_dict, _parse_numstats)

Memory and Performance Improvements:
- Deleted large variables (commit, commit_lines, numstats_text, chunk_activities_db, chunk_activities_queue) after use to reduce memory usage during batch processing.

Interface and Argument Changes:

Bug Fixes and Minor Improvements:
- Changed _safe_decode to prefer iso-8859-1 before cp1252, which is more robust for legacy content. (utils.py)
- ... (queue_service.py)
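The numstat half of the combined git log output described above can be totalled line by line. The sketch below is not the PR's `_parse_numstats`, whose body is not shown here; the name `parse_numstats_sketch` is hypothetical. It relies only on the documented `git log --numstat` line shape, which is added count, deleted count, and path separated by tabs, with `-` in place of the counts for binary files.

```python
from typing import Tuple


def parse_numstats_sketch(numstats_text: str) -> Tuple[int, int]:
    """Sum insertions and deletions over all numstat lines in the text."""
    insertions = 0
    deletions = 0
    for line in numstats_text.splitlines():
        # Split into at most three fields so tabs inside a path are preserved.
        parts = line.split("\t", 2)
        if len(parts) != 3:
            # Blank or malformed line: ignore it rather than fail the commit.
            continue
        added, deleted, _path = parts
        # Binary files report "-" for both counts; count them as zero.
        if added.isdigit():
            insertions += int(added)
        if deleted.isdigit():
            deletions += int(deleted)
    return insertions, deletions
```

Returning only the two totals, rather than a per-file list, keeps the retained data small, which fits the memory goal stated in the PR title.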