Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run git gc periodically #564

Merged
merged 23 commits into from
Mar 16, 2021
Merged

Run git gc periodically #564

merged 23 commits into from
Mar 16, 2021

Conversation

ikhoon
Copy link
Contributor

@ikhoon ikhoon commented Feb 25, 2021

Motivation:

Central Dogma git repository does not run git gc automatically.
git gc will save disk storage and improve performance.

Modifications:

  • Add RepositoryGarbageCollectionPlugin that schedules GC with a Quartz cron expression
  • Add RepositoryGarbageCollectionConfig in order to configure a GC schedule and the minimum number of new commits for the GC
  • Add GitGcRevision that reads and writes the last revision when GC was run.

Result:

Motivation:

Central Dogma git repository does not run git gc automatically.
git gc will save disk storage and improve performance.

Modifications:

- Add `GarbageCollectingServicePlugin` that run git gc every day

Result:

- Central Dogma now run git gc everyday.
- Fixes line#264
@ikhoon ikhoon added this to the 0.47.2 milestone Feb 25, 2021
@codecov
Copy link

codecov bot commented Feb 26, 2021

Codecov Report

Merging #564 (c215202) into master (091993d) will decrease coverage by 0.27%.
The diff coverage is 49.14%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #564      +/-   ##
============================================
- Coverage     70.10%   69.83%   -0.28%     
- Complexity     3271     3302      +31     
============================================
  Files           329      332       +3     
  Lines         13030    13219     +189     
  Branches       1405     1427      +22     
============================================
+ Hits           9135     9231      +96     
- Misses         3018     3097      +79     
- Partials        877      891      +14     
Impacted Files Coverage Δ Complexity Δ
...ecorp/centraldogma/server/CentralDogmaBuilder.java 52.30% <0.00%> (-1.67%) 23.00 <0.00> (ø)
...internal/storage/repository/RepositoryWrapper.java 29.41% <0.00%> (-1.84%) 8.00 <0.00> (ø)
...al/storage/repository/cache/CachingRepository.java 85.04% <0.00%> (-1.62%) 35.00 <0.00> (ø)
...raldogma/server/storage/repository/Repository.java 66.96% <ø> (ø) 33.00 <0.00> (ø)
...internal/storage/repository/git/GitRepository.java 79.17% <33.33%> (-1.06%) 151.00 <0.00> (ø)
...nal/storage/RepositoryGarbageCollectionPlugin.java 49.35% <49.35%> (ø) 15.00 <15.00> (?)
...internal/storage/repository/git/GitGcRevision.java 51.66% <51.66%> (ø) 10.00 <10.00> (?)
...ogma/server/RepositoryGarbageCollectionConfig.java 88.88% <88.88%> (ø) 4.00 <4.00> (?)
...necorp/centraldogma/server/CentralDogmaConfig.java 80.95% <100.00%> (+0.26%) 45.00 <6.00> (+1.00)
...erver/internal/storage/PurgeSchedulingService.java 73.56% <0.00%> (-3.45%) 16.00% <0.00%> (-3.00%)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 091993d...c215202. Read the comment docs.

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome!
Left some nit comments. 😄

@trustin
Copy link
Member

trustin commented Mar 4, 2021

We should not perform GC when the gain is not large enough, e.g. we shouldn't perform GC when only a few hundred commits were added since the last GC. We will otherwise see a large amount of disk I/O every day.

@ikhoon
Copy link
Contributor Author

ikhoon commented Mar 4, 2021

We should not perform GC when the gain is not large enough, e.g. we shouldn't perform GC when only a few hundred commits were added since the last GC. We will otherwise see a large amount of disk I/O every day.

Good suggestion! The GC will run after 200 commits were added since the last GC. The 200 was referenced in https://docs.gitlab.com/ee/administration/housekeeping.html#manual-housekeeping

@trustin
Copy link
Member

trustin commented Mar 4, 2021

I think that's implementation detail which might not be the case in jGit. How about trying to run GC against a large repository using jGit to make sure it doesn't thrash disk on every GC?

@ikhoon
Copy link
Contributor Author

ikhoon commented Mar 4, 2021

How about trying to run GC against a large repository using jGit to make sure it doesn't thrash disk on every GC?

That makes sense. Only a few repositories have mega commits which are suitable GC targets.

@trustin
Copy link
Member

trustin commented Mar 4, 2021

Running GC at least once is a good idea in my opinion though, because it will reduce the number of files in the data directory, by packing multiple objects into a single pack file.

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, @ikhoon!
Could you also update the documentation, please?

@ikhoon
Copy link
Contributor Author

ikhoon commented Mar 10, 2021

Could you also update the documentation, please?

The documentation meant the PR description? Then I did.

@minwoox
Copy link
Member

minwoox commented Mar 10, 2021

The documentation meant the PR description? Then I did.

What I meant was this page. 😄
https://line.github.io/centraldogma/setup-configuration.html

@ikhoon
Copy link
Contributor Author

ikhoon commented Mar 10, 2021

The documentation meant the PR description? Then I did.

What I meant was this page. 😄
https://line.github.io/centraldogma/setup-configuration.html

https://github.com/line/centraldogma/pull/564/files#diff-650dc6b906f91adeda4a61b5c71aa8c9b5680d6266d9d7179072579e7d5a4cb5

Is there anything I am missing? 😅

@minwoox
Copy link
Member

minwoox commented Mar 10, 2021

Is there anything I am missing?

Yeah, letting me know about it. 🤣

@minwoox
Copy link
Member

minwoox commented Mar 16, 2021

Thanks, @ikhoon!

@ikhoon ikhoon deleted the git-gc branch April 8, 2021 12:51
minwoox added a commit to minwoox/centraldogma that referenced this pull request Jul 6, 2021
Motivation:
We no longer need Git gc because we are going to provide a way to remove old commits.
line#575

Modifications:
- Revert Git gc.

Result:
- This reverts commit e6caf26.
minwoox added a commit that referenced this pull request Jul 6, 2021
Motivation:
We no longer need Git gc because we are going to provide a way to remove old commits.
#575

Modifications:
- Revert Git gc.

Result:
- This reverts commit e6caf26.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Periodic garbage collection of Git repositories
3 participants