Provide a way to remove old commits #619
Motivation:
Central Dogma uses jGit to store data. Because Git keeps unlimited history, Central Dogma will eventually run into trouble managing disk usage.
We can handle this by maintaining a primary and a secondary Git repository internally.
This works as follows:
1. Commits are pushed to the primary Git repository.
2. If the number of commits exceeds the threshold (`minRetentionCommits`), the secondary Git repository is created.
3. Commits are pushed to both the primary and secondary Git repositories.
4. If the secondary Git repository has more commits than the threshold:
   - The secondary Git repository is promoted to the primary Git repository.
   - The primary Git repository is removed completely.
   - Another secondary Git repository is created.
5. Back to 3.
Modifications:
- TBD
Result:
- Close 575
- TBD
Todo:
- Provide a way to set `minRetentionCommits` and `minRetentionDay` for each repository.
- Support mirroring from CD to external Git.
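The five steps in the description can be sketched as a toy model. This is only an illustration under simplifying assumptions: `RollingRepositorySketch` and all of its members are hypothetical names, commits are plain strings in a list rather than jGit objects, and the new secondary starts empty instead of carrying a snapshot of the current tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the primary/secondary rolling scheme; not Central Dogma's actual classes. */
final class RollingRepositorySketch {
    private final int minRetentionCommits;      // threshold named in the description
    private List<String> primary = new ArrayList<>();
    private List<String> secondary;             // null until the threshold is first exceeded
    private int rolls;                          // how many times a secondary was promoted

    RollingRepositorySketch(int minRetentionCommits) {
        this.minRetentionCommits = minRetentionCommits;
    }

    void commit(String change) {
        primary.add(change);                    // steps 1 and 3: always push to the primary
        if (secondary != null) {
            secondary.add(change);              // step 3: mirror the commit into the secondary
        }
        if (secondary == null) {
            if (primary.size() > minRetentionCommits) {
                secondary = new ArrayList<>();  // step 2: create the secondary
            }
        } else if (secondary.size() > minRetentionCommits) {
            primary = secondary;                // step 4: promote the secondary, drop the old primary
            secondary = new ArrayList<>();      // step 4: create another secondary
            rolls++;
        }
    }

    int primarySize() { return primary.size(); }
    int secondarySize() { return secondary == null ? -1 : secondary.size(); }
    int rolls() { return rolls; }
}
```

With a threshold of 3, the fourth commit creates the secondary and the eighth commit promotes it, so the history kept on disk stays bounded by roughly twice the threshold in this model.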
Codecov Report
@@ Coverage Diff @@
## master #619 +/- ##
============================================
- Coverage 69.91% 67.43% -2.48%
- Complexity 3305 3434 +129
============================================
Files 333 340 +7
Lines 13135 14323 +1188
Branches 1427 1611 +184
============================================
+ Hits 9183 9659 +476
- Misses 3079 3727 +648
- Partials 873 937 +64
Continue to review full report at Codecov.
I think this is ready. Please review this PR when you have time. 😄
@ikhoon All fixed. PTAL. 😉
I haven't looked through everything, but looks awesome! 👍 Let me keep taking a look, but only nits for now 🙏
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;

class RepositoryMetadataDatabaseTest {

optional; I think it might be useful and trivial to add a test where we can verify that RepositoryMetadataDatabase also reads the file correctly.
@Test
void writeAndRead() throws Exception {
final RepositoryMetadataDatabase other = new RepositoryMetadataDatabase(
db.getRootDir(), false);
assertThat(other.primaryRepoDir().getName()).isEqualTo(db.primaryRepoDir().getName());
}
Thanks for the test. 🙇
if (!(firstRevision.major() <= revision.major() && revision.major() <= headRevision.major())) {
    throw new RevisionNotFoundException(
            "revision: " + revision +
            " (expected: " + firstRevision.major() + " <= revision <= " + headRevision.major() + ")");

Suggested change:
- " (expected: " + firstRevision.major() + " <= revision <= " + headRevision.major() + ")");
+ " (expected: " + firstRevision.major() + " <= revision <= " + headRevision.major() + ')');
@@ -185,23 +223,24 @@ private synchronized void put(Revision revision, ObjectId commitId, boolean safe
buf.flip();

// Append a record to the file.

Suggested change:
- // Append a record to the file.
+ // Append or overwrite a record in the file.
requireNonNull(rollingRepositoryInitialRevision, "rollingRepositoryInitialRevision");
checkState(shouldCreateRollingRepository(rollingRepositoryInitialRevision,
                                         minRetentionCommits, minRetentionDays) ==
           rollingRepositoryInitialRevision, "aaa");

lol 😝 aaa
😱
// so we should catch up.
final RevisionRange revisionRange = new RevisionRange(
        rollingRepositoryInitialRevision.forward(1), headRevision);
final List<Commit> commits = primaryRepo.listCommits(ALL_PATH, MAX_MAX_COMMITS,

Q) I'm not sure how realistic the magic number 1000 is, but is it safe to say this won't be exceeded?
I'm asking because it seems like this implementation can break if secondaryRepo.HEAD != primaryRepo.HEAD (no problem if this isn't realistic).
I guess an alternative may be to allow partial secondaryRepo states.
This might also allow shorter write lock durations at the cost of additional complexity.
(or maybe we can just set an arbitrarily high value 🤔 )

On second thought, I realized that we're only catching up on the revisions between the previous HEAD revision and the current HEAD revision. I guess this scenario isn't really realistic, at least with the current specification 😅 Feel free to ignore.
        Revision rollingRepositoryInitialRevision) {
    writeLock();
    try {
        logger.info("Promoting the secondary repository in {}/{}.", parent.name(), originalRepoName);

Q) What do you think of validating secondaryRepo.HEAD == primaryRepo.HEAD here? (inside the write lock)

That's a good suggestion. 👍
 * For example, when {@code minRetentionCommits} is set to 2000 and {@code minRetentionDays} is set to 14,
 * the commits that are created more than 2000 are not removed until 14 days have passed. Set 0 to retain
 * all commits.

I'm a bit confused by the policy when both minRetentionCommits and minRetentionDays are set.
"the commits that are created more than 2000 are not removed until 14 days have passed."
IIUC, commits are not removed within 14 days even if the number of commits exceeds 2000.
To me, 2000 minRetentionCommits with 14 minRetentionDays sounds like Central Dogma allows 2000 commits for 14 days: if more than 2000 commits are created within 14 days, some old commits need to be removed, even though they are less than 14 days old.
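One way to read the quoted Javadoc is that a roll is due only when both thresholds are exceeded. The sketch below encodes that reading and nothing more. `RetentionPolicySketch`, the `oldestCommitTime` parameter, and the treatment of `0` as "retain all commits" for `minRetentionCommits` are assumptions made for illustration, not the PR's actual `exceedsMinRetention` implementation.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical predicate for the "both thresholds must be exceeded" reading of the Javadoc. */
final class RetentionPolicySketch {
    static boolean exceedsMinRetention(int commitCount, Instant oldestCommitTime, Instant now,
                                       int minRetentionCommits, int minRetentionDays) {
        if (minRetentionCommits == 0) {
            return false; // assumption: 0 means "retain all commits"
        }
        // More commits than the configured minimum are stored.
        final boolean tooMany = commitCount > minRetentionCommits;
        // The oldest stored commit is at least minRetentionDays old.
        final boolean oldEnough = Duration.between(oldestCommitTime, now)
                                          .compareTo(Duration.ofDays(minRetentionDays)) >= 0;
        return tooMany && oldEnough;
    }
}
```

Under this reading, 2500 commits whose oldest entry is 10 days old are all kept with the settings 2000 and 14, which is the behavior the comment above questions.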
    this.author = author;
}

private static boolean isEmpty(File dir) throws IOException {

nit: Move this method below the constructors?
context.commandExecutor().execute(
        Command.createRollingRepository(project.name(), repo.name(), revision,
                                        config.minRetentionCommits(),
                                        config.minRetentionDays()));

Call .join() for synchronous executions?
final ProjectManager pm = context.projectManager();
for (Project project : pm.list().values()) {
    for (Repository repo : project.repos().list().values()) {
        final Revision revision = repo.shouldCreateRollingRepository(config.minRetentionCommits(),

Check stopping for each repo so that the job can stop within the graceful shutdown period?
    }
}

private GitRepositoryV2(Project parent, File repoDir, Executor repositoryWorker,

Could you leave a Javadoc or comment about the difference between the two constructors? :-)
    }
}

static InternalRepository of(Project parent, String originalRepoName, File repoDir,

To rhyme with open?

Suggested change:
- static InternalRepository of(Project parent, String originalRepoName, File repoDir,
+ static InternalRepository create(Project parent, String originalRepoName, File repoDir,
    throw new StorageException("found more than one parent: " +
                               gitRepo.getDirectory());
}
rebuild(gitRepo, revWalk, headRevision, revCommit, true);

findFirstRevisionOrRebuild()?
    }
}

// Create a new instance only when necessary.

nit: I guess most requests are sent with Revision.HEAD.
Revision headRevision = this.headRevision;
if (headRevision.major() == major) {
return headRevision;
}
if (secondaryRepo != null) {
    promoteSecondaryRepo(secondaryRepo, rollingRepositoryInitialRevision);
} else {
    createSecondaryRepo(rollingRepositoryInitialRevision);
}

As per SRP, how about removing the createSecondaryRepo() call from promoteSecondaryRepo()?

Suggested change:
- if (secondaryRepo != null) {
-     promoteSecondaryRepo(secondaryRepo, rollingRepositoryInitialRevision);
- } else {
-     createSecondaryRepo(rollingRepositoryInitialRevision);
- }
+ if (secondaryRepo != null) {
+     promoteSecondaryRepo(secondaryRepo, rollingRepositoryInitialRevision);
+ }
+ createSecondaryRepo(rollingRepositoryInitialRevision);

That's a good suggestion. 👍
I am just asking out of curiosity 😅. I thought of a very naive approach of doing something jGit equivalent for
The rolling repository is awesome, but at the same time it looks relatively complicated compared to the rebase option. Maybe rewriting the history of thousands of commits would be too heavy, or rebase doesn't give us the desired result at all?

@minwoox is away for a while, so leaving my guess/analysis 😅 Feel free to correct me.
From what I understand, this is what we're trying to do. We're trying to squash commits between
My guess is that this is so we can make the switch atomically. If any of the intermediate steps fails while squashing (which I think would involve multiple disk IOs), the repository may end up in a corrupt state.

final InternalRepository repo = secondaryRepo != null ? secondaryRepo : primaryRepo;
if (exceedsMinRetention(repo, headRevision, minRetentionCommits, minRetentionDays)) {
    return headRevision;

 * The {@link Repository} retains at least the number of {@code minRetentionCommits} when more than
 * {@code minRetentionCommits} are made.
Q) From the Javadoc, I guessed that minRetentionCommits will be retained, but it seems like all commits up to headRevision will be squashed. Is my understanding correct?

"but it seems like all commits up to headRevision will be squashed"
The returned headRevision will be used as the initial revision for creating the secondary repository, so it's not squashed. Is this what you meant?
        author, summary, detail, markup, applyingChanges, false);
this.headRevision = res.revision;
final InternalRepository secondaryRepo = this.secondaryRepo;
if (secondaryRepo != null) {

I still think it's possible that primaryRepo.HEAD can diverge from secondaryRepo.HEAD since the two calls to commit aren't transactional.
What do you think of adding the following condition to prevent further divergence?

Suggested change:
- if (secondaryRepo != null) {
+ if (secondaryRepo != null && Objects.equals(primaryRepo.headRevision(), secondaryRepo.headRevision())) {

"since the two calls to commit aren't transactional."
We acquire writeLock before committing, so I think it's transactional and it shouldn't diverge.
Let me add an assertion for that. 😉
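The exchange above can be illustrated with a small sketch of committing to both repositories under one write lock and asserting that the heads agree. `DualCommitSketch` and its members are hypothetical, and revisions are plain integers here. The lock serializes writers, while the check only makes a divergence visible if the second write ever fell behind.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of a dual commit guarded by a single write lock. */
final class DualCommitSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int primaryHead;
    private Integer secondaryHead; // null while no secondary repository exists

    void startSecondary() {
        lock.writeLock().lock();
        try {
            secondaryHead = primaryHead; // the secondary starts at the primary's current head
        } finally {
            lock.writeLock().unlock();
        }
    }

    int commit() {
        lock.writeLock().lock();
        try {
            primaryHead++;
            if (secondaryHead != null) {
                secondaryHead++;
                // The assertion discussed above: both heads must match after a commit.
                if (secondaryHead != primaryHead) {
                    throw new IllegalStateException(
                            "primary head " + primaryHead + " != secondary head " + secondaryHead);
                }
            }
            return primaryHead;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int primaryHead() { return primaryHead; }
    int secondaryHead() { return secondaryHead == null ? -1 : secondaryHead; }
}
```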
@jrhee17 Thanks for the reply!! I thought we could just do something like:
Git.wrap(jGitRepository).rebase().setUpstream("HEAD~2000").runInteractively(new InteractiveHandler() {
    @Override
    public void prepareSteps(List<RebaseTodoLine> steps) {
        if (steps.size() <= 2000) {
            return;
        }
        for (int i = 0; i < 1000; i++) {
            try {
                steps.get(i).setAction(Action.SQUASH);
            } catch (IllegalTodoFileModification e) {
                // exception handling..
            }
        }
    }
    @Override
    public String modifyCommitMessage(String commit) {
        return commit;
    }
})
I am not sure if I am getting it correctly, but I agree that there seems to be no easy way to handle IO errors or merge conflicts with the rebase option (though I doubt squashing will cause conflicts).

If a repository has a long history, the write lock would not be released for a long time. A lock held that long blocks the threads waiting for it.
That's a good question, @ks-yim, and thanks @jrhee17 and @ikhoon for the answers. 😄 There's another reason I didn't use squash: it modifies the Git hashes that we use internally for revision mapping.
If we squash the 1st and 2nd commits, then the commits will be:
The hashes of all commits after the squash point are changed, so I can't just do the squash. 😄
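The point about changed hashes can be shown with a simplified model in which each ID is a SHA-1 over the parent ID plus the change. This is not Git's real object format, and `HashChainSketch` is a hypothetical name. It only demonstrates that rewriting an early entry changes every later ID.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Simplified hash chain: every ID covers its parent's ID, as a Git commit ID does. */
final class HashChainSketch {
    static List<String> chain(List<String> changes) {
        final List<String> ids = new ArrayList<>();
        String parent = "";
        for (String change : changes) {
            parent = sha1(parent + '\n' + change); // the ID depends on the whole history so far
            ids.add(parent);
        }
        return ids;
    }

    private static String sha1(String s) {
        try {
            final MessageDigest md = MessageDigest.getInstance("SHA-1");
            final StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is available on every Java platform
        }
    }
}
```

Comparing `chain(["a", "b", "c", "d"])` with `chain(["a+b", "c", "d"])` gives different IDs for `c` and `d`, even though those two changes are untouched, so a revision-to-hash mapping built on the original IDs would no longer resolve.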
Thanks for the detailed comments!!
Had a chat with @ikhoon and we decided to keep the current behavior. 😄
Addressed all comments, PTAL. 🙇
https://github.com/line/centraldogma/runs/5300517235?check_suite_focus=true#step:5:659 It seems like the cache of the build exists and isn't cleaned somehow. 😓
Motivation:
Central Dogma uses jGit to store data. Because Git keeps unlimited history, Central Dogma will eventually run into trouble managing disk usage.
We can handle this by maintaining the primary and secondary Git repositories internally.
This works in this way:
1. Commits are pushed to the primary Git repository.
2. If the number of commits exceeds the threshold (`minRetentionCommits`), then the secondary Git repository is created.
3. Commits are pushed to both the primary and secondary Git repositories.
4. If the secondary Git repository has more commits than the threshold:
   - The secondary Git repository is promoted to the primary Git repository.
   - The primary Git repository is removed completely.
   - Another secondary Git repository is created.
5. Back to 3.
Modifications:
- `CreateRollingRepositoryCommand` that creates the rolling repository by the `CommitRetentionManagementPlugin`.
- `GitRepositoryV2` that manages the rolling jGit repositories.
  - `foo_0000000000`, `foo_0000000001`, `foo_0000000002` and so on
  - `RepositoryMetadataDatabase` has the suffix in its file database.
  - `GitRepository` is not removed for the migration test.
- `InternalRepository` that has jGit repository and `CommitIdDatabase`.
- … (`diff`, `watch`, `history`, etc.) is lower than the `firstRevision` of the current primary repo?
  - If `Revision.INIT(1)` is used, the `firstRevision` is used instead.
    - `diff(INIT, new Revision(100) ...)` is equal to `diff(new Revision(firstRevisionNumber), new Revision(100) ...)`
  - If a `Revision` between `Revision.INIT(1)` and the `firstRevision` is used, a `RevisionNotFoundException` is raised.
  - … `watch` and `findLatestRevision`.
Result:
- … `minRetentionCommits` while keeping the commits made in the recent `minRetentionDay`.
Todo:
- Provide a way to set `minRetentionCommits` and `minRetentionDay` for each repository.
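The revision handling listed under Modifications can be sketched as a small normalization step. This is a minimal illustration, not the merged code: `RevisionNormalizerSketch` is a hypothetical name, revisions are plain positive integers (relative revisions such as HEAD are left out), and `IllegalArgumentException` stands in for `RevisionNotFoundException`.

```java
/** Hypothetical normalization of a requested revision against the retained range. */
final class RevisionNormalizerSketch {
    static final int INIT = 1;

    static int normalize(int requested, int firstRevision, int headRevision) {
        if (requested == INIT) {
            // INIT maps to the oldest revision that is still retained.
            return firstRevision;
        }
        if (requested < firstRevision || requested > headRevision) {
            // Stand-in for RevisionNotFoundException: the revision was removed or never existed.
            throw new IllegalArgumentException(
                    "revision: " + requested +
                    " (expected: " + firstRevision + " <= revision <= " + headRevision + ')');
        }
        return requested;
    }
}
```

For a repository whose retained range is 50 to 100, `normalize(1, 50, 100)` yields 50, `normalize(75, 50, 100)` yields 75, and `normalize(10, 50, 100)` throws.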