Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sets depth=1 to Git clone/fetch #622

Merged

Conversation

brunoapimentel
Copy link
Contributor

@brunoapimentel brunoapimentel commented Dec 13, 2021

This is to avoid exhausting the resources while cloning very large repos, such as github.com/openshift/origin, which is consuming ~3GB memory during the cloning process.

In order to maintain compatibility, this behavior will only be applied if there is no need to keep the Git repository in the resulting tarball (i.e. include-git-dir flag is absent).

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • n/a Code coverage from testing does not decrease and new code is covered
  • n/a OpenAPI schema is updated (if applicable)
  • n/a DB schema change has corresponding DB migration (if applicable)
  • README updated (if worker configuration changed, or if applicable)

@chmeliik
Copy link
Contributor

Will golang version resolution still work correctly?

@serg-cymbaluk
Copy link
Contributor

serg-cymbaluk commented Dec 14, 2021

@chmeliik

Will golang version resolution still work correctly?

It might be correct, since the 'get_golang_version' function does 'fetch' operation itself.
What kind of issue you expect there?

@chmeliik
Copy link
Contributor

It might be correct, since the 'get_golang_version' function does 'fetch' operation itself. What kind of issue you expect there?

Ah, I didn't realize we pass update_tags=True there. Should work fine then 👍

@chmeliik
Copy link
Contributor

chmeliik commented Dec 14, 2021

Another potential problem is that the include-git-dir flag exists, and by cloning with depth 1, we change the request output when that flag is used (previously would have been a full clone, now it will be a shallow clone). Maybe that's fine, I don't know if users rely on the full clone, but I would be careful

@brunoapimentel
Copy link
Contributor Author

Another potential problem is that the include-git-dir flag exists, and by cloning with depth 1, we change the request output when that flag is used (previously would have been a full clone, now it will be a shallow clone). Maybe that's fine, I don't know if users rely on the full clone, but I would be careful

I'm thinking we could either:
1- Use shallow clones only when include-git-dir is absent
2- Have a new flag to force shallow clones (it is only a single repo that is causing a crash)

@chmeliik
Copy link
Contributor

I'm thinking we could either: 1- Use shallow clones only when include-git-dir is absent 2- Have a new flag to force shallow clones (it is only a single repo that is causing a crash)

+1 for the first option

# Don't allow git to prompt for a username if we don't have access
env={"GIT_TERMINAL_PROMPT": "0"},
)

repo.remote().fetch(refspec=self.ref, depth=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to have this line twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the shallow clone always defaults to the lastest master commit, so we need to fetch the commit we're actually building.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah, I just forgot there are some lines between 170 and 194. My bad!

@brunoapimentel
Copy link
Contributor Author

I'm thinking we could either:
1- Use shallow clones only when include-git-dir is absent
2- Have a new flag to force shallow clones (it is only a single repo that is causing a crash)

I started writing approach #1, but I felt it looked hackish, like an unexpected behavior. I'm thinking #2 would be a better solution, since the user will need to explicitly ask for a shallow clone.

And if we go with #2, it would not affect how include-git-dir works now: it'll simply include the type of clone that was asked, deep (default) or shallow (via the new flag).

Wdyt, @serg-cymbaluk @ejegrova @akhmanova?

@serg-cymbaluk
Copy link
Contributor

I started writing approach #1, but I felt it looked hackish, like an unexpected behavior. I'm thinking #2 would be a better solution, since the user will need to explicitly ask for a shallow clone.

It is not hackish, since the purpose of that flag is to keep the git history. And if it is not needed, we can apply --depth=1 with no doubts.

The second one sounds more like a workaround.

@brunoapimentel
Copy link
Contributor Author

It is not hackish, since the purpose of that flag is to keep the git history. And if it is not needed, we can apply --depth=1 with no doubts.

The second one sounds more like a workaround.

Well, the purpose of the flag is to keep the whole git directory, although it seems the users are more interested in the refs in there.

I still feel like it is less clear to see why include-git-dir implicates a full clone (and it's absence a shallow clone) than having a flag that explicitly says shallow-clone, which is pretty self-explanatory. My overall feeling is more in the lines of "explicit is better than in implicit".

But I see your point, if we interpret the flag as include-git-history, it would make perfect sense. I could go with either: #2 seems more explicit, but #1 seems a little easier to implement (no API changes).

Maybe the feeling of hackish derived more from all the code layers I hade to "pierce" to pass a new argument... argh :D

@akhmanova
Copy link
Collaborator

@brunoapimentel I am fine with the first approach too. I would not create another flag for this purpose, just renaming will be fine. Let's just update docs and say that without this flag only the last commit will be cloned.

@brunoapimentel brunoapimentel marked this pull request as draft December 22, 2021 20:21
@brunoapimentel brunoapimentel force-pushed the git-depth-1 branch 2 times, most recently from 162990c to 72bd76b Compare December 29, 2021 20:43
@brunoapimentel brunoapimentel marked this pull request as ready for review December 30, 2021 15:21
@@ -143,7 +143,7 @@ def _create_archive(self, from_dir):
os.unlink(self.sources_dir.archive_path)
raise

def clone_and_archive(self, gitsubmodule=False):
def clone_and_archive(self, gitsubmodule=False, include_git_dir=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a line about include_git_dir param?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean in the docstring? Sure, I missed that one.

)
repo = git.repo.Repo.clone_from(self.url, clone_path, **kwargs)

if include_git_dir:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what if we need a special commit and it's not the last one (and include_git_dir is set to False)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If include_git_dir is set to False, a deep clone will be made, which means all commits will already be there. This is the current behavior, by the way.

@@ -89,6 +94,9 @@ def test_clone_and_archive(
mock_ugs.assert_called_once_with(mock_clone.return_value)
else:
mock_ugs.assert_not_called()
#
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not. Holiday mood was impeding my brain. :D

cachito/web/api_v1.py Outdated Show resolved Hide resolved
This is to avoid exhausting the resources while cloning very large
repos (such as github.com/openshift/origin, which consumes ~3GB
memory).

In case the "include-git-dir" flag is set, a deep clone will still
be used.

Signed-off-by: Bruno Pimentel <bpimente@redhat.com>
Copy link
Contributor

@serg-cymbaluk serg-cymbaluk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ejegrova ejegrova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@brunoapimentel brunoapimentel merged commit aa7d449 into containerbuildsystem:master Jan 4, 2022
@brunoapimentel brunoapimentel deleted the git-depth-1 branch January 4, 2022 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants