Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize pulls merging #4921

Merged
merged 11 commits into from
Jan 23, 2019
Merged

Conversation

typeless
Copy link
Contributor

@typeless typeless commented Sep 12, 2018

This is yet-another attempt to fix #601.

It passes the integration tests but not sure if it is sufficiently covered.

Benchmarking is not trivial to be automated within the existing testing framework because

  1. Adding a large test repo inside Gitea's integration facility is not desirable.
  2. Cloning an existing large repo from somewhere on the internet is very time-consuming and not appropriate to be run repeatedly.
  3. Testing small repo seems to be unreliable and I keep getting interrupted by 'database table is locked' along the way when looping through the pull merging test.

I still published this patch in hopes that someone can find any flaws early.

@codecov-io
Copy link

codecov-io commented Sep 12, 2018

Codecov Report

Merging #4921 into master will increase coverage by 0.03%.
The diff coverage is 47.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4921      +/-   ##
==========================================
+ Coverage   37.88%   37.92%   +0.03%     
==========================================
  Files         328      328              
  Lines       48084    48141      +57     
==========================================
+ Hits        18219    18256      +37     
- Misses      27246    27257      +11     
- Partials     2619     2628       +9
Impacted Files Coverage Δ
models/pull.go 51.04% <47.82%> (+0.39%) ⬆️
models/repo_list.go 63.29% <0%> (-1.27%) ⬇️
routers/repo/view.go 47.3% <0%> (+1.19%) ⬆️
models/unit.go 14.28% <0%> (+14.28%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d43437...5c56a2a. Read the comment docs.

@bkcsoft bkcsoft added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Sep 12, 2018
@lunny
Copy link
Member

lunny commented Sep 12, 2018

Maybe we could create a docker to contain some popluar git repos for testing with Gitea and publish it to dockhub.

@lafriks lafriks added the type/enhancement An improvement of existing functionality label Sep 12, 2018
@lafriks lafriks added this to the 1.7.0 milestone Sep 12, 2018
@sapk
Copy link
Member

sapk commented Sep 18, 2018

@typeless You can have a look at https://github.com/go-gitea/gitea/blob/801843b0115e29ba2304fa6a5bea1ae169a58e02/integrations/benchmarks_test.go wich introduce a suite of repos and manage short flag for quick test. Benchmark are not currently run on CI so it does not have an impact on test time. You can use short flag when testing/dev and validate only at end by running all benchmarks.

@filipnavara
Copy link
Contributor

filipnavara commented Oct 17, 2018

I gave it a try, but "Rebase and merge" caused this:

2018/10/17 20:17:15 [...routers/repo/pull.go:579 MergePullRequest()] [E] Merge: git rebase [E:\Git\emclient\emclient.git -> E:/Tools/Gitea/data/tmp/local-repo/merge-387491500.git]: Cannot rebase: You have unstaged changes.
Please commit or stash them.

Adding the -s alone to git clone does help quite a bit though.

@typeless
Copy link
Contributor Author

@filipnavara Is there a simple way to reproduce it?

@filipnavara
Copy link
Contributor

Well, happened on very first attempt I tried. As far as I know git rebase works only when there is a checkout, so I would expect it to happen every time. However, if you think it may be more complicated than that I will be happy to share some test repo+configuration.

@typeless
Copy link
Contributor Author

typeless commented Oct 18, 2018

@filipnavara There is probably a bug. However, the code path following the git-fetch and git-read-tree does have done checkouts (see https://github.com/go-gitea/gitea/pull/4921/files#diff-ca95a2152012afcd9192d0ed961e6b9dL389).

Here is also a demo:

#!/bin/sh
set -e 
set -x
#rm -rf ./work
mkdir -p work && pushd work || exit 1

	mkdir -p repo1
	pushd repo1
		git init
		echo "Hello, " > README.md
		git add .
		git commit -m"Initial commit"
	popd
	
	git clone repo1 repo2
	
	pushd repo1
		echo "World" >> README.md
		git add -u
		git commit -m"Update"
	popd
	
	pushd repo2
		echo "Alice" >> README.md
		git add -u
		git commit -m"Add Alice"
	popd

	git clone -s --no-checkout -b master repo1 repo-tmp
	pushd repo-tmp
		git read-tree master
		git fetch ../repo2 master
		git checkout FETCH_HEAD

		## We don't care about conflits for this demo, so always choose 'ours'
		git rebase -s ours master
	popd
popd

This is a demo for merge, which does not require a checkout

#!/bin/sh
set -e 
set -x
#rm -rf ./work
mkdir -p work && pushd work || exit 1

	mkdir -p repo1
	pushd repo1
		git init
		echo "Hello, " > README.md
		git add .
		git commit -m"Initial commit"
	popd
	
	git clone repo1 repo2
	
	pushd repo1
		echo "World" >> README.md
		git add -u
		git commit -m"Update"
	popd
	
	pushd repo2
		echo "Alice" >> README.md
		git add -u
		git commit -m"Add Alice"
	popd

	git clone -s --no-checkout -b master repo1 repo-tmp
	pushd repo-tmp
		git read-tree master
		git fetch ../repo2 master

		## git-merge works without doing git-chekcout
		git merge --no-edit -X theirs FETCH_HEAD
	popd
popd

@typeless typeless force-pushed the merge-without-full-worktree branch 5 times, most recently from dfe96f4 to a1e5657 Compare October 24, 2018 12:06
@techknowlogick techknowlogick modified the milestones: 1.7.0, 1.8.0 Dec 19, 2018
@typeless typeless force-pushed the merge-without-full-worktree branch 4 times, most recently from 0acae62 to 8708de2 Compare December 21, 2018 07:28
@typeless
Copy link
Contributor Author

Fixed various bugs.

@filipnavara the problem you encountered should be fixed.

@typeless
Copy link
Contributor Author

Can anyone help review this?

I tested it manually (not very systematically though), and it is capable of merging an 11GiB repo (6.2GiB bare) now.

@filipnavara
Copy link
Contributor

I'll try to deploy it on our server and report back. Thanks

@typeless
Copy link
Contributor Author

@filipnavara Thanks. I have deployed to our server too, and deliberately doing more commits 😄
If your repositories exploded, you would not be alone.

Copy link
Contributor

@zeripath zeripath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@typeless - these changes look good to me, but see my comments in #601.

@bkcsoft bkcsoft removed the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Dec 27, 2018
@filipnavara
Copy link
Contributor

Yes

@typeless
Copy link
Contributor Author

typeless commented Jan 10, 2019

@filipnavara I just tested on our production server using the rebase merge-style for a 10+ GB repo.
The pull merge took less than 3 secs (not really measured it). But I did notice that when I pressed the "green" button to create the PR in the first place, it hung for a while in which it was doing git -c credential.helper= remote add -f 1547120927403349389 /home/git/git-repositories/my-org/my-huge-repo.git.

That was probably the culprit.

Edit: But it seems what you encountered was about merging the PR, not creation of it.

@filipnavara
Copy link
Contributor

@typeless On further inspections the "timeouts" were caused by an unrelated issue in the code that sends out notifications. I don't have exact performance data, but this PR makes it good enough for our general usage (~ 6Gb repository at the moment).

@typeless
Copy link
Contributor Author

typeless commented Jan 23, 2019

Given that 1.7.0 is released, please consider merging this.

@typeless
Copy link
Contributor Author

Need an LGTM to proceed.

@bkcsoft bkcsoft added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Jan 23, 2019
@lafriks lafriks merged commit 6ad834e into go-gitea:master Jan 23, 2019
bmackinney pushed a commit to bmackinney/gitea that referenced this pull request Jan 27, 2019
* Optimize pulls merging

By utilizing `git clone -s --no-checkout` rather than cloning the whole
repo.

* Use sparse-checkout to speedup pulls merge

* Use bytes.Buffer instead of strings.Builder for backward compatibility

* Fix empty diff-tree output for repos with only the initial commit

* Fix missing argument for the format string

* Rework diff-tree-list generation

* Remove logging code

* File list for sparse-checkout must be prefix with /

Otherwise, they would match all files with the same name under
subdirectories.

* Update onto the rebased head

* Use referecen repo to avoid fetching objects
@typeless typeless deleted the merge-without-full-worktree branch April 3, 2019 05:01
@twpedersen twpedersen mentioned this pull request Jan 8, 2020
7 tasks
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Merging PRs takes very long time for large repository
9 participants