Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blame is very slow #14

Open
mkhadilk opened this issue Mar 29, 2020 · 14 comments
Open

Blame is very slow #14

mkhadilk opened this issue Mar 29, 2020 · 14 comments
Labels
help wanted Extra attention is needed

Comments

@mkhadilk
Copy link

I tested a Blame and as compared to CLI it is very slow. CLI takes around < 1 and go-git takes 35 second.

@mcuadros mcuadros added the help wanted Extra attention is needed label Apr 3, 2020
@mkhadilk
Copy link
Author

mkhadilk commented Apr 5, 2020

I would have helped by fork and an implementation by diff. But, may not get time till May. And while looking at the code, I realized that FilePatches does not give me proper 'rename'.

I was expecting that "from, to" in the case of rename will be correct. But it is not. That is critical as blame depends on that.

@mkhadilk
Copy link
Author

mkhadilk commented Apr 5, 2020

type FilePatch interface {
// IsBinary returns true if this patch is representing a binary file.
IsBinary() bool
// Files returns the from and to Files, with all the necessary metadata to
// about them. If the patch creates a new file, "from" will be nil.
// If the patch deletes a file, "to" will be nil.

Files() (from, to File)
// Chunks returns a slice of ordered changes to transform "from" File to
// "to" File. If the file is a binary one, Chunks will be empty.
Chunks() []Chunk
}

But what if it is a rename?

@mcuadros
Copy link
Member

mcuadros commented Apr 6, 2020

I am not the original author of this part, and It's a long time since I don't touch this part. But what I remember is that the rename it's no handled because it's not part of the diff process it's a heuristic with a threshold, after the process of making the diff.

@erizocosmico
Copy link
Contributor

@mkhadilk rename detection has just landed in #38

@patrickdevivo
Copy link

I noticed this as well and ended up shelling out to git blame ... as a more performant alternative. The code for which can be found here: https://github.com/augmentable-dev/tickgit/blob/master/pkg/blame/blame.go (probably worth adding a README to that), but just leaving a note in case anyone gets blocked/stuck on the performance of blame in go-git!

@mkhadilk
Copy link
Author

I did exactly the same by doing exec but then that also becomes a problem as need to parse the output...

@cyw3
Copy link

cyw3 commented Jun 21, 2022

Is there any way to speed up?

@donatj
Copy link

donatj commented Aug 15, 2022

Not to pile on, but I work on a pretty large repo. Using Blame here on our README.md takes over 20 seconds. I really thought I was doing something wrong until I found this ticket.

Compare git blame to this

image

image

@github-actions
Copy link

To help us keep things tidy and focus on the active tasks, we've introduced a stale bot to spot issues/PRs that haven't had any activity in a while.

This particular issue hasn't had any updates or activity in the past 90 days, so it's been labeled as 'stale'. If it remains inactive for the next 30 days, it'll be automatically closed.

We understand everyone's busy, but if this issue is still important to you, please feel free to add a comment or make an update to keep it active.

Thanks for your understanding and cooperation!

@github-actions github-actions bot added the stale Issues/PRs that are marked for closure due to inactivity label Sep 29, 2023
@pjbgf
Copy link
Member

pjbgf commented Sep 29, 2023

The blame feature was rewrote (#789) since the last activity on this issue and introduced some performance improvements. So I am closing it for the time being. If the issue persists we can either reopen this or create a new one.

@pjbgf pjbgf closed this as completed Sep 29, 2023
@iainjreid
Copy link

iainjreid commented Apr 4, 2024

Sadly, I believe this might still may be an issue. For small repositories that contain no more than a few hundred commits at most, the time to get the blame for a given file is nothing to complain about, but for larger repositories with a few thousand commits and a complex history, the process grinds to a halt.

Quite where the thresholds are, I can't be sure, but I do get the sense that the overall complexity of the repository has an excessive impact on the performance of the blame functionality.

This is critical to a piece of work I'm currently working on, so I'll start to take a look at the blame source and see what I can find. I thought it would be best to post here now, though, just to give the community a heads-up.

Edit: For transparency, and if a more competent person spots that I am simply doing something wrong please do call me out, this is the function that I have found causes trouble in my project, where git.Blame will happily take a few minutes to load the contents of a file:

// GetFileContents will attempt to read the contents of a file and return each
// line with the most recent blame information.
func (c *Commit) GetFileContents(filepath string) ([]*git.Line, error) {
	file, err := git.Blame(c.ptr, path.Clean(filepath))

	if err != nil {
		return []*git.Line{}, &FileNotFoundError{
			Filepath: filepath,
		}
	}

	return file.Lines, nil
}

@pjbgf
Copy link
Member

pjbgf commented Apr 4, 2024

@iainjreid can you please confirm what version of go-git you are using?

@pjbgf pjbgf reopened this Apr 4, 2024
@github-actions github-actions bot removed the stale Issues/PRs that are marked for closure due to inactivity label Apr 5, 2024
@iainjreid
Copy link

@pjbgf v5.12.0

// go.sum
github.com/go-git/go-git/v5 v5.12.0 h1:7Md+ndsjrzZxbddRDZjF14qK+NN56sy6wkqaVrjZtys=
github.com/go-git/go-git/v5 v5.12.0/go.mod h1:FTM9VKtnI2m65hNI/TenDDDnUf2Q9FHnXYjuz9i5OEY=

@iainjreid
Copy link

After some more testing using the in-memory storage (which is immeasurably faster, but the bottleneck still persists), I believe this issue is simply made apparent and worse by my SQL-backed storage.

My gut feeling is that as the commit history is being walked, the round trip cost of whichever storage mechanism is being used becomes an overwhelmingly high overhead?

I'll run some tests locally and report back 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

8 participants