New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return files modified by specified contributors #53206
Conversation
I'll finish the tests tomorrow |
I'm not quite sure I'm following on this one. Let's take an example query: Does that sound right? |
That scenario with the query with the revision specified makes sense. I was describing a different use case that might not be something we need to support for this first iteration. It's a bit of a verbose explanation so if you want to discuss in a call let me know. In short I don't believe the below scenario needs to be supported, I just wanted to clarify the gap. With this current implementation we know a file was modified by a set of contributors but we only visit the file contents at the revisions queried for. If the specific file the user wants only matches the pattern provided for an older revision (e.g. method was renamed/refactored since that older revision) then the file match won't be returned to the filter job in the first place. The user would need to provide a revision that matches the pattern. The I believe the |
Got it! Thanks for explaining. Skipping that functionality for now sounds good to me. |
…constructor. Case sensitivity change uses regexp package not casetransform.
…has contributor predicate.
Codenotify: Notifying subscribers in CODENOTIFY files for diff 6a0e88c...d7fb512.
|
@@ -514,7 +522,7 @@ func getPathRegexpsFromTextPatternInfo(patternInfo *search.TextPatternInfo) (pat | |||
|
|||
func computeFileMatchLimit(b query.Basic, p search.Protocol) int { | |||
// Temporary fix: | |||
// If doing ownership search, we post-filter results so we may need more than | |||
// If doing ownership or contributor search, we post-filter results so we may need more than |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the missing fix described in this comment meant we wouldn't process count
correctly and I started looking into implementing this. Then I realized our LimitJob
wraps our fileHasContributorJob
so count is handled, we just have some inefficiencies due to child limitJob.child (and related upstream jobs) having count:all
.
I'm just following the own approach and setting count:all
.
@camdencheek PR is updated with code review feedback. Let me know if you want to discuss in a call. Thanks! |
// We send one fetch contributors request per file path. | ||
// We should quit early on context deadline exceeded. | ||
if errors.Is(ctx.Err(), context.DeadlineExceeded) { | ||
errs = errors.Append(errs, ctx.Err()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be protected by the mutex, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're correct, will update
Not sure about this build failure. I'll take a look rn. Also I forgot about changelog and docs so I'll add those today. You can hold off on review or just approve now (I don't believe the build failure is related to my code) |
internal/search/query/predicate.go
Outdated
} | ||
|
||
func (f *FileHasContributorPredicate) Unmarshal(params string, negated bool) error { | ||
if _, err := regexp.Compile(params); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax.Parse()
is a bit cheaper than a full compile
A [user requested](https://sourcegraph.slack.com/archives/C02UM8A9M2R/p1678812044400949) being able to use `author:` keyword along with a pattern to search for file matches. Currently `author` keyword is valid for commit and diff search types. The use case is that a provided pattern will match a part of the file that was modified by the author and that matching line **may or may not** be part of the modified lines of the author's commit. To support this we add a new predicate `file:has.contributor()`. Some characteristics of the predicate: - _Contributor_ is an existing [struct](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/internal/gitserver/gitdomain/common.go?L208). - Regex is supported. We use same regex expression [logic](https://sourcegraph.com/github.com/sourcegraph/sourcegraph@2a63d0dc6ff8f520ff26a91c65b4a1366cf36712/-/blob/internal/gitserver/search/match_tree.go?L23-25) as `author`. - Case sensitivity can be applied for the contributor criteria of `file:has.contributor()` with the existing url parameter. - Negation is supported. - Multiple predicates will be AND'ed together by default if boolean operator not specified. This is how `has.owner` works, not sure if all of the predicates are aligned on this. General discrepancies are tracked and described in detail [here](#50539). - Filtering results happens **after** pattern matches are streamed from child job ([discussion](#50880 (comment)). For this reason a user might not always find what they're looking for if the pattern matches only an older revision of a file and that older revision is not the revision streamed back from the child job. [One of my proposals](#50880 (comment)) would resolve this by iterating through batches of commits with `commitSearcher` but I'm not sure about performance impact of this approach. [GH Issue](#50880) ## Test plan new tests for `FileHasContributorsJob`
A user requested being able to use
author:
keyword along with a pattern to search for file matches. Currentlyauthor
keyword is valid for commit and diff search types. The use case is that a provided pattern will match a part of the file that was modified by the author and that matching line may or may not be part of the modified lines of the author's commit.To support this we add a new predicate
file:has.contributor()
. Some characteristics of the predicate:author
.file:has.contributor()
with the existing url parameter.has.owner
works, not sure if all of the predicates are aligned on this. General discrepancies are tracked and described in detail here.commitSearcher
but I'm not sure about performance impact of this approach.GH Issue
Test plan
new tests for
FileHasContributorsJob