-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
search: call zoekt early #14093
search: call zoekt early #14093
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, determineRepos can return an alert (ie a resolver), which doResults can handle, but zoekt.go cannot. For a cleaner implementation with futures I suggest we refactor doResults and all functions in its call-hierarchy to return simple errors which are then converted to alerts somewhere higher in the call-chain.
Agreed this is very strange that it returns a resolver. I agree with your suggested solution, it shouldn't be returning a resolver but maybe some informative error or even a third type which is more alert specific that zoekt could ignore?
What are your thoughts on landing this code? Do you feel comfortable with how this is implemented from a coupling/future maintainability perspective? I'm a little unsure. However, it feels like if we only access repos via a promise + we have this mode flag feels good. I wonder if there is a way to change this a bit to prevent mistakes when making other changes in the future and not being aware of mode?
@@ -1729,12 +1735,14 @@ func (r *searchResolver) doResults(ctx context.Context, forceOnlyResultType stri | |||
|
|||
args := search.TextParameters{ | |||
PatternInfo: p, | |||
Repos: resolved.repoRevs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, makes sense you removed this. Is it possible that replacing this with just a chan may lead to issues where we try and read this channel more than once? Should we be using a little promise wrapper that has an idempotent implementation?
IE should we remove Repos from TextParameters completely and replace it with something promise like only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR, now with Repos
removed from TextParameters completely. I still have to make the new Get
context-aware, though.
} | ||
args.Repos = resolved.repoRevs | ||
if r.isGlobalSearch() { | ||
args.RepoPromise <- resolved.repoRevs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just always send on this channel?
wg.Add(1) | ||
goroutine.Go(func() { | ||
defer wg.Done() | ||
agg.doFilePathSearch(ctx, &argsIndexed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now this ctx
will be canceled if we return due to error while resolving repos. This is good, but feels like something we could regress on in the future. Maybe to just be explicit we should have a context.WithCancel
in this conditional to keep that behaviour.
cmd/frontend/graphqlbackend/zoekt.go
Outdated
@@ -292,14 +294,21 @@ var errNoResultsInTimeout = errors.New("no results found in specified timeout") | |||
// case of finding partial or full results in the given timeout). | |||
func zoektSearch(ctx context.Context, args *search.TextParameters, repos *indexedRepoRevs, typ indexedRequestType, since func(t time.Time) time.Duration) (fm []*FileMatchResolver, limitHit bool, reposLimitHit map[string]struct{}, err error) { | |||
if len(repos.repoRevs) == 0 { | |||
return nil, false, nil, nil | |||
if args == nil || args.Mode != search.ZoektGlobalSearch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe combine this with the above line? Also why would args be nil? Just in tests? Maybe we should fix those tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, its the tests ;-) Let me see if I can change the tests instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: I was wrong. It is not just the tests. NewIndexedSearchRequest
will return args=nil
if for some reason we decide to return early.
cmd/frontend/graphqlbackend/zoekt.go
Outdated
select { | ||
case <-ctx.Done(): | ||
return nil, false, nil, ctx.Err() | ||
case repos := <-args.RepoPromise: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if in this case we can set something which does the same thing as GetRepoInputRev
, then we can potentially get rid of the branching between ZoektGlobalSearch in the for loop over resp.Files below.
eg maybe we set a function var getRepoInputRev(*zoekt.FileMatch) (repo *types.Repo, revs []string, ok bool)
?
c9affa3
to
546bc22
Compare
Codecov Report
@@ Coverage Diff @@
## main #14093 +/- ##
==========================================
- Coverage 47.73% 46.58% -1.15%
==========================================
Files 1524 1524
Lines 77640 77747 +107
Branches 6934 6934
==========================================
- Hits 37058 36218 -840
- Misses 36963 37902 +939
- Partials 3619 3627 +8
*This pull request uses carry forward flags. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I may send up some commits in another PR for you to take a look at around simplifying the promise. Tell me what ya think :)
@@ -251,16 +251,21 @@ func repoIsLess(i, j *types.Repo) bool { | |||
// 2 above (in the worst case scenario). | |||
// | |||
func paginatedSearchFilesInRepos(ctx context.Context, args *search.TextParameters, pagination *searchPaginationInfo) (*searchCursor, []SearchResultResolver, *searchResultsCommon, error) { | |||
repos, err := args.RepoPromise.Get(ctx) | |||
if err != nil { | |||
log15.Info("paginatedSearchFilesInRepos: context error while getting repos.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we be returning an error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100%. sure. The comment in line 270 implies we shouldn't, although I find it a bit strange.
// Timeouts are reported through searchResultsCommon so don't report an error for them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edit: returning an error now
@@ -1464,7 +1464,7 @@ func (r *searchResolver) determineRepos(ctx context.Context, tr *trace.Trace, st | |||
// diff and commit searches where more than repoLimit repos need to be searched. | |||
func alertOnSearchLimit(resultTypes []string, args *search.TextParameters) ([]string, *searchAlert) { | |||
limits := searchLimits() | |||
|
|||
repos, _ := args.RepoPromise.Get(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can we ignore the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only error that can come back is a context error which at this point (inside alertOnSearchLimit
) seemed irrelevant to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: And, we don't have a context here, so we cannot even get a context error ;-)
finalQuery := zoektquery.NewAnd(&zoektquery.RepoBranches{Set: repos.repoBranches}, queryExceptRepos) | ||
var finalQuery zoektquery.Q | ||
if args.Mode == search.ZoektGlobalSearch { | ||
finalQuery = zoektquery.NewAnd(&zoektquery.Branch{Pattern: "HEAD", Exact: true}, queryExceptRepos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some code comments here I think would be useful for future travellers :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-) agreed
internal/search/types.go
Outdated
// Get returns the repository revisions. It blocks until the promise resolves or | ||
// the context is canceled. Further calls to Get will always return the original | ||
// results, IE err will stay nil even if the context expired between the first | ||
// and the second call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should also document that if the first calls ctx finishes while resolving, then we will always return that ctx.Error
@stefanhengl now that I played with the promise code I have some extra feedback. How do we resolve an error in the promise? I think we need a way to cancel the promise with error if resolving fails. See below it just needs to also store the error in resolve to work. The implementation below always respects the past in ctx and doesn't need a New method since it will init the channel when needed diff --git a/internal/search/types.go b/internal/search/types.go
index a911a8822f..a58d28ab46 100644
--- a/internal/search/types.go
+++ b/internal/search/types.go
@@ -82,20 +82,22 @@ const (
// channel. However, unlike a channel, a RepoPromise can only be set once, but
// consumed many times. Use Resolve to set the value, and Get to get the value.
type RepoPromise struct {
- closeOnce sync.Once
- repoChan chan []*RepositoryRevisions
+ initOnce sync.Once
+ done chan struct{}
- resolveOnce sync.Once
- repos []*RepositoryRevisions
-
- err error
+ reposOnce sync.Once
+ repos []*RepositoryRevisions
}
// NewRepoPromise returns an unresolved promise.
func NewRepoPromise() *RepoPromise {
- return &RepoPromise{
- repoChan: make(chan []*RepositoryRevisions, 1),
- }
+ return &RepoPromise{}
+}
+
+func (rp *RepoPromise) init() {
+ rp.initOnce.Do(func() {
+ rp.done = make(chan struct{})
+ })
}
// Get returns the repository revisions. It blocks until the promise resolves or
@@ -103,22 +105,22 @@ func NewRepoPromise() *RepoPromise {
// results, IE err will stay nil even if the context expired between the first
// and the second call.
func (rp *RepoPromise) Get(ctx context.Context) ([]*RepositoryRevisions, error) {
- rp.resolveOnce.Do(func() {
- select {
- case <-ctx.Done():
- rp.err = ctx.Err()
+ rp.init()
+ select {
+ case <-ctx.Done():
+ return nil, ctx.Err()
- case rp.repos = <-rp.repoChan:
- }
- })
- return rp.repos, rp.err
+ case <-rp.done:
+ return rp.repos, nil
+ }
}
// Resolve returns a promise that is resolved with a given value.
func (rp *RepoPromise) Resolve(repos []*RepositoryRevisions) *RepoPromise {
- rp.closeOnce.Do(func() {
- rp.repoChan <- repos
- close(rp.repoChan)
+ rp.reposOnce.Do(func() {
+ rp.init()
+ rp.repos = repos
+ close(rp.done)
})
return rp
} To make it clearer, we could even factor the promise out with interface and then create a type safe wrapper. |
Can't help myself since this is fun code to write, but here is an interface{} based promise type Promise struct {
initOnce sync.Once
done chan struct{}
valueOnce sync.Once
value interface{}
}
func (p *Promise) init() {
p.initOnce.Do(func() { p.done = make(chan struct{}) })
}
func (p *Promise) Resolve(v interface{}) {
p.valueOnce.Do(func() {
p.init()
p.value = v
close(p.done)
})
}
func (p *Promise) Get(ctx context.Context) (interface{}, error) {
p.init()
select {
case <-ctx.Done():
return nil, ctx.Err()
case <-p.done:
return p.value, nil
}
} |
Notifying subscribers in CODENOTIFY files for diff d082943...a9d31f1.
|
7bbfc34
to
fe46b9a
Compare
// will stay nil even if the context expired between the first and the second | ||
// call. | ||
func (p *Promise) Get(ctx context.Context) (interface{}, error) { | ||
p.getOnce.Do(func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need getOnce
to make sure that we don't run the select block twice. In case the context is canceled AND p.done
is closed, the select block will randomly return either nil
or ctx.Err()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the getOnce. you can always select on p.done, don't need to only do it once. Thats the reason why in the promise impl I did I switched away from sending the value down the channel and instead just relied on done channel.
} | ||
|
||
// Resolve returns a promise that is resolved with a given value. | ||
func (p *Promise) Resolve(v interface{}) *Promise { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning *Promise
from Resolve
is not really necessary, but it lets us write code like this p:=&Promise{}.Resolve
which is nice.
update promise implementation
902264e
to
76e7d26
Compare
@keegancsmith I slightly modified your suggestion for an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets get this out. One inline comment about the promise impl. Nice stuff
// will stay nil even if the context expired between the first and the second | ||
// call. | ||
func (p *Promise) Get(ctx context.Context) (interface{}, error) { | ||
p.getOnce.Do(func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the getOnce. you can always select on p.done, don't need to only do it once. Thats the reason why in the promise impl I did I switched away from sending the value down the channel and instead just relied on done channel.
For global queries, we call zoekt before resolving and splitting repositories. The scope of this PR is limited to
file
andpath
results for literal search without version contexts.Why?
traces of global queries shows that a significant amount of time is spent on handling large lists of repository references. Specifically, we spent a lot of time on:
For global queries, we can save time, because zoekt can simply search all shards without having to match them against a list of indexed repositories.
We still have to filter the results coming from zoekt to make sure the user sees only repositories she has access to.