searcher: make indexing of repos concurrent #272
Conversation
It still needs some work with respect to the logging for the different searchers. Right now, it's impossible to make any sense of the logs, which come out of order because of the concurrency.
The rest of the implementation is ready for review. cc @kellegous
You can simplify the goroutine logic by having a single channel that returns a single type carrying either a searcher or an error. A quick demo is here: https://play.golang.org/p/7ucBkx6MH_- . This way you know exactly how many results are coming back from the channel: one per spawned goroutine. This layout also removes the waitgroup and the extra collector goroutine, and makes it clear there are no data races on the maps; by updating them only in the main goroutine, that is clear by inspection. For the issue with interleaved log lines, you'd need to thread the repository name into the logging code so each log line is prefixed appropriately. This could be done by not using the standard global logger but creating separate loggers.
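The single-channel pattern described above can be sketched as follows. All names (`searcherResult`, `index`, `makeAll`) are illustrative stand-ins, not hound's actual code; the point is one result per spawned goroutine, collected in the main goroutine so the maps need no locking:

```go
package main

import (
	"fmt"
	"sort"
)

// searcherResult is a stand-in for the single result type: it carries
// either a value or an error, one per spawned goroutine.
type searcherResult struct {
	name string
	idx  string
	err  error
}

// index pretends to build an index for one repository.
func index(name string) (string, error) {
	if name == "broken" {
		return "", fmt.Errorf("cannot index %q", name)
	}
	return "idx:" + name, nil
}

// makeAll launches one goroutine per repo, then receives exactly
// len(repos) results in the calling goroutine, so the two maps are
// only ever touched here: no WaitGroup, no collector goroutine.
func makeAll(repos []string) (map[string]string, map[string]error) {
	resultCh := make(chan searcherResult, len(repos)) // one slot per goroutine
	for _, name := range repos {
		go func(name string) {
			idx, err := index(name)
			resultCh <- searcherResult{name: name, idx: idx, err: err}
		}(name)
	}

	searchers := map[string]string{}
	errs := map[string]error{}
	for i := 0; i < len(repos); i++ {
		r := <-resultCh
		if r.err != nil {
			errs[r.name] = r.err
		} else {
			searchers[r.name] = r.idx
		}
	}
	return searchers, errs
}

func main() {
	searchers, errs := makeAll([]string{"hound", "broken", "grok"})
	ok := make([]string, 0, len(searchers))
	for n := range searchers {
		ok = append(ok, n)
	}
	sort.Strings(ok)
	fmt.Println(ok, len(errs)) // [grok hound] 1
}
```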
I agree with @dgryski. The structure he proposes is almost exactly what I had in mind. Specifically, I like to think of this …
Thanks for dropping in. Great suggestions, @dgryski! I'll make changes accordingly.
This patch attempts to vastly improve the startup time of hound in cases where we have a huge number of repositories, where hound would otherwise take a long time to start because of the sequential nature of its indexing. Startup indexing is now concurrent while still respecting the config.MaxConcurrentIndexers parameter set by users in config.json. Fixes hound-search#250
Change the concurrent searcher-init routine to simplify the implementation: use a single common channel to return both successfully created searchers and errors. This also gets rid of the goroutine that collected results from the channel; the main routine now collects the results itself by blocking and receiving on the channel.
`for range cfg.Repos` is a syntax error in Go 1.3, so iterate using `len(cfg.Repos)` instead.
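For context, the variable-free range clause was only accepted starting with Go 1.4, so the commit falls back to an explicit index loop. A minimal illustration (with a made-up `repos` slice):

```go
package main

import "fmt"

func main() {
	repos := []string{"r1", "r2", "r3"}

	// Go 1.4 and later accept a range clause with no loop variables:
	//
	//	for range repos { ... }
	//
	// Go 1.3 rejects that as a syntax error, so the commit counts
	// iterations with an explicit index instead:
	results := 0
	for i := 0; i < len(repos); i++ {
		results++
	}
	fmt.Println(results) // 3
}
```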
Force-pushed from cb1ccc0 to e74b892.
searcher/searcher.go (Outdated)

```diff
@@ -287,7 +287,7 @@ func MakeAll(cfg *config.Config) (map[string]*Searcher, map[string]error, error)
 	lim := makeLimiter(cfg.MaxConcurrentIndexers)

 	// Channel to receive the results from newSearcherConcurrent function.
-	resultCh := make(chan searcherResult, 1)
+	resultCh := make(chan searcherResult)
```
@dgryski IMO, making this channel unbuffered won't solve the problem of blocking sends from the goroutines. How is an unbuffered channel better than a buffered channel of length 1? A buffered channel of length `len(cfg.Repos)`, however, would make sense.
Yes, I meant more that having a buffer size of 1 didn't make sense. Either make it totally unbuffered (so they all block), or with enough space for them all to put the results.
Ok, since we aren't doing any heavy work after receiving on the channel, I feel it's fine to let the goroutines block on sending to the channel. What do you think? Do you have a preference?
Standard practice for launching a set of goroutines with a response channel is to have it buffered with the number of known entries so that none of them block.
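That practice can be illustrated in a few lines. The job values and the squaring worker below are made up; what matters is that the buffer has one slot per spawned goroutine, so no sender ever blocks:

```go
package main

import "fmt"

func main() {
	jobs := []int{1, 2, 3, 4}

	// One buffer slot per goroutine: every send completes immediately,
	// so no worker is left blocked even if the receiver is slow or a
	// result is never read.
	ch := make(chan int, len(jobs))
	for _, j := range jobs {
		go func(j int) { ch <- j * j }(j)
	}

	sum := 0
	for i := 0; i < len(jobs); i++ {
		sum += <-ch
	}
	fmt.Println(sum) // 30
}
```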
Ok, I'll use the standard practice then. Thanks 🙂
Thanks for seeing this through.