Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make FindEnvs faster for single and multiple paths #918

Merged
merged 4 commits into from
Aug 28, 2023

Conversation

julienduchesne
Copy link
Member

@julienduchesne julienduchesne commented Aug 25, 2023

  • Something curious caught my eye this week. On a full export of our entire repo (recursive on a single path), we got a runtime of ~3m. However, when passing most of our environments in command line args, we would get a runtime of ~5m.
  • The runtime of passing the root path and passing all env paths in command line should be mostly identical (even faster when passing paths explicitely since it doesn't have to search as much)
  • The main issue here is that the FindEnvs flow for a single path was multithreaded while for multiple paths, it wasn't

I've made all the code paths run the same underlying function and now everything is multi-threaded. To review, go commit by commit. I benchmarked in the first commit then refactored

Here are the benchmark results:

julienduchesne@Juliens-MacBook-Pro tanka % benchstat bench-old.txt bench-new.txt
name                      old time/op    new time/op    delta
FindEnvsFromSinglePath-8     1.74s ± 7%     1.49s ± 3%  -14.56%  (p=0.002 n=6+6)
FindEnvsFromPaths-8          3.19s ±14%     1.50s ± 3%  -53.07%  (p=0.002 n=6+6)

name                      old alloc/op   new alloc/op   delta
FindEnvsFromSinglePath-8     287MB ± 0%     284MB ± 0%   -1.13%  (p=0.002 n=6+6)
FindEnvsFromPaths-8          279MB ± 0%     284MB ± 0%   +1.73%  (p=0.004 n=5+6)

name                      old allocs/op  new allocs/op  delta
FindEnvsFromSinglePath-8     3.99M ± 0%     3.92M ± 0%   -1.63%  (p=0.002 n=6+6)
FindEnvsFromPaths-8          3.93M ± 0%     3.93M ± 0%     ~     (p=0.247 n=5+6)

@julienduchesne julienduchesne marked this pull request as ready for review August 25, 2023 02:27
@github-actions
Copy link

github-actions bot commented Aug 25, 2023

PR Preview Action v1.4.4
Preview removed because the pull request was closed.
2023-08-28 12:15 UTC

@julienduchesne julienduchesne force-pushed the julienduchesne/optimize-find-envs branch from 46a520b to 2f1eb09 Compare August 25, 2023 02:41
@julienduchesne julienduchesne requested a review from a team August 25, 2023 02:41
cmd/tk/export.go Show resolved Hide resolved
pkg/tanka/find.go Outdated Show resolved Hide resolved
pkg/tanka/find.go Outdated Show resolved Hide resolved
pkg/tanka/find.go Show resolved Hide resolved
Comment on lines 87 to 106
if res.err != nil {
return nil, res.err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me think you could use a sync.errgroup here. Those have direct error support. As a bonus we could arrange to pass a context in and then get nicer cancellation behaviour.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually converted it to return a list of errors as well

pkg/tanka/find.go Show resolved Hide resolved
pkg/tanka/find.go Show resolved Hide resolved
@julienduchesne julienduchesne force-pushed the julienduchesne/optimize-find-envs branch 2 times, most recently from f9fe489 to 9f31739 Compare August 25, 2023 12:29
@julienduchesne julienduchesne requested a review from a team August 25, 2023 12:31
Copy link
Member

@iainlane iainlane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, thanks for the changes. I'll give the stamp

I did leave a couple more suggestions to consider, I think they'll make the code easier to follow, but if you want you can leave them.

!errors.As(err, &jpath.ErrorFileNotFound{}) {
return nil, findErr(path, err)
// find all jsonnet files within given paths
func findJsonnetFilesFromPaths(paths []string, opts FindOpts) ([]string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function and the one after it share the same structure, says to me there's some opportunities to refactor

pkg/tanka/find.go Outdated Show resolved Hide resolved
julienduchesne and others added 4 commits August 28, 2023 07:55
Write benchmarks for it. We'll try to improve those benchmarks!
- Something curious caught my eye this week. On a full export of our entire repo, we got a runtime of ~3m. However, when passing most of our environments in command line args, we would get a runtime of ~5m.
- The runtime of passing the root path and passing all env paths in command line should be mostly identical (even faster when passing paths explicitely since it doesn't have to search as much)
- The main issue here is that the `FindEnvs` flow for a single path was multithreaded while for multiple paths, it wasn't

I've made both functions run the same underlying function and now everything is multi-threaded

Here are the benchmark results:

```
julienduchesne@Juliens-MacBook-Pro tanka % benchstat bench-old.txt bench-new.txt
name                      old time/op    new time/op    delta
FindEnvsFromSinglePath-8     1.74s ± 7%     1.49s ± 3%  -14.56%  (p=0.002 n=6+6)
FindEnvsFromPaths-8          3.19s ±14%     1.50s ± 3%  -53.07%  (p=0.002 n=6+6)

name                      old alloc/op   new alloc/op   delta
FindEnvsFromSinglePath-8     287MB ± 0%     284MB ± 0%   -1.13%  (p=0.002 n=6+6)
FindEnvsFromPaths-8          279MB ± 0%     284MB ± 0%   +1.73%  (p=0.004 n=5+6)

name                      old allocs/op  new allocs/op  delta
FindEnvsFromSinglePath-8     3.99M ± 0%     3.92M ± 0%   -1.63%  (p=0.002 n=6+6)
FindEnvsFromPaths-8          3.93M ± 0%     3.93M ± 0%     ~     (p=0.247 n=5+6)
```
- fix parallelism option
- factor out the two functions (finding jsonnet files, finding envs)
- remove duplicate checking
- close channels earlier
Co-authored-by: Iain Lane <iain.lane@grafana.com>
@julienduchesne julienduchesne force-pushed the julienduchesne/optimize-find-envs branch from a46cd89 to acf4dc3 Compare August 28, 2023 11:55
@julienduchesne julienduchesne merged commit 3f93e71 into main Aug 28, 2023
3 checks passed
@julienduchesne julienduchesne deleted the julienduchesne/optimize-find-envs branch August 28, 2023 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants