Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SnoopPrecompile.jl #140

Merged
merged 1 commit into from Oct 22, 2022
Merged

Use SnoopPrecompile.jl #140

merged 1 commit into from Oct 22, 2022

Conversation

lorenzoh
Copy link
Member

This adds a basic precompile statement using SnoopPrecompile.jl.

This reduces the Time-to-first-fit! by

Measurements:

  • using FluxTraining: 21s (this PR), 19s (master) -> 2s slower
  • fit!(testlearner(), 1): 14.5s (this PR), 30s (master) -> 15s faster
  • both: 35.5s (this PR), 49s (master) -> 13.5s/40% faster

This seems like a clear win for me, except for the longer precompilation time which will only occur once for regular package usage. Has anyone tried using SnoopPrecompile.jl for other packages in the FluxML org?

@github-actions
Copy link
Contributor

A documentation preview has been successfully built, view it here: Documentation preview PR-140

@ToucheSir
Copy link
Member

Has anyone tried using SnoopPrecompile.jl for other packages in the FluxML org

I use it for Zygote in FluxML/Zygote.jl#1281, but didn't get nearly the same speedup because it seems to be bottlenecked by LLVM time. This is quite the improvement!

@lorenzoh
Copy link
Member Author

So should I go ahead with this? I'm not sure how much of this actually comes from Flux.jl vs FluxTraining.jl. Maybe we should try this with Flux.jl as well.

@ToucheSir
Copy link
Member

We should, though that seems like a much bigger project given the size of Flux's API. If you're able to @snoopi_deep that fit!(testlearner(), 1) call, we could look at the flamegraph and see how much is Flux vs FluxTraining (vs Zygote).

@lorenzoh
Copy link
Member Author

image
The 3 big chunks are all Zygote.jl, so I am estimating around 2/3 of the inference time is Zygote.jl

@ToucheSir
Copy link
Member

Good to know. I just rebased the Zygote PR, are you able to test again with it?

@lorenzoh
Copy link
Member Author

Yup, now testing with FluxTraining#master and Zygote#bc/precompile:

  • using FluxTraining: 18.2s
  • fit!(testlearner(), 1): 18s
  • both: 36.2s

Safe to say that Zygote.jl is the culprit here :P . I think this implies the downstream improvements from #bc/precompile are more significant than for Zygote.jl itself. If that's the case, we should definitely merge that one.

@lorenzoh
Copy link
Member Author

lorenzoh commented Oct 22, 2022

Finally, the one with precompilation in both Zygote and FluxTraining is even better:

  • using FluxTraining: 22s
  • fit!(testlearner(), 1): 10.8s
  • total: 32.8s

So I will be merging this PR as well if it looks good to you Brian.

@lorenzoh lorenzoh merged commit 628c720 into master Oct 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants