Use SnoopPrecompile.jl #140

lorenzoh · 2022-10-22T16:47:52Z

This adds a basic precompile statement using SnoopPrecompile.jl.

This considerably reduces the time-to-first-fit!.

Measurements:

using FluxTraining: 21s (this PR), 19s (master) -> 2s slower
fit!(testlearner(), 1): 14.5s (this PR), 30s (master) -> 15.5s faster
both: 35.5s (this PR), 49s (master) -> 13.5s faster (about 28% less time, roughly a 1.4x speedup)

This seems like a clear win to me. The only downside is the longer precompilation time, which is incurred only once in regular package usage. Has anyone tried using SnoopPrecompile.jl for other packages in the FluxML org?
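The precompile statement itself is not reproduced in this thread. The following is a rough sketch of what a SnoopPrecompile.jl (v1.0) workload of this kind typically looks like, not the PR's actual code. It assumes that testlearner() and fit! are already defined inside the package module at the point where the block appears, and that SnoopPrecompile is listed in the package's [deps].

```julia
# Sketch only, not the PR's code: this would sit at the end of the package module
# (e.g. src/FluxTraining.jl), after every function it calls has been defined.
# Assumes testlearner() is available inside the module, as used in this thread.
using SnoopPrecompile

@precompile_setup begin
    # Code in @precompile_setup runs only while the package is being precompiled.
    # Anything outside @precompile_all_calls (e.g. building inputs) is not itself cached.
    @precompile_all_calls begin
        # Calls made during this one-epoch run are inferred and the results are saved
        # in the package's precompile cache, including calls into dependencies such as
        # Flux and Zygote. On Julia 1.8, native code is still generated at first use.
        fit!(testlearner(), 1)
    end
end
```

Running a single short epoch keeps the extra precompilation time bounded while still touching the training-loop code paths that dominate the first fit! call.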

github-actions · 2022-10-22T16:56:58Z

A documentation preview has been successfully built; view it here: Documentation preview PR-140

ToucheSir · 2022-10-22T17:02:49Z

> Has anyone tried using SnoopPrecompile.jl for other packages in the FluxML org?

I use it for Zygote in FluxML/Zygote.jl#1281, but didn't get nearly the same speedup, because the remaining latency there seems to be bottlenecked by LLVM (code generation) time. This is quite the improvement!

lorenzoh · 2022-10-22T17:11:36Z

So should I go ahead with this? I'm not sure how much of the speedup actually comes from Flux.jl vs. FluxTraining.jl. Maybe we should try this with Flux.jl as well.

ToucheSir · 2022-10-22T17:20:23Z

We should, though that seems like a much bigger project given the size of Flux's API. If you're able to @snoopi_deep that fit!(testlearner(), 1) call, we could look at the flamegraph and see how much is Flux vs FluxTraining (vs Zygote).
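A minimal sketch of that profiling step, assuming SnoopCompile.jl v2 (which provides @snoopi_deep and flamegraph) and ProfileView.jl for display. It assumes testlearner is defined in the FluxTraining module, as used in this thread. Run it in a fresh session so that nothing has been inferred yet.

```julia
# Run in a fresh Julia session: anything already inferred will not show up in the profile.
using SnoopCompileCore          # lightweight; load it before the code being measured
using FluxTraining

# Record type-inference timings for the first training run.
# Assumes testlearner is defined inside the FluxTraining module, as used in this thread.
tinf = @snoopi_deep fit!(FluxTraining.testlearner(), 1)

using SnoopCompile, ProfileView # heavier analysis/display packages, loaded after measuring
fg = flamegraph(tinf)           # bar width corresponds to inference time per method instance
ProfileView.view(fg)            # wide bars show whether Flux, FluxTraining or Zygote dominates
```

Loading SnoopCompileCore first and the heavier SnoopCompile/ProfileView only afterwards keeps their own compilation out of the measured inference time.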

lorenzoh · 2022-10-22T18:28:37Z

The three biggest chunks in the flamegraph are all Zygote.jl, so I estimate that around 2/3 of the inference time is spent in Zygote.jl.

ToucheSir · 2022-10-22T18:33:52Z

Good to know. I just rebased the Zygote PR, are you able to test again with it?

lorenzoh · 2022-10-22T18:42:28Z

Yup, now testing with FluxTraining#master and Zygote#bc/precompile:

using FluxTraining: 18.2s
fit!(testlearner(), 1): 18s
both: 36.2s

Safe to say that Zygote.jl is the culprit here :P. I think this implies that the downstream improvements from #bc/precompile are more significant than those for Zygote.jl itself. If that's the case, we should definitely merge that one.

lorenzoh · 2022-10-22T18:55:16Z

Finally, the run with precompilation in both Zygote and FluxTraining is even better:

using FluxTraining: 22s
fit!(testlearner(), 1): 10.8s
both: 32.8s

So I will merge this PR as well, if it looks good to you, Brian.
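To compare the four configurations measured in this thread, here is a small Julia sketch that recomputes each total and the saving relative to master. The numbers are copied from the comments above; nothing new is measured, and the configuration labels are just descriptive names.

```julia
# Reported timings in seconds:
# (configuration, time for using FluxTraining, time for the first fit!(testlearner(), 1)).
timings = [
    ("master",                          19.0, 30.0),
    ("FluxTraining PR",                 21.0, 14.5),
    ("Zygote branch only",              18.2, 18.0),
    ("FluxTraining PR + Zygote branch", 22.0, 10.8),
]

# Total time to first fit! per configuration.
totals = Dict(name => load + first_fit for (name, load, first_fit) in timings)
baseline = totals["master"]

for (name, _, _) in timings
    saved = baseline - totals[name]
    pct = 100 * saved / baseline   # reduction relative to master's total
    println(rpad(name, 33), "total ", round(totals[name]; digits=1), " s, saved ",
            round(saved; digits=1), " s (", round(pct; digits=1), "% less than master)")
end
```

On these numbers the two changes overlap: each saves roughly 13 s on its own, while together they save about 16 s.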

Add precompile statement (commit 69f25d2)

ToucheSir mentioned this pull request on Oct 22, 2022 in FluxML/Zygote.jl#1281, "Add precompilation via SnoopPrecompile" (merged)

ToucheSir approved these changes Oct 22, 2022


lorenzoh merged commit 628c720 into master Oct 22, 2022
