Heuristic for picking the chunk/batch size? #1542

Open
gdalle opened this issue Jun 17, 2024 · 10 comments · May be fixed by #1545

Comments

@gdalle
Contributor

gdalle commented Jun 17, 2024

ForwardDiff has a heuristic for picking chunk size, with a default threshold of 12 dictated by memory bandwidth:

https://github.com/JuliaDiff/ForwardDiff.jl/blob/ff56092ed2960717ce45f53a90584898c232e74b/src/prelude.jl#L24-L34

https://github.com/JuliaDiff/ForwardDiff.jl/blob/ff56092ed2960717ce45f53a90584898c232e74b/src/prelude.jl#L8

Does Enzyme have something similar I could use? I seem to remember a graph showing performance as a function of chunk size, with a maximum around 8-12 as well, but it disappeared in the Slackhole.
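For readers who do not want to follow the permalinks, a heuristic of this kind can be sketched as below. This is a Python transliteration written for illustration, not ForwardDiff's actual source: the names `pick_chunk_size` and `DEFAULT_CHUNK_THRESHOLD` are assumptions, and only the threshold of 12 comes from the comment above. The idea is to use the whole input as a single chunk when it is small, and otherwise to split it into the fewest chunks that each stay at or below the threshold, with balanced sizes.

```python
# Illustrative sketch only; names are hypothetical, not a real library API.
DEFAULT_CHUNK_THRESHOLD = 12  # the default threshold quoted in the comment above


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def pick_chunk_size(input_length: int, threshold: int = DEFAULT_CHUNK_THRESHOLD) -> int:
    """Pick a chunk size no larger than `threshold` for an input of the given length."""
    if input_length <= threshold:
        # Small inputs fit in a single chunk.
        return input_length
    # Fewest chunks such that each one holds at most `threshold` entries ...
    n_chunks = ceil_div(input_length, threshold)
    # ... then spread the entries evenly over that many chunks.
    return ceil_div(input_length, n_chunks)
```

With the default threshold, an input of length 13 gets a chunk size of 7 (two chunks of similar size) rather than one chunk of 12 and a leftover chunk of 1.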

@wsmoses
Member

wsmoses commented Jun 17, 2024

Not presently, but contributions welcome!

@gdalle
Contributor Author

gdalle commented Jun 18, 2024

Do you know which graph I'm talking about? Is it in some publication online?

@vchuravy
Member

I don't think it's in a publication, but @tgymnich shows some graphs about 10 minutes into his talk at EnzymeCon: https://youtu.be/nPN_Z5j6JDM?feature=shared

@tgymnich
Member

[image attachment, not preserved in this text]

@vchuravy
Member

@tgymnich do you remember what machine you used for these measurements?

@wsmoses
Member

wsmoses commented Jun 18, 2024

This will also now depend a lot more on the program in Julia.

For example, batching something with a linear solve will almost always be faster, since we now do one linear solve that is reused for all chunks.

@tgymnich
Member

@vchuravy this must have been a bare metal AWS machine provided by @wsmoses. I believe it was with AVX512.

@gdalle
Contributor Author

gdalle commented Jun 18, 2024

I'm asking because I'm including vector mode in DI, so it would be nice to have a function in Enzyme I can call to pick a decent chunk size if the user doesn't provide it. Even if the function is dumb at the moment, I feel like that's definitely something I don't want to decide myself

@vchuravy
Member

8/16 should be a safe bet.

@wsmoses
Member

wsmoses commented Jun 18, 2024

> I'm asking because I'm including vector mode in DI, so it would be nice to have a function in Enzyme I can call to pick a decent chunk size if the user doesn't provide it. Even if the function is dumb at the moment, I feel like that's definitely something I don't want to decide myself

Sure, open a PR to Enzyme to add a function which returns 16 for now, and we can add more complex analysis later.
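To make the proposal concrete, here is a minimal sketch of how a caller might use such a placeholder when the user gives no batch size. It is written in Python purely for illustration; `pick_batch_size` and `batches` are hypothetical names, not Enzyme or DI functions, and the only value taken from the thread is the constant 16.

```python
# Illustrative sketch only; both function names are hypothetical.
from typing import List, Optional


def pick_batch_size(input_length: int) -> int:
    """Placeholder heuristic: ignores the input and returns a fixed size for now."""
    return 16


def batches(input_length: int, batch_size: Optional[int] = None) -> List[range]:
    """Split the indices 0..input_length-1 into consecutive batches.

    Falls back to the placeholder heuristic when no batch size is given.
    """
    if batch_size is None:
        batch_size = pick_batch_size(input_length)
    return [
        range(start, min(start + batch_size, input_length))
        for start in range(0, input_length, batch_size)
    ]
```

Keeping the choice behind a single function means callers would not need to change if the constant is later replaced by a more careful analysis.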

@gdalle gdalle linked a pull request Jun 18, 2024 that will close this issue