
Request for stripped down / inference only pytorch wheels #12609

Open
zeryx opened this issue Oct 12, 2018 · 4 comments
Assignees
Labels
module: build (Build system issues), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

zeryx commented Oct 12, 2018

🚀 Feature

Create a precompiled PyTorch wheel file that is a trimmed-down, inference-only version.

Motivation

Right now PyTorch wheels are on average ~400 MB zipped and over 1 GB unzipped. That's not a big deal for training and prototyping, since the wheels are generally installed only once, but it is a real cost when productionizing with service providers like SageMaker, Algorithmia, etc.

Pitch

If we can create a trimmed-down, potentially inference-only wheel file, we can directly improve the load-time performance of these algorithms in serverless algorithm-delivery environments, which could directly improve PyTorch's ability to compete in the serverless HPC marketplace.

Alternatives

We could also provide a clear way for users to create their own wheels, by simplifying and documenting the build process so that optional features can be enabled or disabled at compile time.
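
As a rough illustration, a minimal sketch of what such a documented build might look like, using environment variables that PyTorch's `setup.py` reads (exact flag names can differ across PyTorch versions, so treat this as an assumption, not the official recipe):

```bash
# Build a CPU-only, inference-oriented wheel from source.
# Flag names are read by pytorch/setup.py; they may vary by
# version, so this is a sketch rather than a supported workflow.
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch

# USE_CUDA=0        drop all CUDA/cuDNN kernels
# USE_DISTRIBUTED=0 omit distributed-training support
# BUILD_TEST=0      skip building C++ test binaries
USE_CUDA=0 USE_CUDNN=0 USE_DISTRIBUTED=0 BUILD_TEST=0 \
    python setup.py bdist_wheel
```

The point of documenting flags like these would be to let each deployment target decide which features it can afford to drop, rather than maintaining one official stripped wheel per configuration.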

Additional context

Full disclosure, I'm an employee at Algorithmia and this change would make my life much easier 😄

cc @malfet @seemethere @walterddr

ezyang (Contributor) commented Oct 12, 2018

Do you need CPU only, or CUDA as well? A CPU wheel is substantially smaller than the CUDA one.

soumith (Member) commented Oct 12, 2018

As Edward said, the CPU-only wheel is ~10x smaller; would it be sufficient for your purposes? You can download it from the website by selecting "CUDA: None".
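
For reference, a hedged example of pulling the CPU-only build with pip. The authoritative command is whatever the selector on pytorch.org generates; the CPU package index below is shown as an illustration:

```bash
# Install the CPU-only build instead of the default CUDA wheel.
# The index URL here points at PyTorch's CPU wheel index; check
# the pytorch.org selector for the command matching your setup.
pip install torch --index-url https://download.pytorch.org/whl/cpu
```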

zeryx (Author) commented Oct 12, 2018

CPU-only definitely gets us somewhere, although we still find it slower to download and unpack than some other ML frameworks. However, we also want to ensure that our users can access a GPU-enabled wheel without sacrificing load-time performance anywhere near as severely as they do today.
At the moment, because of how long the PyTorch GPU wheel takes to zip and unzip on our infrastructure, inference algorithms (think autoencoders and other DNN models) that depend on the GPU wheel are nearly impossible to use.

ezyang (Contributor) commented Oct 13, 2018

I think it'd generally be a good idea for us to pave the wheel-building codepath. Users can very easily, without our intervention, get stripped-down builds by, for example, specifying only the architectures they care about (see the sketch below). It will be much harder to omit backward kernels, however.
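
As a concrete sketch of the architecture-limiting point: `TORCH_CUDA_ARCH_LIST` restricts which GPU architectures get compiled into the binary. The variable is read by PyTorch's build, but the specific value and flag combination below are just an example:

```bash
# Compile CUDA kernels only for Volta (sm_70 here) rather than
# every supported architecture; fewer fatbinary entries means a
# noticeably smaller wheel. The value "7.0" is only an example.
TORCH_CUDA_ARCH_LIST="7.0" USE_DISTRIBUTED=0 BUILD_TEST=0 \
    python setup.py bdist_wheel
```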

@soumith self-assigned this Nov 7, 2018
@jbschlosser added the module: build and triaged labels Apr 13, 2021