
Make initialization of GoogleNet / Inception faster #2166

Closed · fmassa opened this issue Apr 30, 2020 · 3 comments · Fixed by #2170

Comments

fmassa (Member) commented Apr 30, 2020

There have been several reports from users that GoogleNet and Inception are very slow to construct; see #1797, #1977, and #2145 for example.

The underlying issue is that these models use scipy.truncnorm, whose implementation was recently updated and became 100x slower than it was before; see scipy/scipy#11299 for reference. This slowdown has been fixed in scipy and will be present in the 1.5.0 release, but in the meantime, users of torchvision still face very long startup times.

I think the simplest alternative is to make init_weights default to False, and use a weight initialization from PyTorch instead. This is BC-breaking for users who want to train Inception from scratch, but I'm not sure how much it will affect users in general.
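For illustration, one possible scipy-free replacement for the initialization loop, assuming a PyTorch version that ships nn.init.trunc_normal_ (the function name fast_init and the fallback stddev of 0.1 are just for the sketch; older PyTorch releases would need a different initializer):

import torch.nn as nn

def fast_init(m):
    # Sketch only: mimics truncnorm(-2, 2, scale=stddev) with pure PyTorch.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        stddev = getattr(m, 'stddev', 0.1)
        # nn.init.trunc_normal_ takes absolute cut-offs, hence +/- 2 * stddev
        nn.init.trunc_normal_(m.weight, mean=0.0, std=stddev,
                              a=-2 * stddev, b=2 * stddev)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

Making init_weights default to False would then simply skip the slow scipy path and leave the layers with PyTorch's standard default initialization.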

bisakhmondal (Contributor) commented

Thanks for opening a separate issue for this. As you said, I was looking into the scipy issue and trying to understand the behaviour of

if init_weights:
    for m in self.modules():
        if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
            import scipy.stats as stats
            stddev = m.stddev if hasattr(m, 'stddev') else 0.1
            X = stats.truncnorm(-2, 2, scale=stddev)
            values = torch.as_tensor(X.rvs(m.weight.numel()), dtype=m.weight.dtype)
            values = values.view(m.weight.size())
            with torch.no_grad():
                m.weight.copy_(values)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
I think weight initialization with the PyTorch nn.init module can significantly improve the situation. I have tested with

modules = [
    nn.Conv2d(3, 512, 2),
    nn.BatchNorm2d(512),
    nn.Linear(512, 3),
] * 17  # the three layers repeated 17 times

For weight initialization, the current implementation takes around 1.21 sec, whereas the nn.init.uniform_ API takes 0.0048 sec. I think it is a far better option until the next scipy release.

import torch.nn as nn

model = nn.Sequential(*modules)

def weight_init(m):
    # Fast, scipy-free initialization using the nn.init API
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
        nn.init.uniform_(m.weight, -2, 2)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

model.apply(weight_init)
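For reference, a rough sketch of how such a comparison could be timed, reusing model and weight_init from the snippet above (truncnorm_init here is a stand-in for the scipy-based loop quoted earlier, and the exact numbers will vary by machine):

import time
import scipy.stats as stats
import torch

def truncnorm_init(m):
    # Stand-in for the current scipy-based initialization of Conv2d/Linear weights
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        X = stats.truncnorm(-2, 2, scale=0.1)
        values = torch.as_tensor(X.rvs(m.weight.numel()), dtype=m.weight.dtype)
        with torch.no_grad():
            m.weight.copy_(values.view(m.weight.size()))

start = time.time()
model.apply(truncnorm_init)
print('scipy truncnorm:', time.time() - start)

start = time.time()
model.apply(weight_init)
print('nn.init.uniform_:', time.time() - start)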

Would you mind if I worked on it?

fmassa (Member, Author) commented Apr 30, 2020

Hi,

Sure, it would be great if you could work on it.

I think that we should still keep the old behavior if the user wants, and raise a warning to make users aware of it.

One option would be to change the default value of init_weights to be None, and raise a warning if it's None (forcing the users to be aware of it until they explicitly set the value to either True or False).
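A minimal sketch of what that could look like (the warning text, the FutureWarning category, and the stripped-down class below are placeholders for illustration, not the final change):

import warnings
import torch.nn as nn

class GoogLeNet(nn.Module):
    # Heavily stripped-down constructor, for illustration only.
    def __init__(self, num_classes=1000, init_weights=None):
        super().__init__()
        if init_weights is None:
            warnings.warn(
                "The default weight initialization of GoogleNet will be changed "
                "in future releases. If you wish to keep the old (slow) behavior, "
                "please set init_weights=True explicitly.",
                FutureWarning,
            )
            init_weights = True
        self.fc = nn.Linear(1024, num_classes)  # placeholder for the real layers
        if init_weights:
            self._initialize_weights()  # the existing scipy-based routine

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.normal_(m.weight, 0, 0.01)  # stand-in for the truncnorm path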

bisakhmondal (Contributor) commented

Hi,

One option would be to change the default value of init_weights to be None, and raise a warning if it's None (forcing the users to be aware of it until they explicitly set the value to either True or False).

Sure. That would be better.
Thanks.
