1x1 or 3x3 stem conv? #22

Closed
lucasb-eyer opened this issue Jul 26, 2020 · 3 comments
Comments

@lucasb-eyer

Hi, just like you, I wanted to try s2d stem after reading the isometric nets paper :)

I noticed that in your paper, Figure 1, you show using 4x4 s2d followed by a 1x1 conv64. However, in your code here you clearly follow the 4x4 s2d by a 3x3 conv64. So, which one is used for the results in the paper?
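For readers unfamiliar with the s2d (space-to-depth) operation discussed here, the following is a minimal pure-Python sketch of what a 4x4 s2d does to tensor shapes. It is not taken from the repository: the function name `space_to_depth`, the nested-list layout (height, width, channels), and the channel ordering inside each block are all assumptions made for illustration.

```python
def space_to_depth(image, block):
    """Rearrange an H x W x C nested list into (H/block) x (W/block) x (block*block*C).

    Hypothetical helper: the ordering of values inside the packed channel
    axis is an assumption, real implementations may order them differently.
    """
    height, width = len(image), len(image[0])
    assert height % block == 0 and width % block == 0
    return [
        [
            [
                value
                for di in range(block)
                for dj in range(block)
                for value in image[i * block + di][j * block + dj]
            ]
            for j in range(width // block)
        ]
        for i in range(height // block)
    ]


# Toy 8x8 "image" with 3 channels; a 4x4 s2d turns it into 2x2 with 48 channels.
image = [[[r, c, r + c] for c in range(8)] for r in range(8)]
packed = space_to_depth(image, 4)
print(len(packed), len(packed[0]), len(packed[0][0]))
```

Each output position holds all pixels of one 4x4 block, so a 1x1 conv after it only ever sees that single block, while a 3x3 conv also sees the eight neighbouring blocks.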

@lucasb-eyer
Author

Update:

  1. I did try both in my setting, and 1x1 conv gives NaN loss very early, while 3x3 actually works.
  2. Speed-wise, it seems both 1x1 and 3x3 are similar (I'm surprised by this), and both faster than the original stem.
  3. In the isometric nets paper, Table 2 also suggests they're using 1x1, while the text at the bottom-left of page 4 suggests 3x3. Hard to tell without code, so I'll reach out to the authors.

From my current experiment, I am guessing that the 1x1 in your paper is a typo and should be 3x3. However, it appears as 1x1 in both Fig. 1 and Tab. 2, which makes a typo unlikely. So I'm looking forward to your clarification.

@mrT23
Collaborator

mrT23 commented Jul 27, 2020

Hi Lucas.
Thanks for the comment.
You are correct, there is a mismatch between the code and the paper.
The code is correct; it should be 3x3.

Don't be surprised that 1x1 and 3x3 convs are similar in speed on GPU (in my past tests, 3x3 was usually 1.5-2 times slower).
That's because GPUs are limited by memory access, not FLOPs. Due to caching, a 3x3 conv has a similar memory cost to a 1x1 conv.
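As a rough back-of-the-envelope illustration of this point (not a benchmark), the sketch below counts multiply-accumulates and tensor elements touched for the two stem variants. It assumes a 224x224 RGB input, a 4x4 s2d giving a 56x56x48 tensor, 'same' padding, and ideal caching where every input, output, and weight element moves through memory exactly once. The helper `conv_costs` is hypothetical.

```python
def conv_costs(h, w, c_in, c_out, k):
    """Return (multiply-accumulates, elements moved) for a k x k conv.

    Assumes 'same' padding (output is h x w) and ideal caching, i.e. each
    input, output, and weight element is read or written exactly once.
    """
    macs = h * w * c_in * c_out * k * k
    elements = h * w * c_in + h * w * c_out + c_in * c_out * k * k
    return macs, elements


# Assumed setting: 224x224 RGB image after a 4x4 space-to-depth.
H = W = 224 // 4          # 56
C_IN = 3 * 4 * 4          # 48 channels after s2d
C_OUT = 64                # conv64

macs_1x1, mem_1x1 = conv_costs(H, W, C_IN, C_OUT, 1)
macs_3x3, mem_3x3 = conv_costs(H, W, C_IN, C_OUT, 3)

print("MAC ratio 3x3 / 1x1:", macs_3x3 / macs_1x1)
print("memory ratio 3x3 / 1x1:", mem_3x3 / mem_1x1)
```

Under these assumptions the 3x3 conv does nine times the arithmetic but moves only slightly more data, since the activations dominate the traffic and only the small weight tensor grows.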

I don't think replacing the 3x3 conv with 1x1 should give NaNs. Make sure you initialize it properly.

Tal

@lucasb-eyer
Author

Thanks for your quick answer! Yeah, I agree the NaNs are suspicious and unexpected.

However, using 1x1 with s2d makes the receptive-field size of the stem only 4x4, whereas the original stem has 11x11 after the max-pool, and this difference will amplify a lot throughout the full network. Using 3x3 with s2d makes the receptive field of the stem 12x12, so almost exactly the same as the original. So, I agree with your code: 3x3 makes a lot more sense.
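The receptive-field numbers above can be checked with the standard recurrence (each layer adds `(kernel - 1) * jump`, and the jump is multiplied by the stride). This is a small sketch, assuming the original stem is the usual ResNet one (7x7 conv with stride 2, then 3x3 max-pool with stride 2) and modelling a 4x4 s2d as a kernel-4, stride-4 layer. The function name `receptive_field` is hypothetical.

```python
def receptive_field(layers):
    """Receptive-field side length for a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # distance in input pixels between adjacent outputs
    return rf


# Assumed original stem: 7x7 conv stride 2, then 3x3 max-pool stride 2.
original = [(7, 2), (3, 2)]
# 4x4 s2d modelled as kernel 4, stride 4, followed by a stride-1 conv.
s2d_1x1 = [(4, 4), (1, 1)]
s2d_3x3 = [(4, 4), (3, 1)]

for name, stem in [("original", original), ("s2d + 1x1", s2d_1x1), ("s2d + 3x3", s2d_3x3)]:
    print(name, receptive_field(stem))
```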

@mrT23 mrT23 closed this as completed Aug 3, 2020