1x1 or 3x3 stem conv? #22
Update:
From my current experiments, my guess is that the 1x1 in your paper is a typo and should be 3x3. However, it appears as 1x1 in both Fig. 1 and Tab. 2, which makes a typo unlikely. So I'm looking forward to your clarification.
Hi Lucas. Don't be surprised that 1x1 and 3x3 conv are similar on GPU (from my past tests, 3x3 is usually 1.5-2 times slower). I don't think replacing the 3x3 conv with 1x1 should give NaNs. Make sure you initialize it properly. Tal
Thanks for your quick answer! Yeah, I agree the NaNs are suspicious and unexpected. However, using 1x1 with s2d makes the receptive field of the stem only 4x4, whereas the original stem has 11x11 after the max-pool, and this difference will amplify a lot throughout the full network. Using 3x3 with s2d makes the receptive field of the stem 12x12, so almost exactly the same as the original. So I agree with your code: 3x3 makes a lot more sense.
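For anyone who wants to check the receptive-field numbers, here is a minimal sketch using the usual recurrence (each layer adds `(kernel - 1) * jump`, and the jump is multiplied by the stride). Two assumptions on my side: the "original stem" is the standard ResNet one (7x7 conv with stride 2, then 3x3 max-pool with stride 2), and the 4x4 s2d is modeled as a 4x4 kernel with stride 4, since each output pixel sees exactly one 4x4 input patch. The function name `receptive_field` is just illustrative.

```python
def receptive_field(layers):
    """Receptive field (one spatial dim) of a stack of (kernel, stride) layers."""
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Assumed standard ResNet stem: 7x7/2 conv, then 3x3/2 max-pool.
original_stem = [(7, 2), (3, 2)]
# 4x4 s2d modeled as kernel 4, stride 4, followed by a stride-1 conv.
s2d_1x1_stem = [(4, 4), (1, 1)]
s2d_3x3_stem = [(4, 4), (3, 1)]

print(receptive_field(original_stem))  # 11
print(receptive_field(s2d_1x1_stem))   # 4
print(receptive_field(s2d_3x3_stem))   # 12
```

This reproduces the 11x11, 4x4 and 12x12 figures quoted above.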
Hi, just like you, I wanted to try s2d stem after reading the isometric nets paper :)
I noticed that in your paper, Figure 1, you show using 4x4 s2d followed by a 1x1 conv64. However, in your code here you clearly follow the 4x4 s2d by a 3x3 conv64. So, which one is used for the results in the paper?
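To make the two variants concrete, here is a shape-level sketch of what each stem does. It assumes a 3-channel 224x224 input (the input size is my assumption and is not stated in this thread) and ignores bias terms. The helper names `s2d_shape` and `conv_weights` are hypothetical and not taken from the repo.

```python
def s2d_shape(channels, height, width, block):
    """Output shape (C, H, W) of space-to-depth with the given block size."""
    assert height % block == 0 and width % block == 0
    return channels * block * block, height // block, width // block


def conv_weights(in_channels, out_channels, kernel):
    """Number of weights in a kernel x kernel conv (bias not counted)."""
    return in_channels * out_channels * kernel * kernel


# Assumed input: 3 x 224 x 224 image, 4x4 s2d.
c, h, w = s2d_shape(3, 224, 224, block=4)
print((c, h, w))                # (48, 56, 56)

# The two candidate convs after s2d, both with 64 output channels.
print(conv_weights(c, 64, 1))   # 3072  (1x1 conv64, as in Figure 1)
print(conv_weights(c, 64, 3))   # 27648 (3x3 conv64, as in the code)
```

So both variants see the same 48-channel, 56x56 tensor after s2d; they differ only in the kernel size of the conv that follows.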