
Improve TransposedConvolution layer. #1493

Merged

akhandait merged 6 commits into mlpack:master from the transposed_conv_fix branch on Oct 12, 2019

Conversation

akhandait (Member, PR author):

No description provided.


TransposedConvolution<> module2(1, 1, 4, 4, 1, 1, 2, 2, 5, 5);
TransposedConvolution<> module2(1, 1, 3, 3, 1, 1, 1, 1, 6, 6, 6, 6);
@ShikharJ (Member), Aug 14, 2018:

Please keep the original parameters here. These tests follow from the same Convolutional Arithmetic report, figure 4.1 and onwards.

akhandait (Member, PR author):

@ShikharJ As far as I could figure out, tf.nn.conv2d_transpose doesn't allow the size to go down (it only allows VALID or SAME padding on the output size). In the above example, using the current formula for the output size, it should be 4. Can you tell me what parameters (in the tf.nn.conv2d_transpose function), along with the weights below, you used to arrive at the value 2100?

Member:

@akhandait I don't think the size should decrease; take a look at Figure 4.2, aren't you able to obtain an output size of 8? I don't remember the parameters off the top of my head, though I can derive them; just give me some time, I need to reinstall TensorFlow. I have sort of moved on to PyTorch these days.

@akhandait (Member, PR author), Aug 29, 2018:

Sorry for the super late response. I understand the approach that was used. The output size calculated this way includes the padding added to the input of the equivalent convolution operation. However, most deep learning frameworks don't do that (I tried TensorFlow and PyTorch; Theano's approach is exactly the same as our new implementation).
So, we will have to change the parameters (but not the cases according to the paper).
For example, in this case the output size should actually be 6 (as shown in the figure, 1 + 1 is the padding and 8 is not the size of the original input; I tested it in PyTorch). In PyTorch and our new implementation, this case should have parameters like this:

kernel size = 4
stride = 1
padding = 1 (what all deep learning frameworks do is take the padding of the equivalent convolution operation, so here a padding of 1 was used, which changed the actual input size from 6x6 to 8x8; we should output 6 and not 8)
inputHeight and inputWidth = 5
outputHeight and outputWidth = 6

So, 5 = (6 + 2(1) - 4) / 1 + 1.

But, as you correctly mentioned, we should test all the cases from that paper. I am going to shift to PyTorch for calculating the values for testing, as TensorFlow sadly only allows 'SAME' and 'VALID' padding, which doesn't let us test all the cases from that paper.
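[Editor's note: to make the size arithmetic above concrete, here is a minimal C++ sketch; the helper name is illustrative, not part of mlpack's API.]

#include <cstddef>
#include <iostream>

// A convolution maps input size i to o = (i + 2p - k) / s + 1; a transposed
// convolution inverts this, so its output size is o = s * (i - 1) + k - 2p.
size_t TransposedConvOutSize(const size_t i, const size_t k,
                             const size_t s, const size_t p)
{
  return s * (i - 1) + k - 2 * p;
}

int main()
{
  // The case discussed above: kernel 4, stride 1, padding 1, 5x5 input.
  std::cout << TransposedConvOutSize(5, 4, 1, 1) << std::endl; // prints 6
}

This is the same check as 5 = (6 + 2(1) - 4) / 1 + 1, run in the other direction.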

akhandait (Member, PR author):

@zoq Please verify this.

akhandait (Member, PR author):

@ShikharJ @zoq Can you please comment on this?

Member:

@akhandait Sorry for the slow response; I don't see any reason against the change. I also checked PyTorch and, as you already said, it's the same parameter set.

akhandait (Member, PR author):

@zoq Do you have any further comments on the PR? Can you give it a final review?

Member:

@akhandait Your approach sounds reasonable to me; I was actually referring to the decreased kernel sizes, which have now been fixed. I'll need to run the tests for the GANs and check their output, and cross-check the tests as well. I'm planning on doing this over the weekend. I'll let you know what I find.

akhandait (Member, PR author):

@ShikharJ Sounds good. Thanks!

ShikharJ (Member):

@akhandait You can rebase to master now. #1491 is merged.

akhandait force-pushed the transposed_conv_fix branch 2 times, most recently from 41654d0 to a3e5c2f, on August 29, 2018 at 14:29.
zoq (Member), Sep 27, 2018:

Looks like the Transposed layer test failed.

akhandait (Member, PR author):

@zoq It should pass now.


TransposedConvolution<> module5(1, 1, 3, 3, 2, 2, 0, 0, 5, 5);
TransposedConvolution<> module5(1, 1, 3, 3, 2, 2, 0, 0, 2, 2, 5, 5);
@ShikharJ (Member), Oct 1, 2018:

Can you use the same input sizes from the paper here as well, as done above? Also for modules 6 and 7.

akhandait (Member, PR author):

Maybe there's been some confusion; they already are according to the paper. Module 5 above has input size 2 and output size 5, which follows Figure 4.5 in the paper; similarly, modules 6 and 7 follow Figures 4.6 and 4.7 respectively.

ShikharJ (Member), Oct 1, 2018:

@akhandait Sorry if I'm being unreasonable, but I'm unable to understand the rationale behind inserting the zeros into the input first and then applying the convolution operation. Could you explain a bit on that? Also, since both ForwardConvolutionRule and BackwardConvolutionRule are valid, I think it'd be better to use them in the Forward and Backward functions respectively.

akhandait (Member, PR author):

@zoq @ShikharJ Sorry for the inactivity on this one, I got busy with some other things. I will get back on this in a couple of days.

akhandait (Member, PR author):

@ShikharJ Inserting zeros is done to implement fractionally strided convolutions with the current convolution rules. Another way would have been to implement a separate convolution rule for fractionally strided convolution that performs it without actually adding zeros to the input, but this way is easier for the time being.
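[Editor's note: to illustrate the zero-insertion idea, here is a minimal Armadillo sketch; the function name is illustrative and not mlpack's actual internal helper.]

#include <armadillo>

// Insert (stride - 1) zeros between adjacent input elements, so that a
// fractionally strided convolution becomes an ordinary stride-1 convolution
// over the dilated input.
template<typename eT>
void InsertZeros(const arma::Mat<eT>& input,
                 const size_t stride,
                 arma::Mat<eT>& output)
{
  output.zeros((input.n_rows - 1) * stride + 1,
               (input.n_cols - 1) * stride + 1);
  for (size_t i = 0; i < input.n_rows; ++i)
    for (size_t j = 0; j < input.n_cols; ++j)
      output(i * stride, j * stride) = input(i, j);
}

For example, a 2x2 input with stride 2 becomes a 3x3 matrix with the original values in the corners and zeros elsewhere.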

akhandait (Member, PR author):

@ShikharJ Any updates here?

ShikharJ (Member), Jan 4, 2019:

@akhandait Apologies, I'll get back on reviewing this and give you an update soon.

zoq (Member), Mar 30, 2019:

@akhandait Looks like there is a merge conflict; I can resolve this one if you like, just let me know. @ShikharJ Any update on this one from your side?

ShikharJ (Member):

@zoq I had been primarily busy with CycleGAN recently, but I think this is close to completion; I just need to check the output on the GANs.

akhandait (Member, PR author):

I resolved it, let's check the tests now.

ShikharJ (Member), Apr 6, 2019:

@akhandait This seems to have a few style issues; do you think you can rectify them?

akhandait (Member, PR author):

Hey @ShikharJ, Jenkins isn't showing me where the checks failed; it shows a stack trace instead. Can you please restart the Jenkins build? I am not sure how to.

zoq (Member), May 13, 2019:

@mlpack-jenkins test this please

akhandait (Member, PR author):

@zoq @ShikharJ I can't understand the static-check warning here:
"The expression kW - padW - 1 > 0 will work as kW - padW != 1." (lines 211, 233)
Why is it != 1? If it had said > 1, I wouldn't have worried about it, but I can't figure out this case.

rcurtin (Member), May 25, 2019:

@akhandait I think both kW and padW are size_t, so this has to do with the fact that negative numbers can't be represented. If kW < padW (which hopefully should not be the case), then kW - padW is some giant number greater than 0. So, the only situation in which the condition kW - padW - 1 > 0 can be false is when kW - padW - 1 == 0, or, when kW - padW == 1 (hence the suggested condition kW - padW != 1). I think this is right; hope it helps. :)
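[Editor's note: a tiny standalone C++ demonstration of the wraparound described above; the values are chosen for illustration.]

#include <cstddef>
#include <iostream>

int main()
{
  const size_t kW = 2, padW = 4;
  // kW - padW underflows to a huge unsigned value, so the test stays true
  // even though the mathematical result is negative.
  std::cout << std::boolalpha << (kW - padW - 1 > 0) << std::endl; // true

  // The expression is only false when kW - padW - 1 == 0, i.e.
  // kW - padW == 1, which is exactly what the static checker suggested.
  const size_t kW2 = 5, padW2 = 4;
  std::cout << std::boolalpha << (kW2 - padW2 - 1 > 0) << std::endl; // false
}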

akhandait (Member, PR author):

@rcurtin Ahh, I missed that; silly error. I will just cast it to int.

@@ -222,18 +214,23 @@ class TransposedConvolution
* @param input The input to be padded.
* @param wPad Padding width of the input.
* @param hPad Padding height of the input.
* @param wExtra The number of extra zeros to the right.
* @param hExtra The number of extra zeros to the bottom.
* @param output The padded output data.
*/
template<typename eT>
void Pad(const arma::Mat<eT>& input,
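[Editor's note: the signature above is truncated in the diff. For reference, a minimal sketch of how such a helper could be implemented with Armadillo; the trailing parameters and the body are assumptions based on the doc comment above, not necessarily the PR's exact code.]

#include <armadillo>

template<typename eT>
void Pad(const arma::Mat<eT>& input,
         const size_t wPad,
         const size_t hPad,
         const size_t wExtra,
         const size_t hExtra,
         arma::Mat<eT>& output)
{
  // Zero matrix sized for the input plus symmetric padding plus the extra
  // zeros on the right/bottom (rows = height, columns = width assumed),
  // with the input copied into the interior.
  output.zeros(input.n_rows + 2 * hPad + hExtra,
               input.n_cols + 2 * wPad + wExtra);
  output.submat(hPad, wPad,
                hPad + input.n_rows - 1,
                wPad + input.n_cols - 1) = input;
}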
@saksham189 (Member), Jul 1, 2019:

Maybe we could introduce a Padding layer and use it inside the convolution layers to avoid redundant code. PyTorch already supports quite a lot of padding layers. This would also make the implementation clearer.

akhandait (Member, PR author):

@saksham189 Has there been any discussion of this on IRC since you posted it? I think this might be a good idea, and it should be given some thought.

Member:

Sorry for the slow response; this makes total sense to me.

Member:

@akhandait I will work on adding the Padding layer and removing the redundant code in a new PR so that it is easy to review.

Member:

Looks like the Padding layer is done and merged; should we incorporate those changes here, or maybe in another PR after this is merged? It may be worth opening an issue so we don't forget. :)

mlpack-bot (bot), Aug 4, 2019:

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

akhandait (Member, PR author):

@rcurtin I guess that would be much better for me right now, thanks! I'll push the changes tomorrow.

@toshal-a (Contributor) left a comment:

Hi, it looks good. However, I found one small mistake; it's quite small, but maybe quite important.

weights.set_size((outSize * inSize * kW * kH) + outSize, 1);

aW = (outputWidth + kW - 2 * padW - 2) % dW;
Contributor:

If we are serializing these parameters, then we don't need to calculate them while loading. Let me know if that's not correct; maybe I'm missing something.

rcurtin (Member):

Ah, actually, this is a good point. If we are serializing aW and aH, then we have to handle reverse compatibility. (That's not hard, but it is a little bit tedious.) Since, as far as I can tell, aW and aH are always computed, I'd suggest that we just remove the two lines ar & BOOST_SERIALIZATION_NVP(aW); and ar & BOOST_SERIALIZATION_NVP(aH); above.
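[Editor's note: for clarity, a sketch of what the serialize() member might look like with those two lines removed; the exact member list is assumed from the surrounding discussion rather than copied from the PR.]

// aW and aH are deliberately not serialized; they are always recomputed
// (e.g. aW = (outputWidth + kW - 2 * padW - 2) % dW) after loading.
template<typename Archive>
void serialize(Archive& ar, const unsigned int /* version */)
{
  ar & BOOST_SERIALIZATION_NVP(weights);
  ar & BOOST_SERIALIZATION_NVP(inSize);
  ar & BOOST_SERIALIZATION_NVP(outSize);
  ar & BOOST_SERIALIZATION_NVP(kW);
  ar & BOOST_SERIALIZATION_NVP(kH);
  ar & BOOST_SERIALIZATION_NVP(dW);
  ar & BOOST_SERIALIZATION_NVP(dH);
  ar & BOOST_SERIALIZATION_NVP(padW);
  ar & BOOST_SERIALIZATION_NVP(padH);
}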

rcurtin (Member), Sep 26, 2019:

@walragatver pointed out in IRC today that we shouldn't wait on the release any longer, as mlpack 3.1.1 downloads ensmallen-latest.tar.gz, which as of ensmallen 2.10.0 no longer works! I agreed, so I've gone ahead and released 3.2.0 tonight (which doesn't have the same issue as 3.1.1 did with respect to the ensmallen version), and I've moved this to the 3.2.1 milestone. Once this is finished and merged we can easily issue a patch release. 👍

@rcurtin (Member) left a comment:

I don't have any further comments on this one other than about the serialization; do we think that it's ready otherwise? I'd love to get this merged and release 3.2.2 soon. :)

rcurtin (Member), Oct 6, 2019:

It looks like there is still a Travis test failing, too. If nobody has time, I can try to debug it.

akhandait (Member, PR author):

Sorry for stalling here. I am having issues building mlpack on my system, so it has become harder to work on this, as I need to wait for Travis to build and then debug through those errors. I'll still try to complete this today.

zoq (Member), Oct 7, 2019:

Hopefully the build issues you see are fixed with the latest ensmallen + mlpack version.

rcurtin (Member), Oct 9, 2019:

@akhandait No problem, and I'm happy to help try and debug your system if you like, just let me know. I think @zoq is right that the update of ensmallen might be the thing that makes a difference here; if you make sure the branch is rebased against master and then delete and remake your build directory, that should work out any issues (I hope).

akhandait (Member, PR author):

@zoq It didn't :(. I am using ensmallen 2.10.3 and the latest mlpack master (so a rebase isn't the issue). I just posted the entire error on IRC. Can you please see if it looks familiar? If not, I will post the build config so that someone can reproduce it.

akhandait (Member, PR author):

Hmm, one of the Travis jobs is still failing for some reason. I don't think it's due to this PR, though.

rcurtin (Member), Oct 10, 2019:

You're right, it just timed out. Let me try running it again and we'll see what happens...

@rcurtin (Member) left a comment:

Everything looks good to me and the tests are passing. 👍 There was one comment about using the Padding layer, so I think when we merge this we should be sure to open an issue about that:

https://github.com/mlpack/mlpack/pull/1493/files#r299114822

Also, either before merge or during merge, we should update HISTORY.md also. :)

@mlpack-bot (bot) left a comment:

Second approval provided automatically after 24 hours. 👍

akhandait merged commit 44457ef into mlpack:master on Oct 12, 2019.

akhandait (Member, PR author):

I forgot to update HISTORY.md before the merge; could someone please do it? Or should I commit directly to the repository?

zoq (Member), Oct 12, 2019:

Updated HISTORY.md in 7d677d0.
