
[WIP] Resnet Module #61

Closed, wants to merge 44 commits

Conversation

@Aakash-kaushik (Contributor)

This PR aims to implement a ResNet module that can create all the ResNet variants from the paper, and it follows the same architecture as PyTorch for a few reasons:

  1. We can't train so many models on ImageNet right now.
  2. We don't know if they would converge.
  3. Keeping the same architecture allows us to load weights from PyTorch.

Things I have some doubts about:

  1. How should the residual block be implemented using the sequential layer?
  2. Can I take the output of a layer at an arbitrary stage and add it to another layer (skip connections)?

Resources:

  1. PyTorch's ResNet implementation.
  2. ResNet paper.

Aakash-kaushik and others added 10 commits April 21, 2021 09:59
* test path change

* configured tests

* trying test fix

* trying to fix windows build

* removed unit testing from cmake

* Update windows-steps.yaml

* trying windows fix

* try fix, P.S. copied from mlpack

* anotehr try

* dir files display windows

* namespace except tests dir

* namespace added

* Catch2 (#6)

* Update windows-steps.yaml

* Update windows-steps.yaml

* copying dll and lib files to bin

* copy dll and lib to build/test

* copy dll and lib to build/test

* cleanup

* added exclusion for catch

* style check solve

* fix style check

* applied some suggestions

* added new line in main.cpp

* style check warnings

* updating with models/master (#7)

* Update windows-steps.yaml

* Update windows-steps.yaml

* copying dll and lib files to bin

* copy dll and lib to build/test

* copy dll and lib to build/test

* cleanup

* added exclusion for catch

* style check solve

* fix style check

* trigger build

Co-authored-by: kartikdutt18 <39593019+kartikdutt18@users.noreply.github.com>

* removed mlpack::ann::models in favour of mlpack::models

* style checks

* style checks

* ctest tests add

* ctest parsing

* added catch.cmake

* build fix

* test name fix

* syntax error fix

* removed main test from ctest as that would run tests 2 times

* specifying CMAKE_INSTALL_PREFIX

* reverting from --list-test-name-only to --list-tests

* update cmake_install_prefix

* turn off mlpack debugging in models repo

Co-authored-by: kartikdutt18 <39593019+kartikdutt18@users.noreply.github.com>
@zoq (Member) commented May 30, 2021

The idea of the sequential layer is that it wraps arbitrary layers and exposes them as if they were a single layer. The sequential layer has a template parameter (https://github.com/mlpack/mlpack/blob/83e70110595eaf3cf3758f270433801e673615b2/src/mlpack/methods/ann/layer/sequential.hpp#L70), which tells the layer to add the input to the output of the last layer. There is also a convenient typedef (https://github.com/mlpack/mlpack/blob/83e70110595eaf3cf3758f270433801e673615b2/src/mlpack/methods/ann/layer/sequential.hpp#L260-L261) that already sets the template parameter for you. Below is an elementary example:

Residual<>* residual = new Residual<>(true);

Linear<>* linearA = new Linear<>(10, 10);
Linear<>* linearB = new Linear<>(10, 10);

residual->Add(linearA);
residual->Add(linearB);

In this case linearA and linearB are run, and the input is also added to the output of the last layer, which here is linearB.

There is also a test case - https://github.com/mlpack/mlpack/blob/83e70110595eaf3cf3758f270433801e673615b2/src/mlpack/tests/ann_layer_test.cpp#L3325
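
For reference, the convenient typedef mentioned above is roughly the following (paraphrased; treat the exact template parameters as an assumption):

// A Sequential with its Residual template parameter set to true, so the
// input is added to the output of the last wrapped layer.
template<typename InputDataType = arma::mat,
         typename OutputDataType = arma::mat>
using Residual = Sequential<InputDataType, OutputDataType, true>;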

@Aakash-kaushik (Contributor, Author) commented May 30, 2021

> The idea of the sequential layer is that it wraps arbitrary layers and exposes them as if they were a single layer. [...]

Hi @zoq, thanks for this, but the part that confused me is that the code checks whether the dimensions of the first layer match those of the last layer. For ResNet there are cases where the input dimensions of the first layer differ from those of the last one, so I need a 1×1 convolution block just for the first layer's input, which is not run like the other layers but separately, before its output is added to the output of the last layer.

How do you suggest I accomplish that?

cc: @kartikdutt18

@zoq (Member) commented May 30, 2021

In this case you can use a combination of AddMerge and Sequential. The AddMerge layer takes arbitrary layers, runs each of them, and adds the outputs together at the end.

AddMerge<> resblock(false, false);

Sequential<>* sequential = new Sequential<>(true);

Linear<>* linearA = new Linear<>(10, 10);
Linear<>* linearB = new Linear<>(10, 10);

sequential->Add(linearA);
sequential->Add(linearB);

Convolution<>* conv = new Convolution<>(...);

resblock.Add(sequential);
resblock.Add(conv);

Let me know if this is what you were looking for. Maybe it makes sense to implement that structure as an independent layer in mlpack.
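
For concreteness, here is a minimal sketch of how this pattern could map onto a ResNet downsample block. It is only an illustration: the Convolution<> arguments assume mlpack's (inSize, outSize, kernelWidth, kernelHeight, strideWidth, strideHeight, padW, padH, inputWidth, inputHeight) constructor, and the sizes (64 -> 128 channels, stride 2, 56 x 56 input) are example values.

// Sketch only; the (model, run) flags mirror the snippet above.
AddMerge<> resBlock(false, false);

// Main path: 3x3 conv (stride 2) -> BatchNorm -> ReLU -> 3x3 conv -> BatchNorm.
Sequential<>* mainPath = new Sequential<>(true);
mainPath->Add(new Convolution<>(64, 128, 3, 3, 2, 2, 1, 1, 56, 56));
mainPath->Add(new BatchNorm<>(128));
mainPath->Add(new ReLULayer<>());
mainPath->Add(new Convolution<>(128, 128, 3, 3, 1, 1, 1, 1, 28, 28));
mainPath->Add(new BatchNorm<>(128));

// Shortcut path: 1x1 conv (stride 2) plus BatchNorm so the shapes match.
Sequential<>* shortcut = new Sequential<>(true);
shortcut->Add(new Convolution<>(64, 128, 1, 1, 2, 2, 0, 0, 56, 56));
shortcut->Add(new BatchNorm<>(128));

resBlock.Add(mainPath);
resBlock.Add(shortcut);

AddMerge then adds the outputs of the two paths, which corresponds to the projection shortcut from the ResNet paper.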

@Aakash-kaushik (Contributor, Author) commented Jun 5, 2021

> In this case you can use a combination of AddMerge and Sequential. The AddMerge layer takes arbitrary layers, runs each of them, and adds the outputs together at the end. [...]

I am still stuck on this and don't know exactly what to do. It would be easy if we had a way to define the flow of the network, but that is not how it is designed, and the main problem here is the downsampling block. I can put everything inside a residual block so that it saves the input of the first layer into a temporary variable and then tries to add it to the output of the last layer in the block, but when it does that it finds that the shapes don't match, and I don't see how I can use AddMerge to achieve the same flow. Do let me know if you see some other way around it; I have been thinking about it for far too long.

@kartikdutt18 (Member)

I will try to think of a solution for this and get back to you.

@zoq (Member) commented Jun 5, 2021

Just to make sure I get what you are trying to do: in some cases the output of the sequential part doesn't match the input, so if you add the skip connection you have to add another layer to convert the input?

Something like:

[Diagram: "Unbenannte Zeichnung" (untitled drawing)]

@Aakash-kaushik (Contributor, Author)

> Just to make sure I get what you are trying to do: in some cases the output of the sequential part doesn't match the input, so if you add the skip connection you have to add another layer to convert the input? [...]

Yes, I believe this is exactly what I am trying to do.
That was a great diagram for it, thank you so much.

@zoq (Member) commented Jun 18, 2021

Okay, nice. If not, let me know and I'll check if I can find a solution as well.

@Aakash-kaushik (Contributor, Author)

> Okay, nice. If not, let me know and I'll check if I can find a solution as well.

I guess that worked, but we have another issue:

Convolution: 3 64 7 7 2 2 3 3 112 112
BatchNorm: 64
Relu
Padding: 1,1,1,1 114 114
MaxPool: 3,3,2,2 56 56
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
Relu
Convolution: 64 64 3 3 1 1 1 1 56 56
BatchNorm: 64
IdentityLayer
Relu
new layer
Convolution: 64 128 3 3 2 2 1 1 28 28
BatchNorm: 128
Relu
Convolution: 128 128 3 3 1 1 1 1 28 28
BatchNorm: 128
DownSample below
Convolution: 64 128 1 1 2 2 0 0 56 56
BatchNorm: 128
Relu

error: addition: incompatible matrix dimensions: 100352x1 and 200704x1
terminate called after throwing an instance of 'std::logic_error'
  what():  addition: incompatible matrix dimensions: 100352x1 and 200704x1
Aborted (core dumped)

Anything passed to the downsample should be reduced now, but it isn't doing that (100352 = 128 × 28 × 28 is the main path's output, while 200704 = 64 × 56 × 56 is the un-reduced input).

@Aakash-kaushik (Contributor, Author)

Hey @kartikdutt18, @zoq, I have pushed the code too; can you take another look?

@kartikdutt18 (Member)

I'm out rn, will take a look around 2030 IST.

@Aakash-kaushik (Contributor, Author) commented Jun 18, 2021

> I'm out rn, will take a look around 2030 IST.

Thanks.

@Aakash-kaushik (Contributor, Author) left a comment:

Ignore this; GitHub was forcing me to leave a review.

downSampleInputHeight, strideWidth, strideHeight, kernelWidth,
kernelHeight, padW, padH, true);

downSample->Add(new ann::BatchNorm<>(outSize));
Member:

Should BatchNorm be added as a third connection, or as part of the downsample layer? Wrap it in a Sequential and insert that into the downsample layer. As it is, this adds BaseLayer, Downsample, and BatchNorm as three different paths, causing an incorrect size. I can elaborate if needed.
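
For reference, a minimal sketch of that suggestion, with the sizes taken from the 64 -> 128, stride-2 stage in the log above (the constructor signatures and the resBlock name are assumptions):

// Sketch only: wrap the 1x1 downsample Convolution and its BatchNorm in a
// single Sequential and add that Sequential to the residual block, so the
// shortcut stays one path instead of becoming three parallel ones.
Sequential<>* downSampleBlock = new Sequential<>(true);
downSampleBlock->Add(new ann::Convolution<>(64, 128, 1, 1, 2, 2, 0, 0, 56, 56));
downSampleBlock->Add(new ann::BatchNorm<>(128));

// resBlock here stands for whatever AddMerge already holds the main path.
resBlock->Add(downSampleBlock);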

Contributor Author:

I just pushed it; can you verify if this is what you meant?

Contributor Author:

It worked!!! 🚀

Member:

Yes, cool.

Contributor Author:

Thank you so much for spotting it.

@Aakash-kaushik (Contributor, Author)

Hey, so all the architectures are ready, but the thing that holds me back a bit is that ResNet152 with a 224 × 224 × 3 input takes up 7.1 GB of RAM. Isn't that too much? I am glad I had swap space, otherwise it wouldn't even run. By the way, this is runtime memory, not compile time.

@Aakash-kaushik (Contributor, Author)

Also, @kartikdutt18, it would be great if you could walk me through the weight converter, because that part is a bit tougher than I thought; I assumed that I would just need to supply the model object and it would save the weights to a file which I could then edit further.
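
Not the converter itself, but for reference a minimal sketch of the save step that was assumed here, using mlpack's data::Save serialization (the model object and file name are just placeholders):

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>

int main()
{
  // Sketch only: mlpack can serialize a model object to disk; loading works
  // the same way with data::Load.  A real converter would first fill
  // model.Parameters() with values exported from PyTorch.
  mlpack::ann::FFN<> model;
  mlpack::data::Save("resnet18.bin", "resnet", model, false);
  return 0;
}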

@kartikdutt18 (Member)

Let's first clean up this PR so it can be reviewed. This includes adding comments, making the style check pass, and squashing commits.

@Aakash-kaushik (Contributor, Author)

> Let's first clean up this PR so it can be reviewed. This includes adding comments, making the style check pass, and squashing commits.

By the way, should I keep the output that is printed now? Maybe as an mlpack log or something that can be enabled?

@kartikdutt18 (Member)

You can use logs similar to Darknet's. I think they are clearer and easier to understand.

@Aakash-kaushik (Contributor, Author)

> You can use logs similar to Darknet's. I think they are clearer and easier to understand.

Great, shall do that.

@Aakash-kaushik (Contributor, Author)

By the way, how do we see the output of mlpack::Log::Info? I haven't used it before.
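
For reference, a minimal sketch, assuming mlpack's Log API: Log::Info is silent unless ignoreInput is set to false (or mlpack runs in verbose mode).

#include <mlpack/core.hpp>

int main()
{
  // Log::Info drops its input by default; enable it to see the messages.
  mlpack::Log::Info.ignoreInput = false;
  mlpack::Log::Info << "Convolution: 3 64 7 7 2 2 3 3 112 112" << std::endl;
  return 0;
}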

@Aakash-kaushik (Contributor, Author)

I haven't added the pretrained part of the code because I didn't really have weights, but that can easily be added once we have the code, so I don't think there is much to worry about there.

@Aakash-kaushik (Contributor, Author)

Hey @kartikdutt18, @zoq, I have created #63; it would be great if we can review the PR over there.
