
Can I use this to create a powerful MLP architecture? #32

Closed
kennyorn1 opened this issue Apr 23, 2023 · 19 comments

Comments

@kennyorn1

No description provided.

@MingLin-home
Collaborator

We did not try it, but it should be possible. We also strongly suggest our new work "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network". The code is here: https://github.com/alibaba/lightweight-neural-architecture-search

@MingLin-home
Collaborator

MingLin-home commented Apr 25, 2023 via email

@kennyorn1
Author

kennyorn1 commented Apr 25, 2023

> Hi Kenny, I cannot find your reply on GitHub, so I am not sure whether you deleted it. In case you are still interested:
>
> * The network entropy only depends on the width and depth of the architecture, up to some constants. So you can use a non-Gaussian distribution to initialize your weights and the results will be nearly the same, up to some constants.
> * Please note that you cannot use adaptive weight initialization, e.g., where the weights are normalized by fan-in / fan-out. These normalization methods will "normalize" your entropy no matter how many channels you have.
>
> Best, Ming

> On Mon, Apr 24, 2023 at 10:06 PM Kenny Wu wrote:
>
> > We did not try it, but it should be possible. We also strongly suggest our new work "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network". The code is here: https://github.com/alibaba/lightweight-neural-architecture-search
>
> [screenshot of the MLP entropy derivation]
>
> I went over how you derived the entropy of the MLP in DeepMAD. I found that the final formula rests on the standard normal distribution assumption in box 1, which is also why the entropy of the MLP depends only on the structure of the MLP itself (the width of each layer and the depth of the network), not on the weights or inputs of the network. So when I use this formula, based on the standard normal distribution assumption, to design an MLP network, can I really ignore the network weights and the input distribution?

Thanks for replying!

I asked some of these questions under the DeepMAD repository and got satisfactory answers there, so I closed them.

By the way, I have another question that confuses me when I try to put the theory of DeepMAD into practice.

How do I design an MLP for a specific dataset using the theory of DeepMAD?

For example, the number of samples in my dataset is sometimes large and sometimes small, which should affect the design of the network structure, but I do not see this discussed or studied in DeepMAD.
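
The initialization caveat quoted above is easy to check numerically. Below is a minimal numpy sketch (not the DeepMAD code; the widths are arbitrary examples): under standard-normal initialization, a log-variance entropy proxy grows with the layer widths and depth, while fan-in normalization flattens it regardless of width.

```python
# Minimal numpy sketch (not the DeepMAD code) of the point quoted above:
# with standard-normal weights, the output variance of a deep linear MLP
# grows multiplicatively with the layer fan-ins, so a log-variance entropy
# proxy depends only on the widths and depth; fan-in scaling cancels this.
import numpy as np

rng = np.random.default_rng(0)

def output_log_variance(widths, scale_by_fan_in):
    """Push a standard-normal batch through a linear MLP and
    return the log of the empirical output variance."""
    x = rng.standard_normal((4096, widths[0]))
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        w = rng.standard_normal((fan_in, fan_out))
        if scale_by_fan_in:
            w /= np.sqrt(fan_in)  # Kaiming-style normalization
        x = x @ w
    return np.log(x.var())

widths = [64, 256, 256, 256, 64]  # arbitrary example architecture
print(output_log_variance(widths, scale_by_fan_in=False))
# ~ sum(log(fan_in)) over layers, i.e. a function of widths/depth only:
print(sum(np.log(w) for w in widths[:-1]))
print(output_log_variance(widths, scale_by_fan_in=True))
# ~ 0 no matter how wide or deep: the init has "normalized" the entropy
```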

@kennyorn1
Author

kennyorn1 commented Apr 26, 2023

[screenshot of the MLP entropy formula from the DeepMAD paper]

I think this should be 'L-1', if I am not misunderstanding it.

Details in alibaba/lightweight-neural-architecture-search#11 (comment)

@MingLin-home
Collaborator

MingLin-home commented Apr 26, 2023 via email

@kennyorn1
Author

kennyorn1 commented Apr 26, 2023 via email

@MingLin-home
Collaborator

MingLin-home commented Apr 26, 2023 via email

@kennyorn1
Author

> We did not try it, but it should be possible. We also strongly suggest our new work "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network". The code is here: https://github.com/alibaba/lightweight-neural-architecture-search

I would like to ask: are Zen-Score and DeepMAD two completely different things? Is it hard to explain one in terms of the other?

@kennyorn1
Author

> We did not try it, but it should be possible. We also strongly suggest our new work "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network". The code is here: https://github.com/alibaba/lightweight-neural-architecture-search

I find that DeepMAD is based on information theory while Zen-Score is based on the theory of linear regions of deep neural networks, so I think DeepMAD and Zen-Score may be hard to reconcile for now. They seem to be two completely different things.
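
For intuition on what Zen-Score measures mechanically, here is a rough numpy sketch of its core idea (input-perturbation sensitivity of an untrained network) adapted to a plain ReLU MLP. This is an illustrative adaptation, not the official Zen-NAS code: the paper formulates the score for CNNs and adds a BatchNorm variance correction term, both omitted here.

```python
# Rough numpy sketch of the Zen-Score idea (not the official Zen-NAS code):
# score an untrained network by its expected sensitivity to input
# perturbations, log( E ||f(x + eps*d) - f(x)|| / eps ), with weights drawn
# from N(0, 1) as in the paper. The CNN-specific BatchNorm variance
# correction from Zen-NAS is omitted.
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights):
    """Plain ReLU MLP (ReLU on all but the last layer)."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)
    return x @ weights[-1]

def zen_score_mlp(widths, eps=1e-2, batch=256, repeats=8):
    scores = []
    for _ in range(repeats):
        # fresh standard-normal weights each repeat; no fan-in scaling,
        # otherwise the width dependence would be normalized away
        weights = [rng.standard_normal((i, o))
                   for i, o in zip(widths[:-1], widths[1:])]
        x = rng.standard_normal((batch, widths[0]))
        d = rng.standard_normal((batch, widths[0]))
        diff = mlp_forward(x + eps * d, weights) - mlp_forward(x, weights)
        scores.append(np.log(np.linalg.norm(diff, axis=1).mean() / eps))
    return float(np.mean(scores))

# under this proxy, wider/deeper MLPs tend to score higher:
print(zen_score_mlp([32, 64, 64, 10]))
print(zen_score_mlp([32, 256, 256, 256, 10]))
```

Note that both this sensitivity proxy and the entropy view reward width and depth, even though the derivations differ.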

@kennyorn1
Author

kennyorn1 commented May 3, 2023

> We did not try it, but it should be possible. We also strongly suggest our new work "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network". The code is here: https://github.com/alibaba/lightweight-neural-architecture-search

On the Papers with Code website, I see that DeepMAD beats Zen-Score on vision tasks. Does this mean that an MLP NAS algorithm designed with DeepMAD's information-theoretic approach would be better than one based on Zen-Score's theory of linear regions of deep neural networks?

@kennyorn1
Author

kennyorn1 commented May 3, 2023

> We did not try it, but it should be possible. We also strongly suggest our new work "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network". The code is here: https://github.com/alibaba/lightweight-neural-architecture-search

I also notice that Zen-Score seems to propose NAS ideas only for networks with a CNN structure, and has no ready-made recipe for NAS on MLP networks. I want to make sure that is true, in case I have misunderstood the paper.

If it is true, would you recommend the information-theoretic approach or the linear-region theory for developing an MLP NAS algorithm?

Sorry to bother you, but these questions matter a lot to me.

Looking forward to your reply!

@kennyorn1
Author

kennyorn1 commented May 3, 2023

By the way, could you recommend some articles or work on the relationship between sample complexity (such as the number of training samples) and model complexity? Thanks a lot!

I ask because many articles discuss the expressiveness of a model, yet training cannot be separated from the data: if there are not enough samples, a model cannot be trained successfully no matter how expressive it is.

@kennyorn1
Author

I have another question: I find that many of these theories are applied to classification tasks. Can they also be applied to regression tasks, which are very important in industry?

@MingLin-home
Collaborator

Sorry for the late reply!

It has nothing to do with the downstream task. Our DeepMAD method (with Zen-NAS as its early version) finds a "mathematically optimal" structure within our framework. So if you buy into our key argument, that max-entropy and max-effectiveness are critical to building a good model, you can "believe" that this model will perform well on various downstream tasks. Since you have no additional prior knowledge about the downstream task, that is pretty much the best we can hope for.

Of course, if you have more prior knowledge, you can always design better models. But to me there is no significant difference between classification and regression.
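
To make the max-entropy-under-constraints recipe concrete, here is a toy random-search sketch in that spirit. The entropy proxy sum(log w_i), the depth-to-mean-width effectiveness cap, and all the budget numbers below are simplified illustrative stand-ins, not the formulation or solver released with DeepMAD.

```python
# Toy sketch of the DeepMAD-style recipe (not the released solver):
# maximize an entropy proxy over MLP architectures subject to a parameter
# budget and an effectiveness cap. The proxy, the effectiveness measure,
# and all constants below are simplified illustrative stand-ins.
import math
import random

IN_DIM, OUT_DIM = 64, 10
PARAM_BUDGET = 500_000     # max number of weights (illustrative)
MAX_EFFECTIVENESS = 0.05   # cap on depth / mean hidden width (illustrative)

def n_params(widths):
    return sum(i * o for i, o in zip(widths[:-1], widths[1:]))

def entropy_proxy(widths):
    # per the discussion above: depends only on widths/depth, up to constants
    return sum(math.log(w) for w in widths[1:-1])

def effectiveness(widths):
    hidden = widths[1:-1]
    return len(hidden) / (sum(hidden) / len(hidden))

random.seed(0)
best = None
for _ in range(20_000):  # naive random search over depth and hidden widths
    depth = random.randint(1, 8)
    hidden = [random.choice([64, 128, 256, 512, 1024]) for _ in range(depth)]
    widths = [IN_DIM] + hidden + [OUT_DIM]
    if n_params(widths) > PARAM_BUDGET:
        continue
    if effectiveness(widths) > MAX_EFFECTIVENESS:
        continue
    score = entropy_proxy(widths)
    if best is None or score > best[0]:
        best = (score, widths)

print("best entropy proxy %.2f at widths %s" % best)
```

In this toy setup the effectiveness cap is what stops the search from trading all of its parameter budget for depth; DeepMAD's actual constraints play the same balancing role.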

@kennyorn1
Author

kennyorn1 commented May 8, 2023 via email

@dovedx

dovedx commented May 29, 2023 via email

@kennyorn1 kennyorn1 reopened this May 29, 2023
@kennyorn1
Author

kennyorn1 commented May 29, 2023

Hi,
I have some questions about a new NAS work. May I have your email address so I can contact you? I would very much appreciate it if you agree. @MingLin-home

@MingLin-home
Copy link
Collaborator

You are welcome!
My email is on my homepage: linming04@gmail.com

@dovedx

dovedx commented Sep 19, 2024 via email
