Can I use this to create a powerful MLP architecture? #32
Comments
We did not try, but it should be possible. We also strongly recommend our new work "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network". The code is here: https://github.com/alibaba/lightweight-neural-architecture-search |
I went over how you derived the entropy of the MLP in DeepMAD. I found that the final formula rests on the standard-normal initialization assumption in Box 1, which is also why the entropy of the MLP depends only on the structure of the MLP itself (the width of each layer and the depth of the network), not on the network's weights or its input.
[image] <https://user-images.githubusercontent.com/110704880/234179263-4985f32e-f3e6-4511-bf67-6c6f8781a3bb.png>
So when I use this formula, based on the standard-normal assumption, to design an MLP, can I really ignore the network's weights and the input distribution? |
Hi Kenny,
I cannot find your reply on GitHub, so I'm not sure whether you deleted it. In case you are still interested:
* The network entropy depends only on the width and depth of the architecture, up to some constants. So you can use a non-Gaussian distribution to initialize your weights and the results will be nearly the same, again up to some constants.
* Please note that you cannot use adaptive weight initialization, e.g., schemes where the weights are normalized by fan-in / fan-out. These normalized methods will "normalize" your entropy no matter how many channels you have.
Best,
Ming |
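To make those two bullet points concrete, here is a small numerical sketch (an editor's illustration, not code from ZenNAS or DeepMAD; it assumes the entropy in question reduces, up to constants, to a sum of log layer-widths):

```python
import math
import numpy as np

def mlp_entropy(widths):
    # Structure-only entropy proxy: it depends just on width and depth,
    # never on the realized weights or on the input distribution.
    return sum(math.log(w) for w in widths)

def forward_log_variance(widths, fan_in_normalized, n=4096, seed=0):
    # Push standard-normal inputs through a randomly initialized linear
    # MLP and measure the log-variance of the outputs.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, widths[0]))
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        w = rng.standard_normal((d_in, d_out))
        if fan_in_normalized:  # Kaiming/Xavier-style adaptive scaling
            w /= math.sqrt(d_in)
        x = x @ w
    return math.log(x.var())

narrow, wide = [16, 16, 16], [256, 256, 256]
# Plain N(0, 1) init: log-variance grows with the widths.
print(forward_log_variance(narrow, False), forward_log_variance(wide, False))
# Fan-in normalization: log-variance stays near 0 for any width, i.e. the
# width/depth signal that the entropy measures is "normalized" away.
print(forward_log_variance(narrow, True), forward_log_variance(wide, True))
print(mlp_entropy(narrow), mlp_entropy(wide))
```

Under standard-normal init the forward log-variance tracks the architecture, while fan-in normalization pins it near zero for any width, which is the "normalizing" effect described above.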
Thanks for replying! I asked some specific questions under the DeepMAD repository and got good answers, so I closed them. By the way, I have another question that confuses me when I try to put the theory of DeepMAD into practice: how do I design an MLP for a specific dataset using the theory of DeepMAD? For example, the number of samples in my dataset is sometimes large and sometimes small, which should affect the design of the network structure, but I don't see this discussed or studied in DeepMAD. |
I think this should be 'L-1', if I'm not misunderstanding. Details in alibaba/lightweight-neural-architecture-search#11 (comment) |
Thanks for the feedback! Please just use all L layers. I did not check
whether we index from 0 to L-1 or from 1 to L.
|
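In code, the point is simply that the entropy sum runs over all L widths, whichever indexing convention the paper uses (an editor's paraphrase, with hypothetical layer sizes):

```python
import math

widths = [784, 512, 256, 10]  # hypothetical MLP with L = 4 widths
# The same sum whether you write it as i = 1..L or i = 0..L-1:
H = sum(math.log(widths[i]) for i in range(len(widths)))
```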
Thanks for replying!
What about the attributes of the training dataset? Specifically, suppose the number of samples in one training dataset is 10000, and the number of samples in another is 100. Basically, how can we quantify the influence of training-dataset size on MLP structure design?
Kenny
|
There is no easy way to introduce the number of training instances into DeepMAD. According to machine learning theory, the safe way is to ensure that the number of network parameters is much smaller than the number of training instances.
|
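As a rough, editor-added illustration of that rule of thumb (the layer sizes and the "10x" margin below are assumptions, not from DeepMAD):

```python
def mlp_param_count(widths):
    # Weights plus biases for each fully connected layer.
    return sum(d_in * d_out + d_out for d_in, d_out in zip(widths[:-1], widths[1:]))

widths, n_train = [784, 256, 10], 100_000  # hypothetical design and dataset size
params = mlp_param_count(widths)           # 784*256 + 256 + 256*10 + 10 = 203,530
print(params, params < n_train / 10)       # "much smaller" margin is a guess
```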
I would like to ask: are Zen-Score and DeepMAD two completely different things? Are they hard to explain in terms of each other? |
I find that DeepMAD is based on information theory, while Zen-Score is based on the theory of linear regions of deep neural networks, so I think DeepMAD and Zen-Score might be hard to reconcile for now. They seem to be two completely different things. |
On the Papers with Code website, I find that DeepMAD beats Zen-Score on image tasks. Does this mean that designing a NAS algorithm for MLPs using DeepMAD's information-theoretic approach is better than using Zen-Score's theory of linear regions of deep neural networks? |
I also notice that Zen-Score seems to propose NAS ideas only for networks with a CNN structure, and has no ready-made recipe for NAS on MLP networks. I want to make sure that's true, in case I misunderstood the paper. If it is true, would you recommend the information-theoretic method or the linear-region theory for developing a NAS algorithm for MLPs? Sorry to bother you, but these questions matter a lot to me. Hoping for a reply! |
By the way, could you recommend some articles or work on the relationship between sample complexity (such as the number of training samples) and model complexity? Thanks a lot! I find that many articles discuss the expressiveness of a model, but training a model cannot be separated from the samples. The most basic common sense is that if the number of samples is insufficient, a model cannot be trained successfully no matter how expressive it is. |
I have another question: I find that many of these theories are applied to classification tasks. Can they also be applied to regression tasks, which are likewise very important in industry? |
Sorry for the late reply! It has nothing to do with the downstream task. Our DeepMAD method (or Zen-NAS as its early version) finds the "mathematically optimal" structure within our framework. So if you buy into our key argument that max-entropy and max-effectiveness are critical to building a good model, you can "believe" that this model will perform well across various downstream tasks. When you have no further prior knowledge about the downstream task, that is pretty much the best we can hope for. Of course, if you do have more prior knowledge, you can always design better models. In any case, there is no significant difference between classification and regression to me. |
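To connect that argument back to the MLP-design question earlier in the thread, here is a toy, editor-written search loop (DeepMAD itself solves a constrained mathematical program rather than doing random search; the width choices and budget below are hypothetical): maximize a structure-only entropy proxy under a fixed parameter budget.

```python
import math
import random

def entropy_proxy(widths):
    # Structure-only score, as discussed earlier in this thread.
    return sum(math.log(w) for w in widths)

def param_count(widths):
    return sum(i * o + o for i, o in zip(widths[:-1], widths[1:]))

def random_search(in_dim, out_dim, budget, trials=2000, seed=0):
    rng = random.Random(seed)
    best, best_score = None, -math.inf
    for _ in range(trials):
        depth = rng.randint(2, 8)                   # number of linear layers
        hidden = [rng.choice([32, 64, 128, 256, 512]) for _ in range(depth - 1)]
        widths = [in_dim] + hidden + [out_dim]
        if param_count(widths) > budget:            # effectiveness/size constraint
            continue
        score = entropy_proxy(widths)               # max-entropy objective
        if score > best_score:
            best, best_score = widths, score
    return best, best_score

print(random_search(in_dim=784, out_dim=10, budget=500_000))
```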
Thanks a lot!
|
Hi, |
You are welcome! |