Can I use this to create a powerful MLP architecture? #32
Comments
We did not try, but it should be possible. We also strongly recommend our new work "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network". The code is here: https://github.com/alibaba/lightweight-neural-architecture-search |
I went over how you derived the entropy of the MLP in DeepMAD. I found that the final formula rests on the standard-normal initialization assumption in Box 1, which is also why the entropy of the MLP depends only on the structure of the MLP itself (the width of each layer and the depth of the network), not on the network's weights or its input.
[image] <https://user-images.githubusercontent.com/110704880/234179263-4985f32e-f3e6-4511-bf67-6c6f8781a3bb.png>
So when I use this formula, based on the standard-normal assumption, to design an MLP, can I really ignore the network's weights and the input distribution? |
Hi Kenny,
I cannot find your reply on GitHub, so I'm not sure whether you deleted it. In case you are still interested:
* The network entropy depends only on the width and depth of the architecture, up to some constants. So you can use a non-Gaussian distribution to initialize your weights and the results will be nearly the same, again up to some constants.
* Please note that you cannot use adaptive weight initialization, e.g., schemes where the weights are normalized by fan-in / fan-out. These normalized methods will "normalize" your entropy no matter how many channels you have.
Best,
Ming |
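To make those two bullet points concrete, here is a small numerical sketch (an editor's illustration, not code from ZenNAS or DeepMAD; it assumes the entropy in question reduces, up to constants, to a sum of log layer-widths):

```python
import math
import numpy as np

def mlp_entropy(widths):
    # Structure-only entropy proxy: it depends just on width and depth,
    # never on the realized weights or on the input distribution.
    return sum(math.log(w) for w in widths)

def forward_log_variance(widths, fan_in_normalized, n=4096, seed=0):
    # Push standard-normal inputs through a randomly initialized linear
    # MLP and measure the log-variance of the outputs.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, widths[0]))
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        w = rng.standard_normal((d_in, d_out))
        if fan_in_normalized:  # Kaiming/Xavier-style adaptive scaling
            w /= math.sqrt(d_in)
        x = x @ w
    return math.log(x.var())

narrow, wide = [16, 16, 16], [256, 256, 256]
# Plain N(0, 1) init: log-variance grows with the widths.
print(forward_log_variance(narrow, False), forward_log_variance(wide, False))
# Fan-in normalization: log-variance stays near 0 for any width, i.e. the
# width/depth signal that the entropy measures is "normalized" away.
print(forward_log_variance(narrow, True), forward_log_variance(wide, True))
print(mlp_entropy(narrow), mlp_entropy(wide))
```

Under standard-normal init the forward log-variance tracks the architecture, while fan-in normalization pins it near zero for any width, which is the "normalizing" effect described above.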
Thanks for replying! I asked some specific questions under the DeepMAD repository and got good answers, so I closed them. By the way, I have another question that confuses me when I try to put the theory of DeepMAD into practice: how do I design an MLP for a specific dataset using the theory of DeepMAD? For example, the number of samples in my dataset is sometimes large and sometimes small, which should affect the design of the network structure, but I don't see this discussed or studied in DeepMAD. |
I think this should be 'L-1', if I'm not misunderstanding. Details in alibaba/lightweight-neural-architecture-search#11 (comment) |
Thanks for the feedback! Please just use all L layers. I did not check
whether we index from 0 to L-1 or from 1 to L.
|
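In code, the point is simply that the entropy sum runs over all L widths, whichever indexing convention the paper uses (an editor's paraphrase, with hypothetical layer sizes):

```python
import math

widths = [784, 512, 256, 10]  # hypothetical MLP with L = 4 widths
# The same sum whether you write it as i = 1..L or i = 0..L-1:
H = sum(math.log(widths[i]) for i in range(len(widths)))
```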
Thanks for replying!
What about the attributes of the training dataset? Specifically, suppose the number of samples in one training dataset is 10000, and the number of samples in another is 100. Basically, how can we quantify the influence of training-dataset size on MLP structure design?
Kenny
|
There is no easy way to introduce the number of training instances into DeepMAD. According to machine learning theory, the safe way is to ensure that the number of network parameters is much smaller than the number of training instances.
|
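As a rough, editor-added illustration of that rule of thumb (the layer sizes and the "10x" margin below are assumptions, not from DeepMAD):

```python
def mlp_param_count(widths):
    # Weights plus biases for each fully connected layer.
    return sum(d_in * d_out + d_out for d_in, d_out in zip(widths[:-1], widths[1:]))

widths, n_train = [784, 256, 10], 100_000  # hypothetical design and dataset size
params = mlp_param_count(widths)           # 784*256 + 256 + 256*10 + 10 = 203,530
print(params, params < n_train / 10)       # "much smaller" margin is a guess
```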
I would like to ask: are Zen-Score and DeepMAD two completely different things? Are they hard to explain in terms of each other? |
I find that DeepMAD is based on information theory, while Zen-Score is based on the theory of linear regions of deep neural networks, so I think DeepMAD and Zen-Score might be hard to reconcile for now. They seem to be two completely different things. |
On the Papers with Code website, I find that DeepMAD beats Zen-Score on image tasks. Does this mean that designing a NAS algorithm for MLPs using DeepMAD's information-theoretic approach is better than using Zen-Score's theory of linear regions of deep neural networks? |
I also notice that Zen-Score seems to propose NAS ideas only for networks with a CNN structure, and has no ready-made recipe for NAS on MLP networks. I want to make sure that's true, in case I misunderstood the paper. If it is true, would you recommend the information-theoretic method or the linear-region theory for developing a NAS algorithm for MLPs? Sorry to bother you, but these questions matter a lot to me. Hoping for a reply! |
By the way, could you recommend some articles or work on the relationship between sample complexity (such as the number of training samples) and model complexity? Thanks a lot! I find that many articles discuss the expressiveness of a model, but training a model cannot be separated from the samples. The most basic common sense is that if the number of samples is insufficient, a model cannot be trained successfully no matter how expressive it is. |
I have another question: I find that many of these theories are applied to classification tasks. Can they also be applied to regression tasks, which are likewise very important in industry? |
Sorry for the late reply! It has nothing to do with the downstream task. Our DeepMAD method (or Zen-NAS as its early version) finds the "mathematically optimal" structure within our framework. So if you buy into our key argument that max-entropy and max-effectiveness are critical to building a good model, you can "believe" that this model will perform well across various downstream tasks. When you have no further prior knowledge about the downstream task, that is pretty much the best we can hope for. Of course, if you do have more prior knowledge, you can always design better models. In any case, there is no significant difference between classification and regression to me. |
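To connect that argument back to the MLP-design question earlier in the thread, here is a toy, editor-written search loop (DeepMAD itself solves a constrained mathematical program rather than doing random search; the width choices and budget below are hypothetical): maximize a structure-only entropy proxy under a fixed parameter budget.

```python
import math
import random

def entropy_proxy(widths):
    # Structure-only score, as discussed earlier in this thread.
    return sum(math.log(w) for w in widths)

def param_count(widths):
    return sum(i * o + o for i, o in zip(widths[:-1], widths[1:]))

def random_search(in_dim, out_dim, budget, trials=2000, seed=0):
    rng = random.Random(seed)
    best, best_score = None, -math.inf
    for _ in range(trials):
        depth = rng.randint(2, 8)                   # number of linear layers
        hidden = [rng.choice([32, 64, 128, 256, 512]) for _ in range(depth - 1)]
        widths = [in_dim] + hidden + [out_dim]
        if param_count(widths) > budget:            # effectiveness/size constraint
            continue
        score = entropy_proxy(widths)               # max-entropy objective
        if score > best_score:
            best, best_score = widths, score
    return best, best_score

print(random_search(in_dim=784, out_dim=10, budget=500_000))
```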
Thanks a lot!
|
Hi, |
You are welcome! |