
More on client_main.py and train.py #12

Closed
5 of 6 tasks
hansen7 opened this issue Aug 12, 2020 · 12 comments


hansen7 commented Aug 12, 2020

@hansen7 hansen7 changed the title More on Client_main.py More on client_main.py Aug 12, 2020
@hansen7 hansen7 changed the title More on client_main.py More on client_main.py and train.py Aug 13, 2020

hansen7 commented Aug 13, 2020

Encryption and decryption have issues:

(screenshot)

The stored parameters from the server are too large for the model parameters after encryption, given the seed of client_1:

https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/config/client1_config.json#L13

This is what randomly initialised parameters look like:

(screenshot)

The encryption and decryption need to be rewritten.


ganjf commented Aug 13, 2020


Roserland (Member) commented:

[x] Why break?
https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/train.py#L76

It was used for testing the code; I'm sorry we forgot to delete it.

As I understand it, the client trains the model for one round using the previously received Model_Params, then sends it to the server. However, when the "error" response occurs in the initial round (which is designed for parameter encryption), mistakes arise in the parameter-loading process.
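The intended round loop described above can be sketched as follows. This is a hypothetical illustration, not the repository's code: `receive_params`, `train_one_round`, and `send_params` are assumed names, and the key point is that the initial-round "error" response should be skipped rather than used to break out of the loop.

```python
# Hypothetical sketch of the client round loop (all names assumed).
def run_client(num_rounds, receive_params, train_one_round, send_params):
    trained_rounds = []
    for round_id in range(num_rounds):
        response = receive_params()
        if response == "error":
            # The initial-round "error" response belongs to the parameter
            # encryption handshake; skip training this round instead of
            # breaking, so later rounds still load parameters correctly.
            continue
        updated = train_one_round(response)
        send_params(updated)
        trained_rounds.append(round_id)
    return trained_rounds
```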


hansen7 commented Aug 13, 2020

[ ] Why we need this?
https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/client1_main.py#L144-L150

To avoid data overflow.
$$
model = \sum_i \frac{len_i}{\sum_j len_j} \cdot model_i, \qquad
model = \frac{\sum_i len_i \cdot model_i}{\sum_j len_j}
$$
In the code, we use the first expression to get the weighted model. If we used the second expression, the intermediate sum $\sum_i len_i \cdot model_i$ would overflow.
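The two orders of computation are algebraically equal; a minimal numeric sketch (with made-up dataset sizes and scalar stand-ins for the model parameters) shows why only the second one builds a large intermediate value:

```python
# Two algebraically equal orders of computing the weighted average.
lens = [3, 5]          # hypothetical per-client dataset sizes (len_i)
models = [10.0, 20.0]  # hypothetical scalar stand-ins for model_i
total = sum(lens)

# First expression: scale each model by len_i / sum(len_j), then sum.
# Each term stays on the order of model_i, so nothing grows large.
first = sum((l / total) * m for l, m in zip(lens, models))

# Second expression: accumulate len_i * model_i first, then divide once.
# When the (encrypted) parameters are already large integers, this
# intermediate sum can exceed a fixed-width representation - the overflow
# concern raised above.
second = sum(l * m for l, m in zip(lens, models)) / total
```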

Sorry, I don't quite follow. Could you explain in more detail?


hansen7 commented Aug 13, 2020

About Train_Scheduler:
https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/client1_main.py#L65
https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/train.py#L69

It should be changed to:

```python
train_scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, (num_epochs - warm_epoch) * iter_per_epoch)
train_scheduler.step()
```
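For reference, the schedule that `CosineAnnealingLR` implements can be sketched in pure Python. Here `t_max` plays the role of `(num_epochs - warm_epoch) * iter_per_epoch`, i.e. the total number of `step()` calls when the scheduler is stepped once per iteration after warm-up:

```python
import math

def cosine_lr(base_lr, step, t_max, eta_min=0.0):
    """Cosine-annealed learning rate after `step` scheduler steps.

    Decays from base_lr at step 0 to eta_min at step t_max, following
    half a cosine period - the same curve CosineAnnealingLR traces.
    """
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))
```

Passing epoch counts instead of iteration counts as `T_max` (and stepping per iteration) would make the schedule finish far too early, which is the mismatch the fix above addresses.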


hansen7 commented Aug 13, 2020

> [ ] Why we need this?
> https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/client1_main.py#L144-L150
>
> To avoid data overflow.
> $$
> model = \sum_i \frac{len_i}{\sum_j len_j} \cdot model_i, \qquad
> model = \frac{\sum_i len_i \cdot model_i}{\sum_j len_j}
> $$
> In the code, we use the first expression to get the weighted model. If we used the second expression, the intermediate sum $\sum_i len_i \cdot model_i$ would overflow.
>
> Sorry, I don't quite follow. Could you explain in more detail?

It should be changed to:

```python
if epoch_num == 0:
    dec_model_state[key] = dec_model_state[key] / _client_num
else:
    dec_model_state[key] = dec_model_state[key]
```

https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/client1_main.py#L168-L169

Because the parameters sent later are already weighted, the server only performs a simple summation before sending them back, so the subsequent dec_model_state can be used directly.
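A minimal sketch of that flow, with illustrative names (`client_weight` and `server_aggregate` are assumed, not taken from the repository): each client pre-scales its parameters by `len_i / sum(len_j)`, so the server only sums, and the returned state needs no further division after round 0.

```python
def client_weight(params, len_i, total_len):
    # Each client scales its parameters by its data share before sending.
    return {k: v * (len_i / total_len) for k, v in params.items()}

def server_aggregate(client_params):
    # The server performs a simple summation of the pre-weighted states.
    keys = client_params[0].keys()
    return {k: sum(p[k] for p in client_params) for k in keys}
```

With this split, the aggregated result is already the weighted average, which is why dividing `dec_model_state` by the client count again would only be correct in the initial (unweighted) round.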


hansen7 commented Aug 13, 2020

Another issue: once one client finishes training, the server stops automatically, so the other clients cannot continue running.
