
More on client_main.py and train.py #12

Closed
5 of 6 tasks
hansen7 opened this issue Aug 12, 2020 · 12 comments


hansen7 commented Aug 12, 2020

@hansen7 hansen7 changed the title More on Client_main.py More on client_main.py Aug 12, 2020
@hansen7 hansen7 changed the title More on client_main.py More on client_main.py and train.py Aug 13, 2020

hansen7 commented Aug 13, 2020

Encryption and decryption have issues:

(screenshot)

The stored parameters from the server are too large for the model parameters after encryption, given the seed of client_1:

https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/config/client1_config.json#L13

This is what randomly initialised parameters look like:

(screenshot)

The encryption and decryption need to be rewritten.


ganjf commented Aug 13, 2020


Roserland (Member) commented:

[x] Why break?
https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/train.py#L76

It was used for testing the code; I'm sorry we forgot to delete it.

As I understand it, the client trains the model for one round using the previously received Model_Params, then sends it to the server. However, when the "error" response occurs in the initial round (which is designed for parameter encryption), mistakes arise in the parameter-loading process.
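The intended round loop described above can be sketched as follows. This is a hypothetical illustration, not the repository's code: `receive_params`, `train_one_round`, and `send_params` are assumed names, and the key point is that the initial-round "error" response should be skipped rather than used to break out of the loop.

```python
# Hypothetical sketch of the client round loop (all names assumed).
def run_client(num_rounds, receive_params, train_one_round, send_params):
    trained_rounds = []
    for round_id in range(num_rounds):
        response = receive_params()
        if response == "error":
            # The initial-round "error" response belongs to the parameter
            # encryption handshake; skip training this round instead of
            # breaking, so later rounds still load parameters correctly.
            continue
        updated = train_one_round(response)
        send_params(updated)
        trained_rounds.append(round_id)
    return trained_rounds
```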


hansen7 commented Aug 13, 2020

[ ] Why we need this?
https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/client1_main.py#L144-L150

To avoid data overflow.
$$
model = \sum_i \frac{len_i}{\sum_j len_j} \cdot model_i, \qquad
model = \frac{\sum_i len_i \cdot model_i}{\sum_j len_j}
$$
In the code, we use the first expression to get the weighted model. If we used the second expression, the intermediate sum $\sum_i len_i \cdot model_i$ would overflow.
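The two orders of computation are algebraically equal; a minimal numeric sketch (with made-up dataset sizes and scalar stand-ins for the model parameters) shows why only the second one builds a large intermediate value:

```python
# Two algebraically equal orders of computing the weighted average.
lens = [3, 5]          # hypothetical per-client dataset sizes (len_i)
models = [10.0, 20.0]  # hypothetical scalar stand-ins for model_i
total = sum(lens)

# First expression: scale each model by len_i / sum(len_j), then sum.
# Each term stays on the order of model_i, so nothing grows large.
first = sum((l / total) * m for l, m in zip(lens, models))

# Second expression: accumulate len_i * model_i first, then divide once.
# When the (encrypted) parameters are already large integers, this
# intermediate sum can exceed a fixed-width representation - the overflow
# concern raised above.
second = sum(l * m for l, m in zip(lens, models)) / total
```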

Sorry, I don't quite follow. Could you explain in more detail?


hansen7 commented Aug 13, 2020

About Train_Scheduler:
https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/client1_main.py#L65
https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/train.py#L69

It should be changed to:

```python
train_scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, (num_epochs - warm_epoch) * iter_per_epoch)
train_scheduler.step()
```
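For reference, the schedule that `CosineAnnealingLR` implements can be sketched in pure Python. Here `t_max` plays the role of `(num_epochs - warm_epoch) * iter_per_epoch`, i.e. the total number of `step()` calls when the scheduler is stepped once per iteration after warm-up:

```python
import math

def cosine_lr(base_lr, step, t_max, eta_min=0.0):
    """Cosine-annealed learning rate after `step` scheduler steps.

    Decays from base_lr at step 0 to eta_min at step t_max, following
    half a cosine period - the same curve CosineAnnealingLR traces.
    """
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))
```

Passing epoch counts instead of iteration counts as `T_max` (and stepping per iteration) would make the schedule finish far too early, which is the mismatch the fix above addresses.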


hansen7 commented Aug 13, 2020

> [ ] Why we need this?
> https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/client1_main.py#L144-L150
>
> To avoid data overflow.
> $$
> model = \sum_i \frac{len_i}{\sum_j len_j} \cdot model_i, \qquad
> model = \frac{\sum_i len_i \cdot model_i}{\sum_j len_j}
> $$
> In the code, we use the first expression to get the weighted model. If we used the second expression, the intermediate sum $\sum_i len_i \cdot model_i$ would overflow.
>
> Sorry, I don't quite follow. Could you explain in more detail?

It should be changed to:

```python
if epoch_num == 0:
    dec_model_state[key] = dec_model_state[key] / _client_num
else:
    dec_model_state[key] = dec_model_state[key]
```

https://github.com/HUST-EIC-AI-LAB/COVID-19-Fedrated-Learning-Framework/blob/d8ca89a7f328bf4ec5353785385f21298f232aea/client/client1_main.py#L168-L169

Because the parameters sent later are already weighted, the server only performs a simple summation before sending them back, so the subsequent dec_model_state can be used directly.
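A minimal sketch of that flow, with illustrative names (`client_weight` and `server_aggregate` are assumed, not taken from the repository): each client pre-scales its parameters by `len_i / sum(len_j)`, so the server only sums, and the returned state needs no further division after round 0.

```python
def client_weight(params, len_i, total_len):
    # Each client scales its parameters by its data share before sending.
    return {k: v * (len_i / total_len) for k, v in params.items()}

def server_aggregate(client_params):
    # The server performs a simple summation of the pre-weighted states.
    keys = client_params[0].keys()
    return {k: sum(p[k] for p in client_params) for k in keys}
```

With this split, the aggregated result is already the weighted average, which is why dividing `dec_model_state` by the client count again would only be correct in the initial (unweighted) round.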


hansen7 commented Aug 13, 2020

Another issue: once one client finishes training, the server stops automatically, so the other clients cannot continue running.
