__Purpose:__ Introduce Personalized Federated Learning by implementing APFL (Adaptive Personalized Federated Learning) on our dataset, then try other methods.
<br>
1. We still assume we can test on the second half of the data (roughly updates 10-19), since human/co-adaptive learning should be complete by then, for reasons shown in earlier notebooks.

We adapt their code so that it can run in something other than a top-down, server-only approach.
> Their Github: https://github.com/MLOPTPSU/FedTorch <br>
> APFL link: https://github.com/MLOPTPSU/FedTorch/blob/ab8068dbc96804a5c1a8b898fd115175cfebfe75/fedtorch/comms/trainings/federated/apfl.py#L33

loss.backward() computes dloss/dx for every parameter x that has requires_grad=True and accumulates the result into x.grad. It does not update the weights; it only computes the gradients, using the autograd graph to do so. In pseudo-code:

x.grad += dloss/dx

optimizer.step() updates the value of x using the gradient x.grad. For example, the SGD optimizer performs:

x += -lr * x.grad

optimizer.zero_grad() clears x.grad for every parameter x in the optimizer. It's important to call this before loss.backward(); otherwise the gradients from multiple passes accumulate.

optimizer.zero_grad() and optimizer.step() do not affect the autograd graph. They only touch the model's parameters and the parameters' grad attributes.

If you have multiple losses (loss1, loss2), you can sum them and then call backward() once:

loss3 = loss1 + loss2
loss3.backward()
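
The notes above can be checked on a single scalar parameter. This is a minimal sketch, assuming PyTorch is installed; the toy losses (x^2 and 3x) and the learning rate are illustrative choices and are not taken from FedTorch or from this notebook.

```python
import torch

# One scalar parameter, so every gradient can be verified by hand.
x = torch.tensor([2.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)

# loss = x^2, so dloss/dx = 2x = 4 at x = 2.
loss = (x ** 2).sum()
loss.backward()
grad_first = x.grad.clone()

# A second backward pass WITHOUT zero_grad() adds to x.grad (4 + 4).
loss = (x ** 2).sum()
loss.backward()
grad_accumulated = x.grad.clone()

# zero_grad() clears the accumulated gradient. Depending on the PyTorch
# version, x.grad is now either None or a zero tensor.
optimizer.zero_grad()

# Two losses summed, then a single backward: d/dx (x^2 + 3x) = 2x + 3 = 7.
loss1 = (x ** 2).sum()
loss2 = (3 * x).sum()
loss3 = loss1 + loss2
loss3.backward()
grad_summed = x.grad.clone()

# step() applies x += -lr * x.grad; backward() alone never changed x.
optimizer.step()
x_after_step = x.detach().clone()
```

Here only optimizer.step() changes x; the backward() calls only fill x.grad.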

In [1]:
import pandas as pd
import os
import numpy as np
import random
from matplotlib import pyplot as plt
from scipy.optimize import minimize
import copy

from experiment_params import *
from cost_funcs import *
from fl_sim_classes import *
import time
import pickle
from sklearn.decomposition import PCA

In [2]:
path = r'C:\Users\kdmen\Desktop\Research\personalization-privacy-risk\Data'
cond0_filename = r'\cond0_dict_list.p'
all_decs_init_filename = r'\all_decs_init.p'
nofl_decs_filename = r'\nofl_decs.p'
id2color = {0:'lightcoral', 1:'maroon', 2:'chocolate', 3:'darkorange', 4:'gold', 5:'olive', 6:'olivedrab', 
            7:'lawngreen', 8:'aquamarine', 9:'deepskyblue', 10:'steelblue', 11:'violet', 12:'darkorchid', 13:'deeppink'}
implemented_client_training_methods = ['EtaGradStep', 'EtaScipyMinStep', 'FullScipyMinStep']
implement_these_methods_next = ['APFL', 'AFL', 'PersA_FL_MAML', 'PersA_FL_ME', 'PFA']
num_participants = 14

# For exclusion when plotting later on
bad_nodes = [1,3,13]

with open(path+cond0_filename, 'rb') as fp:
    cond0_training_and_labels_lst = pickle.load(fp)

D_0_7 = np.random.rand(2,7)

# Testing APFL

Testing the APFL Implementation
> Why do both the client and the global server need num_steps? Shouldn't it just be set by the server?

Dynamic learning rate, adaptive off

In [3]:
np.random.seed(0)
user_c0_APFL_realhess_noeta = [Client(i, np.random.rand(2,7), 'NAN', cond0_training_and_labels_lst[i], 
                       'streaming', adaptive=False, 
                       num_steps=10, global_method='APFL') for i in range(14)]
global_model_APFL_realhess_noeta = Server(1, np.random.rand(2,7), 'APFL', user_c0_APFL_realhess_noeta)

big_loop_iters = 250
for i in range(big_loop_iters):
    if i%10==0:
        print(f"Round {i} of {big_loop_iters}")
    global_model_APFL_realhess_noeta.execute_FL_loop()
    
print()
print("(Current Local Round, Current Local Update)")
for my_client in global_model_APFL_realhess_noeta.all_clients:
    print((my_client.current_round, my_client.current_update))

Round 0 of 250
Round 10 of 250
Round 20 of 250
Round 30 of 250
Round 40 of 250
Round 50 of 250
Round 60 of 250
Round 70 of 250
Round 80 of 250


KeyboardInterrupt: 

Note that the graphs don't start at zero because we apply dimensionality reduction (dim_reduc_factor), which averages neighboring points and therefore trims points from the start and end.
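
To illustrate why averaging trims the ends, here is a minimal sketch that assumes the reduction behaves like a moving average over a fixed window. The internals of condensed_external_plotting are not shown in this notebook, so the helper name moving_average, the window size, and the synthetic cost_log are all hypothetical.

```python
import numpy as np

def moving_average(values, window):
    """Average each run of `window` consecutive points (no padding)."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely,
    # so the output is shorter than the input by window - 1 points.
    return np.convolve(values, kernel, mode="valid")

cost_log = np.arange(100, dtype=float)  # stand-in for a logged cost curve
window = 10
smoothed = moving_average(cost_log, window)

# Plotting each average at the center of its window means the first
# plotted x-position is (window - 1) / 2 rather than 0.
x_centers = np.arange(len(smoothed)) + (window - 1) / 2
```

With 100 logged points and a window of 10, the smoothed curve has 91 points and its x-axis starts partway into the window rather than at zero.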

In [None]:
condensed_external_plotting(user_c0_APFL_realhess_noeta, 'local', global_error=False, dim_reduc_factor=10, show_update_change=False, custom_title='Cost Func')

In [None]:
condensed_external_plotting(user_c0_APFL_realhess_noeta, 'local', plot_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Local Gradient When Using Real Hessian')

In [None]:
condensed_external_plotting(user_c0_APFL_realhess_noeta, 'local', plot_global_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Global Gradient When Using Real Hessian')

In [None]:
condensed_external_plotting(user_c0_APFL_realhess_noeta, 'local', plot_pers_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Personalized Gradient When Using Real Hessian')

In [None]:
u0 = user_c0_APFL_realhess_noeta[0]
print("Local vs Personalized Gradient")
print(np.array(u0.gradient_log) - np.array(u0.pers_gradient_log))
print("\nLocal vs Global Gradient")
print(np.array(u0.gradient_log) - np.array(u0.global_gradient_log))
print("\nPers vs Global Gradient")
print(np.array(u0.pers_gradient_log) - np.array(u0.global_gradient_log))

In [None]:
plt.plot(u0.mu_log)

In [None]:
plt.plot(u0.L_log)

In [None]:
plt.plot(u0.eta_t_log, label='Dynamic LR')
plt.plot(1/(2*np.array(u0.L_log)), label='Upper bound')
plt.title("Used learning rate vs safe theoretical max")
plt.legend()
plt.show()

In [None]:
for i in range(len(user_c0_APFL_realhess_noeta)):
    u0 = user_c0_APFL_realhess_noeta[i]
    plt.plot(u0.eta_t_log, color='blue', label='Dynamic LR')
    plt.plot(1/(2*np.array(u0.L_log)), color='red', label='Upper bound')
plt.title("Used learning rate vs safe theoretical max")
#plt.legend()
plt.show()

Clearly, we are well within the safe learning rate limit. Let's try enabling the adaptive mixing parameter (adaptive=True) and see if that helps.
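
For reference, the bound 1/(2L) plotted above depends on the smoothness constant L. The following is a minimal sketch, assuming a plain least-squares cost whose Hessian is A^T A; the actual cost in cost_funcs and the way L_log and mu_log are computed are not shown here, so A, b, cost, and grad below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 7))   # hypothetical feature matrix
b = rng.normal(size=50)        # hypothetical targets

def cost(w):
    r = A @ w - b
    return 0.5 * r @ r

def grad(w):
    return A.T @ (A @ w - b)

# For this quadratic cost the Hessian is constant: H = A^T A.
hessian = A.T @ A
eigvals = np.linalg.eigvalsh(hessian)   # eigenvalues in ascending order
mu, L = eigvals[0], eigvals[-1]         # strong-convexity and smoothness constants
eta_max = 1.0 / (2.0 * L)               # the bound used in the plots above

# Gradient descent at the bound: the cost should never increase.
w = np.zeros(7)
losses = [cost(w)]
for _ in range(100):
    w = w - eta_max * grad(w)
    losses.append(cost(w))
```

Any step size at or below eta_max keeps the cost non-increasing for this quadratic, which is the sense in which a learning rate under the red curve is safe.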

Dynamic learning rate, adaptive ON
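
As a reminder of what the adaptive flag is meant to toggle, here is a minimal NumPy sketch of an APFL-style update. Each client keeps a local model v and a copy of the global model w, and evaluates the personalized model alpha*v + (1-alpha)*w. It is a sketch under stated assumptions, not the Client/Server implementation from fl_sim_classes: the least-squares cost, the synthetic data, the clipping of alpha to [0, 1], and all names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, n_samples = 3, 5, 40

# Hypothetical clients: least-squares problems with related but distinct optima.
w_shared = rng.normal(size=dim)
clients = []
for _ in range(n_clients):
    A = rng.normal(size=(n_samples, dim))
    w_true = w_shared + 0.5 * rng.normal(size=dim)
    clients.append((A, A @ w_true))

def cost(A, b, w):
    r = A @ w - b
    return 0.5 * np.mean(r ** 2)

def grad(A, b, w):
    return A.T @ (A @ w - b) / len(b)

def mix(alpha, v, w):
    """Personalized model: convex combination of local and global models."""
    return alpha * v + (1.0 - alpha) * w

eta, num_steps, num_rounds, adaptive = 0.05, 10, 30, True
w_global = np.zeros(dim)
v_local = [np.zeros(dim) for _ in range(n_clients)]
alphas = [0.5] * n_clients

initial_costs = [cost(A, b, mix(alphas[k], v_local[k], w_global))
                 for k, (A, b) in enumerate(clients)]

for _ in range(num_rounds):
    w_updates = []
    for k, (A, b) in enumerate(clients):
        w, v, alpha = w_global.copy(), v_local[k], alphas[k]
        for _ in range(num_steps):
            # Gradient of the cost evaluated at the personalized model.
            g_bar = grad(A, b, mix(alpha, v, w))
            if adaptive:
                # d/d(alpha) of f(alpha*v + (1-alpha)*w) is <v - w, g_bar>.
                alpha = float(np.clip(alpha - eta * (v - w) @ g_bar, 0.0, 1.0))
            w_next = w - eta * grad(A, b, w)   # ordinary local step on the global copy
            v = v - eta * alpha * g_bar        # chain rule: d(mix)/dv = alpha
            w = w_next
        v_local[k], alphas[k] = v, alpha
        w_updates.append(w)
    # The server averages only the global copies; v and alpha stay on the client.
    w_global = np.mean(w_updates, axis=0)

final_costs = [cost(A, b, mix(alphas[k], v_local[k], w_global))
               for k, (A, b) in enumerate(clients)]
```

With adaptive set to False, alpha stays at its initial value and only v and w are trained, which corresponds to the adaptive-off runs earlier in this notebook.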

In [None]:
np.random.seed(0)
user_c0_APFL_realhess_adapt = [Client(i, np.random.rand(2,7), 'NAN', cond0_training_and_labels_lst[i], 
                       'streaming', adaptive=True, 
                       num_steps=10, global_method='APFL') for i in range(14)]
global_model_APFL_realhess_adapt = Server(1, np.random.rand(2,7), 'APFL', user_c0_APFL_realhess_adapt)

big_loop_iters = 250
for i in range(big_loop_iters):
    if i%10==0:
        print(f"Round {i} of {big_loop_iters}")
    global_model_APFL_realhess_adapt.execute_FL_loop()
    
print()
print("(Current Local Round, Current Local Update)")
for my_client in global_model_APFL_realhess_adapt.all_clients:
    print((my_client.current_round, my_client.current_update))

In [None]:
condensed_external_plotting(user_c0_APFL_realhess_adapt, 'local', global_error=False, dim_reduc_factor=10, show_update_change=False, custom_title='(Adaptive) Cost Func')

In [None]:
condensed_external_plotting(user_c0_APFL_realhess_adapt, 'local', plot_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='(Adaptive) Gradient When Using Real Hessian')

In [None]:
for i in range(len(user_c0_APFL_realhess_adapt)):
    u0 = user_c0_APFL_realhess_adapt[i]
    plt.plot(u0.eta_t_log, color='blue', label='Dynamic LR')
    plt.plot(1/(2*np.array(u0.L_log)), color='red', label='Upper bound')
plt.title("Used learning rate vs safe theoretical max")
#plt.legend()
plt.show()

LR (eta) = 0.001, adaptive off

In [None]:
user_c0_APFL_eta_001 = [Client(i, D_0_7, 'NAN', cond0_training_and_labels_lst[i], 
                       'streaming', eta=0.001, input_eta=True, gradient_clipping=True, adaptive=False, 
                       num_steps=10, global_method='APFL') for i in range(14)]
global_model_APFL_eta_001 = Server(1, D_0_7, 'APFL', user_c0_APFL_eta_001)

big_loop_iters = 250
for i in range(big_loop_iters):
    if i%10==0:
        print(f"Round {i} of {big_loop_iters}")
    global_model_APFL_eta_001.execute_FL_loop()
    
print("(Current Local Round, Current Local Update)")
for my_client in global_model_APFL_eta_001.all_clients:
    print((my_client.current_round, my_client.current_update))

In [None]:
condensed_external_plotting(user_c0_APFL_eta_001, 'local', global_error=False, dim_reduc_factor=10, show_update_change=False, custom_title='Cost Func')

In [None]:
condensed_external_plotting(user_c0_APFL_eta_001, 'local', plot_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Local Gradient, Eta=0.001')

In [None]:
condensed_external_plotting(user_c0_APFL_eta_001, 'local', plot_global_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Global Gradient, Eta=0.001')

In [None]:
condensed_external_plotting(user_c0_APFL_eta_001, 'local', plot_pers_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Personalized Gradient, Eta=0.001')

LR (eta) = 1e-7, adaptive off

In [None]:
user_c0_APFL_eta_em7 = [Client(i, D_0_7, 'NAN', cond0_training_and_labels_lst[i], 
                       'streaming', eta=1e-7, input_eta=True, gradient_clipping=True, adaptive=False, 
                       num_steps=10, global_method='APFL') for i in range(14)]
global_model_APFL_eta_em7 = Server(1, D_0_7, 'APFL', user_c0_APFL_eta_em7)

big_loop_iters = 250
for i in range(big_loop_iters):
    if i%10==0:
        print(f"Round {i} of {big_loop_iters}")
    global_model_APFL_eta_em7.execute_FL_loop()
    
print("(Current Local Round, Current Local Update)")
for my_client in global_model_APFL_eta_em7.all_clients:
    print((my_client.current_round, my_client.current_update))

In [None]:
condensed_external_plotting(user_c0_APFL_eta_em7, 'local', global_error=False, dim_reduc_factor=10, show_update_change=False, custom_title='Cost Func')

In [None]:
condensed_external_plotting(user_c0_APFL_eta_em7, 'local', plot_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Local Gradient, Eta=1e-7')

In [None]:
condensed_external_plotting(user_c0_APFL_eta_em7, 'local', plot_global_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Global Gradient, Eta=1e-7')

In [None]:
condensed_external_plotting(user_c0_APFL_eta_em7, 'local', plot_pers_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Personalized Gradient, Eta=1e-7')

LR (eta) = 1e-10, adaptive off

In [None]:
user_c0_APFL_eta_em10 = [Client(i, D_0_7, 'NAN', cond0_training_and_labels_lst[i], 
                       'streaming', eta=1e-10, input_eta=True, gradient_clipping=True, adaptive=False, 
                       num_steps=10, global_method='APFL') for i in range(14)]
global_model_APFL_eta_em10 = Server(1, D_0_7, 'APFL', user_c0_APFL_eta_em10, num_steps=10)

big_loop_iters = 250
for i in range(big_loop_iters):
    if i%10==0:
        print(f"Round {i} of {big_loop_iters}")
    global_model_APFL_eta_em10.execute_FL_loop()
    
print("(Current Local Round, Current Local Update)")
for my_client in global_model_APFL_eta_em10.all_clients:
    print((my_client.current_round, my_client.current_update))

In [None]:
condensed_external_plotting(user_c0_APFL_eta_em10, 'local', global_error=False, dim_reduc_factor=10, show_update_change=False, custom_title='Cost Func')

In [None]:
condensed_external_plotting(user_c0_APFL_eta_em10, 'local', plot_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Local Gradient, Eta=1e-10')

In [None]:
condensed_external_plotting(user_c0_APFL_eta_em10, 'local', plot_global_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Global Gradient, Eta=1e-10')

In [None]:
condensed_external_plotting(user_c0_APFL_eta_em10, 'local', plot_pers_gradient=True, local_error=False, 
                            global_error=False, show_update_change=False, custom_title='Personalized Gradient, Eta=1e-10')

In [None]:
central_tendency_plotting(all_user_input, highlight_default=False, default_local=False, default_global=False, default_pers=False, plot_mean=True, plot_gradient=False, global_error=False, local_error=False, pers_error=True, custom_title="", input_linewidth=1, my_legend_loc='best', iterable_labels=[], iterable_colors=[])
