__Purpose:__ Introduce Federated Learning, specifically by implementing Federated Averaging (FedAvg) on our dataset and then moving on to more advanced methods. Start by modifying the Simulations code; worry about (a)synchronicity later.
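FedAvg's server step is a weighted average of the client models. If the decoder matrix is the shared model, one synchronous round can be sketched as below. Everything here is an assumption for illustration: the client split, the `local_gd` update (a hypothetical stand-in for the per-update `minimize` + SmoothBatch step, not `cost_l2`), and the random placeholder data.

```python
import numpy as np

def fedavg(client_decoders, client_num_samples):
    """Server step of FedAvg: average client decoders, weighted by local data size."""
    weights = np.asarray(client_num_samples, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(client_decoders)            # (n_clients, 2, 64)
    return np.tensordot(weights, stacked, axes=1)  # (2, 64)

def local_gd(D, F, V, lr=0.1, steps=5):
    """Hypothetical client update: a few gradient steps on ||D F - V||^2 / n."""
    n = F.shape[1]
    for _ in range(steps):
        D = D - lr * 2.0 * (D @ F - V) @ F.T / n
    return D

def fedavg_round(global_dec, client_data, local_update=local_gd):
    """One synchronous round: broadcast the global decoder, train locally, aggregate."""
    new_decs = [local_update(global_dec.copy(), F, V) for F, V in client_data]
    sizes = [F.shape[1] for F, _ in client_data]
    return fedavg(new_decs, sizes)

# Usage on random placeholder data: 3 clients, 64-channel "EMG" F and 2-D velocities V
rng = np.random.default_rng(0)
clients = [(rng.standard_normal((64, n)), rng.standard_normal((2, n))) for n in (1200, 1202, 337)]
D_global = np.zeros((2, 64))
for _ in range(10):
    D_global = fedavg_round(D_global, clients)
print(D_global.shape)  # (2, 64)
```

Weighting by sample count follows the usual FedAvg aggregation; asynchronous variants would replace the single synchronous `fedavg_round`.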
<br>
1. The decoder matrix is (I think) the set of weights to pass back and forth, although it goes through SmoothBatch first.
1. We are assuming we can test on the second half (updates 10-19ish), since learning should be complete by then!
1. `scipy.optimize.minimize()` runs many iterations to fully minimize its cost function. You can cap the number of iterations with `options={'maxiter': ...}`, although AFAIK you won't know in advance how many it takes to converge. This is still a good setup for FL.
1. Hmm, `minimize()` is doing BFGS right now, not SGD... not sure if that really matters. Could probably implement SGD on my own or find an implementation. BFGS is quasi-Newton (it builds an approximation of the inverse Hessian from gradients), but I don't think we have many parameters (2 x 64 = 128 per decoder). Plus we can (already have?) solve analytically for the Hessian.
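If we want SGD instead of BFGS, a minimal minibatch version is short. This sketch assumes a ridge-penalized least-squares cost J(D) = ||D F - V||_F^2 / n + lam ||D||_F^2; it is not `cost_l2` from `simulations.py` (which also takes `H`, `alphaF`, `alphaD` and `learning_batch`), and the step size, batch size and epoch count are arbitrary choices. The closed-form minimizer of the same J is included as a check.

```python
import numpy as np

def sgd_decoder(F, V, lam=1e-3, lr=0.05, epochs=50, batch_size=64, seed=0):
    """Minibatch SGD on the assumed cost J(D) = ||D F - V||^2 / n + lam * ||D||^2.

    F: (channels, n) EMG features, V: (2, n) intended velocities.
    """
    rng = np.random.default_rng(seed)
    n = F.shape[1]
    D = np.zeros((V.shape[0], F.shape[0]))
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the timepoints every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Fb, Vb = F[:, idx], V[:, idx]
            # Minibatch estimate of the gradient of J
            grad = 2.0 * (D @ Fb - Vb) @ Fb.T / len(idx) + 2.0 * lam * D
            D -= lr * grad
    return D

def ridge_decoder(F, V, lam=1e-3):
    """Closed-form minimizer of the same J, for comparison."""
    n = F.shape[1]
    A = F @ F.T / n + lam * np.eye(F.shape[0])
    return np.linalg.solve(A, F @ V.T / n).T
```

On synthetic data the two should agree closely; with real EMG the constant step size may need tuning or a decay schedule.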

In [1]:
import pandas as pd
import os
import numpy as np
#from numpy.matlib import repmat
#from matplotlib import pyplot as plt
#from scipy.signal import detrend, firwin, freqz, lfilter
#from sklearn.model_selection import train_test_split, ShuffleSplit
from scipy.optimize import minimize, least_squares
import copy
from itertools import permutations

In [2]:
from experiment_params import *
from simulations import *
import time
# Do the below if you're in the pytch environment
#import pickle5 as pickle
import pickle

# Reminder of Conditions Order

NOTE: 

* **CONDITIONS** = array(['D_1', 'D_2', 'D_5', 'D_6', 'D_3', 'D_4', 'D_7', 'D_8'])
* **LEARNING RATES:** alpha = 0.25 and 0.75; alpha = 0.25 for D1, D2, D5, D6; alpha = 0.75 for D3, D4, D7, D8
* **SMOOTHBATCH:** W_next = alpha*W_old + ((1 - alpha) * W_calc)

* **DECODER INIT:** pos for D1 - D4, neg for D5 - D8

* **PENALTY TERM:** $\lambda_E$ = 1e-6 for all, $\lambda_F$ = 1e-7 for all, $\lambda_D$ = 1e-3 for 1, 3, 5, 7 and 1e-4 for 2, 4, 6, 8 


| DECODER | ALPHA | PENALTY ($\lambda_D$) | DEC INIT |
| --- | --- | --- | --- |
| 1 | 0.25 | 1e-3 | + |
| 2 | 0.25 | 1e-4 | + |
| 3 | 0.75 | 1e-3 | + |
| 4 | 0.75 | 1e-4 | + |
| 5 | 0.25 | 1e-3 | - |
| 6 | 0.25 | 1e-4 | - |
| 7 | 0.75 | 1e-3 | - |
| 8 | 0.75 | 1e-4 | - |
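The table can be turned into a lookup so each condition index picks up its own alpha, penalties and initialization sign, together with the SmoothBatch rule. A sketch with hypothetical function names; `init_sign` is just the +/- column, since how the positive/negative initial decoders were generated isn't recorded here.

```python
import numpy as np

# Condition order, copied from the CONDITIONS note above
CONDITIONS = ['D_1', 'D_2', 'D_5', 'D_6', 'D_3', 'D_4', 'D_7', 'D_8']

def decoder_params(decoder_num):
    """Parameters for decoder 1..8, transcribed from the table and notes above."""
    return {
        'alpha': 0.25 if decoder_num in (1, 2, 5, 6) else 0.75,
        'lambda_D': 1e-3 if decoder_num % 2 == 1 else 1e-4,
        'lambda_E': 1e-6,
        'lambda_F': 1e-7,
        'init_sign': +1 if decoder_num <= 4 else -1,
    }

def condition_params(condition_ix):
    """Map a condition index (position in CONDITIONS) to its decoder's parameters."""
    decoder_num = int(CONDITIONS[condition_ix].split('_')[1])
    return decoder_params(decoder_num)

def smoothbatch(W_old, W_calc, alpha):
    """W_next = alpha*W_old + (1 - alpha)*W_calc; higher alpha keeps more of the old decoder."""
    return alpha * W_old + (1.0 - alpha) * W_calc
```

For example, `condition_params(2)` refers to D_5 (alpha = 0.25, lambda_D = 1e-3, negative init).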


## Load Our Data In

In [3]:
'''
t0 = time.time()
emg_data_df1 = pd.read_csv(os.path.join("Data", "emg_full_data1.csv"))
emg_data_df2 = pd.read_csv(os.path.join("Data", "emg_full_data2.csv"))
emg_data_df = pd.concat((emg_data_df1, emg_data_df2))
try:
    emg_data_df.drop('Unnamed: 0', axis=1, inplace=True)
except KeyError:
    print("NO UNNAMED COLUMN DETECTED!")
t1 = time.time()
total = t1-t0  
print(total)
print(emg_data_df.shape)
emg_data_df.head()
'''
# Just use the emg data directly from the pickle file for now
;

''

In [4]:
'''
t0 = time.time()
#envelope_df50 = pd.read_csv("Data\envelope_df50.csv")
envelope_df100 = pd.read_csv(os.path.join("Data", "envelope_df100.csv"))
#envelope_df150 = pd.read_csv("Data\envelope_df150.csv")
#envelope_df200 = pd.read_csv("Data\envelope_df200.csv")
#envelope_df250 = pd.read_csv("Data\envelope_df250.csv")
#envelope_df300 = pd.read_csv("Data\envelope_df300.csv")
#raw_envs = [envelope_df50, envelope_df100, envelope_df150, envelope_df200, envelope_df250, envelope_df300]
#all_envs = [env.drop('Unnamed: 0', axis=1) for env in raw_envs]
try:
    envelope_df100.drop('Unnamed: 0', axis=1, inplace=True)
except KeyError:
    print("NO UNNAMED COLUMN DETECTED!")
t1 = time.time()
total = t1-t0  
print(total)
print(envelope_df100.shape)
envelope_df100.head()
'''
# Just use the emg data directly from the pickle file for now
;

''

In [5]:
t0 = time.time()

with open(os.path.join('Data', 'continuous_full_data_block1.pickle'), 'rb') as handle:
    #refs_block1, poss_block1, dec_vels_block1, int_vel_block1, emgs_block1, Ws_block1, Hs_block1, alphas_block1, pDs_block1, times_block1, conditions_block1 = pickle.load(handle)
    refs_block1, _, _, _, emgs_block1, Ws_block1, _, _, _, _, _ = pickle.load(handle)

#with open('Data\continuous_full_data_block2.pickle', 'rb') as handle:
    #refs_block2, poss_block2, dec_vels_block2, int_vel_block2, emgs_block2, Ws_block2, Hs_block2, alphas_block2, pDs_block2, times_block2, conditions_block2 = pickle.load(handle)
    #refs_block2, _, _, _, emgs_block2, Ws_block2, _, _, _, _, _ = pickle.load(handle)

t1 = time.time()
total = t1-t0  
print(total)

6.021999835968018


In [6]:
# Shape: 8 conditions, 20770 timepoints (but only 19 update points, see update_ix), xy, 64 channels
Ws_block1[keys[0]].shape

(8, 20770, 2, 64)

In [7]:
update_ix

array([    0,  1200,  2402,  3604,  4806,  6008,  7210,  8412,  9614,
       10816, 12018, 13220, 14422, 15624, 16826, 18028, 19230, 20432,
       20769])
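The update loop slices each batch between consecutive `update_ix` values. Listing those slices makes the batch sizes explicit (the array is copied from the output above only so the snippet runs on its own):

```python
import numpy as np

# update_ix as printed above (boundaries of the decoder updates)
update_ix = np.array([    0,  1200,  2402,  3604,  4806,  6008,  7210,  8412,  9614,
                      10816, 12018, 13220, 14422, 15624, 16826, 18028, 19230, 20432,
                      20769])

# Consecutive (lower, upper) slice bounds, one per update batch
batches = list(zip(update_ix[:-1], update_ix[1:]))
batch_sizes = np.diff(update_ix)

print(len(batches))                                     # 18
print(batch_sizes[0], batch_sizes[1], batch_sizes[-1])  # 1200 1202 337
```

So 19 boundaries give 18 batches: one of 1200, sixteen of 1202 and a short final batch of 337; the last timepoint (index 20769) falls outside every slice.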

In [8]:
dec_cond0_user1_update0 = Ws_block1[keys[0]][0,0,:,:]
dec_cond0_user1_update1 = Ws_block1[keys[0]][0,update_ix[1],:,:]
dec_cond0_user1_update2 = Ws_block1[keys[0]][0,update_ix[2],:,:]

print(f"Shape of decoder: {dec_cond0_user1_update0.shape}")
print()
print(f"Total difference between dec0 and dec1: {(dec_cond0_user1_update0 - dec_cond0_user1_update1).sum()}")
print("I.e., as previously shown, the first two decs are the same")
print()
print(f"Total difference between dec0 and dec2: {(dec_cond0_user1_update0 - dec_cond0_user1_update2).sum()}")

Shape of decoder: (2, 64)

Total difference between dec0 and dec1: 0.0
I.e., as previously shown, the first two decs are the same

Total difference between dec0 and dec2: 3.1981579823181594
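These checks compare decoders with a signed `.sum()` of the difference, which can be zero (or small) even when the decoders differ, because positive and negative entries cancel. A hypothetical helper reporting a few complementary summaries:

```python
import numpy as np

def compare_decoders(A, B, atol=1e-8):
    """Summaries of A - B that cannot hide differences the way a signed sum can."""
    diff = np.asarray(A) - np.asarray(B)
    return {
        'signed_sum': diff.sum(),           # what the notebook cells print
        'abs_sum': np.abs(diff).sum(),      # zero only if every entry matches
        'frobenius': np.linalg.norm(diff),  # Euclidean size of the whole difference
        'allclose': np.allclose(A, B, atol=atol),
    }

# A nonzero difference whose entries cancel: the signed sum is exactly 0
A = np.array([[1.0, -1.0], [2.0, -2.0]])
B = np.zeros((2, 2))
print(compare_decoders(A, B))
```

For the real decoders this would be called as, e.g., `compare_decoders(dec_cond0_user1_update0, dec_cond0_user1_update2)`.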


In [53]:
#emg_cond0_user1_update0 = emg_data_df.iloc[:64,:].shape

# (Condition, datapoints, channels)
print(emgs_block1[keys[0]][:,:,:].shape)

# Condition 0 of subject 1 ("0")
print(emgs_block1[keys[0]][0,:,:].shape)

(8, 20770, 64)
(20770, 64)


## Run One Iteration On Above Data and Check Decoders Are the Same
1. Modifying Simulations Code

In [54]:
# Just 1 person
filtered_signals = emgs_block1[keys[0]][0,:,:]
# Read in the reference positions from the pickle file
cued_target_position = refs_block1[keys[0]][0,:,:]

print(filtered_signals.shape)
print(cued_target_position.shape)

(20770, 64)
(20770, 2)


In [55]:
# Previously this was a random decoder; here we start from the CPHS initial decoder so we can rerun the same update sequence
#D_0 = np.random.rand(2,64)
D_0 = Ws_block1[keys[0]][0,0,:,:]
total_datapoints = emgs_block1[keys[0]][0,:,:].shape[0]

#learning_batch = 8
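# Number of datapoints in the first update batch (1200). In the original sims, learning_batch = 8
# appears to count trials, each expanded to 60 timepoints (see the commented-out np.tile(..., 60)
# lines in the update loop); still don't know where the 60 timepoints came from
learning_batch = update_ix[1]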
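The commented-out `p_intended` lines in the update loop tile each trial's cued target across 60 timepoints before stacking, so with the old `learning_batch = 8` each update saw 8 x 60 = 480 columns. A quick shape check of that construction on random placeholder targets (the targets are made up; only the 8 and 60 come from the old code):

```python
import numpy as np

n_trials = 8                # the old learning_batch value
timepoints_per_trial = 60   # the tile count used in the old code
rng = np.random.default_rng(0)
cued = rng.standard_normal((n_trials, 2))  # hypothetical cued (x, y) target per trial

# Same construction as the commented-out line: repeat each trial's target 60 times, then stack
p_intended_old = np.hstack([np.tile(x[:, np.newaxis], timepoints_per_trial) for x in cued])
print(p_intended_old.shape)  # (2, 480)
```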

In [56]:
#alpha = .95 # higher alpha means more old decoder (slower update)
#alphaF = 1e-1
#alphaD = 1e-1

# For condition 1:
alpha = .25 # higher alpha means more old decoder (slower update)
# Assuming these are the same as the lambdas (the decoder cost penalties)
alphaF = 1e-7
alphaD = 1e-3
# Where is lambda_E (1e-6)? cost_l2 below only takes alphaF and alphaD

Assuming I don't need to account for all of the different simulation functions (constant intent, bounded position, etc.)... should probably ask.

In [57]:
# Original code for running simulations...
# for ix in range(10000):
    #accuracy_constant_,D_constant,p_constrained_constant = simulation_constant_intent(D_constant,learning_batch,alpha,alphaF=alphaF,alphaD=alphaD)
    #accuracy_constant.extend(accuracy_constant_)
    #accuracy_,D,p_constrained = simulation(D,learning_batch,alpha,alphaF=alphaF,alphaD=alphaD)    
    #accuracy.extend(accuracy_)
    #accuracy_bounded_,D_bounded,p_bounded = simulation_bounded_pos(D_bounded,learning_batch,alpha,alphaF=alphaF,alphaD=alphaD)  
    #accuracy_bounded.extend(accuracy_bounded_)
    #accuracy_constant_bounded_,D_constant_bounded,p_constant_bounded = simulation_constant_intent_bounded(D_constant_bounded,learning_batch,alpha,alphaF=alphaF,alphaD=alphaD)
    #accuracy_constant_bounded.extend(accuracy_constant_bounded_)
    
# Modified code for running simulations...
# Why loop at all right now...
#for ix in range(10):
    #accuracy_,D,p_constrained = simulation(D,learning_batch,alpha,alphaF=alphaF,alphaD=alphaD)    


In [58]:
D = []
D.append(D_0)

# Added 2 new parameters
#def simulation(D,learning_batch,alpha,alphaF=1e-2,alphaD=1e-2,display_info=False,num_iters=False):
#D  # Already defined
#learning_batch  # Already defined
#alpha  # Already defined
#alphaF=1e-2  #defined as something else earlier...
#alphaD=1e-2  #defined as something else earlier...
display_info=True
num_iters=False

#num_updates = int(np.floor((filtered_signals.shape[0]-1)/learning_batch)) # how many times can we update decoder based on learning batch    
num_updates = len(update_ix)  # 19 update boundaries for us

# Split the data into one batch per decoder update, using consecutive update_ix values as slice bounds.
# Loop num_updates-1 times: 19 boundaries give 18 batches. The final batch (update_ix[17] to
# update_ix[18]) has only 337 datapoints, and the very last datapoint (index 20769) is not used.
for ix in range(num_updates-1):
    #print(ix)
    # For less cluttering when debugging
    #display_info = False
    
    # Instead of using learning_batch, we should get the same results just using update_ix values
    lower_bound = update_ix[ix]
    # ix never reaches num_updates-1 in this loop, so the upper bound is always the next boundary
    upper_bound = update_ix[ix+1]
    #print(lower_bound)
    #print(upper_bound)
        
    # s: 64 channels x timepoints in this batch (the old sims stacked 64 x (60 timepoints x learning batch size))
    #s = np.hstack([x for x in filtered_signals[int(ix*learning_batch+1):int((ix+1)*learning_batch+1),:,:]])
    s = np.transpose(filtered_signals[lower_bound:upper_bound,:])  # Last working one
    #s = np.hstack([x for x in filtered_signals[lower_bound:upper_bound,:]])
    #print(f"s: {s.shape}")
    
    # p_intended: 2 (x, y) x timepoints in this batch (the old sims stacked 2 x (60 timepoints x learning batch size))
    #p_intended = np.hstack([np.tile(x[:,np.newaxis],60) for x in cued_target_position[int(ix*learning_batch+1):int((ix+1)*learning_batch+1),:]])
    p_intended = np.transpose(cued_target_position[lower_bound:upper_bound,:])  # This is the last working one that runs
    # One of these is how it is supposed to run I think
    #p_intended = np.hstack([np.tile(x[:,np.newaxis],60) for x in cued_target_position[lower_bound:upper_bound,:]])
    # Try 64 so the dimensions work out? I don't think I have enough data to transform the other one to 64 x (60x1200) since it's just 64x1200 right now
    #p_intended = np.hstack([np.tile(x[:,np.newaxis],64) for x in cued_target_position[lower_bound:upper_bound,:]])
    #print(f"p_int: {p_intended.shape}")
    
    v_intended,p_constrained = output_new_decoder(s,D[-1],p_intended)
    #print(f"v_int: {v_intended.shape}")
    #print()

    # UPDATE DECODER
    u = copy.deepcopy(s) # u is the person's signal s (64 CHANNELS X TIMEPOINTS)
    q = copy.deepcopy(v_intended) # intended velocities (from the cued positions) used to update the decoder: 2 x timepoints in this batch

    # emg_windows against intended_targets (trial specific cued target)
    F = copy.deepcopy(u[:,:-1]) # note: truncate F for estimate_decoder
    V = copy.deepcopy(q)
    #print(f"u: {u.shape}")
    #print(f"q: {q.shape}")
    #print(f"F: {F.shape}")
    #print(f"V: {V.shape}")
    #print()

    # initial decoder estimate for gradient descent
    # ^ This is their comment
    # i.e., just the optimizer's starting point, hence the difference between D_0 and D0
    # They call it gradient descent, but method='BFGS' below is a quasi-Newton method
    # Note: no random seed is set, so each run starts from a different D0
    D0 = np.random.rand(2,64)

    # H (2x2, all zeros here) is passed straight through to cost_l2 / gradient_cost_l2
    H = np.zeros((2,2))
    # use scipy minimize (BFGS) with the pre-computed analytical gradient for speed
    # minimize works on a flat parameter vector (BFGS flattens x0 internally, and recent
    # SciPy versions deprecate a 2-D x0), so pass D0.flatten()
    options = {'disp': display_info}
    if num_iters is not False:
        options['maxiter'] = num_iters
    out = minimize(lambda d: cost_l2(F,d,H,V,learning_batch,alphaF,alphaD), D0.flatten(), method='BFGS', jac=lambda d: gradient_cost_l2(F,d,H,V,learning_batch,alphaF,alphaD), options=options)

    # reshape to decoder parameters
    W_hat = np.reshape(out.x,(2, 64))

    # DO SMOOTHBATCH
    W_new = alpha*D[-1] + ((1 - alpha) * W_hat)
    D.append(W_new)



Optimization terminated successfully.
         Current function value: 137.753028
         Iterations: 92
         Function evaluations: 126
         Gradient evaluations: 126
Optimization terminated successfully.
         Current function value: 126.003342
         Iterations: 102
         Function evaluations: 134
         Gradient evaluations: 134
Optimization terminated successfully.
         Current function value: 170.284743
         Iterations: 94
         Function evaluations: 123
         Gradient evaluations: 123
Optimization terminated successfully.
         Current function value: 178.425631
         Iterations: 93
         Function evaluations: 126
         Gradient evaluations: 126
Optimization terminated successfully.
         Current function value: 167.921070
         Iterations: 93
         Function evaluations: 123
         Gradient evaluations: 123
Optimization terminated successfully.
         Current function value: 124.951363
         Iterations: 92
         Func

p_int: (2, 1202) <br>
v_int: (2, 1202) <br>
u: (64, 1202) <br>
q: (2, 1202) <br>
F: (64, 1201) <br>
V: (2, 1202) <br>
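`scipy.optimize.minimize` optimizes over a 1-D parameter vector and always returns `out.x` flat, which is why `W_hat` is reshaped to (2, 64). The sketch below shows the flatten/reshape pattern on a small stand-in least-squares cost (an assumption, not `cost_l2`), and how `options={'maxiter': ...}` caps the number of BFGS iterations, which could act as a fixed local training budget per FL round.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_out, n_ch, n = 2, 8, 300  # small stand-ins for a (2, 64) decoder and ~1200 timepoints
F = rng.standard_normal((n_ch, n))
V = rng.standard_normal((n_out, n_ch)) @ F
lam = 1e-3  # hypothetical penalty weight

def cost(d_flat):
    D = d_flat.reshape(n_out, n_ch)  # minimize passes a flat vector
    return ((D @ F - V) ** 2).sum() / n + lam * (D ** 2).sum()

def grad(d_flat):
    D = d_flat.reshape(n_out, n_ch)
    G = 2.0 * (D @ F - V) @ F.T / n + 2.0 * lam * D
    return G.ravel()  # the gradient must have the same flat shape as x

x0 = rng.random((n_out, n_ch)).ravel()  # flatten the matrix-shaped initial guess

# Capped run: at most 5 BFGS iterations
capped = minimize(cost, x0, jac=grad, method='BFGS', options={'maxiter': 5})
# Uncapped run: iterate until the default gradient tolerance is met
full = minimize(cost, x0, jac=grad, method='BFGS')

W_hat = full.x.reshape(n_out, n_ch)  # back to decoder shape, as in the update loop
print(capped.nit, capped.success)
print(full.nit, full.success)
```

`out.nit` reports how many iterations were actually used, and `out.success` is False when the cap stopped the run before convergence.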

In [59]:
# Remember that D_0 is the first decoder and the loop runs num_updates-1 times, so D holds num_updates (19) decoders
print(len(D))

# The inits of each are the same (by definition since that's what I set D_0 to)
print((D[0] - Ws_block1[keys[0]][0,update_ix[0],:,:]).sum())

# Recall that in cphs data, the first two decoders are the same for some reason
print((Ws_block1[keys[0]][0,update_ix[1],:,:] - Ws_block1[keys[0]][0,update_ix[0],:,:]).sum())
print((D[0] - Ws_block1[keys[0]][0,update_ix[1],:,:]).sum())

19
0.0
0.0
0.0


In [60]:
# Thus the first decoder that could conceivably match is the 3rd one in Ws_block1 (i.e., index 2)
print((D[1] - Ws_block1[keys[0]][0,update_ix[2],:,:]).sum())

2.58337953616601


In [61]:
# Check how different the final decoders are; this is all we really care about
# (Although if the earlier decoders already differ, it's unlikely the last ones match)
print((D[-1] - Ws_block1[keys[0]][0,update_ix[-1],:,:]).sum())

1.6142654430840957


In [62]:
# Differences between consecutive decoders

# From this file
print(f"Length of D (sims code): {len(D)}")
print(f"Length of Ws_block1 (cphs code): {len(update_ix)}")
print()
print("Labels;       D (Sims);     Ws (CPHS);     Sim - CPHS")
for i in range(len(D)-2):
    print(f"Dec{i+1} - Dec{i}: {(D[i+1] - D[i]).sum():9.5f};    {(Ws_block1[keys[0]][0,update_ix[i+1],:,:] - Ws_block1[keys[0]][0,update_ix[i],:,:]).sum():9.5f};      {(D[i] - Ws_block1[keys[0]][0,update_ix[i],:,:]).sum():9.5f}")

Length of D (sims code): 19
Length of Ws_block1 (cphs code): 19

Labels;       D (Sims);     Ws (CPHS);     Sim - CPHS
Dec1 - Dec0:  -0.61478;      0.00000;        0.00000
Dec2 - Dec1:   1.22128;     -3.19816;       -0.61478
Dec3 - Dec2:  -0.24410;      8.21960;        3.80466
Dec4 - Dec3:  -0.96454;     -7.00649;       -4.65904
Dec5 - Dec4:  -0.94666;      2.21186;        1.38291
Dec6 - Dec5:   3.79658;     -3.56196;       -1.77561
Dec7 - Dec6:  -2.02195;     10.80750;        5.58294
Dec8 - Dec7:  -0.06733;    -12.59955;       -7.24650
Dec9 - Dec8:  -1.69894;     11.45546;        5.28572
Dec10 - Dec9:  -1.55810;    -12.38370;       -7.86868
Dec11 - Dec10:   0.61123;     -0.43172;        2.95692
Dec12 - Dec11:   2.44064;     -1.10106;        3.99987
Dec13 - Dec12:   1.92855;      7.55098;        7.54157
Dec14 - Dec13:  -5.70004;      0.16666;        1.91914
Dec15 - Dec14:   2.17454;     -4.90057;       -3.94757
Dec16 - Dec15:   1.28047;     -0.14916;        3.12755
Dec17 - Dec16:   0.8

In [63]:
# Adding one to account for the fact that Ws_block 0 and 1 are the same.
for i in range(len(D)-2):
    print(f"{(D[i] - Ws_block1[keys[0]][0,update_ix[i+1],:,:]).sum():9.5f}")

  0.00000
  2.58338
 -4.41494
  2.34745
 -0.82895
  1.78636
 -5.22456
  5.35305
 -6.16974
  4.51502
  3.38863
  5.10093
 -0.00941
  1.75247
  0.95301
  3.27670
  0.25980
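Since the CPHS decoders appear to be offset by one index (decoders 0 and 1 are identical), a helper that compares sim decoder `i` with the CPHS decoder at `update_ix[i + lag]` for any lag, using the Frobenius norm rather than a signed sum, makes the alignment easier to check. A sketch with a hypothetical function name, tested on synthetic arrays:

```python
import numpy as np

def lagged_decoder_distance(D_sim, Ws_cond, update_ix, lag=1):
    """Frobenius distance between sim decoder i and the CPHS decoder at update_ix[i + lag].

    D_sim: list of (2, 64) decoders from the update loop.
    Ws_cond: (timepoints, 2, 64) array of CPHS decoders for one condition.
    """
    n = min(len(D_sim), len(update_ix) - lag)  # only pairs where both decoders exist
    return np.array([np.linalg.norm(D_sim[i] - Ws_cond[update_ix[i + lag]]) for i in range(n)])
```

On the real data this would be called as `lagged_decoder_distance(D, Ws_block1[keys[0]][0], update_ix, lag=1)`; trying `lag=0` and `lag=1` side by side shows which alignment fits better.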
