In [2]:
import pandas as pd
import numpy as np

# Data Augmentation

This section performs data augmentation on the original simulation set. By adding different amounts of shift in time to the same voltage curve (implemented below as a constant offset added to every feature value of that curve), you can turn one simulation into multiple "simulations", increasing your training set size. A larger training set generally helps when training the neural network.

In [3]:
Q_df_sim = pd.read_csv('30_sim_features/features.csv', sep=',', index_col=0)
Q_matrix_sim = Q_df_sim.values

`max_Q_shift` needs to be specified. This is the maximum amount of shift that can be added; each curve receives a shift drawn uniformly between 0 and `max_Q_shift`.

`N` is the number of times you want to repeat the augmentation procedure. The augmented data set is `N` times the size of the initial data set.

In [4]:
max_Q_shift = 0.15

N = 8

Q_df_sims_augmented = []
for i in range(N):
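    # draw one random shift per simulated curve (row), uniform between 0 and max_Q_shift
    shift = np.random.uniform(0, max_Q_shift, len(Q_df_sim))
    # shift[:, None] has shape (n_curves, 1), so every feature in a row gets that row's shift
    Q_matrix_sim_with_shifts = Q_matrix_sim + shift[:, None]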
    Q_df_sim_with_shifts = pd.DataFrame(Q_matrix_sim_with_shifts,index=Q_df_sim.index,columns=Q_df_sim.columns)
    Q_df_sims_augmented.append(Q_df_sim_with_shifts)

Q_df_sims_augmented_df = pd.concat(Q_df_sims_augmented)
Q_df_sims_augmented_df.to_csv(f'30_sim_features/augmented_features_{N}x.csv',index=True)
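
To make the per-curve shift concrete, here is a minimal sketch on a toy table. The toy values, the ID numbers, the column labels, and the seeded generator are assumptions for illustration only and are not taken from `features.csv`. It shows that each row is offset by a single constant, so the shape of each curve is unchanged, and that stacking the shifted copies repeats the original index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 simulated curves (rows), 4 features each
toy = pd.DataFrame(
    [[0.0, 0.1, 0.2, 0.3],
     [0.0, 0.2, 0.4, 0.6],
     [0.0, 0.3, 0.6, 0.9]],
    index=[101, 102, 103],                # hypothetical simulation IDs
    columns=["c0", "c1", "c2", "c3"],     # hypothetical feature labels
)

rng = np.random.default_rng(0)  # seeded only so this sketch is reproducible
max_shift = 0.15
n_copies = 2

copies = []
for _ in range(n_copies):
    # one shift per row; shift[:, None] broadcasts it across that row's columns
    shift = rng.uniform(0, max_shift, len(toy))
    shifted = pd.DataFrame(
        toy.values + shift[:, None], index=toy.index, columns=toy.columns
    )
    copies.append(shifted)

augmented = pd.concat(copies)

# differences between neighbouring features are unchanged by a constant offset
print(np.allclose(np.diff(copies[0].values, axis=1), np.diff(toy.values, axis=1)))
print(augmented.shape)            # n_copies times as many rows as the toy table
print(augmented.index.tolist())   # each simulation ID appears n_copies times
```

Because the IDs repeat in the augmented table, any later lookup by ID returns one row per copy rather than a single row.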

# Feature Resolution

In some cases, it may be advantageous to reduce the feature resolution when representing a voltage curve. This can help reduce model-experiment discrepancy and can reduce training times for neural networks. This notebook saves feature files with lower voltage resolution.

`mV_resolution` is the desired voltage resolution in mV. The default resolution in the original `features.csv` is 1 mV, so a resolution of 10 mV keeps every 10th feature column.


In [5]:
res_set = [1,10,100]

# loop through multiple feature files if you want!
for feature_filename in ['features', f'augmented_features_{N}x']:

    features = pd.read_csv(f'30_sim_features/{feature_filename}.csv',sep=',',index_col=0)

    for mV_resolution in res_set:
        # keep every mV_resolution-th column (the original grid is 1 mV per column)
        features_lowres = features.iloc[:, ::mV_resolution]
        features_lowres.to_csv(f'30_sim_features/{feature_filename}_{mV_resolution}mV.csv',sep=',',index=True)
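
As a small illustration of what the subsampling keeps, the sketch below builds a toy table on a 1 mV grid and reduces it to 10 mV resolution. The 3000 to 3020 mV range, the integer column labels, and the random values are assumptions for illustration; the real column labels in `features.csv` may look different.

```python
import numpy as np
import pandas as pd

# Hypothetical voltage grid: 3000 mV to 3020 mV in 1 mV steps (21 columns)
voltages_mV = np.arange(3000, 3021)
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((2, len(voltages_mV))), columns=voltages_mV)

# keep every 10th column, starting from the first one
lowres = toy.iloc[:, ::10]
print(lowres.columns.tolist())  # [3000, 3010, 3020]

# slicing the transpose by rows selects the same columns
same = np.array_equal(lowres.values, toy.T[::10].T.values)
print(same)  # True
```

Note that the slice always starts from the first column, so the retained voltages depend on where the original grid begins.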

# Adding Simulation Parameters as Features

In some cases (like this example), you may want a simulation parameter to serve as an additional input feature to your neural network. Here, we want the C-rate to be an input when predicting the degradation parameters, so we read in the necessary feature file(s) and append the C-rate as a final column.

In [8]:
sim_params = pd.read_csv('10_training_set_params/Simulation_Parameters_I0.csv',sep=',',index_col=0)

In [10]:
# loop through multiple feature files!
for feature_filename in ['features', f'augmented_features_{N}x']:
    for mV_resolution in res_set:
        features = pd.read_csv(f'30_sim_features/{feature_filename}_{mV_resolution}mV.csv',sep=',',index_col=0)
        
        # the index of the feature files is an identification number, and corresponds to a row in the sim_params table
        # so we can use the index to identify the exact C-rate used to generate each set of features (i.e. each simulated voltage curve)
        features['CRATE'] = sim_params.loc[features.index,'CRATE']
        features.to_csv(f'30_sim_features/{feature_filename}_{mV_resolution}mV.csv',sep=',',index=True)
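
The index-based lookup above can be sketched on toy tables as follows. All values, the IDs, and the `f0` column are hypothetical; only the `CRATE` column name comes from the notebook. The toy feature table repeats each ID, as an augmented feature file would.

```python
import pandas as pd

# Hypothetical parameter table: one row per simulation ID
params = pd.DataFrame({"CRATE": [0.5, 1.0, 2.0]}, index=[0, 1, 2])

# Hypothetical augmented features: each simulation ID appears twice
feats = pd.DataFrame(
    {"f0": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
    index=[0, 1, 2, 0, 1, 2],
)

# .loc with a list of labels returns one row per requested label, repeats included
crate = params.loc[feats.index, "CRATE"]

# assign by position so the repeated index labels never need to be aligned
feats["CRATE"] = crate.to_numpy()

print(feats["CRATE"].tolist())  # [0.5, 1.0, 2.0, 0.5, 1.0, 2.0]
```

If a feature row carries an ID that is missing from the parameter table, `.loc` raises a `KeyError`, which is a useful early warning that the two files are out of sync.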