# Battery Data Exploration

In this notebook I explore some open-source data on battery performance.


HERE I WILL MAKE A TOC


Data source: [Battery Remaining Useful Life (RUL)](https://www.kaggle.com/datasets/ignaciovinuales/battery-remaining-useful-life-rul)

More information on the data and how it was processed: [github:ignavinuales/Battery_RUL_Prediction](https://github.com/ignavinuales/Battery_RUL_Prediction) 


##### Importing Libraries ... 

In [1]:
# general
import sys
import numpy as np
import pandas as pd
pd.plotting.register_matplotlib_converters()

# mpl
import matplotlib as mpl
import matplotlib.pyplot as plt

# plotly
import plotly
import plotly.express as px
import plotly.io as pio
import plotly.figure_factory as ff
import plotly.graph_objects as go
pio.renderers.default = 'iframe'


# stats
from scipy import stats
#from lmfit.models import GaussianModel

# my helper functions
sys.path.insert(0, '../helpers/')
import pd_helpers
import plotly_helpers

## Data Exploration

Here I explore the dataset I found without much prior knowledge of the data.
Later in the notebook, I will [describe the data](#Data-Description) and plot some [interesting distributions](#Looking-at-distributions).


In [2]:
# import data
bat_path = './data/Battery_RUL.csv'
bat_data = pd.read_csv(bat_path) 


In [3]:
# explore data
bat_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15064 entries, 0 to 15063
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Cycle_Index                15064 non-null  float64
 1   Discharge Time (s)         15064 non-null  float64
 2   Decrement 3.6-3.4V (s)     15064 non-null  float64
 3   Max. Voltage Dischar. (V)  15064 non-null  float64
 4   Min. Voltage Charg. (V)    15064 non-null  float64
 5   Time at 4.15V (s)          15064 non-null  float64
 6   Time constant current (s)  15064 non-null  float64
 7   Charging time (s)          15064 non-null  float64
 8   RUL                        15064 non-null  int64  
dtypes: float64(8), int64(1)
memory usage: 1.0 MB


In [4]:
# look at the first and last few rows
pd_helpers.pd_headtail(bat_data)

Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL
0,1.0,2595.3,1151.4885,3.67,3.211,5460.001,6755.01,10777.82,1112
1,2.0,7408.64,1172.5125,4.246,3.22,5508.992,6762.02,10500.35,1111
2,3.0,7393.76,1112.992,4.249,3.224,5508.993,6762.02,10420.38,1110
15061,1110.0,769.12,179.357143,3.773,3.742,915.513,1412.31,6637.12,2
15062,1111.0,773.88,162.374667,3.763,3.839,539.375,1148.0,7660.62,1
15063,1112.0,677537.27,142740.64,4.206,3.305,49680.004,599830.14,599830.14,0


------------
There are far more rows (15064) than cycle indexes: the last row has `Cycle_Index` 1112. Let's try to understand this by plotting the indexes on a histogram.


#### Looking at Cycle Indexes

In [5]:
# cycle index histo
fig = px.histogram(bat_data, x='Cycle_Index',nbins=1000)
fig.show()

---------------
Let's look at rows with `Cycle_Index == 1`

In [6]:
init_rows = bat_data.loc[ bat_data.Cycle_Index == 1 ]
print(len(init_rows))
init_rows.head()

14


Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL
0,1.0,2595.3,1151.4885,3.67,3.211,5460.001,6755.01,10777.82,1112
1076,1.0,2604.0,1186.4955,3.666,3.213,5424.991,6706.02,10772.99,1107
2155,1.0,2562.02,1140.991,3.666,3.219,5452.993,6740.99,10836.0,1107
3232,1.0,2566.08,1161.983,3.667,3.214,5452.992,6740.99,10938.33,1107
4313,1.0,2590.02,1239.007,3.66,3.225,5376.0,6678.01,10821.98,1133


------------------
Here I see that the data describes 14 batteries. The cycling data is in fact concatenated. From the figure above we can also see that some cycling data is missing. I will now mark each battery.

------------------

##### Adding battery IDs to the data frame

In [7]:
# save a copy of the original data
bat_data_orig = bat_data.copy()

In [69]:
#get index of each first cycle
init_indexes = pd.Series(init_rows.index)
init_indexes.head(5)


0       0
1    1076
2    2155
3    3232
4    4313
dtype: int64

In [9]:
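# define mapping function
def bat_id_map(init_indexes, val):
    """Return the battery ID for the row with index `val`.

    init_indexes -- pd.Series of the row indexes where each battery's
                    first cycle starts, sorted lowest to highest
    val          -- row index to look up
    """
    # the largest start index that is <= val belongs to the battery containing val;
    # its position in init_indexes is the battery ID
    return init_indexes[init_indexes<=val].idxmax()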

# --------------------------------
# map batteries
bat_data['Battery_ID'] = bat_data.index.map(lambda i : bat_id_map(init_indexes,i))
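The same labelling can probably be done without a per-row lookup. The sketch below is an alternative, not the method used in this notebook: it assumes that every battery's run starts with a row where `Cycle_Index == 1` (as the 14 rows found above suggest) and that the rows are in their original concatenated order. The `toy` frame and its values are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for bat_data: three concatenated runs (hypothetical values)
toy = pd.DataFrame({'Cycle_Index': [1.0, 2.0, 3.0, 1.0, 2.0, 4.0, 1.0]})

# Every row with Cycle_Index == 1 starts a new battery; the running count
# of such rows, minus one, gives a zero-based battery ID
toy['Battery_ID'] = (toy['Cycle_Index'] == 1).cumsum() - 1

print(toy['Battery_ID'].tolist())
```

On the real frame the equivalent line would be `(bat_data['Cycle_Index'] == 1).cumsum() - 1`, which should give the same IDs as the mapping above as long as the assumption about the first cycle holds.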

In [10]:
# testing
display(bat_data.head())
bat_data.loc[ bat_data.Cycle_Index == 500 ].head()

Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL,Battery_ID
0,1.0,2595.3,1151.4885,3.67,3.211,5460.001,6755.01,10777.82,1112,0
1,2.0,7408.64,1172.5125,4.246,3.22,5508.992,6762.02,10500.35,1111,0
2,3.0,7393.76,1112.992,4.249,3.224,5508.993,6762.02,10420.38,1110,0
3,4.0,7385.5,1080.320667,4.25,3.225,5502.016,6762.02,10322.81,1109,0
4,6.0,65022.75,29813.487,4.29,3.398,5480.992,53213.54,56699.65,1107,0


Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL,Battery_ID
478,500.0,1604.16,459.839429,3.944,3.545,3098.36,3968.36,8013.7,613,0
1554,500.0,1587.87,409.285714,3.884,3.589,2974.752,3860.35,8379.71,608,1
2631,500.0,1631.84,470.571429,3.927,3.556,3176.223,4040.22,8225.22,608,2
3711,500.0,1638.15,480.857143,3.934,3.551,3205.151,4076.35,8285.41,608,3
4794,500.0,1656.0,476.0,3.935,3.549,3284.32,4148.32,8396.32,634,4
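The first table above jumps from cycle 4 to cycle 6 for battery 0, which matches the missing cycles seen in the histogram. A minimal sketch for listing such gaps per battery follows; the `toy` frame and the column name `missing_before` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for bat_data with Battery_ID already assigned (hypothetical values)
toy = pd.DataFrame({
    'Battery_ID':  [0, 0, 0, 0, 1, 1, 1],
    'Cycle_Index': [1.0, 2.0, 4.0, 5.0, 1.0, 2.0, 3.0],
})

# Step between consecutive cycle numbers, computed separately per battery
step = toy.groupby('Battery_ID')['Cycle_Index'].diff()

# A step larger than 1 means at least one cycle is missing before that row
gaps = toy[step > 1].assign(missing_before=(step[step > 1] - 1).astype(int))
print(gaps)
```

Grouping by `Battery_ID` keeps the jump from the last cycle of one battery to the first cycle of the next from being counted as a gap.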


## Data Description

- There are 14 batteries in the dataset
- Each battery is cycled ~1100 times
- Each cycle is one discharge and charge
- `Cycle_Index` indexes each cycle

- The battery is charged at a nominal rate of C/2 (1h/0.5 = 2h = 7.2k s) and discharged at a rate of 1.5C (1h/1.5 = 40 min = 2.4k s) 
- `Discharge Time` and `Charging time` are the times it takes to fully discharge and charge the battery
- `Decrement 3.6-3.4V` is the time the voltage takes to drop from 3.6 V to 3.4 V during discharge
- `Max. Voltage Dischar.` (`Min. Voltage Charg.`) is the maximum (minimum) voltage before each discharge (charge)
- `Time at 4.15V` indicates the moment in time during charge when the battery hits 4.15V
- During the charge cycle the battery is at first charged at **constant current** and then at **constant voltage** 
- `Time constant current (s)` shows the time the battery has spent in **constant current** mode
- `RUL` - remaining useful life of the battery measured in **# of cycles**
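
The nominal times quoted in the list follow from the definition of the C-rate: a rate of 1C moves the full capacity in one hour, so a rate of `c` takes `1h / c`. A small sketch of that arithmetic (the function name `nominal_time_s` is made up for illustration):

```python
def nominal_time_s(c_rate):
    """Nominal time in seconds to move one full capacity at the given C-rate."""
    return 3600.0 / c_rate

charge_time_s = nominal_time_s(0.5)      # C/2 charge
discharge_time_s = nominal_time_s(1.5)   # 1.5C discharge

print(charge_time_s, discharge_time_s)
```

These are idealised values for a fresh battery at constant current; the measured times in the data also include the constant-voltage phase and the effect of ageing.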


## Looking at distributions

##### _Plot Cosmetics_

In [50]:
# fig_labels = { 'Cycle_Index' : 'Cycle Number'
#               ,'Battery_ID'   : 'Battery ID'
#              }

#### Cycle vs Discharge and Charge Time

The discharge and charge times are expected to shorten with the number of cycles as the battery wears and its capacity decreases. I plot the cycle number against the discharging / charging time to see this relationship.

##### First attempt : Seeing the Outliers

In [63]:
# title
sp1_title = "Charge time development over consecutive cycles"

# plots 
plots_y = ['Discharge Time (s)','Charging time (s)']

sp_figs = []
for y in plots_y :
    fig = px.scatter(bat_data
                     ,x = 'Cycle_Index'
                     ,y = y
                     ,color = 'Battery_ID')
    sp_figs.append(fig)


# put px plots into subplots
sp1 = plotly_helpers.make_px_subplots(sp_figs,1,2,plots_y
                                      ,x_labels=['Cycle Index' for i in sp_figs ]
                                      ,y_labels=plots_y
                                      ,c_label='Battery ID')
 
    
# cosmetics
sp1.update_layout(title=sp1_title
                 ,coloraxis = {'colorscale':'viridis'})

# draw
sp1.show()


##### Second attempt : Outliers Removed

It is clear that there are many outliers in the data. These could be special charging cycles. Let's try observing the distributions with the outliers removed.
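
The cell below uses hand-picked thresholds. As a possible alternative, which is not what this notebook does, an upper cut could be derived from the data with Tukey's IQR rule. The sketch assumes a `pd.Series` of times; the helper name `iqr_upper_bound` and the toy values are made up for illustration.

```python
import pandas as pd

def iqr_upper_bound(values, k=1.5):
    """Upper Tukey fence: Q3 + k * IQR."""
    q1, q3 = values.quantile([0.25, 0.75])
    return q3 + k * (q3 - q1)

# Toy discharge times in seconds with one very long cycle (hypothetical values)
toy = pd.Series([2000.0, 2010.0, 1990.0, 2005.0, 1995.0, 65000.0])

upper = iqr_upper_bound(toy)
kept = toy[toy < upper]      # same kind of filter as bat_data[y] < threshold
print(upper, len(kept))
```

Because the real times drift with the cycle number, a single global fence may also cut genuine early or late cycles, so a fixed threshold chosen by eye can be the safer option here.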

In [65]:
# making the same plots as above but with the outliers removed

outlier_th = [3e3,12e3]   # discharge, charge (same order as plots_y)

# making subfigures
sp_figs = []
y_labels = []

for i,y in enumerate(plots_y) :
    fig = px.scatter(bat_data[bat_data[y] < outlier_th[i]]
                     ,x = 'Cycle_Index'
                     ,y = y
                     ,trendline='ols'
                     ,trendline_color_override='red'
                     ,color = 'Battery_ID')
    sp_figs.append(fig)
    y_labels.append(y+' < '+'{:.1e}'.format(outlier_th[i]))

# put px plots into subplots
sp2 = plotly_helpers.make_px_subplots(sp_figs,1,2,plots_y
                                      ,x_labels=['Cycle Index' for i in sp_figs ]
                                      ,y_labels=y_labels
                                      ,c_label='Battery ID')
 
    
# cosmetics
sp2.update_layout(title=sp1_title+", outliers removed"
                 ,coloraxis = {'colorscale':'viridis'})

# draw
sp2.show()



Both discharge and charge time show a clear relationship with the cycle number: once the outliers are removed, both times fall as the battery is cycled.
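
To put a number on such a trend, one option is a straight-line fit with `scipy.stats.linregress`. The sketch below runs on synthetic data, not on the battery dataset; the slope of -1.5 s per cycle and the noise level are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in: discharge time falling by 1.5 s per cycle plus noise
rng = np.random.default_rng(0)
cycles = np.arange(1, 1001)
discharge_s = 2500.0 - 1.5 * cycles + rng.normal(0.0, 20.0, size=cycles.size)

# Least-squares line through the points
fit = stats.linregress(cycles, discharge_s)
print(f"slope = {fit.slope:.2f} s/cycle, r = {fit.rvalue:.3f}")
```

On the real data the fit would be applied to the filtered frame, ideally per battery, since each battery has its own starting capacity.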

#### Cycle vs Min and Max Voltage 

The maximum voltage at discharge and the minimum voltage at charge are also related to the resistance and capacity of the battery.

For example, after a full charge, when the current is disconnected the voltage of the battery drops due to **polarisation effects**. This voltage is given as the variable `Max. Voltage Dischar.`, i.e. the max voltage before the start of the discharge. The size of the drop is positively related to the resistance of the battery, and the current used to charge it.     

As the battery wears, its resistance increases, so the **maximum voltage at discharge** decreases and the **minimum voltage at charge** increases. Let's see these distributions.


In [67]:
# title
sp1_title = "Max and Min Voltage development over consecutive cycles"

# plots 
plots_y = ['Max. Voltage Dischar. (V)','Min. Voltage Charg. (V)']

sp_figs = []
for y in plots_y :
    fig = px.scatter(bat_data
                     ,x = 'Cycle_Index'
                     ,y = y
                     ,color = 'Battery_ID')
    sp_figs.append(fig)


# put px plots into subplots
sp1 = plotly_helpers.make_px_subplots(sp_figs,1,2,plots_y
                                      ,x_labels=['Cycle Index' for i in sp_figs ]
                                      ,y_labels=plots_y
                                      ,c_label='Battery ID')
 
    
# cosmetics
sp1.update_layout(title=sp1_title
                 ,coloraxis = {'colorscale':'viridis'})

# draw
sp1.show()



As expected, a clear relationship can be seen between the max (min) voltage at discharge (charge) \[MaxDV/MinCV\] and the cycle number. 
There are also clear indications of "non-standard" cycles creating outliers, as seen previously in the charge time.

#### Understanding the outliers:

When a battery is charged slower (at lower current) a higher maximum voltage can be achieved as the polarisation effect is smaller. 
To summarise:
- Fast charge = low MaxDV
- Slow charge = high MaxDV
- Fast discharge = high MinCV
- Slow discharge = low MinCV
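
The four cases can be reproduced with a deliberately crude model in which the voltage change after the current is switched off is purely ohmic, `I * R`. This is an assumption made for illustration only: real polarisation has further, time-dependent contributions, and all numbers below are invented rather than taken from the dataset.

```python
def max_dv(v_charge_end, charge_current_a, resistance_ohm):
    """Rest voltage after charging: the terminal voltage relaxes down by I * R."""
    return v_charge_end - charge_current_a * resistance_ohm

def min_cv(v_discharge_end, discharge_current_a, resistance_ohm):
    """Rest voltage after discharging: the terminal voltage recovers up by I * R."""
    return v_discharge_end + discharge_current_a * resistance_ohm

# Illustrative numbers only, not taken from the dataset
R = 0.1          # internal resistance in ohm
V_TOP = 4.2      # terminal voltage at the end of charge, in V
V_BOTTOM = 3.2   # terminal voltage at the end of discharge, in V

for label, current in [('slow', 0.5), ('fast', 3.0)]:
    print(label, max_dv(V_TOP, current, R), min_cv(V_BOTTOM, current, R))
```

In this model a larger `R` (a more worn battery) lowers MaxDV and raises MinCV at a fixed current, which is the ageing trend described earlier.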

Interestingly, charging times and the minimum voltage at charge seem a lot more varied.

It will likely be interesting to see whether removing outliers improves or worsens the performance of an ML algorithm predicting battery life.

To make sure this observation is correct, I will print out a few lines around a **low** and **high** MinCV cycle.

In [99]:
# choose battery 12

cols= ['Cycle_Index'
       ,'Discharge Time (s)'
       ,'Max. Voltage Dischar. (V)'
       ,'Min. Voltage Charg. (V)'
       ,'Charging time (s)'
       ,'Battery_ID']

test_data = bat_data.loc[bat_data['Battery_ID'] == 12,cols]

# plot values so I can choose useful Cycle_Index values
fig_t = px.scatter(test_data
                   ,x='Cycle_Index'
                   ,y='Min. Voltage Charg. (V)'
                   ,height=400
                   ,width=800)
fig_t.show()

In [100]:
# lines I will print before and after the event of interest
Nprint = 2   # 2+1+2 --> 5 lines

##### Baseline -- normal event

In [104]:
# BASELINE --- cycle 115
index = test_data.loc[test_data['Cycle_Index'] == 115].index[0]
test_data.loc[ index - Nprint : index + Nprint ]


Unnamed: 0,Cycle_Index,Discharge Time (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Charging time (s),Battery_ID
13051,113.0,2095.94,3.993,3.419,9122.08,12
13052,114.0,2093.15,3.992,3.419,9119.14,12
13053,115.0,2089.91,3.993,3.421,9109.85,12
13054,116.0,2087.05,3.993,3.421,9118.05,12
13055,117.0,2083.16,3.993,3.422,9084.41,12


The (dis)charge times and (dis)charge voltages look consistent.
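
The slice `test_data.loc[index - Nprint : index + Nprint]` works here because the row labels of one battery are consecutive integers. A sketch of a positional variant that does not rely on this is shown below; the helper name `rows_around` and the toy frame are made up for illustration.

```python
import pandas as pd

def rows_around(df, cycle, n=2, col='Cycle_Index'):
    """Return the first row with df[col] == cycle plus n rows before and after it."""
    # Positional location of the first matching row (0 if nothing matches)
    pos = (df[col] == cycle).to_numpy().argmax()
    if df[col].iloc[pos] != cycle:
        raise KeyError(f"{col} == {cycle} not found")
    # iloc slices by position, so gaps in the index labels do not matter
    return df.iloc[max(pos - n, 0): pos + n + 1]

# Toy frame with a non-contiguous index (hypothetical values)
toy = pd.DataFrame({'Cycle_Index': [1.0, 2.0, 3.0, 4.0, 6.0, 7.0]},
                   index=[10, 11, 12, 13, 20, 21])
print(rows_around(toy, 4))
```

With `test_data` this would be called as `rows_around(test_data, 115, Nprint)`.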

##### Low MinCV event -- expect slow discharge (large Discharge time) in event 

In [107]:
# LOW MinCV EVENT --- cycle 692
index = test_data.loc[test_data['Cycle_Index'] == 692].index[0]
test_data.loc[ index - Nprint : index + Nprint ]


Unnamed: 0,Cycle_Index,Discharge Time (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Charging time (s),Battery_ID
13607,690.0,1298.34,3.881,3.608,7985.47,12
13608,691.0,1282.88,3.878,3.788,8518.41,12
13609,692.0,191038.09,4.24,3.349,70827.33,12
13610,694.0,1396.83,3.891,3.604,8541.98,12
13611,695.0,1397.92,3.895,3.6,8408.03,12


The assumption was correct: the discharge time was indeed very long for this event, about two orders of magnitude longer than in the neighbouring cycles.
