# Battery Data Exploration

In this notebook I explore some open-source data on battery performance.

Data source: [Battery Remaining Useful Life (RUL)](https://www.kaggle.com/datasets/ignaciovinuales/battery-remaining-useful-life-rul)

More information on the data and how it was processed: [github:ignavinuales/Battery_RUL_Prediction](https://github.com/ignavinuales/Battery_RUL_Prediction) 

-----------------------------
**Contents:**
1. [Initial Data Exploration](#1.-Initial-Data-Exploration)  
   1.1. [Looking at Cycle Indexes](#1.1-Looking-at-Cycle-Indexes)  
   1.2. [Adding battery ID's to the data frame](#1.2-Adding-battery-ID's-to-the-data-frame)

2. [Data Overview](#2.-Data-Overview)  

3. [Looking at the distributions](#3.-Looking-at-the-distributions)  
   3.1. [Cycle # vs Discharge and Charge Time](#3.1-Cycle-#-vs-Discharge-and-Charge-Time)  
   3.2. [Cycle # vs Min and Max Voltage](#-3.2-Cycle-#-vs-Min-and-Max-Voltage)  
   3.3. [Cycle # vs Decrement 3.6 to 3.4 V](#3.3-Cycle-#-vs-Decrement-3.6-to-3.4-V)    
      3.3.1. [Discharge Time vs Decrement 3.6-3.4V](#3.3.1.-Discharge-Time-vs-Decrement-3.6-3.4V)
   3.4. [Cycle # vs Time at 4.15 V during charge](#3.4-Cycle-#-vs-Time-at-4.15-V-during-charge)  
      3.4.1. [Charge Time vs Time at 4.15 V during charge](#3.4.1-Charge-Time-vs-Time-at-4.15-V-during-charge)   
   3.5. [Cycle # vs Time at Constant Current (CC)](#3.5-Cycle-#-vs-Time-at-Constant-Current-(CC))   
      3.5.1. [Charge Time vs Time at CC](#3.5.1.-Charge-Time-vs-Time-at-CC)

4. [Predicting RUL](#4.-Predicting-RUL)   
   4.1. [Data Cleaning](#4.1-Data-Cleaning)   
      4.1.1. [Checking missing records](#4.1.1-Checking-missing-records)   
      4.1.2. [Removing unphysical records](#4.1.2-Removing-unphysical-records)   
      4.1.3. [Plotting Distributions](#4.1.3-Plotting-Distributions)   
      

-----------------------------



##### Importing Libraries ... 

In [1]:
# general
import sys
import numpy as np
import pandas as pd
pd.plotting.register_matplotlib_converters()

# mpl
import matplotlib as mpl
import matplotlib.pyplot as plt

# plotly
import plotly
import plotly.express as px
import plotly.io as pio
import plotly.figure_factory as ff
import plotly.graph_objects as go
pio.renderers.default = 'iframe'


# stats
from scipy import stats
#from lmfit.models import GaussianModel

# my helper functions
sys.path.insert(0, '../helpers/')
import pd_helpers
import plotly_helpers

## 1. Initial Data Exploration
<sup>[Go to top](#-Battery-Data-Exploration)</sup>  
Here I explore the dataset I found without much prior knowledge of the data.


In [2]:
# import data
bat_path = './data/Battery_RUL.csv'
bat_data = pd.read_csv(bat_path) 


In [3]:
# explore data
bat_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15064 entries, 0 to 15063
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Cycle_Index                15064 non-null  float64
 1   Discharge Time (s)         15064 non-null  float64
 2   Decrement 3.6-3.4V (s)     15064 non-null  float64
 3   Max. Voltage Dischar. (V)  15064 non-null  float64
 4   Min. Voltage Charg. (V)    15064 non-null  float64
 5   Time at 4.15V (s)          15064 non-null  float64
 6   Time constant current (s)  15064 non-null  float64
 7   Charging time (s)          15064 non-null  float64
 8   RUL                        15064 non-null  int64  
dtypes: float64(8), int64(1)
memory usage: 1.0 MB


In [4]:
# look at the first and last few rows
pd_helpers.pd_headtail(bat_data)

Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL
0,1.0,2595.3,1151.4885,3.67,3.211,5460.001,6755.01,10777.82,1112
1,2.0,7408.64,1172.5125,4.246,3.22,5508.992,6762.02,10500.35,1111
2,3.0,7393.76,1112.992,4.249,3.224,5508.993,6762.02,10420.38,1110
15061,1110.0,769.12,179.357143,3.773,3.742,915.513,1412.31,6637.12,2
15062,1111.0,773.88,162.374667,3.763,3.839,539.375,1148.0,7660.62,1
15063,1112.0,677537.27,142740.64,4.206,3.305,49680.004,599830.14,599830.14,0


------------
There are far more rows (15064) than cycle indexes (the largest `Cycle_Index` is 1112). Let's try to understand this by plotting the indexes on a histogram.

### 1.1 Looking at Cycle Indexes

In [5]:
# cycle index histo
fig = px.histogram(bat_data, x='Cycle_Index',nbins=1000)
fig.show()

---------------
Let's look at rows with `Cycle_Index == 1`

In [6]:
init_rows = bat_data.loc[ bat_data.Cycle_Index == 1 ]
print(len(init_rows))
init_rows.head()

14


Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL
0,1.0,2595.3,1151.4885,3.67,3.211,5460.001,6755.01,10777.82,1112
1076,1.0,2604.0,1186.4955,3.666,3.213,5424.991,6706.02,10772.99,1107
2155,1.0,2562.02,1140.991,3.666,3.219,5452.993,6740.99,10836.0,1107
3232,1.0,2566.08,1161.983,3.667,3.214,5452.992,6740.99,10938.33,1107
4313,1.0,2590.02,1239.007,3.66,3.225,5376.0,6678.01,10821.98,1133


------------------
Here I see that the data describes 14 batteries. The cycling data is in fact concatenated. From the figure above we can also see that some cycling data is missing. I will now mark each battery.
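
The gaps mentioned above can also be counted directly instead of being read off the histogram. The following is a minimal sketch, assuming that a jump of more than 1 between consecutive `Cycle_Index` values marks skipped cycles. The helper name `count_missing_cycles` and the toy series are illustrative and not part of this notebook.

```python
import pandas as pd

def count_missing_cycles(cycle_index):
    """Count cycle numbers skipped between consecutive rows.

    Negative steps (the start of the next battery) are ignored.
    """
    steps = cycle_index.diff()
    gaps = steps[steps > 1]   # e.g. 4 -> 6 is a step of 2, i.e. one missing cycle
    return int((gaps - 1).sum())

# toy data: cycle 5 is missing in the first run, cycles 3 and 4 in the second
toy_cycles = pd.Series([1.0, 2.0, 3.0, 4.0, 6.0, 1.0, 2.0, 5.0])
n_missing = count_missing_cycles(toy_cycles)
print(n_missing)
```

Note that this sketch cannot see cycles missing at the very start or end of a battery's run, since there is no neighbouring row to compare against.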

------------------

### 1.2 Adding battery ID's to the data frame

In [7]:
# save a copy of the original data
bat_data_orig = bat_data.copy()

In [8]:
#get index of each first cycle
init_indexes = pd.Series(init_rows.index)
init_indexes.head(5)


0       0
1    1076
2    2155
3    3232
4    4313
dtype: int64

In [9]:
# define mapping function
def bat_id_map(init_indexes, val):
    
    # init_indexes -- pd.Series of initial cycle indexes
    #                 sorted lowest to highest
    
    # val -- comparison value
    return init_indexes[init_indexes<=val].idxmax()

# --------------------------------
# map batteries
bat_data['Battery_ID'] = bat_data.index.map(lambda i : bat_id_map(init_indexes,i))

In [10]:
# testing
display(bat_data.head())
bat_data.loc[ bat_data.Cycle_Index == 500 ].head()

Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL,Battery_ID
0,1.0,2595.3,1151.4885,3.67,3.211,5460.001,6755.01,10777.82,1112,0
1,2.0,7408.64,1172.5125,4.246,3.22,5508.992,6762.02,10500.35,1111,0
2,3.0,7393.76,1112.992,4.249,3.224,5508.993,6762.02,10420.38,1110,0
3,4.0,7385.5,1080.320667,4.25,3.225,5502.016,6762.02,10322.81,1109,0
4,6.0,65022.75,29813.487,4.29,3.398,5480.992,53213.54,56699.65,1107,0


Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL,Battery_ID
478,500.0,1604.16,459.839429,3.944,3.545,3098.36,3968.36,8013.7,613,0
1554,500.0,1587.87,409.285714,3.884,3.589,2974.752,3860.35,8379.71,608,1
2631,500.0,1631.84,470.571429,3.927,3.556,3176.223,4040.22,8225.22,608,2
3711,500.0,1638.15,480.857143,3.934,3.551,3205.151,4076.35,8285.41,608,3
4794,500.0,1656.0,476.0,3.935,3.549,3284.32,4148.32,8396.32,634,4
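
The row-by-row `map` above works, but it filters `init_indexes` once for every row. A vectorised alternative is sketched below, assuming that the cycle index only ever drops at the boundary between two batteries. The helper name `add_battery_id` and the toy frame are hypothetical.

```python
import pandas as pd

def add_battery_id(df, cycle_col="Cycle_Index"):
    """Label concatenated runs: a new battery starts wherever the cycle index drops."""
    out = df.copy()
    starts = out[cycle_col].diff() < 0   # True on the first row of each later battery
    out["Battery_ID"] = starts.cumsum().astype(int)
    return out

# toy data: three concatenated runs, with a skipped cycle in the second one
toy = pd.DataFrame({"Cycle_Index": [1.0, 2.0, 3.0, 1.0, 2.0, 4.0, 1.0]})
toy_ids = add_battery_id(toy)["Battery_ID"].tolist()
print(toy_ids)
```

Detecting a drop rather than testing for `Cycle_Index == 1` keeps the labelling correct even if the first cycle of a battery were missing.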


## 2. Data Overview
<sup>[Go to top](#-Battery-Data-Exploration)</sup>

- There are 14 batteries in the dataset
- Each battery is cycled ~1100 times
- Each cycle is one discharge and charge
- `Cycle_Index` indexes each cycle

- The battery is charged at a nominal rate of C/2 (1h/0.5 = 2h = 7.2k s) and discharged at a rate of 1.5C (1h/1.5 = 40 min = 2.4k s) 
- `Discharge` and `Charge` time are the times it takes to fully discharge and charge the battery
- `Decrement 3.6-3.4V` marks the time between those voltages during discharge
- `Max/Min Voltage` marks the max (min) voltage before each discharge (charge)
- `Time at 4.15V` indicates the moment in time during charge when the battery hits 4.15V
- During the charge cycle the battery is at first charged at **constant current** and then at **constant voltage**
- `Time constant current (s)` shows the time the battery has spent in **constant current** mode
- `RUL` - remaining useful life of the battery measured in **# of cycles**
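
The nominal charge and discharge durations quoted in the list above follow directly from the C-rate: a rate of 1C moves the full capacity in one hour. A small sketch of that arithmetic (the function name `nominal_duration_s` is illustrative):

```python
def nominal_duration_s(c_rate):
    """Nominal time in seconds to move the full capacity at a given C-rate."""
    return 3600.0 / c_rate

charge_s = nominal_duration_s(0.5)      # C/2 charge
discharge_s = nominal_duration_s(1.5)   # 1.5C discharge
print(charge_s, discharge_s)
```

These nominal values give a rough scale for judging which recorded charge and discharge times look like ordinary cycles.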

## 3. Looking at the distributions
<sup>[Go to top](#-Battery-Data-Exploration)</sup>

##### _Plot Cosmetics_

In [11]:
fig_labels = { 'Cycle_Index' : 'Cycle Number'
              ,'Battery_ID'   : 'Battery ID'
             }

#### 3.1 Cycle # vs Discharge and Charge Time

The discharge and charge times are expected to shorten with the number of cycles as the battery wears and its capacity decreases. I plot the cycle number against the discharging / charging time to see this relationship.

##### First attempt : Seeing the Outliers

In [12]:
# title
sp1_title = "Discharge and charge time development over consecutive cycles"

# plots 
plots_y = ['Discharge Time (s)','Charging time (s)']

sp_figs = []
for y in plots_y :
    fig = px.scatter(bat_data
                     ,x = 'Cycle_Index'
                     ,y = y
                     ,color = 'Battery_ID')
    sp_figs.append(fig)


# put px plots into subplots
sp1 = plotly_helpers.make_px_subplots(sp_figs,1,2,plots_y
                                      ,x_labels=['Cycle Number' for i in sp_figs ]
                                      ,y_labels=plots_y
                                      ,c_label='Battery ID')
 
    
# cosmetics
sp1.update_layout(title=sp1_title
                 ,coloraxis = {'colorscale':'viridis'})

# draw
sp1.show()


##### Second attempt : Outliers Removed

It is clear that there are many outliers in the data. These could be special charging cycles. Let's try observing the distributions with the outliers removed.

In [13]:
# making the same plots as above but with the outliers removed

outlier_th = [3e3,12e3]   # discharge, charge

# making subfigures
sp_figs = []
y_labels = []

for i,y in enumerate(plots_y) :
    fig = px.scatter(bat_data[bat_data[y] < outlier_th[i]]
                     ,x = 'Cycle_Index'
                     ,y = y
                     ,trendline='ols'
                     ,trendline_color_override='red'
                     ,color = 'Battery_ID')
    sp_figs.append(fig)
    y_labels.append(y+' < '+'{:.1e}'.format(outlier_th[i]))

# put px plots into subplots
sp2 = plotly_helpers.make_px_subplots(sp_figs,1,2,plots_y
                                      ,x_labels=['Cycle Index' for i in sp_figs ]
                                      ,y_labels=y_labels
                                      ,c_label='Battery ID')
 
    
# cosmetics
sp2.update_layout(title=sp1_title+", outliers removed"
                 ,coloraxis = {'colorscale':'viridis'})

# draw
sp2.show()



It is clear that both charge and discharge time are related to the cycle number.
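
The thresholds in `outlier_th` were picked by eye. A data-driven alternative, which is not what this notebook uses, is the upper Tukey fence (third quartile plus 1.5 times the interquartile range). The sketch below is an assumption about how such a cut could be computed; `iqr_upper_fence` and the toy values are illustrative.

```python
import pandas as pd

def iqr_upper_fence(values, k=1.5):
    """Upper Tukey fence: Q3 + k * IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    return q3 + k * (q3 - q1)

# toy data: four ordinary values and one obvious outlier
toy_times = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
fence = iqr_upper_fence(toy_times)
kept = toy_times[toy_times < fence].tolist()
print(fence, kept)
```

Because the charge and discharge times drift with the cycle number, a single global fence may be too loose for late cycles, so it would need checking against the plots before being trusted.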


#### 3.2 Cycle # vs Min and Max Voltage 

The maximum voltage at discharge and minimum voltage at charge are also related to the resistance and capacity of the battery.

For example, after a full charge, when the current is disconnected the voltage of the battery drops due to **polarisation effects**. This voltage is given as the variable `Max. Voltage Dischar.`, i.e. the max voltage before the start of the discharge. The size of the drop is positively related to the resistance of the battery, and the current used to charge it.     

As the battery wears, the **resistance of the battery increases**, the **maximum voltage at discharge** decreases, and the **minimum voltage at charge** increases. Let's see these distributions.
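
To make the dependence on resistance and current concrete, the drop can be approximated to first order by Ohm's law. This is a deliberate simplification, since real polarisation also includes slower relaxation effects, and the numbers below are made up for illustration and not taken from the dataset.

```python
def ohmic_drop_v(current_a, resistance_ohm):
    """First-order voltage step when a current is switched off: dV = I * R."""
    return current_a * resistance_ohm

# same charging current, hypothetical internal resistance before and after ageing
fresh_drop = ohmic_drop_v(1.0, 0.05)
aged_drop = ohmic_drop_v(1.0, 0.10)
print(fresh_drop, aged_drop)
```

With the current held fixed, a higher internal resistance gives a larger drop, which is why the voltage measured before discharge is expected to fall as the battery ages.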


In [14]:
# title
sp1_title = "Max and Min Voltage development over consecutive cycles"

# plots 
plots_y = ['Max. Voltage Dischar. (V)','Min. Voltage Charg. (V)']

sp_figs = []
for y in plots_y :
    fig = px.scatter(bat_data
                     ,x = 'Cycle_Index'
                     ,y = y
                     ,color = 'Battery_ID')
    sp_figs.append(fig)


# put px plots into subplots
sp3 = plotly_helpers.make_px_subplots(sp_figs,1,2,plots_y
                                      ,x_labels=['Cycle Number' for i in sp_figs ]
                                      ,y_labels=plots_y
                                      ,c_label='Battery ID')
 
    
# cosmetics
sp3.update_layout(title=sp1_title
                 ,coloraxis = {'colorscale':'viridis'})

# draw
sp3.show()



As expected, a clear relationship can be seen between the max (min) voltage at discharge (charge) and the cycle number. 
There are also clear indications of "non-standard" cycles creating outliers, as seen previously in charge time.

Interestingly, charging times and min. voltage at charge seem a lot more varied than their discharge counterparts.

It will likely be interesting to see whether removing outliers would improve or worsen the performance of a ML algorithm predicting battery life.

#### 3.3 Cycle # vs Decrement 3.6 to 3.4 V

This is the time spent between 3.6 V and 3.4 V during discharge. As the battery wears down the discharge becomes faster, so the time spent between these voltages should shorten as well.


##### First attempt: with Outliers

In [15]:
fig4_title= "Decrement 3.6-3.4V during discharge over consecutive cycles"
fig4 = px.scatter(bat_data
                  ,x='Cycle_Index'
                  ,y='Decrement 3.6-3.4V (s)'
                  ,color = 'Battery_ID'
                  ,color_continuous_scale='viridis'
                  ,labels=fig_labels
                  ,title =fig4_title)
fig4.show()

Notice that there is a huge number of outliers, which prevents visualising the distributions. Interestingly, there are a lot of events with negative time, which is likely bad data and will need to be dropped before using the data for ML prediction.

##### Second attempt: removing the outliers

In [16]:
th = [1e4,2e3] # threshold
sdc = 'Decrement 3.6-3.4V (s)'

sp5_title = fig4_title+', outliers removed'
 
sp_figs = []
sp_titles = []
for t in th : 
    plot_data = bat_data.loc[(bat_data[sdc]<t) & (bat_data[sdc]>0)]
    fig = px.scatter(plot_data
                    ,x='Cycle_Index'
                    ,y='Decrement 3.6-3.4V (s)'
                    ,color = 'Battery_ID')
    sp_figs.append(fig)
    sp_titles.append('Threshold = {:.0f}s'.format(t))
    
    
# put px plots into subplots
sp5 = plotly_helpers.make_px_subplots(sp_figs,1,2,sp_titles
                                      ,x_labels=['Cycle Number' for i in sp_figs ]
                                      ,y_labels=['Decrement 3.6-3.4V (s)' for i in sp_figs]
                                      ,c_label='Battery ID')
 
    
# cosmetics
sp5.update_layout(title=sp5_title
                 ,coloraxis = {'colorscale':'viridis'})

# draw
sp5.show()

We still see outliers above 2000s. These correspond to large discharge times.

In [17]:
index = bat_data.loc[(bat_data['Battery_ID'] == 1) &(bat_data['Cycle_Index'] == 303)].index[0]
bat_data.loc[index-2:index+2]

Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL,Battery_ID
1375,301.0,1808.62,495.0,3.927,3.528,3694.65,4616.25,8524.56,807,1
1376,302.0,1805.0,492.0,3.935,3.724,3598.312,4571.0,8699.5,806,1
1377,303.0,200939.71,2970.015,4.262,3.289,64709.988,75921.25,75921.25,805,1
1378,304.0,1860.7,501.0,3.922,3.505,4393.152,5444.35,9830.91,804,1
1379,305.0,1869.57,509.0,3.926,3.529,3867.551,4796.35,8728.93,803,1


As expected, the cycles with long discharge times also have long decrement times. It might be useful to drop all events with a decrement time > 2000 s before using them to predict RUL.

#### 3.3.1. Discharge Time vs Decrement 3.6-3.4V

A positive relationship between decrement time between 3.6-3.4V and full discharge time is expected. 


In [18]:
plot_data = bat_data.loc[(bat_data[sdc]>0)&(bat_data[sdc]<2000)] # decrement thresholds
plot_data = plot_data.loc[(plot_data['Discharge Time (s)']<10000)] # discharge time thresholds (for visibility)

fig6_title= "Decrement 3.6-3.4V vs Discharge time"
fig6 = px.scatter(plot_data
                  ,x='Discharge Time (s)'
                  ,y='Decrement 3.6-3.4V (s)'
                  ,color = 'Battery_ID'
                  ,color_continuous_scale='viridis'
                  ,labels=fig_labels
                  ,title =fig6_title)
fig6.show()

A positive relationship between 3.6-3.4V decrement and full discharge time is seen, as expected. 
Again, outliers with long discharge times are also clearly seen. It would possibly be beneficial to also remove data with `Discharge Time (s) > 3000` before using the data to predict RUL.
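
The strength of this relationship can also be put into a number. The sketch below computes a Pearson correlation after applying the same kind of cuts as in the cell above. The helper `pearson_after_cuts` and the toy frame are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

def pearson_after_cuts(df, x, y, x_max, y_max):
    """Pearson correlation of x and y, keeping rows with x < x_max and 0 < y < y_max."""
    sel = df[(df[x] < x_max) & (df[y] > 0) & (df[y] < y_max)]
    return sel[x].corr(sel[y])

# toy data: four ordinary cycles and one very long "non-standard" cycle
toy = pd.DataFrame({
    "Discharge Time (s)":     [1000.0, 1500.0, 2000.0, 2500.0, 200000.0],
    "Decrement 3.6-3.4V (s)": [300.0, 450.0, 600.0, 750.0, 2970.0],
})
r = pearson_after_cuts(toy, "Discharge Time (s)", "Decrement 3.6-3.4V (s)",
                       x_max=10000, y_max=2000)
print(r)
```

On the real data the call would take `bat_data` and the thresholds used for the plot. The toy rows are perfectly linear, so they only demonstrate the mechanics.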

#### 3.4 Cycle # vs Time at 4.15 V during charge



In [19]:
fig7_title= "Time at 4.15 V during charge over consecutive cycles"

# removing outliers
plot_data = bat_data.loc[(bat_data['Time at 4.15V (s)']<1e4)]


# make plot
fig7 = px.scatter(plot_data
                  ,x='Cycle_Index'
                  ,y='Time at 4.15V (s)'
                  ,color = 'Battery_ID'
                  ,color_continuous_scale='viridis'
                  ,labels=fig_labels
                  ,title =fig7_title)
fig7.show()

The figure above shows that the moment during charge at which the voltage reaches 4.15 V comes earlier as the cycle number increases. This behaviour is expected, as the time it takes to charge the battery shortens as the battery deteriorates.

Again, quite a lot of outliers were seen.

#### 3.4.1 Charge Time vs Time at 4.15 V during charge



In [20]:
plot_data = bat_data.loc[(bat_data['Time at 4.15V (s)']<1e4) 
                         & (bat_data['Time at 4.15V (s)']>0)] # Time at 4.15 threshold
plot_data = plot_data.loc[(plot_data['Charging time (s)']<1e4)] # charge time thresholds

fig8_title= "Time at 4.15V during charge versus Charge time"
fig8 = px.scatter(plot_data
                  ,x='Charging time (s)'
                  ,y='Time at 4.15V (s)'
                  ,color = 'Battery_ID'
                  ,color_continuous_scale='viridis'
                  ,labels=fig_labels
                  ,title =fig8_title)
fig8.show()

A positive relationship can clearly be seen, as expected. 

Also, some negative values in `Time at 4.15V` records were found. Such cycles should be removed before using this data to predict RUL.

There are also peculiar events where the full charging time is shorter than the time at 4.15 V during the charge part of the cycle. Given known information, this is unphysical. Let's check how many cycles have this property.


In [21]:
weird_ch_data = bat_data.loc[bat_data['Charging time (s)']<bat_data['Time at 4.15V (s)']]
print('# weird events: {}/{}'.format(len(weird_ch_data),len(bat_data)))


# weird events: 28/15064


Such events only make up a small fraction of all events and will be removed.

I will also check events where `Decrement 3.6-3.4V (s)` is greater than `Discharge Time (s)` which also should be removed.

In [22]:
weird_disch_data = bat_data.loc[bat_data['Discharge Time (s)']<bat_data['Decrement 3.6-3.4V (s)']]
print('# weird events: {}/{}'.format(len(weird_disch_data),len(bat_data)))



# weird events: 33/15064


In all, a small number of cycles where `Time at 4.15V (s)` > `Charging time (s)` or `Decrement 3.6-3.4V (s)` > `Discharge Time (s)` were found and should be removed from the data.

#### 3.5 Cycle # vs Time at Constant Current (CC)

To charge the battery two charging modes are used: constant current (CC) and constant voltage (CV).
It is expected that the `Time constant current (s)` variable will behave similarly to `Charging time (s)`.

In [23]:
fig9_title= "Time at CC charge mode over consecutive cycles, outliers removed"

plot_data = bat_data.loc[(bat_data['Time constant current (s)']<1e4)] # CC threshold

fig9 = px.scatter(plot_data
                  ,x='Cycle_Index'
                  ,y='Time constant current (s)'
                  ,color = 'Battery_ID'
                  ,color_continuous_scale='viridis'
                  ,labels=fig_labels
                  ,title =fig9_title)
fig9.show()

As expected, `Time constant current (s)` behaves similarly to  `Charging time (s)` and has a negative correlation with the number of cycles. As the battery deteriorates the charging time becomes shorter.

Now I will look at the relationship of `Time constant current (s)` and `Charging time (s)`

#### 3.5.1. Charge Time vs Time at CC



In [24]:
plot_data = bat_data.loc[(bat_data['Time constant current (s)']<1e4)] # CC threshold
plot_data = plot_data.loc[(plot_data['Charging time (s)']<1e4)]       # charge time threshold

fig10_title= "Charge time vs Time at CC"
fig10 = px.scatter(plot_data
                  ,x='Charging time (s)'
                  ,y='Time constant current (s)'
                  ,color = 'Battery_ID'
                  ,color_continuous_scale='viridis'
                  ,labels=fig_labels
                  ,title =fig10_title)
fig10.show()

In [25]:
weird_ch_data2 = bat_data.loc[bat_data['Charging time (s)']<bat_data['Time constant current (s)']]
print('# weird events: {}/{}'.format(len(weird_ch_data2),len(bat_data)))

# weird events: 0/15064


As expected there is a positive relationship between `Time constant current (s)` and `Charging time (s)`. I also checked if there were any unphysical cycle records where `Time constant current (s)` > total `Charging time (s)`, and there were no such records.

## 4. Predicting RUL
<sup>[Go to top](#-Battery-Data-Exploration)</sup>

### 4.1 Data Cleaning

As discussed in the previous section, there are some cases with unphysical records and a lot of outlying events. Before I try to use this data to predict the RUL of a battery, I need to clean this data.

**NOTE:** at first I will attempt to predict RUL **without any outlier removal**

In [26]:
clean_data1 = bat_data.copy()

In [27]:
# ---------------------------------
# function to print crosscheck
def print_cc(idf,fdf):
    ilen = len(idf)
    flen = len(fdf)
    diff = ilen - flen

    output = """
-------------
Crosscheck 
    len(data) before: {}
    len(data) after:  {} 
    diff:             {}
""".format(ilen,flen,diff)  
    
    print(output)
# ---------------------------------

#### 4.1.1 Checking missing records

In [28]:
# print rows with missing records
bat_data[bat_data.isna().any(axis=1)]


Unnamed: 0,Cycle_Index,Discharge Time (s),Decrement 3.6-3.4V (s),Max. Voltage Dischar. (V),Min. Voltage Charg. (V),Time at 4.15V (s),Time constant current (s),Charging time (s),RUL,Battery_ID


No missing records found.

#### 4.1.2 Removing unphysical records

In the case of an unphysical measurement I will remove the record row.


**Negative Values** 

In [29]:
neg_counts = (bat_data<0).sum()

print('Counts of Negative Values:\n{}\n\nTotal:{}'.format(neg_counts,neg_counts.sum()))

clean_data1 = bat_data[(bat_data>=0).all(axis=1)]

print_cc(bat_data,clean_data1)


Counts of Negative Values:
Cycle_Index                   0
Discharge Time (s)            0
Decrement 3.6-3.4V (s)       24
Max. Voltage Dischar. (V)     0
Min. Voltage Charg. (V)       0
Time at 4.15V (s)             9
Time constant current (s)     0
Charging time (s)             0
RUL                           0
Battery_ID                    0
dtype: int64

Total:33

-------------
Crosscheck 
    len(data) before: 15064
    len(data) after:  15031 
    diff:             33



**Unphysical time measurements**

In [30]:
# make copy so as not to overwrite the data
clean_data_2 = clean_data1.copy()

# Get number of rows with unphysical events
weird_df = pd.concat([weird_disch_data,weird_ch_data]).drop_duplicates()

output = """
Rows with `Discharge Time (s)` < `Decrement 3.6-3.4V (s)` 
     OR   `Charging time (s)`  < `Time at 4.15V (s)`
: {} """.format(len(weird_df))

print(output)

# removing weird discharge events
clean_data2 = clean_data1[
    clean_data1['Discharge Time (s)'] > clean_data1['Decrement 3.6-3.4V (s)']
]


# removing weird charge events
clean_data2 = clean_data2[
    clean_data2['Charging time (s)']  > clean_data2['Time at 4.15V (s)']
]

print_cc(clean_data1,clean_data2)



Rows with `Discharge Time (s)` < `Decrement 3.6-3.4V (s)` 
     OR   `Charging time (s)`  < `Time at 4.15V (s)`
: 35 

-------------
Crosscheck 
    len(data) before: 15031
    len(data) after:  14996 
    diff:             35



In [31]:
# use the filtered frame (clean_data2); clean_data_2 is only an unfiltered copy of clean_data1
clean_data = clean_data2.copy()

Total rows removed: 68
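
The two cleaning steps above can be wrapped into one function that also reports how many rows each step removes, which makes the crosscheck easy to repeat. This is a minimal sketch under the same conditions as the cells above (non-negative values, `Discharge Time (s)` > `Decrement 3.6-3.4V (s)`, `Charging time (s)` > `Time at 4.15V (s)`). The name `clean_cycles` and the toy frame are hypothetical.

```python
import pandas as pd

def clean_cycles(df):
    """Drop rows with negative values, then rows with inconsistent timings.

    Returns the cleaned frame and the number of rows removed at each step.
    """
    numeric = df.select_dtypes("number")
    step1 = df[(numeric >= 0).all(axis=1)]

    ok_discharge = step1["Discharge Time (s)"] > step1["Decrement 3.6-3.4V (s)"]
    ok_charge = step1["Charging time (s)"] > step1["Time at 4.15V (s)"]
    step2 = step1[ok_discharge & ok_charge]

    removed = {"negative": len(df) - len(step1),
               "unphysical": len(step1) - len(step2)}
    return step2, removed

# toy data: one good row, one negative value, two inconsistent timings
toy = pd.DataFrame({
    "Discharge Time (s)":     [2000.0, 2000.0, 500.0, 2000.0],
    "Decrement 3.6-3.4V (s)": [500.0, -10.0, 900.0, 500.0],
    "Charging time (s)":      [9000.0, 9000.0, 9000.0, 3000.0],
    "Time at 4.15V (s)":      [4000.0, 4000.0, 4000.0, 5000.0],
})
cleaned, removed = clean_cycles(toy)
print(len(cleaned), removed)
```

Returning the per-step counts alongside the frame makes it harder to carry the wrong intermediate frame forward by accident.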

#### 4.1.3 Plotting Distributions

In [40]:
cols = ['Discharge Time (s)'
        ,'Decrement 3.6-3.4V (s)'
        ,'Max. Voltage Dischar. (V)'
        ,'Min. Voltage Charg. (V)'
        ,'Time at 4.15V (s)'
        ,'Time constant current (s)'
        ,'Charging time (s)']

# thresholds for visibility
pltd = clean_data.copy()
pltd = pltd[pltd['Discharge Time (s)']<5e3]
 
figs = []
for c in cols : 
    #print(c)
    fig = px.scatter(pltd
                     ,x='Cycle_Index'
                     ,y=c
                     ,color='Battery_ID'
                     ,color_continuous_scale='viridis'
                     ,labels=fig_labels)
    
    figs.append(fig)
    
# put px plots into subplots
test_fig = plotly_helpers.make_px_subplots(figs,7,1,cols
                                          ,x_labels=['Cycle Number' for i in figs ]
                                          ,y_labels=cols
                                          ,c_label='Battery ID')

# # --------------------------
# # >> uncomment to show plots

# # cosmetics
# test_fig.update_layout(title="Test Distributions"
#                        ,coloraxis = {'colorscale':'viridis'}
#                        ,height=4000
#                        ,coloraxis_showscale=False)

# # draw
# test_fig.show()


### 4.2 Setting up data 

In [33]:
# TODO's 

# print figs to file

# make sure things are commented
