Processing data from: https://data.matr.io/1/projects/5c48dd2bc625d700019f3204/batches/5c86bd64fa2ede00015ddbb3

# Importing Dependencies
Here we import all the dependencies

In [2]:
import h5py
import numpy as np
import pickle

# Loading the data
Let us load the data as an HDF5 file.
To learn more about the Hierarchical Data Format (HDF), visit: https://en.wikipedia.org/wiki/Hierarchical_Data_Format

In [3]:
name = "/content/drive/MyDrive/BTP/Datasets/Severson/2018-04-12_batchdata_updated_struct_errorcorrect.mat"
f = h5py.File(name, "r")  # open read-only; older h5py versions default to a writable mode

# Walking through the data
Let's check which keys this batch has

In [4]:
batch = f["batch"]
list(batch.keys())

['Vdlin',
 'barcode',
 'channel_id',
 'cycle_life',
 'cycles',
 'policy',
 'policy_readable',
 'summary']

Let us compute the number of cells this batch contains

In [5]:
number_of_cells = batch["summary"].shape[0]
print(number_of_cells)

46


For every cell, there is a summary provided which contains the following keys

In [6]:
keys_per_cell = list(f[batch["summary"][0, 0]].keys())
print(keys_per_cell)

['IR', 'QCharge', 'QDischarge', 'Tavg', 'Tmax', 'Tmin', 'chargetime', 'cycle']


**IR** - Internal Resistance <br>
**QCharge** - Charge Capacity <br>
**QDischarge** - Discharge Capacity <br>
**Tavg** - Average Temperature <br>
**Tmax** - Maximum Temperature <br>
**Tmin** - Minimum Temperature <br>
**chargetime** - Charging time <br>
**cycle** - Cycle number <br>
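
The indexing pattern `f[batch["summary"][0, 0]]` used above works because the entries of `batch["summary"]` are HDF5 object references rather than the data itself; indexing the file with a reference returns the object it points to. A minimal sketch of that mechanism on a small in-memory file follows. The file name, group names and capacity values are made up for illustration and are not taken from the dataset.

```python
import h5py
import numpy as np

# In-memory HDF5 file (nothing is written to disk); all names here are hypothetical.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as demo:
    # Two groups standing in for the summaries of two cells.
    for idx, caps in enumerate([[1.10, 1.09], [1.08, 1.07, 1.06]]):
        grp = demo.create_group(f"cell{idx}")
        grp.create_dataset("QDischarge", data=np.array([caps]))  # shape (1, N)

    # A (2, 1) dataset of object references, one reference per cell.
    refs = demo.create_dataset("summary", (2, 1), dtype=h5py.ref_dtype)
    refs[0, 0] = demo["cell0"].ref
    refs[1, 0] = demo["cell1"].ref

    # Indexing the file with a reference returns the referenced group.
    keys = list(demo[refs[0, 0]].keys())
    q_second = demo[refs[1, 0]]["QDischarge"][0, :]

print(keys, q_second)
```

The same two-step lookup (read the reference, then index the file with it) is what every `f[batch[...][i, 0]]` expression in this notebook does.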

The per-cycle data for each cell contains the following keys

In [7]:
keys_per_cycle = list(f[batch["cycles"][0, 0]].keys())
print(keys_per_cycle)

['I', 'Qc', 'Qd', 'Qdlin', 'T', 'Tdlin', 'V', 'discharge_dQdV', 't']


These are values per cycle<br>
**I** - Current <br>
**Qc** - Charge Capacity <br>
**Qd** - Discharge Capacity <br>
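**Qdlin** - Discharge capacity linearly interpolated onto the voltage grid `Vdlin` <br>
**T** - Temperature<br>
**Tdlin** - Temperature linearly interpolated onto the voltage grid `Vdlin`<br>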
**V** - Voltage <br>
**discharge_dQdV** - Discharging dQ/dV<br>
**t** - Time<br>
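
The `lin` suffix on `Qdlin` and `Tdlin` suggests quantities resampled onto the shared voltage grid stored under the batch key `Vdlin`. A minimal sketch of that kind of resampling is given here, using a synthetic discharge curve. The curve shape, the 3.5 V to 2.0 V range and the 1000-point grid are assumptions for illustration, not values read from the file.

```python
import numpy as np

# Synthetic discharge curve (made-up numbers): voltage falls as capacity is drawn.
Qd = np.linspace(0.0, 1.1, 50)        # discharge capacity in Ah
V = 3.5 - 1.5 * (Qd / 1.1) ** 2       # voltage in V, toy model from 3.5 V down to 2.0 V

# Assumed uniform voltage grid, playing the role of Vdlin.
Vdlin_demo = np.linspace(3.5, 2.0, 1000)

# np.interp expects increasing sample points, so both curves are reversed first.
Qdlin_demo = np.interp(Vdlin_demo, V[::-1], Qd[::-1])

print(Qdlin_demo.shape)
```

Putting every cycle on the same voltage grid makes curves from different cycles directly comparable element by element, for example by subtracting one cycle's interpolated capacity from another's.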

# Creating a dataset
The code below builds a database as a Python dictionary, which we store for reuse because processing the online dataset is costly and requires a decent amount of computational power.

In [8]:
data = {}
for i in range(number_of_cells):
  # Scalar cycle life of cell i
  cycle_life = f[batch["cycle_life"][i, 0]][0, 0]
  # Read the full character array with [()]; [0, 0] alone would return a single character
  policy = f[batch["policy_readable"][i, 0]][()].tobytes()[::2].decode()
  batch_summary = f[batch["summary"][i, 0]]
  cycles = f[batch["cycles"][i, 0]]
  # One list per summary key, with one entry per cycle
  summary_data = {}
  for val in keys_per_cell:
    summary_data[val] = list(np.hstack(batch_summary[val][0, :]))
  # Raw measurements of every cycle, keyed by the cycle index as a string
  cycle_data = {}
  num_cycles = cycles["I"].shape[0]
  for j in range(num_cycles):
    cd = {}
    for val in keys_per_cycle:
      cd[val] = np.hstack(f[cycles[val][j, 0]])
    cycle_data[str(j)] = cd
  cell_dict = {"cycle_life": cycle_life, "charge_policy": policy, "summary": summary_data, "cycles": cycle_data}
  # Cell keys look like "b3c0", "b3c1", ...
  key = f"b3c{i}"
  data[key] = cell_dict
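
The `tobytes()[::2].decode()` idiom in the loop above relies on the characters being stored as 16-bit integers, so that for ASCII text every second byte is zero. A small sketch of why the slice works, using a hypothetical policy label (not read from the file) and assuming little-endian byte order:

```python
# Hypothetical policy label; real labels come from batch["policy_readable"].
policy_text = "5C(67%)-4C"

# Each character stored as a 16-bit code unit, little-endian (assumed byte order).
raw = policy_text.encode("utf-16-le")

# What the loop does: keep every second byte, i.e. the low byte of each character.
via_slice = raw[::2].decode()

# Decoding the bytes as UTF-16-LE gives the same text and also handles non-ASCII characters.
via_codec = raw.decode("utf-16-le")

print(via_slice, via_codec)
```

The slice is compact but silently corrupts any character outside the ASCII range, so the explicit codec is the safer choice if labels could ever contain such characters.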

The code below checks how many cells were collected for this batch (should be 46)

In [9]:
len(data)

46

# Storing the data
Let us now store this database as a pickle file so that we can easily load it whenever we want

In [10]:
with open('/content/drive/MyDrive/BTP/Datasets/Severson/batch3.pkl','wb') as fp:
  pickle.dump(data, fp)

We have now stored the data on our Google Drive. The same procedure was applied to Batch 1 and Batch 2. <br>
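
Reading the file back is the mirror image of the dump. The sketch below round-trips a tiny stand-in dictionary with the same nesting as `data`. The values, the temporary path and the name `data_demo` are made up for illustration, and plain lists are used where the real dictionary holds NumPy arrays.

```python
import os
import pickle
import tempfile

# Tiny stand-in with the same nesting as `data`; all values are hypothetical.
data_demo = {
    "b3c0": {
        "cycle_life": 1009.0,
        "charge_policy": "5C(67%)-4C",
        "summary": {"QDischarge": [1.07, 1.06], "cycle": [1.0, 2.0]},
        "cycles": {"0": {"V": [3.3, 3.2], "t": [0.0, 0.5]}},
    }
}

# A temporary path is used here instead of the Google Drive path.
path = os.path.join(tempfile.mkdtemp(), "batch_demo.pkl")
with open(path, "wb") as fp:
    pickle.dump(data_demo, fp)

# Load the dictionary back from disk.
with open(path, "rb") as fp:
    loaded = pickle.load(fp)

# Cells are keyed "b3c<index>"; cycles are keyed by their index as a string.
cell = loaded["b3c0"]
print(cell["charge_policy"], len(cell["summary"]["QDischarge"]))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.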

After processing the data, we visualise it. The visualisation is done in a separate notebook; visit this link