# Creating Archive-Ready Metadata
The raw data is split into a few different files:
- [A mapping of tests to filenames](./raw-data/Summary_of_CAMP_Cells.xlsx)
- [A mapping of tests to battery design](./raw-data/Summary_of_builds_JK.xlsx)
- The actual raw data from the machines in MACCOR format

In [1]:
from batdata.extractors.maccor import MACCORExtractor
from batdata.data import BatteryDataset
from shutil import rmtree
from tqdm.auto import tqdm
from pathlib import Path
import pandas as pd


Configuration

In [2]:
data_path = Path('./raw-data/CAMP_data/')
h5_path = Path('./data/')

## Load in the Mapping Spreadsheets
These spreadsheets describe the contents of our MACCOR files: which test each file belongs to, and which battery design each test used.

In [3]:
test_descriptions = pd.read_excel('raw-data/Summary_of_CAMP_Cells.xlsx')
test_descriptions.head(2)

Unnamed: 0.1,Unnamed: 0,File Name,Owner,Batch,Cell Number,Cell Test,Start Time,Initial Cycle Number,Last Cycle,Test Time,Max Capacity (Ah),Max Energy,Max Current (A),Min Voltage,Max Voltage,Date of Test,Path,File Comments,Procedure,Number of Cycles in file
0,0,ARGONNE #20_SET-LN3024-104-1a.001,SET,LN3024_104,1,1a,03/31/2016 16:05:31,0.0,0.0,1.1667,0.0,0.0,0.0,3.305715,3.306783,\t03/31/2016\t,\tC:\Data\MIMS\Backup\ARGONNE #20\SET-LN3024-1...,SET-LN3024-104 Targray NCM811 [LN2086-32-4] ...,ABRHV-NCM523-Form-4p1.000NCM 523 Formation T...,0.0
1,1,ARGONNE #20_SET-LN3024-104-1aa.001,SET,LN3024_104,1,1aa,03/31/2016 16:07:53,0.0,3.0,4942.6788,0.003038,0.01179,0.000242,2.999924,4.300908,\t03/31/2016\t,\tC:\Data\MIMS\Backup\ARGONNE #20\SET-LN3024-1...,SET-LN3024-104 Targray NCM811 [LN2086-32-4] ...,ABRHV-NCM523-Form-4p3.000NCM 523 Formation T...,3.0


In [4]:
cell_descriptions = pd.read_excel('raw-data/Summary_of_builds_JK.xlsx')
cell_descriptions.head(2)

Unnamed: 0,build,anode,cathode,description,electrolyte,electrolyte_additive,total_cathode_area (cm2),number_layers,anode_supplier,anode_mat_name,...,cathode_supplier.1,target_capacity (Ah),anode_thickness (um),anode_loading (mg/cm2),anode_porosity,cathode_thickness (um),cathode_loading (mg/cm2),cathode_porosity,temperature (C),Notes
0,B1,C,HE5050,A12 vs. Toda HE5050,Gen 2,NONE,,,Conoco-Phillips,A12,...,TodaHE5050,0.375,86,5.75,35,68,14.5,42,30,
1,B1A,C,HE5050,A12 vs. Toda HE5050,Gen 2,NONE,,,Conoco-Phillips,A12,...,TodaHE5050,0.375,86,5.75,35,68,14.5,42,30,


### Filter down to best-documented cells
Keep only the test descriptions whose "Batch" value appears in the "build" column of the cell descriptions.

In [5]:
documented_builds = set(cell_descriptions['build'])  # Build the lookup set once, not once per row
is_documented = test_descriptions['Batch'].apply(lambda x: x in documented_builds)

In [6]:
print(f'Found descriptions for {is_documented.sum()}/{len(is_documented)} tests')

Found descriptions for 3409/8618 tests
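
The membership check above can also be written with pandas' vectorized `Series.isin`. The following is a minimal sketch on made-up data; the `tests` and `cells` frames are hypothetical stand-ins for the two spreadsheets, not the real files. One caveat: `isin` treats missing values as matching each other, which may differ from the plain Python `in` check if either column contains blanks.

```python
import pandas as pd

# Toy stand-ins for the two spreadsheets (values are made up for illustration)
tests = pd.DataFrame({
    'Batch': ['B1', 'B1A', 'LN3024_104', 'B1'],
    'File Name': ['a.001', 'b.001', 'c.001', 'd.001'],
})
cells = pd.DataFrame({'build': ['B1', 'B1A']})

# Vectorized membership test: True where the batch has a matching build
is_documented = tests['Batch'].isin(cells['build'])
print(f'Found descriptions for {is_documented.sum()}/{len(is_documented)} tests')

# Boolean indexing keeps only the documented tests
documented_tests = tests[is_documented]
```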


In [7]:
test_descriptions = test_descriptions[is_documented]

In [8]:
print(f'There is a total of {len(test_descriptions[["Batch", "Cell Number"]].value_counts())} unique cells')

There is a total of 611 unique cells
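
Once the tests are filtered, each one can be linked to its battery design by joining "Batch" against "build". The sketch below uses small made-up frames (the `tests` and `cells` names and all values are assumptions for illustration) and assumes each build appears only once in the design table, which may not hold for the real spreadsheet.

```python
import pandas as pd

# Toy stand-ins for the filtered test table and the design table
tests = pd.DataFrame({
    'Batch': ['B1', 'B1', 'B1A'],
    'Cell Number': [1, 1, 2],
    'File Name': ['t1.001', 't2.001', 't3.001'],
})
cells = pd.DataFrame({
    'build': ['B1', 'B1A'],
    'cathode': ['HE5050', 'HE5050'],
    'target_capacity (Ah)': [0.375, 0.375],
})

# A cell is identified by its (Batch, Cell Number) pair
n_cells = len(tests[['Batch', 'Cell Number']].drop_duplicates())

# Left join keeps every test and attaches the design columns of its build.
# validate='many_to_one' raises if a build is listed more than once.
merged = tests.merge(cells, left_on='Batch', right_on='build',
                     how='left', validate='many_to_one')
```

The `validate` argument is a cheap guard: if the design table ever contains duplicate builds, the join fails loudly instead of silently duplicating test rows.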


## Load in an Example Test
Tests are stored in MACCOR format. Let's load one in to see how the data looks

In [9]:
extractor = MACCORExtractor()

In [10]:
data = extractor.generate_dataframe('raw-data/example/ARGONNE_11_CFF-B13A-P9b.033')

## Process all known cells
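Loop through every documented cell, parse all of its test files together, and save the result in HDF5 format. Cells whose files fail to parse are skipped.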

In [11]:
if h5_path.is_dir():
    rmtree(h5_path)
h5_path.mkdir()
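
Clearing the output directory before each run means no stale files from an earlier run can be mistaken for fresh output. The pattern can be wrapped in a small helper, sketched here against a temporary directory; `reset_dir` is a hypothetical name, not part of the notebook or of `batdata`.

```python
from pathlib import Path
from shutil import rmtree
from tempfile import TemporaryDirectory


def reset_dir(path: Path) -> Path:
    """Delete `path` if it exists, then recreate it empty."""
    if path.is_dir():
        rmtree(path)
    path.mkdir(parents=True)  # parents=True also creates missing parent folders
    return path


with TemporaryDirectory() as tmp:
    out = Path(tmp) / 'data'
    reset_dir(out)              # first run: directory is created
    (out / 'stale.h5').touch()  # simulate output left over from an earlier run
    reset_dir(out)              # second run: old output is wiped
    leftover = list(out.iterdir())
```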

In [12]:
success_count = 0
for (batch_id, cell_id), group in tqdm(test_descriptions.groupby(['Batch', 'Cell Number'])):
    # Get the paths of all test files for this cell
    files = group['File Name'].apply(lambda x: data_path / x).tolist()
    
    # Parse them
    try:
        data = extractor.parse_to_dataframe(files)
    except Exception as exc:
        #print(batch_id, cell_id, files[0], exc)
        continue
    
    # Save it to the HDF5 format
    name = f'batch_{batch_id}_cell_{cell_id}.h5'
    data.to_batdata_hdf(h5_path / name, complevel=9)
    success_count += 1
print(f'Succeeded in parsing {success_count} cells')

100%|██████████| 611/611 [09:57<00:00,  1.02it/s]

Succeeded in parsing 310 cells
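
The loop above skips a cell whenever parsing raises an exception, and only 310 of the 611 cells were parsed, so about half were dropped without any record of why. One way to keep that information is to collect the failures as the loop runs. The sketch below is self-contained and uses a hypothetical `parse_files` function in place of the real extractor call; the `groups` data is made up.

```python
from pathlib import Path


def parse_files(files):
    """Hypothetical stand-in for the extractor call; fails on '.bad' files."""
    if any(f.suffix == '.bad' for f in files):
        raise ValueError('unreadable file')
    return {'n_files': len(files)}


# Made-up grouping of test files by (batch, cell)
groups = {
    ('B1', 1): [Path('a.001'), Path('a.002')],
    ('B1', 2): [Path('b.bad')],
    ('B1A', 1): [Path('c.001')],
}

successes, failures = [], []
for (batch_id, cell_id), files in groups.items():
    try:
        parse_files(files)
    except Exception as exc:
        # Record enough context to revisit the failure later
        failures.append({'batch': batch_id, 'cell': cell_id,
                         'first_file': str(files[0]), 'error': repr(exc)})
        continue
    successes.append((batch_id, cell_id))

print(f'Succeeded in parsing {len(successes)}/{len(groups)} cells')
```

The `failures` list can then be turned into a table (for example with `pd.DataFrame(failures)`) and grouped by error message to see whether a few causes account for most of the skipped cells.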



