# Creating Archive-Ready Metadata
The raw data is split into a few different files:
- [A mapping of tests to filenames](./raw-data/Summary_of_CAMP_Cells.xlsx)
- [A mapping of tests to battery design](./raw-data/Summary_of_builds_JK.xlsx)
- The actual raw data from the machines in MACCOR format

In [1]:
from batdata.extractors.maccor import MACCORExtractor
from batdata.schemas.battery import ElectrodeDescription, ElectrolyteDescription, BatteryDescription
from batdata.schemas import BatteryMetadata
from batdata.data import BatteryDataset
from shutil import rmtree
from tqdm.auto import tqdm
from pathlib import Path
import pandas as pd

Configuration

In [2]:
data_path = Path('./raw-data/CAMP_data/')
h5_path = Path('./data/')

## Load in the Mapping Spreadsheets
These spreadsheets describe the contents of our MACCOR files.

In [3]:
test_descriptions = pd.read_excel('raw-data/Summary_of_CAMP_Cells.xlsx')
test_descriptions.head(2)

Unnamed: 0.1,Unnamed: 0,File Name,Owner,Batch,Cell Number,Cell Test,Start Time,Initial Cycle Number,Last Cycle,Test Time,Max Capacity (Ah),Max Energy,Max Current (A),Min Voltage,Max Voltage,Date of Test,Path,File Comments,Procedure,Number of Cycles in file
0,0,ARGONNE #20_SET-LN3024-104-1a.001,SET,LN3024_104,1,1a,03/31/2016 16:05:31,0.0,0.0,1.1667,0.0,0.0,0.0,3.305715,3.306783,\t03/31/2016\t,\tC:\Data\MIMS\Backup\ARGONNE #20\SET-LN3024-1...,SET-LN3024-104 Targray NCM811 [LN2086-32-4] ...,ABRHV-NCM523-Form-4p1.000NCM 523 Formation T...,0.0
1,1,ARGONNE #20_SET-LN3024-104-1aa.001,SET,LN3024_104,1,1aa,03/31/2016 16:07:53,0.0,3.0,4942.6788,0.003038,0.01179,0.000242,2.999924,4.300908,\t03/31/2016\t,\tC:\Data\MIMS\Backup\ARGONNE #20\SET-LN3024-1...,SET-LN3024-104 Targray NCM811 [LN2086-32-4] ...,ABRHV-NCM523-Form-4p3.000NCM 523 Formation T...,3.0


In [4]:
cell_descriptions = pd.read_excel('raw-data/Summary_of_builds_JK.xlsx')
cell_descriptions.head(2)

Unnamed: 0,build,anode,cathode,description,electrolyte,electrolyte_additive,total_cathode_area (cm2),number_layers,anode_supplier,anode_mat_name,...,cathode_supplier.1,target_capacity (Ah),anode_thickness (um),anode_loading (mg/cm2),anode_porosity,cathode_thickness (um),cathode_loading (mg/cm2),cathode_porosity,temperature (C),Notes
0,B1,C,HE5050,A12 vs. Toda HE5050,Gen 2,NONE,,,Conoco-Phillips,A12,...,TodaHE5050,0.375,86,5.75,35,68,14.5,42,30,
1,B1A,C,HE5050,A12 vs. Toda HE5050,Gen 2,NONE,,,Conoco-Phillips,A12,...,TodaHE5050,0.375,86,5.75,35,68,14.5,42,30,


### Filter down to best-documented cells
Keep only the test descriptions whose "Batch" appears in the cell descriptions.

In [5]:
is_documented = test_descriptions['Batch'].isin(cell_descriptions['build'])

In [6]:
print(f'Found descriptions for {is_documented.sum()}/{len(is_documented)} tests')

Found descriptions for 3409/8618 tests


In [7]:
test_descriptions = test_descriptions[is_documented]

In [8]:
print(f'There is a total of {len(test_descriptions[["Batch", "Cell Number"]].value_counts())} unique cells')

There is a total of 611 unique cells
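
The filtering and counting steps above can be checked on a toy example. This is a minimal sketch with made-up rows (the `tests` and `builds` frames are illustrative, not the real spreadsheets); it uses `Series.isin` for the membership test and `drop_duplicates` as a cross-check on the `value_counts` length.

```python
import pandas as pd

# Made-up stand-ins for the two spreadsheets (illustrative values only)
tests = pd.DataFrame({
    'Batch': ['B1', 'B1', 'B2', 'X9', 'B2'],
    'Cell Number': [1, 1, 3, 1, 4],
})
builds = pd.DataFrame({'build': ['B1', 'B2']})

# A test is "documented" if its batch has a row in the build sheet
is_documented = tests['Batch'].isin(builds['build'])
documented = tests[is_documented]

# A cell is identified by its (batch, cell number) pair
unique_cells = documented[['Batch', 'Cell Number']].drop_duplicates()
n_documented = int(is_documented.sum())
n_cells = len(unique_cells)
n_cells_via_counts = len(documented[['Batch', 'Cell Number']].value_counts())

print(f'{n_documented}/{len(tests)} tests documented, {n_cells} unique cells')
```

Several tests can belong to the same cell, which is why the number of unique cells is smaller than the number of documented tests.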


## Create a Function to Document Cell
Build batdata-compliant metadata for a test from the information in the "test descriptions" and "cell descriptions" spreadsheets.
The new format contains the same information, but mapped to community-agreed-upon names for each concept.

First get an example record

In [9]:
record = test_descriptions.iloc[0]
record

Unnamed: 0                                                                202
File Name                                        ARGONNE_10_CFF-B11A-P13a.003
Owner                                                                     CFF
Batch                                                                    B11A
Cell Number                                                                13
Cell Test                                                                P13a
Start Time                                                           09:59:33
Initial Cycle Number                                                      0.0
Last Cycle                                                                7.0
Test Time                                                           10000.555
Max Capacity (Ah)                                                    0.265064
Max Energy                                                           0.809472
Max Current (A)                                                 

Look up the cell metadata

In [10]:
cell_metadata = cell_descriptions.query(f'build == "{record["Batch"]}"').iloc[0]
cell_metadata

build                                  B11A
anode                                     C
cathode                            5Vspinel
description                 A12 vs NEI LMNO
electrolyte                           Gen 2
electrolyte_additive                   NONE
total_cathode_area (cm2)              169.2
number_layers                            12
anode_supplier              Conoco-Phillips
anode_mat_name                          A12
cathode_supplier                        NEI
cathode_supplier.1                    SP-10
target_capacity (Ah)                    0.3
anode_thickness (um)                     59
anode_loading (mg/cm2)                 12.5
anode_porosity                         31.8
cathode_thickness (um)                   62
cathode_loading (mg/cm2)              14.77
cathode_porosity                         35
temperature (C)                          30
Notes                                   NaN
Name: 19, dtype: object

We just need to rearrange this data into the structure provided by `batdata`.

In [11]:
cathode_metadata = ElectrodeDescription(
    name=cell_metadata['cathode'],
    supplier=cell_metadata['cathode_supplier'],
    product=cell_metadata['cathode_supplier.1'],
    thickness=cell_metadata['cathode_thickness (um)'],
    area=cell_metadata['total_cathode_area (cm2)'],
    loading=cell_metadata['cathode_loading (mg/cm2)'],
    porosity=cell_metadata['cathode_porosity']
)
print(cathode_metadata.json(indent=2))

{
  "name": "5Vspinel",
  "supplier": "NEI",
  "product": "SP-10",
  "thickness": 62.0,
  "area": 169.2,
  "loading": 14.77,
  "porosity": 35.0
}
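
The rearrangement is essentially a renaming of spreadsheet columns to schema fields. A minimal sketch of that idea, using a plain dataclass as a stand-in: `ElectrodeSketch`, `CATHODE_COLUMNS`, and `map_columns` are hypothetical names for illustration and are not part of `batdata`. The column names and values are those shown for build B11A above.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ElectrodeSketch:
    """Hypothetical stand-in with the same field names as the output above"""
    name: str
    supplier: Optional[str] = None
    product: Optional[str] = None
    thickness: Optional[float] = None
    area: Optional[float] = None
    loading: Optional[float] = None
    porosity: Optional[float] = None


# Spreadsheet column -> schema field, for the cathode
CATHODE_COLUMNS = {
    'cathode': 'name',
    'cathode_supplier': 'supplier',
    'cathode_supplier.1': 'product',
    'cathode_thickness (um)': 'thickness',
    'total_cathode_area (cm2)': 'area',
    'cathode_loading (mg/cm2)': 'loading',
    'cathode_porosity': 'porosity',
}


def map_columns(row, columns):
    """Rename the columns present in `row` to their schema field names"""
    return {field: row[col] for col, field in columns.items() if col in row}


# Values copied from the B11A row printed above
row = {
    'cathode': '5Vspinel',
    'cathode_supplier': 'NEI',
    'cathode_supplier.1': 'SP-10',
    'cathode_thickness (um)': 62,
    'total_cathode_area (cm2)': 169.2,
    'cathode_loading (mg/cm2)': 14.77,
    'cathode_porosity': 35,
}
cathode = ElectrodeSketch(**map_columns(row, CATHODE_COLUMNS))
print(asdict(cathode))
```

Keeping the mapping in a dictionary makes it easy to see at a glance which spreadsheet column feeds which field, including the oddly named `cathode_supplier.1` column that holds the product name.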


We put all of this into a single function for convenience

In [12]:
def describe_cell(test_record: pd.Series) -> BatteryDescription:
    """Create a single metadata record
    
    Args:
        test_record: Record for a certain test
    Returns:
        Formatted metadata for the battery
    """
    
    # Match cell description
    # Use the record passed to the function, not the global `record`
    matches = cell_descriptions.query(f'build == "{test_record["Batch"]}"')
    assert len(matches) == 1, f'Found {len(matches)} descriptions for build={test_record["Batch"]}'
    cell_metadata = matches.iloc[0]

    # Describe the electrodes
    cathode_metadata = ElectrodeDescription(
        name=cell_metadata['cathode'],
        supplier=cell_metadata['cathode_supplier'],
        product=cell_metadata['cathode_supplier.1'],
        thickness=cell_metadata['cathode_thickness (um)'],
        area=cell_metadata['total_cathode_area (cm2)'],
        loading=cell_metadata['cathode_loading (mg/cm2)'],
        porosity=cell_metadata['cathode_porosity']
    )
    anode_metadata = ElectrodeDescription(
        name=cell_metadata['anode'],
        supplier=cell_metadata['anode_supplier'],
        product=cell_metadata['anode_mat_name'],
        thickness=cell_metadata['anode_thickness (um)'],
        loading=cell_metadata['anode_loading (mg/cm2)'],
        porosity=cell_metadata['anode_porosity']
    )

    # Get the electrolyte information
    additives = cell_metadata['electrolyte_additive']
    additives = [] if additives == 'NONE' else [{'name': x.strip()} for x in additives.split(",")]
    electrolyte = ElectrolyteDescription(
        name=cell_metadata['electrolyte'],
        additives=additives
    )
    
    # Combine to form a cell description
    battery = BatteryDescription(
        anode=anode_metadata,
        cathode=cathode_metadata,
        electrolyte=electrolyte,
        layer_count=cell_metadata['number_layers'],
        nominal_capacity=cell_metadata['target_capacity (Ah)']
    )
    return battery


describe_cell(record).dict()

{'manufacturer': None,
 'design': None,
 'layer_count': 12,
 'anode': {'name': 'C',
  'supplier': 'Conoco-Phillips',
  'product': 'A12',
  'thickness': 59.0,
  'area': None,
  'loading': 12.5,
  'porosity': 31.8},
 'cathode': {'name': '5Vspinel',
  'supplier': 'NEI',
  'product': 'SP-10',
  'thickness': 62.0,
  'area': 169.2,
  'loading': 14.77,
  'porosity': 35.0},
 'electrolyte': {'name': 'Gen 2', 'additives': []},
 'nominal_capacity': 0.3}
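
The additive parsing inside `describe_cell` assumes the cell always holds a string. A slightly more defensive sketch is shown below, assuming that blank spreadsheet cells arrive from pandas as float NaN and that "NONE" may appear in any letter case; `parse_additives` is a hypothetical helper, and the additive names in the usage lines are illustrative rather than taken from the spreadsheet.

```python
import math


def parse_additives(raw):
    """Turn a comma-separated additive cell into a list of {'name': ...} dicts"""
    # Assumption: blank spreadsheet cells are read by pandas as float NaN
    if raw is None or (isinstance(raw, float) and math.isnan(raw)):
        return []
    text = str(raw).strip()
    # Assumption: "NONE" in any letter case means no additives
    if text == '' or text.upper() == 'NONE':
        return []
    # Drop empty pieces left by stray commas
    return [{'name': part.strip()} for part in text.split(',') if part.strip()]


print(parse_additives('NONE'))
print(parse_additives(float('nan')))
print(parse_additives('VC, FEC'))  # illustrative additive names
```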

## Load in an Example Test
Tests are stored in MACCOR format. Let's load one to see what the data looks like.

In [13]:
extractor = MACCORExtractor()

In [14]:
data = extractor.generate_dataframe(data_path / record['File Name'])
data.head(2)

Unnamed: 0,cycle_number,file_number,test_time,state,current,voltage,step_index,method,substep_index
0,0,0,0.0,ChargingState.hold,0.0,0.076905,0,ControlMethod.rest,0
1,0,0,10.002,ChargingState.hold,0.0,0.076753,0,ControlMethod.rest,0


## Process all known cells
Loop through everything and save it into HDF5 format

In [15]:
if h5_path.is_dir():
    rmtree(h5_path)
h5_path.mkdir()

In [16]:
success_count = 0
for (batch_id, cell_id), group in tqdm(test_descriptions.groupby(['Batch', 'Cell Number'])):
    # Get the metadata for the cell 
    cell_name = f'batch_{batch_id}_cell_{cell_id}'
    cell_metadata = describe_cell(group.iloc[0])
    
    # Assemble the metadata for everything else
    metadata = BatteryMetadata(
        name=f'CAMP_{cell_name}',
        battery=cell_metadata,
        dataset_name='paulson_2019',
        authors=[
            ['Noah H.', 'Paulson'],
            ['Joseph', 'Kubal'],
            ['Logan', 'Ward'],
            ['Saurabh', 'Saxena'],
            ['Wenquan', 'Lu'],
            ['Susan J.', 'Babinec']
        ],
        associated_ids=['https://doi.org/10.1016/j.jpowsour.2022.231127']
    )
    
    # Get the test results
    files = group['File Name'].apply(lambda x: data_path / x).tolist()
    
    # Parse them
    try:
        data = extractor.parse_to_dataframe(files, metadata=metadata)
    except Exception as exc:
        # Skip cells whose files fail to parse; uncomment the print to see why
        # print(batch_id, cell_id, files[0], exc)
        continue

    # Save it to the HDF5 format
    name = f'{cell_name}.h5'
    data.to_batdata_hdf(h5_path / name, complevel=9)
    success_count += 1
print(f'Succeeded in parsing {success_count} cells')

100%|██████████| 611/611 [10:23<00:00,  1.02s/it]

Succeeded in parsing 310 cells
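
Only 310 of the 611 cells parsed, and the loop above discards the reason for each failure. One possible way to keep that information is sketched below. `parse_all` and `fake_parse` are hypothetical names, and `fake_parse` merely stands in for the real extractor call so the example is self-contained.

```python
def parse_all(groups, parse):
    """Apply `parse` to each group of files, recording failures instead of dropping them"""
    successes, failures = {}, []
    for key, files in groups.items():
        try:
            successes[key] = parse(files)
        except Exception as exc:
            # Keep the cell key plus the error type and message for later review
            failures.append((key, type(exc).__name__, str(exc)))
    return successes, failures


def fake_parse(files):
    """Hypothetical parser: fails on any file ending in '.bad'"""
    if any(f.endswith('.bad') for f in files):
        raise ValueError(f'cannot read {files}')
    return len(files)


# Made-up (batch, cell) groups for illustration
groups = {('B1', 1): ['a.001', 'a.002'], ('B1', 2): ['b.bad']}
successes, failures = parse_all(groups, fake_parse)
print(f'Parsed {len(successes)} cells, {len(failures)} failed')
for key, err_type, message in failures:
    print(key, err_type, message)
```

Grouping the failures by error type afterwards would show whether the skipped cells share a common cause, such as missing files or an unsupported file layout.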





Show off the metadata for one of the cells

In [17]:
example_cell = next(h5_path.glob('*.h5'))

In [18]:
data = BatteryDataset.from_batdata_hdf(str(example_cell))

In [19]:
print(data.metadata.json(exclude_defaults=True, indent=2))

{
  "name": "CAMP_batch_B21A_cell_4",
  "battery": {
    "layer_count": 12,
    "anode": {
      "name": "C",
      "supplier": "Conoco-Phillips",
      "product": "A12",
      "thickness": 59.0,
      "loading": 12.5,
      "porosity": 31.8
    },
    "cathode": {
      "name": "5Vspinel",
      "supplier": "NEI",
      "product": "SP-10",
      "thickness": 62.0,
      "area": 169.2,
      "loading": 14.77,
      "porosity": 35.0
    },
    "electrolyte": {
      "name": "Gen 2",
      "additives": []
    },
    "nominal_capacity": 0.3
  },
  "dataset_name": "paulson_2019",
  "authors": [
    [
      "Noah H.",
      "Paulson"
    ],
    [
      "Joseph",
      "Kubal"
    ],
    [
      "Logan",
      "Ward"
    ],
    [
      "Saurabh",
      "Saxena"
    ],
    [
      "Wenquan",
      "Lu"
    ],
    [
      "Susan J.",
      "Babinec"
    ]
  ],
  "associated_ids": [
    "https://doi.org/10.1016/j.jpowsour.2022.231127"
  ]
}
