<a href="https://colab.research.google.com/github/AlexandruPascu/data_processing/blob/main/MSM_Faraday_Battery_Data_Importer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The code below downloads a zip file from a GitHub repository containing **examples of battery cycling data**, which can be used to test and explore the battery data analysis functions in this notebook.

*Note: this code assumes that the current working directory is /content/, so the file paths may need to be adjusted to match your directory structure. It also assumes a Linux or macOS environment with wget and unzip installed; on other systems, install these tools or replace the commands with their Windows equivalents.*

In [None]:
!wget https://github.com/AlexandruPascu/WMG-Data-Importer-Faraday-MSM/archive/refs/heads/main.zip || exit 1
!unzip -q main.zip -d example_cyclers || exit 1
!mv ./example_cyclers/WMG-Data-Importer-Faraday-MSM-main/data_examples ./cyclers_examples || exit 1
!rm main.zip
!rm -r ./example_cyclers
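
Where wget and unzip are not available (for example on Windows), the same steps can be approximated with the Python standard library alone. This is a minimal sketch and not part of the original notebook: the helper names `download_zip` and `extract_subfolder` are hypothetical, and the archive layout is assumed to match the shell commands above.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

ZIP_URL = "https://github.com/AlexandruPascu/WMG-Data-Importer-Faraday-MSM/archive/refs/heads/main.zip"


def download_zip(url: str, zip_path: Path) -> Path:
    """Download url to zip_path (hypothetical helper standing in for wget)."""
    urllib.request.urlretrieve(url, zip_path)
    return zip_path


def extract_subfolder(zip_path: Path, subfolder: str, dest: Path) -> Path:
    """Unpack the archive into a temporary directory and move one subfolder to dest."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # dest must not exist yet, otherwise the folder is moved inside it
        shutil.move(str(Path(tmp) / subfolder), str(dest))
    return dest


# Usage (requires network access, so it is left commented out):
# zip_path = download_zip(ZIP_URL, Path("main.zip"))
# extract_subfolder(zip_path, "WMG-Data-Importer-Faraday-MSM-main/data_examples",
#                   Path("cyclers_examples"))
# zip_path.unlink()
```

The temporary directory is removed automatically, which mirrors the `rm -r ./example_cyclers` clean-up step.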

This section imports the necessary Python libraries for the data processing and visualization that will be performed later in the notebook.

*   re is used for regular expressions, which can be useful for string matching and manipulation.
*   os provides a way to interact with the operating system, such as navigating directories and accessing files.
*   sys provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter.
*   chardet is used to detect the character encoding of a file.
*   openpyxl is a library for working with Excel files.
*   csv provides functionality for working with CSV (Comma-Separated Values) files.
*   math provides mathematical functions and constants.
*   numpy is a library for numerical computations with Python, providing efficient implementations of arrays and matrices.
*   pandas is a library for data manipulation and analysis, offering data structures and operations for manipulating numerical tables and time series.
*   matplotlib is a plotting library that allows users to create a wide range of visualizations.

*Make sure to install these libraries before running the code if needed.*

In [None]:
# If the import cell below raises an error, some libraries are missing.
# Uncomment the lines for the packages you need (remove the leading "# "),
# or uncomment all of them if you cannot tell which one is missing.
# %pip install chardet
# %pip install openpyxl
# %pip install numpy
# %pip install pandas
# %pip install matplotlib
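
To find out which of these packages actually need installing, one option is to ask the import system directly. This is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["chardet", "openpyxl", "numpy", "pandas", "matplotlib"]
print("Missing packages:", missing_packages(required))
```

Any name printed here corresponds to a `%pip install` line that should be uncommented.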

In [None]:
import re
import os
import sys

import chardet
import openpyxl
import csv
import math

import numpy as np
import pandas as pd

This section of the notebook describes the custom functions used to manipulate, analyze and visualize battery cycler data. Here is a brief overview of what each function does:

*   **look_for_files**(path_or_file: str): This function searches for files in the specified path or file and returns a list of file paths.
*   **convert_xlsx_to_csv**(file_path: str): This function converts an Excel file to a CSV file.
*   **find_words**(file_path: str, cycler_keywords: dict) -> tuple: This function searches for specified keywords in a file and returns the pointer location where the keyword was found and the file encoding.
*   **split_file**(pointer: int, file_path: str, save_option: str) -> tuple: This function splits a file at the specified pointer into metadata and data and, depending on the save option, saves the two resulting files separately.
*   **read_data_to_pandas**(data, filepath: str, encoding: str) -> pd.DataFrame: This function reads the data part of the file (without the metadata) and returns it as a Pandas DataFrame.
*   **change_units**(df: pd.DataFrame, standard_units: dict, standard_time: list) -> pd.DataFrame: This function converts the units of the specified columns in a DataFrame to the standardized units format.
*   **change_headers**(df: pd.DataFrame, standard_headers: dict) -> pd.DataFrame: This function renames the headers of the specified columns in a DataFrame to the standardized headers format.
*   **add_state_label**(df: pd.DataFrame, current_epsilon: float = 0.0005, voltage_epsilon: float = 0.0005, time_threshold: int = 15) -> pd.DataFrame: This function adds state labels (Charging/Discharging/Rest and CCCV, with the corresponding approximate values) to a DataFrame, based on the specified current and voltage epsilons and the time threshold.
*   **segment_df**(df: pd.DataFrame, request: str) -> list: This function returns a list of dataframes segmented according to a request string such as constant currents, constant voltages, CCCV, charging, discharging or rest.
*   **find_cccv_periods**(df): This function finds periods of constant current constant voltage (CCCV) in a given dataframe.
*   **plot_current_voltage_diff**(df) -> None: This function plots the derivative (differences between consecutive values) of the current and voltage columns over their normal values in a DataFrame.
*   **display_data**(df: pd.DataFrame) -> None: This function displays a preview of the DataFrame in a text format and plots for Current, Voltage, Step and Temperature over Time.
*   **save_file**(df: pd.DataFrame, file_path: str): This function saves a Pandas dataframe to a CSV file in a subfolder of the directory containing the input file.

*Make sure to understand the purpose of each function before using it in your analysis, and make sure to specify the correct arguments when calling each function.*
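
To give a feel for what state labelling involves, here is a minimal sketch of labelling rows from the sign of the current. It is not the implementation of add_state_label: the function name `label_states`, the column name `Current [A]` and the sign convention (positive current means charging) are assumptions made for illustration, and the voltage epsilon, time threshold and CCCV detection are left out.

```python
import numpy as np
import pandas as pd


def label_states(df: pd.DataFrame, current_epsilon: float = 0.0005) -> pd.DataFrame:
    """Add a 'State' column derived from the sign of the current (illustrative only)."""
    current = df["Current [A]"]  # assumed column name
    conditions = [current > current_epsilon, current < -current_epsilon]
    labels = ["Charging", "Discharging"]  # assumed sign convention
    out = df.copy()
    # Rows whose current lies within +/- current_epsilon are treated as rest
    out["State"] = np.select(conditions, labels, default="Rest")
    return out


example = pd.DataFrame({"Current [A]": [0.0, 1.5, 1.5, 0.0002, -2.0]})
print(label_states(example))
```

The epsilon exists because a cycler rarely reports exactly zero current during rest, so a small tolerance band avoids mislabelling measurement noise as charging or discharging.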

The data_importer function is the main function of the notebook and the backbone of its data processing. It lets the user import and process data from different types of files with various options. The function takes the following arguments:

*   **path_or_file**: The file path or name to read data from.
*   **save_option**: Option to choose saving files or display data. It defaults to 'save'.
*   **state_option**: Option to add state labels to the data. It defaults to an empty string.
*   **print_option**: Option to print the dataframe and the 4 plots. It defaults to an empty string.
*   **cycler_keywords**: Dictionary containing header row options to terminate meta info for various types of data files. It defaults to a precompiled dictionary with keywords collected from multiple cycler machines.
*   **standard_units**: Dictionary of units to be converted to standard units. It defaults to a dictionary of standardized units that follows the usual battery conventions.
*   **standard_time**: List of time-related columns. It defaults to a list of time columns that we found are not expressed in seconds by default.
*   **standard_headers**: Dictionary of standard header names to replace original headers. It defaults to a dictionary of standardized headers that follows the pybamm library.
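
As an illustration of the kind of conversion that standard_units and standard_time drive, the sketch below scales columns whose header carries a unit in parentheses. The mapping format, the header pattern `Name (unit)` and the helper name `to_standard_units` are all hypothetical; the real dictionaries used by data_importer may be structured differently.

```python
import pandas as pd

# Hypothetical mapping: unit found in a header -> (standard unit, multiplier)
UNIT_FACTORS = {"mA": ("A", 1e-3), "mV": ("V", 1e-3), "h": ("s", 3600.0)}


def to_standard_units(df: pd.DataFrame, unit_factors: dict = UNIT_FACTORS) -> pd.DataFrame:
    """Scale columns whose header ends in '(unit)' and rewrite the unit in the header."""
    out = pd.DataFrame(index=df.index)
    for column in df.columns:
        name, _, unit = column.rpartition("(")
        unit = unit.rstrip(")")
        if name and unit in unit_factors:
            standard_unit, factor = unit_factors[unit]
            out[f"{name}({standard_unit})"] = df[column] * factor
        else:
            # No recognised unit in the header: copy the column unchanged
            out[column] = df[column]
    return out


example = pd.DataFrame({"Time (h)": [0.0, 0.5], "Current (mA)": [1500.0, -200.0], "Step": [1, 2]})
print(to_standard_units(example))
```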

The function processes each file found in the given *path_or_file*. It uses helper functions to find the metadata and data in the file, split the file, read the data into a pandas dataframe, change the units, and change the headers. If *state_option* is set to 'yes', it adds state labels to the data.
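
The find-and-split step can be pictured with a short, self-contained sketch: scan the lines for a header keyword, treat everything before it as metadata, and read the rest into pandas. The helper names `find_header_line` and `split_and_read` and the sample text are hypothetical; encoding detection and the cycler-specific keyword dictionary are omitted.

```python
import io

import pandas as pd


def find_header_line(lines, keywords):
    """Return the index of the first line starting with any keyword, or None."""
    for index, line in enumerate(lines):
        if any(line.startswith(keyword) for keyword in keywords):
            return index
    return None


def split_and_read(text, keywords):
    """Split raw file text into (metadata lines, DataFrame of the data table)."""
    lines = text.splitlines()
    header_index = find_header_line(lines, keywords)
    if header_index is None:
        raise ValueError("No header keyword found")
    metadata = lines[:header_index]
    data = pd.read_csv(io.StringIO("\n".join(lines[header_index:])))
    return metadata, data


raw = "Cycler: demo\nOperator: n/a\nTime,Current,Voltage\n0,0.0,3.7\n1,1.5,3.8\n"
metadata, data = split_and_read(raw, keywords=["Time,"])
print(metadata)
print(data)
```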

If *save_option* is set to 'save all' or 'save', the function saves the file. If *print_option* is set to 'yes' or 'all', the function prints the dataframe and the 4 plots. If *print_option* is set to 'all', it also plots the differences between consecutive current and voltage values.

*The function also has error handling to catch any errors that may occur while processing the data, and it prints out the error and the file name where the error occurred.*
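
A per-file error handling pattern of this kind might look like the sketch below, where one failing file is reported and the remaining files are still processed. The helper `process_all` and its `process_file` callback are hypothetical and do not reflect the actual internals of data_importer.

```python
from pathlib import Path


def process_all(paths, process_file):
    """Apply process_file to each path and report failures instead of stopping."""
    failed = []
    for path in paths:
        try:
            process_file(path)
        except Exception as error:  # broad on purpose: one bad file should not stop the batch
            print(f"Error while processing {path}: {error}")
            failed.append(path)
    return failed
```

Returning the list of failed paths makes it easy to retry or inspect only the problematic files afterwards.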

In [None]:
from pathlib import Path
from pbdp import Parser
from pbdp import create_logger
# This creates a logger object that can be used to log messages in the default pbdp way
_logger = create_logger()

parser = Parser()

Calling data_importer with the path to the directory containing the example data files processes every file in that directory and applies the chosen processing and output options to each one (in this case, only saving the pre-processed battery data from the original files, without the battery state labels or the plots). The resulting data can then be used for further analysis or visualization in the notebook.

In [None]:
#examples/notebooks/cyclers_examples/
parser.data_importer(path_or_file = Path('./cyclers_examples/'))

Alternatively, here is an example for a single file from the examples that also saves the metadata and the raw data, adds the battery state labels, plots the CCCV and then finally prints the data and displays the 4 plots.

In [None]:
parser.data_importer(path_or_file = Path('./cyclers_examples/Cell033_RPT_0p3C_0p3C_100_0_80cyc_10degC.csv'), state_option = 'yes', print_option = 'all', save_option = 'save all')

This code block reads a CSV file with the **cleaned labeled cycler data** processed above into a pandas dataframe (make sure to **edit the file location** to match your use case and to first run the **data_importer on the raw file with the state_option as yes**). The segment_data function from the segment module is then used to **segment the data** in the dataframe based on a **specific pattern of states**. The resulting segmented data is returned as a list of dataframes and then printed.

*Note that the returned dataframes will always be in this order: **rest, charging, discharging, constant current, constant voltage and CCCV**. You can request as many segments as you want (**always separate them by commas**), and for all of the aforementioned except rest and CCCV you can request a specific **value in A or V**.*

In [None]:
import pandas as pd
from pbdp import segment
df = pd.read_csv("./cyclers_examples/processed/Cell033_RPT_0p3C_0p3C_100_0_80cyc_10degC_cleaned_data.csv")
list_df = segment.segment_data(df, ['rest', 'chg 3A', 'dchg', 'const curr', 'const voltage 4.2V', 'CCCV'])
print(list_df)
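
For intuition about what segmenting by state means, the sketch below splits a labeled dataframe into uninterrupted runs of a single state using plain pandas. It is not how pbdp implements segmentation: the function name `contiguous_segments` and the `State` column name are assumptions made for illustration.

```python
import pandas as pd


def contiguous_segments(df, state, column="State"):
    """Return one DataFrame per uninterrupted run of rows where df[column] == state."""
    # A new run starts wherever the label differs from the previous row
    run_id = (df[column] != df[column].shift()).cumsum()
    return [group for _, group in df.groupby(run_id) if group[column].iloc[0] == state]


labeled = pd.DataFrame({
    "Time [s]": [0, 1, 2, 3, 4],
    "State": ["Rest", "Charging", "Charging", "Rest", "Charging"],
})
for segment_frame in contiguous_segments(labeled, "Charging"):
    print(segment_frame)
```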