# Step 1 - Create Training Data

## Overview

To train the neural network in this work, we must first generate training data.

Dataset:
- Input 1 (image):
    Taken from the microstructure of the generated 2D NMC positive electrode. The cells were generated using a non-overlapping Random Sequential Addition (RSA) approach. Each image is centered on one particle and includes its surroundings: the solid, the void, and, if the particle is near the borders, the padding. Each image is zoomed in on its particle of interest, so every image fed into the neural network has the same size.
- Input 2 (metadata):
    Descriptors of the input: the (`x`, `y`) coordinates of the particle, the radius `R`, the electrode length `L`, the zoom factor, the C-rate, the time, the distance from the current source (the same as `x` in the discharge case), and the local porosity.
- Target (image):
    The State-of-Lithiation (SoL) values from the electrochemical simulations. The particle is zoomed in so that it is fully inscribed in the square image.
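As an illustration, the Input 2 descriptors for one sample could be held in a small record like the one below. This is a minimal sketch: the class name `ParticleMetadata`, the field names, and the field order are hypothetical and are not taken from `create_ml_dataset.py`. The example values are made up.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class ParticleMetadata:
    """Hypothetical container for the Input 2 descriptors of one sample."""
    x: float                 # particle center, x coordinate
    y: float                 # particle center, y coordinate
    R: float                 # particle radius
    L: float                 # electrode length
    zoom: float              # zoom factor applied to the image
    c_rate: float            # C-rate of the simulation
    time: float              # simulation time of the snapshot
    dist_from_source: float  # distance from the current source (equals x when discharging)
    porosity: float          # local porosity around the particle

    def to_vector(self) -> List[float]:
        """Flatten the fields into a list (field order here is an assumption)."""
        return [float(v) for v in astuple(self)]


meta = ParticleMetadata(x=12.0, y=30.5, R=4.2, L=100.0, zoom=2.0,
                        c_rate=1.0, time=600.0, dist_from_source=12.0,
                        porosity=0.35)
print(meta.to_vector())
```

A flat vector like this is what a dense branch of a network would typically consume alongside the image input.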
    
## Workflow

`create_micro_pngs.py` -> `create_col_map.py` -> `create_ml_dataset.py` -> `preprocess_ml_data.py`
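The four scripts must run in this order, since each one consumes the previous one's output. Below is a minimal sketch of a driver that runs them in sequence. It assumes the scripts sit one directory up, as in the `!python ../...` cells of this notebook. `run_pipeline` and its `runner` parameter are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Order taken from the workflow above; each step reads the previous step's output.
PIPELINE = [
    "create_micro_pngs.py",
    "create_col_map.py",
    "create_ml_dataset.py",
    "preprocess_ml_data.py",
]


def run_pipeline(script_dir=Path(".."), runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    commands = []
    for name in PIPELINE:
        cmd = [sys.executable, str(Path(script_dir) / name)]
        commands.append(cmd)
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        runner(cmd, check=True)
    return commands
```

Calling `run_pipeline()` from this notebook's directory does the same work as running the four `!python` cells one after another. Using `check=True` stops the chain at a failing step instead of letting the next script read stale files.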

## Description

- `create_micro_pngs.py`: Creates NumPy arrays representing the positive electrode microstructure from the `metadata.json` file.
- `create_col_map.py`: Processes the COMSOL Multiphysics data to generate NumPy arrays of State-of-Lithiation values overlaid on the NMC particles.
- `create_ml_dataset.py`: Takes the data from the two previous steps and saves it as input (image), input (metadata), and target NumPy files.
- `preprocess_ml_data.py`: Creates the Extract-Transform-Load (ETL) dataset loader. The loader streams the data in batches, so the dataset does not have to fit in memory; for more details, see the [`tf.data.Dataset` documentation](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). At this point, the data is ready for training a neural network with TensorFlow.
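The streaming idea can be shown without TensorFlow. The sketch below assumes one sample per `.npy` file (the actual file layout is set by `create_ml_dataset.py`) and loads only one batch's worth of files at a time. `iter_batches` is a hypothetical helper, not the loader in `preprocess_ml_data.py`.

```python
import numpy as np


def iter_batches(paths, batch_size=64):
    """Yield stacked batches, loading one .npy file at a time.

    At most `batch_size` arrays are held in memory at once, so the full
    dataset never has to fit in RAM. All arrays must share one shape.
    """
    batch = []
    for path in paths:
        batch.append(np.load(path))
        if len(batch) == batch_size:
            yield np.stack(batch)
            batch = []
    if batch:  # final, possibly smaller, batch
        yield np.stack(batch)
```

In TensorFlow, `tf.data.Dataset` plays this role: `.batch()` groups elements, and the pipeline pulls data lazily as training consumes it.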


### Settings

Re-running this notebook should create `micro_1.npy` in this directory.

- L: Length of the electrode ($\mu$m).
- h_cell: Width/height of the electrode ($\mu$m).
- scale: Multiplies each axis by this value, i.e. the number of pixels per $\mu$m. For example, if $L = 100$ and scale $= 5$, the length axis has 500 pixels.
- grid_size: The number of elements used in `np.meshgrid()`.
- pore_phase: Value set in the NumPy array to represent the pore phase.
- solid_phase: Value set in the NumPy array to represent the solid NMC phase.
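To make these settings concrete, here is a minimal sketch that rasterizes circular particles onto a pixel grid with `np.meshgrid()`. The particle list, the phase values, and `h_cell = 40` are made up for illustration, and the grid is derived from `scale` alone. How `create_micro_pngs.py` actually combines `scale` and `grid_size`, and how it reads `metadata.json`, is not shown here.

```python
import numpy as np

# Illustrative values only; the real settings live in create_micro_pngs.py.
L = 100.0        # electrode length (um)
h_cell = 40.0    # electrode width/height (um), hypothetical value
scale = 5        # pixels per um, so the 100 um axis becomes 500 pixels
pore_phase = 0   # hypothetical encoding for the pore phase
solid_phase = 1  # hypothetical encoding for the solid NMC phase

# Hypothetical non-overlapping particles as (x, y, R) in um.
particles = [(20.0, 20.0, 8.0), (50.0, 15.0, 6.0), (80.0, 25.0, 10.0)]

nx, ny = int(L * scale), int(h_cell * scale)
# Pixel-center coordinates in um; the default indexing="xy" gives shape (ny, nx).
xs = (np.arange(nx) + 0.5) / scale
ys = (np.arange(ny) + 0.5) / scale
X, Y = np.meshgrid(xs, ys)

micro = np.full((ny, nx), pore_phase, dtype=np.uint8)
for x, y, R in particles:
    # Mark every pixel whose center lies inside the circle as solid.
    micro[(X - x) ** 2 + (Y - y) ** 2 <= R ** 2] = solid_phase
```

With `scale = 5`, the 100 $\mu$m axis becomes 500 pixels, matching the example in the `scale` bullet. The array could then be written out with `np.save`.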

!python ../create_micro_pngs.py

## Process Electrochemical Simulations into Target Data

### Settings

Running this step should create the files `1/col/*.npy`.

- header_row: The row containing the header in the generated `csv` files. To check it, open one of the `csv` files in the `1` folder.
- c_rates: The C-rates of the electrochemical simulations included in the dataset. These must match the available `csv` files; for example, folder `1` has C-rates of [0.25, 0.5, 1, 2, 3].
- substrings_to_parse_out: Substrings to ignore. The Python script checks all the files in the directory to find where the `csv` folders live.
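Below is a minimal sketch of how these settings could interact, using only the standard library. The helper names are hypothetical. `header_row` is treated as a 0-based line index, which may not match the script's convention. Filtering by C-rate is left out because the `csv` naming scheme is not described here.

```python
import csv
import os
from pathlib import Path


def find_csv_files(root, substrings_to_parse_out):
    """Return sorted .csv paths under root, skipping paths containing an ignored substring."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() != ".csv":
                continue
            if any(s in str(path) for s in substrings_to_parse_out):
                continue
            found.append(path)
    return sorted(found)


def read_with_header_row(path, header_row):
    """Read a csv whose column names are on line `header_row` (0-based).

    Lines before the header (e.g. export comments) are skipped; each later
    line becomes a dict keyed by the header names.
    """
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header = rows[header_row]
    return [dict(zip(header, row)) for row in rows[header_row + 1:]]
```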

#### Hide Progress

!python ../create_col_map.py


## Create the Machine Learning Dataset

### Settings

- input_dir: Name of the folder where the input NumPy arrays will be saved.
- label_dir: Name of the folder where the target NumPy arrays will be saved.
- img_size: Size of the NumPy arrays saved in the `input_dir` and `label_dir`.
- width_wrt_radius: How much surrounding space to show, in multiples of the particle radius. For example, if this parameter is 3, then 1R is allotted to the particle, leaving 2R of surroundings.
- sol_max: Value to which the State-of-Lithiation is scaled in the target NumPy data. For example, if sol_max = 65535, an SoL of 0 to 1 is mapped to 0 to 65535.
- padding_encoding: The padding in the NumPy array is represented by this value.
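The sketch below shows one way `width_wrt_radius`, `padding_encoding`, and `sol_max` could be applied. It rests on several assumptions:

- `width_wrt_radius` is the window's half-width measured from the particle center, which matches the 1R/2R example above.
- Positions are already in pixels.
- `sol_max` fits in `uint16`.

Resizing the crop to `img_size` is omitted, and the function names are hypothetical.

```python
import numpy as np


def crop_particle(micro, cx, cy, R, width_wrt_radius, padding_encoding):
    """Crop a square window centered on a particle, filling out-of-bounds pixels.

    cx, cy, and R are in pixels. The window's half-width is
    width_wrt_radius * R, so with width_wrt_radius = 3 the particle spans
    1R from the center and 2R of surroundings remain on each side.
    """
    half = int(round(width_wrt_radius * R))
    # Pad every side by `half` so windows near the border stay square.
    padded = np.pad(micro, half, mode="constant", constant_values=padding_encoding)
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so this slice is centered on the particle and has side 2 * half + 1.
    r0, c0 = int(round(cy)), int(round(cx))
    return padded[r0:r0 + 2 * half + 1, c0:c0 + 2 * half + 1]


def scale_sol(sol, sol_max=65535):
    """Map SoL in [0, 1] to integers in [0, sol_max], stored as uint16."""
    return np.round(np.clip(sol, 0.0, 1.0) * sol_max).astype(np.uint16)
```

Storing the target as `uint16` with `sol_max = 65535` keeps about 1.5e-5 resolution in SoL at half the size of `float32`.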

!python ../create_ml_dataset.py

100%|███████████████████████████████████████████| 60/60 [02:24<00:00,  2.41s/it]


## Using the Extract-Transform-Load Script

### Settings

The data is ready to be used to train a neural network at this point.

- data_dir: Name of the directory where the dataset files live.
- trn_split: Percentage of the total dataset allocated to training.
- val_split: Percentage of the total dataset allocated to validation.
- dataset_data: Name of the `json` file describing the generated dataset.
- tf_img_size: Output size of the image data (both inputs and targets) used to train the neural network. The data can therefore be saved at one size and loaded at another for use in TensorFlow.
- batch_size: Number of samples per batch (64 here).
- [others]: img_size, width_wrt_radius, and scale are as previously defined; the remaining parameters are used to normalize the metadata.
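As a hedged illustration of the split settings and of metadata normalization, the sketch below does two things. It shuffles sample indices and splits them by percentage, and it rescales a metadata column to [0, 1]. Two choices here are assumptions: the remainder after `trn_split` and `val_split` is treated as a test set, and min-max scaling uses caller-supplied bounds. The helper names are hypothetical.

```python
import numpy as np


def split_indices(n_samples, trn_split, val_split, seed=0):
    """Shuffle indices and split them into train/validation/remainder sets.

    trn_split and val_split are percentages; the remainder is treated as a
    test set (an assumption).
    """
    if trn_split + val_split > 100:
        raise ValueError("trn_split + val_split must not exceed 100")
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_trn = int(n_samples * trn_split / 100)
    n_val = int(n_samples * val_split / 100)
    return idx[:n_trn], idx[n_trn:n_trn + n_val], idx[n_trn + n_val:]


def min_max_normalize(values, lo, hi):
    """Rescale values from [lo, hi] to [0, 1]; lo and hi are hypothetical bounds."""
    values = np.asarray(values, dtype=float)
    return (values - lo) / (hi - lo)
```

Fixing the seed keeps the split reproducible across runs, so validation scores stay comparable.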

!python ../preprocess_ml_data.py

Metal device set to: Apple M1 Pro

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

2022-08-21 17:45:06.153189: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-08-21 17:45:06.153290: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
