# Example Runthrough of Our Package

The following notebook demonstrates the functions available in the `kratosbat` package. We will show how to extract data from the Materials Project, then analyze it and predict outcomes using several statistical and machine learning methods.

## Extracting data from materialsproject.org
We will first import the modules required for extracting data from materialsproject.org. The data extraction methods live in the `data_extract` submodule:

In [4]:
import data_extract
import pandas as pd

The `get_bat_dat` function mines the entire Materials Project for all materials classified as battery materials and returns a dataframe of these materials and their properties.

*NOTE* The function takes an API key as input to access the Materials Project database. We suggest generating your own API key from the Materials Project dashboard.
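
Rather than pasting the key into a notebook cell, one option is to read it from an environment variable. The sketch below assumes a variable named `MP_API_KEY` and a helper named `load_api_key`; both names are hypothetical and are not part of `kratosbat`.

```python
import os


def load_api_key(env_var="MP_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

The returned string could then be passed to `data_extract.get_bat_dat` in place of a literal key.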

In [5]:
data_extract.get_bat_dat('EcOxTpa0ymKFe24R')

Battery ID,Reduced Cell Formula,Average Voltage (V),Min Voltage (V),Max Voltage (V),Number of Steps,Min Instability,Gravimetric Capacity (mAh/g),Volumetric Capacity (Ah/L),Working Ion,Min Fraction,...,Spacegroup,Specific Energy (Wh/kg),Energy Density (Wh/L),Number of Sites,Type,Max Delta Volume,Charge Formula,Discharge Formula,Spacegroup Number,Crystal Lattice
mp-504791_Li,P(WO4)2,2.325185,2.325185,2.325185,1,0.000000,50.228609,269.262771,Li,0.0,...,"{'number': 19, 'hall_number': 115, 'internatio...",116.790816,626.085802,11.0,intercalation,0.028564,P(WO4)2,LiP(WO4)2,19,Orthorombic
mp-763480_Li,P3W2O13,3.291748,3.291748,3.291748,1,0.000097,39.674483,154.215698,Li,0.0,...,"{'number': 14, 'hall_number': 81, 'internation...",130.598387,507.639167,18.0,intercalation,0.017562,P3W2O13,LiP3W2O13,14,Monoclinic
mp-1176966_Li,P8W3O29,3.609671,2.660169,5.062579,3,0.025051,181.943205,651.708695,Li,0.0,...,"{'number': 165, 'hall_number': 457, 'internati...",656.755144,2352.454095,40.0,intercalation,0.061034,P8W3O29,Li2P8W3O29,165,Trigonal
mvc-5592_Li,P2WO7,3.201169,3.201169,3.201169,1,0.049028,73.484217,329.726946,Li,0.0,...,"{'number': 2, 'hall_number': 2, 'international...",235.235410,1055.511733,10.0,intercalation,0.021096,P2WO7,LiP2WO7,2,Triclinic
mp-763566_Li,P2WO7,2.438309,2.438309,2.438309,1,0.000000,73.484217,306.081406,Li,0.0,...,"{'number': 4, 'hall_number': 6, 'international...",179.177232,746.321063,10.0,intercalation,0.025349,P2WO7,LiP2WO7,4,Monoclinic
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
mp-1180276_Mg,VCu3Se4,-0.098693,-0.098693,-0.098693,1,0.000000,92.144929,461.340369,Mg,0.0,...,"{'number': 215, 'hall_number': 511, 'internati...",-9.094055,-45.531044,8.0,intercalation,0.069416,VCu3Se4,MgVCu3Se4,215,Cubic
mp-1042619_Mg,Cu3(SnO3)4,1.890890,1.890890,1.890890,1,0.053999,60.789787,387.828816,Mg,0.0,...,"{'number': 204, 'hall_number': 500, 'internati...",114.946775,733.341476,19.0,intercalation,0.006392,Cu3(SnO3)4,MgCu3(SnO3)4,204,Cubic
mp-1046932_Mg,BaTl(NiO3)2,3.071616,3.071616,3.071616,1,0.315336,135.921407,914.229317,Mg,0.0,...,"{'number': 1, 'hall_number': 1, 'international...",417.498404,2808.161640,10.0,intercalation,0.058047,BaTl(NiO3)2,Ba2Mg3Tl2(NiO3)4,1,Triclinic
mp-1041103_Mg,Ho(MoO3)2,1.982244,1.982244,1.982244,1,0.059259,112.348870,735.558063,Mg,0.0,...,"{'number': 1, 'hall_number': 1, 'international...",222.702915,1458.055830,9.0,intercalation,0.019274,Ho(MoO3)2,HoMg(MoO3)2,1,Triclinic


This function also saves the data as a .csv file called **BatteryData.csv**. A copy is cached in the repository because `get_bat_dat` can take over 30 minutes to run as it loops through over 4000 battery materials. It may therefore be more convenient to read the cached .csv file as shown below.

In [8]:
df = pd.read_csv('BatteryData.csv')
df

Unnamed: 0,Battery ID,Reduced Cell Formula,Average Voltage (V),Min Voltage (V),Max Voltage (V),Number of Steps,Min Instability,Gravimetric Capacity (mAh/g),Volumetric Capacity (Ah/L),Working Ion,...,Spacegroup,Specific Energy (Wh/kg),Energy Density (Wh/L),Number of Sites,Type,Max Delta Volume,Charge Formula,Discharge Formula,Spacegroup Number,Crystal Lattice
0,mp-504791_Li,P(WO4)2,2.325185,2.325185,2.325185,1,0.000000,50.228609,269.262771,Li,...,"{'number': 19, 'hall_number': 115, 'internatio...",116.790816,626.085802,11.0,intercalation,0.028564,P(WO4)2,LiP(WO4)2,19,Orthorombic
1,mp-763480_Li,P3W2O13,3.291748,3.291748,3.291748,1,0.000097,39.674483,154.215698,Li,...,"{'number': 14, 'hall_number': 81, 'internation...",130.598387,507.639167,18.0,intercalation,0.017562,P3W2O13,LiP3W2O13,14,Monoclinic
2,mp-1176966_Li,P8W3O29,3.609671,2.660169,5.062579,3,0.025051,181.943205,651.708695,Li,...,"{'number': 165, 'hall_number': 457, 'internati...",656.755144,2352.454095,40.0,intercalation,0.061034,P8W3O29,Li2P8W3O29,165,Trigonal
3,mvc-5592_Li,P2WO7,3.201169,3.201169,3.201169,1,0.049028,73.484217,329.726946,Li,...,"{'number': 2, 'hall_number': 2, 'international...",235.235410,1055.511733,10.0,intercalation,0.021096,P2WO7,LiP2WO7,2,Triclinic
4,mp-763566_Li,P2WO7,2.438309,2.438309,2.438309,1,0.000000,73.484217,306.081406,Li,...,"{'number': 4, 'hall_number': 6, 'international...",179.177232,746.321063,10.0,intercalation,0.025349,P2WO7,LiP2WO7,4,Monoclinic
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4396,mp-1180276_Mg,VCu3Se4,-0.098693,-0.098693,-0.098693,1,0.000000,92.144929,461.340369,Mg,...,"{'number': 215, 'hall_number': 511, 'internati...",-9.094055,-45.531044,8.0,intercalation,0.069416,VCu3Se4,MgVCu3Se4,215,Cubic
4397,mp-1042619_Mg,Cu3(SnO3)4,1.890890,1.890890,1.890890,1,0.053999,60.789787,387.828816,Mg,...,"{'number': 204, 'hall_number': 500, 'internati...",114.946775,733.341476,19.0,intercalation,0.006392,Cu3(SnO3)4,MgCu3(SnO3)4,204,Cubic
4398,mp-1046932_Mg,BaTl(NiO3)2,3.071616,3.071616,3.071616,1,0.315336,135.921407,914.229317,Mg,...,"{'number': 1, 'hall_number': 1, 'international...",417.498404,2808.161640,10.0,intercalation,0.058047,BaTl(NiO3)2,Ba2Mg3Tl2(NiO3)4,1,Triclinic
4399,mp-1041103_Mg,Ho(MoO3)2,1.982244,1.982244,1.982244,1,0.059259,112.348870,735.558063,Mg,...,"{'number': 1, 'hall_number': 1, 'international...",222.702915,1458.055830,9.0,intercalation,0.019274,Ho(MoO3)2,HoMg(MoO3)2,1,Triclinic
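
The read-the-cache-first workflow described above can be wrapped in a small helper. This is a minimal sketch: `load_battery_data` and its `fetch` argument are hypothetical names, not part of `data_extract`.

```python
import os

import pandas as pd


def load_battery_data(csv_path, fetch):
    """Read the cached CSV if present; otherwise call `fetch` and cache its result.

    `fetch` is any zero-argument callable that returns a DataFrame.
    """
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    df = fetch()
    df.to_csv(csv_path, index=False)
    return df
```

For example, `fetch` could be a lambda that calls `data_extract.get_bat_dat` with your API key, so the slow download only runs when the .csv file is missing.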


The Materials Project is constantly adding materials to its database. To check whether the cached .csv file is up to date with the data available at materialsproject.org, we can use the `update_check` function in the `data_extract` module. Again, this function requires an API key.

In [10]:
data_extract.update_check('EcOxTpa0ymKFe24R')

The Current BatteryData.csv file is up to date!
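
How `update_check` performs this comparison is not covered here. As an illustration only, one way to test whether a cached file is current is to compare the battery IDs it contains against a freshly retrieved list of IDs. The helper `find_missing_ids` below is hypothetical, and the ID lists are supplied by hand rather than fetched from the API.

```python
def find_missing_ids(cached_ids, remote_ids):
    """Return the remote IDs that are absent from the cache, sorted for stable output."""
    return sorted(set(remote_ids) - set(cached_ids))


# Example ID lists (taken from the table above; the split is made up for this sketch).
cached = ["mp-504791_Li", "mp-763480_Li"]
remote = ["mp-504791_Li", "mp-763480_Li", "mp-1176966_Li"]

missing = find_missing_ids(cached, remote)
if missing:
    print(f"{len(missing)} new entries found: {missing}")
else:
    print("The cached file is up to date.")
```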


Next we require some additional data from .....

## Neural Network

The neural network that we developed uses ``PyTorch`` to create the model and to run the forward passes and backpropagation that train it.

First, we will import our ``nn.py`` file:

In [1]:
import nn

``nn`` includes the imports of the necessary modules, including ``torch``, ``pandas``, and ``numpy``.

Now, we will create a model to predict volumetric and gravimetric capacity based on the information from **materialsproject.org**, normalized using our PCA process. The function takes the training data (passed here as the name of a .csv file), the number of inputs, the sizes of the first and second hidden layers, and the number of outputs. The final parameter is the number of datapoints that we will use in our model.

In [2]:
model = nn.nn_capacity('NEWTrainingData_MinMaxScaler.csv', 115, 100, 75, 2, 4000)

If we look at our output:

In [3]:
model

Sequential(
  (0): Linear(in_features=115, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=75, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=75, out_features=2, bias=True)
)
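
For reference, a network with the same layer structure as the printed summary can be assembled directly with `torch.nn.Sequential`. This sketch only reproduces the architecture shown; `build_capacity_net` is a hypothetical name, and the code does not reflect how `nn.nn_capacity` loads the data or trains the weights. It refers to `torch.nn` by its full name to avoid confusion with our own `nn` module.

```python
import torch


def build_capacity_net(n_in=115, h1=100, h2=75, n_out=2):
    """Two hidden layers with LeakyReLU activations, matching the printed summary."""
    return torch.nn.Sequential(
        torch.nn.Linear(n_in, h1),
        torch.nn.LeakyReLU(),  # default negative_slope is 0.01
        torch.nn.Linear(h1, h2),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(h2, n_out),
    )


net = build_capacity_net()
# A batch of 4 random input rows yields 4 rows of 2 outputs.
out = net(torch.rand(4, 115))
```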

We see that it is a PyTorch model! Let's predict some data now.
First, we will create a dataframe that takes some information from the **materialsproject.org** database:

In [5]:
import pandas as pd
import numpy as np
import torch

In [7]:
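train_bat = pd.read_csv('NEWTrainingData_MinMaxScaler.csv')

# Shuffle the rows once so that features and targets stay aligned
train_bat = train_bat.sample(frac=1)

# Features: everything except the index column, the two targets, and 'Max Delta Volume'
x_train = train_bat.drop(columns=['Unnamed: 0', 'Gravimetric Capacity (units)', 'Volumetric Capacity', 'Max Delta Volume'])
# Targets: gravimetric and volumetric capacity
y_train = train_bat[['Gravimetric Capacity (units)', 'Volumetric Capacity']]

# Hold out every row after the first 4000 as test data
x_test = x_train[4000:]
y_test = y_train[4000:]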

dtype = torch.float
device = torch.device('cpu')

x_test_np = np.array(x_test)
x_test_torch = torch.tensor(x_test_np, device = device, dtype = dtype)

y_test_np = np.array(y_test)
y_test_torch = torch.tensor(y_test_np, device = device, dtype = dtype)

Now we will call our model on our test data:

In [10]:
y_pred = model(x_test_torch)

If we look at the first 5 entries, we can see the predicted values, with **gravimetric capacity** in the first column and **volumetric capacity** in the second.

In [13]:
y_pred[0:5]

tensor([[0.2037, 0.2015],
        [0.1596, 0.1886],
        [0.1779, 0.1950],
        [0.1518, 0.1911],
        [0.1730, 0.1850]], grad_fn=<SliceBackward>)
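
The `grad_fn` in the output shows that autograd tracked the forward pass, which is unnecessary when we only want predictions. A minimal sketch of gradient-free inference followed by a per-column mean squared error is given below. The small model and random tensors are stand-ins so that the block runs on its own; in the notebook they would be `model`, `x_test_torch`, and `y_test_torch`.

```python
import torch

torch.manual_seed(0)

# Stand-ins for the trained model and the held-out tensors (assumptions for this sketch).
stand_in_model = torch.nn.Sequential(torch.nn.Linear(115, 2))
x_demo = torch.rand(10, 115)
y_demo = torch.rand(10, 2)

stand_in_model.eval()  # switch layers such as dropout to inference behaviour
with torch.no_grad():  # skip building the autograd graph
    y_hat = stand_in_model(x_demo)

# Mean squared error for each output column
# (column 0: gravimetric capacity, column 1: volumetric capacity).
mse_per_column = ((y_hat - y_demo) ** 2).mean(dim=0)
```

Because the targets in the training file are scaled, errors computed this way are in scaled units rather than physical ones.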