# Example Runthrough of Our Package

The following notebook demonstrates the functions available in the `kratosbat` package. We will show how to extract data from the Materials Project, then analyze it and predict outcomes using several statistical and machine learning methods.

## Extracting data from materialsproject.org
We will first import the modules required for extracting data from materialsproject.org. The data extraction methods live in the `data_extract` submodule:

In [1]:
import data_extract
import pandas as pd

The `get_bat_dat` function mines the entire Materials Project for all materials classified as battery materials and returns a dataframe of those materials and their properties.

*NOTE* The function takes an API key as input to access the Materials Project database. We suggest generating your own API key from the Materials Project dashboard.

In [2]:
data_extract.get_bat_dat('EcOxTpa0ymKFe24R')

KeyboardInterrupt: 

This function also saves the data as a .csv file called **BatteryData.csv**. The file is cached in the repository because `get_bat_dat` can take over 30 minutes to run, as it loops through over 4000 battery materials. It may therefore be more convenient to read the cached .csv file as shown below.

In [None]:
df = pd.read_csv('BatteryData.csv')
df
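The choice between the cached file and a fresh download can be wrapped in a small helper. The following is only a sketch: `load_battery_data` and its `fetch` argument are hypothetical names introduced here, not part of `data_extract`.

```python
from pathlib import Path

import pandas as pd


def load_battery_data(csv_path, fetch):
    """Read the cached CSV if it exists; otherwise fall back to `fetch`.

    `fetch` is assumed to be any zero-argument callable that returns a
    DataFrame (hypothetical helper, for illustration only).
    """
    path = Path(csv_path)
    if path.exists():
        # Fast path: reuse the cached file
        return pd.read_csv(path)
    # Slow path: build the dataframe from scratch
    return fetch()
```

Since `get_bat_dat` returns a dataframe, it could be passed in as `lambda: data_extract.get_bat_dat(api_key)`, so the slow download only runs when **BatteryData.csv** is missing.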

We are aware that the Materials Project is constantly adding materials to its database. To check whether the cached .csv file is up to date with the data available at materialsproject.org, we can use the `update_check` function in the `data_extract` module. Again, this function requires an API key.

In [None]:
data_extract.update_check('EcOxTpa0ymKFe24R')
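Both `get_bat_dat` and `update_check` take the API key as a plain string. A minimal sketch of reading the key from an environment variable instead of typing it into the notebook follows; the variable name `MP_API_KEY` and the helper `load_api_key` are our own choices for illustration, not part of `data_extract`.

```python
import os


def load_api_key(env_var="MP_API_KEY"):
    """Return the API key stored in `env_var` (hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a readable message rather than a failed request later
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Materials Project API key."
        )
    return key
```

The returned string can then be passed wherever a key is expected, for example `data_extract.update_check(load_api_key())`.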

Next we require some additional data from .....

## Neural Network

The neural network that we developed uses ``PyTorch`` to create the model and to run the forward pass and backpropagation needed to train it.

First, we will import our ``nn.py`` file:

In [3]:
import nn

``nn`` includes the imports of the necessary modules, including ``torch``, ``pandas``, and ``numpy``.

Now, we will create a model to predict volumetric and gravimetric capacity from the **materialsproject.org** data, normalized using our PCA process. The function takes the path to the .csv file of input data, the number of inputs, the sizes of the first and second hidden layers, and the number of outputs. The final parameter is the number of datapoints that we will use in our model.

In [4]:
model = nn.nn_capacity('NEWMinMaxScaler.csv', 91, 100, 75, 2, 4000)

If we look at our output:

In [5]:
model

Sequential(
  (0): Linear(in_features=91, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=75, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=75, out_features=2, bias=True)
)
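For readers who want to see how a network with this printed layout can be assembled, here is a sketch written directly against PyTorch. It only reproduces the architecture shown above; `build_mlp` is a hypothetical helper, and we do not claim this is how `nn_capacity` builds or trains its model internally.

```python
import torch
from torch import nn as torch_nn  # aliased so it does not shadow the package's own nn module


def build_mlp(n_in, n_hidden1, n_hidden2, n_out):
    """Two hidden layers with LeakyReLU activations, matching the printout above."""
    return torch_nn.Sequential(
        torch_nn.Linear(n_in, n_hidden1),
        torch_nn.LeakyReLU(),  # default negative_slope is 0.01
        torch_nn.Linear(n_hidden1, n_hidden2),
        torch_nn.LeakyReLU(),
        torch_nn.Linear(n_hidden2, n_out),
    )


sketch = build_mlp(91, 100, 75, 2)
# A batch of 3 dummy rows with 91 features each yields 3 rows of 2 outputs
out = sketch(torch.zeros(3, 91))
print(out.shape)
```

The untrained sketch has the same layer sizes as our model, but its weights are random, so its predictions are meaningless until it is trained.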

We see that it is a PyTorch model! Let's predict some data now.
First, we will create a dataframe that takes some information from the **materialsproject.org** database:

In [6]:
import pandas as pd
import numpy as np
import torch

In [7]:
train_bat = pd.read_csv('NEWMinMaxScaler.csv')

# Shuffle the rows once, before splitting into features and targets,
# so that each row of x_train still matches the same row of y_train
train_bat = train_bat.sample(frac=1)

x_train = train_bat.drop(columns=['Unnamed: 0', 'Gravimetric Capacity (units)', 'Volumetric Capacity', 'Max Delta Volume'])
y_train = train_bat[['Gravimetric Capacity (units)', 'Volumetric Capacity']]

x_test = x_train[4000:]
y_test = y_train[4000:]

dtype = torch.float
device = torch.device('cpu')

x_test_np = np.array(x_test)
x_test_torch = torch.tensor(x_test_np, device = device, dtype = dtype)

y_test_np = np.array(y_test)
y_test_torch = torch.tensor(y_test_np, device = device, dtype = dtype)

Now we will call our model on our test data:

In [8]:
y_pred = model(x_test_torch)

If we look at the first 5 entries, we can see the predicted values, with **gravimetric capacity** in the first column and **volumetric capacity** in the second.

In [9]:
y_pred[0:5]

tensor([[0.1191, 0.2612],
        [0.1378, 0.2090],
        [0.1250, 0.2428],
        [0.1275, 0.2585],
        [0.1108, 0.2504]], grad_fn=<SliceBackward>)
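To judge how close such predictions are to the known values, one option is a mean squared error per output column. The sketch below uses a small stand-in model and random tensors so that it runs on its own; `per_target_mse` is a hypothetical helper, and it assumes that row `i` of the inputs corresponds to row `i` of the targets.

```python
import torch


def per_target_mse(model, x, y_true):
    """Mean squared error for each output column, computed without tracking gradients."""
    model.eval()
    with torch.no_grad():
        y_hat = model(x)
    # Average over rows (dim=0) to get one error value per output column
    return torch.mean((y_hat - y_true) ** 2, dim=0)


# Stand-in model and data, for illustration only
demo_model = torch.nn.Linear(4, 2)
demo_x = torch.rand(10, 4)
demo_y = torch.rand(10, 2)
demo_mse = per_target_mse(demo_model, demo_x, demo_y)
```

With the objects from this notebook, the analogous call would be `per_target_mse(model, x_test_torch, y_test_torch)`, giving one value for gravimetric capacity and one for volumetric capacity, both in scaled units.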

## For the user

To generate the information needed for the Neural Network, users must use the ``generate_inputs`` Python file:

In [10]:
import generate_inputs as gi

The user just needs to input their working ion, crystal system number, spacegroup number, charge formula, and discharge formula.

In [11]:
x = gi.generate_model_df('Li', 115, 19, 'P(WO4)2', 'LiP(WO4)2')

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_tuple(key)


Then convert the result into a PyTorch ``Tensor``:

In [12]:
x_torch = torch.tensor(x, dtype = torch.float)

Now, calling the model on our data:

In [13]:
y = model(x_torch)

In [14]:
y = model(x_torch).clone().detach()

In [15]:
y = np.array(y)

In [16]:
print(str(y[0]))

[0.11218072 0.26535672]


And that's it!

To scale our predictions back to their original range:

In [19]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on the two target columns of the training data
df_output = pd.read_csv('../kratosbat/Data/TrainingData.csv')[['Gravimetric Capacity (units)', 'Volumetric Capacity']]
ms = MinMaxScaler()
ms.fit(df_output)

# Map the scaled predictions back to the original range
result = ms.inverse_transform(y)
print(result)

[[ 291.51227 2039.493  ]]


As you can see, our gravimetric capacity would be 291.5 and our volumetric capacity would be 2039!
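The rescaling step relies on the inverse of min-max scaling. With the default range of 0 to 1, each column is scaled as `(x - min) / (max - min)`, so the inverse is `scaled * (max - min) + min`. The sketch below spells this out with NumPy on made-up numbers; the helper names are our own, and the values are not taken from the training data.

```python
import numpy as np


def minmax_fit(data):
    """Return the per-column minimum and maximum."""
    data = np.asarray(data, dtype=float)
    return data.min(axis=0), data.max(axis=0)


def minmax_transform(data, col_min, col_max):
    """Scale each column to the range [0, 1] (assumes no column is constant)."""
    return (np.asarray(data, dtype=float) - col_min) / (col_max - col_min)


def minmax_inverse(scaled, col_min, col_max):
    """Undo minmax_transform, mapping [0, 1] back to the original range."""
    return np.asarray(scaled, dtype=float) * (col_max - col_min) + col_min


# Made-up values purely for illustration
raw = np.array([[100.0, 500.0], [300.0, 1500.0], [200.0, 2500.0]])
col_min, col_max = minmax_fit(raw)
scaled = minmax_transform(raw, col_min, col_max)
restored = minmax_inverse(scaled, col_min, col_max)
```

Here `restored` matches `raw` up to floating-point rounding, which is the same round trip that `MinMaxScaler.fit` followed by `inverse_transform` performs on our two capacity columns.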