<a href="https://colab.research.google.com/github/quantum-intelligence/computational-physics/blob/main/CP_Lecture16.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Computational Physics
Lecture 16: GPUs and Neural Networks

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import os

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.metrics import r2_score

Download the data from:
https://archive.materialscloud.org/record/2019.0020/v1

A description of the data and the corresponding study can be found here:
https://www.nature.com/articles/s41598-020-72811-z

Open and load "magneticmoment_Ef_data.csv" using pandas.
- Save the file to your Google Drive (with Colab) or your local drive (Jupyter notebook).

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Verify the mount and check the path to the CSV file. Change the path below as needed.

In [None]:
ls drive/MyDrive/Colab\ Notebooks

In [5]:
# Create dataframe of "magneticmoment_Ef_data.csv" using pandas.
# Change the path to magneticmoment_Ef_data.csv as needed.
data_path = "drive/MyDrive/Colab Notebooks/magneticmoment_Ef_data.csv"
df = pd.read_csv(data_path)

Explore the pandas object by examining the columns:
- df.columns

Preview the first rows of the dataframe:
- df.head()


In [None]:
df.head(n=3)

In [7]:
#choose the appropriate set of descriptors here:
descriptor_list = ['num_p', 'num_d', 'num_f',
       'atomic_rad_sum_dif', 'atomic_rad_std_dif',
       'atomic_rad_std', 'atomic_rad_avg', 'atomic_rad_max_dif',
       'atomic_vol_sum_dif', 'atomic_vol_std_dif', 'atomic_vol_std',
       'atomic_vol_avg', 'atomic_vol_max_dif', 'covalentrad_sum_dif',
       'covalentrad_std_dif', 'covalentrad_std', 'covalentrad_avg',
       'covalentrad_max_dif', 'dipole_sum_dif', 'dipole_std_dif', 'dipole_std',
       'dipole_avg', 'dipole_max_dif', 'eaffinity_sum_dif',
       'eaffinity_std_dif', 'eaffinity_std', 'e_affinity_avg',
       'e_affinity_max_dif', 'numelectron_sum_dif', 'numelectron_std_dif',
       'numelectron_std', 'numelectron_avg', 'numelectron_max_dif',
       'vdwradius_sum_dif', 'vdwradius_std_dif', 'vdwradius_std',
       'vdwradius_avg', 'vdwradius_max_dif', 'e_negativity_sum_dif',
       'e_negativity_std_dif', 'e_negativity_std', 'e_negativity_avg',
       'e_negativity_max_dif', 'nvalence_sum_dif', 'nvalence_std_dif',
       'nvalence_std', 'nvalence_avg', 'nvalence_max_dif', 
       'cmpd_skew_p', 'cmpd_skew_d', 'cmpd_skew_f', 'cmpd_sigma_p',
       'cmpd_sigma_d', 'cmpd_sigma_f', 'frac_f ', 'std_ion', 'sum_ion',
       'mean_ion', 'Born', 'hardness_mean', 'hardness_var', 'Nup_mean',
       'Nup_var', 'cs_bob', 'cs_PE', 'cs_IR', 'cs_AR', 'cs_OX']
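
Before building X, it can help to confirm that every name in descriptor_list matches a dataframe column exactly, including whitespace. Note that 'frac_f ' in the list above carries a trailing space; whether that matches the CSV header has to be checked against the file itself. The sketch below uses a hypothetical helper, `find_missing_columns`, and made-up column names standing in for `list(df.columns)`:

```python
def find_missing_columns(wanted, available):
    """Return the names in `wanted` that are not present in `available`."""
    available_set = set(available)
    return [name for name in wanted if name not in available_set]

# Hypothetical column names standing in for list(df.columns).
example_columns = ['num_p', 'num_d', 'frac_f', 'formation_energy']
# Hypothetical descriptor subset; the last name has a trailing space.
example_descriptors = ['num_p', 'num_d', 'frac_f ']

missing = find_missing_columns(example_descriptors, example_columns)
print(missing)  # ['frac_f ']
```

With the real data, the call would be `find_missing_columns(descriptor_list, list(df.columns))`; an empty list means every descriptor was found.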

Consider the following target property, y, and descriptors, X:

y --> 'formation_energy'

X --> the descriptors in descriptor_list defined above



- Create the model input, X, and the labels, y
- Create a training set and a validation set

In [8]:
y = df['formation_energy'].values
X = df[descriptor_list]
X = np.asarray(X)
print("X.shape", X.shape)

X.shape (226, 68)


In [9]:
# Create the training/test split.
test_size = 0.2  # fraction of the data held out for testing (20% test, 80% training)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
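
The notebook feeds the raw descriptors to the network. An optional step, not part of the original workflow, is to standardize each descriptor using statistics from the training set only, so that no information from the test set leaks into training. The following is a minimal sketch under that assumption; `standardize` is a hypothetical helper and the arrays are synthetic stand-ins for X_train and X_test:

```python
import numpy as np

def standardize(train, test, eps=1e-12):
    """Scale both arrays using the mean and std of the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Constant columns have zero std; leave them unscaled to avoid dividing by zero.
    std = np.where(std < eps, 1.0, std)
    return (train - mean) / std, (test - mean) / std

# Synthetic stand-ins for X_train and X_test (not the real data).
rng = np.random.default_rng(0)
demo_train = rng.normal(loc=[0.0, 100.0, 5.0], scale=[1.0, 50.0, 0.1], size=(80, 3))
demo_test = rng.normal(loc=[0.0, 100.0, 5.0], scale=[1.0, 50.0, 0.1], size=(20, 3))

demo_train_s, demo_test_s = standardize(demo_train, demo_test)
print(demo_train_s.mean(axis=0).round(6), demo_train_s.std(axis=0).round(6))
```

scikit-learn's `StandardScaler` offers the same behavior: fit it on X_train, then call `transform` on both X_train and X_test.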

In [10]:
print('descriptor dimension:', X_train[0].shape)

descriptor dimension: (68,)


Create a Sequential model using Keras

In [11]:
model = keras.Sequential()
model.add(layers.Dense(20, input_dim=68, activation="relu"))
model.add(layers.Dense(10, activation="relu"))
model.add(layers.Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_absolute_error'])

Fit the model to the training data:

In [None]:
history = model.fit(X_train, y_train, epochs=20, batch_size=64,
                    validation_data=(X_test, y_test))

In [None]:
model.summary()
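
The parameter count reported by the summary can be reproduced by hand: each Dense layer contributes one weight per input-output pair plus one bias per output unit. A short sketch for the 68-20-10-1 architecture defined above (`dense_param_count` is a hypothetical helper, not a Keras function):

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    layer_sizes = [n_inputs, n_hidden_1, ..., n_outputs]
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total

n_params = dense_param_count([68, 20, 10, 1])
print(n_params)  # 1601
```

With 226 samples and a 20% test split, roughly 180 samples are available for training, far fewer than the number of trainable parameters.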

Visualize the loss and the model metrics as a function of epoch:

In [14]:
print(history.history.keys())

dict_keys(['loss', 'mean_absolute_error', 'val_loss', 'val_mean_absolute_error'])


In [None]:
# summarize history for the model metric (mean absolute error, training set only)
plt.plot(history.history['mean_absolute_error'])
plt.title('model mean absolute error')
plt.ylabel('mean absolute error')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper left')
plt.show()

In [None]:
# summarize history for loss
plt.plot(history.history['loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper left')
plt.show()

Model prediction and performance assessment:

In [17]:
y_pred = model.predict(X_test)
y_train_pred = model.predict(X_train)

In [18]:
r2 = r2_score(y_test, y_pred)
print('Test set R-square', r2)
r2 = r2_score(y_train, y_train_pred)
print('Training set R-square', r2)

Test set R-square -753961700.0916079
Training set R-square -748551134.9802226
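
The coefficient of determination is defined as R² = 1 - SS_res / SS_tot, so it has no lower bound: predictions far from the targets make SS_res much larger than SS_tot and drive the score to large negative values. The sketch below illustrates this with made-up numbers; `r_squared` is a hypothetical re-implementation for illustration, not the scikit-learn function itself:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up targets and predictions, for illustration only.
y_demo = [-1.0, -0.5, 0.0, 0.5, 1.0]
good = [-0.9, -0.6, 0.1, 0.4, 1.1]
bad = [1000.0, -2000.0, 1500.0, -500.0, 3000.0]

print(r_squared(y_demo, good))  # close to 1
print(r_squared(y_demo, bad))   # large and negative
```

A score of 0 corresponds to always predicting the mean of the targets, so a strongly negative value means the model does far worse than that baseline.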


TASKS:
- Inspect the loss and metric values after each epoch. What is happening?
- Plot the validation scores for the model metrics and the loss alongside the corresponding training set values shown above.
- Is the model overfitting?
- Experiment with the model architecture
  - i.e. tune the number of hidden layers and the number of hidden units.
- Use subset selection on the set of descriptors to improve your results
- Create a plot of the actual formation energy versus the predicted formation energy for both the training set and the test set.
- Compare the number of coefficients in the model to the number of data points. Is there a potential problem? What might that be?
- Is the maximum number of epochs used appropriate?
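
One way to approach the question about the number of epochs is to find the epoch at which the validation loss is lowest. The sketch below assumes a list of per-epoch validation losses such as `history.history['val_loss']`; the numbers are hypothetical and `best_epoch` is an illustrative helper:

```python
def best_epoch(val_losses):
    """Return (1-based epoch, loss) at which the validation loss is lowest."""
    index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return index + 1, val_losses[index]

# Hypothetical validation losses, standing in for history.history['val_loss'].
demo_val_loss = [9.0, 4.0, 2.5, 2.1, 2.0, 2.2, 2.6, 3.1]

epoch, loss = best_epoch(demo_val_loss)
print(epoch, loss)  # 5 2.0
```

If the minimum falls on the final epoch, training was probably stopped too early; if it falls well before the end and the loss then rises, the later epochs may be overfitting. Keras also provides the `keras.callbacks.EarlyStopping` callback, which can monitor `val_loss` and stop training automatically.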

## Playing with GPUs
QUESTION:
- Time the NN training task for the CPU
- Switch to a GPU and time the NN training task for the GPU

(To switch to a GPU, open the "Runtime" menu above, click "Change runtime type", and select the GPU option.)
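
A minimal sketch of how the timing could be done, using a hypothetical `timer` context manager built on `time.perf_counter`. The workload inside the `with` block is a placeholder; in the notebook it would be replaced by the `model.fit(...)` call:

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with timer("stand-in workload", timings):
    # Placeholder for model.fit(...); replace with the training call.
    total = sum(i * i for i in range(100_000))

print(f"{timings['stand-in workload']:.4f} s")
```

For a fair comparison, rebuild and recompile the model before each timed run so that both runs start from fresh weights. `tf.config.list_physical_devices('GPU')` returns a non-empty list when a GPU runtime is active. For a network and dataset this small, the GPU is not guaranteed to be faster than the CPU.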