<a href="https://colab.research.google.com/github/jglazar/multipAL/blob/main/test_multipal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we'll use active learning to efficiently search through the [NIST JARVIS DFT](https://jarvis.nist.gov/jarvisdft/) database for piezoelectric topological insulators.

Let's start with some basic package imports.

In [None]:
import pandas as pd
import multipal

# display renders dataframes; HTML embeds the interactive figures
from IPython.display import HTML, display

# Data setup + visualization

Next, we'll instantiate a Data object that contains the JARVIS DFT records. The Data object holds a dataframe with each material's features and properties. The featurization for piezoelectric topological insulators is built into the JarvisPTData subclass.

In [None]:
pt_data = multipal.JarvisPTData()
display( pt_data.df.head() )

We can also use some cool visualization methods in the Data class. The graph below shows the competition between piezoelectricity and topological insulation. We quantify piezoelectricity with the maximum piezoelectric tensor value and topological insulation with the spin-orbit spillage. [Spillage](https://journals.aps.org/prb/abstract/10.1103/PhysRevB.90.125133) is a common measure of band inversion that is useful for high-throughput studies.

In [None]:
fig_compete = pt_data.plot_compete('dfpt_piezo_max_eij', 'spillage')
HTML( fig_compete.to_html() )
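
When two properties compete, the materials worth a closer look are often those that no other material beats on both properties at once (the Pareto front). The sketch below is independent of multipal and uses made-up toy values, not JARVIS data; `pareto_front` is a hypothetical helper written only for illustration.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    A point is dominated if some other point is at least as large on
    both axes and strictly larger on at least one (larger is better).
    """
    front = []
    for i, (x_i, y_i) in enumerate(points):
        dominated = any(
            (x_j >= x_i and y_j >= y_i) and (x_j > x_i or y_j > y_i)
            for j, (x_j, y_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((x_i, y_i))
    return front


# Toy (property_a, property_b) pairs, invented for illustration only
toy = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.3, 0.3), (0.4, 0.6)]
front = pareto_front(toy)
print(front)
```

Here `(0.3, 0.3)` drops out because `(0.5, 0.5)` is better on both axes; the remaining points each represent a different trade-off.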

We can also compare the features of high-piezoelectric and high-spillage materials. We find that they tend to be opposites, as expected for competing properties. 

In [None]:
print('Our features: ',  *pt_data.ftrs_list, sep='  ')
fig_compare = pt_data.plot_compare('avg_mass')
HTML( fig_compare.to_html() )

We can also create a t-SNE embedding of the features to visualize a map of materials.

In [None]:
pt_data.add_tsne()
fig_tsne = pt_data.plot_map('dfpt_piezo_max_eij')
HTML( fig_tsne.to_html() )

# Active learning

Now that we have our data set up, let's do an active learning search. This example will be a bit trivial since we'll search through known materials, but it makes a nice proof-of-concept.

We have to first instantiate an active learning object. The JarvisAL subclass has some baked-in methods to test and visualize the active learning performance on the known JARVIS materials. We'll start by setting up an active learning dataframe with 5 materials in the training set.

In [None]:
# JarvisAL is reached through the multipal namespace, like JarvisPTData above
pt_learn = multipal.JarvisAL( pt_data, 'dfpt_piezo_max_eij' )
al_df = pt_learn.df_setup()

We can now run a basic active learning search for the best piezoelectric material. This outputs the JARVIS ID of each material selected during the search.

In [None]:
ids = pt_learn.al( al_df, n_steps=10)
display(ids)
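
To make the idea of a search loop concrete, here is a minimal pool-based sketch in plain Python. It is not multipal's implementation: the nearest-neighbor surrogate, the greedy "pick the highest prediction" rule, the function names, and the toy pool are all assumptions chosen for illustration.

```python
import random


def nearest_label(x, labeled):
    """Predict a target for x from the closest labeled point (1-NN surrogate)."""
    nearest_x = min(labeled, key=lambda known_x: abs(known_x - x))
    return labeled[nearest_x]


def active_learning(pool, n_start=5, n_steps=10, seed=0):
    """Greedy pool-based search; pool maps a feature value to its hidden target.

    Returns the feature values picked at each step, in order.
    """
    rng = random.Random(seed)
    start = rng.sample(sorted(pool), n_start)   # random initial training set
    labeled = {x: pool[x] for x in start}
    picks = []
    for _ in range(n_steps):
        candidates = [x for x in pool if x not in labeled]
        if not candidates:
            break                               # pool exhausted
        # Acquisition: choose the candidate with the highest predicted target
        best = max(candidates, key=lambda x: nearest_label(x, labeled))
        labeled[best] = pool[best]              # "measure" it by revealing its target
        picks.append(best)
    return picks


# Toy pool (invented): the target peaks at x = 30
toy_pool = {x: -(x - 30) ** 2 for x in range(50)}
picks = active_learning(toy_pool)
print(picks)
```

Each step retrains the surrogate on everything measured so far, which is why the choice of acquisition rule matters: a purely greedy rule like this one can get stuck near its starting points.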

Next up is a more rigorous comparison between different acquisition functions. The method below runs the active learning loop from 10 different starting training sets for each acquisition function (`n_avg=10`), with 100 steps per run (`n_steps=100`). The result records the improvement of the known materials at each step.

This will take a few minutes to run.

In [None]:
comp_df = pt_learn.compare_aq( n_avg=10, n_steps=100 )
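
Averaging over several starting sets can be sketched as follows. This assumes "improvement" means the best target value seen so far, which may differ from what `compare_aq` actually records; the helper names and toy targets are hypothetical, and only a random-guessing baseline is shown.

```python
import random


def best_so_far(values):
    """Running maximum of a sequence of observed target values."""
    curve, best = [], float("-inf")
    for v in values:
        best = max(best, v)
        curve.append(best)
    return curve


def average_curves(curves):
    """Average several equal-length curves step by step."""
    return [sum(step) / len(step) for step in zip(*curves)]


def random_run(targets, n_steps, seed):
    """Baseline: measure n_steps materials chosen uniformly at random."""
    rng = random.Random(seed)
    return best_so_far(rng.sample(targets, n_steps))


# Toy targets (invented), standing in for a property column
toy_targets = [float(i % 17) for i in range(200)]
curves = [random_run(toy_targets, n_steps=20, seed=s) for s in range(10)]
mean_curve = average_curves(curves)
print(mean_curve)
```

Averaging smooths out lucky or unlucky starting sets, so differences between acquisition functions reflect the strategy rather than a single draw.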

Finally, we can visualize the average improvement of the dataset over time for the different acquisition functions. We clearly see that the active learning strategies (maxu and maxv) far outperform random guessing!

In [None]:
fig_racetrack = pt_learn.plot_racetrack( comp_df, error_bars=False )
HTML( fig_racetrack.to_html() )