# Machine learning models for magnetic and topological property prediction in transition metal oxides
* Build a classifier that predicts FM vs AFM ground states from structure and composition features.
* Build a classifier that predicts, from structure and composition features, whether a magnetic material has nontrivial band topology for some value of the Hubbard $U$.

This notebook accompanies Frey *et al.*, "High-throughput search for magnetic and topological order in transition metal oxides."

In [1]:
__author__ = "Nathan C. Frey"
__copyright__ = "MIT License"
__version__ = "0.0.1"
__maintainer__ = "Nathan C. Frey"
__email__ = "ncfrey@lbl.gov"
__date__ = "May 5 2020"

In [2]:
# Check requirements.txt for version requirements
import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.metrics import f1_score, classification_report

from joblib import dump, load

from monty.serialization import loadfn, dumpfn

### Machine learning model for predicting magnetic ground state ordering

In [3]:
datadir = "data/"
modeldir = "models/"

In [4]:
X_train = pd.read_json(datadir + 'X_train.json')
X_train = X_train.values
y_train = loadfn(datadir + 'y_train.json')

The most important features are the values and averages of:
* Number of magnetic sublattices  
* Sine Coulomb Matrix  
* Nearest-neighbor and next-nearest-neighbor distances between magnetic ions 
* Number of unfilled $d$ orbitals
* Coordination number of magnetic ions  
* Structural complexity per atom
* Spacegroup number
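
A ranking like the one above is typically read off a fitted random forest's impurity-based importances. The sketch below is illustrative only: it uses synthetic data from `make_classification` and hypothetical feature names (`feature_0`, `feature_1`, ...), not the notebook's feature matrix or the model stored in `mag_clf.joblib`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 200 samples, 6 features.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)
# Hypothetical feature names, for illustration only.
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]

rf_demo = RandomForestClassifier(n_estimators=100, random_state=0)
rf_demo.fit(X_demo, y_demo)

# Impurity-based importances are non-negative and sum to 1.
importances = rf_demo.feature_importances_
ranking = np.argsort(importances)[::-1]  # indices, most important first

for idx in ranking:
    print(f"{feature_names[idx]}: {importances[idx]:.3f}")
```

Impurity-based importances can favor high-cardinality features, so a ranking of this kind is best treated as a guide rather than a definitive ordering.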

In [5]:
clf = load(modeldir + 'mag_clf.joblib')

In [6]:
fm_scores = []
afm_scores = []

# 5-fold cross-validation; no random_state is set, so folds differ between runs.
kf = KFold(n_splits=5, shuffle=True)

# Note: fit() retrains `clf` in place, so after this loop it holds the model
# fitted on the last fold's training split.
for k, (train, test) in enumerate(kf.split(X_train, y_train)):
    clf.fit(X_train[train], y_train[train])
    y_pred = clf.predict(X_train[test])  # predict once, score both classes
    f1s_fm = f1_score(y_train[test], y_pred, pos_label=1)   # FM is label 1
    f1s_afm = f1_score(y_train[test], y_pred, pos_label=0)  # AFM is label 0
    fm_scores.append(f1s_fm)
    afm_scores.append(f1s_afm)

    print("[fold {0}] FM score: {1:.3f}".format(k, f1s_fm))
    print("[fold {0}] AFM score: {1:.3f}".format(k, f1s_afm))

fm_scores = np.array(fm_scores)
afm_scores = np.array(afm_scores)
print('FM Mean: %.2f median: %.2f stdev: %.2f' % (np.mean(fm_scores), np.median(fm_scores), np.std(fm_scores)))
print('AFM Mean: %.2f median: %.2f stdev: %.2f' % (np.mean(afm_scores), np.median(afm_scores), np.std(afm_scores)))

[fold 0] FM score: 0.854
[fold 0] AFM score: 0.845
[fold 1] FM score: 0.872
[fold 1] AFM score: 0.850
[fold 2] FM score: 0.843
[fold 2] AFM score: 0.841
[fold 3] FM score: 0.857
[fold 3] AFM score: 0.855
[fold 4] FM score: 0.831
[fold 4] AFM score: 0.833
FM Mean: 0.85 median: 0.85 stdev: 0.01
AFM Mean: 0.84 median: 0.84 stdev: 0.01
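
Because `KFold(shuffle=True)` is used without a `random_state`, the fold scores above will vary from run to run. The sketch below shows one possible way to make such a loop reproducible while leaving the original estimator untouched, by fitting a `clone` in each fold. It rests on several assumptions: the data are synthetic, the classifier is a freshly constructed `RandomForestClassifier` rather than the loaded model, `StratifiedKFold` is swapped in for `KFold` so each fold keeps the class ratio, and the names `X_demo`, `y_demo`, and `base_clf` are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data and model (assumptions, not the notebook's).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
base_clf = RandomForestClassifier(n_estimators=50, random_state=0)

# Fixed random_state makes the fold assignment repeatable.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {0: [], 1: []}

for train, test in skf.split(X_demo, y_demo):
    # clone() returns an unfitted copy, so base_clf is never modified.
    fold_clf = clone(base_clf).fit(X_demo[train], y_demo[train])
    y_pred = fold_clf.predict(X_demo[test])
    for label in (0, 1):
        scores[label].append(f1_score(y_demo[test], y_pred, pos_label=label))

for label, vals in scores.items():
    print(f"class {label}: mean F1 = {np.mean(vals):.2f}, stdev = {np.std(vals):.2f}")
```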


In [8]:
X_test = pd.read_json(datadir + 'X_test.json')
X_test = X_test.values
y_test = loadfn(datadir + 'y_test.json')

In [9]:
round(clf.score(X_test, y_test), 2)

0.8

In [10]:
print(classification_report(y_test, clf.predict(X_test), target_names=["AFM", "FM"]))

              precision    recall  f1-score   support

         AFM       0.88      0.84      0.86       236
          FM       0.59      0.67      0.63        79

    accuracy                           0.80       315
   macro avg       0.74      0.76      0.75       315
weighted avg       0.81      0.80      0.80       315
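
The summary rows of the report follow from the per-class precision, recall, and support. The sketch below recomputes them from the rounded values printed above, so the results agree with the report only to within rounding (the last digit may differ). It also computes the accuracy of always predicting the majority class, as a point of comparison for the class imbalance (236 AFM vs. 79 FM).

```python
# Precision, recall, and support copied from the report above.
report = {
    "AFM": {"precision": 0.88, "recall": 0.84, "support": 236},
    "FM": {"precision": 0.59, "recall": 0.67, "support": 79},
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

total = sum(row["support"] for row in report.values())
f1_scores = {name: f1(row["precision"], row["recall"]) for name, row in report.items()}

# Macro average: unweighted mean over classes.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
# Weighted average: mean over classes, weighted by support.
weighted_f1 = sum(f1_scores[n] * report[n]["support"] for n in report) / total
# Accuracy equals the support-weighted mean of the per-class recalls.
accuracy = sum(report[n]["recall"] * report[n]["support"] for n in report) / total
# Accuracy of always predicting the majority class (AFM).
majority_baseline = max(row["support"] for row in report.values()) / total

for name, value in f1_scores.items():
    print(f"{name} F1: {value:.2f}")
print(f"macro F1: {macro_f1:.2f}, weighted F1: {weighted_f1:.2f}")
print(f"accuracy: {accuracy:.2f}, majority-class baseline: {majority_baseline:.2f}")
```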



### Machine learning model for magnetic topological phase classification
The MAGNDATA database and the appropriate citations are available from:
* https://arxiv.org/ftp/arxiv/papers/2003/2003.00012.pdf
* https://www.topologicalquantumchemistry.fr/magnetic/index.html
* http://webbdcrista1.ehu.es/magndata/

In [11]:
X_train = pd.read_json(datadir + 'mtm_X_train.json')
X_train = X_train.values
y_train = loadfn(datadir + 'mtm_y_train.json')

The most important features are the values and averages of symmetry- and orbital-related descriptors, for example:
* Number of $d$ valence electrons
* Spacegroup number
* Number of unfilled $f$ orbitals
* Crystal system

In [12]:
clf = load(modeldir + 'mag_topo_clf.joblib')

In [13]:
topo_scores = []
triv_scores = []

# 5-fold cross-validation; no random_state is set, so folds differ between runs.
kf = KFold(n_splits=5, shuffle=True)

# Note: fit() retrains `clf` in place, so after this loop it holds the model
# fitted on the last fold's training split.
for k, (train, test) in enumerate(kf.split(X_train, y_train)):
    clf.fit(X_train[train], y_train[train])
    y_pred = clf.predict(X_train[test])  # predict once, score both classes
    f1s_topo = f1_score(y_train[test], y_pred, pos_label=1)  # topological is label 1
    f1s_triv = f1_score(y_train[test], y_pred, pos_label=0)  # trivial is label 0
    topo_scores.append(f1s_topo)
    triv_scores.append(f1s_triv)

    print("[fold {0}] Topological score: {1:.3f}".format(k, f1s_topo))
    print("[fold {0}] Trivial score: {1:.3f}".format(k, f1s_triv))

topo_scores = np.array(topo_scores)
triv_scores = np.array(triv_scores)
print('Topological Mean: %.2f median: %.2f stdev: %.2f' % (np.mean(topo_scores), np.median(topo_scores), np.std(topo_scores)))
print('Trivial Mean: %.2f median: %.2f stdev: %.2f' % (np.mean(triv_scores), np.median(triv_scores), np.std(triv_scores)))

[fold 0] Topological score: 0.683
[fold 0] Trivial score: 0.711
[fold 1] Topological score: 0.725
[fold 1] Trivial score: 0.816
[fold 2] Topological score: 0.691
[fold 2] Trivial score: 0.725
[fold 3] Topological score: 0.791
[fold 3] Trivial score: 0.759
[fold 4] Topological score: 0.828
[fold 4] Trivial score: 0.819
Topological Mean: 0.74 median: 0.72 stdev: 0.06
Trivial Mean: 0.77 median: 0.76 stdev: 0.04


In [15]:
X_test = pd.read_json(datadir + 'mtm_X_test.json')
X_test = X_test.values
y_test = loadfn(datadir + 'mtm_y_test.json')

In [16]:
round(clf.score(X_test, y_test), 2)

0.72

In [17]:
print(classification_report(y_test, clf.predict(X_test), target_names=["Trivial", "Topological"]))

              precision    recall  f1-score   support

     Trivial       0.87      0.71      0.78        28
 Topological       0.50      0.73      0.59        11

    accuracy                           0.72        39
   macro avg       0.68      0.72      0.69        39
weighted avg       0.77      0.72      0.73        39
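
With only 39 test examples, the report above can be unpacked into approximate per-class counts. The sketch below infers the number of correct predictions by rounding recall times support, which is an assumption: the report only gives rounded values. Under that reconstruction, the recomputed precisions can be compared with the reported 0.87 and 0.50, and the accuracy can be compared with the accuracy of always predicting the majority class (28 of 39 examples are trivial).

```python
# Support and recall copied from the report above.
support = {"Trivial": 28, "Topological": 11}
recall = {"Trivial": 0.71, "Topological": 0.73}

# Correct predictions per class, inferred by rounding recall * support (assumption).
correct = {name: round(recall[name] * support[name]) for name in support}
missed = {name: support[name] - correct[name] for name in support}

# Predicted-as-X count = correct X + examples of the other class labelled X.
predicted = {
    "Trivial": correct["Trivial"] + missed["Topological"],
    "Topological": correct["Topological"] + missed["Trivial"],
}
precision = {name: correct[name] / predicted[name] for name in support}

total = sum(support.values())
accuracy = sum(correct.values()) / total
# Accuracy of always predicting the majority class (Trivial).
majority_baseline = max(support.values()) / total

for name in support:
    print(f"{name}: {correct[name]}/{support[name]} correct, precision {precision[name]:.2f}")
print(f"accuracy: {accuracy:.2f}, majority-class baseline: {majority_baseline:.2f}")
```

If the inferred counts are right, the overall accuracy is about the same as the majority-class baseline, so the recall on the topological class is the more informative number for this test set.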

