# Generating materials descriptors – Exercises

In these exercises, we'll load a cleaned dataframe, add several kinds of descriptors to it, and prepare it for machine learning.

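Before starting, use matminer's `load_dataframe_from_json()` function to load a cleaned version of the `elastic_tensor_2015` dataset, in which each row holds a crystal `structure`, its `formula`, and its Voigt-Reuss-Hill bulk modulus `K_VRH` (in GPa). We will use this dataset for all the exercises.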

In [1]:
import os
from matminer.utils.io import load_dataframe_from_json

df = load_dataframe_from_json(os.path.join("resources", "elastic_tensor_2015_cleaned.json"))
df.head()



|   | structure | formula | K_VRH |
|---|-----------|---------|-------|
| 0 | [[0.94814328 2.07280467 2.5112 ] Nb, [5.273... | Nb4CoSi | 194.268884 |
| 1 | [[0. 0. 0.] Al, [1.96639263 1.13529553 0.75278... | Al(CoSi)2 | 175.449907 |
| 2 | [[1.480346 1.480346 1.480346] Si, [0. 0. 0.] Os] | SiOs | 295.077545 |
| 3 | [[0. 1.09045794 0.84078375] Ga, [0. ... | Ga | 49.13067 |
| 4 | [[1.0094265 4.24771709 2.9955487 ] Si, [3.028... | SiRu2 | 256.768081 |


## Exercise 1: Convert formulas to pymatgen Compositions

Use matminer's `StrToComposition` conversion featurizer to convert the `formula` column of the dataframe into pymatgen `Composition` objects. This step is necessary because matminer's composition featurizers take pymatgen `Composition` objects as input, not formula strings.

In [2]:
from matminer.featurizers.conversions import StrToComposition

stc = StrToComposition()

# Complete exercise below

df = stc.featurize_dataframe(df, "formula")
df.head()


|   | structure | formula | K_VRH | composition |
|---|-----------|---------|-------|-------------|
| 0 | [[0.94814328 2.07280467 2.5112 ] Nb, [5.273... | Nb4CoSi | 194.268884 | (Nb, Co, Si) |
| 1 | [[0. 0. 0.] Al, [1.96639263 1.13529553 0.75278... | Al(CoSi)2 | 175.449907 | (Al, Co, Si) |
| 2 | [[1.480346 1.480346 1.480346] Si, [0. 0. 0.] Os] | SiOs | 295.077545 | (Si, Os) |
| 3 | [[0. 1.09045794 0.84078375] Ga, [0. ... | Ga | 49.13067 | (Ga) |
| 4 | [[1.0094265 4.24771709 2.9955487 ] Si, [3.028... | SiRu2 | 256.768081 | (Si, Ru) |
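
Conceptually, this conversion turns each formula string into a mapping from elements to amounts, which is why `Al(CoSi)2` above ends up with the elements `(Al, Co, Si)`. The snippet below is a minimal standard-library sketch of that idea, not pymatgen's actual parser: `parse_formula` is a hypothetical helper that only understands element symbols, numeric subscripts and (nested) parentheses, and it ignores any other notation pymatgen's `Composition` may accept.

```python
import re
from collections import Counter

# One token per match: an element symbol, a number, or a parenthesis.
_TOKEN = re.compile(r"[A-Z][a-z]?|\d+(?:\.\d+)?|[()]")


def parse_formula(formula: str) -> dict[str, float]:
    """Hypothetical helper: expand e.g. 'Al(CoSi)2' into element -> amount."""
    tokens = _TOKEN.findall(formula)
    if "".join(tokens) != formula:
        raise ValueError(f"unrecognised characters in {formula!r}")
    stack = [Counter()]  # one Counter per open parenthesis level
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(Counter())
            i += 1
            continue
        # An element or ')' may be followed by a numeric multiplier.
        if i + 1 < len(tokens) and tokens[i + 1][0].isdigit():
            mult, step = float(tokens[i + 1]), 2
        else:
            mult, step = 1.0, 1
        if tok == ")":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            group = stack.pop()
            for element, amount in group.items():
                stack[-1][element] += amount * mult
        elif tok[0].isdigit():
            raise ValueError(f"unexpected number {tok!r} in {formula!r}")
        else:
            stack[-1][tok] += mult
        i += step
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return dict(stack[-1])


print(parse_formula("Al(CoSi)2"))  # {'Al': 1.0, 'Co': 2.0, 'Si': 2.0}
```

In the exercises, keep using `StrToComposition`: the composition featurizers expect real pymatgen `Composition` objects, not plain dictionaries.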


## Exercise 2: Add composition features

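Now add `ElementFraction` features by featurizing the `composition` column. This adds one column per element, from `H` to `Lr`, holding that element's atomic fraction in each composition.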

In [3]:
from matminer.featurizers.composition.element import ElementFraction

ef = ElementFraction()

# Complete exercise below

df = ef.featurize_dataframe(df, "composition")
df.head()


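|   | structure | formula | K_VRH | composition | H | He | Li | Be | B | C | ... | Pu | Am | Cm | Bk | Cf | Es | Fm | Md | No | Lr |
|---|-----------|---------|-------|-------------|---|----|----|----|---|---|-----|----|----|----|----|----|----|----|----|----|----|
| 0 | [[0.94814328 2.07280467 2.5112 ] Nb, [5.273... | Nb4CoSi | 194.268884 | (Nb, Co, Si) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | [[0. 0. 0.] Al, [1.96639263 1.13529553 0.75278... | Al(CoSi)2 | 175.449907 | (Al, Co, Si) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | [[1.480346 1.480346 1.480346] Si, [0. 0. 0.] Os] | SiOs | 295.077545 | (Si, Os) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | [[0. 1.09045794 0.84078375] Ga, [0. ... | Ga | 49.13067 | (Ga) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | [[1.0094265 4.24771709 2.9955487 ] Si, [3.028... | SiRu2 | 256.768081 | (Si, Ru) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |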
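
Each `ElementFraction` value is an atomic fraction: an element's amount divided by the total number of atoms in the formula. A minimal plain-Python sketch, assuming the amounts are already in a dictionary such as `{"Nb": 4, "Co": 1, "Si": 1}`; `element_fractions` and the shortened `COLUMNS` list are hypothetical illustrations, not matminer's implementation.

```python
# Hypothetical subset of element columns; ElementFraction itself emits one column per element, H to Lr.
COLUMNS = ["H", "Al", "Si", "Co", "Ga", "Nb", "Ru", "Os"]


def element_fractions(amounts: dict[str, float], columns: list[str]) -> dict[str, float]:
    """Hypothetical helper: atomic fraction of every element in `columns` (0.0 if absent)."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("composition must contain at least one atom")
    missing = set(amounts) - set(columns)
    if missing:
        raise ValueError(f"no column for elements: {sorted(missing)}")
    # Absent elements still get a column, filled with 0.0.
    return {el: amounts.get(el, 0.0) / total for el in columns}


# Nb4CoSi (row 0): 6 atoms per formula unit.
row0 = element_fractions({"Nb": 4, "Co": 1, "Si": 1}, COLUMNS)
print({el: round(frac, 4) for el, frac in row0.items()})
```

For `Nb4CoSi`, Nb gets 4/6 and Co and Si get 1/6 each, while every other column is 0.0. None of the five formulas above contains the elements whose columns are visible in the table (H to C and Pu to Lr), which is why those columns are all zero.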


## Exercise 3: Add structure features

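Finally, add structure features by applying the `DensityFeatures` featurizer to the `structure` column. It adds three columns: `density` (g/cm³), `vpa` (volume per atom, Å³) and `packing fraction`.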

In [4]:
from matminer.featurizers.structure.order import DensityFeatures

de = DensityFeatures()

# Complete exercise below

df = de.featurize_dataframe(df, "structure")
df.head()


|   | structure | formula | K_VRH | composition | H | He | Li | Be | B | C | ... | Bk | Cf | Es | Fm | Md | No | Lr | density | vpa | packing fraction |
|---|-----------|---------|-------|-------------|---|----|----|----|---|---|-----|----|----|----|----|----|----|----|---------|-----|------------------|
| 0 | [[0.94814328 2.07280467 2.5112 ] Nb, [5.273... | Nb4CoSi | 194.268884 | (Nb, Co, Si) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.834556 | 16.201654 | 0.688834 |
| 1 | [[0. 0. 0.] Al, [1.96639263 1.13529553 0.75278... | Al(CoSi)2 | 175.449907 | (Al, Co, Si) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.384968 | 12.397466 | 0.644386 |
| 2 | [[1.480346 1.480346 1.480346] Si, [0. 0. 0.] Os] | SiOs | 295.077545 | (Si, Os) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.968635 | 12.976265 | 0.569426 |
| 3 | [[0. 1.09045794 0.84078375] Ga, [0. ... | Ga | 49.13067 | (Ga) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6.036267 | 19.180359 | 0.479802 |
| 4 | [[1.0094265 4.24771709 2.9955487 ] Si, [3.028... | SiRu2 | 256.768081 | (Si, Ru) | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.539514 | 13.358418 | 0.598395 |
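
The three `DensityFeatures` columns follow from the cell volume and the atoms in it: density is total mass over volume, `vpa` is volume divided by the number of sites, and `packing fraction` is the summed volume of atomic spheres over the cell volume. The sketch below computes them for `SiOs` (row 2) under stated assumptions: `density_features` is a hypothetical helper, the cell is assumed to be cubic with an edge of 2 × 1.480346 Å (the lattice is not printed in the table), and the atomic radii (Si 1.10 Å, Os 1.30 Å) are assumed values for the spheres.

```python
import math

AMU_TO_G = 1.66053906660e-24  # grams per atomic mass unit (dalton)
ANG3_TO_CM3 = 1e-24           # cubic centimetres per cubic angstrom


def density_features(species, volume_ang3, masses, radii):
    """Hypothetical helper: (density in g/cm^3, volume per atom in A^3, packing fraction).

    `species` lists one element symbol per site in the cell; `masses` (amu) and
    `radii` (angstrom) are per-element lookup tables supplied by the caller.
    """
    if not species or volume_ang3 <= 0:
        raise ValueError("need at least one site and a positive cell volume")
    total_mass_g = sum(masses[el] for el in species) * AMU_TO_G
    density = total_mass_g / (volume_ang3 * ANG3_TO_CM3)
    vpa = volume_ang3 / len(species)
    # Sum of hard-sphere volumes, one sphere per site, relative to the cell volume.
    sphere_volume = sum(4.0 / 3.0 * math.pi * radii[el] ** 3 for el in species)
    packing_fraction = sphere_volume / volume_ang3
    return density, vpa, packing_fraction


# Assumption: SiOs (row 2) in a cubic cell whose edge is twice the Si coordinate.
edge = 2 * 1.480346                     # angstrom (assumed lattice, not shown in the table)
species = ["Si", "Os"]                  # the two sites listed in the structure column
masses = {"Si": 28.0855, "Os": 190.23}  # standard atomic weights, amu
radii = {"Si": 1.10, "Os": 1.30}        # assumed atomic radii, angstrom

density, vpa, packing_fraction = density_features(species, edge ** 3, masses, radii)
print(f"density={density:.4f} g/cm^3, vpa={vpa:.4f} A^3, packing fraction={packing_fraction:.4f}")
```

With these assumed inputs the helper returns roughly 13.97 g/cm³, 12.98 Å³ and 0.569, which lines up with row 2 of the table above.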


Great! We've generated our features. On to the next section.