# Generating materials descriptors - Exercises

In this exercise, we'll load the dataframe we made in the previous exercise, decorate it with multiple descriptors, and prepare it for input to machine learning algorithms.



## Load the dataset from the previous unit.

Use matminer's `load_dataframe_from_json` function to load the file you stored in unit 1 (`unit1.json`). 

In [3]:
from matminer.utils.io import load_dataframe_from_json


df = load_dataframe_from_json("unit1.json")
df.head()

Unnamed: 0,structure,formula,K_VRH
0,"[[0.94814328 2.07280467 2.5112 ] Nb, [5.273...",Nb4CoSi,194.268884
1,"[[0. 0. 0.] Al, [1.96639263 1.13529553 0.75278...",Al(CoSi)2,175.449907
2,"[[1.480346 1.480346 1.480346] Si, [0. 0. 0.] Os]",SiOs,295.077545
3,"[[0. 1.09045794 0.84078375] Ga, [0. ...",Ga,49.13067
4,"[[1.0094265 4.24771709 2.9955487 ] Si, [3.028...",SiRu2,256.768081


## Convert formulas to pymatgen Compositions

Let's use matminer's `StrToComposition` conversion featurizer to first convert the `formula` column of the dataframe to pymatgen `Composition`s. This is necessary because matminer's Composition featurizers need pymatgen compositions as input. 


In [5]:
from matminer.featurizers.conversions import StrToComposition

stc = StrToComposition()

df = stc.featurize_dataframe(df, "formula")
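
To make concrete what a composition carries beyond the raw string, here is a minimal standard-library sketch that turns a formula into element counts and atomic fractions. The `parse_formula` helper is hypothetical and written only for illustration; it is not how pymatgen or matminer parse formulas, and it handles only simple formulas with integer subscripts and parentheses.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Toy parser (illustrative only): element counts for a simple formula."""
    # Split into element symbols, integers, and parentheses.
    tokens = re.findall(r"[A-Z][a-z]?|\d+|\(|\)", formula)
    stack = [Counter()]  # one Counter per open parenthesis level
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(Counter())
            i += 1
            continue
        # An integer right after an element or ")" is its multiplier.
        has_mult = i + 1 < len(tokens) and tokens[i + 1].isdigit()
        mult = int(tokens[i + 1]) if has_mult else 1
        if tok == ")":
            group = stack.pop()
            for el, n in group.items():
                stack[-1][el] += n * mult
        else:
            stack[-1][tok] += mult
        i += 2 if has_mult else 1
    return dict(stack[0])


counts = parse_formula("Al(CoSi)2")
total = sum(counts.values())
fractions = {el: n / total for el, n in counts.items()}
print(counts)     # {'Al': 1, 'Co': 2, 'Si': 2}
print(fractions)  # {'Al': 0.2, 'Co': 0.4, 'Si': 0.4}
```

Composition featurizers need exactly this kind of per-element information, which is why the `formula` strings are converted before featurizing.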






## Add composition features

Now we will add Magpie statistics, as we did in the lesson, using the `ElementProperty` `"magpie"` preset. Remember to use the "composition" column as input.

In [6]:
from matminer.featurizers.composition import ElementProperty

ep = ElementProperty.from_preset(preset_name="magpie")
df = ep.featurize_dataframe(df, "composition")
df.head()






Unnamed: 0,structure,formula,K_VRH,composition,MagpieData minimum Number,MagpieData maximum Number,MagpieData range Number,MagpieData mean Number,MagpieData avg_dev Number,MagpieData mode Number,MagpieData minimum MendeleevNumber,MagpieData maximum MendeleevNumber,MagpieData range MendeleevNumber,MagpieData mean MendeleevNumber,MagpieData avg_dev MendeleevNumber,MagpieData mode MendeleevNumber,MagpieData minimum AtomicWeight,MagpieData maximum AtomicWeight,MagpieData range AtomicWeight,MagpieData mean AtomicWeight,MagpieData avg_dev AtomicWeight,MagpieData mode AtomicWeight,MagpieData minimum MeltingT,MagpieData maximum MeltingT,MagpieData range MeltingT,MagpieData mean MeltingT,MagpieData avg_dev MeltingT,MagpieData mode MeltingT,MagpieData minimum Column,MagpieData maximum Column,MagpieData range Column,MagpieData mean Column,MagpieData avg_dev Column,MagpieData mode Column,MagpieData minimum Row,MagpieData maximum Row,MagpieData range Row,MagpieData mean Row,MagpieData avg_dev Row,MagpieData mode Row,MagpieData minimum CovalentRadius,MagpieData maximum CovalentRadius,MagpieData range CovalentRadius,MagpieData mean CovalentRadius,MagpieData avg_dev CovalentRadius,MagpieData mode CovalentRadius,MagpieData minimum Electronegativity,MagpieData maximum Electronegativity,MagpieData range Electronegativity,MagpieData mean Electronegativity,MagpieData avg_dev Electronegativity,MagpieData mode Electronegativity,MagpieData minimum NsValence,MagpieData maximum NsValence,MagpieData range NsValence,MagpieData mean NsValence,MagpieData avg_dev NsValence,MagpieData mode NsValence,MagpieData minimum NpValence,MagpieData maximum NpValence,MagpieData range NpValence,MagpieData mean NpValence,MagpieData avg_dev NpValence,MagpieData mode NpValence,MagpieData minimum NdValence,MagpieData maximum NdValence,MagpieData range NdValence,MagpieData mean NdValence,MagpieData avg_dev NdValence,MagpieData mode NdValence,MagpieData minimum NfValence,MagpieData maximum NfValence,MagpieData range NfValence,MagpieData mean NfValence,MagpieData avg_dev NfValence,MagpieData mode NfValence,MagpieData minimum NValence,MagpieData maximum NValence,MagpieData range NValence,MagpieData mean NValence,MagpieData avg_dev NValence,MagpieData mode NValence,MagpieData minimum NsUnfilled,MagpieData maximum NsUnfilled,MagpieData range NsUnfilled,MagpieData mean NsUnfilled,MagpieData avg_dev NsUnfilled,MagpieData mode NsUnfilled,MagpieData minimum NpUnfilled,MagpieData maximum NpUnfilled,MagpieData range NpUnfilled,MagpieData mean NpUnfilled,MagpieData avg_dev NpUnfilled,MagpieData mode NpUnfilled,MagpieData minimum NdUnfilled,MagpieData maximum NdUnfilled,MagpieData range NdUnfilled,MagpieData mean NdUnfilled,MagpieData avg_dev NdUnfilled,MagpieData mode NdUnfilled,MagpieData minimum NfUnfilled,MagpieData maximum NfUnfilled,MagpieData range NfUnfilled,MagpieData mean NfUnfilled,MagpieData avg_dev NfUnfilled,MagpieData mode NfUnfilled,MagpieData minimum NUnfilled,MagpieData maximum NUnfilled,MagpieData range NUnfilled,MagpieData mean NUnfilled,MagpieData avg_dev NUnfilled,MagpieData mode NUnfilled,MagpieData minimum GSvolume_pa,MagpieData maximum GSvolume_pa,MagpieData range GSvolume_pa,MagpieData mean GSvolume_pa,MagpieData avg_dev GSvolume_pa,MagpieData mode GSvolume_pa,MagpieData minimum GSbandgap,MagpieData maximum GSbandgap,MagpieData range GSbandgap,MagpieData mean GSbandgap,MagpieData avg_dev GSbandgap,MagpieData mode GSbandgap,MagpieData minimum GSmagmom,MagpieData maximum GSmagmom,MagpieData range GSmagmom,MagpieData mean GSmagmom,MagpieData avg_dev GSmagmom,MagpieData mode GSmagmom,MagpieData minimum SpaceGroupNumber,MagpieData maximum SpaceGroupNumber,MagpieData range SpaceGroupNumber,MagpieData mean SpaceGroupNumber,MagpieData avg_dev SpaceGroupNumber,MagpieData mode SpaceGroupNumber
0,"[[0.94814328 2.07280467 2.5112 ] Nb, [5.273...",Nb4CoSi,194.268884,"(Nb, Co, Si)",14.0,41.0,27.0,34.166667,9.111111,41.0,47.0,78.0,31.0,54.0,9.333333,47.0,28.0855,92.90638,64.82088,76.440703,21.954237,92.90638,1687.0,2750.0,1063.0,2409.166667,454.444444,2750.0,5.0,14.0,9.0,7.166667,2.888889,5.0,3.0,5.0,2.0,4.5,0.666667,5.0,111.0,164.0,53.0,148.833333,20.222222,164.0,1.6,1.9,0.3,1.696667,0.128889,1.6,1.0,2.0,1.0,1.333333,0.444444,1.0,0.0,2.0,2.0,0.333333,0.555556,0.0,0.0,7.0,7.0,3.833333,1.277778,4.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,9.0,5.0,5.5,1.166667,5.0,0.0,1.0,1.0,0.666667,0.444444,1.0,0.0,4.0,4.0,0.666667,1.111111,0.0,0.0,6.0,6.0,4.5,2.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,7.0,4.0,5.833333,1.555556,7.0,10.245,20.44,10.195,17.234167,2.329722,18.18,0.0,0.773,0.773,0.128833,0.214722,0.0,0.0,1.548471,1.548471,0.258079,0.430131,0.0,194.0,229.0,35.0,222.833333,9.611111,229.0
1,"[[0. 0. 0.] Al, [1.96639263 1.13529553 0.75278...",Al(CoSi)2,175.449907,"(Al, Co, Si)",13.0,27.0,14.0,19.0,6.4,14.0,58.0,78.0,20.0,69.0,8.8,58.0,26.981539,58.933195,31.951656,40.203786,14.983527,28.0855,933.47,1768.0,834.53,1568.694,254.0896,1687.0,9.0,14.0,5.0,11.8,2.24,9.0,3.0,4.0,1.0,3.4,0.48,3.0,111.0,126.0,15.0,119.0,6.4,111.0,1.61,1.9,0.29,1.834,0.0896,1.88,2.0,2.0,0.0,2.0,0.0,2.0,0.0,2.0,2.0,1.0,0.8,0.0,0.0,7.0,7.0,2.8,3.36,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,9.0,6.0,5.8,2.56,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,5.0,2.6,2.08,0.0,0.0,3.0,3.0,1.2,1.44,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,5.0,2.0,3.8,0.64,3.0,10.245,20.44,10.195,15.57,4.26,10.245,0.0,0.773,0.773,0.3092,0.37104,0.0,0.0,1.548471,1.548471,0.619388,0.743266,0.0,194.0,227.0,33.0,213.4,15.52,194.0
2,"[[1.480346 1.480346 1.480346] Si, [0. 0. 0.] Os]",SiOs,295.077545,"(Si, Os)",14.0,76.0,62.0,45.0,31.0,14.0,57.0,78.0,21.0,67.5,10.5,57.0,28.0855,190.23,162.1445,109.15775,81.07225,28.0855,1687.0,3306.0,1619.0,2496.5,809.5,1687.0,8.0,14.0,6.0,11.0,3.0,8.0,3.0,6.0,3.0,4.5,1.5,3.0,111.0,144.0,33.0,127.5,16.5,111.0,1.9,2.2,0.3,2.05,0.15,1.9,2.0,2.0,0.0,2.0,0.0,2.0,0.0,2.0,2.0,1.0,1.0,0.0,0.0,6.0,6.0,3.0,3.0,0.0,0.0,14.0,14.0,7.0,7.0,0.0,4.0,22.0,18.0,13.0,9.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,4.0,2.0,2.0,0.0,0.0,4.0,4.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,4.0,0.0,4.0,0.0,4.0,14.09,20.44,6.35,17.265,3.175,14.09,0.0,0.773,0.773,0.3865,0.3865,0.0,0.0,0.0,0.0,0.0,0.0,0.0,194.0,227.0,33.0,210.5,16.5,194.0
3,"[[0. 1.09045794 0.84078375] Ga, [0. ...",Ga,49.13067,(Ga),31.0,31.0,0.0,31.0,0.0,31.0,74.0,74.0,0.0,74.0,0.0,74.0,69.723,69.723,0.0,69.723,0.0,69.723,302.91,302.91,0.0,302.91,0.0,302.91,13.0,13.0,0.0,13.0,0.0,13.0,4.0,4.0,0.0,4.0,0.0,4.0,122.0,122.0,0.0,122.0,0.0,122.0,1.81,1.81,0.0,1.81,0.0,1.81,2.0,2.0,0.0,2.0,0.0,2.0,1.0,1.0,0.0,1.0,0.0,1.0,10.0,10.0,0.0,10.0,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,13.0,13.0,0.0,13.0,0.0,13.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,5.0,0.0,5.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,5.0,0.0,5.0,0.0,5.0,18.8575,18.8575,0.0,18.8575,0.0,18.8575,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,64.0,64.0,0.0,64.0,0.0,64.0
4,"[[1.0094265 4.24771709 2.9955487 ] Si, [3.028...",SiRu2,256.768081,"(Si, Ru)",14.0,44.0,30.0,34.0,13.333333,44.0,56.0,78.0,22.0,63.333333,9.777778,56.0,28.0855,101.07,72.9845,76.741833,32.437556,101.07,1687.0,2607.0,920.0,2300.333333,408.888889,2607.0,8.0,14.0,6.0,10.0,2.666667,8.0,3.0,5.0,2.0,4.333333,0.888889,5.0,111.0,146.0,35.0,134.333333,15.555556,146.0,1.9,2.2,0.3,2.1,0.133333,2.2,1.0,2.0,1.0,1.333333,0.444444,1.0,0.0,2.0,2.0,0.666667,0.888889,0.0,0.0,7.0,7.0,4.666667,3.111111,7.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,8.0,4.0,6.666667,1.777778,8.0,0.0,1.0,1.0,0.666667,0.444444,1.0,0.0,4.0,4.0,1.333333,1.777778,0.0,0.0,3.0,3.0,2.0,1.333333,3.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,4.0,0.0,4.0,0.0,4.0,13.51,20.44,6.93,15.82,3.08,13.51,0.0,0.773,0.773,0.257667,0.343556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,194.0,227.0,33.0,205.0,14.666667,194.0
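
To see where these columns come from, the sketch below recomputes the six `Number` (atomic number) statistics for row 1, Al(CoSi)2, by hand using only the standard library. It assumes each statistic is weighted by atomic fraction and that ties for the mode resolve to the smallest value; both assumptions are inferred from the table values above rather than taken from matminer's source.

```python
# Element counts in one formula unit of Al(CoSi)2, and atomic numbers.
counts = {"Al": 1, "Co": 2, "Si": 2}
atomic_number = {"Al": 13, "Co": 27, "Si": 14}

total = sum(counts.values())

minimum = min(atomic_number[el] for el in counts)
maximum = max(atomic_number[el] for el in counts)
value_range = maximum - minimum

# Mean weighted by atomic fraction (count / total).
mean = sum(n * atomic_number[el] for el, n in counts.items()) / total

# Average absolute deviation from the weighted mean, same weights.
avg_dev = sum(n * abs(atomic_number[el] - mean) for el, n in counts.items()) / total

# Mode: value of the most abundant element. Co and Si tie here, so the
# smaller value is taken (assumed tie rule, consistent with the table).
top = max(counts.values())
mode = min(atomic_number[el] for el, n in counts.items() if n == top)

print(minimum, maximum, value_range, mean, avg_dev, mode)
```

The results agree with the first six Magpie columns of row 1 in the table (13.0, 27.0, 14.0, 19.0, 6.4, 14.0). The other properties (atomic weight, melting temperature, and so on) follow the same six-statistic pattern with different per-element values.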


## Add structure features

As we did in the lesson, we'll now use `DensityFeatures` to generate some features using the crystal structures as input.

In [7]:
from matminer.featurizers.structure import DensityFeatures

de = DensityFeatures()

df = de.featurize_dataframe(df, "structure")

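
As a rough illustration of the kind of quantity a density featurizer derives from a structure, the sketch below estimates the mass density and volume per atom of SiOs (row 2) by hand. It is not matminer's implementation. The lattice is not shown in the dataframe preview, so the cubic cell with edge `a = 2 * 1.480346` Å is an assumption inferred from the Si site sitting at the cell center; the atomic weights are the minimum and maximum `AtomicWeight` values listed for that row.

```python
# 1 amu per cubic angstrom expressed in g/cm^3.
AMU_PER_A3_TO_G_PER_CM3 = 1.66053907

# ASSUMPTION: cubic cell whose edge is twice the Si fractional offset
# seen in the structure preview; the real lattice is not shown.
a = 2 * 1.480346  # angstrom

atomic_weight = {"Si": 28.0855, "Os": 190.23}  # amu, from the table above
sites = ["Si", "Os"]  # one atom of each per cell

volume = a ** 3  # cell volume in cubic angstrom
mass = sum(atomic_weight[el] for el in sites)  # cell mass in amu

density = mass / volume * AMU_PER_A3_TO_G_PER_CM3  # g/cm^3
vpa = volume / len(sites)  # volume per atom, cubic angstrom

print(f"density = {density:.2f} g/cm^3, volume per atom = {vpa:.2f} A^3")
```

If the cell assumption holds, these values should be close to the corresponding new columns for that row, which can be inspected with `df.head()`.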




Great! We've generated our features. On to the next section.