<a href="https://colab.research.google.com/github/peterbmob/DHMVADoE/blob/main/Project_description_e.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project in the course Design of Experiments, Data Handling and Statistical Analysis for Material Scientists

matminer is a Python library for data mining the properties of materials.

Matminer provides loaders for 40+ datasets (described [here](https://hackingmaterials.lbl.gov/matminer/dataset_summary.html)).

The datasets can be loaded with the following convenience functions:

- load_boltztrap_mp()
- load_brgoch_superhard_training()
- load_castelli_perovskites()
- load_citrine_thermal_conductivity()
- load_dielectric_constant()
- load_double_perovskites_gap()
- load_double_perovskites_gap_lumo()
- load_elastic_tensor()
- load_expt_formation_enthalpy()
- load_expt_gap()
- load_flla()
- load_glass_binary()
- load_glass_ternary_hipt()
- load_glass_ternary_landolt()
- load_heusler_magnetic()
- load_jarvis_dft_2d()
- load_jarvis_dft_3d()
- load_jarvis_ml_dft_training()
- load_m2ax()
- load_mp()
- load_phonon_dielectric_mp()
- load_piezoelectric_tensor()
- load_steel_strength()
- load_wolverton_oxides()

To load the data set:



```
from matminer.datasets.convenience_loaders import load_elastic_tensor
df = load_elastic_tensor()  # loads dataset in a pandas DataFrame object

```

Inspect the first rows to see the available columns:



```
df.head()
```

Drop unwanted columns:


```
unwanted_columns = ["volume", "nsites", "compliance_tensor", "elastic_tensor",
                    "elastic_tensor_original", "K_Voigt", "G_Voigt", "K_Reuss", "G_Reuss"]
df = df.drop(unwanted_columns, axis=1)
```

Matminer provides its own featurizers, described [here](https://hackingmaterials.lbl.gov/matminer/featurizer_summary.html). More features can be obtained using the [CBFV package](https://github.com/Kaaiian/CBFV).
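To make the idea of a featurizer concrete, the following hand-rolled sketch turns a formula string into one numeric feature, the composition-weighted mean atomic number. It does not use matminer's API: `parse_formula`, `mean_atomic_number`, and the small `ATOMIC_NUMBER` table are hypothetical names introduced only for illustration, and the parser assumes simple formulas without parentheses.

```python
import re

# Atomic numbers for a handful of elements, just enough for this example.
ATOMIC_NUMBER = {"Rb": 37, "Te": 52, "Cd": 48, "Cl": 17, "Mn": 25, "F": 9}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'Rb2Te'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def mean_atomic_number(formula):
    """Composition-weighted mean atomic number (hypothetical feature)."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return sum(ATOMIC_NUMBER[el] * n for el, n in counts.items()) / total


for formula in ["Rb2Te", "CdCl2", "MnF2"]:
    print(formula, round(mean_atomic_number(formula), 2))
```

A real featurizer computes many such descriptors at once and handles general formulas; this sketch only shows the principle of mapping a composition to numbers that a model can use.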

## Task:
1. Choose a suitable material property.
2. Investigate which features are important for describing it.
3. Build a model with as few variables as possible that describes the property you selected.
4. Predict/optimize the property at hand. Here you must have an "imaginary" application for the property you have chosen: try to come up with a case where you want a certain value of the property.
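Steps 2 and 3 can be approached in many ways. One minimal sketch, using synthetic data rather than a matminer dataset, is greedy forward selection with ordinary least squares: add the feature that improves R² the most, and stop when the gain falls below a threshold. The function names, the `min_gain` threshold, and the synthetic features are all assumptions made for illustration.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - residual.var() / y.var()


def forward_select(X, y, min_gain=0.01):
    """Greedily add the column that raises R^2 most; stop on small gains."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2


# Synthetic example: 5 candidate features, but y depends only on columns 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

selected, r2 = forward_select(X, y)
print("selected columns:", selected, "R^2:", round(r2, 3))
```

The R² here is computed on the same data used for fitting. For the project, a held-out test set or cross-validation gives a more honest estimate of how well the reduced model predicts.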


In the report, I want to see how you set up the workflow (from data acquisition to sharing the results) for solving the problem. Try to write an introduction from a materials science point of view.





In [1]:
!pip install matminer
!pip install ydata-profiling

Collecting matminer
  Downloading matminer-0.9.0-py3-none-any.whl (1.4 MB)
Collecting pymongo (from matminer)
  Downloading pymongo-4.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (671 kB)
Collecting monty (from matminer)
  Downloading monty-2023.9.25-py3-none-any.whl (63 kB)

In [2]:
from matminer.datasets import get_available_datasets

get_available_datasets()

boltztrap_mp: Effective mass and thermoelectric properties of 8924 compounds in The  Materials Project database that are calculated by the BoltzTraP software package run on the GGA-PBE or GGA+U density functional theory calculation results. The properties are reported at the temperature of 300 Kelvin and the carrier concentration of 1e18 1/cm3.

brgoch_superhard_training: 2574 materials used for training regressors that predict shear and bulk modulus.

castelli_perovskites: 18,928 perovskites generated with ABX combinatorics, calculating gllbsc band gap and pbe structure, and also reporting absolute band edge positions and heat of formation.

citrine_thermal_conductivity: Thermal conductivity of 872 compounds measured experimentally and retrieved from Citrine database from various references. The reported values are measured at various temperatures of which 295 are at room temperature.

dielectric_constant: 1,056 structures with dielectric properties, calculated with DFPT-PBE.

double_

['boltztrap_mp',
 'brgoch_superhard_training',
 'castelli_perovskites',
 'citrine_thermal_conductivity',
 'dielectric_constant',
 'double_perovskites_gap',
 'double_perovskites_gap_lumo',
 'elastic_tensor_2015',
 'expt_formation_enthalpy',
 'expt_formation_enthalpy_kingsbury',
 'expt_gap',
 'expt_gap_kingsbury',
 'flla',
 'glass_binary',
 'glass_binary_v2',
 'glass_ternary_hipt',
 'glass_ternary_landolt',
 'heusler_magnetic',
 'jarvis_dft_2d',
 'jarvis_dft_3d',
 'jarvis_ml_dft_training',
 'm2ax',
 'matbench_dielectric',
 'matbench_expt_gap',
 'matbench_expt_is_metal',
 'matbench_glass',
 'matbench_jdft2d',
 'matbench_log_gvrh',
 'matbench_log_kvrh',
 'matbench_mp_e_form',
 'matbench_mp_gap',
 'matbench_mp_is_metal',
 'matbench_perovskites',
 'matbench_phonons',
 'matbench_steels',
 'mp_all_20181018',
 'mp_nostruct_20181018',
 'phonon_dielectric_mp',
 'piezoelectric_tensor',
 'ricci_boltztrap_mp_tabular',
 'steel_strength',
 'superconductivity2018',
 'tholander_nitrides',
 'ucsb_thermoe

In [4]:
from matminer.datasets import get_all_dataset_info

print(get_all_dataset_info("dielectric_constant"))

Dataset: dielectric_constant
Description: 1,056 structures with dielectric properties, calculated with DFPT-PBE.
Columns:
	band_gap: Measure of the conductivity of a material
	cif: optional: Description string for structure
	e_electronic: electronic contribution to dielectric tensor
	e_total: Total dielectric tensor incorporating both electronic and ionic contributions
	formula: Chemical formula of the material
	material_id: Materials Project ID of the material
	meta: optional, metadata descriptor of the datapoint
	n: Refractive Index
	nsites: The # of atoms in the unit cell of the calculation.
	poly_electronic: the average of the eigenvalues of the electronic contribution to the dielectric tensor
	poly_total: the average of the eigenvalues of the total (electronic and ionic) contributions to the dielectric tensor
	poscar: optional: Poscar metadata
	pot_ferroelectric: Whether the material is potentially ferroelectric
	space_group: Integer specifying the crystallographic structure of t
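The column description for `poly_electronic` (the average of the eigenvalues of the electronic dielectric tensor) can be made concrete with a short sketch. Assuming a symmetric 3×3 tensor (the values below are made up, not taken from the dataset), the average of the eigenvalues equals one third of the trace:

```python
import numpy as np

# Hypothetical symmetric dielectric tensor; not a row from the dataset.
e_electronic = np.array([
    [3.4, 0.1, 0.0],
    [0.1, 3.6, 0.0],
    [0.0, 0.0, 3.5],
])

eigenvalues = np.linalg.eigvalsh(e_electronic)  # eigvalsh assumes a symmetric matrix
poly = eigenvalues.mean()

print("eigenvalues:", np.round(eigenvalues, 3))
print("average of eigenvalues:", round(poly, 3))
print("trace / 3:", round(np.trace(e_electronic) / 3, 3))
```

Because the sum of the eigenvalues equals the trace, the off-diagonal terms change the individual eigenvalues but not their average.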

In [3]:
from matminer.datasets import load_dataset

df = load_dataset("dielectric_constant")

Fetching dielectric_constant.json.gz from https://ndownloader.figshare.com/files/13213475 to /usr/local/lib/python3.10/dist-packages/matminer/datasets/dielectric_constant.json.gz


Fetching https://ndownloader.figshare.com/files/13213475 in MB: 0.8867839999999999MB [00:00, 150.32MB/s]      


In [5]:
df.head()

Unnamed: 0,material_id,formula,nsites,space_group,volume,structure,band_gap,e_electronic,e_total,n,poly_electronic,poly_total,pot_ferroelectric,cif,meta,poscar
0,mp-441,Rb2Te,3,225,159.501208,"[[1.75725875 1.2425695 3.04366125] Rb, [5.271...",1.88,"[[3.44115795, -3.097e-05, -6.276e-05], [-2.837...","[[6.23414745, -0.00035252, -9.796e-05], [-0.00...",1.86,3.44,6.23,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Rb2 Te1\n1.0\n5.271776 0.000000 3.043661\n1.75...
1,mp-22881,CdCl2,3,166,84.298097,"[[0. 0. 0.] Cd, [ 4.27210959 2.64061969 13.13...",3.52,"[[3.34688382, -0.04498543, -0.22379197], [-0.0...","[[7.97018673, -0.29423886, -1.463590159999999]...",1.78,3.16,6.73,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Cd1 Cl2\n1.0\n3.850977 0.072671 5.494462\n1.78...
2,mp-28013,MnI2,3,164,108.335875,"[[0. 0. 0.] Mn, [-2.07904300e-06 2.40067320e+...",1.17,"[[5.5430849, -5.28e-06, -2.5030000000000003e-0...","[[13.80606079, 0.0006911900000000001, 9.655e-0...",2.23,4.97,10.64,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Mn1 I2\n1.0\n4.158086 0.000000 0.000000\n-2.07...
3,mp-567290,LaN,4,186,88.162562,[[-1.73309900e-06 2.38611186e+00 5.95256328e...,1.12,"[[7.09316738, 7.99e-06, -0.0003864700000000000...","[[16.79535386, 8.199999999999997e-07, -0.00948...",2.65,7.04,17.99,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,La2 N2\n1.0\n4.132865 0.000000 0.000000\n-2.06...
4,mp-560902,MnF2,6,136,82.826401,"[[1.677294 2.484476 2.484476] Mn, [0. 0. 0.] M...",2.87,"[[2.4239622, 7.452000000000001e-05, 6.06100000...","[[6.44055613, 0.0020446600000000002, 0.0013203...",1.53,2.35,7.12,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLDAUTYPE ...,Mn2 F4\n1.0\n3.354588 0.000000 0.000000\n0.000...


In [6]:
df.columns

Index(['material_id', 'formula', 'nsites', 'space_group', 'volume',
       'structure', 'band_gap', 'e_electronic', 'e_total', 'n',
       'poly_electronic', 'poly_total', 'pot_ferroelectric', 'cif', 'meta',
       'poscar'],
      dtype='object')

In [7]:
df.describe()

| | nsites | space_group | volume | band_gap | n | poly_electronic | poly_total |
|---|---|---|---|---|---|---|---|
| count | 1056.0 | 1056.0 | 1056.0 | 1056.0 | 1056.0 | 1056.0 | 1056.0 |
| mean | 7.530303 | 142.970644 | 166.420376 | 2.119432 | 2.434886 | 7.248049 | 14.777898 |
| std | 3.388443 | 67.264591 | 97.425084 | 1.604924 | 1.148849 | 13.054947 | 19.435303 |
| min | 2.0 | 1.0 | 13.980548 | 0.11 | 1.28 | 1.63 | 2.08 |
| 25% | 5.0 | 82.0 | 96.262337 | 0.89 | 1.77 | 3.13 | 7.5575 |
| 50% | 8.0 | 163.0 | 145.944691 | 1.73 | 2.19 | 4.79 | 10.54 |
| 75% | 9.0 | 194.0 | 212.106405 | 2.885 | 2.73 | 7.44 | 15.4825 |
| max | 20.0 | 229.0 | 597.341134 | 8.32 | 16.03 | 256.84 | 277.78 |
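
The summary suggests heavy right tails: for `poly_electronic`, the maximum (256.84) is far above the 75th percentile (7.44). One common rule of thumb, used here as an illustration rather than a prescribed method, flags values above Q3 + 1.5 × IQR. With the quartiles from the `df.describe()` output:

```python
# Quartiles and maximum of poly_electronic, copied from the df.describe() output.
q1, q3, maximum = 3.13, 7.44, 256.84

iqr = q3 - q1                   # interquartile range
upper_fence = q3 + 1.5 * iqr    # Tukey's upper fence

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.3f}")
print("maximum flagged as outlier:", maximum > upper_fence)
```

On the full DataFrame, the same quartiles can be obtained with `df["poly_electronic"].quantile([0.25, 0.75])`. Whether flagged rows should be dropped, transformed, or kept depends on the application you have in mind.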


In [8]:
df["band_gap"]

0       1.88
1       3.52
2       1.17
3       1.12
4       2.87
        ... 
1051    0.87
1052    3.60
1053    0.14
1054    0.21
1055    0.26
Name: band_gap, Length: 1056, dtype: float64

## Filtering data

In [9]:
mask = df["volume"] >= 580
df[mask]

Unnamed: 0,material_id,formula,nsites,space_group,volume,structure,band_gap,e_electronic,e_total,n,poly_electronic,poly_total,pot_ferroelectric,cif,meta,poscar
206,mp-23280,AsCl3,16,19,582.085309,"[[0.13113333 7.14863883 9.63476955] As, [2.457...",3.99,"[[2.2839161900000002, 0.00014519, -2.238000000...","[[2.49739759, 0.00069379, 0.00075864], [0.0004...",1.57,2.47,3.3,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,As4 Cl12\n1.0\n4.652758 0.000000 0.000000\n0.0...
216,mp-9064,RbTe,12,189,590.136085,"[[6.61780282 0. 0. ] Rb, [1.750...",0.43,"[[3.25648277, 5.9650000000000007e-05, 1.57e-06...","[[5.34517928, 0.00022474000000000002, -0.00018...",2.05,4.2,6.77,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Rb6 Te6\n1.0\n10.118717 0.000000 0.000000\n-5....
219,mp-23230,PCl3,16,62,590.637274,"[[6.02561815 8.74038483 7.55586375] P, [2.7640...",4.03,"[[2.39067769, 0.00017593, 8.931000000000001e-0...","[[2.80467218, 0.00034093000000000003, 0.000692...",1.52,2.31,2.76,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,P4 Cl12\n1.0\n6.523152 0.000000 0.000000\n0.00...
251,mp-2160,Sb2Se3,20,62,597.341134,"[[3.02245275 0.42059268 1.7670481 ] Sb, [ 1.00...",0.76,"[[19.1521058, 5.5e-06, 0.00025268], [-1.078000...","[[81.93819038000001, 0.0006755800000000001, 0....",3.97,15.76,63.53,True,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Sb8 Se12\n1.0\n4.029937 0.000000 0.000000\n0.0...
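
Several conditions can be combined in one mask with `&` (and) or `|` (or); each condition needs its own parentheses. The sketch below uses a small hand-built DataFrame with a few `formula`, `volume`, and `band_gap` values copied from the outputs above, so it runs without downloading the dataset. The band gap threshold of 3 is an arbitrary choice for illustration.

```python
import pandas as pd

# A few rows copied from the notebook outputs; the real df has 1056 rows.
toy = pd.DataFrame({
    "formula": ["Rb2Te", "AsCl3", "RbTe", "PCl3", "Sb2Se3"],
    "volume": [159.501208, 582.085309, 590.136085, 590.637274, 597.341134],
    "band_gap": [1.88, 3.99, 0.43, 4.03, 0.76],
})

# Large cells AND a wide band gap: parentheses around each condition are required.
mask = (toy["volume"] >= 580) & (toy["band_gap"] > 3)
print(toy[mask])

# The same filter written with DataFrame.query.
print(toy.query("volume >= 580 and band_gap > 3"))
```

Both forms return the same rows; `query` can be easier to read when many conditions are involved.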


In [10]:
mask = df["band_gap"] > 0
nonmetal_df = df[mask]
nonmetal_df

Unnamed: 0,material_id,formula,nsites,space_group,volume,structure,band_gap,e_electronic,e_total,n,poly_electronic,poly_total,pot_ferroelectric,cif,meta,poscar
0,mp-441,Rb2Te,3,225,159.501208,"[[1.75725875 1.2425695 3.04366125] Rb, [5.271...",1.88,"[[3.44115795, -3.097e-05, -6.276e-05], [-2.837...","[[6.23414745, -0.00035252, -9.796e-05], [-0.00...",1.86,3.44,6.23,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Rb2 Te1\n1.0\n5.271776 0.000000 3.043661\n1.75...
1,mp-22881,CdCl2,3,166,84.298097,"[[0. 0. 0.] Cd, [ 4.27210959 2.64061969 13.13...",3.52,"[[3.34688382, -0.04498543, -0.22379197], [-0.0...","[[7.97018673, -0.29423886, -1.463590159999999]...",1.78,3.16,6.73,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Cd1 Cl2\n1.0\n3.850977 0.072671 5.494462\n1.78...
2,mp-28013,MnI2,3,164,108.335875,"[[0. 0. 0.] Mn, [-2.07904300e-06 2.40067320e+...",1.17,"[[5.5430849, -5.28e-06, -2.5030000000000003e-0...","[[13.80606079, 0.0006911900000000001, 9.655e-0...",2.23,4.97,10.64,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Mn1 I2\n1.0\n4.158086 0.000000 0.000000\n-2.07...
3,mp-567290,LaN,4,186,88.162562,[[-1.73309900e-06 2.38611186e+00 5.95256328e...,1.12,"[[7.09316738, 7.99e-06, -0.0003864700000000000...","[[16.79535386, 8.199999999999997e-07, -0.00948...",2.65,7.04,17.99,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,La2 N2\n1.0\n4.132865 0.000000 0.000000\n-2.06...
4,mp-560902,MnF2,6,136,82.826401,"[[1.677294 2.484476 2.484476] Mn, [0. 0. 0.] M...",2.87,"[[2.4239622, 7.452000000000001e-05, 6.06100000...","[[6.44055613, 0.0020446600000000002, 0.0013203...",1.53,2.35,7.12,False,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLDAUTYPE ...,Mn2 F4\n1.0\n3.354588 0.000000 0.000000\n0.000...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1051,mp-568032,Cd(InSe2)2,7,111,212.493121,"[[0. 0. 0.] Cd, [2.9560375 0. 3.03973 ...",0.87,"[[7.74896783, 0.0, 0.0], [0.0, 7.74896783, 0.0...","[[11.85159471, 1e-08, 0.0], [1e-08, 11.8515962...",2.77,7.67,11.76,True,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Cd1 In2 Se4\n1.0\n5.912075 0.000000 0.000000\n...
1052,mp-696944,LaHBr2,8,194,220.041363,"[[2.068917 3.58317965 3.70992025] La, [4.400...",3.60,"[[4.40504391, 6.1e-07, 0.0], [6.1e-07, 4.40501...","[[8.77136355, 1.649999999999999e-06, 0.0], [1....",2.00,3.99,7.08,True,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,La2 H2 Br4\n1.0\n4.137833 0.000000 0.000000\n-...
1053,mp-16238,Li2AgSb,4,216,73.882306,"[[1.35965225 0.96141925 2.354987 ] Li, [2.719...",0.14,"[[212.60750153, -1.843e-05, 0.0], [-1.843e-05,...","[[232.59707383, -0.0005407400000000001, 0.0025...",14.58,212.61,232.60,True,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Li2 Ag1 Sb1\n1.0\n4.078957 0.000000 2.354987\n...
1054,mp-4405,Rb3AuO,5,221,177.269065,"[[0. 2.808758 2.808758] Rb, [2.808758 2....",0.21,"[[6.40511712, 0.0, 0.0], [0.0, 6.40511712, 0.0...","[[22.43799785, 0.0, 0.0], [0.0, 22.4380185, 0....",2.53,6.41,22.44,True,#\#CIF1.1\n###################################...,{u'incar': u'NELM = 100\nIBRION = 8\nLWAVE = F...,Rb3 Au1 O1\n1.0\n5.617516 0.000000 0.000000\n0...
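
Judging from the `df.describe()` output, the smallest `band_gap` in this dataset is 0.11, so the mask `band_gap > 0` keeps all 1056 rows. It is worth counting how many rows a mask selects before relying on the filtered frame. A minimal sketch on a hand-built frame (values copied from the outputs above; the 0.5 threshold and the `gap_rank` column are arbitrary choices for illustration):

```python
import pandas as pd

# Hand-built frame with band gaps copied from the outputs above.
toy = pd.DataFrame({
    "formula": ["Rb2Te", "CdCl2", "Li2AgSb", "Rb3AuO"],
    "band_gap": [1.88, 3.52, 0.14, 0.21],
})

mask = toy["band_gap"] > 0
print("rows kept by band_gap > 0:", mask.sum(), "of", len(toy))

# A stricter, arbitrary threshold actually removes rows.
# .copy() gives an independent frame, so later edits do not touch `toy`.
wide_gap = toy[toy["band_gap"] > 0.5].copy()
wide_gap["gap_rank"] = wide_gap["band_gap"].rank(ascending=False)
print(wide_gap)
```

Taking a `.copy()` of the filtered frame before adding columns avoids pandas' `SettingWithCopyWarning` and keeps the original DataFrame unchanged.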
