<a href="https://colab.research.google.com/github/peterbmob/DHMVADoE/blob/main/Project_description_e.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project in the course Design of Experiments, Data Handling and Statistical Analysis for Material Scientists

matminer is a Python library for data mining the properties of materials.

Matminer provides loaders for more than 40 datasets (described [here](https://hackingmaterials.lbl.gov/matminer/dataset_summary.html)).

Many of these datasets can be loaded with the following convenience functions from `matminer.datasets.convenience_loaders`:

- load_boltztrap_mp()
- load_brgoch_superhard_training()
- load_castelli_perovskites()
- load_citrine_thermal_conductivity()
- load_dielectric_constant()
- load_double_perovskites_gap()
- load_double_perovskites_gap_lumo()
- load_elastic_tensor()
- load_expt_formation_enthalpy()
- load_expt_gap()
- load_flla()
- load_glass_binary()
- load_glass_ternary_hipt()
- load_glass_ternary_landolt()
- load_heusler_magnetic()
- load_jarvis_dft_2d()
- load_jarvis_dft_3d()
- load_jarvis_ml_dft_training()
- load_m2ax()
- load_mp()
- load_phonon_dielectric_mp()
- load_piezoelectric_tensor()
- load_steel_strength()
- load_wolverton_oxides()

To load a dataset, here the elastic tensor data:



```
from matminer.datasets.convenience_loaders import load_elastic_tensor
df = load_elastic_tensor()  # loads dataset in a pandas DataFrame object

```

Inspect the first rows and the available columns:



```
df.head()
```

Drop unwanted columns:


```
unwanted_columns = ["volume", "nsites", "compliance_tensor", "elastic_tensor",
                    "elastic_tensor_original", "K_Voigt", "G_Voigt", "K_Reuss", "G_Reuss"]
df = df.drop(unwanted_columns, axis=1)
```

Matminer has its own featurizers, described [here](https://hackingmaterials.lbl.gov/matminer/featurizer_summary.html). Additional composition-based features can be obtained using the [CBFV package](https://github.com/Kaaiian/CBFV).

## Task:
1. Choose a suitable material property.
2. Investigate what features are important to describe it.
3. Build a model with as few variables as possible that describe the property you selected.
4. Predict or optimize the property. For this you need an "imaginary" application for the property you have chosen: try to come up with a case where you want the property to take a certain value.


In the report, I want to see how you set up the workflow (from data acquisition to sharing the results) for solving the problem. Try to write the introduction from a materials science point of view.
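
One possible way to approach steps 2 and 3 of the task is greedy forward selection: start with no features and repeatedly add the one that most reduces the fitting error of a linear model. The sketch below is only an illustration under stated assumptions. It uses synthetic data (not a materials dataset), the feature names `f0` to `f4` and the helper `forward_select` are hypothetical, and the error is measured on the training data for brevity.

```python
import numpy as np

def forward_select(X, y, names, max_features=3):
    """Greedy forward selection with ordinary least squares.

    At each step, add the feature that gives the lowest residual sum of
    squares (RSS) when combined with the features already selected.
    Returns a list of (feature name, RSS after adding it).
    """
    selected = []
    remaining = list(range(X.shape[1]))
    history = []
    while remaining and len(selected) < max_features:
        best_j, best_rss = None, np.inf
        for j in remaining:
            cols = selected + [j]
            # Design matrix: intercept column plus the candidate feature set
            A = np.column_stack([np.ones(len(y)), X[:, cols]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
        history.append((names[best_j], best_rss))
    return history

# Synthetic stand-in data: the target depends only on f0 and f2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)
names = [f"f{i}" for i in range(5)]

history = forward_select(X, y, names, max_features=3)
for name, rss in history:
    print(f"added {name}: RSS = {rss:.2f}")
```

When the RSS stops dropping noticeably after a feature is added, that feature probably does not belong in a minimal model. For the actual project, the error should be estimated on held-out data (for example by cross-validation) rather than on the training set.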





In [1]:
!pip install matminer
!pip install ydata-profiling

Collecting matminer
  Downloading matminer-0.9.0-py3-none-any.whl (1.4 MB)
Collecting pymongo (from matminer)
  Downloading pymongo-4.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (671 kB)
Collecting monty (from matminer)
  Downloading monty-2023.9.25-py3-none-any.whl (63 kB)

In [2]:
from matminer.datasets import get_available_datasets

get_available_datasets()

boltztrap_mp: Effective mass and thermoelectric properties of 8924 compounds in The  Materials Project database that are calculated by the BoltzTraP software package run on the GGA-PBE or GGA+U density functional theory calculation results. The properties are reported at the temperature of 300 Kelvin and the carrier concentration of 1e18 1/cm3.

brgoch_superhard_training: 2574 materials used for training regressors that predict shear and bulk modulus.

castelli_perovskites: 18,928 perovskites generated with ABX combinatorics, calculating gllbsc band gap and pbe structure, and also reporting absolute band edge positions and heat of formation.

citrine_thermal_conductivity: Thermal conductivity of 872 compounds measured experimentally and retrieved from Citrine database from various references. The reported values are measured at various temperatures of which 295 are at room temperature.

dielectric_constant: 1,056 structures with dielectric properties, calculated with DFPT-PBE.

double_

['boltztrap_mp',
 'brgoch_superhard_training',
 'castelli_perovskites',
 'citrine_thermal_conductivity',
 'dielectric_constant',
 'double_perovskites_gap',
 'double_perovskites_gap_lumo',
 'elastic_tensor_2015',
 'expt_formation_enthalpy',
 'expt_formation_enthalpy_kingsbury',
 'expt_gap',
 'expt_gap_kingsbury',
 'flla',
 'glass_binary',
 'glass_binary_v2',
 'glass_ternary_hipt',
 'glass_ternary_landolt',
 'heusler_magnetic',
 'jarvis_dft_2d',
 'jarvis_dft_3d',
 'jarvis_ml_dft_training',
 'm2ax',
 'matbench_dielectric',
 'matbench_expt_gap',
 'matbench_expt_is_metal',
 'matbench_glass',
 'matbench_jdft2d',
 'matbench_log_gvrh',
 'matbench_log_kvrh',
 'matbench_mp_e_form',
 'matbench_mp_gap',
 'matbench_mp_is_metal',
 'matbench_perovskites',
 'matbench_phonons',
 'matbench_steels',
 'mp_all_20181018',
 'mp_nostruct_20181018',
 'phonon_dielectric_mp',
 'piezoelectric_tensor',
 'ricci_boltztrap_mp_tabular',
 'steel_strength',
 'superconductivity2018',
 'tholander_nitrides',
 'ucsb_thermoe

In [4]:
from matminer.datasets import get_all_dataset_info

print(get_all_dataset_info("expt_gap"))

Dataset: expt_gap
Description: Experimental band gap of 6354 inorganic semiconductors.
Columns:
	formula: chemical formula
	gap expt: band gap (in eV) measured experimentally
Num Entries: 6354
Reference: https://pubs.acs.org/doi/suppl/10.1021/acs.jpclett.8b00124
Bibtex citations: ['@article{doi:10.1021/acs.jpclett.8b00124,\nauthor = {Zhuo, Ya and Mansouri Tehrani, Aria and Brgoch, Jakoah},\ntitle = {Predicting the Band Gaps of Inorganic Solids by Machine Learning},\njournal = {The Journal of Physical Chemistry Letters},\nvolume = {9},\nnumber = {7},\npages = {1668-1673},\nyear = {2018},\ndoi = {10.1021/acs.jpclett.8b00124},\nnote ={PMID: 29532658},\neprint = {\nhttps://doi.org/10.1021/acs.jpclett.8b00124\n\n}}']
File type: json.gz
Figshare URL: https://ndownloader.figshare.com/files/13464434
SHA256 Hash Digest: 2d0980e3533c1ba6ad6e392a88f08cfcf2d311d4b7fe6eb0b0c8e876211dfda3




In [5]:
from matminer.datasets import load_dataset

df = load_dataset("expt_gap")

Fetching expt_gap.json.gz from https://ndownloader.figshare.com/files/13464434 to /usr/local/lib/python3.10/dist-packages/matminer/datasets/expt_gap.json.gz


Fetching https://ndownloader.figshare.com/files/13464434 in MB: 0.051199999999999996MB [00:00,  9.82MB/s]     


In [6]:
df.head()

Unnamed: 0,formula,gap expt
0,Hg0.7Cd0.3Te,0.35
1,CuBr,3.08
2,LuP,1.3
3,Cu3SbSe4,0.4
4,ZnO,3.44


In [7]:
df.columns

Index(['formula', 'gap expt'], dtype='object')

In [8]:
df.describe()

Unnamed: 0,gap expt
count,6354.0
mean,1.252225
std,1.539961
min,0.0
25%,0.0
50%,0.71
75%,2.13
max,11.7


In [14]:
from ydata_profiling import ProfileReport

In [15]:
profile = ProfileReport(df, title="Profiling Report")

In [16]:
profile.to_notebook_iframe()


## Filtering data

In [13]:
mask = df["gap expt"] > 0
nonmetal_df = df[mask]
nonmetal_df

Unnamed: 0,formula,gap expt
0,Hg0.7Cd0.3Te,0.35
1,CuBr,3.08
2,LuP,1.30
3,Cu3SbSe4,0.40
4,ZnO,3.44
...,...,...
3891,ZnTe,2.25
3892,ZnTe,2.29
3893,ZnSe,2.76
3894,ZnSnP2,1.66
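
The filtered output above lists ZnTe twice with slightly different gaps (2.25 and 2.29 eV), so the same formula can occur in several rows. A minimal sketch of one way to handle this is shown below on a toy frame built from values that appear in this notebook. Collapsing repeats to their median is an assumed choice; the mean or a single preferred measurement are equally defensible.

```python
import pandas as pd

# Toy rows copied from the outputs in this notebook (same column names as expt_gap)
toy = pd.DataFrame({
    "formula": ["ZnTe", "ZnTe", "ZnSe", "ZnSnP2", "Nb5Ga4"],
    "gap expt": [2.25, 2.29, 2.76, 1.66, 0.00],
})

# Same filter as in the cell above: keep entries with a non-zero gap
nonmetal = toy[toy["gap expt"] > 0]

# Count the rows whose formula occurs more than once
n_dupes = int(nonmetal["formula"].duplicated(keep=False).sum())

# Collapse repeated formulas to one row each, using the median gap
unique_gaps = nonmetal.groupby("formula", as_index=False)["gap expt"].median()

print(n_dupes)
print(unique_gaps)
```

Leaving duplicates in place can leak information between training and test sets when the data are later split, which is one reason to check for them before modelling.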


In [18]:
from pymatgen.core import Composition
from matminer.featurizers.composition.element import ElementFraction

ef = ElementFraction()

In [23]:
# Parse each formula string into a pymatgen Composition object
df['composition'] = [Composition(formula) for formula in df['formula']]

df

Unnamed: 0,formula,gap expt,composition
0,Hg0.7Cd0.3Te,0.35,"(Hg, Cd, Te)"
1,CuBr,3.08,"(Cu, Br)"
2,LuP,1.30,"(Lu, P)"
3,Cu3SbSe4,0.40,"(Cu, Sb, Se)"
4,ZnO,3.44,"(Zn, O)"
...,...,...,...
6349,Tm2MgTl,0.00,"(Tm, Mg, Tl)"
6350,Nb5Ga4,0.00,"(Nb, Ga)"
6351,Tb2Sb5,0.00,"(Tb, Sb)"
6352,Lu2AlTc,0.00,"(Lu, Al, Tc)"
