# Report: Predicting Bulk Modulus


## 1. Data View

In [None]:
from matminer.datasets.convenience_loaders import load_elastic_tensor
df = load_elastic_tensor()  # loads the dataset into a pandas DataFrame
unwanted_columns = ["volume", "nsites", "compliance_tensor", "elastic_tensor", 
                    "elastic_tensor_original", "K_Voigt", "G_Voigt", "K_Reuss", "G_Reuss"]
df = df.drop(unwanted_columns, axis=1)
df.describe()

**Calling `describe()` gives a quick statistical summary of the numeric columns currently in the table.**

## 2. Data Featurization

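**Featurization converts a material's abstract attributes (such as composition and crystal structure) into a set of numerical features that a model can understand and process.**  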

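**StrToComposition** converts each chemical formula string into a composition object, stored in a new column that the downstream featurizers can read.  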
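
To make the string-to-composition step concrete, here is a minimal sketch that parses a flat formula into element amounts and atomic fractions. It is a simplified stand-in, not matminer's implementation: the helpers `parse_formula` and `atomic_fractions` are hypothetical names, and parentheses, charges, and hydrates are not handled.

```python
import re


def parse_formula(formula):
    """Parse a flat formula such as 'Fe2O3' into {element: amount}.

    Simplified on purpose: no parentheses, charges, or hydrates.
    """
    amounts = {}
    # An element symbol is one capital letter plus an optional lowercase letter,
    # followed by an optional (possibly fractional) count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


def atomic_fractions(amounts):
    """Normalize element amounts so that they sum to 1."""
    total = sum(amounts.values())
    return {element: n / total for element, n in amounts.items()}


fe2o3 = parse_formula("Fe2O3")
fractions = atomic_fractions(fe2o3)
print(fe2o3, fractions)
```

Atomic fractions like these are what composition-based featurizers weight elemental properties by.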
  
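**ElementProperty (Magpie preset)** is a classic feature set in materials informatics. The full Magpie set contains about 145 descriptors built from basic elemental properties; matminer's `magpie` preset for `ElementProperty` produces 132 of them (22 elemental properties × 6 statistics).
   
>  atomic mass (atomic_mass) | electronegativity (X) | atomic radius (atom_radius) | melting point (melting_point) | number of valence electrons (valence_electrons) |  
> along with statistics of these properties over the compound (mean, variance, maximum, minimum, range, etc.)  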
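
The statistics mentioned above can be illustrated with a short sketch. It computes a few composition-weighted statistics of one elemental property (Pauling electronegativity) for Fe2O3. The function name `composition_stats` is hypothetical, and the statistic names follow the labels matminer uses for its Magpie preset (minimum, maximum, range, mean, avg_dev; the mode is omitted here). Treat it as an illustration of the idea rather than the library's code.

```python
def composition_stats(fractions, prop):
    """Statistics of an elemental property over a compound.

    fractions: {element: atomic fraction}, summing to 1
    prop:      {element: property value}
    """
    values = [prop[element] for element in fractions]
    # Fraction-weighted mean of the property.
    mean = sum(fractions[el] * prop[el] for el in fractions)
    # Fraction-weighted mean absolute deviation from that mean.
    avg_dev = sum(fractions[el] * abs(prop[el] - mean) for el in fractions)
    return {
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        "avg_dev": avg_dev,
    }


# Pauling electronegativities of Fe and O.
electronegativity = {"Fe": 1.83, "O": 3.44}
# Atomic fractions in Fe2O3: 2/5 Fe and 3/5 O.
fe2o3 = {"Fe": 0.4, "O": 0.6}

stats = composition_stats(fe2o3, electronegativity)
print(stats)
```

Repeating this for every elemental property and every statistic is what turns a single composition into a fixed-length feature vector.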

**CompositionToOxidComposition** infers the likely oxidation state of each element from the elemental composition of the formula.
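
One way to picture oxidation-state inference is as a charge-balance search: try each candidate oxidation state per element and keep the combinations whose total charge is zero. The sketch below does exactly that. It is an assumption-laden illustration, not what matminer does internally (matminer relies on pymatgen for this step); `guess_oxidation_states` and the `candidates` table are hypothetical, and mixed-valence compounds such as Fe3O4 are out of scope because each element gets a single state.

```python
from itertools import product


def guess_oxidation_states(amounts, candidates):
    """Return every charge-neutral assignment of one oxidation state per element.

    amounts:    {element: number of atoms in the formula}
    candidates: {element: list of oxidation states to try}
    """
    elements = list(amounts)
    solutions = []
    for combo in product(*(candidates[el] for el in elements)):
        total_charge = sum(amounts[el] * state for el, state in zip(elements, combo))
        if abs(total_charge) < 1e-8:  # charge neutral
            solutions.append(dict(zip(elements, combo)))
    return solutions


# Candidate states supplied by hand for this example.
candidates = {"Fe": [2, 3], "O": [-2]}
solutions = guess_oxidation_states({"Fe": 2, "O": 3}, candidates)
print(solutions)  # [{'Fe': 3, 'O': -2}]
```

For Fe2O3 only Fe(III) balances the six negative charges from oxygen, so a single solution survives.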

**OxidationStates** computes statistical features from the inferred oxidation states.

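**DensityFeatures** computes density-related geometric features (density, volume per atom, and packing fraction) from each material's crystal structure.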
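
As a rough illustration of what density-type features measure, the sketch below computes mass density and volume per atom from a unit-cell volume and the atomic masses of its sites. The function `density_features` is hypothetical, the NaCl lattice constant and atomic masses are approximate textbook values, and the packing fraction is left out because it also needs atomic radii.

```python
AVOGADRO = 6.02214076e23   # atoms per mole
ANGSTROM3_TO_CM3 = 1e-24   # 1 cubic angstrom in cubic centimetres


def density_features(cell_volume_a3, atomic_masses):
    """Density (g/cm^3) and volume per atom (A^3) of one unit cell.

    cell_volume_a3: unit-cell volume in cubic angstroms
    atomic_masses:  molar mass (g/mol) of every site in the cell
    """
    n_sites = len(atomic_masses)
    mass_g = sum(atomic_masses) / AVOGADRO
    density = mass_g / (cell_volume_a3 * ANGSTROM3_TO_CM3)
    volume_per_atom = cell_volume_a3 / n_sites
    return {"density": density, "vpa": volume_per_atom}


# Conventional cubic NaCl cell: a is roughly 5.64 angstroms, 4 Na + 4 Cl sites.
a = 5.64
masses = [22.99] * 4 + [35.45] * 4
features = density_features(a ** 3, masses)
print(features)  # density close to 2.16 g/cm^3
```

Both quantities depend only on the structure and its contents, which is why this featurizer takes the `structure` column rather than the composition.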


In [None]:
from matminer.featurizers.conversions import StrToComposition
df = StrToComposition().featurize_dataframe(df, "formula")
df.head()

In [None]:
from matminer.featurizers.composition import ElementProperty

ep_feat = ElementProperty.from_preset(preset_name="magpie")
df = ep_feat.featurize_dataframe(df, col_id="composition")  # input the "composition" column to the featurizer
df.head()

In [None]:
from matminer.featurizers.conversions import CompositionToOxidComposition
from matminer.featurizers.composition import OxidationStates

df = CompositionToOxidComposition().featurize_dataframe(df, "composition")

os_feat = OxidationStates()
df = os_feat.featurize_dataframe(df, "composition_oxid")
df.head()

In [None]:
from matminer.featurizers.structure import DensityFeatures

df_feat = DensityFeatures()
df = df_feat.featurize_dataframe(df, "structure")  # input the structure column to the featurizer
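print(df_feat.feature_labels())  # names of the columns added by DensityFeatures
df.head()  # last expression in the cell, so the table is displayed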

## 3. Machine Learning