## <font color=#FF0000>Main websites and sources for element descriptors</font>
### 1) Inorganic element descriptors
- https://github.com/hackingmaterials/matminer/tree/main/matminer/utils/data_files/magpie_elementdata
- https://mendeleev.readthedocs.io/en/stable/data.html

### 2) Descriptors for organic molecules
- https://www.rdkit.org/docs/GettingStartedInPython.html
- https://github.com/CompPhysVienna/MLSummerSchoolVienna2022/blob/main/Day10_July22/polymer_featurization.ipynb
- https://github.com/digital-synthesis-lab/conformers
- https://github.com/digital-chemistry-laboratory/morfeus
- http://www.scbdd.com/padel_desc/index/
- https://onlinelibrary.wiley.com/doi/full/10.1002/jcc.21707

### <font color=#FF0000>1. Element descriptors with matminer</font>
- https://github.com/hackingmaterials/matminer/blob/main/matminer/featurizers/composition/composite.py
- https://hackingmaterials.lbl.gov/matminer/featurizer_summary.html
- https://matsci.org/t/how-to-generate-average-bond-length-average-bond-angle-features-by-using-matmier/36678
- https://matsci.org/t/add-vdw-radius-as-a-feature/50317/4

In [None]:
from matminer.featurizers.conversions import StrToComposition
from matminer.featurizers.composition import ElementProperty

# df is assumed to be a pandas DataFrame with a "formula" column of formula strings.
# Convert the strings into pymatgen Composition objects (adds a "composition" column).
df = StrToComposition().featurize_dataframe(df, "formula")

# preset_name: (str) can be one of "magpie", "deml", "matminer", "matscholar_el", or "megnet_el".
# Each preset fixes its own statistics:
# stats = ["minimum", "maximum", "range", "mean", "avg_dev", "mode"] for "magpie"
# stats = ["minimum", "maximum", "range", "mean", "std_dev"] for "deml", "matminer", "matscholar_el", or "megnet_el"
ep_feat = ElementProperty.from_preset(preset_name="magpie")
# from_preset() does not take a stats argument, so restrict the statistics afterwards (keep only the mean)
ep_feat.stats = ["mean"]
df = ep_feat.featurize_dataframe(df, col_id="composition")

# Oxidation states: guess them and store the decorated compositions in a "composition_oxid" column
from matminer.featurizers.conversions import CompositionToOxidComposition
df = CompositionToOxidComposition().featurize_dataframe(df, "composition")

from matminer.featurizers.composition import OxidationStates
os_feat = OxidationStates()
df = os_feat.featurize_dataframe(df, "composition_oxid")
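ElementProperty turns per-element property values into composition-level features by computing statistics over the elements of each compound. The sketch below recomputes a few of these statistics in plain Python. It assumes matminer-style definitions: a fraction-weighted mean, an unweighted minimum, maximum, and range over the elements present, and a fraction-weighted mean absolute deviation for `avg_dev`. The function `magpie_like_stats` is hypothetical and is not part of matminer. The only property data used are the Pauling electronegativities of Na and Cl.

```python
def magpie_like_stats(composition, prop):
    """Fraction-weighted statistics of one elemental property.

    composition: dict element symbol -> amount, e.g. {"Na": 1, "Cl": 1}
    prop: dict element symbol -> property value (hypothetical lookup table)
    """
    total = sum(composition.values())
    fractions = {el: amt / total for el, amt in composition.items()}
    values = [prop[el] for el in composition]
    # Mean weighted by atomic fraction
    mean = sum(fractions[el] * prop[el] for el in composition)
    # Average deviation: fraction-weighted mean of |x - mean|
    avg_dev = sum(fractions[el] * abs(prop[el] - mean) for el in composition)
    return {
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        "avg_dev": avg_dev,
    }


# Pauling electronegativities, only for the two elements needed here
electronegativity = {"Na": 0.93, "Cl": 3.16}
print(magpie_like_stats({"Na": 1, "Cl": 1}, electronegativity))
```

For NaCl, both elements have a fraction of 0.5. The mean therefore sits halfway between 0.93 and 3.16, and `avg_dev` equals half the range.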

### <font color=#FF0000>2. Element descriptors with CBFV</font>
- https://github.com/Kaaiian/CBFV
- https://github.com/Kaaiian/CBFV/tree/master/cbfv/element_properties

**The code below uses elem_prop='magpie'; the available values are:**
- jarvis
- magpie
- mat2vec
- oliynyk (default)
- onehot
- random_200
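The simplest of these property sets, onehot, shows what a composition-based feature vector is. Suppose each element is a one-hot vector, and the vectors are averaged using the atomic fractions as weights. The resulting "average" feature is then just the fractional composition. The sketch below assumes this weighted-average construction. It uses a hypothetical three-element vocabulary and a hypothetical function `fractional_onehot`. CBFV's own onehot table covers many more elements, and CBFV computes further statistics besides the weighted average.

```python
def fractional_onehot(composition, vocabulary):
    """Fraction-weighted average of one-hot element vectors.

    composition: dict element -> amount, e.g. {"Fe": 2, "O": 3}
    vocabulary: ordered list of element symbols defining the vector positions
    """
    total = sum(composition.values())
    avg = [0.0] * len(vocabulary)
    for el, amt in composition.items():
        if el not in vocabulary:
            # An element without a vector cannot be featurized
            raise KeyError(f"{el} is not in the vocabulary")
        onehot = [1.0 if v == el else 0.0 for v in vocabulary]
        for j, bit in enumerate(onehot):
            avg[j] += (amt / total) * bit
    return avg


vocab = ["O", "Al", "Fe"]  # hypothetical, tiny vocabulary
print(fractional_onehot({"Fe": 2, "O": 3}, vocab))  # → [0.6, 0.0, 0.4]
```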

In [None]:
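from CBFV import composition

# df must contain a "formula" column and a "target" column.
# X: feature matrix, y: targets, formulae: featurized formulas,
# skipped: formulas that could not be featurized with the chosen property set
X, y, formulae, skipped = composition.generate_features(df,
                                                        elem_prop='magpie',
                                                        drop_duplicates=False,
                                                        extend_features=False,
                                                        sum_feat=False)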

### <font color=#FF0000>3. Element descriptors with ElementEmbeddings</font>
- https://github.com/WMD-group/ElementEmbeddings
- https://wmd-group.github.io/ElementEmbeddings/0.4/

**The code below uses embedding="magpie"; the available embeddings (name: value) are:**
- Magpie: magpie
- Magpie (scaled): magpie_sc
- Mat2Vec: mat2vec
- Matscholar: matscholar
- Megnet (16 dimensions): megnet16
- Modified Pettifor scale: mod_petti
- Oliynyk: oliynyk
- Oliynyk (scaled): oliynyk_sc
- Random (200 dimensions): random_200
- SkipAtom: skipatom
- Atomic number: atomic

**The code below uses stats=["mean","sum"]; stats accepts any combination of the following values:**
- mean
- variance
- minpool
- maxpool
- range
- sum
- geometric_mean
- harmonic_mean
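Each of these stats pools the per-element embedding vectors of a composition into a single vector. The sketch below writes out one set of definitions for six of them. It assumes a fraction-weighted mean and variance, element-wise min/max over the elements present, and a `sum` weighted by the raw stoichiometric amounts. ElementEmbeddings' exact conventions may differ, so treat this as an illustration rather than the library's implementation. The function `pool_embeddings` is hypothetical, and the two-dimensional vectors for Na and Cl are made up.

```python
def pool_embeddings(composition, embedding):
    """Pool per-element vectors into composition-level vectors.

    composition: dict element -> amount, e.g. {"Na": 1, "Cl": 1}
    embedding: dict element -> list of floats (all the same length)
    """
    total = sum(composition.values())
    elements = list(composition)
    dim = len(embedding[elements[0]])
    frac = {el: composition[el] / total for el in elements}

    # Fraction-weighted mean and variance, dimension by dimension
    mean = [sum(frac[el] * embedding[el][j] for el in elements) for j in range(dim)]
    variance = [
        sum(frac[el] * (embedding[el][j] - mean[j]) ** 2 for el in elements)
        for j in range(dim)
    ]
    # Element-wise min/max pooling over the elements present
    minpool = [min(embedding[el][j] for el in elements) for j in range(dim)]
    maxpool = [max(embedding[el][j] for el in elements) for j in range(dim)]
    return {
        "mean": mean,
        "variance": variance,
        "minpool": minpool,
        "maxpool": maxpool,
        "range": [hi - lo for hi, lo in zip(maxpool, minpool)],
        # Sum weighted by stoichiometric amounts rather than fractions (assumption)
        "sum": [sum(composition[el] * embedding[el][j] for el in elements) for j in range(dim)],
    }


toy_embedding = {"Na": [1.0, 4.0], "Cl": [3.0, 0.0]}  # made-up 2-D vectors
print(pool_embeddings({"Na": 1, "Cl": 1}, toy_embedding))
```

Requesting several stats, as in stats=["mean","sum"], simply concatenates the pooled vectors. The feature count is therefore the embedding dimension times the number of stats.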

In [None]:
from elementembeddings.composition import composition_featuriser

# df is assumed to be a pandas DataFrame with a "formula" column
df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean","sum"])

df_featurised

### <font color=#FF0000>4. Element descriptors with XenonPy</font>
- https://github.com/yoshida-lab/XenonPy/releases/tag/v0.6.8
- https://xenonpy.readthedocs.io/en/latest/tutorials/2-descriptor.html
- https://github.com/yoshida-lab/XenonPy/blob/master/samples/custom_descriptor_calculator.ipynb

In [None]:
import re

import pandas as pd
from xenonpy.descriptor import Compositions

# Load the band-gap data set and give the columns short names
df = pd.read_excel('band-gap.xlsx')

df.rename(columns={"Materials": "composition"}, inplace=True)
df.rename(columns={"a=b=c (Å)\nPBE": "La"}, inplace=True)
df.rename(columns={"Eg (eV)\nHSE+SOC": "target"}, inplace=True)

# XenonPy takes each composition as a dict {element: amount}, so parse
# formula strings such as "CsPbBr3" into {"Cs": 1, "Pb": 1, "Br": 3}.
# This simple parser handles integer subscripts only (no parentheses or decimals).
chemical_list = df['composition'].tolist()
converted_list = []

for chemical_formula in chemical_list:
    chemical_dict = {}
    elements = re.findall(r'[A-Z][a-z]?\d*', chemical_formula)

    for element in elements:
        element_name = re.findall(r'[A-Z][a-z]?', element)[0]
        element_count = re.findall(r'\d+', element)
        if element_count:
            element_count = int(element_count[0])
        else:
            element_count = 1

        # Accumulate, so an element that appears twice in a formula is not overwritten
        chemical_dict[element_name] = chemical_dict.get(element_name, 0) + element_count

    converted_list.append(chemical_dict)

# Local copy of the elemental property table (not passed to Compositions below,
# which uses XenonPy's built-in table by default)
data = pd.read_pickle('elements_completed.pd')

# Define the compounds' compositions
comps = converted_list

# Create an instance of the Compositions calculator
cal = Compositions()

# Transform the compositions and obtain the descriptors
descriptors = cal.transform(comps)

# Convert the descriptors to a pandas DataFrame
df = pd.DataFrame(descriptors)

# Display the resulting DataFrame
df
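The regex loop above only understands integer subscripts. A formula such as Ca3(PO4)2 or Fe0.5Ni0.5 would be parsed incorrectly. If the data set contains such formulas, a small stack-based parser can be used instead. The hypothetical `parse_formula` below uses only the standard library and is not part of XenonPy. It returns the same kind of {element: amount} dicts that `cal.transform` receives. If pymatgen is installed, `Composition(formula).get_el_amt_dict()` is an alternative.

```python
import re
from collections import defaultdict

# Tokens: element symbols, parentheses, and integer or decimal amounts
TOKEN = re.compile(r"[A-Z][a-z]?|\(|\)|\d+(?:\.\d+)?")


def parse_formula(formula):
    """Parse e.g. "Ca3(PO4)2" into {"Ca": 3.0, "P": 2.0, "O": 8.0}."""
    formula = formula.replace(" ", "")
    tokens = TOKEN.findall(formula)
    if "".join(tokens) != formula:
        raise ValueError(f"cannot parse formula {formula!r}")

    def read_amount(i):
        # Return (amount, next index); the amount defaults to 1 when absent
        if i < len(tokens) and tokens[i][0].isdigit():
            return float(tokens[i]), i + 1
        return 1.0, i

    stack = [defaultdict(float)]  # one counter per open parenthesis level
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(defaultdict(float))
            i += 1
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            group = stack.pop()
            mult, i = read_amount(i + 1)
            # Multiply the whole group by the subscript after ')'
            for el, amt in group.items():
                stack[-1][el] += amt * mult
        elif tok[0].isalpha():
            amt, i = read_amount(i + 1)
            stack[-1][tok] += amt
        else:
            raise ValueError(f"unexpected number {tok!r} in {formula!r}")
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return dict(stack[0])


print(parse_formula("Ca3(PO4)2"))   # → {'Ca': 3.0, 'P': 2.0, 'O': 8.0}
print(parse_formula("Fe0.5Ni0.5"))  # → {'Fe': 0.5, 'Ni': 0.5}
```

To use it in place of the regex loop, build the list with `converted_list = [parse_formula(f) for f in chemical_list]` before calling `cal.transform`.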

### <font color=#FF0000>5. Element descriptors with jabir</font>
- https://github.com/Gashmard/jabir
- https://github.com/Gashmard/Soraya