This notebook gives an overview of how to use the **Adss_DataExtract_PostProcess** class in the **data** script:
- **Adss_DataExtract_PostProcess**: extracts data from VASP output files, or post-processes VASP structures to make them suitable for frequency and differential charge density calculations.

In [1]:
from Jworkflow.data_process import Adss_DataExtract_PostProcess

# Adss_DataExtract_PostProcess

If you need to process adsorption-related data, you can set the elements of the molecule and the height criterion for the molecule through the **reset_type_element** and **set_height_filter** methods.

**reset_type_element** sets two groups of elements: all the elements contained in the adsorbate, and the elements of the adsorbate skeleton (mainly used to determine whether the adsorbate has decomposed or rotated).

**set_height_filter** sets a height criterion so that only atoms above a certain level are counted as part of the molecule. It is mainly needed when the substrate contains elements that also occur in the adsorbate. By default the criterion is a layer index; you can also set **ADP.height_type='z'** to use a numerical height criterion.

Here we mainly consider the nitrogen reduction reaction (NRR). The adsorbed molecule contains two elements, N and H. N is set as the skeleton of the molecule, and only atoms above the fourth layer (counting from 0) are treated as part of the molecule.
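
To make the layer criterion concrete, here is a minimal, standalone sketch of how a layer-based filter could work. It is not the implementation inside **Adss_DataExtract_PostProcess**: the function names, the `tol` gap used to separate layers, and the strict "layer index greater than the threshold" rule are all assumptions made for illustration, and the coordinates are hypothetical.

```python
def assign_layers(z_coords, tol=0.5):
    """Assign a layer index to every atom; layer 0 is the lowest.

    Atoms are sorted by z (in angstroms) and a new layer starts whenever
    the gap to the previous atom exceeds `tol` (an assumed tolerance).
    """
    order = sorted(range(len(z_coords)), key=lambda i: z_coords[i])
    layers = [0] * len(z_coords)
    current = 0
    for prev, idx in zip(order, order[1:]):
        if z_coords[idx] - z_coords[prev] > tol:
            current += 1
        layers[idx] = current
    return layers


def select_adsorbate(symbols, z_coords, adsorbate_elements, min_layer, tol=0.5):
    """Indices of atoms that are adsorbate elements AND lie above `min_layer`."""
    layers = assign_layers(z_coords, tol)
    return [
        i
        for i, (sym, layer) in enumerate(zip(symbols, layers))
        if sym in adsorbate_elements and layer > min_layer
    ]


# Hypothetical slab with one N atom inside the substrate (index 2)
symbols = ["Ga", "Ga", "N", "Sc", "Ga", "N", "N", "H"]
z_coords = [0.0, 2.0, 2.1, 4.0, 6.0, 8.0, 9.1, 9.9]
print(select_adsorbate(symbols, z_coords, {"N", "H"}, min_layer=3))
```

In this toy input the N atom buried in the substrate is rejected by the layer test even though its element matches, which is the situation the height filter is meant to handle.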

Also pay attention to the name of the POSCAR file. The script measures how atoms move during optimization by comparing POSCAR with CONTCAR. I usually save the initial structure as a separate file named POSCARoo, which is also the default value of the POSCAR file name.

In [2]:
ADP = Adss_DataExtract_PostProcess()
ADP.reset_type_element(['N', 'H'], ['N'])
ADP.set_height_filter(3)
#ADP.poscar = 'POSCAR'

---------------------------Reset type element----------------------------
                 Adsorbate elements : ['N', 'H']
                  Skeleton elements : ['N']
-----------------------------------End-----------------------------------
----------------------------Set_height_filter----------------------------
                   Height threshold : 3
                        Height type : layer
-----------------------------------End-----------------------------------


## extract
Use **extract** to grab VASP outputs.
### Key parameters
- **path**: Specifies the path for VASP results
- **task_type**: Specifies the type for VASP results

**task_type** can be **'adss'**, **'slab'**, **'Gcor'** or **'sta'**. They stand for molecular adsorption, surface relaxation, free energy correction (the **ther_info** file obtained by redirecting the output of the vaspkit free energy correction) and single-point energy calculation, respectively.

### Example
Below, we extract the VASP results for each task type and save the data for each task to an Excel table.

#### Grab adsorption energy calculation results
The extracted information mainly includes convergence information, the degree of molecular and surface reconstruction, and adsorption-related information (parsed from the names that the **slab** script assigns after adding adsorbed molecules).

In [3]:
df_adss = ADP.extract(r'example\data_process&reaction\res_adss\Output_file', 'adss')
df_adss.to_excel(r'example\data_process&reaction\adss.xlsx')

---------------------------------Extract---------------------------------
                          Direction : example\data_process&reaction\res_adss\Output_file
                          Task type : adss
                   Extract progress : Compeleted
                     DataFrame sort : Compeleted
                         Write file : False
-----------------------------------End-----------------------------------


In [4]:
df_adss[:3]

| | ads_sys | system | adsb | site | rotate | energy | converg | mtransla_skel | mupgrade_skel | mtransla_adsb | mdista_adsb | mshift_slab | mtransla_slab | mupgrade_slab | Etime | setp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ga3Sc_N2h_45_0t | Ga3Sc | N2h | 0t | 45 | -84.11741 | True | 0.056 | 1.048 | 0.056 | 0.002 | 0.115 | 0.037 | 0.111 | 1064.364 | 87 |
| 1 | Ga3Sc_N2h_90_1t | Ga3Sc | N2h | 1t | 90 | -84.061876 | True | 0.291 | 2.125 | 0.291 | -0.012 | 0.161 | 0.042 | 0.159 | 1459.964 | 163 |
| 2 | Ga3Sc_N2v_0_0t | Ga3Sc | N2v | 0t | 0 | -84.610355 | True | 0.0 | 0.628 | 0.0 | -0.001 | 0.118 | 0.027 | 0.115 | 796.715 | 69 |


- **converg** indicates whether the calculation converged.
- The distance column names are built from three parts. The leading "m" means the value is the maximum travel distance of a single atom.
- The second part is the type of distance: transla (in-plane travel distance), upgrade (travel distance along the Z axis), dista (change in the distance between atoms), shift (3-dimensional travel distance).
- The third part is the group of atoms the criterion is computed over: adsb (adsorbate), skel (adsorbate skeleton), slab (surface).
- All distances measure the change in position of a single atom or pair of atoms before and after optimization, in angstroms.
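
The three distance types can be illustrated with a short standalone sketch. It is only an assumption about how such criteria might be computed, not the library's code: the function name is hypothetical, positions are taken as Cartesian coordinates in angstroms, periodic boundary conditions are ignored, and the Z-axis value keeps its sign (the negative **mupgrade_slab** entries in the tables suggest a signed quantity, but the source does not state the convention).

```python
import math


def displacement_metrics(before, after):
    """Compare two lists of (x, y, z) positions given in the same atom order.

    Returns the per-structure maxima of three per-atom quantities:
    in-plane travel, signed travel along z, and full 3D travel.
    """
    transla, upgrade, shift = [], [], []
    for (x0, y0, z0), (x1, y1, z1) in zip(before, after):
        dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
        transla.append(math.hypot(dx, dy))        # in-plane distance
        upgrade.append(dz)                        # signed change along z
        shift.append(math.sqrt(dx * dx + dy * dy + dz * dz))  # 3D distance
    return {
        "mtransla": max(transla),
        # Largest magnitude along z, sign preserved (assumed convention)
        "mupgrade": max(upgrade, key=abs),
        "mshift": max(shift),
    }


# Hypothetical positions before and after relaxation
before = [(0.0, 0.0, 0.0), (1.0, 1.0, 5.0)]
after = [(0.03, 0.04, -0.1), (1.0, 1.0, 4.7)]
print(displacement_metrics(before, after))
```

A real comparison of POSCAR and CONTCAR would also have to convert fractional coordinates and account for atoms that cross a cell boundary, which this sketch leaves out.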

#### Grab surface relaxation results

In [5]:
df_slab = ADP.extract(r'example\data_process&reaction\res_slab\Output_file', 'slab')
df_slab.to_excel(r'example\data_process&reaction\slab.xlsx')

---------------------------------Extract---------------------------------
                          Direction : example\data_process&reaction\res_slab\Output_file
                          Task type : slab
                   Extract progress : Compeleted
                     DataFrame sort : Compeleted
                         Write file : False
-----------------------------------End-----------------------------------


In [6]:
df_slab[:3]

| | system | energy | converg | mshift_slab | ashift_slab | mtransla_slab | mupgrade_slab | Etime | setp |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Ga3Sc | -67.373818 | True | 0.141 | 0.041 | 0.037 | -0.141 | 413.217 | 17 |
| 1 | ScYTiGaGeSn | -78.635955 | True | 0.678 | 0.177 | 0.661 | -0.342 | 3872.376 | 28 |
| 2 | Sn3Y | -79.482628 | True | 0.295 | 0.059 | 0.076 | -0.295 | 634.256 | 8 |


#### Grab free energy correction results

In [7]:
df_gcor = ADP.extract(r'example\data_process&reaction\res_fre\Output_file', 'Gcor')
df_gcor.to_excel(r'example\data_process&reaction\Gcor.xlsx')

---------------------------------Extract---------------------------------
                          Direction : example\data_process&reaction\res_fre\Output_file
                          Task type : Gcor
                   Extract progress : Compeleted
                     DataFrame sort : Compeleted
                         Write file : False
-----------------------------------End-----------------------------------


In [8]:
df_gcor[:3]

| | ads_sys | system | adsb | G | ZPE | H | S | TS |
|---|---|---|---|---|---|---|---|---|
| 0 | Ga3Sc_N2v_0_0t | Ga3Sc | N2v | 0.104764 | 0.196273 | 0.082376 | 0.000583 | 0.173821 |
| 1 | Ga3Sc_NH2_45_0t | Ga3Sc | NH2 | 0.528516 | 0.62335 | 0.083402 | 0.000598 | 0.178294 |
| 2 | Ga3Sc_NH3_0_0t | Ga3Sc | NH3 | 0.899295 | 0.996729 | 0.088659 | 0.000624 | 0.186046 |


## process_POSCAR
Post-process VASP structures to make them suitable for frequency and differential charge density calculations.

### Key parameters
- **path**: Specifies the path for VASP results
- **deal_type**: Specifies the type of post-processing, **'fre'** or **'chargediff'**

### Example
We first move the most stable adsorption configuration of each adsorbate on each surface to the **adss_stablest** folder through screening and analysis (which can be assisted by the **reaction** script).

These structures are processed using **process_POSCAR** to obtain POSCARs suitable for frequency calculation, and new POSCARs are stored in the **fre** folder.

In [9]:
ADP.process_POSCAR(r'example/data_process&reaction/adss_stablest', 'fre')

-----------------------------Process POSCAR------------------------------
                          Direction : example/data_process&reaction/adss_stablest
                          Deal type : fre
                  Created direction : fre
-----------------------------------END-----------------------------------
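
The notebook does not spell out what the **'fre'** processing changes in each POSCAR. A common way to prepare an adsorption structure for a VASP frequency calculation is to freeze the surface atoms and leave only the adsorbate free through selective dynamics. The sketch below illustrates that idea under this assumption; it is not the implementation of **process_POSCAR**, the function name and example structure are hypothetical, and it expects a VASP 5 style POSCAR without an existing selective dynamics line.

```python
def add_selective_dynamics(poscar_text, free_indices):
    """Return POSCAR text in which only atoms in `free_indices` may move.

    Assumes VASP 5 layout: element symbols on line 6, atom counts on
    line 7, and the coordinate mode (Direct/Cartesian) on line 8.
    """
    lines = poscar_text.strip().splitlines()
    n_atoms = sum(int(n) for n in lines[6].split())
    if lines[7].strip().lower().startswith("s"):
        raise ValueError("POSCAR already uses selective dynamics")
    out = lines[:7] + ["Selective dynamics", lines[7]]
    for i, coord in enumerate(lines[8:8 + n_atoms]):
        flags = "T T T" if i in free_indices else "F F F"
        xyz = " ".join(coord.split()[:3])
        out.append(f"{xyz} {flags}")
    return "\n".join(out) + "\n"


# Hypothetical structure: two surface atoms and an N2 adsorbate
poscar = """Ga3Sc_N2v (hypothetical coordinates)
1.0
10.0 0.0 0.0
0.0 10.0 0.0
0.0 0.0 20.0
Ga Sc N
1 1 2
Direct
0.0 0.0 0.10
0.5 0.5 0.20
0.5 0.5 0.30
0.5 0.5 0.36
"""
print(add_selective_dynamics(poscar, free_indices={2, 3}))
```

The indices of the free atoms would typically come from the same adsorbate selection (element list plus height filter) configured at the top of this notebook.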


These structures are again processed using **process_POSCAR**, this time separating the surface and adsorbate to obtain POSCARs suitable for differential charge density calculations.

After processing, the surface, molecules, and the original adsorption configuration are stored separately in **chargediff/slab**, **chargediff/adsb** and **chargediff/all**.

In [10]:
ADP.process_POSCAR(r'example/data_process&reaction/adss_stablest', 'chargediff')

-----------------------------Process POSCAR------------------------------
                          Direction : example/data_process&reaction/adss_stablest
                          Deal type : chargediff
                  Created direction : chargediff
-----------------------------------END-----------------------------------
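
The separation of surface and adsorbate described above can be sketched as follows. This is a minimal standalone illustration, not the code behind **process_POSCAR**: the function names and the example structure are hypothetical, and a VASP 5 style POSCAR without selective dynamics is assumed. Both fragments keep the original lattice and atomic coordinates so that the charge densities of the full system, the surface, and the adsorbate are computed in the same cell.

```python
def _build(header, mode, atoms):
    """Assemble POSCAR text from (symbol, coordinate line) pairs."""
    symbols, counts = [], []
    for sym, _ in atoms:
        if symbols and symbols[-1] == sym:
            counts[-1] += 1          # same species as previous atom
        else:
            symbols.append(sym)
            counts.append(1)
    body = [coord for _, coord in atoms]
    species = [" ".join(symbols), " ".join(str(c) for c in counts)]
    return "\n".join(header + species + [mode] + body) + "\n"


def split_poscar(poscar_text, adsorbate_indices):
    """Return (slab_poscar, adsorbate_poscar) sharing the original cell."""
    lines = poscar_text.strip().splitlines()
    symbols = lines[5].split()
    counts = [int(n) for n in lines[6].split()]
    per_atom = [s for s, n in zip(symbols, counts) for _ in range(n)]
    header, mode = lines[:5], lines[7]
    atoms = list(zip(per_atom, lines[8:8 + len(per_atom)]))
    slab = [a for i, a in enumerate(atoms) if i not in adsorbate_indices]
    adsb = [a for i, a in enumerate(atoms) if i in adsorbate_indices]
    return _build(header, mode, slab), _build(header, mode, adsb)


# Hypothetical structure: two surface atoms and an N2 adsorbate
poscar = """Ga3Sc_N2v (hypothetical coordinates)
1.0
10.0 0.0 0.0
0.0 10.0 0.0
0.0 0.0 20.0
Ga Sc N
1 1 2
Direct
0.0 0.0 0.10
0.5 0.5 0.20
0.5 0.5 0.30
0.5 0.5 0.36
"""
slab_text, adsb_text = split_poscar(poscar, adsorbate_indices={2, 3})
print(slab_text)
print(adsb_text)
```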
