<img src="../../EnvXGen/images/EnvXGen_logo.png" alt="EnvXGen_logo">

# Crystal Structure Generation Results Analysis

This notebook is designed for post-processing and analysis of crystalline structure generation results.

## Main Capabilities

### Structure Descriptor Computation
* **RDF** (Radial Distribution Function) - descriptors based on the distribution of interatomic distances
* **ALIGNN** - descriptors based on the Atomistic Line Graph Neural Network

### Dimensionality Reduction
* **PCA** (Principal Component Analysis)
* **UMAP** (Uniform Manifold Approximation and Projection)

### Analysis and Visualization
* Structure clustering with optimal cluster number determination
* Cosine similarity calculation between generated and original structures
* 2D and 3D visualization of structure projections before and after relaxation
* Energy characteristics analysis

### Optimal Structure Search
* Automatic search for energetically favorable structures in different clusters

The notebook provides a complete pipeline from descriptor computation to creating informative visualizations for analyzing generated crystalline structures.

---

### Importing modules and functions

In [1]:
import sys
import os

generator_path = os.path.abspath('../../EnvXGen') # change to '../EnvXGen' if you run postprocessing in the results directory
if generator_path not in sys.path:
    sys.path.append(generator_path)

from postprocessing_scripts import *



### Loading the database

In [2]:
import pickle as pkl  # explicit import in case the star import above does not expose `pkl`

database_filename = 'H3S_relaxation_results_summarized.pkl'

with open(database_filename, 'rb') as database:
    data = pkl.load(database)

### Calculating Descriptors
<b>Here you can choose one of two algorithms — RDF or ALIGNN</b>

As a result, you will get a `descriptors\` folder containing a subfolder named after the selected algorithm (`RDF\` or `ALIGNN\`) with the following files:

`generated_structures.csv` - descriptors of the generated structures  
`POSCAR_init.csv` - descriptors of the initial structure  
`relaxed_structures.csv` - descriptors of the structures after relaxation  
`relaxed_structures_init_atoms.csv` - descriptors of the initial atoms in the relaxed structures  
`relaxed_structures_similarities.csv` - similarities between the initial atoms in the relaxed structures and `POSCAR_init`  

In [4]:
calculate_descriptors(descriptor_algorithm='ALIGNN',
                      poscar_init_path='POSCAR_init',  # change to '../POSCAR_init' if you run postprocessing in the results directory
                      database_filename='H3S_relaxation_results_summarized.pkl',
                      device='cpu',
                      batch_size=32,
                      n_jobs=-1)

Using device: cpu
Using 12 parallel processes
Loading ALIGNN model...
Calculating descriptor for POSCAR_init...
Processing generated structures...
Calculating descriptors for generated_structures.csv...


Processing generated_structures.csv: 100%|██████████| 7/7 [02:47<00:00, 23.97s/it, batch=7/7]


ALIGNN descriptors for generated_structures.csv saved successfully

Processing relaxed structures...
Calculating descriptors for relaxed_structures.csv...


Processing relaxed_structures.csv: 100%|██████████| 7/7 [01:59<00:00, 17.01s/it, batch=7/7]


ALIGNN descriptors for relaxed_structures.csv saved successfully

Processing initial atoms in relaxed structures and calculating similarities...


Processing relaxed structures with init indices: 100%|██████████| 7/7 [00:30<00:00,  4.34s/it]

ALIGNN descriptors for initial atoms in relaxed structures saved successfully
Descriptor calculation completed!
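
For intuition about what an RDF-type descriptor encodes, here is a minimal NumPy sketch. It is not the EnvXGen implementation: the function name `rdf_descriptor`, the restriction to a cubic periodic cell, and the default cutoff and bin count are all assumptions made for illustration. The minimum-image step is only valid while `r_max` does not exceed half the cell length.

```python
import numpy as np

def rdf_descriptor(positions, cell_length, r_max=5.0, n_bins=50):
    """Histogram-based RDF for atoms in a cubic periodic cell (illustration only)."""
    positions = np.asarray(positions, dtype=float)
    n_atoms = len(positions)
    if n_atoms < 2:
        raise ValueError("at least two atoms are needed to form a pair")
    # Pairwise displacement vectors, wrapped with the minimum-image convention.
    delta = positions[:, None, :] - positions[None, :, :]
    delta -= cell_length * np.round(delta / cell_length)
    distances = np.linalg.norm(delta, axis=-1)
    # Count each pair once (upper triangle, self-distances excluded).
    pair_distances = distances[np.triu_indices(n_atoms, k=1)]
    counts, edges = np.histogram(pair_distances, bins=n_bins, range=(0.0, r_max))
    # Normalize by the pair count expected for uniformly distributed atoms.
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    pair_density = 0.5 * n_atoms * (n_atoms - 1) / cell_length ** 3
    return counts / (pair_density * shell_volumes)

# Toy input (made-up coordinates): two atoms separated across the periodic boundary.
toy_positions = [[0.5, 0.0, 0.0], [9.0, 0.0, 0.0]]
toy_rdf = rdf_descriptor(toy_positions, cell_length=10.0, r_max=5.0, n_bins=5)
```

The resulting fixed-length vector can be compared between structures regardless of how many atoms each one contains, which is what makes it usable as a descriptor.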





### Reducing Descriptors
<b>Here you can choose one of two algorithms — PCA or UMAP</b>

As a result, you will get new files in the `descriptors\RDF\` or `descriptors\ALIGNN\` folder:

`generated_structures_PCA.csv` or `generated_structures_UMAP.csv`  
`relaxed_structures_PCA.csv` or `relaxed_structures_UMAP.csv` 

These files contain all the information necessary for 2D or 3D visualization.  
You may already have precomputed RDF or ALIGNN descriptors, so be sure to specify which type of descriptors you want to reduce.

In [8]:
reducing_descriptors(descriptor_algorithm='ALIGNN',
                     reducer_algorithm='UMAP',
                     poscar_init_path='POSCAR_init', # default path
                    )

Reducing data
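
To illustrate what the PCA option does conceptually, the sketch below projects a descriptor matrix onto its leading principal components with plain NumPy. It is a hedged illustration, not the code behind `reducing_descriptors`: the helper name `pca_project` and the random toy matrix are assumptions.

```python
import numpy as np

def pca_project(descriptors, n_components=2):
    """Project row-wise descriptors onto the first principal components."""
    X = np.asarray(descriptors, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(X_centered, full_matrices=False)
    projected = X_centered @ vt[:n_components].T
    # Fraction of the total variance captured by each retained component.
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return projected, explained[:n_components]

# Toy data: 20 made-up structures with 8-dimensional descriptors.
rng = np.random.default_rng(0)
toy_descriptors = rng.normal(size=(20, 8))
toy_coords, toy_explained = pca_project(toy_descriptors, n_components=3)
```

Each row of `toy_coords` plays the role of the `x`, `y`, `z` coordinates used for plotting. UMAP differs in that it is a nonlinear method, so it has no equivalent of the explained-variance ratios.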


<h3>Visualization of Crystal Structures by Descriptors</h3>

<b>This function allows you to build 2D or 3D plots of crystal structures based on their descriptors</b>
<br>

Users can customize the visualization using the <b>following parameters:</b>


<ul>
  <li><b>descriptor_algorithm</b>: Type of descriptors used to build the plot. Possible values:
    <ul>
      <code>"RDF"</code> – Radial Distribution Function;<br>
      <code>"ALIGNN"</code> – Descriptors obtained from the ALIGNN model.
    </ul>
  </li><br>

  <li><b>reducer_algorithm</b>: Dimensionality reduction algorithm:
    <ul>
      <code>"PCA"</code> – Principal Component Analysis;<br>
      <code>"UMAP"</code> – Uniform Manifold Approximation and Projection.
    </ul>
  </li><br>

  <li><b>descriptors_dimensionality</b> (int): Target dimension of the projection:
    <ul>
      <code>2</code> – Two-dimensional projection;<br>
      <code>3</code> – Three-dimensional projection.
    </ul><br>
  </li>

  <li><b>include_enthalpy</b> (bool): Whether to add the formation enthalpy of structures as an additional axis.
    <ul>
      If <code>descriptors_dimensionality = 2</code> and <code>include_enthalpy = True</code>, the plot will be 3D: X, Y, Enthalpy;<br>
      If <code>descriptors_dimensionality = 3</code>, adding an additional axis is not possible, so the <code>include_enthalpy</code> parameter is ignored.
    </ul><br>
  </li>

  <li><b>structures_before_relaxation</b> and <b>structures_after_relaxation</b> (bool):
    <ul>
      Specify whether to display structures before and/or after relaxation.<br>
      At least one of these parameters must be set to <code>True</code>. A plot with neither relaxed nor unrelaxed structures is not allowed.
    </ul>
  </li>
</ul>
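
The parameter rules above can be summarized as a small decision function. This is only a sketch of the documented behavior, not code taken from `plot_results`: the helper `resolve_plot_axes` is hypothetical, and its argument names are borrowed from the `plot_results` call in this notebook.

```python
def resolve_plot_axes(descriptors_dimensionality, include_enthalpy,
                      structures_before_relaxation, structures_after_relaxation):
    """Return the axis labels implied by the documented parameter rules."""
    if descriptors_dimensionality not in (2, 3):
        raise ValueError("descriptors_dimensionality must be 2 or 3")
    if not (structures_before_relaxation or structures_after_relaxation):
        raise ValueError("display structures before and/or after relaxation")
    axes = ["x", "y"] if descriptors_dimensionality == 2 else ["x", "y", "z"]
    # Enthalpy can only take the third axis of a 2D projection;
    # for a 3D projection the flag is ignored.
    if descriptors_dimensionality == 2 and include_enthalpy:
        axes.append("Enthalpy")
    return axes

axes_2d_enthalpy = resolve_plot_axes(2, True, True, False)
axes_3d = resolve_plot_axes(3, True, True, True)
```

Under these rules a plot never has more than three axes, whichever combination of flags is passed.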



In [14]:
df_results = plot_results(database_filename=database_filename,
                          descriptor_algorithm='ALIGNN',
                          reducer_algorithm='UMAP',
                          descriptors_dimensionality=3,
                          include_enthalpy=False,
                          structures_before_relaxation=True,
                          structures_after_relaxation=True,
                          poscar_init_path='POSCAR_init' # default path
                          )

You can also analyze the <code>df_results</code> dataframe, which contains the following columns:

<code>ID</code> – ID of the structure  
<code>x</code>, <code>y</code> or <code>x</code>, <code>y</code>, <code>z</code> – coordinates computed by the reducer  
<code>Energy</code> and <code>Volume</code> of the structure  
<code>SG</code> – space group of the structure  
<code>SG_symbol</code> – international symbol of the space group  
<code>Cosine_similarity</code> – similarity between the initial atoms in relaxed structures and POSCAR_init  
(how closely the final structure resembles the initial structure)
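
As a reminder of what a cosine similarity measures, here is a minimal sketch for two descriptor vectors. The helper name and the toy vectors are assumptions for illustration; how EnvXGen assembles the vectors it compares is not shown here.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denominator = np.linalg.norm(a) * np.linalg.norm(b)
    if denominator == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return float(np.dot(a, b) / denominator)

# Toy vectors: parallel descriptors score 1, orthogonal ones score 0.
sim_parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
sim_orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Values close to 1 indicate that the environment of the initial atoms was largely preserved during relaxation, while lower values indicate a stronger rearrangement.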

### Get top structures from clusters

In [15]:
df_top_structures = find_different_optimal_structures(
    database_filename=database_filename,
    descriptor_algorithm='ALIGNN',
    k=5
)

In [16]:
df_top_structures

| | ID | cluster | epoch | CalcFold | generated_structure_energy | generated_structure_volume | generated_structure_SG | generated_structure_symbol | generated_structure | relaxed_structure_energy | relaxed_structure_volume | relaxed_structure_SG | relaxed_structure_symbol | relaxed_structure | warnings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ID-147 | 0 | 0 | 147 | 210.226445 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 130.667871 | 258.381381 | 229 | Im-3m | `(Atom('S', [0.15325479582011786, 0.14788239744...` | |
| 1 | ID-167 | 0 | 0 | 167 | 217.441397 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 130.668273 | 258.382902 | 229 | Im-3m | `(Atom('S', [5.878957103616604, 0.2247119255502...` | |
| 2 | ID-23 | 0 | 0 | 23 | 235.672765 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 131.834389 | 258.780244 | 10 | P2/m | `(Atom('S', [5.750276515777676, 0.0576856765873...` | |
| 3 | ID-198 | 0 | 0 | 198 | 201.225896 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 131.834572 | 258.783505 | 10 | P2/m | `(Atom('S', [5.8732081487551975, 5.891182890844...` | |
| 4 | ID-136 | 0 | 0 | 136 | 232.843067 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 131.834844 | 258.778604 | 10 | P2/m | `(Atom('S', [0.0634802914036606, 0.122078388085...` | |
| 5 | ID-19 | 1 | 0 | 19 | 211.338849 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 134.356894 | 259.199938 | 1 | P1 | `(Atom('S', [0.06049914760573516, 0.03015980059...` | |
| 6 | ID-76 | 1 | 0 | 76 | 298.741937 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 134.449651 | 259.357799 | 1 | P1 | `(Atom('S', [5.867568135772986, 0.1186649570862...` | |
| 7 | ID-98 | 1 | 0 | 98 | 224.040683 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 134.902488 | 259.015756 | 1 | P1 | `(Atom('S', [5.8573024400365785, 5.856655608677...` | |
| 8 | ID-109 | 1 | 0 | 109 | 203.076097 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 134.982323 | 260.030948 | 1 | P1 | `(Atom('S', [0.033846952926157095, 0.1003885439...` | |
| 9 | ID-183 | 1 | 0 | 183 | 270.272511 | 381.20165 | 1 | P1 | `(Atom('S', [0.0, 0.0, 0.0], index=0), Atom('S'...` | 135.194846 | 259.417396 | 1 | P1 | `(Atom('S', [5.762412749820957, 0.1165790654883...` | |
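
The shape of this result, the `k` lowest-energy relaxed structures from each cluster, can be reproduced on any dataframe with a sort followed by a grouped `head`. The sketch below is an assumption about the selection step only (it does not perform the clustering and is not the body of `find_different_optimal_structures`). The helper name and the toy values are made up; the column names `cluster` and `relaxed_structure_energy` are taken from the output above.

```python
import pandas as pd

def top_k_per_cluster(df, k, energy_column="relaxed_structure_energy",
                      cluster_column="cluster"):
    """Keep the k lowest-energy rows of every cluster."""
    ordered = df.sort_values(energy_column)
    # head(k) on a groupby keeps the first k rows of each group in sorted order.
    selected = ordered.groupby(cluster_column).head(k)
    return (selected.sort_values([cluster_column, energy_column])
                    .reset_index(drop=True))

# Toy data with made-up energies, two clusters of three structures each.
toy = pd.DataFrame({
    "ID": ["a", "b", "c", "d", "e", "f"],
    "cluster": [0, 0, 0, 1, 1, 1],
    "relaxed_structure_energy": [3.0, 1.0, 2.0, 6.0, 4.0, 5.0],
})
toy_top = top_k_per_cluster(toy, k=2)
```

Selecting per cluster rather than globally keeps structurally distinct candidates in the shortlist even when one cluster dominates the low-energy end.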
