# Generating a polymer pool and calculating polymer descriptors

This notebook builds a diverse pool of candidate polymers with `SMiPoly`, a rule-based virtual library generator, to support the discovery of functional polymers. It then uses `RadonPy`, a toolkit for molecular dynamics simulation and property prediction, to compute descriptors that characterize the physical and chemical behavior of the generated polymer structures.

### Generating pool
The `generate_pool` function uses the `SMiPoly` library to turn molecular data, given as `SMILES` strings, into a pool of unique polymers. It first classifies each monomer from its `SMILES` representation, keeps the monomers that can take part in polymerization, and discards the rest.

The function then simulates chemical reactions between pairs of monomers to generate bipolymers, following the rules that `SMiPoly` defines for polymer formation. Building each polymer from two distinct monomer units closely mirrors real-world polymerization techniques. The resulting polymers are filtered by specific structural features, deduplicated so that each polymer appears only once, and cleaned into a structured dataframe. This dataframe carries a unique identifier for each polymer and placeholder columns for additional molecular properties, which prepares it for downstream analyses such as Bayesian optimization.
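The classify, pair, and deduplicate steps described above can be pictured with a small toy sketch. It is not `SMiPoly`'s implementation or API: the rule table `TOY_RULES`, the function `toy_generate_pool`, the monomer class labels, and the column names are all hypothetical, and the monomer classes are supplied by hand rather than derived from the `SMILES` strings.

```python
from itertools import combinations

# Hypothetical rule table (NOT SMiPoly's actual rules): which pairs of
# monomer classes can react, and the polymer class they form.
TOY_RULES = {
    frozenset({"diol", "diacid"}): "polyester",
    frozenset({"diamine", "diacid"}): "polyamide",
}


def toy_generate_pool(monomers, props):
    """Build a toy polymer pool from (smiles, monomer_class) pairs.

    monomer_class is None for molecules that cannot polymerize.
    """
    # Keep polymerizable monomers only; the set drops repeated entries.
    candidates = sorted({(smi, cls) for smi, cls in monomers if cls is not None})

    rows = []
    # Sorted input plus combinations() yields each unordered pair exactly once.
    for (smi_a, cls_a), (smi_b, cls_b) in combinations(candidates, 2):
        polymer_class = TOY_RULES.get(frozenset({cls_a, cls_b}))
        if polymer_class is None:
            continue  # no rule covers this pair of classes
        row = {
            "polymer_id": f"P{len(rows):04d}",
            "polymer_class": polymer_class,
            "monomer_a": smi_a,
            "monomer_b": smi_b,
        }
        # Placeholder columns for properties to be filled in later.
        row.update({prop: None for prop in props})
        rows.append(row)
    return rows


# Hand-labelled example input; the last entry repeats the first one.
monomers = [
    ("OCCO", "diol"),              # ethylene glycol
    ("OC(=O)CCC(=O)O", "diacid"),  # succinic acid
    ("NCCN", "diamine"),           # ethylenediamine
    ("CC", None),                  # ethane: not polymerizable
    ("OCCO", "diol"),
]
rows = toy_generate_pool(monomers, ["refractive_index", "abbe_number"])
for row in rows:
    print(row)
```

In this sketch the duplicate diol and the non-polymerizable ethane are dropped before pairing, and the diol/diamine pair is skipped because the toy table has no rule for it. The real library works on actual reaction rules and generated polymer structures rather than on class labels.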

In [None]:
import pandas as pd
from spacier.wrappers.smipoly import generate_pool

# Load molecular data from a CSV file
DATA_DIR = "../spacier/data/"
monomer_path = DATA_DIR + "monomer.csv"
df = pd.read_csv(monomer_path)

# Define additional properties to calculate for each polymer
props = ["refractive_index", "abbe_number"]

# Generate a pool of unique polymers
df_pool = generate_pool(df, props)

# Save the pool to a CSV file
df_pool.to_csv("df_pool.csv", index=False)

### Calculate Force Field (FF) descriptors
The `calc_ff_descriptors` function calculates force field descriptors that characterize a polymer's molecular dynamics. It uses kernel mean embedding to convert the variable-length set of GAFF2 (General Amber Force Field 2) parameters of each molecule into a uniform, fixed-length vector. GAFF2 parameters cover a wide range of molecular interactions, from covalent bonds to non-covalent forces such as van der Waals and Coulomb forces, and are mapped into a high-dimensional feature space with a Gaussian kernel function. Because every molecule is then described by a vector of the same length, molecules of different sizes can be compared directly.

Each parameter's range is discretized into intervals, each represented by a Gaussian function, which approximates the distribution of that parameter across the dataset. The resulting descriptor gives a compact, quantitative summary of each molecule's intrinsic properties. Detailed explanations are given [here](https://github.com/RadonPy/RadonPy/blob/develop/docs/FF-Descriptor_man.pdf).
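As a rough illustration of the idea, the sketch below embeds a variable-length list of scalar parameter values into a fixed-length vector by averaging Gaussian kernels evaluated at a grid of centers. It is a minimal sketch, not `RadonPy`'s implementation: the function names `gaussian_centers` and `kernel_mean_embedding`, the grid range, the bandwidth `sigma`, and the example values are all assumptions, and the actual descriptor may weight or normalize terms differently.

```python
import math


def gaussian_centers(low, high, n_centers):
    """Evenly spaced grid of Gaussian centers covering [low, high]."""
    step = (high - low) / (n_centers - 1)
    return [low + i * step for i in range(n_centers)]


def kernel_mean_embedding(values, centers, sigma):
    """Map a variable-length list of values to len(centers) features.

    Feature j is the mean over all values v of exp(-(v - c_j)^2 / (2 sigma^2)).
    """
    if not values:
        return [0.0] * len(centers)
    return [
        sum(math.exp(-((v - c) ** 2) / (2.0 * sigma ** 2)) for v in values)
        / len(values)
        for c in centers
    ]


# Illustrative values only: one force field parameter (for example an
# equilibrium bond length) collected from two molecules of different size.
centers = gaussian_centers(0.9, 1.6, 8)
small_molecule = [1.09, 1.09, 1.53]
large_molecule = [1.09, 1.09, 1.09, 1.41, 1.53, 1.53]

vec_small = kernel_mean_embedding(small_molecule, centers, sigma=0.1)
vec_large = kernel_mean_embedding(large_molecule, centers, sigma=0.1)

# Both vectors have the same length, whatever the number of input values.
print(len(vec_small), len(vec_large))
```

Averaging rather than summing keeps each feature between 0 and 1 regardless of molecule size, so the vector describes the distribution of the parameter instead of its count. A smaller `sigma` or a denser grid resolves finer differences at the cost of a longer vector.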

In [None]:
from spacier.wrappers.radonpy import calc_ff_descriptors

# Calculate the force field descriptors for the pool
df_pool_X = calc_ff_descriptors(df_pool)

# Save the pool with the force field descriptors
df_pool_X.to_csv("df_pool_X.csv", index=False)