In [None]:
!git --version

In [None]:
!git init

In [None]:
!git remote remove origin

!git remote add origin https://github.com/evanfishfish/py4sci_project.git


### Note:
The cells above initialize a Git repository and point its `origin` remote at GitHub, so that later cells can commit and push the files they generate.

# What is SMiPoly?

SMiPoly (Small Molecules into Polymers) is a Python-based, rule-driven virtual library generator.

### What does this mean?
SMiPoly applies 22 polymerization rules to a set of input monomers to enumerate the polymer structures those monomers could form.

### Polymerization rules
| #  | Polymer Class       | Monomer Class 1      | Monomer Class 2         | Reaction Type         | Description/Example Functional Groups                   |
|----|---------------------|----------------------|-------------------------|----------------------|---------------------------------------------------------|
| 1  | Polyolefin          | Vinyl                | -                       | Addition             | Polymerization of simple olefins (e.g., ethylene)       |
| 2  | Polyolefin          | Cyclic olefin        | -                       | Addition             | Polymerization of cyclic olefins                        |
| 3  | Polyolefin          | Vinyl                | Vinyl                   | Addition             | Copolymerization of two vinyl monomers                  |
| 4  | Polyolefin          | Vinyl                | Cyclic olefin           | Addition             | Copolymerization of vinyl and cyclic olefin             |
| 5  | Polyolefin          | Cyclic olefin        | Cyclic olefin           | Addition             | Copolymerization of two cyclic olefins                  |
| 6  | Polyolefin          | Vinyl                | -                       | Ring-opening         | Ring-opening polymerization of vinyl monomers           |
| 7  | Polyolefin          | Cyclic olefin        | -                       | Ring-opening         | Ring-opening polymerization of cyclic olefins           |
| 8  | Polyester           | Diol                 | Diacid (or equivalent)  | Polycondensation     | Esterification (e.g., diol + diacid)                    |
| 9  | Polyester           | Hydroxy acid         | -                       | Polycondensation     | Self-condensation of hydroxy acid                       |
| 10 | Polyester           | Diol                 | Phosgene (or equivalent)| Polycondensation     | Diol + phosgene (polycarbonate synthesis)               |
| 11 | Polyester           | Diol                 | Diester                 | Polycondensation     | Diol + diester                                          |
| 12 | Polyether           | Diol                 | -                       | Polycondensation     | Self-condensation of diol                               |
| 13 | Polyamide           | Diamine              | Diacid (or equivalent)  | Polycondensation     | Diamine + diacid (amide formation)                      |
| 14 | Polyamide           | Amino acid           | -                       | Polycondensation     | Self-condensation of amino acid                         |
| 15 | Polyamide           | Diamine              | Diester                 | Polycondensation     | Diamine + diester                                       |
| 16 | Polyimide           | Dianhydride          | Diamine                 | Polycondensation     | Dianhydride + diamine                                   |
| 17 | Polyurethane        | Diol                 | Diisocyanate            | Polyaddition         | Diol + diisocyanate                                     |
| 18 | Polyurethane        | Diamine              | Diisocyanate            | Polyaddition         | Diamine + diisocyanate                                  |
| 19 | Polyoxazolidone     | Di/polyepoxide       | Di/polyisocyanate       | Polyaddition         | Epoxide + isocyanate                                    |
| 20 | Polyether           | Epoxide              | -                       | Ring-opening         | Ring-opening polymerization of epoxides                 |
| 21 | Polyethersulfone    | Bisphenol            | Dichlorodiphenyl sulfone| Polycondensation     | Bisphenol + dichlorodiphenyl sulfone                   |
| 22 | Polyetherketone     | Bisphenol            | Dichlorobenzophenone    | Polycondensation     | Bisphenol + dichlorobenzophenone                       |
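
To make the table concrete, here is a minimal sketch of how a rule lookup could work in principle. It is not SMiPoly's implementation: the `RULES` dictionary and the `candidate_polymer_class` helper are hypothetical names, and only a handful of the two-monomer rows from the table are encoded.

```python
# Hypothetical, simplified lookup built from a few rows of the table above.
# Keys are unordered pairs of monomer classes; values are polymer classes.
RULES = {
    frozenset(["diol", "diacid"]): "polyester",
    frozenset(["diamine", "diacid"]): "polyamide",
    frozenset(["dianhydride", "diamine"]): "polyimide",
    frozenset(["diol", "diisocyanate"]): "polyurethane",
    frozenset(["diamine", "diisocyanate"]): "polyurethane",
}


def candidate_polymer_class(class_1, class_2):
    """Return the polymer class for a pair of monomer classes, or None."""
    return RULES.get(frozenset([class_1, class_2]))


# The order of the two monomer classes does not matter.
print(candidate_polymer_class("diacid", "diol"))
# Pairs with no encoded rule return None.
print(candidate_polymer_class("diol", "dianhydride"))
```

Using `frozenset` keys keeps the lookup symmetric, which mirrors the fact that the table lists each monomer pair only once.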


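## SMiPoly has two main modules: monc and polg.

#### monc.py
Classifies monomers based on their functional groups.

## SMILES
SMILES stands for Simplified Molecular Input Line Entry System, a notation that encodes a molecular structure as a single line of text (for example, `C=C` is ethylene).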

In [None]:
import pandas as pd
DF = pd.read_csv("https://raw.githubusercontent.com/evanfishfish/py4sci_project/refs/heads/main/202207_smip_monset.csv")
DF

| Classification | Description                                                                                 | Examples / Notes                              |
|----------------|---------------------------------------------------------------------------------------------|-----------------------------------------------|
| **Vinyl**      | Contains C=C double bonds for addition (chain) polymerization                              | Ethylene, Styrene                             |
| **Epo**        | Epoxide groups (three-membered cyclic ethers)                                             | Epichlorohydrin                              |
| **cOle**       | Cyclic olefins with ring-strained double bonds                                            | Norbornene                                   |
| **Lactone**    | Cyclic esters suitable for ring-opening polymerization                                    | ε-Caprolactone                               |
| **Lactam**     | Cyclic amides used in ring-opening polymerization                                         | Caprolactam                                  |
| **hydCOOH**    | Hydroxy carboxylic acids (monomers with both -OH and -COOH groups)                        | 6-Hydroxyhexanoic acid                       |
| **aminCOOH**   | Amino acids (monomers with both -NH₂ and -COOH groups)                                   | Glycine, Alanine                             |
| **hindPhenol** | Sterically hindered phenols, often bisphenol analogs                                     | Bisphenol A                                  |
| **cAnhyd**     | Cyclic carboxylic acid anhydrides                                                        | Phthalic anhydride                           |
| **CO**         | Carbon monoxide (used in coordination polymerization or carbonyl-containing monomers)     | Context-specific                             |
| **HCHO**       | Formaldehyde (used in phenolic/formaldehyde resins)                                      | Phenol-formaldehyde resins                    |
| **sfonediX**   | Bis(p-halogenated aryl) sulfones (X = halogen)                                           | Bis(4-fluorophenyl) sulfone                   |
| **BzodiF**     | Bis(p-fluoroaryl) ketones                                                                | Aromatic ketones with fluorine substituents  |
| **diepo**      | Di- or poly-epoxides                                                                     | Diglycidyl ether of bisphenol A (DGEBA)      |
| **diCOOH**     | Di- or poly-carboxylic acids                                                             | Terephthalic acid                            |
| **diol**       | Di- or poly-hydroxyl compounds (polyols)                                                 | Ethylene glycol, Glycerol                     |
| **diamin**     | Di- or poly-amines                                                                       | Hexamethylenediamine                         |
| **diNCO**      | Di- or poly-isocyanates                                                                  | Toluene diisocyanate (TDI)                    |
| **dicAnhyd**   | Di- or poly-cyclic carboxylic acid anhydrides                                           | Pyromellitic dianhydride                      |
| **pridiamin**  | Primary di- or poly-amines (specific subclass of diamin)                                | Primary diamines                              |
| **diol_b**     | Di- or poly-hydroxyl compounds excluding thiols                                         | Typical polyols without sulfur                |


In [None]:
from smipoly.smip import monc

DF_class = monc.moncls(df=DF, smiColn='SMILES', dsp_rsl=True)

This prints how many monomers in the DataFrame fall into each classification.

In [None]:
DF_class

`DF_class` now has one column per classification, with True/False values indicating whether each monomer belongs to that class.
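
As a rough illustration of what those True/False columns allow, the sketch below counts and filters monomers on a made-up three-row frame. The column names imitate the classification table, but `toy` and its values are assumptions for illustration, not the real output of `monc.moncls`.

```python
import pandas as pd

# Toy stand-in for DF_class: the rows and flags are made up for illustration.
toy = pd.DataFrame(
    {
        "SMILES": ["C=C", "OCCO", "NCCCCCCN"],
        "vinyl": [True, False, False],
        "diol": [False, True, False],
        "diamin": [False, False, True],
    }
)

class_columns = ["vinyl", "diol", "diamin"]

# Summing a boolean column counts its True entries.
counts = toy[class_columns].sum()
print(counts)

# Boolean indexing keeps only the rows flagged True for one class.
diols = toy.loc[toy["diol"], "SMILES"].tolist()
print(diols)
```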

In [None]:
DF_class.to_csv('DF_class.csv', index=False)

!git add DF_class.csv
!git commit -m "class_stuff"
!git push -u origin main --force

This converts the DataFrame to a CSV file and pushes it directly to my GitHub repository.

In [None]:
DF_class_indexed = DF_class.set_index('SMILES')
DF_class_indexed

Most of the later operations look monomers up by their SMILES string, so I set the SMILES column as the index.

In [None]:
# Keep the index in the file: SMILES is now the index, so index=False would drop it
DF_class_indexed.to_csv('DF_class_indexed.csv')


!git add DF_class_indexed.csv
!git commit -m "midpoint_stuff_1"
!git push -u origin main --force

Once again, the DataFrame is written to a CSV file and uploaded.
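
One detail worth checking once SMILES is the index: `to_csv(..., index=False)` leaves the index out of the file. The sketch below demonstrates this on a made-up two-row frame written to in-memory buffers, so the frame and its values are assumptions for illustration only.

```python
import io

import pandas as pd

toy = pd.DataFrame(
    {"vinyl": [True, False]},
    index=pd.Index(["C=C", "OCCO"], name="SMILES"),
)

# index=False discards the SMILES labels entirely.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
print(without_index.getvalue())

# The default (index=True) writes SMILES as the first column,
# so it can be restored with index_col when reading the file back.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
restored = pd.read_csv(with_index, index_col="SMILES")
print(restored.index.tolist())
```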

In [None]:
DF_class_indexed_bool = DF_class_indexed.replace({'True': True, 'False': False})
DF_class_indexed_bool

The next operation was treating the True/False values as strings, so I converted them to real booleans first.
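
A minimal sketch of that conversion on a made-up frame follows. It uses `Series.map` plus `astype(bool)` as one possible alternative to `replace`, because it makes the resulting dtype explicit; the frame `toy` and its values are assumptions for illustration.

```python
import pandas as pd

# Made-up frame whose flags were stored as the strings 'True' / 'False'.
toy = pd.DataFrame(
    {"vinyl": ["True", "False"], "diol": ["False", "True"]},
    index=["C=C", "OCCO"],
)

mapping = {"True": True, "False": False}

# Map each string to its boolean, then cast every column to bool dtype.
toy_bool = toy.apply(lambda col: col.map(mapping)).astype(bool)

print(toy_bool.dtypes)
print(toy_bool.loc["OCCO", "diol"])
```

Note that `astype(bool)` would turn any value missing from `mapping` into `True`, so this sketch assumes the columns contain only the two expected strings.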

In [None]:
# Keep the SMILES index in the file (index=False would drop it)
DF_class_indexed_bool.to_csv('DF_class_indexed_bool.csv')
!git add DF_class_indexed_bool.csv
!git commit -m "bool_stuff"
!git push -u origin main --force

In [None]:
smiles_input = input("Enter the SMILES string: ").strip()

if smiles_input in DF_class_indexed_bool.index:
    row = DF_class_indexed_bool.loc[smiles_input]
    # Compare with == rather than `is`: the cells may hold NumPy booleans,
    # for which `is True` evaluates to False
    true_columns = [col for col in DF_class_indexed_bool.columns if row[col] == True]
    if true_columns:
        print(f"Columns with True value for SMILES '{smiles_input}': {true_columns}")
    else:
        print(f"No columns have True value for SMILES '{smiles_input}'.")
else:
    print(f"SMILES '{smiles_input}' not found in the DataFrame index.")


Now you can enter a SMILES string from the DataFrame, and the cell reports its classification(s).
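
The same lookup can be wrapped in a function so it is reusable without `input()`. This is a sketch under two assumptions: the index labels are unique and the classification columns are boolean. The helper name `classes_for` and the toy frame are hypothetical.

```python
import pandas as pd


def classes_for(df, smiles):
    """Return the column names that are True for one SMILES index label.

    Returns None when the SMILES string is not in the index.
    """
    if smiles not in df.index:
        return None
    row = df.loc[smiles]
    return [col for col in df.columns if bool(row[col])]


# Made-up boolean frame indexed by SMILES, for illustration only.
toy = pd.DataFrame(
    {"vinyl": [True, False], "diol": [False, True]},
    index=pd.Index(["C=C", "OCCO"], name="SMILES"),
)

print(classes_for(toy, "OCCO"))
print(classes_for(toy, "CCC"))
```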

#### polg.py
Generates polymer repeating units by applying functional group transformations.

In [None]:
from smipoly.smip import monc, polg
import pandas as pd
import numpy as np
import time

In [None]:
from smipoly.smip import monc, polg
import pandas as pd

DF = pd.read_csv("https://raw.githubusercontent.com/evanfishfish/py4sci_project/refs/heads/main/202207_smip_monset.csv")

def polymerize_two_monomers():
    monomer1 = input("Enter first monomer SMILES: ").strip()
    monomer2 = input("Enter second monomer SMILES: ").strip()
    
    df = pd.DataFrame({
        'SMILES': [monomer1, monomer2],
        'Name': ['Monomer1', 'Monomer2']
    })
    
    classified_df = monc.moncls(df, smiColn='SMILES', dsp_rsl=False)
    
    polymers = polg.biplym(classified_df, targ=['all'], Pmode='a', dsp_rsl=False)
    
    if not polymers.empty:
        print("\nPossible polymers:")
        for idx, row in polymers.iterrows():
            print(f"Polymer {idx+1}: {row['polym']} (Type: {row['polymer_class']})")
    else:
        print("No compatible polymers found.")

if __name__ == "__main__":
    polymerize_two_monomers()

First, you can enter two monomers, and the cell reports the polymers their reaction could produce along with the class of each generated polymer.

In [None]:
from smipoly.smip import monc, polg
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Draw
from IPython.display import display  # For Jupyter Notebook rendering

def polymerize_and_display():
    # Get user input
    m1 = input("Enter first monomer SMILES: ").strip()
    m2 = input("Enter second monomer SMILES: ").strip()
    
    # Create temporary monomer DataFrame
    monomer_df = pd.DataFrame({
        'SMILES': [m1, m2],
        'Name': ['Monomer1', 'Monomer2']
    })
    
    # Classify monomers and generate polymers
    classified = monc.moncls(monomer_df, smiColn='SMILES', dsp_rsl=False)
    polymers = polg.biplym(classified, targ=['all'], Pmode='a', dsp_rsl=False)
    
    if not polymers.empty:
        print(f"\nGenerated {len(polymers)} polymers:")
        for idx, row in polymers.iterrows():
            # Extract polymer details
            smi = row['polym']          # SMILES string
            ptype = row['polymer_class']  # Polymer type
            mol = Chem.MolFromSmiles(smi)
            
            # Print text info
            print(f"\nPolymer {idx+1}:")
            print(f"SMILES: {smi}")
            print(f"Type: {ptype}")
            
            # Display structure if valid
            if mol:
                display(mol)  # Renders molecule in Jupyter Notebook
                # For non-Jupyter environments, use:
                # Draw.MolToImage(mol, size=(300,300)).show()
            else:
                print("Invalid SMILES - cannot display structure")
            print("─" * 40)
    else:
        print("No polymers generated")

if __name__ == "__main__":
    polymerize_and_display()


This version also draws the structure of each generated polymer.

## SKIP NEXT CELL
### pls

In [None]:
tpstart = time.perf_counter()
DF_run2 = polg.biplym(df=DF_class_indexed_bool, targ=['polyolefin'], Pmode='a', dsp_rsl=True)
tpend = time.perf_counter()
tm = tpend - tpstart
# perf_counter measures wall-clock time, not CPU time
print("Elapsed time (s): ", tm)
print(f"DF_run2 memory usage: {DF_run2.memory_usage(deep=True).sum() / 1024 ** 2} MB")

In [None]:
tpstart = time.perf_counter()
DF_run1 = polg.biplym(df=DF_class_indexed_bool, targ=['polyether'], Pmode='a', dsp_rsl=True)
tpend = time.perf_counter()
tm = tpend - tpstart
# perf_counter measures wall-clock time, not CPU time
print("Elapsed time (s): ", tm)
print(f"DF_run1 memory usage: {DF_run1.memory_usage(deep=True).sum() / 1024 ** 2} MB")

With `targ`, you can name a polymer class, and `biplym` generates as many polymers of that class as it can from the available monomers.
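
A side note on the timing cells: `time.perf_counter()` reports elapsed wall-clock time, while `time.process_time()` reports CPU time used by the process. The sketch below contrasts the two on a stand-in workload; `busy_work` is a hypothetical placeholder for the `polg.biplym` call.

```python
import time


def busy_work(n):
    """Stand-in workload (hypothetical): sum of squares below n."""
    return sum(i * i for i in range(n))


wall_start = time.perf_counter()
cpu_start = time.process_time()

busy_work(200_000)
time.sleep(0.2)  # sleeping adds wall-clock time but almost no CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"Wall-clock time: {wall_elapsed:.3f} s")
print(f"CPU time:        {cpu_elapsed:.3f} s")
```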

In [None]:
DF_run1

In [None]:
DF_run1.to_csv('DF_run1.csv', columns=['mon1', 'mon2', 'polym', 'polymer_class'], index=False)

Next, I draw random structures from this new dataset by parsing their SMILES strings.

In [None]:
import random
from rdkit import Chem
from rdkit.Chem import Draw

# Pick 50 random row positions (repeats possible); randint includes both endpoints
polyeth_1 = [random.randint(0, len(DF_run1) - 1) for _ in range(50)]
# Column 2 ('polym') holds the polymer SMILES
random_fellas = [Chem.MolFromSmiles(DF_run1.iloc[i, 2]) for i in polyeth_1]
Draw.MolsToGridImage(random_fellas, molsPerRow=5, subImgSize=(200, 200))

To label each drawn structure with its SMILES, I converted the `polym` (SMILES) column from the generic `object` dtype to the pandas `string` dtype.
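
A small sketch of that dtype conversion, on a made-up Series standing in for the `polym` column (the two SMILES-like values are placeholders, not real SMiPoly output):

```python
import pandas as pd

# Placeholder values standing in for the 'polym' column.
polym = pd.Series(["*CCO*", "*CC(C)O*"], name="polym")

# Cast to the dedicated pandas string dtype.
converted = polym.astype("string")
print(converted.dtype)

# The values come back as plain Python strings, usable as legends.
legends = converted.tolist()
print(legends)
```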

In [None]:
import pandas as pd

print(DF_run1.dtypes)


In [None]:
import pandas as pd

DF_run1['polym'] = DF_run1['polym'].astype("string")

In [None]:
DF_run1

In [None]:
print(DF_run1.dtypes)

In [None]:
import random
from rdkit import Chem
from rdkit.Chem import Draw

# Pick 50 random row positions (repeats possible); randint includes both endpoints
polyeth_1 = [random.randint(0, len(DF_run1) - 1) for _ in range(50)]

# Get molecules and SMILES strings from the same rows
random_fellas = [Chem.MolFromSmiles(DF_run1.iloc[i, 2]) for i in polyeth_1]
smiles_legends = [DF_run1.iloc[i, 2] for i in polyeth_1]

# Draw with SMILES as legends
Draw.MolsToGridImage(random_fellas, molsPerRow=5, subImgSize=(200, 200), legends=smiles_legends)

`random.randint` can return the same index more than once, so I switched to `random.sample`, which draws without replacement.
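
The difference is easy to see on a small range. This sketch uses only the standard library; the seed and `num_rows` value are arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
num_rows = 10

# randint draws each value independently, so repeats are possible.
# Both endpoints are included, hence num_rows - 1 as the upper bound.
with_repeats = [random.randint(0, num_rows - 1) for _ in range(num_rows)]

# sample draws without replacement, so every index appears at most once.
unique = random.sample(range(num_rows), num_rows)

print("randint:", with_repeats)
print("sample: ", unique)
```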

In [None]:
import random
from rdkit import Chem
from rdkit.Chem import Draw

# Ensure you don't sample more than the number of rows in the DataFrame
num_samples = 50
num_rows = len(DF_run1)
if num_samples > num_rows:
    raise ValueError("num_samples cannot be greater than the number of rows in the DataFrame.")

# Get 50 unique random indices
polyeth_1 = random.sample(range(num_rows), num_samples)

# Get molecules and SMILES strings from the same rows
random_fellas = [Chem.MolFromSmiles(DF_run1.iloc[i, 2]) for i in polyeth_1]
smiles_legends = [DF_run1.iloc[i, 2] for i in polyeth_1]

# Draw with SMILES as legends
Draw.MolsToGridImage(random_fellas, molsPerRow=5, subImgSize=(200,200), legends=smiles_legends)


In [None]:
import random
from rdkit import Chem
from rdkit.Chem import Draw

# Prompt user for the number of samples and convert to int
num_samples = int(input('Please enter the number of samples to a max of 50: '))
num_rows = len(DF_run1)
if num_samples > num_rows:
    raise ValueError("num_samples cannot be greater than the number of rows in the DataFrame.")

# Get unique random indices
polyeth_1 = random.sample(range(num_rows), num_samples)

# Get molecules and SMILES strings from the same rows
random_fellas = [Chem.MolFromSmiles(DF_run1.iloc[i, 2]) for i in polyeth_1]
smiles_legends = [DF_run1.iloc[i, 2] for i in polyeth_1]

# Draw with SMILES as legends
Draw.MolsToGridImage(random_fellas, molsPerRow=5, subImgSize=(300,300), legends=smiles_legends)


In [None]:
import random
from rdkit import Chem
from rdkit.Chem import Draw

# Prompt user for the number of samples and convert to int
num_samples = int(input('Please enter the number of samples to a max of 50: '))
num_rows = len(DF_run1)
if num_samples > num_rows:
    raise ValueError("num_samples cannot be greater than the number of rows in the DataFrame.")

# Get unique random indices
polyeth_1 = random.sample(range(num_rows), num_samples)

# Get molecules and SMILES strings from the same rows
random_fellas = [Chem.MolFromSmiles(DF_run1.iloc[i, 2]) for i in polyeth_1]
smiles_legends = [DF_run1.iloc[i, 2] for i in polyeth_1]

# Prompt user for the filename to save the image
filename = input("Enter the desired filename for the image (e.g., mygrid.png): ")

# Draw with SMILES as legends and save the image
img = Draw.MolsToGridImage(
    random_fellas,
    molsPerRow=5,
    subImgSize=(300, 300),
    legends=smiles_legends,
    useSVG=False,    # Ensure PIL Image is returned
    returnPNG=False
)
img.save(filename)  # Save with user-provided filename

import os

os.system(f'git add "{filename}"')
os.system(f'git commit -m "Add generated image: {filename}"')
os.system('git push')
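
Because the filename comes from user input, passing it through a shell string can break on quotes or spaces. One possible alternative, sketched here, is `subprocess.run` with an argument list, which bypasses the shell. The helper names `build_git_commands` and `run_git_commands` are hypothetical, and the example only prints the commands rather than running them.

```python
import subprocess


def build_git_commands(filename):
    """Build the add/commit/push commands as argument lists (no shell quoting)."""
    return [
        ["git", "add", filename],
        ["git", "commit", "-m", f"Add generated image: {filename}"],
        ["git", "push"],
    ]


def run_git_commands(commands):
    """Run each command in order; check=True raises if one of them fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Print the commands instead of running them, so the sketch has no side effects.
for cmd in build_git_commands("my grid.png"):
    print(cmd)
```

Calling `run_git_commands(build_git_commands(filename))` inside the repository would then stop at the first failing step instead of silently continuing.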

Now for some copolymer reactions.

In [None]:
# Draw an example of a generated polymerization reaction
# randint includes both endpoints, so the upper bound is len(DF_run1) - 1
pickupNo = random.randint(0, len(DF_run1) - 1)
print(pickupNo)
m1 = Chem.MolFromSmiles(DF_run1.iloc[pickupNo, 0])
m2 = Chem.MolFromSmiles(DF_run1.iloc[pickupNo, 1])
p = Chem.MolFromSmiles(DF_run1.iloc[pickupNo, 2])
if DF_run1.iloc[pickupNo, 1] != '':
    L = [m1, m2, p]
else:
    # No second monomer: use an empty molecule as a placeholder
    m2 = Chem.MolFromSmiles('')
    L = [m1, m2, p]
Draw.MolsToGridImage(L, molsPerRow=3, subImgSize=(300, 300))

In [None]:
import random
from rdkit import Chem
from rdkit.Chem import Draw

# Pick a random row
pickupNo = random.randint(0, len(DF_run1) - 1)
print(pickupNo)

# Get molecules from SMILES
m1 = Chem.MolFromSmiles(DF_run1.iloc[pickupNo, 0])
m2 = Chem.MolFromSmiles(DF_run1.iloc[pickupNo, 1])
p = Chem.MolFromSmiles(DF_run1.iloc[pickupNo, 2])

# Prepare molecule list and corresponding SMILES legends
if DF_run1.iloc[pickupNo, 1] != '':
    L = [m1, m2, p]
    legends = [
        Chem.MolToSmiles(m1) if m1 else '',
        Chem.MolToSmiles(m2) if m2 else '',
        Chem.MolToSmiles(p) if p else ''
    ]
else:
    m2 = None  # More idiomatic than Chem.MolFromSmiles('')
    L = [m1, m2, p]
    legends = [
        Chem.MolToSmiles(m1) if m1 else '',
        '',
        Chem.MolToSmiles(p) if p else ''
    ]

# Draw with SMILES as legends
Draw.MolsToGridImage(L, molsPerRow=3, subImgSize=(300, 300), legends=legends)


In [None]:
import random
from rdkit import Chem
from rdkit.Chem import Draw

# Prompt user for the number of samples
num_samples = int(input("Enter the number of reactions to sample: "))
num_rows = len(DF_run1)
if num_samples > num_rows:
    raise ValueError("Number of samples cannot exceed number of rows in the DataFrame.")

# Randomly select unique indices
sample_indices = random.sample(range(num_rows), num_samples)

# Prepare all molecules and legends for the grid
all_mols = []
all_legends = []

for idx in sample_indices:
    m1 = Chem.MolFromSmiles(DF_run1.iloc[idx, 0])
    m2 = Chem.MolFromSmiles(DF_run1.iloc[idx, 1])
    p = Chem.MolFromSmiles(DF_run1.iloc[idx, 2])
    
    # Build the molecule list for this reaction
    if DF_run1.iloc[idx, 1] != '':
        L = [m1, m2, p]
        legends = [
            Chem.MolToSmiles(m1) if m1 else '',
            Chem.MolToSmiles(m2) if m2 else '',
            Chem.MolToSmiles(p) if p else ''
        ]
    else:
        m2 = None
        L = [m1, m2, p]
        legends = [
            Chem.MolToSmiles(m1) if m1 else '',
            '',
            Chem.MolToSmiles(p) if p else ''
        ]
    all_mols.extend(L)
    all_legends.extend(legends)

# Draw all reactions in a grid (3 columns per reaction)
Draw.MolsToGridImage(
    all_mols,
    molsPerRow=3,
    subImgSize=(300, 300),
    legends=all_legends
)


In [None]:
import random
from rdkit import Chem
from rdkit.Chem import Draw

# Prompt user for the number of samples
num_samples = int(input("Enter the number of reactions to sample: "))
num_rows = len(DF_run1)
if num_samples > num_rows:
    raise ValueError("Number of samples cannot exceed number of rows in the DataFrame.")

# Randomly select unique indices
sample_indices = random.sample(range(num_rows), num_samples)

# Prepare all molecules and legends for the grid
all_mols = []
all_legends = []

for idx in sample_indices:
    m1 = Chem.MolFromSmiles(DF_run1.iloc[idx, 0])
    m2 = Chem.MolFromSmiles(DF_run1.iloc[idx, 1])
    p = Chem.MolFromSmiles(DF_run1.iloc[idx, 2])
    
    # Build the molecule list for this reaction
    if DF_run1.iloc[idx, 1] != '':
        L = [m1, m2, p]
        legends = [
            Chem.MolToSmiles(m1) if m1 else '',
            Chem.MolToSmiles(m2) if m2 else '',
            Chem.MolToSmiles(p) if p else ''
        ]
    else:
        m2 = None
        L = [m1, m2, p]
        legends = [
            Chem.MolToSmiles(m1) if m1 else '',
            '',
            Chem.MolToSmiles(p) if p else ''
        ]
    all_mols.extend(L)
    all_legends.extend(legends)

# Prompt for filename and ensure valid extension
filename = input("Enter the desired filename for the image (e.g., reactions.png): ")
if not filename.lower().endswith(('.png', '.jpg', '.jpeg')):
    filename += '.png'  # Default to PNG if no valid extension is provided

# Draw all reactions in a grid (3 columns per reaction) and save the image
img = Draw.MolsToGridImage(
    all_mols,
    molsPerRow=3,
    subImgSize=(300, 300),
    legends=all_legends,
    useSVG=False,    # Ensure a PIL Image is returned
    returnPNG=False
)
img.save(filename)
print(f"Image saved as {filename}")

import os

os.system(f'git add "{filename}"')
os.system(f'git commit -m "Add generated image: {filename}"')
os.system('git push')
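
The extension check in the cell above can also be written as a small reusable function. This is a sketch with hypothetical names (`ALLOWED_SUFFIXES`, `ensure_image_suffix`); like the cell, it appends `.png` when the name does not already end in a supported image extension.

```python
from pathlib import Path

# Image extensions accepted as-is (same set as in the cell above).
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg"}


def ensure_image_suffix(filename, default=".png"):
    """Return filename unchanged if it has an allowed suffix, else append default."""
    if Path(filename).suffix.lower() in ALLOWED_SUFFIXES:
        return filename
    return filename + default


print(ensure_image_suffix("reactions"))
print(ensure_image_suffix("grid.JPG"))
```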
