# A workflow to display up to 10,000 SMILES chemical strings as images

This script reads a list of SMILES (Simplified Molecular Input Line Entry System) strings from a file, converts them into chemical structures using RDKit, and displays the structures as images directly within a Jupyter notebook. Each structure is labeled with its entry number.

## Install conda in Colab

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

# Please allow a short time for the kernel to restart before you move on to the next cell

In [None]:
!conda install -q -y -c rdkit rdkit &> /dev/null
!conda install -q -y -c openbabel openbabel &> /dev/null

# Read the SMILES file

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('1.csv')
df

# Extract the SMILES and display them with RDKit

```
$2 selects the 2nd comma-separated column (awk numbers fields from 1), which is the SMILES column here. Adjust this to match your file. For example, if your SMILES are in the 4th column, use $4.
```


In [None]:
!awk -F "\"*,\"*" '{print $2}' 1.csv > smile.smi

## The smile.smi file was downloaded to a local laptop, and the first line was deleted because it is the column header, not a SMILES string. The edited smile.smi was then saved and uploaded again to overwrite the old one.

This is not always necessary: if the first row is already a SMILES string, you don't need to do anything.
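The extraction and header-removal steps above can also be done in one pass without leaving the notebook. The following is a minimal sketch using only Python's standard `csv` module; the function name `extract_smiles_column` and its parameters are hypothetical, and it assumes the SMILES sit in the second column of a comma-separated file with a single header row.

```python
import csv

def extract_smiles_column(csv_path, out_path, column=1, has_header=True):
    """Write one CSV column to out_path, one value per line.

    `column` is a 0-based index (1 is the second column, i.e. awk's $2).
    The name and defaults are illustrative assumptions, not part of the workflow.
    Returns the number of lines written.
    """
    written = 0
    with open(csv_path, newline='') as src, open(out_path, 'w') as dst:
        reader = csv.reader(src)
        if has_header:
            next(reader, None)  # skip the header row instead of deleting it by hand
        for row in reader:
            # Skip rows that are too short or whose cell is empty
            if len(row) > column and row[column].strip():
                dst.write(row[column].strip() + '\n')
                written += 1
    return written
```

Calling `extract_smiles_column('1.csv', 'smile.smi')` would then produce a header-free `smile.smi`. Unlike the awk field separator, `csv.reader` also copes with quoted cells that contain commas.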

### Display the structures

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import rdDepictor
from IPython.display import display

# Function to split the list into chunks
def chunks(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# Read SMILES from file
smiles_file = 'smile.smi'  # Replace with the path to your SMILES file

with open(smiles_file, 'r') as file:
    smiles_list = [line.strip() for line in file]

# Convert SMILES to RDKit molecule objects. Each legend is the entry's
# 1-based line number in the file, so numbering stays correct when blank
# lines or unparsable SMILES (MolFromSmiles returns None) are skipped.
molecules = []
legends = []
for entry, smiles in enumerate(smiles_list, start=1):
    if not smiles:
        continue
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        print(f'Entry {entry}: could not parse SMILES {smiles!r}')
        continue
    # Generate 2D coordinates (important for drawing)
    rdDepictor.Compute2DCoords(mol)
    molecules.append(mol)
    legends.append(str(entry))

# Split the molecules into chunks
molecule_chunks = list(chunks(molecules, 50))  # Adjust the chunk size as needed
legend_chunks = list(chunks(legends, 50))

# Draw and display one grid image per chunk (4 structures per row)
for chunk, legend_chunk in zip(molecule_chunks, legend_chunks):
    img = Draw.MolsToGridImage(chunk, molsPerRow=4, subImgSize=(200, 200), legends=legend_chunk)
    display(img)
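When adjusting the chunk size, it can help to estimate how many images the loop will produce and how large each one will be. The sketch below is plain arithmetic, not an RDKit call: `grid_plan` is a hypothetical helper, and it assumes a grid layout in which every image is `mols_per_row` cells wide and each cell has the size given by `sub_img_size`.

```python
import math

def grid_plan(n_molecules, chunk_size=50, mols_per_row=4, sub_img_size=(200, 200)):
    """Return (number of images, (width, height) in pixels of a full image).

    Hypothetical helper; assumes each image is a grid of fixed-size cells.
    """
    n_images = math.ceil(n_molecules / chunk_size)
    # A full image holds chunk_size molecules (fewer if the whole set is smaller)
    rows = math.ceil(min(chunk_size, n_molecules) / mols_per_row)
    width = mols_per_row * sub_img_size[0]
    height = rows * sub_img_size[1]
    return n_images, (width, height)

print(grid_plan(10_000))  # (200, (800, 2600))
```

With the defaults used in this notebook, 10,000 structures give 200 images of 13 rows each. A larger chunk size means fewer but taller images.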
