# From SMILES to Images: Chemical Structure Conversion with RDKit in Jupyter


# Install conda in Colab
Wait a moment for the Colab kernel to restart before running the next cell.

In [1]:
!pip install -q condacolab
import condacolab
condacolab.install()

⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:12
🔁 Restarting kernel...


# Install RDKit


In [1]:
# Install RDKit from the conda-forge channel; the command output is suppressed
!conda install -c conda-forge rdkit -y &> /dev/null
import pandas as pd

# Prepare the input file from a raw file
Read the SMILES data into a pandas DataFrame. Change the input name `1.csv` to the name of your own input file.


In [None]:
# Change the input name; the file can be in *.csv or *.smi format
df = pd.read_csv('1.csv')
df
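With its default settings, `pd.read_csv` expects comma-separated values and a header row, so a `.smi` file may need different options. The sketch below assumes a whitespace-separated layout with a SMILES string followed by a name on each line and no title row. The sample data and the column names `smiles` and `name` are made up for illustration.

```python
import io
import pandas as pd

# Made-up .smi content: SMILES, whitespace, then a name (an assumed layout)
smi_text = "CCO ethanol\nc1ccccc1 benzene\nCC(=O)O acetic_acid\n"

# sep=r"\s+" splits on any run of whitespace; header=None tells pandas that
# the first line is data; the column names are chosen here for illustration
df_smi = pd.read_csv(
    io.StringIO(smi_text),
    sep=r"\s+",
    header=None,
    names=["smiles", "name"],
)
print(df_smi)
```

For a file on disk, the `io.StringIO(...)` argument would be replaced by the file path.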

Extract the SMILES column and write it to a new file

```
$2 selects the second column (awk numbers fields from 1). Adjust it to match the SMILES column in your file. For example, if your SMILES strings are in the fourth column, use $4.
```


In [5]:
!awk -F "\"*,\"*" '{print $2}' 1.csv > smile.smi
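If `awk` is not available, or the CSV contains quoted fields with embedded commas, the same extraction could be done with Python's standard `csv` module. This is a minimal sketch, not a replacement for the command above: `extract_column` is a hypothetical helper, its `column` argument is 0-based (so `column=1` corresponds to awk's `$2`), and the demo data is made up.

```python
import csv
import os
import tempfile

def extract_column(csv_path, out_path, column=1, has_header=True):
    """Write one CSV column to out_path, one value per line.

    Hypothetical helper: `column` is 0-based, so column=1 matches awk's $2.
    Returns the number of lines written.
    """
    written = 0
    with open(csv_path, newline="") as src, open(out_path, "w") as dst:
        reader = csv.reader(src)
        if has_header:
            next(reader, None)  # skip the title row
        for row in reader:
            value = row[column].strip() if len(row) > column else ""
            if value:  # ignore rows with a missing or empty field
                dst.write(value + "\n")
                written += 1
    return written

# Demo on a throwaway file with made-up data
workdir = tempfile.mkdtemp()
demo_csv = os.path.join(workdir, "demo.csv")
demo_smi = os.path.join(workdir, "demo.smi")
with open(demo_csv, "w", newline="") as f:
    f.write('id,smiles,name\n1,CCO,ethanol\n2,"c1ccccc1",benzene\n3,,missing\n')

n_written = extract_column(demo_csv, demo_smi)
with open(demo_smi) as f:
    extracted = f.read().splitlines()
print(n_written, extracted)
```

For the file used in this notebook, the call would be `extract_column('1.csv', 'smile.smi')`. Unlike the awk command, this version drops the header row, so the output holds only SMILES strings.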

If an error occurs in the next step, check the format of the `smile.smi` file. It should contain one SMILES string per line, with no title row or other extra content. Input files differ from user to user, so stray content is a common source of hard-to-predict errors.

The extraction step is not always necessary. If your file already contains only SMILES strings, you can use it as it is.

# Display the structures

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import rdDepictor
from IPython.display import display

# Split a list into consecutive chunks of at most n items
def chunks(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# Read SMILES from file, one per line, ignoring blank lines
smiles_file = 'smile.smi'  # Replace with the path to your SMILES file

with open(smiles_file, 'r') as file:
    smiles_list = [line.strip() for line in file if line.strip()]

# Convert SMILES to RDKit molecule objects.
# MolFromSmiles returns None for a string it cannot parse; those entries are
# skipped, and each legend keeps the entry's original number in the file.
molecules, legends = [], []
for entry_no, smiles in enumerate(smiles_list, start=1):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        print(f'Skipping entry {entry_no}: could not parse {smiles!r}')
        continue
    rdDepictor.Compute2DCoords(mol)  # 2D coordinates (important for drawing)
    molecules.append(mol)
    legends.append(str(entry_no))

# Draw and display one grid image per chunk (adjust the chunk size as needed).
# MolsToGridImage arranges the molecules in rows of molsPerRow.
for chunk, legend_chunk in zip(chunks(molecules, 50), chunks(legends, 50)):
    img = Draw.MolsToGridImage(chunk, molsPerRow=4, subImgSize=(200, 200), legends=legend_chunk)
    display(img)
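
The chunking step keeps each molecule paired with its legend because both lists are split at the same positions. The sketch below illustrates this with plain integers standing in for molecules. The list length of 7 and the chunk size of 3 are arbitrary values chosen for the example.

```python
def chunks(lst, n):
    """Yield consecutive slices of lst with at most n items each."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

items = list(range(1, 8))           # placeholders for 7 molecules
legends = [str(i) for i in items]   # matching entry numbers

# Splitting both lists with the same chunk size keeps them aligned
pairs = list(zip(chunks(items, 3), chunks(legends, 3)))
for chunk, legend_chunk in pairs:
    print(chunk, legend_chunk)
# [1, 2, 3] ['1', '2', '3']
# [4, 5, 6] ['4', '5', '6']
# [7] ['7']
```

The last chunk is shorter than the others when the list length is not a multiple of the chunk size, so the final image may contain fewer structures.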
