<a href="https://colab.research.google.com/github/shahd1995913/OCR-for-Chemistry/blob/main/fragmentation_for_chemicals_in_images.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

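* This code loads a pre-trained object detection model and uses it to detect objects within an input image.
* If a detected object is labeled as a chemical, its region is cropped and preprocessed (grayscale conversion and binarization), and its chemical structure is then identified. RDKit does not provide a `Chem.MolFromImage` function and cannot recognize structures from pixels, so this step needs a separate optical chemical structure recognition (OCSR) tool; RDKit is used to validate and canonicalize the result.
* The resulting SMILES string can then be used for further analysis.
* The code can be customized depending on the specific requirements of the task at hand.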
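
The region extraction step described above can be sketched in isolation. This is a minimal illustration, not the notebook's actual detector output handling: the `(label, x1, y1, x2, y2)` tuple layout and the helper name `crop_regions` are assumptions made for the example, and the image and boxes are synthetic.

```python
import numpy as np


def crop_regions(image, detections, target_label="chemical"):
    """Return crops of `image` for detections whose label matches.

    `detections` is assumed to be an iterable of (label, x1, y1, x2, y2)
    tuples in pixel coordinates. This layout is an assumption, not a
    property of any particular detection model.
    """
    height, width = image.shape[:2]
    crops = []
    for label, x1, y1, x2, y2 in detections:
        if label != target_label:
            continue
        # Round to integers and make sure each pair is ordered low to high
        x1, x2 = sorted((int(round(x1)), int(round(x2))))
        y1, y2 = sorted((int(round(y1)), int(round(y2))))
        # Clamp the box to the image bounds
        x1, x2 = max(0, x1), min(width, x2)
        y1, y2 = max(0, y1), min(height, y2)
        if x2 > x1 and y2 > y1:  # skip boxes that are empty after clamping
            crops.append(image[y1:y2, x1:x2])
    return crops


# Synthetic 100x200 RGB image and hand-written detections for illustration
image = np.zeros((100, 200, 3), dtype=np.uint8)
detections = [
    ("chemical", 10.4, 20.6, 60.2, 80.0),  # kept
    ("text", 0, 0, 50, 50),                # wrong label, skipped
    ("chemical", 150, 50, 260, 140),       # partly outside, clamped
]
crops = crop_regions(image, detections)
print([c.shape for c in crops])
```

Clamping and integer conversion matter because NumPy slicing rejects float indices and silently returns truncated or empty arrays for boxes that extend past the image.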

In [None]:
! pip install rdkit

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting rdkit
  Downloading rdkit-2022.9.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.4 MB)
Installing collected packages: rdkit
Successfully installed rdkit-2022.9.5


In [None]:
import cv2
import numpy as np
from rdkit import Chem
from tensorflow.keras.models import load_model

# Load pre-trained object detection model
model = load_model('path/to/model.h5')

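# Load image and preprocess
img = cv2.imread('path/to/image.jpg')
if img is None:
    raise FileNotFoundError('Could not read path/to/image.jpg')
img = cv2.resize(img, (640, 640))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Perform object detection; Keras models expect a leading batch dimension
raw_output = model.predict(np.expand_dims(img, axis=0))


def decode_detections(raw_output):
    """Convert raw model output into (label, x1, y1, x2, y2) tuples.

    Placeholder: the output layout depends on the detection model, so this
    must be implemented for the model loaded above.
    """
    raise NotImplementedError('Decode the model output for your detector')


def image_to_smiles(region):
    """Return a SMILES string for a binarized structure image.

    Placeholder: RDKit has no Chem.MolFromImage and cannot read structures
    from pixels, so plug in a structure recognition (OCSR) tool here.
    """
    raise NotImplementedError('Plug in a structure recognition tool')


detections = decode_detections(raw_output)

# Extract chemical regions from detections
chemical_regions = []
for label, x1, y1, x2, y2 in detections:
    if label == 'chemical':
        x1, y1, x2, y2 = (int(v) for v in (x1, y1, x2, y2))
        chemical_regions.append(img[y1:y2, x1:x2])

# Identify chemicals in each region
for region in chemical_regions:
    # Preprocess region: grayscale, then Otsu binarization
    # (a fixed threshold of 0 would turn every nonzero pixel white)
    region = cv2.cvtColor(region, cv2.COLOR_RGB2GRAY)
    region = cv2.threshold(region, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

    # Identify chemical structure and validate it with RDKit
    mol = Chem.MolFromSmiles(image_to_smiles(region))
    if mol is None:
        continue  # skip regions that do not yield a valid molecule

    # Do something with the identified chemical
    print(Chem.MolToSmiles(mol))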
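
The binarization step in the cell above depends heavily on the threshold value. With `cv2.THRESH_BINARY` alone and a threshold of 0, every nonzero pixel becomes white; Otsu's method, available in OpenCV through the `cv2.THRESH_OTSU` flag, instead picks the threshold that maximizes the between-class variance of the histogram. The following is a minimal NumPy sketch of that idea, written to show what the step computes rather than to replace OpenCV. The helper names `otsu_threshold` and `binarize` and the synthetic image are assumptions made for the example.

```python
import numpy as np


def otsu_threshold(gray):
    """Return an Otsu threshold for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    weight_bg = np.cumsum(prob)          # fraction of pixels with value <= t
    weight_fg = 1.0 - weight_bg          # fraction of pixels with value > t
    cum_mean = np.cumsum(prob * levels)  # cumulative intensity mean up to t
    total_mean = cum_mean[-1]
    # Between-class variance for every candidate threshold t
    denom = weight_bg * weight_fg
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * weight_bg - cum_mean) ** 2 / denom
    between[denom == 0] = 0.0            # thresholds that leave one class empty
    return int(np.argmax(between))


def binarize(gray):
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)


# Synthetic image: a dark stroke (value 30) on a light background (value 220)
gray = np.full((20, 20), 220, dtype=np.uint8)
gray[5:15, 9:11] = 30
binary = binarize(gray)
print(otsu_threshold(gray), np.unique(binary))
```

Scanned structure drawings are usually dark strokes on a light background, so a histogram-based threshold tends to separate the two more reliably than a fixed value, although unevenly lit images may need adaptive thresholding instead.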