# IRS Project Report
## I. Introduction

This report briefly presents IRS (Infra-Red Simulator), a pip-installable package. The main results of its functions, along with the project's challenges and limitations, are demonstrated below.

##  II. Relevant Chemistry
Infrared (IR) spectroscopy is an analytical technique used to identify and study chemical compounds based on how they absorb infrared radiation. When molecules are exposed to IR light, specific vibrational modes of their chemical bonds are excited, depending on the frequency of the radiation. For a vibrational transition to be IR-active, it must involve a change in the dipole moment of the molecule.

###  Vibrational Theory

Under the harmonic approximation, atoms in a molecule are modeled as point masses connected by springs, representing chemical bonds. These atoms undergo vibrational motions (stretching, bending, etc.) around their equilibrium positions. The vibrational frequencies $\nu$ are obtained from the atomic masses and the second derivatives of the molecular energy with respect to nuclear displacements; these second derivatives form the Hessian matrix.
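The link between the Hessian and vibrational frequencies can be illustrated with a toy model: two atoms of masses $m_1$ and $m_2$ on a line, joined by a spring of force constant $k$, with energy $E = \tfrac{1}{2}k(x_2 - x_1 - r_0)^2$. The sketch below builds its 2×2 Hessian, mass-weights it, and diagonalizes it with NumPy. The function name, the one-dimensional model, and the HCl force constant of 516 N/m are illustrative assumptions, not part of the IRS package.

```python
import numpy as np

AMU_TO_KG = 1.66053906660e-27   # kg per atomic mass unit
C_CM_PER_S = 2.99792458e10      # speed of light in cm/s


def diatomic_hessian_wavenumbers(k, m1_amu, m2_amu):
    """Wavenumbers (cm^-1) from the mass-weighted Hessian of a 1D two-atom spring.

    Toy model (hypothetical helper): E = 0.5 * k * (x2 - x1 - r0)**2, whose
    second derivatives give the Hessian [[k, -k], [-k, k]] with k in N/m.
    """
    masses = np.array([m1_amu, m2_amu]) * AMU_TO_KG
    hessian = np.array([[k, -k], [-k, k]], dtype=float)

    # Mass-weight the Hessian: F_ij = H_ij / sqrt(m_i * m_j)
    inv_sqrt_m = 1.0 / np.sqrt(masses)
    mw_hessian = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)

    # Eigenvalues are squared angular frequencies (s^-2), in ascending order.
    # Round-off can make the translational eigenvalue slightly negative, so clip it.
    eigvals = np.clip(np.linalg.eigvalsh(mw_hessian), 0.0, None)

    # omega = sqrt(lambda); wavenumber = omega / (2 * pi * c)
    return np.sqrt(eigvals) / (2 * np.pi * C_CM_PER_S)


print(diatomic_hessian_wavenumbers(516.0, 1.00783, 34.96885))  # H-35Cl
```

The first value is the zero-frequency translation of the pair; the second is the bond vibration. Because the mass-weighted Hessian has trace $k/m_1 + k/m_2$ and determinant zero, its nonzero eigenvalue is $k(1/m_1 + 1/m_2) = k/\mu$, the same ratio that appears in the diatomic harmonic-oscillator formula.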

In a simple diatomic molecule, the vibrational frequency can be approximated using the harmonic oscillator model:
$$
\tilde{\nu} = \frac{1}{2\pi c} \sqrt{\frac{k}{\mu}}
$$

Where:

- $\tilde{\nu}$ is the vibrational wavenumber (in cm⁻¹), the unit used on IR spectra; the corresponding frequency in Hz is $c\,\tilde{\nu}$,
- $c$ is the speed of light (in cm/s),
- $k$ is the force constant of the bond (in N/m),
- $\mu$ is the reduced mass of the two atoms (in kg), given by:

$$
\mu = \frac{m_1 m_2}{m_1 + m_2}
$$

This model explains how heavier atoms (larger $\mu$) or weaker bonds (smaller $k$) lead to lower vibrational frequencies. Conversely, light atoms connected by stiff bonds vibrate at higher frequencies.
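A quick numerical check of the formula: the sketch below evaluates $\tilde{\nu}$ for H-³⁵Cl and D-³⁵Cl using the same force constant, so only the reduced mass changes. The value $k \approx 516$ N/m for HCl is an assumed, commonly quoted approximate force constant, and the helper names are hypothetical.

```python
import math

AMU_TO_KG = 1.66053906660e-27   # kg per atomic mass unit
C_CM_PER_S = 2.99792458e10      # speed of light in cm/s


def reduced_mass_kg(m1_amu, m2_amu):
    """Reduced mass mu = m1*m2/(m1+m2), converted from amu to kg."""
    return (m1_amu * m2_amu) / (m1_amu + m2_amu) * AMU_TO_KG


def harmonic_wavenumber(k, m1_amu, m2_amu):
    """Harmonic wavenumber in cm^-1 for a force constant k in N/m."""
    mu = reduced_mass_kg(m1_amu, m2_amu)
    return math.sqrt(k / mu) / (2 * math.pi * C_CM_PER_S)


K_HCL = 516.0  # assumed approximate force constant of HCl, N/m

nu_hcl = harmonic_wavenumber(K_HCL, 1.00783, 34.96885)  # 1H-35Cl
nu_dcl = harmonic_wavenumber(K_HCL, 2.01410, 34.96885)  # 2H-35Cl, same k
print(f"HCl: {nu_hcl:.0f} cm^-1, DCl: {nu_dcl:.0f} cm^-1")
```

With these inputs HCl lands near 2990 cm⁻¹ and DCl near 2140 cm⁻¹: doubling the hydrogen mass almost doubles $\mu$, which lowers $\tilde{\nu}$ by a factor of about $\sqrt{\mu_{DCl}/\mu_{HCl}}$.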

Polyatomic molecules exhibit multiple vibrational modes. Of the $3N$ degrees of freedom of $N$ atoms, three are translations and three (two for linear molecules) are rotations, which leaves $3N - 6$ vibrational modes for nonlinear molecules and $3N - 5$ for linear molecules. Each of these normal modes corresponds to a specific pattern of atomic motion and can be analyzed for IR activity.
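The mode-counting rule fits in a few lines. This is a minimal sketch with a hypothetical helper name; the caller supplies whether the molecule is linear, since detecting linearity would require 3D geometry.

```python
def count_vibrational_modes(n_atoms: int, linear: bool) -> int:
    """Number of vibrational normal modes: 3N - 5 (linear) or 3N - 6 (nonlinear)."""
    if n_atoms < 2:
        raise ValueError("a molecule needs at least two atoms to vibrate")
    # Every diatomic molecule is linear, whatever flag is passed.
    if linear or n_atoms == 2:
        return 3 * n_atoms - 5
    return 3 * n_atoms - 6


print(count_vibrational_modes(3, linear=False))   # H2O
print(count_vibrational_modes(3, linear=True))    # CO2
print(count_vibrational_modes(12, linear=False))  # benzene, C6H6
```

Water and carbon dioxide both have three atoms, but CO2 is linear and so has one more vibrational mode than H2O.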

A mode is IR-active if it involves a change in the dipole moment during the vibration. These active modes appear as peaks in the IR spectrum. The frequency and intensity of each peak provide insight into the bond types and functional groups present in the molecule.

Each functional group (such as –OH, –NH₂, C=O, etc.) has characteristic absorption bands in the IR region, typically between 4000 and 400 cm⁻¹. This results in a unique "molecular fingerprint" for every compound. The IR spectrum plots frequency (in wavenumbers, cm⁻¹) on the x-axis and intensity or transmittance on the y-axis, reflecting how much IR radiation is absorbed at each frequency.
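Since spectra are plotted in wavenumbers, converting back to wavelength or frequency helps place the 4000 to 400 cm⁻¹ window: $\lambda\,(\mu\text{m}) = 10^4 / \tilde{\nu}\,(\text{cm}^{-1})$ and $\nu\,(\text{Hz}) = c\,\tilde{\nu}$. The helper names below are illustrative, not part of the package.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def wavenumber_to_wavelength_um(wavenumber_cm):
    """Wavelength in micrometres; 1 cm = 10^4 um, so lambda = 10^4 / wavenumber."""
    return 1e4 / wavenumber_cm


def wavenumber_to_frequency_hz(wavenumber_cm):
    """Frequency in Hz: nu = c * wavenumber, with c in cm/s."""
    return C_CM_PER_S * wavenumber_cm


for wn in (4000, 400):
    print(wn, "cm^-1 ->", wavenumber_to_wavelength_um(wn), "um,",
          f"{wavenumber_to_frequency_hz(wn):.3e} Hz")
```

The mid-IR range of 4000 to 400 cm⁻¹ therefore corresponds to wavelengths from 2.5 to 25 µm.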

Infrared spectroscopy is a powerful analytical technique because it allows chemists to identify the presence of specific functional groups within a molecule by examining their characteristic absorption bands. It provides a rapid and non-destructive method for confirming molecular structure. Additionally, it is commonly employed to monitor the progress of chemical reactions by observing changes in the IR spectrum that correspond to bond formation or cleavage. Because each molecule has a distinct IR absorption pattern, the technique is also valuable for distinguishing between similar compounds or assessing sample purity.

## III. Motivation
The initial project idea was a molecule combiner/analyzer, but it was quickly dropped because it was too general and its end result too simple. Sharing an interest in quantum chemistry, our team decided to build a tool that combines quantum mechanical simulation with practical utility for chemists, which led to the idea of a computational Infra-Red Simulator. By combining vibrational spectroscopy and computational chemistry, we aimed to bridge theoretical methods and practical visualization tools: the resulting simulator predicts IR spectra from molecular structures using quantum mechanical principles. Our goal is to make the vibrational analysis of molecules more accessible by letting the user visualize molecular vibrations and functional group behavior through a simple interface. Using RDKit, QM packages (Psi4, ORCA), and Streamlit, this package enables chemists to generate and explore IR spectra from molecular structures in an intuitive environment.

## IV. Usage Example
After installing the package (as described in the README), users gain access to the full suite of functions it offers. This project is modular in design and can be divided into three independent components, each of which can be run separately: two quantum mechanical modules (using ORCA and Psi4, respectively) and a structural analysis module focused on identifying key features within a molecule.

The functions associated with each of these components are outlined and explained below.

### QM approach: ORCA
### QM approach: Psi4
### Structural approach
#### `get_functional_groups`

This function identifies and counts the functional groups in a molecule from its SMILES representation. It uses a predefined dictionary of SMARTS patterns (`FUNCTIONAL_GROUPS`), each describing the chemical structure of one functional group. For each pattern, the function searches the molecule for matching substructures using RDKit's substructure matching; aromatic ring systems (benzene, naphthalene, pyridine) are counted once per distinct set of ring atoms. The output is a dictionary whose keys are functional group names and whose values give how many times each group appears in the molecule; groups with no match are omitted.
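The core of this approach, counting RDKit substructure matches for a SMARTS pattern, can be sketched on its own. The three SMARTS strings below are illustrative assumptions, not the entries of the package's `FUNCTIONAL_GROUPS` dictionary, and `count_matches` is a hypothetical helper.

```python
from rdkit import Chem

# Illustrative SMARTS patterns (not the package's FUNCTIONAL_GROUPS entries).
EXAMPLE_SMARTS = {
    "Hydroxyl": "[OX2H]",                       # O with one H and one heavy neighbor
    "Carbonyl": "[CX3]=[OX1]",                  # any C=O
    "Ester": "[#6][CX3](=O)[OX2H0][#6]",        # C-C(=O)-O-C
}


def count_matches(smiles, smarts):
    """Number of distinct substructure matches of `smarts` in the molecule `smiles`."""
    mol = Chem.MolFromSmiles(smiles)
    pattern = Chem.MolFromSmarts(smarts)
    if mol is None or pattern is None:
        raise ValueError("Invalid SMILES or SMARTS string")
    # GetSubstructMatches returns one tuple of atom indices per unique match.
    return len(mol.GetSubstructMatches(pattern))


for name, smarts in EXAMPLE_SMARTS.items():
    print(name, count_matches("CC(=O)OCC", smarts))  # ethyl acetate
```

In ethyl acetate the single C=O is matched by both the carbonyl and the ester pattern. This kind of overlap between general and specific patterns is what motivates a correction step after raw counting.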

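```python
from collections import defaultdict

from rdkit import Chem

# FUNCTIONAL_GROUPS (group name -> SMARTS string) is defined elsewhere in the package.


def get_functional_groups(smiles):
    """Count SMARTS-defined functional groups in a molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Invalid SMILES string")

    # Explicit hydrogens allow SMARTS patterns to match H atoms directly.
    mol = Chem.AddHs(mol)
    fg_counts = defaultdict(int)

    # Atom sets of aromatic rings already counted, so each ring system counts once.
    arene_matches = set()
    for fg_name, smarts in FUNCTIONAL_GROUPS.items():
        pattern = Chem.MolFromSmarts(smarts)
        if pattern is None:  # skip SMARTS strings RDKit cannot parse
            continue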
            
        matches = mol.GetSubstructMatches(pattern)
        if matches:
            if fg_name in {"Benzene", "Naphthalene", "Pyridine"}:
                for match in matches:
                    atoms = frozenset(match)
                    if atoms not in arene_matches:
                        fg_counts[fg_name] += 1
                        arene_matches.add(atoms)
            else:
                fg_counts[fg_name] += len(matches)
    
    return {k: v for k, v in fg_counts.items() if v > 0}
```

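#### `detect_main_functional_groups`

This function refines the output of `get_functional_groups` by removing overlaps and avoiding double counting. It applies correction rules that prioritize the main functional groups and subtract their counts from the simpler groups they contain (for example, each ester is removed from the ether and ketone counts). The output is a dictionary with the same structure (functional group name mapped to count), keeping only the groups whose corrected count is positive.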

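```python
def detect_main_functional_groups(fg_counts: dict, smiles: str) -> dict:
    # Note: `smiles` is not used by the correction rules below.
    # Work on a copy so the caller's dictionary is not modified.
    d = fg_counts.copy()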

    if "Naphthalene" in d:
        if "Benzene" in d:
            d["Benzene"] = max(0, d["Benzene"] - 2 * d["Naphthalene"])
    if "Anthracene" in d:
        if "Benzene" in d:
            d["Benzene"] = max(0, d["Benzene"] - 3 * d["Anthracene"])
        if "Naphthalene" in d:
            d["Naphthalene"] = max(0, d["Naphthalene"] - 2 * d["Anthracene"])
    if "Phenanthrene" in d:
        if "Benzene" in d:
            d["Benzene"] = max(0, d["Benzene"] - 3 * d["Phenanthrene"])
        if "Naphthalene" in d:
            d["Naphthalene"] = max(0, d["Naphthalene"] - 2 * d["Phenanthrene"])
    if "Indole" in d:
        if "Benzene" in d:
            d["Benzene"] = max(0, d["Benzene"] - d["Indole"])
        if "Pyrrole" in d:
            d["Pyrrole"] = max(0, d["Pyrrole"] - d["Indole"])
    if "Quinone" in d:
        if "Ketone" in d:
            d["Ketone"] = max(0, d["Ketone"] - 2 * d["Quinone"])
    if "Lactam" in d:
        for group in ["Amide", "Amine (Secondary)", "Ketone"]:
            if group in d:
                d[group] = max(0, d[group] - d["Lactam"])
    if "Peracid" in d:
        for group in ["Hydroperoxide", "Ketone"]:
            if group in d:
                d[group] = max(0, d[group] - d["Peracid"])
    if "Acyl Halide" in d:
        # Each acyl halide (R-CO-X) is assumed to have also been counted as
        # the matching haloalkane and as a ketone, so remove those overlaps.
        for haloalkane in ["Fluoroalkane", "Chloroalkane", "Bromoalkane", "Iodoalkane"]:
            if haloalkane in d:
                d[haloalkane] = max(0, d[haloalkane] - d["Acyl Halide"])
        if "Ketone" in d:
            d["Ketone"] = max(0, d["Ketone"] - d["Acyl Halide"])
    if "Acid Anhydride" in d:
        if "Ether" in d:
            d["Ether"] = max(0, d["Ether"] - d["Acid Anhydride"])
        if "Ester" in d:
            d["Ester"] = max(0, d["Ester"] - 2 * d["Acid Anhydride"])
        if "Ketone" in d:
            d["Ketone"] = max(0, d["Ketone"] - 2 * d["Acid Anhydride"])
    if "Lactone" in d:
        for group in ["Ester", "Ketone", "Ether"]:
            if group in d:
                d[group] = max(0, d[group] - d["Lactone"])
    if "Ester" in d:
        for group in ["Ether", "Ketone"]:
            if group in d:
                d[group] = max(0, d[group] - d["Ester"])
    if "Carboxylic Acid" in d:
        for group in ["Alcohol", "Ketone"]:
            if group in d:
                d[group] = max(0, d[group] - d["Carboxylic Acid"])
    if "Epoxide" in d and "Ether" in d:
        d["Ether"] = max(0, d["Ether"] - d["Epoxide"])
    if "Aldehyde" in d and "Ketone" in d:
        d["Ketone"] = max(0, d["Ketone"] - d["Aldehyde"])
    if "Isocyanate" in d and "Ketone" in d:
        d["Ketone"] = max(0, d["Ketone"] - d["Isocyanate"])
    
    return {k: v for k, v in d.items() if v > 0}
```
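Every rule above follows one pattern: for a parent group with some multiplier, subtract multiplier × parent count from a child group, never going below zero. A table-driven sketch of that pattern is shown below; `CORRECTION_RULES` and `apply_corrections` are hypothetical names, and the table reproduces only a subset of the rules, in the same order as the function above.

```python
# Hypothetical rule table: (parent group, child group, multiplier), applied in order.
# Subset of the rules in detect_main_functional_groups, same relative order.
CORRECTION_RULES = [
    ("Acid Anhydride", "Ether", 1),
    ("Acid Anhydride", "Ester", 2),
    ("Acid Anhydride", "Ketone", 2),
    ("Ester", "Ether", 1),
    ("Ester", "Ketone", 1),
    ("Carboxylic Acid", "Alcohol", 1),
    ("Carboxylic Acid", "Ketone", 1),
    ("Aldehyde", "Ketone", 1),
]


def apply_corrections(fg_counts, rules=CORRECTION_RULES):
    """Subtract parent counts from the child groups they contain, floored at zero."""
    d = dict(fg_counts)
    for parent, child, multiplier in rules:
        if parent in d and child in d:
            d[child] = max(0, d[child] - multiplier * d[parent])
    # Keep only groups that still have a positive count.
    return {k: v for k, v in d.items() if v > 0}


print(apply_corrections({"Ester": 1, "Ether": 1, "Ketone": 1}))
```

Keeping the rules as data makes their order explicit, and order matters: the anhydride rule lowers the Ester count before the Ester rule uses it.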

#### `count_ch_bonds`

This function takes the SMILES string of a molecule and returns the number of hydrogen atoms bonded to sp³, sp², and sp carbon atoms. As written, it only considers carbons outside rings, so ring C-H bonds (including aromatic C-H) are not counted.

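```python
from rdkit import Chem


def count_ch_bonds(smiles):
    """Count C-H bonds on acyclic carbons, grouped by carbon hybridization."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Invalid SMILES string.")

    # Add explicit hydrogens so they can be counted as neighbor atoms.
    mol = Chem.AddHs(mol)

    sp3_ch = sp2_ch = sp_ch = 0

    for atom in mol.GetAtoms():
        # Only carbons outside rings are considered.
        if atom.GetSymbol() == 'C' and not atom.IsInRing():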
            h_count = sum(1 for neighbor in atom.GetNeighbors() 
                         if neighbor.GetSymbol() == 'H')
            
            has_triple = any(bond.GetBondType() == Chem.BondType.TRIPLE 
                            for bond in atom.GetBonds())
            if has_triple:
                sp_ch += h_count
                continue
                
            hybridization = atom.GetHybridization()
            if hybridization == Chem.HybridizationType.SP3:
                sp3_ch += h_count
            elif hybridization == Chem.HybridizationType.SP2:
                sp2_ch += h_count
            elif hybridization == Chem.HybridizationType.SP:
                sp_ch += h_count
    
    return {
        "sp³ C-H": sp3_ch,
        "sp² C-H": sp2_ch,
        "sp C-H": sp_ch
    }
```
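If ring C-H bonds (for example aromatic C-H) should also be counted, the same idea can be written with an optional ring filter. The sketch below is a hypothetical variant, not the package's function: it reads implicit hydrogens with `GetTotalNumHs()` instead of adding explicit H atoms, and it uses RDKit's assigned hybridization directly.

```python
from rdkit import Chem


def count_ch_by_hybridization(smiles, include_rings=True):
    """Hypothetical variant: C-H counts per carbon hybridization, rings optional."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Invalid SMILES string.")

    labels = {
        Chem.HybridizationType.SP3: "sp3",
        Chem.HybridizationType.SP2: "sp2",
        Chem.HybridizationType.SP: "sp",
    }
    counts = {"sp3": 0, "sp2": 0, "sp": 0}

    for atom in mol.GetAtoms():
        if atom.GetSymbol() != "C":
            continue
        if not include_rings and atom.IsInRing():
            continue
        label = labels.get(atom.GetHybridization())
        if label is not None:
            # GetTotalNumHs() includes implicit hydrogens, so AddHs is not needed.
            counts[label] += atom.GetTotalNumHs()
    return counts


print(count_ch_by_hybridization("CC#C"))       # propyne
print(count_ch_by_hybridization("c1ccccc1"))   # benzene, rings included
```

With `include_rings=False` the variant mirrors the acyclic-only behavior of `count_ch_bonds`, for instance returning no sp² C-H for benzene.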
#### `count_carbon_bonds`

This function takes the SMILES string of a molecule and returns the number of C-C, C=C, and C≡C bonds it contains.
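The package's implementation of this function is not shown here, so the following is only a sketch of how such a count could be done with RDKit bond types; the name `count_carbon_bonds_sketch` and the dictionary keys are assumptions. Aromatic bonds have their own RDKit bond type and are not counted as any of the three categories here; calling `Chem.Kekulize(mol, clearAromaticFlags=True)` first would instead count them as alternating single and double bonds.

```python
from rdkit import Chem


def count_carbon_bonds_sketch(smiles):
    """Hypothetical sketch: count carbon-carbon single, double and triple bonds."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Invalid SMILES string.")

    labels = {
        Chem.BondType.SINGLE: "C-C",
        Chem.BondType.DOUBLE: "C=C",
        Chem.BondType.TRIPLE: "C≡C",
    }
    counts = {"C-C": 0, "C=C": 0, "C≡C": 0}

    for bond in mol.GetBonds():
        # Only bonds whose two ends are both carbon atoms are considered.
        if bond.GetBeginAtom().GetSymbol() == "C" and bond.GetEndAtom().GetSymbol() == "C":
            label = labels.get(bond.GetBondType())
            if label is not None:  # aromatic bonds are skipped
                counts[label] += 1
    return counts


print(count_carbon_bonds_sketch("C#CC=CC"))  # pent-1-en-3-yne
```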







## V. Difficulties







