# GreenChemPanion 🍃
----
Welcome to this interactive Jupyter Notebook Report for GCP! 🎉

**📝 About This Notebook:**

The principles of Green Chemistry seek to transform the way chemical processes are designed, with the goal of reducing or eliminating the use and generation of dangerous substances. Sustainable chemistry insists on improving reaction efficiency, minimizing waste, and ensuring the long-term safety of both products and processes. Key metrics such as the E-Factor, Process Mass Intensity (PMI), and Atom Economy have been developed to quantitatively measure the environmental impact of chemical reactions. Also, solvent choice and product properties are key to assessing the sustainability of chemical transformations.

With this idea in mind, we have developed GreenChemPanion in the context of the CH-200 Practical Programming for Chemistry course at EPFL: an interactive Python Package, based on RDKit and Streamlit, designed to help chemists assess and optimize the sustainability of their reactions. GCP integrates core green chemistry metrics, including E-Factor, PMI, and Atom Economy, as well as evaluations of solvent sustainability, molecular greenness (based on atomic composition), and reaction conditions.

On the one hand, the package includes a pip-installable module with functions and methods for Green Chemistry applications. Most of them are centered around the `Reaction` class, the main component of the module, which represents a chemical reaction in a form suited to programmatic treatment.
On the other hand, it includes a Streamlit applet that gives the user an interactive interface to input key reaction parameters (reactants, products, solvents and extra material used in the process), compute Green Chemistry factors for the given reaction, and establish "greenness" assessments of the process based on the solvents used, the molecular structures of the compounds, and the factors' values.

Through this project, we aim to provide chemists with a centralized, intuitive, and practical tool that supports greener decision-making: GreenChemPanion bridges the gap between synthetic chemistry, green chemistry principles, and cheminformatics, helping users evaluate their current reactions and design more sustainable processes.

**🔔 Before getting started:**

Make sure you have gone through the README file in the root folder, which explains how to install the package properly!

Feel free to modify the inputs, test different reactions, and explore how different molecular structures impact sustainability metrics. The code is modular and documented to support experimentation and learning.

**❓ Questions**

*This package was made with ♥️ by Marc AL HACHEM, Ralph GEBRAN, Taïs THOMAS, and Valentine WIEN, for the EPFL CH-200 Practical Programming for Chemistry course in 2025.*

For any questions, please contact `marc.alhachem@epfl.ch`, `ralph.gebran@epfl.ch`, `tais.thomas@epfl.ch` or `valentine.wien@epfl.ch`

### 🔝 Import dependencies

To begin, run the following cell to import all necessary modules, libraries and most importantly, GCP functions. Dependencies of GCP include standard tools such as pandas and math, along with the cheminformatics toolkit RDKit, which is used to represent, manipulate, and analyze molecular structures.

In [None]:
import streamlit as st
import pandas as pd
import math
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import Descriptors
from streamlit_ketcher import st_ketcher
from greenchempanion import Atom_Count_With_H, Reaction, compute_PMI, canonicalize_smiles, compute_E 
from greenchempanion import get_solvent_info, waste_efficiency, PMI_assesment, Atom_ec_assesment, logP_assessment_molecule, atoms_assessment, structural_assessment

To begin evaluating the sustainability of a chemical reaction, users must define the reaction they wish to assess. With the `Reaction` class, you can enter your chemical reactions as Python variables. Solvents, extra materials, and the reaction yield can be supplied as additional data to some functions.

### 📚 GreenChemPanion Functions

**1️⃣ Define your chemical reaction**

The main input type for chemical compounds in GCP is SMILES strings (a standardized way to represent molecular structures). If you don't know the SMILES of a compound, you can either find it online or generate it in the Streamlit app (see below).

The `Reaction` class consists of two `Dict[Mol:int]` (one for reactants, one for products), as well as an `int` to designate the main/desired product's index in the second dictionary (set by default to 0, the first element).

In the following example, we consider an SN (nucleophilic substitution) reaction in which azide (N3) displaces iodide (I) on a butane chain:

![snequation.png](../assets/sn_equation.png)

In [None]:
# Compounds of an SN reaction
sodium_azide = Chem.MolFromSmiles("[N-]=[N+]=[N-].[Na+]")
two_iodobutane = Chem.MolFromSmiles("CCC(I)C")
two_azidobutane = Chem.MolFromSmiles("CCC([N]=[N+]=[N-])C")
sodium_iodide = Chem.MolFromSmiles("[Na+].[I-]")

In [None]:
# Create the Reaction variable
reactants = {two_iodobutane:1, sodium_azide:1} 
products = {two_azidobutane:1, sodium_iodide:1}

sn_reaction = Reaction(reactants, products)

This step creates a `Reaction` object that encodes the chemical transformation. The SMILES strings were converted into RDKit `Mol` objects with `Chem.MolFromSmiles` in the previous cell; these serve as the keys of the dictionaries, and the stoichiometric coefficients are the values.

**2️⃣ Calculating the Atom Economy**

Atom economy is a fundamental indicator in green chemistry, measuring the efficiency of using atoms in reactants to form the main product. A reaction with good atom economy limits waste production and maximizes the value of raw materials. The higher the atom economy, the better. The atom economy is expressed as a percentage and can never exceed 100% for well balanced reactions.

GreenChemPanion allows you to calculate this indicator using two methods of the `Reaction` class:
 - By **number of atoms** (`Atom_Economy_A()`): evaluates the proportion of atoms present in the main product compared to all the reactants. Implicit hydrogens are included via the `Atom_Count_With_H()` function.
 - By **molar mass** (`Atom_Economy_M()`): weights atoms by their mass to give a more accurate measurement in an industrial context. The mass of each molecule is obtained automatically from its structure, using the exact atomic masses provided by RDKit.


In [None]:
print(f" Atom Economy by number of atoms is: {sn_reaction.Atom_Economy_A()}")
# Result should be 88.89 %
print(f" Atom Economy by molar mass is: {sn_reaction.Atom_Economy_M()}")
# Result should be 39.81 %
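
The two definitions can also be checked by hand. The following is a minimal sketch, not the package's implementation: it hard-codes rounded standard atomic masses and writes each compound as an element-count dictionary (both are assumptions made so the example runs without RDKit), and the helper names are hypothetical. It assumes a stoichiometric coefficient of 1 for the main product.

```python
# Rounded standard atomic masses (g/mol), hard-coded for illustration only.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "Na": 22.990, "I": 126.904}

# Each compound is an element -> count mapping, with hydrogens written explicitly.
iodobutane_formula = {"C": 4, "H": 9, "I": 1}
azide_salt_formula = {"Na": 1, "N": 3}
azidobutane_formula = {"C": 4, "H": 9, "N": 3}


def molar_mass(formula):
    """Molar mass (g/mol) of an element-count dictionary."""
    return sum(ATOMIC_MASS[element] * n for element, n in formula.items())


def atom_count(formula):
    """Total number of atoms, hydrogens included."""
    return sum(formula.values())


def atom_economy(main_product, reactant_pairs, weight):
    """Share (in %) of the reactants' weight that ends up in the main product.

    `reactant_pairs` is a list of (formula, stoichiometric coefficient) pairs;
    `weight` is either `atom_count` or `molar_mass`.
    """
    total_in = sum(coef * weight(formula) for formula, coef in reactant_pairs)
    return 100 * weight(main_product) / total_in


reactant_pairs = [(iodobutane_formula, 1), (azide_salt_formula, 1)]
ae_atoms = atom_economy(azidobutane_formula, reactant_pairs, atom_count)
ae_mass = atom_economy(azidobutane_formula, reactant_pairs, molar_mass)
print(f"{ae_atoms:.2f} %")  # 88.89 %
print(f"{ae_mass:.2f} %")   # 39.81 %
```

The gap between the two values comes from iodine: it is a single atom out of 18, but it carries about half of the reactants' mass.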

**3️⃣ Calculation of PMI and E Factor**

These two indicators quantify the material efficiency of a chemical process by accounting for the waste it generates. Unlike atom economy, they take into account all material flows, including by-products and, when specified, solvents and extra material.

The two metrics are calculated using functions which take in three arguments: a GCP `Reaction` object, a `Dict[Mol:float]` containing solvents & extras with their masses (in g) per kg of main product, and finally a `float` indicating the main product's yield.

 - **PMI (Process Mass Intensity)** (`compute_PMI()`) is the mass (kg) of inputs in the process (reactants + solvent + extras) per kilogram of main product.
 - **E-factor** (`compute_E()`) is the mass (kg) of waste generated by the process (side products + solvent + extras) per kilogram of main product.

If the process uses solvents or extra material, the user must therefore define them in a dictionary before calling the compute functions.

(For simplicity, extras must be given as masses rather than volumes: some extras may not be liquid, and the density of an arbitrary compound cannot be deduced. The Streamlit applet, however, stores the densities of a library of common solvents, which can be entered there as volumetric inputs; more on this below.)
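
If you only know the volume of a common solvent, the conversion to mass is a single multiplication by its density. The snippet below is a hedged illustration of that step: the `DENSITY_G_PER_ML` table and the `volume_to_mass` helper are hypothetical names, and the densities are approximate room-temperature values, not those stored in the applet.

```python
# Approximate room-temperature densities in g/mL (illustrative values only).
DENSITY_G_PER_ML = {"water": 1.00, "ethanol": 0.789, "acetone": 0.79}


def volume_to_mass(solvent, volume_ml):
    """Convert a solvent volume (mL) into a mass (g) using a stored density."""
    if solvent not in DENSITY_G_PER_ML:
        raise KeyError(f"No density stored for {solvent!r}; enter a mass instead.")
    return volume_ml * DENSITY_G_PER_ML[solvent]


# 1 L of ethanol per kg of product, expressed in g per kg of product
print(volume_to_mass("ethanol", 1000))
```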

 Here are the specifications for the previously used reaction:

![SN Reaction](../assets/snreaction.png)



In [None]:
# Extras Compounds
water = Chem.MolFromSmiles("O")
acetone = Chem.MolFromSmiles("CC(=O)C")
ether = Chem.MolFromSmiles("CCOCC")

sn_extras = {acetone:17470, water:112000, ether:81800}

sn_reaction_pmi = compute_PMI(sn_reaction, sn_extras , 0.9)
sn_reaction_e = compute_E(sn_reaction, sn_extras, 0.9)

print(f"PMI : {sn_reaction_pmi}")
# Result should be 214
print(f"E-factor : {sn_reaction_e}")
# Result should be 212.9
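
To see where these numbers come from, here is a rough sketch of the arithmetic behind the two definitions. It is not the code of `compute_PMI()` or `compute_E()`: the function name and its parameters are hypothetical, the molar masses are rounded values entered by hand, and it assumes that the yield only scales the amount of main product obtained and that unreacted starting material is not counted as waste.

```python
def pmi_and_e_factor(reactant_mass, main_product_mass, side_product_mass,
                     extras_g_per_kg, yield_fraction):
    """Return (PMI, E-factor), both in kg per kg of main product.

    The first three arguments are stoichiometric masses (coefficient x molar
    mass, in g/mol); extras are given in g per kg of main product.
    Assumption: the yield only reduces the amount of main product obtained.
    """
    product_obtained = main_product_mass * yield_fraction
    extras_kg = sum(extras_g_per_kg) / 1000
    pmi = reactant_mass / product_obtained + extras_kg
    e_factor = side_product_mass / product_obtained + extras_kg
    return pmi, e_factor


# Rounded molar masses (g/mol): 2-iodobutane + sodium azide as reactants,
# 2-azidobutane as main product, sodium iodide as side product.
pmi, e_factor = pmi_and_e_factor(
    reactant_mass=184.020 + 65.011,
    main_product_mass=99.137,
    side_product_mass=149.894,
    extras_g_per_kg=[17470, 112000, 81800],  # acetone, water, ether
    yield_fraction=0.9,
)
print(f"PMI: {pmi:.2f}, E-factor: {e_factor:.2f}")
```

Under these assumptions the sketch lands close to the values quoted in the cell above. In both metrics the roughly 211 kg of solvents and extras per kg of product far outweigh the contribution of the reaction itself.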

**4️⃣ Green Chemistry Evaluation**

With all the information calculated, additional functions are available to perform a global evaluation of the green chemistry profile of a reaction. These functions assess the computed factors (E-factor, PMI, Atom Economy) and highlight key indicators such as solvent quality, LogP of the main product, and structural concerns.

<u>1. Evaluation of the solvents used</u>


Solvents can be responsible for a large part of the environmental impact of a chemical process. In this project, we integrated a method for classifying solvents used in a reaction, based on their SMILES structure.

Principle – each solvent is compared to three pre-defined categories:
 - ✅ Green: solvents considered environmentally friendly according to green chemistry guides (e.g., water, ethanol, ethyl acetate, 1,3-dioxolane, carbon dioxide, isopropanol, methanol).
 - 🟨 Acceptable: default category for unclassified solvents, assumed to be acceptable but not optimal.
 - ❌ Bad: solvents that are problematic for health or the environment (e.g., dichloromethane, chloroform, benzene, carbon tetrachloride, n-hexane, pentane).

The function then iterates through a dictionary of molecules (SMILES) associated with their mass in grams per kilogram of product, and increments a warning count if any bad solvents are detected.

In [None]:
# SMILES of the solvents listed in the database
Green = {"O", "CCO", "CC(=O)OCC", "CC1COCC1", "O=C=O", "CC(O)C", "CO"}
Bad = {"ClCCl", "ClC(Cl)Cl", "c1ccccc1", "ClC(Cl)(Cl)Cl", "CCCCCC", "CCCCC"}

# Example: the solvents and extras of the SN reaction above
sn_extras = {acetone:17470, water:112000, ether:81800}
print(get_solvent_info(sn_extras)) 
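# Acetone and diethyl ether are in neither set above, so per the rules below the verdict should be "Acceptable" (yellow) rather than all green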

Based on the count, the function issues an overall verdict along with a corresponding color code:

- If at least one **"Bad"** solvent is found, a warning is returned with a Red color code.
- If no "Bad" solvents are found but **"Acceptable"** ones are used, a warning is shown with a Yellow color code.
- If all solvents are classified as **"Green"**, a  message confirms that the selection meets green chemistry guidelines, with a Green color code.
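
The decision rule described above fits in a few lines. The following is a simplified sketch, not the code of `get_solvent_info()`: the `solvent_verdict` name and its return values are hypothetical, and SMILES strings are compared literally, so they must be written exactly as in the two sets (a real implementation would presumably canonicalize them first).

```python
GREEN = {"O", "CCO", "CC(=O)OCC", "CC1COCC1", "O=C=O", "CC(O)C", "CO"}
BAD = {"ClCCl", "ClC(Cl)Cl", "c1ccccc1", "ClC(Cl)(Cl)Cl", "CCCCCC", "CCCCC"}


def solvent_verdict(solvent_smiles):
    """Return (verdict, color) for an iterable of solvent SMILES strings."""
    solvents = set(solvent_smiles)
    if solvents & BAD:       # at least one "Bad" solvent
        return "Bad", "red"
    if solvents - GREEN:     # something unclassified, hence "Acceptable"
        return "Acceptable", "yellow"
    return "Green", "green"  # every solvent is in the Green set


print(solvent_verdict(["O", "CCO"]))               # ('Green', 'green')
print(solvent_verdict(["O", "CC(=O)C", "CCOCC"]))  # ('Acceptable', 'yellow')
print(solvent_verdict(["O", "ClCCl"]))             # ('Bad', 'red')
```

Checking the "Bad" set first matters: a single problematic solvent decides the verdict, no matter how many green ones accompany it.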



 <u>2. Presence of problematic atomic elements</u>
 
An initial filter checks whether any of the products contain elements considered hazardous from a green chemistry perspective. The predefined list includes halogens (e.g., chlorine, bromine, iodine), heavy metals (e.g., lead, mercury, cadmium), and other environmentally concerning elements (e.g., arsenic, selenium, palladium). 

If any of these atoms are detected in the structure of a product, the function returns a warning message along with a yellow color code. If no risky atoms are found, a green confirmation is returned.



In [None]:
# List of predefined risky atoms used in the function
RISKY_ATOMS = { "F", "Cl", "Br", "I", "Li", "Ti", "Sn", "Pb", "Pd", "Hg", "Cd", "As", "Cr", "Ni", "Se", "Tl", "Pt", "Rh"}

# Example 1:  product containing a risky atom
chloroethane = Chem.MolFromSmiles("CCCl") 
ethanol = Chem.MolFromSmiles("CCO")        

reaction = Reaction(reactants={chloroethane: 1}, products={chloroethane: 1})
message, color = atoms_assessment(reaction)
print(message)  # should see ⚠️ Concerning atoms: Cl

# Example 2: A product without any risky atom
reaction_safe = Reaction(reactants={ethanol: 1}, products={ethanol: 1})
message_safe, color_safe = atoms_assessment(reaction_safe)
print(message_safe)  # ✅ All atoms are green

<u>3. Evaluation of logP (hydrophobicity)</u>

The next indicator is the logP value, which corresponds to the logarithm of the octanol/water partition coefficient. It gives insight into the solubility and environmental behavior of the main product.

Low logP values indicate hydrophilic compounds, generally associated with better biodegradability. High logP values suggest that the molecule is more hydrophobic and may accumulate in organisms or the environment.

The function calculates the logP of the main product using RDKit and returns a verdict based on predefined thresholds. The result is accompanied by a short message and a color code reflecting the environmental impact.

In [None]:
# Example: long-chain alkane (decane) – expected to have high logP (4.15)
decane = Chem.MolFromSmiles("CCCCCCCCCC")
logP_value = Descriptors.MolLogP(decane)

print(logP_assessment_molecule(logP_value)) # should give a warning and yellow hex code

The assessment is based on a simple classification:

 - ✅ 1.5 ≤ logP ≤ 2.5 → Hydrophilic, favorable product
 - 🚸 0 ≤ logP ≤ 1.5 or 2.5 ≤ logP ≤ 4 → Moderately hydrophobic product
 - 🚫 logP > 4 → Potentially problematic product
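
As a compact illustration, the thresholds above can be written as a small function. This is only a sketch of the rule as stated in the list, not the code of `logP_assessment_molecule()`: the `classify_logp` name and its labels are hypothetical. The boundary values 1.5 and 2.5 are assigned to the favorable class, and negative values, which the list does not cover, return `None`.

```python
def classify_logp(logp):
    """Map a logP value onto the three classes listed above."""
    if logp > 4:
        return "problematic"
    if 1.5 <= logp <= 2.5:
        return "favorable"
    if 0 <= logp <= 4:
        return "moderate"
    return None  # negative logP: not covered by the classification above


print(classify_logp(4.15))  # problematic
print(classify_logp(2.0))   # favorable
print(classify_logp(3.0))   # moderate
```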

<u>4. Structural assessment</u>

In addition to atomic composition and other indicators, the molecular structure of the product can influence its environmental status. Certain structural patterns are known to reduce biodegradability or damage the environment.

The `structural_assessment()` function evaluates the main product using the following criteria:

- A dictionary of SMARTS patterns defines known problematic groups:
  - Carbon oxides (`CO`, `CO₂`)
  - Nitro, azo, and azide groups (`NO₂`, `N=N`, `[N₃]`)
  - Halogenated aromatics (e.g., dichlorobenzenes)

- If the molecule contains more than 10 non-hydrogen atoms, it is flagged as structurally heavy, which may indicate poor biodegradability.

- For each SMARTS pattern, the molecule is checked using `mol.HasSubstructMatch(...)`. If no exact match is found, a fingerprint similarity check is applied (Tanimoto ≥ 0.15) to detect close structural analogs.

- If any of these conditions are met, a warning message is returned along with a red color code. Otherwise, the molecule is considered structurally acceptable and a green message is returned.

The function collects its findings in sets, which ensures that each problematic group is reported only once, even if it is matched by several patterns. It also uses a boolean flag for structurally heavy molecules (more than 10 non-hydrogen atoms), which is evaluated separately from the SMARTS matching.



In [None]:
nitrobenzene = Chem.MolFromSmiles("O=[N+]([O-])c1ccc(cc1)CCCCCCC")  # nitrobenzene with a long alkyl chain
ethanol = Chem.MolFromSmiles("CCO")

reaction_bad = Reaction(reactants={nitrobenzene: 1}, products={nitrobenzene: 1})
msg_bad, color_bad = structural_assessment(reaction_bad)
# Should be a negative message, flagging long chains and nitro groups
print(msg_bad)

reaction_safe = Reaction(reactants={ethanol: 1}, products={ethanol: 1})
msg_safe, color_safe = structural_assessment(reaction_safe)
# Should be a positive message: no structural concerns found
print(msg_safe)

<u>5. E-factor</u>

Previously calculated, the E-factor is evaluated based on its magnitude:

- ≤ 1 → excellent  
- 1–5 → acceptable  
- &gt; 5 → poor, high waste

---

<u>6. PMI (Process Mass Intensity)</u>

The PMI result is interpreted as follows:

- &lt; 10 → excellent efficiency  
- 10–50 → acceptable  
- &gt; 50 → low material efficiency

---

<u>7. Atom Economy</u>

Both molar mass– and atom count–based Atom Economy are evaluated using the same thresholds:

- &gt; 89% → excellent  
- 80–89% → very good  
- 60–79% → moderate  
- 40–59% → poor  
- ≤ 39% → very poor
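
The three scales above translate directly into threshold functions. The sketch below is illustrative and does not reproduce the package's own assessment functions; the function names and verdict strings are hypothetical. Because the Atom Economy bands leave small gaps between them (for example between 39% and 40%), the sketch treats each band's lower limit as inclusive, which is an assumption.

```python
def assess_e_factor(e_factor):
    """E-factor: <= 1 excellent, up to 5 acceptable, above 5 poor."""
    if e_factor <= 1:
        return "excellent"
    if e_factor <= 5:
        return "acceptable"
    return "poor"


def assess_pmi(pmi):
    """PMI: < 10 excellent, 10 to 50 acceptable, above 50 low efficiency."""
    if pmi < 10:
        return "excellent"
    if pmi <= 50:
        return "acceptable"
    return "low efficiency"


def assess_atom_economy(percent):
    """Atom Economy in %, with each band's lower limit taken as inclusive."""
    if percent > 89:
        return "excellent"
    for lower_limit, verdict in ((80, "very good"), (60, "moderate"), (40, "poor")):
        if percent >= lower_limit:
            return verdict
    return "very poor"


# Values obtained earlier for the SN reaction
print(assess_atom_economy(88.89))  # very good
print(assess_atom_economy(39.81))  # very poor
print(assess_pmi(214))             # low efficiency
print(assess_e_factor(212.9))      # poor
```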


### 📊 Streamlit Applet

To showcase the functionalities of the package, we built a fully usable web app containing several modules that make use of GreenChemPanion functions!

**0️⃣ Running the App**

In the `src/greenchempanion/` folder of the repository, you will find the `app.py` file, which contains the interactive applet.

Make sure your terminal is in the `greenchempanion` folder:

```
$ cd "YourRepoLocation"/src/greenchempanion
```

Run the applet:
```
$ streamlit run app.py
```

The applet should open in a new tab of your default browser. Feel free to experiment with the different sections, which showcase the GCP functionalities!

**1️⃣ SMILES Converters**

Two tools are present on the app to help the user with SMILES, which serves as the principal format for molecular inputs on the applet:

- The first (1) displays the molecular structure of a given SMILES (to help check that it is correct, for example).

- The second (2) is a small interface to draw a molecular structure, and output its SMILES (if you know a structure but not its SMILES)

![Converters](../assets/converters.png)


**2️⃣ Enter a Reaction**

This part of the app contains three expanders for adding the elements of your chemical reaction:

- In the first expander (3), you add the compounds of your reaction with their stoichiometric coefficients, marking each one as a reactant or a product.

- The second and third expanders are for adding extra material (solvents, extraction media, catalysts...). If your solvent is a commonly used species, you can choose it from the list in (4) and add it as a volumetric input (per kg of product). If the extra is a solid or an uncommon compound, you can add it as a mass input (per kg of product) in (5). You can also enter the reaction yield (in %) in this third expander.

![Enter Reaction](../assets/enter_reaction.png)

**3️⃣ Stored Reaction and Compute Factors**

- The first column (6) displays the reaction stored by the user: each compound (reactant, product, or extra) is listed with its SMILES, a thumbnail of its structure, and its stoichiometric coefficient (extras display their mass per kg of product instead). The chosen yield is displayed as well, and you can select the main/desired product in this section; it is then marked with a star.

- Once a full reaction is inserted (at least one reactant and one product), the second column (7) will display three buttons, to compute the three Green Chemistry Factors, namely, Atom Economies, PMI and E-factor.

Additionally, a red pop-up appears above these two columns if the reaction is not balanced, warning the user that the results may then be inconsistent.
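
A balance check of this kind boils down to comparing element totals on both sides of the equation. The following is a minimal sketch of the idea, not the applet's code: compounds are written as element-count dictionaries rather than RDKit molecules, the helper names are hypothetical, and charges are not checked.

```python
from collections import Counter


def element_totals(side):
    """Sum element counts over (formula, stoichiometric coefficient) pairs."""
    totals = Counter()
    for formula, coefficient in side:
        for element, n in formula.items():
            totals[element] += coefficient * n
    return totals


def is_balanced(reactant_side, product_side):
    """True if every element appears in equal amounts on both sides."""
    return element_totals(reactant_side) == element_totals(product_side)


# The SN reaction: C4H9I + NaN3 -> C4H9N3 + NaI
lhs = [({"C": 4, "H": 9, "I": 1}, 1), ({"Na": 1, "N": 3}, 1)]
rhs = [({"C": 4, "H": 9, "N": 3}, 1), ({"Na": 1, "I": 1}, 1)]

print(is_balanced(lhs, rhs))      # True
print(is_balanced(lhs, rhs[:1]))  # False: sodium iodide is missing
```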

![Reac-Fac](../assets/reaction_factors.png)

**4️⃣ Green Chemistry Evaluation**

In this section (9), the different evaluations of the Reaction and its main product are displayed. It summarizes the computed metrics and highlights anything that might be worth improving in the reaction.

![Evaluation](../assets/gcp_evaluation.png)

## 🌿 GreenChemPanion: Challenges, Features and Limitations

### **🧪 Introduction**

GreenChemPanion is an interactive notebook designed to assess the environmental sustainability of chemical reactions based on the principles of green chemistry. It is based on the analysis of indicators such as atom economy, E-factor, solvent use, and the nature of the products generated.

### **🎯 Motivations**

This project is aimed at anyone (students, teachers, or researchers) who wishes to integrate sustainability concepts from the design phase of a chemical synthesis. By offering a rapid, visual, and multi-criteria assessment of a reaction based on its SMILES representation, GreenChemPanion makes it possible to identify the main weak points of a transformation from a green chemistry perspective in just a few seconds. Using various indicators (atom economy, waste generated, nature of solvents, atomic elements present, molecular structure of the product), the project encourages a more critical and responsible approach to chemistry.

Beyond the technical aspect, GreenChemPanion is also intended as an educational and awareness-raising tool. It reminds us that every choice made during a synthesis—from the solvent to the product formed—can have a measurable environmental impact. By facilitating access to criteria often considered secondary in laboratory planning, this project contributes to making sustainable chemistry a concrete, accessible, and applicable priority starting in university education.

### **🌟 Main Features**

- **Atom Economy Calculation**: Measures the efficiency with which reactants are converted into the main product.

- **E-Factor & PMI**: Evaluate the mass of waste generated and the mass of material consumed.

- **Solvent Assessment**: Classifies solvents into three categories ("Green", "Acceptable", "Bad") according to their environmental impact.

- **LogP Analysis**: Provides an indication of the product's biodegradability and persistence in a biological environment.

- **Elemental Risk Scan**: Detects the presence of problematic elements (heavy metals, halogens, etc.).

- **Structural Assessment**: Searches for structural motifs associated with environmental risks or toxicity.

- **Streamlit Interface**: Interactive interface for easily testing different reactions.

### **📉 Challenges Encountered**

- **Stoichiometry and balancing**: The code had to ensure that the reactions were well balanced before running the calculations.

- **Reliable atom counting**: Implicit hydrogens had to be taken into account to obtain a realistic count.

- **Choice of reference solvents**: The classification of solvents was based on an arbitrary selection of representative SMILES, which sometimes required questionable decisions.

- **Product evaluation**: Certain criteria (such as logP or the presence of "at risk" atoms) may vary depending on the application context, which introduces a degree of subjectivity.

- **Using Streamlit**: Integrating it into a dynamic interface while maintaining explicit and colorful user feedback required technical adjustments.

### **🚧 Limitations**

- **Subjectivity in some criteria**: The evaluation of problematic solvents or structures is based on lists chosen by the developers, which may not cover all cases or depend on questionable sources.

- **SMILES only**: Some complex reactions cannot be properly analyzed if their SMILES are malformed or ambiguous.

- **Simplification of waste**: All products that are not the main product are considered waste, which can be reductive in the case of recoverable co-products.

- **Lack of energy factors**: The software does not take into account experimental conditions (such as temperature and pressure), which also influence sustainability.

- **Limit to one reaction at a time**: The system evaluates only one transformation at a time and does not yet allow a global evaluation of a multi-step synthetic route.

### **✅ Conclusion**

GreenChemPanion represents a first step toward an interactive and accessible green chemistry assistant. While still imperfect, it allows for an initial assessment of chemical transformations from a sustainability perspective. By combining various indicators in a simple interface, it serves as both an educational and an awareness-raising tool. The project remains open to future expansions: accounting for energy use, multi-step synthetic routes, a reaction database, or even the integration of artificial intelligence for more refined recommendations.



### 🪶 **References**

- Common Solvents Used in Organic Chemistry: Table of Properties. https://organicchemistrydata.org/solvents/ (accessed 2025-05-22)