<div align="center">

# **💥MolStorage💥**

</div>

<div align="center">

### A simple tool to help students analyze and understand the safety of different molecules and their proper storage

</div>

## **1. Introduction**
### 1.1. Motivation


<div style="text-align: justify;">

Chemistry students repeatedly need to gather critical information about chemical compounds, such as safety data sheets (SDS), hazard pictograms, and specific properties. This process is tedious and time-consuming, yet it is essential for safe handling in both academic and laboratory settings.

We therefore developed an algorithm that organizes chemical products according to the storage guidelines provided by EPFL. It automates the process and spares users from searching for the characteristics of each product by hand. The tool also identifies and displays the hazard categories associated with each substance.

</div>

### 1.2. Theory
<div style="text-align: justify;">

The first source is a table from Jean-Luc Marendaz’s course, which specifies whether two products can be stored together based on their hazard pictograms. This table indicates the information we need to gather in order to store products correctly and safely. The criteria are presented in the figure below.

</div>

![Security Table](security_table_english.png)

<div style="text-align: justify;">

This table shows that, in order to store the products properly, we need to identify their pictograms and determine whether they are acids or bases. Sections 7 (handling and storage) and 10 (stability and reactivity) of the SDS must also be retrieved.

The second source, shown below, details the proper storage of a product based on its pictogram.

</div>

![Flowchart](Security_flowchart.jpg)

## **2. The code**

### 2.1. Functions

### 2.2. Results

### 2.3. Limitations

<style>
  .justified-text {
    text-align: justify;
    text-justify: inter-word;
  }
</style>

<div class="justified-text">

<h2><b>3. Problems and Challenges</b></h2>

<h4><b>Database Availability</b></h4>

<p>Pictograms and hazard statements were not available in any ready-to-use database that we could find. Although it was possible to download datasets for certain categories, such as "corrosive" compounds from PubChem, the corresponding hazard statements were missing or difficult to extract.</p>

<h4><b>No Direct Database Access</b></h4>

<p>By inspecting the HTML code of PubChem’s website, it seemed feasible to extract safety pictogram names and hazard statements using the <code>BeautifulSoup</code> package. However, this approach failed because the pictogram images are loaded dynamically via JavaScript on the compound pages, and therefore do not appear in the static HTML that BeautifulSoup parses.</p>

<p>To overcome this, the <code>selenium</code> package (<a href="https://pypi.org/project/selenium/">https://pypi.org/project/selenium/</a>) was tested, as it can control a web browser (e.g., Google Chrome) and scrape content loaded dynamically by JavaScript. Selenium worked well for retrieving information for a single compound, but it proved too slow for multiple chemicals: each PubChem page had to be fully loaded in a browser, which took several minutes per compound, an unacceptable delay for large datasets.</p>

<p>Ultimately, the PubChem PUG-View REST API (<a href="https://pubchem.ncbi.nlm.nih.gov/docs/pug-view">https://pubchem.ncbi.nlm.nih.gov/docs/pug-view</a>) was used instead. Initially, it was believed that this API did not contain the needed data, but after thorough analysis of its structure, the locations of the pictograms and hazard statements were successfully identified. This method was significantly faster, even when tested on a dozen compounds, and thus was adopted.</p>
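<p>As an illustration of this approach, the sketch below downloads the GHS section for one compound and pulls out the pictogram names and hazard statements. It is not the project's actual code: the function names (<code>fetch_ghs_record</code>, <code>iter_information</code>, <code>extract_ghs</code>) are hypothetical, and the URL layout and JSON keys (<code>Information</code>, <code>Pictogram(s)</code>, <code>GHS Hazard Statements</code>, <code>StringWithMarkup</code>, <code>Markup</code>, <code>Extra</code>) are assumptions about the PUG-View response that should be checked against a real record.</p>

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint layout; verify against the PUG-View documentation.
PUG_VIEW_URL = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/"
    "{cid}/JSON?heading={heading}"
)


def fetch_ghs_record(cid):
    """Download the "GHS Classification" section for one PubChem CID."""
    url = PUG_VIEW_URL.format(cid=cid, heading=quote("GHS Classification"))
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_information(node):
    """Yield every entry of every "Information" list, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Information" and isinstance(value, list):
                yield from value
            else:
                yield from iter_information(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_information(item)


def extract_ghs(record):
    """Return (pictogram names, hazard statements) without duplicates."""
    pictograms, statements = [], []
    for info in iter_information(record):
        strings = info.get("Value", {}).get("StringWithMarkup", [])
        if info.get("Name") == "Pictogram(s)":
            # Assumption: each icon's name is stored in the "Extra" field.
            for entry in strings:
                for markup in entry.get("Markup", []):
                    name = markup.get("Extra")
                    if name and name not in pictograms:
                        pictograms.append(name)
        elif info.get("Name") == "GHS Hazard Statements":
            for entry in strings:
                text = entry.get("String", "")
                if text and text not in statements:
                    statements.append(text)
    return pictograms, statements
```

<p>With network access, <code>extract_ghs(fetch_ghs_record(cid))</code> would return both lists for the compound with the given CID. Walking the record recursively avoids hard-coding how deeply the GHS data is nested.</p>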

<p>Additionally, the <code>pubchempy</code> package (<a href="https://pubchempy.readthedocs.io/en/latest/guide/introduction.html">https://pubchempy.readthedocs.io/en/latest/guide/introduction.html</a>) was used to retrieve each compound’s generic name, IUPAC name, and SMILES notation.</p>

<h4><b>Hazardous Chemicals Sorting</b></h4>

<p>The sorting of hazardous chemicals followed EPFL’s safety directives (<a href="https://www.epfl.ch/campus/security-safety/wp-content/uploads/2024/01/Chemicals-Storage-flowchart_2024.pdf">EPFL Chemicals Storage Flowchart 2024</a>), which specify how to store chemicals based on their safety pictograms and hazard statements.</p>

<p>Incompatibilities between pictograms were considered according to a referenced diagram, which also recommends separating liquids from solids and storing explosive compounds or compressed gases separately, sometimes in isolation from other chemicals of the same class (e.g., oxygen storage).</p>
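<p>A minimal sketch of such a compatibility check is given below. The names <code>Chemical</code>, <code>INCOMPATIBLE_PAIRS</code>, <code>ISOLATED</code>, and <code>can_share_shelf</code> are hypothetical, and the single incompatible pair is an illustrative placeholder, not a transcription of the diagram. As a simplification, explosive compounds and compressed gases are always stored alone here.</p>

```python
from dataclasses import dataclass

# Placeholder table: the real pairs must be transcribed from the diagram.
INCOMPATIBLE_PAIRS = {frozenset({"Flammable", "Oxidizer"})}

# Simplifying assumption: these hazards never share a shelf with anything.
ISOLATED = {"Explosive", "Compressed Gas"}


@dataclass(frozen=True)
class Chemical:
    name: str
    state: str  # "solid" or "liquid"
    pictograms: frozenset


def can_share_shelf(a, b):
    """Return True if the two chemicals may be stored together."""
    # Liquids and solids are kept apart.
    if a.state != b.state:
        return False
    # Explosives and compressed gases are stored separately.
    if (a.pictograms | b.pictograms) & ISOLATED:
        return False
    # Reject any pictogram pair listed as incompatible.
    return not any(
        frozenset({p, q}) in INCOMPATIBLE_PAIRS
        for p in a.pictograms
        for q in b.pictograms
    )
```

<p>Storing each pair as a <code>frozenset</code> makes the lookup independent of the order in which the two chemicals are compared.</p>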

<p>The pictograms have a defined priority order; this was incorporated in the code by sorting each compound’s pictograms from highest to lowest priority before sorting the chemicals accordingly. Moreover, acids and bases were always separated due to the risk of violent reactions.</p>
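<p>The two-step sorting described above could look like the following sketch. The order in <code>PICTOGRAM_PRIORITY</code> is a hypothetical placeholder and must be replaced by the order defined in the EPFL flowchart. The function names <code>sort_pictograms</code> and <code>sort_chemicals</code> are illustrative as well.</p>

```python
# Hypothetical priority order (highest first); replace with the EPFL order.
PICTOGRAM_PRIORITY = [
    "Explosive",
    "Compressed Gas",
    "Oxidizer",
    "Flammable",
    "Corrosive",
    "Acute Toxic",
    "Health Hazard",
    "Environmental Hazard",
    "Irritant",
]
RANK = {name: index for index, name in enumerate(PICTOGRAM_PRIORITY)}
UNKNOWN = len(PICTOGRAM_PRIORITY)  # rank for unlisted or missing pictograms


def sort_pictograms(pictograms):
    """Order one compound's pictograms from highest to lowest priority."""
    return sorted(pictograms, key=lambda p: (RANK.get(p, UNKNOWN), p))


def sort_chemicals(inventory):
    """Sort a {name: pictograms} mapping by each compound's ranked pictograms.

    Compounds are compared pictogram by pictogram, most severe first.
    Compounds without pictograms come last.
    """
    ranked = [(name, sort_pictograms(pics)) for name, pics in inventory.items()]
    return sorted(
        ranked,
        key=lambda item: (
            [RANK.get(p, UNKNOWN) for p in item[1]] or [UNKNOWN],
            item[0],
        ),
    )
```

<p>For example, <code>sort_chemicals({"ethanol": ["Flammable"], "water": []})</code> places ethanol before water, because a compound without pictograms receives the lowest priority.</p>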

<h4><b>Storage Categories and Complex Cases</b></h4>

<p>The EPFL storage categories (such as no pictograms or exclamation point, hazardous to the environment, acute toxicity, CMR/STOT, toxicity category 2/3, corrosive category 1, irritant, pyrophoric, flammable, oxidizer) become insufficient when handling chemicals with multiple hazard pictograms, especially if they include conflicting hazards.</p>

<p>For instance, triethylamine is flammable, corrosive, and acutely toxic. As a base, it cannot be stored with acids, and because it is flammable, it cannot be stored with the other corrosive bases either, which creates a storage conflict.</p>

<p>To resolve this, a new specific storage category called <b>“base corrosive flammable”</b> was created for chemicals like triethylamine and ethanolamine. Several other custom storage categories were also developed to address similar complex cases.</p>
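<p>One way to express such a custom category is to test the most specific combination of hazards before the generic ones, as in the sketch below. Only the label "base corrosive flammable" comes from this project. The function name <code>storage_category</code>, the other labels, and the extension of the combined rule to acids are assumptions made for illustration.</p>

```python
def storage_category(pictograms, acid_base=None):
    """Pick a storage category; acid_base is "acid", "base", or None."""
    hazards = set(pictograms)
    corrosive = "Corrosive" in hazards
    flammable = "Flammable" in hazards
    is_acid_or_base = acid_base in ("acid", "base")

    # Most specific rule first: conflicting hazards get their own category.
    if is_acid_or_base and corrosive and flammable:
        return f"{acid_base} corrosive flammable"
    if is_acid_or_base and corrosive:
        return f"{acid_base} corrosive"
    if flammable:
        return "flammable"
    if corrosive:
        return "corrosive"
    return "other"  # placeholder for the remaining categories
```

<p>With these rules, <code>storage_category(["Flammable", "Corrosive", "Acute Toxic"], "base")</code> returns <code>"base corrosive flammable"</code>, which matches the triethylamine case.</p>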

</div>