# Welcome to __PERFUMEme.py__ 🧪👃🧴⚛️⚗️

## Introduction
Perfumes are a part of our everyday lives—whether it’s a subtle floral hint or a bold, spicy statement, most people have a signature scent that others come to recognize them by. But have you ever wondered what’s actually in your favorite fragrance? Perfumes are complex mixtures of countless molecules, and each one plays a unique role in shaping the scent you love.

So how can you figure out which molecule is responsible for that special note in your perfume? And once you have a molecule in mind, how do you know if it’s safe, aromatic, or even used in other fragrances?

Meet __PERFUMEme.py__, a Python package that helps demystify the chemistry behind your scent. Simply input a molecule, and the tool will tell you:

- Whether the molecule is fragrant or toxic
- Its key physical properties, such as boiling point
- How it evaporates over time, via an evaporation model
- And best of all, which perfumes (when data allows) contain that exact molecule

Unlock the secrets behind the scents — one molecule at a time.

The main chemistry involved is organic chemistry and physical chemistry, particularly focusing on volatility and evaporation.


## 🛠️ Project Overview

The __PERFUMEme__ project is a Python-based toolkit designed to bridge the gap between perfumery and cheminformatics by enabling structured, molecule-level exploration of commercial fragrances. Its primary function is to determine whether a given molecule is usable in the perfume industry, and to explain why or why not.

Perfumes are typically described by their olfactory notes (e.g., floral, woody, spicy) and ingredient names, but rarely include explicit chemical representations. This lack of molecular data limits the potential for computational analysis and machine learning applications in fragrance design. PERFUMEme addresses this issue by enabling users to convert perfume data—including name, brand, ingredient names, and scent notes—into a format enriched with chemical identifiers, suitable for further computational analysis, clustering, or visualization.

The package processes structured JSON input and performs the following key tasks:

- Queries PubChem using molecule names to retrieve SMILES strings.
- Handles ambiguous or unrecognized molecule names with graceful fallbacks.
- Returns structured data that can be visualized and analyzed.

By combining publicly available molecular data with perfume metadata, PERFUMEme provides a foundation for exploring relationships between molecular composition and fragrance perception, opening the door to computational fragrance classification, similarity scoring, and even automated formulation tools. Typical users of this package are fragrance chemists, cosmetic scientists, cheminformatics researchers, or curious enthusiasts with a background in organic or physical chemistry, especially those interested in functional groups and molecular volatility.


## 🧰 Materials and methods

### 🗂 Data Sources
Molecular data were retrieved from PubChem, a publicly accessible chemical database maintained by the National Center for Biotechnology Information (NCBI). The PubChem PUG REST API was used to access compound properties, such as molecule names, SMILES strings, boiling points, enthalpies of vaporisation, and vapor pressures.

The fragrance data, including perfume names, brands, notes, and associated molecules, were compiled manually and stored in a local JSON file. This file served as the data set for the perfume_molecule.py functions.

### 𝌤 Package Structure
The __PERFUMEme__ package was implemented in Python and organized into several functional modules:

- scraper.py: contains the functions used to build the molecule database. They collect the molecules listed for the perfumes in perfumes.json, add their SMILES from PubChem, and use withodors.csv to add their respective odors (when the molecule has a smell).
- utils.py: contains utility functions for validating molecule data, handling API errors, and processing JSON inputs.
- data/: directory containing the JSON files with structured perfume and molecular data.

The package was developed to be modular, allowing users to integrate new data sources or extend the current capabilities without modifying core functions. For example, a new molecule can be added to molecules.json together with its odor and SMILES.

## 📊 Results and discussions


### Functions in __utils.py__

#### 🧪 SMILES Extraction

The SMILES string of a molecule can be retrieved with the __get_smiles__ function. A casual user might not know the SMILES of the molecule they want to analyse, so this function looks it up from the molecule's name. It is also used in the __resolve_input_to_smiles_and_cid__ function. A simple example usage is presented below.

In [1]:
import matplotlib

In [None]:
import os
import sys

# Make the local package importable when running from the notebook folder
sys.path.append(os.path.abspath("../src"))

from perfumeme.utils import get_smiles

vetiverol = get_smiles("Vetiverol")
print(vetiverol)

CC1CC(C=C(C2C1CC(=C(C)C)C2)C)O


This function handles the PubChem API query and error checking, and returns the result in a structured format.
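As a rough illustration of what a name-to-SMILES lookup with error handling can involve, the sketch below queries the PubChem PUG REST API and returns None instead of raising when the request or the parsing fails. It is not the actual __get_smiles__ implementation: the helper names (build_smiles_url, parse_smiles, fetch_smiles_sketch) are hypothetical, and requesting the CanonicalSMILES property is an assumption. Because property names can differ between API versions, the parser accepts any key ending in "SMILES".

```python
import json
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_smiles_url(name):
    """Build a PUG REST URL asking for the SMILES of a compound name."""
    quoted = urllib.parse.quote(name, safe="")
    return f"{PUG_REST}/compound/name/{quoted}/property/CanonicalSMILES/JSON"


def parse_smiles(payload):
    """Return the first SMILES found in a PUG REST property table, or None."""
    try:
        record = payload["PropertyTable"]["Properties"][0]
    except (KeyError, IndexError, TypeError):
        return None
    for key, value in record.items():
        if key.endswith("SMILES"):
            return value
    return None


def fetch_smiles_sketch(name, timeout=10):
    """Hypothetical lookup: returns a SMILES string, or None on any failure."""
    if not isinstance(name, str) or not name.strip():
        return None
    try:
        url = build_smiles_url(name.strip())
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_smiles(json.load(response))
    except (OSError, ValueError):
        # OSError covers network and HTTP errors; ValueError covers bad JSON.
        return None
```

Keeping the URL building and the response parsing in separate functions means both can be checked without network access; only fetch_smiles_sketch actually contacts PubChem.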


#### Resolving the input

The function __resolve_input_to_smiles_and_cid__, used in the main functions, allows the user to input either a SMILES string or a compound name, which makes the main functions easier to use and more accessible. It also uses the __get_cid_from_smiles__ function, which retrieves the PubChem Compound ID (CID) corresponding to a given SMILES string.
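One possible way to accept either kind of input is to try a name lookup first and fall back to treating the text as a SMILES string. The sketch below shows only that control flow and is not the package's implementation: resolve_input_sketch and its two lookup arguments are hypothetical, and the small dictionaries stand in for the PubChem queries.

```python
def resolve_input_sketch(user_input, name_to_smiles, smiles_to_cid):
    """Return (smiles, cid) for a compound name or a SMILES string.

    name_to_smiles and smiles_to_cid are caller-supplied lookups that
    return None when nothing is found (hypothetical interface).
    """
    if not isinstance(user_input, str) or not user_input.strip():
        raise ValueError("expected a non-empty molecule name or SMILES string")
    text = user_input.strip()

    smiles = name_to_smiles(text)
    if smiles is None:
        # Not a known name: assume the user typed a SMILES string directly.
        smiles = text

    cid = smiles_to_cid(smiles)
    if cid is None:
        raise ValueError(f"could not resolve {text!r} to a PubChem compound")
    return smiles, cid


# Stand-in lookup tables used instead of real PubChem requests.
NAMES = {"ethanol": "CCO"}
CIDS = {"CCO": 702}


def by_name(name):
    return NAMES.get(name.lower())


print(resolve_input_sketch("Ethanol", by_name, CIDS.get))
print(resolve_input_sketch("CCO", by_name, CIDS.get))
```

Trying the name first is a design choice with a known weakness: short strings such as "CO" are valid both as a name and as a SMILES string, so a real implementation has to decide which reading wins.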

#### Get the odor

This function is used in __scraper.py__ to get the odors of a molecule and add them to the database.

#### Getting information from PubChem

The function __get_pubchem_description__ is used in the function __has_a_smell__ to get the descriptive information associated with a compound identified by its Compound ID (CID), such as general chemical information and properties.

The function __get_pubchem_record_sections__ gets the structured data sections for a compound from PubChem using its CID. It is used in several functions in __main_functions.py__.
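To give an idea of what working with such record sections looks like, the sketch below walks a PubChem PUG View style record, in which sections are nested and labelled by a TOCHeading key. The function names (pug_view_url, find_section, section_strings) are hypothetical and are not taken from __utils.py__, and the sample record is a hand-written placeholder rather than real PubChem data.

```python
def pug_view_url(cid):
    """URL of the full PUG View record for a compound ID."""
    return f"https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/{int(cid)}/JSON"


def find_section(sections, heading):
    """Depth-first search for the first section with the given TOCHeading."""
    for section in sections:
        if section.get("TOCHeading") == heading:
            return section
        found = find_section(section.get("Section", []), heading)
        if found is not None:
            return found
    return None


def section_strings(section):
    """Collect the plain-text values stored in a section's Information list."""
    texts = []
    for info in section.get("Information", []):
        for item in info.get("Value", {}).get("StringWithMarkup", []):
            texts.append(item.get("String", ""))
    return texts


# Placeholder record that only mimics the nesting of a PUG View response.
sample_record = {
    "Record": {
        "RecordType": "CID",
        "Section": [
            {
                "TOCHeading": "Chemical and Physical Properties",
                "Section": [
                    {
                        "TOCHeading": "Experimental Properties",
                        "Section": [
                            {
                                "TOCHeading": "Boiling Point",
                                "Information": [
                                    {"Value": {"StringWithMarkup": [{"String": "placeholder value"}]}}
                                ],
                            }
                        ],
                    }
                ],
            }
        ],
    }
}

boiling = find_section(sample_record["Record"]["Section"], "Boiling Point")
print(section_strings(boiling))
```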

_Maybe say a bit more on these functions_


_Discuss the functions in utils.py, if there is something to discuss._

### 🧐 Is the molecule usable in a perfume?

One of the main functions of this package is to determine whether a given molecule is usable in a perfume. Several factors need to be taken into account: whether the molecule has a smell, whether it is too toxic for the skin, and whether it persists long enough before evaporating. Note, however, that some molecules are present in perfumes even though they have no smell.

#### Does it smell?

The first step was to develop a function that determines whether the molecule has a smell. To achieve this ...

In the example below...

Discuss this function

Can the function handle type errors?

tests for this function

#### ☠️ Is it toxic?



#### 📈 Evaporation trace

#### 🤔 So... can we use this molecule?

The function __usable_in_perfume__ evaluates whether a molecule is suitable for use in perfume formulations.

### 👃 What odor does it give in the perfume?

The function __odor_molecule_perfume__ answers this question. It returns a dictionary that matches the given molecule to a selection of perfumes in which it is present, and to the smell it gives the perfume. It relies on two functions that can also be used separately, for example if you only want to know the scent of the molecule: __match_molecule_to_perfumes__ and __match_mol_to_odor__.

First, the database __perfumes.json__ was created manually with 29 perfumes. The list is limited because no database covers all perfumes and their ingredients, and because not all perfume makers disclose the molecules used in their perfumes. A separate database, __molecules.json__, contains all the molecules found in these perfumes, each associated with its SMILES and its odor. The odors were taken from an additional database found online, __withodors.csv__. That database contains many molecules that are unnecessary for this package, which is why a smaller database was created. It also does not include any odorless molecules. The functions used to create these databases are found in __scraper.py__.
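The sketch below shows how two such lookups can be combined into one dictionary. It uses small in-memory stand-ins for the two JSON files; the key names ("name", "brand", "molecules", "smiles", "odors"), the function names, and the sample entries are assumptions made for illustration and may not match the real files or the package code.

```python
# Hand-written stand-ins for perfumes.json and molecules.json (placeholders).
PERFUMES = [
    {"name": "Example Perfume A", "brand": "Example Brand",
     "molecules": ["Linalool", "Limonene"]},
    {"name": "Example Perfume B", "brand": "Example Brand",
     "molecules": ["Vanillin"]},
]
MOLECULES = [
    {"name": "Linalool", "smiles": "CC(=CCCC(C)(C=C)O)C", "odors": ["floral"]},
]


def perfumes_containing(molecule, perfumes):
    """Names of the perfumes that list the molecule (case-insensitive)."""
    target = molecule.strip().lower()
    return [
        perfume["name"]
        for perfume in perfumes
        if target in (m.lower() for m in perfume["molecules"])
    ]


def odors_of(molecule, molecules):
    """Odors stored for the molecule, or an empty list if it is unknown."""
    target = molecule.strip().lower()
    for entry in molecules:
        if entry["name"].lower() == target:
            return list(entry["odors"])
    return []


def odor_and_perfumes(molecule, perfumes, molecules):
    """Combine both lookups into a single dictionary."""
    return {
        "molecule": molecule,
        "perfumes": perfumes_containing(molecule, perfumes),
        "odors": odors_of(molecule, molecules),
    }


print(odor_and_perfumes("linalool", PERFUMES, MOLECULES))
```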

#### Match molecule to perfume

The function __match_molecule_to_perfumes__ will return a list of the perfumes which contain this molecule. Below is an example usage.

In [None]:
from perfumeme.perfume_molecule import match_molecule_to_perfumes

# Find all perfumes containing Linalool
linalool_perfumes = match_molecule_to_perfumes("Linalool")
print(linalool_perfumes)

Discuss this function

#### Match molecule to odor

The function __match_mol_to_odor__ returns a list which contains the odors of a given molecule.

An example usage for this function:

In [None]:
from perfumeme.perfume_molecule import match_mol_to_odor

# Find all odors of Linalool
linalool_odors = match_mol_to_odor("Linalool")
print(linalool_odors)

Discuss

### Limitations of the module

A limitation of __odor_molecule_perfume__ is that only 29 perfumes are listed in the database. More perfumes can be added, but only manually. The function __match_mol_to_odor__ only returns the odors of a molecule if it is in the database __molecules.json__. Letting users add the molecule they enter to the database is definitely possible and is left as a future update.
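A possible shape for that future update is sketched below: a helper that appends a molecule to a JSON database file unless an entry with the same name already exists. The function name add_molecule and the entry layout ("name", "smiles", "odors") are hypothetical and would need to be adapted to the real structure of __molecules.json__.

```python
import json
from pathlib import Path


def add_molecule(db_path, name, smiles, odors):
    """Append a molecule to a JSON list file; return False if already present."""
    path = Path(db_path)
    if path.exists():
        entries = json.loads(path.read_text(encoding="utf-8"))
    else:
        entries = []

    # Skip duplicates, comparing names case-insensitively.
    if any(entry["name"].lower() == name.lower() for entry in entries):
        return False

    entries.append({"name": name, "smiles": smiles, "odors": list(odors)})
    path.write_text(
        json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return True
```

Returning a boolean rather than raising on duplicates keeps the helper easy to call in a loop, for example when importing several molecules at once.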

## 👉 Conclusion

In [None]:
from perfumeme.main_functions import has_a_smell, is_toxic_skin, evaporation_trace

evaporation_trace("water")


### 🔍 What does usable_in_perfume() do?
This all-in-one function performs the following:

- Odor check: uses has_a_smell() to detect whether the molecule is likely to be perceived by smell.
- Toxicity check: uses is_toxic_skin() to determine if it's safe for dermal application.
- Evaporation modeling: calls evaporation_trace() to retrieve and simulate:
  - Vapor pressure (Pvap)
  - Boiling point
  - Enthalpy of vaporization
  - Evaporation curve (Clausius-Clapeyron model)
- Note classification:
  - If Pvap is known, the molecule is classified as top, heart, or base note based on its extrapolated vapor pressure at 37°C (body temperature).
  - If Pvap is missing, the boiling point is used as a fallback.
- Result visualization: the evaporation curve is annotated with the note type and saved as an image.
- Return values:
  - A text summary including smell, toxicity, and note type
  - The path to the annotated plot image
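The extrapolation and classification steps listed above can be sketched as follows. The vapor pressure is extrapolated with the integrated Clausius-Clapeyron relation, ln(P2/P1) = -(ΔHvap/R)(1/T2 - 1/T1), assuming a constant enthalpy of vaporization. The function names are hypothetical, and the thresholds are placeholder parameters chosen only to make the example run; the cut-offs actually used by usable_in_perfume() are not stated here.

```python
import math

R = 8.314  # gas constant in J/(mol K)


def vapor_pressure_at(p_ref, t_ref_k, dh_vap, t_k):
    """Extrapolate a vapor pressure from (p_ref, t_ref_k) to temperature t_k.

    dh_vap is in J/mol and temperatures are in kelvin; the result has the
    same unit as p_ref.
    """
    return p_ref * math.exp(-dh_vap / R * (1.0 / t_k - 1.0 / t_ref_k))


def classify_note(p_vap, top_threshold, base_threshold):
    """Classify by volatility: high pressure -> top, low -> base, else heart."""
    if top_threshold <= base_threshold:
        raise ValueError("top_threshold must be larger than base_threshold")
    if p_vap >= top_threshold:
        return "top"
    if p_vap <= base_threshold:
        return "base"
    return "heart"


# Water, with approximate textbook values: about 23.8 mmHg at 25 °C and
# an enthalpy of vaporization of about 44 kJ/mol.
p_37 = vapor_pressure_at(23.8, 298.15, 44000.0, 310.15)
print(f"Estimated vapor pressure at 37 °C: {p_37:.1f} mmHg")

# Placeholder thresholds in mmHg, not the values used by the package.
print(classify_note(p_37, top_threshold=1.0, base_threshold=0.01))
```

Passing the thresholds explicitly keeps the classification rule visible and makes it easy to test different cut-offs against known top, heart, and base notes.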