# Data Analysis On Cannabidiol
#### As students, we're bound to know or have seen young people around us using cannabis. That's why we were directly interested in these molecules, to learn more about them from a chemical point of view.
#### Cannabis is a smokable substance containing molecules from the cannabinoid family, the main ones being THC and CBD. These two molecules form the basis of our study.

In [None]:
from hempy import *

This imports everything our package exposes (modules, functions, classes and variables), making all of it directly accessible in the notebook.

In [None]:
molecule_properties.smiles_code("THC")

In [None]:
molecule_properties.smiles_code("CBD")

This function takes the name of a molecule, in this case THC or CBD, and returns its corresponding SMILES code. Each line calls the `smiles_code` method on the `molecule_properties` object, taking `"THC"` or `"CBD"` as argument.

A SMILES code is a standard way of describing the structure of a molecule as a string of characters, that is, a textual representation of a chemical structure.
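To make the idea concrete, here is a minimal sketch that treats a SMILES string as plain text and counts its heavy (non-hydrogen) atoms. It is not part of `hempy`: the helper name `count_heavy_atoms` is hypothetical, and the tokenizer only handles the bracket-free organic subset (no `[...]` atoms, charges or isotopes), which is enough for the THC and CBD strings used in this notebook.

```python
import re
from collections import Counter

# Element symbols of the SMILES "organic subset". Two-letter symbols come
# first so that "Cl" is not read as "C" followed by an unknown "l".
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_heavy_atoms(smiles: str) -> Counter:
    """Count non-hydrogen atoms in a simple, bracket-free SMILES string."""
    if "[" in smiles:
        raise ValueError("bracket atoms are not supported by this sketch")
    # Lowercase symbols denote aromatic atoms; fold them into the element.
    return Counter(token.capitalize() for token in ATOM_PATTERN.findall(smiles))


thc = "CCCCCC1=CC(=C2C3C=C(CCC3C(OC2=C1)(C)C)C)O"
cbd = "CCCCCC1=CC(=C(C(=C1)O)C2C=C(CCC2C(=C)C)C)O"

print(count_heavy_atoms(thc))
print(count_heavy_atoms(cbd))
```

Both strings contain 21 carbon atoms and 2 oxygen atoms, which is consistent with THC and CBD being isomers: same atoms, different connectivity.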

References : PubChem for the SMILES : https://pubchem.ncbi.nlm.nih.gov/search/help_search.html

In [None]:
from rdkit import Chem

THC_smiles = "CCCCCC1=CC(=C2C3C=C(CCC3C(OC2=C1)(C)C)C)O"
CBD_smiles = "CCCCCC1=CC(=C(C(=C1)O)C2C=C(CCC2C(=C)C)C)O"

THC_mol = Chem.MolFromSmiles(THC_smiles)
CBD_mol = Chem.MolFromSmiles(CBD_smiles)

This code imports RDKit's `Chem` module, defines the SMILES strings for the THC and CBD molecules, then converts these strings into RDKit molecule objects (`Mol`). This makes it possible to manipulate the molecules programmatically and to use the RDKit library to analyze and visualize chemical structures.

In [None]:
molecule_visualization.draw_2D(THC_smiles)

In [None]:
molecule_visualization.draw_2D(CBD_smiles)

This line of code calls the `draw_2D` method on the `molecule_visualization` object, giving it the SMILES string of the desired molecule.

The challenges for these functions were to learn how to manipulate `sklearn.metrics`.

In [None]:
molecule_visualization.draw_3D(THC_mol,"THC.pdb")

In [None]:
molecule_visualization.draw_3D(CBD_mol,"CBD.pdb")

This function generates a 3D representation of THC and CBD and saves it in a PDB file.

This line of code calls the `draw_3D` method on the `molecule_visualization` object, passing it two arguments. For example:

- `THC_mol`: the molecule object representing THC.
- `"THC.pdb"`: the name of the file in which the molecule's 3D structure will be saved.
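As a rough illustration of what such a function might do internally, the sketch below builds 3D coordinates with RDKit and writes them to a PDB file. This is an assumption about the approach, not the actual `draw_3D` implementation: the helper name `save_3d_pdb` and the output file name are hypothetical, and the nglview display step is left out.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def save_3d_pdb(mol: Chem.Mol, filename: str, seed: int = 42) -> Chem.Mol:
    """Embed a molecule in 3D, relax it, and write it to a PDB file."""
    mol_3d = Chem.AddHs(mol)  # explicit hydrogens give better geometries
    # EmbedMolecule returns -1 when no conformer could be generated.
    if AllChem.EmbedMolecule(mol_3d, randomSeed=seed) == -1:
        raise RuntimeError("3D embedding failed")
    AllChem.MMFFOptimizeMolecule(mol_3d)  # quick force-field relaxation
    Chem.MolToPDBFile(mol_3d, filename)
    return mol_3d


thc_mol = Chem.MolFromSmiles("CCCCCC1=CC(=C2C3C=C(CCC3C(OC2=C1)(C)C)C)O")
thc_3d = save_3d_pdb(thc_mol, "THC_sketch.pdb")
print(thc_3d.GetNumAtoms(), "atoms written to THC_sketch.pdb")
```

Fixing `randomSeed` makes the generated coordinates reproducible from one run to the next.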

The challenges for these functions were to learn how to manipulate nglview.

References : ChatGPT, Chemdraw

In [None]:
cas_number = '1972-08-3'  # CAS number for THC

properties_data, chemical_url = molecule_properties.fetch_chemical_properties_by_cas(cas_number)

display(properties_data) 
print(f"URL to chemical page: {chemical_url}")

This code uses the `molecule_properties` module, which provides a method for retrieving chemical properties from a CAS number.

The CAS number for THC is defined and the function `fetch_chemical_properties_by_cas` is called with this number. It returns a dictionary containing the chemical properties of THC and a URL to a web page with more information. `display` displays the chemical properties in a table. The URL is printed to provide a direct link to further information.

The CAS number is a unique, unambiguous identifier for a specific molecule. It links all available data and research on this substance.
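A CAS number also carries its own consistency check: the last digit is a check digit, equal to the weighted sum of the other digits (weights 1, 2, 3, ... counted from the right) modulo 10. The sketch below applies that rule; the helper name `is_valid_cas` is hypothetical and is not part of `hempy`.

```python
import re


def is_valid_cas(cas_number: str) -> bool:
    """Return True if the string is a well-formed CAS number with a correct check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_number)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


print(is_valid_cas("1972-08-3"))  # CAS number for THC used above
print(is_valid_cas("1972-08-4"))  # same number with a wrong check digit
```

For `1972-08-3` the weighted sum is 8 + 0 + 6 + 28 + 45 + 6 = 93, and 93 modulo 10 gives the check digit 3. A check like this could catch a typo before any request is sent.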

The main challenge of this function was to send web requests from within the Jupyter environment in order to retrieve the desired page and its URL.

In [None]:
similarity_jaccard, similarity_tanimoto = molecule_properties.calculate_molecular_similarity(THC_mol, CBD_mol)

print("Similarity (Jaccard):", similarity_jaccard)
print("Similarity (Tanimoto):", similarity_tanimoto)

This line calls a method for calculating the Jaccard and Tanimoto similarities between two molecules, in this case THC and CBD. It stores the results in the variables `similarity_jaccard` and `similarity_tanimoto`, which are then displayed.

Molecular similarity is a measure used in computational chemistry to assess how similar two molecules are. Jaccard and Tanimoto indices are often used for this purpose. The closer the index is to 1, the greater the similarity between two molecules.
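For binary fingerprints, the Tanimoto coefficient and the Jaccard index are the same quantity: the number of bits set in both fingerprints divided by the number of bits set in at least one. The sketch below shows that formula on plain Python sets. The function name `tanimoto` and the bit positions are made up for illustration; they are not the fingerprints of THC or CBD, and this is not necessarily how `calculate_molecular_similarity` is implemented.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit positions."""
    union = bits_a | bits_b
    if not union:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints, NOT computed from THC or CBD.
fp_a = {1, 4, 7, 9, 12}
fp_b = {1, 4, 8, 9, 15}

print(tanimoto(fp_a, fp_b))  # 3 shared bits out of 7 distinct bits
print(tanimoto(fp_a, fp_a))  # identical fingerprints give 1.0
```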

The red message that appears is not an error message, as you might think, but a warning. It informs us that the boolean values we supplied were converted for the Jaccard calculation, and it serves to flag that a type change was made, in case the new type is not suitable for later functions.

References : geeksforgeeks.org : https://www.geeksforgeeks.org/how-to-calculate-jaccard-similarity-in-python/ and stackoverflow.com : https://stackoverflow.com/questions/50683128/python-how-to-compute-the-jaccard-index-between-two-networks

In [None]:
synthesis_reactions.visualize_THC_synthesis()

In [None]:
synthesis_reactions.visualize_CBD_synthesis()

These lines of code call methods for generating and displaying a graphical representation of the THC and CBD chemical synthesis processes.

For this, RDKit's `Chem` and `Draw` modules are used to manipulate and draw molecule objects. Each step in the synthesis is a dictionary with reactants and products represented by their SMILES strings, which are converted into molecule objects and then displayed as images.
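The step-as-dictionary idea can be sketched without any drawing code. The key names (`name`, `reactants`, `products`) and the helper `to_reaction_smiles` are assumptions, and the single step shown, the acid-catalyzed cyclization of CBD into THC, is only an illustration built from the SMILES strings defined earlier in this notebook; it may differ from the steps stored in `synthesis_reactions`.

```python
# One synthesis step: reactant and product SMILES stored in a dictionary.
# Illustrative only; the real steps in the package may be different.
steps = [
    {
        "name": "CBD cyclization (illustrative)",
        "reactants": ["CCCCCC1=CC(=C(C(=C1)O)C2C=C(CCC2C(=C)C)C)O"],
        "products": ["CCCCCC1=CC(=C2C3C=C(CCC3C(OC2=C1)(C)C)C)O"],
    },
]


def to_reaction_smiles(step: dict) -> str:
    """Join reactants and products into 'reactants>>products' reaction SMILES."""
    return ".".join(step["reactants"]) + ">>" + ".".join(step["products"])


for step in steps:
    print(step["name"], "->", to_reaction_smiles(step))
```

Keeping each step as plain data means the drawing code only has to loop over the list, so adding a step does not require touching the visualization logic.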

The visualization is useful to better understand the steps involved in THC and CBD synthesis.

This code is limited to showing only these specific reactions. Writing a general function able to display every possible chemical reaction would be far too complicated, if not impossible.

In [None]:
survey_results.plot_consumption_percentage()

This line of code calls a method for generating and displaying a graph representing the consumption percentages obtained from the results of a survey. 

The `matplotlib.pyplot` library is imported under the name `plt` to create graphs. The `SurveyResults` class is initialized with survey data stored in `self.data`, and `plot_consumption_percentage` generates a bar chart representing consumption percentages for different categories. A `data` dictionary contains the consumption categories and their respective percentages.
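A minimal sketch of a class with this shape is given below. It follows the description above but is not the package's actual code: the category names and percentages are placeholders, not the survey's real results, and the axis labels are assumptions.

```python
import matplotlib

# Non-interactive backend so the sketch runs without a display;
# drop this line inside Jupyter to keep inline plots.
matplotlib.use("Agg")
import matplotlib.pyplot as plt


class SurveyResults:
    """Hold survey percentages and plot them as a bar chart."""

    def __init__(self, data: dict[str, float]):
        self.data = data  # category name -> percentage of respondents

    def plot_consumption_percentage(self):
        fig, ax = plt.subplots()
        ax.bar(list(self.data.keys()), list(self.data.values()))
        ax.set_ylabel("Percentage of respondents (%)")
        ax.set_title("Consumption by category")
        return ax


# Placeholder values for illustration only, NOT the survey's real results.
example = SurveyResults({"Never": 50.0, "Occasionally": 30.0, "Regularly": 20.0})
ax = example.plot_consumption_percentage()
ax.figure.savefig("consumption_sketch.png")
```

Returning the axes object rather than calling `plt.show()` inside the method lets the caller adjust labels or save the figure afterwards.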

These analyses provide a better understanding of participants' THC and CBD consumption trends and behaviors.

In [None]:
survey_results.plot_time_vs_age()

The code line `survey_results.plot_time_vs_age()` calls a method for generating and displaying a graph of time versus age of participants.
The `SurveyResults` class is initialized with lists of ages `ages` and times `times`.
The `plot_time_vs_age` method generates a scatter plot of time spent versus age of participants.

This analysis allows us to present the relationships between two quantitative variables, providing a better understanding of how time spent varies with the age of the participants.

This survey does not explain why a 24-year-old has a longer effect duration than others. It only serves to observe the results. To interpret it, biological research or other tests on a larger population could be carried out, which is not our task here. 

The challenges for these functions were to learn how to manipulate the Beautiful Soup and Requests libraries.

References : ChatGPT, geeksforgeeks.org