# Project Proposal: PubChem Toolkit

## Problem Statement
Chemical research often requires access to comprehensive databases containing information about chemical compounds, their properties, and related data. However, accessing and retrieving such information can be cumbersome and time-consuming, especially when dealing with large datasets. Researchers, students, and professionals in chemistry and related fields may face challenges in efficiently querying and analyzing chemical data.

## Proposed Solution
The proposed solution is the "PubChem Toolkit," a Python package that provides easy access to the [PubChem database](https://pubchem.ncbi.nlm.nih.gov/docs/about). PubChem is a comprehensive chemical database maintained by the National Center for Biotechnology Information (NCBI) that contains information on small molecules and their biological activities. The PubChem Toolkit will offer functions to search for chemical compounds, retrieve compound properties, analyze chemical structures, and perform related tasks.
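As a rough illustration of what "retrieve compound properties" can look like at the HTTP level, the sketch below builds a PubChem PUG REST URL for a property lookup. The helper name `build_property_url`, its parameters, and the restricted set of namespaces are assumptions made for this example and are not part of the toolkit. The toolkit itself appears to request the full compound record, whereas this sketch targets the PUG REST `property` endpoint.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(identifier, namespace="name", properties=("MolecularWeight",)):
    """Hypothetical helper: build a PUG REST URL for a compound property lookup."""
    # Only a few namespaces are allowed here to keep the sketch small.
    if namespace not in {"name", "smiles", "cid"}:
        raise ValueError(f"unsupported namespace: {namespace!r}")
    # Percent-encode the identifier so characters such as "(" and "=" in a
    # SMILES string do not break the URL path.
    encoded = quote(str(identifier), safe="")
    prop_list = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/{namespace}/{encoded}/property/{prop_list}/JSON"


print(build_property_url("aspirin", properties=("MolecularFormula", "MolecularWeight")))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/property/MolecularFormula,MolecularWeight/JSON
```

The sketch only builds the URL and makes no network request. SMILES strings that contain `/` may need a different request style than a simple path segment.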


In [1]:
!pip install .

Processing d:\cmu_sem_2\creating scientific research software\v1\pubchem
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: pubchem-toolkit
  Building wheel for pubchem-toolkit (setup.py): started
  Building wheel for pubchem-toolkit (setup.py): finished with status 'done'
  Created wheel for pubchem-toolkit: filename=pubchem_toolkit-0.0.1-py3-none-any.whl size=1471 sha256=4516490d1c45100f8114537c54a23df22ca56517c4d45df83f968a9bd8a1753b
  Stored in directory: C:\Users\Sriram\AppData\Local\Temp\pip-ephem-wheel-cache-fsnrh_00\wheels\7e\15\9d\1f9b5bc4238b74d7fde7b3ebab2f56670cf3d833b7679c1809
Successfully built pubchem-toolkit
Installing collected packages: pubchem-toolkit
  Attempting uninstall: pubchem-toolkit
    Found existing installation: pubchem-toolkit 0.0.1
    Uninstalling pubchem-toolkit-0.0.1:
      Successfully uninstalled pubchem-toolkit-0.0.1
Successfully installed pubchem-toolkit-0.0




## Package structure

In [2]:
# !pip install custom-tree

In [3]:
!custom-tree

├── 📂 build
│   └── 📂 bdist.win-amd64
├── 📂 pubchem_toolkit
│   ├── 📄 compound_search.py
│   ├── 📄 __innit__.py
│   └── 📂 __pycache__
│       ├── 📄 compound_search.cpython-311.pyc
│       └── 📄 utils.cpython-311.pyc
├── 📂 pubchem_toolkit.egg-info
│   ├── 📄 dependency_links.txt
│   ├── 📄 PKG-INFO
│   ├── 📄 requires.txt
│   ├── 📄 SOURCES.txt
│   └── 📄 top_level.txt
├── 📄 README.md
├── 📄 run.ipynb
├── 📄 setup.py
└── 📂 tests
    ├── 📄 tests_utils.py
    └── 📂 __pycache__
        └── 📄 tests_utils.cpython-311.pyc

Total directories 📂: 7
Total files 📄: 14



## Code Quality and Testing

### Black

Black is a Python code formatter that automatically rewrites source files according to a fixed set of rules. Its style is largely consistent with the PEP 8 style guide, and it produces code with uniform indentation, line breaks, and spacing.

Black is deliberately opinionated and exposes very few configuration options (line length is the main one). It focuses on producing code that is easy to read, while minimizing stylistic differences between developers or teams.

Using Black can help improve code quality, maintainability, and collaboration by enforcing a consistent coding style across projects. It's widely used in the Python community and is integrated with various development tools and workflows.
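To make the idea of "formatting without changing behavior" concrete, the sketch below compares a loosely formatted snippet with the layout Black would be expected to produce for it. Both strings were written by hand for this example and were not generated by running Black. The function `weight` and the helper `same_program` are illustrative names only. The comparison uses the standard-library `ast` module to confirm that the two versions parse to the same program.

```python
import ast

# Hand-written input with inconsistent spacing and single quotes.
unformatted = "def weight(mass,count = 1):\n    return {'total':mass*count}\n"

# The layout Black would be expected to produce for the same code
# (written by hand here, not generated by the tool).
formatted = 'def weight(mass, count=1):\n    return {"total": mass * count}\n'


def same_program(first, second):
    """Return True when two source strings parse to the same syntax tree."""
    return ast.dump(ast.parse(first)) == ast.dump(ast.parse(second))


print(same_program(unformatted, formatted))  # True: only the layout differs
```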

In [4]:
!black .

All done! ✨ 🍰 ✨
5 files left unchanged.


### Unittest

`unittest` is a testing framework used to write and run test cases for Python code. It is part of the Python Standard Library, so no additional packages need to be installed to use it.

With `unittest`, you can define test cases by creating subclasses of `unittest.TestCase`. Each test case typically consists of one or more methods that start with the word `test`. These methods are automatically executed when you run your test suite.

`unittest` provides various assertion methods to check whether the expected behavior of your code matches the actual behavior. Some common assertion methods include `assertEqual`, `assertTrue`, `assertFalse`, `assertRaises`, etc.
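The pattern described above can be sketched as follows. The helper `pick_property` is a hypothetical stand-in written for this example and is not a function from the toolkit. The test class exercises it with `assertEqual`, `assertIsNone`, and `assertRaises`. The `run_tests` function is likewise an assumption of this sketch, added so the suite can be run from a notebook cell.

```python
import unittest


def pick_property(properties, name):
    """Hypothetical helper: return one property value, or None if it is missing."""
    return properties.get(name)


class PickPropertyTests(unittest.TestCase):
    def setUp(self):
        # Values taken from the aspirin examples in this notebook.
        self.aspirin = {"Molecular Formula": "C9H8O4", "Molecular Weight": 180.16}

    def test_known_property(self):
        self.assertEqual(pick_property(self.aspirin, "Molecular Formula"), "C9H8O4")

    def test_missing_property_returns_none(self):
        self.assertIsNone(pick_property(self.aspirin, "Boiling Point"))

    def test_non_dict_input_raises(self):
        # None has no .get method, so the call raises AttributeError.
        with self.assertRaises(AttributeError):
            pick_property(None, "Molecular Weight")


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PickPropertyTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` in a notebook cell runs the three tests and returns a result object whose `wasSuccessful()` method reports whether all of them passed.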


In [6]:
!python -m unittest discover tests/

...
----------------------------------------------------------------------
Ran 3 tests in 3.685s

OK


## Manual testing

The cells below exercise the package's main functions against the live PubChem service.

In [7]:
# Now you can import modules from the pubchem_toolkit package
from pubchem_toolkit import compound_search

# Example usage with SMILES notation
smiles_notation = "CC(=O)OC1=CC=CC=C1C(=O)O"
property_name = "Molecular Weight"
property_value = compound_search.get_compound_properties(
    smiles_notation, "smiles", property_name
)

if property_value is not None:
    print(
        f"The {property_name} of the compound with SMILES '{smiles_notation}' is {property_value}"
    )
else:
    print(
        f"The property {property_name} for the compound with SMILES '{smiles_notation}' was not found."
    )

The Molecular Weight of the compound with SMILES 'CC(=O)OC1=CC=CC=C1C(=O)O' is 180.16


In [8]:
compound_name = "Aspirin"
property_name = "Molecular Weight"
property_value = compound_search.get_compound_properties(
    compound_name, "name", property_name
)

if property_value is not None:
    print(f"The {property_name} of the compound '{compound_name}' is {property_value}")
else:
    print(
        f"The property {property_name} for the compound '{compound_name}' was not found."
    )

The Molecular Weight of the compound 'Aspirin' is 180.16


In [9]:
compound_name = "Aspirin"
property_name = "Molecular Formula"
property_value = compound_search.get_compound_properties(
    compound_name, "name", property_name
)

if property_value is not None:
    print(f"The {property_name} of the compound '{compound_name}' is {property_value}")
else:
    print(
        f"The property {property_name} for the compound '{compound_name}' was not found."
    )

The Molecular Formula of the compound 'Aspirin' is C9H8O4


In [10]:
# Retrieve all available properties in a single call
compound_name = "aspirin"
properties = compound_search.get_compound_properties(compound_name)
print("Compound Properties:")
for key, value in properties.items():
    print(f"{key}: {value}")

Compound Properties:
Compound: 1
Compound Complexity: 212
Count: 3
Fingerprint: None
IUPAC Name: 2-acetoxybenzoic acid
InChI: InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)
InChIKey: BSYNRYMUTXBXSQ-UHFFFAOYSA-N
Log P: 1.2
Mass: 180.04225873
Molecular Formula: C9H8O4
Molecular Weight: 180.16
SMILES: CC(=O)OC1=CC=CC=C1C(=O)O
Topological: 63.6
Weight: 180.04225873


In [11]:
# Misspelled compound name: the lookup is expected to fail
compound_name = "aspirinn"
properties = compound_search.get_compound_properties(compound_name)
print("Compound Properties:")
for key, value in properties.items():
    print(f"{key}: {value}")

ERROR:root:Failed to retrieve compound properties: 404 Client Error: PUGREST.NotFound for url: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirinn/json


HTTPError: 404 Client Error: PUGREST.NotFound for url: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirinn/json
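The misspelled name above ends in an unhandled `HTTPError`. One possible way for calling code to cope with such failures is sketched below. The wrapper `lookup_or_none` and the offline stand-in `fake_lookup` are hypothetical names introduced for this example and are not part of the toolkit. In real use, the `errors` argument would presumably be the HTTP error type raised by the underlying HTTP client.

```python
import logging

logger = logging.getLogger(__name__)


def lookup_or_none(lookup, identifier, errors=(Exception,)):
    """Hypothetical wrapper: call lookup(identifier), returning None on failure."""
    try:
        return lookup(identifier)
    except errors as exc:
        logger.warning("Lookup failed for %r: %s", identifier, exc)
        return None


def fake_lookup(name):
    """Offline stand-in for a real PubChem lookup so the sketch runs without a network."""
    known = {"aspirin": {"Molecular Formula": "C9H8O4"}}
    if name not in known:
        raise LookupError(f"no compound named {name!r}")
    return known[name]


for name in ("aspirin", "aspirinn"):
    properties = lookup_or_none(fake_lookup, name, errors=(LookupError,))
    if properties is None:
        print(f"No properties found for '{name}'.")
    else:
        for key, value in properties.items():
            print(f"{key}: {value}")
```

Returning `None` matches the `is not None` checks used in the earlier cells, so a failed lookup takes the same "not found" branch instead of stopping the notebook.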