# Project Proposal: PubChem Toolkit


In [2]:
!pip install .

Processing d:\cmu_sem_2\creating scientific research software\pubchem v2
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: pubchem-toolkit
  Building wheel for pubchem-toolkit (setup.py): started
  Building wheel for pubchem-toolkit (setup.py): finished with status 'done'
  Created wheel for pubchem-toolkit: filename=pubchem_toolkit-0.0.2-py3-none-any.whl size=1474 sha256=c4eade0417731e85fa64eda9acfc29d59f496fb71b1cf2cd11b67e2b3bc0e078
  Stored in directory: C:\Users\Sriram\AppData\Local\Temp\pip-ephem-wheel-cache-kx3nl668\wheels\af\75\31\772358da26a50fdcc74f561c5c05c89ab3de1d4ddbdde8aa63
Successfully built pubchem-toolkit
Installing collected packages: pubchem-toolkit
  Attempting uninstall: pubchem-toolkit
    Found existing installation: pubchem-toolkit 0.0.2
    Uninstalling pubchem-toolkit-0.0.2:
      Successfully uninstalled pubchem-toolkit-0.0.2
Successfully installed pubchem-toolkit-0.0



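## Package structure

The package marker files in `pubchem_toolkit` and `tests_pytest` appear as `__innit__.py` in the listing below. Python only recognizes `__init__.py`, so these directories are treated as namespace packages, and `setuptools.find_packages()` (if `setup.py` relies on it) will not include them in the built wheel. Renaming the files to `__init__.py` avoids both issues.
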
In [3]:
!custom-tree

├── 📂 build
│   └── 📂 bdist.win-amd64
├── 📂 pubchem_toolkit
│   ├── 📄 compound_search.py
│   ├── 📄 __innit__.py
│   └── 📂 __pycache__
│       ├── 📄 auto_complete.cpython-311.pyc
│       ├── 📄 compound_search.cpython-311.pyc
│       └── 📄 utils.cpython-311.pyc
├── 📂 pubchem_toolkit.egg-info
│   ├── 📄 dependency_links.txt
│   ├── 📄 PKG-INFO
│   ├── 📄 requires.txt
│   ├── 📄 SOURCES.txt
│   └── 📄 top_level.txt
├── 📄 README.md
├── 📄 run.ipynb
├── 📄 setup.py
├── 📂 tests_pytest
│   ├── 📄 test_compound_search.py
│   ├── 📄 __innit__.py
│   └── 📂 __pycache__
│       ├── 📄 tests_autocomplete_search_test.cpython-311-pytest-8.1.1.pyc
│       ├── 📄 tests_compound_search_test.cpython-311-pytest-8.1.1.pyc
│       ├── 📄 test_autocomplete_search.cpython-311-pytest-8.1.1.pyc
│       └── 📄 test_compound_search.cpython-311-pytest-8.1.1.pyc
└── 📂 tests_unittest
    ├── 📄 tests_compound_search_test.py
    └── 📂 __pycache__
        ├── 📄 tests_autocomplete_search.cpython-311.pyc
        ├── 📄 tests_autocomple

## Code Quality and Testing

### Black

Black is an automatic code formatter for Python. It rewrites source files according to a fixed set of rules, producing consistent indentation, line breaks, quoting, and spacing in a style that is largely compliant with the PEP 8 style guide (its default line length is 88 characters rather than PEP 8's 79).

Black is deliberately opinionated and exposes only a few configuration options, such as the line length and the target Python versions. Because all code ends up in the same style, reviews and diffs focus on actual changes instead of formatting differences between developers or teams.

Enforcing one style across a project improves readability and maintainability and makes collaboration easier. Black is widely used in the Python community and integrates with editors, pre-commit hooks, and continuous integration pipelines. Running `black .` formats the Python files under the current directory; the output below shows that all of them already conformed.

In [4]:
!black .

All done! ✨ 🍰 ✨
7 files left unchanged.


### Unittest

`unittest` is Python's built-in framework for writing and running test cases. It is part of the Python Standard Library, so no additional packages need to be installed.

With `unittest`, you define test cases by subclassing `unittest.TestCase`. Each test case contains one or more methods whose names start with `test`; the test runner discovers and executes these methods automatically.

`unittest.TestCase` provides assertion methods that compare the actual behavior of your code with the expected behavior, such as `assertEqual`, `assertTrue`, `assertFalse`, and `assertRaises`.
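A minimal sketch of the structure described above. `build_autocomplete_url` is a hypothetical helper written for this example, not a function from the package, and the URL layout it builds is an assumption modeled on PubChem's autocomplete REST service. Because the helper only builds a string, the tests run offline.

```python
import unittest
from urllib.parse import quote

# Assumed URL layout of PubChem's autocomplete service (not taken from the package).
BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/autocomplete"


def build_autocomplete_url(query: str, dictionary: str = "compound", limit: int = 10) -> str:
    """Hypothetical helper: build an autocomplete request URL for `query`."""
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # quote() percent-encodes spaces and other unsafe characters in the query.
    return f"{BASE_URL}/{dictionary}/{quote(query)}/json?limit={limit}"


class TestBuildAutocompleteUrl(unittest.TestCase):
    """Each method whose name starts with 'test' is run by the test runner."""

    def test_basic_url(self):
        self.assertEqual(
            build_autocomplete_url("caffeine", limit=6),
            f"{BASE_URL}/compound/caffeine/json?limit=6",
        )

    def test_query_is_percent_encoded(self):
        url = build_autocomplete_url("beta carotene", dictionary="compound")
        self.assertTrue("beta%20carotene" in url)
        self.assertFalse(" " in url)

    def test_invalid_input_raises(self):
        with self.assertRaises(ValueError):
            build_autocomplete_url("   ")
        with self.assertRaises(ValueError):
            build_autocomplete_url("aspirin", limit=0)
```

Saved in a file whose name starts with `test` inside `tests_unittest/`, these tests would be picked up by `python -m unittest discover tests_unittest`, since discovery matches `test*.py` by default.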


In [5]:
!python -m unittest discover tests_unittest

..
----------------------------------------------------------------------
Ran 2 tests in 3.144s

OK


### Pytest

Pytest is a third-party testing framework for Python that makes tests quicker to write and run. Compared with the built-in `unittest` module, it needs less boilerplate and adds several features that help with larger test suites.

With Pytest, you write tests using Python's plain `assert` statement, which keeps test code short and readable. Test functions do not have to live in a class: they can be standalone functions whose names start with `test`, or methods of a class whose name starts with `Test`.

Pytest offers various features that enhance the testing experience:

1. **Fixture Support**: Pytest provides a fixture mechanism for managing test dependencies and resources. Fixtures, declared with the `@pytest.fixture` decorator and requested by naming them as test-function arguments, set up preconditions for tests and can be reused across many test functions.

2. **Parametrization**: The `@pytest.mark.parametrize` decorator runs the same test with different inputs, so one test function can cover many cases without duplicated test code.

3. **Assertion Introspection**: Instead of a library of assertion methods, Pytest rewrites plain `assert` statements so that a failing assertion reports the values of the expressions involved. Helpers such as `pytest.raises` (for expected exceptions) and `pytest.approx` (for floating-point comparisons) cover the remaining cases.

4. **Plugins**: Pytest has a large ecosystem of plugins that extend its functionality, for example `pytest-cov` for code coverage, `pytest-html` for HTML test reports, and `pytest-mock` for mocking.

5. **Integration**: Pytest can also run existing `unittest`-style test cases, and its exit codes and JUnit XML reports (`--junitxml`) make it straightforward to use in continuous integration (CI) systems.

Overall, Pytest reduces the effort needed to write and run tests, which makes it easier to keep a codebase thoroughly tested and maintainable.
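The sketch below combines the first three features: a fixture supplying a canned response, a parametrized test, and plain `assert` statements. The helper `parse_suggestions` and the response layout (suggestions grouped under `dictionary_terms`) are assumptions for illustration, not the package's code; using canned data keeps the tests independent of the network.

```python
import pytest


def parse_suggestions(payload: dict, dictionary: str) -> list:
    """Hypothetical helper: return the suggestions for one dictionary, or []."""
    return list(payload.get("dictionary_terms", {}).get(dictionary, []))


@pytest.fixture
def sample_payload():
    # Canned response with an assumed layout, so the tests need no network access.
    return {
        "total": 3,
        "dictionary_terms": {"compound": ["caffeine", "L-cysteine", "Chlorine"]},
    }


def test_first_suggestion(sample_payload):
    # Plain assert: on failure, Pytest shows the values on both sides.
    assert parse_suggestions(sample_payload, "compound")[0] == "caffeine"


@pytest.mark.parametrize(
    "dictionary, expected_count",
    [("compound", 3), ("gene", 0), ("taxonomy", 0)],
)
def test_suggestion_counts(sample_payload, dictionary, expected_count):
    # Runs three times, once per (dictionary, expected_count) pair.
    assert len(parse_suggestions(sample_payload, dictionary)) == expected_count
```

In a file such as `tests_pytest/test_parse_sketch.py` (a hypothetical name), `python -m pytest tests_pytest/` would collect four tests from this code: one for `test_first_suggestion` and three from the parametrized `test_suggestion_counts`.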


In [6]:
!python -m pytest tests_pytest/

platform win32 -- Python 3.11.5, pytest-8.1.1, pluggy-1.5.0
rootdir: d:\CMU_Sem_2\Creating Scientific Research Software\PubChem V2
plugins: anyio-4.1.0
collected 4 items

tests_pytest\test_compound_search.py ....                                [100%]



## Manual testing

The cells below call the package's public functions directly to check that they behave as expected, including when the compound name is misspelled.

In [11]:
from pubchem_toolkit.compound_search import autocomplete_search

# Autocomplete a misspelled compound query; "caffeine" should appear among the suggestions
compounds = autocomplete_search("cafoeine", dictionary="compound", limit=6)
print("Compounds:", compounds)

Compounds: ['caffeine', 'L-cysteine', 'beta-carotene', 'Chlorine', 'CHOLINE CHLORIDE', 'S-Methyl-L-cysteine']


In [12]:
from pubchem_toolkit.compound_search import get_compound_properties_with_autocomplete


compound_name = "cafoeine"
property_name = "Molecular Weight"
property_value = get_compound_properties_with_autocomplete(compound_name, "name", property_name)

if property_value is not None:
    print(f"The {property_name} of the compound '{compound_name}' is {property_value}")
else:
    print(f"The property {property_name} for the compound '{compound_name}' was not found.")

The Molecular Weight of the compound 'cafoeine' is 194.19
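The notebook does not show how `get_compound_properties_with_autocomplete` queries PubChem. For a single property, PubChem's PUG REST interface offers a property endpoint of the form `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{name}/property/{Property}/JSON`; the sketch below assumes that endpoint and a `PropertyTable` → `Properties` response layout. `PROPERTY_NAMES`, `property_url`, `read_property`, and `get_property_sketch` are hypothetical names, and the mapping from display names such as "Molecular Weight" to PUG REST names is partial and assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed (partial) mapping from display names to PUG REST property names.
PROPERTY_NAMES = {
    "Molecular Weight": "MolecularWeight",
    "Molecular Formula": "MolecularFormula",
    "InChIKey": "InChIKey",
    "IUPAC Name": "IUPACName",
    "Log P": "XLogP",
}

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, display_name: str) -> str:
    """Build a PUG REST property URL for a compound name (raises KeyError if unmapped)."""
    pug_name = PROPERTY_NAMES[display_name]
    return f"{BASE}/compound/name/{quote(name)}/property/{pug_name}/JSON"


def read_property(payload: dict, display_name: str):
    """Return the property from the first record in the payload, or None if absent."""
    records = payload.get("PropertyTable", {}).get("Properties", [])
    if not records:
        return None
    return records[0].get(PROPERTY_NAMES[display_name])


def get_property_sketch(name: str, display_name: str, timeout: float = 10.0):
    """Hypothetical stand-in for a single-property lookup (requires network access)."""
    with urlopen(property_url(name, display_name), timeout=timeout) as response:
        return read_property(json.load(response), display_name)
```

For example, `get_property_sketch("caffeine", "Molecular Weight")` would request `.../compound/name/caffeine/property/MolecularWeight/JSON`. Passing the misspelled "cafoeine" directly would be expected to fail with an HTTP error because no compound has that name, which is presumably why the package resolves names through autocomplete first.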


In [14]:
from pubchem_toolkit.compound_search import get_compound_properties_with_autocomplete

compound_name = "cafoeine"
properties = get_compound_properties_with_autocomplete(compound_name)
print("Compound Properties:")
for key, value in properties.items():
    print(f"{key}: {value}")

Compound Properties:
Compound: 1
Compound Complexity: 293
Count: 0
Fingerprint: None
IUPAC Name: caffeine
InChI: InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
InChIKey: RYYVLZVUVIJVGH-UHFFFAOYSA-N
Log P: -0.1
Mass: 194.08037557
Molecular Formula: C8H10N4O2
Molecular Weight: 194.19
SMILES: CN1C=NC2=C1C(=O)N(C(=O)N2C)C
Topological: 58.4
Weight: 194.08037557
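Several keys in the output above are generic (`Compound`, `Count`, `Topological`, `Weight`), and `IUPAC Name` shows the traditional name "caffeine" rather than PubChem's preferred IUPAC name, 1,3,7-trimethylpurine-2,6-dione. That pattern would be consistent with reading PubChem's full compound record and keying each entry of its `props` list by `urn.label` alone: the record holds several entries per label (for instance, several `IUPAC Name` variants, and `Count` entries for hydrogen-bond donors, acceptors, and rotatable bonds), so later entries overwrite earlier ones. This is an inference from the output, not from the package source. The hypothetical `flatten_props` below keeps such entries apart by adding `urn.name` to the key; the record layout it expects is an assumption.

```python
def flatten_props(props: list[dict]) -> dict:
    """Map 'label (name)' -> value for each entry of a compound record's props list.

    Assumed entry layout: {"urn": {"label": ..., "name": ...}, "value": {"sval" | "fval" | "ival" | "binary": ...}}
    """
    flat = {}
    for prop in props:
        urn = prop.get("urn", {})
        label = urn.get("label", "?")
        name = urn.get("name")
        # Include the name when present so entries sharing a label do not collide.
        key = f"{label} ({name})" if name else label
        value = prop.get("value", {})
        # Take the first typed value found; binary values (e.g. fingerprints) are skipped.
        for field in ("sval", "fval", "ival"):
            if field in value:
                flat[key] = value[field]
                break
    return flat
```

Under the same assumption, it would be applied as `flatten_props(record["PC_Compounds"][0]["props"])`, where `record` is the parsed JSON returned by PUG REST for a compound (for example, `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/caffeine/JSON`).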


In [33]:
from pubchem_toolkit.compound_search import autocomplete_search

# Perform auto-complete search for compounds with a misspelled query for "aspirin"
compounds = autocomplete_search("aspirriiinnin", dictionary="compound", limit=6)
print("Compounds:", compounds)

# Perform auto-complete search for genes with the query "egfr"
genes = autocomplete_search("egfr", dictionary="gene", limit=5)
print("Genes:", genes)

# Perform auto-complete search for assays with the query "p68"
assays = autocomplete_search("p68", dictionary="assay", limit=8)
print("Assays:", assays)

# Perform auto-complete search for taxa with the query "mouse"
taxa = autocomplete_search("mouse", dictionary="taxonomy", limit=5)
print("Taxa:", taxa)

Compounds: ['Aspidinin', 'aspirin', 'NO-aspirin', 'Aspirin sodium', 'Aspirin DL-lysine', 'Aspirin calcium']
Genes: ['Egfr', 'egfra', 'egfrb', 'Egfros', 'Egfrap']
Assays: ['Chain A, RNA-directed RNA polymerase (NS5B) (P68) - Protein Target', 'Inhibition of HIV2 reverse transcriptase p68/p54 expressed in Escherichia coli - Assay Name', 'Inhibition of HIV2 reverse transcriptase p68/p54 expressed in Escherichia coli at 495 uM - Assay Name', 'Binding affinity to wild-type human partial length PRKCH (S324 to P683 residues) expressed in Escherichia coli BL21 at 1 uM by Kinomescan method relative to control - Assay Name', 'Inhibition of wild type human partial length PRKCH (S324 to P683 residues) expressed in bacterial expression system at 1 uM by Kinomescan method relative to control - Assay Name', 'Inhibition of wild-type human partial length PRKCH (S324 to P683 residues) expressed in bacterial expression system at 1 uM by Kinomescan method relative to control - Assay Name', 'Inhibition of w

## Conclusion

The PubChem Toolkit project aims to provide a Python package for easy access to chemical compound data from the PubChem database. As demonstrated above, it offers autocomplete search across PubChem's compound, gene, assay, and taxonomy dictionaries, and property lookups that still work when a compound name is misspelled. The code is formatted with Black and tested with both `unittest` and Pytest. Researchers, students, and professionals in chemistry and related fields can use the toolkit to streamline their workflows, spend less time on manual database queries, and focus on exploring and analyzing chemical data.
