# Getting Started with PubChemPy
This notebook is designed to introduce you to PubChemPy, a library for working with the [PubChem](https://www.example.com) resource. To use PubChemPy, you'll need to either run the command

```pip install pubchempy```

on your command line or use the command

```!pip install pubchempy```

in the first coding cell in this notebook.

In [None]:
!pip install pubchempy

Installing the library on your computer is not enough. You also need to import it so that the Jupyter notebook can access it.

In [None]:
import pubchempy as pcp

We are only going to look at a few of the things you can do with PubChemPy, which accesses the [PubChem database](https://pubchem.ncbi.nlm.nih.gov/). We'll learn:
1. How to access a molecule using its PubChem compound ID (CID)
2. How to access a molecule by name
3. Some of the things we can learn about the molecule once we have its CID
4. How to visualize the molecule

We'll start by looking at a molecule called NAD+, which I worked with almost every day in graduate school. Its compound ID (CID) is 5892, and it looks like this:

![2D image of NAD+](images/NAD.png "The 2D structure of redox cofactor NAD+")
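
The code cells that follow use aspirin (CID 2244), but the same kind of lookup should work for NAD+. The following is a minimal sketch rather than part of the original notebook: `summarize` and `fetch_summary` are hypothetical helper names, and the attributes read from the compound (`cid`, `iupac_name`, `molecular_formula`, `molecular_weight`) are the PubChemPy `Compound` properties used elsewhere in this notebook. The lookup itself needs a network connection, so it is wrapped in a function instead of being run immediately.

```python
def summarize(compound):
    """Collect a few descriptive fields from a compound-like object."""
    return {
        "cid": compound.cid,
        "iupac_name": compound.iupac_name,
        "molecular_formula": compound.molecular_formula,
        "molecular_weight": compound.molecular_weight,
    }


def fetch_summary(cid):
    """Download one compound from PubChem by CID and summarize it (needs network)."""
    import pubchempy as pcp  # imported here so summarize() works without it

    return summarize(pcp.Compound.from_cid(cid))


# In the notebook you would run, for example:
# fetch_summary(5892)   # 5892 is the CID given above for NAD+
```

Keeping the summary step separate from the download makes it possible to try `summarize` on any object that has the same attributes, without contacting PubChem.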

In [None]:
pcp.get_compounds('aspirin', 'name', record_type='3d')

In [None]:
molecule = pcp.Compound.from_cid(2244)

In [None]:
print(molecule.molecular_weight)

In [None]:
print(molecule.iupac_name)
print(molecule.molecular_formula)
# print(molecule.synonyms)

In [None]:
# Visualize aspirin in 3D using three linked viewers side by side.
# Assumes py3Dmol is installed (for example: !pip install py3Dmol).

import py3Dmol
view = py3Dmol.view(width = 680, height = 250, query ='cid:2244', viewergrid = (1,3), linked = True)

view.setStyle({'line': {'linewidth': 8}}, viewer = (0,0))
view.setStyle({'stick': {'colorscheme':'cyanCarbon'}}, viewer = (0,1))
view.setStyle({'sphere': {}}, viewer = (0,2))

view.setBackgroundColor('#ebf4fb', viewer = (0,0))
view.setBackgroundColor('#cda9fc', viewer = (0,1))
view.setBackgroundColor('#e6e6e6', viewer = (0,2))

## Lipinski's Rule of 5

We can use PubChemPy (imported above as `pcp`) to retrieve the properties needed to evaluate Lipinski's rule of 5 for a compound directly from the PubChem database.
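
For reference, Lipinski's rule of 5 flags a compound as less likely to be orally active when it has more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, a molecular weight above 500 Da, or a logP above 5. The sketch below shows one way those thresholds could be checked against a property record. It is an illustration under assumptions: `lipinski_violations` is a hypothetical helper, the record is typed in by hand rather than downloaded, and PubChem's computed `XLogP` is used as a stand-in for logP.

```python
def lipinski_violations(record):
    """Return the rule-of-5 criteria that a property record breaks.

    `record` is a dict shaped like one entry of the list returned by
    pcp.get_properties(...) when as_dataframe is not set.
    """
    # Each check is (property name, upper limit, description of the violation).
    checks = [
        ("HBondDonorCount", 5, "more than 5 hydrogen-bond donors"),
        ("HBondAcceptorCount", 10, "more than 10 hydrogen-bond acceptors"),
        ("MolecularWeight", 500, "molecular weight above 500 Da"),
        ("XLogP", 5, "logP above 5"),
    ]
    violations = []
    for key, limit, message in checks:
        value = record.get(key)
        if value is None:
            continue  # property missing for this compound; skip the check
        # float() is used because some values may arrive as strings.
        if float(value) > limit:
            violations.append(message)
    return violations


# Illustrative record typed in by hand (not downloaded from PubChem).
example = {
    "CID": 2244,
    "HBondDonorCount": 1,
    "HBondAcceptorCount": 4,
    "MolecularWeight": "180.16",
    "XLogP": 1.2,
}
print(lipinski_violations(example))  # prints []
```

An empty list means none of the four thresholds is exceeded for that record.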

In [None]:
# How could we make this more versatile?

# Store the drug name once so that every lookup below uses the same value
drug = 'aspirin'

HBA = pcp.get_properties(
  properties = 'HBondAcceptorCount',
  identifier = drug,
  namespace = "name"
  )
HBD = pcp.get_properties(
  properties = 'HBondDonorCount',
  identifier = drug,
  namespace = "name"
  )
MW = pcp.get_properties(
  properties = 'MolecularWeight',
  identifier = drug,
  namespace = "name"
  )
XLP = pcp.get_properties(
  properties = 'XLogP',
  identifier = drug,
  namespace = "name"
  )
print(HBA, '\n', HBD, '\n', MW, '\n', XLP)

In [None]:
# Create a list variable to hold all of the properties you want to explore
properties = ['HBondAcceptorCount', 'HBondDonorCount', 'MolecularWeight', 'XLogP']
properties2 = ['MolecularFormula', 'MolecularWeight', 'CanonicalSMILES', 'IsomericSMILES', 'InChI', 'InChIKey', 'IUPACName', 'XLogP', 'ExactMass', 'MonoisotopicMass', 'TPSA', 'Complexity', 'Charge', 'HBondDonorCount', 'HBondAcceptorCount', 'RotatableBondCount', 'HeavyAtomCount', 'IsotopeAtomCount', 'AtomStereoCount', 'DefinedAtomStereoCount', 'UndefinedAtomStereoCount', 'BondStereoCount', 'DefinedBondStereoCount', 'UndefinedBondStereoCount', 'CovalentUnitCount', 'Volume3D', 'XStericQuadrupole3D', 'YStericQuadrupole3D', 'ZStericQuadrupole3D', 'FeatureCount3D', 'FeatureAcceptorCount3D', 'FeatureDonorCount3D', 'FeatureAnionCount3D', 'FeatureCationCount3D', 'FeatureRingCount3D', 'FeatureHydrophobeCount3D', 'ConformerModelRMSD3D', 'EffectiveRotorCount3D', 'ConformerCount3D']

In [None]:
import pandas as pd
Lip5 = pcp.get_properties(properties, 'aspirin', 'name', as_dataframe = True)
Lip5

In [None]:
AllProps = pcp.get_properties(properties2, 'aspirin', 'name', as_dataframe = True)
AllProps

In [None]:
# Repeat this process for four more drugs
pen = pcp.get_properties(properties, 'penicillin', 'name', as_dataframe = True)
vioxx = pcp.get_properties(properties, 'vioxx', 'name', as_dataframe = True)
strep = pcp.get_properties(properties, 'streptomycin', 'name', as_dataframe = True)
doxy = pcp.get_properties(properties, 'doxycycline', 'name', as_dataframe = True)

In [None]:
result_df = pd.concat([Lip5, pen, vioxx, strep, doxy], ignore_index=True)
result_df

In [None]:
# make a list of drug names to add into a new dataframe column
drug_name = ['aspirin', 'penicillin', 'vioxx', 'streptomycin', 'doxycycline']

# add a new column to the dataframe with the drug names
result_df['name'] = drug_name
result_df

In [None]:
# reorder the columns
result_df = result_df[['name', 'MolecularWeight', 'XLogP', 'HBondDonorCount', 'HBondAcceptorCount']]
result_df
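
The copy-and-paste lookups above can also be written as a loop. The sketch below is one possible way to do that, not the notebook's own method: `collect_properties` and its `fetch` parameter are hypothetical names introduced here, and keeping only the first match for each name is an assumption made so that every drug contributes exactly one row. By default the helper calls `pcp.get_properties`, which needs a network connection.

```python
def collect_properties(names, properties, fetch=None):
    """Look up each name and return one property dict per name found."""
    if fetch is None:
        import pubchempy as pcp  # only needed for real lookups (uses the network)

        def fetch(name):
            # Without as_dataframe, get_properties returns a list of dicts.
            return pcp.get_properties(properties, name, "name")

    rows = []
    for name in names:
        records = fetch(name)
        if not records:
            continue  # no match for this name; leave it out
        row = dict(records[0])  # assumption: keep only the first match
        row["name"] = name
        rows.append(row)
    return rows


# In the notebook you would run, for example:
# import pandas as pd
# drug_names = ["aspirin", "penicillin", "vioxx", "streptomycin", "doxycycline"]
# props = ["MolecularWeight", "XLogP", "HBondDonorCount", "HBondAcceptorCount"]
# result_df = pd.DataFrame(collect_properties(drug_names, props))
```

Because the name is attached to each row as it is fetched, the names cannot drift out of step with the rows, which can happen when a separate list of names is assigned to the dataframe afterwards.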