# Getting Started with PubChemPy
This notebook is designed to introduce you to PubChemPy, a Python library for working with the [PubChem](https://www.example.com) resource. To use PubChemPy, you'll first need to install it: either use the command

```pip install pubchempy```

on your command line or use the command

```!pip install pubchempy```

in the first coding cell in this notebook.

In [1]:
!pip install pubchempy



Having the library installed on your computer is not enough. You also need to import it so that the Jupyter notebook can access it.

In [2]:
import pubchempy as pcp

We are just going to look at a few things that you can do with PubChemPy, which accesses the [PubChem database](https://pubchem.ncbi.nlm.nih.gov/). We'll learn:
1. How to access a molecule using its compound ID (CID).
2. How to access a molecule by name.
3. Some of the things we can learn about the molecule once we have its CID.
4. How to visualize the molecule.

We'll start by looking at a molecule called NAD+, which I worked with almost every day in graduate school. It looks like this, and its compound ID (CID) is 5892.

![2D image of NAD+](images/NAD.png "The 2D structure of redox cofactor NAD+")

In [4]:
molecule = pcp.Compound.from_cid(5892)

In [5]:
print(molecule.molecular_weight)

663.4


In [None]:
# pcp.get_compounds('<enter the name of a compound here>', 'name', record_type='3d')

In [6]:
print(molecule.iupac_name)
print(molecule.molecular_weight)
print(molecule.molecular_formula)
print(molecule.synonyms)

[[(2R,3S,4R,5R)-5-(6-aminopurin-9-yl)-3,4-dihydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl] [(2R,3S,4R,5R)-5-(3-carbamoylpyridin-1-ium-1-yl)-3,4-dihydroxyoxolan-2-yl]methyl phosphate
663.4
C21H27N7O14P2
['nadide', '53-84-9', 'coenzyme I', 'NAD+', 'beta-NAD', 'beta-nicotinamide adenine dinucleotide', 'Codehydrogenase I', 'diphosphopyridine nucleotide', 'Codehydrase I', 'nicotinamide adenine dinucleotide', 'beta-Diphosphopyridine nucleotide', 'Cozymase I', 'beta-NAD+', 'COZYMASE', 'Enzopride', 'Nadidum', 'Nicotinamide-adenine dinucleotide', 'NAD', 'codehydrogenase', 'Nadida', 'CO-I', 'Nicotinamide dinucleotide', 'Pyridine, nucleotide diphosphate', 'Adenine-nicotinamide dinucleotide', 'Nicotineamide adenine dinucleotide', 'Nadidum [INN-Latin]', 'NAD zwitterion', 'Nadida [INN-Spanish]', 'CO-1', 'beta-DPN', '.beta.-NAD', 'Nadide [USAN:INN:BAN:JAN]', 'EINECS 200-184-4', 'MFCD00036253', 'NSC 20272', 'BRN 3584133', '0U46U6E8UK', 'NADIDE [USAN]', 'NADIDUM [HPUS]', 'NADIDE [INN]', 'NADIDE [JAN]', 
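As a quick sanity check on the values printed above, the molecular weight can be recomputed from the molecular formula. The sketch below is not part of PubChemPy: `formula_weight` and `ATOMIC_MASS` are hypothetical helpers written for this notebook, the atomic masses are standard values rounded to three decimals, and the parser only handles simple formulas without parentheses or charges.

```python
import re

# Standard atomic masses (g/mol), rounded to three decimals. Only the
# elements that appear in NAD+ are listed (an assumption of this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}


def formula_weight(formula):
    """Sum atomic masses for a simple formula such as 'C21H27N7O14P2'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


nad_weight = formula_weight("C21H27N7O14P2")
print(round(nad_weight, 1))  # 663.4, matching molecule.molecular_weight above
```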

In [7]:
# Visualize NAD+ (CID 5892) in 3D

import py3Dmol
view = py3Dmol.view(width = 680, height = 250, query ='cid:5892', viewergrid = (1,3), linked = True)

view.setStyle({'line': {'linewidth': 8}}, viewer = (0,0))
view.setStyle({'stick': {'colorscheme':'cyanCarbon'}}, viewer = (0,1))
view.setStyle({'sphere': {}}, viewer = (0,2))

view.setBackgroundColor('#ebf4fb', viewer = (0,0))
view.setBackgroundColor('#cda9fc', viewer = (0,1))
view.setBackgroundColor('#e6e6e6', viewer = (0,2))

<py3Dmol.view at 0x14407d460>

## Lipinski's Rule of 5

We can use PubChemPy (imported as `pcp`) to retrieve the values needed to evaluate Lipinski's rule of 5 for a compound directly from the PubChem database.

In [8]:
HBA = pcp.get_properties(
  properties = 'HBondAcceptorCount',
  identifier = "aspirin",
  namespace = "name"
  )
HBD = pcp.get_properties(
  properties = 'HBondDonorCount',
  identifier = "aspirin",
  namespace = "name"
  )
MW = pcp.get_properties(
  properties = 'MolecularWeight',
  identifier = "aspirin",
  namespace = "name"
  )
XLP = pcp.get_properties(
  properties = 'XLogP',
  identifier = "aspirin",
  namespace = "name"
  )
print(HBA, '\n', HBD, '\n', MW, '\n', XLP)

[{'CID': 2244, 'HBondAcceptorCount': 4}] 
 [{'CID': 2244, 'HBondDonorCount': 1}] 
 [{'CID': 2244, 'MolecularWeight': '180.16'}] 
 [{'CID': 2244, 'XLogP': 1.2}]
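With these four values in hand, the rule itself can be checked in plain Python. The following is a minimal sketch, assuming the usual thresholds (at most 5 hydrogen bond donors, at most 10 acceptors, molecular weight no greater than 500, and logP no greater than 5) and using the computed XLogP as a stand-in for logP. `lipinski_violations` is a hypothetical helper, not a PubChemPy function, and the values are copied by hand from the output above. Note that the molecular weight came back as a string, so it is converted with `float()` before comparing.

```python
# Values copied from the output above for aspirin (CID 2244).
aspirin = {
    "HBondAcceptorCount": 4,
    "HBondDonorCount": 1,
    "MolecularWeight": "180.16",  # returned as a string in the output above
    "XLogP": 1.2,
}


def lipinski_violations(props):
    """Return the list of Lipinski criteria that a compound fails."""
    checks = {
        "HBondDonorCount <= 5": props["HBondDonorCount"] <= 5,
        "HBondAcceptorCount <= 10": props["HBondAcceptorCount"] <= 10,
        "MolecularWeight <= 500": float(props["MolecularWeight"]) <= 500,
        "XLogP <= 5": float(props["XLogP"]) <= 5,
    }
    return [rule for rule, passed in checks.items() if not passed]


violations = lipinski_violations(aspirin)
print(violations)  # an empty list means no criterion is violated
```

Returning the names of the failed criteria, rather than a single True/False, makes it easy to see why a compound falls outside the rule.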


In [10]:
# Request several properties at once and return them as a pandas DataFrame
properties = ['HBondAcceptorCount', 'HBondDonorCount', 'MolecularWeight', 'XLogP']

In [13]:
a = pcp.get_properties(properties, 'THC', 'name', as_dataframe = True)

In [14]:
a

| CID | MolecularWeight | XLogP | HBondDonorCount | HBondAcceptorCount |
|---|---|---|---|---|
| 16078 | 314.5 | 7 | 1 | 2 |
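To gather the same properties for several compounds, one option is to loop over a list of names and collect the records that `pcp.get_properties` returns. This is a sketch under stated assumptions: `collect_properties` and its `fetch` parameter are hypothetical names introduced here, and `fetch` exists only so the function can be tried with a stand-in when PubChem is not reachable.

```python
def collect_properties(names, properties, fetch=None):
    """Return one row (a dict) per PubChem match for each compound name.

    `fetch` is a hypothetical hook for offline testing; when it is omitted,
    pubchempy.get_properties is used, which needs network access.
    """
    if fetch is None:
        import pubchempy as pcp  # imported lazily so the hook works offline
        fetch = pcp.get_properties
    rows = []
    for name in names:
        # get_properties returns a list of dicts, one per matching CID.
        for record in fetch(properties, name, "name"):
            rows.append({"Name": name, **record})
    return rows
```

With network access, `rows = collect_properties(['aspirin', 'THC'], properties)` followed by `pandas.DataFrame(rows).set_index('CID')` should give one table row per compound.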
