# PubChemPy for the Bioinformatics Club
This notebook is designed to introduce you to PubChemPy, a Python library for working with the [PubChem](https://www.example.com) database. To use pubchempy, you'll need to either use the command

```pip install pubchempy```

on your command line or use the command

```!pip install pubchempy```

in the first coding cell in this notebook.

In [None]:
!pip install pubchempy

Once you have installed pubchempy on your computer, you'll need to import it to use it. The standard abbreviation for pubchempy is pcp.

In [2]:
import pubchempy as pcp
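
If the import fails, the package is probably not installed in the environment the notebook is running in. The check below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not part of pubchempy.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Report on the packages this notebook relies on.
for name in ("pubchempy", "pandas", "py3Dmol"):
    if is_installed(name):
        print(name, "-> available")
    else:
        print(name, "-> missing, try: pip install", name)
```

Running this before the import cell tells you which `pip install` commands you still need.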

Now let's play with it a bit. We're going to learn about the Compound object that pubchempy creates, starting with NAD+, a compound I worked with every day in graduate school. In the next cell, use the

```Compound.from_cid(compound#)```

command to pull NAD+ from PubChem.

In [3]:
molecule = pcp.Compound.from_cid(5287958)

Now we will explore the contents of the compound object, which can be extracted using the command

```molecule.trait```

where trait can be molecular_weight, molecular_formula, isomeric_smiles, xlogp, iupac_name, and synonyms. You can also select any trait from a menu if you type

```print(molecule.<tab>)```

where <tab> means pressing the Tab key so you can see all the options. Try a few.

In [4]:
print(molecule.molecular_weight)

663.4


In [5]:
print(molecule.molecular_formula)

C21H27N7O14P2


In [6]:
print(molecule.isomeric_smiles)

C1=C(C=[NH+]C=C1C(=O)N)[C@H]2[C@@H]([C@@H]([C@H](O2)COP(=O)([O-])OP(=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)N4C=NC5=C(N=CN=C54)N)O)O)O)O


In [7]:
print(molecule.xlogp)

-6.2


In [8]:
print(molecule.iupac_name)

[[(2R,3S,4R,5R)-5-(6-aminopurin-9-yl)-3,4-dihydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl] [(2R,3S,4R,5S)-5-(5-carbamoylpyridin-1-ium-3-yl)-3,4-dihydroxyoxolan-2-yl]methyl phosphate


In [9]:
print(molecule.synonyms)

['5-BETA-D-RIBOFURANOSYLNICOTINAMIDE ADENINE DINUCLEOTIDE', 'DB03020']
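
Printing traits one cell at a time gets repetitive. The sketch below gathers several traits in one pass with `getattr`. To stay runnable offline it uses a stand-in object filled with values copied from the outputs above; `summarize` and `stand_in` are hypothetical names, and `not_a_trait` is a deliberately missing attribute.

```python
from types import SimpleNamespace


def summarize(compound, traits):
    """Collect the named attributes into a dict, using None for any that are missing."""
    return {trait: getattr(compound, trait, None) for trait in traits}


# Stand-in for the Compound object, with values copied from the outputs above.
stand_in = SimpleNamespace(
    molecular_weight=663.4,
    molecular_formula="C21H27N7O14P2",
    xlogp=-6.2,
)

summary = summarize(
    stand_in, ["molecular_weight", "molecular_formula", "xlogp", "not_a_trait"]
)
for trait, value in summary.items():
    print(f"{trait}: {value}")
```

Because the helper only relies on attribute access, calling `summarize(molecule, [...])` on the real object should behave the same way.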


What if you don't know the PubChem CID for your compound of interest? pubchempy has a get_compounds function that addresses this by searching on another identifier, such as a molecular formula or a name.

In [14]:
results = pcp.get_compounds('C21H27N7O14P2', 'formula')
print(results)

[Compound(5892), Compound(5288979), Compound(21604869), Compound(444170), Compound(10897651), Compound(5289104), Compound(444215), Compound(127255362), Compound(16219771), Compound(925), Compound(24916815), Compound(72200610), Compound(9874504), Compound(25162925), Compound(111288), Compound(90663709), Compound(163190097), Compound(6604186), Compound(12358825), Compound(4231851), Compound(4349538), Compound(5287958), Compound(196623), Compound(134720244), Compound(46936879), Compound(146167235), Compound(16758169), Compound(5315996), Compound(134559577), Compound(90657086), Compound(138105875), Compound(146019234), Compound(154701110), Compound(154701119), Compound(45109817), Compound(129630323), Compound(86289063), Compound(6419894), Compound(45105095), Compound(169424640), Compound(3283972), Compound(23644209), Compound(44297758), Compound(44297952), Compound(46936557), Compound(46936558), Compound(46936878), Compound(59148228), Compound(59148240), Compound(71751003), Compound(891309

In [15]:
pcp.get_compounds('tylenol', 'name', record_type='3d')

[Compound(1983)]

In [17]:
tylenol = pcp.Compound.from_cid(1983)
print(tylenol.iupac_name)
print(tylenol.molecular_weight)
print(tylenol.molecular_formula)
print(tylenol.synonyms)

N-(4-hydroxyphenyl)acetamide
151.16
C8H9NO2
['acetaminophen', 'Paracetamol', '4-Acetamidophenol', '103-90-2', 'Tylenol', 'N-(4-Hydroxyphenyl)acetamide', 'APAP', 'Panadol', 'N-Acetyl-p-aminophenol', "4'-Hydroxyacetanilide", 'Acetaminofen', 'Datril', 'p-Acetamidophenol', 'p-Hydroxyacetanilide', 'Algotropyl', 'Doliprane', 'Injectapap', 'Lonarid', 'Naprinol', 'Acenol', 'Biocetamol', 'Febridol', 'Servigesic', 'Vermidon', 'Acamol', 'Alpiny', 'Anelix', 'Multin', 'Neopap', 'Paracet', 'p-Acetaminophenol', 'Abensanil', 'Acetagesic', 'Acetalgin', 'Clixodyne', 'Gelocatil', 'Liquagesic', 'Pyrinazine', 'Acephen', 'Alvedon', 'Anaflon', 'Apamide', 'Dafalgan', 'Disprol', 'Dolprone', 'Dymadon', 'Febrilix', 'Febrolin', 'Finimal', 'Homoolan', 'Lestemp', 'Ortensan', 'Paldesic', 'Salzone', 'Tabalgin', 'Tralgon', 'Tussapap', 'Valadol', 'Valgesic', 'Amadil', 'Anhiba', 'Calpol', 'Captin', 'Dirox', 'Eneril', 'Fendon', 'Hedex', 'Lyteca', 'Pacemo', 'Panets', 'Parmol', 'Tapar', 'Tempra', 'Acetamide, N-(4-hydroxyph

In [18]:
pcp.get_compounds('benzene', 'name')

[Compound(241)]

In [19]:
benzene = pcp.Compound.from_cid(241)
print(benzene.isomeric_smiles)

C1=CC=CC=C1
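
The searches above return a list, and a search can come back empty, so it helps to guard before taking the first hit. This is a minimal sketch: `first_match` is a hypothetical helper, and `fake_search` is an offline stand-in that returns bare CIDs rather than Compound objects. When no `search` function is passed, the helper falls back to `pcp.get_compounds`, which needs network access.

```python
def first_match(identifier, namespace="name", search=None):
    """Return the first search result, or None when the search comes back empty."""
    if search is None:
        import pubchempy as pcp  # imported lazily; only needed for real lookups
        search = pcp.get_compounds
    results = search(identifier, namespace)
    return results[0] if results else None


# Offline stand-in for pcp.get_compounds, using CIDs seen earlier in this notebook.
def fake_search(identifier, namespace):
    known = {("benzene", "name"): [241], ("tylenol", "name"): [1983]}
    return known.get((identifier, namespace), [])


print(first_match("benzene", search=fake_search))         # prints 241
print(first_match("not-a-compound", search=fake_search))  # prints None
```

In the notebook itself you would call `first_match('benzene')` and get a Compound object back instead of a bare number.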


### Dataframes from PubChemPy

You can import information from PubChem in the form of a pandas DataFrame.

In [20]:
df1 = pcp.get_compounds('C20H41Br', 'formula', as_dataframe=True)
df2 = pcp.get_substances([9,99,999,9999], as_dataframe=True)
df3 = pcp.get_properties(['isomeric_smiles', 'xlogp', 'rotatable_bond_count'], 'C20H41Br', 'formula', as_dataframe=True)

In [None]:
df3.head()

In [None]:
df2.head()

In [None]:
df1.head()
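
Once the data is in tabular form, the usual next step is to sort or filter it. The sketch below shows the idea with plain Python records so it runs without pandas or a network connection. The rows are typed by hand from outputs earlier in this notebook, and the key names are illustrative assumptions, not necessarily the column names PubChemPy returns.

```python
# Hand-typed rows based on the NAD+ and tylenol outputs earlier in the notebook.
records = [
    {"cid": 5287958, "formula": "C21H27N7O14P2", "molecular_weight": 663.4},
    {"cid": 1983, "formula": "C8H9NO2", "molecular_weight": 151.16},
]


def lightest_first(rows):
    """Return the rows sorted by increasing molecular weight."""
    return sorted(rows, key=lambda row: row["molecular_weight"])


def under_weight(rows, limit):
    """Keep only the rows whose molecular weight is below the limit."""
    return [row for row in rows if row["molecular_weight"] < limit]


for row in lightest_first(records):
    print(row["cid"], row["formula"], row["molecular_weight"])

print([row["cid"] for row in under_weight(records, 500)])
```

With a DataFrame, the same questions are typically answered with `sort_values` and boolean indexing.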

In [None]:
# Download image and data files from PubChem

import os
os.makedirs('images', exist_ok=True)  # pcp.download needs the output folders to exist
os.makedirs('data', exist_ok=True)

pcp.download('PNG', 'images/asp.png', 'Aspirin', 'name', overwrite=True)
pcp.download('PNG', 'images/acet.png', 'Acetaminophen', 'name', overwrite=True)
pcp.download('CSV', 'data/s.csv', [1,2,3], operation='property/CanonicalSMILES,IsomericSMILES', overwrite=True)


In [None]:
# Display the aspirin and acetaminophen images

from IPython.display import Image, display

image_paths = ['images/asp.png', 'images/acet.png']

for image_path in image_paths:
    display(Image(filename=image_path))

In [None]:
# pcp.download('SDF', 'images/asp.sdf', 'Aspirin', 'name', overwrite=True)
cid = pcp.get_cids('acetaminophen', 'name')
cid
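
`get_cids` returns a list of CIDs, while the py3Dmol viewer in the next cell takes a query string of the form `'cid:<number>'`. The sketch below bridges the two; `cid_query` is a hypothetical helper, and the input list is typed by hand (1983 is the CID found for tylenol earlier) so the example runs offline.

```python
def cid_query(cids):
    """Build a 'cid:<number>' query string from the first CID in a list."""
    if not cids:
        raise ValueError("no CIDs found for this search")
    return f"cid:{cids[0]}"


print(cid_query([1983]))  # prints cid:1983
```

In the notebook you could pass `cid_query(cid)` as the `query` argument instead of typing the number by hand.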

In [None]:
# Visualize aspirin (PubChem CID 2244) in 3D

import py3Dmol
view = py3Dmol.view(width = 680, height = 250, query ='cid:2244', viewergrid = (1,3), linked = True)

view.setStyle({'line': {'linewidth': 8}}, viewer = (0,0))
view.setStyle({'stick': {'colorscheme':'cyanCarbon'}}, viewer = (0,1))
view.setStyle({'sphere': {}}, viewer = (0,2))

view.setBackgroundColor('#ebf4fb', viewer = (0,0))
view.setBackgroundColor('#cda9fc', viewer = (0,1))
view.setBackgroundColor('#e6e6e6', viewer = (0,2))