## Parsing and manipulating PDB files

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

import numpy as np
import pandas as pd

+ Two important pieces of metadata for a crystal structure are $R$ and $R_{free}$.  These measure how well the crystallographic model fits the experimental data used to generate the model.  The location of these numbers is indicated in pdb files by the following text.

  ```
  R VALUE          (WORKING SET, NO CUTOFF)

  and 

  FREE R VALUE                  (NO CUTOFF)
  ```

  Write a function called **get_R** that takes a string specifying a pdb file as an argument, extracts $R$ and $R_{free}$ from the file, and then returns them as a tuple of floats `(R,R_free)`.
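
One way to approach this is sketched below. It is not the only solution, and it rests on an assumption the text above does not state: that each value follows a colon on the same line as its label, as in typical `REMARK   3` records. The helper name `parse_R_lines` is hypothetical and exists only so the parsing can be tried on a list of strings without a file.

```python
def parse_R_lines(lines):
    """Return (R, R_free) found in an iterable of PDB lines.

    Hypothetical helper. Assumes each value follows a colon on the
    same line as its label.
    """
    labels = {
        "R VALUE (WORKING SET, NO CUTOFF)": "R",
        "FREE R VALUE (NO CUTOFF)": "R_free",
    }
    found = {}
    for line in lines:
        # Collapse runs of spaces so column padding does not matter.
        squashed = " ".join(line.split())
        for label, key in labels.items():
            if label in squashed and key not in found:
                try:
                    found[key] = float(line.split(":")[-1])
                except ValueError:
                    pass  # e.g. "NULL" when no value was deposited
    if len(found) != 2:
        raise ValueError("could not find both R and R_free")
    return found["R"], found["R_free"]


def get_R(pdb_file):
    """Read a pdb file and return the tuple (R, R_free) as floats."""
    with open(pdb_file) as f:
        return parse_R_lines(f)
```

Collapsing whitespace before matching keeps the search independent of how many spaces pad the label, which differs between the two lines shown above.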



In [None]:
R_values = get_R("1g9u.pdb")
assert round(R_values[0],2) == 0.17
assert round(R_values[1],2) == 0.23

+ Create a histogram of all $C_{\alpha,i} \rightarrow C_{\alpha,i+1}$ distances in the pdb file `1stn.pdb`, where $i$ counts along residues in the sequence.  (This means the distance between $C_{\alpha}$ for residue 1 and 2, then for 2 and 3, etc.)
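
The distance calculation could be sketched as follows. The function names are hypothetical, and the sketch assumes a single chain with no alternate locations, with `ATOM` records laid out in the standard fixed-width columns (atom name in columns 13-16, coordinates in columns 31-54).

```python
import math


def ca_coordinates(lines):
    """Collect (x, y, z) for every CA atom, in file order."""
    coords = []
    for line in lines:
        # Atom name sits in columns 13-16 of an ATOM record.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    return coords


def ca_distances(lines):
    """Distances between each CA and the next CA in the file."""
    coords = ca_coordinates(lines)
    # Pair residue i with residue i+1.
    return [math.dist(a, b) for a, b in zip(coords, coords[1:])]
```

With the notebook's imports in place, the resulting list can be passed straight to `plt.hist`. If the structure has missing residues, the pair that spans the gap will show up as an unusually long distance.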

Write a function called **center_of_mass** that calculates the center of mass for a pdb file.  It should take a string specifying the pdb file as an argument and return the center of mass as a tuple `(x, y, z)`.  The center of mass is the average of the coordinates of the protein atoms, weighted by their mass.  The type of each atom is given on the far right of each line of the pdb file.  You should use the following masses for the atoms:

|atom | mass     |
|-----|----------|
| `C` | 12.0107  |
| `N` | 14.0067  |
| `O` | 15.9994  |
| `S` | 32.065   |
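
A minimal sketch of the mass-weighted average, using the masses in the table. It makes a few assumptions beyond the text: "protein atoms" is taken to mean `ATOM` records, the element symbol is read from columns 77-78, and atoms whose element is not in the table (hydrogens, for example) are skipped. The helper `center_of_mass_from_lines` is a hypothetical name.

```python
MASSES = {"C": 12.0107, "N": 14.0067, "O": 15.9994, "S": 32.065}


def center_of_mass_from_lines(lines):
    """Mass-weighted mean position of ATOM records in PDB lines."""
    total_mass = 0.0
    weighted = [0.0, 0.0, 0.0]
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        # Element symbol at the far right of the line (columns 77-78).
        element = line[76:78].strip()
        if element not in MASSES:
            continue  # assumption: ignore elements without a listed mass
        mass = MASSES[element]
        coords = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        for i, value in enumerate(coords):
            weighted[i] += mass * value
        total_mass += mass
    if total_mass == 0.0:
        raise ValueError("no atoms with a known mass were found")
    return tuple(w / total_mass for w in weighted)


def center_of_mass(pdb_file):
    """Return the center of mass of a pdb file as (x, y, z)."""
    with open(pdb_file) as f:
        return center_of_mass_from_lines(f)
```

Each coordinate is $\sum_i m_i x_i / \sum_i m_i$, so heavier atoms such as sulfur pull the result toward themselves more than carbon does.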

In [None]:
com = center_of_mass("1stn.pdb")
assert round(com[0],1) == 4.8
assert round(com[1],1) == 22.4
assert round(com[2],1) == 14.5

The `HN` hydrogen atom attached to the `N` is often not in crystal structures because it is invisible in the diffraction experiment used to make the model.   Unfortunately, the `HN` atom coordinates are necessary to calculate things structural biologists care about, like [Ramachandran plots](https://en.wikipedia.org/wiki/Ramachandran_plot).  The missing atom is indicated with the red arrow in the figure below.

![hn_definition](https://raw.githubusercontent.com/harmsm/pythonic-science/master/labs/03_molecular-structure/NH_proton.png)

Write a program that adds missing `HN` atoms and writes them out to a pdb file. 
   + `HN` will be in the plane formed by the `C` atom from the *previous* residue and the `N` and `CA` from its own residue (gray plane in image).
   + `{"C","N","HN"}` forms an angle of $119\ ^{\circ}$.
   + `{"CA","N","HN"}` forms an angle of $119\ ^{\circ}$.
   + The distance between `N` and `HN` is 1.02 angstrom.
    