# DNA Sequencing file analysis
## Before you start
### Matplotlib extension requirements
```bash
# Interactive Matplotlib widgets for Jupyter
pip install ipywidgets
pip install ipympl
# Enable the extensions (classic Jupyter Notebook)
jupyter nbextension enable --py --sys-prefix widgetsnbextension
jupyter nbextension install --py --sys-prefix ipympl
jupyter nbextension enable --py --sys-prefix ipympl
```
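
Besides the widget packages, the notebook below also imports Biopython (as `Bio`), Matplotlib and seaborn, so these need to be installed as well, for example with `pip install biopython matplotlib seaborn`. As a convenience, here is a minimal sketch using only the standard library (`missing_packages` is a hypothetical helper name) that reports which of the required modules cannot be found in the current environment:

```python
import importlib.util


def missing_packages(module_names):
    """Return the import names from module_names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names used in this notebook (Biopython is imported as "Bio")
required = ["Bio", "matplotlib", "seaborn", "ipywidgets", "ipympl"]
print(missing_packages(required))  # an empty list means everything is installed
```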

```python
from Bio import SeqIO
```

```python
# Parse a single ABI chromatogram file into a SeqRecord
record = SeqIO.read("/home/iasst/storage/works/shabiha_dna_16s_seq/1st_BASE_3669174_M45_27F.ab1", "abi")
```

The data we are most interested in is stored in the record's `annotations` attribute.

```python
record.annotations.keys()
```

```
dict_keys(['sample_well', 'dye', 'polymer', 'machine_model', 'run_start', 'run_finish', 'abif_raw', 'molecule_type'])
```
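
Apart from `abif_raw`, these top-level annotations hold run metadata such as the start and finish times. A minimal sketch, assuming `run_start` and `run_finish` are strings in the `YYYY-MM-DD HH:MM:SS` layout that Biopython's ABI parser produces (`run_duration` is a hypothetical helper), that computes how long the run took:

```python
from datetime import datetime


def run_duration(annotations):
    """Return the run length as a timedelta.

    Assumes run_start/run_finish look like "YYYY-MM-DD HH:MM:SS".
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(annotations["run_start"], fmt)
    finish = datetime.strptime(annotations["run_finish"], fmt)
    return finish - start


# Usage with the record loaded above:
# print(run_duration(record.annotations))
```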

The `abif_raw` key holds a further dictionary containing the raw ABIF tags read from the file.

```python
record.annotations["abif_raw"].keys()
```

```
dict_keys(['AEPt1', 'AEPt2', 'APFN2', 'APXV1', 'APrN1', 'APrV1', 'APrX1', 'ARTN1', 'ASPF1', 'ASPt1', 'ASPt2', 'AUDT1', 'B1Pt1', 'B1Pt2', 'BCTS1', 'BufT1', 'CTID1', 'CTNM1', 'CTOw1', 'CTTL1', 'CpEP1', 'DATA1', 'DATA2', 'DATA3', 'DATA4', 'DATA5', 'DATA6', 'DATA7', 'DATA8', 'DATA9', 'DATA10', 'DATA11', 'DATA12', 'DCHT1', 'DSam1', 'DySN1', 'Dye#1', 'DyeN1', 'DyeN2', 'DyeN3', 'DyeN4', 'DyeW1', 'DyeW2', 'DyeW3', 'DyeW4', 'EPVt1', 'EVNT1', 'EVNT2', 'EVNT3', 'EVNT4', 'FTab1', 'FVoc1', 'FWO_1', 'Feat1', 'GTyp1', 'HCFG1', 'HCFG2', 'HCFG3', 'HCFG4', 'InSc1', 'InVt1', 'LANE1', 'LAST1', 'LIMS1', 'LNTD1', 'LsrP1', 'MCHN1', 'MODF1', 'MODL1', 'NAVG1', 'NLNE1', 'NOIS1', 'P1AM1', 'P1RL1', 'P1WD1', 'P2AM1', 'P2BA1', 'P2RL1', 'PBAS1', 'PBAS2', 'PCON1', 'PCON2', 'PDMF1', 'PDMF2', 'PLOC1', 'PLOC2', 'PSZE1', 'PTYP1', 'PXLB1', 'RGNm1', 'RGOw1', 'RMXV1', 'RMdN1', 'RMdV1', 'RMdX1', 'RPrN1', 'RPrV1', 'RUND1', 'RUND2', 'RUND3', 'RUND4', 'RUNT1', 'RUNT2', 'RUNT3', 'RUNT4', 'Rate1', 'RunN1', 'S/N%1', 'SCAN1', 'SMED
```
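
Each key in `abif_raw` is a four-character ABIF tag name followed by its tag number, so `DATA9` is tag `DATA`, number 9. A short sketch (the helper names are hypothetical) that splits the keys this way and groups the numbers under each tag name, which makes it easy to see, for instance, that there are twelve `DATA` entries:

```python
from collections import defaultdict


def split_tag_key(key):
    """Split a key such as "DATA10" into ("DATA", 10).

    ABIF tag names are four characters long; the remainder is the tag number.
    """
    return key[:4], int(key[4:])


def group_tags(raw):
    """Map each four-character tag name to the sorted tag numbers present."""
    groups = defaultdict(list)
    for key in raw:
        name, number = split_tag_key(key)
        groups[name].append(number)
    return {name: sorted(numbers) for name, numbers in groups.items()}


# Usage with the record loaded above:
# group_tags(record.annotations["abif_raw"])["DATA"]
```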

According to the ABIF file format specification (page 40) (http://www.appliedbiosystem.com/support/software_community/ABIF_File_Format.pdf), all of the data needed for the conventionally displayed traces are in the DATA9 through DATA12 channels, which we can extract programmatically. The specification does not state clearly, though, which base letter corresponds to which dye colour, and hence to which exact channel.

```python
# The four trace channels used for display
channels = ["DATA9", "DATA10", "DATA11", "DATA12"]
raw = record.annotations["abif_raw"]

# Map each channel name to its sequence of intensity values
trace = {c: raw[c] for c in channels}
```
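
One way to approach the channel question: the key listing includes an `FWO_1` tag, which the ABIF specification describes as the base order of the filter wheel (for example `GATC`). A minimal sketch, assuming that DATA9 to DATA12 follow that order (`channel_bases` is a hypothetical helper); it is worth checking the mapping against a known sequence before relying on it:

```python
def channel_bases(raw, channels=("DATA9", "DATA10", "DATA11", "DATA12")):
    """Pair each trace channel with a base letter from the FWO_1 tag.

    Assumes the channels follow the filter wheel order stored in FWO_1;
    Biopython may return that value as bytes, so it is decoded if needed.
    """
    order = raw["FWO_1"]
    if isinstance(order, bytes):
        order = order.decode("ascii")
    if len(order) != len(channels):
        raise ValueError(f"expected {len(channels)} bases, got {order!r}")
    return dict(zip(channels, order))


# Usage with the record loaded above:
# channel_bases(record.annotations["abif_raw"])
```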

Now we can plot all four channels on a single Matplotlib figure.

```python
# Use the interactive ipympl backend
%matplotlib widget
```

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9, 4))

# One line per channel; orange is used in place of yellow
plt.plot(trace["DATA9"], color="blue", label="DATA9")
plt.plot(trace["DATA10"], color="red", label="DATA10")
plt.plot(trace["DATA11"], color="green", label="DATA11")
plt.plot(trace["DATA12"], color="orange", label="DATA12")
plt.legend()
```
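
To see which peak belongs to which called base, note that the ABIF specification describes `PBAS2` as the bases called by the basecaller and `PLOC2` as their peak locations, given as scan positions along the DATA traces; both appear in the key listing. A sketch, assuming those meanings (`called_peaks` is a hypothetical helper), that pairs the two so the calls can be written onto the figure:

```python
def called_peaks(raw):
    """Return (scan_position, base) pairs for the basecaller's calls.

    Assumes PBAS2 holds the called bases (possibly as bytes) and PLOC2
    the matching peak positions as scan indices into the DATA traces.
    """
    bases = raw["PBAS2"]
    if isinstance(bases, bytes):
        bases = bases.decode("ascii")
    positions = raw["PLOC2"]
    if len(bases) != len(positions):
        raise ValueError("PBAS2 and PLOC2 have different lengths")
    return list(zip(positions, bases))


# Usage with the figure above: label the first 20 called bases
# for pos, base in called_peaks(record.annotations["abif_raw"])[:20]:
#     plt.text(pos, 0, base, ha="center")
```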

Next, split the figure into facets: one panel per channel, stacked vertically with a shared x-axis.

```python
import seaborn as sns

# Seaborn's default theme with a plain white background
sns.set()
sns.set_style("white")

plt.rc("xtick", labelsize=10)
plt.rc("ytick", labelsize=10)

# Four stacked panels sharing one x-axis, one channel per panel
f, axes = plt.subplots(4, 1, figsize=(9, 8), sharex=True)
sns.lineplot(data=trace["DATA9"], color="royalblue", ax=axes[0])
sns.lineplot(data=trace["DATA10"], color="crimson", ax=axes[1])
sns.lineplot(data=trace["DATA11"], color="darkgreen", ax=axes[2])
sns.lineplot(data=trace["DATA12"], color="orange", ax=axes[3])  # orange in place of yellow

# Label each panel with its channel name
for ax, channel in zip(axes, channels):
    ax.set_ylabel(channel)
```
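
With `sharex=True`, setting the x-limits on one panel zooms all four, but Matplotlib does not rescale the y-axes to the samples that remain visible, so small peaks in a narrow window can look flat. A minimal sketch (`window_ylimits` is a hypothetical helper) that computes padded y-limits per channel for a chosen scan window:

```python
def window_ylimits(trace, start, stop, pad=0.05):
    """Return {channel: (ymin, ymax)} for the samples in [start, stop).

    The range is widened by `pad` times its height on each side; a flat
    segment gets a margin of 1 so the limits never collapse to one value.
    """
    limits = {}
    for name, values in trace.items():
        segment = values[start:stop]
        if not segment:
            raise ValueError(f"empty window for {name}")
        low, high = min(segment), max(segment)
        margin = (high - low) * pad or 1
        limits[name] = (low - margin, high + margin)
    return limits


# Usage with the faceted figure above (window 1000 to 1500 chosen arbitrarily):
# axes[0].set_xlim(1000, 1500)
# limits = window_ylimits(trace, 1000, 1500)
# for ax, name in zip(axes, channels):
#     ax.set_ylim(*limits[name])
```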