# Results
## • Number of records in reference plasmids

| File Name                | Number of Records |
| -----------------------  | ----------------- |
| plasmid.1.1.genomic.fna  | **4597**          |
| plasmid.2.1.genomic.fna  | **3234**          |  
| plasmid.3.1.genomic.fna  | **2524**          | 
| plasmid.4.1.genomic.fna  | **3023**          | 
| plasmid.5.1.genomic.fna  | **1698**          | 

In [None]:
total_number_of_records = 4597 + 3234 + 2524 + 3023 + 1698
print("Total number of records:", total_number_of_records)
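
The per-file record counts in the table above can be reproduced by counting FASTA header lines, since each record starts with a line beginning with `>`. The following is a minimal sketch of that idea, not necessarily how the counts here were obtained. The helper names `count_fasta_records` and `count_fasta_file` are hypothetical, and the toy sequences are made up for illustration.

```python
import gzip
import io


def count_fasta_records(handle):
    """Count FASTA records in a text stream by counting '>' header lines."""
    return sum(1 for line in handle if line.startswith(">"))


def count_fasta_file(path):
    """Open a plain or gzip-compressed FASTA file and count its records."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fasta_records(handle)


# Toy in-memory FASTA (made-up sequences, not taken from the reference files)
toy_fasta = io.StringIO(">seq1\nACGT\nACGT\n>seq2\nGGCC\n")
toy_count = count_fasta_records(toy_fasta)
print(toy_count)
```

Summing `count_fasta_file(path)` over the five reference files should give the same total as the cell above, provided the files are standard FASTA.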

## • Aligning reads with references

• We aligned our reads against the 15076 reference plasmids using the *Burrows-Wheeler Aligner MEM* algorithm and the *Burrows-Wheeler Aligner ALN* algorithm.

### • Results of the Bwa-mem
• Across the two libraries, F5 and F20, 336 of the 15076 references have more than 1000 aligned reads.

In [None]:
import pandas as pd 
import numpy as np

def highlight_max(s):
    '''
    highlight the maximum in a Series yellow.
    '''
    is_max = s == s.max()
    return ['background-color: yellow' if v else '' for v in is_max]

df=pd.read_csv("Files/csv/bwa-mem-all.csv")
df
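
A count such as the number of references with more than 1000 aligned reads, or the reference with the most mapped reads, can be derived from a table like this with a boolean filter and `idxmax`. The sketch below uses a toy DataFrame. The column names `Accession` and `Mapped Reads` are assumptions made for illustration, since the actual columns of `bwa-mem-all.csv` are not shown here, and the values are invented.

```python
import pandas as pd

# Toy data: column names and values are hypothetical, not from the real CSV
toy = pd.DataFrame({
    "Accession": ["ref_A", "ref_B", "ref_C", "ref_D"],
    "Library": ["F5", "F5", "F20", "F20"],
    "Mapped Reads": [1500, 800, 2300, 1000],
})

# Keep rows with strictly more than 1000 mapped reads
above_threshold = toy[toy["Mapped Reads"] > 1000]

# Count distinct references that pass the threshold
n_refs = above_threshold["Accession"].nunique()

# Row with the highest number of mapped reads
top_row = toy.loc[toy["Mapped Reads"].idxmax()]

print(n_refs, top_row["Accession"], top_row["Mapped Reads"])
```

Using `nunique()` rather than `len()` avoids counting a reference twice if it passes the threshold in both libraries.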

• NZ_LS997974.1 Salmonella enterica subsp. enterica serovar Typhimurium strain D23580 genome assembly, plasmid: D23580_liv_pSLT-BT has the most mapped reads, with 66036.

• In the F5 library, plasmid.2.1.genomic.fna.gz has the most mapped references, with **48** records.

In [None]:
a = df.groupby(["Refseq","Library"]).size().to_frame()
a.style.apply(highlight_max)
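
To read a single library's winner out of a grouped table like the one above, one option is to select that library from the grouped counts and take the largest entry. This is a minimal sketch on toy data. It reuses the `Refseq` and `Library` column names from the cell above, but the values are made up.

```python
import pandas as pd

# Toy data with the same two grouping columns (values are invented)
toy = pd.DataFrame({
    "Refseq": ["plasmid.1", "plasmid.2", "plasmid.2", "plasmid.2", "plasmid.1"],
    "Library": ["F5", "F5", "F5", "F20", "F20"],
})

# Number of rows (mapped references) per (Refseq, Library) pair
counts = toy.groupby(["Refseq", "Library"]).size()

# Restrict to the F5 library; the result is indexed by Refseq only
f5_counts = counts.xs("F5", level="Library")

best_refseq = f5_counts.idxmax()
best_count = f5_counts.max()
print(best_refseq, best_count)
```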

### • Results of the Bwa-aln
• Across the two libraries, F5 and F20, 297 of the 15076 references have more than 1000 aligned reads.

In [None]:
df1=pd.read_csv("Files/csv/bwa-aln-all.csv")
df1

• NZ_LS997974.1 Salmonella enterica subsp. enterica serovar Typhimurium strain D23580 genome assembly, plasmid: D23580_liv_pSLT-BT has the most mapped reads, with 60557.

• In the F5 library, plasmid.2.1.genomic.fna.gz has the most mapped references, with **41** records.

In [None]:
b = df1.groupby(["Refseq","Library"]).size().to_frame()
b.style.apply(highlight_max)

### • Breadth of Coverage

• NZ_CP027357.1 Escherichia coli strain 2013C-4991 plasmid unnamed2 has the highest breadth of coverage of its reference sequence, at 94%.

In [None]:
df2=pd.read_csv("Files/coverage/bwa-mem-plasmidcoverageallsummarysorted.csv")
df2.style.apply(highlight_max)
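
Breadth of coverage is commonly defined as the percentage of reference positions covered by at least one read. How the values in the summary file above were computed is not shown here, so the following is only a sketch of that common definition. The function name `breadth_of_coverage`, the `min_depth` parameter, and the toy depth values are all hypothetical.

```python
import numpy as np


def breadth_of_coverage(depths, min_depth=1):
    """Percentage of positions whose read depth is at least min_depth."""
    depths = np.asarray(depths)
    if depths.size == 0:
        return 0.0
    covered = np.count_nonzero(depths >= min_depth)
    return 100.0 * covered / depths.size


# Toy per-position depths for a 10-base reference (invented values)
toy_depths = [0, 3, 5, 0, 1, 2, 0, 0, 4, 1]
toy_breadth = breadth_of_coverage(toy_depths)
print(toy_breadth)  # 6 of 10 positions are covered, so this prints 60.0
```

Raising `min_depth` gives a stricter measure, for example the percentage of positions covered by at least 5 reads.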

In [None]:
import matplotlib.pyplot as plt
import matplotlib.cm as cm

x = np.arange(len(df2))
colors = cm.rainbow(np.linspace(0, 1, len(x)))

# For ggplot style plotting
plt.style.use('ggplot')

# For bigger output
plt.figure(figsize=(15,7))
plt.scatter(df2["Breadth of Coverage"],df2["Accession"],s=200,c=colors,marker="D",linewidths=8)

# For axis labels fontsize and rotation
plt.xticks(rotation='vertical', fontsize=10)
plt.yticks(fontsize=10)

# For Main Title
plt.title("Breadth of Coverage", fontsize=20)

# For Axes title
plt.xlabel('Percentage of Coverage', fontsize=15)
plt.ylabel('Accessions', fontsize=15)


plt.show()

In [None]:
y= df2["Breadth of Coverage"].tolist()
x= np.arange(len(df2["Breadth of Coverage"]))

fig, ax = plt.subplots()
title = 'Breadth of Coverage'
ax.plot(x,y)
ax.set_xticks(x)
ax.set_xticklabels(df2["Accession"].tolist(), rotation=90)

# For Main Title
plt.title(title, fontsize=15)

# For Axes title
ax.set_ylabel('Percentage of Coverage', fontsize=12)
ax.set_xlabel('Accessions', fontsize=12)

plt.show()