In [9]:
#http://readiab.org/book/latest/3/1

'''From a bioinformatics perspective, studying biological diversity is centered around a few key pieces of information:

- A table of the frequencies of certain biological features (e.g., species or OTUs) on a per-sample basis.
- Sample metadata describing exactly what each of the samples is, as well as any relevant technical information.
- Feature metadata describing each of the features. This can be taxonomic information, for example, but we'll come back to this when we discuss features in more detail (this will be completed as part of #105).
- Optionally, information on the relationships between the biological features, typically in the form of a phylogenetic tree where tips in the tree correspond to OTUs in the table.
'''
%matplotlib inline
import numpy as np
import pandas as pd

df = pd.read_csv('/home/erika/Desktop/likeliest_match_mz.csv')
newdf = pd.read_csv('/home/erika/Desktop/likeliest_match_abspres.csv')

# Mean number of peaks for each (Slope, Depth) group
averagetable = newdf.groupby(['Slope', 'Depth'])['n_peaks'].agg(['mean'])
averagetable.style \
  .format('{:.2f}') \
  .bar(align='left', color=['#0c750b', '#266352']) \
  .set_caption('masses') \
  .set_properties(padding="15px", border='3px solid black', width='200px')



Slope,Depth,mean
1S,05,4977.0
1S,15,5710.12
1S,30,5187.62
1S,60,5002.12
2B,05,5580.0
2B,15,5775.88
2B,30,5153.25
2B,60,4675.5
3F,05,5943.0
3F,15,5552.75


In [3]:
r'''
The first metric that we'll look at is a quantitative, non-phylogenetic β diversity metric called Bray-Curtis. The Bray-Curtis dissimilarity between a pair of samples, j and k, is defined as follows:

BC_{jk} = \frac{\sum_i |X_{ij} - X_{ik}|}{\sum_i (X_{ij} + X_{ik})}

i : feature (e.g., an OTU)
X_{ij} : frequency of feature i in sample j
X_{ik} : frequency of feature i in sample k

This could be implemented in Python as follows:
'''

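def bray_curtis_distance(table, sample1_id, sample2_id):
    """Return the Bray-Curtis dissimilarity between two samples in table."""
    numerator = 0    # running sum of |X_ij - X_ik|
    denominator = 0  # running sum of (X_ij + X_ik)
    sample1_counts = table[sample1_id]
    sample2_counts = table[sample2_id]
    # Both columns come from the same table, so the counts are aligned by feature.
    for sample1_count, sample2_count in zip(sample1_counts, sample2_counts):
        numerator += abs(sample1_count - sample2_count)
        denominator += sample1_count + sample2_count
    # Note: the result is undefined (division by zero) if both samples have no counts.
    return numerator / denominator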
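
To make the formula concrete, here is a minimal sketch that applies it to a tiny feature table. The table, its sample IDs (A, B, C) and feature IDs (OTU1 to OTU3), and the helper name `bray_curtis_vectorized` are hypothetical and not part of the original notebook. The sketch assumes features are rows and samples are columns, and it returns NaN when both samples are empty, which is a choice made here rather than part of the definition.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table: rows are features (OTUs), columns are samples.
table = pd.DataFrame(
    {"A": [1, 3, 0], "B": [0, 2, 4], "C": [0, 0, 5]},
    index=["OTU1", "OTU2", "OTU3"],
)

def bray_curtis_vectorized(table, sample1_id, sample2_id):
    """Bray-Curtis dissimilarity computed with NumPy arrays instead of a loop."""
    x = table[sample1_id].to_numpy(dtype=float)
    y = table[sample2_id].to_numpy(dtype=float)
    total = (x + y).sum()
    if total == 0:
        # Assumption: treat the dissimilarity of two empty samples as undefined.
        return float("nan")
    return np.abs(x - y).sum() / total

# A vs B: (|1-0| + |3-2| + |0-4|) / (1 + 5 + 4) = 6 / 10
bc_ab = bray_curtis_vectorized(table, "A", "B")
# A vs C: no shared features, so the numerator equals the denominator (9 / 9)
bc_ac = bray_curtis_vectorized(table, "A", "C")
# A vs A: identical samples, so the numerator is 0
bc_aa = bray_curtis_vectorized(table, "A", "A")
print(bc_ab, bc_ac, bc_aa)
```

The three comparisons cover the range of the metric: 0 for identical samples, 1 for samples that share no features, and an intermediate value otherwise.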