## Problem

Due to the natural abundance of heavy isotopes (e.g., <sup>13</sup>C), as well as any modifications that may occur during chromatographic separation (not covered in this notebook), we need to normalize our ```Fractional Abundance``` values. Once this correction is done, we can evaluate the ```Fractional Enrichment``` of a particular metabolite; if it correlates strongly with the test condition, it can shed light on pathways that may be affected.

There are two methods to normalize this data:

1. Labeled data corrected via unlabeled dataset.
2. Labeled data corrected via theoretical natural abundance estimates. 

As we have no unlabeled datasets, we'll have to go with option 2 in order to perform the Isotopologue Correction.  

We'll assume a natural abundance of 1.07% for <sup>13</sup>C<sup>[7]</sup>.

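To see what the 1.07% figure implies, here is a minimal sketch of the binomial model usually assumed for natural <sup>13</sup>C: each carbon is independently <sup>13</sup>C with probability 0.0107, and isotopes of other elements (H, N, O, S) are ignored. The function name `natural_abundance_distribution` is hypothetical, not part of this notebook, and `math.comb` requires Python 3.8+.

```python
from math import comb

def natural_abundance_distribution(num_carbon, p13c=0.0107):
    """Hypothetical helper: P(M+0), ..., P(M+num_carbon) from natural 13C alone.

    Binomial model: each carbon is 13C with probability p13c, independently.
    Isotopes of other elements are ignored.
    """
    return [
        comb(num_carbon, k) * p13c**k * (1 - p13c) ** (num_carbon - k)
        for k in range(num_carbon + 1)
    ]

# Six carbons, e.g. glucose: most molecules are M+0, but roughly 6% appear at M+1
dist = natural_abundance_distribution(6)
print([round(x, 4) for x in dist])
```

Even without any tracer, a six-carbon metabolite shows a noticeable M+1 signal, which is the contribution the correction has to remove.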
$CM \cdot I_{corr} = I_{meas}$

Where:

- $CM$ is the ```Correction Matrix```
- $I_{corr}$ is the ```Corrected Mass Distribution Vector```
- $I_{meas}$ is the ```Measured Mass Distribution Vector```

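$I_{meas} = (I_{0}, I_{1}, I_{2}, \ldots, I_{n})^{T}$, where $I_{k}$ is the measured intensity of the M+$k$ isotopologue and $n$ is the number of carbon atoms in the metabolite.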

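The fractional abundance of each isotopologue is its intensity divided by the summed intensity of all isotopologues:

$FA_{k} = \frac{I_{k}}{\sum_{j=0}^{n} I_{j}}, \quad k = 0, 1, \ldots, n$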

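The fractional abundance formula translates directly to NumPy. A minimal sketch, assuming the intensities are ordered M+0 ... M+n along the last axis and that a 2-D array holds one sample per row; the name `fractional_abundance` is hypothetical (the notebook's own `calculate_fractional_abundance` is still a stub), and the intensities are made up.

```python
import numpy as np

def fractional_abundance(intensities):
    """Hypothetical helper: divide each I_k by the row total sum_j I_j.

    Assumes isotopologues M+0 ... M+n along the last axis; a 2-D input is
    treated as one sample per row.
    """
    intensities = np.asarray(intensities, dtype=float)
    return intensities / intensities.sum(axis=-1, keepdims=True)

# Made-up intensities for two samples of a metabolite with M+0, M+1, M+2
fa = fractional_abundance([[800.0, 150.0, 50.0],
                           [500.0, 300.0, 200.0]])
print(fa)
```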
Our goal is to compute the ```Correction Matrix``` $CM$, so that $I_{corr}$ can be recovered from $I_{meas}$.

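As a worked illustration of option 2, assume that only carbon contributes natural isotopes and that every carbon not carrying the tracer label is <sup>13</sup>C with probability $p$ (0.0107 above). Under that assumption, entry $CM_{ij}$ is the probability that a molecule whose true label state is M+$j$ is measured at M+$i$:

$$CM_{ij} = \binom{n-j}{i-j}\, p^{\,i-j}\,(1-p)^{\,n-i} \quad \text{for } i \ge j, \qquad CM_{ij} = 0 \quad \text{for } i < j$$

For a two-carbon metabolite ($n = 2$) this gives

$$CM = \begin{pmatrix} (1-p)^2 & 0 & 0 \\ 2p(1-p) & 1-p & 0 \\ p^2 & p & 1 \end{pmatrix}$$

Each column sums to 1, and the matrix is lower triangular, so $I_{corr}$ can be recovered from $I_{meas}$ by forward substitution. Isotopes of H, N and O, as well as tracer impurity, are ignored in this simplified form.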
## Import Libraries

In [2]:
import csv
import numpy as np
import pandas as pd
import os
import timeit

## Let's define some core functions

In [None]:
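def calculate_fractional_abundance():
    '''
    Placeholder: compute FA_k = I_k / sum_j I_j for each isotopologue (not implemented yet).
    '''
    raise NotImplementedError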

In [3]:
def binomial(n, k):
    """
    A fast way to calculate binomial coefficients
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in range(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

In [None]:
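def calculate_number_of_isotopomers(num_carbon):
    '''
    Count positional isotopomers: the sum of C(num_carbon, x) for x = 0..num_carbon.
    '''
    num_iso = 0
    for x in range(num_carbon + 1):
        # use the binomial() helper defined above
        num_iso += binomial(num_carbon, x)
    return num_iso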

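The loop in `calculate_number_of_isotopomers` sums binomial coefficients, which by the binomial theorem always equals $2^n$: each carbon position is either labeled or not. A mass spectrometer, however, only separates molecules by how many <sup>13</sup>C atoms they carry, i.e. the $n + 1$ mass isotopologues M+0 ... M+n, which is why $I_{meas}$ has $n + 1$ entries. A quick check using Python's `math.comb` (3.8+); both helper names are hypothetical:

```python
from math import comb

def count_isotopomers(num_carbon):
    # Hypothetical helper: every subset of carbon positions can be 13C,
    # so sum C(n, k) over k = 0..n
    return sum(comb(num_carbon, k) for k in range(num_carbon + 1))

def count_isotopologues(num_carbon):
    # Hypothetical helper: mass isotopologues differ only in the number
    # of 13C atoms, giving M+0 ... M+n
    return num_carbon + 1

for n in (1, 3, 6):
    print(n, count_isotopomers(n), 2**n, count_isotopologues(n))
```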
In [None]:
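def generate_correction_matrix():
    '''Placeholder: build the Correction Matrix CM (not implemented yet).'''
    raise NotImplementedError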

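`generate_correction_matrix` is still a stub. One way it could be filled in for option 2 is sketched below, under a carbon-only binomial assumption (no H, N, O isotopes, 100% tracer purity). `theoretical_correction_matrix` and its `p13c` parameter are hypothetical names, not the notebook's API.

```python
from math import comb
import numpy as np

def theoretical_correction_matrix(num_carbon, p13c=0.0107):
    """Hypothetical helper: carbon-only natural-abundance correction matrix.

    cm[i, j] is the probability that a molecule whose true label state is
    M+j is observed at M+i, because its num_carbon - j unlabeled carbons
    can each be 13C with probability p13c. Other elements are ignored.
    """
    size = num_carbon + 1
    cm = np.zeros((size, size))
    for j in range(size):
        free = num_carbon - j  # carbons not carrying the tracer label
        for i in range(j, size):
            extra = i - j  # natural 13C atoms needed to shift M+j up to M+i
            cm[i, j] = comb(free, extra) * p13c**extra * (1 - p13c) ** (free - extra)
    return cm

cm = theoretical_correction_matrix(2)
print(np.round(cm, 5))
```

Each column sums to 1 because it is a complete probability distribution over the observed masses for one true label state.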

In [None]:
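def generate_mvd_measured():
    '''Placeholder: build the measured vector I_meas (not implemented yet).'''
    raise NotImplementedError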


In [None]:
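def perform_isotopologue_correction():
    '''Placeholder; redefined with (data, unlabeled) arguments under the utility functions.'''
    raise NotImplementedError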

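Once $CM$ is available, option 2 amounts to solving $CM \cdot I_{corr} = I_{meas}$ for $I_{corr}$. A minimal sketch with `np.linalg.solve`, which avoids forming an explicit inverse; clipping small negative values to zero and rescaling to a sum of 1 is one common convention, assumed here rather than taken from the notebook. `correct_mdv` is a hypothetical name, and the example round-trips a made-up distribution through a two-carbon matrix.

```python
import numpy as np

def correct_mdv(cm, i_meas):
    """Hypothetical helper: solve CM @ I_corr = I_meas for I_corr.

    Negative values caused by noise are clipped to zero and the result is
    rescaled to sum to 1 (an assumed convention).
    """
    i_corr = np.linalg.solve(np.asarray(cm, dtype=float),
                             np.asarray(i_meas, dtype=float))
    i_corr = np.clip(i_corr, 0.0, None)
    return i_corr / i_corr.sum()

# Two-carbon, carbon-only correction matrix with p = 0.0107
p = 0.0107
cm = np.array([[(1 - p) ** 2, 0.0, 0.0],
               [2 * p * (1 - p), 1 - p, 0.0],
               [p ** 2, p, 1.0]])
true_mdv = np.array([0.5, 0.2, 0.3])  # made-up "true" labeling pattern
measured = cm @ true_mdv              # simulate natural-abundance distortion
print(correct_mdv(cm, measured))
```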
In [None]:
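def calculate_fractional_enrichment():
    '''Placeholder: fractional enrichment of a metabolite (not implemented yet).'''
    raise NotImplementedError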

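Definitions of fractional enrichment vary between papers. A minimal sketch, assuming it means the fraction of the metabolite pool carrying at least one <sup>13</sup>C label after correction, i.e. $1 - FA_0$; `fractional_enrichment` is a hypothetical name, and the input is assumed to be a corrected vector ordered M+0 ... M+n.

```python
import numpy as np

def fractional_enrichment(corrected_mdv):
    """Hypothetical helper: fraction of the pool with at least one 13C label.

    Assumes a natural-abundance-corrected vector ordered M+0 ... M+n along
    the last axis; returns 1 - FA_0 (one of several definitions in use).
    """
    mdv = np.asarray(corrected_mdv, dtype=float)
    fa = mdv / mdv.sum(axis=-1, keepdims=True)
    return 1.0 - fa[..., 0]

print(fractional_enrichment([0.5, 0.2, 0.3]))
```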
In [None]:
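def calculate_pool_totals():
    '''Placeholder: total pool size per metabolite (not implemented yet).'''
    raise NotImplementedError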

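A metabolite's pool total is usually taken as the sum of all its isotopologue intensities. A minimal pandas sketch, assuming a long-format table with one row per sample, metabolite and isotopologue; the column names (`sample`, `metabolite`, `isotopologue`, `intensity`), the helper name `pool_totals`, and the numbers are all made up for illustration.

```python
import pandas as pd

def pool_totals(df, group_cols=("sample", "metabolite"), value_col="intensity"):
    """Hypothetical helper: sum isotopologue intensities per sample and metabolite."""
    return df.groupby(list(group_cols), as_index=False)[value_col].sum()

# Made-up example: one metabolite measured at M+0, M+1, M+2 in two samples
df = pd.DataFrame({
    "sample": ["s1", "s1", "s1", "s2", "s2", "s2"],
    "metabolite": ["A"] * 6,
    "isotopologue": [0, 1, 2, 0, 1, 2],
    "intensity": [800.0, 150.0, 50.0, 500.0, 300.0, 100.0],
})
totals = pool_totals(df)
print(totals)
```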
In [None]:
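def calculate_fractional_contribution():
    '''Placeholder: fractional contribution of the tracer (not implemented yet).'''
    raise NotImplementedError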

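Fractional contribution is commonly defined as the mean fraction of a metabolite's carbons that come from the tracer, $FC = \frac{1}{n}\sum_{k=0}^{n} k \cdot FA_{k}$. A minimal sketch under that definition, assuming a natural-abundance-corrected vector ordered M+0 ... M+n; `fractional_contribution` is a hypothetical name.

```python
import numpy as np

def fractional_contribution(corrected_mdv):
    """Hypothetical helper: FC = sum(k * FA_k) / n over isotopologues M+0 ... M+n."""
    mdv = np.asarray(corrected_mdv, dtype=float)
    n = mdv.shape[-1] - 1  # number of carbons = number of isotopologues - 1
    k = np.arange(n + 1)   # label count of each isotopologue
    fa = mdv / mdv.sum(axis=-1, keepdims=True)
    return (fa * k).sum(axis=-1) / n

print(fractional_contribution([0.5, 0.2, 0.3]))
```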
## Let's define some utility functions

In [None]:
def perform_isotopologue_correction(data, unlabeled):
    '''
    Correct labeled isotopologue data using an unlabeled reference (method 1).
    Rows of `data` and `unlabeled` are samples; columns are isotopologues
    M+0 ... M+n. Returns corrected rows rescaled to sum to 100.
    '''
    # Average the unlabeled data by column: the natural-abundance pattern
    averages = np.average(unlabeled, axis=0)
    num_rows = len(averages)

    # Build an upper-triangular matrix whose row r is `averages` shifted
    # r places to the right (zeros in front, truncated to stay square)
    correction_matrix = np.zeros((num_rows, num_rows))
    for row_number in range(num_rows):
        correction_matrix[row_number, row_number:] = averages[:num_rows - row_number]

    # Undo the natural-abundance contribution: data = corrected @ correction_matrix
    inverse = np.linalg.inv(correction_matrix)
    normalised = np.dot(data, inverse)

    # Rescale each row so its entries sum to 100 (percent)
    row_sums = np.sum(normalised, axis=1)
    normalised *= 100 / row_sums[:, np.newaxis]
    return normalised

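Why does inverting that shifted matrix remove natural abundance? Multiplying a corrected row vector by it is the same as convolving the row with the averaged unlabeled pattern and truncating to the original length, which models how natural isotopes push part of each isotopologue into heavier masses. A small self-contained check with made-up numbers, repeating the matrix construction from `perform_isotopologue_correction`:

```python
import numpy as np

# Made-up averaged unlabeled pattern for M+0, M+1, M+2
averages = np.array([0.94, 0.05, 0.01])
num_rows = len(averages)

# Same construction as perform_isotopologue_correction:
# row r holds `averages` shifted r places right, truncated to a square
shifted = np.array([([0.0] * r + list(averages))[:num_rows] for r in range(num_rows)])

corrected = np.array([0.6, 0.1, 0.3])  # made-up corrected distribution
# Multiplying a corrected row vector by the matrix ...
measured = corrected @ shifted
# ... equals convolving it with the unlabeled pattern and truncating
assert np.allclose(measured, np.convolve(corrected, averages)[:num_rows])

# Inverting the matrix, as the function does, recovers the corrected row
recovered = measured @ np.linalg.inv(shifted)
print(recovered)
```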

In [None]:
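def prepare_data_for_analysis(all_data_input):
    '''
    Parse comma- or tab-separated text into a 2-D float array
    (one row per sample, one column per isotopologue).
    '''
    data = []
    # Treat commas and tabs the same way
    all_data_input = all_data_input.replace(',', '\t')
    for line in all_data_input.split('\n'):
        # Skip empty or whitespace-only lines (e.g. trailing newlines from Excel);
        # str.isspace() is False for '', so test the stripped line instead
        if not line.strip():
            continue
        # Strip trailing tabs left over from trailing commas
        strip_line = line.rstrip('\t')

        data_line = []
        for str_float in strip_line.split('\t'):
            # Skip empty fields such as '' between consecutive separators
            if str_float.strip():
                data_line.append(float(str_float))
        data.append(data_line)
    data = np.array(data)
    return data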

In [None]:
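def prepare_unlabeled_for_analysis(user_unlabeled_data):
    '''
    Parse comma-separated unlabeled (natural-abundance) data into a float array.
    Blank lines and empty fields (e.g. from trailing commas) are skipped.
    '''
    rows = csv.reader(user_unlabeled_data.splitlines(), delimiter=",")
    # np.float was removed in NumPy 1.24; use the built-in float instead
    unlabeled = np.array(
        [[float(x) for x in row if x.strip()] for row in rows if any(x.strip() for x in row)]
    )
    return unlabeled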



In [None]:
# List the files in the working directory and derive job keys, e.g.
#    if there are files like:
#      [set-a_unlabeled, set-b_unlabeled, set-a_data, set-b_data]
#    produce
#      [set-a, set-b]
job_keys = set()
filenames = os.listdir()
for filename in filenames:
    basename = os.path.splitext(filename)[0]
    # Strip the _unlabeled / _data suffix to get the job key
    basename = basename.replace('_unlabeled', '').replace('_data', '')
    job_keys.add(basename)



In [None]:
# For every candidate job key (e.g. set-a, set-b, 3hb-coa), keep it only if
#    both <job_key>_unlabeled.csv and <job_key>_data.csv exist;
#    keys from unrelated files are ignored
valid_jobs = set()
for job_key in job_keys:
    # Check that there exists <job_key>_unlabeled AND <job_key>_data
    unlabeled_fname = '{0}_unlabeled.csv'.format(job_key)
    data_fname = '{0}_data.csv'.format(job_key)
    # TODO maybe check for both .txt and .csv above for robustness
    if unlabeled_fname in filenames and data_fname in filenames:
        valid_jobs.add(job_key)

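The two cells above can also be expressed with `pathlib` by globbing for each suffix and intersecting the resulting key sets. A minimal sketch with the hypothetical helper name `find_valid_jobs`; like the current code, it only looks for `.csv` files.

```python
import tempfile
from pathlib import Path

def find_valid_jobs(directory="."):
    """Hypothetical helper: keys with both <key>_unlabeled.csv and <key>_data.csv."""
    directory = Path(directory)
    data_keys = {p.name[: -len("_data.csv")] for p in directory.glob("*_data.csv")}
    unlabeled_keys = {
        p.name[: -len("_unlabeled.csv")] for p in directory.glob("*_unlabeled.csv")
    }
    # Only keys present in both sets form a complete job
    return data_keys & unlabeled_keys

# Try it in a throwaway directory: set-b lacks an unlabeled file, notes.txt is unrelated
with tempfile.TemporaryDirectory() as tmp:
    for name in ["set-a_data.csv", "set-a_unlabeled.csv", "set-b_data.csv", "notes.txt"]:
        (Path(tmp) / name).write_text("")
    jobs = find_valid_jobs(tmp)

print(jobs)  # {'set-a'}
```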


In [None]:
# For each job in valid_jobs: load into numpy, run the correction, write the result
for job_key in valid_jobs:
    unlabeled_fname = '{0}_unlabeled.csv'.format(job_key)
    data_fname = '{0}_data.csv'.format(job_key)

    # Parse the unlabeled (natural-abundance) reference data from its CSV text
    with open(unlabeled_fname) as f:
        unlabeled = prepare_unlabeled_for_analysis(f.read())

    # Parse the labeled data (comma- or tab-separated; trailing commas are ignored)
    with open(data_fname) as f:
        data = prepare_data_for_analysis(f.read())

    print('data:', data)

    result = perform_isotopologue_correction(data, unlabeled)

    # Write the corrected values (one decimal place) to <job_key>_output.csv
    output_fname = '{0}_output.csv'.format(job_key)
    np.savetxt(output_fname, result, fmt='%.1f', delimiter=',')