# Differential expression of reactions
We will compare reaction pairs (control vs. knockdown) using sampled metabolic fluxes.

One option is the two-sample Kolmogorov-Smirnov test, which does not assume that the samples are normally distributed; the statistical significance of the differences is assessed with p-values.
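
As a minimal illustration of the test on its own, the sketch below applies `scipy.stats.ks_2samp` to synthetic data. The exponential samples, their sizes, and the variable names are assumptions made for the example; they are not the flux samples used in this notebook.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only

# Synthetic stand-ins for the sampled fluxes of a single reaction (not real data)
control = rng.exponential(scale=1.0, size=1000)
kd_same = rng.exponential(scale=1.0, size=1000)     # same distribution as control
kd_shifted = rng.exponential(scale=3.0, size=1000)  # clearly different distribution

# ks_2samp returns the KS statistic D and the p-value
stat_same, p_same = ks_2samp(control, kd_same)
stat_shifted, p_shifted = ks_2samp(control, kd_shifted)

print(f"same distribution:      D = {stat_same:.3f}, p = {p_same:.3g}")
print(f"different distribution: D = {stat_shifted:.3f}, p = {p_shifted:.3g}")
```

The statistic D is the largest distance between the two empirical distribution functions, so the test reacts to any change in the shape of the flux distribution, not only to a shift in the mean.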

We will additionally quantify how strongly each reaction differs between the two sample sets (fold changes):

$$FC = \frac{\overline{R_{kd}} - \overline{R_{control}}}{\left|\overline{R_{kd}} + \overline{R_{control}}\right|}$$
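
To get a feel for the scale of this score, here is a small sketch that evaluates the formula above for a few hand-picked means. The function name `fold_change_score` and the choice to return NaN for a zero denominator are assumptions of this example. When both means have the same sign and the ratio is r = kd/control, the score equals (r - 1)/(r + 1), so a 10-fold change maps to 9/11 ≈ 0.82. When the means have opposite signs (fluxes can be negative), the denominator shrinks and |FC| can exceed 1.

```python
def fold_change_score(mean_kd, mean_control):
    """FC = (mean_kd - mean_control) / |mean_kd + mean_control|."""
    denom = abs(mean_kd + mean_control)
    if denom == 0:
        # both means zero, or exactly opposite: the score is undefined
        return float("nan")
    return (mean_kd - mean_control) / denom

print(fold_change_score(10.0, 1.0))  # 10-fold increase, same sign: 9/11
print(fold_change_score(1.0, 10.0))  # 10-fold decrease, same sign: -9/11
print(fold_change_score(0.0, 5.0))   # flux vanishes in the knockdown: -1
print(fold_change_score(-4.0, 5.0))  # opposite signs: -9 / 1 = -9
```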

In [1]:
import pandas as pd
import numpy as np

from scipy.stats import ks_2samp
#import statsmodels.stats.multitest as multi

import os.path

from helpers import bh

### Basic settings

In [2]:
require_biomass = True
folder_samples = os.path.join('samples','biomass') if require_biomass else os.path.join('samples','no_biomass')
folder_enrich = os.path.join('enrichment','biomass') if require_biomass else os.path.join('enrichment','no_biomass')

### Reading from files

In [3]:
df_control = pd.read_csv(os.path.join(folder_samples, 'samples_control.csv'))
df_kd = pd.read_csv(os.path.join(folder_samples, 'samples_kd.csv'))

In [4]:
reactions = sorted(list(set(df_control.columns) | set(df_kd.columns)))
len(reactions) # number of reactions

2282

### Differential activity of reactions

In [5]:
df = pd.DataFrame(columns=['reaction', 'FC', 'p', 'q', 'enrichment', 'changed'])
df['reaction']=reactions

n_samples = df_control.shape[0]

# iterate over all reactions
for reaction in reactions:
    if reaction in df_control.columns:
        control = df_control[reaction].values
    else:
        # if the reaction is missing from the control group, assign it all zeros
        control = np.zeros(n_samples)
        
    if reaction in df_kd.columns:
        kd = df_kd[reaction].values
    else:
        # if the reaction is missing from the kd group, assign it all zeros
        kd = np.zeros(n_samples)
        
    # compute the mean flux for control and kd
    mean_control = np.mean(control)
    mean_kd = np.mean(kd)
    
    # compute FC (fold change) and its significance with the two-sample Kolmogorov-Smirnov test
    if mean_control != 0 or mean_kd != 0:
        FC = (mean_kd-mean_control)/(abs(mean_kd + mean_control))
        p = ks_2samp(control,kd)[1]
    else:
        FC = 0
        p = 1     
        
    df.loc[df['reaction']==reaction, 'FC'] = FC
    df.loc[df['reaction']==reaction, 'p'] = p
    
    
# correct the p-values for multiple testing (FDR correction)
df['q'] = bh(df['p'])

# significance requires at least a 10-fold up-/down-regulation (|FC| >= 9/11 ≈ 0.82)
df.loc[(df['FC'] >= 0.82) & (df['q'] < 0.05),'enrichment'] = 1
df.loc[(df['FC'] <= -0.82) & (df['q'] < 0.05),'enrichment'] = -1
df.loc[~df['enrichment'].isna(),'changed'] = 1
df = df.fillna(0)
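
The `bh` helper imported from `helpers` is not shown in this notebook. Assuming it implements the Benjamini-Hochberg procedure, which the FDR comment suggests, a minimal sketch could look like the following. The name `bh_sketch` is hypothetical, and the real helper may differ in details such as its return type or how it handles ties.

```python
import numpy as np

def bh_sketch(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    ranked = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i
    # enforce monotonicity: q_(i) = min over j >= i of p_(j) * n / j
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)           # map back to the input order
    return q

q = bh_sketch([0.01, 0.04, 0.03, 0.5])
print(q)
```

For a library implementation, `statsmodels.stats.multitest.multipletests(pvals, method='fdr_bh')` performs the same correction; its import is commented out in the first cell.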
    
    

In [6]:
df.to_csv(os.path.join(folder_enrich, 'reactions.csv'), index=False)

In [8]:
df[df.enrichment == -1]

Unnamed: 0,reaction,FC,p,q,enrichment,changed
2,10FTHFtl,-5.179573,0.000000e+00,0.000000e+00,-1,1
3,10FTHFtm,-7.129161,5.205457e-12,5.981296e-12,-1,1
7,24_25DHVITD2t,-1.000000,0.000000e+00,0.000000e+00,-1,1
8,24_25DHVITD2tm,-1.000000,0.000000e+00,0.000000e+00,-1,1
9,24_25DHVITD3t,-1.000000,0.000000e+00,0.000000e+00,-1,1
...,...,...,...,...,...,...
2271,r2510,-100.826726,0.000000e+00,0.000000e+00,-1,1
2272,r2511,-44.191875,0.000000e+00,0.000000e+00,-1,1
2275,r2514e,-485.463099,5.249252e-295,1.133282e-294,-1,1
2277,r2519,-23.927067,1.352908e-72,2.033818e-72,-1,1
