# Diferenčno izražanje reakcij
Primerjali bomo pare reakcij (kontrola, utišanje) na podlagi vzorčenih metabolnih pretokov.

Uporabimo lahko npr. test Kolmogorov-Smirnov, ki ne predpostavlja normalne porazdelitve v vzorcih, statistična značilnost razlik pa je ovrednotena s p-vrednostmi.

Dodatno bomo opazovali kako močno se reakcije med vzorci razlikujejo (fold-changes):

$$FC = \frac{\overline{R_{kd}} - \overline{R_{control}}}{\left|\overline{R_{kd}} + \overline{R_{control}}\right|}$$

In [3]:
import pandas as pd
import numpy as np

from scipy.stats import ks_2samp
#import statsmodels.stats.multitest as multi

import os.path

from helpers import bh

### Basic settings

In [13]:
require_biomass = True
folder_samples = os.path.join('samples','biomass') if require_biomass else os.path.join('samples','no_biomass')
folder_enrich = os.path.join('enrichment','biomass') if require_biomass else os.path.join('enrichment','no_biomass')

### Read the data

In [14]:
df_control = pd.read_csv(os.path.join(f'{folder_samples}','samples_control.csv'))
df_kd = pd.read_csv(os.path.join(f'{folder_samples}','samples_kd.csv'))

In [15]:
reactions = sorted(list(set(df_control.columns) | set(df_kd.columns)))
len(reactions) # število reakcij

2282

### Diferenčna aktivnost reakcij

(?) Mogoče malo več anotacije spodnje kode? Ne vem, koliko bodo razumeli brez nje (razen, če boste šli skupaj po korakih).

In [16]:
df = pd.DataFrame(columns=['reaction', 'FC', 'p', 'q', 'enrichment', 'changed'])
df['reaction']=reactions

n_samples = df_control.shape[0]
for reaction in reactions:
    if reaction in df_control.columns:
        control = df_control[reaction].values
    else:
        control = np.zeros(n_samples)
        
    if reaction in df_kd.columns:
        kd = df_kd[reaction].values
    else:
        kd = np.zeros(n_samples)
        
    
    mean_control = np.mean(control)
    mean_kd = np.mean(kd)
    
    if mean_control != 0 or mean_kd != 0:
        FC = mean_kd-mean_control/(abs(mean_kd + mean_control))
        p = ks_2samp(control,kd)[1]
    else:
        FC = 0
        p = 1     
        
    df.loc[df['reaction']==reaction, 'FC'] = FC
    df.loc[df['reaction']==reaction, 'p'] = p
    
df['q'] = bh(df['p'])
df.loc[(df['FC'] >= 0.82) & (df['q'] < 0.05),'enrichment'] = 1
df.loc[(df['FC'] <= -0.82) & (df['q'] < 0.05),'enrichment'] = -1
df.loc[~df['enrichment'].isna(),'changed'] = 1
df = df.fillna(0)
    
    

In [17]:
df.to_csv(f"{folder_enrich}\\reactions.csv", index=False)

In [27]:
df[df.enrichment == 1]

Unnamed: 0,reaction,FC,p,q,enrichment,changed
0,10FTHF5GLUtl,5.179573,0.000000e+00,0.000000e+00,1,1
1,10FTHF5GLUtm,5.179573,0.000000e+00,0.000000e+00,1,1
4,12DHCHOLabc,3.125442,3.094078e-138,5.473400e-138,1,1
5,12DHCHOLt2,3.125442,3.094078e-138,5.473400e-138,1,1
6,12PPDRte,1.000000,8.201490e-322,1.818162e-321,1,1
...,...,...,...,...,...,...
2274,r2513,21.711914,0.000000e+00,0.000000e+00,1,1
2276,r2515,7.538761,0.000000e+00,0.000000e+00,1,1
2278,r2521,144.808185,0.000000e+00,0.000000e+00,1,1
2280,r2537,94.918610,0.000000e+00,0.000000e+00,1,1
