# Custom normalizer

The `Dataset.normalize_by_gene()` method accepts either one of two string options (`"z_score"` or `"standard_scale"`) or a function as its argument. A user can therefore define a custom normalizer: a function that takes the `sample` x `gene` array as an argument and returns an array of the same dimensions. Let's see this in action.
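
As a minimal sketch of that contract, the function below is a hypothetical normalizer (`log1p_center` is an illustrative name, not part of `deep_lincs`). It assumes the input behaves like a NumPy array of non-negative values with samples as rows and genes as columns, and it only demonstrates the one stated requirement: the output has the same shape as the input.

```python
import numpy as np

def log1p_center(X):
    """Hypothetical normalizer: log-transform, then center each gene (column)."""
    X = np.asarray(X, dtype=float)
    logged = np.log1p(X)                 # assumes non-negative expression values
    return logged - logged.mean(axis=0)  # subtract the per-gene mean; shape unchanged

# Toy input: 2 samples x 3 genes
X = np.array([[0.0, 1.0, 3.0],
              [1.0, 3.0, 7.0]])
Z = log1p_center(X)
assert Z.shape == X.shape
```

Any function with this shape-preserving behavior could, under the assumptions above, be passed in place of the string options.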

Here we will apply a robust z-score per gene, as described by [Lipinski et al.](https://www.pnas.org/content/pnas/suppl/2012/11/30/1209673109.DCSupplemental/pnas.201209673SI.pdf).

In [1]:
import numpy as np

def robust_z_score(X, min_mad=0.1):
    # Subtract the median expression of each gene (column-wise)
    median_subtracted = X - np.median(X, axis=0)
    median_deviations = np.abs(median_subtracted)

    # Median absolute deviation (MAD) per gene, floored at min_mad
    # to avoid dividing by values close to zero
    mads = np.median(median_deviations, axis=0)
    mads = np.clip(mads, a_min=min_mad, a_max=None)

    # Multiply the MAD by 1.4826 to make it comparable to SD
    # (https://en.wikipedia.org/wiki/Median_absolute_deviation)
    zscore = median_subtracted / (mads * 1.4826)
    return zscore
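
In symbols, the function above computes the following for sample $i$ and gene $j$. The notation is introduced here for illustration only: $x_{ij}$ is an entry of `X`, and $m$ stands for the `min_mad` argument.

$$
\mathrm{MAD}_j = \operatorname{median}_i \left| x_{ij} - \operatorname{median}_i(x_{ij}) \right|,
\qquad
z_{ij} = \frac{x_{ij} - \operatorname{median}_i(x_{ij})}{1.4826 \cdot \max(\mathrm{MAD}_j,\; m)}
$$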

Next we can create a `Dataset` with some fake expression profiles:

In [2]:
from deep_lincs import Dataset
import pandas as pd
import numpy as np
np.random.seed(42)

gene_meta_df = pd.DataFrame(
    {"name": ["Gene A", "Gene B", "Gene C"]}, 
    index=pd.Index(list('ABC'), name="gene_id")
)
sample_meta_df = pd.DataFrame(
    {"cell_id": ["cell_A", "cell_A", "cell_B", "cell_C"]}, 
    index=pd.Index(list('wxyz'), name="inst_id")
)
data_df = pd.DataFrame(
    np.random.rand(12).reshape(-1,3), 
    columns=gene_meta_df.index.values, 
    index=sample_meta_df.index
)

dataset = Dataset.from_dataframes(data_df, sample_meta_df, gene_meta_df)
dataset

<L1000 Dataset: (samples: 4, genes: 3)>

Now we can pass our custom normalizer as an argument, and voilà, the data are robust z-score normalized!

In [3]:
dataset.normalize_by_gene(robust_z_score)
dataset.data

inst_id,A,B,C
w,-0.453227,0.746242,0.239365
x,0.453227,-0.60274,-1.867531
y,-1.733144,0.60274,-0.239365
z,0.895755,-0.832637,1.109616
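
As a sanity check, the same values can be recomputed with plain NumPy, without going through `Dataset`. This is a sketch that assumes the fake data are regenerated with the same seed and shape as above; the variable names (`centered`, `mads`, `expected`) are illustrative.

```python
import numpy as np

np.random.seed(42)
X = np.random.rand(12).reshape(-1, 3)  # same fake data: 4 samples x 3 genes

# Per-gene robust z-score, written out step by step
centered = X - np.median(X, axis=0)
mads = np.maximum(np.median(np.abs(centered), axis=0), 0.1)  # floor at min_mad=0.1
expected = centered / (mads * 1.4826)

print(np.round(expected, 6))
```

The printed array should agree with the table above row by row (`w`, `x`, `y`, `z`) and column by column (`A`, `B`, `C`).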


In [4]:
dataset.plot_meta_counts()

TypeError: barplot() got an unexpected keyword argument 'x'