# Investigating well position effect
**Author:** Jessica Ewald <br>

The purpose of this notebook is to quantify the well position effect using two plates on which ALK WT and one ALK variant (VAR, a positive control) are each repeated across 48 wells. Previously, classifiers were trained to separate the cells in every pair of wells: each WT well versus each VAR well, each pair of VAR wells, and each pair of WT wells. The WT-WT and VAR-VAR comparisons estimate the well position effect across different positions, because the wells in each of these comparisons carry the same construct. The WT-VAR comparisons indicate the size of the morphological perturbation relative to the well position effect.
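The three comparison types can be made concrete by enumerating well pairs. This is a minimal sketch that assumes a single 96-well plate with 48 WT and 48 VAR wells in a hypothetical layout (odd columns WT, even columns VAR; the real plate map is not shown here) and that every pair of wells is compared. The well names and `enumerate_comparisons` are illustrative only.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical 96-well layout (not the real plate map):
# odd columns are assigned to WT, even columns to VAR.
wells = [f"{row}{col:02d}" for row in "ABCDEFGH" for col in range(1, 13)]
wt_wells = wells[0::2]
var_wells = wells[1::2]


def enumerate_comparisons(wt, var):
    """Return (well_a, well_b, comparison_type) for every pair of wells."""
    pairs = [(a, b, "WT-WT") for a, b in combinations(wt, 2)]
    pairs += [(a, b, "VAR-VAR") for a, b in combinations(var, 2)]
    pairs += [(a, b, "WT-VAR") for a, b in product(wt, var)]
    return pairs


pairs = enumerate_comparisons(wt_wells, var_wells)
print(Counter(kind for _, _, kind in pairs))
# Counter({'WT-VAR': 2304, 'WT-WT': 1128, 'VAR-VAR': 1128})
```

Under these assumptions, the within-construct comparisons (WT-WT, VAR-VAR) carry only position differences, while the WT-VAR comparisons carry position differences plus the perturbation.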

In [1]:
# Imports
import pathlib
import polars as pl
import pandas as pd
import numpy as np
import os

import black
import jupyter_black

jupyter_black.load(
    lab=False,
    line_length=79,
    verbosity="DEBUG",
    target_version=black.TargetVersion.PY310,
)

import warnings
warnings.filterwarnings("ignore")

DEBUG:jupyter_black:config: {'line_length': 79, 'target_versions': {<TargetVersion.PY310: 10>}}

In [6]:
result_dir = pathlib.Path(
    "/dgx1nas1/storage/data/jess/varchamp/sc_data/classification_results/B4A3R1/ALK_WT_VAR"
)
files = os.listdir(result_dir)
files

['B4A3R1_non_protein_REF_control_feat_importance_normalized_feature_selected.csv',
 'B4A3R1_non_protein_feat_importance_normalized_feature_selected.csv',
 'B4A3R1_protein_VAR_control_feat_importance_normalized_feature_selected.csv',
 'B4A3R1_non_protein_VAR_control_feat_importance_normalized_feature_selected.csv',
 'B4A3R1_non_protein_REF_control_f1score_normalized_feature_selected.csv',
 'B4A3R1_non_protein_VAR_control_f1score_normalized_feature_selected.csv',
 'B4A3R1_protein_REF_control_feat_importance_normalized_feature_selected.csv',
 'B4A3R1_protein_REF_control_f1score_normalized_feature_selected.csv',
 'B4A3R1_protein_VAR_control_f1score_normalized_feature_selected.csv',
 'B4A3R1_non_protein_f1score_normalized_feature_selected.csv',
 'B4A3R1_protein_f1score_normalized_feature_selected.csv',
 'B4A3R1_protein_feat_importance_normalized_feature_selected.csv']
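The result files follow a regular naming pattern, so they can be selected by their components rather than by their position in the directory listing. The regular expression below is inferred from the twelve names listed above and is an assumption, not a documented naming scheme; `parse_result_filename` is a hypothetical helper.

```python
import re

# Naming pattern inferred from the file names listed above; this is an
# assumption, not a documented scheme. Components: batch, feature set,
# optional control label, result type, and a fixed suffix.
FILENAME_RE = re.compile(
    r"^(?P<batch>[^_]+)_"
    r"(?P<features>non_protein|protein)_"
    r"(?:(?P<control>REF|VAR)_control_)?"
    r"(?P<result>f1score|feat_importance)_"
    r"normalized_feature_selected\.csv$"
)


def parse_result_filename(name):
    """Split a result file name into its components, or return None."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


print(
    parse_result_filename(
        "B4A3R1_protein_VAR_control_f1score_normalized_feature_selected.csv"
    )
)
# {'batch': 'B4A3R1', 'features': 'protein', 'control': 'VAR', 'result': 'f1score'}
```

Files without a control label parse with `control` set to `None`, and names that do not match the pattern return `None`, so the helper can be used to filter a directory listing.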

In [13]:
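# Select the file by name: os.listdir returns entries in arbitrary order,
# so a positional index such as files[10] is not reproducible
df = pd.read_csv(
    result_dir / "B4A3R1_protein_f1score_normalized_feature_selected.csv"
)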
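To check how saturated the classifiers are, F1 scores can be summarized per comparison type. This is a minimal sketch: the column names `comparison` and `f1` are hypothetical placeholders (the schema of `df` is not shown here), the 0.99 threshold for calling a classifier saturated is an arbitrary choice, and the toy values are synthetic, not results from this experiment.

```python
import pandas as pd


def summarize_f1(
    scores, type_col="comparison", score_col="f1", threshold=0.99
):
    """Per comparison type: count, mean and min F1, and the saturated share.

    Column names are hypothetical defaults; pass the real ones from `df`.
    """
    grouped = scores.groupby(type_col)[score_col]
    summary = grouped.agg(["count", "mean", "min"])
    # Fraction of well pairs whose F1 is at or above the threshold
    saturated = scores[score_col].ge(threshold)
    summary["frac_saturated"] = saturated.groupby(scores[type_col]).mean()
    return summary


# Synthetic scores for illustration only, not results from this experiment
toy = pd.DataFrame(
    {
        "comparison": ["WT-WT", "WT-WT", "VAR-VAR", "WT-VAR", "WT-VAR"],
        "f1": [0.98, 1.00, 0.995, 1.00, 0.97],
    }
)
summary = summarize_f1(toy)
print(summary)
```

If the WT-WT and VAR-VAR rows sit as close to saturation as the WT-VAR rows, the F1 scores alone cannot separate the position effect from the perturbation.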

A quick examination of the results shows that the cells in every pair of wells are almost perfectly separable. Because the classifier signal is saturated, the F1 scores cannot rank the well position effect against the genetic perturbation: WT-WT, VAR-VAR and WT-VAR pairs all score near the ceiling. We will therefore either make the classification task more difficult or compute distances directly (e.g., Euclidean or cosine distance, with and without a PCA transformation). Either approach should at least let us assess the well position effect relative to the genetic perturbation.
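The direct-distance alternative can be sketched as follows. This is a minimal sketch, not the analysis used in this notebook: it assumes single-cell features in a NumPy array with one well label per cell, aggregates cells to per-well median profiles, and compares the mean WT-WT, VAR-VAR and WT-VAR distances with Euclidean and cosine metrics, optionally after a PCA computed by SVD on the single cells. The helper names, the median aggregation, the number of components and the synthetic data are all assumptions.

```python
import numpy as np


def pca_transform(x, n_components):
    """Project the rows of x onto their top principal components (SVD)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def well_profiles(features, well_ids):
    """Aggregate single-cell features to one median profile per well."""
    wells = np.unique(well_ids)
    profiles = np.vstack(
        [np.median(features[well_ids == w], axis=0) for w in wells]
    )
    return wells, profiles


def pairwise_distances(x, metric="euclidean"):
    """Pairwise Euclidean or cosine distance between the rows of x."""
    if metric == "euclidean":
        diff = x[:, None, :] - x[None, :, :]
        return np.sqrt((diff**2).sum(axis=-1))
    if metric == "cosine":
        unit = x / np.linalg.norm(x, axis=1, keepdims=True)
        return 1.0 - unit @ unit.T
    raise ValueError(f"unknown metric: {metric}")


def mean_distances(d, wells):
    """Mean WT-WT, VAR-VAR and WT-VAR distance between well profiles."""
    is_wt = np.array([w.startswith("WT") for w in wells])
    out = {}
    for name, a, b in [
        ("WT-WT", is_wt, is_wt),
        ("VAR-VAR", ~is_wt, ~is_wt),
        ("WT-VAR", is_wt, ~is_wt),
    ]:
        block = d[np.ix_(a, b)]
        if name != "WT-VAR":
            # Same-group block: keep each pair once and drop the diagonal
            block = block[np.triu_indices(a.sum(), k=1)]
        out[name] = float(block.mean())
    return out


# Synthetic data for illustration only: 2 WT and 2 VAR wells, 50 cells each
rng = np.random.default_rng(0)
base = rng.normal(size=20)
shift = np.zeros(20)
shift[:5] = 3.0  # hypothetical VAR effect on 5 of 20 features
well_ids = np.repeat(["WT_1", "WT_2", "VAR_1", "VAR_2"], 50)
is_var = np.array([w.startswith("VAR") for w in well_ids])
features = np.where(is_var[:, None], base + shift, base)
features = features + rng.normal(size=features.shape)

results = {}
for use_pca in (False, True):
    x = pca_transform(features, n_components=5) if use_pca else features
    wells, profiles = well_profiles(x, well_ids)
    for metric in ("euclidean", "cosine"):
        d = pairwise_distances(profiles, metric)
        results[(use_pca, metric)] = mean_distances(d, wells)
        print(f"PCA={use_pca!s:5} {metric:9}", results[(use_pca, metric)])
```

Unlike F1 scores, these distances do not cap at a ceiling, so the within-construct distances (position effect) can be compared directly with the WT-VAR distances (perturbation plus position effect).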