# Usage examples

## 02_analyse.ipynb



This notebook provides detailed documentation and instructions for the main **NileRedQuant** module, `nileredquant.analyse`.


**What this notebook covers:**
   - Blank subtraction functionality & variations
   - Background Fluorescence subtraction
   - Lipid signal computation
   - An example of an automated analysis using *`analyse()`*
   - Demo: *`analyse.subtract_background_absorbance()`*, *`analyse.get_fluorescence_signal()`*, *`analyse.signal_biomass_normalisation()`* and *`analyse.analyse()`*.
   
____

**Table of Contents**

- [Blank Subtraction](#Blank-Subtraction)
    - [Subtracting blank absorbance with one numeric value](#Subtracting-blank-absorbance-with-one-numeric-value)
    - [Subtracting blank absorbance using a list or tuple of numeric values per Condition](#Subtracting-blank-absorbance-using-a-list-or-tuple-of-numeric-values-per-Condition)
    - [Subtracting blank absorbance with automatic blank well detection](#Subtracting-blank-absorbance-with-automatic-blank-well-detection)
- [Background Fluorescence Intensity Subtraction](#Background-Fluorescence-Intensity-Subtraction)
- [Lipid signal: Biomass Normalised Fluorescence (RFU)](#Lipid-signal:-Biomass-Normalised-Fluorescence-(RFU))
- [Automated Analysis Workflow](#Automated-Analysis-Workflow)


In [1]:
# Import the tool's utils & analyse modules

from nileredquant import utils, analyse
import pandas as pd

In [2]:
# Read organised data

data = utils.read_file("./data_example_long.csv")
data

Well,Strain,Condition,Abs,FI_bg,FI_fp
A1,CS1,Condition1,0.555612,36,6008.844
A2,Blank,Condition1,0.079700,21,317.000
A3,CS2,Condition1,0.523681,36,7592.142
A4,CS3,Condition1,0.523782,33,9335.587
A5,CS4,Condition1,0.671698,46,14251.460
...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995.150
H9,S4,Condition2,0.453199,31,23412.430
H10,S5,Condition2,0.406459,29,18209.960
H11,S6,Condition2,0.426239,28,20863.380


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 96 entries, A1 to H12
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Strain     96 non-null     object 
 1   Condition  96 non-null     object 
 2   Abs        96 non-null     float64
 3   FI_bg      96 non-null     int64  
 4   FI_fp      96 non-null     float64
dtypes: float64(2), int64(1), object(2)
memory usage: 4.5+ KB


In [4]:
data.FI_fp = data.FI_fp.round(0).astype("int64")

# Blank Subtraction

For the background absorbance subtraction (aka *blank subtraction*), three options are implemented:

1. Subtracting blank absorbance with **one numeric value** (float or string of a float) **from all instances in the data, regardless of the `Condition` variable**. Useful when the cultivation media have approximately the same background absorbance.

2. Subtracting blank absorbance *per Condition* using a **list or tuple of numeric values per condition**. **Note**: The values should be floats, and their order should follow the order of first occurrence of each condition in the data.

3. Subtracting blank absorbance *per Condition* using a **designated string label in the *`Strain`* column for automatic blank well detection from the layout**. With automatic blank well detection from the plate layout, an average value is computed if several replicates per *Condition* are provided. The code also includes a check for unusually high blank absorbance, which can indicate contamination. The default threshold is 0.2, but another value can optionally be provided as a float.

The recommended approach is to integrate blank wells into the plate layout and detect them automatically per condition. This approach is also implemented in the [automated analysis workflow](#Automated-Analysis-Workflow), as this encourages the user to add blanks to the experiments.

The function returns two data frames and the blank value(s) used in the experiment. For the first two approaches, the two data frames are identical: each is a copy of the original data with an added `Absorbance` column. For the third approach, the data frames are NOT identical. The first data frame is a copy of the original data with the added `Absorbance` column and still contains the blank wells/labels. In the second data frame, the blank wells/labels are removed, so it represents the data without blanks (*data_wo_blanks*).

**! Note !**
The input raw absorbance column without blank subtraction should **not be named `'Absorbance'`**, as this column name is reserved for the output column that gets created. The default recognized column name is `'Abs'`, but any other name can be used. See API documentation and details in *00_input_format_utils.ipynb*.


The blank subtraction is computed as follows:

$$ Absorbance = Abs - Abs_{blank} $$ 
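
As a rough illustration of the formula above, blank subtraction with a single value or with one value per condition can be sketched in plain pandas. This is not the package's internal implementation; the toy data frame and the `blank_per_condition` mapping are made up for this example.

```python
import pandas as pd

# Toy data in the same long format; the values are illustrative only
toy = pd.DataFrame(
    {
        "Strain": ["S1", "S2", "S1", "S2"],
        "Condition": ["Condition1", "Condition1", "Condition2", "Condition2"],
        "Abs": [0.55, 0.52, 0.40, 0.45],
    },
    index=pd.Index(["A1", "A2", "B1", "B2"], name="Well"),
)

# Option 1: the same blank value is subtracted from every well
single = toy.assign(Absorbance=toy["Abs"] - 0.08)

# Options 2 and 3: one blank value per Condition, mapped onto each row
blank_per_condition = {"Condition1": 0.08, "Condition2": 0.15}
per_condition = toy.assign(
    Absorbance=toy["Abs"] - toy["Condition"].map(blank_per_condition)
)

print(per_condition)
```

The only difference between the options is where the blank values come from: a single number, a user-supplied sequence, or the mean of the wells labelled as blanks.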



###  Subtracting blank absorbance with one numeric value

In [5]:
# The numeric value is a string of a float

data_all, data_wo_blanks, blank_data = analyse.subtract_background_absorbance(
    data=data, 
    blanks='0.08025', 
    contamination_thr=0.2
)

print(f"Blank data used:\n {blank_data}")

# Data with blank absorbance subtraction - `Absorbance` variable generated
data_all

Blank data used:
 0.08025


Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance
A1,CS1,Condition1,0.555612,36,6009,0.4754
A2,Blank,Condition1,0.079700,21,317,-0.0006
A3,CS2,Condition1,0.523681,36,7592,0.4434
A4,CS3,Condition1,0.523782,33,9336,0.4435
A5,CS4,Condition1,0.671698,46,14251,0.5914
...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995,0.3230
H9,S4,Condition2,0.453199,31,23412,0.3729
H10,S5,Condition2,0.406459,29,18210,0.3262
H11,S6,Condition2,0.426239,28,20863,0.3460


Two identical data frames are created, with the same shape.

In [6]:
data_wo_blanks.shape

(96, 6)

In [7]:
data_all.shape

(96, 6)

In [8]:
# The numeric value is a float

data_all, data_wo_blanks, blank_data = analyse.subtract_background_absorbance(
    data=data, 
    blanks=0.08025, 
    contamination_thr=0.2
)

print(f"Blank data used:\n {blank_data}")

# Data with blank absorbance subtraction - `Absorbance` variable generated
data_all

Blank data used:
 0.08025


Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance
A1,CS1,Condition1,0.555612,36,6009,0.4754
A2,Blank,Condition1,0.079700,21,317,-0.0006
A3,CS2,Condition1,0.523681,36,7592,0.4434
A4,CS3,Condition1,0.523782,33,9336,0.4435
A5,CS4,Condition1,0.671698,46,14251,0.5914
...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995,0.3230
H9,S4,Condition2,0.453199,31,23412,0.3729
H10,S5,Condition2,0.406459,29,18210,0.3262
H11,S6,Condition2,0.426239,28,20863,0.3460


### Subtracting blank absorbance using a list or tuple of numeric values per Condition 

In [9]:
# Retrieve the values from example, synthetic data

blanks = data[data.Strain == 'Blank'].groupby(['Condition', 'Strain']).mean().Abs.to_list()
blanks

[0.08025, 0.14775]

In [10]:
# Define blanks list using the values from above 
# We would usually start here

blanks = [0.08025, 0.14775]

We know we have 2 conditions in the `Condition` column: 'Condition1' and 'Condition2'. We also see that 'Condition1' occurs before 'Condition2' in the data, therefore the order of the values is:

        [0.08025, 0.14775]

representing:

        ['Condition1', 'Condition2']
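
If the order of first occurrence is not obvious from looking at the data, it can be checked before building the list. The snippet below is a small sketch with a made-up `conditions` series; on real data the `Condition` column of the data frame would be used instead. It relies on pandas' `Series.unique()`, which returns values in order of appearance.

```python
import pandas as pd

# Made-up Condition column; on real data use data["Condition"]
conditions = pd.Series(
    ["Condition1", "Condition1", "Condition2", "Condition1", "Condition2"]
)
blanks = [0.08025, 0.14775]

# Series.unique() keeps the values in order of first appearance
order = list(conditions.unique())

# Pair each condition with the blank value at the same position
blank_by_condition = dict(zip(order, blanks))
print(blank_by_condition)
```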


In [11]:
data_all, data_wo_blanks, blank_data = analyse.subtract_background_absorbance(
    data=data, 
    blanks=blanks, 
    contamination_thr=0.2
)

print(f"Blank data used:\n {blank_data}")

# Data with blank absorbance subtraction - `Absorbance` variable generated
data_all

Blank data used:
 Condition1    0.08025
Condition2    0.14775
dtype: float64


Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance
A1,CS1,Condition1,0.555612,36,6009,0.4754
A2,Blank,Condition1,0.079700,21,317,-0.0006
A3,CS2,Condition1,0.523681,36,7592,0.4434
A4,CS3,Condition1,0.523782,33,9336,0.4435
A5,CS4,Condition1,0.671698,46,14251,0.5914
...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995,0.2555
H9,S4,Condition2,0.453199,31,23412,0.3054
H10,S5,Condition2,0.406459,29,18210,0.2587
H11,S6,Condition2,0.426239,28,20863,0.2785


In [12]:
# The blanks can also be stored in a tuple

blanks_tuple = tuple(blanks)

In [13]:
data_all, data_wo_blanks, blank_data = analyse.subtract_background_absorbance(
    data=data, 
    blanks=blanks_tuple, 
    contamination_thr=0.2
)

print(f"Blank data used:\n {blank_data}")

# Data with blank absorbance subtraction - `Absorbance` variable generated
data_all

Blank data used:
 Condition1    0.08025
Condition2    0.14775
dtype: float64


Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance
A1,CS1,Condition1,0.555612,36,6009,0.4754
A2,Blank,Condition1,0.079700,21,317,-0.0006
A3,CS2,Condition1,0.523681,36,7592,0.4434
A4,CS3,Condition1,0.523782,33,9336,0.4435
A5,CS4,Condition1,0.671698,46,14251,0.5914
...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995,0.2555
H9,S4,Condition2,0.453199,31,23412,0.3054
H10,S5,Condition2,0.406459,29,18210,0.2587
H11,S6,Condition2,0.426239,28,20863,0.2785


In [14]:
data_all.shape

(96, 6)

In [15]:
data_wo_blanks.shape

(96, 6)

Again, two identical data frames are created, with the same shape.

### Subtracting blank absorbance with automatic blank well detection 

In [16]:
# The 'Blank' value in the `Strain` column marks the blank wells per condition

data_all, data_wo_blanks, blank_data = analyse.subtract_background_absorbance(
    data=data, 
    blanks='Blank',  # <- provide the label value - can be any string
    contamination_thr=0.4
)

In [17]:
# Data with blank absorbance subtraction - `Absorbance` variable generated
data_all

Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance
A1,CS1,Condition1,0.555612,36,6009,0.4754
A2,Blank,Condition1,0.079700,21,317,-0.0006
A3,CS2,Condition1,0.523681,36,7592,0.4434
A4,CS3,Condition1,0.523782,33,9336,0.4435
A5,CS4,Condition1,0.671698,46,14251,0.5914
...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995,0.2555
H9,S4,Condition2,0.453199,31,23412,0.3054
H10,S5,Condition2,0.406459,29,18210,0.2587
H11,S6,Condition2,0.426239,28,20863,0.2785


In [18]:
# Data frame with removed blank wells - smaller
data_wo_blanks

Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance
A1,CS1,Condition1,0.555612,36,6009,0.4754
A3,CS2,Condition1,0.523681,36,7592,0.4434
A4,CS3,Condition1,0.523782,33,9336,0.4435
A5,CS4,Condition1,0.671698,46,14251,0.5914
A6,S1,Condition1,0.662686,37,10606,0.5824
...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995,0.2555
H9,S4,Condition2,0.453199,31,23412,0.3054
H10,S5,Condition2,0.406459,29,18210,0.2587
H11,S6,Condition2,0.426239,28,20863,0.2785


In [19]:
# Shape of data after blank wells removed

data_wo_blanks.shape

(88, 6)

In [20]:
# Which values were used as Blanks per Condition
blank_data

Condition
Condition1    0.08025
Condition2    0.14775
Name: Abs, dtype: float64

In [21]:
# Example with contamination threshold left as default 0.2

data_all, data_wo_blanks, blank_data = analyse.subtract_background_absorbance(
    data=data, 
    blanks='Blank',  # <- provide the label value - can be any string
)
data_wo_blanks



Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance
A1,CS1,Condition1,0.555612,36,6009,0.4754
A3,CS2,Condition1,0.523681,36,7592,0.4434
A4,CS3,Condition1,0.523782,33,9336,0.4435
A5,CS4,Condition1,0.671698,46,14251,0.5914
A6,S1,Condition1,0.662686,37,10606,0.5824
...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995,0.3063
H9,S4,Condition2,0.453199,31,23412,0.3562
H10,S5,Condition2,0.406459,29,18210,0.3095
H11,S6,Condition2,0.426239,28,20863,0.3292


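Because possible contamination was detected in the blank wells, the wells with unusually high absorbance values (>= threshold) were not considered for blank subtraction. As a result, the blank value for Condition2 drops from 0.14775 (with the threshold set to 0.4) to 0.09700 (with the default threshold of 0.2).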

In [22]:
# Blank values used per condition
blank_data

Condition
Condition1    0.08025
Condition2    0.09700
Name: Abs, dtype: float64
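
The effect of the contamination threshold can be sketched in plain pandas. The individual blank readings below are invented; they were chosen only so that the per-condition means match the blank values printed in this notebook. The filtering logic is an assumption based on the description above (wells with absorbance >= threshold are excluded), not the package's actual code.

```python
import pandas as pd

# Invented blank-well readings, four per condition (illustrative only)
blank_wells = pd.DataFrame(
    {
        "Condition": ["Condition1"] * 4 + ["Condition2"] * 4,
        "Abs": [0.0797, 0.0803, 0.0801, 0.0809, 0.095, 0.097, 0.099, 0.300],
    }
)

contamination_thr = 0.2

# Mean over all blank wells per condition
all_means = blank_wells.groupby("Condition")["Abs"].mean()

# Exclude suspicious wells (Abs >= threshold) before averaging
clean = blank_wells[blank_wells["Abs"] < contamination_thr]
clean_means = clean.groupby("Condition")["Abs"].mean()

print(all_means.round(5))
print(clean_means.round(5))
```

In this sketch a single high reading in Condition2 is enough to shift the blank mean noticeably, which is why the resulting `Absorbance` values differ between the two thresholds.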

# Background Fluorescence Intensity Subtraction


As discussed in the accompanying manuscript, some cultivation media (e.g. YPD) have a very high background fluorescence by default, even when the fluorescent probe or cells are not present. In such cases a washing step is recommended, in which the cells are transferred to a buffered suspension with low background fluorescence prior to the measurement.
Even then, or when this is not possible, it is good practice to measure the background fluorescence intensity of cells in a suspension without the added fluorescent probe. This way we obtain a fluorescence signal that is proportional to the analyte concentration and the biomass we measure.


To do so, we subtract the measured background fluorescence intensity from the fluorescence intensity after the addition of the fluorescent probe and compute the fluorescence signal as:



$$ Fluorescence = FI_{fp} - FI_{bg} $$

*FI$_{fp}$: fluorescence intensity of the fluorescent probe;*

*FI$_{bg}$: fluorescence intensity of the background fluorescence.*


The subtraction is performed row-wise (per well). If the background fluorescence was measured in only one or a few wells, then that single value or the average of the measured values should be filled in for the whole column.
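
A minimal sketch of this preparation step, assuming the background was measured in one well only and the remaining entries are missing. The toy values and the choice of `fillna` with the column mean are illustrative, not something the package does for you.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "FI_fp": [6009, 7592, 9336],
        "FI_bg": [36.0, None, None],  # background measured in one well only
    },
    index=pd.Index(["A1", "A3", "A4"], name="Well"),
)

# Spread the mean of the measured background values over the whole column
toy["FI_bg"] = toy["FI_bg"].fillna(toy["FI_bg"].mean())

# Row-wise subtraction: Fluorescence = FI_fp - FI_bg
toy["Fluorescence"] = toy["FI_fp"] - toy["FI_bg"]
print(toy)
```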
____

**! Note !**
The input fluorescence columns should **not be named `'Fluorescence'`**, as this column name is reserved for the output column that gets created. The default recognized column names are `'FI_bg'` and `'FI_fp'`, but any other name can be used. See API documentation and details in *00_input_format_utils.ipynb*.



In [23]:
data_siganls = analyse.get_fluorescence_signal(data=data_wo_blanks)
data_siganls

Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance,Fluorescence
A1,CS1,Condition1,0.555612,36,6009,0.4754,5973
A3,CS2,Condition1,0.523681,36,7592,0.4434,7556
A4,CS3,Condition1,0.523782,33,9336,0.4435,9303
A5,CS4,Condition1,0.671698,46,14251,0.5914,14205
A6,S1,Condition1,0.662686,37,10606,0.5824,10569
...,...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995,0.3063,17957
H9,S4,Condition2,0.453199,31,23412,0.3562,23381
H10,S5,Condition2,0.406459,29,18210,0.3095,18181
H11,S6,Condition2,0.426239,28,20863,0.3292,20835


# Lipid signal: Biomass Normalised Fluorescence (RFU)


**Biomass normalization & variance-stabilization**

Once the fluorescence signal is computed, it is proportional to the analyte concentration in the well. Because wells differ in biomass, the signal needs to be normalised to a biomass proxy, which in our case is the Absorbance.

We derive the *`'Lipids'`* signal as biomass-normalised Fluorescence (***Relative Fluorescence Units; RFUs***), computed as:

$$ Lipids = \frac{Fluorescence}{Absorbance} $$



Variance stabilisation of the Lipid signal is then performed using log transform:


$$ Log(Lipids) = Log(\frac{Fluorescence}{Absorbance}) $$

*The unit of `'Log(Lipids)'` is defined as Log(RFU).*
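
The two formulas above can be reproduced by hand for a quick sanity check. The sketch below uses the `Absorbance` and `Fluorescence` values of wells A1 and A3 as displayed in this notebook. Using the natural logarithm (`numpy.log`) is an assumption; it is consistent with the `Log(Lipids)` values shown in the output table (9.439 and 9.743).

```python
import numpy as np
import pandas as pd

# Values of wells A1 and A3 as displayed in the tables of this notebook
toy = pd.DataFrame(
    {"Absorbance": [0.4754, 0.4434], "Fluorescence": [5973, 7556]},
    index=pd.Index(["A1", "A3"], name="Well"),
)

# Biomass normalisation: Lipids = Fluorescence / Absorbance
toy["Lipids"] = toy["Fluorescence"] / toy["Absorbance"]

# Variance stabilisation with the natural logarithm (assumed base)
toy["Log(Lipids)"] = np.log(toy["Lipids"])

print(toy.round(3))
```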

In [24]:
lipids = analyse.signal_biomass_normalisation(data=data_siganls)

lipids

Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance,Fluorescence,Lipids,Log(Lipids)
A1,CS1,Condition1,0.555612,36,6009,0.4754,5973,12564.156,9.439
A3,CS2,Condition1,0.523681,36,7592,0.4434,7556,17041.046,9.743
A4,CS3,Condition1,0.523782,33,9336,0.4435,9303,20976.325,9.951
A5,CS4,Condition1,0.671698,46,14251,0.5914,14205,24019.276,10.087
A6,S1,Condition1,0.662686,37,10606,0.5824,10569,18147.321,9.806
...,...,...,...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995,0.3063,17957,58625.53,10.979
H9,S4,Condition2,0.453199,31,23412,0.3562,23381,65640.09,11.092
H10,S5,Condition2,0.406459,29,18210,0.3095,18181,58743.134,10.981
H11,S6,Condition2,0.426239,28,20863,0.3292,20835,63289.793,11.055


# Automated Analysis Workflow

The analysis steps described above are combined in an automated workflow. The whole analysis is thus computed at once, with one additional step: outlier detection across all selected numerical columns of the original data frame.

The details of outlier detection strategies are described in *03_QC_&_Plotting.ipynb*. 

___
The analysis workflow consists of the following consecutive steps:
1. **Subtract Blanks**: Compute `Absorbance` column & drop blank wells
2. **Subtract Background Fluorescence Intensity**: Compute `Fluorescence` column
3. **Derive Lipid Signal**: Compute `Lipids` and `Log(Lipids)` columns

and

4. **Detect Replicate Outliers**: Compute `Outlier` column & drop outliers



**! Note !**
- The input data should be either a data frame in long format or a path to a file (also in long format, with all variables combined in one table).

- Blank subtraction is performed with the automatic blank well detection strategy per Condition, so a label value in the `'Strain'` column should be selected. See the details in the [Blank Subtraction](#Blank-Subtraction) and [Subtracting blank absorbance with automatic blank well detection](#Subtracting-blank-absorbance-with-automatic-blank-well-detection) chapters.

- For outlier detection, if none of the columns are specified, all numerical columns are considered. In this case the function marks a row as an outlier if it's flagged in at least one of the columns (i.e., union / ANY across columns within the group). In other words, being an outlier in any of the numerical columns (within its group - `Condition` × `Strain`) is enough to mark the row as an outlier.
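
The "ANY across columns" rule from the last note can be sketched as follows. This is only an illustration under stated assumptions: it uses the common Tukey rule (values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR) for the `'IQR'` method, which may differ from what the package actually does (see *03_QC_&_Plotting.ipynb*). The helper `flag_iqr_outliers` and the toy data are hypothetical.

```python
import pandas as pd


def flag_iqr_outliers(df, columns, group_cols=("Condition", "Strain"), k=1.5):
    """Return a boolean Series: True if a row is an outlier in ANY column."""
    grouped = df.groupby(list(group_cols))[columns]
    # Per-group quartiles, broadcast back to the shape of df[columns]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    outside = (df[columns] < q1 - k * iqr) | (df[columns] > q3 + k * iqr)
    # Union across columns: one flagged column is enough
    return outside.any(axis=1)


# Toy replicates for two strains in one condition (illustrative values)
toy = pd.DataFrame(
    {
        "Condition": ["Condition1"] * 10,
        "Strain": ["S1"] * 5 + ["S2"] * 5,
        "Lipids": [10.0, 11.0, 10.0, 12.0, 30.0, 20.0, 21.0, 20.0, 22.0, 21.0],
        "Absorbance": [0.5, 0.5, 0.5, 0.5, 0.5, 0.4, 0.4, 0.9, 0.4, 0.4],
    }
)

only_lipids = flag_iqr_outliers(toy, ["Lipids"])
all_columns = flag_iqr_outliers(toy, ["Lipids", "Absorbance"])

print(toy[only_lipids])
print(toy[all_columns])
```

Checking more columns can only add flagged rows, never remove them, which matches the larger `outliers` table obtained in this notebook when `outlier_columns=None` is used.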

In [25]:
# Compute the whole workflow at once
data_all, data_wo_outliers, outliers = analyse.analyse(
    filename="./data_example_long.csv", 
    blanks="Blank", 
    contamination_thr=0.2,
    outlier_method='IQR',
    outlier_columns=['Lipids'],
    save=True
)



The blank Absorbance value(s) used: Condition
Condition1    0.08025
Condition2    0.09700
Name: Abs, dtype: float64


In [26]:
# Copy of the original data with all computed columns
data_all

Unnamed: 0,Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance,Fluorescence,Lipids,Log(Lipids),Outlier
0,A1,CS1,Condition1,0.555612,36,6008.844,0.4754,5973,12564.156,9.439,False
1,A2,Blank,Condition1,0.079700,21,317.000,-0.0006,,,,
2,A3,CS2,Condition1,0.523681,36,7592.142,0.4434,7556,17041.046,9.743,False
3,A4,CS3,Condition1,0.523782,33,9335.587,0.4435,9303,20976.325,9.951,False
4,A5,CS4,Condition1,0.671698,46,14251.460,0.5914,14205,24019.276,10.087,False
...,...,...,...,...,...,...,...,...,...,...,...
91,H8,S3,Condition2,0.403269,38,17995.150,0.3063,17957,58625.53,10.979,False
92,H9,S4,Condition2,0.453199,31,23412.430,0.3562,23381,65640.09,11.092,False
93,H10,S5,Condition2,0.406459,29,18209.960,0.3095,18181,58743.134,10.981,False
94,H11,S6,Condition2,0.426239,28,20863.380,0.3292,20835,63289.793,11.055,False


In [27]:
# Data without blanks & without outliers

data_wo_outliers

Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance,Fluorescence,Lipids,Log(Lipids)
A1,CS1,Condition1,0.555612,36,6008.844,0.4754,5973,12564.156,9.439
A3,CS2,Condition1,0.523681,36,7592.142,0.4434,7556,17041.046,9.743
A4,CS3,Condition1,0.523782,33,9335.587,0.4435,9303,20976.325,9.951
A5,CS4,Condition1,0.671698,46,14251.460,0.5914,14205,24019.276,10.087
A6,S1,Condition1,0.662686,37,10605.590,0.5824,10569,18147.321,9.806
...,...,...,...,...,...,...,...,...,...
H8,S3,Condition2,0.403269,38,17995.150,0.3063,17957,58625.53,10.979
H9,S4,Condition2,0.453199,31,23412.430,0.3562,23381,65640.09,11.092
H10,S5,Condition2,0.406459,29,18209.960,0.3095,18181,58743.134,10.981
H11,S6,Condition2,0.426239,28,20863.380,0.3292,20835,63289.793,11.055


In [28]:
# Detected outlier rows

outliers

Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance,Fluorescence,Lipids,Log(Lipids)
C7,S2,Condition1,0.515669,40,7192.001,0.4354,7152,16426.275,9.707
E11,S6,Condition2,0.391279,29,21497.38,0.2943,21468,72945.973,11.197
G4,CS3,Condition2,0.313998,35,14225.16,0.217,14190,65391.705,11.088
H1,CS1,Condition2,0.352938,32,6224.592,0.2559,6193,24200.86,10.094


In [29]:
# Repeat the analysis but don't state columns for the outlier detection

data_all, data_wo_outliers, outliers = analyse.analyse(
    filename="./data_example_long.csv", 
    blanks="Blank", 
    contamination_thr=0.2,
    outlier_method='IQR',
    outlier_columns=None,
    save=False
)



The blank Absorbance value(s) used: Condition
Condition1    0.08025
Condition2    0.09700
Name: Abs, dtype: float64


In [30]:
# Data without blanks & without outliers

data_wo_outliers

Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance,Fluorescence,Lipids,Log(Lipids)
A1,CS1,Condition1,0.555612,36,6008.844,0.4754,5973,12564.156,9.439
A4,CS3,Condition1,0.523782,33,9335.587,0.4435,9303,20976.325,9.951
A5,CS4,Condition1,0.671698,46,14251.460,0.5914,14205,24019.276,10.087
A6,S1,Condition1,0.662686,37,10605.590,0.5824,10569,18147.321,9.806
A7,S2,Condition1,0.651732,41,11212.320,0.5715,11171,19546.807,9.881
...,...,...,...,...,...,...,...,...,...
H7,S2,Condition2,0.405568,30,18545.410,0.3086,18515,59996.759,11.002
H9,S4,Condition2,0.453199,31,23412.430,0.3562,23381,65640.09,11.092
H10,S5,Condition2,0.406459,29,18209.960,0.3095,18181,58743.134,10.981
H11,S6,Condition2,0.426239,28,20863.380,0.3292,20835,63289.793,11.055


In [31]:
# Copy of the original data with all computed columns

data_all

Unnamed: 0,Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance,Fluorescence,Lipids,Log(Lipids),Outlier
0,A1,CS1,Condition1,0.555612,36,6008.844,0.4754,5973,12564.156,9.439,False
1,A2,Blank,Condition1,0.079700,21,317.000,-0.0006,,,,
2,A3,CS2,Condition1,0.523681,36,7592.142,0.4434,7556,17041.046,9.743,True
3,A4,CS3,Condition1,0.523782,33,9335.587,0.4435,9303,20976.325,9.951,False
4,A5,CS4,Condition1,0.671698,46,14251.460,0.5914,14205,24019.276,10.087,False
...,...,...,...,...,...,...,...,...,...,...,...
91,H8,S3,Condition2,0.403269,38,17995.150,0.3063,17957,58625.53,10.979,True
92,H9,S4,Condition2,0.453199,31,23412.430,0.3562,23381,65640.09,11.092,False
93,H10,S5,Condition2,0.406459,29,18209.960,0.3095,18181,58743.134,10.981,False
94,H11,S6,Condition2,0.426239,28,20863.380,0.3292,20835,63289.793,11.055,False


In [32]:
# Detected outlier rows

outliers

Well,Strain,Condition,Abs,FI_bg,FI_fp,Absorbance,Fluorescence,Lipids,Log(Lipids)
A3,CS2,Condition1,0.523681,36,7592.142,0.4434,7556,17041.046,9.743
B1,CS1,Condition1,0.441097,38,3908.486,0.3608,3870,10726.164,9.28
B4,CS3,Condition1,0.58047,34,10212.59,0.5002,10179,20349.86,9.921
B5,CS4,Condition1,0.43643,45,9514.596,0.3562,9470,26586.187,10.188
C7,S2,Condition1,0.515669,40,7192.001,0.4354,7152,16426.275,9.707
C8,S3,Condition1,0.551179,37,8507.228,0.4709,8470,17986.834,9.797
C9,S4,Condition1,0.578894,34,10784.39,0.4986,10750,21560.369,9.979
D4,CS3,Condition1,0.479603,33,7303.544,0.3994,7271,18204.807,9.809
D6,S1,Condition1,0.624316,33,10181.58,0.5441,10149,18652.821,9.834
E11,S6,Condition2,0.391279,29,21497.38,0.2943,21468,72945.973,11.197
