# Usage examples

## 00_input_format_utils.ipynb



This notebook documents the expected *long-format input schema* for **NileRedQuant** (I/O) and gives instructions for preparing input data. It also demonstrates the basic helpers in `nileredquant.utils` for reading files, converting between microtiter plate arrays and the long (tidy) format, and adding metadata to a long-format data table.


**What this notebook covers:**
   - Accepted file types (.csv, .tsv, .xlsx)
   - Required and recommended columns for long-format input
   - Format conversions: plate layout → long format and long format → plate layout
   - Metadata merges
   - Utilities demo: `utils.read_file()`, `utils.plate_to_list()`, `utils.list_to_plate()`, `utils.map_metadata()`


----

**Table of Contents**
- [Input Format](#Input-Format)
    - [Example data (long format)](#Example-data-(long-format))
- [Import Data](#Import-Data)
    - [Supported File Formats](#Supported-File-Formats)
- [Format Conversions](#Format-Conversions)
    - [Plate-to-List](#Plate-to-List)
    - [List-to-Plate](#List-to-Plate)
- [Adding Metadata](#Adding-Metadata)

# Input Format
**Supported files**: Comma-separated (`.csv`), tab-separated (`.tsv`), or Excel (`.xlsx`) plate reader exports are accepted via `utils.read_file(path)`. 

**Schema (long format).** The analysis workflow expects a tidy/long format with one well per row. This format can be generated from plate arrays via `utils.plate_to_list()`. Storing the plate row and column position of each well in a `Well` column is recommended.


##### **Required columns**
- **Strain** — strain/organism identifier (string). Numeric IDs are normalized to strings internally.
- **Condition** — cultivation/assay condition (string). Used for per-condition blanking, calibration & grouping.

The same *`Strain`* × *`Condition`* combination may appear in several rows of the (long) analysis format; such rows are treated as replicates.

##### **Recommended columns**
- **Well** — plate position like `A1…H12` (**96-well**) or `A1…P24` (**384-well**). Labels are normalized (`A01 → A1`) and validated. Required by some functions (e.g. `utils.map_metadata()`) but optional in others.
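
The normalization rule for well labels can be illustrated with a short sketch. The helper `normalize_well` and the `PLATE_SHAPES` table are hypothetical and written only for this illustration; they are not the NileRedQuant implementation, which may treat edge cases differently.

```python
import re

# Hypothetical helper for illustration only (not part of nileredquant).
# One row letter A-P, optional leading zeros, then a column number without zeros.
_WELL_RE = re.compile(r"^([A-Pa-p])0*([1-9]\d?)$")

# Number of wells -> (rows, columns)
PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}


def normalize_well(label, plate_size=96):
    """Return e.g. 'A01' as 'A1'; raise ValueError for labels that are off-plate."""
    match = _WELL_RE.match(str(label).strip())
    if match is None:
        raise ValueError(f"Not a well label: {label!r}")
    row, col = match.group(1).upper(), int(match.group(2))
    n_rows, n_cols = PLATE_SHAPES[plate_size]
    if ord(row) - ord("A") >= n_rows or col > n_cols:
        raise ValueError(f"{label!r} is outside a {plate_size}-well plate")
    return f"{row}{col}"


normalized = [normalize_well(w) for w in ["A01", "h12", "B7"]]
```

Keeping the plate geometry in a small lookup table means the same check covers both plate sizes mentioned above.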


Column names recognised by default:
- **Abs** — absorbance at *t$_{0}$* (biomass proxy; before fluorescent probe addition), used for blank subtraction and biomass normalization.
- **FI_bg** — Fluorescence Intensity (FI) of background (at *t$_{0}$*; before fluorescent probe addition).
- **FI_fp** — Fluorescence Intensity (FI) of the fluorescent probe (after Nile red addition, at *t$_{1}$*; following ~24–25 min of dye incubation). `"INVALID"` entries are treated as missing values.

The column names *Abs*, *FI_bg*, and *FI_fp* are not enforced. Any other column name can be used, as long as it is passed explicitly to the functions (*see examples below*) and is not `'Absorbance'` or `'Fluorescence'`, which are reserved for output columns.
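
As a rough pre-flight check before analysis, the rules above (required columns, reserved output names, `"INVALID"` treated as missing) could be verified with plain pandas. `check_long_table` is a hypothetical helper, not a `nileredquant` function, and the `fi_fp_col` argument is an assumption of this sketch.

```python
import pandas as pd

REQUIRED = ("Strain", "Condition")
RESERVED = ("Absorbance", "Fluorescence")  # reserved for output columns


def check_long_table(df, fi_fp_col="FI_fp"):
    """Hypothetical check of a long-format table; returns a cleaned copy."""
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    clashing = [c for c in RESERVED if c in df.columns]
    if clashing:
        raise ValueError(f"Reserved column names used in input: {clashing}")
    out = df.copy()
    # Numeric strain IDs become strings.
    out["Strain"] = out["Strain"].astype(str)
    # "INVALID" readings become missing values, then the column is made numeric.
    readings = out[fi_fp_col]
    out[fi_fp_col] = pd.to_numeric(readings.mask(readings == "INVALID"))
    return out


raw = pd.DataFrame(
    {
        "Well": ["A1", "A2"],
        "Strain": [1, "Blank"],
        "Condition": ["Condition_1", "Condition_1"],
        "FI_fp": [8765, "INVALID"],
    }
)
clean = check_long_table(raw)
```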

##### **Other columns (free-form)**
Any metadata variables you want to use throughout the analysis (e.g., `Strain_Genotype`, `Time_point`, `Format`, `Biomass_volume`, `Date`, `NR_concentration`, …) can be included. Metadata variables can be merged/mapped by `'Well'` position via the `utils.map_metadata()` function.

##### **Absorbance Blank handling**
Provide one of these:
1. a single numeric blank absorbance value, applied to all conditions present,
1. a list or tuple of blank absorbance values, one per condition (in the same order as the conditions appear in the data), or
1. one or more blank labels that occur in the **Strain** column (e.g., `'B'` or `['B', 'Blank']`).
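
The three options can be read as different ways of arriving at one blank absorbance value per condition. The sketch below makes that explicit with plain pandas. `resolve_blanks` is hypothetical, and two choices in it are assumptions rather than documented behaviour: conditions are ordered by first appearance in the data, and labelled blank wells are averaged per condition.

```python
import pandas as pd


def resolve_blanks(df, blank, abs_col="Abs"):
    """Hypothetical sketch: return {condition: blank absorbance}."""
    conditions = list(df["Condition"].unique())  # order of first appearance
    if isinstance(blank, str):
        labels = [blank]  # option 3 with a single label
    elif isinstance(blank, (int, float)):
        # Option 1: one value shared by all conditions.
        return {cond: float(blank) for cond in conditions}
    elif all(isinstance(b, (int, float)) for b in blank):
        # Option 2: one value per condition.
        if len(blank) != len(conditions):
            raise ValueError("Need exactly one blank value per condition")
        return dict(zip(conditions, (float(b) for b in blank)))
    else:
        labels = list(blank)  # option 3 with several labels
    # Option 3: average the wells whose Strain matches a blank label (assumption).
    is_blank = df["Strain"].astype(str).isin(labels)
    return df[is_blank].groupby("Condition")[abs_col].mean().to_dict()


table = pd.DataFrame(
    {
        "Strain": ["S1", "Blank", "S1", "B"],
        "Condition": ["C1", "C1", "C2", "C2"],
        "Abs": [0.50, 0.08, 0.60, 0.10],
    }
)
shared = resolve_blanks(table, 0.05)
per_condition = resolve_blanks(table, (0.08, 0.10))
from_labels = resolve_blanks(table, ["B", "Blank"])
```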


## Example data (long format)

| **Well** | **Strain** | **Condition** | **Abs** | **FI_bg** | **FI_fp** | … Metadata (e.g. Format) |
|---|---|---|---:|---:|---:|---|
| A1 | Strain_1 | Condition_1 | 0.21 | 1234 | 8765 | 96-well |
| A2 | Blank | Condition_1 | 0.019 | 18 | 22 | 96-well |
| ⋮  | ⋮        | ⋮           | ⋮    | ⋮     | ⋮     | ⋮        |
| H12| Strain_X | Condition_X | 0.23 | 1310 | 9010 | 96-well |

In the example above, rows labelled *Blank* in the `Strain` column can be used to subtract the absorbance of the cultivation medium (the blank).





**Data used**  
Example datasets used in this and other usage example notebooks are available at: **[`../data/`](../data/)**.

In [14]:
# Import the tool's utils module

import nileredquant.utils as utils
import pandas as pd

# Import Data

## Supported File Formats

As mentioned in the [Input Format](#Input-Format) section, comma-separated (.csv), tab-separated (.tsv), and Excel (.xlsx) files are accepted and imported via the `utils.read_file()` function. This basic reader detects the file format automatically and returns a data table. Note: the loader may treat the first column as an index by default (reset the index if needed).


The imported data can be in either plate or long format. If an Excel file is provided, the user can choose which sheet to import. If no sheet name is provided, the first sheet in the workbook is imported.
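
For readers curious how extension-based format detection might look, here is a minimal sketch built on the pandas readers. `read_table` is a hypothetical stand-in, not the code behind `utils.read_file()`; using the first column as the index is an assumption that mirrors the note above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_table(path, sheet_name=None):
    """Hypothetical reader that picks a pandas loader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path, index_col=0)
    if suffix == ".tsv":
        return pd.read_csv(path, sep="\t", index_col=0)
    if suffix == ".xlsx":
        # pandas itself returns a dict of all sheets for sheet_name=None,
        # so this sketch falls back to the first sheet explicitly.
        sheet = 0 if sheet_name is None else sheet_name
        return pd.read_excel(path, sheet_name=sheet, index_col=0)
    raise ValueError(f"Unsupported file type: {suffix!r}")


# Write a tiny made-up 2 x 2 plate to a temporary TSV file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    tsv_path = Path(tmp) / "plate.tsv"
    tsv_path.write_text("\t1\t2\nA\t0.5\t0.1\nB\t0.4\t0.2\n")
    plate = read_table(tsv_path)

# Row letters live in the index; reset_index() turns them into a normal column.
flat = plate.reset_index()
```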

In [15]:
# Import data in various file formats - CSV; 

data_Abs_csv = utils.read_file(filename="./Abs_plate_96well.csv")

data_Abs_csv

| col | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| A | 0.555612 | 0.0797 | 0.523681 | 0.523782 | 0.671698 | 0.662686 | 0.651732 | 0.666005 | 0.668971 | 0.644954 | 0.665087 | 0.62611 |
| B | 0.441097 | 0.0785 | 0.499192 | 0.58047 | 0.43643 | 0.592873 | 0.685463 | 0.621846 | 0.652417 | 0.580536 | 0.57179 | 0.699885 |
| C | 0.587683 | 0.0848 | 0.528565 | 0.5261 | 0.547319 | 0.6818 | 0.515669 | 0.551179 | 0.578894 | 0.655637 | 0.735597 | 0.663815 |
| D | 0.492979 | 0.078 | 0.561911 | 0.479603 | 0.585315 | 0.624316 | 0.624138 | 0.620299 | 0.569577 | 0.724116 | 0.615193 | 0.643204 |
| E | 0.31384 | 0.15 | 0.339402 | 0.340502 | 0.332135 | 0.410613 | 0.379552 | 0.420177 | 0.427571 | 0.420257 | 0.391279 | 0.364933 |
| F | 0.418274 | 0.085 | 0.445446 | 0.463635 | 0.376915 | 0.422224 | 0.343154 | 0.441401 | 0.33675 | 0.489805 | 0.432326 | 0.391557 |
| G | 0.420599 | 0.056 | 0.344865 | 0.313998 | 0.359658 | 0.434892 | 0.457851 | 0.420877 | 0.350427 | 0.4135 | 0.397566 | 0.47504 |
| H | 0.352938 | 0.3 | 0.35654 | 0.392011 | 0.418079 | 0.473767 | 0.405568 | 0.403269 | 0.453199 | 0.406459 | 0.426239 | 0.386404 |


In [16]:
# Import data in various file formats - TSV;

data_Abs_tsv = utils.read_file(filename="./Abs_plate_96well.tsv") 

data_Abs_tsv

| col | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| A | 0.555612 | 0.0797 | 0.523681 | 0.523782 | 0.671698 | 0.662686 | 0.651732 | 0.666005 | 0.668971 | 0.644954 | 0.665087 | 0.62611 |
| B | 0.441097 | 0.0785 | 0.499192 | 0.58047 | 0.43643 | 0.592873 | 0.685463 | 0.621846 | 0.652417 | 0.580536 | 0.57179 | 0.699885 |
| C | 0.587683 | 0.0848 | 0.528565 | 0.5261 | 0.547319 | 0.6818 | 0.515669 | 0.551179 | 0.578894 | 0.655637 | 0.735597 | 0.663815 |
| D | 0.492979 | 0.078 | 0.561911 | 0.479603 | 0.585315 | 0.624316 | 0.624138 | 0.620299 | 0.569577 | 0.724116 | 0.615193 | 0.643204 |
| E | 0.31384 | 0.15 | 0.339402 | 0.340502 | 0.332135 | 0.410613 | 0.379552 | 0.420177 | 0.427571 | 0.420257 | 0.391279 | 0.364933 |
| F | 0.418274 | 0.085 | 0.445446 | 0.463635 | 0.376915 | 0.422224 | 0.343154 | 0.441401 | 0.33675 | 0.489805 | 0.432326 | 0.391557 |
| G | 0.420599 | 0.056 | 0.344865 | 0.313998 | 0.359658 | 0.434892 | 0.457851 | 0.420877 | 0.350427 | 0.4135 | 0.397566 | 0.47504 |
| H | 0.352938 | 0.3 | 0.35654 | 0.392011 | 0.418079 | 0.473767 | 0.405568 | 0.403269 | 0.453199 | 0.406459 | 0.426239 | 0.386404 |


In [17]:
# Import data in various file formats - Excel, with undefined sheet name;

data_strain_ex = utils.read_file(
    filename="./data_example_plate.xlsx", 
    sheet_name=None  # <-- the first sheet is used if no sheet name is provided
) 

data_strain_ex

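|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| B | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| C | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| D | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| E | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| F | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| G | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| H | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |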


In [18]:
# Import data in various file formats - Excel, with sheet name `FI_bg`;

data_FIbg_ex = utils.read_file(
    filename="./data_example_plate.xlsx", 
    sheet_name="FI_bg"  # <-- defined sheet name for import.
) 
data_FIbg_ex

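|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| A | 36 | 21 | 36 | 33 | 46 | 37 | 41 | 38 | 36 | 34 | 41 | 33 |
| B | 38 | 27 | 35 | 34 | 45 | 37 | 40 | 39 | 36 | 32 | 43 | 37 |
| C | 37 | 30 | 37 | 34 | 46 | 37 | 40 | 37 | 34 | 33 | 44 | 35 |
| D | 38 | 26 | 37 | 33 | 46 | 33 | 41 | 41 | 37 | 33 | 42 | 33 |
| E | 35 | 30 | 37 | 32 | 41 | 27 | 32 | 33 | 32 | 30 | 29 | 27 |
| F | 33 | 29 | 37 | 48 | 43 | 28 | 33 | 34 | 31 | 29 | 31 | 28 |
| G | 35 | 27 | 36 | 35 | 39 | 29 | 30 | 35 | 30 | 28 | 30 | 27 |
| H | 32 | 31 | 35 | 35 | 38 | 27 | 30 | 38 | 31 | 29 | 28 | 26 |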


All of the examples above import data in a 96-well microtiter plate format. The import works for any shape.

If your plate reader outputs data in a microtiter plate format like the one above, an Excel file with each variable stored in a separate sheet is recommended. **The reader function does not recognise several plate arrays (one per variable) stored in a single sheet.**

Otherwise, the variables should either be combined into one long-format table, in any of the file formats listed above, or be stored in separate files.


# Format Conversions

## Plate-to-List


The function `utils.plate_to_list()` takes these inputs:

1. `filename`: either a path to a file (including the file name) or a pandas.DataFrame object,
1. `parameter`: the variable to be rearranged,
1. `save`: a Boolean or a path that controls whether the long-format table is saved to a newly created CSV file.
    - True: the file is saved to the current working directory with the default name ('./DataFrame_input_parameter_long.csv')
    - False: the file is not saved
    - path: the file is saved at the desired location with the following naming convention: '{path/file_name_provided}_parameter_long.csv'

It returns a pandas.DataFrame that can be used in later steps.
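
Conceptually, the conversion walks the plate row by row and labels every value with its row letter and column number. A minimal pandas sketch of that idea follows; `plate_to_long` is a hypothetical helper for illustration (not the library code), and the tiny 2 × 3 plate is made up.

```python
import pandas as pd

# Made-up 2 x 3 plate: row letters in the index, column numbers as columns.
plate = pd.DataFrame(
    [[0.55, 0.08, 0.52], [0.44, 0.07, 0.50]],
    index=["A", "B"],
    columns=[1, 2, 3],
)


def plate_to_long(plate, name):
    """Hypothetical reshaping of a plate array into one row per well."""
    # Build 'A1', 'A2', ... in row-major order; dicts keep insertion order.
    values = {
        f"{row}{col}": plate.loc[row, col]
        for row in plate.index
        for col in plate.columns
    }
    return pd.Series(values, name=name).rename_axis("Well").to_frame()


abs_long = plate_to_long(plate, "Abs")
```

The result has one column named after the variable and a `Well` index, which is the same shape as the output shown in the next cell.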

In [19]:
# reshape data from a plate array to a long format for each variable 


Strain_list_format = utils.plate_to_list(
    filename="./data_example_plate.xlsx", # <- either a path to file, or a pandas.DataFrame object can be used
    parameter="Strain",  # <- parameter = Excel sheet name = variable/column name
    save=False # <- if the transformed data should be saved in a new CSV file
)

# Show 
Strain_list_format

| Well | Strain |
|---|---|
| A1 | CS1 |
| A2 | Blank |
| A3 | CS2 |
| A4 | CS3 |
| A5 | CS4 |
| ... | ... |
| H8 | S3 |
| H9 | S4 |
| H10 | S5 |
| H11 | S6 |


In [20]:
# or do it in a loop & combine all plate arrays to one long data table 


sheets = []
for variable in ["Strain","Abs", "FI_bg", "FI_fp"]: # <- define all sheet names in the Excel file
    
    df = utils.plate_to_list(
        filename="./data_example_plate.xlsx", 
        parameter=variable,  
        save=False  
    )
    
    sheets.append(df)
    

# Combine all variables into one table
combined_data = pd.concat(sheets, axis=1)    
    
# Save the created data table to CSV
combined_data.to_csv("./Combined_data_long.csv")

# Or save it to Excel with the desired sheet name
combined_data.to_excel("./Combined_data_long.xlsx", sheet_name="Input_data")

combined_data

| Well | Strain | Abs | FI_bg | FI_fp |
|---|---|---:|---:|---:|
| A1 | CS1 | 0.555612 | 36 | 6008.844 |
| A2 | Blank | 0.079700 | 21 | 317.000 |
| A3 | CS2 | 0.523681 | 36 | 7592.142 |
| A4 | CS3 | 0.523782 | 33 | 9335.587 |
| A5 | CS4 | 0.671698 | 46 | 14251.460 |
| ... | ... | ... | ... | ... |
| H8 | S3 | 0.403269 | 38 | 17995.150 |
| H9 | S4 | 0.453199 | 31 | 23412.430 |
| H10 | S5 | 0.406459 | 29 | 18209.960 |
| H11 | S6 | 0.426239 | 28 | 20863.380 |
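
Combining per-variable tables with `pd.concat(..., axis=1)` aligns rows on the `Well` index rather than on row position, so the tables need the same well labels but not the same row order. A small made-up example (the values are copied from wells A1 and A2 above, the frames themselves are illustrative):

```python
import pandas as pd

strain = pd.DataFrame(
    {"Strain": ["CS1", "Blank"]},
    index=pd.Index(["A1", "A2"], name="Well"),
)
# Same wells, but listed in the opposite order.
absorbance = pd.DataFrame(
    {"Abs": [0.0797, 0.555612]},
    index=pd.Index(["A2", "A1"], name="Well"),
)

# Rows are matched by well label, not by position.
combined = pd.concat([strain, absorbance], axis=1)

# A well missing from one table would show up as NaN in that table's column.
has_gaps = bool(combined.isna().any().any())
```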


## List-to-Plate

Similar to the function above, `utils.list_to_plate()` takes as input:
1. `filename`: either a path to a file (including the file name) or a pandas.DataFrame object,
1. `parameter`: the variable to be rearranged,
1. `save`: a Boolean or a path that controls whether the plate-format table is saved to a newly created CSV file.
    - True: the file is saved to the current working directory with a default name ('List_to_plate_Strain.csv' in the example below)
    - False: the file is not saved
    - path: the file is saved at the given location ('./Custom_name_Strain_long.csv' in the example below)


It returns a pandas.DataFrame that can be used in later steps.
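
The reverse direction amounts to splitting each well label into its row letter and column number and then pivoting. The sketch below shows this with plain pandas; `long_to_plate` is a hypothetical helper (not the library code), it assumes single-letter rows (A to P), and the example table is made up.

```python
import pandas as pd

# Made-up long table with a 'Well' index.
long = pd.DataFrame(
    {"Strain": ["CS1", "Blank", "CS2", "CS1", "Blank", "CS2"]},
    index=pd.Index(["A1", "A2", "A10", "B1", "B2", "B10"], name="Well"),
)


def long_to_plate(long, name):
    """Hypothetical reshaping of one variable back into a plate array."""
    wells = long.index.to_series()
    table = pd.DataFrame(
        {
            "row": wells.str[0].to_numpy(),                  # 'A10' -> 'A'
            "col": wells.str[1:].astype(int).to_numpy(),     # 'A10' -> 10
            "value": long[name].to_numpy(),
        }
    )
    return table.pivot(index="row", columns="col", values="value")


strain_plate = long_to_plate(long, "Strain")
```

Casting the column part to an integer keeps column 10 after column 2; sorting the labels as text would place it before.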

In [23]:
combined_data

| Well | Strain | Abs | FI_bg | FI_fp |
|---|---|---:|---:|---:|
| A1 | CS1 | 0.555612 | 36 | 6008.844 |
| A2 | Blank | 0.079700 | 21 | 317.000 |
| A3 | CS2 | 0.523681 | 36 | 7592.142 |
| A4 | CS3 | 0.523782 | 33 | 9335.587 |
| A5 | CS4 | 0.671698 | 46 | 14251.460 |
| ... | ... | ... | ... | ... |
| H8 | S3 | 0.403269 | 38 | 17995.150 |
| H9 | S4 | 0.453199 | 31 | 23412.430 |
| H10 | S5 | 0.406459 | 29 | 18209.960 |
| H11 | S6 | 0.426239 | 28 | 20863.380 |


In [22]:
# Reshape the data from long to plate array

Strain_plate_format = utils.list_to_plate(
    filename=combined_data, # <- either a path to file, or a pandas.DataFrame object can be used
    parameter="Strain",  # <- which variable should be rearranged 
    save=True
)

# The newly created file was saved to the current working directory under the name:
# 'List_to_plate_Strain.csv'

Strain_plate_format

| col \ row | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| B | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| C | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| D | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| E | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| F | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| G | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| H | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |


In [9]:
# Reshape the data from long to plate array and save it under a custom file name

Strain_plate_format = utils.list_to_plate(
    filename=combined_data, # <- either a path to file, or a pandas.DataFrame object can be used
    parameter="Strain",  # <- which variable should be rearranged 
    save="./Custom_name_Strain_long.csv"
)

# The newly created file was saved to the current working directory under the name:
# 'Custom_name_Strain_long.csv'

Strain_plate_format

| col \ row | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| B | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| C | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| D | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| E | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| F | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| G | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
| H | CS1 | Blank | CS2 | CS3 | CS4 | S1 | S2 | S3 | S4 | S5 | S6 | S7 |


# Adding Metadata


To combine a data table with its corresponding metadata, use the `utils.map_metadata()` function. It maps the metadata by microtiter plate well position, so it requires a `Well` column and data in long format.

There are two options for mapping the metadata:

1. Import the metadata using `utils.read_file()` and then pass the resulting pandas.DataFrame to `utils.map_metadata()` along with the `data` (also a pandas.DataFrame). Comma-separated (.csv), tab-separated (.tsv), and Excel (.xlsx) file formats are accepted.

1. Pass the path to the metadata file directly, together with the `data` (pandas.DataFrame).

Both usage options are shown below. For preparing metadata in long format, we can use the `utils.plate_to_list()` function as shown previously. 
___

Saving the combined data works the same way as in the previous functions. A Boolean or path value is required:
- True: file is saved to current working directory with default name ('./DataFrame_input_parameter_w_metadata.csv')
- False: file is not saved
- path: file is saved at a desired location and with the following naming convention: '{path/file_name_provided}_w_metadata.csv'
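
Because the mapping is keyed on the well position, each well is expected to appear once in the result, so the merged table should have as many rows as there are wells. The plain-pandas sketch below illustrates such a `Well`-keyed merge with a built-in guard; it is not the implementation of `utils.map_metadata()`, and the two small tables are made up from values shown elsewhere in this notebook.

```python
import pandas as pd

data = pd.DataFrame(
    {"Abs": [0.555612, 0.0797], "FI_fp": [6008.844, 317.0]},
    index=pd.Index(["A1", "A2"], name="Well"),
)
metadata = pd.DataFrame(
    {"Strain": ["CS1", "Blank"], "Condition": ["Condition1", "Condition1"]},
    index=pd.Index(["A1", "A2"], name="Well"),
)

merged = metadata.merge(
    data,
    left_index=True,
    right_index=True,
    how="inner",
    validate="one_to_one",  # raises MergeError if a well label is duplicated
).reset_index()

# Sanity check: one row per well, no wells gained or lost.
rows_match = len(merged) == len(data)
```

A row count that is much larger than the number of wells usually means the tables were matched on a non-unique key.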

In [10]:
# Import the metadata - using the utils.read_file() function from above

metadata = utils.read_file(filename="./data_example_plate.xlsx", sheet_name="Metadata")
metadata

| Well | Strain | Replicate | Condition | Time_point | Format | Biomass_volume | Collection | Organism |
|---|---|---|---|---|---|---:|---|---|
| A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae |
| A2 | Blank | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | / |
| A3 | CS2 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae |
| A4 | CS3 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae |
| A5 | CS4 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| H8 | S3 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae |
| H9 | S4 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae |
| H10 | S5 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae |
| H11 | S6 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae |


In [11]:
# Map metadata to data providing pandas.DataFrames

combined_w_meta = utils.map_metadata(filename=metadata, data=combined_data, save=False)
combined_w_meta

|   | Well | Strain | Replicate | Condition | Time_point | Format | Biomass_volume | Collection | Organism | Abs | FI_bg | FI_fp |
|---|---|---|---|---|---|---|---:|---|---|---:|---:|---:|
| 0 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.555612 | 36 | 6008.844 |
| 1 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.441097 | 38 | 3908.486 |
| 2 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.587683 | 37 | 5889.523 |
| 3 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.492979 | 38 | 5654.330 |
| 4 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.313840 | 35 | 9627.569 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 763 | H12 | S7 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.643204 | 33 | 11916.210 |
| 764 | H12 | S7 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.364933 | 27 | 20547.900 |
| 765 | H12 | S7 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.391557 | 28 | 20465.390 |
| 766 | H12 | S7 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.475040 | 27 | 25411.510 |


In [12]:
# Map metadata to data providing a path to file & sheet name. 

meta_data = utils.map_metadata(
    filename="./data_example_plate.xlsx", 
    data=combined_data,
    sheet_name="Metadata", 
    save=True
)
meta_data

|   | Well | Strain | Replicate | Condition | Time_point | Format | Biomass_volume | Collection | Organism | Abs | FI_bg | FI_fp |
|---|---|---|---|---|---|---|---:|---|---|---:|---:|---:|
| 0 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.555612 | 36 | 6008.844 |
| 1 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.441097 | 38 | 3908.486 |
| 2 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.587683 | 37 | 5889.523 |
| 3 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.492979 | 38 | 5654.330 |
| 4 | A1 | CS1 | R1 | Condition1 | 24h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.313840 | 35 | 9627.569 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 763 | H12 | S7 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.643204 | 33 | 11916.210 |
| 764 | H12 | S7 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.364933 | 27 | 20547.900 |
| 765 | H12 | S7 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.391557 | 28 | 20465.390 |
| 766 | H12 | S7 | R4 | Condition2 | 72h | 96-well | 100 | synthetic_data | S. cerevisiae | 0.475040 | 27 | 25411.510 |


In [13]:
# If only certain metadata columns are to be considered for mapping

meta_data_red = utils.map_metadata(metadata.loc[:,["Strain", "Condition"]], combined_data, save=False)
meta_data_red

|   | Well | Strain | Condition | Abs | FI_bg | FI_fp |
|---|---|---|---|---:|---:|---:|
| 0 | A1 | CS1 | Condition1 | 0.555612 | 36 | 6008.844 |
| 1 | A1 | CS1 | Condition1 | 0.441097 | 38 | 3908.486 |
| 2 | A1 | CS1 | Condition1 | 0.587683 | 37 | 5889.523 |
| 3 | A1 | CS1 | Condition1 | 0.492979 | 38 | 5654.330 |
| 4 | A1 | CS1 | Condition1 | 0.313840 | 35 | 9627.569 |
| ... | ... | ... | ... | ... | ... | ... |
| 763 | H12 | S7 | Condition2 | 0.643204 | 33 | 11916.210 |
| 764 | H12 | S7 | Condition2 | 0.364933 | 27 | 20547.900 |
| 765 | H12 | S7 | Condition2 | 0.391557 | 28 | 20465.390 |
| 766 | H12 | S7 | Condition2 | 0.475040 | 27 | 25411.510 |
