# CBT Short Course 2024: Jupyter notebook tutorial for the data-processing pipeline

This Python Jupyter notebook demonstrates how to use our automated Python package to process raw fed-batch culture data. The package calculates the cumulative consumption/production and the specific rates of the key chemical species monitored in a fed-batch culture. After processing, the package can aggregate the processed datasets from different experiments and cell lines and export them to an Excel file. A built-in interactive plotting function lets you compare process attributes across experiments and cell lines.
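"Cumulative consumption" can be made concrete with a minimal sketch of the mass balance such a calculation typically rests on. This is not CCDPApy's implementation. The function `cumulative_consumption`, its arguments, and the simplifications are assumptions for illustration. The simplifications are: perfect mixing, feed added right after each sample, and volume removed by sampling ignored. Between two samples, the amount consumed is the amount present just after the previous feed minus the amount measured at the next sample.

```python
import numpy as np


def cumulative_consumption(conc, volume, feed_volume, feed_conc):
    """Cumulative consumption of one species in a fed-batch culture (hypothetical sketch).

    conc        : concentration measured at each sample, before feeding
    volume      : culture volume at each sample, before feeding
    feed_volume : feed volume added right after each sample
    feed_conc   : concentration of the species in the feed medium
    Assumes perfect mixing and ignores the volume removed by sampling.
    """
    conc = np.asarray(conc, dtype=float)
    volume = np.asarray(volume, dtype=float)
    feed_volume = np.asarray(feed_volume, dtype=float)

    amount_at_sample = conc * volume                                 # amount in the reactor when sampled
    amount_after_feed = amount_at_sample + feed_conc * feed_volume   # amount right after feeding

    # consumption between samples i-1 and i: amount after the previous feed minus amount at sample i
    consumed = amount_after_feed[:-1] - amount_at_sample[1:]
    return np.concatenate(([0.0], np.cumsum(consumed)))


# toy example in arbitrary but consistent units: three samples, feed added after the first two
cum = cumulative_consumption(conc=[10.0, 6.0, 8.0], volume=[1.0, 1.1, 1.2],
                             feed_volume=[0.1, 0.1, 0.0], feed_conc=50.0)
print(cum)  # approximately [0.0, 8.4, 10.4]
```

For a product such as IgG, which is absent from the feed (`feed_conc=0`), the same balance gives the cumulative production as the negative of this result.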

This tutorial notebook is split into four sections: 1. package setup, 2. processing a dataset from one cell line with multiple experiments, 3. processing datasets from multiple cell lines, and 4. the built-in interactive plot.

# 1. Package setup

In [1]:
from CCDPApy.helper import input_path
from CCDPApy import FedBatchParameters, FedBatchCellLine, FedBatchExpriment, FedBatchCellCulture

import pandas as pd
pd.set_option('display.max_columns', 200)

# 2. Processing a dataset from one cell line with multiple experiments

## 2.1 Define the data-processing settings

In [2]:
from CCDPApy import FedBatchParameters

# Define the data-processing settings
param_1 = FedBatchParameters(cell_line_name='CL1', # cell line name
                             use_concentration_after_feed=False, # whether concentrations were measured after feeding
                             use_feed_concentration=True, # whether to calculate post-feed concentrations from the feed information
                             regression_method=['polynomial', 'rolling_window_polynomial'], # regression methods for the specific rates (in addition to the two-point calculation)
                             rolling_polynomial_degree=3, # polynomial degree of the rolling regression
                             rolling_polynomial_window=6) # window size of the rolling regression

"""
TODO:
1. Let the rolling regression use the same polynomial degree as the polynomial regression, as read from the input data files.
"""

# print the data-processing settings
param_1

Cell Line: CL1
Feed concentration will be used: True
Concentration after feeding will be used: False
Regression Methods
     Polynomial: True
     Rolling window polynomial True
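The printout confirms that both regression options are enabled. The sketch below gives a rough picture of a rolling-window polynomial regression with `rolling_polynomial_degree=3` and `rolling_polynomial_window=6`. It fits a cubic to each window of six consecutive points of a cumulative curve, differentiates the fit at the time point of interest, and divides by the viable cell amount to get a specific rate. Several details are assumptions, not necessarily CCDPApy's choices: the function `rolling_polynomial_rates`, the roughly centred window placement, and the use of viable cell amount (cell density times volume) as the normaliser.

```python
import numpy as np


def rolling_polynomial_rates(t, cumulative, viable_cells, degree=3, window=6):
    """Specific rates from a rolling-window polynomial fit (hypothetical sketch).

    t            : sampling times
    cumulative   : cumulative consumption/production (an amount) at each time
    viable_cells : viable cell amount (density x volume) at each time
    """
    t = np.asarray(t, dtype=float)
    cumulative = np.asarray(cumulative, dtype=float)
    viable_cells = np.asarray(viable_cells, dtype=float)
    n = len(t)
    if not degree < window <= n:
        raise ValueError("need degree < window <= number of samples")

    rates = np.empty(n)
    for i in range(n):
        # window of `window` consecutive samples roughly centred on i,
        # shifted inward near the start and end of the culture
        start = min(max(i - window // 2, 0), n - window)
        idx = slice(start, start + window)
        coeffs = np.polyfit(t[idx], cumulative[idx], degree)
        slope = np.polyval(np.polyder(coeffs), t[i])  # d(cumulative)/dt at t_i
        rates[i] = slope / viable_cells[i]
    return rates


# toy check: for cumulative = 2 t^2 + t the exact derivative is 4 t + 1
t = np.arange(10.0)
rates = rolling_polynomial_rates(t, 2 * t**2 + t, np.ones_like(t))
print(np.round(rates, 6))  # close to 4 t + 1
```

With `window` equal to the number of samples, every point uses one global fit. That is one way to picture the plain `'polynomial'` option.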

## 2.2 Data processing

In [3]:
# create a Python object for data-processing
cell_line_1 = FedBatchCellCulture()

# path to the input data file
path = input_path('fed_batch_data.xlsx')

# load the dataset into the data-processing object
cell_line_1.load_data(file=path)

# perform data processing with the defined settings
cell_line_1.perform_data_process(parameters=[param_1])
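The comment on `regression_method` says two-point specific rates are calculated in addition to the selected regressions. One common form of the two-point calculation divides the change in the cumulative quantity between consecutive samples by the integral of viable cells over that interval. The sketch below approximates that integral with the trapezoidal rule. The function name and the trapezoidal choice are assumptions for illustration; the package may average the cell amount differently, for example with a logarithmic mean.

```python
import numpy as np


def two_point_rates(t, cumulative, viable_cells):
    """Two-point specific rates between consecutive samples (hypothetical sketch).

    Returns one rate per interval, i.e. len(t) - 1 values.
    """
    t = np.asarray(t, dtype=float)
    cumulative = np.asarray(cumulative, dtype=float)
    viable_cells = np.asarray(viable_cells, dtype=float)

    delta = np.diff(cumulative)
    # integral of viable cells over each interval, trapezoidal rule
    ivc = 0.5 * (viable_cells[1:] + viable_cells[:-1]) * np.diff(t)
    return delta / ivc


q = two_point_rates(t=[0.0, 1.0, 2.0], cumulative=[0.0, 2.0, 6.0], viable_cells=[1.0, 1.0, 3.0])
print(q)  # [2. 2.]
```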

## 2.3 Export to an Excel file

In [7]:
cell_line_1.save_excel(file_name='output_CL1.xlsx')


output_CL1.xlsx  saved.


# 3. Processing datasets from multiple cell lines

## 3.1 Option 1: data processing at once

In [5]:
# Define the data-processing settings for cell line 1 (CL1)
param_1 = FedBatchParameters(cell_line_name='CL1', # cell line name
                             use_concentration_after_feed=False, # whether concentrations were measured after feeding
                             use_feed_concentration=True, # whether to calculate post-feed concentrations from the feed information
                             regression_method=['polynomial', 'rolling_window_polynomial'], # regression methods for the specific rates (in addition to the two-point calculation)
                             rolling_polynomial_degree=3, # polynomial degree of the rolling regression
                             rolling_polynomial_window=6) # window size of the rolling regression

# Define the data-processing settings for cell line 2 (CL2)
param_2 = FedBatchParameters(cell_line_name='CL2', # cell line name
                             use_concentration_after_feed=True, # whether concentrations were measured after feeding
                             use_feed_concentration=False, # whether to calculate post-feed concentrations from the feed information
                             regression_method=['polynomial', 'rolling_window_polynomial'], # regression methods for the specific rates (in addition to the two-point calculation)
                             rolling_polynomial_degree=3, # polynomial degree of the rolling regression
                             rolling_polynomial_window=6) # window size of the rolling regression

### Repeat section 2.2
# create a Python object for data processing of CL1 and CL2
cell_line_1_2 = FedBatchCellCulture()

# path to the input data file
path = input_path('fed_batch_data.xlsx')

# load the dataset into the data-processing object
cell_line_1_2.load_data(file=path)

# perform data processing; each cell line/process can have its own settings object
cell_line_1_2.perform_data_process(parameters=[param_1, param_2])

# export the processed datasets
cell_line_1_2.save_excel(file_name='output_CL1_2_option1.xlsx')

output_CL1_2_option1.xlsx  saved.
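The two cell lines above handle post-feed concentrations differently. CL1 sets `use_feed_concentration=True`, so its post-feed concentrations are calculated from the feed information. CL2 sets `use_concentration_after_feed=True` because its concentrations were measured after feeding. When the values have to be calculated, a volume-weighted mass balance like the one sketched here is the usual basis. The function name is hypothetical, and the sketch assumes perfect mixing with a known feed volume and feed concentration.

```python
import numpy as np


def concentration_after_feed(conc_before, volume_before, feed_volume, feed_conc):
    """Post-feed concentration from a volume-weighted mass balance (hypothetical sketch).

    Assumes the feed mixes instantly and completely with the culture.
    """
    conc_before = np.asarray(conc_before, dtype=float)
    volume_before = np.asarray(volume_before, dtype=float)
    feed_volume = np.asarray(feed_volume, dtype=float)

    total_amount = conc_before * volume_before + feed_conc * feed_volume
    return total_amount / (volume_before + feed_volume)


c = concentration_after_feed(conc_before=[10.0, 6.0], volume_before=[1.0, 1.1],
                             feed_volume=[0.1, 0.0], feed_conc=50.0)
print(c)  # first value is 15 / 1.1; second is unchanged because no feed was added
```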


## 3.2 Option 2: append new datasets to the existing processed datasets
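Conceptually, option 2 stacks the newly processed CL2 data onto the CL1 results that were already exported. The pandas sketch below shows that idea on plain tables; it is not how `import_data` works internally. The function `append_processed` and the column names `cell_line`, `experiment`, `time`, and `glucose` are hypothetical, and the values are toy numbers. If the same cell line, experiment, and time point appear in both tables, the sketch keeps the newer row.

```python
import pandas as pd


def append_processed(existing: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Append newly processed rows to an existing processed table (hypothetical sketch)."""
    combined = pd.concat([existing, new], ignore_index=True)
    # if a (cell line, experiment, time) row appears twice, keep the newer one
    combined = combined.drop_duplicates(subset=["cell_line", "experiment", "time"], keep="last")
    return combined.reset_index(drop=True)


# toy tables with hypothetical column names
cl1 = pd.DataFrame({"cell_line": ["CL1", "CL1"], "experiment": ["E1", "E1"],
                    "time": [0.0, 1.0], "glucose": [5.0, 4.0]})
cl2 = pd.DataFrame({"cell_line": ["CL2"], "experiment": ["E1"],
                    "time": [0.0], "glucose": [6.0]})
combined = append_processed(cl1, cl2)
print(combined)  # 3 rows: two CL1, one CL2
```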

In [6]:

# Define the data-processing settings for cell line 2 (CL2)
param_2 = FedBatchParameters(cell_line_name='CL2', # cell line name
                             use_concentration_after_feed=True, # whether concentrations were measured after feeding
                             use_feed_concentration=False, # whether to calculate post-feed concentrations from the feed information
                             regression_method=['polynomial', 'rolling_window_polynomial'], # regression methods for the specific rates (in addition to the two-point calculation)
                             rolling_polynomial_degree=3, # polynomial degree of the rolling regression
                             rolling_polynomial_window=6) # window size of the rolling regression

### Repeat section 2.2
# create a Python object for data processing of CL1 and CL2
cell_line_1_2 = FedBatchCellCulture()

# path to the input data file
path = input_path('fed_batch_data.xlsx')

# load the dataset into the data-processing object
cell_line_1_2.load_data(file=path)

# import the existing processed dataset for CL1
cell_line_1_2.import_data(file_name='output_CL1.xlsx')

# perform data processing for CL2 only; the CL1 results were imported above
cell_line_1_2.perform_data_process(parameters=[param_2])

# export the combined processed datasets
cell_line_1_2.save_excel(file_name='output_CL1_2_option2.xlsx')

5


KeyError: 'Igg'

# 4. Interactive plot

In [None]:
# call out the interactive plot
cell_line_1_2.interactive_plot(port=8081)


JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.

