# Script Description

* This script preprocesses data for a comparative analysis of baseline fitting algorithms.
* `2.dsp-executer.ipynb` processes the following two kinds of data:
    * CFX data (RFU baseline-subtracted by CFX Manager) `parquets` preprocessed by `1.cfx-to-parquet-converter.ipynb`, for which it outputs the `DSP2.1` computation results.
    * Raw sample data `parquets` preprocessed by `1.cfx-to-parquet-converter.ipynb`, for which it outputs the `DSP2.1` computation results.

# Preprocessing for DSP2.1 Computation

## Path Setup

In [12]:
# Config
import pydsptools.config.pda as pda
import pydsptools.biorad as biorad

# Analysis Preparation
import polars as pl
import pandas as pd
import numpy as np
from scipy.stats import chi2 # https://en.cppreference.com/w/cpp/numeric/random/chi_squared_distribution

# DSP Processing
import pydsp.run.worker
import pydsptools.biorad.parse as bioradparse

# PreProcessing
import pprint
import pyarrow as pa
import os
import subprocess
import pathlib as Path


# Visualization
import pydsptools.plot as dspplt
import plotly.express as px
import matplotlib.pyplot as plt

In [13]:
root_path = Path.Path.cwd() # local: /home/kmkim/pda/dsp-research-strep-a/kkm, PDA-pro: /home/jupyter-kmkim/dsp-research-strep-a/kkm
prefix = 'data'
directory_names = ['cfx-baseline-subtracted','pda-raw-sample']
product_names = ['GI-B-I', 'GI-B-II', 'GI-P', 'GI-V', 'RP1', 'RP2', 'RP3', 'RP4', 'STI-CA', 'STI-EA', 'STI-GU']
consumables = ['8-strip','96-cap']
plate_numbers = ['plate_data_' + number for number in ['002','005','031','032','036','041']]
# !python -m pydsptools.biorad.parse -t cfx-batch-csv -f parquet -o './data/baseline-subtracted/processed/example1' './data/baseline-subtracted/cfx-data/'
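
The lists above suggest a directory layout of the form `data/<directory>/<product>/<product>_<consumable>`, which is the pattern used by the `pda-raw-sample` input path later in this notebook. As a minimal sketch, assuming that layout holds for every product and consumable, the candidate directories could be enumerated as follows. `candidate_dirs` is a hypothetical helper and not part of `pydsptools`.

```python
from itertools import product
from pathlib import Path


def candidate_dirs(root, prefix, directory, product_names, consumables):
    """Build <root>/<prefix>/<directory>/<product>/<product>_<consumable> paths.

    The layout is an assumption inferred from a single path in this notebook.
    """
    base = Path(root) / prefix / directory
    return [
        base / name / f"{name}_{consumable}"
        for name, consumable in product(product_names, consumables)
    ]


dirs = candidate_dirs(".", "data", "pda-raw-sample", ["GI-B-I", "RP1"], ["8-strip", "96-cap"])
for d in dirs:
    print(d.as_posix())
```

Filtering the result with `Path.is_dir()` would show which product and consumable combinations actually exist on disk.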

## Adding Columns to the Parquet Files

Add the `pcr_system`, `consumable`, `temperature`, `experiment_name`, and `data_type` columns to each processed Parquet file.

In [14]:
TESTNAME = "example1"
PCR_SYSTEM = "CFX96"
CONSUMABLE = "8-strip"
EXPERIMENT_NAME = "GI-B-I"
TYPE = "cfx-manager"
PROTOCOL = {4: "Low", 5: "High"}

# Add the metadata columns to every processed Parquet file and save it under pcr_results
paths = Path.Path(f"./data/cfx-baseline-subtracted/processed/{TESTNAME}").glob("*.parquet")
for path in paths:
    name = path.stem  # file name without the ".parquet" suffix
    df = pd.read_parquet(path)
    df["pcr_system"] = PCR_SYSTEM
    df["consumable"] = CONSUMABLE
    df["temperature"] = df.apply(lambda x: PROTOCOL[x["step"]], axis=1)  # step 4 -> "Low", step 5 -> "High"
    df["experiment_name"] = EXPERIMENT_NAME
    df["data_type"] = TYPE
    outdir = Path.Path(f"./data/baseline-subtracted/computed/{TESTNAME}/pcr_results")
    outdir.mkdir(parents=True, exist_ok=True)  # create the output directory if it is missing
    df.to_parquet(outdir / f"{name}.parquet")


In [15]:
TESTNAME = "example1"
PCR_SYSTEM = "CFX96"
CONSUMABLE = "8-strip"
EXPERIMENT_NAME = "GI-B-I"
TYPE = "pda-raw-sample"
PROTOCOL = {4: "Low", 5: "High"}

# Add the metadata columns to every processed Parquet file and save it under pcr_results
paths = Path.Path(f"./data/pda-raw-sample/GI-B-I/GI-B-I_8-strip/processed/{TESTNAME}").glob("*.parquet")
for path in paths:
    name = path.stem  # file name without the ".parquet" suffix
    df = pd.read_parquet(path)
    df["pcr_system"] = PCR_SYSTEM
    df["consumable"] = CONSUMABLE
    df["temperature"] = df.apply(lambda x: PROTOCOL[x["step"]], axis=1)  # step 4 -> "Low", step 5 -> "High"
    df["experiment_name"] = EXPERIMENT_NAME
    df["data_type"] = TYPE
    outdir = Path.Path(f"./data/pda-raw-sample/computed/{TESTNAME}/pcr_results")
    outdir.mkdir(parents=True, exist_ok=True)  # create the output directory if it is missing
    df.to_parquet(outdir / f"{name}.parquet")
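
Both cells derive `temperature` from `step` through the `PROTOCOL` dictionary, so a `step` value other than 4 or 5 raises a bare `KeyError` from inside `df.apply`. A minimal sketch of a lookup with a more descriptive failure is shown below. `temperature_for_step` is a hypothetical helper; it could be passed to `df["step"].map(...)` in place of the lambda.

```python
PROTOCOL = {4: "Low", 5: "High"}


def temperature_for_step(step, protocol=PROTOCOL):
    """Return the temperature label for a protocol step, or fail with context."""
    try:
        return protocol[step]
    except KeyError:
        known = sorted(protocol)
        raise ValueError(
            f"step {step!r} has no temperature label; known steps: {known}"
        ) from None


labels = [temperature_for_step(s) for s in (4, 5, 4)]
print(labels)
```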

## Creating Directories for the DSP Results
The `config__dsp2_orig/basesub` directory must exist in order to obtain the autobaseline results. DSP2 appears to create it automatically, whereas DSP1 seems to require creating the directories manually.
(Manager 김형규 plans to improve this.)
* `{parent path}/computed/{TESTNAME}/config__dsp1_orig/dsp`
* `{parent path}/computed/{TESTNAME}/config__dsp2_orig/basesub`
* `{parent path}/computed/{TESTNAME}/config__dsp2_orig/dsp`
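
The directory layout listed above can be created in one call. The following is a minimal sketch, assuming the `{parent path}/computed/{TESTNAME}/{config directory}/{subdirectory}` pattern from the list; `ensure_dsp_output_dirs` is a hypothetical helper, not a `pydsp` function.

```python
import tempfile
from pathlib import Path


def ensure_dsp_output_dirs(parent, testname, config_dir="config__dsp2_orig",
                           subdirs=("dsp", "basesub", "log")):
    """Create <parent>/computed/<testname>/<config_dir>/<subdir> for each subdir."""
    base = Path(parent) / "computed" / testname / config_dir
    created = []
    for subdir in subdirs:
        target = base / subdir
        target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(target)
    return created


# Demonstrated in a temporary directory so nothing is written into ./data
with tempfile.TemporaryDirectory() as tmp:
    made = ensure_dsp_output_dirs(tmp, "example1")
    print([d.name for d in made])
```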


In [16]:
for subdir in ["dsp", "basesub", "log"]:
    Path.Path(f"./data/baseline-subtracted/computed/example1/config__dsp2_orig/{subdir}").mkdir(parents=True, exist_ok=True)
for subdir in ["dsp", "basesub"]:
    Path.Path(f"./data/pda-raw-sample/computed/example1/config__dsp2_orig/{subdir}").mkdir(parents=True, exist_ok=True)

# Running DSP

## Running DSP with Docker

```
docker run -it --rm -v $(pwd):/code seegene/pydsp:2.1.0-alpha.1 python -m pydsp.run.worker multiple \
    -i /code/computed/example1/pcr_results \
    -c /code/config/yaml/PRJDS001/RP1/dsp1_orig.yml \
    -o /code/computed/example1/config__dsp1_orig
```

## Running DSP with a Python Script

```
!python -m pydsp.run.worker multiple \
    -i ./data/cfx-baseline-subtracted/computed/example1/pcr_results/ \
    -c config/yaml/PRJDS001/GI-B-I/dsp2_generic.yml \
    -o ./data/cfx-baseline-subtracted/computed/example1/config__dsp2_orig
```

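`pydsp.run.worker` is the module that runs DSP. As used here, its `multiple` subcommand takes three options:
* `-i`: path containing the input data
* `-c`: path to the DSP configuration file
* `-o`: path where the DSP results are written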
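
The three options can also be assembled programmatically. The sketch below only builds the argument list shown in the command above and does not run it; `build_worker_command` is a hypothetical helper, and no flags other than `-i`, `-c`, and `-o` are assumed.

```python
import sys


def build_worker_command(input_dir, config_path, output_dir, python=sys.executable):
    """Return the argv list for `python -m pydsp.run.worker multiple`."""
    return [
        python, "-m", "pydsp.run.worker", "multiple",
        "-i", str(input_dir),
        "-c", str(config_path),
        "-o", str(output_dir),
    ]


cmd = build_worker_command(
    "./data/cfx-baseline-subtracted/computed/example1/pcr_results/",
    "config/yaml/PRJDS001/GI-B-I/dsp2_generic.yml",
    "./data/cfx-baseline-subtracted/computed/example1/config__dsp2_orig",
)
print(" ".join(cmd))
# To execute it: subprocess.run(cmd, check=True), which requires pydsp to be installed
```

Using `sys.executable` keeps the command on the same interpreter as the notebook kernel.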

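The DSP configuration is kept in separate yaml files per product and per DSP version. Note that DSP1 does not output autobaseline results (nobody pointed this out, which cost several rounds of trial and error).
Autobaseline results are only produced when running DSP2, and in that case the output path must already contain the directories `config__dsp2_orig/basesub`, `config__dsp2_orig/dsp`, and `config__dsp2_orig/log`.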
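
Because a missing directory only shows up as absent results, it may help to check the output path before starting a run. A minimal sketch, assuming the three subdirectory names given above; `missing_output_dirs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

REQUIRED_SUBDIRS = ("basesub", "dsp", "log")


def missing_output_dirs(output_dir, required=REQUIRED_SUBDIRS):
    """Return the names of required subdirectories that do not exist yet."""
    root = Path(output_dir)
    return [name for name in required if not (root / name).is_dir()]


# Demonstrated on a temporary directory that contains only `basesub`
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "basesub").mkdir()
    print(missing_output_dirs(tmp))
```

An empty list means the output path is ready for a DSP2 run.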

### Running DSP for CFX Manager Data

In [6]:
# Execution in the local Environment
!python -m pydsp.run.worker multiple \
    -i ./data/cfx-baseline-subtracted/computed/example1/pcr_results/ \
    -c config/yaml/PRJDS001/GI-B-I/dsp2_generic_research.yml \
    -o ./data/cfx-baseline-subtracted/computed/example1/config__dsp2_orig

/opt/tljh/user/bin/python: Error while finding module specification for 'pydsp.run.worker' (ModuleNotFoundError: No module named 'pydsp')
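
The error above means the interpreter that `!python` resolved to cannot find `pydsp`. One possible cause, which the output alone does not confirm, is that `!python` picks up a different interpreter than the notebook kernel. The sketch below checks the kernel's own interpreter; `has_module` is a hypothetical helper built on the standard `importlib.util.find_spec`.

```python
import importlib.util
import sys


def has_module(name):
    """Return True if the top-level package `name` is importable in this interpreter."""
    return importlib.util.find_spec(name) is not None


print("kernel interpreter:", sys.executable)
if has_module("pydsp"):
    print("pydsp is importable: pydsp.run.worker can be called from this kernel")
else:
    print("pydsp is not importable: run on PDA-pro or through the Docker image")
```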


In [None]:
# Execution on PDA-pro
pydsp.run.worker.multiple_tasks(
    "./data/cfx-baseline-subtracted/computed/example1/pcr_results/",       # Input directory
    "config/yaml/PRJDS001/GI-B-I/dsp2_generic_research.yml",               # Configuration
    "./data/cfx-baseline-subtracted/computed/example1/config__dsp2_orig",  # Output directory
    4,                                                                     # Number of processes
    is_verbose=True                                                        # Verbose mode
)

### Running DSP for PDA-Raw-Sample Data

In [48]:
# Execution in the local Environment
!python -m pydsp.run.worker multiple \
    -i ./data/pda-raw-sample/computed/example1/pcr_results \
    -c config/yaml/PRJDS001/GI-B-I/dsp2_generic_research.yml \
    -o ./data/pda-raw-sample/computed/example1/config__dsp2_orig

/opt/tljh/user/bin/python: Error while finding module specification for 'pydsp.run.worker' (ModuleNotFoundError: No module named 'pydsp')


In [17]:
# Execution on PDA-pro
pydsp.run.worker.multiple_tasks(
    "./data/pda-raw-sample/computed/example1/pcr_results",        # Input directory
    "config/yaml/PRJDS001/GI-B-I/dsp2_generic_research.yml",      # Configuration
    "./data/pda-raw-sample/computed/example1/config__dsp2_orig",  # Output directory
    4,                                                            # Number of processes
    is_verbose=True                                               # Verbose mode
)

2024-02-29 04:41:58,740 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-29 04:41:58,740 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-29 04:41:58,740 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-29 04:41:58,754 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-39_BR100172_Allplex GI_B1_TCF_specificity-3.parquet
2024-02-29 04:41:58,754 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-r

0
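
The log above repeats each `Check entry` line several times. As a minimal sketch, assuming the `timestamp - pid - level - message` layout seen in that output, the distinct entries could be extracted from captured log text as follows. `unique_entries` is a hypothetical helper, and the sample paths are illustrative rather than real files.

```python
def unique_entries(log_text, marker="Check entry "):
    """Return distinct paths from 'Check entry' log lines, in first-seen order."""
    seen = {}
    for line in log_text.splitlines():
        parts = line.split(" - ", 3)  # timestamp, pid, level, message
        if len(parts) == 4 and parts[3].startswith(marker):
            seen.setdefault(parts[3][len(marker):].strip(), None)
    return list(seen)


sample = "\n".join([
    "2024-02-29 04:41:58,740 - 65109 - INFO - Check entry /tmp/a.parquet",
    "2024-02-29 04:41:58,740 - 65109 - INFO - Check entry /tmp/a.parquet",
    "2024-02-29 04:41:58,754 - 65109 - INFO - Check entry /tmp/b.parquet",
])
print(unique_entries(sample))
```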

### Running DSP (Strep Assay algorithm = Multi-Amp, the multiple-amplification algorithm)

1) Generate the configuration yml file with Multiamp applied (`is_multiamp = 1`). This only needs to be done once.

In [9]:
DSP_VERSION = "2.1.1"
DSP_PRODUCT = "StrepAssay"

CONFIG_NAME = "240213_multiamp_research"
COMMENTS = "replace is_multiamp value as 1"

XLSX_PATH_8_STRIP = "./config-xlsx/dspconfig_template_240213_multiamp.xlsx"
# XLSX_PATH_96_CAP = PATH
#(
#    pda.ConfigParser("multiamp-2402", "xlsx")
#        .set_dsp_version(DSP_VERSION)
#        .set_dsp_product(DSP_PRODUCT)
#        .set_name(CONFIG_NAME)
#        .set_comments(COMMENTS)
#        .add_config("8-strip", path=XLSX_PATH_8_STRIP)
#        # .add_config("96-cap", path=XLSX_PATH_96_CAP)
#        .dump(f"./config/{CONFIG_NAME}.yml")
#)

2) Run DSP (Strep Assay algorithm)

In [21]:
CONFIG_NAME = "240213_multiamp_research"

In [22]:
pydsp.run.worker.multiple_tasks(
    "./data/pda-raw-sample/computed/example1/pcr_results",               # Input directory
    f"../pro/config/{CONFIG_NAME}.yml",                      # Configuration
    f"./data/strep_assay/computed/config__{CONFIG_NAME}",    # Output directory
    4,                                                       # Number of processes
    is_verbose=True                                          # Verbose mode
)

2024-02-28 10:46:01,954 - 61071 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-28 10:46:01,954 - 61071 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-28 10:46:01,954 - 61071 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-28 10:46:01,970 - 61071 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-39_BR100172_Allplex GI_B1_TCF_specificity-3.parquet
2024-02-28 10:46:01,970 - 61071 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-r

0

### Running DSP for Autobaseline Data

In [19]:
# Execution on PDA-pro
pydsp.run.worker.multiple_tasks(
    "./data/pda-raw-sample/computed/example1/pcr_results",                 # Input directory
    "./config/yaml/PRJDS001/GI-B-I/dsp2_generic_research_auto.yml",        # Configuration
    "./data/autobaseline-subtracted/computed/example1/config__dsp2_orig",  # Output directory
    4,                                                                     # Number of processes
    is_verbose=True                                                        # Verbose mode
)

2024-02-29 05:17:44,700 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-29 05:17:44,700 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-29 05:17:44,700 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-29 05:17:44,700 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-raw-sample/computed/example1/pcr_results/admin_2015-02-24 10-10-02_BR100135_Allplex GI_B1_TCF_specificity-1.parquet
2024-02-29 05:17:44,700 - 65109 - INFO - Check entry /home/jupyter-kmkim/dsp-research-strep-a/kkm/data/pda-r

0
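
The `multiple_tasks` calls in this notebook differ only in their input, configuration, and output paths. A minimal sketch of collecting runs in one table and looping over it follows. `RUNS`, `run_all`, and `fake_runner` are hypothetical names. The runner is passed in so the loop assumes nothing about `pydsp` beyond the positional call shape used above, and treating a return value of `0` as success is an assumption based on the `0` printed after each run.

```python
RUNS = [
    # (input directory, configuration file, output directory), copied from the cells above
    ("./data/pda-raw-sample/computed/example1/pcr_results",
     "config/yaml/PRJDS001/GI-B-I/dsp2_generic_research.yml",
     "./data/pda-raw-sample/computed/example1/config__dsp2_orig"),
    ("./data/pda-raw-sample/computed/example1/pcr_results",
     "./config/yaml/PRJDS001/GI-B-I/dsp2_generic_research_auto.yml",
     "./data/autobaseline-subtracted/computed/example1/config__dsp2_orig"),
]


def run_all(runs, runner, processes=4):
    """Call runner(input, config, output, processes) per run; return failed outputs."""
    failed = []
    for input_dir, config_path, output_dir in runs:
        status = runner(input_dir, config_path, output_dir, processes)
        if status != 0:  # assumption: 0 means success
            failed.append(output_dir)
    return failed


def fake_runner(input_dir, config_path, output_dir, processes):
    """Stand-in runner so the sketch executes without pydsp installed."""
    return 0


print(run_all(RUNS, fake_runner))
```

On PDA-pro, the stand-in could be replaced with `lambda i, c, o, n: pydsp.run.worker.multiple_tasks(i, c, o, n, is_verbose=True)`, which mirrors the calls in the cells above.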