# Notebook for conducting the PLP study
Patient-Level Prediction (PLP) is a methodology used in healthcare and clinical research, especially within the OMOP Common Data Model (CDM) ecosystem and the Observational Health Data Sciences and Informatics (OHDSI) initiative. The purpose of PLP is to develop and validate predictive models that can anticipate specific clinical events for individual patients based on their medical history and other relevant data.

### Import libraries

In [1]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from src import peticiones, plp, cohortes, miscelania, usoBBDD

### Import DataSet Synthea

Downloads the Dataset in the directory defined in *save_path*

In [2]:
save_path = "./data"  # Ruta en la que se descargará
#peticiones.loadSynthea(save_path)

### Create DataSet from Synthea's CSV files

From the directory defined in *save_path* it will be created a DB with DuckDB at the directory defined in *db_path*

In [3]:
csv_folder = save_path + '/Synthea27Nj'
db_path = save_path + '/Synthea_BBDD'
#usoBBDD.crear_bbdd_desde_csv(csv_folder, db_path) 

## Create cohorts and export to pandas DataFrame
3 cohorts are created:

### 1.- Target Cohort (Cohort of Interest):

This is the group of patients who will be studied or for whom the prediction will be made. This group is defined by a set of conditions or characteristics.
For example, you could have a cohort of people who are over 30 years old and have been diagnosed with diabetes.

### 2.- Outcome Cohort:

This is the clinical event you are trying to predict in the patients in the Target group. For example, whether a patient will be readmitted to the hospital within 30 days after discharge, or will develop a disease in the future.
The “outcome” is binary in many cases (occurs or does not occur), but there may be different types of outcomes depending on the clinical problem.

### 3.- Predictive Features:

These are the patient characteristics (demographics, treatments, diagnoses, procedures, etc.) that are used to build the prediction model. These data are obtained from the patient's medical history and other relevant sources, such as sociodemographic information, previous procedures, drug exposures, etc.
In PLP, these features are extracted directly from the OMOP database.

### 4.- Cohort to pandas DF

Every function returns a pandas DF 

In [4]:
# Parámetros para la cohorte target
target_tables = ['person']
target_conditions = ['person.year_of_birth < 1990']  # Ejemplo: personas mayores de 30 años
cohort_name = 'cohorte_1'

print('==================================')
print(f'Creando {cohort_name}')
print('==================================')

cohorte_1 = cohortes.create_cohort(db_path = db_path, tables = target_tables, where_conditions = target_conditions, cohort_name=cohort_name)


print('==============================')
print('          COHORTE 1')
print('==============================')
print(cohorte_1.head)

# Parámetros para la cohorte outcome
outcome_tables = ['person', 'CONDITION_ERA', 'CONCEPT']
outcome_conditions = ['CONCEPT.concept_name = \'Acute viral pharyngitis\'']  # Ejemplo: pacientes con Acute viral pharyngitis
cohort_target_name = 'cohorte_target'
cohort_out_name = 'cohorte_outcome'
print('==================================')
print('Creando cohortes target y outcome')
print('==================================')
target_cohort_df, outcome_cohort_df = cohortes.create_target_outcome_cohorts(db_path=db_path, target_tables=target_tables, target_conditions=target_conditions, 
                                                                             outcome_tables = outcome_tables, outcome_conditions= outcome_conditions, 
                                                                             target_cohort_name = cohort_target_name, outcome_cohort_name=cohort_out_name)
print('==============================')
print('          COHORTE TARGET')
print('==============================')
print(target_cohort_df.head)

print('==============================')
print('          COHORTE OUTCOME')
print('==============================')
print(outcome_cohort_df)

# Parámetros para las características predictorias
feature_tables = ['person', 'VISIT_DETAIL', 'DRUG_EXPOSURE']
feature_conditions = ['VISIT_DETAIL.visit_detail_start_date BETWEEN \'2019-01-01\' AND \'2023-01-01\'']

# Definir tabla de outcome (target) y condiciones
target_table = 'CONDITION_ERA'
target_condition = 'CONDITION_ERA.condition_concept_id = 201826'  # Ejemplo: Diabetes Milletius de tipo 2
predictive_features_name = 'predictive_features'

print('===============================================')
print('          COHORTE PREDICTIVE FEATURES          ')
print('===============================================')

# Definición de las tablas a utilizar
tables = ["person", "visit_occurrence", "drug_exposure", "condition_occurrence"]

# Definición de las columnas predictivas por tabla
feature_columns = {
    "person": ["year_of_birth", "gender_concept_id", "race_concept_id"],
    "visit_occurrence": ["visit_start_date", "visit_end_date", "visit_concept_id"],
    "drug_exposure": ["drug_concept_id", "drug_exposure_start_date"],
    "condition_occurrence": ["condition_concept_id", "condition_start_date"]
}

# Definir la tabla de outcomes y la condición para el target
target_table = "condition_occurrence"
target_condition = "condition_occurrence.condition_concept_id = 201826"

# Definir la columna de fecha índice
index_date_column = "visit_occurrence.visit_start_date"

# Definir el nombre de la tabla resultante que contendrá las variables predictivas
cohort_name = "predictive_features"

# Ejecutar la función para extraer las características predictoras
predictive_features = cohortes.extract_predictive_features(
    db_path=db_path,
    tables=tables,
    feature_columns=feature_columns,
    target_table=target_table,
    target_condition=target_condition,
    cohort_name=cohort_name,
    index_date_column=index_date_column,
    start_date='2018-01-01',
    end_date='2023-01-01'
)

# Mostrar las primeras filas del DataFrame resultante
print(predictive_features.head())

Creando cohorte_1
Executing query:

    CREATE OR REPLACE TABLE cohorte_1 AS
    SELECT *
    FROM "person"
    WHERE person.year_of_birth < 1990
    
          COHORTE 1
<bound method NDFrame.head of     person_id  gender_concept_id  year_of_birth  month_of_birth  day_of_birth  \
0           5               8507           1988               9            22   
1           7               8507           1938               2            22   
2           8               8532           1967               1            17   
3           9               8507           1978               6             1   
4          10               8507           1971               8            22   
5          11               8532           1953               9            21   
6          13               8507           1986               8            10   
7          14               8507           1987              12            21   
8          16               8532           1970               9       