# Notebook for conducting the PLP study
Patient-Level Prediction (PLP) is a methodology used in healthcare and clinical research, especially within the OMOP Common Data Model (CDM) ecosystem and the Observational Health Data Sciences and Informatics (OHDSI) initiative. The purpose of PLP is to develop and validate predictive models that can anticipate specific clinical events for individual patients based on their medical history and other relevant data.

### Import libraries

In [3]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from src import peticiones, plp, cohortes, miscelania

### Import DataSet Synthea

Downloads the Synthea dataset into the directory defined in *save_path*.

In [4]:
save_path = "./data"  # Directory the dataset is downloaded to
peticiones.loadSynthea(save_path)

Downloading ZIP from https://github.com/OHDSI/EunomiaDatasets/archive/refs/heads/main.zip...
Zip path ./data\temp_repo\repo.zip
ZIP downloaded successfully. Extracting to ./data\temp_repo...
Folder Synthea27Nj moved to ./data\Synthea27Nj
Temporary folders and ZIP file removed.
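The log above suggests that `loadSynthea` downloads the repository ZIP, extracts it into a temporary folder, and moves `Synthea27Nj` into place. A minimal sketch of that flow, using only the standard library, could look like the following. The names `download_zip` and `extract_subfolder` are hypothetical and this is not the actual `peticiones` implementation.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_zip(url: str, zip_path: Path) -> Path:
    """Download a ZIP archive to zip_path (hypothetical helper)."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(zip_path))
    return zip_path


def extract_subfolder(zip_path: Path, folder_name: str, save_path: Path) -> Path:
    """Extract the archive to a temporary directory and move one folder to save_path."""
    save_path.mkdir(parents=True, exist_ok=True)
    target = save_path / folder_name
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        # Look for a directory with the requested name at any depth.
        matches = [p for p in Path(tmp).rglob(folder_name) if p.is_dir()]
        if not matches:
            raise FileNotFoundError(f"{folder_name} not found in {zip_path}")
        if target.exists():
            shutil.rmtree(target)  # replace a previous download
        shutil.move(str(matches[0]), str(target))
    # The temporary directory is removed when the with-block exits.
    return target
```

Calling `download_zip` with the URL from the log and then `extract_subfolder(zip_path, "Synthea27Nj", Path("./data"))` would reproduce the steps shown in the output, assuming the archive contains a folder with that name.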


### Create DataSet from Synthea's CSV files

A DuckDB database is created at the path defined in *db_path* from the CSV files stored under *save_path*.

In [5]:
csv_folder = save_path + '/Synthea27Nj'
db_path = save_path + '/Synthea_BBDD'  # path separator added so the DB is created inside save_path
peticiones.crear_bbdd_desde_csv(csv_folder, db_path)

Table CARE_SITE created from CARE_SITE.csv
Table CDM_SOURCE created from CDM_SOURCE.csv
Table COHORT created from COHORT.csv
Table COHORT_DEFINITION created from COHORT_DEFINITION.csv
Table CONCEPT created from CONCEPT.csv
Table CONCEPT_ANCESTOR created from CONCEPT_ANCESTOR.csv
Table CONCEPT_CLASS created from CONCEPT_CLASS.csv
Table CONCEPT_RELATIONSHIP created from CONCEPT_RELATIONSHIP.csv
Table CONCEPT_SYNONYM created from CONCEPT_SYNONYM.csv
Table CONDITION_ERA created from CONDITION_ERA.csv
Table CONDITION_OCCURRENCE created from CONDITION_OCCURRENCE.csv
Table COST created from COST.csv
Table DEATH created from DEATH.csv
Table DEVICE_EXPOSURE created from DEVICE_EXPOSURE.csv
Table DOMAIN created from DOMAIN.csv
Table DOSE_ERA created from DOSE_ERA.csv
Table DRUG_ERA created from DRUG_ERA.csv
Table DRUG_EXPOSURE created from DRUG_EXPOSURE.csv
Table DRUG_STRENGTH created from ...
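The table-creation step can be pictured as a loop over the CSV files that creates one table per file, named after the file. The sketch below uses `sqlite3` from the standard library as a stand-in for DuckDB so that it runs without extra dependencies. `create_db_from_csv` is a hypothetical name, not the `peticiones.crear_bbdd_desde_csv` implementation, and every column is loaded as text.

```python
import csv
import sqlite3
from pathlib import Path


def create_db_from_csv(csv_folder: str, db_path: str) -> list:
    """Create one table per CSV file; the table name is the file stem."""
    created = []
    con = sqlite3.connect(db_path)
    try:
        for csv_file in sorted(Path(csv_folder).glob("*.csv")):
            with open(csv_file, newline="", encoding="utf-8") as fh:
                reader = csv.reader(fh)
                header = next(reader, None)
                if not header:
                    continue  # skip empty files
                table = csv_file.stem
                cols = ", ".join(f'"{c}"' for c in header)
                marks = ", ".join("?" for _ in header)
                con.execute(f'DROP TABLE IF EXISTS "{table}"')
                con.execute(f'CREATE TABLE "{table}" ({cols})')
                # Remaining rows are inserted as text values.
                con.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
            created.append(table)
        con.commit()
    finally:
        con.close()
    return created
```

With DuckDB itself, the loop body would typically shrink to a single `CREATE TABLE ... AS SELECT * FROM read_csv_auto(...)` statement, which also infers column types.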

## Create cohorts and PLP study
A PLP study is defined by two cohorts (target and outcome) plus the predictive features, the model, and its validation:

### 1.- Target Cohort (Cohort of Interest):

This is the group of patients who will be studied or for whom the prediction will be made. This group is defined by a set of conditions or characteristics.
For example, you could have a cohort of people who are over 30 years old and have been diagnosed with diabetes.
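A cohort definition of this kind maps naturally onto a SQL query: a list of tables, the join conditions between them, and the filter conditions. The sketch below shows one way such lists could be assembled into a query. `build_cohort_query` is a hypothetical helper, not the `cohortes` API, and `sqlite3` with a tiny hand-made table set stands in for the DuckDB database.

```python
import sqlite3


def build_cohort_query(tables, join_columns, conditions,
                       select="DISTINCT person.person_id"):
    """Assemble SELECT ... FROM ... JOIN ... WHERE ... from plain lists."""
    if len(join_columns) != len(tables) - 1:
        raise ValueError("expected one join condition per additional table")
    sql = f"SELECT {select} FROM {tables[0]}"
    for table, on in zip(tables[1:], join_columns):
        sql += f" JOIN {table} ON {on}"
    if conditions:
        sql += " WHERE " + " AND ".join(f"({c})" for c in conditions)
    return sql


# Toy data: person 3 has no visits, person 2 was born after 1990.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER);
    CREATE TABLE VISIT_DETAIL (person_id INTEGER, visit_detail_start_date TEXT);
    INSERT INTO person VALUES (1, 1975), (2, 1995), (3, 1960);
    INSERT INTO VISIT_DETAIL VALUES (1, '2020-03-01'), (1, '2021-06-10'), (2, '2020-05-05');
""")

query = build_cohort_query(
    tables=["person", "VISIT_DETAIL"],
    join_columns=["person.person_id = VISIT_DETAIL.person_id"],
    conditions=["person.year_of_birth < 1990"],
)
target_ids = [row[0] for row in con.execute(query)]
print(target_ids)  # only person 1 has a visit and was born before 1990
```

Because the query is built by string concatenation, this approach is only suitable when the table names and conditions come from trusted code, not from user input.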

### 2.- Outcome Cohort:

This is the clinical event you are trying to predict in the patients in the Target group. For example, whether a patient will be readmitted to the hospital within 30 days after discharge, or will develop a disease in the future.
The “outcome” is binary in many cases (occurs or does not occur), but there may be different types of outcomes depending on the clinical problem.
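For a binary outcome such as a 30-day readmission, each patient in the target cohort receives a label of 1 if the event occurs within a fixed window after their index date, and 0 otherwise. The following sketch illustrates that labeling step. The function name, the dictionary-based inputs, and the choice of a window that excludes the index date itself are all assumptions made for illustration.

```python
from datetime import date, timedelta


def label_outcomes(index_dates, outcome_dates, risk_days=30):
    """Return {person_id: 0 or 1}; 1 if an outcome falls in (index, index + risk_days]."""
    labels = {}
    for person_id, index_date in index_dates.items():
        window_end = index_date + timedelta(days=risk_days)
        labels[person_id] = int(any(
            index_date < d <= window_end
            for d in outcome_dates.get(person_id, [])
        ))
    return labels


index_dates = {1: date(2021, 1, 10), 2: date(2021, 2, 1), 3: date(2021, 3, 15)}
outcome_dates = {
    1: [date(2021, 1, 25)],  # 15 days after index: inside the window
    2: [date(2021, 4, 1)],   # 59 days after index: outside the window
}
labels = label_outcomes(index_dates, outcome_dates)
print(labels)  # {1: 1, 2: 0, 3: 0}
```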

### 3.- Predictive Features:

These are the patient characteristics (demographics, treatments, diagnoses, procedures, etc.) that are used to build the prediction model. These data are obtained from the patient's medical history and other relevant sources, such as sociodemographic information, previous procedures, drug exposures, etc.
In PLP, these features are extracted directly from the OMOP database.
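Records extracted from OMOP tables are usually long-format (one row per event), while a model needs one row per patient. A common way to bridge the two is to turn each distinct concept into a 0/1 column. The sketch below does this for drug exposures in plain Python; the function name, input shapes, and concept IDs are placeholders chosen for illustration.

```python
def build_feature_matrix(persons, drug_exposures):
    """Return (person_ids, column_names, rows) with one row per person.

    persons: {person_id: year_of_birth}
    drug_exposures: list of (person_id, drug_concept_id) pairs
    """
    concepts = sorted({c for _, c in drug_exposures})
    exposed = {}
    for person_id, concept_id in drug_exposures:
        exposed.setdefault(person_id, set()).add(concept_id)
    columns = ["year_of_birth"] + [f"drug_{c}" for c in concepts]
    person_ids = sorted(persons)
    rows = [
        [persons[p]] + [int(c in exposed.get(p, set())) for c in concepts]
        for p in person_ids
    ]
    return person_ids, columns, rows


persons = {1: 1975, 2: 1995}
# Placeholder concept IDs; the repeated (1, 111) pair is counted once.
drug_exposures = [(1, 111), (1, 222), (2, 222), (1, 111)]
person_ids, columns, rows = build_feature_matrix(persons, drug_exposures)
print(columns)  # ['year_of_birth', 'drug_111', 'drug_222']
print(rows)     # [[1975, 1, 1], [1995, 0, 1]]
```

The sketch keeps `person_id` as a row key rather than a feature column, since an identifier carries no clinical information.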

### 4.- Predictive Model:

The predictive model is the algorithm that learns the relationship between the predictor variables and the outcome. Common algorithms in PLP include logistic regression, decision trees, random forests, and neural networks.
The model is trained on historical cohort data and then evaluated on a separate validation set to measure its performance.
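The train-then-validate pattern can be illustrated with scikit-learn, which this notebook already imports. The sketch below uses a synthetic, imbalanced dataset from `make_classification` as a stand-in for a real PLP feature matrix, so the data, the split ratio, and the hyperparameters are illustrative assumptions and do not reflect what `plp.run_plp_with_algorithms` does internally.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a PLP feature matrix: about 10% positive outcomes.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)
# Stratify so that train and test keep the same outcome rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class, which AUC-ROC needs.
    scores = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, scores)
    print(f"{name}: AUC-ROC = {auc_by_model[name]:.3f}")
```

Stratifying the split matters when the outcome is rare, because a purely random split can leave very few positive cases in the validation set.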

### 5.- Model Validation:

Once the model has been trained, it is evaluated on an independent dataset to measure its performance. The most commonly used metrics in PLP include:
- AUC-ROC (Area Under the Receiver Operating Characteristic Curve): measures the model's ability to distinguish patients who experience the outcome from those who do not.
- Accuracy, Sensitivity, Specificity: the fraction of all predictions that are correct, the fraction of actual positives that are detected, and the fraction of actual negatives that are correctly ruled out.
- F1-score: the harmonic mean of precision and sensitivity (recall).
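The threshold-based metrics all follow from the four cells of the confusion matrix, and AUC-ROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes them in plain Python on a small made-up example, with F1 computed as the harmonic mean of precision and sensitivity. The function names are hypothetical, and in practice `sklearn.metrics` provides equivalent functions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, specificity and F1 from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc_roc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]  # classify at a 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics)
print(round(auc, 4))  # 13 of 15 pairs ranked correctly: 0.8667
```

The pairwise AUC computation is quadratic in the number of patients, so it is meant for illustration rather than for large cohorts.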

In [8]:
# Parameters for the target cohort
target_tables = ['person', 'VISIT_DETAIL']
target_join_columns = ['person.person_id = VISIT_DETAIL.person_id']
target_conditions = ['person.year_of_birth < 1990']  # Example: people over 30 years old

# Parameters for the outcome cohort
outcome_tables = ['person', 'CONDITION_ERA', 'CONCEPT']
outcome_join_columns = ['person.person_id = CONDITION_ERA.person_id', 'CONDITION_ERA.condition_concept_id = CONCEPT.concept_id']
outcome_conditions = ["CONCEPT.concept_name = 'Acute viral pharyngitis'"]  # Example: patients with acute viral pharyngitis

# NOTE: as written, this call raises the TypeError shown in the cell output:
# create_cohort() does not accept a 'target_tables' keyword, so the keyword
# names must be matched to the signature defined in src/cohortes.
cohortes.create_cohort(
    db_path=db_path,
    target_tables=target_tables,
    target_join_columns=target_join_columns,
    target_condition=target_conditions,
    cohort_name='cohorte1'
)

# Parameters for the predictive features
feature_tables = ['person', 'VISIT_DETAIL', 'DRUG_EXPOSURE']
feature_join_columns = [
    'person.person_id = VISIT_DETAIL.person_id',
    'person.person_id = DRUG_EXPOSURE.person_id'
]
feature_conditions = ['VISIT_DETAIL.visit_detail_start_date BETWEEN \'2019-01-01\' AND \'2023-01-01\'']
feature_columns = ['person.person_id', 'person.year_of_birth', 'VISIT_DETAIL.visit_detail_start_date', 'DRUG_EXPOSURE.drug_concept_id']

# Outcome (label) table and condition for the prediction task
target_table = 'CONDITION_ERA'
target_join_column = 'person.person_id = CONDITION_ERA.person_id'
target_condition = 'CONDITION_ERA.condition_concept_id = 201826'  # Example: type 2 diabetes mellitus

# Run the PLP study
results = plp.run_plp_with_algorithms(
    db_path,
    target_tables, target_join_columns, target_conditions,
    outcome_tables, outcome_join_columns, outcome_conditions,
    feature_tables, feature_join_columns, feature_conditions, feature_columns,
    target_table, target_join_column, target_condition,
)

TypeError: create_cohort() got an unexpected keyword argument 'target_tables'
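An "unexpected keyword argument" error means that the keyword names used in the call do not match the parameter names of the function. One way to diagnose it is to compare the two with `inspect.signature`. The sketch below does this against a stand-in function, because the real `create_cohort` signature is not shown in this notebook; `create_cohort_stub` and its parameter names are purely hypothetical.

```python
import inspect


def unexpected_kwargs(func, kwargs):
    """Return the keyword names in kwargs that func would reject."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    accepted = {
        name for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return [name for name in kwargs if name not in accepted]


# Hypothetical stand-in: the parameter names here are invented for the example.
def create_cohort_stub(db_path, tables, join_columns, conditions, cohort_name):
    pass


call_kwargs = {"db_path": "db", "target_tables": [], "cohort_name": "cohorte1"}
rejected = unexpected_kwargs(create_cohort_stub, call_kwargs)
print(rejected)  # ['target_tables']
```

In the notebook itself, `print(inspect.signature(cohortes.create_cohort))` would show the actual parameter names to use in the call.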

In [3]:
miscelania.print_study_summary(feature_columns, target_condition, results)

**** PLP STUDY SUMMARY ****

The PLP study was carried out to predict whether a patient will be diagnosed with the condition:
  - CONDITION_ERA.condition_concept_id = 201826

Predictive features used in the study:
  - person.person_id
  - person.year_of_birth
  - VISIT_DETAIL.visit_detail_start_date
  - DRUG_EXPOSURE.drug_concept_id

The following models were trained and evaluated:

Model: Decision Tree
  - Accuracy: 0.5656
  - Classification report:
{'0': {'precision': 0.9994186046511628, 'recall': 0.5595789865294291, 'f1-score': 0.7174528723753117, 'support': 64511.0}, '1': {'precision': 0.031563160406299, 'recall': 0.9778247096092925, 'f1-score': 0.06115238566947335, 'support': 947.0}, 'accuracy': 0.5656298695346634, 'macro avg': {'precision': 0.5154908825287309, 'recall': 0.7687018480693608, 'f1-score': 0.38930262902239254, 'support': 65458.0}, 'weighted avg': {'precision': 0.9854163573215793, 'recall': 0.5656298695346634, 'f1-score': 0