# Notebook for conducting the PLP study
Patient-Level Prediction (PLP) is a methodology used in healthcare and clinical research, especially within the OMOP Common Data Model (CDM) ecosystem and the Observational Health Data Sciences and Informatics (OHDSI) initiative. The purpose of PLP is to develop and validate predictive models that can anticipate specific clinical events for individual patients based on their medical history and other relevant data.

### Import libraries

In [1]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from src import peticiones, plp, cohortes, miscelania

## Create cohorts and PLP study
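The study is defined by a target cohort, an outcome cohort, and a set of predictive features. These feed a predictive model, which is then validated:

### Target Cohort (Cohort of Interest):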

This is the group of patients who will be studied or for whom the prediction will be made. This group is defined by a set of conditions or characteristics.
For example, you could have a cohort of people who are over 30 years old and have been diagnosed with diabetes.
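The example above can be sketched as a query against OMOP-style tables. This is a minimal illustration, not the notebook's own `cohortes` module: it assumes an in-memory SQLite database with made-up rows, a fixed birth-year cutoff standing in for "over 30", and concept ID 201826 for type 2 diabetes (the ID used later in this notebook). The concept ID 999 is an arbitrary placeholder for "some other condition".

```python
import sqlite3

# Toy in-memory database. Table and column names follow OMOP conventions,
# but every row here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER);
    CREATE TABLE condition_era (person_id INTEGER, condition_concept_id INTEGER);
    INSERT INTO person VALUES (1, 1975), (2, 1998), (3, 1960), (4, 1982);
    INSERT INTO condition_era VALUES (1, 201826), (2, 201826), (3, 999), (4, 201826);
""")

# Target cohort: born before 1990 AND with a type 2 diabetes record.
query = """
    SELECT DISTINCT p.person_id
    FROM person AS p
    JOIN condition_era AS c ON p.person_id = c.person_id
    WHERE p.year_of_birth < 1990
      AND c.condition_concept_id = 201826
    ORDER BY p.person_id
"""
target_cohort = [row[0] for row in conn.execute(query)]
print(target_cohort)  # [1, 4]
```

Patient 2 is excluded by the birth-year condition and patient 3 by the diagnosis condition, so both conditions must hold for cohort entry.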

### Outcome Cohort:

This is the clinical event you are trying to predict in the patients in the Target group. For example, whether a patient will be readmitted to the hospital within 30 days after discharge, or will develop a disease in the future.
The “outcome” is binary in many cases (occurs or does not occur), but there may be different types of outcomes depending on the clinical problem.
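A binary outcome of the "within 30 days" kind can be derived by comparing each patient's event dates with an index date. The sketch below assumes two hypothetical pandas tables, `target` (one row per patient with an `index_date`) and `events` (one row per candidate outcome event); neither name comes from the notebook's code.

```python
import pandas as pd

# Hypothetical target cohort: one index date (e.g. discharge) per patient.
target = pd.DataFrame({
    "person_id": [1, 2, 3],
    "index_date": pd.to_datetime(["2022-01-10", "2022-02-01", "2022-03-15"]),
})

# Hypothetical outcome events; a patient may have zero, one, or many.
events = pd.DataFrame({
    "person_id": [1, 2, 2],
    "event_date": pd.to_datetime(["2022-01-25", "2022-01-20", "2022-04-01"]),
})

# Keep every target patient, even those without any event (left join).
merged = target.merge(events, on="person_id", how="left")

# Days from index date to event; NaN when the patient has no event.
days = (merged["event_date"] - merged["index_date"]).dt.days

# An event counts only if it falls 1 to 30 days after the index date.
merged["in_window"] = days.between(1, 30)

# Outcome = 1 if the patient has at least one qualifying event.
labels = (
    merged.groupby("person_id")["in_window"]
    .any()
    .astype(int)
    .rename("outcome")
    .reset_index()
)
print(labels)
```

Patient 2 has events both before the index date and after the 30-day window, so the label is 0 even though the patient appears in `events`.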

### Predictive Features:

These are the patient characteristics (demographics, diagnoses, procedures, drug exposures, etc.) used to build the prediction model. They are drawn from the patient's medical history and other relevant sources, such as sociodemographic information.
In PLP, these features are extracted directly from the OMOP database.
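OMOP tables such as `DRUG_EXPOSURE` store one row per event, while most models expect one row per patient. One possible way to bridge the two is to turn each concept into an indicator column. The sketch below uses made-up rows and the hypothetical drug concept IDs 111 and 222; it is an illustration, not the feature extraction performed by the `plp` module.

```python
import pandas as pd

person = pd.DataFrame({
    "person_id": [1, 2, 3],
    "year_of_birth": [1975, 1988, 1960],
})

# Long format: one row per exposure (IDs 111 and 222 are placeholders).
drug_exposure = pd.DataFrame({
    "person_id": [1, 1, 2, 1],
    "drug_concept_id": [111, 222, 111, 111],
})

# One indicator column per drug concept: 1 if the patient was ever exposed.
drug_flags = (
    pd.crosstab(drug_exposure["person_id"], drug_exposure["drug_concept_id"])
    .clip(upper=1)
    .add_prefix("drug_")
)

# One row per patient; patients with no exposures get 0 in every drug column.
features = (
    person.set_index("person_id")
    .join(drug_flags)
    .fillna(0)
    .astype(int)
)
print(features)
```

Patient 3 has no drug exposures and still appears in the matrix, which matters because dropping such patients would silently change the cohort.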

### Predictive Model:

The predictive model is the algorithm that learns a relationship between the predictor variables and the outcome. Common choices in PLP include logistic regression, decision trees, random forests, and neural networks.
The model is trained on one portion of the historical cohort data and then evaluated on a separate validation portion to measure its performance.
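The train-then-validate workflow can be sketched with scikit-learn. This example assumes synthetic data from `make_classification` in place of a real OMOP feature matrix, with roughly 10% positive cases to mimic a rare outcome; it does not reproduce the internals of `plp.run_plp_with_algorithms`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a patient feature matrix with a rare outcome.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Hold out 25% of patients; stratify so both splits keep the outcome rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on unseen patients
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Stratifying the split is a design choice for rare outcomes: without it, a small validation set could end up with very few positive cases.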

### Model Validation:

Once the model has been trained, it is evaluated on an independent dataset to measure its performance. The most commonly used metrics in PLP include:

- AUC-ROC (Area Under the Receiver Operating Characteristic Curve): measures the model's ability to distinguish patients who experience the outcome from those who do not.
- Accuracy, Sensitivity, Specificity: respectively, the fraction of all predictions that are correct, the fraction of actual events that are detected, and the fraction of non-events that are correctly identified.
- F1-score: the harmonic mean of precision and sensitivity (recall), which measures the balance between the two.
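The metrics listed above can all be computed with `sklearn.metrics`. The labels and scores below are invented for illustration, and the 0.5 decision threshold is an assumption; specificity is obtained as the recall of the negative class.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up ground truth and predicted probabilities for ten patients.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.55, 0.45]

# Turn probabilities into class predictions with an assumed 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

metrics = {
    "auc_roc": roc_auc_score(y_true, y_score),       # uses scores, not labels
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "sensitivity": recall_score(y_true, y_pred),      # recall of class 1
    "specificity": recall_score(y_true, y_pred, pos_label=0),  # recall of class 0
    "f1": f1_score(y_true, y_pred),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When the outcome is rare, accuracy alone can look acceptable while precision for the positive class stays very low, so it helps to read these metrics together.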

In [2]:
# Example parameters
db_path = 'data/Eunomia/synthea27nj_5.4_bbdd'

#peticiones.crear_bbdd_desde_csv('data/Eunomia/Synthea27Nj_5.4', 'data/Eunomia/Synthea27Nj_5.4_BBDD')

# Target cohort parameters
target_tables = ['person', 'VISIT_DETAIL']
target_join_columns = ['person.person_id = VISIT_DETAIL.person_id']
target_conditions = ['person.year_of_birth < 1990']  # Example: people born before 1990 (older than 30)

# Outcome cohort parameters
outcome_tables = ['person', 'CONDITION_ERA', 'CONCEPT']
outcome_join_columns = ['person.person_id = CONDITION_ERA.person_id', 'CONDITION_ERA.condition_concept_id = CONCEPT.concept_id']
outcome_conditions = ['CONCEPT.concept_name = \'Acute viral pharyngitis\'']  # Example: patients with acute viral pharyngitis

# Predictive feature parameters
feature_tables = ['person', 'VISIT_DETAIL', 'DRUG_EXPOSURE']
feature_join_columns = [
    'person.person_id = VISIT_DETAIL.person_id',
    'person.person_id = DRUG_EXPOSURE.person_id'
]
feature_conditions = ['VISIT_DETAIL.visit_detail_start_date BETWEEN \'2019-01-01\' AND \'2023-01-01\'']
feature_columns = ['person.person_id', 'person.year_of_birth', 'VISIT_DETAIL.visit_detail_start_date', 'DRUG_EXPOSURE.drug_concept_id']

# Define the table and condition for the outcome label (the model's prediction target)
target_table = 'CONDITION_ERA'
target_join_column = 'person.person_id = CONDITION_ERA.person_id'
target_condition = 'CONDITION_ERA.condition_concept_id = 201826'  # Example: type 2 diabetes mellitus

# Run the PLP study
results = plp.run_plp_with_algorithms(db_path, target_tables, target_join_columns, target_conditions, outcome_tables, outcome_join_columns, outcome_conditions, 
                                  feature_tables, feature_join_columns, feature_conditions, feature_columns, target_table, target_join_column, target_condition)

Creating target cohort...
Executing query:

    CREATE OR REPLACE TABLE target_cohort AS
    SELECT *
    FROM "person" LEFT JOIN "VISIT_DETAIL" ON person.person_id = VISIT_DETAIL.person_id
    WHERE person.year_of_birth < 1990
    
Target Cohort
   person_id  gender_concept_id  year_of_birth  month_of_birth  day_of_birth  \
0          5               8507           1988               9            22   
1          5               8507           1988               9            22   
2          5               8507           1988               9            22   
3          5               8507           1988               9            22   
4          5               8507           1988               9            22   

  birth_datetime  race_concept_id  ethnicity_concept_id location_id  \
0     1988-09-22             8527              38003563        None   
1     1988-09-22             8527              38003563        None   
2     1988-09-22             8527              38003563    

In [3]:
miscelania.print_study_summary(feature_columns, target_condition, results)

**** PLP STUDY SUMMARY ****

The PLP study was carried out to predict whether a patient will be diagnosed with the condition:
  - CONDITION_ERA.condition_concept_id = 201826

Predictive features used in the study:
  - person.person_id
  - person.year_of_birth
  - VISIT_DETAIL.visit_detail_start_date
  - DRUG_EXPOSURE.drug_concept_id

The following models were trained and evaluated:

Model: Decision Tree
  - Accuracy: 0.5656
  - Classification report:
{'0': {'precision': 0.9994186046511628, 'recall': 0.5595789865294291, 'f1-score': 0.7174528723753117, 'support': 64511.0}, '1': {'precision': 0.031563160406299, 'recall': 0.9778247096092925, 'f1-score': 0.06115238566947335, 'support': 947.0}, 'accuracy': 0.5656298695346634, 'macro avg': {'precision': 0.5154908825287309, 'recall': 0.7687018480693608, 'f1-score': 0.38930262902239254, 'support': 65458.0}, 'weighted avg': {'precision': 0.9854163573215793, 'recall': 0.5656298695346634, 'f1-score': 0