# Notebook for conducting the PLP study
Patient-Level Prediction (PLP) is a methodology used in healthcare and clinical research, especially within the OMOP Common Data Model (CDM) ecosystem and the Observational Health Data Sciences and Informatics (OHDSI) initiative. The purpose of PLP is to develop and validate predictive models that can anticipate specific clinical events for individual patients based on their medical history and other relevant data.
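As a minimal illustration of the develop-and-validate cycle, the sketch below fits a logistic regression on a training split and reports the area under the ROC curve (AUC) on a held-out test split. It uses synthetic data from scikit-learn's `make_classification` as a stand-in for a patient-by-feature matrix with a rare outcome. The helper `develop_and_validate` and the 75/25 split are illustrative assumptions, not this notebook's `plp` module.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def develop_and_validate(X, y, seed: int = 0) -> float:
    """Fit a model on a training split and return its AUC on a held-out test split."""
    # Hold out 25% of patients for validation, keeping the outcome rate similar in both splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)  # development
    risk = model.predict_proba(X_test)[:, 1]  # predicted probability of the outcome
    return roc_auc_score(y_test, risk)  # validation (discrimination)


# Synthetic stand-in for patients (rows) x features (columns), ~10% with the outcome
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9], random_state=0)
print(f"Test AUC: {develop_and_validate(X, y):.3f}")
```

A held-out split is only one validation scheme; external validation on a different database is the stronger check in PLP work.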

### Import libraries

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from src import peticiones, plp, cohortes, miscelania, usoBBDD
```

### Download the Synthea dataset

Downloads the Synthea dataset, packaged in the OHDSI EunomiaDatasets repository, into the directory defined in *save_path*.

```python
save_path = "./data"  # Directory where the dataset will be downloaded
peticiones.loadSynthea(save_path)
```

```
Descargando ZIP desde https://github.com/OHDSI/EunomiaDatasets/archive/refs/heads/main.zip...
```
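The internals of `peticiones.loadSynthea` are not shown in this notebook. A minimal standard-library sketch of the same kind of step (download a ZIP archive, extract it under *save_path*) could look like the following. `download_and_extract` is a hypothetical helper, and it does not reproduce whatever folder layout `loadSynthea` produces, such as the `Synthea27Nj` folder used in the next step.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url: str, save_path: str) -> list[str]:
    """Download a ZIP archive from `url`, extract it into `save_path`,
    and return the names of the archive members."""
    dest = Path(save_path)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    print(f"Downloading ZIP from {url}...")
    with urllib.request.urlopen(url) as response:
        data = response.read()  # the whole archive is held in memory
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, `download_and_extract("https://github.com/OHDSI/EunomiaDatasets/archive/refs/heads/main.zip", "./data")` would fetch the same archive as the cell above. Reading the archive into memory keeps the code short but may not suit very large downloads.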


### Create a database from Synthea's CSV files

A DuckDB database is created at *db_path* from the CSV files in the *Synthea27Nj* folder under *save_path*, with one table per CSV file.

```python
csv_folder = save_path + '/Synthea27Nj'
db_path = save_path + '/Synthea_BBDD'
usoBBDD.crear_bbdd_desde_csv(csv_folder, db_path)
```

```
Tabla CARE_SITE creada a partir de CARE_SITE.csv
Tabla CDM_SOURCE creada a partir de CDM_SOURCE.csv
Tabla COHORT creada a partir de COHORT.csv
Tabla COHORT_DEFINITION creada a partir de COHORT_DEFINITION.csv
Tabla CONCEPT creada a partir de CONCEPT.csv
Tabla CONCEPT_ANCESTOR creada a partir de CONCEPT_ANCESTOR.csv
Tabla CONCEPT_CLASS creada a partir de CONCEPT_CLASS.csv
Tabla CONCEPT_RELATIONSHIP creada a partir de CONCEPT_RELATIONSHIP.csv
Tabla CONCEPT_SYNONYM creada a partir de CONCEPT_SYNONYM.csv
Tabla CONDITION_ERA creada a partir de CONDITION_ERA.csv
Tabla CONDITION_OCCURRENCE creada a partir de CONDITION_OCCURRENCE.csv
Tabla COST creada a partir de COST.csv
Tabla DEATH creada a partir de DEATH.csv
Tabla DEVICE_EXPOSURE creada a partir de DEVICE_EXPOSURE.csv
Tabla DOMAIN creada a partir de DOMAIN.csv
Tabla DOSE_ERA creada a partir de DOSE_ERA.csv
Tabla DRUG_ERA creada a partir de DRUG_ERA.csv
Tabla DRUG_EXPOSURE creada a partir de DRUG_EXPOSURE.csv
Tabla DRUG_STRENGTH creada a par
```

## Create cohorts and export them to pandas DataFrames

Three cohorts are created:

### 1.- Target Cohort (Cohort of Interest):

This is the group of patients who will be studied or for whom the prediction will be made. This group is defined by a set of conditions or characteristics.
For example, you could have a cohort of people who are over 30 years old and have been diagnosed with diabetes.
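The example cohort (people over 30 diagnosed with diabetes) can be expressed over OMOP-style tables with pandas. This sketch assumes `person` and `condition_era` DataFrames with the OMOP columns `person_id`, `year_of_birth`, and `condition_concept_id`, approximates age from the year of birth only, and uses concept 201826 (type 2 diabetes mellitus, the concept used later in this notebook). `select_target_cohort` is a hypothetical helper, not part of the `cohortes` module.

```python
import pandas as pd


def select_target_cohort(person: pd.DataFrame, condition_era: pd.DataFrame,
                         min_age: int, reference_year: int,
                         condition_concept_id: int) -> pd.DataFrame:
    """Return persons older than `min_age` in `reference_year` who have the condition."""
    age = reference_year - person["year_of_birth"]  # approximate age from birth year
    old_enough = person[age > min_age]
    diagnosed_ids = condition_era.loc[
        condition_era["condition_concept_id"] == condition_concept_id, "person_id"
    ].unique()
    return old_enough[old_enough["person_id"].isin(diagnosed_ids)].reset_index(drop=True)


person = pd.DataFrame({"person_id": [1, 2, 3], "year_of_birth": [1970, 2000, 1980]})
condition_era = pd.DataFrame({"person_id": [1, 2, 3],
                              "condition_concept_id": [201826, 201826, 999]})
cohort = select_target_cohort(person, condition_era, min_age=30,
                              reference_year=2023, condition_concept_id=201826)
print(cohort["person_id"].tolist())  # → [1]
```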

### 2.- Outcome Cohort:

This is the clinical event you are trying to predict in the patients in the Target group. For example, whether a patient will be readmitted to the hospital within 30 days after discharge, or will develop a disease in the future.
The “outcome” is binary in many cases (occurs or does not occur), but there may be different types of outcomes depending on the clinical problem.
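A binary outcome is typically turned into a per-patient label by checking whether the event falls inside a time-at-risk window after each target patient's index date. A minimal pandas sketch, assuming a `target` DataFrame with `person_id` and `index_date`, an `outcome` DataFrame with `person_id` and `outcome_date`, and a window of 1 to 30 days (all hypothetical names and choices):

```python
import pandas as pd


def label_outcomes(target: pd.DataFrame, outcome: pd.DataFrame,
                   window_days: int = 30) -> pd.DataFrame:
    """Add a binary `outcome` column: 1 if any outcome event falls 1..window_days
    days after the patient's index date, else 0."""
    merged = target.merge(outcome, on="person_id", how="left")
    delta = (merged["outcome_date"] - merged["index_date"]).dt.days
    merged["hit"] = delta.between(1, window_days)  # NaN (no event) compares as False
    hits = merged.groupby("person_id")["hit"].any()  # any qualifying event per patient
    labelled = target.copy()
    labelled["outcome"] = labelled["person_id"].map(hits).astype(int)
    return labelled


target = pd.DataFrame({"person_id": [1, 2, 3],
                       "index_date": pd.to_datetime(["2020-01-01"] * 3)})
outcome = pd.DataFrame({"person_id": [1, 2],
                        "outcome_date": pd.to_datetime(["2020-01-15", "2020-03-01"])})
print(label_outcomes(target, outcome)["outcome"].tolist())  # → [1, 0, 0]
```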

### 3.- Predictive Features:

These are the patient characteristics (demographics, treatments, diagnoses, procedures, etc.) that are used to build the prediction model. These data are obtained from the patient's medical history and other relevant sources, such as sociodemographic information, previous procedures, drug exposures, etc.
In PLP, these features are extracted directly from the OMOP database.
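A common layout for model inputs is one row per patient and one column per concept. The sketch below assumes `person` and `drug_exposure` DataFrames with the OMOP columns `person_id`, `year_of_birth`, and `drug_concept_id`; it counts drug exposures per concept with `pd.crosstab` and keeps year of birth as a demographic feature. `build_features` and the `drug_<id>` column names are hypothetical.

```python
import pandas as pd


def build_features(person: pd.DataFrame, drug_exposure: pd.DataFrame) -> pd.DataFrame:
    """One row per person: year of birth plus a count column per drug concept."""
    counts = pd.crosstab(drug_exposure["person_id"], drug_exposure["drug_concept_id"])
    counts.columns = [f"drug_{c}" for c in counts.columns]
    # Left join keeps persons without any drug exposure; their counts become 0
    features = person.set_index("person_id")[["year_of_birth"]].join(counts, how="left")
    return features.fillna(0).astype(int)


person = pd.DataFrame({"person_id": [1, 2, 3], "year_of_birth": [1970, 1985, 1990]})
drug_exposure = pd.DataFrame({"person_id": [1, 1, 1, 2],
                              "drug_concept_id": [100, 100, 200, 200]})
print(build_features(person, drug_exposure))
```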

### 4.- Cohorts as pandas DataFrames

Each cohort function used below returns a pandas DataFrame.

```python
# Parameters for the target cohort
target_tables = ['person', 'VISIT_DETAIL']
target_conditions = ['person.year_of_birth < 1990']  # Example: born before 1990, i.e. roughly over 30 years old
cohort_name = 'cohorte_1'
cohorte_1 = cohortes.create_cohort(db_path=db_path, tables=target_tables,
                                   where_conditions=target_conditions, cohort_name=cohort_name)

print('==============================')
print('          COHORT 1')
print('==============================')
print(cohorte_1.head())  # call head(); print(df.head) prints the bound method, not the rows

# Parameters for the outcome cohort
outcome_tables = ['person', 'CONDITION_ERA', 'CONCEPT']
outcome_conditions = ["CONCEPT.concept_name = 'Acute viral pharyngitis'"]  # Example: patients with acute viral pharyngitis
cohort_target_name = 'cohorte_target'
cohort_out_name = 'cohorte_outcome'

target_cohort_df, outcome_cohort_df = cohortes.create_target_outcome_cohorts(
    db_path=db_path,
    target_tables=target_tables,
    target_conditions=target_conditions,
    outcome_tables=outcome_tables,
    outcome_conditions=outcome_conditions,
    target_cohort_name=cohort_target_name,
    outcome_cohort_name=cohort_out_name,
)
print('==============================')
print('          TARGET COHORT')
print('==============================')
print(target_cohort_df.head())

print('==============================')
print('          OUTCOME COHORT')
print('==============================')
print(outcome_cohort_df.head())

# Parameters for the predictive features
feature_tables = ['person', 'VISIT_DETAIL', 'DRUG_EXPOSURE']
feature_conditions = ["VISIT_DETAIL.visit_detail_start_date BETWEEN '2019-01-01' AND '2023-01-01'"]

# Outcome (target) table and condition
target_table = 'CONDITION_ERA'
target_condition = 'CONDITION_ERA.condition_concept_id = 201826'  # Example: type 2 diabetes mellitus
predictive_features_name = 'predictive_features'

print('===============================================')
print('          PREDICTIVE FEATURES COHORT           ')
print('===============================================')

predictive_features = cohortes.extract_predictive_features_with_target(
    db_path=db_path,
    tables=feature_tables,
    feature_columns=feature_conditions,
    target_table=target_table,
    target_condition=target_condition,
    cohort_name=predictive_features_name,
)
```

```
ConnectionException: Connection Error: Can't open a connection to same database file with a different configuration than existing connections
```