# Measuring <font color='red'>Phenotype</font> in OpenSAFELY-TPP
This short report describes how <font color='red'>phenotype</font> can be identified in the OpenSAFELY-TPP database, and the strengths and weaknesses of the methods. This is a living document that will be updated to reflect changes to the OpenSAFELY-TPP database and the patient records within.

## OpenSAFELY
OpenSAFELY is an analytics platform for conducting analyses on Electronic Health Records inside the secure environment where the records are held. This has multiple benefits: 

* We don't transport large volumes of potentially disclosive pseudonymised patient data outside of the secure environments for analysis
* Analyses can run in near real-time as records are ready for analysis as soon as they appear in the secure environment
* All infrastructure and analysis code is stored in GitHub repositories, which are open for security review, scientific review, and re-use

A key feature of OpenSAFELY is the use of study definitions, which are formal specifications of the datasets to be generated from the OpenSAFELY database. Study definitions take care of much of the complex EHR data wrangling required to create a dataset in an analysis-ready format. They also build up a library of standardised and validated variable definitions that can be deployed consistently across multiple projects.

The purpose of this report is to describe all such variables that relate to <font color='red'>phenotype</font>, their relative strengths and weaknesses, and the scenarios in which they are best deployed. It will also describe potential future definitions that have not yet been implemented.

## Available Records
OpenSAFELY-TPP runs inside TPP’s data centre, which contains the primary care records for all patients registered at practices using TPP’s SystmOne Clinical Information System. This data centre also imports datasets from external sources, including A&E attendances and hospital admissions from NHS Digital’s Secondary Use Service, and death registrations from the ONS. More information on available data sources can be found in the OpenSAFELY documentation.

In [None]:
from IPython.display import display, Markdown
from report_functions import *

In [None]:
### CONFIGURE OPTIONS HERE ###

# Import file
input_path = '../output/data/input_all.feather'

# Definitions
definitions = ['derived_bmi']

# Dates
date_min = '2019-01-01'
date_max = '2019-12-31'
time_delta = 'M'

# Min/max range
min_range = 4
max_range = 200

# Null value – 0 or NA
null = 0

# Covariates
demographic_covariates = ['age_band', 'sex', 'ethnicity', 'region', 'imd']
clinical_covariates = ['dementia', 'diabetes', 'hypertension', 'learning_disability']

In [None]:
# Preprocess data with configurations above
num_definitions = len(definitions)
df_occ = preprocess_data(input_path, definitions, demographic_covariates, clinical_covariates, date_min, date_max, time_delta, num_definitions, null)
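
The internals of `preprocess_data` are not shown in this report. As a rough illustration of how the `date_min`, `date_max` and `null` options could be applied, the following is a minimal sketch using pandas. The function `filter_window`, the toy data and the column names `patient_id` and `derived_bmi_date` are assumptions made for this example, not part of `report_functions`.

```python
import numpy as np
import pandas as pd


def filter_window(df, value_col, date_col, date_min, date_max, null=0):
    """Hypothetical helper: keep rows dated within [date_min, date_max]
    and, when null == 0, treat zero values as missing."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Series.between is inclusive at both ends by default
    in_window = out[date_col].between(pd.Timestamp(date_min), pd.Timestamp(date_max))
    out = out.loc[in_window].copy()
    if null == 0:
        out[value_col] = out[value_col].replace(0, np.nan)
    return out


# Toy data; the column layout is assumed for illustration only
toy = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "derived_bmi": [24.5, 0.0, 31.2, 27.0],
    "derived_bmi_date": ["2019-03-10", "2019-06-01", "2018-12-31", "2019-12-31"],
})

cleaned = filter_window(toy, "derived_bmi", "derived_bmi_date", "2019-01-01", "2019-12-31")
print(cleaned)
```

In this sketch, patient 3 falls outside the date window and is dropped, while the zero recorded for patient 2 is kept as a row but converted to a missing value.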

_______
## Descriptive Statistics

### Occurrence

#### Unique Patients by Definition

In [None]:
count_unique(df_occ, definitions, time_delta, 'patient')

In [None]:
for group in demographic_covariates:
    count_unique(df_occ, definitions, time_delta, 'patient', group)

In [None]:
for group in clinical_covariates:
    count_unique(df_occ, definitions, time_delta, 'patient', group)

#### Unique Patients Over Time

In [None]:
report_over_time(df_occ, definitions, 'patient')

In [None]:
for group in demographic_covariates:
    report_over_time(df_occ, definitions, 'patient', group)

In [None]:
for group in clinical_covariates:
    report_over_time(df_occ, definitions, 'patient', group)

#### Unique Measurements by Definition

In [None]:
count_unique(df_occ, definitions, time_delta, 'measurement')

In [None]:
for group in demographic_covariates:
    count_unique(df_occ, definitions, time_delta, 'measurement', group)

In [None]:
for group in clinical_covariates:
    count_unique(df_occ, definitions, time_delta, 'measurement', group)
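
To make the distinction between counting patients and counting measurements concrete, here is a small sketch that does not rely on `count_unique`. It assumes a long-format table with one row per measurement and a hypothetical `patient_id` column; the real structure of `df_occ` may differ.

```python
import pandas as pd

# Toy long-format data: one row per recorded measurement (assumed layout)
records = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3, 3],
    "sex": ["F", "F", "M", "F", "F", "F"],
    "derived_bmi": [22.0, 22.5, 30.1, 27.3, 27.0, 26.8],
})

# A patient with several measurements is counted once under "patients"
# but contributes every row to "measurements"
summary = records.groupby("sex").agg(
    patients=("patient_id", "nunique"),
    measurements=("derived_bmi", "count"),
)
print(summary)
```

The same grouping pattern would apply to any of the demographic or clinical covariates by swapping the grouping column.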

#### Unique Measurements Over Time

In [None]:
report_over_time(df_occ, definitions, 'measurement')

In [None]:
for group in demographic_covariates:
    report_over_time(df_occ, definitions, 'measurement', group)

In [None]:
for group in clinical_covariates:
    report_over_time(df_occ, definitions, 'measurement', group)
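
The over-time summaries depend on how dates are binned. Assuming `time_delta = 'M'` denotes calendar months (an assumption; the report does not state how `report_functions` interprets it), one way to bin measurements with pandas is sketched below, using hypothetical `patient_id` and `date` columns.

```python
import pandas as pd

# Toy long-format data with one row per measurement (assumed layout)
records = pd.DataFrame({
    "patient_id": [1, 1, 2, 3],
    "date": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-01-28", "2019-02-11"]),
    "derived_bmi": [22.0, 22.3, 30.0, 27.0],
})

# Convert each date to its calendar month, then count within each month
monthly = (
    records.assign(period=records["date"].dt.to_period("M"))
    .groupby("period")
    .agg(patients=("patient_id", "nunique"), measurements=("derived_bmi", "count"))
)
print(monthly)
```

Here January contains three measurements from two patients, which is why the patient and measurement series can diverge over time.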

#### Frequency of Update

In [None]:
report_update_frequency(df_occ, definitions, time_delta, num_definitions)

In [None]:
for group in demographic_covariates:
    report_update_frequency(df_occ, definitions, time_delta, num_definitions, group)

In [None]:
for group in clinical_covariates:
    report_update_frequency(df_occ, definitions, time_delta, num_definitions, group)
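
One plausible way to express "frequency of update" is the gap between consecutive measurements for the same patient. The sketch below illustrates that idea only; it is not taken from `report_update_frequency`, and the `patient_id` and `date` columns are assumed.

```python
import pandas as pd

# Toy long-format data: one row per measurement date (assumed layout)
records = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3],
    "date": pd.to_datetime([
        "2019-01-01", "2019-01-31", "2019-03-02",
        "2019-02-01", "2019-02-11", "2019-05-05",
    ]),
})

records = records.sort_values(["patient_id", "date"])
# Days since the same patient's previous measurement (missing for their first)
records["gap_days"] = records.groupby("patient_id")["date"].diff().dt.days
# Average gap per patient; patients with a single measurement have no gap
mean_gap = records.groupby("patient_id")["gap_days"].mean()
print(mean_gap)
```

Patients with only one measurement in the window, such as patient 3 here, have no defined gap, so any summary needs to decide whether to exclude them or report them separately.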

### Value

#### Values Out of Range

In [None]:
report_out_of_range(df_occ, definitions, min_range, max_range, num_definitions, null)

In [None]:
for group in demographic_covariates:
    report_out_of_range(df_occ, definitions, min_range, max_range, num_definitions, null, group)

In [None]:
for group in clinical_covariates:
    report_out_of_range(df_occ, definitions, min_range, max_range, num_definitions, null, group)
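
As an illustration of the range check configured by `min_range` and `max_range`, the sketch below flags values that fall outside those bounds. Treating the bounds themselves as in range is an assumption, as is the toy series; `report_out_of_range` may apply different rules.

```python
import pandas as pd

min_range, max_range = 4, 200

# Toy values, including one missing measurement
values = pd.Series([22.4, 3.1, 250.0, 31.0, None], name="derived_bmi")

recorded = values.dropna()
# Values strictly below the minimum or strictly above the maximum are flagged
out_of_range = recorded[(recorded < min_range) | (recorded > max_range)]
share = len(out_of_range) / len(recorded)
print(out_of_range.tolist(), share)
```

Missing values are removed before the share is calculated, so the denominator here is the number of recorded measurements rather than the number of rows.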

#### Distributions

In [None]:
report_distribution(df_occ, definitions, num_definitions)

In [None]:
for group in demographic_covariates:
    report_distribution(df_occ, definitions, num_definitions, group)

In [None]:
for group in clinical_covariates:
    report_distribution(df_occ, definitions, num_definitions, group)

#### Measures Over Time

In [None]:
measure_over_time(df_occ, definitions)

In [None]:
for group in demographic_covariates:
    measure_over_time(df_occ, definitions, group)

In [None]:
for group in clinical_covariates:
    measure_over_time(df_occ, definitions, group)

In [None]:
# Comparison only runs if more than 1 definition provided
if num_definitions > 1: 
    display(Markdown("#### Comparison Across Definitions"))
    
    compare_value(df_occ, definitions)
    
    for group in demographic_covariates:
        compare_value(df_occ, definitions, group)
    
    for group in clinical_covariates:
        compare_value(df_occ, definitions, group)
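
With a single entry in `definitions`, the comparison cell does not run. To show the kind of comparison that becomes possible with two definitions, here is a hedged sketch; `recorded_bmi` is a hypothetical second definition invented for this example, and the wide layout is assumed rather than taken from `compare_value`.

```python
import pandas as pd

# Toy wide-format data: one row per patient, one column per definition
wide = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "derived_bmi": [22.0, 30.5, 27.0],
    "recorded_bmi": [22.4, 30.5, None],  # hypothetical second definition
})

# Compare only patients who have a value under both definitions
both = wide.dropna(subset=["derived_bmi", "recorded_bmi"])
diff = both["derived_bmi"] - both["recorded_bmi"]

summary = {
    "n_both": len(both),
    "n_equal": int((diff == 0).sum()),
    "mean_diff": diff.mean(),
}
print(summary)
```

Restricting to patients present under both definitions keeps the comparison like-for-like, at the cost of ignoring patients captured by only one definition, who may be worth counting separately.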

## Discussion

<font color='red'>To fill.</font>

The purpose of this living report is to bring a systematic approach to creating, documenting, cross-checking, and sharing variables to improve analyses in OpenSAFELY-TPP. If you have improvements or edits to suggest, please contact <font color='red'>owner</font>.