# Cohort

We consider all adult patients who underwent a kidney transplant, with or without nephrectomy, as our cohort.
If a patient has multiple transplant records within 7 days, we summarize them as one transplant on the first day (and assume that the other entries are "dirty data").
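
The 7-day rule can be illustrated with a minimal pandas sketch. The column names `patient_id` and `date`, the helper `collapse_transplants`, and the toy records are hypothetical and not part of the cohort tooling used in this notebook. The sketch also assumes the 7 days are counted from the first record of an episode, which the text above does not specify.

```python
import pandas as pd


def collapse_transplants(records: pd.DataFrame, window_days: int = 7) -> pd.DataFrame:
    """Keep one row per transplant episode, dated at the episode's first day."""
    kept = []
    ordered = records.sort_values(["patient_id", "date"])
    for patient_id, dates in ordered.groupby("patient_id")["date"]:
        episode_start = None
        for date in dates:
            # A record opens a new episode only if it lies more than
            # window_days after the first day of the current episode.
            if episode_start is None or (date - episode_start).days > window_days:
                episode_start = date
                kept.append({"patient_id": patient_id, "date": date})
    return pd.DataFrame(kept)


# Toy records: patient 1 has two entries three days apart plus a later transplant.
records = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "date": pd.to_datetime(["2015-03-01", "2015-03-04", "2016-01-10", "2014-07-20"]),
})
collapsed = collapse_transplants(records)
print(collapsed)
```

In this toy example the entry three days after the first one is dropped, while the transplant months later is kept as a separate event.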

~We conclude with a cohort of 2089 patients, of the transplants 92 % are without nephrectomy.~
~99 % of the patients only have one transplant, 35 patients have two, and two patients have three transplants.~

The mean age of patients is 52 years (standard deviation 13.5 years; excluding 6 patients whose age is anonymized due to re-identification risk). 37 % of the patients are female and 63 % are male. The ethnicity distribution is representative of New York City.

Down the line, we decided to only consider the first transplant for every patient, ending up with 2089 patients and 2089 transplants.

In [None]:
%run 00_default_options.ipynb

In [None]:
import math

import pandas as pd

from fiber.condition import Diagnosis, Procedure
from fiber import Cohort
from fiberutils.cohort_utils import (
    cohort_overlap, 
    deduplicate_cohort, 
    days_between_cohort_condition_occurrences,
)
from fiberutils.condition_utils import (
    compare_condition_incidence_in_cohort,
    condition_occurrence_distribution,
    condition_occurrence_quantiles_for_days
)

In [None]:
transplant_wo_nephrectomy_cond = Procedure(code='50360', context='CPT-4').age(min_age=18)
transplant_with_nephrectomy_cond = Procedure(code='50365', context='CPT-4').age(min_age=18)

transplant_condition = transplant_wo_nephrectomy_cond | transplant_with_nephrectomy_cond
transplant_cohort = Cohort(condition=transplant_condition)

len(transplant_cohort)

In [None]:
transplant_cohort_overlap = cohort_overlap({
    'Patients without nephrectomy (CPT-4: 50360)': Cohort(transplant_wo_nephrectomy_cond),
    'Patients with nephrectomy (CPT-4: 50365)': Cohort(transplant_with_nephrectomy_cond)
})

transplant_cohort_overlap['figure'].show()

In [None]:
cohort, _ = deduplicate_cohort(transplant_cohort, math.inf).values()

In [None]:
transplant_split = pd.merge(
    cohort.has_onset(
        time_windows=[[-7, 7]],
        condition=transplant_wo_nephrectomy_cond,
        name='without_nephrectomy',
    ),
    cohort.has_onset(
        time_windows=[[-7, 7]],
        condition=transplant_with_nephrectomy_cond,
        name='with_nephrectomy',
    ),
)

{
    'occurrences without nephrectomy': transplant_split.without_nephrectomy_onset_from_7_days_before_to_7_days_after.sum() / len(transplant_split),
    'occurrences with nephrectomy': transplant_split.with_nephrectomy_onset_from_7_days_before_to_7_days_after.sum() / len(transplant_split)
}

In [None]:
import matplotlib.pyplot as plt

In [None]:
# Assumption: the 'figure' entry is a matplotlib Figure with a single Axes.
fig = cohort.demographics['age']['figure']
fig.set_size_inches(6, 6)
ax = fig.axes[0]
ax.set_title('Patient age at time of transplant')
ax.set_xlabel('age in years', fontsize=12)
fig.savefig('/home/martet02/age-distribution.pdf')

In [None]:
fig.show()

In [None]:
a = cohort.demographics

In [None]:
cohort.condition_statistics['figure'].show()

# Endpoints

Cytomegaloviral disease and urinary tract infections are reported as some of the most frequent infectious diseases after kidney transplant surgeries.
In our dataset, too, their incidence rates are among the highest, and their occurrence increases manifold compared to the period before the transplant.

Cytomegaloviral disease and urinary tract infection are coded with ICD-9 and ICD-10 in MSDW.
To capture all codings correctly, we combine the diagnosis conditions with the OR operator,
i.e. the patient had at least one of the conditions.
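
The OR combination can be pictured with plain pandas boolean masks. This is only a sketch on a hypothetical `diagnoses` table with made-up rows. It does not show how the condition classes used in this notebook evaluate the operator internally. The two urinary tract infection codes are the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical diagnosis records; each row is one coded diagnosis.
diagnoses = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "context": ["ICD-9", "ICD-10", "ICD-9", "ICD-10", "ICD-9"],
    "code": ["599.0", "N39.0", "599.0", "E11.9", "401.9"],
})

# One mask per coding system.
is_icd9_uti = (diagnoses["context"] == "ICD-9") & (diagnoses["code"] == "599.0")
is_icd10_uti = (diagnoses["context"] == "ICD-10") & (diagnoses["code"] == "N39.0")

# OR: a patient qualifies with at least one matching record in either system.
matching = diagnoses.loc[is_icd9_uti | is_icd10_uti, "patient_id"]
uti_patients = sorted(set(matching.tolist()))
print(uti_patients)
```

Patient 2 has the diagnosis in both coding systems but is counted once, because the result is a set of patients rather than a list of records.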

In [None]:
infection_cond = (
    Diagnosis(code=[
        'B33.%',   # other
        'B34.%',   # unspecified
        'B27.90',  # Infectious mononucleosis, unspecified without complication (EBV)
        'B97.89',  # Other viral agents as the cause of diseases classified elsewhere, BK virus
        'B15.%',   # HAV
        'B16.%',   # HBV
        'B17.%',   # other acute viral hepatitis (including HCV)
        'B18.%',   # chronic hepatitis
        'B19.%',   # unspecified viral hepatitis
        'B20',     # HIV
        'B02.%',   # Varicella Zoster 
        ], 
        context='ICD-10'
    ) 
    # ICD-9 001-139: infectious and parasitic diseases
    | Diagnosis(code=['0%', '10%', '11%', '12%', '13%'], context='ICD-9')
)

compare_condition_incidence_in_cohort(
    condition=infection_cond, 
    cohort=cohort,
    lower_limit=-365, 
    upper_limit=365,
    should_calculate_increase=True,
)[0].head(100)

## Cytomegaloviral disease

Of the patients in the cohort, about 6.4 % (134) are diagnosed with cytomegaloviral disease within the first year after the transplant, which is a 13-fold increase over the year before the surgery.
Out of all cases within the first year after the transplant, 61 % (81) occur in the second half of this period.
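
The fold increase quoted above is the ratio of the incidence in the year after the transplant to the incidence in the year before. A minimal sketch of that arithmetic follows. The `events` table, the `incidence` helper, and the cohort size of 20 are hypothetical toy values, and the utility functions used in this notebook may compute the ratio differently.

```python
import pandas as pd


def incidence(events: pd.DataFrame, cohort_size: int, start_day: int, end_day: int) -> float:
    """Share of the cohort with at least one event in [start_day, end_day]."""
    in_window = events["days_from_transplant"].between(start_day, end_day)
    return events.loc[in_window, "patient_id"].nunique() / cohort_size


# Toy diagnosis events, with days counted relative to the transplant (day 0).
events = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4, 5],
    "days_from_transplant": [-120, 15, 200, 40, 300, 90],
})
cohort_size = 20

before = incidence(events, cohort_size, -365, -1)  # one patient in the year before
after = incidence(events, cohort_size, 0, 365)     # four distinct patients in the year after
fold_increase = after / before
print(before, after, fold_increase)
```

Counting distinct patients rather than records keeps a patient with repeated diagnoses from inflating the incidence.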

In [None]:
cmv_cond = Diagnosis(code='078.5', context='ICD-9') | Diagnosis(code=['B25%'], context='ICD-10')

In [None]:
cmv_longitudinality = condition_occurrence_distribution(
    cohort=cohort, 
    condition=cmv_cond,
    time_windows=[[-365, -1], [0, 365]]
)

In [None]:
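# Note: viral_cond is defined in the "Other viral infections" section below;
# run that cell first when executing the notebook top to bottom.
ovi_longitudinality = condition_occurrence_distribution(
    cohort=cohort, 
    condition=viral_cond,
    time_windows=[[-365, -1], [0, 365]]
)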

In [None]:
# Convert incidence rates back to patient counts using the cohort size
[ovi_longitudinality['incidence_rates']['results']['-365 to -1 days'] * len(cohort),
 ovi_longitudinality['incidence_rates']['results']['0 to 365 days'] * len(cohort)]

In [None]:
incidence_increase = compare_condition_incidence_in_cohort(
    condition=cmv_cond, 
    cohort=cohort,
    lower_limit=-365, 
    upper_limit=365,
    should_calculate_increase=True, 
    is_aggregated_condition=True
)[0]

In [None]:
incidence_increase

In [None]:
condition_occurrence_quantiles_for_days(
    cohort, 
    cmv_cond, 
    0, 
    365, 
    [7, 30, 90, 183, 365]
)['quantiles']

## Urinary tract infection

Of the patients in the cohort, about 34 % (715) are diagnosed with urinary tract infection within the first year after the transplant, which is a 9-fold increase over the year before the surgery.
Out of all cases within the first year after the transplant, 71 % (511) occur in the first half of this period, and 50 % (355) in the first quarter.
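
Shares of cases by time since transplant can be read as cumulative fractions at fixed day cut-offs. The sketch below uses the cut-offs 7, 30, 90, 180 and 365 days on a hypothetical series of onset days. It assumes one first-onset day per affected patient, which may differ from how the quantile utility in this notebook counts occurrences.

```python
import pandas as pd

# Hypothetical day of first diagnosis after the transplant, one value per patient.
first_onset_days = pd.Series([2, 5, 20, 45, 80, 100, 150, 200, 300, 360])

cutoffs = [7, 30, 90, 180, 365]

# Cumulative share of all first-year cases diagnosed by each cut-off.
cumulative_share = {
    cutoff: float((first_onset_days <= cutoff).mean()) for cutoff in cutoffs
}
print(cumulative_share)
```

Because the shares are cumulative, the value at 180 days includes the cases already counted at 90 days.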

In [None]:
uti_cond = Diagnosis(code='599.0', context='ICD-9') | Diagnosis(code=['N39.0'], context='ICD-10')

In [None]:
uti_longitudinality = condition_occurrence_distribution(
    cohort, 
    condition=uti_cond,
    time_windows=[[0, 7], [0, 30], [0, 180], [0, 365], [0, math.inf]]
)

In [None]:
uti_longitudinality['incidence_rates']['results']

In [None]:
incidence_increase = compare_condition_incidence_in_cohort(
    condition=uti_cond, 
    cohort=cohort,
    lower_limit=-365, 
    upper_limit=365,
    should_calculate_increase=True, 
    is_aggregated_condition=True
)[0]

In [None]:
incidence_increase

In [None]:
condition_occurrence_quantiles_for_days(
    cohort, 
    uti_cond, 
    0, 
    365, 
    [7, 30, 90, 180, 365]
)['quantiles']

## Other viral infections

Based on https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6312768/

Covered viruses: EBV, BKV, HAV, HBV, HCV, HIV, varicella zoster virus, plus other and unspecified viral infections.

In [None]:
viral_cond = Diagnosis(
    code=[
        'B33.%',   # other
        'B34.%',   # unspecified
        'B27.90',  # Infectious mononucleosis, unspecified without complication (EBV)
        'B97.89',  # Other viral agents as the cause of diseases classified elsewhere, BK virus
        'B15.%',   # HAV
        'B16.%',   # HBV
        'B17.%',   # other acute viral hepatitis (including HCV)
        'B18.%',   # chronic hepatitis
        'B19.%',   # unspecified viral hepatitis
        'B20',     # HIV
        'B02.%',   # Varicella Zoster 
    ], 
    context='ICD-10'
) | Diagnosis(
    code=[
        '070.%',
        '071.%',
        '072.%',
        '073.%',
        '074.%',
        '075.%',
        '076.%',
        '077.%',
        '078.%',
        '079.%',
        '042',     # HIV
        '053.%',   # Varicella Zoster
    ],
    context='ICD-9'
)

In [None]:
viral_longitudinality = condition_occurrence_distribution(
    cohort=cohort, 
    condition=viral_cond,
    time_windows=[[0, 7], [0, 30], [0, 180], [0, 365], [0, math.inf]]
)

In [None]:
incidence_increase = compare_condition_incidence_in_cohort(
    condition=viral_cond, 
    cohort=cohort,
    lower_limit=-365, 
    upper_limit=365,
    should_calculate_increase=True, 
    is_aggregated_condition=True
)[0]

In [None]:
incidence_increase

In [None]:
condition_occurrence_quantiles_for_days(
    cohort, 
    viral_cond, 
    0, 
    365, 
    [7, 30, 90, 180, 365]
)['quantiles']

# When were the transplants done?

In [None]:
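df = cohort.merge_patient_data()
# Approximate transplant year: birth year plus age in years
# (assuming age_in_days is the patient's age at the transplant)
transplant_year = (df.age_in_days / 365) + df.date_of_birth.astype(str).str[:4].astype('float64')

transplant_year.min(), transplant_year.max()

In [None]:
transplant_year.hist(bins=30)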