# Cohort Diagram Numbers

**Goal:** Get the values for the updated cohort diagram, i.e. the cohort described in [`tables/table-2_without-neither.ipynb`](tables/table-2_without-neither.ipynb), which excludes the `NEITHER` annotation group.

In [1]:
import pandas as pd
from tableone import TableOne

from cleaning.caregivers.main import load_data
from cleaning.utils import get_project_root
from notebooks.tables.ref import columns, nonnormal

df = load_data()

df_updated = df[df["ANNOTATION"] != "NEITHER"]
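As a minimal illustration of what this row-level filter does to a patient count, the sketch below applies the same mask to a small invented frame. The values and the labels `GROUP_A` and `GROUP_B` are placeholders, not the study's data or annotation scheme; only the column names come from this notebook.

```python
import pandas as pd

# Toy stand-in for the cleaned cohort frame (invented values).
toy = pd.DataFrame(
    {
        "SUBJECT_ID": [1, 1, 2, 3],
        "HADM_ID": [10, 11, 20, 30],
        # GROUP_A / GROUP_B are placeholder labels, assumed for illustration.
        "ANNOTATION": ["GROUP_A", "NEITHER", "NEITHER", "GROUP_B"],
    }
)

# Same filter as above: drop rows annotated NEITHER.
toy_updated = toy[toy["ANNOTATION"] != "NEITHER"]

n_before = toy["SUBJECT_ID"].nunique()         # patients 1, 2, 3
n_after = toy_updated["SUBJECT_ID"].nunique()  # patients 1, 3
```

In this toy frame, patient 1 stays in the filtered cohort because one of their rows is not `NEITHER`; the filter removes rows, and a patient drops out only when all of their rows are removed.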

We also load the original raw notes file, which is needed later to count the notes associated with the cohort's hospital admissions:

In [2]:
PROJECT_ROOT = get_project_root()
PATH_ORIGINAL = PROJECT_ROOT / "data/raw/caregivers_set13Jul2020.csv"

df_original = pd.read_csv(PATH_ORIGINAL)

## Number of patients

Before excluding the `NEITHER` group:

In [3]:
df["SUBJECT_ID"].nunique()

1265

After excluding the `NEITHER` group:

In [4]:
df_updated["SUBJECT_ID"].nunique()

801

Difference:

In [5]:
(df["SUBJECT_ID"].nunique() - 
 df_updated["SUBJECT_ID"].nunique())

464

## Number of hospital admissions

Before excluding the `NEITHER` group:

In [6]:
df["HADM_ID"].nunique()

1389

After excluding the `NEITHER` group:

In [7]:
df_updated["HADM_ID"].nunique()

858

Difference:

In [8]:
(df["HADM_ID"].nunique() - 
 df_updated["HADM_ID"].nunique())

531

## Number of ICU admissions

Before excluding the `NEITHER` group:

In [9]:
df["N_ICUSTAYS"].sum()

1596

After excluding the `NEITHER` group:

In [10]:
df_updated["N_ICUSTAYS"].sum()

994

Difference:

In [11]:
df["N_ICUSTAYS"].sum() - df_updated["N_ICUSTAYS"].sum()

602

## Number of notes

In [12]:
def has_hadm(df_):  # boolean mask: df_original rows whose HADM_ID appears in df_
    return df_original["HADM_ID"].isin(df_["HADM_ID"])
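To make the counting rule explicit, here is a small sketch of the same `isin` mask on invented frames (the `notes` and `cohort` names and all values are hypothetical). It shows that the mask selects note rows by admission, and that `nunique()` on `TEXT` counts distinct note texts, so identical texts are counted once.

```python
import pandas as pd

# Invented note-level frame: one row per note, with a duplicated text.
notes = pd.DataFrame(
    {
        "HADM_ID": [10, 10, 20, 30, 30],
        "TEXT": ["note a", "note b", "note c", "note d", "note d"],
    }
)
# Invented cohort frame containing two of the three admissions.
cohort = pd.DataFrame({"HADM_ID": [10, 30]})

# Rows of `notes` whose admission appears in the cohort.
mask = notes["HADM_ID"].isin(cohort["HADM_ID"])

n_rows = int(mask.sum())                    # matching note rows
n_distinct = notes[mask]["TEXT"].nunique()  # distinct note texts among them
```

Here four rows match but only three distinct texts remain, so the figures in this section should be read as counts of unique note texts rather than note rows.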

Before excluding the `NEITHER` group:

In [13]:
df_original[has_hadm(df)]["TEXT"].nunique()

30192

After excluding the `NEITHER` group:

In [14]:
df_original[has_hadm(df_updated)]["TEXT"].nunique()

19269

Difference:

In [15]:
(df_original[has_hadm(df)]["TEXT"].nunique() - 
 df_original[has_hadm(df_updated)]["TEXT"].nunique())

10923
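The four before/after/difference triples could also be collected into a single table. The sketch below is one possible way to do that, using a hypothetical helper `cohort_counts` and invented toy frames; it mirrors the expressions used in this notebook but is not part of the original analysis.

```python
import pandas as pd


def cohort_counts(cohort, notes):
    """Return the four cohort-diagram counts for one cohort frame."""
    in_cohort = notes["HADM_ID"].isin(cohort["HADM_ID"])
    return pd.Series(
        {
            "patients": cohort["SUBJECT_ID"].nunique(),
            "hospital admissions": cohort["HADM_ID"].nunique(),
            "ICU admissions": cohort["N_ICUSTAYS"].sum(),
            "notes": notes.loc[in_cohort, "TEXT"].nunique(),
        }
    )


# Toy data (invented values; GROUP_A / GROUP_B are placeholder labels).
toy = pd.DataFrame(
    {
        "SUBJECT_ID": [1, 2, 3],
        "HADM_ID": [10, 20, 30],
        "N_ICUSTAYS": [1, 2, 1],
        "ANNOTATION": ["GROUP_A", "NEITHER", "GROUP_B"],
    }
)
toy_notes = pd.DataFrame(
    {"HADM_ID": [10, 10, 20, 30], "TEXT": ["a", "b", "c", "d"]}
)

summary = pd.DataFrame(
    {
        "before": cohort_counts(toy, toy_notes),
        "after": cohort_counts(toy[toy["ANNOTATION"] != "NEITHER"], toy_notes),
    }
)
summary["difference"] = summary["before"] - summary["after"]
```

With the real data, the same call would take `df`, `df_updated`, and `df_original` in place of the toy frames.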