# Drug prescription
This notebook selects diabetes patients based on their drug prescription records. The inclusion criteria are based on the provided input file drug.csv. The notebook contains the following sections:
1. Read the files
2. Filtering
3. Find the earliest date for each patient
4. Write to disk

## Read the files

In [None]:
import util.cleaning_tools as tools
from typing import List
import pandas as pd
import numpy as np
import os
import re
%load_ext autoreload
%autoreload 2

In [None]:
drug_records = tools.fileReader(r'../DATAFILE', r'phs_presc_data')

In [None]:
drug = pd.read_csv(r'../tables/input/drug.csv', header=0, index_col=0)

In [None]:
drug_desc = tools.fileReader(r'../DATAFILE', r'phs_drugs')

## Filtering
I first select the diabetes-relevant drugs, then exclude some of them based on advice from doctors.

In [None]:
def getDiabDrug(drugnames: List[str]) -> pd.DataFrame:
    '''
    Select the description records of the drugs of interest.
    Args:
        drugnames: list of drug names, matched against drug_desc.drugname
    Return:
        the targeted drugs and their descriptions
    '''
    # na=False counts a missing drugname as a non-match, so the boolean mask never contains NaN
    temp = [drug_desc[drug_desc.drugname.str.contains(name, na=False)] for name in drugnames]
    return pd.concat(temp)

diab_drug = getDiabDrug(["INSULIN", "VILDAGLIPTIN", "GLUCAGON"])
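
The name-based selection above can be tried on a toy table first. The sketch below is only an illustration: the table contents, the item codes, and the helper name select_by_name are made up and are not taken from phs_drugs. It shows that a substring match on "INSULIN" also picks up non-medicine items such as syringes, which is why a manual exclusion step is useful afterwards.

```python
from typing import List

import pandas as pd

# Hypothetical stand-in for drug_desc; the real table contents are not shown in this notebook.
toy_desc = pd.DataFrame(
    {
        "itemcode": ["A1", "B2", "C3", "D4", "E5"],
        "drugname": [
            "INSULIN GLARGINE",
            "PARACETAMOL",
            None,  # a missing name, to show the effect of na=False
            "INSULIN SYRINGE 1ML",
            "GLUCAGON INJ",
        ],
    }
)


def select_by_name(desc: pd.DataFrame, names: List[str]) -> pd.DataFrame:
    """Keep the rows whose drugname contains any of the given substrings."""
    mask = pd.Series(False, index=desc.index)
    for name in names:
        # regex=False matches the name literally; na=False treats a missing name as no match
        mask |= desc["drugname"].str.contains(name, regex=False, na=False)
    return desc[mask]


toy_selected = select_by_name(toy_desc, ["INSULIN", "GLUCAGON"])
print(toy_selected)
```

Combining the matches into one boolean mask, rather than concatenating one frame per name, keeps each row at most once even if a name matches several patterns.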

In [None]:
# diab_drug.to_csv(r'../tables/output/diab_drug.csv')

As suggested by Dr. Chu, we exclude the items that are not DM medicines.

In [None]:
diab_drug_filtered = diab_drug.drop(index=[234,1579,3993,4148,8352,8691,9423,9760,9786,10713])
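
Dropping rows by hard-coded index labels depends on the index of the description table staying the same between runs. As a minimal sketch, assuming a toy table with made-up labels and names, the labels can be checked before dropping. With its default settings, drop raises a KeyError if a label is missing, so the explicit check mainly gives a clearer message.

```python
import pandas as pd

# Hypothetical table; the labels and names are invented for illustration.
toy_drug = pd.DataFrame(
    {"drugname": ["INSULIN SYRINGE 1ML", "INSULIN PEN NEEDLE", "INSULIN GLARGINE"]},
    index=[10, 20, 30],
)
labels_to_drop = [10, 20]

# Collect any label that is no longer present in the index before dropping.
missing = [label for label in labels_to_drop if label not in toy_drug.index]
if missing:
    raise KeyError(f"labels not found in the drug table: {missing}")

toy_filtered = toy_drug.drop(index=labels_to_drop)
print(toy_filtered)
```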

Join the filtered drug list with the prescription records.

In [None]:
diab_record = pd.merge(left=drug_records, right=diab_drug_filtered, how='inner', left_on='item_cd', right_on='itemcode')
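
The inner join keeps only the prescription rows whose item code appears in the filtered drug list. The sketch below uses two small made-up tables with the same key column names. The validate argument is an optional addition, not part of the cell above: it makes pandas raise an error if an item code appears more than once on the drug side, which would otherwise duplicate prescription rows.

```python
import pandas as pd

# Hypothetical prescription records and drug list, invented for illustration.
toy_records = pd.DataFrame(
    {
        "pseudo_patient_key": [1, 1, 2, 3],
        "item_cd": ["A1", "B2", "A1", "Z9"],
    }
)
toy_drugs = pd.DataFrame(
    {
        "itemcode": ["A1", "E5"],
        "drugname": ["INSULIN GLARGINE", "GLUCAGON INJ"],
    }
)

toy_joined = pd.merge(
    left=toy_records,
    right=toy_drugs,
    how="inner",
    left_on="item_cd",
    right_on="itemcode",
    validate="many_to_one",  # raises MergeError if itemcode is not unique on the right
)
print(toy_joined)
```

Only the two records with item code A1 survive the join; the record with the unknown code Z9 and the non-diabetes item B2 are dropped.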

## Find the earliest date for each patient
Find the earliest date of taking a diabetes drug for each patient. We are only interested in each patient's earliest prescription record.

In [None]:
# number each patient's records in disp_dtm order
diab_record_rnk = tools.row_number(diab_record, "pseudo_patient_key", sort_key="disp_dtm")
diab_record["rnk"] = diab_record_rnk
# keep the first record per patient; .copy() avoids writing into a slice of diab_record later
first_diag = diab_record[diab_record.rnk == 1][["pseudo_patient_key", "presc_start_dtm", "diff_in_hour_dispense_dtm"]].copy()
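
The implementation of tools.row_number is not shown in this notebook. The function below is a hypothetical stand-in, written only to illustrate the idea of numbering each patient's records by date in plain pandas, and it may differ from the real helper. The toy dates are made up.

```python
import pandas as pd

# Hypothetical dispensing records, invented for illustration.
toy = pd.DataFrame(
    {
        "pseudo_patient_key": [1, 1, 2, 2, 2],
        "disp_dtm": pd.to_datetime(
            ["2015-03-01", "2014-07-15", "2016-01-10", "2016-01-05", "2017-02-01"]
        ),
    }
)


def row_number_sketch(df: pd.DataFrame, partition_key: str, sort_key: str) -> pd.Series:
    """Number the rows of each partition from 1, in ascending sort_key order."""
    ordered = df.sort_values(sort_key, kind="stable")
    # cumcount numbers rows within each group from 0, following the sorted order
    rnk = ordered.groupby(partition_key).cumcount() + 1
    # realign to the original row order so the result can be assigned as a column
    return rnk.reindex(df.index)


toy["rnk"] = row_number_sketch(toy, "pseudo_patient_key", "disp_dtm")
toy_first = toy.loc[toy["rnk"] == 1].copy()
print(toy_first)
```

With a stable sort, ties on the date keep their original relative order, so exactly one row per patient gets the number 1.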

## Write to disk

In [None]:
first_diag["diab_type"] = "diab" # this data source only provides evidence of diabetes
first_diag["src"] = "drug"
# rename
first_diag.rename(columns={'presc_start_dtm':"dx_dtm", "diff_in_hour_dispense_dtm": "diff_hour"}, inplace=True)

In [None]:
# write to csv file
first_diag.to_csv(r'../tables/output/first_diag_drug.csv')