# Family Medicine
This notebook selects diabetes and pre-diabetes patients from the family medicine data source. The inclusion criteria are based on the corresponding ICPC codes. The notebook has the following sections:
1. Read the files
2. Filter pre-diabetes and diabetes
3. Select the earliest records
4. Write to disk

## Read the files

In [None]:
import pandas as pd 
import os
import numpy as np
import util.cleaning_tools as tools
%load_ext autoreload
%autoreload 2

In [None]:
# read the family medicine problem records with the project's fileReader helper
filepath = r'../DATAFILE'
datafile = 'fm_cn_problem'
fm_cn_record = tools.fileReader(filepath,datafile)
fm_cn_record

In [None]:
# read the ICPC mapping file
fm_mapper = tools.fileReader(filepath, "map_icpc")

## Filter pre-diabetes and diabetes
Pre-diabetes and diabetes records are filtered by their ICPC codes: `T901` for pre-diabetes and `T90` for diabetes.

In [None]:
# select the pre-diabetes and diabetes patient records
fm_pre = fm_cn_record.loc[fm_cn_record.icpc == 'T901'] # pre-diab
fm_diab = fm_cn_record.loc[fm_cn_record.icpc == 'T90'] # diab

# label the groups
fm_pre = fm_pre.assign(diab_type="pre")
fm_diab = fm_diab.assign(diab_type="diab")
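
The filter-and-label step above can also be sketched as a single pass with a code-to-label mapping. This is a minimal illustration on made-up rows, not the notebook's actual data: `toy_records`, `icpc_to_label`, and `toy_selected` are hypothetical names, and only the codes `T901` and `T90` and the column names come from the cell above.

```python
import pandas as pd

# Hypothetical toy records; column names follow the notebook, values are made up.
toy_records = pd.DataFrame({
    "pseudo_patient_key": ["p1", "p1", "p2", "p3"],
    "icpc": ["T901", "T90", "T90", "K86"],
})

# Map each ICPC code of interest to its group label (codes taken from the cell above).
icpc_to_label = {"T901": "pre", "T90": "diab"}

# Keep only rows whose code appears in the mapping, then attach the label.
toy_selected = toy_records.loc[toy_records["icpc"].isin(list(icpc_to_label))].copy()
toy_selected["diab_type"] = toy_selected["icpc"].map(icpc_to_label)
```

With a mapping like this, adding another code only requires a new dictionary entry rather than a new filtered frame.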

## Select the earliest records

In [None]:
combine_records = pd.concat([fm_pre, fm_diab])
# reset the index: the rank computed below is assigned back by index label, so the labels must be unique
combine_records.reset_index(inplace=True, drop=True)
# emulate a row_number window function: rank the records by creation time within each patient and diab_type
combine_records["rnk"] = combine_records.sort_values("src_create_dtm")\
                       .groupby(by=["pseudo_patient_key", "diab_type"])\
                       .cumcount() + 1

In [None]:
# keep only the earliest record (rank 1) per patient and diab_type
combine_records = combine_records[combine_records.rnk == 1].sort_values(["diab_type"])
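
As a cross-check on the ranking approach, the same "earliest record per patient and group" selection can be sketched with `sort_values` followed by `drop_duplicates`. The rows below are made up and `toy` and `earliest` are hypothetical names. The sketch also assumes `src_create_dtm` holds real datetimes; if the column is stored as strings, the sort is lexicographic, so converting with `pd.to_datetime` first may be needed.

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook, values are made up.
toy = pd.DataFrame({
    "pseudo_patient_key": ["p1", "p1", "p1", "p2"],
    "diab_type": ["diab", "diab", "pre", "diab"],
    "src_create_dtm": pd.to_datetime(
        ["2015-03-01", "2014-01-15", "2013-06-30", "2016-09-09"]
    ),
})

# Sort by creation time, then keep the first row seen for each
# (patient, diab_type) pair, which is the earliest one.
earliest = (
    toy.sort_values("src_create_dtm")
       .drop_duplicates(subset=["pseudo_patient_key", "diab_type"], keep="first")
       .sort_values(["diab_type", "pseudo_patient_key"])
       .reset_index(drop=True)
)
```

This variant avoids the helper `rnk` column, while the `cumcount` version in the notebook keeps the rank available if later records are ever needed.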

## Write to disk

In [None]:
# rename columns to the output names, tag the source, and write the selected columns to a CSV file
combine_records.rename({"src_create_dtm":"dx_dtm", "diff_in_hour_creation_dtm": "diff_hour"}, axis='columns', inplace=True)
combine_records["src"] = "fm"
combine_records[["pseudo_patient_key", "dx_dtm", "diff_hour", "diab_type", "src"]].to_csv(r"../tables/output/first_diag_fm.csv")
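
One thing to be aware of when writing the table: `to_csv` includes the DataFrame index as an extra unnamed column by default. The sketch below uses made-up rows and an in-memory buffer instead of the real output path to show how `index=False` leaves it out. `toy_out`, `buffer`, and `header` are hypothetical names, and whether the index should be dropped in the real output is an assumption to check against downstream readers.

```python
import io
import pandas as pd

# Hypothetical toy output; column names follow the notebook, values are made up.
toy_out = pd.DataFrame(
    {
        "pseudo_patient_key": ["p1", "p2"],
        "dx_dtm": pd.to_datetime(["2014-01-15", "2016-09-09"]),
        "diff_hour": [0, 12],
        "diab_type": ["diab", "diab"],
        "src": ["fm", "fm"],
    },
    index=[7, 3],  # leftover, non-contiguous index, as after filtering
)

# Write to an in-memory buffer; index=False omits the index column.
buffer = io.StringIO()
toy_out.to_csv(buffer, index=False)
header = buffer.getvalue().splitlines()[0]

# Read the CSV text back to confirm only the five named columns were written.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), parse_dates=["dx_dtm"])
```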