### Effects of Variables on INR
* In healthy people an INR of 1.1 or below is considered normal. An INR range of 2.0 to 3.0 is generally an effective therapeutic range for people taking warfarin for disorders such as atrial fibrillation or a blood clot in the leg or lung. In certain situations, such as having a mechanical heart valve, you might need a slightly higher INR.
* If INR is too high, the blood is clotting too slowly. https://www.mayoclinic.org/tests-procedures/prothrombin-time/about/pac-20384661
* Effects of treatments
    * NSAID: Increase (when administered with warfarin) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2826747/
    * Transfusion
        * Plasma: Unclear https://www.ncbi.nlm.nih.gov/pubmed/16934060
        * Platelets: (Should) Decrease (since platelets help form blood clots)
    * Anticoagulant: Increase
        * Delay effect: "Because warfarin has a long half-life, increases in the INR may not be noted for 24 to 36 hours after administration of the first dose, and maximum anticoagulant effect may not be achieved for 72 to 96 hours." https://www.aafp.org/afp/1999/0201/p635.html
    * Aspirin: Increase https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1492499/ https://bpac.org.nz/BT/2010/November/inr.aspx
* Chronic conditions
    * Liver disease: Unclear
        * "Both procoagulants and anticoagulants that occur naturally in the body are produced by the liver, affecting your INR" http://www.ptinr.com/en/home/warfarin-you/general-information/health-conditions/liver-liver-disease-and-warfarin.html
    * Sickle cell: Decrease
        * "Sickle cell disorders, such as Hb SS and Hb SC, are associated with a hypercoagulable state that may contribute to the vaso-occlusive episodes observed in the disorders." https://www.ncbi.nlm.nih.gov/pubmed/11835343
* Interaction
    * Warfarin is likely to interact with other types of treatment
        * "Dose has an inverse relation with age"
        * "Drug interactions need to be considered when warfarin therapy is initiated." https://www.aafp.org/afp/1999/0201/p635.html


In [1]:
import pickle
import numpy as np
import scipy.stats
from preprocess import preprocess
from EM import EM
from plot import plot

In [2]:
bin_size = 60 * 18
cutoff = 10
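
To relate the warfarin delay quoted in the notes above to the binning, here is a minimal sketch that converts those delays from hours into bins. It assumes `bin_size` is expressed in minutes (so `60 * 18` is an 18-hour bin); `hours_to_bins` is a hypothetical helper, not part of `preprocess`.

```python
import math

bin_size = 60 * 18  # assumed to be minutes per bin, i.e. 18 hours


def hours_to_bins(hours, bin_size_minutes=bin_size):
    """Smallest number of whole bins that covers `hours` (hypothetical helper)."""
    return math.ceil(hours * 60 / bin_size_minutes)


# Delays taken from the AAFP quote: first INR change after 24 to 36 hours,
# maximum anticoagulant effect after 72 to 96 hours.
delays_hours = {
    "first change, lower bound": 24,
    "first change, upper bound": 36,
    "max effect, lower bound": 72,
    "max effect, upper bound": 96,
}
delays_bins = {label: hours_to_bins(h) for label, h in delays_hours.items()}
for label, n_bins in delays_bins.items():
    print(f"{label}: {n_bins} bins")
```

If `J=3` in the Results settings counts lagged bins (an assumption), the effect window spans 3 × 18 = 54 hours, which covers the first INR change but not the 72 to 96 hour maximum effect.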

In [3]:
# Use a context manager so the file handle is closed after loading.
with open('../Data/unimputed_inr_patient_data.pkl', 'rb') as f:
    data = pickle.load(f, encoding='latin1')

In [4]:
y_pop, X_pop, c_pop = preprocess(data, cutoff, bin_size, missing_pct=20)

In [5]:
y_pop.shape

(107, 215)

### Results

* Cutoff 10, bin size 18, single effects (J=3), binary treatments, training_pct=.8, missing_pct=40, c=0
    * Pop: 0.8736229994959492; ind: 1.0661727806095787
* Cutoff 10, bin size 18, single effects (J=3), binary treatments, training_pct=.8, missing_pct=30, c=0
    * **Ind: 0.9905086506674388**
* Cutoff 5, bin size 18, multi effects (J=3), binary treatments, training_pct=.8, missing_pct=40
    * c=0: **Pop: 0.6068250959347828**
    * c!=0: Pop: 0.6673705701867897
* The best setting but with 1 iteration of EM
    * Ind: 0.7769050746610977

Coefficients learned with the best population-level results:
* Coefficient for nsaid: [0.31291354 0.11150224 0.02927106]
* Coefficient for transfusion_plasma: [0.21646024 0.12142273 0.07316288]
* Coefficient for transfusion_platelet: [-0.24024057 -0.41378184 -0.28599656]
* Coefficient for anticoagulant: [-0.04605939  0.15568198  0.06412978]
* Coefficient for aspirin: [0.25202486 0.1102651  0.03499285]

Coefficients (jointly) learned with the best population-level results:
* Coefficient for nsaid: [ 0.09144572  0.01808053 -0.00736706]
* Coefficient for transfusion_plasma: [-0.29370552 -0.15658492 -0.12820925]
* Coefficient for transfusion_platelet: [-0.32318352 -0.44511952 -0.43787362]
* Coefficient for anticoagulant: [-0.46557922  0.00142221 -0.02525039]
* Coefficient for aspirin: [-0.04017403 -0.04024049 -0.00733096]
* Coefficient for chronic kidney failure: 0.27480724324859296
* Coefficient for sickle cell: 0.238516267304465
* Coefficient for age: 0.01752724784756275
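
As a rough check of the jointly learned coefficients against the directions listed under "Effects of Variables on INR", the sketch below compares signs. Summing the three lag coefficients into a single net effect is an assumption made only for this comparison, and `compare_signs` is a hypothetical helper; plasma transfusion is marked unclear because the notes give no expected direction.

```python
import numpy as np

# Expected direction of the effect on INR from the notes: +1 increase, -1 decrease, 0 unclear.
expected_sign = {
    "nsaid": +1,
    "transfusion_plasma": 0,
    "transfusion_platelet": -1,
    "anticoagulant": +1,
    "aspirin": +1,
    "sickle cell": -1,
}

# Jointly learned coefficients copied from the results above.
learned = {
    "nsaid": [0.09144572, 0.01808053, -0.00736706],
    "transfusion_plasma": [-0.29370552, -0.15658492, -0.12820925],
    "transfusion_platelet": [-0.32318352, -0.44511952, -0.43787362],
    "anticoagulant": [-0.46557922, 0.00142221, -0.02525039],
    "aspirin": [-0.04017403, -0.04024049, -0.00733096],
    "sickle cell": [0.238516267304465],
}


def compare_signs(expected, coefs_by_name):
    """Return {name: (net_effect, verdict)}; net effect is the sum over lags (assumption)."""
    verdicts = {}
    for name, coefs in coefs_by_name.items():
        net = float(np.sum(coefs))
        if expected[name] == 0:
            verdict = "unclear"
        elif np.sign(net) == expected[name]:
            verdict = "agrees"
        else:
            verdict = "disagrees"
        verdicts[name] = (net, verdict)
    return verdicts


verdicts = compare_signs(expected_sign, learned)
for name, (net, verdict) in verdicts.items():
    print(f"{name}: net {net:+.3f} -> {verdict}")
```

Under this crude summary, NSAID and platelet transfusion match the expected direction, while anticoagulant, aspirin, and sickle cell do not.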

### Results Analysis
* Single vs. multi effects
    * When using a single effect, the prediction trajectory of the individual-level model appears flatter, perhaps contributing to a lower MSE
* In EM for the individual-level model, the lowest MSE appears at different iterations for different individuals, but most often after the first iteration. The total MSE increases with more iterations, starting from iteration 1 (?!)
    * This also sometimes happens in the simulation when run with only one sample. The plot suggests that more iterations sometimes lead to spikes in the prediction that don't correspond to the actual trajectory (the coefficients don't match up either), thus increasing MSE
    * This happens more drastically when we have more missingness (in the simulation)
    * This could simply be because the model is learning bad coefficients, so it just gets worse as iterations continue: with more missingness, we have fewer equations in the linear system, so the solution is worse (?)
* For the population level, the MSE after the first iteration is also pretty close to the best MSE. In the run that produced the result, MSE rises after iterations begin, then starts decreasing fairly soon
    * But at least in the simulation, MSE generally decreases with iterations or fluctuates around the lowest value
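
The "fewer equations, worse solution" conjecture above can be illustrated in isolation. The sketch below is not the EM model: it fits plain least squares on synthetic data and drops a fraction of the rows to mimic missing observations. All sizes, the noise level, and the function name `mean_coef_error` are assumptions chosen for illustration.

```python
import numpy as np


def mean_coef_error(keep_frac, n_bins=50, n_coef=5, noise_sd=0.5, n_trials=200, seed=0):
    """Average squared coefficient error of least squares when only
    `keep_frac` of the rows (equations) are observed."""
    rng = np.random.default_rng(seed)
    true_coef = rng.normal(size=n_coef)
    errors = []
    for _ in range(n_trials):
        X = rng.normal(size=(n_bins, n_coef))
        y = X @ true_coef + noise_sd * rng.normal(size=n_bins)
        # Keep a random subset of rows; never fewer rows than unknowns.
        n_keep = max(n_coef, int(round(keep_frac * n_bins)))
        observed = rng.choice(n_bins, size=n_keep, replace=False)
        est, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
        errors.append(np.mean((est - true_coef) ** 2))
    return float(np.mean(errors))


err_low_missing = mean_coef_error(keep_frac=0.9)   # 10% of rows missing
err_high_missing = mean_coef_error(keep_frac=0.2)  # 80% of rows missing
print("coefficient error, 10% missing:", err_low_missing)
print("coefficient error, 80% missing:", err_high_missing)
```

In this toy setting the coefficient error grows as rows are removed, which is consistent with the conjecture but does not by itself explain why MSE rises across EM iterations.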

In [8]:
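def get_data(patient):
    # Extract one patient's data, keeping a leading axis of size 1.
    y = y_pop[patient, :].reshape(1, y_pop.shape[1])
    X = X_pop[patient, :, :].reshape(1, X_pop.shape[1], X_pop.shape[2])
    # Static covariates are zeroed out here (the c=0 setting). To use them instead:
    # c = c_pop[patient, :].reshape(1, c_pop.shape[1])
    c = np.zeros((1, c_pop.shape[1]))
    return (y, X, c)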
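
The reshapes in `get_data` only restore the leading patient axis, so a length-1 slice gives the same arrays. A minimal sketch on synthetic stand-ins (the real `y_pop` has shape `(107, 215)`; the trailing dimension of 5 for `X_pop` is an assumption):

```python
import numpy as np

# Synthetic stand-ins for the preprocessed arrays (values are random, not real data).
rng = np.random.default_rng(0)
y_pop = rng.normal(size=(107, 215))
X_pop = rng.integers(0, 2, size=(107, 215, 5))

patient = 3

# Reshape-based extraction, as in get_data.
y_reshape = y_pop[patient, :].reshape(1, y_pop.shape[1])
X_reshape = X_pop[patient, :, :].reshape(1, X_pop.shape[1], X_pop.shape[2])

# Slice-based extraction: a length-1 slice keeps the patient axis.
y_slice = y_pop[patient:patient + 1]
X_slice = X_pop[patient:patient + 1]

print(y_slice.shape, X_slice.shape)
```

The slice form returns a view rather than a copy, which matters only if the arrays are later modified in place.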

In [9]:
mse = []

In [10]:
# patient is the patient index
def em_individual(patient):
    single_effect = True
    #print('Patient {}'.format(patient))
    y, X, c = get_data(patient)
    em = EM(y, X, c, 3, 0, .8, single_effect=single_effect)
    #iter_min_mse.append(em.run_EM(1000))
    em.run_EM(500)
    #print('Prediction MSE: {}'.format(em.get_MSE()))
    mse.append(em.get_MSE())
    '''
    if single_effect:
        print('Coefficient A: {}'.format(em.A))
    else:
        treatment_types = ['nsaid', 'transfusion_plasma', 'transfusion_platelet', 'anticoagulant', 'aspirin']
        for i, treatment in enumerate(treatment_types):
            print('Coefficient for {}: {}'.format(treatment, em.A[:, i]))
        static_types = ['chronic kidney failure', 'sickle cell', 'age']
        for j, static in enumerate(static_types):
            print('Coefficient for {}: {}'.format(static, em.b[j]))
    '''
    #plot(em, 0, bin_size)
    

In [11]:
%%time
%%capture
for patient in range(y_pop.shape[0]):
    em_individual(patient)

CPU times: user 14.3 s, sys: 15.3 ms, total: 14.3 s
Wall time: 14.3 s


In [12]:
sum(mse)/len(mse)

0.7708313333085737