# To do  
Figure out which had no CSF taken  
Figure out all the potential indep vars and what they are coded for  
Figure out why rdtresult doesn't match pdf

The end goal here is a logistic regression for both cerebral malaria and bacterial meningitis, to serve as a classifier trained on 480 cases.
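
A minimal sketch of that end goal, assuming it means one binary logistic regression per outcome. The data below are synthetic: the column names `conv`, `temp`, `cerebmal` and `bactmen` are borrowed from the dataset, but every value, the effect size and the 15% base rate are invented for illustration, and `bactmen` is simplified to 0/1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 480  # same size as the study, but the rows are synthetic

# Two stand-in day-1 predictors (values are made up)
toy = pd.DataFrame({
    "conv": rng.integers(0, 2, n),              # convulsions: 0/1
    "temp": rng.normal(38.0, 1.2, n).round(1),  # temperature (C)
})

# Synthetic outcomes: cerebmal loosely tied to conv, bactmen pure noise
logit = -1.5 + 1.2 * toy["conv"]
toy["cerebmal"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
toy["bactmen"] = (rng.random(n) < 0.15).astype(int)

features = ["conv", "temp"]
models = {}
for outcome in ["cerebmal", "bactmen"]:
    model = LogisticRegression(max_iter=1000)
    model.fit(toy[features], toy[outcome])
    models[outcome] = model

# Predicted probability of cerebral malaria for each synthetic patient
probs = models["cerebmal"].predict_proba(toy[features])[:, 1]
```

Fitting and predicting on the same rows, as here, overstates performance; with only 480 cases a cross-validated estimate would probably be more honest.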

Outcome variables:  
cerebmal - 0 for no, 1 for yes
bactmen - 0 for no meningitis, 1 for yes, 2 for probable?  
bactmenall - contains a 3 for some patients. This could be bacteremia, since the count matches the number listed in the pdf

Variables of interest: These are the variables which would be known on day 1  
malaria - 0 for no, 1 for yes, indicated by microscopy; 166 cases, a near match to the pdf (167)  
rdtresult - 7 0s, 216 1s, 254 2s.  According to the pdf there are 209 RDT positives...

Notes:  
404 patients with csf collected  
Blood culture taken in 99%
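
The rdtresult counts listed above add up to 477 (7 + 216 + 254), so 3 of the 480 rows are presumably missing. One way to check this kind of thing is to tabulate a column with missing values included. The sketch below uses a tiny made-up frame rather than the real file; only the column name `rdtresult` comes from the data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real column (values are invented)
toy = pd.DataFrame({"rdtresult": [1, 2, 2, np.nan, 0, 1, 2, np.nan]})

# dropna=False keeps NaN as its own row in the tabulation
counts = toy["rdtresult"].value_counts(dropna=False).sort_index()
n_missing = toy["rdtresult"].isna().sum()

print(counts)
print("missing:", n_missing)
```

With `dropna=False` the counts always sum to the number of rows, which makes discrepancies against the pdf easier to trace.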


In [13]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from sklearn import linear_model

In [14]:
#Outcomes of interest
cns_raw = pd.read_csv('data/cnsfinall2.csv', index_col=0)
cns_raw.shape
cns_raw.cerebmal.unique()
cns_raw.bactmen.unique()
cns_raw.bactmenall.unique()

(480, 976)

array([0, 1])

array([0, 1, 2])

array([0, 1, 2, 3])

In [15]:
#Indep vars of interest
cns_raw.dimain.unique() #initial diagnosis: 1=meningitis, 2=cerebral malaria, 3=meningoencephalitis, 4=other
cns_raw.vacmen.unique() #vaccination against meningitis
cns_raw.vacpneumo.unique() #as vacbcg for pneumococcal vaccine
cns_raw.conv.unique() #convulsions
cns_raw.temp.unique() #temp (C)
cns_raw.clinaids.unique() #Clinical diagnosis of AIDS
cns_raw.diaids.unique() #Diagnosis of AIDS at inclusion
cns_raw.rdtresult.unique()

array([2, 3, 1, 4])

array([0, 3, 2, 1])

array([3, 2, 0, 1])

array([ 0.,  1., nan])

array([38.6, 38.4, 35.9, 37. , 38.1, 38. , 36.8, 37.8, 40.1, 36.9, 37.4,
       37.5, 37.2, 36.7, 36.4, 39. , 39.5, 37.3, 40.9, 38.2, 37.6, 39.7,
       40.3, 39.6, 39.4, 38.5, 37.7, 38.3, 36.1, 39.8, 38.8, 39.9, 37.1,
       38.7, 36. , 38.9, 37.9, 36.5, 39.1, 40. , 39.2, 40.2, 34.9, 36.6,
       35.7, 36.2, 41.2, 35.8, 40.6, 34.7, 40.4, 41.9, 35.4, 39.3, 35.6])

array([0, 1])

array([ 0.,  1., nan])

array([ 1.,  2., nan,  0.])

In [16]:
cns_raw['combinedoutcome'] = cns_raw.cerebmal + (cns_raw.bactmenall > 0)*2 #0 for neither, 1 for cerebmal, 2 for bactmen, 3 for both

In [17]:
sum(cns_raw.combinedoutcome)
cns_raw.combinedoutcome.unique()
sum(cns_raw.combinedoutcome==0)
sum(cns_raw.combinedoutcome==1)
sum(cns_raw.combinedoutcome==2)
sum(cns_raw.combinedoutcome==3)

258

array([0, 1, 2, 3])

300

105

72

3
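
The counts above are internally consistent: 300 + 105 + 72 + 3 = 480 rows, and 105·1 + 72·2 + 3·3 = 258 matches the column sum. The encoding and the tabulation can be sketched on a toy frame (the values are invented; only the column names come from the data), using `value_counts` in place of the repeated `sum(... == k)` calls:

```python
import pandas as pd

# Toy stand-in covering every combination (values are invented)
toy = pd.DataFrame({
    "cerebmal":   [0, 1, 0, 1, 0],
    "bactmenall": [0, 0, 2, 3, 1],
})

# 0 = neither, 1 = cerebmal only, 2 = bactmen only, 3 = both.
# Any non-zero bactmenall code (1, 2 or 3) counts as bactmen here.
toy["combinedoutcome"] = toy["cerebmal"] + (toy["bactmenall"] > 0).astype(int) * 2

counts = toy["combinedoutcome"].value_counts().sort_index()
print(counts)
```

Note that `bactmenall > 0` also sweeps in the "probable" (2) and possible bacteremia (3) codes, which may or may not be what the final classifier should treat as bacterial meningitis.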

In [18]:
cns_raw.to_csv('data/cns_outcomes.csv')

# **Some Stata code for logistic regression exploration**
/**
Author: Michael Williams
Exploration of cns_infection data, multinomial regression
**/

clear all

import delimited cns_outcomes.csv // Exploratory analysis of all outcomes (0 for neither, 1 for cerebmal, 2 for bactmen, 3 for both)

summ recodediag  //Dependent variable

mlogit recodediag i.vacmen //Vaccination against meningitis status has no appreciable effect

mlogit recodediag i.vacpneumo //Vaccination against pneumococcus status has no appreciable effect

mlogit recodediag i.rdtresult if rdtresult > 0 //Assuming a 1 is positive then a positive rdt test is highly predictive of cerebmal

mlogit recodediag i.conv //presence of convulsions is statistical evidence for malaria


// #categorical: main diagnosis at inclusion: 1=meningitis, 2=cerebral malaria, 3=meningoencephalitis, 4=other

mlogit recodediag i.dimain  //diagnosis at onset is highly predictive of true outcome

// Generate rainy season variable, April to October (1), Nov to Mar (0)

split incdate, p("-")
gen rainseason = inlist(incdate2, "04", "05", "06", "07", "08", "09", "10") // 1 for Apr-Oct, 0 otherwise (including missing)

mlogit recodediag i.rainseason // It being the rainy season seems to (insignificantly) boost probability of both diseases...

mlogit recodediag clincon // convulsions indicate malaria

mlogit recodediag i.inchypo // nothing

mlogit recodediag i.inchypert // nothing

mlogit recodediag i.incirrit  // irritability indicative of mening and not having cerebmal

mlogit recodediag i.incblant if incblant != 9 // over 9mo, blantyre score indicative of cerebmal

mlogit recodediag i.incblant9  if incblant9 != 9 // for under 9mo, the association is weaker

mlogit recodediag i.incglas if incglas != 9 // glasgow score insignificant

mlogit recodediag i.inchead if inchead != 9 // not sure how this variable is coded but it appears that headache is negatively correlated with cerebmal

mlogit recodediag i.incphoto if incphoto != 9 // similarly confusing coding...

mlogit recodediag i.incneck if incneck != 9 // neckpain at inclusion negative for cerebmal and positive for bactmen

mlogit recodediag i.incfont if incfont !=9 // some positive correlation with bactmen but a lot of missingness apparently

mlogit recodediag i.incneuro if incneuro !=9 // nothing here

mlogit recodediag i.incseiza if  incseiza != 9 // seizure on admission indicative of malaria

mlogit recodediag i.incseizh if  incseizh != 9 // seizure in past 48hr indicative of malaria

mlogit recodediag i.inckern if inckern != 9 // Kernig sign neg for cerebmal and pos for bactmen

mlogit recodediag i.incbrud if incbrud != 9 // Brudzinski sign highly indicative of bactmen but only present in 14 patients, will likely be a good classifier

mlogit recodediag i.incpurp if incpurp != 9 // nothing here, seems like
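
Two of the data-preparation steps in the Stata code above could be mirrored in pandas roughly as follows. This is a sketch on a made-up frame, under two assumptions that are not confirmed by the data dictionary: that `incdate` is formatted `YYYY-MM-DD` (the Stata code only tells us the month is the second `-`-separated piece), and that a code of 9 means missing/unknown for the admission variables.

```python
import numpy as np
import pandas as pd

# Toy stand-in (dates and codes are invented)
toy = pd.DataFrame({
    "incdate": ["2011-04-15", "2011-11-02", "2012-10-30", "2012-03-01"],
    "incneck": [1, 0, 9, 1],
})

# Rainy season: April to October -> 1, November to March -> 0
month = pd.to_datetime(toy["incdate"], format="%Y-%m-%d").dt.month
toy["rainseason"] = month.between(4, 10).astype(int)

# Treat 9 as missing instead of filtering it out in every model call
toy["incneck_clean"] = toy["incneck"].replace(9, np.nan)

print(toy)
```

Recoding 9 once up front avoids repeating the `!= 9` filter per variable, at the cost of baking the "9 = missing" assumption into the data.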



# Results - Exploratory Analysis
1. RDT test highly predictive of ultimate outcome
1. Initial diagnosis highly predictive of ultimate outcome
1. Vaccination status (meningitis) not particularly useful
1. Convulsions predictive for cerebral malaria
1. Irritability predictive of bacterial meningitis and predictive of no malaria
1. Blantyre score indicates cerebmal
1. Headache suggests no cerebmal
1. Neck pain is negative for cerebmal and positive for bactmen
1. Seizure both on admission and in the 48hr prior indicative of cerebmal
1. Generally, admission variables have value because at least one of them must be true to even get into the study
1. Need to understand more variables
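
For sparse predictors such as the Brudzinski sign (present in only 14 patients), a plain cross-tabulation against the outcome may be a useful sanity check alongside the regression output. A minimal sketch on invented data, reusing the names `incbrud` and `recodediag` from the Stata code:

```python
import pandas as pd

# Toy stand-in (values are invented)
toy = pd.DataFrame({
    "incbrud":    [0, 0, 1, 0, 1, 0, 0, 0],
    "recodediag": [0, 1, 2, 0, 2, 1, 0, 3],
})

# Rows: predictor level, columns: outcome class, with row/column totals
table = pd.crosstab(toy["incbrud"], toy["recodediag"], margins=True)
print(table)
```

Empty or near-empty cells in such a table are a warning that the corresponding `mlogit` coefficients will be unstable.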
