# Determining the cause of Hemiplegia 

A patient is brought to the hospital with hemiplegia (paralysis of one half of the body). The patient's spouse gives the doctor a brief family and medical history: the patient was recently in a mild accident, and the patient's father had brain cancer. Nothing else is known yet, and the MRI, CT, and brain autopsy have not been completed. The doctor wants to know the probabilities that the patient has had a stroke, has brain cancer, or has brain trauma. How do those probabilities change once the MRI and CT are completed?

There are many causes of hemiplegia; the three most common are stroke, head trauma, and brain cancer. Hemiplegia can be caused by any one of these, by several together, or by none of them.

(NOTICE: Most of these numbers are fabricated, and the model is only loosely adapted from real diagnostic practice; sources are listed at the bottom.)

- $HT$: Hypertension
- $FH_S$: Family History of Stroke
- $FH_{BC}$: Family History of Brain Cancer
- $RA$: Patient involved in an accident recently
- $S$: Stroke
- $BC$: Brain Cancer
- $BT$: Brain/Head Trauma
- $MRI$: Positive MRI
- $BA$: Positive Brain Autopsy
- $CT$: Positive CT Scan
- $HP$: Hemiplegia 

<img src="img/hemiplegia.png" width="800">

| Variable | Conditional Probability (or Prior) | Variable | Conditional Probability (or Prior) |
|--------:|-------:|--------:|-------:|
| $HT$ | $$P(HT) = 0.33 $$ | $RA$ | $$P(RA) = 0.02$$ |
| $FH_S$ | $$P(FH_S) = 0.0025$$ | $BT$ | $$P(BT|RA) = .272$$ |
| $S$ | $$P(S|HT, FH_S) = 0.15$$ | | $$P(BT|\overline{RA}) = .01$$ |
| | $$P(S|HT, \overline{FH_S}) = 0.07$$ | $CT$ | $$P(CT|BT) = .95$$ |
| | $$P(S|\overline{HT}, FH_S) = 0.1$$ | | $$P(CT|\overline{BT}) = .1$$ |
| | $$P(S|\overline{HT}, \overline{FH_S}) = 0.0025$$ | $HP$ | $$P(HP|S, BC, BT) = 0.95$$ |
| $MRI$ | $$P(MRI | S) = 0.95$$ | | $$P(HP|\overline{S}, BC, BT) = 0.60$$ |
| | $$P(MRI | \overline{S}) = 0.01$$ | | $$P(HP|S, \overline{BC}, BT) = 0.80$$ |
| $FH_{BC}$| $$P(FH_{BC}) = 0.00043$$ | | $$P(HP|S, BC, \overline{BT}) = 0.75$$ |
| $BC$ | $$P(BC | FH_{BC}) = 0.05$$ | | $$P(HP|\overline{S}, \overline{BC}, BT) = 0.50$$ |
| | $$P(BC  |  \overline{FH_{BC}}) = 0.00043$$ | | $$P(HP|\overline{S}, BC, \overline{BT}) = 0.30$$ |
| $BA$ | $$P(BA|BC) = 0.999$$ | | $$P(HP|S, \overline{BC}, \overline{BT}) = 0.70$$ |
| | $$P(BA|\overline{BC}) = 0.001$$ | | $$P(HP|\overline{S}, \overline{BC}, \overline{BT}) = 0.0001$$ |

## Numbers not representative of real numbers

# Define the model in pgmpy

In [1]:
# Note: newer pgmpy releases rename BayesianModel to BayesianNetwork.
from pgmpy.models import BayesianModel as bysmodel
from pgmpy.factors.discrete import TabularCPD as tcpd

### Initialize the model with connections

In [2]:
# define the model with connections
model = bysmodel([('HT', 'S'), ('FHS', 'S'), ('S', 'MRI'), ('S', 'HP'), ('FHBC', 'BC'), 
                  ('BC', 'BA'), ('BC', 'HP'), ('RA', 'BT'), ('BT', 'CT'), ('BT', 'HP')])
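
The edge list above fixes which variables each node is conditioned on. As a minimal sketch in plain Python (no pgmpy required; the names `parents`, `nodes`, and `roots` are illustrative and not part of the original notebook), the parents of every node can be read straight from the edges. Nodes without parents need a prior, and every other node needs a conditional table over its parents.

```python
from collections import defaultdict

# Same edge list as passed to the model constructor above.
edges = [('HT', 'S'), ('FHS', 'S'), ('S', 'MRI'), ('S', 'HP'), ('FHBC', 'BC'),
         ('BC', 'BA'), ('BC', 'HP'), ('RA', 'BT'), ('BT', 'CT'), ('BT', 'HP')]

parents = defaultdict(list)   # child -> list of its parents, in edge order
nodes = []                    # every node, in order of first appearance
for parent, child in edges:
    parents[child].append(parent)
    for name in (parent, child):
        if name not in nodes:
            nodes.append(name)

# Nodes that never appear as a child only need a prior.
roots = [name for name in nodes if name not in parents]

print('roots:', roots)
for child in nodes:
    if child in parents:
        print(child, '<-', parents[child])
```

Note that in pgmpy the order of the conditioning variables in a table is set by the `evidence` argument given to each `TabularCPD`, not by the order of the edges.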

### Add probabilities for each node

In [3]:
# Priors of Hypertension, Family history of stroke and brain cancer, and recent accident.
priorHT = tcpd(variable='HT', variable_card=2, values=[[.67, .33]])
priorFHS = tcpd(variable='FHS', variable_card=2, values=[[.9975, .0025]])
priorFHBC = tcpd(variable='FHBC', variable_card=2, values=[[.99957, .00043]])
priorRA = tcpd(variable='RA', variable_card=2, values=[[.98, .02]])

In [4]:
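# Conditional probabilities of the remaining nodes.
# Row 0 is the negative state and row 1 the positive state of the variable.
# Columns follow the evidence states, with the last evidence variable varying fastest.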
cpdS = tcpd(variable='S',  variable_card=2,
            evidence=['HT', 'FHS'], evidence_card=[2, 2],
            values=[[.9975, .9, .93, .85],
                    [.0025, .1, .07, .15]])

cpdBC = tcpd(variable='BC',  variable_card=2,
            evidence=['FHBC'], evidence_card=[2],
            values=[[.99957, .95],
                    [.00043, .05]])

cpdBT = tcpd(variable='BT',  variable_card=2,
            evidence=['RA'], evidence_card=[2],
            values=[[.99, .728],
                    [.01, .272]])

cpdMRI = tcpd(variable='MRI',  variable_card=2,
            evidence=['S'], evidence_card=[2],
            values=[[.99, .05],
                    [.01, .95]])

cpdBA = tcpd(variable='BA',  variable_card=2,
            evidence=['BC'], evidence_card=[2],
            values=[[.999, .001],
                    [.001, .999]])

cpdCT = tcpd(variable='CT',  variable_card=2,
            evidence=['BT'], evidence_card=[2],
            values=[[.9, .05],
                    [.1, .95]])

cpdHP = tcpd(variable='HP', variable_card=2,
             evidence=['S', 'BC', 'BT'], evidence_card=[2, 2, 2],
            values=[[.9999, .5, .7, .4, .3, .2, .25, .05],
                    [.0001, .5, .3, .6, .7, .8, .75, .95]])
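
The column order of the `values` arrays is easy to get wrong, so here is a small sketch (plain Python, with illustrative names `evidence`, `p_hp_true`, and `columns` that are not in the original notebook) that labels each column of the `HP` table. It assumes the ordering visible in the printed `S` table in this notebook, where the last evidence variable changes fastest; `itertools.product` enumerates states in exactly that order.

```python
from itertools import product

evidence = ['S', 'BC', 'BT']
# Second row of cpdHP: P(HP=1 | S, BC, BT) for each column, left to right.
p_hp_true = [.0001, .5, .3, .6, .7, .8, .75, .95]

# product([0, 1], repeat=3) yields (0,0,0), (0,0,1), (0,1,0), ... (1,1,1),
# i.e. the last evidence variable (BT) varies fastest.
columns = dict(zip(product([0, 1], repeat=len(evidence)), p_hp_true))

for states, p in columns.items():
    label = ', '.join('%s=%d' % (name, s) for name, s in zip(evidence, states))
    print('P(HP=1 | %s) = %s' % (label, p))
```

Reading the rows this way gives a quick check of the code against the probability table at the top of the notebook.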

### Add those probabilities to the model

In [5]:
# add probabilities to the model
model.add_cpds(priorHT, priorFHS, priorFHBC, priorRA, cpdS, cpdBC, cpdBT, cpdMRI, cpdBA, cpdCT, cpdHP)
# check consistency
model.check_model()

True

In [18]:
print(model.get_cpds('HT'))
print(model.get_cpds('FHS'))
print(model.get_cpds('FHBC'))
print(model.get_cpds('RA'))
print(model.get_cpds('S'))
print(model.get_cpds('BC'))
print(model.get_cpds('BT'))
print(model.get_cpds('MRI'))
print(model.get_cpds('BA'))
print(model.get_cpds('CT'))
print(model.get_cpds('HP'))

╒══════╤══════╕
│ HT_0 │ 0.67 │
├──────┼──────┤
│ HT_1 │ 0.33 │
╘══════╧══════╛
╒═══════╤════════╕
│ FHS_0 │ 0.9975 │
├───────┼────────┤
│ FHS_1 │ 0.0025 │
╘═══════╧════════╛
╒════════╤═════════╕
│ FHBC_0 │ 0.99957 │
├────────┼─────────┤
│ FHBC_1 │ 0.00043 │
╘════════╧═════════╛
╒══════╤══════╕
│ RA_0 │ 0.98 │
├──────┼──────┤
│ RA_1 │ 0.02 │
╘══════╧══════╛
╒═════╤════════╤═══════╤═══════╤═══════╕
│ HT  │ HT_0   │ HT_0  │ HT_1  │ HT_1  │
├─────┼────────┼───────┼───────┼───────┤
│ FHS │ FHS_0  │ FHS_1 │ FHS_0 │ FHS_1 │
├─────┼────────┼───────┼───────┼───────┤
│ S_0 │ 0.9975 │ 0.9   │ 0.93  │ 0.85  │
├─────┼────────┼───────┼───────┼───────┤
│ S_1 │ 0.0025 │ 0.1   │ 0.07  │ 0.15  │
╘═════╧════════╧═══════╧═══════╧═══════╛
╒══════╤═════════╤════════╕
│ FHBC │ FHBC_0  │ FHBC_1 │
├──────┼─────────┼────────┤
│ BC_0 │ 0.99957 │ 0.95   │
├──────┼─────────┼────────┤
│ BC_1 │ 0.00043 │ 0.05   │
╘══════╧═════════╧════════╛
╒══════╤══════╤═══════╕
│ RA   │ RA_0 │ RA_1  │
├──────┼──────┼───────┤
│ B

# Using the model to reason about and infer diagnoses

## Using variable elimination

In [13]:
from pgmpy.inference import VariableElimination

VESolver = VariableElimination(model)

In [14]:
# Chance of stroke, brain trauma, and brain cancer, each given hemiplegia and one risk factor
print('Stroke: %.4f%%' % (VESolver.query(['S'], evidence={'HP' : 1, 'FHS' : 1})['S'].values[1] * 100))
print('Brain Trauma: %.4f%%' % (VESolver.query(['BT'], evidence={'HP' : 1, 'RA' : 1})['BT'].values[1] * 100))
print('Brain Cancer: %.4f%%' % (VESolver.query(['BC'], evidence={'HP' : 1, 'FHBC' : 1})['BC'].values[1] * 100))

Stroke: 92.1757%
Brain Trauma: 91.4484%
Brain Cancer: 39.8694%
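
With only 11 binary variables, these posteriors can be cross-checked without pgmpy by summing the joint distribution over all 2048 assignments. The following is a brute-force sketch, not how variable elimination works internally; the dictionaries `parents` and `p_true` and the helpers `joint` and `posterior` are illustrative names, with the numbers copied from the CPDs defined in the code cells above.

```python
from itertools import product

# Parents of each node, in the same order as the `evidence` lists of the CPDs.
parents = {'HT': (), 'FHS': (), 'FHBC': (), 'RA': (),
           'S': ('HT', 'FHS'), 'BC': ('FHBC',), 'BT': ('RA',),
           'MRI': ('S',), 'BA': ('BC',), 'CT': ('BT',),
           'HP': ('S', 'BC', 'BT')}

# P(node = 1 | parent states), keyed by the tuple of parent states.
p_true = {
    'HT': {(): .33}, 'FHS': {(): .0025}, 'FHBC': {(): .00043}, 'RA': {(): .02},
    'S': {(0, 0): .0025, (0, 1): .1, (1, 0): .07, (1, 1): .15},
    'BC': {(0,): .00043, (1,): .05},
    'BT': {(0,): .01, (1,): .272},
    'MRI': {(0,): .01, (1,): .95},
    'BA': {(0,): .001, (1,): .999},
    'CT': {(0,): .1, (1,): .95},
    'HP': dict(zip(product([0, 1], repeat=3),
                   [.0001, .5, .3, .6, .7, .8, .75, .95])),
}
names = list(parents)

def joint(assignment):
    """Probability of one full assignment: product of every node's CPD entry."""
    prob = 1.0
    for node in names:
        p1 = p_true[node][tuple(assignment[q] for q in parents[node])]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob

def posterior(query, evidence):
    """P(query = 1 | evidence) by summing the joint over consistent assignments."""
    num = den = 0.0
    for states in product([0, 1], repeat=len(names)):
        assignment = dict(zip(names, states))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(assignment)
        den += p
        if assignment[query] == 1:
            num += p
    return num / den

print('Stroke: %.4f%%' % (100 * posterior('S', {'HP': 1, 'FHS': 1})))
print('Brain Trauma: %.4f%%' % (100 * posterior('BT', {'HP': 1, 'RA': 1})))
print('Brain Cancer: %.4f%%' % (100 * posterior('BC', {'HP': 1, 'FHBC': 1})))
```

The printed values should agree with the variable elimination output above. Enumeration scales as 2^n, so it is only practical as a sanity check on small networks like this one.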


In [15]:
# With different test results
print('Stroke: %.4f%%' % (VESolver.query(['S'], evidence={'HP' : 1, 'FHS' : 1, 'MRI': 1})['S'].values[1] * 100))
print('Brain Trauma: %.4f%%' % (VESolver.query(['BT'], evidence={'HP' : 1, 'RA' : 1, 'CT': 0})['BT'].values[1] * 100))
print('Brain Cancer: %.4f%%' % (VESolver.query(['BC'], evidence={'HP' : 1, 'FHBC' : 1, 'BA': 0})['BC'].values[1] * 100))
print('Brain Cancer: %.4f%%' % (VESolver.query(['BC'], evidence={'HP' : 1, 'BA': 1})['BC'].values[1] * 100))

Stroke: 99.9107%
Brain Trauma: 37.2686%
Brain Cancer: 0.0663%
Brain Cancer: 85.0355%
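
The drop in the brain trauma probability after a negative CT can be reproduced by hand with Bayes' rule in odds form. This is a worked sketch with illustrative variable names; it relies on the fact that in this network `CT` depends only on `BT`, and it uses the `cpdCT` values from the code above, where P(CT=0 | BT=1) = 0.05 and P(CT=0 | BT=0) = 0.9.

```python
# Posterior before any scan, from the earlier query: P(BT=1 | HP=1, RA=1)
p_before = 0.914484

# A negative CT multiplies the odds of trauma by
# P(CT=0 | BT=1) / P(CT=0 | BT=0), taken from cpdCT.
likelihood_ratio = 0.05 / 0.9

odds_before = p_before / (1 - p_before)
odds_after = odds_before * likelihood_ratio
p_after = odds_after / (1 + odds_after)

print('Brain Trauma after negative CT: %.4f%%' % (100 * p_after))
```

The result should match the 37.2686% reported by the solver, up to rounding of the input posterior.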


## Bayesian Model Sampling

In [10]:
from pgmpy.factors.discrete import State
from pgmpy.sampling import BayesianModelSampling

SMPSolver = BayesianModelSampling(model)

nsamples = 1000

# Case 1: family history of stroke and hemiplegia
evd1  = [State('FHS', 1), State('HP', 1)]
smp1  = SMPSolver.rejection_sample(evidence=evd1, size=nsamples)
# Case 2: recent accident, hemiplegia, and a positive CT
evd2 = [State('RA', 1), State('HP', 1), State('CT', 1)]
smp2 = SMPSolver.rejection_sample(evidence=evd2, size=nsamples)

In [11]:
# Copied from the demo
from pandas import DataFrame

def calcCondProb(trace, event, cond):
    """Estimate P(event | cond) as a ratio of sample counts."""
    # accept either a DataFrame of samples or an iterable of dicts
    if isinstance(trace, DataFrame):
        trace = trace.to_dict('records')
    # keep only the samples that satisfy every condition
    for k, v in cond.items():
        trace = [smp for smp in trace if smp[k] == v]
    # number of samples that satisfy the condition
    nCondSample = len(trace)
    if nCondSample == 0:
        return float('nan')  # no sample matches the condition
    # of those, keep the samples that also satisfy the event
    for k, v in event.items():
        trace = [smp for smp in trace if smp[k] == v]
    # conditional probability as a ratio of counts
    return len(trace) / nCondSample

In [12]:
# smp1 was drawn given FHS=1 and HP=1; smp2 given RA=1, HP=1, and CT=1
print('Stroke : %.1f%%' % (calcCondProb(smp1, {'S' : 1}, {}) * 100))
print('Brain Cancer  : %.1f%%' % (calcCondProb(smp1, {'BC' : 1}, {}) * 100))
print('Brain Trauma   : %.1f%%' % (calcCondProb(smp2, {'BT' : 1}, {}) * 100))

Stroke : 92.7%
Brain Cancer  : 0.2%
Brain Trauma   : 98.7%
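
To make the mechanics of rejection sampling concrete, here is a minimal standard-library sketch on just the `RA -> BT -> CT` chain of the network. It is not pgmpy's implementation, and because it leaves out `HP` it estimates P(BT=1 | RA=1, CT=1) rather than the quantity printed above. The function names `sample_chain` and `rejection_estimate`, the seed, and the number of draws are all assumptions made for illustration.

```python
import random

def sample_chain(rng):
    """Draw one forward sample of (RA, BT, CT) using the CPD values above."""
    ra = int(rng.random() < 0.02)
    bt = int(rng.random() < (0.272 if ra else 0.01))
    ct = int(rng.random() < (0.95 if bt else 0.1))
    return ra, bt, ct

def rejection_estimate(n_draws, seed=0):
    """Estimate P(BT=1 | RA=1, CT=1) and report the acceptance rate."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_draws):
        ra, bt, ct = sample_chain(rng)
        if ra == 1 and ct == 1:      # keep only samples that match the evidence
            accepted += 1
            hits += bt
    return hits / accepted, accepted / n_draws

estimate, acceptance = rejection_estimate(200000)

# Exact value for this three-node chain, by Bayes' rule.
exact = 0.272 * 0.95 / (0.272 * 0.95 + 0.728 * 0.1)
print('estimate %.3f, exact %.3f, acceptance rate %.4f' % (estimate, exact, acceptance))
```

Because a recent accident has a prior of only 0.02, fewer than 2% of the draws can be kept, which illustrates why rejection sampling is slow when the evidence is rare.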


## Using Belief Propagation

In [7]:
from pgmpy.inference import BeliefPropagation

BPSolver = BeliefPropagation(model)

In [8]:
# Chance of stroke, brain trauma, and brain cancer, each given hemiplegia and one risk factor
print('Stroke: %.4f%%' % (BPSolver.query(['S'], evidence={'HP' : 1, 'FHS' : 1})['S'].values[1] * 100))
print('Brain Trauma: %.4f%%' % (BPSolver.query(['BT'], evidence={'HP' : 1, 'RA' : 1})['BT'].values[1] * 100))
print('Brain Cancer: %.4f%%' % (BPSolver.query(['BC'], evidence={'HP' : 1, 'FHBC' : 1})['BC'].values[1] * 100))

Stroke: 92.1757%
Brain Trauma: 91.4484%
Brain Cancer: 39.8694%


In [9]:
# With different test results
print('Stroke: %.4f%%' % (BPSolver.query(['S'], evidence={'HP' : 1, 'FHS' : 1, 'MRI': 1})['S'].values[1] * 100))
print('Brain Trauma: %.4f%%' % (BPSolver.query(['BT'], evidence={'HP' : 1, 'RA' : 1, 'CT': 0})['BT'].values[1] * 100))
print('Brain Cancer: %.4f%%' % (BPSolver.query(['BC'], evidence={'HP' : 1, 'FHBC' : 1, 'BA': 0})['BC'].values[1] * 100))
print('Brain Cancer: %.4f%%' % (BPSolver.query(['BC'], evidence={'HP' : 1, 'BA': 1})['BC'].values[1] * 100))

Stroke: 99.9107%
Brain Trauma: 37.2686%
Brain Cancer: 0.0663%
Brain Cancer: 85.0355%


## Comparison of methods

As shown above, belief propagation and variable elimination return identical results, and both run much faster than Bayesian model sampling. Only the stroke query is directly comparable across all three methods, because it uses the same evidence ($FH_S$ and $HP$) each time: the sampling estimate of 92.7% is close to the exact 92.1757%, as expected from an approximate method with 1000 samples. The other two sampling figures answer different questions from the exact queries. The brain cancer estimate (0.2%) comes from samples conditioned on a family history of stroke rather than of brain cancer, and the brain trauma estimate (98.7%) also conditions on a positive CT, so neither should be compared with the 39.8694% and 91.4484% above.

If the network is small enough for exact inference, as it is here, variable elimination or belief propagation is the better choice. Sampling becomes attractive when the network is too large or too densely connected for exact inference to be practical, or when only a data set of cases is available rather than known probabilities, in which case conditional probabilities can be estimated by counting, as `calcCondProb` does. For this project I would use variable elimination or belief propagation: rejection sampling took nearly 30 minutes and is still only approximate, because it throws away every sample that disagrees with the evidence, and with $P(FH_S) = 0.0025$ at most one sample in 400 can be accepted.

Overall, the model can infer probable causes from the evidence supplied, and its posteriors move in the expected direction when a test result is added: a positive MRI raises the probability of stroke from 92.1757% to 99.9107%, while a negative CT lowers the probability of brain trauma from 91.4484% to 37.2686%.
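
How much disagreement between sampling and exact inference is to be expected can be estimated with the usual binomial standard error. The sketch below is a rough check under the assumption that the 1000 accepted samples are independent; the variable names are illustrative, and the two probabilities are the stroke results printed earlier in this notebook.

```python
import math

n = 1000                # nsamples used for the rejection sampler
p_exact = 0.921757      # stroke posterior from variable elimination
p_sampled = 0.927       # stroke estimate from the accepted samples

# Standard error of a proportion estimated from n independent samples.
std_err = math.sqrt(p_exact * (1 - p_exact) / n)
z = (p_sampled - p_exact) / std_err

print('standard error: %.4f, difference: %.2f standard errors' % (std_err, z))
```

A difference of less than one standard error is consistent with ordinary sampling noise. Since the error shrinks only with the square root of the sample count, halving it would take about four times as many accepted samples.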

### More test cases to try
A further exercise is to compare how likely each diagnosis is under the same evidence. Below is a simple example using belief propagation.

In [20]:
print('Stroke: %.4f%%' % (BPSolver.query(['S'], evidence={'HP' : 1, 'FHBC' : 1, 'RA': 0, 'MRI': 0})['S'].values[1] * 100))
print('Brain Cancer: %.4f%%' % (BPSolver.query(['BC'], evidence={'HP' : 1, 'FHBC' : 1, 'RA': 0, 'MRI': 0})['BC'].values[1] * 100))
print('Brain Trauma: %.4f%%' % (BPSolver.query(['BT'], evidence={'HP' : 1, 'FHBC' : 1, 'RA': 0, 'MRI': 0})['BT'].values[1] * 100))

Stroke: 4.3590%
Brain Cancer: 72.7026%
Brain Trauma: 24.2066%


### Sources
- https://www.heart.org/idc/groups/heart-public/@wcm/@sop/@smd/documents/downloadable/ucm_319587.pdf
- https://www.cdc.gov/stroke/facts.htm
- https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812318
- http://events.braintumor.org/wp-content/uploads/2016/03/BrainTumorsBytheNumbers_12.04.15.pdf