Problem (Re)Statement:

* Shortness of breath (dyspnea) may be due to tuberculosis, lung cancer or bronchitis, or none of them, or more than one of them. 
* A recent visit to Asia increases the chances of tuberculosis.
* Smoking is known to be a risk factor for both lung cancer and bronchitis. 
* A positive chest X-ray suggests either lung cancer or tuberculosis, but cannot distinguish between them.

Here is a data set to pull your model parameters from.  For all fields, 0 means False and 1 means True.

In [19]:
import pandas  # the cells below refer to pandas.read_csv and pandas.crosstab by module name
df = pandas.read_csv("asia.csv")

In [20]:
df.head()

Unnamed: 0,Smoker,LungCancer,VisitToAsia,Tuberculosis,XRay,Bronchitis,Dyspnea
0,1,1,0,0,1,1,1
1,0,0,0,0,1,1,1
2,0,0,0,0,0,1,1
3,0,0,0,0,0,1,1
4,1,0,0,0,0,1,0


<image src="asia.gif" size=200/>



Begin by writing out your model.  For example, here are the names of some nodes and the arcs that connect them.  The arrow -> denotes a parent/child relationship: the nodes on the left are the parents of the node on the right.

<pre>
Asia                                   -> Tuberculosis

Smoking                                -> LungCancer, Bronchitis

Tuberculosis, LungCancer               -> TuberculosisORLungCancer

TuberculosisORLungCancer               -> X-ray

TuberculosisORLungCancer, Bronchitis   -> Dyspnea

</pre>

<span style="color:red">
Informally write your model in this cell -- using the notation above. 
It will determine the parameters you will need to get from the data set
</span>
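
One way to jot the structure down before touching the data is a plain mapping from each node to its parents. The sketch below is illustrative only: it copies the node names from the example arcs above (including the `TuberculosisORLungCancer` node), and the names `parents` and `cpt_rows` are hypothetical. It counts how many table rows each binary node would need, which gives a rough list of the parameters to estimate from the data set.

```python
# Hypothetical encoding of the example arcs: child -> list of parents.
parents = {
    "Asia": [],
    "Smoking": [],
    "Tuberculosis": ["Asia"],
    "LungCancer": ["Smoking"],
    "Bronchitis": ["Smoking"],
    "TuberculosisORLungCancer": ["Tuberculosis", "LungCancer"],
    "X-ray": ["TuberculosisORLungCancer"],
    "Dyspnea": ["TuberculosisORLungCancer", "Bronchitis"],
}

def cpt_rows(node):
    """Rows in a full table for a binary node: one per (parents, value) combination."""
    return 2 ** (len(parents[node]) + 1)

for node in parents:
    print(node, "<-", parents[node] or "(no parents)", "rows:", cpt_rows(node))
```

Root nodes need only a two-entry prior, while a node with two binary parents needs eight rows.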

In [33]:
from pomegranate import *

asiaCounts = {}
avc = (df.VisitToAsia.value_counts() / len(df.VisitToAsia))
for i in range(0, 2):
    asiaCounts[i] = avc[i]
    
smokerCounts = {}
svc = (df.Smoker.value_counts() / len(df.Smoker))
for i in range(0, 2):
    smokerCounts[i] = svc[i]
    
tuberculosisCPT = []
tuberculosisCounts = pandas.crosstab(df.Tuberculosis, df.VisitToAsia, normalize='columns')

for asia in range(0, 2):
    for tub in range(0, 2):
        tuberculosisCPT.append([asia, tub, tuberculosisCounts[asia][tub]])
    
lungCancerCPT = []
lungCancerCounts = pandas.crosstab(df.LungCancer, df.Smoker, normalize='columns')

for smoker in range(0, 2):
    for lung in range(0, 2):
        lungCancerCPT.append([smoker, lung, lungCancerCounts[smoker][lung]])
        
bronchitisCPT = []
bronchitisCounts = pandas.crosstab(df.Bronchitis, df.Smoker, normalize='columns')

for smoker in range(0, 2):
    for bron in range(0, 2):
        bronchitisCPT.append([smoker, bron, bronchitisCounts[smoker][bron]])

xRayCPT = []
xRayCounts = pandas.crosstab(df.XRay, [df.Tuberculosis, df.LungCancer], normalize='columns')
for tub in range(0,2):
    for lung in range(0,2):
        for xray in range(0,2):
            xRayCPT.append([tub, lung, xray, xRayCounts[tub][lung][xray]])

dyspneaCPT = []
dyspneaCounts = pandas.crosstab(df.Dyspnea, [df.Tuberculosis, df.LungCancer, df.Bronchitis], normalize='columns')
for tub in range(0,2):
    for lung in range(0,2):
        for bron in range(0,2):
            for dys in range(0,2):
                dyspneaCPT.append([tub, lung, bron, dys, dyspneaCounts[tub][lung][bron][dys]])
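
Each column-normalized crosstab above is a conditional relative frequency: among the rows that share a parent value, the fraction that take each child value. A minimal pure-Python sketch of the same idea follows. The five records are made up for illustration and are not taken from `asia.csv`, and `conditional_table` is a hypothetical helper.

```python
from collections import Counter

# Made-up (VisitToAsia, Tuberculosis) records, for illustration only.
records = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 0)]

def conditional_table(pairs):
    """Estimate P(child | parent) as [parent, child, probability] rows."""
    pair_counts = Counter(pairs)
    parent_counts = Counter(parent for parent, _ in pairs)
    return [
        [parent, child, pair_counts[(parent, child)] / parent_counts[parent]]
        for parent in sorted(parent_counts)
        for child in (0, 1)
    ]

table = conditional_table(records)
for row in table:
    print(row)
```

Rows that share a parent value sum to 1, which is the row layout the CPT lists above are built in.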

Now define your distributions


In [34]:
#  All your distributions in this cell

asianDist = DiscreteDistribution(asiaCounts)
smokerDist = DiscreteDistribution(smokerCounts)
tuberculosisDist = ConditionalProbabilityTable(tuberculosisCPT, [asianDist])   
lungCancerDist = ConditionalProbabilityTable(lungCancerCPT, [smokerDist])       
bronchitisDist = ConditionalProbabilityTable(bronchitisCPT, [smokerDist])
xRayDist = ConditionalProbabilityTable(xRayCPT, [tuberculosisDist, lungCancerDist])
dyspneaDist = ConditionalProbabilityTable(dyspneaCPT, [tuberculosisDist, lungCancerDist, bronchitisDist])


Next define the nodes in your network

In [35]:
# All your nodes in this cell

asian = Node(asianDist, name="Asian")
smoker = Node(smokerDist, name="Smoker")
tuberculosis = Node(tuberculosisDist, name="Tuberculosis")
lungCancer = Node(lungCancerDist, name="LungCancer")
bronchitis = Node(bronchitisDist, name="Bronchitis")
xRay = Node(xRayDist, name="XRay")
dyspnea = Node(dyspneaDist, name="Dyspnea")

Define your model, adding states and edges

In [36]:
# -- your model here, for example -- 
model = BayesianNetwork("Shortness of Breath")
model.add_states(asian, smoker, tuberculosis, lungCancer, bronchitis, xRay, dyspnea)
model.add_edge(asian, tuberculosis)
model.add_edge(tuberculosis, dyspnea)
model.add_edge(lungCancer, dyspnea)
model.add_edge(bronchitis, dyspnea)
model.add_edge(smoker, lungCancer)
model.add_edge(smoker, bronchitis)
model.add_edge(tuberculosis, xRay)
model.add_edge(lungCancer, xRay)

model.bake()

------------------------------------------------

#### Questions

1.  What is the probability that an individual in the sampled population has either lung cancer or tuberculosis or both?

In [39]:
# Your calculation here

# Helper
def probDist(nodeName, model, evidence):
    """Return the distribution (a dict of value -> probability) of nodeName given evidence."""
    # predict_proba returns one entry per state, in the order of model.states
    index = [s.name for s in model.states].index(nodeName)
    return model.predict_proba(evidence)[index].parameters[0]

hasLungCancerProb = probDist("LungCancer", model, {})[1]
hasTuberculosisProb = probDist("Tuberculosis", model, {})[1]
hasLungOrTubOrBoth = hasLungCancerProb + hasTuberculosisProb - (hasLungCancerProb * hasTuberculosisProb)
print(hasLungOrTubOrBoth)

0.06535486000000125


<span style="color:red">
Probability of lung cancer, tuberculosis, or both = 0.065

This is a low probability for the sampled population.
</span>
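
The formula used above, P(L or T) = P(L) + P(T) - P(L)P(T), relies on lung cancer and tuberculosis being independent when nothing is observed. That holds in this network because the two nodes share no ancestor. A small brute-force check of the identity is sketched below; the two marginals are placeholder values, not the ones estimated from the data.

```python
from itertools import product

# Placeholder marginals (assumptions for illustration, not model output).
p_lung, p_tub = 0.05, 0.01

# Joint distribution of two independent binary variables.
joint = {
    (l, t): (p_lung if l else 1 - p_lung) * (p_tub if t else 1 - p_tub)
    for l, t in product((0, 1), repeat=2)
}

by_enumeration = sum(p for (l, t), p in joint.items() if l or t)
by_formula = p_lung + p_tub - p_lung * p_tub
print(by_enumeration, by_formula)
```

The shortcut would no longer be exact once evidence on a shared child such as XRay or Dyspnea is added, because that evidence makes the two diseases dependent.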

2.  What is the probability that an individual in the sampled population will have a positive chest X-ray?  

In [38]:
# Your calculation here
probDist("XRay", model, {})[1]

0.11052895971666489

<span style="color:red">
Probability for a positive chest X-ray = 0.11
</span>

3.  What is the probability that a smoker with a positive chest X-ray has lung cancer?  Does this probability depend on whether or not the individual has visited Asia?

In [45]:
# Your calculation here
hasLungCancerSmokerWithXray = probDist("LungCancer", model, {"Smoker" : 1, "XRay" : 1})[1]
hasLungCancerSmokerWithXrayNoAsia = probDist("LungCancer", model, {"Smoker" : 1, "XRay" : 1, "Asian" : 0})[1]
hasLungCancerSmokerWithXrayVisitedAsia = probDist("LungCancer", model, {"Smoker" : 1, "XRay" : 1, "Asian" : 1})[1]
print("Prob of lungCancer = {0:.5}, Prob of lungCancer not visited Asia = {1:.5} and Prob of lungCancer visited Asia = {2:.5}".format(hasLungCancerSmokerWithXray, hasLungCancerSmokerWithXrayNoAsia, hasLungCancerSmokerWithXrayVisitedAsia))

Prob of lungCancer = 0.65508, Prob of lungCancer not visited Asia = 0.65726 and Prob of lungCancer visited Asia = 0.4859


<span style="color:red">
Probability that a smoker with a positive chest X-ray has lung cancer = 0.655

Yes, it depends on whether the individual has visited Asia. A visit to Asia raises the probability of Tuberculosis, which is a competing explanation for the positive X-ray, so the probability of LungCancer drops to 0.486. Similarly, without a visit to Asia, Tuberculosis is less likely and the probability of LungCancer for a smoker with a positive chest X-ray rises slightly to 0.657.
</span>
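
This effect is often called explaining away. A self-contained sketch with two independent causes and one shared effect shows it. Every number below is invented for illustration, and the X-ray is modeled as a deterministic OR with a small false-positive rate, which is an assumption rather than the table learned above.

```python
from itertools import product

def p_lung_given_positive_xray(p_tub, p_lung=0.1, false_positive=0.05):
    """P(lung=1 | xray=1) by enumeration over two independent binary causes."""
    numerator = denominator = 0.0
    for tub, lung in product((0, 1), repeat=2):
        prior = (p_tub if tub else 1 - p_tub) * (p_lung if lung else 1 - p_lung)
        # Assumed X-ray model: positive if either disease is present,
        # otherwise positive only at the false-positive rate.
        p_xray = 1.0 if (tub or lung) else false_positive
        denominator += prior * p_xray
        if lung:
            numerator += prior * p_xray
    return numerator / denominator

no_visit = p_lung_given_positive_xray(p_tub=0.01)  # lower tuberculosis prior
visit = p_lung_given_positive_xray(p_tub=0.05)     # higher tuberculosis prior
print(no_visit, visit)
```

Raising the prior on the competing cause lowers the posterior for lung cancer, which is the same direction of change seen in the model output.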

4.  How much does a trip to Asia affect the likelihood of an individual having Dyspnea?

In [46]:
# -- Your calculation here -- 
dyspneaProbWithNoTripToAsia = probDist("Dyspnea", model, {"Asian" : 0})[1]
dyspneaWithTripToAsiaProb = probDist("Dyspnea", model, {"Asian" : 1})[1]
change = (dyspneaWithTripToAsiaProb - dyspneaProbWithNoTripToAsia) / dyspneaProbWithNoTripToAsia

print("Prob of Dyspnea without a trip to Asia {0:.2},  Prob of Dyspnea with trip to Asia {1:.2}, change is {2:.2%}".format(dyspneaProbWithNoTripToAsia, dyspneaWithTripToAsiaProb, change))

Prob of Dyspnea without a trip to Asia 0.43,  Prob of Dyspnea with trip to Asia 0.46, change is 5.57%


<span style="color:red">
An individual who has taken a trip to Asia is 5.57% more likely, in relative terms, to have Dyspnea than an individual who has not visited Asia.

Prob of Dyspnea without a trip to Asia = 0.43
Prob of Dyspnea with a trip to Asia = 0.46
Relative change = 5.57%
</span>
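
The percentage printed above is a relative change, (p_with - p_without) / p_without, computed from the unrounded probabilities. The sketch below, with a hypothetical helper `relative_change`, shows why the two-decimal values in the printout cannot be used to reproduce it: plugging in 0.43 and 0.46 gives roughly 7%, not 5.57%.

```python
def relative_change(baseline, new):
    """Relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline

# Rounded values copied from the printed output above.
rounded = relative_change(0.43, 0.46)
absolute = 0.46 - 0.43
print("relative: {0:.2%}, absolute: {1:.2} (from rounded inputs)".format(rounded, absolute))
```

In absolute terms the difference is about three percentage points, so it helps to state which of the two measures is being reported.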

5.  Suppose you are a nonsmoker individual presenting with Dyspnea and you have never been to Asia. Based on this information what are the relative likelihoods that you have (a) Tuberculosis, (b) Lung Cancer, (c) Bronchitis, or (d) none of them?

In [47]:
## -- Your calculation here --  

hasTuberculosisProb = probDist("Tuberculosis", model, {"Smoker" : 0, "Dyspnea" : 1, "Asian" : 0})[1]
hasLungCancerProb = probDist("LungCancer", model, {"Smoker" : 0, "Dyspnea" : 1, "Asian" : 0})[1]
hasBronchitisProb = probDist("Bronchitis", model, {"Smoker" : 0, "Dyspnea" : 1, "Asian" : 0})[1]
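# Approximation: multiplying the complements treats the three diseases as
# independent given the evidence, which is not exact once Dyspnea is observed.
NoneProb = (1 - hasTuberculosisProb) * (1 - hasLungCancerProb) * (1 - hasBronchitisProb)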
print("Prob of Tuberculosis {0:.2},  Prob of LungCancer {1:.2}, Prob of Bronchitis {2:.2}, Prob of None {3:.2}".format(hasTuberculosisProb, hasLungCancerProb, hasBronchitisProb, NoneProb))

Prob of Tuberculosis 0.023,  Prob of LungCancer 0.022, Prob of Bronchitis 0.76, Prob of None 0.23


<span style="color:red">
The likelihood that the individual has Bronchitis is much higher than any other. Because the individual is a nonsmoker who has never been to Asia, the chances of Tuberculosis and Lung Cancer are very low (about 0.02 each). The second most likely outcome is having none of them. The four values need not sum to 1, because the three diseases are not mutually exclusive.

Here are the probabilities:
    Tuberculosis = 0.023
    LungCancer = 0.022
    Bronchitis = 0.76
    None = 0.23
    
</span>
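
Multiplying the three complements, as `NoneProb` does above, treats the diseases as independent given the evidence. Observing a shared symptom generally couples its causes, so that product is only an approximation. A sketch of the exact computation by enumerating the joint distribution follows. The priors and the noisy-OR symptom model are made-up stand-ins, not the tables estimated from `asia.csv`.

```python
from itertools import product

# Made-up priors for three independent causes (illustration only).
priors = {"tub": 0.01, "lung": 0.01, "bron": 0.30}
# Assumed noisy-OR: each present cause independently triggers the symptom;
# `leak` is the chance of the symptom with no cause present.
strength = {"tub": 0.7, "lung": 0.7, "bron": 0.8}
leak = 0.1

def p_symptom(state):
    """P(symptom=1 | causes) under the assumed noisy-OR model."""
    p_absent = 1 - leak
    for name, present in state.items():
        if present:
            p_absent *= 1 - strength[name]
    return 1 - p_absent

# Posterior over all cause combinations given symptom = 1.
weights = {}
for values in product((0, 1), repeat=3):
    state = dict(zip(priors, values))
    prior = 1.0
    for name, present in state.items():
        prior *= priors[name] if present else 1 - priors[name]
    weights[values] = prior * p_symptom(state)
total = sum(weights.values())
posterior = {values: w / total for values, w in weights.items()}

# Exact P(no disease | symptom) versus the product of complements.
exact_none = posterior[(0, 0, 0)]
marginals = [sum(p for v, p in posterior.items() if v[i]) for i in range(3)]
product_none = 1.0
for m in marginals:
    product_none *= 1 - m
print("exact: {0:.4f}, product of complements: {1:.4f}".format(exact_none, product_none))
```

With these invented numbers the two values differ in the second decimal place, so the product is a reasonable first estimate but not the exact posterior.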

6.  In your panic you have a chest XRay done, which comes out negative.   How does that change the relative likelihoods?

In [48]:
## -- Your calculation here -- 

hasTuberculosisProb = probDist("Tuberculosis", model, {"Smoker" : 0, "Dyspnea" : 1, "Asian" : 0, "XRay" : 0})[1]
hasLungCancerProb = probDist("LungCancer", model, {"Smoker" : 0, "Dyspnea" : 1, "Asian" : 0, "XRay" : 0})[1]
hasBronchitisProb = probDist("Bronchitis", model, {"Smoker" : 0, "Dyspnea" : 1, "Asian" : 0, "XRay" : 0})[1]
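# Approximation: multiplying the complements treats the three diseases as
# independent given the evidence, which is not exact once Dyspnea is observed.
NoneProb = (1 - hasTuberculosisProb) * (1 - hasLungCancerProb) * (1 - hasBronchitisProb)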
print("Prob of Tuberculosis {0:.2},  Prob of LungCancer {1:.2}, Prob of Bronchitis {2:.2}, Prob of None {3:.2}".format(hasTuberculosisProb, hasLungCancerProb, hasBronchitisProb, NoneProb))

Prob of Tuberculosis 0.00055,  Prob of LungCancer 0.00069, Prob of Bronchitis 0.78, Prob of None 0.22


<span style="color:red">
Given that the XRay is negative and the individual is a nonsmoker, the probability of Tuberculosis or LungCancer is extremely low (below 0.001), well below the values before the X-ray.

The individual most likely has Bronchitis (up slightly, from 0.76 to 0.78), and the second most likely outcome is again none of them.

Here are the probabilities:
    Tuberculosis = 0.00055
    LungCancer = 0.00069
    Bronchitis = 0.78
    None = 0.22
</span>

7.  On the basis of this information, should you seek medical attention?

<span style="color:red">
Yes. Bronchitis remains the most likely explanation (probability about 0.78), so it is worth seeking medical attention.
</span>