## EXERCISE 1: Weather's probability
You are given a (fake) <a href="https://drive.google.com/file/d/1LjZLE9ozaHcBwiCl90mHaS1nXKcglfr4/view">padua_weather.csv</a>
of historical records for Padua's weather. The weather, which can be either rainy (= 1 in the dataset), misty (= 2), or sunny (= 3), is reported for each day of the week, for a whole year (52 weeks).

After you have formalised the problem (i.e. identified the random variables and the necessary mathematical formulae), write a Python program that reads the dataset and computes the following:
- probability of sunny weather during the weekend (on one or both days);
- expected weather for each day of the week (*);
- suppose you don't know which day of the week it is: although very unrealistic, how could you guess the day based only on the weather?

(\*) An expected value of, for example, 2.5 can be interpreted as "a mix of misty and sunny weather".
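
One possible formalisation is sketched here, assuming probabilities are estimated as relative frequencies over the $N = 52$ weeks. The symbols $X_d$ (weather on day $d$), $D$ (day of the week) and $n_d(x)$ (number of weeks in which day $d$ had weather $x$) are notation introduced for this sketch, not part of the dataset:

$$ P(X = x \mid D = d) \approx \frac{n_d(x)}{N} $$
$$ P(X_{Sat} = 3 \lor X_{Sun} = 3) = P(X_{Sat} = 3) + P(X_{Sun} = 3) - P(X_{Sat} = 3, X_{Sun} = 3) $$
$$ E[X \mid D = d] = \sum_{x=1}^{3} x \cdot P(X = x \mid D = d) $$
$$ P(D = d \mid X = x) = \frac{P(X = x \mid D = d) \cdot P(D = d)}{\sum_{d'} P(X = x \mid D = d') \cdot P(D = d')} $$

If every day is taken to be equally likely a priori, $P(D = d) = 1/7$, the last expression is proportional to $P(X = x \mid D = d)$, so the best guess for the day is the $d$ that maximises $P(X = x \mid D = d)$ for the observed weather $x$.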
 



In [25]:
# variables:
# X: weather (values: {1,2,3})
# D: day of the week (values: {1,2,3,4,5,6,7} / {day names})

import csv
import numpy as np

weather_names = ['rainy', 'misty', 'sunny']

# get and organize weather data
day_names = list()
weather_data = dict()
week_count = 0
with open('padua_weather.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    day_names = next(csvreader)
    weather_data = {day:list() for day in day_names}
    for row in csvreader:
        for i in range(0,7):
            weather_data[day_names[i]].append(int(row[i]))

week_count = len(weather_data[day_names[0]])

# question 1: P(sunny on Saturday or Sunday), estimated as the fraction of weeks
# in which at least one weekend day is sunny
P_D = {day:(1/7) for day in day_names}  # uniform prior over the 7 days of the week
N_3_w = 0
for i in range(0, week_count):
    if weather_data[day_names[5]][i] == 3 or weather_data[day_names[6]][i] == 3: N_3_w += 1
P_3_and_w = N_3_w / week_count

print("First question: P(X=3,weekend)=" + str(P_3_and_w))
print("-----------------------------------------------")

# question 2: expectation of X given D

# occurrences of every weather for every day: dict with day as index and list with occ. of weathers as value
weather_for_days = dict()
for day in day_names:   
    weather_for_days[day] = list()
    for j in range(1,4):
        weather_for_days[day].append(weather_data[day].count(j))

# P(X=weather | D=day): dict with day d as index and list with P(X=weather | d) as value
P_X_D = dict()
for day in day_names:
    P_X_D[day] = [(weather / week_count) for weather in weather_for_days[day]]

# expected weather given a certain day: E[X | D=day] = sum over x of x * P(X=x | D=day)
exp_D = dict()
for day in day_names:
    exp_D[day] = sum((j+1) * P_X_D[day][j] for j in range(0,3))

print("Second question: ")
for (day, exp) in exp_D.items():
    print(": ".join((day[0:3], "{:.2f}".format(exp))))
print("-----------------------------------------------")

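# question 3: we need P(D=day | X=weather). With a uniform prior P(D=day) = 1/7,
# Bayes' rule gives P(D=day | X=weather) proportional to P(X=weather | D=day),
# so for a given weather we compare the entries of P_X_D across days
print("Third question: ")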
for (day, p_x_d) in P_X_D.items():
    p_str = day[0:3] + ": "
    for i in range(0, 3):
        p_str += ": ".join((weather_names[i], "{:.2f}".format(p_x_d[i])))
        if (i != 2): p_str += "; "
    print(p_str)

First question: P(X=3,weekend)=0.4423076923076923
-----------------------------------------------
Second question: 
Mon: 2.08
Tue: 1.98
Wed: 2.04
Thu: 1.94
Fri: 1.96
Sat: 1.85
Sun: 1.75
-----------------------------------------------
Third question: 
Mon: rainy: 0.33; misty: 0.27; sunny: 0.40
Tue: rainy: 0.37; misty: 0.29; sunny: 0.35
Wed: rainy: 0.31; misty: 0.35; sunny: 0.35
Thu: rainy: 0.37; misty: 0.33; sunny: 0.31
Fri: rainy: 0.37; misty: 0.31; sunny: 0.33
Sat: rainy: 0.38; misty: 0.38; sunny: 0.23
Sun: rainy: 0.52; misty: 0.21; sunny: 0.27
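
The third question can be answered by picking, for each observed weather, the day with the highest posterior probability. The sketch below is one way to do that, not part of the original solution: it copies the rounded values of the table printed above and assumes a uniform prior over days. The names `likelihood`, `posterior_over_days` and `guess_day` are hypothetical helpers introduced here.

```python
# P(X = weather | D = day), copied from the rounded table printed above.
# Column order: rainy, misty, sunny.
likelihood = {
    "Mon": [0.33, 0.27, 0.40],
    "Tue": [0.37, 0.29, 0.35],
    "Wed": [0.31, 0.35, 0.35],
    "Thu": [0.37, 0.33, 0.31],
    "Fri": [0.37, 0.31, 0.33],
    "Sat": [0.38, 0.38, 0.23],
    "Sun": [0.52, 0.21, 0.27],
}
weather_names = ["rainy", "misty", "sunny"]
prior = 1 / len(likelihood)  # assumption: every day is equally likely a priori


def posterior_over_days(weather_index):
    """Return P(D = day | X = weather) for every day via Bayes' rule."""
    joint = {day: probs[weather_index] * prior for day, probs in likelihood.items()}
    evidence = sum(joint.values())  # P(X = weather)
    return {day: p / evidence for day, p in joint.items()}


def guess_day(weather_index):
    """Return the most probable day for the observed weather."""
    posterior = posterior_over_days(weather_index)
    return max(posterior, key=posterior.get)


guesses = {name: guess_day(i) for i, name in enumerate(weather_names)}
for name, day in guesses.items():
    print(name, "->", day)
```

With these rounded values the guesses are Sunday for rainy, Saturday for misty and Monday for sunny weather. The posteriors stay close to the 1/7 prior (for example, P(Sun | rainy) is about 0.20), which is why the exercise calls the guess unrealistic.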


## EXERCISE 2: Broad Street cholera outbreak

The following is a simplified version of an example in Judea Pearl's *The Book of Why*. It refers to a cholera epidemic, caused by contaminated water, which killed hundreds of people in London between 1853 and 1854. The diagram below illustrates some of the key factors explaining this epidemic, in particular:
- $X$ indicates whether the water company's intake was downstream of London's sewers;
- $W$ indicates whether the water was contaminated or not;
- $Z$ indicates the presence of other external factors (e.g. poverty, miasma, etc.);
- $Y$ indicates the outbreak of cholera.

<img src='https://drive.google.com/uc?id=10O10x_nuuxF55rqRk0TpanHV_7Q819MA'>

(Please note that the probabilities in the diagram are fake.)

> - Formalise the problem using appropriate mathematical notation and derive an expression for the probability distribution of the cholera outbreak given that the water company's intake is upstream (i.e. what is the query, and how can it be decomposed?)
> - Write a Python program that computes the actual probabilities of the above distribution using the information from the given CPTs.

$$ \textbf{P}(Y | \neg x) = \alpha \cdot \textbf{P} (Y, \neg x) = \alpha \cdot  \sum_{W,Z} \textbf{P}(Y, \neg x, W, Z) $$
$$ \textbf{P}(Y, \neg x, W, Z) = P(\neg x) \cdot \textbf{P}(Z) \cdot \textbf{P}(W | \neg x, Z) \cdot \textbf{P}(Y | W, Z) $$
$$ \textbf{P}(Y | \neg x) = \alpha \cdot P(\neg x) \cdot \sum_{Z} P(Z) \cdot \sum_{W} \textbf{P}(W | \neg x, Z) \cdot \textbf{P}(Y | W, Z) $$
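
As a worked step that is not part of the original derivation, the normalisation constant can be made explicit. Summing the joint $\textbf{P}(Y, \neg x)$ over both values of $Y$, and using the fact that each conditional distribution sums to 1 ($\sum_{Y} \textbf{P}(Y | W, Z) = 1$, $\sum_{W} \textbf{P}(W | \neg x, Z) = 1$, $\sum_{Z} P(Z) = 1$), gives:

$$ \sum_{Y} \textbf{P}(Y, \neg x) = P(\neg x) \quad \Rightarrow \quad \alpha = \frac{1}{P(\neg x)} $$
$$ \textbf{P}(Y | \neg x) = \sum_{Z} P(Z) \cdot \sum_{W} \textbf{P}(W | \neg x, Z) \cdot \textbf{P}(Y | W, Z) $$

So the factor $P(\neg x)$ cancels. This relies on the factorisation above, in which $X$ and $Z$ have no parents.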

In [29]:
import numpy as np
t, f = 0, 1

# distribution of X and Z
P_X = np.array([0.5, 0.5])
P_Z = np.array([0.25, 0.75])

# P_W[w, x, z] = P(W=w | X=x, Z=z)
P_W = np.array([[[0.90, 0.85], [0.10, 0.02]], [[0.10, 0.15], [0.90, 0.98]]])
#              W (X,Z) (X,nZ)  (nX,Z)(nX,nZ)  nW(X,Z) (X,nZ) (nX,Z)(nX,nZ)         

# P_Y[y, w, z] = P(Y=y | W=w, Z=z), the axis order used by P_Y[:,w,z] in the computation
P_Y = np.array([[[0.80, 0.75], [0.15, 0.05]], [[0.20, 0.25], [0.85, 0.95]]])
#              Y (W,Z) (W,nZ)  (nW,Z)(nW,nZ)  nY(W,Z) (W,nZ) (nW,Z)(nW,nZ)

$$ \textbf{P}(Y | \neg x) = \alpha \cdot P(\neg x) \cdot \sum_{Z} P(Z) \cdot \sum_{W} \textbf{P}(W | \neg x, Z) \cdot \textbf{P}(Y | W, Z) $$

In [32]:
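# accumulate the unnormalised P(Y, ¬x) by summing out W and Z
aux = 0

for w in range(2):
    for z in range(2):
        aux += P_Z[z] * P_W[w,f,z] * P_Y[:,w,z]

aux = aux * P_X[f]
# normalise so that the two entries sum to 1 (this is the factor alpha)
aux = aux / aux.sum()
print("P(Y|¬x):", aux)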

P(Y|¬x): [0.10175 0.89825]
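
The nested loop above can be cross-checked with a single `np.einsum` call that sums out $W$ and $Z$ at once. This is only a sketch: it repeats the arrays so that it runs on its own, and it assumes the axis order implied by the loop's indexing, namely `P_W[w, x, z]` and `P_Y[y, w, z]`. The names `P_W_given_not_x`, `unnormalised` and `posterior` are introduced here for illustration.

```python
import numpy as np

t, f = 0, 1  # index 0 = true, index 1 = false, as in the cells above

P_Z = np.array([0.25, 0.75])
P_W = np.array([[[0.90, 0.85], [0.10, 0.02]], [[0.10, 0.15], [0.90, 0.98]]])
P_Y = np.array([[[0.80, 0.75], [0.15, 0.05]], [[0.20, 0.25], [0.85, 0.95]]])

# Assumed axis order, taken from the loop's indexing: P_W[w, x, z], P_Y[y, w, z].
P_W_given_not_x = P_W[:, f, :]  # P(W | ¬x, Z), shape (w, z)

# Sum out w and z in one call; P(¬x) is left out because normalisation cancels it.
unnormalised = np.einsum("z,wz,ywz->y", P_Z, P_W_given_not_x, P_Y)
posterior = unnormalised / unnormalised.sum()
print("P(Y|¬x):", posterior)
```

Under these assumptions the result agrees with the loop's output, `[0.10175 0.89825]`.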
