Here is a common Bayesian network example about a sidewalk. Maybe it will inform a package-delivery agent about the right speed or tires to use on its delivery route!

We have five variables:
* Season, which has domain 0 to 3 (Winter, Spring, Summer, Fall)
* Sprinkler (sprinkler on), Rain, Wet (sidewalk wet), and Slippery (sidewalk slippery), all of which are binary

The network tells us that **Season** directly influences both **Sprinkler** and **Rain**: we discovered that the sprinkler system is on a preset schedule that depends only on the season, not on whether it is raining. **Sprinkler** and **Rain** both in turn influence **Wet**, which in turn influences **Slippery**.


![Slippery](SlipperyPicture.GIF)
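
Read as a Bayesian network, the structure described above corresponds to the usual chain-rule factorization of the joint distribution, with one factor per variable conditioned on its parents. Written with the variable names used in this notebook:

$$
P(\text{Season}, \text{Sprinkler}, \text{Rain}, \text{Wet}, \text{Slippery}) = P(\text{Season})\, P(\text{Sprinkler} \mid \text{Season})\, P(\text{Rain} \mid \text{Season})\, P(\text{Wet} \mid \text{Sprinkler}, \text{Rain})\, P(\text{Slippery} \mid \text{Wet})
$$

Each factor on the right is one table of probability parameters that has to be estimated from data.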

We have a data set with historical observations of the variables. In this case every observation records all five variables, so the data set is a sample from the full joint distribution.

The data set is in the file slippery.csv. In this file, Season is coded as 0 to 3 (Winter, Spring, Summer, Fall) and the other variables are binary (0 for false, 1 for true).

If we were diligent data scientists, we would verify that the conditional independence assumptions implicit in the model are actually reflected in our sample. For example, the network embodies the assumption that **Wet** is independent of **Season** conditioned on **Sprinkler** and **Rain**. This either holds (approximately) in the data set or it does not.
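
One rough way to eyeball such an assumption is to compare a conditional probability with and without the extra variable in the conditioning set. The sketch below is not a formal statistical test. It runs on synthetic data generated here so that it works without slippery.csv; the helper `max_cpt_gap`, the frame `toy`, and all probabilities used to generate it are made up for illustration.

```python
import numpy as np
import pandas as pd

def max_cpt_gap(data, target, parents, extra):
    """Largest change in P(target=1 | parents) when `extra` is also conditioned on."""
    # Per-row estimate of P(target=1 | parents)
    p_base = data.groupby(parents)[target].transform("mean")
    # Per-row estimate of P(target=1 | parents, extra)
    p_full = data.groupby(parents + [extra])[target].transform("mean")
    return (p_full - p_base).abs().max()

# Synthetic data with the same structure as the network (probabilities are invented)
rng = np.random.default_rng(0)
n = 40_000
season = rng.integers(0, 4, size=n)
sprinkler = (rng.random(n) < np.array([0.2, 0.4, 0.7, 0.5])[season]).astype(int)
rain = (rng.random(n) < np.array([0.6, 0.4, 0.2, 0.5])[season]).astype(int)
p_wet = np.array([[0.05, 0.75], [0.5, 0.9]])[sprinkler, rain]
wet = (rng.random(n) < p_wet).astype(int)
toy = pd.DataFrame({"Season": season, "Sprinkler": sprinkler, "Rain": rain, "Wet": wet})

# Adding Season should barely move P(Wet | Sprinkler, Rain) ...
gap_independent = max_cpt_gap(toy, "Wet", ["Sprinkler", "Rain"], "Season")
# ... whereas adding Sprinkler to P(Wet | Rain) should move it a lot
gap_dependent = max_cpt_gap(toy, "Wet", ["Rain"], "Sprinkler")
print(gap_independent, gap_dependent)
```

A small gap is consistent with the independence assumption and a large gap is evidence against it. Where to draw the line depends on the sample size in each cell.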

Rather than doing that here, we will use the sample to estimate the probability parameters we need to build our network.

In [10]:
# Read the file into a data frame
import pandas as pd
df = pd.read_csv("slippery.csv", sep=",", header = 0)

In [11]:
type(df)

pandas.core.frame.DataFrame

In [12]:
df.head()

   Season  Sprinkler  Rain  Wet  Slippery
0       3          1     0    0         0
1       0          0     0    0         0
2       2          1     0    1         1
3       3          0     1    1         1
4       3          1     0    1         0


In [4]:
df.shape

(50000, 5)

In [5]:
# The columns came from the csv file
df.columns

Index(['Season', 'Sprinkler', 'Rain', 'Wet', 'Slippery'], dtype='object')

In [13]:
len(df['Season'])

50000

In [14]:
type(df.Season)

pandas.core.series.Series

In [15]:
df.Season.value_counts()

0    12623
3    12564
2    12425
1    12388
Name: Season, dtype: int64

In [16]:
df.Season.value_counts().sort_index()

0    12623
1    12388
2    12425
3    12564
Name: Season, dtype: int64

In [17]:
# This is the marginal probability of Season
df.Season.value_counts() / df.shape[0]

0    0.25246
3    0.25128
2    0.24850
1    0.24776
Name: Season, dtype: float64

In [18]:
(df.Season.value_counts() / df.shape[0]).sort_index()

0    0.25246
1    0.24776
2    0.24850
3    0.25128
Name: Season, dtype: float64
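
As an aside, `value_counts` can do the division itself through its `normalize` argument. A minimal sketch on a small made-up Series (the values are invented so the cell runs without slippery.csv):

```python
import pandas as pd

# Hypothetical stand-in for df.Season
season = pd.Series([0, 3, 2, 3, 3, 1, 0, 2], name="Season")

# normalize=True returns relative frequencies instead of counts
p_season = season.value_counts(normalize=True).sort_index()
print(p_season)
```

On the real data, `df.Season.value_counts(normalize=True).sort_index()` should give the same table as the cell above.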

In [20]:
# P(Sprinkler | Season)
pd.crosstab(df.Sprinkler, df.Season, normalize='columns')

Season            0        1         2         3
Sprinkler
0          0.989305  0.75226  0.248531  0.490051
1          0.010695  0.24774  0.751469  0.509949


In [21]:
pss = pd.crosstab(df.Sprinkler, df.Season, normalize='columns')
print(f"Distribution conditioned on Season=2: {list(pss[2])}")
print(f"P(Sprinkler = 1 | Season=3): {pss[3][1]}")

Distribution conditioned on Season=2: [0.24853118712273642, 0.7514688128772636]
P(Sprinkler = 1 | Season=3): 0.5099490608086596


In [22]:
# P(Wet | Sprinkler, Rain)
pd.crosstab(df.Wet, [df.Sprinkler, df.Rain], normalize='columns')

Sprinkler         0                   1
Rain              0         1         0         1
Wet
0          0.999262  0.248572  0.504077  0.092523
1          0.000738  0.751428  0.495923  0.907477


In [23]:
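pwsr = pd.crosstab(df.Wet, [df.Sprinkler, df.Rain], normalize='columns')
# Column levels are (Sprinkler, Rain) and the rows are Wet,
# so pwsr[1][0][1] selects Sprinkler=1, then Rain=0, then Wet=1
print(f"P(Wet = 1 | Sprinkler=1, Rain=0): {pwsr[1][0][1]}")

P(Wet = 1 | Sprinkler=1, Rain=0): 0.49592252803261977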
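
Chained lookups like the one above make it easy to mix up the order of the conditioning variables. A possibly clearer alternative is a single `.loc` call with the row label first and a `(Sprinkler, Rain)` tuple for the column. The sketch below uses a tiny hand-made frame `toy` (invented values, not the real data) so it runs on its own:

```python
import pandas as pd

# Hypothetical miniature data set: 4 rows with Sprinkler=1, Rain=0, three of them wet
toy = pd.DataFrame({
    "Sprinkler": [1, 1, 1, 1, 0, 0, 0, 0, 1],
    "Rain":      [0, 0, 0, 0, 1, 1, 0, 0, 1],
    "Wet":       [1, 0, 1, 1, 1, 0, 0, 0, 1],
})

cpt = pd.crosstab(toy.Wet, [toy.Sprinkler, toy.Rain], normalize="columns")

# Row label is the value of Wet; the column tuple is (Sprinkler, Rain)
p_loc = cpt.loc[1, (1, 0)]

# Same entry through chained indexing: Sprinkler=1, then Rain=0, then Wet=1
p_chained = cpt[1][0][1]
print(p_loc, p_chained)
```

Both expressions read the same cell; the `.loc` form keeps the conditioning values together in one place.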


In [None]:
# P(Slippery | Wet)
pd.crosstab(df.Slippery, df.Wet, normalize='columns')
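
Once all the tables are estimated, they can be multiplied together to get the probability of any complete assignment of the five variables. The sketch below shows one way this might look. It assumes P(Rain | Season) is estimated with `pd.crosstab` in the same way as the other tables, and it uses synthetic data (the frame `toy`, the function `joint`, and all generating probabilities are invented) so that it runs without slippery.csv.

```python
import itertools
import numpy as np
import pandas as pd

# Synthetic data with the network's structure (all probabilities are made up)
rng = np.random.default_rng(1)
n = 5_000
season = rng.integers(0, 4, size=n)
sprinkler = (rng.random(n) < np.array([0.2, 0.4, 0.7, 0.5])[season]).astype(int)
rain = (rng.random(n) < np.array([0.6, 0.4, 0.2, 0.5])[season]).astype(int)
wet = (rng.random(n) < np.array([[0.05, 0.75], [0.5, 0.9]])[sprinkler, rain]).astype(int)
slippery = (rng.random(n) < np.array([0.05, 0.7])[wet]).astype(int)
toy = pd.DataFrame({"Season": season, "Sprinkler": sprinkler, "Rain": rain,
                    "Wet": wet, "Slippery": slippery})

# One table per variable, conditioned on its parents in the network
p_season = toy.Season.value_counts(normalize=True)
p_sprinkler = pd.crosstab(toy.Sprinkler, toy.Season, normalize="columns")
p_rain = pd.crosstab(toy.Rain, toy.Season, normalize="columns")
p_wet = pd.crosstab(toy.Wet, [toy.Sprinkler, toy.Rain], normalize="columns")
p_slippery = pd.crosstab(toy.Slippery, toy.Wet, normalize="columns")

def joint(season, sprinkler, rain, wet, slippery):
    """Probability of one complete assignment under the network's factorization."""
    return (p_season.loc[season]
            * p_sprinkler.loc[sprinkler, season]
            * p_rain.loc[rain, season]
            * p_wet.loc[wet, (sprinkler, rain)]
            * p_slippery.loc[slippery, wet])

# Sanity check: the joint over all 4 * 2**4 assignments should sum to 1
assignments = itertools.product(range(4), [0, 1], [0, 1], [0, 1], [0, 1])
total = sum(joint(*a) for a in assignments)
example = joint(2, 1, 0, 1, 1)  # Summer, sprinkler on, no rain, wet, slippery
print(total, example)
```

Because each table is normalized over its own variable, the product sums to one over all assignments, which makes a quick check that the tables were built and indexed consistently.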