## Problems

In [1]:
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB, GaussianNB

from dmutils import classification_summary, gains_chart

**1. Personal Loan Acceptance.** The file `UniversalBank.csv` contains data on 5000 customers of Universal Bank. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (=9.6%) accepted the personal loan that was offered to them in the earlier campaign. In this exercise, we focus on two predictors: Online (whether or not the customer is an active user of online banking services) and Credit Card (abbreviated CC below; whether the customer holds a credit card issued by the bank), and the outcome Personal Loan (abbreviated Loan below).

Partition the data into training (60%) and validation (40%) sets.

**a.** Create a pivot table for the training data with Online as a column variable, CC as a row variable, and Loan as a secondary row variable. The values inside the table should convey the count. Use the pandas dataframe methods `melt()` and `pivot()`.

In [2]:
bank_df = pd.read_csv("../datasets/UniversalBank.csv")
bank_df.head()

Unnamed: 0,ID,Age,Experience,Income,ZIP Code,Family,CCAvg,Education,Mortgage,Personal Loan,Securities Account,CD Account,Online,CreditCard
0,1,25,1,49,91107,4,1.6,1,0,0,1,0,0,0
1,2,45,19,34,90089,3,1.5,1,0,0,1,0,0,0
2,3,39,15,11,94720,1,1.0,1,0,0,0,0,0,0
3,4,35,9,100,94112,1,2.7,2,0,0,0,0,0,0
4,5,35,8,45,91330,4,1.0,2,0,0,0,0,0,1


In [3]:
# data preparation for the exercise
predictors = ["Online", "CreditCard"]
outcome = "Personal Loan"

X_train, X_valid, y_train, y_valid = train_test_split(bank_df[predictors], bank_df[outcome],
                                                      test_size=0.4, random_state=1)

# data preparation to generate the pivot tables
train_df, valid_df = train_test_split(bank_df[predictors+[outcome]],
                                      test_size=0.4, random_state=1)

melt_table = pd.melt(train_df,
                     id_vars=["CreditCard", "Personal Loan"],
                     var_name="Online")\
               .groupby(["CreditCard", "Personal Loan"])\
               .count()["Online"]\
               .reset_index()
pivot_table = pd.pivot_table(train_df,
                             values=["Online"],
                             index=["CreditCard", "Personal Loan"],
                             aggfunc='count')\
                .reset_index()

print(melt_table)
print()
print(pivot_table)

   CreditCard  Personal Loan  Online
0           0              0    1909
1           0              1     199
2           1              0     804
3           1              1      88

   CreditCard  Personal Loan  Online
0           0              0    1909
1           0              1     199
2           1              0     804
3           1              1      88
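
The two tables above count records by CreditCard and Personal Loan only. A pivot table that also uses Online as the column variable could be built along the following lines. This is a minimal sketch: `toy_df` is made-up data for illustration, not the bank data, and the names `counts` and `p_exact` are introduced here.

```python
import pandas as pd

# Hypothetical toy data (NOT UniversalBank.csv): one row per customer
toy_df = pd.DataFrame({
    "CreditCard":    [0, 0, 0, 1, 1, 1, 1, 0],
    "Personal Loan": [0, 0, 1, 0, 1, 1, 0, 0],
    "Online":        [0, 1, 1, 1, 1, 1, 0, 1],
})

# Rows: CreditCard and Personal Loan; columns: Online; values: record counts
counts = (toy_df.groupby(["CreditCard", "Personal Loan", "Online"])
                .size()
                .unstack("Online", fill_value=0))
print(counts)

# Exact P(Loan = 1 | CC = 1, Online = 1) read off the Online = 1 column
accept = counts.loc[(1, 1), 1]
reject = counts.loc[(1, 0), 1]
p_exact = accept / (accept + reject)
print(p_exact)
```

With the real training data, the same two cells of the table (CC = 1, Online = 1, Loan = 1 and Loan = 0) are the ones that part (b) asks about.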


**b.** Consider the task of classifying a customer who owns a bank credit card and is actively using online banking services. Looking at the pivot table, what is the probability that this customer will accept the loan offer? (This is the probability of loan acceptance (Loan = 1) conditional on having a bank credit card (CC = 1) and being an active user of online banking services (Online = 1)).

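We want the probability that a record belongs to class `Personal Loan` = 1 given that its predictor values are `CC` = 1 and `Online` = 1. The table above is not broken down by Online, so its counts only give the probability conditional on CC = 1; the exact value asked for needs the same two counts restricted to Online = 1:

<p style="text-align:center">
    $P(\text{Loan}=1 ∣ \text{CC}=1) = \frac{88}{804+88} \approx 0.099 = 9.9\%$
</p>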

**c.** Create two separate pivot tables for the training data. One will have Loan (rows) as a function of Online (columns) and the other will have Loan (rows) as a function of CC.

In [4]:
pd.set_option("display.precision", 4)
# probability of loan acceptance
print(train_df["Personal Loan"].value_counts() / len(train_df))
print()

for predictor in predictors:
    # construct the frequency table (counts of Loan by predictor value)
    df = train_df[["Personal Loan", predictor]]
    freq_table = df.pivot_table(index="Personal Loan", columns=predictor, aggfunc=len)

    # print the raw counts; the conditional probabilities are derived from them in (d)
    print(freq_table)
    print()

pd.reset_option("display.precision")

0    0.9043
1    0.0957
Name: Personal Loan, dtype: float64

Online            0     1
Personal Loan            
0              1119  1594
1               112   175

CreditCard        0    1
Personal Loan           
0              1909  804
1               199   88



**d.** Compute the following quantities [P(A ∣ B) means "the probability of A given B"]:

    i. P(CC = 1 ∣ Loan = 1) (the proportion of credit card holders among the loan acceptors)

<p style="text-align:center">
    $P(\text{CC}=1 ∣ \text{Loan}=1) = \frac{88}{199+88} = 0.306 = 30.6\%$
</p>

    ii. P(Online = 1 ∣ Loan = 1)

<p style="text-align:center">
    $P(\text{Online}=1 ∣ \text{Loan}=1) = \frac{175}{175+112} = 0.609 = 60.9\%$
</p>

    iii. P(Loan = 1) (the proportion of loan acceptors)

<p style="text-align:center">
    $P(\text{Loan}=1) = \frac{287}{3000} = 0.0957 = 9.57\%$
</p>

    iv. P(CC = 1 ∣ Loan = 0)

<p style="text-align:center">
    $P(\text{CC}=1 ∣ \text{Loan}=0) = \frac{804}{1909+804} = 0.296 = 29.6\%$
</p>

    v. P(Online = 1 ∣ Loan = 0)

<p style="text-align:center">
    $P(\text{Online}=1 ∣ \text{Loan}=0) = \frac{1594}{1119+1594} = 0.587 = 58.7\%$
</p>

    vi. P(Loan = 0)

<p style="text-align:center">
    $P(\text{Loan}=0) = 1 - P(\text{Loan}=1) = 1 - 0.0957 = 0.9043 = 90.4\%$
</p>
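
The six quantities above can also be obtained by dividing each row of the count tables by its total. The sketch below re-enters the counts printed in (c) by hand; the variable names are introduced here for illustration.

```python
import pandas as pd

# Counts copied from the two frequency tables printed in (c)
# (rows: Personal Loan = 0, 1; columns: predictor value = 0, 1)
online_counts = pd.DataFrame({0: [1119, 112], 1: [1594, 175]}, index=[0, 1])
cc_counts = pd.DataFrame({0: [1909, 199], 1: [804, 88]}, index=[0, 1])

# Divide every row by its total to get P(predictor = value | Loan = class)
online_given_loan = online_counts.div(online_counts.sum(axis=1), axis=0)
cc_given_loan = cc_counts.div(cc_counts.sum(axis=1), axis=0)

# Class priors P(Loan = class) from the row totals
prior = cc_counts.sum(axis=1) / cc_counts.values.sum()

print(cc_given_loan.round(4))
print(online_given_loan.round(4))
print(prior.round(4))
```

Working from unrounded proportions avoids carrying rounding error into later steps.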

**e.** Use the quantities computed above to compute the naive Bayes probability $P(\text{Loan} = 1 ∣ \text{CC} = 1, \text{Online} = 1)$.

The naive Bayes probability for this case is given by:

<p style="text-align:center">
    $P_{nb}(\text{Loan} = 1 ∣ \text{CC} = 1, \text{Online} = 1) = \frac{P(\text{Loan}=1) P(\text{CC}=1 ∣ \text{Loan}=1) P(\text{Online}=1 ∣ \text{Loan}=1)}{P(\text{Loan}=1) P(\text{CC}=1 ∣ \text{Loan}=1) P(\text{Online}=1 ∣ \text{Loan}=1) + P(\text{Loan}=0) P(\text{CC}=1 ∣ \text{Loan}=0) P(\text{Online}=1 ∣ \text{Loan}=0)}$
    <br>$ = \frac{(0.0957)(0.306)(0.609)}{(0.0957)(0.306)(0.609) + (0.9043)(0.296)(0.587)}$
    <br>$ = \frac{0.01783}{0.01783 + 0.1571} \approx 0.102 = 10.2\%$
</p>
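
The same calculation can be sketched in a few lines of Python. The helper `naive_bayes_posterior` is a name introduced here for illustration, and the inputs are the unrounded proportions behind the counts used in (d).

```python
def naive_bayes_posterior(priors, likelihoods, target):
    """Return P(target | x) under the naive conditional-independence assumption.

    priors: dict mapping class -> P(class)
    likelihoods: dict mapping class -> list of P(x_j | class), one per predictor
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for p in likelihoods[cls]:
            score *= p
        scores[cls] = score
    # Normalize so the class scores sum to one
    return scores[target] / sum(scores.values())


# Unrounded proportions from the training counts used in (d)
priors = {1: 287 / 3000, 0: 2713 / 3000}
likelihoods = {
    1: [88 / 287, 175 / 287],      # P(CC=1 | Loan=1), P(Online=1 | Loan=1)
    0: [804 / 2713, 1594 / 2713],  # P(CC=1 | Loan=0), P(Online=1 | Loan=0)
}

p_nb = naive_bayes_posterior(priors, likelihoods, target=1)
print(round(p_nb, 4))
```

The result is about 0.102, in line with the hand calculation.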

**f.** Compare this value with the one obtained from the pivot table in (b). Which is a more accurate estimate?

The naive Bayes estimate is very close to the value obtained from the pivot table. The pivot-table value is the more accurate of the two, because it is computed directly from the counts and does not rely on the assumption that the predictors are independent within each class. Although the two values are not equal, both would lead to the same classification for a cutoff of 0.5 (and for many other cutoffs). The rank ordering of the naive Bayes probabilities is often even closer to that of the exact Bayes method than the probabilities themselves, and for classification purposes it is the rank ordering that matters.

**g.** Which of the entries in this table are needed for computing P(Loan = 1 ∣ CC = 1, Online = 1)? In Python, run naive Bayes on the data. Examine the model output on training data, and find the entry that corresponds to P(Loan = 1 ∣ CC = 1, Online = 1). Compare this to the number you obtained in (e).

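Taking "this table" to mean the quantities computed in (d), all six are needed: the three for Loan = 1 form the numerator of the naive Bayes formula in (e), and the three for Loan = 0 complete its denominator.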

In [5]:
# run naive Bayes
loan_nb = MultinomialNB(alpha=0.01)
loan_nb.fit(X_train, y_train)

# predict probabilities
pred_proba_train = loan_nb.predict_proba(X_train)
pred_proba_valid = loan_nb.predict_proba(X_valid)

# predict class membership
y_train_pred = loan_nb.predict(X_train)
y_valid_pred = loan_nb.predict(X_valid)

# classify a CC=1 and Online=1 entry
df = pd.concat([pd.DataFrame({"actual": y_valid, "predicted": y_valid_pred}),
                pd.DataFrame(pred_proba_valid, index=y_valid.index)], axis=1)

mask = ((X_valid.Online == 1) & (X_valid.CreditCard == 1))
df[mask]

Unnamed: 0,actual,predicted,0,1
932,0,0,0.904419,0.095581
1132,0,0,0.904419,0.095581
3289,0,0,0.904419,0.095581
348,1,0,0.904419,0.095581
2971,0,0,0.904419,0.095581
...,...,...,...,...
4705,1,0,0.904419,0.095581
2429,0,0,0.904419,0.095581
2684,0,0,0.904419,0.095581
3401,0,0,0.904419,0.095581


**2. Automobile Accidents.** The file `accidentsFull.csv` contains information on 42,183 actual automobile accidents in 2001 in the United States that involved one of three levels of injury: NO INJURY, INJURY, or FATALITY. For each accident, additional information is recorded, such as day of week, weather conditions, and road type. A firm might be interested in developing a system for quickly classifying the severity of an accident based on initial reports and associated data in the system (some of which rely on GPS-assisted reporting).

Our goal here is to predict whether an accident just reported will involve an injury (MAX_SEV_IR = 1 or 2) or will not (MAX_SEV_IR = 0). For this purpose, create a dummy variable called INJURY that takes the value "yes" if MAX_SEV_IR = 1 or 2, and otherwise "no".

In [6]:
accidents_df = pd.read_csv("../datasets/accidentsFull.csv")
accidents_df.head()

Unnamed: 0,HOUR_I_R,ALCHL_I,ALIGN_I,STRATUM_R,WRK_ZONE,WKDY_I_R,INT_HWY,LGTCON_I_R,MANCOL_I_R,PED_ACC_R,...,SUR_COND,TRAF_CON_R,TRAF_WAY,VEH_INVL,WEATHER_R,INJURY_CRASH,NO_INJ_I,PRPTYDMG_CRASH,FATALITIES,MAX_SEV_IR
0,0,2,2,1,0,1,0,3,0,0,...,4,0,3,1,1,1,1,0,0,1
1,1,2,1,0,0,1,1,3,2,0,...,4,0,3,2,2,0,0,1,0,0
2,1,2,1,0,0,1,0,3,2,0,...,4,1,2,2,2,0,0,1,0,0
3,1,2,1,1,0,0,0,3,2,0,...,4,1,2,2,1,0,0,1,0,0
4,1,1,1,0,0,1,0,3,2,0,...,4,0,2,3,1,0,0,1,0,0


In [7]:
accidents_df.MAX_SEV_IR.map({0: "no", 1: "yes", 2: "yes"})

0        yes
1         no
2         no
3         no
4         no
        ... 
42178     no
42179    yes
42180     no
42181     no
42182     no
Name: MAX_SEV_IR, Length: 42183, dtype: object

In [8]:
# create a dummy variable called INJURY
accidents_df["INJURY"] = accidents_df.MAX_SEV_IR.map({0: "no", 1: "yes", 2: "yes"})

predictors = list(accidents_df.columns)[:-1]
outcome = "INJURY"

X = accidents_df[predictors]
y = accidents_df[outcome]
# a set has no guaranteed order; sort so the class names are always ["no", "yes"]
classes = sorted(set(y))

**a.** Using the information in this dataset, if an accident has just been reported and no further information is available, what should the prediction be? (INJURY = Yes or No?) Why?

Here we apply the **Naive Benchmark**

Benchmark: The Naive Rule

A very simple rule for classifying a record into one of *m* classes, ignoring all predictor information ($x_1$, $x_2$, ..., $x_p$) that we may have, is to classify the record as a member of the majority class. In other words, "classify as belonging to the most prevalent class". The naive rule is used mainly as a baseline or benchmark for evaluating the performance of more complicated classifiers. Clearly, a classifier that uses external predictor information (on top of the class membership allocation) should outperform the naive rule. There are various performance measures based on the naive rule that measure how much better than the naive rule a certain classifier performs.

In [9]:
print("Probability of INJURY:", sum(y == "yes") / len(y))
print("Probability no INJURY:", sum(y == "no") / len(y))

Probability of INJURY: 0.5087831590925255
Probability no INJURY: 0.4912168409074746


Since ~51% of the accidents in our data set resulted in an injury, we should predict that a newly reported accident will result in an injury, because that is slightly more likely than no injury.
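
The naive rule can be written as a small helper. This is a sketch only: `naive_rule` is a name introduced here, and the labels are made up for illustration.

```python
import pandas as pd


def naive_rule(y):
    """Return the majority class and its share of the records."""
    shares = pd.Series(y).value_counts(normalize=True)
    return shares.idxmax(), shares.max()


# Hypothetical labels for illustration only
labels = ["yes", "no", "yes", "yes", "no"]
majority, share = naive_rule(labels)
print(majority, share)
```

Applied to the `y` series of the accidents data, the helper would return the class with the larger of the two proportions printed above.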

**b.** Select the first 12 records in the dataset and look only at the response (INJURY) and the two predictors WEATHER_R and TRAF_CON_R. 

In [10]:
accidents_df.head(12)[["WEATHER_R", "TRAF_CON_R", "INJURY"]]

Unnamed: 0,WEATHER_R,TRAF_CON_R,INJURY
0,1,0,yes
1,2,0,no
2,2,1,no
3,1,1,no
4,1,0,no
5,2,0,yes
6,2,0,no
7,1,0,yes
8,2,0,no
9,2,0,no


    i. Create a pivot table that examines INJURY as a function of the two predictors for these 12 records.
       Use all three variables in the pivot table as rows/columns.

In [11]:
freq_table = pd.pivot_table(accidents_df.head(12)[["WEATHER_R", "TRAF_CON_R", "INJURY"]],
                            index=["INJURY"],
                            columns=["WEATHER_R", "TRAF_CON_R"],
                            aggfunc=len)
prop_table = freq_table.fillna(0)
prop_table.T

Unnamed: 0_level_0,INJURY,no,yes
WEATHER_R,TRAF_CON_R,Unnamed: 2_level_1,Unnamed: 3_level_1
1,0,1.0,2.0
1,1,1.0,0.0
1,2,1.0,0.0
2,0,5.0,1.0
2,1,1.0,0.0


    ii. Compute the exact Bayes conditional probabilities of an injury (INJURY = Yes)
        given the six possible combinations of the predictors.

In [12]:
freq_table = pd.pivot_table(accidents_df.head(12)[["WEATHER_R", "TRAF_CON_R", "INJURY"]],
                            index=["INJURY"],
                            columns=["WEATHER_R", "TRAF_CON_R"],
                            aggfunc=len).T.fillna(0)
prop_table = freq_table.apply(lambda x: x/sum(x), axis=1)

prop_table

Unnamed: 0_level_0,INJURY,no,yes
WEATHER_R,TRAF_CON_R,Unnamed: 2_level_1,Unnamed: 3_level_1
1,0,0.333333,0.666667
1,1,1.0,0.0
1,2,1.0,0.0
2,0,0.833333,0.166667
2,1,1.0,0.0


**Complete (Exact) Bayes Calculations**

The probabilities are computed as:

<p>
    <br>$P(\text{INJURY} ∣ \text{WEATHER_R = 1}, \text{TRAF_CON_R = 0}) = \frac{2}{3} = 0.67$
    <br>$P(\text{INJURY} ∣ \text{WEATHER_R = 1}, \text{TRAF_CON_R = 1}) = \frac{0}{1} = 0$
    <br>$P(\text{INJURY} ∣ \text{WEATHER_R = 1}, \text{TRAF_CON_R = 2}) = \frac{0}{1} = 0$
    <br>$P(\text{INJURY} ∣ \text{WEATHER_R = 2}, \text{TRAF_CON_R = 0}) = \frac{1}{6} = 0.167$
    <br>$P(\text{INJURY} ∣ \text{WEATHER_R = 2}, \text{TRAF_CON_R = 1}) = \frac{0}{1} = 0$
    <br>$P(\text{INJURY} ∣ \text{WEATHER_R = 2}, \text{TRAF_CON_R = 2}) = \frac{0}{0}$ (undefined: none of the 12 records has this combination)
</p>

    iii. Classify the 12 accidents using these probabilities and a cutoff of 0.5.

        WEATHER_R  TRAF_CON_R  INJURY  PROB_PRED
    00  1          0           yes     YES
    01  2          0           no      NO
    02  2          1           no      NO
    03  1          1           no      NO
    04  1          0           no      YES
    05  2          0           yes     NO
    06  2          0           no      NO
    07  1          0           yes     YES
    08  2          0           no      NO
    09  2          0           no      NO
    10  2          0           no      NO
    11  1          2           no      NO
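
The exact Bayes probabilities and the 0.5 cutoff can be reproduced in code. The sketch below re-enters the 12 records from the table above by hand; the column names `PROB` and `PROB_PRED` are chosen here for illustration.

```python
import pandas as pd

# The 12 records listed in the table above
sample = pd.DataFrame({
    "WEATHER_R":  [1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1],
    "TRAF_CON_R": [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 2],
    "INJURY": ["yes", "no", "no", "no", "no", "yes",
               "no", "yes", "no", "no", "no", "no"],
})

# Exact Bayes: share of injuries within each observed predictor combination
is_injury = sample["INJURY"].eq("yes").astype(float)
keys = [sample["WEATHER_R"], sample["TRAF_CON_R"]]
prob_injury = is_injury.groupby(keys).transform("mean")

# Classify with a cutoff of 0.5
pred = (prob_injury > 0.5).map({True: "YES", False: "NO"})
result = sample.assign(PROB=prob_injury, PROB_PRED=pred)
print(result)
```

Only the three records with WEATHER_R = 1 and TRAF_CON_R = 0 exceed the cutoff.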

    iv. Compute manually the naive Bayes conditional probability of an injury given WEATHER_R = 1
        and TRAF_CON_R = 1.

<p>
    <br>$P(\text{INJURY}) = \frac{3}{12} = 0.25$
    <br>$P(\text{NO INJURY}) = \frac{9}{12} = 0.75$
    <br>$P_{nb}(\text{INJURY} ∣ \text{WEATHER_R} = 1, \text{TRAF_CON_R} = 1)$
    <br>$ = \frac{P(\text{INJURY}) P(\text{WEATHER_R}=1 ∣ \text{INJURY}) P(\text{TRAF_CON_R}=1 ∣ \text{INJURY})}{P(\text{INJURY}) P(\text{WEATHER_R}=1 ∣ \text{INJURY}) P(\text{TRAF_CON_R}=1 ∣ \text{INJURY}) + P(\text{NO INJURY}) P(\text{WEATHER_R}=1 ∣ \text{NO INJURY}) P(\text{TRAF_CON_R}=1 ∣ \text{NO INJURY})}$
    <br>$ = \frac{(0.25)(0.67)(0.0)}{(0.25)(0.67)(0.0) + (0.75)(0.33)(0.22)} = 0$
</p>
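
The manual naive Bayes calculation can be checked with a short function. This is a sketch under the same setup: the 12 records are re-entered by hand, and `nb_injury_prob` is a helper name introduced here.

```python
import pandas as pd

# The first 12 records (WEATHER_R, TRAF_CON_R, INJURY), re-entered by hand
sample = pd.DataFrame({
    "WEATHER_R":  [1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1],
    "TRAF_CON_R": [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 2],
    "INJURY": ["yes", "no", "no", "no", "no", "yes",
               "no", "yes", "no", "no", "no", "no"],
})


def nb_injury_prob(df, weather, traf):
    """Naive Bayes estimate of P(INJURY = yes | WEATHER_R, TRAF_CON_R)."""
    scores = {}
    for cls, group in df.groupby("INJURY"):
        prior = len(group) / len(df)
        p_weather = (group["WEATHER_R"] == weather).mean()
        p_traf = (group["TRAF_CON_R"] == traf).mean()
        scores[cls] = prior * p_weather * p_traf
    return scores["yes"] / (scores["yes"] + scores["no"])


print(nb_injury_prob(sample, weather=1, traf=1))
print(nb_injury_prob(sample, weather=1, traf=0))
```

The first call returns 0 because no injury record in the sample has TRAF_CON_R = 1, so one zero factor wipes out the whole numerator.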


    v. Run a naive Bayes classifier on the 12 records and 2 predictors using scikit-learn.
       Check the model output to obtain probabilities and classifications for all 12 records. Compare this
       to the exact Bayes classification. Are the resulting classifications equivalent? Is the ranking
       (=ordering) of observations equivalent?

In [13]:
X

Unnamed: 0,HOUR_I_R,ALCHL_I,ALIGN_I,STRATUM_R,WRK_ZONE,WKDY_I_R,INT_HWY,LGTCON_I_R,MANCOL_I_R,PED_ACC_R,...,SUR_COND,TRAF_CON_R,TRAF_WAY,VEH_INVL,WEATHER_R,INJURY_CRASH,NO_INJ_I,PRPTYDMG_CRASH,FATALITIES,MAX_SEV_IR
0,0,2,2,1,0,1,0,3,0,0,...,4,0,3,1,1,1,1,0,0,1
1,1,2,1,0,0,1,1,3,2,0,...,4,0,3,2,2,0,0,1,0,0
2,1,2,1,0,0,1,0,3,2,0,...,4,1,2,2,2,0,0,1,0,0
3,1,2,1,1,0,0,0,3,2,0,...,4,1,2,2,1,0,0,1,0,0
4,1,1,1,0,0,1,0,3,2,0,...,4,0,2,3,1,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42178,0,2,1,0,1,1,0,1,0,0,...,1,2,1,1,1,0,0,1,0,0
42179,1,2,1,1,0,0,0,1,0,0,...,1,0,1,1,1,1,1,0,0,1
42180,0,2,2,0,0,1,0,1,0,0,...,1,0,1,1,1,0,0,1,0,0
42181,1,2,1,1,0,1,0,1,0,0,...,1,0,1,1,1,0,0,1,0,0


In [14]:
# prepare sample
X = accidents_df.head(12)[["WEATHER_R", "TRAF_CON_R"]]
y = accidents_df.head(12)["INJURY"]

# run naive Bayes
injury_nb = MultinomialNB(alpha=0.01)
injury_nb.fit(X, y)

# predict class membership
y_pred = injury_nb.predict(X)

# predict probabilities
pred_proba = injury_nb.predict_proba(X)

# classify using predictor values
df = pd.concat([X,
                pd.DataFrame({"actual": y, "predicted": y_pred}),
                pd.DataFrame(pred_proba, index=y.index)], axis=1)
df

Unnamed: 0,WEATHER_R,TRAF_CON_R,actual,predicted,0,1
0,1,0,yes,no,0.703564,0.296436
1,2,0,no,no,0.6525,0.3475
2,2,1,no,no,0.993756,0.006244
3,1,1,no,no,0.995053,0.004947
4,1,0,no,no,0.703564,0.296436
5,2,0,yes,no,0.6525,0.3475
6,2,0,no,no,0.6525,0.3475
7,1,0,yes,no,0.703564,0.296436
8,2,0,no,no,0.6525,0.3475
9,2,0,no,no,0.6525,0.3475


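Comparing the `MultinomialNB` output with the exact Bayes classification from the manual calculations, the classifications are not equivalent: exact Bayes classifies the records with `WEATHER_R` = 1 and `TRAF_CON_R` = 0 as injuries, whereas `MultinomialNB` classifies every record as "no", which is what the naive rule (majority class) would do. The ranking differs as well: `MultinomialNB` gives a higher injury probability to `WEATHER_R` = 2, `TRAF_CON_R` = 0 (0.35) than to `WEATHER_R` = 1, `TRAF_CON_R` = 0 (0.30), while exact Bayes gives 0.167 and 0.67 respectively. Part of the explanation is that in this sample $P(\text{TRAF_CON_R} = 1 ∣ \text{INJURY}) = 0$, $P(\text{TRAF_CON_R} = 2 ∣ \text{INJURY}) = 0$ and $P(\text{WEATHER_R} = 1 ∣ \text{INJURY}) = 0.67$. In addition, `MultinomialNB` treats the integer codes as counts rather than as categories.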

In [15]:
pd.set_option("display.precision", 4)
# probability of injury
print(accidents_df.head(12)["INJURY"].value_counts() / len(accidents_df.head(12)["INJURY"]))
print()

for predictor in ("WEATHER_R", "TRAF_CON_R"):
    # construct the frequency table
    df = accidents_df.head(12)[["INJURY", predictor]]
    freq_table = df.pivot_table(index="INJURY", columns=predictor, aggfunc=len).fillna(0)

    # divide each value by the sum of the row to get conditional probabilities
    prop_table = freq_table.apply(lambda x: x/sum(x), axis=1)
    print(prop_table)
    print()

pd.reset_option("display.precision")

no     0.75
yes    0.25
Name: INJURY, dtype: float64

WEATHER_R       1       2
INJURY                   
no         0.3333  0.6667
yes        0.6667  0.3333

TRAF_CON_R       0       1       2
INJURY                            
no          0.6667  0.2222  0.1111
yes         1.0000  0.0000  0.0000



**c.** Let us now return to the entire dataset. Partition the data into training (60%) and validation (40%).

In [16]:
# split the original data frame into a train and test using the same
# random_state
train_df, valid_df = train_test_split(accidents_df, test_size=0.40, random_state=1)

    i. Assuming that no information or initial reports about the accident itself are available at the
       time of prediction (only location characteristics, weather conditions, etc.), which predictors
       can we include in the analysis? (Use the data descriptions page from www.dataminingbook.com.)

We can use the predictors that describe the calendar time or the road conditions: HOUR_I_R, ALIGN_I, WRK_ZONE, WKDY_I_R, INT_HWY, LGTCON_I_R, PROFIL_I_R, SPD_LIM, SUR_COND, TRAF_CON_R, TRAF_WAY and WEATHER_R.

    ii. Run a naive Bayes classifier on the complete training set with the relevant predictors
        (and INJURY as the response). Note that all predictors are categorical. Show the confusion matrix.

In [17]:
# relevant predictors
predictors = ["HOUR_I_R",  "ALIGN_I" ,"WRK_ZONE",  "WKDY_I_R",
              "INT_HWY",  "LGTCON_I_R", "PROFIL_I_R", "SPD_LIM", "SUR_COND",
              "TRAF_CON_R",   "TRAF_WAY",   "WEATHER_R"]
outcome = "INJURY"

# run naive Bayes
delays_nb = MultinomialNB(alpha=0.01)
delays_nb.fit(train_df[predictors], train_df[outcome])

# predict probabilities
pred_proba_train = delays_nb.predict_proba(train_df[predictors])
pred_proba_valid = delays_nb.predict_proba(valid_df[predictors])

# predict class membership
y_train_pred = delays_nb.predict(train_df[predictors])
y_valid_pred = delays_nb.predict(valid_df[predictors])

    iii. What is the overall error for the validation set?

In [18]:
# training
classification_summary(train_df[outcome], y_train_pred, class_names=classes)
print()

# validation
classification_summary(valid_df[outcome], y_valid_pred, class_names=classes)

Confusion Matrix (Accuracy 0.5291)

       Prediction
Actual   no  yes
    no 4197 8195
   yes 3724 9193

Confusion Matrix (Accuracy 0.5288)

       Prediction
Actual   no  yes
    no 2838 5491
   yes 2460 6085


In [19]:
print("Overall Error (validation set): {:.2%}".format(1 - 0.5288))

Overall Error (validation set): 47.12%
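
Rather than typing the accuracy by hand, the overall error can be derived from the confusion matrix itself. The sketch below re-enters the validation counts printed above.

```python
import numpy as np

# Validation confusion matrix printed above (rows: actual, columns: predicted)
cm = np.array([[2838, 5491],
               [2460, 6085]])

accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
error = 1 - accuracy
print("Accuracy: {:.4f}, overall error: {:.2%}".format(accuracy, error))
```

The off-diagonal cells (5491 and 2460) are the misclassified records, so the error is also their sum divided by the total.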


    iv. What is the percent improvement relative to the naive rule (using the validation set)?

In [20]:
# validation
classification_summary(valid_df[outcome], ["yes" for n in range(len(valid_df))], class_names=classes)

Confusion Matrix (Accuracy 0.5064)

       Prediction
Actual   no  yes
    no    0 8329
   yes    0 8545


In [21]:
print("Percent improvement relative to naive rule: {:.2%}".format(0.5288 - 0.5064))

Percent improvement relative to naive rule: 2.24%
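
An improvement over the naive rule can be stated either as an absolute gain in accuracy or as a gain relative to the baseline. The sketch below shows both, using the accuracies reported in the two validation confusion matrices above (0.5288 for naive Bayes, 0.5064 for the naive rule); `improvement` is a helper name introduced here.

```python
def improvement(model_acc, baseline_acc):
    """Return (absolute gain in accuracy, gain relative to the baseline)."""
    absolute = model_acc - baseline_acc
    return absolute, absolute / baseline_acc


# Accuracies reported in the validation confusion matrices above
absolute, relative = improvement(model_acc=0.5288, baseline_acc=0.5064)
print("Absolute: {:.2%} points, relative: {:.2%}".format(absolute, relative))
```

Which of the two to report is a matter of convention; the absolute gain is the plain difference in accuracy.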


    v. Examine the conditional probabilities in the pivot tables. Why do we get a
       probability of zero for P(INJURY = No ∣ SPD_LIM = 5)?

In [22]:
pd.set_option("display.precision", 4)
# probability of injury
print(accidents_df["INJURY"].value_counts() / len(accidents_df["INJURY"]))
print()

for predictor in predictors:
    # construct the frequency table
    df = accidents_df[["INJURY", predictor]]
    freq_table = df.pivot_table(index="INJURY", columns=predictor, aggfunc=len).fillna(0)

    # divide each value by the sum of the row to get conditional probabilities
    prop_table = freq_table.apply(lambda x: x/sum(x), axis=1)
    print(prop_table)
    print()

pd.reset_option("display.precision")

yes    0.5088
no     0.4912
Name: INJURY, dtype: float64

HOUR_I_R       0       1
INJURY                  
no        0.5678  0.4322
yes       0.5734  0.4266

ALIGN_I       1       2
INJURY                 
no       0.8707  0.1293
yes      0.8663  0.1337

WRK_ZONE       0       1
INJURY                  
no        0.9760  0.0240
yes       0.9788  0.0212

WKDY_I_R       0       1
INJURY                  
no        0.2177  0.7823
yes       0.2387  0.7613

INT_HWY       0       1       9
INJURY                         
no       0.8498  0.1497  0.0005
yes      0.8600  0.1392  0.0008

LGTCON_I_R       1       2       3
INJURY                            
no          0.6916  0.1254  0.1830
yes         0.6972  0.1120  0.1908

PROFIL_I_R       0       1
INJURY                    
no          0.7504  0.2496
yes         0.7629  0.2371

SPD_LIM          5       10      15      20      25      30      35      40  \
INJURY                                                                        
no   

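A probability of zero in these tables does not mean that the outcome is impossible, because we can never be entirely sure that something will not happen. The tables hold sample proportions, so a zero appears when the data contain no accident with a speed limit of 5 in the no-injury class, which can easily happen when a category is rare. Such an estimate is unreliable, and because naive Bayes multiplies the conditional probabilities, a single zero forces the whole product to zero regardless of the other predictors. The `alpha` smoothing parameter of `MultinomialNB` keeps the estimates away from exactly zero.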
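
The effect of smoothing on a zero count can be sketched as follows. The counts are hypothetical and `smoothed_probability` is a helper name introduced here; the formula is standard additive (Laplace) smoothing.

```python
def smoothed_probability(count, class_total, n_categories, alpha=1.0):
    """Additive (Laplace) smoothing of a conditional proportion."""
    return (count + alpha) / (class_total + alpha * n_categories)


# Hypothetical counts: a category never observed in a class of 1000 records,
# for a predictor with 9 distinct values
raw = 0 / 1000
smoothed = smoothed_probability(count=0, class_total=1000, n_categories=9, alpha=1.0)
print(raw, smoothed)
```

The smoothed estimate is small but positive, so it no longer forces the product of conditional probabilities to zero.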