# The Restaurant Decision

Problem: Decide whether to wait for a table at a restaurant,
based on the following attributes:

- Choice: is there an alternative restaurant nearby?
- Bar: is there a comfortable bar area to wait in?
- Day: is today Friday or Saturday?
- Hungry: are we hungry?
- Patron: how many people are in the restaurant?
- Price: what’s the price range?
- Rain: is it raining outside?
- Booking: have we made a reservation?
- Type: what kind of restaurant is it?
- Time: what’s the estimated waiting time?

In [1]:
import pandas as pd

pd.set_option('display.max_colwidth', None)
restaurant = pd.read_csv('../Data/restaurant.csv.zst')
display(restaurant)

Unnamed: 0,choice,bar,day,hungry,patron,price,rain,booking,type,time,wait
0,T,F,F,T,some,$$$,F,T,french,0,yes
1,T,F,F,T,full,$,F,F,thai,40,no
2,T,T,F,F,some,$,F,F,swiss,0,yes
3,T,F,T,T,full,$,F,F,thai,20,yes
4,T,F,T,F,full,$$$,F,T,french,60,no
5,F,T,F,T,some,$$,T,F,italian,0,yes
6,F,T,F,F,none,$,T,F,swiss,20,no
7,F,F,F,T,some,$$,T,T,thai,0,yes
8,F,T,T,F,full,$,T,F,swiss,60,no
9,T,T,T,T,full,$$$,F,T,italian,20,no


In [2]:
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

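# Select the categorical attributes; the numeric `time` column is added back below.
x_train = restaurant[['choice', 'bar', 'day', 'hungry', 'patron', 'price', 'rain', 'booking', 'type']]
# Fit one LabelEncoder on all values at once, so the integer codes are shared across columns.
le = preprocessing.LabelEncoder()
x_train = pd.DataFrame(columns=x_train.columns, data=le.fit_transform(x_train.values.flatten()).reshape(x_train.shape))
# Append the waiting time unchanged.
x_train['time'] = restaurant['time']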

## Build the co-occurrence matrix.

In [3]:
coocc = x_train.T.dot(x_train)
coocc.columns.name = "Co-occurrence"
coocc

Co-occurrence,choice,bar,day,hungry,patron,price,rain,booking,type,time
choice,157,150,148,155,312,30,141,144,383,960
bar,150,150,144,150,308,27,141,139,378,920
day,148,144,143,147,294,28,136,137,367,980
hungry,155,150,147,157,315,30,143,144,386,900
patron,312,308,294,315,668,60,296,294,789,1600
price,30,27,28,30,60,14,26,31,52,160
rain,141,141,136,143,296,26,136,133,362,860
booking,144,139,137,144,294,31,133,136,352,860
type,383,378,367,386,789,52,362,352,1032,2300
time,960,920,980,900,1600,160,860,860,2300,11600
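
Because `x_train` holds integer label codes (plus the raw waiting times), the product above sums products of codes rather than counting how often two values occur together. A minimal sketch of a count-based co-occurrence matrix follows, assuming one indicator column per attribute=value pair. The small `rows` frame is a hypothetical excerpt (three attributes of the first five rows displayed earlier), not the full data.

```python
import pandas as pd

# Hypothetical excerpt: three attributes of the first five rows displayed earlier.
rows = pd.DataFrame({
    "choice": ["T", "T", "T", "T", "T"],
    "bar": ["F", "F", "T", "F", "F"],
    "patron": ["some", "full", "some", "full", "full"],
})

# One indicator column per attribute=value pair, e.g. "patron=full".
onehot = pd.get_dummies(rows, prefix_sep="=").astype(int)

# Entry (i, j) counts the rows in which items i and j are both present;
# the diagonal holds each item's own frequency.
cooccurrence = onehot.T.dot(onehot)
print(cooccurrence)
```

A boolean frame shaped like `onehot` is also the kind of input that `apriori` from `mlxtend` works on.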


## Calculate support, confidence, completeness, lift, and leverage for the following rules.

| Antecedents         | Consequents        |
|---------------------|--------------------|
| `choice` is `T`     | `bar` is `F`       |
| `bar` is `T`        | `day` is `F`       |
| `day` is `T`        | `hungry` is `T`    |
| `hungry` is `F`     | `patron` is `some` |
| `patron` is `full`  | `price` is `$$$`   |
| `price` is `$`      | `rain` is `T`      |
| `rain` is `F`       | `booking` is `F`   |
| `booking` is `T`    | `type` is `swiss`  |
| `type` is `italian` | `time` is `20`     |
| `type` is `thai`    | `time` is `0`      |

In [4]:
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

te = TransactionEncoder()
te_ary = te.fit(x_train).transform(x_train)
df_new = pd.DataFrame(te_ary, columns=te.columns_)

frequent_itemsets: pd.DataFrame = apriori(df_new, min_support=0.000001, use_colnames=True).sort_values(by='support', ascending=False)
frequent_itemsets

Unnamed: 0,support,itemsets
13,0.416667,(r)
7,0.416667,(i)
0,0.333333,(a)
4,0.333333,(e)
10,0.333333,(n)
...,...,...
108,0.083333,"(o, b, i)"
109,0.083333,"(k, b, n)"
110,0.083333,"(o, k, b)"
111,0.083333,"(o, b, n)"


In [5]:
association_rules(frequent_itemsets, metric="confidence", min_threshold=0.000001)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(i),(e),0.416667,0.333333,0.250000,0.600000,1.8,0.111111,1.666667
1,(e),(i),0.333333,0.416667,0.250000,0.750000,1.8,0.111111,2.333333
2,(n),(r),0.333333,0.416667,0.250000,0.750000,1.8,0.111111,2.333333
3,(r),(n),0.416667,0.333333,0.250000,0.600000,1.8,0.111111,1.666667
4,(a),(r),0.333333,0.416667,0.250000,0.750000,1.8,0.111111,2.333333
...,...,...,...,...,...,...,...,...,...
2287,(n),"(g, h, y, u, r)",0.333333,0.083333,0.083333,0.250000,3.0,0.055556,1.222222
2288,(h),"(g, n, y, u, r)",0.166667,0.083333,0.083333,0.500000,6.0,0.069444,1.833333
2289,(y),"(g, n, h, u, r)",0.250000,0.083333,0.083333,0.333333,4.0,0.062500,1.375000
2290,(u),"(g, n, h, y, r)",0.083333,0.083333,0.083333,1.000000,12.0,0.076389,inf


`mlxtend` does not seem to offer a straightforward way to evaluate a custom list of rules, so I computed all possible rules above, with the intent of covering the ones given in the exercise.
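
The measures for a single given rule can also be computed directly from boolean masks, without enumerating every rule. The sketch below evaluates the first rule of the exercise (`choice` is `T` implies `bar` is `F`). It assumes only the ten rows displayed earlier (the full file may contain more, so the numbers may differ), and it takes completeness to be the share of consequent transactions that the rule covers. The helper `rule_measures` is a hypothetical name, not part of any library.

```python
import pandas as pd

# The `choice` and `bar` columns of the ten rows displayed earlier
# (assumption: the full CSV may contain more rows).
df = pd.DataFrame({
    "choice": list("TTTTTFFFFT"),
    "bar": list("FFTFFTTFTT"),
})

def rule_measures(df, antecedent, consequent):
    """Measures for `antecedent -> consequent`, each given as a (column, value) pair."""
    a = df[antecedent[0]] == antecedent[1]   # rows matching the antecedent
    c = df[consequent[0]] == consequent[1]   # rows matching the consequent
    p_a, p_c, p_ac = a.mean(), c.mean(), (a & c).mean()
    return {
        "support": p_ac,
        "confidence": p_ac / p_a,
        "completeness": p_ac / p_c,
        "lift": p_ac / (p_a * p_c),
        "leverage": p_ac - p_a * p_c,
    }

measures = rule_measures(df, ("choice", "T"), ("bar", "F"))
print(measures)
```

The sketch does not guard against an antecedent or consequent that never occurs; in that case the ratios are undefined.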

## Explain these measures.

| Measure      | Explanation                                                                                                                                        |
|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| Support      | The fraction of all transactions that contain every item in the rule (antecedent and consequent together).                                         |
| Confidence   | The fraction of transactions containing the antecedent that also contain the consequent.                                                           |
| Completeness | The fraction of transactions containing the consequent that are covered by the rule, i.e. that also contain the antecedent.                        |
| Lift         | The ratio of the rule's confidence to its expected confidence, which is the consequent's support if antecedent and consequent were independent.    |
| Leverage     | The difference between the observed frequency of `X` and `Y` appearing together and the frequency expected if they were statistically independent. |
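
Written as formulas, the explanations above can be summarised as follows. The notation is an assumption made here, not taken from the exercise: $P(\cdot)$ is the relative frequency among all transactions, and $X \Rightarrow Y$ is a rule with antecedent $X$ and consequent $Y$.

$$
\begin{aligned}
\operatorname{support}(X \Rightarrow Y) &= P(X, Y) \\
\operatorname{confidence}(X \Rightarrow Y) &= \frac{P(X, Y)}{P(X)} \\
\operatorname{completeness}(X \Rightarrow Y) &= \frac{P(X, Y)}{P(Y)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{confidence}(X \Rightarrow Y)}{P(Y)} = \frac{P(X, Y)}{P(X)\,P(Y)} \\
\operatorname{leverage}(X \Rightarrow Y) &= P(X, Y) - P(X)\,P(Y)
\end{aligned}
$$

As a check against the first rule in the `association_rules` output above, `(i)` implies `(e)`: confidence is $0.25 / 0.416667 \approx 0.6$, lift is $0.6 / 0.333333 \approx 1.8$, and leverage is $0.25 - 0.416667 \cdot 0.333333 \approx 0.111111$, matching the listed values. Completeness is not reported by `mlxtend`; by the formula it would be $0.25 / 0.333333 \approx 0.75$, which equals the confidence of the reversed rule.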


## Use the Apriori algorithm to find frequent item sets. We are only interested in item sets having a support value of at least 50%.

No item set in `df_new` reaches this threshold: the highest support in the sorted `apriori()` output above is about 0.417.
Nonetheless, here is the explicit computation:

In [6]:
apriori(df_new, min_support=0.5, use_colnames=True)

Unnamed: 0,support,itemsets
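
To illustrate how the level-wise search behind Apriori works, here is a plain-Python sketch. It is not `mlxtend`'s implementation. The function name `apriori_sketch` and the `attribute=value` item names are hypothetical, and the input uses only three attributes of the ten rows displayed earlier, so its result says nothing about `df_new` or the full file.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Level-wise search: a k-item set is kept only if all its (k-1)-subsets are frequent."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: unions of frequent sets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then check support.
        current = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent

# `choice`, `day`, and `hungry` of the ten rows displayed earlier (assumed excerpt).
choice, day, hungry = "TTTTTFFFFT", "FFFTTFFFTT", "TTFTFTFTFT"
transactions = [
    {f"choice={c}", f"day={d}", f"hungry={h}"}
    for c, d, h in zip(choice, day, hungry)
]

frequent = apriori_sketch(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```

Lowering `min_support` lets larger item sets through the join step, while the prune step still discards any candidate that has an infrequent subset.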
