# 3. ASSOCIATION RULES MINING (Milan dataset)

The goal of this notebook is to apply association rules mining to the Milan public establishments dataset.

Each establishment is treated as a transaction, composed of several categorical attributes such as:
- the type of establishment (`Tipo esercizio`),
- the form of commerce,
- the form of sale,
- the sector,
- the zone code (ZD).

Using the same functions as in the teacher's notebook (`TransactionEncoder`, `apriori`, `association_rules`), the aim is to discover patterns of the form:

X → Y

that say things like:

"When a place has these characteristics, it almost always has this other characteristic as well."

This is different from strict functional dependencies: here we are interested in frequent and high-confidence co-occurrences, not in rules that must hold for 100% of the rows.
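
The difference from a functional dependency can be made concrete with a small sketch. The toy transactions and item names below are invented for illustration (they are not taken from the Milan data), and the three helper functions simply follow the usual textbook definitions of support, confidence and lift.

```python
# Toy transactions; item names are illustrative only, not Milan values.
transactions = [
    {"type=bar", "sale=counter"},
    {"type=bar", "sale=counter"},
    {"type=bar", "sale=table"},
    {"type=restaurant", "sale=table"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the baseline frequency of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

X, Y = {"sale=counter"}, {"type=bar"}
print("support   ", support(X | Y, transactions))
print("conf X->Y ", confidence(X, Y, transactions))
print("conf Y->X ", confidence(Y, X, transactions))
print("lift      ", lift(X, Y, transactions))
```

In this toy data `sale=counter → type=bar` holds whenever the antecedent appears, while the reverse rule holds in only two cases out of three: still a useful association, but not a functional dependency.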


The libraries needed for association rules mining are imported.

The Milan dataset is loaded from the local CSV file using the semicolon as separator, as in the previous notebooks.


In [20]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

MILANO = pd.read_csv("Comune-di-Milano-Pubblici-esercizi(in)-2.csv", sep=";")
MILANO.head()


Unnamed: 0,þÿTipo esercizio storico pe,Insegna,Ubicazione,Tipo via,Descrizione via,Civico,Codice via,ZD,Forma commercio,Forma commercio prev,Forma vendita,Settore storico pe,Superficie somministrazione
0,,,ALZ NAVIGLIO GRANDE N. 12 ; isolato:057; (z.d. 6),ALZ,NAVIGLIO GRANDE,12,5144,6,,,,"Ristorante, trattoria, osteria;Genere Merceol....",83.0
1,,,ALZ NAVIGLIO GRANDE N. 44 (z.d. 6),ALZ,NAVIGLIO GRANDE,44,5144,6,,,,Bar gastronomici e simili,26.0
2,,,ALZ NAVIGLIO GRANDE N. 48 (z.d. 6),ALZ,NAVIGLIO GRANDE,48,5144,6,,,,Bar gastronomici e simili,58.0
3,,,ALZ NAVIGLIO GRANDE N. 8 (z.d. 6),ALZ,NAVIGLIO GRANDE,8,5144,6,,,,"BAR CAFFÿý E SIMILI;Ristorante, trattoria, ost...",101.0
4,,,ALZ NAVIGLIO PAVESE N. 24 (z.d. 6),ALZ,NAVIGLIO PAVESE,24,5161,6,,,,Bar gastronomici e simili,51.0


Association rules are usually defined on categorical data.  

From the Milan dataset a subset of attributes is selected that describes the characteristics of each public establishment and can be used as items in a transaction:
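- `þÿTipo esercizio storico pe`: main type of establishment (bar, restaurant, etc.); the `þÿ` prefix looks like a byte-order-mark artifact from the file encoding and is kept because it is part of the column name as loaded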
- `Forma commercio`: form of commerce
- `Forma vendita`: type of sale
- `Settore storico pe`: sector category
- `ZD`: zone code of the city

Each row in the dataset will become a transaction composed of these attribute-value pairs.


In [21]:
cols_for_rules = [
    "þÿTipo esercizio storico pe",
    "Forma commercio",
    "Forma vendita",
    "Settore storico pe",
    "ZD",
]

MILANO[cols_for_rules].head()



Unnamed: 0,þÿTipo esercizio storico pe,Forma commercio,Forma vendita,Settore storico pe,ZD
0,,,,"Ristorante, trattoria, osteria;Genere Merceol....",6
1,,,,Bar gastronomici e simili,6
2,,,,Bar gastronomici e simili,6
3,,,,"BAR CAFFÿý E SIMILI;Ristorante, trattoria, ost...",6
4,,,,Bar gastronomici e simili,6


For each establishment a list of items is created.

Each item is an attribute-value pair, for example:
- `þÿTipo esercizio storico pe=ristorante`
- `Forma vendita=al banco`
- `Settore storico pe=Bar gastronomici e simili`
- `ZD=6`

Missing values are ignored. The result is a list of transactions, where each transaction is a list of strings.


In [22]:
transactions = []

for _, row in MILANO[cols_for_rules].iterrows():
    items = []
    for col in cols_for_rules:
        value = row[col]
        if pd.isna(value):
            continue
        # Treat ZD as categorical but make sure it is a clean string
        if col == "ZD":
            value_str = str(int(value)) if not isinstance(value, str) else value
            items.append(f"{col}={value_str}")
        else:
            items.append(f"{col}={value}")
    if items:
        transactions.append(items)

len(transactions), transactions[:3]


(6904,
 [['Settore storico pe=Ristorante, trattoria, osteria;Genere Merceol.Autorizz.Sanit.;Ristorante',
   'ZD=6'],
  ['Settore storico pe=Bar gastronomici e simili', 'ZD=6'],
  ['Settore storico pe=Bar gastronomici e simili', 'ZD=6']])

The `TransactionEncoder` class from `mlxtend` is used to convert the list of transactions into a boolean matrix.

Each column corresponds to one possible item (for example `Forma vendita=al banco` or `ZD=6`).  
Each row corresponds to one establishment.  

A value `True` means that the item appears in that transaction, otherwise `False`.  
This is the input format required by the `apriori` function.
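
As a rough illustration of what this encoding step produces, the sketch below builds the same kind of boolean matrix in plain Python. It is not how `TransactionEncoder` is implemented; the three toy transactions are made up, and the columns are sorted only to make the result reproducible.

```python
# Toy transactions reusing item labels of the same form as the Milan items.
transactions = [
    ["Forma vendita=al banco", "ZD=6"],
    ["ZD=6"],
    ["Forma vendita=al tavolo", "ZD=1"],
]

# One column per distinct item.
columns = sorted({item for t in transactions for item in t})

# One boolean row per transaction: True if the item occurs in it.
matrix = [[item in set(t) for item in columns] for t in transactions]

print(columns)
for row in matrix:
    print(row)

# The mean of a boolean column is the fraction of transactions containing the item.
column_support = {
    col: sum(row[j] for row in matrix) / len(matrix)
    for j, col in enumerate(columns)
}
print(column_support["ZD=6"])
```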



In [23]:
TE = TransactionEncoder()
array = TE.fit(transactions).transform(transactions)

basket = pd.DataFrame(array, columns=TE.columns_)
basket.head()


Unnamed: 0,Forma commercio=solo somministrazione,Forma commercio=somministrazione/minuto,Forma vendita=al banco,Forma vendita=al tavolo,Forma vendita=misto,Forma vendita=self service,Settore storico pe=BAR CAFFE' GELATERIA,Settore storico pe=BAR CAFFÿý,Settore storico pe=BAR CAFFÿý E SIMILI,Settore storico pe=BAR CAFFÿý E SIMILI;BAR CAFFÿý,...,þÿTipo esercizio storico pe=prodotti di gastronomia,þÿTipo esercizio storico pe=ristorante,"þÿTipo esercizio storico pe=ristorante, trattoria, osteria","þÿTipo esercizio storico pe=sale da ballo, locali notturni",þÿTipo esercizio storico pe=spaccio bevande analcoliche,"þÿTipo esercizio storico pe=tav.calde,self service,fast f.",þÿTipo esercizio storico pe=tavola calda,þÿTipo esercizio storico pe=tavola fredda,þÿTipo esercizio storico pe=trattoria,"þÿTipo esercizio storico pe=wine,birr.,pub enot.,caff.,the"
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


Before running Apriori, it is useful to have a quick idea of how frequent the main items are.

The support of each single item is computed by taking the mean of each boolean column. This represents the fraction of establishments where that item appears.



In [24]:
item_support = basket.mean().sort_values(ascending=False)
item_support.head(20)


Forma commercio=solo somministrazione                0.689890
þÿTipo esercizio storico pe=bar caffÿý               0.474218
Forma vendita=misto                                  0.320394
Forma vendita=al banco                               0.308951
ZD=1                                                 0.193366
Forma vendita=al tavolo                              0.157445
ZD=3                                                 0.128042
ZD=9                                                 0.117903
þÿTipo esercizio storico pe=ristorante               0.116889
ZD=4                                                 0.105012
ZD=2                                                 0.102984
ZD=8                                                 0.098349
ZD=6                                                 0.090817
ZD=5                                                 0.084009
Forma commercio=somministrazione/minuto              0.082561
þÿTipo esercizio storico pe=trattoria                0.080243
ZD=7    

The `apriori` function from `mlxtend.frequent_patterns` is applied to the boolean transaction matrix.

A minimum support threshold is chosen, for example 0.03, which means that only itemsets that appear in at least 3% of the establishments are kept.

The result is a DataFrame where each row represents a frequent itemset together with its support.  
An additional column `length` is created to store the size of each itemset.
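
To show what the minimum support threshold does, here is a minimal level-wise Apriori sketch in plain Python. It is a teaching version under simplifying assumptions (it rescans all transactions for every candidate), not the `mlxtend` implementation; `apriori_sketch` and the toy data are hypothetical names introduced here.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    rows = [frozenset(t) for t in transactions]
    n = len(rows)
    frequent = {}
    # Level 1 candidates: every single item.
    level = {frozenset([item]) for row in rows for item in row}
    k = 1
    while level:
        current = {}
        for candidate in level:
            supp = sum(candidate <= row for row in rows) / n
            if supp >= min_support:
                current[candidate] = supp
        frequent.update(current)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        level = {
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent

toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
result = apriori_sketch(toy, min_support=0.5)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp, len(itemset))  # itemset, support, length
```

With a threshold of 0.5 the pair `{b, c}` (support 0.25) is discarded, and the triple `{a, b, c}` is never counted because one of its subsets is already infrequent.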


In [25]:
freq_itemsets = apriori(basket, min_support=0.03, use_colnames=True)
freq_itemsets["length"] = freq_itemsets["itemsets"].apply(len)

freq_itemsets.sort_values("support", ascending=False).head(20)



Unnamed: 0,support,itemsets,length
0,0.68989,(Forma commercio=solo somministrazione),1
15,0.474218,(þÿTipo esercizio storico pe=bar caffÿý),1
31,0.390353,"(Forma commercio=solo somministrazione, þÿTipo...",2
4,0.320394,(Forma vendita=misto),1
2,0.308951,(Forma vendita=al banco),1
21,0.286211,"(Forma commercio=solo somministrazione, Forma ...",2
43,0.265643,"(Forma vendita=al banco, þÿTipo esercizio stor...",2
19,0.243192,"(Forma vendita=al banco, Forma commercio=solo ...",2
68,0.210313,"(Forma vendita=al banco, Forma commercio=solo ...",3
53,0.193656,"(þÿTipo esercizio storico pe=bar caffÿý, Forma...",2


Association rules are generated from the frequent itemsets using the `association_rules` function.

The chosen metric is `confidence` and the minimum confidence threshold is set to 0.5.  
This means that only rules X → Y that are correct in at least 50% of the cases where X appears are kept.

The resulting DataFrame contains, for each rule:
- the antecedent (left-hand side)
- the consequent (right-hand side)
- the support
- the confidence
- the lift and other interestingness measures.
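
The rule-generation step can be sketched as follows. This is a simplified illustration, not the `association_rules` implementation: `rules_from_itemsets` is a hypothetical helper, the item labels are abbreviated, and the three support values are copied from the frequent itemset table above (counter service, bar caffè, and the two together).

```python
from itertools import combinations

def rules_from_itemsets(supports, min_confidence):
    """supports maps frozenset -> support and must include all subsets."""
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset is tried as antecedent.
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                antecedent = frozenset(ante)
                consequent = itemset - antecedent
                conf = supp / supports[antecedent]
                if conf >= min_confidence:
                    rules.append({
                        "antecedent": antecedent,
                        "consequent": consequent,
                        "support": supp,
                        "confidence": conf,
                        "lift": conf / supports[consequent],
                    })
    return rules

# Abbreviated labels; supports taken from the tables in this notebook.
supports = {
    frozenset({"al banco"}): 0.308951,
    frozenset({"bar caffè"}): 0.474218,
    frozenset({"al banco", "bar caffè"}): 0.265643,
}

found = rules_from_itemsets(supports, min_confidence=0.5)
for rule in sorted(found, key=lambda r: r["confidence"], reverse=True):
    print(sorted(rule["antecedent"]), "->", sorted(rule["consequent"]),
          round(rule["confidence"], 3), round(rule["lift"], 3))
```

Both directions pass the 0.5 threshold here, but with different confidence, while the lift is the same in both directions because it is symmetric.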



In [26]:
rules = association_rules(freq_itemsets, metric="confidence", min_threshold=0.5)
rules.head()


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(Forma vendita=al banco),(Forma commercio=solo somministrazione),0.308951,0.68989,0.243192,0.787154,1.140985,1.0,0.03005,1.456971,0.178807,0.321832,0.313645,0.569832
1,(Forma vendita=al tavolo),(Forma commercio=solo somministrazione),0.157445,0.68989,0.149768,0.951242,1.378831,1.0,0.041149,6.360182,0.326089,0.214701,0.842772,0.584166
2,(Forma vendita=misto),(Forma commercio=solo somministrazione),0.320394,0.68989,0.286211,0.893309,1.294858,1.0,0.065174,2.906625,0.335068,0.395279,0.655958,0.654087
3,(ZD=1),(Forma commercio=solo somministrazione),0.193366,0.68989,0.135139,0.698876,1.013026,1.0,0.001738,1.029843,0.015941,0.180639,0.028978,0.447381
4,(ZD=2),(Forma commercio=solo somministrazione),0.102984,0.68989,0.073146,0.710267,1.029537,1.0,0.002099,1.070331,0.031983,0.10163,0.06571,0.408146


The rules are sorted by confidence and lift to highlight the strongest and most interesting associations.

In particular, rules with:
- high confidence (close to 1) mean that the consequent almost always appears when the antecedent appears;
- high lift (greater than 1) indicate a positive association between antecedent and consequent beyond simple chance.
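
As a sanity check, the main measures can be recomputed by hand from three supports. The sketch below uses the values reported above for the rule `Forma vendita=al tavolo → Forma commercio=solo somministrazione` and the standard definitions of confidence, lift, leverage and conviction; small differences from the table are expected because the inputs are already rounded.

```python
antecedent_support = 0.157445   # Forma vendita=al tavolo
consequent_support = 0.68989    # Forma commercio=solo somministrazione
rule_support = 0.149768         # both items together

# Share of antecedent transactions that also contain the consequent.
confidence = rule_support / antecedent_support
# How much the antecedent raises the frequency of the consequent over its baseline.
lift = confidence / consequent_support
# Observed joint support minus the support expected under independence.
leverage = rule_support - antecedent_support * consequent_support
# Ratio between expected and observed frequency of the rule being wrong.
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.4f} lift={lift:.4f} "
      f"leverage={leverage:.4f} conviction={conviction:.3f}")
```

The lift is only about 1.38 despite the confidence of 0.95 because the consequent is already present in 69% of all establishments.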


In [27]:
rules_sorted = rules.sort_values(["confidence", "lift"], ascending=False)
rules_sorted[["antecedents", "consequents", "support", "confidence", "lift"]].head(20)



Unnamed: 0,antecedents,consequents,support,confidence,lift
31,"(Forma vendita=al tavolo, þÿTipo esercizio sto...",(Forma commercio=solo somministrazione),0.070539,0.958661,1.389586
15,(þÿTipo esercizio storico pe=ristorante),(Forma commercio=solo somministrazione),0.111385,0.952912,1.381252
1,(Forma vendita=al tavolo),(Forma commercio=solo somministrazione),0.149768,0.951242,1.378831
34,"(þÿTipo esercizio storico pe=trattoria, Forma ...",(Forma commercio=solo somministrazione),0.039253,0.940972,1.363945
38,"(Forma vendita=misto, ZD=4)",(Forma commercio=solo somministrazione),0.03259,0.925926,1.342136
16,(þÿTipo esercizio storico pe=trattoria),(Forma commercio=solo somministrazione),0.074015,0.922383,1.337
37,"(Forma vendita=misto, ZD=3)",(Forma commercio=solo somministrazione),0.037514,0.896194,1.299039
2,(Forma vendita=misto),(Forma commercio=solo somministrazione),0.286211,0.893309,1.294858
14,(þÿTipo esercizio storico pe=pizzeria),(Forma commercio=solo somministrazione),0.045046,0.888571,1.28799
36,"(Forma vendita=misto, ZD=1)",(Forma commercio=solo somministrazione),0.057503,0.884187,1.281635


For interpretation it is easier to focus on rules where the antecedent contains only one or two items and the consequent is a single item.

The following filter keeps rules with:
- antecedent length ≤ 2
- consequent length = 1
- confidence ≥ 0.7

These rules describe clear and strong patterns such as:

"If a place has this type of establishment and this form of sale, then it almost always has this form of commerce."


In [28]:
filtered_rules = rules.copy()
# antecedents and consequents are frozensets, so len() gives the number of items
filtered_rules["ante_len"] = filtered_rules["antecedents"].apply(len)
filtered_rules["cons_len"] = filtered_rules["consequents"].apply(len)

simple_rules = filtered_rules[
    (filtered_rules["ante_len"] <= 2)
    & (filtered_rules["cons_len"] == 1)
    & (filtered_rules["confidence"] >= 0.7)
]

simple_rules[["antecedents", "consequents", "support", "confidence", "lift"]] \
    .sort_values(["confidence", "lift"], ascending=False) \
    .head(20)


Unnamed: 0,antecedents,consequents,support,confidence,lift
31,"(Forma vendita=al tavolo, þÿTipo esercizio sto...",(Forma commercio=solo somministrazione),0.070539,0.958661,1.389586
15,(þÿTipo esercizio storico pe=ristorante),(Forma commercio=solo somministrazione),0.111385,0.952912,1.381252
1,(Forma vendita=al tavolo),(Forma commercio=solo somministrazione),0.149768,0.951242,1.378831
34,"(þÿTipo esercizio storico pe=trattoria, Forma ...",(Forma commercio=solo somministrazione),0.039253,0.940972,1.363945
38,"(Forma vendita=misto, ZD=4)",(Forma commercio=solo somministrazione),0.03259,0.925926,1.342136
16,(þÿTipo esercizio storico pe=trattoria),(Forma commercio=solo somministrazione),0.074015,0.922383,1.337
37,"(Forma vendita=misto, ZD=3)",(Forma commercio=solo somministrazione),0.037514,0.896194,1.299039
2,(Forma vendita=misto),(Forma commercio=solo somministrazione),0.286211,0.893309,1.294858
14,(þÿTipo esercizio storico pe=pizzeria),(Forma commercio=solo somministrazione),0.045046,0.888571,1.28799
36,"(Forma vendita=misto, ZD=1)",(Forma commercio=solo somministrazione),0.057503,0.884187,1.281635


### Interpretation and connection to the project

The association rules discovered in this notebook describe typical co-occurrence patterns between the attributes:

- `þÿTipo esercizio storico pe`
- `Forma commercio`
- `Forma vendita`
- `Settore storico pe`
- `ZD`

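For example, the top-ranked rules show that table service and restaurant-type establishments (ristorante, trattoria, pizzeria) are associated with the form of commerce “solo somministrazione” in roughly 89% to 96% of cases. In the rules displayed above, zones (`ZD`) appear only in antecedents, alone or combined with a form of sale, and no `Settore storico pe` item appears among the top-ranked rules.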

The goal is not to enforce strict constraints (as in functional dependencies), but to understand which combinations of characteristics are common and stable in the Milan public establishments data.

This complements the data profiling and functional dependencies notebooks:

- data profiling describes distributions and basic statistics for each attribute;
- functional dependencies check strict keys and address consistency;
- association rules reveal frequent patterns such as "if a place has these features, it almost always has this other feature too".

All three perspectives together provide a richer understanding of the quality and structure of the Milan dataset.


### Extra libraries for alternative algorithms

In addition to the Apriori implementation from `mlxtend`, three further tools will be demonstrated on the same Milan transactions:

- `efficient-apriori`
- `ECLAT` from `pyECLAT`
- `fpgrowth` from `mlxtend.frequent_patterns`

All of them will use exactly the same transaction data already built from the Milan dataset.


In [29]:
%pip install efficient-apriori pyECLAT


Note: you may need to restart the kernel to use updated packages.


### 3.1 Association rules with `efficient-apriori`

The `efficient-apriori` library is another implementation of the Apriori algorithm.  
It works directly on the list of transactions, without converting them to a boolean DataFrame.

The same `transactions` built from the Milan public establishments are used here.  
A minimum support of 0.03 and a minimum confidence of 0.5 are applied, as in the previous Apriori example.


In [30]:
from efficient_apriori import apriori as eff_apriori

itemsets_ea, rules_ea = eff_apriori(
    transactions,
    min_support=0.03,
    min_confidence=0.5,
)

len(itemsets_ea), len(rules_ea)


(4, 76)
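
The pair `(4, 76)` is easy to misread. As far as the library's README describes it, the first object returned by `efficient-apriori` is a dictionary keyed by itemset size, whose values map itemset tuples to absolute counts, so `4` would be the number of itemset sizes found (1 to 4 items), not the number of itemsets, while `76` is the number of rules. The sketch below uses a hand-written dictionary of that assumed shape (not real output) to show how to count and flatten it.

```python
# Hypothetical result in the assumed shape {size: {itemset_tuple: absolute_count}}.
itemsets_by_size = {
    1: {("a",): 3, ("b",): 3, ("c",): 2},
    2: {("a", "b"): 2, ("a", "c"): 2},
}
n_transactions = 4

n_sizes = len(itemsets_by_size)
n_itemsets = sum(len(group) for group in itemsets_by_size.values())
print(n_sizes, n_itemsets)

# Flatten into {itemset: relative support}, comparable with a support column.
flat_support = {
    itemset: count / n_transactions
    for group in itemsets_by_size.values()
    for itemset, count in group.items()
}
print(flat_support[("a", "b")])
```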

The rules discovered by `efficient-apriori` can be inspected to see some of the strongest patterns in the Milan data.


In [31]:
# Show the first 20 rules sorted by confidence (descending)
rules_ea_sorted = sorted(rules_ea, key=lambda r: (r.confidence, r.lift), reverse=True)

for r in rules_ea_sorted[:20]:
    print(r)


{Forma vendita=al tavolo, þÿTipo esercizio storico pe=ristorante} -> {Forma commercio=solo somministrazione} (conf: 0.959, supp: 0.071, lift: 1.390, conv: 7.502)
{þÿTipo esercizio storico pe=ristorante} -> {Forma commercio=solo somministrazione} (conf: 0.953, supp: 0.111, lift: 1.381, conv: 6.586)
{Forma vendita=al tavolo} -> {Forma commercio=solo somministrazione} (conf: 0.951, supp: 0.150, lift: 1.379, conv: 6.360)
{Forma vendita=al tavolo, þÿTipo esercizio storico pe=trattoria} -> {Forma commercio=solo somministrazione} (conf: 0.941, supp: 0.039, lift: 1.364, conv: 5.254)
{Forma vendita=misto, ZD=4} -> {Forma commercio=solo somministrazione} (conf: 0.926, supp: 0.033, lift: 1.342, conv: 4.186)
{þÿTipo esercizio storico pe=trattoria} -> {Forma commercio=solo somministrazione} (conf: 0.922, supp: 0.074, lift: 1.337, conv: 3.995)
{Forma vendita=misto, ZD=3} -> {Forma commercio=solo somministrazione} (conf: 0.896, supp: 0.038, lift: 1.299, conv: 2.987)
{Forma vendita=misto} -> {Forma co

### 3.2 Frequent itemsets with ECLAT (pyECLAT)

`pyECLAT` expects a transactional DataFrame where each row is a transaction and each column is an item position (0, 1, 2, ...), with the cell values equal to the item names.

For this reason, the original list of Milan transactions is converted into a wide DataFrame with integer column names, padding shorter transactions with `None`, and this DataFrame is passed to `ECLAT`. The same minimum support (0.03) is used, and the search is restricted to pairs of items (`min_combination = max_combination = 2`).
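
The idea behind ECLAT is to store, for each item, the set of transaction ids that contain it, and to obtain the support of a pair by intersecting two such sets. The following plain-Python sketch illustrates that idea for pairs only; `eclat_pairs` and the toy data are hypothetical and do not reflect how `pyECLAT` is implemented.

```python
from itertools import combinations

def eclat_pairs(transactions, min_support):
    """Return {(item_a, item_b): support} for frequent pairs via tid-set intersection."""
    n = len(transactions)
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    # Only frequent single items can be part of a frequent pair.
    frequent = {i: t for i, t in tidsets.items() if len(t) / n >= min_support}
    pairs = {}
    for a, b in combinations(sorted(frequent), 2):
        common = frequent[a] & frequent[b]
        if len(common) / n >= min_support:
            pairs[(a, b)] = len(common) / n
    return pairs

toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
pairs = eclat_pairs(toy, min_support=0.5)
print(pairs)
```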


In [33]:
from pyECLAT import ECLAT

# build a "wide" transactional DataFrame like in the teacher's MARKETBASKET example
max_len = max(len(t) for t in transactions)
padded_transactions = [t + [None] * (max_len - len(t)) for t in transactions]

df_eclat = pd.DataFrame(padded_transactions)
df_eclat.head()

# run ECLAT on this wide transactional table
eclat = ECLAT(data=df_eclat)

min_support = 0.03
min_combination = 2
max_combination = 2

rule_indices, rule_supports = eclat.fit(
    min_support=min_support,
    min_combination=min_combination,
    max_combination=max_combination,
    separator=" & ",
    verbose=True,
)



Combination 2 by 2


171it [02:04,  1.38it/s]


The frequent itemsets found by ECLAT are converted into a DataFrame and sorted by support.  
They represent combinations of attributes that appear frequently together in the Milan establishments.


In [34]:
result_eclat = pd.DataFrame(rule_supports.items(), columns=["Itemset", "Support"])
result_eclat.sort_values("Support", ascending=False).head(20)


Unnamed: 0,Itemset,Support
19,Forma commercio=solo somministrazione & þÿTipo...,0.390353
14,Forma commercio=solo somministrazione & Forma ...,0.286211
45,þÿTipo esercizio storico pe=bar caffÿý & Forma...,0.265643
20,Forma commercio=solo somministrazione & Forma ...,0.243192
39,Forma vendita=misto & þÿTipo esercizio storico...,0.193656
7,Forma commercio=solo somministrazione & Forma ...,0.149768
8,Forma commercio=solo somministrazione & ZD=1,0.135139
15,Forma commercio=solo somministrazione & þÿTipo...,0.111385
26,ZD=1 & þÿTipo esercizio storico pe=bar caffÿý,0.092121
11,Forma commercio=solo somministrazione & ZD=3,0.090962


### 3.3 Frequent itemsets and rules with FP-Growth

FP-Growth is another algorithm for frequent pattern mining.  
It compresses the transaction database into a compact FP-tree and mines frequent itemsets without generating all candidates explicitly.
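
To give an intuition for the compression, the sketch below builds only the FP-tree (the mining step is omitted) for a few toy transactions. `Node`, `build_fp_tree` and `count_nodes` are hypothetical helpers written for this illustration, not part of `mlxtend`.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, a count, and children keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # First pass: count items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    keep = {item for item, c in counts.items() if c >= min_count}
    root = Node(None)
    # Second pass: insert each transaction with items ordered by decreasing
    # frequency (ties broken alphabetically), so that common prefixes are shared.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in keep),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root

def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())

toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
root = build_fp_tree(toy, min_count=2)
total_items = sum(len(t) for t in toy)
print(count_nodes(root), "nodes for", total_items, "item occurrences")
```

Transactions that start with the same frequent items share a path, which is why the tree needs fewer nodes than there are item occurrences.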

The `fpgrowth` function from `mlxtend.frequent_patterns` is applied to the same `basket` DataFrame built from the Milan dataset, this time with a minimum support of 0.05 rather than 0.03, so itemsets with support between 3% and 5% are not returned.  
The resulting frequent itemsets are then used to generate association rules with the same `association_rules` function as before.


In [35]:
from mlxtend.frequent_patterns import fpgrowth

freq_itemsets_fp = fpgrowth(basket, min_support=0.05, use_colnames=True)
freq_itemsets_fp.head()


Unnamed: 0,support,itemsets
0,0.090817,(ZD=6)
1,0.193366,(ZD=1)
2,0.079519,(ZD=7)
3,0.102984,(ZD=2)
4,0.128042,(ZD=3)


Association rules are extracted from the FP-Growth frequent itemsets using a minimum confidence of 0.5.  
The rules are then sorted by confidence and lift, as before.


In [36]:
rules_fp = association_rules(freq_itemsets_fp, metric="confidence", min_threshold=0.5)

rules_fp_sorted = rules_fp.sort_values(
    ["confidence", "lift"],
    ascending=False
)[["antecedents", "consequents", "support", "confidence", "lift"]]

rules_fp_sorted.head(20)


Unnamed: 0,antecedents,consequents,support,confidence,lift
33,"(Forma vendita=al tavolo, þÿTipo esercizio sto...",(Forma commercio=solo somministrazione),0.070539,0.958661,1.389586
31,(þÿTipo esercizio storico pe=ristorante),(Forma commercio=solo somministrazione),0.111385,0.952912,1.381252
28,(Forma vendita=al tavolo),(Forma commercio=solo somministrazione),0.149768,0.951242,1.378831
36,(þÿTipo esercizio storico pe=trattoria),(Forma commercio=solo somministrazione),0.074015,0.922383,1.337
14,(Forma vendita=misto),(Forma commercio=solo somministrazione),0.286211,0.893309,1.294858
2,"(Forma vendita=misto, ZD=1)",(Forma commercio=solo somministrazione),0.057503,0.884187,1.281635
17,"(Forma vendita=misto, þÿTipo esercizio storico...",(Forma commercio=solo somministrazione),0.17106,0.883321,1.280379
22,"(Forma vendita=al banco, Forma commercio=solo ...",(þÿTipo esercizio storico pe=bar caffÿý),0.210313,0.8648,1.823635
20,(Forma vendita=al banco),(þÿTipo esercizio storico pe=bar caffÿý),0.265643,0.859822,1.813137
4,"(þÿTipo esercizio storico pe=bar caffÿý, ZD=1)",(Forma commercio=solo somministrazione),0.077781,0.84434,1.223876


### 4. Summary of the algorithms used

On the Milan public establishments dataset several association rule and frequent itemset mining algorithms have been applied:

- `apriori` from `mlxtend.frequent_patterns`, using the boolean `basket` matrix
- `efficient-apriori`, working directly on the list of Milan transactions
- `ECLAT` from `pyECLAT`, using a wide transactional DataFrame (`df_eclat`) built from the same list of transactions
- `fpgrowth` from `mlxtend.frequent_patterns`, again on the `basket` matrix

The item frequency table shows that the Milan dataset is dominated by places whose form of commerce is “solo somministrazione” (about 69% of all establishments). Almost half of the records (about 47%) are “bar caffè”, and the most common forms of sale are “misto” (32%) and “al banco” (31%), followed by “al tavolo” (16%). Among the city zones, ZD=1 is the most represented (about 19%), while each of the other eight zones (ZD=2 to ZD=9) accounts for roughly 8% to 13% of the records. This already suggests a city where bars and similar venues, mainly focused on consumption on site, are very common and spread across all zones.

The frequent itemsets confirm this picture. Most of the top combinations involve “Forma commercio=solo somministrazione” together with either “bar caffè” or one of the main forms of sale (“misto”, “al banco”, “al tavolo”); the main exceptions are the pairs (al banco, bar caffè) and (misto, bar caffè), with support of about 27% and 19%. For example, the pair (solo somministrazione, bar caffè) appears in about 39% of all establishments, and the combinations (solo somministrazione, forma vendita=misto) and (solo somministrazione, forma vendita=al banco) have support of about 29% and 24%. This means that the typical public establishment in the dataset is a bar or similar place where people consume on site, often at the counter or with a mix of table and counter service.

The association rules extracted from these itemsets make this pattern more explicit. The rules “Forma vendita=al tavolo → Forma commercio=solo somministrazione” and “Forma vendita=misto → Forma commercio=solo somministrazione” have confidence of about 0.95 and 0.89 and lift of about 1.38 and 1.29. This means that almost every place that serves at the table or with a mixed service is classified as “solo somministrazione”. The lift is moderate rather than large because “solo somministrazione” already covers 69% of the records, so the antecedent raises its share from 69% to roughly 89–95%. A similar interpretation holds for restaurants, trattorie and pizzerie: the rules “ristorante → solo somministrazione” and “trattoria → solo somministrazione” have confidence above 0.92, and “pizzeria → solo somministrazione” about 0.89, showing that these types of establishment are almost always pure consumption venues in the dataset.

Another group of rules highlights the strong link between bar-type places and counter service. Rules like “Forma vendita=al banco → bar caffè” or “Forma vendita=al banco AND solo somministrazione → bar caffè” have confidence around 0.86–0.87 and lift close to 1.8. This shows that if a place mainly sells at the counter, it is very likely to be classified as a bar caffè, and this association is much stronger than the baseline probability of being a bar. These rules capture a very intuitive business pattern: counter service plus pure consumption strongly characterise the bar category in Milan.

The additional algorithms confirm the same structure. The `efficient-apriori` implementation recovers the same top rules as `mlxtend.apriori`, with support, confidence and lift values that are identical up to rounding. ECLAT, applied to the same Milan transactions, finds the same frequent pairs, with (solo somministrazione, bar caffè), (solo somministrazione, forma vendita=misto) and (solo somministrazione, forma vendita=al banco) among the most common itemsets. FP-Growth was run on the same boolean basket but with a higher minimum support (0.05 instead of 0.03), so rules whose support lies between 0.03 and 0.05, such as “al tavolo AND trattoria → solo somministrazione” or “misto AND ZD=4 → solo somministrazione”, do not appear in its output; the rules that remain have the same support, confidence and lift as in Apriori. Overall, all four methods agree on the main message: in the Milan public establishments dataset, bars and similar venues with on-site consumption dominate, and there are strong and consistent associations between type of establishment, form of sale and form of commerce.
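
Since FP-Growth was run with `min_support=0.05` and Apriori with `min_support=0.03`, the two rule lists are only comparable after aligning the threshold. The sketch below takes the ten top Apriori rules listed earlier (antecedent labels abbreviated, supports copied from that table, all with consequent “solo somministrazione”) and checks which of them can survive a 0.05 threshold.

```python
# (abbreviated antecedent, rule support) for the ten top Apriori rules shown earlier.
apriori_top = [
    ("al tavolo + ristorante", 0.070539),
    ("ristorante", 0.111385),
    ("al tavolo", 0.149768),
    ("al tavolo + trattoria", 0.039253),
    ("misto + ZD=4", 0.03259),
    ("trattoria", 0.074015),
    ("misto + ZD=3", 0.037514),
    ("misto", 0.286211),
    ("pizzeria", 0.045046),
    ("misto + ZD=1", 0.057503),
]

fp_min_support = 0.05  # threshold used in the FP-Growth cell

# A rule can only be produced if its full itemset reaches the minimum support.
kept = [name for name, supp in apriori_top if supp >= fp_min_support]
dropped = [name for name, supp in apriori_top if supp < fp_min_support]

print("kept:   ", kept)
print("dropped:", dropped)
```

The six rules that are kept are the first six rules with this consequent in the FP-Growth table; the four dropped ones are missing there only because of the higher threshold.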

