
### scikit-learn
Library website: [https://scikit-learn.org](https://scikit-learn.org)  

Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)

The core machine learning library for Python.

To install scikit-learn, run the command below:
```
!pip install scikit-learn
```
To upgrade scikit-learn to the latest version, run the command below:
```
!pip install --upgrade scikit-learn
```
This course was built on version `0.22.1`.

### Table of contents:
1. [Importing libraries](#0)
2. [Generating the data](#1)
3. [Preparing the data](#2)
4. [The Apriori algorithm](#3)




### <a name='0'></a> Importing libraries

In [None]:
import pandas as pd
import numpy as np

### <a name='1'></a> Generating the data

In [None]:
data = {'produkty': ['chleb jajka mleko', 'mleko ser', 'chleb masło ser', 'chleb jajka']}

transactions = pd.DataFrame(data=data, index=[1, 2, 3, 4])
transactions

Unnamed: 0,produkty
1,chleb jajka mleko
2,mleko ser
3,chleb masło ser
4,chleb jajka


### <a name='2'></a> Preparing the data

In [None]:
# expand the column into a DataFrame (one product per column)
expand = transactions['produkty'].str.split(expand=True)
expand

Unnamed: 0,0,1,2
1,chleb,jajka,mleko
2,mleko,ser,
3,chleb,masło,ser
4,chleb,jajka,


In [None]:
# extract the names of all products
products = []
for col in expand.columns:
    for product in expand[col].unique():
        if product is not None and product not in products:
            products.append(product)

products.sort()
print(products)

['chleb', 'jajka', 'masło', 'mleko', 'ser']
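
The same product list can be built more compactly. The snippet below is a minimal sketch, assuming each basket is a whitespace-separated string as in the `produkty` column; the data is re-created here only so that the snippet runs on its own.

```python
import pandas as pd

transactions = pd.DataFrame(
    {'produkty': ['chleb jajka mleko', 'mleko ser', 'chleb masło ser', 'chleb jajka']},
    index=[1, 2, 3, 4],
)

# a set comprehension collects every distinct product across all baskets
products = sorted({product
                   for basket in transactions['produkty']
                   for product in basket.split()})
print(products)
```

Working on the raw strings avoids the `None` padding that `str.split(expand=True)` introduces for shorter baskets.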


In [None]:
transactions_encoded = np.zeros((len(transactions), len(products)), dtype='int8')
transactions_encoded

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]], dtype=int8)

In [None]:
# 0-1 encoding: set 1 wherever a transaction contains a given product
for row_idx, basket in enumerate(expand.values):
    for col_idx, product in enumerate(products):
        if product in basket:
            transactions_encoded[row_idx, col_idx] = 1

transactions_encoded

array([[1, 1, 0, 1, 0],
       [0, 0, 0, 1, 1],
       [1, 0, 1, 0, 1],
       [1, 1, 0, 0, 0]], dtype=int8)

In [None]:
transactions_encoded_df = pd.DataFrame(transactions_encoded, columns=products)
transactions_encoded_df

Unnamed: 0,chleb,jajka,masło,mleko,ser
0,1,1,0,1,0
1,0,0,0,1,1
2,1,0,1,0,1
3,1,1,0,0,0
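
The manual encoding steps above can also be expressed with a single pandas call. This is a sketch of an alternative, not the approach used in this notebook, and it assumes that products are separated by single spaces. Note that the result keeps the original index `1..4` instead of `0..3`.

```python
import pandas as pd

transactions = pd.DataFrame(
    {'produkty': ['chleb jajka mleko', 'mleko ser', 'chleb masło ser', 'chleb jajka']},
    index=[1, 2, 3, 4],
)

# str.get_dummies splits every string on the separator and returns
# one 0/1 indicator column per distinct token
encoded = transactions['produkty'].str.get_dummies(sep=' ')
print(encoded)
```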


### <a name='3'></a> The Apriori algorithm

In [None]:
from mlxtend.frequent_patterns import apriori, association_rules

supports = apriori(transactions_encoded_df, min_support=0.0, use_colnames=True)
supports

Unnamed: 0,support,itemsets
0,0.75,(chleb)
1,0.5,(jajka)
2,0.25,(masło)
3,0.5,(mleko)
4,0.5,(ser)
5,0.5,"(chleb, jajka)"
6,0.25,"(chleb, masło)"
7,0.25,"(mleko, chleb)"
8,0.25,"(ser, chleb)"
9,0.0,"(masło, jajka)"
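
The `support` column is the fraction of transactions that contain every item of a given itemset. The sketch below recomputes a few of the values by hand from the 0-1 encoded table; the helper name `support` is hypothetical and is not part of mlxtend.

```python
import pandas as pd

# the 0-1 encoded transactions from the previous section
encoded = pd.DataFrame(
    [[1, 1, 0, 1, 0],
     [0, 0, 0, 1, 1],
     [1, 0, 1, 0, 1],
     [1, 1, 0, 0, 0]],
    columns=['chleb', 'jajka', 'masło', 'mleko', 'ser'],
)

def support(df, itemset):
    """Fraction of transactions (rows) that contain every item in `itemset`."""
    return df[list(itemset)].all(axis=1).mean()

print(support(encoded, ['chleb']))
print(support(encoded, ['chleb', 'jajka']))
print(support(encoded, ['masło', 'jajka']))
```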


In [None]:
supports = apriori(transactions_encoded_df, min_support=0.3, use_colnames=True)
supports

Unnamed: 0,support,itemsets
0,0.75,(chleb)
1,0.5,(jajka)
2,0.5,(mleko)
3,0.5,(ser)
4,0.5,"(chleb, jajka)"
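
With `min_support=0.3`, `masło` drops out, and so does every itemset containing it. This is the idea behind Apriori: a set can only be frequent if all of its subsets are frequent. The pure-Python sketch below illustrates that level-wise search; it is a simplified illustration, not mlxtend's implementation. The function name `frequent_itemsets` is hypothetical, and the `>=` comparison against the threshold is an assumption.

```python
from itertools import combinations

baskets = [
    {'chleb', 'jajka', 'mleko'},
    {'mleko', 'ser'},
    {'chleb', 'masło', 'ser'},
    {'chleb', 'jajka'},
]

def frequent_itemsets(baskets, min_support):
    """Level-wise search: size-k candidates come only from frequent size-(k-1) sets."""
    n = len(baskets)
    items = sorted({item for basket in baskets for item in basket})
    candidates = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while candidates:
        # count the support of each candidate and keep the frequent ones
        level = {}
        for cand in candidates:
            sup = sum(cand <= basket for basket in baskets) / n
            if sup >= min_support:
                level[cand] = sup
        result.update(level)
        # join step: unions of two frequent k-sets that have exactly k + 1 items
        k += 1
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # prune step: every (k - 1)-subset of a candidate must itself be frequent
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result

for itemset, sup in frequent_itemsets(baskets, min_support=0.3).items():
    print(sorted(itemset), sup)
```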


In [None]:
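rules = association_rules(supports, metric='confidence', min_threshold=0.65)
# keep only the key columns, selected by name rather than by position
rules = rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]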
rules

Unnamed: 0,antecedents,consequents,support,confidence,lift
0,(chleb),(jajka),0.5,0.666667,1.333333
1,(jajka),(chleb),0.5,1.0,1.333333
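
The `confidence` and `lift` columns follow directly from the supports: confidence(A → B) = support(A ∪ B) / support(A), and lift(A → B) = confidence(A → B) / support(B). The sketch below recomputes both rules by hand; the helper name `rule_metrics` is hypothetical, and the support values are copied from the frequent-itemset table.

```python
# supports copied from the frequent-itemset table
support = {
    frozenset(['chleb']): 0.75,
    frozenset(['jajka']): 0.5,
    frozenset(['chleb', 'jajka']): 0.5,
}

def rule_metrics(antecedent, consequent):
    """Return (confidence, lift) for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    confidence = support[a | c] / support[a]
    lift = confidence / support[c]
    return confidence, lift

print(rule_metrics(['chleb'], ['jajka']))
print(rule_metrics(['jajka'], ['chleb']))
```

A lift above 1 means the two products appear together more often than they would if they were bought independently. Lift is symmetric, which is why both rules share the same value.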


In [None]:
rules.sort_values(by='lift', ascending=False)

Unnamed: 0,antecedents,consequents,support,confidence,lift
0,(chleb),(jajka),0.5,0.666667,1.333333
1,(jajka),(chleb),0.5,1.0,1.333333
