# Bayesian optimization

Dataset for analysis: https://www.kaggle.com/datasets/prishasawhney/mushroom-dataset

The data describes mushrooms. The model's task is to predict whether a mushroom is edible.
Business goal: build an application that helps users judge whether a mushroom is edible, improving safety.

Variables:
- Cap Diameter
- Cap Shape
- Gill Attachment
- Gill Color
- Stem Height
- Stem Width
- Stem Color
- Season
- Target Class - Is it edible or not?

Documentation: https://bayesian-optimization.github.io/BayesianOptimization/2.0.0/


In [None]:
#pip install bayesian-optimization

In [1]:
from sklearn.svm import SVC
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from bayes_opt import BayesianOptimization

In [2]:
# Run this cell if you are executing the notebook from the solutions folder
# and the data frame is stored in the data folder
import os
os.chdir('../')

In [3]:
# Load the data
df = pd.read_csv('data/mushroom.csv')

In [3]:
from sklearn.preprocessing import scale

In [7]:
# Scale the features (every column except the target 'class')
features = df.drop('class', axis=1)
df_x_scaled = pd.DataFrame(scale(features), columns=features.columns)

In [8]:
df_x_scaled

Unnamed: 0,cap-diameter,cap-shape,gill-attachment,gill-color,stem-height,stem-width,stem-color,season
0,2.236139,-0.925864,-0.063737,0.834467,4.682845,0.631570,0.791508,2.788402
1,2.483444,-0.925864,-0.063737,0.834467,4.682845,0.646914,0.791508,2.788402
2,2.233361,-0.925864,-0.063737,0.834467,4.383334,0.658423,0.791508,2.788402
3,1.927704,0.925572,-0.063737,0.834467,4.652283,0.658423,0.791508,2.788402
4,2.049966,0.925572,-0.063737,0.834467,4.536146,0.527996,0.791508,-0.029348
...,...,...,...,...,...,...,...,...
54030,-1.373393,0.462713,0.384935,-1.665349,0.197600,-0.616434,1.098064,-0.029348
54031,-1.348385,-0.925864,0.384935,-1.665349,0.656036,-0.717450,1.098064,-0.029348
54032,-1.348385,0.462713,0.384935,-1.665349,0.240388,-0.597253,1.098064,-0.208490
54033,-1.356721,-0.925864,0.384935,-1.665349,0.423762,-0.716172,1.098064,-0.208490


In [9]:
# Split into training and test sets
train_x, test_x, train_y, test_y = train_test_split(df_x_scaled, df['class'], test_size=0.3, random_state=1000)
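
The scaling step above is applied to the full data set before the split, so statistics from the test rows feed into the training features. A minimal sketch of one way to avoid this, assuming synthetic data from `make_classification` as a stand-in for the mushroom data (the variable names and numbers here are illustrative, not results from this notebook): split first, then wrap `StandardScaler` and `SVC` in a pipeline so the scaler is fit only on the training rows of each fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the mushroom data (assumption, for illustration only)
X, y = make_classification(n_samples=600, n_features=8, random_state=0)

# Split first, so the test rows never influence the scaler
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1000
)

# The pipeline refits the scaler inside every cross-validation fold
pipe = make_pipeline(StandardScaler(), SVC(C=1))
cv_auc = cross_val_score(pipe, X_train, y_train, cv=3, scoring="roc_auc").mean()

# Final fit on the training set, evaluated on the untouched test set
pipe.fit(X_train, y_train)
test_acc = pipe.score(X_test, y_test)
```

The same pipeline object could be used inside the objective function in place of a bare `SVC`, at the cost of refitting the scaler on each fold.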

In [14]:
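# Objective function: mean 3-fold cross-validated ROC AUC for a given C
def opt_fun(C):
    # cross_val_score clones and fits the estimator on each fold,
    # so fitting the model beforehand is unnecessary
    model = SVC(C=C)
    score = cross_val_score(model, train_x, train_y, cv=3, scoring='roc_auc').mean()
    return score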

In [10]:
# Example model
m = SVC(C=1).fit(train_x, train_y)

In [11]:
# Cross-validation scores (ROC AUC on each of the 3 folds)
cross_val_score(m, train_x, train_y, cv=3, scoring='roc_auc')

array([0.95114005, 0.94937216, 0.95059249])

In [12]:
# Parameter range: search C between 0.01 and 5
params = {"C": (0.01, 5)}

In [15]:
# Set up the optimizer
optimization = BayesianOptimization(f=opt_fun,
                                    pbounds=params)

In [None]:
# Run the optimization: 5 random initial points, then 4 guided iterations
optimization.maximize(n_iter=4, init_points=5)

|   iter    |  target   |     C     |
-------------------------------------
| 1         | 0.9712    | 3.107     |
| 2         | 0.9543    | 1.189     |
| 3         | 0.9762    | 4.522     |
| 4         | 0.964     | 1.958     |
| 5         | 0.977     | 4.863     |
| 6         | 0.9773    | 5.0       |
| 7         | 0.9773    | 5.0       |
| 8         | 0.9773    | 5.0       |
| 9         | 0.9773    | 5.0       |
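
In the log above, the first five rows correspond to `init_points=5` (random probes) and the last four to `n_iter=4` (guided steps). The sketch below illustrates the general idea behind the guided steps. It is not the library's implementation: it assumes a hypothetical 1-D `toy_objective`, scikit-learn's `GaussianProcessRegressor` as the surrogate model, an upper-confidence-bound rule with a fixed weight of 2, and a plain grid search over the acquisition values.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def toy_objective(x):
    # Hypothetical 1-D objective with its maximum at x = 2
    return -(x - 2.0) ** 2


rng = np.random.default_rng(0)
low, high = 0.0, 5.0
grid = np.linspace(low, high, 501).reshape(-1, 1)  # candidate points

# Random initial evaluations (the role played by init_points)
xs = list(rng.uniform(low, high, size=3))
ys = [toy_objective(x) for x in xs]

# Guided evaluations (the role played by n_iter)
for _ in range(8):
    # Fit a surrogate model to everything observed so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    gp.fit(np.array(xs).reshape(-1, 1), np.array(ys))

    # Upper confidence bound: prefer points with a high predicted mean
    # or a high predictive uncertainty
    mean, std = gp.predict(grid, return_std=True)
    ucb = mean + 2.0 * std

    # Evaluate the objective at the most promising candidate
    x_next = float(grid[np.argmax(ucb), 0])
    xs.append(x_next)
    ys.append(toy_objective(x_next))

best_x = xs[int(np.argmax(ys))]
```

Each iteration costs one evaluation of the objective, which is why this approach suits expensive objectives such as cross-validating an SVM.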


In [17]:
optimization.max

{'target': 0.9773043963308798, 'params': {'C': 4.999796980014235}}
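
The best value of C sits at the upper bound of the search range (5), which suggests the range may be too narrow. One option, sketched here under assumptions, is to search the exponent of C instead of C itself, so that a wide range is covered evenly. The names `opt_fun_log` and `log_bounds`, the bounds of -2 to 2, and the synthetic data are hypothetical choices for illustration, not part of this notebook's results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the training data (assumption, for illustration only)
X, y = make_classification(n_samples=400, n_features=8, random_state=0)


def opt_fun_log(log10_C):
    # The optimizer proposes the exponent; convert it back to C
    C = 10.0 ** log10_C
    return cross_val_score(SVC(C=C), X, y, cv=3, scoring="roc_auc").mean()


# Covers C from 0.01 to 100 (hypothetical bounds)
log_bounds = {"log10_C": (-2.0, 2.0)}

# Evaluate the objective by hand at both bounds and the midpoint
scores = {exponent: opt_fun_log(exponent) for exponent in (-2.0, 0.0, 2.0)}
```

These would be passed to the optimizer the same way as `opt_fun` and `params`, that is `BayesianOptimization(f=opt_fun_log, pbounds=log_bounds)`, with the best C recovered as `10 ** optimization.max['params']['log10_C']`.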

In [22]:
C = optimization.max['params']['C']

In [23]:
model = SVC(C=C).fit(train_x,train_y)

In [24]:
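# predict returns hard class labels, so the ROC AUC below is computed
# from labels rather than from continuous scores
test_pred = model.predict(test_x)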

In [25]:
roc_auc_score(test_y,test_pred)

0.9416078599550147
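
The cross-validated target (about 0.977) and the test score (about 0.942) are not computed the same way: `scoring='roc_auc'` ranks samples by a continuous score, while the test evaluation above passes hard labels from `predict`. A minimal sketch of the difference, assuming synthetic data from `make_classification` rather than the mushroom data, so the numbers it produces say nothing about this notebook's model:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the mushroom data (assumption, for illustration only)
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1000
)

clf = SVC(C=1).fit(X_train, y_train)

# ROC AUC from hard 0/1 labels: only one threshold is evaluated
auc_from_labels = roc_auc_score(y_test, clf.predict(X_test))

# ROC AUC from continuous decision scores: all thresholds are evaluated,
# which matches what the 'roc_auc' scorer does during cross-validation
auc_from_scores = roc_auc_score(y_test, clf.decision_function(X_test))
```

Using `decision_function` on the test set would make the test metric comparable to the cross-validated target.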