# Bayesian optimization

Dataset for analysis: https://www.kaggle.com/datasets/prishasawhney/mushroom-dataset

The data describes mushrooms. The model's task is to predict whether a mushroom is edible.
Business goal: build an application that helps the user judge whether a mushroom is edible, improving safety.

Variables:
- Cap Diameter
- Cap Shape
- Gill Attachment
- Gill Color
- Stem Height
- Stem Width
- Stem Color
- Season
- Target Class - Is it edible or not?

Documentation: https://bayesian-optimization.github.io/BayesianOptimization/2.0.0/


In [None]:
#pip install bayesian-optimization

In [38]:
from sklearn.svm import SVC
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from bayes_opt import BayesianOptimization

In [None]:
# Run this cell
# if you are executing the notebook from the solutions folder
# and the data file is located in the data folder
import os 
os.chdir('../')

In [None]:
# Load the data
df = pd.read_csv('data/mushroom.csv')

In [None]:
from sklearn.preprocessing import scale

In [None]:
# Scale the features (every column except the target 'class')
features = df.drop('class', axis=1)
df_x_scaled = pd.DataFrame(scale(features), columns=features.columns)

In [None]:
df_x_scaled

In [None]:
# Split into training and test sets
train_x, test_x, train_y, test_y = train_test_split(df_x_scaled, df['class'], test_size=0.3, random_state=1000)
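
The scaling cell above standardizes all rows before the split, so the test rows contribute to the means and standard deviations used for scaling. A minimal sketch of a leak-free variant follows. It assumes synthetic data from `make_classification` in place of `mushroom.csv`, and the variable names are illustrative rather than taken from the notebook:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the mushroom data (assumption: 8 numeric features, binary target).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1000)

# The scaler is part of the pipeline, so every CV fold is scaled
# with statistics computed on that fold's training part only.
pipeline = make_pipeline(StandardScaler(), SVC(C=1))
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=3, scoring="f1")

# Fit on the full training set; the test rows are transformed with training statistics.
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(cv_scores.mean(), test_accuracy)
```

Passing such a pipeline to `cross_val_score` inside the objective function would keep the hyperparameter search free of test-set information as well.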

In [27]:
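# Objective function: mean 3-fold cross-validated F1 for a given C
def opt_fun(C):
    # cross_val_score clones and fits the estimator itself, so no prior fit is needed
    model = SVC(C=C)
    score = cross_val_score(model, train_x, train_y, cv=3, scoring='f1').mean()
    return score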

In [28]:
# Example model
model = SVC(C=1).fit(train_x, train_y)

In [29]:
# Cross-validated F1 scores (3 folds) for the example model
cross_val_score(model, train_x, train_y, cv=3, scoring='f1')

array([0.89101449, 0.89199856, 0.89524222])

In [30]:
# Search range for the parameter
params = {"C": [0.01, 5]}

In [31]:
# Set up the optimizer (no random_state is passed, so results vary between runs)
optimization = BayesianOptimization(f=opt_fun, pbounds=params)

In [32]:
# Run the optimization: 5 random initial probes (the default init_points) plus 4 guided iterations
optimization.maximize(n_iter=4)

|   iter    |  target   |     C     |
-------------------------------------
| 1         | 0.8819    | 0.6675    |
| 2         | 0.9166    | 2.013     |
| 3         | 0.9285    | 3.476     |
| 4         | 0.9063    | 1.471     |
| 5         | 0.8864    | 0.7899    |
| 6         | 0.9361    | 5.0       |
| 7         | 0.9329    | 4.373     |
| 8         | 0.935     | 4.8       |
| 9         | 0.9361    | 5.0       |


In [33]:
optimization.max

{'target': 0.9360537319953947, 'params': {'C': 4.999653706758339}}
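
The best `C` found (about 5.0) sits at the upper edge of the search range `[0.01, 5]`, which hints that larger values might score higher still. Below is a small sketch of a helper that flags this situation. The function name `bound_hit` and the 1% tolerance are illustrative assumptions, not part of `bayes_opt`:

```python
def bound_hit(best, lower, upper, rel_tol=0.01):
    """Return 'lower' or 'upper' if best lies within rel_tol of that edge, else None."""
    margin = rel_tol * (upper - lower)
    if best - lower <= margin:
        return "lower"
    if upper - best <= margin:
        return "upper"
    return None


# Values taken from the run above: bounds [0.01, 5] and the reported best C.
edge = bound_hit(4.999653706758339, 0.01, 5)
print(edge)  # upper
```

When a bound is hit, one option is to repeat the search with a wider range, for example `{"C": [0.01, 50]}`, and check whether the cross-validated score keeps improving.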

In [34]:
C = optimization.max['params']['C']
C

4.999653706758339

In [35]:
model = SVC(C=C).fit(train_x,train_y)

In [36]:
pred_test = model.predict(test_x)

In [39]:
f1_score(test_y,pred_test)

0.9466192170818505
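
With its default settings, `f1_score` reports the harmonic mean of precision and recall for the positive class (label 1). A minimal sketch of that calculation is shown below. The toy labels are made up and unrelated to the mushroom data, and `f1_from_labels` is a hypothetical helper name:

```python
def f1_from_labels(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from two equal-length label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # without true positives the score is reported as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 1]
print(f1_from_labels(toy_true, toy_pred))  # 0.75
```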