## Links

https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/

## Feature descriptions

### Features present in the data:

##### class (target):
- edible=e
- poisonous=p

##### 1. cap-shape:
- bell=b
- conical=c
- convex=x
- flat=f
- knobbed=k
- sunken=s

##### 2. cap-surface:
- fibrous=f
- grooves=g
- scaly=y
- smooth=s

##### 3. cap-color:
- brown=n
- buff=b
- cinnamon=c
- gray=g
- green=r
- pink=p
- purple=u
- red=e
- white=w
- yellow=y

##### 4. bruises (whether the mushroom bruises):
- bruises=t (true)
- no=f (false)

##### 5. odor:
- almond=a
- anise=l
- creosote=c
- fishy=y
- foul=f
- musty=m
- none=n
- pungent=p
- spicy=s

##### 6. gill-attachment:
- attached=a
- descending=d
- free=f
- notched=n

##### 7. gill-spacing:
- close=c
- crowded=w
- distant=d

##### 8. gill-size:
- broad=b
- narrow=n

##### 9. gill-color:
- black=k
- brown=n
- buff=b
- chocolate=h
- gray=g
- green=r
- orange=o
- pink=p
- purple=u
- red=e
- white=w
- yellow=y

##### 10. stalk-shape:
- enlarging=e
- tapering=t

##### 11. stalk-root:
- bulbous=b
- club=c
- cup=u
- equal=e
- rhizomorphs=z
- rooted=r
- missing=?

##### 12. stalk-surface-above-ring:
- fibrous=f
- scaly=y
- silky=k
- smooth=s

##### 13. stalk-surface-below-ring:
- fibrous=f
- scaly=y
- silky=k
- smooth=s

##### 14. stalk-color-above-ring:
- brown=n
- buff=b
- cinnamon=c
- gray=g
- orange=o
- pink=p
- red=e
- white=w
- yellow=y

##### 15. stalk-color-below-ring:
- brown=n
- buff=b
- cinnamon=c
- gray=g
- orange=o
- pink=p
- red=e
- white=w
- yellow=y

##### 16. veil-type:
- partial=p
- universal=u

##### 17. veil-color:
- brown=n
- orange=o
- white=w
- yellow=y

##### 18. ring-number:
- none=n
- one=o
- two=t

##### 19. ring-type:
- cobwebby=c
- evanescent=e
- flaring=f
- large=l
- none=n
- pendant=p
- sheathing=s
- zone=z

##### 20. spore-print-color:
- black=k
- brown=n
- buff=b
- chocolate=h
- green=r
- orange=o
- purple=u
- white=w
- yellow=y

##### 21. population:
- abundant=a
- clustered=c
- numerous=n
- scattered=s
- several=v
- solitary=y

##### 22. habitat:
- grasses=g
- leaves=l
- meadows=m
- paths=p
- urban=u
- waste=w
- woods=d
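To make the single-letter codes readable, the codebook above can be written as a dictionary and applied with `Series.map`. This is a minimal sketch that assumes a hypothetical `CODEBOOK` dict covering only `class` and `odor`; the remaining columns would follow the same pattern.

```python
import pandas as pd

# Hypothetical excerpt of the codebook above: one {code: label} dict per column.
# Only two columns are covered here; the others follow the same pattern.
CODEBOOK = {
    "class": {"e": "edible", "p": "poisonous"},
    "odor": {
        "a": "almond", "l": "anise", "c": "creosote", "y": "fishy",
        "f": "foul", "m": "musty", "n": "none", "p": "pungent", "s": "spicy",
    },
}


def decode(df: pd.DataFrame, codebook: dict) -> pd.DataFrame:
    """Return a copy of df where every column listed in codebook is mapped to readable labels."""
    out = df.copy()
    for column, mapping in codebook.items():
        if column in out.columns:
            out[column] = out[column].map(mapping)
    return out


# Small made-up sample in the same format as mushrooms.csv
sample = pd.DataFrame({"class": ["p", "e", "e"], "odor": ["p", "a", "n"]})
decoded = decode(sample, CODEBOOK)
print(decoded)
```

Codes missing from a column's mapping become NaN after `map`, which makes gaps in the dictionary easy to spot.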

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.linear_model import LogisticRegression
from mlxtend import plotting
from sklearn.metrics import balanced_accuracy_score, roc_auc_score, make_scorer
from sklearn.metrics import confusion_matrix
# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
from sklearn.metrics import ConfusionMatrixDisplay

In [None]:
df = pd.read_csv('mushrooms.csv')
# Copy of the raw data, kept for its original (pre-one-hot) column names
df_columns = df.copy()

In [None]:
# shows the first 5 rows by default
df.head()

In [None]:
# dtype "object" means the column holds strings (the single-letter codes)
df.info()

In [None]:
df.describe()

In [None]:
# Number of edible (e) and poisonous (p) mushrooms:
counts = df["class"].value_counts()
jadalne = counts["e"]
trujace = counts["p"]
print(f"Edible mushrooms: {jadalne}, poisonous: {trujace}.")

In [None]:
# Check for duplicate rows
df.duplicated().any()

In [None]:
# Check for missing values (NaN)
df.isnull().sum()

In [None]:
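# isna() is an alias of isnull(), so NaN values are already counted by the previous cell.
# In this dataset missing values are stored as the string "?" (stalk-root), which pandas does not count as NaN:
(df == "?").sum()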
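`isnull()` and `isna()` only report real NaN values, while this dataset marks missing `stalk-root` values with the string `?`. A small sketch on made-up rows that shows the difference and converts the placeholder with `DataFrame.replace`:

```python
import numpy as np
import pandas as pd

# Made-up rows; in mushrooms.csv the stalk-root column uses "?" for missing values
sample = pd.DataFrame({"stalk-root": ["b", "?", "e", "?"], "odor": ["n", "f", "a", "n"]})

print(sample.isna().sum().sum())  # 0: "?" is an ordinary string to pandas

# Turn the placeholder into a real missing value so isna(), dropna() and fillna() see it
cleaned = sample.replace("?", np.nan)
print(cleaned.isna().sum())
```

The same conversion can be done at load time with `pd.read_csv('mushrooms.csv', na_values='?')`. Keeping `?` as its own category is also a reasonable choice here, because one-hot encoding then simply turns it into a `stalk-root_?` column.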

In [None]:
# One-hot encode every column; drop class_p because class_e will be the target (1 = edible, 0 = poisonous)
df = pd.get_dummies(df)
df = df.drop(["class_p"], axis=1)
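What `pd.get_dummies` does to these columns, shown on a tiny made-up frame (the values are illustrative, not taken from `mushrooms.csv`):

```python
import pandas as pd

# Tiny made-up frame with the same layout as mushrooms.csv
sample = pd.DataFrame({
    "class": ["p", "e", "e"],
    "cap-shape": ["x", "x", "b"],
    "odor": ["p", "a", "l"],
})

encoded = pd.get_dummies(sample)
# Each category becomes its own 0/1 column named "<column>_<value>", categories in sorted order
print(list(encoded.columns))

# class_e alone encodes the target (1 = edible, 0 = poisonous); class_p is redundant
y = encoded["class_e"].astype(int)
X = encoded.drop(columns=["class_e", "class_p"])
print(y.tolist(), list(X.columns))
```

Dropping `class_p` loses nothing, since `class_p = 1 - class_e`. In pandas 2.x the dummy columns are boolean, hence the `astype(int)` on the target.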

In [None]:
y = df.class_e
x = df.drop(["class_e"], axis=1)

In [None]:
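# random_state makes the split reproducible; stratify=y keeps the edible/poisonous ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=42)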

In [None]:
rf_classifier = RandomForestClassifier(n_estimators=5, criterion='gini', max_depth=5, bootstrap=True, random_state=1)

rf_classifier.fit(X_train, y_train)

In [None]:
rf_classifier.score(X_test, y_test)  # mean accuracy on the test set
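`score` on a classifier reports plain accuracy. The already imported `balanced_accuracy_score` and `roc_auc_score` give complementary views: the mean recall per class, and how well predicted probabilities rank edible above poisonous. A sketch on made-up labels and predictions, assuming 1 = edible as with `class_e`:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score

# Made-up labels (1 = edible, 0 = poisonous) and predictions for 8 test mushrooms
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 1, 1, 1, 1, 0])                    # one poisonous mushroom called edible
y_proba = np.array([0.9, 0.8, 0.9, 0.7, 0.6, 0.9, 0.65, 0.1])  # predicted P(edible)

print(accuracy_score(y_true, y_pred))           # 7/8 = 0.875
print(balanced_accuracy_score(y_true, y_pred))  # (6/6 + 1/2) / 2 = 0.75
print(roc_auc_score(y_true, y_proba))           # 11 of 12 (edible, poisonous) pairs ranked correctly
```

For the forest above, the probabilities would come from `rf_classifier.predict_proba(X_test)[:, 1]`, the column for label 1 (edible).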

In [None]:
plt.figure(figsize=(15, 12))

# First tree of the forest; class_e = 0 is poisonous, 1 is edible
plot_tree(rf_classifier.estimators_[0],
          feature_names=list(x.columns),
          class_names=["poisonous", "edible"],
          filled=True);
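A large tree plot can be hard to read; `sklearn.tree.export_text` prints the same split structure as indented text. A sketch on a made-up two-column one-hot frame in which `odor_n` alone separates the classes:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up one-hot data: in this toy sample odor_n alone separates the classes
X = pd.DataFrame({"odor_n": [1, 1, 0, 0], "cap-shape_x": [1, 0, 1, 0]})
y = [1, 1, 0, 0]  # 1 = edible, 0 = poisonous

tree = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X, y)
report = export_text(tree, feature_names=list(X.columns))
print(report)
```

On the notebook's forest the equivalent call would be `export_text(rf_classifier.estimators_[0], feature_names=list(x.columns))`.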

In [None]:
feature_importances = pd.DataFrame(rf_classifier.feature_importances_, index=x.columns,
                                   columns=['importance']).sort_values('importance', ascending=False)



In [None]:
most_important_features = feature_importances[feature_importances["importance"] > 0]
len(most_important_features)

In [None]:
most_important_features

In [None]:
most_important_features[most_important_features["importance"] > 0.05]

In [None]:
plt.figure(figsize=(30, 10))
plt.bar(most_important_features.index, most_important_features["importance"])
plt.xticks(rotation=90)
plt.ylabel("importance")

In [None]:
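z = df_columns.columns   # original column names, e.g. "cap-shape"
proba = x.columns        # one-hot column names, e.g. "cap-shape_b"
# Each one-hot column name starts with the name of the original column it came from
z[1] in proba[0]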


In [None]:
feature_importances

In [None]:
fi = feature_importances.reset_index()
fi.columns = ["feature", "importance"]

In [None]:
odor = fi[fi["feature"].str.contains("odor")]
print(odor)

In [None]:
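# Sum the importances of all one-hot columns derived from each original column
slownik = {}
for f in z:
    # startswith(f + "_") matches only this column's dummies and avoids regex matching
    importance = fi[fi["feature"].str.startswith(f + "_")].importance.sum()
    slownik[f] = importance
    print(f, importance)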
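The per-feature sums can also be computed without a loop, by deriving the original column name from each one-hot name and using `groupby`. A sketch with made-up importances; it assumes, as in `mushrooms.csv`, that the category values are single characters without `_`, so everything before the last `_` is the original column name:

```python
import pandas as pd

# Made-up importances for a handful of one-hot columns
fi = pd.DataFrame({
    "feature": ["odor_n", "odor_f", "gill-size_b", "stalk-root_?", "stalk-root_b"],
    "importance": [0.30, 0.20, 0.25, 0.15, 0.10],
})

# Split each name once, from the right: "stalk-root_?" -> "stalk-root"
original = fi["feature"].str.rsplit("_", n=1).str[0]
importance_by_feature = (
    fi.groupby(original)["importance"].sum().sort_values(ascending=False)
)
print(importance_by_feature)
```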

In [None]:
for row in fi.itertuples():
    print(row.feature, row.importance)

In [None]:
# Create the input widgets

In [None]:
import ipywidgets as widgets
from IPython.display import display

In [None]:
widgets_cap_shape = widgets.Dropdown(options=["bell", "conical", "convex", "flat", "knobbed", "sunken"])
widgets_cap_surface = widgets.Dropdown(options=["fibrous", "grooves", "scaly", "smooth"])
widgets_cap_color = widgets.Dropdown(options=["brown", "buff", "cinnamon", "gray", "green", "pink", "purple", "red", "white", "yellow"])
widget_bruises = widgets.Dropdown(options=["Yes", "No"])

In [None]:
values = {"Cap Shape": widgets_cap_shape.value, "Cap Surface": widgets_cap_surface.value, "Cap Color": widgets_cap_color.value, "Bruises": widget_bruises.value}

def widget_handler(Cap_Shape, Cap_Surface, Cap_Color, Bruises):
    values["Cap Shape"] = Cap_Shape
    values["Cap Surface"] = Cap_Surface
    values["Cap Color"] = Cap_Color
    values["Bruises"] = Bruises

widgets.interact(widget_handler, Cap_Shape = widgets_cap_shape, Cap_Surface = widgets_cap_surface, Cap_Color = widgets_cap_color, Bruises = widget_bruises)

In [None]:
print(values)

In [None]:
run_button = widgets.Button(description = "Run")

def button_callback(button):
    print(values)

run_button.on_click(button_callback)

In [None]:
display(widgets_cap_shape, widgets_cap_surface, widgets_cap_color, widget_bruises, run_button)
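The widgets only collect labels such as `"convex"`; to feed a selection to a trained model it has to be translated into the codes from the feature list and one-hot encoded with exactly the training columns. A sketch with a hypothetical helper `to_model_row` and hypothetical mapping dicts (only cap shape and bruises are covered), using `reindex` to add missing dummy columns as 0:

```python
import pandas as pd

# Hypothetical mappings from widget labels to the single-letter codes in mushrooms.csv
CAP_SHAPE = {"bell": "b", "conical": "c", "convex": "x", "flat": "f", "knobbed": "k", "sunken": "s"}
BRUISES = {"Yes": "t", "No": "f"}


def to_model_row(selection: dict, train_columns) -> pd.DataFrame:
    """One-hot encode a single widget selection and align it with the training columns.

    Dummy columns the selection does not produce are filled with 0, and any
    column not seen in training is dropped, so the row matches the model input.
    """
    raw = pd.DataFrame([{
        "cap-shape": CAP_SHAPE[selection["Cap Shape"]],
        "bruises": BRUISES[selection["Bruises"]],
    }])
    return pd.get_dummies(raw).reindex(columns=train_columns, fill_value=0).astype(int)


# Made-up training columns (in the notebook this would be X_train.columns)
train_columns = ["cap-shape_b", "cap-shape_x", "bruises_f", "bruises_t", "odor_n"]
row = to_model_row({"Cap Shape": "convex", "Bruises": "No"}, train_columns)
print(row)
```

In the notebook, `train_columns` would be `X_train.columns` and the row could be passed to `rf_classifier.predict`. Since the widgets cover only a few of the 22 features, all other dummy columns stay 0, so such a row is not a realistic mushroom and predictions on it should be treated with caution.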

In [None]:
xgb_cl_1 = xgb.XGBClassifier(n_estimators=100,
                             max_depth=3,
                             use_label_encoder=False,
                             eval_metric='error')
xgb_cl_1.fit(X_train, y_train)

In [None]:
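from sklearn.metrics import ConfusionMatrixDisplay

# The target is class_e, so label 0 = poisonous and 1 = edible (rows and columns follow that order)
ConfusionMatrixDisplay.from_estimator(xgb_cl_1,
                                      X_test,
                                      y_test,
                                      values_format='d',
                                      display_labels=["Poisonous", "Edible"])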
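Because the target is `class_e`, label 0 means poisonous and 1 means edible, and scikit-learn orders the rows (true labels) and columns (predictions) of a confusion matrix by sorted label. A sketch on made-up labels with `confusion_matrix`, showing which cell holds the dangerous mistake:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up test labels and predictions (1 = edible / class_e, 0 = poisonous)
y_true = np.array([0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 1, 0])

# Rows are true labels, columns are predictions, both in the order [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
# [[2 1]
#  [1 3]]
tn, fp, fn, tp = cm.ravel()
```

With 1 = edible as the positive class, `fp` counts poisonous mushrooms predicted as edible, the error that matters most here.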

In [None]:
xgb_cl_2 = xgb.XGBClassifier(n_estimators=100,
                             random_state=42,
                             objective='binary:logistic',
                             use_label_encoder=False,
                             eval_metric='aucpr')
xgb_cl_2.fit(X_train, y_train)

In [None]:
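from sklearn.metrics import ConfusionMatrixDisplay

# The target is class_e, so label 0 = poisonous and 1 = edible (rows and columns follow that order)
ConfusionMatrixDisplay.from_estimator(xgb_cl_2,
                                      X_test,
                                      y_test,
                                      values_format='d',
                                      display_labels=["Poisonous", "Edible"])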