<div style="width: 60%; margin: 0 auto;">
    <img src="shopping.jpeg" alt="Анализ покупок" style="width: 100%; border: 1px solid #ddd;"/>
</div>

## <center>PIPELINE</center>

В рамках этой части задания ответьте в ноутбуке на следующие вопросы:

- Построение моделей на числовых признаках (минимум - NB и kNN, максимум - любые) с параметрами по умолчанию, вычисление метрик + комментарий о качестве моделей (1 балл).
- Подбор гиперпараметров у каждой из моделей (только на числовых признаках) при помощи GridSearchCV + вычисление метрик для лучших найденных моделей + текстовый комментарий (3 балла).
- Добавление категориальных признаков в лучшую модель, обучение модели и заново подбор ее гиперпараметров + вычисление метрик + текстовый комментарий (2 балла).
- Построение Explainer Dashboard и сохранение файла с ним на GitHub (1 балл)
- Анализ модели в Explainer Dashboard (выводы пишем в том же Jupyter Notebook):
- какие факторы наиболее важны в среднем для получения прогноза? (1 балл)
- какие значения метрик получились и что это значит? (1 балл)
анализ 2-3 индивидуальных прогнозов с комментарием (1 балл)
***ВАЖНО!!!***
Кроме непосредственно кода в ноутбуке не забывайте делать выводы текстом!
Выводы - самая важная часть исследования

In [28]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

In [29]:
url = "https://github.com/aiedu-courses/eda_and_dev_tools/raw/main/datasets/online_shoppers_intention.csv"
df = pd.read_csv(url)
df.head()

Unnamed: 0,Administrative,Administrative_Duration,Informational,Informational_Duration,ProductRelated,ProductRelated_Duration,BounceRates,ExitRates,PageValues,SpecialDay,Month,OperatingSystems,Browser,Region,TrafficType,VisitorType,Weekend,Revenue
0,0,0.0,0,0.0,1,0.0,0.2,0.2,0.0,0.0,Feb,1,1,1,1,Returning_Visitor,False,False
1,0,0.0,0,0.0,2,64.0,0.0,0.1,0.0,0.0,Feb,2,2,1,2,Returning_Visitor,False,False
2,0,0.0,0,0.0,1,0.0,0.2,0.2,0.0,0.0,Feb,4,1,9,3,Returning_Visitor,False,False
3,0,0.0,0,0.0,2,2.666667,0.05,0.14,0.0,0.0,Feb,3,2,2,4,Returning_Visitor,False,False
4,0,0.0,0,0.0,10,627.5,0.02,0.05,0.0,0.0,Feb,3,3,1,4,Returning_Visitor,True,False


Пропуски в ``Informational_Duration`` и ``ProductRelated_Duration`` заполним нулями, так как  если время не зафиксировано, страницы не были просмотрены.

Пропуски ``ExitRates``заполним медианным значением, чтобы сохранить распределение метрики и избежать выбросов.

In [30]:
df[['Informational_Duration', 'ProductRelated_Duration']] = df[['Informational_Duration', 'ProductRelated_Duration']].fillna(0)
df['ExitRates'] = df['ExitRates'].fillna(df['ExitRates'].median())
print("Пропуски после обработки:")
print(df.isna().sum())

Пропуски после обработки:
Administrative             0
Administrative_Duration    0
Informational              0
Informational_Duration     0
ProductRelated             0
ProductRelated_Duration    0
BounceRates                0
ExitRates                  0
PageValues                 0
SpecialDay                 0
Month                      0
OperatingSystems           0
Browser                    0
Region                     0
TrafficType                0
VisitorType                0
Weekend                    0
Revenue                    0
dtype: int64


В колонке 'Month' мы заметили два дублирующихся значения 'Aug' и 'aug'. Объединим их в одно

In [31]:
df['Month'] = df['Month'].replace('aug', 'Aug')
df['Month'].unique()

array(['Feb', 'Aug', 'Mar', 'May', 'Oct', 'June', 'Jul', 'Nov', 'Sep',
       'Dec'], dtype=object)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### 1. Предварительная подготовка данных

In [32]:
# Разделение на признаки и целевую переменную
X = df.drop('Revenue', axis=1)
y = df['Revenue']

# Разделение на обучающую и тестовую выборки
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

Выделим числовые признаки

In [33]:
numeric_features = [
    'Administrative', 'Administrative_Duration', 
    'Informational', 'Informational_Duration',
    'ProductRelated', 'ProductRelated_Duration',
    'BounceRates', 'ExitRates', 'PageValues',
    'SpecialDay'
]

Выделим категориальные признаки

In [34]:
categorical_features = [
    'Month', 'OperatingSystems', 'Browser', 
    'Region', 'TrafficType', 'VisitorType', 'Weekend'
]

### 2.Построение моделей на числовых признаках

Создадим pipeline с масштабированием

In [35]:
pipeline = Pipeline([
    ('scaler', StandardScaler())
])

In [36]:
X_train_num = pipeline.fit_transform(X_train[numeric_features])
X_test_num = pipeline.transform(X_test[numeric_features])

Обучим модели по Байесу и по KNN

In [37]:
# Наивный Байес
nb = GaussianNB()
nb.fit(X_train_num, y_train)

# KNN
knn = KNeighborsClassifier()
knn.fit(X_train_num, y_train)

In [38]:
from sklearn.metrics import accuracy_score

# Для Naive Bayes
y_pred_nb = nb.predict(X_test_num)
print(f"NB Accuracy: {accuracy_score(y_test, y_pred_nb):.4f}")

# Для KNN
y_pred_knn = knn.predict(X_test_num)
print(f"KNN Accuracy: {accuracy_score(y_test, y_pred_knn):.4f}")

NB Accuracy: 0.8177
KNN Accuracy: 0.8836


`GaussianNB` предполагает нормальное распределение данных. Так как признаки сильно скошены, точность получилась низкой. Так же было бы неплохо удалить коррелирующие признаки(что, в принципе, уже мы сделали в первой части проекта- EDA).

- ExitRates и BounceRates (r = 0.91) - можно оставим только ExitRates, так как он сильнее коррелирует с Revenue.

- ProductRelated и ProductRelated_Duration (r = 0.83) - оставим ProductRelated, так как он проще интерпретируется.

- Administrative и Administrative_Duration (r = 0.60) - оставим оба, так как корреляция умеренная.

`KNN` требует масштабирования (мы его сделали через StandardScaler), но чувствителен к дисбалансу классов.

In [39]:
df = df.drop(columns=['BounceRates', 'ProductRelated_Duration'])

In [40]:
# Наивный Байес
nb = GaussianNB()
nb.fit(X_train_num, y_train)

# KNN
knn = KNeighborsClassifier()
knn.fit(X_train_num, y_train)

In [41]:
from sklearn.metrics import accuracy_score

# Для Naive Bayes
y_pred_nb = nb.predict(X_test_num)
print(f"NB Accuracy: {accuracy_score(y_test, y_pred_nb):.4f}")

# Для KNN
y_pred_knn = knn.predict(X_test_num)
print(f"KNN Accuracy: {accuracy_score(y_test, y_pred_knn):.4f}")

NB Accuracy: 0.8177
KNN Accuracy: 0.8836


Ничего не поменялось. Подберем гиперпараметры для KNN


### 3.Подбор гиперпараметров

In [42]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier())
])

knn_params = {
    'knn__n_neighbors': [3, 5, 7, 9, 11, 15, 17, 20],
    'knn__weights': ['uniform', 'distance'],
    'knn__p': [1, 2]  # 1 - манхэттен, 2 - евклидово расстояние
}

Создаем GridSearchCV

In [43]:
knn_grid = GridSearchCV(
    estimator=pipeline,
    param_grid=knn_params,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

обучим модель и найдем лучшие гиперпараметры

In [44]:
knn_grid.fit(X_train_num, y_train)

best_knn = knn_grid.best_estimator_
y_pred_knn = best_knn.predict(X_test_num)

In [45]:
print("=== KNN Best Parameters ===")
print(knn_grid.best_params_)
print("\nBest CV Accuracy:", knn_grid.best_score_)
print("\nTest Accuracy:", accuracy_score(y_test, y_pred_knn))

=== KNN Best Parameters ===
{'knn__n_neighbors': 17, 'knn__p': 2, 'knn__weights': 'uniform'}

Best CV Accuracy: 0.890126905705056

Test Accuracy: 0.8877716509892961


Разница между CV и тестовой точностью всего ~0.003 - модель показывает стабильное качество без признаков переобучения. 

### 4.Добавление категориальных признаков

In [46]:
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ])

pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('knn', KNeighborsClassifier())
])

In [47]:
knn_params = {
    'knn__n_neighbors': [5, 7, 9, 11, 15, 17, 20],
    'knn__weights': ['uniform', 'distance'],
    'knn__p': [1, 2]  # 1 - манхэттен, 2 - евклид
}

knn_grid = GridSearchCV(
    pipeline,
    knn_params,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

knn_grid.fit(X_train, y_train)

Fitting 5 folds for each of 28 candidates, totalling 140 fits


In [48]:
# Лучшая модель
best_knn = knn_grid.best_estimator_

# Предсказания
y_pred = best_knn.predict(X_test)
y_proba = best_knn.predict_proba(X_test)[:, 1]

# Метрики
print("=== Best Parameters ===")
print(knn_grid.best_params_)

# print("\n=== Test Metrics ===")
# print(classification_report(y_test, y_pred))

print("\nBest CV Accuracy:", knn_grid.best_score_)
print("\nTest Accuracy:", accuracy_score(y_test, y_pred_knn))

=== Best Parameters ===
{'knn__n_neighbors': 20, 'knn__p': 2, 'knn__weights': 'distance'}

Best CV Accuracy: 0.8829893733647113

Test Accuracy: 0.8877716509892961


Категориальные признаки не улучшили качество модели. Test Accuracy осталась практически идентичной (0.8878 vs 0.8878). CV Accuracy даже немного снизилась (0.8901 → 0.8830).

Категориальные признаки неинформативны для данной задачи. Month, Browser, Region могут не влиять на целевую переменную Revenue. 
У нас наблюдается дисбаланс классов признака Revenue, это делает accuracy ненадёжной метрикой.

### 5.Строим дашборд

In [49]:
from explainerdashboard import ClassifierExplainer, ExplainerDashboard

In [50]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [51]:
# Получаем преобразованные данные
preprocessor = knn_grid.best_estimator_.named_steps['preprocessor']
X_test_proc = pd.DataFrame.sparse.from_spmatrix(
    preprocessor.transform(X_test.iloc[:500]),
    columns=preprocessor.get_feature_names_out()
)

# Создаем explainer
explainer = ClassifierExplainer(
    model=knn_grid.best_estimator_,
    X=X_test_proc,
    y=y_test.iloc[:500]
)

splitting pipeline...
Note: shap values for shap='kernel' normally get calculated against X_background, but paramater X_background=None, so setting X_background=shap.sample(X, 50)...
Generating self.shap_explainer = shap.KernelExplainer(model, X, link='identity')



X has feature names, but KNeighborsClassifier was fitted without feature names



In [52]:
import warnings
warnings.filterwarnings('ignore')
warnings.filterwarnings('ignore', category=UserWarning, module='sklearn')

In [None]:
db = ExplainerDashboard(explainer)

Building ExplainerDashboard..
Detected notebook environment, consider setting mode='external', mode='inline' or mode='jupyterlab' to keep the notebook interactive while the dashboard is running...
For this type of model and model_output interactions don't work, so setting shap_interaction=False...
The explainer object has no decision_trees property. so setting decision_trees=False...
Generating layout...
Calculating shap values...


  0%|          | 0/500 [00:00<?, ?it/s]

In [None]:
db.run(host='127.0.0.1')

Starting ExplainerDashboard on http://192.168.1.61:8050



X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature

Empty DataFrame
Columns: [col, contribution, value]
Index: []
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted withou

Empty DataFrame
Columns: [col, contribution, value]
Index: []
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted

Empty DataFrame
Columns: [col, contribution, value]
Index: []
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted withou

Empty DataFrame
Columns: [col, contribution, value]
Index: []
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted withou

Empty DataFrame
Columns: [col, contribution, value]
Index: []
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names



Empty DataFrame
Columns: [col, contribution, value]
Index: []
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted

Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted withou

Empty DataFrame
Columns: [col, contribution, value]
Index: []
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted withou

Empty DataFrame
Columns: [col, contribution, value]
Index: []
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted

Empty DataFrame
Columns: [col, contribution, value]
Index: []
Empty DataFrame
Columns: [col, contribution, value]
Index: []
                                   col  contribution     value
0                  num__Administrative      0.019392  1.125643
3          num__Informational_Duration      0.000000 -0.244323
4                  num__ProductRelated      0.000000  1.146374
5         num__ProductRelated_Duration      0.015601  0.711803
6                     num__BounceRates      0.006652 -0.409203
..                                 ...           ...       ...
70        cat__VisitorType_New_Visitor      0.000000  0.000000
71              cat__VisitorType_Other      0.000000  0.000000
72  cat__VisitorType_Returning_Visitor      0.000000  1.000000
73                  cat__Weekend_False      0.000000  1.000000
74                   cat__Weekend_True      0.000000  0.000000

[72 rows x 3 columns]
Empty DataFrame
Columns: [col, contribution, value]
Index: []
                                   


X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature

Empty DataFrame
Columns: [col, contribution, value]
Index: []
                                   col  contribution     value
0                  num__Administrative      0.019392  1.125643
3          num__Informational_Duration      0.000000 -0.244323
4                  num__ProductRelated      0.000000  1.146374
5         num__ProductRelated_Duration      0.015601  0.711803
6                     num__BounceRates      0.006652 -0.409203
..                                 ...           ...       ...
70        cat__VisitorType_New_Visitor      0.000000  0.000000
71              cat__VisitorType_Other      0.000000  0.000000
72  cat__VisitorType_Returning_Visitor      0.000000  1.000000
73                  cat__Weekend_False      0.000000  1.000000
74                   cat__Weekend_True      0.000000  0.000000

[72 rows x 3 columns]



X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature

                                   col  contribution     value
0                  num__Administrative      0.000000  0.213469
1         num__Administrative_Duration      0.000000  0.347052
2                   num__Informational      0.000000 -0.398701
3          num__Informational_Duration      0.000000 -0.244323
4                  num__ProductRelated      0.004126  0.364204
..                                 ...           ...       ...
70        cat__VisitorType_New_Visitor      0.000000  0.000000
71              cat__VisitorType_Other      0.000000  0.000000
72  cat__VisitorType_Returning_Visitor      0.000000  1.000000
73                  cat__Weekend_False      0.000000  1.000000
74                   cat__Weekend_True      0.000000  0.000000

[72 rows x 3 columns]
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted

                                   col  contribution     value
0                  num__Administrative     -0.000000  0.213469
1         num__Administrative_Duration     -0.000000  0.347052
2                   num__Informational     -0.000000 -0.398701
3          num__Informational_Duration     -0.000000 -0.244323
4                  num__ProductRelated     -0.004126  0.364204
..                                 ...           ...       ...
70        cat__VisitorType_New_Visitor     -0.000000  0.000000
71              cat__VisitorType_Other     -0.000000  0.000000
72  cat__VisitorType_Returning_Visitor     -0.000000  1.000000
73                  cat__Weekend_False     -0.000000  1.000000
74                   cat__Weekend_True     -0.000000  0.000000

[72 rows x 3 columns]
Empty DataFrame
Columns: [col, contribution, value]
Index: []



X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a dense numpy array.


X has feature names, but KNeighborsClassifier was fitted without feature names


pandas.DataFrame with sparse columns found.It will be converted to a den

Анализ модели в Explainer Dashboard (выводы пишем в том же Jupyter Notebook):
- какие факторы наиболее важны в среднем для получения прогноза? (1 балл)
- какие значения метрик получились и что это значит? (1 балл)
- анализ 2-3 индивидуальных прогнозов с комментарием (1 балл)

PageValues - самый влиятельный признак, далее ProductRelated и ProductRelated_Duration/


Accuracy: 0.876 - высокая общая точность (87.6%)

Precision: 0.667 - 67% предсказанных покупок реальные (хорошо)

Recall: 0.301 - находит только 30% реальных покупок (низко)

F1: 0.415 - низкий баланс из-за плохого Recall

ROC AUC: 0.866 - отличная разделяющая способность

PR AUC: 0.578 - средняя точность-полнота (несбалансированные данные)