# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 03: Batch Inference</span>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/churn/3_churn_batch_inference.ipynb)

### <span style='color:#ff5f27'> 📝 Imports </span>

In [None]:
import joblib
from xgboost import plot_importance
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

## <span style="color:#ff5f27;"> 📡 Connecting to Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

## <span style="color:#ff5f27;"> ⚙️ Feature View Retrieval</span>


In [None]:
feature_view = fs.get_feature_view(
        name = 'churn_feature_view',
        version = 1,
)

## <span style="color:#ff5f27;">🗄 Model Registry</span>


In [None]:
mr = project.get_model_registry()

---

## <span style='color:#ff5f27'>🚀 Fetch and test the model</span>

To identify customers at risk of churn, let's retrieve the churn prediction model from the Hopsworks Model Registry.

In [None]:
retrieved_model = mr.get_model(
    name="churnmodel",
    version=1
)
saved_model_dir = retrieved_model.download()

In [None]:
retrieved_xgboost_model = joblib.load(saved_model_dir + "/churnmodel.pkl")
retrieved_xgboost_model

---
## <span style="color:#ff5f27;">🔮  Use trained model to identify customers at risk of churn </span>


In [None]:
def transform_preds(predictions):
    return ['Churn' if pred == 1 else 'Not Churn' for pred in predictions]

In [None]:
feature_view.init_batch_scoring(1)

batch_data = feature_view.get_batch_data()

batch_data.head(3)

Let's run predictions for all customers and then visualize the results.

In [None]:
batch_data.drop('customerid',axis = 1, inplace = True)

predictions = retrieved_xgboost_model.predict(batch_data)
predictions = transform_preds(predictions)
predictions[:5]
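
Before plotting, it can help to tally the predicted labels as a quick sanity check. The sketch below is illustrative only: `example_raw_predictions` is made-up data standing in for the model output, and `churn_share` is a name introduced here, not part of the tutorial.

```python
from collections import Counter


def transform_preds(predictions):
    # Same mapping as in the notebook: class 1 means the customer is predicted to churn.
    return ['Churn' if pred == 1 else 'Not Churn' for pred in predictions]


# Hypothetical raw model output (assumption: binary 0/1 class labels).
example_raw_predictions = [0, 1, 0, 0, 1]

labels = transform_preds(example_raw_predictions)

# Count how many customers fall into each predicted class.
counts = Counter(labels)

# Fraction of customers predicted to churn.
churn_share = counts['Churn'] / len(labels)

print(counts)
print(f"Predicted churn share: {churn_share:.0%}")
```

In the notebook itself, the same tally could be applied to `predictions` directly, since it is already a list of label strings.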

---
## <span style="color:#ff5f27;">👨🏻‍🎨 Prediction Visualisation</span>

You now have your predictions, but to make informed decisions you also need to explain them. Let's visualise the predictions and look at the features that most influence the risk of churn.

In [None]:
import inspect

# Recall that transformation functions, such as the min-max scaler and the label encoder,
# were applied to the features. Here they are reversed to get human-readable values back.
df_all = batch_data.copy()
# Note: `_batch_scoring_server` is a private attribute (leading underscore), so it may change between versions.
td_transformation_functions = feature_view._batch_scoring_server._transformation_functions
for feature_name in td_transformation_functions:
    td_transformation_function = td_transformation_functions[feature_name]
    sig = inspect.signature(td_transformation_function.transformation_fn)
    # The fitted parameters are read from the default arguments of the transformation function.
    param_dict = {
        param.name: param.default
        for param in sig.parameters.values()
        if param.default is not inspect.Parameter.empty
    }
    if td_transformation_function.name == "label_encoder":
        # Invert the value -> index mapping to recover the original categories.
        rev_dict = {v: k for k, v in param_dict["value_to_index"].items()}
        df_all[feature_name] = df_all[feature_name].map(lambda x: rev_dict[x])
    elif td_transformation_function.name == "min_max_scaler":
        # Undo the scaling: x = x_scaled * (max - min) + min
        min_value, max_value = param_dict["min_value"], param_dict["max_value"]
        df_all[feature_name] = df_all[feature_name] * (max_value - min_value) + min_value

df_all['Churn'] = predictions
df_all.head()
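
The cell above undoes two transformations: min-max scaling is reversed with `x = x_scaled * (max - min) + min`, and label encoding is reversed by inverting the value-to-index mapping. The following minimal sketch shows both round trips in plain Python. All parameter values (`min_value`, `max_value`, `value_to_index`) and the helper names are hypothetical and chosen for illustration; in the notebook they come from the feature view's transformation functions.

```python
# Hypothetical fitted parameters (assumption: not taken from the real feature view).
min_value, max_value = 18.25, 118.75
value_to_index = {"DSL": 0, "Fiber optic": 1, "No": 2}


def min_max_scale(x, min_value, max_value):
    # Forward transformation: map x from [min_value, max_value] to [0, 1].
    return (x - min_value) / (max_value - min_value)


def inverse_min_max(x_scaled, min_value, max_value):
    # Inverse transformation: map a scaled value back to the original range.
    return x_scaled * (max_value - min_value) + min_value


# Invert the label encoder by swapping keys and values.
index_to_value = {index: value for value, index in value_to_index.items()}

scaled = min_max_scale(70.0, min_value, max_value)
restored = inverse_min_max(scaled, min_value, max_value)
decoded = [index_to_value[i] for i in [1, 0, 2]]

print(restored)
print(decoded)
```

Because of floating-point rounding, the restored value may differ from the original in the last few digits, which is harmless for plotting.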

Let's plot the feature importance.

In [None]:
figure_imp = plot_importance(retrieved_xgboost_model, max_num_features=10, importance_type='weight')
plt.show()

In [None]:
plt.figure(figsize = (13,6))

sns.countplot(
    data = df_all,
    x = 'internetservice',
    hue = 'Churn'
)

plt.title('Churn rate according to internet service subscription', fontsize = 20)
plt.xlabel("internetservice", fontsize = 13)
plt.ylabel('Number of customers', fontsize = 13)

plt.show()

Let's visualise a couple more important features, such as `streamingtv` and `streamingmovies`.

In [None]:
plt.figure(figsize = (13,6))

sns.countplot(
    data = df_all,
    x = 'streamingtv',
    hue = 'Churn'
)

plt.title('Churn rate according to streaming TV subscription', fontsize = 20)
plt.xlabel("streamingtv", fontsize = 13)
plt.ylabel('Number of customers', fontsize = 13)

plt.show()

In [None]:
plt.figure(figsize = (13,6))

sns.countplot(
    data = df_all,
    x = 'streamingmovies',
    hue = 'Churn'
)

plt.title('Churn rate according to streaming movies subscription', fontsize = 20)
plt.xlabel("streamingmovies", fontsize = 13)
plt.ylabel('Number of customers', fontsize = 13)

plt.show()

In [None]:
plt.figure(figsize = (13,6))

sns.countplot(
    data = df_all,
    x = 'gender',
    hue = 'Churn'
)

plt.title('Churn rate according to Gender', fontsize = 20)
plt.xlabel("Gender", fontsize = 13)
plt.ylabel('Count', fontsize = 13)

plt.show()

In [None]:
plt.figure(figsize = (13,6))

sns.histplot(
    data = df_all,
    x = 'totalcharges',
    hue = 'Churn'
)

plt.title('Distribution of total charges by churn prediction', fontsize = 20)
plt.xlabel("Charge Value", fontsize = 13)
plt.ylabel('Count', fontsize = 13)

plt.show()

In [None]:
plt.figure(figsize = (13,6))

sns.countplot(
    data = df_all,
    x = 'paymentmethod',
    hue = 'Churn'
)

plt.title('Churn rate according to payment method', fontsize = 20)
plt.xlabel("Payment Method", fontsize = 13)
plt.ylabel('Number of customers', fontsize = 13)

plt.show()

In [None]:
plt.figure(figsize = (13,6))

sns.countplot(
    data = df_all,
    x = 'partner',
    hue = 'Churn'
)

plt.title('Effect of having a partner on churn', fontsize = 20)
plt.xlabel("Have a partner", fontsize = 13)
plt.ylabel('Count', fontsize = 13)

plt.show()

---
## <span style="color:#ff5f27;">🧑🏻‍🔬 StreamLit App </span>

If you want an **interactive dashboard**, you can use a Streamlit App.

Use the following commands in a terminal to run the Streamlit App:

> `cd {%path_to_hopsworks_tutorials%}/`  <br>
> `conda activate ./miniconda/envs/hopsworks` <br>
> `python -m streamlit run churn/streamlit_app.py`<br>

**⚠️** If you are running on Colab, you will need to follow a different procedure, as described in this [notebook](https://colab.research.google.com/github/mrm8488/shared_colab_notebooks/blob/master/Create_streamlit_app.ipynb).

---
## <span style="color:#ff5f27;"> 👓  Exploration</span>
In the Hopsworks feature store, the metadata supports several kinds of exploration and review. Here we show a few of these capabilities.

### <span style="color:#ff5f27;">🔎 <b>Search</b></span> 
Using the search function in the UI, you can query any aspect of the feature groups, feature views and training data that were previously created.

### <span style="color:#ff5f27;">📊 <b>Statistics</b> </span>
We can also enable statistics for one or all of the feature groups.

In [None]:
customer_info_fg = fs.get_feature_group("customer_info", version = 1)
customer_info_fg.statistics_config = {
    "enabled": True,
    "histograms": True,
    "correlations": True
}

customer_info_fg.update_statistics_config()
customer_info_fg.compute_statistics()

![fg-statistics](../churn/images/churn_statistics.gif)


### <span style="color:#ff5f27;">⛓️ <b> Lineage </b> </span>
For every feature group and feature view, you can inspect how the abstractions relate to each other: which feature group produced which training dataset, and which model uses that dataset.
This gives a clear understanding of how each element fits into the pipeline.

---

### <span style="color:#ff5f27;">🥳 <b> Next Steps  </b> </span>
Congratulations, you've now completed the churn risk prediction tutorial for Managed Hopsworks.

Check out our other tutorials on ➡ https://github.com/logicalclocks/hopsworks-tutorials

Or documentation at ➡ https://docs.hopsworks.ai