This is the first part of a tutorial series on predicting which customers are at risk of churning, using the Hopsworks Feature Store. In this first module, you will work with customer data from the telecom industry.
The objective of this tutorial is to demonstrate how to work with the **Hopsworks Feature Store** for batch data, with the goal of training and deploying a model that can predict which customers are at risk of churning.


First, you will load the data and do some feature engineering on it.

In [1]:
!pip install -U hopsworks --quiet


The data you will use comes from three different CSV files:

- `demography.csv`: demographic information.
- `customer_info.csv`: customer information such as contract type, billing method, and monthly charges, as well as whether the customer has churned within the last month.
- `subscriptions.csv`: customer subscriptions to services such as internet, mobile, or movie streaming.

You can conceptualize these CSV files as originating from separate data sources.
**All three files share a customer ID column, `customerID`, which you can use for joins.**

Let's go ahead and load the data.

In [2]:
import pandas as pd

demography_df = pd.read_csv("https://repo.hops.works/dev/davit/churn/demography.csv")
customer_info_df = pd.read_csv("https://repo.hops.works/dev/davit/churn/customer_info.csv")
subscriptions_df = pd.read_csv("https://repo.hops.works/dev/davit/churn/subscriptions.csv")

In [3]:
demography_df.head(10)

Unnamed: 0,customerID,gender,SeniorCitizen,Dependents,Partner
0,7590-VHVEG,Female,0,No,Yes
1,5575-GNVDE,Male,0,No,No
2,3668-QPYBK,Male,0,No,No
3,7795-CFOCW,Male,0,No,No
4,9237-HQITU,Female,0,No,No
5,9305-CDSKC,Female,0,No,No
6,1452-KIOVK,Male,0,Yes,No
7,6713-OKOMC,Female,0,No,No
8,7892-POOKP,Female,0,No,Yes
9,6388-TABGU,Male,0,Yes,No


In [4]:
customer_info_df.head(10)

Unnamed: 0,customerID,Contract,tenure,PaymentMethod,PaperlessBilling,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Month-to-month,1,Electronic check,Yes,29.85,29.85,No
1,5575-GNVDE,One year,34,Mailed check,No,56.95,1889.5,No
2,3668-QPYBK,Month-to-month,2,Mailed check,Yes,53.85,108.15,Yes
3,7795-CFOCW,One year,45,Bank transfer (automatic),No,42.3,1840.75,No
4,9237-HQITU,Month-to-month,2,Electronic check,Yes,70.7,151.65,Yes
5,9305-CDSKC,Month-to-month,8,Electronic check,Yes,99.65,820.5,Yes
6,1452-KIOVK,Month-to-month,22,Credit card (automatic),Yes,89.1,1949.4,No
7,6713-OKOMC,Month-to-month,10,Mailed check,No,29.75,301.9,No
8,7892-POOKP,Month-to-month,28,Electronic check,Yes,104.8,3046.05,Yes
9,6388-TABGU,One year,62,Bank transfer (automatic),No,56.15,3487.95,No


In [5]:
subscriptions_df.head(10)

Unnamed: 0,customerID,DeviceProtection,OnlineBackup,OnlineSecurity,InternetService,MultipleLines,PhoneService,TechSupport,StreamingMovies,StreamingTV
0,7590-VHVEG,No,Yes,No,DSL,No phone service,No,No,No,No
1,5575-GNVDE,Yes,No,Yes,DSL,No,Yes,No,No,No
2,3668-QPYBK,No,Yes,Yes,DSL,No,Yes,No,No,No
3,7795-CFOCW,Yes,No,Yes,DSL,No phone service,No,Yes,No,No
4,9237-HQITU,No,No,No,Fiber optic,No,Yes,No,No,No
5,9305-CDSKC,Yes,No,No,Fiber optic,Yes,Yes,No,Yes,Yes
6,1452-KIOVK,No,Yes,No,Fiber optic,Yes,Yes,No,No,Yes
7,6713-OKOMC,No,No,Yes,DSL,No phone service,No,No,No,No
8,7892-POOKP,Yes,No,No,Fiber optic,Yes,Yes,Yes,Yes,Yes
9,6388-TABGU,No,Yes,Yes,DSL,No,Yes,No,No,No
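
Because every frame carries `customerID`, the three sources can be combined with ordinary pandas merges. The sketch below uses the first three rows shown above, trimmed to a few columns. The variable names and the choice of an inner join are assumptions made for illustration; in this series the actual join happens through the feature store in the next tutorial.

```python
import pandas as pd

# First three rows from the outputs above, trimmed to a few columns.
demography = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK"],
    "gender": ["Female", "Male", "Male"],
    "Partner": ["Yes", "No", "No"],
})
customer_info = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK"],
    "Contract": ["Month-to-month", "One year", "Month-to-month"],
    "Churn": ["No", "No", "Yes"],
})
subscriptions = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK"],
    "InternetService": ["DSL", "DSL", "DSL"],
    "PhoneService": ["No", "Yes", "Yes"],
})

# validate="one_to_one" makes pandas raise if an id is duplicated on either side.
joined = (
    demography
    .merge(customer_info, on="customerID", how="inner", validate="one_to_one")
    .merge(subscriptions, on="customerID", how="inner", validate="one_to_one")
)
print(joined)
```

An inner join keeps only customers present in all three sources; `how="left"` would be the choice if rows from the first frame must never be dropped.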


---
## Data Preparation

In this section you will perform feature engineering, such as converting textual features to numerical features and replacing missing values with 0s. Let's start with the customer information feature group.

In [6]:
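# Parse TotalCharges as numbers; values that cannot be parsed become NaN.
customer_info_df["TotalCharges"] = pd.to_numeric(customer_info_df["TotalCharges"], errors="coerce")
# Replace the resulting missing values with 0 (assignment avoids chained in-place updates).
customer_info_df["TotalCharges"] = customer_info_df["TotalCharges"].fillna(0)

# Encode the churn label as 0/1.
customer_info_df["Churn"] = customer_info_df["Churn"].replace({"No": 0, "Yes": 1})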
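
To see why the coercion step is needed, the sketch below runs the same two pandas calls on a tiny made-up series in which one value is a blank string. That a blank string is what marks a missing `TotalCharges` in the CSV is an assumption here; inspect the raw column before relying on it.

```python
import pandas as pd

# Made-up values for illustration; the " " entry stands in for a missing charge.
raw = pd.Series(["29.85", "1889.5", " "])

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
numeric = pd.to_numeric(raw, errors="coerce")
print(numeric.isna().sum())

# The NaN is then replaced with 0, as in the cell above.
filled = numeric.fillna(0)
print(filled.tolist())
```

Filling with 0 is a modelling choice rather than a neutral default; it is reasonable only if a missing total genuinely means that nothing has been charged yet.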

---
## Creating Feature Groups

A [feature group](https://docs.hopsworks.ai/feature-store-api/latest/generated/feature_group/) can be seen as a collection of conceptually related features. In this case, you will create 3 feature groups:
1. **Customer information feature group.**
2. **Customer demography feature group.**
3. **Customer subscription feature group.**

As you can see, the feature groups mirror their source data. Each feature group uses `customerid` as its primary key, which will allow you to join them when creating a dataset in the next tutorial.
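
A primary key is only useful for joins if it is present and unique in every row. The following sketch checks that with plain pandas before any data is inserted. The helper name `check_primary_key` is hypothetical and not part of the Hopsworks API.

```python
import pandas as pd


def check_primary_key(df: pd.DataFrame, key: str) -> None:
    """Raise ValueError if `key` is missing, has nulls, or has duplicates."""
    if key not in df.columns:
        raise ValueError(f"column {key!r} not found")
    if df[key].isna().any():
        raise ValueError(f"column {key!r} contains missing values")
    if not df[key].is_unique:
        dupes = df.loc[df[key].duplicated(), key].unique().tolist()
        raise ValueError(f"column {key!r} has duplicate values: {dupes}")


# Ids taken from the sample output earlier in this tutorial.
sample = pd.DataFrame({"customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK"]})
check_primary_key(sample, "customerID")  # passes silently
```

In the notebook this could be called as `check_primary_key(customer_info_df, "customerID")` for each of the three DataFrames.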

Before you can create a feature group, you need to connect to the Hopsworks Feature Store.

**Note:** Create an API key at https://c.app.hopsworks.ai/account/api.

In [7]:
import hopsworks

project = hopsworks.login(api_key_value="<api key>")

fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/28868
Connected. Call `.close()` to terminate connection gracefully.
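
Hard-coding the key in a notebook makes it easy to leak. One alternative, sketched below, is to read it from an environment variable and pass the result to `api_key_value`. The variable name `HOPSWORKS_API_KEY` and the helper `resolve_api_key` are assumptions made for this example.

```python
import os


def resolve_api_key(env_var: str = "HOPSWORKS_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Intended use in the login cell (left commented out so the sketch runs offline):
# project = hopsworks.login(api_key_value=resolve_api_key())
```

Failing early with a clear message is easier to debug than a rejected login caused by an empty key.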


To create a feature group you need to give it a name and specify a primary key. It is also good to provide a description of the contents of the feature group.

In [8]:
customer_info_fg = fs.get_or_create_feature_group(
    name="customer_info",
    version=1,
    description="Customer info for churn prediction.",
    primary_key=['customerID'],
)

A full list of arguments can be found in the [documentation](https://docs.hopsworks.ai/feature-store-api/latest/generated/api/feature_store_api/#create_feature_group).

At this point, you have only specified some metadata for the feature group. It does not store any data or even have a schema defined for the data. To make the feature group persistent you need to populate it with its associated data using the `insert` function.

In [9]:
customer_info_fg.insert(customer_info_df)



Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/28868/fs/28788/fg/30208


Uploading Dataframe: 0.00% |          | Rows 0/7043 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/28868/jobs/named/customer_info_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7fd26563d760>, None)

In [10]:
feature_descriptions = [
    {"name": "customerid", "description": "Customer id"}, 
    {"name": "contract", "description": "Type of contract"}, 
    {"name": "tenure", "description": "How long they’ve been a customer"}, 
    {"name": "paymentmethod", "description": "Payment method"}, 
    {"name": "paperlessbilling", "description": "Whether customer has paperless billing or not"}, 
    {"name": "monthlycharges", "description": "Monthly charges"}, 
    {"name": "totalcharges", "description": "Total charges"},
    {"name": "churn", "description": "Whether customer has left within the last month or not"},  
]

for desc in feature_descriptions: 
    customer_info_fg.update_feature_description(desc["name"], desc["description"])
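
Note that the names in `feature_descriptions` are lowercase, while the DataFrame columns are mixed case (`customerID`, `PaymentMethod`). This suggests that the feature store lowercases column names on insert. The sketch below only mirrors that observed pattern in plain Python; it is not the library's own sanitization logic, and the helper name `expected_feature_names` is hypothetical.

```python
def expected_feature_names(columns):
    """Lowercase each column name, mirroring the names used in the cell above."""
    return [str(col).lower() for col in columns]


# Columns of customer_info_df as shown by head(), minus the index column.
columns = ["customerID", "Contract", "tenure", "PaymentMethod",
           "PaperlessBilling", "MonthlyCharges", "TotalCharges", "Churn"]

# Feature names that receive a description in the cell above.
described = ["customerid", "contract", "tenure", "paymentmethod",
             "paperlessbilling", "monthlycharges", "totalcharges", "churn"]

# Columns that would be left without a description.
missing = sorted(set(expected_feature_names(columns)) - set(described))
print(missing)
```

A check like this catches a misspelled feature name before it is sent to `update_feature_description`.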

In [11]:
demography_fg = fs.get_or_create_feature_group(
    name="customer_demography_info",
    version=1,
    description="Customer demography info for churn prediction.",
    primary_key=['customerID'],
)
demography_fg.insert(demography_df)



Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/28868/fs/28788/fg/30230


Uploading Dataframe: 0.00% |          | Rows 0/7043 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/28868/jobs/named/customer_demography_info_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7fd262e04a90>, None)

In [12]:
feature_descriptions = [
    {"name": "customerid", "description": "Customer id"}, 
    {"name": "gender", "description": "Customer gender"},
    {"name": "seniorcitizen", "description": "Whether customer is a senior citizen or not"}, 
    {"name": "dependents", "description": "Whether customer has dependents or not"}, 
    {"name": "partner", "description": "Whether customer has a partner or not"}, 
]

for desc in feature_descriptions: 
    demography_fg.update_feature_description(desc["name"], desc["description"])

In [13]:
subscriptions_fg = fs.get_or_create_feature_group(
    name="customer_subscription_info",
    version=1,
    description="Customer subscription info for churn prediction.",
    primary_key=['customerID'],
)
subscriptions_fg.insert(subscriptions_df)



Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/28868/fs/28788/fg/30254


Uploading Dataframe: 0.00% |          | Rows 0/7043 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/28868/jobs/named/customer_subscription_info_1_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7fd26563dc10>, None)

In [14]:
feature_descriptions = [
    {"name": "customerid", "description": "Customer id"}, 
    {"name": "deviceprotection", "description": "Whether customer has signed up for device protection service"},
    {"name": "onlinebackup", "description": "Whether customer has signed up for online backup service"}, 
    {"name": "onlinesecurity", "description": "Whether customer has signed up for online security service"}, 
    {"name": "internetservice", "description": "Type of internet service the customer has signed up for"}, 
    {"name": "multiplelines", "description": "Whether customer has signed up for multiple lines service"}, 
    {"name": "phoneservice", "description": "Whether customer has signed up for phone service"}, 
    {"name": "techsupport", "description": "Whether customer has signed up for tech support service"}, 
    {"name": "streamingmovies", "description": "Whether customer has signed up for streaming movies service"}, 
    {"name": "streamingtv", "description": "Whether customer has signed up for streaming TV service"}, 
]

for desc in feature_descriptions: 
    subscriptions_fg.update_feature_description(desc["name"], desc["description"])
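
The same loop appears three times in this notebook, so a small helper could factor it out. `update_feature_description` is the method used in the cells above; the helper `describe_features` and the stand-in class `FakeFeatureGroup` are hypothetical and exist only so that the example runs without a Hopsworks connection.

```python
def describe_features(feature_group, descriptions):
    """Apply each {"name", "description"} entry and return how many were applied."""
    for desc in descriptions:
        feature_group.update_feature_description(desc["name"], desc["description"])
    return len(descriptions)


class FakeFeatureGroup:
    """Offline stand-in that records calls instead of contacting Hopsworks."""

    def __init__(self):
        self.descriptions = {}

    def update_feature_description(self, name, description):
        self.descriptions[name] = description


fake_fg = FakeFeatureGroup()
count = describe_features(fake_fg, [
    {"name": "customerid", "description": "Customer id"},
    {"name": "gender", "description": "Customer gender"},
])
print(count, fake_fg.descriptions)
```

With a real connection, the call would take one of the feature groups created above, for example `describe_features(subscriptions_fg, feature_descriptions)`.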