# First data science project - data science gym

## Introduction

**Project description**

The fitness center network, 'Bodybuilder-Data Scientist,' is working on a strategy to engage users based on analytical data. One of the most common problems facing fitness clubs and similar services is customer churn. It's not always clear when a user has stopped using the service, as they may not always leave in an obvious way.

For a fitness center, a client is considered to have churned if they haven't visited the gym even once in the last month. They may have gone on vacation and will resume visits when they get back, but more often they won't. If a client starts going to the gym and then suddenly stops, they are unlikely to return.
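As a rough illustration of this definition, the sketch below derives a churn flag from a visit log. The `visits` table, its column names, the check date, and the 30-day window are all assumptions made for the example; the project dataset already ships with a ready-made `Churn` column.

```python
import pandas as pd

# Hypothetical visit log: one row per gym visit (not part of gym_churn.csv)
visits = pd.DataFrame({
    "client_id": [1, 1, 2, 3, 3],
    "visit_date": pd.to_datetime([
        "2024-01-05", "2024-02-20", "2024-01-10", "2024-02-01", "2024-02-27",
    ]),
})

# Assumed check date and a 30-day window standing in for "the last month"
check_date = pd.Timestamp("2024-03-01")
window_start = check_date - pd.Timedelta(days=30)

# Most recent visit per client
last_visit = visits.groupby("client_id")["visit_date"].max()

# A client counts as churned when their last visit falls before the window
churned = (last_visit < window_start).astype(int).rename("churn")
print(churned)
```

In this toy log only client 2 is flagged, because their last visit is older than the assumed window.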

Your task is to analyze the data and develop an action plan to retain customers.

Specifically, the objectives are to:

1. Learn how to predict the probability of customer churn (for the following month) for each client.
2. Create typical user profiles: identify several of the most prominent groups and characterize their key attributes.
3. Analyze the main features that have the greatest impact on churn.
4. Formulate key conclusions and develop recommendations to improve customer relationship management, including:
   1. Identifying target customer groups;
   2. Proposing measures to reduce churn;
   3. Determining other nuances of customer interactions.

**Data description**

The dataset `gym_churn.csv` describes each customer in the month before the churn check and records whether they churned in the month being checked. It includes the following fields:

1. `Churn` - whether the customer churned in the current month.

The following fields contain user data for the month prior to the churn check:

2. `Gender` - the gender of the customer.
3. `Near_Location` - whether the customer lives or works in the area where the fitness center is located.
4. `Partner` - indicating whether the customer is an employee of a club partner company, in which case the fitness center stores information about the customer's employer.
5. `Promo_friends` - indicating whether the customer registered under the "bring a friend" promotion, using a promo code from an acquaintance when paying for the first subscription.
6. `Phone` - indicating whether the customer provided a contact phone number.
7. `Age` - the age of the customer.
8. `Lifetime` - the time since the customer's first visit to the fitness center (in months).

The dataset also includes information based on the client's visit log, purchases, and current subscription status, such as:

9. `Contract_period` - the duration of the customer's current active subscription, which can be a month, 3 months, 6 months, or a year.
10. `Month_to_end_contract` - the time until the end of the customer's current active subscription (in months).
11. `Group_visits` - indicating whether the customer attends group classes.
12. `Avg_class_frequency_total` - the average frequency of visits per week for the entire duration of the subscription.
13. `Avg_class_frequency_current_month` - the average frequency of visits per week for the previous month.
14. `Avg_additional_charges_total` - the total revenue from other fitness center services, such as the cafe, sporting goods, and the beauty and massage salon.
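Before any analysis it can help to confirm that the file actually contains the fields listed above. The helper below is only a sketch: `missing_columns` is a hypothetical name, the comparison is case-insensitive by assumption, and the toy frame stands in for the real `gym_churn.csv`.

```python
import pandas as pd

# Column names as listed in the data description
EXPECTED_COLUMNS = [
    "Churn", "Gender", "Near_Location", "Partner", "Promo_friends", "Phone",
    "Age", "Lifetime", "Contract_period", "Month_to_end_contract",
    "Group_visits", "Avg_class_frequency_total",
    "Avg_class_frequency_current_month", "Avg_additional_charges_total",
]


def missing_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected columns absent from df (case-insensitive)."""
    present = {name.lower() for name in df.columns}
    return [name for name in expected if name.lower() not in present]


# Toy frame standing in for the real file; it has only two of the fields
toy = pd.DataFrame({"Churn": [0, 1], "Age": [29, 27]})
print(missing_columns(toy))
```

On the real data one would call `missing_columns(pd.read_csv('gym_churn.csv'))` and expect an empty list.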

In [40]:
import pandas as pd
import sklearn as sk
import numpy as np
import plotly.express as px
from IPython.display import display

# Save raw dataset in case we need it later
raw_gym = pd.read_csv('gym_churn.csv')

FIG_WIDTH = 8
FIG_HEIGHT = 5


## Data preprocessing

Let's convert the column names to lowercase so they are consistent and easier to type.

In [41]:
df_gym = (
    raw_gym
    .copy()
    .rename(
        columns=str.lower  # lowercase every column name
    )
)


## Exploratory data analysis

Let's look at what the dataset contains. The plan is to:

1. Check the dataset for missing values and examine the mean and standard deviation of each feature.
2. Compare the mean feature values between two groups: customers who churned and customers who stayed.
3. Plot bar charts and distributions of the features for customers who churned and those who stayed.
4. Build and display a correlation matrix to analyze the relationships between the features.
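The four steps above can be sketched on a small synthetic frame. Everything in `toy` is randomly generated for illustration and is not real gym data; on the actual project the same calls would be applied to `df_gym`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_gym; the values are random, not real gym data
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "age": rng.integers(18, 42, size=200),
    "contract_period": rng.choice([1, 3, 6, 12], size=200),
    "group_visits": rng.integers(0, 2, size=200),
    "churn": rng.integers(0, 2, size=200),
})

# Step 1: missing values per column, plus mean and standard deviation
missing_per_column = toy.isna().sum()
summary = toy.describe().loc[["mean", "std"]].T

# Step 2: feature means for retained (0) and churned (1) clients
means_by_churn = toy.groupby("churn").mean().T

# Step 3 would draw one plot per feature, split by churn, for example:
# px.histogram(toy, x="age", color="churn", barmode="overlay")

# Step 4: pairwise Pearson correlations between all columns
corr = toy.corr()

print(missing_per_column, summary, means_by_churn, corr, sep="\n\n")
```

Transposing the grouped means puts features in rows and the two churn groups in columns, which makes them easy to compare side by side.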

In [42]:
display(df_gym.describe().round(2).T)


| feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| gender | 4000.0 | 0.51 | 0.5 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| near_location | 4000.0 | 0.85 | 0.36 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| partner | 4000.0 | 0.49 | 0.5 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| promo_friends | 4000.0 | 0.31 | 0.46 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| phone | 4000.0 | 0.9 | 0.3 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| contract_period | 4000.0 | 4.68 | 4.55 | 1.0 | 1.0 | 1.0 | 6.0 | 12.0 |
| group_visits | 4000.0 | 0.41 | 0.49 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| age | 4000.0 | 29.18 | 3.26 | 18.0 | 27.0 | 29.0 | 31.0 | 41.0 |
| avg_additional_charges_total | 4000.0 | 146.94 | 96.36 | 0.15 | 68.87 | 136.22 | 210.95 | 552.59 |
| month_to_end_contract | 4000.0 | 4.32 | 4.19 | 1.0 | 1.0 | 1.0 | 6.0 | 12.0 |


In [43]:
display(
    df_gym[df_gym.churn == 1]
    .describe().round(2).T
)

| feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| gender | 1061.0 | 0.51 | 0.5 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| near_location | 1061.0 | 0.77 | 0.42 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| partner | 1061.0 | 0.36 | 0.48 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| promo_friends | 1061.0 | 0.18 | 0.39 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| phone | 1061.0 | 0.9 | 0.3 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| contract_period | 1061.0 | 1.73 | 2.13 | 1.0 | 1.0 | 1.0 | 1.0 | 12.0 |
| group_visits | 1061.0 | 0.27 | 0.44 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| age | 1061.0 | 26.99 | 2.9 | 18.0 | 25.0 | 27.0 | 29.0 | 38.0 |
| avg_additional_charges_total | 1061.0 | 115.08 | 77.7 | 0.15 | 50.63 | 103.81 | 165.62 | 425.54 |
| month_to_end_contract | 1061.0 | 1.66 | 1.96 | 1.0 | 1.0 | 1.0 | 1.0 | 12.0 |


In [44]:
display(
    df_gym[df_gym.churn == 0]
    .describe().round(2).T
)

| feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| gender | 2939.0 | 0.51 | 0.5 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| near_location | 2939.0 | 0.87 | 0.33 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| partner | 2939.0 | 0.53 | 0.5 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| promo_friends | 2939.0 | 0.35 | 0.48 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| phone | 2939.0 | 0.9 | 0.3 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| contract_period | 2939.0 | 5.75 | 4.72 | 1.0 | 1.0 | 6.0 | 12.0 | 12.0 |
| group_visits | 2939.0 | 0.46 | 0.5 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| age | 2939.0 | 29.98 | 3.01 | 19.0 | 28.0 | 30.0 | 32.0 | 41.0 |
| avg_additional_charges_total | 2939.0 | 158.45 | 99.8 | 0.17 | 76.92 | 149.88 | 224.45 | 552.59 |
| month_to_end_contract | 2939.0 | 5.28 | 4.36 | 1.0 | 1.0 | 6.0 | 10.0 | 12.0 |
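
To make the churned and retained outputs easier to compare, the sketch below puts their group means side by side and ranks the features by the ratio between the two. The numbers are copied from the `describe()` outputs above; the `ratio` column is an illustrative choice for this example, not a measure defined by the project.

```python
import pandas as pd

# Group means copied from the describe() outputs above
means = pd.DataFrame(
    {
        "churned": [0.51, 0.77, 0.36, 0.18, 0.90, 1.73, 0.27, 26.99, 115.08, 1.66],
        "retained": [0.51, 0.87, 0.53, 0.35, 0.90, 5.75, 0.46, 29.98, 158.45, 5.28],
    },
    index=[
        "gender", "near_location", "partner", "promo_friends", "phone",
        "contract_period", "group_visits", "age",
        "avg_additional_charges_total", "month_to_end_contract",
    ],
)

# How many times larger the retained mean is than the churned mean
means["ratio"] = (means["retained"] / means["churned"]).round(2)

print(means.sort_values("ratio", ascending=False))
```

With these figures, `contract_period` and `month_to_end_contract` differ most between the groups (retained clients average roughly three times the churned value), while `gender` and `phone` show no difference in means.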
