# TASK 2: User Engagement analysis

The aim of this notebook is to analyze user engagement, that is, how engaged users are with the services of the TellCo company.
The results can help improve the Quality of Service (QoS) on the mobile platform and attract more users to the business.

For this task, we track user engagement using the following metrics:
* session frequency
* session duration
* total session traffic (download and upload, in bytes)
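
As a rough illustration of these three metrics, the sketch below computes them on a made-up table with plain pandas. The column names match the ones this notebook selects from the dataset further down; the sample values, the names `toy` and `engagement`, and the choice to treat the number of rows per user as the session frequency are assumptions for illustration only.

```python
import pandas as pd

# Made-up sessions, one row per session (values are illustrative only)
toy = pd.DataFrame({
    "MSISDN/Number": ["A", "A", "B"],
    "Dur. (s)": [100.0, 50.0, 30.0],
    "Total UL (Bytes)": [10.0, 20.0, 5.0],
    "Total DL (Bytes)": [100.0, 200.0, 50.0],
})

# Total traffic of a session = upload + download
toy["Total Traffic (Bytes)"] = toy["Total UL (Bytes)"] + toy["Total DL (Bytes)"]

# One row per user: number of sessions, summed duration, summed traffic
engagement = toy.groupby("MSISDN/Number").agg(
    sessions_frequency=("Dur. (s)", "count"),
    total_duration_s=("Dur. (s)", "sum"),
    total_traffic_bytes=("Total Traffic (Bytes)", "sum"),
)
print(engagement)
```

Named aggregation keeps the three metrics in a single frame indexed by user, which is convenient for later ranking or clustering.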

In [1]:
# Import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

import warnings
warnings.filterwarnings('ignore')

In [2]:
import sys
sys.path.append('../scripts')
from Extract_data import extract_data

In [3]:
# Import the dataset
df = pd.read_csv("../data/Cleaned_Data.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150001 entries, 0 to 150000
Data columns (total 55 columns):
 #   Column                                    Non-Null Count   Dtype  
---  ------                                    --------------   -----  
 0   Bearer Id                                 149010 non-null  float64
 1   Start                                     150000 non-null  object 
 2   Start ms                                  150001 non-null  float64
 3   End                                       150000 non-null  object 
 4   End ms                                    150001 non-null  float64
 5   Dur. (s)                                  150001 non-null  float64
 6   IMSI                                      149431 non-null  float64
 7   MSISDN/Number                             148935 non-null  float64
 8   IMEI                                      149429 non-null  float64
 9   Last Location Name                        148848 non-null  object 
 10  Avg RTT DL (ms)     

Let's extract the variables needed for this task. The imported dataset has already been cleaned; the cleaning process is documented in the notebook [here](Data_Preprocessing.ipynb).

In [23]:
# Extract the variables needed for Task 2
# .copy() gives an independent frame, so the assignment below does not write into a slice of df
dfTask2 = df[['MSISDN/Number','Bearer Id','Dur. (s)','Total UL (Bytes)','Total DL (Bytes)']].copy()
dfTask2[['MSISDN/Number','Bearer Id']] = dfTask2[['MSISDN/Number','Bearer Id']].astype(str)
dfTask2.head(5)

106857
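
Selecting columns from a DataFrame and then assigning back into that selection is a pattern that older pandas versions flag with `SettingWithCopyWarning` (silenced in this notebook by `warnings.filterwarnings('ignore')`). A minimal sketch of the explicit-copy alternative, on a made-up frame; `raw` and `subset` are hypothetical names and the values are illustrative:

```python
import pandas as pd

# Made-up data: identifiers stored as floats, as in the dataset's dtypes
raw = pd.DataFrame({
    "MSISDN/Number": [1001.0, 1002.0],
    "Dur. (s)": [120.0, 45.0],
})

# .copy() makes the subset an independent frame, so the cast below
# only changes the subset and leaves the original data untouched
subset = raw[["MSISDN/Number", "Dur. (s)"]].copy()
subset["MSISDN/Number"] = subset["MSISDN/Number"].astype(str)

print(raw.dtypes["MSISDN/Number"])     # original column keeps its float dtype
print(subset.dtypes["MSISDN/Number"])  # the copy now holds strings
```

Casting identifiers to strings keeps them from being treated as numeric quantities in later aggregations.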

>## Task 2.1

### Aggregate the metrics per customer id (MSISDN)
To do so, we'll use the `extract_data` helper we wrote in the script [Extract_data.py](..\scripts\Extract_data.py).

In [22]:
dfForAgg = extract_data(dfTask2)
# Aggregated session frequency
AggSession = dfForAgg.extract_Session('MSISDN/Number')
# Aggregated duration
AggDur = dfForAgg.sum_duration('MSISDN/Number')
# Aggregated total traffic
AggTotal = dfForAgg.extract_Total('MSISDN/Number')
# Merge the three metrics per user on their shared index
import functools as ft
dfEachAgg = [AggSession,AggDur,AggTotal]
dfAgg = ft.reduce(lambda left,right: pd.merge(left,right,left_index=True,right_index=True,
        validate="one_to_one"),dfEachAgg)
dfAgg.head(5)

106857
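
The `functools.reduce` merge used above can be checked on a small example. The sketch below folds three made-up, index-aligned frames into one; `sessions`, `duration` and `traffic` are hypothetical stand-ins for the aggregated metrics, whose real shape depends on `Extract_data.py`, which is not shown here.

```python
import functools as ft
import pandas as pd

# Hypothetical per-user aggregates sharing the same index
idx = pd.Index(["A", "B"], name="MSISDN/Number")
sessions = pd.DataFrame({"sessions": [2, 1]}, index=idx)
duration = pd.DataFrame({"duration_s": [150.0, 30.0]}, index=idx)
traffic = pd.DataFrame({"traffic_bytes": [330.0, 55.0]}, index=idx)

# reduce folds the list pairwise: merge(merge(sessions, duration), traffic).
# validate="one_to_one" raises pandas.errors.MergeError if a key is
# duplicated on either side, which guards against silently multiplied rows.
merged = ft.reduce(
    lambda left, right: pd.merge(
        left, right, left_index=True, right_index=True, validate="one_to_one"
    ),
    [sessions, duration, traffic],
)
print(merged)
```

The default inner join keeps only users present in all three frames, so a user missing from any one aggregate would drop out of the result.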