## User Engagement Analysis

### Major Objectives
Track user engagement using the following metrics:
- session frequency
- session duration
- total session traffic (download and upload, in bytes)

### Sub Tasks
- Aggregate the above metrics per customer ID (MSISDN) and report the top 10 customers per engagement metric.
- Normalize each engagement metric and run k-means (k=3) to classify customers into three engagement groups.
- Compute the minimum, maximum, average, and total of the non-normalized metrics for each cluster.
- Interpret the results visually, with accompanying text explaining the findings.
- Aggregate total user traffic per application and derive the top 10 most engaged users per application.
- Plot the top 3 most used applications using appropriate charts.
- Using the k-means clustering algorithm, group users into k engagement clusters based on the engagement metrics:
  - What is the optimal value of k (use the elbow method)?
  - Interpret your findings.


### Importing Libraries

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import matplotlib
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
import math
import sys, os

In [6]:
sys.path.append(os.path.abspath(os.path.join('..', 'scripts')))
from data_visualizer import *
from data_selector import *
from outlier_handler import *

### Load Data

In [7]:
clean_df = pd.read_csv("../data/my_clean_data.csv")
clean_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146887 entries, 0 to 146886
Data columns (total 55 columns):
 #   Column                            Non-Null Count   Dtype  
---  ------                            --------------   -----  
 0   Unnamed: 0                        146887 non-null  int64  
 1   Bearer Id                         146887 non-null  int64  
 2   Start                             146887 non-null  object 
 3   Start ms                          146887 non-null  float64
 4   End                               146887 non-null  object 
 5   End ms                            146887 non-null  float64
 6   IMSI                              146887 non-null  int64  
 7   MSISDN/Number                     146887 non-null  int64  
 8   IMEI                              146887 non-null  int64  
 9   Last Location Name                146887 non-null  object 
 10  Avg RTT DL (ms)                   146887 non-null  float64
 11  Avg RTT UL (ms)                   146887 non-null  f

### User Engagement Analysis

**1. Top 10 customers per engagement metric**

In [10]:
total_users = clean_df['MSISDN/Number'].nunique()
print("The total number of customers in TellCo is", total_users)

The total number of customers in TellCo is 105716


In [None]:
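# Assumes `df` is `clean_df` with snake_case column names (not defined in the cells shown).
engagement_of_df = df[['msisdn_number', 'bearer_id', 'dur_(ms)', 'total_data']].copy()
engagement_of_df = engagement_of_df.rename(
    columns={'dur_(ms)': 'duration', 'total_data': 'total_data_volume'})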
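
From a frame with these columns, the per-user aggregation and the top-10 ranking could look something like the sketch below. It uses a toy `session_df` rather than the real data, and it assumes that one row corresponds to one session, so that counting `bearer_id` values per user gives the session frequency. That interpretation is an assumption, not something the notebook confirms.

```python
import pandas as pd

# Toy session-level records; column names mirror the renamed engagement frame.
session_df = pd.DataFrame({
    "msisdn_number": [1, 1, 2, 3, 3, 3],
    "bearer_id": [10, 11, 12, 13, 14, 15],
    "duration": [100.0, 200.0, 50.0, 10.0, 20.0, 30.0],
    "total_data_volume": [5.0, 5.0, 40.0, 1.0, 1.0, 1.0],
})

# One row per user: session count, total duration, total traffic.
user_engagement = session_df.groupby("msisdn_number").agg(
    sessions=("bearer_id", "count"),
    duration=("duration", "sum"),
    total_data_volume=("total_data_volume", "sum"),
)

# Top 10 users for each metric (the toy data only has 3 users, so all are returned).
top_10 = {
    metric: user_engagement.nlargest(10, metric)
    for metric in user_engagement.columns
}
```

Each entry of `top_10` is a frame sorted by its own metric, so `top_10["duration"].head()` shows the users with the longest total session time first.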