# **App Users Segmentation: Case Study**

In the highly competitive app market, businesses and developers need to understand and target specific user groups in order to improve engagement, retain more users, and increase lifetime value.

The dataset used for this project was collected from an app. The goal is a data-driven way to segment users by their usage habits and spending ability, and to identify which users the application is likely to retain and which it is likely to lose over time.

Below are all the features in the dataset:

userid: The identity number of the user.<br>
Average Screen Time: The average screen time of the user on the application.<br>
Average Spent on App (INR): The average amount spent by the user on the application.<br>
Left Review: Did the user leave any reviews about the experience on the application? (1 if true, otherwise 0).<br>
Ratings: Ratings given by the user to the application.<br>
New Password Request: The number of times the user requested a new password.<br>
Last Visited Minutes: Minutes passed since the user was last active.<br>
Status: Installed if the application is still installed, Uninstalled if the user has deleted the application.<br>

The segments will be created by looking for relationships that exist between the users who are still using the application and the users who have uninstalled it. These relationships are then used to build user segments that identify retained users, as well as users who can still be retained before they move to alternatives.

In [2]:
# Import libraries and datasets.

import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
import pandas as pd
pio.templates.default = "plotly_white"

In [3]:
data = pd.read_csv("C:/Users/samue/Desktop/Data Scientist/Datasets/user-behaviour/user behaviour/userbehaviour.csv")
print(data.head())

   userid  Average Screen Time  Average Spent on App (INR)  Left Review  \
0    1001                 17.0                       634.0            1   
1    1002                  0.0                        54.0            0   
2    1003                 37.0                       207.0            0   
3    1004                 32.0                       445.0            1   
4    1005                 45.0                       427.0            1   

   Ratings  New Password Request  Last Visited Minutes       Status  
0        9                     7                  2990    Installed  
1        4                     8                 24008  Uninstalled  
2        8                     5                   971    Installed  
3        6                     2                   799    Installed  
4        5                     6                  3668    Installed  


### Perform data cleaning and exploration

In [4]:
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 999 entries, 0 to 998
Data columns (total 8 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   userid                      999 non-null    int64  
 1   Average Screen Time         999 non-null    float64
 2   Average Spent on App (INR)  999 non-null    float64
 3   Left Review                 999 non-null    int64  
 4   Ratings                     999 non-null    int64  
 5   New Password Request        999 non-null    int64  
 6   Last Visited Minutes        999 non-null    int64  
 7   Status                      999 non-null    object 
dtypes: float64(2), int64(5), object(1)
memory usage: 62.6+ KB
None


The output above shows 999 non-null entries in every column, so the dataset has no missing values.
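
`info()` reports non-null counts, but it does not reveal duplicate rows or repeated user IDs. The following is a minimal sketch of a slightly broader check, assuming a small made-up frame in place of the real CSV; `cleanliness_report` is a hypothetical helper and is not part of the original notebook.

```python
import pandas as pd


def cleanliness_report(df: pd.DataFrame, id_column: str = "userid") -> dict:
    """Summarise basic data-quality checks (hypothetical helper)."""
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "duplicate_ids": int(df[id_column].duplicated().sum()),
    }


# Made-up rows that mirror a few of the dataset's columns;
# the last row deliberately repeats the one before it.
sample = pd.DataFrame({
    "userid": [1001, 1002, 1003, 1003],
    "Average Screen Time": [17.0, 0.0, 37.0, 37.0],
    "Status": ["Installed", "Uninstalled", "Installed", "Installed"],
})

report = cleanliness_report(sample)
print(report)
```

On the real data, the same helper could be called as `cleanliness_report(data)`; all three counts being zero would support treating the dataset as clean.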

Let’s look at the highest, lowest, and average screen time of all the users.

In [5]:
print(f'Average Screen Time = {data["Average Screen Time"].mean()}')
print(f'Highest Screen Time = {data["Average Screen Time"].max()}')
print(f'Lowest Screen Time = {data["Average Screen Time"].min()}')

Average Screen Time = 24.39039039039039
Highest Screen Time = 50.0
Lowest Screen Time = 0.0


Let’s have a look at the highest, lowest, and the average amount spent by all the users.

In [6]:
print(f'Average Spend of the Users = {data["Average Spent on App (INR)"].mean()}')
print(f'Highest Spend of the Users = {data["Average Spent on App (INR)"].max()}')
print(f'Lowest Spend of the Users = {data["Average Spent on App (INR)"].min()}')

Average Spend of the Users = 424.4154154154154
Highest Spend of the Users = 998.0
Lowest Spend of the Users = 0.0
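
The same three statistics can be computed for several columns in one call. This is a small sketch using `DataFrame.agg`, applied only to the five rows printed by `data.head()` above, so the numbers differ from the full-dataset figures.

```python
import pandas as pd

# The five rows shown by data.head() earlier in the notebook.
sample = pd.DataFrame({
    "Average Screen Time": [17.0, 0.0, 37.0, 32.0, 45.0],
    "Average Spent on App (INR)": [634.0, 54.0, 207.0, 445.0, 427.0],
})

# One row per statistic, one column per feature.
summary = sample.agg(["mean", "max", "min"]).round(2)
print(summary)
```

On the full dataset, `data[["Average Screen Time", "Average Spent on App (INR)"]].agg(["mean", "max", "min"])` would replace the six separate `print` calls.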


### Explore the data to spot a relationship between the spending capacity and screen time of the active users and the users who have uninstalled the app.

In [7]:
figure = px.scatter(data_frame = data, 
                    x = "Average Screen Time",
                    y = "Average Spent on App (INR)", 
                    size = "Average Spent on App (INR)", 
                    color = "Status",
                    title = "Relationship Between Spending Capacity and Screentime",
                    trendline="ols")
figure.show()

Notice that users who uninstalled the app had an average screen time of less than 5 minutes a day and an average spend of less than 100 INR.

We can also see a linear relationship between the average screen time and the average spending of the users still using the app.
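
Observations read off a chart can be cross-checked numerically. The sketch below, on made-up rows rather than the real data, computes the per-status means and the Pearson correlation between screen time and spending for installed users.

```python
import pandas as pd

# Made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame({
    "Average Screen Time": [10.0, 20.0, 30.0, 40.0, 1.0, 3.0],
    "Average Spent on App (INR)": [200.0, 400.0, 650.0, 800.0, 20.0, 60.0],
    "Status": ["Installed"] * 4 + ["Uninstalled"] * 2,
})

# Mean screen time and spend for each status group.
status_means = sample.groupby("Status")[
    ["Average Screen Time", "Average Spent on App (INR)"]
].mean()
print(status_means)

# Strength of the linear relationship among users who kept the app.
installed = sample[sample["Status"] == "Installed"]
corr = installed["Average Screen Time"].corr(
    installed["Average Spent on App (INR)"]
)
print(f"Correlation (installed users) = {corr:.3f}")
```

A correlation close to 1 on the real data would back up the linear trend seen in the scatter plot; a value near 0 would suggest the trendline is misleading.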

Next, let’s look at the relationship between the ratings given by users and the average screen time.


In [8]:
figure = px.scatter(data_frame = data, 
                    x = "Average Screen Time",
                    y = "Ratings", 
                    size = "Ratings", 
                    color = "Status", 
                    title = "Relationship Between Ratings and Screentime",
                    trendline = "ols")
figure.show()

The chart above shows that users who uninstalled the app rated it 5 at most, and their screen time is very low compared with users who gave higher ratings.
This suggests that users who spend little time in the app tend to rate it low and uninstall it at some point.

We can also observe that the users who kept the app installed have a minimum screen time of 5 minutes and a maximum screen time of 50 minutes.
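
The minimum and maximum values quoted above can be confirmed with a grouped aggregation. This is a sketch on made-up rows, using pandas named aggregation; the output column names (`min_screen_time` and so on) are illustrative choices.

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Average Screen Time": [17.0, 37.0, 5.0, 50.0, 0.0, 2.0, 4.0],
    "Ratings": [9, 8, 6, 10, 4, 5, 3],
    "Status": ["Installed"] * 4 + ["Uninstalled"] * 3,
})

# Range of screen time and highest rating within each status group.
ranges = sample.groupby("Status").agg(
    min_screen_time=("Average Screen Time", "min"),
    max_screen_time=("Average Screen Time", "max"),
    max_rating=("Ratings", "max"),
)
print(ranges)
```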

## **App User Segmentation to Find Retained and Lost Users**

In this section, the app’s users are segmented to find those the app has retained and those it has lost.

I will use the K-means clustering algorithm for this task.
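
K-means groups points by Euclidean distance, so a feature with a large numeric range (such as Last Visited Minutes) can dominate features with small ranges (such as Ratings) unless the data is rescaled first. The sketch below shows one common approach on made-up data: rescale with `MinMaxScaler().fit_transform(...)`, fix `random_state` for reproducibility, and compare a few cluster counts with the silhouette score. The synthetic blobs, the stretch factor, and the candidate range of `k` are all assumptions for illustration, not the notebook's actual settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

# Made-up data: three groups of points described by two features.
features, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
# Stretch the second feature so the columns sit on very different scales.
features[:, 1] *= 500

# Rescale every feature to the [0, 1] range before clustering.
scaled = MinMaxScaler().fit_transform(features)

# Compare candidate cluster counts with the silhouette score
# (higher means tighter, better-separated clusters).
scores = {}
for k in range(2, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("Best k by silhouette:", best_k)
```

On the real data, `scaled` would come from `MinMaxScaler().fit_transform(clustering_data)`, and the resulting segments may differ from those obtained on unscaled features.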

In [9]:
clustering_data = data[["Average Screen Time", "Left Review", 
                        "Ratings", "Last Visited Minutes", 
                        "Average Spent on App (INR)", 
                        "New Password Request"]]

# The features are clustered on their raw scales here, so the feature with the
# largest range (Last Visited Minutes) carries the most weight in the distances.
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
clusters = kmeans.fit_predict(clustering_data)
data["Segments"] = clusters

print(data.head(10))





   userid  Average Screen Time  Average Spent on App (INR)  Left Review  \
0    1001                 17.0                       634.0            1   
1    1002                  0.0                        54.0            0   
2    1003                 37.0                       207.0            0   
3    1004                 32.0                       445.0            1   
4    1005                 45.0                       427.0            1   
5    1006                 28.0                       599.0            0   
6    1007                 49.0                       887.0            1   
7    1008                  8.0                        31.0            0   
8    1009                 28.0                       741.0            1   
9    1010                 28.0                       524.0            1   

   Ratings  New Password Request  Last Visited Minutes       Status  Segments  
0        9                     7                  2990    Installed         0  
1        4    

In [10]:
# A look at the number of segments we obtained.

print(data["Segments"].value_counts())

0    910
1     45
2     44
Name: Segments, dtype: int64
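
K-means assigns cluster numbers arbitrarily, so the number attached to a given group of users can change from one run to the next. Rather than hard-coding which number means what, the names can be derived from the clusters themselves. The sketch below assumes, purely for illustration, that segments are ranked by their mean Last Visited Minutes (most recently active first); the rows and the `Segment Name` column are made up.

```python
import pandas as pd

# Made-up cluster assignments and recency values.
sample = pd.DataFrame({
    "Last Visited Minutes": [500, 900, 1200, 9000, 11000, 24000, 30000],
    "Segments": [2, 2, 2, 0, 0, 1, 1],
})

# Rank clusters by mean recency: the most recently active cluster comes first.
order = (
    sample.groupby("Segments")["Last Visited Minutes"]
    .mean()
    .sort_values()
    .index
)
names = ["Retained", "Needs Attention", "Churn"]
label_map = {int(cluster): name for cluster, name in zip(order, names)}

sample["Segment Name"] = sample["Segments"].map(label_map)
print(label_map)
print(sample)
```

Deriving the mapping this way keeps the names attached to the same behaviour even when the cluster numbering changes.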


In [11]:
# Rename the segments for clarity.

data["Segments"] = data["Segments"].map({0: "Retained", 1: 
    "Churn", 2: "Needs Attention"})

### Visualise the segments.

In [12]:
PLOT = go.Figure()
for i in list(data["Segments"].unique()):
    # One trace per segment, so each segment gets its own colour and legend entry.
    PLOT.add_trace(go.Scatter(x = data[data["Segments"] == i]['Last Visited Minutes'],
                              y = data[data["Segments"] == i]['Average Spent on App (INR)'],
                              mode = 'markers', marker_size = 6, marker_line_width = 1,
                              name = str(i)))
PLOT.update_traces(hovertemplate='Last Visited Minutes: %{x} <br>Average Spent on App (INR): %{y}')

# Axis titles are set directly; a 3D `scene` layout has no effect on a 2D scatter plot.
PLOT.update_layout(width = 800, height = 800, autosize = True, showlegend = True,
                   yaxis_title = 'Average Spent on App (INR)',
                   xaxis_title = 'Last Visited Minutes')

The blue segment shows the users the app has retained over time.

The red segment shows the users who have just uninstalled the app or are about to uninstall it.

The green segment shows the users the application has lost.

### **Summary**

User segmentation equips marketing, product, customer success, and sales teams with insights to:

- Develop marketing campaigns and strategies (a customer win-back strategy, etc.) for customers in the red (struggling) and green (churning) segments.
- Formulate a survey for churned customers. The stakeholders of this survey (product, sales, and customer success teams) will understand which features aren’t solving customers’ needs and/or are unintentionally adding friction to the customer journey.
- Develop a customer loyalty and referral program aimed at improving retention rates.
- Support existing customers with helpful materials (how-to articles, videos, etc.).
- Use insights obtained from the customer survey to create or adjust products (features) and services (the customer journey, etc.).
- Develop cross-selling marketing strategies for customers of similar products from the same company or from a competitor.

