#### First, import the libraries used in this project

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest


# Make sure these libraries are installed before importing them

#### Load the data

In [None]:
file_path = 'userbehaviour.csv' # make sure the file path is correct
data = pd.read_csv(file_path)

#### Check null values, column info, and descriptive statistics of the data

In [None]:
# Print null counts and column info
print("Null Values:\n", data.isnull().sum())
print("\nColumn Info:\n")
data.info()  # info() prints directly and returns None, so it is not wrapped in print()
print("\n")

In [None]:
# Print descriptive statistics
print("Descriptive Statistics:")
print(data.describe())

#### Check the highest, lowest, and average screen time of all the users

In [None]:
max_screen_time = data['Average Screen Time'].max()
min_screen_time = data['Average Screen Time'].min()
avg_screen_time = data['Average Screen Time'].mean()
print("\nHighest Screen Time:", max_screen_time)
print("Lowest Screen Time:", min_screen_time)
print("Average Screen Time:", round(avg_screen_time, 4))

#### Check the highest, lowest, and average amount spent by all the users

In [None]:
max_amount_spent = data['Average Spent on App (INR)'].max()
min_amount_spent = data['Average Spent on App (INR)'].min()
avg_amount_spent = data['Average Spent on App (INR)'].mean()
print("\nHighest Amount Spent:", max_amount_spent)
print("Lowest Amount Spent:", min_amount_spent)
print("Average Amount Spent:", round(avg_amount_spent, 3)) # Rounding float values for 

#### Now check the relationship between the spending capacity and screen time of the active users and the users who have uninstalled the app.

In [None]:
import plotly.express as px
import plotly.io as pio 
# data = pd.DataFrame(data)
pio.renderers.default = 'notebook'

fig = px.scatter(
    data, 
    x='Average Screen Time', 
    y='Average Spent on App (INR)', 
    color='Status',
    size='Average Spent on App (INR)',
    title='Relationship Between Spending Capacity and Screentime',
    labels={'Average Screen Time': 'Average Screen Time', 'Average Spent on App (INR)': 'Average Spent on App (INR)'},
    height=600,
    width=700
)

fig.show()

## Observations:
- **Active Users**: The scatter plot for active users shows a positive correlation between screen time and the amount spent. This indicates that users who spend more time on the app tend to spend more money. The cluster of high spenders with high screen time suggests that engaging users can lead to increased spending.
- **Uninstalled Users**: For users who have uninstalled the app, the relationship between screen time and spending is less pronounced. There are fewer high spenders, and the data points are more spread out. This suggests that users who uninstall the app may not be as engaged or willing to spend as those who remain active.
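
The visual impression above can be backed by a number. The following is a minimal sketch, not part of the original notebook, that computes the Pearson correlation between two columns separately for each `Status` group. The helper name `correlation_by_status`, the toy rows, and the status labels `Installed` and `Uninstalled` are assumptions made for illustration; on the real data you would pass `data` instead of `toy`.

```python
import pandas as pd


def correlation_by_status(df, x, y, group="Status"):
    """Return the Pearson correlation between columns x and y within each group."""
    return {name: part[x].corr(part[y]) for name, part in df.groupby(group)}


# Toy rows for illustration only; these values are NOT from userbehaviour.csv.
toy = pd.DataFrame({
    "Status": ["Installed"] * 4 + ["Uninstalled"] * 4,
    "Average Screen Time": [10, 20, 30, 40, 5, 10, 15, 20],
    "Average Spent on App (INR)": [100, 200, 300, 400, 50, 50, 50, 60],
})

result = correlation_by_status(
    toy, "Average Screen Time", "Average Spent on App (INR)"
)
for status, r in result.items():
    print(f"{status}: r = {r:.3f}")
```

A coefficient close to 1 for one group and a clearly lower one for the other would support the reading of the scatter plot; similar values would suggest the difference is mostly visual.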

#### Now check the relationship between the ratings given by users and the average screen time. Also explain your observation.


In [None]:
# pio.renderers.default = 'notebook'
fig = px.scatter(
    data, 
    x='Average Screen Time', 
    y='Ratings', 
    color='Status',
    size='Ratings',
    title='Relationship Between Ratings and Screentime',
    labels={'Average Screen Time': 'Average Screen Time', 'Ratings': 'Ratings'},
    height=600,
    width=600
)

fig.show()

## Observations:
- The scatter plot shows a moderate positive correlation between ratings and average screen time. Users who spend more time on the app tend to give higher ratings. This indicates that higher engagement is associated with greater user satisfaction.
- There are some outliers where users with low screen time give high ratings and vice versa. These outliers could be due to individual user preferences or specific experiences that significantly influenced their ratings.
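
One way to check the trend described above without relying on the plot alone is to bin users by screen time and compare the mean rating per bin. This is a sketch under assumptions: the helper `mean_rating_by_screen_time`, the choice of three equal-width bins, and the toy values are all hypothetical and not part of the original analysis.

```python
import pandas as pd


def mean_rating_by_screen_time(df, bins=3):
    """Average rating within equal-width bins of average screen time."""
    binned = pd.cut(df["Average Screen Time"], bins=bins)
    return df.groupby(binned, observed=True)["Ratings"].mean()


# Toy rows for illustration only; these values are NOT from userbehaviour.csv.
toy = pd.DataFrame({
    "Average Screen Time": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Ratings": [2, 3, 2, 5, 6, 5, 8, 9, 8],
})

rating_by_bin = mean_rating_by_screen_time(toy)
print(rating_by_bin)
```

If the mean rating rises from the lowest bin to the highest, the positive relationship holds on average even when individual outliers exist.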

#### Next, segment the app's users to find those the app retained and those it lost for good. For this, I will use the K-means clustering algorithm.

In [None]:
features = data[['Average Screen Time', 'Average Spent on App (INR)', 'Ratings', 'New Password Request', 'Last Visited Minutes']]
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
kmeans = KMeans(n_clusters=3, random_state=42)
data['Segment'] = kmeans.fit_predict(scaled_features)
# print(data['Segment'])
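
The cell above fixes `n_clusters=3` without showing how that number was chosen. A common way to sanity-check it is the silhouette score, which is higher when clusters are compact and well separated. The sketch below is an addition, not the notebook's method: the helper `silhouette_by_k`, the range of `k` values, and the synthetic blobs standing in for `scaled_features` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_by_k(X, k_values=range(2, 7), random_state=42):
    """Fit K-means for each k and return {k: silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in for scaled_features: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 10], [-10, 10]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])
X = StandardScaler().fit_transform(X)

scores = silhouette_by_k(X)
best_k = max(scores, key=scores.get)
print(scores)
print("Best k by silhouette:", best_k)
```

On the real data, calling `silhouette_by_k(scaled_features)` would show whether three segments is a reasonable choice or whether another value separates the users more cleanly.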

#### Number of segments:

In [None]:
num_segments = data['Segment'].nunique()
print("\nNumber of Segments:", num_segments)
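
Cluster labels such as 0, 1, and 2 carry no meaning on their own. A per-segment profile of the feature means is one way to interpret them. The sketch below assumes a hypothetical helper `profile_segments` and uses toy rows; on the real data it could be called with `data` and the feature columns used for clustering.

```python
import pandas as pd


def profile_segments(df, features, segment_col="Segment"):
    """Mean of each feature per segment, plus the number of users in it."""
    grouped = df.groupby(segment_col)
    profile = grouped[features].mean()
    profile["Users"] = grouped.size()
    return profile


# Toy rows for illustration only; these values are NOT from userbehaviour.csv.
toy = pd.DataFrame({
    "Segment": [0, 0, 1, 1, 2],
    "Average Screen Time": [40, 50, 10, 20, 30],
    "Average Spent on App (INR)": [800, 900, 100, 200, 500],
})

profile = profile_segments(
    toy, ["Average Screen Time", "Average Spent on App (INR)"]
)
print(profile)
```

A segment with high screen time and spending would be a candidate for "retained" users, while one with low values on both would point to users at risk of leaving; that interpretation has to be checked against the actual profile.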

#### Visualize the Segments

In [None]:
fig = px.scatter(
    data, 
    x='Last Visited Minutes', 
    y='Average Spent on App (INR)', 
    color=data['Segment'].astype(str),  # color by K-means segment; cast to str for discrete colors
    size='Average Spent on App (INR)',
    title='User Segments: Last Visited Minutes vs. Amount Spent',
    labels={'Last Visited Minutes': 'Last Visited Minutes', 'Average Spent on App (INR)': 'Average Spent on App (INR)'},
    height=600,
    width=650
)

fig.show()

## Summary

### Data Inspection
To begin with, the dataset was loaded from the provided CSV file, and a preliminary inspection was conducted. This involved checking for null values, understanding the column information, and generating descriptive statistics. This step is crucial as it helps us understand the structure and quality of the data before performing any analysis or modeling.

- **Null Values**: Ensured there were no missing values in the dataset, which could affect our analysis.
- **Column Information**: Verified the data types of each column to ensure they are appropriate for the analysis.
- **Descriptive Statistics**: Provided a statistical summary of each feature, including mean, standard deviation, minimum, and maximum values.

### Screen Time Analysis
The analysis of screen time included calculating the highest, lowest, and average screen times across all users. This helps in understanding user engagement on the application.

- **Highest Screen Time**: Identified the maximum screen time, indicating the most engaged user.
- **Lowest Screen Time**: Identified the minimum screen time, indicating the least engaged user.
- **Average Screen Time**: Calculated the mean screen time to understand the overall engagement level of users.

### Spending Analysis
Similarly, the spending analysis involved calculating the highest, lowest, and average amounts spent by users on the app. This helps in understanding the monetization aspects and user spending behavior.

- **Highest Amount Spent**: Determined the maximum amount spent by a user.
- **Lowest Amount Spent**: Determined the minimum amount spent by a user.
- **Average Amount Spent**: Calculated the mean amount spent to understand overall spending behavior.

### Spending vs. Screen Time for Active and Uninstalled Users
A scatter plot analysis was performed to explore the relationship between screen time and spending for active users versus users who have uninstalled the app. This helps in identifying patterns in user behavior that could indicate why some users remain active while others uninstall the app.

- **Active Users**: Observed the spending and screen time relationship for users who are still using the app.
- **Uninstalled Users**: Compared the same relationship for users who have uninstalled the app.

### Ratings vs. Screen Time
The relationship between the ratings given by users and their average screen time was analyzed using a scatter plot. This helps in understanding if there is a correlation between user satisfaction (as indicated by ratings) and their engagement (as indicated by screen time).

### User Segmentation Using K-Means Clustering
K-Means clustering was applied to segment users based on their behavior. This involved selecting relevant features, standardizing the data, and applying the K-Means algorithm to identify distinct user segments.

- **Feature Selection**: Chose features like screen time, amount spent, ratings, password requests, and last visited minutes.
- **Standardization**: Standardized the features to ensure equal weighting in the clustering algorithm.
- **Clustering**: Identified three distinct user segments, which can be further analyzed to understand different user behaviors and tailor marketing strategies accordingly.
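
The standardization step in the list above can be checked numerically. The following minimal sketch uses made-up values (not from the dataset) to show that `StandardScaler` brings columns with very different ranges onto a common scale, so that no single feature dominates the Euclidean distances K-means relies on.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up columns on very different scales (illustrative values only).
raw = np.array([
    [10.0, 100.0],
    [20.0, 400.0],
    [30.0, 1000.0],
])

scaled = StandardScaler().fit_transform(raw)

raw_std = raw.std(axis=0)         # spread differs widely between columns
scaled_mean = scaled.mean(axis=0)  # approximately 0 for every column
scaled_std = scaled.std(axis=0)    # approximately 1 for every column

print("Raw std per column:   ", raw_std)
print("Scaled mean per column:", scaled_mean)
print("Scaled std per column: ", scaled_std)
```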

### Visualizing the Segments
The user segments were visualized using scatter plots, showing clear distinctions between the different clusters. This visual representation helps in understanding the characteristics of each segment and aids in strategic decision-making.

### Conclusion
The entire analysis provides a comprehensive understanding of user behavior on the app, from engagement and spending to satisfaction and segmentation. These insights can be used to improve user retention, tailor marketing strategies, and enhance the overall user experience.
