## Data cleaning practice notebook

This dataset describes mobile device usage patterns and classifies user behavior. It contains 700 samples of user data, including metrics such as app usage time, screen-on time, battery drain, and data consumption. Each entry is assigned to one of five user behavior classes, ranging from light to extreme usage, which makes the data suitable for both exploratory analysis and classification modeling.

Key Features:

* User ID: Unique identifier for each user.
* Device Model: Model of the user's smartphone.
* Operating System: The OS of the device (iOS or Android).
* App Usage Time: Daily time spent on mobile applications, measured in minutes.
* Screen On Time: Average hours per day the screen is active.
* Battery Drain: Daily battery consumption in mAh.
* Number of Apps Installed: Total apps available on the device.
* Data Usage: Daily mobile data consumption in megabytes.
* Age: Age of the user.
* Gender: Gender of the user (Male or Female).
* User Behavior Class: Classification of user behavior based on usage patterns (1 to 5).
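Note that the two time features use different units (minutes for app usage, hours for screen-on time). The following is a minimal sketch, not part of the original notebook, of putting them on a common unit and checking the documented class range. The five rows are copied from the notebook's `df.head()` output; the derived column name `App Usage Time (hours/day)` is an assumption introduced for illustration.

```python
import pandas as pd

# Five sample rows copied from the df.head() output in this notebook.
sample = pd.DataFrame({
    "App Usage Time (min/day)": [393, 268, 154, 239, 187],
    "Screen On Time (hours/day)": [6.4, 4.7, 4.0, 4.8, 4.3],
    "User Behavior Class": [4, 3, 2, 3, 3],
})

# Convert minutes to hours so both time columns share one unit.
# The new column name is hypothetical, chosen for this sketch only.
sample["App Usage Time (hours/day)"] = sample["App Usage Time (min/day)"] / 60

# Range check: the feature list documents classes from 1 to 5.
classes_ok = sample["User Behavior Class"].between(1, 5).all()

print(sample.round(2))
print("classes in 1..5:", classes_ok)
```

The same two steps should apply unchanged to the full `df` once it is loaded.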

In [25]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 

In [26]:
df = pd.read_csv('user_behavior_dataset.csv')

In [27]:
df.head()

Unnamed: 0,User ID,Device Model,Operating System,App Usage Time (min/day),Screen On Time (hours/day),Battery Drain (mAh/day),Number of Apps Installed,Data Usage (MB/day),Age,Gender,User Behavior Class
0,1,Google Pixel 5,Android,393,6.4,1872,67,1122,40,Male,4
1,2,OnePlus 9,Android,268,4.7,1331,42,944,47,Female,3
2,3,Xiaomi Mi 11,Android,154,4.0,761,32,322,42,Male,2
3,4,Google Pixel 5,Android,239,4.8,1676,56,871,20,Male,3
4,5,iPhone 12,iOS,187,4.3,1367,58,988,31,Female,3


In [28]:
df['Device Model'].value_counts()

Device Model
Xiaomi Mi 11          146
iPhone 12             146
Google Pixel 5        142
OnePlus 9             133
Samsung Galaxy S21    133
Name: count, dtype: int64

In [29]:
df.isnull().sum()

User ID                       0
Device Model                  0
Operating System              0
App Usage Time (min/day)      0
Screen On Time (hours/day)    0
Battery Drain (mAh/day)       0
Number of Apps Installed      0
Data Usage (MB/day)           0
Age                           0
Gender                        0
User Behavior Class           0
dtype: int64
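A zero null count does not rule out other problems such as repeated identifiers. As a rough sketch, assuming `User ID` is meant to be unique as the feature list states, a duplicate check can sit next to the null check. The toy frame below uses hypothetical values, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one missing Age and one repeated User ID.
toy = pd.DataFrame({
    "User ID": [1, 2, 2, 3],
    "Age": [40, 47, 47, np.nan],
})

# Missing values per column, the same call used on df above.
missing_per_column = toy.isnull().sum()

# Number of rows whose User ID repeats an earlier row.
duplicate_ids = toy["User ID"].duplicated().sum()

print(missing_per_column)
print("duplicate IDs:", duplicate_ids)
```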

In [30]:
df['Operating System'].value_counts()

Operating System
Android    554
iOS        146
Name: count, dtype: int64
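The iOS count (146) equals the iPhone 12 count from the `Device Model` cell, which suggests that the operating system is fully determined by the device model. A minimal sketch of how that could be verified, using the five rows shown by `df.head()`; the variable names are illustrative only.

```python
import pandas as pd

# Device/OS pairs copied from the df.head() output in this notebook.
sample = pd.DataFrame({
    "Device Model": ["Google Pixel 5", "OnePlus 9", "Xiaomi Mi 11",
                     "Google Pixel 5", "iPhone 12"],
    "Operating System": ["Android", "Android", "Android", "Android", "iOS"],
})

# Cross-tabulate the two columns to see which pairs occur.
table = pd.crosstab(sample["Device Model"], sample["Operating System"])

# A model is consistent if it appears under exactly one operating system.
os_per_model = sample.groupby("Device Model")["Operating System"].nunique()
inconsistent = os_per_model[os_per_model > 1]

print(table)
print("models with more than one OS:", list(inconsistent.index))
```

If the check comes back empty on the full `df`, one of the two columns is redundant for modeling purposes.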

In [39]:
# Assign the result back to the column instead of using inplace=True on df[col]:
# the chained inplace call raises a FutureWarning in pandas 2.x and will no
# longer modify df in pandas 3.0, where df[col] always behaves as a copy.
df['Operating System'] = df['Operating System'].replace({'Android': 0, 'iOS': 1})

df['Gender'] = df['Gender'].replace({'Male': 0, 'Female': 1})


In [40]:
df.head()

Unnamed: 0,User ID,Device Model,Operating System,App Usage Time (min/day),Screen On Time (hours/day),Battery Drain (mAh/day),Number of Apps Installed,Data Usage (MB/day),Age,Gender,User Behavior Class
0,1,Google Pixel 5,0,393,6.4,1872,67,1122,40,0,4
1,2,OnePlus 9,0,268,4.7,1331,42,944,47,1,3
2,3,Xiaomi Mi 11,0,154,4.0,761,32,322,42,0,2
3,4,Google Pixel 5,0,239,4.8,1676,56,871,20,0,3
4,5,iPhone 12,1,187,4.3,1367,58,988,31,1,3
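The encoding cell above overwrites the text labels, so the meaning of 0 and 1 has to be remembered separately. One possible alternative, sketched here and not used in the original notebook, is `pd.Categorical` with an explicit category order, which yields the same codes while keeping the labels recoverable. The sample values are the `Gender` entries from `df.head()`.

```python
import pandas as pd

# Gender values copied from the first df.head() output in this notebook.
gender = pd.Series(["Male", "Female", "Male", "Male", "Female"], name="Gender")

# Listing the categories explicitly fixes the codes: Male -> 0, Female -> 1.
# A label outside the list would receive the code -1.
as_category = pd.Categorical(gender, categories=["Male", "Female"])
codes = pd.Series(as_category.codes, name="Gender")

# The original labels can be rebuilt from the codes and the category list.
decoded = pd.Categorical.from_codes(
    as_category.codes, categories=as_category.categories
)

print(codes.tolist())
print(list(decoded))
```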


In [33]:
df['Operating System'].value_counts()

Operating System
0    554
1    146
Name: count, dtype: int64

In [34]:
df['Gender'].value_counts()

Gender
Male      364
Female    336
Name: count, dtype: int64

In [35]:
df.head()

Unnamed: 0,User ID,Device Model,Operating System,App Usage Time (min/day),Screen On Time (hours/day),Battery Drain (mAh/day),Number of Apps Installed,Data Usage (MB/day),Age,Gender,User Behavior Class
0,1,Google Pixel 5,0,393,6.4,1872,67,1122,40,Male,4
1,2,OnePlus 9,0,268,4.7,1331,42,944,47,Female,3
2,3,Xiaomi Mi 11,0,154,4.0,761,32,322,42,Male,2
3,4,Google Pixel 5,0,239,4.8,1676,56,871,20,Male,3
4,5,iPhone 12,1,187,4.3,1367,58,988,31,Female,3


In [36]:
df['Gender'].value_counts()

Gender
Male      364
Female    336
Name: count, dtype: int64