# Mobile phone usage prediction

> “If you have enough data about me and enough computer power and biological knowledge, you can hack my body, my brain, my life,” 
> “You can reach a point where you know me better than I know myself.”
>
> **Yuval Noah Harari**

#### In this notebook, we use the data set to find the apps used each day, the time spent on every app, the user's sleep pattern, and his productivity.

Hypotheses I have made here:
* The user does not spend more than 11 hours per day on the phone.
* The first and last thing the user does each day is check his phone.
* If the phone screen stays locked for more than 5 hours, the user is sleeping.

# Data

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.
# settings
import warnings
warnings.filterwarnings("ignore")

In [None]:
# Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
try:
    from prophet import Prophet  # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet  # older releases
from sklearn.metrics import mean_squared_error, mean_absolute_error

In [None]:
#loading the data 
chk = pd.read_csv('../input/mobile-usage-dataset-individual-person/CheckDevice.csv')
ph_usage = pd.read_csv('../input/mobile-usage-dataset-individual-person/phone_usage.csv')


`CheckDevice.csv` (loaded as `chk`) contains the phone unlock count and the screen-on time per day.

In [None]:
chk.head()

`phone_usage.csv` (loaded as `ph_usage`) records every app session: its duration and the date and time the app was opened.

In [None]:
ph_usage.head()

In [None]:
print("info of CHECK COUNT")
print('--'*20)
chk.info()

# Data preprocessing

Cleaning the data...


In [None]:
# Rename the columns
chk.rename(columns={'Check phone count': 'check_phn_count', 'Screen on time': 'screen_on_time'}, inplace=True)

In [None]:
# Drop the rows that contain NaN values (axis=0 drops rows, not columns)
chk.dropna(axis=0, inplace =True)

Converting the screen-on time to minutes

In [None]:
chk['duration'] = chk['screen_on_time'].str.split(':').apply(lambda x: int(x[0]) *60 + int(x[1])  )
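
The conversion above can be checked on single values. The following is a minimal sketch: the helper name `to_minutes` and the sample strings are hypothetical, and it assumes `screen_on_time` is formatted as `H:MM` (a trailing seconds field, if present, is ignored, as in the cell above).

```python
def to_minutes(screen_on_time: str) -> int:
    """Convert an 'H:MM' (or 'H:MM:SS') string to whole minutes."""
    parts = screen_on_time.split(':')
    # Only hours and minutes are used; a seconds field is ignored
    return int(parts[0]) * 60 + int(parts[1])

# Hypothetical sample values, not taken from the data set
sample = ['5:36', '0:45', '11:00']
minutes = [to_minutes(value) for value in sample]
print(minutes)
```

A value of `11:00` corresponds to 660 minutes, which is the cap used in the cleaning step.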

In [None]:
chk.describe()

Duration is the number of minutes the phone was used each day. As a cleaning step, we assume the person does not use the
phone for more than 11 hours (660 minutes) per day. Any value that exceeds 11 hours is replaced with the median duration.

In [None]:
chk.loc[chk['duration'] > 660, 'duration'] = chk['duration'].median()
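
The replacement step can be tried on a toy frame. This is a sketch with made-up numbers; the frame `toy` and the constant `cap` are hypothetical names. Note that the median is computed before the replacement, so the outliers themselves still contribute to it.

```python
import pandas as pd

# Hypothetical daily durations in minutes; 900 exceeds the 660-minute cap
toy = pd.DataFrame({'duration': [300.0, 320.0, 340.0, 900.0, 360.0]})

cap = 660  # 11 hours, the notebook's assumption
median_all = toy['duration'].median()  # median over all rows, outliers included

# Replace every value above the cap with that median
toy.loc[toy['duration'] > cap, 'duration'] = median_all

print(toy['duration'].tolist())
```

If the outliers should not influence the replacement value, the median could instead be taken over the rows at or below the cap; that would be a design choice, not what the notebook does.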

In [None]:
chk.describe()

### From the above information, the user unlocks the phone approximately 72 times per day and uses it for approximately 5.6 hours (336 minutes) per day

In [None]:
chk["Date"]= pd.to_datetime(chk["Date"]) 


In [None]:
# Bar plot of the phone check count for each date
plt.figure(figsize=(20,6))
sns.barplot(x="Date", y="check_phn_count", data=chk)
plt.title('Phone check count')
plt.xticks(rotation=90)
plt.show()

In [None]:
#Bar plot with respect to date and the phone usage duration everyday
plt.figure(figsize=(20,6))
sns.barplot(x="Date", y="duration", data=chk)
plt.title('Phone usage each day in minutes')
plt.xticks(rotation=90)
plt.show()

In [None]:
# Date is already datetime from the earlier conversion, so this call leaves it unchanged
chk['Date'] = pd.to_datetime(chk['Date'])

In [None]:
chk['day_of_week'] = chk['Date'].dt.dayofweek
chk

In [None]:
# Sum only the numeric columns per day of the week (0 = Monday)
chk.groupby('day_of_week')[['check_phn_count', 'duration']].sum().nlargest(20,'duration').reset_index()

In [None]:
plt.figure(figsize=(15,6))
data = chk.groupby('day_of_week')[['check_phn_count', 'duration']].sum().nlargest(20,'duration').reset_index()
sns.barplot(x='day_of_week',y='duration',data=data)
plt.title('DAY OF THE WEEK')
plt.xticks(rotation=90)
plt.show()

### The above plot compares the total usage duration across the days of the week (0 = Monday, 6 = Sunday). The user's usage pattern is similar on all days, with slightly higher usage on Wednesday and Saturday.

In [None]:
plt.figure(figsize=(15,6))
data = chk.groupby('day_of_week')[['check_phn_count', 'duration']].sum().nlargest(20,'check_phn_count').reset_index()
sns.barplot(x='day_of_week',y='check_phn_count',data=data)
plt.title('DAY OF THE WEEK')
plt.xticks(rotation=90)
plt.show()

### The above plot compares the total number of phone unlocks across the days of the week. The unlock pattern is similar on all days, with slightly more unlocks on Tuesday and Saturday.

In [None]:
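# Label each day as weekday (day_of_week 0-4, Monday-Friday) or weekend (5-6)
chk['categories'] = chk['day_of_week'].apply(lambda x: 'weekday' if x < 5 else 'weekend')

# Encode the label as a string flag: '0' = weekday, '1' = weekend
chk['weekday'] = chk['categories'].apply(lambda x: '0' if x == 'weekday' else '1')
chk.drop(columns='categories', inplace=True)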
chk

In [None]:
chk['month'] = pd.DatetimeIndex(chk['Date']).month
chk.head()

#### Exploring the next data set

In [None]:
ph_usage.head(10)

In [None]:
# Rename the column
ph_usage.rename(columns={'App name': 'App_name'}, inplace=True)

In [None]:
ph_usage.info()

In [None]:
ph_usage.describe()

In [None]:
ph_usage.shape

In [None]:
# Drop the rows that contain NaN values
ph_usage.dropna(axis=0, inplace =True)

In [None]:
ph_usage.columns

In [None]:
# Make an independent copy of the data frame (plain assignment would only create an alias)
ph_usg = ph_usage.copy()

In [None]:
# Creating a new column of datetime (timestamp)
ph_usg['DateTime']= pd.to_datetime(ph_usg['Date'] +" " + ph_usg['Time'],format='%d/%m/%Y %H:%M:%S')
ph_usg.head()

In [None]:
# Converting the duration into seconds.
ph_usg['usage_seconds'] = ph_usg['Duration'].str.split(':').apply(lambda x: int(x[0]) *3600 + int(x[1]) * 60 + int(x[2]))
ph_usg
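
As an alternative to splitting the string by hand, pandas can parse `HH:MM:SS` durations directly. The sketch below uses hypothetical sample values and compares both approaches; it assumes every duration is a well-formed `HH:MM:SS` string.

```python
import pandas as pd

# Hypothetical durations in the same HH:MM:SS layout as the Duration column
durations = pd.Series(['00:01:30', '01:00:05', '00:00:09'])

# Manual conversion, as in the cell above
manual = durations.str.split(':').apply(
    lambda x: int(x[0]) * 3600 + int(x[1]) * 60 + int(x[2])
)

# Vectorised conversion through pandas timedeltas
parsed = pd.to_timedelta(durations).dt.total_seconds().astype(int)

print(manual.tolist())
print(parsed.tolist())
```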

In [None]:
# to find the number of days
ph_usg['DateTime'].max() - ph_usg['DateTime'].min()

In [None]:
#Filtering the system apps and system usage
system_tracker = ['Screen on (unlocked)','Screen off (locked)','Screen on (locked)', 'Screen off','Permission controller','System UI','Package installer',
'Device shutdown','Call Management']
service_app = ph_usg[ph_usg['App_name'].isin(system_tracker)]
service_app

In [None]:
#Getting all the user apps.
all_apps = ph_usg[~ph_usg['App_name'].isin(system_tracker)]

all_apps

In [None]:
# Sort by usage seconds in descending order
test = service_app.sort_values(by='usage_seconds', ascending=False)


In [None]:
sns.scatterplot(x='App_name', y='usage_seconds', data=test[test['usage_seconds'] > 3600])

In [None]:
plt.figure(figsize=(15,6))
sns.countplot(x='App_name', data=test)
plt.title('APP name count')
plt.xticks(rotation=90)
plt.show()

In [None]:
sleep = ['Screen off (locked)','Screen on (locked)', 'Screen off']
sleep_duration = service_app[service_app['App_name'].isin(sleep)]
sleep_duration

In [None]:
sns.scatterplot(x='App_name', y='usage_seconds', data=sleep_duration[sleep_duration['usage_seconds'] > 18000])

In [None]:
new = sleep_duration[sleep_duration['usage_seconds'] > 18000]
new

In [None]:

plt.figure(figsize=(15,6))
sns.scatterplot(x='Date', y='usage_seconds', data=new)
plt.title('User sleep pattern')
plt.xticks(rotation=90)
plt.show()

In [None]:
# Mean length, in seconds, of the screen-off periods longer than 5 hours
new.usage_seconds.mean()

### The user sleeps approximately 6.7 hours every day.
#### Because the screen-off periods were filtered to those longer than 5 hours, all of them start between about 11PM and 1AM.

### If the user falls asleep between 11PM and 1AM and sleeps approximately 7 hours, he should wake up between about 6AM and 8AM.
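
The reasoning above can be sketched on a few made-up events. The frame `events`, its values, and the constant `SLEEP_THRESHOLD` are hypothetical, and the sketch assumes that `DateTime` marks the start of each screen-off period, so that start plus duration gives an estimated wake-up time.

```python
import pandas as pd

# Hypothetical screen-off events; durations are in seconds
events = pd.DataFrame({
    'DateTime': pd.to_datetime([
        '2019-10-01 21:00:00', '2019-10-01 23:30:00',
        '2019-10-03 00:15:00', '2019-10-03 12:00:00',
    ]),
    'usage_seconds': [1200, 25200, 23400, 600],
})

SLEEP_THRESHOLD = 5 * 3600  # 5 hours, the notebook's sleep assumption

# Keep only the long screen-off periods and treat them as sleep
sleep_events = events[events['usage_seconds'] > SLEEP_THRESHOLD]
mean_sleep_hours = sleep_events['usage_seconds'].mean() / 3600

# Estimated wake-up time = start of the period + its duration
wake_estimate = sleep_events['DateTime'] + pd.to_timedelta(
    sleep_events['usage_seconds'], unit='s'
)

print(mean_sleep_hours)
print(wake_estimate.tolist())
```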


In [None]:
# Getting the screen on unlocked alone
wake = ['Screen on (unlocked)']
wake_up = service_app[service_app['App_name'].isin(wake)]
wake_up.head()

In [None]:
wake_up.tail()

In [None]:
#Grouping the datetime on the basis of frequency day and getting the minimum time of the day
wakeup_time= wake_up.set_index('DateTime').groupby(pd.Grouper(freq='D')).min()
wakeup_time.tail(50)


### This shows that the user usually wakes up between 6AM and 8AM. On a few days the first unlock is between 2AM and 3AM. Overall, he sleeps well.
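
The first unlock of a day is not always a wake-up; an unlock at 2AM may be a late check before going to sleep. The sketch below, on made-up timestamps, takes the earliest unlock per day and then repeats the step while ignoring unlocks before a cutoff hour. The 4AM cutoff and all names here are assumptions for illustration, not part of the analysis above.

```python
import pandas as pd

# Hypothetical unlock timestamps
unlocks = pd.DataFrame({
    'DateTime': pd.to_datetime([
        '2019-10-01 06:40:00', '2019-10-01 09:10:00',
        '2019-10-02 02:30:00', '2019-10-02 07:15:00',
    ])
})

# Earliest unlock on each calendar day
first_unlock = unlocks.groupby(unlocks['DateTime'].dt.date)['DateTime'].min()

# Same step, ignoring unlocks before a hypothetical 4AM cutoff
CUTOFF_HOUR = 4
morning = unlocks[unlocks['DateTime'].dt.hour >= CUTOFF_HOUR]
first_morning_unlock = morning.groupby(morning['DateTime'].dt.date)['DateTime'].min()

print(first_unlock.dt.hour.tolist())
print(first_morning_unlock.dt.hour.tolist())
```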

In [None]:
# Keep only sessions longer than 10 seconds, assuming shorter ones are not real usage
all_apps = all_apps[all_apps.usage_seconds > 10].copy()
all_apps

In [None]:
#All apps access count
all_apps['App_name'].value_counts()

In [None]:
plt.figure(figsize=(15,6))
sns.countplot(x = 'App_name',
              data = all_apps,
              order = all_apps['App_name'].value_counts().index)

plt.title('APP name count')
plt.xticks(rotation=90)
plt.show()

In [None]:
plt.figure(figsize=(15,6))
s = all_apps['App_name'].value_counts().head(25)
ax= s.plot.bar(width=.8) 

# Annotate each bar with its access count
for i, count in enumerate(s.values):
    ax.text(i, count + 0.2, str(count), color='red')

### The above plot shows the user's access count for the top 25 apps.
### Instagram was accessed 5370 times and WhatsApp 5323 times, which shows that the user spends most of his time on social media.

In [None]:
all_apps['usage_minutes'] = all_apps['usage_seconds']//60


In [None]:
plt.figure(figsize=(15,6))
data = all_apps.groupby('App_name')[['usage_minutes']].sum().nlargest(20,'usage_minutes').reset_index()
sns.barplot(x='App_name',y='usage_minutes',data=data)
plt.title('Top 20 apps used')
plt.xticks(rotation=90)
plt.show()

In [None]:
all_apps.groupby('App_name')[['usage_minutes']].sum().nlargest(20,'usage_minutes').reset_index()

### The user spends most of his time on Instagram: 17079 minutes, which is 284 hours (12 days).
### Out of 193 days, he spends 12 days on Instagram alone.
#### This seems like a huge amount of time, but on average it is about 1.5 hours per day on social media.


In [None]:
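def dateFeatures(all_apps):
    # Add calendar features in place; dt.week and dt.weekofyear were removed in pandas 2.0
    for col in ['day', 'dayofweek', 'month']:
        all_apps[col] = getattr(all_apps['DateTime'].dt, col)
    iso_week = all_apps['DateTime'].dt.isocalendar().week.astype(int)
    all_apps['week'] = iso_week
    all_apps['weekofyear'] = iso_week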

In [None]:
dateFeatures(all_apps)
all_apps

In [None]:
plt.figure(figsize=(15,6))
all_apps.groupby(['weekofyear'])['usage_minutes'].sum().plot(kind='bar')
plt.xticks(rotation=90)
plt.show()

### The above plot shows the total usage time for every week. Weeks 20 and 48 show low usage, but the data for those weeks is incomplete.

In [None]:
plt.figure(figsize=(15,6))
all_apps.groupby(['month','App_name'])['usage_minutes'].sum().nlargest(6).plot(kind='bar')
plt.title('Top 6 month/app combinations by usage minutes')
plt.xticks(rotation=90)
plt.show()

### The above plot shows the six month and app combinations with the highest total usage time.
### The user spent the most time on Instagram in each of these months.

In [None]:
all_apps

In [None]:
train = all_apps.copy()

In [None]:
def f(x):
    # Map an hour of the day (0-23) to a named session
    if (x > 5) and (x <= 8):
        return 'Early_Morn'
    elif (x > 8) and (x <= 12):
        return 'Morn'
    elif (x > 12) and (x <= 16):
        return 'Noon'
    elif (x > 16) and (x <= 20):
        return 'Eve'
    elif (x > 20) and (x <= 24):
        return 'Night'
    else:
        # Hours 0-5; hour 5 is grouped with Late_Night so that every hour gets a label
        return 'Late_Night'
In [None]:
# make a session
train['hour'] = train['DateTime'].dt.hour
train['session'] = train['hour'].apply(f)
train.drop(['weekofyear','usage_seconds'],axis=1, inplace=True)

In [None]:
train

In [None]:
# Group by session, date and app name to count how often each app was opened in each session on each day
train.groupby(['session','Date','App_name']).size().reset_index()


In [None]:
sns.pairplot(train,
             hue='hour',
             x_vars=['hour','dayofweek','week','session'],
             y_vars='usage_minutes',
             height=5,
             plot_kws={'alpha':0.15, 'linewidth':0}
            )
plt.suptitle('Phone usage minutes, hour, Day of Week, week and session')
plt.show()

In [None]:
train.set_index('DateTime',inplace=True)

In [None]:
train

# Train/Test Split

In [None]:
# Unambiguous ISO date for 30 October 2019
split_date = pd.Timestamp('2019-10-30')
f_train = train.loc[train.index <= split_date].copy()
f_test  = train.loc[train.index > split_date].copy()
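
A time-based split of this kind can be illustrated on a toy frame. The timestamps and names below are hypothetical. Because the split point is midnight at the start of 30 October 2019, sessions later on that day fall into the test set.

```python
import pandas as pd

# Hypothetical session timestamps around the split date
idx = pd.to_datetime([
    '2019-10-28 09:00:00', '2019-10-29 18:30:00', '2019-10-30 00:00:00',
    '2019-10-30 07:45:00', '2019-11-01 12:00:00',
])
toy = pd.DataFrame({'usage_minutes': [5, 12, 3, 20, 9]}, index=idx)

split_point = pd.Timestamp('2019-10-30')  # midnight, start of 30 October 2019

# Everything up to and including the split point trains; the rest tests
toy_train = toy.loc[toy.index <= split_point]
toy_test = toy.loc[toy.index > split_point]

print(len(toy_train), len(toy_test))
```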

In [None]:
f_train

In [None]:
plt.style.use('fivethirtyeight') # For plots
# Color palette for plotting
color_pal = ["#F8766D", "#D39200", "#93AA00",
             "#00BA38", "#00C19F", "#00B9E3",
             "#619CFF", "#DB72FB"]
train.plot(style='.', figsize=(20,6), color=color_pal, title='Usage plot')
plt.show()

In [None]:
# Format data for prophet model using ds and y
f_train.reset_index().rename(columns={'DateTime':'ds','usage_minutes':'y'}).head()

In [None]:
# Setup and train model and fit
model = Prophet()
model.fit(f_train.reset_index().rename(columns={'DateTime':'ds','usage_minutes':'y'}))

In [None]:
# Predict on the test set with the fitted model
f_test_fcst = model.predict(df=f_test.reset_index().rename(columns={'DateTime':'ds'}))

In [None]:
f_test_fcst.head()

In [None]:
# Plot the forecast
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
fig = model.plot(f_test_fcst,ax=ax)
plt.show()

In [None]:
# Plot the components of the model
fig = model.plot_components(f_test_fcst)

Compare the forecast to the actual values

In [None]:
# Plot the forecast with the actuals
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
ax.scatter(f_test.index, f_test['usage_minutes'], color='r')
fig = model.plot(f_test_fcst, ax=ax)

In [None]:
mean_squared_error(y_true=f_test['usage_minutes'],
                   y_pred=f_test_fcst['yhat'])

In [None]:
mean_absolute_error(y_true=f_test['usage_minutes'],
                   y_pred=f_test_fcst['yhat'])
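
For reading these two numbers: the mean squared error is in squared minutes, while the mean absolute error and the root of the MSE are in minutes. A small worked sketch with made-up values (not results from this notebook):

```python
import math

actual = [12, 3, 45, 7]      # hypothetical usage minutes
predicted = [10, 5, 30, 7]   # hypothetical forecasts

errors = [a - p for a, p in zip(actual, predicted)]

mse = sum(e ** 2 for e in errors) / len(errors)   # squared minutes
mae = sum(abs(e) for e in errors) / len(errors)   # minutes
rmse = math.sqrt(mse)                             # minutes

print(mse, mae, rmse)
```

A single large miss (15 minutes here) dominates the MSE much more than the MAE, which is worth keeping in mind when a few long sessions are present.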

### The prediction is very poor, so the model needs to be improved.
## Next step: try XGBoost
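
One possible contributor to the weak forecast, offered as a hypothesis rather than a finding: the model is fit on individual app sessions, so `y` is a per-session value at irregular timestamps. Aggregating to one total per day before fitting might give a more regular series. The sketch below shows only the aggregation step on made-up sessions; the frame names are hypothetical and no model is fitted.

```python
import pandas as pd

# Hypothetical app sessions
sessions = pd.DataFrame({
    'DateTime': pd.to_datetime([
        '2019-10-01 08:00:00', '2019-10-01 21:30:00',
        '2019-10-02 09:15:00', '2019-10-04 10:00:00',
    ]),
    'usage_minutes': [12, 30, 8, 20],
})

# Total minutes per calendar day, renamed to the ds/y columns used for the model
daily = (
    sessions.set_index('DateTime')['usage_minutes']
    .resample('D').sum()
    .reset_index()
    .rename(columns={'DateTime': 'ds', 'usage_minutes': 'y'})
)

print(daily)
```

Days without any session appear with a total of 0, so a gap in tracking would look like a day with no usage; such days may need separate handling.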

# Conclusion

### * The user unlocks the phone approximately 72 times per day and uses it for approximately 5.6 hours (336 minutes) per day.
### * At 336 minutes per day, he spends about 45 of the 193 days on the phone alone. (Very shocking.)
### * The user's usage pattern is similar on all days of the week, with slightly higher usage on Wednesday and Saturday.
### * The unlock pattern is also similar on all days of the week, with slightly more unlocks on Tuesday and Saturday.
### * He sleeps approximately 6.7 hours every day, so he sleeps well.
### * He goes to sleep between about 11PM and 1AM.
### * He wakes up between about 6AM and 8AM. (On a few days the first unlock is between 2AM and 3AM.)
### * Instagram was accessed 5370 times and WhatsApp 5323 times, which shows that the user spends most of his time on social media. On average he accessed each of Instagram and WhatsApp ~27 times per day.
### * The user accessed Instagram and WhatsApp roughly every 45 minutes.
### * He spends most of his time on Instagram: 17079 minutes, which is 284 hours (12 days).
### * Out of 193 days, he spends 12 days on Instagram alone. (This seems like a huge amount of time, but on average it is about 1.5 hours per day on social media.)
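
The headline figures in this list can be re-derived from the totals stated in the notebook. The sketch below uses only those stated numbers; the variable names are hypothetical.

```python
DAYS = 193                   # length of the observation period
daily_minutes = 336          # average screen-on time per day
instagram_minutes = 17079    # total time on Instagram
instagram_opens = 5370       # number of times Instagram was opened

# Total phone time over the whole period, expressed in days
total_days_on_phone = daily_minutes * DAYS / 60 / 24

# Instagram totals and daily averages
instagram_hours = instagram_minutes / 60
instagram_days = instagram_hours / 24
instagram_minutes_per_day = instagram_minutes / DAYS
opens_per_day = instagram_opens / DAYS

print(round(total_days_on_phone, 1))
print(round(instagram_hours, 1), round(instagram_days, 1))
print(round(instagram_minutes_per_day, 1), round(opens_per_day, 1))
```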



## Things to do for user to improve his productivity

### * The user should set a time limit for social media apps. He currently spends 1.5 hours every day on social media; limiting this to 1 hour per day would save 15 hours per month.

### * The user opens social media every 45 minutes. Restricting himself to opening it only once every 2 hours would save time.

### * The user also uses Amazon Kindle, which shows he is a reader, so he would do better to read hard-copy books to keep himself away from the phone. (Manual interpretation: the user switches from Kindle to YouTube, Instagram and WhatsApp, which shows he gets distracted; that is the only reason for recommending hard-copy books.)

### * Turning off mobile data/Wi-Fi during office or productive hours protects the user from distraction.

### For comparison, the average person spends over 4 hours a day on their device.

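For more details, see
[https://www.inc.com/melanie-curtin/are-you-on-your-phone-too-much-average-person-spends-this-many-hours-on-it-every-day.html](https://www.inc.com/melanie-curtin/are-you-on-your-phone-too-much-average-person-spends-this-many-hours-on-it-every-day.html)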



Thanks for reading! Please give me some feedback. :)