# EDA on Facebook Analytics

**Objective :**

Identify patterns in how users of this popular social networking app use it, broken down by attributes such as age group and gender.

**Data Source :**

This exploratory data analysis draws insights from a Facebook user dataset, with the aim of identifying the users the business should focus on. These insights should help Facebook make informed decisions about which users are most valuable and what to recommend to them.

**Work Flow :**

1. Data Extraction
2. Data Profiling
3. Exploratory Data Analysis
4. Analysis of Relation between Variables
5. Correlation of Features
6. Conclusions

In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# 1. Data Extraction

In [2]:
fb_df = pd.read_csv("facebook.csv")
fb_df.head()

FileNotFoundError: [Errno 2] No such file or directory: 'facebook.csv'
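The load above fails when `facebook.csv` is not in the notebook's working directory. A minimal sketch of a loader that fails early with a more helpful message follows; the helper name `load_facebook_data` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_facebook_data(csv_path="facebook.csv"):
    """Read the CSV, raising a descriptive error if the file is missing."""
    path = Path(csv_path)
    if not path.is_file():
        # Report the working directory so the path problem is easy to spot
        raise FileNotFoundError(
            f"{path} not found; expected it relative to {Path.cwd()}"
        )
    return pd.read_csv(path)
```

With the file in place, `fb_df = load_facebook_data()` would stand in for the `pd.read_csv` call.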

# 2. Data Profiling

In [None]:
fb_df.info()

1. There are 99003 samples (rows) and 15 columns in the dataframe.

2. 13 columns have an integer datatype.

3. There is one float column and one categorical (object) column.

4. There are missing values in the data.
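The counts above are read off `fb_df.info()`. As a rough sketch, the same summary can be computed directly; the toy frame below is made up for illustration and only stands in for `fb_df`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for fb_df (values are invented for illustration)
toy = pd.DataFrame({
    "age": [13, 25, 40],
    "gender": ["male", None, "female"],
    "tenure": [100.0, np.nan, 412.0],
})

dtype_counts = toy.dtypes.astype(str).value_counts()  # number of columns per dtype
missing_per_column = toy.isna().sum()                 # missing cells per column
missing_share = toy.isna().mean().mul(100).round(1)   # same, as a percentage

print(dtype_counts, missing_per_column, missing_share, sep="\n\n")
```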

In [None]:
fb_df.describe(include='all')

**Pre Profiling**

In [None]:
# Importing pandas_profiling adds the profile_report() method to DataFrames
import pandas_profiling

profile = fb_df.profile_report(title='Pre Profile Facebook Dataset')
profile.to_file(output_file='Pre Profile Facebook Data Analysis before Processing.html')

**Dataset info:**

Number of variables: 15

Number of observations: 99003

Missing cells: <0.1%

Variable types:

Numeric = 14

Categorical = 1

The dataset has 175 (0.2%) missing values in the gender column and 2 in the tenure column.

There are zeros in the following columns:

1. friend_count
2. friendships_initiated
3. likes
4. likes_received
5. mobile_likes
6. mobile_likes_received
7. www_likes
8. www_likes_received

**Preprocessing**

Dealing with missing values:

Drop the 'userid' column, as it adds no value to the analysis.

Replace missing values of gender with the mode.

Replace missing values of tenure with the median.
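The two imputation rules can be sketched on a small made-up frame: the mode for the categorical column and the median for the numeric one. Column names mirror the dataset, but the values are invented.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "gender": ["male", "female", "male", np.nan],
    "tenure": [10.0, 20.0, np.nan, 400.0],
})

gender_mode = toy["gender"].mode()[0]   # most frequent category, NaN ignored
tenure_median = toy["tenure"].median()  # less sensitive to extreme values than the mean

toy["gender"] = toy["gender"].fillna(gender_mode)
toy["tenure"] = toy["tenure"].fillna(tenure_median)
```

Computing the fill values from the data, rather than typing them in by hand, keeps the cell correct if the input file changes.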

In [None]:
fb_df.drop('userid',axis=1,inplace=True)

In [None]:
fb_df['gender'].mode()

In [None]:
# Fill missing gender values with the mode found above ('male')
fb_df['gender'] = fb_df['gender'].fillna(fb_df['gender'].mode()[0])

In [None]:
fb_df['gender'].unique()

In [None]:
fb_df['tenure'].median()

In [None]:
# Fill missing tenure values with the median found above (412.0)
fb_df['tenure'] = fb_df['tenure'].fillna(fb_df['tenure'].median())


In [None]:
fb_df.isnull().sum().sort_values(ascending = False)

Creating a date_of_birth column from the variables dob_year, dob_month, and dob_day:

In [None]:
fb_df.insert(1,"date_of_birth",pd.to_datetime(fb_df.dob_year*10000+fb_df.dob_month*100+fb_df.dob_day,format='%Y%m%d'))
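An alternative sketch for assembling the date: `pd.to_datetime` also accepts a frame with `year`, `month` and `day` columns, and `errors="coerce"` turns impossible dates into `NaT` instead of raising. The toy values below are invented, and whether the real data contains invalid dates is not known here.

```python
import pandas as pd

toy = pd.DataFrame({
    "dob_year": [1999, 1990, 2000],
    "dob_month": [11, 2, 2],
    "dob_day": [19, 30, 29],  # 30 February is deliberately invalid
})

# to_datetime expects the component columns to be named year/month/day
parts = toy.rename(columns={"dob_year": "year", "dob_month": "month", "dob_day": "day"})
toy["date_of_birth"] = pd.to_datetime(parts[["year", "month", "day"]], errors="coerce")
```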

In [None]:
fb_df.head()

In [None]:
fb_df.info()

In [None]:
fb_df.describe(include = 'all')

**Post Profiling**

In [None]:
profile = fb_df.profile_report(title = 'Post Profile Facebook Dataset')
profile.to_file(output_file='Post Profile Facebook Data Analysis after Processing.html')

Observations:

In the dataset info, total missing = 0.0%.

Number of variables = 15.

Note the newly created variable date_of_birth.

The dataset has 8 (<0.1%) duplicate rows, which need to be removed.

**Post Processing**

In [None]:
fb_df.drop_duplicates(inplace=True)
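Before dropping duplicates it can help to count them, so the result can be compared with the profiling report. A small sketch on invented data:

```python
import pandas as pd

toy = pd.DataFrame({"age": [13, 13, 25], "friend_count": [0, 0, 7]})

# duplicated() marks rows that repeat an earlier row in full
n_duplicates = int(toy.duplicated().sum())
deduplicated = toy.drop_duplicates().reset_index(drop=True)
```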

In [None]:
fb_df.describe(include='all')

# 3. Exploratory Data Analysis

**Distribution of age of users**

In [None]:
# pd.cut bins are closed on the right, so these edges cover 10-15, 16-20, 21-30, ...
labels = ['10-15', '16-20','21-30','31-40','41-50', '51-60', '61-70', '71-80','81-90','91-100','101-110','111-120']
fb_df['age_group'] = pd.cut(fb_df['age'],
                         [10,15,20,30,40,50,60,70,80,90,100,110,120],
                         labels= labels, include_lowest=True)
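Because `pd.cut` closes each bin on the right by default, an age that equals a bin edge falls into the lower group. The sketch below uses a shortened, made-up set of edges and labels to show where boundary ages land.

```python
import pandas as pd

ages = pd.Series([13, 15, 16, 20, 21])
groups = pd.cut(
    ages,
    bins=[10, 15, 20, 30],               # intervals [10, 15], (15, 20], (20, 30]
    labels=["10-15", "16-20", "21-30"],
    include_lowest=True,                 # makes the first interval include 10
)
print(pd.DataFrame({"age": ages, "age_group": groups}))
```

Age 15 is assigned to the first group and age 20 to the second, so labels should describe the right-closed intervals.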

In [None]:
sns.set_style('whitegrid')
plt.figure(figsize=(10,5))
sns.countplot(x='age_group',data=fb_df)

In [None]:
sns.set_style('whitegrid')
plt.figure(figsize=(20,12))
plt.xticks(rotation=45)
sns.countplot(x='age',data=fb_df)

In [None]:
fb_df.groupby(['age_group'])['age_group'].count()

Most Facebook users are aged 15 to 30 years.

Maximum age: 113 years; minimum age: 13 years.

**Gender wise Analysis**

In [None]:
fb_df['gender'].value_counts().plot(kind='pie',explode=[0.02,0.02],fontsize=14, autopct='%3.1f%%', 
                                               figsize=(10,10), shadow=True, startangle=135, legend=True, cmap='autumn')
plt.title('Pie chart showing the Gender wise Facebook Users Status')

Most Facebook users in the dataset are male, at about 59.3%.

**Analysis based on Tenure**

In [None]:
sns.set(color_codes=True)
plt.figure(figsize=(20,12))
sns.set_palette(sns.color_palette("Set1", n_colors=5, desat=.5))
# histplot with a KDE overlay replaces the deprecated distplot
sns.histplot(fb_df["tenure"], kde=True, stat="density")

More users joined in the last 2 years before data collection than in earlier periods.

**Friend Count Distribution**


In [None]:
fb_df["friend_count"].value_counts()

In [None]:
# Share of users whose friend_count is exactly zero
Percentage_of_friend_count_nill = (fb_df['friend_count'] == 0).mean() * 100
print('Percentage of Zero Friend Count = ', round(Percentage_of_friend_count_nill, 2))

In [None]:
sns.set(color_codes=True)
plt.figure(figsize=(18,9))
sns.set_palette(sns.color_palette("muted"))
sns.set(font_scale=1.5)
# histplot with a KDE overlay replaces the deprecated distplot
sns.histplot(fb_df["friend_count"], kde=True, stat="density")

Most users have fewer than 500 Facebook friends.

About 2% of users do not have any friends.
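Shares like these can be computed by averaging a boolean mask, which does not depend on zero being the most frequent value. A sketch on invented counts:

```python
import pandas as pd

friend_count = pd.Series([0, 0, 3, 12, 250, 0, 41, 7])  # made-up values

# The mean of a boolean mask is the fraction of True entries
zero_share = (friend_count == 0).mean() * 100
under_500_share = (friend_count < 500).mean() * 100
```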


**Distribution of Friendships Initiated**


In [None]:
plt.figure(figsize=(18,9))
sns.set_palette(sns.color_palette("deep"))
sns.set(font_scale=1.5)
# histplot with a KDE overlay replaces the deprecated distplot
sns.histplot(fb_df["friendships_initiated"], kde=True, stat="density")

In [None]:
fb_df['friendships_initiated'].value_counts()


In [None]:
# Share of users who have initiated zero friendships
Percentage_of_friendships_Initiated_nill = (fb_df['friendships_initiated'] == 0).mean() * 100
print('Percentage of Zero Friendships Initiated = ', round(Percentage_of_friendships_Initiated_nill, 2))

About 3% of users have not initiated a single friendship.

# 4. Analysis of Relation between Variables

**Comparing Gender wise users in Age Group**


In [None]:
fig,ax =plt.subplots(figsize=(12,9))
sns.set(font_scale=1)
sns.countplot(data = fb_df,x = 'age_group', hue='gender')
plt.title('Age vs Gender')

Male users outnumber female users in every age group except 51 to 80, where female users are more numerous.


**Gender vs Age**


In [None]:
as_fig = sns.FacetGrid(fb_df,hue='gender',aspect=5)
as_fig.map(sns.kdeplot,'age',fill=True)  # fill replaces the deprecated shade argument
oldest = fb_df['age'].max()
as_fig.set(xlim=(0,oldest))
as_fig.add_legend()
plt.title('Age distribution using FacetGrid')

For both males and females, most Facebook users are aged 15 to 30 years.


**Age vs Tenure**

In [None]:
fig,ax =plt.subplots(figsize=(15,10))
sns.set(font_scale=1)
sns.boxplot(data=fb_df,x='age_group',y='tenure')

In [None]:
sns.set(font_scale=1.5)
sns.set_palette(sns.color_palette("Set2", n_colors=5, desat=.5))
sns.catplot(x="age_group", y='tenure',hue ='gender',data=fb_df, kind="bar",height=8, aspect=2)

In [None]:
Max_tenure_in_year = (fb_df['tenure'].max())/365
Max_tenure_in_year



Facebook launched on 4 February 2004. The maximum tenure in the dataset is around 9 years, which suggests the dataset was created in 2013.

Across all age groups, female tenure is slightly higher than male tenure.
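The dating argument is simple arithmetic: if the longest-tenured user joined at launch, the data cannot have been collected before the launch date plus the maximum tenure. A sketch assuming tenure is measured in days (the notebook divides it by 365); `max_tenure_days` is a placeholder for `fb_df['tenure'].max()`, set to the "around 9 years" quoted above.

```python
import pandas as pd

launch_date = pd.Timestamp("2004-02-04")  # launch date quoted in the text
max_tenure_days = 9 * 365                 # placeholder, not the actual dataset maximum

# Earliest date on which a user could have accumulated this tenure
earliest_snapshot = launch_date + pd.Timedelta(days=max_tenure_days)
max_tenure_years = max_tenure_days / 365.25

print(earliest_snapshot.date(), round(max_tenure_years, 2))
```

This gives a lower bound only; if the longest-tenured user joined after launch, the collection date would be later.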


**Comparing Age vs Friend Count**


In [None]:
sns.set(font_scale=1.5)
sns.set_palette(sns.color_palette("Set2", n_colors=5, desat=.5))
sns.catplot(x="age_group", y='friend_count',hue ='gender',data=fb_df, kind="bar",height=8, aspect=2)

Users younger than 30 and older than 80 have more friends than middle-aged users.

Among users younger than 30, females have more friends than males.
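The bar heights in a `kind="bar"` catplot are group means by default, so the comparison can also be read from a table. A sketch on invented data, with column names mirroring the dataset:

```python
import pandas as pd

toy = pd.DataFrame({
    "age_group": ["16-20", "16-20", "31-40", "31-40"],
    "gender": ["female", "male", "female", "male"],
    "friend_count": [300, 200, 120, 100],
})

# Mean friend count per age group, one column per gender
mean_friends = (
    toy.groupby(["age_group", "gender"])["friend_count"]
       .mean()
       .unstack("gender")
)
print(mean_friends)
```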

**How users use Facebook through the Website and the Mobile App**



In [None]:
# Share of users with at least one like given or received, per channel
Percentage_of_likes = (fb_df['likes'] > 0).mean() * 100
Percentage_of_www_likes = (fb_df['www_likes'] > 0).mean() * 100
Percentage_of_mobile_likes = (fb_df['mobile_likes'] > 0).mean() * 100
Percentage_of_likes_received = (fb_df['likes_received'] > 0).mean() * 100
Percentage_of_www_likes_received = (fb_df['www_likes_received'] > 0).mean() * 100
Percentage_of_mobile_likes_received = (fb_df['mobile_likes_received'] > 0).mean() * 100

print('Percentage of Likes = ',Percentage_of_likes)
print('Percentage of Mobile Likes = ',Percentage_of_mobile_likes)
print('Percentage of Website Likes = ',Percentage_of_www_likes)

print('Percentage of Likes Received = ',Percentage_of_likes_received)
print('Percentage of Mobile Likes Received = ',Percentage_of_mobile_likes_received)
print('Percentage of Website Likes Received = ',Percentage_of_www_likes_received)


In [None]:
Not_mobile_likes = 100- Percentage_of_mobile_likes
Not_www_likes = 100 - Percentage_of_www_likes
Not_mobile_likes_received = 100- Percentage_of_mobile_likes_received
Not_www_likes_received = 100 - Percentage_of_www_likes_received

In [None]:
labels = 'Yes','No'
size1 = [Percentage_of_mobile_likes,Not_mobile_likes]
size2 = [Percentage_of_www_likes,Not_www_likes]
size3 = [Percentage_of_mobile_likes_received,Not_mobile_likes_received]
size4 = [Percentage_of_www_likes_received,Not_www_likes_received]

colors = ['gold', 'lightcoral']
plt.figure(figsize=(18,10), dpi=100)  # dpi=1600 would render an extremely large image

ax1 = plt.subplot2grid((2,2),(0,0))
plt.pie(size1,labels=labels, colors=colors,autopct='%1.1f%%',startangle=90)
plt.title('Users liked the Posts through Mobile App')

ax1 = plt.subplot2grid((2,2),(0,1))
plt.pie(size2,labels=labels, colors=colors,autopct='%1.1f%%',startangle=90)
plt.title('Users liked the Posts through Website')

ax1 = plt.subplot2grid((2,2),(1,0))
plt.pie(size3,labels=labels, colors=colors,autopct='%1.1f%%',startangle=90)
plt.title('Users Received the likes through Mobile App')

ax1 = plt.subplot2grid((2,2),(1,1))
plt.pie(size4,labels=labels, colors=colors,autopct='%1.1f%%',startangle=90)
plt.title('Users Received the likes through Website')

plt.axis('equal')
plt.show()


65% of users liked posts through the mobile app.

70% of users liked posts through the website.

38% of users received likes through the mobile app.

63% of users received likes through the website.



**How Friend Count and Friendships Initiated are Correlated**


In [None]:
plt.figure(figsize=(20,12))  # a bare figsize assignment has no effect on the plot
sns.scatterplot(x="friend_count", y="friendships_initiated", hue="gender", data = fb_df)

**How Friend Count and Tenure are Correlated**


In [None]:
plt.figure(figsize=(20,12))  # a bare figsize assignment has no effect on the plot
sns.scatterplot(x="friend_count", y="tenure", hue="gender", data = fb_df)

# 5. Correlation of Features

In [None]:
correlations = fb_df.corr(numeric_only=True)  # skip gender, date_of_birth and age_group
f, ax = plt.subplots(figsize=(14, 8))
sns.heatmap(data=correlations, annot = True, cmap='magma')
sns.despine(left=True, bottom=True)

Friend count and friendships initiated are strongly correlated: as friendships initiated increases, friend count tends to increase.
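The heatmap shows Pearson correlations between the numeric columns. A minimal sketch on invented data, where one column is exactly twice the other, so the coefficient is 1; `numeric_only=True` is used to leave out the text column.

```python
import pandas as pd

toy = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "friend_count": [10, 40, 90, 200],
    "friendships_initiated": [5, 20, 45, 100],  # exactly half of friend_count
})

corr = toy.corr(numeric_only=True)  # non-numeric columns are skipped
r = corr.loc["friend_count", "friendships_initiated"]
print(corr)
```

A high coefficient indicates a linear association only; it does not show that initiating friendships causes a higher friend count.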

In [None]:
sns.pairplot(fb_df[['age','gender','tenure','friend_count','friendships_initiated','likes']],
             vars= ['age','tenure','friend_count','friendships_initiated','likes'],hue="gender", palette="husl")
plt.title('Pair Plot')


# 6. Conclusions


Most Facebook users are aged 15 to 30 years.

Maximum age: 113 years; minimum age: 13 years.

Most Facebook users in the dataset are male, at about 59.3%.

Even though male users are more numerous, female users are more active on Facebook.

More users joined in the last 2 years before data collection than in earlier periods.

Most users have fewer than 500 Facebook friends.

Male users outnumber female users in every age group except 51 to 80, where female users are more numerous.

Users younger than 30 and older than 80 have more friends than middle-aged users.

Among users younger than 30, females have more friends than males.

Facebook launched on 4 February 2004. The maximum tenure in the dataset is around 9 years, which suggests the dataset was created in 2013.

Across all age groups, male and female tenure is almost the same, with female tenure slightly higher.

65% of users liked posts through the mobile app.

70% of users liked posts through the website.

38% of users received likes through the mobile app.

63% of users received likes through the website.

Up to 60 years of age, female users initiated more friendships.

Above 60 years of age, male users initiated more friendships.