# Social Buzz

Importing the libraries needed for data cleaning and exploration.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

Loading all three tables from the Forage website using the read_csv function provided by the pandas library.

In [2]:
content = pd.read_csv('https://cdn.theforage.com/vinternships/companyassets/T6kdcdKSTfg2aotxT/MsAqi7SNLKw3C6LAr/1664298350004/Content.csv')
reactions = pd.read_csv('https://cdn.theforage.com/vinternships/companyassets/T6kdcdKSTfg2aotxT/MsAqi7SNLKw3C6LAr/1664298375459/Reactions.csv')
reaction_types = pd.read_csv('https://cdn.theforage.com/vinternships/companyassets/T6kdcdKSTfg2aotxT/MsAqi7SNLKw3C6LAr/1664298399720/ReactionTypes.csv')

Deleting the unnamed index column ('Unnamed: 0') from all three tables, along with the 'User ID' column from the reactions table.

In [3]:
content.drop('Unnamed: 0',axis=1,inplace=True)
reactions.drop(['Unnamed: 0','User ID'],axis=1,inplace=True)
reaction_types.drop('Unnamed: 0',axis=1,inplace=True)
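
An 'Unnamed: 0' column usually appears when a CSV was saved together with its row index. As a minimal sketch of an alternative, assuming the first column of each file is such a saved index, `index_col=0` can absorb it at read time. The CSV text below is a hypothetical stand-in, not the real Forage data.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for one of the downloaded files.
# The empty first header cell is what pandas names 'Unnamed: 0'.
csv_text = ",Content ID,Type\n0,abc,photo\n1,def,video\n"

# Default read: the saved index comes back as an ordinary column.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the row index instead.
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(as_index.columns))
```

Either route gives the same data columns; dropping the column afterwards, as done here, keeps the loading step simpler to read.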

Deleting the unnecessary 'User ID' and 'URL' columns from the content table. Also dropping rows with missing values, replacing stray double quotes in 'Category' with a space, and renaming 'Type' to 'Content_type'.

In [4]:
cont=content.drop(['User ID','URL'],axis=1)
cont.dropna(axis=0,inplace=True) 
cont['Category'] = cont['Category'].str.replace('"',' ')
cont.rename(columns={'Type':'Content_type'},inplace=True)
cont

Unnamed: 0,Content ID,Content_type,Category
0,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying
1,9f737e0a-3cdd-4d29-9d24-753f4e3be810,photo,healthy eating
2,230c4e4d-70c3-461d-b42c-ec09396efb3f,photo,healthy eating
3,356fff80-da4d-4785-9f43-bc1261031dc6,photo,technology
4,01ab84dd-6364-4236-abbb-3f237db77180,video,food
...,...,...,...
995,b4cef9ef-627b-41d7-a051-5961b0204ebb,video,public speaking
996,7a79f4e4-3b7d-44dc-bdef-bc990740252c,GIF,technology
997,435007a5-6261-4d8b-b0a4-55fdc189754b,audio,veganism
998,4e4c9690-c013-4ee7-9e66-943d8cbd27b7,GIF,culture
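
The output above shows 'Studying' capitalized next to lowercase labels, which suggests the same category may appear under several spellings. A possible normalization, sketched on invented sample labels, is to strip quotes and whitespace and fold case. This is an assumption about how the labels should be unified, and applying it would change the per-category totals reported in this notebook.

```python
import pandas as pd

# Hypothetical sample of raw labels, invented for illustration.
raw = pd.Series(['Studying', '"studying"', 'healthy eating', ' Healthy Eating'])

normalized = (
    raw.str.replace('"', '', regex=False)  # drop stray double quotes
       .str.strip()                        # trim surrounding whitespace
       .str.lower()                        # fold case so variants collapse
)

# Number of distinct labels before and after normalization.
print(raw.nunique(), normalized.nunique())
```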


In [5]:
reactions.dropna(inplace=True)
reactions.rename(columns={'Type':'Reaction_type'},inplace=True)
reactions

Unnamed: 0,Content ID,Reaction_type,Datetime
1,97522e57-d9ab-4bd6-97bf-c24d952602d2,disgust,2020-11-07 09:43:50
2,97522e57-d9ab-4bd6-97bf-c24d952602d2,dislike,2021-06-17 12:22:51
3,97522e57-d9ab-4bd6-97bf-c24d952602d2,scared,2021-04-18 05:13:58
4,97522e57-d9ab-4bd6-97bf-c24d952602d2,disgust,2021-01-06 19:13:01
5,97522e57-d9ab-4bd6-97bf-c24d952602d2,interested,2020-08-23 12:25:58
...,...,...,...
25548,75d6b589-7fae-4a6d-b0d0-752845150e56,dislike,2020-06-27 09:46:48
25549,75d6b589-7fae-4a6d-b0d0-752845150e56,intrigued,2021-02-16 17:17:02
25550,75d6b589-7fae-4a6d-b0d0-752845150e56,interested,2020-09-12 03:54:58
25551,75d6b589-7fae-4a6d-b0d0-752845150e56,worried,2020-11-04 20:08:31


In [6]:
reaction_types.rename(columns={'Type':'Reaction_type'},inplace=True)
reaction_types

Unnamed: 0,Reaction_type,Sentiment,Score
0,heart,positive,60
1,want,positive,70
2,disgust,negative,0
3,hate,negative,5
4,interested,positive,30
5,indifferent,neutral,20
6,love,positive,65
7,super love,positive,75
8,cherish,positive,70
9,adore,positive,72


Merging the content table with the reactions table (a left join on 'Content ID') and naming the resulting dataset df.

In [7]:
df = cont.merge(reactions, how='left', on='Content ID')

In [8]:
df

Unnamed: 0,Content ID,Content_type,Category,Reaction_type,Datetime
0,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,disgust,2020-11-07 09:43:50
1,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,dislike,2021-06-17 12:22:51
2,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,scared,2021-04-18 05:13:58
3,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,disgust,2021-01-06 19:13:01
4,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,interested,2020-08-23 12:25:58
...,...,...,...,...,...
24606,75d6b589-7fae-4a6d-b0d0-752845150e56,audio,technology,dislike,2020-06-27 09:46:48
24607,75d6b589-7fae-4a6d-b0d0-752845150e56,audio,technology,intrigued,2021-02-16 17:17:02
24608,75d6b589-7fae-4a6d-b0d0-752845150e56,audio,technology,interested,2020-09-12 03:54:58
24609,75d6b589-7fae-4a6d-b0d0-752845150e56,audio,technology,worried,2020-11-04 20:08:31
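
A left merge keeps every row of the left table and discards rows of the right table that have no match. The sketch below illustrates this on hypothetical miniature tables (all IDs and values are invented); `indicator=True` adds a `_merge` column that shows where each row came from.

```python
import pandas as pd

# Hypothetical miniature tables; IDs and values are invented.
cont_demo = pd.DataFrame({'Content ID': ['a', 'b', 'c'],
                          'Category': ['food', 'travel', 'science']})
reactions_demo = pd.DataFrame({'Content ID': ['a', 'a', 'b', 'z'],
                               'Reaction_type': ['love', 'hate', 'want', 'heart']})

merged = cont_demo.merge(reactions_demo, how='left', on='Content ID',
                         indicator=True)

# Content 'c' has no reactions, so it survives with NaN in Reaction_type.
# Reaction 'z' has no matching content, so it does not appear at all.
print(merged)
print(merged['_merge'].value_counts())
```

Rows marked `left_only` are the ones that end up with null values after the merge.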


Merging the reaction_types table with the resultant dataset df.

In [9]:
df = df.merge(reaction_types, how='left', on='Reaction_type')

Dropping the rows with null values.

In [20]:
df = df.dropna()
df

Unnamed: 0,Content ID,Content_type,Category,Reaction_type,Datetime,Sentiment,Score
0,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,disgust,2020-11-07 09:43:50,negative,0.0
1,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,dislike,2021-06-17 12:22:51,negative,10.0
2,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,scared,2021-04-18 05:13:58,negative,15.0
3,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,disgust,2021-01-06 19:13:01,negative,0.0
4,97522e57-d9ab-4bd6-97bf-c24d952602d2,photo,Studying,interested,2020-08-23 12:25:58,positive,30.0
...,...,...,...,...,...,...,...
24606,75d6b589-7fae-4a6d-b0d0-752845150e56,audio,technology,dislike,2020-06-27 09:46:48,negative,10.0
24607,75d6b589-7fae-4a6d-b0d0-752845150e56,audio,technology,intrigued,2021-02-16 17:17:02,positive,45.0
24608,75d6b589-7fae-4a6d-b0d0-752845150e56,audio,technology,interested,2020-09-12 03:54:58,positive,30.0
24609,75d6b589-7fae-4a6d-b0d0-752845150e56,audio,technology,worried,2020-11-04 20:08:31,negative,12.0
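
One detail worth keeping in mind: `dropna` returns a new DataFrame and leaves the original untouched unless the result is assigned back (or `inplace=True` is passed). A minimal sketch on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one incomplete row.
demo = pd.DataFrame({'Reaction_type': ['love', None, 'hate'],
                     'Score': [65.0, np.nan, 5.0]})

demo.dropna()            # result is discarded; demo is unchanged
rows_before = len(demo)

demo = demo.dropna()     # reassign to keep the cleaned frame
rows_after = len(demo)

print(rows_before, rows_after)
```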


Counting the number of reactions in each category, sorted from most to least.

In [25]:
df.groupby('Category').agg({"Reaction_type" : "count"}).reset_index().sort_values(by='Reaction_type', ascending=False)

Unnamed: 0,Category,Reaction_type
25,animals,1765
32,healthy eating,1711
37,technology,1667
34,science,1662
26,cooking,1640
39,travel,1618
31,food,1606
27,culture,1586
29,education,1397
35,soccer,1334
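
Note that `count` tallies non-null reactions, not distinct reaction types. If both figures are of interest, they can be computed side by side; the sketch below uses invented rows, and the output column names `reactions` and `distinct_types` are hypothetical.

```python
import pandas as pd

# Hypothetical rows, invented for illustration.
demo = pd.DataFrame({
    'Category': ['animals', 'animals', 'animals', 'food', 'food'],
    'Reaction_type': ['love', 'love', 'hate', 'want', None],
})

summary = (
    demo.groupby('Category')['Reaction_type']
        .agg(reactions='count',          # non-null reactions per category
             distinct_types='nunique')   # distinct reaction types per category
        .reset_index()
        .sort_values(by='reactions', ascending=False)
)
print(summary)
```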


Summing the scores for each category and sorting the categories in descending order of total score.

In [26]:
df.groupby('Category').agg({"Score" : "sum"}).reset_index().sort_values(by='Score', ascending=False)

Unnamed: 0,Category,Score
25,animals,69548.0
32,healthy eating,69067.0
37,technology,67472.0
34,science,66043.0
26,cooking,63982.0
39,travel,63788.0
31,food,63122.0
27,culture,62915.0
29,education,56041.0
35,soccer,52684.0
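
When only the top few categories are needed, `nlargest` on the summed scores is a compact alternative to sorting the full table. A minimal sketch on invented scores:

```python
import pandas as pd

# Hypothetical scores, invented for illustration.
demo = pd.DataFrame({
    'Category': ['animals', 'food', 'animals', 'travel', 'food', 'travel'],
    'Score': [60.0, 30.0, 70.0, 5.0, 20.0, 10.0],
})

# Total score per category, then keep the two highest totals.
totals = demo.groupby('Category')['Score'].sum()
top_two = totals.nlargest(2)
print(top_two)
```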


Converting the cleaned dataset to a CSV file using the to_csv function provided by the pandas library. This CSV file will be used for data visualization.

In [43]:
df.to_csv("C:\\Users\\Kuldeep\\Desktop\\sadna.csv")
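
By default `to_csv` also writes the row index, which shows up as an unnamed first column when the file is read back. If that column is not wanted downstream, `index=False` omits it. The sketch below writes to a temporary directory; the file name `demo.csv` and the data are hypothetical.

```python
import os
import tempfile
import pandas as pd

# Hypothetical cleaned data, invented for illustration.
demo = pd.DataFrame({'Category': ['animals', 'food'], 'Score': [130.0, 50.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.csv')
    demo.to_csv(path, index=False)   # omit the row index from the file
    reloaded = pd.read_csv(path)     # read it back to check the columns

print(list(reloaded.columns))
```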

We will now load this CSV file into Tableau to visualize the data and understand it better.