# Social Media Sentiments Analysis

Social media analysis aims to understand the audience, develop creative content, increase traffic and sales, boost ROI, and improve strategic decision-making in pursuit of social media goals. Audience analysis helps improve customer experience, brand perception, and marketing strategy. Sentiment analysis, in turn, aims to find out how the audience feels about your brand on social media, drawing on engagement activity such as likes, follows, clicks, retweets, comments, impressions, interests, and behaviours. These metrics support marketing campaigns and the measurement of key performance indicators (KPIs).

# About the dataset

The dataset captures audience emotions, trends, and interactions across several social media platforms: Instagram, Facebook, and Twitter. It provides a snapshot of user-generated content, including text, timestamps, hashtags, countries, likes, and retweets. It can be used for a range of analyses, such as sentiment analysis, temporal analysis, user behaviour insights, platform-specific analysis, hashtag trends, geographical analysis, user identification, and cross-analysis.

[Data Source](https://www.kaggle.com/datasets/kashishparmar02/social-media-sentiments-analysis-dataset)

* Text: User-generated content showcasing sentiments
* Sentiment: Categorized emotions
* Timestamp: Date and time information
* User: Unique identifiers of the contributing users
* Platform: Social media platform where the content originated
* Hashtags: Identifies trending topics and themes
* Likes: Quantifies user engagement
* Retweets: Reflects content popularity
* Country: Geographical origin of each post
* Year, Month: Numeric date components of each post, used in the analysis below

# Identify objectives

* Understand the data and interpret how customers feel on social media
* Analyze and visualize audience sentiment to improve audience experience
* Deliver strategic marketing metrics to achieve social media goals

# 1. Import libraries and load data

In [None]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
df = pd.read_csv("/kaggle/input/social-media-sentiments-analysis-dataset/sentimentdataset.csv")
df.head(5)

# 2. Preprocessing: Inspect and Clean Data

In [None]:
print('Columns of dataset: ', df.columns, '\n')
print('Dimension of dataset: ', df.shape, '\n')
print('Information of dataset:')
df.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()

In [None]:
df.isnull().sum() #<--- null value: none

In [None]:
df.duplicated().sum() #<---duplicates: none

# 3. Preprocessing: Wrangle and Transform data

In [None]:
# Drop the two unnamed columns, which appear to be leftover index columns with no descriptive information
df1 = df.drop(['Unnamed: 0.1', 'Unnamed: 0'], axis=1)
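Instead of listing the unnamed columns by hand, one option is to drop every column whose name starts with `Unnamed`. The sketch below uses a small hypothetical frame (`raw`) as a stand-in for `df`, and it assumes such columns are only leftover index columns that hold nothing you need.

```python
import pandas as pd

# Hypothetical stand-in for the raw CSV: two unnamed index columns plus real data
raw = pd.DataFrame({
    'Unnamed: 0.1': [0, 1],
    'Unnamed: 0': [0, 1],
    'Text': ['Great day!', 'Not happy'],
    'Likes': [30.0, 5.0],
})

# Keep only the columns whose names do NOT start with 'Unnamed'
cleaned = raw.loc[:, ~raw.columns.str.startswith('Unnamed')]
print(list(cleaned.columns))  # ['Text', 'Likes']
```

The pattern-based version keeps working if a re-export of the CSV adds another `Unnamed: ...` column.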

In [None]:
# check statistical distribution of numerical variables
df1.describe()

In [None]:
# Summarize object (string) columns: non-null count, number of unique values, most frequent value (top), and its frequency (freq)
df1.describe(include=['object'])

**pandas.Series.str.strip()**

Strip whitespaces (including newlines) or a set of specified characters from each string in the Series/Index from left and right sides. Replaces any non-strings in Series with NaNs.

Example: 
- Before using str.strip(): ' Twitter  ', ' Twitter ', ' Instagram ', ' Facebook '
- After using str.strip(): 'Twitter', 'Twitter', 'Instagram', 'Facebook' (the two Twitter variants become identical, leaving three unique platforms)
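The effect of stripping on unique values can be checked on a small hypothetical Series built from the example labels above:

```python
import pandas as pd

# Hypothetical platform labels with stray leading/trailing spaces
platforms = pd.Series([' Twitter  ', ' Twitter ', ' Instagram ', ' Facebook '])

print(platforms.nunique())   # 4: the two Twitter variants differ only in whitespace

stripped = platforms.str.strip()
print(stripped.nunique())    # 3
print(stripped.tolist())     # ['Twitter', 'Twitter', 'Instagram', 'Facebook']
```

Without stripping, `value_counts()` and `groupby()` would treat ' Twitter  ' and ' Twitter ' as different platforms.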


In [None]:
# Series.str.strip() in pandas: remove leading and trailing whitespace from each text column
for col in ['Text', 'Sentiment', 'User', 'Platform', 'Hashtags', 'Country']:
    df1[col] = df1[col].str.strip()

In [None]:
# Inspect the unique values and their counts in the 'Platform' column
print("Unique values in 'Platform' column: ", df1['Platform'].unique(), '\n')
print("Value counts in 'Platform' column: ", '\n', df1['Platform'].value_counts())

In [None]:
df1.sample(3)

In [None]:
# Split 'Timestamp' into 'Date' and 'Time' columns and add a numeric 'Weekday' column
df1['time'] = pd.to_datetime(df1['Timestamp'])
df1['Date'] = df1['time'].dt.date
df1['Time'] = df1['time'].dt.time
df1['Weekday'] = df1['time'].dt.weekday  # Monday=0 ... Sunday=6
df1.head(2)
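The `.dt` accessor can be tried on a couple of hypothetical timestamps. The sketch assumes a 'YYYY-MM-DD HH:MM:SS' layout for `Timestamp`, and it also shows `Series.dt.day_name()`, which returns weekday names directly instead of numbers:

```python
import pandas as pd

# Hypothetical timestamps (assumed 'YYYY-MM-DD HH:MM:SS' format)
ts = pd.Series(['2023-01-15 12:30:00', '2023-01-16 08:45:00'])

parsed = pd.to_datetime(ts)
parts = pd.DataFrame({
    'Date': parsed.dt.date,           # datetime.date objects
    'Time': parsed.dt.time,           # datetime.time objects
    'Weekday': parsed.dt.weekday,     # Monday=0 ... Sunday=6
    'DayName': parsed.dt.day_name(),  # full English day name, e.g. 'Sunday'
})
print(parts)
```

Here 2023-01-15 is a Sunday (weekday 6) and 2023-01-16 a Monday (weekday 0).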

In [None]:
# Drop 'Timestamp' and 'time'; drop() returns a new DataFrame instead of modifying df1, so the result is assigned to df2
df2 = df1.drop(['Timestamp', 'time'], axis=1)
df2.head(2)

In [None]:
# Map month numbers and weekday numbers to short names with replace(), creating 'Monthname' and 'Weekdayname'
df2['Monthname']=df2['Month'].replace([1,2,3,4,5,6,7,8,9,10,11,12], ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])
df2['Weekdayname']=df2['Weekday'].replace([0,1,2,3,4,5,6], ['Mon','Tue','Wed','Thur','Fri','Sat','Sun'])
df2.head(2)
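An alternative to typing the name lists by hand is to build the lookups from the standard-library `calendar` module and apply them with `Series.map()`. This is a sketch on a hypothetical frame; note that `calendar` abbreviations are locale-dependent (English under the default locale) and use 'Thu' rather than 'Thur'. Unlike `replace()`, `map()` with a dict turns any unmatched value into NaN.

```python
import calendar

import pandas as pd

# Hypothetical month and weekday numbers (Month: 1-12, Weekday: 0=Mon ... 6=Sun)
frame = pd.DataFrame({'Month': [1, 5, 12], 'Weekday': [0, 3, 6]})

# calendar.month_abbr[0] is '', so indices 1..12 map directly to 'Jan'..'Dec'
month_names = {m: calendar.month_abbr[m] for m in range(1, 13)}
# calendar.day_abbr[0] is 'Mon', matching pandas' Monday=0 convention
weekday_names = {d: calendar.day_abbr[d] for d in range(7)}

frame['Monthname'] = frame['Month'].map(month_names)
frame['Weekdayname'] = frame['Weekday'].map(weekday_names)
print(frame)
```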

In [None]:
# Preview two random rows without the numeric 'Weekday' column (df2 itself is not modified)
df2.drop('Weekday', axis=1).sample(2)

In [None]:
# Check the values of the 'Monthname' and 'Weekdayname' columns using Series.unique()
print('Unique values in the Monthname column: ', df2.Monthname.unique())
print('Unique values in the Weekdayname column: ', df2.Weekdayname.unique())

# 4. Analyze and visualize data

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Bar chart of the 20 most frequent sentiment labels
plt.figure(figsize=(10, 5))
df2['Sentiment'].value_counts().nlargest(20).plot(kind='bar')
plt.title("Top 20 sentiments by number of posts")

In [None]:
# Share of posts per platform
plt.figure(figsize=(10, 5))
df2['Platform'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.title("Proportion of posts by platform")

In [None]:
# Top 15 countries by number of posts
plt.figure(figsize=(10, 5))
df2['Country'].value_counts().nlargest(15).plot(kind='bar')
plt.title("Top 15 countries by number of posts")

In [None]:
# Print the range (maximum and minimum) of each numeric column
for column in ['Year', 'Likes', 'Retweets']:
    print(f"{column}: maximum = {df2[column].max()} | minimum = {df2[column].min()}")
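The same ranges can be collected into one small table with `DataFrame.agg()`. This sketch uses hypothetical values in place of `df2`:

```python
import pandas as pd

# Hypothetical numeric columns mirroring Year, Likes, Retweets
sample = pd.DataFrame({
    'Year': [2019, 2023, 2021],
    'Likes': [10.0, 80.0, 35.0],
    'Retweets': [5.0, 40.0, 12.0],
})

# One call returns a table with rows 'min'/'max' and one column per variable
ranges = sample[['Year', 'Likes', 'Retweets']].agg(['min', 'max'])
print(ranges)
```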

In [None]:
# Top 15 countries by total 'Likes'
plt.figure(figsize=(10, 5))
df2.groupby('Country')['Likes'].sum().nlargest(15).plot(kind='bar')
plt.title("Top 15 countries by total 'Likes'")
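A total of 'Likes' per country mixes two effects: how many posts a country has and how much engagement each post gets. One possible complement, sketched here on hypothetical data, is to put the post count, total, and mean likes per post side by side using named aggregation:

```python
import pandas as pd

# Hypothetical posts with a country and a like count
posts = pd.DataFrame({
    'Country': ['USA', 'USA', 'USA', 'UK', 'Canada'],
    'Likes': [10.0, 20.0, 30.0, 50.0, 15.0],
})

# Named aggregation: each keyword becomes an output column
summary = (posts.groupby('Country')['Likes']
           .agg(n_posts='count', total='sum', mean_per_post='mean')
           .sort_values('total', ascending=False))
print(summary)
```

In this toy data the USA leads on total likes only because it has three posts; the UK has the highest likes per post.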

In [None]:
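# Top 10 'Hashtags' values by total 'Retweets'
plt.figure(figsize=(10, 5))
df3 = df2.groupby('Hashtags')['Retweets'].sum().nlargest(10)  # nlargest() already sorts in descending order
df3.plot(kind='bar')
plt.xticks(rotation=80)  # rotate x-axis labels so long hashtag strings stay readable
plt.title("Top 10 hashtag strings by total 'Retweets'")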
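Grouping by 'Hashtags' treats each full string as one key, so a post tagged '#Nature #Park' is counted separately from one tagged '#Nature'. If the field holds several space-separated tags per post (as it appears to in this dataset), splitting and exploding it gives totals per individual tag. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical posts; each Hashtags value may hold several space-separated tags
posts = pd.DataFrame({
    'Hashtags': ['#Nature #Park', '#Nature', '#Fitness #Nature'],
    'Retweets': [10.0, 5.0, 20.0],
})

# Split each string into a list of tags, then give every tag its own row
per_tag = posts.assign(Tag=posts['Hashtags'].str.split()).explode('Tag')

# Sum retweets per individual tag (a post's retweets count once for each of its tags)
tag_retweets = per_tag.groupby('Tag')['Retweets'].sum().sort_values(ascending=False)
print(tag_retweets)
```

Because a multi-tag post contributes its retweets to every tag it carries, the per-tag totals can add up to more than the overall retweet count.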

In [None]:
# Filter Twitter posts and total their 'Likes' per year
Twitter = df2[df2['Platform'] == 'Twitter']
df5 = Twitter.groupby('Year')['Likes'].sum().reset_index()
plt.figure(figsize=(10, 5))
sns.lineplot(data=df5, x='Year', y='Likes', marker='o')
plt.title("Total 'Likes' per year on Twitter")
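The cell above sums 'Likes' within each year. If a running (cumulative) total across years is wanted instead, `cumsum()` on the per-year column gives it. A sketch with hypothetical numbers:

```python
import pandas as pd

# Hypothetical Twitter posts
tw = pd.DataFrame({'Year': [2021, 2021, 2022, 2023], 'Likes': [10.0, 20.0, 5.0, 15.0]})

yearly = tw.groupby('Year')['Likes'].sum().reset_index()  # total per year
yearly['CumulativeLikes'] = yearly['Likes'].cumsum()      # running total across years
print(yearly)
```

The per-year totals are 30, 5, and 15, while the running total is 30, 35, and 50; plotting one or the other answers different questions.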

In [None]:
# Filter Instagram posts and total their 'Retweets' per year
# iterrows() iterates over DataFrame rows as (index, Series) pairs; here it is used to label each point
Instagram = df2[df2['Platform'] == 'Instagram']
df_ins = Instagram.groupby('Year')['Retweets'].sum().reset_index()

plt.figure(figsize=(12, 5))
sns.lineplot(data=df_ins, x='Year', y='Retweets', marker='o')
for index, row in df_ins.iterrows():
    plt.text(row['Year'], row['Retweets'], f"{row['Retweets']:.0f}", ha='left', va='bottom')
plt.title("Total 'Retweets' per year on Instagram")
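Rather than filtering one platform at a time, a single `groupby()` on both 'Year' and 'Platform' followed by `unstack()` produces one column per platform. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical posts across platforms
posts = pd.DataFrame({
    'Year': [2021, 2021, 2022, 2022, 2022],
    'Platform': ['Twitter', 'Instagram', 'Twitter', 'Facebook', 'Instagram'],
    'Retweets': [4.0, 6.0, 8.0, 3.0, 2.0],
})

# Rows = Year, columns = Platform, values = total Retweets (missing combinations become 0)
by_platform = (posts.groupby(['Year', 'Platform'])['Retweets']
               .sum()
               .unstack(fill_value=0))
print(by_platform)
```

Calling `by_platform.plot(marker='o')` would then draw one line per platform on shared axes, which makes the platforms directly comparable.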