# Social Media Sentiments Analysis

## Table of Contents  <a id='back'></a> 
- [Project Introduction](#project-introduction)
    - [Analysis Outline](#analysis-outline)
    - [Results](#results)
- [Importing Libraries and Opening Data Files](#importing-libraries-and-opening-data-files)
- [Pre-Processing Data](#pre-processing-data)
    - [Header Style](#header-style)
    - [Duplicates](#duplicates)
    - [Missing Values](#missing-values)
    - [Data Usage and Formatting](#data-usage-and-formatting)
    - [Data Wrangling](#data-wrangling)
- [Data Analysis](#data-analysis)
- [Conclusion](#conclusion)

<a name='headers'></a>

## Project Introduction


### Analysis Outline


### Results

## Importing Libraries and Opening Data Files

In [1]:
```python
# Importing the libraries needed for this assignment
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
```

In [2]:
```python
# Loading the dataset: try the local path first, then the platform path
try:
    sm = pd.read_csv('sentimentdataset.csv')
except FileNotFoundError:
    sm = pd.read_csv('/datasets/sentimentdataset.csv')
```
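The `try`/`except` above falls back to a second location when the first file is missing. A minimal sketch of the same idea as a reusable helper, assuming only that the candidate paths are known in advance; `read_first_existing` is a hypothetical name, not part of pandas. It reports every path it tried, which makes a missing dataset easier to diagnose than a bare fallback.

```python
from pathlib import Path

import pandas as pd


def read_first_existing(paths):
    """Return a DataFrame read from the first path in `paths` that is a file."""
    candidates = [Path(p) for p in paths]
    for path in candidates:
        if path.is_file():
            return pd.read_csv(path)
    # Nothing matched: name every location that was tried.
    raise FileNotFoundError(f"none of these files exist: {[str(p) for p in candidates]}")


# Usage with the two locations tried in the cell above:
# sm = read_first_existing(['sentimentdataset.csv', '/datasets/sentimentdataset.csv'])
```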

[Back to Table of Contents](#back)

## Pre-Processing Data

### Header Style

In [3]:
```python
# Getting general information about the dataset
sm.info()
sm.head()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 732 entries, 0 to 731
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0.1  732 non-null    int64  
 1   Unnamed: 0    732 non-null    int64  
 2   Text          732 non-null    object 
 3   Sentiment     732 non-null    object 
 4   Timestamp     732 non-null    object 
 5   User          732 non-null    object 
 6   Platform      732 non-null    object 
 7   Hashtags      732 non-null    object 
 8   Retweets      732 non-null    float64
 9   Likes         732 non-null    float64
 10  Country       732 non-null    object 
 11  Year          732 non-null    int64  
 12  Month         732 non-null    int64  
 13  Day           732 non-null    int64  
 14  Hour          732 non-null    int64  
dtypes: float64(2), int64(6), object(7)
memory usage: 85.9+ KB
```

| | Unnamed: 0.1 | Unnamed: 0 | Text | Sentiment | Timestamp | User | Platform | Hashtags | Retweets | Likes | Country | Year | Month | Day | Hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | Enjoying a beautiful day at the park! ... | Positive | 2023-01-15 12:30:00 | User123 | Twitter | #Nature #Park | 15.0 | 30.0 | USA | 2023 | 1 | 15 | 12 |
| 1 | 1 | 1 | Traffic was terrible this morning. ... | Negative | 2023-01-15 08:45:00 | CommuterX | Twitter | #Traffic #Morning | 5.0 | 10.0 | Canada | 2023 | 1 | 15 | 8 |
| 2 | 2 | 2 | Just finished an amazing workout! 💪 ... | Positive | 2023-01-15 15:45:00 | FitnessFan | Instagram | #Fitness #Workout | 20.0 | 40.0 | USA | 2023 | 1 | 15 | 15 |
| 3 | 3 | 3 | Excited about the upcoming weekend getaway! ... | Positive | 2023-01-15 18:20:00 | AdventureX | Facebook | #Travel #Adventure | 8.0 | 15.0 | UK | 2023 | 1 | 15 | 18 |
| 4 | 4 | 4 | Trying out a new recipe for dinner tonight. ... | Neutral | 2023-01-15 19:55:00 | ChefCook | Instagram | #Cooking #Food | 12.0 | 25.0 | Australia | 2023 | 1 | 15 | 19 |


In [4]:
```python
# Checking whether the column names follow snake_case
sm.columns
```

```
Index(['Unnamed: 0.1', 'Unnamed: 0', 'Text', 'Sentiment', 'Timestamp', 'User',
       'Platform', 'Hashtags', 'Retweets', 'Likes', 'Country', 'Year', 'Month',
       'Day', 'Hour'],
      dtype='object')
```
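Eyeballing `sm.columns` works for 15 names, but the check can also be done with a regular expression. A minimal sketch, assuming a strict definition of snake_case (lowercase letters and digits joined by single underscores); `SNAKE_CASE` and `non_snake_case` are hypothetical names. Under this definition a name containing a dot, such as `unnamed_0.01`, would also be flagged.

```python
import re

import pandas as pd

# Assumed strict definition: groups of [a-z0-9] joined by single underscores.
SNAKE_CASE = re.compile(r'[a-z0-9]+(?:_[a-z0-9]+)*')


def non_snake_case(columns):
    """Return the column names that do not fully match SNAKE_CASE."""
    return [name for name in columns if not SNAKE_CASE.fullmatch(str(name))]


cols = pd.Index(['Unnamed: 0.1', 'Text', 'retweets', 'unnamed_0.01'])
print(non_snake_case(cols))
```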

In [5]:
```python
# Renaming column names to snake_case format
sm = sm.rename(columns={'Unnamed: 0.1': 'unnamed_0.01',
                        'Unnamed: 0': 'unnamed_0',
                        'Text': 'text',
                        'Sentiment': 'sentiment',
                        'Timestamp': 'timestamp',
                        'User': 'user',
                        'Platform': 'platform',
                        'Hashtags': 'hashtags',
                        'Retweets': 'retweets',
                        'Likes': 'likes',
                        'Country': 'country',
                        'Year': 'year',
                        'Month': 'month',
                        'Day': 'day',
                        'Hour': 'hour'})
sm.columns
```

```
Index(['unnamed_0.01', 'unnamed_0', 'text', 'sentiment', 'timestamp', 'user',
       'platform', 'hashtags', 'retweets', 'likes', 'country', 'year', 'month',
       'day', 'hour'],
      dtype='object')
```
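The renaming map above is written by hand. Since `DataFrame.rename(columns=...)` also accepts a function, the same conversion can be applied to every name at once. A minimal sketch, assuming that replacing each run of non-alphanumeric characters with a single underscore is acceptable; `to_snake_case` is a hypothetical helper. Note that it would produce `unnamed_0_1` rather than `unnamed_0.01`, and it does not split CamelCase words.

```python
import re

import pandas as pd


def to_snake_case(name):
    """Lowercase a name and collapse runs of non-alphanumerics into '_'."""
    name = re.sub(r'[^0-9a-zA-Z]+', '_', str(name)).strip('_')
    return name.lower()


df = pd.DataFrame(columns=['Unnamed: 0.1', 'Unnamed: 0', 'Text', 'Retweets'])
df = df.rename(columns=to_snake_case)
print(list(df.columns))
```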

### Duplicates

In [6]:
```python
# Checking for duplicates
sm.duplicated().sum()
```

```
0
```
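One caveat: `sm.duplicated()` compares all columns, and the two unnamed columns appear to hold a different value on every row, so no row can ever match another even if the post itself repeats. A sketch on a toy frame (not the real data) showing how passing `subset=` to exclude an ID column changes the answer:

```python
import pandas as pd

# Toy frame: rows 0 and 1 carry the same post but different row IDs.
df = pd.DataFrame({
    'unnamed_0': [0, 1, 2],
    'text': ['same post', 'same post', 'other post'],
    'user': ['a', 'a', 'b'],
})

# Every row has a unique ID, so a full-row check finds nothing.
print(df.duplicated().sum())

# Ignoring the ID column exposes the repeated post.
content_cols = list(df.columns.drop('unnamed_0'))
print(df.duplicated(subset=content_cols).sum())
```

On `sm`, the equivalent check would pass every column except `unnamed_0.01` and `unnamed_0` as the subset.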

### Missing Values

In [7]:
```python
# Checking for null values
sm.isna().sum()
```

```
unnamed_0.01    0
unnamed_0       0
text            0
sentiment       0
timestamp       0
user            0
platform        0
hashtags        0
retweets        0
likes           0
country         0
year            0
month           0
day             0
hour            0
dtype: int64
```
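`isna()` only counts real missing values (`NaN`/`None`); an empty or whitespace-only string counts as present. Whether the text columns here contain such values is not shown above, so the following is a sketch on toy data of how blank strings could be counted as missing too:

```python
import pandas as pd

df = pd.DataFrame({'hashtags': ['#a', '', '   ', None]})

# isna() counts only the None entry.
print(df['hashtags'].isna().sum())

# Treat empty or whitespace-only strings as missing as well.
blank = df['hashtags'].str.strip().eq('')
print((df['hashtags'].isna() | blank).sum())
```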

### Data Usage and Formatting

In [8]:
```python
sm.info()
sm.head(10)
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 732 entries, 0 to 731
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   unnamed_0.01  732 non-null    int64  
 1   unnamed_0     732 non-null    int64  
 2   text          732 non-null    object 
 3   sentiment     732 non-null    object 
 4   timestamp     732 non-null    object 
 5   user          732 non-null    object 
 6   platform      732 non-null    object 
 7   hashtags      732 non-null    object 
 8   retweets      732 non-null    float64
 9   likes         732 non-null    float64
 10  country       732 non-null    object 
 11  year          732 non-null    int64  
 12  month         732 non-null    int64  
 13  day           732 non-null    int64  
 14  hour          732 non-null    int64  
dtypes: float64(2), int64(6), object(7)
memory usage: 85.9+ KB
```

| | unnamed_0.01 | unnamed_0 | text | sentiment | timestamp | user | platform | hashtags | retweets | likes | country | year | month | day | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | Enjoying a beautiful day at the park! ... | Positive | 2023-01-15 12:30:00 | User123 | Twitter | #Nature #Park | 15.0 | 30.0 | USA | 2023 | 1 | 15 | 12 |
| 1 | 1 | 1 | Traffic was terrible this morning. ... | Negative | 2023-01-15 08:45:00 | CommuterX | Twitter | #Traffic #Morning | 5.0 | 10.0 | Canada | 2023 | 1 | 15 | 8 |
| 2 | 2 | 2 | Just finished an amazing workout! 💪 ... | Positive | 2023-01-15 15:45:00 | FitnessFan | Instagram | #Fitness #Workout | 20.0 | 40.0 | USA | 2023 | 1 | 15 | 15 |
| 3 | 3 | 3 | Excited about the upcoming weekend getaway! ... | Positive | 2023-01-15 18:20:00 | AdventureX | Facebook | #Travel #Adventure | 8.0 | 15.0 | UK | 2023 | 1 | 15 | 18 |
| 4 | 4 | 4 | Trying out a new recipe for dinner tonight. ... | Neutral | 2023-01-15 19:55:00 | ChefCook | Instagram | #Cooking #Food | 12.0 | 25.0 | Australia | 2023 | 1 | 15 | 19 |
| 5 | 5 | 5 | Feeling grateful for the little things in lif... | Positive | 2023-01-16 09:10:00 | GratitudeNow | Twitter | #Gratitude #PositiveVibes | 25.0 | 50.0 | India | 2023 | 1 | 16 | 9 |
| 6 | 6 | 6 | Rainy days call for cozy blankets and hot coc... | Positive | 2023-01-16 14:45:00 | RainyDays | Facebook | #RainyDays #Cozy | 10.0 | 20.0 | Canada | 2023 | 1 | 16 | 14 |
| 7 | 7 | 7 | The new movie release is a must-watch! ... | Positive | 2023-01-16 19:30:00 | MovieBuff | Instagram | #MovieNight #MustWatch | 15.0 | 30.0 | USA | 2023 | 1 | 16 | 19 |
| 8 | 8 | 8 | Political discussions heating up on the timel... | Negative | 2023-01-17 08:00:00 | DebateTalk | Twitter | #Politics #Debate | 30.0 | 60.0 | USA | 2023 | 1 | 17 | 8 |
| 9 | 9 | 9 | Missing summer vibes and beach days. ... | Neutral | 2023-01-17 12:20:00 | BeachLover | Facebook | #Summer #BeachDays | 18.0 | 35.0 | Australia | 2023 | 1 | 17 | 12 |


In [9]:
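```python
# Both numeric unnamed columns appear to record the row number. Compare them
# element-wise before dropping them (Series.count() would only report the
# number of non-null rows, 732, whether or not the values match).
print((sm['unnamed_0'] == sm['unnamed_0.01']).all())
print((sm['unnamed_0'].to_numpy() == sm.index.to_numpy()).all())
```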
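`Series.count()` returns the number of non-null entries, and a boolean mask from `isin()` has no nulls, so `.count()` on it always equals the row count. A toy example (not the real data) contrasting it with checks that do test for matching values:

```python
import pandas as pd

a = pd.Series([0, 1, 2])
b = pd.Series([0, 5, 9])  # deliberately different from a

# count() reports non-null entries, not matches: 3 even though two differ.
print(a.isin(b).count())

# sum() counts the True values; all() asks whether every pair matches.
print(a.isin(b).sum())
print((a == b).all())

# Compare a column against the row index directly.
print((a.to_numpy() == a.index.to_numpy()).all())
```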

In [10]:
```python
# These two columns only repeat the row index, so they can be dropped
# to reduce memory usage
sm = sm.drop(columns=['unnamed_0.01', 'unnamed_0'])
```

In [11]:
```python
# Converting the text column to lowercase
sm['text'] = sm['text'].str.lower()
```

In [12]:
```python
# Converting the sentiment labels to lowercase and storing them as a
# category dtype to reduce memory usage
sm['sentiment'] = sm['sentiment'].str.lower()
sm['sentiment'] = sm['sentiment'].astype('category')
```
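The `category` dtype stores each distinct label once and keeps a small integer code per row, so it saves memory when a few values repeat many times. A sketch on synthetic labels; how much it saves on `sm` depends on how many distinct values each column actually has, which is not shown here.

```python
import pandas as pd

labels = pd.Series(['positive', 'negative', 'neutral'] * 1000)

# deep=True counts the string contents, not just the object pointers.
as_object = labels.memory_usage(deep=True)
as_category = labels.astype('category').memory_usage(deep=True)
print(as_object, as_category)
```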

In [13]:
```python
# The timestamp column is stored as strings; converting it to datetime64
# enables date-based operations and stores each value in a fixed 8 bytes
sm['timestamp'] = pd.to_datetime(sm['timestamp'], format='%Y-%m-%d %H:%M:%S')
```
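Once `timestamp` is a datetime column, the `.dt` accessor can recompute the separate `year`, `month`, `day` and `hour` columns, which gives a quick consistency check. A sketch on two rows copied from the head output; whether every row of the full dataset agrees is not established here.

```python
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2023-01-15 12:30:00', '2023-01-17 08:00:00']),
    'year': [2023, 2023], 'month': [1, 1], 'day': [15, 17], 'hour': [12, 8],
})

ts = df['timestamp'].dt
derived = pd.DataFrame({'year': ts.year, 'month': ts.month,
                        'day': ts.day, 'hour': ts.hour})

# True when the stored columns agree with the timestamp on every row.
consistent = (derived == df[['year', 'month', 'day', 'hour']]).all().all()
print(consistent)
```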

In [14]:
```python
# Converting the user column to lowercase
sm['user'] = sm['user'].str.lower()
```

In [15]:
```python
# Converting the platform column to lowercase and storing it as a category
sm['platform'] = sm['platform'].str.lower()
sm['platform'] = sm['platform'].astype('category')
```

In [16]:
```python
# Converting the hashtags column to lowercase
sm['hashtags'] = sm['hashtags'].str.lower()
```
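Each `hashtags` value holds several tags in one space-separated string, so counting individual tags needs a split first. A sketch assuming tags are separated by whitespace, as in the head output, using toy values:

```python
import pandas as pd

hashtags = pd.Series(['#nature #park', '#traffic #morning', '#nature #hiking'])

# One row per tag: split on whitespace, then explode the lists.
tags = hashtags.str.split().explode()
counts = tags.value_counts()
print(counts.head())
```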

In [17]:
```python
# Converting the country column to lowercase and storing it as a category
sm['country'] = sm['country'].str.lower()
sm['country'] = sm['country'].astype('category')
```
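Lowercasing does not remove leading or trailing spaces, and two labels that differ only by whitespace become separate categories. The head output does not show whether this dataset has such values (checking `sm['country'].unique()` would tell), so the following is a sketch on toy values of stripping before categorizing:

```python
import pandas as pd

platform = pd.Series([' Twitter ', 'Twitter', 'Instagram  '])

# Without stripping, ' twitter ' and 'twitter' end up as separate categories.
unstripped = platform.str.lower().astype('category').cat.categories.tolist()
print(unstripped)

cleaned = platform.str.strip().str.lower().astype('category')
print(cleaned.cat.categories.tolist())
```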

In [18]:
```python
sm.head(10)
```

| | text | sentiment | timestamp | user | platform | hashtags | retweets | likes | country | year | month | day | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | enjoying a beautiful day at the park! ... | positive | 2023-01-15 12:30:00 | user123 | twitter | #nature #park | 15.0 | 30.0 | usa | 2023 | 1 | 15 | 12 |
| 1 | traffic was terrible this morning. ... | negative | 2023-01-15 08:45:00 | commuterx | twitter | #traffic #morning | 5.0 | 10.0 | canada | 2023 | 1 | 15 | 8 |
| 2 | just finished an amazing workout! 💪 ... | positive | 2023-01-15 15:45:00 | fitnessfan | instagram | #fitness #workout | 20.0 | 40.0 | usa | 2023 | 1 | 15 | 15 |
| 3 | excited about the upcoming weekend getaway! ... | positive | 2023-01-15 18:20:00 | adventurex | facebook | #travel #adventure | 8.0 | 15.0 | uk | 2023 | 1 | 15 | 18 |
| 4 | trying out a new recipe for dinner tonight. ... | neutral | 2023-01-15 19:55:00 | chefcook | instagram | #cooking #food | 12.0 | 25.0 | australia | 2023 | 1 | 15 | 19 |
| 5 | feeling grateful for the little things in lif... | positive | 2023-01-16 09:10:00 | gratitudenow | twitter | #gratitude #positivevibes | 25.0 | 50.0 | india | 2023 | 1 | 16 | 9 |
| 6 | rainy days call for cozy blankets and hot coc... | positive | 2023-01-16 14:45:00 | rainydays | facebook | #rainydays #cozy | 10.0 | 20.0 | canada | 2023 | 1 | 16 | 14 |
| 7 | the new movie release is a must-watch! ... | positive | 2023-01-16 19:30:00 | moviebuff | instagram | #movienight #mustwatch | 15.0 | 30.0 | usa | 2023 | 1 | 16 | 19 |
| 8 | political discussions heating up on the timel... | negative | 2023-01-17 08:00:00 | debatetalk | twitter | #politics #debate | 30.0 | 60.0 | usa | 2023 | 1 | 17 | 8 |
| 9 | missing summer vibes and beach days. ... | neutral | 2023-01-17 12:20:00 | beachlover | facebook | #summer #beachdays | 18.0 | 35.0 | australia | 2023 | 1 | 17 | 12 |


In [19]:
```python
sm.info()

# Shallow memory usage fell from 85.9+ KB to 75.6+ KB after dropping the two
# index columns and converting dtypes ("+" means string contents are not counted)
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 732 entries, 0 to 731
Data columns (total 13 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   text       732 non-null    object        
 1   sentiment  732 non-null    category      
 2   timestamp  732 non-null    datetime64[ns]
 3   user       732 non-null    object        
 4   platform   732 non-null    category      
 5   hashtags   732 non-null    object        
 6   retweets   732 non-null    float64       
 7   likes      732 non-null    float64       
 8   country    732 non-null    category      
 9   year       732 non-null    int64         
 10  month      732 non-null    int64         
 11  day        732 non-null    int64         
 12  hour       732 non-null    int64         
dtypes: category(3), datetime64[ns](1), float64(2), int64(4), object(3)
memory usage: 75.6+ KB
```
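The `+` in the `info()` memory line marks a lower bound: by default pandas counts only the 8-byte pointers of `object` columns, not the strings they point to. A sketch on a toy frame showing the difference; `memory_usage(deep=True)` and `info(memory_usage='deep')` both report the fuller figure.

```python
import pandas as pd

# dtype=object is set explicitly so the column holds Python strings.
df = pd.DataFrame({
    'text': pd.Series(['enjoying a beautiful day at the park!'] * 500, dtype=object),
    'likes': [30.0] * 500,
})

shallow = df.memory_usage().sum()          # what info() reports by default
deep = df.memory_usage(deep=True).sum()    # also measures the string contents
print(shallow, deep)

# The same deep measurement through info():
df.info(memory_usage='deep')
```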


[Back to Table of Contents](#back)

### Data Wrangling

In [20]:
```python
sm.head(10)
```

| | text | sentiment | timestamp | user | platform | hashtags | retweets | likes | country | year | month | day | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | enjoying a beautiful day at the park! ... | positive | 2023-01-15 12:30:00 | user123 | twitter | #nature #park | 15.0 | 30.0 | usa | 2023 | 1 | 15 | 12 |
| 1 | traffic was terrible this morning. ... | negative | 2023-01-15 08:45:00 | commuterx | twitter | #traffic #morning | 5.0 | 10.0 | canada | 2023 | 1 | 15 | 8 |
| 2 | just finished an amazing workout! 💪 ... | positive | 2023-01-15 15:45:00 | fitnessfan | instagram | #fitness #workout | 20.0 | 40.0 | usa | 2023 | 1 | 15 | 15 |
| 3 | excited about the upcoming weekend getaway! ... | positive | 2023-01-15 18:20:00 | adventurex | facebook | #travel #adventure | 8.0 | 15.0 | uk | 2023 | 1 | 15 | 18 |
| 4 | trying out a new recipe for dinner tonight. ... | neutral | 2023-01-15 19:55:00 | chefcook | instagram | #cooking #food | 12.0 | 25.0 | australia | 2023 | 1 | 15 | 19 |
| 5 | feeling grateful for the little things in lif... | positive | 2023-01-16 09:10:00 | gratitudenow | twitter | #gratitude #positivevibes | 25.0 | 50.0 | india | 2023 | 1 | 16 | 9 |
| 6 | rainy days call for cozy blankets and hot coc... | positive | 2023-01-16 14:45:00 | rainydays | facebook | #rainydays #cozy | 10.0 | 20.0 | canada | 2023 | 1 | 16 | 14 |
| 7 | the new movie release is a must-watch! ... | positive | 2023-01-16 19:30:00 | moviebuff | instagram | #movienight #mustwatch | 15.0 | 30.0 | usa | 2023 | 1 | 16 | 19 |
| 8 | political discussions heating up on the timel... | negative | 2023-01-17 08:00:00 | debatetalk | twitter | #politics #debate | 30.0 | 60.0 | usa | 2023 | 1 | 17 | 8 |
| 9 | missing summer vibes and beach days. ... | neutral | 2023-01-17 12:20:00 | beachlover | facebook | #summer #beachdays | 18.0 | 35.0 | australia | 2023 | 1 | 17 | 12 |


In [21]:
```python
sm_hashtags = sm[['user', 'hashtags']].copy()
```
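Calling `.copy()` makes the two-column subset an independent frame, so later changes to it cannot leak back into the main frame (and, in pandas versions that emit it, avoid a `SettingWithCopyWarning`). A sketch on toy data; `posts` and `posts_hashtags` are hypothetical stand-ins for `sm` and `sm_hashtags`.

```python
import pandas as pd

posts = pd.DataFrame({'user': ['user123', 'commuterx'],
                      'hashtags': ['#nature #park', '#traffic #morning']})

posts_hashtags = posts[['user', 'hashtags']].copy()
posts_hashtags['hashtags'] = posts_hashtags['hashtags'].str.split()

# The original frame keeps its strings; only the copy holds lists.
print(posts['hashtags'].tolist())
print(posts_hashtags['hashtags'].tolist())
```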

[Back to Table of Contents](#back)

## Data Analysis

[Back to Table of Contents](#back)

## Conclusion

[Back to Table of Contents](#back)