# Introduction

Social media sentiment analysis involves extracting opinions from text data to categorize them into sentiments such as positive, negative, and neutral. This helps in understanding public opinion about various topics, products, or services.

# Data Preparation

__Objective:__ Prepare social media sentiment data for analysis.

1. __Load the Data:__ Import data from a CSV file.

2. __Check for Missing Values:__ Identify and handle missing values.

3. __Convert Dates:__ Ensure the date column is in datetime format.

4. __Set Index:__ Set the date column as the index.

5. __Clean Text Data:__ Use preprocessed text for sentiment analysis; this dataset already provides it in the `clean_tweet` column.

In [10]:
# Import the required libraries
import pandas as pd 

data_url = 'https://raw.githubusercontent.com/EdulaneDotCo/kaggle/main/data/social_media_sentimate_data.csv'
df_sentiments = pd.read_csv(data_url)

# Print the first few rows to verify the column names
print(df_sentiments.head())

# Check for missing values
print(df_sentiments.isnull().sum())

# Drop rows with missing values in the 'clean_tweet' column
df_sentiments.dropna(subset=['clean_tweet'], inplace=True)

# Convert 'New_Date' column to datetime
df_sentiments['New_Date'] = pd.to_datetime(df_sentiments['New_Date'])

# Set 'New_Date' as the index
df_sentiments.set_index('New_Date', inplace=True)

# Display the prepared data
print(df_sentiments.head())

                        Date  \
0  2023-04-08 03:31:08+00:00   
1  2023-04-08 03:30:51+00:00   
2  2023-04-08 03:30:00+00:00   
3  2023-04-08 03:28:59+00:00   
4  2023-04-08 03:28:31+00:00   

                                               Tweet  \
0  OpenAI’s GPT-4 Just Got Supercharged! #ai #Cha...   
1  "Classical art" is struggling - not changed th...   
2  Alibaba invites businesses to trial 'ChatGPT r...   
3  Trying to stop students from using #AI and #ch...   
4  I Asked ChatGPT's AI Chatbot How Can I Earn Cr...   

                                                 Url            User  \
0  https://twitter.com/tubeblogger/status/1644543...     tubeblogger   
1  https://twitter.com/majorradic/status/16445432...      majorradic   
2  https://twitter.com/gadgetsnow/status/16445430...      gadgetsnow   
3  https://twitter.com/Sherab_Taye/status/1644542...     Sherab_Taye   
4  https://twitter.com/cryptoccentral/status/1644...  cryptoccentral   

                 UserCreated  UserVer

__Explanation:__

1. __Load the Data:__ We use `pd.read_csv` to load the sentiment data from a CSV file.

2. __Check for Missing Values:__ We use `df_sentiments.isnull().sum()` to count missing values per column, then drop rows where `clean_tweet` is missing using `dropna`.

3. __Convert Dates:__ We ensure the `New_Date` column is in datetime format using `pd.to_datetime`.

4. __Set Index:__ We set the `New_Date` column as the index using `df_sentiments.set_index`.
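
The notebook relies on the dataset's existing `clean_tweet` column, so the text-cleaning step from the objective list is never shown in code. As a rough illustration only, the sketch below shows one way raw tweet text might be normalized with the standard `re` module. The helper name `clean_tweet_text` and its rules (lowercasing, removing URLs, mentions, and hashtags) are assumptions made for this example, not the procedure used to build the dataset.

```python
import re

# Hypothetical helper: the name and the exact rules are assumptions,
# not the procedure used to build the dataset's clean_tweet column.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet_text(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, and #hashtags."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_OR_HASHTAG_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


sample = "GPT-4 Just Got Supercharged! #ai @OpenAI https://example.com/post"
print(clean_tweet_text(sample))  # gpt-4 just got supercharged!
```

Applied to a DataFrame, this could look like `df_sentiments['Tweet'].apply(clean_tweet_text)`, assuming the raw text sits in the `Tweet` column shown in the output above. Note that VADER treats capitalization as an intensity cue, so lowercasing may shift the scores slightly.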

# Sentiment Analysis

__Objective:__ Perform sentiment analysis to categorize sentiments into positive, negative, and neutral.

1. __Install and Import Required Libraries:__ Install `nltk` and import the necessary libraries.

2. __Load Sentiment Analyzer:__ Use `nltk`'s VADER sentiment analyzer.

3. __Analyze Sentiments:__ Calculate sentiment scores and categorize them.

In [16]:
# Install NLTK if not already installed 
# !pip install nltk

# Import required libraries
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon
nltk.download('vader_lexicon')

# Load the sentiment analyzer
sid = SentimentIntensityAnalyzer()

# Calculate sentiment score for each tweet
df_sentiments['sentiment_scores'] = df_sentiments['clean_tweet'].apply(sid.polarity_scores)

# Extract compound score
df_sentiments['compound_score'] = df_sentiments['sentiment_scores'].apply(lambda score_dict: score_dict['compound'])

# Categorize sentiments based on the compound score
df_sentiments['sentiment'] = df_sentiments['compound_score'].apply(lambda score: 'positive' if score >= 0.05 else ('negative' if score <= -0.05 else 'neutral'))

# Display the sentiment analysis results
print(df_sentiments[['clean_tweet', 'compound_score', 'sentiment']].head())


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\gaura\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


                                                  clean_tweet  compound_score  \
New_Date                                                                        
2023-04-08               openai’s gpt4 just got supercharged!          0.0000   
2023-04-08  classical art" is struggling  not changed the ...         -0.2500   
2023-04-08  alibaba invites businesses to trial chatgpt ri...          0.0000   
2023-04-08  trying to stop students from using and is like...         -0.2263   
2023-04-08  i asked chatgpts ai chatbot how can i earn cry...          0.0000   

           sentiment  
New_Date              
2023-04-08   neutral  
2023-04-08  negative  
2023-04-08   neutral  
2023-04-08  negative  
2023-04-08   neutral  
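
The nested conditional in the categorization step can be hard to read. A minimal sketch of the same rule as a named function follows; `label_from_compound` and the two constant names are hypothetical, while the 0.05 and -0.05 cut-offs and the sample scores are taken from the cell and output above.

```python
from collections import Counter

# Thresholds copied from the notebook cell above; the names are hypothetical.
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_from_compound(score: float) -> str:
    """Map a VADER compound score to 'positive', 'negative', or 'neutral'."""
    if score >= POSITIVE_THRESHOLD:
        return "positive"
    if score <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


# Compound scores of the five rows printed above.
scores = [0.0, -0.25, 0.0, -0.2263, 0.0]
labels = [label_from_compound(s) for s in scores]

# Tally how many tweets fall into each category.
print(Counter(labels))  # Counter({'neutral': 3, 'negative': 2})
```

On the full DataFrame, `df_sentiments['sentiment'].value_counts()` would give the equivalent tally.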
