## Real-time public sentiment analysis for Samsung

### Installing the necessary libraries

In [1]:
!pip install tweepy pandas nltk openpyxl



In [2]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Harsha\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

This script fetches up to 100 recent tweets mentioning "Samsung" through the Twitter API (via Tweepy). The search query itself excludes retweets (`-is:retweet`) and non-English tweets (`lang:en`). Each tweet is scored with NLTK's VADER sentiment analyzer and labeled Positive (compound score ≥ 0.05), Negative (compound score ≤ -0.05), or Neutral (anything in between). The creation date, tweet text, and sentiment label are then collected in a pandas DataFrame and saved to a CSV file for further analysis.
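The labeling rule can be sketched as a small standalone function. The ±0.05 cutoffs are the ones used in the script; the name `label_from_compound` and its `threshold` parameter are hypothetical helpers introduced here for illustration and are not part of NLTK.

```python
def label_from_compound(score, threshold=0.05):
    """Map a VADER compound score (between -1 and 1) to a sentiment label.

    Hypothetical helper: mirrors the if/elif/else rule used in the notebook.
    """
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


# Made-up scores, chosen only to exercise each branch
for score in (0.62, 0.0, -0.31):
    print(score, label_from_compound(score))
```

Keeping the rule in one function means the cutoffs live in a single place if the same labels are needed again when re-scoring the combined file.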

In [4]:
import tweepy
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Initialize sentiment analyzer
nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()

# --- Your Bearer Token from Developer Portal ---
BEARER_TOKEN = "token"

# Initialize client
client = tweepy.Client(bearer_token=BEARER_TOKEN)

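# Build the search query: exclude retweets and keep English tweets only
query = "Samsung -is:retweet lang:en"
# Fetch up to 100 recent tweets (100 is the maximum for a single request)
tweets = client.search_recent_tweets(query=query, tweet_fields=['created_at', 'text'], max_results=100)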

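# Analyze sentiment
data = []
for tweet in tweets.data or []:  # tweets.data is None when the search returns nothing
    text = tweet.text
    # The compound score ranges from -1 (most negative) to +1 (most positive)
    sentiment_score = sia.polarity_scores(text)['compound']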
    if sentiment_score >= 0.05:
        sentiment = 'Positive'
    elif sentiment_score <= -0.05:
        sentiment = 'Negative'
    else:
        sentiment = 'Neutral'
    data.append([tweet.created_at, text, sentiment])

# Save to DataFrame
df = pd.DataFrame(data, columns=["Date", "Tweet", "Sentiment"])
df.to_csv("samsung_sentiment_analysis1.csv", index=False)

print(f"✅ Saved {len(df)} Samsung tweets with sentiment labels to samsung_sentiment_analysis1.csv")

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Harsha\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


✅ Saved 100 Samsung tweets with sentiment labels to samsung_sentiment_analysis1.csv
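The cell above stores the bearer token as a string literal. One common alternative is to read it from an environment variable so the token never ends up in a saved notebook. The sketch below assumes a variable named `TWITTER_BEARER_TOKEN`; both that name and the `load_bearer_token` helper are hypothetical, not something Tweepy defines.

```python
import os


def load_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Return the token stored in the given environment variable.

    Hypothetical helper; the variable name is an assumption, not a Tweepy convention.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

With this in place, the client could be created as `tweepy.Client(bearer_token=load_bearer_token())`.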


In [7]:
import pandas as pd
import glob

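# Step 1: Find all per-run CSV files, skipping the combined output left by an earlier run
# (it also matches the pattern and would otherwise be concatenated into itself)
output_csv = "samsung_sentiment_combined.csv"
csv_files = sorted(f for f in glob.glob("samsung_sentiment_*.csv") if f != output_csv)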

# Step 2: Read and combine all found CSV files
if csv_files:
    combined_df = pd.concat([pd.read_csv(file) for file in csv_files], ignore_index=True)

    # Step 3: Save to new CSV and Excel files
    combined_df.to_csv("samsung_sentiment_combined.csv", index=False)
    combined_df.to_excel("samsung_sentiment_combined.xlsx", index=False)

    print("✅ Files combined and saved as 'samsung_sentiment_combined.csv' and '.xlsx'")
else:
    print("❌ No files found matching 'samsung_sentiment_*.csv'. Please check the filenames and location.")


✅ Files combined and saved as 'samsung_sentiment_combined.csv' and '.xlsx'
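If the search cell is run several times close together, the per-run files may contain the same tweet more than once, and a plain concatenation keeps every copy. The sketch below shows one way to drop repeats, assuming that two rows with the same `Date` and `Tweet` are the same tweet. It uses plain dictionaries so it runs without pandas; `dedupe_rows` is a hypothetical helper, and the sample rows are placeholders rather than real data.

```python
def dedupe_rows(rows, key_fields=("Date", "Tweet")):
    """Keep the first occurrence of each (Date, Tweet) pair, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Placeholder rows standing in for two overlapping per-run files
rows = [
    {"Date": "date-1", "Tweet": "placeholder tweet A", "Sentiment": "Positive"},
    {"Date": "date-2", "Tweet": "placeholder tweet B", "Sentiment": "Neutral"},
    {"Date": "date-1", "Tweet": "placeholder tweet A", "Sentiment": "Positive"},
]
print(len(dedupe_rows(rows)))
```

On the DataFrame itself, `combined_df.drop_duplicates(subset=["Date", "Tweet"])` applies the same rule.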


In [3]:
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Load your existing dataset
df = pd.read_excel('samsung_sentiment_combined.xlsx')

# Initialize the sentiment analyzer
sid = SentimentIntensityAnalyzer()

# Apply VADER compound score to each tweet
df['Score'] = df['Tweet'].apply(lambda x: sid.polarity_scores(str(x))['compound'])

# Save the updated file
df.to_excel('samsung_sentiment_with_scores.xlsx', index=False)

print("Sentiment scores added and saved to 'samsung_sentiment_with_scores.xlsx'")


Sentiment scores added and saved to 'samsung_sentiment_with_scores.xlsx'


In [1]:
import pandas as pd
from collections import Counter
import re
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')

# Load your existing tweets dataset
df = pd.read_csv("samsung_sentiment_combined.csv")  # or your file name

# Build the stop-word set once; set lookups avoid reloading the list for every token
stop_words = set(stopwords.words("english"))

# Function to clean and tokenize
def preprocess(text):
    text = re.sub(r"http\S+|@\S+|#[A-Za-z0-9_]+", "", text)  # Remove URLs, mentions, hashtags
    text = re.sub(r"[^a-zA-Z\s]", "", text)  # Remove non-alphabetic characters
    text = text.lower()
    tokens = text.split()
    # Drop stop words and very short tokens (1-2 characters)
    tokens = [word for word in tokens if word not in stop_words and len(word) > 2]
    return tokens

# Separate by sentiment
data = []
for sentiment in df['Sentiment'].unique():
    subset = df[df['Sentiment'] == sentiment]
    words = []
    for tweet in subset['Tweet']:
        words += preprocess(str(tweet))
    word_freq = Counter(words)
    for word, freq in word_freq.most_common(50):  # top 50 words per sentiment
        data.append({'Word': word, 'Frequency': freq, 'Sentiment': sentiment})

# Create DataFrame
wordcloud_df = pd.DataFrame(data)

# Save to CSV
wordcloud_df.to_csv("wordcloud_by_sentiment.csv", index=False)
print("✅ Word cloud CSV generated.")


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Harsha\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.


✅ Word cloud CSV generated.
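To see what the cleaning step does to a single tweet, here is a small worked example. It reuses the two regular expressions from `preprocess`, but swaps NLTK's English stop-word list for a tiny hand-written set so the snippet runs without any download. That stand-in set and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny stand-in stop-word set (assumption); the notebook uses NLTK's English list
STOP_WORDS = {"the", "and", "for", "this", "with"}


def preprocess(text, stop_words=STOP_WORDS):
    text = re.sub(r"http\S+|@\S+|#[A-Za-z0-9_]+", "", text)  # URLs, mentions, hashtags
    text = re.sub(r"[^a-zA-Z\s]", "", text)                   # anything non-alphabetic
    return [w for w in text.lower().split() if w not in stop_words and len(w) > 2]


# Made-up sample text, not taken from the dataset
sample = "Loving the camera on this phone! @someone #tech https://example.com camera wins"
tokens = preprocess(sample)
print(tokens)                          # ['loving', 'camera', 'phone', 'camera', 'wins']
print(Counter(tokens).most_common(1))  # [('camera', 2)]
```

The mention, hashtag, and URL disappear entirely, "on" is dropped for being shorter than three letters, and the repeated word "camera" ends up at the top of the frequency count.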
