![Twitter Sentiment Analysis](tt4.png)

# Twitter Sentiment Analysis

With the large amount of data generated by users on social networks, social network monitoring techniques have become increasingly relevant. In this context, natural language processing (NLP) techniques have become essential to extract relevant information from this unstructured data.

This project aims to use NLP techniques to analyze sentiments in tweets using Gensim Word2Vec for text vectorization and the logistic regression algorithm from the sklearn library for sentiment classification into negative or positive.

The word2vec algorithm uses a shallow neural network to learn word associations from a large corpus of text. Because the learned vectors capture semantic relationships between words, they can give a classifier more informative features than treating each word as an unrelated token.
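
To make the idea of "semantic relationships" concrete, the sketch below compares word vectors with cosine similarity, a measure commonly used with word2vec embeddings. The three-dimensional vectors are made-up illustrative values, not output from the models trained in this project.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (real word2vec vectors have hundreds of dimensions).
toy_vectors = {
    "happy": [0.9, 0.1, 0.3],
    "glad": [0.8, 0.2, 0.4],
    "broken": [-0.7, 0.6, 0.1],
}

sim_related = cosine_similarity(toy_vectors["happy"], toy_vectors["glad"])
sim_unrelated = cosine_similarity(toy_vectors["happy"], toy_vectors["broken"])

# Vectors pointing in similar directions score closer to 1.
print(sim_related > sim_unrelated)
```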

It is expected that the results obtained can contribute to the understanding of the sentiments expressed by Twitter users regarding certain topics and events.

This project has been divided into three notebooks:<br>
The first notebook concerns data visualization and cleaning.<br>
The second notebook concerns training of the word2vec and logistic regression models.<br>
The third notebook uses the previously trained models for classifying tweets extracted using the Twitter API and the tweepy library.

***

# Classifying tweets using the Twitter API and the Tweepy library

## Importing Libs

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import pickle
import nltk
import unidecode
import numpy as np
import tweepy as tw
import configparser
from wordcloud import WordCloud
from nltk import tokenize
from nltk.stem import PorterStemmer
from string import punctuation
from gensim.models import KeyedVectors

## Loading Trained Models

In [2]:
w2v_cbow = KeyedVectors.load("models/word2vec_cbow.model")

In [3]:
w2v_sg = KeyedVectors.load("models/word2vec_sg.model")

In [4]:
with open("models/lr_cbow.pkl", "rb") as f:
    lr_cbow = pickle.load(f)

In [5]:
with open("models/lr_sg.pkl", "rb") as f:
    lr_sg = pickle.load(f)

## Defining functions to pre-process tweets

In [6]:
stop_words = nltk.corpus.stopwords.words("english")
ws_tokenizer = tokenize.WhitespaceTokenizer()
punc_tokenizer = tokenize.WordPunctTokenizer()
porter = PorterStemmer()
punc_lst = [p for p in punctuation]

In [7]:
def remove_stopwords(tweet):
    # Lowercase each token and drop stop words, links, and HTML entities such as "&amp;".
    ntweet = list()
    words = ws_tokenizer.tokenize(tweet)
    for word in words:
        word = word.lower()
        if word not in stop_words and not word.startswith(("http", "&")):
            ntweet.append(word)
    return ' '.join(ntweet)

In [8]:
stop_words2 = stop_words + punc_lst
def remove_punctuation (tweet):
    ntweet = list()
    
    words = punc_tokenizer.tokenize(tweet)
    for word in words:
        if word not in stop_words2:
            ntweet.append(word)
    return ' '.join(word for word in ntweet)

In [9]:
stop_words3 = [unidecode.unidecode(text) for text in stop_words2]
def remove_accentuation (tweet):
    tweet = unidecode.unidecode(tweet)
    
    ntweet = list()
    words = punc_tokenizer.tokenize(tweet)
    for word in words:
        if word not in stop_words3:
            ntweet.append(word)
    return ' '.join(word for word in ntweet)

In [10]:
def stemming (tweet):
    ntweet = list()
    words = punc_tokenizer.tokenize(tweet)
    for word in words:
        if word not in stop_words3:
            ntweet.append(porter.stem(word))
    return ' '.join(word for word in ntweet)

In [11]:
def only_alpha(tweet):
    ntweet = list()
    words = ws_tokenizer.tokenize(tweet)
    for word in words:
        if word.isalpha() and word!='quot':
            ntweet.append(word)
    return ' '.join(word for word in ntweet)

In [12]:
def preprocess_tweet(tweet):
    tweet = remove_stopwords(tweet)
    tweet = remove_punctuation(tweet)
    tweet = remove_accentuation(tweet)
    tweet = stemming(tweet)
    tweet = only_alpha(tweet)
    return tweet

### Creating function to vectorize tweets

In [13]:
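def vectorize(tweet, model):
    # Represent a tweet as the sum of the vectors of its in-vocabulary words.
    # Start from a zero vector with the model's dimensionality.
    vector = np.zeros(model.wv.vector_size)
    
    words = ws_tokenizer.tokenize(tweet)
    for word in words:
        # Words the model has never seen are skipped.
        if word in model.wv:
            vector += model.wv.get_vector(word)
    return vector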
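
The function above represents a tweet as the sum of the vectors of its known words and skips words missing from the vocabulary. The sketch below reproduces that logic with a hypothetical two-dimensional lookup table (`toy_embeddings`) in place of the trained model, so the behaviour can be checked by hand.

```python
# Hypothetical stand-in for the trained model's vocabulary and vectors.
toy_embeddings = {
    "good": [1.0, 0.5],
    "day": [0.25, 0.25],
}

def vectorize_toy(tweet, embeddings, size=2):
    """Sum the vectors of known words; unknown words contribute nothing."""
    vector = [0.0] * size
    for word in tweet.split():
        if word in embeddings:
            vector = [v + e for v, e in zip(vector, embeddings[word])]
    return vector

print(vectorize_toy("good day unknownword", toy_embeddings))  # [1.25, 0.75]
print(vectorize_toy("unknownword", toy_embeddings))           # [0.0, 0.0]
```

A tweet whose words are all out of vocabulary maps to the zero vector, so a logistic regression prediction for it depends only on the model's intercept.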

### Creating and testing predict function

In [59]:
def predict(tweet, vec_model, lr_model):
    original_tweet = tweet
    tweet = preprocess_tweet(tweet)
    vector = vectorize(tweet, vec_model)
    
    result = lr_model.predict(vector.reshape(1, -1))
    
    # Label 1 maps to positive; anything else (label 0) is treated as negative.
    sentiment = 'Positive' if result[0] == 1 else 'Negative'
        
    return('"' + original_tweet + '" is a ' + sentiment + ' tweet.')
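
For context on what `lr_model.predict` does with the tweet vector: a binary logistic regression computes a weighted sum of the features plus an intercept, passes it through the sigmoid function, and predicts the positive class when the resulting probability exceeds 0.5. The sketch below illustrates that decision rule with made-up parameters; `toy_weights` and `toy_bias` are hypothetical and unrelated to the trained `lr_cbow` and `lr_sg` models.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(vector, weights, bias):
    """Return (label, probability of the positive class) for one feature vector."""
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    probability = sigmoid(score)
    label = 1 if probability > 0.5 else 0
    return label, probability

# Hypothetical parameters for a two-feature model.
toy_weights = [2.0, -1.0]
toy_bias = -0.5

label, probability = predict_label([1.0, 0.5], toy_weights, toy_bias)
print(label, round(probability, 3))
```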

In [61]:
tweet = "My dog died"
print(predict(tweet,w2v_cbow, lr_cbow))

tweet = "I've found 10 bucks today"
print(predict(tweet,w2v_cbow, lr_cbow))

tweet = "feeling tired"
print(predict(tweet,w2v_cbow, lr_cbow))

tweet = "i can't wait for the weekend!!! it's gonna be awesome"
print(predict(tweet,w2v_cbow, lr_cbow))

tweet = "i'm excited about college this year"
print(predict(tweet,w2v_cbow, lr_cbow))

tweet = "I procrastinated a lot, the exams start next week and I don't know anything"
print(predict(tweet,w2v_cbow, lr_cbow))

tweet = "I failed the test today"
print(predict(tweet,w2v_cbow, lr_cbow))

"My dog died" is a Negative tweet.
"I've found 10 bucks today" is a Positive tweet.
"feeling tired" is a Negative tweet.
"i can't wait for the weekend!!! it's gonna be awesome" is a Positive tweet.
"i'm excited about college this year" is a Positive tweet.
"I procrastinated a lot, the exams start next week and I don't know anything" is a Negative tweet.
"I failed the test today" is a Negative tweet.


### Authenticating with the Twitter API and extracting tweets using Tweepy

In [16]:
api_key = 
api_secret = 
bearer_token = 
access_token = 
access_token_secret = 
client_id = 
client_secret = 
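
Rather than pasting credentials into the notebook, they can be read from a local file with `configparser`, which is already imported above. The sketch below assumes a hypothetical `config.ini` with a `[twitter]` section; the file name, section name, and key names are illustrative choices, not something defined by this project or by Tweepy.

```python
import configparser

def load_twitter_credentials(path="config.ini", section="twitter"):
    """Read API credentials from an INI file and return them as a dict."""
    # interpolation=None keeps "%" characters in secrets from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(path):
        raise FileNotFoundError(f"Could not read config file: {path}")
    return dict(config[section])

# Example usage (assumes config.ini exists next to the notebook):
# creds = load_twitter_credentials()
# api_key = creds["api_key"]
# api_secret = creds["api_secret"]
# access_token = creds["access_token"]
# access_token_secret = creds["access_token_secret"]
```

Keeping such a file out of version control (for example via `.gitignore`) avoids publishing the keys together with the notebook.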


In [52]:
authenticator = tw.OAuthHandler(api_key, api_secret)
authenticator.set_access_token(access_token, access_token_secret)

api = tw.API(authenticator, wait_on_rate_limit=True)

# Search phrase; "-filter:retweets" excludes retweets from the results.
query = "I am"
search_term = f'{query} -filter:retweets'

tweet_cursor = tw.Cursor(api.search_tweets, q=search_term, lang="en",
                         tweet_mode="extended").items(100)

tweets = [tweet.full_text for tweet in tweet_cursor]
print(tweets)

["@uptownsaul Well if it's dishonest then why is XRP available to buy on exchanges all over the world....it's only in the US that SOME of the exchanges delisted XRP..\nIf it's not meant for retail then explain why I am able to buy $100k of XRP daily....", "@TransRightsSlay @jay59834501 @sloppyrl @MrAndyNgo @elonmusk I am exposed to compelled speech and pressured into participating in gender identity ideology against my own beliefs. That doesn't make me feel great at all? But you don't care about that, as long as the respect is all one way in your direction.", '@bostongeorgeohh I have a pair of Armada invictus 89 but I am thinking of getting a wider ski for powder days out west and the rare east coast powder day.  Looking at the Ripstick 106. Either the black or non black.  Thoughts?', 'im still trying, struggling to recover everything inside and out. yes i am happy of what i am doing right now but this is what you wanted right? don’t play victim terpaling tersakiti when you’re the one 

In [68]:
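import random
# Pick a random starting index and keep 10 consecutive tweets for testing.
# A separate name avoids shadowing the random module.
start = random.randint(0, 90)
print(start)
test_tweets = tweets[start:start+10]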

45
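
The slice above keeps ten consecutive tweets starting at a random index. If an independent random subset is preferred, `random.sample` draws items without replacement; the sketch below uses a hypothetical placeholder list (`example_tweets`) in place of the API results.

```python
import random

def sample_tweets(tweets, k=10, seed=None):
    """Return up to k tweets chosen at random without replacement."""
    rng = random.Random(seed)  # passing a seed makes the selection reproducible
    return rng.sample(tweets, min(k, len(tweets)))

# Placeholder data standing in for the list returned by the Twitter API.
example_tweets = [f"tweet {i}" for i in range(100)]
subset = sample_tweets(example_tweets, k=10, seed=42)
print(len(subset))
```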


In [69]:
for i in test_tweets:
    print(predict(i,w2v_cbow, lr_cbow) + '\n')

"I’m proud to announce that I - Venusian Prince - did in fact study for his 10 am exam that he will be taking tomorrow. 😌 https://t.co/7axPStvj5M" is a Positive tweet.

"@QueenOfDiagolon Pod bean, and listen in the truck first thing in the am, it gives me that jump start I need in the morning!" is a Positive tweet.

"@POTUS You are obviously affraid it will be used against you.....I love your support of the seond ammendment.....BTW, when they are banned, you will have no Milatary, but I am sure that does not concern you.  Why not just give more weapons to overseas countries, that will help solve it." is a Positive tweet.

"Who pro am team should I join ?" is a Positive tweet.

"I have accepted the fact that I am not religious.
I have accepted the fact that I have much to learn.
I have accepted the fact that life fucks all.
I have accepted the fact that I am going to hell. 
I have accepted the fact that people are the worst." is a Negative tweet.

"@kxpture I am King 👑" is a Positive tw