<a href="https://colab.research.google.com/github/joshuacalloway/dsc540groupproject/blob/main/StartingTrumpTweets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Natural Language Processing on Trump's Tweets
- Joshua Calloway
- DSC 540, Fall Quarter - DePaul


# Summary of Project
We look at Trump's tweets and apply NLP to see whether we can build a classifier that predicts the sentiment of new tweets.
Trump's tweets are widely available, and we can run the project on either a small set of 1000 tweets or a larger set of 55,000 tweets.  All of these tweets are available from thetrumparchive and the @realDonaldTrump Twitter handle.

I selected [Azure Predictive Analysis](https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/) to create the **ground truth** sentiment labels.  Azure produces the labels **neutral, positive, negative**, plus **mixed**, which we treat as neutral.

We then split the tweets into training and validation data (90 percent of the data) and test data (10 percent of the data). With the training data, we apply the following methods:

- [Tweepy](https://www.tweepy.org/) and [TextBlob](https://textblob.readthedocs.io/en/dev/) for analyzing sentiment
- Tuning of Tweepy/TextBlob by trying different neutral cutoff values to distinguish a tweet as neutral
- [LogisticRegression](https://www.twilio.com/blog/2017/12/sentiment-analysis-scikit-learn.html) using the Tweepy/TextBlob labels to train the model
- [NLTK toolkit and NaiveBayesClassifier](https://www.twilio.com/blog/2017/12/sentiment-analysis-scikit-learn.html) using the Tweepy/TextBlob labels to train the model
- [FastText](https://fasttext.cc) with single-label sentiment using the Tweepy/TextBlob labels to train the model
    - FastText with hyper-parameter tuning of epochs, learning rate, ngrams using the [fasttext autotune](https://fasttext.cc/docs/en/autotune.html) against the validation file
    
## The Findings

|Method| Accuracy | Notes |
|----------|--------------|---|
|Tweepy/TextBlob| 0.6| slightly better than a coin flip |
|Tweepy/TextBlob with optimal neutralCutoff at 0.22 | 0.67 | some improvement with tuning |
|Logistic Regression trained by Tweepy/TextBlob| 0.46| terrible, worse than a coin flip |
|NLTK Toolkit and NaiveBayes classifier trained with Tweepy/TextBlob| 0.45 | again, terrible (0.45 is measured against the TextBlob labels; against the Azure ground truth the score is 0.51) |
|FastText trained with Tweepy/TextBlob | 0.54 | OK with default hyper-parameters |
|FastText hyper-parameter tuned trained with Tweepy/TextBlob | 0.56 | slightly better with autotuned hyper-parameters |
  


# The Data, Trump Tweets ( either 1000 or 55,000 tweets )
Here we use thetrumparchive to fetch either 1000 tweets or a larger set of 55,000 tweets.  The tweets come back as JSON in the following format:
<code>
{
  id: 1,
  text: 'Lets win Michigan',
  isRetweet: True,
  isDeleted: False,
  device: iPhone,
  favorites: 323,
  retweets: 2,
  date: 2020-11-02
}
</code>

In [None]:
# azure, nltk and fasttext libraries
!pip install -r otherreq.txt


In [3]:
import urllib.request, json
from sklearn.model_selection import train_test_split
from pandas import DataFrame

In [4]:
# If largeData is True, we load the larger set of 55,000 tweets from a local file;
# otherwise we fetch the smaller set from thetrumparchive
def fetch_data(largeData=False):
    if largeData:
        with open('tweets_11-06-2020.json') as f:
            data = json.load(f)  # load 55,000 tweets
    else:
        with urllib.request.urlopen("https://www.thetrumparchive.com/latest-tweets") as url:
            data = json.loads(url.read().decode())
    return DataFrame(data)


### We can fetch either 1000 or 55,000 tweets by switching the largeData flag

In [5]:
# We are interested in the text column for NLP
data = fetch_data(largeData=False)
data.head()

Unnamed: 0,id,text,isRetweet,isDeleted,device,favorites,retweets,date
0,1329963571250335744,https://t.co/YHscjY6G8t,False,False,Twitter for iPhone,105034,29537,2020-11-21T01:43:19.000Z
1,1329963296854847492,https://t.co/OLZnCJq93Y,False,False,Twitter for iPhone,78420,20742,2020-11-21T01:42:14.000Z
2,1329963239170564098,https://t.co/cwOQLhQNFq,False,False,Twitter for iPhone,126359,32853,2020-11-21T01:42:00.000Z
3,1329871920607744001,RT @WhiteHouse: LIVE: President @realDonaldTru...,True,False,Twitter for iPhone,0,18564,2020-11-20T19:39:08.000Z
4,1329871776889925636,"...Why won’t they do it, and why are they so f...",False,False,Twitter for iPhone,152441,25486,2020-11-20T19:38:34.000Z


# We split the data into training, validation and test


In [6]:
# Create a placeholder sentiment column; the real labels are computed later
data['sentiment'] = 'unknown'

In [7]:
from sklearn.model_selection import train_test_split

X = data['text']
y = data['sentiment']

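# Hold out 10% of the tweets as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=555)
# Split the remaining 90% into 70% training and 30% validation
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.30, random_state=555)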
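
The two chained calls give roughly a 63/27/10 split of the full data set. A minimal sketch of the arithmetic, assuming scikit-learn's rule that a fractional `test_size` is rounded up to a whole number of rows (the helper name `split_sizes` is hypothetical):

```python
import math

def split_sizes(n_samples, test_size, val_size):
    """Return (train, validation, test) sizes for two chained splits."""
    n_test = math.ceil(test_size * n_samples)   # first split: hold out the test set
    n_remaining = n_samples - n_test
    n_val = math.ceil(val_size * n_remaining)   # second split: validation from the rest
    n_train = n_remaining - n_val
    return n_train, n_val, n_test

print(split_sizes(1000, 0.10, 0.30))  # (630, 270, 100)
```

These sizes match the lengths of `X_train`, `X_val` and `X_test` reported further down in the notebook.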

# We use Azure Predictive Analysis to create the ground truth sentiment

In [8]:
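import os
import numpy as np

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Read the key from the environment instead of hard-coding it in the notebook.
# AZURE_TEXT_ANALYTICS_KEY is an assumed variable name; set it before running.
credential = AzureKeyCredential(os.environ["AZURE_TEXT_ANALYTICS_KEY"])
text_analytics_client = TextAnalyticsClient(endpoint="https://trumptweetanalysissentiment.cognitiveservices.azure.com/", credential=credential)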
    
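# Send one batch of at most ten texts to Azure and return the texts with their sentiments
def call_azure(list_text_only_ten_items):
    response = text_analytics_client.analyze_sentiment(list_text_only_ten_items)
    # Documents that come back with an error are dropped, so the list of
    # sentiments can be shorter than the list of texts
    successful_responses = [doc for doc in response if not doc.is_error]
    return list_text_only_ten_items, [doc.sentiment for doc in successful_responses]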
    
# treat mixed and neutral sentiment as neutral
def combine_mixed_neutral(sentiments):
    converted = []
    for item in sentiments:
        newitem = 'neutral' if item == 'mixed' else item
        converted.append(newitem)
    return converted
           
# This is the ground truth, using Azure sentiment
def calculate_groundtruth_sentiment(list_of_texts):
    texts = list_of_texts.tolist()
    # Slice into batches of at most ten; unlike np.split, this also works when
    # the number of texts is not a multiple of ten
    sublists = [texts[i:i + 10] for i in range(0, len(texts), 10)]
    retvalues = [call_azure(ls) for ls in sublists]
    # Flatten the per-batch sentiment lists into one list
    sentiments = [item for _, items in retvalues for item in items]
    return list_of_texts, combine_mixed_neutral(sentiments)
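
The batching and label-merging steps can be exercised without calling Azure. A minimal sketch, where `fake_sentiment_service` is a hypothetical stand-in for the remote call (it is not the Azure API) and the rules inside it are made up:

```python
def batches_of(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fake_sentiment_service(texts):
    """Hypothetical stand-in for the remote call: one label per text."""
    labels = []
    for text in texts:
        if "!" in text and "?" in text:
            labels.append("mixed")
        elif "!" in text:
            labels.append("positive")
        else:
            labels.append("neutral")
    return labels

def label_texts(texts):
    labels = []
    for batch in batches_of(texts):
        labels.extend(fake_sentiment_service(batch))
    # Fold 'mixed' into 'neutral', as combine_mixed_neutral does
    return ["neutral" if label == "mixed" else label for label in labels]

texts = [f"tweet {i}" for i in range(23)] + ["Great!", "Really?!"]
labels = label_texts(texts)
print([len(b) for b in batches_of(texts)])  # [10, 10, 5]
print(labels[-2:])                          # ['positive', 'neutral']
```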

In [9]:
X_test, Y_test_groundtruth = calculate_groundtruth_sentiment(X_test)


# Now that we have ground-truth sentiment labels on the test tweets, we will try various methods and evaluate them on the test data.

# 1. Let's try tweepy and TextBlob to add Sentiment to each Tweet

Two blogs that use tweepy and TextBlob can be found at 
- https://www.earthdatascience.org/courses/use-data-open-source-python/intro-to-apis/analyze-tweet-sentiment-in-python/
- https://medium.com/better-programming/twitter-sentiment-analysis-15d8892c0082



# C. We compute Sentiment of Tweet using TextBlob
- polarity indicates whether the tweet is positive, negative, or neutral ( scaled from -1, most negative, to 1, most positive; values near 0 are neutral )

In [10]:
# We are going to use tweepy and TextBlob for tweets
import tweepy as tw
from textblob import TextBlob

# Create a function to get the subjectivity
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

# Create a function to get the polarity
def getPolarity(text):
    return  TextBlob(text).sentiment.polarity

# We eliminate words shorter than 3 characters and convert all words to lowercase
def filter_words(words):
    words_filtered = [e.lower() for e in words.split() if len(e) >= 3]
    return words_filtered


# Return 'neutral' when the absolute polarity is below neutralCutoff
def calculateSentiment(text, neutralCutoff = 0.05):
    polarity = getPolarity(text)
    if abs(polarity) < neutralCutoff:
        return 'neutral'
    if polarity > 0:
        return "positive"
    else:
        return "negative"
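
To see how the cutoff turns a polarity score into a label, the decision rule in `calculateSentiment` can be isolated from TextBlob. A minimal sketch on hand-picked polarity values (the function name `label_from_polarity` is hypothetical):

```python
def label_from_polarity(polarity, neutral_cutoff=0.05):
    """Same decision rule as calculateSentiment, applied to a precomputed polarity."""
    if abs(polarity) < neutral_cutoff:
        return "neutral"
    return "positive" if polarity > 0 else "negative"

# Compare the default cutoff with a wider neutral band of 0.22
for polarity in (-0.5, -0.03, 0.0, 0.1, 0.8):
    print(polarity, label_from_polarity(polarity), label_from_polarity(polarity, 0.22))
```

One edge case worth knowing: with a cutoff of 0.0, a polarity of exactly 0.0 falls through to "negative", because `abs(0.0) < 0.0` is false.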
    

In [11]:
y_test = X_test.apply(calculateSentiment)

In [12]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print(f'accuracy_score is {accuracy_score(Y_test_groundtruth, y_test)}')


accuracy_score is 0.6


# Here we get 0.6 accuracy with a guessed neutralCutoff of 0.05.  Let's see if we can do better by trying different neutralCutoff values

In [13]:
from numpy import arange

bestCutoff = 0.05
bestAccuracy = 0.6

for i in arange(0.0, 0.5, 0.02):
    y_test_i = X_test.apply(calculateSentiment, neutralCutoff=i)
    accuracy_i = accuracy_score(Y_test_groundtruth, y_test_i)
    if accuracy_i > bestAccuracy:
        bestAccuracy = accuracy_i
        bestCutoff = i

print(f'bestAccuracy is at {bestAccuracy} with neutralCutoff at {bestCutoff}')


bestAccuracy is at 0.67 with neutralCutoff at 0.22
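
The sweep picks the cutoff using the same test tweets that are later used to report accuracy, which tends to give an optimistic score. A minimal sketch of the same search run on a separate validation set instead, using made-up polarity scores and labels (the data and the names `best_cutoff`, `val_polarities` and `val_labels` are hypothetical):

```python
def label_from_polarity(polarity, neutral_cutoff):
    if abs(polarity) < neutral_cutoff:
        return "neutral"
    return "positive" if polarity > 0 else "negative"

def accuracy(truth, predicted):
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)

def best_cutoff(polarities, labels, cutoffs):
    """Return (cutoff, accuracy) with the highest accuracy; ties keep the smaller cutoff."""
    best = (None, -1.0)
    for cutoff in cutoffs:
        predicted = [label_from_polarity(p, cutoff) for p in polarities]
        score = accuracy(labels, predicted)
        if score > best[1]:
            best = (cutoff, score)
    return best

# Made-up validation data: polarity scores and their reference labels
val_polarities = [0.6, 0.15, -0.1, 0.05, -0.4, 0.3]
val_labels = ["positive", "neutral", "neutral", "neutral", "negative", "positive"]

# Integer steps avoid accumulated floating-point error: 0.00, 0.02, ..., 0.48
cutoffs = [step / 100 for step in range(0, 50, 2)]
print(best_cutoff(val_polarities, val_labels, cutoffs))  # (0.16, 1.0)
```

The chosen cutoff would then be applied once to the test tweets to report the final accuracy.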


In [14]:
# let's recalculate the y_test with best neutralCutoff at 0.22
y_test = X_test.apply(calculateSentiment, neutralCutoff=0.22)

In [15]:
print(f'confusion matrix is \n{confusion_matrix(Y_test_groundtruth, y_test)}')

confusion matrix is 
[[ 4 15  2]
 [ 2 48  4]
 [ 0 10 15]]


In [16]:
print(f'classification report is \n{classification_report(Y_test_groundtruth, y_test)}')

classification report is 
              precision    recall  f1-score   support

    negative       0.67      0.19      0.30        21
     neutral       0.66      0.89      0.76        54
    positive       0.71      0.60      0.65        25

    accuracy                           0.67       100
   macro avg       0.68      0.56      0.57       100
weighted avg       0.67      0.67      0.63       100
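
The figures in the classification report follow directly from the confusion matrix, whose rows are the ground-truth classes and whose columns are the predicted classes, in the order negative, neutral, positive. A short worked check in plain Python:

```python
labels = ["negative", "neutral", "positive"]
matrix = [
    [4, 15, 2],   # ground truth: negative
    [2, 48, 4],   # ground truth: neutral
    [0, 10, 15],  # ground truth: positive
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
print(f"accuracy = {correct}/{total} = {correct / total:.2f}")

for i, label in enumerate(labels):
    support = sum(matrix[i])                    # row sum: tweets truly in this class
    predicted = sum(row[i] for row in matrix)   # column sum: tweets predicted as this class
    recall = matrix[i][i] / support
    precision = matrix[i][i] / predicted
    print(f"{label:>8}: precision={precision:.2f} recall={recall:.2f} support={support}")
```

Only 4 of the 21 negative tweets are recognised as negative; 15 of them are labelled neutral, which is where most of the lost accuracy comes from.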



# 2. Let's try LogisticRegression

In [17]:
import pandas as pd

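# Label every tweet with TextBlob (tuned cutoff); these labels, not the Azure ground truth, are the training targets
y = np.array([calculateSentiment(xi, neutralCutoff=0.22) for xi in X])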

In [18]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    analyzer = 'word',
    lowercase = False,
)
features = vectorizer.fit_transform(
    data['text']
)
features_nd = features.toarray() # for easy usage

In [19]:
# https://www.twilio.com/blog/2017/12/sentiment-analysis-scikit-learn.html
from sklearn.linear_model import LogisticRegression
log_model = LogisticRegression()

In [20]:
from sklearn.model_selection import train_test_split

X_train_logistic, X_test_logistic, y_train_logistic, y_test_logistic  = train_test_split(
        features_nd, 
        y,
        train_size=0.90, 
        random_state=1234)

In [21]:
log_model = log_model.fit(X=X_train_logistic, y=y_train_logistic)

In [22]:
y_pred = log_model.predict(X_test_logistic)

In [23]:
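# Caveat: y_pred covers the rows of X_test_logistic (random_state=1234), which are in
# general not the same tweets as X_test / Y_test_groundtruth (random_state=555)
print(f'accuracy_score is {accuracy_score(Y_test_groundtruth, y_pred)}')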


accuracy_score is 0.46
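
This score should be read with care, because the predictions and the ground truth come from two differently seeded splits. A minimal sketch of why two seeds give different hold-out sets, using Python's `random` module as a stand-in for scikit-learn's shuffling (the helper `holdout_indices` is hypothetical and does not reproduce scikit-learn's exact permutation):

```python
import random

def holdout_indices(n_samples, n_test, seed):
    """Shuffle row indices with the given seed and hold out the first n_test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return set(indices[:n_test])

test_a = holdout_indices(1000, 100, seed=555)
test_b = holdout_indices(1000, 100, seed=1234)

print(len(test_a & test_b), "of 100 held-out rows are shared")
```

Scoring against the ground truth is only meaningful when the predictions are made for the same rows, for example by transforming `X_test` with the fitted `vectorizer` and predicting on that.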


In [24]:
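# Note: y_test holds the TextBlob labels for X_test, which are different tweets from those behind y_pred
print(classification_report(y_test, y_pred))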

              precision    recall  f1-score   support

    negative       0.00      0.00      0.00         6
     neutral       0.74      0.86      0.80        73
    positive       0.18      0.10      0.12        21

    accuracy                           0.65       100
   macro avg       0.31      0.32      0.31       100
weighted avg       0.58      0.65      0.61       100



# 3. We can try the NLTK toolkit

In [25]:
import nltk
nltk.download('punkt')

def format_sentence(sent):
    return({word: True for word in nltk.word_tokenize(sent)})

print(nltk.word_tokenize("The cat is very cute"))

def format_sentence_with_sentiment(sent):
    formatted = format_sentence(sent)
    sentiment = calculateSentiment(sent, neutralCutoff = 0.22)
    return [formatted, sentiment]

['The', 'cat', 'is', 'very', 'cute']


In [26]:
format_sentence_with_sentiment("Stars are great!")

[{'Stars': True, 'are': True, 'great': True, '!': True}, 'positive']

In [27]:
format_sentence_with_sentiment("Rats smell very bad")

[{'Rats': True, 'smell': True, 'very': True, 'bad': True}, 'negative']

In [28]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /Users/jc487/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [29]:
X

0                                https://t.co/YHscjY6G8t
1                                https://t.co/OLZnCJq93Y
2                                https://t.co/cwOQLhQNFq
3      RT @WhiteHouse: LIVE: President @realDonaldTru...
4      ...Why won’t they do it, and why are they so f...
                             ...                        
995    RT @chefjclark: Amen ⁦@bkirkland7⁩! https://t....
996      https://t.co/gsFSghkmdM https://t.co/zNoPFsTnn3
997    Big problems and discrepancies with Mail In Ba...
998    RT @realDonaldTrump: The Fake News Media is ri...
999    RT @realDonaldTrump: Joe Biden called me Georg...
Name: text, Length: 1000, dtype: object

In [30]:
X_nltk = X.apply(format_sentence_with_sentiment)

In [31]:
X_nltk.head()

0    [{'https': True, ':': True, '//t.co/YHscjY6G8t...
1    [{'https': True, ':': True, '//t.co/OLZnCJq93Y...
2    [{'https': True, ':': True, '//t.co/cwOQLhQNFq...
3    [{'RT': True, '@': True, 'WhiteHouse': True, '...
4    [{'...': True, 'Why': True, 'won': True, '’': ...
Name: text, dtype: object

In [32]:
# Note: the test slice starts at the 10% mark, so it overlaps most of the training slice
training = X_nltk[:int((.9)*len(X_nltk))]
test =  X_nltk[int((.1)*len( X_nltk)):] 
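
The overlap between the two slices is easy to quantify. A quick check of the index ranges, together with a disjoint alternative (a sketch on a plain list of row numbers, not the notebook's data):

```python
rows = list(range(1000))          # stand-in for the 1000 labelled tweets
cut_90 = int(0.9 * len(rows))     # 900
cut_10 = int(0.1 * len(rows))     # 100

training = rows[:cut_90]          # rows 0..899
test_as_written = rows[cut_10:]   # rows 100..999
test_disjoint = rows[cut_90:]     # rows 900..999

print(len(test_as_written), len(set(training) & set(test_as_written)))  # 900 800
print(len(test_disjoint), len(set(training) & set(test_disjoint)))      # 100 0
```

With the slices as written, 800 of the 900 evaluated rows were also used for training, so the NLTK accuracy printed from `test` is not a clean hold-out score.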


In [33]:
from nltk.classify import NaiveBayesClassifier

classifier = NaiveBayesClassifier.train(training)
classifier.show_most_informative_features()

Most Informative Features
                     WIN = True           positi : neutra =     28.6 : 1.0
                    Fake = True           negati : positi =     26.1 : 1.0
                 Corrupt = True           negati : neutra =     25.0 : 1.0
                decision = True           negati : neutra =     25.0 : 1.0
                  things = True           negati : neutra =     25.0 : 1.0
                   Great = True           positi : neutra =     24.4 : 1.0
                   place = True           negati : neutra =     19.3 : 1.0
                       * = True           negati : neutra =     17.9 : 1.0
                  Andrew = True           negati : neutra =     17.9 : 1.0
                     FOR = True           negati : neutra =     17.9 : 1.0


In [34]:
example1 = "America is great"

print(classifier.classify(format_sentence(example1)))

positive


In [35]:
example1 = "Mail in ballots are fraudelent"

print(classifier.classify(format_sentence(example1)))

negative


In [36]:
from nltk.classify.util import accuracy
print(accuracy(classifier, test))

0.45555555555555555


## Let's see the accuracy vs Ground Truth

In [37]:

def nltk_predict(sent):
    return classifier.classify(format_sentence(sent))

y_pred_nltk = X_test.apply(nltk_predict)

In [38]:
print(f'accuracy_score is {accuracy_score(Y_test_groundtruth, y_pred_nltk)}')


accuracy_score is 0.51


In [39]:
print(classification_report(Y_test_groundtruth, y_pred_nltk))


              precision    recall  f1-score   support

    negative       0.28      0.81      0.42        21
     neutral       0.92      0.41      0.56        54
    positive       0.75      0.48      0.59        25

    accuracy                           0.51       100
   macro avg       0.65      0.57      0.52       100
weighted avg       0.74      0.51      0.54       100



# 4. We train a Facebook FastText model to build a classifier

In [40]:
import fasttext

In [52]:
def create_fasttext_record(sent):
    sentiment = calculateSentiment(sent, neutralCutoff = 0.22)
    return f'__label__{sentiment} {sent}'

def write_fasttext_sentiment_to_file(sent, fileName):
    record = create_fasttext_record(sent)
    # Append one record per call; the with-block closes the file even on error
    with open(fileName, "a+") as file:
        file.write(f'{record}\n')
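
fastText reads one labelled example per line, so a tweet that contains a line break ends up split across several lines of the training file. A minimal sketch of a record builder that flattens line breaks first; the sentiment label is passed in directly so the example runs without TextBlob, and `make_fasttext_record` is a hypothetical name:

```python
def make_fasttext_record(text, sentiment):
    """Build one '__label__<sentiment> <text>' line with no embedded line breaks."""
    single_line = " ".join(text.split())   # collapse newlines and repeated whitespace
    return f"__label__{sentiment} {single_line}"

record = make_fasttext_record("We will WIN!!\n\nThank you Michigan", "positive")
print(record)  # __label__positive We will WIN!! Thank you Michigan
```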


In [64]:
from myutils import clean_file


In [66]:
trainFile = "trumpsentiment.train"

clean_file(trainFile)
X_train.apply(write_fasttext_sentiment_to_file, fileName=trainFile)

96     None
471    None
312    None
449    None
882    None
       ... 
759    None
861    None
42     None
555    None
367    None
Name: text, Length: 630, dtype: object

In [67]:
validationFile = "trumpsentiment.valid"
clean_file(validationFile)
X_val.apply(write_fasttext_sentiment_to_file, fileName=validationFile)

875    None
377    None
412    None
274    None
763    None
       ... 
346    None
282    None
323    None
803    None
58     None
Name: text, Length: 270, dtype: object

In [75]:
testFile = "trumpsentiment.test"
clean_file(testFile)
X_test.apply(write_fasttext_sentiment_to_file, fileName=testFile)

357    None
81     None
436    None
657    None
654    None
       ... 
742    None
34     None
132    None
530    None
886    None
Name: text, Length: 100, dtype: object

In [68]:
len(X_train)

630

In [69]:
model = fasttext.train_supervised(input=trainFile)

In [72]:
model.predict("Trump will is awesome")

(('__label__neutral',), array([0.90248114]))

In [73]:
 model.test(validationFile)

(270, 0.6666666666666666, 0.6666666666666666)

In [76]:
 model.test(testFile)

(100, 0.73, 0.73)

## Performance on validation with 270 records: precision is 0.67 and recall is 0.67
## Performance on the test with 100 records: precision is 0.73 and recall is 0.73
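
`model.test` returns the number of examples followed by precision and recall at one. With exactly one label and one prediction per example, both reduce to the fraction of examples whose predicted label matches the label in the file, which here is the TextBlob label rather than the Azure ground truth. A minimal sketch of that computation on made-up labels (the function name is hypothetical):

```python
def precision_recall_at_1(true_labels, predicted_labels):
    """Precision@1 and recall@1 when every example has exactly one true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    precision = hits / len(predicted_labels)  # one prediction per example
    recall = hits / len(true_labels)          # one true label per example
    return len(true_labels), precision, recall

file_labels = ["neutral", "neutral", "positive", "negative"]   # made-up labels from the file
predictions = ["neutral", "neutral", "neutral", "negative"]    # made-up model output
print(precision_recall_at_1(file_labels, predictions))  # (4, 0.75, 0.75)
```

This is why precision and recall are always equal in these outputs, and why 0.73 on the test file is not directly comparable with accuracies measured against the Azure labels.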


## Hmm, let's try to do better with hyperparameter tuning.  FastText has various hyperparameters such as epochs, learning rate, and ngrams.  We can tune these by trial and error, or we can try the autotune feature.


In [74]:
model_autotuned = fasttext.train_supervised(input=trainFile, autotuneValidationFile=validationFile)

In [77]:
model_autotuned.test(validationFile)

(270, 0.7518518518518519, 0.7518518518518519)

In [78]:
model_autotuned.test(testFile)

(100, 0.73, 0.73)

## So even though we got better performance on the validation file with precision/recall at 0.75, the performance was the same on the testFile at 0.73.

## Let's now calculate the accuracy against the test data so that we can compare against previous models

In [94]:
def fasttext_predict(sent, model):
    sent_without_linebreaks = sent.replace("\n", ".  ")
    print(f'sent_without_linebreaks is {sent_without_linebreaks}')
    sentiment_label = model.predict(sent_without_linebreaks)[0]
    fasttext_dict = {
      "__label__neutral": "neutral",
      "__label__negative": "negative",
      "__label__positive": "positive"
    }
    return fasttext_dict[sentiment_label[0]]

fasttext_predict("hello today is sunny", model)

sent_without_linebreaks is hello today is sunny


'neutral'

In [95]:
X_test

357    Pennsylvania Party Leadership votes are this w...
81     RT @Jim_Jordan: In Oregon, you can be jailed f...
436    RT @paulsperry_: ‘Blindsided’: GOP elections b...
657    RT @GOP: The GREAT @CoachLouHoltz88 joined @re...
654    Our numbers are looking VERY good all over. Sl...
                             ...                        
742    President Obama used to say that “if you wante...
34     “Evidence of voter fraud continues to grow, in...
132                      No way! https://t.co/Dwvb57mgMz
530                              https://t.co/JXkiePOLMb
886                              https://t.co/xgVRpP2Tjc
Name: text, Length: 100, dtype: object

In [96]:
y_pred_model = X_test.apply(fasttext_predict, model=model)

sent_without_linebreaks is Pennsylvania Party Leadership votes are this week. I hope they pick very tough and smart fighters. We will WIN!!
sent_without_linebreaks is RT @Jim_Jordan: In Oregon, you can be jailed for having too many people over for Thanksgiving. .  .  But if you want to riot and loot in Portla…
sent_without_linebreaks is RT @paulsperry_: ‘Blindsided’: GOP elections board members in NC resign in protest over irregular absentee ballots.  .  Democrats insisted on l…
sent_without_linebreaks is RT @GOP: The GREAT @CoachLouHoltz88 joined @realDonaldTrump on stage tonight in Butler, PA! https://t.co/M09iebjZBu
sent_without_linebreaks is Our numbers are looking VERY good all over. Sleepy Joe is already beginning to pull out of certain states. The Radical Left is going down!
sent_without_linebreaks is I LOVE TEXAS! https://t.co/EP7P3AvE8L
sent_without_linebreaks is Look at this in Michigan! A day AFTER the election, Biden receives a dump of 134,886 votes at 6:31AM! https://t.co/

In [98]:
print(f'accuracy_score is {accuracy_score(Y_test_groundtruth, y_pred_model)}')


accuracy_score is 0.54


In [99]:
print(classification_report(Y_test_groundtruth, y_pred_model))

              precision    recall  f1-score   support

    negative       0.00      0.00      0.00        21
     neutral       0.54      1.00      0.70        54
    positive       0.00      0.00      0.00        25

    accuracy                           0.54       100
   macro avg       0.18      0.33      0.23       100
weighted avg       0.29      0.54      0.38       100





### and now the autotuned fasttext model accuracy

In [100]:
y_pred_model_autotuned = X_test.apply(fasttext_predict, model=model_autotuned)

sent_without_linebreaks is Pennsylvania Party Leadership votes are this week. I hope they pick very tough and smart fighters. We will WIN!!
sent_without_linebreaks is RT @Jim_Jordan: In Oregon, you can be jailed for having too many people over for Thanksgiving. .  .  But if you want to riot and loot in Portla…
sent_without_linebreaks is RT @paulsperry_: ‘Blindsided’: GOP elections board members in NC resign in protest over irregular absentee ballots.  .  Democrats insisted on l…
sent_without_linebreaks is RT @GOP: The GREAT @CoachLouHoltz88 joined @realDonaldTrump on stage tonight in Butler, PA! https://t.co/M09iebjZBu
sent_without_linebreaks is Our numbers are looking VERY good all over. Sleepy Joe is already beginning to pull out of certain states. The Radical Left is going down!
sent_without_linebreaks is I LOVE TEXAS! https://t.co/EP7P3AvE8L
sent_without_linebreaks is Look at this in Michigan! A day AFTER the election, Biden receives a dump of 134,886 votes at 6:31AM! https://t.co/

In [101]:
print(f'accuracy_score is {accuracy_score(Y_test_groundtruth, y_pred_model_autotuned)}')


accuracy_score is 0.56


In [102]:
print(classification_report(Y_test_groundtruth, y_pred_model_autotuned))

              precision    recall  f1-score   support

    negative       0.00      0.00      0.00        21
     neutral       0.58      0.89      0.70        54
    positive       0.57      0.32      0.41        25

    accuracy                           0.56       100
   macro avg       0.38      0.40      0.37       100
weighted avg       0.46      0.56      0.48       100



### Just for kicks, let's see the hyper-parameters

In [109]:
def print_hyperparameters(model):
    args_obj = model.f.getArgs()
    for hparam in dir(args_obj):
        if not hparam.startswith('__'):
            print(f"{hparam} -> {getattr(args_obj, hparam)}")            



In [110]:
print_hyperparameters(model)


autotuneDuration -> 300
autotuneMetric -> f1
autotuneModelSize -> 
autotunePredictions -> 1
autotuneValidationFile -> 
bucket -> 0
cutoff -> 0
dim -> 100
dsub -> 2
epoch -> 5
input -> trumpsentiment.train
label -> __label__
loss -> loss_name.softmax
lr -> 0.1
lrUpdateRate -> 100
maxn -> 0
minCount -> 1
minCountLabel -> 0
minn -> 0
model -> model_name.supervised
neg -> 5
output -> 
pretrainedVectors -> 
qnorm -> False
qout -> False
retrain -> False
saveOutput -> False
seed -> 0
setManual -> <bound method PyCapsule.setManual of <fasttext_pybind.args object at 0x1a29db53e8>>
t -> 0.0001
thread -> 11
verbose -> 2
wordNgrams -> 1
ws -> 5


In [111]:
print_hyperparameters(model_autotuned)


autotuneDuration -> 300
autotuneMetric -> f1
autotuneModelSize -> 
autotunePredictions -> 1
autotuneValidationFile -> trumpsentiment.valid
bucket -> 601846
cutoff -> 0
dim -> 189
dsub -> 2
epoch -> 58
input -> trumpsentiment.train
label -> __label__
loss -> loss_name.softmax
lr -> 5.0
lrUpdateRate -> 100
maxn -> 6
minCount -> 1
minCountLabel -> 0
minn -> 3
model -> model_name.supervised
neg -> 5
output -> 
pretrainedVectors -> 
qnorm -> False
qout -> False
retrain -> False
saveOutput -> False
seed -> 0
setManual -> <bound method PyCapsule.setManual of <fasttext_pybind.args object at 0x1a29db5308>>
t -> 0.0001
thread -> 11
verbose -> 2
wordNgrams -> 2
ws -> 5
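
Comparing the listings by eye is tedious. A small sketch that diffs the default and autotuned values printed in this section (the dictionaries are copied by hand and include only a subset of the parameters):

```python
default_params = {"bucket": 0, "dim": 100, "epoch": 5, "lr": 0.1,
                  "minn": 0, "maxn": 0, "wordNgrams": 1, "ws": 5, "neg": 5}
autotuned_params = {"bucket": 601846, "dim": 189, "epoch": 58, "lr": 5.0,
                    "minn": 3, "maxn": 6, "wordNgrams": 2, "ws": 5, "neg": 5}

# Keep only the parameters whose value changed
changed = {name: (default_params[name], autotuned_params[name])
           for name in default_params
           if default_params[name] != autotuned_params[name]}

for name, (before, after) in changed.items():
    print(f"{name}: {before} -> {after}")
```

Autotune raised the epoch count and learning rate, enlarged the embedding dimension, and switched on character n-grams (`minn`/`maxn`) and word bigrams (`wordNgrams`).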


# For future work, we could clean the Twitter text to get better accuracy, and perhaps section off tweets by timeframe, such as pre-COVID-19, COVID-19, pre-voting-day, and post-voting-day
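
As a starting point for the text cleaning, here is a minimal sketch using regular expressions. This is not the `clean_twitter_text` helper from `myutils`, whose implementation is not shown here; the function `strip_tweet_noise` and its rules are assumptions:

```python
import re

URL_RE = re.compile(r"https?://\S+")        # links such as https://t.co/...
RETWEET_RE = re.compile(r"^RT\s+@\w+:\s*")  # leading "RT @user: " prefix
MENTION_RE = re.compile(r"@\w+")            # remaining @mentions

def strip_tweet_noise(text):
    """Remove the retweet prefix, URLs and @mentions, then tidy whitespace."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())

print(strip_tweet_noise("RT @GOP: The GREAT @CoachLouHoltz88 joined us in Butler, PA! https://t.co/M09iebjZBu"))
# The GREAT joined us in Butler, PA!
```

Tweets that consist only of a link become empty strings after cleaning, so they would need to be dropped or labelled neutral before training.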


In [177]:
from myutils import clean_twitter_text