# Train A Sentiment Classifier

The Yelp dataset is generated from the [Yelp academic download](https://www.yelp.com/dataset/download). This lesson is derived from the classifier example in [TextBlob's documentation](https://textblob.readthedocs.io/en/dev/classifiers.html#classifiers).

In [1]:
import pandas as pd
from textblob.classifiers import NaiveBayesClassifier

In [2]:
import nltk
nltk.download('punkt_tab')  # only needs to run once; fetches the tokenizer data

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/maximillianrivera/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [3]:
adams = pd.read_csv('small_Adams.csv')

In [4]:
adams.head(5)

Unnamed: 0,Adams,Date
0,"Secure our borders, address the people who are...",12/3/24
1,And the federal government made me take six po...,12/3/24
2,"Those who are here committing crimes, robberie...",12/3/24
3,"Okay, let me find that sentence. President Bid...",12/3/24
4,Our city remains committed to protecting and a...,11/6/24


In [5]:
# Saving this for combining our CSVs!
train = [
    ("Secure our borders, address the people who are committing violent acts in our country and make sure that we have our citizens— are going to be safe. That's where I am. ", "rep"),
    ("And the federal government made me take six point four billion dollars out of providing these services that we should– We all should be angry at what happened to our city under this administration.", "rep"),
    ("Those who are here committing crimes, robberies, shooting at police officers, raping innocent people have been a harm to our country. I want to sit down and hear the plan on how we're going to address them. Those are the people I am talking about. And I would love to sit down with the border czar and hear his thoughts on how we're going to address those who are harming our citizens.", "rep"),
    ("Okay, let me find that sentence. President Biden and President-elect Donald Trump now agree on one thing. The Biden Justice Department has been politicized. Does that sound familiar?", "rep"),
    ("And while we will always respect and protect the right to peaceful protest, there will be zero tolerance for crime, blocking traffic, graffiti or disorderly behavior. And months ago I made it clear that those engaging in political battles need to take down the temperature and I am renewing that call today.", "rep"),
    ("Our city remains committed to protecting and advancing women's health care, including access to abortion care.", "dem"),
    ("I know they're committed, dedicated men and women that work over at ACS. And every day they're making these tough decisions on making sure these calls are right.", "dem"),
    ("To tackle these issues, we are launching a multi-agency operation that brings together more than a dozen city agencies with state partners to make sure crime and quality-of-life issues are addressed. Our administration has a clear mission: to make New York a safer, more affordable city, and we will not rest until we have accomplished that mission.", "dem"),
    ("Everything from cleanliness over some of the trash bins was spilling over with garbage. So we wanted a holistic, multifaceted approach with the police and the Department of Sanitation and other partners that are going to talk to the young sex workers and try to get them the services that they need. ", "dem"),
    ("Chauncey is a lifelong public servant who has spent his career working at the city, state, and federal levels building bridges between law enforcement and communities across the state. We are safer, stronger, and better connected thanks to Chauncey’s service to our city, and I am thrilled to have him and Mona Suazo take the lead on our administration’s public safety portfolio through the next successful chapter.", "dem"),
]
test = [
    ("We need to slow down the migrants.", "rep"),
    ("Criminals don't belong in our city", "rep"),
    ("We are a sanctuary city for all", "dem"),
    ("I endorsed Kamala", "dem"),
    ("We have to get rid of the rats", "rep"),
    ("I stand with communities of color", "dem"),
]
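
The comment at the top of the cell above mentions combining CSVs. One way that could look is sketched below: it reads labeled rows and turns them into the `(text, label)` tuples the classifier expects. The `label` column, the sample rows, and the helper name `rows_to_labeled_pairs` are assumptions for illustration; `small_Adams.csv` as shown has no label column. The sketch uses the standard-library `csv` module so it runs on its own, though the same idea works with a pandas DataFrame.

```python
import csv
import io

# Hypothetical CSV contents. The "label" column is an assumption and is not
# part of small_Adams.csv as displayed earlier; the dates are placeholders.
sample_csv = io.StringIO(
    'Adams,Date,label\n'
    '"We are a sanctuary city for all",12/3/24,dem\n'
    '"Criminals don\'t belong in our city",12/3/24,rep\n'
)

def rows_to_labeled_pairs(handle, text_col="Adams", label_col="label"):
    """Turn CSV rows into a list of (text, label) tuples."""
    reader = csv.DictReader(handle)
    return [(row[text_col].strip(), row[label_col].strip()) for row in reader]

pairs = rows_to_labeled_pairs(sample_csv)
print(pairs)
```

A list built this way has the same shape as `train` and `test`, so it could be passed to the classifier without further conversion.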

In [6]:
%pip install textblob  # run this before the imports at the top if TextBlob is not installed yet

Note: you may need to restart the kernel to use updated packages.


## Train Our Classifier

In [9]:
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')


[nltk_data] Downloading package punkt to
[nltk_data]     /Users/maximillianrivera/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/maximillianrivera/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/maximillianrivera/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [10]:
cl = NaiveBayesClassifier(train)

Let's see what's driving the model:

In [11]:
cl.show_informative_features(5)

Most Informative Features
           contains(And) = True              rep : dem    =      2.3 : 1.0
             contains(a) = True              dem : rep    =      2.3 : 1.0
            contains(am) = True              rep : dem    =      2.3 : 1.0
           contains(are) = False             rep : dem    =      2.3 : 1.0
          contains(city) = True              dem : rep    =      2.3 : 1.0


## Remember Our Accuracy Metric?

In [12]:
cl.accuracy(test)

0.8333333333333334
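
As a reminder, accuracy is the share of test examples whose predicted label matches the true label, so `0.8333...` on six examples means five were right. The sketch below recomputes that by hand. The true labels come from `test`, but the predictions are hypothetical: the notebook output does not show which example the model missed.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# True labels, in the same order as the test list.
true_labels = ["rep", "rep", "dem", "dem", "rep", "dem"]

# Hypothetical predictions with exactly one mistake (the fifth item).
hypothetical_predictions = ["rep", "rep", "dem", "dem", "dem", "dem"]

score = accuracy(hypothetical_predictions, true_labels)
print(score)  # 0.8333333333333334
```

On a test set this small, a single example moves the score by almost 17 percentage points, so the number is a rough signal at best.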

## Time to Test the Adams-Meter

In [13]:
cl.classify("Our administration was clear and laser focused on that. When you look at the over $80 million worth of illegal products that were removed off our streets, you see that those $80 million that we insure illegal profits are not made.")

'dem'

In [14]:
prob_dist = cl.prob_classify("So today we're saying goodbye and good riddance to products that endanger our children and undermine our quality of life. ")  # returns a probability distribution over the labels instead of a single label

In [30]:
prob_dist.prob('dem'), prob_dist.prob('rep')

(0.26630434782608825, 0.733695652173912)
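
Probabilities like these make it possible to hold back a verdict when the model is not confident. The sketch below is one way to do that, using the two values printed above. The helper name `label_if_confident` and the `0.7` threshold are assumptions chosen for illustration, not part of TextBlob.

```python
def label_if_confident(probabilities, threshold=0.7):
    """Return the most likely label, or "unsure" if its probability is below threshold."""
    best_label = max(probabilities, key=probabilities.get)
    if probabilities[best_label] >= threshold:
        return best_label
    return "unsure"

# Values copied from the prob_dist output above.
probabilities = {"dem": 0.26630434782608825, "rep": 0.733695652173912}

print(label_if_confident(probabilities))                 # rep
print(label_if_confident(probabilities, threshold=0.8))  # unsure
```

In the notebook, the dictionary could be filled from the live model with `prob_dist.prob('dem')` and `prob_dist.prob('rep')` rather than hard-coded numbers.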

In [None]:
# Adams-Meter: classify a quote typed in by the user
user_input = input("What did Eric Adams say? ")
analysis = cl.classify(user_input)

print(f"Party line: {analysis}")

