# Notebook to sift through sentiment-classified tweets

This notebook automates the selection of tweets for our training and test dataset.
Each of the four team members was assigned a large number of tweets to classify and must produce the following sets of classified tweets:
* 100 anti-gun
* 100 pro-gun
* 50 neutral

Between the four of us, we'll produce a final set of 1,000 tweets:
* 400 anti-gun
* 400 pro-gun
* 200 neutral

At the end of this notebook, a CSV file is exported for the team member whose classifications were loaded; running it once per member produces 4 distinct CSV files.
One team member will then merge all four files into a single training/test dataset.
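The merge step is not part of this notebook. A minimal sketch of it could concatenate the four exported files and check that the totals match the 400/400/200 target. It assumes each file keeps the `tweet_id` and `sentiment` columns written by the export cell (In [6]) and that no tweet was assigned to more than one member; `merge_classifications` and `EXPECTED_COUNTS` are hypothetical names.

```python
import pandas as pd

# Target class totals for the merged dataset (four members x 100/100/50)
EXPECTED_COUNTS = {"anti-gun": 400, "pro-gun": 400, "neutral": 200}


def merge_classifications(sources):
    """Hypothetical helper: concatenate the per-member CSVs and sanity-check them.

    `sources` can be anything pd.read_csv accepts (file paths or file-like objects).
    """
    merged = pd.concat([pd.read_csv(src) for src in sources], ignore_index=True)

    # Every class must hit its target exactly
    counts = merged["sentiment"].value_counts().to_dict()
    if counts != EXPECTED_COUNTS:
        raise ValueError(f"unexpected class counts: {counts}")

    # Assumes each tweet was assigned to only one member
    if merged["tweet_id"].duplicated().any():
        raise ValueError("a tweet_id appears in more than one file")

    return merged
```

Called with the four output paths from the export cell, it returns a single 1,000-row DataFrame, or raises if a file is short, over quota, or overlaps with another.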


In [1]:
! jupyter --version

Selected Jupyter core packages...
IPython          : 7.31.1
ipykernel        : 6.15.2
ipywidgets       : 7.6.5
jupyter_client   : 7.3.4
jupyter_core     : 4.11.1
jupyter_server   : 1.18.1
jupyterlab       : 3.4.4
nbclient         : 0.5.13
nbconvert        : 6.4.4
nbformat         : 5.5.0
notebook         : 6.4.12
qtconsole        : 5.3.2
traitlets        : 5.1.1


In [2]:
# Dependencies
import pandas as pd
import numpy as np


In [3]:
# Read CSV into Pandas DataFrame
# raw_classification_df = pd.read_csv('../classification files/dana_sentiment_analysis.csv')
raw_classification_df = pd.read_csv('../../contributors/david/david_sentiment_analysis.csv')
# raw_classification_df = pd.read_csv('../classification files/keerti_sentiment_analysis.csv')
# raw_classification_df = pd.read_csv('../classification files/kevin_sentiment_analysis.csv')

raw_classification_df.head()


| Unnamed: 0 | tweet_id | full_text | sentiment |
|---|---|---|---|
| 0 | 1587817360164999168 | @twk_5 @davidhogg111 Good question. The guns a... | anti-gun |
| 1 | 1587817358550188032 | Second Amendment Sanctuary City will arrest an... | pro-gun |
| 2 | 1587817321057456128 | @NikaOneDay @thegreatunkn @obiwill_kenobi @Tul... | anti-gun |
| 3 | 1587817259828883456 | @madandmatt @philosophyfanex @NRA Oh geee, her... | neutral |
| 4 | 1587817169605185536 | Just…read this. \nhttps://t.co/TfKqT2nNZI\n\n@... | anti-gun |
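pandas names a column `Unnamed: 0` when its header cell is empty, which usually means the file was written by an earlier `to_csv` call that kept the default `index=True`. The made-up CSV below shows two standard ways of handling it: `index_col=0` when reading, or `drop` afterwards. Whether to keep the column is a team decision; as written, this notebook carries it into the exported file.

```python
import io

import pandas as pd

# Made-up CSV whose first header cell is empty, as DataFrame.to_csv() writes it by default
csv_text = ",tweet_id,sentiment\n0,111,anti-gun\n1,222,pro-gun\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # ['Unnamed: 0', 'tweet_id', 'sentiment']

# Option 1: treat the leftover column as the row index while reading
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())  # ['tweet_id', 'sentiment']

# Option 2: drop it after reading
dropped = df.drop(columns="Unnamed: 0")
print(dropped.columns.tolist())  # ['tweet_id', 'sentiment']
```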


In [4]:
# Extract tweets by sentiment classification

anti_df = raw_classification_df.loc[raw_classification_df['sentiment'] == 'anti-gun']
print(len(anti_df))

pro_df = raw_classification_df.loc[raw_classification_df['sentiment'] == 'pro-gun']
print(len(pro_df))

neutral_df = raw_classification_df.loc[raw_classification_df['sentiment'] == 'neutral']
print(len(neutral_df))

throw_df = raw_classification_df.loc[raw_classification_df['sentiment'] == 'throw-out']
print(len(throw_df))

# Confirm the four subsets account for every classified row in the CSV
total = len(anti_df) + len(pro_df) + len(neutral_df) + len(throw_df)
print(total)
assert total == raw_classification_df['sentiment'].notna().sum()


238
122
532
8
900
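The four boolean filters can also be cross-checked with a single `value_counts()` call. In this minimal sketch, `VALID_LABELS` is an assumption built from the four labels used in this notebook, and `label_report` is a hypothetical helper. Any label outside the set (for example a mistyped `Anti-gun`) is reported instead of being silently missed by the `==` filters.

```python
import pandas as pd

# Labels used in this notebook; anything else is treated as a labelling mistake
VALID_LABELS = {"anti-gun", "pro-gun", "neutral", "throw-out"}


def label_report(df):
    """Return per-label counts and the set of labels not in VALID_LABELS."""
    counts = df["sentiment"].value_counts()  # NaN (unclassified) rows are skipped
    unexpected = set(counts.index) - VALID_LABELS
    return counts, unexpected


# Small made-up example with one mistyped label
sample = pd.DataFrame({"sentiment": ["anti-gun", "pro-gun", "neutral", "Anti-gun", "neutral"]})
counts, unexpected = label_report(sample)
print(counts.to_dict())
print(unexpected)  # {'Anti-gun'}
```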


To avoid confusion, and so that more tweets can be added later if needed, only the first rows of each class are extracted into the new DataFrame: the first 100 anti-gun, 100 pro-gun and 50 neutral tweets.
We discussed using a randomizer to pick which tweets go into the final dataset, but the team ultimately agreed to keep the selection simple and consistent.
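For comparison, here is a sketch of both options: the first-n selection the team chose and the seeded random selection that was discussed. `QUOTAS`, `select_first` and `select_random` are illustrative names, and `DataFrame.sample` with a fixed `random_state` is just one standard way to make random sampling reproducible.

```python
import pandas as pd

# Per-class targets from the top of this notebook
QUOTAS = {"anti-gun": 100, "pro-gun": 100, "neutral": 50}


def select_first(df, quotas=QUOTAS):
    """The approach the team chose: the first n rows of each class, in file order."""
    return pd.concat([df[df["sentiment"] == label].head(n) for label, n in quotas.items()])


def select_random(df, quotas=QUOTAS, seed=42):
    """The discussed alternative: n random rows per class, reproducible via the seed."""
    return pd.concat(
        [df[df["sentiment"] == label].sample(n=n, random_state=seed) for label, n in quotas.items()]
    )


# Made-up data: 120 anti-gun, 110 pro-gun and 60 neutral rows
labels = ["anti-gun"] * 120 + ["pro-gun"] * 110 + ["neutral"] * 60
demo = pd.DataFrame({"tweet_id": range(len(labels)), "sentiment": labels})

first = select_first(demo)
random_a = select_random(demo, seed=1)
random_b = select_random(demo, seed=1)
print(len(first), len(random_a))              # 250 250
print(random_a.index.equals(random_b.index))  # True
```

Both are repeatable, but first-n has an extra property that matches the note about adding tweets later: taking more rows with a larger `n` keeps every row already selected.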


In [5]:
frames = [anti_df.head(100), pro_df.head(100), neutral_df.head(50)]
clean_classification_df = pd.concat(frames)
clean_classification_df


| Unnamed: 0 | tweet_id | full_text | sentiment |
|---|---|---|---|
| 0 | 1587817360164999168 | @twk_5 @davidhogg111 Good question. The guns a... | anti-gun |
| 2 | 1587817321057456128 | @NikaOneDay @thegreatunkn @obiwill_kenobi @Tul... | anti-gun |
| 4 | 1587817169605185536 | Just…read this. \nhttps://t.co/TfKqT2nNZI\n\n@... | anti-gun |
| 14 | 1587817006258335744 | @TomCottonAR Are you suggesting more guns like... | anti-gun |
| 15 | 1587816908283269120 | @GhostofTST Disagreed! You can have sensible g... | anti-gun |
| ... | ... | ... | ... |
| 86 | 1587814541693521922 | @Jim_Jordan that would mean more gun control &... | neutral |
| 87 | 1587814521502056451 | @DrOz So Ozzie, in 2019, not so long ago, you ... | neutral |
| 88 | 1587814518243135490 | @SteveDeaceShow I'm guessing it will be illega... | neutral |
| 89 | 1587814498760790017 | @Missy10013Kathy @AbbottCampaign @GregAbbott_T... | neutral |
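`head(n)` quietly returns fewer than `n` rows when a class is short, so a check before exporting can confirm the 100/100/50 split. In the counts printed by In [4], every class clears its quota (238 anti-gun, 122 pro-gun, 532 neutral). A check like the hypothetical `check_quotas` below would catch a member whose file comes up short.

```python
import pandas as pd

# Per-member targets stated at the top of this notebook
QUOTAS = {"anti-gun": 100, "pro-gun": 100, "neutral": 50}


def check_quotas(selected, quotas=QUOTAS):
    """Raise if any class in `selected` has a different row count than its quota."""
    counts = selected["sentiment"].value_counts()
    mismatched = {
        label: int(counts.get(label, 0))
        for label, n in quotas.items()
        if counts.get(label, 0) != n
    }
    if mismatched:
        raise ValueError(f"class counts do not match quotas: {mismatched}")


# Intended use in this notebook, before the export cell:
# check_quotas(clean_classification_df)
```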


In [6]:
# Output clean_classification_df to CSV file

# clean_classification_df.to_csv(r'../../contributors/dana/dana_classification.csv', 
#                                index = False)
clean_classification_df.to_csv(r'../../contributors/david/david_classification.csv', 
                               index = False)
# clean_classification_df.to_csv(r'../../contributors/keerti/keerti_classification.csv', 
#                                index = False)
# clean_classification_df.to_csv(r'../../contributors/kevin/kevin_classification.csv', 
#                                index = False)
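Switching members by commenting and uncommenting paired lines in two different cells is easy to get out of sync. One alternative is to derive both paths from a single member name. This sketch assumes every member's input file follows David's `../../contributors/<member>/<member>_sentiment_analysis.csv` pattern, which the notebook does not confirm (the commented-out inputs point to `../classification files/` instead). `CONTRIBUTORS_DIR` and `member_paths` are hypothetical names.

```python
from pathlib import Path

# Assumed layout, modelled on David's paths in this notebook
CONTRIBUTORS_DIR = Path("../../contributors")


def member_paths(member):
    """Return (input CSV, output CSV) paths for one team member."""
    folder = CONTRIBUTORS_DIR / member
    return (
        folder / f"{member}_sentiment_analysis.csv",
        folder / f"{member}_classification.csv",
    )


# Usage: change only this name, then read and export with the returned paths
input_path, output_path = member_paths("david")
print(input_path)
print(output_path)
```

The returned paths can then be passed to `pd.read_csv(input_path)` and `clean_classification_df.to_csv(output_path, index=False)`.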
