# Finding Fact-checkable Tweets with Machine Learning


## Credits

This notebook is based on one originally created by Jeremy Howard and the other folks at [fast.ai](https://fast.ai) as part of [this fantastic class](https://course.fast.ai/). Specifically, it comes from Lesson 4. You can [see the lesson video](https://course.fast.ai/videos/?lesson=4) and [the original class notebook](https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-imdb.ipynb).

The idea for the project came from Dan Keemahill at the Austin American-Statesman newspaper. Dan, Madlin Mekelburg, and others at the paper hand-coded the tweets used for the classification training.

For more information about this project, and details about how to use this work in the wild, check out our [Quartz AI Studio blog post about the checkable-tweets project](https://qz.ai/?p=89).

-- John Keefe, [Quartz](https://qz.com), October 2019

## Setup

### For those using Google Colaboratory ...

Be aware that Google Colab instances are ephemeral: they vanish (*poof*) when you close them, after a period of sitting idle (currently 90 minutes), or if you use one for more than 12 hours.

If you're using Google Colaboratory, you don't need a GPU for this notebook.

Just run this cell:

In [0]:
## ALL GOOGLE COLAB USERS RUN THIS CELL

## This runs a script that installs fast.ai
!curl -s https://course.fast.ai/setup/colab | bash

Updating fastai...
Done.


### For those _not_ using Google Colaboratory ...

This section is just for people who decide to use one of the notebooks on a system other than Google Colaboratory.

Those people should run the cell below.

In [0]:
## NON-COLABORATORY USERS SHOULD RUN THIS CELL
%reload_ext autoreload
%autoreload 2
%matplotlib inline
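
Which of the two setup cells applies depends on where the notebook is running. Here is a minimal sketch of checking that from code, assuming the `google.colab` package can be found only inside Colab. The helper name `running_in_colab` is made up for illustration and is not part of fastai or Colab.

```python
import importlib.util


def running_in_colab() -> bool:
    """Return True when the google.colab package can be located."""
    try:
        # find_spec imports the parent package "google" first and raises
        # ModuleNotFoundError (an ImportError) if that package is missing.
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        return False


print("Colab" if running_in_colab() else "not Colab")
```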

### Everybody do this ...

Everyone needs to run the next cell, which initializes the Python libraries we'll use in this notebook.

In [0]:
## AND *EVERYBODY* SHOULD RUN THIS CELL
import warnings
warnings.filterwarnings('ignore')
from fastai.text import *
import fastai
print(f'fastai: {fastai.__version__}')
print(f'cuda: {torch.cuda.is_available()}')

fastai: 1.0.61
cuda: False


## The Data

We're going to be using two sets of tweets for this project:

- A CSV (comma-separated values file) containing a bunch of #txlege tweets
- A CSV of #txlege tweets that have been hand-coded as "fact-checkable" or "not fact-checkable"


In [0]:
# Run this cell to download the data we'll use for this exercise
# !wget -N https://s3.amazonaws.com/media.johnkeefe.net/newmark-investigations/unclassified_tweets.zip --quiet
# !unzip -q unclassified_tweets.zip

# !wget -N https://s3.amazonaws.com/media.johnkeefe.net/newmark-investigations/TWITTER_CSV_EXPORTS.zip --quiet
# !unzip -q TWITTER_CSV_EXPORTS.zip

!wget -N https://www.dropbox.com/s/rwt23g38d2krdae/output.zip
!unzip -q output.zip

print('Done!')

--2020-05-11 14:50:34--  https://www.dropbox.com/s/rwt23g38d2krdae/output.zip
Resolving www.dropbox.com (www.dropbox.com)... 162.125.9.1, 2620:100:601f:1::a27d:901
Connecting to www.dropbox.com (www.dropbox.com)|162.125.9.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/rwt23g38d2krdae/output.zip [following]
--2020-05-11 14:50:34--  https://www.dropbox.com/s/raw/rwt23g38d2krdae/output.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc77def0a5aab646008e600ec29c.dl.dropboxusercontent.com/cd/0/inline/A3gb66hI0EFL1kW8EKLfjFNVTMxVQ2QJQLcCtMW6Sdensq_MGVf08wbXSagVFNIbvd3YgI0puQN8xfzsgId2IKb6HHHuEl9-QMvX7dBnDNp_Bw/file# [following]
--2020-05-11 14:50:34--  https://uc77def0a5aab646008e600ec29c.dl.dropboxusercontent.com/cd/0/inline/A3gb66hI0EFL1kW8EKLfjFNVTMxVQ2QJQLcCtMW6Sdensq_MGVf08wbXSagVFNIbvd3YgI0puQN8xfzsgId2IKb6HHHuEl9-QMvX7dBnDNp_Bw/file
Resolving uc77def0a5aab64

Let's take a look.

In [0]:
%ls

[0m[01;36mdata[0m@  [01;36mmodels[0m@  [01;34moutput[0m/  output.zip


In [0]:
import glob
import pandas as pd

# file_list = sorted(glob.glob('./unclassified_tweets/*.csv'))
file_list = sorted(glob.glob('./output/*.csv'))
# file_list2 = sorted(glob.glob('./unclassified_tweets/*.csv'))
# file_list.extend(file_list2)
file_list

['./output/A. Donald McEachin.csv',
 './output/A. Drew Ferguson IV.csv',
 './output/Abby Finkenauer.csv',
 './output/Abigail Davis Spanberger.csv',
 './output/Adam B. Schiff.csv',
 './output/Adam Kinzinger.csv',
 './output/Adam Smith.csv',
 './output/Adrian Smith.csv',
 './output/Adriano Espaillat.csv',
 './output/Al Green.csv',
 './output/Al Lawson, Jr..csv',
 './output/Alan S. Lowenthal.csv',
 './output/Albio Sires.csv',
 './output/Alcee L. Hastings.csv',
 './output/Alexander X. Mooney.csv',
 './output/Alexandria Ocasio-Cortez.csv',
 './output/Alma S. Adams.csv',
 './output/Ami Bera.csv',
 './output/Amy Klobuchar.csv',
 './output/André Carson.csv',
 './output/Andy Barr.csv',
 './output/Andy Biggs.csv',
 './output/Andy Harris.csv',
 './output/Andy Kim.csv',
 './output/Andy Levin.csv',
 './output/Angie Craig.csv',
 './output/Angus S. King, Jr..csv',
 './output/Ann Kirkpatrick.csv',
 './output/Ann M. Kuster.csv',
 './output/Ann Wagner.csv',
 './output/Anna G. Eshoo.csv',
 './output/Anth
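
Each file is named after the person whose tweets it holds, and several names contain periods and commas (`Al Lawson, Jr..csv`), so splitting on the first `.` would cut those names short. Here is a small sketch of recovering the name from a path with `pathlib`. The helper name `member_name` is hypothetical.

```python
from pathlib import Path


def member_name(csv_path: str) -> str:
    """Return the file name without its directory or final '.csv' suffix."""
    # .stem strips only the last suffix, so periods inside the name survive.
    return Path(csv_path).stem


for path in ["./output/Al Lawson, Jr..csv", "./output/A. Donald McEachin.csv"]:
    print(member_name(path))
```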

In [0]:
!head './output/Yvette D. Clarke.csv'

tweet,dates
"Every May, we recognize our incredible mothers. We are indebted to their sacrifice, inspired by their wisdom and emboldened by their love.
 
This #MothersDay, let us remember to hold our loved ones extra close during this difficult time. https://t.co/IBLSQn1Jyw",2020-05-10 14:03:36
"A Saturday well-spent delivering masks to constituents in #Brownsville. Thanks for helping our community #flattenthecurve, #NY09! 💪🏾✨ https://t.co/D4qn979vzJ",2020-05-09 21:09:56
"In an ideal world, we'd honor teachers 7 days a week, 365 days a year for the enormous impact they have on our children. Too often, they’re the one person a child living in difficult circumstances can count on for support and an encouraging word. Happy #TeacherAppreciationWeek ✨",2020-05-08 18:15:45
Join me and @RepSusanWild tonight for this important conversation on #racial health disparities within #COVID19 .  Register here: https://t.co/IWEQmHAwqA https://t.co/0tjItGgyqh,2020-05-07 20:57:25
"This global pandem

In [0]:
# recovering after crash:

file_list = ['./output/Angie Craig.csv',
 './output/Angus S. King, Jr..csv',
 './output/Ann Kirkpatrick.csv',
 './output/Ann M. Kuster.csv',
 './output/Ann Wagner.csv',
 './output/Anna G. Eshoo.csv',
 './output/Anthony Brindisi.csv',
 './output/Anthony G. Brown.csv',
 './output/Anthony Gonzalez.csv',
 './output/Antonio Delgado.csv',
 './output/Aumua Amata Coleman Radewagen.csv',
 './output/Austin Scott.csv',
 './output/Ayanna Pressley.csv',
 './output/Barbara Lee.csv',
 './output/Barry Loudermilk.csv',
 './output/Ben Cline.csv',
 './output/Ben McAdams.csv',
 './output/Ben Ray Luján.csv',
 './output/Ben Sasse.csv',
 './output/Benjamin L. Cardin.csv',
 './output/Bennie G. Thompson.csv',
 './output/Bernard Sanders.csv',
 './output/Betty McCollum.csv',
 './output/Bill Cassidy.csv',
 './output/Bill Flores.csv',
 './output/Bill Foster.csv',
 './output/Bill Huizenga.csv',
 './output/Bill Johnson.csv',
 './output/Bill Pascrell, Jr..csv',
 './output/Bill Posey.csv',
 './output/Billy Long.csv',
 './output/Blaine Luetkemeyer.csv',
 './output/Bob Gibbs.csv',
 './output/Bobby L. Rush.csv',
 './output/Bonnie Watson Coleman.csv',
 './output/Brad R. Wenstrup.csv',
 './output/Brad Sherman.csv',
 './output/Bradley Byrne.csv',
 './output/Bradley Scott Schneider.csv',
 './output/Brenda L. Lawrence.csv',
 './output/Brendan F. Boyle.csv',
 './output/Brett Guthrie.csv',
 './output/Brian Babin.csv',
 './output/Brian Higgins.csv',
 './output/Brian J. Mast.csv',
 './output/Brian K. Fitzpatrick.csv',
 './output/Brian Schatz.csv',
 './output/Bruce Westerman.csv',
 './output/Bryan Steil.csv',
 './output/C. A. Dutch Ruppersberger.csv',
 './output/Carol D. Miller.csv',
 './output/Carolyn B. Maloney.csv',
 './output/Catherine Cortez Masto.csv',
 './output/Cathy McMorris Rodgers.csv',
 './output/Cedric L. Richmond.csv',
 './output/Charles E. Schumer.csv',
 './output/Charles J. "Chuck" Fleischmann.csv',
 './output/Charlie Crist.csv',
 './output/Chellie Pingree.csv',
 './output/Cheri Bustos.csv',
 './output/Chip Roy.csv',
 './output/Chris Pappas.csv',
 './output/Chris Stewart.csv',
 './output/Chris Van Hollen.csv',
 './output/Chrissy Houlahan.csv',
 './output/Christopher A. Coons.csv',
 './output/Christopher H. Smith.csv',
 './output/Christopher Murphy.csv',
 './output/Chuck Grassley.csv',
 './output/Cindy Hyde-Smith.csv',
 './output/Clay Higgins.csv',
 './output/Colin Z. Allred.csv',
 './output/Conor Lamb.csv',
 './output/Cory A. Booker.csv',
 './output/Cory Gardner.csv',
 './output/Cynthia Axne.csv',
 './output/Dan Crenshaw.csv',
 './output/Dan Newhouse.csv',
 './output/Dan Sullivan.csv',
 './output/Daniel Lipinski.csv',
 './output/Daniel Meuser.csv',
 './output/Daniel T. Kildee.csv',
 './output/Daniel Webster.csv',
 './output/Danny K. Davis.csv',
 './output/Darin LaHood.csv',
 './output/Darren Soto.csv',
 './output/David B. McKinley.csv',
 './output/David E. Price.csv',
 './output/David J. Trone.csv',
 './output/David Kustoff.csv',
 './output/David Loebsack.csv',
 './output/David N. Cicilline.csv',
 './output/David P. Joyce.csv',
 './output/David P. Roe.csv',
 './output/David Perdue.csv',
 './output/David Rouzer.csv',
 './output/David Schweikert.csv',
 './output/David Scott.csv',
 './output/Dean Phillips.csv',
 './output/Deb Fischer.csv',
 './output/Debbie Dingell.csv',
 './output/Debbie Lesko.csv',
 './output/Debbie Mucarsel-Powell.csv',
 './output/Debbie Stabenow.csv',
 './output/Debbie Wasserman Schultz.csv',
 './output/Debra A. Haaland.csv',
 './output/Denny Heck.csv',
 './output/Denver Riggleman.csv',
 './output/Derek Kilmer.csv',
 './output/Devin Nunes.csv',
 './output/Diana DeGette.csv',
 './output/Dianne Feinstein.csv',
 './output/Dina Titus.csv',
 './output/Don Bacon.csv',
 './output/Don Young.csv',
 './output/Donald M. Payne, Jr..csv',
 './output/Donald Norcross.csv',
 './output/Donald S. Beyer, Jr..csv',
 './output/Donna E. Shalala.csv',
 './output/Doris O. Matsui.csv',
 './output/Doug Collins.csv',
 './output/Doug Jones.csv',
 './output/Doug LaMalfa.csv',
 './output/Doug Lamborn.csv',
 './output/Dusty Johnson.csv',
 './output/Dwight Evans.csv',
 './output/Earl Blumenauer.csv',
 './output/Earl L. "Buddy" Carter.csv',
 './output/Ed Case.csv',
 './output/Ed Perlmutter.csv',
 './output/Eddie Bernice Johnson.csv',
 './output/Edward J. Markey.csv',
 './output/Elaine G. Luria.csv',
 './output/Eleanor Holmes Norton.csv',
 './output/Eliot L. Engel.csv',
 './output/Elise M. Stefanik.csv',
 './output/Elissa Slotkin.csv',
 './output/Elizabeth Warren.csv',
 './output/Emanuel Cleaver.csv',
 './output/Eric A. "Rick" Crawford.csv',
 './output/Eric Swalwell.csv',
 './output/F. James Sensenbrenner, Jr..csv',
 './output/Filemon Vela.csv',
 './output/Francis Rooney.csv',
 './output/Frank D. Lucas.csv',
 './output/Frank Pallone, Jr..csv',
 './output/Fred Keller.csv',
 './output/Fred Upton.csv',
 './output/Frederica S. Wilson.csv',
 './output/G. K. Butterfield.csv',
 './output/Garret Graves.csv',
 './output/Gary C. Peters.csv',
 './output/Gary J. Palmer.csv',
 './output/George Holding.csv',
 './output/Gerald E. Connolly.csv',
 './output/Gilbert Ray Cisneros, Jr..csv',
 './output/Glenn Grothman.csv',
 './output/Glenn Thompson.csv',
 './output/Grace F. Napolitano.csv',
 './output/Grace Meng.csv',
 './output/Greg Pence.csv',
 './output/Greg Stanton.csv',
 './output/Greg Walden.csv',
 './output/Gregorio Kilili Camacho Sablan.csv',
 './output/Gregory W. Meeks.csv',
 './output/Gus M. Bilirakis.csv',
 './output/Guy Reschenthaler.csv',
 './output/Gwen Moore.csv',
 './output/H. Morgan Griffith.csv',
 './output/Hakeem S. Jeffries.csv',
 './output/Haley M. Stevens.csv',
 './output/Harley Rouda.csv',
 './output/Harold Rogers.csv',
 './output/Henry C. "Hank" Johnson, Jr..csv',
 './output/Henry Cuellar.csv',
 './output/Ilhan Omar.csv',
 './output/J. French Hill.csv',
 './output/J. Luis Correa.csv',
 './output/Jack Bergman.csv',
 './output/Jack Reed.csv',
 './output/Jackie Speier.csv',
 './output/Jackie Walorski.csv',
 './output/Jacky Rosen.csv',
 './output/Jahana Hayes.csv',
 './output/Jaime Herrera Beutler.csv',
 './output/James A. Himes.csv',
 './output/James E. Clyburn.csv',
 './output/James E. Risch.csv',
 './output/James Lankford.csv',
 './output/James M. Inhofe.csv',
 './output/James P. McGovern.csv',
 './output/James R. Baird.csv',
 './output/James R. Langevin.csv',
 './output/Jamie Raskin.csv',
 './output/Janice D. Schakowsky.csv',
 './output/Jared F. Golden.csv',
 './output/Jared Huffman.csv',
 './output/Jason Crow.csv',
 './output/Jason Smith.csv',
 './output/Jeanne Shaheen.csv',
 './output/Jeff Duncan.csv',
 './output/Jeff Fortenberry.csv',
 './output/Jeff Merkley.csv',
 './output/Jefferson Van Drew.csv',
 './output/Jennifer Wexton.csv',
 './output/Jenniffer González-Colón.csv',
 './output/Jerrold Nadler.csv',
 './output/Jerry McNerney.csv',
 './output/Jerry Moran.csv',
 './output/Jesús G. "Chuy" García.csv',
 './output/Jim Banks.csv',
 './output/Jim Cooper.csv',
 './output/Jim Costa.csv',
 './output/Jim Hagedorn.csv',
 './output/Jim Jordan.csv',
 './output/Jimmy Gomez.csv',
 './output/Jimmy Panetta.csv',
 './output/Joaquin Castro.csv',
 './output/Jodey C. Arrington.csv',
 './output/Jody B. Hice.csv',
 './output/Joe Courtney.csv',
 './output/Joe Cunningham.csv',
 './output/Joe Manchin, III.csv',
 './output/Joe Neguse.csv',
 './output/Joe Wilson.csv',
 './output/John A. Yarmuth.csv',
 './output/John B. Larson.csv',
 './output/John Barrasso.csv',
 './output/John Boozman.csv',
 './output/John Cornyn.csv',
 './output/John Garamendi.csv',
 './output/John H. Rutherford.csv',
 './output/John Hoeven.csv',
 './output/John Joyce.csv',
 './output/John Katko.csv',
 './output/John Kennedy.csv',
 './output/John Lewis.csv',
 './output/John P. Sarbanes.csv',
 './output/John R. Carter.csv',
 './output/John R. Curtis.csv',
 './output/John R. Moolenaar.csv',
 './output/John Ratcliffe.csv',
 './output/John Shimkus.csv',
 './output/John Thune.csv',
 './output/John W. Rose.csv',
 './output/Jon Tester.csv',
 './output/Joni Ernst.csv',
 './output/Joseph D. Morelle.csv',
 './output/Joseph P. Kennedy III.csv',
 './output/Josh Gottheimer.csv',
 './output/Josh Harder.csv',
 './output/Josh Hawley.csv',
 './output/José E. Serrano.csv',
 './output/Joyce Beatty.csv',
 './output/Juan Vargas.csv',
 './output/Judy Chu.csv',
 './output/Julia Brownley.csv',
 './output/K. Michael Conaway.csv',
 './output/Kamala D. Harris.csv',
 './output/Karen Bass.csv',
 './output/Katherine M. Clark.csv',
 './output/Kathleen M. Rice.csv',
 './output/Kathy Castor.csv',
 './output/Katie Porter.csv',
 './output/Kay Granger.csv',
 './output/Kelly Armstrong.csv',
 './output/Kelly Loeffler.csv',
 './output/Ken Buck.csv',
 './output/Ken Calvert.csv',
 './output/Kendra S. Horn.csv',
 './output/Kenny Marchant.csv',
 './output/Kevin Brady.csv',
 './output/Kevin Cramer.csv',
 './output/Kevin Hern.csv',
 './output/Kevin McCarthy.csv',
 './output/Kim Schrier.csv',
 './output/Kirsten E. Gillibrand.csv',
 './output/Kurt Schrader.csv',
 './output/Kyrsten Sinema.csv',
 './output/Lamar Alexander.csv',
 './output/Larry Bucshon.csv',
 './output/Lauren Underwood.csv',
 './output/Lee M. Zeldin.csv',
 './output/Linda T. Sánchez.csv',
 './output/Lindsey Graham.csv',
 './output/Lisa Blunt Rochester.csv',
 './output/Lisa Murkowski.csv',
 './output/Liz Cheney.csv',
 './output/Lizzie Fletcher.csv',
 './output/Lloyd Doggett.csv',
 './output/Lloyd Smucker.csv',
 './output/Lois Frankel.csv',
 './output/Lori Trahan.csv',
 './output/Louie Gohmert.csv',
 './output/Lucille Roybal-Allard.csv',
 './output/Lucy McBath.csv',
 './output/Mac Thornberry.csv',
 './output/Madeleine Dean.csv',
 './output/Marc A. Veasey.csv',
 './output/Marcia L. Fudge.csv',
 './output/Marco Rubio.csv',
 './output/Marcy Kaptur.csv',
 './output/Margaret Wood Hassan.csv',
 './output/Maria Cantwell.csv',
 './output/Mario Diaz-Balart.csv',
 './output/Mark DeSaulnier.csv',
 './output/Mark E. Amodei.csv',
 './output/Mark E. Green.csv',
 './output/Mark Pocan.csv',
 './output/Mark R. Warner.csv',
 './output/Mark Takano.csv',
 './output/Mark Walker.csv',
 './output/Markwayne Mullin.csv',
 './output/Marsha Blackburn.csv',
 './output/Martha McSally.csv',
 './output/Martha Roby.csv',
 './output/Martin Heinrich.csv',
 './output/Mary Gay Scanlon.csv',
 './output/Matt Cartwright.csv',
 './output/Matt Gaetz.csv',
 './output/Max Rose.csv',
 './output/Maxine Waters.csv',
 './output/Mazie K. Hirono.csv',
 './output/Michael B. Enzi.csv',
 './output/Michael C. Burgess.csv',
 './output/Michael Cloud.csv',
 './output/Michael F. Bennet.csv',
 './output/Michael F. Doyle.csv',
 './output/Michael F. Q. San Nicolas.csv',
 './output/Michael Guest.csv',
 './output/Michael K. Simpson.csv',
 './output/Michael R. Turner.csv',
 './output/Michael T. McCaul.csv',
 './output/Michael Waltz.csv',
 './output/Mike Bost.csv',
 './output/Mike Braun.csv',
 './output/Mike Crapo.csv',
 './output/Mike Gallagher.csv',
 './output/Mike Johnson.csv',
 './output/Mike Kelly.csv',
 './output/Mike Lee.csv',
 './output/Mike Levin.csv',
 './output/Mike Quigley.csv',
 './output/Mike Rogers.csv',
 './output/Mike Rounds.csv',
 './output/Mike Thompson.csv',
 './output/Mikie Sherrill.csv',
 './output/Mitch McConnell.csv',
 './output/Mitt Romney.csv',
 './output/Mo Brooks.csv',
 './output/Nancy Pelosi.csv',
 './output/Nanette Diaz Barragán.csv',
 './output/Neal P. Dunn.csv',
 './output/Nita M. Lowey.csv',
 './output/Norma J. Torres.csv',
 './output/Nydia M. Velázquez.csv',
 './output/Pat Roberts.csv',
 './output/Patrick J. Leahy.csv',
 './output/Patrick J. Toomey.csv',
 './output/Patrick T. McHenry.csv',
 './output/Patty Murray.csv',
 './output/Paul A. Gosar.csv',
 './output/Paul Cook.csv',
 './output/Paul Mitchell.csv',
 './output/Paul Tonko.csv',
 './output/Pete Aguilar.csv',
 './output/Pete Olson.csv',
 './output/Pete Stauber.csv',
 './output/Peter A. DeFazio.csv',
 './output/Peter J. Visclosky.csv',
 './output/Peter T. King.csv',
 './output/Peter Welch.csv',
 './output/Pramila Jayapal.csv',
 './output/Raja Krishnamoorthi.csv',
 './output/Ralph Lee Abraham.csv',
 './output/Ralph Norman.csv',
 './output/Rand Paul.csv',
 './output/Randy K. Weber, Sr..csv',
 './output/Rashida Tlaib.csv',
 './output/Raul Ruiz.csv',
 './output/Raúl M. Grijalva.csv',
 './output/Richard Blumenthal.csv',
 './output/Richard Burr.csv',
 './output/Richard C. Shelby.csv',
 './output/Richard E. Neal.csv',
 './output/Richard Hudson.csv',
 './output/Richard J. Durbin.csv',
 './output/Rick Larsen.csv',
 './output/Rick Scott.csv',
 './output/Rick W. Allen.csv',
 './output/Ro Khanna.csv',
 './output/Rob Bishop.csv',
 './output/Rob Portman.csv',
 './output/Rob Woodall.csv',
 './output/Robert B. Aderholt.csv',
 './output/Robert C. "Bobby" Scott.csv',
 './output/Robert E. Latta.csv',
 './output/Robert J. Wittman.csv',
 './output/Robert Menendez.csv',
 './output/Robert P. Casey, Jr..csv',
 './output/Robin L. Kelly.csv',
 './output/Rodney Davis.csv',
 './output/Roger F. Wicker.csv',
 './output/Roger W. Marshall.csv',
 './output/Roger Williams.csv',
 './output/Ron Estes.csv',
 './output/Ron Johnson.csv',
 './output/Ron Kind.csv',
 './output/Ron Wright.csv',
 './output/Ron Wyden.csv',
 './output/Rosa L. DeLauro.csv',
 './output/Ross Spano.csv',
 './output/Roy Blunt.csv',
 './output/Ruben Gallego.csv',
 './output/Russ Fulcher.csv',
 './output/Salud O. Carbajal.csv',
 './output/Sam Graves.csv',
 './output/Sanford D. Bishop, Jr..csv',
 './output/Scott DesJarlais.csv',
 './output/Scott H. Peters.csv',
 './output/Scott Perry.csv',
 './output/Scott R. Tipton.csv',
 './output/Sean Casten.csv',
 './output/Sean Patrick Maloney.csv',
 './output/Seth Moulton.csv',
 './output/Sharice Davids.csv',
 './output/Sheila Jackson Lee.csv',
 './output/Sheldon Whitehouse.csv',
 './output/Shelley Moore Capito.csv',
 './output/Sherrod Brown.csv',
 './output/Stacey E. Plaskett.csv',
 './output/Steny H. Hoyer.csv',
 './output/Stephanie N. Murphy.csv',
 './output/Stephen F. Lynch.csv',
 './output/Steve Chabot.csv',
 './output/Steve Cohen.csv',
 './output/Steve Daines.csv',
 './output/Steve King.csv',
 './output/Steve Scalise.csv',
 './output/Steve Stivers.csv',
 './output/Steve Watkins.csv',
 './output/Steve Womack.csv',
 './output/Steven Horsford.csv',
 './output/Steven M. Palazzo.csv',
 './output/Susan A. Davis.csv',
 './output/Susan M. Collins.csv',
 './output/Susan W. Brooks.csv',
 './output/Susan Wild.csv',
 './output/Susie Lee.csv',
 './output/Suzan K. DelBene.csv',
 './output/Suzanne Bonamici.csv',
 './output/Sylvia R. Garcia.csv',
 './output/TJ Cox.csv',
 './output/Tammy Baldwin.csv',
 './output/Tammy Duckworth.csv',
 './output/Ted Budd.csv',
 './output/Ted Cruz.csv',
 './output/Ted Lieu.csv',
 './output/Ted S. Yoho.csv',
 './output/Terri A. Sewell.csv',
 './output/Theodore E. Deutch.csv',
 './output/Thom Tillis.csv',
 './output/Thomas Massie.csv',
 './output/Thomas R. Carper.csv',
 './output/Thomas R. Suozzi.csv',
 './output/Tim Burchett.csv',
 './output/Tim Ryan.csv',
 './output/Tim Scott.csv',
 './output/Tim Walberg.csv',
 './output/Tina Smith.csv',
 './output/Todd Young.csv',
 './output/Tom Cole.csv',
 './output/Tom Cotton.csv',
 './output/Tom Emmer.csv',
 './output/Tom Graves.csv',
 './output/Tom Malinowski.csv',
 './output/Tom McClintock.csv',
 './output/Tom O’Halleran.csv',
 './output/Tom Reed.csv',
 './output/Tom Rice.csv',
 './output/Tom Udall.csv',
 './output/Tony Cárdenas.csv',
 './output/Trent Kelly.csv',
 './output/Trey Hollingsworth.csv',
 './output/Troy Balderson.csv',
 './output/Tulsi Gabbard.csv',
 './output/Val Butler Demings.csv',
 './output/Van Taylor.csv',
 './output/Vern Buchanan.csv',
 './output/Veronica Escobar.csv',
 './output/Vicente Gonzalez.csv',
 './output/Vicky Hartzler.csv',
 './output/Virginia Foxx.csv',
 './output/W. Gregory Steube.csv',
 './output/Warren Davidson.csv',
 './output/Will Hurd.csv',
 './output/William R. Keating.csv',
 './output/William R. Timmons IV.csv',
 './output/Xochitl Torres Small.csv',
 './output/Yvette D. Clarke.csv',
 './output/Zoe Lofgren.csv']

## Using our saved model

In [22]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'ai-workshop/candidate_tweets/'

save_path = Path(base_dir)
save_path.mkdir(parents=True, exist_ok=True)

score_directory = root_dir + 'ai-workshop/candidate_tweets/scored/'
score_path = Path(score_directory)
score_path.mkdir(parents=True, exist_ok=True)

Mounted at /content/gdrive
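
`Path.mkdir(parents=True, exist_ok=True)` creates any missing parent folders and does nothing if the folder already exists, which makes the cell above safe to re-run. Here is a minimal sketch of the same pattern against a temporary directory instead of Google Drive. The folder names mirror the cell above, and the temporary root is an assumption for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stand-in for "/content/gdrive/My Drive/"
    score_path = root / "ai-workshop" / "candidate_tweets" / "scored"
    score_path.mkdir(parents=True, exist_ok=True)
    score_path.mkdir(parents=True, exist_ok=True)  # second call is a no-op
    created = score_path.is_dir() and score_path.parent.is_dir()
    print(created)
```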


In [0]:
# load the model from the 'export-tweetmodel.pkl' file on your Google Drive
my_model = load_learner(save_path, file="export-tweetmodel.pkl")

## Getting every candidate's stats

In [0]:
import csv
import os
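
The `csv` module matters here because a tweet can span several lines inside one quoted field, as in the `head` output earlier, so reading the file line by line would split tweets apart. Here is a minimal sketch, using made-up sample rows, showing that `csv.DictReader` keeps each tweet whole.

```python
import csv
import io

# Made-up sample in the same two-column layout as the tweet files.
sample = (
    "tweet,dates\n"
    '"First line of a tweet.\n'
    "\n"
    'Second paragraph of the same tweet.",2020-05-10 14:03:36\n'
    "A one-line tweet with no quotes,2020-05-09 21:09:56\n"
)

# newline="" leaves line breaks untouched so the csv module can handle
# the ones that sit inside quoted fields.
rows = list(csv.DictReader(io.StringIO(sample, newline="")))

print(len(sample.splitlines()))  # physical lines, header included
print(len(rows))                 # logical rows, one per tweet
```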

In [0]:
# with open(file_list[0], newline='') as csvfile:
#   reader = csv.DictReader(csvfile)

#   for row in reader:
#     row['test'] = "hello"
#     summary_data.append(row)
#     print (row)

In [0]:
# ## Testing code
# file = file_list[0]

In [0]:
# file_list = ['./output/Van Taylor.csv',
#  './output/Vern Buchanan.csv']

In [0]:
summary_data = []

# loop through all the file names
for file in file_list: 

  this_data = []

  # open csv
  with open(file, newline='') as csvfile:
    reader = csv.DictReader(csvfile)

    # loop through all the rows in the csv
    for row in reader:

      # skip this row if there's no content
      if row['tweet'] == "":
        continue 

      # make the prediction
      fear = my_model.predict(row['tweet'])

      # Note: .item() here turns the float tensor into an actual float
      pct_true = fear[2][1].item()

      if pct_true > 0.50:

        row['file_name'] = os.path.basename(file)
        row['pct_true'] = pct_true

        summary_data.append(row)
        this_data.append(row)

        print(row)

  # Save current csv data
  if len(this_data) > 0:
    this_csv_df = pd.DataFrame(this_data)
    this_output_csv_df = this_csv_df.sort_values(['pct_true'], ascending=False)
    this_output_csv_df.rename(columns={"pct_true": "likelihood"}, inplace=True)
    this_output_csv_name = f'{score_path}/{os.path.basename(file)}'
    this_output_csv_df.to_csv(this_output_csv_name, index=False)


 

OrderedDict([('tweet', 'Trying to “terminate” healthcare protections for Americans with pre-existing conditions is terrible; doubling down on this effort, amidst a global pandemic that has already killed tens of thousands of Americans, is even worse. We must protect the ACA. https://t.co/EFPKbExnHG'), ('dates', '2020-05-07 15:10:53'), ('file_name', 'Angus S. King, Jr..csv'), ('pct_true', 0.8254337906837463)])
OrderedDict([('tweet', 'The only way we can beat coronavirus is with facts and data – so it’s not helpful when the President removes people from their positions for telling the truth. We need our leaders to be seeking out all available information, not just cherry-picking the pieces they want to hear. https://t.co/rEbhdO5v1w'), ('dates', '2020-05-05 13:41:27'), ('file_name', 'Angus S. King, Jr..csv'), ('pct_true', 0.5226852297782898)])
OrderedDict([('tweet', 'In this crisis, we need to help those who need it most – but unfortunately, the CARES Act included massive tax breaks for a
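
The loop above keeps a tweet only when the model's score for the second class is above 0.50. Here is a sketch of that filtering step pulled into a function so it can be tried without the trained model. `flag_rows` and the stand-in scorer `fake_score` are hypothetical names, and the scorer is a toy rule, not the fastai learner.

```python
from typing import Callable, Dict, Iterable, List


def flag_rows(rows: Iterable[Dict[str, str]],
              score: Callable[[str], float],
              file_name: str,
              threshold: float = 0.50) -> List[dict]:
    """Return copies of the rows whose tweet scores above the threshold."""
    flagged = []
    for row in rows:
        if row["tweet"] == "":  # skip rows with no content
            continue
        pct_true = score(row["tweet"])
        if pct_true > threshold:
            flagged.append({**row, "file_name": file_name, "pct_true": pct_true})
    # Highest scores first, matching the sort applied before saving.
    return sorted(flagged, key=lambda r: r["pct_true"], reverse=True)


def fake_score(text: str) -> float:
    """Toy stand-in for the model: pretend tweets with digits score high."""
    return 0.9 if any(ch.isdigit() for ch in text) else 0.1


rows = [
    {"tweet": "Unemployment rose 3 points", "dates": "2020-05-07"},
    {"tweet": "", "dates": "2020-05-06"},
    {"tweet": "Happy Friday!", "dates": "2020-05-05"},
]
flagged = flag_rows(rows, fake_score, "Example Member.csv")
print(flagged)
```

With the real model, the scorer would be the expression the loop already uses: `lambda t: my_model.predict(t)[2][1].item()`.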

In [0]:
df = pd.DataFrame(summary_data)
output_csv_df = df.sort_values(['pct_true'], ascending=False)

In [0]:
output_csv_df

In [0]:
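# Drop these columns if they are present; errors='ignore' skips any that are missing
output_csv_df.drop(columns=['covid_1', 'covid_2', 'covid_3', 'other_1', 'other_2', 'other_3'], inplace=True, errors='ignore')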

In [0]:
output_csv_df.rename(columns={"pct_true": "likelihood"}, inplace=True)

In [0]:
output_csv_df

In [0]:
output_csv_name = f'{save_path}/fear_tweets_may-11-2020.csv'
output_csv_df.to_csv(output_csv_name, index=False)

In [23]:
!wget -N https://www.dropbox.com/s/jk1ifkoxc9u1rxi/scored.zip
!unzip -q scored.zip

--2020-05-11 23:03:52--  https://www.dropbox.com/s/jk1ifkoxc9u1rxi/scored.zip
Resolving www.dropbox.com (www.dropbox.com)... 162.125.3.1, 2620:100:601f:1::a27d:901
Connecting to www.dropbox.com (www.dropbox.com)|162.125.3.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/jk1ifkoxc9u1rxi/scored.zip [following]
--2020-05-11 23:03:53--  https://www.dropbox.com/s/raw/jk1ifkoxc9u1rxi/scored.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://ucee548a80b62e8a8fc4cc21d34a.dl.dropboxusercontent.com/cd/0/inline/A3iDz4WEhFLq_dRRaAh48B6EK4jVSIg79Bhx_JkMjtpN75a3CjSH5hs2TxRVUr4TrMfBtyYtK5E6QoALUopgEFuNvLMcdvZwDHfyaD6sHGNaZQ/file# [following]
--2020-05-11 23:03:53--  https://ucee548a80b62e8a8fc4cc21d34a.dl.dropboxusercontent.com/cd/0/inline/A3iDz4WEhFLq_dRRaAh48B6EK4jVSIg79Bhx_JkMjtpN75a3CjSH5hs2TxRVUr4TrMfBtyYtK5E6QoALUopgEFuNvLMcdvZwDHfyaD6sHGNaZQ/file
Resolving ucee548a80b62e8

In [27]:
%ls

[0m[01;36mdata[0m@  [01;34mgdrive[0m/  [01;36mmodels[0m@  [01;34moutput[0m/  output.zip  [01;34mscored[0m/  scored.zip


In [0]:
import pandas as pd
import glob

path = r'./scored' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)



In [32]:
frame

| | tweet | dates | file_name | likelihood |
|---|---|---|---|---|
| 0 | 75 years ago, Allied forces accepted Nazi Germ... | 2020-05-08 20:06:03 | Donald Norcross.csv | 0.844977 |
| 1 | Too many small businesses are missing out on #... | 2020-04-17 18:29:58 | Donald Norcross.csv | 0.680890 |
| 2 | Tonight, we commemorate #YomHaShoah &amp; reme... | 2020-04-20 22:27:58 | Donald Norcross.csv | 0.624146 |
| 3 | Absolutely. China hid this virus for months a... | 2020-03-18 13:18:56 | Adam Kinzinger.csv | 0.984388 |
| 4 | While the rest of the world comes together to ... | 2020-04-02 12:21:40 | Adam Kinzinger.csv | 0.926206 |
| ... | ... | ... | ... | ... |
| 5058 | When @senatemajldr says he doesn’t want to sup... | 2020-04-23 20:00:00 | Christopher A. Coons.csv | 0.578626 |
| 5059 | We need to activate American manufacturing so ... | 2020-05-07 22:13:29 | Christopher A. Coons.csv | 0.576368 |
| 5060 | There is absolutely no place for racism, hatre... | 2020-04-30 20:23:41 | Christopher A. Coons.csv | 0.567769 |
| 5061 | I'm a Democrat and @BillKristol is a lifelong ... | 2020-05-07 16:57:00 | Christopher A. Coons.csv | 0.548556 |
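
The same merge can be sketched with only the standard library, which may help when pandas is unavailable. This version writes two made-up score files to a temporary folder, reads them back, and sorts by `likelihood`. The file names and rows are invented for illustration.

```python
import csv
import glob
import os
import tempfile

fields = ["tweet", "dates", "file_name", "likelihood"]
samples = {
    "Member A.csv": [["tweet one", "2020-05-08 20:06:03", "Member A.csv", "0.84"]],
    "Member B.csv": [["tweet two", "2020-03-18 13:18:56", "Member B.csv", "0.98"],
                     ["tweet three", "2020-04-02 12:21:40", "Member B.csv", "0.92"]],
}

with tempfile.TemporaryDirectory() as scored_dir:
    # Write the made-up per-person files.
    for name, rows in samples.items():
        out_path = os.path.join(scored_dir, name)
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(fields)
            writer.writerows(rows)

    # Read every CSV back into one list, as the pandas cell does with concat.
    merged = []
    for filename in sorted(glob.glob(os.path.join(scored_dir, "*.csv"))):
        with open(filename, newline="", encoding="utf-8") as f:
            merged.extend(csv.DictReader(f))

# CSV values are strings, so convert before sorting numerically.
merged.sort(key=lambda row: float(row["likelihood"]), reverse=True)
print([row["tweet"] for row in merged])
```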


In [0]:
output_csv_name = f'{save_path}/fear_tweets_may-11-2020.csv'
frame.to_csv(output_csv_name, index=False)