Twitter has a very restrictive API and charges you for more than 50 requests a month (!), so I looked into other options for scraping the text of tweets. A classmate suggested Twint, which looks like a great tool, though it turned out to be hard to install for reasons I couldn't pin down, and then ended up not fully working on my machine.

Twint prints the tweets it scrapes, but it is also meant to be able to save the data to a csv, to json, or directly into a pandas dataframe. For some reason, on my machine it won't save the scraped data. So this notebook consists mostly of my workaround for getting at the scraped data: saving the scraper's printed output to a .txt file, then reading the contents of that file back into the notebook as a string on which the sentiment analysis can be performed. As with the NYT article scraping, I did the actual scraping with a .py file in the terminal rather than from the notebook, which would be a little overburdened by scraping at scale.

In [1]:
import twint
import nest_asyncio
nest_asyncio.apply()
import datetime
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import pandas as pd
import sys
import io
import re

In [2]:
#an example of a twint search on its own
search = twint.Config() #instantiate a twint object
search.To = 'Tesla' #attributes of the object relate to the particular search
search.Limit = 10
search.Since = '2019-07-04'
search.Until = '2019-07-05'
twint.run.Search(search) #perform the search

1146931552879857664 2019-07-04 19:59:25 EDT <TrashBoat3015> So this happened today, thought it was appropriate. They all can’t be Tesla, but the RW&B is represented. I’m the blue. 🇺🇸 pic.twitter.com/JesE4pecy3
1146931204932931585 2019-07-04 19:58:02 EDT <Auto_nerdz> To you to sir!
1146929631729078272 2019-07-04 19:51:47 EDT <MaryDaisy16> Elon, if I ever buy a car again it will be electric; and it will be because you have quietly shown us how and why. @elonmusk
1146928818281578500 2019-07-04 19:48:33 EDT <tonyjwinter> Happy #4ofJuly to #Tesla from happy owners! pic.twitter.com/SBtJrkDi7b
1146927706287890432 2019-07-04 19:44:08 EDT <R0BOT> Out here charging in Nacogdoches Texas on my way home to Houston from 4th of July festivities. pic.twitter.com/iiuHbJDEJy
1146926241968033792 2019-07-04 19:38:19 EDT <Adellasmith2032> Happy 4th of July! pic.twitter.com/NBLIPG44GN
1146924172146368512 2019-07-04 19:30:06 EDT <cryptonewszcom> @Tesla can maintain its deliveries momentum by driveway deliver

So this search is working just fine, but I don't have a good way of handling the output as is: twint simply prints the results. They can't be saved directly into an object and, as I mentioned, twint's functionality for saving the scraped data doesn't work on my machine.

So, how to actually store and access this data? Here's my workaround:

In [3]:
old_stdout = sys.stdout # Memorize the default stdout stream
sys.stdout = buffer = io.StringIO() #we're going to track from that point forward
try:
    search = twint.Config() #same search as before
    search.To = 'Tesla'
    search.Limit = 0
    search.Since = '2019-07-04'
    search.Until = '2019-07-05'
    twint.run.Search(search)
    output = buffer.getvalue() #saving the tracked data
finally:
    sys.stdout = old_stdout #put the default stream back so later cells print normally
with open('tweets.txt','w') as f: #writing over the file each time, this is temporary storage
    f.write(output)

Notice that when this cell is run, the output isn't printed. Instead, it has been saved to the .txt file, and we can access it from there:

In [4]:
with open('tweets.txt','r') as f: #the file is closed automatically when the block ends
    text = f.read()
text

"1146931552879857664 2019-07-04 19:59:25 EDT <TrashBoat3015> So this happened today, thought it was appropriate. They all can’t be Tesla, but the RW&B is represented. I’m the blue. 🇺🇸 pic.twitter.com/JesE4pecy3\n1146931204932931585 2019-07-04 19:58:02 EDT <Auto_nerdz> To you to sir!\n1146929631729078272 2019-07-04 19:51:47 EDT <MaryDaisy16> Elon, if I ever buy a car again it will be electric; and it will be because you have quietly shown us how and why. @elonmusk\n1146928818281578500 2019-07-04 19:48:33 EDT <tonyjwinter> Happy #4ofJuly to #Tesla from happy owners! pic.twitter.com/SBtJrkDi7b\n1146927706287890432 2019-07-04 19:44:08 EDT <R0BOT> Out here charging in Nacogdoches Texas on my way home to Houston from 4th of July festivities. pic.twitter.com/iiuHbJDEJy\n1146926241968033792 2019-07-04 19:38:19 EDT <Adellasmith2032> Happy 4th of July! pic.twitter.com/NBLIPG44GN\n1146924172146368512 2019-07-04 19:30:06 EDT <cryptonewszcom> @Tesla can maintain its deliveries momentum by driveway 

So, my strategy going forward is to run my searches one day at a time, saving all the text data into a temporary txt file, and then reading it back in to perform the sentiment analysis.
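
The capture step in this workaround can also be written with `contextlib.redirect_stdout` from the standard library, which puts `sys.stdout` back automatically when the block ends. This is only a minimal sketch: `fake_search` and `capture_printed_output` are hypothetical names, and `fake_search` just prints two of the lines shown earlier in place of a real `twint.run.Search(search)` call, on the assumption that twint writes its results with ordinary prints.

```python
import contextlib
import io


def fake_search():
    # Hypothetical stand-in for twint.run.Search(search): it only prints,
    # which is the behavior the workaround relies on.
    print("1146931204932931585 2019-07-04 19:58:02 EDT <Auto_nerdz> To you to sir!")
    print("1146926241968033792 2019-07-04 19:38:19 EDT <Adellasmith2032> "
          "Happy 4th of July! pic.twitter.com/NBLIPG44GN")


def capture_printed_output(func):
    """Run func and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    # redirect_stdout restores sys.stdout on exit, even if func raises
    with contextlib.redirect_stdout(buffer):
        func()
    return buffer.getvalue()


captured = capture_printed_output(fake_search)
lines = captured.splitlines()  # one entry per printed tweet
print(len(lines))
```

Because the captured text is already a string in memory, the round trip through a temporary file is optional with this approach; keeping the file is still handy for inspecting a day's scrape afterwards.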

In [6]:
#Date formatting function so that dates are strings in the format twint wants
def date_formatter(datetime_obj):
    #zero-pad month and day to two digits, giving YYYY-MM-DD
    return f'{datetime_obj.year}-{datetime_obj.month:02d}-{datetime_obj.day:02d}'

#function to perform the scrape, and return the text of the tweets using regex
def twitter_scrape(date, company):
    old_stdout = sys.stdout #remember the default stdout stream
    sys.stdout = buffer = io.StringIO() #capture everything twint prints
    try:
        search = twint.Config()
        search.To = company
        search.Limit = 10 #change this to a higher number when going to actually get data
        search.Since = date_formatter(date)
        search.Until = date_formatter(date + datetime.timedelta(1))
        twint.run.Search(search)
        output = buffer.getvalue()
    finally:
        sys.stdout = old_stdout #always restore stdout, even if the search fails
    with open('tweets.txt','w') as f: #this temp file is always getting written over...
        f.write(output)
    with open('tweets.txt','r') as f:
        text = f.read()
    tweets = text.split('\n') #split the text up into each individual tweet

    p = re.compile(r'.+<.+>(.+)') #the text of the tweet always follows <user_name>
    total_text = ''
    for tweet in tweets[:-1]:
        match = p.search(tweet)
        if match: #skip any line that doesn't look like a tweet
            total_text += match.group(1)
    return total_text
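
One caveat with the pattern `.+<.+>(.+)` is that it is greedy: if the tweet text itself contains a `>`, everything up to that last `>` is dropped. A stricter pattern, sketched below, anchors on the line layout seen in the printed output earlier (id, date, time, timezone, `<username>`, text). That layout is an assumption based only on the sample output in this notebook, and `LINE_PATTERN`, `parse_line` and the `tricky` line are made up for illustration.

```python
import re

# Assumed layout of one printed line, based on the sample output above:
# tweet id, date, time, timezone, <username>, then the tweet text
LINE_PATTERN = re.compile(r'^(\d+) (\S+) (\S+) (\S+) <([^>]+)> (.*)$')


def parse_line(line):
    """Return (tweet_id, username, text) for one printed line, or None."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    tweet_id, _date, _time, _tz, username, text = match.groups()
    return tweet_id, username, text


sample = "1146931204932931585 2019-07-04 19:58:02 EDT <Auto_nerdz> To you to sir!"
tricky = "1 2019-07-04 19:58:02 EDT <some_user> range > 300 miles"  # made-up line

print(parse_line(sample))
print(parse_line(tricky))

# For comparison, the greedy pattern keeps only what follows the last '>'
greedy = re.compile(r'.+<.+>(.+)')
print(repr(greedy.search(tricky).group(1)))
```

Returning the id and username alongside the text also makes it possible to score tweets individually later, rather than only as one concatenated string.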

In [9]:
start = datetime.datetime(2017,1,1)
twitter_scrape(start, 'Tesla')

" Hello 2017 . . . My model III is that much closer to delivery! @TeslaMotors  y @panasonic  comenzarán a producir paneles solares de manera conjunta  https://www.geektopia.es/es/technology/2016/12/28/noticias/tesla-y-panasonic-comenzaran-a-producir-paneles-solares-de-manera-conjunta.html\xa0… vía @geektopic @TeslaMotors enhanced autopilot to arrive before New Year.  http://readwrite.com/2016/12/28/tesla-autopilot-ex-launch-tl4/?utm_campaign=coschedule&utm_source=twitter&utm_medium=RWW&utm_content=Tesla's%20enhanced%20autopilot%20to%20arrive%20before%20New%20Year\xa0… via @RWW @DrivenGrowth that will, significantly, be more expensive :)) @TeslaMotors you need to hire this guy asap https://twitter.com/omgitsuzzi/status/815661468108619776\xa0… @TeslaMotors email to PaloAlto_Service@tesla.com is bouncing. Can you investigate? waiting for non-perf version of 100D MX I wana drive Tesla car, some day I will if u discount it 30% or so... That's really generous! Thanks 😁 Tesla come Poltrone&So

In [10]:
#finally to compile a dataframe of daily sentiment, and save down a csv
def compile_twitter_sentiment(start, company, n_days, year):
    df = pd.DataFrame(columns=['date','neg','pos','compound'])
    sid = SentimentIntensityAnalyzer() #NLTK's VADER sentiment analyzer (needs nltk.download('vader_lexicon') once)
    for n in range(0,n_days):
        date = start+datetime.timedelta(n)
        text = twitter_scrape(date, company)
        vader = sid.polarity_scores(text) #dict with 'neg', 'neu', 'pos' and 'compound' scores
        temp = pd.DataFrame({'date': [date_formatter(date)],
                             'neg': [vader['neg']],
                             'pos': [vader['pos']],
                             'compound': [vader['compound']]}) #one-row frame for this day
        df = pd.concat([df,temp]) #every row keeps index 0; pass ignore_index=True for a running index
        df.to_csv(f'{company}_twitter_sentiment_{year}.csv') #rewritten after every day, so the csv always holds the days scraped so far
    return df

In [11]:
compile_twitter_sentiment(start,'Tesla',7,2017)

Unnamed: 0,date,neg,pos,compound
0,2017-01-01,0.049,0.06,0.5245
0,2017-01-02,0.023,0.134,0.9815
0,2017-01-03,0.016,0.066,0.9342
0,2017-01-04,0.032,0.237,0.9932
0,2017-01-05,0.016,0.085,0.9488
0,2017-01-06,0.0,0.145,0.9896
0,2017-01-07,0.035,0.101,0.9472
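
The compound scores above sit close to 1 on most days even though the `neg` and `pos` shares are small. One likely contributor is that a whole day of tweets is scored as a single string. Assuming the compound score is the summed word valence x squashed as x / sqrt(x^2 + 15), the normalization used in the reference VADER implementation, a long text with many mildly positive words saturates quickly. The sketch below reproduces only that normalization step, not the full analyzer.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded summed valence into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


# The larger the summed valence (more sentiment-bearing words in the text),
# the closer the normalized score gets to 1
for summed_valence in (1, 4, 10, 30):
    print(summed_valence, round(normalize(summed_valence), 4))
```

If that saturation is a concern, one alternative is to score each tweet separately and average the per-tweet compound scores for the day.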
