# Twitter Feed Scraper and Sentiment Analysis

### Abstract

This notebook fetches tweets from the Twitter feed using a library called GetOldTweets3. We use this library because the standard Twitter API only returns data from the previous 7 days, and we need tweets from earlier dates. We then use the TextBlob library to compute the sentiment polarity of any text, in our case the tweets. Finally, we bundle this data so that it can be used as an input to a neural network model.

Importing necessary libraries

In [1]:
import numpy as np
import pandas as pd
import GetOldTweets3 as got
import datetime as dt
import pickle
from textblob import TextBlob 

Here we take the timestamps from the stock data. For each day we take a small sample of tweets and calculate the score of each tweet. We then aggregate those scores for that day. This aggregate is the Twitter score for that day's stock price.

In [2]:
df = pd.read_csv("daily_PYPL.csv")
df.head()

Unnamed: 0,timestamp,open,high,low,close,volume
0,2019-07-19,120.74,121.36,118.6,118.63,4913977
1,2019-07-18,118.99,120.8,118.63,119.87,4440672
2,2019-07-17,119.65,119.99,118.8,118.81,4627478
3,2019-07-16,121.1,121.48,119.53,119.71,4006838
4,2019-07-15,120.77,121.04,119.8,120.9,3721345


We take the timestamps starting from index 50 because the first 50 entries in the stock data will be used as input for predicting the next day's stock price.

In [3]:
df = df['timestamp'][50:500]

In [4]:
df.shape

(450,)
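
The slice keeps 450 timestamps. A minimal sketch of why an offset of 50 leaves exactly that many targets, assuming a hypothetical 50-row lookback window over 500 rows. The `make_windows` helper and the placeholder data are illustrative only and not part of the notebook; the actual layout of the model input is not shown here.

```python
def make_windows(values, window=50):
    """Pair each value from index `window` onward with the `window` values before it."""
    return [(values[i - window:i], values[i]) for i in range(window, len(values))]

# Placeholder series standing in for 500 rows of stock data (assumption).
rows = list(range(500))
pairs = make_windows(rows, window=50)

print(len(pairs))         # 450 targets, matching the shape above
print(len(pairs[0][0]))   # 50 inputs for the first target
```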

Here we try to load the tweets data if it is already saved; otherwise we scrape it from the Twitter feed. The parameters for the scraper are the timestamp and the Twitter search query.
In our case we use `$PYPL` as the search query, because on Twitter stock-related tweets are tagged using the `$` symbol followed by the stock's ticker symbol. We perform this for each timestamp in our stock data.

In [5]:
all_tweets = []

try:
    print('Loading Saved Tweets')
    # Reuse previously scraped tweets if a cached file exists.
    with open('tweets.data', 'rb') as f:
        all_tweets = pickle.load(f)
except OSError:
    print('No Tweets found. Downloading Tweets')
    for d in df:
        # Search window: from this day until the next day.
        fromDate = d
        toDate = dt.datetime.strptime(d, '%Y-%m-%d') + dt.timedelta(days=1)
        toDate = dt.datetime.strftime(toDate,'%Y-%m-%d')
        # Fetch at most 5 tweets matching $PYPL for this day.
        tweetCriteria = got.manager.TweetCriteria().setQuerySearch('$PYPL').setSince(fromDate).setUntil(toDate).setMaxTweets(5)
        tweets = got.manager.TweetManager.getTweets(tweetCriteria)
        print("Downloaded Tweets for date:" + fromDate)
        all_tweets.append(tweets)
    print('Saving Tweets')
    with open('tweets.data', 'wb') as f:
        pickle.dump(all_tweets, f)
print('Tweets Data Loaded')

Loading Saved Tweets
Tweets Data Loaded
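
Each query covers a single day: `setSince` receives the day itself and `setUntil` receives the following day. A small sketch of that date arithmetic in isolation, using only the standard library. The name `day_window` is a hypothetical helper, not part of GetOldTweets3.

```python
import datetime as dt

def day_window(date_str, fmt='%Y-%m-%d'):
    """Return (date_str, next_day_str) for a date given as a string."""
    start = dt.datetime.strptime(date_str, fmt)
    end = start + dt.timedelta(days=1)
    return date_str, end.strftime(fmt)

print(day_window('2019-07-19'))  # ('2019-07-19', '2019-07-20')
print(day_window('2019-12-31'))  # ('2019-12-31', '2020-01-01')
```

Going through `datetime` rather than editing the string directly keeps month and year rollovers correct.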


Once we have the tweets, we apply the TextBlob library to get the sentiment polarity (score) of each tweet. We then average them to get the score for that day.

In [6]:
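scores = []
for entry in all_tweets:
    # Sum the polarity (-1.0 to 1.0) of every tweet collected for this day.
    score = 0
    for tweet in entry:
        blob = TextBlob(tweet.text)
        score = score + blob.sentiment.polarity
    # Average over the tweets actually returned; a day may have fewer than 5.
    scores.append(score / len(entry) if entry else 0.0)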

Saving the scores using the pickle library

In [8]:
print('Saving Scores')
with open('tweetScore.data', 'wb') as f:
    pickle.dump(scores, f)

Saving Scores
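
The tweets cell earlier loads a pickle if one exists and otherwise builds and saves the data. The same pattern could be applied to the scores. A minimal sketch, assuming a hypothetical helper named `load_or_build`; the score values and the temporary directory are made up for the demonstration.

```python
import os
import pickle
import tempfile

def load_or_build(path, build):
    """Return the object pickled at `path`, or call `build()` and cache its result."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except FileNotFoundError:
        result = build()
        with open(path, 'wb') as f:
            pickle.dump(result, f)
        return result

# Demonstration in a temporary directory with made-up scores (assumption).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tweetScore.data')
    first = load_or_build(path, lambda: [0.1625, 0.0, -0.25])   # builds and saves
    second = load_or_build(path, lambda: [])                    # loads the cached list

print(first == second)
```

Using `with` blocks ensures the file handles are closed even if pickling fails.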


Creating a data frame to save all the data to a CSV file for future use.

In [15]:
tweet_text = []
tweet_score = []
for entry in all_tweets:
    for tweet in entry:
        tweet_text.append(tweet.text)
        blob = TextBlob(tweet.text)
        tweet_score.append(blob.sentiment.polarity)

# Avoid naming the variable `dict`, which would shadow the built-in type.
tweet_data = {'Tweets':tweet_text,'Score':tweet_score}

In [16]:
df = pd.DataFrame(tweet_data)

In [17]:
df.head()

Unnamed: 0,Score,Tweets
0,0.0,"IBD Big Cap 20 Ranked May 8, 2019 1. $VEEV $IN..."
1,0.4375,OptionAlarm provides Option Swing Trade Alerts...
2,0.0,http://bit.ly/2Uh4tyu Tim Sykes Alerts REVEALE...
3,0.375,$PYPL $FB: Facebook’s crypto project echoes bo...
4,0.0,I do not like the current situations BUT this ...


In [14]:
df.to_csv('twitter_data.csv')
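
The CSV written above holds only the tweet text and its score. If the date is needed later to join the tweets back to the stock data, one option is to carry it along while flattening. This is a sketch under the assumption that `all_tweets` is in the same order as the timestamps. The dates, the texts, and the `flatten` helper are made up for illustration, and the polarity function is passed in as a stand-in so the example runs without TextBlob.

```python
import csv
import io

def flatten(dates, tweets_per_day, polarity):
    """Yield one (date, text, score) row per tweet, keeping the day's date."""
    for date, texts in zip(dates, tweets_per_day):
        for text in texts:
            yield date, text, polarity(text)

# Made-up inputs (assumption): two days, the second with no tweets found.
dates = ['2019-05-08', '2019-05-07']
tweets_per_day = [['good quarter for $PYPL', 'flat day'], []]

def fake_polarity(text):
    # Stand-in for TextBlob(text).sentiment.polarity.
    return 0.7 if 'good' in text else 0.0

rows = list(flatten(dates, tweets_per_day, fake_polarity))

# Write to an in-memory buffer instead of a file for the demonstration.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Date', 'Tweets', 'Score'])
writer.writerows(rows)
print(buffer.getvalue())
```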

# Conclusion

We have extracted sample data from the Twitter feed and converted it to a sentiment score. This score can now be used as an input to the neural network for predicting the stock price based on Twitter data.

# Contribution:
1. Code contributed by me ~ 80%
2. Code taken from documentation: ~20%


# Citations:
1. https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis
2. https://github.com/Mottl/GetOldTweets3


<div class="alert alert-block alert-info">
MIT License 

Copyright (c) 2019 Ninad Subhedar 

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
</div>