# Twitter Stocker
## Using the Twitter API to implement sentiment analysis on different sets of stocks

## Section 1) Getting the Tweepy API working for Python

* Import the dependencies for the rest of the project here; this section will be kept up to date
* Create a Twitter account (out of scope) and, through the developer section, get a set of `OAuth` credentials
* Instantiate the API and make some test searches
* Define some helper functions

### 1) Import the necessary libraries 

In [1]:
import pandas as pd
import numpy as np

In [2]:
import tweepy
from tweepy import OAuthHandler, Stream, StreamListener
tweepy.__version__

'3.5.0'

In [3]:
import datetime
import pandas_datareader.data as pdr
import pytz

In [4]:
import configparser

In [5]:
config = configparser.ConfigParser()

In [6]:
config.read('config/keys.txt')

['config/keys.txt']
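
The `config['DEFAULT'][...]` lookups in the next cell imply that `config/keys.txt` is an INI-style file with the four keys under a `[DEFAULT]` section. The sketch below shows that assumed layout with placeholder values. It also adds a hypothetical `load_credentials` helper that fails loudly if a key is missing, and it parses a string with `ConfigParser.read_string` so it runs without the real file:

```python
import configparser

# Hypothetical contents of config/keys.txt; the values are placeholders only.
SAMPLE_KEYS = """
[DEFAULT]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
access_token = YOUR_ACCESS_TOKEN
access_secret = YOUR_ACCESS_SECRET
"""

REQUIRED = ("consumer_key", "consumer_secret", "access_token", "access_secret")

def load_credentials(text):
    """Parse INI text and return the four OAuth values, raising KeyError if any is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    missing = [k for k in REQUIRED if k not in parser["DEFAULT"]]
    if missing:
        raise KeyError(f"missing keys in [DEFAULT]: {missing}")
    return {k: parser["DEFAULT"][k] for k in REQUIRED}

creds = load_credentials(SAMPLE_KEYS)
print(creds["consumer_key"])  # YOUR_CONSUMER_KEY
```

`config.read` returns the list of files it actually parsed. The `['config/keys.txt']` output above therefore confirms the file was found; an empty list would mean the path was wrong.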

### 2) Authenticate User: ideally, use your own login credentials

In [7]:
consumer_key = config['DEFAULT']['consumer_key']
consumer_secret = config['DEFAULT']['consumer_secret']
access_token = config['DEFAULT']['access_token']
access_secret = config['DEFAULT']['access_secret']

In [8]:
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

# plug into the matrix
api = tweepy.API(auth)

### 3) Test the tweepy API with Microsoft

In [9]:
test_stock = 'Microsoft'
test_ticker = 'MSFT'

In [10]:
search_results = api.search(q=test_stock, count=10, lang='en')

In [11]:
for s in search_results[:3]:
    print(s.created_at)
    print(s.text)

2019-04-03 00:15:48
RT @chrisheauxx: nigga recorded this with microsoft word 2003 https://t.co/eSeMrdGPjZ
2019-04-03 00:15:47
@DChaikovskiy @THEREALRTU You're saying this bullshtick because you only have a Sony system because Nintendo and Mi… https://t.co/xlodbfWvMp
2019-04-03 00:15:46
@USCSIsrael @mikejwalker @Microsoft @DigitalChamber Nothing to see here


### 4) Define helper functions:

* One to get the corpus (text) from a tweet
* One to perform a Twitter search with a string and collect 100 tweets posted before a given date
* One to combine the gainers' and losers' tweets into a set of 300 tweets, assigning each tweet to its company

In [12]:
def get_corpus(status):
    """
    Given a tweepy.models.Status object, returns the corpus as a str object
    
    :params: status, tweepy.models.Status
    :returns: corpus, str
    """
    if isinstance(status, tweepy.models.Status):
        return status.text
    raise TypeError("Input not of type tweepy.models.Status")
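
The sample output above contains retweet prefixes, @mentions and t.co links. Stripping some of that noise may help before scoring sentiment. Below is a minimal sketch of a hypothetical `clean_tweet_text` helper, which is not part of the original notebook. The regular expressions are assumptions about what counts as noise. Mentions are kept by default because a handle such as `@Microsoft` may itself identify the company:

```python
import re

# Assumed noise patterns: a leading retweet marker, @mentions and http(s) links.
RT_PREFIX = re.compile(r"^RT @\w+:\s*")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")

def clean_tweet_text(text, drop_mentions=False):
    """Strip the 'RT @user:' prefix and URLs (and optionally @mentions), then collapse whitespace."""
    text = RT_PREFIX.sub("", text)
    text = URL.sub("", text)
    if drop_mentions:
        text = MENTION.sub("", text)
    return " ".join(text.split())

raw = "RT @someone: Great quarter for @Microsoft https://t.co/abc123"
print(clean_tweet_text(raw))                      # Great quarter for @Microsoft
print(clean_tweet_text(raw, drop_mentions=True))  # Great quarter for
```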

In [16]:
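def collect_tweets(query,
                   limit=1000,
                   dt=None,
                   tz='US/Eastern'):
    """
    Searches Twitter for `query` and returns up to 100 tweets created before `dt`

    :params: query, str; limit, max tweets to scan; dt, naive local datetime
             (defaults to now); tz, timezone name used to interpret `dt`
    :returns: list of tweepy.models.Status
    """
    if not isinstance(query, str):
        raise TypeError("query must be a str")

    # Resolve the default at call time; a now() default in the signature
    # would be frozen at the moment this cell was first run
    if dt is None:
        dt = datetime.datetime.now()

    local_tz = pytz.timezone(tz)
    local_dt = local_tz.localize(dt)

    valid_results = []
    for s in tweepy.Cursor(api.search, q=query, count=100, lang='en').items(limit):
        # tweepy 3.x returns created_at as a naive datetime in UTC
        if pytz.utc.localize(s.created_at) < local_dt:
            valid_results.append(s)

    if len(valid_results) < 100:
        print("WARN: Fewer than 100 results, consider expanding search")

    return valid_results[:100]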
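
In tweepy 3.x, `Status.created_at` is a naive `datetime` expressed in UTC. It should therefore be treated as UTC, not as `US/Eastern`, before it is compared with a local cutoff. The dependency-free sketch below shows that comparison. It uses a hypothetical `FakeStatus` stand-in for `Status` objects and a fixed UTC-4 offset (Eastern Daylight Time), assumed here in place of pytz's `US/Eastern`:

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for tweepy.models.Status with only the fields used here.
FakeStatus = namedtuple("FakeStatus", ["created_at", "text"])

EDT = timezone(timedelta(hours=-4))  # fixed offset standing in for US/Eastern in summer

def filter_before(statuses, cutoff_local, n=100):
    """Keep up to n statuses created strictly before cutoff_local (an aware datetime).

    created_at is assumed to be a naive UTC datetime, as in tweepy 3.x.
    """
    kept = []
    for s in statuses:
        created_utc = s.created_at.replace(tzinfo=timezone.utc)
        if created_utc < cutoff_local:  # aware datetimes compare correctly across zones
            kept.append(s)
    return kept[:n]

cutoff = datetime(2019, 4, 2, 20, 0, tzinfo=EDT)  # 8 pm Eastern == 00:00 UTC on 3 April
sample = [
    FakeStatus(datetime(2019, 4, 3, 0, 15), "after the cutoff"),
    FakeStatus(datetime(2019, 4, 2, 23, 59), "before the cutoff"),
]
print([s.text for s in filter_before(sample, cutoff)])  # ['before the cutoff']
```

Localizing the same naive timestamps as Eastern instead would shift them four hours later, and the second tweet would be wrongly dropped.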
    

In [17]:
def assemble_corpus(tweet_dict):
    pass
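
`assemble_corpus` is still a stub. One possible shape is sketched below. It assumes, hypothetically, that `tweet_dict` maps each ticker to a list of tweet texts already extracted with `get_corpus`. The sketch flattens that dictionary into labelled rows so every tweet stays attached to its company. `assemble_corpus_sketch`, the `group_of` mapping and the tickers `AAA`/`ZZZ` are illustrative placeholders:

```python
def assemble_corpus_sketch(tweet_dict, group_of=None):
    """Flatten {ticker: [text, ...]} into a list of row dicts.

    group_of is an optional, hypothetical {ticker: 'gainer' | 'loser'} mapping;
    tickers missing from it get group None.
    """
    group_of = group_of or {}
    rows = []
    for ticker, texts in tweet_dict.items():
        for text in texts:
            rows.append({"ticker": ticker, "group": group_of.get(ticker), "text": text})
    return rows

rows = assemble_corpus_sketch(
    {"AAA": ["up big today", "strong guidance"], "ZZZ": ["missed estimates"]},
    group_of={"AAA": "gainer", "ZZZ": "loser"},
)
print(len(rows), rows[2])
# 3 {'ticker': 'ZZZ', 'group': 'loser', 'text': 'missed estimates'}
```

Passing `rows` to `pd.DataFrame` would give one tweet per row, ready for a sentiment-scoring column.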
    

In [18]:
tweets = collect_tweets('GRANITE')

In [19]:
len(tweets)

100

In [21]:
tweets[0].text

'RT @RepAnnieKuster: NH families deserve to know their water is protected from pollution, yet the president’s #DirtyWaterRule will undo crit…'

## Section 2: Determining the Winners and Losers for a given day

* Define the scope of the problem. Here, I have chosen the NASDAQ index and pulled a .csv file listing its companies
* Define functions to find the winners and losers, ranked by their relative daily change, diff = (price_close - price_open) / price_open

In [None]:
datetime.datetime.now()

#### 1) Read in all companies on the NASDAQ from a list I pre-populated

In [None]:
# forward slashes work on Windows too; a backslash relies on '\c' not being an escape sequence
ticker_df = pd.read_csv('files/companylist.csv')

In [None]:
ticker_df.columns

In [None]:
tickers = ticker_df['Symbol']
tickers.head()

In [None]:
# the trading day to analyze, and the day after it
start_dt = datetime.datetime(2019, 3, 27)
end_dt = start_dt + datetime.timedelta(1)
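
`_get_diff` (defined below) asserts that exactly one row comes back, so the chosen date must be a trading day; 27 March 2019 was a Wednesday. The sketch below shows a hypothetical `previous_weekday` helper that rolls a date back to the most recent weekday. It does not know about exchange holidays, which would need a separate calendar:

```python
import datetime

def previous_weekday(dt):
    """Return dt if it falls Monday to Friday, otherwise the preceding Friday.

    Exchange holidays are NOT handled.
    """
    # weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    while dt.weekday() >= 5:
        dt -= datetime.timedelta(days=1)
    return dt

print(previous_weekday(datetime.datetime(2019, 3, 30)))  # 2019-03-29 00:00:00
```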

#### 2) Define three helper functions:

* One to compute a stock's relative gain or loss for a given day
* One to compile a dictionary of the gains/losses of all (or some) of the NASDAQ's stocks for a given day.
By default, the limit is set to 50 of the roughly 3,500 NASDAQ companies; set limit=None to
scan the entire NASDAQ index
* One to identify the winners and losers from that dictionary, returning a winners dictionary and a losers
dictionary based on the differential

In [None]:
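def _get_diff(df):
    """Relative change (close - open) / open for a single-row OHLC DataFrame."""
    assert df.shape[0] == 1, "expected exactly one row (one trading day)"
    row = df.iloc[0]
    return float((row['close'] - row['open']) / row['open'])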
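
`_get_diff` divides by the opening price, so stocks are ranked by relative (percentage) move rather than by dollar move. The pure-Python illustration below uses made-up prices for two hypothetical tickers to show how the two rankings can disagree:

```python
def absolute_change(open_, close):
    return close - open_

def relative_change(open_, close):
    # Same formula as _get_diff: (close - open) / open
    return (close - open_) / open_

# Made-up (open, close) prices for two hypothetical tickers
prices = {"CHEAP": (10.0, 12.0), "PRICEY": (200.0, 204.0)}

by_abs = max(prices, key=lambda t: absolute_change(*prices[t]))
by_rel = max(prices, key=lambda t: relative_change(*prices[t]))
print(by_abs, by_rel)  # PRICEY CHEAP
```

A $4 move on a $200 stock is only 2%, while a $2 move on a $10 stock is 20%. Normalizing keeps expensive stocks from dominating the winners list.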

In [None]:
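def get_all_diffs(tickers,
                  limit=50,
                  dt=None):
    """Return {ticker: relative change} for the first `limit` tickers on date `dt`."""
    # Resolve the default at call time rather than once at definition time
    if dt is None:
        dt = datetime.datetime.now() - datetime.timedelta(1)

    if limit and limit < len(tickers):
        _tickers = tickers[:limit]
    else:
        _tickers = tickers

    _diffs = dict()

    for ticker in _tickers:
        try:
            _df = pdr.DataReader(ticker, 'iex', dt, dt)
            _diffs[ticker] = _get_diff(_df)
        except Exception as e:
            # _df may be undefined (or left over from a previous ticker) if
            # DataReader itself raised, so report the exception instead
            print(ticker, repr(e))

    return _diffs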
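
`pdr.DataReader` needs network access, and depending on the pandas-datareader version the `'iex'` source may also need an API key. The loop is easier to test offline if the data fetch is passed in. The sketch below works under that assumption. The hypothetical `fetch(ticker)` returns an `(open, close)` pair or raises. Failures are collected in a dictionary instead of printed, and `limit=None` scans every ticker:

```python
def collect_diffs(tickers, fetch, limit=50):
    """Return ({ticker: relative change}, {ticker: error repr}).

    fetch is a hypothetical callable: fetch(ticker) -> (open, close).
    limit=None processes every ticker.
    """
    selected = list(tickers) if limit is None else list(tickers)[:limit]
    diffs, errors = {}, {}
    for ticker in selected:
        try:
            open_, close = fetch(ticker)
            diffs[ticker] = (close - open_) / open_
        except Exception as exc:  # keep going; remember why this ticker failed
            errors[ticker] = repr(exc)
    return diffs, errors

fake_prices = {"AAA": (10.0, 11.0), "BBB": (20.0, 19.0)}
demo_diffs, demo_errors = collect_diffs(["AAA", "BBB", "CCC"], lambda t: fake_prices[t])
print(demo_diffs)         # {'AAA': 0.1, 'BBB': -0.05}
print(list(demo_errors))  # ['CCC']
```

In the notebook, `fetch` could wrap `pdr.DataReader(ticker, 'iex', dt, dt)` and read the single row's `open` and `close`.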

In [None]:
def find_gainers_and_losers(diff_dict, n=3):
    """Return ({ticker: diff} for the n biggest gainers, {ticker: diff} for the n biggest losers)."""
    # Series indexed by ticker; drop tickers whose diff could not be computed
    _s = pd.Series(diff_dict, name='diff', dtype=float).dropna()
    _s = _s.sort_values(ascending=False)

    winners = _s.iloc[:n]
    losers = _s.iloc[-n:]

    return winners.to_dict(), losers.to_dict()
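
The same top-n / bottom-n selection can also be done without building a DataFrame. The sketch below uses the standard library's `heapq.nlargest` and `heapq.nsmallest`, dropping NaN values first as the `dropna` step above does. `top_and_bottom` is a hypothetical name:

```python
import heapq
import math

def top_and_bottom(diff_dict, n=3):
    """Return ({ticker: diff} for the n largest, {ticker: diff} for the n smallest)."""
    clean = {t: d for t, d in diff_dict.items() if d is not None and not math.isnan(d)}
    gainers = heapq.nlargest(n, clean.items(), key=lambda kv: kv[1])
    losers = heapq.nsmallest(n, clean.items(), key=lambda kv: kv[1])
    return dict(gainers), dict(losers)

sample_diffs = {"A": 0.05, "B": -0.02, "C": float("nan"), "D": 0.01, "E": -0.07}
print(top_and_bottom(sample_diffs, n=2))
# ({'A': 0.05, 'D': 0.01}, {'E': -0.07, 'B': -0.02})
```

One difference from the DataFrame version: the losers come back most negative first, whereas `iloc[-n:]` on a descending sort lists the worst performer last.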
    

In [None]:
diffs = get_all_diffs(tickers, 200, start_dt)

In [None]:
gainers, losers = find_gainers_and_losers(diffs)

In [None]:
gainers

In [None]:
losers

## Section 3: Combining the previous sections

Now that both the tweepy API and the pandas data reader are working, we want to abstract
their functionality into a more structured piece of software with clear inputs and outputs and robust parameter handling.
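
One way to get that parameter handling is to gather the run settings in a small validated object. The sketch below is hypothetical, and every field name is illustrative. It resolves the default date each time an instance is created. By contrast, a default such as `dt=datetime.datetime.now()` in a function signature is evaluated only once, when the `def` runs:

```python
import datetime
from dataclasses import dataclass
from typing import Optional

@dataclass
class StockerConfig:
    """Hypothetical settings for one run; field names are illustrative only."""
    trade_date: Optional[datetime.date] = None
    ticker_limit: Optional[int] = 50   # None means scan every ticker
    tweets_per_ticker: int = 100
    n_each: int = 3                    # how many gainers and how many losers

    def __post_init__(self):
        if self.trade_date is None:
            # Resolved per instance, unlike a now() default in a def signature
            self.trade_date = datetime.date.today() - datetime.timedelta(days=1)
        if self.ticker_limit is not None and self.ticker_limit <= 0:
            raise ValueError("ticker_limit must be a positive int or None")
        for name in ("tweets_per_ticker", "n_each"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")

cfg = StockerConfig(trade_date=datetime.date(2019, 3, 27), ticker_limit=200)
print(cfg.trade_date, cfg.ticker_limit, cfg.n_each)  # 2019-03-27 200 3
```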