# Research

by Joshua Isaacson and Hannah Isaacson 

Research notebook for our Fall 2017 SICE@IU undergraduate research project, *A Sentiment-Based Long-Short Equity Strategy*.

## Components

1. Universe Selection
2. Factor Analysis
3. Rebalancing
4. Portfolio
5. Pipeline

## Universe Selection

This component defines the trading universe in which the algorithm operates. We use Quantopian's built-in `Q1500US` filter, which restricts the universe to roughly 1,500 liquid US equities.

### Imports 

In [185]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from quantopian.pipeline.filters import Q1500US
from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.psychsignal import stocktwits
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters.fundamentals import IsPrimaryShare
from quantopian.pipeline.factors import CustomFactor, Returns
from quantopian.pipeline.classifiers.fundamentals import Sector
from quantopian.pipeline.data.sentdex import sentiment_free
from quantopian.pipeline.factors import SimpleMovingAverage
from time import time
import alphalens as al

### Universe

In [186]:
universe = Q1500US()
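`Q1500US` is a liquidity-based universe. The sketch below illustrates that general idea in plain Python: rank tickers by trailing average dollar volume and keep the top N. It is a simplified, hypothetical stand-in and **not** Quantopian's actual `Q1500US` definition, which applies additional tradability screens. The helper names (`average_dollar_volume`, `select_universe`) and the sample prices are made up for illustration.

```python
# Hypothetical, simplified liquidity filter: NOT Quantopian's exact Q1500US rules.

def average_dollar_volume(closes, volumes):
    """Mean of close * volume over the trailing window."""
    if not closes or len(closes) != len(volumes):
        raise ValueError("closes and volumes must be non-empty and equal length")
    return sum(c * v for c, v in zip(closes, volumes)) / float(len(closes))


def select_universe(history, top_n):
    """Return the top_n tickers ranked by average dollar volume (descending).

    `history` maps ticker -> (closes, volumes) over the same trailing window.
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = dict((t, average_dollar_volume(c, v)) for t, (c, v) in history.items())
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


# Illustrative two-day history (made-up numbers).
history = {
    "AAA": ([10.0, 11.0], [1000, 1200]),  # (10000 + 13200) / 2 = 11600
    "BBB": ([50.0, 52.0], [300, 250]),    # (15000 + 13000) / 2 = 14000
    "CCC": ([2.0, 2.5], [2000, 1800]),    # (4000 + 4500) / 2 = 4250
}
print(select_universe(history, top_n=2))  # ['BBB', 'AAA']
```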

## Factor Analysis

We want to test how well each candidate alpha factor predicts relative price movements across the universe. Combining several factors that are largely uncorrelated with one another tends to produce a more robust ranking scheme than relying on any single factor.

The factors we are going to evaluate are:
* bearish_intensity
* bullish_intensity
* sentiment_signal
* sentiment_signal moving averages (10-, 20-, 30-, 50-, and 80-day)
    * simple and exponential
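A common way to measure a factor's predictive power is the information coefficient (IC). This is the Spearman rank correlation between factor values and subsequent returns across the universe on a given date. As we understand it, this is also the basis of the IC analysis Alphalens reports. The same rank correlation computed between two factors indicates how redundant they are.

The sketch below computes this for a single cross-section in plain Python. It is not Alphalens' implementation. The helper names and all numbers are hypothetical and for illustration only.

```python
# Minimal Spearman rank correlation, used here as an information coefficient (IC).

def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / float(n)
    my = sum(y) / float(n)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical cross-section of five equities on one date.
factor = [0.9, 0.1, -0.5, 0.4, -1.2]
forward_returns = [0.03, 0.01, -0.02, 0.02, -0.04]
other_factor = [-0.9, 0.2, 0.5, -0.3, 1.0]

print(spearman(factor, forward_returns))  # 1.0: factor ranks match return ranks
print(spearman(factor, other_factor))     # -1.0: mirror-image ranking, i.e. redundant
```

An IC near zero between two factors suggests they carry different information. A magnitude near 1 means one adds little beyond the other.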

### Fields in PsychSignal Dataset

In [187]:
def print_fields(dataset):
    # Print the dataset name followed by each column's name and dtype.
    print("Dataset: %s\n" % dataset.__name__)
    print("Fields:")
    for field in list(dataset.columns):
        print("%s - %s" % (field.name, field.dtype))
    print("\n")

for data in (stocktwits,):
    print_fields(data)

Dataset: stocktwits

Fields:
bull_scored_messages - float64
bullish_intensity - float64
symbol - object
bull_minus_bear - float64
bull_bear_msg_ratio - float64
source - object
bear_scored_messages - float64
total_scanned_messages - float64
asof_date - datetime64[ns]
bearish_intensity - float64




### Fields in Sentdex Sentiment Analysis Dataset

In [188]:
for data in (sentiment_free,):
    print_fields(data)

Dataset: sentiment_free

Fields:
sentiment_signal - float64
symbol - object
asof_date - datetime64[ns]




### Sentiment Signal Moving Averages

Simple Moving Averages

In [189]:
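# Rolling means of the daily Sentdex sentiment_signal, computed only for
# equities in the universe (mask=universe).
sma_10 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=10, mask=universe)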
sma_20 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=20, mask=universe)
sma_30 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=30, mask=universe)
sma_50 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=50, mask=universe)
sma_80 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=80, mask=universe)
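The factor list also calls for exponential moving averages, but only simple ones are built above. Quantopian's pipeline appears to provide an `ExponentialWeightedMovingAverage` factor (with a `from_span` constructor), which we have not used here. The plain-Python sketch below only illustrates how the two averages differ on a short series.

It makes two assumptions. First, the smoothing factor is alpha = 2 / (span + 1), the usual span convention. Second, the EMA is seeded with the first value. A pipeline factor computed over a finite `window_length` may weight the earliest observations differently. The sample series is hypothetical.

```python
# Sketch: simple vs exponential moving average of a daily sentiment series.

def sma(values, window):
    """Mean of the most recent `window` values; None until the window fills."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / float(window))
    return out


def ema(values, span):
    """Recursive EMA with alpha = 2 / (span + 1), seeded with the first value."""
    if span < 1:
        raise ValueError("span must be >= 1")
    alpha = 2.0 / (span + 1)
    out = []
    for v in values:
        if not out:
            out.append(float(v))
        else:
            out.append(alpha * v + (1 - alpha) * out[-1])
    return out


signal = [2.0, 2.0, 6.0, -2.0, 4.0]  # hypothetical daily sentiment_signal values
print(sma(signal, 3))
print(ema(signal, 3))  # [2.0, 2.0, 4.0, 1.0, 2.5]
```

The EMA reacts to the jump to 6.0 immediately and weights recent days more heavily. The SMA gives every day in its window equal weight.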

### Sector Codes

In [190]:
MORNINGSTAR_SECTOR_CODES = {
     -1: 'Misc',
    101: 'Basic Materials',
    102: 'Consumer Cyclical',
    103: 'Financial Services',
    104: 'Real Estate',
    205: 'Consumer Defensive',
    206: 'Healthcare',
    207: 'Utilities',
    308: 'Communication Services',
    309: 'Energy',
    310: 'Industrials',
    311: 'Technology',
}
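Once a `Sector()` column is added to the pipeline, this dictionary can translate its numeric codes into readable names. One use is checking how the universe is spread across sectors. The sketch below assumes that any code missing from the dictionary should be reported as 'Misc', the label the dictionary already uses for -1. That fallback is our choice, not documented Quantopian behavior. The helper names and sample codes are hypothetical.

```python
# Map Morningstar sector codes to names and count equities per sector.

MORNINGSTAR_SECTOR_CODES = {
     -1: 'Misc',
    101: 'Basic Materials',
    102: 'Consumer Cyclical',
    103: 'Financial Services',
    104: 'Real Estate',
    205: 'Consumer Defensive',
    206: 'Healthcare',
    207: 'Utilities',
    308: 'Communication Services',
    309: 'Energy',
    310: 'Industrials',
    311: 'Technology',
}


def sector_name(code):
    """Return the sector name; unknown codes fall back to 'Misc' (our assumption)."""
    return MORNINGSTAR_SECTOR_CODES.get(code, MORNINGSTAR_SECTOR_CODES[-1])


def count_by_sector(codes):
    """Count how many equities fall into each named sector."""
    counts = {}
    for code in codes:
        name = sector_name(code)
        counts[name] = counts.get(name, 0) + 1
    return counts


codes = [311, 311, 206, 103, 999]  # hypothetical sector codes for five equities
print(count_by_sector(codes))
```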

### Getting Data

In [191]:
pipe = Pipeline()

pipe.add(stocktwits.bearish_intensity.latest, 'bearish_intensity')
pipe.add(stocktwits.bullish_intensity.latest, 'bullish_intensity')
pipe.add(sentiment_free.sentiment_signal.latest, 'sentiment_signal')
pipe.add(sma_10, 'sma_10')
pipe.add(sma_20, 'sma_20')
pipe.add(sma_30, 'sma_30')
pipe.add(sma_50, 'sma_50')
pipe.add(sma_80, 'sma_80')

pipe.set_screen(universe)

start_timer = time()
results = run_pipeline(pipe, '2015-01-01', '2016-01-01')
end_timer = time()

print("Time to run pipeline %.2f secs" % (end_timer - start_timer))

Time to run pipeline 27.22 secs


### Dealing with NaN Values

In [195]:
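# Caveat: interpolate() works row by row over the stacked (date, equity) index,
# so a gap can be filled from neighbouring rows that belong to other equities.
# Linear interpolation also draws on later observations (lookahead).
adjusted_dataset = results.interpolate()
adjusted_dataset.head()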

| date | equity | bearish_intensity | bullish_intensity | sentiment_signal | sma_10 | sma_20 | sma_30 | sma_50 | sma_80 |
|---|---|---|---|---|---|---|---|---|---|
| 2015-01-02 00:00:00+00:00 | Equity(2 [ARNC]) | 0.0 | 1.2 | 2.0 | 2.8 | 3.6 | 4.266667 | 4.26 | 2.7375 |
| 2015-01-02 00:00:00+00:00 | Equity(24 [AAPL]) | 1.82 | 1.46 | 2.0 | 1.8 | 0.2 | 0.8 | 0.8 | 0.875 |
| 2015-01-02 00:00:00+00:00 | Equity(41 [ARCB]) | 0.91 | 0.73 | 1.5 | -0.2 | -0.375 | 0.416667 | 0.88 | 1.325 |
| 2015-01-02 00:00:00+00:00 | Equity(62 [ABT]) | 0.0 | 0.0 | 1.0 | -2.2 | -0.95 | 0.033333 | 0.96 | 1.775 |
| 2015-01-02 00:00:00+00:00 | Equity(67 [ADSK]) | 1.7 | 0.0 | 6.0 | 6.0 | 6.0 | 5.933333 | 4.56 | 4.25 |
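The pipeline output stacks all equities for one date, then all equities for the next date. Filling gaps row by row can therefore mix values from different tickers.

The sketch below contrasts that with a per-equity forward fill, which only uses each equity's own past values. It assumes the rows arrive sorted by date, as in the pipeline output, and uses `None` for missing values. The helper names and numbers are hypothetical, and `interpolate_rowwise` is a simplified stand-in that fills interior gaps only. In pandas, a per-equity forward fill would likely look like `results.groupby(level=1).ffill()`.

```python
# Contrast: row-wise interpolation over stacked rows vs per-equity forward fill.

def interpolate_rowwise(values):
    """Linear interpolation over interior gaps, ignoring which equity a row is."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = out[a] + (out[b] - out[a]) * (i - a) / float(b - a)
    return out


def ffill_per_equity(rows):
    """Fill each missing value with the same equity's last observed value."""
    last_seen = {}
    filled = []
    for date, equity, value in rows:
        if value is None:
            value = last_seen.get(equity)  # stays None if never observed
        else:
            last_seen[equity] = value
        filled.append((date, equity, value))
    return filled


rows = [
    ("2015-01-02", "AAPL", 1.0), ("2015-01-02", "ABT", 5.0),
    ("2015-01-05", "AAPL", None), ("2015-01-05", "ABT", 7.0),
    ("2015-01-06", "AAPL", 3.0), ("2015-01-06", "ABT", None),
]

# Row-wise: AAPL's gap on 2015-01-05 becomes 6.0, computed from ABT's values.
print(interpolate_rowwise([v for _, _, v in rows]))
# Per equity: AAPL's gap becomes 1.0 and ABT's becomes 7.0, past values only.
print(ffill_per_equity(rows))
```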


### Filtering for Unique Equities

## TODO

* first name the equity index level, then drop duplicates based on it
* Alphalens tearsheet for:
    * bearish_intensity
    * bullish_intensity
    * sentiment_signal
    * sentiment moving averages
* choose factors
* choose how to split capital between long and short positions
* backtest
* analyze portfolio
* repeat backtests
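Two items on this list can be sketched now in plain Python: collecting the unique equities from the stacked (date, equity) rows, and one possible long/short split. In pandas, the first step would likely look like `results.index.get_level_values(1).unique()`.

For the split, the sketch assumes an equal-weighted, dollar-neutral quantile scheme: long the top 20% of scores, short the bottom 20%. Longs sum to +0.5 and shorts to -0.5. This is an illustrative assumption, not a decision the project has made. The helper names are hypothetical. The example scores are the `sma_10` values from the table above.

```python
# Sketch of two TODO items: unique equities, and a quantile long/short split.

def unique_equities(rows):
    """Equities in first-seen order, dropping repeats across dates."""
    seen = set()
    out = []
    for _, equity in rows:
        if equity not in seen:
            seen.add(equity)
            out.append(equity)
    return out


def long_short_weights(scores, quantile=0.2):
    """Equal-weight long the top `quantile` of scores and short the bottom.

    Longs sum to +0.5 and shorts to -0.5 (dollar neutral, gross exposure 1.0).
    """
    if len(scores) < 2:
        raise ValueError("need at least two equities")
    n = max(1, int(len(scores) * quantile))
    if 2 * n > len(scores):
        raise ValueError("quantile too large: long and short legs would overlap")
    ranked = sorted(scores, key=lambda k: (scores[k], k))  # ascending by score
    weights = {}
    for k in ranked[-n:]:
        weights[k] = 0.5 / n
    for k in ranked[:n]:
        weights[k] = -0.5 / n
    return weights


rows = [("2015-01-02", "AAPL"), ("2015-01-02", "ABT"), ("2015-01-05", "AAPL")]
print(unique_equities(rows))  # ['AAPL', 'ABT']

sma_10 = {"ARNC": 2.8, "AAPL": 1.8, "ARCB": -0.2, "ABT": -2.2, "ADSK": 6.0}
print(long_short_weights(sma_10))  # long ADSK at 0.5, short ABT at -0.5
```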