# Research

by Joshua Isaacson and Hannah Isaacson 

This notebook documents our Fall 2017 SICE@IU undergraduate research project, *A Sentiment-Based Long-Short Equity Strategy*.

## Components

1. Universe Selection
2. Alphalens Factor Analysis
3. Rebalancing
4. Portfolio
5. Pipeline

## Universe Selection

This component covers how we define the trading universe in which the algorithm operates.

### Imports 

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from quantopian.pipeline.filters import Q1500US
from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.psychsignal import stocktwits
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.filters.fundamentals import IsPrimaryShare
from quantopian.pipeline.factors import CustomFactor, Returns
from quantopian.pipeline.classifiers.fundamentals import Sector
from quantopian.pipeline.data.sentdex import sentiment_free
from quantopian.pipeline.factors import SimpleMovingAverage
from time import time
import alphalens as al

### Universe

TODO: explain why we chose this universe and which equities it contains.

In [3]:
universe = Q1500US()

## Factor Analysis

We want to test how well our alpha factors predict relative price movements. A wide range of mutually independent factors yields a better ranking scheme.

The factors we are going to evaluate are:
* bearish_intensity
* bullish_intensity
* sentiment_signal
* sentiment moving average (10, 20, 30, 50, 80 day)
    * simple and exponential
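One way to check the independence claim is to look at pairwise correlations between factor columns. The sketch below uses made-up values and hypothetical column names (`factor_a`, `factor_b`, `factor_c`), not our pipeline output. A correlation near 1 or -1 suggests that two factors carry largely the same information.

```python
import pandas as pd

# Made-up factor values for five equities on one day (hypothetical names).
factors = pd.DataFrame({
    'factor_a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'factor_b': [2.0, 4.0, 6.0, 8.0, 10.0],   # exact multiple of factor_a
    'factor_c': [1.0, -1.0, 0.0, -1.0, 1.0],  # unrelated to factor_a
})

# Pairwise Pearson correlations between the factor columns.
correlations = factors.corr()
print(correlations)
```

Here `factor_b` adds nothing beyond `factor_a`, while `factor_c` is uncorrelated with it and would be the more useful addition to a ranking scheme.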

### Fields in PsychSignal Dataset

In [4]:
def print_fields(dataset):
    # Print the name and dtype of every column in a pipeline dataset.
    print("Dataset: %s\n" % dataset.__name__)
    print("Fields:")
    for field in list(dataset.columns):
        print("%s - %s" % (field.name, field.dtype))
    print("\n")

print_fields(stocktwits)

Dataset: stocktwits

Fields:
bullish_intensity - float64
symbol - object
bull_bear_msg_ratio - float64
source - object
bear_scored_messages - float64
asof_date - datetime64[ns]
bull_minus_bear - float64
bearish_intensity - float64
bull_scored_messages - float64
total_scanned_messages - float64




### Fields in Sentdex Sentiment Analysis Dataset

In [5]:
# Reuse print_fields from the PsychSignal cell above.
print_fields(sentiment_free)

Dataset: sentiment_free

Fields:
symbol - object
asof_date - datetime64[ns]
sentiment_signal - float64




### Sentiment Signal Moving Averages

Simple Moving Averages

In [6]:
sma_10 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=10, mask=universe)
sma_20 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=20, mask=universe)
sma_30 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=30, mask=universe)
sma_50 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=50, mask=universe)
sma_80 = SimpleMovingAverage(inputs=[sentiment_free.sentiment_signal], window_length=80, mask=universe)
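For reference, this is roughly what a simple moving average computes, next to one common definition of an exponential moving average (smoothing factor 2 / (span + 1)). The helper names are ours for illustration only. This is plain Python with made-up values, not the Quantopian factor implementation.

```python
def simple_moving_average(values, window):
    """Mean of each trailing `window` of values (first output at index window - 1)."""
    return [sum(values[i - window + 1:i + 1]) / float(window)
            for i in range(window - 1, len(values))]


def exponential_moving_average(values, span):
    """Recursive EMA seeded with the first value; alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    averages = [float(values[0])]
    for value in values[1:]:
        averages.append(alpha * value + (1 - alpha) * averages[-1])
    return averages


signal = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up daily sentiment values
sma_3 = simple_moving_average(signal, 3)
ema_3 = exponential_moving_average(signal, 3)
print(sma_3)
print(ema_3)
```

The simple average weights every day in the window equally, while the exponential average gives recent days more weight and never fully drops older ones.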

### Sector Codes

In [7]:
MORNINGSTAR_SECTOR_CODES = {
     -1: 'Misc',
    101: 'Basic Materials',
    102: 'Consumer Cyclical',
    103: 'Financial Services',
    104: 'Real Estate',
    205: 'Consumer Defensive',
    206: 'Healthcare',
    207: 'Utilities',
    308: 'Communication Services',
    309: 'Energy',
    310: 'Industrials',
    311: 'Technology',
}
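As a small illustration of how such a code table can be used, the sketch below translates a made-up list of sector codes into names and counts equities per sector. `SECTOR_NAMES` is a partial copy of the table above, and `sector_name` is a hypothetical helper, not part of Quantopian or Alphalens.

```python
from collections import Counter

# Partial copy of the sector table above, for illustration.
SECTOR_NAMES = {
    -1: 'Misc',
    101: 'Basic Materials',
    206: 'Healthcare',
    310: 'Industrials',
    311: 'Technology',
}


def sector_name(code):
    """Look up a sector name, falling back to 'Unknown' for unlisted codes."""
    return SECTOR_NAMES.get(code, 'Unknown')


sample_codes = [101, 311, 310, 206, 311]  # made-up Sector column values
counts = Counter(sector_name(code) for code in sample_codes)
print(counts)
```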

### Getting Data

In [20]:
pipe = Pipeline()

pipe.add(stocktwits.bearish_intensity.latest, 'bearish_intensity')
pipe.add(stocktwits.bullish_intensity.latest, 'bullish_intensity')
pipe.add(sentiment_free.sentiment_signal.latest, 'sentiment_signal')
pipe.add(sma_10, 'sma_10')
pipe.add(sma_20, 'sma_20')
pipe.add(sma_30, 'sma_30')
pipe.add(sma_50, 'sma_50')
pipe.add(sma_80, 'sma_80')
pipe.add(Sector(), 'Sector')

pipe.set_screen(universe)

start_timer = time()
results = run_pipeline(pipe, '2015-01-01', '2016-01-01')
end_timer = time()

print("Time to run pipeline %.2f secs" % (end_timer - start_timer))

Time to run pipeline 45.70 secs


### Dealing with NaN Values

In [21]:
adjusted_dataset = results.interpolate()
adjusted_dataset.head()
#len(adjusted_dataset)

| | | Sector | bearish_intensity | bullish_intensity | sentiment_signal | sma_10 | sma_20 | sma_30 | sma_50 | sma_80 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2015-01-02 00:00:00+00:00 | Equity(2 [ARNC]) | 101 | 0.0 | 1.2 | 2.0 | 2.8 | 3.6 | 4.266667 | 4.26 | 2.7375 |
| 2015-01-02 00:00:00+00:00 | Equity(24 [AAPL]) | 311 | 1.82 | 1.46 | 2.0 | 1.8 | 0.2 | 0.8 | 0.8 | 0.875 |
| 2015-01-02 00:00:00+00:00 | Equity(41 [ARCB]) | 310 | 0.91 | 0.73 | 1.5 | -0.2 | -0.375 | 0.416667 | 0.88 | 1.325 |
| 2015-01-02 00:00:00+00:00 | Equity(62 [ABT]) | 206 | 0.0 | 0.0 | 1.0 | -2.2 | -0.95 | 0.033333 | 0.96 | 1.775 |
| 2015-01-02 00:00:00+00:00 | Equity(67 [ADSK]) | 311 | 1.7 | 0.0 | 6.0 | 6.0 | 6.0 | 5.933333 | 4.56 | 4.25 |
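One caveat worth checking: on a frame indexed by (date, equity), `interpolate()` fills linearly along the row order, so a gap can be filled from neighbouring rows that belong to other equities. The sketch below, with hypothetical tickers and made-up values, contrasts that with a per-equity forward fill. It is one possible alternative, not what the cell above does.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(['2015-01-02', '2015-01-05', '2015-01-06'])
assets = ['AAA', 'BBB']  # hypothetical tickers
index = pd.MultiIndex.from_product([dates, assets], names=['date', 'asset'])

# Both assets are missing a value on the middle date.
frame = pd.DataFrame(
    {'sentiment_signal': [1.0, 5.0, np.nan, np.nan, 3.0, 6.0]}, index=index)

# Row-order interpolation: the gap for AAA is filled between BBB's 5.0 and AAA's 3.0.
naive = frame.interpolate()

# Per-asset forward fill: each asset only ever sees its own earlier values.
per_asset = frame.groupby(level='asset').ffill()

middle_date = dates[1]
print(naive.loc[(middle_date, 'AAA'), 'sentiment_signal'])
print(per_asset.loc[(middle_date, 'AAA'), 'sentiment_signal'])
```

Forward filling also avoids using values from later dates, which interpolation between two known points does by construction.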


### Filtering for Unique Equities

# TODO

* first name the equity column, then drop duplicates based on it
* Alphalens tearsheet for:
    * bearish_intensity
    * bullish_intensity
    * sentiment_signal
    * sentiment moving averages
* choose factors
* choose how to distribute long and short
* backtest
* analyze portfolio
* repeat backtests
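The first item on the list could look roughly like the sketch below. It assumes a frame with an unnamed (date, equity) index like the pipeline output, and it uses hypothetical tickers and made-up values.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([
    (pd.Timestamp('2015-01-02'), 'AAA'),
    (pd.Timestamp('2015-01-02'), 'BBB'),
    (pd.Timestamp('2015-01-05'), 'AAA'),
])
frame = pd.DataFrame({'sentiment_signal': [2.0, 1.5, 6.0]}, index=index)

# Name the index levels so the equity level can be referred to by name.
frame.index.names = ['date', 'equity']

# Keep the first row seen for each equity.
unique_equities = frame.reset_index().drop_duplicates(subset='equity')
print(unique_equities)
```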

### Factor Output from Pipeline

All factors are taken from the pipeline's output after the NaNs have been interpolated.

In [22]:
bearish_intensity_factor = adjusted_dataset['bearish_intensity']
print(bearish_intensity_factor.head())

2015-01-02 00:00:00+00:00  Equity(2 [ARNC])     0.00
                           Equity(24 [AAPL])    1.82
                           Equity(41 [ARCB])    0.91
                           Equity(62 [ABT])     0.00
                           Equity(67 [ADSK])    1.70
Name: bearish_intensity, dtype: float64


In [23]:
bullish_intensity_factor = adjusted_dataset['bullish_intensity']
print(bullish_intensity_factor.head())

2015-01-02 00:00:00+00:00  Equity(2 [ARNC])     1.20
                           Equity(24 [AAPL])    1.46
                           Equity(41 [ARCB])    0.73
                           Equity(62 [ABT])     0.00
                           Equity(67 [ADSK])    0.00
Name: bullish_intensity, dtype: float64


In [24]:
sentiment_signal_factor = adjusted_dataset['sentiment_signal']
print(sentiment_signal_factor.head())

2015-01-02 00:00:00+00:00  Equity(2 [ARNC])     2.0
                           Equity(24 [AAPL])    2.0
                           Equity(41 [ARCB])    1.5
                           Equity(62 [ABT])     1.0
                           Equity(67 [ADSK])    6.0
Name: sentiment_signal, dtype: float64


In [25]:
sma_10_factor = adjusted_dataset['sma_10']
print(sma_10_factor.head())

2015-01-02 00:00:00+00:00  Equity(2 [ARNC])     2.8
                           Equity(24 [AAPL])    1.8
                           Equity(41 [ARCB])   -0.2
                           Equity(62 [ABT])    -2.2
                           Equity(67 [ADSK])    6.0
Name: sma_10, dtype: float64


In [26]:
sma_20_factor = adjusted_dataset['sma_20']
print(sma_20_factor.head())

2015-01-02 00:00:00+00:00  Equity(2 [ARNC])     3.600
                           Equity(24 [AAPL])    0.200
                           Equity(41 [ARCB])   -0.375
                           Equity(62 [ABT])    -0.950
                           Equity(67 [ADSK])    6.000
Name: sma_20, dtype: float64


In [27]:
sma_30_factor = adjusted_dataset['sma_30']
print(sma_30_factor.head())

2015-01-02 00:00:00+00:00  Equity(2 [ARNC])     4.266667
                           Equity(24 [AAPL])    0.800000
                           Equity(41 [ARCB])    0.416667
                           Equity(62 [ABT])     0.033333
                           Equity(67 [ADSK])    5.933333
Name: sma_30, dtype: float64


In [28]:
sma_50_factor = adjusted_dataset['sma_50']
print(sma_50_factor.head())

2015-01-02 00:00:00+00:00  Equity(2 [ARNC])     4.26
                           Equity(24 [AAPL])    0.80
                           Equity(41 [ARCB])    0.88
                           Equity(62 [ABT])     0.96
                           Equity(67 [ADSK])    4.56
Name: sma_50, dtype: float64


In [29]:
sma_80_factor = adjusted_dataset['sma_80']
print(sma_80_factor.head())

2015-01-02 00:00:00+00:00  Equity(2 [ARNC])     2.7375
                           Equity(24 [AAPL])    0.8750
                           Equity(41 [ARCB])    1.3250
                           Equity(62 [ABT])     1.7750
                           Equity(67 [ADSK])    4.2500
Name: sma_80, dtype: float64


We also want to see equity performance broken down by sector.

In [30]:
sectors = adjusted_dataset['Sector']

Grab the pricing data for the unique equities in our pipeline.

In [36]:
asset_list = adjusted_dataset.index.levels[1].unique()
prices = get_pricing(asset_list, start_date='2015-01-01', end_date='2016-01-01', fields='price')
print(asset_list)

[Equity(2 [ARNC]) Equity(24 [AAPL]) Equity(41 [ARCB]) ...,
 Equity(49496 [FDC]) Equity(49506 [HPE]) Equity(49515 [RACE])]


In [37]:
prices.head()

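| | Equity(2 [ARNC]) | Equity(24 [AAPL]) | Equity(41 [ARCB]) | Equity(53 [ABMD]) | Equity(62 [ABT]) | Equity(67 [ADSK]) | Equity(76 [TAP]) | Equity(110 [ACXM]) | Equity(114 [ADBE]) | Equity(122 [ADI]) | ... | Equity(49203 [GCI]) | Equity(49209 [BXLT]) | Equity(49210 [CC]) | Equity(49213 [ENR]) | Equity(49229 [KHC]) | Equity(49242 [PYPL]) | Equity(49279 [BUFF]) | Equity(49496 [FDC]) | Equity(49506 [HPE]) | Equity(49515 [RACE]) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-01-02 00:00:00+00:00 | 15.717 | 107.469 | 45.513 | 37.3 | 43.977 | 59.53 | 72.3 | 19.605 | 72.33 | 54.041 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2015-01-05 00:00:00+00:00 | 14.817 | 104.47 | 44.839 | 37.09 | 43.997 | 58.66 | 71.889 | 19.425 | 71.99 | 53.058 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2015-01-06 00:00:00+00:00 | 14.906 | 104.451 | 42.805 | 36.13 | 43.478 | 57.5 | 71.527 | 19.08 | 70.52 | 51.812 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2015-01-07 00:00:00+00:00 | 15.302 | 105.945 | 41.734 | 37.28 | 43.84 | 57.37 | 73.807 | 19.33 | 71.12 | 52.357 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2015-01-08 00:00:00+00:00 | 15.757 | 109.996 | 42.716 | 38.96 | 44.731 | 58.8 | 76.088 | 19.79 | 72.91 | 53.281 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |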


## Alphalens Factor Analysis

Now that the pipeline filters and gathers the equities, we use Alphalens, the factor analysis tool provided by Quantopian, to evaluate the alpha factors we want to test. The metrics it reports help us understand each factor's ability to predict future prices. We are looking for a high alpha, a beta close to 0, a high Sharpe ratio, and a high Spearman rank correlation between factor values and forward returns.
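To make the last metric concrete, the sketch below computes forward returns from a small price table and the Spearman rank correlation between one day's factor values and those returns. The tickers, prices, and factor values are made up, and the helpers `forward_returns` and `spearman_ic` are hypothetical names for illustration, not Alphalens functions.

```python
import pandas as pd


def forward_returns(prices, periods=1):
    """Return from each date to `periods` rows later, aligned to the start date."""
    return prices.pct_change(periods).shift(-periods)


def spearman_ic(factor_row, return_row):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return factor_row.rank().corr(return_row.rank())


prices = pd.DataFrame(
    {'AAA': [10.0, 11.0, 12.1], 'BBB': [20.0, 20.0, 19.0], 'CCC': [5.0, 5.1, 5.0]},
    index=pd.to_datetime(['2015-01-02', '2015-01-05', '2015-01-06']))

# Made-up factor values observed on the first date.
factor_day_one = pd.Series({'AAA': 6.0, 'BBB': -2.0, 'CCC': 1.0})

fwd = forward_returns(prices, periods=1)
ic = spearman_ic(factor_day_one, fwd.iloc[0])
print(ic)
```

In this toy case the factor orders the three equities exactly as their next-day returns do, so the correlation is at its maximum. Real factors typically score far lower.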

In [48]:
bearish_intensity_factor_data = al.utils.get_clean_factor_and_forward_returns(
    factor=bearish_intensity_factor,
    prices=prices,
    groupby=sectors,
    groupby_labels=MORNINGSTAR_SECTOR_CODES,
    periods=(1, 5, 10))

ValueError: Bin edges must be unique: array([ 0. ,  0. ,  0. ,  0. ,  0. ,  3.2])
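A likely cause of this error is that the factor values are bucketed into quantiles, and `bearish_intensity` is zero for most equities, so several quantile edges coincide. The sketch below reproduces the same kind of failure with `pandas.qcut` on made-up values and shows one possible workaround, ranking before bucketing. Whether this is the right fix for our factor is an open question. Depending on the Alphalens version, passing equal-width `bins` instead of quantiles may be another option, but we have not verified that here.

```python
import pandas as pd

# Made-up factor values for one day: mostly zeros, like bearish_intensity.
factor = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 3.2])

try:
    pd.qcut(factor, 5)
    qcut_failed = False
except ValueError as error:
    # Four of the six quantile edges are 0.0, so they are not unique.
    qcut_failed = True
    print(error)

# Workaround sketch: rank first so tied values get distinct positions, then bucket.
ranks = factor.rank(method='first')
quantile = pd.qcut(ranks, 5, labels=False) + 1
print(quantile.value_counts().sort_index())
```

Note that `method='first'` breaks ties by row order, so which of the zero-valued equities lands in which of the lower quantiles is arbitrary.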