### Overnight Returns

[Overnight Returns and Firm-Specific Investor Sentiment](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2554010)

> **Abstract**: We explore the possibility that overnight returns can serve as a measure of firm-specific investor sentiment by analyzing whether they exhibit characteristics expected of a sentiment measure. First, we document short-term persistence in overnight returns, consistent with evidence of short-term persistence in share demand of sentiment-influenced retail investors. Second, we find that short-term persistence is stronger for harder-to-value firms, consistent with evidence that sentiment plays a larger role when there is less objective data available for valuation. Third, we show that stocks with high (low) overnight returns underperform (outperform) over the longer-term, consistent with evidence of temporary sentiment-driven mispricing.

> **p2, I**: The recent work of Berkman, Koch, Tuttle, and Zhang (2012) suggests that a stock's overnight (close-to-open) return can serve as a measure of firm-level sentiment.

> **p3, I**: Specifically, Berkman et al. (2012) find that attention-generating events (high absolute returns or strong net buying by retail investors) on one day lead to higher demand by individual investors, concentrated near the open of the next day... This creates temporary price pressure at the open, resulting in elevated overnight returns that are reversed during the next trading day.

> **p3, I**: We conduct three sets of analysis. **In the first we test for short-run persistence in overnight returns.** The basis for expecting this from a measure of sentiment is the evidence in Barber et al. (2009) that the order imbalances of retail investors, who are the investors most likely to exhibit sentiment, persist for periods extending over several weeks... In the third analysis we examine whether stocks with high overnight returns underperform those with low overnight returns over the long term.

In [1]:
# Imports
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.data import morningstar as mstar
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline.factors.morningstar import MarketCap
from quantopian.pipeline.classifiers.morningstar import Sector
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline

from quantopian.pipeline.factors import SimpleMovingAverage
from quantopian.pipeline.factors import Returns
from zipline.pipeline.factors import DailyReturns

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import DataFrame as df
import time

from quantopian.pipeline.experimental import QTradableStocksUS

### Pipeline construction using QTradableStocksUS

In [2]:
def make_pipeline():
    average_day_dv_200 = AverageDollarVolume(window_length=200)
    market_cap = Fundamentals.market_cap.latest
    price_open = USEquityPricing.open.latest
    price_close = USEquityPricing.close.latest
    volume = USEquityPricing.volume.latest
    sector = Sector()
    daily_returns = DailyReturns(inputs=[USEquityPricing.close])

    # Ranked returns: 1-year (252 trading day) returns, ranked across all assets
    returns = Returns(window_length=252)
    ranked_returns = returns.rank()

    # New factors
    # 1-year returns, demeaned by sector, then ranked and z-scored
    factor = (
        returns
        .demean(groupby=sector)
        .rank()
        .zscore()
    )

    # Smooth the factor with a 5-day SimpleMovingAverage, then rank and z-score again
    factor_smoothed = (
        SimpleMovingAverage(inputs=[factor], window_length=5)
        .rank()
        .zscore()
    )

    return Pipeline(
        columns={
            'AverageDollarVolume': average_day_dv_200,
            'MarketCap': market_cap,
            'close_price': price_close,
            'open_price': price_open,
            'volume': volume,
            'sector': sector,
            '1yrReturns': factor,
            '5dAvgReturns': factor_smoothed,
            'ranked_returns': ranked_returns,
            '1dReturns': daily_returns
        },
        screen=QTradableStocksUS()
    )
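
The `demean(groupby=...)`, `rank()`, `zscore()` chain applied to the 1-year returns can be hard to picture from the pipeline code alone. Here is a minimal pandas sketch of the same three steps on a single day of made-up data. It is an approximation for intuition, not zipline's implementation: the returns, the sector codes, and the use of a population standard deviation (`ddof=0`) are assumptions.

```python
import pandas as pd

# Made-up one-year returns and sector codes for six assets on one day.
data = pd.DataFrame({
    "returns": [0.30, 0.10, -0.05, 0.50, 0.20, 0.00],
    "sector": [101, 101, 101, 311, 311, 311],
})

# Step 1: demean within each sector, so each asset is compared to its peers.
sector_mean = data.groupby("sector")["returns"].transform("mean")
demeaned = data["returns"] - sector_mean

# Step 2: rank across all assets (1 = lowest demeaned return).
ranked = demeaned.rank()

# Step 3: z-score the ranks (assumed population standard deviation, ddof=0).
zscored = (ranked - ranked.mean()) / ranked.std(ddof=0)

print(pd.DataFrame({"demeaned": demeaned, "rank": ranked, "zscore": zscored}))
```

Ranking before z-scoring makes the final score depend only on the ordering of assets, which limits the influence of extreme returns.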

In [3]:
# Run the pipeline over this date range; the result is a DataFrame indexed by (date, asset):
start_date = '2017'
end_date = '2019-08-03'

QTU_pipeline2 = run_pipeline(make_pipeline(), start_date, end_date, chunksize=252)



In [4]:
# Let's see which stocks had the largest trailing 1-year (252-day) returns as of August 2nd
leading_ytd_stocks = QTU_pipeline2.sort_values(by=['ranked_returns'], ascending = False)
leading_ytd_stocks.filter(like='2019-08-02', axis = 0).head(25)

date,asset,1dReturns,1yrReturns,5dAvgReturns,AverageDollarVolume,MarketCap,close_price,open_price,ranked_returns,sector,volume
2019-08-02 00:00:00+00:00,Equity(49607 [AXSM]),0.018846,1.729817,1.729248,15415280.0,863939700.0,25.95,25.7,7330.0,206,554169.0
2019-08-02 00:00:00+00:00,Equity(42749 [ENPH]),0.069627,1.725988,1.720529,28105330.0,3671445000.0,30.11,28.58,7323.0,311,10589028.0
2019-08-02 00:00:00+00:00,Equity(32726 [EHTH]),-0.012531,1.720882,1.723643,21754930.0,2318811000.0,102.44,103.74,7315.0,103,250048.0
2019-08-02 00:00:00+00:00,Equity(32215 [APPS]),0.027599,1.71833,1.719906,3525766.0,463175200.0,5.585,5.45,7312.0,311,1702343.0
2019-08-02 00:00:00+00:00,Equity(50735 [AYX]),0.120985,1.711948,1.70745,65467130.0,8228589000.0,131.57,123.25,7304.0,311,2407415.0
2019-08-02 00:00:00+00:00,Equity(50477 [IIPR]),-0.036067,1.712586,1.716169,24434160.0,1131358000.0,101.801,105.46,7303.0,104,329757.0
2019-08-02 00:00:00+00:00,Equity(48628 [NVTA]),0.011165,1.714501,1.713055,24059110.0,2545968000.0,27.17,27.0,7302.0,206,2089566.0
2019-08-02 00:00:00+00:00,Equity(50288 [TTD]),0.004899,1.708757,1.711187,199116900.0,11782820000.0,264.6,264.39,7298.0,311,878443.0
2019-08-02 00:00:00+00:00,Equity(50411 [RARX]),-0.032021,1.71131,1.713678,7399436.0,1546865000.0,32.95,34.42,7296.0,206,525195.0
2019-08-02 00:00:00+00:00,Equity(31341 [ZIOP]),-0.025937,1.705566,1.70309,6076891.0,1099411000.0,6.76,7.02,7289.0,206,2870513.0


In [5]:
# Let's see which stocks had the highest daily returns on August 2nd
leading_daily_stocks = QTU_pipeline2.sort_values(by=['1dReturns'], ascending = False)
leading_daily_stocks.filter(like='2019-08-02', axis = 0).head(25)


### Create individual custom dataframes

In [6]:
df_momentum = QTU_pipeline2[['1yrReturns', '5dAvgReturns']]
df_price = QTU_pipeline2[['close_price']]
df_returns = QTU_pipeline2[['1dReturns']]

df_price = df_price.close_price.unstack().fillna(0)
df_returns = df_returns['1dReturns'].unstack().fillna(0)
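
To see what `unstack()` does to the pipeline output, here is a minimal sketch on a tiny hand-made frame. The dates, tickers (`AAA`, `BBB`), and prices are made up; only the shape of the transformation is meant to match the cell above.

```python
import pandas as pd

# Hypothetical miniature of the pipeline output: a (date, asset) MultiIndex.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2019-08-01", "2019-08-02"]), ["AAA", "BBB"]],
    names=["date", "asset"],
)
toy = pd.DataFrame({"close_price": [10.0, 20.0, 11.0, 19.0]}, index=index)

# unstack() pivots the inner index level (asset) into columns,
# leaving one row per date and one column per asset.
wide = toy["close_price"].unstack()

print(wide)
```

In the notebook cell, `fillna(0)` then replaces any date/asset pair that is missing from the pipeline output with 0.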

### Walk through the "Returns" class
We'll walk through how the `Returns` class works, because we'll create a new class that inherits from `Returns` in order to calculate a customized return.

### Returns inherits from CustomFactor
The zipline package has a class [zipline.pipeline.factors.Returns](https://www.zipline.io/appendix.html?highlight=returns#zipline.pipeline.factors.Returns) which inherits from class [zipline.pipeline.CustomFactor](https://www.zipline.io/appendix.html?highlight=custom%20factor#zipline.pipeline.CustomFactor).  The [source code for Returns is here](https://www.zipline.io/_modules/zipline/pipeline/factors/basic.html#Returns), and the [source code for CustomFactor is here.](https://www.zipline.io/_modules/zipline/pipeline/factors/factor.html#CustomFactor) 

**Please open the links to the documentation and source code and follow along with our notes about the code**

### Inputs Variable
The CustomFactor class takes the `inputs` as a parameter of the constructor for the class, otherwise it looks for a class-level variable named `inputs`. `inputs` takes a list of BoundColumn instances. These help us choose what kind of price-volume data to use as input. The `Returns` class sets this to

```
inputs = [USEquityPricing.close]
```

### USEquityPricing Class
The class [USEquityPricing](https://www.zipline.io/appendix.html?highlight=usequitypricing#zipline.pipeline.data.USEquityPricing) has several BoundColumn instances that we can choose from:

close = USEquityPricing.close

high = USEquityPricing.high

low = USEquityPricing.low

open = USEquityPricing.open

volume = USEquityPricing.volume

### Quiz 1
If we wish to calculate close-to-open returns, which columns from USEquityPricing do you think we'll want to put into the list and set as `inputs`?

### Answer 1
USEquityPricing.close & USEquityPricing.open

### window_length variable
The CustomFactor class takes `window_length` (an integer) as a constructor parameter, otherwise it looks for a class-level variable named `window_length`. If we choose `window_length = 2`, this means that we'll be passing two days' worth of data (two rows) into the `compute` function.

### Quiz 2
What window length would you choose if you were calculating daily close to open returns? Assume we have daily data:

### Answer 2
window_length of 2 to have 2 days of data

### Compute Function
The function definition of the `Returns` class includes the compute function
```
    def compute(self, today, assets, out, close):
        out[:] = (close[-1] - close[0]) / close[0]
```

* `today`: this is handled by parent classes; it has the datetime for the "today" row for the given subset of data. We won't use it for this function implementation.
* `assets`: this is handled by parent classes; it has the column header names for `out` and `close`. We won't use it for this function implementation.
* `out`: this points to a numpy array that will store the result of our compute. It stores our "return" value of the `compute` function instead of explicitly returning a variable.
* `*inputs`: a tuple of numpy arrays that contain input data that we'll use to compute a signal. In the `Returns` definition of `compute`, the input is a single array `close`, but we can list more if we need additional columns of data to compute a return.
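
The role of `out` is easier to see outside the pipeline. The following is a minimal sketch in plain NumPy, not zipline's own machinery; the helper names `compute_in_place` and `compute_rebind` and the prices are made up for illustration. It shows why `compute` assigns to `out[:]` rather than to `out`.

```python
import numpy as np

def compute_in_place(out, close):
    # Slice assignment writes the result into the caller's array.
    out[:] = (close[-1] - close[0]) / close[0]

def compute_rebind(out, close):
    # Plain assignment only rebinds the local name `out`;
    # the caller's array is left untouched.
    out = (close[-1] - close[0]) / close[0]

# Two days of made-up close prices for two assets (oldest row first).
close = np.array([[100.0, 50.0],
                  [110.0, 45.0]])

result = np.full(2, np.nan)          # pre-allocated output buffer
compute_rebind(result, close)
still_empty = bool(np.isnan(result).all())

compute_in_place(result, close)
print(still_empty, result)
```

Because the caller only ever looks at the buffer it allocated, a `compute` that forgets the `[:]` would silently produce no output.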

If we set `window_length = 2`, then the `compute` function gets two rows' worth of data from `close`. The value at index 1 is the most recent, and the value at index 0 is the earliest in time. Recall that in Python, the -1 index refers to the last element, so for a numpy array of length two, -1 gives us the value at index 1.

So the line of code is calculating the one-day return using the close price and storing that into the `out` variable.

$Return = \frac{close_1 - close_0}{close_0} $
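
As a worked illustration of the formula with `window_length = 2`, the sketch below slides a two-row window over three days of made-up close prices and applies the same expression to each window. The sliding loop is an assumption about how the pipeline engine feeds `compute`, written in plain NumPy for clarity.

```python
import numpy as np

# Made-up close prices: rows are days (oldest first), columns are two assets.
close = np.array([
    [100.0, 50.0],
    [110.0, 45.0],
    [121.0, 54.0],
])

window_length = 2
daily_returns = []

# Each window holds `window_length` consecutive rows; row 0 is the older day
# and row -1 is the most recent day.
for end in range(window_length, close.shape[0] + 1):
    window = close[end - window_length:end]
    daily_returns.append((window[-1] - window[0]) / window[0])

daily_returns = np.array(daily_returns)
print(daily_returns)
```

Three days of prices yield two daily returns per asset, one for each complete two-day window.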

### Quiz 3
Given a numpy array for open prices called `open` and a numpy array for close prices called `close`, what code would you write to get the most recent open price? Assume that you have 2 days of data.

### Answer 3
`open[1]` or `open[-1]` (because Python lets us index from the end) are both valid answers. Using the -1 index gets the most recent price (the very last index of the numpy array) regardless of the window length, so you may prefer -1 to make your code easier to maintain or modify.
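
A quick check of that claim, using made-up prices. The arrays are named `open_2d` and `open_5d` (hypothetical names) rather than `open` so the sketch does not shadow Python's built-in `open()`.

```python
import numpy as np

# Made-up open prices, oldest first.
open_2d = np.array([10.0, 11.0])                    # 2-day window
open_5d = np.array([10.0, 11.0, 12.0, 13.0, 14.0])  # 5-day window

# With a 2-day window, index 1 and index -1 are the same element.
latest_2d_positive = open_2d[1]
latest_2d_negative = open_2d[-1]

# With a 5-day window, index 1 is an old row; only -1 is still the latest.
stale_5d = open_5d[1]
latest_5d = open_5d[-1]

print(latest_2d_positive, latest_2d_negative, stale_5d, latest_5d)
```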

### Close to Open Returns (Overnight Returns)
The close-to-open return is the change in price between when the market closed on one day and when it opened on the next. So it's

$
CloseToOpen = \frac{open_1 - close_0}{close_0}
$
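
A small numeric sketch of this formula, using made-up open and close prices for two hypothetical assets over two days. The intraday return is included only for contrast and is not part of the factor.

```python
import numpy as np

# Rows are days (oldest first), columns are two hypothetical assets.
opens = np.array([[99.0, 50.0],
                  [102.0, 47.5]])
closes = np.array([[100.0, 50.0],
                   [101.0, 48.0]])

# Overnight return: today's open relative to yesterday's close.
overnight = (opens[-1] - closes[0]) / closes[0]

# Intraday return for the most recent day, shown only for contrast.
intraday = (closes[-1] - opens[-1]) / opens[-1]

print(overnight, intraday)
```

Note that the two inputs are indexed differently: the open comes from the most recent row and the close from the previous row.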

We'll now create a class `CloseToOpenReturns` that inherits from `Returns` and overrides the `compute` function.

### Quiz 4
Create a customized class `CloseToOpenReturns` that inherits from the `Returns` class. Define the `compute` function to calculate overnight returns.

In [13]:
class CloseToOpenReturns(Returns):
    """
    Return between today's opening price and last night's closing price.
    """
    # Set window_length (we're calculating daily returns)
    window_length = 2
    
    # set inputs
    inputs = [USEquityPricing.open, USEquityPricing.close]
    
    # The compute method is passed the current day, the assets list, a pre-allocated out vector and the
    # factor's items in the list 'inputs'
    def compute(self, today, assets, out, opens, closes):
        # calculate close-to-open return and save into out[:]
        out[:] = (opens[-1] - closes[0]) / closes[0]

In [20]:
CloseToOpenReturns()

CloseToOpenReturns([EquityPricing<US>.open, EquityPricing<US>.close], 2)

### Trailing overnight returns
The cumulative overnight returns over a week may be predictive of future returns; hence it's kind of a momentum signal.

$
TrailingOvernightReturns = \sum_{t=1}^{Days} CloseToOpen_t
$
where $Days$ could be 5 if we are looking at a weekly window.

So we want to take the `CloseToOpenReturns` as our input into another class, `TrailingOvernightReturns`, which also inherits from `Returns`.
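
Before wiring this into the pipeline, the two steps can be sketched end to end in plain NumPy. The prices below are made up, and one pair of values is deliberately missing to show how NaN-aware summation treats gaps; this is an illustration of the arithmetic, not the pipeline classes themselves.

```python
import numpy as np

# Six days of made-up prices for two assets (rows: days, oldest first).
closes = np.array([
    [100.0, 20.0],
    [101.0, 20.0],
    [102.0, 21.0],
    [101.0, np.nan],   # missing close for the second asset
    [103.0, 22.0],
    [104.0, 22.0],
])
opens = np.array([
    [100.0, 20.0],
    [102.0, 19.0],
    [101.0, 20.0],
    [103.0, 21.0],
    [102.0, np.nan],   # missing open for the second asset
    [103.0, 23.0],
])

# Step 1: daily close-to-open returns, each open against the prior close.
close_to_open = (opens[1:] - closes[:-1]) / closes[:-1]   # shape (5, 2)

# Step 2: sum the 5 daily values per asset, treating NaN as zero.
trailing = np.nansum(close_to_open, axis=0)

print(close_to_open)
print(trailing)
```

Six days of prices are needed to obtain five overnight returns, because each return pairs an open with the previous day's close.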

### mask
Note that we're going to create another class that inherits from `Returns`. Recall that `Returns` inherits from [CustomFactor](https://www.zipline.io/appendix.html?highlight=custom%20factor#zipline.pipeline.CustomFactor), which has a `mask` parameter for its constructor. The `mask` parameter takes in a `Filter` object, which determines which stock series get passed to the `compute` function. Note that when we use `AverageDollarVolume(window_length=120).top(500)` and store its output in the variable `universe`, this `universe` variable is of type `Filter`.

### Quiz 5
If you wanted to create an object of the type `CloseToOpenReturns` and also define the object so that it only computes returns on the set of stocks in the `universe` defined in this notebook, what code would you write?

### Answer 5
We could instantiate a `CloseToOpenReturns` object with `CloseToOpenReturns(mask=universe)`, and this would only calculate the close-to-open returns for the stocks defined in our `universe` variable.
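
As a rough analogy for what a mask does, the plain-NumPy sketch below keeps only the top assets by dollar volume and leaves the rest as NaN. This is not how zipline implements `mask`; the numbers, the `top_n = 2` cutoff (the notebook's universe keeps 500), and the choice to fill excluded assets with NaN are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical average dollar volume and overnight returns for four assets.
avg_dollar_volume = np.array([5.0e6, 2.0e8, 7.5e7, 1.0e6])
overnight = np.array([0.01, -0.02, 0.03, 0.04])

# Boolean mask that keeps the top_n assets by average dollar volume.
top_n = 2
cutoff = np.sort(avg_dollar_volume)[-top_n]
mask = avg_dollar_volume >= cutoff

# Assets outside the mask get NaN instead of a computed value.
masked = np.where(mask, overnight, np.nan)

print(mask, masked)
```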

### numpy.nansum
Numpy has a `nansum` function that treats NaN (not a number) as zero. Note that by default, if we give `numpy.nansum` a 2D numpy array, it will calculate a single sum across all rows and columns. For our purposes, we want to compute a sum over 5 days (5 rows), where each column has daily close-to-open returns for a single stock. It helps to think of the matrix (2D numpy array) as a nested list of lists. This makes it easier to decide whether to set `axis = 0` or `axis = 1`.


```
    tmp =
    [
        [stock1day1, stock2day1],
        [stock1day2, stock2day2],
        ...
    ]
```

If we look at the outermost list, each element is a list that represents one day's worth of data. If we used `np.nansum(tmp, axis = 0)`, this would sum across the days for each stock. If we think of this as a 2D matrix, setting `axis = 0` is like calculating a sum for each column.

If we set `axis = 0` this applies `nansum` to the outermost list (axis 0), so that we end up with:

```
[
    sum_of_stock_1, sum_of_stock_2
]
```

Alternatively, if we set `axis = 1`, this applies `nansum` to the lists nested inside the outermost list. Each of these nested lists represents data for a single day, for all stocks, so that we get:

```
    [
    sum_of_day_1,
    sum_of_day_2
    ]
```

### Example using numpy.nansum

In [14]:
tmp = np.array([
    [1,2,3],
    [np.nan, np.nan, np.nan],
    [1,1,1]
])

np.nansum(tmp), np.nansum(tmp, axis = 0), np.nansum(tmp, axis=1)

(9.0, array([ 2.,  3.,  4.]), array([ 6.,  0.,  3.]))

### Quiz 6
For our purposes, we want a sum for each stock series. Which axis do you think we should use?

### Answer 6
We want to set `axis = 0` so that we have a sum for each stock (each column).

### Quiz 7
Create a class `TrailingOvernightReturns` that inherits from `Returns` and takes the cumulative weekly sum of overnight returns.

In [31]:
universe = AverageDollarVolume(window_length=120).top(500) 

In [32]:
class TrailingOvernightReturns(Returns):
    """
    Sum of trailing close-to-open returns; we expect sentiment persistence at short horizons, 
    so we look at the 5-day (i.e. 1 week) window
    """
    
    # choose a window_length to calculate a weekly return
    window_length = 5
    
    # set inputs to a list containing the daily close to open returns
    # filter the close to open returns by our stock universe
    
    inputs = [CloseToOpenReturns(mask=universe)]
    
    def compute(self, today, assets, out, close_to_open):
        # calculate the sum of close_to_open
        # choose the axis so that there is a sum for each stock (each column)
        # treat NaN as zeros
        out[:] = np.nansum(close_to_open, axis = 0)

In [33]:
TrailingOvernightReturns()

TrailingOvernightReturns([CloseToOpenReturns(...)], 5)

### Quiz 8
Create a factor by instantiating the `TrailingOvernightReturns` class that you just defined. Demean by sector, rank, and convert to a zscore.

In [34]:
def make_pipeline():
    average_day_dv_200 = AverageDollarVolume(window_length = 200)
    market_cap = Fundamentals.market_cap.latest
    price = USEquityPricing.close.latest
    volume = USEquityPricing.volume.latest
    sector = Sector()
    daily_returns = DailyReturns(inputs = [USEquityPricing.close])
    
    # Ranked Returns
    returns = Returns(window_length = 252)
    ranked_returns = returns.rank()
    
    # New Factors
    # create a factor of 1 year returns, demeaned by sector, rank, zscore
    factor = (
    Returns(window_length=252).\
    demean(groupby=Sector()).\
    rank().
    zscore()
    )
    
    # use the newly created factor as an input into SimpleMovingAverage, with window_length = 5
    factor_smoothed = (SimpleMovingAverage(inputs = [factor], window_length=5).\
                      rank().
                      zscore()
                      )
    
    # create an overnight_returns_factor variable
    overnight_returns_factor = (
    TrailingOvernightReturns().\
        demean(groupby = Sector()).\
        rank().\
        zscore()
    )
    
    return Pipeline(
        columns = {
            'AverageDollarVolume':average_day_dv_200,
            'MarketCap':market_cap,
            'Price':price,
            'Volume':volume,
            'Sector':sector,
            '1yrReturns':factor,
            '5dAvgReturns':factor_smoothed,
            'ranked_return':ranked_returns,
            '1dReturns':daily_returns,
            'o_n_sentiment': overnight_returns_factor
        },
        screen = QTradableStocksUS()
    )

In [35]:
make_pipeline()

<zipline.pipeline.pipeline.Pipeline at 0x7f315787e2d0>

In [36]:
# Run the pipeline over this date range; the result is a DataFrame indexed by (date, asset):
start_date = '2017'
end_date = '2019-08-02'

QTU_pipeline3 = run_pipeline(make_pipeline(), start_date, end_date, chunksize=252)



In [37]:
QTU_pipeline3

date,asset,1dReturns,1yrReturns,5dAvgReturns,AverageDollarVolume,MarketCap,Price,Sector,Volume,o_n_sentiment,ranked_return
2017-01-03 00:00:00+00:00,Equity(2 [ARNC]),-0.008552,-1.663162,-1.694499,1.736107e+08,8.129387e+09,18.550,101,3871123.0,-1.709949,609.0
2017-01-03 00:00:00+00:00,Equity(24 [AAPL]),-0.007624,0.242636,0.270176,3.048351e+09,6.175885e+11,115.840,311,23251752.0,-1.724268,3760.0
2017-01-03 00:00:00+00:00,Equity(31 [ABAX]),-0.001893,0.731982,0.638386,5.583182e+06,1.189172e+09,52.740,206,79145.0,-0.286704,1770.0
2017-01-03 00:00:00+00:00,Equity(41 [ARCB]),-0.015957,0.571808,0.623764,3.861210e+06,7.084623e+08,27.750,310,118470.0,0.257442,5667.0
2017-01-03 00:00:00+00:00,Equity(52 [ABM]),0.001717,1.027217,0.888291,8.639352e+06,2.268896e+09,40.680,310,257808.0,0.258064,6267.0
2017-01-03 00:00:00+00:00,Equity(53 [ABMD]),-0.008446,1.407291,1.366831,4.395544e+07,4.888170e+09,112.700,206,91316.0,-0.286081,5479.0
2017-01-03 00:00:00+00:00,Equity(62 [ABT]),0.002348,0.527013,0.414403,3.053665e+08,5.655141e+10,38.420,206,8426976.0,1.625899,1350.0
2017-01-03 00:00:00+00:00,Equity(64 [GOLD]),-0.024405,1.425616,1.415350,2.981478e+08,1.862359e+10,15.990,101,17842566.0,1.724268,6979.0
2017-01-03 00:00:00+00:00,Equity(67 [ADSK]),-0.013068,0.628819,0.669624,9.727194e+07,1.647140e+10,74.010,311,1035873.0,1.600995,4775.0
2017-01-03 00:00:00+00:00,Equity(76 [TAP]),-0.007141,-0.091286,-0.100028,1.395543e+08,2.090877e+10,97.330,205,577000.0,1.638973,3169.0
