**Fin 585R**  
**Diether**  
**Problem Set**  
**Introduction to Portfolios**  
**Application: Short Selling**

**Overview**

This problem set introduces portfolio construction using a real portfolio application from the Finance literature. The assignment builds on the concepts and code I introduced last time in class. I think you'll find the Intro to Portfolios Jupyter notebook helpful for this assignment.


**Short Selling Background**

When someone shorts a stock, they profit if the price of the stock goes down instead of going up, but short-selling transactions are more complicated than going long (buying a stock and then later selling it).  There are four basic steps to short selling:<br>

1. *The short seller borrows the desired number of shares from someone.* This is usually arranged by the broker, who locates the shares and acts as the middleman between the short seller and the lender (the broker is often both the middleman and the lender). The lender expects to be paid interest on the loan, which is the main cost of shorting. The loan is callable by the lender at any time, and the short seller can repay the loan at any time. <br><br>

2. *The short seller sells the shares.* The proceeds are put into an interest-bearing account called the collateral account. Most lenders require the collateral account to contain 102% of the value of the proceeds. The collateral account usually invests in low-risk, short-term securities (e.g., Treasury bills). The interest earned on the collateral account is split between the lender and the short seller. The portion received by the short seller is called the rebate rate. The **loan fee** is the portion paid to the lender; it is equivalent to the interest rate the short seller pays on the stock loan, and it is typically small. The **loan fee** is therefore the main direct cost of shorting. The rebate rate can be zero or negative; a negative rebate rate means the lender receives all the interest in the collateral account and the short seller pays additional interest to the lender out of pocket.<br><br>

3. *Pay any dividends while the loan is open.* The short seller must pay to the lender the cash equivalent of any dividends paid out on the stock. If you short Apple, and Apple pays a 2 dollar dividend per share during the time you short the stock, then you owe the lender 2 dollars for every share you shorted. <br><br>

4. *Buy the shares back.* The short seller's profit is:

$$
\text{Profit} = \text{Sell} - \text{Buy} - \text{Interest Paid}
$$ <br> 
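
To make the profit formula and the rebate-rate split concrete, here is a minimal sketch with made-up numbers. The function names are hypothetical, and the interest calculation rests on my own simplifying assumptions (simple interest, a 365-day year, the loan fee charged on the sale proceeds). The dividend term comes from step 3; the formula above does not list it separately.

```python
def rebate_rate(collateral_rate_pct: float, loan_fee_pct: float) -> float:
    """Rebate rate: the part of the collateral interest the short seller keeps."""
    return collateral_rate_pct - loan_fee_pct


def short_sale_profit(sell_price, buy_price, shares, loan_fee_pct, days_open,
                      dividend_per_share=0.0):
    """Dollar profit on a short position: Sell - Buy - Interest Paid (- dividends).

    Assumptions (not from the text): simple interest, a 365-day year, and a
    loan fee charged on the sale proceeds.
    """
    proceeds = sell_price * shares
    cost_to_cover = buy_price * shares
    interest_paid = proceeds * (loan_fee_pct / 100) * (days_open / 365)
    dividends_owed = dividend_per_share * shares  # step 3: paid to the lender
    return proceeds - cost_to_cover - interest_paid - dividends_owed


# Hypothetical trade: short 10 shares at $100, cover at $90 one year later,
# with a 5% per annum loan fee and no dividends.
profit = short_sale_profit(100.0, 90.0, 10, 5.0, 365)
print(profit)

# A 2% collateral rate with a 5% loan fee implies a negative rebate rate.
print(rebate_rate(2.0, 5.0))
```

With a loan fee above the collateral rate, the rebate rate turns negative, which is the out-of-pocket case described in step 2.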
  
The Finance literature has been interested in whether short sellers have good information. Do short sellers typically make money when they short? If loan fees are high, then short sellers are paying a high price to short (that is, to bet that the stock will perform poorly). Therefore, stocks with high loan fees are likely the ones where short sellers have good information.

In this problem set you will create portfolios based on lagged loan fee to test the preceding hypothesis. Remember, the returns in the data are from going long (buying and then selling the stock). Your portfolios will reflect that fact. If the returns are really low or negative for a portfolio, then that means the short sellers are experiencing high returns.


**Data**

The data are monthly stock data for all stocks in the U.S. with non-missing loan fee data. The basic unit of observation is the stock month. You can download the data directly using the following link: [the data](http://diether.org/prephd/03-mstk_short_02-12.csv). There is also a link on *Learning Suite*. The data contain the following variables:

|Variable | Description                                            |
|---------|--------------------------------------------------------|
|permno   | stock identifier                                       |
|caldt    | calendar date                                          |
|ret      | monthly return                                         |
|prclag   | stock price, lagged                                    |
|melag    | market equity, lagged                                  |
|feelag   | the loan fee expressed as a percent per annum, lagged  |

**Tasks and Questions**  

1. What is the sample period of this data?<br><br>

2. There are some observations where `feelag` is less than zero. These are data errors. Remove these observations from the dataframe and then compute the summary statistics for `feelag`.<br><br>

3. Output the median and 90th percentile of `feelag` for every month. This can be done in separate commands. <br><br>

4. Construct three equal-weight portfolios using `feelag` as the criterion variable. Portfolio 0: an equal-weight portfolio that includes all stocks with `feelag` less than or equal to 3% (loan fees are expressed as an interest rate per annum). Portfolio 1: an equal-weight portfolio that includes all stocks with `feelag` greater than 3% and less than or equal to 5%. Portfolio 2: an equal-weight portfolio that includes all stocks with `feelag` greater than 5%. Hint: use `pd.cut` to bin the data and use my Jupyter notebook from last class as a guide.<br><br>

5. Compute summary statistics for each of the portfolios you created, including the mean, the standard deviation, and the t-statistic testing whether the mean return of the portfolio is zero.<br><br>

6. Are your results consistent with the hypothesis that short sellers have good information?

In [2]:
import pandas as pd
import numpy as np
from finance_byu.summarize import summary

In [3]:
df = pd.read_csv("03-mstk_short_02-12.csv",parse_dates=['caldt'])
df.head()

Unnamed: 0,permno,caldt,ret,prclag,melag,feelag
0,10001,2005-06-30,0.12843,8.02,21.053,0.15
1,10001,2005-07-29,0.009945,9.05,26.363,0.32701
2,10001,2005-08-31,0.039387,9.14,26.625,0.15
3,10001,2005-10-31,-0.11904,11.51,33.529,0.1625
4,10001,2005-11-30,-0.059397,10.1,29.421,0.15


What is the sample period of this data?

In [4]:
# datetime_is_numeric was removed in pandas 2.0; on newer versions call describe() without it.
df.caldt.describe(datetime_is_numeric=True)[["min", "max"]]

min    2002-06-28 00:00:00
max    2012-07-31 00:00:00
Name: caldt, dtype: object

**Corrected**:
`df['caldt'].min()` and `df['caldt'].max()`. The sample runs from June 2002 through July 2012.
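
As a version-independent alternative, both endpoints can be pulled in one call with `agg`. This is only a sketch on a toy frame; the dates below are made up and stand in for the real `caldt` column.

```python
import pandas as pd

# Toy stand-in for the problem-set data; the dates are made up.
toy = pd.DataFrame({"caldt": pd.to_datetime(["2005-06-30", "2005-07-29", "2005-08-31"])})

# First and last date in one call.
period = toy["caldt"].agg(["min", "max"])
print(period)
```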

There are some observations where `feelag` is less than zero. These are data errors. Remove these observations from the dataframe and then compute the summary statistics for `feelag`.

In [5]:
df = df.query("feelag >= 0").reset_index(drop = True)
# Probably not appropriate to use summary() here..?
df["feelag"].describe()

count    419500.000000
mean          1.030662
std           3.764338
min           0.000000
25%           0.095028
50%           0.150000
75%           0.361825
max          98.180000
Name: feelag, dtype: float64

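**NOTE:** The median loan fee is 0.15% per year, so the typical stock is very cheap to short. The mean (1.03%) is much higher than the median because it is pulled up by a right tail of stocks with very high fees.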

Output the median and 90th percentile of `feelag` for every month. This can be done in separate commands.

In [6]:
percentiles = [.5, .9]
# Do you want for every month for every year? Or just across the months?
df.groupby("caldt")["feelag"].quantile(percentiles).unstack()

caldt,0.5,0.9
2002-06-28,0.442105,1.87500
2002-07-31,0.250000,1.55537
2002-08-30,0.250000,1.62500
2002-09-30,0.227160,1.75000
2002-10-31,0.189690,1.39574
...,...,...
2012-03-30,0.104045,3.44536
2012-04-30,0.099750,3.43792
2012-05-31,0.108170,3.59560
2012-06-29,0.109820,3.91192
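
Grouping on `caldt` works when every stock in a month carries the same date stamp. If that is in doubt, one option is to group on the calendar month instead. The sketch below uses a toy frame with made-up fees; `to_period("M")` is my choice here, not something the assignment requires.

```python
import pandas as pd

# Toy data: two months, five hypothetical loan fees in each.
toy = pd.DataFrame({
    "caldt": pd.to_datetime(["2005-06-30"] * 5 + ["2005-07-29"] * 5),
    "feelag": [0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0],
})

# Group on the calendar month rather than the exact date stamp.
month = toy["caldt"].dt.to_period("M")
by_month = toy.groupby(month)["feelag"].quantile([0.5, 0.9]).unstack()
print(by_month)
```

The result has one row per month and one column per requested quantile, the same layout as the output above.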


Construct three equal-weight portfolios using `feelag` as the criterion variable. Portfolio 0: an equal-weight portfolio that includes all stocks with `feelag` less than or equal to 3% (loan fees are expressed as an interest rate per annum). Portfolio 1: an equal-weight portfolio that includes all stocks with `feelag` greater than 3% and less than or equal to 5%. Portfolio 2: an equal-weight portfolio that includes all stocks with `feelag` greater than 5%. Hint: use `pd.cut` to bin the data and use my Jupyter notebook from last class as a guide.

In [7]:
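# Caution: pd.cut defaults to right-closed bins that exclude the lowest edge, so rows with
# feelag == 0 get NaN here; passing include_lowest=True would keep them in portfolio 0.
df['bins'] = pd.cut(df['feelag'],[0,3,5,100],labels=False)
df.bins.describe()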

count    419165.000000
mean          0.131717
std           0.458930
min           0.000000
25%           0.000000
50%           0.000000
75%           0.000000
max           2.000000
Name: bins, dtype: float64
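
The count here (419,165) is lower than the `feelag` count after filtering (419,500). That gap is consistent with `pd.cut` dropping the rows where `feelag` is exactly 0, because the default bins are open on the left. A small sketch with hypothetical fees shows the difference `include_lowest=True` makes:

```python
import pandas as pd

# Hypothetical loan fees (percent per annum), including the edge values 0, 3, and 5.
fees = pd.Series([0.0, 0.15, 3.0, 4.0, 5.0, 98.18])
edges = [0, 3, 5, 100]

# Default: bins are (0, 3], (3, 5], (5, 100], so a fee of exactly 0 is unassigned.
default_bins = pd.cut(fees, edges, labels=False)

# include_lowest=True closes the first bin on the left, so 0 lands in bin 0.
closed_bins = pd.cut(fees, edges, labels=False, include_lowest=True)

print(default_bins.tolist())
print(closed_bins.tolist())
```

Fees of exactly 3 and 5 fall in bins 0 and 1 respectively in both versions, which matches the "less than or equal to" wording of the task.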

Compute summary statistics for each of the portfolios you created, including the mean, the standard deviation, and the t-statistic testing whether the mean return of the portfolio is zero.

In [8]:
port = df.groupby(['caldt','bins'])['ret'].mean()*100
port = port.unstack(level = "bins")
summary(port).loc[["count","mean", "std", "tstat"]].round(3)

bins,0.0,1.0,2.0
count,122.0,122.0,122.0
mean,0.972,-0.128,-0.766
std,6.548,9.464,9.395
tstat,1.639,-0.15,-0.9
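
As a sanity check, the t-statistics can be recomputed by hand. The sketch below assumes `summary` reports the usual one-sample t-statistic, mean / (std / sqrt(n)); I have not checked its source, but the reported numbers are consistent with that formula. The helper name `t_stat` is hypothetical.

```python
import math


def t_stat(mean: float, std: float, n: int) -> float:
    """One-sample t-statistic for the null hypothesis that the mean is zero."""
    return mean / (std / math.sqrt(n))


# Inputs copied from the summary table above: (mean, std) per bin, 122 months each.
reported = {"0.0": (0.972, 6.548), "1.0": (-0.128, 9.464), "2.0": (-0.766, 9.395)}
for label, (mean, std) in reported.items():
    print(label, round(t_stat(mean, std, 122), 2))
```

Because the inputs are already rounded to three decimals, the recomputed values can differ from the table in the last digit.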


Are your results consistent with the hypothesis that short sellers have good information?

We hypothesized that if short sellers are willing to pay high fees, they must have good information. The bin labeled "2.0" holds the stocks with the highest loan fees, and its mean return is negative (-0.766% per month), which is the direction the hypothesis predicts. However, the standard deviation of each portfolio is large relative to its mean return, so none of the three t-statistics is significant at conventional levels, and we cannot reject the null hypothesis that the mean return of any single portfolio is zero. Taken one portfolio at a time, our findings do not confirm the hypothesis.

# Spread Portfolio
`100% in Low - 100% in High`

In [13]:
port["spread"] = port[0.0] - port[2.0]
summary(port).loc[["count","mean", "std", "tstat"]].round(3)

bins,0.0,1.0,2.0,spread
count,122.0,122.0,122.0,122.0
mean,0.972,-0.128,-0.766,1.737
std,6.548,9.464,9.395,5.394
tstat,1.639,-0.15,-0.9,3.558


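The spread's mean return (1.737% per month, t-statistic 3.558) is statistically different from zero, so a portfolio that is long the low-fee stocks and short the high-fee stocks earned a positive average return over the sample, before subtracting the loan fees paid on the short leg.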