Please write a program that takes a stock and a requested date range and returns an estimate of the stock's beta
(and whether it is statistically significantly different from the market's beta of 1) and its alpha (and whether
it is statistically significantly different from zero).  It should also supply the SCL (security characteristic line)
and any other metrics of interest (typically a table of summary statistics, ANOVA, and coefficients).
It should also write all the data to an Excel file in case the user wants to do further analysis.

Thus, one suggested procedure would be:

-Program should query the user to enter the number of stocks to analyze and dates of interest (start/end).

-Then for that number, user should enter ticker symbols.  Program should also query user as to what benchmark
is appropriate: give options that user can select (like S&P 500, Russell 3000, Nasdaq, etc.) and/or invite user
to enter their own index symbol.

-Program should ask the user to enter the appropriate risk-free rate to use.

-Program should scrape the price data from Yahoo, Google, or any other data site (there are plenty of examples
of code on the web that you may use).

-Program should convert price data to returns, run regression, and interpret output.   Interpretations should be
printed out to display and/or file for recordkeeping--perhaps in tabular form.

-Program should write data to file, preferably a multi-tabbed Excel file with each stock's regression in one tab.
 Multiple files are also acceptable.

Note:  there are code samples for Yahoo and Google Finance extractions available on the web; feel free to use these.
Also note that Yahoo provides both an "adjusted price" and a closing price.  You should always use the adjusted price since it incorporates the effects of dividends and splits.
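
As a rough sketch of the price-to-return conversion, the adjusted prices can be turned into simple returns with `pct_change`, and an annual risk-free rate can be spread over trading days to get excess returns. The helper name `excess_returns`, the 252-trading-day convention, the ticker `XYZ`, and the prices below are assumptions for illustration only.

```python
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days per year

def excess_returns(adj_close, annual_rf):
    """Convert adjusted closing prices to daily returns in excess of the risk-free rate."""
    # Simple period-over-period returns; the first row has no prior price and is dropped.
    returns = adj_close.pct_change().dropna()
    # Convert the annual rate (decimal, e.g. 0.02) to an equivalent daily rate.
    daily_rf = (1.0 + annual_rf) ** (1.0 / TRADING_DAYS) - 1.0
    return returns - daily_rf

# Made-up adjusted prices for a benchmark and one hypothetical stock.
prices = pd.DataFrame(
    {"SPY": [100.0, 101.0, 102.01], "XYZ": [50.0, 51.0, 49.98]},
    index=pd.date_range("2020-01-01", periods=3),
)
excess = excess_returns(prices, annual_rf=0.0)
excess_with_rf = excess_returns(prices, annual_rf=0.02)
print(excess)
```

Regressing the stock's excess return on the benchmark's excess return then gives the alpha and beta of the security characteristic line.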

In [1]:
import pandas as pd
import pandas_datareader.data as web
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels import regression
%matplotlib inline


In [5]:
# Returns intercept, first coefficient, regression summary. ref: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLSResults.summary2.html, https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html
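def linreg(x,y):
    # Add an intercept column so the fit estimates alpha as well as beta
    x = sm.add_constant(x)
    model = regression.linear_model.OLS(y,x).fit()

    # Convert to a plain array so positional indexing works for Series or ndarray inputs
    params = np.asarray(model.params)
    return params[0], params[1], model.summary2()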

def betas():
    benchmark = input('Which benchmark ticker would you like to use?\n(SPY, Russell, etc.): ')
    stocks = [benchmark]
    print('[Benchmark]:', stocks)
    print()
    
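    # Accept decimals (e.g. 0.02); int() would reject them.
    # NOTE: rf is collected here but is not yet applied to the returns below.
    rf = float(input('What would you like to use for risk free rate? '))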
    print('Rf: ', rf)
    print()
    
    num_stocks = int(input('How many companies do you want to look at? '))
    print('Number of stocks: ', num_stocks)
    print()
    
    while num_stocks != 0:
        s = input('Please enter ticker symbol for one stock: ')
        stocks.append(s)
        num_stocks -= 1
    print('Stocks: ', stocks) #Includes benchmark
    
    start_year = int(input('What year do you want to start on? '))
    end_year = int(input('What year do you want to end on? '))
    
    data = (web.DataReader(stocks,data_source='yahoo',start='01/01/'+str(start_year), 
                       end='12/31/'+str(end_year))['Adj Close'])
   
    print('\nData Head: ')
    print(data.head())
    print('Data Tail: ')
    print(data.tail())
    
#     returns = datas.pct_change()
#     returns = returns.dropna()
    
    stocks = stocks[1:] # Drop the benchmark; only the stocks to regress remain
    print('Stocks w/out benchmark: ', stocks)
    print()
    
    results = []
    
    # Create Pandas Excel writer object w/ engine xlsxwriter
    writer = pd.ExcelWriter('pandas_multiple.xlsx', engine='xlsxwriter')
    # Create xlsxwriter workbook object
    workbook  = writer.book

    # Regress each stock against the benchmark 
    for stock in stocks:
        returns = data[[benchmark, stock]].pct_change()
        returns = returns.dropna()
        print('Returns with no NAs: ', returns)
              
        X = returns[benchmark] #changed from 'stock'
        y = returns[stock] #changed from 'benchmark'
              
        a, b, model = linreg(X,y)
              
        df = model.tables[0] # Regression summary statistics
#         print('Df: ', df)
        df1 = model.tables[1] # Tests of the coefficients
#         print('Df1: ', df1)
        table = pd.concat([df, df1]) # Combine; DataFrame.append was removed in pandas 2.0
        print('Table: ', table)
            
        table.to_excel(writer, sheet_name='{}'.format(stock)) 
              
        plt.scatter(X,y, alpha=0.3) #alpha - value used for blending
              
        p = np.linspace(X.min(),X.max(), 100)
        print('P shape: ', p.shape) # shape is an attribute, not a method
              
        y1 = b*p + a
        print('Y1: ', y1.shape)
              
        plt.plot(p, y1, 'r', alpha=.9)
        plt.xlabel('Benchmark {} Return'.format(benchmark))
        plt.ylabel('{} return'.format(stock))
        plt.savefig('{}.png'.format(stock))
              
        worksheet = writer.sheets['{}'.format(stock)]
        worksheet.insert_image('A15', '{}.png'.format(stock))
        plt.clf() #Clear the current figure.
        
    # close() writes the workbook to disk; writer.save() was removed in pandas 2.0
    writer.close()

In [None]:
betas() #rerun

In [13]:
# import statsmodels.api as sm
# Y = [1,3,4,5,2,3,4]
# X = range(1,8) 
# print(X)
# X = sm.add_constant(X)
# print(X)
# model = sm.OLS(Y,X)
# print(model)
# results = model.fit()
# print(results)
# results.params
# resul

range(1, 8)
[[1. 1.]
 [1. 2.]
 [1. 3.]
 [1. 4.]
 [1. 5.]
 [1. 6.]
 [1. 7.]]
<statsmodels.regression.linear_model.OLS object at 0x0000029CA1202B88>
<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x0000029CA1202D48>


array([2.14285714, 0.25      ])
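
As a cross-check on the output above, the same intercept and slope can be reproduced from the closed-form least-squares formulas. This is only a sketch using NumPy on the toy data from the commented-out cell; the variable names are chosen for the example.

```python
import numpy as np

# Toy data from the commented-out cell above.
y = np.array([1, 3, 4, 5, 2, 3, 4], dtype=float)
x = np.arange(1, 8, dtype=float)

# Slope = covariance(x, y) / variance(x); intercept = mean(y) - slope * mean(x).
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(intercept, slope)
```

The two values agree with the `params` array printed above (intercept first, then slope).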