## S&P 500 Index Fund

### Introduction

The S&P 500 is one of the world's most widely followed stock market indices. One of the largest funds benchmarked to this index is the SPDR S&P 500 ETF Trust (SPY).

The goal of this notebook is to create a Python script that accepts the value of your portfolio and tells you how many shares of each S&P 500 constituent to purchase to build a market-cap-weighted version of the index fund.

### Library Imports 

Let's first import the libraries we will be using. 

In [1]:
import numpy as np
import pandas as pd
from pandas_datareader import data
import yfinance as yf
import requests
import xlsxwriter 
import math
from datetime import date

# import API key from a local secrets.py (this file name shadows Python's standard-library `secrets` module)
from secrets import ALPHAVANTAGE_API_KEY

# Can load in data if needed
# final_dataframe = pd.read_csv("SP500_Fund.csv", index_col=0)

### Importing List of Stocks

We now import the constituents of the S&P 500. 

These constituents change over time, so ideally we would connect directly to the index provider (Standard & Poor's) and pull the current constituents on a regular basis.

Paying for access to the index provider's API is outside the scope of this project.

Instead, we use a static copy of the constituent list, saved in this folder as `constituents.csv`, and import it into the notebook.

In [2]:
tickers = pd.read_csv('constituents.csv')
tickers = tickers.loc[:, "Symbol"].copy()
tickers

0       MMM
1       AOS
2       ABT
3      ABBV
4      ABMD
       ... 
500     YUM
501    ZBRA
502     ZBH
503    ZION
504     ZTS
Name: Symbol, Length: 505, dtype: object

***
## Adding Stocks Data to a Pandas Dataframe ## 

We now download the stock data using yfinance, extracting price and market capitalization, and then append the data into a pandas dataframe.

We first create the empty dataframe for our data.

In [3]:
my_columns = ["Ticker", "Price", "MarketCap", "NumberSharesToBuy"]
final_dataframe = pd.DataFrame(columns=my_columns)
final_dataframe

| | Ticker | Price | MarketCap | NumberSharesToBuy |
|---|---|---|---|---|


We loop through all stocks, fetch the price and market cap of each, and append a row to `final_dataframe`. Finally, we save the dataframe as a CSV file so it can be reloaded without repeating the loop.

In [4]:
### Using the Alpha Vantage API (limited to 5 requests per minute)
# for stock in tickers:
#     print(stock)
#     api_url = f"https://www.alphavantage.co/query?function=OVERVIEW&symbol={stock}&apikey={ALPHAVANTAGE_API_KEY}"
#     try:
#         data = requests.get(api_url).json()
#         market_cap = int(data['MarketCapitalization'])
#         price = float(data['PERatio']) * float(data['EPS'])
#         row = pd.Series([stock, price, market_cap, 'N/A'], index = my_columns)
#         final_dataframe = pd.concat([final_dataframe, row.to_frame().T], ignore_index=True)
#     except:
#         print("Error with: ", stock)

        
### Using yfinance
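for stock in tickers:
    print(stock)
    try:
        # First closing price on or after today's date (empty on non-trading days,
        # in which case .iloc[0] raises and the ticker is reported below)
        price = yf.Ticker(stock).history(start=date.today())['Close'].iloc[0]
        # Take the scalar out of the returned column before casting to int
        marketCap = int(data.get_quote_yahoo(stock)['marketCap'].iloc[0])
        row = pd.Series([stock, price, marketCap, 'N/A'], index=my_columns)
        final_dataframe = pd.concat([final_dataframe, row.to_frame().T], ignore_index=True)
    except Exception:
        # Skip tickers whose data could not be fetched
        print("Error with: ", stock)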
    

final_dataframe.to_csv("SP500_Fund.csv")

MMM
AOS
ABT
ABBV
ABMD
- ABMD: No data found for this date range, symbol may be delisted
Error with:  ABMD
ACN
ATVI
ADM
ADBE
AAP
AMD
AES
AFL
A
APD
AKAM
ALK
ALB
ARE
ALGN
ALLE
LNT
ALL
GOOGL
GOOG
MO
AMZN
AMCR
AEE
AAL
AEP
AXP
AIG
AMT
AWK
AMP
ABC
AME
AMGN
APH
ADI
ANSS
ANTM
Got error from yahoo api for ticker ANTM, Error: {'code': 'Not Found', 'description': 'No data found, symbol may be delisted'}
- ANTM: No data found for this date range, symbol may be delisted
Error with:  ANTM
AON
APA
AAPL
AMAT
APTV
ANET
AJG
AIZ
T
ATO
ADSK
ADP
AZO
AVB
AVY
BKR
BLL
Got error from yahoo api for ticker BLL, Error: {'code': 'Not Found', 'description': 'No data found, symbol may be delisted'}
- BLL: No data found for this date range, symbol may be delisted
Error with:  BLL
BAC
BBWI
BAX
BDX
BRK.B
Got error from yahoo api for ticker BRK.B, Error: {'code': 'Not Found', 'description': 'No data found, symbol may be delisted'}
- BRK.B: No data found for this date range, symbol may be delisted
Error with:  BRK.B
BBY
B
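
Several of the failures in the log above are symbols that Yahoo Finance could not resolve. One known cause is share-class tickers: the constituents file writes `BRK.B`, whereas Yahoo Finance lists the same stock as `BRK-B`. A minimal sketch of a normalization step, assuming a hypothetical helper named `to_yahoo_symbol` (it is not part of yfinance), might look like this:

```python
def to_yahoo_symbol(symbol):
    """Convert a dotted share-class ticker (BRK.B) to the dashed form (BRK-B).

    Hypothetical helper for illustration; tickers without a dot pass through unchanged.
    """
    return symbol.strip().upper().replace(".", "-")


for raw in ["BRK.B", "MMM", " zts "]:
    print(raw, "->", to_yahoo_symbol(raw))
```

Other failures, such as `ABMD`, `ANTM`, and `BLL`, are not fixed by this step; they likely reflect symbols that changed or were delisted after the static constituents file was saved.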

We now need to compute how many shares of each stock to buy.

### _Alternative: Batching API Calls_ ### 

**NOTE: Alpha Vantage does not currently allow batch calls.**

Here we outline how to batch our API calls so that we can fetch more data per request and stay within rate limits. We split our list of stocks into chunks of 100 and make one batch API request per chunk.

The following function breaks any list into chunks of `n` items each; the last chunk may be shorter.

In [None]:
def chunks(lst, n):
    ## Yield successive n-sized chunks from lst
    for i in range(0, len(lst), n):
        yield lst[i:i+n]

We now join each batch of tickers into a single comma-separated string.

In [None]:
symbol_groups = list(chunks(tickers, 100))
# Join each chunk of tickers into one comma-separated string
symbol_strings = [','.join(group) for group in symbol_groups]
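
To make the chunking behavior concrete, here is a small self-contained sketch that repeats the `chunks` function and applies it to a five-ticker sample with a chunk size of 2. The list `sample_tickers` is an assumption for illustration; the notebook itself uses the full `tickers` series and a chunk size of 100.

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst (the last one may be shorter)."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Illustrative sample only; the notebook chunks all constituents
sample_tickers = ["MMM", "AOS", "ABT", "ABBV", "ACN"]
symbol_strings = [",".join(group) for group in chunks(sample_tickers, 2)]
print(symbol_strings)  # ['MMM,AOS', 'ABT,ABBV', 'ACN']
```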

The remaining step would be to loop over these strings, make one batch request per string, and append the results to the final dataframe. We have not implemented it, because the API we are using does not allow batch calls.
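
When batching is unavailable, another way to respect a per-minute request limit is to space out individual calls. The sketch below is one possible approach, not something the original notebook does: `throttled` is a hypothetical generator, and its `sleep` and `clock` parameters exist only so the pacing can be tested without real waiting.

```python
import time


def throttled(items, max_per_minute, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than max_per_minute, pausing between them as needed."""
    min_interval = 60.0 / max_per_minute
    last = None
    for item in items:
        if last is not None:
            # Wait out whatever remains of the minimum interval since the last item
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


# Possible usage with the 5-requests-per-minute limit mentioned earlier:
# for stock in throttled(tickers, max_per_minute=5):
#     ...  # make one API request for `stock`
```

At 5 requests per minute this leaves at least 12 seconds between requests, so roughly 500 tickers would take well over an hour and a half.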

***
## Calculating the Number of Shares to Buy ##

We now calculate how many shares of each stock to buy in order to create a weighted S&P 500 fund. An equal-weighted fund and a market-cap-weighted fund are both possible; this notebook implements the market-cap-weighted version.

We first take user input for the size of the portfolio and check that it parses as a number.

In [2]:
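portfolio_size = input("Enter the value of your portfolio:")

# Keep asking until the input parses as a number
while True:
    try:
        portfolio_size = float(portfolio_size)
        break
    except ValueError:
        print("Portfolio size is not a valid number.\n")
        portfolio_size = input("Enter the value of your portfolio:")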

We will create a market-cap-weighted S&P 500 index fund.
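
In symbols, the calculation in the next cell can be written as follows. The notation is introduced here for illustration: $V$ is the portfolio value, and $M_i$ and $P_i$ are the market capitalization and price of stock $i$.

$$
w_i = \frac{M_i}{\sum_j M_j}, \qquad n_i = \left\lfloor \frac{V \, w_i}{P_i} \right\rfloor
$$

Here $w_i$ is the weight of stock $i$ and $n_i$ is the whole number of shares to buy. Because each $n_i$ is rounded down, a small part of $V$ remains uninvested.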

In [14]:
total_market_cap = sum(final_dataframe.loc[:,"MarketCap"])
for i in range(0, len(final_dataframe.index)):
    weight = final_dataframe.loc[i,"MarketCap"]/total_market_cap
    value = portfolio_size*weight
    final_dataframe.loc[i,"NumberSharesToBuy"] = math.floor(value/final_dataframe.loc[i,"Price"])

final_dataframe

| | MarketCap | NumberSharesToBuy |
|---|---|---|
| 0 | 65461350400 | 16.0 |
| 1 | 8828924928 | 4.0 |
| 2 | 194931130368 | 50.0 |
| 3 | 262990675968 | 51.0 |
| 4 | 171988385792 | 18.0 |
| ... | ... | ... |
| 479 | 35506774016 | 8.0 |
| 480 | 14854926336 | 1.0 |
| 481 | 25591451648 | 6.0 |
| 482 | 7409083392 | 4.0 |
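
The loop above can also be written without explicit indexing. The following is a sketch under stated assumptions: the helper `shares_to_buy` and the three-row `toy` dataframe are hypothetical, with made-up prices and market caps, and the `"equal"` branch shows the equal-weighted alternative that the notebook does not implement.

```python
import math

import pandas as pd


def shares_to_buy(df, portfolio_size, weighting="market_cap"):
    """Return whole-share counts per row for the chosen weighting scheme."""
    if weighting == "market_cap":
        # Each stock's weight is its share of the total market cap
        weights = df["MarketCap"] / df["MarketCap"].sum()
    elif weighting == "equal":
        # Every stock receives the same fraction of the portfolio
        weights = pd.Series(1 / len(df), index=df.index)
    else:
        raise ValueError(f"Unknown weighting: {weighting}")
    # Dollar allocation divided by price, rounded down to whole shares
    return (portfolio_size * weights / df["Price"]).apply(math.floor)


# Made-up numbers, for illustration only
toy = pd.DataFrame(
    {
        "Ticker": ["AAA", "BBB", "CCC"],
        "Price": [90.0, 45.0, 30.0],
        "MarketCap": [600, 300, 100],
    }
)

cap_weighted = shares_to_buy(toy, 10_000, "market_cap")
equal_weighted = shares_to_buy(toy, 10_000, "equal")
print(cap_weighted.tolist())    # [66, 66, 33]
print(equal_weighted.tolist())  # [37, 74, 111]
```

In the market-cap case the largest company receives 60% of the portfolio, while in the equal-weighted case each stock receives one third regardless of size.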


***
## Formatting Our Excel Output ##

We will use XlsxWriter to create a nicely-formatted Excel file for output. 

### Initializing XlsxWriter Object ###

In [9]:
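writer = pd.ExcelWriter("SP500_fund.xlsx", engine="xlsxwriter")
# Pass the sheet name by keyword; recent pandas versions deprecate passing it positionally
final_dataframe.to_excel(writer, sheet_name="SP500 Fund", index=False)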

### Creating the Formats We'll Need For Our `.xlsx` File

Formats include colors, fonts, and symbols such as `%` and `$`. Our Excel document has four kinds of columns to format:
- string format for tickers
- $XX.XX format for stock prices
- $XX,XXX format for market capitalization (the code below reuses the stock-price dollar format for this column)
- integer format for the number of shares to buy

In [10]:
background_color = "#0a0a23"
font_color = "#ffffff"

string_format = writer.book.add_format(
    {
        "font_color" : font_color,
        "bg_color" : background_color,
        "border" : 1
    }
)

dollar_format = writer.book.add_format(
    {
        "num_format" : "$0.00",
        "font_color" : font_color,
        "bg_color" : background_color,
        "border" : 1
    }
)

integer_format = writer.book.add_format(
    {
        "num_format" : "0",
        "font_color" : font_color,
        "bg_color" : background_color,
        "border" : 1
    }
)
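
The list above mentions a `$XX,XXX` format for market capitalization, but the cell defines only one dollar format. A minimal sketch of a separate whole-dollar format with thousands separators follows. The name `market_cap_format`, the `Demo` sheet, and the in-memory workbook are assumptions for illustration and are not part of the original notebook.

```python
import io

import xlsxwriter

# In-memory workbook so the sketch runs without touching SP500_fund.xlsx
buffer = io.BytesIO()
workbook = xlsxwriter.Workbook(buffer, {"in_memory": True})
worksheet = workbook.add_worksheet("Demo")

# Hypothetical format: whole dollars with thousands separators
market_cap_format = workbook.add_format(
    {
        "num_format": "$#,##0",
        "font_color": "#ffffff",
        "bg_color": "#0a0a23",
        "border": 1,
    }
)

worksheet.set_column("A:A", 30, market_cap_format)
# Sample value taken from the first row of the market cap column
worksheet.write("A2", 65461350400, market_cap_format)
workbook.close()
```

In the notebook itself, the equivalent would be to create this format on `writer.book` and use it for column `C` in place of `dollar_format`.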

### Applying the Formats to our Columns in our `.xlsx` File ###

We can use the `set_column` method of each worksheet object in `writer.sheets` to apply formats to specific columns of our spreadsheet.

In [11]:
column_formats = {
    "A" : ["Ticker", string_format],
    "B" : ["Stock Price", dollar_format],
    "C" : ["Market Capitalization", dollar_format],
    "D" : ["Number of Shares to Buy", integer_format]
}

for column in column_formats.keys():
    writer.sheets["SP500 Fund"].set_column(f"{column}:{column}", 30, column_formats[column][1])
    writer.sheets["SP500 Fund"].write(f"{column}1", column_formats[column][0], column_formats[column][1])

### Saving Our Excel Output ###

Saving is very easy:

In [12]:
writer.close()