The S&P 500 is one of the most widely followed stock indices in the world, and funds that track it are among the most popular index funds. I have tried to build a similar index fund, except that instead of being weighted by market capitalization like the S&P 500, this one is equally weighted. The goal of this project is to create a Python script that accepts the value of your portfolio and tells you how many shares of each S&P 500 constituent you should purchase to get an equal-weight version of the index fund.

We start by importing the required packages.

In [1]:
import numpy as np
import math
import xlsxwriter
import pandas as pd
import warnings
import seaborn as sns
import requests

Next we read the data into a pandas data frame and print it to take a look. This data contains the constituents of the S&P 500.

In [2]:
stocks = pd.read_csv('sp_500_stocks.csv') # we read our data into pandas
stocks

Unnamed: 0,Ticker
0,A
1,AAL
2,AAP
3,AAPL
4,ABBV
...,...
500,YUM
501,ZBH
502,ZBRA
503,ZION


In [19]:
stocks.info()

<bound method NDFrame.describe of     Ticker
0        A
1      AAL
2      AAP
3     AAPL
4     ABBV
..     ...
500    YUM
501    ZBH
502   ZBRA
503   ZION
504    ZTS

[501 rows x 1 columns]>

We see no null entries in the data. Hence, we can proceed.

Do not run the next cell yet. We encounter a KeyError in the later stages of this project, and the next cell cleans the data to fix it. Skip it for now and come back to it when you get the error.

In [3]:
#stocks = stocks[~stocks['Ticker'].isin(['DISCA', 'HFC', 'VIAC', 'WLTW'])] #to get rid of the 4 rows with delisted stocks
#stocks #remove the '#' from the start of both lines to run the code # we don't care about the index here as we will create a new data frame and store the values there

Unnamed: 0,Ticker
0,A
1,AAL
2,AAP
3,AAPL
4,ABBV
...,...
500,YUM
501,ZBH
502,ZBRA
503,ZION


We see that the data has 505 rows and only 1 column.
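
The commented-out cleaning cell above relies on a boolean mask. As a minimal sketch, assuming a made-up five-row ticker list (the rows are illustrative and not read from sp_500_stocks.csv), this is how ~ and isin() combine, and how reset_index could renumber the rows afterwards if a gap-free index is wanted:

```python
import pandas as pd

# Hypothetical mini ticker list; 'HFC' and 'VIAC' stand in for delisted symbols.
tickers = pd.DataFrame({'Ticker': ['A', 'AAL', 'HFC', 'AAPL', 'VIAC']})
delisted = ['DISCA', 'HFC', 'VIAC', 'WLTW']

# isin() marks the rows whose ticker is in the delisted list;
# ~ inverts that mask, so only the rows we want to keep survive.
cleaned = tickers[~tickers['Ticker'].isin(delisted)]

# Optional: renumber the rows 0..n-1 so the index has no gaps.
cleaned = cleaned.reset_index(drop=True)
print(cleaned['Ticker'].tolist())
```

The notebook itself skips the reset_index step because the values are later copied into a new data frame.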

In order to make an API call, we need an API token. We have stored it in a secrets.py file to limit access.

In [2]:
from secrets import IEX_CLOUD_API_TOKEN

Now it's time to structure our API calls to IEX cloud.

We need the following information from the API:
1. Market capitalization for each stock
2. Price of each stock

All the API references and documentation that we might need can be found at https://iexcloud.io/docs/api.
However, for the purpose of this project we use the testing sandbox URL as the base URL of the API, since it gives us free, randomised data that may not be useful for a real investing strategy but certainly serves our purpose here. First we must create the URL for the API call. Exploring the documentation on the IEX Cloud website, we find that the base URL for the testing sandbox is https://sandbox.iexapis.com

To make the call we need an API endpoint that gives us both the market cap and the price of each stock. Going through the IEX Cloud reference, we see that the Quote endpoint does that. That gives us https://sandbox.iexapis.com/stock/{symbol}/quote/. We also need the API token to make the call, so we add it as a query parameter: https://sandbox.iexapis.com/stock/{symbol}/quote?token={IEX_CLOUD_API_TOKEN}. Finally, we need to specify the API version in the path; /stable/ points to the current stable version. Together with the API token, this should be enough to get the data for Apple: https://sandbox.iexapis.com/stable/stock/{symbol}/quote?token={IEX_CLOUD_API_TOKEN}. Let's try that.

In [5]:
symbol ='AAPL'
api_url = f'https://sandbox.iexapis.com/stable/stock/{symbol}/quote?token={IEX_CLOUD_API_TOKEN}'
data = requests.get(api_url)
print(data.status_code)

200


We see a status code of 200, which means that the server processed the API call successfully. Next we parse the response body as JSON, which gives us a Python dictionary to work with.
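
Checking the status code before parsing can also be wrapped in a small helper. The sketch below is only an illustration: FakeResponse is a hypothetical stand-in for the object returned by requests.get (it mimics just the status_code attribute and the json() method), so the example runs without a network connection, and parse_quote is a name made up for this sketch.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response, used only in this sketch."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


def parse_quote(response):
    """Return the parsed JSON body, or raise if the call did not succeed."""
    if response.status_code != 200:
        raise RuntimeError(f'API call failed with status {response.status_code}')
    return response.json()


# A successful call: the body is returned as a dictionary.
ok = parse_quote(FakeResponse(200, {'symbol': 'AAPL', 'latestPrice': 151.33}))
print(ok['latestPrice'])
```

With a real response, the same helper could be called as parse_quote(requests.get(api_url)).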

In [6]:
data = requests.get(api_url).json() #adding .json gives us a dictionary that is easier to use
data

{'avgTotalVolume': 76769202,
 'calculationPrice': 'close',
 'change': 1.7,
 'changePercent': 0.0118,
 'close': 153.74,
 'closeSource': 'ifcaoifl',
 'closeTime': 1718662168918,
 'companyName': 'Apple Inc',
 'currency': 'USD',
 'delayedPrice': 155.51,
 'delayedPriceTime': 1659140935490,
 'extendedChange': -0.12,
 'extendedChangePercent': -0.0008,
 'extendedPrice': 155.39,
 'extendedPriceTime': 1668666924520,
 'high': 154.9,
 'highSource': 'ac1iite enuye mdr 5delp',
 'highTime': 1686690468809,
 'iexAskPrice': None,
 'iexAskSize': None,
 'iexBidPrice': None,
 'iexBidSize': None,
 'iexClose': 156.72,
 'iexCloseTime': 1672598380638,
 'iexLastUpdated': None,
 'iexMarketPercent': None,
 'iexOpen': 156.13,
 'iexOpenTime': 1716701005369,
 'iexRealtimePrice': None,
 'iexRealtimeSize': None,
 'iexVolume': None,
 'lastTradeTime': 1658913586580,
 'latestPrice': 151.33,
 'latestSource': 'Close',
 'latestTime': 'July 15, 2022',
 'latestUpdate': 1735873968776,
 'latestVolume': 77235406,
 'low': 150.8,


Now we need to parse the API response into the components we need. As noted earlier, the two values we need are the market cap, stored under the key 'marketCap', and the price of the stock, stored under the key 'latestPrice'.

In [7]:
price = data['latestPrice']
market_cap = data['marketCap']
#market_cap
#price
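
Direct indexing raises a KeyError when a field is absent. As a hedged sketch, a helper such as the hypothetical extract_metrics below could check both keys up front and report which ones are missing. The sample dictionary is a trimmed copy of the quote shown above, reduced to the two fields we use.

```python
def extract_metrics(quote):
    """Return (price, market_cap) from a quote dictionary.

    Raises a KeyError naming the missing fields if either is absent or None.
    """
    required = ('latestPrice', 'marketCap')
    missing = [key for key in required if quote.get(key) is None]
    if missing:
        raise KeyError(f'quote is missing: {missing}')
    return quote['latestPrice'], quote['marketCap']


# Trimmed sample quote with only the two fields this project needs.
sample_quote = {'latestPrice': 151.33, 'marketCap': 2521516452993}
price, market_cap = extract_metrics(sample_quote)
print(price, market_cap)
```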

Now we need to scale up this process by requesting the price and market cap of each stock in the CSV file and storing them in a pandas data frame, which we can then export as an Excel file. So we will now start building our pandas data frame. We need 4 columns: ticker, stock price, market cap, and number of shares to buy.

In [8]:
my_columns = ['Ticker', 'Price', 'Market Capitalization', 'Number Of Shares to Buy']
final_dataframe = pd.DataFrame(columns = my_columns) #creates an empty pandas data frame with 4 columns

Having created our data frame, we will use the append method to add the data to it.

In [9]:
final_dataframe = final_dataframe.append(
                                        pd.Series([symbol, 
                                                   price, 
                                                   market_cap, 
                                                   'N/A'], 
                                                  index = my_columns), 
                                        ignore_index = True) #ignore_index = True is added to prevent a common error
final_dataframe

Unnamed: 0,Ticker,Price,Market Capitalization,Number Of Shares to Buy
0,AAPL,151.33,2521516452993,
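
One caveat on the append call above: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the cell fails on recent versions. A minimal sketch of one common alternative is to collect the rows in a plain Python list and build the data frame once at the end. The two rows below reuse values that appear in this notebook's outputs; in practice they would come from the API response.

```python
import pandas as pd

my_columns = ['Ticker', 'Price', 'Market Capitalization', 'Number Of Shares to Buy']

# Collect rows in a plain list; appending to a list is cheap.
rows = []
rows.append(['AAPL', 151.33, 2521516452993, 'N/A'])
rows.append(['A', 119.40, 36091338554, 'N/A'])

# Build the data frame once, after all rows have been collected.
final_dataframe = pd.DataFrame(rows, columns=my_columns)
print(final_dataframe)
```

Building the frame once also avoids copying the whole frame on every iteration, which is what repeated appends did.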


We see that the method works. Now let's write a loop that goes through all the stocks in the data, makes an API call for each one, and adds the result to our data frame. Here we make a separate API request for every stock.

In [10]:
final_dataframe = pd.DataFrame(columns = my_columns) # redefine an empty data frame
for stock in stocks['Ticker']: # loops for all stocks in the column 'Ticker'
    api_url = f'https://sandbox.iexapis.com/stable/stock/{stock}/quote?token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(api_url).json()
    final_dataframe = final_dataframe.append(
                                        pd.Series([stock, 
                                                   data['latestPrice'], 
                                                   data['marketCap'], 
                                                   'N/A'], 
                                                  index = my_columns), 
                                        ignore_index = True)
final_dataframe

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

We also saw a few KeyErrors. This is probably because the stocks with the tickers 'DISCA', 'HFC', 'VIAC', and 'WLTW' have been delisted, so the API call for them returns an error. We will therefore go back to the first few steps and clean these rows out of the data.

Well, that works! But it is exceptionally slow and has a tendency to disconnect. It took me a fair few tries to stay connected to the server long enough to get the output. Making HTTPS requests is one of the slowest things one can do in Python.

Next we will learn how to make batch API calls, where a bunch of API calls are clubbed together into one request. This makes the process much more efficient. Although IEX Cloud limits batch API calls to 100 tickers per request, this is still a massive improvement: it cuts the number of requests to roughly 1% of the original, from about 500 requests to 5 or 6.

PERFORMING BATCH API CALLS:

In [11]:
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
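
To see how the generator behaves, here is a small sketch on a five-ticker sample list with a chunk size of 2 (both chosen only for illustration). It also computes how many batches a list of 505 tickers needs at 100 tickers per request.

```python
import math


def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


sample = ['A', 'AAL', 'AAP', 'AAPL', 'ABBV']
groups = list(chunks(sample, 2))
print(groups)  # [['A', 'AAL'], ['AAP', 'AAPL'], ['ABBV']]

# The last chunk holds the leftovers, so the number of chunks is rounded up.
n_batches = math.ceil(505 / 100)
print(n_batches)  # 6
```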

In [12]:
symbol_groups = list(chunks(stocks['Ticker'], 100))
symbol_groups
#len(symbol_groups) #to find the length of this list

[0         A
 1       AAL
 2       AAP
 3      AAPL
 4      ABBV
       ...  
 95     CINF
 96       CL
 97      CLX
 98      CMA
 99    CMCSA
 Name: Ticker, Length: 100, dtype: object,
 100     CME
 101     CMG
 102     CMI
 103     CMS
 104     CNC
        ... 
 196     FTV
 197      GD
 198      GE
 199    GILD
 200     GIS
 Name: Ticker, Length: 100, dtype: object,
 201       GL
 202      GLW
 203       GM
 204     GOOG
 205    GOOGL
        ...  
 297      MAS
 298      MCD
 299     MCHP
 300      MCK
 301      MCO
 Name: Ticker, Length: 100, dtype: object,
 302    MDLZ
 303     MDT
 304     MET
 305     MGM
 306     MHK
        ... 
 397      RL
 398     RMD
 399     ROK
 400     ROL
 401     ROP
 Name: Ticker, Length: 100, dtype: object,
 402    ROST
 403     RSG
 404     RTX
 405    SBAC
 406    SBUX
        ... 
 499     XYL
 500     YUM
 501     ZBH
 502    ZBRA
 503    ZION
 Name: Ticker, Length: 100, dtype: object,
 504    ZTS
 Name: Ticker, dtype: object]

We have successfully created a list of 6 groups: 5 with 100 entries each, and one holding the leftover entries.
Now we need to transform the stocks in each group into a single string so that we can pass it to the server in an HTTPS request.
We do that with the join function, which joins all the tickers in a group into one comma-separated string, and we append each joined string to a new list.

In [13]:
symbol_strings = []
for i in range(0, len(symbol_groups)): # loop runs for each chunk
    symbol_strings.append(','.join(symbol_groups[i])) # join() joins all keys in the chunk and append() add each joined chunk
    print(symbol_strings[i])

A,AAL,AAP,AAPL,ABBV,ABC,ABMD,ABT,ACN,ADBE,ADI,ADM,ADP,ADSK,AEE,AEP,AES,AFL,AIG,AIV,AIZ,AJG,AKAM,ALB,ALGN,ALK,ALL,ALLE,ALXN,AMAT,AMCR,AMD,AME,AMGN,AMP,AMT,AMZN,ANET,ANSS,ANTM,AON,AOS,APA,APD,APH,APTV,ARE,ATO,ATVI,AVB,AVGO,AVY,AWK,AXP,AZO,BA,BAC,BAX,BBY,BDX,BEN,BF.B,BIIB,BIO,BK,BKNG,BKR,BLK,BLL,BMY,BR,BRK.B,BSX,BWA,BXP,C,CAG,CAH,CARR,CAT,CB,CBOE,CBRE,CCI,CCL,CDNS,CDW,CE,CERN,CF,CFG,CHD,CHRW,CHTR,CI,CINF,CL,CLX,CMA,CMCSA
CME,CMG,CMI,CMS,CNC,CNP,COF,COG,COO,COP,COST,COTY,CPB,CPRT,CRM,CSCO,CSX,CTAS,CTL,CTSH,CTVA,CTXS,CVS,CVX,CXO,D,DAL,DD,DE,DFS,DG,DGX,DHI,DHR,DIS,DISCK,DISH,DLR,DLTR,DOV,DOW,DPZ,DRE,DRI,DTE,DUK,DVA,DVN,DXC,DXCM,EA,EBAY,ECL,ED,EFX,EIX,EL,EMN,EMR,EOG,EQIX,EQR,ES,ESS,ETFC,ETN,ETR,EVRG,EW,EXC,EXPD,EXPE,EXR,F,FANG,FAST,FB,FBHS,FCX,FDX,FE,FFIV,FIS,FISV,FITB,FLIR,FLS,FLT,FMC,FOX,FOXA,FRC,FRT,FTI,FTNT,FTV,GD,GE,GILD,GIS
GL,GLW,GM,GOOG,GOOGL,GPC,GPN,GPS,GRMN,GS,GWW,HAL,HAS,HBAN,HBI,HCA,HD,HES,HIG,HII,HLT,HOLX,HON,HPE,HPQ,HRB,HRL,HSIC,HST,HSY,HUM,HWM,IBM,ICE,IDXX,IEX,IFF,ILMN,INCY,INF

The next thing we do is create a blank data frame where we will store all the data that we receive from the API calls. We loop through each string in the symbol_strings list, which holds exactly 6 strings, and pass each one to a batch API call. We will sift through the IEX Cloud documentation to find the base URL for batch API calls.

The base URL is https://sandbox.iexapis.com/stable/. On the docs page we find the batch path /stock/market/batch, which we append to get https://sandbox.iexapis.com/stable/stock/market/batch. Now, what do we want from it? We want the quote endpoint for the symbols we pass to it, so we add ?types=quote&symbols={symbol_string}&token={IEX_CLOUD_API_TOKEN}. The question mark marks the start of the query string, where the parameters of the request are listed as key=value pairs separated by &. The required URL is therefore https://sandbox.iexapis.com/stable/stock/market/batch?types=quote&symbols={symbol_string}&token={IEX_CLOUD_API_TOKEN}.
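
The query string can also be assembled with the standard library instead of by hand. In this sketch, build_batch_url is a hypothetical helper and 'YOUR_TOKEN' is a placeholder, not a real token; urlencode joins the key=value pairs with &, and safe=',' keeps the commas between tickers unescaped.

```python
from urllib.parse import urlencode

BASE_URL = 'https://sandbox.iexapis.com/stable/stock/market/batch'


def build_batch_url(symbol_string, token):
    """Build the batch quote URL for a comma-separated string of tickers."""
    query = urlencode(
        {'types': 'quote', 'symbols': symbol_string, 'token': token},
        safe=',',  # leave the commas between tickers as they are
    )
    return f'{BASE_URL}?{query}'


url = build_batch_url('A,AAL,AAP', 'YOUR_TOKEN')
print(url)
```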

In [14]:
final_dataframe = pd.DataFrame(columns = my_columns)
for symbol_string in symbol_strings: # loop runs for each chunk
    #print(symbol_string) #for testing the loop
    batch_api_call_url = f'https://sandbox.iexapis.com/stable/stock/market/batch/?types=quote&symbols={symbol_string}&token={IEX_CLOUD_API_TOKEN}'
    #print(batch_api_call_url) #for testing the URL
    data = requests.get(batch_api_call_url).json()
    #print(data.status_code) #to check if the data we receive is sound #remove .json() from above before running
    for symbol in symbol_string.split(','): #because we need the metrics for each stock we'll split the data #loop runs for each symbol in given chunk
        final_dataframe = final_dataframe.append(
                                        pd.Series([symbol, 
                                                   data[symbol]['quote']['latestPrice'], #this is to parse the data for the metrics of the specific stock
                                                   data[symbol]['quote']['marketCap'], #we add the symbol and the endpoint for the same
                                                   'N/A'], 
                                                  index = my_columns), 
                                        ignore_index = True)
final_dataframe
    

Unnamed: 0,Ticker,Price,Market Capitalization,Number Of Shares to Buy
0,A,119.40,36091338554,
1,AAL,14.77,9588750119,
2,AAP,189.99,11521975709,
3,AAPL,150.87,2541201722177,
4,ABBV,156.75,282880622048,
...,...,...,...,...
496,YUM,121.05,35246327087,
497,ZBH,108.12,22364516108,
498,ZBRA,310.09,15784718076,
499,ZION,51.89,7727021820,


That works well! We can see the substantial change in speed with the use of batch API calls.
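
As an alternative to removing delisted tickers by hand, the parsing step could skip any symbol that is absent from the response. The sketch below assumes that the batch response simply leaves out symbols it cannot resolve, which is an assumption and not something confirmed here. extract_rows is a hypothetical helper, and the sample response is hand-written from values shown in the output above.

```python
def extract_rows(batch_response, symbols):
    """Turn a batch response into table rows, skipping symbols with no quote."""
    rows, skipped = [], []
    for symbol in symbols:
        # .get() returns None instead of raising a KeyError for a missing symbol.
        quote = batch_response.get(symbol, {}).get('quote')
        if quote is None:
            skipped.append(symbol)
            continue
        rows.append([symbol, quote['latestPrice'], quote['marketCap'], 'N/A'])
    return rows, skipped


# Hand-written sample response; 'HFC' is deliberately left out.
sample_response = {
    'A': {'quote': {'latestPrice': 119.40, 'marketCap': 36091338554}},
    'AAL': {'quote': {'latestPrice': 14.77, 'marketCap': 9588750119}},
}
rows, skipped = extract_rows(sample_response, 'A,AAL,HFC'.split(','))
print(rows)
print(skipped)  # ['HFC']
```

Returning the skipped symbols makes it easy to see which tickers need a closer look.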

CALCULATING THE NUMBER OF SHARES TO BUY:

Now to calculate the number of shares to buy. Good code is reusable in various situations, so instead of writing code for one specific case, we will write code that asks how big your portfolio is and works with that.

In [35]:
currency = 'USD'
portfolio_size = input(f'Enter the value of your portfolio in {currency}:') # to get the input from the user
try:
    val = float(portfolio_size) # forcing a string here will give a ValueError
except ValueError: # except works no matter what the error is so we restrict it to ValueErrors only
    print('Please only enter a number without the currency sign \nPlease try again')
    portfolio_size = input(f'Enter the value of your portfolio in {currency}:')
    val = float(portfolio_size)

Enter the value of your portfolio in USD: 1000000


Since input() returns a string, a person may enter a word instead of a number, which would break the next cell. The try/except statement above deals with that.
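
The try/except above gives the user one retry; a second bad entry would still raise a ValueError. A minimal sketch of a version that keeps asking until it gets a number is shown below. ask_portfolio_size and its read parameter are names made up for this sketch; read defaults to the built-in input and is swapped for a scripted reader here so the example runs without typing.

```python
def ask_portfolio_size(currency='USD', read=input):
    """Keep prompting until the answer can be converted to a float."""
    while True:
        raw = read(f'Enter the value of your portfolio in {currency}:')
        try:
            return float(raw)
        except ValueError:
            print('Please only enter a number without the currency sign.')


# Scripted answers standing in for a user: two bad entries, then a valid one.
answers = iter(['one million', '$1000000', '1000000'])
value = ask_portfolio_size(read=lambda prompt: next(answers))
print(value)  # 1000000.0
```

In the notebook it could be called simply as portfolio_size = ask_portfolio_size().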

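Since this is an equal-weight fund, every stock gets the same share of the portfolio: the position size is the portfolio value divided by the number of stocks.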

In [39]:
position_size = float(portfolio_size) / len(final_dataframe.index)
for i in range(0, len(final_dataframe.index)):
    final_dataframe.loc[i, 'Number Of Shares to Buy'] = math.floor(position_size / final_dataframe['Price'][i]) 
final_dataframe
# math.floor is used because many brokers don't accept fractional shares. Rounding to the nearest integer could overshoot
# the portfolio value if more positions round up than down, so we round all of them down.

Unnamed: 0,Ticker,Price,Market Capitalization,Number Of Shares to Buy
0,A,119.40,36091338554,16
1,AAL,14.77,9588750119,135
2,AAP,189.99,11521975709,10
3,AAPL,150.87,2541201722177,13
4,ABBV,156.75,282880622048,12
...,...,...,...,...
496,YUM,121.05,35246327087,16
497,ZBH,108.12,22364516108,18
498,ZBRA,310.09,15784718076,6
499,ZION,51.89,7727021820,38
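
The same calculation can be done without an explicit loop. The sketch below uses the five prices shown at the top of the table and a hypothetical portfolio of 10,000, chosen so that each position is worth 2,000. Floor division (//) divides and rounds down in one step for the whole column.

```python
import pandas as pd

df = pd.DataFrame({
    'Ticker': ['A', 'AAL', 'AAP', 'AAPL', 'ABBV'],
    'Price': [119.40, 14.77, 189.99, 150.87, 156.75],
})

portfolio_size = 10_000  # hypothetical value for this sketch
position_size = portfolio_size / len(df.index)  # 2000.0 per stock

# Floor division rounds every position down to a whole number of shares.
df['Number Of Shares to Buy'] = (position_size // df['Price']).astype(int)
print(df)
```

For these five tickers the share counts come out as 16, 135, 10, 13 and 12.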


Voila!

We have created what we need.

Formatting our output in Excel using xlsxwriter:

First we initialise our xlsxwriter object, and then we pass our data frame to the writer.

In [74]:
writer = pd.ExcelWriter('recommended_trades.xlsx', engine='xlsxwriter')
final_dataframe.to_excel(writer, sheet_name='Recommended Trades', index = False) # index = False is added because we don't want an index

Creating the Formats We'll Need For Our .xlsx File

Formats include colors, fonts, and also symbols like % and $. We'll need four main formats for our Excel document:

1. String format for tickers
2. $XX.XX format for stock prices
3. $XX,XXX format for market capitalization
4. Integer format for the number of shares to purchase

We will use the book.add_format method to create the format we want for each column.

In [75]:
background_color = '#ffffff'
font_color = '#000000'
# let's start with string format
string_format = writer.book.add_format(
        {
            'font_color': font_color,
            'bg_color': background_color,
            'border': 1
        }
    )
dollar_format = writer.book.add_format(
        {
            'num_format': '$0.00',
            'font_color': font_color,
            'bg_color': background_color,
            'border': 1
        }
    )
integer_format = writer.book.add_format(
        {
            'num_format': '0',
            'font_color': font_color,
            'bg_color': background_color,
            'border': 1
        }
    )
# this assigns formats to each of the variables

Let's start formatting each column of our .xlsx file.

In [65]:
# writer.sheets['Recommended Trades'].set_column('A:A', 18, string_format)
# writer.sheets['Recommended Trades'].set_column('B:B', 18, dollar_format)
# writer.sheets['Recommended Trades'].set_column('C:C', 18, dollar_format)
# writer.sheets['Recommended Trades'].set_column('D:D', 18, integer_format)
# let's take a peek at our file
# writer.save()

We see that the data has been formatted correctly, but the column names have not. We'll use the write function for that.

In [66]:
# writer.sheets['Recommended Trades'].write('A1', 'Ticker', string_format)
# writer.sheets['Recommended Trades'].write('B1', 'Price', string_format)
# writer.sheets['Recommended Trades'].write('C1', 'Market Capitalization', string_format)
# writer.sheets['Recommended Trades'].write('D1', 'Number Of Shares to Buy', string_format)
# writer.save()

This code works, but it is too repetitive. "Don't repeat yourself" is an important rule of coding: the code should be simple and short. So we will simplify it by using a loop instead.

We start by making a dictionary that maps each column letter to a list holding that column's name and format.

In [76]:
column_formats = { 
                    'A': ['Ticker', string_format],
                    'B': ['Price', dollar_format],
                    'C': ['Market Capitalization', dollar_format],
                    'D': ['Number Of Shares to Buy', integer_format]
                    }
# we need a loop that loops through each key in the above dictionary
for column in column_formats.keys(): # .keys() returns all the keys of a dictionary
    writer.sheets['Recommended Trades'].set_column(f'{column}:{column}', 18, column_formats[column][1])
    writer.sheets['Recommended Trades'].write(f'{column}1', column_formats[column][0], string_format)
writer.close() # writes the file to disk; writer.save() was removed in pandas 2.0

That completes it. I think this fulfils the goals of this project, which were less about how to build an equal-weight index fund (a relatively easy piece of code) and more about how to use APIs and request data from them. It taught me how to clean and explore data, and how to format the output into an Excel file that is more accessible to the public. These are must-have skills for a quantitative analyst.