For context and information about API calls using Python (using the IEX Cloud platform API), see the notebook on API calls.

# Recap code

## Required libraries

In [1]:
# For array computation...
import numpy as np
# For data processing, manipulation and analysis...
import pandas as pd
# For making requests to the server through the HTTP protocol...
import requests

## Endpoint request URL

In [2]:
# Public sandbox token...
myApiToken = "Tpk_75e5db7f508e48f2b7dff4f435ff8029"
#========================
# Base URL of the IEX Cloud testing sandbox platform...
baseUrl = "https://sandbox.iexapis.com/"
# URL to the specific API version we want...
apiUrl = baseUrl + "stable/"
#------------
# A symbol, i.e. a reference to a particular enterprise's stock.
symbol = "MSFT" # Refers to data on the company 'Microsoft'.
# Link to data source...
dataSourceUrl = apiUrl + "stock/"
# Reference to data object and link to service...
endpointUrl = dataSourceUrl + f"{symbol}/quote"
# For authentication...
endpointRequestUrl = endpointUrl + f"?token={myApiToken}"
# Viewing the endpoint URL (the token is left out of the printout)...
print("URL to be used for API calls:",endpointUrl)

URL to be used for API calls: https://sandbox.iexapis.com/stable/stock/MSFT/quote


In [4]:
# Function to get to endpoint URL for a given symbol...
# Will be used when looping through each symbol to make the dataframe...
def getEndpointRequestUrl(s):
    return dataSourceUrl + f"{s}/quote?token={myApiToken}"

## Testing URL validity

In [5]:
# Viewing the URL we are using...
print("URL used:\n" + endpointRequestUrl)
# Sending API call 'get' as an HTTP request, and storing the response...
res = requests.get(endpointRequestUrl)
# Checking if the request was valid...
print("HTTP response status code:", res.status_code)

URL used:
https://sandbox.iexapis.com/stable/stock/MSFT/quote?token=Tpk_75e5db7f508e48f2b7dff4f435ff8029
HTTP response status code: 200
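
The status code printed above can be interpreted with the standard library alone. The following is a minimal sketch, assuming a hypothetical helper `describeStatus` (not part of this notebook), that labels a code using Python's `http.HTTPStatus` and treats the 2xx range as success:

```python
from http import HTTPStatus

# Hypothetical helper (not from the notebook): describes an HTTP status code.
def describeStatus(code):
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Raised when the code is not a registered HTTP status.
        phrase = "Unknown"
    outcome = "success" if 200 <= code < 300 else "failure"
    return f"{code} {phrase} ({outcome})"

for code in (200, 404, 429):
    print(describeStatus(code))
```

With `requests`, the same number is available as `res.status_code`, as used in the cell above.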


## Converting response content to JSON format

In [6]:
data = res.json()
# Viewing data fields for reference...
for x in data: print(x)

avgTotalVolume
calculationPrice
change
changePercent
close
closeSource
closeTime
companyName
currency
delayedPrice
delayedPriceTime
extendedChange
extendedChangePercent
extendedPrice
extendedPriceTime
high
highSource
highTime
iexAskPrice
iexAskSize
iexBidPrice
iexBidSize
iexClose
iexCloseTime
iexLastUpdated
iexMarketPercent
iexOpen
iexOpenTime
iexRealtimePrice
iexRealtimeSize
iexVolume
lastTradeTime
latestPrice
latestSource
latestTime
latestUpdate
latestVolume
low
lowSource
lowTime
marketCap
oddLotDelayedPrice
oddLotDelayedPriceTime
open
openTime
openSource
peRatio
previousClose
previousVolume
primaryExchange
symbol
volume
week52High
week52Low
ytdChange
isUSMarketOpen


# Creating dataframe for data

## Identifying desired columns

We do not need every data field, so the dataframe will hold only the ones relevant to our purpose:
- Ticker (ticker symbol i.e. unique identifier for stock)
- Stock price
- Market capitalisation
- Number of shares to buy (to be determined by our algorithm)

In [7]:
# Listing the desired column names...
# (no restrictions on how you name the columns)
myColumns = [
        "Ticker",
        "Stock price",
        "Market cap",
        "No. of shares to buy"]
# Creating an empty dataframe object with the desired columns...
df = pd.DataFrame(columns = myColumns)
# Viewing the result...
df

Unnamed: 0,Ticker,Stock price,Market cap,No. of shares to buy


## Obtaining the list of all ticker symbols

Ticker symbols are unique identifiers given to each stock (an enterprise's shares) in a given stock exchange. Now, we need to get response data for every symbol available in the IEX Cloud testing sandbox database 'stock'.

This platform contains (among other data) data on all the companies listed in the S&P 500 stock index. For now, we will concern ourselves with only these companies (since that is also the project's focus). Data on S&P 500 stocks is available in this project's directory, in the 'archive' folder, and this is the data we will use for obtaining the ticker symbols.

_(The archive directory data is from: https://www.kaggle.com/camnugent/sandp500)_

In [71]:
# Reading the CSV file...
# (putting the data in a Pandas 'DataFrame' object)
stocks = pd.read_csv("./archive/all_stocks_5yr.csv")
# Viewing the dataframe...
print("Viewing part of the dataframe...")
stocks[0:5]

Viewing part of the dataframe...


Unnamed: 0,date,open,high,low,close,volume,Name
0,2013-02-08,15.07,15.12,14.63,14.75,8407500,AAL
1,2013-02-11,14.89,15.01,14.26,14.46,8882000,AAL
2,2013-02-12,14.45,14.51,14.1,14.27,8126000,AAL
3,2013-02-13,14.3,14.94,14.25,14.66,10259500,AAL
4,2013-02-14,14.94,14.96,13.16,13.99,31879900,AAL


Hence, we need to use the 'Name' column for obtaining the ticker symbols. However, each symbol is repeated across many rows (one row per trading date), so we need the unique values of the 'Name' column.

_**List of unique ticker symbols**_

In [72]:
# List of unique ticker symbols...
symbols = stocks["Name"].unique()
# Viewing the first few unique symbols...
print(symbols[0:5])

['AAL' 'AAPL' 'AAP' 'ABBV' 'ABC']
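
pandas' `unique()` returns values in order of first appearance, which is why the symbols above follow the order of the CSV file. A plain-Python sketch of the same idea, using a made-up sample of the 'Name' column, also shows one way to actually check uniqueness:

```python
# Made-up sample of the 'Name' column: each symbol repeats once per row.
names = ["AAL", "AAL", "AAPL", "AAL", "AAP", "AAPL"]

# Dictionary keys are unique and keep insertion order (Python 3.7+),
# so this removes duplicates while preserving first-appearance order.
uniqueNames = list(dict.fromkeys(names))
print(uniqueNames)

# A set drops duplicates, so equal lengths mean every element is unique.
isUnique = len(set(uniqueNames)) == len(uniqueNames)
print("All unique:", isUnique)
```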


##  Matching response data fields to dataframe fields

After inspecting the data fields of the response data (printed in the top section, and stored in the 'data' variable), we will choose the following fields for each field of the dataframe:
1. **Ticker**: (we will use elements from 'symbols')
2. **Stock price**: latestPrice
3. **Market cap**: marketCap
4. **No. of shares to buy**: (to be determined by the algorithm)

The response data converted to JSON is available in a dictionary format, and we have stored this data in the variable 'data'. Hence, to obtain the values of the two fields above that are drawn from the response data, we can simply refer to them as **data['latestPrice']** and **data['marketCap']**.
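
The mapping between dataframe columns and response fields can also be written down as data. The sketch below assumes a hypothetical `fieldMap` dictionary and `makeRow` helper (neither is part of this notebook), and uses a made-up response dictionary rather than real API output:

```python
# Hypothetical mapping: dataframe column -> response data field.
fieldMap = {"Stock price": "latestPrice", "Market cap": "marketCap"}

# Hypothetical helper: builds one dataframe row from one quote dictionary.
def makeRow(symbol, quote):
    row = {"Ticker": symbol}
    for column, field in fieldMap.items():
        # .get() returns None instead of raising if the field is absent.
        row[column] = quote.get(field)
    return row

# Made-up values standing in for a parsed 'quote' response.
sampleQuote = {"symbol": "MSFT", "latestPrice": 100.0,
               "marketCap": 1000000, "peRatio": 30.0}
row = makeRow("MSFT", sampleQuote)
print(row)
```

Fields that are not listed in `fieldMap` (such as `peRatio` here) are simply ignored.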

## Point requests

Here, we request data on one symbol per request. This is very time-consuming when retrieving data on a large number of symbols, but we will do it for demonstration purposes.

We will be appending data from one response per iteration to the dataframe, starting from an empty dataframe. We will measure the time taken for execution, and compare it with that of batch requests.

_**Emptying our dataframe**_

In [87]:
# Ensuring empty dataframe...
df = pd.DataFrame(columns = myColumns)
# Viewing empty dataframe for reference...
df

Unnamed: 0,Ticker,Stock price,Market cap,No. of shares to buy


_**Appending the desired data from the response data to the dataframe**_

_To time the execution..._

In [89]:
import time

_The code..._

In [154]:
# STARTING TIME
startTime = time.time()
#________________________
# Dictionary used to add new row...
row = {}
# Looping through the ticker symbols...
for s in symbols:
    #========================
    # Retrieving endpoint request URL for current symbol...
    # (Using a user-defined function given above)
    endpointRequestUrl = getEndpointRequestUrl(s)
    #========================
    # Making a request for data to get response data...
    # (And converting the response's content to JSON format)
    try: data = requests.get(endpointRequestUrl).json()
    except Exception: continue # In case the symbol is not available.
    # ('except Exception' still lets Ctrl+C interrupt this long loop)
    # (To view the data fields, see the first section)
    #========================
    # Appending data from the desired fields to the dataframe...
    # (Make sure to give them in the order of the dataframe's columns)
    #------------
    # Creating a new row as a dictionary...
    # The corresponding fields have been picked from
    # - The response data
    # - The dataframe that must be filled
    row = {
        "Ticker": s,
        "Stock price": data["latestPrice"],
        "Market cap": data["marketCap"]}
    # (Giving no value for 'No. of shares to buy' will leave it as NaN)
    df = df.append(row, ignore_index = True)
    # (Can only append dictionary by putting ignore_index = True)
#________________________
# ENDING TIME
endTime = time.time()
print("Total execution time: {0} seconds".format(endTime - startTime))

Total execution time: 543.731085062027 seconds


In [86]:
# Viewing the resulting dataframe
df

Unnamed: 0,Ticker,Stock price,Market cap,No. of shares to buy
0,AAL,19.8,12594344396,
1,AAPL,153.47,2579319468263,
2,AAP,241.48,15022270048,
3,ABBV,113.32,195059046325,
4,ABC,126.13,26456620228,
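
`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above only runs on older pandas versions. One possible alternative, sketched here under the assumption that all rows fit in memory, is to collect the rows in a list and build the dataframe once at the end. The `responses` dictionary is a stand-in for the API calls, filled with values copied from the output above:

```python
import pandas as pd

myColumns = ["Ticker", "Stock price", "Market cap", "No. of shares to buy"]

# Stand-in for the API responses (values copied from the output above).
responses = {
    "AAL": {"latestPrice": 19.8, "marketCap": 12594344396},
    "AAPL": {"latestPrice": 153.47, "marketCap": 2579319468263},
}

rows = []  # One dictionary per symbol.
for s, data in responses.items():
    rows.append({
        "Ticker": s,
        "Stock price": data["latestPrice"],
        "Market cap": data["marketCap"]})

# Building the dataframe in one step; columns with no values become NaN.
df = pd.DataFrame(rows, columns = myColumns)
print(df)
```

Building the dataframe once also avoids copying it on every iteration, which is what repeated appending does.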


## Batch requests

Here, we request data on multiple symbols in a single request. This is the easiest way to improve the performance of code involving a large number of API calls, since HTTP requests are usually the slowest part of the code.

Furthermore, API providers (such as IEX Cloud) will often provide discounts for batch API calls, since they are easier for the API provider to respond to.

IEX Cloud limits batch API calls to 100 symbols (i.e. tickers) per request. This greatly reduces the number of HTTP requests that need to be made to retrieve the data, hence increasing the speed of the code's execution.

### Splitting the list of symbols into chunks

We need to process the symbols one chunk at a time. Hence, we must divide the array of symbols stored in the variable 'symbols' into as few chunks as possible. Since 100 is the maximum number of symbols per request, we will make chunks of 100 (or less, if fewer symbols are left over).
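
The number of chunks follows from a ceiling division. A small sketch, assuming 505 symbols (the total implied by the chunk sizes printed later in this notebook):

```python
import math

symbolCount = 505  # Assumption: 5 chunks of 100 plus 1 chunk of 5.
chunkSize = 100    # Maximum number of symbols per batch request.

# Fewest chunks needed so that no chunk exceeds chunkSize...
chunkCount = math.ceil(symbolCount / chunkSize)
# Number of full chunks, and the size of the leftover chunk...
fullChunks, leftover = divmod(symbolCount, chunkSize)
print(chunkCount, "chunks:", fullChunks, "full and one of", leftover)
```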

_**Function for returning a list of chunks (100 elements or less per chunk).**_

In [109]:
# IMPLEMENTATION 1
def chunks(myList, chunkSize):
# myList can be any sliceable sequence, e.g. a list or a NumPy array.
    resultantList = []
    for i in range(0, len(myList), chunkSize):
    # chunkSize serves as the step value for this loop.
        resultantList.append(myList[i: i + chunkSize])
    return resultantList
# NOTES
# myList[i: i + chunkSize] slices from a range of indices.
# i is the lower bound and is included.
# i + chunkSize is the upper bound and is excluded.
# If the upper bound exceeds list size, no error will be thrown.
# Instead, the upper bound will simply default to the list size.

In [105]:
# IMPLEMENTATION 2
def chunks(myList, chunkSize):
    for i in range(0, len(myList), chunkSize):
        yield myList[i: i + chunkSize]

Note that making this function is not necessary. These implementations are simply given to highlight the different approaches, and to allow for easy reusability if needed. Both of the above implementations produce the same end result once their output is converted to a list. The difference is in the types of their return values:
- Implementation 1 returns a list of slices of the argument 'myList'
- Implementation 2 returns a generator object (convertible to a collection data type)
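
The practical difference between the two return types can be seen on a small input. In this sketch the two implementations are renamed `chunksAsList` and `chunksAsGenerator` (names chosen here only so that both can coexist; in the notebook, the second definition of `chunks` replaces the first):

```python
def chunksAsList(myList, chunkSize):
    # Builds and returns every chunk immediately.
    return [myList[i: i + chunkSize]
            for i in range(0, len(myList), chunkSize)]

def chunksAsGenerator(myList, chunkSize):
    # Produces one chunk at a time, only when asked for it.
    for i in range(0, len(myList), chunkSize):
        yield myList[i: i + chunkSize]

letters = list("ABCDEFG")
asList = chunksAsList(letters, 3)
asGenerator = chunksAsGenerator(letters, 3)

print(asList)                    # The last chunk is shorter than chunkSize.
firstPass = list(asGenerator)    # Consumes the generator.
secondPass = list(asGenerator)   # The generator is now exhausted.
print(firstPass == asList, secondPass)
```

A list can be looped over repeatedly, while a generator can be consumed only once, which is one reason to wrap it in `list(...)` when the chunks are needed more than once.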

In [117]:
symbolGroups = list(chunks(symbols, 100))
# List constructor => same end result for all implementations.
#------------
# Printing the result in a neat manner for demonstration...
for x in symbolGroups: # Each sublist.
    print("------------")
    for y in x[0:5]: # Few elements from each sublist.
        print(y)
    print("({0} more...)".format(len(x) - 5))

------------
AAL
AAPL
AAP
ABBV
ABC
(95 more...)
------------
CL
CMA
CMCSA
CME
CMG
(95 more...)
------------
GE
GGP
GILD
GIS
GLW
(95 more...)
------------
MCD
MCHP
MCK
MCO
MDLZ
(95 more...)
------------
ROK
ROP
ROST
RRC
RSG
(95 more...)
------------
XYL
YUM
ZBH
ZION
ZTS
(0 more...)


Hence, we have 5 sublists of 100 elements, and 1 sublist of 5.

### Sending batch requests

For official IEX Cloud information: https://iexcloud.io/docs/api/#batch-requests. For specific notes, check the notes on IEX Cloud API batch calls, present in the directory of IEX Cloud notes in the notes directory.

#### Creating a query string

To convert each sublist into a comma delimited string...

In [133]:
symbolGroupStrings = [] # Intended list of comma delimited strings.
for x in symbolGroups: # Going through each sublist.
    symbolGroupStrings.append(",".join(x))
    # ",".join(x) joins each element of x using ',' as separator.
#------------
# Printing part of one element for demonstration...
print("A part of one comma delimited query string...")
print(symbolGroupStrings[0][0:50] + "...")

A part of one comma delimited query string...
AAL,AAPL,AAP,ABBV,ABC,ABT,ACN,ADBE,ADI,ADM,ADP,ADS...


**Creating batch request URLs**

In [162]:
def getBatchRequestUrl(scds):
    br = dataSourceUrl + "market/batch?"
    # Note that dataSourceUrl already contains the API URL.
    qs = "types=quote&symbols="
    # The query string is incomplete to allow for different strings.
    auths = f"&token={myApiToken}"
    return br + qs + scds + auths
# scds => symbols comma delimited string
# br => batch request URL
# qs => query string
# auths => authorisation string
#------------------------
# Testing the function
batchRequestUrl = getBatchRequestUrl(symbolGroupStrings[0])
print("Example of batch URL: " + batchRequestUrl)

Example of batch URL: https://sandbox.iexapis.com/stable/stock/market/batch?types=quote&symbols=AAL,AAPL,AAP,ABBV,ABC,ABT,ACN,ADBE,ADI,ADM,ADP,ADSK,ADS,AEE,AEP,AES,AET,AFL,AGN,AIG,AIV,AIZ,AJG,AKAM,ALB,ALGN,ALK,ALLE,ALL,ALXN,AMAT,AMD,AME,AMGN,AMG,AMP,AMT,AMZN,ANDV,ANSS,ANTM,AON,AOS,APA,APC,APD,APH,APTV,ARE,ARNC,ATVI,AVB,AVGO,AVY,AWK,AXP,AYI,AZO,A,BAC,BAX,BA,BBT,BBY,BDX,BEN,BF.B,BHF,BHGE,BIIB,BK,BLK,BLL,BMY,BRK.B,BSX,BWA,BXP,CAG,CAH,CAT,CA,CBG,CBOE,CBS,CB,CCI,CCL,CDNS,CELG,CERN,CFG,CF,CHD,CHK,CHRW,CHTR,CINF,CI,CLX&token=Tpk_75e5db7f508e48f2b7dff4f435ff8029
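
As an alternative to concatenating strings by hand, the query string can be assembled with the standard library. This is only a sketch: `buildBatchRequestUrl` is a hypothetical helper, `YOUR_TOKEN` is a placeholder, and `safe=","` is used so that commas stay literal, matching the URLs printed in this notebook.

```python
from urllib.parse import urlencode

batchUrl = "https://sandbox.iexapis.com/stable/stock/market/batch"

# Hypothetical helper: builds a batch request URL from a list of symbols.
def buildBatchRequestUrl(symbolList, token):
    params = {
        "types": "quote",
        "symbols": ",".join(symbolList),
        "token": token}
    # urlencode escapes unsafe characters; safe="," keeps commas as-is.
    return batchUrl + "?" + urlencode(params, safe = ",")

url = buildBatchRequestUrl(["AAL", "BF.B", "BRK.B"], "YOUR_TOKEN")
print(url)
```

Keeping the parameters in a dictionary makes it easier to add or change one (for example, the `types` value) without editing string fragments.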


#### Testing the batch requests

In [196]:
i = 1
for scds in symbolGroupStrings:
# scds => symbol comma delimited string (represents one batch)
    print("------------")
    print("Batch request:", i)
    batchRequestUrl = getBatchRequestUrl(scds)
    # Viewing the URL we are using...
    print("URL used:\n" + batchRequestUrl)
    # Sending API call 'get' as an HTTP request, and storing the response...
    res = requests.get(batchRequestUrl)
    # Checking if the request was valid...
    print("HTTP response status code:", res.status_code)
    i = i + 1

------------
Batch request: 1
URL used:
https://sandbox.iexapis.com/stable/stock/market/batch?types=quote&symbols=AAL,AAPL,AAP,ABBV,ABC,ABT,ACN,ADBE,ADI,ADM,ADP,ADSK,ADS,AEE,AEP,AES,AET,AFL,AGN,AIG,AIV,AIZ,AJG,AKAM,ALB,ALGN,ALK,ALLE,ALL,ALXN,AMAT,AMD,AME,AMGN,AMG,AMP,AMT,AMZN,ANDV,ANSS,ANTM,AON,AOS,APA,APC,APD,APH,APTV,ARE,ARNC,ATVI,AVB,AVGO,AVY,AWK,AXP,AYI,AZO,A,BAC,BAX,BA,BBT,BBY,BDX,BEN,BF.B,BHF,BHGE,BIIB,BK,BLK,BLL,BMY,BRK.B,BSX,BWA,BXP,CAG,CAH,CAT,CA,CBG,CBOE,CBS,CB,CCI,CCL,CDNS,CELG,CERN,CFG,CF,CHD,CHK,CHRW,CHTR,CINF,CI,CLX&token=Tpk_75e5db7f508e48f2b7dff4f435ff8029
HTTP response status code: 200
------------
Batch request: 2
URL used:
https://sandbox.iexapis.com/stable/stock/market/batch?types=quote&symbols=CL,CMA,CMCSA,CME,CMG,CMI,CMS,CNC,CNP,COF,COG,COL,COO,COP,COST,COTY,CPB,CRM,CSCO,CSRA,CSX,CTAS,CTL,CTSH,CTXS,CVS,CVX,CXO,C,DAL,DE,DFS,DGX,DG,DHI,DHR,DISCA,DISCK,DISH,DIS,DLR,DLTR,DOV,DPS,DRE,DRI,DTE,DUK,DVA,DVN,DWDP,DXC,D,EA,EBAY,ECL,ED,EFX,EIX,EL,EMN,EMR,EOG,EQIX,EQR,EQT,ESRX,E

All batch requests are valid.

#### Applying the batch requests to retrieve the desired data

This could have been done during testing, but I separated the processes for conceptual clarity.

_**Inspecting the type of response data we get from a batch request**_

To know how to add the response data to our dataframe, we must know what kind of data we get, since it is likely to be structured differently from a point request response. Since the response body is JSON and we parse it with `.json()`, we expect it to be in the form of a dictionary.

In [192]:
batchRequestUrl = getBatchRequestUrl(symbolGroupStrings[0])
data = requests.get(batchRequestUrl).json()
# Checking only the keys of this data dictionary...
# (We will then look into only one key, to keep the outputs manageable)
for x in data.keys(): print(x, end = ";")

AAL;AAPL;AAP;ABBV;ABC;ABT;ACN;ADBE;ADI;ADM;ADP;ADSK;ADS;AEE;AEP;AES;AET;AFL;AGN;AIG;AIV;AIZ;AJG;AKAM;ALB;ALGN;ALK;ALLE;ALL;ALXN;AMAT;AMD;AME;AMGN;AMG;AMP;AMT;AMZN;ANDV;ANSS;ANTM;AON;AOS;APA;APC;APD;APH;APTV;ARE;ARNC;ATVI;AVB;AVGO;AVY;AWK;AXP;AYI;AZO;A;BAC;BAX;BA;BBT;BBY;BDX;BEN;BF.B;BHF;BHGE;BIIB;BK;BLK;BLL;BMY;BRK.B;BSX;BWA;BXP;CAG;CAH;CAT;CA;CBOE;CBS;CB;CCI;CCL;CDNS;CELG;CERN;CFG;CF;CHD;CHK;CHRW;CHTR;CINF;CI;CLX;

As expected, the response data converted to JSON is in the form of a dictionary. The keys of this dictionary are the symbols i.e. tickers. Inspecting the value of one key...

In [186]:
# LAYER 1 INSPECTION
# Choosing one key and inspecting its value...
dataValue = data['AAL']
type(dataValue)

dict

The value attached to the symbol is another dictionary. Inspecting this value further...

In [193]:
# LAYER 2 INSPECTION
for x in dataValue.keys(): print(x, end = ";")

quote;

The value attached to each symbol is another dictionary, whose keys are the endpoints. Here, we have only one endpoint and hence only one key. Inspecting the value of this particular key...

In [194]:
# LAYER 3 INSPECTION
dataValue['quote']

{'avgTotalVolume': 28429250,
 'calculationPrice': 'close',
 'change': -0.75,
 'changePercent': -0.0382,
 'close': 19.42,
 'closeSource': 'cfailifo',
 'closeTime': 1670234102346,
 'companyName': 'American Airlines Group Inc',
 'currency': 'USD',
 'delayedPrice': 19.87,
 'delayedPriceTime': 1657325754096,
 'extendedChange': 0.07,
 'extendedChangePercent': 0.00367,
 'extendedPrice': 19.56,
 'extendedPriceTime': 1660945348434,
 'high': 20.66,
 'highSource': 'pdl a5cneumi ei tre1edy',
 'highTime': 1710025873829,
 'iexAskPrice': None,
 'iexAskSize': None,
 'iexBidPrice': None,
 'iexBidSize': None,
 'iexClose': 19.76,
 'iexCloseTime': 1685272777647,
 'iexLastUpdated': None,
 'iexMarketPercent': None,
 'iexOpen': 19.31,
 'iexOpenTime': 1696890655475,
 'iexRealtimePrice': None,
 'iexRealtimeSize': None,
 'iexVolume': None,
 'lastTradeTime': 1683415191041,
 'latestPrice': 19.98,
 'latestSource': 'Close',
 'latestTime': 'October 22, 2021',
 'latestUpdate': 1653663044560,
 'latestVolume': 38131074

This is the dictionary of the data retrieved for the symbol at the endpoint, i.e. the same kind of dictionary we get from a point request. But since a batch request is equivalent to multiple point requests on multiple symbols and (potentially) multiple endpoints, the response data of a batch request is a dictionary of dictionaries of dictionaries: symbol, then endpoint, then data field.

Based on this knowledge, we must append data from the response data to our dataframe accordingly. We must reference both the symbol and the endpoint to get the dictionary of values retrieved for that symbol from that endpoint (only one endpoint in our case). From this dictionary, we must reference the desired fields, as before.
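
The three layers can be walked through on a trimmed example. The `batchData` dictionary below is illustrative (its values are taken from this notebook's outputs), and 'CBG' is left out on purpose: it was requested in the first batch above but does not appear among the response keys.

```python
# Trimmed, illustrative batch response: symbol -> endpoint -> data fields.
batchData = {
    "AAL": {"quote": {"latestPrice": 19.21, "marketCap": 12563917635}},
    "AAPL": {"quote": {"latestPrice": 150.14, "marketCap": 2497846221609}},
}
requested = ["AAL", "AAPL", "CBG"]

rows, missing = [], []
for s in requested:
    # Layer 1: symbol; layer 2: endpoint. None if either is absent.
    quote = batchData.get(s, {}).get("quote")
    if quote is None:
        missing.append(s)
        continue
    # Layer 3: the desired data fields.
    rows.append({
        "Ticker": s,
        "Stock price": quote["latestPrice"],
        "Market cap": quote["marketCap"]})

print(len(rows), "rows; symbols missing from the response:", missing)
```

Keeping a list of missing symbols is optional, but it makes it visible which tickers the response did not cover.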

_**Emptying the dataframe**_

In [159]:
# Ensuring empty dataframe...
df = pd.DataFrame(columns = myColumns)
# Viewing empty dataframe for reference...
df

Unnamed: 0,Ticker,Stock price,Market cap,No. of shares to buy


_**Appending the desired data from the response data to the dataframe**_

_To time the execution..._

In [160]:
import time

_The code..._

In [203]:
# STARTING TIME
startTime = time.time()
#________________________
# Dictionary used to add new row...
row = {}
# Looping through the batches of ticker symbols...
for scds in symbolGroupStrings:
# scds => symbol comma delimited string (represents one batch)
    #========================
    # Retrieving batch request URL for the current batch...
    # (Using a user-defined function given above)
    batchRequestUrl = getBatchRequestUrl(scds)
    #========================
    # Making a request for data to get response data...
    # (And converting the response's content to JSON format)
    data = requests.get(batchRequestUrl).json()
    #========================
    # Appending data from the desired fields to the dataframe...
    # (Reference the symbol and endpoint to get the retrieved data)
    # Iterating through every symbol in the batch...
    for s in scds.split(","):
    # (Only one endpoint, so we need not loop for endpoints as well)
        try:
            row = {
                "Ticker": s,
                "Stock price": data[s]["quote"]["latestPrice"],
                "Market cap": data[s]["quote"]["marketCap"]}
            df = df.append(row, ignore_index = True)
        except KeyError: continue # Symbol absent from the response.
"""
NOTE ON THIS TRY-EXCEPT BLOCK
In the batch request, it retrieves data for all available symbols.
If symbol is not available in the database, it is absent in the response.
Hence, we have this try-except block for cases where a symbol is absent.
(Absent => it is not a key in the JSON dictionary)
"""
#________________________
# ENDING TIME
endTime = time.time()
print("Total execution time: {0} seconds".format(endTime - startTime))

Total execution time: 10.637717008590698 seconds


In [201]:
df

Unnamed: 0,Ticker,Stock price,Market cap,No. of shares to buy
0,AAL,19.21,12563917635,
1,AAPL,150.14,2497846221609,
2,AAP,238.80,15113018499,
3,ABBV,109.31,201000261624,
4,ABC,128.71,25672782819,
...,...,...,...,...
575,XYL,130.34,24200658546,
576,YUM,130.93,38645409142,
577,ZBH,157.83,32466095212,
578,ZION,68.58,10812793982,


As can be seen, batch requests are far faster than point requests when we need data for a large number of symbols. In this execution, the batch requests finished processing the same list of symbols in **10.64** seconds, while sequential point requests took **543.73** seconds to do the same job, i.e. roughly 51 times longer.
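
The two timings can be put side by side with a little arithmetic. This sketch assumes 505 point requests (one per symbol, as implied by the chunk sizes) and 6 batch requests; the timings are the ones recorded above and will differ from run to run.

```python
pointSeconds = 543.731085062027   # Recorded above for point requests.
batchSeconds = 10.637717008590698 # Recorded above for batch requests.
pointRequests = 505 # Assumption: one request per symbol.
batchRequests = 6   # One request per chunk of up to 100 symbols.

speedup = pointSeconds / batchSeconds
secondsPerPointRequest = pointSeconds / pointRequests
secondsPerBatchRequest = batchSeconds / batchRequests

print("Speedup: about {0:.1f}x".format(speedup))
print("Per point request: {0:.2f} s".format(secondsPerPointRequest))
print("Per batch request: {0:.2f} s".format(secondsPerBatchRequest))
```

Under these assumptions, a single batch request is not faster than a single point request; the gain comes from making far fewer requests.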