# API Calls and Database Creation

#### *Attribution*: Data provided by [Financial Modeling Prep](https://financialmodelingprep.com/developer/docs/), [Yahoo! Finance](https://finance.yahoo.com/?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAKtvd0VA2qrc8bFsNkuF4NGhKMC_Wd22-xdYWS2LiYJy_Ox0R0rkGGz4-G_e8KIKaE-aqTCnWYMEOYbt0LIb2KNVb8PG6rsyuanF4ZW-0IzXwBjAdjiA40xsi8Vt1f0cImCziXPPI8L4Kwfmv9UDvEmLa0be1bhdntcfuyeozk3m)

<hr style="border:4px solid blue">

In this notebook, I download daily stock data from Yahoo! Finance through the yfinance package, going back to January 1st, 2023. yfinance returns pandas DataFrames, so I iteratively insert the relevant features from the returned DataFrame into a PostgreSQL table.

I also call the Financial Modeling Prep (FMP) API to populate a PostgreSQL table containing reference material about each stock in the price data. FMP is not strictly necessary, since yfinance exposes the same background information through its "Ticker()" objects. Calling yf.Ticker(<ticker_symbol>).get_info() returns a very similar object (a dictionary), and I could have populated my Postgres table by calling .get(<attribute>) on it.  
Instead, to demonstrate multiple API calls in my ELT workflow, I use the FMP API to query profile data about each stock. The call returns a list of dictionaries parsed from JSON, much like the dictionary returned by yfinance. I insert values from each dictionary in the returned list to populate the background-information fields of my PostgreSQL table.
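
As a minimal sketch of that .get()-based extraction, the mapping below pulls fields out of one profile dictionary and renames them to the sp500_profiles column names. The keys "sector", "ceo", and "ipoDate" are assumptions, since they do not appear in the visible part of the AAPL response shown later in this notebook, and profile_to_row is a hypothetical helper, not the insert_data_to_postgres function from utils.

```python
from datetime import date

# Table column -> FMP profile key. "sector", "ceo", and "ipoDate" are assumed
# key names; the others appear in the AAPL response printed in this notebook.
COLUMN_TO_KEY = {
    "symbol": "symbol",
    "company_name": "companyName",
    "exchange": "exchange",
    "sector": "sector",
    "industry": "industry",
    "mktcap": "mktCap",
    "description": "description",
    "ceo": "ceo",
    "ipo_date": "ipoDate",
    "website": "website",
}


def profile_to_row(profile: dict, asof: date) -> dict:
    """Build one table row from one profile dictionary; missing keys become None."""
    row = {"asof_date": asof}
    for column, key in COLUMN_TO_KEY.items():
        row[column] = profile.get(key)  # .get() returns None instead of raising KeyError
    return row


example = {"symbol": "AAPL", "companyName": "Apple Inc.", "mktCap": 3448898044000}
row = profile_to_row(example, date(2024, 8, 24))
print(row["company_name"], row["sector"])  # Apple Inc. None
```

Using .get() rather than square-bracket indexing means a profile with a missing field produces a NULL in the table instead of stopping the load.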

At the completion of this notebook, I will have created two separate PostgreSQL tables. The first table, "daily_stock_prices", has roughly 200k rows containing the daily price action of every S&P 500 stock from January 1st, 2023 to August 23, 2024. The second table, "sp500_profiles", has a single row for each company in the list of S&P 500 companies (503 tickers). The two tables can be joined on daily_stock_prices.symbol = sp500_profiles.symbol.
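
To illustrate the join, here is a small sketch that uses an in-memory sqlite3 database as a stand-in for PostgreSQL, with only a subset of each table's columns. The rows are toy values for illustration (the AAPL sector in particular is a placeholder), and the JOIN clause itself is standard SQL that should carry over to Postgres unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database

# Simplified versions of the two tables (subset of columns)
conn.executescript("""
    CREATE TABLE daily_stock_prices (date TEXT, symbol TEXT, close REAL);
    CREATE TABLE sp500_profiles (symbol TEXT, company_name TEXT, sector TEXT);
""")
conn.executemany(
    "INSERT INTO daily_stock_prices VALUES (?, ?, ?)",
    [("2023-01-03", "AAPL", 125.07), ("2023-01-03", "ABBV", 162.380005)],
)
conn.executemany(
    "INSERT INTO sp500_profiles VALUES (?, ?, ?)",
    [("AAPL", "Apple Inc.", "Technology"), ("ABBV", "AbbVie Inc.", "Healthcare")],
)

# Attach the company profile to every daily price row
query = """
    SELECT p.date, p.symbol, s.company_name, s.sector, p.close
    FROM daily_stock_prices AS p
    JOIN sp500_profiles AS s ON p.symbol = s.symbol
    ORDER BY p.symbol;
"""
rows = conn.execute(query).fetchall()
for r in rows:
    print(r)
conn.close()
```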

These PostgreSQL tables will be queried for asset research and portfolio allocation.

In [1]:
import yfinance as yf
from urllib.request import urlopen
import certifi
import json
from sqlalchemy import create_engine
import psycopg2
import numpy as np
import pandas as pd
import os
import configparser
import ssl

from utils import create_daily_stock_prices_table, insert_data_to_daily_stock_prices, get_jsonparsed_data, create_sp500profiles_table, \
insert_data_to_postgres

### Tickers

In [2]:
# Retrieve Current S&P Stock Tickers - as of 08/23/2024
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
sp500_tickers = pd.read_html(url)[0]["Symbol"].to_list()

# Handling Errors from Wiki
sp500_tickers.remove("BRK.B")
sp500_tickers.remove("BF.B")
sp500_tickers.extend(["BRK-B", "BF-B"])
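
The two manual fixes above follow one pattern: Wikipedia writes share classes with a dot, while the Yahoo! symbols used here take a hyphen. A small sketch of a general version follows. It assumes every dotted symbol in the list should be converted the same way, which I have only confirmed for BRK.B and BF.B, and to_yahoo_symbol is a hypothetical helper name.

```python
def to_yahoo_symbol(symbol: str) -> str:
    """Replace the dot in share-class tickers with a hyphen (BRK.B -> BRK-B)."""
    return symbol.replace(".", "-")


wiki_tickers = ["AAPL", "BRK.B", "BF.B"]  # toy subset of the Wikipedia list
yahoo_tickers = [to_yahoo_symbol(t) for t in wiki_tickers]
print(yahoo_tickers)  # ['AAPL', 'BRK-B', 'BF-B']
```

Unlike remove() followed by extend(), the list comprehension keeps each ticker in its original position.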

### Yfinance API

In [70]:
# Download and format the stock data without multi-index, "Date" column is currently stored as a datetime obj
today = pd.Timestamp.today()

# Download using yfinance API
df = yf.download(sp500_tickers, start="2023-01-01", end=today, interval="1d")

# Formatting
df.sort_index(axis=1, inplace=True)
df = df.stack(future_stack=True).reset_index()  # future_stack required due to future deprecation
df.columns.name = None
df = df.dropna(subset="Open")
df["Volume"] = df["Volume"].fillna(0).astype(int)

# Visual Inspection - instead of multi-index, each stock shown for each day, future transformations require groupby("Ticker").transform()
print(df.shape)
df.head()

[*********************100%%**********************]  503 of 503 completed


(206466, 8)


| | Date | Ticker | Adj Close | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-03 | A | 148.440598 | 150.039993 | 153.130005 | 148.470001 | 151.960007 | 1414300 |
| 1 | 2023-01-03 | AAL | 12.74 | 12.74 | 13.0 | 12.53 | 12.91 | 21865100 |
| 2 | 2023-01-03 | AAPL | 123.904625 | 125.07 | 130.899994 | 124.169998 | 130.279999 | 112117500 |
| 3 | 2023-01-03 | ABBV | 151.773132 | 162.380005 | 163.020004 | 160.809998 | 162.039993 | 4937500 |
| 4 | 2023-01-03 | ABNB | 84.900002 | 84.900002 | 88.139999 | 84.300003 | 87.385002 | 3901200 |


In [71]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 206466 entries, 0 to 207738
Data columns (total 8 columns):
 #   Column     Non-Null Count   Dtype         
---  ------     --------------   -----         
 0   Date       206466 non-null  datetime64[ns]
 1   Ticker     206466 non-null  object        
 2   Adj Close  206466 non-null  float64       
 3   Close      206466 non-null  float64       
 4   High       206466 non-null  float64       
 5   Low        206466 non-null  float64       
 6   Open       206466 non-null  float64       
 7   Volume     206466 non-null  int64         
dtypes: datetime64[ns](1), float64(5), int64(1), object(1)
memory usage: 14.2+ MB


In [72]:
# Verifying date range
df.Date.max(), df.Date.min(), df.Volume.dtype

(Timestamp('2024-08-23 00:00:00'),
 Timestamp('2023-01-03 00:00:00'),
 dtype('int64'))

<hr style="border:3px solid blue">

### Inserting S&P 500 Daily Data into Postgres DB

In [6]:
# System Variables
db_name = os.getenv('stock_db')
db_user = os.getenv('db_user')
db_password = os.getenv('db_password')
db_host = os.getenv('db_host')
db_port = os.getenv('db_port')

# For create_engine from sqlalchemy
connection_string = f"postgresql+psycopg2://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}"
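
One caveat with building the connection string by hand: if the password contains characters such as "@" or "/", the URL can be parsed incorrectly. A hedged sketch of one way around this is to URL-encode the credentials with the standard library before formatting. The function name and the sample credentials below are made up for illustration.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, database):
    """Format a SQLAlchemy-style URL with the user and password percent-encoded."""
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials containing characters that need escaping
safe_url = build_connection_string("analyst", "p@ss/word", "localhost", "5432", "project_database")
print(safe_url)
# postgresql+psycopg2://analyst:p%40ss%2Fword@localhost:5432/project_database
```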

In [75]:
# Create the Table
create_daily_stock_prices_table()

Table daily_stock_prices created successfully.


In [76]:
# Checking table creation and values
pd.read_sql("""SELECT * FROM information_schema.columns WHERE table_name = 'daily_stock_prices';""", 
            create_engine(connection_string))

| | table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | ... | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | project_database | public | daily_stock_prices | volume | 8 | | YES | bigint | | | ... | NO | | | | | | NO | NEVER | | YES |
| 1 | project_database | public | daily_stock_prices | date | 2 | | NO | date | | | ... | NO | | | | | | NO | NEVER | | YES |
| 2 | project_database | public | daily_stock_prices | id | 1 | nextval('daily_stock_prices_id_seq'::regclass) | NO | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 3 | project_database | public | daily_stock_prices | open | 4 | | NO | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 4 | project_database | public | daily_stock_prices | close | 5 | | NO | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 5 | project_database | public | daily_stock_prices | high | 6 | | NO | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 6 | project_database | public | daily_stock_prices | low | 7 | | NO | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 7 | project_database | public | daily_stock_prices | symbol | 3 | | NO | character varying | 10.0 | 40.0 | ... | NO | | | | | | NO | NEVER | | YES |


In [79]:
# Manage Connection -> Iteratively insert data from the pandas.DataFrame() returned from yfinance

with psycopg2.connect(
    dbname=db_name, 
    user=db_user, 
    port=db_port, 
    password=db_password, 
    host=db_host) as connection:

    # Insert rows of data using custom function, iterrows() used internally
    insert_data_to_daily_stock_prices(connection, df)  # custom function

    # Inspection - the result is discarded inside the with-block; the next cell performs the actual verification
    pd.read_sql("""SELECT * FROM daily_stock_prices;""", create_engine(connection_string))

In [80]:
# Verification
stocks = pd.read_sql("""SELECT * FROM daily_stock_prices;""", 
            create_engine(connection_string), parse_dates=["date"])

stocks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 206466 entries, 0 to 206465
Data columns (total 8 columns):
 #   Column  Non-Null Count   Dtype         
---  ------  --------------   -----         
 0   id      206466 non-null  int64         
 1   date    206466 non-null  datetime64[ns]
 2   symbol  206466 non-null  object        
 3   open    206466 non-null  float64       
 4   close   206466 non-null  float64       
 5   high    206466 non-null  float64       
 6   low     206466 non-null  float64       
 7   volume  206466 non-null  int64         
dtypes: datetime64[ns](1), float64(4), int64(2), object(1)
memory usage: 12.6+ MB



#### After running the above cells, the daily_stock_prices table is fully populated and ready for future queries.
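
The insert_data_to_daily_stock_prices helper lives in utils and is not shown in this notebook. As a rough sketch of what a row-by-row parameterized insert of this kind can look like, the example below uses sqlite3 as a stand-in for psycopg2 so that it runs without a database server. The function name, the simplified table definition, and the dictionary-shaped records are all assumptions for illustration, not the actual utils code.

```python
import sqlite3

# sqlite3 uses ? placeholders; psycopg2 would use %s in the same positions
INSERT_SQL = """
    INSERT INTO daily_stock_prices (date, symbol, open, close, high, low, volume)
    VALUES (?, ?, ?, ?, ?, ?, ?)
"""


def insert_price_rows(connection, records):
    """Insert one table row per record, mapping yfinance column names to table columns."""
    cursor = connection.cursor()
    for rec in records:
        cursor.execute(INSERT_SQL, (
            rec["Date"], rec["Ticker"], rec["Open"], rec["Close"],
            rec["High"], rec["Low"], int(rec["Volume"]),
        ))
    connection.commit()  # persist all rows in one transaction


connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE daily_stock_prices (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        date TEXT NOT NULL, symbol TEXT NOT NULL,
        open REAL NOT NULL, close REAL NOT NULL,
        high REAL NOT NULL, low REAL NOT NULL,
        volume INTEGER
    )
""")

# Two rows copied from the df.head() output above
records = [
    {"Date": "2023-01-03", "Ticker": "A", "Open": 151.960007, "Close": 150.039993,
     "High": 153.130005, "Low": 148.470001, "Volume": 1414300},
    {"Date": "2023-01-03", "Ticker": "AAL", "Open": 12.91, "Close": 12.74,
     "High": 13.0, "Low": 12.53, "Volume": 21865100},
]
insert_price_rows(connection, records)
inserted = connection.execute(
    "SELECT symbol, volume FROM daily_stock_prices ORDER BY id"
).fetchall()
print(inserted)  # [('A', 1414300), ('AAL', 21865100)]
```

Passing values as parameters, rather than formatting them into the SQL string, lets the driver handle quoting and type conversion.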

<hr style="border: 4px solid blue">

### Financial Modeling Prep

I repeat the same process here: download data from the FMP API, then insert it into the corresponding database table.
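
The get_jsonparsed_data helper used below is imported from utils and its body is not shown here. A minimal sketch of a helper with that behavior (open a URL, decode the body, parse it as JSON) might look like the following. It assumes the system certificate store rather than certifi, fetch_json is a hypothetical name, and a data: URL stands in for the FMP endpoint so the example runs offline.

```python
import base64
import json
import ssl
from urllib.request import urlopen


def fetch_json(url: str):
    """Open the URL, decode the body as UTF-8, and parse it as JSON."""
    context = ssl.create_default_context()  # verify HTTPS certificates with the system CAs
    with urlopen(url, context=context) as response:
        return json.loads(response.read().decode("utf-8"))


# Offline demonstration: encode a tiny JSON payload into a data: URL
payload = json.dumps([{"symbol": "AAPL", "companyName": "Apple Inc."}])
demo_url = "data:application/json;base64," + base64.b64encode(payload.encode()).decode()

profiles = fetch_json(demo_url)
print(type(profiles).__name__, profiles[0]["symbol"])  # list AAPL
```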

In [3]:
# FMP requires an API key, using my environment to protect the specific key
api_key = os.getenv('STOCK_API_KEY')

In [4]:
# Demonstrate returned object for AAPL - custom function

## Options:
# "https://financialmodelingprep.com/api/v3/profile/AAPL?apikey=YOUR_API_KEY"  -> company profile!
# "https://financialmodelingprep.com/api/v3/search?query=AAPL&apikey={api_key}" -> symbol|name|currency|exchange ONLY

# Profile example -> query string for the url below using AAPL specifically
url = (f"https://financialmodelingprep.com/api/v3/profile/AAPL?apikey={api_key}")
aapl_profile = get_jsonparsed_data(url)
len(aapl_profile), type(aapl_profile), type(aapl_profile[0]), aapl_profile

(1,
 list,
 dict,
 [{'symbol': 'AAPL',
   'price': 226.84,
   'beta': 1.244,
   'volAvg': 64811409,
   'mktCap': 3448898044000,
   'lastDiv': 1,
   'range': '164.08-237.23',
   'changes': 2.31,
   'companyName': 'Apple Inc.',
   'currency': 'USD',
   'cik': '0000320193',
   'isin': 'US0378331005',
   'cusip': '037833100',
   'exchange': 'NASDAQ Global Select',
   'exchangeShortName': 'NASDAQ',
   'industry': 'Consumer Electronics',
   'website': 'https://www.apple.com',
   'description': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, and HomePod. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and

In [78]:
# Create the Table
create_sp500profiles_table()  # name as imported from utils at the top of the notebook

Table sp500_profiles created successfully.


In [91]:
# Verifying database connection 

# connection parameters
with psycopg2.connect(
    dbname=db_name,
    user=db_user,
    password=db_password,
    host=db_host,
    port=db_port
) as connection:

    # Check the connection validity
    with connection.cursor() as cursor:

        # Simple query to check connection
        cursor.execute("SELECT version();")
        db_version = cursor.fetchone()[0]
        print(f"Connected to database: {db_name}")
        print(f"Database version: {db_version}")

        cursor.execute("""SELECT * FROM information_schema.columns WHERE table_name = 'sp500_profiles';""")
        display(pd.DataFrame(cursor.fetchall(), columns=[desc[0] for desc in cursor.description]))
        print(type(cursor), type(connection))

Connected to database: project_database
Database version: PostgreSQL 16.4 (Postgres.app) on aarch64-apple-darwin21.6.0, compiled by Apple clang version 14.0.0 (clang-1400.0.29.102), 64-bit


| | table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | ... | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | project_database | public | sp500_profiles | asof_date | 2 | | YES | date | | | ... | NO | | | | | | NO | NEVER | | YES |
| 1 | project_database | public | sp500_profiles | mktcap | 8 | | YES | bigint | | | ... | NO | | | | | | NO | NEVER | | YES |
| 2 | project_database | public | sp500_profiles | ipo_date | 11 | | YES | date | | | ... | NO | | | | | | NO | NEVER | | YES |
| 3 | project_database | public | sp500_profiles | id | 1 | nextval('sp500_profiles_id_seq'::regclass) | NO | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 4 | project_database | public | sp500_profiles | sector | 6 | | YES | text | | 1073742000.0 | ... | NO | | | | | | NO | NEVER | | YES |
| 5 | project_database | public | sp500_profiles | industry | 7 | | YES | text | | 1073742000.0 | ... | NO | | | | | | NO | NEVER | | YES |
| 6 | project_database | public | sp500_profiles | description | 9 | | YES | text | | 1073742000.0 | ... | NO | | | | | | NO | NEVER | | YES |
| 7 | project_database | public | sp500_profiles | ceo | 10 | | YES | text | | 1073742000.0 | ... | NO | | | | | | NO | NEVER | | YES |
| 8 | project_database | public | sp500_profiles | website | 12 | | YES | text | | 1073742000.0 | ... | NO | | | | | | NO | NEVER | | YES |
| 9 | project_database | public | sp500_profiles | symbol | 3 | | NO | character varying | 10.0 | 40.0 | ... | NO | | | | | | NO | NEVER | | YES |


<class 'psycopg2.extensions.cursor'> <class 'psycopg2.extensions.connection'>


<hr style="border: 3px solid blue">

### Testing Database Table Creation with Toy Data -> <span style="color:red;  text-decoration: underline; background-color:black"><b>DO NOT REPEAT</b></span>

In [33]:
# Adding a single entry to the db

with psycopg2.connect(
    dbname=db_name,
    user=db_user,
    password=db_password,
    host=db_host,
    port=db_port
) as connection:

    insert_data_to_postgres(connection, aapl_profile[0])  # custom function

In [77]:
# Using psycopg2 to verify the row entry, then dropping the toy table (DROP TABLE removes sp500_profiles entirely)

with psycopg2.connect(
        dbname=db_name,
        user=db_user,
        password=db_password,
        host=db_host,
        port=db_port
    ) as connection:
    with connection.cursor() as cursor:
        cursor.execute("""SELECT * FROM sp500_profiles""")
        print(cursor.fetchall())
        cursor.execute("""DROP TABLE IF EXISTS sp500_profiles""")

[]


<hr style="border:3px solid blue">

### Inserting Values Into the Database
##### This block has to be run 3x over 3 separate days, once per batch of tickers

In [3]:
# List of S&P 500 stock symbols (use your list here)
sp500_tickers_1 = sp500_tickers[:200]  # done on August 24th
sp500_tickers_2 = sp500_tickers[200:400]
sp500_tickers_3 = sp500_tickers[400:]

In [90]:
# Trying a bulk string to make one api call for multiple stocks -> this method works to create the table more quickly

bulk_string = ",".join(sp500_tickers_3)

### Using the bulk_string to expedite and condense API calls
After reading the API documentation, I initially believed that I would have to make an individual call for each stock ticker I wanted supplemental information for. However, after experimenting with a single call that lists multiple stock tickers separated by commas, I found that the bulk-call method works. I used 3 separate API calls, but all S&P 500 stocks could have been requested at once by listing them as a single comma-separated component of the URL path.
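
The batching and URL construction described here can be sketched as two small helpers. The names chunk and bulk_profile_url are hypothetical, the ticker list is filled with placeholder symbols, and the URL pattern is the same one used in the cells of this notebook.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def bulk_profile_url(symbols, api_key):
    """Build one profile URL covering several comma-separated symbols."""
    return (
        "https://financialmodelingprep.com/api/v3/profile/"
        f"{','.join(symbols)}?apikey={api_key}"
    )


tickers = [f"T{i}" for i in range(503)]  # placeholder symbols, same length as the S&P list
batches = chunk(tickers, 200)
print([len(batch) for batch in batches])  # [200, 200, 103]

demo_bulk_url = bulk_profile_url(["AAPL", "ABBV"], "YOUR_API_KEY")
print(demo_bulk_url)
# https://financialmodelingprep.com/api/v3/profile/AAPL,ABBV?apikey=YOUR_API_KEY
```

With the batches in hand, each one maps to a single API call, so the 503 tickers cost three calls instead of 503.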

In [32]:
# Initialize PostgreSQL connection

connection = psycopg2.connect(
    dbname=db_name,
    user=db_user,
    password=db_password,
    host=db_host,
    port=db_port
)

# Query string for bulk download -> unknown compatibility
url = f"https://financialmodelingprep.com/api/v3/profile/{bulk_string}?apikey={api_key}"
stock_data = get_jsonparsed_data(url)

# Ensure the response is a list and contains data
if stock_data and isinstance(stock_data, list):
    for stock_profile in stock_data:  # The API returns a list with one dictionary profile per index
        insert_data_to_postgres(connection, stock_profile)  # Custom function for inserting data

connection.close()

In [11]:
### NO LONGER NEEDED, initial approach was using a single API call per stock ticker -----------------------------


# # Initialize PostgreSQL connection
# connection = psycopg2.connect(
#     dbname=db_name,
#     user=db_user,
#     password=db_password,
#     host=db_host,
#     port=db_port
# )

# # Iterate over the list of S&P 500 stocks -> maximum of 250 API calls per day -> tread carefully --------------------------
# for symbol in sp500_tickers_1:
#     url = f"https://financialmodelingprep.com/api/v3/profile/{symbol}?apikey={api_key}"
#     stock_data = get_jsonparsed_data(url)

#     # Ensure the response is a list and contains data
#     if stock_data and isinstance(stock_data, list):
#         stock_profile = stock_data[0]  # The API returns a list with one profile
#         insert_data_to_postgres(connection, stock_profile)

# # Close the PostgreSQL connection
# connection.close()

### Checking that Data was Inserted Properly

In [7]:
# Checking insertion into the database table

# Create the SQLAlchemy engine
engine = create_engine(connection_string)

try:
    profile_df = pd.read_sql("""SELECT * FROM sp500_profiles""", engine)
    print(profile_df.shape)
    display(profile_df.head())
except Exception as exc:  # avoid a bare except, which would also swallow KeyboardInterrupt
    print(f"No DB Found: {exc}")

(503, 12)


| | id | asof_date | symbol | company_name | exchange | sector | industry | mktcap | description | ceo | ipo_date | website |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2024-08-24 | MMM | 3M Company | New York Stock Exchange | Industrials | Conglomerates | 71718164700 | 3M Company operates as a diversified technolog... | Mr. William M. Brown | 1946-01-14 | https://www.3m.com |
| 1 | 2 | 2024-08-24 | AOS | A. O. Smith Corporation | New York Stock Exchange | Industrials | Industrial - Machinery | 12038155500 | A. O. Smith Corporation manufactures and marke... | Mr. Kevin J. Wheeler | 1983-09-30 | https://www.aosmith.com |
| 2 | 3 | 2024-08-24 | ABT | Abbott Laboratories | New York Stock Exchange | Healthcare | Medical - Devices | 196069331000 | Abbott Laboratories, together with its subsidi... | Mr. Robert B. Ford | 1980-03-17 | https://www.abbott.com |
| 3 | 4 | 2024-08-24 | ABBV | AbbVie Inc. | New York Stock Exchange | Healthcare | Drug Manufacturers - General | 348940467000 | AbbVie Inc. discovers, develops, manufactures,... | Mr. Robert A. Michael | 2013-01-02 | https://www.abbvie.com |
| 4 | 294 | 2024-08-25 | MLM | Martin Marietta Materials, Inc. | New York Stock Exchange | Basic Materials | Construction Materials | 34037335332 | Martin Marietta Materials, Inc., a natural res... | Mr. C. Howard Nye | 1994-02-17 | https://www.martinmarietta.com |


In [85]:
# Smurfit Westrock has null values for sector, industry, description, and website - they are a Dublin-based co that just merged and started
# trading as of 2024-07-08; Corpay (CPAY) is only missing its website

profile_df.isna().sum()

id              0
asof_date       0
symbol          0
company_name    0
exchange        0
sector          1
industry        1
mktcap          0
description     1
ceo             0
ipo_date        0
website         2
dtype: int64

In [86]:
profile_df[profile_df["website"].isna()]

| | id | asof_date | symbol | company_name | exchange | sector | industry | mktcap | description | ceo | ipo_date | website |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 134 | 128 | 2024-08-24 | CPAY | Corpay, Inc. | New York Stock Exchange | Technology | Software - Infrastructure | 20688256680 | FLEETCOR Technologies, Inc. operates as a paym... | Mr. Ronald F. Clarke | 2010-12-15 | |
| 409 | 406 | 2024-08-25 | SW | Smurfit WestRock plc | New York Stock Exchange | | | 23468457025 | | Mr. Anthony Paul J. Smurfit | 2014-12-23 | |


In [8]:
# Another potential example of how yfinance may have more complete information than FMP

yf.Ticker("CPAY").get_info()["website"], yf.Ticker("SW").get_info()["website"]

('https://www.corpay.com', 'https://www.smurfitwestrock.com')