## Storing data on disk in CSV format

In [1]:
import pandas as pd
from openbb import obb

In [2]:
obb.user.preferences.output_type = "dataframe"

Function to fetch historical stock data for a given symbol and date range, and add a 'symbol' column

In [3]:
def get_stock_data(symbol, start_date=None, end_date=None):
    data = obb.equity.price.historical(
        symbol,
        start_date=start_date,
        end_date=end_date,
        provider="yfinance",
    )
    data.reset_index(inplace=True)
    data["symbol"] = symbol
    return data

Function to save the fetched stock data to a gzipped CSV file

In [4]:
def save_data_range(symbol, start_date=None, end_date=None):
    data = get_stock_data(symbol, start_date, end_date)
    data.to_csv(f"{symbol}.gz", compression="gzip", index=False)

Function to read stock data from a gzipped CSV file

In [5]:
def get_data(symbol):
    return pd.read_csv(
        f"{symbol}.gz",
        compression="gzip",
        index_col="date",
        parse_dates=["date"],  # load the date index as datetimes, not strings
        usecols=["date", "open", "high", "low", "close", "volume", "symbol"],
    )

Save data for the stock symbol "PLTR"

In [6]:
save_data_range("PLTR")

Get the saved data for "PLTR"

In [7]:
pltr = get_data("PLTR")
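To try the gzip CSV round trip from `save_data_range` and `get_data` without calling OpenBB, here is a minimal sketch that uses a small synthetic DataFrame. The `DEMO` symbol, the prices, and the temporary directory are made up for illustration. Only the `to_csv`/`read_csv` arguments mirror the functions above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical rows shaped like get_stock_data() output after reset_index()
sample = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03"],
        "open": [10.0, 10.5],
        "high": [10.8, 10.9],
        "low": [9.9, 10.1],
        "close": [10.6, 10.2],
        "volume": [1000, 1200],
        "dividend": [0.0, 0.0],  # extra column that usecols will drop
        "symbol": ["DEMO", "DEMO"],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "DEMO.gz"
    # Write a gzip-compressed CSV without the RangeIndex, as save_data_range does
    sample.to_csv(path, compression="gzip", index=False)
    # Read it back with the same index_col and usecols as get_data
    loaded = pd.read_csv(
        path,
        compression="gzip",
        index_col="date",
        usecols=["date", "open", "high", "low", "close", "volume", "symbol"],
    )

print(loaded)
```

`usecols` keeps the columns in file order, so the extra `dividend` column simply disappears from the result.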

Save the DataFrame 'pltr' to a CSV file

In [9]:
pltr.to_csv("market_data.csv")

Save the DataFrame 'pltr' to a tab-separated file

In [10]:
pltr.to_csv("market_data.tsv", sep="\t")
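You have to pass the same `sep` again when you read a tab-separated file back. This sketch writes to an in-memory buffer (`io.StringIO`) instead of `market_data.tsv`, and it uses a made-up two-row frame in place of `pltr`.

```python
import io

import pandas as pd

# Hypothetical two-row frame indexed by date, standing in for pltr
frame = pd.DataFrame(
    {"open": [10.0, 10.5], "close": [10.6, 10.2]},
    index=pd.Index(["2024-01-02", "2024-01-03"], name="date"),
)

buffer = io.StringIO()
frame.to_csv(buffer, sep="\t")  # same call as above, written to memory
text = buffer.getvalue()

# Reading back requires the same separator; index_col restores the date index
restored = pd.read_csv(io.StringIO(text), sep="\t", index_col="date")
print(text)
```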

Save only the 'open' and 'close' columns of 'pltr' to a CSV file (the date index is still written)

In [11]:
pltr.to_csv("market_data.csv", columns=["open", "close"])

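Save 'pltr' to a CSV file with dates formatted as YYYY-MM-DD (`date_format` only affects datetime values, such as a date index parsed with `parse_dates`)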

In [12]:
pltr.to_csv("market_data.csv", date_format="%Y-%m-%d")
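`date_format` only changes how datetime values are written. Plain strings pass through unchanged. This small sketch uses a hypothetical index of timestamps that include a time of day, and writes it with and without the format string.

```python
import io

import pandas as pd

# Hypothetical frame whose index holds timestamps with a time component
frame = pd.DataFrame(
    {"close": [10.6, 10.2]},
    index=pd.DatetimeIndex(["2024-01-02 16:00", "2024-01-03 16:00"], name="date"),
)

default_out = io.StringIO()
frame.to_csv(default_out)  # timestamps written with their time of day

dated_out = io.StringIO()
frame.to_csv(dated_out, date_format="%Y-%m-%d")  # time component dropped

print(default_out.getvalue())
print(dated_out.getvalue())
```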

Read the first 10 rows of the CSV file

In [13]:
pltr = pd.read_csv("market_data.csv", nrows=10)

Skip the first 10 data rows (the header on line 0 is kept) and read the rest of the CSV file

In [14]:
pltr = pd.read_csv("market_data.csv", skiprows=range(1, 11))
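`skiprows` counts physical lines starting at 0, and line 0 is the header. Passing `range(1, 11)` therefore keeps the header and drops the first ten data rows. A plain integer, `skiprows=10`, would drop the header as well. The sketch below uses a hypothetical 15-row CSV held in memory.

```python
import io

import pandas as pd

# Hypothetical CSV: a header line followed by 15 data rows numbered 0..14
csv_text = "row,close\n" + "".join(f"{i},{100 + i}\n" for i in range(15))

first_ten = pd.read_csv(io.StringIO(csv_text), nrows=10)
rest = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 11))
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=10)

print(first_ten["row"].tolist())  # data rows 0 through 9
print(rest["row"].tolist())       # header kept, data rows 10 through 14
print(list(no_header.columns))    # header skipped, so a data line became the header
```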

Read the CSV file and treat 'NULL' as NaN (pandas already includes 'NULL' in its default missing-value markers, and `na_values` adds to that list)

In [15]:
pltr = pd.read_csv("market_data.csv", na_values="NULL")
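`na_values` adds markers to pandas' default list, which already includes strings such as `NULL`, `NA`, and empty fields. Setting `keep_default_na=False` limits parsing to the markers you list. The sketch below uses an in-memory CSV with a hypothetical `-` placeholder.

```python
import io

import pandas as pd

# Hypothetical CSV using "NULL" and a custom "-" placeholder for missing values
csv_text = "date,close,volume\n2024-01-02,NULL,1000\n2024-01-03,10.2,-\n"

# "NULL" is a default NaN marker; "-" is only treated as NaN when listed
parsed = pd.read_csv(io.StringIO(csv_text), na_values=["-"])

# With keep_default_na=False, only the listed markers count as missing
strict = pd.read_csv(io.StringIO(csv_text), na_values=["-"], keep_default_na=False)

print(parsed.isna().sum().to_dict())
print(strict.isna().sum().to_dict())
```

In the strict read, the literal string `NULL` stays in the `close` column, and the column therefore loads as text rather than floats.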

**Jason Strimpel** is the founder of <a href='https://pyquantnews.com/'>PyQuant News</a> and co-founder of <a href='https://www.tradeblotter.io/'>Trade Blotter</a>. His career in algorithmic trading spans 20+ years. He previously traded for a Chicago-based hedge fund, was a risk manager at JPMorgan, and managed production risk technology for an energy derivatives trading firm in London. In Singapore, he served as APAC CIO for an agricultural trading firm and built the data science team for a global metals trading firm. Jason holds degrees in Finance and Economics and a Master's in Quantitative Finance from the Illinois Institute of Technology. His career spans America, Europe, and Asia. He shares his expertise through the <a href='https://pyquantnews.com/subscribe-to-the-pyquant-newsletter/'>PyQuant Newsletter</a> and social media, and has taught algorithmic trading with Python to over 1,000 students in his popular course **<a href='https://gettingstartedwithpythonforquantfinance.com/'>Getting Started With Python for Quant Finance</a>**. All code is for educational purposes only. Nothing provided here is financial advice. Use at your own risk.

## Storing data on disk with SQLite

In [None]:
!pip install exchange-calendars

In [1]:
import sqlite3

In [2]:
import exchange_calendars as xcals
import pandas as pd
from IPython.display import Markdown, display
from openbb import obb

In [3]:
obb.user.preferences.output_type = "dataframe"

Function to fetch historical stock data for a given symbol and date range, and add a 'symbol' column

In [4]:
def get_stock_data(symbol, start_date=None, end_date=None):
    data = obb.equity.price.historical(
        symbol,
        start_date=start_date,
        end_date=end_date,
        provider="yfinance",
    )
    data.reset_index(inplace=True)
    data["symbol"] = symbol
    return data

Function to save the fetched stock data to an SQLite database

In [5]:
def save_data_range(symbol, conn, start_date=None, end_date=None):
    data = get_stock_data(symbol, start_date, end_date)
    data.to_sql("stock_data", conn, if_exists="append", index=False)
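With `if_exists="append"`, `to_sql` creates the `stock_data` table on the first call and adds rows on later calls. It does not check for duplicates, so saving the same date range twice stores every row twice. This minimal sketch uses an in-memory SQLite database and a made-up frame instead of an OpenBB call.

```python
import sqlite3

import pandas as pd

# Hypothetical rows standing in for get_stock_data() output
rows = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03"],
        "close": [470.0, 468.5],
        "symbol": ["SPY", "SPY"],
    }
)

conn = sqlite3.connect(":memory:")  # throwaway database for the demo

rows.to_sql("stock_data", conn, if_exists="append", index=False)  # creates the table
rows.to_sql("stock_data", conn, if_exists="append", index=False)  # appends again

count = conn.execute("SELECT COUNT(*) FROM stock_data").fetchone()[0]
print(count)  # 4: the second call duplicated both rows
conn.close()
```

If you rerun a backfill, one option is to delete the overlapping date range first. Another is to enforce a unique key on (symbol, date).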

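Function to save the stock data for the most recent session to an SQLite database. It requests only today's date, so it assumes the script runs on a trading day after the market close. On weekends and holidays there is no new session to save.

In [6]:
def save_last_trading_session(symbol, conn):
    # Pass a plain date (no time component) as the start and end of the query
    today = pd.Timestamp.today().date()
    data = get_stock_data(symbol, today, today)
    data.to_sql("stock_data", conn, if_exists="append", index=False)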

Establish a connection to the SQLite database

In [7]:
conn = sqlite3.connect("market_data.sqlite")

Save data for multiple stock symbols in the specified date range

In [8]:
for symbol in ["SPY", "QQQ", "DIA"]:
    save_data_range(symbol, conn=conn, start_date="2020-06-01", end_date="2023-01-01")

Read and display data for the stock symbol "SPY" from the database

In [9]:
df_1 = pd.read_sql_query("SELECT * from stock_data where symbol='SPY'", conn)
display(df_1)

,date,open,high,low,close,volume,split_ratio,dividend,capital_gains,symbol
0,2020-06-01,303.619995,306.209991,303.059998,305.549988,55758300,0.0,0.0,0.0,SPY
1,2020-06-02,306.549988,308.130005,305.100006,308.079987,74267200,0.0,0.0,0.0,SPY
2,2020-06-03,310.239990,313.220001,309.940002,312.179993,92567600,0.0,0.0,0.0,SPY
3,2020-06-04,311.109985,313.000000,309.079987,311.359985,75794400,0.0,0.0,0.0,SPY
4,2020-06-05,317.230011,321.269989,317.160004,319.339996,150524700,0.0,0.0,0.0,SPY
...,...,...,...,...,...,...,...,...,...,...
1301,2022-12-23,379.649994,383.059998,378.029999,382.910004,59857300,0.0,0.0,0.0,SPY
1302,2022-12-27,382.790009,383.149994,379.649994,381.399994,51638200,0.0,0.0,0.0,SPY
1303,2022-12-28,381.329987,383.390015,376.420013,376.660004,70911500,0.0,0.0,0.0,SPY
1304,2022-12-29,379.630005,384.350006,379.079987,383.440002,66970900,0.0,0.0,0.0,SPY


Read and display data for "SPY" where the volume is greater than 100,000,000

In [10]:
df_2 = pd.read_sql_query(
    "SELECT * from stock_data where symbol='SPY' and volume > 100000000", conn
)
display(df_2)

,date,open,high,low,close,volume,split_ratio,dividend,capital_gains,symbol
0,2020-06-05,317.230011,321.269989,317.160004,319.339996,150524700,0.0,0.000,0.0,SPY
1,2020-06-11,311.459991,312.149994,300.010010,300.609985,209243600,0.0,0.000,0.0,SPY
2,2020-06-12,308.239990,309.079987,298.600006,304.209991,194678900,0.0,0.000,0.0,SPY
3,2020-06-15,298.019989,308.279999,296.739990,307.049988,135782700,0.0,0.000,0.0,SPY
4,2020-06-16,315.480011,315.640015,307.670013,312.959991,137627500,0.0,0.000,0.0,SPY
...,...,...,...,...,...,...,...,...,...,...
303,2022-12-13,410.220001,410.489990,399.070007,401.970001,123782500,0.0,0.000,0.0,SPY
304,2022-12-14,401.609985,405.500000,396.309998,399.399994,108111300,0.0,0.000,0.0,SPY
305,2022-12-15,394.299988,395.250000,387.890015,389.630005,117705900,0.0,0.000,0.0,SPY
306,2022-12-16,385.179993,386.579987,381.040009,383.269989,119858000,0.0,1.781,0.0,SPY
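
The queries above embed `'SPY'` directly in the SQL string. When the symbol or threshold comes from a variable, pass it through `params` so the sqlite3 driver binds the values instead of pasting them into the SQL text. This sketch assumes the `stock_data` column names used above and fills an in-memory database with hypothetical rows.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical rows in the same shape as the stock_data table
pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03", "2024-01-02"],
        "volume": [150_000_000, 90_000_000, 120_000_000],
        "symbol": ["SPY", "SPY", "QQQ"],
    }
).to_sql("stock_data", conn, index=False)

# sqlite3 uses "?" placeholders; values are bound, not pasted into the SQL text
query = "SELECT * FROM stock_data WHERE symbol = ? AND volume > ?"
high_volume = pd.read_sql_query(query, conn, params=("SPY", 100_000_000))
print(high_volume)
conn.close()
```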



## Storing data in a networked Postgres database

To avoid the "pg_config executable not found" error when installing the psycopg2 package, make sure the PostgreSQL client libraries and headers are installed on your system:

- Debian/Ubuntu: `sudo apt-get install libpq-dev python3-dev`
- Red Hat/CentOS/Fedora: `sudo yum install postgresql-devel python3-devel`
- macOS: install PostgreSQL with Homebrew using `brew install postgresql`

Alternatively, `pip install psycopg2-binary` installs a prebuilt package that does not need to be compiled. However, psycopg2-binary is better suited to development and limited testing than to production environments. If issues persist after these steps, try reinstalling SQLAlchemy with `pip install --force-reinstall sqlalchemy`.

In [None]:
!pip install sqlalchemy psycopg2

In [1]:
import exchange_calendars as xcals
import pandas as pd
from IPython.display import Markdown, display
from openbb import obb
from sqlalchemy import create_engine, text
from sqlalchemy.exc import ProgrammingError

In [2]:
obb.user.preferences.output_type = "dataframe"

Database connection parameters

In [None]:
username = ""
password = ""
host = "127.0.0.1"
port = "5432"
database = "market_data"

In [None]:
DATABASE_URL = f"postgresql://{username}:{password}@{host}:{port}/postgres"
base_engine = create_engine(DATABASE_URL)
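If the password contains characters such as `@`, `:` or `/`, the f-string URL above becomes ambiguous. SQLAlchemy's documentation recommends percent-encoding credentials with `urllib.parse.quote_plus`. This minimal sketch builds the URL that way. The `build_postgres_url` helper is hypothetical, and the username and password are placeholders.

```python
from urllib.parse import quote_plus


def build_postgres_url(username, password, host, port, database):
    """Build a postgresql:// URL with percent-encoded credentials."""
    # Encode user and password so reserved characters cannot break the URL
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"postgresql://{user}:{secret}@{host}:{port}/{database}"


url = build_postgres_url("analyst", "p@ss:word/1", "127.0.0.1", "5432", "postgres")
print(url)  # postgresql://analyst:p%40ss%3Aword%2F1@127.0.0.1:5432/postgres
```

The resulting string can be passed to `create_engine` in place of `DATABASE_URL`. SQLAlchemy's `URL.create()` is another way to assemble the same connection details.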

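Function to create a new database (if it does not already exist) and return an engine for that database. The base engine connects to the default `postgres` database because the target database may not exist yet. `CREATE DATABASE` cannot run inside a transaction block, so the connection switches to `AUTOCOMMIT`. If the database already exists, the resulting `ProgrammingError` is ignored.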

In [None]:
def create_database_and_get_engine(db_name, base_engine):
    conn = base_engine.connect()
    conn = conn.execution_options(isolation_level="AUTOCOMMIT")

    try:
        conn.execute(text(f"CREATE DATABASE {db_name};"))
    except ProgrammingError:
        pass
    finally:
        conn.close()

    conn_str = base_engine.url.set(database=db_name)

    return create_engine(conn_str)
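PostgreSQL does not accept a bound parameter for the database name in `CREATE DATABASE`, which is why the function interpolates `db_name` into the SQL text. If the name could ever come from user input, a conservative check before interpolating is a reasonable safeguard. The sketch below defines a hypothetical `validate_db_name` helper. It accepts only a lowercase letter or underscore followed by lowercase letters, digits, or underscores. It also enforces PostgreSQL's default 63-byte identifier limit.

```python
import re

# Conservative pattern: a letter or underscore, then letters, digits, or underscores
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")
_MAX_LEN = 63  # PostgreSQL's default identifier length limit (NAMEDATALEN - 1)


def validate_db_name(db_name):
    """Return db_name unchanged if it is a safe unquoted identifier, else raise."""
    if len(db_name) > _MAX_LEN or not _IDENTIFIER.fullmatch(db_name):
        raise ValueError(f"Unsafe database name: {db_name!r}")
    return db_name


print(validate_db_name("market_data"))
```

One way to use it is to call `validate_db_name(db_name)` at the top of `create_database_and_get_engine`, before the `CREATE DATABASE` statement is built.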

Create the database named by `database` and get an engine for it

In [None]:
engine = create_database_and_get_engine(database, base_engine)

Function to fetch historical stock data for a given symbol and date range, and add a 'symbol' column

In [None]:
def get_stock_data(symbol, start_date=None, end_date=None):
    data = obb.equity.price.historical(
        symbol,
        start_date=start_date,
        end_date=end_date,
        provider="yfinance",
    )
    data.reset_index(inplace=True)
    data["symbol"] = symbol
    return data

Function to save the fetched stock data to a PostgreSQL database

In [None]:
def save_data_range(symbol, engine, start_date=None, end_date=None):
    data = get_stock_data(symbol, start_date, end_date)
    data.to_sql("stock_data", engine, if_exists="append", index=False)

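Function to save the stock data for the most recent session to a PostgreSQL database. It requests only today's date, so it assumes the script runs on a trading day after the market close. On weekends and holidays there is no new session to save.

In [None]:
def save_last_trading_session(symbol, engine):
    # Pass a plain date (no time component) as the start and end of the query
    today = pd.Timestamp.today().date()
    data = get_stock_data(symbol, today, today)
    data.to_sql("stock_data", engine, if_exists="append", index=False)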

Save data for multiple stock symbols in the specified date range

In [None]:
for symbol in ["SPY", "QQQ", "DIA"]:
    save_data_range(
        symbol, engine=engine, start_date="2020-06-01", end_date="2023-01-01"
    )

Read and display data for the stock symbol "SPY" from the database

In [None]:
df_1 = pd.read_sql_query("SELECT * from stock_data where symbol='SPY'", engine)
display(df_1)

Read and display data for "SPY" where the volume is greater than 100,000,000

In [None]:
df_2 = pd.read_sql_query(
    "SELECT * from stock_data where symbol='SPY' and volume > 100000000", engine
)
display(df_2)

The best way to build your database is to download the history first. You do this once to “backfill” the historical data. Then you schedule the script to run every trading day after the market close to capture that day’s data going forward.

There are many guides to scheduling a Python job on Mac and Windows, so I won’t cover it in detail here.

Here are two that will work:

- Schedule on Mac/Linux: https://theautomatic.net/2020/11/18/how-to-schedule-a-python-script-on-a-mac/
- Schedule on Windows: https://www.jcchouinard.com/python-automation-using-task-scheduler/
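After the backfill, each scheduled run only needs the rows newer than the ones already stored. The sketch below shows that bookkeeping with the standard library's `sqlite3` and a table shaped like `stock_data`. The `next_start_date` helper and the sample rows are hypothetical. The helper assumes dates are stored as ISO text, as in the SQLite output shown earlier. With PostgreSQL, the same `SELECT MAX(date)` query could run through the SQLAlchemy engine instead, though the placeholder syntax differs.

```python
import sqlite3
from datetime import date, timedelta


def next_start_date(conn, symbol):
    """Return the day after the latest stored date for symbol, or None if absent."""
    row = conn.execute(
        "SELECT MAX(date) FROM stock_data WHERE symbol = ?", (symbol,)
    ).fetchone()
    if row[0] is None:
        return None  # nothing stored yet: run the full backfill instead
    # ISO text (YYYY-MM-DD) sorts correctly; keep only the date part
    last = date.fromisoformat(row[0][:10])
    return last + timedelta(days=1)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_data (date TEXT, close REAL, symbol TEXT)")
conn.executemany(
    "INSERT INTO stock_data VALUES (?, ?, ?)",
    [("2024-01-02", 100.0, "SPY"), ("2024-01-03", 101.0, "SPY")],
)

print(next_start_date(conn, "SPY"))  # 2024-01-04
print(next_start_date(conn, "QQQ"))  # None
```

When the result is not `None`, the scheduled job would pass it as `start_date` to `save_data_range`.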


## Storing data in ultra-fast HDF5

In [None]:
!conda install -c conda-forge pytables -y

In [1]:
import warnings

In [2]:
import pandas as pd
from IPython.display import Markdown, display
from openbb import obb

In [3]:
warnings.filterwarnings("ignore")
obb.user.preferences.output_type = "dataframe"

In [4]:
STOCKS_DATA_STORE = "stocks.h5"
FUTURES_DATA_STORE = "futures.h5"

In [5]:
ticker = "SPY"
root = "ES"

Fetches historical price data for the equity "SPY" starting from 2021-01-01 using the "yfinance" provider and stores it in 'spy_equity'

In [6]:
spy_equity = obb.equity.price.historical(
    ticker, start_date="2021-01-01", provider="yfinance"
)

Fetches the options chains for the equity "SPY" using the "cboe" provider and stores them in 'spy_chains'

In [7]:
spy_chains = obb.derivatives.options.chains(ticker, provider="cboe")

Gets the unique expiration dates from 'spy_chains'

In [8]:
spy_expirations = spy_chains.expiration.astype(str).unique().tolist()

Fetches historical price data for one SPY option contract, the $400 call on the tenth-to-last expiration in 'spy_expirations', starting from 2021-01-01 using the "yfinance" provider, and stores it in 'spy_historic'

In [9]:
spy_historic = obb.equity.price.historical(
    ticker + spy_expirations[-10].replace("-", "")[2:] + "C" + "00400000",
    start_date="2021-01-01",
    provider="yfinance",
)
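The option ticker above follows the OCC-style symbology that Yahoo Finance uses for option contracts. It is made of four parts:

- the root symbol
- the expiration as YYMMDD
- `C` or `P`
- the strike multiplied by 1,000 and zero-padded to eight digits, so `00400000` is a $400 strike

The sketch below shows a hypothetical `occ_symbol` helper that builds the same string from named parts instead of string slicing. The expiration dates passed to it are only examples.

```python
from datetime import date


def occ_symbol(root, expiration, right, strike):
    """Build an OCC-style option symbol such as SPY241220C00400000."""
    if right not in ("C", "P"):
        raise ValueError("right must be 'C' or 'P'")
    expiry = date.fromisoformat(expiration).strftime("%y%m%d")  # YYMMDD
    # Strike in thousandths of a dollar, zero-padded to eight digits
    strike_code = f"{round(strike * 1000):08d}"
    return f"{root}{expiry}{right}{strike_code}"


print(occ_symbol("SPY", "2024-12-20", "C", 400))  # SPY241220C00400000
```

Because `spy_expirations` holds "YYYY-MM-DD" strings, an entry such as `spy_expirations[-10]` could be passed directly as `expiration`.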

Save stock data to HDF5 store

In [10]:
with pd.HDFStore(STOCKS_DATA_STORE) as store:
    store.put("equities/spy/stock_prices", spy_equity)
    store.put("equities/spy/options_prices", spy_historic)
    store.put("equities/spy/chains", spy_chains)

Save the December ES futures closing prices for 2024 through 2030 to the HDF5 store, one key per expiry

In [11]:
with pd.HDFStore(FUTURES_DATA_STORE) as store:
    for i in range(24, 31):
        expiry = f"20{i}-12"
        df = obb.derivatives.futures.historical(
            symbol=[root],
            expiry=expiry,
            start_date="2021-01-01",
        )
        df.rename(columns={"close": expiry}, inplace=True)
        prices = df[expiry]

        store.put(f"futures/{root}/{expiry}", prices)

Load stock data from HDF5 store

In [12]:
with pd.HDFStore(STOCKS_DATA_STORE) as store:
    spy_prices = store["equities/spy/stock_prices"]
    spy_options = store["equities/spy/options_prices"]
    spy_chains = store["equities/spy/chains"]

Load futures data from HDF5 store

In [13]:
with pd.HDFStore(FUTURES_DATA_STORE) as store:
    es_prices = store[f"futures/{root}/2024-12"]

In [14]:
display(spy_prices)
display(spy_options)
display(es_prices)

date,open,high,low,close,volume,split_ratio,dividend,capital_gains
2021-01-04,375.309998,375.450012,364.820007,368.790009,110210800,0.0,0.0,0.0
2021-01-05,368.100006,372.500000,368.049988,371.329987,66426200,0.0,0.0,0.0
2021-01-06,369.709991,376.980011,369.119995,373.549988,107997700,0.0,0.0,0.0
2021-01-07,376.100006,379.899994,375.910004,379.100006,68766800,0.0,0.0,0.0
2021-01-08,380.589996,381.489990,377.100006,381.260010,71677200,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...
2024-06-12,541.630005,544.119995,540.299988,541.359985,63251300,0.0,0.0,0.0
2024-06-13,543.150024,543.330017,539.590027,542.450012,44760900,0.0,0.0,0.0
2024-06-14,540.880005,542.809998,539.849976,542.780029,40089900,0.0,0.0,0.0
2024-06-17,542.080017,548.530029,541.609985,547.099976,55839500,0.0,0.0,0.0


date,open,high,low,close,volume,split_ratio,dividend
2022-09-13,64.449997,64.449997,64.449997,64.449997,23,0.0,0.0
2022-09-15,61.419998,61.419998,61.000000,61.000000,5,0.0,0.0
2022-09-16,60.830002,60.830002,58.000000,58.000000,2,0.0,0.0
2022-09-19,57.669998,60.000000,57.570000,60.000000,411,0.0,0.0
2022-09-20,60.000000,60.000000,56.000000,56.099998,124,0.0,0.0
...,...,...,...,...,...,...,...
2024-06-07,147.220001,147.220001,147.220001,147.220001,1,0.0,0.0
2024-06-12,154.710007,154.710007,154.009995,154.009995,131,0.0,0.0
2024-06-13,154.000000,154.000000,151.520004,153.130005,16,0.0,0.0
2024-06-17,153.850006,153.850006,153.850006,153.850006,2,0.0,0.0


date
2000-09-18    1467.5
2000-09-19    1478.5
2000-09-20    1469.5
2000-09-21    1469.5
2000-09-22    1468.5
               ...  
2024-06-13    5438.5
2024-06-14    5437.5
2024-06-17    5478.5
2024-06-18    5491.0
2024-06-19    5493.0
Name: 2024-12, Length: 6000, dtype: float64
