## Financial Data with Polars and DuckDB

In [1]:
import polars as pl
import yfinance as yf
import duckdb

### Retrieve and store financial data for further analysis

This code block downloads historical stock price data for NVIDIA (NVDA) from Yahoo Finance. It retrieves data from the beginning of 2011 to early December 2024, as set by the START and END constants. The data is then converted into a Polars DataFrame; Polars is a high-performance DataFrame library similar to pandas. A new column is added to the DataFrame to label the data with the stock symbol "NVDA", and the column names are reduced to plain field names such as "Close".

In [11]:
TICKER = "NVDA"
START = '2011-01-01'
END = '2024-12-06'

prices = yf.download(TICKER, start=START, end=END)

df = (
    pl
    .from_pandas(
        prices
        .reset_index()
    )
    .with_columns(
        [pl.lit(TICKER).alias("symbol")]
    )
)
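# yfinance may return (field, ticker) column tuples, which Polars stringifies; keep only the field name.
df.columns = [col.strip("()''").split(",")[0].replace("'", "") for col in df.columns]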

[*********************100%***********************]  1 of 1 completed


In [16]:
df

Date,Adj Close,Close,High,Low,Open,Volume,symbol
datetime[ns],f64,f64,f64,f64,f64,i64,str
2011-01-03 00:00:00,0.362707,0.3955,0.39925,0.3875,0.388,817448000,"""NVDA"""
2011-01-04 00:00:00,0.361561,0.39425,0.398,0.3855,0.39625,651384000,"""NVDA"""
2011-01-05 00:00:00,0.389303,0.4245,0.425,0.3975,0.4015,1428216000,"""NVDA"""
2011-01-06 00:00:00,0.443181,0.48325,0.4835,0.43425,0.4355,3493312000,"""NVDA"""
2011-01-07 00:00:00,0.455562,0.49675,0.49825,0.467,0.47775,2579984000,"""NVDA"""
…,…,…,…,…,…,…,…
2024-11-29 00:00:00,138.240479,138.25,139.350006,136.050003,136.779999,141863200,"""NVDA"""
2024-12-02 00:00:00,138.620453,138.630005,140.449997,137.820007,138.830002,171682800,"""NVDA"""
2024-12-03 00:00:00,140.250336,140.259995,140.539993,137.949997,138.259995,164414000,"""NVDA"""
2024-12-04 00:00:00,145.130005,145.139999,145.789993,140.289993,142.0,231224300,"""NVDA"""


The code uses yfinance to download NVIDIA prices for the specified date range. The download arrives as a pandas DataFrame indexed by date; reset_index() turns the date into a regular column before pl.from_pandas converts the result into a Polars DataFrame. A literal "symbol" column with the value "NVDA" is then added so the rows stay identifiable if other tickers are stored alongside them later. The final line reduces each column name to its field name ("Close", "Volume", and so on).
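The column cleanup relies on chained strip, split, and replace calls, which can be hard to read. A minimal sketch of an alternative, assuming the names arrive as stringified tuples such as "('Close', 'NVDA')" (which is what the original expression implies); flatten_column is a hypothetical helper, not part of Polars or yfinance:

```python
import ast


def flatten_column(name: str) -> str:
    """Return the first element of a stringified tuple name, else the name unchanged."""
    try:
        parsed = ast.literal_eval(name)
    except (ValueError, SyntaxError):
        return name  # plain names such as "symbol" are not Python literals
    return str(parsed[0]) if isinstance(parsed, tuple) and parsed else name


# Assumed examples of what the column names look like before cleanup.
raw_columns = ["('Date', '')", "('Adj Close', 'NVDA')", "('Close', 'NVDA')", "symbol"]
flat_columns = [flatten_column(c) for c in raw_columns]

# The notebook's original expression, applied to the same names for comparison.
original_style = [c.strip("()''").split(",")[0].replace("'", "") for c in raw_columns]
```

Both approaches give the same names for these inputs; the helper simply leaves anything that is not a tuple literal untouched.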

In [24]:
import plotly.express as px

fig = px.line(df,
              x="Date",  # date column on the x-axis
              y=["Close", "High"],
              title="NVDA Close and High Prices",
              width=1500,
              height=700,
              labels={"value": "Price (USD)"},  # rename the y-axis
              color_discrete_sequence=['#0000FF', '#ff8c00']  # set line colors
             )

# Update layout with rangeslider and range selector
fig.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(count=3, label="3y", step="year", stepmode="backward"),
            dict(count=5, label="5y", step="year", stepmode="backward"),
            dict(step="all")
        ])
    )
)

# Additional layout customization
fig.update_layout(
    showlegend=True,
    legend_title_text=''
)

fig.show()

### Create a database and store the stock data for querying

This code initializes a connection to a DuckDB database file called 'stocks.ddb' and creates a table named 'stocks' from our Polars DataFrame. Because the statement uses IF NOT EXISTS, an existing 'stocks' table is left untouched: re-running the cell after changing the date range does not reload the data. Once the table exists, we can run SQL queries on the stock data, which is particularly helpful for complex data manipulations and analyses.

In [25]:
con = duckdb.connect('stocks.ddb')

con.execute("""
    CREATE TABLE IF NOT EXISTS stocks AS SELECT * FROM df
""")

<duckdb.duckdb.DuckDBPyConnection at 0x79ddb54930b0>

The connection opens (or creates) the on-disk database file, so the table persists between sessions. Inside the SQL statement, df refers to the Polars DataFrame of that name in the Python session; DuckDB can read such a DataFrame directly. If no table named 'stocks' exists, it is created from that data; otherwise the statement does nothing. This integration of Polars and DuckDB lets the user combine SQL queries with DataFrame manipulations in the same analysis.

### Perform SQL queries to extract insights from the data

Here, we execute a SQL query on the stored stock data to find high-volume trading days. The query returns the 5 days with the highest trading volume for NVDA, together with the closing price on each of those days. This approach highlights how SQL can be used to quickly summarize and explore financial data.

In [28]:
query1 = """
    SELECT symbol,
           Date,
           Volume,
           Close
    FROM stocks
    ORDER BY Volume DESC
    LIMIT 5
"""

pl.DataFrame(
    con.execute(query1).fetchdf()
)

symbol,Date,Volume,Close
str,datetime[ns],i64,f64
"""NVDA""",2023-05-25 00:00:00,1543911000,37.98
"""NVDA""",2023-08-24 00:00:00,1156044000,47.162998
"""NVDA""",2024-03-08 00:00:00,1142269000,87.528
"""NVDA""",2023-02-23 00:00:00,1117995000,23.664
"""NVDA""",2023-05-31 00:00:00,1002580000,37.834


The SQL query sorts the 'stocks' table by Volume in descending order and keeps the first 5 rows, returning the symbol, date, volume, and closing price for each. The result is fetched as a pandas DataFrame with fetchdf() and wrapped in a Polars DataFrame, allowing easy manipulation and visualization of the output. All five rows shown fall in 2023 or 2024, which suggests the table was populated by an earlier run with a later start date; CREATE TABLE IF NOT EXISTS does not reload the table when the DataFrame changes.

### Calculate rolling VWAP to track stock price trends

We use SQL to calculate the 20-day rolling Volume Weighted Average Price (VWAP) for NVIDIA stock. The VWAP is an important metric for traders as it provides insights into the average price a stock has traded at, factoring in volume. The rolling VWAP smooths out daily fluctuations to reveal longer-term trends.

In [27]:
vwap_query = """
WITH daily_vwap AS (
    SELECT
        "Date",
        symbol,
        SUM(Volume * Close) / SUM(Volume) as vwap
    FROM stocks
    GROUP BY "Date", symbol
),
rolling_vwap AS (
    SELECT
        "Date",
        symbol,
        AVG(vwap) OVER (
            PARTITION BY symbol
            ORDER BY "Date"
            ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
        ) as rolling_20d_vwap
    FROM daily_vwap
)
SELECT * FROM rolling_vwap
ORDER BY symbol, "Date";
"""
pl.DataFrame(con.execute(vwap_query).fetchdf())

Date,symbol,rolling_20d_vwap
datetime[ns],str,f64
2023-01-03 00:00:00,"""NVDA""",14.315
2023-01-04 00:00:00,"""NVDA""",14.532
2023-01-05 00:00:00,"""NVDA""",14.443
2023-01-06 00:00:00,"""NVDA""",14.547
2023-01-09 00:00:00,"""NVDA""",14.7632
…,…,…
2024-11-29 00:00:00,"""NVDA""",142.511999
2024-12-02 00:00:00,"""NVDA""",142.673499
2024-12-03 00:00:00,"""NVDA""",142.883999
2024-12-04 00:00:00,"""NVDA""",143.145499


The code defines a SQL query that first calculates a daily VWAP by dividing the total value traded (Volume * Close) by the total volume for each day. Because the table holds a single row per day and symbol, this ratio is simply that day's closing price, so the second step amounts to an unweighted 20-day moving average of Close rather than a volume-weighted one. The window function (ROWS BETWEEN 19 PRECEDING AND CURRENT ROW) computes this average over the current row and the 19 rows before it; the first 19 rows of each symbol are averaged over fewer days. The output is then converted into a Polars DataFrame for further analysis or visualization. The smoothed series helps traders and analysts see how the stock's price trends over time.

In [8]:
con.close()

Closing the DuckDB connection releases the resources it holds, including the lock on the 'stocks.ddb' file, so that other processes can open the database. It is good practice to close the connection once database operations are complete.