<div style="background-color:#000;"><img src="pqn.png"></img></div>

## Imports and setup

We use yfinance to pull stock, options, and financial data from Yahoo Finance, and pandas to organize the results into tables.

In [None]:
import pandas as pd
import yfinance as yf

We're defining a small list of stock tickers that we'll analyze in the next steps.

In [None]:
stock_symbols = ["AAPL", "MSFT", "GOOGL"]

Here, we're pulling one year of price history, the option chain for the nearest expiration, and the annual financials for each company in our list.

In [None]:
market_data = {
    symbol: yf.Ticker(symbol).history(period="1y") for symbol in stock_symbols
}
# option_chain() with no date returns only the nearest expiration
options_data = {symbol: yf.Ticker(symbol).option_chain() for symbol in stock_symbols}
financials_data = {symbol: yf.Ticker(symbol).financials for symbol in stock_symbols}

This block sets up everything we need to work with data from Yahoo Finance. We start with our list of ticker symbols. For each one, we create a `yf.Ticker` object and grab a year of daily trading data, the option chain, and the annual financials. Note that `option_chain()` called without an expiration date returns contracts for a single expiration (the nearest one), not every listed expiration. We organize all of this into dictionaries keyed by ticker symbol for easy reference later.
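Because `option_chain()` with no argument covers a single expiration, here is a minimal sketch of how several expirations could be stacked. It assumes the ticker object behaves like `yf.Ticker`: `.options` lists expiration dates as strings and `.option_chain(date)` returns an object with `.calls` and `.puts`. The helper name `collect_option_chains`, the `max_expirations` parameter, and the `Expiration` column are our own illustrative choices, not part of yfinance.

```python
import pandas as pd


def collect_option_chains(ticker, max_expirations=3):
    """Stack calls and puts for the first few expirations of one ticker.

    `ticker` is assumed to behave like yfinance.Ticker: `.options` lists
    expiration dates as strings, and `.option_chain(date)` returns an
    object with `.calls` and `.puts` DataFrames.
    """
    frames = []
    for expiration in ticker.options[:max_expirations]:
        chain = ticker.option_chain(expiration)
        for opt_type, opt_df in (("call", chain.calls), ("put", chain.puts)):
            opt_df = opt_df.copy()  # avoid mutating the frame we were handed
            opt_df["Type"] = opt_type
            opt_df["Expiration"] = expiration  # hypothetical column name
            frames.append(opt_df)
    # Return an empty frame when the ticker lists no expirations
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With a live connection, something like `collect_option_chains(yf.Ticker("AAPL"))` should return one table covering the first three expirations. The rest of this notebook keeps the single-expiration data.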

## Organize and standardize financial data

We're reformatting and cleaning the stock market price data so it's consistent and easy to work with later.

In [None]:
standardized_market_data = []
for symbol, df in market_data.items():
    if not df.empty:
        df = df[["Open", "High", "Low", "Close", "Volume"]]
        df = df.dropna()
        df = df.astype(float)
        df["Symbol"] = symbol
        standardized_market_data.append(df)
market_df = pd.concat(standardized_market_data).reset_index()

This section loops through each stock's daily trading data. We keep the open, high, low, close, and volume columns, drop any rows with missing values, and cast every value to float. We add a `Symbol` column so we always know which row belongs to which company. At the end, we stack all the companies' data into a single table and reset the index so the date becomes a regular column.
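To see the shape of the result without calling Yahoo Finance, here is a small sketch that runs the same steps on made-up data. The tickers `AAA` and `BBB`, the prices, and the name `demo_df` are hypothetical; we also assume the raw frames carry a `DatetimeIndex` named `Date`, as the price history above does.

```python
import pandas as pd

# Hypothetical price history for two made-up tickers, indexed by date.
dates = pd.date_range("2024-01-02", periods=3, freq="D", name="Date")
raw = {
    "AAA": pd.DataFrame(
        {"Open": [10, 11, 12], "High": [11, 12, 13], "Low": [9, 10, 11],
         "Close": [10.5, None, 12.5], "Volume": [100, 200, 300],
         "Dividends": [0.0, 0.0, 0.0]},
        index=dates,
    ),
    "BBB": pd.DataFrame(
        {"Open": [20, 21, 22], "High": [21, 22, 23], "Low": [19, 20, 21],
         "Close": [20.5, 21.5, 22.5], "Volume": [400, 500, 600],
         "Dividends": [0.0, 0.0, 0.0]},
        index=dates,
    ),
}

frames = []
for symbol, df in raw.items():
    df = df[["Open", "High", "Low", "Close", "Volume"]]  # drops "Dividends"
    df = df.dropna()       # removes AAA's row with the missing close
    df = df.astype(float)  # integer volume becomes float
    df["Symbol"] = symbol
    frames.append(df)

# reset_index() turns the "Date" index into an ordinary column
demo_df = pd.concat(frames).reset_index()
print(demo_df)
```

The combined table has one row per ticker per day, minus the incomplete row, with `Date` as the first column and `Symbol` as the last.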

We reshape the options data so each company’s calls and puts are formatted the same way and combined in one spot.

In [None]:
standardized_options_data = []
for symbol, option_chain in options_data.items():
    calls = option_chain.calls
    puts = option_chain.puts
    for opt_type, opt_df in [("call", calls), ("put", puts)]:
        if not opt_df.empty:
            opt_df = opt_df[
                [
                    "contractSymbol",
                    "strike",
                    "lastPrice",
                    "bid",
                    "ask",
                    "volume",
                    "openInterest",
                ]
            ]
            opt_df = opt_df.dropna()
            opt_df = opt_df.astype(
                {
                    "strike": float,
                    "lastPrice": float,
                    "bid": float,
                    "ask": float,
                    "volume": float,
                    "openInterest": float,
                }
            )
            opt_df["Type"] = opt_type
            opt_df["Symbol"] = symbol
            standardized_options_data.append(opt_df)
options_df = pd.concat(standardized_options_data).reset_index(drop=True)

For each company, we work through its option contracts, both calls and puts. We keep the contract symbol, strike price, last traded price, bid, ask, volume, and open interest (the number of contracts still outstanding). We drop rows with missing values, cast the numeric columns to float, and tag each row with the option type and ticker symbol. We then stack all companies' options into a single table with a fresh index, ready for analysis.
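One thing to keep in mind: calling `dropna()` with no arguments discards a contract if any single field is missing, which can remove thinly traded contracts that simply have no reported volume. The sketch below, on made-up quotes, compares that with a gentler variant that only checks the quote columns. The contract symbols, values, and the names `strict` and `lenient` are hypothetical, and which columns to require is a judgment call rather than something the notebook prescribes.

```python
import pandas as pd

# Hypothetical option quotes; the contract symbols and values are made up.
quotes = pd.DataFrame(
    {
        "contractSymbol": ["XYZ_C100", "XYZ_C105", "XYZ_C110"],
        "strike": [100.0, 105.0, 110.0],
        "bid": [5.1, 2.4, None],
        "ask": [5.3, 2.6, 0.9],
        "volume": [12.0, None, 3.0],
    }
)

# Dropping on every column removes any contract with a single missing field.
strict = quotes.dropna()

# Restricting the check to the quote columns keeps contracts that merely
# have no reported volume.
lenient = quotes.dropna(subset=["bid", "ask"])

print(len(strict), len(lenient))
```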

We’re setting up the financial statement data so every company is displayed in the same way, making it easy to compare performance.

In [None]:
standardized_financials_data = []
for symbol, df in financials_data.items():
    if not df.empty:
        df = df.transpose()
        df["Symbol"] = symbol
        df = df.dropna(axis=1, how="all")
        standardized_financials_data.append(df)
financials_df = pd.concat(standardized_financials_data).reset_index()

Here, we loop through the companies and reshape their financials. We transpose each table so every reporting period becomes a row and every line item a column, which is easier to slice later. We label every row with its ticker symbol and drop any line item that is entirely missing for that company. The final product is one stacked table of the most recent annual financials for each stock on our list, with the reporting date as a regular column.
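The reshaping is easier to picture on a toy example. The sketch below uses made-up statements for two hypothetical tickers and assumes the raw layout has line items on the rows and report dates on the columns. It also shows a side effect of stacking: a line item reported by only one company shows up as a blank for the others.

```python
import pandas as pd

# Hypothetical statements: line items on the rows, report dates on the columns.
periods = pd.to_datetime(["2023-12-31", "2022-12-31"])
statements = {
    "AAA": pd.DataFrame(
        {periods[0]: [100.0, 20.0], periods[1]: [90.0, 15.0]},
        index=["Total Revenue", "Net Income"],
    ),
    "BBB": pd.DataFrame(
        {periods[0]: [50.0, None], periods[1]: [45.0, None]},
        index=["Total Revenue", "Interest Expense"],
    ),
}

frames = []
for symbol, df in statements.items():
    df = df.transpose()                # one row per report date
    df["Symbol"] = symbol
    df = df.dropna(axis=1, how="all")  # drops BBB's empty "Interest Expense"
    frames.append(df)

# AAA reports "Net Income" and BBB does not, so BBB's rows get NaN there
demo_financials = pd.concat(frames).reset_index()
print(demo_financials)
```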

## Clean and finalize data frames

We're making a final check for missing values in the market and options tables and removing any incomplete rows for reliable analysis.

In [None]:
# dropna() leaves a table unchanged when nothing is missing, so no guard is needed
market_df = market_df.dropna()
options_df = options_df.dropna()

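Before our data is ready for use, we take one last sweep through the market and options tables and delete any rows with blanks. Both tables were already cleaned company by company, so this pass is a safeguard that keeps future analysis from being thrown off by empty entries. The financials table is not filtered here. Now, everything is tidy and set for use in models, charts, or whatever next step we want to take.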
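If we want to know how much data the final sweep removes, a small helper can report it. This is a sketch on a made-up table; the function name `drop_incomplete_rows` and the sample values are hypothetical.

```python
import pandas as pd


def drop_incomplete_rows(df, name):
    """Drop rows with any missing value and report how many were removed."""
    cleaned = df.dropna()
    removed = len(df) - len(cleaned)
    print(f"{name}: removed {removed} of {len(df)} rows")
    return cleaned.reset_index(drop=True)


# Hypothetical table with one incomplete row.
sample = pd.DataFrame({"Close": [1.0, None, 3.0], "Volume": [10.0, 20.0, 30.0]})
cleaned_sample = drop_incomplete_rows(sample, "sample")
```

The same call could be applied to `market_df` and `options_df`. A large removed count would be a hint to revisit the earlier cleaning steps.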

<a href="https://pyquantnews.com/">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href="https://gettingstartedwithpythonforquantfinance.com/">get started with Python for quant finance</a>. For educational purposes. Not investment advice. Use at your own risk.