<a href="https://colab.research.google.com/github/prof-rossetti/intro-to-python/blob/main/exercises/csv-processing/Stock_Data_Processing_with_Pandas_(Summer_2023).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we will focus on using the `pandas` package for practical processing of financial data.

In Part 1, we will demonstrate our ability to fetch CSV-formatted stock price data from the [AlphaVantage API](https://www.alphavantage.co/). We have already practiced fetching JSON-formatted data from this API; however, some of its endpoints also offer the data in CSV format, which will most likely be easier to process.

In Part 2, we will demonstrate our ability to fetch stock price data using the `YahooQuery` package, which makes it easy to get stock prices for multiple companies at once. With data for multiple companies in hand, we will compare their growth rates and determine which company has the highest growth.

## 1) Data from AlphaVantage API



We can use the `pandas.read_csv` function to process dynamic CSV data provided by an API. For this example, let's first obtain and supply an AlphaVantage API Key, or use one of the prof's premium keys:

In [None]:
from getpass import getpass

API_KEY = getpass("Please input your API key:") or "demo"

Please input your API key:··········


In [None]:
#from pandas import read_csv
#
#read_csv("daily_adjusted_GOOGL.csv")

In [None]:
from pandas import read_csv

# see: https://www.alphavantage.co/documentation/#dailyadj
# ... the datatype=csv URL param gives us data in CSV format, which may be easier to work with

# request_url = "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=IBM&apikey=demo&datatype=json"

symbol = "MSFT"
request_url = f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol={symbol}&apikey={API_KEY}&datatype=csv"

prices_df = read_csv(request_url)
prices_df.head()

,timestamp,open,high,low,close,adjusted_close,volume,dividend_amount,split_coefficient
0,2023-06-21,336.37,337.73,332.07,333.56,333.56,24865845,0.0,1.0
1,2023-06-20,339.31,342.08,335.86,338.05,338.05,26375407,0.0,1.0
2,2023-06-16,351.32,351.47,341.95,342.33,342.33,46551985,0.0,1.0
3,2023-06-15,337.48,349.84,337.2,348.1,348.1,38899075,0.0,1.0
4,2023-06-14,334.34,339.04,332.81,337.34,337.34,26003791,0.0,1.0
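
To experiment with the parsing step without an API key or network access, the sketch below passes `read_csv` a small in-memory CSV string (via `io.StringIO`) in place of the request URL. The rows are copied from the output above. The `parse_dates` argument is an optional addition, not used in the cell above, that converts the `timestamp` column to datetimes.

```python
from io import StringIO

from pandas import read_csv

# a few rows copied from the response shown above (stand-in for the live API)
csv_text = """timestamp,open,high,low,close,adjusted_close,volume,dividend_amount,split_coefficient
2023-06-21,336.37,337.73,332.07,333.56,333.56,24865845,0.0,1.0
2023-06-20,339.31,342.08,335.86,338.05,338.05,26375407,0.0,1.0
2023-06-16,351.32,351.47,341.95,342.33,342.33,46551985,0.0,1.0
"""

# read_csv accepts a URL, a file path, or any file-like object
sample_df = read_csv(StringIO(csv_text), parse_dates=["timestamp"])
print(sample_df.dtypes)
```

Because the parsing call is the same either way, code written against `sample_df` should carry over to the dataframe fetched from the API.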


In [None]:
print(type(prices_df))

<class 'pandas.core.frame.DataFrame'>


### Getting the Latest Closing Price

In [None]:
# column centric approach
prices_df.columns

Index(['timestamp', 'open', 'high', 'low', 'close', 'adjusted_close', 'volume',
       'dividend_amount', 'split_coefficient'],
      dtype='object')

In [None]:
print(type(prices_df["adjusted_close"]))
prices_df["adjusted_close"]

<class 'pandas.core.series.Series'>


0     333.560000
1     338.050000
2     342.330000
3     348.100000
4     337.340000
         ...    
95    263.363259
96    251.568646
97    246.651736
98    241.575573
99    247.000100
Name: adjusted_close, Length: 100, dtype: float64

In [None]:
prices_df["adjusted_close"][0]

333.56

In [None]:
# row centric approach

prices_df.iloc[0]

timestamp            2023-06-21
open                     336.37
high                     337.73
low                      332.07
close                    333.56
adjusted_close           333.56
volume                 24865845
dividend_amount             0.0
split_coefficient           1.0
Name: 0, dtype: object

In [None]:
prices_df.iloc[0]["adjusted_close"]

333.56
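
Both approaches above assume the most recent day sits in the first row, which matches the descending order of the response shown earlier. As a more defensive sketch, using a hypothetical three-row `sample_df` with values copied from that output, we can sort by `timestamp` explicitly before taking the first row:

```python
from pandas import DataFrame

# hypothetical sample, deliberately shuffled to show that the sort matters
sample_df = DataFrame({
    "timestamp": ["2023-06-20", "2023-06-21", "2023-06-16"],
    "adjusted_close": [338.05, 333.56, 342.33],
})

# ISO-formatted date strings (YYYY-MM-DD) sort chronologically
latest = sample_df.sort_values(by="timestamp", ascending=False).iloc[0]
print(latest["timestamp"], latest["adjusted_close"])
```

This costs one extra call but does not depend on the order in which the API happens to return rows.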

In [None]:
# example mapping operation
prices_df["adjusted_close"] * 100

0     33356.000000
1     33805.000000
2     34233.000000
3     34810.000000
4     33734.000000
          ...     
95    26336.325918
96    25156.864610
97    24665.173567
98    24157.557308
99    24700.009977
Name: adjusted_close, Length: 100, dtype: float64

In [None]:
# quick check for stock splits, because they might warrant adjustments to the other columns as well
prices_df["split_coefficient"].value_counts() #> all values are 1.0, so in this case we have no splits

1.0    100
Name: split_coefficient, dtype: int64

Now that we have our prices as a pandas series (i.e. a single column), aggregations are easy. Let's calculate the average price, the 100-day high and low, and the volatility (standard deviation) of prices.

In [None]:
print("AVERAGE CLOSING PRICE:")
prices_df["adjusted_close"].mean()

AVERAGE CLOSING PRICE:


288.36124891224006

In [None]:
# FYI - if there were stock splits, we would have to create an adjusted version of the high
print("100 DAY HIGH PRICE:")
print(prices_df["high"].max())

100 DAY HIGH PRICE:
351.47


In [None]:
# FYI - if there were stock splits, we would have to create an adjusted version of the low
print("100 DAY LOW PRICE:")
print(prices_df["low"].min())

100 DAY LOW PRICE:
242.2
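
The comments above note that stock splits would call for adjusted versions of the high and low. One common approximation, offered here as an assumption rather than something the AlphaVantage docs prescribe, is to scale each row by the ratio of `adjusted_close` to `close`. The sketch uses a hypothetical two-row sample in which the second row predates a made-up 2-for-1 split:

```python
from pandas import DataFrame

# hypothetical prices: the second row predates a made-up 2-for-1 split
sample_df = DataFrame({
    "high": [110.0, 210.0],
    "low": [95.0, 190.0],
    "close": [100.0, 200.0],
    "adjusted_close": [100.0, 100.0],
})

# per-row adjustment factor (1.0 where no adjustment applies)
factor = sample_df["adjusted_close"] / sample_df["close"]
sample_df["adjusted_high"] = sample_df["high"] * factor
sample_df["adjusted_low"] = sample_df["low"] * factor
print(sample_df[["adjusted_high", "adjusted_low"]])
```

Note that `adjusted_close` typically reflects dividends as well as splits, so this factor adjusts for both.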


In [None]:
print("VOLATILITY OF CLOSING PRICES (STANDARD DEVIATION):")
prices_df["adjusted_close"].std()

VOLATILITY OF CLOSING PRICES (STANDARD DEVIATION):


29.394966643762185
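
The aggregations above can also be requested together with `Series.agg`. A minimal sketch on a five-value series (the first five adjusted closing prices from the output above), applying all four functions to the same column for simplicity:

```python
from pandas import Series

# first five adjusted closing prices from the output above
closes = Series([333.56, 338.05, 342.33, 348.10, 337.34], name="adjusted_close")

# one call returns a Series indexed by the aggregation names
summary = closes.agg(["mean", "max", "min", "std"])
print(summary)
```

For a broader overview, `closes.describe()` returns similar statistics along with the count and quartiles.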

### Plotting Stock Prices

In [None]:
import plotly.express as px

#px.line(x=[1,2,3], y=[100, 200, 300], title=f"Adjusted Closing Prices for {symbol}")

#px.line(x=prices_df["timestamp"], y=prices_df["adjusted_close"], title=f"Adjusted Closing Prices for {symbol}")

In [None]:
import plotly.express as px

px.line(prices_df, x="timestamp", y="adjusted_close", title=f"Adjusted Closing Prices for {symbol}")

## 2) Data from `YahooQuery` Package

In [None]:
%%capture
# setup cell (run and leave as is)
# note: the %%capture cell magic must be the first line of the cell

!pip install yahooquery

See the [YahooQuery `Ticker` guide](https://yahooquery.dpguthrie.com/guide/ticker/intro/) for more information.

In [None]:
# setup cell (run and leave as is)

from yahooquery import Ticker

symbol = "MSFT"
company = Ticker(symbol)
type(company)

yahooquery.ticker.Ticker

### Exploring Available Datasets

In [None]:
# dir(company)

In [None]:
# company.country #> str
# company.asset_profile #> dict
# company.calendar_events #> dict
# company.earnings #> dict
# company.earnings_trend #> dict
# company.esg_scores #> dict
# company.financial_data #> dict
# company.key_stats #> dict
# company.price #> dict
# company.recommendations #> dict of similar stocks and reco scores
# company.summary_detail #> dict
# company.summary_profile #> dict with address and long summary and number of employees
# company.technical_insights #> dict

# company.dividend_history(start="2020-01-01") #> series or df

# company.all_financial_data() #> dataframe
# company.balance_sheet() #> dataframe
# company.cash_flow() #> dataframe
# company.company_officers #> dataframe
# company.corporate_events #> dataframe
# company.corporate_guidance #> dataframe
# company.earning_history #> dataframe
# company.history() #> dataframe of daily prices
# company.income_statement() #> dataframe
# company.news() #> ["error"]
# company.option_chain #> dataframe
# company.recommendation_trend #> df row per period
# company.sec_filings #> dataframe

Dividend history:

In [None]:
company.dividend_history(start="2021-01-01").head()

symbol,date,dividends
MSFT,2021-02-17,0.56
MSFT,2021-05-19,0.56
MSFT,2021-08-18,0.56
MSFT,2021-11-17,0.62
MSFT,2022-02-16,0.62


In [None]:
company.all_financial_data()

symbol,asOfDate,periodType,currencyCode,AccountsPayable,AccountsReceivable,AccumulatedDepreciation,AllowanceForDoubtfulAccountsReceivable,BasicAverageShares,BasicEPS,BeginningCashPosition,...,TotalNonCurrentLiabilitiesNetMinorityInterest,TotalOperatingIncomeAsReported,TotalRevenue,TotalTaxPayable,TotalUnusualItems,TotalUnusualItemsExcludingGoodwill,TradeandOtherPayablesNonCurrent,WorkInProcess,WorkingCapital,WriteOff
MSFT,2019-06-30,12M,USD,9382000000.0,29524000000.0,-35330000000.0,-411000000.0,7673000000.0,5.11,11946000000.0,...,114806000000.0,42959000000.0,125843000000.0,5665000000.0,710000000.0,710000000.0,29612000000.0,53000000.0,106132000000.0,
MSFT,2020-06-30,12M,USD,12530000000.0,32011000000.0,-43197000000.0,-788000000.0,7610000000.0,5.82,11356000000.0,...,110697000000.0,52959000000.0,143015000000.0,2130000000.0,28000000.0,28000000.0,29432000000.0,83000000.0,109605000000.0,
MSFT,2021-06-30,12M,USD,15163000000.0,38043000000.0,-51351000000.0,-751000000.0,7547000000.0,8.12,13576000000.0,...,103134000000.0,69916000000.0,168088000000.0,2174000000.0,1303000000.0,1303000000.0,27190000000.0,79000000.0,95749000000.0,
MSFT,2022-06-30,12M,USD,19000000000.0,44261000000.0,-59660000000.0,-633000000.0,7496000000.0,9.7,14224000000.0,...,103216000000.0,83383000000.0,198270000000.0,4067000000.0,334000000.0,334000000.0,26069000000.0,82000000.0,74602000000.0,101000000.0


In [None]:
company.balance_sheet() #> dataframe

symbol,asOfDate,periodType,currencyCode,AccountsPayable,AccountsReceivable,AccumulatedDepreciation,AllowanceForDoubtfulAccountsReceivable,BuildingsAndImprovements,CapitalLeaseObligations,CapitalStock,...,TotalCapitalization,TotalDebt,TotalEquityGrossMinorityInterest,TotalLiabilitiesNetMinorityInterest,TotalNonCurrentAssets,TotalNonCurrentLiabilitiesNetMinorityInterest,TotalTaxPayable,TradeandOtherPayablesNonCurrent,WorkInProcess,WorkingCapital
MSFT,2019-06-30,12M,USD,9382000000.0,29524000000.0,-35330000000.0,-411000000.0,26288000000.0,6188000000.0,78520000000.0,...,168992000000.0,78366000000.0,102330000000.0,184226000000.0,111004000000.0,114806000000.0,5665000000.0,29612000000.0,53000000.0,106132000000.0
MSFT,2020-06-30,12M,USD,12530000000.0,32011000000.0,-43197000000.0,-788000000.0,33995000000.0,7671000000.0,80552000000.0,...,177882000000.0,70998000000.0,118304000000.0,183007000000.0,119396000000.0,110697000000.0,2130000000.0,29432000000.0,83000000.0,109605000000.0
MSFT,2021-06-30,12M,USD,15163000000.0,38043000000.0,-51351000000.0,-751000000.0,43928000000.0,9629000000.0,83111000000.0,...,192062000000.0,67775000000.0,141988000000.0,191791000000.0,149373000000.0,103134000000.0,2174000000.0,27190000000.0,79000000.0,95749000000.0
MSFT,2022-06-30,12M,USD,19000000000.0,44261000000.0,-59660000000.0,-633000000.0,55014000000.0,11489000000.0,86939000000.0,...,213574000000.0,61270000000.0,166542000000.0,198298000000.0,195156000000.0,103216000000.0,4067000000.0,26069000000.0,82000000.0,74602000000.0


In [None]:
company.cash_flow()

symbol,asOfDate,periodType,currencyCode,BeginningCashPosition,CapitalExpenditure,CashDividendsPaid,CashFlowFromContinuingFinancingActivities,CashFlowFromContinuingInvestingActivities,CashFlowFromContinuingOperatingActivities,ChangeInAccountPayable,...,NetShortTermDebtIssuance,OperatingCashFlow,OperatingGainsLosses,PurchaseOfBusiness,PurchaseOfInvestment,PurchaseOfPPE,RepaymentOfDebt,RepurchaseOfCapitalStock,SaleOfInvestment,StockBasedCompensation
MSFT,2019-06-30,12M,USD,11946000000.0,-13925000000.0,-13811000000.0,-36887000000.0,-15773000000.0,52185000000.0,232000000.0,...,0.0,52185000000.0,-792000000.0,-2388000000.0,-57697000000.0,-13925000000.0,-4000000000.0,-19543000000.0,58237000000.0,4652000000.0
MSFT,2020-06-30,12M,USD,11356000000.0,-15441000000.0,-15137000000.0,-46031000000.0,-12223000000.0,60675000000.0,3018000000.0,...,0.0,60675000000.0,-219000000.0,-2521000000.0,-77190000000.0,-15441000000.0,-5518000000.0,-22968000000.0,84170000000.0,5289000000.0
MSFT,2021-06-30,12M,USD,13576000000.0,-20622000000.0,-16521000000.0,-48486000000.0,-27577000000.0,76740000000.0,2798000000.0,...,,76740000000.0,-1249000000.0,-8909000000.0,-62924000000.0,-20622000000.0,-3750000000.0,-27385000000.0,65800000000.0,6118000000.0
MSFT,2022-06-30,12M,USD,14224000000.0,-23886000000.0,-18135000000.0,-58876000000.0,-30311000000.0,89035000000.0,2943000000.0,...,,89035000000.0,-409000000.0,-22038000000.0,-26456000000.0,-23886000000.0,-9023000000.0,-32696000000.0,44894000000.0,7502000000.0
MSFT,2022-12-31,TTM,USD,20604000000.0,-24768000000.0,-18964000000.0,-52846000000.0,-36182000000.0,84386000000.0,-446000000.0,...,,84386000000.0,454000000.0,-21010000000.0,-30254000000.0,-24768000000.0,-5947000000.0,-28611000000.0,43330000000.0,8633000000.0
MSFT,2023-03-31,TTM,USD,12498000000.0,-26035000000.0,-19378000000.0,-45791000000.0,-23275000000.0,83441000000.0,-1373000000.0,...,,83441000000.0,309000000.0,-2592000000.0,-30594000000.0,-26035000000.0,-1750000000.0,-25298000000.0,39931000000.0,9192000000.0


In [None]:
company.company_officers

symbol,row,maxAge,name,age,title,yearBorn,fiscalYear,totalPay,exercisedValue,unexercisedValue
MSFT,0,1,Mr. Satya Nadella,55.0,Chairman & CEO,1967.0,2022.0,12676750.0,0,0
MSFT,1,1,Mr. Bradford L. Smith LCA,63.0,Pres & Vice Chairman,1959.0,2022.0,4655274.0,0,0
MSFT,2,1,Ms. Amy E. Hood,50.0,Exec. VP & CFO,1972.0,2022.0,4637915.0,0,0
MSFT,3,1,Mr. Judson Althoff,49.0,Exec. VP & Chief Commercial Officer,1973.0,2022.0,4428268.0,0,0
MSFT,4,1,Mr. Christopher David Young,50.0,"Exec. VP of Bus. Devel., Strategy & Ventures",1972.0,2022.0,4588876.0,0,0
MSFT,5,1,Ms. Alice L. Jolla,55.0,Corp. VP & Chief Accounting Officer,1967.0,,,0,0
MSFT,6,1,Mr. James Kevin Scott,50.0,Exec. VP of AI & CTO,1972.0,,,0,0
MSFT,7,1,Brett Iversen,,Gen. Mang. of Investor Relations,,,,0,0
MSFT,8,1,Mr. Frank X. Shaw,,Corp. VP for Corp. Communications,,,,0,0
MSFT,9,1,Mr. Christopher C. Capossela,52.0,Exec. VP & Chief Marketing Officer,1970.0,,,0,0


In [None]:
company.corporate_events


symbol,date,id,significance,headline,description,parentTopics
MSFT,2013-09-17,2832441,1,Microsoft Corp Announces Quarterly Dividend In...,Microsoft Corp. announced on that its Board of...,Performance
MSFT,2013-11-20,2872539,3,Microsoft Corp Declares Quarterly Dividend,Microsoft Corp announced that its Board of Dir...,Performance
MSFT,2014-03-11,2942626,3,Microsoft Corp declares quarterly dividend,Microsoft Corp:Declares quarterly dividend of ...,Performance
MSFT,2014-06-10,3007564,3,Microsoft Corp declares quarterly dividend,Microsoft Corp:Says that its board declared qu...,Performance
MSFT,2014-09-16,3069314,1,Microsoft Corp announces quarterly dividend in...,Microsoft Corp:Says that its board declared qu...,Performance
MSFT,...,...,...,...,...,...
MSFT,2023-01-24,4649139,1,Microsoft Sees Q3 Productivity And Business Pr...,Jan 24 (Reuters) - Microsoft Corp <MSFT.O>::MI...,Corporate Guidance
MSFT,2023-04-25,4695510,1,Microsoft Reports Quarterly Revenue In More Pe...,April 25 (Reuters) - Microsoft Corp <MSFT.O>::...,Corporate Guidance
MSFT,2023-04-25,4695644,1,Microsoft Says Sees Q4 Intelligent Cloud Reven...,April 25 (Reuters) - Microsoft Corp <MSFT.O>::...,Corporate Guidance
MSFT,2023-04-26,4696423,1,Activision Blizzard Announces First Quarter 20...,April 26 (Reuters) - Activision Blizzard Inc <...,Corporate Guidance


In [None]:
company.corporate_guidance

symbol,date,id,significance,headline,description,parentTopics
MSFT,2014-07-17,3031584,1,Microsoft announces steps to simplify organiza...,Microsoft Corp:Says restructuring plan to simp...,Restructuring/Reorganization/Related
MSFT,2016-05-24,3380603,1,Finland's Kone picks CFO from Microsoft's phon...,Kone Oyj <KNEBV.HE> : says appoints Ilkka Hara...,Performance
MSFT,2016-06-13,3390319,1,Microsoft sees $150 mln of cost synergies annu...,Microsoft Corp <MSFT.O>: Sees $150 mln of cost...,Performance
MSFT,2016-07-19,3407917,1,Microsoft sees negative forex impact of 2 pts ...,Microsoft Corp <MSFT.O> : Sees negative forex ...,Corporate Guidance
MSFT,2016-07-19,3407928,1,Microsoft Corp CFO says expect full year opera...,Microsoft Corp CFO Hood : In q4 we did see add...,Corporate Guidance
MSFT,...,...,...,...,...,...
MSFT,2023-01-24,4649139,1,Microsoft Sees Q3 Productivity And Business Pr...,Jan 24 (Reuters) - Microsoft Corp <MSFT.O>::MI...,Corporate Guidance
MSFT,2023-04-25,4695510,1,Microsoft Reports Quarterly Revenue In More Pe...,April 25 (Reuters) - Microsoft Corp <MSFT.O>::...,Corporate Guidance
MSFT,2023-04-25,4695644,1,Microsoft Says Sees Q4 Intelligent Cloud Reven...,April 25 (Reuters) - Microsoft Corp <MSFT.O>::...,Corporate Guidance
MSFT,2023-04-26,4696423,1,Activision Blizzard Announces First Quarter 20...,April 26 (Reuters) - Activision Blizzard Inc <...,Corporate Guidance


In [None]:
company.earning_history

symbol,row,maxAge,epsActual,epsEstimate,epsDifference,surprisePercent,quarter,period
MSFT,0,1,2.23,2.29,-0.06,-0.026,2022-06-30,-4q
MSFT,1,1,2.35,2.3,0.05,0.022,2022-09-30,-3q
MSFT,2,1,2.32,2.29,0.03,0.013,2022-12-31,-2q
MSFT,3,1,2.45,2.23,0.22,0.099,2023-03-31,-1q


The `history` dataframe contains the historical daily prices, which are probably of particular interest:

In [None]:
company.history()

symbol,date,open,high,low,close,volume,adjclose,dividends
MSFT,2023-01-03,243.080002,245.750000,237.399994,239.580002,25740000,238.460129,0.0
MSFT,2023-01-04,232.279999,232.869995,225.960007,229.100006,50623400,228.029114,0.0
MSFT,2023-01-05,227.199997,227.550003,221.759995,222.309998,39585600,221.270859,0.0
MSFT,2023-01-06,223.000000,225.759995,219.350006,224.929993,43613600,223.878616,0.0
MSFT,2023-01-09,226.449997,231.240005,226.410004,227.119995,27369800,226.058365,0.0
MSFT,...,...,...,...,...,...,...,...
MSFT,2023-06-14,334.339996,339.040009,332.809998,337.339996,26003800,337.339996,0.0
MSFT,2023-06-15,337.480011,349.839996,337.200012,348.100006,38899100,348.100006,0.0
MSFT,2023-06-16,351.320007,351.470001,341.950012,342.329987,46533600,342.329987,0.0
MSFT,2023-06-20,339.309998,342.079987,335.859985,338.049988,26375400,338.049988,0.0


In [None]:
company.income_statement()

symbol,asOfDate,periodType,currencyCode,BasicAverageShares,BasicEPS,CostOfRevenue,DilutedAverageShares,DilutedEPS,DilutedNIAvailtoComStockholders,EBIT,...,SpecialIncomeCharges,TaxEffectOfUnusualItems,TaxProvision,TaxRateForCalcs,TotalExpenses,TotalOperatingIncomeAsReported,TotalRevenue,TotalUnusualItems,TotalUnusualItemsExcludingGoodwill,WriteOff
MSFT,2019-06-30,12M,USD,7673000000.0,5.11,42910000000.0,7753000000.0,5.06,39240000000.0,42959000000.0,...,0.0,0.0,4448000000.0,0.1,82884000000.0,42959000000.0,125843000000.0,710000000.0,710000000.0,
MSFT,2020-06-30,12M,USD,7610000000.0,5.82,46078000000.0,7683000000.0,5.76,44281000000.0,52959000000.0,...,,0.0,8755000000.0,0.16,90056000000.0,52959000000.0,143015000000.0,28000000.0,28000000.0,
MSFT,2021-06-30,12M,USD,7547000000.0,8.12,52232000000.0,7608000000.0,8.05,61271000000.0,69916000000.0,...,,0.0,9831000000.0,0.138266,98172000000.0,69916000000.0,168088000000.0,1303000000.0,1303000000.0,
MSFT,2022-06-30,12M,USD,7496000000.0,9.7,62650000000.0,7540000000.0,9.65,72738000000.0,83383000000.0,...,-101000000.0,0.0,10978000000.0,0.131134,114887000000.0,83383000000.0,198270000000.0,334000000.0,334000000.0,101000000.0
MSFT,2022-12-31,TTM,USD,,,64984000000.0,,,67449000000.0,84577000000.0,...,,-100269200.0,15139000000.0,0.183308,121279000000.0,82815000000.0,204094000000.0,-547000000.0,-547000000.0,
MSFT,2023-03-31,TTM,USD,,,65497000000.0,,,69020000000.0,87053000000.0,...,,-38867600.0,16051000000.0,0.188678,122788000000.0,84803000000.0,207591000000.0,-206000000.0,-206000000.0,


In [None]:
#company.news() #> ["error"]

In [None]:
company.option_chain

symbol,expiration,optionType,contractSymbol,strike,currency,lastPrice,change,percentChange,volume,openInterest,bid,ask,contractSize,lastTradeDate,impliedVolatility,inTheMoney
MSFT,2023-06-23,calls,MSFT230623C00215000,215.0,USD,122.57,0.000000,0.000000,3.0,3,117.40,120.20,REGULAR,2023-06-15 13:38:58,2.667972,True
MSFT,2023-06-23,calls,MSFT230623C00220000,220.0,USD,124.70,0.000000,0.000000,5.0,5,112.35,115.15,REGULAR,2023-06-16 18:52:31,2.464848,True
MSFT,2023-06-23,calls,MSFT230623C00225000,225.0,USD,115.80,0.000000,0.000000,3.0,4,107.35,110.05,REGULAR,2023-06-15 13:45:12,2.253911,True
MSFT,2023-06-23,calls,MSFT230623C00230000,230.0,USD,104.15,-4.470001,-4.115265,2.0,22,102.55,105.15,REGULAR,2023-06-21 14:47:10,2.361332,True
MSFT,2023-06-23,calls,MSFT230623C00235000,235.0,USD,92.09,0.000000,0.000000,1.0,35,97.45,100.30,REGULAR,2023-06-09 16:54:15,2.267582,True
MSFT,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
MSFT,2025-12-19,puts,MSFT251219P00420000,420.0,USD,94.15,-4.949997,-4.994952,1.0,33,92.20,96.05,REGULAR,2023-06-21 15:34:48,0.162759,True
MSFT,2025-12-19,puts,MSFT251219P00430000,430.0,USD,133.00,0.000000,0.000000,0.0,1,100.35,104.20,REGULAR,2023-04-26 16:58:59,0.159829,True
MSFT,2025-12-19,puts,MSFT251219P00440000,440.0,USD,122.15,0.000000,0.000000,1.0,1,100.50,105.00,REGULAR,2023-05-19 13:41:20,0.000010,True
MSFT,2025-12-19,puts,MSFT251219P00450000,450.0,USD,108.49,0.000000,0.000000,0.0,2,116.30,120.50,REGULAR,2023-06-15 17:20:56,0.146844,True


In [None]:
company.recommendation_trend

symbol,row,period,strongBuy,buy,hold,sell,strongSell
MSFT,0,0m,14,13,6,0,1
MSFT,1,-1m,20,26,8,0,1
MSFT,2,-2m,19,26,8,0,1
MSFT,3,-3m,19,26,8,0,1


In [None]:
company.sec_filings

symbol,row,date,epochDate,type,title,edgarUrl,exhibits,maxAge
MSFT,0,2023-05-30,2023-05-30 00:00:00,SD,Specialized Disclosure Report filed pursuant t...,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': 'SD', 'url': 'https://cdn.yahoofinan...",1
MSFT,1,2023-04-25,2023-04-25 00:00:00,8-K,Corporate Changes & Voting Matters,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': 'EX-99.1', 'url': 'https://cdn.yahoo...",1
MSFT,2,2023-03-24,2023-03-24 00:00:00,11-K,Annual report of employee stock purchase,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': 'EX-23.1', 'url': 'https://cdn.yahoo...",1
MSFT,3,2023-02-09,2023-02-09 00:00:00,SC 13G/A,Tender Offer/Acquisition Reports,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': 'SC 13G/A', 'url': 'https://cdn.yaho...",1
MSFT,4,2023-01-31,2023-01-31 00:00:00,SC 13G/A,Tender Offer/Acquisition Reports,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': 'SC 13G/A', 'url': 'https://s3.amazo...",1
MSFT,5,2023-01-24,2023-01-24 00:00:00,10-Q,Periodic Financial Reports,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': 'EXCEL', 'url': 'https://s3.amazonaw...",1
MSFT,6,2023-01-18,2023-01-18 00:00:00,8-K,Corporate Changes & Voting Matters,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': '8-K', 'url': 'https://cdn.yahoofina...",1
MSFT,7,2022-12-16,2022-12-16 00:00:00,8-K,Corporate Changes & Voting Matters,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': 'EXCEL', 'url': 'https://s3.amazonaw...",1
MSFT,8,2022-11-21,2022-11-21 00:00:00,PX14A6G,Notice of exempt solicitation,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': 'PX14A6G', 'url': 'https://cdn.yahoo...",1
MSFT,9,2022-11-04,2022-11-04 00:00:00,PX14A6G,Notice of exempt solicitation,https://finance.yahoo.com/sec-filing/MSFT/0001...,"[{'type': 'PX14A6G', 'url': 'https://cdn.yahoo...",1


### Plotting Stock Prices

The `history` dataframe has the stock prices, but it uses a multi-index (`symbol` and `date`), so we'll need to flatten it before charting.

See the [`MultiIndex.get_level_values` docs](https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.get_level_values.html) for how to access the values of each index level.


In [None]:
history_df = company.history()
#history_df.index.get_level_values(0)
#history_df.index.get_level_values(1)

history_df["symbol"] = history_df.index.get_level_values(0)
history_df["date"] = history_df.index.get_level_values(1)
history_df.reset_index(drop=True, inplace=True) # reset the index for simplification
history_df

,open,high,low,close,volume,adjclose,dividends,symbol,date
0,243.080002,245.750000,237.399994,239.580002,25740000,238.460129,0.0,MSFT,2023-01-03
1,232.279999,232.869995,225.960007,229.100006,50623400,228.029114,0.0,MSFT,2023-01-04
2,227.199997,227.550003,221.759995,222.309998,39585600,221.270859,0.0,MSFT,2023-01-05
3,223.000000,225.759995,219.350006,224.929993,43613600,223.878616,0.0,MSFT,2023-01-06
4,226.449997,231.240005,226.410004,227.119995,27369800,226.058365,0.0,MSFT,2023-01-09
...,...,...,...,...,...,...,...,...,...
112,334.339996,339.040009,332.809998,337.339996,26003800,337.339996,0.0,MSFT,2023-06-14
113,337.480011,349.839996,337.200012,348.100006,38899100,348.100006,0.0,MSFT,2023-06-15
114,351.320007,351.470001,341.950012,342.329987,46533600,342.329987,0.0,MSFT,2023-06-16
115,339.309998,342.079987,335.859985,338.049988,26375400,338.049988,0.0,MSFT,2023-06-20
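
As an alternative sketch, calling `reset_index()` without `drop=True` moves every index level into an ordinary column in one step. The example below builds a small hypothetical multi-indexed dataframe with the same level names (`symbol`, `date`), using two values from the output above:

```python
from pandas import DataFrame, MultiIndex

# hypothetical stand-in for company.history(), using values from the output above
index = MultiIndex.from_tuples(
    [("MSFT", "2023-01-03"), ("MSFT", "2023-01-04")],
    names=["symbol", "date"],
)
sample_df = DataFrame({"adjclose": [238.460129, 228.029114]}, index=index)

# index levels become regular columns; a fresh 0..n-1 index replaces them
flat_df = sample_df.reset_index()
print(flat_df.columns.tolist())
```

The only visible difference from the approach above is column order: the former index levels end up first rather than last.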


In [None]:
import plotly.express as px

px.line(history_df, x="date", y="adjclose", title=f"Adjusted Closing Prices for {symbol}")

In [None]:
#px.bar(history_df, x="date", y="dividends", title=f"Dividends for {symbol}")

### Multiple Stocks

Let's choose around ten stocks of interest. Feel free to update the list of ticker symbols below.

In [None]:
# setup cell (run and leave as is)

from yahooquery import Ticker

symbols = [
    "AAPL", "GOOGL", "META", "MSFT", "NFLX", "AMZN", "NVDA",
    "BAC", "JPM",
]
companies = Ticker(symbols)
print(type(companies))

<class 'yahooquery.ticker.Ticker'>


Let's grab the historical prices for them all. We know we will get a somewhat complicated multi-indexed dataframe back.

In [None]:
companies.history()

symbol,date,open,high,low,close,volume,adjclose,dividends
AAPL,2023-01-03,130.279999,130.899994,124.169998,125.070000,112117500,124.706833,0.0
AAPL,2023-01-04,126.889999,128.660004,125.080002,126.360001,89113600,125.993095,0.0
AAPL,2023-01-05,127.129997,127.769997,124.760002,125.019997,80962700,124.656975,0.0
AAPL,2023-01-06,126.010002,130.289993,124.889999,129.619995,87754700,129.243622,0.0
AAPL,2023-01-09,130.470001,133.410004,129.889999,130.149994,70790800,129.772079,0.0
...,...,...,...,...,...,...,...,...
JPM,2023-06-14,143.000000,143.610001,140.699997,141.490005,9068700,141.490005,0.0
JPM,2023-06-15,140.589996,143.399994,140.029999,143.089996,9343100,143.089996,0.0
JPM,2023-06-16,143.050003,143.970001,142.500000,143.259995,13812100,143.259995,0.0
JPM,2023-06-20,142.610001,143.000000,141.039993,142.529999,8428600,142.529999,0.0


In [None]:
from pandas import to_datetime

histories_df = companies.history()
#histories_df.index.get_level_values(0)
#histories_df.index.get_level_values(1)

histories_df["symbol"] = histories_df.index.get_level_values(0)
histories_df["date"] = to_datetime(histories_df.index.get_level_values(1)).date
histories_df.reset_index(drop=True, inplace=True) # see: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html
print(len(histories_df)) #> 1053 rows
histories_df[["date", "symbol", "adjclose"]]

1053


,date,symbol,adjclose
0,2023-01-03,AAPL,124.706833
1,2023-01-04,AAPL,125.993095
2,2023-01-05,AAPL,124.656975
3,2023-01-06,AAPL,129.243622
4,2023-01-09,AAPL,129.772079
...,...,...,...
1048,2023-06-14,JPM,141.490005
1049,2023-06-15,JPM,143.089996
1050,2023-06-16,JPM,143.259995
1051,2023-06-20,JPM,142.529999


In [None]:
# quick check for null values (because some stocks may have different history lengths)
# https://stackoverflow.com/questions/26266362/how-do-i-count-the-nan-values-in-a-column-in-pandas-dataframe
histories_df["adjclose"].isnull().sum()  #> 0 ok looks good. can proceed without concern for nulls

0
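
A complementary check, sketched here on a hypothetical sample, is to count the rows per symbol. With 9 symbols and 1053 rows we would expect 117 rows each, and a shorter count would flag a stock with less history even when no values are null.

```python
from pandas import DataFrame

# hypothetical sample where one symbol is missing a day
sample_df = DataFrame({
    "symbol": ["AAPL", "AAPL", "AAPL", "JPM", "JPM"],
    "date": ["2023-01-03", "2023-01-04", "2023-01-05", "2023-01-03", "2023-01-04"],
})

rows_per_symbol = sample_df.groupby("symbol").size()
print(rows_per_symbol)

# symbols with fewer rows than the longest history
short = rows_per_symbol[rows_per_symbol < rows_per_symbol.max()]
print(short.index.tolist())
```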

#### Grouping and Aggregations

By default, the dataset is formatted as a **row per date per symbol**, with a **single column of (adjusted closing) prices**.

We can use this kind of data format to do easy grouping and aggregation, for example, calculating the average price for each stock, as well as the volatility / risk level (standard deviation of prices) for each stock.

> Beta and standard deviation are two ways to describe market volatility. Beta measures the fund's volatility in comparison to other funds, whereas standard deviation measures the fluctuation in the fund's share price over time. - [source](https://www.tutorialspoint.com/difference-between-beta-and-standard-deviation)


In [None]:
print("AVERAGE PRICE PER STOCK:")
histories_df.groupby("symbol")["adjclose"].mean().sort_values(ascending=False)

AVERAGE PRICE PER STOCK:


symbol
NFLX     345.091368
MSFT     280.546800
NVDA     265.622497
META     203.974273
AAPL     159.034452
JPM      136.477879
GOOGL    104.643932
AMZN     103.991282
BAC       30.632444
Name: adjclose, dtype: float64

In [None]:
print("VOLATILITY (STANDARD DEVIATION) PER STOCK:")
histories_df.groupby("symbol")["adjclose"].std().sort_values(ascending=False)

VOLATILITY (STANDARD DEVIATION) PER STOCK:


symbol
NVDA     71.178386
META     43.533557
NFLX     35.039349
MSFT     33.262708
AAPL     15.253099
GOOGL    11.706819
AMZN     10.791435
JPM       4.979865
BAC       3.016316
Name: adjclose, dtype: float64
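
Both tables above can be produced in a single `groupby(...).agg(...)` call. Note also that a standard deviation measured in dollars scales with the price level, so as an optional extension (an assumption, not part of the analysis above) the sketch also computes the standard deviation of daily percentage changes. The sample data and symbols are hypothetical:

```python
from pandas import DataFrame

# hypothetical long-format sample: a row per date per symbol,
# assumed to be in chronological order within each symbol
sample_df = DataFrame({
    "symbol": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "adjclose": [100.0, 110.0, 99.0, 10.0, 11.0, 9.9],
})

# mean and standard deviation of prices, one row per symbol
price_stats = sample_df.groupby("symbol")["adjclose"].agg(["mean", "std"])
print(price_stats)

# daily percent change within each symbol, then its standard deviation
sample_df["daily_return"] = sample_df.groupby("symbol")["adjclose"].pct_change()
return_vol = sample_df.groupby("symbol")["daily_return"].std()
print(return_vol)
```

In this sample the two symbols move by the same percentages each day, so their return-based volatility matches even though the price-based standard deviation of `AAA` is ten times larger.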

#### Pivot Operations


However, for charting purposes, we will want to construct a differently formatted dataset with a **row per date**, and a **column of adjusted closing prices for each symbol**.

If you are not sure how to achieve this, you could try an internet search for something like "pandas single column to multiple columns for different values", which will yield a [Stack Overflow post like this](https://stackoverflow.com/questions/58134672/converting-single-column-to-multiple-columns-based-on-unique-values) describing a pivot-based approach.

Now let's try to adapt the approach from this post, while cross referencing our notes about pivot operations, as well as the [`pivot` function docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html), as necessary.



In [None]:
# using the pivot, we say we want a column for each symbol,
# and use the adjusted closing prices as the values
# and the dates as the rows
prices_pivot = histories_df.pivot(columns="symbol", values="adjclose", index="date")
print(len(prices_pivot))
# adding a column of dates for charting later
prices_pivot["date"] = prices_pivot.index
prices_pivot.reset_index(inplace=True, drop=True)
prices_pivot

117


symbol,AAPL,AMZN,BAC,GOOGL,JPM,META,MSFT,NFLX,NVDA,date
0,124.706833,85.820000,33.030487,89.120003,133.084778,124.739998,238.460144,294.950012,143.110870,2023-01-03
1,125.993095,85.139999,33.651470,88.080002,134.325806,127.370003,228.029129,309.410004,147.449707,2023-01-04
2,124.656975,83.120003,33.582474,86.199997,134.296051,126.940002,221.270844,309.700012,142.611023,2023-01-05
3,129.243622,86.080002,33.917606,87.339996,136.865875,130.020004,223.878601,315.549988,148.549393,2023-01-06
4,129.772079,87.360001,33.405048,88.019997,136.300308,129.470001,226.058380,315.170013,156.237289,2023-01-09
...,...,...,...,...,...,...,...,...,...,...
112,183.949997,126.419998,29.120001,123.669998,141.490005,273.350006,337.339996,440.859985,429.970001,2023-06-14
113,186.009995,127.110001,29.370001,125.089996,143.089996,281.829987,348.100006,445.269989,426.529999,2023-06-15
114,184.919998,125.489998,29.190001,123.529999,143.259995,281.000000,342.329987,431.959991,426.920013,2023-06-16
115,185.009995,125.779999,28.870001,123.099998,142.529999,284.329987,338.049988,434.700012,438.079987,2023-06-20
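
To see more clearly what the pivot is doing, it may help to try it on a tiny dataset first. The following is a minimal sketch, assuming a made-up long-format dataset (the `toy_df` name, the symbols, and the prices are hypothetical):

```python
import pandas as pd

# hypothetical toy data: one row per date per symbol
toy_df = pd.DataFrame({
    "date": ["2023-01-03", "2023-01-03", "2023-01-04", "2023-01-04"],
    "symbol": ["AAA", "BBB", "AAA", "BBB"],
    "adjclose": [10.0, 100.0, 12.0, 101.0],
})

# reshape to one row per date, with one column of prices per symbol
toy_pivot = toy_df.pivot(index="date", columns="symbol", values="adjclose")
print(toy_pivot)
```

Note that `pivot` raises a `ValueError` if the same date and symbol combination appears more than once. In that situation, the `pivot_table` method, which aggregates duplicate entries, may be a better fit.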


In [None]:
import plotly.express as px

#px.line(prices_pivot, x="date", y="AAPL")
#px.line(prices_pivot, x="date", y=["AAPL", "AMZN"])
px.line(prices_pivot, x="date", y=symbols, title="Stock Prices (Raw Values)")

#### Growth Calculations

The prices all have different ranges, so plotting the raw values does not show us which stocks have the highest relative growth.

Instead, after getting the raw stock prices, we can convert them to a relative scale (for example, percentage growth or cumulative growth). Then we can plot these normalized prices to compare the stocks on equal footing.

For more background, see [this post on calculating returns in Python](https://www.codingfinance.com/post/2018-04-03-calc-returns-py/).

In other words, let's treat each stock's price on the first day as a value of 1, and then plot the growth in relative terms from there.
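
To make this concrete, here is a short sketch of the arithmetic. The notation is introduced here for illustration only: let $P_t$ be a stock's adjusted closing price on day $t$, with $P_0$ the price on the first day. The daily return $r_t$ and the corresponding daily growth factor are:

$$
r_t = \frac{P_t - P_{t-1}}{P_{t-1}}, \qquad 1 + r_t = \frac{P_t}{P_{t-1}}
$$

Multiplying the daily growth factors together gives the cumulative growth $G_t$. The intermediate prices cancel out, leaving the current price relative to the first day's price:

$$
G_t = \prod_{i=1}^{t} (1 + r_i) = \frac{P_1}{P_0} \cdot \frac{P_2}{P_1} \cdots \frac{P_t}{P_{t-1}} = \frac{P_t}{P_0}, \qquad G_0 = 1
$$

So a value of $G_t = 1.25$ would mean the price has grown 25% since the first day.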

In [None]:
# knowing we are going to perform some shift-based methods,
# we first ensure data is sorted, for good measure
prices_pivot.sort_values(by="date", ascending=True, inplace=True)
prices_pivot

symbol,AAPL,AMZN,BAC,GOOGL,JPM,META,MSFT,NFLX,NVDA,date
0,124.706833,85.820000,33.030487,89.120003,133.084778,124.739998,238.460144,294.950012,143.110870,2023-01-03
1,125.993095,85.139999,33.651470,88.080002,134.325806,127.370003,228.029129,309.410004,147.449707,2023-01-04
2,124.656975,83.120003,33.582474,86.199997,134.296051,126.940002,221.270844,309.700012,142.611023,2023-01-05
3,129.243622,86.080002,33.917606,87.339996,136.865875,130.020004,223.878601,315.549988,148.549393,2023-01-06
4,129.772079,87.360001,33.405048,88.019997,136.300308,129.470001,226.058380,315.170013,156.237289,2023-01-09
...,...,...,...,...,...,...,...,...,...,...
112,183.949997,126.419998,29.120001,123.669998,141.490005,273.350006,337.339996,440.859985,429.970001,2023-06-14
113,186.009995,127.110001,29.370001,125.089996,143.089996,281.829987,348.100006,445.269989,426.529999,2023-06-15
114,184.919998,125.489998,29.190001,123.529999,143.259995,281.000000,342.329987,431.959991,426.920013,2023-06-16
115,185.009995,125.779999,28.870001,123.099998,142.529999,284.329987,338.049988,434.700012,438.079987,2023-06-20


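To illustrate the methods involved: `pct_change` calculates the percentage change from the previous row, adding 1 converts that change into a daily growth factor, and `cumprod` multiplies these factors together to get the cumulative growth. The first value is `NaN` because the first day has no previous price to compare against: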

In [None]:
#prices_pivot["AAPL"].pct_change(periods=1)

In [None]:
#prices_pivot["AAPL"].pct_change(periods=1) + 1

In [None]:
(prices_pivot["AAPL"].pct_change(periods=1) + 1).cumprod()

0           NaN
1      1.010314
2      0.999600
3      1.036380
4      1.040617
         ...   
112    1.475059
113    1.491578
114    1.482838
115    1.483559
116    1.475140
Name: AAPL, Length: 117, dtype: float64
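
The same three steps may be easier to follow on a handful of round numbers. Here is a minimal sketch, assuming a hypothetical series of three prices (the values are made up for illustration):

```python
import pandas as pd

# hypothetical prices: up 10% on the second day, then down 10% on the third
prices = pd.Series([100.0, 110.0, 99.0])

# step 1: percentage change from the previous row (first value is NaN)
daily_change = prices.pct_change(periods=1)

# step 2: add 1 to convert each change into a growth factor
growth_factor = daily_change + 1

# step 3: multiply the growth factors together, row by row
cumulative = growth_factor.cumprod()
print(cumulative)
```

The final value is approximately 0.99, showing that a 10% gain followed by a 10% loss leaves the price slightly below where it started.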

In [None]:
returns_df = prices_pivot.copy()

for symbol in symbols:
    growth_colname = f"{symbol}_growth"
    # cumulative growth, relative to the price on the first day
    returns_df[growth_colname] = (returns_df[symbol].pct_change(periods=1) + 1).cumprod()
    # the first day has no previous price, so its value is NaN; set it to 1
    # (assigning with .loc avoids the chained assignment warning)
    returns_df.loc[returns_df.index[0], growth_colname] = 1

returns_df.reset_index(inplace=True, drop=True)
returns_df[["AAPL", "AAPL_growth"]]

symbol,AAPL,AAPL_growth
0,124.706833,1.000000
1,125.993095,1.010314
2,124.656975,0.999600
3,129.243622,1.036380
4,129.772079,1.040617
...,...,...
112,183.949997,1.475059
113,186.009995,1.491578
114,184.919998,1.482838
115,185.009995,1.483559


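It is always important to check our work. Let's do a quick check to see if our cumulative growth matches our expectations, by calculating the overall growth directly from the first and last prices. Here we see the final growth value matches our check, aside from a tiny floating point rounding difference.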

In [None]:
beginning = returns_df.iloc[0]["AAPL"]
ending = returns_df.iloc[-1]["AAPL"]
growth_check = (ending - beginning) / beginning
print(growth_check)


print(returns_df.iloc[-1]["AAPL_growth"])
print(returns_df.iloc[-1]["AAPL_growth"] - 1) # let's subtract one to undo our previous operation

0.47513975342805337
1.4751397534280528
0.4751397534280528
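
This check works because each daily growth factor is that day's price divided by the previous day's price, so when we multiply the factors together, the intermediate prices cancel out. This suggests a possible shortcut: dividing every price by the first day's price. The following is a minimal sketch comparing the two approaches, assuming a small made-up dataset (the `toy_prices` name, the symbols, and the values are hypothetical):

```python
import numpy as np
import pandas as pd

# hypothetical toy data: one row per date, one column of prices per symbol
toy_prices = pd.DataFrame({
    "AAA": [10.0, 12.0, 9.0, 15.0],
    "BBB": [100.0, 101.0, 102.0, 99.0],
})

# approach used in this notebook: compound the daily growth factors
via_cumprod = (toy_prices.pct_change(periods=1) + 1).cumprod()
via_cumprod.iloc[0] = 1  # the first row has no previous day, so set NaN to 1

# shortcut: divide each column by its first value
via_division = toy_prices / toy_prices.iloc[0]

# the two approaches should agree (up to floating point rounding)
print(np.allclose(via_cumprod, via_division))
```

The shortcut is more concise, while the `pct_change` approach also gives us the daily returns along the way, which may be useful for other calculations.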


OK great, now we can chart the data on a single plot, and also compare the growth across each company.

In [None]:
growth_cols = [f"{symbol}_growth" for symbol in symbols]
print(growth_cols)

['AAPL_growth', 'GOOGL_growth', 'META_growth', 'MSFT_growth', 'NFLX_growth', 'AMZN_growth', 'NVDA_growth', 'BAC_growth', 'JPM_growth']


In [None]:
import plotly.express as px

px.line(returns_df, x="date", y=growth_cols, title="Stock Prices (Cumulative Growth)")

Now it is easy to tell the story of which company has the highest growth!

In [None]:
returns_df[growth_cols].iloc[-1].sort_values(ascending=False)

symbol
NVDA_growth     3.007808
META_growth     2.257816
AAPL_growth     1.475140
AMZN_growth     1.454556
NFLX_growth     1.439057
MSFT_growth     1.398808
GOOGL_growth    1.352671
JPM_growth      1.069394
BAC_growth      0.864958
Name: 116, dtype: float64