# Pandas Learning Journey: JPMorgan Stock Data

## Dataset Download from Kaggle

In [31]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("rockyt07/jpmc-stock-detailed-data")

print("Path to dataset files:", path)

Path to dataset files: /Users/pierre/.cache/kagglehub/datasets/rockyt07/jpmc-stock-detailed-data/versions/1


## DataFrame Import and Data Exploration

In [32]:
import pandas as pd
balance_df = pd.read_csv('../Dataset/JPM_balance_sheet.csv')
balance_df.head(5)

Unnamed: 0.1,Unnamed: 0,2024-12-31,2023-12-31,2022-12-31,2021-12-31,2020-12-31
0,Treasury Shares Number,1307313000.0,1228275000.0,1170676000.0,1160785000.0,
1,Preferred Shares Number,391850000.0,391850000.0,391850000.0,391850000.0,
2,Ordinary Shares Number,2797620000.0,2876659000.0,2934258000.0,2944149000.0,
3,Share Issued,4104934000.0,4104934000.0,4104934000.0,4104934000.0,
4,Total Debt,454311000000.0,436537000000.0,339892000000.0,354599000000.0,


In [33]:
cashflow_df = pd.read_csv('../Dataset/JPM_cashflow.csv')
cashflow_df.head(5)

Unnamed: 0.1,Unnamed: 0,2024-12-31,2023-12-31,2022-12-31,2021-12-31
0,Free Cash Flow,-42012000000.0,12974000000.0,107119000000.0,78084000000.0
1,Repurchase Of Capital Stock,-28680000000.0,-9824000000.0,-10596000000.0,-20983000000.0
2,Repayment Of Debt,-96605000000.0,-64880000000.0,-45556000000.0,-54932000000.0
3,Issuance Of Debt,109915000000.0,75417000000.0,78442000000.0,82409000000.0
4,Issuance Of Capital Stock,2500000000.0,0.0,0.0,7350000000.0


In [34]:
dividends_df = pd.read_csv('../Dataset/JPM_dividends.csv')
dividends_df.head(5)

Unnamed: 0,Date,Dividends
0,1984-03-09 00:00:00-05:00,0.196667
1,1984-06-11 00:00:00-04:00,0.196667
2,1984-09-10 00:00:00-04:00,0.196667
3,1984-12-10 00:00:00-05:00,0.196667
4,1985-03-11 00:00:00-05:00,0.206667


### [balance_df] Column Renaming

The first column of balance_df has no name (pandas labels it `Unnamed: 0`), so rename it to **Metrics** for easier reference. Also shorten the 'YYYY-MM-DD' column names to the year only (YYYY).

In [35]:
# Map the unnamed label column to 'Metrics' and each fiscal year-end date to its year
balance_df = balance_df.rename(columns={'Unnamed: 0': 'Metrics', '2024-12-31': '2024', '2023-12-31': '2023', '2022-12-31': '2022', '2021-12-31': '2021', '2020-12-31': '2020'})
balance_df.columns

Index(['Metrics', '2024', '2023', '2022', '2021', '2020'], dtype='object')
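The explicit mapping above works, but it has to be edited whenever a new year is added. A minimal sketch of building the mapping programmatically, assuming every date column follows the 'YYYY-MM-DD' pattern. The helper name `shorten_date_columns` and the toy values are hypothetical, not part of the dataset.

```python
import re

import pandas as pd

# Toy frame that mimics the raw column layout (values are made up for illustration)
raw = pd.DataFrame({
    'Unnamed: 0': ['Total Debt', 'Share Issued'],
    '2024-12-31': [1.0, 2.0],
    '2023-12-31': [3.0, 4.0],
})


def shorten_date_columns(df):
    """Rename 'Unnamed: 0' to 'Metrics' and every 'YYYY-MM-DD' column to 'YYYY'."""
    mapping = {'Unnamed: 0': 'Metrics'}
    for col in df.columns:
        # Only touch columns that look exactly like a date
        if re.fullmatch(r'\d{4}-\d{2}-\d{2}', col):
            mapping[col] = col[:4]
    return df.rename(columns=mapping)


renamed = shorten_date_columns(raw)
print(list(renamed.columns))  # ['Metrics', '2024', '2023']
```

The same helper could be reused on cashflow_df, since its columns follow the same layout.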

View the **DataFrame** again with the revised column names.

In [36]:
balance_df

Unnamed: 0,Metrics,2024,2023,2022,2021,2020
0,Treasury Shares Number,1307313000.0,1228275000.0,1170676000.0,1160785000.0,
1,Preferred Shares Number,391850000.0,391850000.0,391850000.0,391850000.0,
2,Ordinary Shares Number,2797620000.0,2876659000.0,2934258000.0,2944149000.0,
3,Share Issued,4104934000.0,4104934000.0,4104934000.0,4104934000.0,
4,Total Debt,454311000000.0,436537000000.0,339892000000.0,354599000000.0,
5,Tangible Book Value,260148000000.0,236093000000.0,204069000000.0,202598000000.0,
6,Invested Capital,779019000000.0,737011000000.0,604820000000.0,613888000000.0,
7,Net Tangible Assets,280198000000.0,263497000000.0,231473000000.0,237436000000.0,
8,Common Stock Equity,324708000000.0,300474000000.0,264928000000.0,259289000000.0,
9,Preferred Stock Equity,20050000000.0,27404000000.0,27404000000.0,34838000000.0,


### [balance_df] Data Cleaning

Since the 2020 data is mostly **NaN**, I'll drop that column.

In [37]:
balance_df = balance_df.drop(columns=['2020'])

In [38]:
balance_df.columns

Index(['Metrics', '2024', '2023', '2022', '2021'], dtype='object')
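To check "mostly NaN" with numbers instead of by eye, one option is to compute the share of missing values per column and drop the columns above a cutoff. This is only a sketch on a toy frame. The 0.5 threshold and the variable names are my own assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame: the '2020' column is missing in 3 of 4 rows (values are made up)
toy = pd.DataFrame({
    'Metrics': ['A', 'B', 'C', 'D'],
    '2021': [1.0, 2.0, 3.0, 4.0],
    '2020': [np.nan, np.nan, np.nan, 5.0],
})

# isna() gives a boolean frame; the column-wise mean is the fraction of NaN values
nan_share = toy.isna().mean()

# Assumed cutoff: treat a column as "mostly NaN" when more than half is missing
mostly_nan = nan_share[nan_share > 0.5].index

cleaned = toy.drop(columns=mostly_nan)
print(list(cleaned.columns))  # ['Metrics', '2021']
```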

The remaining NaN values are replaced with **zero**.

In [None]:
balance_df = balance_df.fillna(0)
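
A small sketch of the same fill limited to the numeric columns, followed by a check that no NaN is left. It runs on a toy frame with made-up values. Restricting the fill to numeric columns is my own variation, so that a missing text label would not silently become 0.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in the numeric year columns (values are made up)
toy = pd.DataFrame({
    'Metrics': ['Total Debt', 'Share Issued'],
    '2024': [1.0, np.nan],
    '2023': [np.nan, 4.0],
})

# Pick only the numeric columns so the text column 'Metrics' is left untouched
numeric_cols = toy.select_dtypes(include='number').columns
toy[numeric_cols] = toy[numeric_cols].fillna(0)

# Count the NaN values that remain anywhere in the frame
remaining_nans = int(toy.isna().sum().sum())
print(remaining_nans)  # 0
```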