# Pandas Learning Journey: JPMorgan Stock Data

## Dataset Download from Kaggle

In [152]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("rockyt07/jpmc-stock-detailed-data")

print("Path to dataset files:", path)

Path to dataset files: /Users/pierre/.cache/kagglehub/datasets/rockyt07/jpmc-stock-detailed-data/versions/1


## DataFrame Import and Data Exploration

In [153]:
import pandas as pd
balance_df = pd.read_csv('../Dataset/JPM_balance_sheet.csv')
balance_df.head(5)

Unnamed: 0.1,Unnamed: 0,2024-12-31,2023-12-31,2022-12-31,2021-12-31,2020-12-31
0,Treasury Shares Number,1307313000.0,1228275000.0,1170676000.0,1160785000.0,
1,Preferred Shares Number,391850000.0,391850000.0,391850000.0,391850000.0,
2,Ordinary Shares Number,2797620000.0,2876659000.0,2934258000.0,2944149000.0,
3,Share Issued,4104934000.0,4104934000.0,4104934000.0,4104934000.0,
4,Total Debt,454311000000.0,436537000000.0,339892000000.0,354599000000.0,


In [154]:
cashflow_df = pd.read_csv('../Dataset/JPM_cashflow.csv')
cashflow_df.head(5)

Unnamed: 0.1,Unnamed: 0,2024-12-31,2023-12-31,2022-12-31,2021-12-31
0,Free Cash Flow,-42012000000.0,12974000000.0,107119000000.0,78084000000.0
1,Repurchase Of Capital Stock,-28680000000.0,-9824000000.0,-10596000000.0,-20983000000.0
2,Repayment Of Debt,-96605000000.0,-64880000000.0,-45556000000.0,-54932000000.0
3,Issuance Of Debt,109915000000.0,75417000000.0,78442000000.0,82409000000.0
4,Issuance Of Capital Stock,2500000000.0,0.0,0.0,7350000000.0


In [155]:
dividends_df = pd.read_csv('../Dataset/JPM_dividends.csv')
dividends_df.head(5)

Unnamed: 0,Date,Dividends
0,1984-03-09 00:00:00-05:00,0.196667
1,1984-06-11 00:00:00-04:00,0.196667
2,1984-09-10 00:00:00-04:00,0.196667
3,1984-12-10 00:00:00-05:00,0.196667
4,1985-03-11 00:00:00-05:00,0.206667


### [balance_df] Column Renaming

The first column of balance_df has no name (pandas labels it `Unnamed: 0`), so rename it to **Metrics** for easier reference. Also shorten the 'YYYY-MM-DD' column labels to YYYY only.

In [156]:
balance_df = balance_df.rename(columns={
    'Unnamed: 0': 'Metrics',
    '2024-12-31': '2024',
    '2023-12-31': '2023',
    '2022-12-31': '2022',
    '2021-12-31': '2021',
    '2020-12-31': '2020',
})
balance_df.columns

Index(['Metrics', '2024', '2023', '2022', '2021', '2020'], dtype='object')
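The rename above lists every date label by hand. As a minimal sketch, the same mapping could be built with a dict comprehension that keeps only the first four characters (the year) of each date-like label. The `raw` frame below is a hypothetical two-row stand-in for the CSV, using values from the table above; it is not the real file.

```python
import pandas as pd

# Hypothetical stand-in for the raw CSV layout (two metrics, two years).
raw = pd.DataFrame({
    "Unnamed: 0": ["Total Debt", "Share Issued"],
    "2024-12-31": [454311000000.0, 4104934000.0],
    "2023-12-31": [436537000000.0, 4104934000.0],
})

# Map each 'YYYY-MM-DD' label to its first four characters (the year).
date_cols = [c for c in raw.columns if c != "Unnamed: 0"]
mapping = {c: c[:4] for c in date_cols}

# Add the rename for the unnamed label column.
mapping["Unnamed: 0"] = "Metrics"

renamed = raw.rename(columns=mapping)
print(list(renamed.columns))
```

This assumes every column other than `Unnamed: 0` is a date label; any other column would have its name truncated too.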

View the **DataFrame** again with the revised column names.

In [157]:
balance_df

Unnamed: 0,Metrics,2024,2023,2022,2021,2020
0,Treasury Shares Number,1307313000.0,1228275000.0,1170676000.0,1160785000.0,
1,Preferred Shares Number,391850000.0,391850000.0,391850000.0,391850000.0,
2,Ordinary Shares Number,2797620000.0,2876659000.0,2934258000.0,2944149000.0,
3,Share Issued,4104934000.0,4104934000.0,4104934000.0,4104934000.0,
4,Total Debt,454311000000.0,436537000000.0,339892000000.0,354599000000.0,
5,Tangible Book Value,260148000000.0,236093000000.0,204069000000.0,202598000000.0,
6,Invested Capital,779019000000.0,737011000000.0,604820000000.0,613888000000.0,
7,Net Tangible Assets,280198000000.0,263497000000.0,231473000000.0,237436000000.0,
8,Common Stock Equity,324708000000.0,300474000000.0,264928000000.0,259289000000.0,
9,Preferred Stock Equity,20050000000.0,27404000000.0,27404000000.0,34838000000.0,


### [balance_df] Data Cleaning

Count the null values in each column of the dataset.

In [158]:
balance_df.isnull().sum()

Metrics     0
2024        3
2023        3
2022        3
2021        6
2020       48
dtype: int64

Since the 2020 column is mostly **NaN** (48 missing values), I'll drop that column.

In [159]:
balance_df = balance_df.drop(columns=['2020'])

In [160]:
balance_df.columns

Index(['Metrics', '2024', '2023', '2022', '2021'], dtype='object')
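The drop above hard-codes the column name. A possible alternative, sketched here on a hypothetical toy frame, is to compute the share of missing values per column and keep only the columns under a cutoff. The 0.5 cutoff and the `toy` data are assumptions for illustration, not values from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: '2020' is mostly missing, '2021' has one gap.
toy = pd.DataFrame({
    "Metrics": ["A", "B", "C", "D"],
    "2021": [1.0, 2.0, np.nan, 4.0],
    "2020": [np.nan, np.nan, np.nan, 5.0],
})

# Fraction of missing values per column (mean of the boolean null mask).
null_share = toy.isnull().mean()

# Assumed cutoff: drop any column with more than half of its values missing.
max_null_share = 0.5
keep = null_share[null_share <= max_null_share].index
cleaned = toy[keep]

print(null_share)
print(list(cleaned.columns))
```

Using `isnull().mean()` instead of `isnull().sum()` gives a proportion, which stays comparable if the number of rows changes.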

The remaining NaNs are replaced with **zero**.

In [161]:
balance_df = balance_df.fillna(0)

Show the dataset after dropping the 2020 column and filling the remaining null values.

In [162]:
balance_df

Unnamed: 0,Metrics,2024,2023,2022,2021
0,Treasury Shares Number,1307313000.0,1228275000.0,1170676000.0,1160785000.0
1,Preferred Shares Number,391850000.0,391850000.0,391850000.0,391850000.0
2,Ordinary Shares Number,2797620000.0,2876659000.0,2934258000.0,2944149000.0
3,Share Issued,4104934000.0,4104934000.0,4104934000.0,4104934000.0
4,Total Debt,454311000000.0,436537000000.0,339892000000.0,354599000000.0
5,Tangible Book Value,260148000000.0,236093000000.0,204069000000.0,202598000000.0
6,Invested Capital,779019000000.0,737011000000.0,604820000000.0,613888000000.0
7,Net Tangible Assets,280198000000.0,263497000000.0,231473000000.0,237436000000.0
8,Common Stock Equity,324708000000.0,300474000000.0,264928000000.0,259289000000.0
9,Preferred Stock Equity,20050000000.0,27404000000.0,27404000000.0,34838000000.0


Transpose the DataFrame so that each year becomes a row and each metric becomes a column.

In [163]:
balance_t = balance_df.set_index('Metrics').T
balance_t

Metrics,Treasury Shares Number,Preferred Shares Number,Ordinary Shares Number,Share Issued,Total Debt,Tangible Book Value,Invested Capital,Net Tangible Assets,Common Stock Equity,Preferred Stock Equity,...,Gross PPE,Other Properties,Properties,Receivables,Other Receivables,Accounts Receivable,Other Short Term Investments,Cash And Cash Equivalents,Cash Financial,Cash Cash Equivalents And Federal Funds Sold
2024,1307313000.0,391850000.0,2797620000.0,4104934000.0,454311000000.0,260148000000.0,779019000000.0,280198000000.0,324708000000.0,20050000000.0,...,32223000000.0,15349000000.0,16874000000.0,101223000000.0,0.0,101223000000.0,396690000000.0,469317000000.0,23372000000.0,764318000000.0
2023,1228275000.0,391850000.0,2876659000.0,4104934000.0,436537000000.0,236093000000.0,737011000000.0,263497000000.0,300474000000.0,27404000000.0,...,30157000000.0,15295000000.0,14862000000.0,107363000000.0,0.0,107363000000.0,192485000000.0,624151000000.0,29066000000.0,900303000000.0
2022,1170676000.0,391850000.0,2934258000.0,4104934000.0,339892000000.0,204069000000.0,604820000000.0,231473000000.0,264928000000.0,27404000000.0,...,27734000000.0,14248000000.0,13486000000.0,125189000000.0,0.0,125189000000.0,196699000000.0,567234000000.0,27697000000.0,882826000000.0
2021,1160785000.0,391850000.0,2944149000.0,4104934000.0,354599000000.0,202598000000.0,613888000000.0,237436000000.0,259289000000.0,34838000000.0,...,0.0,0.0,0.0,102570000000.0,0.0,102570000000.0,290257000000.0,740834000000.0,26438000000.0,1002532000000.0
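Transposing with `set_index(...).T` swaps rows and columns, which is not the same as reshaping into a long table with one row per (metric, year) pair. The sketch below contrasts the two on a hypothetical two-metric frame built from values in the table above; the `melt` call and the `Year`/`Value` column names are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical two-metric subset of the cleaned balance sheet.
wide = pd.DataFrame({
    "Metrics": ["Total Debt", "Share Issued"],
    "2024": [454311000000.0, 4104934000.0],
    "2023": [436537000000.0, 4104934000.0],
})

# Transpose: years become the row index, metrics become the columns.
transposed = wide.set_index("Metrics").T

# Long format: one row per (metric, year) pair.
long = wide.melt(id_vars="Metrics", var_name="Year", value_name="Value")

print(transposed.shape)  # 2 years x 2 metrics
print(long.shape)        # 4 rows x 3 columns
```

The transposed layout suits column arithmetic between metrics, while the long layout tends to be easier to group or plot by year.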


After transposing, the label **Metrics** (the name of the old index, now the name of the column axis) appears in the corner above the year labels. Rename it to **Year** with the *rename_axis* method, passing `axis = 1` to target the column axis.

In [164]:
balance_t = balance_t.rename_axis('Year', axis = 1)
balance_t.columns

Index(['Treasury Shares Number', 'Preferred Shares Number',
       'Ordinary Shares Number', 'Share Issued', 'Total Debt',
       'Tangible Book Value', 'Invested Capital', 'Net Tangible Assets',
       'Common Stock Equity', 'Preferred Stock Equity', 'Total Capitalization',
       'Total Equity Gross Minority Interest', 'Stockholders Equity',
       'Gains Losses Not Affecting Retained Earnings',
       'Other Equity Adjustments', 'Treasury Stock', 'Retained Earnings',
       'Additional Paid In Capital', 'Capital Stock', 'Common Stock',
       'Preferred Stock', 'Total Liabilities Net Minority Interest',
       'Long Term Debt And Capital Lease Obligation', 'Long Term Debt',
       'Current Debt And Capital Lease Obligation', 'Current Debt',
       'Other Current Borrowings', 'Commercial Paper',
       'Payables And Accrued Expenses', 'Payables', 'Other Payable',
       'Accounts Payable', 'Total Assets', 'Investments And Advances',
       'Held To Maturity Securities', 'Available Fo
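To see what `rename_axis` changes, the sketch below applies it to a hypothetical one-column frame along each axis. With `axis=1` (as in the cell above) the name of the column axis changes; with `axis=0` the name of the row index changes instead. The `toy` frame is an assumption for illustration.

```python
import pandas as pd

# Hypothetical transposed frame: years in the index, one metric column.
toy = pd.DataFrame(
    {"Total Debt": [454311000000.0, 436537000000.0]},
    index=["2024", "2023"],
)
toy.columns.name = "Metrics"  # mimics the name carried over by the transpose

# axis=1 renames the column axis (the label shown in the corner).
by_columns = toy.rename_axis("Year", axis=1)

# axis=0 renames the row index instead and leaves the column axis name alone.
by_index = toy.rename_axis("Year", axis=0)

print(by_columns.columns.name, by_columns.index.name)
print(by_index.columns.name, by_index.index.name)
```

Naming the row index (`axis=0`) may be the better fit if the years are later turned into a regular column with `reset_index()`, since the new column would then be called `Year`.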

### [balance_df] Selecting Columns and Performing Operations

Review the columns of the transposed DataFrame.

In [165]:
balance_t.columns

Index(['Treasury Shares Number', 'Preferred Shares Number',
       'Ordinary Shares Number', 'Share Issued', 'Total Debt',
       'Tangible Book Value', 'Invested Capital', 'Net Tangible Assets',
       'Common Stock Equity', 'Preferred Stock Equity', 'Total Capitalization',
       'Total Equity Gross Minority Interest', 'Stockholders Equity',
       'Gains Losses Not Affecting Retained Earnings',
       'Other Equity Adjustments', 'Treasury Stock', 'Retained Earnings',
       'Additional Paid In Capital', 'Capital Stock', 'Common Stock',
       'Preferred Stock', 'Total Liabilities Net Minority Interest',
       'Long Term Debt And Capital Lease Obligation', 'Long Term Debt',
       'Current Debt And Capital Lease Obligation', 'Current Debt',
       'Other Current Borrowings', 'Commercial Paper',
       'Payables And Accrued Expenses', 'Payables', 'Other Payable',
       'Accounts Payable', 'Total Assets', 'Investments And Advances',
       'Held To Maturity Securities', 'Available Fo

Compute the debt-to-asset ratio (total debt divided by total assets, expressed as a percentage), then select the columns involved.

In [176]:
balance_t['Debt to Asset Ratio'] = (balance_t['Total Debt'] / balance_t['Total Assets']) * 100
balance_t[['Total Debt', 'Total Assets', 'Debt to Asset Ratio']]


Year,Total Debt,Total Assets,Debt to Asset Ratio
2024,454311000000.0,4002814000000.0,11.34979
2023,436537000000.0,3875393000000.0,11.264329
2022,339892000000.0,3665743000000.0,9.272118
2021,354599000000.0,3743567000000.0,9.472223
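The calculation above is element-wise column arithmetic. The sketch below reproduces it on a standalone frame built from the four rows shown, rounds the result to two decimals, and reads a single value with `.loc`. The rounding step and the `toy` name are additions for illustration.

```python
import pandas as pd

# Values copied from the output above; index holds the years.
toy = pd.DataFrame(
    {
        "Total Debt": [454311000000.0, 436537000000.0,
                       339892000000.0, 354599000000.0],
        "Total Assets": [4002814000000.0, 3875393000000.0,
                         3665743000000.0, 3743567000000.0],
    },
    index=["2024", "2023", "2022", "2021"],
)

# Element-wise division of two columns, scaled to a percentage and rounded.
toy["Debt to Asset Ratio"] = (toy["Total Debt"] / toy["Total Assets"] * 100).round(2)

# Select several columns with a list of names.
selected = toy[["Total Debt", "Total Assets", "Debt to Asset Ratio"]]

# Read a single cell by row label and column label.
print(selected.loc["2024", "Debt to Asset Ratio"])
```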
