# Python for Finance

My notebook for the Coursera "Python for Finance" course

---

### Must-have Packages for Finance and Data Science
- NumPy: allows us to work with multidimensional arrays
- Pandas (from PANel DAta): enhances NumPy and allows us to organize data in tabular form and to attach descriptive labels to the rows and columns of the table
- Matplotlib: 2D plotting library designed for visualization of NumPy computations

*Sidenote: all three of the above belong to the SciPy ecosystem, a family of open-source packages for scientific computing used in mathematics, machine learning, artificial intelligence, and engineering.*

- math: mathematical functions
- random: pseudo-random number generators
- statsmodels: descriptive statistics, plotting functions, regressions, etc.

---

### Arrays
The difference between arrays and lists is that the elements of an array all share one data type, whereas a list can contain elements of different types. A NumPy n-dimensional array is called an ndarray. An ndarray is always homogeneous: every element has the same type, and every row has the same number of elements.

- A 1D array is called a vector
- A 2D array is called a matrix

In [2]:
import numpy as np

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])
a

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [4]:
a.shape

(2, 4)

In [6]:
a[1, 3]

np.int64(7)

In [7]:
a[1, 2] = 8
a

array([[0, 1, 2, 3],
       [4, 5, 8, 7]])

In [9]:
a[1]

array([4, 5, 8, 7])
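
To make the "same data type" point concrete, here is a small sketch of my own (not from the course). The variable names are just for illustration. It shows NumPy converting mixed numbers to one common dtype, while a plain list keeps each element's own type, and it checks the dimensions of a vector versus a matrix.

```python
import numpy as np

# Mixing ints and a float: NumPy upcasts everything to one common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# A plain list keeps each element's own type.
items = [1, "two", 3.5]
print([type(x).__name__ for x in items])  # ['int', 'str', 'float']

# ndim tells us whether we have a vector (1D) or a matrix (2D).
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)
```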

---

### Generating Random Numbers



In [19]:
import random

probability = random.random() # generates a random float in [0, 1): 0 included, 1 excluded
probability

0.6462656111873746

In [32]:
# pretend we needed a dice roll
roll = random.randint(1, 6) # generates a random integer in [1, 6], both endpoints included
roll

2

In [33]:
# similarly we can generate random arrays with numpy
import numpy as np
np.random.randint(1, 6, (3, 3)) # unlike random.randint, the upper bound is excluded, so values are 1-5; the third parameter is the output shape of the matrix

array([[3, 5, 5],
       [5, 2, 1],
       [1, 4, 2]])
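
One thing the cells above don't show is reproducibility. The sketch below is my own addition: it seeds both generators so the same "random" numbers come back on every run, and it uses NumPy's newer `default_rng` interface. The seed value 42 is arbitrary.

```python
import random
import numpy as np

# Seeding the standard-library generator makes its output repeatable.
random.seed(42)
first = random.randint(1, 6)
random.seed(42)
second = random.randint(1, 6)
print(first == second)  # True

# NumPy's Generator API: the upper bound is exclusive, so 7 gives rolls of 1..6.
rng = np.random.default_rng(seed=42)
rolls = rng.integers(1, 7, size=(3, 3))
print(rolls.shape)  # (3, 3)
```

Seeding is mainly useful when a simulation has to be rerun and compared against an earlier result.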

---

### Sources of Financial Data

We get financial data from either a web server or our computer:

#### Web server
- Requires internet connection
- Provides up-to-date data
- Financial Data APIs = online financial data sources
    - IEX
    - Morningstar
    - Alpha Vantage
    - Quandl
    - etc.
- pandas-datareader helps retrieve data from these sources and prepare the data for analysis
- APIs are prone to breaking down for unknown periods of time
- Certain APIs may contain only part of the data you need for a proper financial analysis
    - For instance, a source may offer data on multiple stocks but lack foreign stocks or market indices

#### Computer
- You can also use data from files on your computer, such as a .csv file
- No internet connection required, but the data is probably not up to date
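
A minimal sketch of the local-file route, assuming a CSV with a `Date` column. The in-memory string below stands in for a file on disk (with a real file you would pass its path to `pd.read_csv` instead), and the three rows are copied from the PG output later in these notes.

```python
import io
import pandas as pd

# Stand-in for a .csv file on disk; the layout is an assumption for illustration.
csv_text = """Date,Close,Volume
1995-01-03,15.59375,3318400
1995-01-04,15.46875,2218800
1995-01-05,15.25,2319600
"""

# index_col + parse_dates turn the Date column into a DatetimeIndex.
prices = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(type(prices.index).__name__)
print(prices.loc[pd.Timestamp("1995-01-04"), "Close"])
```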

---

### Importing and Organizing Data

In [34]:
import numpy as np
import pandas as pd

series = pd.Series(np.random.random(5), name = "Column 01")
series

0    0.971956
1    0.824798
2    0.433181
3    0.581124
4    0.399774
Name: Column 01, dtype: float64

In [35]:
# a Series behaves like a dictionary: look up a value by its index label
series[2]

np.float64(0.4331812135216766)
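
The dictionary comparison is easier to see with string labels instead of the default 0, 1, 2, ... index. This is my own sketch, and the prices in it are made-up numbers, not real quotes.

```python
import pandas as pd

# Made-up values, purely for illustration; the dict keys become the index labels.
closes = pd.Series({"PG": 100.0, "MSFT": 200.0, "GE": 300.0}, name="Close")

print(closes["MSFT"])      # lookup by label, like a dict key
print("PG" in closes)      # membership is checked against the labels
print(closes.get("XYZ"))   # like dict.get, returns None for a missing label
print(closes.iloc[0])      # unlike a dict, positional access also works
```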

In [45]:
from pandas_datareader import data as web
pg = web.DataReader('PG', data_source='yahoo', start='1995-1-1')
pg

RemoteDataError: Unable to read URL: https://finance.yahoo.com/quote/PG/history?period1=788950800&period2=1750319999&interval=1d&frequency=1d&filter=history
Response Text:
b'<html><meta charset=\'utf-8\'>\n<script>\nif(window != window.top){\ndocument.write(\'<p>Content is currently unavailable.</p><img src="//geo.yahoo.com/p?s=1197757039&t=\'\n    + new Date().getTime() + \'&_R=\'\n    + encodeURIComponent(document.referrer)\n    + \'&err=404\'\n    + \'" width="0px" height="0px"/>\');\n}else{\nwindow.location.replace(\'https://\' + window.location.host + \'/?err=404\');\n}\n</script>\n<noscript>\n<META http-equiv="refresh" content="0;URL=https://finance.yahoo.com/?err=404">\n</noscript></html>'

Welp. That doesn't work anymore. I'm not sure how to follow along if the course keeps using pandas_datareader, but I at least want the financial domain knowledge, so I'm going to keep taking notes on the concepts even if I have to figure out the code myself.

In [48]:
import yfinance as yf
pg = yf.download('PG', start='1995-1-1', auto_adjust=False) # auto_adjust is off to ensure we get the Adj Close as a separate column
pg

[*********************100%***********************]  1 of 1 completed


Price,Adj Close,Close,High,Low,Open,Volume
Ticker,PG,PG,PG,PG,PG,PG
Date,,,,,,
1995-01-03,7.350254,15.593750,15.625000,15.437500,15.468750,3318400
1995-01-04,7.291330,15.468750,15.656250,15.312500,15.531250,2218800
1995-01-05,7.188221,15.250000,15.437500,15.218750,15.375000,2319600
1995-01-06,7.202951,15.281250,15.406250,15.156250,15.156250,3438000
1995-01-09,7.173489,15.218750,15.406250,15.187500,15.343750,1795200
...,...,...,...,...,...,...
2025-06-12,163.179993,163.179993,163.309998,161.679993,161.970001,6509900
2025-06-13,160.279999,160.279999,163.029999,159.910004,162.759995,7104500
2025-06-16,160.880005,160.880005,161.949997,160.009995,160.880005,6332000
2025-06-17,158.520004,158.520004,160.369995,158.309998,160.059998,6792500


Found a workaround using yfinance instead of pandas_datareader 💪🏼

In [49]:
pg.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7667 entries, 1995-01-03 to 2025-06-18
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   (Adj Close, PG)  7667 non-null   float64
 1   (Close, PG)      7667 non-null   float64
 2   (High, PG)       7667 non-null   float64
 3   (Low, PG)        7667 non-null   float64
 4   (Open, PG)       7667 non-null   float64
 5   (Volume, PG)     7667 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 419.3 KB
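
The `info()` output above labels every column with a (price field, ticker) pair, which means the columns are a two-level MultiIndex. Here is a sketch of how such a frame can be sliced, using a tiny hand-built frame (two rows copied from the PG output) so it runs without a download. The level names `Price` and `Ticker` match what the output shows.

```python
import pandas as pd

# Hand-built stand-in for the downloaded frame: two price fields, one ticker.
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close"], ["PG"]], names=["Price", "Ticker"]
)
frame = pd.DataFrame(
    [[7.350254, 15.59375], [7.29133, 15.46875]],
    index=pd.to_datetime(["1995-01-03", "1995-01-04"]),
    columns=columns,
)
frame.index.name = "Date"

# Selecting on the first level leaves a DataFrame keyed by ticker.
adj = frame["Adj Close"]

# A full (field, ticker) tuple selects a single column as a Series.
adj_series = frame[("Adj Close", "PG")]

# With only one ticker, the Ticker level can be dropped entirely.
flat = frame.droplevel("Ticker", axis=1)
print(list(flat.columns))  # ['Adj Close', 'Close']
```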


In [50]:
pg.head()

Price,Adj Close,Close,High,Low,Open,Volume
Ticker,PG,PG,PG,PG,PG,PG
Date,,,,,,
1995-01-03,7.350254,15.59375,15.625,15.4375,15.46875,3318400
1995-01-04,7.29133,15.46875,15.65625,15.3125,15.53125,2218800
1995-01-05,7.188221,15.25,15.4375,15.21875,15.375,2319600
1995-01-06,7.202951,15.28125,15.40625,15.15625,15.15625,3438000
1995-01-09,7.173489,15.21875,15.40625,15.1875,15.34375,1795200


In [51]:
pg.tail()

Price,Adj Close,Close,High,Low,Open,Volume
Ticker,PG,PG,PG,PG,PG,PG
Date,,,,,,
2025-06-12,163.179993,163.179993,163.309998,161.679993,161.970001,6509900
2025-06-13,160.279999,160.279999,163.029999,159.910004,162.759995,7104500
2025-06-16,160.880005,160.880005,161.949997,160.009995,160.880005,6332000
2025-06-17,158.520004,158.520004,160.369995,158.309998,160.059998,6792500
2025-06-18,158.304993,158.304993,158.649994,157.910004,158.550003,2380325


In [52]:
pg.head(20)

Price,Adj Close,Close,High,Low,Open,Volume
Ticker,PG,PG,PG,PG,PG,PG
Date,,,,,,
1995-01-03,7.350254,15.59375,15.625,15.4375,15.46875,3318400
1995-01-04,7.29133,15.46875,15.65625,15.3125,15.53125,2218800
1995-01-05,7.188221,15.25,15.4375,15.21875,15.375,2319600
1995-01-06,7.202951,15.28125,15.40625,15.15625,15.15625,3438000
1995-01-09,7.173489,15.21875,15.40625,15.1875,15.34375,1795200
1995-01-10,7.26187,15.40625,15.4375,15.1875,15.28125,4364000
1995-01-11,7.24714,15.375,15.59375,15.375,15.59375,3738400
1995-01-12,7.320792,15.53125,15.53125,15.3125,15.375,3307600
1995-01-13,7.406708,15.625,15.84375,15.53125,15.59375,3992800
1995-01-16,7.465963,15.75,15.96875,15.75,15.90625,3677200


In [53]:
pg.tail(20)

Price,Adj Close,Close,High,Low,Open,Volume
Ticker,PG,PG,PG,PG,PG,PG
Date,,,,,,
2025-05-21,165.429993,165.429993,166.369995,164.619995,164.820007,6424200
2025-05-22,165.029999,165.029999,166.199997,164.020004,164.440002,6245600
2025-05-23,165.860001,165.860001,166.220001,163.479996,164.830002,5349800
2025-05-27,167.759995,167.759995,167.979996,165.270004,165.270004,11423800
2025-05-28,167.360001,167.360001,168.839996,167.059998,167.649994,5385900
2025-05-29,168.559998,168.559998,168.990005,166.440002,166.740005,4842500
2025-05-30,169.889999,169.889999,170.990005,168.600006,168.75,12587500
2025-06-02,167.779999,167.779999,169.039993,166.229996,169.020004,7574400
2025-06-03,166.850006,166.850006,167.410004,165.899994,166.779999,6223000
2025-06-04,165.949997,165.949997,168.050003,165.919998,166.690002,4939000


In [57]:
# How we could get a single dataframe of multiple tickers' adj close prices
tickers = ['PG', 'MSFT', 'T', 'F', 'GE']

new_data = pd.DataFrame()

for t in tickers:
    new_data[t] = yf.download(t, start='1995-1-1', auto_adjust=False)['Adj Close']

new_data.tail()

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


Date,PG,MSFT,T,F,GE
2025-06-12,163.179993,478.869995,28.27,10.53,239.990005
2025-06-13,160.279999,474.959991,28.190001,10.43,236.600006
2025-06-16,160.880005,479.140015,27.969999,10.62,236.539993
2025-06-17,158.520004,478.040009,27.65,10.42,235.75
2025-06-18,158.160004,479.565002,27.674999,10.4299,236.610001
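
A related sketch of my own: when per-ticker series don't cover the same dates, `pd.concat` along the columns aligns them on the index and fills the gaps with NaN. The two series below are made up (`NEW` is a hypothetical ticker that starts a day later) so the example runs offline.

```python
import pandas as pd

# Made-up price series; "NEW" is a hypothetical ticker with a shorter history.
pg_prices = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2025-06-16", "2025-06-17", "2025-06-18"]),
)
new_prices = pd.Series(
    [10.0, 11.0],
    index=pd.to_datetime(["2025-06-17", "2025-06-18"]),
)

# The dict keys become column names; rows are aligned on the union of the dates.
combined = pd.concat({"PG": pg_prices, "NEW": new_prices}, axis=1)
print(combined)

# The date on which NEW has no data shows up as NaN.
print(int(combined["NEW"].isna().sum()))  # 1
```

As far as I know, `yf.download` also accepts a list of tickers in one call, which would avoid the loop, but I haven't compared the two approaches here.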


---

This is how you could once get data from Quandl, but it no longer works without an API key.

In [59]:
import quandl

mydata_01 = quandl.get('FRED/GDP')
mydata_01.head()

QuandlError: (Status 403) Something went wrong. Please try again. If you continue to have problems, please contact us at connect@quandl.com.