# Python for Finance

My notebook for the Coursera "Python for Finance" course

---

### Must-have Packages for Finance and Data Science
- NumPy: allows us to work with multidimensional arrays
- Pandas (from PANel DAta): enhances NumPy and allows us to organize data in tabular form and to attach descriptive labels to the rows and columns of the table
- Matplotlib: 2D plotting library designed for visualization of NumPy computations

*Sidenote: All three of the above are core parts of the SciPy ecosystem, a collection of open-source libraries for scientific computing used in mathematics, machine learning, artificial intelligence, and engineering.*

- math: basic mathematical functions
- random: pseudo-random number generators
- statsmodels: descriptive statistics, plotting functions, regressions, etc.
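To make the NumPy vs. pandas distinction concrete, here is a small sketch showing the same values as a bare NumPy array and as a labeled pandas DataFrame. The numbers and the "Stock A"/"Stock B"/"Day 1"/"Day 2" labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values only: a 2x2 NumPy array has no row/column names
values = np.array([[10.0, 10.5], [20.0, 19.5]])

# pandas wraps the same values with descriptive row and column labels
df = pd.DataFrame(values, index=["Stock A", "Stock B"], columns=["Day 1", "Day 2"])
print(df)

# The same element, fetched by label in pandas and by position in NumPy
print(df.loc["Stock A", "Day 2"])  # 10.5
print(values[0, 1])                # 10.5
```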

---

### Arrays
The difference between arrays and lists is that all elements of an array are of the same data type, whereas a list can contain elements of different types. NumPy's n-dimensional array is called an ndarray. An ndarray is always homogeneous: all of its elements have the same type and take up the same amount of memory.

- A 1D array is called a vector
- A 2D array is called a matrix
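A quick sketch of the list vs. ndarray difference: a list keeps each element's own type, while `np.array` coerces everything to one dtype. The `ndim` attribute gives the number of dimensions (1 for a vector, 2 for a matrix).

```python
import numpy as np

# A list keeps each element's own type
mixed = [1, "two", 3.0]
print([type(x).__name__ for x in mixed])  # ['int', 'str', 'float']

# An ndarray forces one dtype on all elements
v = np.array([1, 2, 3.5])      # the ints are upcast to float
print(v.dtype, v.ndim)         # float64 1  (a vector)

s = np.array([1, "two"])       # mixing numbers and text: everything becomes a string
print(s.dtype.kind)            # U  (Unicode string)

m = np.array([[1, 2], [3, 4]])
print(m.ndim, m.shape)         # 2 (2, 2)  (a matrix)
```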

In [1]:
import numpy as np

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])
a

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [2]:
a.shape

(2, 4)

In [3]:
a[1, 3]

np.int64(7)

In [4]:
a[1, 2] = 8
a

array([[0, 1, 2, 3],
       [4, 5, 8, 7]])

In [5]:
a[1]

array([4, 5, 8, 7])
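Building on the indexing cells above, a short sketch of a few more ways to pick elements out of the same 2x4 array. It recreates `a` exactly as in cell 1, so it does not include the `a[1, 2] = 8` change.

```python
import numpy as np

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])

# a[row, col] picks one element; a[row] picks a whole row
print(a[1, 3])      # 7
print(a[1])         # [4 5 6 7]

# ":" on the row axis selects every row, so a[:, 2] is the third column
print(a[:, 2])      # [2 6]

# Negative indices count from the end
print(a[-1, -1])    # 7
```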

---

### Generating Random Numbers



In [6]:
import random

probability = random.random() # generates a random float in [0.0, 1.0): 0 included, 1 excluded
probability

0.06636635636591837

In [7]:
# pretend we needed a die roll
roll = random.randint(1, 6) # generates a random integer in the interval [1, 6], both endpoints included
roll

2

In [8]:
# similarly, we can generate random arrays with numpy
import numpy as np
# unlike random.randint, numpy's upper bound is exclusive, so 7 gives values 1-6
np.random.randint(1, 7, (3, 3)) # the third parameter is the output shape of the matrix

array([[2, 3, 5],
       [3, 2, 5],
       [3, 4, 3]])
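One gotcha worth noting: `random.randint(a, b)` includes both endpoints, but NumPy's integer generators exclude the upper bound, so a 1 to 6 die roll needs `high=7`. The sketch below also seeds the generators so results are reproducible. It uses NumPy's newer `np.random.default_rng` Generator API (swapped in here; the cell above uses the legacy `np.random.randint`), and the seed value 42 is arbitrary.

```python
import random
import numpy as np

# Standard library: seed, then roll one die (1 and 6 are both possible)
random.seed(42)
roll = random.randint(1, 6)

# NumPy Generator: high is exclusive, so 7 gives values 1..6
rng = np.random.default_rng(seed=42)
rolls = rng.integers(1, 7, size=(3, 3))
print(rolls)

# Recreating a generator with the same seed reproduces the same draws
again = np.random.default_rng(seed=42).integers(1, 7, size=(3, 3))
print(np.array_equal(rolls, again))  # True
```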

---

### Sources of Financial Data

We get financial data from either a web server or our computer:

#### Web server
- Requires internet connection
- Provides up-to-date data
- Financial Data APIs = online financial data sources
    - IEX
    - Morningstar
    - Alpha Vantage
    - Quandl
    - etc.
- pandas-datareader helps retrieve data from these sources and prepare the data for analysis
- APIs are prone to breaking down for unknown periods of time
- Certain APIs may contain only part of the data you need for a proper financial analysis
    - For instance, a source may offer data for multiple stocks but lack foreign stocks or market indices

#### Computer
- You can also use data from files on your computer such as a .csv file
- No internet connection is required, but the data is probably not up to date
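A minimal sketch of loading local price data with `pd.read_csv`. To keep it self-contained, `io.StringIO` stands in for a real file and the prices are made up; with an actual file you would pass its path instead (e.g. a hypothetical `PG.csv`).

```python
import io
import pandas as pd

# Stand-in for a file on disk (made-up prices)
csv_text = """Date,Close
2025-01-02,10.0
2025-01-03,10.5
2025-01-06,10.25
"""

prices = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Date",   # use the Date column as the row labels
    parse_dates=True,   # turn those labels into a DatetimeIndex
)
print(prices)
print(type(prices.index).__name__)  # DatetimeIndex
```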

---

### Importing and Organizing Data

In [9]:
import numpy as np
import pandas as pd

series = pd.Series(np.random.random(5), name = "Column 01")
series

|   | Column 01 |
|---|---|
| 0 | 0.661661 |
| 1 | 0.648877 |
| 2 | 0.409527 |
| 3 | 0.322058 |
| 4 | 0.95888 |


In [10]:
# a Series is like a dictionary: values are looked up by their index label
series[2]

np.float64(0.40952686545794525)
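The dictionary analogy holds for lookups by label: the index plays the role of the keys. A sketch with made-up labels and values (note that `in` checks the labels, not the values). With the default integer index in cell 9, label 2 and position 2 happen to coincide, which is why `series[2]` works there; `.iloc` makes positional access explicit.

```python
import pandas as pd

# Hypothetical labels and values, just to show label-based access
s = pd.Series([0.66, 0.65, 0.41], index=["a", "b", "c"], name="Column 01")

print(s["b"])          # 0.65   lookup by label, like a dict key
print("c" in s)        # True   `in` checks the index (the "keys")
print(0.41 in s)       # False  `in` does NOT check the values
print(s.iloc[0])       # 0.66   lookup by position
print(s.get("z", 0))   # 0      default for a missing label, like dict.get
```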

In [13]:
# from pandas_datareader import data as web
# pg = web.DataReader('PG', data_source='yahoo', start='1995-1-1')
# pg

Welp. That doesn't work anymore. I'm not exactly sure how to follow along if the course keeps using pandas_datareader, but I still want the financial domain knowledge, so I'll keep taking notes on the concepts even if I have to figure out the code myself.

In [12]:
import yfinance as yf
# auto_adjust=False keeps 'Adj Close' as a separate column
# (with auto_adjust=True, the OHLC prices are adjusted and no 'Adj Close' column is returned)
pg = yf.download('PG', start='1995-1-1', auto_adjust=False)
pg


| Date | Adj Close (PG) | Close (PG) | High (PG) | Low (PG) | Open (PG) | Volume (PG) |
|---|---|---|---|---|---|---|
| 1995-01-03 | 7.350248 | 15.593750 | 15.625000 | 15.437500 | 15.468750 | 3318400 |
| 1995-01-04 | 7.291329 | 15.468750 | 15.656250 | 15.312500 | 15.531250 | 2218800 |
| 1995-01-05 | 7.188221 | 15.250000 | 15.437500 | 15.218750 | 15.375000 | 2319600 |
| 1995-01-06 | 7.202950 | 15.281250 | 15.406250 | 15.156250 | 15.156250 | 3438000 |
| 1995-01-09 | 7.173495 | 15.218750 | 15.406250 | 15.187500 | 15.343750 | 1795200 |
| ... | ... | ... | ... | ... | ... | ... |
| 2025-06-24 | 160.360001 | 160.360001 | 161.740005 | 159.649994 | 161.179993 | 7470100 |
| 2025-06-25 | 158.970001 | 158.970001 | 160.339996 | 158.710007 | 159.820007 | 5933700 |
| 2025-06-26 | 158.630005 | 158.630005 | 159.559998 | 157.169998 | 159.550003 | 7410400 |
| 2025-06-27 | 159.860001 | 159.860001 | 160.130005 | 158.289993 | 158.679993 | 19256200 |


Found a workaround using yfinance instead of pandas_datareader 💪🏼
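The yfinance output has two column levels, `Price` and `Ticker`, which is where the `(Adj Close, PG)` style column names in `pg.info()` come from. A sketch of flattening that layout with pandas' `droplevel` or `xs`, run on a small made-up frame that mimics the same structure (assumption: the real `pg` frame uses the same level names, as its header suggests):

```python
import pandas as pd

# Mimic yfinance's two-level columns: ("Price", "Ticker"); values are made up
cols = pd.MultiIndex.from_product(
    [["Adj Close", "Close", "Volume"], ["PG"]], names=["Price", "Ticker"]
)
index = pd.DatetimeIndex(["2025-06-26", "2025-06-27"], name="Date")
raw = pd.DataFrame([[1.0, 2.0, 100], [1.5, 2.5, 200]], index=index, columns=cols)

# Option 1: drop the "Ticker" level (fine when only one ticker was downloaded)
flat = raw.droplevel("Ticker", axis=1)
print(flat.columns.tolist())      # ['Adj Close', 'Close', 'Volume']

# Option 2: take one ticker's cross-section from the "Ticker" level
pg_only = raw.xs("PG", axis=1, level="Ticker")
print(pg_only.columns.tolist())   # ['Adj Close', 'Close', 'Volume']
```

On the real data, the same call would presumably be `pg.droplevel("Ticker", axis=1)`.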

In [14]:
pg.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7674 entries, 1995-01-03 to 2025-06-30
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   (Adj Close, PG)  7674 non-null   float64
 1   (Close, PG)      7674 non-null   float64
 2   (High, PG)       7674 non-null   float64
 3   (Low, PG)        7674 non-null   float64
 4   (Open, PG)       7674 non-null   float64
 5   (Volume, PG)     7674 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 419.7 KB


In [15]:
pg.head()

| Date | Adj Close (PG) | Close (PG) | High (PG) | Low (PG) | Open (PG) | Volume (PG) |
|---|---|---|---|---|---|---|
| 1995-01-03 | 7.350248 | 15.59375 | 15.625 | 15.4375 | 15.46875 | 3318400 |
| 1995-01-04 | 7.291329 | 15.46875 | 15.65625 | 15.3125 | 15.53125 | 2218800 |
| 1995-01-05 | 7.188221 | 15.25 | 15.4375 | 15.21875 | 15.375 | 2319600 |
| 1995-01-06 | 7.20295 | 15.28125 | 15.40625 | 15.15625 | 15.15625 | 3438000 |
| 1995-01-09 | 7.173495 | 15.21875 | 15.40625 | 15.1875 | 15.34375 | 1795200 |


In [16]:
pg.tail()

| Date | Adj Close (PG) | Close (PG) | High (PG) | Low (PG) | Open (PG) | Volume (PG) |
|---|---|---|---|---|---|---|
| 2025-06-24 | 160.360001 | 160.360001 | 161.740005 | 159.649994 | 161.179993 | 7470100 |
| 2025-06-25 | 158.970001 | 158.970001 | 160.339996 | 158.710007 | 159.820007 | 5933700 |
| 2025-06-26 | 158.630005 | 158.630005 | 159.559998 | 157.169998 | 159.550003 | 7410400 |
| 2025-06-27 | 159.860001 | 159.860001 | 160.130005 | 158.289993 | 158.679993 | 19256200 |
| 2025-06-30 | 158.710007 | 158.710007 | 159.899994 | 158.304993 | 159.259995 | 3292041 |


In [17]:
pg.head(20)

| Date | Adj Close (PG) | Close (PG) | High (PG) | Low (PG) | Open (PG) | Volume (PG) |
|---|---|---|---|---|---|---|
| 1995-01-03 | 7.350248 | 15.59375 | 15.625 | 15.4375 | 15.46875 | 3318400 |
| 1995-01-04 | 7.291329 | 15.46875 | 15.65625 | 15.3125 | 15.53125 | 2218800 |
| 1995-01-05 | 7.188221 | 15.25 | 15.4375 | 15.21875 | 15.375 | 2319600 |
| 1995-01-06 | 7.20295 | 15.28125 | 15.40625 | 15.15625 | 15.15625 | 3438000 |
| 1995-01-09 | 7.173495 | 15.21875 | 15.40625 | 15.1875 | 15.34375 | 1795200 |
| 1995-01-10 | 7.261868 | 15.40625 | 15.4375 | 15.1875 | 15.28125 | 4364000 |
| 1995-01-11 | 7.247137 | 15.375 | 15.59375 | 15.375 | 15.59375 | 3738400 |
| 1995-01-12 | 7.320789 | 15.53125 | 15.53125 | 15.3125 | 15.375 | 3307600 |
| 1995-01-13 | 7.406711 | 15.625 | 15.84375 | 15.53125 | 15.59375 | 3992800 |
| 1995-01-16 | 7.46596 | 15.75 | 15.96875 | 15.75 | 15.90625 | 3677200 |
| ... | ... | ... | ... | ... | ... | ... |


In [18]:
pg.tail(20)

| Date | Adj Close (PG) | Close (PG) | High (PG) | Low (PG) | Open (PG) | Volume (PG) |
|---|---|---|---|---|---|---|
| 2025-06-02 | 167.779999 | 167.779999 | 169.039993 | 166.229996 | 169.020004 | 7574400 |
| 2025-06-03 | 166.850006 | 166.850006 | 167.410004 | 165.899994 | 166.779999 | 6223000 |
| 2025-06-04 | 165.949997 | 165.949997 | 168.050003 | 165.919998 | 166.690002 | 4939000 |
| 2025-06-05 | 162.800003 | 162.800003 | 165.440002 | 162.509995 | 165.429993 | 10351800 |
| 2025-06-06 | 164.020004 | 164.020004 | 165.240005 | 163.279999 | 163.289993 | 5697700 |
| 2025-06-09 | 162.559998 | 162.559998 | 164.020004 | 162.070007 | 163.389999 | 6544200 |
| 2025-06-10 | 162.839996 | 162.839996 | 163.509995 | 161.919998 | 162.690002 | 7680000 |
| 2025-06-11 | 162.110001 | 162.110001 | 162.770004 | 161.690002 | 162.630005 | 5960100 |
| 2025-06-12 | 163.179993 | 163.179993 | 163.309998 | 161.679993 | 161.970001 | 6509900 |
| 2025-06-13 | 160.279999 | 160.279999 | 163.029999 | 159.910004 | 162.759995 | 7104500 |
| ... | ... | ... | ... | ... | ... | ... |


In [19]:
# How we could get a single dataframe of multiple tickers' adj close prices
tickers = ['PG', 'MSFT', 'T', 'F', 'GE']

new_data = pd.DataFrame()

for t in tickers:
    new_data[t] = yf.download(t, start='1995-1-1', auto_adjust=False)['Adj Close']

new_data.tail()


| Date | PG | MSFT | T | F | GE |
|---|---|---|---|---|---|
| 2025-06-24 | 160.360001 | 490.109985 | 28.280001 | 10.73 | 248.75 |
| 2025-06-25 | 158.970001 | 492.269989 | 27.92 | 10.48 | 249.899994 |
| 2025-06-26 | 158.630005 | 497.450012 | 28.0 | 10.63 | 251.0 |
| 2025-06-27 | 159.860001 | 495.940002 | 28.08 | 10.8 | 254.509995 |
| 2025-06-30 | 158.740005 | 498.290009 | 28.594999 | 10.75 | 257.589996 |
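One subtlety in the loop above: `new_data[t] = ...` aligns every later ticker to the dates of the first ticker assigned. That is harmless if every ticker covers those dates, but any date the first ticker lacks is silently dropped. A sketch with two hypothetical tickers ("OLD" and "NEW", made-up prices) comparing that with `pd.concat(..., axis=1)`, which keeps the union of all dates:

```python
import pandas as pd

# Two hypothetical tickers whose histories start on different dates
old = pd.Series([1.0, 2.0, 3.0],
                index=pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-06"]))
new = pd.Series([10.0, 11.0],
                index=pd.to_datetime(["2025-01-03", "2025-01-06"]))

# Loop-style assignment: the first series fixes the index,
# later series are aligned to it (their missing dates become NaN)
looped = pd.DataFrame()
looped["OLD"] = old
looped["NEW"] = new
print(looped)

# If the newer ticker went first, the older ticker's extra date would be lost
reversed_order = pd.DataFrame()
reversed_order["NEW"] = new
reversed_order["OLD"] = old
print(len(reversed_order))   # 2, the 2025-01-02 row is gone

# pd.concat(axis=1) takes the union of all dates instead
combined = pd.concat({"OLD": old, "NEW": new}, axis=1)
print(len(combined))         # 3
```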


---

The following is how you could get data from Quandl back in the day, but it no longer works without an API key:

In [21]:
# import quandl

# mydata_01 = quandl.get('FRED/GDP')
# mydata_01.head()