# Key Takeaways

We will typically download data from the internet using a handful of Python packages.
This approach avoids manually downloading and importing data files from multiple websites.
This notebook provides a tutorial on how to use the yfinance, pandas-datareader, and requests-cache packages to download data.
For completeness, this tutorial also covers saving to and reading from .csv and .pkl files, which are easier to share.

***The key takeaways from this notebook are:***

1. Downloading data with the yfinance and pandas-datareader packages
1. Saving and sharing data with .csv and .pkl files (comma-separated value and pickle)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [None]:
plt.rcParams['figure.dpi'] = 150
np.set_printoptions(precision=4, suppress=True)
pd.options.display.float_format = '{:.4f}'.format

# The yfinance Package

The [yfinance package](https://github.com/ranaroussi/yfinance) provides "a reliable, threaded, and Pythonic way to download historical market data from Yahoo! finance."
Other packages provide similar functionality, but yfinance was the best option when I last searched for alternatives in September 2021.
To avoid repeated calls to Yahoo! Finance's application programming interface (API), we will use the requests-cache package.
We can install these packages with the `%pip` magic by running the following cell once per Anaconda installation or DataCamp Workspace:

In [None]:
# %pip install yfinance requests-cache

In [None]:
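import yfinance as yf
import requests_cache
from datetime import timedelta

# Cache responses for one day to avoid repeated calls to the Yahoo! Finance API
session = requests_cache.CachedSession(expire_after=timedelta(days=1))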

We can download data for the FAANG stocks (Facebook, Amazon, Apple, Netflix, and Google).
We can pass tickers as either a space-delimited string or a list of strings.

In [None]:
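# auto_adjust=False keeps the 'Adj Close' column, which newer yfinance versions omit by default
# Facebook's ticker changed from FB to META in 2022
faang = yf.download(tickers='FB AMZN AAPL NFLX GOOG', auto_adjust=False, session=session)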

We can plot the value of 100 dollars invested in each of the stocks at the close on the last day of 2020.
We can chain the following steps:

1. Calculate returns
1. Subset to 2021 and beyond
1. Compound returns
1. Multiply by 100

Then plot.
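Note that I subtract a business-month-end offset from the first date in the index.
Because that first date (the first trading day of 2021) is not itself a month end, the offset rolls back to the last business day of 2020, which is the date of the previous close.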
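A quick check of this offset arithmetic, using hard-coded dates rather than downloaded data (assuming 2021-01-04 is the first trading day of 2021), shows the rollback and two caveats:

```python
import pandas as pd

first_2021 = pd.Timestamp('2021-01-04')  # first trading day of 2021 (a Monday)

# Not on a month end, so BusinessMonthEnd rolls back to the last business day of December 2020
prev_close = first_2021 - pd.offsets.BusinessMonthEnd()

# A date already on a business month end moves back a full month instead
prev_from_month_end = pd.Timestamp('2020-12-31') - pd.offsets.BusinessMonthEnd()

# BusinessDay skips weekends but not exchange holidays, so it lands on New Year's Day
prev_bday = first_2021 - pd.offsets.BusinessDay()

print(prev_close.date(), prev_from_month_end.date(), prev_bday.date())
```

The second result is the main caveat: if the first date in the index were itself a business month end, this trick would report a date one month too early.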

In [None]:
# Value of $100 invested at the 2020 close: returns, 2021 onward, compound, scale by 100
value = faang['Adj Close'].pct_change().loc['2021':].add(1).cumprod().mul(100)
value.plot()
buy_date = (value.index[0] - pd.offsets.BusinessMonthEnd()).strftime('%B %d, %Y')
plt.title('Value of $100 Invested in FAANG Stocks at Close on ' + buy_date)
plt.xlabel('Date')
plt.ylabel('Value ($)')
plt.show()
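To see what each link in the chain does, here is the same chain applied to made-up prices for two hypothetical tickers, `AAA` and `BBB` (synthetic numbers, not market data).
It also shows why `.pct_change()` comes before `.loc['2021':]`: the first 2021 return needs the 2020 closing price.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices for two hypothetical tickers (not real market data)
dates = pd.to_datetime(['2020-12-30', '2020-12-31', '2021-01-04', '2021-01-05', '2021-01-06'])
prices = pd.DataFrame(
    {'AAA': [98.0, 100.0, 102.0, 99.0, 105.0],
     'BBB': [49.0, 50.0, 49.0, 51.0, 52.5]},
    index=dates,
)

# Returns first, then subset: the 2021-01-04 return uses the 2020-12-31 close
value = prices.pct_change().loc['2021':].add(1).cumprod().mul(100)

# Subsetting first drops the 2020-12-31 close, so the first 2021 return is NaN
subset_first = prices.loc['2021':].pct_change()

print(value.round(4))
print(subset_first.iloc[0].isna().all())
```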

---

***Practice:***
Can we make the plot above without compounding returns?

---

The easiest and most universal way to save data is to write a .csv file with the `.to_csv()` method.
Note that I save notebooks and data to folders named Notebooks and Data that are at the same level.

In [None]:
faang.to_csv('../../Data/FAANG.csv')

With a single-level column index, `.to_csv()` works well.
However, the column multi-index of this data frame makes reading this .csv file back in tricky.

In [None]:
pd.read_csv('../../Data/FAANG.csv')
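One way to read such a file back, sketched here on small synthetic data with hypothetical tickers `AAA` and `BBB`, is to tell `pd.read_csv()` which rows hold the column levels (`header=[0, 1]`) and which column holds the dates (`index_col=0`):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic wide data shaped like yf.download() output: (Variable, Ticker) columns
dates = pd.DatetimeIndex(['2021-01-04', '2021-01-05', '2021-01-06'], name='Date')
columns = pd.MultiIndex.from_product(
    [['Adj Close', 'Volume'], ['AAA', 'BBB']], names=['Variable', 'Ticker']
)
wide = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'wide.csv')
    wide.to_csv(path)
    # Default read: the extra header rows are treated as data rows
    naive = pd.read_csv(path)
    # First two rows are column levels, first column is a date index
    wide_back = pd.read_csv(path, header=[0, 1], index_col=0, parse_dates=True)

print(naive.head())
print(wide_back.head())
```

This works, but we must remember the header layout every time we read the file, which is one reason to prefer the alternatives that follow.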

If we will typically use the data with Python, a .pkl file is better than a .csv file.
A .pkl file stores and reloads the pandas object as-is, including its column multi-index and data types.

In [None]:
faang.to_pickle('../../Data/FAANG.pkl')

In [None]:
pd.read_pickle('../../Data/FAANG.pkl')
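A minimal round-trip check on synthetic data (hypothetical tickers `AAA` and `BBB`) confirms that the pickle preserves the index, the column levels, and the dtypes without any extra arguments.
Because unpickling can execute arbitrary code, we should only read .pkl files from sources we trust.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic wide data with a two-level column index
dates = pd.DatetimeIndex(['2021-01-04', '2021-01-05', '2021-01-06'], name='Date')
columns = pd.MultiIndex.from_product(
    [['Adj Close', 'Volume'], ['AAA', 'BBB']], names=['Variable', 'Ticker']
)
wide = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'wide.pkl')
    wide.to_pickle(path)
    wide_back = pd.read_pickle(path)

# Raises AssertionError if values, dtypes, index, or column levels differ
pd.testing.assert_frame_equal(wide, wide_back)
print(wide_back.columns.names)
```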

If we want to save our data as a .csv file, we should convert our data to a long format.
In a long format, each variable (e.g., adjusted close) appears in one and only one column, and each row represents one stock on one date.
We can use the `.stack()` method to convert from wide data to long data.
We will cover `.stack()` and `.unstack()` methods in greater detail later in the course.

In [None]:
faang.columns.names = ['Variable', 'Ticker']

In [None]:
faang.stack().to_csv('../../Data/FAANG-long.csv')

In [None]:
faang_long = pd.read_csv('../../Data/FAANG-long.csv', index_col=['Date', 'Ticker'], parse_dates=['Date'])
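To make the wide-to-long step concrete, here is `.stack()` applied to a tiny synthetic frame (hypothetical tickers `AAA` and `BBB`, made-up values), followed by the same .csv round trip.
Each (Date, Ticker) pair becomes one row, and each variable becomes one column, so a plain `index_col` list is enough to read it back.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic wide data: rows are dates, columns are (Variable, Ticker) pairs
dates = pd.DatetimeIndex(['2021-01-04', '2021-01-05', '2021-01-06'], name='Date')
columns = pd.MultiIndex.from_product(
    [['Adj Close', 'Volume'], ['AAA', 'BBB']], names=['Variable', 'Ticker']
)
wide = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns)

# .stack() moves the innermost column level ('Ticker') into the row index
long_df = wide.stack()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'long.csv')
    long_df.to_csv(path)
    long_back = pd.read_csv(path, index_col=['Date', 'Ticker'], parse_dates=['Date'])

print(long_df)
```

The long frame has 3 dates times 2 tickers = 6 rows and one column per variable.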

---

***Practice:***
Manipulate `faang_long` to match the original `faang` (i.e., wide format).

---

# The pandas-datareader Package

The [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/index.html) package provides easy access to a variety of data sources, such as 
[Ken French's Data Library](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) 
and 
[Federal Reserve Economic Data (FRED)](https://fred.stlouisfed.org/).
The pandas-datareader package also provides access to Yahoo! Finance data, but I think the yfinance package has better documentation.
We can install pandas-datareader with the `%pip` magic by running the following cell once per Anaconda installation or DataCamp Workspace:

In [None]:
# %pip install pandas-datareader

We will use `pdr` as the abbreviated prefix for pandas-datareader.

In [None]:
import pandas_datareader as pdr

Here is an example with the daily benchmark factors from Ken French's Data Library.
The `get_available_datasets()` function provides the exact names for all of Ken French's data sets.

In [None]:
pdr.famafrench.get_available_datasets()[:5]

Note that `pdr.get_data_famafrench()` returns a dictionary of data frames (plus a `'DESCR'` entry that describes the data set) and that we specify a `start` date.

In [None]:
ff = pdr.get_data_famafrench('F-F_Research_Data_Factors_daily', start='1900', session=session)

In [None]:
ff[0].describe()

By default, pandas-datareader downloads five years of data, but most Fama-French data are available back to the mid-1920s, which is why we specified `start='1900'` above.
We can easily plot the cumulative returns to the Fama-French factors over the full sample.

In [None]:
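# Returns are in percent, so divide by 100 before compounding; a log scale keeps decades of growth readable
ff[0].div(100).add(1).cumprod().plot(logy=True)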
buy_date = (ff[0].index[0] - pd.offsets.BusinessDay()).strftime('%B %d, %Y')
plt.title('Value of $1 Invested in FF Factors at Close on ' + buy_date)
plt.xlabel('Date')
plt.ylabel('Value ($)')
plt.show()
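The `.div(100)` step matters because the Fama-French files report returns in percent.
A small check on made-up percent returns (the column names mimic the factor file; the values are hypothetical, not actual factor data) shows the conversion and compounding.
It also shows that `Mkt-RF` is an excess return, so adding `RF` back gives the growth of the total market return.

```python
import numpy as np
import pandas as pd

# Hypothetical daily factor returns in percent, laid out like the Fama-French files
pct = pd.DataFrame(
    {'Mkt-RF': [1.0, -0.5, 2.0], 'RF': [0.01, 0.01, 0.01]},
    index=pd.to_datetime(['2021-01-04', '2021-01-05', '2021-01-06']),
)

# Percent to decimal, add 1 for gross returns, then compound into the value of $1
growth = pct.div(100).add(1).cumprod()

# Mkt-RF is the market return minus the risk-free rate; adding RF back gives the total market return
mkt_growth = pct['Mkt-RF'].add(pct['RF']).div(100).add(1).cumprod()

print(growth)
print(mkt_growth)
```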