<a href="https://colab.research.google.com/github/mscouse/TBS_investment_management/blob/main/PM_labs_part_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1F1J2rObxMwR11cnRzm5m_b1cLwW1U-nL?usp=sharing)
# <strong> Investment Management 1</strong>
---
# <strong>Part 4: Data sources & data collection in Python.</strong>

In the course repository on GitHub, you will find several introductory Colab notebooks covering the following topics:

**Part 1: Introduction to Python and Google Colab notebooks.**

**Part 2: Getting started with Colab notebooks & basic features.**

**Part 3: Data visualisation libraries.**

**Part 4: Data sources & data collection in Python (CURRENT NOTEBOOK).**

**Part 5: Basic financial calculations in Python.**


The notebooks have been designed to help you get started with Python and Google Colab. See the **“1_labs_introduction”** folder for more information. Each notebook contains all necessary libraries and references to the required subsets of data.

# <strong>Data sources and data collection</strong>

To perform data analysis, the first step is to load a file containing the pertinent data, such as a CSV or Excel file, into Colab. There are several ways to do so. You can import your own data into Colab notebooks from Google Drive, GitHub and many other sources. Some of these are discussed below.

To find out more about importing data, and how Colab can be used for data analysis, see the <a href="https://github.com/mscouse/TBS_investment_management/blob/main/Python_workspace.pdf">Python Workspace</a> document in the course GitHub repository or a more <a href="https://neptune.ai/blog/google-colab-dealing-with-files">comprehensive guide</a> prepared by Siddhant Sadangi of Reuters.

## 1. Uploading files from your local drive

It is easy to upload your locally stored data files. To upload the data from your local drive, type in the following code in a new “Code” cell in Colab (as demonstrated below):

```
from google.colab import files
files.upload()
```
Once executed, the code will prompt you to select a file containing your data. Click on **“Choose Files”** then select and upload the file. Wait for the file to be 100% uploaded. You should see the name of the file in the code cell once it is uploaded.

On the left side of the Colab interface, there is a **"Files/Folder"** tab. You can find the uploaded file in that directory.

If you want to read the uploaded data into a Pandas dataframe (named `df` in this example), use the following code in a new code cell. The **'filename.csv'** should match the name of the uploaded file, including the `.csv` extension:

```
import pandas as pd
df = pd.read_csv('filename.csv')
```
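
`files.upload()` also returns a dictionary that maps each uploaded file name to its contents as bytes, so the data can be parsed straight from that return value. The sketch below assumes that return shape. The helper name `frames_from_upload` and the sample prices are illustrative only, and a hand-made dictionary stands in for the real upload so the example also runs outside Colab.

```python
import io

import pandas as pd


def frames_from_upload(uploaded):
    """Parse each CSV in a {filename: bytes} mapping into a DataFrame."""
    return {
        name: pd.read_csv(io.BytesIO(content))
        for name, content in uploaded.items()
        if name.lower().endswith(".csv")
    }


# In Colab you would write: uploaded = files.upload()
# Here a hand-made dictionary with made-up prices stands in for the upload.
uploaded = {"filename.csv": b"date,price\n2021-01-04,100.0\n2021-01-05,101.5\n"}

frames = frames_from_upload(uploaded)
df = frames["filename.csv"]
print(df)
```

This can be convenient when several files are uploaded at once, since each one ends up in the dictionary under its own file name.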


In [None]:
from google.colab import files
files.upload()

## 2. Uploading files from GitHub (via raw URL)

You can either clone an entire GitHub repository to your Colab environment or access individual files from their raw link. We use the latter method throughout the course/assignments. 


**Clone a GitHub repository**

You can clone a GitHub repository into your Colab environment in the same way as you would on your local machine, using `!git clone` followed by the clone URL of the repository:

```
# use the correct URL
!git clone https://github.com/repository_name.git
```
Once the repository is cloned, refresh the file explorer to browse through its contents. Then you can simply read the files as you would on your local machine (see above).

&nbsp;


**Load GitHub files using raw links**

There is no need to clone the repository to Colab if you only need to work with a few files from it. You can load individual files directly from GitHub using their raw links, as follows:

1.   click on the file in the repository;
2.   click on `View Raw`;
3.   copy the URL of the raw file;
4.   use this URL as the location of your file (see sample code below).

```
import pandas as pd

# step 1: store the link to your dataset as a string titled "url"
url="https://raw.githubusercontent.com/mscouse/TBS_investment_management/main/1_labs_introduction/stock_prices_1.csv"

# step 2: Load the dataset into pandas. The dataset is stored as a pandas dataframe "df".
df = pd.read_csv(url)
```
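
The raw link can also be derived from the regular GitHub page address of a file, because the two follow a fixed pattern: the host changes to `raw.githubusercontent.com` and the `/blob/` segment is dropped. The helper below is a minimal sketch of that rewrite. The function name `to_raw_url` is hypothetical, and the page URL is reconstructed from the raw link used above rather than taken from the course materials.

```python
def to_raw_url(blob_url):
    """Convert a github.com '/blob/' page URL into its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError("expected a URL of the form https://github.com/<user>/<repo>/blob/...")
    # Drop the host prefix and the first "/blob/" segment, keeping the rest of the path.
    path = blob_url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path


page_url = "https://github.com/mscouse/TBS_investment_management/blob/main/1_labs_introduction/stock_prices_1.csv"
url = to_raw_url(page_url)
print(url)
```

The resulting `url` string can then be passed to `pd.read_csv()` exactly as in the sample code.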

Try doing it yourself using the code cells below.


In [None]:
# import any required libraries
import pandas as pd

# store the URL link to your GitHub dataset as a string titled "url"
url = 'https://raw.githubusercontent.com/mscouse/TBS_investment_management/main/1_labs_introduction/stock_prices_1.csv'

In [None]:
# load the dataset into Pandas. The dataset will be stored as a Pandas Dataframe "df".
# Note that the file we deal with in this example contains dates in the first column.
# Therefore, we parse the dates using "parse_dates" and set the date column to be
# the index of the dataframe (using the "index_col" parameter)
df = pd.read_csv(url, parse_dates=['date'], index_col=['date'])
df.head()
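
To see what `parse_dates` and `index_col` change, the following sketch reads the same small CSV twice, once without and once with those parameters. The tickers and prices are made up, and the text is held in memory so the example does not depend on the course file.

```python
import io

import pandas as pd

# Made-up sample in the same layout: a "date" column followed by price columns.
csv_text = """date,AAA,BBB
2021-01-04,10.0,20.0
2021-01-05,10.5,19.5
2021-02-01,11.0,21.0
"""

# Without the extra parameters the dates stay in an ordinary text column.
plain = pd.read_csv(io.StringIO(csv_text))

# With them, the dates are converted to timestamps and become the row index.
dated = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(plain.columns.tolist())
print(type(dated.index).__name__)

# A date index allows selecting rows by period, here all of January 2021.
january = dated.loc["2021-01"]
print(january)
```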

## 3. Accessing financial data

There are several open-source Python libraries designed to help researchers access financial data. One example is `yfinance` (formerly known as `fix-yahoo-finance`). It is a popular library, developed as a means to access the financial data available on Yahoo Finance.

Other widely used libraries are `pandas_datareader`, `yahoo_fin`, `ffn`, `PyNance`, and `alpha_vantage`.

In this section we focus on `yfinance`. As this library is not pre-installed in Google Colab by default, we will first execute the following code to install it:

```
!pip install yfinance
```
The `!pip install <package>` command looks for the latest version of the package and installs it. This only needs to be done once per session.

In [None]:
# install the yfinance library
!pip install yfinance
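
Since the installation is only needed once per session, it can be useful to check whether a package is already available before installing it again. The sketch below uses the standard library for that check. The helper name `is_installed` is hypothetical, and the guarded install is shown as a comment because the `!pip` syntax only works inside a notebook cell.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be found in the current session."""
    return importlib.util.find_spec(package_name) is not None


# "json" ships with Python, so it is always found.
print(is_installed("json"))

# In a Colab cell you could guard the install like this:
# if not is_installed("yfinance"):
#     !pip install yfinance
```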

As you may know, **Yahoo Finance** offers historical market data on stocks, bonds, cryptocurrencies, and currencies. It also aggregates companies' fundamental data.

We will be using several modules and functions included with the `yfinance` library to download historical market data from Yahoo Finance. For more information on the library, see <a href="https://pypi.org/project/yfinance/">here</a>. 

**Company information**

The first component of the `yfinance` library we consider is the `Ticker` class. We pass it the symbol of the stock whose data we need, and the resulting object gives access to ticker-specific data, such as stock info, corporate actions, company financials, etc. In the example below we are working with Apple, whose ticker is "AAPL". The first step is to call `Ticker` to initialize the stock we work with.

In [None]:
# import required libraries (note that yfinance needs to be imported in addition to being installed)
import yfinance as yf

# assign ticker to Python variable
aapl = yf.Ticker("AAPL")

# get stock info
aapl.info
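
`aapl.info` returns a Python dictionary, which can be long. A small helper can pull out only the entries of interest. In the sketch below the dictionary is a hypothetical stand-in, and the key names `longName`, `sector` and `marketCap` are assumptions about what Yahoo Finance provides, so check them against the output of the cell above.

```python
def pick_fields(info, keys):
    """Return only the requested keys; missing keys map to None instead of raising."""
    return {key: info.get(key) for key in keys}


# Hypothetical stand-in for aapl.info; real key names and values come from Yahoo Finance.
sample_info = {"longName": "Example Corp", "sector": "Technology"}

summary = pick_fields(sample_info, ["longName", "sector", "marketCap"])
print(summary)
```

Using `dict.get()` rather than square brackets avoids a `KeyError` when a field is not reported for a given ticker.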

**Downloading stock data**

To download the historical stock data, we need to use the `history()` method. As arguments, we can pass **start** and **end** dates to set a specific time period. Otherwise, we can set the `period` parameter to **max**, which will return all the stock data available on Yahoo for the chosen ticker.

Commonly used parameters of the `history()` method are:

* period: data period to download (either use `period` parameter or use `start` and `end`). Valid periods are: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max;

* interval: data interval (intraday data cannot extend past 60 days). Valid intervals are: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo;

* start: if not using `period` - download start date string (YYYY-MM-DD) or datetime;

* end: if not using `period` - download end date string (YYYY-MM-DD) or datetime.
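
The rule that `period` and `start`/`end` are alternatives can be captured in a small helper that builds the keyword arguments and rejects invalid combinations before any request is made. This is a sketch only: the helper name `history_kwargs` is hypothetical, the valid values are copied from the list above, and requiring `start` when no `period` is given is this helper's own convention rather than a rule of `yfinance`.

```python
VALID_PERIODS = {"1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y", "ytd", "max"}
VALID_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h", "1d", "5d", "1wk", "1mo", "3mo"}


def history_kwargs(period=None, start=None, end=None, interval="1d"):
    """Build keyword arguments for history(), using either period or start/end."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"unknown interval: {interval}")
    if period is not None:
        if start is not None or end is not None:
            raise ValueError("use either period or start/end, not both")
        if period not in VALID_PERIODS:
            raise ValueError(f"unknown period: {period}")
        return {"period": period, "interval": interval}
    if start is None:
        raise ValueError("provide either period or start")
    kwargs = {"start": start, "interval": interval}
    if end is not None:
        kwargs["end"] = end
    return kwargs


print(history_kwargs(period="1y"))
print(history_kwargs(start="2020-01-01", end="2020-12-31", interval="1wk"))
# With yfinance available: hist = aapl.history(**history_kwargs(period="1y"))
```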

In [None]:
# get historical market data
hist = aapl.history(period="max")
hist.head()

**Displaying corporate actions and analysts' recommendations**

To display information about dividends and stock splits, or analysts' recommendations, use the `actions` and `recommendations` attributes of the `Ticker` object.

In [None]:
# show company corporate actions, such as dividends and stock splits
aapl.actions

In [None]:
# show analysts' recommendations
aapl.recommendations
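
The `actions` dataframe combines dividends and stock splits in two columns, `Dividends` and `Stock Splits`, with zeros where an event did not occur on a given date. The sketch below shows one way to separate the two kinds of events. The dataframe is a made-up stand-in for `aapl.actions` with the same column names, and its dates and values are not real Apple data.

```python
import pandas as pd

# Made-up stand-in for aapl.actions, using the same two column names.
actions = pd.DataFrame(
    {"Dividends": [0.2, 0.0, 0.25], "Stock Splits": [0.0, 4.0, 0.0]},
    index=pd.to_datetime(["2020-01-15", "2020-04-15", "2020-07-15"]),
)

# Keep only the dates on which each kind of event actually happened.
dividends = actions.loc[actions["Dividends"] > 0, "Dividends"]
splits = actions.loc[actions["Stock Splits"] > 0, "Stock Splits"]

print(dividends)
print(splits)
```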

**Data for multiple stocks**

To download data for multiple tickers, we need to use the `download()` function, as follows:
```
# Version 1

import yfinance as yf
stock_data = yf.download("AAPL MSFT BRK-A", start="2015-01-01", end="2021-01-20", auto_adjust=False)
```

Alternatively, we can rewrite the code above as:
```
# Version 2

import yfinance as yf

tickers = "AAPL MSFT BRK-A"
date_1 = "2015-01-01"
date_2 = "2021-01-20"

stock_data = yf.download(tickers, start=date_1, end=date_2, auto_adjust=False)
```

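To access the adjusted closing prices for the tickers in the `stock_data` dataframe the code above creates, use `stock_data['Adj Close']`. To access the adjusted closing prices for 'AAPL' only, use `stock_data['Adj Close']['AAPL']`. If the 'Adj Close' column is missing, your `yfinance` version is probably adjusting prices by default; passing `auto_adjust=False` to `download()` should restore the column.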


In [None]:
# Version 1
# import required libraries
import yfinance as yf

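# fetch data for multiple tickers
# auto_adjust=False keeps the separate "Adj Close" column, which newer yfinance releases drop by default
stock_data = yf.download("AAPL MSFT BRK-A", start="2015-01-01", end="2021-01-20", auto_adjust=False)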

# display the last 5 rows of the dataframe; we choose to display the "Adj Close" column only
stock_data["Adj Close"].tail()

In [None]:
# Version 2
import yfinance as yf

# assign required values to variables
tickers = "AAPL MSFT BRK-A"
date_1 = "2015-01-01"
date_2 = "2021-01-20"

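# fetch data for multiple tickers
# auto_adjust=False keeps the separate "Adj Close" column, which newer yfinance releases drop by default
stock_data = yf.download(tickers, start=date_1, end=date_2, auto_adjust=False)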

# display the last 5 rows of the dataframe; we choose to display the "Adj Close" column only
stock_data["Adj Close"].tail()

In [None]:
# display the last 5 rows of AAPL adjusted close prices
stock_data['Adj Close']['AAPL'].tail()

However, if you want to group stock data by ticker, use the following code:

```
# Version 3

import yfinance as yf

tickers = "AAPL MSFT BRK-A"
date_1 = "2015-01-01"
date_2 = "2021-01-20"

stock_data = yf.download(tickers, start=date_1, end=date_2, group_by="ticker", auto_adjust=False)
```

To access the adjusted closing prices for 'AAPL' only, use: `stock_data['AAPL']['Adj Close']`.


In [None]:
# Version 3
import yfinance as yf

# assign required values to variables
tickers = "AAPL MSFT BRK-A"
date_1 = "2015-01-01"
date_2 = "2021-01-20"

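# fetch data for multiple tickers
# auto_adjust=False keeps the separate "Adj Close" column, which newer yfinance releases drop by default
stock_data = yf.download(tickers, start=date_1, end=date_2, group_by="ticker", auto_adjust=False)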

# display the last 5 rows of "Adj Close" prices for AAPL only
stock_data["AAPL"]["Adj Close"].tail()
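
The difference between the default layout and the `group_by="ticker"` layout lies in the order of the two column levels. The sketch below builds a small made-up dataframe with two-level columns to illustrate this, without contacting Yahoo Finance. The prices are invented, and converting between the layouts with `swaplevel` is shown as one possible approach, not as what `yfinance` does internally.

```python
import pandas as pd

dates = pd.to_datetime(["2021-01-04", "2021-01-05"])

# Default layout: price field on the first column level, ticker on the second.
by_field = pd.DataFrame(
    [[10.0, 20.0, 11.0, 21.0], [10.5, 19.5, 11.5, 20.5]],
    index=dates,
    columns=pd.MultiIndex.from_product([["Adj Close", "Close"], ["AAPL", "MSFT"]]),
)

# group_by="ticker" style layout: the same columns with the two levels swapped.
by_ticker = by_field.swaplevel(axis=1).sort_index(axis=1)

# The same series is reached by naming the levels in the order of each layout.
aapl_a = by_field["Adj Close"]["AAPL"]
aapl_b = by_ticker["AAPL"]["Adj Close"]

print(by_field.columns.tolist())
print(by_ticker.columns.tolist())
print(aapl_a.tolist(), aapl_b.tolist())
```

Grouping by field is handy when comparing one measure across tickers, while grouping by ticker keeps all the data for one company together.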