# Notebook Instructions

1. All the <u>code and data files</u> used in this course are available in the downloadable unit of the <u>last section of this course</u>.
2. You can run the notebook sequentially (one cell at a time) by pressing **Shift + Enter**.
3. While a cell is running, a [*] is shown on the left. After the cell is run, the output will appear on the next line.

This course is based on specific versions of Python packages. You can find the details of the packages in <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank" >this manual</a>.

## Minute Price Data & Resampling Techniques

So far, you have learnt how to download daily data. Sometimes, however, you need finer granularity to test your strategies, such as a data point for each hour, every 30 minutes or even every minute. In this notebook, you will learn how to download minute-level data and how to resample it into other time frames such as 15 minutes and 1 hour. Note that you can resample high-frequency data to a lower frequency, but not the other way round, because the lower-frequency data does not contain the prices in between.

You will perform the following steps:
1. [Download Minute Data](#minute-data)
2. [Resample Data](#resample-data)
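
The point about resampling direction can be illustrated with a minimal sketch. It assumes a small synthetic hourly series (the values are made up and are not market data): downsampling aggregates the existing points, while upsampling only produces empty slots.

```python
import pandas as pd

# Synthetic hourly closes (made-up values, for illustration only)
hourly_index = pd.date_range("2021-05-19 00:00", periods=4, freq="60min")
hourly_close = pd.Series([100.0, 101.0, 99.5, 102.0], index=hourly_index)

# Downsampling: four hourly points collapse into two 2-hour points
two_hourly_close = hourly_close.resample("120min").last()

# Upsampling: the new 30-minute slots have no data behind them, so they are NaN
half_hourly_close = hourly_close.resample("30min").asfreq()

print(two_hourly_close)
print(half_hourly_close)
```

You could forward-fill the empty slots, but that only repeats the last known price. It does not recover how the price actually moved within the hour.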

## Import Libraries

In [1]:
# For data manipulation
import pandas as pd

# To fetch financial data
import yfinance as yf

# For visualisation
import matplotlib.pyplot as plt
%matplotlib inline
# On matplotlib 3.6 or later, this style is named 'seaborn-v0_8-darkgrid'
plt.style.use('seaborn-darkgrid')

<a id='minute-data'></a> 
## Download Minute Data

The `download` method of `yfinance` has parameters `period` and `interval`. You can play around with these parameters to download data for different periods and intervals.

You can download the minute data for up to seven days from Yahoo! Finance. The syntax for downloading the minute data of an asset for 5 days is as below:
```python
yf.download(tickers, period="5d", interval="1m", auto_adjust=True)
```

Parameters:
1. **tickers:** Ticker of the asset.
2. **period:** The length of history required, in days, months or years. The valid periods are `1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max`.
3. **interval:** This is the frequency of data. The valid intervals are `1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo`.
4. **auto_adjust:** `True` to download adjusted data, else `False`.

In [2]:
# Download the minute data for Apple
apple_minute_data = yf.download(tickers="AAPL", period="5d", interval="1m", auto_adjust=True)

# Set the index to a datetime object
apple_minute_data.index = pd.to_datetime(apple_minute_data.index)

# Display the first 5 rows
apple_minute_data.head()

[*********************100%***********************]  1 of 1 downloaded


| Datetime | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2021-05-19 09:30:00-04:00 | 123.16 | 123.47 | 122.86 | 123.45 | 3462407 |
| 2021-05-19 09:31:00-04:00 | 123.44 | 123.88 | 123.44 | 123.78 | 452823 |
| 2021-05-19 09:32:00-04:00 | 123.77 | 123.9 | 123.72 | 123.76 | 382622 |
| 2021-05-19 09:33:00-04:00 | 123.75 | 123.99 | 123.67 | 123.9 | 445365 |
| 2021-05-19 09:34:00-04:00 | 123.79 | 123.9 | 123.77 | 123.84 | 374504 |


<a id='resample-data'></a> 
## Resample Data

During strategy modelling, you may need stock market data at a custom frequency such as 15 minutes, 1 hour or even 1 month. If you have minute-level data, you can construct 15-minute, 1-hour or daily candles by resampling it, so you don't have to buy them separately.

You can use the pandas `resample()` method to convert the stock data to the frequency of your choice.

The first step is to define a dictionary with the conversion logic. For example, the open of a resampled candle is the first open in the interval, the high is the maximum high, the low is the minimum low, the close is the last close and the volume is the sum of all volumes. The names `Open`, `High`, `Low`, `Close` and `Volume` must match the column names in your dataframe.

In [3]:
# Aggregate function
ohlcv_dict = {'Open': 'first',
              'High': 'max',
              'Low': 'min',
              'Close': 'last',
              'Volume': 'sum'
             }

You can now use the `resample()` method to resample the data to the desired frequency.

Syntax:
```python
DataFrame.resample(interval).agg(aggregate)
```

Parameters:
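1. **interval:** Resampling interval, such as `15T` for 15 minutes (`H` is for hours, `D` for days and `M` for months). From pandas 2.2 onwards, `T`, `H` and `M` are deprecated in favour of `min`, `h` and `ME`.
2. **aggregate:** Dictionary that maps each column to the aggregation used while resampling.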

Returns: <br>
Resampled dataframe
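
To check the aggregation logic without downloading anything, here is a minimal sketch on synthetic one-minute bars. The prices and the `minute_data` name are made up for illustration. The interval is written as `15min`, which is assumed here because it works as an alias on both older and newer pandas versions.

```python
import pandas as pd

# Synthetic 1-minute bars (made-up prices): 30 rows starting at 09:30
index = pd.date_range("2021-05-19 09:30", periods=30, freq="1min")
close = pd.Series(range(30), index=index, dtype="float64") + 100.0
minute_data = pd.DataFrame({
    "Open": close - 0.5,
    "High": close + 1.0,
    "Low": close - 1.0,
    "Close": close,
    "Volume": 10,
})

# Same conversion logic as the dictionary defined earlier
ohlcv_dict = {"Open": "first", "High": "max", "Low": "min",
              "Close": "last", "Volume": "sum"}

# 30 one-minute rows become two 15-minute candles
bars_15min = minute_data.resample("15min").agg(ohlcv_dict)
print(bars_15min)
```

Each 15-minute candle takes its open from the first minute, its close from the last minute, its high and low from the extremes, and its volume from the total of the 15 rows.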

### Resample minute data to 15 minutes data

In [4]:
# Resample data to 15 minutes data
apple_minute_data_15M = apple_minute_data.resample('15T').agg(ohlcv_dict)

# Drop the missing values
apple_minute_data_15M.dropna(inplace=True)

# Display the first 5 rows
apple_minute_data_15M.head()

| Datetime | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2021-05-19 09:30:00-04:00 | 123.16 | 124.07 | 122.86 | 123.88 | 9330077 |
| 2021-05-19 09:45:00-04:00 | 123.88 | 124.0 | 123.33 | 123.36 | 4655323 |
| 2021-05-19 10:00:00-04:00 | 123.35 | 123.69 | 123.11 | 123.6 | 3870079 |
| 2021-05-19 10:15:00-04:00 | 123.59 | 123.85 | 123.36 | 123.76 | 3537176 |
| 2021-05-19 10:30:00-04:00 | 123.76 | 123.98 | 123.45 | 123.71 | 3442268 |
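
The `dropna` step matters because `resample()` creates a bin for every interval between the first and last timestamp, including the hours when the market is closed. The sketch below assumes two short synthetic sessions with an overnight gap (made-up values) and shows that the empty bins have missing prices and zero volume.

```python
import pandas as pd

# Two short synthetic sessions with an overnight gap between them
day_1 = pd.date_range("2021-05-19 15:30", periods=30, freq="1min")
day_2 = pd.date_range("2021-05-20 09:30", periods=30, freq="1min")
index = day_1.append(day_2)

minute_data = pd.DataFrame(
    {"Open": 100.0, "High": 101.0, "Low": 99.0, "Close": 100.5, "Volume": 10},
    index=index,
)

ohlcv_dict = {"Open": "first", "High": "max", "Low": "min",
              "Close": "last", "Volume": "sum"}

bars = minute_data.resample("15min").agg(ohlcv_dict)

# Bins with no trades: prices are NaN, volume sums to 0
empty_bars = bars[bars["Close"].isna()]

# Keep only the bins that contain data
clean_bars = bars.dropna()

print(len(bars), len(empty_bars), len(clean_bars))
```

Only the four bins that contain trades survive `dropna`. The overnight bins are dropped because their prices are missing, even though their volume is a valid zero.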


### Resample minute data to 1 hour data

In [5]:
# Resample data to 1 hour data
apple_minute_data_1H = apple_minute_data.resample('1H').agg(ohlcv_dict)

# Drop the missing values
apple_minute_data_1H.dropna(inplace=True)

# Display the first 5 rows
apple_minute_data_1H.head()

| Datetime | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2021-05-19 09:00:00-04:00 | 123.16 | 124.07 | 122.86 | 123.36 | 13985400 |
| 2021-05-19 10:00:00-04:00 | 123.35 | 124.05 | 123.11 | 123.62 | 13668306 |
| 2021-05-19 11:00:00-04:00 | 123.62 | 124.12 | 123.31 | 123.89 | 10640324 |
| 2021-05-19 12:00:00-04:00 | 123.9 | 124.25 | 123.71 | 123.99 | 7711867 |
| 2021-05-19 13:00:00-04:00 | 123.98 | 124.44 | 123.97 | 124.26 | 6499870 |
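
Notice that the first hourly bar above is labelled 09:00 although the minute data starts at 09:30. By default, pandas anchors the bins to the clock hour, so the first bar covers only 30 minutes. If you would rather have hourly bars that start at the market open, one option is the `offset` argument of `resample()`. This is a sketch on synthetic data and assumes pandas 1.1 or later.

```python
import pandas as pd

# Synthetic session: 1-minute closes from 09:30 to 11:29 (made-up values)
index = pd.date_range("2021-05-19 09:30", periods=120, freq="1min")
close = pd.Series(100.0, index=index)

# Default bins are anchored to the clock hour, so the first label is 09:00
default_counts = close.resample("60min").count()

# Shifting the bin edges by 30 minutes anchors them to the 09:30 open
aligned_counts = close.resample("60min", offset="30min").count()

print(default_counts)
print(aligned_counts)
```

The counts show how many one-minute rows fall into each bar. The default bins hold 30, 60 and 30 rows, while the shifted bins hold 60 rows each.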


### Resample minute data to 4 hours data

In [6]:
# Resample data to 4 hours data
apple_minute_data_4H = apple_minute_data.resample('4H').agg(ohlcv_dict)

# Drop the missing values
apple_minute_data_4H.dropna(inplace=True)

# Display the first 5 rows
apple_minute_data_4H.head()

| Datetime | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2021-05-19 08:00:00-04:00 | 123.16 | 124.12 | 122.86 | 123.89 | 38294030 |
| 2021-05-19 12:00:00-04:00 | 123.9 | 124.92 | 123.5 | 124.69 | 40571254 |
| 2021-05-20 08:00:00-04:00 | 125.23 | 127.26 | 125.1 | 127.04 | 30385675 |
| 2021-05-20 12:00:00-04:00 | 127.05 | 127.72 | 126.67 | 127.31 | 36448257 |
| 2021-05-21 08:00:00-04:00 | 127.82 | 128.0 | 125.7 | 125.75 | 36394522 |


## Tweak the code

You can tweak the code in the following ways:

1. Use an asset of your choice other than `AAPL` and download its data.
2. Use a different time interval to resample the data.
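
If you want to try several intervals in one go, a small helper can wrap the resampling steps. The function name `resample_ohlcv` and the synthetic `minute_data` below are hypothetical and only for illustration. With real data, you would pass your downloaded dataframe instead.

```python
import pandas as pd

# Conversion logic for OHLCV columns
OHLCV_AGG = {"Open": "first", "High": "max", "Low": "min",
             "Close": "last", "Volume": "sum"}


def resample_ohlcv(data, interval):
    """Resample OHLCV bars to `interval` and drop the empty bins."""
    return data.resample(interval).agg(OHLCV_AGG).dropna()


# Synthetic 1-minute bars (made-up values): one hour starting at 09:30
index = pd.date_range("2021-05-19 09:30", periods=60, freq="1min")
minute_data = pd.DataFrame(
    {"Open": 100.0, "High": 101.0, "Low": 99.0, "Close": 100.5, "Volume": 10},
    index=index,
)

# Resample the same data to a few different intervals
bars_by_interval = {
    interval: resample_ohlcv(minute_data, interval)
    for interval in ["5min", "10min", "30min"]
}

for interval, bars in bars_by_interval.items():
    print(interval, len(bars))
```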