<h1>Chapter 5 | Data Exercise #1 | Financial Assets | Generalizing from Data</h1>
<h2>Introduction:</h2>
<p>In this notebook, you will find my notes and code for Chapter 5's <b>exercise 1</b> of the book <a href="https://gabors-data-analysis.com/">Data Analysis for Business, Economics, and Policy</a>, by Gábor Békés and Gábor Kézdi. The exercise reads:</p>
<p>1. Download ten years of daily data on the price of a financial asset, such as an individual stock, or another stock market index.</p>
<p>Assignments:</p>
<ul>
    <li>Document the main features of the data.</li>
    <li>Create daily percentage returns.</li>
    <li>Create a binary variable indicating large losses by choosing your own cutoff.</li>
    <li>Estimate the standard error of the estimated likelihood of large daily losses, both by bootstrap and by using the SE formula.</li>
    <li>Compare the two, and create 95% confidence intervals.</li>
    <li>Conclude by giving advice on how to use these results in future investment decisions.</li>

</ul>
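
As a preview of the standard-error steps in the list above, here is a minimal sketch on synthetic returns, not the Petrobras data. The simulated return distribution, the -5% cutoff, the number of bootstrap draws, and all variable names are assumptions made for illustration. The formula is the usual one for a sample proportion, SE = sqrt(p(1 - p) / n).

```python
import numpy as np

# Synthetic daily returns in percent (an assumption, NOT real stock data)
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.05, scale=2.5, size=2500)

# Hypothetical cutoff: a daily loss of 5% or more counts as a "large loss"
cutoff = -5.0
large_loss = (returns <= cutoff).astype(int)

n = large_loss.size
p_hat = large_loss.mean()  # estimated likelihood of a large daily loss

# SE formula for a sample proportion
se_formula = np.sqrt(p_hat * (1 - p_hat) / n)

# Bootstrap: resample the binary variable with replacement and
# recompute the share of large losses in each resample
n_boot = 2000
boot_shares = np.array([
    rng.choice(large_loss, size=n, replace=True).mean()
    for _ in range(n_boot)
])
se_boot = boot_shares.std(ddof=1)

# 95% confidence intervals built from each standard error
ci_formula = (p_hat - 1.96 * se_formula, p_hat + 1.96 * se_formula)
ci_boot = (p_hat - 1.96 * se_boot, p_hat + 1.96 * se_boot)

print(f"share of large losses: {p_hat:.4f}")
print(f"SE (formula): {se_formula:.5f}, SE (bootstrap): {se_boot:.5f}")
```

With a few thousand observations, the two standard errors should come out close to each other, which is the comparison the exercise asks for.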
<h2>1. Load the data</h2>

In [None]:
import requests
import os
from dotenv import load_dotenv
import pandas as pd
import warnings
import yfinance as yf
from datetime import datetime

warnings.filterwarnings("ignore")
%matplotlib inline


In [3]:
# Increase the number of rows pandas displays
pd.set_option("display.max_rows", 500)

In [9]:
# Current script folder
dirname = os.getcwd()

# Get location folders
data_in = f"{dirname}/da_data_exercises/ch05-generalizing_from_data/01-stock_data_analysis/data/raw/"
data_out = f"{dirname}/da_data_exercises/ch05-generalizing_from_data/01-stock_data_analysis/data/clean/"
output = f"{dirname}/da_data_exercises/ch05-generalizing_from_data/01-stock_data_analysis/data/output/"
func = f"{dirname}/da_case_studies/ch00-tech_prep/"

paths = [data_in, data_out, output]

# Create each folder if it does not exist yet
for path in paths:
    os.makedirs(path, exist_ok=True)

It would be a good idea to see how Brazilian firms are doing. We could use **Petrobras**, the main Brazilian oil company (which is a state-owned multinational corporation). Let's test using `Yahoo Finance`.

In [12]:
ticker = yf.Ticker("PETR4.SA")

# Get the 10 most recent trading days as a quick test
petr4_data = ticker.history(period="10d")

petr4_data


Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2024-04-19 00:00:00-03:00,37.34722,38.298385,37.151393,37.794827,80546900,0.0,0.0
2024-04-22 00:00:00-03:00,38.009305,38.783294,37.785503,38.699368,51775500,0.0,0.0
2024-04-23 00:00:00-03:00,38.606116,38.820594,38.195808,38.624763,35456900,0.0,0.0
2024-04-24 00:00:00-03:00,38.745994,39.156303,38.428939,38.44759,45388300,0.0,0.0
2024-04-25 00:00:00-03:00,38.606115,39.613229,38.307711,39.370777,66372400,0.0,0.0
2024-04-26 00:00:00-03:00,39.569925,40.03951,39.359089,39.914928,31899100,1.137805,0.0
2024-04-29 00:00:00-03:00,39.752009,40.3941,39.598675,40.3941,27886000,0.0,0.0
2024-04-30 00:00:00-03:00,40.250349,40.4516,39.886178,40.269516,36635400,0.0,0.0
2024-05-02 00:00:00-03:00,40.489931,40.614517,40.097011,40.422848,33244700,0.0,0.0
2024-05-03 00:00:00-03:00,40.689999,40.689999,39.450001,39.889999,45114200,1.757152,0.0


Great! Now, our goal is to get 10 years of data. Let's try to specify the period we want and hopefully this API won't let us down.

In [22]:
# Define the start date
start_date = datetime(2014, 5, 1)

# Define the end date
end_date = datetime(2024, 5, 2)

# Request the daily history between the two dates (yfinance treats `end` as exclusive)
petr4_data = yf.Ticker("PETR4.SA")
petr4_df = petr4_data.history(start=start_date, end=end_date)

In [23]:
petr4_df.head()

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2014-05-02 00:00:00-03:00,5.626026,6.027157,5.612429,6.016959,52627500,0.0,0.0
2014-05-05 00:00:00-03:00,5.999962,6.040755,5.884382,5.972766,35540000,0.0,0.0
2014-05-06 00:00:00-03:00,5.925175,6.282113,5.884381,6.200527,48256400,0.0,0.0
2014-05-07 00:00:00-03:00,6.166532,6.407891,6.142736,6.316106,44239500,0.0,0.0
2014-05-08 00:00:00-03:00,6.322906,6.390894,6.033956,6.078148,53471600,0.0,0.0


Jah bless Yahoo Finance! Let's move on.
## 2. Document the main features of the data ##
Let's take a picture of our dataset and describe it.



In [24]:
petr4_df.shape

(2486, 7)

In [25]:
petr4_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2486 entries, 2014-05-02 00:00:00-03:00 to 2024-04-30 00:00:00-03:00
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2486 non-null   float64
 1   High          2486 non-null   float64
 2   Low           2486 non-null   float64
 3   Close         2486 non-null   float64
 4   Volume        2486 non-null   int64  
 5   Dividends     2486 non-null   float64
 6   Stock Splits  2486 non-null   float64
dtypes: float64(6), int64(1)
memory usage: 155.4 KB


In [26]:
# Count rows whose column values exactly repeat an earlier row
n_duplicates = petr4_df.duplicated().sum()
print(f"There are {n_duplicates} duplicate observations in the dataset.")

There are 0 duplicate observations in the dataset.


A brief summary:
- Our dataset has 2486 observations and 7 features/variables.
- All columns are numeric.
- There are no missing values.
- There are no duplicated rows either.
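
The checks behind this summary can be made explicit. Below is a minimal sketch on a made-up toy table (the values and names are assumptions, not the Petrobras data) that counts missing values, duplicated rows, and duplicated dates. Note that `DataFrame.duplicated()` compares column values only and ignores the index, so repeated dates need a separate check on the index.

```python
import numpy as np
import pandas as pd

# Toy price table (made-up values) with one missing value
# and one row that repeats both the date and the values of another row
toy = pd.DataFrame(
    {"Close": [10.0, 10.5, np.nan, 10.5], "Volume": [100, 120, 90, 120]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-03"]),
)

# Missing values per column
n_missing = toy.isna().sum()

# Rows whose column values repeat an earlier row (the index is ignored)
n_dup_rows = toy.duplicated().sum()

# Dates that appear more than once in the index
n_dup_dates = toy.index.duplicated().sum()

print(n_missing)
print(f"duplicated rows: {n_dup_rows}, duplicated dates: {n_dup_dates}")
```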

About the variables, a short description:


| **Variable** | **Definition** |
|--------------| ---------------|
| `Open`       | The opening price on the specified date. |
| `High`       | The highest price on the specified date. |
| `Low`        | The lowest price on the specified date. |
| `Close`      | The closing price on the specified date. |
| `Volume`     | The number of shares traded on the specified date. |
| `Dividends`  | The dividend recorded on the specified date, that is, the portion of the company's earnings paid out to shareholders (0 on days without a dividend). |
| `Stock Splits` | The stock split recorded on the specified date, that is, a corporate action in which a company increases the number of its outstanding shares by issuing more shares to current shareholders (0 on days without a split). |


## 3. Create daily percentage returns ##
We will work with closing prices only, so we keep just the `Close` column and create a return variable that measures the day-over-day percentage change in the closing price.


In [28]:
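# Keep only the closing price; double brackets return a DataFrame, not a Series
petr4_df = petr4_df[["Close"]].copy()

# Previous trading day's close; the shifted series keeps the date index,
# so the subtraction below aligns row by row
prev_close = petr4_df["Close"].shift(1)

# Daily return as a fraction of the previous close (0.01 = 1%); the first row is NaN
petr4_df["pct_return"] = (petr4_df["Close"] - prev_close) / prev_close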
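
To make the return calculation in this section concrete, here is a small sketch on a made-up three-day price series (the prices and names are assumptions for illustration). It computes the daily return by hand from the previous day's close and compares it with pandas' built-in `pct_change()`, which should give the same numbers.

```python
import pandas as pd

# Made-up closing prices for three trading days
toy_close = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    name="Close",
)

# Manual version: change relative to the previous day's close
prev_close = toy_close.shift(1)
manual = (toy_close - prev_close) / prev_close

# Built-in version of the same calculation
builtin = toy_close.pct_change()

# Both are fractions; multiply by 100 to express them in percent
manual_pct = manual * 100

print(manual_pct)
```

The first day has no previous close, so its return is missing; the other two days come out at roughly +2% and -2%.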