# Multivariate Time Series Analysis Project

This project is part of the *Advanced Time Series Prediction* course at OpenCampus, Kiel University. The primary goal is to conduct a comprehensive *multivariate time series analysis* to forecast the daily closing price of Bitcoin, focusing on the influence of various economic, social, and sentiment-based factors. 

### Project Objective

We aim to predict **Bitcoin’s daily closing prices** by examining the potential impact of multiple time series variables:
- **S&P 500 Closing Data**: to capture broad market movements.
- **Inflation Rate**: to account for macroeconomic pressures, proxied by the 10-year breakeven inflation rate.
- **Daily Treasury Rates**: to represent short-term U.S. interest rates.
- **Bitcoin Daily Google Trends**: to gauge public interest.
- **Twitter Sentiments**: to assess social media sentiment around Bitcoin.
- **Holiday Indicator**: to consider public holidays and weekends as factors that might influence trading volume and price volatility.

### Data Processing and Pretreatment

The collected data will undergo a series of pretreatment steps, including:
- **Typecasting** to ensure compatibility between time series values.
- **Handling Missing Values** using appropriate imputation techniques to maintain data integrity.
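
As a rough illustration of these two steps, the sketch below works on a small synthetic frame rather than the project data. The column name, the `"."` placeholder for a missing observation, and the choice of forward-fill are all assumptions made for the example; the project may settle on a different imputation method.

```python
import pandas as pd

# Synthetic raw data: values arrive as strings, and "." marks a missing
# observation (an assumption for this example).
raw = pd.DataFrame(
    {
        "date": ["2014-11-03", "2014-11-04", "2014-11-05", "2014-11-06"],
        "inflation_rate": ["1.91", ".", "1.93", "1.90"],
    }
)

# Typecasting: parse dates and coerce the value column to float.
# errors="coerce" turns anything non-numeric (here ".") into NaN.
clean = raw.assign(
    date=pd.to_datetime(raw["date"]),
    inflation_rate=pd.to_numeric(raw["inflation_rate"], errors="coerce"),
).set_index("date")

# Missing values: carry the last known observation forward.
filled = clean.ffill()

print(filled)
```

Forward-fill is used here only because it never looks into the future, which matters for forecasting; interpolation would be an alternative when that constraint does not apply.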

### Goal

The ultimate goal is to use this preprocessed dataset in a predictive model that captures the relationship between Bitcoin’s closing prices and the identified factors and produces reliable forecasts. The project also provides hands-on experience in multivariate time series forecasting and in handling complex datasets.

### Interest Rate
### Code Explanation

In the following script, we download daily U.S. Treasury bill rates for each year from 2014 up to the current year. The study period starts on November 1, 2014, but the script retrieves whole calendar years, so 2014 is downloaded in full. From each yearly file we keep only the *Date* column and the first data column after it, which is the **4-week Bank Discount** rate. We then save the combined dataset as a `.csv` file.

Here’s a step-by-step breakdown of what this code does:

1. **Imports**:  
   - We import the essential libraries: `requests` to make HTTP requests, `pandas` for data manipulation, `io` to handle in-memory text data, and `datetime` to get the current year dynamically.

2. **Data Retrieval and Processing**:  
   - We start by initializing an empty DataFrame named `daily_treasury_rates` to store data across multiple years.
   - We define the structure of the URL with `base_url`, which we’ll use to download each year’s data.
   - Using `datetime.now().year`, we determine the current year and create a range from 2014 to this year. This ensures that we retrieve the most current data available.
   - For each year in this range, we format the URL, request the data, and read it into a DataFrame if the response is successful (HTTP status code `200`).
   - We only select the *Date* column and the first data column next to it (`.iloc[:, :2]`), keeping the dataset focused on key information only.
   - Each year’s selected data is appended to `daily_treasury_rates` with `pd.concat()`.

3. **Saving the Data**:  
   - Finally, we save the combined data as a `.csv` file in the local directory, excluding row indices via `index=False`.
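
Because whole calendar years are downloaded, the 2014 file contains rows from before the study start. A minimal sketch of trimming them is shown below on a stand-in CSV. The header names and the `MM/DD/YYYY` date format are assumptions for the example and should be checked against the downloaded file.

```python
import io
import pandas as pd

# Stand-in for one year's downloaded CSV. The header names and the
# MM/DD/YYYY date format are assumptions for this example.
sample_csv = io.StringIO(
    "Date,4 WEEKS BANK DISCOUNT\n"
    "11/04/2014,0.03\n"
    "11/03/2014,0.02\n"
    "10/31/2014,0.01\n"
    "10/30/2014,0.02\n"
)

rates = pd.read_csv(sample_csv).iloc[:, :2]  # Date plus the first rate column

# Parse the date strings, then keep only rows on or after the study start.
rates["Date"] = pd.to_datetime(rates["Date"], format="%m/%d/%Y")
study_start = pd.Timestamp("2014-11-01")
trimmed = (
    rates[rates["Date"] >= study_start]
    .sort_values("Date")
    .reset_index(drop=True)
)

print(trimmed)
```

This step is optional in practice: the final dataset is built with a left join onto a calendar that starts on 2014-11-01, which drops earlier rows anyway.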

### Why We Focus on the **4-Week Bank Discount Rate**

We’re interested in the **4-week Bank Discount** rate because it’s crucial for studying the multivariate time series relationship between U.S. interest rates and **Bitcoin** prices. This short-term rate offers insights into borrowing costs and liquidity in the economy, influencing investor preferences and risk appetite. These factors can significantly affect asset prices, including cryptocurrencies. By focusing on this daily rate, we capture high-frequency movements in interest rates, which could correlate with Bitcoin’s price dynamics. This approach may reveal important relationships or patterns within the market structure over time, helping us understand how interest rate changes might influence cryptocurrency prices.

In [8]:
# Import necessary libraries
import requests  # For sending HTTP requests
import pandas as pd  # For data manipulation and analysis
import io  # For handling in-memory text data as files
from datetime import datetime  # For getting the current year

# Initialize an empty DataFrame to store the daily treasury rates data
daily_treasury_rates = pd.DataFrame()

# Define the base URL format and the range of years for data retrieval
base_url = "https://home.treasury.gov/resource-center/data-chart-center/interest-rates/daily-treasury-rates.csv/{year}/all?type=daily_treasury_bill_rates&field_tdr_date_value={year}&page&_format=csv"
current_year = datetime.now().year  # Get the current year
years = range(2014, current_year + 1)  # Set the range of years from 2014 to the current year

# Loop through each year, download the CSV data, and append only the required columns to the main DataFrame
for year in years:
    # Format the URL for the current year
    url = base_url.format(year=year)
    response = requests.get(url)
    
    # Check if the request was successful (status code 200 indicates success)
    if response.status_code == 200:
        # Read the CSV data for the current year from the response
        yearly_data = pd.read_csv(io.StringIO(response.text))
        
        # Select only the "Date" column and the column immediately after it
        selected_data = yearly_data.iloc[:, :2]
        
        # Append selected data to the main DataFrame and print the number of rows added
        daily_treasury_rates = pd.concat([daily_treasury_rates, selected_data], ignore_index=True)
        print(f"{len(selected_data)} rows added for year {year}")
    else:
        print(f"Failed to retrieve data for year {year}")

# Save the daily treasury rates data to a CSV file in the current working directory
daily_treasury_rates.to_csv("daily_treasury_rates.csv", index=False)
print("Data has been saved to daily_treasury_rates.csv")

250 rows added for year 2014
251 rows added for year 2015
250 rows added for year 2016
250 rows added for year 2017
249 rows added for year 2018
250 rows added for year 2019
251 rows added for year 2020
251 rows added for year 2021
249 rows added for year 2022
250 rows added for year 2023
211 rows added for year 2024
Data has been saved to daily_treasury_rates.csv


### Inflation Expectation
### Code Explanation

This Python script downloads daily inflation-expectation data, the **T10YIE** series (10-Year Breakeven Inflation Rate), from the Federal Reserve Economic Data (FRED) website. Here’s a breakdown of what the code does:

1. **Imports**:  
   - The script begins by importing the necessary libraries: `requests` to handle HTTP requests and `datetime` to manage date and time data.

2. **Data Retrieval and Processing**:  
   - We define the start date as **November 1, 2014**, and the end date is set to today’s date, dynamically determined using `datetime.today().strftime("%Y-%m-%d")`. This ensures that we always fetch the most recent data available.

   - The URL is constructed using the defined date range to request the **T10YIE** data in CSV format. This data represents the market's expectations of future inflation based on the yield difference between nominal and inflation-protected securities.

   - A GET request is sent to the FRED URL to retrieve the data. The script checks if the request is successful (HTTP status code **200**). 

3. **Saving the Data**:  
   - If the data retrieval is successful, the script saves the content to a CSV file named `inflation_data.csv` in the current working directory. If the request fails, it prints an error message with the status code.

### Importance of Using the **10-Year Breakeven Inflation Expectation** Rate

We use the **daily T10YIE rate** as a proxy for inflation because the **Consumer Price Index (CPI)** is published monthly, not daily. The T10YIE provides a timely, market-based measure of inflation expectations that can be compared with Bitcoin prices day by day.

By analyzing the multivariate time series relationship between daily T10YIE rates and Bitcoin prices, we can gain insights into how market perceptions of inflation impact cryptocurrency valuations. This approach allows us to capture rapid changes in inflation expectations, providing a more nuanced understanding of the interactions between traditional financial metrics and emerging digital asset prices.

In [9]:
import requests  # For sending HTTP requests
from datetime import datetime  # For handling date and time

# Define the date range for the data
start_date = "2014-11-01"  # Starting date for the data
end_date = datetime.today().strftime("%Y-%m-%d")  # Today's date as the ending date

# Construct the URL for downloading data from FRED
url = f"https://fred.stlouisfed.org/graph/fredgraph.csv?id=T10YIE&cosd={start_date}&coed={end_date}&fq=Daily"

# Request the data from FRED
response = requests.get(url)  # Send a GET request to the specified URL

# Check if the request was successful (status code 200 indicates success)
if response.status_code == 200:
    # Define the path to save the CSV file in the current directory
    file_path = "inflation_data.csv"  # Save in the local directory (current working directory)
    
    # Save the content to a CSV file
    with open(file_path, 'wb') as file:  # Open the file in write-binary mode
        file.write(response.content)  # Write the response content to the file
    
    print(f"Data successfully downloaded and saved to {file_path}")  # Success message
else:
    print(f"Failed to download data. Status code: {response.status_code}")  # Error message

Data successfully downloaded and saved to inflation_data.csv


### Stock Market Index (S&P 500)
### Code Explanation

In this Python script, we aim to download historical daily closing prices for the **S&P 500** index using the **Yahoo Finance** API. Below is a breakdown of what our code accomplishes:

1. **Imports**:  
   - We start by importing the necessary libraries: `yfinance` for fetching financial data from Yahoo Finance and `pandas` for data manipulation and analysis.

2. **Ticker Symbol Definition**:  
   - We define the ticker symbol for the S&P 500 as `^GSPC`, which we will use to fetch the index data.

3. **Date Range Setup**:  
   - Our script sets the start date to **November 1, 2014**, and the end date is dynamically set to today's date using `pd.Timestamp.today().strftime('%Y-%m-%d')`. This ensures that we retrieve the most current data available.

4. **Data Retrieval**:  
   - Using `yf.download()`, we fetch the historical data for the specified ticker symbol over the defined date range.

5. **Data Extraction**:  
   - We extract only the closing prices from the downloaded data, focusing on the `Close` column.

6. **Data Saving**:  
   - Finally, we save the closing prices to a CSV file named `sp500_closing_data.csv` in our local directory.

### Importance of Using Daily Closing S&P 500 Data

Utilizing the **daily closing index of the S&P 500** is vital for us when studying the multivariate time series relationship between the **U.S. stock index** and **Bitcoin** prices. The S&P 500 serves as a key indicator of the overall health of the U.S. economy and investor sentiment. By analyzing the daily fluctuations in the S&P 500 alongside Bitcoin prices, we can identify potential correlations and trends.

The daily closing data provides us with a granular view of market dynamics, allowing us to detect patterns that may influence Bitcoin valuations. As both the stock market and cryptocurrency market can react to macroeconomic news and investor sentiment, understanding their relationship can help us develop investment strategies and risk management practices. This analysis ultimately leads us to a deeper understanding of how traditional financial metrics and digital assets interact in the broader financial landscape.

In [None]:
# Import necessary libraries
import yfinance as yf  # For downloading financial data from Yahoo Finance
import pandas as pd  # For data manipulation and analysis

# Define the ticker symbol for the S&P 500
ticker_symbol = '^GSPC'  # Yahoo Finance ticker for the S&P 500

# Set the start and end dates for the data retrieval
start_date = '2014-11-01'  # Starting date for the historical data
end_date = pd.Timestamp.today().strftime('%Y-%m-%d')  # Today's date as the ending date

# Count the calendar days in the requested range (weekends and holidays included).
# Note: this is not the number of rows downloaded; the S&P 500 only has values on trading days.
total_days = pd.date_range(start=start_date, end=end_date).shape[0]  # Calendar days in the date range

# Fetch the historical data for the S&P 500
sp500_data = yf.download(ticker_symbol, start=start_date, end=end_date)  # Download the data

# Extract only the date and closing price from the data
sp500_closing_data = sp500_data[['Close']]  # Focus on the 'Close' column

# Save the closing data to a CSV file
sp500_closing_data.to_csv('sp500_closing_data.csv')  # Write the DataFrame to a CSV file

# Show the first few rows as a quick sanity check
print("First few rows of S&P 500 closing data:")
print(sp500_closing_data.head())

# Display the number of calendar days in the requested range (see note above)
print(f"Total days of data successfully scraped: {total_days}")  # Print the total days

[*********************100%%**********************]  1 of 1 completed

First few rows of S&P 500 closing data:
                  Close
Date                   
2014-11-03  2017.810059
2014-11-04  2012.099976
2014-11-05  2023.569946
2014-11-06  2031.209961
2014-11-07  2031.920044
Total days of data successfully scraped: 3656





### Google Trend - Interest over time
### Code Explanation

This Python script retrieves historical **Google Trends** data for the keyword "bitcoin" and stores it as daily interest data over multiple 3-month intervals. This allows us to analyze trends in Google searches related to Bitcoin. Below is a detailed breakdown of each section in the code:

1. **Imports**:  
   - We import `TrendReq` from `pytrends`, which allows us to interact with the Google Trends API.
   - `pandas` is used for data manipulation, while `datetime` and `timedelta` help us work with date ranges.

2. **Setting Up Pytrends**:  
   - We initialize **Pytrends** with parameters such as `hl='en-US'` (language and region) and `tz=360` (timezone offset in minutes; pytrends uses 360 for US Central Standard Time, i.e. UTC-6).

3. **Defining Parameters**:  
   - We specify "bitcoin" as the keyword to be searched and set a date range starting from **November 1, 2014** to the present date.

4. **Looping Through 3-Month Intervals**:  
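   - Google Trends returns daily values only for short time frames, so we query in windows of roughly three months (90 days) from the start date to today’s date. Google Trends scales each request to a 0-100 range relative to the peak within that request, so values from different windows are not directly comparable without rescaling.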
   - For each interval, we set a **timeframe** and use `pytrends.build_payload` to fetch the data.
   - The fetched data is then appended to an accumulating **DataFrame** (`all_data`).

5. **Data Processing**:  
   - We remove the `isPartial` column if it exists, as it only indicates incomplete data for the final days in a period.
   - We reset the index of the DataFrame for a cleaner structure.

6. **Saving Data**:  
   - Finally, we save the combined dataset as a **CSV file** named `bitcoin_daily_google_trends.csv` in the local directory.
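
The windowing logic from step 4 can be sketched on its own, without calling Google Trends. The helper name `date_windows` and its `span_days` parameter are hypothetical and introduced only for this illustration.

```python
from datetime import date, timedelta


def date_windows(start, end, span_days=90):
    """Yield (window_start, window_end) pairs covering start..end without overlap."""
    current = start
    while current <= end:
        # Each window ends span_days after its start, or at the overall end date
        window_end = min(current + timedelta(days=span_days), end)
        yield current, window_end
        # The next window starts the day after this one ends
        current = window_end + timedelta(days=1)


windows = list(date_windows(date(2014, 11, 1), date(2015, 6, 30)))
for window_start, window_end in windows:
    # Same "YYYY-MM-DD YYYY-MM-DD" string layout used for the timeframe argument
    print(f"{window_start:%Y-%m-%d} {window_end:%Y-%m-%d}")
```

Keeping the date arithmetic in a small generator makes it easy to check that the windows are contiguous before any requests are sent.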

### Importance of Using Google Trends Data

Using **Google Trends** data, specifically the *interest over time* metric, is essential for studying the multivariate time series relationship between **Google search interest** and **Bitcoin prices**. The **daily interest scores** in Google Trends serve as a proxy for public interest, sentiment, or hype surrounding Bitcoin. 

Analyzing these trends allows us to identify whether there’s a correlation between spikes in search interest and Bitcoin's price movements. Google Trends data helps us gauge how shifts in public interest might influence Bitcoin's volatility and trading volume, providing valuable insights for both financial analysis and investment strategy development.

In [11]:
# Import necessary libraries
from pytrends.request import TrendReq  # For accessing Google Trends data
import pandas as pd  # For data manipulation
from datetime import datetime, timedelta  # For date handling

# Initialize pytrends and set up parameters
# 'hl' specifies the language (en-US for English, United States)
# 'tz' specifies the timezone offset in minutes (pytrends uses 360 for US Central Standard Time, UTC-6)
pytrends = TrendReq(hl='en-US', tz=360)

# Define the keyword to search for and the data range for the search
keyword = "bitcoin"  # Keyword we are analyzing on Google Trends
start_date = "2014-11-01"  # Start date for the data
end_date = datetime.today().strftime('%Y-%m-%d')  # Today's date as the end date

# Initialize an empty DataFrame to store the accumulated results
all_data = pd.DataFrame()

# Convert start and end dates to datetime objects to support date arithmetic
current_start = datetime.strptime(start_date, "%Y-%m-%d")
end_date = datetime.strptime(end_date, "%Y-%m-%d")

# Loop through each 3-month interval to get daily trend data as Google Trends allows only up to 90 days for daily data granularity
while current_start < end_date:
    # Define the end of the current 3-month period or up to the end date
    current_end = min(current_start + timedelta(days=90), end_date)
    
    # Format the time frame for Google Trends API
    timeframe = f"{current_start.strftime('%Y-%m-%d')} {current_end.strftime('%Y-%m-%d')}"
    
    # Build the payload and retrieve interest over time data
    # 'cat' is set to 0 (default), meaning no specific category filtering (or we can use cat=7 for finance)
    # 'geo' is empty, so no country-specific filtering is applied (global data)
    # 'gprop' is empty, meaning no specific Google property
    pytrends.build_payload([keyword], cat=0, timeframe=timeframe, geo='', gprop='')
    data = pytrends.interest_over_time()
    
    # Append the data for the current interval to the all_data DataFrame
    if not data.empty:
        all_data = pd.concat([all_data, data])

    # Move the start date forward by one day after the current_end to avoid overlap
    current_start = current_end + timedelta(days=1)

# Remove the 'isPartial' column if it exists, as it indicates incomplete data
if 'isPartial' in all_data.columns:
    all_data = all_data.drop(columns=['isPartial'])

# Reset the index for clean DataFrame formatting
all_data.reset_index(inplace=True)

# Save the combined data to a CSV file in the local directory
all_data.to_csv('bitcoin_daily_google_trends.csv', index=False)  # Save without row index



### Bitcoin Historical Data

### Code Explanation
This Python script retrieves daily historical **closing prices** for **Bitcoin** (in USD) from **Yahoo Finance** and saves them to a CSV file. Below is a breakdown of each part of the code:

1. **Define Ticker Symbol**:
   - We specify the ticker symbol `BTC-USD`, which corresponds to **Bitcoin in USD** on Yahoo Finance.

2. **Set the Date Range**:
   - `start_date` is set to `'2014-11-01'`, the start of our study period, matching the other series.
   - `end_date` is set to today’s date, ensuring that we retrieve the latest available data.

3. **Fetch Historical Data**:
   - We use `yf.download()` with the defined `ticker_symbol`, `start_date`, and `end_date` to download **historical Bitcoin data** for the specified date range.

4. **Extract Closing Prices**:
   - The data contains multiple columns (Open, High, Low, Close, etc.), but we are only interested in the **closing prices**. 
   - We extract the `Close` column and save it in a new DataFrame `btc_closing_data`.

5. **Save Data to CSV**:
   - We save the closing prices to a CSV file named **`bitcoin_closing_prices.csv`** in the local directory.

This data provides daily Bitcoin closing prices over the specified period, which is useful for analyzing **Bitcoin's historical performance** and studying its relationship with other financial or economic indicators.

In [12]:
# Import necessary libraries
import yfinance as yf  # For accessing financial data from Yahoo Finance
import pandas as pd  # For data manipulation and analysis

# Define the ticker symbol for Bitcoin
ticker_symbol = 'BTC-USD'  # Yahoo Finance ticker for Bitcoin in USD

# Set the start and end dates for the data retrieval
start_date = '2014-11-01'  # Starting date for historical Bitcoin data
end_date = pd.Timestamp.today().strftime('%Y-%m-%d')  # Today's date as the end date

# Fetch the historical data for Bitcoin
btc_data = yf.download(ticker_symbol, start=start_date, end=end_date)  # Download historical Bitcoin data

# Check if we received data and display the first available date if we did
if btc_data.empty:
    print("No data retrieved for the specified date range. Bitcoin data may not be available before 2014.")
else:
    # Extract only the closing prices from the data
    btc_closing_data = btc_data[['Close']]  # Keep only the 'Close' column

    # Calculate and display the total number of days of data retrieved
    total_days = btc_closing_data.shape[0]  # Count the number of rows in the DataFrame
    print(f"\nTotal days of data retrieved: {total_days}")  # Display total days
    print(f"Data starts from: {btc_closing_data.index.min().date()}")  # Display the earliest available date

    # Save the closing prices to a CSV file in the local directory
    btc_closing_data.to_csv('bitcoin_closing_prices.csv')  # Write the closing data to CSV file

[*********************100%%**********************]  1 of 1 completed


Total days of data retrieved: 3655
Data starts from: 2014-11-01





### Dataframe

In [18]:
import pandas as pd
from datetime import datetime
import holidays

# Define start and end dates
start_date = datetime(2014, 11, 1)
end_date = datetime.today()

# Generate a date range for column 1
dates = pd.date_range(start=start_date, end=end_date)

# Initialize the DataFrame with dates as the first column
df = pd.DataFrame(dates, columns=['date'])

# Load data from the specified CSV files
btc_data = pd.read_csv('bitcoin_closing_prices.csv', index_col=0, parse_dates=True)
sp500_data = pd.read_csv('sp500_closing_data.csv', index_col=0, parse_dates=True)
inflation_data = pd.read_csv('inflation_data.csv', index_col=0, parse_dates=True)
treasury_data = pd.read_csv('daily_treasury_rates.csv', index_col=0, parse_dates=True)
google_trends_data = pd.read_csv('bitcoin_daily_google_trends.csv', index_col=0, parse_dates=True)

# Rename columns for uniformity
btc_data.columns = ['bitcoin_closing_prices']
sp500_data.columns = ['sp500_closing_data']
inflation_data.columns = ['inflation_rate']
treasury_data.columns = ['daily_treasury_rates']
google_trends_data.columns = ['bitcoin_daily_google_trends']

# Merge all the data on the 'date' column of the main DataFrame using left join
df = df.merge(btc_data, how='left', left_on='date', right_index=True)
df = df.merge(sp500_data, how='left', left_on='date', right_index=True)
df = df.merge(inflation_data, how='left', left_on='date', right_index=True)
df = df.merge(treasury_data, how='left', left_on='date', right_index=True)
df = df.merge(google_trends_data, how='left', left_on='date', right_index=True)

# Set up holiday dates
us_holidays = holidays.US()  # U.S. holiday list

# Add a column to mark holidays (both public holidays and weekends)
df['is_holiday'] = df['date'].apply(
    lambda x: 1 if x in us_holidays or x.weekday() >= 5 else 0  # Mark as holiday if it's a public holiday or weekend
)

# Create an empty 'twitter_sentiments_score' column
df['twitter_sentiments_score'] = None

# Save the final dataset to CSV
df.to_csv('dataset.csv', index=False)

print("Dataset created and saved as 'dataset.csv'")

Dataset created and saved as 'dataset.csv'
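
The join and the holiday flag in the cell above can be illustrated on a one-week synthetic example. The values are made up, and `example_holidays` is a hypothetical stand-in for the `holidays.US()` lookup so that the sketch depends only on pandas.

```python
import pandas as pd

# Daily calendar for one week (Saturday 2014-11-01 to Friday 2014-11-07).
calendar = pd.DataFrame({"date": pd.date_range("2014-11-01", "2014-11-07")})

# Synthetic market series indexed by trading day only (illustrative values).
market = pd.DataFrame(
    {"sp500_closing_data": [2017.8, 2012.1, 2023.6]},
    index=pd.to_datetime(["2014-11-03", "2014-11-04", "2014-11-05"]),
)

# Left join keeps every calendar day; days without a quote become NaN.
merged = calendar.merge(market, how="left", left_on="date", right_index=True)

# Hypothetical holiday set, standing in for the `holidays.US()` lookup.
example_holidays = {pd.Timestamp("2014-11-06")}
merged["is_holiday"] = [
    1 if (day in example_holidays or day.weekday() >= 5) else 0
    for day in merged["date"]
]

print(merged)
```

The example shows why the merged dataset contains gaps: market series have no values on weekends and holidays, while Bitcoin trades every day, so these rows are the ones the missing-value step has to handle.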
