<div class="alert alert-block alert-info">
    
<center> 
    
# __Forecasting Economic and Market Regimes__ 
## __Data Preparation__ 
    
</center>

</div>

---

#### Author Information

* **Author:** ALI RAHIMI
* **Date:** 2023-08-20
* **Project:** Forecasting Economic and Market Regimes
* **Section:** Data Preparation

---

#### Note

All the functions required for the Data Preparation phase are defined in the **`data_preparation.py`** file. This dedicated module contains the tools and utilities necessary for data cleaning, feature engineering and labeling market regimes using 𝓁1-trend-filtering.

In [25]:
import sys
sys.path.append('../src')

import pandas as pd
import numpy as np

from data_preparation import DataCleaning
from data_preparation import FeatureEngineering
from data_preparation import TrendFiltering

from data_understanding import plot_regimes

## **1. Introduction**

Within the framework of the CRISP-DM methodology, the Data Preparation phase plays a pivotal role in shaping the data for subsequent analysis and modeling. This phase focuses on a series of tasks aimed at preparing the data for predictive modeling in a structured and methodical manner. The Data Preparation phase in our project includes the following key tasks:

* **[Data Cleaning:](#section2)** This task is concerned with identifying and addressing missing data within the features dataset. It involves the removal of features with more than 10 missing values and the application of forward filling to handle the remaining missing values.

* **[Feature Engineering:](#section3)** Feature engineering involves creating new features or transforming existing ones to enhance the predictive capabilities of our models. This includes generating lagged values of economic features, or applying transformations to make time series data stationary.

* **[Labeling:](#section4)** Labeling is the process of assigning target variables to the dataset. In our case, it involves associating market regime labels with the data points. 

* **[Data Integration:](#section5)** Data integration refers to the combining of features and labels from different sources to construct the final dataset. This merged dataset will be used for training and testing classification models to address the prediction objectives.

Through these tasks, the Data Preparation phase establishes a reliable and structured foundation for subsequent analysis and modeling. It ensures that the data is clean, complete, and appropriately structured to be fed into predictive models effectively.

For reference, the raw data assets are stored in the following locations:

| Data Name | Path and File Name |
| --------- | -------------- |
| FRED-MD Features Descriptions    | data/raw_data/fredmd_feat_spec.csv   |
| FRED-MD Features Panel    | data/raw_data/fredmd_feat_ds.csv   |
| NBER Regimes    | data/raw_data/nber_regimes.csv   |
| S&P500 Monthly OHLC | data/raw_data/SP500_monthly.csv   |

---

## **2. Data Cleaning** <a id="section2"></a>

Data cleaning is the first step of the preparation. Because FRED-MD is a high-quality dataset, our main concern in this phase is the __identification and handling of missing data__, so that the subsequent analysis and modeling rest on accurate and reliable information.

We first identify features with a notable number of missing values. Features with __more than 10 missing values__ are excluded from the analysis, and rows (months) with more than 10 missing features are removed in the same way. This trims the dataset while preserving the features that are well observed.

We then address the remaining gaps with a __forward-filling strategy__: each missing value is replaced by the most recent observed value of the same feature. Unlike interpolation, forward filling uses only past observations, so it does not leak future information into a given time point. The result is a gap-free series for every retained feature.
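The two cleaning rules can be sketched in plain pandas. This is a minimal illustration on a toy panel, assuming that `remove_null_features` and `fill_null_obs` behave as described above; the internals of `DataCleaning` are not shown in this notebook, and the names `panel`, `kept` and `cleaned` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy monthly panel (hypothetical data): "A" has one gap, "B" has eleven.
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
panel = pd.DataFrame(
    {"A": np.arange(24, dtype=float), "B": np.arange(24, dtype=float)},
    index=dates,
)
panel.iloc[5, 0] = np.nan      # single missing value in A
panel.iloc[2:13, 1] = np.nan   # 11 missing values in B

max_null = 10

# Step 1: keep only the features with at most `max_null` missing values.
null_counts = panel.isna().sum()
kept = panel.loc[:, null_counts <= max_null]

# Step 2: forward-fill, so each gap takes the last observed value.
cleaned = kept.ffill()

print(cleaned.columns.tolist())
print(cleaned.iloc[4:7])
```

Here column `B` is dropped because it exceeds the threshold, and the gap in `A` is filled with the value from the previous month.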

In [26]:
# Read the raw features dataset
raw_features_df = pd.read_csv('../data/raw_data/fredmd_feat_ds.csv')

df = DataCleaning(data=raw_features_df)

# remove rows with more than 10 null features
df.remove_null_rows(max_null=10, inplace=True)

# remove features with more than 10 null rows
df.remove_null_features(max_null=10, inplace=True)

# Forward fill null observations 
df.fill_null_obs(inplace=True)

# save the cleaned dataframe
cleaned_features_df = df.data


# Change the date column name
cleaned_features_df.rename(columns={'sasdate': 'Date'}, inplace=True)

cleaned_features_df

Unnamed: 0,Date,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,IPCONGD,...,PCEPI,DDURRG3M086SBEA,DNDGRG3M086SBEA,DSERRG3M086SBEA,CES0600000008,CES2000000008,CES3000000008,DTCOLNVHFNM,DTCTHFNM,INVEST
0,Transform:,5.000,5.0,5.000,5.000000e+00,5.00000,5.0000,5.0000,5.0000,5.0000,...,6.000,6.000,6.000,6.000,6.00,6.00,6.00,6.00,6.00,6.0000
1,1/1/1959,2583.560,2426.0,15.188,2.766768e+05,17689.23968,21.9616,23.3868,22.2620,31.6664,...,15.164,63.517,18.294,10.152,2.13,2.45,2.04,6476.00,12298.00,84.2043
2,2/1/1959,2593.596,2434.8,15.346,2.787140e+05,17819.01912,22.3917,23.7024,22.4549,31.8987,...,15.179,63.554,18.302,10.167,2.14,2.46,2.05,6476.00,12298.00,83.5280
3,3/1/1959,2610.396,2452.7,15.491,2.777753e+05,17967.91336,22.7142,23.8459,22.5651,31.8987,...,15.189,63.634,18.289,10.185,2.15,2.45,2.07,6508.00,12349.00,81.6405
4,4/1/1959,2627.446,2470.0,15.435,2.833627e+05,17978.97983,23.1981,24.1903,22.8957,32.4019,...,15.219,63.698,18.300,10.221,2.16,2.47,2.08,6620.00,12484.00,81.8099
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
791,11/1/2024,20091.169,16376.8,122.396,1.545040e+06,712145.00000,101.9619,99.3808,98.8609,100.8691,...,124.399,105.391,119.230,129.380,31.59,36.26,28.22,556011.41,938335.20,5381.4576
792,12/1/2024,20101.629,16387.7,123.077,1.558008e+06,717662.00000,103.1177,100.4976,99.9719,101.6868,...,124.769,104.883,119.746,129.875,31.72,36.43,28.33,559364.75,943484.76,5366.6686
793,1/1/2025,20148.969,16391.2,122.614,1.543178e+06,711461.00000,103.3418,101.0766,100.6319,102.1879,...,125.231,105.209,120.457,130.281,31.91,36.56,28.58,559087.09,944167.06,5350.2541
794,2/1/2025,20209.351,16389.5,122.742,1.556553e+06,711680.00000,104.2202,101.8233,101.4377,102.7245,...,125.788,105.634,120.615,130.990,32.00,36.66,28.68,556142.06,941199.49,5367.9408


Now the data is cleaned and ready for feature engineering.

---

## **3. Feature Engineering** <a id="section3"></a>

In this phase, we enhance the predictive power of our features through systematic transformations and the introduction of lagged features. The goal is to ensure feature stability and enrich their predictive capabilities.

1. **Applying Transformations**: We leverage the transformation codes from the **FRED-MD documentation** to systematically modify the features. These transformation codes encompass various mathematical operations such as logarithmic transformations, differencing operations, and ratios. These transformations serve to make the features stationary, maintaining their statistical properties over time. The transformations are defined as follows:

| TransCode | Transformation |
| --------- | -------------- |
| 1    | No Transformation   |
| 2    | $\Delta x_{t}$    |
| 3    | $\Delta^2 x_{t}$  |
| 4    | $\log(x_{t})$     |
| 5    | $\Delta \log(x_{t})$  |
| 6    | $\Delta^2 \log(x_{t})$  |
| 7    | $\Delta\left(\frac{x_{t}}{x_{t-1}} - 1\right)$  |

2. **Introducing Lagged Features**: Following the application of transformations, we expand our feature space by introducing lagged features. These lagged versions of the features provide historical context, capturing past values at different time intervals. We consider lag intervals of 1, 3, 6, 9, and 12 months to encompass various temporal dynamics.

The combination of transformed and lagged features enhances our models' ability to capture both short-term fluctuations and long-term trends present in economic and market data. Our transformation and feature engineering approach aligns with the guidelines established during the Data Understanding phase, ensuring consistency and accuracy throughout the process. This enhanced feature set equips our predictive models to better identify patterns and relationships essential for accurate forecasting.
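The table of transformation codes and the lagging step can be written out as a short sketch. It assumes the codes map to pandas operations as shown and that lagged columns are named `"<feature> <k>M Lag"`, which matches the column names in the output of this notebook; `apply_tcode` and `add_lags` are hypothetical helpers, not the methods of `FeatureEngineering`.

```python
import numpy as np
import pandas as pd

def apply_tcode(x: pd.Series, tcode: int) -> pd.Series:
    """Apply one transformation code (1 to 7) from the table above."""
    if tcode == 1:
        return x                                # no transformation
    if tcode == 2:
        return x.diff()                         # first difference
    if tcode == 3:
        return x.diff().diff()                  # second difference
    if tcode == 4:
        return np.log(x)                        # log level
    if tcode == 5:
        return np.log(x).diff()                 # log first difference
    if tcode == 6:
        return np.log(x).diff().diff()          # log second difference
    if tcode == 7:
        return (x / x.shift(1) - 1.0).diff()    # change in growth rate
    raise ValueError(f"unknown transformation code: {tcode}")

def add_lags(df: pd.DataFrame, lags: list) -> pd.DataFrame:
    """Append a shifted copy of every column for each lag in `lags`."""
    lagged = {
        f"{col} {k}M Lag": df[col].shift(k)
        for col in df.columns
        for k in lags
    }
    return pd.concat([df, pd.DataFrame(lagged, index=df.index)], axis=1)

# A series growing 10% per period: code 5 gives log(1.1), code 7 gives 0.
x = pd.Series([100.0, 110.0, 121.0, 133.1])
growth = apply_tcode(x, 5)
accel = apply_tcode(x, 7)

toy = pd.DataFrame({"RPI": [1.0, 2.0, 3.0, 4.0]})
with_lags = add_lags(toy, [1, 3])
```

Differencing and lagging both introduce leading missing values: a twice-differenced series loses two observations and a 12-month lag loses twelve more. This is consistent with the processed features starting in 1960-03 while the raw panel starts in 1959-01.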

In [27]:
df = FeatureEngineering(data = cleaned_features_df)

# apply transformations to the cleaned dataset of features
df.transform_features()

# add new lagged features to the dataset
df.add_lagged_features(lag_values = [1,3,6,9,12])

# Save features dataset in a new dataframe and save it into a new .csv file for future uses
final_features_df = df.data
final_features_df.to_csv('../data/processed_data/feats_ds.csv', index=False)

# Show final features dataframe
final_features_df.set_index('Date')

Date,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,IPCONGD,IPDCONGD,...,DTCTHFNM 1M Lag,DTCTHFNM 3M Lag,DTCTHFNM 6M Lag,DTCTHFNM 9M Lag,DTCTHFNM 12M Lag,INVEST 1M Lag,INVEST 3M Lag,INVEST 6M Lag,INVEST 9M Lag,INVEST 12M Lag
1960-03-01,0.003024,0.001811,0.018449,-0.013636,0.002594,-0.017958,-0.009113,-0.004587,-0.010303,-0.033823,...,0.006099,0.018233,0.034111,0.034793,0.004138,-0.037926,-0.000688,-0.027018,-0.038791,-0.030921
1960-04-01,0.005341,0.004518,0.029391,-0.018187,0.024801,-0.016978,-0.001141,0.000000,0.008033,-0.021156,...,0.012437,0.012450,0.024419,0.042800,0.015011,-0.053379,-0.005654,-0.014082,-0.021701,-0.020784
1960-05-01,0.005870,0.006039,-0.005049,-0.021665,0.010857,-0.009099,0.007979,0.008015,0.012578,0.008270,...,0.018840,0.006099,0.016845,0.041298,0.023766,-0.025753,-0.037926,-0.013266,-0.017855,-0.011197
1960-06-01,0.003169,0.002192,-0.020601,-0.022965,-0.016411,-0.013795,-0.005721,-0.004603,-0.001143,-0.001185,...,0.022225,0.012437,0.018233,0.034111,0.034793,0.002181,-0.053379,-0.000688,-0.027018,-0.038791
1960-07-01,0.002555,0.001680,0.001811,0.001687,-0.011463,-0.016132,-0.017171,-0.017253,-0.016035,-0.049437,...,0.027577,0.018840,0.012450,0.024419,0.042800,-0.004001,-0.025753,-0.005654,-0.014082,-0.021701
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-11-01,0.004875,0.005738,0.005785,0.002420,0.012812,-0.006194,-0.010413,-0.012851,-0.012110,-0.014724,...,0.004309,0.006544,0.004752,0.000028,0.005452,0.014688,0.017336,-0.006419,0.003853,-0.012474
2024-12-01,0.002319,0.003172,0.010011,0.012628,0.014101,0.008804,0.008822,0.009136,0.001767,-0.001381,...,0.004327,0.004192,0.002350,0.000110,0.006046,0.002098,0.017080,0.014894,0.018040,0.010490
2025-01-01,0.002873,0.000879,0.001780,-0.001206,-0.000961,0.013443,0.016920,0.017755,0.012990,-0.035773,...,0.006794,0.004309,0.003307,0.003053,0.004203,-0.007658,0.014688,0.019989,0.009105,0.017585
2025-02-01,0.005345,0.000110,-0.002726,-0.000934,-0.008370,0.010635,0.013105,0.014556,0.010153,0.023269,...,0.006196,0.004327,0.006544,0.004752,0.000028,-0.005815,0.002098,0.017336,-0.006419,0.003853


---

## __4. Labeling__ <a id="section4"></a>


### __4.1. Economic Regimes: Expansion and Contraction__

In this project we use supervised binary classification models, which require labels for training. The first label classifies the business cycle into expansion and contraction, as dated by the National Bureau of Economic Research (NBER) in the US. In the NBER series loaded below, label 0 marks expansion/normal periods and label 1 marks contraction/recession periods, the same 0 = normal, 1 = crash convention used for the market regimes. Because the labels come directly from NBER, no separate labeling step is needed for economic regimes.

In [28]:
econ_regimes = pd.read_csv('../data/raw_data/nber_regimes.csv', parse_dates = ['Date']) 
econ_regimes.set_index('Date')

Date,EconRegime
1854-12-01,1
1855-01-01,0
1855-02-01,0
1855-03-01,0
1855-04-01,0
...,...
2025-01-01,0
2025-02-01,0
2025-03-01,0
2025-04-01,0


### __4.2. Market Regimes: Normal and Crash__

For the labeling of market regime periods, the responsibility falls upon us. To accomplish this, we embrace the market regime definition provided by the primary stakeholders of the project, namely asset managers and investment professionals. According to their perspective, **a market regime signifies a span of time characterized by the stability of market performance attributes**.

To label these periods, we apply the **𝓁1-trend-filtering algorithm** to the S&P500 total return index. The algorithm fits a series that tracks the data while penalizing changes between consecutive points, and this fitted series serves as the signal of the underlying trend. It is obtained by solving the following **optimization problem**:

\begin{equation*} 
    \hat{\beta} = \text{argmin}_{\beta \in \mathbb{R}^n} ||x-\beta||_2^2 + \lambda||D\beta||_1 
\end{equation*}

where,

\begin{equation*} 
D =
    \begin{bmatrix}
       1 & -1 & 0 & \dots & 0 & 0  \\
       0  & 1 &-1 & \dots & 0 & 0 \\
       \vdots \\
       0  & 0 & 0 & \dots & -1 & 0\\
       0  & 0 & 0 & \dots & 1 & -1
    \end{bmatrix}
\in \mathbb{R}^{(n-1)\times n}
\end{equation*}

The value of λ is set to 0.16. It was chosen by trial and error together with the stakeholders, who drew on their experience and on the data and visualizations we provided, and who assessed the effect of this penalization coefficient on the resulting labels using their own heuristic approach.

It's important to clarify that the labels assigned to these market regimes are as follows: normal regimes are labeled as 0, while crash regimes are labeled as 1.
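To make the optimization problem concrete, the following sketch solves it with a small ADMM loop in NumPy. Several points are assumptions rather than facts about `TrendFiltering`: the solver choice, the use of a monthly return series as input `x`, the rule that a negative fitted level marks a crash, and the toy value `lam=0.01` (not the stakeholders' 0.16). The function name `l1_filter_admm` is hypothetical.

```python
import numpy as np

def l1_filter_admm(x, lam, rho=1.0, n_iter=2000):
    """Minimise ||x - beta||_2^2 + lam * ||D beta||_1 with ADMM."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # First-difference matrix with rows (1, -1), shape (n-1, n), as D above.
    D = -np.diff(np.eye(n), axis=0)
    # The beta-update solves (2I + rho D^T D) beta = 2x + rho D^T (z - u);
    # the system matrix is fixed, so it is inverted once.
    A_inv = np.linalg.inv(2.0 * np.eye(n) + rho * D.T @ D)
    z = np.zeros(n - 1)   # auxiliary variable standing in for D @ beta
    u = np.zeros(n - 1)   # scaled dual variable
    for _ in range(n_iter):
        beta = A_inv @ (2.0 * x + rho * D.T @ (z - u))
        v = D @ beta + u
        # Soft-thresholding is the proximal step for the l1 penalty.
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        u = u + D @ beta - z
    return beta

# Toy returns (hypothetical): calm, then four bad months, then calm again.
returns = np.array([0.02] * 6 + [-0.05] * 4 + [0.02] * 6)
beta = l1_filter_admm(returns, lam=0.01)

# Assumed labeling rule: 1 (crash) where the fitted level is negative.
labels = (beta < 0).astype(int)
print(labels)
```

A larger λ penalizes jumps more heavily and merges short episodes into their neighbours, which is why the choice of λ directly shapes the resulting regime labels.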

In [29]:
# Read SP500 monthly OHLC data
sp500_mn_df = pd.read_csv('../data/raw_data/SP500_monthly.csv', index_col='Date')

# Apply l1-trend-filtering
df = TrendFiltering(mkt_data = sp500_mn_df)
mkt_regimes = df.l1_trend_filter()

# Save data into a CSV file
mkt_regimes.to_csv('../data/processed_data/mkt_regimes.csv')

# Show Data
mkt_regimes

Date,MktRegime
1950-02-01,0
1950-03-01,0
1950-04-01,0
1950-05-01,0
1950-06-01,0
...,...
2025-01-01,0
2025-02-01,0
2025-03-01,0
2025-04-01,0


In [30]:
# Concat sp500 and crash data in a new dataframe
plot_df = pd.concat([sp500_mn_df[['Close']], mkt_regimes], axis=1).dropna()
plot_df.reset_index(inplace=True)

# Plot crash periods and SP500 closing price
plot_regimes(date_series = plot_df['Date'],
             data_series = plot_df['Close'], 
             regime_series = plot_df['MktRegime'],
             area_name = 'Crash',
             data_name = 'S&P500',
             x_axis_title = 'Date',
             y_axis_title = 'S&P500',
             plot_title = 'Crash Periods over S&P 500 Monthly Closing Price',
             log_scale = True,
             line_color = 'blue',
             width = 1200,
             height = 600)

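*Figure not rendered: "Crash Periods over S&P 500 Monthly Closing Price" (log scale, crash periods shaded). The cell raised `ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed`; installing `nbformat` 4.2.0 or later and rerunning the cell should display the plot.*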

---

## __5. Data Integration__ <a id="section5"></a>

In this step, we will integrate our features with labels to construct the dataset for training and testing the classification models that will be used to address the prediction problem at hand.

To do this, we will combine the feature data with the corresponding labels, ensuring that each data point is correctly associated with its corresponding target label. This integrated dataset will serve as the foundation for training and evaluating our classification models.

With this unified dataset, the machine learning algorithms can learn from the patterns in the features and predict the associated labels. Two such datasets are built below: one for classifying business cycle expansions and contractions, and one for classifying market regimes.

### __5.1. Economic Regimes Dataset__ <a id="section51"></a>

To obtain a dataset containing both features and labels of economic regimes, we merge two dataframes and store the resultant dataset in a CSV file for use in the subsequent modeling phases.
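The merge pattern can be illustrated on toy frames. This is a minimal sketch with hypothetical values, assuming both frames carry a datetime `Date` column; it shows how a left merge keeps every feature row and how `dropna` then removes rows without a matching label.

```python
import pandas as pd

# Hypothetical feature and label frames with partially overlapping dates.
features = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "RPI": [0.001, 0.002, -0.004],
})
labels = pd.DataFrame({
    "Date": pd.to_datetime(["2020-02-01", "2020-03-01", "2020-04-01"]),
    "EconRegime": [0, 1, 0],
})

# Left merge: every feature row is kept; missing labels become NaN.
merged = features.merge(labels, on="Date", how="left")

# Drop rows without a label, then restore the integer label type,
# which the NaN introduced by the merge had turned into float.
dataset = merged.set_index("Date").dropna()
dataset["EconRegime"] = dataset["EconRegime"].astype(int)

print(dataset)
```

Both `Date` columns need the same dtype for the keys to match, which is why the market regime index is converted with `pd.to_datetime` before its merge.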

In [31]:
# Merge features and economic labels, set 'Date' as index and drop NaN
econ_regs_ds = final_features_df.merge(econ_regimes, on='Date', how='left')
econ_regs_ds.set_index('Date', inplace=True)
econ_regs_ds.dropna(inplace=True)

# Save data into a CSV file
econ_regs_ds.to_csv('../data/datasets/econ_regs_ds.csv')

# Show dataset
econ_regs_ds

Date,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,IPCONGD,IPDCONGD,...,DTCTHFNM 3M Lag,DTCTHFNM 6M Lag,DTCTHFNM 9M Lag,DTCTHFNM 12M Lag,INVEST 1M Lag,INVEST 3M Lag,INVEST 6M Lag,INVEST 9M Lag,INVEST 12M Lag,EconRegime
1960-03-01,0.003024,0.001811,0.018449,-0.013636,0.002594,-0.017958,-0.009113,-0.004587,-0.010303,-0.033823,...,0.018233,0.034111,0.034793,0.004138,-0.037926,-0.000688,-0.027018,-0.038791,-0.030921,0
1960-04-01,0.005341,0.004518,0.029391,-0.018187,0.024801,-0.016978,-0.001141,0.000000,0.008033,-0.021156,...,0.012450,0.024419,0.042800,0.015011,-0.053379,-0.005654,-0.014082,-0.021701,-0.020784,0
1960-05-01,0.005870,0.006039,-0.005049,-0.021665,0.010857,-0.009099,0.007979,0.008015,0.012578,0.008270,...,0.006099,0.016845,0.041298,0.023766,-0.025753,-0.037926,-0.013266,-0.017855,-0.011197,1
1960-06-01,0.003169,0.002192,-0.020601,-0.022965,-0.016411,-0.013795,-0.005721,-0.004603,-0.001143,-0.001185,...,0.012437,0.018233,0.034111,0.034793,0.002181,-0.053379,-0.000688,-0.027018,-0.038791,1
1960-07-01,0.002555,0.001680,0.001811,0.001687,-0.011463,-0.016132,-0.017171,-0.017253,-0.016035,-0.049437,...,0.018840,0.012450,0.024419,0.042800,-0.004001,-0.025753,-0.005654,-0.014082,-0.021701,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-11-01,0.004875,0.005738,0.005785,0.002420,0.012812,-0.006194,-0.010413,-0.012851,-0.012110,-0.014724,...,0.006544,0.004752,0.000028,0.005452,0.014688,0.017336,-0.006419,0.003853,-0.012474,0
2024-12-01,0.002319,0.003172,0.010011,0.012628,0.014101,0.008804,0.008822,0.009136,0.001767,-0.001381,...,0.004192,0.002350,0.000110,0.006046,0.002098,0.017080,0.014894,0.018040,0.010490,0
2025-01-01,0.002873,0.000879,0.001780,-0.001206,-0.000961,0.013443,0.016920,0.017755,0.012990,-0.035773,...,0.004309,0.003307,0.003053,0.004203,-0.007658,0.014688,0.019989,0.009105,0.017585,0
2025-02-01,0.005345,0.000110,-0.002726,-0.000934,-0.008370,0.010635,0.013105,0.014556,0.010153,0.023269,...,0.004327,0.006544,0.004752,0.000028,-0.005815,0.002098,0.017336,-0.006419,0.003853,0


### __5.2. Market Regimes Dataset__ <a id="section52"></a>

To obtain a dataset containing both features and labels of market regimes, we merge two dataframes and store the resultant dataset in a CSV file for use in the subsequent modeling phases.

In [32]:
# Convert index type for merge
mkt_regimes.index = pd.to_datetime(mkt_regimes.index)

# Merge features and market labels, set 'Date' as index and drop NaN
mkt_regs_ds = final_features_df.merge(mkt_regimes.reset_index(), on='Date', how='left')
mkt_regs_ds.set_index('Date', inplace=True)
mkt_regs_ds.dropna(inplace=True)

# Save data into a CSV file
mkt_regs_ds.to_csv('../data/datasets/mkt_regs_ds.csv')

# Show dataset
mkt_regs_ds

Date,RPI,W875RX1,DPCERA3M086SBEA,CMRMTSPLx,RETAILx,INDPRO,IPFPNSS,IPFINAL,IPCONGD,IPDCONGD,...,DTCTHFNM 3M Lag,DTCTHFNM 6M Lag,DTCTHFNM 9M Lag,DTCTHFNM 12M Lag,INVEST 1M Lag,INVEST 3M Lag,INVEST 6M Lag,INVEST 9M Lag,INVEST 12M Lag,MktRegime
1960-03-01,0.003024,0.001811,0.018449,-0.013636,0.002594,-0.017958,-0.009113,-0.004587,-0.010303,-0.033823,...,0.018233,0.034111,0.034793,0.004138,-0.037926,-0.000688,-0.027018,-0.038791,-0.030921,0
1960-04-01,0.005341,0.004518,0.029391,-0.018187,0.024801,-0.016978,-0.001141,0.000000,0.008033,-0.021156,...,0.012450,0.024419,0.042800,0.015011,-0.053379,-0.005654,-0.014082,-0.021701,-0.020784,0
1960-05-01,0.005870,0.006039,-0.005049,-0.021665,0.010857,-0.009099,0.007979,0.008015,0.012578,0.008270,...,0.006099,0.016845,0.041298,0.023766,-0.025753,-0.037926,-0.013266,-0.017855,-0.011197,0
1960-06-01,0.003169,0.002192,-0.020601,-0.022965,-0.016411,-0.013795,-0.005721,-0.004603,-0.001143,-0.001185,...,0.012437,0.018233,0.034111,0.034793,0.002181,-0.053379,-0.000688,-0.027018,-0.038791,0
1960-07-01,0.002555,0.001680,0.001811,0.001687,-0.011463,-0.016132,-0.017171,-0.017253,-0.016035,-0.049437,...,0.018840,0.012450,0.024419,0.042800,-0.004001,-0.025753,-0.005654,-0.014082,-0.021701,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-11-01,0.004875,0.005738,0.005785,0.002420,0.012812,-0.006194,-0.010413,-0.012851,-0.012110,-0.014724,...,0.006544,0.004752,0.000028,0.005452,0.014688,0.017336,-0.006419,0.003853,-0.012474,0
2024-12-01,0.002319,0.003172,0.010011,0.012628,0.014101,0.008804,0.008822,0.009136,0.001767,-0.001381,...,0.004192,0.002350,0.000110,0.006046,0.002098,0.017080,0.014894,0.018040,0.010490,0
2025-01-01,0.002873,0.000879,0.001780,-0.001206,-0.000961,0.013443,0.016920,0.017755,0.012990,-0.035773,...,0.004309,0.003307,0.003053,0.004203,-0.007658,0.014688,0.019989,0.009105,0.017585,0
2025-02-01,0.005345,0.000110,-0.002726,-0.000934,-0.008370,0.010635,0.013105,0.014556,0.010153,0.023269,...,0.004327,0.006544,0.004752,0.000028,-0.005815,0.002098,0.017336,-0.006419,0.003853,0


## __6. Final Note__ <a id="section6"></a>

Up to this point, we have successfully created datasets for our project and stored them in the <code>../data/datasets/</code> directory. In the upcoming sections, we will leverage this data to progress through the subsequent phases of the CRISP-DM methodology. For your convenience, we have provided a summary of the CSV file names below:

| Data Name | Path and File Name |
| --------- | -------------- |
| Economic Regimes Dataset    | ../data/datasets/econ_regs_ds.csv   |
| Market Regimes Dataset    | ../data/datasets/mkt_regs_ds.csv   |