<div style="text-align: center;">
    <h1>CPI Prediction Using Linear Regression</h1>
</div>

# 1. Introduction

The Consumer Price Index (CPI) is a critical economic indicator that measures the average change in prices paid by consumers for a basket of goods and services over time. It is a widely used indicator of inflation and deflation, closely followed by policymakers, financial markets, businesses, and consumers. 

In this assignment, we aim to predict the CPI using past monthly data and evaluate the performance of Linear Regression, Ridge Regression, and Lasso Regression models. The features are economic indicators: past CPI values, output gap, unemployment rate, interest rates, money supply, wage growth, commodity prices, exchange rates, and one additional indicator. The goal is to identify the most effective model for forecasting CPI and to understand the relevance of different indicators in predicting inflation trends.

# 2. Regression Models

## 2.1 Preliminary Regression Model

The preliminary regression model can be illustrated as follows:

$$
CPI_t = \alpha + \beta_1 \cdot{CPI}_{t-1} + \beta_2 \cdot {OutputGap}_t + \beta_3 \cdot {UnemploymentRate}_t + \beta_4 \cdot {InterestRates}_t + \beta_5 \cdot {MoneySupply}_t \\+ \beta_6 \cdot {WageGrowth}_t + \beta_7 \cdot {CommodityPrices}_t + \beta_8 \cdot {ExchangeRates}_t + \epsilon_t
$$

- $\beta$: Coefficients representing the sensitivity of CPI to each respective feature.
- $\alpha$: Intercept term, representing the baseline level of CPI when all other factors are zero.
- $CPI_t$: Consumer Price Index at time $t$.
- $CPI_{t-1}$: Historical CPI data, representing the CPI from the previous month. It captures the persistence of inflation.
- $OutputGap_t$: Difference between actual and potential economic output. A positive output gap can lead to higher inflation. The output gap is calculated as $(Y - Y^*)/Y^*$, where $Y$ is actual output and $Y^*$ is potential output.
- $UnemploymentRate_t$: Unemployment rate, reflecting labor market conditions. A lower unemployment rate can lead to higher inflation, based on the Phillips curve.
- $InterestRates_t$: Central bank policy rates, such as the Federal Funds Rate, which influence inflation through monetary policy.
- $MoneySupply_t$: Measures like M2 (a broad measure of money supply) can impact inflation.
- $WageGrowth_t$: Growth in wages, indicating changes in consumer spending power. Increases in wages can lead to higher consumer spending and inflation.
- $CommodityPrices_t$: Prices of key commodities like oil and food can directly affect inflation.
- $ExchangeRates_t$: Changes in exchange rates can influence import prices and thus inflation.
- $\epsilon_t$: Error term, capturing the variance in CPI not explained by the model.
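
As a small illustration of two of the definitions above, the sketch below computes an output gap from hypothetical actual and potential output figures and builds a one-month lag of a short, made-up CPI series. The numbers and the helper name `output_gap` are assumptions for illustration only; they are not values from the dataset.

```python
import pandas as pd

def output_gap(actual, potential):
    """Return (Y - Y*) / Y* for actual output Y and potential output Y*."""
    return (actual - potential) / potential

# Hypothetical output levels: actual output 2% above potential
gap = output_gap(actual=102.0, potential=100.0)

# Hypothetical CPI levels for four consecutive months
cpi = pd.Series([100.0, 100.5, 101.0, 101.2], name="CPI")
past_cpi = cpi.shift(1)  # CPI_{t-1}; the first month has no predecessor (NaN)
```

The missing first value of the lagged series is why one extra month of CPI history is needed before the first month being modeled.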


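## 2.2 Regression Model with an Additional Feature

The extended model adds the Consumer Confidence Index (CCI) to the preliminary model as a ninth feature:
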
$$
CPI_t = \alpha + \beta_1 \cdot{CPI}_{t-1} + \beta_2 \cdot {OutputGap}_t + \beta_3 \cdot {UnemploymentRate}_t + \beta_4 \cdot {InterestRates}_t + \beta_5 \cdot {MoneySupply}_t \\ + \beta_6 \cdot {WageGrowth}_t + \beta_7 \cdot {CommodityPrices}_t + \beta_8 \cdot {ExchangeRates}_t + \beta_9 \cdot {CCI}_t + \epsilon_t
$$

- $CCI_t$: Consumer Confidence Index at time $t$. It measures the degree of optimism that consumers feel about the overall state of the economy and their personal financial situation.

# 3. Data Preparation

## 3.1 Data Description and Source
<div style="text-align: left;">
  <table style="width: 100%; border-collapse: collapse; table-layout: auto; word-wrap: break-word;">
    <thead>
      <tr>
        <th style="text-align: left; padding: 8px;">Data</th>
        <th style="text-align: left; padding: 8px;">Description</th>
        <th style="text-align: left; padding: 8px;">Source</th>
        <th style="text-align: left; padding: 8px;">Link</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="text-align: left; padding: 8px;">CPI</td>
        <td style="text-align: left; padding: 8px;">CPI-U measures the average change over time in the prices paid by urban consumers for a market basket of goods and services.</td>
        <td style="text-align: left; padding: 8px;">U.S. Bureau of Labor Statistics (BLS)</td>
        <td style="text-align: left; padding: 8px;"><a href="https://data.bls.gov/timeseries/CUSR0000SA0&output_view=pct_1mth">Link</a></td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Output Gap</td>
        <td style="text-align: left; padding: 8px;">This measures the difference between an economy's actual and potential output, as an indicator of resource utilization. A positive gap suggests overheating and potential inflation, while a negative gap indicates underutilization and room for growth without inflationary pressures.</td>
        <td style="text-align: left; padding: 8px;">International Monetary Fund</td>
        <td style="text-align: left; padding: 8px;"><a href="https://www.imf.org/en/Publications/WEO/weo-database/2024/October/weo-report?c=111,&s=NGAP_NPGDP,&sy=1980&ey=2029&ssm=0&scsm=1&scc=0&ssd=1&ssc=0&sic=0&sort=country&ds=.&br=1">Link</a></td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Unemployment Rate</td>
        <td style="text-align: left; padding: 8px;">This measures the percentage of the labor force actively seeking employment but currently without work. It reflects labor market conditions and economic health, with a high rate indicating economic downturns and a low rate suggesting robust job availability.</td>
        <td style="text-align: left; padding: 8px;">U.S. Bureau of Labor Statistics (BLS)</td>
        <td style="text-align: left; padding: 8px;"><a href="https://data.bls.gov/timeseries/LNS14000000">Link</a></td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Interest Rates</td>
        <td style="text-align: left; padding: 8px;">This is the interest rate at which banks lend to each other overnight. Set by the Federal Reserve, it influences short-term interest rates and broader economic conditions, impacting borrowing costs and economic activity.</td>
        <td style="text-align: left; padding: 8px;">Federal Reserve Economic Data (FRED)</td>
        <td style="text-align: left; padding: 8px;"><a href="https://fred.stlouisfed.org/series/DFF">Link</a></td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Money Supply</td>
        <td style="text-align: left; padding: 8px;">This measures the total amount of currency in circulation, including cash, checking deposits, savings deposits, money market accounts, and other liquid assets. Adjusted for seasonal variations, it reflects the overall liquidity in the economy. Changes in M2 can indicate shifts in monetary policy and economic activity.</td>
        <td style="text-align: left; padding: 8px;">Federal Reserve Board</td>
        <td style="text-align: left; padding: 8px;"><a href="https://www.federalreserve.gov/datadownload/Download.aspx?rel=H6&series=798e2796917702a5f8423426ba7e6b42&lastobs=&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package">Link</a></td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Wage Growth</td>
        <td style="text-align: left; padding: 8px;">This measures the increase in workers' wages over time. It reflects changes in labor market conditions and economic productivity. Higher wage growth indicates improved worker earnings and potential inflationary pressures.</td>
        <td style="text-align: left; padding: 8px;">Federal Reserve Bank of Atlanta</td>
        <td style="text-align: left; padding: 8px;"><a href="https://www.atlantafed.org/chcs/wage-growth-tracker">Link</a></td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Commodity Prices</td>
        <td style="text-align: left; padding: 8px;">This measures changes in the prices of raw materials and primary agricultural products. We used the All Commodity Price Index, which includes both fuel and non-fuel price indices. It reflects inflation and demand shifts in the global economy.</td>
        <td style="text-align: left; padding: 8px;">International Monetary Fund</td>
        <td style="text-align: left; padding: 8px;"><a href="https://www.imf.org/en/Research/commodity-prices">Link</a></td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Exchange Rates</td>
        <td style="text-align: left; padding: 8px;">This measures the value of the U.S. dollar against a broad basket of foreign currencies. It reflects changes in the dollar's strength and global trade dynamics. Movements indicate shifts in international trade and monetary policy.</td>
        <td style="text-align: left; padding: 8px;">Federal Reserve Board</td>
        <td style="text-align: left; padding: 8px;"><a href="https://www.federalreserve.gov/datadownload/Download.aspx?rel=H10&series=d896a0d00241604bbbfab3d292c873c8&filetype=csv&label=include&layout=seriescolumn&from=12/01/2003&to=01/31/2025">Link</a></td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">CCI</td>
        <td style="text-align: left; padding: 8px;">This measures consumer optimism about the economy's future. It reflects expectations regarding income, employment, and spending. Higher values indicate greater consumer confidence and potential economic growth.</td>
        <td style="text-align: left; padding: 8px;">Organization for Economic Co-operation and Development</td>
        <td style="text-align: left; padding: 8px;"><a href="https://www.oecd.org/en/data/indicators/consumer-confidence-index-cci.html?oecdcontrol-cf46a27224-var1=USA&oecdcontrol-b2a0dbca4d-var3=2004-12&oecdcontrol-b2a0dbca4d-var4=2024-12">Link</a></td>
      </tr>
    </tbody>
  </table>
</div>


In [88]:
import numpy as np
import pandas as pd
import os
from openpyxl import load_workbook
from openpyxl.styles import Border, Side
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error, r2_score
from scipy import stats
from sklearn.utils import resample

## 3.2 Data Pre-Processing

### 3.2.1 Data Range

For the above data, we used the last 20 years of historical data, i.e. from December 2004 to December 2024. The earliest available exchange rate data begins in January 2006. To cover the gap from December 2004 to December 2005, we back-filled this period with a constant index value of 100, standing in for the January 2006 level.

### 3.2.2 Data Frequency

The dataset contains features with different time frequencies. To ensure consistency and relevance to the CPI, we adjusted all datasets to a monthly frequency.

- **Monthly Data**: The seven datasets below are already monthly, like the CPI. No further adjustments are needed.
    - CPI
    - Unemployment Rate
    - Wage Growth
    - Money Supply
    - Commodity Prices
    - Exchange Rates
    - CCI (added feature)
- **Yearly Data**: We repeated each yearly value across the twelve months of that year to create monthly entries. Linear interpolation between yearly data points would be a reasonable alternative for estimating the monthly values.
    - Output Gap
- **Daily Data**: We aggregated daily data to monthly by using the average value for each month to match the CPI frequency. This smooths out daily fluctuations and provides a more stable and representative value for each month.
    - Interest Rates 
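
The frequency adjustments described above can be sketched on small hypothetical series. The values and variable names below are assumptions for illustration. The sketch shows the daily-to-monthly average and both yearly-to-monthly options (repeating the yearly value, and linear interpolation between yearly points anchored in January).

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 1.0 throughout January, 2.0 throughout February 2024
days = pd.date_range("2024-01-01", "2024-02-29", freq="D")
daily = pd.DataFrame({"date": days, "rate": np.where(days.month == 1, 1.0, 2.0)})

# Daily -> monthly: average all observations that fall in the same calendar month
monthly_avg = daily.groupby(daily["date"].dt.strftime("%Y-%m"))["rate"].mean()

# Hypothetical yearly series, one value per year
yearly = pd.Series({2022: 0.0, 2023: 1.2, 2024: 2.4})

# Yearly -> monthly, option 1: repeat each yearly value for all twelve months
labels = [f"{year}-{month:02d}" for year in yearly.index for month in range(1, 13)]
repeated = pd.Series(np.repeat(yearly.to_numpy(), 12), index=labels)

# Yearly -> monthly, option 2: place each yearly value in January and
# interpolate linearly between consecutive Januaries
anchored = pd.Series(np.nan, index=labels)
anchored.loc[[f"{year}-01" for year in yearly.index]] = yearly.to_numpy()
interpolated = anchored.interpolate(method="linear")
```

Repeating values produces a step at each year boundary, while interpolation spreads the change evenly over the year. Which is preferable depends on how the yearly figure is defined.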

### 3.2.3 Merge Data

#### 3.2.3.1 Initialize the merged dataset

In [89]:
# Define the date range, starting one month earlier to capture 2004-11 for Past CPI
date_range = pd.date_range(start='2004-11-30', end='2024-12-31', freq='M')
date_range_str = date_range.strftime('%Y-%m')

# Initialize the cpi_predict dataframe
cpi_predict = pd.DataFrame({
    'Date': date_range_str,
    'CPI': [None] * len(date_range),
    'Past CPI': [None] * len(date_range),
    'Output Gap': [None] * len(date_range),
    'Unemployment Rate': [None] * len(date_range),
    'Interest Rate': [None] * len(date_range),
    'Money Supply': [None] * len(date_range),
    'Wage Growth': [None] * len(date_range),
    'Commodity Prices': [None] * len(date_range),
    'Exchange Rates': [None] * len(date_range)
})

# Load and reshape the CPI tab data
file_path = 'cpi_predict.xlsx'

try:
    cpi_df = pd.read_excel(file_path, sheet_name='CPI', header=11)
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

#### 3.2.3.2 Merge the CPI data

In [90]:
### CPI
# Melt the wide format into long format
cpi_melted = cpi_df.melt(id_vars=['Year'], value_vars=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                                                       'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
                         var_name='Month', value_name='CPI')

# Create Date column in "YYYY-MM" format
cpi_melted['Date'] = cpi_melted['Year'].astype(str) + '-' + cpi_melted['Month'].map({
    'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04', 'May': '05', 'Jun': '06',
    'Jul': '07', 'Aug': '08', 'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'
})

# Sort and select columns
cpi_melted = cpi_melted[['Date', 'CPI']].sort_values('Date')

# Merge CPI data
cpi_predict = cpi_predict.merge(cpi_melted, on='Date', how='left', suffixes=('', '_new'))
cpi_predict['CPI'] = cpi_predict['CPI_new'].combine_first(cpi_predict['CPI'])
cpi_predict = cpi_predict.drop(columns=['CPI_new'])

# Add Past CPI (shift CPI by 1 month)
cpi_predict['Past CPI'] = cpi_predict['CPI'].shift(1)

# Filter to keep only rows from 2004-12 onward
cpi_predict = cpi_predict[cpi_predict['Date'] >= '2004-12']

#### 3.2.3.3 Merge the Output Gap data

In [91]:
### Output Gap
# Load the Output Gap tab data
output_gap_df = pd.read_excel(file_path, sheet_name='Output Gap', header=None, skiprows=1)

# Extract relevant data for the years 2004 to 2024
output_gap_data = output_gap_df.iloc[0, 29:50].values

# Create a DataFrame to repeat yearly values for each month
output_gap_monthly = []
for year, value in zip(range(2004, 2025), output_gap_data):
    for month in range(1, 13):
        output_gap_monthly.append({'Date': f"{year}-{str(month).zfill(2)}", 'Output Gap': value})

# Convert to DataFrame
output_gap_monthly_df = pd.DataFrame(output_gap_monthly)

# Merge Output Gap data
cpi_predict = cpi_predict.merge(output_gap_monthly_df, on='Date', how='left', suffixes=('', '_gap'))
cpi_predict['Output Gap'] = cpi_predict['Output Gap_gap'].combine_first(cpi_predict['Output Gap'])
cpi_predict = cpi_predict.drop(columns=['Output Gap_gap'])

#### 3.2.3.4 Merge the Unemployment Rate data

In [92]:
### Unemployment Rate
# Load and reshape the Unemployment Rate tab data
try:
    unemployment_df = pd.read_excel(file_path, sheet_name='Unemployment Rate', header=11)
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

# Melt the wide format into long format
unemployment_melted = unemployment_df.melt(id_vars=['Year'], value_vars=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                                                                         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
                                           var_name='Month', value_name='Unemployment Rate')

# Create Date column in "YYYY-MM" format
unemployment_melted['Date'] = unemployment_melted['Year'].astype(str) + '-' + unemployment_melted['Month'].map({
    'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04', 'May': '05', 'Jun': '06',
    'Jul': '07', 'Aug': '08', 'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'
})

# Sort and select columns
unemployment_melted = unemployment_melted[['Date', 'Unemployment Rate']].sort_values('Date')

# Merge Unemployment Rate data
cpi_predict = cpi_predict.merge(unemployment_melted, on='Date', how='left', suffixes=('', '_new'))
cpi_predict['Unemployment Rate'] = cpi_predict['Unemployment Rate_new'].combine_first(cpi_predict['Unemployment Rate'])
cpi_predict = cpi_predict.drop(columns=['Unemployment Rate_new'])

#### 3.2.3.5 Merge the Interest Rates data

In [93]:
### Interest Rates
# Load daily interest rates data
try:
    interest_rate_df = pd.read_excel(file_path, sheet_name='Interest Rates', header=0)
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

# Ensure the date column is in datetime format
interest_rate_df['observation_date'] = pd.to_datetime(interest_rate_df['observation_date'])

# Resample to monthly frequency and calculate the average
interest_rate_monthly = interest_rate_df.resample('M', on='observation_date').mean().reset_index()

# Convert Date to "YYYY-MM" format
interest_rate_monthly['Date'] = interest_rate_monthly['observation_date'].dt.strftime('%Y-%m')

# Merge interest rates data
cpi_predict = cpi_predict.merge(interest_rate_monthly[['Date', 'DFF']], on='Date', how='left', suffixes=('', '_new'))
cpi_predict['Interest Rate'] = cpi_predict['DFF'].combine_first(cpi_predict['Interest Rate'])
cpi_predict = cpi_predict.drop(columns=['DFF'])

#### 3.2.3.6 Merge the Money Supply data

In [94]:
# Money Supply Data
# Load monthly money supply data
try:
    money_supply_df = pd.read_excel(file_path, sheet_name='Money Supply', header=0)
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

# Filter the relevant columns and date range
money_supply_df = money_supply_df[['Series Description', 'M2; Seasonally adjusted']]
money_supply_df.columns = ['Date', 'Money Supply']

# Remove non-date rows
money_supply_df = money_supply_df[money_supply_df['Date'].str.match(r'\d{4}-\d{2}')]

# Convert Date to "YYYY-MM" format
money_supply_df['Date'] = pd.to_datetime(money_supply_df['Date']).dt.strftime('%Y-%m')

# Filter to keep only rows from 2004-12 onward
money_supply_df = money_supply_df[money_supply_df['Date'] >= '2004-12']
money_supply_df = money_supply_df[money_supply_df['Date'] <= '2024-12']

# Merge Money Supply data
cpi_predict = cpi_predict.merge(money_supply_df, on='Date', how='left', suffixes=('', '_new'))
cpi_predict['Money Supply'] = cpi_predict['Money Supply_new'].combine_first(cpi_predict['Money Supply'])
cpi_predict = cpi_predict.drop(columns=['Money Supply_new'])

#### 3.2.3.7 Merge the Wage Growth data

In [95]:
### Wage Growth
# Load monthly wage growth data
try:
    wage_growth_df = pd.read_excel(file_path, sheet_name='Wage Growth', skiprows=9, header=None)
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

# Select the first two columns (i.e. date and overall wage growth)
wage_growth_df = wage_growth_df.iloc[:, [0, 1]]
wage_growth_df.columns = ['Date', 'Wage Growth']

# Convert Date to "YYYY-MM" format
wage_growth_df['Date'] = pd.to_datetime(wage_growth_df['Date']).dt.strftime('%Y-%m')

# Filter to keep only rows from 2004-12 onward
wage_growth_df = wage_growth_df[wage_growth_df['Date'] >= '2004-12']
wage_growth_df = wage_growth_df[wage_growth_df['Date'] <= '2024-12']

# Merge Wage Growth data into cpi_predict DataFrame
cpi_predict = cpi_predict.merge(wage_growth_df, on='Date', how='left', suffixes=('', '_new'))
cpi_predict['Wage Growth'] = cpi_predict['Wage Growth_new'].combine_first(cpi_predict['Wage Growth'])
cpi_predict = cpi_predict.drop(columns=['Wage Growth_new'])

#### 3.2.3.8 Merge the Commodity Prices data

In [96]:
### Commodity Prices
# Load monthly commodity prices data
try:
    commodity_prices_df = pd.read_excel(file_path, sheet_name='Commodity Prices', skiprows=3, header=None)
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

# Select only the first two columns
commodity_prices_df = commodity_prices_df.iloc[:, [0, 1]]
commodity_prices_df.columns = ['Date', 'Commodity Prices']

# Remove non-date rows (e.g., headers)
commodity_prices_df = commodity_prices_df[commodity_prices_df['Date'].str.match(r'^\d{4}M\d{1,2}$')]

# Convert Date to "YYYY-MM" format
commodity_prices_df['Date'] = pd.to_datetime(commodity_prices_df['Date'].str.replace('M', '-'), format='%Y-%m').dt.strftime('%Y-%m')

# Filter to keep only rows from 2004-12 onward
commodity_prices_df = commodity_prices_df[commodity_prices_df['Date'] >= '2004-12']
commodity_prices_df = commodity_prices_df[commodity_prices_df['Date'] <= '2024-12']

# Merge Commodity Prices data into cpi_predict DataFrame
cpi_predict = cpi_predict.merge(commodity_prices_df, on='Date', how='left', suffixes=('', '_new'))
cpi_predict['Commodity Prices'] = cpi_predict['Commodity Prices_new'].combine_first(cpi_predict['Commodity Prices'])
cpi_predict = cpi_predict.drop(columns=['Commodity Prices_new'])

#### 3.2.3.9 Merge the Exchange Rates data

In [97]:
### Exchange Rates
# Load monthly exchange rates data
try:
    exchange_rates_df = pd.read_excel(file_path, sheet_name='Exchange Rates', skiprows=6, header=None)
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

# Select only the first two columns
exchange_rates_df = exchange_rates_df.iloc[:, [0, 1]]
exchange_rates_df.columns = ['Date', 'Exchange Rates']

# Convert Date to "YYYY-MM" format if necessary
exchange_rates_df['Date'] = pd.to_datetime(exchange_rates_df['Date'], format='%Y-%m').dt.strftime('%Y-%m')

# Filter to keep only rows from 2004-12 onward
exchange_rates_df = exchange_rates_df[exchange_rates_df['Date'] >= '2004-12']
exchange_rates_df = exchange_rates_df[exchange_rates_df['Date'] <= '2024-12']

# Add missing dates from 2004-12 to 2005-12 with exchange rate 100
missing_dates = pd.date_range(start='2004-12', end='2006-01', freq='M').strftime('%Y-%m')
missing_data = pd.DataFrame({'Date': missing_dates, 'Exchange Rates': 100})
exchange_rates_df = pd.concat([missing_data, exchange_rates_df]).reset_index(drop=True)

# Merge Exchange Rates data into cpi_predict DataFrame
cpi_predict = cpi_predict.merge(exchange_rates_df, on='Date', how='left', suffixes=('', '_new'))
cpi_predict['Exchange Rates'] = cpi_predict['Exchange Rates_new'].combine_first(cpi_predict['Exchange Rates'])
cpi_predict = cpi_predict.drop(columns=['Exchange Rates_new'])

#### 3.2.3.10 Save the final merged dataset

In [98]:
# Save to the same file, preserving other tabs
try:
    with pd.ExcelWriter(file_path, mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
        cpi_predict.to_excel(writer, sheet_name='cpi_predict', index=False)

        # Remove borders from the header row
        workbook = writer.book
        worksheet = writer.sheets['cpi_predict']
        for cell in worksheet[1]:
            cell.border = Border(left=Side(style=None), right=Side(style=None),
                                 top=Side(style=None), bottom=Side(style=None))

    print(f"Data migrated successfully. Check '{file_path}'.")

except PermissionError:
    print(f"Permission denied when writing to '{file_path}'. Close the file if open, or check permissions.")

except Exception as e:
    print(f"An error occurred: {e}")

Data migrated successfully. Check 'cpi_predict.xlsx'.


# 4. Modeling

## 4.1 Split Data for Train and Test

In [99]:
# Load the data from the specific tab in the Excel file
file_path = 'cpi_predict.xlsx'
sheet_name = 'cpi_predict'
data = pd.read_excel(file_path, sheet_name=sheet_name)

# Calculate the index for the split
split_index = int(len(data) * 0.8)

# Split the data into training and testing sets
train_data = data[:split_index]
test_data = data[split_index:]

# Display the shapes of the resulting datasets
print("Training Data Shape:", train_data.shape)
print("Testing Data Shape:", test_data.shape)

Training Data Shape: (192, 10)
Testing Data Shape: (49, 10)
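
Slicing at `split_index` gives a chronological split: every test month comes after every training month. As a sketch on a hypothetical frame of the same length (the column `x` and the frame itself are assumptions, not the real dataset), the same time-ordered split can also be obtained from `train_test_split` by switching shuffling off.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical monthly frame with 241 rows, in time order
dates = pd.period_range("2004-12", "2024-12", freq="M").astype(str)
frame = pd.DataFrame({"Date": dates, "x": np.arange(len(dates), dtype=float)})

# Index-based chronological split: first 80% for training, the rest for testing
split_index = int(len(frame) * 0.8)
train_a, test_a = frame[:split_index], frame[split_index:]

# Same split through scikit-learn, with shuffling switched off
train_b, test_b = train_test_split(frame, test_size=0.2, shuffle=False)
```

With the default `shuffle=True`, by contrast, training and test months are interleaved in time, which matters when a lagged target such as Past CPI is among the features.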


## 4.2 Model Training and Evaluation

In [100]:
# Define the features and target variable
features = ['Past CPI', 'Output Gap', 'Unemployment Rate', 'Interest Rate', 'Money Supply', 'Wage Growth', 'Commodity Prices', 'Exchange Rates']
target = 'CPI'

# Split the data into features (X) and target (y)
X = data[features]
y = data[target]

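# Randomly split the data into training (80%) and testing (20%) sets.
# This shuffled split, not the chronological split from 4.1, is used by the models below.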
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the models
linear_model = LinearRegression()
ridge_model = Ridge(alpha=1.0)
lasso_model = Lasso(alpha=0.1)

# Train the models
linear_model.fit(X_train, y_train)
ridge_model.fit(X_train, y_train)
lasso_model.fit(X_train, y_train)

# Get predictions on the test set
linear_pred = linear_model.predict(X_test)
ridge_pred = ridge_model.predict(X_test)
lasso_pred = lasso_model.predict(X_test)

# Calculate residuals for linear regression on the test set
# (the degrees of freedom below are based on the training set size)
linear_residuals = y_test - linear_pred

# Degrees of freedom for linear regression
n = X_train.shape[0]  # Number of observations in the training set
p = X_train.shape[1]  # Number of predictors
df = n - p - 1  # Degrees of freedom

# Standard error of the estimate for linear regression
sse = np.sum(linear_residuals ** 2)  # Sum of squared errors
mse = sse / df  # Mean squared error
se = np.sqrt(mse)  # Standard error

# Standard errors of coefficients for linear regression
X_with_intercept = np.column_stack((np.ones(X_train.shape[0]), X_train))
beta = np.append(linear_model.intercept_, linear_model.coef_)  # Include intercept
cov_matrix = np.linalg.inv(X_with_intercept.T @ X_with_intercept) * mse
se_coefficients = np.sqrt(np.diag(cov_matrix))

# t-statistics and p-values for linear regression
t_stats = beta / se_coefficients
p_values = [2 * (1 - stats.t.cdf(np.abs(t), df)) for t in t_stats]

# Create a summary DataFrame for linear regression
linear_summary = pd.DataFrame({
    'Coefficient': beta,
    'Std. Error': se_coefficients,
    't-statistic': t_stats,
    'p-value': p_values
}, index=['Intercept'] + X.columns.tolist())

# R-squared and adjusted R-squared for linear regression
linear_r_squared = linear_model.score(X_test, y_test)
linear_adjusted_r_squared = 1 - (1 - linear_r_squared) * (n - 1) / df

# Evaluate the models
linear_mse = mean_squared_error(y_test, linear_pred)
ridge_mse = mean_squared_error(y_test, ridge_pred)
lasso_mse = mean_squared_error(y_test, lasso_pred)

linear_r2 = r2_score(y_test, linear_pred)
ridge_r2 = r2_score(y_test, ridge_pred)
lasso_r2 = r2_score(y_test, lasso_pred)

# Extract coefficients from the ridge and lasso models
ridge_coef = np.append(ridge_model.intercept_, ridge_model.coef_)
lasso_coef = np.append(lasso_model.intercept_, lasso_model.coef_)

# Create a summary DataFrame for ridge regression
ridge_summary = pd.DataFrame({
    'Coefficient': ridge_coef
}, index=['Intercept'] + X.columns.tolist())

# Create a summary DataFrame for lasso regression
lasso_summary = pd.DataFrame({
    'Coefficient': lasso_coef
}, index=['Intercept'] + X.columns.tolist())

# Print the summary for the linear model
print("\nLinear Regression Summary:")
print(linear_summary)

# Print the summary for the ridge model
print("\nRidge Regression Summary:")
print(ridge_summary)

# Print the summary for the lasso model
print("\nLasso Regression Summary:")
print(lasso_summary)

# Create a summary DataFrame for MSE and R-squared values
model_summary = pd.DataFrame({
    'Model': ['Linear Regression', 'Ridge Regression', 'Lasso Regression'],
    'MSE': [linear_mse, ridge_mse, lasso_mse],
    'R-squared': [linear_r2, ridge_r2, lasso_r2]
})

print("\nSummary for Three Models:")
print(model_summary)


Linear Regression Summary:
                   Coefficient  Std. Error  t-statistic       p-value
Intercept             9.548206    0.826117    11.557931  0.000000e+00
Past CPI              0.940740    0.004077   230.725326  0.000000e+00
Output Gap           -0.263071    0.034355    -7.657349  1.061151e-12
Unemployment Rate    -0.184204    0.020393    -9.032917  2.220446e-16
Interest Rate         0.251766    0.026665     9.442004  0.000000e+00
Money Supply          0.000397    0.000025    16.187655  0.000000e+00
Wage Growth          -0.192554    0.044095    -4.366801  2.106614e-05
Commodity Prices      0.012662    0.001620     7.817998  4.096723e-13
Exchange Rates       -0.003621    0.008314    -0.435589  6.636483e-01

Ridge Regression Summary:
                   Coefficient
Intercept             9.503339
Past CPI              0.941186
Output Gap           -0.257665
Unemployment Rate    -0.181534
Interest Rate         0.247295
Money Supply          0.000395
Wage Growth          -0.1869

# 5. Analysis

## 5.1 Initial Observations and Interpretations
### Observations on coefficients
#### Linear Regression
  - The coefficients indicate the relationship between each feature and the CPI. Holding all else constant, the sign of a coefficient gives the direction of the relationship, and its magnitude gives the change in CPI per one-unit change in the feature. Because the features are measured in different units, magnitudes are not directly comparable across features;
  - Among all eight features, Past CPI has the largest coefficient, approximately 0.94, meaning that a one-unit increase in Past CPI is associated with an increase of approximately 0.94 units in the current CPI;
  - Past CPI, Interest Rate, Money Supply and Commodity Prices have positive coefficients; and
    - The CPI is often persistent over time. Past inflation rates can be a good predictor of future inflation.
    - Higher interest rates can lead to higher borrowing costs, which can increase the cost of goods and services. Also, central banks might raise interest rates in response to rising inflation.
    - More money in the economy can increase demand for goods and services, which drives up prices.
    - An increase in commodity prices raises production costs for many goods, which results in higher consumer prices.
  - Output Gap, Unemployment Rate, Wage Growth and Exchange Rates have negative coefficients.
    - The output gap measures the difference between actual economic output and potential output. A negative output gap typically indicates underutilized resources, as actual output is below potential output, and is expected to lead to lower inflation. That reasoning implies a positive coefficient, so the negative estimate here is counterintuitive.
    - A higher unemployment rate means less consumer spending and lower wage pressure, which can lead to lower inflation.
    - The coefficient of wage growth is also counterintuitive, as higher wages usually increase consumer spending, and the increase in demand drives up prices, leading to a higher CPI. A potential explanation for the negative coefficient is that if wage growth is modest, or if higher wages lead to increased productivity, the overall cost of production might not increase significantly.
    - A stronger domestic currency can lead to lower inflation because it reduces the cost of imported goods and services.

#### Ridge Regression
  - The coefficients are similar to those in the linear regression model, and most are slightly shrunk towards zero due to the regularization effect of Ridge Regression.
  
#### Lasso Regression
  - The Lasso Regression model has set some coefficients to zero (i.e., "Output Gap" and "Wage Growth"), indicating that these features are not important predictors in this model. This is a result of the Lasso's ability to perform feature selection.
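
The shrinkage and feature-selection effects described above can be reproduced on synthetic data. The sketch below uses made-up data in which only two of four features matter; the penalty strengths are arbitrary choices for illustration and are not the settings or data of this assignment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
# Only the first two features drive y; the last two are pure noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)   # L2 penalty shrinks the coefficient vector
n_zero = int(np.sum(lasso.coef_ == 0.0))   # L1 penalty can set coefficients exactly to zero
```

Ridge reduces the overall size of the coefficient vector but leaves every coefficient non-zero in general, whereas Lasso can remove features entirely. Both penalties act on raw coefficient sizes, so the result depends on the scale of each feature.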

### Observations on statistical significance
#### Linear Regression
  - Most features have very low p-values (close to 0), which indicates that they are statistically significant predictors of the CPI. The exception is Exchange Rates, which has a high p-value of about 0.66.
  - The high p-value for Exchange Rates suggests that it is not a significant predictor in this model.
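
For reference, the usual way to obtain such p-values for an ordinary least squares fit is sketched below on synthetic data. It assumes the residual variance is estimated from the same sample that was used to fit the model, which is the standard OLS convention. The data and variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 200, 2
X = rng.normal(size=(n, p))
# Hypothetical data: feature 0 matters, feature 1 does not
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)

# Least-squares fit with an explicit intercept column
X1 = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Residual variance from the same sample used to fit the model
resid = y - X1 @ beta
dof = n - p - 1
sigma2 = resid @ resid / dof

# Coefficient covariance, standard errors, t-statistics, two-sided p-values
cov = sigma2 * np.linalg.inv(X1.T @ X1)
se = np.sqrt(np.diag(cov))
t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
```

Using the survival function `stats.t.sf` rather than `1 - stats.t.cdf` avoids the floating-point floor that makes very small p-values print as exactly 0 or as roughly 2.2e-16.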
  
#### Ridge Regression and Lasso Regression
Ridge and Lasso regression models do not typically provide p-values because the regularization terms alter the optimization process, which introduces bias and violates the assumptions required for standard statistical inference. Therefore, the calculation of standard errors and p-values is not straightforward in these models.
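
One common workaround, shown here only as a hedged sketch on synthetic data, is to approximate the uncertainty of regularized coefficients with a bootstrap: refit the model on many resampled datasets and read off percentile intervals. The data, the number of resamples and the penalty strength below are assumptions. A plain row-wise bootstrap also ignores serial dependence, so for monthly time series it is illustrative rather than rigorous.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.utils import resample

rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] + rng.normal(scale=0.5, size=n)  # hypothetical data

n_boot = 200
boot_coefs = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    # Draw rows with replacement and refit the model on the resampled data
    X_b, y_b = resample(X, y, replace=True, random_state=b)
    boot_coefs[b] = Ridge(alpha=1.0).fit(X_b, y_b).coef_

# Percentile intervals: 2.5th and 97.5th percentiles of the bootstrap coefficients
lower, upper = np.percentile(boot_coefs, [2.5, 97.5], axis=0)
```

A coefficient whose interval excludes zero is at least consistently signed across resamples, which gives a rough substitute for a significance test.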

## 5.2 Model Comparison
- **MSE and R-squared**
  - The Lasso Regression model has the lowest MSE (0.307337), followed by the Ridge Regression model (0.334279) and the Linear Regression model (0.335420). This indicates that the Lasso Regression model has the best predictive accuracy of the three models on the test set.
  - All three models have very high R-squared values. The Lasso Regression model has the highest (0.999757), followed by the Ridge Regression model (0.999736) and the Linear Regression model (0.999735). The ranking matches the MSE ranking, although the differences are very small.
  
- **Differences in performance**
  - The Linear Regression model provides a baseline for comparison. It has high explanatory power but may overfit, especially with a large number of features. In our case, the Linear Regression model includes all the features to predict the CPI. Compared to the Ridge and Lasso Regression models, its MSE is larger and its R-squared is slightly smaller, which may indicate some overfitting.
  - The Ridge Regression model introduces regularization to help prevent overfitting by shrinking the coefficients. This can lead to better generalization on unseen data. In our case, the Ridge Regression model shrinks the coefficients of Output Gap, Unemployment Rate, Interest Rate, Money Supply, Wage Growth and Commodity Prices, which slightly reduces the MSE and improves the R-squared.
  - The Lasso Regression model not only regularizes the coefficients but also performs feature selection. This can improve model interpretability and further reduce the risk of overfitting. In our case, the Lasso Regression model sets the coefficients of Output Gap and Wage Growth to zero, meaning that these two features are potentially not important for predicting CPI and could be excluded. This reduces the MSE and improves the R-squared further.
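
The two metrics compared above can be computed from scratch as a sanity check. The sketch below follows the usual definitions (the same ones behind `mean_squared_error` and `r2_score`); the numbers are hypothetical and are not taken from the CPI dataset.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical index levels and predictions (not from the CPI dataset).
y_true = [100.0, 120.0, 140.0, 160.0]
y_pred = [100.5, 119.5, 140.5, 159.5]

mse_value = mse(y_true, y_pred)
r2_value = r_squared(y_true, y_pred)

print(f"MSE:       {mse_value:.4f}")
print(f"R-squared: {r2_value:.6f}")
```

Because R-squared divides the residual sum of squares by the total variation of the target, a strongly trending series such as a price index can give values very close to 1 even when the MSE is not negligible. This is one reason to read the three R-squared values above together with the MSE rather than alone.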

## 5.3 Additional Interpretation
  - **Most Relevant Feature:** Past CPI is consistently important across all three models. Past CPI has coefficients of 0.940740 in Linear Regression, 0.941186 in Ridge Regression, and 0.961915 in Lasso Regression, which indicates a strong positive relationship with CPI.
  - **Moderately Relevant Features:** Unemployment Rate, Interest Rates, Money Supply, Commodity Prices and Exchange Rates are moderately relevant features to predict the CPI. 
  - **Least Relevant Features:** Output Gap and Wage Growth appear to be less relevant, as their coefficients are set to zero in the Lasso Regression model.

# 6. Further Modeling and Analysis

## 6.1 Additional Feature

### Consumer Confidence Index

We think the Consumer Confidence Index (CCI) could be introduced to improve the model. The CCI measures the degree of optimism that consumers feel about the overall state of the economy and their personal financial situation. It is a leading indicator that can influence consumer spending and economic activity, which in turn can affect the CPI.

## 6.2 Adding CCI Data

In [102]:
# Path to the Excel workbook that contains the cpi_predict and CCI tabs
file_path = 'cpi_predict.xlsx'

# Load the CCI tab data
try:
    CCI_df = pd.read_excel(file_path, sheet_name='CCI', skiprows=3, header=None)
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

# Select only the first two columns
CCI_df = CCI_df.iloc[:, [0, 1]]
CCI_df.columns = ['Date', 'CCI']

# Convert Date to "YYYY-MM" format if necessary
CCI_df['Date'] = pd.to_datetime(CCI_df['Date']).dt.strftime('%Y-%m')

# Merge CCI data into cpi_predict DataFrame
cpi_predict = cpi_predict.merge(CCI_df, on='Date', how='left', suffixes=('', '_new'))
cpi_predict['CCI'] = cpi_predict['CCI_new'].combine_first(cpi_predict['CCI'])
cpi_predict = cpi_predict.drop(columns=['CCI_new'])

# Save to the same file, preserving other tabs
try:
    with pd.ExcelWriter(file_path, mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
        cpi_predict.to_excel(writer, sheet_name='cpi_predict', index=False)

        # Remove borders from the header row
        workbook = writer.book
        worksheet = writer.sheets['cpi_predict']
        for cell in worksheet[1]:
            cell.border = Border(left=Side(style=None), right=Side(style=None),
                                 top=Side(style=None), bottom=Side(style=None))

    print(f"Data further migrated successfully. Check '{file_path}'.")

except PermissionError:
    print(f"Permission denied when writing to '{file_path}'. Close the file if open, or check permissions.")

except Exception as e:
    print(f"An error occurred: {e}")

Data further migrated successfully. Check 'cpi_predict.xlsx'.


## 6.3 Further Modeling

In [103]:
# Load the CPI tab data
file_path = 'cpi_predict.xlsx'

try:
    cpi_predict = pd.read_excel(file_path, sheet_name='cpi_predict')
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

# Load the CCI tab data
try:
    CCI_df = pd.read_excel(file_path, sheet_name='CCI', skiprows=3, header=None)
except FileNotFoundError:
    print(f"Error: '{file_path}' not found in {os.getcwd()}")
    exit()

# Define the features and target variable
features = ['Past CPI', 'Output Gap', 'Unemployment Rate', 'Interest Rate', 'Money Supply', 'Wage Growth', 'Commodity Prices', 'Exchange Rates', 'CCI']
target = 'CPI'

# Split the data into features (X) and target (y)
X = cpi_predict[features]
y = cpi_predict[target]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the models
linear_model = LinearRegression()
ridge_model = Ridge(alpha=1.0)
lasso_model = Lasso(alpha=0.1)

# Train the models
linear_model.fit(X_train, y_train)
ridge_model.fit(X_train, y_train)
lasso_model.fit(X_train, y_train)

# Get predictions on the test set
linear_pred = linear_model.predict(X_test)
ridge_pred = ridge_model.predict(X_test)
lasso_pred = lasso_model.predict(X_test)

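# Calculate residuals for linear regression.
# Note: these are test-set residuals, while n, p, and df below come from the
# training set, so the standard errors that follow are only approximate.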
linear_residuals = y_test - linear_pred

# Degrees of freedom for linear regression
n = X_train.shape[0]  # Number of observations in the training set
p = X_train.shape[1]  # Number of predictors
df = n - p - 1  # Degrees of freedom

# Standard error of the estimate for linear regression
sse = np.sum(linear_residuals ** 2)  # Sum of squared errors
mse = sse / df  # Mean squared error
se = np.sqrt(mse)  # Standard error

# Standard errors of coefficients for linear regression
X_with_intercept = np.column_stack((np.ones(X_train.shape[0]), X_train))
beta = np.append(linear_model.intercept_, linear_model.coef_)  # Include intercept
cov_matrix = np.linalg.inv(X_with_intercept.T @ X_with_intercept) * mse
se_coefficients = np.sqrt(np.diag(cov_matrix))

# t-statistics and p-values for linear regression
t_stats = beta / se_coefficients
p_values = [2 * (1 - stats.t.cdf(np.abs(t), df)) for t in t_stats]

# Create a summary DataFrame for linear regression
linear_summary = pd.DataFrame({
    'Coefficient': beta,
    'Std. Error': se_coefficients,
    't-statistic': t_stats,
    'p-value': p_values
}, index=['Intercept'] + X.columns.tolist())

# R-squared and adjusted R-squared for linear regression
linear_r_squared = linear_model.score(X_test, y_test)
linear_adjusted_r_squared = 1 - (1 - linear_r_squared) * (n - 1) / df

# Evaluate the models
linear_mse = mean_squared_error(y_test, linear_pred)
ridge_mse = mean_squared_error(y_test, ridge_pred)
lasso_mse = mean_squared_error(y_test, lasso_pred)

linear_r2 = r2_score(y_test, linear_pred)
ridge_r2 = r2_score(y_test, ridge_pred)
lasso_r2 = r2_score(y_test, lasso_pred)

# Extract coefficients from the ridge and lasso models
ridge_coef = np.append(ridge_model.intercept_, ridge_model.coef_)
lasso_coef = np.append(lasso_model.intercept_, lasso_model.coef_)

# Create a summary DataFrame for ridge regression
ridge_summary = pd.DataFrame({
    'Coefficient': ridge_coef
}, index=['Intercept'] + X.columns.tolist())

# Create a summary DataFrame for lasso regression
lasso_summary = pd.DataFrame({
    'Coefficient': lasso_coef
}, index=['Intercept'] + X.columns.tolist())

# Print the summary for the linear model
print("\nLinear Regression Summary:")
print(linear_summary)

# Print the summary for the ridge model
print("\nRidge Regression Summary:")
print(ridge_summary)

# Print the summary for the lasso model
print("\nLasso Regression Summary:")
print(lasso_summary)

# Create a summary DataFrame for MSE and R-squared values
model_summary = pd.DataFrame({
    'Model': ['Linear Regression', 'Ridge Regression', 'Lasso Regression'],
    'MSE': [linear_mse, ridge_mse, lasso_mse],
    'R-squared': [linear_r2, ridge_r2, lasso_r2]
})

print("\nSummary for Three Models:")
print(model_summary)


Linear Regression Summary:
                   Coefficient  Std. Error  t-statistic       p-value
Intercept            10.980356    4.056649     2.706755  7.441191e-03
Past CPI              0.940334    0.004246   221.452438  0.000000e+00
Output Gap           -0.261988    0.034629    -7.565498  1.852740e-12
Unemployment Rate    -0.190776    0.027411    -6.959764  5.964185e-11
Interest Rate         0.253570    0.027239     9.309172  0.000000e+00
Money Supply          0.000399    0.000025    15.843976  0.000000e+00
Wage Growth          -0.210204    0.065996    -3.185113  1.702642e-03
Commodity Prices      0.012492    0.001693     7.376280  5.568213e-12
Exchange Rates       -0.002880    0.008598    -0.335005  7.380073e-01
CCI                  -0.013225    0.036669    -0.360659  7.187723e-01

Ridge Regression Summary:
                   Coefficient
Intercept            10.376810
Past CPI              0.940950
Output Gap           -0.256973
Unemployment Rate    -0.185510
Interest Rate       

## 6.4 Further Analysis

<div style="text-align: left;">
  <table style="width: 100%; border-collapse: collapse;">
    <thead>
      <tr>
        <th style="text-align: left; padding: 8px;">Model</th>
        <th style="text-align: left; padding: 8px;">MSE Without CCI</th>
        <th style="text-align: left; padding: 8px;">MSE With CCI</th>
        <th style="text-align: left; padding: 8px;">R-squared Without CCI</th>
        <th style="text-align: left; padding: 8px;">R-squared With CCI</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="text-align: left; padding: 8px;">Linear Regression</td>
        <td style="text-align: center; padding: 8px;">0.335420</td>
        <td style="text-align: center; padding: 8px;">0.336377</td>
        <td style="text-align: center; padding: 8px;">0.999735</td>
        <td style="text-align: center; padding: 8px;">0.999734</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Ridge Regression</td>
        <td style="text-align: center; padding: 8px;">0.334279</td>
        <td style="text-align: center; padding: 8px;">0.334822</td>
        <td style="text-align: center; padding: 8px;">0.999736</td>
        <td style="text-align: center; padding: 8px;">0.999735</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Lasso Regression</td>
        <td style="text-align: center; padding: 8px;">0.307337</td>
        <td style="text-align: center; padding: 8px;">0.307337</td>
        <td style="text-align: center; padding: 8px;">0.999757</td>
        <td style="text-align: center; padding: 8px;">0.999757</td>
      </tr>
    </tbody>
  </table>
</div>


<div style="text-align: left;">
  <table style="width: 100%; border-collapse: collapse;">
    <thead>
      <tr>
        <th style="text-align: left; padding: 8px;">Feature</th>
        <th style="text-align: left; padding: 8px;">Linear Regression (Without CCI)</th>
        <th style="text-align: left; padding: 8px;">Linear Regression (With CCI)</th>
        <th style="text-align: left; padding: 8px;">Ridge Regression (Without CCI)</th>
        <th style="text-align: left; padding: 8px;">Ridge Regression (With CCI)</th>
        <th style="text-align: left; padding: 8px;">Lasso Regression (Without CCI)</th>
        <th style="text-align: left; padding: 8px;">Lasso Regression (With CCI)</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="text-align: left; padding: 8px;">Intercept</td>
        <td style="text-align: center; padding: 8px;">9.548206</td>
        <td style="text-align: center; padding: 8px;">10.980356</td>
        <td style="text-align: center; padding: 8px;">9.503339</td>
        <td style="text-align: center; padding: 8px;">10.376810</td>
        <td style="text-align: center; padding: 8px;">5.495578</td>
        <td style="text-align: center; padding: 8px;">5.495578</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Past CPI</td>
        <td style="text-align: center; padding: 8px;">0.940740</td>
        <td style="text-align: center; padding: 8px;">0.940334</td>
        <td style="text-align: center; padding: 8px;">0.941186</td>
        <td style="text-align: center; padding: 8px;">0.940950</td>
        <td style="text-align: center; padding: 8px;">0.961915</td>
        <td style="text-align: center; padding: 8px;">0.961915</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Output Gap</td>
        <td style="text-align: center; padding: 8px;">-0.263071</td>
        <td style="text-align: center; padding: 8px;">-0.261988</td>
        <td style="text-align: center; padding: 8px;">-0.257665</td>
        <td style="text-align: center; padding: 8px;">-0.256973</td>
        <td style="text-align: center; padding: 8px;">-0.000000</td>
        <td style="text-align: center; padding: 8px;">-0.000000</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Unemployment Rate</td>
        <td style="text-align: center; padding: 8px;">-0.184204</td>
        <td style="text-align: center; padding: 8px;">-0.190776</td>
        <td style="text-align: center; padding: 8px;">-0.181534</td>
        <td style="text-align: center; padding: 8px;">-0.185510</td>
        <td style="text-align: center; padding: 8px;">-0.037622</td>
        <td style="text-align: center; padding: 8px;">-0.037622</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Interest Rate</td>
        <td style="text-align: center; padding: 8px;">0.251766</td>
        <td style="text-align: center; padding: 8px;">0.253570</td>
        <td style="text-align: center; padding: 8px;">0.247295</td>
        <td style="text-align: center; padding: 8px;">0.248307</td>
        <td style="text-align: center; padding: 8px;">0.044781</td>
        <td style="text-align: center; padding: 8px;">0.044781</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Money Supply</td>
        <td style="text-align: center; padding: 8px;">0.000397</td>
        <td style="text-align: center; padding: 8px;">0.000399</td>
        <td style="text-align: center; padding: 8px;">0.000395</td>
        <td style="text-align: center; padding: 8px;">0.000396</td>
        <td style="text-align: center; padding: 8px;">0.000269</td>
        <td style="text-align: center; padding: 8px;">0.000269</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Wage Growth</td>
        <td style="text-align: center; padding: 8px;">-0.192554</td>
        <td style="text-align: center; padding: 8px;">-0.210204</td>
        <td style="text-align: center; padding: 8px;">-0.186939</td>
        <td style="text-align: center; padding: 8px;">-0.197464</td>
        <td style="text-align: center; padding: 8px;">-0.000000</td>
        <td style="text-align: center; padding: 8px;">-0.000000</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Commodity Prices</td>
        <td style="text-align: center; padding: 8px;">0.012662</td>
        <td style="text-align: center; padding: 8px;">0.012492</td>
        <td style="text-align: center; padding: 8px;">0.012541</td>
        <td style="text-align: center; padding: 8px;">0.012433</td>
        <td style="text-align: center; padding: 8px;">0.009720</td>
        <td style="text-align: center; padding: 8px;">0.009720</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">Exchange Rates</td>
        <td style="text-align: center; padding: 8px;">-0.003621</td>
        <td style="text-align: center; padding: 8px;">-0.002880</td>
        <td style="text-align: center; padding: 8px;">-0.004062</td>
        <td style="text-align: center; padding: 8px;">-0.003630</td>
        <td style="text-align: center; padding: 8px;">-0.004490</td>
        <td style="text-align: center; padding: 8px;">-0.004490</td>
      </tr>
      <tr>
        <td style="text-align: left; padding: 8px;">CCI</td>
        <td style="text-align: center; padding: 8px;">-</td>
        <td style="text-align: center; padding: 8px;">-0.013225</td>
        <td style="text-align: center; padding: 8px;">-</td>
        <td style="text-align: center; padding: 8px;">-0.008068</td>
        <td style="text-align: center; padding: 8px;">-</td>
        <td style="text-align: center; padding: 8px;">0.000000</td>
      </tr>
    </tbody>
  </table>
</div>


The two tables above show the comparison of model performance metrics (MSE and R-squared) and the coefficients of the features for the Linear, Ridge, and Lasso Regression models with and without the CCI feature.

From the comparison, including the CCI feature does not improve model performance. With CCI, the MSE rises marginally for the Linear (0.335420 to 0.336377) and Ridge (0.334279 to 0.334822) models and is unchanged for Lasso, which sets the CCI coefficient to zero. The R-squared values are essentially unchanged, and in the linear regression model the CCI coefficient is small (-0.013225) and not statistically significant (p-value of about 0.72).

Based on this analysis, the CCI feature does not appear to add predictive power for the CPI model; the existing features seem to capture most of the information needed to predict CPI. While the CCI is an interesting economic indicator, it does not enhance the model's performance in this context.
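
To put the size of these changes in perspective, the relative change in MSE can be computed directly from the comparison table. This is a small arithmetic sketch; the MSE values are copied from the table above and `pct_change` is a hypothetical helper.

```python
# MSE values copied from the comparison table: (without CCI, with CCI).
mse_table = {
    "Linear Regression": (0.335420, 0.336377),
    "Ridge Regression": (0.334279, 0.334822),
    "Lasso Regression": (0.307337, 0.307337),
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = {name: pct_change(*pair) for name, pair in mse_table.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.3f}% change in MSE after adding CCI")
```

The changes for the Linear and Ridge models are well under one percent, and the Lasso model is unaffected because its CCI coefficient is zero.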