# Role 3 Evaluation Notebook

This notebook assumes **you have a file named `predictions.csv` in the same folder as this notebook**.

It will:
- Load `predictions.csv`
- Detect the actual and prediction columns
- Compute RMSE, MAE, MAPE, R², and Directional Accuracy (DA)
- Create comparison tables and visualizations

> **Usage:** Place this notebook and `predictions.csv` in the same directory, then run all cells.
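
As a rough illustration of the layout the notebook expects, the sketch below builds a tiny in-memory CSV and loads it with pandas. The column names (`Date`, `Actual`, `ModelA`, `ModelB`) and all values are hypothetical; your file only needs one actual-price column, an optional date column, and one column per model.

```python
import io

import pandas as pd

# Hypothetical example data: column names and values are illustrative only.
sample_csv = io.StringIO(
    "Date,Actual,ModelA,ModelB\n"
    "2024-01-01,100.0,99.0,101.5\n"
    "2024-01-02,102.0,103.0,101.0\n"
    "2024-01-03,101.0,104.0,100.5\n"
)

# Read from the in-memory buffer so no file on disk is touched.
sample_df = pd.read_csv(sample_csv)
print(sample_df.columns.tolist())
print(sample_df.shape)
```

Reading from an in-memory buffer keeps the example from overwriting a real `predictions.csv`.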

## Install dependencies from `requirements.txt` (optional)

If you have a `requirements.txt` file in the **same folder** as this notebook, you can run the cell below
to install all required packages into your current environment.

> If everything is already installed (for example, you’re using a pre-configured virtual environment),
> you can skip this cell.


In [None]:
# Install dependencies listed in requirements.txt
# Make sure requirements.txt is in the same folder as this notebook.
%pip install -r requirements.txt


## 1. Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from IPython.display import display  # explicit import so display() also works outside Jupyter
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

%matplotlib inline

## 2. Load `predictions.csv`

This cell assumes `predictions.csv` is in the **same folder** as this notebook.
If it's somewhere else, update the path in the cell below.

In [None]:
# Change the path below if your file is not in the same folder.
df = pd.read_csv('predictions.csv')

print('Columns found in predictions.csv:')
print(df.columns)
df.head()

## 3. Identify Actual, Date, and Prediction Columns

This cell automatically:
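- Detects the **actual price** column: the first column whose name contains `actual`, `close`, or `price` (case-insensitive).
- Detects a **Date** column if present (the first column whose name contains `date`).
- Treats all remaining columns as **model prediction** columns, so drop any unrelated columns from the file first.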
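
The detection rules above can be sketched as a small standalone function, which makes it easy to check how a given set of column names would be classified. `detect_columns` and the sample column names are hypothetical and only mirror the first-match substring logic used in the cell below.

```python
def detect_columns(columns):
    """Return (actual_col, date_col, model_cols) using first-match substring rules."""
    keywords = ("actual", "close", "price")
    # First column whose lowercase name contains any keyword, else None.
    actual_col = next(
        (c for c in columns if any(k in c.lower() for k in keywords)),
        None,
    )
    # First column whose lowercase name contains "date", else None.
    date_col = next((c for c in columns if "date" in c.lower()), None)
    # Everything that is neither the actual nor the date column.
    model_cols = [c for c in columns if c not in (actual_col, date_col)]
    return actual_col, date_col, model_cols


# Typical layout: the actual column is found as intended.
typical = detect_columns(["Date", "Actual", "LSTM", "ARIMA"])
print(typical)

# Pitfall: a prediction column containing "price" that appears first is
# picked as the actual column, and "Actual" is then treated as a model.
pitfall = detect_columns(["Date", "LSTM_Price", "Actual"])
print(pitfall)
```

Because the first match wins, it is safest to give prediction columns names that do not contain `actual`, `close`, or `price`.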

In [None]:
# Detect actual/true price column (case-insensitive)
actual_col = None
for c in df.columns:
    name = c.lower()
    if 'actual' in name or 'close' in name or 'price' in name:
        actual_col = c
        break

if actual_col is None:
    raise ValueError(
        'Could not automatically find the actual price column. '
        'Rename the true-price column to include "Actual", "Close", or "Price".'
    )

# Detect date column if present
date_col = None
for c in df.columns:
    if 'date' in c.lower():
        date_col = c
        break

# Prediction columns = all columns except actual + date
exclude_cols = [actual_col]
if date_col is not None:
    exclude_cols.append(date_col)

model_cols = [c for c in df.columns if c not in exclude_cols]

print(f'Detected actual price column: {actual_col}')
if date_col:
    print(f'Detected date column: {date_col}')
print('Detected model prediction columns:', model_cols)

# Parse date column if present
if date_col is not None:
    try:
        df[date_col] = pd.to_datetime(df[date_col])
    except Exception as e:
        print('Warning: could not parse Date column as datetime:', e)

y_true = df[actual_col]

## 4. Define Metric Functions

We define helper functions for:
- **Directional Accuracy (DA)** – fraction of times the model gets the direction of change correct.
- **MAPE** – Mean Absolute Percentage Error.
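
To make the two definitions concrete, here is a small worked example on made-up numbers. It uses plain NumPy (`np.diff` instead of the pandas `.diff()` used in the notebook's helper) and is a sketch of the arithmetic rather than a replacement for the functions defined below.

```python
import numpy as np

# Made-up values for illustration only.
y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([99.0, 103.0, 104.0, 106.0])

# Direction of each step: +1 up, -1 down, 0 flat.
true_sign = np.sign(np.diff(y_true))  # up, down, up
pred_sign = np.sign(np.diff(y_pred))  # up, up, up

# The directions agree on steps 1 and 3, so DA = 2/3.
da = np.mean(true_sign == pred_sign)

# MAPE as a fraction: mean of |error| / |actual|.
mape = np.mean(np.abs((y_true - y_pred) / y_true))

print(f"DA = {da:.4f}, MAPE = {mape:.4f}")
```

Note that DA here compares the change in the prediction series with the change in the actual series, so a flat step (sign 0) counts as correct only when both series are flat.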


In [None]:
def directional_accuracy(y_true, y_pred):
    """Directional Accuracy: fraction of times the model gets the direction of change right."""
    true_diff = y_true.diff()
    pred_diff = y_pred.diff()

    # Drop the first NaN caused by diff()
    true_sign = np.sign(true_diff.iloc[1:])
    pred_sign = np.sign(pred_diff.iloc[1:])

    # Align indices
    pred_sign = pred_sign.reindex(true_sign.index)

    matches = (true_sign == pred_sign)
    return matches.mean()


def mean_absolute_percentage_error(y_true, y_pred):
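    """MAPE: Mean Absolute Percentage Error, returned as a fraction (0.05 means 5%).

    The value is non-negative but not bounded above by 1.
    """
    y_true_arr = np.asarray(y_true, dtype=float)
    y_pred_arr = np.asarray(y_pred, dtype=float)
    eps = 1e-10  # avoid division by zero when an actual value is 0
    return np.mean(np.abs(y_true_arr - y_pred_arr) / (np.abs(y_true_arr) + eps))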

## 5. Compute Metrics for Each Model

For each prediction column, we compute:
- RMSE
- MAE
- MAPE
- R²
- DA

We then build a comparison table and a "pretty" version with rounded values.
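
For reference, the three scikit-learn metrics reduce to short formulas. The sketch below reproduces them with plain NumPy on made-up numbers; the cell that follows uses the scikit-learn functions, and this is only meant to show what they compute.

```python
import numpy as np

# Made-up values for illustration only.
y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([99.0, 103.0, 104.0, 106.0])

residuals = y_true - y_pred

# RMSE: square root of the mean squared residual.
rmse = np.sqrt(np.mean(residuals ** 2))

# MAE: mean absolute residual.
mae = np.mean(np.abs(residuals))

# R^2: 1 - (residual sum of squares) / (total sum of squares around the mean).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"RMSE = {rmse:.4f}, MAE = {mae:.4f}, R2 = {r2:.4f}")
```

RMSE and MAE are in the same units as the price, while R² is unitless and can be negative when a model does worse than predicting the mean of the actual values.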

In [None]:
results = []

for col in model_cols:
    y_pred = df[col]

    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae  = mean_absolute_error(y_true, y_pred)
    mape = mean_absolute_percentage_error(y_true, y_pred)
    r2   = r2_score(y_true, y_pred)
    da   = directional_accuracy(y_true, y_pred)

    results.append({
        'Model': col,
        'RMSE': rmse,
        'MAE': mae,
        'MAPE': mape,
        'R2':  r2,
        'DA':  da
    })

metrics_df = pd.DataFrame(results)
metrics_df = metrics_df.sort_values(by='RMSE').reset_index(drop=True)

print('=== Final Metrics Table (raw) ===')
display(metrics_df)

# Pretty version for reporting
metrics_pretty = metrics_df.copy()
metrics_pretty['RMSE'] = metrics_pretty['RMSE'].round(2)
metrics_pretty['MAE']  = metrics_pretty['MAE'].round(2)
metrics_pretty['MAPE'] = (metrics_pretty['MAPE'] * 100).round(2)  # %
metrics_pretty['R2']   = metrics_pretty['R2'].round(3)
metrics_pretty['DA']   = (metrics_pretty['DA'] * 100).round(2)    # %

print('=== Final Metrics Table (pretty, % for MAPE & DA) ===')
display(metrics_pretty)

# Save raw metrics to CSV
metrics_df.to_csv('metrics_summary.csv', index=False)
print('Saved metrics_summary.csv')

## 6. Visualizations

This section creates:
- Actual vs Predicted plot (all models)
- RMSE bar chart
- MAE bar chart
- MAPE bar chart
- R² bar chart
- Directional Accuracy bar chart


In [None]:
# X-axis for plots
if 'date_col' in globals() and date_col is not None:
    x_vals = df[date_col]
    x_label = 'Date'
else:
    x_vals = np.arange(len(df))
    x_label = 'Index'

# A. Actual vs Predicted (all models)
plt.figure(figsize=(12, 6))
plt.plot(x_vals, y_true, label='Actual', linewidth=2)

for col in model_cols:
    plt.plot(x_vals, df[col], label=col)

plt.title('Actual vs Predicted Bitcoin Price')
plt.xlabel(x_label)
plt.ylabel('Price')
plt.legend()
plt.tight_layout()
plt.show()

# B. RMSE Bar Chart
plt.figure(figsize=(8, 5))
plt.bar(metrics_df['Model'], metrics_df['RMSE'])
plt.title('RMSE by Model')
plt.ylabel('RMSE')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# C. MAE Bar Chart
plt.figure(figsize=(8, 5))
plt.bar(metrics_df['Model'], metrics_df['MAE'])
plt.title('MAE by Model')
plt.ylabel('MAE')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# D. MAPE Bar Chart (%)
plt.figure(figsize=(8, 5))
plt.bar(metrics_df['Model'], metrics_df['MAPE'] * 100)
plt.title('MAPE by Model')
plt.ylabel('MAPE (%)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# E. R² Bar Chart
plt.figure(figsize=(8, 5))
plt.bar(metrics_df['Model'], metrics_df['R2'])
plt.title('R² by Model')
plt.ylabel('R²')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# F. Directional Accuracy Bar Chart (%)
plt.figure(figsize=(8, 5))
plt.bar(metrics_df['Model'], metrics_df['DA'] * 100)
plt.title('Directional Accuracy by Model')
plt.ylabel('DA (%)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 7. Optional: Error Distributions

These histograms show the distribution of errors (Actual - Predicted) for each model.
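
A histogram is easier to read alongside a couple of summary numbers. As a minimal sketch on made-up values, the mean error indicates bias (negative means the model over-predicts on average under the Actual - Predicted convention) and the standard deviation indicates spread.

```python
import numpy as np

# Made-up values for illustration only.
y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([99.0, 103.0, 104.0, 106.0])

errors = y_true - y_pred  # same sign convention as the histograms

mean_error = errors.mean()       # bias: negative => over-prediction on average
std_error = errors.std(ddof=1)   # sample standard deviation of the errors

print(f"mean error = {mean_error:.3f}, std = {std_error:.3f}")
```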

In [None]:
for col in model_cols:
    errors = y_true - df[col]
    plt.figure(figsize=(8, 5))
    plt.hist(errors, bins=40)
    plt.title(f'Error Distribution - {col}')
    plt.xlabel('Error (Actual - Predicted)')
    plt.ylabel('Frequency')
    plt.tight_layout()
    plt.show()