# 🚀 Confidence Interval Metrics - Interactive Demo

This notebook demonstrates how to use the **MetricEvaluator** unified interface for calculating confidence intervals on both regression and classification metrics.

## 📋 What You'll Learn:
- How to use the unified `MetricEvaluator` class
- Calculate confidence intervals for regression metrics (MAE, MSE, RMSE, R²)
- Calculate confidence intervals for classification metrics (accuracy, precision, recall)
- Different confidence interval methods (bootstrap, jackknife, proportion-based)
- Interactive examples with synthetic and hand-entered data

## 📦 Install and Import Required Libraries

First, let's import all the necessary libraries for our demo.

In [None]:
# Import standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_regression, make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import mean_squared_error, accuracy_score
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

In [None]:
# Import the confidence interval package
import sys
# NOTE: this path is machine-specific; change it to the location of your local copy of the package
sys.path.append('/home/zokirov_diyorbek/Documents/confidence_interval_jacob/confidenceinterval')

from confidenceinterval import MetricEvaluator

print("✅ MetricEvaluator imported successfully!")
print(f"📊 Available for regression: {MetricEvaluator().get_available_metrics('regression')}")
print(f"🎯 Available for classification: {MetricEvaluator().get_available_metrics('classification')}")

## 📈 Regression Metrics with Confidence Intervals

Let's start with regression metrics. We'll generate some sample data, train a model, and then calculate confidence intervals for various regression metrics.

In [None]:
# Generate sample regression data
print("🔄 Generating regression dataset...")
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a regression model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"📊 Dataset: {len(X_test)} test samples")
print(f"🎯 R² Score: {model.score(X_test, y_test):.4f}")
print(f"📉 MSE: {mean_squared_error(y_test, y_pred):.4f}")

# Plot predictions vs actual
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Predictions vs Actual')

plt.subplot(1, 2, 2)
residuals = y_test - y_pred
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot')

plt.tight_layout()
plt.show()

In [None]:
# Calculate confidence intervals for regression metrics
evaluator = MetricEvaluator()

# Test different regression metrics
regression_metrics = ['mae', 'mse', 'rmse', 'r2']
methods = ['bootstrap_bca', 'jackknife']

print("🔍 REGRESSION METRICS WITH CONFIDENCE INTERVALS")
print("=" * 60)

results_data = []

for metric in regression_metrics:
    print(f"\n📊 {metric.upper()} Results:")
    for method in methods:
        try:
            score, ci = evaluator.evaluate(
                y_true=y_test.tolist(),
                y_pred=y_pred.tolist(),
                task='regression',
                metric=metric,
                method=method,
                confidence_level=0.95
            )
            print(f"  {method:15s}: {score:.6f}, CI: ({ci[0]:.6f}, {ci[1]:.6f})")
            results_data.append({
                'Metric': metric.upper(),
                'Method': method,
                'Score': score,
                'CI_Lower': ci[0],
                'CI_Upper': ci[1],
                'CI_Width': ci[1] - ci[0]
            })
        except Exception as e:
            print(f"  {method:15s}: ❌ Failed - {e}")

# Create a results DataFrame
results_df = pd.DataFrame(results_data)
print(f"\n📋 Summary of {len(results_data)} successful calculations:")
print(results_df.round(6))

In [None]:
# Visualize confidence intervals for regression metrics
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Regression Metrics with 95% Confidence Intervals', fontsize=16)

metrics = results_df['Metric'].unique()
colors = ['skyblue', 'lightcoral']

for i, metric in enumerate(metrics):
    ax = axes[i//2, i%2]
    metric_data = results_df[results_df['Metric'] == metric]
    
    x_positions = range(len(metric_data))
    scores = metric_data['Score'].values
    ci_lower = metric_data['CI_Lower'].values
    ci_upper = metric_data['CI_Upper'].values
    methods = metric_data['Method'].values
    
    # Create bar plot with error bars
    bars = ax.bar(x_positions, scores, color=colors, alpha=0.7, 
                  yerr=[scores - ci_lower, ci_upper - scores], 
                  capsize=10, error_kw={'linewidth': 2})
    
    ax.set_title(f'{metric} with Confidence Intervals')
    ax.set_xlabel('Method')
    ax.set_ylabel('Score')
    ax.set_xticks(x_positions)
    ax.set_xticklabels(methods, rotation=45)
    ax.grid(True, alpha=0.3)
    
    # Add value labels on bars
    for j, (score, method) in enumerate(zip(scores, methods)):
        ax.text(j, score, f'{score:.4f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

print("💡 Interpretation:")
print("- Error bars show 95% confidence intervals")
print("- Narrower intervals indicate more precise estimates")
print("- Bootstrap and jackknife methods may give different interval widths")
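
To make the difference between resampling methods concrete, here is a minimal, standard-library-only sketch of two of them applied to MAE. It is an illustration under stated assumptions, not the package's implementation: the bootstrap below is the plain percentile variant (simpler than the `bootstrap_bca` method used in this notebook), the jackknife interval uses a normal approximation, and the helper names `percentile_bootstrap_ci` and `jackknife_ci` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def mae(y_true, y_pred):
    """Mean absolute error of paired observations."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def percentile_bootstrap_ci(y_true, y_pred, metric, level=0.95, n_resamples=2000, seed=0):
    """Plain percentile bootstrap (an assumption: simpler than BCa)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample (true, predicted) pairs with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    alpha = 1.0 - level
    lo_idx = math.floor((alpha / 2) * (n_resamples - 1))
    hi_idx = math.ceil((1 - alpha / 2) * (n_resamples - 1))
    return stats[lo_idx], stats[hi_idx]


def jackknife_ci(y_true, y_pred, metric, level=0.95):
    """Leave-one-out jackknife standard error with a normal-approximation interval."""
    n = len(y_true)
    full = metric(y_true, y_pred)
    # Recompute the metric n times, each time leaving out one pair
    loo = [metric(y_true[:i] + y_true[i + 1:], y_pred[:i] + y_pred[i + 1:]) for i in range(n)]
    loo_mean = mean(loo)
    se = math.sqrt((n - 1) / n * sum((v - loo_mean) ** 2 for v in loo))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return full - z * se, full + z * se


# Small hand-entered example data (plain Python lists)
y_true = [1.0, 2.5, 3.2, 4.1, 5.0, 2.8, 3.9, 4.7, 1.8, 2.3]
y_pred = [1.1, 2.4, 3.0, 4.2, 4.9, 2.9, 3.8, 4.8, 1.9, 2.2]

point = mae(y_true, y_pred)
boot_lo, boot_hi = percentile_bootstrap_ci(y_true, y_pred, mae)
jack_lo, jack_hi = jackknife_ci(y_true, y_pred, mae)
print(f"MAE = {point:.4f}")
print(f"percentile bootstrap: ({boot_lo:.4f}, {boot_hi:.4f})")
print(f"jackknife (normal):   ({jack_lo:.4f}, {jack_hi:.4f})")
```

The jackknife interval is symmetric around the point estimate by construction, while the bootstrap interval follows the shape of the resampled distribution, which is one reason the two can differ in width.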

## 🎯 Classification Metrics with Confidence Intervals

Now let's explore classification metrics. We'll generate binary classification data and calculate confidence intervals for common classification metrics.

In [None]:
# Generate sample classification data
print("🔄 Generating classification dataset...")
X_clf, y_clf = make_classification(n_samples=300, n_features=10, n_classes=2, 
                                   n_informative=7, n_redundant=1, random_state=42)
X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(
    X_clf, y_clf, test_size=0.3, random_state=42)

# Train a classification model
clf_model = RandomForestClassifier(n_estimators=100, random_state=42)
clf_model.fit(X_train_clf, y_train_clf)
y_pred_clf = clf_model.predict(X_test_clf)
y_pred_proba = clf_model.predict_proba(X_test_clf)[:, 1]

print(f"📊 Dataset: {len(X_test_clf)} test samples")
print(f"🎯 Accuracy: {accuracy_score(y_test_clf, y_pred_clf):.4f}")
print(f"📈 Class distribution: {np.bincount(y_test_clf)}")

# Plot classification results
plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.hist(y_pred_proba[y_test_clf == 0], alpha=0.7, label='Class 0', bins=20)
plt.hist(y_pred_proba[y_test_clf == 1], alpha=0.7, label='Class 1', bins=20)
plt.xlabel('Predicted Probability')
plt.ylabel('Count')
plt.title('Prediction Probability Distribution')
plt.legend()

plt.subplot(1, 3, 2)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test_clf, y_pred_clf)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')

plt.subplot(1, 3, 3)
from sklearn.metrics import roc_curve, auc
fpr, tpr, _ = roc_curve(y_test_clf, y_pred_proba)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
# Calculate confidence intervals for classification metrics
classification_metrics = ['accuracy', 'precision', 'recall']
classification_methods = ['wilson', 'normal', 'agresti_coull']

print("🔍 CLASSIFICATION METRICS WITH CONFIDENCE INTERVALS")
print("=" * 60)

clf_results_data = []

for metric in classification_metrics:
    print(f"\n🎯 {metric.upper()} Results:")
    for method in classification_methods:
        try:
            score, ci = evaluator.evaluate(
                y_true=y_test_clf.tolist(),
                y_pred=y_pred_clf.tolist(),
                task='classification',
                metric=metric,
                method=method,
                confidence_level=0.95
            )
            print(f"  {method:15s}: {score:.6f}, CI: ({ci[0]:.6f}, {ci[1]:.6f})")
            clf_results_data.append({
                'Metric': metric.upper(),
                'Method': method,
                'Score': score,
                'CI_Lower': ci[0],
                'CI_Upper': ci[1],
                'CI_Width': ci[1] - ci[0]
            })
        except Exception as e:
            print(f"  {method:15s}: ❌ Failed - {e}")

# Create a results DataFrame
clf_results_df = pd.DataFrame(clf_results_data)
print(f"\n📋 Summary of {len(clf_results_data)} successful calculations:")
print(clf_results_df.round(6))

In [None]:
# Visualize classification confidence intervals
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
fig.suptitle('Classification Metrics with 95% Confidence Intervals', fontsize=16)

clf_metrics = clf_results_df['Metric'].unique()
colors = ['lightgreen', 'lightblue', 'salmon']

for i, metric in enumerate(clf_metrics):
    ax = axes[i]
    metric_data = clf_results_df[clf_results_df['Metric'] == metric]
    
    x_positions = range(len(metric_data))
    scores = metric_data['Score'].values
    ci_lower = metric_data['CI_Lower'].values
    ci_upper = metric_data['CI_Upper'].values
    methods = metric_data['Method'].values
    
    # Create bar plot with error bars
    bars = ax.bar(x_positions, scores, color=colors[i], alpha=0.7, 
                  yerr=[scores - ci_lower, ci_upper - scores], 
                  capsize=8, error_kw={'linewidth': 2})
    
    ax.set_title(f'{metric}')
    ax.set_xlabel('CI Method')
    ax.set_ylabel('Score')
    ax.set_xticks(x_positions)
    ax.set_xticklabels(methods, rotation=45)
    ax.grid(True, alpha=0.3)
    ax.set_ylim(0, 1)
    
    # Add value labels on bars
    for j, (score, method) in enumerate(zip(scores, methods)):
        ax.text(j, score + 0.02, f'{score:.3f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

print("💡 Different confidence interval methods for classification:")
print("- Wilson: Generally recommended, good coverage even for small samples")
print("- Normal: Simple, but coverage can be poor for small samples or extreme proportions")
print("- Agresti-Coull: Slightly conservative (never narrower than Wilson), simple to compute")
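
For readers who want to see what the three proportion-based methods compute, the following sketch implements the textbook formulas with the standard library only. The function names are hypothetical and the package's own implementation may differ in details such as clipping to [0, 1]. The counts (12 of 15 and 15 of 15 correct) are made up for illustration.

```python
import math
from statistics import NormalDist


def z_value(level):
    """Two-sided standard normal quantile for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def normal_interval(successes, n, level=0.95):
    """Normal-approximation (Wald) interval: p +/- z * sqrt(p(1-p)/n)."""
    z = z_value(level)
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval."""
    z = z_value(level)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def agresti_coull_interval(successes, n, level=0.95):
    """Agresti-Coull interval: a Wald interval on adjusted counts."""
    z = z_value(level)
    n_adj = n + z * z
    p_adj = (successes + z * z / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half


methods = [
    ("normal", normal_interval),
    ("wilson", wilson_interval),
    ("agresti_coull", agresti_coull_interval),
]
for successes, n in [(12, 15), (15, 15)]:
    print(f"{successes}/{n} correct")
    for name, fn in methods:
        lo, hi = fn(successes, n)
        print(f"  {name:14s}: ({lo:.4f}, {hi:.4f})")
```

The 15-of-15 case shows the weakness of the normal interval at extreme proportions: its width collapses to zero, while the Wilson interval still reports meaningful uncertainty. The unclipped normal and Agresti-Coull bounds can also fall outside [0, 1].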

## 🎮 Interactive Examples - Try Your Own Data!

Let's create some simple examples you can modify with your own data.

In [None]:
# 📊 Simple Regression Example - Modify these values!
print("📊 SIMPLE REGRESSION EXAMPLE")
print("=" * 40)

# Your data here - feel free to modify!
y_true_reg = [1.0, 2.5, 3.2, 4.1, 5.0, 2.8, 3.9, 4.7, 1.8, 2.3]
y_pred_reg = [1.1, 2.4, 3.0, 4.2, 4.9, 2.9, 3.8, 4.8, 1.9, 2.2]

evaluator = MetricEvaluator()

# Calculate different metrics
metrics_simple = ['mae', 'mse', 'rmse', 'r2']

print("Results with Bootstrap BCA:")
for metric in metrics_simple:
    score, ci = evaluator.evaluate(y_true_reg, y_pred_reg, 
                                  task='regression', metric=metric, 
                                  method='bootstrap_bca')
    print(f"{metric.upper():4s}: {score:7.4f}, CI: ({ci[0]:7.4f}, {ci[1]:7.4f})")

# Visualize the simple regression data
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.scatter(range(len(y_true_reg)), y_true_reg, label='True', alpha=0.7, s=50)
plt.scatter(range(len(y_pred_reg)), y_pred_reg, label='Predicted', alpha=0.7, s=50)
plt.xlabel('Sample Index')
plt.ylabel('Value')
plt.title('True vs Predicted Values')
plt.legend()

plt.subplot(1, 2, 2)
plt.scatter(y_true_reg, y_pred_reg, alpha=0.7)
plt.plot([min(y_true_reg), max(y_true_reg)], [min(y_true_reg), max(y_true_reg)], 'r--')
plt.xlabel('True Values')
plt.ylabel('Predicted Values')
plt.title('Prediction Scatter Plot')

plt.tight_layout()
plt.show()

In [None]:
# 🎯 Simple Classification Example - Modify these values!
print("\n🎯 SIMPLE CLASSIFICATION EXAMPLE")
print("=" * 40)

# Your classification data here - feel free to modify!
y_true_clf_simple = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
y_pred_clf_simple = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

print(f"Sample size: {len(y_true_clf_simple)}")
print(f"True positives: {sum(1 for t, p in zip(y_true_clf_simple, y_pred_clf_simple) if t == 1 and p == 1)}")
print(f"True negatives: {sum(1 for t, p in zip(y_true_clf_simple, y_pred_clf_simple) if t == 0 and p == 0)}")

print("\nResults with Wilson method:")
clf_metrics_simple = ['accuracy', 'precision', 'recall']

for metric in clf_metrics_simple:
    score, ci = evaluator.evaluate(y_true_clf_simple, y_pred_clf_simple, 
                                  task='classification', metric=metric, 
                                  method='wilson')
    print(f"{metric.upper():9s}: {score:6.4f}, CI: ({ci[0]:6.4f}, {ci[1]:6.4f})")

# Create a simple confusion matrix visualization
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true_clf_simple, y_pred_clf_simple)

plt.figure(figsize=(8, 3))
plt.subplot(1, 2, 1)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Pred 0', 'Pred 1'], 
            yticklabels=['True 0', 'True 1'])
plt.title('Confusion Matrix')

plt.subplot(1, 2, 2)
metrics_names = ['Accuracy', 'Precision', 'Recall']
metrics_values = []
for metric in ['accuracy', 'precision', 'recall']:
    score, _ = evaluator.evaluate(y_true_clf_simple, y_pred_clf_simple, 
                                 task='classification', metric=metric)
    metrics_values.append(score)

bars = plt.bar(metrics_names, metrics_values, color=['lightgreen', 'lightblue', 'salmon'], alpha=0.7)
plt.ylabel('Score')
plt.title('Classification Metrics')
plt.ylim(0, 1)
for bar, value in zip(bars, metrics_values):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
             f'{value:.3f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()
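
Proportion-based intervals treat each metric as a ratio of successes to trials, and the number of trials differs by metric. The sketch below counts the confusion-matrix cells for the same lists as above and shows each metric's numerator and denominator. That the package uses exactly these denominators as its effective sample size is an assumption, not something this notebook confirms.

```python
y_true = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

# (successes, trials) for each metric; trials is the metric's own denominator
proportions = {
    "accuracy": (tp + tn, len(pairs)),  # correct predictions / all samples
    "precision": (tp, tp + fp),         # true positives / predicted positives
    "recall": (tp, tp + fn),            # true positives / actual positives
}

for name, (successes, trials) in proportions.items():
    print(f"{name:9s}: {successes}/{trials} = {successes / trials:.4f}")
```

Because precision and recall rest on fewer trials than accuracy (9 and 8 versus 15 here), their intervals will typically be wider for the same data.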

## 📚 Quick Reference & Summary

Here's everything you need to know to use the MetricEvaluator in your own projects!

In [None]:
# 📋 Quick Reference Guide
evaluator = MetricEvaluator()

print("🚀 METRICEVALUATOR QUICK REFERENCE")
print("=" * 50)

print("\n📊 Available Regression Metrics:")
reg_metrics = evaluator.get_available_metrics('regression')
for i, metric in enumerate(reg_metrics, 1):
    print(f"  {i}. {metric}")

print("\n🎯 Available Classification Metrics:")
clf_metrics = evaluator.get_available_metrics('classification')
for i, metric in enumerate(clf_metrics, 1):
    print(f"  {i}. {metric}")

print("\n🛠️ Available Methods for Regression:")
reg_methods = evaluator.get_available_methods('regression')
for i, method in enumerate(reg_methods, 1):
    print(f"  {i}. {method}")

print("\n🛠️ Available Methods for Classification:")
clf_methods = evaluator.get_available_methods('classification')
for i, method in enumerate(clf_methods, 1):
    print(f"  {i}. {method}")

print("\n💡 Basic Usage Pattern:")
print("""
from confidenceinterval import MetricEvaluator

evaluator = MetricEvaluator()

# For regression
score, ci = evaluator.evaluate(
    y_true=[1.0, 2.0, 3.0],
    y_pred=[1.1, 2.1, 2.9], 
    task='regression',
    metric='mae',
    method='bootstrap_bca',
    confidence_level=0.95
)

# For classification  
score, ci = evaluator.evaluate(
    y_true=[0, 1, 1, 0],
    y_pred=[0, 1, 0, 0],
    task='classification', 
    metric='accuracy',
    method='wilson',
    confidence_level=0.95
)
""")

## 🎉 Congratulations!

You've successfully learned how to use the **MetricEvaluator** for calculating confidence intervals on machine learning metrics!

### 🔑 Key Takeaways:
- **Unified Interface**: One class handles both regression and classification metrics
- **Multiple Methods**: Bootstrap, jackknife, and proportion-based confidence intervals
- **Easy to Use**: Simple `.evaluate()` method with clear parameters
- **Comprehensive**: Covers common evaluation metrics (MAE, MSE, RMSE, R², accuracy, precision, recall)

### 🚀 Next Steps:
1. Try the examples with your own data
2. Experiment with different confidence levels (e.g., 0.90, 0.99)
3. Compare different confidence interval methods
4. Use in your machine learning projects for robust evaluation

### 📖 Remember:
- Confidence intervals help quantify uncertainty in your metrics
- Larger samples generally give tighter confidence intervals  
- Different methods may be more appropriate for different scenarios
- Always report confidence intervals alongside point estimates!
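
As a quick numeric illustration of how sample size and confidence level affect interval width, the sketch below uses the normal-approximation half-width for a proportion, z * sqrt(p(1-p)/n). The observed accuracy of 0.8 and the helper name `normal_half_width` are hypothetical, chosen only for the example.

```python
import math
from statistics import NormalDist


def normal_half_width(p_hat, n, level=0.95):
    """Half-width z * sqrt(p(1-p)/n) of the normal-approximation interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.8  # hypothetical observed accuracy, for illustration only
for n in (50, 200, 800):
    row = "  ".join(
        f"{level:.2f}: +/-{normal_half_width(p_hat, n, level):.3f}"
        for level in (0.90, 0.95, 0.99)
    )
    print(f"n={n:4d}  {row}")
```

Under this approximation, quadrupling the sample size halves the interval width, and raising the confidence level widens it.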

**Happy analyzing! 📊✨**