# Handling Exceptions

In this notebook, we'll explore how to handle exceptions effectively. Exception handling is crucial for building robust and maintainable code, especially in complex workflows. We'll cover best practices, demonstrate how to implement them in a data science context, and illustrate advanced techniques such as using custom exceptions and ensuring clean error handling across nested functions.

## Table of Contents

1. [Basic Exception Handling](#1)
2. [Custom Exceptions](#2)
3. [Nested Functions and Exception Propagation](#3)
4. [Logging Exceptions](#4)
5. [Step-by-Step Example](#5)
6. [Exercise](#6)

---
## 1. Basic Exception Handling <a name="1"></a>

Exception handling lets your code respond to errors instead of crashing. Here's a simple example for a data loading step: each `except` clause handles one kind of failure, and `load_data` returns `None` whenever loading fails.

In [None]:
import pandas as pd

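def load_data(filepath):
    """Load a CSV file into a DataFrame; return None if loading fails."""
    try:
        return pd.read_csv(filepath)
    except FileNotFoundError:
        print(f"Error: The file at {filepath} was not found.")
    except pd.errors.EmptyDataError:
        print("Error: No data in file.")
    except Exception as e:
        # Catch-all for anything else, e.g. a parser or permission error
        print(f"An unexpected error occurred: {e}")
    # Reached only after one of the handlers above has run
    return None

# Usage: the file does not exist, so a message is printed and data is None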
data = load_data('data/raw/non_existent_file.csv')
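
A `try` statement can also carry `else` and `finally` clauses. The sketch below is a minimal illustration using only the standard library; `count_lines` is a hypothetical helper, not part of the notebook's pipeline. The `else` branch runs only when no exception was raised, and `finally` runs on every path, including early returns.

```python
import os
import tempfile

def count_lines(filepath):
    """Return the number of lines in a text file, or None if it is missing."""
    handle = None
    try:
        handle = open(filepath, encoding="utf-8")
    except FileNotFoundError:
        print(f"Error: The file at {filepath} was not found.")
        return None
    else:
        # Runs only when open() succeeded
        return sum(1 for _ in handle)
    finally:
        # Runs on every path, including the returns above
        if handle is not None:
            handle.close()

# Usage: one file that exists and one that does not
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("a,b\n1,2\n")
    found = count_lines(path)
    missing = count_lines(os.path.join(tmp, "non_existent_file.csv"))

print(found, missing)
```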

---
## 2. Custom Exceptions <a name="2"></a>

A custom exception class gives a specific error condition its own type, so callers can catch it separately from unrelated errors.

python

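class DataValidationError(Exception):
    """Raised when a dataset fails a validation check."""

def validate_data(data):
    # Count missing values across all columns
    if data.isnull().sum().sum() > 0:
        raise DataValidationError("Data contains missing values.")

# Usage
try:
    data = load_data('data/raw/example.csv')
    # load_data returns None when the file could not be read
    if data is not None:
        validate_data(data)
except DataValidationError as e: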
    print(f"Validation Error: {e}")
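
Custom exceptions can also form a small hierarchy and carry extra data. The following is a sketch under stated assumptions: `MissingValuesError` and `find_missing` are hypothetical names introduced for illustration, and plain lists of dictionaries stand in for a DataFrame so the example needs no third-party packages.

```python
class DataValidationError(Exception):
    """Base class for validation problems (same name as in the notebook)."""

class MissingValuesError(DataValidationError):
    """Hypothetical subclass that records which columns have missing values."""

    def __init__(self, columns):
        self.columns = list(columns)
        super().__init__(f"Missing values in columns: {', '.join(self.columns)}")

def find_missing(rows):
    """Raise MissingValuesError if any value in a list of dicts is None."""
    bad = sorted({key for row in rows for key, value in row.items() if value is None})
    if bad:
        raise MissingValuesError(bad)

# Usage: catching the base class also catches the subclass
rows = [{"age": 31, "city": "Oslo"}, {"age": None, "city": "Lima"}]
try:
    find_missing(rows)
except DataValidationError as e:
    print(f"Validation Error: {e}")
```

Because `MissingValuesError` subclasses `DataValidationError`, existing handlers for the base class keep working, while code that needs the column names can catch the subclass and read `e.columns`.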

---
## 3. Nested Functions and Exception Propagation <a name="3"></a>

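Exceptions propagate up the call stack until a matching `except` clause handles them. This lets an inner function translate a low-level error into a domain-specific one, while the outer function decides how the pipeline as a whole responds.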

python

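def preprocess_data(data):
    try:
        # Example preprocessing step
        data['new_column'] = data['existing_column'] * 2
        return data
    except KeyError as e:
        # Translate the low-level KeyError into a domain-specific error,
        # keeping the original exception as the cause
        raise DataValidationError(f"Missing column during preprocessing: {e}") from e

def run_pipeline(filepath):
    try:
        data = load_data(filepath)
        if data is None:
            # load_data has already reported the problem
            return None
        validate_data(data)
        data = preprocess_data(data)
        return data
    except DataValidationError as e:
        print(f"Data validation failed: {e}")
    except Exception as e:
        print(f"An unexpected error occurred in the pipeline: {e}")
    # An error was handled above, so there is no result to return
    return None

# Usage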
processed_data = run_pipeline('data/raw/example.csv')
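
When one exception is raised in response to another, `raise ... from e` records the original as `__cause__`, so the outer handler can still see what went wrong underneath. A minimal sketch, assuming a plain dictionary in place of a DataFrame; `double_column` and `run_step` are hypothetical names used only here:

```python
class DataValidationError(Exception):
    pass

def double_column(row, column):
    try:
        return row[column] * 2
    except KeyError as e:
        # Chain the new exception to the original KeyError
        raise DataValidationError(f"Missing column during preprocessing: {e}") from e

def run_step(row):
    try:
        return double_column(row, "existing_column")
    except DataValidationError as e:
        print(f"Data validation failed: {e}")
        # __cause__ holds the exception given after "from"
        print(f"Caused by: {type(e.__cause__).__name__}")
        return None

# Usage
print(run_step({"existing_column": 3}))
print(run_step({"other_column": 3}))
```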

---
## 4. Logging Exceptions <a name="4"></a>

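Logging is better suited to production code than `print`: each record carries a severity level, and handlers can route records to files or monitoring tools. The version of `load_data` in this section logs each error and then re-raises it, so the caller still sees the exception and decides how to respond.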

python

import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def load_data(filepath):
    try:
        data = pd.read_csv(filepath)
        return data
    except FileNotFoundError:
        logger.error(f"File not found: {filepath}")
        # A bare raise re-raises the same exception for the caller to handle
        raise
    except pd.errors.EmptyDataError:
        logger.error("No data in file.")
        raise
    except Exception as e:
        logger.error(f"An unexpected error occurred: {e}")
        raise

# Usage
try:
    data = load_data('data/raw/non_existent_file.csv')
except Exception as e:
    logger.critical(f"Critical error occurred: {e}")
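
Inside an `except` block, `logger.exception` logs at the ERROR level and appends the traceback, which `logger.error` does not do by default. The sketch below is illustrative only: the logger name `exception_demo`, the in-memory stream, and the `read_text` helper are assumptions made so the output can be inspected without pandas.

```python
import io
import logging

# Send log records to an in-memory stream so they can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

demo_logger = logging.getLogger("exception_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo output out of the root logger

def read_text(filepath):
    try:
        with open(filepath, encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        # Logs the message at ERROR level and appends the traceback
        demo_logger.exception("File not found: %s", filepath)
        raise

# Usage
try:
    read_text("data/raw/non_existent_file.csv")
except FileNotFoundError:
    pass

log_output = stream.getvalue()
print(log_output)
```

Passing the path as a separate argument (`"%s", filepath`) defers string formatting until the record is actually emitted.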

---
## 5. Step-by-Step Example <a name="5"></a>

We'll now build a complete data science pipeline with exception handling at each step.

### Step 1: Data Loading

python

def load_data(filepath):
    try:
        data = pd.read_csv(filepath)
        return data
    except FileNotFoundError:
        logger.error(f"File not found: {filepath}")
        raise
    except pd.errors.EmptyDataError:
        logger.error("No data in file.")
        raise
    except Exception as e:
        logger.error(f"An unexpected error occurred: {e}")
        raise

### Step 2: Data Validation

python

class DataValidationError(Exception):
    """Raised when a dataset fails a validation check."""

def validate_data(data):
    if data.isnull().sum().sum() > 0:
        message = "Data contains missing values."
        # Log first, then raise; no try/except is needed around our own raise
        logger.warning(f"Validation error: {message}")
        raise DataValidationError(message)

### Step 3: Data Preprocessing

python

def preprocess_data(data):
    try:
        data['new_column'] = data['existing_column'] * 2
        return data
    except KeyError as e:
        logger.error(f"Missing column during preprocessing: {e}")
        raise DataValidationError(f"Preprocessing error: {e}") from e

### Step 4: Running the Pipeline

python

def run_pipeline(filepath):
    try:
        data = load_data(filepath)
        validate_data(data)
        data = preprocess_data(data)
        return data
    except DataValidationError as e:
        logger.error(f"Pipeline failed: {e}")
    except Exception as e:
        logger.critical(f"Critical error in pipeline: {e}")
    # Every failure is logged above, so the caller receives None rather than an exception
    return None

# Usage
processed_data = run_pipeline('data/raw/example.csv')
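
Each step above repeats the same "log, then re-raise" pattern. One possible way to factor that out is a decorator. This is a sketch, not part of the original pipeline: `log_errors` and `double_values` are hypothetical names, and a list stands in for a DataFrame.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_errors(step_name):
    """Hypothetical decorator: log any exception from a step, then re-raise it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                logger.error("%s failed: %s", step_name, e)
                raise
        return wrapper
    return decorator

@log_errors("preprocessing")
def double_values(values):
    return [v * 2 for v in values]

# Usage: the error is logged by the decorator and still reaches the caller
print(double_values([1, 2, 3]))
try:
    double_values(None)
except TypeError:
    print("TypeError reached the caller")
```

The trade-off is that a single generic handler cannot assign different severity levels or messages per exception type, which the explicit `except` clauses in the steps above can.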

---
## 6. Exercise <a name="6"></a>

### Task

You are provided with a simple data science pipeline that loads data, validates it, preprocesses it, and trains a model. The pipeline currently does not have any exception handling. Your task is to:

1. Add exception handling to each step of the pipeline.
2. Use custom exceptions where appropriate.
3. Implement logging for all exceptions.

### Initial Code

python

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def load_data(filepath):
    data = pd.read_csv(filepath)
    return data

def validate_data(data):
    if data.isnull().sum().sum() > 0:
        print("Data contains missing values.")

def preprocess_data(data):
    data['new_column'] = data['existing_column'] * 2
    return data

def train_model(data):
    X = data[['new_column']]
    y = data['target']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LinearRegression()
    model.fit(X_train, y_train)
    return model

def run_pipeline(filepath):
    data = load_data(filepath)
    validate_data(data)
    data = preprocess_data(data)
    model = train_model(data)
    return model

# Usage
model = run_pipeline('data/raw/example.csv')
print(model)

### Requirements

1. Handle file not found errors in `load_data`.
2. Raise a custom exception for validation errors in `validate_data`.
3. Handle missing column errors in `preprocess_data`.
4. Handle any errors during model training in `train_model`.
5. Log all exceptions with appropriate severity levels.

### Solution

python

import pandas as pd
import logging
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class DataValidationError(Exception):
    pass

def load_data(filepath):
    try:
        data = pd.read_csv(filepath)
        return data
    except FileNotFoundError:
        logger.error(f"File not found: {filepath}")
        raise
    except pd.errors.EmptyDataError:
        logger.error("No data in file.")
        raise
    except Exception as e:
        logger.error(f"An unexpected error occurred: {e}")
        raise

def validate_data(data):
    if data.isnull().sum().sum() > 0:
        message = "Data contains missing values."
        # Log first, then raise; no try/except is needed around our own raise
        logger.warning(f"Validation error: {message}")
        raise DataValidationError(message)

def preprocess_data(data):
    try:
        data['new_column'] = data['existing_column'] * 2
        return data
    except KeyError as e:
        logger.error(f"Missing column during preprocessing: {e}")
        raise DataValidationError(f"Preprocessing error: {e}") from e

def train_model(data):
    try:
        X = data[['new_column']]
        y = data['target']
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        model = LinearRegression()
        model.fit(X_train, y_train)
        return model
    except KeyError as e:
        # Either 'new_column' or 'target' may be the missing column
        logger.error(f"Missing column for training: {e}")
        raise DataValidationError(f"Training error: {e}") from e
    except Exception as e:
        logger.error(f"An error occurred during model training: {e}")
        raise

def run_pipeline(filepath):
    try:
        data = load_data(filepath)
        validate_data(data)
        data = preprocess_data(data)
        model = train_model(data)
        return model
    except DataValidationError as e:
        logger.error(f"Pipeline failed: {e}")
    except Exception as e:
        logger.critical(f"Critical error in pipeline: {e}")
    # Every failure is logged above; signal it to the caller with None
    return None

# Usage: run_pipeline handles and logs its own errors, so check for None
model = run_pipeline('data/raw/example.csv')
if model is not None:
    print(model)