<img src='../images/xebia-logo.png' width='300px' align='right' style="padding: 15px">

# Python type hinting & error handling

This notebook will level up your **functions** skills by focusing on **type casting, type hints, and writing cleaner, safer data-science-ready functions**.
- [Why does handling types matter?](#why)
- [Type hints](#th)
- [Type casting](#tc)
- [Error handling](#eh)
- [Exercises](#ex)

<a id=why></a>
## Why does handling types matter?

Consider the following function... it seems pretty logical that this function is defined for numeric data...

In [None]:
def add_two_numbers(a, b):
    return float(a + b)

In [None]:
add_two_numbers(4, 5), add_two_numbers(3.45, -5.54)

But if different data types are passed into it, it might fail...

In [None]:
# errors
# add_two_numbers('4', 5)
# add_two_numbers([1,2,3], [4,5,6]) 

Or even worse... return unexpected results

In [None]:
add_two_numbers('4', '6')

<a id=th></a>
## Type hints

You can use type hints to document what the **inputs** should be as well as what the **outputs** should be.

Type hints have been part of Python since version 3.5 (introduced by PEP 484).

In [None]:
def add_two_numbers_th(a: int, b: int) -> int:
    return a + b

add_two_numbers_th(2, 3)

However, Python NEVER enforces them at runtime: despite having type hints, a function can still accept (or return) a conflicting type.

In [None]:
add_two_numbers_th('2', '3')

### Why?

Python's core philosophy is duck typing first, meaning Python is designed around *behaviour* not *explicit types*. Functions are expected to work on any object that behaves correctly, not just a specific type. Enforcing types at runtime would add noise and reduce the simplicity that Python emphasizes. 

So, type hints are meant to communicate intent without forcing enforcement.
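
One way to see this is to look at where Python keeps the hints. The sketch below uses a made-up function, `add_ints`, purely for illustration: the annotations are stored as plain metadata on the function object, and nothing consults them when the function is called.

```python
def add_ints(a: int, b: int) -> int:
    return a + b


# The hints live in an ordinary dictionary attached to the function
print(add_ints.__annotations__)

# Calling with strings still works: '+' concatenates and no type check happens
print(add_ints('2', '3'))
```

Tools such as static type checkers and editors read this metadata; the interpreter itself ignores it.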

So how can you enforce the expected types?

<a id=tc></a>
## Type casting in functions

One way around this would be to cast the data types inside the function, to apply the expected behaviour.

In [None]:
def safe_add(a: int, b: int) -> int:
    return int(a) + int(b)

In [None]:
safe_add(5, 6), safe_add('6', '4')

But if you run this with strings that cannot be converted to integers...

In [None]:
# error - uncomment the line below and run
# safe_add('Silver', 'Gold')

But surely there's a way to identify these issues ahead of time, i.e. not execute the action if an issue gets identified?

<a id=eh></a>
## Error handling

There are two main methods when it comes to handling errors:

- LBYL - Look Before You Leap
- EAFP - Easier to Ask Forgiveness than Permission

### Look before you leap - catch those errors before executing the code

Take a look at the following code... what is happening here?

In [None]:
import os

def remove_file_from_filepath_lbyl(filepath):
    """
    Remove a file at the specified filepath.
    
    Args:
        filepath (str): Path to the file to be removed
    
    Returns:
        None: os.remove() returns None on success
    """
    if not os.path.exists(filepath):
        raise FileNotFoundError(f'File path "{filepath}" does not exist!')
    
    return os.remove(filepath)


The code is identifying a potential issue that may arise - the coder has anticipated that the filepath may not exist. This is quite a common issue as users of the function may input an incorrect filepath by mistake.

So now, as a user of this function, if you execute the function with a filepath that does not exist...

In [None]:
# error
# remove_file_from_filepath_lbyl('this/path/does/not.exist')

The error will be triggered - and it will include the error message set by the function creator!

A major benefit of this approach is that you can raise a specific type of error for each problem you anticipate. Do you expect users of your function to pass an input of a type you do not want to allow? Raise a `TypeError`! Is the type right but the value unacceptable? Raise a `ValueError`!
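
As a sketch of the same idea applied to the earlier addition example: check the argument types up front and raise before doing any work. The name `add_numbers_lbyl` is hypothetical, and `TypeError` is used because it is the built-in exception intended for arguments of the wrong type.

```python
def add_numbers_lbyl(a, b):
    """Add two numbers, rejecting non-numeric inputs before adding."""
    for name, value in (("a", a), ("b", b)):
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(
                f"'{name}' must be an int or float, got {type(value).__name__}"
            )
    return a + b


print(add_numbers_lbyl(4, 5))

try:
    add_numbers_lbyl('4', '6')
except TypeError as error:
    print(f"Rejected: {error}")
```

The error message names the offending argument and the type that was received, so the caller knows what to fix.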

However... 
- What downsides are there with this way of working?
- What other errors might come up when trying to delete a file?

<details>
    
  <summary><span style="color:blue">Click to show answer:</span></summary>

This is the "look before you leap" (LBYL) pattern for error handling. Applying it means you check for the specific conditions that would cause an action to fail (does the file exist?) before triggering the action itself.

A limitation is that you **need to know all the possible things that can go wrong** so that you can check for them before performing the action. Checking that the path exists is the obvious one... but there are many other things that could go wrong with this function. To name a few:

- The path could be of a directory instead of a file
- The file could be owned by a different user than the one attempting the deletion
- The file could have read-only permissions
- The disk on which the file is stored could be mounted as a read-only volume
- The file could be locked by another process, a common annoyance on Microsoft Windows

</details>
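
One of these cases can be reproduced safely with a temporary directory. This is a minimal sketch using only the standard library: the existence check passes, yet the removal still fails, because `os.remove()` only deletes files. The variable names are chosen for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # The LBYL check is satisfied: the path does exist...
    exists_check_passed = os.path.exists(folder)

    # ...but the action fails anyway, because the path is a directory
    try:
        os.remove(folder)
        removal_error = None
    except OSError as error:
        removal_error = error

print(f"Existence check passed: {exists_check_passed}")
print(f"Removal failed with: {type(removal_error).__name__}")
```

The exact exception type depends on the operating system, but it is always a subclass of `OSError`.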

### Defensive Programming: Try/Except Type Casting

Instead of the `LBYL: Look Before You Leap method`, you can apply the `EAFP: Easier to Ask Forgiveness than Permission` method.

The competing pattern says that you should perform the action - or TRY to perform the action - and deal with any errors afterwards - or deal with the EXCEPTions.

This can be done using `try:` and `except:`.

In [None]:
import os

def remove_file_from_filepath(filepath):
    """
    Remove a file at the specified filepath.
    
    Args:
        filepath (str): Path to the file to be removed
    
    Returns:
        None: os.remove() returns None on success
    """
    
    try: 
        return os.remove(filepath)
    except Exception as error:
        print(f"An error has occurred: {error}")

Here there was no need to anticipate the exact reason for a failure. Instead, whatever goes wrong is caught, and a printed message explains why the function did nothing.

Below the same code is applied. What is the difference between the code above, and the code below?



In [None]:
import os

def remove_file_from_filepath(filepath):
    """
    Remove a file at the specified filepath.
    
    Args:
        filepath (str): Path to the file to be removed
    
    Returns:
        None: os.remove() returns None on success
    """
    
    try: 
        return os.remove(filepath)
    except FileNotFoundError as error:
        print(f"An error has occurred: {error}")
        raise

<details>
    
  <summary><span style="color:blue">Click to show answer:</span></summary>

- **Re-raising the error**: The bare `raise` statement re-raises the caught exception, which interrupts the function and passes the error on to the caller. This is necessary when the rest of the program will not be able to do what it needs to do because of this error, and therefore should not run.
- **Specifying an exception type:** The `except` clause now only catches that specific type of error. This is much better practice because you can handle different errors differently by adding multiple `except` clauses.

</details>
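
To illustrate handling different errors differently, here is a minimal sketch with two `except` clauses. The function name `remove_file_quietly` and the choice of which error to swallow and which to re-raise are assumptions made for this example, not a rule.

```python
import os
import tempfile


def remove_file_quietly(filepath):
    """Remove a file; return True if removed, False if it was already gone."""
    try:
        os.remove(filepath)
        return True
    except FileNotFoundError:
        # Nothing to delete: treated here as a non-critical outcome
        print(f"Nothing to remove at: {filepath}")
        return False
    except PermissionError as error:
        # Not something this function can fix, so report it and re-raise
        print(f"Not allowed to remove {filepath}: {error}")
        raise


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")
    open(path, "w").close()           # create an empty file
    print(remove_file_quietly(path))  # the file exists, so it is removed
    print(remove_file_quietly(path))  # the file is already gone
```

Any exception that matches neither clause propagates to the caller untouched.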

### Exception types

Common exception types you'll see:

- **ValueError**: Right type, but an inappropriate value (e.g. `int('ten')`)
- **TypeError**: Operation applied to an object of the wrong type (e.g. `'4' + 5`)
- **KeyError**: Dictionary key doesn't exist
- **IndexError**: List index out of range
- **FileNotFoundError**: File doesn't exist
- **ZeroDivisionError**: Division by zero
- **Exception**: The generic base class (catches almost everything)
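
Each of these can be triggered with a one-line expression. The sketch below (the `examples` dictionary is just a convenient way to loop over them) runs one trigger per exception type and prints the type and message that Python raises.

```python
examples = {
    "ValueError": lambda: int("ten"),
    "TypeError": lambda: "4" + 5,
    "KeyError": lambda: {"a": 1}["b"],
    "IndexError": lambda: [1, 2, 3][10],
    "FileNotFoundError": lambda: open("this/path/does/not.exist"),
    "ZeroDivisionError": lambda: 1 / 0,
}

for expected_name, trigger in examples.items():
    try:
        trigger()
    except Exception as error:
        # Catching the generic base class works for every case above
        print(f"{expected_name:<18} -> {type(error).__name__}: {error}")
```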

For example... what issues could you anticipate with the following code:

In [None]:
import pandas as pd

required_columns_for_pikachu = ['stats_sum', 'hp', 'attack', 'defense', 'special_atk', 'special_def', 'speed', 'colour']

df = pd.read_csv('data/pokemon.csv')

missing = set(required_columns_for_pikachu) - set(df.columns)
if missing:
    print(f"The following columns are missing: {missing}")

df.head()

With the above code 
- What issues might arise in some specific cases?
- And how would you handle each issue?

<details>
    
  <summary><span style="color:blue">Click to show answer:</span></summary>
  
1. **FileNotFoundError** - The file might not exist at that filepath
   - How to handle it: Print a message and return None (no data is workable)

2. **EmptyDataError** - The CSV file exists but is completely empty
   - How to handle it: Print a message and return None (empty data isn't useful but not catastrophic)

3. **ValueError** - The data loaded successfully but is missing required columns
   - How to handle it: Print the validation error and re-raise it (this is a serious problem that should stop execution)

4. **Unknown** - Something completely unexpected happened 
   - How to handle it: Print a message and re-raise it (need to investigate what went wrong)

</details>

The code can be rewritten to account for all these possible cases, with a final generic `except` to catch any remaining errors.

In [None]:
import pandas as pd

def load_and_validate_data(filepath, required_columns):
    """Load a CSV and validate it has the required columns."""
    try:
        df = pd.read_csv(filepath)
        
        # Check for required columns
        missing = set(required_columns) - set(df.columns)
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
        
        return df
    
    except FileNotFoundError as e:
        print(f"Data file not found: {filepath}")
        return None  # Could handle gracefully with empty DataFrame
    
    except pd.errors.EmptyDataError as e:
        print(f"CSV file is empty: {filepath}")
        return None
    
    except ValueError as e:
        print(f"Data validation error: {e}")
        raise  # Re-raise because missing columns is a serious issue
    
    except Exception as error:
        print(f"Unexpected error loading data: {error}")
        raise  # Unknown errors should crash so we can investigate


# This will handle the FileNotFoundError gracefully
df = load_and_validate_data("missing_file.csv", ["user_id", "revenue"])

# This will raise ValueError if data.csv exists but lacks any of these columns
df = load_and_validate_data("data.csv", ["user_id", "revenue", "country"])

# An unexpected error might be: permission denied, corrupted file, etc.

<a id=ex></a>
## <mark>Exercises</mark>

### <mark>Exercise 1: Safe Division Function</mark>

Write a function that safely divides two numbers and handles various error cases:
|Case|How to handle|
|---|---|
|`safe_divide(10, 2)`|           Should return 5.0|
| `safe_divide(10, 0)`|          Should handle ZeroDivisionError|
| `safe_divide("10", "2")`|      Should convert and return 5.0|
| `safe_divide("ten", 2)`|       Should handle ValueError|
| `safe_divide([1,2], 3)`|       Should handle TypeError|

**Questions to consider:**
- Which errors should return the default value vs. raising an exception?
- Should you print error messages or just handle them silently?
- When would re-raising be appropriate here?



In [None]:
def safe_divide(numerator, denominator, default=None):
    """
    Safely divide two numbers with proper error handling.
    
    Args:
        numerator: The number to divide
        denominator: The number to divide by
        default: Value to return if division fails (default: None)
    
    Returns:
        Result of division, or default value if it fails
    """
    # TODO: Implement this function with try/except blocks for:
    # 1. ZeroDivisionError - when denominator is zero
    # 2. TypeError - when inputs can't be converted to numbers
    # 3. ValueError - when string inputs can't be parsed as numbers
    # 4. Any unexpected errors
    
    pass

### <mark>Exercise 2: Data Type Validator</mark>

Create a function that validates a DataFrame has the correct column types.

For example, with the following dataframe...
```python    
df = pd.DataFrame({
    'age': [25, 30, 35],
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000.0, 60000.0, 55000.0]
})
```
The expected columns and their data types can be extracted to give:
```python 
expected = {'age': 'int64', 'name': 'object', 'salary': 'float64'}
validate_column_types(df, expected)  # Should return True
```

However if the wrong types are identified:
```python
wrong_types = {'age': 'float64', 'name': 'object',  'salary': 'float64'}
validate_column_types(df, wrong_types)  # Should raise ValueError
```

**Hint:** Use `df[column].dtype` to check column types. Think about what information would be helpful in your error messages!



In [None]:
def validate_column_types(df, expected_types):
    """
    Validate that DataFrame columns match expected types.
    
    Args:
        df: pandas DataFrame to validate
        expected_types: dict mapping column names to expected types
                       e.g., {'age': 'int64', 'name': 'object', 'salary': 'float64'}
    
    Returns:
        True if all types match, raises ValueError with details if not
    """
    try:
        # TODO: Implement validation logic
        # 1. Check if all expected columns exist in df
        # 2. Check if each column has the correct dtype
        # 3. Raise ValueError with helpful message if mismatches found
        
        pass
    except:
        pass
    
    # TODO: Add except blocks for:
    # - AttributeError: df is not a DataFrame
    # - KeyError: expected column doesn't exist
    # - Any unexpected errors

### <mark>Bonus Challenge: Robust CSV Loader with Type Casting</mark>

Build a production-ready CSV loader that handles missing files, validates columns, AND automatically casts columns to specified types.

You should be able to test it with the following code:
```python
df = load_csv_with_types(
    'data/pokemon.csv',
    column_types={'hp': int, 'name': str, 'is_legendary': bool},
    required_columns=['name', 'hp'],
    fillna_strategy={'hp': 0}
)
```


In [None]:
def load_csv_with_types(filepath, column_types, required_columns=None, fillna_strategy=None):
    """
    Load a CSV file with robust error handling and automatic type casting.
    
    Args:
        filepath: Path to the CSV file
        column_types: dict mapping column names to desired types
                     e.g., {'age': int, 'name': str, 'score': float}
        required_columns: list of columns that must exist (optional)
        fillna_strategy: dict mapping column names to fill values (optional)
                        e.g., {'age': 0, 'score': -1}
    
    Returns:
        DataFrame with properly typed columns, or None if loading fails
    """
    # TODO: Implement a function that:
    # 1. Tries to load the CSV file
    # 2. Validates required columns exist
    # 3. Attempts to cast each column to its specified type
    # 4. Fills NA values according to strategy (if provided)
    # 5. Handles these errors appropriately:
    #    - FileNotFoundError (return None with message)
    #    - pd.errors.EmptyDataError (return None with message)
    #    - ValueError from missing required columns (raise)
    #    - ValueError/TypeError from type casting (print warning, keep original type)
    #    - Any unexpected errors (print and raise)
    
    pass

**<mark>Extra challenges for the speedy:</mark>**
- Add a `verbose` parameter that controls whether to print status messages
- Return a tuple of `(df, errors)` where errors is a list of any type casting issues
- Add support for custom type casting functions (e.g., converting 'Yes'/'No' to boolean)

**Answers**: Uncomment and run the cells below to load in answers

In [None]:
# %load answers/type-hint-1

In [None]:
# %load answers/type-hint-2

In [None]:
# %load answers/type-hint-bonus1

In [None]:
# %load answers/type-hint-bonus2

<a id=summ></a>

## Summary: Error Handling Best Practices

**Key Takeaways:**

**EAFP** (Easier to Ask Forgiveness than Permission) uses `try`/`except`: attempt the action, then handle any errors. **LBYL** (Look Before You Leap) checks conditions with `if` statements before attempting the action. **EAFP** is generally considered more Pythonic and copes with edge cases you might not anticipate.

There are several `try`/`except` patterns, and in each `except` clause you decide whether to handle the error or re-raise it.

Build robust functions by combining type hints (documentation) with type casting (enforcement), and wrap casting operations in `try`/`except` so that conversion failures either fall back to a sensible default or raise a clear error.
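
A minimal sketch of that combination follows. The helper name `to_float` and its `default` parameter are made up for illustration: the hints document the intent, the cast enforces it, and the `try`/`except` decides what happens when the cast fails.

```python
from typing import Any, Optional


def to_float(value: Any, default: Optional[float] = None) -> Optional[float]:
    """Cast value to float, returning default if the conversion fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type, e.g. a list
        # ValueError: right type, unparseable content, e.g. 'ten'
        return default


print(to_float("3.5"))
print(to_float("ten", default=0.0))
print(to_float([1, 2]))
```

Returning a default suits non-critical values; where a bad input should stop execution, raising a clear error from the `except` clause is the alternative.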

All of this in practice helps to create production-ready patterns:
   - Be specific with exception types (catch `FileNotFoundError`, not just `Exception`)
   - Provide helpful error messages that guide users on how to fix the issue
   - Log errors before re-raising them for debugging
   - Consider returning `None` or default values for non-critical failures
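
As a sketch of logging before re-raising, the example below uses the standard `logging` module. The function `read_first_line` is hypothetical; `logger.exception` records the message together with the traceback, and the bare `raise` then passes the original error on to the caller.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def read_first_line(filepath):
    """Return the first line of a text file, logging any failure to open it."""
    try:
        with open(filepath) as handle:
            return handle.readline().rstrip("\n")
    except FileNotFoundError:
        # The message tells the user what to check; the log keeps the traceback
        logger.exception("Could not open %r. Check that the path is correct.", filepath)
        raise


try:
    read_first_line("this/path/does/not.exist")
except FileNotFoundError:
    print("The caller still receives the original exception")
```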

**Remember** that good error handling makes your code resilient, debuggable, and user-friendly.

<!-- # To add

- mypy
- pydantic
- attrs/cattrs -->