# Data Cleaning with Pandas `.replace`

This notebook uses the pandas `.replace` method to clean and transform data. `.replace` swaps specified values for others across a DataFrame or Series, which makes it well suited to standardizing labels and giving missing values an explicit placeholder.

## Objectives

1. **Data Cleaning**:
    - **Standardizing Categories**: Clean and standardize the 'category' column to handle inconsistencies such as extra spaces, mixed case, and missing values.
    - **Handling Availability Status**: Standardize the 'availability' column to ensure consistent terminology.

2. **Multiple Column Replacements**:
    - Demonstrate how a single `.replace` call with a nested dictionary updates several columns at once.

## `.replace` Method Overview

The `.replace` method in pandas allows for straightforward value replacements within a DataFrame or Series. It is particularly useful for:
- **Replacing Multiple Values at Once**: Efficiently handle multiple value replacements using dictionaries or lists.
- **Handling Missing or Specific Values**: Replace specific values, including `NaN` or empty strings, with appropriate placeholders.
- **Replacing Values Across Multiple Columns**: Apply replacements across multiple columns in a single operation.
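
As a rough illustration of the three uses listed above, the sketch below applies each one to a small toy DataFrame. The `toy` frame and its `size` and `status` columns are hypothetical and are not part of this notebook's sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the call shapes.
toy = pd.DataFrame({
    "size": ["S", "M", "", "L"],
    "status": ["yes", "no", "yes", np.nan],
})

# 1. Several values at once: a flat dict maps old value -> new value.
sizes = toy["size"].replace({"S": "small", "M": "medium", "L": "large"})

# 2. Missing values: replace NaN with an explicit placeholder.
status = toy["status"].replace(np.nan, "unknown")

# 3. Several columns in one call: a nested dict of {column: {old: new}}.
cleaned = toy.replace({
    "size": {"": "unknown"},
    "status": {"yes": "Y", "no": "N"},
})
print(cleaned)
```

Each call returns a new object and leaves `toy` unchanged, since `inplace=True` is not passed.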

### Example

In this notebook, we will work with a sample DataFrame containing inconsistencies and missing values in the 'category' and 'availability' columns. We will use the `.replace` method to clean and standardize these columns, ensuring our data is consistent and ready for analysis.


In [None]:
import pandas as pd
import numpy as np

# Create Sample Data
def build_sample_dataframe() -> pd.DataFrame:
    data = {
        'category': ['Electronics', 'Clothing', 'Home', 'Electronics ', ' clothing', 'Home', 'electronics', 'Clothing', 'Books', None, ''],
        'item': ['Laptop', 'T-Shirt', 'Sofa', 'Smartphone', 'Jeans', 'Table', 'Headphones', 'Jacket', 'Novel', 'Lamp', ''],
        'price': [999.50, 19.75, 299.00, 699.25, 49.50, 199.00, 89.75, 79.50, 14.25, 39.00, 0.0],
        'availability': ['In Stock', 'Out of Stock', 'In Stock', 'In Stock', 'Out of Stock', 'In Stock', 'In Stock', 'Out of Stock', None, 'In Stock', '']
    }
    _df = pd.DataFrame(data)
    return _df


In [None]:
df = build_sample_dataframe()

# Display the original DataFrame
print("Original DataFrame:")
print(df)


In [None]:
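# Replace values in multiple columns with a nested {column: {old: new}} mapping.
# Every variant is listed explicitly so that all categories end up lowercase;
# values that are not listed are left unchanged.
df.replace({
    'category': {
        'Electronics': 'electronics',
        'Electronics ': 'electronics',
        ' clothing': 'clothing',
        'Clothing': 'clothing',
        'Home': 'home',
        'Books': 'books',
        '': 'unknown',
        None: 'unknown'
    },
    'availability': {
        'In Stock': 'available',
        'Out of Stock': 'unavailable',
        '': 'unknown',
        None: 'unknown'
    }
}, inplace=True)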

# Display the cleaned DataFrame
print("\nCleaned DataFrame:")
print(df)
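
The mapping above lists each variant by hand, which works for a small, known set of values. For stray whitespace, one possible alternative is a regex-based `.replace`, sketched below on a hypothetical Series (`raw`) that mirrors the kinds of inconsistencies in the 'category' column. Lowercasing is not a value replacement, so this sketch assumes the `.str.lower()` accessor is acceptable for that step.

```python
import pandas as pd

# Hypothetical Series mirroring the inconsistencies in 'category'.
raw = pd.Series(["Electronics ", " clothing", "Home", ""])

# Strip leading and trailing whitespace with a regex replacement.
stripped = raw.replace(r"^\s+|\s+$", "", regex=True)

# Map empty strings to a placeholder (exact-match replacement).
labelled = stripped.replace("", "unknown")

# Case folding goes through the .str accessor rather than .replace.
normalized = labelled.str.lower()
print(normalized.tolist())
```

This trades the explicit mapping's transparency for coverage of variants that were not anticipated in advance.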