# Simple Imputation - Constant Value Imputation

Constant Value Imputation (The "Predefined Value" Guess)

When to Use:

* When missing values have a specific meaning (e.g., -99 for "error" or "not applicable").
* When you want to explicitly flag missing values with a distinct, easily identifiable value.
* When you have a strong domain-specific reason to believe a particular constant is a reasonable substitute.
* As a simple placeholder before applying more sophisticated imputation.

How it Works:

* Replaces all missing values in a column with a pre-determined, fixed value.
* The constant value is chosen by the user, not derived from the data itself.
* The constant can be numerical (e.g., 0, -1) or categorical (e.g., "Unknown", "Missing").
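
As a quick illustration of the points above, here is a minimal sketch using pandas `fillna()` on two small, made-up Series (`scores` and `labels` are hypothetical and not part of this notebook's dataset). It is only meant to show that the same call handles numeric and categorical constants.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not the notebook's dataset
scores = pd.Series([10.0, np.nan, 30.0, np.nan])
labels = pd.Series(["a", None, "c"], dtype="object")

# The constant is chosen by the user; it is not computed from the data
scores_filled = scores.fillna(0)          # numeric constant
labels_filled = labels.fillna("Missing")  # categorical constant

print(scores_filled.tolist())  # [10.0, 0.0, 30.0, 0.0]
print(labels_filled.tolist())  # ['a', 'Missing', 'c']
```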

Limitations:

* Can introduce significant bias if the chosen constant is not appropriate.
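* Distorts the distribution of the data, especially if a large number of values are imputed: a constant near the center of the data shrinks the variance, while an out-of-range value such as -99 inflates it and shifts the mean.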
* Does not consider relationships between variables.
* The choice of the constant is subjective and can greatly influence the results.
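
To make the bias concrete, the sketch below compares summary statistics of the `Purchase_Amount` values used later in this notebook before and after filling with 0 and with -99. The `summary` table is an illustrative helper, not part of the notebook's utility function. In this particular sample both constants lie below every observed value, so the mean drops and the standard deviation grows.

```python
import numpy as np
import pandas as pd

# Same values as the Purchase_Amount column created in section 2
purchase = pd.Series([500, 1200, np.nan, 600, 1500, 2000, 100, 700, 1300, np.nan])

# describe() skips NaN, so "observed_only" reflects the 8 real values
summary = pd.DataFrame({
    "observed_only": purchase.describe(),
    "filled_0": purchase.fillna(0).describe(),
    "filled_neg99": purchase.fillna(-99).describe(),
}).loc[["count", "mean", "std"]]

print(summary.round(1))
# The means are 987.5 (observed), 790.0 (filled with 0), 770.2 (filled with -99)
```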


# Notebook Structure

1. Import necessary dependencies
2. Create the dataset
3. Define the Utility function for Constant Value imputation
4. Execution of the utility function


# 1. Import necessary dependencies

In [71]:
# libraries & dataset

import pandas as pd
import numpy as np

# 2. Create the dataset

In [72]:
# Create Sample DataFrame with missing values

data = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Product_Type': ['T-shirt', 'Shorts', 'Track Pants', 'T-shirt', None,
                     'Joggers', 'Cap', None, 'Shorts', 'T-shirt'],
    'Purchase_Amount': [500, 1200, np.nan, 600, 1500, 2000, 100, 700, 1300, np.nan],
    'Discount_Percentage': [0.1, 0.05, 0.15, np.nan, 0.03, 0.0, 0.2, 0.12, 0.07, 0.5],
    'Delivery_Time_Days': [2, 3, 4, 2, 5, 6, 1, np.nan, 4, 3]
})

In [73]:
print("Original Data:\n")
data

Original Data:



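|   | CustomerID | Product_Type | Purchase_Amount | Discount_Percentage | Delivery_Time_Days |
|---|---|---|---|---|---|
| 0 | 1 | T-shirt | 500.0 | 0.1 | 2.0 |
| 1 | 2 | Shorts | 1200.0 | 0.05 | 3.0 |
| 2 | 3 | Track Pants | NaN | 0.15 | 4.0 |
| 3 | 4 | T-shirt | 600.0 | NaN | 2.0 |
| 4 | 5 | None | 1500.0 | 0.03 | 5.0 |
| 5 | 6 | Joggers | 2000.0 | 0.0 | 6.0 |
| 6 | 7 | Cap | 100.0 | 0.2 | 1.0 |
| 7 | 8 | None | 700.0 | 0.12 | NaN |
| 8 | 9 | Shorts | 1300.0 | 0.07 | 4.0 |
| 9 | 10 | T-shirt | NaN | 0.5 | 3.0 |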


# 3. Define the Utility function for Constant Value imputation

The constant_value_imputation function takes a DataFrame, column name, and a constant value. It creates a copy of the DataFrame and replaces missing values in the specified column with the provided constant.  The original DataFrame is unchanged, and the function returns the modified copy.

constant_value_imputation(df, column, constant_value) Function:

* Takes a DataFrame, column name, and a constant value.
* Utilizes fillna() to replace missing values in the specified column with the provided constant.
* Returns a new DataFrame with the imputed values.

In [65]:
# Define the utility function for Constant Value Imputation

def constant_value_imputation(df, column, constant_value):
    """
    Imputes missing values in a specified column with a given constant value.

    Args:
        df (pd.DataFrame): The input DataFrame.
        column (str): The name of the column to impute.
        constant_value: The value to use for imputation.  Can be any data type
                        that the column can hold (e.g., 0, -99, 'Missing').

    Returns:
        pd.DataFrame: A new DataFrame with the missing values imputed.
    """
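    df_imputed = df.copy()  # Create a copy to avoid modifying the original DataFrame
    # Assign the result back instead of calling fillna(inplace=True) on a column
    # selection: the chained in-place form triggers warnings in recent pandas
    # and does not update the DataFrame when copy-on-write is enabled.
    df_imputed[column] = df_imputed[column].fillna(constant_value)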
    return df_imputed
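
The utility above handles one column per call. If several columns need different constants, one possible variation is to pass a dictionary to `DataFrame.fillna()`, which fills only the listed columns. The sketch below is an assumption-laden illustration: `constant_value_imputation_multi` and the small `sample` frame are hypothetical names introduced here, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def constant_value_imputation_multi(df, fill_values):
    """Hypothetical helper: fill several columns at once.

    fill_values maps column name -> constant used for that column.
    """
    unknown = set(fill_values) - set(df.columns)
    if unknown:
        raise KeyError(f"Columns not found: {sorted(unknown)}")
    # fillna with a dict returns a new DataFrame; columns not listed are untouched
    return df.fillna(value=fill_values)

# Small made-up frame reusing the notebook's column names
sample = pd.DataFrame({
    "Product_Type": ["Cap", None],
    "Purchase_Amount": [np.nan, 700.0],
    "Delivery_Time_Days": [1.0, np.nan],
})

filled = constant_value_imputation_multi(
    sample, {"Product_Type": "Unknown", "Purchase_Amount": 0}
)
print(filled)
```

Here `Delivery_Time_Days` keeps its NaN because it is not in the dictionary, and `sample` itself is left unchanged.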

# 4. Execution of the utility function

### A. Impute the missing values in 'Purchase_Amount' column with 0

In [74]:
# Impute 'Purchase_Amount' with 0

# The function already works on a copy, so `data` can be passed directly
data_imputed_purchase_0 = constant_value_imputation(data, 'Purchase_Amount', 0)
print("\nData after Constant Value Imputation (Purchase_Amount with 0):\n")
data_imputed_purchase_0


Data after Constant Value Imputation (Purchase_Amount with 0):



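|   | CustomerID | Product_Type | Purchase_Amount | Discount_Percentage | Delivery_Time_Days |
|---|---|---|---|---|---|
| 0 | 1 | T-shirt | 500.0 | 0.1 | 2.0 |
| 1 | 2 | Shorts | 1200.0 | 0.05 | 3.0 |
| 2 | 3 | Track Pants | 0.0 | 0.15 | 4.0 |
| 3 | 4 | T-shirt | 600.0 | NaN | 2.0 |
| 4 | 5 | None | 1500.0 | 0.03 | 5.0 |
| 5 | 6 | Joggers | 2000.0 | 0.0 | 6.0 |
| 6 | 7 | Cap | 100.0 | 0.2 | 1.0 |
| 7 | 8 | None | 700.0 | 0.12 | NaN |
| 8 | 9 | Shorts | 1300.0 | 0.07 | 4.0 |
| 9 | 10 | T-shirt | 0.0 | 0.5 | 3.0 |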


As you can see, both rows with NaN in Purchase_Amount (index 2 and 9) now hold 0.0. The other columns are untouched.

### B. Impute the missing values in 'Purchase_Amount' column with -99

In [75]:
# Impute 'Purchase_Amount' with -99
# The function already works on a copy, so `data` can be passed directly
data_imputed_purchase_neg99 = constant_value_imputation(data, 'Purchase_Amount', -99)
print("\nData after Constant Value Imputation (Purchase_Amount with -99):\n")
data_imputed_purchase_neg99


Data after Constant Value Imputation (Purchase_Amount with -99):



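|   | CustomerID | Product_Type | Purchase_Amount | Discount_Percentage | Delivery_Time_Days |
|---|---|---|---|---|---|
| 0 | 1 | T-shirt | 500.0 | 0.1 | 2.0 |
| 1 | 2 | Shorts | 1200.0 | 0.05 | 3.0 |
| 2 | 3 | Track Pants | -99.0 | 0.15 | 4.0 |
| 3 | 4 | T-shirt | 600.0 | NaN | 2.0 |
| 4 | 5 | None | 1500.0 | 0.03 | 5.0 |
| 5 | 6 | Joggers | 2000.0 | 0.0 | 6.0 |
| 6 | 7 | Cap | 100.0 | 0.2 | 1.0 |
| 7 | 8 | None | 700.0 | 0.12 | NaN |
| 8 | 9 | Shorts | 1300.0 | 0.07 | 4.0 |
| 9 | 10 | T-shirt | -99.0 | 0.5 | 3.0 |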


As you can see, both rows with NaN in Purchase_Amount (index 2 and 9) now hold -99.0. The other columns are untouched.
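
Once a sentinel such as -99 is written into the column, nothing in the data itself says which rows were originally missing. One common way to keep that information is to add a boolean indicator column before filling. The sketch below assumes a hypothetical helper `impute_with_indicator` and a `<column>_was_missing` naming scheme; neither appears in the original notebook.

```python
import numpy as np
import pandas as pd

def impute_with_indicator(df, column, constant_value):
    """Hypothetical helper: impute a constant and record which rows were filled."""
    out = df.copy()
    out[f"{column}_was_missing"] = out[column].isna()  # flag before filling
    out[column] = out[column].fillna(constant_value)
    return out

# Small made-up frame reusing the notebook's column name
sample = pd.DataFrame({"Purchase_Amount": [500.0, np.nan, 600.0]})
flagged = impute_with_indicator(sample, "Purchase_Amount", -99)

# Statistics can then be restricted to genuinely observed values
observed = flagged.loc[~flagged["Purchase_Amount_was_missing"], "Purchase_Amount"]
print(observed.mean())  # 550.0
```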

### C. Impute the missing values in 'Product_Type' column with 'Unknown'

In [76]:
# Impute 'Product_Type' with 'Unknown'

# The function already works on a copy, so `data` can be passed directly
data_imputed_product_unknown = constant_value_imputation(data, 'Product_Type', 'Unknown')
print("\nData after Constant Value Imputation (Product_Type with 'Unknown'):\n")
data_imputed_product_unknown


Data after Constant Value Imputation (Product_Type with 'Unknown'):



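|   | CustomerID | Product_Type | Purchase_Amount | Discount_Percentage | Delivery_Time_Days |
|---|---|---|---|---|---|
| 0 | 1 | T-shirt | 500.0 | 0.1 | 2.0 |
| 1 | 2 | Shorts | 1200.0 | 0.05 | 3.0 |
| 2 | 3 | Track Pants | NaN | 0.15 | 4.0 |
| 3 | 4 | T-shirt | 600.0 | NaN | 2.0 |
| 4 | 5 | Unknown | 1500.0 | 0.03 | 5.0 |
| 5 | 6 | Joggers | 2000.0 | 0.0 | 6.0 |
| 6 | 7 | Cap | 100.0 | 0.2 | 1.0 |
| 7 | 8 | Unknown | 700.0 | 0.12 | NaN |
| 8 | 9 | Shorts | 1300.0 | 0.07 | 4.0 |
| 9 | 10 | T-shirt | NaN | 0.5 | 3.0 |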


As you can see, both rows with None in Product_Type (index 4 and 7) now hold 'Unknown'. The other columns are untouched.
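
In this notebook Product_Type is a plain object column, so any string can be used as the fill value. If the column were instead stored with the pandas `category` dtype (an assumption for this sketch, not the case above), the fill value would first have to be declared as a category. A minimal illustration with a made-up Series:

```python
import pandas as pd

# Hypothetical categorical version of a product column
product = pd.Series(["T-shirt", None, "Cap"], dtype="category")

# Declare "Unknown" as a valid category, then fill the missing entries with it
product_filled = product.cat.add_categories(["Unknown"]).fillna("Unknown")

print(product_filled.tolist())  # ['T-shirt', 'Unknown', 'Cap']
```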

### D. Impute the missing values in 'Discount_Percentage' column with 0.15

In [77]:
# Impute 'Discount_Percentage' with 0.15

# The function already works on a copy, so `data` can be passed directly
data_imputed_discount_015 = constant_value_imputation(data, 'Discount_Percentage', 0.15)
print("\nData after Constant Value Imputation (Discount_Percentage with 0.15):\n")
data_imputed_discount_015


Data after Constant Value Imputation (Discount_Percentage with 0.15):



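|   | CustomerID | Product_Type | Purchase_Amount | Discount_Percentage | Delivery_Time_Days |
|---|---|---|---|---|---|
| 0 | 1 | T-shirt | 500.0 | 0.1 | 2.0 |
| 1 | 2 | Shorts | 1200.0 | 0.05 | 3.0 |
| 2 | 3 | Track Pants | NaN | 0.15 | 4.0 |
| 3 | 4 | T-shirt | 600.0 | 0.15 | 2.0 |
| 4 | 5 | None | 1500.0 | 0.03 | 5.0 |
| 5 | 6 | Joggers | 2000.0 | 0.0 | 6.0 |
| 6 | 7 | Cap | 100.0 | 0.2 | 1.0 |
| 7 | 8 | None | 700.0 | 0.12 | NaN |
| 8 | 9 | Shorts | 1300.0 | 0.07 | 4.0 |
| 9 | 10 | T-shirt | NaN | 0.5 | 3.0 |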


As you can see, the one row with NaN in Discount_Percentage (index 3) now holds 0.15. The other columns are untouched.

### E. Impute the missing values in 'Delivery_Time_Days' column with 7

In [78]:
# Impute 'Delivery_Time_Days' with 7

# The function already works on a copy, so `data` can be passed directly
data_imputed_delivery_7 = constant_value_imputation(data, 'Delivery_Time_Days', 7)
print("\nData after Constant Value Imputation (Delivery_Time_Days with 7):\n")
data_imputed_delivery_7


Data after Constant Value Imputation (Delivery_Time_Days with 7):



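|   | CustomerID | Product_Type | Purchase_Amount | Discount_Percentage | Delivery_Time_Days |
|---|---|---|---|---|---|
| 0 | 1 | T-shirt | 500.0 | 0.1 | 2.0 |
| 1 | 2 | Shorts | 1200.0 | 0.05 | 3.0 |
| 2 | 3 | Track Pants | NaN | 0.15 | 4.0 |
| 3 | 4 | T-shirt | 600.0 | NaN | 2.0 |
| 4 | 5 | None | 1500.0 | 0.03 | 5.0 |
| 5 | 6 | Joggers | 2000.0 | 0.0 | 6.0 |
| 6 | 7 | Cap | 100.0 | 0.2 | 1.0 |
| 7 | 8 | None | 700.0 | 0.12 | 7.0 |
| 8 | 9 | Shorts | 1300.0 | 0.07 | 4.0 |
| 9 | 10 | T-shirt | NaN | 0.5 | 3.0 |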


As you can see, the one row with NaN in Delivery_Time_Days (index 7) now holds 7.0. The other columns are untouched.