# 1. Domain-Specific Feature Creation

Domain-specific feature creation uses your understanding of the domain or business problem to derive new numerical features that are likely to be relevant and informative for your model. These features go beyond the raw data and represent concepts or metrics that are meaningful within that specific context.

How Domain-Specific Features are Created:

The creation process relies heavily on expert knowledge, business understanding, and intuition about what factors might influence the target variable. It often involves:

1. Identifying Key Concepts and Relationships: Understanding the core entities, processes, and relationships within the domain.
2. Formulating Hypotheses: Based on your understanding, hypothesizing which derived metrics or combinations of existing data could be predictive.
3. Defining and Calculating New Features: Translating these hypotheses into concrete numerical features by performing calculations on the available data.

Examples (Relating to E-commerce and the "Men's Sports Apparel" context in ABC city):

Let's consider the e-commerce scenario of selling men's sports apparel and think about features a domain expert might suggest:

Time Since Last Purchase (for a customer):

1. Domain Expertise: Customers who recently made a purchase might be more likely to make another one or respond to promotions.
2. Calculation: If you have customer purchase history, calculate the difference between a reference date (the current date or the last data collection date) and the date of the customer's most recent purchase. The result is a numerical feature representing recency (e.g., in days or weeks).

Average Purchase Frequency (for a customer):

1. Domain Expertise: Customers who buy frequently might be high-value or loyal customers.
2. Calculation: Divide the total number of orders a customer has placed by the time elapsed since their first purchase. This gives a rate of purchase.

Average Time Between Purchases (for a customer):

1. Domain Expertise: Understanding the typical interval between a customer's purchases can help predict when they might be due for another one.
2. Calculation: Calculate the time difference between consecutive orders for each customer and then average these differences.
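
The calculation above can be sketched with pandas as follows. This is a minimal illustration, not a definitive implementation: the `orders` DataFrame, its values, and the output name `Avg_Days_Between_Purchases` are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical purchase log; data and column names are assumptions for illustration.
orders = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "PurchaseDate": pd.to_datetime([
        "2025-03-01", "2025-03-10", "2025-03-15",
        "2025-02-20", "2025-03-12", "2025-03-05",
    ]),
})

# Sort so that diff() compares each order with the same customer's previous order.
orders = orders.sort_values(["CustomerID", "PurchaseDate"])
orders["Days_Since_Prev"] = (
    orders.groupby("CustomerID")["PurchaseDate"].diff().dt.days
)

# Mean gap per customer; a customer with a single order has no gap, so the result is NaN.
avg_gap = (
    orders.groupby("CustomerID")["Days_Since_Prev"]
    .mean()
    .rename("Avg_Days_Between_Purchases")
)
print(avg_gap)
```

Customers with only one order get a missing value here; whether to impute it or leave it missing is a modeling choice that depends on the downstream model.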

Discount Velocity (for a product):

1. Domain Expertise: How quickly a discount is applied or how frequently a product goes on sale might indicate its demand or seasonality.
2. Calculation: If you have a history of product prices and discounts over time, you could calculate the rate at which discounts are introduced or the average duration between discount periods.
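
One possible way to express this, assuming the discount history has already been reduced to one row per discount period, is sketched below. The `discounts` DataFrame, the 90-day `observation_days` window, and all column names are hypothetical.

```python
import pandas as pd

# Hypothetical discount log: one row per discount period start (assumed layout).
discounts = pd.DataFrame({
    "ProductID": ["A", "A", "A", "B", "B"],
    "DiscountStart": pd.to_datetime([
        "2025-01-01", "2025-01-31", "2025-03-02",
        "2025-01-15", "2025-03-16",
    ]),
})
observation_days = 90  # assumed length of the observation window, in days

# Gap in days between consecutive discount starts for the same product.
discounts = discounts.sort_values(["ProductID", "DiscountStart"])
discounts["Gap_Days"] = (
    discounts.groupby("ProductID")["DiscountStart"].diff().dt.days
)

discount_velocity = discounts.groupby("ProductID").agg(
    Discount_Count=("DiscountStart", "count"),
    Avg_Days_Between_Discounts=("Gap_Days", "mean"),
)
# Rate at which discounts are introduced, normalized to a 30-day period.
discount_velocity["Discounts_Per_30_Days"] = (
    discount_velocity["Discount_Count"] / observation_days * 30
)
print(discount_velocity)
```

Both the rate and the average gap are computed because the text allows either reading of "velocity"; in practice you would likely keep whichever one carries more signal.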

Price Relative to Category Average:

1. Domain Expertise: Customers might be sensitive to whether a product is priced above or below the average for its category.
2. Calculation: For each product, calculate the difference or ratio between its price and the average price of all products within the same category (e.g., "Shorts," "Track Pants").
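
As a small sketch of this calculation, `groupby(...).transform("mean")` broadcasts each category's average price back onto its products, so both the difference and the ratio are one-line operations. The `products` DataFrame and its prices are invented for the example.

```python
import pandas as pd

# Hypothetical product catalog; prices and column names are assumptions.
products = pd.DataFrame({
    "ProductID": ["P1", "P2", "P3", "P4"],
    "Category": ["Shorts", "Shorts", "Track Pants", "Track Pants"],
    "Price": [800.0, 1200.0, 1500.0, 2500.0],
})

# Average price of the product's category, aligned row by row with the catalog.
category_avg = products.groupby("Category")["Price"].transform("mean")

products["Price_Diff_From_Category_Avg"] = products["Price"] - category_avg
products["Price_Ratio_To_Category_Avg"] = products["Price"] / category_avg
print(products)
```

The ratio is scale-free, which tends to make it easier to compare across categories with very different price levels; the difference keeps the currency units.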

Stockout Frequency (for a product):

1. Domain Expertise: Products that frequently go out of stock might be high in demand or have supply chain issues, which could affect customer experience and future purchases.
2. Calculation: Track the number of times a product has been marked as "out of stock" over a given period.
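
A rough sketch, assuming you have daily stock snapshots with a 1/0 `InStock` flag (a hypothetical layout), is to count the transitions from in stock to out of stock for each product:

```python
import pandas as pd

# Hypothetical daily inventory snapshots; InStock is 1 (available) or 0 (out of stock).
inventory = pd.DataFrame({
    "ProductID": ["A"] * 6 + ["B"] * 6,
    "Date": list(pd.date_range("2025-03-01", periods=6)) * 2,
    "InStock": [1, 0, 0, 1, 0, 1,
                1, 1, 1, 1, 0, 0],
})

inventory = inventory.sort_values(["ProductID", "Date"])

# A day-over-day change of -1 means the product went from in stock to out of stock.
inventory["Stockout_Start"] = (
    inventory.groupby("ProductID")["InStock"].diff().eq(-1)
)

# Number of distinct stockout events per product in the observed period.
stockout_frequency = (
    inventory.groupby("ProductID")["Stockout_Start"]
    .sum()
    .rename("Stockout_Count")
)
print(stockout_frequency)
```

Counting transitions rather than out-of-stock days treats a long stockout as a single event. A product that is already out of stock on the first observed day is not counted by this sketch.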

Number of Related Items in Cart (for a customer session):

1. Domain Expertise: Customers buying multiple related sports apparel items (e.g., shorts and a t-shirt) might have a higher purchase intent or value.
2. Calculation: During a customer's current browsing session or at checkout, count the number of items in their cart that belong to related categories.
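
The count can be sketched as below. Which categories count as "related" is a business decision; the `related_categories` set, the `cart_items` data, and the column names here are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical cart contents, one row per item in a session's cart.
cart_items = pd.DataFrame({
    "SessionID": ["s1", "s1", "s1", "s2", "s2"],
    "Category": ["Shorts", "T-Shirts", "Headphones", "Track Pants", "Water Bottle"],
})

# Assumed definition of related sports apparel categories.
related_categories = {"Shorts", "T-Shirts", "Track Pants"}

# Flag items in a related category, then count the flags per session.
cart_items["Is_Related"] = cart_items["Category"].isin(related_categories)
related_counts = (
    cart_items.groupby("SessionID")["Is_Related"]
    .sum()
    .rename("Num_Related_Items_In_Cart")
)
print(related_counts)
```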

Click-Through Rate on Similar Items (for a product):

1. Domain Expertise: How often customers click on a particular product compared to other similar items shown alongside it can indicate its attractiveness.
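2. Calculation: Track impressions and clicks for product recommendations or search results, then divide clicks by impressions to get the product's click-through rate. Comparing this rate with that of the similar items shown alongside it gives a relative measure of attractiveness.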
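
A minimal sketch of the click-through rate follows, assuming impressions and clicks are already aggregated per product and that the three products were shown alongside each other. The `events` DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical aggregated counts for products shown in the same recommendation slot.
events = pd.DataFrame({
    "ProductID": ["A", "B", "C"],
    "Impressions": [1000, 500, 0],
    "Clicks": [50, 10, 0],
})

# Mask zero impressions so never-shown products get NaN instead of a division by zero.
shown = events["Impressions"].where(events["Impressions"] > 0)
events["CTR"] = events["Clicks"] / shown

# CTR relative to the average CTR of the items shown alongside (NaN values are skipped).
events["CTR_Relative_To_Similar"] = events["CTR"] / events["CTR"].mean()
print(events)
```

A relative value above 1 suggests the product attracts more clicks than its neighbors. With few impressions the rate is noisy, so a minimum-impressions threshold may be worth considering.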

How Domain-Specific Features Help:

1. Capture Underlying Business Logic: These features encode knowledge about how the business operates and what factors are likely to drive the outcome.
2. Improve Model Interpretability: Features based on domain understanding are often easier to interpret and explain to stakeholders.
3. Increase Predictive Power: By focusing on relevant concepts, these features can provide strong signals to the model, which can improve accuracy.
4. Handle Specific Business Scenarios: They can be tailored to address particular business questions or challenges.

# Import necessary dependencies

In [160]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Create sample dataset

In [161]:
# Sample DataFrame with customer purchase data
purchase_data = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 1, 2, 3, 4],
    'PurchaseDate': [
        datetime(2025, 3, 1), datetime(2025, 3, 10), datetime(2025, 2, 20), datetime(2025, 3, 5),
        datetime(2025, 3, 15), datetime(2025, 3, 12), datetime(2025, 3, 18), datetime(2025, 3, 16)
    ],
    'AmountSpent': [1500, 800, 2200, 500, 1200, 1800, 700, 2500]
})

current_date = datetime(2025, 3, 18)
purchase_data

| | CustomerID | PurchaseDate | AmountSpent |
|---|---|---|---|
| 0 | 1 | 2025-03-01 | 1500 |
| 1 | 1 | 2025-03-10 | 800 |
| 2 | 2 | 2025-02-20 | 2200 |
| 3 | 3 | 2025-03-05 | 500 |
| 4 | 1 | 2025-03-15 | 1200 |
| 5 | 2 | 2025-03-12 | 1800 |
| 6 | 3 | 2025-03-18 | 700 |
| 7 | 4 | 2025-03-16 | 2500 |


In [162]:
# Feature: Time Since Last Purchase (recency, in days)

# Most recent purchase date for each customer
last_purchase_date = purchase_data.groupby('CustomerID')['PurchaseDate'].max().reset_index()
# Days elapsed between the reference date and that most recent purchase
last_purchase_date['Time_Since_Last_Purchase'] = (current_date - last_purchase_date['PurchaseDate']).dt.days
# Attach the per-customer value to every purchase row
purchase_data = pd.merge(purchase_data, last_purchase_date[['CustomerID', 'Time_Since_Last_Purchase']], on='CustomerID', how='left')
print("Data with Time Since Last Purchase:")
purchase_data

Data with Time Since Last Purchase:


| | CustomerID | PurchaseDate | AmountSpent | Time_Since_Last_Purchase |
|---|---|---|---|---|
| 0 | 1 | 2025-03-01 | 1500 | 3 |
| 1 | 1 | 2025-03-10 | 800 | 3 |
| 2 | 2 | 2025-02-20 | 2200 | 6 |
| 3 | 3 | 2025-03-05 | 500 | 0 |
| 4 | 1 | 2025-03-15 | 1200 | 3 |
| 5 | 2 | 2025-03-12 | 1800 | 6 |
| 6 | 3 | 2025-03-18 | 700 | 0 |
| 7 | 4 | 2025-03-16 | 2500 | 2 |


In [163]:
# Feature: Average Purchase Frequency (simplified, in orders per day)

# Date of each customer's first purchase
first_purchase_date = purchase_data.groupby('CustomerID')['PurchaseDate'].min().reset_index()
# Total number of orders per customer
purchase_counts = purchase_data.groupby('CustomerID').size().reset_index(name='Number_of_Orders')
customer_history = pd.merge(first_purchase_date, purchase_counts, on='CustomerID')
# Days since the first purchase; add 1 to avoid division by zero for customers whose first purchase falls on current_date
customer_history['Time_Difference'] = (current_date - customer_history['PurchaseDate']).dt.days + 1
# Orders per day over the period since the first purchase
customer_history['Avg_Purchase_Frequency'] = customer_history['Number_of_Orders'] / customer_history['Time_Difference']
# Attach the per-customer value to every purchase row
purchase_data = pd.merge(purchase_data, customer_history[['CustomerID', 'Avg_Purchase_Frequency']], on='CustomerID', how='left')
print("\nData with Average Purchase Frequency:")
purchase_data


Data with Average Purchase Frequency:


| | CustomerID | PurchaseDate | AmountSpent | Time_Since_Last_Purchase | Avg_Purchase_Frequency |
|---|---|---|---|---|---|
| 0 | 1 | 2025-03-01 | 1500 | 3 | 0.166667 |
| 1 | 1 | 2025-03-10 | 800 | 3 | 0.166667 |
| 2 | 2 | 2025-02-20 | 2200 | 6 | 0.074074 |
| 3 | 3 | 2025-03-05 | 500 | 0 | 0.142857 |
| 4 | 1 | 2025-03-15 | 1200 | 3 | 0.166667 |
| 5 | 2 | 2025-03-12 | 1800 | 6 | 0.074074 |
| 6 | 3 | 2025-03-18 | 700 | 0 | 0.142857 |
| 7 | 4 | 2025-03-16 | 2500 | 2 | 0.333333 |


The code above calculates "Time Since Last Purchase" and a simplified "Average Purchase Frequency" from customer purchase history, two examples of domain-specific features. The actual implementation will depend on the data you have and the domain you are working in.