# 1. Time-Based Features: Time since an event, time intervals

Time-based features measure either the time elapsed since a particular event or the duration between two events. They expose the temporal dynamics of the data and are often highly predictive.

Types of Time-Based Features:

Time Since an Event:

Definition: This feature represents the amount of time that has passed between a specific point in the past and the current time (or another relevant timestamp).

Examples (Relating to Men's Sports Apparel E-commerce):

1. Time Since Last Purchase: For each customer, the number of days (or weeks, months) since their most recent order. This can indicate customer engagement and likelihood of future purchases.
2. Time Since Last Website Visit: For each user, the duration since their last activity on the website.
3. Time Since Product Listing: The age of a product listing on the platform (e.g., number of days since it was first added). Newer listings might get more initial attention.
4. Time Since Last Review: For a product, how long ago the most recent review was posted. Recent reviews might be more influential.
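
As a minimal sketch of a "time since an event" feature, the snippet below computes the age of a product listing in days. The `products` table, its column names, and the fixed `reference_time` are hypothetical values chosen for illustration; they are not part of the dataset used later in this section.

```python
import pandas as pd

# Hypothetical product catalogue; IDs and dates are illustrative only.
products = pd.DataFrame({
    'ProductID': ['P1', 'P2', 'P3'],
    'Listed_At': pd.to_datetime(['2025-01-10', '2025-03-01', '2025-03-17']),
})

# A fixed reference time keeps the result reproducible (assumed value).
reference_time = pd.Timestamp('2025-03-18')

# Timestamp minus a datetime column gives a Timedelta column;
# .dt.days extracts the number of whole days.
products['Listing_Age_Days'] = (reference_time - products['Listed_At']).dt.days
print(products)
```

Using a fixed reference time instead of the wall clock means the feature values do not change every time the code is rerun.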

Time Intervals (Durations):

Definition: This feature represents the duration between two specific timestamps.

Examples:

1. Order Processing Time: The time difference between when an order was placed and when it was shipped. Longer processing times might indicate logistical issues.
2. Session Duration: The time elapsed between a user's login and logout on the website. Longer sessions might indicate higher engagement.
3. Time to Conversion: The time difference between a user's first visit to the website and their first purchase.
4. Event Recurrence Interval: The time between successive occurrences of the same event (e.g., the time between a customer's repeat purchases).
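
When both timestamps sit on the same row, an interval feature is a simple column subtraction. The sketch below illustrates order processing time; the `orders` table and the `Placed_At` / `Shipped_At` column names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical order data; column names and values are illustrative only.
orders = pd.DataFrame({
    'OrderID': [101, 102, 103],
    'Placed_At': pd.to_datetime([
        '2025-03-01 10:00:00',
        '2025-03-02 09:30:00',
        '2025-03-04 18:45:00',
    ]),
    'Shipped_At': pd.to_datetime([
        '2025-03-02 16:00:00',
        '2025-03-02 15:30:00',
        '2025-03-07 06:45:00',
    ]),
})

# Subtracting two datetime columns yields a Timedelta column.
processing_time = orders['Shipped_At'] - orders['Placed_At']

# total_seconds() keeps the full duration, including whole days.
orders['Processing_Hours'] = processing_time.dt.total_seconds() / 3600
print(orders[['OrderID', 'Processing_Hours']])
```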

Why Create Time-Based Features?

1. Capture Temporal Decay: The impact of past events often diminishes over time. "Time Since Last Purchase" captures this decay in customer engagement.
2. Understand Behavioral Patterns: Time intervals can reveal patterns in user behavior, such as how frequently they make purchases or how long they engage with the platform.
3. Identify Trends and Seasonality: Tracking the age of listings or the time since certain events can help identify trends and seasonal effects.
4. Improve Prediction Accuracy: These features can add predictive power to models by incorporating the temporal dimension of the data. For example, a customer who purchased recently is often more likely to purchase again soon.
5. Feature Engineering for Specific Tasks: Time-based features are crucial for tasks like churn prediction (time since last activity), fraud detection (time between transactions), and recommendation systems (recency of interactions).
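
One possible way to make the temporal-decay idea explicit is to turn a recency value into a score that halves after a fixed number of days. This is only a sketch: the exponential form and the 10-day half-life are assumptions chosen for illustration, not something prescribed by this section, and the recency values are hard-coded.

```python
import pandas as pd

# Days since last purchase per customer (illustrative values, indexed by CustomerID).
days_since_purchase = pd.Series([5, 20, 2], index=[1, 2, 3],
                                name='Time_Since_Last_Purchase')

# Assumed half-life: the score halves every 10 days. Tune this for the problem.
half_life_days = 10

# Exponential decay: 1.0 for a purchase made right now, approaching 0 as time passes.
recency_score = 0.5 ** (days_since_purchase / half_life_days)
print(recency_score.round(3))
```

A score like this keeps recent customers close to 1 and compresses very old activity toward 0, which some models handle better than a raw day count.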

# Import necessary dependencies

In [13]:
import pandas as pd
from datetime import datetime

# Create sample dataset

In [14]:
# Sample DataFrame (representing customer interactions)
data = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 1, 2, 3, 4],
    'Event': ['Purchase', 'Visit', 'Purchase', 'Visit', 'Purchase', 'Visit', 'Purchase', 'Visit'],
    'Timestamp': pd.to_datetime([
        '2025-03-01 10:00:00',
        '2025-03-05 14:30:00',
        '2025-02-25 09:15:00',
        '2025-03-03 18:00:00',
        '2025-03-12 11:45:00',
        '2025-03-08 20:00:00',
        '2025-03-15 16:20:00',
        '2025-03-17 08:00:00'
    ])
})

current_time = datetime(2025, 3, 18, 7, 0, 0)
data

|   | CustomerID | Event | Timestamp |
|---|---|---|---|
| 0 | 1 | Purchase | 2025-03-01 10:00:00 |
| 1 | 1 | Visit | 2025-03-05 14:30:00 |
| 2 | 2 | Purchase | 2025-02-25 09:15:00 |
| 3 | 3 | Visit | 2025-03-03 18:00:00 |
| 4 | 1 | Purchase | 2025-03-12 11:45:00 |
| 5 | 2 | Visit | 2025-03-08 20:00:00 |
| 6 | 3 | Purchase | 2025-03-15 16:20:00 |
| 7 | 4 | Visit | 2025-03-17 08:00:00 |


In [15]:
# 1. Time Since Last Purchase

last_purchase_time = data[data['Event'] == 'Purchase'].groupby('CustomerID')['Timestamp'].max().reset_index()
last_purchase_time['Time_Since_Last_Purchase'] = (current_time - last_purchase_time['Timestamp']).dt.days
data = pd.merge(data, last_purchase_time[['CustomerID', 'Time_Since_Last_Purchase']], on='CustomerID', how='left')
print("Data with Time Since Last Purchase (in days):")
data

Data with Time Since Last Purchase (in days):


|   | CustomerID | Event | Timestamp | Time_Since_Last_Purchase |
|---|---|---|---|---|
| 0 | 1 | Purchase | 2025-03-01 10:00:00 | 5.0 |
| 1 | 1 | Visit | 2025-03-05 14:30:00 | 5.0 |
| 2 | 2 | Purchase | 2025-02-25 09:15:00 | 20.0 |
| 3 | 3 | Visit | 2025-03-03 18:00:00 | 2.0 |
| 4 | 1 | Purchase | 2025-03-12 11:45:00 | 5.0 |
| 5 | 2 | Visit | 2025-03-08 20:00:00 | 20.0 |
| 6 | 3 | Purchase | 2025-03-15 16:20:00 | 2.0 |
| 7 | 4 | Visit | 2025-03-17 08:00:00 | NaN |


CustomerID 4 has only a 'Visit' event and no 'Purchase' event in the dataset, so filtering on data['Event'] == 'Purchase' leaves this customer out of the last_purchase_time DataFrame.

When last_purchase_time is merged back into data with a left merge on CustomerID, there is no matching row for CustomerID 4, so its Time_Since_Last_Purchase is filled with NaN (Not a Number), a missing value indicating that no purchase is recorded for that customer. The NaN is also why the column is shown as floats (5.0, 20.0, 2.0) rather than integers: a default integer column cannot hold NaN, so pandas converts it to float.
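
Many models cannot consume NaN directly, so the missing recency usually needs some handling. The sketch below shows one option among several: keep an explicit "has purchased" flag and, optionally, fill the gap with a sentinel. The `Has_Purchased` column name and the sentinel value -1 are assumptions for illustration, and the recency values are hard-coded to match the output above.

```python
import pandas as pd

# One row per customer; NaN marks a customer with no recorded purchase.
recency = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Time_Since_Last_Purchase': [5.0, 20.0, 2.0, float('nan')],
})

# Explicit flag so a model can tell "never purchased" apart from "purchased long ago".
recency['Has_Purchased'] = recency['Time_Since_Last_Purchase'].notna().astype(int)

# Optional sentinel fill; -1 is an arbitrary choice that cannot collide with a real day count.
recency['Time_Since_Last_Purchase_Filled'] = recency['Time_Since_Last_Purchase'].fillna(-1)
print(recency)
```

Whether to fill, flag, or leave the NaN depends on the model: tree-based libraries often accept missing values, while linear models generally do not.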

In [12]:
# 2. Time Interval: Time between consecutive visits (for Customer 1)

customer_1_visits = data[(data['CustomerID'] == 1) & (data['Event'] == 'Visit')].sort_values(by='Timestamp')
# total_seconds() keeps whole days; .dt.seconds would return only the sub-day remainder
customer_1_visits['Time_Since_Last_Visit'] = customer_1_visits['Timestamp'].diff().dt.total_seconds() / 3600  # in hours
print("\nCustomer 1 Visits with Time Since Last Visit (in hours):")
customer_1_visits


Customer 1 Visits with Time Since Last Visit (in hours):


|   | CustomerID | Event | Timestamp | Time_Since_Last_Purchase | Time_Since_Last_Visit |
|---|---|---|---|---|---|
| 1 | 1 | Visit | 2025-03-05 14:30:00 | 5.0 | NaN |
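
Customer 1 has only one 'Visit' row in the sample data, so diff() has no earlier visit to subtract and the result is NaN. To see non-empty intervals, the sketch below computes the gap between consecutive events of any type for every customer at once, using groupby with diff. The `events` frame re-creates the sample data, and the `Hours_Since_Previous_Event` column name is an assumption for this example. Note that `.dt.seconds` returns only the seconds component of each Timedelta and ignores whole days, which is why `total_seconds()` is used here.

```python
import pandas as pd

# Same sample interactions as above, rebuilt so the snippet is self-contained.
events = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 1, 2, 3, 4],
    'Event': ['Purchase', 'Visit', 'Purchase', 'Visit',
              'Purchase', 'Visit', 'Purchase', 'Visit'],
    'Timestamp': pd.to_datetime([
        '2025-03-01 10:00:00', '2025-03-05 14:30:00',
        '2025-02-25 09:15:00', '2025-03-03 18:00:00',
        '2025-03-12 11:45:00', '2025-03-08 20:00:00',
        '2025-03-15 16:20:00', '2025-03-17 08:00:00',
    ]),
})

# Sort so that diff() compares each event with the same customer's previous one.
events = events.sort_values(['CustomerID', 'Timestamp'])

# Per-customer gap; the first event of each customer has no predecessor (NaT).
gap = events.groupby('CustomerID')['Timestamp'].diff()

# Full duration in hours, whole days included.
events['Hours_Since_Previous_Event'] = gap.dt.total_seconds() / 3600

# For comparison: the seconds component alone drops the whole days.
events['Seconds_Component_Hours'] = gap.dt.seconds / 3600
print(events)
```

For Customer 1's visit on 2025-03-05, the full gap since the previous event is 100.5 hours, while the seconds component alone gives 4.5 hours because the four whole days are discarded.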


Further Time-Based Feature Ideas:

1. Time Since Last Viewed Product: If you track product views, the time since a user last viewed a specific item or category could be a strong indicator of interest.
2. Time Between Adding to Cart and Purchase: The duration between a user adding an item to their cart and completing the purchase. Shorter intervals might indicate higher purchase intent.
3. Time Since Last Discount on a Product: How long it has been since a specific sports apparel item was offered at a discount. This could influence a customer's decision to buy now or wait.
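
As a rough sketch of the cart-to-purchase interval, the snippet below starts from an event log with one row per event, finds the first timestamp of each event type per customer, and subtracts them. The `cart_events` data and the 'AddToCart' event label are hypothetical; the sample dataset in this section contains only 'Purchase' and 'Visit' events.

```python
import pandas as pd

# Hypothetical event log; customer 3 adds to cart but never purchases.
cart_events = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2, 3],
    'Event': ['AddToCart', 'Purchase', 'AddToCart', 'Purchase', 'AddToCart'],
    'Timestamp': pd.to_datetime([
        '2025-03-10 09:00:00',
        '2025-03-10 09:20:00',
        '2025-03-11 20:00:00',
        '2025-03-13 08:00:00',
        '2025-03-12 12:00:00',
    ]),
})

# Earliest timestamp per customer and event type, reshaped to one column per event type.
first_times = (
    cart_events.groupby(['CustomerID', 'Event'])['Timestamp']
    .min()
    .unstack('Event')
)

# Duration in minutes; customers without a purchase get NaN.
delta = first_times['Purchase'] - first_times['AddToCart']
first_times['Cart_To_Purchase_Minutes'] = delta.dt.total_seconds() / 60
print(first_times)
```

This simplified version assumes at most one cart-to-purchase cycle per customer; real logs would typically need a session or order key to pair the two events.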

By creating these time-based features, you can enrich your dataset with temporal information that can significantly improve the performance of your predictive models. Remember to choose the appropriate time units (seconds, minutes, hours, days, etc.) based on the context of your problem.
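
To illustrate the choice of units, the sketch below expresses the same durations in several units. The two durations are arbitrary examples. Dividing a Timedelta column by `pd.Timedelta(...)` gives a fractional count of that unit, whereas `.dt.days` truncates to whole days.

```python
import pandas as pd

# Two example durations (arbitrary values).
elapsed = pd.Series(pd.to_timedelta(['5 days 19:15:00', '0 days 02:30:00']))

units = pd.DataFrame({
    'whole_days': elapsed.dt.days,                      # truncated to whole days
    'fractional_days': elapsed / pd.Timedelta(days=1),  # keeps the sub-day part
    'hours': elapsed / pd.Timedelta(hours=1),
    'minutes': elapsed.dt.total_seconds() / 60,
})
print(units)
```

Whole days suit slow-moving signals such as purchase recency, while hours or minutes are usually a better fit for session-level behavior.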