# Feature Engineering with `feature_extractor.py`

This notebook walks through the `TransactionFeatureExtractor` class from `src/features/feature_extractor.py`, showing how it derives structured features from parsed transaction data in four stages: basic, user behavioral, temporal, and contextual.

In [1]:
import sys
import os
sys.path.append('/Users/m1pro/Documents/GitHub/fraud_detection_system') # Adjust the path as necessary

# Import necessary libraries
import pandas as pd
from src.features.feature_extractor import TransactionFeatureExtractor
from src.parser.log_parser import TransactionLogParser
import logging


# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

pd.set_option('display.max_columns', None)


# Initialize the feature extractor
feature_extractor = TransactionFeatureExtractor()


INFO:src.features.feature_extractor:Feature extractor initialized


## Feature Extraction Process

The cells below run each extraction stage in turn on the parsed transaction data. Each stage takes the DataFrame returned by the previous stage as its input.

In [2]:
# # Load sample transaction logs
# logs = [
#     '2025-07-05 19:18:10::user1069::withdrawal::2995.12::London::iPhone 13',
#     'usr:user1076|cashout|€4821.85|Glasgow|2025-07-15 12:56:05|Pixel 6',
#     '2025-07-20 05:38:14 >> [user1034] did top-up - amt=€2191.06 - None // dev:iPhone 13',
#     '2025-07-23 15:57:12 | user: user1098 | txn: purchase of €2019.47 from Glasgow | device: None'
# ]

# parser = TransactionLogParser()
# parsed_transactions = [
#     parser.parse_log_entry(log)
#     for log in logs
# ]

# # Convert transactions to DataFrame
# data = [
#     {
#         'raw_log': t.raw_log,
#         'timestamp': t.timestamp,
#         'user_id': t.user_id,
#         'transaction_type': t.transaction_type,
#         'amount': t.amount,
#         'location': t.location,
#         'device': t.device,
#         'is_parsed': t.is_parsed
#     } for t in parsed_transactions
# ]
# df = pd.DataFrame(data)

# # Display the parsed DataFrame
# df

In [3]:
df = pd.read_csv('../results/parsed_transactions.csv')
df.head()

Unnamed: 0,raw_log,timestamp,user_id,transaction_type,amount,currency,location,device,is_parsed,parse_errors,amount_raw,hour,day_of_week,is_weekend
0,2025-07-05 19:18:10::user1069::withdrawal::299...,2025-07-05 19:18:10,user1069,withdrawal,2995.12,£,London,iPhone 13,True,,2995.12,19,5,True
1,usr:user1076|cashout|€4821.85|Glasgow|2025-07-...,2025-07-15 12:56:05,user1076,cashout,4195.01,€,Glasgow,Pixel 6,True,,4821.85,12,1,False
2,2025-07-20 05:38:14 >> [user1034] did top-up -...,2025-07-20 05:38:14,user1034,top,1906.22,€,,iPhone 13,True,,2191.06,5,6,True
3,2025-06-13 10:04:51 >> [user1068] did deposit ...,2025-06-13 10:04:51,user1068,deposit,1471.25,€,Glasgow,,True,,1691.09,10,4,False
4,2025-07-29 23:47:37 | user: user1014 | txn: de...,2025-07-29 23:47:37,user1014,deposit,3539.5,£,Glasgow,iPhone 13,True,,3539.5,23,1,False


### Basic Feature Extraction

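Basic feature extraction adds row-level columns to the parsed data: calendar fields (`month`, `day_of_month`, `hour_category`), amount transforms (`amount_log`, `amount_rounded`, `amount_category`), `has_*` presence flags with `*_filled` companions for currency, location, and device, plus `transaction_type_filled`, `device_brand`, and `transaction_group`.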

In [4]:
# Extract basic features
df_basic_features = feature_extractor.extract_basic_features(df)

# Display the dataframe with basic features
df_basic_features.head()

INFO:src.features.feature_extractor:Extracting basic features...
INFO:src.features.feature_extractor:Basic features extracted. Shape: (7774, 29)


Unnamed: 0,raw_log,timestamp,user_id,transaction_type,amount,currency,location,device,is_parsed,parse_errors,amount_raw,hour,day_of_week,is_weekend,month,day_of_month,hour_category,amount_log,amount_rounded,amount_category,has_currency,currency_filled,has_location,location_filled,has_device,device_filled,device_brand,transaction_type_filled,transaction_group
0,2025-07-05 19:18:10::user1069::withdrawal::299...,2025-07-05 19:18:10,user1069,withdrawal,2995.12,£,London,iPhone 13,True,,2995.12,19,5,True,7,5,evening,8.005073,3000.0,high,True,£,True,London,True,iPhone 13,iPhone,withdrawal,cash
1,usr:user1076|cashout|€4821.85|Glasgow|2025-07-...,2025-07-15 12:56:05,user1076,cashout,4195.01,€,Glasgow,Pixel 6,True,,4821.85,12,1,False,7,15,morning,8.341889,4200.0,very_high,True,€,True,Glasgow,True,Pixel 6,Pixel,cashout,cash
2,2025-07-20 05:38:14 >> [user1034] did top-up -...,2025-07-20 05:38:14,user1034,top,1906.22,€,,iPhone 13,True,,2191.06,5,6,True,7,20,night,7.553402,1910.0,medium,True,€,False,unknown,True,iPhone 13,iPhone,top,other
3,2025-06-13 10:04:51 >> [user1068] did deposit ...,2025-06-13 10:04:51,user1068,deposit,1471.25,€,Glasgow,,True,,1691.09,10,4,False,6,13,morning,7.294547,1470.0,medium,True,€,True,Glasgow,False,unknown,,deposit,account
4,2025-07-29 23:47:37 | user: user1014 | txn: de...,2025-07-29 23:47:37,user1014,deposit,3539.5,£,Glasgow,iPhone 13,True,,3539.5,23,1,False,7,29,evening,8.172023,3540.0,very_high,True,£,True,Glasgow,True,iPhone 13,iPhone,deposit,account


In [5]:
df_basic_features.columns

Index(['raw_log', 'timestamp', 'user_id', 'transaction_type', 'amount',
       'currency', 'location', 'device', 'is_parsed', 'parse_errors',
       'amount_raw', 'hour', 'day_of_week', 'is_weekend', 'month',
       'day_of_month', 'hour_category', 'amount_log', 'amount_rounded',
       'amount_category', 'has_currency', 'currency_filled', 'has_location',
       'location_filled', 'has_device', 'device_filled', 'device_brand',
       'transaction_type_filled', 'transaction_group'],
      dtype='object')
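
The notebook does not show how `extract_basic_features` computes these columns. The rows displayed above are consistent with `amount_log` being `log1p(amount)` and `amount_rounded` being the amount rounded to the nearest 10, so a minimal sketch under those assumptions might look like the following. The function name `sketch_basic_features` and the hour bin edges are hypothetical and are not taken from the module.

```python
import numpy as np
import pandas as pd


def sketch_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-creation of a few basic features (not the module's code)."""
    out = df.copy()

    # Consistent with the displayed rows: log1p(2995.12) is about 8.005073
    out["amount_log"] = np.log1p(out["amount"])

    # Consistent with the displayed rows: 2995.12 -> 3000.0, 1906.22 -> 1910.0
    out["amount_rounded"] = out["amount"].round(-1)

    # Presence flag plus a filled companion column for missing locations
    out["has_location"] = out["location"].notna()
    out["location_filled"] = out["location"].fillna("unknown")

    # ASSUMPTION: these bin edges are a guess that happens to agree with the
    # sample rows (5 -> night, 12 -> morning, 19 -> evening)
    out["hour_category"] = pd.cut(
        out["hour"],
        bins=[0, 6, 12, 18, 24],
        labels=["night", "morning", "afternoon", "evening"],
        include_lowest=True,
    ).astype(str)
    return out


sample = pd.DataFrame(
    {
        "amount": [2995.12, 1906.22],
        "hour": [19, 5],
        "location": ["London", None],
    }
)
features = sketch_basic_features(sample)
print(features)
```

Using `log1p` rather than `log` keeps the transform defined for zero amounts, which may be why the logged values sit slightly above the plain natural log.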

### User Behavioral Feature Extraction

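Next, we extract per-user aggregates and attach them to every transaction of that user. The 17 added columns, all prefixed `user_`, cover transaction counts, amount statistics, first and last transaction times, counts of unique locations, devices, types, and currencies, and ratios derived from them, such as `user_amount_cv` and `user_location_diversity`.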

In [6]:
# Extract user behavioral features
df_user_behavioral = feature_extractor.extract_user_behavioral_features(df_basic_features)

# Display the dataframe with user behavioral features
df_user_behavioral.head()

INFO:src.features.feature_extractor:Extracting user behavioral features...
INFO:src.features.feature_extractor:User behavioral features extracted. Added 17 features.


Unnamed: 0,raw_log,timestamp,user_id,transaction_type,amount,currency,location,device,is_parsed,parse_errors,amount_raw,hour,day_of_week,is_weekend,month,day_of_month,hour_category,amount_log,amount_rounded,amount_category,has_currency,currency_filled,has_location,location_filled,has_device,device_filled,device_brand,transaction_type_filled,transaction_group,user_tx_count,user_avg_amount,user_std_amount,user_min_amount,user_max_amount,user_total_amount,user_first_tx_time,user_last_tx_time,user_unique_locations,user_unique_devices,user_unique_types,user_unique_currencies,user_activity_days,user_tx_per_day,user_amount_cv,user_location_diversity,user_device_diversity
0,2025-07-05 19:18:10::user1069::withdrawal::299...,2025-07-05 19:18:10,user1069,withdrawal,2995.12,£,London,iPhone 13,True,,2995.12,19,5,True,7,5,evening,8.005073,3000.0,high,True,£,True,London,True,iPhone 13,iPhone,withdrawal,cash,114,2337.6471,1317.3738,54.35,4981.88,266491.77,2025-06-02 18:14:53,2025-07-31 05:02:25,7,12,9,3,58.449676,1.9176,0.5635,0.0614,0.1053
1,usr:user1076|cashout|€4821.85|Glasgow|2025-07-...,2025-07-15 12:56:05,user1076,cashout,4195.01,€,Glasgow,Pixel 6,True,,4821.85,12,1,False,7,15,morning,8.341889,4200.0,very_high,True,€,True,Glasgow,True,Pixel 6,Pixel,cashout,cash,61,2113.761,1399.0567,55.73,4811.62,128939.42,2025-06-03 09:15:05,2025-07-31 03:44:22,7,8,8,3,57.770336,1.0379,0.6619,0.1148,0.1311
2,2025-07-20 05:38:14 >> [user1034] did top-up -...,2025-07-20 05:38:14,user1034,top,1906.22,€,,iPhone 13,True,,2191.06,5,6,True,7,20,night,7.553402,1910.0,medium,True,€,False,unknown,True,iPhone 13,iPhone,top,other,153,2398.4243,1280.623,1.41,4703.05,366958.92,2025-06-01 18:51:07,2025-07-31 00:18:41,7,11,9,3,59.227477,2.5404,0.5339,0.0458,0.0719
3,2025-06-13 10:04:51 >> [user1068] did deposit ...,2025-06-13 10:04:51,user1068,deposit,1471.25,€,Glasgow,,True,,1691.09,10,4,False,6,13,morning,7.294547,1470.0,medium,True,€,True,Glasgow,False,unknown,,deposit,account,157,2278.1765,1361.1492,36.09,4952.36,357673.71,2025-06-01 23:52:01,2025-07-31 00:04:34,7,12,9,3,59.008715,2.6163,0.5975,0.0446,0.0764
4,2025-07-29 23:47:37 | user: user1014 | txn: de...,2025-07-29 23:47:37,user1014,deposit,3539.5,£,Glasgow,iPhone 13,True,,3539.5,23,1,False,7,29,evening,8.172023,3540.0,very_high,True,£,True,Glasgow,True,iPhone 13,iPhone,deposit,account,165,2344.4478,1433.3515,78.55,4946.81,386833.89,2025-06-01 12:48:18,2025-07-30 04:00:28,7,11,9,3,58.633449,2.7669,0.6114,0.0424,0.0667
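
The implementation of `extract_user_behavioral_features` is not shown in this notebook. The values above are consistent with `user_amount_cv` being standard deviation over mean, `user_location_diversity` being unique locations over transaction count, and `user_tx_per_day` being the count divided by `user_activity_days + 1`. A rough sketch under those assumptions, using a hypothetical helper `sketch_user_stats`, could be:

```python
import pandas as pd


def sketch_user_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-user aggregation (not the module's code)."""
    data = df.copy()
    data["timestamp"] = pd.to_datetime(data["timestamp"])

    # One row per user with named aggregations
    stats = (
        data.groupby("user_id")
        .agg(
            user_tx_count=("amount", "count"),
            user_avg_amount=("amount", "mean"),
            user_std_amount=("amount", "std"),
            user_first_tx_time=("timestamp", "min"),
            user_last_tx_time=("timestamp", "max"),
            user_unique_locations=("location", "nunique"),
        )
        .reset_index()
    )

    # Span between first and last transaction, in fractional days
    span = stats["user_last_tx_time"] - stats["user_first_tx_time"]
    stats["user_activity_days"] = span.dt.total_seconds() / 86400

    # ASSUMPTION: formulas inferred from the displayed values only
    stats["user_tx_per_day"] = stats["user_tx_count"] / (stats["user_activity_days"] + 1)
    stats["user_amount_cv"] = stats["user_std_amount"] / stats["user_avg_amount"]
    stats["user_location_diversity"] = (
        stats["user_unique_locations"] / stats["user_tx_count"]
    )

    # Broadcast the per-user values back onto every transaction row
    return data.merge(stats, on="user_id", how="left")


toy = pd.DataFrame(
    {
        "user_id": ["a", "a", "b"],
        "timestamp": ["2025-06-01 00:00:00", "2025-06-02 12:00:00", "2025-06-01 08:00:00"],
        "amount": [10.0, 30.0, 5.0],
        "location": ["London", "Glasgow", "London"],
    }
)
user_stats = sketch_user_stats(toy)
print(user_stats[["user_id", "user_tx_count", "user_tx_per_day", "user_amount_cv"]])
```

Users with a single transaction get a `NaN` standard deviation in this sketch; whether the module fills that value is not visible from the output shown.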


### Temporal Feature Extraction

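We extract temporal and sequence-based features that place each transaction within its user's history: the gaps to the previous and next transaction (`time_since_last_tx`, `time_until_next_tx`, in hours judging by the rows shown), a 24-hour velocity count, flags for location and device changes, a per-user amount z-score, and flags for unusual-hour and night transactions. The returned DataFrame is sorted by user and timestamp, which is why the index in the output below is no longer sequential.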

In [7]:
# Extract temporal features
df_temporal = feature_extractor.extract_temporal_features(df_user_behavioral)

# Display the dataframe with temporal features
df_temporal.head()

INFO:src.features.feature_extractor:Extracting temporal features...
INFO:src.features.feature_extractor:Temporal features extracted.


Unnamed: 0,raw_log,timestamp,user_id,transaction_type,amount,currency,location,device,is_parsed,parse_errors,amount_raw,hour,day_of_week,is_weekend,month,day_of_month,hour_category,amount_log,amount_rounded,amount_category,has_currency,currency_filled,has_location,location_filled,has_device,device_filled,device_brand,transaction_type_filled,transaction_group,user_tx_count,user_avg_amount,user_std_amount,user_min_amount,user_max_amount,user_total_amount,user_first_tx_time,user_last_tx_time,user_unique_locations,user_unique_devices,user_unique_types,user_unique_currencies,user_activity_days,user_tx_per_day,user_amount_cv,user_location_diversity,user_device_diversity,time_since_last_tx,time_until_next_tx,tx_velocity_24h,location_changed,device_changed,amount_zscore_user,is_unusual_hour,is_night_transaction
2132,2025-06-01 12:03:31 - user=user1000 - action=c...,2025-06-01 12:03:31,user1000,cashout,2235.91,$,London,Samsung Galaxy S10,True,,2981.21,12,6,True,6,1,morning,7.712851,2240.0,medium,True,$,True,London,True,Samsung Galaxy S10,Samsung,cashout,cash,74,2341.6985,1426.8656,4.82,4696.81,173285.69,2025-06-01 12:03:31,2025-07-31 09:41:19,7,10,9,3,59.90125,1.2151,0.6093,0.0946,0.1351,0.0,7.271944,0,1,1,-0.0741,0,0
3690,01/06/2025 19:19:50 ::: user1000 *** DEBIT :::...,2025-06-01 19:19:50,user1000,debit,1267.67,£,Manchester,Xiaomi Mi 11,True,,1267.67,19,6,True,6,1,evening,7.145724,1270.0,medium,True,£,True,Manchester,True,Xiaomi Mi 11,Xiaomi,debit,cash,74,2341.6985,1426.8656,4.82,4696.81,173285.69,2025-06-01 12:03:31,2025-07-31 09:41:19,7,10,9,3,59.90125,1.2151,0.6093,0.0946,0.1351,7.271944,24.548333,1,1,1,-0.7527,0,0
1147,2025-06-02 19:52:44 | user: user1000 | txn: re...,2025-06-02 19:52:44,user1000,refund,2708.01,$,Cardiff,Huawei P30,True,,3610.68,19,0,False,6,2,evening,7.904339,2710.0,high,True,$,True,Cardiff,True,Huawei P30,Huawei,refund,account,74,2341.6985,1426.8656,4.82,4696.81,173285.69,2025-06-01 12:03:31,2025-07-31 09:41:19,7,10,9,3,59.90125,1.2151,0.6093,0.0946,0.1351,24.548333,14.319167,0,1,1,0.2567,0,0
4128,2025-06-03 10:11:53 - user=user1000 - action=c...,2025-06-03 10:11:53,user1000,cashout,4659.06,£,Birmingham,Nokia 3310,True,,4659.06,10,1,False,6,3,morning,8.446784,4660.0,very_high,True,£,True,Birmingham,True,Nokia 3310,Nokia,cashout,cash,74,2341.6985,1426.8656,4.82,4696.81,173285.69,2025-06-01 12:03:31,2025-07-31 09:41:19,7,10,9,3,59.90125,1.2151,0.6093,0.0946,0.1351,14.319167,11.193611,1,1,1,1.6241,0,0
7176,2025-06-03 21:23:30 | user: user1000 | txn: ca...,2025-06-03 21:23:30,user1000,cashout,4063.97,£,Liverpool,,True,,4063.97,21,1,False,6,3,evening,8.310162,4060.0,very_high,True,£,True,Liverpool,False,unknown,,cashout,cash,74,2341.6985,1426.8656,4.82,4696.81,173285.69,2025-06-01 12:03:31,2025-07-31 09:41:19,7,10,9,3,59.90125,1.2151,0.6093,0.0946,0.1351,11.193611,18.785,1,1,1,1.207,0,0
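
How `extract_temporal_features` derives these columns is not shown here. A minimal sketch of the usual pandas approach, assuming a hypothetical helper `sketch_time_gaps`, sorts by user and timestamp and then uses grouped `diff` and `shift`. Flagging the first transaction of each user as a change is an assumption chosen to match the first row above.

```python
import pandas as pd


def sketch_time_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sequence features per user (not the module's code)."""
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    out = out.sort_values(["user_id", "timestamp"])

    # Hours since the same user's previous transaction; 0.0 for the first one
    gap = out.groupby("user_id")["timestamp"].diff()
    out["time_since_last_tx"] = gap.dt.total_seconds().div(3600).fillna(0.0)

    # 1 when the location differs from the user's previous transaction.
    # The first row of each user compares against NaN and is therefore 1.
    prev_location = out.groupby("user_id")["location"].shift()
    out["location_changed"] = (out["location"] != prev_location).astype(int)

    # Amount standardised against the same user's mean and sample std
    amounts = out.groupby("user_id")["amount"]
    out["amount_zscore_user"] = (
        out["amount"] - amounts.transform("mean")
    ) / amounts.transform("std")
    return out


# Two user1000 rows from the output above, deliberately out of order
toy = pd.DataFrame(
    {
        "user_id": ["user1000", "user1000"],
        "timestamp": ["2025-06-01 19:19:50", "2025-06-01 12:03:31"],
        "location": ["Manchester", "London"],
        "amount": [1267.67, 2235.91],
    }
)
gaps = sketch_time_gaps(toy)
print(gaps[["timestamp", "time_since_last_tx", "location_changed"]])
```

The z-score in this toy example uses only two rows, so it will not reproduce the notebook's values, which are computed over each user's full history.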


### Contextual Features and Interactions

Finally, we extract features that capture context and interactions across multiple dimensions, such as location and device combinations, and save the complete feature set to `../results/feature_store.csv`.

In [15]:
# Extract contextual features
df_contextual = feature_extractor.extract_contextual_features(df_temporal)

# Preview the contextual and interaction features
# (nothing renders here because this is not the cell's last expression)
df_contextual.head()

# Save the complete feature set
df_contextual.to_csv('../results/feature_store.csv', index=False)
print("💾 All features saved to results/feature_store.csv")

INFO:src.features.feature_extractor:Extracting contextual features...
INFO:src.features.feature_extractor:Contextual features extracted.


💾 All features saved to results/feature_store.csv
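
The contextual columns themselves are not displayed in this notebook, so the following is purely illustrative. It sketches one common way to build a location and device interaction feature from the `location_filled` and `device_filled` columns. The helper `sketch_location_device_interaction` and the three column names it creates are hypothetical and may not exist in the module's output.

```python
import pandas as pd


def sketch_location_device_interaction(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical interaction feature (not the module's code)."""
    out = df.copy()

    # Combine the two filled columns into a single categorical key
    out["location_device"] = out["location_filled"] + "|" + out["device_filled"]

    # How often this user has been seen with this exact combination
    out["user_location_device_count"] = out.groupby(
        ["user_id", "location_device"]
    )["location_filled"].transform("count")

    # Flag combinations that appear only once for the user
    out["is_rare_location_device"] = (
        out["user_location_device_count"] == 1
    ).astype(int)
    return out


toy = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u1"],
        "location_filled": ["London", "London", "Glasgow"],
        "device_filled": ["iPhone 13", "iPhone 13", "unknown"],
    }
)
combos = sketch_location_device_interaction(toy)
print(combos)
```

Building the key from the `*_filled` columns rather than the raw ones keeps missing locations or devices as an explicit `unknown` category instead of producing `NaN` keys.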


## Conclusion

This notebook ran the four extraction stages of `TransactionFeatureExtractor` in sequence (basic, user behavioral, temporal, and contextual) on 7,774 parsed transactions and saved the combined result to `results/feature_store.csv`.