# ScoreSight - Part 3: Feature Engineering

**Author:** Prathamesh Fuke  
**Branch:** Prathamesh_Fuke  
**Date:** October 28, 2025

## Objective
Create new features to improve model performance:
- **Match Prediction Features**: Team form, home/away advantage, head-to-head stats
- **Top Scorer Features**: Goals per game, assists ratio, historical performance
- **Points Tally Features**: Win rate, goal difference, consistency metrics

## 1. Import Libraries and Load Cleaned Data

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime
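import warnings
from IPython.display import display  # built into Jupyter; explicit import lets the cells also run as a script (requires IPython)
warnings.filterwarnings('ignore')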

pd.set_option('display.max_columns', None)
print("✓ Libraries imported")

In [None]:
# Load cleaned datasets
print("Loading cleaned datasets...")
match_data = pd.read_csv('data_cleaned_match.csv')
player_data = pd.read_csv('data_cleaned_player.csv')
league_data = pd.read_csv('data_cleaned_league.csv')
print(f"✓ Match data: {match_data.shape}")
print(f"✓ Player data: {player_data.shape}")
print(f"✓ League data: {league_data.shape}")

## 2. Feature Engineering for Match Prediction

In [None]:
print("="*80)
print("FEATURE ENGINEERING FOR MATCH PREDICTION")
print("="*80)

match_features = match_data.copy()
print(f"\nStarting with {match_features.shape[1]} columns")
print(f"Available columns: {list(match_features.columns)}")

In [None]:
# Display sample to understand structure
print("\nSample of match data:")
display(match_features.head())

In [None]:
# Planned match-prediction features (the implementation depends on the actual
# column names in data_cleaned_match.csv):
# - home_win_rate
# - away_win_rate
# - goal_difference
# - recent_form (last 5 matches)
# - head_to_head_record

print("\nCreating match prediction features...")
print("Note: Features will be customized based on actual data structure")
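The planned match features could look like the following minimal sketch. It assumes hypothetical column names (`date`, `home_team`, `away_team`, `home_goals`, `away_goals`) that must be mapped to the real ones, and it computes every rolling or cumulative statistic from *earlier* matches only (`shift(1)`), so a row never includes its own result, which would leak the target into the features.

```python
import numpy as np
import pandas as pd


def add_match_features(matches: pd.DataFrame, form_window: int = 5) -> pd.DataFrame:
    """Leakage-safe match features (sketch).

    ASSUMPTION: hypothetical columns date, home_team, away_team,
    home_goals, away_goals; rename to match data_cleaned_match.csv.
    """
    df = matches.copy()
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date", kind="stable").reset_index(drop=True)

    # Goal difference from the home side's perspective, plus a result label
    df["goal_difference"] = df["home_goals"] - df["away_goals"]
    df["result"] = np.sign(df["goal_difference"]).map({1: "H", 0: "D", -1: "A"})

    # Home/away win rate: share of a team's *earlier* home (or away) games it won.
    # shift(1) drops the current match so the feature never sees its own outcome.
    df["home_win"] = (df["goal_difference"] > 0).astype(int)
    df["away_win"] = (df["goal_difference"] < 0).astype(int)
    df["home_win_rate"] = df.groupby("home_team")["home_win"].transform(
        lambda s: s.shift(1).expanding().mean()
    )
    df["away_win_rate"] = df.groupby("away_team")["away_win"].transform(
        lambda s: s.shift(1).expanding().mean()
    )

    # Recent form: mean points (3 win / 1 draw / 0 loss) over each team's
    # previous `form_window` matches, home and away combined.
    gd = df["goal_difference"]
    home = pd.DataFrame({"match_idx": df.index, "date": df["date"], "side": "home",
                         "team": df["home_team"],
                         "points": np.where(gd > 0, 3, np.where(gd == 0, 1, 0))})
    away = pd.DataFrame({"match_idx": df.index, "date": df["date"], "side": "away",
                         "team": df["away_team"],
                         "points": np.where(gd < 0, 3, np.where(gd == 0, 1, 0))})
    team_rows = pd.concat([home, away], ignore_index=True).sort_values(["date", "match_idx"])
    team_rows["form"] = team_rows.groupby("team")["points"].transform(
        lambda s: s.shift(1).rolling(form_window, min_periods=1).mean()
    )
    form = team_rows.pivot(index="match_idx", columns="side", values="form")
    df["home_form"] = form["home"]
    df["away_form"] = form["away"]

    # Head-to-head: earlier meetings between the two clubs (either venue)
    # and how many of them the current home team won.
    meetings, wins = {}, {}
    h2h_n, h2h_home_wins = [], []
    for row in df.itertuples():
        pair = frozenset((row.home_team, row.away_team))
        h2h_n.append(meetings.get(pair, 0))
        h2h_home_wins.append(wins.get((pair, row.home_team), 0))
        meetings[pair] = meetings.get(pair, 0) + 1
        if row.goal_difference != 0:
            winner = row.home_team if row.goal_difference > 0 else row.away_team
            wins[(pair, winner)] = wins.get((pair, winner), 0) + 1
    df["h2h_meetings"] = h2h_n
    df["h2h_home_wins"] = h2h_home_wins
    # NaN when the clubs have never met before
    df["h2h_home_win_rate"] = df["h2h_home_wins"] / df["h2h_meetings"].replace(0, np.nan)
    return df
```

After renaming the relevant columns, something like `match_features = add_match_features(match_data)` would add these columns. The head-to-head step is a plain Python pass over the rows: simple and easy to check, though it could be vectorised for very large datasets.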

## 3. Feature Engineering for Player Data (Top Scorer)

In [None]:
print("="*80)
print("FEATURE ENGINEERING FOR TOP SCORER PREDICTION")
print("="*80)

player_features = player_data.copy()
print(f"\nStarting with {player_features.shape[1]} columns")
print(f"Available columns: {list(player_features.columns)}")

In [None]:
# Display sample
print("\nSample of player data:")
display(player_features.head())

In [None]:
# Create player performance features
print("\nCreating player performance features...")

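# Locate goal-, assist- and games-related columns by case-insensitive substring match.
# Note: substring matching also catches names such as 'goals_conceded' or 'goalkeeper'.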
goal_cols = [col for col in player_features.columns if 'goal' in col.lower()]
assist_cols = [col for col in player_features.columns if 'assist' in col.lower()]
games_cols = [col for col in player_features.columns if 'game' in col.lower() or 'match' in col.lower()]

print(f"Goal-related columns: {goal_cols}")
print(f"Assist-related columns: {assist_cols}")
print(f"Games-related columns: {games_cols}")
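Plain substring matching can pick up unrelated columns (`'goal'` also matches `goals_conceded` and `goalkeeper_saves`). A slightly stricter alternative, sketched here as a hypothetical helper `find_columns`, splits each name into lowercase tokens on non-alphanumeric characters and keeps a column only if one token equals a keyword and no token equals an excluded word. It assumes snake_case or space-separated names; CamelCase names such as `GoalsScored` would not be split.

```python
import re


def find_columns(columns, include, exclude=()):
    """Hypothetical helper: return columns whose name tokens hit `include`
    and miss `exclude` (comparison is case-insensitive)."""
    include = {k.lower() for k in include}
    exclude = {k.lower() for k in exclude}
    found = []
    for col in columns:
        # "goals_conceded" -> {"goals", "conceded"}
        tokens = set(re.split(r"[^a-z0-9]+", str(col).lower()))
        if (tokens & include) and not (tokens & exclude):
            found.append(col)
    return found
```

For example, `find_columns(player_features.columns, {"goal", "goals"}, exclude={"conceded", "against"})`.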

In [None]:
# Example features (will be customized):
# - goals_per_game
# - assists_per_game
# - goal_contribution (goals + assists)
# - scoring_efficiency
# - historical_average

print("\nPlayer features will be created based on available columns...")
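One way to build these player features is sketched below, assuming hypothetical columns `player`, `season`, `goals`, `assists`, `matches_played` and `shots` (one row per player per season). Reading `scoring_efficiency` as goals per shot and `historical_average` as mean goals per game over the player's previous seasons are assumptions, not definitions taken from the notebook.

```python
import numpy as np
import pandas as pd


def add_player_features(players: pd.DataFrame) -> pd.DataFrame:
    """Per-season player features (sketch).

    ASSUMPTION: hypothetical columns player, season, goals, assists,
    matches_played and shots; map them to the real names found above.
    """
    df = players.sort_values(["player", "season"]).copy()

    # Zero appearances/shots become NaN so the ratios are NaN instead of inf
    games = df["matches_played"].replace(0, np.nan)
    shots = df["shots"].replace(0, np.nan)

    df["goals_per_game"] = df["goals"] / games
    df["assists_per_game"] = df["assists"] / games
    df["goal_contribution"] = df["goals"] + df["assists"]
    df["scoring_efficiency"] = df["goals"] / shots  # goals per shot (conversion rate)

    # Historical average: mean goals per game over the player's *previous*
    # seasons only; shift(1) keeps the current season out of its own feature.
    df["historical_goals_per_game"] = df.groupby("player")["goals_per_game"].transform(
        lambda s: s.shift(1).expanding().mean()
    )
    return df
```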

## 4. Feature Engineering for League/Team Data

In [None]:
print("="*80)
print("FEATURE ENGINEERING FOR LEAGUE/TEAM DATA")
print("="*80)

league_features = league_data.copy()
print(f"\nStarting with {league_features.shape[1]} columns")
print(f"Available columns: {list(league_features.columns)}")

In [None]:
# Display sample
print("\nSample of league data:")
display(league_features.head())

In [None]:
# Create league/team features
print("\nCreating league/team features...")

# Example features:
# - points_per_game
# - win_percentage
# - goal_difference_per_game
# - consistency_score
# - championship_history

print("League features will be created based on available columns...")
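A minimal sketch of the league/team features, assuming a hypothetical one-row-per-team-per-season table with columns `team`, `season`, `played`, `won`, `goals_for`, `goals_against`, `points` and `position`. `consistency_score` has no fixed definition; here it is assumed to be the standard deviation of points per game over the previous seasons, and `championship_history` is approximated by the number of earlier titles (`prior_titles`).

```python
import numpy as np
import pandas as pd


def add_league_features(table: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Team-season features from a league table (sketch).

    ASSUMPTION: hypothetical columns team, season, played, won, goals_for,
    goals_against, points and position (final league position).
    """
    df = table.sort_values(["team", "season"]).copy()
    played = df["played"].replace(0, np.nan)  # avoid division by zero

    df["points_per_game"] = df["points"] / played
    df["win_percentage"] = 100 * df["won"] / played
    df["goal_difference_per_game"] = (df["goals_for"] - df["goals_against"]) / played

    # Consistency (one possible definition): std of points per game over the
    # team's previous `window` seasons. Lower = more consistent; needs at
    # least two earlier seasons, otherwise NaN.
    df["consistency_score"] = df.groupby("team")["points_per_game"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=2).std()
    )

    # Championship history: titles won *before* the current season
    df["is_champion"] = (df["position"] == 1).astype(int)
    df["prior_titles"] = df.groupby("team")["is_champion"].cumsum() - df["is_champion"]
    return df
```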

## 5. Create Aggregate Features

In [None]:
print("="*80)
print("CREATING AGGREGATE FEATURES")
print("="*80)

# Aggregate features combine information from multiple datasets
# Examples:
# - Team offensive strength (from match data)
# - Team defensive strength
# - Player team context (linking player to team performance)

print("\nAggregate features will be created after examining data relationships...")
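As a sketch of these aggregate features, team offensive and defensive strength can be expressed as goals scored and conceded per game relative to the league's average goals per team per match, then joined onto player rows to give each player team context. Column names (`home_team`, `away_team`, `home_goals`, `away_goals` for matches; `team`, `goals` for players) are hypothetical. These aggregates use every match in the data; for match prediction they would need to be computed from prior matches only, to avoid using information from the matches being predicted.

```python
import numpy as np
import pandas as pd


def team_strength(matches: pd.DataFrame) -> pd.DataFrame:
    """Attack/defence strength per team relative to the league scoring average.

    ASSUMPTION: hypothetical columns home_team, away_team, home_goals, away_goals.
    """
    home = matches[["home_team", "home_goals", "away_goals"]].rename(
        columns={"home_team": "team", "home_goals": "scored", "away_goals": "conceded"})
    away = matches[["away_team", "away_goals", "home_goals"]].rename(
        columns={"away_team": "team", "away_goals": "scored", "home_goals": "conceded"})
    rows = pd.concat([home, away], ignore_index=True)  # one row per team per match

    per_team = rows.groupby("team").agg(
        matches=("scored", "size"),
        scored_per_game=("scored", "mean"),
        conceded_per_game=("conceded", "mean"),
    )
    league_avg = rows["scored"].mean()  # average goals scored by one team in one match
    # attack > 1: scores more than average; defence > 1: concedes more (weaker defence)
    per_team["attack_strength"] = per_team["scored_per_game"] / league_avg
    per_team["defence_strength"] = per_team["conceded_per_game"] / league_avg
    return per_team.reset_index()


def add_team_context(players: pd.DataFrame, strength: pd.DataFrame) -> pd.DataFrame:
    """Attach team strength to each player row.

    ASSUMPTION: players has hypothetical `team` and `goals` columns.
    """
    out = players.merge(
        strength[["team", "attack_strength", "defence_strength"]],
        on="team", how="left", validate="many_to_one",
    )
    # Player's share of the goals scored by players of the same team (NaN if the team total is 0)
    team_goals = out.groupby("team")["goals"].transform("sum").replace(0, np.nan)
    out["team_goal_share"] = out["goals"] / team_goals
    return out
```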

## 6. Feature Summary

In [None]:
print("="*80)
print("FEATURE ENGINEERING SUMMARY")
print("="*80)

# Compare column counts before and after feature engineering for each dataset
for name, original, engineered in [
    ("Match", match_data, match_features),
    ("Player", player_data, player_features),
    ("League", league_data, league_features),
]:
    print(f"\n{name} Features:")
    print(f"  Original columns: {original.shape[1]}")
    print(f"  After engineering: {engineered.shape[1]}")
    print(f"  New features created: {engineered.shape[1] - original.shape[1]}")

## 7. Save Feature-Engineered Data

In [None]:
print("\nSaving feature-engineered datasets...")
match_features.to_csv('data_features_match.csv', index=False)
player_features.to_csv('data_features_player.csv', index=False)
league_features.to_csv('data_features_league.csv', index=False)
print("\n✓ All feature-engineered datasets saved!")
print("\n" + "="*80)
print("NOTEBOOK 03 COMPLETED - Ready for Encoding & Feature Selection")
print("="*80)