# Part 3: Win Rate Driver Analysis Decision Engine

This notebook builds a decision engine to identify which deal attributes drive wins vs losses and how those drivers have changed over time.

## Objectives
1. Build an interpretable model to identify win rate drivers
2. Compare drivers between baseline and recent periods
3. Generate actionable outputs for sales leadership
4. Explain how a CRO would use this analysis

In [1]:
import sys
import os

# Add project root to path to import src modules
# Works whether run from project root or notebooks directory
if os.path.basename(os.getcwd()) == 'notebooks':
    project_root = os.path.dirname(os.getcwd())
else:
    project_root = os.getcwd()

if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Ensure output directories exist
os.makedirs(os.path.join(project_root, 'outputs', 'insights'), exist_ok=True)
os.makedirs(os.path.join(project_root, 'outputs', 'reports'), exist_ok=True)

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from src.data_loader import load_sales_data, add_derived_features
from src.decision_engine import WinRateDriverAnalyzer, generate_actionable_outputs
from src.utils import plot_driver_importance

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14, 6)

print(f"Project root: {project_root}")
print("Libraries imported successfully!")

Project root: /Users/shoaibmobassir/Desktop/SkyGeni1234
Libraries imported successfully!


## 1. Problem Definition

**Goal**: Identify which deal attributes (industry, region, product type, lead source, ACV, sales cycle) are most associated with wins vs losses, and understand how these drivers have changed over the last two quarters.

**Why This Matters**: 
- Helps CRO prioritize which deals to focus on
- Reveals which drivers have shifted between periods and may be behind a change in win rate
- Enables data-driven resource allocation and enablement decisions

In [2]:
# Load and prepare data
data_path = os.path.join(project_root, 'data', 'skygeni_sales_data.csv')
df = load_sales_data(data_path)
df = add_derived_features(df)

print(f"Dataset shape: {df.shape}")
print(f"Date range: {df['created_date'].min()} to {df['created_date'].max()}")
print(f"\nOverall win rate: {(df['outcome'] == 'Won').sum() / len(df):.1%}")

# Split into baseline and recent periods
# Baseline: everything before the final 6 months; Recent: the final 6 months
df['created_month'] = df['created_date'].dt.to_period('M')
baseline_cutoff = df['created_date'].max() - pd.DateOffset(months=6)

baseline_data = df[df['created_date'] < baseline_cutoff].copy()
recent_data = df[df['created_date'] >= baseline_cutoff].copy()

print(f"\nBaseline period: {baseline_data['created_date'].min()} to {baseline_data['created_date'].max()}")
print(f"Baseline win rate: {(baseline_data['outcome'] == 'Won').sum() / len(baseline_data):.1%}")
print(f"Baseline deals: {len(baseline_data)}")

print(f"\nRecent period: {recent_data['created_date'].min()} to {recent_data['created_date'].max()}")
print(f"Recent win rate: {(recent_data['outcome'] == 'Won').sum() / len(recent_data):.1%}")
print(f"Recent deals: {len(recent_data)}")

Dataset shape: (5000, 17)
Date range: 2023-01-01 00:00:00 to 2024-03-26 00:00:00

Overall win rate: 45.3%

Baseline period: 2023-01-01 00:00:00 to 2023-09-25 00:00:00
Baseline win rate: 45.0%
Baseline deals: 2973

Recent period: 2023-09-26 00:00:00 to 2024-03-26 00:00:00
Recent win rate: 45.6%
Recent deals: 2027
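
Before attributing the gap between the two periods to any driver, it helps to check whether 45.0% and 45.6% differ by more than sampling noise. The following is a minimal sketch of a two-proportion z-test using only the standard library. The win counts are assumptions reconstructed from the printed rates and deal counts, not values read from the dataset, and `two_proportion_z` is a hypothetical helper that is not part of `src`.

```python
import math


def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Return the z statistic for H0: both periods share one win rate."""
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Win counts are approximations rebuilt from the printed output
# (45.0% of 2973 deals and 45.6% of 2027 deals), not dataset values.
baseline_wins = round(0.450 * 2973)
recent_wins = round(0.456 * 2027)

z = two_proportion_z(baseline_wins, 2973, recent_wins, 2027)
# Two-sided p-value under the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With these approximate counts, |z| is well below 1.96, so the 0.6-point difference between periods is within what sampling noise alone could produce. Driver changes should be read with that in mind.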


## 2. Build Win Rate Driver Model

We use **Logistic Regression** because:
- Interpretable coefficients (each coefficient is a feature's effect on the log-odds of winning)
- Fast and reliable
- Easy to explain to non-technical stakeholders
- No need for complex feature engineering or hyperparameter tuning

In [3]:
# Fit model on recent data (to understand current drivers)
analyzer = WinRateDriverAnalyzer()
analyzer.fit(recent_data)

print("Model fitted successfully!")
print(f"\nFeatures used: {len(analyzer.feature_names)}")
print(f"Features: {analyzer.feature_names}")

Model fitted successfully!

Features used: 9
Features: ['industry', 'region', 'product_type', 'lead_source', 'deal_stage', 'acv_bucket', 'cycle_bucket', 'deal_amount', 'sales_cycle_days']


In [4]:
# Get drivers
drivers = analyzer.get_drivers(top_n=10)

print("=" * 80)
print("TOP NEGATIVE DRIVERS (Hurting Win Rate)")
print("=" * 80)
for i, driver in enumerate(drivers['negative_drivers'][:10], 1):
    print(f"{i}. {driver['feature']:20s} | Coef: {driver['coefficient']:7.3f} | {driver['impact']} {driver['interpretation']}")

print("\n" + "=" * 80)
print("TOP POSITIVE DRIVERS (Improving Win Rate)")
print("=" * 80)
for i, driver in enumerate(drivers['positive_drivers'][:10], 1):
    print(f"{i}. {driver['feature']:20s} | Coef: {driver['coefficient']:7.3f} | {driver['impact']} {driver['interpretation']}")

TOP NEGATIVE DRIVERS (Hurting Win Rate)
1. cycle_bucket         | Coef:  -0.277 | ↓ moderately decreases win probability
2. deal_stage           | Coef:  -0.069 | ↓ slightly decreases win probability
3. product_type         | Coef:  -0.003 | ↓ slightly decreases win probability

TOP POSITIVE DRIVERS (Improving Win Rate)
1. sales_cycle_days     | Coef:   0.260 | ↑ moderately increases win probability
2. deal_amount          | Coef:   0.069 | ↑ slightly increases win probability
3. acv_bucket           | Coef:   0.051 | ↑ slightly increases win probability
4. lead_source          | Coef:   0.066 | ↑ slightly increases win probability
5. region               | Coef:   0.041 | ↑ slightly increases win probability
6. industry             | Coef:   0.008 | ↑ slightly increases win probability
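
A logistic-regression coefficient acts on the log-odds of winning, so it is easier to explain once converted to an odds ratio and then to a probability. The sketch below does that for the `cycle_bucket` coefficient printed above. It assumes a starting win probability equal to the recent-period win rate purely for illustration, and `shift_probability` is a hypothetical helper. What "one unit" of a feature means depends on how `WinRateDriverAnalyzer` encodes and scales its inputs, which this notebook does not show.

```python
import math


def shift_probability(base_prob, coefficient, delta=1.0):
    """Apply a logistic-regression coefficient to a starting win probability."""
    base_odds = base_prob / (1 - base_prob)
    new_odds = base_odds * math.exp(coefficient * delta)
    return new_odds / (1 + new_odds)


coef = -0.277  # cycle_bucket coefficient from the output above
base = 0.456   # recent-period win rate, used as an illustrative starting point

odds_ratio = math.exp(coef)
print(f"odds ratio: {odds_ratio:.3f}")
print(f"win probability after +1 unit: {shift_probability(base, coef):.3f}")
```

An odds ratio below 1 means each additional unit of the feature multiplies the odds of winning by that factor. The same helper can be applied to any coefficient in the list.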


## 2.5. Enhanced Driver Analysis with WRDS Scoring

The decision engine also computes a **Win Rate Driver Score (WRDS)**, which ranks drivers by:
- **Impact Strength**: Coefficient magnitude from logistic regression
- **Revenue Exposure**: % of total pipeline ACV affected
- **Recent Trend**: Whether the driver is worsening or improving

This provides a more business-relevant ranking than coefficients alone.
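
The WRDS values this notebook reports for `cycle_bucket`, `deal_stage`, and `product_type` (0.076, 0.016, 0.001) are consistent with coefficient magnitude multiplied by revenue exposure when the trend is stable. The sketch below reproduces that arithmetic. It is not the implementation in `src.decision_engine`: `wrds_score` is a hypothetical function, and the `trend_multiplier` argument is an assumption about how trend might enter the score, since all three reported drivers are stable.

```python
def wrds_score(coefficient, revenue_exposure, trend_multiplier=1.0):
    """Hypothetical WRDS: impact strength x revenue exposure x trend weight."""
    return abs(coefficient) * revenue_exposure * trend_multiplier


# (coefficient, revenue exposure) pairs as reported by this notebook
reported = {
    "cycle_bucket": (-0.277, 0.275),
    "deal_stage": (-0.069, 0.229),
    "product_type": (-0.003, 0.351),
}

ranked = sorted(
    ((name, wrds_score(coef, exposure)) for name, (coef, exposure) in reported.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:<15} {score:.3f}")
```

Weighting by exposure explains why `product_type`, with the largest revenue exposure, still ranks last: its coefficient is close to zero.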

In [5]:
# Get drivers with WRDS scoring
drivers_wrds = analyzer.get_drivers(top_n=5, include_wrds=True)

print("=" * 80)
print(" TOP NEGATIVE DRIVERS (ranked by WRDS)")
print("=" * 80)
print(f"{'Rank':<6} {'Driver':<25} {'WRDS':<8} {'Revenue Exposure':<18} {'Trend':<15}")
print("-" * 75)
for i, driver in enumerate(drivers_wrds['negative_drivers'][:5], 1):
    print(f"{i:<6} {driver['feature']:<25} {driver['wrds']:<8.3f} "
          f"{driver['revenue_exposure']:<18.1%} {driver['trend_direction']:<15}")

if drivers_wrds['negative_drivers']:
    top_driver = drivers_wrds['negative_drivers'][0]
    print("\n" + "=" * 80)
    print(f" DETAILED ANALYSIS: {top_driver['feature']}")
    print("=" * 80)
    print(f"WRDS Score: {top_driver['wrds']:.3f}")
    print(f"Impact: {top_driver['interpretation']}")
    print(f"Revenue Exposure: {top_driver['revenue_exposure']:.1%}")
    print(f"Trend: {top_driver['trend_direction']} ({top_driver['trend_delta']:+.1%})")
    
    if top_driver.get('likely_issues'):
        print("\nLikely Issues:")
        for issue in top_driver['likely_issues']:
            print(f"  • {issue}")
    
    if top_driver.get('suggested_actions'):
        print("\nSuggested Actions:")
        for action in top_driver['suggested_actions']:
            print(f"  • {action}")

 TOP NEGATIVE DRIVERS (ranked by WRDS)
Rank   Driver                    WRDS     Revenue Exposure   Trend          
---------------------------------------------------------------------------
1      cycle_bucket              0.076    27.5%              stable         
2      deal_stage                0.016    22.9%              stable         
3      product_type              0.001    35.1%              stable         

 DETAILED ANALYSIS: cycle_bucket
WRDS Score: 0.076
Impact: moderately decreases win probability
Revenue Exposure: 27.5%
Trend: stable (+0.0%)

Likely Issues:
  • Qualification issues
  • Chasing bad deals too long
  • Pricing friction
  • Process inefficiencies

Suggested Actions:
  • Improve early-stage disqualification
  • Tighten MEDDICC / ICP enforcement
  • Pricing transparency
  • Streamline approval processes


In [6]:
# Visualize driver importance
fig = plot_driver_importance(drivers)
output_path = os.path.join(project_root, 'outputs', 'insights', 'driver_importance.png')
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.show()
print(f"[OK] Saved: {output_path}")

[OK] Saved: /Users/shoaibmobassir/Desktop/SkyGeni1234/outputs/insights/driver_importance.png


### Visualization: Driver Importance

![Driver Importance](outputs/insights/driver_importance.png)

## 3. Compare Drivers Over Time

Compare baseline vs recent periods to understand what changed.

In [7]:
# Compare periods
period_comparison = analyzer.compare_periods(baseline_data, recent_data)

print("=" * 80)
print("DRIVERS THAT CHANGED OVER TIME")
print("=" * 80)

if period_comparison['changed_drivers']:
    changed = sorted(period_comparison['changed_drivers'], 
                     key=lambda x: abs(x['change']), reverse=True)
    for driver in changed[:10]:
        print(f"\n{driver['feature']}:")
        print(f"  Direction: {driver['direction']}")
        print(f"  Baseline coefficient: {driver['baseline_coef']:.3f}")
        print(f"  Recent coefficient: {driver['recent_coef']:.3f}")
        print(f"  Change: {driver['change']:+.3f}")
else:
    print("No significant changes detected in driver coefficients.")

DRIVERS THAT CHANGED OVER TIME

cycle_bucket:
  Direction: worsened
  Baseline coefficient: -0.063
  Recent coefficient: -0.277
  Change: -0.214
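
One way such a comparison could work is to fit the model on each period and flag features whose coefficient moved by more than a threshold. The sketch below illustrates that idea. It is not the actual `compare_periods` logic: `changed_drivers`, the 0.1 threshold, and the rule that a negative change counts as "worsened" are assumptions. Only the `cycle_bucket` values come from the output above; `example_feature` is made up to show a feature that is not flagged.

```python
def changed_drivers(baseline_coefs, recent_coefs, threshold=0.1):
    """Flag features whose coefficient moved by more than `threshold`."""
    changes = []
    for feature, base in baseline_coefs.items():
        if feature not in recent_coefs:
            continue
        recent = recent_coefs[feature]
        change = recent - base
        if abs(change) > threshold:
            changes.append({
                "feature": feature,
                "baseline_coef": base,
                "recent_coef": recent,
                "change": round(change, 3),
                # Assumed convention: a falling coefficient is a worsening driver
                "direction": "worsened" if change < 0 else "improved",
            })
    return sorted(changes, key=lambda d: abs(d["change"]), reverse=True)


# cycle_bucket values are from the notebook output; example_feature is hypothetical
baseline = {"cycle_bucket": -0.063, "example_feature": 0.020}
recent = {"cycle_bucket": -0.277, "example_feature": 0.030}

for row in changed_drivers(baseline, recent):
    print(row)
```

A fixed threshold ignores the uncertainty in each coefficient, so a flagged change is a prompt for investigation rather than evidence of a real shift.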


## 4. Generate Actionable Outputs

In [8]:
# Generate actionable outputs
report = generate_actionable_outputs(analyzer, period_comparison)
print(report)

# Save report
report_path = os.path.join(project_root, 'outputs', 'reports', 'win_rate_driver_analysis.txt')
with open(report_path, 'w') as f:
    f.write(report)
print(f"\n[OK] Report saved to {report_path}")

# Win Rate Driver Analysis - Decision Engine Output

This system tells leadership what changed, where revenue is leaking, 
and what to focus on this quarter—not just what the win rate is.

## Top Negative Drivers (Hurting Win Rate)

| Driver | Impact | Revenue at Risk | What Changed |
|--------|--------|-----------------|--------------|
| cycle_bucket | ↓ moderately decreases win probability | $1.5M | Win rate 0.0% (stable) |

**Likely Issues:**
- Qualification issues
- Chasing bad deals too long
- Pricing friction
- Process inefficiencies

**Suggested Actions:**
- Improve early-stage disqualification
- Tighten MEDDICC / ICP enforcement
- Pricing transparency
- Streamline approval processes

| deal_stage | ↓ slightly decreases win probability | $0.4M | Win rate 0.0% (stable) |

**Likely Issues:**
- Process inefficiencies
- Resource constraints
- Competitive pressure

**Suggested Actions:**
- Review sales process
- Enablement and training
- Competitive analysis

| product_type | ↓ sligh

## 5. How a CRO Would Use This Analysis

### Use Case 1: Pipeline Prioritization
**Action**: Review open deals and prioritize those with positive drivers (e.g., certain lead sources, ACV ranges, industries)

**Example**: "Focus sales efforts on deals from Referral and Partner sources in Mid-Market ACV range"

### Use Case 2: Resource Allocation
**Action**: Redirect sales and marketing resources away from segments with negative drivers

**Example**: "Reduce investment in Outbound leads for Enterprise deals, shift to Inbound and Referral"

### Use Case 3: Enablement & Coaching
**Action**: Provide targeted training on deals with negative driver patterns

**Example**: "Train reps on Enterprise deal qualification - these deals are taking longer and closing less often"

### Use Case 4: Strategic Planning
**Action**: Use changed drivers to understand market shifts and adjust strategy

**Example**: "Enterprise deals recently became a negative driver - investigate competitive pressure or pricing issues"

### Limitations & Next Steps
- **Correlation ≠ Causation**: These are associations, not proven causes
- **Need Qualitative Validation**: Discuss findings with sales leaders to understand context
- **Monitor Over Time**: Re-run analysis monthly to track if interventions are working