# Profit Analysis: Structural Health vs Discount Damage

From 02_discount_analysis, we know:
- Discounts don't move volume (correlation 0.009)
- Discounts hurt profit (correlation -0.219)
- Breaking point is 20% discount
- Central region gets 24% average discount and generates only $17/order profit
- West region gets 11% average discount and generates $33.85/order profit

The critical question: Are some products and regions STRUCTURALLY BROKEN, or are they just being POORLY DISCOUNTED?

Why it matters:
- If Binders are structurally unprofitable, stopping discounts won't save them
- If Binders are just over-discounted, the fix is obvious: reduce discounts
- If Central is weak by nature, we need a different strategy than West
- If Central is just being discounted to death, we can fix it immediately

This notebook establishes the baseline profitability of each product, segment, and region. From that, we can distinguish what's fixable from what's broken.

Important: We're comparing actual orders grouped by discount level: orders at 0% discount (the baseline) against orders discounted by more than 20% (labelled "20%+" throughout). This is observed data, not a simulated world without discounts, so part of the gap between the two groups may reflect which items and customers end up in each group.

## 01. Setup

In [16]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

pd.options.display.float_format = '{:.2f}'.format

df = pd.read_csv("sample_superstore_processed.csv")

## 02. Baseline Profitability: Products at Full Price

In [17]:
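# Baseline group: rows sold at full price
no_discount_orders = df[df['Discount'] == 0]
# "20%+" group: strictly above 0.20, so rows at exactly 20% are excluded
high_discount_orders = df[df['Discount'] > 0.20]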

baseline_profit = no_discount_orders.groupby('Sub-Category').agg({
    'Profit': ['sum', 'mean', 'count'],
    'Sales': 'sum',
    'Quantity': 'sum'
}).round(2)

baseline_profit.columns = ['Total_Profit', 'Avg_Profit', 'Count', 'Total_Sales', 'Total_Qty']
baseline_profit = baseline_profit.sort_values('Avg_Profit', ascending=False)

print("BASELINE PROFITABILITY: Sub-Categories at 0% Discount")
print("="*80)
print(baseline_profit)
print("\nInterpretation:")
print("  Avg_Profit = profitability per order when sold at full price")
print("  High value = product has strong baseline margins")
print("  Low/negative = product is broken even without discounts")
print("="*80)

BASELINE PROFITABILITY: Sub-Categories at 0% Discount
              Total_Profit  Avg_Profit  Count  Total_Sales  Total_Qty
Sub-Category                                                         
Copiers           35556.13     1616.19     22     76449.18         82
Machines          27137.82      935.79     29     71034.00        138
Tables            13276.30      184.39     72     71578.76        310
Chairs            21933.10      164.91    133     91060.73        540
Binders           39314.45      116.66    337     81829.48       1291
Phones            34365.21      110.50    311    123879.71       1102
Bookcases          6075.71      101.26     60     31935.98        211
Appliances        23183.74       85.55    271     78066.19       1021
Accessories       35289.25       74.92    471    118370.31       1835
Storage           25528.17       48.17    530    157853.76       2044
Envelopes          4976.98       32.74    152     10606.45        548
Paper             25329.47       29.

Looking at the numbers:

Strong baseline ($100/order or more): Copiers ($1,616), Machines ($936), Tables ($184), Chairs ($165), Binders ($117), Phones ($111), Bookcases ($101). These products have structural strength and can absorb some discounting without collapsing.

Weak baseline ($30-100/order): Appliances, Accessories, Storage, Envelopes, Furnishings. These are vulnerable to discounting. A 20% discount will likely push them negative.

Problematic baseline (under $30/order): Paper ($29.56), Labels, Supplies, Art, Fasteners. Already struggling at 0% discount. Discounting them is pointless: their margins are thin before any discount is applied.

Key insight: Baseline profitability determines what can absorb discounts. If a product is weak here, discounting won't help. It will only accelerate the damage.
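
The three tiers can be assigned mechanically rather than by eye. The snippet below is a minimal sketch: it copies the fully visible `Avg_Profit` values from the table above into a Series (the truncated rows are left out), and the bin edges and tier labels are assumptions that mirror the $30 and $100 cut-offs used in this notebook.

```python
import pandas as pd

# Avg_Profit at 0% discount, copied by hand from the baseline table above.
# Only rows whose values are fully visible in the output are included.
baseline_avg = pd.Series({
    "Copiers": 1616.19, "Machines": 935.79, "Tables": 184.39,
    "Chairs": 164.91, "Binders": 116.66, "Phones": 110.50,
    "Bookcases": 101.26, "Appliances": 85.55, "Accessories": 74.92,
    "Storage": 48.17, "Envelopes": 32.74,
})

# Assumed tier edges: under 30 = problematic, 30 to under 100 = weak, 100+ = strong.
tiers = pd.cut(
    baseline_avg,
    bins=[-float("inf"), 30, 100, float("inf")],
    labels=["problematic", "weak", "strong"],
    right=False,  # left-closed bins, so exactly $100 counts as strong
)
print(tiers)
```

Binning in code avoids borderline slips: a value just over $100 lands in the strong tier and a value just under $30 lands in the problematic tier, whichever list it looks closest to.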

## 03. Damage Analysis: Products at 20%+ Discount

In [18]:
high_discount_profit = high_discount_orders.groupby('Sub-Category').agg({
    'Profit': ['sum', 'mean', 'count']
}).round(2)

high_discount_profit.columns = ['Total_Profit', 'Avg_Profit', 'Count']

comparison = pd.DataFrame({
    'Baseline_0pct': baseline_profit['Avg_Profit'],
    'HighDiscount_20pct': high_discount_profit['Avg_Profit'],
    'Baseline_Orders': baseline_profit['Count'],
    'Discounted_Orders': high_discount_profit['Count']
})

comparison['Damage'] = comparison['Baseline_0pct'] - comparison['HighDiscount_20pct']
comparison['Damage_pct'] = (comparison['Damage'] / comparison['Baseline_0pct'] * 100).round(1)
comparison = comparison.sort_values('Baseline_0pct', ascending=False)

print("DAMAGE BY DISCOUNT: Baseline vs 20%+ Discount")
print("="*80)
print(comparison)
print("="*80)
print("\nKey comparison:")
strong = comparison[comparison['Baseline_0pct'] >= 100]
weak = comparison[(comparison['Baseline_0pct'] >= 30) & (comparison['Baseline_0pct'] < 100)]
broken = comparison[comparison['Baseline_0pct'] < 30]

print("\nSTRONG (Baseline $100+): Can these survive 20%+ discount?")
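for idx, row in strong.iterrows():
    if pd.isna(row['HighDiscount_20pct']):
        # NaN means this sub-category has no orders above 20% discount
        print(f"  {idx}: {row['Baseline_0pct']:.2f} → no orders above 20% discount")
    elif row['HighDiscount_20pct'] > 0:
        print(f"  {idx}: {row['Baseline_0pct']:.2f} → {row['HighDiscount_20pct']:.2f} (loses {row['Damage']:.2f})")
    else:
        print(f"  {idx}: {row['Baseline_0pct']:.2f} → {row['HighDiscount_20pct']:.2f} (GOES NEGATIVE)")

print("\nWEAK (Baseline $30-100): These get destroyed by 20%+ discount")
for idx, row in weak.iterrows():
    if pd.isna(row['HighDiscount_20pct']):
        # Without this check, NaN < 0 is False and the row is reported as "still profitable"
        print(f"  {idx}: {row['Baseline_0pct']:.2f} → no orders above 20% discount")
    elif row['HighDiscount_20pct'] < 0:
        print(f"  {idx}: {row['Baseline_0pct']:.2f} → {row['HighDiscount_20pct']:.2f} (unprofitable)")
    else:
        print(f"  {idx}: {row['Baseline_0pct']:.2f} → {row['HighDiscount_20pct']:.2f} (still profitable but weak)")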

print("\nPROBLEMATIC (Baseline under $30): Already broken")
for idx, row in broken.iterrows():
    print(f"  {idx}: {row['Baseline_0pct']:.2f} baseline")
    print(f"       Already struggling. Discounting won't help.")

print("\n" + "="*80)

DAMAGE BY DISCOUNT: Baseline vs 20%+ Discount
              Baseline_0pct  HighDiscount_20pct  Baseline_Orders  \
Sub-Category                                                       
Copiers             1616.19              242.55               22   
Machines             935.79             -557.65               29   
Tables               184.39             -174.42               72   
Chairs               164.91              -42.64              133   
Binders              116.66              -62.82              337   
Phones               110.50              -58.59              311   
Bookcases            101.26             -158.54               60   
Appliances            85.55             -128.80              271   
Accessories           74.92                 NaN              471   
Storage               48.17                 NaN              530   
Envelopes             32.74                 NaN              152   
Paper                 29.56                 NaN              857   
Fu
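
The `NaN` entries in `HighDiscount_20pct` do not mean zero profit. They appear because the sub-category has no rows above 20% discount, so it is missing from the discounted group and index alignment fills the gap. A minimal sketch of the mechanism, using hypothetical toy rows rather than the Superstore data (the `Has_Deep_Discount` column is an illustrative addition):

```python
import pandas as pd

# Toy data: hypothetical rows, not taken from the Superstore file.
df = pd.DataFrame({
    "Sub-Category": ["Binders", "Binders", "Paper", "Paper"],
    "Discount":     [0.0, 0.7, 0.0, 0.2],
    "Profit":       [20.0, -15.0, 12.0, 6.0],
})

baseline = df[df["Discount"] == 0].groupby("Sub-Category")["Profit"].mean()
deep = df[df["Discount"] > 0.20].groupby("Sub-Category")["Profit"].mean()

# Building a DataFrame from two Series aligns them on the index.
# Paper has no rows strictly above 20%, so its discounted mean is NaN.
comparison = pd.DataFrame({"Baseline_0pct": baseline, "HighDiscount_20pct": deep})
comparison["Has_Deep_Discount"] = comparison["HighDiscount_20pct"].notna()
print(comparison)
```

Carrying a flag like this makes it harder to misread a missing group as a product that "survives" discounting.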

## 04. Volume Analysis: Do Discounts Drive Sales?

In [19]:
volume_comparison = pd.DataFrame({
    'Baseline_Qty_Total': no_discount_orders.groupby('Sub-Category')['Quantity'].sum(),
    'Baseline_Orders': no_discount_orders.groupby('Sub-Category').size(),
    'Discounted_Qty_Total': high_discount_orders.groupby('Sub-Category')['Quantity'].sum(),
    'Discounted_Orders': high_discount_orders.groupby('Sub-Category').size()
})

volume_comparison['Baseline_Qty_Avg'] = (volume_comparison['Baseline_Qty_Total'] / 
                                          volume_comparison['Baseline_Orders']).round(2)
volume_comparison['Discounted_Qty_Avg'] = (volume_comparison['Discounted_Qty_Total'] / 
                                            volume_comparison['Discounted_Orders']).round(2)
volume_comparison['Unit_Change'] = (volume_comparison['Discounted_Qty_Avg'] - 
                                    volume_comparison['Baseline_Qty_Avg']).round(2)
volume_comparison['Order_Change_pct'] = ((volume_comparison['Discounted_Orders'] - 
                                          volume_comparison['Baseline_Orders']) / 
                                         volume_comparison['Baseline_Orders'] * 100).round(1)

volume_comparison = volume_comparison.sort_values('Baseline_Orders', ascending=False)

print("VOLUME ANALYSIS: Do Discounts Drive More Sales?")
print("="*80)
print(volume_comparison[['Baseline_Orders', 'Baseline_Qty_Avg', 'Discounted_Orders', 
                         'Discounted_Qty_Avg', 'Unit_Change', 'Order_Change_pct']])
print("="*80)
print("\nInterpretation:")
print("  Baseline_Orders = transactions at 0% discount")
print("  Discounted_Orders = transactions at 20%+ discount")
print("  Unit_Change = difference in units per order")
print("  Order_Change_pct = percentage change in number of orders")
print("\nRecall from 02: overall correlation between discount and quantity is 0.009 (effectively zero).")
print("This breakdown by product shows whether any specific product responds to discounting.")
print("="*80)

VOLUME ANALYSIS: Do Discounts Drive More Sales?
              Baseline_Orders  Baseline_Qty_Avg  Discounted_Orders  \
Sub-Category                                                         
Paper                     857              3.77                NaN   
Furnishings               571              3.79             138.00   
Storage                   530              3.86                NaN   
Art                       498              3.82                NaN   
Accessories               471              3.90                NaN   
Binders                   337              3.83             613.00   
Phones                    311              3.54             109.00   
Appliances                271              3.77              67.00   
Labels                    239              3.95                NaN   
Envelopes                 152              3.61                NaN   
Chairs                    133              4.06             158.00   
Fasteners                 128             
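
The overall discount-to-quantity correlation of 0.009 could hide individual sub-categories that do respond to discounts. One way to check is to compute the same correlation inside each sub-category. This is a sketch on hypothetical toy rows; on the real data the same loop would run over `df` with the `Sub-Category`, `Discount`, and `Quantity` columns already used above.

```python
import pandas as pd

# Toy data: hypothetical rows chosen only to make the example runnable.
df = pd.DataFrame({
    "Sub-Category": ["Binders"] * 4 + ["Phones"] * 4,
    "Discount":     [0.0, 0.2, 0.7, 0.8, 0.0, 0.2, 0.2, 0.4],
    "Quantity":     [3, 4, 3, 4, 5, 2, 4, 3],
})

# Pearson correlation between Discount and Quantity within each sub-category.
per_product = pd.Series(
    {name: group["Discount"].corr(group["Quantity"])
     for name, group in df.groupby("Sub-Category")},
    name="Discount_Qty_Corr",
)
print(per_product.sort_values(ascending=False))
```

Sub-categories with only a handful of rows will give noisy correlations, so the order counts belong next to these values when reading them.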

## 05. Segment Analysis: Consumer vs Corporate vs Home Office

In [20]:
baseline_segment = no_discount_orders.groupby('Segment')['Profit'].agg(['sum', 'mean', 'count']).round(2)
baseline_segment.columns = ['Total_Profit', 'Avg_Profit', 'Count']

discounted_segment = high_discount_orders.groupby('Segment')['Profit'].agg(['sum', 'mean', 'count']).round(2)
discounted_segment.columns = ['Total_Profit', 'Avg_Profit', 'Count']

segment_comparison = pd.DataFrame({
    'Baseline': baseline_segment['Avg_Profit'],
    'Discounted': discounted_segment['Avg_Profit'],
    'Baseline_Orders': baseline_segment['Count'],
    'Discounted_Orders': discounted_segment['Count']
})

segment_comparison['Damage'] = segment_comparison['Baseline'] - segment_comparison['Discounted']
segment_comparison['Damage_pct'] = (segment_comparison['Damage'] / segment_comparison['Baseline'] * 100).round(1)

print("SEGMENT ANALYSIS: Structural Health")
print("="*80)
print("\nBaseline (0% discount):")
print(baseline_segment)

print("\nAt 20%+ discount:")
print(discounted_segment)

print("\nComparison:")
print(segment_comparison)
print("="*80)
print("\nKey finding from 02: Consumer gets higher discounts despite having lower margins.")
print("This analysis shows: Is Consumer weak structurally, or just over-discounted?")

SEGMENT ANALYSIS: Structural Health

Baseline (0% discount):
             Total_Profit  Avg_Profit  Count
Segment                                     
Consumer        157901.96       64.16   2461
Corporate       102150.78       71.58   1427
Home Office      60934.86       66.96    910

At 20%+ discount:
             Total_Profit  Avg_Profit  Count
Segment                                     
Consumer        -71890.16      -97.81    735
Corporate       -39770.76      -94.69    420
Home Office     -23715.13      -99.64    238

Comparison:
             Baseline  Discounted  Baseline_Orders  Discounted_Orders  Damage  \
Segment                                                                         
Consumer        64.16      -97.81             2461                735  161.97   
Corporate       71.58      -94.69             1427                420  166.27   
Home Office     66.96      -99.64              910                238  166.60   

             Damage_pct  
Segment                  
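
The `Damage_pct` values are cut off in the output above. As a cross-check, they can be recomputed from the printed averages. The snippet copies those six numbers by hand and applies the same formula as the cell, so it is a sketch rather than the notebook's own output.

```python
import pandas as pd

# Avg profit per order, copied from the segment tables printed above.
segments = pd.DataFrame(
    {"Baseline": [64.16, 71.58, 66.96], "Discounted": [-97.81, -94.69, -99.64]},
    index=["Consumer", "Corporate", "Home Office"],
)

# Same formulas as the cell: absolute drop, then drop as a share of baseline.
segments["Damage"] = (segments["Baseline"] - segments["Discounted"]).round(2)
segments["Damage_pct"] = (segments["Damage"] / segments["Baseline"] * 100).round(1)
print(segments)
```

Because every discounted average is negative, the drop is larger than the baseline itself, so each `Damage_pct` exceeds 100%.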

## 06. Region Analysis: Geographic Breakdown

In [21]:
baseline_region = no_discount_orders.groupby('Region')['Profit'].agg(['sum', 'mean', 'count']).round(2)
baseline_region.columns = ['Total_Profit', 'Avg_Profit', 'Count']

discounted_region = high_discount_orders.groupby('Region')['Profit'].agg(['sum', 'mean', 'count']).round(2)
discounted_region.columns = ['Total_Profit', 'Avg_Profit', 'Count']

region_comparison = pd.DataFrame({
    'Baseline': baseline_region['Avg_Profit'],
    'Baseline_Orders': baseline_region['Count'],
    'Discounted': discounted_region['Avg_Profit'],
    'Discounted_Orders': discounted_region['Count']
})

region_comparison['Damage'] = region_comparison['Baseline'] - region_comparison['Discounted']
region_comparison['Damage_pct'] = (region_comparison['Damage'] / region_comparison['Baseline'] * 100).round(1)
region_comparison['Survives_20pct'] = region_comparison['Discounted'] > 0

print("REGION ANALYSIS: Baseline Health")
print("="*80)
print("\nBaseline (0% discount):")
print(baseline_region)

print("\nAt 20%+ discount:")
print(discounted_region)

print("\nComparison:")
print(region_comparison)
print("="*80)
print("\nCritical question: Is Central baseline at 0% discount still profitable?")
print("If yes, then problem is the 24% average discount from 02.")
print("If no, then it's structural weakness.")
print("="*80)

REGION ANALYSIS: Baseline Health

Baseline (0% discount):
         Total_Profit  Avg_Profit  Count
Region                                  
Central      76125.44       91.94    828
East        105377.55       72.72   1449
South        62981.34       78.24    805
West         76503.28       44.58   1716

At 20%+ discount:
         Total_Profit  Avg_Profit  Count
Region                                  
Central     -52392.27      -81.48    643
East        -42088.61      -91.30    461
South       -24880.48     -145.50    171
West        -16014.69     -135.72    118

Comparison:
         Baseline  Baseline_Orders  Discounted  Discounted_Orders  Damage  \
Region                                                                      
Central     91.94              828      -81.48                643  173.42   
East        72.72             1449      -91.30                461  164.02   
South       78.24              805     -145.50                171  223.74   
West        44.58             171

## 07. Critical Cases: Salvageable or Broken?

In [22]:
print("CRITICAL CASES: Can We Fix These With Better Discount Strategy?")
print("="*80)

cases = {
    'Binders': 'Binders',
    'Machines': 'Machines',
    'Copiers': 'Copiers',
}

for name, subcat in cases.items():
    no_disc = no_discount_orders[no_discount_orders['Sub-Category'] == subcat]
    high_disc = high_discount_orders[high_discount_orders['Sub-Category'] == subcat]
    baseline = no_disc['Profit'].mean()
    discounted = high_disc['Profit'].mean()
    
    print(f"\n{name.upper()}")
    print(f"  Baseline (0%): ${baseline:.2f}/order ({len(no_disc)} orders)")
    print(f"  At 20%+: ${discounted:.2f}/order ({len(high_disc)} orders)")
    print(f"  Damage: ${baseline - discounted:.2f}/order")
    
    if baseline > 100:
        print(f"  ✓ SALVAGEABLE: Strong baseline. Stop discounting.")
    elif baseline > 30:
        print(f"  ⚠ FIXABLE: Decent baseline. Reduce discounts.")
    else:
        print(f"  ✗ STRUCTURAL: Even at 0%, barely profitable.")

print("\nCENTRAL REGION")
central_no_disc = no_discount_orders[no_discount_orders['Region'] == 'Central']
central_high_disc = high_discount_orders[high_discount_orders['Region'] == 'Central']
central_baseline = central_no_disc['Profit'].mean()
central_discounted = central_high_disc['Profit'].mean()

print(f"  Baseline (0%): ${central_baseline:.2f}/order ({len(central_no_disc)} orders)")
print(f"  At 20%+: ${central_discounted:.2f}/order ({len(central_high_disc)} orders)")
print(f"  Damage: ${central_baseline - central_discounted:.2f}/order")

if central_baseline > 25:
    print(f"  ✓ SALVAGEABLE: Decent baseline. Central is being killed by 24% average discount.")
else:
    print(f"  ✗ STRUCTURAL: Central is weak even without discounts.")

print("\nWEST REGION (COMPARISON)")
west_no_disc = no_discount_orders[no_discount_orders['Region'] == 'West']
west_high_disc = high_discount_orders[high_discount_orders['Region'] == 'West']
west_baseline = west_no_disc['Profit'].mean()
west_discounted = west_high_disc['Profit'].mean()

print(f"  Baseline (0%): ${west_baseline:.2f}/order ({len(west_no_disc)} orders)")
print(f"  At 20%+: ${west_discounted:.2f}/order ({len(west_high_disc)} orders)")
print("  ✓ MODEL DISCOUNT STRATEGY: Lowest regional baseline, but few orders above 20% discount. The discount discipline is the model.")

print("\n" + "="*80)

CRITICAL CASES: Can We Fix These With Better Discount Strategy?

BINDERS
  Baseline (0%): $116.66/order (337 orders)
  At 20%+: $-62.82/order (613 orders)
  Damage: $179.48/order
  ✓ SALVAGEABLE: Strong baseline. Stop discounting.

MACHINES
  Baseline (0%): $935.79/order (29 orders)
  At 20%+: $-557.65/order (53 orders)
  Damage: $1493.44/order
  ✓ SALVAGEABLE: Strong baseline. Stop discounting.

COPIERS
  Baseline (0%): $1616.19/order (22 orders)
  At 20%+: $242.55/order (9 orders)
  Damage: $1373.64/order
  ✓ SALVAGEABLE: Strong baseline. Stop discounting.

CENTRAL REGION
  Baseline (0%): $91.94/order (828 orders)
  At 20%+: $-81.48/order (643 orders)
  Damage: $173.42/order
  ✓ SALVAGEABLE: Decent baseline. Central is being killed by 24% average discount.

WEST REGION (COMPARISON)
  Baseline (0%): $44.58/order (1716 orders)
  At 20%+: $-135.72/order (118 orders)
  ✓ MODEL DISCOUNT STRATEGY: Lowest regional baseline, but few orders above 20% discount. The discount discipline is the model.



## 08. Summary: Fixable vs Structural Problems

DISCOUNT PROBLEMS (Fixable by changing strategy):

Binders: Strong baseline (~$117), destroyed by 20%+ discount (-$63). Stop discounting them.

Machines: Strong baseline (~$936), destroyed by 20%+ discount (-$558). Stop discounting them.

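Central Region: Highest regional baseline ($91.94/order at 0%), but its 643 orders above 20% discount average -$81.48. With the 24% average discount found in 02, the problem is the discounting, not the region. Reduce discounts.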

STRUCTURAL PROBLEMS (Need deeper solutions):

The sub-categories that section 02 places under $30/order at 0% discount (Labels, Supplies, Art, and Fasteners among them) have thin margins before any discount is applied. These require different approaches beyond discount management.

STRENGTHS (Keep doing what you're doing):

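Copiers: Can absorb discounts and stay profitable ($242.55/order at 20%+, though on only 9 orders). Only product with this capability.

West: Lowest regional baseline ($44.58/order at 0%), but only 118 orders above 20% discount against 1,716 at full price. The strength is the restrained discount strategy from 02, not the baseline.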

KEY INSIGHT: Most problems from 02 are DISCOUNT PROBLEMS, not structural problems, so they are fixable. Products and regions with healthy full-price margins turn unprofitable once discounts pass 20%. Pulling those discounts back is the straightforward remedy.

Only a few products (if any) have structural issues that discount strategy alone won't fix. Those require deeper investigation.
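
The regional contrast can be checked using only the numbers printed in section 06. The sketch below copies the order counts and baseline averages by hand; the `Deep_per_FullPrice` column is an illustrative name. Orders in the band above 0% and up to 20% are not shown in those tables, so this is a ratio of deep-discount orders to full-price orders, not a share of all orders.

```python
import pandas as pd

# Values copied from the region tables in section 06.
regions = pd.DataFrame(
    {
        "Baseline_Avg": [91.94, 72.72, 78.24, 44.58],
        "Orders_0pct": [828, 1449, 805, 1716],
        "Orders_Over_20pct": [643, 461, 171, 118],
    },
    index=["Central", "East", "South", "West"],
)

# Deep-discount orders per full-price order, a rough measure of exposure.
regions["Deep_per_FullPrice"] = (
    regions["Orders_Over_20pct"] / regions["Orders_0pct"]
).round(2)
print(regions.sort_values("Deep_per_FullPrice", ascending=False))
```

On these numbers Central pairs the highest full-price average with the heaviest deep-discount exposure, while West pairs the lowest full-price average with the lightest exposure.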

## 09. What's Next

In a 04_scenario_modeling notebook, we would model:

What if we eliminated discounts from Binders and Machines? Revenue change? Profit change?

What if we capped discounts at 15% (below the breaking point)? How much profit recovery?

What if we replicated West's discount strategy in Central? Reduce Central discount from 24% to 11%.

What's the financial upside of better discount management? $566K is currently being lost. How much could we recover?
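
As a starting point for the discount-cap scenario, here is one possible sketch. The function name `cap_discount_profit` is hypothetical and the toy rows are invented. It rests on assumptions that have not been checked against the dataset: `Sales` is revenue after the discount, and quantity and cost per row stay the same when the discount changes (the near-zero volume correlation from 02 is the motivation for the second one).

```python
import pandas as pd

def cap_discount_profit(orders: pd.DataFrame, cap: float = 0.15) -> pd.Series:
    """Estimate per-row profit if no discount exceeded `cap`.

    Assumptions (not verified against the dataset):
      - Sales is revenue after discount, so list revenue = Sales / (1 - Discount)
      - quantity and cost per row are unchanged by the new discount
      - no row has a 100% discount (that would divide by zero)
    """
    list_revenue = orders["Sales"] / (1 - orders["Discount"])
    capped = orders["Discount"].clip(upper=cap)   # discounts below the cap stay as-is
    new_sales = list_revenue * (1 - capped)
    # Costs are held fixed, so extra revenue flows straight into profit.
    return orders["Profit"] + (new_sales - orders["Sales"])

# Toy rows with hypothetical values.
orders = pd.DataFrame({
    "Sales":    [80.0, 50.0, 100.0],
    "Discount": [0.20, 0.50, 0.00],
    "Profit":   [-5.0, -30.0, 25.0],
})
orders["Profit_Capped"] = cap_discount_profit(orders, cap=0.15)
print(orders)
```

Summing `Profit_Capped - Profit` over the real orders would give an upper-bound style estimate of the recovery, since any volume lost to smaller discounts is ignored.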