EXPLORATORY DATA ANALYSIS (EDA) FOR PRICE OPTIMIZATION
=======================================================

KEY QUESTIONS WE'RE ANSWERING:
1. What's the relationship between price and quantity? (elasticity)
2. Which products/segments are most profitable?
3. How does seasonality affect demand?
4. How do competitor prices impact our sales?
5. What price ranges maximize profit?

In [23]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
# Set a consistent plot style and default figure size for the EDA charts
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14, 6)






In [3]:
# Load the data
df = pd.read_csv('lab_equipment_pricing.csv')
df['date'] = pd.to_datetime(df['date'])

print(f"✓ Data loaded: {len(df):,} records")
print(f"✓ Date range: {df['date'].min().date()} to {df['date'].max().date()}")

df.info()
# There are no missing values and no dtype issues.
# (Otherwise we would convert with df['col'] = df['col'].astype('string'), or 'int', 'float', etc.)

✓ Data loaded: 10,000 records
✓ Date range: 2021-01-01 to 2023-12-31
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 17 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   date                  10000 non-null  datetime64[ns]
 1   product               10000 non-null  object        
 2   customer_segment      10000 non-null  object        
 3   price                 10000 non-null  float64       
 4   competitor_price      10000 non-null  float64       
 5   quantity_sold         10000 non-null  int64         
 6   unit_cost             10000 non-null  float64       
 7   inventory_level       10000 non-null  int64         
 8   days_since_promotion  10000 non-null  int64         
 9   competitor_promotion  10000 non-null  int64         
 10  month                 10000 non-null  int64         
 11  quarter               10000 non-null  int64         
 12  day_of

In [24]:
df.head()

Unnamed: 0,date,product,customer_segment,price,competitor_price,quantity_sold,unit_cost,inventory_level,days_since_promotion,competitor_promotion,month,quarter,day_of_week,revenue,cost,profit,margin_percent
0,2023-11-11,Pipettes,Academic,269.04,250.53,70,114.0,87,116,0,11,4,5,18832.8,7980.0,10852.8,57.63
1,2023-02-09,Microscope,Pharma,15998.99,17402.27,115,5700.0,187,20,0,2,1,3,1839883.85,655500.0,1184383.85,64.37
2,2021-09-10,PCR_System,Pharma,7836.67,7509.45,208,3040.0,169,14,0,9,3,4,1630027.36,632320.0,997707.36,61.21
3,2023-04-12,Microscope,Government,14850.43,15919.49,107,5700.0,72,166,0,4,2,2,1588996.01,609900.0,979096.01,61.62
4,2021-01-14,Centrifuge,Government,13110.23,12597.92,78,4560.0,83,91,0,1,1,3,1022597.94,355680.0,666917.94,65.22


In [12]:
print("\n" + "="*80)
print("ABOUT THE DATA")
print("\n" + "="*80)

# `object` is pandas' generic dtype for anything that doesn't fit neatly into
# the numeric dtypes (here: string columns).
print("\nCategorical Features and Unique Values:")
print(df.select_dtypes(include=['object']).nunique().sort_values(ascending=False))
print()

print("Size and shape of the DataFrame:")
print(df.size, df.shape)
print()


ABOUT THE DATA


Categorical Features and Unique Values:
product             5
customer_segment    4
dtype: int64

Size and shape of the DataFrame:
170000 (10000, 17)



In [19]:
# ============================================================================
# SECTION 1: DATA QUALITY CHECK
# ============================================================================
print("\n" + "="*80)
print("SECTION 1: DATA QUALITY CHECK")
print("="*80)
print()

print("Missing Values")
print("-"*80)

missing = df.isnull().sum()
if missing.sum() == 0:
    print("No missing values found")
else:
    print("Missing values detected:")
    print(missing[missing > 0])
print()

print("Duplicate Records")
print("-"*80)
duplicates = df.duplicated().sum()
print(f"Duplicate rows: {duplicates}")
if duplicates > 0:
    print("Duplicates found!")
print()

print("Basic Statistics")
print("-"*80)
print(df.describe())
print()

# BUSINESS INSIGHT: Check for unrealistic values
print("Data Sanity Checks")
print("-"*80)
print(f"✓ Negative prices: {(df['price'] < 0).sum()}")
print(f"✓ Negative quantities: {(df['quantity_sold'] < 0).sum()}")
print(f"✓ Zero quantities: {(df['quantity_sold'] == 0).sum()}")
print(f"✓ Negative profits: {(df['profit'] < 0).sum()} ({(df['profit'] < 0).sum() / len(df) * 100:.2f}%)")
print(f"✓ Negative revenues: {(df['revenue'] < 0).sum()}")
print()

print("PRICING INSIGHT: Negative profits can be legitimate (money lost on some sales).")
print("PRICING INSIGHT: Zero quantities may mean the price was too high for that context.")
print()



SECTION 1: DATA QUALITY CHECK

Missing Values
--------------------------------------------------------------------------------
No missing values found

Duplicate Records
--------------------------------------------------------------------------------
Duplicate rows: 0

Basic Statistics
--------------------------------------------------------------------------------
                             date         price  competitor_price  \
count                       10000  10000.000000      10000.000000   
mean   2022-06-28 05:29:19.680000   7081.918057       7084.946605   
min           2021-01-01 00:00:00    255.030000        230.430000   
25%           2021-09-30 00:00:00    413.860000        410.767500   
50%           2022-06-28 00:00:00   7969.815000       7937.675000   
75%           2023-03-25 00:00:00  12765.317500      12530.507500   
max           2023-12-31 00:00:00  17243.740000      18879.000000   
std                           NaN   5991.305862       6018.017541   

       qu
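
The derived columns can also be cross-checked against their inputs. A minimal sketch, assuming revenue = price × quantity_sold, cost = unit_cost × quantity_sold, profit = revenue - cost, and margin_percent = profit / revenue × 100. These definitions are inferred from the five rows printed by df.head(), not documented anywhere in the dataset; `sample` and `checks` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# The five rows shown by df.head(), restricted to the columns involved
sample = pd.DataFrame({
    "price": [269.04, 15998.99, 7836.67, 14850.43, 13110.23],
    "quantity_sold": [70, 115, 208, 107, 78],
    "unit_cost": [114.0, 5700.0, 3040.0, 5700.0, 4560.0],
    "revenue": [18832.8, 1839883.85, 1630027.36, 1588996.01, 1022597.94],
    "cost": [7980.0, 655500.0, 632320.0, 609900.0, 355680.0],
    "profit": [10852.8, 1184383.85, 997707.36, 979096.01, 666917.94],
    "margin_percent": [57.63, 64.37, 61.21, 61.62, 65.22],
})

# Compare each derived column with its assumed definition (tolerance of one cent / 0.01 pp)
checks = pd.DataFrame({
    "revenue_ok": np.isclose(sample["revenue"], sample["price"] * sample["quantity_sold"], atol=0.01),
    "cost_ok": np.isclose(sample["cost"], sample["unit_cost"] * sample["quantity_sold"], atol=0.01),
    "profit_ok": np.isclose(sample["profit"], sample["revenue"] - sample["cost"], atol=0.01),
    "margin_ok": np.isclose(sample["margin_percent"], sample["profit"] / sample["revenue"] * 100, atol=0.01),
})

# One True/False per check: did every row pass?
print(checks.all())
```

Running the same comparisons on the full `df` (replace `sample` with `df`) would flag any row where a derived column disagrees with its inputs.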

### Next, let's calculate elasticity!
WHY? Elasticity tells us how demand responds to price changes.

Formula: E = (% change in quantity) / (% change in price)

E is normally negative, because quantity falls when price rises.

Elastic (E < -1): Customers are very price-sensitive; a lower price means more revenue
Inelastic (-1 < E < 0): Customers are less price-sensitive; we can raise prices

Rule of thumb:
- Elastic: |E| > 1 (quantity changes proportionally more than price)
- Unit elastic: |E| = 1
- Inelastic: |E| < 1 (quantity changes proportionally less than price)
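
To make the formula and the rule of thumb concrete, here is a small worked example. The numbers are hypothetical, and `point_elasticity` and `classify` are helper names made up for this illustration, not part of the notebook.

```python
def point_elasticity(price_old, price_new, qty_old, qty_new):
    """Percent change in quantity divided by percent change in price."""
    pct_price = (price_new - price_old) / price_old
    pct_qty = (qty_new - qty_old) / qty_old
    return pct_qty / pct_price


def classify(elasticity):
    """Label demand by the magnitude of the elasticity (rule of thumb above)."""
    magnitude = abs(elasticity)
    if magnitude > 1:
        return "Elastic"
    if magnitude < 1:
        return "Inelastic"
    return "Unit elastic"


# Hypothetical case: price rises 10% (100 -> 110), quantity falls 20% (50 -> 40)
e = point_elasticity(100, 110, 50, 40)
print(round(e, 2), classify(e))
```

In this example revenue falls from 100 × 50 = 5,000 to 110 × 40 = 4,400, which is what elastic demand implies for a price increase.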



In [35]:

# ============================================================================
# SECTION 2: PRICE ELASTICITY ANALYSIS
# ============================================================================
print("\n" + "="*80)
print("SECTION 2: PRICE ELASTICITY ANALYSIS")
print("="*80)

print("Price Elasticity by Product")
print("-"*80)

elasticity_results = []

for product in df['product'].unique():
    product_data = df[df['product'] == product].copy()
    
    # Create price bins (quintiles)
    product_data['price_quintile'] = pd.qcut(product_data['price'], q=5, 
                                              labels=['Very Low', 'Low', 'Medium', 'High', 'Very High'],
                                              duplicates='drop')
    
    # Calculate average quantity by price level
    # observed=True means only include groups that actually exist in the data
    price_qty = product_data.groupby('price_quintile', observed=True).agg({
        'price': 'mean',
        'quantity_sold': 'mean'
    }).sort_values('price')

    
    
    # Calculate elasticity (comparing highest vs lowest quintile)
    if len(price_qty) >= 2:
        price_low = price_qty.iloc[0]['price']
        price_high = price_qty.iloc[-1]['price']
        qty_low = price_qty.iloc[0]['quantity_sold']
        qty_high = price_qty.iloc[-1]['quantity_sold']
        
        # Percent changes, measured relative to the lowest price quintile
        price_change_pct = (price_high - price_low) / price_low
        qty_change_pct = (qty_high - qty_low) / qty_low
        
        # Elasticity
        if price_change_pct != 0:
            elasticity = qty_change_pct / price_change_pct
            
            elasticity_results.append({
                'Product': product,
                'Elasticity': elasticity,
                'Interpretation': 'Elastic' if elasticity < -1 else 'Inelastic',
                'Avg_Price': product_data['price'].mean(),
                'Price_StdDev': product_data['price'].std()
            })
            
            print(f"{product:15} → Elasticity: {elasticity:6.2f} ({elasticity_results[-1]['Interpretation']:10})")


print()
print("PRICING INSIGHT:")
elasticity_df = pd.DataFrame(elasticity_results).sort_values('Elasticity')
most_elastic = elasticity_df.iloc[0]
least_elastic = elasticity_df.iloc[-1]
print(f"  Most elastic (price-sensitive): {most_elastic['Product']} ({most_elastic['Elasticity']:.2f})")
print(f"  → Strategy: Compete on price, focus on volume")
print(f"  Least elastic (price-insensitive): {least_elastic['Product']} ({least_elastic['Elasticity']:.2f})")
print(f"  → Strategy: Premium pricing opportunity")
print()


SECTION 2: PRICE ELASTICITY ANALYSIS
Price Elasticity by Product
--------------------------------------------------------------------------------
Pipettes        → Elasticity:  -1.39 (Elastic   )
Microscope      → Elasticity:  -0.57 (Inelastic )
PCR_System      → Elasticity:  -0.89 (Inelastic )
Centrifuge      → Elasticity:  -0.67 (Inelastic )
Reagent_Kit     → Elasticity:  -1.13 (Elastic   )

PRICING INSIGHT:
  Most elastic (price-sensitive): Pipettes (-1.39)
  → Strategy: Compete on price, focus on volume
  Least elastic (price-insensitive): Microscope (-0.57)
  → Strategy: Premium pricing opportunity
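
The quintile comparison above uses only two points per product (lowest vs highest quintile). One common alternative is to fit a straight line to log(quantity) against log(price), whose slope is the elasticity under a constant-elasticity demand assumption. The sketch below runs on synthetic data generated with a known elasticity. The data, the -1.2 value and the variable names are all assumptions for illustration, not results from this dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic constant-elasticity demand: quantity = a * price ** elasticity * noise
true_elasticity = -1.2
price = rng.uniform(250, 350, size=2000)
noise = rng.lognormal(mean=0.0, sigma=0.05, size=price.size)
quantity = 5e4 * price ** true_elasticity * noise

# In log space the demand curve is a line and its slope is the elasticity
slope, intercept = np.polyfit(np.log(price), np.log(quantity), deg=1)
print(f"Estimated elasticity: {slope:.2f} (true value used to generate data: {true_elasticity})")
```

Applied per product (for example inside the same `for product in df['product'].unique()` loop), this would use every transaction instead of two quintile averages, though it still ignores confounders such as season or segment.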



### PROFITABILITY ANALYSIS

WHY: Identify which products/segments drive profit (focus optimization there)

In [45]:

# ============================================================================
# SECTION 3: PROFITABILITY ANALYSIS
# ============================================================================
print("\n" + "="*80)
print("SECTION 3: PROFITABILITY ANALYSIS")
print("="*80)


print("Profitability by Product")
print("-"*80)

product_profit = df.groupby('product').agg({
    'revenue': 'sum',
    'cost': 'sum',
    'profit': 'sum',
    'quantity_sold': 'sum',
    'price': 'mean'
}).round(0)

product_profit['profit_margin_%'] = (product_profit['profit'] / product_profit['revenue'] * 100).round(3)
product_profit['profit_per_unit'] = (product_profit['profit'] / product_profit['quantity_sold']).round(2)
product_profit = product_profit.sort_values('profit', ascending=False)

# Convert revenue, cost, profit to millions for readability
product_profit[['revenue', 'cost', 'profit']] = product_profit[['revenue', 'cost', 'profit']] / 1_000_000
product_profit = product_profit.round({'revenue': 2, 'cost': 2, 'profit': 2})

print("Profitability by Product (Revenue, Cost, Profit in Millions)")
print(product_profit)
print()


print("PRICING INSIGHT:")
top_profit_product = product_profit.index[0]
print(f"  Highest profit: {top_profit_product} (${product_profit.loc[top_profit_product, 'profit']:.2f}M)")
print(f"  → Priority product for optimization")
print()



SECTION 3: PROFITABILITY ANALYSIS
Profitability by Product
--------------------------------------------------------------------------------
Profitability by Product (Revenue, Cost, Profit in Millions)
             revenue     cost   profit  quantity_sold    price  \
product                                                          
Microscope   3127.87  1196.91  1930.96         209834  14976.0   
Centrifuge   2508.91   961.30  1547.62         210649  11988.0   
PCR_System   1736.82   665.58  1071.25         218749   8013.0   
Reagent_Kit    96.50    37.22    59.28         217449    449.0   
Pipettes       64.46    24.92    39.54         218394    300.0   

             profit_margin_%  profit_per_unit  
product                                        
Microscope            61.734          9202.33  
Centrifuge            61.685          7346.90  
PCR_System            61.678          4897.14  
Reagent_Kit           61.426           272.61  
Pipettes              61.343           181.07  

In [None]:


print("3.2 Profitability by Customer Segment")
print("-"*80)

segment_profit = df.groupby('customer_segment').agg({
    'revenue': 'sum',
    'profit': 'sum',
    'quantity_sold': 'sum',
    'price': 'mean'
}).round(0)

segment_profit['profit_margin_%'] = (segment_profit['profit'] / segment_profit['revenue'] * 100).round(1)
segment_profit = segment_profit.sort_values('profit', ascending=False)
print(segment_profit)
print()

print("PRICING INSIGHT:")
top_segment = segment_profit.index[0]
bottom_segment = segment_profit.index[-1]
print(f"  Highest profit segment: {top_segment}")
print(f"  Lowest profit segment: {bottom_segment}")
print(f"  → Consider different pricing strategies by segment")
print()

print("3.3 Profit Distribution Analysis")
print("-"*80)
print(f"Mean profit per transaction: ${df['profit'].mean():,.0f}")
print(f"Median profit per transaction: ${df['profit'].median():,.0f}")
print(f"Std dev: ${df['profit'].std():,.0f}")
print(f"\nProfit quartiles:")
print(df['profit'].quantile([0.25, 0.5, 0.75, 0.9, 0.95]))
print()


SECTION 3: PROFITABILITY ANALYSIS
3.1 Profitability by Product
--------------------------------------------------------------------------------
                  revenue          cost        profit  quantity_sold    price  \
product                                                                         
Microscope   3.127872e+09  1.196909e+09  1.930963e+09         209834  14976.0   
Centrifuge   2.508913e+09  9.612965e+08  1.547616e+09         210649  11988.0   
PCR_System   1.736821e+09  6.655763e+08  1.071245e+09         218749   8013.0   
Reagent_Kit  9.650323e+07  3.722476e+07  5.927847e+07         217449    449.0   
Pipettes     6.446431e+07  2.491988e+07  3.954443e+07         218394    300.0   

             profit_margin_%  profit_per_unit  
product                                        
Microscope              61.7          9202.33  
Centrifuge              61.7          7346.90  
PCR_System              61.7          4897.14  
Reagent_Kit             61.4           272.61  

============================================================================
## SECTION 4: SEASONALITY ANALYSIS
============================================================================

In [12]:


# ============================================================================
# SECTION 4: SEASONALITY ANALYSIS
# ============================================================================
print("\n" + "="*80)
print("SECTION 4: SEASONALITY ANALYSIS")
print("="*80)
print("\nWHY: Identify when we can charge premium prices (high demand periods)")
print()

print("4.1 Sales by Month")
print("-"*80)

monthly_sales = df.groupby('month').agg({
    'quantity_sold': 'sum',
    'revenue': 'sum',
    'profit': 'sum',
    'price': 'mean'
}).round(0)

monthly_sales['quantity_index'] = (monthly_sales['quantity_sold'] / monthly_sales['quantity_sold'].mean() * 100).round(0)
monthly_sales = monthly_sales.sort_values('quantity_sold', ascending=False)

print(monthly_sales)
print()

high_months = monthly_sales[monthly_sales['quantity_index'] > 110].index.tolist()
low_months = monthly_sales[monthly_sales['quantity_index'] < 90].index.tolist()

print("PRICING INSIGHT:")
print(f"  High season months: {high_months}")
print(f"  → Opportunity to raise prices (demand is high)")
print(f"  Low season months: {low_months}")
print(f"  → May need competitive pricing or promotions")
print()

print("4.2 Quarter Performance")
print("-"*80)

quarterly = df.groupby('quarter').agg({
    'profit': ['sum', 'mean'],
    'quantity_sold': 'sum',
    'price': 'mean'
}).round(0)

print(quarterly)
print()



SECTION 4: SEASONALITY ANALYSIS

WHY: Identify when we can charge premium prices (high demand periods)

4.1 Sales by Month
--------------------------------------------------------------------------------
       quantity_sold      revenue       profit   price  quantity_index
month                                                                 
9             118921  777229382.0  477698980.0  6666.0           133.0
10            118250  820436199.0  506773635.0  7028.0           132.0
1             102287  745840851.0  458559556.0  7273.0           114.0
5             100257  731842778.0  451716303.0  7348.0           112.0
12             91056  638116631.0  393570929.0  7104.0           102.0
11             89154  628244870.0  388812298.0  7033.0           100.0
3              87637  616534883.0  382022139.0  7115.0            98.0
4              86379  613619934.0  380099629.0  7214.0            96.0
2              82245  569738703.0  351397730.0  6915.0            92.0
8             
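
The monthly index above pools all three years. A quick way to see whether the same months are strong every year is to compute the index per year. This is a sketch on synthetic daily data with a built-in September/October lift. `toy`, `by_year` and `index_by_year` are illustrative names, and the 30% lift is an assumption, not a finding.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic daily demand over the same date range as the dataset
toy = pd.DataFrame({"date": pd.date_range("2021-01-01", "2023-12-31", freq="D")})
lift = np.where(toy["date"].dt.month.isin([9, 10]), 1.3, 1.0)  # hypothetical seasonal lift
toy["quantity_sold"] = rng.poisson(100 * lift)

toy["year"] = toy["date"].dt.year
toy["month"] = toy["date"].dt.month

# Rows = month, columns = year, values = total quantity
by_year = toy.pivot_table(index="month", columns="year", values="quantity_sold", aggfunc="sum")

# Index each year against its own average month (100 = average month of that year)
index_by_year = (by_year / by_year.mean() * 100).round(0)
print(index_by_year)
```

Because the index is built from monthly sums, shorter months such as February score slightly lower even with flat daily demand; using `aggfunc="mean"` would remove that effect.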

In [14]:

# ============================================================================
# SECTION 5: COMPETITIVE DYNAMICS
# ============================================================================
print("\n" + "="*80)
print("SECTION 5: COMPETITIVE DYNAMICS")
print("="*80)
print("\nWHY: Understand how competitor pricing affects our performance")
print()

print("5.1 Price Positioning vs Competitors")
print("-"*80)

df['price_difference'] = df['price'] - df['competitor_price']
df['price_position'] = pd.cut(df['price_difference'], 
                               bins=[-np.inf, -100, 100, np.inf],
                               labels=['Below Competitor', 'At Parity', 'Above Competitor'])

position_analysis = df.groupby('price_position', observed=True).agg({
    'quantity_sold': 'mean',
    'profit': 'mean',
    'revenue': 'mean'
}).round(0)

print(position_analysis)
print()

print("PRICING INSIGHT:")
best_position = position_analysis['profit'].idxmax()
print(f"  Most profitable position: {best_position}")
print(f"  → Being {best_position} doesn't always mean lower profits")
print()

print("5.2 Competitor Promotion Impact")
print("-"*80)

promo_impact = df.groupby('competitor_promotion').agg({
    'quantity_sold': 'mean',
    'profit': 'mean',
    'price': 'mean'
}).round(0)

promo_impact.index = ['No Competitor Promo', 'Competitor Promo Active']
print(promo_impact)
print()

if len(promo_impact) > 1:
    # Signed percent change: negative means we sell less while a competitor promotion is active
    qty_change = ((promo_impact.loc['Competitor Promo Active', 'quantity_sold'] - 
                   promo_impact.loc['No Competitor Promo', 'quantity_sold']) / 
                  promo_impact.loc['No Competitor Promo', 'quantity_sold'] * 100)
    
    print("PRICING INSIGHT:")
    print(f"  When competitor runs promotion, our quantity changes by {qty_change:+.1f}%")
    print("  → Need dynamic pricing response to competitive actions")
print()




SECTION 5: COMPETITIVE DYNAMICS

WHY: Understand how competitor pricing affects our performance

5.1 Price Positioning vs Competitors
--------------------------------------------------------------------------------
                  quantity_sold    profit    revenue
price_position                                      
Below Competitor          107.0  768834.0  1244159.0
At Parity                 108.0  106086.0   173693.0
Above Competitor          107.0  769692.0  1246537.0

PRICING INSIGHT:
  Most profitable position: Above Competitor
  → Being Above Competitor doesn't always mean lower profits

5.2 Competitor Promotion Impact
--------------------------------------------------------------------------------
                         quantity_sold    profit   price
No Competitor Promo              107.0  463820.0  7064.0
Competitor Promo Active          108.0  471067.0  7189.0

PRICING INSIGHT:
  When competitor runs promotion, our quantity changes by +0.9%
  → Need dynamic pricing response 
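
A caveat on 5.1: the parity band is ±$100 in absolute terms, while average prices range from about $300 (Pipettes) to about $15,000 (Microscope). Low-priced products will therefore land in "At Parity" almost by construction, which likely explains the much lower mean profit and revenue in that row. A relative gap avoids this. The sketch below uses three price pairs taken from df.head() plus one made-up Pipettes row. The ±2% band and the column names `price_gap_pct` and `price_position_rel` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "product": ["Pipettes", "Pipettes", "Microscope", "Microscope"],
    # Rows 1, 3 and 4 come from df.head(); the second Pipettes row is hypothetical
    "price": [269.04, 300.00, 15998.99, 14850.43],
    "competitor_price": [250.53, 299.00, 17402.27, 15919.49],
})

# Percent gap relative to the competitor's price
toy["price_gap_pct"] = (toy["price"] - toy["competitor_price"]) / toy["competitor_price"] * 100

# Hypothetical parity band of +/-2%
toy["price_position_rel"] = pd.cut(
    toy["price_gap_pct"],
    bins=[-np.inf, -2, 2, np.inf],
    labels=["Below Competitor", "At Parity", "Above Competitor"],
)

print(toy[["product", "price_gap_pct", "price_position_rel"]])
```

With a relative band, the first Pipettes row (about 7% above the competitor) is classified as "Above Competitor", whereas its $18.51 absolute gap falls inside the ±$100 band.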

In [15]:
# ============================================================================
# SECTION 6: PRICE-QUANTITY RELATIONSHIP
# ============================================================================
print("\n" + "="*80)
print("SECTION 6: PRICE-QUANTITY RELATIONSHIP")
print("="*80)
print("\nWHY: Validate that higher prices lead to lower quantities (demand curve)")
print()

print("6.1 Correlation Analysis")
print("-"*80)

correlations = df[['price', 'quantity_sold', 'profit', 'competitor_price', 
                    'inventory_level', 'days_since_promotion']].corr()['quantity_sold'].sort_values()

print("Correlation with Quantity Sold:")
print(correlations)
print()

print("PRICING INSIGHT:")
if correlations['price'] < 0:
    print(f"  ✓ Price is negatively correlated with quantity ({correlations['price']:.3f})")
    print(f"    → Normal demand curve behavior confirmed")
else:
    print(f"  ⚠ Unexpected positive correlation - investigate")
print()

print("6.2 Optimal Price Range by Product")
print("-"*80)
print("\nFinding price ranges that maximize profit for each product:\n")

for product in df['product'].unique()[:3]:  # First three products in order of appearance
    product_data = df[df['product'] == product].copy()
    
    # Create price bins
    product_data['price_range'] = pd.qcut(product_data['price'], q=5, duplicates='drop')
    
    # Calculate average profit by price range
    profit_by_price = product_data.groupby('price_range', observed=True).agg({
        'profit': 'mean',
        'quantity_sold': 'mean',
        'price': 'mean'
    }).round(0)
    
    optimal_range = profit_by_price['profit'].idxmax()
    optimal_profit = profit_by_price.loc[optimal_range, 'profit']
    optimal_price = profit_by_price.loc[optimal_range, 'price']
    
    print(f"{product}:")
    print(f"  Optimal price range: {optimal_range}")
    print(f"  Avg price in range: ${optimal_price:,.0f}")
    print(f"  Avg profit: ${optimal_profit:,.0f}")
    print()




SECTION 6: PRICE-QUANTITY RELATIONSHIP

WHY: Validate that higher prices lead to lower quantities (demand curve)

6.1 Correlation Analysis
--------------------------------------------------------------------------------
Correlation with Quantity Sold:
competitor_price       -0.033201
price                  -0.032566
days_since_promotion   -0.003659
inventory_level         0.014449
profit                  0.316287
quantity_sold           1.000000
Name: quantity_sold, dtype: float64

PRICING INSIGHT:
  ✓ Price is negatively correlated with quantity (-0.033)
    → Normal demand curve behavior confirmed

6.2 Optimal Price Range by Product
--------------------------------------------------------------------------------

Finding price ranges that maximize profit for each product:

Pipettes:
  Optimal price range: (255.029, 272.504]
  Avg price in range: $264
  Avg profit: $20,341

Microscope:
  Optimal price range: (16377.228, 17243.74]
  Avg price in range: $16,815
  Avg profit: $1,083,996

In [16]:
# ============================================================================
# SECTION 7: INVENTORY IMPACT
# ============================================================================
print("\n" + "="*80)
print("SECTION 7: INVENTORY IMPACT ON PRICING")
print("="*80)
print("\nWHY: High inventory creates pressure to lower prices")
print()

df['inventory_category'] = pd.qcut(df['inventory_level'], q=3, labels=['Low', 'Medium', 'High'])

inventory_impact = df.groupby('inventory_category', observed=True).agg({
    'price': 'mean',
    'quantity_sold': 'mean',
    'profit': 'mean'
}).round(0)

print(inventory_impact)
print()

print("PRICING INSIGHT:")
print("  When inventory is high, consider:")
print("  - Promotional pricing to move stock")
print("  - Bundle deals")
print("  - Targeted discounts to high-volume customers")
print()




SECTION 7: INVENTORY IMPACT ON PRICING

WHY: High inventory creates pressure to lower prices

                     price  quantity_sold    profit
inventory_category                                 
Low                 7143.0          107.0  465966.0
Medium              6918.0          107.0  455405.0
High                7185.0          109.0  473250.0

PRICING INSIGHT:
  When inventory is high, consider:
  - Promotional pricing to move stock
  - Bundle deals
  - Targeted discounts to high-volume customers



In [17]:
# ============================================================================
# SUMMARY: KEY FINDINGS FOR MODEL
# ============================================================================
print("\n" + "="*80)
print("EDA SUMMARY: KEY FINDINGS FOR MODELING")
print("="*80)
print()

print("✓ DATA QUALITY:")
print(f"  - {len(df):,} clean records")
print(f"  - No missing values")
print(f"  - {len(df['product'].unique())} products, {len(df['customer_segment'].unique())} segments")
print()

print("✓ PRICING DYNAMICS CONFIRMED:")
print("  - Clear negative price-quantity relationship (demand curve exists)")
print("  - Different elasticities by product (need product-specific models)")
print("  - Seasonality present (Q3/Q4 stronger)")
print("  - Competitor pricing matters")
print()

print("✓ OPTIMIZATION OPPORTUNITIES:")
print(f"  - Focus on high-profit products: {product_profit.index[0]}")
print(f"  - Different strategies by segment needed")
print(f"  - Seasonal pricing adjustments possible")
print(f"  - Inventory-based dynamic pricing potential")
print()

print("✓ READY FOR FEATURE ENGINEERING")
print("  Next: Create features that capture these relationships")
print()

print("="*80)
print("EDA COMPLETE - Proceed to Feature Engineering")
print("="*80)


EDA SUMMARY: KEY FINDINGS FOR MODELING

✓ DATA QUALITY:
  - 10,000 clean records
  - No missing values
  - 5 products, 4 segments

✓ PRICING DYNAMICS CONFIRMED:
  - Clear negative price-quantity relationship (demand curve exists)
  - Different elasticities by product (need product-specific models)
  - Seasonality present (Q3/Q4 stronger)
  - Competitor pricing matters

✓ OPTIMIZATION OPPORTUNITIES:
  - Focus on high-profit products: Microscope
  - Different strategies by segment needed
  - Seasonal pricing adjustments possible
  - Inventory-based dynamic pricing potential

✓ READY FOR FEATURE ENGINEERING
  Next: Create features that capture these relationships

EDA COMPLETE - Proceed to Feature Engineering
