# üìà Quantitative Value Screening Strategy
## S&P 500 Equity Analysis Based on Gray & Carlisle (2012)

**Author:** Giuseppe Siragusa  
**Date:** 2026-01-11  
**Universe:** S&P 500 (ex-Financials)  
**Methodology:** Enterprise Value / EBIT Value Screening  
**Rebalance:** Annual (June 30)  

---

### üìö References
- Gray, W. & Carlisle, T. (2012). *Quantitative Value: A Practitioner's Guide*
- Greenblatt, J. (2005). *The Little Book That Beats the Market*
- Piotroski, J. (2000). *Value Investing: The Use of Historical Financial Statements*

---

### üéØ Objective
Identify S&P 500 stocks in the **top 10% cheapest** (by EV/EBIT) and **top 10% quality** to construct a concentrated value portfolio.

---


## üìë Table of Contents

1. [Configuration & Imports](#1-configuration--imports)
2. [Data Acquisition](#2-data-acquisition)
   - 2.1 Universe Construction
   - 2.2 Fundamental Data Extraction
3. [Step 1: Risk Filters - Margin of Safety](#3-step-1-risk-filters)
   - 3.1 Earnings Manipulation Detection
   - 3.2 Financial Distress Screening
4. [Step 2: Price Screening](#4-step-2-price-screening)
   - 4.1 EV/EBIT Calculation
   - 4.2 Value Decile Ranking
5. [Step 3: Quality Assessment](#5-step-3-quality-assessment)
   - 5.1 Franchise Power Metrics
   - 5.2 Financial Strength Metrics
   - 5.3 Composite Quality Score
6. [Portfolio Construction](#6-portfolio-construction)
   - 6.1 Value Portfolio (Top 10%)
   - 6.2 Glamour Portfolio (Bottom 10%)
7. [Analysis & Visualization](#7-analysis--visualization)
   - 7.1 Sector Analysis
   - 7.2 Valuation Distribution
   - 7.3 Quality Heatmaps
8. [Export & Reporting](#8-export--reporting)

---


# 1. Configuration & Imports

This section loads all necessary libraries and sets global parameters for the analysis.

**Key Libraries:**
- `yfinance`: Real-time financial data
- `pandas/numpy`: Data manipulation
- `matplotlib/seaborn`: Visualization
- `scipy`: Statistical analysis

**Global Settings:**
- Suppress warnings for cleaner output
- Set display options for large dataframes
- Define color schemes for charts


In [16]:
# Standard library imports
import warnings
from datetime import datetime
from io import StringIO

# Data manipulation
import pandas as pd
import numpy as np

# Financial data
import yfinance as yf
import requests

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical tools
from scipy import stats

# Display settings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

print("‚úì All libraries loaded successfully")
print(f"Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M')}")


‚úì All libraries loaded successfully
Analysis Date: 2026-01-11 14:29


In [None]:
# === SCREENING PARAMETERS ===
ANALYSIS_DATE = datetime.now().strftime('%Y%m%d')
MIN_MARKET_CAP = 1e9  # $1B minimum
EXCLUDE_SECTORS = ['Financials']  # Per Gray & Carlisle
VALUE_PERCENTILE = 10  # Top 10% cheapest
QUALITY_CUTOFF = 50  # Top 50% quality within value decile

# === OUTPUT SETTINGS ===
EXPORT_CSV = True
EXPORT_PATH = f"./output/{ANALYSIS_DATE}/"

# === VISUALIZATION THEME ===
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("‚úì Parameters configured")

# 2. Data Acquisition

**2.1 Universe Construction:**  
- Fetch S&P 500 constituents from Wikipedia
- Filter by market cap > $1B
- Exclude Financials sector

**2.2 Fundamental Data Extraction:**  
- Income statements (EBIT)
- Balance sheets (Debt, Cash)
- Market data (Market Cap, Share Price)

**Expected Runtime:** 5-10 minutes for 500+ stocks  
**Progress Tracking:** Updates every 50 stocks


In [2]:
# Cell 2: Get S&P 500 tickers with company details (UPDATED)

def get_sp500_tickers():
    """Fetch current S&P 500 constituents from Wikipedia"""
    url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
    
    # Add headers to avoid 403 error
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }
    
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    
    tables = pd.read_html(StringIO(response.text))
    sp500_table = tables[0]
    
    # Extract ticker and company name from Wikipedia table
    df_constituents = sp500_table[['Symbol', 'Security', 'GICS Sector', 'GICS Sub-Industry']].copy()
    df_constituents.columns = ['Ticker', 'Company_Name', 'Sector', 'Industry']
    
    # Clean tickers (some have issues with yfinance)
    df_constituents['Ticker'] = df_constituents['Ticker'].str.replace('.', '-')
    
    return df_constituents

# Fetch S&P 500 constituents
df_sp500 = get_sp500_tickers()
sp500_tickers = df_sp500['Ticker'].tolist()

print(f"Fetched {len(sp500_tickers)} S&P 500 constituents\n")
print("="*100)
print("FETCHING MARKET CAP DATA (this may take 2-3 minutes)...")
print("="*100)

# Fetch market cap for each ticker
market_caps = []
for i, ticker in enumerate(sp500_tickers):
    if i % 100 == 0:
        print(f"Progress: {i}/{len(sp500_tickers)}...")
    
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        mkt_cap = info.get('marketCap', None)
        market_caps.append(mkt_cap)
    except:
        market_caps.append(None)

# Add market cap to dataframe
df_sp500['Market_Cap'] = market_caps

# Format market cap in billions
df_sp500['Market_Cap_B'] = df_sp500['Market_Cap'].apply(
    lambda x: f"${x/1e9:.2f}B" if pd.notna(x) else "N/A"
)

# Sort by market cap (largest first)
df_sp500_sorted = df_sp500.sort_values('Market_Cap', ascending=False, na_position='last')

# Display formatted table
df_display = df_sp500_sorted[['Ticker', 'Company_Name', 'Market_Cap_B', 'Sector', 'Industry']].copy()
df_display.columns = ['Ticker', 'Company', 'Market Cap', 'Sector', 'Industry']

print("\n" + "="*100)
print(f"S&P 500 CONSTITUENTS ({len(df_display)} stocks)")
print("="*100)
display(df_display)

# Summary statistics
print("\n" + "="*100)
print("SUMMARY STATISTICS")
print("="*100)
valid_mkt_caps = df_sp500['Market_Cap'].dropna()
print(f"Total S&P 500 Market Cap: ${valid_mkt_caps.sum()/1e12:.2f}T")
print(f"Average Market Cap: ${valid_mkt_caps.mean()/1e9:.2f}B")
print(f"Median Market Cap: ${valid_mkt_caps.median()/1e9:.2f}B")
print(f"\nSector Breakdown:")
print(df_sp500['Sector'].value_counts())


NameError: name 'requests' is not defined

In [None]:
# Quick sanity check
print(f"Total Universe: {len(df_sp500)} stocks")
print(f"\nSector Breakdown:")
print(df_sp500['Sector'].value_counts())
print(f"\nMarket Cap Range:")
print(f"  Min: ${df_sp500['Market_Cap'].min()/1e9:.2f}B")
print(f"  Max: ${df_sp500['Market_Cap'].max()/1e9:.2f}B")
print(f"  Median: ${df_sp500['Market_Cap'].median()/1e9:.2f}B")


NameError: name 'df_sp500' is not defined

# 3. Step 1: Risk Filters - Margin of Safety

> **"There is seldom just one cockroach in the kitchen."** - Warren Buffett

Before screening for value, we eliminate stocks at risk of **permanent capital loss**:

### 3.1 Earnings Manipulation Detection
- **Scaled Total Accruals (STA)**: Detects aggressive revenue recognition
- **Scaled Net Operating Assets (SNOA)**: Flags balance sheet bloat
- **Probability of Manipulation (PROBM)**: Beneish M-Score variant

**Action:** Eliminate top 5% of universe by each metric

### 3.2 Financial Distress Screening  
- **Campbell-Hilscher-Szilagyi Model**: Bankruptcy prediction
- Uses 8 variables: Profitability, Leverage, Volatility, Size, Liquidity

**Action:** Eliminate top 5% by failure probability

---

**‚ö†Ô∏è Implementation Note:**  
Currently **NOT IMPLEMENTED** due to `yfinance` data limitations. Requires:
- Cash flow statements (for accruals)
- Multi-year historical data (for trends)
- Delisting returns (for distress probabilities)

Consider upgrading to **Sharadar/Quandl** or **QuantConnect** for production use.


In [None]:
# === PLACEHOLDER: RISK SCREENING ===
# TODO: Implement when upgraded to institutional data provider
# 
# Functions to add:
# - calculate_accruals(ticker)
# - calculate_beneish_mscore(ticker) 
# - calculate_distress_probability(ticker)
#
# Estimated impact: -5% of universe, +200-300bps annual alpha

print("‚ö†Ô∏è Risk filters NOT YET IMPLEMENTED")
print("Current universe: 100% of non-financial S&P 500")
print("Recommended: Eliminate ~5% highest-risk stocks")


# 4. Step 2: Price Screening - Find the Cheapest Stocks

**Valuation Metric:** Enterprise Value / EBIT (Earnings Before Interest & Taxes)

$$
\text{EV/EBIT} = \frac{\text{Market Cap} + \text{Total Debt} - \text{Cash}}{\text{EBIT}}
$$

**Why EV/EBIT?**
- ‚úÖ Capital-structure neutral (unlike P/E)
- ‚úÖ Pre-tax (comparable across jurisdictions)
- ‚úÖ Gray & Carlisle found it **best single price metric** (1974-2011 backtest)

**Filters Applied:**
- Exclude negative EBIT (unprofitable companies)
- Exclude extreme outliers (EV/EBIT > 100x)
- Exclude Financials (leverage has different meaning)

**Output:** Ranked list from cheapest (low EV/EBIT) to most expensive


In [None]:
# Cell 4: Calculate EV/EBIT ratio and rank stocks (FIXED - Excludes Financials)
def calculate_ev_ebit_ratio(ticker):
    """
    Calculate Enterprise Value / EBIT ratio using financial statements
    EV = Market Cap + Total Debt - Cash
    EBIT = Operating Income (most recent fiscal year)
    
    Note: Financials excluded per Gray & Carlisle methodology
    """
    try:
        stock = yf.Ticker(ticker)
        
        # Get market cap from info
        info = stock.info
        market_cap = info.get('marketCap', None)
        
        if market_cap is None or market_cap <= 0:
            return None, None, None, None
        
        # Get balance sheet for debt and cash
        balance_sheet = stock.balance_sheet
        if balance_sheet is None or balance_sheet.empty:
            return None, None, None, None
        
        # Get most recent balance sheet data (first column)
        latest_bs = balance_sheet.iloc[:, 0]
        
        # Get total debt (try multiple field names)
        total_debt = None
        for debt_field in ['Total Debt', 'Long Term Debt', 'Total Liabilities']:
            if debt_field in latest_bs.index:
                total_debt = latest_bs[debt_field]
                if pd.notna(total_debt):
                    break
        
        # Get cash and cash equivalents
        cash = None
        for cash_field in ['Cash And Cash Equivalents', 'Cash', 'Cash Cash Equivalents And Short Term Investments']:
            if cash_field in latest_bs.index:
                cash = latest_bs[cash_field]
                if pd.notna(cash):
                    break
        
        # Default to 0 if not found
        if total_debt is None:
            total_debt = 0
        if cash is None:
            cash = 0
        
        # Get income statement for EBIT
        financials = stock.financials
        if financials is None or financials.empty:
            return None, None, None, None
        
        # Get most recent fiscal year data
        latest_financials = financials.iloc[:, 0]
        
        # Get EBIT / Operating Income
        ebit = None
        for ebit_field in ['Operating Income', 'EBIT', 'Operating Revenue']:
            if ebit_field in latest_financials.index:
                ebit = latest_financials[ebit_field]
                if pd.notna(ebit):
                    break
        
        if ebit is None or ebit <= 0:
            return None, None, None, None
        
        # Calculate Enterprise Value
        enterprise_value = market_cap + total_debt - cash
        
        # Calculate EV/EBIT ratio
        ev_ebit_ratio = enterprise_value / ebit
        
        # Filter out extreme outliers
        if ev_ebit_ratio < 0 or ev_ebit_ratio > 100:
            return None, None, None, None
        
        return enterprise_value, ebit, ev_ebit_ratio, market_cap
    
    except Exception as e:
        return None, None, None, None


# EXCLUDE FINANCIALS per Gray & Carlisle methodology
# High leverage is normal for financial firms and doesn't indicate distress
print("="*100)
print("EXCLUDING FINANCIAL SECTOR")
print("="*100)
print("Reason: High leverage in financials doesn't have the same meaning as non-financials")
print("Source: Gray & Carlisle (Quantitative Value) + Fama-French (1992)\n")

df_sp500_nonfinancial = df_sp500[df_sp500['Sector'] != 'Financials'].copy()
sp500_nonfinancial_tickers = df_sp500_nonfinancial['Ticker'].tolist()

print(f"Original universe: {len(sp500_tickers)} stocks")
print(f"After excluding Financials: {len(sp500_nonfinancial_tickers)} stocks")
print(f"Financials excluded: {len(sp500_tickers) - len(sp500_nonfinancial_tickers)} stocks\n")

# Calculate EV/EBIT for non-financial S&P 500 stocks
print("="*100)
print("CALCULATING EV/EBIT RATIOS (this will take 5-10 minutes)...")
print("="*100)

ev_list = []
ebit_list = []
ev_ebit_list = []
mktcap_list = []
successful_count = 0

for i, ticker in enumerate(sp500_nonfinancial_tickers):
    if i % 50 == 0:
        print(f"Progress: {i}/{len(sp500_nonfinancial_tickers)} | Valid data: {successful_count}")
    
    ev, ebit, ev_ebit, mktcap = calculate_ev_ebit_ratio(ticker)
    
    if ev_ebit is not None:
        successful_count += 1
    
    ev_list.append(ev)
    ebit_list.append(ebit)
    ev_ebit_list.append(ev_ebit)
    mktcap_list.append(mktcap)

print(f"Progress: {len(sp500_nonfinancial_tickers)}/{len(sp500_nonfinancial_tickers)} | Valid data: {successful_count}")

# Add to dataframe
df_sp500_nonfinancial['Enterprise_Value'] = ev_list
df_sp500_nonfinancial['EBIT'] = ebit_list
df_sp500_nonfinancial['EV_EBIT'] = ev_ebit_list
df_sp500_nonfinancial['Market_Cap_Updated'] = mktcap_list

# Filter out stocks with missing data
df_ranked = df_sp500_nonfinancial[df_sp500_nonfinancial['EV_EBIT'].notna()].copy()

print(f"\n{'='*100}")
print(f"RANKING COMPLETE: {len(df_ranked)} non-financial stocks with valid EV/EBIT data")
print(f"{'='*100}\n")

if len(df_ranked) == 0:
    print("ERROR: No valid data found. Showing debug info for first 5 tickers...")
    for ticker in sp500_nonfinancial_tickers[:5]:
        print(f"\n--- Debugging {ticker} ---")
        stock = yf.Ticker(ticker)
        print(f"Balance Sheet columns: {stock.balance_sheet.columns.tolist() if stock.balance_sheet is not None else 'None'}")
        print(f"Financials columns: {stock.financials.columns.tolist() if stock.financials is not None else 'None'}")
        if stock.balance_sheet is not None and not stock.balance_sheet.empty:
            print(f"Balance Sheet index: {stock.balance_sheet.index.tolist()[:10]}")
        if stock.financials is not None and not stock.financials.empty:
            print(f"Financials index: {stock.financials.index.tolist()[:10]}")
else:
    # Sort by EV/EBIT (ascending = cheapest first)
    df_ranked = df_ranked.sort_values('EV_EBIT', ascending=True).reset_index(drop=True)
    
    # Add ranking and percentile
    df_ranked['Rank'] = range(1, len(df_ranked) + 1)
    df_ranked['Percentile'] = (df_ranked['Rank'] / len(df_ranked) * 100).round(1)
    
    # Format for display - PROPERLY ROUNDED
    # EV ($B): 2 decimals
    df_ranked['EV ($B)'] = df_ranked['Enterprise_Value'].apply(lambda x: f"{x/1e9:.2f}")
    
    # EBIT ($M): 0 decimals (rounded to millions)
    df_ranked['EBIT ($M)'] = df_ranked['EBIT'].apply(lambda x: f"{x/1e6:,.0f}")
    
    # EV/EBIT Ratio: 2 decimals + 'x'
    df_ranked['EV/EBIT Ratio'] = df_ranked['EV_EBIT'].apply(lambda x: f"{x:.2f}x")
    
    # Percentile: 1 decimal (already done above with .round(1))
    
    # Sector analysis
    print("SECTOR DISTRIBUTION (Non-Financials):")
    print("="*100)
    sector_counts = df_ranked['Sector'].value_counts()
    print(sector_counts)
    print()
    
    # Create clean display dataframe (remove index column, clean headers)
    df_display_top = df_ranked[['Rank', 'Ticker', 'Company_Name', 'Sector', 'EV/EBIT Ratio', 
                                  'EV ($B)', 'EBIT ($M)', 'Percentile']].head(30).copy()
    df_display_top.columns = ['Rank', 'Ticker', 'Company Name', 'Sector', 'EV/EBIT Ratio', 
                               'EV ($B)', 'EBIT ($M)', 'Percentile']
    
    df_display_bottom = df_ranked[['Rank', 'Ticker', 'Company_Name', 'Sector', 'EV/EBIT Ratio', 
                                     'EV ($B)', 'EBIT ($M)', 'Percentile']].tail(30).copy()
    df_display_bottom.columns = ['Rank', 'Ticker', 'Company Name', 'Sector', 'EV/EBIT Ratio', 
                                  'EV ($B)', 'EBIT ($M)', 'Percentile']
    
    # Display top 30 value stocks (hide default index)
    print("\nTOP 30 VALUE STOCKS (Lowest EV/EBIT - Excludes Financials):")
    print("="*100)
    display(df_display_top.style.hide(axis='index'))
    
    # Display bottom 30 glamour stocks (hide default index)
    print("\n" + "="*100)
    print("BOTTOM 30 GLAMOUR STOCKS (Highest EV/EBIT - Excludes Financials):")
    print("="*100)
    display(df_display_bottom.style.hide(axis='index'))


EXCLUDING FINANCIAL SECTOR
Reason: High leverage in financials doesn't have the same meaning as non-financials
Source: Gray & Carlisle (Quantitative Value) + Fama-French (1992)

Original universe: 503 stocks
After excluding Financials: 427 stocks
Financials excluded: 76 stocks

CALCULATING EV/EBIT RATIOS (this will take 5-10 minutes)...
Progress: 0/427 | Valid data: 0
Progress: 50/427 | Valid data: 46
Progress: 100/427 | Valid data: 94
Progress: 150/427 | Valid data: 138
Progress: 200/427 | Valid data: 186
Progress: 250/427 | Valid data: 234
Progress: 300/427 | Valid data: 282
Progress: 350/427 | Valid data: 328
Progress: 400/427 | Valid data: 375
Progress: 427/427 | Valid data: 398

RANKING COMPLETE: 396 non-financial stocks with valid EV/EBIT data

SECTOR DISTRIBUTION (Non-Financials):
Sector
Industrials               76
Information Technology    57
Health Care               56
Consumer Discretionary    45
Consumer Staples          36
Utilities                 31
Real Estate         

Rank,Ticker,Company Name,Sector,EV/EBIT Ratio,EV ($B),EBIT ($M),Percentile
1,APA,APA Corporation,Energy,4.55x,14.55,3199,0.3
2,MOH,Molina Healthcare,Health Care,4.86x,8.29,1707,0.5
3,LEN,Lennar,Consumer Discretionary,5.92x,28.73,4850,0.8
4,EOG,EOG Resources,Energy,6.70x,55.26,8253,1.0
5,PHM,PulteGroup,Consumer Discretionary,6.87x,26.72,3890,1.3
6,DVN,Devon Energy,Energy,7.50x,31.06,4143,1.5
7,HPQ,HP Inc.,Information Technology,7.52x,27.26,3624,1.8
8,TAP,Molson Coors Beverage Company,Consumer Staples,8.12x,14.74,1816,2.0
9,HAL,Halliburton,Energy,8.58x,33.8,3938,2.3
10,KHC,Kraft Heinz,Consumer Staples,8.64x,46.26,5352,2.5



BOTTOM 30 GLAMOUR STOCKS (Highest EV/EBIT - Excludes Financials):


Rank,Ticker,Company Name,Sector,EV/EBIT Ratio,EV ($B),EBIT ($M),Percentile
367,DXCM,Dexcom,Health Care,47.35x,28.41,600,92.7
368,FIX,Comfort Systems USA,Industrials,47.44x,35.41,746,92.9
369,PLD,Prologis,Real Estate,48.54x,150.37,3098,93.2
370,CPT,Camden Property Trust,Real Estate,50.76x,15.09,297,93.4
371,ADI,Analog Devices,Information Technology,51.13x,153.52,3002,93.7
372,IDXX,Idexx Laboratories,Health Care,51.20x,57.77,1128,93.9
373,PWR,Quanta Services,Industrials,51.23x,66.75,1303,94.2
374,GE,GE Aerospace,Industrials,51.28x,346.68,6761,94.4
375,BAX,Baxter International,Health Care,51.29x,22.52,439,94.7
376,BSX,Boston Scientific,Health Care,51.79x,155.48,3002,94.9


In [None]:
# === DATA QUALITY CHECKS ===
print(f"‚úì Valid EV/EBIT data: {len(df_ranked)}/{len(df_sp500_nonfinancial)} stocks")
print(f"  Excluded: {len(df_sp500_nonfinancial) - len(df_ranked)} stocks")
print(f"\nEV/EBIT Range:")
print(f"  Min: {df_ranked['EV_EBIT'].min():.2f}x")
print(f"  Max: {df_ranked['EV_EBIT'].max():.2f}x")
print(f"  Median: {df_ranked['EV_EBIT'].median():.2f}x")
print(f"  Mean: {df_ranked['EV_EBIT'].mean():.2f}x")

# Flag suspiciously cheap stocks
red_flags = df_ranked[df_ranked['EV_EBIT'] < 3]
if len(red_flags) > 0:
    print(f"\n‚ö†Ô∏è {len(red_flags)} stocks with EV/EBIT < 3x (potential value traps):")
    print(red_flags[['Ticker', 'Company_Name', 'EV/EBIT Ratio']].head(10))


# 5. Step 3: Quality Assessment - Separate Winners from Losers

> **"It's far better to buy a wonderful company at a fair price than a fair company at a wonderful price."** - Warren Buffett

Within the **value decile** (top 10% cheapest), we separate high-quality from low-quality stocks.

### 5.1 Franchise Power (Long-term Earning Ability)
- 8-year ROA (Return on Assets)
- 8-year ROC (Return on Capital)
- 8-year FCF/Assets (Free Cash Flow generation)
- Gross Margin Growth & Stability

### 5.2 Financial Strength (Current Health)  
- Profitability trends (ROA, FCF changes)
- Liquidity (Current ratio)
- Leverage (Debt/Equity trend)
- Operational efficiency (Asset turnover, margin expansion)

### 5.3 Composite Quality Score
$$
\text{Quality} = 0.5 \times \text{Franchise Power} + 0.5 \times \text{Financial Strength}
$$

**Decision Rule:** Within value decile, buy **top 50% quality** stocks only

---

**‚ö†Ô∏è Implementation Note:**  
Currently **NOT IMPLEMENTED**. Requires 8+ years of historical financials.  
Workaround: Use current-year metrics as proxy (ROE, Current Ratio, Gross Margin)


In [None]:
# === PLACEHOLDER: QUALITY SCORING ===
# TODO: Calculate quality metrics
#
# Simplified proxy for now:
# - Use current ROE (vs. 8-year ROA)
# - Use current margin (vs. 8-year stability)
# - Use current liquidity ratios

print("‚ö†Ô∏è Full quality metrics NOT YET IMPLEMENTED")
print("Current approach: All value stocks treated equally")
print("Recommended: Split value decile by quality (50/50)")


# 6. Portfolio Construction

### 6.1 Value Portfolio (Top 10% Cheapest)
- **Selection:** Stocks ranked 1-45 by EV/EBIT
- **Expected:** 40-50 stocks (depending on data availability)
- **Weighting:** Equal-weight or Market-cap weight
- **Rebalance:** Annual (June 30)

### 6.2 Glamour Portfolio (Bottom 10% Most Expensive)
- **Selection:** Stocks ranked bottom 10% by EV/EBIT  
- **Purpose:** Benchmark comparison (what NOT to buy)

### 6.3 Portfolio Characteristics
Summary statistics for each portfolio:
- Number of stocks
- Average EV/EBIT
- Sector concentration
- Market cap distribution


In [None]:
# === VALUE PORTFOLIO SUMMARY ===
print("="*80)
print("VALUE PORTFOLIO (TOP 10% CHEAPEST)")
print("="*80)
print(f"Number of Stocks: {len(df_value)}")
print(f"Avg EV/EBIT: {df_value['EV_EBIT'].mean():.2f}x")
print(f"Total Market Cap: ${df_value['Market_Cap'].sum()/1e9:.1f}B")
print(f"\nTop 5 Holdings:")
display(df_value[['Ticker', 'Company Name', 'Sector', 'EV/EBIT Ratio', 'Market Cap']].head())

# === GLAMOUR PORTFOLIO SUMMARY ===
print("\n" + "="*80)
print("GLAMOUR PORTFOLIO (BOTTOM 10% MOST EXPENSIVE)")
print("="*80)
print(f"Number of Stocks: {len(df_glamour)}")
print(f"Avg EV/EBIT: {df_glamour['EV_EBIT'].mean():.2f}x")
print(f"\nTop 5 'Overvalued' Stocks:")
display(df_glamour[['Ticker', 'Company Name', 'Sector', 'EV/EBIT Ratio']].head())


# 7. Analysis & Visualization

Visual inspection of screening results to identify patterns and potential issues.

### Charts Generated:
1. **Sector Allocation**: Value vs. Glamour vs. S&P 500
2. **Valuation Distribution**: Histogram of EV/EBIT ratios
3. **Market Cap Distribution**: Are we picking small-caps or large-caps?
4. **Quality Heatmap** (when implemented): Franchise Power vs. Financial Strength
