# Whose Voice Counts? Understanding Advocacy Group Prominence in Congressional Discourse

**Author:** Kaleb Mazurek  
**Date:** June 16, 2023  
**Converted to Python:** November 2025

---

## Abstract

In democratic systems, interest groups compete not only for policy influence but for recognition as legitimate representatives of constituencies. This study explores **advocacy group prominence**—when legislators publicly cite organizations as authoritative voices in policy debates—a form of symbolic power with implications for democratic representation and pluralism.

Drawing on institutionalized pluralism theory and theories of legislative communication, I analyze three mechanisms that may explain why politicians afford prominence to certain advocacy organizations:

1. **Strategic Communication**: Do legislators invoke groups when issues are publicly salient, using them as validators for positions?
2. **Electoral and Policy Alignment**: Do reelection incentives, seniority, and constituency connections shape which groups members elevate?
3. **Organizational Resources**: Do established organizations with lobbying capacity gain greater prominence?

Using machine learning to identify 15,000+ mentions of 500+ advocacy organizations across Congressional floor speeches (2015-2017), I estimate mixed-effects logistic regression models to test these hypotheses. 

**Key findings challenge conventional wisdom**: Medium-salience (not high-salience) policy areas predict prominence; senior legislators are *less* likely to cite groups; and while external lobbyists increase prominence, the effect is modest. These patterns suggest prominence operates through distinct channels from traditional lobbying influence, with implications for understanding whose perspectives gain visibility in democratic discourse.

This research contributes to interest group scholarship by empirically examining a previously unmeasured form of group success—symbolic recognition—and to legislative studies by revealing how members use group citations to communicate authority and values to diverse audiences.

---

## 1. Introduction: Beyond Access and Influence

### The Prominence Gap in Interest Group Research

Interest group scholarship has produced rich insights into lobbying strategies, access to policymakers, and policy influence. Yet scholars have devoted surprisingly little attention to a more public form of group power: **symbolic recognition in legislative discourse**. When a member of Congress cites an advocacy organization by name during floor debate, they signal to constituents, colleagues, and the media that this group represents a legitimate voice worthy of consideration.

This form of prominence differs from traditional measures of interest group success:

- **Access**: Private meetings, testimony opportunities, informal consultation
- **Influence**: Changing policy outcomes, shaping legislative language
- **Prominence**: Public recognition as an authoritative voice for a constituency

Prominence matters because it shapes whose perspectives are visible in democratic deliberation, independent of behind-the-scenes influence.

### Theoretical Framework: Institutionalized Pluralism

This study draws on **institutionalized pluralism** (Halpin & Fraussen, 2017), which recognizes that advocacy groups operate within institutional structures that shape their opportunities for recognition. Legislators do not respond to all groups equally; they systematically privilege certain voices based on:

1. **Strategic communication needs**: Groups serve as validators when politicians need external credibility
2. **Constituency connections**: Groups signal responsiveness to organized interests in districts/states  
3. **Policy expertise**: Organizations provide information subsidies on complex issues

### The Democratic Stakes: Whose Voice Gets Heard?

Understanding prominence patterns illuminates fundamental questions about democratic representation:

- **Pluralist theory**: Does prominence concentrate among well-resourced organizations, or do diverse groups gain recognition?
- **Descriptive representation**: Do groups representing marginalized communities achieve symbolic visibility?
- **Legislative communication**: How do politicians use group citations to signal values to multiple audiences?

As Grossmann (2012) notes, "The groups that appear most frequently in public discussions may differ from those most active in lobbying." This study empirically examines that divergence.

### Research Questions

This analysis tests three sets of hypotheses about the drivers of advocacy group prominence:

**Model A: Issue Salience and Public Attention**  
Do legislators strategically invoke groups when issues are publicly salient, or do prominence patterns follow different logics?

**Model B: Politician-Group Linkages**  
How do electoral incentives, seniority, legislative activity, and constituency alignment shape which groups members cite?

**Model C: Organizational Characteristics**  
Do established organizations with greater resources, lobbying capacity, and policy breadth achieve higher prominence?

---

## Executive Summary

**For Policymakers and Research Audiences**

This study addresses a fundamental question in democratic governance: **Who gets heard in Congress, and why?** While scholars have extensively studied which interest groups gain access to policymakers or successfully influence policy outcomes, we know surprisingly little about which advocacy organizations politicians publicly recognize as authoritative voices for their constituencies.

### Key Findings

1. **The Prominence Paradox**: Contrary to conventional wisdom, advocacy groups mentioned during highly salient policy debates are *not* more likely to be cited as authoritative voices. Instead, medium-salience policy areas show the strongest prominence effects—suggesting legislators strategically invoke groups when issues are visible but not polarizing.

2. **Seniority and Voice**: More senior legislators are *less* likely to afford prominence to interest groups, challenging assumptions that experience leads to reliance on organized interests. This may reflect senior members' independence from external validators.

3. **Money ≠ Prominence**: Organizations employing external lobbyists show increased legislative prominence, but the effect is modest. This suggests prominence operates through different channels than traditional lobbying access.

### Why This Matters for Democracy

Legislative prominence—being publicly recognized as a legitimate voice for a constituency—represents a distinct form of political power. Unlike behind-the-scenes influence, prominence shapes public discourse about whose perspectives count in policymaking. Understanding its drivers illuminates:

- **Representation gaps**: Which constituencies gain public recognition vs. which remain invisible
- **Pluralist theory**: Whether resource advantages translate to symbolic power
- **Legislative communication**: How members signal values to constituents through group citations

This analysis uses machine learning to identify over 15,000 mentions of 500+ advocacy organizations in Congressional floor speeches, revealing systematic patterns in which voices legislators elevate in democratic debate.

---

## Setup and Dependencies

In [None]:
# Install required packages (uncomment if needed)
# !pip install pandas numpy statsmodels scipy matplotlib seaborn
# !pip install patsy scikit-learn tableone openpyxl

In [None]:
# Core libraries
import pandas as pd
import numpy as np
from pathlib import Path
import warnings
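# Silence only deprecation noise; convergence warnings from model fits stay visible
warnings.filterwarnings('ignore', category=FutureWarning)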

# Statistical modeling
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.genmod.generalized_linear_model import GLM
from statsmodels.genmod import families
from scipy import stats

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Table formatting
from IPython.display import display, HTML

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.float_format', lambda x: '%.4f' % x)

# Set plotting style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('colorblind')

print("Libraries loaded successfully!")

## Configuration

In [None]:
# Configuration class for project settings
class Config:
    """Project configuration settings"""
    
    # File paths (update these for your environment)
    DATA_DIR = Path("./data")
    OUTPUT_DIR = Path("./output")
    
    # Data file
    LEVEL1_FILE = DATA_DIR / "level1.csv"
    
    # Organization to exclude
    EXCLUDED_ORG_ID = 20114287
    
    # Random seed for reproducibility
    RANDOM_SEED = 42
    
    # Reference categories for factors
    REF_CHAMBER = "House of Representatives"
    REF_PARTY = "Democrat"
    REF_SALIENCY = "low"
    REF_ABBREVCAT = "Business Interests"
    REF_MSHIP = "Association of Institutions"
    REF_TERM = "First Year"

config = Config()

# Create output directory if it doesn't exist
config.OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

---

# Data Loading and Preprocessing

This section handles all data loading and transformation steps that are common across all models.

In [None]:
def load_data(filepath: Path, excluded_org_id: int = None) -> pd.DataFrame:
    """
    Load the level1 dataset and perform initial filtering.
    
    Parameters
    ----------
    filepath : Path
        Path to the CSV file
    excluded_org_id : int, optional
        Organization ID to exclude from analysis
        
    Returns
    -------
    pd.DataFrame
        Loaded and filtered dataframe
    """
    df = pd.read_csv(filepath)
    
    if excluded_org_id is not None:
        df = df[df['level1_org_id'] != excluded_org_id].copy()
        
    print(f"Loaded {len(df):,} rows")
    return df

In [None]:
def recode_abbrevcat(df: pd.DataFrame) -> pd.DataFrame:
    """
    Recode the ABBREVCAT (organization category) variable.
    
    Original categories are renamed and collapsed into broader groupings:
    - Business Interests
    - Government Interests  
    - Non-Business Interests
    """
    df = df.copy()
    
    # Original to readable name mapping
    name_mapping = {
        "(1) Corporations": "Corporations",
        "(13) Social welfare or poor": "Social Welfare or Poor",
        "(14) State and local governments": "State and Local Governments",
        "(16) Other": "Other",
        "(2) Trade and other business associations": "Trade and Business Associations",
        "(3) Occupational associations": "Occupational Associations",
        "(4) Unions": "Unions",
        "(5) Education": "Education",
        "(6) Health": "Health",
        "(7) Public interest": "Public Interest",
        "(8) Identity groups": "Identity Groups"
    }
    
    # Apply initial renaming
    df['level1_ABBREVCAT'] = df['level1_ABBREVCAT'].map(name_mapping)
    
    # Remove Corporations (as in original R code)
    df = df[df['level1_ABBREVCAT'] != 'Corporations'].copy()
    
    # Collapse into broader categories
    collapse_mapping = {
        "Trade and Business Associations": "Business-Oriented Interests",
        "Corporations": "Business-Oriented Interests",
        "State and Local Governments": "Government Interests",
        "Unions": "Non-business/nongovernment",
        "Education": "Non-business/nongovernment",
        "Health": "Non-business/nongovernment",
        "Social Welfare or Poor": "Non-business/nongovernment",
        "Public Interest": "Non-business/nongovernment",
        "Identity Groups": "Non-business/nongovernment",
        "Occupational Associations": "Non-business/nongovernment",
        "Other": "Non-business/nongovernment"
    }
    
    df['level1_ABBREVCAT'] = df['level1_ABBREVCAT'].map(collapse_mapping)
    
    # Further collapse for final analysis
    final_mapping = {
        "Business-Oriented Interests": "Business Interests",
        "Government Interests": "Government Interests",
        "Non-business/nongovernment": "Non-Business Interests"
    }
    
    df['level1_ABBREVCAT'] = df['level1_ABBREVCAT'].map(final_mapping).fillna("Non-Business Interests")
    
    # Create binary business interest indicator
    df['business_interest'] = (df['level1_ABBREVCAT'] == 'Business Interests').astype(int)
    
    print(f"ABBREVCAT categories: {df['level1_ABBREVCAT'].value_counts().to_dict()}")
    
    return df
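
In `recode_abbrevcat`, any raw label that is absent from `name_mapping` becomes NaN under `.map()` and is ultimately filled in as "Non-Business Interests" by the final `fillna`. A minimal sketch of a pre-check that surfaces such labels before they are silently absorbed is shown here. The helper name `unmapped_values` and the label "(9) Veterans" are hypothetical and used only for illustration.

```python
import pandas as pd

def unmapped_values(series: pd.Series, mapping: dict) -> pd.Series:
    """Count non-null raw values that have no key in `mapping` (hypothetical helper)."""
    observed = series.dropna()
    return observed[~observed.isin(list(mapping))].value_counts()

# Toy input; "(9) Veterans" is a made-up label that the mapping does not cover
raw = pd.Series(["(4) Unions", "(6) Health", "(9) Veterans", "(4) Unions", None])
mapping = {"(4) Unions": "Unions", "(6) Health": "Health"}

missing = unmapped_values(raw, mapping)
print(missing.to_dict())  # {'(9) Veterans': 1}
```

Running a check like this on `df['level1_ABBREVCAT']` before the first `.map()` call would show whether the fallback category is absorbing labels that deserve their own mapping.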

In [None]:
def recode_membership_status(df: pd.DataFrame) -> pd.DataFrame:
    """
    Recode the MSHIP_STATUS11 (membership status) variable.
    
    Categories are collapsed into:
    - Association of Individuals
    - Association of Institutions
    - Other
    """
    df = df.copy()
    
    # Original to readable name mapping
    name_mapping = {
        "(1) Institution": "Institution",
        "(2) Association of individuals": "Association of Individuals",
        "(3) Association of institutions": "Association of Institutions",
        "(4) Government or association of governments": "Government or Association of Governments",
        "(5) Mixed": "Mixed",
        "(6) Other": "Other",
        "(9) Cant tell or DK": "Can't Tell"
    }
    
    df['level1_MSHIP_STATUS11'] = df['level1_MSHIP_STATUS11'].map(name_mapping)
    
    # Collapse categories
    collapse_mapping = {
        "Association of Individuals": "Association of Individuals",
        "Institution": "Association of Institutions",
        "Association of Institutions": "Association of Institutions",
        "Government or Association of Governments": "Association of Institutions",
        "Mixed": "Other",
        "Other": "Other",
        "Can't Tell": "Other"
    }
    
    df['level1_MSHIP_STATUS11'] = df['level1_MSHIP_STATUS11'].map(collapse_mapping).fillna("Other")
    
    print(f"Membership status categories: {df['level1_MSHIP_STATUS11'].value_counts().to_dict()}")
    
    return df

In [None]:
def create_term_status(df: pd.DataFrame) -> pd.DataFrame:
    """
    Create term status variable indicating position in electoral cycle.
    
    Categories:
    - First Year: Mention in first year of term
    - Year Before Term End: Mention in year before term ends
    - Other: All other years
    """
    df = df.copy()
    
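    # Extract mention year (first four characters of year_week); missing or
    # malformed values become NaN instead of raising in astype(int)
    df['mention_year'] = pd.to_numeric(
        df['level1_year_week'].astype(str).str[:4], errors='coerce'
    )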
    
    # Create indicators
    df['year_before_termEnd'] = (
        (df['mention_year'].notna()) & 
        (df['mention_year'] == (df['level1_termEndYear'] - 1))
    ).astype(int)
    
    df['first_year_term'] = (
        (df['mention_year'].notna()) & 
        (df['mention_year'] == df['level1_termBeginYear'])
    ).astype(int)
    
    # Create term_status categorical
    def assign_term_status(row):
        if pd.isna(row['first_year_term']) or pd.isna(row['year_before_termEnd']):
            return np.nan
        if row['first_year_term'] == 1 and row['year_before_termEnd'] == 0:
            return "First Year"
        elif row['first_year_term'] == 0 and row['year_before_termEnd'] == 1:
            return "Year Before Term End"
        else:
            return "Other"
    
    df['term_status'] = df.apply(assign_term_status, axis=1)
    
    print(f"Term status distribution: {df['term_status'].value_counts().to_dict()}")
    
    return df
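
The row-wise `df.apply` in `create_term_status` can be slow on a large mention file. The sketch below shows one possible vectorized equivalent using `np.select`, with the same three categories and the same rule that a mention matching both conditions (or neither) falls into "Other". The function name `term_status_vectorized` is an assumption for illustration and is not part of the original pipeline.

```python
import numpy as np
import pandas as pd

def term_status_vectorized(mention_year: pd.Series,
                           term_begin: pd.Series,
                           term_end: pd.Series) -> pd.Series:
    """Classify each mention by its position in the member's term (illustrative)."""
    first = mention_year.eq(term_begin)    # mention falls in the first year of the term
    last = mention_year.eq(term_end - 1)   # mention falls in the year before the term ends
    status = np.select(
        [first & ~last, last & ~first],
        ["First Year", "Year Before Term End"],
        default="Other",
    )
    return pd.Series(status, index=mention_year.index, name="term_status")

toy = pd.DataFrame({
    "mention_year": [2015, 2016, 2017, 2015],
    "term_begin":   [2015, 2015, 2015, 2015],
    "term_end":     [2017, 2017, 2021, 2016],
})
toy["term_status"] = term_status_vectorized(
    toy["mention_year"], toy["term_begin"], toy["term_end"]
)
print(toy["term_status"].tolist())
# ['First Year', 'Year Before Term End', 'Other', 'Other']
```

The last toy row matches both conditions (a one-year term), so it lands in "Other", mirroring the `else` branch of `assign_term_status`.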

In [None]:
def compute_issue_area_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute issue area related features for each organization.
    
    Creates:
    - most_common_issue_area: Mode of issue areas for each org
    - unique_issue_areas: Count of distinct issue areas per org
    - issue_area_overlap: Whether mention is in org's most common area
    """
    df = df.copy()
    
    # Compute most common issue area per organization
    most_common = df.groupby('level1_org_id')['level1_issue_area'].agg(
        lambda x: x.mode().iloc[0] if len(x.mode()) > 0 else np.nan
    ).reset_index()
    most_common.columns = ['level1_org_id', 'most_common_issue_area']
    
    # Compute unique issue areas per organization
    unique_areas = df.groupby('level1_org_id')['level1_issue_area'].nunique().reset_index()
    unique_areas.columns = ['level1_org_id', 'unique_issue_areas']
    
    # Merge back to main dataframe
    df = df.merge(most_common, on='level1_org_id', how='left')
    df = df.merge(unique_areas, on='level1_org_id', how='left')
    
    # Create overlap indicator
    df['issue_area_overlap'] = (df['level1_issue_area'] == df['most_common_issue_area']).astype(int)
    
    print(f"Mean unique issue areas per org: {df['unique_issue_areas'].mean():.2f}")
    
    return df
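
To make the three issue-area features concrete, here is a small worked example on toy data. It uses `groupby(...).transform` rather than the merge-based approach above, purely to keep the sketch short; the column names `org` and `issue` are placeholders. Note the tie-breaking behaviour: `Series.mode()` returns tied values in sorted order, so `.iloc[0]` picks the smallest code when an organization has no single most common area.

```python
import pandas as pd

toy = pd.DataFrame({
    "org":   [1, 1, 1, 2, 2],
    "issue": ["100", "100", "200", "300", "400"],
})

by_org = toy.groupby("org")["issue"]
# Most common issue area per organization (ties resolved to the smallest code)
toy["most_common"] = by_org.transform(lambda s: s.mode().iloc[0])
# Number of distinct issue areas per organization
toy["n_unique"] = by_org.transform("nunique")
# 1 when the mention falls in the organization's most common area
toy["overlap"] = (toy["issue"] == toy["most_common"]).astype(int)

print(toy["most_common"].tolist())  # ['100', '100', '100', '300', '300']
print(toy["n_unique"].tolist())     # [2, 2, 2, 2, 2]
print(toy["overlap"].tolist())      # [1, 1, 0, 1, 0]
```

Organization 2 is split evenly between "300" and "400", so its "most common" area is decided by sort order alone, which may be worth keeping in mind when interpreting `issue_area_overlap`.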

In [None]:
def compute_saliency_measure(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute saliency measure based on issue area.
    
    The saliency measure is derived from Google Trends data for each
    policy area. Categories: low (1-7), medium (8-14), high (15-21)
    """
    df = df.copy()
    
    # Issue area codes (from original R code)
    issue_area_codes = [
        "100", "200", "300", "400", "500", "600", "700", "800", "900",
        "1000", "1200", "1300", "1400", "1500", "1600", "1700", "1800",
        "1900", "2000", "2100", "2300"
    ]
    
    def get_saliency(row):
        """Get saliency rank for a given issue area"""
        issue_area = str(row['level1_issue_area'])
        if issue_area in issue_area_codes:
            col_name = f'level1_{issue_area}_saliency_rank'
            if col_name in row.index:
                return row[col_name]
        return np.nan
    
    df['saliency_measure'] = df.apply(get_saliency, axis=1)
    df['saliency_measure'] = pd.to_numeric(df['saliency_measure'], errors='coerce')
    
    # Create categorical saliency (low, medium, high)
    bins = [0, 7, 14, 21]
    labels = ['low', 'medium', 'high']
    df['saliency_category'] = pd.cut(
        df['saliency_measure'], 
        bins=bins, 
        labels=labels, 
        include_lowest=True
    )
    
    print(f"Saliency distribution: {df['saliency_category'].value_counts().to_dict()}")
    
    return df
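
The binning step can be checked in isolation. The sketch below applies the same `bins` and `labels` to a handful of hand-picked ranks to confirm where the boundaries fall: with right-closed intervals, ranks 7 and 14 belong to the lower category, and a value outside 0-21 becomes missing.

```python
import pandas as pd

ranks = pd.Series([1, 7, 8, 14, 15, 21, 22])
categories = pd.cut(
    ranks,
    bins=[0, 7, 14, 21],
    labels=["low", "medium", "high"],
    include_lowest=True,
)

print(categories.astype(object).tolist()[:6])
# ['low', 'low', 'medium', 'medium', 'high', 'high']
print(categories.isna().tolist()[-1])  # True: rank 22 lies outside the last bin
```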

In [None]:
def clean_categorical_variables(df: pd.DataFrame) -> pd.DataFrame:
    """
    Clean and prepare categorical variables for modeling.
    
    - Converts empty strings to NaN
    - Sets up categorical dtypes
    """
    df = df.copy()
    
    # Replace empty strings with NaN
    string_cols = ['level1_chamber_x', 'level1_partyHistory']
    for col in string_cols:
        if col in df.columns:
            df[col] = df[col].replace('', np.nan)
    
    # Convert to categorical
    categorical_cols = [
        'level1_chamber_x', 'level1_partyHistory', 'saliency_category',
        'level1_issue_area', 'level1_ABBREVCAT', 'level1_MSHIP_STATUS11',
        'term_status'
    ]
    
    for col in categorical_cols:
        if col in df.columns:
            df[col] = pd.Categorical(df[col])
    
    return df

In [None]:
def preprocess_data(filepath: Path, excluded_org_id: int = None) -> pd.DataFrame:
    """
    Complete data preprocessing pipeline.
    
    Chains all preprocessing steps together.
    """
    print("="*60)
    print("DATA PREPROCESSING PIPELINE")
    print("="*60)
    
    # Load data
    print("\n[1/7] Loading data...")
    df = load_data(filepath, excluded_org_id)
    
    # Recode organization categories
    print("\n[2/7] Recoding organization categories...")
    df = recode_abbrevcat(df)
    
    # Recode membership status
    print("\n[3/7] Recoding membership status...")
    df = recode_membership_status(df)
    
    # Create term status
    print("\n[4/7] Creating term status...")
    df = create_term_status(df)
    
    # Compute issue area features
    print("\n[5/7] Computing issue area features...")
    df = compute_issue_area_features(df)
    
    # Compute saliency measure
    print("\n[6/7] Computing saliency measure...")
    df = compute_saliency_measure(df)
    
    # Clean categorical variables
    print("\n[7/7] Cleaning categorical variables...")
    df = clean_categorical_variables(df)
    
    print("\n" + "="*60)
    print(f"Preprocessing complete. Final dataset: {len(df):,} rows, {len(df.columns)} columns")
    print("="*60)
    
    return df

---

# Model Utilities

Helper functions for fitting and evaluating mixed-effects models.

In [None]:
def fit_mixed_model(formula: str, data: pd.DataFrame, groups: str, 
                    model_name: str = "model") -> dict:
    """
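    Fit a linear mixed-effects model with a random intercept per group.
    
    Note: statsmodels' `mixedlm` fits a Gaussian (linear) mixed model. With a
    binary outcome this is a linear probability model, not the mixed-effects
    logistic regression described in the abstract: coefficients are
    differences in probability, and the exponentiated values stored in
    `odds_ratio` are not true odds ratios. For a logistic mixed model, or for
    crossed random effects, consider using pymer4 or rpy2 to call R's lme4
    directly.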
    
    Parameters
    ----------
    formula : str
        Model formula (Patsy/R-style)
    data : pd.DataFrame
        Input data
    groups : str
        Column name for random effects grouping
    model_name : str
        Name for the model
        
    Returns
    -------
    dict
        Dictionary containing model results
    """
    try:
        # Fit mixed-effects model
        model = smf.mixedlm(formula, data, groups=data[groups])
        result = model.fit(method='lbfgs', maxiter=1000)
        
        # Calculate statistics
        llf = result.llf
        nobs = result.nobs
        k = len(result.params)
        
        aic = -2 * llf + 2 * k
        bic = -2 * llf + np.log(nobs) * k
        
        # Get coefficients and odds ratios
        params = result.params
        conf_int = result.conf_int()
        pvalues = result.pvalues
        
        # Create summary dataframe
        summary_df = pd.DataFrame({
            'term': params.index,
            'estimate': params.values,
            'std_error': result.bse.values,
            'z_value': result.tvalues.values,
            'p_value': pvalues.values,
            'odds_ratio': np.exp(params.values),
            'ci_lower': conf_int.iloc[:, 0].values,
            'ci_upper': conf_int.iloc[:, 1].values
        })
        summary_df['model'] = model_name
        
        return {
            'model': result,
            'name': model_name,
            'formula': formula,
            'llf': llf,
            'aic': aic,
            'bic': bic,
            'nobs': nobs,
            'summary_df': summary_df,
            'converged': result.converged
        }
        
    except Exception as e:
        print(f"Error fitting model {model_name}: {e}")
        return None
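
For readers who want a random-intercept logistic model without leaving Python, statsmodels ships a Bayesian alternative, `BinomialBayesMixedGLM`. The following is a minimal sketch on simulated data, not the estimator used in the original analysis: the toy variables `y`, `x`, and `group` are assumptions, and the variational fit (`fit_vb`) returns approximate posterior means and standard deviations rather than maximum-likelihood estimates.

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Simulate a binary outcome with a group-level random intercept (toy data)
rng = np.random.default_rng(0)
n_groups, n_per_group = 30, 20
group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(size=group.size)
u = rng.normal(scale=0.8, size=n_groups)        # one random intercept per group
eta = -0.2 + 0.5 * x + u[group]                 # linear predictor on the log-odds scale
toy = pd.DataFrame({
    "y": rng.binomial(1, 1 / (1 + np.exp(-eta))),
    "x": x,
    "group": group,
})

# Fixed effects come from the formula; the dict defines the random intercept
model = BinomialBayesMixedGLM.from_formula("y ~ x", {"group": "0 + C(group)"}, toy)
fit = model.fit_vb()                            # variational Bayes approximation

fixed = pd.DataFrame(
    {
        "log_odds": fit.fe_mean,                # posterior means of fixed effects
        "sd": fit.fe_sd,                        # posterior standard deviations
        "odds_ratio": np.exp(fit.fe_mean),
    },
    index=["Intercept", "x"],
)
print(fixed)
```

In the notebook's setting the grouping column would presumably be an organization or legislator identifier, but that choice is not specified here and should follow the original model specification.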

In [None]:
def fit_glm_logistic(formula: str, data: pd.DataFrame, model_name: str = "model") -> dict:
    """
    Fit a standard logistic regression model (without random effects).
    
    This is a simpler alternative when mixed-effects models fail to converge
    or when random effects are not essential.
    
    Parameters
    ----------
    formula : str
        Model formula
    data : pd.DataFrame
        Input data
    model_name : str
        Name for the model
        
    Returns
    -------
    dict
        Dictionary containing model results
    """
    try:
        # Fit GLM with binomial family (logistic regression)
        model = smf.glm(formula, data, family=sm.families.Binomial())
        result = model.fit()
        
        # Get coefficients and statistics
        params = result.params
        conf_int = result.conf_int()
        
        # Create summary dataframe
        summary_df = pd.DataFrame({
            'term': params.index,
            'estimate': params.values,
            'std_error': result.bse.values,
            'z_value': result.tvalues.values,
            'p_value': result.pvalues.values,
            'odds_ratio': np.exp(params.values),
            'ci_lower': conf_int.iloc[:, 0].values,
            'ci_upper': conf_int.iloc[:, 1].values
        })
        summary_df['model'] = model_name
        
        return {
            'model': result,
            'name': model_name,
            'formula': formula,
            'llf': result.llf,
            'aic': result.aic,
            'bic': result.bic_llf,  # likelihood-based BIC, on the same scale as the AIC
            'nobs': result.nobs,
            'summary_df': summary_df,
            'converged': result.converged
        }
        
    except Exception as e:
        print(f"Error fitting model {model_name}: {e}")
        return None
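
One detail of the summary tables above is easy to miss: `odds_ratio` is exponentiated, but `ci_lower` and `ci_upper` are left on the log-odds scale (the coefficient plot exponentiates them later). The conversion itself is simple arithmetic, sketched here with the standard library only. The function name `odds_ratio_ci` and the example numbers are illustrative assumptions.

```python
import math

def odds_ratio_ci(estimate: float, std_error: float, z: float = 1.96):
    """Convert a log-odds estimate and its standard error to an odds ratio with a Wald CI."""
    lower = estimate - z * std_error   # interval is built on the log-odds scale...
    upper = estimate + z * std_error
    # ...and only then exponentiated, so it is asymmetric around the odds ratio
    return math.exp(estimate), math.exp(lower), math.exp(upper)

# Illustrative values: coefficient 0.5 with standard error 0.1
or_, lo, hi = odds_ratio_ci(0.5, 0.1)
print(f"OR = {or_:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes 1 on the odds-ratio scale corresponds to one that excludes 0 on the log-odds scale.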

In [None]:
def print_model_summary(result: dict):
    """
    Print a formatted summary of model results.
    """
    if result is None:
        print("Model fitting failed.")
        return
    
    print("\n" + "="*70)
    print(f"MODEL: {result['name']}")
    print("="*70)
    
    print(f"\nFormula: {result['formula']}")
    print(f"\nModel Fit Statistics:")
    print(f"  Log-Likelihood: {result['llf']:.2f}")
    print(f"  AIC: {result['aic']:.2f}")
    print(f"  BIC: {result['bic']:.2f}")
    print(f"  N observations: {int(result['nobs']):,}")
    print(f"  Converged: {result['converged']}")
    
    print("\nFixed Effects (Odds Ratios):")
    print("-"*70)
    
    df = result['summary_df'].copy()
    df['odds_ratio'] = df['odds_ratio'].round(4)
    df['p_value'] = df['p_value'].round(4)
    df['significance'] = df['p_value'].apply(
        lambda p: '***' if p < 0.001 else '**' if p < 0.01 else '*' if p < 0.05 else '.' if p < 0.1 else ''
    )
    
    for _, row in df.iterrows():
        print(f"  {row['term']:<40} OR: {row['odds_ratio']:>8.4f}  p: {row['p_value']:>7.4f} {row['significance']}")
    
    print("\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1")

---

# 2. Model A: Public Salience and Strategic Voice Amplification

## The Strategic Communication Hypothesis

When do legislators invoke advocacy organizations as authoritative voices? Conventional wisdom suggests politicians cite groups most frequently when issues are highly salient—using external validators to bolster positions on controversial topics. However, an alternative theory suggests a more nuanced pattern: Members may strategically avoid citing groups during polarized, high-salience debates to maintain flexibility, instead invoking them when issues have moderate public attention.

### Research Question

**Are interest groups mentioned in policy areas of high salience more likely to receive prominent citations (recognition as authoritative voices) from legislators?**

### Theoretical Expectations

**H1a (Linear Salience)**: Groups mentioned in high-salience policy areas have a higher probability of prominent mention, as members seek external credibility on visible issues.

**H1b (Curvilinear Salience)**: Groups mentioned in medium-salience policy areas have the highest probability of prominence. High-salience debates are too polarized for group citations to provide cover; low-salience issues don't require external validation.

### Why This Matters for Democratic Representation

The relationship between issue salience and group prominence reveals whether:

- **Public attention drives elite discourse**: Do legislators amplify group voices when citizens are watching?
- **Strategic filtering occurs**: Do members selectively invoke groups to manage position-taking?
- **Visibility gaps emerge**: Which policy domains receive organized interest representation in public debate?

If prominence concentrates in low-salience areas, groups may be invisible precisely when public attention is highest—raising questions about democratic accountability.

### Measurement

- **Dependent Variable**: `level1_prominence` (1 = prominent mention, 0 = passing mention)
- **Key Independent Variable**: `saliency_category` (low/medium/high) based on Google Trends data for policy areas
- **Controls**: Chamber, party, organization type, membership structure
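
Written out, the measurement choices above correspond to a model of roughly the following form. This is a sketch under stated assumptions: low salience is the reference category (as in `Config.REF_SALIENCY`), $\mathbf{x}_{ij}$ collects the controls, and the random intercept $u_j$ is assumed to be grouped by organization $j$, which the text does not specify.

$$
\operatorname{logit}\Pr(\text{prominence}_{ij} = 1) = \beta_0 + \beta_1\,\text{medium}_{ij} + \beta_2\,\text{high}_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + u_j, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
$$

Under this parameterization, H1a predicts $\beta_2 > 0$ with $\beta_2 \ge \beta_1$, while H1b predicts $\beta_1 > 0$ and $\beta_1 > \beta_2$. The corresponding odds ratios are $e^{\beta_1}$ and $e^{\beta_2}$, each relative to low-salience policy areas.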

---

In [None]:
def create_coefficient_plot(models: list, figsize=(12, 8)):
    """
    Create a coefficient plot (forest plot) for model comparison.
    """
    fig, ax = plt.subplots(figsize=figsize)
    
    colors = plt.cm.Set2(np.linspace(0, 1, len(models)))
    y_offset = 0
    y_positions = []
    y_labels = []
    
    for i, m in enumerate(models):
        if m is None:
            continue
            
        df = m['summary_df']
        # Exclude intercept and random effects for visualization
        df = df[~df['term'].str.contains('Intercept|Group Var', case=False)]
        
        for j, (_, row) in enumerate(df.iterrows()):
            y_pos = y_offset + j
            
            # Plot point estimate and confidence interval
            ax.errorbar(
                row['odds_ratio'], y_pos,
                xerr=[[row['odds_ratio'] - np.exp(row['ci_lower'])],
                      [np.exp(row['ci_upper']) - row['odds_ratio']]],
                fmt='o', color=colors[i], capsize=3,
                label=m['name'] if j == 0 else None
            )
            
            y_positions.append(y_pos)
            y_labels.append(f"{row['term']} ({m['name']})")
        
        y_offset += len(df) + 1
    
    # Add reference line at OR = 1
    ax.axvline(x=1, color='red', linestyle='--', alpha=0.5, label='OR = 1')
    
    ax.set_yticks(y_positions)
    ax.set_yticklabels(y_labels, fontsize=8)
    ax.set_xlabel('Odds Ratio')
    ax.set_title('Coefficient Plot (Odds Ratios with 95% CI)')
    ax.legend(loc='best')
    
    plt.tight_layout()
    return fig

---

# Model A: Issue Salience Hypothesis

**Research Question:** Are interest groups mentioned in policy areas of high salience more likely to be prominent?

**Hypothesis:** Interest groups mentioned in policy areas of high salience are more likely to be prominent.

In [None]:
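# Note: This cell would normally load the real data via preprocess_data().
# The data file is not available here, so a synthetic example is created instead.
# Its outcome is drawn independently of every predictor, so the models below
# demonstrate the workflow only and will not reproduce the reported findings.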

def create_synthetic_data(n=10000, seed=42):
    """
    Create synthetic data for demonstration purposes.
    
    This mimics the structure of the original level1.csv file.
    """
    np.random.seed(seed)
    
    # Generate synthetic data
    n_orgs = 500
    n_issues = 21
    
    data = {
        'level1_org_id': np.random.randint(1, n_orgs + 1, n),
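        # Same 21 issue-area codes as compute_saliency_measure (no 1100 or 2200; 2300 included)
        'level1_issue_area': np.random.choice(
            [str(c) for c in [*range(100, 1100, 100), *range(1200, 2200, 100), 2300]], n
        ),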
        'level1_prominence': np.random.binomial(1, 0.45, n),
        'level1_chamber_x': np.random.choice(['House of Representatives', 'Senate'], n, p=[0.6, 0.4]),
        'level1_partyHistory': np.random.choice(['Democrat', 'Republican', 'Independent'], n, p=[0.45, 0.45, 0.1]),
        'level1_ABBREVCAT': np.random.choice(
            ['Business Interests', 'Non-Business Interests', 'Government Interests'], 
            n, p=[0.3, 0.5, 0.2]
        ),
        'level1_MSHIP_STATUS11': np.random.choice(
            ['Association of Individuals', 'Association of Institutions', 'Other'],
            n, p=[0.3, 0.5, 0.2]
        ),
        'level1_seniority': np.random.randint(1, 40, n),
        'level1_bills_sponsored': np.random.randint(0, 100, n),
        'level1_YEARS_EXISTED': np.random.randint(1, 100, n),
        'level1_OUTSIDE11': np.random.binomial(1, 0.3, n),
        'level1_issue_maximal_overlap': np.random.binomial(1, 0.4, n),
        'term_status': np.random.choice(['First Year', 'Year Before Term End', 'Other'], n, p=[0.2, 0.2, 0.6]),
        'saliency_category': np.random.choice(['low', 'medium', 'high'], n, p=[0.33, 0.34, 0.33]),
        'unique_issue_areas': np.random.randint(1, 10, n)
    }
    
    df = pd.DataFrame(data)
    
    # Convert to categorical
    categorical_cols = ['level1_chamber_x', 'level1_partyHistory', 'level1_ABBREVCAT',
                        'level1_MSHIP_STATUS11', 'term_status', 'saliency_category']
    for col in categorical_cols:
        df[col] = pd.Categorical(df[col])
    
    return df

# Create synthetic data for demonstration
print("Creating synthetic data for demonstration...")
level1 = create_synthetic_data(n=15000)
print(f"Created dataset with {len(level1):,} observations")
print(f"\nColumn dtypes:\n{level1.dtypes}")

In [None]:
# Examine the data
print("\nData Summary:")
print("="*60)
print(f"\nProminence distribution:")
print(level1['level1_prominence'].value_counts(normalize=True))

print(f"\nSaliency category distribution:")
print(level1['saliency_category'].value_counts())

print(f"\nChamber distribution:")
print(level1['level1_chamber_x'].value_counts())

print(f"\nParty distribution:")
print(level1['level1_partyHistory'].value_counts())

### Model A - Empty Model (Intercept Only)

In [None]:
# Fit empty model (intercept only)
empty_model_a = fit_glm_logistic(
    formula="level1_prominence ~ 1",
    data=level1,
    model_name="Empty Model A"
)

print_model_summary(empty_model_a)
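
As a sanity check on the empty model: in an intercept-only logistic regression, the maximum-likelihood intercept equals the log-odds of the sample proportion of prominent mentions. The sketch below is not part of the original analysis. It assumes the 0.45 base rate used by the synthetic generator and uses illustrative helper names (`logit`, `inv_logit`).

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Assumption: the synthetic generator draws prominence with p = 0.45,
# so the observed share of prominent mentions should be close to this value.
base_rate = 0.45
expected_intercept = logit(base_rate)

print(f"Expected intercept (log-odds): {expected_intercept:.3f}")
print(f"Back-transformed probability:  {inv_logit(expected_intercept):.3f}")
```

If the fitted intercept differs noticeably from this value on the synthetic data, that would point to a problem in the data preparation rather than in the model.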

### Model A1 - Saliency Category Only

In [None]:
# Model 1: Saliency category
model_a1 = fit_glm_logistic(
    formula="level1_prominence ~ C(saliency_category, Treatment('low'))",
    data=level1,
    model_name="Model A1 (Saliency)"
)

print_model_summary(model_a1)

### Model A2 - Full Model with Controls

In [None]:
# Model 2: Full model with controls
model_a2 = fit_glm_logistic(
    formula="""level1_prominence ~ 
        C(saliency_category, Treatment('low')) + 
        C(level1_chamber_x, Treatment('House of Representatives')) + 
        C(level1_partyHistory, Treatment('Democrat')) + 
        C(level1_MSHIP_STATUS11, Treatment('Association of Institutions')) + 
        C(level1_ABBREVCAT, Treatment('Business Interests'))""",
    data=level1.dropna(),
    model_name="Model A2 (Full)"
)

print_model_summary(model_a2)

---

# 3. Model B: Legislative Signaling and Electoral Positioning

## The Politician-Group Linkage Hypothesis

Why do some legislators frequently cite advocacy groups while others rarely do? Beyond issue salience, individual member characteristics—electoral vulnerability, seniority, legislative activity, and constituency connections—may shape prominence-granting behavior. This model tests whether politicians strategically use group citations to signal responsiveness, expertise, or ideological alignment.

### Research Question

**How do politician characteristics (reelection incentives, policy alignment, seniority, legislative activity) affect the likelihood of affording prominence to interest groups?**

### Theoretical Expectations

Drawing on theories of legislative behavior and position-taking (Mayhew, 1974), we expect:

**H2a (Electoral Cycle)**: Members in their first year or year before term end are more likely to cite groups prominently, using them to signal constituency responsiveness or build credibility for reelection.

**H2b (Policy Alignment)**: When a group's primary issue area overlaps with the member's legislative focus (committee assignments, bill sponsorship), prominence increases—members cite groups in their domains of expertise.

**H2c (Seniority)**: Senior members are more likely to cite groups, drawing on established relationships and networks built over time.

**H2d (Legislative Activity)**: More active legislators (measured by bills sponsored) cite groups more frequently, using them to build coalitions and justify policy positions.

### Why This Matters for Representation

Understanding politician-group linkages reveals:

- **Electoral accountability**: Do members invoke groups strategically around elections, or is prominence independent of electoral cycles?
- **Expertise signaling**: Do legislators cite groups to demonstrate policy mastery, or are citations disconnected from substantive specialization?
- **Institutional power dynamics**: Do senior members monopolize group citations, or do junior members also gain symbolic capital through prominence-granting?

If prominence concentrates among electorally secure, senior members in specific policy domains, it may reinforce existing power structures rather than democratizing voice.

### Measurement

- **Dependent Variable**: `level1_prominence` (1 = prominent mention, 0 = passing mention)
- **Key Independent Variables**:
  - `term_status`: First year, year before term end, or other
  - `level1_issue_maximal_overlap`: Whether group's primary issue matches member's focus
  - `level1_seniority`: Years in Congress
  - `level1_bills_sponsored`: Legislative activity level
- **Controls**: Chamber, party, organization type, membership structure

---

### Model A - Comparison Table

In [None]:
# Compare models
model_a_comparison = compare_models([empty_model_a, model_a1, model_a2])
print("\nModel A Comparison:")
print("="*70)
display(model_a_comparison.round(2))
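
The comparison table reports information criteria, so it may help to recall how they follow from the log-likelihood. The sketch below uses the standard definitions (AIC = 2k - 2LL, BIC = k ln(n) - 2LL) and the likelihood-ratio statistic for nested models. The log-likelihood values are hypothetical and chosen only for illustration. How `compare_models` computes its columns internally is not shown here, so this is an assumption about its output rather than a description of it.

```python
import math

def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion; penalizes parameters by ln(n)."""
    return k * math.log(n) - 2 * log_lik

def lr_statistic(ll_restricted: float, ll_full: float) -> float:
    """Likelihood-ratio statistic for a restricted model nested in a full model."""
    return 2 * (ll_full - ll_restricted)

# Hypothetical values for illustration only (not results from this study)
n_obs = 15000
ll_empty, k_empty = -10320.0, 1        # intercept only
ll_saliency, k_saliency = -10318.5, 3  # intercept + two saliency dummies

for name, ll, k in [("Empty", ll_empty, k_empty), ("Saliency", ll_saliency, k_saliency)]:
    print(f"{name:<10} AIC={aic(ll, k):.1f}  BIC={bic(ll, k, n_obs):.1f}")

print(f"LR statistic: {lr_statistic(ll_empty, ll_saliency):.2f} "
      f"on {k_saliency - k_empty} degrees of freedom")
```

These criteria are only comparable across models fit to the same observations. The empty models above use `level1`, while the fuller models use `level1.dropna()`, so the comparison is valid only if no rows are dropped.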

---

# Model B: Politician-Interest Group Linkage

**Research Question:** How do politician characteristics (reelection incentives, policy alignment, seniority, legislative activity) affect prominence affordance?

**Hypothesis:** The degree to which a politician affords prominence to an interest group is influenced by reelection incentives, policy alignment with the group, the group's significance to their constituents, seniority, and legislative activity.

### Model B - Empty Model

In [None]:
# Empty model for Model B
empty_model_b = fit_glm_logistic(
    formula="level1_prominence ~ 1",
    data=level1,
    model_name="Empty Model B"
)

print_model_summary(empty_model_b)

### Model B1 - Politician Characteristics

In [None]:
# Model B1: Politician characteristics
model_b1 = fit_glm_logistic(
    formula="""level1_prominence ~ 
        level1_issue_maximal_overlap + 
        C(term_status, Treatment('First Year')) + 
        level1_bills_sponsored + 
        level1_seniority""",
    data=level1.dropna(),
    model_name="Model B1 (Politician Chars)"
)

print_model_summary(model_b1)

### Model B2 - Full Model with Controls

---

# 4. Model C: Organizational Resources and the Pluralist Question

## The Resource Mobilization Hypothesis

Does legislative prominence follow the same patterns as other forms of interest group success? Resource-based accounts predict that well-resourced organizations (those with longevity, financial capacity for lobbying, and broad policy agendas) dominate political discourse, whereas classic pluralist theory expects visibility to be dispersed across competing groups. Critics of pluralism argue that resource advantages create a systematic bias toward establishment interests, marginalizing newer or resource-poor groups.

This model tests whether organizational characteristics predict symbolic recognition in legislative debate.

### Research Question

**How do organizational attributes (age, lobbying capacity, policy breadth) predict the likelihood of receiving prominent mentions from legislators?**

### Theoretical Expectations

**H3a (Organizational Maturity)**: Older organizations have higher probability of prominence. Established groups have brand recognition, credibility, and long-standing relationships with legislators.

**H3b (Policy Breadth)**: Organizations with broader policy agendas (active in more issue areas) have higher prominence. Generalist groups are more likely to be relevant across legislative debates.

**H3c (Lobbying Capacity—Null Hypothesis)**: Use of external lobbyists does NOT significantly increase prominence, because prominence operates through different channels than access-based lobbying. Symbolic recognition depends on public legitimacy, not private influence.

### Why This Matters for Pluralism and Democratic Voice

Organizational characteristics reveal whether prominence reinforces or challenges existing power structures:

**Resource bias concerns**:
- If only established, well-funded organizations gain prominence, symbolic power mirrors material advantages
- Newer social movements or grassroots organizations may be systematically excluded from legislative discourse
- The "chorus of voices" in democratic debate may be less diverse than the group population

**Pluralist vs. elite theory**:
- **Pluralist prediction**: Prominence distributed across diverse organizations regardless of resources
- **Elite/neo-pluralist prediction**: Prominence concentrates among business groups, trade associations, and established interests

**Implications for advocacy**:
- Do groups need lobbying infrastructure to gain visibility, or can they achieve prominence through other means?
- Can new organizations "break through" or does legislative discourse favor incumbents?

### Measurement

- **Dependent Variable**: `level1_prominence` (1 = prominent mention, 0 = passing mention)
- **Key Independent Variables**:
  - `level1_YEARS_EXISTED`: Organizational age (years since founding)
  - `level1_OUTSIDE11`: Use of external lobbyists (binary)
  - `unique_issue_areas`: Count of distinct policy areas where organization is active
- **Controls**: Chamber, party, organization type (business/non-business/government), membership structure

### The Democratic Stakes

If organizational resources strongly predict prominence, it suggests:
- Symbolic representation follows material power
- Public legislative discourse may be less diverse than the actual advocacy landscape
- Policy debates may systematically amplify certain voices while silencing others

Conversely, if resources have weak effects, it suggests prominence operates through legitimacy, constituency connections, or issue urgency rather than lobbying capacity.

---

In [None]:
# Model B2: Full model with controls
model_b2 = fit_glm_logistic(
    formula="""level1_prominence ~ 
        level1_issue_maximal_overlap + 
        C(term_status, Treatment('First Year')) + 
        level1_bills_sponsored + 
        level1_seniority + 
        C(level1_chamber_x, Treatment('House of Representatives')) + 
        C(level1_partyHistory, Treatment('Democrat')) + 
        C(level1_MSHIP_STATUS11, Treatment('Association of Institutions')) + 
        C(level1_ABBREVCAT, Treatment('Business Interests'))""",
    data=level1.dropna(),
    model_name="Model B2 (Full)"
)

print_model_summary(model_b2)

### Model B - Comparison Table

In [None]:
# Compare models
model_b_comparison = compare_models([empty_model_b, model_b1, model_b2])
print("\nModel B Comparison:")
print("="*70)
display(model_b_comparison.round(2))

---

# Model C: Organizational Characteristics

**Research Question:** How do organizational attributes (age, lobbying, policy breadth) predict prominence?

**Hypotheses:**
1. Older organizations have higher probability of prominent mention
2. Organizations with broader policy agendas have higher probability of prominent mention
3. Use of external lobbyists does NOT significantly increase prominence

### Model C - Empty Model

In [None]:
# Empty model for Model C
empty_model_c = fit_glm_logistic(
    formula="level1_prominence ~ 1",
    data=level1,
    model_name="Empty Model C"
)

print_model_summary(empty_model_c)

### Model C1 - Organizational Characteristics

In [None]:
# Model C1: Organizational characteristics
model_c1 = fit_glm_logistic(
    formula="""level1_prominence ~ 
        level1_YEARS_EXISTED + 
        level1_OUTSIDE11 + 
        unique_issue_areas""",
    data=level1.dropna(),
    model_name="Model C1 (Org Chars)"
)

print_model_summary(model_c1)

### Policy Implications: Model C Findings

Model C provides **crucial evidence about resource bias** in legislative discourse, with surprising implications for pluralist theory and democratic representation.

#### Key Findings

1. **Lobbying infrastructure matters**: Organizations employing external lobbyists show significantly increased odds of prominent mention (OR > 1.0), contrary to the null hypothesis. However, the effect is moderate, not dominant.

2. **Organizational age has minimal effect**: Years since founding shows weak or non-significant relationship with prominence, challenging assumptions about "brand recognition" advantages for established groups.

3. **Policy breadth shows positive trends**: Organizations active in more issue areas have slightly higher prominence, but the effect is modest.

4. **Organization type may matter**: The controls added in Model C2 suggest that business and non-business interests may differ in prominence (see the model coefficients).
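
To make "moderate effect" concrete, an odds ratio can be translated into a change in predicted probability at a given baseline. The sketch below is illustrative only. The baseline of 0.45 mirrors the synthetic prominence rate, the odds ratios are hypothetical rather than estimates from this study, and `apply_odds_ratio` is a helper name introduced here.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability implied by multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Assumption: baseline probability of a prominent mention (synthetic base rate)
baseline = 0.45

# Hypothetical odds ratios, for illustration only
for or_value in (1.0, 1.2, 1.5, 2.0):
    p = apply_odds_ratio(baseline, or_value)
    print(f"OR = {or_value:.1f}: probability {baseline:.2f} -> {p:.3f} "
          f"(change of {p - baseline:+.3f})")
```

The same odds ratio implies different probability changes at different baselines, which is one reason to report predicted probabilities alongside odds ratios.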

#### What This Means for Democratic Pluralism

**The Lobbying-Prominence Link**

Finding that external lobbyists increase prominence has important implications:

- **Integrated advocacy**: Organizations combining inside (lobbying) and outside (public discourse) strategies may be most effective
- **Resource advantages persist**: Groups that can afford professional lobbyists gain both access *and* visibility
- **But prominence ≠ capture**: The moderate effect size suggests symbolic recognition isn't simply "bought"

**Implications for lobbying reform debates**:
- Disclosure requirements could help reveal which organizations gain dual advantages (access + prominence)
- Symbolic power may be more distributed than material power, but correlation exists
- Public financing of advocacy could help level the visibility playing field

**Challenging the "Establishment Advantage" Narrative**

The weak effect of organizational age is striking:

- **New groups can break through**: Recent organizations aren't systematically excluded from prominence
- **Legitimacy over longevity**: What matters may be constituency representation, not historical presence
- **Dynamic advocacy landscape**: Legislative discourse may adapt to emerging groups faster than expected

**For emerging advocacy organizations**: You don't need decades of history to gain symbolic recognition—you need constituency legitimacy and strategic positioning.

**For policy analysis**: Measuring group influence solely through established players may miss important dynamics.

#### Pluralist Theory Implications

These findings suggest a **nuanced pluralist picture**:

**Supporting pluralism**:
- Age doesn't determine prominence (low barriers to entry)
- Effect sizes are moderate, not deterministic
- Multiple pathways to visibility exist

**Challenging pluralism**:
- Lobbying capacity provides advantages
- Professional infrastructure correlates with symbolic power
- Resource mobilization still matters

**The verdict**: Prominence is *more pluralist* than traditional influence, but *not fully pluralist*. Symbolic representation is more distributed than material power, but resource advantages persist.

#### Representation Gaps to Explore

Future research should examine:
- **Which specific organization types gain prominence?** (Business vs. labor vs. identity groups vs. public interest)
- **Do grassroots organizations achieve prominence without lobbying?** (Testing alternative pathways)
- **How does prominence correlate with actual constituency size?** (Representativeness question)

---

### Model C2 - Full Model with Controls

In [None]:
# Model C2: Full model with controls
model_c2 = fit_glm_logistic(
    formula="""level1_prominence ~ 
        level1_YEARS_EXISTED + 
        level1_OUTSIDE11 + 
        unique_issue_areas + 
        C(level1_chamber_x, Treatment('House of Representatives')) + 
        C(level1_partyHistory, Treatment('Democrat')) + 
        C(level1_MSHIP_STATUS11, Treatment('Association of Institutions')) + 
        C(level1_ABBREVCAT, Treatment('Business Interests'))""",
    data=level1.dropna(),
    model_name="Model C2 (Full)"
)

print_model_summary(model_c2)

### Model C - Comparison Table

In [None]:
# Compare models
model_c_comparison = compare_models([empty_model_c, model_c1, model_c2])
print("\nModel C Comparison:")
print("="*70)
display(model_c_comparison.round(2))

---

# Results Summary and Visualization

In [None]:
# Combine all model comparisons
all_models = pd.concat([
    model_a_comparison.assign(Model_Set='A: Saliency'),
    model_b_comparison.assign(Model_Set='B: Politician'),
    model_c_comparison.assign(Model_Set='C: Organization')
])

print("\nOverall Model Comparison:")
print("="*80)
display(all_models.round(2))

In [None]:
# Visualize model fit comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# BIC comparison
ax1 = axes[0]
for model_set in all_models['Model_Set'].unique():
    subset = all_models[all_models['Model_Set'] == model_set]
    ax1.plot(subset['Model'], subset['BIC'], marker='o', label=model_set)
ax1.set_ylabel('BIC')
ax1.set_title('BIC by Model (lower is better)')
ax1.legend()
ax1.tick_params(axis='x', rotation=45)

# AIC comparison
ax2 = axes[1]
for model_set in all_models['Model_Set'].unique():
    subset = all_models[all_models['Model_Set'] == model_set]
    ax2.plot(subset['Model'], subset['AIC'], marker='s', label=model_set)
ax2.set_ylabel('AIC')
ax2.set_title('AIC by Model (lower is better)')
ax2.legend()
ax2.tick_params(axis='x', rotation=45)

# Log-Likelihood comparison
ax3 = axes[2]
for model_set in all_models['Model_Set'].unique():
    subset = all_models[all_models['Model_Set'] == model_set]
    ax3.plot(subset['Model'], subset['Log-Likelihood'], marker='^', label=model_set)
ax3.set_ylabel('Log-Likelihood')
ax3.set_title('Log-Likelihood by Model (higher is better)')
ax3.legend()
ax3.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.savefig('model_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

In [None]:
# Create odds ratio visualization for final models
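def plot_odds_ratios(model_result, title, figsize=(10, 6)):
    """
    Create a forest plot of odds ratios from a model.

    Expects model_result['summary_df'] to provide the columns 'term',
    'odds_ratio', 'ci_lower', 'ci_upper', and 'p_value'. The CI bounds are
    exponentiated when drawing error bars, so they are assumed to be on the
    log-odds scale. Returns the Figure, or None if there is nothing to plot.
    """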
    if model_result is None:
        print("No model to plot")
        return
    
    df = model_result['summary_df'].copy()
    
    # Exclude intercept
    df = df[~df['term'].str.contains('Intercept', case=False)]
    
    if len(df) == 0:
        print("No coefficients to plot")
        return
    
    fig, ax = plt.subplots(figsize=figsize)
    
    # Sort by odds ratio
    df = df.sort_values('odds_ratio', ascending=True)
    
    y_pos = range(len(df))
    
    # Plot points and error bars
    ax.errorbar(
        df['odds_ratio'], y_pos,
        xerr=[df['odds_ratio'] - np.exp(df['ci_lower']),
              np.exp(df['ci_upper']) - df['odds_ratio']],
        fmt='o', color='steelblue', capsize=4, capthick=2, markersize=8
    )
    
    # Add reference line at OR = 1
    ax.axvline(x=1, color='red', linestyle='--', alpha=0.7, linewidth=2, label='OR = 1 (no effect)')
    
    # Formatting
    ax.set_yticks(y_pos)
    ax.set_yticklabels(df['term'])
    ax.set_xlabel('Odds Ratio (95% CI)', fontsize=12)
    ax.set_title(title, fontsize=14, fontweight='bold')
    ax.legend(loc='best')
    
    # Add significance markers (*** p < 0.001, ** p < 0.01, * p < 0.05)
    for i, (_, row) in enumerate(df.iterrows()):
        p = row['p_value']
        sig = '***' if p < 0.001 else '**' if p < 0.01 else '*' if p < 0.05 else ''
        if sig:
            ax.annotate(sig, xy=(row['odds_ratio'], i), xytext=(5, 0),
                        textcoords='offset points', fontsize=12, color='red')
    
    plt.tight_layout()
    return fig
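
The plotting function above exponentiates `ci_lower` and `ci_upper`, which suggests those columns hold bounds on the log-odds scale. The sketch below shows how such bounds and the resulting asymmetric error-bar lengths could be derived from a coefficient and its standard error using a Wald interval. The coefficient and standard error are hypothetical, and `odds_ratio_with_ci` is a helper name introduced here, not part of the notebook.

```python
import math

def odds_ratio_with_ci(coef: float, std_err: float, z: float = 1.96) -> dict:
    """Wald interval on the log-odds scale, plus error-bar lengths on the OR scale."""
    ci_lower = coef - z * std_err   # lower bound, log-odds scale
    ci_upper = coef + z * std_err   # upper bound, log-odds scale
    odds_ratio = math.exp(coef)
    return {
        "odds_ratio": odds_ratio,
        "ci_lower": ci_lower,
        "ci_upper": ci_upper,
        # Distances from the point estimate to each exponentiated bound,
        # matching the xerr=[left, right] layout used in plot_odds_ratios
        "err_left": odds_ratio - math.exp(ci_lower),
        "err_right": math.exp(ci_upper) - odds_ratio,
    }

# Hypothetical coefficient and standard error, for illustration only
row = odds_ratio_with_ci(coef=0.18, std_err=0.05)
for key, value in row.items():
    print(f"{key:<10} {value:.3f}")
```

Because the interval is symmetric on the log-odds scale, the right-hand error bar is always longer than the left-hand one once exponentiated. If `summary_df` already stores bounds on the odds-ratio scale, the extra `np.exp` in the plotting function would need to be removed.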

---

# 5. Discussion: Rethinking Interest Group Success

## Theoretical Contributions

This study introduces and empirically examines **advocacy group prominence**—public recognition as authoritative voices in legislative discourse—as a distinct dimension of interest group success. The findings challenge conventional wisdom about how groups gain influence in democratic systems and reveal that symbolic power operates through different mechanisms than material influence.

### 1. Prominence is Not Influence: A Distinct Form of Political Power

**Key insight**: The factors predicting legislative prominence diverge from those predicting lobbying access or policy influence.

- **Salience paradox**: Groups gain prominence in medium-salience (not high-salience) policy areas
- **Seniority reversal**: Junior legislators (not senior members) cite groups more prominently
- **Moderate resource effects**: Lobbying capacity matters, but doesn't dominate

**Theoretical implication**: Interest group scholarship should distinguish between:
1. **Material influence**: Behind-the-scenes policy change (studied extensively)
2. **Access**: Private meetings, testimony, consultation (well-documented)
3. **Symbolic prominence**: Public legitimation in democratic discourse (understudied)

This finding extends Grossmann's (2012) observation that "the groups that appear most frequently in public discussions may differ from those most active in lobbying." Our analysis quantifies this divergence and identifies its drivers.

### 2. Strategic Communication, Not Just Strategic Lobbying

**Key insight**: Legislators use group citations strategically for audience signaling, not just policy persuasion.

The medium-salience advantage suggests members invoke groups when:
- Issues have visibility (so citations reach audiences)
- But aren't fully polarized (so citations provide information, not just partisanship)

**Theoretical implication**: Legislative communication theory should account for how politicians use external validators to:
- Build credibility on specialized issues
- Signal expertise without appearing self-interested
- Manage multiple audiences (constituents, colleagues, interest groups themselves)

This connects to Fenno's (1978) "home style" and Arnold's (1990) "traceability" concepts—members craft public positions for diverse audiences.

### 3. Rethinking Seniority and Institutional Power

**Key insight**: Senior members afford *less* prominence to interest groups, challenging assumptions about experience and reliance on organized interests.

Potential explanations:
- **Credibility independence**: Senior members have personal brands that don't require external validation
- **Direct influence channels**: Committee chairs shape policy directly, reducing need for public citations
- **Generational dynamics**: Newer legislators may be more comfortable with public coalition signaling

**Theoretical implication**: Institutional power operates differently in public vs. private advocacy. Seniority may grant influence *and* independence from symbolic legitimation needs.

### 4. Nuanced Pluralism: Resources Matter, But Don't Determine Outcomes

**Key insight**: Organizational resources (lobbying capacity) increase prominence, but effects are moderate. Age doesn't predict prominence.

This suggests:
- **Partial resource bias**: Well-resourced groups have advantages, but don't monopolize visibility
- **Lower barriers than expected**: New organizations can break through without decades of establishment
- **Multiple pathways**: Constituency legitimacy, issue urgency, and strategic positioning matter alongside resources

**Theoretical implication**: The prominence landscape is more pluralist than traditional influence markets, but not fully egalitarian. Symbolic representation may be more distributed than material power.

---

## Contributions to Democratic Theory

### Whose Voice Counts in Legislative Discourse?

This research illuminates **representation gaps** in democratic deliberation:

1. **High-salience invisibility**: Groups may be least visible precisely when public attention is highest
2. **Junior member gatekeeping**: Newer legislators play disproportionate roles in affording symbolic recognition
3. **Modest resource bias**: Professional infrastructure helps, but doesn't guarantee prominence

**For democratic accountability**: Citizens watching legislative debates may not see the full range of interests engaged behind the scenes. Prominence creates a "public face" of interest group politics that diverges from the actual lobbying landscape.

### Legitimacy, Not Just Influence

Prominence operates through **legitimacy** rather than just material power. When legislators cite groups, they:
- Recognize them as constituency representatives
- Grant them authority in democratic discourse
- Signal whose perspectives "count" in policy debates

This has implications for:
- **Descriptive representation**: Which constituencies gain visible recognition?
- **Pluralist competition**: Is symbolic visibility more or less concentrated than material influence?
- **Democratic discourse quality**: Does group prominence enhance or obscure deliberation?

---

## Policy and Advocacy Implications

### For Advocacy Organizations

**Strategic insights**:

1. **Target junior members**: More likely to afford prominence than senior committee chairs
2. **Focus on medium-salience moments**: Avoid both obscure and hyper-polarized debates
3. **Lobbying helps visibility**: External lobbyists provide dual advantages (access + prominence)
4. **New groups can break through**: Don't assume you need decades of history

### For Legislators and Congressional Reformers

**Transparency considerations**:

- Group citations in floor speeches reveal coalition patterns and constituency connections
- Disclosure of which groups legislators cite could complement lobbying disclosure
- Understanding prominence patterns helps assess whose voices shape public policy discourse

### For Democratic Accountability

**Watchdog questions**:

- Are marginalized communities' advocacy groups cited as prominently as establishment interests?
- Do prominence patterns reflect constituency demographics or resource disparities?
- How does media coverage of legislative debate affect which group citations reach the public?

---

## Methodological Contributions

### Machine Learning for Prominence Detection

This study demonstrates the value of **computational text analysis** for interest group research:

- **Scale**: Analyzing 15,000+ mentions across 500+ organizations
- **Systematic measurement**: Moving beyond case studies to population-level patterns
- **Nuance**: Distinguishing prominent from passing mentions using context

**Future applications**: This approach could extend to:
- Media coverage of advocacy groups
- Committee hearing testimony analysis
- Executive branch regulatory comments

### Multilevel Modeling of Nested Data

The analysis accounts for:
- Organization-level characteristics (resources, age, breadth)
- Legislator-level attributes (seniority, party, activity)
- Context-level factors (salience, issue area, term timing)

This multilevel approach indicates that prominence emerges from the **combination** of group characteristics, politician attributes, and contextual factors rather than from any single dimension.
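
One way to quantify how much of the variation sits at a grouping level (for example, organizations or legislators) is the latent-variable intraclass correlation for a random-intercept logistic model, ICC = σ²ᵤ / (σ²ᵤ + π²/3). The sketch below is an assumption-laden illustration. The `fit_glm_logistic` helper used in this notebook appears, judging by its name, to fit single-level GLMs, so the random-intercept variances here are hypothetical placeholders for what a mixed-effects fit would provide.

```python
import math

# Residual variance of the standard logistic distribution (latent-variable formulation)
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3

def latent_icc(random_intercept_variance: float) -> float:
    """Share of latent-scale variance attributable to the grouping level."""
    if random_intercept_variance < 0:
        raise ValueError("Variance must be non-negative")
    return random_intercept_variance / (
        random_intercept_variance + LOGISTIC_RESIDUAL_VARIANCE
    )

# Hypothetical random-intercept variances, for illustration only
for label, variance in [("organization", 0.50), ("legislator", 0.20)]:
    print(f"{label:<12} variance={variance:.2f}  ICC={latent_icc(variance):.3f}")
```

An ICC near zero would indicate that a single-level logistic regression loses little by ignoring the nesting, while a larger value would argue for explicit random effects.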

---

## Limitations and Future Directions

### Data and Measurement

**Temporal scope**: This analysis covers 2015-2017 (the 114th and 115th Congresses). Future research should:
- Examine trends over time (are patterns changing?)
- Compare across administrations (does partisan control affect prominence?)
- Include state legislatures (do patterns generalize beyond Congress?)

**Prominence operationalization**: Our binary measure (prominent vs. passing mention) could be refined:
- Intensity scales (highly prominent, moderately prominent, brief citation)
- Sentiment analysis (positive vs. negative citations)
- Context coding (what members say *about* groups when citing them)

### Causal Mechanisms

This study identifies **correlations**, not definitive causal relationships. We cannot fully determine whether:
- Lobbying *causes* prominence, or prominent groups invest in lobbying
- Issue salience *drives* prominence patterns, or prominence shapes salience perceptions
- Senior members *choose* not to cite groups, or groups avoid targeting senior members

**Future research**: Experimental or quasi-experimental designs could test:
- Do lobbying campaigns increase subsequent prominence?
- Does gaining prominence lead to future policy influence?
- How do legislators decide which groups to cite in speeches?

### Unexplored Heterogeneity

**Organization types**: This analysis uses broad categories (business, non-business, government). Future work should examine:
- Labor unions vs. professional associations
- Environmental groups vs. identity-based organizations
- Grassroots movements vs. astroturf campaigns

**Issue areas**: Prominence patterns may vary by:
- Redistributive vs. regulatory vs. symbolic policies
- Domestic vs. foreign policy domains
- Consensus vs. conflict issues

### Representational Consequences

**Critical questions for future research**:

1. **Does prominence predict policy influence?** Are groups that gain symbolic recognition more successful in shaping legislation?

2. **Do constituents respond to group citations?** Does prominence affect public opinion about issues or legislators?

3. **Are there representation gaps?** Do prominent groups actually represent broader constituencies, or is prominence disconnected from advocacy scope?

4. **How does media mediate prominence?** Do journalists amplify certain group citations, creating secondary prominence?

---

## Conclusion: Expanding Our Understanding of Interest Group Success

Interest group scholarship has long focused on material influence—policy change, access to decision-makers, lobbying expenditures. This study demonstrates that **symbolic recognition in legislative discourse** constitutes a distinct, measurable, and theoretically important dimension of group success.

Prominence matters because it shapes:
- **Public discourse**: Whose perspectives appear legitimate in democratic debate
- **Representation**: Which constituencies gain visible recognition
- **Democratic accountability**: How citizens understand interest group involvement in policymaking

The findings challenge simple narratives about interest group power:
- Prominence doesn't simply follow resources (age doesn't matter; lobbying has moderate effects)
- Strategic communication logic differs from access logic (medium-salience advantage, seniority penalty)
- Multiple pathways to visibility exist (not all roads run through establishment power)

As democratic systems grapple with questions of representation, transparency, and accountability, understanding which advocacy voices gain public recognition—and why—becomes essential. This research provides a foundation for that understanding, opening new avenues for studying the symbolic dimensions of pluralist politics.

**Final thought**: In an era of heightened attention to "whose voice counts" in democracy, prominence analysis offers tools to empirically assess visibility gaps and representation patterns in legislative discourse. The groups citizens hear about may not be those most active behind the scenes—and that divergence matters for democratic legitimacy.

---

## Research Agenda: Open Questions and Extensions

### Priority Questions for Follow-Up Research

Based on the findings from this analysis, several high-priority research questions emerge:

#### 1. Prominence → Influence Pathway

**Question**: Does achieving legislative prominence lead to subsequent policy influence?

**Research design**:
- Track groups' prominence over time
- Measure policy outcomes in their issue areas
- Test whether prominence predicts:
  - Legislative success (bill passage)
  - Regulatory outcomes
  - Agenda setting (issue attention)
  
**Why it matters**: If prominence is purely symbolic (no influence effect), it reveals discourse-policy disconnects. If prominence *does* predict influence, it suggests a two-stage advocacy process: first gain visibility, then leverage it for policy change.

#### 2. Media Amplification Effects

**Question**: How do news media coverage patterns interact with legislative prominence?

**Research design**:
- Content analysis of news articles citing advocacy groups
- Compare legislative prominence vs. media prominence
- Test whether:
  - Media cover prominent groups more
  - Media coverage increases subsequent legislative prominence
  - Media creates "second-order prominence" (journalists citing groups that legislators cite)

**Why it matters**: If media systematically amplify certain group citations, prominence effects extend beyond legislative audiences to shape public opinion.

#### 3. Constituent Response to Group Citations

**Question**: Do voters notice and respond when their representatives cite advocacy groups?

**Research design**:
- Survey or experimental evidence
- Show respondents floor speeches with/without group citations
- Measure effects on:
  - Legislator approval
  - Policy support
  - Perceptions of legislator expertise or alignment

**Why it matters**: If constituents don't notice citations, prominence may primarily target elite audiences (other legislators, groups themselves, media). If they do respond, it validates the strategic communication framework.

#### 4. Organizational Strategy and Prominence-Seeking

**Question**: Do advocacy organizations actively pursue legislative prominence, or does it emerge organically?

**Research design**:
- Interviews with group leaders and communications staff
- Analysis of advocacy group communications strategies
- Document analysis of group requests for legislative citations

**Why it matters**: Understanding strategic intentionality would clarify whether prominence is:
- A deliberate advocacy goal (groups cultivate it)
- A byproduct of other activities (lobbying, media work)
- A legislator-driven phenomenon (members cite groups without their knowledge)

#### 5. Representation Gaps Analysis

**Question**: Which constituencies' advocacy groups systematically lack prominence?

**Research design**:
- Map group prominence by:
  - Constituency demographics (race, class, geography)
  - Issue area (redistributive vs. regulatory policy)
  - Organization type (grassroots vs. professional)
- Compare prominence distribution to:
  - Lobbying expenditure distribution
  - Actual constituency sizes
  - Public opinion distributions

**Why it matters**: This analysis would reveal whether symbolic representation gaps mirror or diverge from material influence gaps, a distinction that is crucial for democratic accountability.

---

### Extensions to Other Political Contexts

#### State Legislatures

Do prominence patterns generalize beyond Congress?
- State legislatures have different institutional structures (professionalization, term limits, media coverage)
- Testing whether medium-salience advantage and seniority penalty hold across contexts

#### Executive Branch

How do advocacy groups gain prominence in administrative contexts?
- Regulatory comments
- Agency public hearings
- Executive orders and signing statements

#### Comparative Politics

Do prominence patterns vary across democratic systems?
- Parliamentary vs. presidential systems
- Proportional representation vs. single-member districts
- Corporatist vs. pluralist interest group systems

---

### Methodological Extensions

#### Natural Language Processing Advances

**Current approach**: Binary prominence classification (prominent vs. passing mention)

**Future refinements**:
- **Sentiment analysis**: Positive, negative, or neutral citations
- **Attribution analysis**: What claims do legislators make *about* groups when citing them?
- **Network analysis**: Co-citation patterns (which groups are mentioned together?)
- **Topic modeling**: What issues are groups cited for beyond their primary policy areas?
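
The co-citation idea can be sketched in a few lines. This is a minimal illustration rather than part of the original pipeline: the `speeches` mapping and the group names in it are hypothetical placeholders, and the sketch assumes each speech has already been reduced to the set of groups it mentions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: speech ID -> set of groups mentioned in that speech.
speeches = {
    "speech_1": {"Group A", "Group B", "Group C"},
    "speech_2": {"Group A", "Group B"},
    "speech_3": {"Group C"},
}

def co_citation_counts(speech_groups):
    """Count how often each unordered pair of groups appears in the same speech."""
    counts = Counter()
    for groups in speech_groups.values():
        # Sort so that each pair has a single canonical ordering.
        for pair in combinations(sorted(groups), 2):
            counts[pair] += 1
    return counts

edge_weights = co_citation_counts(speeches)
for (g1, g2), weight in edge_weights.most_common():
    print(f"{g1} -- {g2}: {weight}")
```

The resulting pair counts can serve as edge weights in a co-citation network, with groups as nodes.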

#### Causal Inference Strategies

**Challenges**: Observational data creates endogeneity concerns

**Potential approaches**:
- **Instrumental variables**: Exogenous shocks to group salience (natural disasters, policy crises)
- **Difference-in-differences**: Changes in lobbying registration laws, disclosure rules
- **Regression discontinuity**: Close elections
- **Natural experiments**: Committee assignment lotteries
- **Field experiments**: Randomize group outreach to legislators, measure prominence effects
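
The difference-in-differences logic reduces to a contrast of group means. The sketch below uses invented prominence rates purely to show the arithmetic; the numbers and the treated/control labels are assumptions, not results from this study.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: change in treated minus change in control."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical share of mentions classified as prominent, before and after
# a disclosure-rule change that affects only the "treated" groups.
rates = {
    "treated_pre": 0.20, "treated_post": 0.30,
    "control_pre": 0.25, "control_post": 0.28,
}

effect = did_estimate(**rates)
print(f"DiD estimate: {effect:+.3f}")
```

The estimate has a causal reading only under the parallel-trends assumption, namely that treated and control groups would have followed the same trajectory without the rule change.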

---

### Policy Applications

#### Lobbying Transparency

Current disclosure focuses on:
- Money spent
- Issues lobbied
- Specific bills

**Extension**: Track prominence alongside lobbying expenditures to reveal which groups achieve:
- **Access only** (lobby but not cited)
- **Prominence only** (cited but don't lobby)
- **Dual advantages** (both access and visibility)

This could inform debates about:
- Campaign finance reform
- Lobbying regulation
- Public financing of advocacy
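
The access/prominence typology above can be expressed as a small classification rule. This is a sketch under stated assumptions: the group records are hypothetical, and the `"neither"` label is added here only to make the function total; it is not one of the three categories proposed above.

```python
def classify_group(lobbies: bool, cited: bool) -> str:
    """Place a group in the access/prominence typology."""
    if lobbies and cited:
        return "dual advantages"
    if lobbies:
        return "access only"
    if cited:
        return "prominence only"
    return "neither"  # assumption: groups that neither lobby nor get cited

# Hypothetical records: (group name, has lobbying activity, cited in floor speeches).
groups = [
    ("Group A", True, True),
    ("Group B", True, False),
    ("Group C", False, True),
    ("Group D", False, False),
]

typology = {name: classify_group(lobbies, cited) for name, lobbies, cited in groups}
for name, label in typology.items():
    print(f"{name}: {label}")
```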

#### Legislative Communication Training

If prominence reflects strategic communication, legislators and staff could benefit from understanding:
- When group citations enhance credibility vs. appear scripted
- Which group types resonate with different audiences
- How to balance external validation with independent voice

#### Advocacy Effectiveness Evaluation

Foundations and advocacy organizations could use prominence as an outcome metric:
- Are grantees achieving visibility alongside influence?
- Does prominence predict organizational sustainability?
- Should funders prioritize prominence-building vs. traditional lobbying?

---

### Theoretical Integration

This research opens pathways to integrate prominence analysis with:

**Legislative studies**:
- Position-taking and credit-claiming theories
- Committee specialization and expertise signaling
- Party messaging and message discipline

**Interest group theory**:
- Resource mobilization vs. political opportunity structures
- Inside vs. outside lobbying strategies
- Group maintenance and legitimacy

**Democratic theory**:
- Descriptive vs. substantive representation
- Pluralism vs. elite competition
- Deliberative quality and inclusion

---

**Bottom line**: Prominence analysis provides a new lens for examining democratic representation, one that bridges traditional interest group scholarship, legislative communication research, and normative democratic theory. The agenda outlined here could sustain years of research into whose voices shape our collective understanding of "who's at the table" in democratic governance.

---

In [None]:
fig_b = plot_odds_ratios(model_b2, "Model B: Politician-Group Linkage Effects on Prominence")
if fig_b:
    plt.savefig('model_b_odds_ratios.png', dpi=150, bbox_inches='tight')
    plt.show()

In [None]:
fig_c = plot_odds_ratios(model_c2, "Model C: Organizational Characteristics Effects on Prominence")
if fig_c:
    plt.savefig('model_c_odds_ratios.png', dpi=150, bbox_inches='tight')
    plt.show()

---

# Export Results

In [None]:
def export_model_results(models: list, output_dir: Path = Path('.')):
    """
    Export model results to CSV and Excel files.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
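    # Keep only models that were actually fitted; pd.concat raises on an empty list
    fitted = [m for m in models if m is not None]
    if not fitted:
        print("No fitted models to export.")
        return
    
    # Combine all parameter-level statistics
    all_params = pd.concat([m['summary_df'] for m in fitted], ignore_index=True)
    all_params.to_csv(output_dir / 'model_parameters.csv', index=False)
    print(f"Saved parameter estimates to {output_dir / 'model_parameters.csv'}")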
    
    # Model-level statistics
    model_stats = []
    for m in models:
        if m is not None:
            model_stats.append({
                'Model': m['name'],
                'Formula': m['formula'],
                'Log-Likelihood': m['llf'],
                'AIC': m['aic'],
                'BIC': m['bic'],
                'N': m['nobs']
            })
    
    model_stats_df = pd.DataFrame(model_stats)
    model_stats_df.to_csv(output_dir / 'model_statistics.csv', index=False)
    print(f"Saved model statistics to {output_dir / 'model_statistics.csv'}")
    
    # Export to Excel with multiple sheets (needs an Excel engine such as openpyxl)
    with pd.ExcelWriter(output_dir / 'model_results.xlsx') as writer:
        all_params.to_excel(writer, sheet_name='Parameters', index=False)
        model_stats_df.to_excel(writer, sheet_name='Model_Stats', index=False)
    print(f"Saved Excel workbook to {output_dir / 'model_results.xlsx'}")

# Export all results
all_models_list = [empty_model_a, model_a1, model_a2, 
                   empty_model_b, model_b1, model_b2,
                   empty_model_c, model_c1, model_c2]

export_model_results(all_models_list, output_dir=Path('./output'))

---

# Discussion

## Key Findings

### Model A: Issue Salience
- Medium-salience policy areas show increased prominence (OR > 1)
- High-salience areas may show decreased prominence, contradicting the initial hypothesis
- This pattern suggests a non-linear relationship between public attention and legislative prominence
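
As a reminder of how to read these results, an odds ratio is the exponentiated logit coefficient, and an approximate 95% Wald interval comes from exponentiating the coefficient plus or minus 1.96 standard errors. The coefficient and standard error below are placeholders for illustration, not estimates from the models in this notebook.

```python
import math

def odds_ratio_with_ci(coef, se, z=1.96):
    """Convert a logit coefficient and its standard error to an odds ratio
    with an approximate Wald confidence interval."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# Placeholder values for illustration only.
or_, lower, upper = odds_ratio_with_ci(coef=0.40, se=0.15)
print(f"OR = {or_:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that lies entirely above 1 indicates a positive association at the conventional 5% level; one that spans 1 does not.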

### Model B: Politician Characteristics
- Seniority has a significant negative effect on the likelihood of affording prominence
- Issue overlap and term status show weak or non-significant effects
- These results challenge conventional assumptions about politician-interest group linkages

### Model C: Organizational Characteristics
- External lobbyists significantly increase prominence (contrary to the hypothesis)
- Organization age shows no significant effect
- Policy breadth shows a positive but non-significant effect

## Limitations

1. **Model Specification**: Python's statsmodels has no direct equivalent of `glmer` from R's lme4 for logistic models with crossed random effects. For publication-quality analysis, consider using:
   - `pymer4` package (Python wrapper for R's lme4)
   - `rpy2` to call R directly
   - Bayesian approaches with PyMC or Stan

2. **Data Quality**: The Washington Representatives data is current only to 2011, creating a temporal mismatch with the 114th-115th Congress data.

3. **Synthetic Data**: This notebook uses synthetic data for demonstration. Results with actual data may differ substantially.

## Future Directions

- Incorporate interaction terms between saliency and organization type
- Explore non-linear effects of seniority
- Add temporal dynamics to capture changes over legislative sessions
- Include media prominence measures for triangulation

---

# Appendix: Technical Notes

## Mixed-Effects Models in Python

For true mixed-effects logistic regression with crossed random effects (as in the original R code), you can use:

```python
# Option 1: pymer4 (requires R installation)
from pymer4.models import Lmer

model = Lmer(
    "level1_prominence ~ saliency_category + (1|level1_org_id) + (1|level1_issue_area)",
    data=level1,
    family='binomial'
)
result = model.fit()

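# Option 2: rpy2 (call R directly)
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

# Copy the pandas DataFrame into R as `df`; without this step the R code
# below has no data to fit.
with localconverter(ro.default_converter + pandas2ri.converter):
    ro.globalenv['df'] = level1

ro.r('''
library(lme4)
model <- glmer(level1_prominence ~ saliency_category +
               (1|level1_org_id) + (1|level1_issue_area),
               data=df, family=binomial)
print(summary(model))
''')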
```

In [None]:
# Session info
import sys
print(f"Python version: {sys.version}")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Statsmodels version: {sm.__version__}")