# Data Collection and Analysis

This section covers data collection from multiple sources (FRED, Yahoo Finance, CFPB reports) and the multi-factor regression analysis.

## 2. Data Collection - Financial Market Variables

### 2.1 BNPL Stock Selection

Construct an equally-weighted portfolio of BNPL firms:

**Included Firms:**
- **Affirm Holdings (AFRM)**: Largest publicly-traded BNPL provider
- **PayPal Holdings (PYPL)**: Includes BNPL product (Pay in 4)
- **Sezzle (SEZL)**: Pure-play BNPL provider

**Portfolio Construction:**
- Equally-weighted average return: $R_{BNPL,t} = \frac{1}{N}\sum_{i=1}^{N} R_{i,t}$
- Equal weighting gives each firm the same influence on the portfolio return, so the largest firm by market capitalization does not dominate the series
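The equal-weighted formula above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual pipeline: the price table is hypothetical (made-up numbers, not market data), and the names `prices`, `returns`, and `bnpl_return` are assumptions introduced for the example.

```python
import pandas as pd

# Hypothetical month-end adjusted close prices (illustrative numbers, not market data).
prices = pd.DataFrame(
    {
        "AFRM": [100.0, 110.0, 99.0, 108.9],
        "PYPL": [200.0, 190.0, 199.5, 199.5],
        "SEZL": [50.0, 55.0, 60.5, 54.45],
    },
    index=pd.to_datetime(["2023-01-31", "2023-02-28", "2023-03-31", "2023-04-30"]),
)

# Simple monthly returns in percent. fill_method=None avoids forward-filling
# stale prices; the first month has no prior price and is dropped.
returns = prices.pct_change(fill_method=None).dropna(how="all") * 100

# Equal weights: R_BNPL,t = (1/N) * sum_i R_i,t, i.e. the cross-sectional mean.
bnpl_return = returns.mean(axis=1)
print(bnpl_return.round(2))
```

Because `mean(axis=1)` skips missing values by default, a month in which one firm has no return is averaged over the remaining firms, which matters when listing dates differ across tickers.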

### 2.2 Market Benchmark and Controls

**S&P 500 (SPY)**: Market benchmark, proxied by the SPY ETF, to control for systematic (market-wide) risk

**VIX Index**: Volatility measure to control for market uncertainty

---

## 1. Data Collection - Macroeconomic Variables

### 1.1 Data Sources and Variable Selection

This step collects macroeconomic data from the Federal Reserve Economic Data (FRED) API, focusing on variables identified in the literature review as key drivers of BNPL performance. The variable selection process is grounded in empirical research documenting BNPL firms' sensitivity to monetary policy, consumer spending patterns, and credit market conditions.

The main interest rate variable is the **Federal Funds Rate (FEDFUNDS)**, the Federal Reserve's primary policy rate, which directly affects BNPL funding costs. As established in the theoretical foundation, BNPL firms rely on short-term borrowing from wholesale markets to fund consumer loans, so their cost of capital is closely tied to short-term interest rates.

The **10-Year Treasury Rate (DGS10)** serves as a long-term rate benchmark and is used to calculate credit spreads, providing insight into broader credit market conditions that may affect BNPL firms' access to capital.

Consumer spending variables capture the demand-side factors affecting BNPL usage. **Retail Sales (RSAFS)** provides a direct measure of consumer spending on goods, which represents BNPL's primary market. Empirical research by Di Maggio, Williams, and Katz documents that BNPL access increases total spending by $130 per week on average, establishing a clear link between retail spending and BNPL performance (Di Maggio et al.).

**Personal Consumption Expenditures (PCE)** offers a broader measure of consumer spending across all categories. The **University of Michigan Consumer Sentiment Index (UMCSENT)**, used here as the consumer confidence measure, serves as a forward-looking indicator of consumer spending intentions, capturing the psychological factors that may drive BNPL adoption.

Credit market variables measure the availability and cost of credit in the broader economy. The **BAA Corporate Bond Yield (BAA)** is used to calculate credit spreads (BAA minus 10-Year Treasury), which capture credit market tightness affecting BNPL firms' borrowing costs and profitability. The Consumer Financial Protection Bureau's Market Trends Report documents that BNPL firms' net transaction margins declined from 1.27% in 2020 to 1.01% in 2021, with cost of funds increasing in early-to-mid 2022, highlighting the importance of credit market conditions.

**Total Consumer Credit (TOTALSL)** measures credit availability in the economy, providing insight into the broader credit environment in which BNPL firms operate.

Control variables include the **Consumer Price Index (CPIAUCSL)**, which measures inflation. Higher inflation reduces consumers' real purchasing power, potentially affecting discretionary spending patterns and BNPL usage. This variable controls for macroeconomic conditions that may confound the relationship between interest rates and BNPL returns.

### 1.2 Data Transformation

All variables are converted to monthly frequency and transformed before estimation. Returns and growth rates are month-over-month percentage changes. Rate and index variables that enter the model as changes (for example the Federal Funds Rate and the consumer confidence measure) are first-differenced, which typically removes the trends that make levels non-stationary and keeps each coefficient interpretable as the response to a one-unit change. Spreads are differences between rates; the credit spread is the BAA Corporate Bond Yield minus the 10-Year Treasury rate. These transformations bring the data closer to the statistical assumptions of regression analysis while preserving economic interpretability.
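A minimal pandas sketch of these transformations follows. The input values are hypothetical (not actual FRED observations), daily series such as DGS10 are assumed to have been aggregated to monthly frequency already, and the column names in `features` are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical monthly levels keyed by the FRED series IDs named in Section 1.1
# (illustrative numbers, not actual FRED observations).
idx = pd.date_range("2023-01-01", periods=4, freq="MS")
macro = pd.DataFrame(
    {
        "FEDFUNDS": [4.25, 4.50, 4.75, 4.75],      # percent
        "DGS10": [3.50, 3.75, 3.60, 3.50],         # percent
        "BAA": [5.50, 5.85, 5.80, 5.60],           # percent
        "RSAFS": [700.0, 707.0, 700.0, 710.0],     # level
        "CPIAUCSL": [300.0, 301.5, 303.0, 303.0],  # index level
    },
    index=idx,
)

features = pd.DataFrame(index=macro.index)

# First difference of a rate: change in percentage points.
features["d_ffr"] = macro["FEDFUNDS"].diff()

# Growth rates: month-over-month percentage change.
features["retail_growth"] = macro["RSAFS"].pct_change() * 100
features["inflation"] = macro["CPIAUCSL"].pct_change() * 100

# Spread: difference between two rates, then its monthly change.
features["credit_spread"] = macro["BAA"] - macro["DGS10"]
features["d_spread"] = features["credit_spread"].diff()

# The first month has no prior observation, so it is dropped.
features = features.dropna()
print(features.round(3))
```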

---

## 3.5 CFPB Regulatory Data Analysis

### 3.5.1 Purpose and Data Sources

This section extracts and analyzes key statistics from Consumer Financial Protection Bureau (CFPB) reports to provide regulatory and market context for our regression analysis. The CFPB has published four major reports on BNPL that inform our understanding of the industry structure and consumer behavior patterns.

The first report, the CFPB Market Trends Report published in September 2022, provides industry-wide BNPL metrics including Gross Merchandise Volume (GMV), transaction volume, and charge-off rates. This report also documents market structure and competitive dynamics, as well as profitability trends such as unit margins and late fees. These metrics are crucial for understanding the business model characteristics that make BNPL firms potentially sensitive to interest rate changes.

The second report, the CFPB Making Ends Meet Report published in December 2022, focuses on consumer financial vulnerability indicators. This report documents income variability and credit card debt trends among BNPL users, as well as demographic patterns in BNPL usage. These findings inform our variable selection process, as they identify the consumer financial stress indicators that may drive BNPL demand and affect firm performance.

The third report, the CFPB Consumer Use Report published in March 2023, provides detailed analysis of how consumers use BNPL products, including usage patterns, repayment behavior, and financial outcomes. This report documents consumer characteristics and credit profiles, credit card utilization patterns, and financial distress indicators that help explain aggregate industry patterns.

The fourth report, the CFPB Consumer Use of Buy Now, Pay Later and Other Unsecured Debt report published in January 2025, provides the most recent comprehensive analysis of BNPL usage patterns and consumer outcomes. This report updates earlier findings and provides current statistics on BNPL adoption, usage intensity, and consumer financial outcomes, including latest market developments, regulatory updates, and policy implications.

Together, these four reports provide a comprehensive foundation for understanding the BNPL industry structure, consumer behavior patterns, and regulatory context that informs our regression analysis. The statistics extracted from these reports are integrated into our variable selection process and used to validate our model specifications against empirical evidence from regulatory sources.

---

## 5. Multi-Factor Regression Analysis

Having collected data from FRED, Yahoo Finance, and CFPB reports, we proceed to estimate regression models that test our theoretical predictions. The regression analysis follows a systematic progression from a simple baseline model to a refined multi-factor specification, allowing us to assess how adding theoretically-justified control variables improves our understanding of BNPL return determinants.

This approach ensures that our findings are robust to model specification choices and that each included variable contributes meaningfully to explaining BNPL return variance.

### 5.1 Overview: From Simple to Refined Models

We begin with a baseline model that includes only the Federal Funds Rate change, then progressively add theoretically-justified control variables to separate the direct effect of interest rates from confounding factors. Comparing the specifications shows how much each set of controls adds to the explanation of BNPL stock returns.

**Model Progression Strategy:**
1. **Baseline Model (Simple Bivariate)**: Federal Funds Rate change only
2. **Multi-Factor Baseline (Model 1)**: Federal Funds Rate + 6 control variables
3. **Model Selection**: Testing optimal variable combinations (3-7 variables)
4. **Best Model (Model 7 or Optimal 5-Variable)**: Selected based on Adjusted R-squared
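One way to carry out the model selection step is an exhaustive search over variable subsets, scored by Adjusted R-squared. The sketch below is an assumption about how such a search could look, not the procedure actually used here: the data are simulated, and `adjusted_r2`, `best_subset`, and the variable names are hypothetical.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(y, X):
    """Adjusted R-squared of an OLS fit of y on X plus an intercept."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def best_subset(y, X, names, min_size=3, max_size=7):
    """Try every subset of the given sizes; keep the highest Adjusted R-squared."""
    best_score, best_names = -np.inf, ()
    for size in range(min_size, min(max_size, X.shape[1]) + 1):
        for cols in combinations(range(X.shape[1]), size):
            score = adjusted_r2(y, X[:, list(cols)])
            if score > best_score:
                best_score, best_names = score, tuple(names[c] for c in cols)
    return best_score, best_names


# Simulated data: only three of the seven candidate regressors drive y.
rng = np.random.default_rng(0)
names = ["d_ffr", "retail", "confidence", "spread", "pce", "credit", "inflation"]
X = rng.normal(size=(60, 7))
y = 1.0 - 2.0 * X[:, 0] + 1.5 * X[:, 1] + 0.8 * X[:, 3] + rng.normal(scale=0.5, size=60)

score, chosen = best_subset(y, X, names)
print(round(score, 3), chosen)
```

With seven candidates the search covers only 99 subsets of size 3 to 7, so brute force is cheap. Adjusted R-squared can still favor a subset that includes a noise variable by chance, which is one reason to read the selected model alongside theory.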

### 5.2 Model Specifications: Baseline vs Refined

#### 5.2.1 Baseline Model (Simple Bivariate)

**Equation:** $$R_{BNPL,t} = \beta_0 + \beta_1(\Delta FFR_t) + \varepsilon_t$$

**Variables:**
- $R_{BNPL,t}$ = Monthly BNPL stock return (%)
- $\Delta FFR_t$ = Month-over-month change in Federal Funds Rate (percentage points)
- $\beta_1$ = Coefficient of interest (measures BNPL sensitivity to rate changes)

**Purpose:** Establishes initial relationship between interest rates and BNPL returns without controls.

**Limitation:** Suffers from omitted variable bias - coefficient may capture indirect effects through consumer spending, credit conditions, etc.
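
The size of this bias follows from a standard textbook result. As an illustration, suppose the true model contains one additional driver $Z_t$ (a hypothetical stand-in for any omitted control, such as retail sales growth) with coefficient $\beta_2$:

$$R_{BNPL,t} = \beta_0 + \beta_1(\Delta FFR_t) + \beta_2 Z_t + u_t$$

If $Z_t$ is left out, the bivariate OLS slope converges to

$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2 \frac{\mathrm{Cov}(\Delta FFR_t, Z_t)}{\mathrm{Var}(\Delta FFR_t)}$$

so the baseline estimate equals the direct effect only when the omitted variable is irrelevant ($\beta_2 = 0$) or uncorrelated with rate changes.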

---

#### 5.2.2 Multi-Factor Baseline Model (Model 1)

**Equation:** $$R_{BNPL,t} = \beta_0 + \beta_1(\Delta FFR_t) + \beta_2(\Delta Retail_t) + \beta_3(\Delta CC_t) + \beta_4(\Delta Spread_t) + \beta_5(\Delta PCE_t) + \beta_6(\Delta Credit_t) + \beta_7(\pi_t) + \varepsilon_t$$

**Variables Included:**

| Variable | Symbol | Description | Expected Sign | Theoretical Justification |
|---|---|---|---|---|
| **Federal Funds Rate Change** | $\Delta FFR_t$ | Month-over-month change in Fed Funds Rate (percentage points) | **Negative** | Direct funding cost channel (Laudenbach et al.; Affirm Holdings) |
| **Retail Sales Growth** | $\Delta Retail_t$ | Month-over-month % change in Retail Sales | **Positive** | Consumer spending channel (Di Maggio et al.) |
| **Consumer Confidence Change** | $\Delta CC_t$ | Month-over-month change in Consumer Confidence Index | **Positive** | Forward-looking spending intentions |
| **Credit Spread Change** | $\Delta Spread_t$ | Month-over-month change in BAA - 10Y Treasury spread (percentage points) | **Negative** | Credit market tightness (wider spreads = higher borrowing costs) |
| **PCE Growth** | $\Delta PCE_t$ | Month-over-month % change in Personal Consumption Expenditures | **Positive** | Broader consumer spending measure |
| **Consumer Credit Growth** | $\Delta Credit_t$ | Month-over-month % change in Total Consumer Credit | **Positive** | Credit availability channel |
| **Inflation Rate** | $\pi_t$ | Month-over-month CPI inflation rate (%) | **Negative** | Purchasing power effects |

**Purpose:** Controls for confounding factors to isolate direct effect of interest rates on BNPL returns.

**Advantage:** Reduces omitted variable bias and provides cleaner estimate of interest rate sensitivity.

### 5.3 OLS Estimation Method

**Estimation Technique:** Ordinary Least Squares (OLS) with heteroskedasticity-robust standard errors (HC3 specification)

**Why Robust Standard Errors?**
- Financial returns often exhibit heteroskedasticity (variance changes over time)
- HC3 performs better than HC0/HC1 in small samples (MacKinnon and White)
- Keeps all observations in the sample; robust standard errors adjust inference only, so influential outliers still affect the coefficient estimates
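
To make the HC3 correction concrete, the sketch below computes OLS coefficients and HC3 standard errors directly with NumPy. It is an illustration on simulated data, not the estimation code used for the results; `ols_hc3` is a hypothetical helper. In practice the same covariance estimator is available in statsmodels via `sm.OLS(y, X).fit(cov_type="HC3")`.

```python
import numpy as np


def ols_hc3(y, X):
    """OLS coefficients and HC3 standard errors; X must already include a constant column."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_i = x_i' (X'X)^-1 x_i, the diagonal of the hat matrix.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # HC3 scales each squared residual by 1 / (1 - h_i)^2.
    weights = (resid / (1.0 - leverage)) ** 2
    meat = X.T @ (X * weights[:, None])
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated monthly data with variance that rises with the size of the rate change.
rng = np.random.default_rng(42)
n = 48
d_ffr = rng.normal(0.0, 0.25, n)
noise = rng.normal(0.0, 1.0, n) * (1.0 + 4.0 * np.abs(d_ffr))
y = 1.0 - 3.0 * d_ffr + noise
X = np.column_stack([np.ones(n), d_ffr])

beta, se = ols_hc3(y, X)
print("coefficients:", beta.round(3))
print("HC3 std errors:", se.round(3))
```

The point estimates are identical to plain OLS; only the standard errors, and therefore the t-statistics and p-values, change.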

**Model Diagnostics:**
- **Multicollinearity Check**: Correlation matrix (drop one variable from any pair with absolute correlation > 0.80)
- **Outlier Detection**: IQR method (flagged observations are retained rather than removed; inference relies on robust standard errors)
- **Model Fit Statistics**: R², Adjusted R², F-statistic, RMSE
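
The first two diagnostics can be sketched as follows. The helper names are hypothetical, the data are simulated, and the 1.5 x IQR fence is the conventional Tukey rule, assumed here because the multiplier is not stated above.

```python
import numpy as np


def high_corr_pairs(X, names, threshold=0.80):
    """Return regressor pairs whose absolute pairwise correlation exceeds the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


def iqr_outliers(x, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumption)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


# Simulated regressors: "pce" is built to track "retail" closely.
rng = np.random.default_rng(1)
retail = rng.normal(size=60)
pce = retail + 0.1 * rng.normal(size=60)
confidence = rng.normal(size=60)
X = np.column_stack([retail, pce, confidence])
pairs = high_corr_pairs(X, ["retail", "pce", "confidence"])
print(pairs)

# Hypothetical monthly returns (%) with one extreme month.
monthly_returns = np.array([1.0, -2.0, 0.5, 3.0, -1.5, 2.0, -0.5, 45.0])
mask = iqr_outliers(monthly_returns)
print(monthly_returns[mask])
```

Flagged months are reported for inspection only; they stay in the estimation sample.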

### 5.4 Interpretation Framework

**Coefficient Interpretation:**
- Each coefficient represents the **ceteris paribus** effect (holding all other variables constant)
- **Statistical Significance**: p < 0.05 (significant), p < 0.10 (marginal)
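- **Economic Magnitude**: Coefficient size, read in the units of the regressor (e.g. return response per 1 percentage point change in the Federal Funds Rate), indicates practical importance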

**Model Fit Interpretation:**
- **R-squared**: Proportion of variance explained (0.32 = 32% of variance)
- **Adjusted R-squared**: Penalizes additional variables (preferred for model selection)
- **F-statistic**: Tests whether model as a whole is significant
- **RMSE**: Root mean squared error, the typical size of the prediction error in percentage points of monthly return
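
The four fit statistics can be computed from the residuals as in the sketch below. The five-observation data set is hypothetical and chosen only so the arithmetic is easy to follow; `fit_statistics` is an illustrative helper. Two conventions are assumed: RMSE divides the residual sum of squares by $n$ (some software divides by the residual degrees of freedom instead), and the F-statistic is the classical one, which assumes homoskedastic errors.

```python
import numpy as np


def fit_statistics(y, X):
    """R-squared, Adjusted R-squared, F-statistic and RMSE for OLS of y on X.

    X holds the k regressors without the constant; an intercept is added here.
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    rmse = float(np.sqrt(ss_res / n))  # assumption: divide by n, not n - k - 1
    return {"r2": r2, "adj_r2": adj_r2, "f_stat": f_stat, "rmse": rmse}


# Hypothetical toy data: one regressor, five observations.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 6.0])

stats = fit_statistics(y, X)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Adjusted R-squared is never larger than R-squared, and the gap widens as regressors are added relative to the sample size, which is why it is the preferred criterion when comparing models of different sizes.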