In [1]:
import logging
from doc_parser.convert_pdf import PDFConverter
from doc_parser.utils.openai_agent import OpenAIAgent

logging.basicConfig(level=logging.INFO)

pdf_file = "inputs/JFDS-2025-Ghiye-jfds.2025.1.194.pdf"
pdf_converter = PDFConverter(pdf_file, dpi=300, max_workers=10)

In [2]:
await pdf_converter.convert_to_markdown(temperature=0.0)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/traces/ingest "HTTP/1.1 204 No Content"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/traces/ingest "HTTP/1.1 204 No Content"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/traces/ingest "HTTP/1.1 204 No Content"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/traces/ingest "HTTP/1.1 204 No Content"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POS

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/traces/ingest "HTTP/1.1 204 No Content"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/traces/ingest "HTTP/1.1 204 No Content"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/traces/ingest "HTTP/1.1 204 No Content"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/traces/ingest "HTTP/1.1 204 No Content"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/traces/ingest "HTTP/1.1 204 No Content"


In [19]:
# Total tokens used across all parsed outputs (first raw response of each)
sum(z.raw_responses[0].usage.total_tokens for z in pdf_converter.output_parsed)

109710

In [4]:
agt = OpenAIAgent(model_name="gpt-4.1-nano")

await agt._text_agent("Extract a three word summary from the provided text", pdf_converter.get_markdown())

result = await agt.run()

result.final_output

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"


'Interpretable Bond Factors'

In [10]:
from IPython.display import Markdown

Markdown(pdf_converter.get_markdown())

# Graph-Based Factor Models for Interpretable Credit Spread Decomposition

**Ashraf Ghiye, Baptiste Barreau, Laurent Carlier, and Michalis Vazirgiannis**

---

### Authors' Affiliations
- **Ashraf Ghiye**  
  PhD student in Graph Machine Learning at École Polytechnique and data scientist at the Data and AI Lab at BNP Paribas CIB, France.  
  Email: ashraf.ghiye@bnpparibas.com

- **Baptiste Barreau**  
  Head of the Data and AI Lab at BNP Paribas CIB, France.  
  Email: baptiste.barreau@bnpparibas.com

- **Laurent Carlier**  
  Head of the Data and AI Lab at BNP Paribas CIB, France.  
  Email: laurent.carlier@bnpparibas.com

- **Michalis Vazirgiannis**  
  Professor and team lead at the Data Science and Mining Group, École Polytechnique, France.  
  Email: mwazirg@lix.polytechnique.fr

---

### Key Findings
- The authors propose a novel graph-based framework that integrates static bond features to enable an interpretable spread return decomposition.
- Their model outperforms statistical (PCA and autoencoders) and hybrid (instrumented PCA) models by capturing greater variance while demonstrating enhanced robustness to missing data.
- The proposed model provides actionable insights for portfolio managers, facilitating tasks such as performance attribution and risk decomposition.

---

### Abstract
Factor models are essential tools for understanding asset returns. Statistical factor models such as principal component analysis (PCA) and autoencoders have been widely used to reduce the high-dimensional panels of returns into a lower-dimensional latent space. Although effective at retaining much of the original variance, these models often lack inherent economic interpretation and rely solely on historical data, failing to incorporate contextual features such as asset characteristics into factor construction. Consequently, ad hoc analyses are often required to assign real-world meaning to latent factors. To address these limitations, this article introduces a novel graph factor model (GFM) that integrates domain-informed sparsity, explicitly connecting factors to financially validated features to enable interpretable and robust factor extraction. Extensive experiments on modeling corporate spread returns demonstrate that the GFM captures more variance, is more robust to missing data, and provides clearer economic insights than PCA, autoencoders, and instrumented PCA. By bridging the gap between statistical performance and economic interpretability, this new framework supports tasks such as performance attribution and offers valuable insights for portfolio management.

---

### Introduction
Understanding the main drivers of asset performance is a fundamental challenge in finance. Factor models address this challenge by capturing common sources of variability among a large number of financial assets. These models decompose asset returns into a smaller set of explanatory components, called factors, each representing a distinct source of return variation. Assets respond uniquely to these factors through different weights, known as factor loadings, sensitivities, or exposures. This decomposition provides valuable insights into the systematic drivers of performance and supports applications such as performance attribution (Stubbs and Jeet 2016), covariance estimation (Huynh and Lenhard 2022), and trading strategy development (Guijarro-Ordonez, Pelger, and Zanotti 2021; Long and Xiao 2024).  
At the core of factor models is the premise that asset returns are influenced by a set of underlying factors, which may be latent (unobservable), tied to macroeconomic variables, or derived from intrinsic characteristics. Over the years, significant effort has been devoted to identifying, constructing, and testing various factors. Early models relied on macroeconomic indicators or heuristically constructed portfolios to define factors, whereas others used observable asset fundamentals to specify factor sensitivities. These rule-based methods often fail, however, to fully capture the intricate and dynamic nature of financial datasets.  
Statistical factor models overcome these limitations by extracting factors directly from data. Methods such as principal component analysis (PCA) have been widely used, and more recently, autoencoders—a class of machine learning algorithms designed for dimensionality reduction—have gained increasing attention (Coqueret and Guida 2023; Wang and Singh 2024). Advances in machine learning have also led to hybrid models that combine asset characteristics with statistical factor construction, offering greater flexibility and predictive power. Despite these advances, a key limitation persists: These methods lack interpretability, making it hard to derive meaningful economic insights from the extracted factors.  
To address these gaps, we introduce a novel graph-based framework designed to explain the cross-sectional spread returns of corporate bonds. Our method bridges the gap between interpretability and performance by integrating domain knowledge into the latent structure of two core algorithms: autoencoders and matrix factorization. This knowledge, formalized as a bond-factor graph, guides the learning process to extract factors directly tied to financially relevant but unobserved features.  
Our contribution can be summarized as follows:  
- We propose a novel graph-based framework that integrates static bond features to enable an interpretable spread return decomposition.  
- Our model outperforms statistical (PCA and autoencoders) and hybrid (instrumented PCA) models by capturing greater variance while demonstrating enhanced robustness to missing data.  
- The proposed model provides actionable insights for portfolio managers, facilitating tasks such as performance attribution and risk decomposition.  

The remainder of this article is organized as follows: The section “Related Work” reviews related work on factor models, highlighting the gap in interpretability for statistical factor models. The section “Methodology” details the methodology, including the design of the bond-factor graph and its integration into the autoencoder framework. The sections “Experiments” and “Results” present the experimental setup and results, demonstrating the model’s superiority in variance explanation and robustness compared to benchmark models. Finally, the sections “Discussion” and “Conclusion” discuss the implications of the findings and outline directions for future research.

### Related Work
Factor models can be broadly classified into three types: macroeconomic, fundamental, and statistical.  
**Macroeconomic models** were the first to be studied in the literature, beginning with the seminal capital asset pricing model (CAPM) introduced by Sharpe (1964), which posits that asset returns are driven by a single factor—the market return.
# The Journal of Financial Data Science | 3

Subsequent research acknowledged the model’s limitations in fully explaining return variations and sought to extend it. For instance, the Fama–French three-factor model (Fama and French 1993) added size and value factors to capture the effects of market capitalization and book-to-market ratios alongside the market factor. Similarly, Carhart (1997) added a fourth factor, momentum, to account for the tendency of past winners to outperform past losers. More recently, the Fama and French five-factor model (Fama and French 2015) included profitability and investment factors to further enhance explanatory power.

**Fundamental models** explain asset returns using firm-specific characteristics.  
A prominent example is the BARRA model (Barra 1998), which decomposes total risk into systematic and idiosyncratic components by associating firm-level characteristics, such as industry classification, with systematic factors. The factor exposures are deterministically derived from these characteristics, whereas factor returns are estimated empirically using cross-sectional regressions of returns on exposures. Although this rule-based construction ensures interpretability, its reliance on fixed assumptions and heuristics can limit flexibility in capturing more complex relationships.  

**Statistical models** extract factors directly from data using statistical techniques such as principal component analysis (PCA), without necessarily tying them to observable economic variables. Although such methods maximize the fit to historical data, the resulting factors may lack economic interpretability. Recent advancements in this area include the use of machine learning techniques that automate factor construction. Autoencoders (Rumelhart, Hinton, and Williams 1986), for instance, extend PCA by introducing nonlinear modeling capabilities, offering greater flexibility in uncovering complex patterns in asset returns. These methods are, however, prone to overfitting and still face challenges with interpretability.

Empirical evidence suggests that fundamental and statistical models generally outperform macroeconomic ones in explanatory power (Connor 1995). Modern research has sought to bridge fundamental and statistical approaches. For instance, Kelly, Pruitt, and Su (2019) introduced instrumented principal component analysis (IPCA), a method that parametrizes factor loadings as a function of firm characteristics. This approach significantly reduces the number of parameters to estimate while showing improved capacity in explaining stock returns. Building on this idea, Gu, Kelly, and Xiu (2021) proposed a more flexible framework using autoencoders, enabling nonlinear modeling between loadings and characteristics. Finally, Kelly, Palhares, and Pruitt (2023) used a five-factor IPCA to demonstrate its suitability for modeling corporate bond returns.

Our work differs from these studies in two key aspects. First, although existing hybrid approaches use time-varying factor loadings based on evolving firm characteristics, our focus is on incorporating static bond characteristics, embedded directly into the model structure through a bipartite bond-factor graph. Second, prior studies prioritize improved predictive power and factor identification with less emphasis on interpretability, as factors often remain latent and unstructured. This reduces their applicability to tasks such as performance attribution. In contrast, our approach connects unobserved bond characteristics with latent factors, enhancing the interpretability and practical utility of the decomposition.

## METHODOLOGY

**Problem Formulation**  
Let $i \in \{1, ..., N\}$ denote the bond index and let $j \in \{1, ..., K\}$ denote the factor index. Consider $T$ time periods $t = 1, 2, ..., T$ over which the signals are observed.

The goal of a factor model is to decompose the original signals into a smaller set of underlying factors such that $K \ll N$. This article considers linear multifactor models of the form:

\[ r_{i,t} = \sum_{j=1}^{K} \beta_{i,j} \cdot f_{j,t} + \epsilon_{i,t} \]

where $r_{i,t}$ represents the signal (e.g., daily return of bond $i$ on day $t$), $f_{j,t}$ the factor realization, and $\beta_{i,j}$ the factor loading—that is, the sensitivity of bond $i$ to factor $j$. The model residual, $\epsilon_{i,t}$, accounts for the idiosyncratic part that cannot be explained by the factors.
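
As a minimal sketch of this decomposition, the following NumPy snippet simulates a toy panel from randomly drawn loadings and factor realizations. The sizes and the variable names (`B`, `F`, `eps`, `R`) are illustrative assumptions, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 6, 2, 250                      # bonds, factors, periods (toy sizes)

B = rng.normal(size=(N, K))              # loadings beta[i, j]
F = rng.normal(size=(K, T))              # factor realizations f[j, t]
eps = 0.1 * rng.normal(size=(N, T))      # idiosyncratic residuals eps[i, t]

# Matrix form of r[i, t] = sum_j beta[i, j] * f[j, t] + eps[i, t]
R = B @ F + eps

# Element-wise check of the summation for one bond and one day.
i, t = 3, 10
r_it = sum(B[i, j] * F[j, t] for j in range(K)) + eps[i, t]
```

The matrix product `B @ F` computes the same sum over factors for every bond and every day at once.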

Factor models differ in how they define and estimate the loadings and realizations. For example, models like CAPM use macroeconomic variables as factors and estimate the loadings through time-series regression for each bond. In contrast, models like BARRA derive loadings from firm characteristics and estimate the realizations using cross-sectional regression at each time period. Statistical models, like ours, treat both loadings and realizations as unknowns, requiring estimates through statistical techniques.

## Autoencoders

One way to define the factors is by parameterizing them as functions of the input signals. Principal component analysis (PCA) is a widely used technique that exemplifies this approach. It is commonly used in finance to identify the principal sources of variations in financial datasets by seeking orthogonal linear projections that maximize the variance of the data in a lower-dimensional space. The identified components serve as latent factors. Although effective at reducing dimensionality, PCA is known for its lack of interpretability and inefficient use of contextual information.

Autoencoders extend this idea by employing neural networks to learn compressed representations of the data. An autoencoder is composed of two main blocks: the encoder and the decoder. The encoder (Enc) maps the high-dimensional input data into a lower-dimensional representation within the latent space. The decoder (Dec) reconstructs the input from this latent representation with, hopefully, minimal distortion. This flexible framework allows autoencoders to capture more complex and potentially nonlinear relationships between signals and factors. The overall process can be summarized as:

\[ r_{t} \xrightarrow{\text{Enc}} f_{t} \xrightarrow{\text{Dec}} \hat{r}_{t} \]

where $r_{t} = [r_{1,t}, r_{2,t}, \ldots, r_{N,t}]^{\top} \in \mathbb{R}^{N}$ denotes the vector of bond returns at time $t$, $f_{t} = [f_{1,t}, f_{2,t}, \ldots, f_{K,t}]^{\top} \in \mathbb{R}^{K}$ represents the latent factors capturing common sources of variability among the returns, and $\hat{r}_{t}$ is the reconstructed returns vector. PCA can be viewed as a special case of autoencoders, in which both the encoder and decoder are linear transformations without activation functions. Our work focuses on linear autoencoders with a single hidden layer, as illustrated in Exhibit 1, Panel A, and summarized by the following equations:

\[ f_{j,t} = \sum_{i=1}^{N} w_{j,i} \cdot r_{i,t} \]

\[ \hat{r}_{i,t} = \sum_{j=1}^{K} \beta_{j,i} \cdot f_{j,t} \]

*Strictly speaking, the equivalence is not exact, as PCA ensures that the latent factors are orthogonal.*
# EXHIBIT 1
## An Illustration of Our Proposed Model

### Panel A: Linear AutoEncoder
- The diagram shows a neural network with an encoder and decoder.
- The encoder takes input $r_{i,t}$, where $i \in \{1, 2, ..., N\}$.
- The encoder computes $f_{j,t} = \sum_{i=1}^{N} w_{j,i} \cdot r_{i,t}$.
- The decoder reconstructs $r_{i,t}$ as $\hat{r}_{i,t} = \sum_{j=1}^{k} \beta_{j,i} \cdot f_{j,t}$.
- The network has three nodes in the hidden layer labeled 1, 2, and 3.

### Panel B: Graph Factor Model
- The diagram shows a graph-based neural network with an encoder and a set of factors.
- The input $r_{i,t}$ is processed to produce the reconstruction $\hat{r}_{i,t} = \sum_{j \in \mathcal{N}(i)} \beta_{j,i} \cdot f_{j,t}$.
- The factors include:
  - Country factors: FR, UK
  - Rating factors: A+, A, BBB
  - Maturity factors: 1Y, 5Y, 10Y
- The weights $w_{FR,i}$ are normalized as $\hat{w}_{FR,i} = \frac{w_{FR,i}}{\sqrt{|N(FR)|}}$.
- The factors are represented as colored circles:
  - Beige for Country
  - Pink for Rating
  - Blue for Maturity

### Notes:
- The example considers $k=3$ latent factors for the linear autoencoder (Panel A).
- The graph model uses $M=3$ features.
- The graph includes eight factors derived from three features.
- The fully connected autoencoder lacks interpretability, whereas the sparse graph factor model has interpretable factors based on economic characteristics.


where $w_{j,i}$ are the encoder weights mapping inputs to factors, and $\beta_{j,i}$ are the decoder weights mapping factors back to the reconstructed returns. The objective is to find the optimal weights $(w_{j,i})$ and $(\beta_{j,i})$ that minimize the reconstruction error between the input and output.
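
To make the link between PCA and a linear autoencoder concrete, here is a small sketch that uses the singular value decomposition to build one encoder/decoder pair minimizing the reconstruction error. The random data and the names `W` and `B` are assumptions for illustration; a trained linear autoencoder would generally recover the same subspace but not necessarily these orthogonal factors.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 8, 200, 3
R = rng.normal(size=(N, T))
R = R - R.mean(axis=1, keepdims=True)    # center each bond's series

# Left singular vectors of R give the principal directions across bonds.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
W = U[:, :K].T        # encoder, shape (K, N): f_t = W r_t
B = U[:, :K]          # decoder, shape (N, K): r_hat_t = B f_t

F = W @ R             # factor realizations, shape (K, T)
R_hat = B @ F         # reconstruction, shape (N, T)

recon_error = np.sum((R - R_hat) ** 2)
```

With this choice the reconstruction error equals the sum of the discarded squared singular values, and the rows of `F` are mutually orthogonal, which is the property the footnote on PCA refers to.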

### Graph Factor Model

Similar to PCA, autoencoders lack interpretability as a result of the dense connections between the input/output and latent representations. Moreover, both models rely solely on historical returns, without leveraging additional features that could provide a richer context for decomposing the input signal.

We propose a novel model that incorporates interpretable factors derived from corporate bond characteristics. Each bond is defined by a set of features, such as its country of issuance, credit rating, and maturity. These categorical features serve as the basis for deriving the factors, which correspond to the enumeration of all possible values of the features.

For example, if we consider only two features—country and rating—the resulting factors would consist of the set of all possible countries (e.g., {FR, UK, US}) and the set of all possible ratings (e.g., {A+, A, BBB}). It is important to note that each bond is associated with exactly $M$ factors, where $M$ represents the number of categorical features. In the previous example, $M=2$.

Next, we build a bond-factor graph, which encodes the relationships between bonds and factors. This bipartite graph is constructed as follows: (1) Every bond and every factor is represented by a node, resulting in two disjoint sets of nodes, $\mathcal{U}$ and $\mathcal{V}$, for bond and factor nodes, respectively, and (2) each bond $i \in \mathcal{U}$ is connected to $M$ factors based on its characteristics. We use $\mathcal{N}(i) = \{j \in \mathcal{V} : w_{j,i} \neq 0 \text{ and } \beta_{j,i} \neq 0\}$ to denote the set of factors of bond $i$. For example, $\mathcal{N}(i) = \{\text{FR, A+}\}$ if bond $i$ is issued in France and has an A+ rating. Similarly, $\mathcal{N}(j)$ represents the set of bonds connected to factor $j$.

The bond-factor graph defines the connections in both the encoder and decoder—learnable parameters exist only on edges that are defined.
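
The graph construction described above can be sketched as a binary adjacency matrix built from categorical features. The toy bonds, the feature names, and the helper functions below are hypothetical and only illustrate the idea.

```python
import numpy as np

# Hypothetical toy universe: each bond has M = 2 categorical features.
bonds = [
    {"country": "FR", "rating": "A+"},
    {"country": "FR", "rating": "BBB"},
    {"country": "UK", "rating": "A+"},
    {"country": "US", "rating": "A"},
]
features = ["country", "rating"]

# One factor node per (feature, value) pair that occurs in the universe.
factors = sorted({(f, b[f]) for b in bonds for f in features})
col = {fac: j for j, fac in enumerate(factors)}

# A[i, j] = 1 when bond i is connected to factor j.
A = np.zeros((len(bonds), len(factors)), dtype=int)
for i, b in enumerate(bonds):
    for f in features:
        A[i, col[(f, b[f])]] = 1

def neighbors_of_bond(i):
    """N(i): the factors connected to bond i."""
    return [factors[j] for j in np.flatnonzero(A[i])]

def neighbors_of_factor(fac):
    """N(j): the indices of the bonds connected to a factor."""
    return np.flatnonzero(A[:, col[fac]]).tolist()
```

Every row of `A` sums to $M$, since each bond takes exactly one value per feature.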

Exhibit 1, Panel B illustrates our model in contrast to a linear autoencoder. Interestingly, each part of the autoencoder in the section “Autoencoders” resembles a bond-factor graph, but with a graph that is fully connected, in which each bond is linked to all the factors. Therefore, the dense autoencoder can be seen as a special case of our model in which the features are indistinguishable: Each feature has only one value (factor) shared by all the bonds. Hence, the previous equations can be generalized to:

\[ f_{j,t} = \frac{1}{| \mathcal{N}(j) |} \sum_{i \in \mathcal{N}(j)} w_{j,i} \cdot r_{i,t} \quad \text{(5)} \]

\[ \hat{r}_{i,t} = \sum_{j \in \mathcal{N}(i)} \beta_{j,i} \cdot f_{j,t} \quad \text{(6)} \]

where the main difference lies in the indexes over which the summation is defined. To account for the varying number of connections to each factor and to maintain an economic interpretation of factor realizations, we normalize the weights $w_{j,i}$ by the total number of bonds associated with each factor. We also explore alternative normalization techniques, namely Softmax and ReLU (Agarap 2018), so that each factor realization can be viewed as an investable portfolio (with weights summing to one), representing the basket of bonds associated with that factor.  
Our model resembles a sparsely connected autoencoder (Alessandri et al. 2021).  
Equation 5 illustrates the encoding step, in which the factor realizations are estimated based on bond returns. In the second stage, Equation 6 describes the decoding step, in which bond returns are reconstructed using the factor realizations. The model’s weights are optimized jointly using a gradient descent algorithm.  
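
The encoding and decoding steps can be sketched as masked matrix products, where the adjacency matrix of the bond-factor graph zeroes out every weight that is not on an edge. The toy graph, the random weights, and the array names are assumptions for illustration; no training is performed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy adjacency: 4 bonds x 3 factors (1 = edge in the bond-factor graph).
A = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
])
N, K = A.shape
T = 5
R = rng.normal(size=(N, T))              # bond returns r[i, t]

# Parameters live only on edges: masking zeroes everything else.
W = rng.normal(size=(K, N)) * A.T        # encoder weights w[j, i]
B = rng.normal(size=(N, K)) * A          # decoder weights, stored as B[i, j]

deg = A.sum(axis=0)                      # |N(j)|: number of bonds per factor
F = (W @ R) / deg[:, None]               # encoding step: f[j, t]
R_hat = B @ F                            # decoding step: r_hat[i, t]
```

Because the masked entries are zero, the dense products reduce to sums over $\mathcal{N}(j)$ and $\mathcal{N}(i)$ only.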

Although our model, with $M$ features, shares the same number of parameters as a linear autoencoder, with $K = M$, it offers two distinct advantages. First, the factors are interpretable by construction, as they are directly tied to economic characteristics. Second, our method is more flexible in production environments in which the input size can vary from one time period to another. Indeed, as new bonds are issued and others mature, the dataset used for training may contain many missing values. The graph-based design handles this variation naturally by operating only on the available set of bonds.  
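
The parameter-count claim can be checked with a short calculation. The sketch below uses the sizes reported in the Data section and assumes that neither model has bias terms, which the text does not state explicitly.

```python
N, M = 2625, 7          # bonds and features, as reported in the Data section
k = M                   # latent dimension of the dense linear autoencoder

# Dense linear autoencoder: encoder (k x N) plus decoder (N x k), no biases assumed.
ae_params = k * N + N * k

# Graph factor model: one encoder and one decoder weight per bond-factor edge,
# and every bond has exactly M edges.
n_edges = N * M
gfm_params = 2 * n_edges
```

Both counts come to 36,750 under these assumptions, even though the graph model spreads its weights over many more factors than the dense model.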

**Credit Factor Model**  

Parametrizing the factor realizations as a function of the inputs could be limiting. To place as few constraints as possible on the factors, we propose another variant, called the credit factor model (CFM), in which the factors are defined implicitly. The decoder part remains unchanged, but we consider the factor realizations to be latent and learn them jointly with the factor loadings $\beta_{j,i}$.  

**Matrix formulation.** To understand the key difference between the two variants, we consider the matrix form of the problem. At a higher level, both models can be expressed as follows:  

\[ R_{N \times T} = B_{N \times K} \times F_{K \times T} + \epsilon_{N \times T} \]

where $R$, $B$, and $F$ correspond to the signal, factor loadings, and factor realizations matrices, respectively. $\epsilon$ represents the matrix of residuals. A common feature of both models is that the loading matrix $B$ is designed to be sparse, embedding domain knowledge and facilitating more interpretable decomposition based on the bond-factor graph structure. Specifically, each row contains only $M$ nonzero learnable parameters, corresponding to the relevant set of factors associated with a given bond. The two models differ, however, in how $F$ is defined.  

The graph factor model parametrizes the realizations explicitly as a weighted sum of the input signal, given by $F = W \times R$, where $W$ shares the same sparse structure as $B$ but with different parameters to learn. As a result, GFM resembles a symmetric autoencoder. In contrast, the credit factor model does not parametrize the realizations; instead, $F$ is considered to be latent. This approach is closer to a matrix factorization algorithm but with constrained latent structure on the loadings. Equations 8 and 9 present the objective functions of GFM and CFM, respectively. Notably, CFM has roughly half as many parameters as GFM to learn.  

\[ B^*, W^* = \arg \min_{B, W} \| R - B \times W \times R \|_{F}^2 \quad \text{(8)} \]

---

*The universe of bonds changes over time, resulting in an unbalanced panel of returns. To cope with this, we use EM-PCA (Bailey 2012b), which is an expectation maximization implementation of PCA that is robust to missing values. Our graph-based models, as they rely on set operations and gradient-descent algorithms, are not affected by missing values.*

\[ B^*, F^* = \arg \min_{B, F} \| R - B \times F \|_{F}^2 \quad \text{(9)} \]
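
A minimal sketch of the CFM objective on an unbalanced panel is given below. It uses plain gradient descent rather than the Adam optimizer used in the article, evaluates the loss on observed entries only, and keeps the loadings sparse by masking their gradient with the adjacency matrix. The toy graph, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 12, 4, 60

# Toy bond-factor graph: each bond is linked to 2 of the 4 factors.
A = np.zeros((N, K))
for i in range(N):
    A[i, i % 2] = 1                  # one of factors {0, 1}
    A[i, 2 + (i // 2) % 2] = 1       # one of factors {2, 3}

# Synthetic returns with the same sparse structure, plus missing values.
B_true = rng.normal(size=(N, K)) * A
F_true = rng.normal(size=(K, T))
R = B_true @ F_true + 0.05 * rng.normal(size=(N, T))
R[rng.random(size=(N, T)) < 0.1] = np.nan

observed = ~np.isnan(R)
R0 = np.where(observed, R, 0.0)      # zero-fill; masked out of the loss below

B = 0.1 * rng.normal(size=(N, K)) * A
F = 0.1 * rng.normal(size=(K, T))

def loss(B, F):
    """Squared reconstruction error over observed entries only."""
    return np.sum(((R0 - B @ F) * observed) ** 2)

lr = 0.002
history = [loss(B, F)]
for _ in range(2000):
    E = (R0 - B @ F) * observed      # residuals on observed entries
    grad_B = -2.0 * (E @ F.T) * A    # gradient restricted to graph edges
    grad_F = -2.0 * (B.T @ E)
    B -= lr * grad_B
    F -= lr * grad_F
    history.append(loss(B, F))
```

Since both the initial loadings and their gradient are masked by `A`, entries of `B` outside the graph stay exactly zero throughout training.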

After fitting the model, the factor realizations are estimated differently by the two models. For GFM, the input spreads are projected using the learned weights $W^*$ to obtain the realizations. For CFM, the realizations of each new period are obtained by projecting the input onto the loadings (i.e., $F = B^{\dagger} \times R$). Here, $B^{\dagger} = (B^{\top} B)^{-1} \times B^{\top}$, computed from the learned loadings, denotes the Moore–Penrose pseudo-inverse, which provides a least squares solution to the projection of $R$ onto the column space of $B$. The algorithms describing the learning and inference steps for GFM and CFM are provided in “Pseudocode.”
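
The CFM inference step can be illustrated with a short NumPy sketch. The loadings here are random stand-ins for learned ones and are assumed to have full column rank, which is what makes the closed-form expression valid.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10, 3
B = rng.normal(size=(N, K))              # stand-in for learned loadings
r_new = rng.normal(size=(N, 1))          # returns for a new period

B_pinv = np.linalg.inv(B.T @ B) @ B.T    # (B'B)^{-1} B'
f_new = B_pinv @ r_new                   # projected factor realizations

# The same realizations, obtained as an ordinary least squares solution.
f_lstsq, *_ = np.linalg.lstsq(B, r_new, rcond=None)
```

In practice `np.linalg.pinv` or `np.linalg.lstsq` is usually preferable to forming the inverse explicitly, as both remain well behaved when the loadings are close to rank deficient.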

## Experiments

### Data
The dataset comprises $N = 2{,}625$ time series representing the daily z-spread returns of corporate bonds, from August 2022 to the end of March 2024 ($T = 435$ days). Each bond is associated with a static vector of $M = 7$ categorical features. The returns are stacked into an $N \times T$ matrix, whereas the categorical features are one-hot encoded into an $N \times K$ binary matrix $A$. The latter serves as the adjacency matrix of the bond-factor graph, where each entry, $A[i,j]$, represents a connection between bond $i$ and factor $j$.

As corporate bonds trade over the counter with limited liquidity, their prices can exhibit slight variations, introducing some noise in spread calculations. To mitigate this, we preprocess the spreads by excluding bonds with less than six months to maturity and winsorizing returns at the 1st and 99th percentiles. Exhibit 2 illustrates three examples of daily z-spread and the return distribution of the entire dataset.
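
The winsorizing step can be sketched as follows. The text does not say whether the percentiles are computed over the pooled panel, per bond, or per day, so this example assumes the pooled case; the `winsorize` helper and the synthetic data are hypothetical.

```python
import numpy as np

def winsorize(returns, lower_pct=1.0, upper_pct=99.0):
    """Clip values to the given percentiles, ignoring NaNs when computing them."""
    lo, hi = np.nanpercentile(returns, [lower_pct, upper_pct])
    return np.clip(returns, lo, hi)

rng = np.random.default_rng(0)
returns = rng.normal(scale=3.0, size=10_000)          # synthetic returns in bp
returns[:5] = [250.0, -300.0, 180.0, -220.0, 400.0]   # a few extreme prints

clean = winsorize(returns)
```

Unlike trimming, winsorizing keeps every observation and only caps the extreme values at the chosen percentiles.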

### Exhibit 2
**Z-Spread Returns for Three Different Bonds and Return Distribution of the Full Dataset**

- Panel A: Return Time Series  
  **Z-Spread Returns**  
  - XS1627947440  
  - FR001400KY44  
  - XS235987224  

  *Graph showing the daily z-spread returns over time from October 2022 to March 2024, with the x-axis labeled "Data" and the y-axis labeled "Returns [bp]".*

- Panel B: Return Distribution  
  **Return Distribution From Aug 2022 to Mar 2024**  
  - Histogram with the label "Spread" in the legend, showing the distribution of returns in basis points (bp). The histogram is centered around zero, with most returns within approximately -10 to 10 bp, and the count axis indicating the frequency of returns.


### Additional Notes
Downloaded from [https://pm-research.com/content/ijjjfds](https://pm-research.com/content/ijjjfds), at Canada Pension Plan Inv Board on July 4, 2025. Copyright 2025 With Intelligence LLC.  
It is illegal to make unauthorized copies, forward to an unauthorized user, post electronically, or store on shared cloud or hard drive without Publisher permission.
Notably, as bonds have finite lives, some naturally exit the bond universe over time, whereas new ones are continuously added.  
**Static Features.** Each bond is characterized by six categorical features. These features are summarized in Exhibit 3: The first column lists the features’ names, the second shows the cardinality, and the last two give a brief description and some examples of the different values. We add a seventh (virtual) feature that is common to all bonds to simulate a market component. "Bond Features", in the appendix section, provides additional details.

### Model
We compare our models—GFM and CFM—to three key benchmarks:
- **AE:** We use a linear autoencoder, which is implemented as a special case of GFM with seven artificial features, each having one factor that is connected to all bonds. In theory, it should yield a performance similar to PCA.
- **PCA:** We use an EM-PCA, which allows for estimating the principal components on unbalanced panel datasets. We set the number of principal components $k$ equal to the number of features used by GFM so that $k = M = 7$, and give a weight of $10^{-5}$ to missing values.
- **IPCA:** We use an instrumented PCA with static loading specification and $k=7$. The factors are considered latent, and the loadings are parametrized as a linear function of the bond features. That is, $B = A \times \Gamma$, with A being the feature matrix and $\Gamma$ a learnable projection shared by all bonds.
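
The static loading specification used for the IPCA benchmark can be sketched as a single matrix product. The toy feature matrix and the random `Gamma` are assumptions for illustration; in the benchmark, $\Gamma$ is estimated from the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hot feature matrix: 4 bonds, 5 factor columns, 2 features per bond.
A = np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
])
n_bonds, n_cols = A.shape
k = 2                                    # number of latent factors

Gamma = rng.normal(size=(n_cols, k))     # projection shared by all bonds
B = A @ Gamma                            # loadings, shape (n_bonds, k)
```

With a one-hot `A`, each bond's loading vector is the sum of the rows of `Gamma` selected by its features, so two bonds with identical features receive identical loadings.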

The models have few parameters to tune. Each model is trained for 400 epochs, and the parameters of our models (GFM, CFM) and AE are optimized using Adam (Kingma and Ba 2014), with a learning rate of 0.05 and a weight decay of $10^{-5}$.

### Evaluation
**Reconstruction metrics.** We rely on two key metrics to evaluate the goodness of fit. First, the mean squared error (MSE) of the model residuals. Second, the coefficient of determination, $R^2$, which measures the proportion of variance captured by the model. Although the models are optimized to capture cross-sectional variance, we use three different ways to compute the $R^2$.

---

### EXHIBIT 3
**Brief Description of Features and Factors**

| Feature          | Cardinality | Description                     | Factor Examples             |
|------------------|--------------|---------------------------------|------------------------------|
| Country          | 17           | Country of issuance             | France, UK, US             |
| Sector           | 10           | Bloomberg sector                | Financials, Utilities      |
| Rating           | 8            | Bloomberg rating                | AA, A+, BBB                |
| Tenor            | 5            | Bond tenor                      | 3Y, 5Y, 10Y                |
| Issued Amount    | 3            | Issued amount                   | 500M, 1000M, 1500M        |
| Debt Type        | 2            | Bond seniority                  | Preferred/Non-Preferred   |
| Market           | 1            | Virtual factor                  | —                            |

**NOTE:** There are $M=7$ features, and $K=46$ factors in total.
## Cross-Sectional \( R^2 \)
This metric evaluates the model’s ability to capture daily cross-sectional variation in spread movement, that is, across bonds in different periods.

\[
CS R^2 = \frac{1}{T} \sum_{t=1}^{T} \left( 1 - \frac{\sum_{i=1}^{N} (r_{i,t} - \hat{r}_{i,t})^2}{\sum_{i=1}^{N} r_{i,t}^2} \right)
\quad \text{(10)}
\]

## Time Series \( R^2 \)
This metric focuses on each bond’s spread variation over time. It measures how well the model explains the temporal spread fluctuations at the bond level.

\[
TS R^2 = \frac{1}{N} \sum_{i=1}^{N} \left( 1 - \frac{\sum_{t=1}^{T} (r_{i,t} - \hat{r}_{i,t})^2}{\sum_{t=1}^{T} r_{i,t}^2} \right)
\quad \text{(11)}
\]

## Total \( R^2 \)
This metric summarizes the explanatory power of systematic risk factors, within a given model specification, in describing the realized riskiness in the panel of individual bond returns.

\[
Total R^2 = 1 - \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} (r_{i,t} - \hat{r}_{i,t})^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} r_{i,t}^2}
\quad \text{(12)}
\]
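
Equations (10) to (12), together with the MSE, can be computed directly from the realized and reconstructed panels. The following is a minimal NumPy sketch, assuming balanced arrays of shape $(N, T)$ and an MSE averaged over all entries; the function name `reconstruction_metrics` and the toy data are illustrative and do not come from the article.

```python
import numpy as np

def reconstruction_metrics(r, r_hat):
    """Return CS R^2, TS R^2, Total R^2 and MSE for (N, T) panels (bonds x periods)."""
    sq_err = (r - r_hat) ** 2
    sq_ret = r ** 2
    # Eq. (10): one R^2 per period (sums over bonds), averaged over periods.
    cs_r2 = np.mean(1.0 - sq_err.sum(axis=0) / sq_ret.sum(axis=0))
    # Eq. (11): one R^2 per bond (sums over periods), averaged over bonds.
    ts_r2 = np.mean(1.0 - sq_err.sum(axis=1) / sq_ret.sum(axis=1))
    # Eq. (12): pooled over the whole panel.
    total_r2 = 1.0 - sq_err.sum() / sq_ret.sum()
    # Assumption: MSE is the mean squared residual over all panel entries.
    mse = sq_err.mean()
    return {"cs_r2": cs_r2, "ts_r2": ts_r2, "total_r2": total_r2, "mse": mse}

rng = np.random.default_rng(0)
r = rng.normal(size=(5, 20))                       # toy panel, not real bond data
metrics_perfect = reconstruction_metrics(r, r)                 # perfect reconstruction
metrics_zero = reconstruction_metrics(r, np.zeros_like(r))     # always predicting zero
```

A perfect reconstruction gives an $R^2$ of one on all three measures, whereas predicting zero everywhere gives an $R^2$ of zero, because the denominators use uncentered squared returns.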

---

### EXHIBIT 4
**In-Sample Comparison of GFM and CFM with Three Baseline Models: AE, PCA, and IPCA**

| Fold   | Model | Cross-Sectional \( R^2 \) | Time Series \( R^2 \) | Total \( R^2 \) | Total MSE |
|--------|-------|---------------------------|-----------------------|-----------------|-----------|
| Fold 1 | AE     | 50.9      | 65.2   | 63.0  | 2.648 |
|        | PCA    | 51.0      | 65.3   | 63.1  | 2.644 |
|        | IPCA   | 33.7      | 51.7   | 45.5  | 3.899 |
|        | GFM    | 59.2      | 68.3   | 69.6  | 2.175 |
|        | CFM    | 60.0      | 68.8   | 70.3  | 2.127 |
| Fold 2 | AE     | 51.3      | 66.2   | 63.6  | 2.456 |
|        | PCA    | 51.3      | 66.3   | 63.7  | 2.450 |
|        | IPCA   | 34.4      | 53.7   | 47.6  | 3.535 |
|        | GFM    | 59.5      | 69.2   | 69.9  | 2.029 |
|        | CFM    | 60.4      | 69.7   | 70.6  | 1.983 |
| Fold 3 | AE     | 50.4      | 65.8   | 63.0  | 2.391 |
|        | PCA    | 50.9      | 65.9   | 63.2  | 2.376 |
|        | IPCA   | 33.9      | 53.9   | 47.6  | 3.385 |
|        | GFM    | 59.3      | 69.0   | 69.7  | 1.957 |
|        | CFM    | 60.1      | 69.6   | 70.5  | 1.908 |

**NOTES:** The exhibit reports various \( R^2 \) metrics (in percent) and the MSE across three in-sample periods: Fold 1 (March 29, 2023, to December 31, 2023), Fold 2 (April 28, 2023, to January 31, 2024), and Fold 3 (May 25, 2023, to February 29, 2024). Models are trained and evaluated on the same sample in each fold. Bold and underlined values denote the best and second-best performance, respectively.

---

### EXHIBIT 5

| Fold   | Model | Cross-Sectional $R^2$ | Time Series $R^2$ | Total $R^2$ | Total MSE |
|--------|-------|-----------------------|-------------------|-------------|-----------|
| Fold 1 | AE    | 53.0                  | 67.1              | 67.7        | 2.965     |
|        | PCA   | 53.0                  | 66.6              | 67.7        | 2.969     |
|        | IPCA  | 43.4                  | 58.3              | 60.9        | 3.590     |
|        | GFM   | 59.4                  | 70.0              | 72.8        | 2.501     |
|        | CFM   | 59.8                  | 69.5              | 73.1        | 2.472     |
| Fold 2 | AE    | 38.4                  | 52.5              | 47.9        | 2.714     |
|        | PCA   | 38.3                  | 52.4              | 47.8        | 2.719     |
|        | IPCA  | 29.3                  | 45.3              | 39.7        | 3.141     |
|        | GFM   | 48.4                  | 54.9              | 56.7        | 2.257     |
|        | CFM   | 48.7                  | 55.0              | 56.9        | 2.244     |
| Fold 3 | AE    | 36.1                  | 46.7              | 48.3        | 1.842     |
|        | PCA   | 36.7                  | 45.6              | 49.0        | 1.817     |
|        | IPCA  | 28.5                  | 38.7              | 39.5        | 2.155     |
|        | GFM   | 45.0                  | 49.8              | 56.0        | 1.566     |
|        | CFM   | 45.7                  | 50.1              | 56.8        | 1.541     |

**NOTES:** The exhibit reports various $R^2$ metrics (in percent) and the MSE across three out-of-sample periods: Fold 1 (January 1, 2024, to January 31, 2024), Fold 2 (February 1, 2024, to February 29, 2024), and Fold 3 (March 1, 2024, to March 31, 2024). In each fold, the models are trained on the preceding nine months. Bold and underlined values denote the best and second-best performance, respectively.

The most significant improvements are observed in cross-sectional and total $R^2$, in which GFM and CFM show a substantial increase of 7% to 10%, underscoring their superior ability to explain spread variations in the returns panel. Although the gain in time-series $R^2$ is smaller, our models continue to outperform others in capturing the temporal spread variations at the bond level. AE and PCA exhibit comparable performance, reaffirming their theoretical equivalence. In contrast, IPCA performs significantly worse on all metrics. We attribute the low performance of IPCA to its instrumented loading structure, which we consider ill-suited for static loadings. IPCA assumes equivalence among bonds with identical features; such bonds can, however, display differing performances over the same period. For example, two bonds sharing the same factor may show varying sensitivities to changes in that factor’s realization. Unlike IPCA, which fails to account for this asset-specific variability, our models enable similar bonds to respond differently to the same factor, enhancing their flexibility and explanatory power.

The out-of-sample results validate the ability of our models to generalize effectively to unseen data. In each fold, the models are trained on nine months of data, and the learned loadings are then applied to reconstruct daily returns in the subsequent month. The consistent outperformance of our models in the test periods proves that their strong performance is not merely a result of statistical overfitting on the training data. Two key findings emerge from these results: First, our graph-based models explain more variance and provide a better description of systematic risk in bond returns; second, they generalize more effectively to new data and exhibit stability across folds.

### Ablation Study

**Window size.** To evaluate the impact of training window size on model performance, Exhibit 6 shows the out-of-sample reconstruction error (MSE) for window sizes ranging from 1 to 18 months. The performance trend remains consistent across all folds and window sizes, with CFM consistently surpassing GFM, and both models achieving lower errors than PCA. An optimal window size for PCA, GFM, and CFM is observed at 9 months, with PCA achieving equally good performance around the window (at 8 and 10 months).

**Weight normalization.** Exhibit 7 compares the performance of autoencoder-based models (AE and GFM) under three weight normalization techniques: simple averaging, sparse softmax (constraining nonzero weights to be positive and sum to one), and ReLU (allowing only nonnegative weights). We explored these constraints to assess their potential to enhance generalization and improve out-of-sample performance. The results indicate, however, that both models achieve the highest $R^2$ and lowest MSE with simple average normalization. Although ReLU exhibits performance closer to simple averaging, softmax consistently results in the lowest performance in out-of-sample reconstruction. This indicates that softmax and ReLU introduce unnecessary

### EXHIBIT 6
**Out-of-Sample Reconstruction Error (MSE) for PCA, GFM, and CFM across Three Different Folds (January, February, and March 2024) and Varying Training Window Sizes (1 to 18 months)**

*[Figure not reproduced: plots of out-of-sample MSE against training window size in months for PCA, GFM, and CFM.]*

**NOTE:** The red line denotes average performance across the three folds.

### EXHIBIT 7
**Out-of-Sample Performance Comparison of AE (blue) and GFM (orange) with Three Different Weight Normalization Techniques: Simple Average, Softmax, and ReLU**

*[Figure not reproduced: bar charts of cross-sectional $R^2$, time-series $R^2$, panel $R^2$, and panel MSE for Folds 1 to 3. Bars are blue for AE and orange for GFM, with hatch patterns indicating the normalization technique.]*

*Download link: [https://pm-research.com/content/jjjjds](https://pm-research.com/content/jjjjds), at Canada Pension Plan Inv Board on July 4, 2025. Copyright 2025 With Intelligence LLC.*
# Summer 2025

## The Journal of Financial Data Science | 13

---

### Notation for Normalization Methods:
- Model: $\hat{w}_{i,j} = \frac{1}{|N(j)|} \cdot w_{i,j}$
- Model-Softmax: $\hat{w}_{i,j} = \frac{\exp(w_{i,j})}{\sum_{i' \in N(j)} \exp(w_{i',j})}$
- Model-ReLU: $\hat{w}_{i,j} = \frac{1}{|N(j)|} \cdot \max(0, w_{i,j})$
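
The three normalization rules listed above can be sketched as a single helper. This is an illustrative NumPy version, assuming that $N(j)$ is the set of bonds connected to factor $j$ in the adjacency matrix and that every factor has at least one connected bond; the function name `normalize_weights` and the toy matrices are hypothetical.

```python
import numpy as np

def normalize_weights(w, a, method="average"):
    """Normalize weights w (N, K) over the neighbours N(j) of each factor j.

    a is the binary bond-factor adjacency (N, K); entries outside the graph stay zero.
    Assumption: every column of a contains at least one nonzero entry.
    """
    degree = a.sum(axis=0, keepdims=True)          # |N(j)| for each factor j
    if method == "average":
        return a * w / degree
    if method == "relu":
        return a * np.maximum(0.0, w) / degree
    if method == "softmax":
        # Softmax restricted to connected bonds; subtracting the column
        # maximum leaves the result unchanged and avoids overflow.
        masked = np.where(a > 0, w, -np.inf)
        e = np.exp(masked - masked.max(axis=0, keepdims=True))
        return e / e.sum(axis=0, keepdims=True)
    raise ValueError(f"unknown method: {method}")

a = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])     # toy graph: 3 bonds, 2 factors
w = np.array([[1.0, 5.0], [-2.0, 0.5], [3.0, -1.0]])   # toy raw weights
w_avg = normalize_weights(w, a, "average")
w_relu = normalize_weights(w, a, "relu")
w_soft = normalize_weights(w, a, "softmax")
```

Only the softmax variant forces the nonzero weights of each factor to be positive and to sum to one; the ReLU variant only removes negative weights, and simple averaging leaves signs unconstrained.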

---

### Model Robustness

This section explores whether the models rely heavily on specific bonds to estimate their factors. Specifically, we examine model performance when some bonds unexpectedly have missing values. After training, each bond is randomly assigned to one of two groups: *masked* and *unmasked*, with a probability $p$ of being in the masked group. During inference, masked bonds are excluded from factor estimation, meaning their spreads are treated as missing (replaced with zeros) when computing the factor realizations. The estimated factors are then used to reconstruct spreads for all bonds, regardless of group assignment. Exhibit 8 illustrates how Panel $R^2$ of both groups varies with different masking probabilities.  
*Footnote: To account for randomness, this procedure is repeated 10 times with different random seeds.*
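
The masking experiment described above can be sketched for any model with a linear encoder and decoder. The snippet below is a hedged illustration rather than the authors' code: it uses a PCA projection as a stand-in model, and the function name `masked_panel_r2`, the toy data, and the choice of $p = 0.3$ are assumptions.

```python
import numpy as np

def masked_panel_r2(r, w, b, p, seed):
    """Panel R^2 of masked and unmasked bonds for one random mask.

    r: (N, T) returns; w, b: (N, K) encoder and decoder weights; p: masking probability.
    """
    rng = np.random.default_rng(seed)
    masked = rng.random(r.shape[0]) < p            # True = bond excluded from estimation
    r_in = np.where(masked[:, None], 0.0, r)       # masked spreads replaced with zeros
    f = w.T @ r_in                                 # (K, T) factor realizations
    r_hat = b @ f                                  # reconstruct all bonds

    def panel_r2(rows):
        if not rows.any():
            return np.nan                          # group is empty for this draw
        return 1.0 - ((r[rows] - r_hat[rows]) ** 2).sum() / (r[rows] ** 2).sum()

    return panel_r2(masked), panel_r2(~masked)

rng = np.random.default_rng(0)
r = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 100)) + 0.1 * rng.normal(size=(20, 100))
u, _, _ = np.linalg.svd(r, full_matrices=False)
w = b = u[:, :3]                                   # PCA projection as a stand-in model
# Repeat over 10 seeds, as in the procedure described in the text.
results = [masked_panel_r2(r, w, b, p=0.3, seed=s) for s in range(10)]
mean_masked, mean_unmasked = np.nanmean(np.array(results), axis=0)
```

With $p = 0$ no bond is masked, so the unmasked score coincides with the baseline reconstruction and the masked group is empty.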

First, the performance difference between masked and unmasked bonds remains largely consistent, except for CFM under high masking probabilities ($p \geq 0.7$). For instance, PCA shows a near-zero performance gap between the two groups. This indicates that, even if their explanatory capacity diminishes, the factors maintain a uniform explanatory power across both groups, regardless of the data used for their estimation. In other words, whether the factors are estimated using 30% or 80% of the data, they exhibit equal explanatory power across all bonds.

Second, all models demonstrate resilience to a small amount of missing data, as evidenced by the close performance of unmasked bonds to the baseline (dashed black line) at low masking probabilities ($p=0.1$). As the masking probability increases ($p \geq 0.2$), however, PCA and GFM exhibit notable declines in performance (decreasing blue curves), whereas CFM remains more stable (flat blue curve). This difference arises from the models’ factor estimation approaches: CFM dynamically finds factors that minimize the reconstruction error on the available data, whereas PCA and GFM rely on a more rigid approach using pre-learned encoder weights, leading to sharper declines in performance.
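
To make the contrast concrete, the sketch below estimates factors by least squares on the available bonds only, which is one way to read "minimizing the reconstruction error on the available data". It is a toy illustration under stated assumptions: the loadings are random numbers standing in for $A \odot B$, the returns are noiseless, and the helper name `cfm_factors_available` is hypothetical.

```python
import numpy as np

def cfm_factors_available(r_t, loadings, available):
    """Least-squares factors computed from the bonds flagged as available."""
    f_t, *_ = np.linalg.lstsq(loadings[available], r_t[available], rcond=None)
    return f_t

rng = np.random.default_rng(1)
n_bonds, n_factors = 30, 3
loadings = rng.normal(size=(n_bonds, n_factors))   # stands in for the learned loadings
f_true = rng.normal(size=n_factors)
r_t = loadings @ f_true                            # noiseless returns for one day
available = rng.random(n_bonds) >= 0.5             # roughly half of the bonds are missing
available[:n_factors] = True                       # keep at least K bonds in the toy problem
f_hat = cfm_factors_available(r_t, loadings, available)
r_hat = loadings @ f_hat                           # reconstruct every bond, masked or not
```

In this noiseless setting the factors are recovered from the retained bonds alone, so the masked bonds are reconstructed as well. An encoder with fixed weights would instead see zeros for the missing bonds and produce shrunken factor estimates.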

Finally, the performance of the masked (red) group provides insights into whether the estimated factors overfit the available data or generalize well by explaining bonds that were not used in their estimation. Notably, CFM factors preserve high explanatory power and maintain a clear advantage over other models, even under high masking probabilities. For instance, at $p=0.5$, the panel $R^2$ of the masked group across the three folds is 65%, 46%, and 41% for CFM, 51%, 38%, and 37% for PCA, and 50%, 35%, and 36% for GFM (i.e., when using only half of the bonds to estimate the factors, CFM explains up to roughly 30% more variance, in relative terms, in the remaining bonds than PCA or GFM). These findings highlight the potential of CFM to handle incomplete datasets in real-world scenarios, in which missing data is common, making it a more reliable tool for robust factor analysis.
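
The relative gain quoted in parentheses follows from the ratios of the masked-group panel $R^2$ values reported in this paragraph. A short check, with variable names chosen here for illustration:

```python
# Masked-group panel R^2 (in percent) at p = 0.5 for Folds 1 to 3, as quoted in the text.
masked_r2 = {"CFM": [65, 46, 41], "PCA": [51, 38, 37], "GFM": [50, 35, 36]}

def relative_gain(model, baseline):
    """Relative increase in explained variance per fold, as a fraction."""
    return [m / b - 1.0 for m, b in zip(masked_r2[model], masked_r2[baseline])]

gain_vs_pca = relative_gain("CFM", "PCA")
gain_vs_gfm = relative_gain("CFM", "GFM")
largest_gain = max(gain_vs_pca + gain_vs_gfm)
```

The largest ratio is 46/35 (CFM against GFM in Fold 2), a relative gain of about 31%, while the Fold 1 comparison with GFM gives 65/50, a gain of 30%.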

### EXHIBIT 8
**Comparison of Model Robustness under Varying Masking Probabilities**

*[Figure not reproduced: grid of line plots of panel $R^2$ (y-axis, 0.0 to 0.8) against masking probability $p$ (x-axis, 0.1 to 0.9).]*

**NOTES:** Rows correspond to different models (PCA, GFM, CFM) and columns to different folds. For each fold, the factor realizations are estimated using a random subset of (unmasked) bonds and then used to reconstruct spreads for all bonds. The graphs display the panel $R^2$ of both masked (red) and unmasked (blue) bonds for different masking probabilities. Solid lines denote the average performance across 10 random seeds, whereas shaded areas represent the standard deviation. The dashed black line indicates the baseline performance ($p=0$).

# DISCUSSION
This article introduces two dimensionality reduction algorithms that integrate domain knowledge through a graph-based approach. We posit that bond spreads are influenced by certain key factors, which are unobserved and derived from static features such as credit rating and sector. Our methods construct a bond-factor graph that captures the relationships between bonds and their associated features. The sparse connectivity of this graph informs the structure of our models: In GFM, the connections of a linear autoencoder are configured to align with the graph structure, enforcing an interpretable mapping between the input space (bonds) and the latent space (factors), and vice versa; in CFM, the loading matrix of a linear factor model adopts the same sparse structure as the graph’s adjacency matrix.
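
As an illustration of the bond-factor graph, the sketch below builds a binary adjacency matrix from categorical features by one-hot encoding. It is a minimal example under stated assumptions: the three toy bonds are invented, the helper name `build_adjacency` is hypothetical, and connecting the virtual market factor to every bond is an assumption made here.

```python
import numpy as np

def build_adjacency(bonds, features):
    """One-hot bond-factor adjacency A (N, K) and the factor name of each column."""
    names = ["market"]                             # assumption: linked to every bond
    for feat in features:
        names += [f"{feat}_{v}" for v in sorted({b[feat] for b in bonds})]
    col = {name: j for j, name in enumerate(names)}
    a = np.zeros((len(bonds), len(names)), dtype=int)
    a[:, col["market"]] = 1
    for i, bond in enumerate(bonds):
        for feat in features:
            a[i, col[f"{feat}_{bond[feat]}"]] = 1  # edge between bond i and its category
    return a, names

bonds = [  # invented examples, not taken from the data set
    {"sector": "Financial", "rating": "A+", "country": "FRANCE"},
    {"sector": "Utilities", "rating": "BBB", "country": "FRANCE"},
    {"sector": "Financial", "rating": "BBB", "country": "GERMANY"},
]
features = ["sector", "rating", "country"]
A, factor_names = build_adjacency(bonds, features)
```

Each bond is connected to exactly one factor per feature plus the market factor, so every row of `A` has the same number of nonzero entries and the matrix stays sparse as the number of categories grows.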


Our results show that integrating this knowledge not only improves explanatory power but also enhances the interpretability and robustness of the models. The practical value of our approach lies in its interpretable decomposition, which, unlike other statistical and hybrid approaches, enables the models to uncover relevant factors with clear financial interpretations. This makes the model particularly useful for applications such as risk management, in which understanding factor contributions is essential. For instance, the learned decomposition could facilitate performance attribution by breaking down portfolio returns (or risks) into contributions from distinct economic factors. In future work, we aim to explore this application in greater depth, along with others, such as covariance estimation and portfolio construction.

Despite their strengths, our methods have a notable limitation: Unlike PCA, the learned factors are not necessarily uncorrelated. Exhibits A2, A3, and A4 illustrate the correlation matrix of the learned factors across various models. Although PCA (Exhibit A2) ensures orthogonality, the factors in both GFM (Exhibit A3) and CFM (Exhibit A4) reveal a higher degree of correlation, particularly among factors within the same category (e.g., sector), as highlighted by the block structure. Notably, CFM displays lower correlations compared to GFM. It is worth noting that having orthogonal factors is not necessarily desirable: PCA often fails to uncover useful factors beyond the first few components, limiting its ability to capture diverse sources of variations. Conversely, very high correlations between factors can render a model ineffective, as they may collapse into just a few dominant ones. Investigating the trade-off between interpretability and factor independence remains an open question and is a topic for future exploration.
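
The orthogonality property can be checked numerically. The sketch below uses random toy data, not the bond panel, and compares in-sample factor correlations for PCA with those of simple sector-average factors, which serve only as a stand-in for graph-based factors; the group structure is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.normal(size=(8, 250))                      # toy (N, T) panel, not real bond data
r = r - r.mean(axis=1, keepdims=True)              # demean each bond over time

# PCA factors: project the returns on the top-K left singular vectors.
u, s, vt = np.linalg.svd(r, full_matrices=False)
k = 3
pca_factors = u[:, :k].T @ r                       # (K, T)
pca_corr = np.corrcoef(pca_factors)                # each row is one factor

# Sector-average factors for two hypothetical sectors of four bonds each.
a = np.kron(np.eye(2), np.ones((4, 1)))            # (8, 2) membership matrix
avg_factors = np.linalg.pinv(a) @ r                # (2, T) sector averages
avg_corr = np.corrcoef(avg_factors)
```

The PCA correlation matrix is the identity up to rounding error, by construction. Nothing forces the off-diagonal entries of `avg_corr` to vanish, which mirrors the observation that factors tied to group membership can be correlated.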

# CONCLUSION

We present a novel multifactor framework that retrofits two prominent models—a linear autoencoder and matrix factorization—with domain-informed sparsity to improve interpretability. By leveraging a bipartite bond-factor graph to inform the model architectures, the framework integrates bond features into the factor specification, enabling the latent space to capture both statistical variance and domain-specific knowledge. This design makes the framework particularly well-suited for applications such as performance attribution and risk decomposition.

Experimental results on modeling the spread returns of corporate bonds demonstrate that these models surpass PCA, traditional autoencoders, and IPCA in explaining greater variance. Additionally, the CFM variant exhibits improved robustness to missing data, making it a reliable solution for scenarios with data inconsistency. Future work will focus on extending the framework to incorporate dynamic factors, such as time to maturity, and other observable factors, such as interest rates. We also aim to explore the broader applications, including profit-and-loss attribution, risk management, and portfolio construction, to further assess the model’s practical utility.

# PSEUDOCODE

This section provides the implementation details of GFM and CFM. For simplicity, we present the algorithms for the case in which the returns matrix is balanced. If $R$ contains missing data, an additional Boolean matrix $R_{mask}$ is needed to identify and select the non-missing bonds in each period.

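- **Input:** Returns matrix $R$ (size $N \times T$), where $N$ is the number of bonds and $T$ is the number of periods, and the binary bond-factor adjacency matrix $A$ (size $N \times K$).
- **Output:** Learned weights (the loadings $B$ and, for GFM, the encoder weights $W$), from which the factors are computed at inference.

## GFM Algorithm

1. Randomly initialize the encoder weights $W$ and the decoder weights $B$ (each of size $N \times K$), and sparsify both with $A$.
2. For each epoch:
   - Compute the factors $f_t = (A \odot W)^T r_t$ and the reconstruction $\hat{r}_t = (A \odot B) f_t$ for every period $t$.
   - Accumulate the squared reconstruction error and add the penalty $\lambda (\|A \odot W\|_F^2 + \|A \odot B\|_F^2)$.
   - Update $W$ and $B$ with a gradient step.
3. Return $B$, $W$.

## CFM Algorithm

1. Randomly initialize the loadings $B$ (size $N \times K$), sparsified with $A$, and the training factors $F$ (size $K \times T$).
2. For each epoch:
   - Compute the reconstruction $\hat{r}_t = (A \odot B) F[:, t]$ for every period $t$.
   - Accumulate the squared reconstruction error and add the penalty on $A \odot B$.
   - Update $B$ and $F$ with a gradient step.
3. Return $B$; at inference, the factors are obtained from the pseudo-inverse of $A \odot B$.

*Note:* Both algorithms are trained by gradient descent on a regularized squared reconstruction loss, with sparsity imposed by masking the weights with $A$.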
The algorithms have three key hyperparameters: the number of epochs ($E$), the learning rate ($\eta$), and the regularization term ($\lambda$). Convergence can be determined using various criteria, such as when the loss improvement falls below a defined threshold. In our case, $E = 400$ (with $\eta = 0.05$ and $\lambda = 10^{-5}$) ensures that the loss converges with a precision of $10^{-4}$.
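
One simple form of the threshold criterion mentioned above is sketched below. The helper `has_converged` and the placeholder loss curve are hypothetical; in practice the loss would come from the training loop.

```python
def has_converged(losses, tol=1e-4, patience=1):
    """True once the loss improved by less than tol over the last `patience` epochs."""
    if len(losses) <= patience:
        return False
    return losses[-1 - patience] - losses[-1] < tol

losses = []
for epoch in range(400):                           # E = 400 as in the text
    losses.append(1.0 / (epoch + 1))               # placeholder loss curve, 1/(epoch + 1)
    if has_converged(losses):
        break
stopped_at = len(losses)
```

For this placeholder curve the improvement between consecutive epochs first drops below $10^{-4}$ at the 101st epoch, so the loop stops well before the maximum of 400.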

The element-wise (Hadamard) product between two matrices, $A$ and $B$, is denoted by $A \odot B$, and the product between a matrix and a vector is written as $B \mathbf{r}$.

## GFM

Algorithms 1 and 2 outline the training procedure and the inference steps on a new data point. Note that when $A$ is an all-ones matrix, GFM reduces to a linear autoencoder.

### Algorithm 1: Gradient-Descent Algorithm for GFM

**Data:** $R \in \mathbb{R}^{N \times T}$, $A \in \{0, 1\}^{N \times K}$  
$B \leftarrow A \odot \mathbb{R}^{N \times K}$  /* Random initialization and sparsification */  
$W \leftarrow A \odot \mathbb{R}^{N \times K}$  /* Random initialization and sparsification */  
for $n \leftarrow 1$ to $E$ do  
&nbsp;&nbsp;&nbsp;&nbsp;$L \leftarrow 0$;  
&nbsp;&nbsp;&nbsp;&nbsp;for $t \leftarrow 1$ to $T$ do  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$f_t \leftarrow (A \odot W)^T R[:, t]$;  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\hat{r}_t \leftarrow (A \odot B) \cdot f_t$;  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$L \leftarrow L + \| R[:, t] - \hat{r}_t \|_2^2$;  
&nbsp;&nbsp;&nbsp;&nbsp;end  
&nbsp;&nbsp;&nbsp;&nbsp;$L \leftarrow L + \lambda (\|A \odot W\|_F^2 + \|A \odot B\|_F^2)$;  
&nbsp;&nbsp;&nbsp;&nbsp;$W \leftarrow W - \eta \cdot \nabla_W L$;  
&nbsp;&nbsp;&nbsp;&nbsp;$B \leftarrow B - \eta \cdot \nabla_B L$;  
end  
**Result:** $B \in \mathbb{R}^{N \times K}$, $W \in \mathbb{R}^{N \times K}$

### Algorithm 2: Inference with GFM

**Data:** $r_t \in \mathbb{R}^{N \times 1}$, $A \in \{0, 1\}^{N \times K}$, $B \in \mathbb{R}^{N \times K}$, $W \in \mathbb{R}^{N \times K}$  
$f_t \leftarrow (A \odot W)^T r_t$;  
$\hat{r}_t \leftarrow (A \odot B) \cdot f_t$;  
**Result:** $f_t \in \mathbb{R}^{K \times 1}$, $\hat{r}_t \in \mathbb{R}^{N \times 1}$
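
Algorithms 1 and 2 can be sketched in NumPy with hand-derived gradients. This is a hedged illustration, not the authors' implementation: plain full-batch gradient descent is assumed, the squared error is averaged over $N \times T$ (an assumption made here to keep the step size stable), the initialization scale of 0.1 and the toy three-sector data are invented, and the function names are hypothetical. The defaults for $E$, $\eta$, and $\lambda$ follow the values quoted in the text.

```python
import numpy as np

def train_gfm(r, a, epochs=400, lr=0.05, lam=1e-5, seed=0):
    """Gradient descent for a graph-masked linear autoencoder (sketch of Algorithm 1)."""
    rng = np.random.default_rng(seed)
    n, t = r.shape
    w = a * rng.normal(scale=0.1, size=a.shape)    # encoder weights, sparsified by A
    b = a * rng.normal(scale=0.1, size=a.shape)    # decoder weights, sparsified by A
    losses = []
    for _ in range(epochs):
        wm, bm = a * w, a * b
        f = wm.T @ r                               # (K, T) factors
        err = bm @ f - r                           # (N, T) reconstruction error
        # Assumption: data term averaged over N*T instead of summed.
        data_loss = (err ** 2).sum() / (n * t)
        losses.append(data_loss + lam * ((wm ** 2).sum() + (bm ** 2).sum()))
        # Gradients of the loss above, masked so pruned weights stay at zero.
        grad_b = a * (2.0 * err @ f.T / (n * t) + 2.0 * lam * bm)
        grad_w = a * (2.0 * r @ err.T @ bm / (n * t) + 2.0 * lam * wm)
        w -= lr * grad_w
        b -= lr * grad_b
    return b, w, losses

def infer_gfm(r_t, a, b, w):
    """Sketch of Algorithm 2: encode one day of returns, then decode."""
    f_t = (a * w).T @ r_t
    return f_t, (a * b) @ f_t

rng = np.random.default_rng(42)
a = np.kron(np.eye(3), np.ones((4, 1)))            # toy graph: 12 bonds, 3 sectors of 4
r = a @ rng.normal(size=(3, 200)) + 0.1 * rng.normal(size=(12, 200))
b, w, losses = train_gfm(r, a)
f_t, r_hat_t = infer_gfm(r[:, 0], a, b, w)
```

Multiplying the gradients by `a` keeps every weight outside the graph at exactly zero throughout training, which is what ties each latent factor to its own group of bonds.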

## CFM

Algorithms 3 and 4 outline the training procedure and the inference steps on a new data point. BARRA’s industrial model can be nested as a special case of CFM. When using only the sector feature, $A$ represents the bond-sector membership. As BARRA fixes the loading matrix to be $B = A$, no learning is involved, and the factors that minimize the squared reconstruction error of the returns on day $t$ are given by $f_t = (A^T A)^{-1} A^T r_t$, which is the vector of sector averages.
### Algorithm 3: Gradient-Descent Algorithm for CFM

**Data:** $R \in \mathbb{R}^{N \times T}$, $A \in \{0, 1\}^{N \times K}$  
$B \leftarrow A \odot \mathbb{R}^{N \times K}$  /* Random initialization and sparsification */  
$F \leftarrow \mathbb{R}^{K \times T}$  /* Only for training */  
for $n = 1$ to $E$ do  
&nbsp;&nbsp;&nbsp;&nbsp;$L = 0$;  
&nbsp;&nbsp;&nbsp;&nbsp;for $t = 1$ to $T$ do  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\hat{r}_t \leftarrow (A \odot B) \cdot F[:, t]$;  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$L \leftarrow L + \| R[:, t] - \hat{r}_t \|_2^2$;  
&nbsp;&nbsp;&nbsp;&nbsp;end  
&nbsp;&nbsp;&nbsp;&nbsp;$L \leftarrow L + \lambda \|A \odot B\|_2^2$;  
&nbsp;&nbsp;&nbsp;&nbsp;$B \leftarrow B - \eta \cdot \nabla_B L$;  
&nbsp;&nbsp;&nbsp;&nbsp;$F \leftarrow F - \eta \cdot \nabla_F L$;  
end  
**Result:** $B \in \mathbb{R}^{N \times K}$

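### Algorithm 4: Inference with CFM

**Data:** $r_t \in \mathbb{R}^{N \times 1}$, $A \in \{0, 1\}^{N \times K}$, $B \in \mathbb{R}^{N \times K}$  
$f_t \leftarrow (A \odot B)^+ \cdot r_t$;  /* Moore-Penrose (Pseudo) Inverse */  
$\hat{r}_t \leftarrow (A \odot B) \cdot f_t$;  
**Result:** $f_t \in \mathbb{R}^{K \times 1}$, $\hat{r}_t \in \mathbb{R}^{N \times 1}$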
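
Algorithms 3 and 4 admit a similar NumPy sketch. As before, this is an illustration under assumptions rather than the authors' code: plain full-batch gradient descent on the summed squared error is assumed, the penalty is implemented as the sum of squared entries of the masked loadings, the step size of 0.01 and the initialization scale are illustrative choices for this toy problem, and the function names and data are invented.

```python
import numpy as np

def train_cfm(r, a, epochs=400, lr=0.01, lam=1e-5, seed=0):
    """Gradient descent on loadings B and training factors F (sketch of Algorithm 3)."""
    rng = np.random.default_rng(seed)
    k, t = a.shape[1], r.shape[1]
    b = a * rng.normal(scale=0.1, size=a.shape)    # loadings, sparsified by A
    f = rng.normal(scale=0.1, size=(k, t))         # factors, used only during training
    losses = []
    for _ in range(epochs):
        bm = a * b
        err = bm @ f - r                           # (N, T) reconstruction error
        losses.append((err ** 2).sum() + lam * (bm ** 2).sum())
        grad_b = a * (2.0 * err @ f.T + 2.0 * lam * bm)   # masked loading gradient
        grad_f = 2.0 * bm.T @ err
        b -= lr * grad_b
        f -= lr * grad_f
    return b, losses

def infer_cfm(r_t, a, b):
    """Sketch of Algorithm 4: least-squares factors via the pseudo-inverse."""
    bm = a * b
    f_t = np.linalg.pinv(bm) @ r_t
    return f_t, bm @ f_t

rng = np.random.default_rng(7)
a = np.kron(np.eye(3), np.ones((4, 1)))            # toy graph: 12 bonds, 3 sectors of 4
r = a @ rng.normal(size=(3, 50)) + 0.1 * rng.normal(size=(12, 50))
b, losses = train_cfm(r, a)
r_new = a @ rng.normal(size=3)                     # an unseen day of returns
f_new, r_hat_new = infer_cfm(r_new, a, b)
```

Only the loadings are kept after training. The training factors are discarded, and factors for new data are recomputed from the returns through the pseudo-inverse.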

# Proof

The matrix $A^T A = \text{Diag}(N_1, N_2, \ldots, N_K)$ is a diagonal matrix whose $j$th entry corresponds to the number of bonds in sector $j$. Hence, the $j$th entry of $f_t$ is:

\[
f_t[j] = \frac{1}{N_j} \sum_{i=1}^{N} A_{i,j} \, r_{i,t}
\quad \text{(A-1)}
\]

where $A_{i,j} = 1$ if bond $i$ belongs to sector $j$, and zero otherwise. Therefore, $f_t[j]$ is the average spread of bonds in sector $j$.
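
The result can be verified numerically on a small example. The sector assignment and returns below are invented for illustration.

```python
import numpy as np

sector = np.array([0, 0, 1, 1, 1, 2])              # toy sector index of six bonds
a = np.eye(3)[sector]                              # (N, K) one-hot membership matrix
r_t = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 5.0])     # toy returns on day t

gram = a.T @ a                                     # diagonal matrix of sector sizes
f_t = np.linalg.solve(gram, a.T @ r_t)             # (A^T A)^{-1} A^T r_t
sector_means = np.array([r_t[sector == j].mean() for j in range(3)])
```

Here `gram` is the diagonal matrix with entries 2, 3, and 1, and `f_t` equals the sector means 2, 4, and 5.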

# BOND FEATURES

Exhibit A1 illustrates the distribution of bonds across different categorical features. Tenor is defined as the time in years between the issue and maturity dates. Bonds are grouped into five tenor buckets: 3Y, 5Y, 7Y, 10Y, and 15Y, based on the closest absolute difference from their tenor. Likewise, bonds are classified into one of three buckets based on their issued amount: medium (0.5 billion), large (1 billion), and very large (1.5 billion).
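
The nearest-bucket rule can be written in a few lines. The sketch below assumes that the same closest-absolute-difference rule applies to issued amounts and that ties go to the lower bucket; neither detail is stated in the text, and the sample values are invented.

```python
import numpy as np

TENOR_BUCKETS = np.array([3, 5, 7, 10, 15])        # years
AMOUNT_BUCKETS = np.array([0.5, 1.0, 1.5])         # billions

def nearest_bucket(values, buckets):
    """Assign each value to the bucket with the smallest absolute difference."""
    values = np.asarray(values, dtype=float)
    # argmin returns the first minimum, so ties go to the lower bucket (assumption).
    idx = np.abs(values[:, None] - buckets[None, :]).argmin(axis=1)
    return buckets[idx]

tenors = nearest_bucket([2.0, 6.2, 8.0, 12.0, 30.0], TENOR_BUCKETS)
amounts = nearest_bucket([0.6, 0.8, 2.0], AMOUNT_BUCKETS)
```

For these inputs the tenors map to 3Y, 7Y, 7Y, 10Y, and 15Y, and the amounts map to 0.5, 1.0, and 1.5 billion.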

# FACTOR CORRELATION

Exhibits A2–A4 show the factor correlations for PCA, GFM, and CFM. The correlations are calculated using nine months of in-sample data from Fold 1.
### EXHIBIT A1
**Bond Distribution across Six Categorical Features: Sector, Rating, Country, Debt Type, Tenor, and Issued Amount**

| Sector                | Count |
|-----------------------|--------|
| Financial             | 962    |
| Consumer, Non-cyclical | 506    |
| Industrial            | 256    |
| Consumer, Cyclical    | 247    |
| Utilities             | 234    |
| Communications        | 177    |
| Basic Materials       | 97     |
| Energy                | 86     |
| Technology            | 59     |
| Diversified           | 1      |

| Rating | Count |
|---------|--------|
| AA      | 191    |
| A+      | 197    |
| A       | 277    |
| A−      | 440    |
| BBB+    | 539    |
| BBB     | 425    |
| BBB−    | 214    |
| NR      | 342    |

| Country             | Count |
|---------------------|--------|
| FRANCE              | 503    |
| UNITED STATES       | 449    |
| GERMANY             | 348    |
| UNITED KINGDOM      | 211    |
| NETHERLANDS         | 208    |
| SPAIN               | 165    |
| ITALY               | 147    |
| OTHER               | 103    |
| SWITZERLAND         | 98     |
| SWEDEN              | 81     |
| DENMARK             | 53     |
| IRELAND             | 53     |
| AUSTRALIA           | 46     |
| JAPAN               | 45     |
| BELGIUM             | 40     |
| FINLAND             | 38     |
| LUXEMBOURG          | 37     |

| Debt Type        | Count |
|------------------|--------|
| Senior           | 2,397  |
| Senior Non Preferred | 228  |

| Tenor | Count |
|--------|--------|
| 10Y    | 1,058  |
| 7Y     | 695    |
| 5Y     | 503    |
| 3Y     | 198    |
| 15Y    | 171    |

| Issued Amount | Count |
|----------------|--------|
| 500M           | 1,662  |
| 1,000M         | 798    |
| 1,500M         | 165    |

*Note: The y-axis lists the categories (factors), and the x-axis indicates the number of bonds.*
### EXHIBIT A2
**Correlation of PCA Factors (in-sample [fold 1] from March 29, 2023, to December 29, 2023)**

*[Figure not reproduced: heatmap of the PCA factor correlation matrix. The color bar ranges from blue (negative correlations, down to -1) to red (positive correlations, up to 1).]*
### EXHIBIT A3
**Correlation of GFM Factors (in-sample [fold 1] from March 29, 2023, to December 29, 2023)**

*[Figure not reproduced: heatmap of the GFM factor correlation matrix. The matrix is symmetric with a unit diagonal; colors range from blue (negative correlation) to red (positive correlation). Axis labels cover the market, sector, country, debt type, and rating factors.]*
### EXHIBIT A4
**Correlation of CFM Factors (in-sample [fold 1] from March 29, 2023, to December 29, 2023)**

*[Figure not reproduced: heatmap of the correlation coefficients between the CFM factors. The color scale ranges from -1.00 (dark blue) to 1.00 (dark red). The factors are listed along both axes, with labels including market, sector, ratings, country, debt types, and issued amounts.]*

|                        | market_market | sector_Basic Materials | sector_Communications | sector_Consumer_Cyclical | sector_Consumer_Non-cyclical | sector_Diversified | sector_Energy | sector_Financial | sector_Industrial | sector_Technology | sector_Utilities | rating_AA | rating_A+ | rating_A- | rating_BBB | rating_BBB+ | rating_BBB- | rating_NB | country_AUSTRALIA | country_BELGIUM | country_DENMARK | country_FINLAND | country_FRANCE | country_GERMANY | country_IRELAND | country_ITALY | country_JAPAN | country_LUXEMBOURG | country_NETHERLANDS | country_OTHER | country_SPAIN | country_SWEDEN | country_SWITZERLAND | country_UNITED KINGDOM | country_UNITED STATES | debt_type_senior | debt_type_subordinated | tenor_3Y | tenor_5Y | tenor_7Y | tenor_10Y | tenor_15Y | issued_amount_500M | issued_amount_1000M | issued_amount_1500M |
|------------------------|--------------|------------------------|------------------------|------------------------|----------------------------|------------------|--------------|----------------|------------------|----------------|----------------|---------|---------|---------|------------|------------|------------|--------|------------------|----------------|----------------|--------------|--------------|--------------|--------------|--------------|--------------|------------------|------------------|--------------|--------------|--------------|---------------------|---------------------|---------------------|------------------|---------------------|--------|--------|--------|---------|---------|------------------|------------------|-------------------|
| **market_market**     | 1.00         |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Basic Materials** |              | 1.00                   |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Communications** |              |                        | 1.00                   |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Consumer_Cyclical** |            |                        |                        | 1.00                   |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Consumer_Non-cyclical** |        |                        |                        |                        | 1.00                       |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Diversified** |                |                        |                        |                        |                            | 1.00             |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Energy** |                    |                        |                        |                        |                            |                  | 1.00         |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Financial** |                |                        |                        |                        |                            |                  |              | 1.00           |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Industrial** |                |                        |                        |                        |                            |                  |              |                | 1.00             |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Technology** |                |                        |                        |                        |                            |                  |              |                |                  | 1.00           |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **sector_Utilities** |                 |                        |                        |                        |                            |                  |              |                |                  |                | 1.00           |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **rating_AA** |                      |                        |                        |                        |                            |                  |              |                |                  |                |                | 1.00    |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **rating_A+** |                      |                        |                        |                        |                            |                  |              |                |                  |                |                |         | 1.00    |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **rating_A-** |                      |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         | 1.00    |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **rating_BBB** |                     |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         | 1.00       |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **rating_BBB+** |                     |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            | 1.00       |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **rating_BBB-** |                     |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            | 1.00       |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **rating_NB** |                        |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            | 1.00   |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_AUSTRALIA** |               |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        | 1.00             |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_BELGIUM** |                   |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  | 1.00           |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_DENMARK** |                   |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                | 1.00           |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_FINLAND** |                   |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                | 1.00         |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_FRANCE** |                    |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              | 1.00         |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_GERMANY** |                   |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              | 1.00         |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_IRELAND** |                   |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              | 1.00         |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_ITALY** |                     |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              | 1.00         |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_JAPAN** |                     |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              | 1.00         |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_LUXEMBOURG** |                |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              | 1.00             |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_NETHERLANDS** |               |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  | 1.00             |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_OTHER** |                     |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  | 1.00         |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_SPAIN** |                     |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              | 1.00         |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_SWEDEN** |                     |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              | 1.00         |                     |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_UNITED KINGDOM** |             |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              | 1.00                |                     |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **country_UNITED STATES** |              |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     | 1.00                |                     |                  |                     |        |        |        |         |         |                  |                  |                   |
| **debt_type_senior** |                   |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     | 1.00                |                  |                     |        |        |        |         |         |                  |                  |                   |
| **debt_type_subordinated** |             |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     | 1.00             |                     |        |        |        |         |         |                  |                  |                   |
| **tenor_3Y** |                            |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  | 1.00                |        |        |        |         |         |                  |                  |                   |
| **tenor_5Y** |                            |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                  | 1.00   |        |        |         |         |                  |                  |                   |
| **tenor_7Y** |                            |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                  |        | 1.00   |        |         |         |                  |                  |                   |
| **tenor_10Y** |                           |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                  |        |        | 1.00   |         |         |                  |                  |                   |
| **tenor_15Y** |                           |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                  |        |        |        | 1.00   |         |                  |                  |                   |
| **issued_amount_500M** |                  |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         | 1.00   |                  |                  |                   |
| **issued_amount_1000M** |                  |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         | 1.00             |                  |                   |
| **issued_amount_1500M** |                  |                        |                        |                        |                            |                  |              |                |                  |                |                |         |         |         |            |            |            |        |                  |                |                |              |              |              |              |              |              |                  |                  |              |              |              |                     |                     |                     |                  |                     |        |        |        |         |         |                  | 1.00             |                   |
## REFERENCES

Agarap, A. F. 2018. “Deep Learning using Rectified Linear Units (ReLU).” arXiv:1803.08375.

Alessandri, L., F. Cordero, M. Beccuti, N. Licheri, M. Arigoni, M. Olivero, M. F. Di Renzo, A. Sapino, and R. Calogero. 2021. “Sparsely-Connected Autoencoder (SCA) for Single Cell RNAseq Data Mining.” *NPJ Systems Biology and Applications* 7: 1–10.

Bailey, S. 2012a. “EMPCA: Weighted Expectation Maximization Principal Component Analysis.” github.

———. 2012b. “Principal Component Analysis with Noisy and/or Missing Data.” *Publications of the Astronomical Society of the Pacific* 124 (919): 1015–1023.

Barra, Inc. 1998. *Barra Risk Model Handbook*. Berkeley: Barra Inc.

Buechner, M., and L. Bybee. 2019. “Instrumented Principal Components Analysis.” GitHub.

Carhart, M. M. 1997. “On Persistence in Mutual Fund Performance.” *The Journal of Finance* 52 (1): 57–82.

Connor, G. 1995. “The Three Types of Factor Models: A Comparison of Their Explanatory Power.” *Financial Analysts Journal* 51 (3): 42–46.

Coqueret, G., and T. Guida. 2023. *Machine Learning for Factor Investing: Python Version*. New York: Chapman and Hall/CRC.

Fama, E. F., and K. R. French. 1993. “Common Risk Factors in the Returns on Stocks and Bonds.” *Journal of Financial Economics* 33 (1): 3–56.

———. 2015. “A Five-Factor Asset Pricing Model.” *Journal of Financial Economics* 116 (1): 1–22.

Gu, S., B. Kelly, and D. Xiu. 2021. “Autoencoder Asset Pricing Models.” *Journal of Econometrics* 222 (1): 429–450.

Guijarro-Ordonez, J., M. Pelger, and G. Zanotti. 2021. “Deep Learning Statistical Arbitrage.” SSRN.

Huynh, K., and G. Lenhard. 2022. “Asymmetric Autoencoders for Factor-Based Covariance Matrix Estimation.” In *Proceedings of the Third ACM International Conference on AI in Finance*, pp. 403–410. New York: Association for Computing Machinery.

Kelly, B., D. Palhares, and S. Pruitt. 2023. “Modeling Corporate Bond Returns.” *The Journal of Finance* 78 (4): 1967–2008.

Kelly, B. T., S. Pruitt, and Y. Su. 2019. “Characteristics Are Covariances: A Unified Model of Risk and Return.” *Journal of Financial Economics* 134 (3): 501–524.

Kingma, D. P., and J. Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv:1412.6980.

Long, W., and V. Xiao. 2024. “A Deep Learning Approach for Trading Factor Residuals.” arXiv:2412.11432.

Rumelhart, D. E., G. E. Hinton, and R. J. Williams. 1986. “Learning Internal Representations by Error Propagation.” In *Parallel Distributed Processing: Explorations in the Microstructure of Cognition*, vol. 1: Foundations, pp. 318–362. Cambridge: MIT Press.

Sharpe, W. F. 1964. “Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk.” *The Journal of Finance* 19 (3): 425–442.

Stubbs, R. A., and V. Jeet. 2016. “Adjusted Factor-Based Performance Attribution.” *The Journal of Portfolio Management* 42 (5): 67–78.

Wang, T., and S. Singh. 2024. “KAN Based Autoencoders for Factor Models.” arXiv:2408.02694.