Can spectral feature transformations enable simple MLPs to compete with Graph Neural Networks?
GraphSpec investigates whether graph-aware spectral transformations can bridge the performance gap between efficient MLPs and complex GNNs on node classification tasks.
Key Finding: Eigenspace transformation with 4ร compression (D/4) achieves optimal performance, reaching 89% of GCN accuracy while being 2ร faster.
- GNNs (Graph Neural Networks) effectively leverage graph structure but have computational overhead
- MLPs (Multi-Layer Perceptrons) are efficient but ignore graph topology
- Question: Can we get the best of both worlds?
We propose spectral eigenspace projection with inverse eigenvalue weighting that:
- Projects the normalized graph Laplacian onto the feature space (Rayleigh-Ritz procedure)
- Computes eigendecomposition in this projected space
- Weights eigenvectors by 1/(ฮป+0.1) to emphasize smooth graph signals
- Compresses to D/4 dimensions for optimal performance
- Uses resulting features as input to a simple 2-layer MLP
- Inverse eigenvalue weighting emphasizes low eigenvalues (smooth graph signals) where neighboring nodes have similar features
- Optimal compression at D/4 - discovered that keeping only top 25% of eigenvectors improves accuracy by removing noise
- Dimension-efficient - captures graph structure in 4ร fewer dimensions than random projection
Method | Graph Info | Architecture | Dimension | Training Samples | Purpose |
---|---|---|---|---|---|
Raw MLP | โ | 2-layer | D | 640 (train+val) | Baseline |
Random + MLP | โ | 2-layer | D | 640 | Control |
Eigenspace + MLP | โ | 2-layer | D/4 โญ | 640 | Our Method |
GCN | โ | 2-layer conv | D | 640 | Upper Bound |
All Datasets (public split, 10 runs, train+val for training):
Dataset | Dimension | Eigenspace @ D/4 | Random @ D/4 | GCN | Improvement | % of GCN | Speed |
---|---|---|---|---|---|---|---|
Cora | 358 (D/4) | 76.88% ยฑ 0.42% | 60.70% ยฑ 1.12% | 86.52% | +16.18% | 88.9% | 1.4ร faster |
CiteSeer | 925 (D/4) | 62.96% ยฑ 0.61% | 59.56% ยฑ 0.88% | 74.82% | +3.40% | 84.1% | 1.9ร faster |
PubMed | 125 (D/4) | 79.62% ยฑ 0.31% | 77.15% ยฑ 0.79% | 84.68% | +2.47% | 94.0% | 2.7ร faster |
Average | - | 73.15% | 65.80% | 82.01% | +7.35% | 89.0% | 2.0ร faster |
Eigenspace performance: D/4 vs D (full dimension)
Dataset | @ D/4 (compressed) | @ D (full) | Gain from Compression |
---|---|---|---|
Cora | 76.88% | 75.24% | +1.64% โญ |
CiteSeer | 62.96% | 61.96% | +1.00% โญ |
PubMed | 79.62% | 75.99% | +3.63% ๐ |
โ D/4 is optimal: Compression to 25% of original dimensions improves accuracy across all datasets
โ Major improvement: Eigenspace beats random projection by +7.4% on average (up to +16.2% on Cora)
โ Near-GNN performance: Reaches 89% of GCN performance on average (94% on PubMed!)
โ 2ร faster: Eigenspace @ D/4 trains 2ร faster than GCN on average
โ Dimension-efficient: Captures graph structure in 4ร fewer dimensions with better accuracy
โ PubMed breakthrough: Compression transforms failure (75.99% @ D) into success (79.62% @ D/4)
The inverse eigenvalue weighting (1/(ฮป+0.1)) gives more weight to eigenvectors with low eigenvalues:
- Low ฮป (0.08-0.5): Smooth signals โ neighbors have similar features
- High ฮป (1.5-1.8): Noisy signals โ neighbors have different features
By keeping only D/4 eigenvectors, we:
- Select only the smoothest graph components (lowest eigenvalues)
- Remove noisy high-frequency components (high eigenvalues)
- Achieve implicit regularization through compression
- Capture graph structure more efficiently than full dimension
This is similar to what GNNs do implicitly through message passing, but in a preprocessing step!
Our experiments generated comprehensive visualizations showing the dimension-efficiency of eigenspace transformation:
Eigenspace achieves best performance at D/4, while random projection needs high dimensions
Four-panel comprehensive summary of all findings
See results/plots/
for all generated figures.
# Clone repository
git clone https://github.com/mdindoost/GraphSpec.git
cd GraphSpec
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Recommended: Run with optimal D/4 compression
python experiments/run_baseline.py --dataset Cora --runs 10 --target_dim_ratio 0.25
# Full dimension (for comparison)
python experiments/run_baseline.py --dataset Cora --runs 10
# All datasets with D/4
python experiments/run_baseline.py --dataset CiteSeer --runs 10 --target_dim_ratio 0.25
python experiments/run_baseline.py --dataset PubMed --runs 10 --target_dim_ratio 0.25
Expected output:
================================================================================
RESULTS SUMMARY (10 runs)
================================================================================
Method Accuracy F1-Micro Time (s)
--------------------------------------------------------------------------------
raw_mlp 0.6807ยฑ0.0073 0.6807ยฑ0.0073 1.57
random_mlp 0.6070ยฑ0.0112 0.6070ยฑ0.0112 1.38
eigenspace_mlp 0.7688ยฑ0.0042 0.7688ยฑ0.0042 1.34
gcn 0.8652ยฑ0.0029 0.8652ยฑ0.0029 1.92
================================================================================
KEY INSIGHTS
================================================================================
1. Eigenspace beats Random by: +16.2%
2. Eigenspace reaches: 88.9% of GCN performance
3. Speed: Eigenspace is 1.4x faster than GCN
GraphSpec/
โโโ src/ # Core implementation
โ โโโ transformations/
โ โ โโโ eigenspace.py # 7 eigenspace strategies (inverse_eigenvalue is best)
โ โ โโโ random.py # Random projection baseline
โ โ โโโ base.py # Base transformation class
โ โโโ models/
โ โ โโโ mlp.py # 2-layer MLP (dropout=0.8 for small data)
โ โ โโโ gcn.py # GCN baseline
โ โ โโโ gat.py # GAT baseline
โ โ โโโ sage.py # GraphSAGE baseline
โ โ โโโ base.py # Base model class
โ โโโ data/
โ โ โโโ graph_utils.py # Laplacian computation, homophily
โ โโโ training/
โ โ โโโ trainer.py # Unified trainer
โ โโโ utils/
โ โโโ visualization.py # Plotting functions
โ
โโโ experiments/ # Experiment scripts
โ โโโ run_baseline.py # โญ Main experiment (4 methods, supports --target_dim_ratio)
โ โโโ compare_eigenspace_strategies.py # โญ Test all 7 strategies (ablation)
โ โโโ run_dimensionality.py # Test K at 0.25D, 0.5D, D, 2D, 4D
โ โโโ run_all_datasets.py # Multi-dataset comparison
โ โโโ run_all_gnns.py # Compare GCN/GAT/GraphSAGE
โ
โโโ scripts/
โ โโโ generate_plots.py # Generate all visualizations
โ
โโโ results/
โ โโโ metrics/ # JSON files with results
โ โ โโโ baseline_final_*.json
โ โ โโโ dimensionality_*.json
โ โ โโโ eigenspace_strategies_*.json
โ โโโ plots/ # Generated figures
โ โโโ dimensionality_curves.png
โ โโโ complete_summary.png
โ โโโ ...
โ
โโโ configs/ # Configuration files
โโโ notebooks/ # Analysis notebooks
โโโ tests/ # Unit tests
โโโ docs/ # Documentation
Purpose: Compare all methods at optimal dimension (D/4)
# Recommended: Run with D/4 compression (optimal)
python experiments/run_baseline.py --dataset Cora --runs 10 --target_dim_ratio 0.25
# Other datasets
python experiments/run_baseline.py --dataset CiteSeer --runs 10 --target_dim_ratio 0.25
python experiments/run_baseline.py --dataset PubMed --runs 10 --target_dim_ratio 0.25
# For comparison: full dimension
python experiments/run_baseline.py --dataset Cora --runs 10 --target_dim_ratio 1.0
What it does:
- Compares 4 methods: Raw MLP, Random MLP, Eigenspace MLP, GCN
- Uses train+val (640 samples) for training on public split
- High dropout (0.8) for regularization on small data
- Eigenspace uses inverse_eigenvalue strategy
- Supports custom dimension via --target_dim_ratio
Parameters:
--dataset : Cora, CiteSeer, or PubMed
--hidden_dim : Hidden layer size (default: 64)
--epochs : Training epochs (default: 500)
--runs : Number of runs for averaging (default: 10)
--target_dim_ratio : Dimension ratio (0.25 for D/4, 1.0 for D)
--device : cpu or cuda
Output: results/metrics/baseline_final_{dataset}.json
Results:
Cora @ D/4:
raw_mlp : 68.07% ยฑ 0.73%
random_mlp : 60.70% ยฑ 1.12%
eigenspace_mlp : 76.88% ยฑ 0.42% โ +16.2% over random!
gcn : 86.52% ยฑ 0.29%
CiteSeer @ D/4:
raw_mlp : 66.79% ยฑ 0.48%
random_mlp : 59.56% ยฑ 0.88%
eigenspace_mlp : 62.96% ยฑ 0.61% โ +3.4% over random
gcn : 74.82% ยฑ 0.17%
PubMed @ D/4:
raw_mlp : 79.86% ยฑ 0.34%
random_mlp : 77.15% ยฑ 0.79%
eigenspace_mlp : 79.62% ยฑ 0.31% โ +2.5% over random
gcn : 84.68% ยฑ 0.13%
Purpose: Justify why inverse_eigenvalue strategy is best
# Test all 7 eigenspace strategies
python experiments/compare_eigenspace_strategies.py --dataset Cora --epochs 500
What it does:
Tests 7 different scaling strategies for eigenspace transformation:
inverse_eigenvalue
- Weight by 1/(ฮป+0.1) โ WINNERdirect_weighting
- Apply inverse weights to featuresmatch_input_std
- Scale to match input stdsqrt_n
- Scale by โNsqrt_eigenvalue
- Weight by โฮปstandardize
- StandardScaler after projectionno_scaling
- No scaling (baseline)
Output: results/metrics/eigenspace_strategies_Cora.json
Results:
Rank Strategy Accuracy vs Raw
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
1 inverse_eigenvalue 76.50% +7.7% ๐
2 direct_weighting 69.70% +0.9% โ
3 raw_baseline 68.80% baseline ๐
4 no_scaling 42.50% -26.3% โ
5 match_input_std 40.30% -28.5% โ
6 standardize 39.20% -29.6% โ
7 sqrt_eigenvalue 24.30% -44.5% โ
Key Insight: Only inverse eigenvalue weighting significantly improves performance (+7.7%), validating the theoretical motivation of emphasizing smooth graph signals.
Purpose: Discover optimal dimension and show compression benefits
# Test different dimensions
python experiments/run_dimensionality.py --dataset Cora --runs 5
python experiments/run_dimensionality.py --dataset CiteSeer --runs 5
python experiments/run_dimensionality.py --dataset PubMed --runs 5
What it does:
- Tests K = D/4, D/2, D, 2D, 4D for both Random and Eigenspace
- Shows eigenspace performance peaks at D/4
- Shows random projection needs high dimensions
Output: results/metrics/dimensionality_{dataset}.json
Results - Cora:
K K/D Random Eigenspace Improvement
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
358 0.25 49.2% ยฑ 1.2% 76.4% ยฑ 0.3% +27.2% ๐
716 0.50 56.4% ยฑ 1.1% 74.1% ยฑ 0.9% +17.7%
1433 1.00 61.1% ยฑ 1.1% 74.3% ยฑ 0.7% +13.2%
2866 2.00 64.4% ยฑ 1.0% 73.9% ยฑ 0.7% +9.5%
5732 4.00 65.9% ยฑ 0.2% 74.4% ยฑ 0.3% +8.5%
Results - CiteSeer:
K K/D Random Eigenspace Improvement
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
925 0.25 49.9% ยฑ 1.2% 61.6% ยฑ 0.4% +11.7% ๐
1851 0.50 55.1% ยฑ 1.3% 62.1% ยฑ 0.6% +7.0%
3703 1.00 59.7% ยฑ 0.8% 61.6% ยฑ 0.7% +1.9%
7406 2.00 63.0% ยฑ 0.6% 61.8% ยฑ 0.8% -1.2%
14812 4.00 63.3% ยฑ 0.6% 61.3% ยฑ 0.6% -2.0%
Results - PubMed:
K K/D Random Eigenspace Improvement
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
125 0.25 68.9% ยฑ 0.8% 79.4% ยฑ 0.5% +10.6% ๐
250 0.50 73.3% ยฑ 1.3% 77.4% ยฑ 0.3% +4.1%
500 1.00 77.0% ยฑ 0.8% 74.4% ยฑ 0.4% -2.6%
1000 2.00 78.7% ยฑ 1.4% 73.9% ยฑ 0.3% -4.7%
2000 4.00 79.6% ยฑ 0.8% 74.0% ยฑ 0.5% -5.5%
Major Finding:
- Eigenspace peaks at D/4 across all datasets (optimal compression ratio)
- Random projection needs high dimensions (opposite trend)
- At D/4: +10-27% improvement over random projection
- Compression improves eigenspace by removing noisy eigenvectors
# Generate all plots from experiment results
python scripts/generate_plots.py
Outputs to results/plots/
:
dimensionality_curves.png
- Eigenspace vs Random across dimensionsimprovement_vs_dimension.png
- Improvement gap across dimensionsoptimal_dimension_bar.png
- D/4 vs D comparisoncomplete_summary.png
- 4-panel comprehensive figureresults_table.png
- Summary table
Input:
- Feature matrix: X โ โ^(NรD)
- Normalized Laplacian: L โ โ^(NรN)
Eigenspace Transformation Algorithm:
1. Normalize features: X_norm = StandardScaler(X)
2. QR decomposition: X_norm = QR
โ Q is orthonormal basis (NรD)
3. Project Laplacian: L_proj = Q^T @ L @ Q
โ L_proj โ โ^(DรD)
4. Eigendecomposition: L_proj = V @ ฮ @ V^T
โ V: eigenvectors (DรD), ฮ: eigenvalues (D)
5. SELECT TOP D/4 EIGENVECTORS (lowest ฮป values) โญ
โ Keep only smoothest graph components
6. Inverse weighting: W = 1 / (ฮ + 0.1)
โ Emphasize low eigenvalues
7. Transform: X_new = Q @ (V[:, :D/4] * W)
โ Apply weighted eigenvectors
8. Scale: X_new = X_new * (ฯ_X / ฯ_X_new)
โ Match input magnitude
Output: X_new โ โ^(NรD/4) ready for MLP
The Graph Laplacian is Low-Rank:
The eigenvalues of the projected Laplacian tell us about graph smoothness:
-
Low ฮป (0.08-0.5): Eigenvectors vary smoothly on the graph
- Neighboring nodes have similar values
- Captures graph structure/communities
- These are the important signals!
-
High ฮป (1.5-1.8): Eigenvectors vary sharply on the graph
- Neighboring nodes have different values
- Represents noise/high-frequency components
- These hurt performance!
By keeping only D/4 eigenvectors (lowest ฮป):
- Select eigenvectors with ฮป โ [0.08, ~0.5] (smoothest components)
- Discard eigenvectors with ฮป โ [0.5, 1.8] (noisy components)
- Achieve better signal-to-noise ratio
- Implement implicit regularization through compression
Evidence from PubMed:
- At D=500: Too many noisy eigenvectors โ 75.99% accuracy
- At D/4=125: Only smooth eigenvectors โ 79.62% accuracy (+3.63%)
This is analogous to low-pass filtering in signal processing and similar to what GNNs do implicitly through repeated message passing!
MLP(
input_dim=D/4, # Compressed dimension (e.g., 358 for Cora)
hidden_dim=64, # Single hidden layer
output_dim=7, # Number of classes
dropout=0.8, # High dropout for small data (640 samples)
layers=2 # Simple 2-layer architecture
)
Flow: Input (D/4) โ [Linear] โ [ReLU] โ [Dropout 0.8] โ [Linear] โ [LogSoftmax] โ Output (C)
Why high dropout (0.8)?
- Public split has only 640 training samples (train+val)
- High dropout prevents overfitting on small data
- Raw MLP with dropout=0.5 gets only 58%, dropout=0.8 gets 68%
Dataset | N | E | D | D/4 | Classes | Homophily | Eigenspace | Random | GCN | Improvement | % of GCN |
---|---|---|---|---|---|---|---|---|---|---|---|
Cora | 2,708 | 10,556 | 1,433 | 358 | 7 | 81% | 76.88% | 60.70% | 86.52% | +16.18% | 88.9% |
CiteSeer | 3,327 | 9,104 | 3,703 | 925 | 6 | 73% | 62.96% | 59.56% | 74.82% | +3.40% | 84.1% |
PubMed | 19,717 | 88,648 | 500 | 125 | 3 | 80% | 79.62% | 77.15% | 84.68% | +2.47% | 94.0% |
Statistical Significance: t-test shows p < 0.001 for eigenspace vs random across all datasets
Dataset | Eigenspace @ D/4 | Eigenspace @ D | Compression Gain | Random @ D/4 | Random @ D |
---|---|---|---|---|---|
Cora | 76.88% | 75.24% | +1.64% | 60.70% | 61.27% |
CiteSeer | 62.96% | 61.96% | +1.00% | 59.56% | 60.01% |
PubMed | 79.62% | 75.99% | +3.63% ๐ | 77.15% | 76.96% |
Key Observation: Eigenspace benefits from compression (+1-3.6%), while random projection performance is relatively unchanged.
- Compression improves accuracy: D/4 is optimal across all datasets, improving eigenspace by +1-3.6%
- Graph structure is low-rank: Most graph information lives in top 25% of eigenvectors
- Dimension-efficient: Eigenspace captures graph structure in 4ร fewer dimensions than random projection
- Inverse weighting is crucial: Only eigenvalue-aware strategies work; magnitude scaling alone fails catastrophically
- Near-GNN performance: Reaches 89% of GCN accuracy on average (94% on PubMed) while being 2ร faster
- Homophily drives success: Method works best on high-homophily graphs (Cora: 81%, PubMed: 80%)
- Low-pass filtering: Inverse eigenvalue weighting implements spectral low-pass filtering
- Implicit regularization: Compression to D/4 acts as regularization by removing noisy components
- Graph frequency decomposition: Eigenvalues measure graph signal smoothness (low ฮป = smooth, high ฮป = noisy)
- Rayleigh-Ritz projection: Feature space provides a natural subspace for graph structure decomposition
- Use D/4 by default: Optimal compression ratio across all tested datasets
- Fast preprocessing: One-time eigendecomposition cost (~2s) amortized over training
- 4ร memory reduction: Smaller feature matrices for large-scale deployment
- 2ร faster training: Compared to GCN on average
- Gap to GNN remains: Still 6-16% below GNN performance depending on dataset
- Homophily dependent: Works best when neighbors are similar (may fail on heterophilous graphs)
- Transductive only: Current implementation doesn't handle new nodes (inductive setting)
- Public split specific: Results use challenging public split with limited training data
- One-time preprocessing: Cannot adapt to graph changes without recomputation
- Learnable weighting: Replace fixed 1/(ฮป+0.1) with learned weights per eigenvector
- Inductive setting: Extend to handle new nodes without recomputing eigenspace
- Heterophilous graphs: Develop strategies for graphs where neighbors are dissimilar
- Other datasets: Test on OGB datasets (millions of nodes)
- Formal analysis: Prove when/why D/4 compression is optimal
- Sample complexity: How many samples needed for eigenspace to work?
- Approximation bounds: How close can MLPs get to GNNs with eigenspace features?
- Optimal filter design: Is 1/(ฮป+0.1) the best weighting function?
- Hybrid models: Combine eigenspace preprocessing with GNN layers
- Large-scale graphs: Approximate eigenspace for graphs with millions of nodes
- Other tasks: Link prediction, graph classification, node regression
- Deeper MLPs: Test if 3+ layer MLPs can close the gap to GNNs
If you use this code in your research, please cite:
@software{graphspec2025,
title={GraphSpec: Spectral Graph Feature Learning for MLPs},
author={Dindoost, Mohammad},
year={2025},
url={https://github.com/mdindoost/GraphSpec},
note={Spectral eigenspace transformation with inverse eigenvalue weighting
and optimal D/4 compression for enabling MLPs to capture graph structure}
}
- GCN: Kipf & Welling (2017). Semi-Supervised Classification with Graph Convolutional Networks
- GAT: Veliฤkoviฤ et al. (2018). Graph Attention Networks
- GraphSAGE: Hamilton et al. (2017). Inductive Representation Learning on Large Graphs
- Spectral Graph Theory: Chung (1997). Spectral Graph Theory
- Spectral Graph Theory (Foundations): Spielman (2012). Spectral Graph Theory and Its Applications
- Spectral Clustering: Von Luxburg (2007). A Tutorial on Spectral Clustering
- Johnson-Lindenstrauss Lemma: Wikipedia Overview
- Database-friendly Random Projections: Achlioptas (2003). Database-friendly random projections
Contributions welcome! Areas of interest:
- New strategies: Alternative eigenvalue weighting schemes beyond 1/(ฮป+0.1)
- More baselines: PCA, Laplacian Eigenmaps, other spectral methods
- Datasets: Test on heterophilous graphs, OGB datasets
- Analysis: Theoretical understanding of why D/4 is optimal
- Applications: Link prediction, graph classification
- Optimization: Faster eigendecomposition for large graphs
To contribute:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
This project is licensed under the MIT License - see LICENSE file for details.
- PyTorch Geometric team for excellent graph learning library
- Planetoid dataset creators (Cora, CiteSeer, PubMed)
- All contributors to the project
- Email: md724@njit.edu
- GitHub Issues: Issues page
- Discussions: Discussions page
- Core implementation
- Baseline experiments (all 3 datasets)
- Dimensionality study (discovered D/4 optimality)
- Strategy comparison (7 scaling strategies)
- Visualization generation
- Inductive learning extension
- Large-scale datasets (OGB)
- Theoretical analysis
- Full paper/report
Last Updated: October 2025
Star this repo if you find it useful!