This repository centers on Homophily.ipynb, a code-only notebook that loads DGL datasets, computes ground-truth homophily metrics, and evaluates Horvitz-Thompson (HT) estimators under different sampling schemes.
- Dataset loaders for DGL: Cora, Citeseer, Pubmed, Cornell, Wisconsin, Texas, AmazonRatings, Chameleon, Squirrel, Questions
- One-hot label features from node labels
- Metrics: Dirichlet Energy (2W-normalized), Edge homophily, Node homophily
- Sampling: Bernoulli and SRS node sampling; traceroute-style edge sampling on Karate Club
- Outputs: histogram plots and results_summary.csv
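
As a rough illustration of the one-hot label features listed above, the sketch below builds an N x C indicator matrix from integer node labels with NumPy. The helper name `one_hot` and the toy labels are assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Return an (N, C) one-hot matrix for integer node labels (hypothetical helper)."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Assume classes are numbered 0..C-1.
        num_classes = int(labels.max()) + 1
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes)[labels]

# Toy labels for four nodes and three classes.
X = one_hot([0, 2, 1, 2])
print(X)
```

With features of this form, the squared distance between two nodes' feature vectors is 0 when their labels match and 2 when they differ.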
Homophily.ipynb: Main notebook with dataset loaders, estimators, and experiments
- Dirichlet Energy: Normalized total variation, TV_G(X) / (2W), where TV_G(X) = sum_{(u,v) in E} w_uv ||x_u - x_v||^2
- Edge homophily: Fraction of edges connecting nodes with the same label (weighted)
- Node homophily: Average, over non-isolated nodes, of the fraction of same-label neighbors
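
The three definitions above can be sketched on a small graph as follows. This is a minimal illustration rather than the notebook's code: it assumes W denotes the total edge weight, uses NetworkX instead of DGL, and the function name `homophily_metrics` and the toy path graph are hypothetical.

```python
import networkx as nx
import numpy as np

def homophily_metrics(G, labels):
    """Return (dirichlet_energy, edge_homophily, node_homophily).

    G: undirected networkx graph; edges may carry a "weight" attribute (default 1).
    labels: dict mapping each node to an integer class label.
    """
    classes = sorted(set(labels.values()))
    index = {c: i for i, c in enumerate(classes)}
    eye = np.eye(len(classes))
    x = {v: eye[index[labels[v]]] for v in G.nodes}  # one-hot features

    tv = 0.0      # total variation: sum of w_uv * ||x_u - x_v||^2
    W = 0.0       # total edge weight (assumed meaning of W)
    same_w = 0.0  # weight of edges joining same-label nodes
    for u, v, w in G.edges(data="weight", default=1.0):
        tv += w * float(np.sum((x[u] - x[v]) ** 2))
        W += w
        same_w += w * float(labels[u] == labels[v])

    # Per-node fraction of same-label neighbors, skipping isolated nodes.
    fractions = []
    for v in G.nodes:
        nbrs = list(G.neighbors(v))
        if nbrs:
            fractions.append(np.mean([labels[u] == labels[v] for u in nbrs]))

    return tv / (2 * W), same_w / W, float(np.mean(fractions))

# Toy example: path 0-1-2-3 with labels 0, 0, 1, 1.
G_toy = nx.path_graph(4)
toy_labels = {0: 0, 1: 0, 2: 1, 3: 1}
dirichlet, edge_h, node_h = homophily_metrics(G_toy, toy_labels)
print(dirichlet, edge_h, node_h)
```

Under these assumptions, one-hot features make the normalized Dirichlet energy equal to one minus the edge homophily, since each edge between differently labeled nodes contributes 2 w_uv to the total variation.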
- Bernoulli: Include each node independently with probability p
  - Edge inclusion probability: pi_e = p^2
- SRS (Simple Random Sampling): Sample exactly k nodes without replacement
  - Edge inclusion probability: pi_e = k(k-1) / (N(N-1))
- Traceroute (notebook experiment): Union of shortest paths between sampled sources and targets on Karate Club; HT correction uses edge betweenness
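
To make the role of the edge inclusion probabilities concrete, the sketch below applies a Horvitz-Thompson estimator to the number of same-label edges on Karate Club under Bernoulli and SRS node sampling. It is an illustration under stated assumptions, not the notebook's implementation: an edge is taken to be observed only when both endpoints are sampled (induced subgraph), the `club` node attribute shipped with NetworkX serves as the label, and the helper names `ht_edge_total`, `bernoulli_sample`, and `srs_sample` are hypothetical. The traceroute experiment and its betweenness-based correction are not reproduced here.

```python
import networkx as nx
import numpy as np

def ht_edge_total(G, sampled_nodes, edge_value, pi_e):
    """HT estimate of sum over edges of edge_value(u, v), using observed edges only."""
    S = set(sampled_nodes)
    total = 0.0
    for u, v in G.edges:
        if u in S and v in S:  # edge observed in the induced subgraph
            total += edge_value(u, v) / pi_e  # reweight by inclusion probability
    return total

def bernoulli_sample(nodes, p, rng):
    """Keep each node independently with probability p."""
    nodes = list(nodes)
    mask = rng.random(len(nodes)) < p
    return [v for v, keep in zip(nodes, mask) if keep]

def srs_sample(nodes, k, rng):
    """Draw exactly k nodes without replacement."""
    nodes = list(nodes)
    idx = rng.choice(len(nodes), size=k, replace=False)
    return [nodes[i] for i in idx]

rng = np.random.default_rng(0)
G = nx.karate_club_graph()
N = G.number_of_nodes()

def same_label(u, v):
    """1.0 if both endpoints belong to the same club, else 0.0."""
    return float(G.nodes[u]["club"] == G.nodes[v]["club"])

true_total = sum(same_label(u, v) for u, v in G.edges)

p, k, trials = 0.5, 17, 2000
pi_bern = p ** 2
pi_srs = k * (k - 1) / (N * (N - 1))
bern = [ht_edge_total(G, bernoulli_sample(G.nodes, p, rng), same_label, pi_bern)
        for _ in range(trials)]
srs = [ht_edge_total(G, srs_sample(G.nodes, k, rng), same_label, pi_srs)
       for _ in range(trials)]

print("true same-label edge count:", true_total)
print("mean HT estimate (Bernoulli):", np.mean(bern))
print("mean HT estimate (SRS):", np.mean(srs))
```

Because each observed edge is divided by its inclusion probability, the estimate is unbiased for the population total, so the averages over many trials should land close to the true count even though single estimates vary widely.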
- Install dependencies.
- Open Homophily.ipynb in Jupyter/Colab and run all cells.
- Edit the datasets, p_values, and metrics lists in the notebook to customize experiments.
- Python 3.8+
- numpy
- scipy
- networkx
- matplotlib
- pandas
- torch
- dgl
pip install numpy scipy networkx matplotlib pandas torch
pip install dgl -f https://data.dgl.ai/wheels/torch-2.2/repo.html

If you use this code in your research, please cite:
@misc{homophily-ht-estimation,
author = {Hamed Ajorlou and Gonzalo Mateos and Luana Ruiz},
title = {Dirichlet meets Horvitz and Thompson: Estimating Homophily in Large Graphs via Sampling},
year = {2025},
publisher = {GitHub},
url = {https://github.com/hamedajorlou/Homophily-HT-Estimation}
}

MIT License