hamedajorlou/Homophily-HT-Estimation

Homophily Estimation Using Horvitz-Thompson Estimators

This repository centers on Homophily.ipynb, a code-only notebook that loads DGL datasets, computes ground-truth homophily metrics, and evaluates Horvitz-Thompson (HT) estimators under different sampling schemes.

Notebook Overview

  • Dataset loaders for DGL: Cora, Citeseer, Pubmed, Cornell, Wisconsin, Texas, AmazonRatings, Chameleon, Squirrel, Questions
  • One-hot label features from node labels
  • Metrics: Dirichlet Energy (2W-normalized), Edge homophily, Node homophily
  • Sampling: Bernoulli and SRS node sampling; traceroute-style edge sampling on Karate Club
  • Outputs: histogram plots and results_summary.csv

Key File

  • Homophily.ipynb: Main notebook with dataset loaders, estimators, and experiments

Metrics

Dirichlet Energy (DE)

Normalized total variation: TV_G(X) / (2W) where TV_G(X) = sum_{(u,v) in E} w_uv ||x_u - x_v||^2
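As an illustration of the formula above, here is a minimal NumPy sketch. It assumes that W is the total edge weight (the sum of w_uv over edges, each undirected edge listed once) and that X holds one-hot label rows; neither assumption is confirmed by the notebook, and the helper names `one_hot` and `dirichlet_energy` are hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Turn integer labels into one-hot feature rows (hypothetical helper)."""
    labels = np.asarray(labels)
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    return np.eye(num_classes)[labels]

def dirichlet_energy(edges, X, weights=None):
    """TV_G(X) / (2W), ASSUMING W = total edge weight over the edge list."""
    edges = np.asarray(edges)
    if weights is None:
        weights = np.ones(len(edges))  # unweighted graph: w_uv = 1
    weights = np.asarray(weights, dtype=float)
    diffs = X[edges[:, 0]] - X[edges[:, 1]]
    tv = np.sum(weights * np.sum(diffs ** 2, axis=1))  # sum of w_uv ||x_u - x_v||^2
    return float(tv / (2.0 * weights.sum()))

# Toy 4-cycle: two of the four edges join nodes with different labels.
labels = [0, 0, 1, 1]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
X = one_hot(labels)
de = dirichlet_energy(edges, X)  # 0.5
```

Under these assumptions, ||x_u - x_v||^2 is 2 when the labels differ and 0 otherwise, so the normalized energy equals the weighted fraction of edges whose endpoints have different labels.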

Edge Homophily (H_edge)

Weighted fraction of edges whose two endpoints share the same label
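A minimal sketch of this metric, assuming an undirected edge list with each edge listed once and optional per-edge weights. The function name `edge_homophily` is hypothetical and not taken from the notebook.

```python
import numpy as np

def edge_homophily(edges, labels, weights=None):
    """Weighted share of edges whose endpoints carry the same label."""
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    if weights is None:
        weights = np.ones(len(edges))  # unweighted graph: every edge counts once
    weights = np.asarray(weights, dtype=float)
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(weights[same].sum() / weights.sum())

labels = [0, 0, 1, 1]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
h_edge = edge_homophily(edges, labels)                           # 2 of 4 edges: 0.5
h_edge_w = edge_homophily(edges, labels, weights=[3, 1, 1, 1])   # (3 + 1) / 6
```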

Node Homophily (H_node)

Average, over non-isolated nodes, of the fraction of same-label neighbors
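The definition above can be sketched as follows, assuming an unweighted, undirected edge list without self-loops. The function name `node_homophily` is hypothetical, and the notebook may handle weights or self-loops differently.

```python
import numpy as np

def node_homophily(edges, labels, num_nodes):
    """Mean over non-isolated nodes of the fraction of same-label neighbors."""
    labels = np.asarray(labels)
    edges = np.asarray(edges)
    u, v = edges[:, 0], edges[:, 1]
    same = (labels[u] == labels[v]).astype(float)
    deg = np.zeros(num_nodes)
    same_deg = np.zeros(num_nodes)
    # Each undirected edge contributes to the neighborhoods of both endpoints.
    for endpoint in (u, v):
        np.add.at(deg, endpoint, 1.0)
        np.add.at(same_deg, endpoint, same)
    mask = deg > 0  # drop isolated nodes from the average
    return float(np.mean(same_deg[mask] / deg[mask]))

# Path 0-1-2-3 plus an isolated node 4.
labels = [0, 0, 1, 1, 0]
edges = [(0, 1), (1, 2), (2, 3)]
h_node = node_homophily(edges, labels, num_nodes=5)  # (1 + 0.5 + 0.5 + 1) / 4 = 0.75
```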

Sampling Methods

  • Bernoulli: Include each node independently with probability p
    • Edge inclusion probability (both endpoints sampled): pi_e = p^2
  • SRS (Simple Random Sampling): Sample exactly k of the N nodes without replacement
    • Edge inclusion probability: pi_e = k(k-1) / (N(N-1))
  • Traceroute (notebook experiment): Union of shortest paths between sampled sources and targets on Karate Club; HT correction uses edge betweenness
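To make the node-sampling schemes concrete, the sketch below applies a Horvitz-Thompson estimator to the total variation: each edge observed in the induced subgraph contributes its value divided by pi_e. This is an illustrative sketch rather than the notebook's code. The helper names (`ht_total`, `bernoulli_sample`, `srs_sample`) and the toy graph are hypothetical, and it assumes an edge is observed only when both endpoints are sampled. The traceroute scheme is not covered here.

```python
import itertools
import numpy as np

def ht_total(sampled_nodes, edges, values, pi_e):
    """HT estimate of sum(values), using edges whose endpoints are both sampled."""
    s = set(int(i) for i in sampled_nodes)
    return sum(val / pi_e for (u, v), val in zip(edges, values) if u in s and v in s)

def bernoulli_sample(num_nodes, p, rng):
    """Keep each node independently with probability p."""
    return np.flatnonzero(rng.random(num_nodes) < p)

def srs_sample(num_nodes, k, rng):
    """Draw exactly k nodes without replacement."""
    return rng.choice(num_nodes, size=k, replace=False)

# Toy graph (hypothetical data): per-edge TV term is 2 if labels differ, else 0.
labels = np.array([0, 0, 1, 1, 0, 1])
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]
N = len(labels)
values = [2.0 * float(labels[u] != labels[v]) for u, v in edges]
true_tv = sum(values)

p, k = 0.5, 3
pi_bern = p ** 2
pi_srs = k * (k - 1) / (N * (N - 1))

# One random draw per scheme (individual estimates vary from sample to sample).
rng = np.random.default_rng(0)
tv_bern = ht_total(bernoulli_sample(N, p, rng), edges, values, pi_bern)
tv_srs = ht_total(srs_sample(N, k, rng), edges, values, pi_srs)

# Exact expectations by enumerating every possible sample on this small graph.
srs_mean = np.mean([ht_total(s, edges, values, pi_srs)
                    for s in itertools.combinations(range(N), k)])
bern_mean = sum(p ** len(s) * (1 - p) ** (N - len(s)) * ht_total(s, edges, values, pi_bern)
                for r in range(N + 1)
                for s in itertools.combinations(range(N), r))
```

Enumerating all samples shows that both estimators are unbiased for the total variation on this graph: `srs_mean` and `bern_mean` match `true_tv` up to floating-point error. Dividing an estimate by 2W gives a Dirichlet Energy estimate if the total edge weight W is known; if it is not, W would need its own estimate, which this sketch does not attempt.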

Usage

  1. Install the dependencies listed under Requirements (see Installation).
  2. Open Homophily.ipynb in Jupyter/Colab and run all cells.
  3. Edit the datasets, p_values, and metrics lists in the notebook to customize experiments.

Requirements

  • Python 3.8+
  • numpy
  • scipy
  • networkx
  • matplotlib
  • pandas
  • torch
  • dgl

Installation

pip install numpy scipy networkx matplotlib pandas torch
pip install dgl -f https://data.dgl.ai/wheels/torch-2.2/repo.html

The DGL wheel index above targets PyTorch 2.2, so the installed torch version should match it.

Citation

If you use this code in your research, please cite:

@misc{homophily-ht-estimation,
  author = {Hamed Ajorlou and Gonzalo Mateos and Luana Ruiz},
  title = {Dirichlet meets Horvitz and Thompson: Estimating Homophily in Large Graphs via Sampling},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/hamedajorlou/Homophily-HT-Estimation}
}

License

MIT License
