Piridi et al. "SONAR: A Large-Scale Social Network Benchmark for Graph-Based Anomaly Detection." Submitted to SIGIR 2026.
SONAR (SOcial Network Anomaly Resource) is the largest publicly available heterogeneous graph benchmark for anomaly detection in social networks. Built from real X (formerly Twitter) data spanning 11 months of activity during the Indian Farmers' Protest, SONAR captures 3.8 million users, 3.6 million posts, and 7 relation types — enabling the first systematic evaluation of graph anomaly detectors at realistic social network scale.
Graph anomaly detection research is held back by benchmarks that are too small, too simple, and too homogeneous. Existing datasets top out at 1M users with a single relation type, while real social platforms have billions of users interacting through diverse mechanisms. No prior benchmark provides both large-scale authentic social network data and controlled anomaly ground truth at multiple granularities.
Comparison with existing benchmarks
| Dataset | Users | Relations | Heterogeneous | Anomaly Labels |
|---|---|---|---|---|
| Cresci-15 | 5,301 | 1 | | User only |
| TwiBot-20 | 229,580 | 1 | | User only |
| MGTAB | 410,199 | 4 | ✓ | User only |
| TwiBot-22 | 1,000,000 | 1 | | User only |
| SONAR-Large | 3,797,980 | 7 | ✓ | User + Post |
SONAR addresses four critical gaps:
- 3.8x larger scale than TwiBot-22 (3.8M vs 1M users), enabling evaluation at realistic social network sizes
- Rich multi-relational structure with 3 node types and 7 edge types capturing the full spectrum of X/Twitter interactions (posting, replying, quoting, mentioning, hashtag usage)
- Dual-granularity anomaly labels at both user and post level — the first social network benchmark to offer this — enabling fine-grained, multi-task evaluation
- Controlled anomaly injection using established PyGOD methods: structural anomalies (coordinated cliques simulating bot networks) and contextual anomalies (attribute perturbations) at a 5% rate
SONAR is available at three scales to support both rapid prototyping and scalability research:
| Variant | Users | Posts | Hashtags | Total Nodes | Edges | Anomalies |
|---|---|---|---|---|---|---|
| Small | 18,430 | 18,429 | 1 | 36,860 | 49,865 | 1,818 |
| Medium | 424,446 | 422,032 | 18 | 846,496 | 1,112,995 | 41,830 |
| Large | 3,797,980 | 3,611,869 | 152 | 7,410,001 | 10,204,721 | 365,861 |
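A quick sanity check (plain Python, with node and anomaly counts copied from the table above) confirms that the injected anomaly rate sits near the stated 5% in all three variants:

```python
# Total node / anomaly counts from the variants table above.
variants = {
    "small":  (36_860, 1_818),
    "medium": (846_496, 41_830),
    "large":  (7_410_001, 365_861),
}
ratios = {name: anomalies / nodes for name, (nodes, anomalies) in variants.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.4f}")  # each ratio is ≈ 0.049
```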
The heterogeneous graph models the full X/Twitter interaction spectrum:
| Edge Type | Source | Target | Semantics |
|---|---|---|---|
| post_original | User | Post | User authors a post |
| post_quote | User | Post | User quotes a post |
| post_reply | User | Post | User replies to a post |
| quotes | Post | Post | Post quotes another post |
| replies | Post | Post | Post replies to another post |
| mentions | Post | User | Post mentions a user |
| contains | Post | Hashtag | Post contains a hashtag |
The figure below shows an example subgraph from SONAR illustrating the multi-relational structure with users (blue), tweets (green), and hashtags (purple):
| Node Type | Dim | Features |
|---|---|---|
| User | 4 | followers_count, following_count, listed_count, post_count |
| Post | 772 | repost_count, quote_count, like_count, post_type + 768-d Universal Sentence Encoder embedding |
| Hashtag | 1 | category label |
The homogeneous representation projects all nodes into a shared 16-dimensional feature space suitable for standard PyGOD detectors.
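The exact projection SONAR applies is not specified here; as an illustrative sketch, one simple way to map per-type features of different widths into a shared 16-dimensional space is a fixed random linear map per node type (raw dimensions taken from the feature table above, toy data):

```python
import numpy as np

rng = np.random.default_rng(42)
TARGET_DIM = 16  # shared feature space of the homogeneous variant

# Toy per-type feature matrices; raw dims match the table above.
raw = {
    "user":    rng.normal(size=(5, 4)),    # followers, following, listed, posts
    "tweet":   rng.normal(size=(5, 772)),  # counts + USE embedding
    "hashtag": rng.normal(size=(5, 1)),    # category label
}

# Hypothetical projection: one fixed, scaled random linear map per type.
proj = {t: rng.normal(size=(m.shape[1], TARGET_DIM)) / np.sqrt(m.shape[1])
        for t, m in raw.items()}
x = np.vstack([raw[t] @ proj[t] for t in ("user", "tweet", "hashtag")])
print(x.shape)  # → (15, 16)
```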
SONAR injects two complementary anomaly types at a 5% rate:
- Structural anomalies: Coordinated cliques where selected users are fully connected to selected posts, simulating bot networks that artificially amplify content
- Contextual anomalies: Attribute perturbations using Euclidean distance maximization, simulating accounts with suspicious engagement metrics that deviate from their structural neighborhood
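The two injection schemes can be sketched in a few lines of NumPy. This is a toy illustration, not SONAR's actual pipeline (which uses PyGOD's generators); all sizes here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bipartite graph: 200 users, 200 posts, 8-d features.
n_users, n_posts = 200, 200
x = rng.normal(size=(n_users + n_posts, 8))
labels = np.zeros(n_users + n_posts, dtype=int)

# Structural anomalies: fully connect a small set of users to a small
# set of posts, mimicking a bot network amplifying content.
bad_users = rng.choice(n_users, size=10, replace=False)
bad_posts = n_users + rng.choice(n_posts, size=10, replace=False)
clique_edges = [(u, p) for u in bad_users for p in bad_posts]
labels[bad_users] = 1
labels[bad_posts] = 1

# Contextual anomalies: replace each victim's features with the most
# Euclidean-distant of a set of randomly drawn candidate feature vectors.
victims = rng.choice(n_users + n_posts, size=20, replace=False)
for v in victims:
    candidates = rng.choice(n_users + n_posts, size=50, replace=False)
    dists = np.linalg.norm(x[candidates] - x[v], axis=1)
    x[v] = x[candidates[dists.argmax()]]
labels[victims] = 1

print(len(clique_edges), int(labels.sum()))  # 100 clique edges, ≤40 anomalous nodes
```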
```shell
# Install PyTorch first (see https://pytorch.org/get-started)
pip install torch
# Then install sonar-graph
pip install sonar-graph
```

For development (includes torch-sparse, torch-scatter, pytest, ruff, jupyter):

```shell
git clone https://github.com/hpiridi/sonar.git
cd sonar
pip install -e ".[dev]"
```

```python
from sonar import SONAR, dataset_summary, evaluate_detector

# Load small dataset (auto-downloaded, ~60MB)
dataset = SONAR(root="./data", name="small", anomalies=True)
data = dataset[0]
print(dataset_summary(data))
# {'type': 'homogeneous', 'num_nodes': 36860, 'num_edges': 49865,
#  'num_features': 16, 'num_anomalies': 1818, 'anomaly_ratio': 0.0493}

# Run a detector
from pygod.detector import DOMINANT
detector = DOMINANT(epoch=5, gpu=0)
detector.fit(data)
_, score = detector.predict(data, return_pred=True, return_score=True)

# Evaluate
print(evaluate_detector(data.y_outlier, score))
# {'roc_auc': 0.7384, 'average_precision': 0.0825, 'recall_at_k': 0.0286}
```

Load the heterogeneous variant to access the full multi-relational structure:

```python
dataset = SONAR(root="./data", name="small", anomalies=False,
                representation="heterogeneous")
data = dataset[0]
# HeteroData(user={x=[18430, 4]}, tweet={x=[18429, 772]}, hashtag={x=[1, 1]}, ...)
```

We benchmark 16 detectors spanning deep graph, classical graph, and non-graph approaches on SONAR-Small:
| Type | Detector | ROC-AUC | Avg Precision | Recall@k | Time (s) | Device |
|---|---|---|---|---|---|---|
| Deep Graph | AdONE | 0.8459 | 0.1672 | 0.0875 | 16.12 | GPU |
| | DONE | 0.8407 | 0.1599 | 0.0721 | 15.92 | GPU |
| | GCNAE (GAE) | 0.8025 | 0.1806 | 0.1518 | 0.80 | GPU |
| | DOMINANT | 0.7384 | 0.0825 | 0.0286 | 15.85 | GPU |
| | CONAD | 0.7375 | 0.0824 | 0.0292 | 24.84 | GPU |
| | AnomalyDAE | 0.6858 | 0.2569 | 0.3388 | 16.15 | GPU |
| | DMGD | 0.6366 | 0.0646 | 0.0237 | 140.81 | CPU |
| | ONE | 0.5705 | 0.1257 | 0.1430 | 17.79 | GPU |
| | CoLA | 0.3528 | 0.0544 | 0.1194 | 0.79 | GPU |
| | OCGNN | 0.2294 | 0.0315 | 0.0270 | 0.92 | GPU |
| Classical Graph | ANOMALOUS | 0.7997 | 0.4305 | 0.4455 | 11.76 | GPU |
| | Radar | 0.7997 | 0.4305 | 0.4455 | 207.45 | CPU |
| | SCAN | 0.7526 | 0.5223 | 0.5198 | 44.97 | GPU |
| Non-graph | IF | 0.6518 | 0.1381 | 0.1865 | 0.62 | CPU |
| | MLPAE | 0.5680 | 0.0875 | 0.1078 | 35.27 | CPU |
| | LOF | 0.4284 | 0.0589 | 0.0567 | 1.38 | CPU |
Note: PyGOD's `GAE` implements a GCN-based autoencoder (GCNAE), not the variational GAE from Kipf & Welling (2016). DMGD and Radar ran on CPU due to GPU OOM. Three detectors (GAAN, GADNR, GUIDE) are excluded due to OOM or version incompatibility.
- Deep graph methods lead on ranking but not precision: AdONE and DONE achieve the best ROC-AUC (84.59%, 84.07%), indicating strong overall separation between anomalous and normal nodes. However, their AP (16.72%, 15.99%) and Recall@k (8.75%, 7.21%) are far lower, revealing that deep autoencoders produce smooth, continuous anomaly scores that rank well in aggregate but fail to concentrate true anomalies at the top of the prediction list.
- Classical graph methods excel at precision: SCAN achieves the highest AP (52.23%) and Recall@k (51.98%) despite a lower ROC-AUC (75.26%). Its discrete structural clustering produces fewer but more precise predictions (933 outliers detected vs. AdONE's 3,686), making it more suitable for practical settings where analysts investigate top-k alerts. ANOMALOUS and Radar both reach ROC-AUC of 0.80 with AP of 43.05%, showing that classical graph-aware methods effectively capture both structural and contextual anomalies.
- ROC-AUC alone is misleading for anomaly detection: The divergence between ROC-AUC and AP/Recall@k across detectors highlights the importance of evaluating with multiple metrics. A detector with high ROC-AUC may still produce many false positives at any practical operating threshold, while a lower-ROC-AUC detector like SCAN can be far more actionable.
- Non-graph baselines provide context: Isolation Forest (ROC-AUC 0.65) and MLPAE (0.57) show that feature-only methods capture some signal, but graph-aware methods substantially outperform them, validating the importance of relational structure.
- Efficiency varies by over 300x: IF completes in 0.62 s while Radar requires 207.45 s (on CPU), highlighting significant runtime-accuracy trade-offs across method types.
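The three reported metrics are easy to reimplement. The sketch below is a plain-NumPy illustration written for clarity on a toy example (it ignores tied scores and is not guaranteed to match `evaluate_detector` exactly):

```python
import numpy as np

def roc_auc(y, s):
    # Probability that a random anomaly outscores a random normal (rank formula).
    order = np.argsort(s)
    ranks = np.empty(len(s))
    ranks[order] = np.arange(1, len(s) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(y, s):
    # Mean of precision values at the rank of each true anomaly.
    y_sorted = y[np.argsort(-s)]
    precision = np.cumsum(y_sorted) / np.arange(1, len(y) + 1)
    return (precision * y_sorted).sum() / y.sum()

def recall_at_k(y, s, k=None):
    # Fraction of true anomalies among the k highest-scored nodes.
    k = k or int(y.sum())  # default: k = number of true anomalies
    return y[np.argsort(-s)[:k]].sum() / y.sum()

y = np.array([0, 0, 1, 0, 1, 0])
s = np.array([0.1, 0.4, 0.9, 0.2, 0.3, 0.8])
print(roc_auc(y, s), average_precision(y, s), recall_at_k(y, s))
# → 0.75 0.75 0.5
```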
See `results/` for full JSON results.
Run a single detector:

```shell
uv run python run_detector.py --dataset-name small --algorithm DOMINANT --epoch 5
```

Run all detectors:

```shell
bash run_all.sh
```

Use a custom dataset:

```shell
uv run python run_detector.py --dataset path/to/graph.pickle --algorithm DOMINANT
```

Benchmark configurations (epoch, contamination, detector list) are documented in `benchmarks/configs/small.yaml`.
```
sonar/                        # Python package (pip install sonar-graph)
    dataset.py                # PyG InMemoryDataset loader with auto-download
    utils.py                  # evaluate_detector(), dataset_summary()
tests/                        # pytest suite (17 fast + 8 slow tests)
notebooks/
    quickstart.ipynb          # Load, explore, detect, evaluate
    benchmark_analysis.ipynb  # Reproduce paper tables and figures
results/                      # Pre-computed benchmark results (JSON)
benchmarks/configs/           # Hyperparameter configurations
scripts/                      # Data conversion utilities
run_detector.py               # CLI benchmark runner
run_all.sh                    # Run all detectors
```
| Variant | Access | Size |
|---|---|---|
| Small | Auto-downloaded via `SONAR` loader | ~60 MB |
| Medium | Contact authors (see below) | ~1.5 GB |
| Large | Contact authors (see below) | ~12 GB |
The medium and large datasets exceed GitHub's LFS file size limits, so they cannot be hosted on GitHub. To access them, please contact the authors:
- Hari Prasad Piridi — p20210102@hyderabad.bits-pilani.ac.in
- Dipanjan Chakraborty — dipanjan@hyderabad.bits-pilani.ac.in
Please include your affiliation and intended use.
```bibtex
@misc{piridi2026sonar,
  title  = {{SONAR}: A Large-Scale Social Network Benchmark for Graph-Based Anomaly Detection},
  author = {Piridi, Hari Prasad and Agarwal, Sheyril and Singh, Anirudh and
            Duddupudi, Sailesh and Yarramsetty, Sanjeeva Sai Preetham and
            Shyamendra, Pavan and Enaganti, Shreya and Ratra, Vastav and
            Upadhyay, Prajna Devi and Chandra, Priyank and Chakraborty, Dipanjan},
  note   = {Submitted to SIGIR 2026},
  year   = {2026}
}
```


