Masterface Attacks on Face Verification

Pairwise Accuracy Is Not Security: Masterface Attacks Expose a Structural Vulnerability in Face Verification

Ehsan Nazari, 2026

📄 Technical report: technical_report.pdf

Face verification asks whether two face images depict the same person. A model embeds each face as a vector and accepts a match when the two fall within a threshold distance. It powers phone unlock, online ID checks, and border control.
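The accept/reject rule described above can be sketched in a few lines. This is a minimal illustration, assuming cosine distance as the metric and a made-up threshold `tau`; the toy 3-D vectors stand in for real embeddings and none of the names come from the repository.

```python
import math


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def verify(emb_a, emb_b, tau):
    """Accept the pair as the same person when the distance is within tau."""
    return cosine_distance(emb_a, emb_b) <= tau


# Toy 3-D embeddings; real face models use hundreds of dimensions.
probe = [0.9, 0.1, 0.0]
enrolled = [1.0, 0.0, 0.0]
stranger = [0.0, 1.0, 0.0]
tau = 0.3  # hypothetical threshold, not a value from the report

same = verify(probe, enrolled, tau)
different = verify(probe, stranger, tau)
```

A masterface is an embedding for which `verify` returns true against many enrolled identities at once.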

TL;DR

You might assume that high pairwise verification accuracy implies a model is robust against adversaries. After all, a score like 99.65% on a face verification benchmark can create the impression that the system is secure.

It isn’t. Pairwise verification benchmarks were never designed to expose security vulnerabilities, and this report makes that gap concrete by showing how much attack surface can remain hidden behind today’s high-accuracy face verification models.

CosFace achieves 99.65% pairwise accuracy on LFW at FAR≈0.001, yet the same model can be fooled by nine optimization-crafted face embeddings that collectively impersonate 47.2% of LFW’s 5,749 identities, or about 2,700 people. In each case, at least one of the nine embeddings is accepted as the same person under the verification threshold. This is not unique to one model: we observe the same pattern across ArcFace, CosFace, AdaFace-IR101, AdaFace-ViT, FaceNet-CASIA, and FaceNet-VGG2.

The disconnect at a glance

CosFace under three pipeline configurations. Pairwise accuracy (left): uniformly ≈99.6% — the standard benchmark sees no difference. Masterface coverage (right): 47.2% → 0.23%, a more-than-200× swing. Same model, same dataset.

Three contributions

An optimization-based masterface attack. Two phases: LM-MA-ES on the embedding hypersphere (Phase 1), followed by PGD-style image-space realization (Phase 2). Reaches 49.957% Phase-2 coverage on FaceNet-CASIA, surpassing the prior GAN-based result of Shmelkin et al. (2021) at 43.82%, with a simpler, non-generative pipeline.
Two pipeline "footguns". The FAR-threshold rounding direction and the face-alignment strategy each modulate masterface coverage by up to two orders of magnitude, while pairwise accuracy moves by ≤0.1 pp. These pipeline choices are security-critical yet structurally invisible to standard benchmarks.
The irreducible floor (JT-Attack). Even with both pipeline knobs set to their safest values, every model still admits a single embedding covering any 2–4 randomly chosen identities ~100% of the time. Pipeline mitigations bound the magnitude of large-scale attacks; the small-$N$ targeted floor lies in the models themselves.
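To make the threshold-rounding footgun concrete, here is a minimal sketch of why "FAR≈0.001" admits two thresholds. It assumes the threshold is calibrated on a finite set of impostor-pair distances, so the empirical FAR moves in steps of 1/n and the target usually falls between two steps. Reading "from below" as the largest threshold with FAR ≤ target and "from above" as the smallest with FAR ≥ target is an interpretation; the report holds the exact definitions. All function names are hypothetical.

```python
import math


def far_at(threshold, impostor_distances):
    """Empirical false accept rate: share of impostor pairs accepted."""
    accepted = sum(d <= threshold for d in impostor_distances)
    return accepted / len(impostor_distances)


def threshold_accepting(sorted_distances, k):
    """Threshold that accepts exactly the k smallest distances (assumes no ties)."""
    if k == 0:
        # Just below the smallest impostor distance: nothing is accepted.
        return math.nextafter(sorted_distances[0], -math.inf)
    return sorted_distances[k - 1]


def far_thresholds(impostor_distances, target_far):
    """Return (tau_under, tau_over) bracketing target_far from below and above."""
    ds = sorted(impostor_distances)
    k_under = math.floor(target_far * len(ds))  # most accepts with FAR <= target
    k_over = math.ceil(target_far * len(ds))    # fewest accepts with FAR >= target
    return threshold_accepting(ds, k_under), threshold_accepting(ds, k_over)


# 2,500 synthetic impostor distances, so the target 0.001 sits between 2 and 3 accepts.
impostors = [0.2 + 0.0002 * i for i in range(2500)]
tau_under, tau_over = far_thresholds(impostors, target_far=0.001)
far_under = far_at(tau_under, impostors)
far_over = far_at(tau_over, impostors)
```

Both thresholds are a defensible reading of the same target FAR. The point of the sketch is only that they differ, and a looser threshold widens every masterface's neighborhood.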

Headline results

Accumulative coverage from nine masterface embeddings (one per spherical-k-means cluster, k=9), evaluated against all 5,749 LFW identities. MTCNN = MTCNN-DavidSandberg, Retina = RetinaFace; under = FAR≈0.001 from below, over = from above.

Model	Pipeline	Coverage %	Accuracy %
CosFace	MTCNN / under	47.2	99.65
CosFace	Retina / over	24.6	99.58
CosFace	Retina / under	0.23	99.58
ArcFace	MTCNN / under	45.7	95.47
ArcFace	Retina / over	21.4	99.55
ArcFace	Retina / under	0.45	99.55
AdaFace-IR101	MTCNN / under	32.3	99.62
AdaFace-IR101	Retina / over	22.5	99.58
AdaFace-IR101	Retina / under	0.37	99.58
AdaFace-ViT	MTCNN / under	6.6	99.90
AdaFace-ViT	Retina / over	28.8	99.70
AdaFace-ViT	Retina / under	2.4	99.70
FaceNet (CASIA)	MTCNN / under	37.3	98.97
FaceNet (CASIA)	Retina / over	58.1	97.97
FaceNet (CASIA)	Retina / under	41.4	97.97
FaceNet (VGG2)	MTCNN / under	24.2	99.53
FaceNet (VGG2)	Retina / over	34.8	99.40
FaceNet (VGG2)	Retina / under	25.6	99.40

The CosFace MTCNN / under row is the load-bearing exhibit: 99.65% pairwise accuracy coexists with 47.2% coverage, the highest of any row that uses the stricter "under" threshold direction.
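The accumulative coverage metric in the table counts an identity as covered when at least one masterface falls within the threshold of it. A minimal sketch, assuming one unit-norm embedding per identity and cosine distance (the repository may aggregate several images per identity differently); the 2-D vectors and all names are illustrative only.

```python
import math


def cosine_distance(a, b):
    """Cosine distance for unit-norm vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def accumulative_coverage(masterfaces, identity_embeddings, tau):
    """Fraction of identities matched by at least one masterface."""
    covered = [
        any(cosine_distance(m, e) <= tau for m in masterfaces)
        for e in identity_embeddings
    ]
    return sum(covered) / len(covered)


def unit(angle_deg):
    """Unit vector in 2-D at the given angle, used as a toy embedding."""
    rad = math.radians(angle_deg)
    return [math.cos(rad), math.sin(rad)]


identities = [unit(0), unit(20), unit(90), unit(180)]
masterfaces = [unit(10), unit(95)]
tau = 1.0 - math.cos(math.radians(15))  # accept anything within 15 degrees

coverage = accumulative_coverage(masterfaces, identities, tau)
```

In this toy setup three of the four identities lie within 15 degrees of a masterface, so `coverage` is 0.75.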

Method

Phase 1 — embedding-space search

Given a set of target identity embeddings $\mathcal{P}\subset\mathbb{R}^d$ on the unit hypersphere, Phase 1 searches for a point $\mathbf{x}^*$ whose $\tau$-neighborhood covers as many of them as possible. The non-differentiable max-coverage objective is replaced with a smooth surrogate

$$\mathcal{L}(\mathbf{x}) = \frac{1}{|\mathcal{P}|}\left[w \cdot \sum_{p \in \mathcal{P}} \mathbb{1}[d(\mathbf{x},p) > \tau] \; + \; (1-w) \cdot \sum_{p \in \mathcal{P}} d(\mathbf{x},p)\right],$$

optimized with LM-MA-ES (Loshchilov et al., 2017) — 1,000 generations, population 100, $w = 0.99$. Identity sets are partitioned into 9 clusters via spherical $k$-means to match the budget reported by prior masterface work.
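The surrogate can be written down directly. The sketch below implements the loss as stated and, to stay short, minimizes it with a plain elitist evolution strategy rather than LM-MA-ES; the optimizer is a stand-in, not the report's method. Cosine distance, the toy targets, and every function name are assumptions made for illustration.

```python
import math
import random


def normalize(v):
    """Project a vector back onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """Cosine distance for unit-norm vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def surrogate_loss(x, targets, tau, w=0.99):
    """Weighted mix of the uncovered count and the summed distance, per target."""
    dists = [cosine_distance(x, p) for p in targets]
    uncovered = sum(d > tau for d in dists)
    return (w * uncovered + (1.0 - w) * sum(dists)) / len(targets)


def simple_es(x0, targets, tau, generations=100, population=20, sigma=0.2, seed=0):
    """Elitist ES: Gaussian mutations of the best point, renormalized each time."""
    rng = random.Random(seed)
    best, best_loss = x0, surrogate_loss(x0, targets, tau)
    for _ in range(generations):
        for _ in range(population):
            candidate = normalize([b + sigma * rng.gauss(0.0, 1.0) for b in best])
            loss = surrogate_loss(candidate, targets, tau)
            if loss < best_loss:
                best, best_loss = candidate, loss
    return best, best_loss


# Toy targets: 50 unit vectors scattered around one direction in 8-D.
data_rng = random.Random(1)
center = normalize([1.0] * 8)
targets = [
    normalize([c + 0.3 * data_rng.gauss(0.0, 1.0) for c in center])
    for _ in range(50)
]
x0 = normalize([1.0] + [0.0] * 7)
start_loss = surrogate_loss(x0, targets, tau=0.3)
x_star, final_loss = simple_es(x0, targets, tau=0.3)
```

With $w = 0.99$ the uncovered count dominates the loss, while the small distance term still separates candidates that cover the same number of targets.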

Phase 2 — image realization

Given a masterface embedding $\hat{\mathbf{x}}^*$, Phase 2 iteratively perturbs a source face $\mathbf{s}$ in pixel space (Adam, PGD-style perturbation budget $\epsilon$) so that its embedding under the face mapper $FM$ approaches $\hat{\mathbf{x}}^*$. The result is a plausible-looking face whose embedding lies within $\tau$ of many unrelated identities. 88–97% of Phase-1 coverage survives Phase-2 realization under the dangerous pipeline (Table 5 in the report).
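The shape of the Phase-2 loop can be illustrated on a toy problem. The sketch below replaces the real face mapper with a random linear map followed by L2 normalization, and uses signed-gradient steps instead of Adam, so it is only a stand-in for the actual pipeline. What it keeps is the structure: step toward the target embedding, project back into the $\epsilon$-ball around the source, and clip to valid pixel values. All names and sizes are hypothetical.

```python
import math
import random


def embed(W, image):
    """Toy face mapper: linear map followed by L2 normalization."""
    z = [sum(w * p for w, p in zip(row, image)) for row in W]
    z_norm = math.sqrt(sum(v * v for v in z))
    return [v / z_norm for v in z], z_norm


def cosine_distance(u, t):
    """Cosine distance for unit-norm vectors."""
    return 1.0 - sum(a * b for a, b in zip(u, t))


def realize(W, source, target, eps, step=0.01, iters=200):
    """Projected signed-gradient descent; returns the best evaluated iterate."""
    image, best, best_dist = list(source), list(source), None
    for _ in range(iters):
        u, z_norm = embed(W, image)
        dist = cosine_distance(u, target)
        if best_dist is None or dist < best_dist:
            best, best_dist = list(image), dist
        cos = 1.0 - dist
        # Gradient of the distance w.r.t. the pre-normalization embedding z.
        grad_z = [-(t - cos * ui) / z_norm for t, ui in zip(target, u)]
        # Chain rule through the linear map: grad_image = W^T grad_z.
        grad_image = [
            sum(W[i][j] * grad_z[i] for i in range(len(W)))
            for j in range(len(image))
        ]
        for j, g in enumerate(grad_image):
            moved = image[j] - step * ((g > 0) - (g < 0))
            # Stay within eps of the source pixel, then within [0, 1].
            moved = min(max(moved, source[j] - eps), source[j] + eps)
            image[j] = min(max(moved, 0.0), 1.0)
    return best, best_dist


rng = random.Random(0)
n_pixels, dim = 16, 4
W = [[rng.gauss(0.0, 1.0) for _ in range(n_pixels)] for _ in range(dim)]
source = [rng.random() for _ in range(n_pixels)]
raw_target = [rng.gauss(0.0, 1.0) for _ in range(dim)]
t_norm = math.sqrt(sum(v * v for v in raw_target))
target = [v / t_norm for v in raw_target]

start_dist = cosine_distance(embed(W, source)[0], target)
adv_image, adv_dist = realize(W, source, target, eps=0.1)
```

In the real attack the gradient comes from backpropagating through the face model and its alignment step, which is why the repository ships differentiable detectors.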

The irreducible floor (JT-Attack)

The Joint-Threshold Attack asks: under the safest pipeline configuration (RetinaFace + FAR≈0.001 from below), can a single embedding cover $N$ randomly chosen identities? For $N \in \{2, 3, 4\}$, the answer is yes, ~100% of the time, across every model tested. The theoretical maximum for a well-separated embedding space is $1/N$ (the red dashed line in the figure), and the observed curves blow past it. Pipeline mitigations bound the magnitude of broad attacks but leave targeted small-group attacks fully open. This is a model-level property, independent of any evaluation choice.
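A quick way to probe the question JT-Attack asks is to test one obvious candidate: the normalized mean of the $N$ target embeddings. The sketch below is a heuristic check, not the report's attack. If the mean covers all $N$ targets, a shared embedding certainly exists; if it does not, a stronger optimizer might still find one. It assumes unit-norm embeddings, cosine distance, and hypothetical function names.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance for unit-norm vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def joint_candidate(embeddings):
    """Normalized mean of the targets: a simple candidate shared embedding."""
    dim = len(embeddings[0])
    mean = [sum(e[k] for e in embeddings) / len(embeddings) for k in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean]


def covers_all(x, embeddings, tau):
    """True when x is within tau of every target embedding."""
    return all(cosine_distance(x, e) <= tau for e in embeddings)


def jt_success_rate(pool, n, tau, trials=200, seed=0):
    """Share of random n-identity groups that the mean candidate covers."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        group = rng.sample(pool, n)
        hits += covers_all(joint_candidate(group), group, tau)
    return hits / trials


def unit(angle_deg):
    """Unit vector in 2-D at the given angle, used as a toy embedding."""
    rad = math.radians(angle_deg)
    return [math.cos(rad), math.sin(rad)]


pool = [unit(0), unit(10), unit(20), unit(30)]
tau = 1.0 - math.cos(math.radians(20))  # accept anything within 20 degrees

pair_rate = jt_success_rate(pool, n=2, tau=tau)
far_pair_covered = covers_all(joint_candidate([unit(0), unit(90)]), [unit(0), unit(90)], tau)
```

Every pair in this toy pool is at most 30 degrees apart, so the midpoint is within 15 degrees of both and `pair_rate` is 1.0. Two identities 90 degrees apart cannot be covered by their midpoint at this threshold.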

📄 Full technical report: technical_report.pdf. All experimental details, threshold definitions, and additional results live there; this section covers only the commands needed to regenerate the artifacts.

Reproducing the results

Quick start

# 1) Two conda envs (PyTorch + TensorFlow).
conda env create -f environments/master.yml          # ArcFace, CosFace, AdaFace (PyTorch)
conda env create -f environments/master_facenet.yml  # FaceNet variants (TensorFlow)

# 2) Run any config.
conda activate master
python run.py --config <name>            # e.g. cosface_mtcnn_below

Weights, LFW, LMDBs, and embedding caches are auto-built on first use under data/ (the first run pays a one-time download + alignment cost). ArcFace and CosFace weights have no scriptable URL — on first use the loader prints a 3-step pointer to the InsightFace OneDrive folder; download the files manually and drop them in the path it specifies.

Configs live under configs/<section>/<name>.yaml and are looked up by name across all subdirectories. The YAML's attack_mode field selects between two attacks:

attack_mode: masterface (default) — Phase-1 GA embedding search + optional Phase-2 image realization.
attack_mode: jt — JT-Attack: joint-threshold attack across N randomly chosen identities.
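The lookup-by-name and dispatch behavior described above could look roughly like the following. This is a sketch under assumptions, not the repository's code: `find_config` and `resolve_attack_mode` are hypothetical helpers, the `headline` section directory is invented for the demo, and the config is passed as an already-parsed dict to avoid depending on a YAML library.

```python
from pathlib import Path
import tempfile

ATTACK_MODES = {"masterface", "jt"}


def find_config(name, root="configs"):
    """Locate <name>.yaml anywhere under root, insisting on a unique match."""
    matches = sorted(Path(root).rglob(f"{name}.yaml"))
    if not matches:
        raise FileNotFoundError(f"no config named {name!r} under {root}")
    if len(matches) > 1:
        raise ValueError(f"config name {name!r} is ambiguous: {matches}")
    return matches[0]


def resolve_attack_mode(config):
    """Read attack_mode from a parsed config, defaulting to 'masterface'."""
    mode = config.get("attack_mode", "masterface")
    if mode not in ATTACK_MODES:
        raise ValueError(f"unknown attack_mode: {mode!r}")
    return mode


# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    section = Path(tmp) / "configs" / "headline"  # hypothetical section name
    section.mkdir(parents=True)
    (section / "cosface_mtcnn_below.yaml").write_text("attack_mode: masterface\n")
    found_name = find_config("cosface_mtcnn_below", root=Path(tmp) / "configs").name

default_mode = resolve_attack_mode({})
jt_mode = resolve_attack_mode({"attack_mode": "jt"})
```

Failing on duplicate names keeps a name-only lookup unambiguous when configs are spread across several section directories.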

Models

Name in configs	Source	Architecture	Reported LFW acc.
`arcface`	insightface	IResNet-100	99.52%
`cosface`	insightface	IResNet-100	99.58%
`adaface_ir101`	AdaFace	IResNet-101	99.58%
`adaface_vit`	CVLface	ViT-Base	99.70%
`facenet_casia`	davidsandberg/facenet	InceptionResNet	97.97%
`facenet_vgg2`	davidsandberg/facenet	InceptionResNet	99.40%

Activate master for the first four, master_facenet for the FaceNet variants. reproduce_all.sh switches automatically.

Reproducing all the experiments

./scripts/reproduce_all.sh runs every YAML under configs/ end-to-end (headline 6×3, JT-Attack, Beatles targeted), switching between the master and master_facenet conda envs automatically based on each config's model_name. Outputs land in results/.
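The environment switch amounts to a lookup from model name to conda environment. A minimal sketch of that mapping, using the model names from the table above; the function names are hypothetical and the actual script may implement the switch differently.

```python
CONDA_ENV_BY_MODEL = {
    # PyTorch models
    "arcface": "master",
    "cosface": "master",
    "adaface_ir101": "master",
    "adaface_vit": "master",
    # TensorFlow FaceNet variants
    "facenet_casia": "master_facenet",
    "facenet_vgg2": "master_facenet",
}


def conda_env_for(model_name):
    """Return the conda env that a config's model_name should run under."""
    try:
        return CONDA_ENV_BY_MODEL[model_name]
    except KeyError:
        raise ValueError(f"unknown model_name: {model_name!r}") from None


def group_by_env(model_names):
    """Group models by env so each environment is activated only once."""
    groups = {}
    for name in model_names:
        groups.setdefault(conda_env_for(name), []).append(name)
    return groups


plan = group_by_env(["cosface", "facenet_vgg2", "arcface"])
```

Grouping runs by environment is one way to avoid repeated activations in a batch driver.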

Layout

run.py                  single entry point; dispatches on attack_mode
configs/                experiment YAMLs grouped by section
masterface/             the Python package
  attack/               masterface orchestrator + Phase-1 GA + Phase-2 image realization
  jt_attack/            JT-Attack runner + logger + plotter
  models/               6 face mappers + base + vendored backbones + on-demand weight fetcher
  detectors/            MTCNN-DS + RetinaFace alignment (incl. differentiable)
  data/                 LFW fetch, LMDB build, NPZ embeddings, threshold cache
  optimization/         LM-MA-ES + GA + fitness problems
  loss/                 euclidean / cosine / arc_cosine (PyTorch and TF variants)
  metrics/              threshold, coverage (Gini, per-identity), clustering stats
  utils/                config lookup, distance functions, result logger, helpers
data/                   LFW + LMDB + embedding caches + source faces + model weights (gitignored)
results/                experiment outputs (gitignored)
scripts/                reproduce-all driver + optional pre-fetch weights script
environments/           two conda env files (master, master_facenet)
assets/                 figures embedded in this README

Acknowledgments

This research was enabled in part by support provided by the Digital Research Alliance of Canada.

License

Apache 2.0. Model weights are subject to their upstream licenses.
