segmentation fault in normalize_total (Numba omp) on Apple Silicon #4026

@JiayuSuPKU

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

Probably related to #3507. Posting it again since I could not find an existing solution. The workaround is to manually set NUMBA_THREADING_LAYER=workqueue.

Problem: When scanpy and torch are both loaded, sc.pp.normalize_total can crash with a segmentation fault, preceded by this warning:

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

import scanpy as sc
import torch
from scipy.sparse import csr_matrix
X = csr_matrix(torch.eye(100).numpy())
ad = sc.AnnData(X=X)
sc.pp.normalize_total(ad, target_sum=1e4)

Densifying ad.X before calling sc.pp.normalize_total avoids the crash. Also, importing only numpy triggers the same OMP message but does not lead to a segfault.
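
For reference, the densifying workaround can be sketched as below. This is only an illustration: densify and normalize_total_dense are hypothetical helper names, not part of scanpy, and the sketch assumes ad.X is either a scipy.sparse matrix or a dense NumPy array. Densifying presumably avoids the crash because the sparse code path is the one that reaches the Numba kernel.

```python
def densify(X):
    """Return a dense array for scipy.sparse inputs; pass dense inputs through."""
    # scipy.sparse matrices and arrays expose .toarray(); NumPy arrays do not.
    return X.toarray() if hasattr(X, "toarray") else X


def normalize_total_dense(adata, target_sum=1e4):
    """Hypothetical wrapper: densify adata.X, then call sc.pp.normalize_total."""
    import scanpy as sc  # imported lazily so densify() stays dependency-free

    adata.X = densify(adata.X)
    sc.pp.normalize_total(adata, target_sum=target_sum)
    return adata
```

Note that densifying costs memory proportional to n_obs * n_vars, so this is only practical for small matrices.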

Root cause: _normalize_csr is compiled with Numba's @njit and uses numba.prange for parallel execution. In this setup on macOS/Apple Silicon, Numba picks omp (LLVM OpenMP) as its threading layer by default. With Numba 0.64.0, the OpenMP threading layer emits the OMP warning (omp_set_nested deprecated). When torch is also imported, this ends in a segfault, for reasons that are not yet clear.

One possible culprit is libomp.dylib. sklearn bundles libomp.dylib with install name /DLC/sklearn/.dylibs/libomp.dylib, while torch ships its own libomp.dylib with install name /opt/llvm-openmp/lib/libomp.dylib. Both may get loaded as separate OMP runtimes, and the LLVM OMP runtime may crash on detecting a duplicate.
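
One way to check the duplicate-runtime hypothesis is to list the OpenMP libraries loaded in the process. The sketch below uses threadpoolctl (already in the environment listed under Versions); openmp_runtimes is a hypothetical helper name, and this has not been verified against the crashing setup.

```python
def openmp_runtimes(info):
    """Pick the OpenMP entries out of threadpoolctl.threadpool_info() output."""
    return [
        (mod.get("prefix"), mod.get("filepath"))
        for mod in info
        if mod.get("user_api") == "openmp"
    ]


try:
    from threadpoolctl import threadpool_info
except ImportError:  # threadpoolctl not installed: nothing to report
    threadpool_info = None

if threadpool_info is not None:
    runtimes = openmp_runtimes(threadpool_info())
    for prefix, path in runtimes:
        print(prefix, path)
    # Two different file paths mean two separate OpenMP runtimes are loaded.
    if len({path for _, path in runtimes}) > 1:
        print("More than one OpenMP runtime is loaded in this process.")
```

threadpool_info() only reports libraries that are already loaded, so scanpy, sklearn and torch need to be imported before running this.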

Potential fix: Set NUMBA_THREADING_LAYER=workqueue, which uses Numba's own platform-agnostic thread pool and has no OpenMP dependency:

export NUMBA_THREADING_LAYER=workqueue

Or alternatively,

import os; os.environ['NUMBA_THREADING_LAYER'] = 'workqueue'
import scanpy as sc
import torch
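
To confirm that the override actually took effect, one can ask Numba which threading layer it selected after a parallel function has run. A minimal sketch, assuming numba and numpy are installed; active_threading_layer and _row_sums are hypothetical names used only for illustration.

```python
import os

# Set before numba is imported so the setting is picked up.
os.environ["NUMBA_THREADING_LAYER"] = "workqueue"


def active_threading_layer():
    """Return the threading layer Numba selected, or None if numba is missing."""
    try:
        import numba
        import numpy as np
    except ImportError:
        return None

    @numba.njit(parallel=True)
    def _row_sums(a):
        out = np.zeros(a.shape[0])
        for i in numba.prange(a.shape[0]):
            out[i] = a[i].sum()
        return out

    # The layer is only chosen once a parallel function has executed.
    _row_sums(np.ones((8, 8)))
    return numba.threading_layer()


layer = active_threading_layer()
print("numba threading layer:", layer)
```

If this prints omp instead of workqueue, numba was most likely imported before the environment variable was set.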

Minimal code sample

conda create -n scanpy_test python=3.12 r-base
conda activate scanpy_test
pip install scanpy==1.12 torch==2.11

For reasons that are unclear, installing r-base is required to reproduce the behavior.

import scanpy as sc
import torch
from scipy.sparse import csr_matrix
X = csr_matrix(torch.eye(100).numpy())
ad = sc.AnnData(X=X)
sc.pp.normalize_total(ad, target_sum=1e4)

Error output

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
[1]    72347 segmentation fault  python

Versions

scanpy	1.12
----	----
packaging	26.0
donfig	0.8.1.post1
typing_extensions	4.15.0
joblib	1.5.3
anndata	0.12.10
kiwisolver	1.5.0
h5py	3.16.0
numpy	2.4.4
PyYAML	6.0.3
session-info2	0.4
natsort	8.4.0
cycler	0.12.1
legacy-api-wrap	1.5
six	1.17.0
threadpoolctl	3.6.0
fast-array-utils	1.4
scipy	1.17.1
python-dateutil	2.9.0.post0
pillow	12.1.1
numba	0.64.0
numcodecs	0.16.5
scikit-learn	1.8.0
setuptools	81.0.0
fsspec	2026.3.0
llvmlite	0.46.0
pandas	2.3.3
matplotlib	3.10.8
google-crc32c	1.8.0
pyparsing	3.3.2
pytz	2026.1.post1
zarr	3.1.6
----	----
Python	3.12.13 | packaged by conda-forge | (main, Mar  5 2026, 17:06:14) [Clang 19.1.7 ]
OS	macOS-26.4-arm64-arm-64bit
CPU	8 logical CPU cores, arm
GPU	No GPU found
Updated	2026-04-01 02:04
