Modified, GPU-accelerated currentNe (https://github.com/esrud/currentNe) with PED/MAP and VCF input support, plus complete Ne estimation and confidence intervals.
GPU-accelerated fork of currentNe adding PED/MAP and VCF input, and providing end-to-end Nₑ estimation with confidence intervals. The GPU path computes weighted LD (d²) in FP64 using atomicAdd(double*), while Nₑ and CIs follow the original integration and neural-network variance model.
Requires: NVIDIA GPU ≥ Pascal (SM ≥ 6.0), NVIDIA driver + CUDA Toolkit (12+), gcc/g++ & make, and ~1 GB free GPU memory (more for large datasets).
Cooling note: Running on passively cooled (fanless) Tesla GPUs is not recommended unless the chassis provides server-grade, front-to-back airflow. The FP64 path saturates the floating-point units for extended periods, creating a thermal load comparable to an FPU stress test. Inadequate airflow will cause throttling or faults.
Citations: Santiago, E., Caballero, A., Köpke, C., & Novo, I. (2024). Estimation of the contemporary effective population size from SNP data while accounting for mating structure. Molecular Ecology Resources, 24, e13890. https://doi.org/10.1111/1755-0998.13890
Santiago, E., Köpke, C. & Caballero, A. Accounting for population structure and data quality in demographic inference with linkage disequilibrium methods. Nat Commun 16, 6054 (2025). https://doi.org/10.1038/s41467-025-61378-w
unzip currentNe_gpu_full.zip
cd currentNe_gpu_full
make ARCH=sm_89   # choose your GPU's SM arch (sm_70, sm_80, sm_86, sm_89, ...)
Also set `ARCH ?= sm_89` in the Makefile to match. This creates ./currentNe_gpu.
make cpu
This creates ./currentNe_gpu_cpu (OpenMP).
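If you are unsure which value to pass as ARCH, the compute capability reported by the driver can be turned into an sm_XX string. The following is a minimal sketch, assuming a driver recent enough that nvidia-smi supports the `compute_cap` query field. The helper name `cc_to_arch` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of currentNe_gpu): convert a compute
# capability such as "8.9" into the matching make argument "sm_89".
cc_to_arch() {
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Query the first GPU only if nvidia-smi exists and runs successfully.
if command -v nvidia-smi >/dev/null 2>&1; then
  if out="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null)"; then
    cc="$(printf '%s\n' "$out" | head -n 1)"
    if [ -n "$cc" ]; then
      echo "make ARCH=$(cc_to_arch "$cc")"
    fi
  fi
fi
```

On a machine with several different GPUs, only the first one is inspected here, so check that it is the card you intend to run on.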
General form:
ulimit -s unlimited   # the default Maxloci limit is 20 million; it can be increased in the .cpp file
./currentNe_gpu <datafile> <num_chromosomes> [options]
<datafile>: one of
- `prefix.vcf`
- `prefix.ped` (requires `prefix.map` in the same folder)
- `prefix.tped` (with individuals as columns following the first 4 fields)
<num_chromosomes>: required (e.g., `22` for human-like autosomes, or the true count for your organism's autosomes).
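For PED/MAP input, the first column of the .map file holds the chromosome code, so counting the distinct codes is a quick sanity check on the value passed as <num_chromosomes>. This is only a sketch: the helper name `count_chromosomes` is hypothetical, and the count will include any sex chromosomes, mitochondrial markers, or unplaced contigs present in the file, which you may need to subtract to get the autosome count.

```shell
# Hypothetical helper: count distinct chromosome codes in column 1
# of a PLINK .map file (whitespace-delimited, one marker per line).
count_chromosomes() {
  awk 'NF > 0 && !seen[$1]++ { n++ } END { print n + 0 }' "$1"
}
```

Usage: `count_chromosomes mypop.map` prints a single integer.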
Common options:
- `-s <N>`: Number of SNPs to use (default: all segregating)
- `-t <T>`: CPU threads (for non-GPU parts; default: OpenMP auto)
- `-o <file>`: Output filename (default: `<prefix>_currentNe_OUTPUT.txt`)
- `-k <int>`: Important; please see the original description in currentNe
- `-q`: Quiet: only print Ne (and, with `-v`, also the 50% and 90% CIs)
- `-v`: With `-q`, also print CIs
- `-p`: Print full analysis to stdout instead of a file
Examples:
# TPED
./currentNe_gpu mydata.tped 19 -t 8
# PED/MAP
./currentNe_gpu mypop.ped 19 -t 8
# VCF
./currentNe_gpu cohort.vcf 19 -t 8
./currentNe_gpu cohort.vcf 19 -t 8 -k 1   # -t 8 is enough
- Full report file (unless `-p`): `<prefix>_currentNe_OUTPUT.txt`
  Includes: input stats, d², expected/observed het, Ne point estimate, 50%/90% CI.
- Double `atomicAdd` requires GPU architecture sm_60+; set `ARCH` accordingly.
- For very large SNP counts, memory = `L × N` bytes (char). Consider filtering with `-s` or thinning SNPs.
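To get a feel for the `L × N` figure before a run, the arithmetic can be done directly in the shell. This sketch assumes L is the number of SNPs and N the number of individuals, with one byte (char) per genotype. The helper name `genotype_bytes` and the example sizes are illustrative only, and the estimate covers genotype storage alone, not other buffers.

```shell
# Hypothetical helper: genotype storage in bytes, assuming one byte (char)
# per genotype, with $1 = number of SNPs (L) and $2 = number of individuals (N).
genotype_bytes() {
  echo $(( $1 * $2 ))
}

# Illustrative sizes: 20 million SNPs x 100 individuals.
bytes="$(genotype_bytes 20000000 100)"
echo "$bytes bytes (~$(( bytes / 1024 / 1024 )) MiB)"
# prints: 2000000000 bytes (~1907 MiB)
```

At that size the genotype matrix alone exceeds the ~1 GB baseline mentioned in the requirements, which is when `-s` or thinning becomes worth considering.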