[bug] memory leak with CPU basecalling #10

Closed · colindaven opened this issue Feb 20, 2020 · 2 comments
Labels: bug (Something isn't working)

@colindaven

Hi,

Thanks for this, interesting project.

I may have found a memory leak while testing:

Server: Ubuntu 16.04, 48 GB RAM, CPU only

# mem leak: starts at 40-50% of 48 GB RAM, reaches 75% after ca. 30 minutes; killed after 1+ h
bonito basecaller --device cpu /mnt/ngsnfs/tools/bonito/bonito/models/dna_r9.4.1/ subdir1/ > bonito3.fastq &

# input data
-rw-rw-r-- 1 rcug rcug 427M Feb 19 17:08 FAL71492_420e77a4a3d53f993710f389b74f684f01c6c3d4_14.fast5
-rw-rw-r-- 1 rcug rcug 369M Feb 19 17:08 FAL71492_420e77a4a3d53f993710f389b74f684f01c6c3d4_15.fast5
-rw-rw-r-- 1 rcug rcug 372M Feb 19 17:08 FAL71492_420e77a4a3d53f993710f389b74f684f01c6c3d4_16.fast5
-rw-rw-r-- 1 rcug rcug 419M Feb 19 17:08 FAL71492_420e77a4a3d53f993710f389b74f684f01c6c3d4_17.fast5
-rw-rw-r-- 1 rcug rcug 396M Feb 19 17:08 FAL71492_420e77a4a3d53f993710f389b74f684f01c6c3d4_18.fast5
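
Not from the original report, but one way to quantify the growth described above is to sample the process's resident set size over time. A minimal sketch using psutil (a third-party package; the PID argument and the 60 s interval are assumptions):

```python
# Minimal sketch (assumption, not from the report): watch the RSS of a running
# bonito basecaller process to confirm that memory keeps growing.
import sys
import time

import psutil  # pip install psutil

proc = psutil.Process(int(sys.argv[1]))  # PID of the bonito process

try:
    while proc.is_running():
        rss_gb = proc.memory_info().rss / 1e9
        print(f"{time.strftime('%H:%M:%S')}  RSS: {rss_gb:.2f} GB", flush=True)
        time.sleep(60)  # sample once a minute
except psutil.NoSuchProcess:
    pass  # the process exited (or was killed) between samples
```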

pip freeze

alembic==1.4.0
asn1crypto==1.2.0
bioepic==0.2.9
biopython==1.74
bleach==2.1.3
botocore==1.10.24
bz2file==0.98
certifi==2019.11.28
cffi==1.13.2
chardet==3.0.4
Click==7.0
cliff==2.18.0
cmd2==0.8.9
colorlog==4.1.0
colormath==3.0.0
conda==4.8.2
conda-package-handling==1.6.0
cryptography==2.3.1
cutadapt==2.6
cycler==0.10.0
Cython==0.29.14
deap==1.2.2
decorator==4.4.1
deepTools==2.5.7
diagram==0.2.25
dnaio==0.3
docopt==0.6.2
docutils==0.15.2
entrypoints==0.2.3
fast-ctc-decode==0.1.3
ForestQC==1.1.5
future==0.16.0
gittyleaks==0.0.23
h5py==2.10.0
html5lib==1.0.1
idna==2.8
ipykernel==4.8.2
ipython==6.2.1
ipython-genutils==0.2.0
jedi==0.11.1
Jinja2==2.10.3
jmespath==0.9.4
joblib==0.14.1
jsonschema==2.6.0
jupyter-client==5.2.3
jupyter-core==4.4.0
jupyterhub==0.8.1
jupyterlab==0.31.12
jupyterlab-launcher==0.10.5
lzstring==1.0.4
Mako==1.1.1
mappy==2.17
MarkupSafe==1.1.1
matplotlib==2.1.2
mistune==0.8.3
multiqc==1.0
mysql-connector-python==8.0.17
nanoQC==0.3.3
natsort==6.2.0
nbconvert==5.3.1
nbformat==4.4.0
networkx==2.0
notebook==5.4.0
numpy==1.18.1
olefile==0.46
ont-bonito==0.0.4
ont-fast5-api==3.0.1
ont-tombo==1.5
ont2cram==0.0.1
optuna==1.1.0
pamela==0.3.0
pandas==0.20.3
pandocfilters==1.4.2
parallel-fastq-dump==0.6.5
parameterized==0.7.0
parasail==1.1.19
parso==0.1.1
patsy==0.5.1
pbr==5.4.4
pexpect==4.4.0
pickleshare==0.7.4
Pillow==5.1.0
prettytable==0.7.2
progressbar33==2.4
prompt-toolkit==1.0.15
ptyprocess==0.5.2
py2bit==0.3.0
pyBigWig==0.3.10
pycosat==0.6.3
pycparser==2.19
pyfaidx==0.5.5.2
Pygments==2.2.0
pyOpenSSL==19.0.0
pyparsing==2.4.6
pyperclip==1.7.0
pysam==0.11.2.2
PySocks==1.7.1
python-dateutil==2.8.1
python-editor==1.0.4
python-oauth2==1.1.0
pytz==2019.3
PyYAML==5.3
pyzmq==17.0.0
requests==2.22.0
rpy2==2.8.6
ruamel-yaml==0.11.14
scandir==1.7
scikit-learn==0.19.1
scipy==1.4.1
seaborn==0.8
Send2Trash==1.5.0
sh==1.12.14
simplegeneric==0.8.1
simplejson==3.8.1
singledispatch==3.4.0.3
sip==4.19.13
six==1.14.0
spectra==0.0.11
SQLAlchemy==1.3.13
statsmodels==0.8.0
stevedore==1.32.0
stopit==1.1.1
svim==0.4.2
terminado==0.8.1
testpath==0.3.1
toml==0.10.0
toolshed==0.4.6
torch==1.4.0
tornado==6.0.3
TPOT==0.9.1
tqdm==4.31.1
traitlets==4.3.2
typing==3.5.2.2
umi-tools==0.4.4
update-checker==0.16
urllib3==1.24.2
wcwidth==0.1.8
webencodings==0.5.1
xopen==0.7.3

iiSeymour self-assigned this on Feb 20, 2020
iiSeymour added the bug label on Feb 20, 2020
@iiSeymour (Member)

Thanks for reporting @colindaven

I can reproduce the leak, and it appears to be related to running on the CPU -

$ /usr/bin/time -v bonito basecaller dna_r9.4.1 reads --device cpu > /dev/null
...
Maximum resident set size (kbytes): 32935176
$ /usr/bin/time -v bonito basecaller dna_r9.4.1 reads --device cuda > /dev/null
...
Maximum resident set size (kbytes): 2536244
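
For reference, a minimal sketch (not part of the reproduction above) that reports the same peak-RSS figure from Python by running the basecaller as a child process; on Linux, ru_maxrss is reported in kilobytes:

```python
# Sketch (assumption): run the basecaller as a child process and report its
# peak resident set size, analogous to `/usr/bin/time -v`.
import resource
import subprocess

cmd = ["bonito", "basecaller", "dna_r9.4.1", "reads", "--device", "cpu"]
with open("/dev/null", "w") as devnull:
    subprocess.run(cmd, stdout=devnull, check=True)

# On Linux, ru_maxrss is in kilobytes.
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"Maximum resident set size (kbytes): {peak_kb}")
```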

@iiSeymour (Member)

I've tracked it down to pytorch/pytorch#27971, and the workaround of setting LRU_CACHE_CAPACITY=1 seems to do the trick -

$ export LRU_CACHE_CAPACITY=1 
$ /usr/bin/time -v bonito basecaller dna_r9.4.1 read --device cpu > /dev/null
...
Maximum resident set size (kbytes): 1196220
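
If exporting the variable in the shell is inconvenient, the same workaround can presumably be applied from Python as well; the assumption here is that LRU_CACHE_CAPACITY only needs to be present in the environment before torch is imported and any tensors are allocated:

```python
# Sketch (assumption): apply the workaround from Python instead of the shell.
# The variable must be in the environment before PyTorch's native code reads
# it, so it is set before `import torch`.
import os

os.environ["LRU_CACHE_CAPACITY"] = "1"

import torch  # noqa: E402 - imported after setting the environment variable
```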

Note: the basecaller will be very slow on CPU as PyTorch won't take advantage of all the cores; using a different runtime like ONNX or PlaidML might be worth a look for CPU.
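
One knob worth trying for CPU throughput (an assumption, not something verified in this thread) is pinning PyTorch's intra-op thread count explicitly with torch.set_num_threads:

```python
# Sketch (assumption): explicitly set the number of intra-op threads PyTorch
# uses for CPU inference; by default it may not saturate every core.
import os

import torch

torch.set_num_threads(os.cpu_count() or 1)
print("intra-op threads:", torch.get_num_threads())
```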

iiSeymour changed the title from "mem issues ?" to "[bug] memory leak with CPU basecalling" on Feb 20, 2020