I think models leak #112
Hello @Qeenon, thank you for raising this issue. I tried reproducing the error and noticed that the sanitizer identifies a leak of 20 bytes when variables are stored on the GPU. The problem does not seem to appear when loading on CPU. I tried looping over a sequence of model loading/inference (see gist https://gist.github.com/guillaume-be/76e0d287dc125592e8a2088cc48f7066); no memory leak is visible after ~5000 iterations. My intuition is that the issue could come from the tokenizer loading, and not necessarily from the model itself.

- Which tokenizer are you loading?
- Are you using a GPU for your service?
- Is there a model in particular for which the memory leak is more severe?
- Would you be able to share a snippet of code to reproduce the issue?

I will raise the issue of the GPU memory leak with the author of the torch bindings and see if it could come from there. Note that the models are not meant to be reloaded for every query; loading them once (for example via lazy_static) and reusing them is the intended usage.
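The load-once pattern mentioned above can be sketched with the standard library's `OnceLock` (the `lazy_static` crate achieves the same). `PlaceholderModel` below is a hypothetical stand-in for a real pipeline such as a translation model, not rust_bert's actual API:

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for an expensive-to-load pipeline
// (e.g. a translation model); the real type would come from rust_bert.
struct PlaceholderModel {
    name: String,
}

impl PlaceholderModel {
    fn load() -> Self {
        // Imagine weights being read from disk here.
        PlaceholderModel {
            name: "en2ru".to_string(),
        }
    }

    fn translate(&self, input: &str) -> String {
        format!("[{} translation of] {}", self.name, input)
    }
}

// Loaded at most once, on first use, then shared by every query.
static MODEL: OnceLock<PlaceholderModel> = OnceLock::new();

fn model() -> &'static PlaceholderModel {
    MODEL.get_or_init(PlaceholderModel::load)
}

fn main() {
    // Both calls hit the same instance; no per-query reload.
    println!("{}", model().translate("hello"));
    println!("{}", model().translate("world"));
}
```

If the pipeline needs mutable state (e.g. a conversation manager), wrapping the model in a `Mutex` inside the static works the same way.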
Yet I'm not 100% sure that the leaks are related to the models; so far I'm trying to detect whether that's the case. This file loads the QA / Conversation / Translation models; on this commit they were loaded on demand, and after some time the bot would grow to around 25GB of RAM. Now I've changed the code and put those inside lazy_static, and so far it's going okay-ish, but I'd like to let it keep running for some more time to be sure it was related to the models. They were running on CPU (I didn't set up CUDA properly on the host machine).
Translation example (see gist at https://gist.github.com/guillaume-be/60d4a4a61ec16d21478ba497d517a054). Below I am including the valgrind logs; some of the warnings seem to be caused by the reqwest library. I tried to rerun a minimal example:

extern crate anyhow;
use rust_bert::resources::{Resource, RemoteResource};
use rust_bert::marian::{MarianModelResources, MarianVocabResources, MarianSpmResources, MarianConfigResources};
fn main() -> anyhow::Result<()> {
let model_resource = Resource::Remote(RemoteResource::from_pretrained(MarianModelResources::ENGLISH2RUSSIAN));
let vocab_resource = Resource::Remote(RemoteResource::from_pretrained(MarianVocabResources::ENGLISH2RUSSIAN));
let merge_resource = Resource::Remote(RemoteResource::from_pretrained(MarianSpmResources::ENGLISH2RUSSIAN));
let config_resource = Resource::Remote(RemoteResource::from_pretrained(MarianConfigResources::ENGLISH2RUSSIAN));
let _out1 = model_resource.get_local_path();
let _out2 = vocab_resource.get_local_path();
let _out3 = merge_resource.get_local_path();
let _out4 = config_resource.get_local_path();
Ok(())
}

valgrind log (minimal example):

==28563== Memcheck, a memory error detector
==28563== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==28563== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==28563== Command: target/debug/examples/resource_download --leak-check=full
==28563==
==28563== Warning: set address range perms: large range [0x4dab000, 0x40fba000) (defined)
==28563== Warning: set address range perms: large range [0x40fba000, 0x51cf0000) (defined)
==28563== Source and destination overlap in __memcpy_chk(0x1ffefff5c0, 0x1ffefff5c0, 5)
==28563==    at 0x4843BF0: __memcpy_chk (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==28563==    by 0x44EC0F83: cpuinfo_linux_parse_cpulist (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBFC96: cpuinfo_linux_get_max_possible_processor (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBDFA1: cpuinfo_x86_linux_init (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x51FDF47E: __pthread_once_slow (pthread_once.c:116)
==28563==    by 0x44EBA3B6: cpuinfo_initialize (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B157: at::native::compute_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B30C: at::native::get_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x42706CB8: THFloatVector_startup::THFloatVector_startup() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4197ED85: _GLOBAL__sub_I_THVector.cpp (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4011B89: call_init.part.0 (dl-init.c:72)
==28563==    by 0x4011C90: call_init (dl-init.c:30)
==28563==    by 0x4011C90: _dl_init (dl-init.c:119)
==28563==
==28563== Source and destination overlap in __memcpy_chk(0x1ffefff5c0, 0x1ffefff5c0, 5)
==28563==    at 0x4843BF0: __memcpy_chk (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==28563==    by 0x44EC0F83: cpuinfo_linux_parse_cpulist (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBFD16: cpuinfo_linux_get_max_present_processor (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBDFAC: cpuinfo_x86_linux_init (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x51FDF47E: __pthread_once_slow (pthread_once.c:116)
==28563==    by 0x44EBA3B6: cpuinfo_initialize (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B157: at::native::compute_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B30C: at::native::get_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x42706CB8: THFloatVector_startup::THFloatVector_startup() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4197ED85: _GLOBAL__sub_I_THVector.cpp (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4011B89: call_init.part.0 (dl-init.c:72)
==28563==    by 0x4011C90: call_init (dl-init.c:30)
==28563==    by 0x4011C90: _dl_init (dl-init.c:119)
==28563==
==28563== Source and destination overlap in __memcpy_chk(0x1ffefff5b0, 0x1ffefff5b0, 5)
==28563==    at 0x4843BF0: __memcpy_chk (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==28563==    by 0x44EC0F83: cpuinfo_linux_parse_cpulist (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBFD99: cpuinfo_linux_detect_possible_processors (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBE00D: cpuinfo_x86_linux_init (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x51FDF47E: __pthread_once_slow (pthread_once.c:116)
==28563==    by 0x44EBA3B6: cpuinfo_initialize (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B157: at::native::compute_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B30C: at::native::get_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x42706CB8: THFloatVector_startup::THFloatVector_startup() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4197ED85: _GLOBAL__sub_I_THVector.cpp (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4011B89: call_init.part.0 (dl-init.c:72)
==28563==    by 0x4011C90: call_init (dl-init.c:30)
==28563==    by 0x4011C90: _dl_init (dl-init.c:119)
==28563==
==28563== Source and destination overlap in __memcpy_chk(0x1ffefff5b0, 0x1ffefff5b0, 5)
==28563==    at 0x4843BF0: __memcpy_chk (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==28563==    by 0x44EC0F83: cpuinfo_linux_parse_cpulist (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBFDF9: cpuinfo_linux_detect_present_processors (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x44EBE02F: cpuinfo_x86_linux_init (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x51FDF47E: __pthread_once_slow (pthread_once.c:116)
==28563==    by 0x44EBA3B6: cpuinfo_initialize (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B157: at::native::compute_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x41D1B30C: at::native::get_cpu_capability() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x42706CB8: THFloatVector_startup::THFloatVector_startup() (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4197ED85: _GLOBAL__sub_I_THVector.cpp (in /home/guillaume/libtorch/lib/libtorch_cpu.so)
==28563==    by 0x4011B89: call_init.part.0 (dl-init.c:72)
==28563==    by 0x4011C90: call_init (dl-init.c:30)
==28563==    by 0x4011C90: _dl_init (dl-init.c:119)
==28563==
==28563== Thread 2 reqwest-internal:
==28563== Syscall param statx(file_name) points to unaddressable byte(s)
==28563==    at 0x522579FE: statx (statx.c:29)
==28563==    by 0xAB4B00: statx (weak.rs:134)
==28563==    by 0xAB4B00: std::sys::unix::fs::try_statx (fs.rs:123)
==28563==    by 0xAB30A7: std::sys::unix::fs::stat (fs.rs:1105)
==28563==    by 0x510D3D: std::fs::metadata (fs.rs:1567)
==28563==    by 0x5126E1: openssl_probe::find_certs_dirs::{{closure}} (lib.rs:31)
==28563==    by 0x5125ED: core::ops::function::impls:: for &mut F>::call_mut (function.rs:269)
==28563==    by 0x510E3E: core::iter::traits::iterator::Iterator::find::check::{{closure}} (iterator.rs:2227)
==28563==    by 0x514733: core::iter::adapters::map::map_try_fold::{{closure}} (map.rs:87)
==28563==    by 0x5116F9: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1888)
==28563==    by 0x514454: as core::iter::traits::iterator::Iterator>::try_fold (map.rs:113)
==28563==    by 0x514642: core::iter::traits::iterator::Iterator::find (iterator.rs:2231)
==28563==    by 0x510C5C: as core::iter::traits::iterator::Iterator>::next (filter.rs:55)
==28563==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==28563==
==28563== Syscall param statx(buf) points to unaddressable byte(s)
==28563==    at 0x522579FE: statx (statx.c:29)
==28563==    by 0xAB4B00: statx (weak.rs:134)
==28563==    by 0xAB4B00: std::sys::unix::fs::try_statx (fs.rs:123)
==28563==    by 0xAB30A7: std::sys::unix::fs::stat (fs.rs:1105)
==28563==    by 0x510D3D: std::fs::metadata (fs.rs:1567)
==28563==    by 0x5126E1: openssl_probe::find_certs_dirs::{{closure}} (lib.rs:31)
==28563==    by 0x5125ED: core::ops::function::impls:: for &mut F>::call_mut (function.rs:269)
==28563==    by 0x510E3E: core::iter::traits::iterator::Iterator::find::check::{{closure}} (iterator.rs:2227)
==28563==    by 0x514733: core::iter::adapters::map::map_try_fold::{{closure}} (map.rs:87)
==28563==    by 0x5116F9: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1888)
==28563==    by 0x514454: as core::iter::traits::iterator::Iterator>::try_fold (map.rs:113)
==28563==    by 0x514642: core::iter::traits::iterator::Iterator::find (iterator.rs:2231)
==28563==    by 0x510C5C: as core::iter::traits::iterator::Iterator>::next (filter.rs:55)
==28563==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==28563==
==28563== HEAP SUMMARY:
==28563==     in use at exit: 1,897,131 bytes in 30,570 blocks
==28563==   total heap usage: 379,202 allocs, 348,632 frees, 54,173,510 bytes allocated
==28563==
==28563== LEAK SUMMARY:
==28563==    definitely lost: 114 bytes in 1 blocks
==28563==    indirectly lost: 0 bytes in 0 blocks
==28563==      possibly lost: 2,120 bytes in 8 blocks
==28563==    still reachable: 1,894,897 bytes in 30,561 blocks
==28563==         suppressed: 0 bytes in 0 blocks
==28563== Rerun with --leak-check=full to see details of leaked memory
==28563==
==28563== For lists of detected and suppressed errors, rerun with: -s
==28563== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)

I am not quite sure what is going on here; this goes a bit beyond my comfort zone. Note that I also ran the following script for ~200 iterations and did not notice a significant increase in memory consumption (stable at ~2544MB): https://gist.github.com/guillaume-be/34a982ca33749ba4be2951836ab36b97. I also ran an end-to-end translation example, including reloading the entire model and tokenizer at each iteration (see https://gist.github.com/guillaume-be/06bbc56639522d8745f2d357b310bc17). I ran the script for 20 minutes (500 full model reloads and translations), and the memory consumption remained stable at 2560MB. I could run it longer, but I am unlikely to reach a 25GB memory use in a realistic amount of time.
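One side note on the log above: the valgrind `Command:` line shows `--leak-check=full` being passed as an argument to the example binary rather than to valgrind itself, which is likely why the summary still says "Rerun with --leak-check=full". A sketch of the invocation with the flag in the right place (binary path assumed from the log):

```shell
# Valgrind options must come before the program under test;
# anything after the program name is passed to the program itself.
valgrind --leak-check=full --show-leak-kinds=definite,possible \
    target/debug/examples/resource_download
```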
@Qeenon I ran a few more experiments on my end. For the model loading, the reported leaks seem to appear when registering variables or modules to the variable store. As a validation, I did a quick experiment with a very basic module creation using the base torch bindings:

extern crate anyhow;
use tch::{nn, Device};

fn main() -> anyhow::Result<()> {
let device = Device::cuda_if_available();
let vs = nn::VarStore::new(device);
let _module = nn::linear(&vs.root() / "dense", 1024, 1024, Default::default());
Ok(())
}

Running this for more than a million iterations does not lead to any actual memory leak when monitoring the resources consumed by the process, so I believe this is a spurious error.

Summarizing my investigations so far: there does not seem to be an obvious memory leak in the models from the library. Loading the model once and running predictions on demand is indeed the right way of using them - is this working for you?
Thank you for your investigations here. Right now I really can't be sure about it, and next time I will run tests on my side first.
I suggest writing a memory-leak test for the models, disabled by default. Use ::new to load the models in scope several times and do things with them, then add a memory check after all loops are done.
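A minimal sketch of such an opt-in test, with a hypothetical `DummyModel` standing in for a real model, a Linux-only RSS probe, and an arbitrary 50 MB threshold; in a real suite this body would sit in a `#[test]` function marked `#[ignore]` so it only runs via `cargo test -- --ignored`:

```rust
// Sketch of the suggested memory-leak check. `DummyModel` is a placeholder
// for a real pipeline (e.g. a rust_bert model); the RSS probe reads
// /proc/self/status and therefore only works on Linux.

struct DummyModel {
    weights: Vec<f32>,
}

impl DummyModel {
    fn new() -> Self {
        // Stands in for `Model::new(...)` loading real weights (~4 MB here).
        DummyModel { weights: vec![0.0; 1 << 20] }
    }

    fn predict(&self, x: f32) -> f32 {
        self.weights.iter().map(|w| w + x).sum()
    }
}

/// Resident set size in kB, parsed from the VmRSS line of /proc/self/status.
fn rss_kb() -> Option<u64> {
    let status = std::fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|l| l.starts_with("VmRSS:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|v| v.parse().ok())
}

fn main() {
    let before = rss_kb().expect("needs Linux /proc");
    // Load the model in scope several times and do things with it.
    for _ in 0..100 {
        let model = DummyModel::new();
        let _ = model.predict(1.0);
    } // model dropped at the end of every iteration
    let after = rss_kb().expect("needs Linux /proc");
    // Memory check after all loops are done; allow some allocator slack.
    let grown = after.saturating_sub(before);
    assert!(grown < 50 * 1024, "RSS grew by {} kB", grown);
}
```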
For my service/bot I store all the models in a lazy_static mutex for now as a workaround. (For my case it's also a nice speed-up, but in general I think leaks are bad.)