
Huge memory usage #72

Open
Krakoer opened this issue Jun 13, 2024 · 3 comments

Comments

@Krakoer

Krakoer commented Jun 13, 2024

Hi,

While using the lib, I witnessed huge memory usage (a peak of ~230 MB to extract strings from a 22 MB sample) from the Python lib but not from the binary. I suspect there is a lot of overhead when allocating the strings, but the memory usage drops once the strings are returned from the lib.

To monitor the memory usage, I used memory-profiler and a Python script that loads the data into memory, waits for a second, extracts the strings using rust-strings, waits for another second, and exits.

Do you have an idea of what could cause such memory usage?
I'll keep investigating on my side.

@iddohau
Owner

iddohau commented Jun 13, 2024

Hi,

Thanks for reporting this issue; this does indeed look like a problem.
I suspect that the conversion from Rust to Python has some overhead, but not this much.
I'll try to take a look at it next week.

@iddohau
Owner

iddohau commented Jun 19, 2024

I've created a large file using this script:

with open("large_file.bin", "wb") as f:
    for _ in range(1024 * 1024):
        f.write(b"X" * 20)
        f.write(b"\xff\xff\xff\xff")

This creates a file of around 24 MB that contains a lot of short strings.
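As a quick sanity check on the size, each loop iteration writes 24 bytes (20 printable `X` bytes plus a 4-byte `\xff` separator), so the script above produces:

```python
iterations = 1024 * 1024
bytes_per_iteration = 20 + 4  # 20 "X" bytes + 4 separator bytes

total = iterations * bytes_per_iteration
print(total, "bytes =", total / 2**20, "MiB")  # 25165824 bytes = 24.0 MiB
```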
I've reproduced the problem using this script:

import time

import rust_strings
from memory_profiler import profile


@profile()
def main():
    time.sleep(1)
    x = rust_strings.strings("large_file.bin")
    time.sleep(1)


if __name__ == "__main__":
    main()

The huge memory consumption reproduces:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7     22.1 MiB     22.1 MiB           1   @profile()
     8                                         def main():
     9     22.1 MiB      0.0 MiB           1       time.sleep(1)
    10    206.8 MiB    184.7 MiB           1       x = rust_strings.strings("large_file.bin")
    11    206.8 MiB      0.0 MiB           1       time.sleep(1)

I've tried to debug it, but I don't think there is a bug.
The list contains millions of items, which consumes much more memory than one big string would.
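The per-object overhead can be ballparked with `sys.getsizeof` (a rough sketch; the exact byte counts are CPython-specific, and the real per-item cost also depends on what each list element holds):

```python
import sys

# One big 20 MiB string: the payload dominates, the object header
# is only a few dozen bytes on top of it.
big = "X" * (20 * 1024 * 1024)
print("big string overhead:", sys.getsizeof(big) - len(big), "bytes")

# A million 20-char strings holding the same 20 MiB of text: each one
# carries its own object header, plus an 8-byte pointer slot in the
# list that holds it, so the overhead is multiplied a million times.
small = "X" * 20
per_item = sys.getsizeof(small) + 8  # string object + list slot
print("per item:", per_item, "bytes;",
      per_item * 1024 * 1024 // 2**20, "MiB total")
```

On CPython the per-item figure comes out several times larger than the 20 bytes of payload, which is consistent with the ~185 MiB increment in the profile above.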

@Krakoer
Author

Krakoer commented Jun 19, 2024

Indeed, the issue doesn't show up when providing a file path to strings, but it does when using the bytes input option:

import time

import rust_strings
from memory_profiler import profile


@profile()
def main():
    with open("large_file.bin", 'rb') as f:
        data = f.read()
    time.sleep(1)
    x = rust_strings.strings(bytes=data)
    time.sleep(1)


if __name__ == "__main__":
    main()

Gives this profile:
[memory profile plot]
(Black is my modified code)
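Not from the thread, but given the finding above (the path-based call doesn't show the spike while `bytes=` does), one possible workaround is to spill the in-memory bytes to a temporary file and hand the extractor a path instead. The `extract` callable below is a stand-in for a path-based extractor such as `rust_strings.strings`; whether this actually helps is an assumption to be verified with the profiler:

```python
import os
import tempfile


def strings_via_tempfile(data, extract):
    """Write `data` to a temp file and call `extract(path)` on it.

    `extract` is a placeholder for a path-based extractor, e.g.
    lambda p: rust_strings.strings(p). Passing a path sidesteps the
    bytes= argument, which showed the memory spike in the profile.
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return extract(path)
    finally:
        os.unlink(path)  # always clean up the temp file
```

The trade-off is an extra round-trip through the filesystem, which may or may not be acceptable depending on where the bytes came from in the first place.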
