Memory leak in Fastq read (probably .quali) #70

Closed
pfrstg opened this issue Aug 22, 2023 · 2 comments

Comments


pfrstg commented Aug 22, 2023

I ran into OOMs when processing a set of large fastq files.

I reproduced it in a small example: there appears to be a memory leak that is noticeable when you call .quali.

The attached program repeatedly opens test.fq.gz and prints the memory used after each iteration of opening the file and processing all the reads. Memory grows slightly (with plateaus) when calling .seq and .qual, but grows much more, and consistently, when calling .quali.

This is the output:

Running calling seq
After run 0, using 12,210,176 bytes of memory
After run 1, using 12,845,056 bytes of memory
After run 2, using 13,230,080 bytes of memory
After run 3, using 13,340,672 bytes of memory
After run 4, using 13,471,744 bytes of memory
After run 5, using 13,520,896 bytes of memory
After run 6, using 13,524,992 bytes of memory
After run 7, using 13,594,624 bytes of memory
After run 8, using 13,594,624 bytes of memory
After run 9, using 13,594,624 bytes of memory
After run 10, using 13,627,392 bytes of memory
After run 11, using 13,705,216 bytes of memory
After run 12, using 13,787,136 bytes of memory
After run 13, using 13,791,232 bytes of memory
After run 14, using 13,824,000 bytes of memory
After run 15, using 13,856,768 bytes of memory
After run 16, using 13,877,248 bytes of memory
After run 17, using 14,000,128 bytes of memory
After run 18, using 14,053,376 bytes of memory
After run 19, using 14,053,376 bytes of memory
After run 20, using 14,053,376 bytes of memory
After run 21, using 14,053,376 bytes of memory
After run 22, using 14,086,144 bytes of memory
After run 23, using 14,127,104 bytes of memory
After run 24, using 14,127,104 bytes of memory
After run 25, using 14,127,104 bytes of memory
After run 26, using 14,127,104 bytes of memory
After run 27, using 14,127,104 bytes of memory
After run 28, using 14,127,104 bytes of memory
After run 29, using 14,127,104 bytes of memory
After run 30, using 14,127,104 bytes of memory
After run 31, using 14,131,200 bytes of memory
After run 32, using 14,131,200 bytes of memory
After run 33, using 14,163,968 bytes of memory
After run 34, using 14,163,968 bytes of memory
After run 35, using 14,163,968 bytes of memory
After run 36, using 14,163,968 bytes of memory
After run 37, using 14,163,968 bytes of memory
After run 38, using 14,163,968 bytes of memory
After run 39, using 14,163,968 bytes of memory
After run 40, using 14,196,736 bytes of memory
After run 41, using 14,196,736 bytes of memory
After run 42, using 14,196,736 bytes of memory
After run 43, using 14,196,736 bytes of memory
After run 44, using 14,196,736 bytes of memory
After run 45, using 14,196,736 bytes of memory
After run 46, using 14,196,736 bytes of memory
After run 47, using 14,229,504 bytes of memory
After run 48, using 14,229,504 bytes of memory
After run 49, using 14,229,504 bytes of memory

Running calling qual
After run 0, using 14,299,136 bytes of memory
After run 1, using 14,299,136 bytes of memory
After run 2, using 14,299,136 bytes of memory
After run 3, using 14,299,136 bytes of memory
After run 4, using 14,299,136 bytes of memory
After run 5, using 14,299,136 bytes of memory
After run 6, using 14,299,136 bytes of memory
After run 7, using 14,299,136 bytes of memory
After run 8, using 14,299,136 bytes of memory
After run 9, using 14,299,136 bytes of memory
After run 10, using 14,299,136 bytes of memory
After run 11, using 14,299,136 bytes of memory
After run 12, using 14,299,136 bytes of memory
After run 13, using 14,303,232 bytes of memory
After run 14, using 14,303,232 bytes of memory
After run 15, using 14,303,232 bytes of memory
After run 16, using 14,303,232 bytes of memory
After run 17, using 14,303,232 bytes of memory
After run 18, using 14,303,232 bytes of memory
After run 19, using 14,303,232 bytes of memory
After run 20, using 14,303,232 bytes of memory
After run 21, using 14,303,232 bytes of memory
After run 22, using 14,303,232 bytes of memory
After run 23, using 14,303,232 bytes of memory
After run 24, using 14,303,232 bytes of memory
After run 25, using 14,360,576 bytes of memory
After run 26, using 14,372,864 bytes of memory
After run 27, using 14,372,864 bytes of memory
After run 28, using 14,372,864 bytes of memory
After run 29, using 14,405,632 bytes of memory
After run 30, using 14,409,728 bytes of memory
After run 31, using 14,409,728 bytes of memory
After run 32, using 14,462,976 bytes of memory
After run 33, using 14,462,976 bytes of memory
After run 34, using 14,462,976 bytes of memory
After run 35, using 14,462,976 bytes of memory
After run 36, using 14,462,976 bytes of memory
After run 37, using 14,462,976 bytes of memory
After run 38, using 14,462,976 bytes of memory
After run 39, using 14,462,976 bytes of memory
After run 40, using 14,462,976 bytes of memory
After run 41, using 14,462,976 bytes of memory
After run 42, using 14,462,976 bytes of memory
After run 43, using 14,462,976 bytes of memory
After run 44, using 14,462,976 bytes of memory
After run 45, using 14,462,976 bytes of memory
After run 46, using 14,462,976 bytes of memory
After run 47, using 14,462,976 bytes of memory
After run 48, using 14,462,976 bytes of memory
After run 49, using 14,462,976 bytes of memory

Running calling quali
After run 0, using 14,630,912 bytes of memory
After run 1, using 14,798,848 bytes of memory
After run 2, using 15,265,792 bytes of memory
After run 3, using 15,433,728 bytes of memory
After run 4, using 15,601,664 bytes of memory
After run 5, using 15,769,600 bytes of memory
After run 6, using 15,937,536 bytes of memory
After run 7, using 16,105,472 bytes of memory
After run 8, using 16,273,408 bytes of memory
After run 9, using 16,740,352 bytes of memory
After run 10, using 16,908,288 bytes of memory
After run 11, using 17,076,224 bytes of memory
After run 12, using 17,281,024 bytes of memory
After run 13, using 17,448,960 bytes of memory
After run 14, using 17,498,112 bytes of memory
After run 15, using 17,797,120 bytes of memory
After run 16, using 17,952,768 bytes of memory
After run 17, using 18,120,704 bytes of memory
After run 18, using 18,292,736 bytes of memory
After run 19, using 18,460,672 bytes of memory
After run 20, using 18,546,688 bytes of memory
After run 21, using 18,845,696 bytes of memory
After run 22, using 18,964,480 bytes of memory
After run 23, using 19,132,416 bytes of memory
After run 24, using 19,300,352 bytes of memory
After run 25, using 19,468,288 bytes of memory
After run 26, using 19,595,264 bytes of memory
After run 27, using 19,894,272 bytes of memory
After run 28, using 19,972,096 bytes of memory
After run 29, using 20,140,032 bytes of memory
After run 30, using 20,307,968 bytes of memory
After run 31, using 20,480,000 bytes of memory
After run 32, using 20,647,936 bytes of memory
After run 33, using 20,688,896 bytes of memory
After run 34, using 21,024,768 bytes of memory
After run 35, using 21,192,704 bytes of memory
After run 36, using 21,360,640 bytes of memory
After run 37, using 21,528,576 bytes of memory
After run 38, using 21,729,280 bytes of memory
After run 39, using 21,770,240 bytes of memory
After run 40, using 22,069,248 bytes of memory
After run 41, using 22,233,088 bytes of memory
After run 42, using 22,433,792 bytes of memory
After run 43, using 22,622,208 bytes of memory
After run 44, using 22,790,144 bytes of memory
After run 45, using 22,872,064 bytes of memory
After run 46, using 23,171,072 bytes of memory
After run 47, using 23,298,048 bytes of memory
After run 48, using 23,478,272 bytes of memory
After run 49, using 23,646,208 bytes of memory
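
To put a number on the difference, here is a rough sketch that computes the average growth per run from the first and last readings of each section above. The helper name `growth_per_run` is made up for illustration; the figures are copied from the output, nothing is re-measured.

```python
# First and last RSS readings (bytes) copied from the output above (runs 0 and 49).
readings = {
    "seq": (12_210_176, 14_229_504),
    "qual": (14_299_136, 14_462_976),
    "quali": (14_630_912, 23_646_208),
}


def growth_per_run(first, last, runs=50):
    """Average RSS growth per iteration between the first and last reading."""
    return (last - first) / (runs - 1)


rates = {name: growth_per_run(*pair) for name, pair in readings.items()}
for name, rate in rates.items():
    print(f"{name:>5}: {rate:,.0f} bytes per run")
```

That works out to roughly 184 KB per run for .quali, against about 41 KB for .seq (most of it in the first few runs) and about 3 KB for .qual.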

The code is below (sorry, I can't attach a .py file directly):

import gc
import psutil
import pyfastx

print("\nRunning calling seq")
for run_idx in range(50):
    f = pyfastx.Fastq("tests/data/test.fq.gz")
    mysum = 0
    for read in f:
        mysum += hash(read.seq)
    del(f)
    gc.collect()
    mem_used = psutil.Process().memory_info().rss
    print(f"After run {run_idx}, using {mem_used:,} bytes of memory")

print("\nRunning calling qual")
for run_idx in range(50):
    f = pyfastx.Fastq("tests/data/test.fq.gz")
    mysum = 0
    for read in f:
        mysum += hash(read.qual)
    del(f)
    gc.collect()
    mem_used = psutil.Process().memory_info().rss
    print(f"After run {run_idx}, using {mem_used:,} bytes of memory")

print("\nRunning calling quali")
for run_idx in range(50):
    f = pyfastx.Fastq("tests/data/test.fq.gz")
    mysum = 0
    for read in f:
        mysum += sum(read.quali)
    del(f)
    gc.collect()
    mem_used = psutil.Process().memory_info().rss
    print(f"After run {run_idx}, using {mem_used:,} bytes of memory")
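
The three loops differ only in the attribute they touch, so the same measurement could be factored into a small harness. This is only a sketch: `measure_growth`, `looks_like_leak` and `rss_bytes` are names I made up, and the "still rising on every recent run" check is a crude heuristic, not a real leak detector.

```python
import gc


def rss_bytes():
    """Resident set size of this process (psutil is imported only when called)."""
    import psutil
    return psutil.Process().memory_info().rss


def measure_growth(work, runs=50, probe=rss_bytes):
    """Call work() `runs` times and return the memory reading taken after each call."""
    readings = []
    for _ in range(runs):
        work()
        gc.collect()  # drop collectable garbage so it is not mistaken for a leak
        readings.append(probe())
    return readings


def looks_like_leak(readings, tail=10):
    """Heuristic: memory rose on every one of the last `tail` runs (no plateau)."""
    recent = readings[-(tail + 1):]
    return len(recent) > tail and all(b > a for a, b in zip(recent, recent[1:]))
```

Passing a function that opens tests/data/test.fq.gz with pyfastx.Fastq and sums read.quali over all reads as `work` should reproduce the third loop. On the numbers above, this heuristic would flag the .quali run and not the .seq or .qual runs, since both of those plateau.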

Relevant system details

$ python --version
Python 3.10.11
$ sw_vers
ProductName:		macOS
ProductVersion:		13.4.1
ProductVersionExtra:	(c)
BuildVersion:		22F770820d
Owner

lmdu commented Aug 25, 2023

Thanks for reporting this issue. I have fixed it.

Owner

lmdu commented Sep 7, 2023

Fixed it in version 2.0.0.
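
For anyone landing here later, a quick way to check whether the installed pyfastx is at or past the release named above. This is a sketch using the standard library's importlib.metadata; `parse_release` and `has_quali_fix` are hypothetical helpers, and the parsing is deliberately simplistic (pre-release suffixes are ignored).

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (2, 0, 0)  # release named in the maintainer's comment


def parse_release(text):
    """Turn '2.0.1' into (2, 0, 1); stops at the first non-numeric part, pads to 3."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_quali_fix(installed):
    """True if the given version string is at least FIXED_IN."""
    return parse_release(installed) >= FIXED_IN


try:
    print("fix included:", has_quali_fix(version("pyfastx")))
except PackageNotFoundError:
    print("pyfastx is not installed")
```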

@lmdu lmdu closed this as completed Sep 7, 2023