fix file descriptor leak #34
When cleaning up old files, the corresponding Store should also be stopped, releasing the file descriptor. This step was missing, leading to a file descriptor leak that would manifest after many compactions. Now the CubDB process keeps a list of old Btrees and stops them (closing the corresponding Store) upon cleanup, once it has ensured that those Btrees and their files are no longer in use.
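To illustrate the idea in the description, here is a minimal sketch of "keep a list of old stores and stop them on cleanup". It is not the code in this PR: `OldStoreTracker`, `open_store/1`, `stop_store/1` and `clean_up/1` are hypothetical names, and each "store" is simply an `Agent` holding an open file handle.

```elixir
defmodule OldStoreTracker do
  # Hypothetical illustration only, not CubDB's implementation.
  # A "store" here is an Agent whose state is an open file handle.

  # Opens the file inside a linked Agent process, which then owns the handle.
  def open_store(path) do
    Agent.start_link(fn ->
      {:ok, file} = File.open(path, [:read, :write, :binary])
      file
    end)
  end

  # Closes the file explicitly, then stops the process that owned it.
  def stop_store(store) do
    Agent.get(store, fn file -> File.close(file) end)
    Agent.stop(store)
  end

  # `old_stores` is the list kept by the owning process. Cleanup stops
  # every store and returns the new (empty) list of old stores.
  def clean_up(old_stores) do
    Enum.each(old_stores, &stop_store/1)
    []
  end
end
```

In this sketch the owner would call `clean_up/1` only after it knows that no reader still uses the old stores, which mirrors the ordering described above.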
I'm on mobile for now, but I'll try this branch as soon as I get home. I saw you added a test that checks your new code, but it's probably worth adding a failing test based on the old code. A workflow I really like:
Side note: I'd also recommend adding at least one stress test to the suite, with lots of simultaneous readers and writers, manual and automatic compactions, etc.
I definitely agree that a test that reproduces the issue is necessary, but the few unit tests that I wrote in this PR should act as such, preventing regressions:
What I like about this setup is that details of how a
That said, after your comment I did add an assertion that uses directly
Regarding the load tests I absolutely agree. Some of those already exist in the form of property-based tests (you can run them with
Again, thanks a lot for your help on this issue!
Really happy to help, and really glad to have this database to work with. Thanks for sharing your work. Unfortunately, it looks like there is still a problem on this branch:
iex(1)> CubDB.start_link(name: :db, data_dir: "/tmp/testtest")
{:ok, #PID<0.326.0>}
iex(2)> CubDB.compact(:db)
:ok
iex(3)> :os.getpid
'5001'
iex(4)> CubDB.compact(:db)
:ok
output from
...skipping...
lrwx------ 1 user user 64 Apr 18 15:24 19 -> /tmp/testtest/2.cub
lrwx------ 1 user user 64 Apr 18 15:23 2 -> /dev/pts/0
lrwx------ 1 user user 64 Apr 18 15:23 20 -> '/tmp/testtest/1.cub (deleted)'
lrwx------ 1 user user 64 Apr 18 15:24 22 -> /tmp/testtest/2.cub
...skipping...
Not only is there still a file descriptor for a deleted db version, but there are 2 file descriptors open for the same version. I have a really hard time following where all the file work is being done, but I think the issue is the file not being closed before the
Without making a test that is OS specific to linux, the best I was able to do was to test that the number of linked processes stayed constant through many compactions:
test "compaction doesn't spawn tons of linked processes (i.e. lost file descriptors)", %{tmp_dir: tmp_dir} do
  {:ok, db} = CubDB.start_link(tmp_dir, auto_compact: false)
  CubDB.subscribe(db)

  # do it once to identify the steady-state baseline
  CubDB.put(db, 0, 0)
  CubDB.compact(db)
  assert_receive :compaction_completed, 5000
  initial_links = Process.info(db)[:links] |> length()

  # now run many times to confirm we return to the baseline
  for i <- 1..100 do
    CubDB.put(db, i, i)
    CubDB.compact(db)
    assert_receive :compaction_completed, 5000
  end

  final_links = Process.info(db)[:links] |> length()
  assert final_links == initial_links
end
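For reference, the manual check of the process's open file descriptors shown earlier in this comment could be scripted roughly as follows. This is a Linux-only sketch that assumes `/proc` is mounted. `FdInspect`, `open_targets/0` and `count_open/1` are hypothetical names, not part of CubDB or of its test suite.

```elixir
defmodule FdInspect do
  # Linux-only sketch (assumes /proc is available). Hypothetical helper.

  # Returns {:ok, targets}, where targets are the symlink targets of the
  # entries under /proc/<os pid>/fd, or {:error, reason} if unavailable.
  def open_targets do
    dir = "/proc/#{System.pid()}/fd"

    case File.ls(dir) do
      {:ok, entries} ->
        targets =
          for entry <- entries,
              # Entries that vanish between ls and readlink are skipped.
              {:ok, target} <- [File.read_link(Path.join(dir, entry))],
              do: target

        {:ok, targets}

      {:error, reason} ->
        {:error, reason}
    end
  end

  # Counts open descriptors whose target contains `path_fragment`,
  # e.g. "1.cub" would also match "/tmp/testtest/1.cub (deleted)".
  def count_open(path_fragment) do
    with {:ok, targets} <- open_targets() do
      {:ok, Enum.count(targets, &String.contains?(&1, path_fragment))}
    end
  end
end
```

On systems without `/proc` the helper returns an error tuple, which is one reason a link-count test like the one above is more portable.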
Thanks for the test, I will have a look at it tomorrow with a fresh mind to get to the bottom of it!
When renaming the file after a successful compaction, the old Store must be closed to avoid leaking the process, and consequently also a file descriptor.
Actually, brilliant work and intuition about the renaming. I added a generic test for leaked links during compaction, based on yours, and fixed the remaining leak. There were indeed two places leaking a
I also realize that your PR indeed addressed both places, I should have given it more attention. Additionally, I am also explicitly closing the file when the
Could you confirm that the issue you experienced is now fixed?
Looks like this commit fixes it. Great! I only found the second leak because I wrote the test I shared first. I actually had a lot of head scratching trying to figure out how to write the test without making tacit assumptions about the OS, since it's an OS problem.
I do really like CubDB, the API is perfect for a bunch of use cases: like SQLite, but without the SQL headaches. It's a shame that some of the file handling has leaked out of the
I merged the change and will now make a new release. Thanks again for the major help on this!
Regarding the
When cleaning up old files, the corresponding Store should also be stopped, releasing the file descriptor. This step was missing, leading to a file descriptor leak that would manifest after many compactions.
Now the CubDB process keeps a list of old Btrees, and stops them (closing the relative Store) upon cleanup, after ensuring that those Btrees and their files are not in use anymore by any reader.
Resolves #30