
Conversation

barraguda
Contributor

Problem

#525

Solution

If open_files.len() exceeds the threshold (currently 180, based on my Mac testing), run a cleanup that prunes 50% of the open files.

Interval cleanup of idle files still happens, but now inside a tokio::select! loop instead.
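For orientation, a rough sketch of that loop shape, assuming a tokio mpsc channel of vfs requests and a DashMap of (file, last-access) pairs; VfsRequest, the helper functions, and the 60-second interval are placeholders, not the actual vfs code:

```rust
use std::path::PathBuf;
use std::sync::Arc;
use std::time::{Duration, Instant};

use dashmap::DashMap;
use tokio::sync::mpsc;

const MAX_OPEN_FILES: usize = 180; // threshold from the description above

struct VfsRequest; // placeholder for the real request type

async fn handle_request(_req: VfsRequest, _open: &DashMap<PathBuf, (tokio::fs::File, Instant)>) {}
async fn prune_open_files(_open: &DashMap<PathBuf, (tokio::fs::File, Instant)>) {}
async fn close_idle_files(_open: &DashMap<PathBuf, (tokio::fs::File, Instant)>) {}

async fn vfs_loop(
    mut recv: mpsc::Receiver<VfsRequest>,
    open_files: Arc<DashMap<PathBuf, (tokio::fs::File, Instant)>>,
) {
    let mut cleanup = tokio::time::interval(Duration::from_secs(60));
    loop {
        tokio::select! {
            Some(request) = recv.recv() => {
                // prune ~50% of the open files once we cross the threshold
                if open_files.len() > MAX_OPEN_FILES {
                    prune_open_files(&open_files).await;
                }
                handle_request(request, &open_files).await;
            }
            // interval cleanup of idle files still runs alongside request handling
            _ = cleanup.tick() => close_idle_files(&open_files).await,
        }
    }
}
```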

Notes

Linux seems to support a larger number of open file handles per OS process (~1000, as opposed to macOS's 256).

barraguda changed the base branch from main to develop on September 10, 2024 12:48
barraguda requested review from dr-frmr and nick1udwig and removed the review request for dr-frmr on September 10, 2024 12:50
Comment on lines 624 to 638
let current_count = open_files.len();
let target_count = max(current_count / 2, MAX_OPEN_FILES / 2);

// collect all entries, sort by last access time
let mut entries: Vec<_> = open_files
.iter()
.map(|entry| (entry.key().clone(), entry.value().1))
.collect();
entries.sort_by(|a, b| a.1.cmp(&b.1));

// remove oldest entries until we reach the target count
for (path, _) in entries.into_iter().take(current_count - target_count) {
    // use remove_if so the check and removal happen under the entry's lock,
    // avoiding races with concurrent access
open_files.remove_if(&path, |_, _| true);
}
Contributor


Is there a way to make this more efficient by using retain on the dashmap, with a cutoff calculated from the map's average last access time? That could be better than cloning every path. Something like (see the sketch after this list):

  1. calculate the average last access time by traversing the map once
  2. retain only entries with an above-average (i.e. more recent) last access time
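
A minimal sketch of that suggestion, assuming the map value type is (tokio::fs::File, Instant); that type and the function name are assumptions for illustration, not the actual vfs internals:

```rust
use std::path::PathBuf;
use std::time::{Duration, Instant};

use dashmap::DashMap;

fn prune_below_average_idle(open_files: &DashMap<PathBuf, (tokio::fs::File, Instant)>) {
    let count = open_files.len();
    if count == 0 {
        return;
    }
    let now = Instant::now();
    // 1. one pass over the map to compute the average idle time
    let total_idle: Duration = open_files
        .iter()
        .map(|entry| now.duration_since(entry.value().1))
        .sum();
    let avg_idle = total_idle / count as u32;
    // 2. keep only entries accessed more recently than average; no paths are cloned
    open_files.retain(|_path, value| now.duration_since(value.1) < avg_idle);
}
```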

Contributor Author


this is a good point, I'll try something out!

@barraguda
Contributor Author

Some notes on the other suggestions from the discussion:

  1. Matching on a specific "too many open files" error in the open_file() function seems slightly too risky to me: there's no error kind like that in tokio::fs, and matching on raw OS error codes that may differ across environments seems brittle.
  2. Closing 1 random file each time we reach the limit, rather than 50% of them, has about the same performance for 5000 sequential file opens (~700–900ms on my machine, debug binary). A sketch of that variant is below.
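
For reference, a sketch of the option in point 2; the map type, threshold, and victim selection are illustrative (DashMap iteration order is effectively arbitrary, so the first entry serves as a cheap "random" victim):

```rust
use std::path::PathBuf;
use std::time::Instant;

use dashmap::DashMap;

const MAX_OPEN_FILES: usize = 180; // assumed threshold, per the description above

// Close a single arbitrary file once the limit is hit, instead of pruning 50%.
fn evict_one(open_files: &DashMap<PathBuf, (tokio::fs::File, Instant)>) {
    if open_files.len() < MAX_OPEN_FILES {
        return;
    }
    // take the key first so the iterator's shard guard is dropped before remove,
    // otherwise remove could contend with the read lock on the same shard
    let victim = open_files.iter().next().map(|entry| entry.key().clone());
    if let Some(path) = victim {
        open_files.remove(&path);
    }
}
```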

barraguda closed this on Sep 11, 2024
barraguda reopened this on Sep 11, 2024
@nick1udwig
Member

nick1udwig commented Sep 11, 2024

  1. Surprised about this. I agree matching on raw OS error codes seems less good
  2. Gotta test performance on release; debug is meaningless

@dr-frmr
Contributor

dr-frmr commented Sep 11, 2024

Possible/reasonable to match on any error coming out of open_file()?

@barraguda
Contributor Author

Possible/reasonable to match on any error coming out of open_file()?

I like this idea, but noting that keeping the open_files very close to the limit and relying on specific errors like this might be brittle.

Another big note: we need to save the cursor positions of files we close, so that processes expecting them to be at a certain location won't notice a difference even if the vfs closed and reopened them. This is ok, but it means we can't use synchronous functions like dashmap.retain(), and it makes closing a lot of files considerably slower.
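
A minimal sketch of what closing one file could look like under that constraint; the saved_cursors map, the function name, and the value types are assumptions for illustration, not the actual vfs structures:

```rust
use std::io::SeekFrom;
use std::path::PathBuf;

use dashmap::DashMap;
use tokio::io::AsyncSeekExt;

// Close one file while remembering where its cursor was, so the vfs can seek
// back to that offset when it reopens the file for the same process.
async fn close_and_remember_cursor(
    path: PathBuf,
    mut file: tokio::fs::File,
    saved_cursors: &DashMap<PathBuf, u64>,
) -> std::io::Result<()> {
    // seeking 0 bytes from the current position returns the current offset
    let cursor = file.seek(SeekFrom::Current(0)).await?;
    saved_cursors.insert(path, cursor);
    // the File is dropped (closed) when this function returns
    Ok(())
}
```

Because the seek has to be awaited per file, a synchronous dashmap.retain() pass can't do this, which is where the slowdown mentioned above comes from.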

Contributor

dr-frmr left a comment


Nice refactor, looks clean

dr-frmr merged commit c673e81 into develop on Sep 12, 2024
1 check passed
dr-frmr deleted the bp/vfsmaxfiles branch on September 12, 2024 22:04