Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve concurrent sequential access speed by per-file file pointer. #182

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tmhannes
Copy link

This PR addresses #181 by adding a separate "fptr" for each open file handle, so that sequential reads never have to restart from the beginning of the file.

In light testing on an i7 CPU with an external SSD, the PR allows concurrent readers at different positions in a large file to achieve the same total read throughput and CPU usage as a single reader sequentially processing the whole file.

Any feedback would be gratefully received.

This commit extends the caching of cluster information introduced by
commit 6c332e1 so that multiple open file handles accessing the same
node each have their own cache. That prevents dramatic slowdowns when
multiple processes are reading from a large file.

* Replace the fptr_index and fptr_cluster members of struct exfat_node
  introduced by 6c332e1 with a list of exfat_fptr structs, each of
  which holds one index and cluster.

* In fuse_exfat_open, add a new exfat_fptr to the exfat_node, and
  store a pointer to both in the fh member of the fuse_file_info.

* In fuse_exfat_read and fuse_exfat_write, pass the exfat_fptr from
  the fh through to exfat_advance_cluster, so that multiple file
  handles open on the same file can independently track their most
  recently used cluster.

  For all other uses of exfat_advance_cluster, we continue to user the
  "shared" exfat_fptr stored in the node.

* Adjust grow_file and shrink_file to detect when multiple exfat_fptr
  instances are stored in the node and update them all as needed.
@relan
Copy link
Owner

relan commented Apr 3, 2022

I like the idea, multi-thread operations are indeed suboptimal. Thanks for your proposal!

I have a few concerns about the implementation:

  1. Why exfat_fptrs need to be chained?
  2. The optionality of exfat_fptr argument makes the code prone to errors: if we pass it to read/write functions but forget to pass to truncate, the fptr will become invalid and can corrupt data.

1 similar comment
@relan
Copy link
Owner

relan commented Apr 3, 2022

I like the idea, multi-thread operations are indeed suboptimal. Thanks for your proposal!

I have a few concerns about the implementation:

  1. Why exfat_fptrs need to be chained?
  2. The optionality of exfat_fptr argument makes the code prone to errors: if we pass it to read/write functions but forget to pass to truncate, the fptr will become invalid and can corrupt data.

@tmhannes
Copy link
Author

tmhannes commented Apr 6, 2022

  1. Why exfat_ptrs need to be chained

The exfat_ptrs are chained so that shrink_file and grow_file (both called from exfat_truncate) can find all of the exfat_ptrs that point to the file they are working with (by following the chain from the fptr member of exfat_node), so that they can adjust them when necessary.

My impression is that this is required when, for example, a process A has a file open and then a process B truncates it, so that process A is not left holding an invalid exfat_ptr.

  1. The optionality of exfat_fptr argument makes the code prone to errors: if we pass it to read/write functions but forget to pass to truncate, the fptr will become invalid and can corrupt data.

It's not necessary to pass the exfat_fptr to truncate at all, because truncate will find all the exfat_ptrs that exist by itself (see above).

I agree that the optional exfat_ptr argument to exfat_advance_cluster is weird though. Perhaps it would be cleaner to require every caller to provide an exfat_ptr (or an exfat_fh (which contains both exfat_ptr and exfat_node))?

I didn't pursue that option because I wanted to keep the patch as small as possible. Requiring every caller of exfat_advance_cluster to provide an exfat_ptr would cause changes rippling out into a few more sections of code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants