Fixed flushing dirty data and compressed the cache size #1467

Merged
merged 1 commit into s3fs-fuse:master from holes_for_nodata_area on Nov 14, 2020

Conversation

ggtakec
Member

@ggtakec ggtakec commented Nov 3, 2020

Relevant Issue (if applicable)

#1448

Details

This PR modifies two things in the code path used when the max_dirty_data option is specified (the case where data is uploaded while a file is still being written).

Bug fix

Changed the variable that stores the return value of the upload (flush) performed inside s3fs_write.
s3fs_write must return the number of bytes written, but the flush result overwrote that value, so the function returned the wrong value (an error code).
Because of this, there were cases where the upload could not complete normally.
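A minimal sketch of the corrected control flow is shown below. The names (s3fs_write_sketch, cache_write, flush_dirty_data, flush_now) are hypothetical stand-ins, not the actual s3fs code; the point is only that the flush result is kept in its own variable so the byte count from the write is what gets returned.

```cpp
#include <unistd.h>     // pwrite()
#include <sys/types.h>

// Hypothetical stand-ins for the real s3fs internals, just so the sketch compiles.
static ssize_t cache_write(int fd, const char* buf, size_t size, off_t offset)
{
    return pwrite(fd, buf, size, offset);   // write into the local cache file
}
static int flush_dirty_data(int /*fd*/)
{
    return 0;                               // pretend the upload succeeded
}

// Sketch of the corrected control flow: the flush result is kept in its own
// variable so that the byte count from the write is what s3fs_write returns.
static ssize_t s3fs_write_sketch(int fd, const char* buf, size_t size, off_t offset, bool flush_now)
{
    ssize_t written = cache_write(fd, buf, size, offset);
    if(written < 0){
        return written;                     // propagate the write error as-is
    }
    if(flush_now){                          // e.g. dirty data exceeded max_dirty_data
        int flush_result = flush_dirty_data(fd);
        if(flush_result != 0){
            return flush_result;            // a genuine upload failure is still reported
        }
        // Before the fix, the flush result overwrote the value returned by
        // s3fs_write even on success, so the caller received an error code
        // instead of the number of bytes written.
    }
    return written;                         // FUSE expects the number of bytes written
}
```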

Cache file compression

When the max_dirty_data option triggers an upload while a file is being written, the already-uploaded contents used to remain in the local cache file and continued to occupy disk space.
With this change, after each upload the new code punches a hole in the cache file, which releases the disk blocks that backed the uploaded data.
This minimizes disk space pressure when uploading with the max_dirty_data option.
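The mechanism is the Linux fallocate(2) call with FALLOC_FL_PUNCH_HOLE, which must be combined with FALLOC_FL_KEEP_SIZE so the file size is preserved. The helper below is an illustrative sketch, not the actual s3fs function:

```cpp
// Illustrative helper that releases the disk blocks backing an already-uploaded
// range of the cache file.  The logical file size is unchanged because
// FALLOC_FL_PUNCH_HOLE must be used together with FALLOC_FL_KEEP_SIZE;
// reading the punched range afterwards returns zeros.
#include <fcntl.h>      // fallocate(), FALLOC_FL_* (glibc, with _GNU_SOURCE; g++ defines it)
#include <cerrno>
#include <cstdio>

static bool punch_hole(int fd, off_t offset, off_t length)
{
#ifdef __linux__
    if(0 != fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, length)){
        fprintf(stderr, "fallocate(PUNCH_HOLE) failed: errno=%d\n", errno);
        return false;   // e.g. EOPNOTSUPP if the filesystem does not support holes
    }
    return true;
#else
    (void)fd; (void)offset; (void)length;
    errno = ENOSYS;     // no fallocate on non-Linux platforms (see Notes below)
    return false;
#endif
}
```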

Cache file state when max_dirty_data option is not specified
  • Cache file status (uploading a 660000044-byte file; the leading column is the allocated size reported by ls -s)
644532 -rw------- 1 guest users 660000044 Nov  3 05:20 big.txt
  • Cache file stat information (.<Bucketname>/big.txt)
1351054:660000044
0:660000044:1:0
Cache file state when max_dirty_data option is specified
  • Cache file status (uploading a 660000044-byte file)
25012 -rw------- 1 guest users 660000044 Nov  3 05:20 big.txt
  • Cache file stat information (.<Bucketname>/big.txt)
1351054:660000044
0:634388480:0:0
634388480:25611564:1:0

In this way, no disk space is used except for the area after the last upload; a sketch for verifying this is shown below.
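To confirm that the holes were actually punched, the logical size of the cache file can be compared with the number of bytes allocated on disk. This is only an illustrative check, not part of the PR:

```cpp
#include <sys/stat.h>
#include <cstdio>

// Print the logical size vs. the bytes actually allocated on disk.
// For a cache file whose uploaded ranges have been punched out, the
// allocated size should be much smaller than st_size.
static void print_disk_usage(const char* path)
{
    struct stat st;
    if(0 != stat(path, &st)){
        perror("stat");
        return;
    }
    long long logical   = static_cast<long long>(st.st_size);
    long long allocated = static_cast<long long>(st.st_blocks) * 512;  // st_blocks is in 512-byte units
    printf("%s: logical=%lld bytes, allocated=%lld bytes\n", path, logical, allocated);
}
```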

Notes

This feature uses the fallocate() function, which is a non-portable, Linux-specific system call.
It therefore does not exist on non-Linux platforms (e.g. OSX).
To handle this, configure checks for the fallocate function, and a dummy function is compiled in when it is not available.
The dummy function always fails, so the cache file is not compressed on those platforms.
This is a limitation on OSX and other non-Linux systems.
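A minimal sketch of that fallback is shown below. It assumes the configure check defines a HAVE_FALLOCATE macro (the usual autoconf AC_CHECK_FUNCS convention); the actual macro name and placement in the real code may differ.

```cpp
#include <cerrno>
#include <sys/types.h>  // off_t

#ifndef HAVE_FALLOCATE
// Dummy replacement used when configure did not find fallocate()
// (e.g. on OSX).  It always fails, so the cache file is simply left
// uncompressed on those platforms.
static int fallocate(int /*fd*/, int /*mode*/, off_t /*offset*/, off_t /*len*/)
{
    errno = ENOSYS;     // "function not implemented"
    return -1;
}
#endif
```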

@ggtakec ggtakec requested a review from gaul November 3, 2020 08:11
@gaul gaul merged commit d9f6469 into s3fs-fuse:master Nov 14, 2020
@gaul
Member

gaul commented Nov 14, 2020

Thanks for completing this! Now s3fs should be able to handle very large files.

@ggtakec ggtakec deleted the holes_for_nodata_area branch November 14, 2020 14:04