Fixed flushing dirty data and compressed the cache size #1467
Relevant Issue (if applicable)
#1448
Details
This PR modifies the following two things in the code path used when the max_dirty_data option is specified (i.e. uploading while a file is being written).
Fixed a bug
Changed the variable that stores the return value of the upload (Flush) performed during s3fs_write.
s3fs_write must return the number of bytes written, but it was returning the upload's status (an error code) instead.
Because of this, there were cases where uploading could not be performed normally.
Cache file compression
In the max_dirty_data code path, when uploading while the file is being written, the contents of the cache file were left in place after the upload.
Leaving that data in place needlessly takes up disk space.
After uploading, the new code punches a HOLE in the cache file, which releases the on-disk blocks backing the already-uploaded cached data.
This minimizes disk space pressure when uploading with the max_dirty_data option.
Cache file state when max_dirty_data option is not specified:
.<Bucketname>/big.txt
Cache file state when max_dirty_data option is specified:
.<Bucketname>/big.txt
In this way, disk space is not used except for the last uploaded area.
Notes
This process uses the fallocate() function, which is a non-portable, Linux-specific system call.
Therefore, this function does not exist on non-Linux platforms (e.g. OSX).
To handle this, configure checks for the fallocate function, and if it does not exist, a dummy function is compiled instead.
The dummy function always fails, so the cache file is not compressed.
This is a limitation on OSX and similar platforms.