
Reduce disk fragmentation in direct write mode on Windows #195

Closed
hugbug opened this Issue Mar 28, 2016 · 0 comments

hugbug commented Mar 28, 2016

Note: this problem affects only Windows.

The problem

An NZBGet user has reported that downloaded files show high fragmentation when option DirectWrite is active. An investigation showed that sparse files tend to be highly fragmented, even when they are written sequentially.

In classic direct write mode NZBGet uses sparse files to avoid allocating files with zeroes. Having files allocated with zeroes means that the same file segments are written to disk twice: the first time when allocating disk space and the second time when actually writing downloaded data. Writing everything twice is of course a big performance disadvantage. That's why NZBGet uses sparse files in direct write mode.

However, when using direct write mode with active article cache the files are typically written sequentially, at once, when all segments are downloaded (and stored in the memory cache). In that case we don't really need the output files to be sparse.
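For context, marking a file as sparse on Windows is done via DeviceIoControl with FSCTL_SET_SPARSE. Below is a minimal sketch of that approach (an illustrative helper, not the actual NZBGet code; the function name is made up and error handling is simplified):

```cpp
#include <windows.h>

// Minimal sketch: create a file, mark it as sparse and set its logical size.
// Unwritten regions ("holes") consume no disk space until data is actually
// written to them.
HANDLE CreateSparseFile(const wchar_t* path, LONGLONG size)
{
	HANDLE hFile = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
		CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
	if (hFile == INVALID_HANDLE_VALUE) return INVALID_HANDLE_VALUE;

	// Ask NTFS to treat the file as sparse
	DWORD bytesReturned = 0;
	DeviceIoControl(hFile, FSCTL_SET_SPARSE, nullptr, 0, nullptr, 0,
		&bytesReturned, nullptr);

	// Set the logical file size; no zeroes are written and no disk space
	// is allocated for the holes
	LARGE_INTEGER liSize;
	liSize.QuadPart = size;
	SetFilePointerEx(hFile, liSize, nullptr, FILE_BEGIN);
	SetEndOfFile(hFile);

	return hFile;
}
```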

Preallocated files

After investigating that a bit I’ve found out that Windows (NTFS) stores two sizes for a file: the normal file size and the valid data size.
When a SetFileSize operation is performed, Windows preallocates disk space and sets the “normal” file size to that size. It does not, however, write zeroes to the allocated space; the valid data size is set to “0” (for new files).

If an application reads beyond the valid data size pointer, it receives zeroes as if they had been written to disk.

When an application writes to the file starting from the last valid data size position, Windows writes to the preallocated disk sectors and moves the valid data size pointer. This avoids unnecessary zeroing.

If, however, an application writes somewhere beyond the valid data size pointer, Windows writes zeroes from the last valid data position to the current write position (and moves the valid data size pointer to that position).

That means that preallocating and then writing from the beginning completely eliminates the zeroing stage while still providing the advantage of unfragmented, preallocated disk space.
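A minimal sketch of such preallocation using plain Win32 calls (illustrative only; the helper name is made up, error handling is omitted and this is not the exact NZBGet implementation):

```cpp
#include <windows.h>

// Minimal sketch: preallocate disk space for a file without marking it sparse
// and without zero-filling it. NTFS sets the "normal" file size to `size` but
// keeps the valid data size at 0, so no zeroes are written as long as the file
// is then filled sequentially from the beginning.
HANDLE PreallocateFile(const wchar_t* path, LONGLONG size)
{
	HANDLE hFile = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
		CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
	if (hFile == INVALID_HANDLE_VALUE) return INVALID_HANDLE_VALUE;

	// Allocate (but do not zero) disk space up to the target size
	LARGE_INTEGER liSize;
	liSize.QuadPart = size;
	SetFilePointerEx(hFile, liSize, nullptr, FILE_BEGIN);
	SetEndOfFile(hFile);

	// Rewind so that subsequent writes start at offset 0 and advance the
	// valid data size together with the data being written
	LARGE_INTEGER liZero = {};
	SetFilePointerEx(hFile, liZero, nullptr, FILE_BEGIN);
	return hFile;
}
```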

The scenario "preallocate space, then write at random file positions” is not good performance wise as Windows will need to zero some data on disk. DirectWrite mode in NZBGet falls into this scenario. However if article cache is active there are no random writes and all segments are written after the file is completed, sequentially from the file beginning. This means we can use preallocating (without sparse) in DirectWrite mode when article cache is active.

Concerns

In the ideal case the file is written sequentially from the first segment to the last. No zeroes are written in that case.

If, during flushing of the cache, some segments are not yet complete, the system will write zeroes to disk for them. Later, when those segments are downloaded, they will be written to the file. In such an unfortunate case the same disk sectors are written twice: once with zeroes, then with actual data.

The question is what is worse: a fragmented sparse file, or an unfragmented file with occasional unnecessary double writes. I hope the latter is the better strategy overall.

For example, suppose we are writing 90 segments of a 100-segment file (10 segments are stuck). We set the file pointer to the position of the first segment and write its data, then do the same for the second segment, and so on. At some point we need to skip a stuck segment, so we set the pointer to the next segment and write its data. The system sees that we skipped a segment, writes zeroes to disk for the skipped segment, then writes our data. Since the whole file is unfragmented, the writing of the zeroes and of the real (next) segment is probably done by the disk as one operation; in other words, the zeroes hopefully come at no extra cost. Although we needed to write only 90 segments we have written 100 (10 of them with zeroes). But since we did this for an unfragmented file, the total time was likely less than writing 90 segments of a fragmented sparse file.

That’s all guesswork. A real test would be to download a big nzb (several gigabytes) multiple times with different settings and compare the results (download time, unpack time).

In this issue

  • test fragmentation in different modes:
    • direct write on with active cache;
    • direct write off with active cache;
    • direct write on, active cache off;
  • test how fragmentation affects unpack performance;
  • change implementation:
    • use sparse files when article cache is disabled;
    • preallocate files (without sparse) when article cache is active;
  • test fragmentation with new implementation:
    • direct write on with active cache.

@hugbug hugbug added the feature label Mar 28, 2016

@hugbug hugbug added this to the v17.0 milestone Mar 28, 2016

hugbug added a commit that referenced this issue Mar 29, 2016

#195: preallocate files without sparse on Windows
Reduce disk fragmentation in direct write mode on Windows when using
article cache.

@hugbug hugbug closed this Mar 30, 2016

hugbug added a commit that referenced this issue Oct 9, 2017

#195: preallocate files without sparse on Windows
Reduce disk fragmentation in direct write mode on Windows when using
article cache.