Buffer reads from package tarfiles #1840

rbtcollins · 2019-05-09T05:17:52Z

Buffer reads from package tarfiles

This has a variable impact on extraction: when plenty of buffer cache is
available there is no impact, but in low-cache situations it avoids
contention, as rust-docs writes several hundred MB but needs to read only
11MB of archive. I've measured this at between 0 seconds and 12 seconds
of improvement, depending on the scenario. The buffer size of 8MB is
chosen to be twice the block size of the largest SSD around today, so
hopefully triggering read-ahead, which can matter if our archives were
to increase in size. 4MB could be chosen, or an adaptive buffer used,
if we need to run on embedded devices.
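
As a rough illustration of the buffering described above (a minimal sketch, not the code in this PR), a file handle can be wrapped in a `std::io::BufReader` with an explicit 8MB capacity. The names `open_buffered`, `read_all` and `BUFFER_CAPACITY` are hypothetical and chosen only for this example.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// Buffer capacity taken from the description above: 8MB.
const BUFFER_CAPACITY: usize = 8 * 1024 * 1024;

/// Hypothetical helper: open `path` and wrap the handle in a large read
/// buffer, so that many small reads by a consumer (such as a tar parser)
/// turn into a few large reads from disk.
fn open_buffered(path: &Path) -> io::Result<BufReader<File>> {
    let file = File::open(path)?;
    Ok(BufReader::with_capacity(BUFFER_CAPACITY, file))
}

/// Hypothetical helper: read the whole file through the buffered reader.
fn read_all(path: &Path) -> io::Result<Vec<u8>> {
    let mut reader = open_buffered(path)?;
    let mut contents = Vec::new();
    reader.read_to_end(&mut contents)?;
    Ok(contents)
}
```

An adaptive variant could pick a smaller capacity on constrained devices by passing a different value to `BufReader::with_capacity`.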

It may still be better to never write these tarfiles to disk,
but that is a substantially larger change and I'm confident enough that
this is beneficial to propose it now.

In order to do this either an additional handle was required, or we
could simplify FileReaderWithProgress length handling - sent_start was
never being toggled: we were notifying the download tracker on every
read that hit the progress adapter. Rather than having that logic in
read(), move it to new_file, as that is a more obvious place to have it
anyway.
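
The shape of that change can be sketched as follows, assuming a simplified progress adapter. This is not the actual FileReaderWithProgress code: the `Progress` enum, the `ReaderWithProgress` type and the callback signature are hypothetical stand-ins. The point is that the "started" notification is sent exactly once from the constructor, so read() needs no flag and only reports the bytes it delivered.

```rust
use std::io::{self, Read};

/// Hypothetical progress events; the real tracker's notification types
/// are not shown in this description.
#[derive(Debug, PartialEq)]
enum Progress {
    Started { total_len: u64 },
    Read { bytes: usize },
}

/// Hypothetical adapter that reports progress for any reader.
struct ReaderWithProgress<R, F: FnMut(Progress)> {
    inner: R,
    notify: F,
}

impl<R: Read, F: FnMut(Progress)> ReaderWithProgress<R, F> {
    /// The start notification lives here, so it fires once per reader
    /// rather than depending on state tracked inside read().
    fn new(inner: R, total_len: u64, mut notify: F) -> Self {
        notify(Progress::Started { total_len });
        ReaderWithProgress { inner, notify }
    }
}

impl<R: Read, F: FnMut(Progress)> Read for ReaderWithProgress<R, F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n > 0 {
            // Only report the bytes actually delivered by this call.
            (self.notify)(Progress::Read { bytes: n });
        }
        Ok(n)
    }
}
```

Because the adapter implements `Read`, it can in turn be wrapped in a `BufReader`, in which case the callback fires once per buffer refill rather than once per small read.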


rbtcollins · 2019-05-09T06:53:25Z

(I tried to separate out the two logical changes, but really one has to be done to enable the other.)

kinnison

LGTM, I'm sad I didn't think of using a BufReader when I first wrote this :D

rbtcollins · 2019-05-09T07:37:14Z

> LGTM, I'm sad I didn't think of using a BufReader when I first wrote this :D

Until you've climbed deep enough into the std lib to know how little buffering there is by default, it's probably not obvious that one would be needed: in Python, for instance, the default is to be buffered...

rbtcollins force-pushed the buffer-package-reads branch from 72c6599 to b92da9b May 9, 2019 06:40

kinnison approved these changes May 9, 2019

kinnison merged commit 0baefed into rust-lang:master May 9, 2019

rbtcollins deleted the buffer-package-reads branch May 9, 2019 07:37
