Translate zio requests with ZIO_PRIORITY_SYNC_READ and #3529

Closed
wants to merge 2 commits

Conversation

bprotopopov
Contributor

ZIO_PRIORITY_SYNC_WRITE into synchronous bio requests by setting the
READ_SYNC and WRITE_SYNC flags. Specifically, the WRITE_SYNC flag turns
out to have a pronounced effect when writing to an SSD-based SLOG.

When WRITE_SYNC is not set (WRITE is set instead), the block trace
for a SLOG device looks as follows:
...
130,96   0        3     0.008968390     0  C   W 830464 + 136 [0]
130,96   0        4     0.011999161     0  C   W 830720 + 136 [0]
130,96   0        5     0.023955549     0  C   W 831744 + 136 [0]
130,96   0        6     0.024337663 19775  A   W 832000 + 136 <- (130,97) 829952
130,96   0        7     0.024338823 19775  Q   W 832000 + 136 [z_wr_iss/6]
130,96   0        8     0.024340523 19775  G   W 832000 + 136 [z_wr_iss/6]
130,96   0        9     0.024343187 19775  P   N [z_wr_iss/6]
130,96   0       10     0.024344120 19775  I   W 832000 + 136 [z_wr_iss/6]
130,96   0       11     0.026784405     0 UT   N [swapper] 1
130,96   0       12     0.026805339   202  U   N [kblockd/0] 1
130,96   0       13     0.026807199   202  D   W 832000 + 136 [kblockd/0]
130,96   0       14     0.026966948     0  C   W 832000 + 136 [0]
130,96   3        1     0.000449358 19788  A   W 829952 + 136 <- (130,97) 827904
130,96   3        2     0.000450951 19788  Q   W 829952 + 136 [z_wr_iss/19]
130,96   3        3     0.000453212 19788  G   W 829952 + 136 [z_wr_iss/19]
130,96   3        4     0.000455956 19788  P   N [z_wr_iss/19]
130,96   3        5     0.000457076 19788  I   W 829952 + 136 [z_wr_iss/19]
130,96   3        6     0.002786349     0 UT   N [swapper] 1
...

Here 130,97 is the partition created on the log device when it was added to
the pool, and 130,96 is the base device. As one can see, the writes to the
SLOG are not marked synchronous (there is no S next to the W), and the queue
unplugs are driven by the timer (the UT event), resulting in slightly over
2 msec of write latency (e.g. the request issued at 0.024344120 does not
complete until 0.026966948, about 2.6 msec later). This results in subpar
performance for single-stream synchronous writes, which are limited by the
latency of the SLOG.

When WRITE_SYNC is set, a similar trace looks as follows:
...
130,96   4        1     0.000000000 70714  A  WS 4280576 + 136 <- (130,97) 4278528
130,96   4        2     0.000000832 70714  Q  WS 4280576 + 136 [(null)]
130,96   4        3     0.000002109 70714  G  WS 4280576 + 136 [(null)]
130,96   4        4     0.000003394 70714  P   N [(null)]
130,96   4        5     0.000003846 70714  I  WS 4280576 + 136 [(null)]
130,96   4        6     0.000004854 70714  D  WS 4280576 + 136 [(null)]
130,96   5        1     0.000354487 70713  A  WS 4280832 + 136 <- (130,97) 4278784
130,96   5        2     0.000355072 70713  Q  WS 4280832 + 136 [(null)]
130,96   5        3     0.000356383 70713  G  WS 4280832 + 136 [(null)]
130,96   5        4     0.000357635 70713  P   N [(null)]
130,96   5        5     0.000358088 70713  I  WS 4280832 + 136 [(null)]
130,96   5        6     0.000359191 70713  D  WS 4280832 + 136 [(null)]
130,96   0       76     0.000159539     0  C  WS 4280576 + 136 [0]
130,96  16       85     0.000742108 70718  A  WS 4281088 + 136 <- (130,97) 4279040
130,96  16       86     0.000743197 70718  Q  WS 4281088 + 136 [z_wr_iss/15]
130,96  16       87     0.000744450 70718  G  WS 4281088 + 136 [z_wr_iss/15]
130,96  16       88     0.000745817 70718  P   N [z_wr_iss/15]
130,96  16       89     0.000746705 70718  I  WS 4281088 + 136 [z_wr_iss/15]
130,96  16       90     0.000747848 70718  D  WS 4281088 + 136 [z_wr_iss/15]
130,96   0       77     0.000604063     0  C  WS 4280832 + 136 [0]
130,96   0       78     0.000899858     0  C  WS 4281088 + 136 [0]

As one can see, all the writes are now synchronous (WS), and the time from
issue (I) to completion (C) is 160-250 usec (e.g. the request issued at
0.000003846 completes at 0.000159539, about 156 usec later), roughly 10x
faster than before.

Since the WRITE_SYNC and READ_SYNC flags are among several factors that the
Linux block I/O layer considers when processing bio requests, it seems
prudent to mark all zio requests of synchronous priority with the
READ_SYNC/WRITE_SYNC flags so the block layer can treat them as such.
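
A minimal sketch of the priority-to-flag mapping described above, for
illustration only: the helper name is hypothetical and the real change
lives in module/zfs/vdev_disk.c, whose call sites may differ. On kernels
of that era, READ_SYNC and WRITE_SYNC expand to READ | REQ_SYNC and
WRITE | REQ_SYNC | REQ_NOIDLE, respectively.

#include <sys/zio.h>    /* zio_t, ZIO_TYPE_*, ZIO_PRIORITY_* */

/*
 * Sketch only: choose the bio rw flags for a read or write zio,
 * promoting requests of synchronous priority to the *_SYNC variants
 * so the Linux block layer treats them as synchronous.
 */
static int
vdev_disk_zio_rw_flags(zio_t *zio)
{
        if (zio->io_type == ZIO_TYPE_READ) {
                return (zio->io_priority == ZIO_PRIORITY_SYNC_READ ?
                    READ_SYNC : READ);
        }

        return (zio->io_priority == ZIO_PRIORITY_SYNC_WRITE ?
            WRITE_SYNC : WRITE);
}

The bio would then be submitted with these flags (e.g. via submit_bio())
rather than the plain READ/WRITE values.
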
@behlendorf
Contributor

@bprotopopov it would be great if you could refresh this to address the cstyle issue and post any useful performance results. I'm quite curious to see how much this helps.

@bprotopopov
Contributor Author

Will do; when are you planning to tag 0.6.5?


@behlendorf
Contributor

@bprotopopov in roughly a month.

@behlendorf
Contributor

@bprotopopov could you refresh this to address the cstyle warning.

@behlendorf added this to the 0.6.5 milestone Jul 10, 2015
@bprotopopov
Contributor Author

Yes, sorry.
What is the command to run the style check in Linux?


@behlendorf
Contributor

make cstyle

@behlendorf
Contributor

@bprotopopov thanks. This LGTM and clearly is a nice performance improvement! I've squashed the patches, rebased, and merged them to master.

janlam7 pushed a commit to janlam7/zfs that referenced this pull request Jul 25, 2015
Translate zio requests with ZIO_PRIORITY_SYNC_READ and
ZIO_PRIORITY_SYNC_WRITE into synchronous bio requests by setting
READ_SYNC and WRITE_SYNC flags. Specifically, WRITE_SYNC flag turns
out to have a pronounced effect when writing to an SSD-based SLOG.

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#3529