When the IO_SIZE >= 65536, the QoS bandwidth limit is abnormal #370

Closed
ghost opened this issue Jul 20, 2018 · 19 comments

@ghost commented Jul 20, 2018


When iscsi_tgt uses QoS to limit bandwidth and the I/O size is >= 65536 bytes, the QoS bandwidth rate limiting is abnormal: the overall bandwidth suddenly drops to about 22 MB/s.

Expected Behavior

Within the achievable bandwidth range (for example, a 100 MB/s limit), the actual measured bandwidth should stay nearly equal to the configured limit as io_size grows.

Current Behavior

Within the achievable bandwidth rate limit (for example, a 100 MB/s limit), with io_size >= 65536 the actual measured bandwidth is much lower than the configured limit, dropping as low as about 22 MB/s.

Possible Solution

Steps to Reproduce

  1. ./app/iscsi_tgt/iscsi_tgt -c iscsi.malloc.conf
  2. iscsiadm -m discovery -t st -p 127.0.0.1
  3. iscsiadm -m discovery -t st -p 192.168.5.10
  4. ./scripts/fio.py 65536 64 randwrite/randrw 10

Note: the config file contains

  [QoS]
  Limit_BPS Malloc0 100
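For reference, the fio.py arguments above appear to be I/O size, queue depth, workload type, and runtime in seconds. A minimal sketch of how the bdev side of iscsi.malloc.conf might look for this reproduction, assuming the legacy [Malloc] section (the values are illustrative, and the portal-group, initiator-group, and target-node sections needed for a complete config are omitted); only the [QoS] section is taken from the report:

  [Malloc]
  NumberOfLuns 1
  LunSizeInMB 128

  [QoS]
  Limit_BPS Malloc0 100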

Context (Environment including OS version, SPDK version, etc.)

OS: Linux 4.8.6-300.fc25.x86_64 x86_64 GNU/Linux
SPDK version: ee9db7d

@wanqunintel

@Comphix is there any patch for this?

@jimharris (Member)

Hi @chenlo2x,

Could you please add more details on the steps to reproduce? They should include specifics on config files, RPCs, etc.

Thanks,

-Jim

@ghost (Author) commented Aug 6, 2018

@jimharris I have added steps to reproduce.

@Comphix (Contributor) commented Aug 7, 2018

The problem here is that the iSCSI target splits large writes into smaller ones. It seems to be related only to the iSCSI target and only to write I/O. I am also looking at the iSCSI module.

@Comphix (Contributor) commented Aug 9, 2018

This issue looks related to the iSCSI Data-Out (write) handling, which is affected by FirstBurstLength; our default value is 8192 bytes. As a result, writes larger than 8K are not sent from the iSCSI initiator as a single I/O but are sent piece by piece. I tried changing FirstBurstLength to 16K and it works for I/O < 16K. Because this involves I/O splitting and the associated network impact, the QoS goal cannot be achieved. Reads have a larger upper limit, so a normal read I/O is sent from the initiator side as one single I/O.
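As a rough illustration of the splitting described above (this is not SPDK code; the 8 KB chunking of the solicited data is an assumption for this example, since the real Data-Out sizing also depends on other negotiated parameters such as MaxBurstLength and MaxRecvDataSegmentLength):

#include <stdio.h>

/* Illustrative only: estimate how many pieces a single SCSI write is
 * transferred in when the unsolicited first burst is limited to
 * first_burst_len bytes and the remaining (solicited) data is assumed
 * to move in chunks of data_out_chunk bytes.
 */
static unsigned int
estimate_write_pieces(unsigned int io_size, unsigned int first_burst_len,
                      unsigned int data_out_chunk)
{
        unsigned int remaining;

        if (io_size <= first_burst_len) {
                return 1;       /* fits entirely in the first burst */
        }
        remaining = io_size - first_burst_len;
        /* first burst + ceil(remaining / chunk) solicited Data-Out pieces */
        return 1 + (remaining + data_out_chunk - 1) / data_out_chunk;
}

int main(void)
{
        /* 64 KB write, default FirstBurstLength 8192, assumed 8 KB chunks */
        printf("pieces = %u\n", estimate_write_pieces(65536, 8192, 8192));
        return 0;
}

With these defaults the sketch prints pieces = 8 for a 64 KB write; the extra per-command overhead mentioned later in this thread is what brings the target-side count to roughly 9.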

@Comphix (Contributor) commented Aug 9, 2018

By the way, the bandwidth / IOPS numbers are reported by FIO at the initiator side, and the write I/O split happens at the initiator side, so the target-side QoS cannot do much about it. If the split happened at the target-side SPDK bdev layer, it would be fine. This issue seems specific to the iSCSI target; the NVMe-oF target and vhost target respect the QoS limits.

@Comphix (Contributor) commented Aug 9, 2018

Reads from the iSCSI initiator are also affected by the parameter below, negotiated between the iSCSI initiator and the iSCSI target.

/*
 * SPDK iSCSI target currently only supports 64KB as the maximum data segment length
 * it can receive from initiators. Other values may work, but no guarantees.
 */
#define SPDK_ISCSI_MAX_RECV_DATA_SEGMENT_LENGTH 65536

@Comphix (Contributor) commented Aug 10, 2018

Please check patch https://review.gerrithub.io/#/c/spdk/spdk/+/421142/, where I made a change that follows RFC 3720 and increases the default FirstBurstLength to 64K. For I/O sizes <= 64K, the QoS rate limits should then be enforced without splitting into more PDUs. For I/O sizes > 64KB, I will document the behavior later.

@Comphix (Contributor) commented Aug 13, 2018

I updated https://review.gerrithub.io/#/c/spdk/spdk/+/421142/ to make FirstBurstLength a user-configurable parameter. In the iscsi.conf file, it can be set as FirstBurstLength 65536; that way, write I/O <= 65536 bytes can be rate-limited correctly by QoS. We still need to document this behavior so users can configure their own FirstBurstLength and set a proper QoS goal. Thanks, Gang
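A sketch of how this might look in the config file, assuming the new parameter sits in the existing [iSCSI] section next to the other session defaults (the placement and surrounding values here are assumptions, not confirmed by this thread):

  [iSCSI]
  NodeBase "iqn.2016-06.io.spdk"
  FirstBurstLength 65536

  [QoS]
  Limit_BPS Malloc0 100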

@ghost (Author) commented Aug 13, 2018

@Comphix I have just tested your patch. When FirstBurstLength is set greater than 16384, iscsi_tgt cannot start; the failure reason is "create PDU data out pool failed".

@Comphix (Contributor) commented Aug 13, 2018

Yes. A bigger FirstBurstLength needs a bigger PDU pool. You can use NRHUGE to allocate more hugepages and start the iSCSI target again.
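For example, the SPDK setup script reads the NRHUGE environment variable to decide how many hugepages (2 MB by default on x86) to reserve; something like the following, where the count 4096 is just an illustrative value to be sized to your configuration:

  sudo NRHUGE=4096 ./scripts/setup.sh
  ./app/iscsi_tgt/iscsi_tgt -c iscsi.malloc.conf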

@wanqunintel

@chenlo2x, can you update the status here after allocating more hugepages?

@ghost (Author) commented Aug 14, 2018

As you said, it works.

@jimharris (Member)

I am OK with the 421142 commit. But I don't think changing the FirstBurstLength to 64KB is a good option - it results in a much higher memory footprint. It's still not clear to me why I/O size > 64KB would affect the bandwidth limits. Can you run some additional experiments, with default FirstBurstLength = 8KB?

  1. Does this problem happen with reads also, or only writes?
  2. Can you clarify exactly where this issue starts to occur? For example, does 32KB I/O size show expected QoS? What about 60KB?
  3. If you disable QoS, what is the bandwidth?

I think we need to look at the R2T logic as well. Increasing FirstBurstLength to 64KB means R2T is not needed (all data is immediate) and would work around an R2T bug.

@ZiyeYang can help here - he knows the iSCSI code well.

Thanks,

-Jim

@Comphix (Contributor) commented Aug 23, 2018

Reads from the iSCSI initiator are also affected by the parameter below, negotiated between the iSCSI initiator and the iSCSI target.

/*
 * SPDK iSCSI target currently only supports 64KB as the maximum data segment length
 * it can receive from initiators. Other values may work, but no guarantees.
 */
#define SPDK_ISCSI_MAX_RECV_DATA_SEGMENT_LENGTH 65536

The issue occurs when the write size is bigger than 8K (FirstBurstLength). The larger the write I/O size, the further the actual bandwidth falls from the configured limit.

The IOPS side is easier to explain. FIO at the initiator side counts each large I/O, such as a 64KB write, as one I/O; at the iSCSI target side, however, each received piece is counted individually against the QoS limit. The large write is split mainly based on FirstBurstLength, so a 64KB FIO write ends up counted as roughly 9 I/Os at the target (there is other overhead). That makes the IOPS rate limit appear incorrect: if we set 10000 IOPS at the target side, FIO will only show about 10000/9 IOPS.
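Working through that example (the split count is approximate, as noted above): a 64 KB write with FirstBurstLength = 8 KB splits into 64 KB / 8 KB = 8 data pieces, plus roughly one more for per-command overhead, i.e. about 9 target-side I/Os, so a 10000 IOPS target-side limit shows up at FIO as about 10000 / 9 ≈ 1111 IOPS.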

For bandwidth, I am still trying to understand this: even when split into more sub-I/Os, the overall bandwidth should stay the same except for the extra network communication overhead caused by the additional I/Os. However, it shows the same symptom as IOPS: the actual bandwidth is lower after splitting into more I/Os.

@Comphix (Contributor) commented Aug 23, 2018

I talked with Ziye; maybe we need a hint mechanism here, so that an upper layer such as the iSCSI target can pass information about the original initiator-side I/O down to the common bdev layer, allowing QoS to be managed more accurately.

@Comphix (Contributor) commented Aug 23, 2018

For the bandwidth, it looks like, due to the split and the high I/O queue depth, the small split I/Os are handled in order but not all at once. For example, the first parts of a single Data-Out write from the initiator side may be handled at one time while the remaining parts are handled later. Because of the split and the irregular handling of the remaining pieces (possibly due to network communication), the goal set at the target side can be very different from the actual result observed at the initiator side.

@Comphix (Contributor) commented Aug 29, 2018

The related patch https://review.gerrithub.io/#/c/spdk/spdk/+/421142/ has been merged. It is more of a tunable method to handle larger writes and keep the QoS rate limits controllable.

@jimharris (Member)

421142 has been merged. This issue can be closed.
