Commits on Nov 9, 2017
  1. block/core: kill useless warning

    pfactum committed Nov 9, 2017
    Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Commits on Nov 8, 2017
  1. nvme: fix __nvme_submit_sync_cmd prototype

    pfactum committed Nov 8, 2017
    Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
  2. block, nvme: Introduce blk_mq_req_flags_t

    bvanassche authored and pfactum committed Oct 30, 2017
    Several block layer and NVMe core functions accept a combination
    of BLK_MQ_REQ_* flags through the 'flags' argument but there is
    no verification at compile time whether the right type of block
    layer flags is passed. Make it possible for sparse to verify this.
    This patch does not change any functionality.
    
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Cc: linux-nvme@lists.infradead.org
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Johannes Thumshirn <jthumshirn@suse.de>
    Cc: Ming Lei <ming.lei@redhat.com>
  3. block, scsi: Make SCSI quiesce and resume work reliably

    bvanassche authored and pfactum committed Oct 30, 2017
    The contexts from which a SCSI device can be quiesced or resumed are:
    * Writing into /sys/class/scsi_device/*/device/state.
    * SCSI parallel (SPI) domain validation.
    * The SCSI device power management methods. See also scsi_bus_pm_ops.
    
    It is essential during suspend and resume that neither the filesystem
    state nor the filesystem metadata in RAM changes. This is why SCSI
    devices are quiesced while the hibernation image is being written or
    restored. The SCSI core quiesces devices through scsi_device_quiesce()
    and scsi_device_resume(). In the SDEV_QUIESCE state execution of
    non-preempt requests is deferred. This is realized by returning
    BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
    devices. Prevent a full queue from blocking the submission of power
    management requests by deferring allocation of non-preempt requests
    for devices in the quiesced state. This patch has been tested by running
    the following commands and by verifying that after each resume the
    fio job was still running:
    
    for ((i=0; i<10; i++)); do
      (
        cd /sys/block/md0/md &&
        while true; do
          [ "$(<sync_action)" = "idle" ] && echo check > sync_action
          sleep 1
        done
      ) &
      pids=($!)
      for d in /sys/class/block/sd*[a-z]; do
        bdev=${d#/sys/class/block/}
        hcil=$(readlink "$d/device")
        hcil=${hcil#../../../}
        echo 4 > "$d/queue/nr_requests"
        echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
        fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
          --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16       \
          --iodepth_batch=1 --thread --loops=$((2**31)) &
        pids+=($!)
      done
      sleep 1
      echo "$(date) Hibernating ..." >>hibernate-test-log.txt
      systemctl hibernate
      sleep 10
      kill "${pids[@]}"
      echo idle > /sys/block/md0/md/sync_action
      wait
      echo "$(date) Done." >>hibernate-test-log.txt
    done
    
    Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Tested-by: Martin Steigerwald <martin@lichtvoll.de>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Cc: Martin K. Petersen <martin.petersen@oracle.com>
    Cc: Ming Lei <ming.lei@redhat.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Johannes Thumshirn <jthumshirn@suse.de>
  4. block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag

    bvanassche authored and pfactum committed Nov 7, 2017
    This flag will be used in the next patch to let the block layer
    core know whether or not a SCSI request queue has been quiesced,
    since a quiesced SCSI queue only processes RQF_PREEMPT requests.
    
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Tested-by: Martin Steigerwald <martin@lichtvoll.de>
    Cc: Ming Lei <ming.lei@redhat.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Johannes Thumshirn <jthumshirn@suse.de>
  5. ide, scsi: Tell the block layer at request allocation time about preempt requests

    bvanassche authored and pfactum committed Oct 30, 2017
    
    Convert blk_get_request(q, op, __GFP_RECLAIM) into
    blk_get_request_flags(q, op, BLK_MQ_REQ_PREEMPT). This patch does not
    change any functionality.
    
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Tested-by: Martin Steigerwald <martin@lichtvoll.de>
    Acked-by: David S. Miller <davem@davemloft.net> [ for IDE ]
    Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Cc: Ming Lei <ming.lei@redhat.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Johannes Thumshirn <jthumshirn@suse.de>
  6. block: Introduce BLK_MQ_REQ_PREEMPT

    bvanassche authored and pfactum committed Oct 30, 2017
    Set RQF_PREEMPT if BLK_MQ_REQ_PREEMPT is passed to
    blk_get_request_flags().
    
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Tested-by: Martin Steigerwald <martin@lichtvoll.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Ming Lei <ming.lei@redhat.com>
    Cc: Johannes Thumshirn <jthumshirn@suse.de>
  7. block: Introduce blk_get_request_flags()

    bvanassche authored and pfactum committed Oct 30, 2017
    A side effect of this patch is that the GFP mask that is passed to
    several allocation functions in the legacy block layer is changed
    from GFP_KERNEL into __GFP_DIRECT_RECLAIM.
    
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Tested-by: Martin Steigerwald <martin@lichtvoll.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Ming Lei <ming.lei@redhat.com>
    Cc: Johannes Thumshirn <jthumshirn@suse.de>
  8. block: Make q_usage_counter also track legacy requests

    Ming Lei authored and pfactum committed Oct 30, 2017
    This patch makes it possible to pause request allocation for
    the legacy block layer by calling blk_mq_freeze_queue() and
    blk_mq_unfreeze_queue().
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    [ bvanassche: Combined two patches into one, edited a comment and made sure
      REQ_NOWAIT is handled properly in blk_old_get_request() ]
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Tested-by: Martin Steigerwald <martin@lichtvoll.de>
    Cc: Ming Lei <ming.lei@redhat.com>
  9. mq-deadline: add 'deadline' as a name alias

    axboe authored and pfactum committed Oct 25, 2017
    The scheduler framework now supports looking up the appropriate
    scheduler with the {name,mq} tuple. We can register mq-deadline
    with the alias of 'deadline', so that switching to 'deadline'
    will do the right thing based on the type of driver attached to
    it.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Reviewed-by: Omar Sandoval <osandov@fb.com>
  10. elevator: allow name aliases

    axboe authored and pfactum committed Oct 25, 2017
    Since we now lookup elevator types with the appropriate multiqueue
    capability, allow schedulers to register with an alias alongside
    the real name. This is in preparation for allowing 'mq-deadline'
    to register an alias of 'deadline' as well.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. elevator: lookup mq vs non-mq elevators

    axboe authored and pfactum committed Oct 25, 2017
    If an IO scheduler is selected via elevator= and it doesn't match
    the driver in question wrt blk-mq support, then we fail to boot.
    
    The elevator= parameter is deprecated and only supported for
    non-mq devices. Augment the elevator lookup API so that we
    pass in if we're looking for an mq capable scheduler or not,
    so that we only ever return a valid type for the queue in
    question.
    
    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196695
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  12. block: avoid to fail elevator switch

    Ming Lei authored and pfactum committed Oct 27, 2017
    An elevator switch can be requested in the window between register_disk()
    and blk_register_queue(), but it currently fails there because
    QUEUE_FLAG_REGISTERED is not yet set at that time.
    
    One typical use case is that elevator is changed via udev by the following
    rule, and the KOBJ_ADD uevent is just emitted at the end of register_disk()
    and before running blk_register_queue().
    
    ACTION=="add|change", SUBSYSTEM=="block" , KERNEL=="sda",  RUN+="/bin/sh -c 'echo none > /sys/block/sda/queue/scheduler'"
    
    This patch fixes the elevator switch failure issue.
    
    Fixes: e9a823f(block: fix warning when I/O elevator is changed as request_queue is being removed)
    Cc: David Jeffery <djeffery@redhat.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
  13. block: Invalidate cache on discard v2

    Dmitry Monakhov authored and pfactum committed Oct 25, 2017
    It is reasonable to drop the page cache on discard; otherwise those
    pages may be written back a second later, which makes thin-provisioned
    devices unhappy. It also seems to be a security leak in the
    secure-discard case.
    
    Also add a check for the queue_discard flag at an early stage.
    
    Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. block/bfq: fix unbalanced decrements of burst size

    Paolo Valente authored and pfactum committed Oct 7, 2017
    The commit "block/bfq: decrease burst size when queues in burst
    exit" introduced the decrement of burst_size on the removal of a
    bfq_queue from the burst list. Unfortunately, this decrement can
    happen to be performed even when burst size is already equal to 0,
    because of unbalanced decrements. A description follows of the cause
    of these unbalanced decrements, namely a wrong assumption, and of
    how this wrong assumption leads to them.
    
    The wrong assumption is that a bfq_queue can exit only if the process
    associated with the bfq_queue has exited. This is false, because a
    bfq_queue, say Q, may exit also as a consequence of a merge with
    another bfq_queue. In this case, Q exits because the I/O of its
    associated process has been redirected to another bfq_queue.
    
    The decrement unbalance occurs because Q may then be re-created after
    a split, and added back to the current burst list, *without*
    incrementing burst_size. burst_size is not incremented because Q is
    not a new bfq_queue added to the burst list, but a bfq_queue only
    temporarily removed from the list, and, before the commit "block/bfq:
    decrease burst size when queues in burst exit", burst_size was
    not decremented when Q was removed.
    
    This commit addresses this issue by just checking whether the exiting
    bfq_queue is a merged bfq_queue, and, in that case, not decrementing
    burst_size. Unfortunately, this still leaves room for unbalanced
    decrements, in the following rarer case: on a split, the bfq_queue
    happens to be inserted into a different burst list than that it was
    removed from when merged. If this happens, the number of elements in
    the new burst list becomes higher than burst_size (by one). When the
    bfq_queue then exits, it is of course not in a merged state any
    longer, thus burst_size is decremented, which results in an unbalanced
    decrement.  To handle this sporadic, unlucky case in a simple way,
    this commit also checks that burst_size is larger than 0 before
    decrementing it.
    
    Finally, this commit removes a useless, extra check: the check that
    the bfq_queue is sync, performed before checking whether the bfq_queue
    is in the burst list. This extra check is redundant, because only sync
    bfq_queues can be inserted into the burst list.
    
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
    Tested-by: Angelo Ruocco <angeloruocco90@gmail.com>
  15. block/bfq: decrease burst size when queues in burst exit

    Paolo Valente authored and pfactum committed Sep 19, 2017
    If many queues belonging to the same group happen to be created
    shortly after each other, then BFQ does not provide these queues with
    either weight raising or device idling. The reason is that the
    processes associated with these queues typically have a common goal,
    and get it done as soon as possible if not hampered by device idling
    (which is, instead, fundamental for weight-raising to work).  Examples
    are processes spawned by git grep, or by systemd during boot.
    
    On the other hand, a burst of queue creations may be caused also by
    the start-up of a complex application. In this case, these queues need
    usually to be served one after the other, and as quickly as possible,
    to maximise responsiveness. Therefore, in this case the best strategy
    is to weight-raise all the queues created during the burst, i.e., the
    exact opposite of the strategy for the above case.
    
    To distinguish between the two cases, BFQ uses an empirical burst-size
    threshold, found through extensive tests and monitoring of daily
    usage. Only large bursts, i.e., bursts with a size above this
    threshold, are considered as generated by a high number of parallel
    processes. In this respect, upstart-based boot proved to be rather
    hard to detect as generating a large burst of queue creations, because
    with upstart most of the queues created in a burst exit *before* the
    next queues in the same burst are created. To address this issue, I
    changed the burst-detection mechanism so as to not decrease the size
    of the current burst even if one of the queues in the burst is
    eliminated.
    
    Unfortunately, this missing decrease causes false positives on very
    fast systems: on the start-up of a complex application, such as
    libreoffice writer, so many queues are created, served and exited
    shortly after each other, that a large burst of queue creations is
    wrongly detected as occurring. These false positives just disappear if
    the size of a burst is decreased when one of the queues in the burst
    exits. This commit restores the missing burst-size decrease, relying
    on the fact that upstart is apparently unlikely to be used on systems
    running this and future versions of the kernel.
    
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
    Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
    Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
    Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
  16. block/bfq: let early-merged queues be weight-raised on split too

    Paolo Valente authored and pfactum committed Sep 19, 2017
    A just-created bfq_queue, say Q, may happen to be merged with another
    bfq_queue on the very first invocation of the function
    __bfq_insert_request. In such a case, even if Q would clearly deserve
    interactive weight raising (as it has just been created), the function
    bfq_add_request is never invoked for Q, and thus weight raising is
    never activated for Q. As a consequence, when the state of Q
    is saved for a possible future restore, after a split of Q from the
    other bfq_queue(s), such a state happens to be (unjustly)
    non-weight-raised. Then the bfq_queue will not enjoy any weight
    raising on the split, even if it should still be in an interactive
    weight-raising period when the split occurs.
    
    This commit solves this problem as follows, for a just-created
    bfq_queue that is being early-merged: it stores directly, in the saved
    state of the bfq_queue, the weight-raising state that would have been
    assigned to the bfq_queue if not early-merged.
    
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
    Tested-by: Angelo Ruocco <angeloruocco90@gmail.com>
    Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
  17. block/bfq: check and switch back to interactive wr also on queue split

    Paolo Valente authored and pfactum committed Sep 19, 2017
    As already explained in the message of commit "block/bfq: fix
    wrong init of saved start time for weight raising", if a soft
    real-time weight-raising period happens to be nested in a larger
    interactive weight-raising period, then BFQ restores the interactive
    weight raising at the end of the soft real-time weight raising. In
    particular, BFQ checks whether the latter has ended only on request
    dispatches.
    
    Unfortunately, the above scheme fails to restore interactive weight
    raising in the following corner case: if a bfq_queue, say Q,
    1) Is merged with another bfq_queue while it is in a nested soft
    real-time weight-raising period. The weight-raising state of Q is
    then saved, and not considered any longer until a split occurs.
    2) Is split from the other bfq_queue(s) at a time instant when its
    soft real-time weight raising is already finished.
    On the split, while resuming the previous, soft real-time
    weight-raised state of the bfq_queue Q, BFQ checks whether the
    current soft real-time weight-raising period is actually over. If so,
    BFQ switches weight raising off for Q, *without* checking whether the
    soft real-time period was actually nested in a non-yet-finished
    interactive weight-raising period.
    
    This commit addresses this issue by adding the above missing check in
    bfq_queue splits, and restoring interactive weight raising if needed.
    
    Oleksandr: adapted one hunk against previous fixes.
    
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
    Tested-by: Angelo Ruocco <angeloruocco90@gmail.com>
    Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
    Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
  18. block/bfq: fix wrong init of saved start time for weight raising

    Paolo Valente authored and pfactum committed Sep 13, 2017
    This commit fixes a bug causing bfq to fail to guarantee high
    responsiveness on some drives, if there is heavy random read+write I/O
    in the background. More precisely, this failure allowed this bug to be
    found [1], but the bug may well cause other yet unreported anomalies.
    
    BFQ raises the weight of the bfq_queues associated to soft real-time
    applications, to privilege the I/O, and thus reduce latency for the
    latter. This mechanism is named soft-real-time weight raising in
    BFQ. It may happen that, when a bfq_queue switches to a soft real-time
    weight-raised state, the bfq_queue is already being weight-raised for
    a different reason: because it is deemed interactive too. In this
    case, BFQ saves, in a special variable wr_start_at_switch_to_srt, the
    time instant when the interactive weight-raising period started for
    the bfq_queue, i.e., the time instant when BFQ started to deem the
    bfq_queue interactive. This value is then used to understand whether
    the interactive weight-raising period needs to be restored for the
    bfq_queue, when the soft real-time weight-raising period ends.
    
    If, instead, a bfq_queue switches to soft-real-time weight raising
    while it *is not* already in an interactive weight-raising period,
    then the variable wr_start_at_switch_to_srt has no meaning during the
    following soft real-time weight-raising period. Unfortunately the
    handling of this case is wrong in BFQ: not only is the variable not
    flagged as meaningless, but it is also set to the time when
    the switch to soft real-time weight-raising occurs. This may cause an
    interactive weight-raising period to be considered mistakenly as still
    in progress, and thus a spurious interactive weight-raising period to
    start for the bfq_queue, at the end of the soft-real-time
    weight-raising period. In particular the spurious interactive
    weight-raising period will be considered as still in progress, if the
    soft-real-time weight-raising period does not last very long. The
    bfq_queue will then be wrongly privileged and, if I/O bound, will
    unjustly steal bandwidth from truly interactive or soft real-time
    bfq_queues, harming responsiveness and low latency.
    
    This commit fixes this issue by just setting wr_start_at_switch_to_srt
    to minus infinity (farthest past time instant according to jiffies
    macros): when the soft-real-time weight-raising period ends, certainly
    no interactive weight-raising period will be considered as still in
    progress.
    
    [1] Background I/O Type: Random - Background I/O mix: Reads and writes
    - Application to start: LibreOffice Writer in
    http://www.phoronix.com/scan.php?page=news_item&px=Linux-4.13-IO-Laptop
    
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
    Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
  19. block: directly insert blk-mq request from blk_insert_cloned_request()

    axboe authored and pfactum committed Sep 11, 2017
    A NULL pointer crash was reported for the case of having the BFQ IO
    scheduler attached to the underlying blk-mq paths of a DM multipath
    device.  The crash occurred in blk_mq_sched_insert_request()'s call to
    e->type->ops.mq.insert_requests().
    
    Paolo Valente correctly summarized why the crash occurred with:
    "the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
    blk_rq_prep_clone) creates a cloned request without invoking
    e->type->ops.mq.prepare_request for the target elevator e.  The cloned
    request is therefore not initialized for the scheduler, but it is
    however inserted into the scheduler by blk_mq_sched_insert_request."
    
    All said, a request-based DM multipath device's IO scheduler should be
    the only one used -- when the original requests are issued to the
    underlying paths as cloned requests they are inserted directly in the
    underlying dispatch queue(s) rather than through an additional elevator.
    
    But commit bd166ef ("blk-mq-sched: add framework for MQ capable IO
    schedulers") switched blk_insert_cloned_request() from using
    blk_mq_insert_request() to blk_mq_sched_insert_request().  Which
    incorrectly added elevator machinery into a call chain that isn't
    supposed to have any.
    
    To fix this introduce a blk-mq private blk_mq_request_bypass_insert()
    that blk_insert_cloned_request() calls to insert the request without
    involving any elevator that may be attached to the cloned request's
    request_queue.
    
    Fixes: bd166ef ("blk-mq-sched: add framework for MQ capable IO schedulers")
    Cc: stable@vger.kernel.org
    Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
    Tested-by: Mike Snitzer <snitzer@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  20. block,bfq: Disable writeback throttling

    Algodev-github authored and pfactum committed Sep 8, 2017
    Similarly to CFQ, BFQ has its write-throttling heuristics, and it
    is better not to combine them with further write-throttling
    heuristics of a different nature.
    So this commit disables write-back throttling for a device if BFQ
    is used as I/O scheduler for that device.
    
    Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
  21. block: fix warning when I/O elevator is changed as request_queue is being removed

    David Jeffery authored and pfactum committed Aug 28, 2017
    
    There is a race between changing I/O elevator and request_queue removal
    which can trigger the warning in kobject_add_internal.  A program can
    use sysfs to request a change of elevator at the same time another task
    is unregistering the request_queue the elevator would be attached to.
    The elevator's kobject will then attempt to be connected to the
    request_queue in the object tree when the request_queue has just been
    removed from sysfs.  This triggers the warning in kobject_add_internal
    as the request_queue no longer has a sysfs directory:
    
    kobject_add_internal failed for iosched (error: -2 parent: queue)
    ------------[ cut here ]------------
    WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0
    
    To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when
    changing the elevator and use the request_queue's sysfs_lock to
    serialize between clearing the flag and the elevator testing the flag.
    
    Signed-off-by: David Jeffery <djeffery@redhat.com>
    Tested-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  22. bfq: Use icq_to_bic() consistently

    bvanassche authored and pfactum committed Aug 30, 2017
    Some code uses icq_to_bic() to convert an io_cq pointer to a
    bfq_io_cq pointer while other code uses a direct cast. Convert
    the code that uses a direct cast such that it uses icq_to_bic().
    
    Acked-by: Paolo Valente <paolo.valente@linaro.org>
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  23. bfq: Suppress compiler warnings about comparisons

    bvanassche authored and pfactum committed Aug 30, 2017
    This patch prevents the following warnings from being reported when
    building with W=1:
    
    block/bfq-iosched.c: In function 'bfq_back_seek_max_store':
    block/bfq-iosched.c:4860:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
      if (__data < (MIN))      \
                 ^
    block/bfq-iosched.c:4876:1: note: in expansion of macro 'STORE_FUNCTION'
     STORE_FUNCTION(bfq_back_seek_max_store, &bfqd->bfq_back_max, 0, INT_MAX, 0);
     ^~~~~~~~~~~~~~
    block/bfq-iosched.c: In function 'bfq_slice_idle_store':
    block/bfq-iosched.c:4860:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
      if (__data < (MIN))      \
                 ^
    block/bfq-iosched.c:4879:1: note: in expansion of macro 'STORE_FUNCTION'
     STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 2);
     ^~~~~~~~~~~~~~
    block/bfq-iosched.c: In function 'bfq_slice_idle_us_store':
    block/bfq-iosched.c:4892:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
      if (__data < (MIN))      \
                 ^
    block/bfq-iosched.c:4899:1: note: in expansion of macro 'USEC_STORE_FUNCTION'
     USEC_STORE_FUNCTION(bfq_slice_idle_us_store, &bfqd->bfq_slice_idle, 0,
     ^~~~~~~~~~~~~~~~~~~
    
    Acked-by: Paolo Valente <paolo.valente@linaro.org>
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  24. bfq: Check kstrtoul() return value

    bvanassche authored and pfactum committed Aug 30, 2017
    Make sysfs writes fail for invalid numbers instead of storing
    uninitialized data copied from the stack. This patch removes
    all uninitialized_var() occurrences from the BFQ source code.
    
    Acked-by: Paolo Valente <paolo.valente@linaro.org>
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  25. bfq: Declare local functions static

    bvanassche authored and pfactum committed Aug 30, 2017
    Acked-by: Paolo Valente <paolo.valente@linaro.org>
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  26. bfq: Annotate fall-through in a switch statement

    bvanassche authored and pfactum committed Aug 30, 2017
    This patch prevents gcc 7 from issuing a warning about fall-through
    when building with W=1.
    
    Acked-by: Paolo Valente <paolo.valente@linaro.org>
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  27. doc, block, bfq: better describe how to properly configure bfq

    Paolo Valente authored and pfactum committed Aug 31, 2017
    Many users have reported the lack of a HOWTO for properly configuring
    bfq as a function of the goal one wants to achieve (max
    responsiveness, max throughput, ...). In fact, all needed details are
    already provided in the documentation file bfq-iosched.txt. Yet the
    document lacks guidance on which parameter descriptions to look
    at. This commit adds some simple direction.
    
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
    Reviewed-by: Jeremy Hickman <jeremywh7@gmail.com>
    Reviewed-by: Laurentiu Nicola <lnicola@dend.ro>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  28. doc, block, bfq: fix some typos and remove stale stuff

    Paolo Valente authored and pfactum committed Aug 31, 2017
    In addition to containing some typos and stale sentences, the file
    bfq-iosched.txt still mentioned a set of sysfs parameters that have
    been removed from this version of bfq. This commit fixes all these
    issues.
    
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
    Reviewed-by: Jeremy Hickman <jeremywh7@gmail.com>
    Reviewed-by: Laurentiu Nicola <lnicola@dend.ro>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  29. block, scheduler: convert xxx_var_store to void

    weiping zhang authored and pfactum committed Aug 24, 2017
    The last parameter "count" is never used in xxx_var_store, so
    convert these functions to void.
    
    Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
  30. block/bfq: guarantee update_next_in_service always returns an eligible entity

    Paolo Valente authored and pfactum committed Aug 29, 2017
    
    If the function bfq_update_next_in_service is invoked as a consequence
    of the activation or requeueing of an entity, say E, then it doesn't
    invoke bfq_lookup_next_entity to get the next-in-service entity. In
    contrast, it follows a shorter path: if E happens to be eligible (see
    commit "block/bfq: make lookup_next_entity push up vtime on
    expirations" for details on eligibility) and to have a lower virtual
    finish time than the current candidate as next-in-service entity, then
    E directly becomes the next-in-service entity. Unfortunately, there is
    a corner case for which this shorter path makes
    bfq_update_next_in_service choose a non eligible entity: it occurs if
    both E and the current next-in-service entity happen to be non
    eligible when bfq_update_next_in_service is invoked. In this case, E
    is not set as next-in-service, and, since bfq_lookup_next_entity is
    not invoked, the state of the parent entity is not updated so as to
    end up with an eligible entity as the proper next-in-service entity.
    
    In this respect, next-in-service is actually allowed to be non
    eligible while some queue is in service: since no system-virtual-time
    push-up can be performed in that case (see again commit "block/bfq:
    make lookup_next_entity push up vtime on expirations" for details),
    next-in-service is chosen, speculatively, as a function of the
    possible value that the system virtual time may get after a push
    up. But the correctness of the schedule breaks if next-in-service is
    still a non eligible entity when it is time to set in service the next
    entity. Unfortunately, this may happen in the above corner case.
    
    This commit fixes this problem by making bfq_update_next_in_service
    invoke bfq_lookup_next_entity not only if the above shorter path
    cannot be taken, but also if the shorter path is taken but fails to
    yield an eligible next-in-service entity.
    
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
  31. block/bfq: remove direct switch to an entity in higher class

    Paolo Valente authored and pfactum committed Aug 29, 2017
    If the function bfq_update_next_in_service is invoked as a consequence
    of the activation or requeueing of an entity, say E, and finds out
    that E belongs to a higher-priority class than that of the current
    next-in-service entity, then it sets next_in_service directly to
    E. But this may lead to anomalous schedules, because E may happen not
    to be eligible for service, as its virtual start time is higher than
    the system virtual time for its service tree.
    
    This commit addresses this issue by simply removing this direct
    switch.
    
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>