RFE: perform the audit multicast send from the kauditd thread #23

Closed
pcmoore opened this issue Oct 21, 2016 · 5 comments

Comments

@pcmoore
Member

pcmoore commented Oct 21, 2016

We currently queue the audit unicast sends to a separate thread, while we send the multicast messages immediately. There doesn't appear to be any reason why we can't also do the multicast send from that separate thread, as long as we are careful to queue the skb only once and move the netlink tweaks into the dedicated kauditd_thread.
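
As a rough illustration of the idea, here is a minimal user-space Python model, not kernel code. It assumes a single queue that the logging path appends to exactly once per record, and a worker that performs both the multicast and the unicast send. `Record`, `multicast_log`, `unicast_log`, and `kauditd_drain` are hypothetical names made up for the sketch; only `audit_log_end` and the kauditd thread come from the discussion.

```python
from collections import deque


class Record:
    """Hypothetical stand-in for an audit skb; payload is the record text."""

    def __init__(self, payload):
        self.payload = payload


audit_queue = deque()   # filled by the logging path, drained by the worker
multicast_log = []      # what multicast listeners would have received
unicast_log = []        # what the unicast listener (auditd) would have received


def audit_log_end(record):
    """Logging path: queue the record exactly once and return immediately."""
    audit_queue.append(record)


def kauditd_drain():
    """Worker: for each queued record, do the multicast send, then the unicast send."""
    while audit_queue:
        record = audit_queue.popleft()
        multicast_log.append(record.payload)
        unicast_log.append(record.payload)


audit_log_end(Record("type=SYSCALL msg=1"))
audit_log_end(Record("type=PATH msg=2"))
kauditd_drain()
```

The point of the sketch is that the logging path never sends anything itself: both kinds of send happen in one place, in queue order.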

@pcmoore
Member Author

pcmoore commented Oct 21, 2016

Upstream discussion that spawned this RFE:

Related to issue #22.

@pcmoore
Member Author

pcmoore commented Oct 22, 2016

In addition, once we have rewritten the audit multicast code, we should reintroduce the changes that were causing the problem identified in #22, if that commit was reverted.

fengguang pushed a commit to 0day-ci/linux that referenced this issue Oct 22, 2016
This reverts commit bc51ddd ("netns: avoid disabling irq for
netns id") as it was found to cause problems with systems running
SELinux/audit, see the mailing list thread below:

 * http://marc.info/?t=147694653900002&r=1&w=2

Eventually we should be able to reintroduce this code once we have
rewritten the audit multicast code to queue messages much the same
way we do for unicast messages.  A tracking issue for this can be
found below:

 * linux-audit/audit-kernel#23

Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Elad Raz <e@eladraz.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
jpirko pushed a commit to jpirko/linux_mlxsw that referenced this issue Oct 23, 2016
This reverts commit bc51ddd ("netns: avoid disabling irq for
netns id") as it was found to cause problems with systems running
SELinux/audit, see the mailing list thread below:

 * http://marc.info/?t=147694653900002&r=1&w=2

Eventually we should be able to reintroduce this code once we have
rewritten the audit multicast code to queue messages much the same
way we do for unicast messages.  A tracking issue for this can be
found below:

 * linux-audit/audit-kernel#23

Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Elad Raz <e@eladraz.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pcmoore added a commit to pcmoore/misc-linux_kernel that referenced this issue Nov 24, 2016
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * linux-audit/audit-kernel#23
 * linux-audit/audit-kernel#22

Signed-off-by: Paul Moore <paul@paul-moore.com>
@pcmoore
Member Author

pcmoore commented Nov 24, 2016

@pcmoore
Member Author

pcmoore commented Nov 29, 2016

The RFC patchset is now in audit#next:

... and the revert from issue #22 has been reverted in audit#next as well:

pcmoore added a commit that referenced this issue Nov 29, 2016
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * #23
 * #22

Signed-off-by: Paul Moore <paul@paul-moore.com>
pcmoore added a commit that referenced this issue Nov 29, 2016
Bring back commit bc51ddd ("netns: avoid disabling irq for netns
id") now that we've fixed some audit multicast issues that caused
problems with original attempt.  Additional information, and history,
can be found in the links below:

 * #22
 * #23

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Dec 2, 2016
GIT 8375913f32f3e6f5bbdb18bd36046a07bbfb7654

commit e8d7c33232e5fdfa761c3416539bc5b4acd12db5
Author: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date:   Sun Nov 27 19:32:32 2016 +0300

    md/raid5: limit request size according to implementation limits
    
    The current implementation employs a 16-bit counter of active stripes in
    the lower bits of bio->bi_phys_segments. If a request is big enough to
    overflow this counter, the bio will be completed and freed too early.
    
    Fortunately this does not happen in the default configuration because
    several other limits prevent it: stripe_cache_size * nr_disks effectively
    limits the count of active stripes, and a small max_sectors_kb on the
    lower disks prevents it during normal read/write operations.
    
    Overflow easily happens in discard if it is enabled by the module parameter
    "devices_handle_discard_safely" and stripe_cache_size is set big enough.
    
    This patch limits the request size to 256MB - 8KB to prevent overflows.
    
    Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Shaohua Li <shli@kernel.org>
    Cc: Neil Brown <neilb@suse.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Shaohua Li <shli@fb.com>
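
The arithmetic behind the 256 MB minus 8 KB limit in the commit message above can be checked with a few lines of Python. This is a sketch that assumes one stripe unit is a 4 KiB page and a sector is 512 bytes; neither value is stated in the commit message, and all constant names are made up for the illustration.

```python
STRIPE_BYTES = 4096    # assumption: one stripe unit is a 4 KiB page
SECTOR_BYTES = 512     # assumption: classic 512-byte sectors
COUNTER_BITS = 16      # from the commit message: 16-bit active-stripe counter

# A 16-bit counter has 2**16 distinct values, so it wraps at 65536 stripes.
wrap_stripes = 2 ** COUNTER_BITS
wrap_bytes = wrap_stripes * STRIPE_BYTES

# The limit named in the commit message.
limit_bytes = 256 * 1024 * 1024 - 8 * 1024
limit_stripes = limit_bytes // STRIPE_BYTES
limit_sectors = limit_bytes // SECTOR_BYTES

print("counter wraps at", wrap_bytes, "bytes")
print("limit is", limit_stripes, "stripes =", limit_sectors, "sectors")
```

Under these assumptions the counter wraps at exactly 256 MiB, and the limit corresponds to 65534 stripes, two short of the wrap point.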

commit 1a0ec5c30c37d29e4435a45e75c896f91af970bd
Author: JackieLiu <liuyun01@kylinos.cn>
Date:   Tue Nov 29 11:57:30 2016 +0800

    md/raid5-cache: do not need to set STRIPE_PREREAD_ACTIVE repeatedly
    
    R5c_make_stripe_write_out has already set this flag, so there is no need
    to set it again.
    
    Signed-off-by: JackieLiu <liuyun01@kylinos.cn>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit dbd22c8d7fc6276a48627e57a5605cf9565de78a
Author: JackieLiu <liuyun01@kylinos.cn>
Date:   Tue Nov 29 11:13:20 2016 +0800

    md/raid5-cache: remove the unnecessary next_cp_seq field from the r5l_log
    
    The next_cp_seq field is useless, remove it.
    
    Signed-off-by: JackieLiu <liuyun01@kylinos.cn>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit bc8f167f9c4656b7a972936237e9c38e6ab80c67
Author: JackieLiu <liuyun01@kylinos.cn>
Date:   Mon Nov 28 16:19:20 2016 +0800

    md/raid5-cache: release the stripe_head at the appropriate location
    
    If we released the 'stripe_head' in r5c_recovery_flush_log,
    ctx->cached_list will both release the data-parity stripes and
    data-only stripes, which will become empty.
    And we also need to use the data-only stripes in
    r5c_recovery_rewrite_data_only_stripes, so we should wait until rewriting
    the data-only stripes is done before releasing them.
    
    Reviewed-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
    Reviewed-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: JackieLiu <liuyun01@kylinos.cn>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit fc833c2a2f4129c42efdaed64b9eb6e9ae5fdcee
Author: JackieLiu <liuyun01@kylinos.cn>
Date:   Mon Nov 28 16:19:19 2016 +0800

    md/raid5-cache: use ring add to prevent overflow
    
    'write_pos' must be protected with 'r5l_ring_add', or it may overflow
    
    Signed-off-by: JackieLiu <liuyun01@kylinos.cn>
    Reviewed-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Shaohua Li <shli@fb.com>
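
For readers unfamiliar with ring arithmetic, the kind of wrap-around addition the commit message above refers to might look like the following Python sketch. The function body is an assumption about what a helper such as `r5l_ring_add` does, not a copy of the kernel code, and `device_size` is a hypothetical parameter name for the ring length in sectors. It also assumes `inc` never exceeds `device_size`.

```python
def ring_add(device_size, start, inc):
    """Advance start by inc on a ring of device_size sectors, wrapping at the end.

    Assumes 0 <= start < device_size and 0 <= inc <= device_size.
    """
    start += inc
    if start >= device_size:
        start -= device_size
    return start


# A plain addition would run past the end of a 1000-sector log here.
write_pos = ring_add(1000, 992, 16)
print(write_pos)
```

With a plain `write_pos += 16` the position would become 1008, which is past the end of the ring; the wrapped result stays inside it.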

commit 9b69173e5c6000b2c6fafc5085dcd7b173f073c8
Author: JackieLiu <liuyun01@kylinos.cn>
Date:   Mon Nov 28 16:19:18 2016 +0800

    md/raid5-cache: remove unnecessary function parameters
    
    The function parameter 'recovery_list' is not used in
    body, we can delete it
    
    Signed-off-by: JackieLiu <liuyun01@kylinos.cn>
    Reviewed-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 462eb7d87297dae5837f3445b68b79e835ab0d6c
Author: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Date:   Sat Nov 26 10:57:14 2016 +0800

    raid5-cache: don't set STRIPE_R5C_PARTIAL_STRIPE flag while load stripe into cache
    
    r5c_recovery_load_one_stripe should not set the STRIPE_R5C_PARTIAL_STRIPE
    flag, as the data-only stripe may be a STRIPE_R5C_FULL_STRIPE stripe. The
    state machine will release the stripe later, add it to either the
    r5c_cached_full_stripes list or the r5c_cached_partial_stripes list, and
    set the correct flag.
    
    Reviewed-by: JackieLiu <liuyun01@kylinos.cn>
    Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 5b52b1d37a6fe92bb95c949d56ffd9ec3b7a72b3
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Nov 29 16:57:48 2016 -0500

    netns: avoid disabling irq for netns id
    
    Bring back commit bc51dddf98c9 ("netns: avoid disabling irq for netns
    id") now that we've fixed some audit multicast issues that caused
    problems with original attempt.  Additional information, and history,
    can be found in the links below:
    
     * https://github.com/linux-audit/audit-kernel/issues/22
     * https://github.com/linux-audit/audit-kernel/issues/23
    
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit ef256ce8a87ebecc335bd3c42280030e46428362
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Nov 29 16:53:26 2016 -0500

    audit: don't ever sleep on a command record/message
    
    Sleeping on a command record/message in audit_log_start() could slow
    something, e.g. auditd, from doing something important, e.g. clean
    shutdown, which could present problems on a heavily loaded system.
    This patch allows tasks to bypass any queue restrictions if they are
    logging a command record/message.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 2934a794d81231ce905cdb5fa1f02e776c26c78b
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Nov 29 16:53:26 2016 -0500

    audit: handle a clean auditd shutdown with grace
    
    When auditd stops cleanly it sets 'auditd_pid' to 0 with an
    AUDIT_SET message, in this case we should reset our backlog
    queues via the auditd_reset() function.  This patch also adds
    a 'auditd_pid' check to the top of kauditd_send_unicast_skb()
    so we can fail quicker.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 917a7707a4a49b3b0d35c53cac40d94bbfa8060b
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Nov 29 16:53:25 2016 -0500

    audit: wake up kauditd_thread after auditd registers
    
    This patch was suggested by Richard Briggs back in 2015, see the link
    to the mail archive below.  Unfortunately, that patch is no longer
    even remotely valid due to other changes to the code.
    
    * https://www.redhat.com/archives/linux-audit/2015-October/msg00075.html
    
    Suggested-by: Richard Guy Briggs <rgb@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 5e6dbd54309b9ddeaa595e5c1015351617bbcc2d
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Nov 29 16:53:25 2016 -0500

    audit: rework audit_log_start()
    
    The backlog queue handling in audit_log_start() is a little odd with
    some questionable design decisions, this patch attempts to rectify
    this with the following changes:
    
    * Never make auditd wait, ignore any backlog limits as we need auditd
    awake so it can drain the backlog queue.
    
    * When we hit a backlog limit and start dropping records, don't wake
    all the tasks sleeping on the backlog, that's silly.  Instead, let
    kauditd_thread() take care of waking everyone once it has had a chance
    to drain the backlog queue.
    
    * Don't keep a global backlog timeout countdown, make it per-task.  A
    per-task timer means we won't have all the sleeping tasks waking at
    the same time and hammering on an already stressed backlog queue.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>
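
The three rules in the commit message above can be summarised as a small decision function. This is only a Python sketch of the described policy, not the kernel logic: the function name, its parameters, and the idea of expressing the per-task timer as a remaining wait budget are all assumptions made for illustration.

```python
def audit_log_start_decision(is_auditd, backlog_len, backlog_limit, wait_left):
    """Return 'log', 'sleep', or 'drop' for one task trying to log a record.

    wait_left is this task's own remaining wait budget (hypothetical units);
    it is per-task, so tasks do not all time out at the same moment.
    """
    if is_auditd:
        return "log"      # never make auditd wait: it has to drain the backlog
    if backlog_len < backlog_limit:
        return "log"      # room in the backlog, nothing to decide
    if wait_left > 0:
        return "sleep"    # to be woken by the worker once it has drained the queue
    return "drop"         # budget used up: drop without waking the other sleepers
```

For example, `audit_log_start_decision(False, 64, 64, 0)` yields `'drop'`, while the same call with `is_auditd=True` yields `'log'` regardless of the backlog.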

commit 8b4c3202464e880cb546a2ee6b4a64877c83d2ec
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Nov 29 16:53:25 2016 -0500

    audit: rework the audit queue handling
    
    The audit record backlog queue has always been a bit of a mess, and
    moving the multicast send into kauditd_thread() from
    audit_log_end() only makes things worse.  This patch attempts to fix
    the backlog queue with a better design that should hold up better
    under load and have less of a performance impact at syscall
    invocation time.
    
    While it looks like there is a lot going on in this patch, the main
    change is the move from a single backlog queue to three queues:
    
    * A queue for holding records generated from audit_log_end() that
    haven't been consumed by kauditd_thread() (audit_queue).
    
    * A queue for holding records that have been sent via multicast but
    had a temporary failure when sending via unicast and need a resend
    (audit_retry_queue).
    
    * A queue for holding records that haven't been sent via unicast
    because no one is listening (audit_hold_queue).
    
    Special care is taken in this patch to ensure that the proper
    record ordering is preserved, e.g. we send everything in the hold
    queue first, then the retry queue, and finally the main queue.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>
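
To make the ordering rule in the commit message above concrete, here is a user-space Python model of the three queues. It is a sketch under stated assumptions, not the kernel implementation: the queue names come from the commit message, but `kauditd_pass`, its callback parameters, and the exact failure handling (stop the pass on a temporary unicast failure, and multicast new records even when nobody is listening on unicast) are assumptions chosen to keep the example small.

```python
from collections import deque

audit_queue = deque()        # new records from the logging path
audit_retry_queue = deque()  # multicast done, unicast failed temporarily
audit_hold_queue = deque()   # multicast done, no unicast listener yet


def kauditd_pass(unicast_send, multicast_send, listener_present):
    """Run one drain pass; return the records delivered via unicast, in order."""
    delivered = []
    if not listener_present:
        # Assumption: new records are still multicast, then parked on hold.
        while audit_queue:
            rec = audit_queue.popleft()
            multicast_send(rec)
            audit_hold_queue.append(rec)
        return delivered

    # Oldest records first: hold, then retry, then the main queue.
    for queue, is_new in ((audit_hold_queue, False),
                          (audit_retry_queue, False),
                          (audit_queue, True)):
        while queue:
            rec = queue.popleft()
            if is_new:
                multicast_send(rec)   # each record is multicast exactly once
            if unicast_send(rec):
                delivered.append(rec)
                continue
            # Temporary failure: keep the record ahead of anything newer
            # and stop this pass so ordering is preserved.
            if is_new:
                audit_retry_queue.append(rec)
            else:
                queue.appendleft(rec)
            return delivered
    return delivered


multicast_seen = []
fail_once = {"r3"}


def flaky_unicast(rec):
    """Hypothetical unicast send that fails the first time it sees r3."""
    if rec in fail_once:
        fail_once.discard(rec)
        return False
    return True


audit_queue.extend(["r1", "r2"])
kauditd_pass(flaky_unicast, multicast_seen.append, listener_present=False)
audit_queue.append("r3")
first = kauditd_pass(flaky_unicast, multicast_seen.append, listener_present=True)
audit_queue.append("r4")
second = kauditd_pass(flaky_unicast, multicast_seen.append, listener_present=True)
```

In this run the first pass with a listener delivers the held records `r1` and `r2`, then hits the temporary failure on `r3`; the second pass delivers `r3` from the retry queue before the newer `r4`, so unicast order matches the order in which the records were logged.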

commit 2af6ea0994c77c87ffb8620df94498fb288dff55
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Nov 29 16:53:24 2016 -0500

    audit: rename the queues and kauditd related functions
    
    The audit queue names can be shortened and the record sending
    helpers associated with the kauditd task could be named better, do
    these small cleanups now to make life easier once we start reworking
    the queues and kauditd code.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 6bb9f3cfb0f5f5d91029bdc3b991de281864179b
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Nov 29 16:53:24 2016 -0500

    audit: queue netlink multicast sends just like we do for unicast sends
    
    Sending audit netlink multicast messages is bad for all the same
    reasons that sending audit netlink unicast messages is bad, so this
    patch reworks things so that we don't do the multicast send in
    audit_log_end(), we do it from the dedicated kauditd_thread thread just
    as we do for unicast messages.
    
    See the GitHub issues below for more information/history:
    
     * https://github.com/linux-audit/audit-kernel/issues/23
     * https://github.com/linux-audit/audit-kernel/issues/22
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit 04c7b99a807fa189cd9b85b2f75767f59ab9b7d1
Author: Paul Moore <paul@paul-moore.com>
Date:   Tue Nov 29 16:53:23 2016 -0500

    audit: fixup audit_init()
    
    Make sure everything is initialized before we start the kauditd_thread
    and don't emit the "initialized" record until everything is finished.
    We also panic with a descriptive message if we can't start the
    kauditd_thread.
    
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit d93f4097376007702a26886c836568a6f9e8c5dd
Author: Richard Guy Briggs <rbriggs@redhat.com>
Date:   Tue Nov 29 16:53:23 2016 -0500

    audit: move kaudit thread start from auditd registration to kaudit init (#2)
    
    Richard made this change some time ago but Eric backed it out because
    the rest of the supporting code wasn't ready.  In order to move the
    netlink multicast send to kauditd_thread we need to ensure the
    kauditd_thread is always running, so restore commit 6ff5e459 ("audit:
    move kaudit thread start from auditd registration to kaudit init").
    
    Signed-off-by: Richard Guy Briggs <rbriggs@redhat.com>
    [PM: brought forward and merged based on Richard's old patch]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit d6ba7a9c8b5a6a42e8f7a7efd8345122611b535c
Author: Jonathan Corbet <corbet@lwn.net>
Date:   Fri Nov 18 17:21:32 2016 -0700

    doc: Sphinxify the tracepoint docbook
    
    Convert the tracepoint docbook template to RST and add it to the core-api
    manual.  No changes to the actual text beyond the mechanical formatting
    conversion.
    
    Cc: Jason Baron <jbaron@redhat.com>
    Cc: William Cohen <wcohen@redhat.com>
    Signed-off-by: Jonathan Corbet <corbet@lwn.net>

commit 8da3dc53347205b0d32ded4ab9c96dcf336061d8
Author: Jonathan Corbet <corbet@lwn.net>
Date:   Fri Nov 18 17:17:11 2016 -0700

    doc: debugobjects: actually pull in the kerneldoc comments
    
    Add the appropriate markup to get the kerneldoc comments out of
    lib/debugobjects.c that have never seen the light of day until now.
    
    A logical next step, left for the reader at the moment, is to move the
    function descriptions *out* of debug-objects.rst and into the kerneldoc
    comments themselves.
    
    Signed-off-by: Jonathan Corbet <corbet@lwn.net>

commit 93dc3a112bf8e5f97e3d9744595934ff31708764
Author: Jonathan Corbet <corbet@lwn.net>
Date:   Fri Nov 18 17:06:13 2016 -0700

    doc: Convert the debugobjects DocBook template to sphinx
    
    A couple of the most minor heading tweaks, otherwise no changes to the text
    itself beyond the mechanical conversion.
    
    Note that the inclusion of the kerneldoc comments from the source has never
    worked, since exported symbols were asked for and none of those functions
    are exported to modules.  It doesn't work here either :)
    
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Jonathan Corbet <corbet@lwn.net>

commit 0bb33e25e5c91f69304f3799609a52354dca1af4
Author: Jonathan Corbet <corbet@lwn.net>
Date:   Fri Nov 18 16:04:48 2016 -0700

    docs: Move the 802.11 guide into the driver-api manual
    
    Put this documentation with the other driver docs and try to keep the top
    level reasonably clean.
    
    Cc: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Jonathan Corbet <corbet@lwn.net>

commit 9367a9cf15eae56d36b3ac76738970bef3698e63
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Nov 29 13:19:27 2016 -0800

    fixup! rcu: Add functions to test for trivial grace periods

commit 19f52e52fedc204e3a7a1ab568b9208064ccfb1e
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Nov 29 13:13:47 2016 -0800

    rcu: Add functions to test for trivial grace periods
    
    Under some circumstances, RCU grace periods are zero cost.  For
    RCU-preempt, this is the case during boot, and for RCU-bh and RCU-sched,
    this is the case if there is only one CPU.  This means that RCU users
    might wish to dispense with grace-period-avoidance strategies when
    grace periods are zero cost, so this commit adds rcu_trivial_gp(),
    rcu_bh_trivial_gp(), and rcu_sched_trivial_gp() to test for these
    conditions.  Because the conditions leading to zero-cost grace periods
    can change at any time (for example, when a second CPU is onlined), these
    functions should be used as performance hints, and must not be relied
    on for correctness.  For example, even if rcu_trivial_gp() returns true,
    you are required to invoke synchronize_rcu().
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

commit 8902bd9bfc63a1dbed31a6a44dc895a33c6238e9
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Tue Nov 22 11:47:09 2016 +0200

    ovl: show redirect_dir mount option
    
    Show the value of redirect_dir in /proc/mounts.
    
    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

commit a53ca63502e62ca459de32821753c8227dc94197
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 29 12:02:17 2016 +0000

    drm: Protect fb_helper list manipulation with a mutex
    
    Though we only walk the kernel_fb_helper_list inside a panic (or single
    thread debugging), we still need to protect the list manipulation on
    creating/removing a framebuffer device in order to prevent list
    corruption.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Sean Paul <seanpaul@chromium.org>
    Link: http://patchwork.freedesktop.org/patch/msgid/20161129120217.7344-3-chris@chris-wilson.co.uk

commit 64e94407fb5a4128636c2b15b38fa6e71a427228
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 29 12:02:16 2016 +0000

    drm: Pull together probe + setup for drm_fb_helper
    
    drm_fb_helper_probe_connector_modes() is always called before
    drm_setup_crtcs(), so just move the call into drm_setup_crtcs for a
    small bit of code compaction.
    
    Note that register_framebuffer will do a modeset (when fbcon is enabled)
    and hence must be moved out of the critical section. A follow-up patch
    will add new locking for the fb list, hence move all the related
    registration code together.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel.vetter@intel.com>
    Signed-off-by: Sean Paul <seanpaul@chromium.org>
    Link: http://patchwork.freedesktop.org/patch/msgid/20161129120217.7344-2-chris@chris-wilson.co.uk

commit 966a6a13c6660b499caf2932de22ae70c1317786
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 29 12:02:15 2016 +0000

    drm: Hold mode_config.lock to prevent hotplug whilst setting up crtcs
    
    The fb_helper->connector_count is modified when a new connector is
    constructed following a hotplug event (e.g. DP-MST). This causes trouble
    for drm_setup_crtcs() and friends that assume that fb_helper is
    constant:
    
    [ 1250.872997] BUG: KASAN: slab-out-of-bounds in drm_setup_crtcs+0x320/0xf80 at addr ffff88074cdd2608
    [ 1250.873020] Write of size 40 by task kworker/u8:3/480
    [ 1250.873039] CPU: 2 PID: 480 Comm: kworker/u8:3 Tainted: G     U          4.9.0-rc6+ #285
    [ 1250.873043] Hardware name:                  /NUC6i3SYB, BIOS SYSKLi35.86A.0024.2015.1027.2142 10/27/2015
    [ 1250.873050] Workqueue: events_unbound async_run_entry_fn
    [ 1250.873056]  ffff88070f9d78f0 ffffffff814b72aa ffff88074e40c5c0 ffff88074cdd2608
    [ 1250.873067]  ffff88070f9d7918 ffffffff8124ff3c ffff88070f9d79b0 ffff88074cdd2600
    [ 1250.873079]  ffff88074e40c5c0 ffff88070f9d79a0 ffffffff812501e4 0000000000000005
    [ 1250.873090] Call Trace:
    [ 1250.873099]  [<ffffffff814b72aa>] dump_stack+0x67/0x9d
    [ 1250.873106]  [<ffffffff8124ff3c>] kasan_object_err+0x1c/0x70
    [ 1250.873113]  [<ffffffff812501e4>] kasan_report_error+0x204/0x4f0
    [ 1250.873120]  [<ffffffff81698df0>] ? drm_dev_printk+0x140/0x140
    [ 1250.873127]  [<ffffffff81250ac3>] kasan_report+0x53/0x60
    [ 1250.873134]  [<ffffffff81688b40>] ? drm_setup_crtcs+0x320/0xf80
    [ 1250.873142]  [<ffffffff8124f18e>] check_memory_region+0x13e/0x1a0
    [ 1250.873147]  [<ffffffff8124f5f3>] memset+0x23/0x40
    [ 1250.873154]  [<ffffffff81688b40>] drm_setup_crtcs+0x320/0xf80
    [ 1250.873161]  [<ffffffff810be7c5>] ? wake_up_q+0x45/0x80
    [ 1250.873169]  [<ffffffff81b0c180>] ? mutex_lock_nested+0x5a0/0x5a0
    [ 1250.873176]  [<ffffffff8168a0e6>] drm_fb_helper_initial_config+0x206/0x7a0
    [ 1250.873183]  [<ffffffff81689ee0>] ? drm_fb_helper_set_par+0x90/0x90
    [ 1250.873303]  [<ffffffffa0b68690>] ? intel_fbdev_fini+0x140/0x140 [i915]
    [ 1250.873387]  [<ffffffffa0b686b2>] intel_fbdev_initial_config+0x22/0x40 [i915]
    [ 1250.873391]  [<ffffffff810b50ff>] async_run_entry_fn+0x7f/0x270
    [ 1250.873394]  [<ffffffff810a64b0>] process_one_work+0x3d0/0x960
    [ 1250.873398]  [<ffffffff810a641d>] ? process_one_work+0x33d/0x960
    [ 1250.873401]  [<ffffffff810a60e0>] ? max_active_store+0xf0/0xf0
    [ 1250.873406]  [<ffffffff810f6f9d>] ? do_raw_spin_lock+0x10d/0x1a0
    [ 1250.873413]  [<ffffffff810a767d>] worker_thread+0x8d/0x840
    [ 1250.873419]  [<ffffffff810a75f0>] ? create_worker+0x2e0/0x2e0
    [ 1250.873426]  [<ffffffff810b0454>] kthread+0x194/0x1c0
    [ 1250.873432]  [<ffffffff810b02c0>] ? kthread_park+0x60/0x60
    [ 1250.873438]  [<ffffffff810f095d>] ? trace_hardirqs_on+0xd/0x10
    [ 1250.873446]  [<ffffffff810b02c0>] ? kthread_park+0x60/0x60
    [ 1250.873453]  [<ffffffff810b02c0>] ? kthread_park+0x60/0x60
    [ 1250.873457]  [<ffffffff81b12277>] ret_from_fork+0x27/0x40
    [ 1250.873460] Object at ffff88074cdd2608, in cache kmalloc-32 size: 32
    
    However, when holding the mode_config.lock around the fb_helper, we have
    to be careful of any callbacks that may reenter the fb_helper and so try
    to reacquire the mode_config.lock (e.g. register_framebuffer). To avoid
    the mutex recursion, we have to rearrange the sequence to move the
    registration into the caller outside of the mode_config.lock.
    
    v2: drop the 1; following the lockdep assertion inside the for(;;), I
    anticipated an error that doesn't happen!
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98826
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Signed-off-by: Sean Paul <seanpaul@chromium.org>
    Link: http://patchwork.freedesktop.org/patch/msgid/20161129120217.7344-1-chris@chris-wilson.co.uk

commit 033a28bac0a20de78426e6faf3414637e4775fbc
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Nov 29 11:06:05 2016 -0800

    rcu: Allow boot-time use of cond_resched_rcu_qs()
    
    The cond_resched_rcu_qs() macro is used to force RCU quiescent states into
    long-running in-kernel loops.  However, some of these loops can execute
    during early boot when interrupts are disabled, and during which time
    it is therefore illegal to enter the scheduler.  This commit therefore
    makes cond_resched_rcu_qs() be a no-op during early boot.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

commit c3fdff480d5d5e23a6e5529d33d1c01060491e44
Author: Paul Moore <paul@paul-moore.com>
Date:   Sun Nov 20 16:47:55 2016 -0500

    audit: add support for session ID user filter
    
    Define AUDIT_SESSIONID in the uapi and add support for specifying user
    filters based on the session ID.  Also add the new session ID filter
    to the feature bitmap so userspace knows it is available.
    
    https://github.com/linux-audit/audit-kernel/issues/4
    RFE: add a session ID filter to the kernel's user filter
    
    Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
    [PM: combine multiple patches from Richard into this one]
    Signed-off-by: Paul Moore <paul@paul-moore.com>

commit f7b7bee75e06cbdce864f7b313ac05555e7eff6b
Author: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Date:   Sat Nov 26 10:57:13 2016 +0800

    raid5-cache: add another check conditon before replaying one stripe
    
    A newly allocated stripe has no STRIPE_R5C_CACHING state either;
    adding this check condition avoids unnecessarily replaying an empty stripe.
    
    r5l_recovery_replay_one_stripe resets the stripe in any case, so delete the
    redundant reset to make the code cleaner.
    
    Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 9b57da0630c9fd36ed7a20fc0f98dc82cc0777fa
Author: Florian Westphal <fw@strlen.de>
Date:   Tue Nov 29 02:17:34 2016 +0100

    netfilter: ipv6: nf_defrag: drop mangled skb on ream error
    
    Dmitry Vyukov reported GPF in network stack that Andrey traced down to
    negative nh offset in nf_ct_frag6_queue().
    
    Problem is that all network headers before fragment header are pulled.
    Normal ipv6 reassembly will drop the skb when errors occur further down
    the line.
    
    netfilter doesn't do this, and instead passed the original fragment
    along.  That was also fine back when netfilter ipv6 defrag worked with
    cloned fragments, as the original, pristine fragment was passed on.
    
    So we either have to undo the pull op, or discard such fragments.
    Since they're malformed after all (e.g. overlapping fragment) it seems
    preferable to just drop them.
    
    Same for temporary errors -- it doesn't make sense to accept (and
    perhaps forward!) only some fragments of same datagram.
    
    Fixes: 029f7f3b8701cc7ac ("netfilter: ipv6: nf_defrag: avoid/free clone operations")
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Debugged-by: Andrey Konovalov <andreyknvl@google.com>
    Diagnosed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit 333ba053d145d6f9152f6b0311a345b876f0fed1
Author: Javier González <javier@cnexlabs.com>
Date:   Mon Nov 28 22:39:14 2016 +0100

    lightnvm: transform target get/set bad block
    
    Since targets are given a virtual target device, it is necessary to
    translate all communication between targets and the backend device.
    Implement the translation layer for get/set bad block table.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit da2d7cb828ce2714c603827ac5a6e1c98a02e861
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:13 2016 +0100

    lightnvm: use target nvm on target-specific ops.
    
    On target-specific operations pass on nvm_tgt_dev instead of the generic
    nvm device.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit a279006afa3377493c4240395c70430f2a9b0d2b
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:12 2016 +0100

    lightnvm: introduce max_phys_sects helper function
    
    Target devices do not have access to the device driver operations.
    Introduce a helper function that exposes the max. number of physical
    sectors supported by the underlying device.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 959e911b31981b52ed3f3d6e351b107bcb9163ef
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:11 2016 +0100

    lightnvm: introduce helpers for generic ops in rrpc
    
    Avoid calling media manager and device-specific operations directly from
    rrpc. Create helper functions on lightnvm's core instead.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    
    Made it work with null_blk as well.
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 8e53624d44c1de31b1b0d4f500703669418a4c67
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:10 2016 +0100

    lightnvm: eliminate nvm_lun abstraction in mm
    
    In order to naturally support multi-target instances on an Open-Channel
    SSD, targets should own the LUNs they get blocks from and manage
    provisioning internally. This is done in several steps.
    
    Since targets own the LUNs they are instantiated on top of and manage the
    free block list internally, there is no need for a LUN abstraction in
    the media manager. LUNs are intrinsically managed as in the physical
    layout (ch:0,lun:0, ..., ch:0,lun:n, ch:1,lun:0, ch:1,lun:n, ...,
    ch:m,lun:0, ch:m,lun:n) and given to the targets based on the target
    creation ioctl. This simplifies LUN management and clears the path for a
    partition manager to sit directly underneath LightNVM targets.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>
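
The physical LUN ordering described in the commit above (all LUNs of channel 0, then all LUNs of channel 1, and so on) can be sketched as a channel-major index. This is only an illustration: `lun_linear_index`, `lun_from_index` and `struct lun_pos` are hypothetical names, not taken from the lightnvm code.

```c
/* Hypothetical position of a LUN in the device geometry. */
struct lun_pos {
	unsigned int ch;
	unsigned int lun;
};

/* Channel-major ordering: ch:0,lun:0 ... ch:0,lun:n, ch:1,lun:0 ... */
unsigned int lun_linear_index(unsigned int ch, unsigned int lun,
			      unsigned int luns_per_ch)
{
	return ch * luns_per_ch + lun;
}

/* Inverse mapping from a linear index back to (channel, LUN). */
struct lun_pos lun_from_index(unsigned int idx, unsigned int luns_per_ch)
{
	struct lun_pos pos;

	pos.ch = idx / luns_per_ch;
	pos.lun = idx % luns_per_ch;
	return pos;
}
```

With such an ordering, a target created on a contiguous range of indexes receives whole LUNs in physical layout order.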

commit 2a02e627c245bfa987b97707123d7747d7b0e486
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:09 2016 +0100

    lightnvm: eliminate nvm_block abstraction on mm
    
    In order to naturally support multi-target instances on an Open-Channel
    SSD, targets should own the LUNs they get blocks from and manage
    provisioning internally. This is done in several steps.
    
    A part of this transformation is that targets manage their blocks
    internally. This patch eliminates the nvm_block abstraction and moves
    block management to the target logic. The rrpc target is transformed.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit eec44565e9ab13bbf5b48864a68871eabf1115c1
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:08 2016 +0100

    lightnvm: remove debug lun statistics from gennvm
    
    Since LUNs are managed internally on targets, the media manager has no
    access to the free LUN lists. Thus, debug functions that show LUN
    information on the device should not be implemented on the media
    manager, but rather on the target in itself.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 0ac4072eb10c9627415eb1ca511121156e20012c
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:07 2016 +0100

    lightnvm: remove get_lun operation on gennvm
    
    Since LUNs are managed internally on the target, there is no need for
    the media manager to implement a get_lun operation.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 8e79b5cb1d3b8eceaf6862995952dd4de431dd99
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:06 2016 +0100

    lightnvm: move block provisioning to targets
    
    In order to naturally support multi-target instances on an Open-Channel
    SSD, targets should own the LUNs they get blocks from and manage
    provisioning internally. This is done in several steps.
    
    This patch moves the block provisioning inside of the target and removes
    the get/put block interface from the media manager.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 8176117b82e49e043d045f214ba7a892fba6b827
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:05 2016 +0100

    lightnvm: manage lun partitions internally in mm
    
    LUNs are exclusively owned by targets implementing a block device FTL.
    Doing this reservation requires at the moment a 2-way callback gennvm
    <-> target. The reason behind this is that LUNs were not assumed to
    always be exclusively owned by targets. However, this design decision
    goes against I/O determinism QoS (two targets would mix I/O on the same
    parallel unit in the device).
    
    This patch makes LUN reservation part of target creation in the media
    manager, so that LUNs are always exclusively owned by the target
    instantiated on top of them. LUN striping and/or sharing should be
    implemented on the target itself or the layers on top.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit de93434fcf74d41754a48e45365a5914e00bc0be
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:04 2016 +0100

    lightnvm: remove gen_lun abstraction
    
    The gen_lun abstraction in the generic media manager was conceived on
    the assumption that a single target would be instantiated on top of it.
    This has complicated target design to implement multi-instances. Remove
    this abstraction and move its logic to nvm_lun, which manages physical
    lun geometry and operations.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 98379a12c54974ee5856dcf81781a5dc845505c3
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:03 2016 +0100

    lightnvm: use constant name instead of value
    
    There is a constant to refer to free blocks. Use it when marking bad
    blocks instead of using a constant value.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit eb00352b5213e52419ac7dd8bbd84a1300fe4b5d
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:02 2016 +0100

    lightnvm: remove unnecessary variables in rrpc
    
    Before vectored I/Os were supported on rrpc, the physical address was
    stored as part of the nvm_rqd request. This variable became obsolete
    when the ppa_list was introduced. Clean up this variable.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 0e5c3246dbb96b6870634e7d51b2490f05c976cf
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:01 2016 +0100

    lightnvm: make address conversion functions global
    
    Targets are assumed to use the same generic ppa format, where the
    address is partitioned on ch:lun:block:pg:pl:sec. Thus, make the
    function in charge of transforming the ppa address from a linear format
    to the generic one available to all targets.
    
    This function will be needed by the media manager in order to do target
    mapping translations when targets are divided on different physical
    partitions.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 7e4f64a9b3004ce592f21653c3b7781628862232
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:39:00 2016 +0100

    lightnvm: cleanup unused target operations
    
    Cleanup definition leftovers from old gennvm interface
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 17b25cfc873e6d7e2283cf8d77f2af93192484a5
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:38:59 2016 +0100

    lightnvm: remove sysfs configuration interface
    
    LightNVM used to be managed and configured through sysfs. Since the
    introduction of management ioctls this interface is redundant and
    outdated. Get rid of it.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit f0b01b6a610f99fb7683d0f5259bb65649b0fd78
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:38:58 2016 +0100

    lightnvm: rrpc: split bios of size > 256kb
    
    rrpc cannot handle bios of size > 256kb due to NVMe using a 64 bit
    bitmap to signal I/O completion. If a larger bio comes, split it
    explicitly.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>
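
The 256 KB limit in the commit above follows from the 64-bit completion bitmap if each bit tracks one sector. A minimal sketch of the arithmetic, assuming 4 KB sectors (the commit states only the limit, not the sector size); `rrpc_split_count` is a hypothetical helper, not a function from rrpc:

```c
#include <stdint.h>

/* Assumption: 4 KB sectors; the commit text gives only the 256 KB limit. */
#define SKETCH_SECTOR_BYTES  4096u
#define SKETCH_BITMAP_BITS   64u	/* one completion bit per sector */
#define SKETCH_MAX_BIO_BYTES (SKETCH_BITMAP_BITS * SKETCH_SECTOR_BYTES)

/* Number of pieces needed so that each piece fits the 64-bit bitmap. */
uint32_t rrpc_split_count(uint32_t bio_bytes)
{
	return (bio_bytes + SKETCH_MAX_BIO_BYTES - 1) / SKETCH_MAX_BIO_BYTES;
}
```

Under that assumption, 64 bits times 4 KB gives 262144 bytes, so a bio one byte larger than 256 KB already needs two requests.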

commit 402ab9a89d7b5bab08a5534027b39d80085ec19b
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:38:57 2016 +0100

    lightnvm: add ECC error codes
    
    Add ECC error codes to enable the appropriate handling in the target.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit a24ba4644b7ae5af3cd2eb6992c237cb4548c45e
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:38:56 2016 +0100

    lightnvm: export set bad block table
    
    Bad blocks should be managed by block owners. This would be either
    targets for data blocks or sysblk for system blocks.
    
    In order to support this, export two functions: One to mark a block as
    a specific type (e.g., bad block) and another to update the bad block
    table on the device.
    
    Move bad block management to rrpc.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 8a3c95ab385fb98621455807ae52b4454192f8c5
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:38:55 2016 +0100

    lightnvm: do not protect block 0
    
    Device blocks should be marked by the device and considered as bad
    blocks by the media manager. Thus, do not make assumptions on which
    blocks are going to be used by the device. In doing so we might lose
    valid blocks from the free list.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit bb3149792e0ed52cf5f457dda4c9bf9c5bda1542
Author: Javier González <jg@lightnvm.io>
Date:   Mon Nov 28 22:38:54 2016 +0100

    lightnvm: enable to send hint to erase command
    
    Erases might be subject to host hints. An example is multi-plane
    programming to erase blocks in parallel. Enable targets to specify this
    hint.
    
    Signed-off-by: Javier González <javier@cnexlabs.com>
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 3dc87dd048dc442bab633e85bfb96c893612d765
Author: Matias Bjørling <m@bjorling.me>
Date:   Mon Nov 28 22:38:53 2016 +0100

    nvme: lightnvm: attach lightnvm sysfs to nvme block device
    
    Previously, LBA read and write were not supported in the lightnvm
    specification. Now that it supports it, let's use the traditional
    NVMe gendisk, and attach the lightnvm sysfs geometry export.
    
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit 7498e99fc51ca60b960ef79061e0e7b521feb07e
Author: Matias Bjørling <m@bjorling.me>
Date:   Mon Nov 28 22:38:52 2016 +0100

    nvme: lightnvm: frees wrong cmd structure
    
    When struct nvme_request was introduced, the nvme_nvm_submit_io was
    converted to the new interface. The interface moves nvme_nvm_command
    data structure into the struct request pdu. On io completion, rq->cmd is
    freed, which should have been the dereferenced pdu nvme_request->cmd.
    
    Fixes: d49187e97e94 "nvme: introduce struct nvme_request"
    Signed-off-by: Matias Bjørling <m@bjorling.me>
    Signed-off-by: Jens Axboe <axboe@fb.com>

commit e59d8bb574f6d8097e7e14981440dc37425fddc6
Author: Pan Bian <bianpan2016@163.com>
Date:   Tue Nov 29 07:33:07 2016 +0800

    ALSA: echoaudio: Fix improper return value in function load_asic
    
    When the second call to load_asic_generic() fails in function
    load_asic(), "false" is returned. The real value of "false" is 0, which
    indicates success in the context. As a result, the execution status and
    the return value may be inconsistent. This patch fixes the bug.
    
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188761
    Signed-off-by: Pan Bian <bianpan2016@163.com>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
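
The class of bug described in the commit above can be sketched as follows. This is not the driver's code: `load_step`, `load_buggy` and `load_fixed` are hypothetical, and the sketch assumes the fix is to return a negative error code where `false` was returned.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for one firmware load step that can fail. */
static int load_step(bool ok)
{
	return ok ? 0 : -EIO;
}

/* Buggy shape: `false` converts to 0, which callers read as success. */
int load_buggy(bool first_ok, bool second_ok)
{
	if (load_step(first_ok) < 0)
		return -EIO;
	if (load_step(second_ok) < 0)
		return false;	/* bug: failure reported as success */
	return 0;
}

/* Fixed shape: propagate the negative error code from either step. */
int load_fixed(bool first_ok, bool second_ok)
{
	int err;

	err = load_step(first_ok);
	if (err < 0)
		return err;
	err = load_step(second_ok);
	if (err < 0)
		return err;
	return 0;
}
```

In the buggy variant a failure of the second step is indistinguishable from success, which is the inconsistency the commit message describes.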

commit 40931b85113dad7881d49e8759e5ad41d30a5e6c
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Nov 25 07:46:20 2016 -0800

    mlx4: give precise rx/tx bytes/packets counters
    
    mlx4 stats are chaotic because a deferred work queue is responsible
    for updating them every 250 ms.
    
    Even sampling stats every one second with "sar -n DEV 1" gives
    variations like the following :
    
    lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
    07:39:22         eth0 146877.00 3265554.00   9467.15 4828168.50
    07:39:23         eth0 146587.00 3260329.00   9448.15 4820445.98
    07:39:24         eth0 146894.00 3259989.00   9468.55 4819943.26
    07:39:25         eth0 110368.00 2454497.00   7113.95 3629012.17  <<>>
    07:39:26         eth0 146563.00 3257502.00   9447.25 4816266.23
    07:39:27         eth0 145678.00 3258292.00   9389.79 4817414.39
    07:39:28         eth0 145268.00 3253171.00   9363.85 4809852.46
    07:39:29         eth0 146439.00 3262185.00   9438.97 4823172.48
    07:39:30         eth0 146758.00 3264175.00   9459.94 4826124.13
    07:39:31         eth0 146843.00 3256903.00   9465.44 4815381.97
    Average:         eth0 142827.50 3179259.70   9206.30 4700578.16
    
    This patch allows rx/tx bytes/packets counters to be folded at the
    time we need stats.
    
    We now can fetch stats every 1 ms if we want to check NIC behavior
    on a small time window. It is also easier to detect anomalies.
    
    lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
    07:42:50         eth0 142915.00 3177696.00   9212.06 4698270.42
    07:42:51         eth0 143741.00 3200232.00   9265.15 4731593.02
    07:42:52         eth0 142781.00 3171600.00   9202.92 4689260.16
    07:42:53         eth0 143835.00 3192932.00   9271.80 4720761.39
    07:42:54         eth0 141922.00 3165174.00   9147.64 4679759.21
    07:42:55         eth0 142993.00 3207038.00   9216.78 4741653.05
    07:42:56         eth0 141394.06 3154335.64   9113.85 4663731.73
    07:42:57         eth0 141850.00 3161202.00   9144.48 4673866.07
    07:42:58         eth0 143439.00 3180736.00   9246.05 4702755.35
    07:42:59         eth0 143501.00 3210992.00   9249.99 4747501.84
    Average:         eth0 142835.66 3182165.93   9206.98 4704874.08
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
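
Folding counters on demand, as the commit above describes, can be sketched as summing per-ring counters when stats are requested instead of copying them from a periodic work item. The per-ring layout and the names `ring_stats` and `fold_ring_stats` are assumptions made for illustration, not the mlx4 data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ring counters, updated by the datapath. */
struct ring_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Sum all rings at the moment the caller asks for stats. */
struct ring_stats fold_ring_stats(const struct ring_stats *rings, size_t n)
{
	struct ring_stats total = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		total.packets += rings[i].packets;
		total.bytes += rings[i].bytes;
	}
	return total;
}
```

Because the sum is computed at read time, two reads a millisecond apart see whatever the rings counted in between, rather than a snapshot that may be up to 250 ms old.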

commit 6a30c5a73fdc6c51cae4a833bb21375f26d5ccba
Author: Alexey Brodkin <abrodkin@synopsys.com>
Date:   Tue Nov 22 14:16:36 2016 +0300

    ARC: axs10x: really enable ARC PGU
    
    Up until now ARC PGU was not enabled in axs10x defconfigs, to avoid
    bloating the kernel image with yet more drivers and subsystems.
    
    This change configures ARC PGU (as well as DRM bits it depends on)
    to be built as a module and so those who need LCD screen to work on
    axs10x may bundle built .ko files in their target's file-system with
    help of the following command on host:
    ------------->8-------------
    make INSTALL_MOD_PATH=_path_to_target_fs_ modules_install
    ------------->8-------------
    
    and later on target with commands as simple as:
    ------------->8-------------
    modprobe adv7511.ko
    modprobe arcpgu.ko
    ------------->8-------------
    get LCD working.
    
    Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

commit 98d3e6d4a04f3d1b13a2d547f0d3f2d7f42f4f0e
Author: Vineet Gupta <vgupta@synopsys.com>
Date:   Fri Nov 18 14:19:27 2016 -0800

    ARC: rename Zebu platform support to HAPS
    
    There are more ARC Linux HAPS users than Zebu ones.
    
    The same kernel would work fine on both, even with embedded DT, assuming the
    FPGA bitfile configuration is the same.
    
    Suggested-by: Francois Bedard <fbedard@ynopsys.com>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

commit f3fd0d516943e769e51c521311ab13dd65c7898f
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Nov 22 15:33:44 2016 +0100

    clocksource: nps: avoid maybe-uninitialized warning
    
    We get a harmless false-positive warning with the newly added nps
    clocksource driver:
    
    drivers/clocksource/timer-nps.c: In function 'nps_setup_clocksource':
    drivers/clocksource/timer-nps.c:102:6: error: 'nps_timer1_freq' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    
    Gcc here fails to identify that IS_ERR() is only true if PTR_ERR()
    has a nonzero value. Using PTR_ERR_OR_ZERO() to convert the result
    first makes this obvious and shuts up the warning.
    
    Fixes: 0ee4d9922df5 ("clocksource: Add clockevent support to NPS400 driver")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
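
The pattern named in the commit above can be sketched in user space. The helpers below imitate the kernel's `IS_ERR()`, `PTR_ERR()` and `PTR_ERR_OR_ZERO()` for illustration only, and `get_clk` and `setup` are hypothetical stand-ins, not the nps driver's functions.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* User-space imitations of the kernel's err.h helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static inline int PTR_ERR_OR_ZERO(const void *p)
{
	return IS_ERR(p) ? (int)PTR_ERR(p) : 0;
}

/* Hypothetical lookup: returns a valid pointer or an error pointer. */
static void *get_clk(int fail, unsigned long *store)
{
	if (fail)
		return ERR_PTR(-ENODEV);
	*store = 1000000;
	return store;
}

/* Converting to an int first makes the "ret == 0 means valid" path explicit. */
int setup(int fail, unsigned long *freq)
{
	unsigned long store;
	void *clk = get_clk(fail, &store);
	int ret = PTR_ERR_OR_ZERO(clk);

	if (ret)
		return ret;
	*freq = *(unsigned long *)clk;
	return 0;
}
```

The caller then only reads `*freq` when `setup()` returned 0, which is the relationship the commit says gcc could not infer from `IS_ERR()` alone.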

commit ab321058267b4df1cdd515fffa3ef380ed28ecd2
Author: Noam Camus <noamca@mellanox.com>
Date:   Thu Nov 17 09:12:43 2016 +0200

    clocksource: Add clockevent support to NPS400 driver
    
    Until now we used the clockevent from the generic ARC driver.
    This was enough as long as we worked with a simple multicore SoC.
    On a multithreaded SoC, each HW thread can be scheduled to receive
    the timer interrupt using the timer mask register.
    This patch provides a way to control clock events per HW thread.
    
    The design idea is that each core has a dedicated register
    (TSI) serving all 16 HW threads.
    The register is a bitmask with one bit for each HW thread.
    When a HW thread wants the next timer expiration to interrupt it,
    it sets its bit in this dedicated register.
    When the timer expires, all HW threads within the core whose bit
    is set in the TSI register are interrupted.
    
    Driver can be used from device tree by:
    compatible = "ezchip,nps400-timer0" <-- for clocksource
    compatible = "ezchip,nps400-timer1" <-- for clockevent
    
    Note that the naming convention for timer0/timer1 was taken from the
    legacy ARC design. This design is our base before adding HW threads.
    For backward compatibility we keep "ezchip,nps400-timer" for clocksource.
    
    Signed-off-by: Noam Camus <noamca@mellanox.com>
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Acked-by: Rob Herring <robh@kernel.org>
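
The per-core TSI bitmask described in the commit above can be modelled as a 16-bit mask with one bit per HW thread. This is a plain-memory sketch under that description: `struct core_timer` and the `tsi_*` helpers are hypothetical names, and a real driver would access a hardware register rather than a struct field.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_THREADS_PER_CORE 16u	/* from the commit text */

/* Stand-in for one core's TSI register. */
struct core_timer {
	uint16_t tsi;
};

/* A HW thread asks to be interrupted on the next timer expiration. */
void tsi_request(struct core_timer *c, unsigned int thread)
{
	if (thread < SKETCH_THREADS_PER_CORE)
		c->tsi |= (uint16_t)(1u << thread);
}

/* A HW thread withdraws its request. */
void tsi_cancel(struct core_timer *c, unsigned int thread)
{
	if (thread < SKETCH_THREADS_PER_CORE)
		c->tsi &= (uint16_t)~(1u << thread);
}

/* Would this HW thread be interrupted when the timer expires? */
bool tsi_will_interrupt(const struct core_timer *c, unsigned int thread)
{
	return thread < SKETCH_THREADS_PER_CORE &&
	       (c->tsi & (1u << thread)) != 0;
}
```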

commit a08a8e2893015a7fc25b14d75c9fc7a222ae7b38
Author: Noam Camus <noamca@mellanox.com>
Date:   Wed Nov 16 08:31:12 2016 +0200

    clocksource: update "fn" at CLOCKSOURCE_OF_DECLARE() of nps400 timer
    
    nps_setup_clocksource() should take node as only argument as defined by
    typedef int (*of_init_fn_1_ret)(struct device_node *)
    
    Therefore need to replace:
    int __init nps_setup_clocksource(struct device_node *node, struct clk *clk)
    with
    int __init nps_setup_clocksource(struct device_node *node)
    
    This patch also serves as preparation for the next patch, which adds
    support for clockevents to nps400.
    Specifically, we add a new function, nps_get_timer_clk(), to serve
    clocksource and later clockevent registration.
    
    Signed-off-by: Noam Camus <noamca@mellanox.com>
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

commit 6e022b60285a6fbf8229e1ec268f2d8f55d35477
Author: Noam Camus <noamca@mellanox.com>
Date:   Wed Nov 16 08:31:11 2016 +0200

    soc: Support for NPS HW scheduling
    
    This new header file is for NPS400 SoC (part of ARC architecture).
    The header file includes macros for save/restore of HW scheduling.
    The control of HW scheduling is achieved by writing core registers.
    This code was moved from arc/plat-eznps so it can be used
    from drivers/clocksource/, available only for CONFIG_EZNPS_MTM_EXT.
    
    Signed-off-by: Noam Camus <noamca@mellanox.com>
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

commit b8531ea8d22f5678a2a6b9cce7ad0ce02d2c4482
Author: Vineet Gupta <vgupta@synopsys.com>
Date:   Mon Oct 31 13:46:38 2016 -0700

    clocksource: import ARC timer driver
    
    This adds support for
    
     - CONFIG_ARC_TIMERS : legacy 32-bit TIMER0 and TIMER1 which count UP
       from @CNT to @LIMIT, before optionally triggering an interrupt.
       These are programmed using ARC auxiliary register interface.
       These are present in all ARC cores (ARC700 and ARC HS38)
       TIMER0 serves as clockevent for all ARC linux builds.
       TIMER1 is used for clocksource in arc700 builds.
    
     - CONFIG_ARC_TIMERS_64BIT: 64-bit counters, RTC and GFRC found in
       ARC HS38 cores. These are independent IP blocks, each with its own
       programming model.
    
    Link: http://lkml.kernel.org/r/20161111231132.GA4186@mai
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

commit f201a7e1cf0592aa06931e1624d5e63c8116e765
Author: Vineet Gupta <vgupta@synopsys.com>
Date:   Mon Oct 31 13:06:19 2016 -0700

    ARC: breakout timer include code into separate header ...
    
    ... which allows for use in drivers/clocksource later
    
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

commit 43c7919a4133d80845573a9b8047e8bf10ea0eca
Author: Vineet Gupta <vgupta@synopsys.com>
Date:   Mon Oct 31 11:27:08 2016 -0700

    ARC: move mcip.h into include/soc and adjust the includes
    
    Also remove the dependency on ARCv2, to increase compile coverage for
    !ARCV2 builds
    
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

commit 6387055728a2a5a105433d8c7dd2e0fc17d999e2
Author: Vineet Gupta <vgupta@synopsys.com>
Date:   Mon Oct 31 11:09:34 2016 -0700

    ARC: breakout aux handling into a separate header
    
    ARC timers use aux registers for programming and this paves way for
    moving ARC timer drivers into drivers/clocksource
    
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

commit 21857529f5fe5081b74cb0da4785fa64ea332995
Author: Vineet Gupta <vgupta@synopsys.com>
Date:   Mon Oct 31 13:26:25 2016 -0700

    ARC: time: move time_init() out of the driver
    
    to allow future git mv of the driver into drivers/clocksource
    
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

commit 8a5b823f98ad30bb334d25c8123e8d5766c4fbf1
Author: Vineet Gupta <vgupta@synopsys.com>
Date:   Mon Oct 31 14:26:41 2016 -0700

    ARC: timer: gfrc, rtc: build under same option (64-bit timers)
    
    The original distinction was done as they were developed at different
    times and primarily because they are specific to UP (RTC) and SMP (GFRC).
    
    But given that driver handles that at runtime, (i.e. not allowing
    RTC as clocksource in SMP), we can simplify things a bit.
    
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

commit d0f76f838d74ffbe0f206d1fb3d6f0b01b372353
Author: Vineet Gupta <vgupta@synopsys.com>
Date:   Mon Oct 31 13:02:31 2016 -0700

    ARC: timer: gfrc, rtc: Read BCR to detect whether hardware exists ...
    
    ... don't rely on cpuinfo populated in arc boot code. This paves way for
    moving this code in drivers/clocksource/
    
    And while at it, convert the WARN() to pr_warn() as suggested by Daniel
    
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

commit 0081d7bcd0a7663d0240befdb634b65bba04c13f
Author: Vineet Gupta <vgupta@synopsys.com>
Date:   Thu Nov 3 11:38:52 2016 -0700

    ARC: timer: gfrc, rtc: deuglify big endian code
    
    A standard "C" shift will be handled appropriately by the compiler
    depending on the endian for the build. So we don't need the
    explicit distinction in code
    
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
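
The point of the commit above is that combining two 32-bit halves with a shift operates on values, not on memory layout, so the same expression works for either byte order. A minimal sketch, with `combine_halves` as a hypothetical helper name:

```c
#include <stdint.h>

/* Build a 64-bit counter value from its high and low 32-bit halves. */
uint64_t combine_halves(uint32_t hi, uint32_t lo)
{
	/* Widen before shifting so the high half is not lost. */
	return ((uint64_t)hi << 32) | lo;
}
```

No `#ifdef` on endianness is needed here, because the compiler generates the appropriate code for the target.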

commit 76fb051d42945d142fe265b6ec79e06aa9cfb250
Author: Russell King <rmk+kernel@armlinux.org.uk>
Date:   Mon Nov 21 16:07:05 2016 +0000

    ARM: mm: allow set_memory_*() to be used on the vmalloc region
    
    We can allow modules to be loaded into the vmalloc region, where they
    should also benefit from the same protections as those loaded into
    the more efficient module region.  Allow these functions to operate
    there as well.
    
    Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

commit 580218f9678e76f712a1cf6cff5a903917fa9558
Author: Russell King <rmk+kernel@armlinux.org.uk>
Date:   Mon Nov 21 16:02:08 2016 +0000

    ARM: mm: fix set_memory_*() bounds checks
    
    The set_memory_*() bounds checks are buggy on several fronts:
    
    1. They fail to round the region size up if the passed address is not
       page aligned.
    2. The region check was incomplete, and didn't correspond with what
       was being asked of apply_to_page_range()
    
    So, rework change_memory_common() to fix these problems, adding an
    "in_region()" helper to determine whether the start & size fit within
    the provided region start and stop addresses.
    
    Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
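
The two fixes listed in the commit above (rounding to whole pages and a complete region check) can be sketched like this. It is an illustration under assumptions, not the ARM code: the 4 KB page size, the `SKETCH_` macros and `range_ok` are hypothetical, and only the name `in_region` comes from the commit text.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KB pages */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* True if [start, start + size) lies inside [region_start, region_end). */
bool in_region(unsigned long start, unsigned long size,
	       unsigned long region_start, unsigned long region_end)
{
	return start >= region_start && start < region_end &&
	       size <= region_end - start;
}

/* Expand (addr, numpages) to whole pages, then check the expanded range. */
bool range_ok(unsigned long addr, unsigned long numpages,
	      unsigned long region_start, unsigned long region_end)
{
	unsigned long start = addr & SKETCH_PAGE_MASK;
	unsigned long end = ((addr + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK) +
			    numpages * SKETCH_PAGE_SIZE;

	return in_region(start, end - start, region_start, region_end);
}
```

With an unaligned `addr`, the checked range grows to cover the partial first page, so a request that spills past the end of the region is rejected instead of slipping through.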

commit d3df9bc5fb5d838b049f32a476721eadbc349553
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Nov 29 05:49:06 2016 -0800

    rcu: Once again use NMI-based stack traces in stall warnings
    
    This commit is for all intents and purposes a revert of bc1dce514e9b
    ("rcu: Don't use NMIs to dump other CPUs' stacks").  The reason to suppose
    that this can now safely be reverted is the presence of 42a0bb3f7138
    ("printk/nmi: generic solution for safe printk in NMI"), which is said
    to have made NMI-based stack dumps safe.
    
    However, this reversion keeps one nice property of bc1dce514e9b
    ("rcu: Don't use NMIs to dump other CPUs' stacks"), namely that
    only those CPUs blocking the grace period are dumped.  The new
    trigger_single_cpu_backtrace() is used to make this happen, as
    suggested by Josh Poimboeuf.
    
    Reported-by: Vince Weaver <vincent.weaver@maine.edu>
    Not-yet-signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

commit f8045446ca778333e960dcb9e30a5858ff2b8c20
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Nov 28 12:08:49 2016 -0800

    srcu: Force full grace-period ordering
    
    If a process invokes synchronize_srcu(), is delayed just the right amount
    of time, and thus does not sleep when waiting for the grace period to
    complete, there is no ordering between the end of the grace period and
    the code following the synchronize_srcu().  Similarly, there can be a
    lack of ordering between the end of the SRCU grace period and callback
    invocation.
    
    This commit adds the necessary ordering.
    
    Reported-by: Lance Roy <ldr709@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

commit 6a8b2ca702b279bea0e8f0363056439352e2081c
Author: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Date:   Mon Nov 28 07:07:17 2016 +0300

    ARC: mm: PAE40: Fix crash at munmap
    
    commit 1c3c90930392 broke PAE40. Macro pfn_pte(pfn, prot) creates paddr
    from pfn, but the page shift was getting truncated to 32 bits since we lost
    the proper cast to 64 bits (for PAE40).
    
    Instead of reverting that commit, use a better helper which is 32/64 bits
    safe just like ARM implementation.
    
    Fixes: 1c3c90930392 ("ARC: mm: fix build breakage with STRICT_MM_TYPECHECKS")
    Cc: <stable@vger.kernel.org>   #4.4+
    Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
    [vgupta: massaged changelog]
    Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
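
The truncation described in the commit above can be reproduced in isolation: shifting a 32-bit page frame number in 32-bit arithmetic drops the address bits above bit 31. The sketch assumes a 12-bit page shift purely for illustration, and both function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption for illustration */

/* Shift performed in 32 bits: high physical address bits are lost. */
uint64_t pfn_to_paddr_truncated(uint32_t pfn)
{
	return (uint32_t)(pfn << SKETCH_PAGE_SHIFT);
}

/* Widen to 64 bits first, then shift: safe for 40-bit physical addresses. */
uint64_t pfn_to_paddr(uint32_t pfn)
{
	return (uint64_t)pfn << SKETCH_PAGE_SHIFT;
}
```

For a page frame at the 4 GB boundary (`pfn = 0x100000`), the truncated variant yields 0 while the widened one yields `0x100000000`.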

commit 2349b533167315199b00b15db891ddc45b2c909d
Author: subhashj@codeaurora.org <subhashj@codeaurora.org>
Date:   Wed Nov 23 16:33:19 2016 -0800

    scsi: ufs: fix default power mode to FAST/SLOW
    
    We would by default like to run in FAST/SLOW mode instead
    of FASTAUTO/SLOWAUTO mode for performance reasons. This
    change sets the default speed mode to FAST/SLOW mode.
    
    Reviewed-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit 0b257734344aa89b565bd148ff94f29aa873ffa6
Author: subhashj@codeaurora.org <subhashj@codeaurora.org>
Date:   Wed Nov 23 16:33:08 2016 -0800

    scsi: ufs: optimize system suspend handling
    
    Consider the following sequence of events:
    1. UFS is runtime suspended, link_state = Hibern8, device_state = sleep
    2. System goes into system suspend, ufshcd_system_suspend() brings both
       link and device to active state and then puts the device in Power_Down
       state and link in OFF state.
    3. System resumes at some later point in time, ufshcd_system_resume()
       doesn't do anything as UFS state is runtime suspended. Note that link
       is still in OFF state and device is in Power_Down state.
    4. Now system again goes into suspend without any UFS accesses before it.
       ufshcd_system_suspend() again brings both link and device to active
       state and then puts the device in Power_Down state and link in OFF
       state. But it's unnecessary to bring the link & device in active state
       as both link and device are already in desired low power states. This
       change fixes this issue by adding proper state checks in
       ufshcd_system_suspend().
    
    Reviewed-by: Gilad Broner <gbroner@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
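
The state check described in step 4 of the commit above can be sketched as follows. The enums, `struct ufs_pm` and both functions are hypothetical simplifications and do not mirror the ufshcd data structures.

```c
#include <stdbool.h>

enum link_state { LINK_ACTIVE, LINK_HIBERN8, LINK_OFF };
enum dev_state { DEV_ACTIVE, DEV_SLEEP, DEV_POWERDOWN };

/* Hypothetical snapshot of the link and device power states. */
struct ufs_pm {
	enum link_state link;
	enum dev_state dev;
};

/* True when link and device are already in the states suspend would set. */
bool already_suspended(const struct ufs_pm *cur, const struct ufs_pm *target)
{
	return cur->link == target->link && cur->dev == target->dev;
}

/* Returns the number of state transitions performed (0 when skipped). */
int system_suspend(struct ufs_pm *cur, const struct ufs_pm *target)
{
	if (already_suspended(cur, target))
		return 0;	/* nothing to do, avoid a needless wake-up */

	/* Bring link and device to active, then to the target states. */
	cur->link = LINK_ACTIVE;
	cur->dev = DEV_ACTIVE;
	*cur = *target;
	return 2;
}
```

The first suspend from the runtime-suspended state performs the transitions, while a second suspend with no UFS access in between returns early.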

commit f37e9f8cf8bc681250880f8c1372fb882c9379b8
Author: Yaniv Gardi <ygardi@codeaurora.org>
Date:   Wed Nov 23 16:32:49 2016 -0800

    scsi: ufs: fix condition in which DME command failure msg is printed out
    
    The condition under which the error message is printed was incorrect:
    the message was printed only if retries were exhausted.
    But retries happen only if the DME command is a peer command, so
    failures of DME commands that are not peer commands were not printed.
    This change fixes this issue.
    
    Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit fb7b45f0462f144e1924a357995552f24f0c9d0c
Author: Dolev Raviv <draviv@codeaurora.org>
Date:   Wed Nov 23 16:32:32 2016 -0800

    scsi: ufs: handle errors from PHY_ADAPTER_ERROR register
    
    The PHY_ADAPTER_ERROR status register indicates PHY lane errors
    reported by the M-PHY layer. On some occasions the controller
    can recover from such errors. When the error is not recoverable,
    a stuck DB error will occur. Since the stuck DB error is spotted
    separately, no action other than clearing the register is necessary.
    
    Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit 7caf489b99a42a9017ef3d733912aea8794677e7
Author: subhashj@codeaurora.org <subhashj@codeaurora.org>
Date:   Wed Nov 23 16:32:20 2016 -0800

    scsi: ufs: issue link startup 2 times if device isn't active
    
    If we issue the link startup to the device while its UniPro state is
    LinkDown (and device state is sleep/power-down) then link startup
    will not move the device state to Active. Device will only move to
    active state if the link startup is issued when its UniPro state is
    LinkUp. So in this case, we would have to issue the link startup 2
    times to make sure that device moves to active state.
    
    Reviewed-by: Gilad Broner <gbroner@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit c6a6db439868c7ba5cc90d4c461d9697ec731fa1
Author: subhashj@codeaurora.org <subhashj@codeaurora.org>
Date:   Wed Nov 23 16:32:08 2016 -0800

    scsi: ufs: ensure that host pa_tactivate is higher than device
    
    Some UFS devices require host PA_TACTIVATE to be higher than
    device PA_TACTIVATE otherwise it may get stuck during hibern8 sequence.
    This change allows this by using a quirk.
    
    Reviewed-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit 10fe5888a40e6afbf1f3166c212c685624cae26b
Author: subhashj@codeaurora.org <subhashj@codeaurora.org>
Date:   Wed Nov 23 16:31:52 2016 -0800

    scsi: ufs: increase the scsi query response timeout
    
    It is found that a UFS device may take longer than 30ms to respond to
    query requests, and in this case we might run into the following scenario:
    
    1. UFS host SW sends a query request to UFS device to read an attribute
       value. SW uses tag #31 for this purpose.
    2. UFS host SW waits for 30ms to get the query response (and doorbell
       to be cleared by UFS host HW).
    3. UFS device doesn't respond back within 30ms hence UFS host SW times
       out waiting for the query response.
    4. UFS host SW clears the tag#31 from UTRLCLR register.
    5. UFS host SW waits for UFS host HW to clear tag#31 from the doorbell
       register.
    6. UFS host SW retries the same query request on same tag#31 (sends a query
       request to device to read an attribute value).
    7. UFS host HW gets the query response from the device but this was
       intended as a query response for the 1st query request sent (step-1).
    8. Now UFS device sends another query response to host (for query request
       sent @step-6).
    
    Now there are 2 issues that could happen with the above scenario:
    1. UFS device should have actually responded back with only one query
       response but it is found that device may respond back with 2 query
       responses.
    2. If UFS device responds back with 2 responses on same tag, host HW/SW
       behaviour isn't predictable.
    
    To avoid running into the above scenario, we allow the device
    to take longer (up to 1.5 seconds) for the query response.
    
    Reviewed-by: Gilad Broner <gbroner@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit bde44bb665d049468b6a1a2fa7d666434de4f83f
Author: subhashj@codeaurora.org <subhashj@codeaurora.org>
Date:   Wed Nov 23 16:31:41 2016 -0800

    scsi: ufs: fix failure to read the string descriptor
    
    While reading variable size descriptors (like string descriptor), some UFS
    devices may report the "LENGTH" (field in "Transaction Specific fields" of
    Query Response UPIU) same as what was requested in Query Request UPIU
    instead of reporting the actual size of the variable size descriptor.
    It's safe to ignore the "LENGTH" field for variable size
    descriptors as we can always derive the length of the descriptor from
    the descriptor header fields. Hence this change imposes the length match
    check only for fixed size descriptors (for which we always request the
    correct size as part of Query Request UPIU).
    
    Reviewed-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit 24d6243204633be4b710754f279b3ca57c69ceec
Author: Yaniv Gardi <ygardi@codeaurora.org>
Date:   Wed Nov 23 16:31:30 2016 -0800

    scsi: ufs: update device descriptor maximum size
    
    According to JESD220B - UFS v2.0, the maximum size of device descriptor
    has changed from 0x1F to 0x40. This patch updates the maximum size of
    this descriptor.
    
    Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit 4b761b580150b081e6829885afd3d540b746ccf0
Author: Yaniv Gardi <ygardi@codeaurora.org>
Date:   Wed Nov 23 16:31:18 2016 -0800

    scsi: ufs: add index details to query error messages
    
    When sending a query to the device, the index of the failed query
    is useful additional information that should be printed out as it
    might specify the logical unit (LU) where the error occurred.
    
    Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

commit 61e073590b82a539654626ecae91b8fab11db3f3
Author: Dolev Raviv <draviv@codeaurora.org>
Date:   Wed Nov 23 16:30:49 2016 -0800

    scsi: ufs: add queries retry mechanism
    
    Some of the queries might fail during init. To avoid
    system failure, we add a retry mechanism that issues queries
    several times.
    
    Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
    Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
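
The retry idea described in the commit above can be sketched roughly as follows. This is a standalone userspace model and not the driver's code: `query_fn`, `issue_query_with_retry` and `flaky_query` are hypothetical names, and the number of attempts is left to the caller because the commit message does not state it.

```c
#include <assert.h>

/* Hypothetical query callback: returns 0 on success, negative on failure. */
typedef int (*query_fn)(void *ctx);

/*
 * Issue a query up to max_attempts times, stopping at the first success.
 * Returns 0 on success, or the last error if every attempt failed.
 */
static int issue_query_with_retry(query_fn fn, void *ctx, int max_attempts)
{
	int err = -1;

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		err = fn(ctx);
		if (err == 0)
			break;
	}
	return err;
}

/* Test double: fails until the counter in ctx reaches zero. */
static int flaky_query(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -5; /* arbitrary error code used for illustration */
	}
	return 0;
}
```

A query that fails twice and then succeeds is absorbed by a budget of three attempts, while a query that keeps failing still surfaces its last error to the caller.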

commit 95d3af6bd18f381b5b1c62f117ce7f152a5db3ea
Author: Yazen Ghannam <Yazen.Ghannam@amd.com>
Date:   Thu Nov 17 17:57:43 2016 -0500

    EDAC, amd64: Autoload amd64_edac_mod on Fam17h systems
    
    Add Fam17h to the list of families to autoload amd64_edac_mod.
    
    Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
    Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Cc: x86-ml <x86@kernel.org>
    Link: http://lkml.kernel.org/r/1479423463-8536-18-git-send-email-Yazen.Ghannam@amd.com
    Signed-off-by: Borislav Petkov <bp@suse.de>

commit 713ad54675fdfd7358dbcae21ab4788a014c6e23
Author: Yazen Ghannam <Yazen.Ghannam@amd.com>
Date:   Mon Nov 28 12:59:53 2016 -0600

    EDAC, amd64: Define and register UMC error decode function
    
    How we need to decode UMC errors is different from how we decode bus
    errors, so let's define a new function for this. We also need a way to
    determine the UMC channel since we're not guaranteed that there is a
    fixed relation between channel and MCA bank.
    
    Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
    Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Cc: x86-ml <x86@kernel.org>
    Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com
    [ Fold in decode_synd_reg(), simplify. ]
    Signed-off-by: Borislav Petkov <bp@suse.de>

commit d27f3a348e3677b7d5ee6954ebafce679b011164
Author: Yazen Ghannam <Yaz…
pcmoore added a commit that referenced this issue Dec 14, 2016
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * #23
 * #22

Signed-off-by: Paul Moore <paul@paul-moore.com>
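
As a rough illustration of the rework described above, the sketch below models the idea in plain userspace C: the logging path only queues a record once, and a single worker drains the queue and performs both the unicast and the multicast send. All names here (`struct record`, `log_end`, `worker_drain`, `struct sinks`) are hypothetical stand-ins and do not come from kernel/audit.c; the real code deals with skbs, netlink and a kernel thread.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Hypothetical stand-in for a queued audit record (an skb in the kernel). */
struct record {
	int id;
};

struct queue {
	struct record items[QUEUE_CAP];
	size_t head;   /* index of the oldest queued record */
	size_t count;  /* number of queued records */
};

/* Counters standing in for the two netlink send paths. */
struct sinks {
	int unicast_sent;
	int multicast_sent;
};

/* Producer side: only queue the record, never send from the caller's context. */
static int log_end(struct queue *q, struct record r)
{
	if (q->count == QUEUE_CAP)
		return -1; /* queue full; the caller decides what to do */
	q->items[(q->head + q->count) % QUEUE_CAP] = r;
	q->count++;
	return 0;
}

/* Consumer side: the worker drains the queue and performs both sends. */
static int worker_drain(struct queue *q, struct sinks *s)
{
	int handled = 0;

	while (q->count > 0) {
		struct record r = q->items[q->head];

		q->head = (q->head + 1) % QUEUE_CAP;
		q->count--;
		(void)r;              /* payload is not inspected in this model */
		s->unicast_sent++;    /* send to the audit daemon */
		s->multicast_sent++;  /* send to multicast listeners */
		handled++;
	}
	return handled;
}
```

The point of the design is that each record is queued exactly once and nothing is sent from the context that generated it; both delivery paths run later from the worker.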
pcmoore added a commit that referenced this issue Dec 14, 2016
Bring back commit bc51ddd ("netns: avoid disabling irq for netns
id") now that we've fixed some audit multicast issues that caused
problems with original attempt.  Additional information, and history,
can be found in the links below:

 * #22
 * #23

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
@pcmoore
Member Author

pcmoore commented Mar 23, 2017

Resolved in Linux v4.10.

@pcmoore pcmoore closed this as completed Mar 23, 2017
pcmoore pushed a commit that referenced this issue Mar 27, 2017
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found here:

static bool tcp_in_quickack_mode(struct sock *sk)
{
	const struct inet_connection_sock *icsk = inet_csk(sk);
	const struct dst_entry *dst = __sk_dst_get(sk);

	return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
		(icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
}

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry is added for each such host, which
creates more dst_entries with lower reference counts and makes the race more
probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the reference count of the same dst_entry cached in
sk->sk_dst_cache is decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

This leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fix it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
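
The shape of the fix described above, skipping the redirect while user space owns the socket, can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions: `struct sock_model`, `do_redirect_model` and `handle_icmp_redirect` are hypothetical names, the reference count is a plain integer, and no real locking is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified socket: only the fields this model needs. */
struct sock_model {
	bool owned_by_user;   /* true while user space holds the socket lock */
	int dst_refcount;     /* references held on the cached route */
	int redirects_done;   /* how many redirects were applied */
};

/* Stand-in for the redirect handling that may release the cached route. */
static void do_redirect_model(struct sock_model *sk)
{
	sk->dst_refcount--;
	sk->redirects_done++;
}

/*
 * ICMP redirect handler: apply the redirect only when user space does not
 * own the socket. Returns true if the redirect was applied.
 */
static bool handle_icmp_redirect(struct sock_model *sk)
{
	if (sk->owned_by_user)
		return false; /* skipped; a later redirect can be handled instead */
	do_redirect_model(sk);
	return true;
}
```

With the ownership check in place, the cached route's reference count is never touched from the error handler while another context may be working on the same socket.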
pcmoore pushed a commit that referenced this issue Sep 5, 2017
syzkaller reported a double free [1], caused by the fact
that tun driver was not updated properly when priv_destructor
was added.

When/if register_netdevice() fails, priv_destructor() must have been
called already.

[1]
BUG: KASAN: double-free or invalid-free in selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023

CPU: 0 PID: 2919 Comm: syzkaller227220 Not tainted 4.13.0-rc4+ #23
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x7f/0x260 mm/kasan/report.c:252
 kasan_report_double_free+0x55/0x80 mm/kasan/report.c:333
 kasan_slab_free+0xa0/0xc0 mm/kasan/kasan.c:514
 __cache_free mm/slab.c:3503 [inline]
 kfree+0xd3/0x260 mm/slab.c:3820
 selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
 security_tun_dev_free_security+0x48/0x80 security/security.c:1512
 tun_set_iff drivers/net/tun.c:1884 [inline]
 __tun_chr_ioctl+0x2ce6/0x3d50 drivers/net/tun.c:2064
 tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
 vfs_ioctl fs/ioctl.c:45 [inline]
 do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
 SYSC_ioctl fs/ioctl.c:700 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x443ff9
RSP: 002b:00007ffc34271f68 EFLAGS: 00000217 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000443ff9
RDX: 0000000020533000 RSI: 00000000400454ca RDI: 0000000000000003
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000401ce0
R13: 0000000000401d70 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 2919:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:551
 kmem_cache_alloc_trace+0x101/0x6f0 mm/slab.c:3627
 kmalloc include/linux/slab.h:493 [inline]
 kzalloc include/linux/slab.h:666 [inline]
 selinux_tun_dev_alloc_security+0x49/0x170 security/selinux/hooks.c:5012
 security_tun_dev_alloc_security+0x6d/0xa0 security/security.c:1506
 tun_set_iff drivers/net/tun.c:1839 [inline]
 __tun_chr_ioctl+0x1730/0x3d50 drivers/net/tun.c:2064
 tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
 vfs_ioctl fs/ioctl.c:45 [inline]
 do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
 SYSC_ioctl fs/ioctl.c:700 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 2919:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x6e/0xc0 mm/kasan/kasan.c:524
 __cache_free mm/slab.c:3503 [inline]
 kfree+0xd3/0x260 mm/slab.c:3820
 selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
 security_tun_dev_free_security+0x48/0x80 security/security.c:1512
 tun_free_netdev+0x13b/0x1b0 drivers/net/tun.c:1563
 register_netdevice+0x8d0/0xee0 net/core/dev.c:7605
 tun_set_iff drivers/net/tun.c:1859 [inline]
 __tun_chr_ioctl+0x1caf/0x3d50 drivers/net/tun.c:2064
 tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
 vfs_ioctl fs/ioctl.c:45 [inline]
 do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
 SYSC_ioctl fs/ioctl.c:700 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
 entry_SYSCALL_64_fastpath+0x1f/0xbe

The buggy address belongs to the object at ffff8801d2843b40
 which belongs to the cache kmalloc-32 of size 32
The buggy address is located 0 bytes inside of
 32-byte region [ffff8801d2843b40, ffff8801d2843b60)
The buggy address belongs to the page:
page:ffffea000660cea8 count:1 mapcount:0 mapping:ffff8801d2843000 index:0xffff8801d2843fc1
flags: 0x200000000000100(slab)
raw: 0200000000000100 ffff8801d2843000 ffff8801d2843fc1 000000010000003f
raw: ffffea0006626a40 ffffea00066141a0 ffff8801dbc00100
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801d2843a00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
 ffff8801d2843a80: 00 00 00 fc fc fc fc fc fb fb fb fb fc fc fc fc
>ffff8801d2843b00: 00 00 00 00 fc fc fc fc fb fb fb fb fc fc fc fc
                                           ^
 ffff8801d2843b80: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
 ffff8801d2843c00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc

==================================================================

Fixes: cf124db ("net: Fix inconsistent teardown and release of private netdev state.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kdrag0n pushed a commit to kdrag0n/proton_bluecross that referenced this issue Dec 24, 2018
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * linux-audit/audit-kernel#23
 * linux-audit/audit-kernel#22

Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Eliminater74 pushed a commit to DevOnePlus/nebula_kernel_sdm845_rev3 that referenced this issue Feb 9, 2019
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * linux-audit/audit-kernel#23
 * linux-audit/audit-kernel#22

Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Signed-off-by: Eliminater74 <eliminater74@gmail.com>
mawrick26 pushed a commit to mawrick26/SDM845_P_9.0 that referenced this issue Feb 21, 2019
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * linux-audit/audit-kernel#23
 * linux-audit/audit-kernel#22

Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Signed-off-by: Eliminater74 <eliminater74@gmail.com>
mawrick26 pushed a commit to mawrick26/SDM845_P_9.0 that referenced this issue Feb 25, 2019
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * linux-audit/audit-kernel#23
 * linux-audit/audit-kernel#22

Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Signed-off-by: Eliminater74 <eliminater74@gmail.com>
mawrick26 pushed a commit to mawrick26/SDM845_P_9.0 that referenced this issue Mar 23, 2019
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * linux-audit/audit-kernel#23
 * linux-audit/audit-kernel#22

Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Signed-off-by: Eliminater74 <eliminater74@gmail.com>
pcmoore pushed a commit that referenced this issue Jan 27, 2020
Before commit 4bfc0bb ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
cgroup bpf structures were released with
corresponding cgroup structures. It guaranteed the hierarchical order
of destruction: children were always first. It preserved attached
programs from being released before their propagated copies.

But with cgroup auto-detachment there are no such guarantees anymore:
cgroup bpf is released as soon as the cgroup is offline and there are
no live associated sockets. It means that an attached program can be
detached and released, while its propagated copy is still living
in the cgroup subtree. This will obviously lead to a use-after-free
bug.

To reproduce the issue the following script can be used:

  #!/bin/bash

  CGROOT=/sys/fs/cgroup

  mkdir -p ${CGROOT}/A ${CGROOT}/B ${CGROOT}/A/C
  sleep 1

  ./test_cgrp2_attach ${CGROOT}/A egress &
  A_PID=$!
  ./test_cgrp2_attach ${CGROOT}/B egress &
  B_PID=$!

  echo $$ > ${CGROOT}/A/C/cgroup.procs
  iperf -s &
  S_PID=$!
  iperf -c localhost -t 100 &
  C_PID=$!

  sleep 1

  echo $$ > ${CGROOT}/B/cgroup.procs
  echo ${S_PID} > ${CGROOT}/B/cgroup.procs
  echo ${C_PID} > ${CGROOT}/B/cgroup.procs

  sleep 1

  rmdir ${CGROOT}/A/C
  rmdir ${CGROOT}/A

  sleep 1

  kill -9 ${S_PID} ${C_PID} ${A_PID} ${B_PID}

On the unpatched kernel the following stacktrace can be obtained:

[   33.619799] BUG: unable to handle page fault for address: ffffbdb4801ab002
[   33.620677] #PF: supervisor read access in kernel mode
[   33.621293] #PF: error_code(0x0000) - not-present page
[   33.622754] Oops: 0000 [#1] SMP NOPTI
[   33.623202] CPU: 0 PID: 601 Comm: iperf Not tainted 5.5.0-rc2+ #23
[   33.625545] RIP: 0010:__cgroup_bpf_run_filter_skb+0x29f/0x3d0
[   33.635809] Call Trace:
[   33.636118]  ? __cgroup_bpf_run_filter_skb+0x2bf/0x3d0
[   33.636728]  ? __switch_to_asm+0x40/0x70
[   33.637196]  ip_finish_output+0x68/0xa0
[   33.637654]  ip_output+0x76/0xf0
[   33.638046]  ? __ip_finish_output+0x1c0/0x1c0
[   33.638576]  __ip_queue_xmit+0x157/0x410
[   33.639049]  __tcp_transmit_skb+0x535/0xaf0
[   33.639557]  tcp_write_xmit+0x378/0x1190
[   33.640049]  ? _copy_from_iter_full+0x8d/0x260
[   33.640592]  tcp_sendmsg_locked+0x2a2/0xdc0
[   33.641098]  ? sock_has_perm+0x10/0xa0
[   33.641574]  tcp_sendmsg+0x28/0x40
[   33.641985]  sock_sendmsg+0x57/0x60
[   33.642411]  sock_write_iter+0x97/0x100
[   33.642876]  new_sync_write+0x1b6/0x1d0
[   33.643339]  vfs_write+0xb6/0x1a0
[   33.643752]  ksys_write+0xa7/0xe0
[   33.644156]  do_syscall_64+0x5b/0x1b0
[   33.644605]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by grabbing a reference to the bpf structure of each ancestor
on the initialization of the cgroup bpf structure, and dropping the
reference at the end of releasing the cgroup bpf structure.

This will restore the hierarchical order of cgroup bpf releasing,
without adding any operations on hot paths.

Thanks to Josef Bacik for the debugging and the initial analysis of
the problem.

Fixes: 4bfc0bb ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
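
The ordering guarantee described above (a child pins every ancestor at initialization and unpins them at the end of its own release) can be illustrated with a small userspace model. The names `struct node`, `node_init`, `node_put` and `node_release` are hypothetical; the kernel uses its own reference-counting primitives and frees memory, which this sketch does not do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node standing in for a cgroup's bpf state. */
struct node {
	struct node *parent;
	int refs;       /* reference count; the node is released at zero */
	int released;   /* set once the node's state has been torn down */
};

static void node_put(struct node *n);

/* Initialize a node and pin every ancestor so it cannot be released first. */
static void node_init(struct node *n, struct node *parent)
{
	n->parent = parent;
	n->refs = 1;
	n->released = 0;
	for (struct node *p = parent; p; p = p->parent)
		p->refs++;
}

/* Tear down a node, then drop the references it held on its ancestors. */
static void node_release(struct node *n)
{
	struct node *p = n->parent;

	n->released = 1;
	while (p) {
		struct node *next = p->parent; /* read before the put may release p */

		node_put(p);
		p = next;
	}
}

/* Drop one reference; release the node when the count reaches zero. */
static void node_put(struct node *n)
{
	assert(n->refs > 0);
	if (--n->refs == 0)
		node_release(n);
}
```

In this model, dropping the parent's own reference while a child is still alive does not release the parent; it is released only after the child has gone, which restores the children-first order.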
laststandrighthere pushed a commit to laststandrighthere/sleepy-vince that referenced this issue Mar 25, 2020
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * linux-audit/audit-kernel#23
 * linux-audit/audit-kernel#22

Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Lau <laststandrighthere@gmail.com>
krazey pushed a commit to krazey/android_kernel_samsung_exynos9810 that referenced this issue Nov 26, 2021
Sending audit netlink multicast messages is bad for all the same
reasons that sending audit netlink unicast messages is bad, so this
patch reworks things so that we don't do the multicast send in
audit_log_end(), we do it from the dedicated kauditd_thread thread just
as we do for unicast messages.

See the GitHub issues below for more information/history:

 * linux-audit/audit-kernel#23
 * linux-audit/audit-kernel#22

Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: krazey <admin@krazey.de>

Conflicts:
kernel/audit.c
pcmoore pushed a commit that referenced this issue Oct 3, 2022
After modifying the QP to the Error state, all RX WRs are completed
with a WC in IB_WC_WR_FLUSH_ERR status. The current implementation does not
wait for this to finish, but destroys the QP and frees the link group directly.
So there is a risk of accessing the freed memory in tasklet context.

Here is a crash example:

 BUG: unable to handle page fault for address: ffffffff8f220860
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060
 Oops: 0002 [#1] SMP PTI
 CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S         OE     5.10.0-0607+ #23
 Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018
 RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0
 Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
 RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086
 RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000
 RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00
 RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b
 R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010
 R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040
 FS:  0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <IRQ>
  _raw_spin_lock_irqsave+0x30/0x40
  mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib]
  smc_wr_rx_tasklet_fn+0x56/0xa0 [smc]
  tasklet_action_common.isra.21+0x66/0x100
  __do_softirq+0xd5/0x29c
  asm_call_irq_on_stack+0x12/0x20
  </IRQ>
  do_softirq_own_stack+0x37/0x40
  irq_exit_rcu+0x9d/0xa0
  sysvec_call_function_single+0x34/0x80
  asm_sysvec_call_function_single+0x12/0x20

Fixes: bd4ad57 ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR")
Signed-off-by: Yacan Liu <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pcmoore pushed a commit that referenced this issue Oct 17, 2022
Fix port I/O string accessors such as `insb', `outsb', etc. which use
the physical PCI port I/O address rather than the corresponding memory
mapping to get at the requested location, which in turn breaks at least
accesses made by our parport driver to a PCIe parallel port such as:

PCI parallel port detected: 1415:c118, I/O at 0x1000(0x1008), IRQ 20
parport0: PC-style at 0x1000 (0x1008), irq 20, using FIFO [PCSPP,TRISTATE,COMPAT,EPP,ECP]

causing a memory access fault:

Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000001008
Oops [#1]
Modules linked in:
CPU: 1 PID: 350 Comm: cat Not tainted 6.0.0-rc2-00283-g10d4879f9ef0-dirty #23
Hardware name: SiFive HiFive Unmatched A00 (DT)
epc : parport_pc_fifo_write_block_pio+0x266/0x416
 ra : parport_pc_fifo_write_block_pio+0xb4/0x416
epc : ffffffff80542c3e ra : ffffffff80542a8c sp : ffffffd88899fc60
 gp : ffffffff80fa2700 tp : ffffffd882b1e900 t0 : ffffffd883d0b000
 t1 : ffffffffff000002 t2 : 4646393043330a38 s0 : ffffffd88899fcf0
 s1 : 0000000000001000 a0 : 0000000000000010 a1 : 0000000000000000
 a2 : ffffffd883d0a010 a3 : 0000000000000023 a4 : 00000000ffff8fbb
 a5 : ffffffd883d0a001 a6 : 0000000100000000 a7 : ffffffc800000000
 s2 : ffffffffff000002 s3 : ffffffff80d28880 s4 : ffffffff80fa1f50
 s5 : 0000000000001008 s6 : 0000000000000008 s7 : ffffffd883d0a000
 s8 : 0004000000000000 s9 : ffffffff80dc1d80 s10: ffffffd8807e4000
 s11: 0000000000000000 t3 : 00000000000000ff t4 : 393044410a303930
 t5 : 0000000000001000 t6 : 0000000000040000
status: 0000000200000120 badaddr: 0000000000001008 cause: 000000000000000f
[<ffffffff80543212>] parport_pc_compat_write_block_pio+0xfe/0x200
[<ffffffff8053bbc0>] parport_write+0x46/0xf8
[<ffffffff8050530e>] lp_write+0x158/0x2d2
[<ffffffff80185716>] vfs_write+0x8e/0x2c2
[<ffffffff80185a74>] ksys_write+0x52/0xc2
[<ffffffff80185af2>] sys_write+0xe/0x16
[<ffffffff80003770>] ret_from_syscall+0x0/0x2
---[ end trace 0000000000000000 ]---

For simplicity address the problem by adding PCI_IOBASE to the physical
address requested in the respective wrapper macros only, observing that
the raw accessors such as `__insb', `__outsb', etc. are not supposed to
be used other than by said macros.  Remove the cast to `long' that is no
longer needed on `addr' now that it is used as an offset from PCI_IOBASE
and add parentheses around `addr' needed for predictable evaluation in
macro expansion.  No need to make said adjustments in separate changes
given that current code is gravely broken and does not ever work.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: fab957c ("RISC-V: Atomic and Locking Code")
Cc: stable@vger.kernel.org # v4.15+
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2209220223080.29493@angie.orcam.me.uk
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
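
The two points made above, translating the port number into an offset from the I/O mapping inside the wrapper macro and parenthesizing `addr`, can be shown with a small userspace model. Everything here is an assumption made for illustration: `io_window`, `IO_BASE_MODEL`, `raw_outsb` and `outsb_model` are hypothetical names, and a byte array stands in for the mapped I/O space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the memory window that port I/O is mapped into. */
static uint8_t io_window[64];
#define IO_BASE_MODEL (io_window)

/* Raw accessor: takes a mapped address, not a port number. */
static void raw_outsb(volatile uint8_t *mapped, const uint8_t *buf, size_t count)
{
	/* String output writes every byte to the same port location. */
	while (count--)
		*mapped = *buf++;
}

/*
 * Wrapper: translates the port number into an offset from the mapping base.
 * The parentheses around addr keep an argument such as "flag ? 8 : 16"
 * intact when the macro is expanded.
 */
#define outsb_model(addr, buf, count) \
	raw_outsb(IO_BASE_MODEL + (addr), (buf), (count))
```

Callers keep passing plain port numbers, and only the wrapper knows about the mapping base, which matches the commit's observation that the raw accessors are not meant to be used directly.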
pcmoore pushed a commit that referenced this issue Sep 11, 2023
Syzbot reported a null-ptr-deref of sqd->thread inside
io_sqpoll_wq_cpu_affinity.  It turns out the sqd->thread can go away
from under us during io_uring_register, in case the process gets a
fatal signal during io_uring_register.

It is not particularly hard to hit the race, and while I am not sure
this is the exact case hit by syzbot, it solves it.  Finally, checking
->thread is enough to close the race because we locked sqd while
"parking" the thread, thus preventing it from going away.

I reproduced it fairly consistently with a program that does:

int main(void) {
  ...
  io_uring_queue_init(RING_LEN, &ring1, IORING_SETUP_SQPOLL);
  while (1) {
    io_uring_register_iowq_aff(ring, 1, &mask);
  }
}

Executed in a loop with timeout to trigger SIGTERM:
  while true; do timeout 1 /a.out ; done

This will hit the following BUG() in very few attempts.

BUG: kernel NULL pointer dereference, address: 00000000000007a8
PGD 800000010e949067 P4D 800000010e949067 PUD 10e46e067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 15715 Comm: dead-sqpoll Not tainted 6.5.0-rc7-next-20230825-g193296236fa0-dirty #23
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:io_sqpoll_wq_cpu_affinity+0x27/0x70
Code: 90 90 90 0f 1f 44 00 00 55 53 48 8b 9f 98 03 00 00 48 85 db 74 4f
48 89 df 48 89 f5 e8 e2 f8 ff ff 48 8b 43 38 48 85 c0 74 22 <48> 8b b8
a8 07 00 00 48 89 ee e8 ba b1 00 00 48 89 df 89 c5 e8 70
RSP: 0018:ffffb04040ea7e70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff93c010749e40 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffffa7653331 RDI: 00000000ffffffff
RBP: ffffb04040ea7eb8 R08: 0000000000000000 R09: c0000000ffffdfff
R10: ffff93c01141b600 R11: ffffb04040ea7d18 R12: ffff93c00ea74840
R13: 0000000000000011 R14: 0000000000000000 R15: ffff93c00ea74800
FS:  00007fb7c276ab80(0000) GS:ffff93c36f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000007a8 CR3: 0000000111634003 CR4: 0000000000370ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die_body+0x1a/0x60
 ? page_fault_oops+0x154/0x440
 ? do_user_addr_fault+0x174/0x7b0
 ? exc_page_fault+0x63/0x140
 ? asm_exc_page_fault+0x22/0x30
 ? io_sqpoll_wq_cpu_affinity+0x27/0x70
 __io_register_iowq_aff+0x2b/0x60
 __io_uring_register+0x614/0xa70
 __x64_sys_io_uring_register+0xaa/0x1a0
 do_syscall_64+0x3a/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fb7c226fec9
Code: 2e 00 b8 ca 00 00 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d 97 7f 2d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2c0674f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb7c226fec9
RDX: 00007ffe2c067530 RSI: 0000000000000011 RDI: 0000000000000003
RBP: 00007ffe2c0675d0 R08: 00007ffe2c067550 R09: 00007ffe2c067550
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe2c067750 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
Modules linked in:
CR2: 00000000000007a8
---[ end trace 0000000000000000 ]---

Reported-by: syzbot+c74fea926a78b8a91042@syzkaller.appspotmail.com
Fixes: ebdfefc ("io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/87v8cybuo6.fsf@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
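
The pattern described above, reading `->thread` only while the thread is parked and treating a missing thread as an error, can be sketched as a userspace model. The names `struct sq_data_model`, `struct task_model`, `park`, `unpark` and `set_worker_cpu` are hypothetical, a boolean flag stands in for the real lock, and the choice of `-EINVAL` as the error value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SQPOLL worker task. */
struct task_model {
	int cpu;
};

/* Hypothetical stand-in for the shared SQPOLL data. */
struct sq_data_model {
	bool locked;                /* models the lock taken while parking */
	struct task_model *thread;  /* NULL once the worker has exited */
};

static void park(struct sq_data_model *sqd)
{
	assert(!sqd->locked);
	sqd->locked = true;
}

static void unpark(struct sq_data_model *sqd)
{
	assert(sqd->locked);
	sqd->locked = false;
}

/*
 * Set the worker's CPU. The thread pointer is only read while parked,
 * and a missing thread is reported as -EINVAL instead of dereferenced.
 */
static int set_worker_cpu(struct sq_data_model *sqd, int cpu)
{
	int ret = -EINVAL;

	park(sqd);
	if (sqd->thread) {
		sqd->thread->cpu = cpu;
		ret = 0;
	}
	unpark(sqd);
	return ret;
}
```

Because the check and the dereference both happen inside the parked section, the pointer cannot change between them in this model.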