kernel changes post-4.7 cause PTRACE_SYSCALL notifications to happen before PTRACE_SECCOMP_EVENT #1762

Closed
rocallahan opened this issue Aug 2, 2016 · 11 comments

Comments

@rocallahan
Collaborator

See torvalds/linux@93e35ef. @pipcet reported this in #1552.

This breaks rr, but not fatally. In fact, it makes rr recording a bit more efficient. Previously, when recording a non-buffered syscall, we'd incur three ptrace-stops: the PTRACE_EVENT_SECCOMP stop, a PTRACE_SYSCALL stop for syscall entry, then a final PTRACE_SYSCALL stop for syscall exit. The second stop is basically redundant with the first, but there was no way to stop at syscall exit without incurring it. With the new event ordering, if we always continue to the next system call with PTRACE_CONT, we'll skip right over the syscall-entry PTRACE_SYSCALL stop and incur just two stops: PTRACE_EVENT_SECCOMP and the PTRACE_SYSCALL stop for syscall exit.

We have to update a fair bit of code to detect and handle this reordering. Syscall-entry code that used to run at the first PTRACE_SYSCALL stop has to be triggered by PTRACE_EVENT_SECCOMP instead. Of course, for many years we'll have to support both behaviors.
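As a rough illustration of the two orderings described above, here is a toy model in C of the stops a tracer would see for one traced, non-buffered syscall. It is only a sketch of the description in this comment, not rr code: the enum names and `stops_for_syscall` are hypothetical, and it assumes the tracer runs to the next syscall with PTRACE_CONT and leaves the seccomp stop with PTRACE_SYSCALL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, used only for this illustration. */
enum stop_kind { STOP_SECCOMP_EVENT, STOP_SYSCALL_ENTRY, STOP_SYSCALL_EXIT };
enum kernel_order { SECCOMP_BEFORE_PTRACE, PTRACE_BEFORE_SECCOMP };

/*
 * Fills `out` (room for 3 entries) with the stops seen for one traced
 * syscall and returns how many there are.
 */
static size_t stops_for_syscall(enum kernel_order order, enum stop_kind out[3])
{
    size_t n = 0;

    assert(out != NULL);
    out[n++] = STOP_SECCOMP_EVENT;
    if (order == SECCOMP_BEFORE_PTRACE) {
        /* Old kernels: the only way to reach the exit stop is through
         * the (redundant) syscall-entry stop. */
        out[n++] = STOP_SYSCALL_ENTRY;
    }
    /* New kernels: the entry stop point has already been passed by the
     * time the seccomp event fires, so the next stop is syscall exit. */
    out[n++] = STOP_SYSCALL_EXIT;
    return n;
}
```

Under this model the old ordering yields three stops and the new ordering two, which is why the syscall-entry handling has to move to the seccomp event on new kernels.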

@rocallahan
Collaborator Author

I've got this mostly working, with 92% of tests passing. I'm stuck on the clock_nanosleep test, which is very simple; the same problem is probably affecting the other failing tests. The main thread does an exit_group while another thread is still running, and sometimes that other thread exits without us getting a PTRACE_EVENT_EXIT for it :-(. The problem is intermittent and does not seem to reproduce if I run rr under ftrace, strace, or gdb, though it usually does happen if I run rr normally.

This could have been a preexisting bug (in older kernels, or rr without my changes), but I don't recall seeing it ever happen before. My changes might somehow cause this, but I can't see how (the logs look very clear about what happens):

[RecordSession] trace time 228: Active task is 32702. Events:
[RecordSession]   32702: handle_ptrace_event PTRACE_EVENT_SECCOMP: event (none)
[RecordSession]   traced syscall entered: nanosleep
[RecordTask]   is syscall interruption of recorded (none)? (now nanosleep)
[RecordSession] EXEC_SYSCALL_ENTRY: status=0x7057f (PTRACE_EVENT_SECCOMP)
[RecordSession] after cont: status=0x7057f (PTRACE_EVENT_SECCOMP)
[RecordSession] EXEC_START: status=0x7057f (PTRACE_EVENT_SECCOMP)
[RecordTask]   is syscall interruption of recorded SYSCALL: nanosleep? (now nanosleep)
[Task] resuming execution of 32702 with PTRACE_SYSCALL
[Scheduler] Scheduling next task
[Scheduler]   32702 is blocked on SYSCALL: nanosleep; checking status ...
[Task] waitpid(32702, NOHANG) returns 0, status 0 (EXIT-0)
[Scheduler]   still blocked
[Scheduler]   need to reschedule
... no mentions of 32702 ...
[Scheduler] Scheduling next task
[Scheduler]   (32701 is un-switchable at SYSCALL: exit_group)
[Scheduler]   and running; waiting for state change
[Task] going into blocking waitpid(32701) ...
[Task]   waitpid(32701) returns 32701; status 0x6057f (PTRACE_EVENT_EXIT)
[Task]   (refreshing register cache)
[Scheduler]   new status is 0x6057f (PTRACE_EVENT_EXIT)
[RecordSession] trace time 241: Active task is 32701. Events:
[WARN handle_ptrace_exit_event() errno: SUCCESS] unstable exit; may misrecord CLONE_CHILD_CLEARTID memory race
[Task] task 32701 (rec:32701) is dying ...
[WARN ~Task() errno: SUCCESS] 32701 is unstable; not blocking on its termination
[Task]   dead
[Scheduler] Scheduling next task
[Scheduler]   need to reschedule
[Scheduler]   32702 is unstable
[Scheduler]   all tasks blocked or some unstable, waiting for runnable (1 total)
[Scheduler]   32702 changed status to 0 (EXIT-0)
[Task]   (refreshing register cache)
[FATAL /home/roc/rr/rr/src/Task.cc:1276:did_waitpid() errno: ESRCH] 
 (task 32702 (rec:32702) at time 242)
 -> Assertion `false' failed to hold. 
Launch gdb with 
  gdb '-l' '-1' '-ex' 'target extended-remote :32702' /home/roc/rr/obj/bin/clock_nanosleep

So maybe it's a regression in the kernel ... maybe related to the seccomp changes, maybe not.
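For readers decoding the status words in the log above (0x7057f and 0x6057f), here is a minimal sketch in C of how a ptrace event code can be extracted from a wait status. The helper name `ptrace_event_of` is hypothetical; the macros come from glibc's `<sys/wait.h>` and `<sys/ptrace.h>`.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/*
 * Returns the PTRACE_EVENT_* code carried by a wait status, or 0 if the
 * status is not a SIGTRAP stop. For a ptrace event stop the status is
 * laid out as (event << 16) | (SIGTRAP << 8) | 0x7f.
 */
static int ptrace_event_of(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xff;
}
```

With this helper, 0x7057f decodes to PTRACE_EVENT_SECCOMP and 0x6057f to PTRACE_EVENT_EXIT, matching the annotations in the log. A status of 0, as reported for task 32702, carries no event at all.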

@rocallahan
Collaborator Author

Building Linux master with that one commit reverted seems to make the bug go away, so it's definitely that kernel commit or my rr changes.

@rocallahan
Collaborator Author

This also happens in the nanosleep test.

What seems to happen is that the non-main thread enters its final, long nanosleep, rr gets the notification, and then the non-main thread is not scheduled again. The main thread proceeds to exit_group; at this point the non-main thread's kernel stack looks like this:

[<ffffffff810b5837>] ptrace_stop+0x167/0x2a0
[<ffffffff810b5a08>] ptrace_do_notify+0x98/0xc0
[<ffffffff810b6e6b>] ptrace_notify+0x5b/0x80
[<ffffffff8116235a>] __seccomp_filter+0x20a/0x270
[<ffffffff81162a35>] __secure_computing+0x35/0xb0
[<ffffffff810033ae>] syscall_trace_enter+0xce/0x2f0
[<ffffffff81003d37>] do_syscall_64+0x147/0x160
[<ffffffff817f4821>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff

@rocallahan
Collaborator Author

With a fair amount of pain I've managed to construct a standalone testcase: https://gist.github.com/rocallahan/b09b1de28a32918cb27d4ad68421678d
I'm fairly confident it's a kernel bug now.

@rocallahan
Collaborator Author

rocallahan commented Aug 2, 2016

And it reproduces in a kernel without the seccomp reordering changes. So it seems to be a longstanding kernel bug where an exit_group while a thread is at that point in ptrace_stop causes the thread to exit without reporting PTRACE_EVENT_EXIT.

@rocallahan
Collaborator Author

I've figured this out. Here's what happens...

The problem occurs in this code in __seccomp_filter:

                /* Allow the BPF to provide the event message */
                ptrace_event(PTRACE_EVENT_SECCOMP, data);
                /*
                 * The delivery of a fatal signal during event
                 * notification may silently skip tracer notification.
                 * Terminating the task now avoids executing a system
                 * call that may not be intended.
                 */
                if (fatal_signal_pending(current)) {
                        do_exit(SIGSYS);
                }

When another thread in the thread-group does exit_group while a tracee thread is in the above ptrace-stop (or just after the stop has resumed but before we reach the fatal_signal_pending check), zap_other_threads marks the tracee thread as having a pending SIGKILL, so the tracee thread takes this do_exit path. do_exit calls ptrace_event(PTRACE_EVENT_EXIT, code) which puts the tracee thread into state TASK_TRACED and then (indirectly) calls __schedule. __schedule has this code:

                if (unlikely(signal_pending_state(prev->state, prev))) {
                        prev->state = TASK_RUNNING;

In this case, SIGKILL is still pending so we change the state to TASK_RUNNING. The ptracer is notified and wakes up in wait4, which examines the tracee thread in wait_task_stopped. wait_task_stopped (via task_stopped_code) decides to skip the tracee because it is not in the TASK_TRACED state, so the ptracer doesn't see the PTRACE_EVENT_EXIT.

When a thread does an exit_group while the tracee thread is anywhere else, the synthetic SIGKILL is detected via get_signal which dequeues the signal before calling do_exit, which means the scheduler code does not force the transition back to TASK_RUNNING and everything works.
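The difference between the two paths can be sketched with a small userspace model in C. This is a deliberately simplified illustration of the analysis above, not kernel code: `toy_task`, `toy_schedule`, and `toy_exit_stop_visible` are hypothetical names, and the state values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy task states; the values are arbitrary, not the kernel's. */
enum toy_state { TOY_RUNNING, TOY_TRACED };

struct toy_task {
    enum toy_state state;
    bool sigkill_pending; /* set when another thread calls exit_group */
};

/* Models the __schedule() check quoted above: a task about to sleep in
 * the traced state with a fatal signal pending is made runnable again. */
static void toy_schedule(struct toy_task *t)
{
    if (t->state == TOY_TRACED && t->sigkill_pending)
        t->state = TOY_RUNNING;
}

/* Models do_exit() reaching the PTRACE_EVENT_EXIT stop. Returns true if
 * the tracer's wait would report the stop. */
static bool toy_exit_stop_visible(struct toy_task *t, bool dequeue_first)
{
    assert(t != NULL);
    if (dequeue_first)
        t->sigkill_pending = false; /* the get_signal() path */
    t->state = TOY_TRACED;          /* ptrace_event(PTRACE_EVENT_EXIT, ...) */
    toy_schedule(t);
    /* wait_task_stopped() only reports tasks still in the traced state. */
    return t->state == TOY_TRACED;
}
```

In this model, calling `toy_exit_stop_visible` with `dequeue_first` set to false (the `__seccomp_filter` path) returns false, so the exit event is lost, while the `get_signal` path with `dequeue_first` set to true returns true.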

@rocallahan
Collaborator Author

The seccomp reordering change made this a problem for rr because before that change, rr would always advance a tracee from the PTRACE_EVENT_SECCOMP stop to the PTRACE_SYSCALL stop before running any other tracee, so there was no way for another thread to do exit_group while the tracee was in the problematic section.

@rocallahan
Collaborator Author

I'm not sure how to fix this, though that's mainly my lack of kernel experience. I think probably the seccomp code should dequeue the fatal signal before entering do_exit, to behave more like get_signal.

@rocallahan
Collaborator Author

Email sent to LKML.

@rocallahan
Collaborator Author

Here is the email thread: http://marc.info/?l=linux-kernel&m=147026862328685&w=2

0day-ci pushed a commit to 0day-ci/linux that referenced this issue Aug 11, 2016
This fixes a ptrace vs fatal pending signals bug as manifested in seccomp
now that seccomp was reordered to happen after ptrace. The short version is
that seccomp should not attempt to call do_exit() while fatal signals are
pending under a tracer. This was needlessly paranoid. Instead, the syscall
can just be skipped and normal signal handling, tracer notification, and
process death can happen.

Slightly edited original bug report:

If a tracee task is in a PTRACE_EVENT_SECCOMP trap, or has been resumed
after such a trap but not yet been scheduled, and another task in the
thread-group calls exit_group(), then the tracee task exits without the
ptracer receiving a PTRACE_EVENT_EXIT notification. Test case here:
https://gist.github.com/khuey/3c43ac247c72cef8c956ca73281c9be7

The bug happens because when __seccomp_filter() detects
fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
that task is descheduled, __schedule() notices that there is a fatal
signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
That prevents the ptracer's waitpid() from returning the ptrace event.
A more detailed analysis is here:
rr-debugger/rr#1762 (comment).

Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35ef ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
torvalds pushed a commit to torvalds/linux that referenced this issue Aug 31, 2016
This fixes a ptrace vs fatal pending signals bug as manifested in
seccomp now that seccomp was reordered to happen after ptrace. The
short version is that seccomp should not attempt to call do_exit()
while fatal signals are pending under a tracer. The existing code was
trying to be as defensively paranoid as possible, but it now ends up
confusing ptrace. Instead, the syscall can just be skipped (which solves
the original concern that the do_exit() was addressing) and normal signal
handling, tracer notification, and process death can happen.

Paraphrasing from the original bug report:

If a tracee task is in a PTRACE_EVENT_SECCOMP trap, or has been resumed
after such a trap but not yet been scheduled, and another task in the
thread-group calls exit_group(), then the tracee task exits without the
ptracer receiving a PTRACE_EVENT_EXIT notification. Test case here:
https://gist.github.com/khuey/3c43ac247c72cef8c956ca73281c9be7

The bug happens because when __seccomp_filter() detects
fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
that task is descheduled, __schedule() notices that there is a fatal
signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
That prevents the ptracer's waitpid() from returning the ptrace event.
A more detailed analysis is here:
rr-debugger/rr#1762 (comment).

Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35ef ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
0day-ci pushed a commit to 0day-ci/linux that referenced this issue Sep 2, 2016
GIT 071e31e254e0e0c438eecba3dba1d6e2d0da36c2

commit 9f834ec18defc369d73ccf9e87a2790bfa05bf46
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Aug 22 16:41:46 2016 -0700

    binfmt_elf: switch to new creds when switching to new mm
    
    We used to delay switching to the new credentials until after we had
    mapped the executable (and possible elf interpreter).  That was kind of
    odd to begin with, since the new executable will actually then _run_
    with the new creds, but whatever.
    
    The bigger problem was that we also want to make sure that we turn off
    prof events and tracing before we start mapping the new executable
    state.  So while this is a cleanup, it's also a fix for a possible
    information leak.
    
    Reported-by: Robert Święcki <robert@swiecki.net>
    Tested-by: Peter Zijlstra <peterz@infradead.org>
    Acked-by: David Howells <dhowells@redhat.com>
    Acked-by: Oleg Nesterov <oleg@redhat.com>
    Acked-by: Andy Lutomirski <luto@amacapital.net>
    Acked-by: Eric W. Biederman <ebiederm@xmission.com>
    Cc: Willy Tarreau <w@1wt.eu>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit 485a252a5559b45d7df04c819ec91177c62c270b
Author: Kees Cook <keescook@chromium.org>
Date:   Wed Aug 10 16:28:09 2016 -0700

    seccomp: Fix tracer exit notifications during fatal signals
    
    This fixes a ptrace vs fatal pending signals bug as manifested in
    seccomp now that seccomp was reordered to happen after ptrace. The
    short version is that seccomp should not attempt to call do_exit()
    while fatal signals are pending under a tracer. The existing code was
    trying to be as defensively paranoid as possible, but it now ends up
    confusing ptrace. Instead, the syscall can just be skipped (which solves
    the original concern that the do_exit() was addressing) and normal signal
    handling, tracer notification, and process death can happen.
    
    Paraphrasing from the original bug report:
    
    If a tracee task is in a PTRACE_EVENT_SECCOMP trap, or has been resumed
    after such a trap but not yet been scheduled, and another task in the
    thread-group calls exit_group(), then the tracee task exits without the
    ptracer receiving a PTRACE_EVENT_EXIT notification. Test case here:
    https://gist.github.com/khuey/3c43ac247c72cef8c956ca73281c9be7
    
    The bug happens because when __seccomp_filter() detects
    fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
    signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
    that task is descheduled, __schedule() notices that there is a fatal
    signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
    That prevents the ptracer's waitpid() from returning the ptrace event.
    A more detailed analysis is here:
    https://github.com/mozilla/rr/issues/1762#issuecomment-237396255.
    
    Reported-by: Robert O'Callahan <robert@ocallahan.org>
    Reported-by: Kyle Huey <khuey@kylehuey.com>
    Tested-by: Kyle Huey <khuey@kylehuey.com>
    Fixes: 93e35efb8de4 ("x86/ptrace: run seccomp after ptrace")
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Acked-by: Oleg Nesterov <oleg@redhat.com>
    Acked-by: James Morris <james.l.morris@oracle.com>

commit 0d025d271e55f3de21f0aaaf54b42d20404d2b23
Author: Josh Poimboeuf <jpoimboe@redhat.com>
Date:   Tue Aug 30 08:04:16 2016 -0500

    mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
    
    There are three usercopy warnings which are currently being silenced for
    gcc 4.6 and newer:
    
    1) "copy_from_user() buffer size is too small" compile warning/error
    
       This is a static warning which happens when object size and copy size
       are both const, and copy size > object size.  I didn't see any false
       positives for this one.  So the function warning attribute seems to
       be working fine here.
    
       Note this scenario is always a bug and so I think it should be
       changed to *always* be an error, regardless of
       CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.
    
    2) "copy_from_user() buffer size is not provably correct" compile warning
    
       This is another static warning which happens when I enable
       __compiletime_object_size() for new compilers (and
       CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
       is const, but copy size is *not*.  In this case there's no way to
       compare the two at build time, so it gives the warning.  (Note the
       warning is a byproduct of the fact that gcc has no way of knowing
       whether the overflow function will be called, so the call isn't dead
       code and the warning attribute is activated.)
    
       So this warning seems to only indicate "this is an unusual pattern,
       maybe you should check it out" rather than "this is a bug".
    
       I get 102(!) of these warnings with allyesconfig and the
       __compiletime_object_size() gcc check removed.  I don't know if there
       are any real bugs hiding in there, but from looking at a small
       sample, I didn't see any.  According to Kees, it does sometimes find
       real bugs.  But the false positive rate seems high.
    
    3) "Buffer overflow detected" runtime warning
    
       This is a runtime warning where object size is const, and copy size >
       object size.
    
    All three warnings (both static and runtime) were completely disabled
    for gcc 4.6 with the following commit:
    
      2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+")
    
    That commit mistakenly assumed that the false positives were caused by a
    gcc bug in __compiletime_object_size().  But in fact,
    __compiletime_object_size() seems to be working fine.  The false
    positives were instead triggered by #2 above.  (Though I don't have an
    explanation for why the warnings supposedly only started showing up in
    gcc 4.6.)
    
    So remove warning #2 to get rid of all the false positives, and re-enable
    warnings #1 and #3 by reverting the above commit.
    
    Furthermore, since #1 is a real bug which is detected at compile time,
    upgrade it to always be an error.
    
    Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
    needed.
    
    Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: "H . Peter Anvin" <hpa@zytor.com>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Byungchul Park <byungchul.park@lge.com>
    Cc: Nilay Vaish <nilayvaish@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit 9ebae9e4bcd7dff22536af8a969d8f66e6f23900
Author: Alan Cox <alan@linux.intel.com>
Date:   Tue Aug 30 16:47:02 2016 +0100

    pata_ninja32: Avoid corrupting status flags
    
    Ninja32 needs to set some flags to indicate it does 32bit IO. However it currently assigns this which
    loses the initializing flag and causes a warning spew. Fix it to use a logical or as is intended.
    
    Signed-off-by: Alan Cox <alan@linux.intel.com>
    Tested-by: Ellmar Stelnberger <estellnb@elstel.org>
    Signed-off-by: Tejun Heo <tj@kernel.org>

commit 98b0f80c2396224bbbed81792b526e6c72ba9efa
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Mon Aug 29 11:15:36 2016 -0400

    NFSv4.x: Fix a refcount leak in nfs_callback_up_net
    
    On error, the callers expect us to return without bumping
    nn->cb_users[].
    
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org # v3.7+

commit 52442f9b11b7e5d4a38d99143011831fd171f8d9
Author: Benjamin Coddington <bcodding@redhat.com>
Date:   Tue Aug 30 09:20:32 2016 -0400

    NFS4: Avoid migration loops
    
    If a server returns itself as a location while migrating, the client may
    end up getting stuck attempting to migrate twice to the same server.  Catch
    this by checking if the nfs_client found is the same as the existing
    client.  For the other two callers to nfs4_set_client, the nfs_client will
    always be ERR_PTR(-EINVAL).
    
    Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

commit 3dc147359e3dcdf0648f1e2c11f62cfae3160df0
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Mon Aug 29 15:12:54 2016 -0400

    pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails
    
    If the attempt to connect to a DS fails inside ff_layout_pg_init_read or
    ff_layout_pg_init_write, then we currently end up clearing the layout
    segment carried by the struct nfs_pageio_descriptor, causing an Oops
    when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist.
    
    The fix is to ensure we return the layout and then retry.
    
    Fixes: 446ca2195303 ("pNFS/flexfiles: When initing reads or writes, we...")
    Cc: stable@vger.kernel.org # v4.7+
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

commit 3c3292634fc2de1ab97b6aa3222fee647f737adb
Author: Jean Delvare <jdelvare@suse.de>
Date:   Mon Aug 29 13:18:23 2016 +0200

    hwmon: (it87) Add missing sysfs attribute group terminator
    
    Attribute array it87_attributes_in lacks its NULL terminator,
    causing random behavior when operating on the attribute group.
    
    Fixes: 52929715634a ("hwmon: (it87) Use is_visible for voltage sensors")
    Signed-off-by: Jean Delvare <jdelvare@suse.de>
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Cc: Guenter Roeck <linux@roeck-us.net>
    Cc: stable@vger.kernel.org
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

commit da43bf0c21e57fff0221da5de0a9a388ec0d27cd
Author: Paul Gortmaker <paul.gortmaker@windriver.com>
Date:   Mon Aug 15 18:24:59 2016 -0400

    intel_pmic_gpio: Make explicitly non-modular
    
    The Kconfig entry controlling compilation of this code is:
    
    drivers/platform/x86/Kconfig:config GPIO_INTEL_PMIC
    drivers/platform/x86/Kconfig:   bool "Intel PMIC GPIO support"
    
    ...meaning that it currently is not being built as a module by anyone.
    
    Lets remove the couple traces of modular infrastructure use, so that
    when reading the driver there is no doubt it is builtin-only.
    
    We delete the MODULE_LICENSE tag etc. since all that information
    was (or is now) contained at the top of the file in the comments.
    
    We don't replace module.h with init.h since the file already has that.
    
    Cc: Alek Du <alek.du@intel.com>
    Cc: platform-driver-x86@vger.kernel.org
    Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    Signed-off-by: Darren Hart <dvhart@linux.intel.com>

commit f48d1496b8537d75776478c6942dd87f34d7f270
Author: Paul Gortmaker <paul.gortmaker@windriver.com>
Date:   Mon Aug 15 18:25:17 2016 -0400

    platform/olpc: Make ec explicitly non-modular
    
    The Kconfig entry controlling compilation of this code is:
    
    arch/x86/Kconfig:config OLPC
    arch/x86/Kconfig:       bool "One Laptop Per Child support"
    
    ...meaning that it currently is not being built as a module by anyone.
    
    Lets remove the couple traces of modular infrastructure use, so that
    when reading the driver there is no doubt it is builtin-only.
    
    We delete the MODULE_LICENSE tag etc. since all that information
    was (or is now) contained at the top of the file in the comments.
    
    Cc: platform-driver-x86@vger.kernel.org
    Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    Acked-by: Andres Salomon <dilinger@queued.net>
    Signed-off-by: Darren Hart <dvhart@linux.intel.com>

commit b99b43bb4bdf1d361f7487cf03d803082bbf9101
Author: Owen Lin <olin@rivetnetworks.com>
Date:   Fri Aug 26 13:49:09 2016 +0800

    Add Killer E2500 device ID in alx driver.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 2fb04fdf30192ff1e2b5834e9b7745889ea8bbcb
Author: Russell King <rmk+kernel@armlinux.org.uk>
Date:   Sat Aug 27 17:33:03 2016 +0100

    net: smc91x: fix SMC accesses
    
    Commit b70661c70830 ("net: smc91x: use run-time configuration on all ARM
    machines") broke some ARM platforms through several mistakes.  Firstly,
    the access size must correspond to the following rule:
    
    (a) at least one of 16-bit or 8-bit access size must be supported
    (b) 32-bit accesses are optional, and may be enabled in addition to
        the above.
    
    Secondly, it provides no emulation of 16-bit accesses, instead blindly
    making 16-bit accesses even when the platform specifies that only 8-bit
    is supported.
    
    Reorganise smc91x.h so we can make use of the existing 16-bit access
    emulation already provided - if 16-bit accesses are supported, use
    16-bit accesses directly, otherwise if 8-bit accesses are supported,
    use the provided 16-bit access emulation.  If neither, BUG().  This
    exactly reflects the driver behaviour prior to the commit being fixed.
    
    Since the conversion incorrectly cut down the available access sizes on
    several platforms, we also need to go through every platform and fix up
    the overly-restrictive access size: Arnd assumed that if a platform can
    perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
    size needed to be specified - not so, all available access sizes must
    be specified.
    
    This likely fixes some performance regressions in doing this: if a
    platform does not support 8-bit accesses, 8-bit accesses have been
    emulated by performing a 16-bit read-modify-write access.
    
    Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
    accesses, which was broken by the original commit.
    
    Fixes: b70661c70830 ("net: smc91x: use run-time configuration on all ARM machines")
    Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 7d13eca09ed5e477f6ecfd97a35058762228b5e4
Author: Florian Fainelli <f.fainelli@gmail.com>
Date:   Sat Aug 27 15:34:20 2016 -0700

    Documentation: networking: dsa: Remove platform device TODO
    
    Since commit 83c0afaec7b7 ("net: dsa: Add new binding implementation"),
    the shortcomings of the dsa platform device have been addressed, remove
    that TODO item.
    
    Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
    Acked-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit e5835f2833b12808c53aa621d1d3aa085706b5b3
Author: Maor Gottlieb <maorg@mellanox.com>
Date:   Mon Aug 29 01:13:50 2016 +0300

    net/mlx5: Increase number of ethtool steering priorities
    
    Ethtool has 11 flow tables, each flow table has its own priority.
    Increase the number of priorities to be aligned with the number of flow
    tables.
    
    Fixes: 1174fce8d141 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering')
    Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 1722b9694ecfbc602865017c3fa6da0e3ec234d8
Author: Eran Ben Elisha <eranbe@mellanox.com>
Date:   Mon Aug 29 01:13:49 2016 +0300

    net/mlx5: Add error prints when validate ETS failed
    
    Upon set ETS failure due to user invalid input, add error prints to
    specify the exact error to the user.
    
    Fixes: cdcf11212b22 ('net/mlx5e: Validate BW weight values of ETS')
    Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit bf50082c15eb2bc47d1922e70f424c57f36646d5
Author: Kamal Heib <kamalh@mellanox.com>
Date:   Mon Aug 29 01:13:48 2016 +0300

    net/mlx5e: Fix memory leak if refreshing TIRs fails
    
    Free 'in' command object also when mlx5_core_modify_tir fails.
    
    Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring")
    Signed-off-by: Kamal Heib <kamalh@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c8cf78fe100b0d152a1932327c24cefc0ba4bdbe
Author: Tariq Toukan <tariqt@mellanox.com>
Date:   Mon Aug 29 01:13:47 2016 +0300

    net/mlx5e: Add ethtool counter for TX xmit_more
    
    Add a counter in ethtool for the number of times that
    TX xmit_more was used.
    
    Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit cc8e9ebf952699cb6870f1366a4920d05b036e31
Author: Eran Ben Elisha <eranbe@mellanox.com>
Date:   Mon Aug 29 01:13:46 2016 +0300

    net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
    
    The driver RQ has two possible configurations: striding RQ and
    non-striding RQ.  Until this patch, the driver always reported the
    number of hardware WQEs (ring descriptors). For non striding RQ
    configuration, this was OK since we have one WQE per pending packet
    For striding RQ, multiple packets can fit into one WQE. For better
    user experience we normalize the rx_pending parameter (size of wqe/mtu)
    as the average ring size in case of striding RQ.
    
    Fixes: 461017cb006a ('net/mlx5e: Support RX multi-packet WQE ...')
    Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 6e8dd6d6f4bd2fd6fefdbf2e73bf251e36db59af
Author: Saeed Mahameed <saeedm@mellanox.com>
Date:   Mon Aug 29 01:13:45 2016 +0300

    net/mlx5e: Don't wait for SQ completions on close
    
    Instead of asking the firmware to flush the SQ (Send Queue) via
    asynchronous completions when moved to error, we handle SQ flush
    manually (mlx5e_free_tx_descs) same as we did when SQ flush got
    timed out or on tx_timeout.
    
    This will reduce SQs flush time and speedup interface down procedure.
    
    Moved mlx5e_free_tx_descs to the end of en_tx.c for tx
    critical code locality.
    
    Fixes: 29429f3300a3 ('net/mlx5e: Timeout if SQ doesn't flush during close')
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 8484f9ed13b26043be80ff5774506024956eae8f
Author: Saeed Mahameed <saeedm@mellanox.com>
Date:   Mon Aug 29 01:13:44 2016 +0300

    net/mlx5e: Don't post fragmented MPWQE when RQ is disabled
    
    ICO (Internal control operations) SQ (Send Queue) is closed/disabled
    after RQ (Receive Queue).  After RQ is closed an ICO SQ completion
    might post a fragmented MPWQE (Multi Packet Work Queue Element) into
    that RQ.
    
    As on regular RQ post, check if we are allowed to post to that
    RQ (RQ is enabled). Cleanup in-progress UMR MPWQE on mlx5e_free_rx_descs
    if needed.
    
    Fixes: bc77b240b3c5 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit f2fde18c52a7367a8f6cf6855e2a7174e601c8ee
Author: Saeed Mahameed <saeedm@mellanox.com>
Date:   Mon Aug 29 01:13:43 2016 +0300

    net/mlx5e: Don't wait for RQ completions on close
    
    This will significantly reduce receive queue flush time on interface
    down.
    
    Instead of asking the firmware to flush the RQ (Receive Queue) via
    asynchronous completions when moved to error, we handle RQ flush
    manually (mlx5e_free_rx_descs) same as we did when RQ flush got timed
    out.
    
    This will reduce RQ flush time and speed up the interface down
    procedure (ifconfig down) from 6 sec to 0.3 sec on a 48-core system.
    
    Moved mlx5e_free_rx_descs to en_main.c, where it is needed, to keep
    en_rx.c free from non-critical data path code for better code locality.
    
    Fixes: 6cd392a082de ('net/mlx5e: Handle RQ flush in error cases')
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit fe4c988bdd1cc60402a4e3ca3976a686ea991b5a
Author: Saeed Mahameed <saeedm@mellanox.com>
Date:   Mon Aug 29 01:13:42 2016 +0300

    net/mlx5e: Limit UMR length to the device's limitation
    
    The ConnectX-4 UMR (User Memory Region) MTT translation table offset in
    a WQE is limited to U16_MAX. Before this patch we ignored that
    limitation and requested the maximum possible UMR translation length
    that the netdev might need (MAX channels * MAX pages per channel).
    On a system with more than 32 cores, when linear WQE allocation fails,
    falling back to UMR WQEs causes the RQ (Receive Queue) to get stuck.
    
    Here we limit the UMR length to min(U16_MAX, max required pages) (while
    considering the required alignments) on driver load. By default U16_MAX
    is sufficient, since the default RX rings value guarantees that we are
    in range. Dynamically (on set_ringparam/set_channels) we check whether
    the new required UMR length (num mtts) is still in range and, if not,
    fail the request.
    
    Fixes: bc77b240b3c5 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
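
The clamping described in the commit above can be sketched roughly as follows. This is an illustration in Python, not the driver's C code: the function names and the alignment value of 8 are assumptions made for the example.

```python
U16_MAX = 0xFFFF  # largest value a 16-bit offset field can hold


def align_down(value, alignment):
    """Round value down to a multiple of alignment."""
    return value - (value % alignment)


def umr_mtt_limit(max_required_mtts, alignment=8):
    """Limit chosen at load time: min(U16_MAX, max required), aligned.

    The alignment of 8 is a placeholder, not the device's real requirement.
    """
    return min(align_down(U16_MAX, alignment), max_required_mtts)


def ring_change_allowed(requested_mtts, limit):
    """Dynamic check: a ring/channel change is rejected if out of range."""
    return requested_mtts <= limit
```

With these assumptions, a request that stays under the limit is accepted unchanged, while a larger one is refused rather than silently truncated.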

commit 78a3e8889b4b6b99775ed954696ff3e017f5d19b
Author: Cyril Bur <cyrilbur@gmail.com>
Date:   Tue Aug 23 10:46:17 2016 +1000

    powerpc: signals: Discard transaction state from signal frames
    
    Userspace can begin and suspend a transaction within the signal
    handler which means they might enter sys_rt_sigreturn() with the
    processor in suspended state.
    
    sys_rt_sigreturn() wants to restore process context (which may have
    been in a transaction before signal delivery). To do this it must
    restore TM SPRs. To achieve this, any transaction initiated within the
    signal frame must be discarded in order to be able to restore TM SPRs
    as TM SPRs can only be manipulated non-transactionally.
    From the PowerPC ISA:
      TM Bad Thing Exception [Category: Transactional Memory]
       An attempt is made to execute a mtspr targeting a TM register in
       other than Non-transactional state.
    
    Not doing so results in a TM Bad Thing:
    [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable]
    [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033)
    [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
    [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
    [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE
     nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
     xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter
     ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm
     uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure
     scsi_transport_sas bnx2x ipr mdio libcrc32c
    [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34
    [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000
    [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000
    [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700   Not tainted (4.7.0)
    [12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280  XER: 20000000
    [12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033
    GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0
    GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000
    GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000
    GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000
    GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0
    [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
    [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
    [12045.223630] Call Trace:
    [12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0
    [12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108
    [12045.223806] Instruction dump:
    [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
    [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
    [12045.224074] ---[ end trace cb8002ee240bae76 ]---
    
    It isn't clear exactly if there is really a use case for userspace
    returning with a suspended transaction, however, doing so doesn't (on
    its own) constitute a bad frame. As such, this patch simply discards
    the transactional state of the context calling the sigreturn and
    continues.
    
    Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
    Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
    Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
    Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
    Acked-by: Simon Guo <wei.guo.simon@gmail.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

commit a9cbf0b2195b695cbeeeecaa4e2770948c212e9a
Author: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Date:   Mon Aug 22 12:17:44 2016 +0530

    powerpc/powernv : Drop reference added by kset_find_obj()
    
    When the Linux kernel gets notified about a duplicate error log from
    OPAL, it has been observed that the kernel fails to remove the sysfs
    entries (/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is
    because we currently search for the error log/dump kobject in the kset
    list via the 'kset_find_obj()' routine, which increments the reference
    count by one once it finds the kobject.
    
    So, unless we decrement the reference count by one after the kobject is
    found, we will not be able to release the kobject properly later.
    
    This patch adds the 'kobject_put()' which was missing earlier.
    
    Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
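
The reference imbalance described in the commit above can be illustrated with a toy model. This is a sketch in Python, not kernel code: `KObject`, `find_obj` and `handle_duplicate` are hypothetical names that only mimic the idea of a lookup returning a referenced object.

```python
class KObject:
    """Toy reference-counted object (hypothetical, not the kernel kobject)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1       # reference held by the creator
        self.released = False

    def get(self):
        self.refcount += 1
        return self

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.released = True  # stands in for removing the sysfs entry


def find_obj(objects, name):
    """Lookup that returns a referenced object, or None if not found."""
    for obj in objects:
        if obj.name == name:
            return obj.get()
    return None


def handle_duplicate(objects, name, drop_lookup_ref):
    """Handle a duplicate notification; optionally balance the lookup."""
    found = find_obj(objects, name)
    if found is not None and drop_lookup_ref:
        found.put()  # drop the reference taken by find_obj
    return found
```

If the lookup reference is not dropped, the creator's final `put()` leaves the count at one and the object is never released.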

commit cc7786d3ee7e3c979799db834b528db2c0834c2e
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Mon Jul 25 14:26:51 2016 +1000

    powerpc/tm: do not use r13 for tabort_syscall
    
    tabort_syscall runs with RI=1, so a nested recoverable machine
    check will load the paca into r13 and overwrite what we loaded
    it with, because exceptions returning to privileged mode do not
    restore r13.
    
    Fixes: b4b56f9ecab4 (powerpc/tm: Abort syscalls in active transactions)
    Cc: stable@vger.kernel.org
    Signed-off-by: Nick Piggin <npiggin@gmail.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

commit d138027a8256a3e9d7657c8d0dae84c08ef2cfe1
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Sun Aug 28 12:19:04 2016 -0400

    NFSv4.1: Remove obsolete and incorrect assignment in nfs4_callback_sequence
    
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

commit 2e80dbe7ac51a911e8a828407b1a48c5ba938cd2
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Sun Aug 28 11:50:26 2016 -0400

    NFSv4.1: Close callback races for OPEN, LAYOUTGET and LAYOUTRETURN
    
    Defer freeing the slot until after we have processed the results from
    OPEN and LAYOUTGET. This means that the server can rely on the
    mechanism in RFC5661 Section 2.10.6.3 to ensure that replies to an
    OPEN or LAYOUTGET/RETURN RPC call don't race with the callbacks that
    apply to them.
    
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

commit 07e8dcbda71ef87e9cbdc42b5bb16a44c1ab839b
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Sun Aug 28 10:28:25 2016 -0400

    NFSv4.1: Defer bumping the slot sequence number until we free the slot
    
    For operations like OPEN or LAYOUTGET, which return recallable state
    (i.e. delegations and layouts) we want to enable the mechanism for
    resolving recall races in RFC5661 Section 2.10.6.3.
    To do so, we will want to defer bumping the slot's sequence number until
    we have finished processing the RPC results.
    
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

commit 045d2a6d076a2ecd7043ea543ea198af943f8b16
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Sun Aug 28 13:25:43 2016 -0400

    NFSv4.1: Delay callback processing when there are referring triples
    
    If CB_SEQUENCE tells us that the processing of this request depends on
    the completion of one or more referring triples (see RFC 5661 Section
    2.10.6.3), delay the callback processing until after the RPC requests
    being referred to have completed.
    If we end up delaying for more than 1/2 second, then fall back to
    returning NFS4ERR_DELAY in reply to the callback.
    
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

commit e09c978aae5bedfdb379be80363b024b7d82638b
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Sat Aug 27 23:44:04 2016 -0400

    NFSv4.1: Fix Oopsable condition in server callback races
    
    The slot table hasn't been an array since v3.7. Ensure that we
    use nfs4_lookup_slot() to access the slot correctly.
    
    Fixes: 87dda67e7386 ("NFSv4.1: Allow SEQUENCE to resize the slot table...")
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org # v3.8+

commit 9dbeea7f08f3784b152d9fb3b86beb34aad77c72
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Aug 26 08:51:39 2016 -0700

    rhashtable: fix a memory leak in alloc_bucket_locks()
    
    If vmalloc() was successful, do not attempt a kmalloc_array()
    
    Fixes: 4cf0b354d92e ("rhashtable: avoid large lock-array allocations")
    Reported-by: CAI Qian <caiqian@redhat.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Florian Westphal <fw@strlen.de>
    Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
    Tested-by: CAI Qian <caiqian@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit e70c70c38d7a5ced76fc8b1c4a7ccee76e9c2911
Author: Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru>
Date:   Fri Aug 26 11:19:34 2016 +0100

    sfc: fix potential stack corruption from running past stat bitmask
    
    On 32-bit systems, mask is only an array of 3 longs, not 4, so don't try
    to write to mask[3].
    Also include build-time checks in case the size of the bitmask changes.
    
    Fixes: 3c36a2aded8c ("sfc: display vadaptor statistics for all interfaces")
    Signed-off-by: Edward Cree <ecree@solarflare.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c15e07b02bf0450bc8e60f2cc51cb42daa371417
Author: Jiri Pirko <jiri@mellanox.com>
Date:   Thu Aug 25 18:30:52 2016 +0200

    team: loadbalance: push lacpdus to exact delivery
    
    When team is in bridge and LACP is utilized, LACPDU packets are pushed
    to userspace using raw socket and there they are processed. However,
    since 8626c56c8279b, LACPDU skbs are dropped by bridge rx_handler so
    they never reach packet handlers in rx path. Fix this by explicitly
    treating LACPDUs as packets to be pushed to exact delivery in the team
    rx_handler.
    
    Reported-by: Ido Schimmel <idosch@mellanox.com>
    Fixes: 8626c56c8279b ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c234af5875ffeab39d5a2c4230a477a35987a484
Author: Colin Ian King <colin.king@canonical.com>
Date:   Thu Aug 25 07:51:10 2016 +0100

    net: hns: dereference ppe_cb->ppe_common_cb if it is non-null
    
    ppe_cb->ppe_common_cb is being dereferenced before a null check is
    being made on it.  If ppe_cb->ppe_common_cb is null then we end up
    with a null pointer dereference when assigning dsaf_dev.  Fix this
    by moving the initialisation of dsaf_dev once we know
    ppe_cb->ppe_common_cb is OK to dereference.
    
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Acked-by: Yisen Zhuang <yisen.zhuang@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit b628d611a2a53858263fc419dba552f32431dba4
Author: Gao Feng <fgao@ikuai8.com>
Date:   Thu Aug 25 09:45:39 2016 +0800

    8139cp: Fix one possible deadloop in cp_rx_poll
    
    When cp_rx_poll does not get enough packets, it checks the rx
    interrupt status again. If an interrupt is pending, it jumps back to
    rx_status_loop. But the goto jump also resets the rx variable to zero.
    
    As a result, a dead loop is possible. Assume that each pass through
    rx_status_loop only gets a packet count which is less than budget,
    and that the (cpr16(IntrStatus) & cp_rx_intr_mask) condition is always
    true. The loop then never terminates and the system is blocked.
    
    Signed-off-by: Gao Feng <fgao@ikuai8.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
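
The dead loop described in the commit above can be modelled with a small simulation. This is a sketch in Python under simplifying assumptions, not the driver's code: every pass is assumed to receive the same number of packets, the interrupt status is assumed to stay pending, and `max_passes` exists only so the simulation terminates.

```python
def poll(budget, packets_per_pass, reset_rx_on_retry, max_passes=1000):
    """Return the number of passes needed to reach budget, or None.

    reset_rx_on_retry=True models the counter being zeroed each time the
    code jumps back to the retry label.
    """
    rx = 0
    for passes in range(1, max_passes + 1):
        if reset_rx_on_retry:
            rx = 0  # progress from earlier passes is thrown away
        rx += packets_per_pass
        if rx >= budget:
            return passes
        # Interrupt status is assumed to still be pending: retry.
    return None  # never reached the budget within max_passes
```

When the counter is preserved across retries it eventually reaches the budget. When it is reset, no number of passes is enough as long as each pass yields fewer packets than the budget.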

commit f38ff2ee7727994685494bcc4d7c274b35b5418a
Author: Anjali Singhai Jain <anjali.singhai@intel.com>
Date:   Wed Aug 24 17:51:53 2016 -0700

    i40e: Change some init flow for the client
    
    This change makes a common flow for Client instance open during init
    and reset path. The Client subtask can handle both the cases instead of
    making a separate notify_client_of_open call.
    Also it may fix a bug during reset where the service task was leaking
    some memory and causing issues.
    
    Change-Id: I7232a32fd52b82e863abb54266fa83122f80a0cd
    Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c3e70edd7c2eed6acd234627a6007627f5c76e8e
Author: Xander Huff <xander.huff@ni.com>
Date:   Wed Aug 24 16:47:53 2016 -0500

    Revert "phy: IRQ cannot be shared"
    
    This reverts:
      commit 33c133cc7598 ("phy: IRQ cannot be shared")
    
    On hardware with multiple PHY devices hooked up to the same IRQ line, allow
    them to share it.
    
    Sergei Shtylyov says:
      "I'm not sure now what was the reason I concluded that the IRQ sharing
      was impossible... most probably I thought that the kernel IRQ handling
      code exited the loop over the IRQ actions once IRQ_HANDLED was returned
      -- which is obviously not so in reality..."
    
    Signed-off-by: Xander Huff <xander.huff@ni.com>
    Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 4f101c47791cdcb831b3ef1f831b1cc51e4fe03c
Author: Florian Fainelli <f.fainelli@gmail.com>
Date:   Wed Aug 24 11:01:20 2016 -0700

    net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
    
    We kept shadow copies of which interrupt sources we have enabled and
    disabled, but due to an order bug in how intrl2_mask_clear was defined,
    we could run into the following scenario:
    
    CPU0					CPU1
    intrl2_1_mask_clear(..)
    sets INTRL2_CPU_MASK_CLEAR
    					bcm_sf2_switch_1_isr
    					read INTRL2_CPU_STATUS and masks with stale
    					irq1_mask value
    updates irq1_mask value
    
    Which would make us loop again and again trying to process an interrupt
    we are not clearing since our copy of whether it was enabled before
    still indicates it was not. Fix this by updating the shadow copy first,
    and then unmasking at the HW level.
    
    Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
    Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
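
The ordering problem in the commit above can be replayed deterministically. This is a sketch in Python, not the driver: the `Switch` class, the 8-bit registers and the `between` hook (which stands in for the other CPU running in the window between the two writes) are assumptions made for the illustration.

```python
class Switch:
    """Toy device: hardware mask register plus the driver's shadow copy."""

    def __init__(self):
        self.hw_mask = 0xFF      # bit set = source masked in hardware
        self.shadow_mask = 0xFF  # driver's copy of the mask
        self.status = 0          # pending interrupt sources

    def line_asserted(self):
        return bool(self.status & ~self.hw_mask & 0xFF)

    def isr(self):
        """Sources the handler would service, judged by the shadow copy."""
        return self.status & ~self.shadow_mask & 0xFF


def unmask_hw_first(sw, bits, between):
    sw.hw_mask &= ~bits
    between()                    # window in which the ISR can run
    sw.shadow_mask &= ~bits


def unmask_shadow_first(sw, bits, between):
    sw.shadow_mask &= ~bits
    between()
    sw.hw_mask &= ~bits


def simulate(unmask):
    """Return what each ISR invocation serviced (0 = nothing)."""
    sw = Switch()
    sw.status = 0x01
    serviced = []

    def maybe_run_isr():
        if sw.line_asserted():
            serviced.append(sw.isr())

    unmask(sw, 0x01, maybe_run_isr)
    maybe_run_isr()
    return serviced
```

In this model, unmasking the hardware first lets the handler run once with a stale shadow copy and service nothing, while updating the shadow copy first avoids that empty invocation.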

commit 166ee5b87866de07a3e56c1b757f2b5cabba72a5
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Aug 24 09:39:02 2016 -0700

    qdisc: fix a module refcount leak in qdisc_create_dflt()
    
    Should qdisc_alloc() fail, we must release the module refcount
    we got right before.
    
    Fixes: 6da7c8fcbcbd ("qdisc: allow setting default queuing discipline")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: John Fastabend <john.r.fastabend@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit a5de125dd46c851fc962806135953c1bd0a0f0df
Author: Wei Yongjun <weiyongjun1@huawei.com>
Date:   Wed Aug 24 13:32:19 2016 +0000

    tipc: fix the error handling in tipc_udp_enable()
    
    Fix to return a negative error code in enable_mcast() error handling
    case, and release udp socket when necessary.
    
    Fixes: d0f91938bede ("tipc: add ip/udp media type")
    Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 4f34228b67246ae3b3ab1dc33b980c77c0650ef4
Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Mon Aug 15 16:02:20 2016 +0300

    Bluetooth: Fix hci_sock_recvmsg when MSG_TRUNC is not set
    
    Similar to bt_sock_recvmsg, MSG_TRUNC shall be checked using the original
    flags, not msg_flags.
    
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

commit 90a56f72edb088c678083c32d05936c7c8d9a948
Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Fri Aug 12 15:11:28 2016 +0300

    Bluetooth: Fix bt_sock_recvmsg when MSG_TRUNC is not set
    
    Commit b5f34f9420b50c9b5876b9a2b68e96be6d629054 attempted to introduce
    proper handling for MSG_TRUNC, but recv and its variants should still
    work as read if no flag is passed. Because the code may set MSG_TRUNC in
    msg->msg_flags, that field shall not be used, as it may cause the code to
    behave as if MSG_TRUNC is always set. Instead, this changes the code to
    use the flags parameter, which shall contain the original flags.
    
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

commit 16590a228109e2f318d2cc6466221134cfab723a
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Mon Aug 22 14:57:42 2016 -0400

    SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use
    
    Using NFSv4.1 on RDMA should be safe, so broaden the new checks in
    rpc_create().
    
    WARN_ON_ONCE is used, matching most other WARN call sites in clnt.c.
    
    Fixes: 39a9beab5acb ("rpc: share one xps between all backchannels")
    Fixes: d50039ea5ee6 ("nfsd4/rpc: move backchannel create logic...")
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

commit 45c91d808ff989d950e260dab9f89e8f4a3c9c2c
Author: Shaohua Li <shli@fb.com>
Date:   Mon Aug 22 21:14:02 2016 -0700

    raid5: avoid unnecessary bio data set
    
    bio_reset doesn't change bi_io_vec and bi_max_vecs, so we don't need to
    set them every time. bi_private will be set before the bio is
    dispatched.
    
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 5f9d1fde7d54a5d5fd8cccbee9c9c31474fcdcf2
Author: Shaohua Li <shli@fb.com>
Date:   Mon Aug 22 21:14:01 2016 -0700

    raid5: fix memory leak of bio integrity data
    
    Yi reported a memory leak of raid5 with DIF/DIX enabled disks. raid5
    doesn't alloc/free bio, instead it reuses bios. There are two issues in
    current code:
    1. the code calls bio_init (from
    init_stripe->raid5_build_block->bio_init) then bio_reset (ops_run_io).
    The bio is reused, so likely there is integrity data attached. bio_init
    will clear the pointer to the integrity data, so bio_reset can't release
    the data.
    2. bio_reset is called before dispatching bio. After bio is finished,
    it's possible we don't free bio's integrity data (eg, we don't call
    bio_reset again)
    Both issues will cause memory leak. The patch moves bio_init to stripe
    creation and bio_reset to bio end io. This will fix the two issues.
    
    Reported-by: Yi Zhang <yizhan@redhat.com>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 27028626b4b9022dcac23688e09ea43b36e1183c
Author: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Date:   Tue Aug 23 10:53:57 2016 +0200

    raid10: record correct address of bad block
    
    For a failed write request, record the block address on the device, not
    the block address in the array.
    
    Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 0f6187dbe542d71ace8ba0908954b0f4f8a30a1e
Author: Wei Yongjun <weiyongjun1@huawei.com>
Date:   Sun Aug 21 14:42:25 2016 +0000

    md-cluster: fix error return code in join()
    
    Fix to return error code -ENOMEM from the lockres_init() error
    handling case instead of 0, as done elsewhere in this function.
    
    Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 486b0f7bcd64be027535811ef44195bc1027fbd3
Author: Song Liu <songliubraving@fb.com>
Date:   Fri Aug 19 15:34:01 2016 -0700

    r5cache: set MD_JOURNAL_CLEAN correctly
    
    Currently, the code sets MD_JOURNAL_CLEAN when the array has
    MD_FEATURE_JOURNAL and the recovery_cp is MaxSector. The array
    will be MD_JOURNAL_CLEAN even if the journal device is missing.
    
    With this patch, MD_JOURNAL_CLEAN is only set when the journal
    device is present.
    
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Shaohua Li <shli@fb.com>

commit 51af96b53469f3b8cfcfe0504d0ff87239175b78
Author: Yotam Gigi <yotamg@mellanox.com>
Date:   Wed Aug 24 11:18:52 2016 +0200

    mlxsw: router: Enable neighbors to be created on stacked devices
    
    Make the function mlxsw_router_neigh_construct search the rif according
    to the neighbour dev rather than the dev that was passed to the ndo, thus
    allowing neighbours to be created on stacked devices.
    
    Fixes: 6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table")
    Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
    Reviewed-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit f888f58795b640442165e60a6fa93e8e623d01a5
Author: Ido Schimmel <idosch@mellanox.com>
Date:   Wed Aug 24 11:18:51 2016 +0200

    mlxsw: spectrum: Add missing flood to router port
    
    In case we have a layer 3 interface on top of a bridge (VLAN / FID RIF),
    then we should flood the following packet types to the router:
    
    * Broadcast: If DIP is the broadcast address of the interface, then we
    need to be able to get it to CPU by trapping it following route lookup.
    
    * Reserved IP multicast (224.0.0.X): Some control packets (e.g. OSPF)
    use this range and are trapped in the router block.
    
    Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit dbb50887c8f619fc5c3489783ebc3122bc134a31
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Wed Jul 27 11:40:14 2016 -0700

    Bluetooth: split sk_filter in l2cap_sock_recv_cb
    
    During an audit for sk_filter(), we found that rx_busy_skb handling
    in l2cap_sock_recv_cb() and l2cap_sock_recvmsg() looks not quite as
    intended.
    
    The assumption from commit e328140fdacb ("Bluetooth: Use event-driven
    approach for handling ERTM receive buffer") is that errors returned
    from sock_queue_rcv_skb() are due to receive buffer shortage. However,
    nothing should prevent doing a setsockopt() with SO_ATTACH_FILTER on
    the socket, that could drop some of the incoming skbs when handled in
    sock_queue_rcv_skb().
    
    In that case sock_queue_rcv_skb() will return with -EPERM, propagated
    from sk_filter() and if in L2CAP_MODE_ERTM mode, wrong assumption was
    that we failed due to receive buffer being full. From that point onwards,
    due to the to-be-dropped skb being held in rx_busy_skb, we cannot make
    any forward progress as rx_busy_skb is never cleared from l2cap_sock_recvmsg(),
    due to the filter drop verdict over and over coming from sk_filter().
    Meanwhile, in l2cap_sock_recv_cb() all new incoming skbs are being
    dropped due to rx_busy_skb being occupied.
    
    Instead, just use __sock_queue_rcv_skb() where an error really tells that
    there's a receive buffer issue. Split the sk_filter() and enable it for
    non-segmented modes at queuing time since at this point in time the skb has
    already been through the ERTM state machine and it has been acked, so dropping
    is not allowed. Instead, for ERTM and streaming mode, call sk_filter() in
    l2cap_data_rcv() so the packet can be dropped before the state machine sees it.
    
    Fixes: e328140fdacb ("Bluetooth: Use event-driven approach for handling ERTM receive buffer")
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Acked-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

commit 9afee94939e3eda4c8bf239f7727cb56e158c976
Author: Frederic Dalleau <frederic.dalleau@collabora.co.uk>
Date:   Tue Aug 23 07:59:19 2016 +0200

    Bluetooth: Fix memory leak at end of hci requests
    
    In hci_req_sync_complete the event skb is referenced in hdev->req_skb.
    It is used (via hci_req_run_skb) from either __hci_cmd_sync_ev which will
    pass the skb to the caller, or __hci_req_sync which leaks.
    
    unreferenced object 0xffff880005339a00 (size 256):
      comm "kworker/u3:1", pid 1011, jiffies 4294671976 (age 107.389s)
      backtrace:
        [<ffffffff818d89d9>] kmemleak_alloc+0x49/0xa0
        [<ffffffff8116bba8>] kmem_cache_alloc+0x128/0x180
        [<ffffffff8167c1df>] skb_clone+0x4f/0xa0
        [<ffffffff817aa351>] hci_event_packet+0xc1/0x3290
        [<ffffffff8179a57b>] hci_rx_work+0x18b/0x360
        [<ffffffff810692ea>] process_one_work+0x14a/0x440
        [<ffffffff81069623>] worker_thread+0x43/0x4d0
        [<ffffffff8106ead4>] kthread+0xc4/0xe0
        [<ffffffff818dd38f>] ret_from_fork+0x1f/0x40
        [<ffffffffffffffff>] 0xffffffffffffffff
    
    Signed-off-by: Frédéric Dalleau <frederic.dalleau@collabora.co.uk>
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

commit 901d3d4fee83e9407d91e7178048e2fed6c91f6b
Author: Li Zhong <zhong@linux.vnet.ibm.com>
Date:   Wed Aug 24 15:34:40 2016 +0800

    crypto: vmx - fix null dereference in p8_aes_xts_crypt
    
    walk.iv is not assigned a value in blkcipher_walk_init, so iv is left
    uninitialized. It is possibly a null value (as shown below), which is then
    used by aes_p8_encrypt.
    
    This patch moves iv = walk.iv after blkcipher_walk_virt, in which walk.iv is set.
    
    [17856.268050] Unable to handle kernel paging request for data at address 0x00000000
    [17856.268212] Faulting instruction address: 0xd000000002ff04bc
    7:mon> t
    [link register   ] d000000002ff47b8 p8_aes_xts_crypt+0x168/0x2a0 [vmx_crypto]   (938)
    [c000000013b77960] d000000002ff4794 p8_aes_xts_crypt+0x144/0x2a0 [vmx_crypto] (unreliable)
    [c000000013b77a70] c000000000544d64 skcipher_decrypt_blkcipher+0x64/0x80
    [c000000013b77ac0] d000000003c0175c crypt_convert+0x53c/0x620 [dm_crypt]
    [c000000013b77ba0] d000000003c043fc kcryptd_crypt+0x3cc/0x440 [dm_crypt]
    [c000000013b77c50] c0000000000f3070 process_one_work+0x1e0/0x590
    [c000000013b77ce0] c0000000000f34c8 worker_thread+0xa8/0x660
    [c000000013b77d80] c0000000000fc0b0 kthread+0x110/0x130
    [c000000013b77e30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c
    
    Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit 10bb087ce381c812cd81a65ffd5e6f83e6399291
Author: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Date:   Thu Aug 18 19:53:36 2016 +0100

    crypto: qat - fix aes-xts key sizes
    
    Increase the value of the supported key sizes for qat_aes_xts.
    An aes-xts key consists of two keys of equal size concatenated.
    
    Fixes: def14bfaf30d ("crypto: qat - add support for ctr(aes) and xts(aes)")
    Cc: stable@vger.kernel.org
    Reported-by: Wenqian Yu <wenqian.yu@intel.com>
    Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit f74bdd4cb5d0d4c3e89919e850e0bbb8789f32f9
Author: Fabian Frederick <fabf@skynet.be>
Date:   Tue Aug 16 21:49:45 2016 +0200

    hwrng: mxc-rnga - Fix Kconfig dependency
    
    We can directly depend on SOC_IMX31 since commit c9ee94965dce
    ("ARM: imx: deconstruct mxc_rnga initialization")
    
    Since that commit, CONFIG_HW_RANDOM_MXC_RNGA could not be switched on
    with unknown symbol ARCH_HAS_RNGA and mxc-rnga.o can't be generated with
    ARCH=arm make M=drivers/char/hw_random
    Previously, HW_RANDOM_MXC_RNGA required ARCH_HAS_RNGA
    which was based on IMX_HAVE_PLATFORM_MXC_RNGA  && ARCH_MXC.
    IMX_HAVE_PLATFORM_MXC_RNGA  was based on SOC_IMX31.
    
    Fixes: c9ee94965dce ("ARM: imx: deconstruct mxc_rnga initialization")
    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Acked-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit d7226c7a4dd19929d6df4ae04698da2fcf6f875a
Author: David Ahern <dsa@cumulusnetworks.com>
Date:   Tue Aug 23 21:05:27 2016 -0700

    net: diag: Fix refcnt leak in error path destroying socket
    
    inet_diag_find_one_icsk takes a reference to a socket that is not
    released if sock_diag_destroy returns an error. Fix by changing
    tcp_diag_destroy to manage the refcnt for all cases and remove
    the sock_put calls from tcp_abort.
    
    Fixes: c1e64e298b8ca ("net: diag: Support destroying TCP sockets")
    Reported-by: Lorenzo Colitti <lorenzo@google.com>
    Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 7b996243fab46092fb3a29c773c54be8152366e4
Author: Soheil Hassas Yeganeh <soheil@google.com>
Date:   Tue Aug 23 18:22:33 2016 -0400

    tun: fix transmit timestamp support
    
    Instead of using sock_tx_timestamp, use skb_tx_timestamp to record
    software transmit timestamp of a packet.
    
    sock_tx_timestamp resets and overrides the tx_flags of the skb.
    The function is intended to be called from within the protocol
    layer when creating the skb, not from a device driver. This is
    inconsistent with other drivers and will cause issues for TCP.
    
    In TCP, we intend to sample the timestamps for the last byte
    for each sendmsg/sendpage. For that reason, tcp_sendmsg calls
    tcp_tx_timestamp only with the last skb that it generates.
    For example, if a 128KB message is split into two 64KB packets
    we want to sample the SND timestamp of the last packet. The current
    code in the tun driver, however, will result in sampling the SND
    timestamp for both packets.
    
    Also, when the last packet is split into smaller packets for
    retransmission (see tcp_fragment), the tun driver will record
    timestamps for all of the retransmitted packets and not only the
    last packet.
    
    Fixes: eda297729171 (tun: Support software transmit time stamping.)
    Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: Francis Yan <francisyyan@google.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
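
The 128KB example in the tun commit above can be sketched as follows. This is only an illustration of "flag the last segment for timestamping", not the tun or TCP code; the function name `segment_with_timestamp` and the tuple layout are hypothetical.

```python
def segment_with_timestamp(payload_len, seg_size):
    """Split a message into segments and flag only the last one for a
    transmit timestamp. Returns a list of (length, wants_tx_timestamp)."""
    segments = []
    offset = 0
    while offset < payload_len:
        length = min(seg_size, payload_len - offset)
        offset += length
        # Only the segment carrying the final byte is flagged.
        segments.append((length, offset == payload_len))
    return segments


# A 128KB message split into two 64KB segments, as in the commit message.
segments = segment_with_timestamp(128 * 1024, 64 * 1024)
```

Under this model a driver that flags every segment it sees would produce two samples for this message, which is the behaviour the commit describes as wrong.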

commit 75d855a5e93e6f3d9b37a8719d69a5318f051453
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Aug 23 09:57:51 2016 -0700

    udp: get rid of SLAB_DESTROY_BY_RCU allocations
    
    After commit ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
    we do not need this special allocation mode anymore, even if it is
    harmless.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 232cb53a45965f8789fbf0a9a1962f8c67ab1a3c
Author: Lance Richardson <lrichard@redhat.com>
Date:   Tue Aug 23 11:40:52 2016 -0400

    sctp: fix overrun in sctp_diag_dump_one()
    
    The function sctp_diag_dump_one() currently performs a memcpy()
    of 64 bytes from a 16 byte field into another 16 byte field. Fix
    by using sizeof to obtain the correct size instead of a hard-coded
    constant.
    
    Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
    Signed-off-by: Lance Richardson <lrichard@redhat.com>
    Reviewed-by: Xin Long <lucien.xin@gmail.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
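
A minimal sketch of the "derive the length from the field" idea in the sctp commit above, using Python's `ctypes` to model fixed-size fields. The `Source` and `Destination` layouts and the `copy_addr` helper are hypothetical and do not mirror the kernel structures.

```python
import ctypes

ADDR_LEN = 16  # size of the address field quoted in the commit message


class Source(ctypes.Structure):
    # Hypothetical layout: a 16-byte address followed by unrelated data.
    _fields_ = [("addr", ctypes.c_ubyte * ADDR_LEN),
                ("unrelated", ctypes.c_ubyte * 48)]


class Destination(ctypes.Structure):
    # Hypothetical layout: a 16-byte address followed by a guard region.
    _fields_ = [("addr", ctypes.c_ubyte * ADDR_LEN),
                ("guard", ctypes.c_ubyte * 48)]


def copy_addr(dst, src):
    # Take the length from the destination field rather than a constant,
    # so the copy can never run past the end of that field.
    ctypes.memmove(dst.addr, src.addr, ctypes.sizeof(dst.addr))


src = Source()
ctypes.memset(ctypes.byref(src), 0xAB, ctypes.sizeof(src))
dst = Destination()  # zero-initialised
copy_addr(dst, src)
```

With a hard-coded length of 64, the same `memmove` would also have overwritten the 48 bytes after `addr`; here the guard region stays zeroed.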

commit a8184003c0bb1d6362c2af76c560b3caae6832cb
Author: Rabin Vincent <rabinv@axis.com>
Date:   Tue Aug 23 16:31:28 2016 +0200

    dwc_eth_qos: fix interrupt enable race
    
    We currently enable interrupts before we enable NAPI. If an RX interrupt
    hits before we have enabled NAPI, the NAPI callback is never called and
    we leave the hardware with RX interrupts disabled, which of course means
    we never handle received packets.  Fix this by moving the interrupt
    enable to after we've enabled NAPI and the reclaim tasklet.
    
    Fixes: cd5e41234729 ("dwc_eth_qos: do phy_start before resetting hardware")
    Signed-off-by: Rabin Vincent <rabinv@axis.com>
    Signed-off-by: Lars Persson <larper@axis.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 53080fe9c451e7625e71b91c384e7bef1be72b00
Author: Fabio Estevam <fabio.estevam@nxp.com>
Date:   Tue Aug 23 09:48:20 2016 -0300

    net: lpc_eth: Check clk_prepare_enable() error
    
    clk_prepare_enable() may fail, so we should check its return
    value and propagate it in case of failure.
    
    While at it, replace __lpc_eth_clock_enable() with a plain
    clk_prepare_enable/clk_disable_unprepare() call in order to
    simplify the code.
    
    Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
    Acked-by: Vladimir Zapolskiy <vz@mleia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 1bc261fabe866c4cdc97f52319eaa0c7ee31026e
Author: Jamie Lentin <jm@lentin.co.uk>
Date:   Mon Aug 22 22:47:08 2016 +0100

    net: mv88e6xxx: Fix ingress rate removal for mv6131 chips
    
    The PORT_RATE_CONTROL register works differently on 88e6095/6095f/6131
    in comparison to 6123/61/65, and 0x0 disables. The distinction was lost
    between Linux 4.1 and 4.2.
    
    Signed-off-by: Jamie Lentin <jm@lentin.co.uk>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit f64f14820e2deb5db056a05d7672ee2b1c6290e5
Author: Xander Huff <xander.huff@ni.com>
Date:   Mon Aug 22 15:57:16 2016 -0500

    phy: micrel: Reenable interrupts during resume for ksz9031
    
    Like the ksz8081, the ksz9031 has the behavior where it will clear the
    interrupt enable bits when leaving power down. This takes advantage of the
    solution provided by f5aba91.
    
    Signed-off-by: Xander Huff <xander.huff@ni.com>
    Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 20a2b49fc538540819a0c552877086548cff8d8d
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Aug 22 11:31:10 2016 -0700

    tcp: properly scale window in tcp_v[46]_reqsk_send_ack()
    
    When sending an ack in SYN_RECV state, we must scale the offered
    window if wscale option was negotiated and accepted.
    
    Tested:
     The following packetdrill test demonstrates the issue:
    
    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    
    +0 bind(3, ..., ...) = 0
    +0 listen(3, 1) = 0
    
    // Establish a connection.
    +0 < S 0:0(0) win 20000 <mss 1000,sackOK,wscale 7, nop, TS val 100 ecr 0>
    +0 > S. 0:0(0) ack 1 win 28960 <mss 1460,sackOK, TS val 100 ecr 100, nop, wscale 7>
    
    +0 < . 1:11(10) ack 1 win 156 <nop,nop,TS val 99 ecr 100>
    // check that window is properly scaled !
    +0 > . 1:1(0) ack 1 win 226 <nop,nop,TS val 200 ecr 100>
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Yuchung Cheng <ycheng@google.com>
    Cc: Neal Cardwell <ncardwell@google.com>
    Acked-by: Yuchung Cheng <ycheng@google.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
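
The window value expected in the packetdrill trace above can be reproduced with a little arithmetic. This is a sketch that assumes the 28960-byte window from the SYN-ACK is the one being advertised in the ACK; `scaled_window` is a hypothetical helper, not a kernel function.

```python
def scaled_window(window_bytes, wscale):
    """Value placed in the 16-bit TCP window field once window scaling
    has been negotiated: the byte count right-shifted by the scale."""
    return min(window_bytes >> wscale, 0xFFFF)


# wscale 7 was negotiated in the handshake shown in the trace.
advertised = scaled_window(28960, 7)

# What the peer reconstructs when it applies the same scale factor.
reconstructed = advertised << 7
```

Here `advertised` is 226, matching the `win 226` line in the test. Without the shift, the field would carry the raw byte count, and the peer would multiply it by 128 again.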

commit 6c389fc931bcda88940c809f752ada6d7799482c
Author: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Date:   Mon Aug 22 15:58:12 2016 +0200

    gianfar: fix size of scatter-gathered frames
    
    The current scatter-gather logic in gianfar is flawed, since
    it does not consider that the eTSEC's RxBD 'Data Length' field
    is context dependent: for the last fragment it contains the full
    frame size, while for the other fragments it contains the
    fragment size, which equals the value written to register MRBLR.
    
    This causes data corruption as soon as the hardware starts
    to fragment receiving frames. As a result, the size of
    fragmented frames is increased by
    (nr_frags - 1) * MRBLR
    
    We first noticed this issue working with DSA, where an ICMP
    request sized 1472 bytes causes the scatter-gather logic to
    kick in. The full Ethernet frame (1518) gets increased by
    DSA (4), GMAC_FCB_LEN (8), and FSL_GIANFAR_DEV_HAS_TIMER
    (priv->padding=8) to a total of 1538 octets, which is
    fragmented by the hardware and reconstructed by the driver
    to a 3074 octet frame.
    
    This patch fixes the problem by adjusting the size of
    the last fragment.
    
    It was tested by setting MRBLR to different multiples of
    64, proving correct scatter-gather operation on frames
    with up to 9000 octets in size.
    
    Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
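
The numbers in the gianfar commit above can be checked with a short worked example. It assumes MRBLR is 1536, the value quoted in the neighbouring gianfar commit; the function names are hypothetical.

```python
MRBLR = 1536  # assumed RX buffer size, taken from the neighbouring commit


def nr_frags(frame_len, mrblr=MRBLR):
    # Number of RX buffers the hardware needs for the frame (ceiling division).
    return -(-frame_len // mrblr)


def buggy_frame_size(frame_len, mrblr=MRBLR):
    # Pre-fix behaviour: the last fragment's 'Data Length' (the full frame
    # size) is added on top of the earlier fragments' sizes.
    return frame_len + (nr_frags(frame_len, mrblr) - 1) * mrblr


def last_fragment_size(frame_len, mrblr=MRBLR):
    # Post-fix idea: subtract what the earlier fragments already delivered.
    return frame_len - (nr_frags(frame_len, mrblr) - 1) * mrblr


# Ethernet frame + DSA tag + GMAC_FCB_LEN + timer padding, as in the message.
frame_len = 1518 + 4 + 8 + 8
```

For this frame, `frame_len` is 1538, two fragments are needed, the pre-fix size comes out at 3074 octets as reported, and the last fragment really holds only 2 octets.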

commit b323431bc017e9862870cbbac004774c769ee112
Author: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Date:   Mon Aug 22 15:56:38 2016 +0200

    gianfar: prevent fragmentation in DSA environments
    
    The eTSEC register MRBLR defines the maximum space in
    the RX buffers and is set to 1536 by gianfar. This
    reasonably covers the common use case where the MTU
    is kept at default 1500. In that case, the largest
    Ethernet frame size of 1518 plus an optional
    GMAC_FCB_LEN of 8, and an additional padding of 8
    to handle FSL_GIANFAR_DEV_HAS_TIMER totals 1534
    and nicely fits within the chosen MRBLR.
    
    Alas, if the eTSEC is attached to a DSA enabled switch,
    the (E)DSA header extension (4 or 8 bytes) causes every
    maximum sized frame to be fragmented by the hardware.
    
    This patch increases the maximum RX buffer size by 8
    and rounds up to the next multiple of 64, which the
    hardware defines as the RX buffer granularity.
    
    Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
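
A sketch of the buffer-size arithmetic described in the commit above. It follows the wording of the commit message only (add 8, round up to a multiple of 64) and is not taken from the driver source; the helper name `round_up` is an assumption.

```python
def round_up(value, multiple):
    # Smallest multiple of `multiple` that is >= value.
    return -(-value // multiple) * multiple


OLD_MRBLR = 1536
new_mrblr = round_up(OLD_MRBLR + 8, 64)

# Largest frame from the message: 1518 + GMAC_FCB_LEN (8) + timer padding (8),
# plus the larger (E)DSA header extension of 8 bytes.
largest_frame = 1518 + 8 + 8 + 8
```

With these numbers the largest frame is 1542 octets, which overflows the old 1536-byte buffer but fits in the new one.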

commit e83c6744e81abc93a20d0eb3b7f504a176a6126a
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Aug 23 13:59:33 2016 -0700

    udp: fix poll() issue with zero sized packets
    
    Laura tracked a poll() [and friends] regression caused by commit
    e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    
    udp_poll() needs to know if there is a valid packet in receive queue,
    even if its payload length is 0.
    
    Change first_packet_length() to return a signed int, and use -1
    as the indication of an empty queue.
    
    Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
    Reported-by: Laura Abbott <labbott@redhat.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Tested-by: Laura Abbott <labbott@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
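
The sentinel described in the udp commit above can be illustrated with a toy receive queue. This is a sketch of the idea only; the Python functions reuse the kernel function's name for readability but are hypothetical stand-ins.

```python
from collections import deque


def first_packet_length(queue):
    # -1 means "queue is empty"; 0 is a valid zero-sized packet.
    return len(queue[0]) if queue else -1


def readable(queue):
    # A poll()-style check: any queued packet counts, even an empty one.
    return first_packet_length(queue) >= 0


empty_queue = deque()
zero_length_queue = deque([b""])
```

If 0 were used for both "no packet" and "zero-sized packet", `readable` could not tell the two queues apart, which is the regression the commit fixes.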

commit 41963c10c47a35185e68cb9049f7a3493c94d2d7
Author: Benjamin Coddington <bcodding@redhat.com>
Date:   Mon Aug 22 14:11:16 2016 -0400

    pnfs/blocklayout: update last_write_offset atomically with extents
    
    Block/SCSI layout write completion may add committable extents to the
    extent tree before updating the layout's last-written byte under the inode
    lock.  If a sync happens before this value is updated, then
    prepare_layoutcommit may find and encode these extents which would produce
    a LAYOUTCOMMIT request whose encoded extents are larger than the request's
    loca_length.
    
    Fix this by using a last-written byte value that is updated atomically with
    the extent tree so that committable extents always match.
    
    Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

commit b88fa69eaa8649f11828158c7b65c4bcd886ebd5
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Tue Aug 23 11:19:33 2016 -0400

    pNFS: The client must not do I/O to the DS if its lease has expired
    
    Ensure that the client conforms to the normative behaviour described in
    RFC5661 Section 12.7.2: "If a client believes its lease has expired,
    it MUST NOT send I/O to the storage device until it has validated its
    lease."
    
    So ensure that we wait for the lease to be validated before using
    the layout.
    
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org # v3.20+

commit 28a10c426e81afc88514bca8e73affccf850fdf6
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Mon Aug 22 07:10:20 2016 -0400

    net sched: fix encoding to use real length
    
    Encoding of the metadata was using the padded length as opposed to
    the real length of the data, which is a bug per the specification.
    This has not been an issue to date because all metadata specified
    so far have been 32 bit, where aligned and data length are the same width.
    This also includes a bug fix for validating the length of a u16 field.
    But since there is no metadata of size u16 yet, we are fine to include it
    here.
    
    While at it get rid of magic numbers.
    
    Fixes: ef6980b6becb ("net sched: introduce IFE action")
    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
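
A sketch of the difference between real and padded length in a TLV encoding, as discussed in the commit above. The layout here (2-byte type, 2-byte length that covers the header plus the real data, value padded to 4 bytes) is an assumption for illustration and is not taken from the IFE specification or the kernel source.

```python
import struct

HDR_LEN = 4  # assumed: 2-byte type + 2-byte length
ALIGN = 4    # assumed alignment of each TLV


def encode_tlv(attr_type, value):
    """Encode one TLV: the length field holds the real length, while the
    buffer itself is zero-padded up to the alignment boundary."""
    real_len = HDR_LEN + len(value)
    padded_len = (real_len + ALIGN - 1) & ~(ALIGN - 1)
    header = struct.pack("!HH", attr_type, real_len)
    return (header + value).ljust(padded_len, b"\x00")


u16_tlv = encode_tlv(1, b"\x12\x34")          # 2 bytes of data, 2 of padding
u32_tlv = encode_tlv(2, b"\x00\x00\x00\x2a")  # already aligned
```

For the 32-bit value the real and padded lengths coincide, which is why the bug stayed hidden; for the 16-bit value the length field says 6 while the TLV occupies 8 bytes.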

commit 4870e704d901602e4ae5de462c4e65732cf2ed6c
Author: Yuval Mintz <Yuval.Mintz@qlogic.com>
Date:   Mon Aug 22 12:03:29 2016 +0300

    qed: FLR of active VFs might lead to FW assert
    
    Driver never bothered marking the VF's vport with the VF's sw_fid.
    As a result, FLR flows are not going to clean those vports.
    
    If the vport was active when FLRed, re-activating it would lead
    to a FW assertion.
    
    Fixes: dacd88d6f6851 ("qed: IOV l2 functionality")
    Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c0451fe1f27b815b3f400df2a63b9aecf589b7b0
Author: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Date:   Sun Aug 21 11:22:32 2016 +0300

    net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset
    
    In b8247f095e,
    
       "net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs"
    
    gso skbs arriving from an ingress interface that go through UDP
    tunneling, are allowed to be fragmented if the resulting encapsulated
    segments exceed the dst mtu of the egress interface.
    
    This aligned the behavior of gso skbs to non-gso skbs going through udp
    encapsulation path.
    
    However the non-gso vs gso anomaly is present also in the following
    cases of a GRE tunnel:
     - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
       (e.g. OvS vport-gre with df_default=false)
     - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set
    
    In both of the above cases, the non-gso skbs get fragmented, whereas the
    gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
    as they don't go through the segment+fragment code path.
    
    Fix: Setting IPSKB_FRAG_SEGS if the tunnel specified IP_DF bit is NOT set.
    
    Tunnels that do set IP_DF, will not go to fragmentation o…
@khuey
Collaborator

khuey commented Sep 26, 2016

This got fixed \o/

@khuey khuey closed this as completed Sep 26, 2016
krasCGQ pushed a commit to KudProject/kernel_asus_sdm660 that referenced this issue Nov 28, 2018
This fixes a ptrace vs fatal pending signals bug as manifested in
seccomp now that seccomp was reordered to happen after ptrace. The
short version is that seccomp should not attempt to call do_exit()
while fatal signals are pending under a tracer. The existing code was
trying to be as defensively paranoid as possible, but it now ends up
confusing ptrace. Instead, the syscall can just be skipped (which solves
the original concern that the do_exit() was addressing) and normal signal
handling, tracer notification, and process death can happen.

Paraphrasing from the original bug report:

If a tracee task is in a PTRACE_EVENT_SECCOMP trap, or has been resumed
after such a trap but not yet been scheduled, and another task in the
thread-group calls exit_group(), then the tracee task exits without the
ptracer receiving a PTRACE_EVENT_EXIT notification. Test case here:
https://gist.github.com/khuey/3c43ac247c72cef8c956ca73281c9be7

The bug happens because when __seccomp_filter() detects
fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
that task is descheduled, __schedule() notices that there is a fatal
signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
That prevents the ptracer's waitpid() from returning the ptrace event.
A more detailed analysis is here:
rr-debugger/rr#1762 (comment).

Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35efb8de4 ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 485a252a5559b45d7df04c819ec91177c62c270b)

Bug: 119769499
Change-Id: I444e69093e88d58587b4d5c4f2d777985591c32d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
travarilo pushed a commit to travarilo/kernel_asus_sdm660 that referenced this issue Nov 28, 2018
adekmaulana pushed a commit to adekmaulana/android_kernel_xiaomi_sdm660 that referenced this issue Nov 28, 2018
amery pushed a commit to linux-sunxi/linux-sunxi that referenced this issue Nov 29, 2018
ahmedradaideh pushed a commit to ahmedradaideh/OnePlus3T that referenced this issue Nov 29, 2018
Signed-off-by: ahmedradaideh <ahmed.radaideh@gmail.com>
freak07 pushed a commit to freak07/Kirisakura_Taimen_8.1.0 that referenced this issue Dec 2, 2018
(cherry picked from commit a26c91b6d2b97fb0517849f789d9628b53eb7a94)
blissgerrit pushed a commit to Jackeagle/kernel-msm-4.4 that referenced this issue Dec 12, 2018
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
hzj5158888 pushed a commit to hzj5158888/android_kernel_xiaomi_msm8998 that referenced this issue Dec 16, 2018
adekmaulana pushed a commit to adekmaulana/android_kernel_xiaomi_sdm660 that referenced this issue Jan 9, 2019
This fixes a ptrace vs fatal pending signals bug as manifested in
seccomp now that seccomp was reordered to happen after ptrace. The
short version is that seccomp should not attempt to call do_exit()
while fatal signals are pending under a tracer. The existing code was
trying to be as defensively paranoid as possible, but it now ends up
confusing ptrace. Instead, the syscall can just be skipped (which solves
the original concern that the do_exit() was addressing) and normal signal
handling, tracer notification, and process death can happen.

Paraphrasing from the original bug report:

If a tracee task is in a PTRACE_EVENT_SECCOMP trap, or has been resumed
after such a trap but not yet been scheduled, and another task in the
thread-group calls exit_group(), then the tracee task exits without the
ptracer receiving a PTRACE_EVENT_EXIT notification. Test case here:
https://gist.github.com/khuey/3c43ac247c72cef8c956ca73281c9be7

The bug happens because when __seccomp_filter() detects
fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
that task is descheduled, __schedule() notices that there is a fatal
signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
That prevents the ptracer's waitpid() from returning the ptrace event.
A more detailed analysis is here:
rr-debugger/rr#1762 (comment).

Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35ef ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 485a252)

Bug: 119769499
Change-Id: I444e69093e88d58587b4d5c4f2d777985591c32d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
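
For readers unfamiliar with how a tracer recognizes the notification discussed above, here is a minimal sketch of decoding a ptrace event from a `waitpid()` status word. It is an illustration only, not code from rr or the kernel. The helper names `ptrace_event` and `make_event_stop_status` are hypothetical; the constants are the Linux values of `SIGTRAP`, `PTRACE_EVENT_EXIT` and `PTRACE_EVENT_SECCOMP`, and the bit layout follows the convention documented in ptrace(2), where `status >> 8 == (SIGTRAP | (event << 8))`.

```python
# Linux constant values (assumed here rather than read from system headers).
SIGTRAP = 5
PTRACE_EVENT_EXIT = 6
PTRACE_EVENT_SECCOMP = 7


def ptrace_event(status: int) -> int:
    """Return the ptrace event number encoded in a waitpid() status, or 0.

    Hypothetical helper: a ptrace event stop is reported as a stop with
    signal SIGTRAP and the event number stored in bits 16 and up.
    """
    if (status & 0xFF) != 0x7F:             # low byte 0x7f marks a stopped task
        return 0
    if ((status >> 8) & 0xFF) != SIGTRAP:   # event stops always use SIGTRAP
        return 0
    return status >> 16


def make_event_stop_status(event: int) -> int:
    """Build the status word a tracer would see for an event stop (for demo)."""
    return (event << 16) | (SIGTRAP << 8) | 0x7F


if __name__ == "__main__":
    status = make_event_stop_status(PTRACE_EVENT_EXIT)
    print(ptrace_event(status) == PTRACE_EVENT_EXIT)
```

In the failure described in the commit message, the tracer's `waitpid()` never returns a status for which a check like this would report `PTRACE_EVENT_EXIT`, because the tracee has already left `TASK_TRACED`.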
UtsavBalar1231 pushed a commit to UtsavBalar1231/immensity_kernel_motorola_msm8953 that referenced this issue Jan 31, 2019
freak07 pushed a commit to freak07/Kirisakura-Harmony_Pixel that referenced this issue Feb 7, 2019
UtsavBalar1231 pushed a commit to UtsavBalar1231/KUNT-KERNEL that referenced this issue Feb 9, 2019
bdashore3 pushed a commit to King-Kernel/KingKernel-marlin-old that referenced this issue Feb 17, 2019
UtsavBalar1231 pushed a commit to RevengeOS-Devices/android_kernel_motorola_msm8953 that referenced this issue Mar 10, 2019
Tiktodz pushed a commit to Tiktodz/android_kernel_asus_sdm636 that referenced this issue Jul 22, 2023
Tiktodz pushed a commit to Tiktodz/android_kernel_asus_sdm636 that referenced this issue Jul 22, 2023
Tiktodz pushed a commit to Tiktodz/android_kernel_asus_sdm636 that referenced this issue Jul 22, 2023
iqba78 pushed a commit to iqba78/android_kernel_xiaomi_sdm660_southwest that referenced this issue Jul 23, 2023
iqba78 pushed a commit to iqba78/android_kernel_xiaomi_sdm660_southwest that referenced this issue Jul 23, 2023
iqba78 pushed a commit to iqba78/android_kernel_xiaomi_sdm660_southwest that referenced this issue Jul 23, 2023
Tiktodz pushed a commit to Tiktodz/android_kernel_asus_sdm636 that referenced this issue Jul 23, 2023
strongreasons pushed a commit to strongreasons/android_kernel_asus_sdm660 that referenced this issue Jul 28, 2023
log1cs pushed a commit to nokia-msm8998/android_kernel_nokia_msm8998 that referenced this issue Jul 31, 2023
iqba78 pushed a commit to iqba78/android_kernel_xiaomi_sdm660_southwest that referenced this issue Aug 2, 2023
iqba78 pushed a commit to iqba78/android_kernel_xiaomi_sdm660_southwest that referenced this issue Aug 2, 2023
This fixes a ptrace vs fatal pending signals bug as manifested in
seccomp now that seccomp was reordered to happen after ptrace. The
short version is that seccomp should not attempt to call do_exit()
while fatal signals are pending under a tracer. The existing code was
trying to be as defensively paranoid as possible, but it now ends up
confusing ptrace. Instead, the syscall can just be skipped (which solves
the original concern that the do_exit() was addressing) and normal signal
handling, tracer notification, and process death can happen.

Paraphrasing from the original bug report:

If a tracee task is in a PTRACE_EVENT_SECCOMP trap, or has been resumed
after such a trap but not yet been scheduled, and another task in the
thread-group calls exit_group(), then the tracee task exits without the
ptracer receiving a PTRACE_EVENT_EXIT notification. Test case here:
https://gist.github.com/khuey/3c43ac247c72cef8c956ca73281c9be7

The bug happens because when __seccomp_filter() detects
fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
that task is descheduled, __schedule() notices that there is a fatal
signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
That prevents the ptracer's waitpid() from returning the ptrace event.
A more detailed analysis is here:
rr-debugger/rr#1762 (comment).

Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35ef ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 485a252)

Bug: 119769499
Change-Id: I444e69093e88d58587b4d5c4f2d777985591c32d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Kneba <abenkenary3@gmail.com>
Signed-off-by: Tiktodz <ewprjkt@proton.me>
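
To make the symptom described in the commit message above concrete, here is a minimal sketch of how a tracer could notice it from its own `waitpid()` results. It is an illustration only, not code from rr or the kernel. The helper names (`ptrace_event`, `event_stop`, `missed_exit_event`) are hypothetical, the two constants are copied from `<linux/ptrace.h>`, and the sketch assumes the Linux wait-status encoding and a tracer that requested exit events with `PTRACE_O_TRACEEXIT`.

```python
import os
import signal

# Values from <linux/ptrace.h>; the Python standard library does not export them.
PTRACE_EVENT_EXIT = 6
PTRACE_EVENT_SECCOMP = 7


def ptrace_event(status):
    """Return the PTRACE_EVENT_* code carried by a waitpid() status, or 0."""
    # Event stops are reported as a SIGTRAP stop with the event in bits 16 and up.
    if not os.WIFSTOPPED(status) or os.WSTOPSIG(status) != signal.SIGTRAP:
        return 0
    return status >> 16


def event_stop(event):
    """Build the status waitpid() reports for a ptrace event stop (test helper)."""
    return (event << 16) | (signal.SIGTRAP << 8) | 0x7F


def missed_exit_event(statuses):
    """True if a task's death was reported with no PTRACE_EVENT_EXIT stop before it.

    `statuses` is the sequence of waitpid() statuses seen for one tracee task.
    """
    saw_exit_event = False
    for status in statuses:
        if ptrace_event(status) == PTRACE_EVENT_EXIT:
            saw_exit_event = True
        elif os.WIFEXITED(status) or os.WIFSIGNALED(status):
            return not saw_exit_event
    return False
```

With the bug, the per-task history would look like a `PTRACE_EVENT_SECCOMP` stop followed directly by the task's termination status, which `missed_exit_event` flags. With the fix, a `PTRACE_EVENT_EXIT` stop is expected in between.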
strongreasons pushed a commit to strongreasons/msm-4.4 that referenced this issue Aug 6, 2023
Tiktodz pushed a commit to Tiktodz/android_kernel_asus_sdm636 that referenced this issue Aug 11, 2023
log1cs pushed a commit to nokia-msm8998/android_kernel_nokia_msm8998 that referenced this issue Aug 20, 2023
log1cs pushed a commit to nokia-msm8998/android_kernel_nokia_msm8998 that referenced this issue Aug 23, 2023
log1cs pushed a commit to nokia-msm8998/android_kernel_nokia_msm8998 that referenced this issue Oct 1, 2023
chilkat81 pushed a commit to chilkat81/android_kernel_nubia_msm8998 that referenced this issue Oct 16, 2023
chilkat81 pushed a commit to chilkat81/android_kernel_nubia_msm8998 that referenced this issue Oct 17, 2023
log1cs pushed a commit to nokia-msm8998/android_kernel_nokia_msm8998 that referenced this issue Nov 23, 2023
strongreasons pushed a commit to strongreasons/android_kernel_asus_sdm660 that referenced this issue Jan 20, 2024
Kneba pushed a commit to Tiktodz/android_kernel_asus_sdm636 that referenced this issue Jan 21, 2024
Tiktodz pushed a commit to Tiktodz/android_kernel_asus_sdm636 that referenced this issue Jan 27, 2024
Tiktodz pushed a commit to Tiktodz/android_kernel_asus_sdm636 that referenced this issue Mar 18, 2024
This fixes a ptrace vs fatal pending signals bug as manifested in
seccomp now that seccomp was reordered to happen after ptrace. The
short version is that seccomp should not attempt to call do_exit()
while fatal signals are pending under a tracer. The existing code was
trying to be as defensively paranoid as possible, but it now ends up
confusing ptrace. Instead, the syscall can just be skipped (which solves
the original concern that the do_exit() was addressing) and normal signal
handling, tracer notification, and process death can happen.

Paraphrasing from the original bug report:

If a tracee task is in a PTRACE_EVENT_SECCOMP trap, or has been resumed
after such a trap but not yet been scheduled, and another task in the
thread-group calls exit_group(), then the tracee task exits without the
ptracer receiving a PTRACE_EVENT_EXIT notification. Test case here:
https://gist.github.com/khuey/3c43ac247c72cef8c956ca73281c9be7

The bug happens because when __seccomp_filter() detects
fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
that task is descheduled, __schedule() notices that there is a fatal
signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
That prevents the ptracer's waitpid() from returning the ptrace event.
A more detailed analysis is here:
rr-debugger/rr#1762 (comment).

Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35efb8de4 ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 485a252a5559b45d7df04c819ec91177c62c270b)

Bug: 119769499
Change-Id: I444e69093e88d58587b4d5c4f2d777985591c32d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: RyuujiX <saputradenny712@gmail.com>
Signed-off-by: Tiktodz <ewprjkt@proton.me>
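
The two stops named in the commit message above, PTRACE_EVENT_SECCOMP and PTRACE_EVENT_EXIT, reach the ptracer as waitpid() statuses. As a minimal sketch (not rr's or the kernel's code), a tracer could tell them apart as below, using the status encoding documented in ptrace(2). The names `is_ptrace_event`, `classify_stop` and `enum stop_kind` are hypothetical and exist only for illustration.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical labels for the kinds of waitpid() results discussed above. */
enum stop_kind { STOP_SECCOMP, STOP_EXIT_EVENT, STOP_GONE, STOP_OTHER };

/* True if `status` (as filled in by waitpid()) reports a ptrace event stop
 * of the given kind. ptrace(2) documents the encoding as
 * status >> 8 == (SIGTRAP | (event << 8)). */
static bool is_ptrace_event(int status, int event)
{
    return WIFSTOPPED(status) && (status >> 8) == (SIGTRAP | (event << 8));
}

/* Classify a waitpid() status. With the bug described above, a tracee could
 * go straight to STOP_GONE without the tracer ever seeing STOP_EXIT_EVENT. */
static enum stop_kind classify_stop(int status)
{
    if (is_ptrace_event(status, PTRACE_EVENT_SECCOMP))
        return STOP_SECCOMP;
    if (is_ptrace_event(status, PTRACE_EVENT_EXIT))
        return STOP_EXIT_EVENT;
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return STOP_GONE;
    return STOP_OTHER;
}
```

A tracer that relies on the exit event would typically call `classify_stop()` on each status returned by `waitpid(pid, &status, __WALL)` and treat a `STOP_GONE` that was not preceded by `STOP_EXIT_EVENT` as the symptom reported in this issue.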
ryxpace pushed a commit to ryxpace/kernel_xiaomi_whyred that referenced this issue May 10, 2024
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35efb8de4 ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 485a252a5559b45d7df04c819ec91177c62c270b)

Bug: 119769499
Change-Id: I444e69093e88d58587b4d5c4f2d777985591c32d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: RyuujiX <saputradenny712@gmail.com>
strongreasons pushed a commit to strongreasons/android_kernel_asus_sdm660 that referenced this issue Jun 9, 2024
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35efb8de4 ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 485a252a5559b45d7df04c819ec91177c62c270b)

Bug: 119769499
Change-Id: I444e69093e88d58587b4d5c4f2d777985591c32d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Kneba <abenkenary3@gmail.com>
Signed-off-by: Tiktodz <ewprjkt@proton.me>
Signed-off-by: dotkit <ewprjkt@proton.me>
Signed-off-by: strongreasons <abenkenari@gmail.com>
strongreasons pushed a commit to strongreasons/android_kernel_asus_sdm660 that referenced this issue Jun 17, 2024
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35efb8de4 ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 485a252a5559b45d7df04c819ec91177c62c270b)

Bug: 119769499
Change-Id: I444e69093e88d58587b4d5c4f2d777985591c32d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: RyuujiX <saputradenny712@gmail.com>
Signed-off-by: strongreasons <strongreasons@users.noreply.github.com>
Tiktodz pushed a commit to Tiktodz/android_kernel_asus_sdm636 that referenced this issue Jul 4, 2024
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Reported-by: Kyle Huey <khuey@kylehuey.com>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Fixes: 93e35efb8de4 ("x86/ptrace: run seccomp after ptrace")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
(cherry picked from commit 485a252a5559b45d7df04c819ec91177c62c270b)

Bug: 119769499
Change-Id: I444e69093e88d58587b4d5c4f2d777985591c32d
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: RyuujiX <saputradenny712@gmail.com>
Signed-off-by: dotkit <dotkit@electrowizard.me>