Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backporting 5.3/5.4 changes into 4.19 #32

Merged

Conversation

kitakar5525
Copy link
Member

@kitakar5525 kitakar5525 commented Jan 17, 2020

See also #31
This one large PR is composed of the following backports:

I first wanted to separate this one large PR into multiple PRs, but one logical change depends on another change. So, sending one large PR.

  • [ ] Is this OK?
  • [ ] Is there another better way?

Commit that needs discussion:

  • 437b497 ("Fixup: BACKPORT: ipts: add static_assert definition to header files")
    (Backporting upstream commits vs defining macros in header files, see commit message for details)
    I added static_assert definition into header files, is that OK?
commit log with annotation

feature/ipts-no-modeswitch

cherry-picked without conflicts:
69ea9ac - Dorian Stoll, 2019-12-20 : BACKPORT: ipts: Dont automatically switch sensor modes

wifi-suspend-fix

cherry-picked without conflicts:
81a669a - sebanc, 2019-11-04 : BACKPORT: wireless/mwifiex: Fix S0ix / suspend

git revert <commit> according to linux-surface 5.3:
0202aad - kitakar5525, 2020-01-16 : Revert "wireless/mwifiex: Add quirks for Surface devices"

feature/companion-no-feedback

cherry-picked without conflicts:
78ec176 - Dorian Stoll, 2019-11-10 : BACKPORT: ipts: Add a generic interface for quirks
83eb67d - Dorian Stoll, 2019-11-08 : BACKPORT: ipts: Let the companion decide about no_feedback

feature/ipts-cleanup

cherry-picked without conflicts:
38c7513 - Dorian Stoll, 2019-11-07 : BACKPORT: ipts: Document output of IPTS debug thread
16d6324 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Remove ipts prefix from driver source files
75345e3 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Move debugging settings into modparams

Fixup by me:
437b497 - kitakar5525, 2020-01-16 : Fixup: BACKPORT: ipts: add static_assert definition to header files
5928135 - kitakar5525, 2020-01-16 : Fixup: BACKPORT: ipts: Cleanup ipts-i915 to pass checkpatch

Resolved conflicts by me:
7d81703 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Cleanup ipts-i915 to pass checkpatch

cherry-picked without conflicts:
8ab16f2 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Cleanup ipts-mei to pass checkpatch
21832a5 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Update Makefile / Kconfig to pass checkpatch
14ca509 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Cleanup ipts-kernel to pass checkpatch
5874171 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Cleanup ipts-hid to pass checkpatch
47b71b6 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Cleanup public headers to pass checkpatch
79d32d1 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Cleanup ipts-surface to pass checkpatch
085f2d3 - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Cleanup ipts-msg-handler to pass checkpatch
6c2bceb - Dorian Stoll, 2019-10-28 : BACKPORT: ipts: Cleanup ipts-companion to pass checkpatch
36bb00d - Dorian Stoll, 2019-10-27 : BACKPORT: ipts: Cleanup ipts-gfx to pass checkpatch
76642e3 - Dorian Stoll, 2019-10-27 : BACKPORT: ipts: Cleanup ipts-state.h to pass checkpatch
7672bb0 - Dorian Stoll, 2019-10-27 : BACKPORT: ipts: Cleanup ipts-resource to pass checkpatch
83619a6 - Dorian Stoll, 2019-10-27 : BACKPORT: ipts: Cleanup ipts-params to pass checkpatch
525aeb7 - Dorian Stoll, 2019-10-27 : BACKPORT: ipts: Cleanup ipts-dbgfs.c to pass checkpatch
97256f6 - Dorian Stoll, 2019-10-27 : BACKPORT: ipts: Cleanup ipts-sensor-regs.h to pass checkpatch
b5cd975 - Dorian Stoll, 2019-10-27 : BACKPORT: ipts: Clean up ipts.h to pass checkpatch
7dfd1b6 - Dorian Stoll, 2019-10-27 : BACKPORT: ipts: Clean up ipts-mei-msg.h to pass checkpatch

feature/no-ipts-config-file

cherry-picked without conflicts:
e54ba9e - Dorian Stoll, 2019-10-27 : BACKPORT: ipts: Address comments by qzed and grayhatter
8700a7f - Dorian Stoll, 2019-10-20 : BACKPORT: ipts: Move ipts_fw_config.bin into the companion
c48c2ee - Dorian Stoll, 2019-10-20 : BACKPORT: ipts: Implement new companion interface

bugfix/touch-rotation

cherry-picked without conflicts:
7e520ac - Dorian Stoll, 2019-09-18 : BACKPORT: Remove IPTS sensor mode APIs

git am <patch> (see commit message for patch location):
2f2ae1a - kitakar5525, 2019-09-06 : BACKPORT: ipts: hid-multitouch: revert all the changes

old-patches-cleanup

cherry-picked without conflicts:
cb11130 - Maximilian Luz, 2019-09-18 : BACKPORT: Revert downstream_hpd_needs_d0 changes in i915

git revert <commit> according to linux-surface 5.3:
1e27f5e - kitakar5525, 2020-01-16 : Revert "wireless: Add mwlwifi driver"
e1adbee - kitakar5525, 2020-01-16 : Revert "power/suspend: Add delay for device quirks"
f346b5b - kitakar5525, 2020-01-16 : Revert "staging: Add WIP camera drivers for Surfae devices"
5ab0b86 - kitakar5525, 2020-01-16 : Revert "USB: Modify lpm check for sdcard-reader"

This reverts commit 218ad8f.

(corresponds to commit 810a019)
This reverts commit 4bb6b4b.

(corresponds to commit 71867ba)
This reverts commit b026a27.

(corresponds to commit 09372ba)
@kitakar5525
Copy link
Member Author

Oops, I should have separated the following revert commits:
e8e9087 - kitakar5525, 2020-01-16 : Revert "HID: Improve Windows Precision Touchpad detection."
6515d8e - kitakar5525, 2020-01-16 : Revert "sparc64: implement ioremap_uc"

Those commits are not backporting, but needed to be reverted to avoid conflicts between those commits and upstream commits.

I'll separate those two commits.

@kitakar5525
Copy link
Member Author

Separating done by making another PR.

@qzed
Copy link
Member

qzed commented Jan 17, 2020

Looks good to me! Regarding static_assert: I think you've chosen the best option.

@kitakar5525
Copy link
Member Author

  • Oh, I used exact the same commit title from 5.3: "BACKPORT: Revert downstream_hpd_needs_d0 changes in i915_legacy". Is that confusing? Should remove "_legacy"?

  • The commit title "Fixup: BACKPORT: add static_assert definition to header files" too generic?
    Should be something like "Fixup: BACKPORT: ipts: add static_assert definition to header files"?

@kitakar5525 kitakar5525 marked this pull request as ready for review January 18, 2020 03:23
@kitakar5525 kitakar5525 changed the title [WIP] Backporting 5.3/5.4 changes into 4.19 Backporting 5.3/5.4 changes into 4.19 Jan 18, 2020
@qzed
Copy link
Member

qzed commented Jan 18, 2020

  • Oh, I used exact the same commit title from 5.3: "BACKPORT: Revert downstream_hpd_needs_d0 changes in i915_legacy". Is that confusing? Should remove "_legacy"?

Yeah, I think dropping the _legacy is a good idea.

  • The commit title "Fixup: BACKPORT: add static_assert definition to header files" too generic?
    Should be something like "Fixup: BACKPORT: ipts: add static_assert definition to header files"?

The ipts is a bit clearer, but I'm fine with both.

qzed and others added 17 commits January 19, 2020 10:46
(cherry-picked from commit 434376f)
[changed commit title from "Revert downstream_hpd_needs_d0 changes
in i915_legacy" because 5.3 kernel used i915_legacy but 4.19 kernel
uses i915.]
The previous commit reverted the changes to hid-multitouch that were done by
the IPTS patch. The original implementation modified hid-multitouch to make the
single-touch mode work. However, window managers such as GNOME's mutter cannot
differentiate between two touchscreens on the same panel. This results in broken
touch input rotation (the single-touch panel gets rotated, even if it is not
active).

jakeday/linux-surface#549 had some discussion on this
and multiple fixes. But I think the less changes to unrelated kernel code, the
better. I don't think that anyone actually uses single-touch mode.

Since single-touch mode was disabled by the previous commit, remove all user-
space APIs to enable it, since that just sends all touch input to /dev/null

Because IPTS does switch it's mode during initialization, it is not possible to
remove all references to the sensor mode, at least not without a more in-depth
review.

Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 86f95d2)
This makes it easier to extend the companion driver, since everything is
stored in a single struct.

The ipts_companion_t struct is stored in two places, the IPTS main driver,
and the driver registering it. It is still passed through to the functions
that the companion driver implements to allow for device specific behaviour.

Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit a1531a7)
Since ipts_fw_config.bin is mapped directly onto C structs by the IPTS
driver, we can declare those values ourselves, instead of putting them
into an obscure binary file that was provided by Intel ages ago.

This could also open up the door to modifying the firmware config in
case anyone ever understands IPTS that much.

Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit e1cfd84)
* Removed some of the gotos in the simpler companion function
* Made the companion driver the first parameter in the firmware handler
* Changed ipts-surface to warn when adding the companion driver failed
* Made ipts_alloc_fw_list static
* Streamlined ipts_request_firmware_config
* Added explicit values for ipts_bin_data_file_flags_t elements

Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 636e23d)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit a16bb16)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit bda1092)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 30a9d30)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 52e26dd)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked commit from 9c9dfad)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 05f51be)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 4aaed39)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 24f97ef)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit f63f758)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 0774af9)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit c0556e1)
StollD and others added 6 commits January 19, 2020 10:49
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 6fad9aa)
Having a DMI match for some surface devices doesn't really match with
the concept of having a device-specific companion driver, so this commit
moves the matching code to the companion.

ipts-surface is now setup in a way that allows us to add device-specific
data pretty easily. All you need is the interface to IPTS, which can
just be copied from no_feedback.

Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 40f343b)
Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 55e49e1)
This reverts commit 7f59405.

(corresponds to commit d95adde)
We don't want IPTS to set itself to singletouch mode in case it thinks
that might fix anything, because it doesn't. Even a malfunctioning
multitouch IPTS has a higher chance of working than singletouch.

Signed-off-by: Dorian Stoll <dorian.stoll@tmsp.io>

(cherry-picked from commit 894fbba)
@kitakar5525
Copy link
Member Author

Yeah, I think dropping the _legacy is a good idea.

The ipts is a bit clearer, but I'm fine with both.

OK, both done.

@kitakar5525
Copy link
Member Author

kitakar5525 commented Jan 19, 2020

By the way, when checking generated IPTS patch, checkpatch added one error and one warning compared to IPTS patch for 5.3:

ERROR: else should follow close brace '}'
#386: FILE: drivers/gpu/drm/i915/intel_guc_submission.c:117:
+	}
+	else {
WARNING: line over 80 characters
#707: FILE: drivers/gpu/drm/i915/intel_ipts.c:204:
+	ret = execlists_context_deferred_alloc(ipts_ctx, dev_priv->engine[RCS], ce);
  • cause of the error is due to changes after 4.19 and the change was not backported.
    On 5.3, the error was already fixed when adding IPTS support (871f089)

  • cause of the warning is due to execlists_context_deferred_alloc function definition changes between 4.19 and 5.3, but long-line warning can be ignored for now (and already ignoring some long-lines).

I'll make another PR to fix the error after this PR has been merged (because this is not backporting from 5.3/5.4).

@qzed
Copy link
Member

qzed commented Jan 20, 2020

Yeah, I think we can ignore the "line over 80 characters" warnings for now.

@archseer @StollD any last comments?

@StollD
Copy link
Member

StollD commented Jan 20, 2020

As long as the code works I am fine with it. I wouldn't bother too much with checkpatch and IPTS at this point. If everything goes well, that driver won't be the way forward anyways.

@qzed
Copy link
Member

qzed commented Jan 20, 2020

Right. I'll test it later and merge it if I don' find any issues.

@qzed
Copy link
Member

qzed commented Jan 21, 2020

Tested with 4.19.95. Haven't noticed any apparent issues. @kitakar5525 Thank you for your work!

@qzed qzed merged commit b959e95 into linux-surface:v4.19-surface-devel Jan 21, 2020
@kitakar5525 kitakar5525 deleted the backport/v4.19-surface-devel branch January 29, 2020 13:03
qzed pushed a commit that referenced this pull request Mar 6, 2020
… delivery

After a treclaim, we expect to be in non-transactional state. If we
don't clear the current thread's MSR[TS] before we get preempted, then
tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
suspended transaction state.

When handling a signal caught in transactional state,
handle_rt_signal64() calls get_tm_stackpointer() that treclaims the
transaction using tm_reclaim_current() but without clearing the
thread's MSR[TS]. This can cause the TM Bad Thing exception below if
later we pagefault and get preempted trying to access the user's
sigframe, using __put_user(). Afterwards, when we are rescheduled back
into do_page_fault() (but now in suspended state since the thread's
MSR[TS] was not cleared), upon executing 'rfid' after completion of
the page fault handling, the exception is raised because a transition
from suspended to non-transactional state is invalid.

  Unexpected TM Bad Thing exception at c00000000000de44 (msr 0x8000000302a03031) tm_scratch=800000010280b033
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
  NIP:  c00000000000de44 LR: c000000000034728 CTR: 0000000000000000
  REGS: c00000003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
  MSR:  8000000302a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[SE]>  CR: 44000884  XER: 00000000
  CFAR: c00000000000dda4 IRQMASK: 0
  PACATMSCRATCH: 800000010280b033
  GPR00: c000000000034728 c000000f65a17c80 c000000001662800 00007fffacf3fd78
  GPR04: 0000000000001000 0000000000001000 0000000000000000 c000000f611f8af0
  GPR08: 0000000000000000 0000000078006001 0000000000000000 000c000000000000
  GPR12: c000000f611f84b0 c00000003ffcb200 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000f611f8140
  GPR24: 0000000000000000 00007fffacf3fd68 c000000f65a17d90 c000000f611f7800
  GPR28: c000000f65a17e90 c000000f65a17e90 c000000001685e18 00007fffacf3f000
  NIP [c00000000000de44] fast_exception_return+0xf4/0x1b0
  LR [c000000000034728] handle_rt_signal64+0x78/0xc50
  Call Trace:
  [c000000f65a17c80] [c000000000034710] handle_rt_signal64+0x60/0xc50 (unreliable)
  [c000000f65a17d30] [c000000000023640] do_notify_resume+0x330/0x460
  [c000000f65a17e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
  Instruction dump:
  7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088
  60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989
  ---[ end trace 93094aa44b442f87 ]---

The simplified sequence of events that triggers the above exception is:

  ...				# userspace in NON-TRANSACTIONAL state
  tbegin			# userspace in TRANSACTIONAL state
  signal delivery		# kernelspace in SUSPENDED state
  handle_rt_signal64()
    get_tm_stackpointer()
      treclaim			# kernelspace in NON-TRANSACTIONAL state
    __put_user()
      page fault happens. We will never get back here because of the TM Bad Thing exception.

  page fault handling kicks in and we voluntarily preempt ourselves
  do_page_fault()
    __schedule()
      __switch_to(other_task)

  our task is rescheduled and we recheckpoint because the thread's MSR[TS] was not cleared
  __switch_to(our_task)
    switch_to_tm()
      tm_recheckpoint_new_task()
        trechkpt			# kernelspace in SUSPENDED state

  The page fault handling resumes, but now we are in suspended transaction state
  do_page_fault()    completes
  rfid     <----- trying to get back where the page fault happened (we were non-transactional back then)
  TM Bad Thing			# illegal transition from suspended to non-transactional

This patch fixes that issue by clearing the current thread's MSR[TS]
just after treclaim in get_tm_stackpointer() so that we stay in
non-transactional state in case we are preempted. In order to make
treclaim and clearing the thread's MSR[TS] atomic from a preemption
perspective when CONFIG_PREEMPT is set, preempt_disable/enable() is
used. It's also necessary to save the previous value of the thread's
MSR before get_tm_stackpointer() is called so that it can be exposed
to the signal handler later in setup_tm_sigcontexts() to inform the
userspace MSR at the moment of the signal delivery.

Found with tm-signal-context-force-tm kernel selftest.

Fixes: 2b0a576 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200211033831.11165-1-gustavold@linux.ibm.com
qzed pushed a commit that referenced this pull request Mar 16, 2020
… delivery

commit 2464cc4 upstream.

After a treclaim, we expect to be in non-transactional state. If we
don't clear the current thread's MSR[TS] before we get preempted, then
tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
suspended transaction state.

When handling a signal caught in transactional state,
handle_rt_signal64() calls get_tm_stackpointer() that treclaims the
transaction using tm_reclaim_current() but without clearing the
thread's MSR[TS]. This can cause the TM Bad Thing exception below if
later we pagefault and get preempted trying to access the user's
sigframe, using __put_user(). Afterwards, when we are rescheduled back
into do_page_fault() (but now in suspended state since the thread's
MSR[TS] was not cleared), upon executing 'rfid' after completion of
the page fault handling, the exception is raised because a transition
from suspended to non-transactional state is invalid.

  Unexpected TM Bad Thing exception at c00000000000de44 (msr 0x8000000302a03031) tm_scratch=800000010280b033
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
  NIP:  c00000000000de44 LR: c000000000034728 CTR: 0000000000000000
  REGS: c00000003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
  MSR:  8000000302a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[SE]>  CR: 44000884  XER: 00000000
  CFAR: c00000000000dda4 IRQMASK: 0
  PACATMSCRATCH: 800000010280b033
  GPR00: c000000000034728 c000000f65a17c80 c000000001662800 00007fffacf3fd78
  GPR04: 0000000000001000 0000000000001000 0000000000000000 c000000f611f8af0
  GPR08: 0000000000000000 0000000078006001 0000000000000000 000c000000000000
  GPR12: c000000f611f84b0 c00000003ffcb200 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000f611f8140
  GPR24: 0000000000000000 00007fffacf3fd68 c000000f65a17d90 c000000f611f7800
  GPR28: c000000f65a17e90 c000000f65a17e90 c000000001685e18 00007fffacf3f000
  NIP [c00000000000de44] fast_exception_return+0xf4/0x1b0
  LR [c000000000034728] handle_rt_signal64+0x78/0xc50
  Call Trace:
  [c000000f65a17c80] [c000000000034710] handle_rt_signal64+0x60/0xc50 (unreliable)
  [c000000f65a17d30] [c000000000023640] do_notify_resume+0x330/0x460
  [c000000f65a17e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
  Instruction dump:
  7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088
  60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989
  ---[ end trace 93094aa44b442f87 ]---

The simplified sequence of events that triggers the above exception is:

  ...				# userspace in NON-TRANSACTIONAL state
  tbegin			# userspace in TRANSACTIONAL state
  signal delivery		# kernelspace in SUSPENDED state
  handle_rt_signal64()
    get_tm_stackpointer()
      treclaim			# kernelspace in NON-TRANSACTIONAL state
    __put_user()
      page fault happens. We will never get back here because of the TM Bad Thing exception.

  page fault handling kicks in and we voluntarily preempt ourselves
  do_page_fault()
    __schedule()
      __switch_to(other_task)

  our task is rescheduled and we recheckpoint because the thread's MSR[TS] was not cleared
  __switch_to(our_task)
    switch_to_tm()
      tm_recheckpoint_new_task()
        trechkpt			# kernelspace in SUSPENDED state

  The page fault handling resumes, but now we are in suspended transaction state
  do_page_fault()    completes
  rfid     <----- trying to get back where the page fault happened (we were non-transactional back then)
  TM Bad Thing			# illegal transition from suspended to non-transactional

This patch fixes that issue by clearing the current thread's MSR[TS]
just after treclaim in get_tm_stackpointer() so that we stay in
non-transactional state in case we are preempted. In order to make
treclaim and clearing the thread's MSR[TS] atomic from a preemption
perspective when CONFIG_PREEMPT is set, preempt_disable/enable() is
used. It's also necessary to save the previous value of the thread's
MSR before get_tm_stackpointer() is called so that it can be exposed
to the signal handler later in setup_tm_sigcontexts() to inform the
userspace MSR at the moment of the signal delivery.

Found with tm-signal-context-force-tm kernel selftest.

Fixes: 2b0a576 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200211033831.11165-1-gustavold@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
qzed pushed a commit that referenced this pull request Mar 17, 2020
… delivery

commit 2464cc4 upstream.

After a treclaim, we expect to be in non-transactional state. If we
don't clear the current thread's MSR[TS] before we get preempted, then
tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
suspended transaction state.

When handling a signal caught in transactional state,
handle_rt_signal64() calls get_tm_stackpointer() that treclaims the
transaction using tm_reclaim_current() but without clearing the
thread's MSR[TS]. This can cause the TM Bad Thing exception below if
later we pagefault and get preempted trying to access the user's
sigframe, using __put_user(). Afterwards, when we are rescheduled back
into do_page_fault() (but now in suspended state since the thread's
MSR[TS] was not cleared), upon executing 'rfid' after completion of
the page fault handling, the exception is raised because a transition
from suspended to non-transactional state is invalid.

  Unexpected TM Bad Thing exception at c00000000000de44 (msr 0x8000000302a03031) tm_scratch=800000010280b033
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
  NIP:  c00000000000de44 LR: c000000000034728 CTR: 0000000000000000
  REGS: c00000003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
  MSR:  8000000302a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[SE]>  CR: 44000884  XER: 00000000
  CFAR: c00000000000dda4 IRQMASK: 0
  PACATMSCRATCH: 800000010280b033
  GPR00: c000000000034728 c000000f65a17c80 c000000001662800 00007fffacf3fd78
  GPR04: 0000000000001000 0000000000001000 0000000000000000 c000000f611f8af0
  GPR08: 0000000000000000 0000000078006001 0000000000000000 000c000000000000
  GPR12: c000000f611f84b0 c00000003ffcb200 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000f611f8140
  GPR24: 0000000000000000 00007fffacf3fd68 c000000f65a17d90 c000000f611f7800
  GPR28: c000000f65a17e90 c000000f65a17e90 c000000001685e18 00007fffacf3f000
  NIP [c00000000000de44] fast_exception_return+0xf4/0x1b0
  LR [c000000000034728] handle_rt_signal64+0x78/0xc50
  Call Trace:
  [c000000f65a17c80] [c000000000034710] handle_rt_signal64+0x60/0xc50 (unreliable)
  [c000000f65a17d30] [c000000000023640] do_notify_resume+0x330/0x460
  [c000000f65a17e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
  Instruction dump:
  7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088
  60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989
  ---[ end trace 93094aa44b442f87 ]---

The simplified sequence of events that triggers the above exception is:

  ...				# userspace in NON-TRANSACTIONAL state
  tbegin			# userspace in TRANSACTIONAL state
  signal delivery		# kernelspace in SUSPENDED state
  handle_rt_signal64()
    get_tm_stackpointer()
      treclaim			# kernelspace in NON-TRANSACTIONAL state
    __put_user()
      page fault happens. We will never get back here because of the TM Bad Thing exception.

  page fault handling kicks in and we voluntarily preempt ourselves
  do_page_fault()
    __schedule()
      __switch_to(other_task)

  our task is rescheduled and we recheckpoint because the thread's MSR[TS] was not cleared
  __switch_to(our_task)
    switch_to_tm()
      tm_recheckpoint_new_task()
        trechkpt			# kernelspace in SUSPENDED state

  The page fault handling resumes, but now we are in suspended transaction state
  do_page_fault()    completes
  rfid     <----- trying to get back where the page fault happened (we were non-transactional back then)
  TM Bad Thing			# illegal transition from suspended to non-transactional

This patch fixes that issue by clearing the current thread's MSR[TS]
just after treclaim in get_tm_stackpointer() so that we stay in
non-transactional state in case we are preempted. In order to make
treclaim and clearing the thread's MSR[TS] atomic from a preemption
perspective when CONFIG_PREEMPT is set, preempt_disable/enable() is
used. It's also necessary to save the previous value of the thread's
MSR before get_tm_stackpointer() is called so that it can be exposed
to the signal handler later in setup_tm_sigcontexts() to inform the
userspace MSR at the moment of the signal delivery.

Found with tm-signal-context-force-tm kernel selftest.

Fixes: 2b0a576 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200211033831.11165-1-gustavold@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
qzed pushed a commit that referenced this pull request Oct 1, 2020
[ Upstream commit 96298f6 ]

According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5,
the incoming L2CAP_ConfigReq should be handled during
OPEN state.

The section below shows the btmon trace when running
L2CAP/COS/CFD/BV-12-C before and after this change.

=== Before ===
...
> ACL Data RX: Handle 256 flags 0x02 dlen 12                #22
      L2CAP: Connection Request (0x02) ident 2 len 4
        PSM: 1 (0x0001)
        Source CID: 65
< ACL Data TX: Handle 256 flags 0x00 dlen 16                #23
      L2CAP: Connection Response (0x03) ident 2 len 8
        Destination CID: 64
        Source CID: 65
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 12                #24
      L2CAP: Configure Request (0x04) ident 2 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5      #25
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5      #26
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 16                #27
      L2CAP: Configure Request (0x04) ident 3 len 8
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00                                            ..
< ACL Data TX: Handle 256 flags 0x00 dlen 18                #28
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
> HCI Event: Number of Completed Packets (0x13) plen 5      #29
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 14                #30
      L2CAP: Configure Response (0x05) ident 2 len 6
        Source CID: 64
        Flags: 0x0000
        Result: Success (0x0000)
> ACL Data RX: Handle 256 flags 0x02 dlen 20                #31
      L2CAP: Configure Request (0x04) ident 3 len 12
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00 91 02 11 11                                ......
< ACL Data TX: Handle 256 flags 0x00 dlen 14                #32
      L2CAP: Command Reject (0x01) ident 3 len 6
        Reason: Invalid CID in request (0x0002)
        Destination CID: 64
        Source CID: 65
> HCI Event: Number of Completed Packets (0x13) plen 5      #33
        Num handles: 1
        Handle: 256
        Count: 1
...
=== After ===
...
> ACL Data RX: Handle 256 flags 0x02 dlen 12               #22
      L2CAP: Connection Request (0x02) ident 2 len 4
        PSM: 1 (0x0001)
        Source CID: 65
< ACL Data TX: Handle 256 flags 0x00 dlen 16               #23
      L2CAP: Connection Response (0x03) ident 2 len 8
        Destination CID: 64
        Source CID: 65
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 12               #24
      L2CAP: Configure Request (0x04) ident 2 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5     #25
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5     #26
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 16               #27
      L2CAP: Configure Request (0x04) ident 3 len 8
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00                                            ..
< ACL Data TX: Handle 256 flags 0x00 dlen 18               #28
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
> HCI Event: Number of Completed Packets (0x13) plen 5     #29
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 14               #30
      L2CAP: Configure Response (0x05) ident 2 len 6
        Source CID: 64
        Flags: 0x0000
        Result: Success (0x0000)
> ACL Data RX: Handle 256 flags 0x02 dlen 20               #31
      L2CAP: Configure Request (0x04) ident 3 len 12
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00 91 02 11 11                                .....
< ACL Data TX: Handle 256 flags 0x00 dlen 18               #32
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
< ACL Data TX: Handle 256 flags 0x00 dlen 12               #33
      L2CAP: Configure Request (0x04) ident 3 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5     #34
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5     #35
        Num handles: 1
        Handle: 256
        Count: 1
...

Signed-off-by: Howard Chung <howardchung@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
qzed pushed a commit that referenced this pull request Oct 1, 2020
[ Upstream commit 96298f6 ]

According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5,
the incoming L2CAP_ConfigReq should be handled during
OPEN state.

The section below shows the btmon trace when running
L2CAP/COS/CFD/BV-12-C before and after this change.

=== Before ===
...
> ACL Data RX: Handle 256 flags 0x02 dlen 12                #22
      L2CAP: Connection Request (0x02) ident 2 len 4
        PSM: 1 (0x0001)
        Source CID: 65
< ACL Data TX: Handle 256 flags 0x00 dlen 16                #23
      L2CAP: Connection Response (0x03) ident 2 len 8
        Destination CID: 64
        Source CID: 65
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 12                #24
      L2CAP: Configure Request (0x04) ident 2 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5      #25
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5      #26
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 16                #27
      L2CAP: Configure Request (0x04) ident 3 len 8
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00                                            ..
< ACL Data TX: Handle 256 flags 0x00 dlen 18                #28
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
> HCI Event: Number of Completed Packets (0x13) plen 5      #29
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 14                #30
      L2CAP: Configure Response (0x05) ident 2 len 6
        Source CID: 64
        Flags: 0x0000
        Result: Success (0x0000)
> ACL Data RX: Handle 256 flags 0x02 dlen 20                #31
      L2CAP: Configure Request (0x04) ident 3 len 12
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00 91 02 11 11                                ......
< ACL Data TX: Handle 256 flags 0x00 dlen 14                #32
      L2CAP: Command Reject (0x01) ident 3 len 6
        Reason: Invalid CID in request (0x0002)
        Destination CID: 64
        Source CID: 65
> HCI Event: Number of Completed Packets (0x13) plen 5      #33
        Num handles: 1
        Handle: 256
        Count: 1
...
=== After ===
...
> ACL Data RX: Handle 256 flags 0x02 dlen 12               #22
      L2CAP: Connection Request (0x02) ident 2 len 4
        PSM: 1 (0x0001)
        Source CID: 65
< ACL Data TX: Handle 256 flags 0x00 dlen 16               #23
      L2CAP: Connection Response (0x03) ident 2 len 8
        Destination CID: 64
        Source CID: 65
        Result: Connection successful (0x0000)
        Status: No further information available (0x0000)
< ACL Data TX: Handle 256 flags 0x00 dlen 12               #24
      L2CAP: Configure Request (0x04) ident 2 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5     #25
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5     #26
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 16               #27
      L2CAP: Configure Request (0x04) ident 3 len 8
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00                                            ..
< ACL Data TX: Handle 256 flags 0x00 dlen 18               #28
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
> HCI Event: Number of Completed Packets (0x13) plen 5     #29
        Num handles: 1
        Handle: 256
        Count: 1
> ACL Data RX: Handle 256 flags 0x02 dlen 14               #30
      L2CAP: Configure Response (0x05) ident 2 len 6
        Source CID: 64
        Flags: 0x0000
        Result: Success (0x0000)
> ACL Data RX: Handle 256 flags 0x02 dlen 20               #31
      L2CAP: Configure Request (0x04) ident 3 len 12
        Destination CID: 64
        Flags: 0x0000
        Option: Unknown (0x10) [hint]
        01 00 91 02 11 11                                .....
< ACL Data TX: Handle 256 flags 0x00 dlen 18               #32
      L2CAP: Configure Response (0x05) ident 3 len 10
        Source CID: 65
        Flags: 0x0000
        Result: Success (0x0000)
        Option: Maximum Transmission Unit (0x01) [mandatory]
          MTU: 672
< ACL Data TX: Handle 256 flags 0x00 dlen 12               #33
      L2CAP: Configure Request (0x04) ident 3 len 4
        Destination CID: 65
        Flags: 0x0000
> HCI Event: Number of Completed Packets (0x13) plen 5     #34
        Num handles: 1
        Handle: 256
        Count: 1
> HCI Event: Number of Completed Packets (0x13) plen 5     #35
        Num handles: 1
        Handle: 256
        Count: 1
...

Signed-off-by: Howard Chung <howardchung@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
kitakar5525 pushed a commit to kitakar5525/linux-kernel that referenced this pull request Jul 26, 2021
When the dwc2 platform device is removed, it unregisters the generic
phy. usb_remove_phy() is called and the dwc2 usb_phy is removed from the
"phy_list", but the uevent may still attempt to get the usb_phy from the
list, resulting in a page fault bug. Currently we can't access the usb_phy
from the "phy_list" after the device is removed. As a fix check to make
sure that we can get the usb_phy before moving forward with the uevent.

[   84.949345] BUG: unable to handle page fault for address:00000007935688d8
[   84.949349] #PF: supervisor read access in kernel mode
[   84.949351] #PF: error_code(0x0000) - not-present page
[   84.949353] PGD 0 P4D 0
[   84.949356] Oops: 0000 [#1] SMP PTI
[   84.949360] CPU: 2 PID: 2081 Comm: rmmod Not tainted 5.13.0-rc4-snps-16547-ga8534cb092d7-dirty linux-surface#32
[   84.949363] Hardware name: Hewlett-Packard HP Z400 Workstation/0B4Ch, BIOS 786G3 v03.54 11/02/2011
[   84.949365] RIP: 0010:usb_phy_uevent+0x99/0x121
[   84.949372] Code: 8d 83 f8 00 00 00 48 3d b0 12 22 94 74 05 4c 3b 23
75 5b 8b 83 9c 00 00 00 be 32 00 00 00 48 8d 7c 24 04 48 c7 c2 d4 5d 7b
93 <48> 8b 0c c5 e0 88 56 93 e8 0f 63 8a ff 8b 83 98 00 00 00 be 32 00
[   84.949375] RSP: 0018:ffffa46bc0f2fc70 EFLAGS: 00010246
[   84.949378] RAX: 00000000ffffffff RBX: ffffffff942211b8 RCX: 0000000000000027
[   84.949380] RDX: ffffffff937b5dd4 RSI: 0000000000000032 RDI: ffffa46bc0f2fc74
[   84.949383] RBP: ffff94a306613000 R08: 0000000000000000 R09: 00000000fffeffff
[   84.949385] R10: ffffa46bc0f2faa8 R11: ffffa46bc0f2faa0 R12: ffff94a30186d410
[   84.949387] R13: ffff94a32d188a80 R14: ffff94a30029f960 R15: ffffffff93522dd0
[   84.949389] FS:  00007efdbd417540(0000) GS:ffff94a513a80000(0000) knlGS:0000000000000000
[   84.949392] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.949394] CR2: 00000007935688d8 CR3: 0000000165606000 CR4: 00000000000006e0
[   84.949396] Call Trace:
[   84.949401]  dev_uevent+0x190/0x1ad
[   84.949408]  kobject_uevent_env+0x18e/0x46c
[   84.949414]  device_release_driver_internal+0x17f/0x18e
[   84.949418]  bus_remove_device+0xd3/0xe5
[   84.949421]  device_del+0x1c3/0x31d
[   84.949425]  ? kobject_put+0x97/0xa8
[   84.949428]  platform_device_del+0x1c/0x63
[   84.949432]  platform_device_unregister+0xa/0x11
[   84.949436]  dwc2_pci_remove+0x1e/0x2c [dwc2_pci]
[   84.949440]  pci_device_remove+0x31/0x81
[   84.949445]  device_release_driver_internal+0xea/0x18e
[   84.949448]  driver_detach+0x68/0x72
[   84.949450]  bus_remove_driver+0x63/0x82
[   84.949453]  pci_unregister_driver+0x1a/0x75
[   84.949457]  __do_sys_delete_module+0x149/0x1e9
[   84.949462]  ? task_work_run+0x64/0x6e
[   84.949465]  ? exit_to_user_mode_prepare+0xd4/0x10d
[   84.949471]  do_syscall_64+0x5d/0x70
[   84.949475]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   84.949480] RIP: 0033:0x7efdbd563bcb
[   84.949482] Code: 73 01 c3 48 8b 0d c5 82 0c 00 f7 d8 64 89 01 48 83
c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 82 0c 00 f7 d8 64 89 01 48
[   84.949485] RSP: 002b:00007ffe944d7d98 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[   84.949489] RAX: ffffffffffffffda RBX: 00005651072eb700 RCX: 00007efdbd563bcb
[   84.949491] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005651072eb768
[   84.949493] RBP: 00007ffe944d7df8 R08: 0000000000000000 R09: 0000000000000000
[   84.949495] R10: 00007efdbd5dfac0 R11: 0000000000000206 R12: 00007ffe944d7fd0
[   84.949497] R13: 00007ffe944d8610 R14: 00005651072eb2a0 R15: 00005651072eb700
[   84.949500] Modules linked in: uas configfs dwc2_pci(-) phy_generic fuse crc32c_intel [last unloaded: udc_core]
[   84.949508] CR2: 00000007935688d8
[   84.949510] ---[ end trace e40c871ca3e4dc9e ]---
[   84.949512] RIP: 0010:usb_phy_uevent+0x99/0x121

Fixes: a8534cb ("usb: phy: introduce usb_phy device type with its own uevent handler")
Reviewed-by: Peter Chen <peter.chen@kernel.org>
Signed-off-by: Artur Petrosyan <Arthur.Petrosyan@synopsys.com>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/20210710092247.D7AFEA005D@mailhost.synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
qzed pushed a commit that referenced this pull request Feb 3, 2022
[ Upstream commit e4a41c2 ]

The following error is reported when running "./test_progs -t for_each"
under arm64:

  bpf_jit: multi-func JIT bug 58 != 56
  [...]
  JIT doesn't support bpf-to-bpf calls

The root cause is the size of BPF_PSEUDO_FUNC instruction increases
from 2 to 3 after the address of called bpf-function is settled and
there are two bpf-to-bpf calls in test_pkt_access. The generated
instructions are shown below:

  0x48:  21 00 C0 D2    movz x1, #0x1, lsl #32
  0x4c:  21 00 80 F2    movk x1, #0x1

  0x48:  E1 3F C0 92    movn x1, #0x1ff, lsl #32
  0x4c:  41 FE A2 F2    movk x1, #0x17f2, lsl #16
  0x50:  81 70 9F F2    movk x1, #0xfb84

Fixing it by using emit_addr_mov_i64() for BPF_PSEUDO_FUNC, so
the size of jited image will not change.

Fixes: 69c087b ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211231151018.3781550-1-houtao1@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
kitakar5525 pushed a commit to kitakar5525/linux-kernel that referenced this pull request Feb 14, 2022
Rafael reports that on a system with LX2160A and Marvell DSA switches,
if a reboot occurs while the DSA master (dpaa2-eth) is up, the following
panic can be seen:

systemd-shutdown[1]: Rebooting.
Unable to handle kernel paging request at virtual address 00a0000800000041
[00a0000800000041] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00042-g8f5585009b24 linux-surface#32
pc : dsa_slave_netdevice_event+0x130/0x3e4
lr : raw_notifier_call_chain+0x50/0x6c
Call trace:
 dsa_slave_netdevice_event+0x130/0x3e4
 raw_notifier_call_chain+0x50/0x6c
 call_netdevice_notifiers_info+0x54/0xa0
 __dev_close_many+0x50/0x130
 dev_close_many+0x84/0x120
 unregister_netdevice_many+0x130/0x710
 unregister_netdevice_queue+0x8c/0xd0
 unregister_netdev+0x20/0x30
 dpaa2_eth_remove+0x68/0x190
 fsl_mc_driver_remove+0x20/0x5c
 __device_release_driver+0x21c/0x220
 device_release_driver_internal+0xac/0xb0
 device_links_unbind_consumers+0xd4/0x100
 __device_release_driver+0x94/0x220
 device_release_driver+0x28/0x40
 bus_remove_device+0x118/0x124
 device_del+0x174/0x420
 fsl_mc_device_remove+0x24/0x40
 __fsl_mc_device_remove+0xc/0x20
 device_for_each_child+0x58/0xa0
 dprc_remove+0x90/0xb0
 fsl_mc_driver_remove+0x20/0x5c
 __device_release_driver+0x21c/0x220
 device_release_driver+0x28/0x40
 bus_remove_device+0x118/0x124
 device_del+0x174/0x420
 fsl_mc_bus_remove+0x80/0x100
 fsl_mc_bus_shutdown+0xc/0x1c
 platform_shutdown+0x20/0x30
 device_shutdown+0x154/0x330
 __do_sys_reboot+0x1cc/0x250
 __arm64_sys_reboot+0x20/0x30
 invoke_syscall.constprop.0+0x4c/0xe0
 do_el0_svc+0x4c/0x150
 el0_svc+0x24/0xb0
 el0t_64_sync_handler+0xa8/0xb0
 el0t_64_sync+0x178/0x17c

It can be seen from the stack trace that the problem is that the
deregistration of the master causes a dev_close(), which gets notified
as NETDEV_GOING_DOWN to dsa_slave_netdevice_event().
But dsa_switch_shutdown() has already run, and this has unregistered the
DSA slave interfaces, and yet, the NETDEV_GOING_DOWN handler attempts to
call dev_close_many() on those slave interfaces, leading to the problem.

The previous attempt to avoid the NETDEV_GOING_DOWN on the master after
dsa_switch_shutdown() was called seems improper. Unregistering the slave
interfaces is unnecessary and unhelpful. Instead, after the slaves have
stopped being uppers of the DSA master, we can now reset to NULL the
master->dsa_ptr pointer, which will make DSA start ignoring all future
notifier events on the master.

Fixes: 0650bf5 ("net: dsa: be compatible with masters which unregister on shutdown")
Reported-by: Rafael Richter <rafael.richter@gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qzed pushed a commit that referenced this pull request Feb 18, 2022
[ Upstream commit ee53437 ]

Rafael reports that on a system with LX2160A and Marvell DSA switches,
if a reboot occurs while the DSA master (dpaa2-eth) is up, the following
panic can be seen:

systemd-shutdown[1]: Rebooting.
Unable to handle kernel paging request at virtual address 00a0000800000041
[00a0000800000041] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00042-g8f5585009b24 #32
pc : dsa_slave_netdevice_event+0x130/0x3e4
lr : raw_notifier_call_chain+0x50/0x6c
Call trace:
 dsa_slave_netdevice_event+0x130/0x3e4
 raw_notifier_call_chain+0x50/0x6c
 call_netdevice_notifiers_info+0x54/0xa0
 __dev_close_many+0x50/0x130
 dev_close_many+0x84/0x120
 unregister_netdevice_many+0x130/0x710
 unregister_netdevice_queue+0x8c/0xd0
 unregister_netdev+0x20/0x30
 dpaa2_eth_remove+0x68/0x190
 fsl_mc_driver_remove+0x20/0x5c
 __device_release_driver+0x21c/0x220
 device_release_driver_internal+0xac/0xb0
 device_links_unbind_consumers+0xd4/0x100
 __device_release_driver+0x94/0x220
 device_release_driver+0x28/0x40
 bus_remove_device+0x118/0x124
 device_del+0x174/0x420
 fsl_mc_device_remove+0x24/0x40
 __fsl_mc_device_remove+0xc/0x20
 device_for_each_child+0x58/0xa0
 dprc_remove+0x90/0xb0
 fsl_mc_driver_remove+0x20/0x5c
 __device_release_driver+0x21c/0x220
 device_release_driver+0x28/0x40
 bus_remove_device+0x118/0x124
 device_del+0x174/0x420
 fsl_mc_bus_remove+0x80/0x100
 fsl_mc_bus_shutdown+0xc/0x1c
 platform_shutdown+0x20/0x30
 device_shutdown+0x154/0x330
 __do_sys_reboot+0x1cc/0x250
 __arm64_sys_reboot+0x20/0x30
 invoke_syscall.constprop.0+0x4c/0xe0
 do_el0_svc+0x4c/0x150
 el0_svc+0x24/0xb0
 el0t_64_sync_handler+0xa8/0xb0
 el0t_64_sync+0x178/0x17c

It can be seen from the stack trace that the problem is that the
deregistration of the master causes a dev_close(), which gets notified
as NETDEV_GOING_DOWN to dsa_slave_netdevice_event().
But dsa_switch_shutdown() has already run, and this has unregistered the
DSA slave interfaces, and yet, the NETDEV_GOING_DOWN handler attempts to
call dev_close_many() on those slave interfaces, leading to the problem.

The previous attempt to avoid the NETDEV_GOING_DOWN on the master after
dsa_switch_shutdown() was called seems improper. Unregistering the slave
interfaces is unnecessary and unhelpful. Instead, after the slaves have
stopped being uppers of the DSA master, we can now reset to NULL the
master->dsa_ptr pointer, which will make DSA start ignoring all future
notifier events on the master.

Fixes: 0650bf5 ("net: dsa: be compatible with masters which unregister on shutdown")
Reported-by: Rafael Richter <rafael.richter@gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
qzed pushed a commit that referenced this pull request Jun 7, 2022
This commit adds python script to parse CoreSight tracing event and
print out source line and disassembly, it generates readable program
execution flow for easier humans inspecting.

The script receives CoreSight tracing packet with below format:

                +------------+------------+------------+
  packet(n):    |    addr    |    ip      |    cpu     |
                +------------+------------+------------+
  packet(n+1):  |    addr    |    ip      |    cpu     |
                +------------+------------+------------+

packet::addr presents the start address of the coming branch sample, and
packet::ip is the last address of the branch smple.  Therefore, a code
section between branches starts from packet(n)::addr and it stops at
packet(n+1)::ip.  As results we combines the two continuous packets to
generate the address range for instructions:

  [ sample(n)::addr .. sample(n+1)::ip ]

The script supports both objdump or llvm-objdump for disassembly with
specifying option '-d'.  If doesn't specify option '-d', the script
simply outputs source lines and symbols.

Below shows usages with llvm-objdump or objdump to output disassembly.

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d llvm-objdump-11 -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
  	ffff800008eb3198 <etm4_enable_hw>:
  	ffff800008eb3310: c0 38 00 35  	cbnz	w0, 0xffff800008eb3a28 <etm4_enable_hw+0x890>
  	ffff800008eb3314: 9f 3f 03 d5  	dsb	sy
  	ffff800008eb3318: df 3f 03 d5  	isb
  	ffff800008eb331c: f5 5b 42 a9  	ldp	x21, x22, [sp, #32]
  	ffff800008eb3320: fb 73 45 a9  	ldp	x27, x28, [sp, #80]
  	ffff800008eb3324: e0 82 40 39  	ldrb	w0, [x23, #32]
  	ffff800008eb3328: 60 00 00 34  	cbz	w0, 0xffff800008eb3334 <etm4_enable_hw+0x19c>
  	ffff800008eb332c: e0 03 19 aa  	mov	x0, x25
  	ffff800008eb3330: 8c fe ff 97  	bl	0xffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
              main  6728/6728  [0004]         0.000000000  etm4_enable_hw+0x198                    [kernel.kallsyms]
  	ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
  	ffff800008eb2d60: 1f 20 03 d5  	nop
  	ffff800008eb2d64: 1f 20 03 d5  	nop
  	ffff800008eb2d68: 3f 23 03 d5  	hint	#25
  	ffff800008eb2d6c: 00 00 40 f9  	ldr	x0, [x0]
  	ffff800008eb2d70: 9f 3f 03 d5  	dsb	sy
  	ffff800008eb2d74: 00 c0 3e 91  	add	x0, x0, #4016
  	ffff800008eb2d78: 1f 00 00 b9  	str	wzr, [x0]
  	ffff800008eb2d7c: bf 23 03 d5  	hint	#29
  	ffff800008eb2d80: c0 03 5f d6  	ret
              main  6728/6728  [0004]         0.000000000  etm4_cs_lock.isra.0.part.0+0x20

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
  	ffff800008eb3310 <etm4_enable_hw+0x178>:
  	ffff800008eb3310:	350038c0 	cbnz	w0, ffff800008eb3a28 <etm4_enable_hw+0x890>
  	ffff800008eb3314:	d5033f9f 	dsb	sy
  	ffff800008eb3318:	d5033fdf 	isb
  	ffff800008eb331c:	a9425bf5 	ldp	x21, x22, [sp, #32]
  	ffff800008eb3320:	a94573fb 	ldp	x27, x28, [sp, #80]
  	ffff800008eb3324:	394082e0 	ldrb	w0, [x23, #32]
  	ffff800008eb3328:	34000060 	cbz	w0, ffff800008eb3334 <etm4_enable_hw+0x19c>
  	ffff800008eb332c:	aa1903e0 	mov	x0, x25
  	ffff800008eb3330:	97fffe8c 	bl	ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
              main  6728/6728  [0004]         0.000000000  etm4_enable_hw+0x198                    [kernel.kallsyms]
  	ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
  	ffff800008eb2d60:	d503201f 	nop
  	ffff800008eb2d64:	d503201f 	nop
  	ffff800008eb2d68:	d503233f 	paciasp
  	ffff800008eb2d6c:	f9400000 	ldr	x0, [x0]
  	ffff800008eb2d70:	d5033f9f 	dsb	sy
  	ffff800008eb2d74:	913ec000 	add	x0, x0, #0xfb0
  	ffff800008eb2d78:	b900001f 	str	wzr, [x0]
  	ffff800008eb2d7c:	d50323bf 	autiasp
  	ffff800008eb2d80:	d65f03c0 	ret
              main  6728/6728  [0004]         0.000000000  etm4_cs_lock.isra.0.part.0+0x20

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Co-authored-by: Al Grant <al.grant@arm.com>
Co-authored-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Co-authored-by: Tor Jeremiassen <tor@ti.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eelco Chaudron <echaudro@redhat.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Tanmay Jagdale <tanmay@marvell.com>
Cc: coresight@lists.linaro.org
Cc: zengshun . wu <zengshun.wu@outlook.com>
Link: https://lore.kernel.org/r/20220521130446.4163597-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
qzed pushed a commit that referenced this pull request Jan 2, 2023
[ Upstream commit bcd7026 ]

By keep sending L2CAP_CONF_REQ packets, chan->num_conf_rsp increases
multiple times and eventually it will wrap around the maximum number
(i.e., 255).
This patch prevents this by adding a boundary check with
L2CAP_MAX_CONF_RSP

Btmon log:
Bluetooth monitor ver 5.64
= Note: Linux version 6.1.0-rc2 (x86_64)                               0.264594
= Note: Bluetooth subsystem version 2.22                               0.264636
@ MGMT Open: btmon (privileged) version 1.22                  {0x0001} 0.272191
= New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0)          [hci0] 13.877604
@ RAW Open: 9496 (privileged) version 2.22                   {0x0002} 13.890741
= Open Index: 00:00:00:00:00:00                                [hci0] 13.900426
(...)
> ACL Data RX: Handle 200 flags 0x00 dlen 1033             #32 [hci0] 14.273106
        invalid packet size (12 != 1033)
        08 00 01 00 02 01 04 00 01 10 ff ff              ............
> ACL Data RX: Handle 200 flags 0x00 dlen 1547             #33 [hci0] 14.273561
        invalid packet size (14 != 1547)
        0a 00 01 00 04 01 06 00 40 00 00 00 00 00        ........@.....
> ACL Data RX: Handle 200 flags 0x00 dlen 2061             #34 [hci0] 14.274390
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04  ........@.......
> ACL Data RX: Handle 200 flags 0x00 dlen 2061             #35 [hci0] 14.274932
        invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00  ........@.......
= bluetoothd: Bluetooth daemon 5.43                                   14.401828
> ACL Data RX: Handle 200 flags 0x00 dlen 1033             #36 [hci0] 14.275753
        invalid packet size (12 != 1033)
        08 00 01 00 04 01 04 00 40 00 00 00              ........@...

Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
qzed pushed a commit that referenced this pull request Feb 11, 2023
[ Upstream commit d055368 ]

The conclusion "j1939_session_deactivate() should be called with a
session ref-count of at least 2" is incorrect. In some concurrent
scenarios, j1939_session_deactivate can be called with the session
ref-count less than 2. But there is not any problem because it
will check the session active state before session putting in
j1939_session_deactivate_locked().

Here is the concurrent scenario of the problem reported by syzbot
and my reproduction log.

        cpu0                            cpu1
                                j1939_xtp_rx_eoma
j1939_xtp_rx_abort_one
                                j1939_session_get_by_addr [kref == 2]
j1939_session_get_by_addr [kref == 3]
j1939_session_deactivate [kref == 2]
j1939_session_put [kref == 1]
				j1939_session_completed
				j1939_session_deactivate
				WARN_ON_ONCE(kref < 2)

=====================================================
WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70
CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
RIP: 0010:j1939_session_deactivate+0x5f/0x70
Call Trace:
 j1939_session_deactivate_activate_next+0x11/0x28
 j1939_xtp_rx_eoma+0x12a/0x180
 j1939_tp_recv+0x4a2/0x510
 j1939_can_recv+0x226/0x380
 can_rcv_filter+0xf8/0x220
 can_receive+0x102/0x220
 ? process_backlog+0xf0/0x2c0
 can_rcv+0x53/0xf0
 __netif_receive_skb_one_core+0x67/0x90
 ? process_backlog+0x97/0x2c0
 __netif_receive_skb+0x22/0x80

Fixes: 0c71437 ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object")
Reported-by: syzbot+9981a614060dcee6eeca@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20210906094200.95868-1-william.xuanziyang@huawei.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
iwanders pushed a commit to iwanders/linux-surface-kernel that referenced this pull request Mar 2, 2024
 ------------[ cut here ]------------
 memcpy: detected field-spanning write (size 56) of single field "eseg->inline_hdr.start" at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 (size 2)
 WARNING: CPU: 0 PID: 293779 at /var/lib/dkms/mlnx-ofed-kernel/5.8/build/drivers/infiniband/hw/mlx5/wr.c:131 mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
 Modules linked in: 8021q garp mrp stp llc rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) mlx5_core(OE) pci_hyperv_intf mlxdevm(OE) mlx_compat(OE) tls mlxfw(OE) psample nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink mst_pciconf(OE) knem(OE) vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd irqbypass cuse nfsv3 nfs fscache netfs xfrm_user xfrm_algo ipmi_devintf ipmi_msghandler binfmt_misc crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 snd_pcsp aesni_intel crypto_simd cryptd snd_pcm snd_timer joydev snd soundcore input_leds serio_raw evbug nfsd auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc drm efi_pstore ip_tables x_tables autofs4 psmouse virtio_net net_failover failover floppy
  [last unloaded: mlx_compat(OE)]
 CPU: 0 PID: 293779 Comm: ssh Tainted: G           OE      6.2.0-32-generic linux-surface#32~22.04.1-Ubuntu
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
 Code: 0c 01 00 a8 01 75 25 48 8b 75 a0 b9 02 00 00 00 48 c7 c2 10 5b fd c0 48 c7 c7 80 5b fd c0 c6 05 57 0c 03 00 01 e8 95 4d 93 da <0f> 0b 44 8b 4d b0 4c 8b 45 c8 48 8b 4d c0 e9 49 fb ff ff 41 0f b7
 RSP: 0018:ffffb5b48478b570 EFLAGS: 00010046
 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffb5b48478b628 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffffb5b48478b5e8
 R13: ffff963a3c609b5e R14: ffff9639c3fbd800 R15: ffffb5b480475a80
 FS:  00007fc03b444c80(0000) GS:ffff963a3dc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000556f46bdf000 CR3: 0000000006ac6003 CR4: 00000000003706f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  ? show_regs+0x72/0x90
  ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
  ? __warn+0x8d/0x160
  ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
  ? report_bug+0x1bb/0x1d0
  ? handle_bug+0x46/0x90
  ? exc_invalid_op+0x19/0x80
  ? asm_exc_invalid_op+0x1b/0x20
  ? mlx5_ib_post_send+0x191b/0x1a60 [mlx5_ib]
  mlx5_ib_post_send_nodrain+0xb/0x20 [mlx5_ib]
  ipoib_send+0x2ec/0x770 [ib_ipoib]
  ipoib_start_xmit+0x5a0/0x770 [ib_ipoib]
  dev_hard_start_xmit+0x8e/0x1e0
  ? validate_xmit_skb_list+0x4d/0x80
  sch_direct_xmit+0x116/0x3a0
  __dev_xmit_skb+0x1fd/0x580
  __dev_queue_xmit+0x284/0x6b0
  ? _raw_spin_unlock_irq+0xe/0x50
  ? __flush_work.isra.0+0x20d/0x370
  ? push_pseudo_header+0x17/0x40 [ib_ipoib]
  neigh_connected_output+0xcd/0x110
  ip_finish_output2+0x179/0x480
  ? __smp_call_single_queue+0x61/0xa0
  __ip_finish_output+0xc3/0x190
  ip_finish_output+0x2e/0xf0
  ip_output+0x78/0x110
  ? __pfx_ip_finish_output+0x10/0x10
  ip_local_out+0x64/0x70
  __ip_queue_xmit+0x18a/0x460
  ip_queue_xmit+0x15/0x30
  __tcp_transmit_skb+0x914/0x9c0
  tcp_write_xmit+0x334/0x8d0
  tcp_push_one+0x3c/0x60
  tcp_sendmsg_locked+0x2e1/0xac0
  tcp_sendmsg+0x2d/0x50
  inet_sendmsg+0x43/0x90
  sock_sendmsg+0x68/0x80
  sock_write_iter+0x93/0x100
  vfs_write+0x326/0x3c0
  ksys_write+0xbd/0xf0
  ? do_syscall_64+0x69/0x90
  __x64_sys_write+0x19/0x30
  do_syscall_64+0x59/0x90
  ? do_user_addr_fault+0x1d0/0x640
  ? exit_to_user_mode_prepare+0x3b/0xd0
  ? irqentry_exit_to_user_mode+0x9/0x20
  ? irqentry_exit+0x43/0x50
  ? exc_page_fault+0x92/0x1b0
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
 RIP: 0033:0x7fc03ad14a37
 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
 RSP: 002b:00007ffdf8697fe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 0000000000008024 RCX: 00007fc03ad14a37
 RDX: 0000000000008024 RSI: 0000556f46bd8270 RDI: 0000000000000003
 RBP: 0000556f46bb1800 R08: 0000000000007fe3 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
 R13: 0000556f46bc66b0 R14: 000000000000000a R15: 0000556f46bb2f50
  </TASK>
 ---[ end trace 0000000000000000 ]---

Link: https://lore.kernel.org/r/8228ad34bd1a25047586270f7b1fb4ddcd046282.1706433934.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
StollD pushed a commit that referenced this pull request Apr 16, 2024
[ Upstream commit 5ee0d47 ]

Currently, when using non-blocking commits, we can see the following
kernel warning:

[  110.908514] ------------[ cut here ]------------
[  110.908529] refcount_t: underflow; use-after-free.
[  110.908620] WARNING: CPU: 0 PID: 1866 at lib/refcount.c:87 refcount_dec_not_one+0xb8/0xc0
[  110.908664] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep hid_logitech_hidpp vc4 brcmfmac hci_uart btbcm brcmutil bluetooth snd_soc_hdmi_codec cfg80211 cec drm_display_helper drm_dma_helper drm_kms_helper snd_soc_core snd_compress snd_pcm_dmaengine fb_sys_fops sysimgblt syscopyarea sysfillrect raspberrypi_hwmon ecdh_generic ecc rfkill libaes i2c_bcm2835 binfmt_misc joydev snd_bcm2835(C) bcm2835_codec(C) bcm2835_isp(C) v4l2_mem2mem videobuf2_dma_contig snd_pcm bcm2835_v4l2(C) raspberrypi_gpiomem bcm2835_mmal_vchiq(C) videobuf2_v4l2 snd_timer videobuf2_vmalloc videobuf2_memops videobuf2_common snd videodev vc_sm_cma(C) mc hid_logitech_dj uio_pdrv_genirq uio i2c_dev drm fuse dm_mod drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[  110.909086] CPU: 0 PID: 1866 Comm: kodi.bin Tainted: G         C         6.1.66-v8+ #32
[  110.909104] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[  110.909114] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  110.909132] pc : refcount_dec_not_one+0xb8/0xc0
[  110.909152] lr : refcount_dec_not_one+0xb4/0xc0
[  110.909170] sp : ffffffc00913b9c0
[  110.909177] x29: ffffffc00913b9c0 x28: 000000556969bbb0 x27: 000000556990df60
[  110.909205] x26: 0000000000000002 x25: 0000000000000004 x24: ffffff8004448480
[  110.909230] x23: ffffff800570b500 x22: ffffff802e03a7bc x21: ffffffecfca68c78
[  110.909257] x20: ffffff8002b42000 x19: ffffff802e03a600 x18: 0000000000000000
[  110.909283] x17: 0000000000000011 x16: ffffffffffffffff x15: 0000000000000004
[  110.909308] x14: 0000000000000fff x13: ffffffed577e47e0 x12: 0000000000000003
[  110.909333] x11: 0000000000000000 x10: 0000000000000027 x9 : c912d0d083728c00
[  110.909359] x8 : c912d0d083728c00 x7 : 65646e75203a745f x6 : 746e756f63666572
[  110.909384] x5 : ffffffed579f62ee x4 : ffffffed579eb01e x3 : 0000000000000000
[  110.909409] x2 : 0000000000000000 x1 : ffffffc00913b750 x0 : 0000000000000001
[  110.909434] Call trace:
[  110.909441]  refcount_dec_not_one+0xb8/0xc0
[  110.909461]  vc4_bo_dec_usecnt+0x4c/0x1b0 [vc4]
[  110.909903]  vc4_cleanup_fb+0x44/0x50 [vc4]
[  110.910315]  drm_atomic_helper_cleanup_planes+0x88/0xa4 [drm_kms_helper]
[  110.910669]  vc4_atomic_commit_tail+0x390/0x9dc [vc4]
[  110.911079]  commit_tail+0xb0/0x164 [drm_kms_helper]
[  110.911397]  drm_atomic_helper_commit+0x1d0/0x1f0 [drm_kms_helper]
[  110.911716]  drm_atomic_commit+0xb0/0xdc [drm]
[  110.912569]  drm_mode_atomic_ioctl+0x348/0x4b8 [drm]
[  110.913330]  drm_ioctl_kernel+0xec/0x15c [drm]
[  110.914091]  drm_ioctl+0x24c/0x3b0 [drm]
[  110.914850]  __arm64_sys_ioctl+0x9c/0xd4
[  110.914873]  invoke_syscall+0x4c/0x114
[  110.914897]  el0_svc_common+0xd0/0x118
[  110.914917]  do_el0_svc+0x38/0xd0
[  110.914936]  el0_svc+0x30/0x8c
[  110.914958]  el0t_64_sync_handler+0x84/0xf0
[  110.914979]  el0t_64_sync+0x18c/0x190
[  110.914996] ---[ end trace 0000000000000000 ]---

This happens because, although `prepare_fb` and `cleanup_fb` are
perfectly balanced, we cannot guarantee consistency in the check
plane->state->fb == state->fb. This means that sometimes we can increase
the refcount in `prepare_fb` and don't decrease it in `cleanup_fb`. The
opposite can also be true.

In fact, the struct drm_plane .state shouldn't be accessed directly
but instead, the `drm_atomic_get_new_plane_state()` helper function should
be used. So, we could stick to this check, but using
`drm_atomic_get_new_plane_state()`. But actually, this check is not really
needed. We can increase and decrease the refcount symmetrically without
problems.

This is going to make the code more simple and consistent.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240105175908.242000-1-mcanal@igalia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants