Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sun4i_defconfig (et al) suggestions #52

Open
romanrm opened this issue Jul 6, 2012 · 15 comments
Open

sun4i_defconfig (et al) suggestions #52

romanrm opened this issue Jul 6, 2012 · 15 comments

Comments

@romanrm
Copy link

romanrm commented Jul 6, 2012

# CONFIG_IPV6 is not set
Please change to =m

# CONFIG_TUN is not set
same here

# CONFIG_NETFILTER is not set
same here, but this also opens up a huge subsection to configure, I'd suggest copying this section verbatim from Debian.

# USB Network Adapters
#
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_USBNET is not set
# CONFIG_USB_HSO is not set
# CONFIG_USB_IPHETH is not set
# CONFIG_WAN is not set

I suggest enabling all these as =m, needed on the MK802 for wired Ethernet via USB.

CONFIG_NO_HZ=y
but on http://rhombus-tech.net/allwinner_a10/hacking_the_mele_a1000/server/ someone wrote "tickless kernel doesn't work. (If you turn it on, the system will hang randomly.)" -- is this no longer true?

@turl
Copy link

turl commented Jul 6, 2012

Disabling HIGHMEM might be interesting too, but for devices with >512M RAM (and maybe <512 too?) it causes large quantities of RAM to remain unused due to the way allocations are handled. HIGHMEM works around this issue.

@amery
Copy link
Member

amery commented Jul 9, 2012

@romanrm can you make a paste with the desired CONFIG_NETFILTER=m values?

@amery
Copy link
Member

amery commented Jul 9, 2012

9638828 enables IPV6 as built-in (doesn't compile as m), and tun and usb network stuff as m too.

pending is still netfilter and no_hz.

edit: did I mention I really prefer patches?

@romanrm
Copy link
Author

romanrm commented Jul 10, 2012

@amery I got a better idea than copying from debian or inventing our own desired values: why not merge the netfilter section from the defconfig of x86?

@j1nx
Copy link

j1nx commented Jul 10, 2012

OK while we are at it;

CONFIG_ANDROID=y
CONFIG_ANDROID_BINDER_IPC=y
CONFIG_ANDROID_LOGGER=y
CONFIG_ANDROID_RAM_CONSOLE=y
CONFIG_ANDROID_RAM_CONSOLE_ENABLE_VERBOSE=y
CONFIG_ANDROID_RAM_CONSOLE_EARLY_INIT=y
CONFIG_ANDROID_RAM_CONSOLE_EARLY_ADDR=0
CONFIG_ANDROID_RAM_CONSOLE_EARLY_SIZE=0
CONFIG_ANDROID_TIMED_OUTPUT=y
CONFIG_ANDROID_LOW_MEMORY_KILLER=y

can be replaced by;

'# CONFIG_ANDROID is not set

@turl
Copy link

turl commented Jul 10, 2012

I personally would keep CONFIG_ANDROID_RAM_CONSOLE=y; it's always nice to have, even if you're not running Android, and can help catch unexpected kernel panics.

I would also set CONFIG_PANIC_TIMEOUT=1 to reboot on kernel panic.

@j1nx
Copy link

j1nx commented Jul 10, 2012

@turl

All above android is in the normal defconfig file and should only be in the crane version if you ask me.

I am not aware you can use the ram console under linux? Is that true? Might help MK802 users who can't solder that good (<-Me soon)

@amery; Exactly.

@rsalvaterra-zz
Copy link

Here are my suggestions (sorry for not posting a real patch)... :)

CONFIG_ZRAM is not set
zRam (compressed RAM block device) is extremely useful for maximising available memory, this should be built as a module.

CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
Optimising for size. This could be a win (must be benchmarked, though) due to better cache usage.

CONFIG_KERNEL_LZO is not set
This should be enabled. Although the resulting image will be larger, boot times will be shorter (LZO decompresses much faster than gzip).

CONFIG_HIGHMEM=y
Is it really needed? IIRC, only Cortex-A15 supports LPAE.

CONFIG_DNOTIFY=y
dnotify is obsolete, no?

@turl
Copy link

turl commented Jul 10, 2012

@j1nx theorically you should be able to, as far as I'm aware it has no userspace dependencies.

@rsalvaterra look at my first comment on here regarding HIGHMEM

@rsalvaterra-zz
Copy link

@turl sorry, I missed it. Anyway, wouldn't such issue qualify as an upstream bug? Or is there something that makes ARMv7 special as far as memory allocation is concerned?

@j1nx
Copy link

j1nx commented Jul 10, 2012

@turl
Nice, but should we then not better change the source to reflect; CONFIG_RAM_CONSOLE=y (without ANDROID)

@amery
Copy link
Member

amery commented Jul 10, 2012

23e5456 enables zram as m on all *_defconfigs, and disables android in sun4i_defconfig.

pending discussion: no_hz, highmem, default optimizations (profiling needed), lzo, dnotify,
pending patch: netfilter, ram_console without android (@turl).

@Vayu
Copy link

Vayu commented Jul 10, 2012

Few things which caught my eye, feel free to ignore

CONFIG_SPARSE_IRQ=y
Can be disabled since it is only relevant for multi-core systems, which sun4i is not (e.g. Mele has total 96 interrupts, compared to average x86 system which has around 1000)

CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
These two are related to hugepages support, which is not present in 3.0.x for ARM arch. Probably does nothing, still safer to disable.

@turl
Copy link

turl commented Jul 11, 2012

@j1nx I didn't invent it, it's upstream code. I don't think changing the name is a good idea.

@rsalvaterra-zz
Copy link

@turl I dug a little deeper, it seems the HIGHMEM workaround is related to DMA mappings reserved at boot. I believe the DMA mapping changes (and the contiguous memory allocator) in Linux 3.5 will fix the problem, but I could be wrong...

amery pushed a commit that referenced this issue Nov 12, 2013
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
amery pushed a commit that referenced this issue Nov 12, 2013
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
amery pushed a commit that referenced this issue Jan 24, 2014
The irq data for rng module defined in hwmod data previously
missed the OMAP_INTC_START relative offset, so the interrupt
number is probably misconfigured during the DT node addition
adjusting for this OMAP_INTC_START. Interrupt #36 is associated
with a watchdog timer, so fix the rng module's interrupt to the
appropriate interrupt #52.

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
amery pushed a commit that referenced this issue Apr 8, 2014
MIPI Block #52 which provides configuration details for the MIPI panel
including dphy settings as per panel and tcon specs

Block #53 gives information on panel enable sequences

v2: Address review comemnts from Jani
    - Move panel ids from intel_dsi.h to intel_bios.h
    - bdb_mipi_config structure improvements for cleaner code
    - Adding units for the pps delays, all in ms
    - change data structure to be more cleaner and simple

v3: Corrected the unit for pps delays as 100us

Signed-off-by: Shobhit Kumar <shobhit.kumar@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
amery pushed a commit that referenced this issue Jun 12, 2014
The parser extracts the config block(#52) and sequence(#53) data
and store in private data structures.

v2: Address review comments by Jani
    - adjust code for the structure changes for bdb_mipi_config
    - add boundry and buffer overflow checks as suggested
    - use kmemdup instead of kmalloc and memcpy

v3: More strict check while parsing VBT
    - Ensure that at anytime we do not go beyond sequence block
      while parsing
    - On unknown element fail the whole parsing

v4: Style changes and spell check mostly as suggested by Jani

Signed-off-by: Shobhit Kumar <shobhit.kumar@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
amery pushed a commit that referenced this issue Jun 12, 2014
This driver makes use of the generic panel information from the VBT.
Panel information is classified into two - panel configuration and panel
power sequence which is unique to each panel. The generic driver uses the
panel configuration and sequence parsed from VBT block #52 and #53

v2: Address review comments by Jani
    - Move all of the things in driver c file from header
    - Make all functions static
    - Make use of video/mipi_display.c instead of redefining
    - Null checks during sequence execution

v3: Address review comments by Damien
    - Rename the panel driver file as intel_dsi_panel_vbt.c
    - Fix style changes as suggested
    - Correct comments for lp->hs and hs->lp count calculations
    - General updating comments to have more clarity
    - using max() instead of ternary operator
    - Fix names (ui_num, ui_den) while using UI in calculations
    - compute max of lp_to_hs switch and hs_to_lp switch while computing
      hs_lp_switch_count

Signed-off-by: Shobhit Kumar <shobhit.kumar@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
amery pushed a commit that referenced this issue Aug 25, 2014
…ode()

In __ftrace_replace_code(), when converting the call to a nop in a function
it needs to compare against the "curr" (current) value of the ftrace ops, and
not the "new" one. It currently does not affect x86 which is the only arch
to do the trampolines with function graph tracer, but when other archs that do
depend on this code implement the function graph trampoline, it can crash.

Here's an example when ARM uses the trampolines (in the future):

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
 Modules linked in: omap_rng rng_core ipv6
 CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28f303-dirty #52
 [<c02188f4>] (unwind_backtrace) from [<c021343c>] (show_stack+0x20/0x24)
 [<c021343c>] (show_stack) from [<c095a674>] (dump_stack+0x78/0x94)
 [<c095a674>] (dump_stack) from [<c02532a0>] (warn_slowpath_common+0x7c/0x9c)
 [<c02532a0>] (warn_slowpath_common) from [<c02532ec>] (warn_slowpath_null+0x2c/0x34)
 [<c02532ec>] (warn_slowpath_null) from [<c02cbac4>] (ftrace_bug+0x17c/0x1f4)
 [<c02cbac4>] (ftrace_bug) from [<c02cc44c>] (ftrace_replace_code+0x80/0x9c)
 [<c02cc44c>] (ftrace_replace_code) from [<c02cc658>] (ftrace_modify_all_code+0xb8/0x164)
 [<c02cc658>] (ftrace_modify_all_code) from [<c02cc718>] (__ftrace_modify_code+0x14/0x1c)
 [<c02cc718>] (__ftrace_modify_code) from [<c02c7244>] (multi_cpu_stop+0xf4/0x134)
 [<c02c7244>] (multi_cpu_stop) from [<c02c6e90>] (cpu_stopper_thread+0x54/0x130)
 [<c02c6e90>] (cpu_stopper_thread) from [<c0271cd4>] (smpboot_thread_fn+0x1ac/0x1bc)
 [<c0271cd4>] (smpboot_thread_fn) from [<c026ddf0>] (kthread+0xe0/0xfc)
 [<c026ddf0>] (kthread) from [<c020f318>] (ret_from_fork+0x14/0x20)
 ---[ end trace dc9ce72c5b617d8f ]---
[   65.047264] ftrace failed to modify [<c0208580>] asm_do_IRQ+0x10/0x1c
[   65.054070]  actual: 85:1b:00:eb

Fixes: 7413af1 "ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
amery pushed a commit that referenced this issue Jun 25, 2016
mem_cgroup_migrate() uses local_irq_disable/enable() but can be called
with irq disabled from migrate_page_copy().  This ends up enabling irq
while holding a irq context lock triggering the following lockdep
warning.  Fix it by using irq_save/restore instead.

  =================================
  [ INFO: inconsistent lock state ]
  4.7.0-rc1+ #52 Tainted: G        W
  ---------------------------------
  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  kcompactd0/151 [HC0[0]:SC0[0]:HE1:SE1] takes:
   (&(&ctx->completion_lock)->rlock){+.?.-.}, at: [<000000000038fd96>] aio_migratepage+0x156/0x1e8
  {IN-SOFTIRQ-W} state was registered at:
     __lock_acquire+0x5b6/0x1930
     lock_acquire+0xee/0x270
     _raw_spin_lock_irqsave+0x66/0xb0
     aio_complete+0x98/0x328
     dio_complete+0xe4/0x1e0
     blk_update_request+0xd4/0x450
     scsi_end_request+0x48/0x1c8
     scsi_io_completion+0x272/0x698
     blk_done_softirq+0xca/0xe8
     __do_softirq+0xc8/0x518
     irq_exit+0xee/0x110
     do_IRQ+0x6a/0x88
     io_int_handler+0x11a/0x25c
     __mutex_unlock_slowpath+0x144/0x1d8
     __mutex_unlock_slowpath+0x140/0x1d8
     kernfs_iop_permission+0x64/0x80
     __inode_permission+0x9e/0xf0
     link_path_walk+0x6e/0x510
     path_lookupat+0xc4/0x1a8
     filename_lookup+0x9c/0x160
     user_path_at_empty+0x5c/0x70
     SyS_readlinkat+0x68/0x140
     system_call+0xd6/0x270
  irq event stamp: 971410
  hardirqs last  enabled at (971409):  migrate_page_move_mapping+0x3ea/0x588
  hardirqs last disabled at (971410):  _raw_spin_lock_irqsave+0x3c/0xb0
  softirqs last  enabled at (970526):  __do_softirq+0x460/0x518
  softirqs last disabled at (970519):  irq_exit+0xee/0x110

  other info that might help us debug this:
   Possible unsafe locking scenario:

	 CPU0
	 ----
    lock(&(&ctx->completion_lock)->rlock);
    <Interrupt>
      lock(&(&ctx->completion_lock)->rlock);

    *** DEADLOCK ***

  3 locks held by kcompactd0/151:
   #0:  (&(&mapping->private_lock)->rlock){+.+.-.}, at:  aio_migratepage+0x42/0x1e8
   #1:  (&ctx->ring_lock){+.+.+.}, at:  aio_migratepage+0x5a/0x1e8
   #2:  (&(&ctx->completion_lock)->rlock){+.?.-.}, at:  aio_migratepage+0x156/0x1e8

  stack backtrace:
  CPU: 20 PID: 151 Comm: kcompactd0 Tainted: G        W       4.7.0-rc1+ #52
  Call Trace:
    show_trace+0xea/0xf0
    show_stack+0x72/0xf0
    dump_stack+0x9a/0xd8
    print_usage_bug.part.27+0x2d4/0x2e8
    mark_lock+0x17e/0x758
    mark_held_locks+0xa2/0xd0
    trace_hardirqs_on_caller+0x140/0x1c0
    mem_cgroup_migrate+0x266/0x370
    aio_migratepage+0x16a/0x1e8
    move_to_new_page+0xb0/0x260
    migrate_pages+0x8f4/0x9f0
    compact_zone+0x4dc/0xdc8
    kcompactd_do_work+0x1aa/0x358
    kcompactd+0xba/0x2c8
    kthread+0x10a/0x110
    kernel_thread_starter+0x6/0xc
    kernel_thread_starter+0x0/0xc
  INFO: lockdep is turned off.

Link: http://lkml.kernel.org/r/20160620184158.GO3262@mtj.duckdns.org
Link: http://lkml.kernel.org/g/5767CFE5.7080904@de.ibm.com
Fixes: 74485cf ("mm: migrate: consolidate mem_cgroup_migrate() calls")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org>	[4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
amery pushed a commit that referenced this issue Oct 29, 2016
When interrupting an application which was allocating DMAable
memory, it was possible, that the DMA memory was deallocated
twice, leading to the error symptoms below.

Thanks to Gerald, who analyzed the problem and provided this
patch.

I agree with his analysis of the problem: ddcb_cmd_fixups() ->
genwqe_alloc_sync_sgl() (fails in f/lpage, but sgl->sgl != NULL
and f/lpage maybe also != NULL) -> ddcb_cmd_cleanup() ->
genwqe_free_sync_sgl() (double free, because sgl->sgl != NULL and
f/lpage maybe also != NULL)

In this scenario we would have exactly the kind of double free that
would explain the WARNING / Bad page state, and as expected it is
caused by broken error handling (cleanup).

Using the Ubuntu git source, tag Ubuntu-4.4.0-33.52, he was able to reproduce
the "Bad page state" issue, and with the patch on top he could not reproduce
it any more.

------------[ cut here ]------------
WARNING: at /build/linux-o03cxz/linux-4.4.0/arch/s390/include/asm/pci_dma.h:141
Modules linked in: qeth_l2 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common genwqe_card qeth crc_itu_t qdio ccwgroup vmur dm_multipath dasd_eckd_mod dasd_mod
CPU: 2 PID: 3293 Comm: genwqe_gunzip Not tainted 4.4.0-33-generic #52-Ubuntu
task: 0000000032c7e270 ti: 00000000324e4000 task.ti: 00000000324e4000
Krnl PSW : 0404c00180000000 0000000000156346 (dma_update_cpu_trans+0x9e/0xa8)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 00000000324e7bcd 0000000000c3c34a 0000000027628298 000000003215b400
           0000000000000400 0000000000001fff 0000000000000400 0000000116853000
           07000000324e7b1e 0000000000000001 0000000000000001 0000000000000001
           0000000000001000 0000000116854000 0000000000156402 00000000324e7a38
Krnl Code: 000000000015633a: 95001000           cli     0(%r1),0
           000000000015633e: a774ffc3           brc     7,1562c4
          #0000000000156342: a7f40001           brc     15,156344
          >0000000000156346: 92011000           mvi     0(%r1),1
           000000000015634a: a7f4ffbd           brc     15,1562c4
           000000000015634e: 0707               bcr     0,%r7
           0000000000156350: c00400000000       brcl    0,156350
           0000000000156356: eb7ff0500024       stmg    %r7,%r15,80(%r15)
Call Trace:
([<00000000001563e0>] dma_update_trans+0x90/0x228)
 [<00000000001565dc>] s390_dma_unmap_pages+0x64/0x160
 [<00000000001567c2>] s390_dma_free+0x62/0x98
 [<000003ff801310ce>] __genwqe_free_consistent+0x56/0x70 [genwqe_card]
 [<000003ff801316d0>] genwqe_free_sync_sgl+0xf8/0x160 [genwqe_card]
 [<000003ff8012bd6e>] ddcb_cmd_cleanup+0x86/0xa8 [genwqe_card]
 [<000003ff8012c1c0>] do_execute_ddcb+0x110/0x348 [genwqe_card]
 [<000003ff8012c914>] genwqe_ioctl+0x51c/0xc20 [genwqe_card]
 [<000000000032513a>] do_vfs_ioctl+0x3b2/0x518
 [<0000000000325344>] SyS_ioctl+0xa4/0xb8
 [<00000000007b86c6>] system_call+0xd6/0x264
 [<000003ff9e8e520a>] 0x3ff9e8e520a
Last Breaking-Event-Address:
 [<0000000000156342>] dma_update_cpu_trans+0x9a/0xa8
---[ end trace 35996336235145c8 ]---
BUG: Bad page state in process jbd2/dasdb1-8  pfn:3215b
page:000003d100c856c0 count:-1 mapcount:0 mapping:          (null) index:0x0
flags: 0x3fffc0000000000()
page dumped because: nonzero _count

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Frank Haverkamp <haver@linux.vnet.ibm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this issue Nov 10, 2016
commit a7a7aee upstream.

When interrupting an application which was allocating DMAable
memory, it was possible, that the DMA memory was deallocated
twice, leading to the error symptoms below.

Thanks to Gerald, who analyzed the problem and provided this
patch.

I agree with his analysis of the problem: ddcb_cmd_fixups() ->
genwqe_alloc_sync_sgl() (fails in f/lpage, but sgl->sgl != NULL
and f/lpage maybe also != NULL) -> ddcb_cmd_cleanup() ->
genwqe_free_sync_sgl() (double free, because sgl->sgl != NULL and
f/lpage maybe also != NULL)

In this scenario we would have exactly the kind of double free that
would explain the WARNING / Bad page state, and as expected it is
caused by broken error handling (cleanup).

Using the Ubuntu git source, tag Ubuntu-4.4.0-33.52, he was able to reproduce
the "Bad page state" issue, and with the patch on top he could not reproduce
it any more.

------------[ cut here ]------------
WARNING: at /build/linux-o03cxz/linux-4.4.0/arch/s390/include/asm/pci_dma.h:141
Modules linked in: qeth_l2 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common genwqe_card qeth crc_itu_t qdio ccwgroup vmur dm_multipath dasd_eckd_mod dasd_mod
CPU: 2 PID: 3293 Comm: genwqe_gunzip Not tainted 4.4.0-33-generic linux-sunxi#52-Ubuntu
task: 0000000032c7e270 ti: 00000000324e4000 task.ti: 00000000324e4000
Krnl PSW : 0404c00180000000 0000000000156346 (dma_update_cpu_trans+0x9e/0xa8)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 00000000324e7bcd 0000000000c3c34a 0000000027628298 000000003215b400
           0000000000000400 0000000000001fff 0000000000000400 0000000116853000
           07000000324e7b1e 0000000000000001 0000000000000001 0000000000000001
           0000000000001000 0000000116854000 0000000000156402 00000000324e7a38
Krnl Code: 000000000015633a: 95001000           cli     0(%r1),0
           000000000015633e: a774ffc3           brc     7,1562c4
          #0000000000156342: a7f40001           brc     15,156344
          >0000000000156346: 92011000           mvi     0(%r1),1
           000000000015634a: a7f4ffbd           brc     15,1562c4
           000000000015634e: 0707               bcr     0,%r7
           0000000000156350: c00400000000       brcl    0,156350
           0000000000156356: eb7ff0500024       stmg    %r7,%r15,80(%r15)
Call Trace:
([<00000000001563e0>] dma_update_trans+0x90/0x228)
 [<00000000001565dc>] s390_dma_unmap_pages+0x64/0x160
 [<00000000001567c2>] s390_dma_free+0x62/0x98
 [<000003ff801310ce>] __genwqe_free_consistent+0x56/0x70 [genwqe_card]
 [<000003ff801316d0>] genwqe_free_sync_sgl+0xf8/0x160 [genwqe_card]
 [<000003ff8012bd6e>] ddcb_cmd_cleanup+0x86/0xa8 [genwqe_card]
 [<000003ff8012c1c0>] do_execute_ddcb+0x110/0x348 [genwqe_card]
 [<000003ff8012c914>] genwqe_ioctl+0x51c/0xc20 [genwqe_card]
 [<000000000032513a>] do_vfs_ioctl+0x3b2/0x518
 [<0000000000325344>] SyS_ioctl+0xa4/0xb8
 [<00000000007b86c6>] system_call+0xd6/0x264
 [<000003ff9e8e520a>] 0x3ff9e8e520a
Last Breaking-Event-Address:
 [<0000000000156342>] dma_update_cpu_trans+0x9a/0xa8
---[ end trace 35996336235145c8 ]---
BUG: Bad page state in process jbd2/dasdb1-8  pfn:3215b
page:000003d100c856c0 count:-1 mapcount:0 mapping:          (null) index:0x0
flags: 0x3fffc0000000000()
page dumped because: nonzero _count

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Frank Haverkamp <haver@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
amery pushed a commit that referenced this issue Apr 4, 2017
[ Upstream commit a7a7aee ]

When interrupting an application which was allocating DMAable
memory, it was possible, that the DMA memory was deallocated
twice, leading to the error symptoms below.

Thanks to Gerald, who analyzed the problem and provided this
patch.

I agree with his analysis of the problem: ddcb_cmd_fixups() ->
genwqe_alloc_sync_sgl() (fails in f/lpage, but sgl->sgl != NULL
and f/lpage maybe also != NULL) -> ddcb_cmd_cleanup() ->
genwqe_free_sync_sgl() (double free, because sgl->sgl != NULL and
f/lpage maybe also != NULL)

In this scenario we would have exactly the kind of double free that
would explain the WARNING / Bad page state, and as expected it is
caused by broken error handling (cleanup).

Using the Ubuntu git source, tag Ubuntu-4.4.0-33.52, he was able to reproduce
the "Bad page state" issue, and with the patch on top he could not reproduce
it any more.

------------[ cut here ]------------
WARNING: at /build/linux-o03cxz/linux-4.4.0/arch/s390/include/asm/pci_dma.h:141
Modules linked in: qeth_l2 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common genwqe_card qeth crc_itu_t qdio ccwgroup vmur dm_multipath dasd_eckd_mod dasd_mod
CPU: 2 PID: 3293 Comm: genwqe_gunzip Not tainted 4.4.0-33-generic #52-Ubuntu
task: 0000000032c7e270 ti: 00000000324e4000 task.ti: 00000000324e4000
Krnl PSW : 0404c00180000000 0000000000156346 (dma_update_cpu_trans+0x9e/0xa8)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 00000000324e7bcd 0000000000c3c34a 0000000027628298 000000003215b400
           0000000000000400 0000000000001fff 0000000000000400 0000000116853000
           07000000324e7b1e 0000000000000001 0000000000000001 0000000000000001
           0000000000001000 0000000116854000 0000000000156402 00000000324e7a38
Krnl Code: 000000000015633a: 95001000           cli     0(%r1),0
           000000000015633e: a774ffc3           brc     7,1562c4
          #0000000000156342: a7f40001           brc     15,156344
          >0000000000156346: 92011000           mvi     0(%r1),1
           000000000015634a: a7f4ffbd           brc     15,1562c4
           000000000015634e: 0707               bcr     0,%r7
           0000000000156350: c00400000000       brcl    0,156350
           0000000000156356: eb7ff0500024       stmg    %r7,%r15,80(%r15)
Call Trace:
([<00000000001563e0>] dma_update_trans+0x90/0x228)
 [<00000000001565dc>] s390_dma_unmap_pages+0x64/0x160
 [<00000000001567c2>] s390_dma_free+0x62/0x98
 [<000003ff801310ce>] __genwqe_free_consistent+0x56/0x70 [genwqe_card]
 [<000003ff801316d0>] genwqe_free_sync_sgl+0xf8/0x160 [genwqe_card]
 [<000003ff8012bd6e>] ddcb_cmd_cleanup+0x86/0xa8 [genwqe_card]
 [<000003ff8012c1c0>] do_execute_ddcb+0x110/0x348 [genwqe_card]
 [<000003ff8012c914>] genwqe_ioctl+0x51c/0xc20 [genwqe_card]
 [<000000000032513a>] do_vfs_ioctl+0x3b2/0x518
 [<0000000000325344>] SyS_ioctl+0xa4/0xb8
 [<00000000007b86c6>] system_call+0xd6/0x264
 [<000003ff9e8e520a>] 0x3ff9e8e520a
Last Breaking-Event-Address:
 [<0000000000156342>] dma_update_cpu_trans+0x9a/0xa8
---[ end trace 35996336235145c8 ]---
BUG: Bad page state in process jbd2/dasdb1-8  pfn:3215b
page:000003d100c856c0 count:-1 mapcount:0 mapping:          (null) index:0x0
flags: 0x3fffc0000000000()
page dumped because: nonzero _count

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Frank Haverkamp <haver@linux.vnet.ibm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
codekipper pushed a commit to codekipper/linux-sunxi that referenced this issue Jul 29, 2017
If the request has been completed before the reset took effect, we don't
need to mark it up as being a victim. Touching fence->error after the
fence has been signaled is detected by dma_fence_set_error() and
triggers a BUG:

[  231.743133] kernel BUG at ./include/linux/dma-fence.h:434!
[  231.743156] invalid opcode: 0000 [#1] SMP KASAN
[  231.743172] Modules linked in: i915 drm_kms_helper drm iptable_nat nf_nat_ipv4 nf_nat x86_pkg_temp_thermal iosf_mbi i2c_algo_bit cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev [last unloaded: drm]
[  231.743221] CPU: 2 PID: 20 Comm: kworker/2:0 Tainted: G     U          4.13.0-rc1+ linux-sunxi#52
[  231.743236] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
[  231.743363] Workqueue: events_long i915_hangcheck_elapsed [i915]
[  231.743382] task: ffff8801f42e9780 task.stack: ffff8801f42f8000
[  231.743489] RIP: 0010:i915_gem_reset_engine+0x45a/0x460 [i915]
[  231.743505] RSP: 0018:ffff8801f42ff770 EFLAGS: 00010202
[  231.743521] RAX: 0000000000000007 RBX: ffff8801bf6b1880 RCX: ffffffffa02881a6
[  231.743537] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff8801bf6b18c8
[  231.743551] RBP: ffff8801f42ff7c8 R08: 0000000000000001 R09: 0000000000000000
[  231.743566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801edb02d00
[  231.743581] R13: ffff8801e19d4200 R14: 000000000000001d R15: ffff8801ce2a4000
[  231.743599] FS:  0000000000000000(0000) GS:ffff8801f5a80000(0000) knlGS:0000000000000000
[  231.743614] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  231.743629] CR2: 00007f0ebd1add10 CR3: 0000000002621000 CR4: 00000000000406e0
[  231.743643] Call Trace:
[  231.743752]  i915_gem_reset+0x6c/0x150 [i915]
[  231.743853]  i915_reset+0x175/0x210 [i915]
[  231.743958]  i915_reset_device+0x33b/0x350 [i915]
[  231.744061]  ? valleyview_pipestat_irq_handler+0xe0/0xe0 [i915]
[  231.744081]  ? trace_hardirqs_off_caller+0x70/0x110
[  231.744102]  ? _raw_spin_unlock_irqrestore+0x46/0x50
[  231.744120]  ? find_held_lock+0x119/0x150
[  231.744138]  ? mark_lock+0x6d/0x850
[  231.744241]  ? gen8_gt_irq_ack+0x1f0/0x1f0 [i915]
[  231.744262]  ? work_on_cpu_safe+0x60/0x60
[  231.744284]  ? rcu_read_lock_sched_held+0x57/0xa0
[  231.744400]  ? gen6_read32+0x2ba/0x320 [i915]
[  231.744506]  i915_handle_error+0x382/0x5f0 [i915]
[  231.744611]  ? gen6_rps_reset_ei+0x20/0x20 [i915]
[  231.744630]  ? vsnprintf+0x128/0x8e0
[  231.744649]  ? pointer+0x6b0/0x6b0
[  231.744667]  ? debug_check_no_locks_freed+0x1a0/0x1a0
[  231.744688]  ? scnprintf+0x92/0xe0
[  231.744706]  ? snprintf+0xb0/0xb0
[  231.744820]  hangcheck_declare_hang+0x15a/0x1a0 [i915]
[  231.744932]  ? engine_stuck+0x440/0x440 [i915]
[  231.744951]  ? rcu_read_lock_sched_held+0x57/0xa0
[  231.745062]  ? gen6_read32+0x2ba/0x320 [i915]
[  231.745173]  ? gen6_read16+0x320/0x320 [i915]
[  231.745284]  ? intel_engine_get_active_head+0x91/0x170 [i915]
[  231.745401]  i915_hangcheck_elapsed+0x3d8/0x400 [i915]
[  231.745424]  process_one_work+0x3e8/0xac0
[  231.745444]  ? pwq_dec_nr_in_flight+0x110/0x110
[  231.745464]  ? do_raw_spin_lock+0x8e/0x120
[  231.745484]  worker_thread+0x8d/0x720
[  231.745506]  kthread+0x19e/0x1f0
[  231.745524]  ? process_one_work+0xac0/0xac0
[  231.745541]  ? kthread_create_on_node+0xa0/0xa0
[  231.745560]  ret_from_fork+0x27/0x40
[  231.745581] Code: 8b 7d c8 e8 49 0d 02 e1 49 8b 7f 38 48 8b 75 b8 48 83 c7 10 e8 b8 89 be e1 e9 95 fc ff ff 4c 89 e7 e8 4b b9 ff ff e9 30 ff ff ff <0f> 0b 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe
[  231.745767] RIP: i915_gem_reset_engine+0x45a/0x460 [i915] RSP: ffff8801f42ff770

At first glance this looks to be related to commit c64992e
("drm/i915: Look for active requests earlier in the reset path"), but it
could easily happen before as well. On the other hand, we no longer
logged victims due to the active_request being dropped earlier.

v2: Be trickier to unwind the incomplete request as we cannot rely on
request retirement for the lockless per-engine reset.
v3: Reprobe the active request at the time of the reset.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: c64992e ("drm/i915: Look for active requests earlier in the reset path")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-15-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
amery pushed a commit that referenced this issue Nov 9, 2018
Increase kasan instrumented kernel stack size from 32k to 64k. Other
architectures seems to get away with just doubling kernel stack size under
kasan, but on s390 this appears to be not enough due to bigger frame size.
The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE
vs CONFIG_KASAN_OUTLINE). With inlined checks one particular case hitting
stack overflow is fs sync on xfs filesystem:

 #0 [9a0681e8]  704 bytes  check_usage at 34b1fc
 #1 [9a0684a8]  432 bytes  check_usage at 34c710
 #2 [9a068658]  1048 bytes  validate_chain at 35044a
 #3 [9a068a70]  312 bytes  __lock_acquire at 3559fe
 #4 [9a068ba8]  440 bytes  lock_acquire at 3576ee
 #5 [9a068d60]  104 bytes  _raw_spin_lock at 21b44e0
 #6 [9a068dc8]  1992 bytes  enqueue_entity at 2dbf72
 #7 [9a069590]  1496 bytes  enqueue_task_fair at 2df5f0
 #8 [9a069b68]  64 bytes  ttwu_do_activate at 28f438
 #9 [9a069ba8]  552 bytes  try_to_wake_up at 298c4c
 #10 [9a069dd0]  168 bytes  wake_up_worker at 23f97c
 #11 [9a069e78]  200 bytes  insert_work at 23fc2e
 #12 [9a069f40]  648 bytes  __queue_work at 2487c0
 #13 [9a06a1c8]  200 bytes  __queue_delayed_work at 24db28
 #14 [9a06a290]  248 bytes  mod_delayed_work_on at 24de84
 #15 [9a06a388]  24 bytes  kblockd_mod_delayed_work_on at 153e2a0
 #16 [9a06a3a0]  288 bytes  __blk_mq_delay_run_hw_queue at 158168c
 #17 [9a06a4c0]  192 bytes  blk_mq_run_hw_queue at 1581a3c
 #18 [9a06a580]  184 bytes  blk_mq_sched_insert_requests at 15a2192
 #19 [9a06a638]  1024 bytes  blk_mq_flush_plug_list at 1590f3a
 #20 [9a06aa38]  704 bytes  blk_flush_plug_list at 1555028
 #21 [9a06acf8]  320 bytes  schedule at 219e476
 #22 [9a06ae38]  760 bytes  schedule_timeout at 21b0aac
 #23 [9a06b130]  408 bytes  wait_for_common at 21a1706
 #24 [9a06b2c8]  360 bytes  xfs_buf_iowait at fa1540
 #25 [9a06b430]  256 bytes  __xfs_buf_submit at fadae6
 #26 [9a06b530]  264 bytes  xfs_buf_read_map at fae3f6
 #27 [9a06b638]  656 bytes  xfs_trans_read_buf_map at 10ac9a8
 #28 [9a06b8c8]  304 bytes  xfs_btree_kill_root at e72426
 #29 [9a06b9f8]  288 bytes  xfs_btree_lookup_get_block at e7bc5e
 #30 [9a06bb18]  624 bytes  xfs_btree_lookup at e7e1a6
 #31 [9a06bd88]  2664 bytes  xfs_alloc_ag_vextent_near at dfa070
 #32 [9a06c7f0]  144 bytes  xfs_alloc_ag_vextent at dff3ca
 #33 [9a06c880]  1128 bytes  xfs_alloc_vextent at e05fce
 #34 [9a06cce8]  584 bytes  xfs_bmap_btalloc at e58342
 #35 [9a06cf30]  1336 bytes  xfs_bmapi_write at e618de
 #36 [9a06d468]  776 bytes  xfs_iomap_write_allocate at ff678e
 #37 [9a06d770]  720 bytes  xfs_map_blocks at f82af8
 #38 [9a06da40]  928 bytes  xfs_writepage_map at f83cd6
 #39 [9a06dde0]  320 bytes  xfs_do_writepage at f85872
 #40 [9a06df20]  1320 bytes  write_cache_pages at 73dfe8
 #41 [9a06e448]  208 bytes  xfs_vm_writepages at f7f892
 #42 [9a06e518]  88 bytes  do_writepages at 73fe6a
 #43 [9a06e570]  872 bytes  __writeback_single_inode at a20cb6
 #44 [9a06e8d8]  664 bytes  writeback_sb_inodes at a23be2
 #45 [9a06eb70]  296 bytes  __writeback_inodes_wb at a242e0
 #46 [9a06ec98]  928 bytes  wb_writeback at a2500e
 #47 [9a06f038]  848 bytes  wb_do_writeback at a260ae
 #48 [9a06f388]  536 bytes  wb_workfn at a28228
 #49 [9a06f5a0]  1088 bytes  process_one_work at 24a234
 #50 [9a06f9e0]  1120 bytes  worker_thread at 24ba26
 #51 [9a06fe40]  104 bytes  kthread at 26545a
 #52 [9a06fea8]             kernel_thread_starter at 21b6b62

To be able to increase the stack size to 64k reuse LLILL instruction
in __switch_to function to load 64k - STACK_FRAME_OVERHEAD - __PT_SIZE
(65192) value as unsigned.

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this issue Mar 7, 2021
commit 64b9cea upstream.

Syzkaller was able to trigger the following splat again:

WARNING: CPU: 1 PID: 12512 at net/mptcp/protocol.c:761 mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761
Modules linked in:
CPU: 1 PID: 12512 Comm: kworker/1:6 Not tainted 5.10.0-rc6 linux-sunxi#52
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:mptcp_reset_timer+0x12a/0x160 net/mptcp/protocol.c:761
Code: e8 4b 0c ad ff e8 56 21 88 fe 48 b8 00 00 00 00 00 fc ff df 48 c7 04 03 00 00 00 00 48 83 c4 40 5b 5d 41 5c c3 e8 36 21 88 fe <0f> 0b 41 bc c8 00 00 00 eb 98 e8 e7 b1 af fe e9 30 ff ff ff 48 c7
RSP: 0018:ffffc900018c7c68 EFLAGS: 00010293
RAX: ffff888108cb1c80 RBX: 1ffff92000318f8d RCX: ffffffff82ad0307
RDX: 0000000000000000 RSI: ffffffff82ad036a RDI: 0000000000000007
RBP: ffff888113e2d000 R08: ffff888108cb1c80 R09: ffffed10227c5ab7
R10: ffff888113e2d5b7 R11: ffffed10227c5ab6 R12: 0000000000000000
R13: ffff88801f100000 R14: ffff888113e2d5b0 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88811b500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd76a874ef8 CR3: 000000001689c005 CR4: 0000000000170ee0
Call Trace:
 mptcp_worker+0xaa4/0x1560 net/mptcp/protocol.c:2334
 process_one_work+0x8d3/0x1200 kernel/workqueue.c:2272
 worker_thread+0x9c/0x1090 kernel/workqueue.c:2418
 kthread+0x303/0x410 kernel/kthread.c:292
 ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296

The mptcp_worker tries to update the MPTCP retransmission timer
even if such timer is not currently scheduled.

The mptcp_rtx_head() return value is bogus: we can have enqueued
data not yet transmitted. The above may additionally cause spurious,
unneeded MPTCP-level retransmissions.

Fix the issue adding an explicit clearing of the rtx queue before
trying to retransmit and checking for unacked data.
Additionally drop an unneeded timer stop call and the unused
mptcp_rtx_tail() helper.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Fixes: 6e628cd ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this issue May 22, 2021
[ Upstream commit d5027ca ]

Ritesh reported a bug [1] against UML, noting that it crashed on
startup. The backtrace shows the following (heavily redacted):

(gdb) bt
...
 linux-sunxi#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
 linux-sunxi#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
 linux-sunxi#28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72
...
 linux-sunxi#40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359
...
 linux-sunxi#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486
 linux-sunxi#45 0x00007f8990968b85 in __getgrnam_r [...]
 linux-sunxi#46 0x00007f89909d6b77 in grantpt [...]
 linux-sunxi#47 0x00007f8990a9394e in __GI_openpty [...]
 linux-sunxi#48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407
 linux-sunxi#49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598
 linux-sunxi#50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45
 linux-sunxi#51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334
 linux-sunxi#52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144

indicating that the UML function openpty_cb() calls openpty(),
which internally calls __getgrnam_r(), which causes the nsswitch
machinery to get started.

This loads, through lots of indirection that I snipped, the
libcom_err.so.2 library, which (in an unknown function, "??")
calls sem_init().

Now, of course it wants to get libpthread's sem_init(), since
it's linked against libpthread. However, the dynamic linker
looks up that symbol against the binary first, and gets the
kernel's sem_init().

Hajime Tazaki noted that "objcopy -L" can localize a symbol,
so the dynamic linker wouldn't do the lookup this way. I tried,
but for some reason that didn't seem to work.

Doing the same thing in the linker script instead does seem to
work, though I cannot entirely explain - it *also* works if I
just add "VERSION { { global: *; }; }" instead, indicating that
something else is happening that I don't really understand. It
may be that explicitly doing that marks them with some kind of
empty version, and that's different from the default.

Explicitly marking them with a version breaks kallsyms, so that
doesn't seem to be possible.

Marking all the symbols as local seems correct, and does seem
to address the issue, so do that. Also do it for static link,
nsswitch libraries could still be loaded there.

[1] https://bugs.debian.org/983379

Reported-by: Ritesh Raj Sarraf <rrs@debian.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Tested-By: Ritesh Raj Sarraf <rrs@debian.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this issue May 24, 2021
[ Upstream commit d5027ca ]

Ritesh reported a bug [1] against UML, noting that it crashed on
startup. The backtrace shows the following (heavily redacted):

(gdb) bt
...
 linux-sunxi#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
 linux-sunxi#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
 linux-sunxi#28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72
...
 linux-sunxi#40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359
...
 linux-sunxi#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486
 linux-sunxi#45 0x00007f8990968b85 in __getgrnam_r [...]
 linux-sunxi#46 0x00007f89909d6b77 in grantpt [...]
 linux-sunxi#47 0x00007f8990a9394e in __GI_openpty [...]
 linux-sunxi#48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407
 linux-sunxi#49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598
 linux-sunxi#50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45
 linux-sunxi#51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334
 linux-sunxi#52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144

indicating that the UML function openpty_cb() calls openpty(),
which internally calls __getgrnam_r(), which causes the nsswitch
machinery to get started.

This loads, through lots of indirection that I snipped, the
libcom_err.so.2 library, which (in an unknown function, "??")
calls sem_init().

Now, of course it wants to get libpthread's sem_init(), since
it's linked against libpthread. However, the dynamic linker
looks up that symbol against the binary first, and gets the
kernel's sem_init().

Hajime Tazaki noted that "objcopy -L" can localize a symbol,
so the dynamic linker wouldn't do the lookup this way. I tried,
but for some reason that didn't seem to work.

Doing the same thing in the linker script instead does seem to
work, though I cannot entirely explain - it *also* works if I
just add "VERSION { { global: *; }; }" instead, indicating that
something else is happening that I don't really understand. It
may be that explicitly doing that marks them with some kind of
empty version, and that's different from the default.

Explicitly marking them with a version breaks kallsyms, so that
doesn't seem to be possible.

Marking all the symbols as local seems correct, and does seem
to address the issue, so do that. Also do it for static link,
nsswitch libraries could still be loaded there.

[1] https://bugs.debian.org/983379

Reported-by: Ritesh Raj Sarraf <rrs@debian.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Tested-By: Ritesh Raj Sarraf <rrs@debian.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
jwrdegoede pushed a commit to jwrdegoede/linux-sunxi that referenced this issue Jan 19, 2023
The inline assembly for arm64's cmpxchg_double*() implementations use a
+Q constraint to hazard against other accesses to the memory location
being exchanged. However, the pointer passed to the constraint is a
pointer to unsigned long, and thus the hazard only applies to the first
8 bytes of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, leading to a number of potential problems.

This is similar to what we fixed back in commit:

  fee960b ("arm64: xchg: hazard against entire exchange variable")

... but we forgot to adjust cmpxchg_double*() similarly at the same
time.

The same problem applies, as demonstrated with the following test:

| struct big {
|         u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
|         u64 hi_old, hi_new;
|
|         hi_old = b->hi;
|         cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
|         hi_new = b->hi;
|
|         return hi_old ^ hi_new;
| }

... which GCC 12.1.0 compiles as:

| 0000000000000000 <foo>:
|    0:   d503233f        paciasp
|    4:   aa0003e4        mov     x4, x0
|    8:   1400000e        b       40 <foo+0x40>
|    c:   d2800240        mov     x0, #0x12                       // linux-sunxi#18
|   10:   d2800681        mov     x1, #0x34                       // linux-sunxi#52
|   14:   aa0003e5        mov     x5, x0
|   18:   aa0103e6        mov     x6, x1
|   1c:   d2800ac2        mov     x2, #0x56                       // linux-sunxi#86
|   20:   d2800f03        mov     x3, #0x78                       // linux-sunxi#120
|   24:   48207c82        casp    x0, x1, x2, x3, [x4]
|   28:   ca050000        eor     x0, x0, x5
|   2c:   ca060021        eor     x1, x1, x6
|   30:   aa010000        orr     x0, x0, x1
|   34:   d2800000        mov     x0, #0x0                        // #0    <--- BANG
|   38:   d50323bf        autiasp
|   3c:   d65f03c0        ret
|   40:   d2800240        mov     x0, #0x12                       // linux-sunxi#18
|   44:   d2800681        mov     x1, #0x34                       // linux-sunxi#52
|   48:   d2800ac2        mov     x2, #0x56                       // linux-sunxi#86
|   4c:   d2800f03        mov     x3, #0x78                       // linux-sunxi#120
|   50:   f9800091        prfm    pstl1strm, [x4]
|   54:   c87f1885        ldxp    x5, x6, [x4]
|   58:   ca0000a5        eor     x5, x5, x0
|   5c:   ca0100c6        eor     x6, x6, x1
|   60:   aa0600a6        orr     x6, x5, x6
|   64:   b5000066        cbnz    x6, 70 <foo+0x70>
|   68:   c8250c82        stxp    w5, x2, x3, [x4]
|   6c:   35ffff45        cbnz    w5, 54 <foo+0x54>
|   70:   d2800000        mov     x0, #0x0                        // #0     <--- BANG
|   74:   d50323bf        autiasp
|   78:   d65f03c0        ret

Notice that at the lines with "BANG" comments, GCC has assumed that the
higher 8 bytes are unchanged by the cmpxchg_double() call, and that
`hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
LL/SC versions of cmpxchg_double().

This patch fixes the issue by passing a pointer to __uint128_t into the
+Q constraint, ensuring that the compiler hazards against the entire 16
bytes being modified.

With this change, GCC 12.1.0 compiles the above test as:

| 0000000000000000 <foo>:
|    0:   f9400407        ldr     x7, [x0, linux-sunxi#8]
|    4:   d503233f        paciasp
|    8:   aa0003e4        mov     x4, x0
|    c:   1400000f        b       48 <foo+0x48>
|   10:   d2800240        mov     x0, #0x12                       // linux-sunxi#18
|   14:   d2800681        mov     x1, #0x34                       // linux-sunxi#52
|   18:   aa0003e5        mov     x5, x0
|   1c:   aa0103e6        mov     x6, x1
|   20:   d2800ac2        mov     x2, #0x56                       // linux-sunxi#86
|   24:   d2800f03        mov     x3, #0x78                       // linux-sunxi#120
|   28:   48207c82        casp    x0, x1, x2, x3, [x4]
|   2c:   ca050000        eor     x0, x0, x5
|   30:   ca060021        eor     x1, x1, x6
|   34:   aa010000        orr     x0, x0, x1
|   38:   f9400480        ldr     x0, [x4, linux-sunxi#8]
|   3c:   d50323bf        autiasp
|   40:   ca0000e0        eor     x0, x7, x0
|   44:   d65f03c0        ret
|   48:   d2800240        mov     x0, #0x12                       // linux-sunxi#18
|   4c:   d2800681        mov     x1, #0x34                       // linux-sunxi#52
|   50:   d2800ac2        mov     x2, #0x56                       // linux-sunxi#86
|   54:   d2800f03        mov     x3, #0x78                       // linux-sunxi#120
|   58:   f9800091        prfm    pstl1strm, [x4]
|   5c:   c87f1885        ldxp    x5, x6, [x4]
|   60:   ca0000a5        eor     x5, x5, x0
|   64:   ca0100c6        eor     x6, x6, x1
|   68:   aa0600a6        orr     x6, x5, x6
|   6c:   b5000066        cbnz    x6, 78 <foo+0x78>
|   70:   c8250c82        stxp    w5, x2, x3, [x4]
|   74:   35ffff45        cbnz    w5, 5c <foo+0x5c>
|   78:   f9400480        ldr     x0, [x4, linux-sunxi#8]
|   7c:   d50323bf        autiasp
|   80:   ca0000e0        eor     x0, x7, x0
|   84:   d65f03c0        ret

... sampling the high 8 bytes before and after the cmpxchg, and
performing an EOR, as we'd expect.

For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
that linux-4.9.y is oldest currently supported stable release, and
mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
on my machines due to library incompatibilities.

I've also used a standalone test to check that we can use a __uint128_t
pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
3.9.1.

Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double")
Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
repojohnray pushed a commit to repojohnray/linux-sunxi-4.7.y that referenced this issue Jan 25, 2023
commit 031af50 upstream.

The inline assembly for arm64's cmpxchg_double*() implementations use a
+Q constraint to hazard against other accesses to the memory location
being exchanged. However, the pointer passed to the constraint is a
pointer to unsigned long, and thus the hazard only applies to the first
8 bytes of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, leading to a number of potential problems.

This is similar to what we fixed back in commit:

  fee960b ("arm64: xchg: hazard against entire exchange variable")

... but we forgot to adjust cmpxchg_double*() similarly at the same
time.

The same problem applies, as demonstrated with the following test:

| struct big {
|         u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
|         u64 hi_old, hi_new;
|
|         hi_old = b->hi;
|         cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
|         hi_new = b->hi;
|
|         return hi_old ^ hi_new;
| }

... which GCC 12.1.0 compiles as:

| 0000000000000000 <foo>:
|    0:   d503233f        paciasp
|    4:   aa0003e4        mov     x4, x0
|    8:   1400000e        b       40 <foo+0x40>
|    c:   d2800240        mov     x0, #0x12                       // linux-sunxi#18
|   10:   d2800681        mov     x1, #0x34                       // linux-sunxi#52
|   14:   aa0003e5        mov     x5, x0
|   18:   aa0103e6        mov     x6, x1
|   1c:   d2800ac2        mov     x2, #0x56                       // linux-sunxi#86
|   20:   d2800f03        mov     x3, #0x78                       // linux-sunxi#120
|   24:   48207c82        casp    x0, x1, x2, x3, [x4]
|   28:   ca050000        eor     x0, x0, x5
|   2c:   ca060021        eor     x1, x1, x6
|   30:   aa010000        orr     x0, x0, x1
|   34:   d2800000        mov     x0, #0x0                        // #0    <--- BANG
|   38:   d50323bf        autiasp
|   3c:   d65f03c0        ret
|   40:   d2800240        mov     x0, #0x12                       // linux-sunxi#18
|   44:   d2800681        mov     x1, #0x34                       // linux-sunxi#52
|   48:   d2800ac2        mov     x2, #0x56                       // linux-sunxi#86
|   4c:   d2800f03        mov     x3, #0x78                       // linux-sunxi#120
|   50:   f9800091        prfm    pstl1strm, [x4]
|   54:   c87f1885        ldxp    x5, x6, [x4]
|   58:   ca0000a5        eor     x5, x5, x0
|   5c:   ca0100c6        eor     x6, x6, x1
|   60:   aa0600a6        orr     x6, x5, x6
|   64:   b5000066        cbnz    x6, 70 <foo+0x70>
|   68:   c8250c82        stxp    w5, x2, x3, [x4]
|   6c:   35ffff45        cbnz    w5, 54 <foo+0x54>
|   70:   d2800000        mov     x0, #0x0                        // #0     <--- BANG
|   74:   d50323bf        autiasp
|   78:   d65f03c0        ret

Notice that at the lines with "BANG" comments, GCC has assumed that the
higher 8 bytes are unchanged by the cmpxchg_double() call, and that
`hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
LL/SC versions of cmpxchg_double().

This patch fixes the issue by passing a pointer to __uint128_t into the
+Q constraint, ensuring that the compiler hazards against the entire 16
bytes being modified.

With this change, GCC 12.1.0 compiles the above test as:

| 0000000000000000 <foo>:
|    0:   f9400407        ldr     x7, [x0, linux-sunxi#8]
|    4:   d503233f        paciasp
|    8:   aa0003e4        mov     x4, x0
|    c:   1400000f        b       48 <foo+0x48>
|   10:   d2800240        mov     x0, #0x12                       // linux-sunxi#18
|   14:   d2800681        mov     x1, #0x34                       // linux-sunxi#52
|   18:   aa0003e5        mov     x5, x0
|   1c:   aa0103e6        mov     x6, x1
|   20:   d2800ac2        mov     x2, #0x56                       // linux-sunxi#86
|   24:   d2800f03        mov     x3, #0x78                       // linux-sunxi#120
|   28:   48207c82        casp    x0, x1, x2, x3, [x4]
|   2c:   ca050000        eor     x0, x0, x5
|   30:   ca060021        eor     x1, x1, x6
|   34:   aa010000        orr     x0, x0, x1
|   38:   f9400480        ldr     x0, [x4, linux-sunxi#8]
|   3c:   d50323bf        autiasp
|   40:   ca0000e0        eor     x0, x7, x0
|   44:   d65f03c0        ret
|   48:   d2800240        mov     x0, #0x12                       // linux-sunxi#18
|   4c:   d2800681        mov     x1, #0x34                       // linux-sunxi#52
|   50:   d2800ac2        mov     x2, #0x56                       // linux-sunxi#86
|   54:   d2800f03        mov     x3, #0x78                       // linux-sunxi#120
|   58:   f9800091        prfm    pstl1strm, [x4]
|   5c:   c87f1885        ldxp    x5, x6, [x4]
|   60:   ca0000a5        eor     x5, x5, x0
|   64:   ca0100c6        eor     x6, x6, x1
|   68:   aa0600a6        orr     x6, x5, x6
|   6c:   b5000066        cbnz    x6, 78 <foo+0x78>
|   70:   c8250c82        stxp    w5, x2, x3, [x4]
|   74:   35ffff45        cbnz    w5, 5c <foo+0x5c>
|   78:   f9400480        ldr     x0, [x4, linux-sunxi#8]
|   7c:   d50323bf        autiasp
|   80:   ca0000e0        eor     x0, x7, x0
|   84:   d65f03c0        ret

... sampling the high 8 bytes before and after the cmpxchg, and
performing an EOR, as we'd expect.

For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
that linux-4.9.y is oldest currently supported stable release, and
mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
on my machines due to library incompatibilities.

I've also used a standalone test to check that we can use a __uint128_t
pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
3.9.1.

Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double")
Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants