Skip to content
This repository has been archived by the owner on Oct 3, 2024. It is now read-only.

linux 4.19 regression: graphic artifacts, invalid tiling mode #61

Closed
tinywrkb opened this issue Nov 26, 2018 · 5 comments
Closed

linux 4.19 regression: graphic artifacts, invalid tiling mode #61

tinywrkb opened this issue Nov 26, 2018 · 5 comments

Comments

@tinywrkb
Copy link

I noticed this with linux 4.19.4, downgraded to linux 4.18.7 to confirm that this is a kernel regression.
Attached video demonstrates the graphical artifacts: gvt_regression.tar.gz

Error message in kernel:

gvt: vgpu 1: invalid tiling mode: 1

Tested with qemu 3.0.0, mesa 18.2.5.
CPU: E3-1265L v4

@TerrenceXu
Copy link

Thank you for your good finding, we will fix it ASAP.

zhenyw pushed a commit that referenced this issue Dec 3, 2018
Commit b244ffa ("drm/i915/gvt: Fix drm_format_mod value for vGPU
plane") introduced a regression issue to the tiled memory decoding on BDW.

This patch can fix this issue.

Here is the issue detail: #61

v1->v2:
- Refine the commit message. (Zhenyu)

Fixes: b244ffa("drm/i915/gvt: Fix drm_format_mod value for vGPU plane")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
@TerrenceXu
Copy link

@tinywrkbee , can you try the Commit b244ffa for a try? Thanks!

@tinywrkb
Copy link
Author

tinywrkb commented Dec 3, 2018

Seems to be resolved. Tested with a build of gvt-staging-2018y-12m-03d-17h-06m-10s tag (4.20rc4), didn't try to patch a stable 4.19 release.

Guests tested: Windows 10 RS5 and Linux 4.19 based desktop.

Thanks @TerrenceXu

@tinywrkb tinywrkb closed this as completed Dec 3, 2018
@tinywrkb
Copy link
Author

tinywrkb commented Dec 3, 2018

I had some time to test the official 4.19.4 release with patch applied on top and it looks fine.

@TerrenceXu
Copy link

OK, thank you for your testing @tinywrkbee !

m-hennecke pushed a commit to m-hennecke/linux that referenced this issue Dec 16, 2018
Commit b244ffa ("drm/i915/gvt: Fix drm_format_mod value for vGPU
plane") introduced a regression issue to the tiled memory decoding on BDW.

This patch can fix this issue.

Here is the issue detail: intel/gvt-linux#61

v1->v2:
- Refine the commit message. (Zhenyu)

Fixes: b244ffa("drm/i915/gvt: Fix drm_format_mod value for vGPU plane")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
woodsts pushed a commit to woodsts/linux-stable that referenced this issue Dec 19, 2018
commit a40fa23 upstream.

Commit b244ffa ("drm/i915/gvt: Fix drm_format_mod value for vGPU
plane") introduced a regression issue to the tiled memory decoding on BDW.

This patch can fix this issue.

Here is the issue detail: intel/gvt-linux#61

v1->v2:
- Refine the commit message. (Zhenyu)

Fixes: b244ffa("drm/i915/gvt: Fix drm_format_mod value for vGPU plane")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
isjerryxiao pushed a commit to isjerryxiao/Amlogic_s905-kernel that referenced this issue Dec 22, 2018
commit a40fa23 upstream.

Commit b244ffa ("drm/i915/gvt: Fix drm_format_mod value for vGPU
plane") introduced a regression issue to the tiled memory decoding on BDW.

This patch can fix this issue.

Here is the issue detail: intel/gvt-linux#61

v1->v2:
- Refine the commit message. (Zhenyu)

Fixes: b244ffa("drm/i915/gvt: Fix drm_format_mod value for vGPU plane")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
johalun pushed a commit to FreeBSDDesktop/kms-drm that referenced this issue Jan 3, 2019
Commit b244ffa15c8b ("drm/i915/gvt: Fix drm_format_mod value for vGPU
plane") introduced a regression issue to the tiled memory decoding on BDW.

This patch can fix this issue.

Here is the issue detail: intel/gvt-linux#61

v1->v2:
- Refine the commit message. (Zhenyu)

Fixes: b244ffa15c8b("drm/i915/gvt: Fix drm_format_mod value for vGPU plane")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
zhenyw pushed a commit that referenced this issue Jun 17, 2019
When calling kmalloc with GFP_KERNEL in case CONFIG_SLOB is unset,
kmem_cache_alloc_trace is called.

In case CONFIG_TRACING is set, kmem_cache_alloc_trace will ball
slab_alloc, which will call slab_pre_alloc_hook which might_sleep_if.

The context in which it is called in this case, the
intel_sst_interrupt_mrfld, calling a sleeping kmalloc generates a BUG():

Fixes: 972b0d4 ("ASoC: Intel: remove GFP_ATOMIC, use GFP_KERNEL")

[   20.250671] BUG: sleeping function called from invalid context at mm/slab.h:422
[   20.250683] in_atomic(): 1, irqs_disabled(): 1, pid: 1791, name: Chrome_IOThread
[   20.250690] CPU: 0 PID: 1791 Comm: Chrome_IOThread Tainted: G        W         4.19.43 #61
[   20.250693] Hardware name: GOOGLE Kefka, BIOS Google_Kefka.7287.337.0 03/02/2017
[   20.250697] Call Trace:
[   20.250704]  <IRQ>
[   20.250716]  dump_stack+0x7e/0xc3
[   20.250725]  ___might_sleep+0x12a/0x140
[   20.250731]  kmem_cache_alloc_trace+0x53/0x1c5
[   20.250736]  ? update_cfs_rq_load_avg+0x17e/0x1aa
[   20.250740]  ? cpu_load_update+0x6c/0xc2
[   20.250746]  sst_create_ipc_msg+0x2d/0x88
[   20.250752]  intel_sst_interrupt_mrfld+0x12a/0x22c
[   20.250758]  __handle_irq_event_percpu+0x133/0x228
[   20.250764]  handle_irq_event_percpu+0x35/0x7a
[   20.250768]  handle_irq_event+0x36/0x55
[   20.250773]  handle_fasteoi_irq+0xab/0x16c
[   20.250779]  handle_irq+0xd9/0x11e
[   20.250785]  do_IRQ+0x54/0xe0
[   20.250791]  common_interrupt+0xf/0xf
[   20.250795]  </IRQ>
[   20.250800] RIP: 0010:__lru_cache_add+0x4e/0xad
[   20.250806] Code: 00 01 48 c7 c7 b8 df 01 00 65 48 03 3c 25 28 f1 00 00 48 8b 48 08 48 89 ca 48 ff ca f6 c1 01 48 0f 44 d0 f0 ff 42 34 0f b6 0f <89> ca fe c2 88 17 48 89 44 cf 08 80 fa 0f 74 0e 48 8b 08 66 85 c9
[   20.250809] RSP: 0000:ffffa568810bfd98 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffd6
[   20.250814] RAX: ffffd3b904eb1940 RBX: ffffd3b904eb1940 RCX: 0000000000000004
[   20.250817] RDX: ffffd3b904eb1940 RSI: ffffa10ee5c47450 RDI: ffffa10efba1dfb8
[   20.250821] RBP: ffffa568810bfda8 R08: ffffa10ef9c741c1 R09: dead000000000100
[   20.250824] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa10ee8d52a40
[   20.250827] R13: ffffa10ee8d52000 R14: ffffa10ee5c47450 R15: 800000013ac65067
[   20.250835]  lru_cache_add_active_or_unevictable+0x4e/0xb8
[   20.250841]  handle_mm_fault+0xd98/0x10c4
[   20.250848]  __do_page_fault+0x235/0x42d
[   20.250853]  ? page_fault+0x8/0x30
[   20.250858]  do_page_fault+0x3d/0x17a
[   20.250862]  ? page_fault+0x8/0x30
[   20.250866]  page_fault+0x1e/0x30
[   20.250872] RIP: 0033:0x7962fdea9304
[   20.250875] Code: 0f 11 4c 17 f0 c3 48 3b 15 f1 26 31 00 0f 83 e2 00 00 00 48 39 f7 72 0f 74 12 4c 8d 0c 16 4c 39 cf 0f 82 63 01 00 00 48 89 d1 <f3> a4 c3 80 fa 08 73 12 80 fa 04 73 1e 80 fa 01 77 26 72 05 0f b6
[   20.250879] RSP: 002b:00007962f4db5468 EFLAGS: 00010206
[   20.250883] RAX: 00003c8cc9d47008 RBX: 0000000000000000 RCX: 0000000000001b48
[   20.250886] RDX: 0000000000002b40 RSI: 00003c8cc9551000 RDI: 00003c8cc9d48000
[   20.250890] RBP: 00007962f4db5820 R08: 0000000000000000 R09: 00003c8cc9552b48
[   20.250893] R10: 0000562dd1064d30 R11: 00003c8cc825b908 R12: 00003c8cc966d3c0
[   20.250896] R13: 00003c8cc9e280c0 R14: 0000000000000000 R15: 0000000000000000

Signed-off-by: Alex Levin <levinale@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
zhenyw pushed a commit that referenced this issue Jan 20, 2020
…y section

When we remove an early section, we don't free the usage map, as the
usage maps of other sections are placed into the same page.  Once the
section is removed, it is no longer an early section (especially, the
memmap is freed).  When we re-add that section, the usage map is reused,
however, it is no longer an early section.  When removing that section
again, we try to kfree() a usage map that was allocated during early
boot - bad.

Let's check against PageReserved() to see if we are dealing with an
usage map that was allocated during boot.  We could also check against
!(PageSlab(usage_page) || PageCompound(usage_page)), but PageReserved() is
cleaner.

Can be triggered using memtrace under ppc64/powernv:

  $ mount -t debugfs none /sys/kernel/debug/
  $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
  $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
   ------------[ cut here ]------------
   kernel BUG at mm/slub.c:3969!
   Oops: Exception in kernel mode, sig: 5 [#1]
   LE PAGE_SIZE=3D64K MMU=3DHash SMP NR_CPUS=3D2048 NUMA PowerNV
   Modules linked in:
   CPU: 0 PID: 154 Comm: sh Not tainted 5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0 #61
   NIP kfree+0x338/0x3b0
   LR section_deactivate+0x138/0x200
   Call Trace:
     section_deactivate+0x138/0x200
     __remove_pages+0x114/0x150
     arch_remove_memory+0x3c/0x160
     try_remove_memory+0x114/0x1a0
     __remove_memory+0x20/0x40
     memtrace_enable_set+0x254/0x850
     simple_attr_write+0x138/0x160
     full_proxy_write+0x8c/0x110
     __vfs_write+0x38/0x70
     vfs_write+0x11c/0x2a0
     ksys_write+0x84/0x140
     system_call+0x5c/0x68
   ---[ end trace 4b053cbd84e0db62 ]---

The first invocation will offline+remove memory blocks.  The second
invocation will first add+online them again, in order to offline+remove
them again (usually we are lucky and the exact same memory blocks will
get "reallocated").

Tested on powernv with boot memory: The usage map will not get freed.
Tested on x86-64 with DIMMs: The usage map will get freed.

Using Dynamic Memory under a Power DLAPR can trigger it easily.

Triggering removal (I assume after previously removed+re-added) of
memory from the HMC GUI can crash the kernel with the same call trace
and is fixed by this patch.

Link: http://lkml.kernel.org/r/20191217104637.5509-1-david@redhat.com
Fixes: 326e1b8 ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Pingfan Liu <piliu@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhenyw pushed a commit that referenced this issue Jul 29, 2020
The "inline" keyword is a hint for the compiler to inline a function.  The
functions system_uses_irq_prio_masking() and gic_write_pmr() are used by
the code running at EL2 on a non-VHE system, so mark them as
__always_inline to make sure they'll always be part of the .hyp.text
section.

This fixes the following splat when trying to run a VM:

[   47.625273] Kernel panic - not syncing: HYP panic:
[   47.625273] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006
[   47.625273] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000
[   47.625273] VCPU:0000000000000000
[   47.647261] CPU: 1 PID: 217 Comm: kvm-vcpu-0 Not tainted 5.8.0-rc1-ARCH+ #61
[   47.654508] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[   47.661139] Call trace:
[   47.663659]  dump_backtrace+0x0/0x1cc
[   47.667413]  show_stack+0x18/0x24
[   47.670822]  dump_stack+0xb8/0x108
[   47.674312]  panic+0x124/0x2f4
[   47.677446]  panic+0x0/0x2f4
[   47.680407] SMP: stopping secondary CPUs
[   47.684439] Kernel Offset: disabled
[   47.688018] CPU features: 0x240402,20002008
[   47.692318] Memory Limit: none
[   47.695465] ---[ end Kernel panic - not syncing: HYP panic:
[   47.695465] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006
[   47.695465] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000
[   47.695465] VCPU:0000000000000000 ]---

The instruction abort was caused by the code running at EL2 trying to fetch
an instruction which wasn't mapped in the EL2 translation tables. Using
objdump showed the two functions as separate symbols in the .text section.

Fixes: 85738e0 ("arm64: kvm: Unmask PMR before entering guest")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200618171254.1596055-1-alexandru.elisei@arm.com
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants