Machine freeze #109

markfoodyburton · 2015-10-04T09:53:41Z

I am running Debian Squeeze with the 4.2.0-1 kernel on an Asrock N3150M, with a TechniSat SkyStar 2 eXpress HD.
I have followed the instructions on http://www.linuxtv.org/wiki/index.php/TBS6280 to build and install the modules, which are correctly loading.

02:00.0 Multimedia controller: Philips Semiconductors SAA7160 (rev 03)
Subsystem: Device 1ae4:0700
Flags: bus master, fast devsel, latency 0, IRQ 18
Memory at 91300000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [40] MSI: Enable- Count=1/32 Maskable- 64bit+
Capabilities: [50] Express Endpoint, MSI 00
Capabilities: [74] Power Management version 2
Capabilities: [80] Vendor Specific Information: Len=50 Capabilities: [100] Vendor Specific Information: ID=0000 Rev=0 Len=088
Kernel driver in use: SAA716x Budget

However, once the card is being used (e.g. tvheadend is scanning for channels), after a relatively short period of time (5-10 minutes), the machine 'freezes', and my only choice is the Big Red Button (it's small and greenish, but thats a super micro-case for you :-)).

Adding verbose=2 gives me the following log message:
saa716x_i2c_send (0): FIFO full or Blocked

Cheers
Mark.

markfoodyburton · 2015-10-05T10:55:25Z

By getting the driver to take an exit on the first try, rather than trying 100 times, then I can get some debug information out. Shortly after this is printed, the machine hangs, and it's BRB...

saa716x_i2c_send (0): FIFO full or Blocked
saa716x_i2c_send (0): I2C Send failed (Err=-5)
saa716x_i2c_write_msg (0): Address write failed
saa716x_i2c_write_msg (0): Error writing data, err=-5
saa716x_i2c_xfer (0): Error in Transfer, try 0
saa716x_i2c_xfer (0): msg 0, addr = 0x68, len=2, flags=0x0
saa716x_i2c_xfer (0): <W 0000> 0xf4
saa716x_i2c_xfer (0): <W 0001> 0x12
saa716x_i2c_xfer (0): msg 1, addr = 0x68, len=1, flags=0x1

ljalves · 2016-06-24T16:28:14Z

Closing old issue.
Please re-open if still have this issue or updates about it.

The func v4l2_m2m_ctx_release waits for currently running jobs to finish and then stop streaming both queues and frees the buffers. All this should be done before the call to mtk_vcodec_enc_release which frees the encoder handler. This fixes null-pointer dereference bug: [ 638.028076] Mem abort info: [ 638.030932] ESR = 0x96000004 [ 638.033978] EC = 0x25: DABT (current EL), IL = 32 bits [ 638.039293] SET = 0, FnV = 0 [ 638.042338] EA = 0, S1PTW = 0 [ 638.045474] FSC = 0x04: level 0 translation fault [ 638.050349] Data abort info: [ 638.053224] ISV = 0, ISS = 0x00000004 [ 638.057055] CM = 0, WnR = 0 [ 638.060018] user pgtable: 4k pages, 48-bit VAs, pgdp=000000012b6db000 [ 638.066485] [00000000000001a0] pgd=0000000000000000, p4d=0000000000000000 [ 638.073277] Internal error: Oops: 96000004 [ljalves#1] SMP [ 638.078145] Modules linked in: rfkill mtk_vcodec_dec mtk_vcodec_enc uvcvideo mtk_mdp mtk_vcodec_common videobuf2_dma_contig v4l2_h264 cdc_ether v4l2_mem2mem videobuf2_vmalloc usbnet videobuf2_memops videobuf2_v4l2 r8152 videobuf2_common videodev cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf elan_i2c elants_i2c sbs_battery mc cros_usbpd_charger cros_ec_chardev cros_usbpd_logger crct10dif_ce mtk_vpu fuse ip_tables x_tables ipv6 [ 638.118583] CPU: 0 PID: 212 Comm: kworker/u8:5 Not tainted 5.15.0-06427-g58a1d4dcfc74-dirty ljalves#109 [ 638.127357] Hardware name: Google Elm (DT) [ 638.131444] Workqueue: mtk-vcodec-enc mtk_venc_worker [mtk_vcodec_enc] [ 638.137974] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 638.144925] pc : vp8_enc_encode+0x34/0x2b0 [mtk_vcodec_enc] [ 638.150493] lr : venc_if_encode+0xac/0x1b0 [mtk_vcodec_enc] [ 638.156060] sp : ffff8000124d3c40 [ 638.159364] x29: ffff8000124d3c40 x28: 0000000000000000 x27: 0000000000000000 [ 638.166493] x26: 0000000000000000 x25: ffff0000e7f252d0 x24: ffff8000124d3d58 [ 638.173621] x23: ffff8000124d3d58 x22: ffff8000124d3d60 x21: 0000000000000001 [ 638.180750] x20: ffff80001137e000 x19: 0000000000000000 x18: 0000000000000001 [ 638.187878] x17: 000000040044ffff x16: 00400032b5503510 x15: 0000000000000000 [ 638.195006] x14: ffff8000118536c0 x13: ffff8000ee1da000 x12: 0000000030d4d91d [ 638.202134] x11: 0000000000000000 x10: 0000000000000980 x9 : ffff8000124d3b20 [ 638.209262] x8 : ffff0000c18d4ea0 x7 : ffff0000c18d44c0 x6 : ffff0000c18d44c0 [ 638.216391] x5 : ffff80000904a3b0 x4 : ffff8000124d3d58 x3 : ffff8000124d3d60 [ 638.223519] x2 : ffff8000124d3d78 x1 : 0000000000000001 x0 : ffff80001137efb8 [ 638.230648] Call trace: [ 638.233084] vp8_enc_encode+0x34/0x2b0 [mtk_vcodec_enc] [ 638.238304] venc_if_encode+0xac/0x1b0 [mtk_vcodec_enc] [ 638.243525] mtk_venc_worker+0x110/0x250 [mtk_vcodec_enc] [ 638.248918] process_one_work+0x1f8/0x498 [ 638.252923] worker_thread+0x140/0x538 [ 638.256664] kthread+0x148/0x158 [ 638.259884] ret_from_fork+0x10/0x20 [ 638.263455] Code: f90023f9 2a0103f5 aa0303f6 aa0403f8 (f940d277) [ 638.269538] ---[ end trace e374fc10f8e181f5 ]--- [gst-master] root@debian:~/gst-build# [ 638.019193] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001a0 Fixes: 4e855a6 ("[media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver") Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Reading EEPROM fails with following warning: [ 16.357496] ------------[ cut here ]------------ [ 16.357529] fsl_spi b01004c0.spi: rejecting DMA map of vmalloc memory [ 16.357698] WARNING: CPU: 0 PID: 371 at include/linux/dma-mapping.h:326 fsl_spi_cpm_bufs+0x2a0/0x2d8 [ 16.357775] CPU: 0 PID: 371 Comm: od Not tainted 5.16.11-s3k-dev-01743-g19beecbfe9d6-dirty ljalves#109 [ 16.357806] NIP: c03fbc9c LR: c03fbc9c CTR: 00000000 [ 16.357825] REGS: e68d9b20 TRAP: 0700 Not tainted (5.16.11-s3k-dev-01743-g19beecbfe9d6-dirty) [ 16.357849] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24002282 XER: 00000000 [ 16.357931] [ 16.357931] GPR00: c03fbc9c e68d9be0 c26d06a0 00000039 00000001 c0d36364 c0e96428 00000027 [ 16.357931] GPR08: 00000001 00000000 00000023 3fffc000 24002282 100d3dd6 100a2ffc 00000000 [ 16.357931] GPR16: 100cd280 100b0000 00000000 aff54f7e 100d0000 100d0000 00000001 100cf328 [ 16.357931] GPR24: 100cf328 00000000 00000003 e68d9e30 c156b410 e67ab4c0 e68d9d38 c24ab278 [ 16.358253] NIP [c03fbc9c] fsl_spi_cpm_bufs+0x2a0/0x2d8 [ 16.358292] LR [c03fbc9c] fsl_spi_cpm_bufs+0x2a0/0x2d8 [ 16.358325] Call Trace: [ 16.358336] [e68d9be0] [c03fbc9c] fsl_spi_cpm_bufs+0x2a0/0x2d8 (unreliable) [ 16.358388] [e68d9c00] [c03fcb44] fsl_spi_bufs.isra.0+0x94/0x1a0 [ 16.358436] [e68d9c20] [c03fd970] fsl_spi_do_one_msg+0x254/0x3dc [ 16.358483] [e68d9cb0] [c03f7e50] __spi_pump_messages+0x274/0x8a4 [ 16.358529] [e68d9ce0] [c03f9d30] __spi_sync+0x344/0x378 [ 16.358573] [e68d9d20] [c03fb52c] spi_sync+0x34/0x60 [ 16.358616] [e68d9d30] [c03b4dec] at25_ee_read+0x138/0x1a8 [ 16.358667] [e68d9e50] [c04a8fb8] bin_attr_nvmem_read+0x98/0x110 [ 16.358725] [e68d9e60] [c0204b14] kernfs_fop_read_iter+0xc0/0x1fc [ 16.358774] [e68d9e80] [c0168660] vfs_read+0x284/0x410 [ 16.358821] [e68d9f00] [c016925c] ksys_read+0x6c/0x11c [ 16.358863] [e68d9f30] [c00160e0] ret_from_syscall+0x0/0x28 ... [ 16.359608] ---[ end trace a4ce3e34afef0cb5 ]--- [ 16.359638] fsl_spi b01004c0.spi: unable to map tx dma This is due to the AT25 driver using buffers on stack, which is not possible with CONFIG_VMAP_STACK. As mentionned in kernel Documentation (Documentation/spi/spi-summary.rst): - Follow standard kernel rules, and provide DMA-safe buffers in your messages. That way controller drivers using DMA aren't forced to make extra copies unless the hardware requires it (e.g. working around hardware errata that force the use of bounce buffering). Modify the driver to use a buffer located in the at25 device structure which is allocated via kmalloc during probe. Protect writes in this new buffer with the driver's mutex. Fixes: b587b13 ("[PATCH] SPI eeprom driver") Cc: stable <stable@kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/230a9486fc68ea0182df46255e42a51099403642.1648032613.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The following crash happens for me when running the -mm selftests (below). Specifically, it happens while running the uffd-stress subtests: kernel BUG at mm/hugetlb.c:7249! invalid opcode: 0000 [ljalves#1] PREEMPT SMP NOPTI CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ ljalves#109 Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018 RIP: 0010:huge_pte_alloc+0x12c/0x1a0 ... Call Trace: <TASK> ? __die_body+0x63/0xb0 ? die+0x9f/0xc0 ? do_trap+0xab/0x180 ? huge_pte_alloc+0x12c/0x1a0 ? do_error_trap+0xc6/0x110 ? huge_pte_alloc+0x12c/0x1a0 ? handle_invalid_op+0x2c/0x40 ? huge_pte_alloc+0x12c/0x1a0 ? exc_invalid_op+0x33/0x50 ? asm_exc_invalid_op+0x16/0x20 ? __pfx_put_prev_task_idle+0x10/0x10 ? huge_pte_alloc+0x12c/0x1a0 hugetlb_fault+0x1a3/0x1120 ? finish_task_switch+0xb3/0x2a0 ? lock_is_held_type+0xdb/0x150 handle_mm_fault+0xb8a/0xd40 ? find_vma+0x5d/0xa0 do_user_addr_fault+0x257/0x5d0 exc_page_fault+0x7b/0x1f0 asm_exc_page_fault+0x22/0x30 That happens because a BUG() statement in huge_pte_alloc() attempts to check that a pte, if present, is a hugetlb pte, but it does so in a non-lockless-safe manner that leads to a false BUG() report. We got here due to a couple of bugs, each of which by itself was not quite enough to cause a problem: First of all, before commit c33c794("mm: ptep_get() conversion"), the BUG() statement in huge_pte_alloc() was itself fragile: it relied upon compiler behavior to only read the pte once, despite using it twice in the same conditional. Next, commit c33c794 ("mm: ptep_get() conversion") broke that delicate situation, by causing all direct pte reads to be done via READ_ONCE(). And so READ_ONCE() got called twice within the same BUG() conditional, leading to comparing (potentially, occasionally) different versions of the pte, and thus to false BUG() reports. Fix this by taking a single snapshot of the pte before using it in the BUG conditional. Now, that commit is only partially to blame here but, people doing bisections will invariably land there, so this will help them find a fix for a real crash. And also, the previous behavior was unlikely to ever expose this bug--it was fragile, yet not actually broken. So that's why I chose this commit for the Fixes tag, rather than the commit that created the original BUG() statement. Link: https://lkml.kernel.org/r/20230701010442.2041858-1-jhubbard@nvidia.com Fixes: c33c794 ("mm: ptep_get() conversion") Signed-off-by: John Hubbard <jhubbard@nvidia.com> Acked-by: James Houghton <jthoughton@google.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ljalves closed this as completed Jun 24, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Machine freeze #109

Machine freeze #109

markfoodyburton commented Oct 4, 2015

markfoodyburton commented Oct 5, 2015

ljalves commented Jun 24, 2016

Machine freeze #109

Machine freeze #109

Comments

markfoodyburton commented Oct 4, 2015

markfoodyburton commented Oct 5, 2015

ljalves commented Jun 24, 2016