Skip to content
This repository has been archived by the owner on Apr 13, 2019. It is now read-only.

build instructions slightly wrong? #1

Closed
tommythorn opened this issue Feb 26, 2014 · 1 comment
Closed

build instructions slightly wrong? #1

tommythorn opened this issue Feb 26, 2014 · 1 comment

Comments

@tommythorn
Copy link

[Reposted from an ucb-bar / riscv-tools issue]

Thank you so much! It's working brilliantly.

I had to make a few modifications to get it to work: first arch wants to be inside linux-3.4.76 so I did

mkdir linux-3.4.76 && mv arch linux-3.4.76

before the curl step. Also, building riscv-gcc with make linux fails for me, but the riscv-tools works great, so I instead just edited arch/riscv/configs/riscv_defconfig:CONFIG_CROSS_COMPILE to point to riscv- instead of riscv-linux-

jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 2f01a33 upstream.

The ehci_init_driver is used to initialize hcd APIs for each
ehci controller driver, it is designed to be called only one time
and before driver register is called. The current design will
cause ehci_init_driver is called multiple times at probe process,
it will cause hc_driver's initialization affect current running hcd.

We run out NULL pointer dereference problem when one hcd is started
by module_init, and the other is started by otg thread at SMP platform.
The reason for this problem is ehci_init_driver will do memory copy
for current uniform hc_driver, and this memory copy will do memset (as 0)
first, so when the first hcd is running usb_add_hcd, and the second
hcd may clear the uniform hc_driver's space (at ehci_init_driver),
then the first hcd will meet NULL pointer at the same time.

See below two logs:

LOG_1:
ci_hdrc ci_hdrc.0: EHCI Host Controller
ci_hdrc ci_hdrc.0: new USB bus registered, assigned bus number 1
ci_hdrc ci_hdrc.1: doesn't support gadget
Unable to handle kernel NULL pointer dereference at virtual address 00000014
pgd = 80004000
[00000014] *pgd=00000000
Internal error: Oops: 805 [riscvarchive#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 108 Comm: kworker/u8:2 Not tainted 3.14.38-222193-g24b2734-dirty riscvarchive#25
Workqueue: ci_otg ci_otg_work
task: d839ec00 ti: d8400000 task.ti: d8400000
PC is at ehci_run+0x4c/0x284
LR is at _raw_spin_unlock_irqrestore+0x28/0x54
pc : [<8041f9a0>]    lr : [<8070ea84>]    psr: 60000113
sp : d8401e30  ip : 00000000  fp : d8004400
r10: 00000001  r9 : 00000001  r8 : 00000000
r7 : 00000000  r6 : d8419940  r5 : 80dd24c0  r4 : d8419800
r3 : 8001d060  r2 : 00000000  r1 : 00000001  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c53c7d  Table: 1000404a  DAC: 00000015
Process kworker/u8:2 (pid: 108, stack limit = 0xd8400238)
Stack: (0xd8401e30 to 0xd8402000)
1e20:                                     d87523c0 d8401e48 66667562 d8419800
1e40: 00000000 00000000 d8419800 00000000 00000000 00000000 d84198b0 8040fcdc
1e60: 00000000 80dd320c d8477610 d8419c00 d803d010 d8419800 00000000 00000000
1e80: d8004400 00000000 d8400008 80431494 80431374 d803d100 d803d010 d803d1ac
1ea0: 00000000 80432428 804323d4 d803d100 00000001 80435eb8 80e0d0bc d803d100
1ec0: 00000006 80436458 00000000 d803d100 80e92ec8 80436f44 d803d010 d803d100
1ee0: d83fde00 8043292 d8752710 d803d1f4 d803d010 8042ddfc 8042ddb8 d83f3b00
1f00: d803d1f4 80042b60 00000000 00000003 00000001 00000001 80054598 d83f3b00
1f20: d8004400 d83f3b18 d8004414 d8400000 80e3957b 00000089 d8004400 80043814
1f40: d839ec00 00000000 d83fcd80 d83f3b00 800436e4 00000000 00000000 00000000
1f60: 00000000 80048f34 00000000 00000000 00000000 d83f3b00 00000000 00000000
1f80: d8401f80 d8401f80 00000000 00000000 d8401f90 d8401f90 d8401fac d83fcd80
1fa0: 80048e68 00000000 00000000 8000e538 00000000 00000000 00000000 00000000
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<8041f9a0>] (ehci_run) from [<8040fcdc>] (usb_add_hcd+0x248/0x6e8)
[<8040fcdc>] (usb_add_hcd) from [<80431494>] (host_start+0x120/0x2e4)
[<80431494>] (host_start) from [<80432428>] (ci_otg_start_host+0x54/0xbc)
[<80432428>] (ci_otg_start_host) from [<80435eb8>] (otg_set_protocol+0xa4/0xd0)
[<80435eb8>] (otg_set_protocol) from [<80436458>] (otg_set_state+0x574/0xc58)
[<80436458>] (otg_set_state) from [<80436f44>] (otg_statemachine+0x408/0x46c)
[<80436f44>] (otg_statemachine) from [<8043292c>] (ci_otg_fsm_work+0x3c/0x190)
[<8043292c>] (ci_otg_fsm_work) from [<8042ddfc>] (ci_otg_work+0x44/0x1c4)
[<8042ddfc>] (ci_otg_work) from [<80042b60>] (process_one_work+0xf4/0x35c)
[<80042b60>] (process_one_work) from [<80043814>] (worker_thread+0x130/0x3bc)
[<80043814>] (worker_thread) from [<80048f34>] (kthread+0xcc/0xe4)
[<80048f34>] (kthread) from [<8000e538>] (ret_from_fork+0x14/0x3c)
Code: e5953018 e3530000 0a000000 e12fff33 (e5878014)

LOG_2:
ci_hdrc ci_hdrc.0: EHCI Host Controller
ci_hdrc ci_hdrc.0: new USB bus registered, assigned bus number 1
ci_hdrc ci_hdrc.1: doesn't support gadget
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 80004000
[00000000] *pgd=00000000
In Online 00:00ternal e      Offline rror: Oops: 80000005 [riscvarchive#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 108 Comm: kworker/u8:2 Not tainted 3.14.38-02007-g24b2734-dirty riscvarchive#127
Workque Online 00:00ue: ci_o      Offline tg ci_otg_work
Online 00:00task: d8      Offline 39ec00 ti: d83ea000 task.ti: d83ea000
PC is at 0x0
LR is at usb_add_hcd+0x248/0x6e8
pc : [<00000000>]    lr : [<8040f644>]    psr: 60000113
sp : d83ebe60  ip : 00000000  fp : d8004400
r10: 00000001  r9 : 00000001  r8 : d85fd4b0
r7 : 00000000  r6 : 00000000  r5 : 00000000  r4 : d85fd400
r3 : 00000000  r2 : d85fd4f4  r1 : 80410178  r0 : d85fd400
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c53c7d  Table: 1000404a  DAC: 00000015
Process kworker/u8:2 (pid: 108, stack limit = 0xd83ea238)
Stack: (0xd83ebe60 to 0xd83ec000)
be60: 00000000 80dd920c d8654e10 d85fd800 d803e010 d85fd400 00000000 00000000
be80: d8004400 00000000 d83ea008 80430e34 80430d14 d803e100 d803e010 d803e1ac
bea0: 00000000 80431dc8 80431d74 d803e100 00000001 80435858 80e130bc d803e100
bec0: 00000006 80435df8 00000000 d803e100 80e98ec8 804368e4 d803e010 d803e100
bee0: d86e8100 804322cc d86cf050 d803e1f4 d803e010 8042d79c 8042d758 d83cf900
bf00: d803e1f4 80042b78 00000000 00000003 00000001 00000001 800545e8 d83cf900
bf20: d8004400 d83cf918 d8004414 d83ea000 80e3f57b 00000089 d8004400 8004382c
bf40: d839ec00 00000000 d8393780 d83cf900 800436fc 00000000 00000000 00000000
bf60: 00000000 80048f50 80e019f4 00000000 0000264c d83cf900 00000000 00000000
bf80: d83ebf80 d83ebf80 00000000 00000000 d83ebf90 d83ebf90 d83ebfac d8393780
bfa0: 80048e84 00000000 00000000 8000e538 00000000 00000000 00000000 00000000
bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ee66e85d 133ebd03
[<804 Online 00:000f644>]       Offline (usb_add_hcd) from [<80430e34>] (host_start+0x120/0x2e4)
[<80430e34>] (host_start) from [<80431dc8>] (ci_otg_start_host+0x54/0xbc)
[<80431dc8>] (ci_otg_start_host) from [<80435858>] (otg_set_protocol+0xa4/0xd0)
[<80435858>] (otg_set_protocol) from [<80435df8>] (otg_set_state+0x574/0xc58)
[<80435df8>] (otg_set_state) from [<804368e4>] (otg_statemachine+0x408/0x46c)
[<804368e4>] (otg_statemachine) from [<804322cc>] (ci_otg_fsm_work+0x3c/0x190)
[<804322cc>] (ci_otg_fsm_work) from [<8042d79c>] (ci_otg_work+0x44/0x1c4)
[<8042d79c>] (ci_otg_work) from [<80042b78>] (process_one_work+0xf4/0x35c)
[<80042b78>] (process_one_work) from [<8004382c>] (worker_thread+0x130/0x3bc)
[<8004382c>] (worker_thread) from [<80048f50>] (kthread+0xcc/0xe4)
[<80048f50>] (kthread) from [<8000e538>] (ret_from_fork+0x14/0x3c)
Code: bad PC value

Cc: Jun Li <jun.li@freescale.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit c2227a3 upstream.

On an absent filesystem (one served by another server), we need to be
able to handle requests for certain attributest (like fs_locations, so
the client can find out which server does have the filesystem), but
others we can't.

We forgot to take that into account when adding another attribute
bitmask work for the SECURITY_LABEL attribute.

There an export entry with the "refer" option can result in:

[   88.414272] kernel BUG at fs/nfsd/nfs4xdr.c:2249!
[   88.414828] invalid opcode: 0000 [riscvarchive#1] SMP
[   88.415368] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iosf_mbi ppdev btrfs coretemp crct10dif_pclmul crc32_pclmul crc32c_intel xor ghash_clmulni_intel raid6_pq vmw_balloon parport_pc parport i2c_piix4 shpchp vmw_vmci acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi mptscsih serio_raw mptbase e1000 scsi_transport_spi ata_generic pata_acpi [last unloaded: nfsd]
[   88.417827] CPU: 0 PID: 2116 Comm: nfsd Not tainted 4.0.7-300.fc22.x86_64 riscvarchive#1
[   88.418448] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[   88.419093] task: ffff880079146d50 ti: ffff8800785d8000 task.ti: ffff8800785d8000
[   88.419729] RIP: 0010:[<ffffffffa04b3c10>]  [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
[   88.420376] RSP: 0000:ffff8800785db998  EFLAGS: 00010206
[   88.421027] RAX: 0000000000000001 RBX: 000000000018091a RCX: ffff88006668b980
[   88.421676] RDX: 00000000fffef7fc RSI: 0000000000000000 RDI: ffff880078d05000
[   88.422315] RBP: ffff8800785dbb58 R08: ffff880078d043f8 R09: ffff880078d4a000
[   88.422968] R10: 0000000000010000 R11: 0000000000000002 R12: 0000000000b0a23a
[   88.423612] R13: ffff880078d05000 R14: ffff880078683100 R15: ffff88006668b980
[   88.424295] FS:  0000000000000000(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
[   88.424944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   88.425597] CR2: 00007f40bc370f90 CR3: 0000000035af5000 CR4: 00000000001407f0
[   88.426285] Stack:
[   88.426921]  ffff8800785dbaa8 ffffffffa049e4af ffff8800785dba08 ffffffff813298f0
[   88.427585]  ffff880078683300 ffff8800769b0de8 0000089d00000001 0000000087f805e0
[   88.428228]  ffff880000000000 ffff880079434a00 0000000000000000 ffff88006668b980
[   88.428877] Call Trace:
[   88.429527]  [<ffffffffa049e4af>] ? exp_get_by_name+0x7f/0xb0 [nfsd]
[   88.430168]  [<ffffffff813298f0>] ? inode_doinit_with_dentry+0x210/0x6a0
[   88.430807]  [<ffffffff8123833e>] ? d_lookup+0x2e/0x60
[   88.431449]  [<ffffffff81236133>] ? dput+0x33/0x230
[   88.432097]  [<ffffffff8123f214>] ? mntput+0x24/0x40
[   88.432719]  [<ffffffff812272b2>] ? path_put+0x22/0x30
[   88.433340]  [<ffffffffa049ac87>] ? nfsd_cross_mnt+0xb7/0x1c0 [nfsd]
[   88.433954]  [<ffffffffa04b54e0>] nfsd4_encode_dirent+0x1b0/0x3d0 [nfsd]
[   88.434601]  [<ffffffffa04b5330>] ? nfsd4_encode_getattr+0x40/0x40 [nfsd]
[   88.435172]  [<ffffffffa049c991>] nfsd_readdir+0x1c1/0x2a0 [nfsd]
[   88.435710]  [<ffffffffa049a530>] ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
[   88.436447]  [<ffffffffa04abf30>] nfsd4_encode_readdir+0x120/0x220 [nfsd]
[   88.437011]  [<ffffffffa04b58cd>] nfsd4_encode_operation+0x7d/0x190 [nfsd]
[   88.437566]  [<ffffffffa04aa6dd>] nfsd4_proc_compound+0x24d/0x6f0 [nfsd]
[   88.438157]  [<ffffffffa0496103>] nfsd_dispatch+0xc3/0x220 [nfsd]
[   88.438680]  [<ffffffffa006f0cb>] svc_process_common+0x43b/0x690 [sunrpc]
[   88.439192]  [<ffffffffa0070493>] svc_process+0x103/0x1b0 [sunrpc]
[   88.439694]  [<ffffffffa0495a57>] nfsd+0x117/0x190 [nfsd]
[   88.440194]  [<ffffffffa0495940>] ? nfsd_destroy+0x90/0x90 [nfsd]
[   88.440697]  [<ffffffff810bb728>] kthread+0xd8/0xf0
[   88.441260]  [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
[   88.441762]  [<ffffffff81789e58>] ret_from_fork+0x58/0x90
[   88.442322]  [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
[   88.442879] Code: 0f 84 93 05 00 00 83 f8 ea c7 85 a0 fe ff ff 00 00 27 30 0f 84 ba fe ff ff 85 c0 0f 85 a5 fe ff ff e9 e3 f9 ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 be 04 00 00 00 4c 89 ef 4c 89 8d 68 fe
[   88.444052] RIP  [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
[   88.444658]  RSP <ffff8800785db998>
[   88.445232] ---[ end trace 6cb9d0487d94a29f ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  riscvarchive#1 schedule at ffffffff815ab76e
  riscvarchive#2 schedule_timeout at ffffffff815ae5e5
  riscvarchive#3 io_schedule_timeout at ffffffff815aad6a
  riscvarchive#4 bit_wait_io at ffffffff815abfc6
  riscvarchive#5 __wait_on_bit at ffffffff815abda5
  riscvarchive#6 wait_on_page_bit at ffffffff8111fd4f
  riscvarchive#7 shrink_page_list at ffffffff81135445
  riscvarchive#8 shrink_inactive_list at ffffffff81135845
  riscvarchive#9 shrink_lruvec at ffffffff81135ead
 riscvarchive#10 shrink_zone at ffffffff811360c3
 riscvarchive#11 shrink_zones at ffffffff81136eff
 riscvarchive#12 do_try_to_free_pages at ffffffff8113712f
 riscvarchive#13 try_to_free_mem_cgroup_pages at ffffffff811372be
 riscvarchive#14 try_charge at ffffffff81189423
 riscvarchive#15 mem_cgroup_try_charge at ffffffff8118c6f5
 riscvarchive#16 __add_to_page_cache_locked at ffffffff8112137d
 riscvarchive#17 add_to_page_cache_lru at ffffffff81121618
 riscvarchive#18 pagecache_get_page at ffffffff8112170b
 riscvarchive#19 grow_dev_page at ffffffff811c8297
 riscvarchive#20 __getblk_slow at ffffffff811c91d6
 riscvarchive#21 __getblk_gfp at ffffffff811c92c1
 riscvarchive#22 ext4_ext_grow_indepth at ffffffff8124565c
 riscvarchive#23 ext4_ext_create_new_leaf at ffffffff81246ca8
 riscvarchive#24 ext4_ext_insert_extent at ffffffff81246f09
 riscvarchive#25 ext4_ext_map_blocks at ffffffff8124a848
 riscvarchive#26 ext4_map_blocks at ffffffff8121a5b7
 riscvarchive#27 mpage_map_one_extent at ffffffff8121b1fa
 riscvarchive#28 mpage_map_and_submit_extent at ffffffff8121f07b
 riscvarchive#29 ext4_writepages at ffffffff8121f6d5
 riscvarchive#30 do_writepages at ffffffff8112c490
 riscvarchive#31 __filemap_fdatawrite_range at ffffffff81120199
 riscvarchive#32 filemap_flush at ffffffff8112041c
 riscvarchive#33 ext4_alloc_da_blocks at ffffffff81219da1
 riscvarchive#34 ext4_rename at ffffffff81229b91
 riscvarchive#35 ext4_rename2 at ffffffff81229e32
 riscvarchive#36 vfs_rename at ffffffff811a08a5
 riscvarchive#37 SYSC_renameat2 at ffffffff811a3ffc
 riscvarchive#38 sys_renameat2 at ffffffff811a408e
 riscvarchive#39 sys_rename at ffffffff8119e51e
 riscvarchive#40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
…hore set exits

commit 602b859 upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [riscvarchive#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ riscvarchive#1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
    RSP <ffff88003750fd68>
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ riscvarchive#1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <sys/wait.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <errno.h>

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i < NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] < 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i < NSET; i++) {
                    for (j = 0; j < NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) < 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i < NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p < 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i < NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p < 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit fc5fee8 upstream.

It turns out that a PV domU also requires the "Xen PV" APIC
driver. Otherwise, the flat driver is used and we get stuck in busy
loops that never exit, such as in this stack trace:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56              while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
 #0  __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
 riscvarchive#1  __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
 riscvarchive#2  apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
 riscvarchive#3  0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
 riscvarchive#4  0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
 riscvarchive#5  0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
 riscvarchive#6  0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
    at kernel/printk/printk.c:1778
 riscvarchive#7  0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
 riscvarchive#8  0xffffffffc00013ea in ?? ()
 riscvarchive#9  0x0000000000000000 in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 49bda21 upstream.

This commit fixes the following issues:

1. The 9th bit of buf was believed to be the LSB of divisor's
exponent, but the hardware interprets it as MSB (9th bit) of the
mantissa. The exponent is actually one bit shorter and applies
to base 4, not 2 as previously believed.

2. Loop iterations doubled the exponent instead of incrementing.

3. The exponent wasn't checked for overflow.

4. The function returned requested rate instead of actual rate.

Due to issue riscvarchive#2, the old code deviated from the wrong formula
described in riscvarchive#1 and actually yielded correct rates when divisor
was lower than 4096 by using exponents of 0, 2 or 4 base-2,
interpreted as 0, 1, 2 base-4 with the 9th mantissa bit clear.
However, at 93.75 kbaud or less the rate turned out too slow
due to riscvarchive#2 or too fast due to riscvarchive#2 and riscvarchive#3.

I tested this patch by sending and validating 0x00,0x01,..,0xff
to an FTDI dongle at 234, 987, 2401, 9601, 31415, 115199, 250k,
500k, 750k, 1M, 1.5M, 3M+1 baud. All rates passed.

I also used pv to check speed at some rates unsupported by FTDI:
45 (the lowest possible), 2M, 4M, 5M and 6M-1. Looked sane.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Fixes: 399aa9a ("USB: pl2303: use divisors for unsupported baud
rates")
[johan: update summary ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 596f5aa upstream.

There may be lots of pending requests so that the buffer of PAGE_SIZE
can't hold them at all.

One typical example is scsi-mq, the queue depth(.can_queue) of
scsi_host and blk-mq is quite big but scsi_device's queue_depth
is a bit small(.cmd_per_lun), then it is quite easy to have lots
of pending requests in hw queue.

This patch fixes the following warning and the related memory
destruction.

[  359.025101] fill_read_buffer: blk_mq_hw_sysfs_show+0x0/0x7d returned bad count^M
[  359.055595] irq event stamp: 15537^M
[  359.055606] general protection fault: 0000 [riscvarchive#1] PREEMPT SMP DEBUG_PAGEALLOC ^M
[  359.055614] Dumping ftrace buffer:^M
[  359.055660]    (ftrace buffer empty)^M
[  359.055672] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M
[  359.055678] CPU: 4 PID: 21631 Comm: stress-ng-sysfs Not tainted 4.2.0-rc5-next-20150805 #434^M
[  359.055679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M
[  359.055682] task: ffff8802161cc000 ti: ffff88021b4a8000 task.ti: ffff88021b4a8000^M
[  359.055693] RIP: 0010:[<ffffffff811541c5>]  [<ffffffff811541c5>] __kmalloc+0xe8/0x152^M

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit bdfe0cb upstream.

This reverts commit 08439fe.

Unfortunately we still need to test for bdi->dev to avoid a crash when a
USB stick is yanked out while a file system is mounted:

   usb 2-2: USB disconnect, device number 2
   Buffer I/O error on dev sdb1, logical block 15237120, lost sync page write
   JBD2: Error -5 detected when updating journal superblock for sdb1-8.
   BUG: unable to handle kernel paging request at 34beb000
   IP: [<c136ce88>] __percpu_counter_add+0x18/0xc0
   *pdpt = 0000000023db9001 *pde = 0000000000000000
   Oops: 0000 [riscvarchive#1] SMP
   CPU: 0 PID: 4083 Comm: umount Tainted: G     U     OE   4.1.1-040101-generic #201507011435
   Hardware name: LENOVO 7675CTO/7675CTO, BIOS 7NETC2WW (2.22 ) 03/22/2011
   task: ebf06b50 ti: ebebc000 task.ti: ebebc000
   EIP: 0060:[<c136ce88>] EFLAGS: 00010082 CPU: 0
   EIP is at __percpu_counter_add+0x18/0xc0
   EAX: f21c8e88 EBX: f21c8e88 ECX: 00000000 EDX: 00000001
   ESI: 00000001 EDI: 00000000 EBP: ebebde60 ESP: ebebde40
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
   CR0: 8005003b CR2: 34beb000 CR3: 33354200 CR4: 000007f0
   Stack:
    c1abe100 edcb0098 edcb00ec ffffffff f21c8e68 ffffffff f21c8e68 f286d160
    ebebde84 c1160454 00000010 00000282 f72a77f8 00000984 f72a77f8 f286d160
    f286d170 ebebdea0 c11e613f 00000000 00000282 f72a77f8 edd7f4d0 00000000
   Call Trace:
    [<c1160454>] account_page_dirtied+0x74/0x110
    [<c11e613f>] __set_page_dirty+0x3f/0xb0
    [<c11e6203>] mark_buffer_dirty+0x53/0xc0
    [<c124a0cb>] ext4_commit_super+0x17b/0x250
    [<c124ac71>] ext4_put_super+0xc1/0x320
    [<c11f04ba>] ? fsnotify_unmount_inodes+0x1aa/0x1c0
    [<c11cfeda>] ? evict_inodes+0xca/0xe0
    [<c11b925a>] generic_shutdown_super+0x6a/0xe0
    [<c10a1df0>] ? prepare_to_wait_event+0xd0/0xd0
    [<c1165a50>] ? unregister_shrinker+0x40/0x50
    [<c11b92f6>] kill_block_super+0x26/0x70
    [<c11b94f5>] deactivate_locked_super+0x45/0x80
    [<c11ba007>] deactivate_super+0x47/0x60
    [<c11d2b39>] cleanup_mnt+0x39/0x80
    [<c11d2bc0>] __cleanup_mnt+0x10/0x20
    [<c1080b51>] task_work_run+0x91/0xd0
    [<c1011e3c>] do_notify_resume+0x7c/0x90
    [<c1720da5>] work_notify
   Code: 8b 55 e8 e9 f4 fe ff ff 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 83 ec 20 89 5d f4 89 c3 89 75 f8 89 d6 89 7d fc 89 cf 8b 48 14 <64> 8b 01 89 45 ec 89 c2 8b 45 08 c1 fa 1f 01 75 ec 89 55 f0 89
   EIP: [<c136ce88>] __percpu_counter_add+0x18/0xc0 SS:ESP 0068:ebebde40
   CR2: 0000000034beb000
   ---[ end trace dd564a7bea834ecd ]---

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101011

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 1ab3638 upstream.

Not all gpio banks are necessarily enabled, in the current code this can
lead to null pointer dereferences.

[   51.130000] Unable to handle kernel NULL pointer dereference at virtual address 00000058
[   51.130000] pgd = dee04000
[   51.130000] [00000058] *pgd=3f66d831, *pte=00000000, *ppte=00000000
[   51.140000] Internal error: Oops: 17 [riscvarchive#1] ARM
[   51.140000] Modules linked in:
[   51.140000] CPU: 0 PID: 1664 Comm: cat Not tainted 4.1.1+ riscvarchive#6
[   51.140000] Hardware name: Atmel SAMA5
[   51.140000] task: df6dd880 ti: dec60000 task.ti: dec60000
[   51.140000] PC is at at91_pinconf_get+0xb4/0x200
[   51.140000] LR is at at91_pinconf_get+0xb4/0x200
[   51.140000] pc : [<c01e71a0>]    lr : [<c01e71a0>]    psr: 600f0013
sp : dec61e48  ip : 600f0013  fp : df522538
[   51.140000] r10: df52250c  r9 : 00000058  r8 : 00000068
[   51.140000] r7 : 00000000  r6 : df53c910  r5 : 00000000  r4 : dec61e7c
[   51.140000] r3 : 00000000  r2 : c06746d4  r1 : 00000000  r0 : 00000003
[   51.140000] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   51.140000] Control: 10c53c7d  Table: 3ee04059  DAC: 00000015
[   51.140000] Process cat (pid: 1664, stack limit = 0xdec60208)
[   51.140000] Stack: (0xdec61e48 to 0xdec62000)
[   51.140000] 1e40:                   00000358 00000000 df522500 ded15f80 c05a9d08 ded15f80
[   51.140000] 1e60: 0000048c 00000061 df522500 ded15f80 c05a9d08 c01e7304 ded15f80 00000000
[   51.140000] 1e80: c01e6008 00000060 0000048c c01e6034 c01e5f6c ded15f80 dec61ec0 00000000
[   51.140000] 1ea0: 00020000 ded6f280 dec61f80 00000001 00000001 c00ae0b8 b6e80000 ded15fb0
[   51.140000] 1ec0: 00000000 00000000 df4bc974 00000055 00000800 ded6f280 b6e80000 ded6f280
[   51.140000] 1ee0: ded6f280 00020000 b6e80000 00000000 00020000 c0090dec c0671e1c dec61fb0
[   51.140000] 1f00: b6f8b510 00000001 00004201 c000924c 00000000 00000003 00000003 00000000
[   51.140000] 1f20: df4bc940 00022000 00000022 c066e188 b6e7f000 c00836f4 000b6e7f ded6f280
[   51.140000] 1f40: ded6f280 b6e80000 dec61f80 ded6f280 00020000 c0091508 00000000 00000003
[   51.140000] 1f60: 00022000 00000000 00000000 ded6f280 ded6f280 00020000 b6e80000 c0091d9c
[   51.140000] 1f80: 00000000 00000000 ffffffff 00020000 00020000 b6e80000 00000003 c000f124
[   51.140000] 1fa0: dec60000 c000efa0 00020000 00020000 00000003 b6e80000 00020000 000271c4
[   51.140000] 1fc0: 00020000 00020000 b6e80000 00000003 7fffe000 00000000 00000000 00020000
[   51.140000] 1fe0: 00000000 bef50b64 00013835 b6f29c76 400f0030 00000003 00000000 00000000
[   51.140000] [<c01e71a0>] (at91_pinconf_get) from [<c01e7304>] (at91_pinconf_dbg_show+0x18/0x2c0)
[   51.140000] [<c01e7304>] (at91_pinconf_dbg_show) from [<c01e6034>] (pinconf_pins_show+0xc8/0xf8)
[   51.140000] [<c01e6034>] (pinconf_pins_show) from [<c00ae0b8>] (seq_read+0x1a0/0x464)
[   51.140000] [<c00ae0b8>] (seq_read) from [<c0090dec>] (__vfs_read+0x20/0xd0)
[   51.140000] [<c0090dec>] (__vfs_read) from [<c0091508>] (vfs_read+0x7c/0x108)
[   51.140000] [<c0091508>] (vfs_read) from [<c0091d9c>] (SyS_read+0x40/0x94)
[   51.140000] [<c0091d9c>] (SyS_read) from [<c000efa0>] (ret_fast_syscall+0x0/0x3c)
[   51.140000] Code: eb010ec2 e30a0d08 e34c005a eb0ae5a7 (e5993000)
[   51.150000] ---[ end trace fb3c370da3ea4794 ]---

Fixes: a0b957f ("pinctrl: at91: allow to have disabled gpio bank")
Signed-off-by: David Dueck <davidcdueck@googlemail.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 0469785 upstream.

Tony Luck found on his setup, if memory block size 512M will cause crash
during booting.

  BUG: unable to handle kernel paging request at ffffea0074000020
  IP: get_nid_for_pfn+0x17/0x40
  PGD 128ffcb067 PUD 128ffc9067 PMD 0
  Oops: 0000 [riscvarchive#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 riscvarchive#1
  ...
  Call Trace:
     ? register_mem_sect_under_node+0x66/0xe0
     register_one_node+0x17b/0x240
     ? pci_iommu_alloc+0x6e/0x6e
     topology_init+0x3c/0x95
     do_one_initcall+0xcd/0x1f0

The system has non continuous RAM address:
 BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable
 BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable
 BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable
 BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable
 BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable

So there are start sections in memory block not present.  For example:

    memory block : [0x2c18000000, 0x2c20000000) 512M

first three sections are not present.

The current register_mem_sect_under_node() assume first section is
present, but memory block section number range [start_section_nr,
end_section_nr] would include not present section.

For arch that support vmemmap, we don't setup memmap for struct page
area within not present sections area.

So skip the pfn range that belong to absent section.

[akpm@linux-foundation.org: simplification]
[rientjes@google.com: more simplification]
Fixes: bdee237 ("x86: mm: Use 2GB memory block size on large memory x86-64 systems")
Fixes: 982792c ("x86, mm: probe memory block size for generic x86 64bit")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
…2 client

commit 18e3b73 upstream.

---Steps to Reproduce--
<nfs-server>
# cat /etc/exports
/nfs/referal  *(rw,insecure,no_subtree_check,no_root_squash,crossmnt)
/nfs/old      *(ro,insecure,subtree_check,root_squash,crossmnt)

<nfs-client>
# mount -t nfs nfs-server:/nfs/ /mnt/
# ll /mnt/*/

<nfs-server>
# cat /etc/exports
/nfs/referal   *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server)
/nfs/old       *(ro,insecure,subtree_check,root_squash,crossmnt)
# service nfs restart

<nfs-client>
# ll /mnt/*/    --->>>>> oops here

[ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 5123.103363] IP: [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0
[ 5123.104131] Oops: 0000 [riscvarchive#1]
[ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
[ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G           OE   4.2.0-rc6+ #214
[ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000
[ 5123.107363] RIP: 0010:[<ffffffffa03ed38b>]  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.107909] RSP: 0018:ffff88005877fdb8  EFLAGS: 00010246
[ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240
[ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908
[ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000
[ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800
[ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240
[ 5123.111169] FS:  0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000
[ 5123.111726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0
[ 5123.112888] Stack:
[ 5123.113458]  ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000
[ 5123.114049]  0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6
[ 5123.114662]  ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800
[ 5123.115264] Call Trace:
[ 5123.115868]  [<ffffffffa03fb44b>] nfs4_try_migration+0xbb/0x220 [nfsv4]
[ 5123.116487]  [<ffffffffa03fcb3b>] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4]
[ 5123.117104]  [<ffffffffa03fc690>] ? nfs4_do_reclaim+0x510/0x510 [nfsv4]
[ 5123.117813]  [<ffffffff810a4527>] kthread+0xd7/0xf0
[ 5123.118456]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.119108]  [<ffffffff816d9cdf>] ret_from_fork+0x3f/0x70
[ 5123.119723]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33
[ 5123.121643] RIP  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.122308]  RSP <ffff88005877fdb8>
[ 5123.122942] CR2: 0000000000000000

Fixes: ec011fe ("NFS: Introduce a vector of migration recovery ops")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
[ Upstream commit 0470eb9 ]

Kirill A. Shutemov says:

This simple test-case trigers few locking asserts in kernel:

int main(int argc, char **argv)
{
        unsigned int block_size = 16 * 4096;
        struct nl_mmap_req req = {
                .nm_block_size          = block_size,
                .nm_block_nr            = 64,
                .nm_frame_size          = 16384,
                .nm_frame_nr            = 64 * block_size / 16384,
        };
        unsigned int ring_size;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                exit(1);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                exit(1);

	ring_size = req.nm_block_nr * req.nm_block_size;
	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	return 0;
}

+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
 #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
 riscvarchive#1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
 riscvarchive#2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20

CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
Call Trace:
 <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
 [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
 [<ffffffff81085bed>] __might_sleep+0x4d/0x90
 [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
 [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
 [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
 [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
 [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
 [<ffffffff817e484d>] __sk_free+0x1d/0x160
 [<ffffffff817e49a9>] sk_free+0x19/0x20
[..]

Cong Wang says:

We can't hold mutex lock in a rcu callback, [..]

Thomas Graf says:

The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.

Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Diagnosed-by: Cong Wang <cwang@twopensource.com>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
This patch fixes a v4.1 only regression bug as reported by Martin
where UNIT_ATTENTION checking for pre v4.2-rc1 RCU conversion code
legacy se_node_acl->device_list[] was hitting a NULL pointer
dereference in:

[ 1858.639654] CPU: 2 PID: 1293 Comm: kworker/2:1 Tainted: G          I 4.1.6-fixxcopy+ riscvarchive#1
[ 1858.639699] Hardware name: Dell Inc. PowerEdge R410/0N83VF, BIOS 1.11.0 07/20/2012
[ 1858.639747] Workqueue: xcopy_wq target_xcopy_do_work [target_core_mod]
[ 1858.639782] task: ffff880036f0cbe0 ti: ffff880317940000 task.ti: ffff880317940000
[ 1858.639822] RIP: 0010:[<ffffffffa01d3774>]  [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60 [target_core_mod]
[ 1858.639884] RSP: 0018:ffff880317943ce0  EFLAGS: 00010282
[ 1858.639913] RAX: 0000000000000000 RBX: ffff880317943dc0 RCX: 0000000000000000
[ 1858.639950] RDX: 0000000000000000 RSI: ffff880317943dd0 RDI: ffff88030eaee408
[ 1858.639987] RBP: ffff88030eaee408 R08: 0000000000000001 R09: 0000000000000001
[ 1858.640025] R10: 0000000000000000 R11: 00000000000706e0 R12: ffff880315e0a000
[ 1858.640062] R13: ffff88030eaee408 R14: 0000000000000001 R15: ffff88030eaee408
[ 1858.640100] FS:  0000000000000000(0000) GS:ffff880322e80000(0000) knlGS:0000000000000000
[ 1858.640143] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1858.640173] CR2: 0000000000000000 CR3: 000000000180d000 CR4: 00000000000006e0
[ 1858.640210] Stack:
[ 1858.640223]  ffffffffa01cadfa ffff88030eaee400 ffff880318e7c340 ffff880315e0a000
[ 1858.640267]  ffffffffa01d8c25 ffff8800cae809e0 0000000000000400 0000000000000400
[ 1858.640310]  ffff880318e7c3d0 0000000006b75800 0000000000080000 ffff88030eaee400
[ 1858.640354] Call Trace:
[ 1858.640379]  [<ffffffffa01cadfa>] ? target_setup_cmd_from_cdb+0x13a/0x2c0 [target_core_mod]
[ 1858.640429]  [<ffffffffa01d8c25>] ? target_xcopy_setup_pt_cmd+0x85/0x320 [target_core_mod]
[ 1858.640479]  [<ffffffffa01d9424>] ? target_xcopy_do_work+0x264/0x700 [target_core_mod]
[ 1858.640526]  [<ffffffff810ac3a0>] ? pick_next_task_fair+0x720/0x8f0
[ 1858.640562]  [<ffffffff8108b3fb>] ? process_one_work+0x14b/0x430
[ 1858.640595]  [<ffffffff8108bf5b>] ? worker_thread+0x6b/0x560
[ 1858.640627]  [<ffffffff8108bef0>] ? rescuer_thread+0x390/0x390
[ 1858.640661]  [<ffffffff810913b3>] ? kthread+0xd3/0xf0
[ 1858.640689]  [<ffffffff810912e0>] ? kthread_create_on_node+0x180/0x180

Also, check for the same se_node_acl->device_list[] during EXTENDED_COPY
operation as a non-holding persistent reservation port.

Reported-by: Martin Svec <martin,svec@zoner.cz>
Tested-by: Martin Svec <martin,svec@zoner.cz>
Cc: Martin Svec <martin,svec@zoner.cz>
Cc: Alex Gorbachev <ag@iss-integration.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit eddd382 upstream.

Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).

[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address
./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error
./arch/x86/mm/asan/report.c:278
[ 124.580137] riscvarchive#1 ffffffff810dc6a0 in asan_check_region
./arch/x86/mm/asan/asan.c:37
[ 124.581050] riscvarchive#2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] riscvarchive#3 ffffffff8107c093 in get_wchan
./arch/x86/kernel/process_64.c:444

The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:

 - The lower bound of the stack is not the start of the stack
   page. It's the start of the stack page plus sizeof (struct
   thread_info)

 - The upper bound must be:

       top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).

   The 2 * sizeof(unsigned long) is required because the stack pointer
   points at the frame pointer. The layout on the stack is: ... IP FP
   ... IP FP. So we need to make sure that both IP and FP are in the
   bounds.

Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.

Use READ_ONCE() when accessing the stack. This does not prevent a
concurrent wakeup of the task and the stack changing, but at least it
avoids TOCTOU.

Also check task state at the end of the loop. Again that does not
prevent concurrent changes, but it avoids walking for nothing.

Add proper comments while at it.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
…fy a fault

commit 2f84a89 upstream.

SunDong reported the following on

  https://bugzilla.kernel.org/show_bug.cgi?id=103841

	I think I find a linux bug, I have the test cases is constructed. I
	can stable recurring problems in fedora22(4.0.4) kernel version,
	arch for x86_64.  I construct transparent huge page, when the parent
	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
	huge page area, it has the opportunity to lead to huge page copy on
	write failure, and then it will munmap the child corresponding mmap
	area, but then the child mmap area with VM_MAYSHARE attributes, child
	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
	functions (vma - > vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g.  it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this

	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
	 pgoff 0 file ffff88106bdb9800 private_data           (null)
	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
	 ------------
	 kernel BUG at mm/hugetlb.c:462!
	 SMP
	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default riscvarchive#1
	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
	 set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped shared and private.  During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered.  This
patch identifies such VMAs and skips them.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: SunDong <sund_sky@126.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 8405a8f upstream.

Add code to nf_unregister_hook to flush the nf_queue when a hook is
unregistered.  This guarantees that the pointer that the nf_queue code
retains into the nf_hook list will remain valid while a packet is
queued.

I tested what would happen if we do not flush queued packets and was
trivially able to obtain the oops below.  All that was required was
to stop the nf_queue listening process, to delete all of the nf_tables,
and to awaken the nf_queue listening process.

> BUG: unable to handle kernel paging request at 0000000100000001
> IP: [<0000000100000001>] 0x100000001
> PGD b9c35067 PUD 0
> Oops: 0010 [riscvarchive#1] SMP
> Modules linked in:
> CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted
> task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000
> RIP: 0010:[<0000000100000001>]  [<0000000100000001>] 0x100000001
> RSP: 0018:ffff8800ba9dba40  EFLAGS: 00010a16
> RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90
> RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00
> RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28
> R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900
> R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000
> FS:  00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0
> Stack:
>  ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8
>  ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128
>  ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0
> Call Trace:
>  [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0
>  [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190
>  [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360
>  [<ffffffff81386290>] ? nla_parse+0x80/0xf0
>  [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240
>  [<ffffffff811b2fec>] ? __memcg_kmem_get_cache+0x4c/0x150
>  [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20
>  [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0
>  [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0
>  [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650
>  [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50
>  [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0
>  [<ffffffff810e8f73>] ? __wake_up+0x43/0x70
>  [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0
>  [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80
>  [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a
> Code:  Bad RIP value.
> RIP  [<0000000100000001>] 0x100000001
>  RSP <ffff8800ba9dba40>
> CR2: 0000000100000001
> ---[ end trace 08eb65d42362793f ]---

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 274d835 upstream.

Since 9eb1e57
drm/dp/mst: make sure mst_primary mstb is valid in work function

we validate the mstb structs in the work function, and doing
that takes a reference. So we should never get here with the
work function running using the mstb device, only if the work
function hasn't run yet or is running for another mstb.

So we don't need to sync the work here, this was causing
lockdep spew as below.

[  +0.000160] =============================================
[  +0.000001] [ INFO: possible recursive locking detected ]
[  +0.000002] 3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug riscvarchive#1 Tainted: G        W      ------------
[  +0.000001] ---------------------------------------------
[  +0.000001] kworker/4:2/1262 is trying to acquire lock:
[  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b29a5>] flush_work+0x5/0x2e0
[  +0.000007]
but task is already holding lock:
[  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
[  +0.000004]
other info that might help us debug this:
[  +0.000001]  Possible unsafe locking scenario:

[  +0.000002]        CPU0
[  +0.000000]        ----
[  +0.000001]   lock((&mgr->work));
[  +0.000002]   lock((&mgr->work));
[  +0.000001]
 *** DEADLOCK ***

[  +0.000001]  May be due to missing lock nesting notation

[  +0.000002] 2 locks held by kworker/4:2/1262:
[  +0.000001]  #0:  (events_long){.+.+.+}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
[  +0.000004]  riscvarchive#1:  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
[  +0.000003]
stack backtrace:
[  +0.000003] CPU: 4 PID: 1262 Comm: kworker/4:2 Tainted: G        W      ------------   3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug riscvarchive#1
[  +0.000001] Hardware name: LENOVO 20EGS0R600/20EGS0R600, BIOS GNET71WW (2.19 ) 02/05/2015
[  +0.000008] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  +0.000001]  ffffffff82c26c90 00000000a527b914 ffff88046399bae8 ffffffff816fe04d
[  +0.000004]  ffff88046399bb58 ffffffff8110f47f ffff880461438000 0001009b840fc003
[  +0.000002]  ffff880461438a98 0000000000000000 0000000804dc26e1 ffffffff824a2c00
[  +0.000003] Call Trace:
[  +0.000004]  [<ffffffff816fe04d>] dump_stack+0x19/0x1b
[  +0.000004]  [<ffffffff8110f47f>] __lock_acquire+0x115f/0x1250
[  +0.000002]  [<ffffffff8110fd49>] lock_acquire+0x99/0x1e0
[  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
[  +0.000002]  [<ffffffff810b29ee>] flush_work+0x4e/0x2e0
[  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
[  +0.000004]  [<ffffffff81025905>] ? native_sched_clock+0x35/0x80
[  +0.000002]  [<ffffffff81025959>] ? sched_clock+0x9/0x10
[  +0.000002]  [<ffffffff810da1f5>] ? local_clock+0x25/0x30
[  +0.000002]  [<ffffffff8110dca9>] ? mark_held_locks+0xb9/0x140
[  +0.000003]  [<ffffffff810b4ed5>] ? __cancel_work_timer+0x95/0x160
[  +0.000002]  [<ffffffff810b4ee8>] __cancel_work_timer+0xa8/0x160
[  +0.000002]  [<ffffffff810b4fb0>] cancel_work_sync+0x10/0x20
[  +0.000007]  [<ffffffffa0160d17>] drm_dp_destroy_mst_branch_device+0x27/0x120 [drm_kms_helper]
[  +0.000006]  [<ffffffffa0163968>] drm_dp_mst_link_probe_work+0x78/0xa0 [drm_kms_helper]
[  +0.000002]  [<ffffffff810b5850>] process_one_work+0x220/0x710
[  +0.000002]  [<ffffffff810b57e4>] ? process_one_work+0x1b4/0x710
[  +0.000005]  [<ffffffff810b5e5b>] worker_thread+0x11b/0x3a0
[  +0.000003]  [<ffffffff810b5d40>] ? process_one_work+0x710/0x710
[  +0.000002]  [<ffffffff810beced>] kthread+0xed/0x100
[  +0.000003]  [<ffffffff810bec00>] ? insert_kthread_work+0x80/0x80
[  +0.000003]  [<ffffffff817121d8>] ret_from_fork+0x58/0x90

v2: add flush_work.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit cf6f54e upstream.

Fixes the following lockdep splat:
[    1.244527] =============================================
[    1.245193] [ INFO: possible recursive locking detected ]
[    1.245193] 4.2.0-rc1+ riscvarchive#37 Not tainted
[    1.245193] ---------------------------------------------
[    1.245193] cp/742 is trying to acquire lock:
[    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[    1.245193]
[    1.245193] but task is already holding lock:
[    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[    1.245193]
[    1.245193] other info that might help us debug this:
[    1.245193]  Possible unsafe locking scenario:
[    1.245193]
[    1.245193]        CPU0
[    1.245193]        ----
[    1.245193]   lock(&sb->s_type->i_mutex_key#9);
[    1.245193]   lock(&sb->s_type->i_mutex_key#9);
[    1.245193]
[    1.245193]  *** DEADLOCK ***
[    1.245193]
[    1.245193]  May be due to missing lock nesting notation
[    1.245193]
[    1.245193] 2 locks held by cp/742:
[    1.245193]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ad37f>] mnt_want_write+0x1f/0x50
[    1.245193]  riscvarchive#1:  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[    1.245193]
[    1.245193] stack backtrace:
[    1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ riscvarchive#37
[    1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014
[    1.245193]  ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5
[    1.245193]  ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68
[    1.245193]  000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500
[    1.245193] Call Trace:
[    1.245193]  [<ffffffff814f6f49>] dump_stack+0x4c/0x65
[    1.245193]  [<ffffffff810b56c5>] ? console_unlock+0x1c5/0x510
[    1.245193]  [<ffffffff810a150d>] __lock_acquire+0x1a6d/0x1ea0
[    1.245193]  [<ffffffff8109fa78>] ? __lock_is_held+0x58/0x80
[    1.245193]  [<ffffffff810a1a93>] lock_acquire+0xd3/0x270
[    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff814fc83b>] mutex_lock_nested+0x6b/0x3a0
[    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff8128e286>] ubifs_create+0xa6/0x1f0
[    1.245193]  [<ffffffff81198e7f>] ? path_openat+0x3af/0x1280
[    1.245193]  [<ffffffff81195d15>] vfs_create+0x95/0xc0
[    1.245193]  [<ffffffff8119929c>] path_openat+0x7cc/0x1280
[    1.245193]  [<ffffffff8109ffe3>] ? __lock_acquire+0x543/0x1ea0
[    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[    1.245193]  [<ffffffff81088c00>] ? calc_global_load_tick+0x60/0x90
[    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[    1.245193]  [<ffffffff8119ac55>] do_filp_open+0x75/0xd0
[    1.245193]  [<ffffffff814ffd86>] ? _raw_spin_unlock+0x26/0x40
[    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[    1.245193]  [<ffffffff81189bd9>] do_sys_open+0x129/0x200
[    1.245193]  [<ffffffff81189cc9>] SyS_open+0x19/0x20
[    1.245193]  [<ffffffff81500717>] entry_SYSCALL_64_fastpath+0x12/0x6f

While the lockdep splat is a false positive, becuase path_openat holds i_mutex
of the parent directory and ubifs_init_security() tries to acquire i_mutex
of a new inode, it reveals that taking i_mutex in ubifs_init_security() is
in vain because it is only being called in the inode allocation path
and therefore nobody else can see the inode yet.

Reported-and-tested-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-and-tested-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: dedekind1@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit ee556d0 upstream.

When function graph tracer is enabled, the following operation
will trigger panic:

mount -t debugfs nodev /sys/kernel
echo next_tgid > /sys/kernel/tracing/set_ftrace_filter
echo function_graph > /sys/kernel/tracing/current_tracer
ls /proc/

------------[ cut here ]------------
[  198.501417] Unable to handle kernel paging request at virtual address cb88537fdc8ba316
[  198.506126] pgd = ffffffc008f79000
[  198.509363] [cb88537fdc8ba316] *pgd=00000000488c6003, *pud=00000000488c6003, *pmd=0000000000000000
[  198.517726] Internal error: Oops: 94000005 [riscvarchive#1] SMP
[  198.518798] Modules linked in:
[  198.520582] CPU: 1 PID: 1388 Comm: ls Tainted: G
[  198.521800] Hardware name: linux,dummy-virt (DT)
[  198.522852] task: ffffffc0fa9e8000 ti: ffffffc0f9ab0000 task.ti: ffffffc0f9ab0000
[  198.524306] PC is at next_tgid+0x30/0x100
[  198.525205] LR is at return_to_handler+0x0/0x20
[  198.526090] pc : [<ffffffc0002a1070>] lr : [<ffffffc0000907c0>] pstate: 60000145
[  198.527392] sp : ffffffc0f9ab3d40
[  198.528084] x29: ffffffc0f9ab3d40 x28: ffffffc0f9ab0000
[  198.529406] x27: ffffffc000d6a000 x26: ffffffc000b786e8
[  198.530659] x25: ffffffc0002a1900 x24: ffffffc0faf16c00
[  198.531942] x23: ffffffc0f9ab3ea0 x22: 0000000000000002
[  198.533202] x21: ffffffc000d85050 x20: 0000000000000002
[  198.534446] x19: 0000000000000002 x18: 0000000000000000
[  198.535719] x17: 000000000049fa08 x16: ffffffc000242efc
[  198.537030] x15: 0000007fa472b54c x14: ffffffffff000000
[  198.538347] x13: ffffffc0fada84a0 x12: 0000000000000001
[  198.539634] x11: ffffffc0f9ab3d70 x10: ffffffc0f9ab3d70
[  198.540915] x9 : ffffffc0000907c0 x8 : ffffffc0f9ab3d40
[  198.542215] x7 : 0000002e330f08f0 x6 : 0000000000000015
[  198.543508] x5 : 0000000000000f08 x4 : ffffffc0f9835ec0
[  198.544792] x3 : cb88537fdc8ba316 x2 : cb88537fdc8ba306
[  198.546108] x1 : 0000000000000002 x0 : ffffffc000d85050
[  198.547432]
[  198.547920] Process ls (pid: 1388, stack limit = 0xffffffc0f9ab0020)
[  198.549170] Stack: (0xffffffc0f9ab3d40 to 0xffffffc0f9ab4000)
[  198.582568] Call trace:
[  198.583313] [<ffffffc0002a1070>] next_tgid+0x30/0x100
[  198.584359] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.585503] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.586574] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.587660] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.588896] Code: aa0003f5 2a0103f4 b4000102 91004043 (885f7c60)
[  198.591092] ---[ end trace 6a346f8f20949ac8 ]---

This is because when using function graph tracer, if the traced
function return value is in multi regs ([x0-x7]), return_to_handler
may corrupt them. So in return_to_handler, the parameter regs should
be protected properly.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Acked-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 3ec0c97 upstream.

If filelayout_decode_layout fail, _filelayout_free_lseg will causes
a double freeing of fh_array.

[ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1179.280198] IP: [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.281010] PGD 0
[ 1179.281443] Oops: 0000 [riscvarchive#1]
[ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
[ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G           OE   4.3.0-rc1-pnfs+ #244
[ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
[ 1179.285668] RIP: 0010:[<ffffffffa027222d>]  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.286612] RSP: 0018:ffff88003e3c77f8  EFLAGS: 00010202
[ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
[ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
[ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
[ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
[ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
[ 1179.290791] FS:  00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
[ 1179.291580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
[ 1179.292731] Stack:
[ 1179.293195]  ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
[ 1179.293676]  ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
[ 1179.294151]  00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
[ 1179.294623] Call Trace:
[ 1179.295092]  [<ffffffffa0272737>] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
[ 1179.295625]  [<ffffffff81727671>] ? out_of_line_wait_on_bit+0x81/0xb0
[ 1179.296133]  [<ffffffffa040407e>] pnfs_layout_process+0xae/0x320 [nfsv4]
[ 1179.296632]  [<ffffffffa03e0a01>] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
[ 1179.297134]  [<ffffffffa0402983>] pnfs_update_layout+0x853/0xb30 [nfsv4]
[ 1179.297632]  [<ffffffffa039db24>] ? nfs_get_lock_context+0x74/0x170 [nfs]
[ 1179.298158]  [<ffffffffa0271807>] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
[ 1179.298834]  [<ffffffffa03a72d9>] __nfs_pageio_add_request+0x119/0x460 [nfs]
[ 1179.299385]  [<ffffffffa03a6bd7>] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
[ 1179.299872]  [<ffffffffa03a7cc3>] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
[ 1179.300362]  [<ffffffffa03a8635>] readpage_async_filler+0x85/0x260 [nfs]
[ 1179.300907]  [<ffffffff81180cb1>] read_cache_pages+0x91/0xd0
[ 1179.301391]  [<ffffffffa03a85b0>] ? nfs_read_completion+0x220/0x220 [nfs]
[ 1179.301867]  [<ffffffffa03a8dc8>] nfs_readpages+0x128/0x200 [nfs]
[ 1179.302330]  [<ffffffff81180ef3>] __do_page_cache_readahead+0x203/0x280
[ 1179.302784]  [<ffffffff81180dc8>] ? __do_page_cache_readahead+0xd8/0x280
[ 1179.303413]  [<ffffffff81181116>] ondemand_readahead+0x1a6/0x2f0
[ 1179.303855]  [<ffffffff81181371>] page_cache_sync_readahead+0x31/0x50
[ 1179.304286]  [<ffffffff811750a6>] generic_file_read_iter+0x4a6/0x5c0
[ 1179.304711]  [<ffffffffa03a0316>] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
[ 1179.305132]  [<ffffffffa039ccf2>] nfs_file_read+0x52/0xa0 [nfs]
[ 1179.305540]  [<ffffffff811e343c>] __vfs_read+0xcc/0x100
[ 1179.305936]  [<ffffffff811e3d15>] vfs_read+0x85/0x130
[ 1179.306326]  [<ffffffff811e4a98>] SyS_read+0x58/0xd0
[ 1179.306708]  [<ffffffff8172caaf>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd <48> 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
[ 1179.308357] RIP  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.309177]  RSP <ffff88003e3c77f8>
[ 1179.309582] CR2: 0000000000000000

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: William Dauchy <william@gandi.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit e81107d upstream.

My colleague ran into a program stall on a x86_64 server, where
n_tty_read() was waiting for data even if there was data in the buffer
in the pty.  kernel stack for the stuck process looks like below.
 #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
 riscvarchive#1 [ffff88303d107bd0] schedule at ffffffff815c513e
 riscvarchive#2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
 riscvarchive#3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
 riscvarchive#4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
 riscvarchive#5 [ffff88303d107dd0] tty_read at ffffffff81368013
 riscvarchive#6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
 riscvarchive#7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
 riscvarchive#8 [ffff88303d107f00] sys_read at ffffffff811a4306
 riscvarchive#9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7

There seems to be two problems causing this issue.

First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active().  However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
   RELEASE may be completed before the
   RELEASE operation has completed */
                                        add_wait_queue(&tty->read_wait, &wait);
                                        ...
                                        if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
                                        spin_lock_irqsave(&q->lock, flags);
                                        /* from add_wait_queue() */
                                        ...
                                        if (!input_available_p(tty, 0)) {
                                        /* Memory operations issued after the
                                           RELEASE may be completed before the
                                           RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
if (waitqueue_active(&tty->read_wait))
                                        __add_wait_queue(q, wait);
                                        spin_unlock_irqrestore(&q->lock,flags);
                                        /* from add_wait_queue() */
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.

This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation).  Moreover, the resulting code is much simpler.

Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 7180ddd upstream.

The kernel may delay interrupts for a long time which can result in timers
being delayed. If this occurs the intel_pstate driver will crash with a
divide by zero error:

divide error: 0000 [riscvarchive#1] SMP
Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul
 crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 113 PID: 0 Comm: swapper/113 Tainted: G        W   --------------   3.10.0-229.1.2.el7.x86_64 riscvarchive#1
Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014
task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000
RIP: 0010:[<ffffffff814a9279>]  [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
RSP: 0018:ffff883fff4e3db8  EFLAGS: 00010206
RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d
RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001
R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5
R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246
FS:  0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071
 ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd
 ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100
Call Trace:
 <IRQ>

 [<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840
 [<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200
 [<ffffffff814a9100>] ? pid_param_set+0x130/0x130
 [<ffffffff8107df56>] call_timer_fn+0x36/0x110
 [<ffffffff814a9100>] ? pid_param_set+0x130/0x130
 [<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320
 [<ffffffff81077b2f>] __do_softirq+0xef/0x280
 [<ffffffff816156dc>] call_softirq+0x1c/0x30
 [<ffffffff81015d95>] do_softirq+0x65/0xa0
 [<ffffffff81077ec5>] irq_exit+0x115/0x120
 [<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60
 [<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80
 <EOI>

 [<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0
 [<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0
 [<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200
 [<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30
 [<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290
 [<ffffffff8104228a>] start_secondary+0x1ba/0x230
Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29
RIP  [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
 RSP <ffff883fff4e3db8>

The kernel values for cpudata for CPU 113 were:

struct cpudata {
  cpu = 113,
  timer = {
    entry = {
      next = 0x0,
      prev = 0xdead000000200200
    },
    expires = 8357799745,
    base = 0xffff883fe84ec001,
    function = 0xffffffff814a9100 <intel_pstate_timer_func>,
    data = 18446612406765768960,
<snip>
    i_gain = 0,
    d_gain = 0,
    deadband = 0,
    last_err = 22489
  },
  last_sample_time = {
    tv64 = 4063132438017305
  },
  prev_aperf = 287326796397463,
  prev_mperf = 251427432090198,
  sample = {
    core_pct_busy = 23081,
    aperf = 2937407,
    mperf = 3257884,
    freq = 2524484,
    time = {
      tv64 = 4063149215234118
    }
  }
}

which results in the time between samples = last_sample_time - sample.time
= 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds.

The duration between reads of the APERF and MPERF registers overflowed a s32
sized integer in intel_pstate_get_scaled_busy()'s call to div_fp().  The result
is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.

While the kernel shouldn't be delaying for a long time, it can and does
happen and the intel_pstate driver should not panic in this situation.  This
patch changes the div_fp() function to use div64_s64() to allow for "long"
division.  This will avoid the overflow condition on long delays.

[v2]: use div64_s64() in div_fp()

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
[ Upstream commit e674016 ]

Since commit 2b018d5 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
following oops:

[  570.140800] BUG: unable to handle kernel NULL pointer dereference at 00000000000004e0
[  570.142931] IP: [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
[  570.144601] Oops: 0000 [riscvarchive#1] SMP
[  570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
[  570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 riscvarchive#1
[  570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[  570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti: ffff880036b60000
[  570.144601] RIP: 0010:[<ffffffffa018c701>]  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601] RSP: 0018:ffff880036b63e08  EFLAGS: 00010202
[  570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 0000000000000206
[  570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: ffff88003d30dd20
[  570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 0000000000000000
[  570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: ffff8800387ec780
[  570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: ffff8800387ec7b0
[  570.144601] FS:  00007f5672f48700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[  570.144601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 00000000000406a0
[  570.144601] Stack:
[  570.144601]  ffffffffa018f240 ffff8800387ec780 ffffffffa018f240 ffff8800387ec7b0
[  570.144601]  ffff880036b63e48 ffffffff812caabe ffff880039e4e000 0000000000000008
[  570.144601]  ffff880036b63e58 ffffffff812cabad ffff880036b63ea8 ffffffff811347f5
[  570.144601] Call Trace:
[  570.144601]  [<ffffffff812caabe>] sock_release+0x1a/0x75
[  570.144601]  [<ffffffff812cabad>] sock_close+0xd/0x11
[  570.144601]  [<ffffffff811347f5>] __fput+0xff/0x1a5
[  570.144601]  [<ffffffff811348cb>] ____fput+0x9/0xb
[  570.144601]  [<ffffffff81056682>] task_work_run+0x66/0x90
[  570.144601]  [<ffffffff8100189e>] prepare_exit_to_usermode+0x8c/0xa7
[  570.144601]  [<ffffffff81001a26>] syscall_return_slowpath+0x16d/0x19b
[  570.144601]  [<ffffffff813babb1>] int_ret_from_sys_call+0x25/0x9f
[  570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 <48> 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00
[  570.144601] RIP  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601]  RSP <ffff880036b63e08>
[  570.144601] CR2: 00000000000004e0
[  570.200518] ---[ end trace 46956baf17349563 ]---

pppoe_flush_dev() has no reason to override sk->sk_state with
PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
PPPOX_DEAD, which is the correct state given that sk is unbound and
po->pppoe_dev is NULL.

Fixes: 2b018d5 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Tested-by: Oleksii Berezhniak <core@irc.lg.ua>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
[ Upstream commit 93d08b6 ]

When sockets have a native eBPF program attached through
setsockopt(sk, SOL_SOCKET, SO_ATTACH_BPF, ...), and then try to
dump these over getsockopt(sk, SOL_SOCKET, SO_GET_FILTER, ...),
the following panic appears:

  [49904.178642] BUG: unable to handle kernel NULL pointer dereference at (null)
  [49904.178762] IP: [<ffffffff81610fd9>] sk_get_filter+0x39/0x90
  [49904.182000] PGD 86fc9067 PUD 531a1067 PMD 0
  [49904.185196] Oops: 0000 [riscvarchive#1] SMP
  [...]
  [49904.224677] Call Trace:
  [49904.226090]  [<ffffffff815e3d49>] sock_getsockopt+0x319/0x740
  [49904.227535]  [<ffffffff812f59e3>] ? sock_has_perm+0x63/0x70
  [49904.228953]  [<ffffffff815e2fc8>] ? release_sock+0x108/0x150
  [49904.230380]  [<ffffffff812f5a43>] ? selinux_socket_getsockopt+0x23/0x30
  [49904.231788]  [<ffffffff815dff36>] SyS_getsockopt+0xa6/0xc0
  [49904.233267]  [<ffffffff8171b9ae>] entry_SYSCALL_64_fastpath+0x12/0x71

The underlying issue is the very same as in commit b382c08
("sock, diag: fix panic in sock_diag_put_filterinfo"), that is,
native eBPF programs don't store an original program since this
is only needed in cBPF ones.

However, sk_get_filter() wasn't updated to test for this at the
time when eBPF could be attached. Just throw an error to the user
to indicate that eBPF cannot be dumped over this interface.
That way, it can also be known that a program _is_ attached (as
opposed to just return 0), and a different (future) method needs
to be consulted for a dump.

Fixes: 89aa075 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 874bbfe upstream.

My system keeps crashing with below message. vmstat_update() schedules a delayed
work in current cpu and expects the work runs in the cpu.
schedule_delayed_work() is expected to make delayed work run in local cpu. The
problem is timer can be migrated with NO_HZ. __queue_work() queues work in
timer handler, which could run in a different cpu other than where the delayed
work is scheduled. The end result is the delayed work runs in different cpu.
The patch makes __queue_delayed_work records local cpu earlier. Where the timer
runs doesn't change where the work runs with the change.

[   28.010131] ------------[ cut here ]------------
[   28.010609] kernel BUG at ../mm/vmstat.c:1392!
[   28.011099] invalid opcode: 0000 [riscvarchive#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   28.011860] Modules linked in:
[   28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G        W4.3.0-rc3+ #634
[   28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[   28.014160] Workqueue: events vmstat_update
[   28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000
[   28.015445] RIP: 0010:[<ffffffff8115f921>]  [<ffffffff8115f921>]vmstat_update+0x31/0x80
[   28.016282] RSP: 0018:ffff8800ba42fd80  EFLAGS: 00010297
[   28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX:0000000000000000
[   28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI:ffffffff81f4df8d
[   28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09:0000000000000000
[   28.019169] R10: 0000000000000000 R11: 0000000000000121 R12:ffff8800baa9f640
[   28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15:0000000000000000
[   28.020071] FS:  0000000000000000(0000) GS:ffff88011a800000(0000)knlGS:0000000000000000
[   28.020071] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4:00000000000006f0
[   28.020071] Stack:
[   28.020071]  ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00ffffffff8106bd88
[   28.020071]  ffffffff8106bd0b 0000000000000096 0000000000000000ffffffff82f9b1e8
[   28.020071]  ffffffff829f0b10 0000000000000000 ffffffff81f18460ffff88011a81e340
[   28.020071] Call Trace:
[   28.020071]  [<ffffffff8106bd88>] process_one_work+0x1c8/0x540
[   28.020071]  [<ffffffff8106bd0b>] ? process_one_work+0x14b/0x540
[   28.020071]  [<ffffffff8106c214>] worker_thread+0x114/0x460
[   28.020071]  [<ffffffff8106c100>] ? process_one_work+0x540/0x540
[   28.020071]  [<ffffffff81071bf8>] kthread+0xf8/0x110
[   28.020071]  [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200
[   28.020071]  [<ffffffff81a6522f>] ret_from_fork+0x3f/0x70
[   28.020071]  [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 8832317 upstream.

Currently we do not validate rtas.entry before calling enter_rtas(). This
leads to a kernel oops when user space calls rtas system call on a powernv
platform (see below). This patch adds code to validate rtas.entry before
making enter_rtas() call.

  Oops: Exception in kernel mode, sig: 4 [riscvarchive#1]
  SMP NR_CPUS=1024 NUMA PowerNV
  task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
  NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
  REGS: c0000007e1a7b920 TRAP: 0e40   Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
  MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
  CFAR: c000000000009c0c SOFTE: 0
  NIP [0000000000000000]           (null)
  LR [0000000000009c14] 0x9c14
  Call Trace:
  [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
  [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
  [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98

Fixes: 55190f8 ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
[ Upstream commit 74e98eb ]

There was no verification that an underlying transport exists when creating
a connection, this would cause dereferencing a NULL ptr.

It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:

[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [riscvarchive#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
[135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
[135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
[135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
[135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
[135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
[135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
[135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
[135546.064723] Stack:
[135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
[135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
[135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit facf47e upstream.

For imx27, it needs three clocks to let the controller work,
the old code is wrong, and usbmisc has not included clock handling
code any more. Without this patch, it will cause below data
abort when accessing usbmisc registers.

usbcore: registered new interface driver usb-storage
Unhandled fault: external abort on non-linefetch (0x008) at 0xf4424600
pgd = c0004000
[f4424600] *pgd=10000452(bad)
Internal error: : 8 [riscvarchive#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.1.0-next-20150701-dirty #3089
Hardware name: Freescale i.MX27 (Device Tree Support)
task: c7832b60 ti: c783e000 task.ti: c783e000
PC is at usbmisc_imx27_init+0x4c/0xbc
LR is at usbmisc_imx27_init+0x40/0xbc
pc : [<c03cb5c0>]    lr : [<c03cb5b4>]    psr: 60000093
sp : c783fe08  ip : 00000000  fp : 00000000
r10: c0576434  r9 : 0000009c  r8 : c7a773a0
r7 : 01000000  r6 : 60000013  r5 : c7a776f0  r4 : c7a773f0
r3 : f4424600  r2 : 00000000  r1 : 00000001  r0 : 00000001
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: a0004000  DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc783e190)
Stack: (0xc783fe08 to 0xc7840000)

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Reported-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 519ee69 upstream.

In case of one shot NOA the interval can be 0, catch that
instead of potentially (depending on the driver) crashing
like this:

divide error: 0000 [riscvarchive#1] SMP
[...]
Call Trace:
<IRQ>
[<ffffffffc08e891c>] ieee80211_extend_absent_time+0x6c/0xb0 [mac80211]
[<ffffffffc08e8a17>] ieee80211_update_p2p_noa+0xb7/0xe0 [mac80211]
[<ffffffffc069cc30>] ath9k_p2p_ps_timer+0x170/0x190 [ath9k]
[<ffffffffc070adf8>] ath_gen_timer_isr+0xc8/0xf0 [ath9k_hw]
[<ffffffffc0691156>] ath9k_tasklet+0x296/0x2f0 [ath9k]
[<ffffffff8107ad65>] tasklet_action+0xe5/0xf0
[...]

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
jameyhicks pushed a commit to cambridgehackers/riscv-linux-4.1.y that referenced this issue Jan 14, 2016
commit 19cd80a upstream.

It is not permitted to set task state before lock. usblp_wwait sets
the state to TASK_INTERRUPTIBLE and calls mutex_lock_interruptible.
Upon return from that function, the state will be TASK_RUNNING again.

This is clearly a bug and a warning is generated with LOCKDEP too:
WARNING: CPU: 1 PID: 5109 at kernel/sched/core.c:7404 __might_sleep+0x7d/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffffa0c588d0>] usblp_wwait+0xa0/0x310 [usblp]
Modules linked in: ...
CPU: 1 PID: 5109 Comm: captmon Tainted: G        W       4.2.5-0.gef2823b-default riscvarchive#1
Hardware name: LENOVO 23252SG/23252SG, BIOS G2ET33WW (1.13 ) 07/24/2012
 ffffffff81a4edce ffff880236ec7ba8 ffffffff81716651 0000000000000000
 ffff880236ec7bf8 ffff880236ec7be8 ffffffff8106e146 0000000000000282
 ffffffff81a50119 000000000000028b 0000000000000000 ffff8802dab7c508
Call Trace:
...
 [<ffffffff8106e1c6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff8109a8bd>] __might_sleep+0x7d/0x90
 [<ffffffff8171b20f>] mutex_lock_interruptible_nested+0x2f/0x4b0
 [<ffffffffa0c588fc>] usblp_wwait+0xcc/0x310 [usblp]
 [<ffffffffa0c58bb2>] usblp_write+0x72/0x350 [usblp]
 [<ffffffff8121ed98>] __vfs_write+0x28/0xf0
...

Commit 7f47735 (usblp: Implement the
ENOSPC convention) moved the set prior locking. So move it back after
the lock.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Fixes: 7f47735 ("usblp: Implement the ENOSPC convention")
Acked-By: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Mar 26, 2019
The array str[] should have six elements.

  =================================================================
  ==4322==ERROR: AddressSanitizer: global-buffer-overflow on address 0x56463844e300 at pc 0x564637e7ad0d bp 0x7f30c8c89d10 sp 0x7f30c8c89d00
  READ of size 8 at 0x56463844e300 thread T9
      #0 0x564637e7ad0c in __ordered_events__flush util/ordered-events.c:316
      riscvarchive#1 0x564637e7b0e4 in ordered_events__flush util/ordered-events.c:338
      riscvarchive#2 0x564637c6a57d in process_thread /home/changbin/work/linux/tools/perf/builtin-top.c:1073
      riscvarchive#3 0x7f30d173a163 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x8163)
      riscvarchive#4 0x7f30cfffbdee in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11adee)

  0x56463844e300 is located 32 bytes to the left of global variable 'flags' defined in 'util/trace-event-parse.c:229:26' (0x56463844e320) of size 192
  0x56463844e300 is located 0 bytes to the right of global variable 'str' defined in 'util/ordered-events.c:268:28' (0x56463844e2e0) of size 32
  SUMMARY: AddressSanitizer: global-buffer-overflow util/ordered-events.c:316 in __ordered_events__flush
  Shadow bytes around the buggy address:
    0x0ac947081c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c50: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 00 00 00 00
  =>0x0ac947081c60:[f9]f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c70: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9
    0x0ac947081c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081c90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081ca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ac947081cb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
  Thread T9 created by T0 here:
      #0 0x7f30d179de5f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x4ae5f)
      riscvarchive#1 0x564637c6b954 in __cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1253
      riscvarchive#2 0x564637c7173c in cmd_top /home/changbin/work/linux/tools/perf/builtin-top.c:1642
      riscvarchive#3 0x564637d85038 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      riscvarchive#4 0x564637d85577 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      riscvarchive#5 0x564637d8597b in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      riscvarchive#6 0x564637d860e9 in main /home/changbin/work/linux/tools/perf/perf.c:520
      riscvarchive#7 0x7f30cff0509a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Fixes: 16c66bc ("perf top: Add processing thread")
Fixes: 68ca5d0 ("perf ordered_events: Add ordered_events__flush_time interface")
Link: http://lkml.kernel.org/r/20190316080556.3075-13-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Mar 26, 2019
Using gcc's ASan, Changbin reports:

  =================================================================
  ==7494==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 48 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      riscvarchive#1 0x5625e5330a5e in zalloc util/util.h:23
      riscvarchive#2 0x5625e5330a9b in perf_counts__new util/counts.c:10
      riscvarchive#3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      riscvarchive#4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      riscvarchive#5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      riscvarchive#6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      riscvarchive#7 0x5625e51528e6 in run_test tests/builtin-test.c:358
      riscvarchive#8 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      riscvarchive#9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      riscvarchive#10 0x5625e515572f in cmd_test tests/builtin-test.c:722
      riscvarchive#11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      riscvarchive#12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      riscvarchive#13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      riscvarchive#14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      riscvarchive#15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 72 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      riscvarchive#1 0x5625e532560d in zalloc util/util.h:23
      riscvarchive#2 0x5625e532566b in xyarray__new util/xyarray.c:10
      riscvarchive#3 0x5625e5330aba in perf_counts__new util/counts.c:15
      riscvarchive#4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      riscvarchive#5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      riscvarchive#6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      riscvarchive#7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      riscvarchive#8 0x5625e51528e6 in run_test tests/builtin-test.c:358
      riscvarchive#9 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      riscvarchive#10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      riscvarchive#11 0x5625e515572f in cmd_test tests/builtin-test.c:722
      riscvarchive#12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      riscvarchive#13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      riscvarchive#14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      riscvarchive#15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      riscvarchive#16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

His patch took care of evsel->prev_raw_counts, but the above backtraces
are about evsel->counts, so fix that instead.

Reported-by: Changbin Du <changbin.du@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/n/tip-hd1x13g59f0nuhe4anxhsmfp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Mar 26, 2019
…_event_on_all_cpus test

  =================================================================
  ==7497==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      riscvarchive#1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45
      riscvarchive#2 0x5625e5326703 in cpu_map__read util/cpumap.c:103
      riscvarchive#3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120
      riscvarchive#4 0x5625e5326915 in cpu_map__new util/cpumap.c:135
      riscvarchive#5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36
      riscvarchive#6 0x5625e51528e6 in run_test tests/builtin-test.c:358
      riscvarchive#7 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      riscvarchive#8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      riscvarchive#9 0x5625e515572f in cmd_test tests/builtin-test.c:722
      riscvarchive#10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      riscvarchive#11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      riscvarchive#12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      riscvarchive#13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      riscvarchive#14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object")
Link: http://lkml.kernel.org/r/20190316080556.3075-15-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Mar 26, 2019
  =================================================================
  ==7506==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 13 byte(s) in 3 object(s) allocated from:
      #0 0x7f03339d6070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
      riscvarchive#1 0x5625e53aaef0 in expr__find_other util/expr.y:221
      riscvarchive#2 0x5625e51bcd3f in test__expr tests/expr.c:52
      riscvarchive#3 0x5625e51528e6 in run_test tests/builtin-test.c:358
      riscvarchive#4 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      riscvarchive#5 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      riscvarchive#6 0x5625e515572f in cmd_test tests/builtin-test.c:722
      riscvarchive#7 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      riscvarchive#8 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      riscvarchive#9 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      riscvarchive#10 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      riscvarchive#11 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 0751673 ("perf tools: Add a simple expression parser for JSON")
Link: http://lkml.kernel.org/r/20190316080556.3075-16-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Mar 26, 2019
  =================================================================
  ==20875==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 1160 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      riscvarchive#1 0x55bd50005599 in zalloc util/util.h:23
      riscvarchive#2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327
      riscvarchive#3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216
      riscvarchive#4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69
      riscvarchive#5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358
      riscvarchive#6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388
      riscvarchive#7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583
      riscvarchive#8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722
      riscvarchive#9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      riscvarchive#10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      riscvarchive#11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      riscvarchive#12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520
      riscvarchive#13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 19 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      riscvarchive#1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields")
Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Mar 26, 2019
MIXER on Exynos5 SoCs uses different synchronisation method than Exynos4
to update internal state (shadow registers).
Apparently the driver implements it incorrectly. The rule should be
as follows:
- do not request updating registers until previous request was finished,
  ie. MXR_CFG_LAYER_UPDATE_COUNT must be 0.
- before setting registers synchronisation on VSYNC should be turned off,
  ie. MXR_STATUS_SYNC_ENABLE should be reset,
- after finishing MXR_STATUS_SYNC_ENABLE should be set again.
The patch hopefully implements it correctly.
Below sample kernel log from page fault caused by the bug:

[   25.670038] exynos-sysmmu 14650000.sysmmu: 14450000.mixer: PAGE FAULT occurred at 0x2247b800
[   25.677888] ------------[ cut here ]------------
[   25.682164] kernel BUG at ../drivers/iommu/exynos-iommu.c:450!
[   25.687971] Internal error: Oops - BUG: 0 [riscvarchive#1] PREEMPT SMP ARM
[   25.693778] Modules linked in:
[   25.696816] CPU: 5 PID: 1553 Comm: fb-release_test Not tainted 5.0.0-rc7-01157-g5f86b1566bdd riscvarchive#136
[   25.705646] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   25.711710] PC is at exynos_sysmmu_irq+0x1c0/0x264
[   25.716470] LR is at lock_is_held_type+0x44/0x64

v2: added missing MXR_CFG_LAYER_UPDATE bit setting in mixer_enable_sync

Reported-by: Marian Mihailescu <mihailescu2m@gmail.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Mar 26, 2019
This reverts commit b189e75.

Unable to handle kernel paging request at virtual address c8358000
pgd = efa405c3
[c8358000] *pgd=00000000
Internal error: Oops: 805 [riscvarchive#1] PREEMPT ARM
CPU: 0 PID: 711 Comm: kworker/0:2 Not tainted 4.20.0+ riscvarchive#30
Hardware name: Freescale i.MX27 (Device Tree Support)
Workqueue: events mxcmci_datawork
PC is at mxcmci_datawork+0xbc/0x2ac
LR is at mxcmci_datawork+0xac/0x2ac
pc : [<c04e33c8>]    lr : [<c04e33b8>]    psr: 60000013
sp : c6c93f08  ip : 24004180  fp : 00000008
r10: c8358000  r9 : c78b3e24  r8 : c6c92000
r7 : 00000000  r6 : c7bb8680  r5 : c7bb86d4  r4 : c78b3de0
r3 : 00002502  r2 : c090b2e0  r1 : 00000880  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: a68a8000  DAC: 00000055
Process kworker/0:2 (pid: 711, stack limit = 0x389543bc)
Stack: (0xc6c93f08 to 0xc6c94000)
3f00:                   c7bb86d4 00000000 00000000 c6cbfde0 c7bb86d4 c7ee4200
3f20: 00000000 c0907ea8 00000000 c7bb86d8 c0907ea8 c012077c c6cbfde0 c7bb86d4
3f40: c6cbfde0 c6c92000 c6cbfdf4 c09280ba c0907ea8 c090b2e0 c0907ebc c0120c18
3f60: c6cbfde0 00000000 00000000 c6cbb580 c7ba7c40 c7837edc c6cbb598 00000000
3f80: c6cbfde0 c01208f8 00000000 c01254fc c7ba7c40 c0125400 00000000 00000000
3fa0: 00000000 00000000 00000000 c01010d0 00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c04e33c8>] (mxcmci_datawork) from [<c012077c>] (process_one_work+0x1f0/0x338)
[<c012077c>] (process_one_work) from [<c0120c18>] (worker_thread+0x320/0x474)
[<c0120c18>] (worker_thread) from [<c01254fc>] (kthread+0xfc/0x118)
[<c01254fc>] (kthread) from [<c01010d0>] (ret_from_fork+0x14/0x24)
Exception stack(0xc6c93fb0 to 0xc6c93ff8)
3fa0:                                     00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Code: e3500000 1a000059 e5153050 e5933038 (e48a3004)
---[ end trace 54ca629b75f0e737 ]---
note: kworker/0:2[711] exited with preempt_count 1

Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Fixes: b189e75 ("mmc: mxcmmc: handle highmem pages")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
Parity page is incorrectly unmapped in finish_parity_scrub(), triggering
a reference counter bug on i386, i.e.:

 [ 157.662401] kernel BUG at mm/highmem.c:349!
 [ 157.666725] invalid opcode: 0000 [riscvarchive#1] SMP PTI

The reason is that kunmap(p_page) was completely left out, so we never
did an unmap for the p_page and the loop unmapping the rbio page was
iterating over the wrong number of stripes: unmapping should be done
with nr_data instead of rbio->real_stripes.

Test case to reproduce the bug:

 - create a raid5 btrfs filesystem:
   # mkfs.btrfs -m raid5 -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde

 - mount it:
   # mount /dev/sdb /mnt

 - run btrfs scrub in a loop:
   # while :; do btrfs scrub start -BR /mnt; done

BugLink: https://bugs.launchpad.net/bugs/1812845
Fixes: 5a6ac9e ("Btrfs, raid56: support parity scrub on raid56")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
Back in commit a89ca6f ("Btrfs: fix fsync after truncate when
no_holes feature is enabled") I added an assertion that is triggered when
an inline extent is found to assert that the length of the (uncompressed)
data the extent represents is the same as the i_size of the inode, since
that is true most of the time I couldn't find or didn't remembered about
any exception at that time. Later on the assertion was expanded twice to
deal with a case of a compressed inline extent representing a range that
matches the sector size followed by an expanding truncate, and another
case where fallocate can update the i_size of the inode without adding
or updating existing extents (if the fallocate range falls entirely within
the first block of the file). These two expansion/fixes of the assertion
were done by commit 7ed586d ("Btrfs: fix assertion on fsync of
regular file when using no-holes feature") and commit 6399fb5
("Btrfs: fix assertion failure during fsync in no-holes mode").
These however missed the case where an falloc expands the i_size of an
inode to exactly the sector size and inline extent exists, for example:

 $ mkfs.btrfs -f -O no-holes /dev/sdc
 $ mount /dev/sdc /mnt

 $ xfs_io -f -c "pwrite -S 0xab 0 1096" /mnt/foobar
 wrote 1096/1096 bytes at offset 0
 1 KiB, 1 ops; 0.0002 sec (4.448 MiB/sec and 4255.3191 ops/sec)

 $ xfs_io -c "falloc 1096 3000" /mnt/foobar
 $ xfs_io -c "fsync" /mnt/foobar
 Segmentation fault

 $ dmesg
 [701253.602385] assertion failed: len == i_size || (len == fs_info->sectorsize && btrfs_file_extent_compression(leaf, extent) != BTRFS_COMPRESS_NONE) || (len < i_size && i_size < fs_info->sectorsize), file: fs/btrfs/tree-log.c, line: 4727
 [701253.602962] ------------[ cut here ]------------
 [701253.603224] kernel BUG at fs/btrfs/ctree.h:3533!
 [701253.603503] invalid opcode: 0000 [riscvarchive#1] SMP DEBUG_PAGEALLOC PTI
 [701253.603774] CPU: 2 PID: 7192 Comm: xfs_io Tainted: G        W         5.0.0-rc8-btrfs-next-45 riscvarchive#1
 [701253.604054] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
 [701253.604650] RIP: 0010:assfail.constprop.23+0x18/0x1a [btrfs]
 (...)
 [701253.605591] RSP: 0018:ffffbb48c186bc48 EFLAGS: 00010286
 [701253.605914] RAX: 00000000000000de RBX: ffff921d0a7afc08 RCX: 0000000000000000
 [701253.606244] RDX: 0000000000000000 RSI: ffff921d36b16868 RDI: ffff921d36b16868
 [701253.606580] RBP: ffffbb48c186bcf0 R08: 0000000000000000 R09: 0000000000000000
 [701253.606913] R10: 0000000000000003 R11: 0000000000000000 R12: ffff921d05d2de18
 [701253.607247] R13: ffff921d03b54000 R14: 0000000000000448 R15: ffff921d059ecf80
 [701253.607769] FS:  00007f14da906700(0000) GS:ffff921d36b00000(0000) knlGS:0000000000000000
 [701253.608163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [701253.608516] CR2: 000056087ea9f278 CR3: 00000002268e8001 CR4: 00000000003606e0
 [701253.608880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [701253.609250] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [701253.609608] Call Trace:
 [701253.609994]  btrfs_log_inode+0xdfb/0xe40 [btrfs]
 [701253.610383]  btrfs_log_inode_parent+0x2be/0xa60 [btrfs]
 [701253.610770]  ? do_raw_spin_unlock+0x49/0xc0
 [701253.611150]  btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
 [701253.611537]  btrfs_sync_file+0x3b2/0x440 [btrfs]
 [701253.612010]  ? do_sysinfo+0xb0/0xf0
 [701253.612552]  do_fsync+0x38/0x60
 [701253.612988]  __x64_sys_fsync+0x10/0x20
 [701253.613360]  do_syscall_64+0x60/0x1b0
 [701253.613733]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 [701253.614103] RIP: 0033:0x7f14da4e66d0
 (...)
 [701253.615250] RSP: 002b:00007fffa670fdb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
 [701253.615647] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f14da4e66d0
 [701253.616047] RDX: 000056087ea9c260 RSI: 000056087ea9c260 RDI: 0000000000000003
 [701253.616450] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000010
 [701253.616854] R10: 000000000000009b R11: 0000000000000246 R12: 000056087ea9c260
 [701253.617257] R13: 000056087ea9c240 R14: 0000000000000000 R15: 000056087ea9dd10
 (...)
 [701253.619941] ---[ end trace e088d74f132b6da5 ]---

Updating the assertion again to allow for this particular case would result
in a meaningless assertion, plus there is currently no risk of logging
content that would result in any corruption after a log replay if the size
of the data encoded in an inline extent is greater than the inode's i_size
(which is not currently possibe either with or without compression),
therefore just remove the assertion.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall action csum icmp pass index 90
 # tc actions replace action csum icmp goto chain 42 index 90 \
 > cookie c1a0c1a0
 # tc actions show action csum

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

        action order 0: csum (icmp) action goto chain 42
        index 90 ref 2 bind 1
        cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 8000000074692067 P4D 8000000074692067 PUD 2e210067 PMD 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff93153da03be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9314ee40f700 RCX: 0000000000003a00
 RDX: 0000000000000000 RSI: ffff931537c87828 RDI: ffff931537c87818
 RBP: ffff93153da03c80 R08: 00000000527cffff R09: 0000000000000003
 R10: 000000000000003f R11: 0000000000000028 R12: ffff9314edf68400
 R13: ffff9314edf68408 R14: 0000000000000001 R15: ffff9314ed67b600
 FS:  0000000000000000(0000) GS:ffff93153da00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 0000000073e32003 CR4: 00000000001606f0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 66 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffffff9a803e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffff99e184f0 RBX: 0000000000000000 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 000eb5c4572376b3 R09: 0000000000000000
 R10: ffffa53e806a3ca0 R11: 00000000000f4240 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_kernel+0x49e/0x4be
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_csum veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel mbcache snd_hda_codec jbd2 snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt net_failover fb_sys_fops virtio_console virtio_blk ttm failover drm ata_piix crc32c_intel floppy virtio_pci serio_raw libata virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_csum_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action gact pass index 90
 # tc actions replace action gact \
 > goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action gact

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: gact action goto chain 42
          random type none pass val 0
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff8c2434703be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff8c23ed6d7e00 RCX: 000000000000005a
 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c23ed6d7e00
 RBP: ffff8c2434703c80 R08: ffff8c243b639ac8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2429e68b00
 R13: ffff8c2429e68b08 R14: 0000000000000001 R15: ffff8c2429c5a480
 FS:  0000000000000000(0000) GS:ffff8c2434700000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000002dc0e005 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 74 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffff9c8640387eb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffff8b2184f0 RBX: 0000000000000002 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000002
 RBP: 0000000000000002 R08: 000eb57882b36cc3 R09: 0000000000000020
 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_gact act_bpf veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core mbcache jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper virtio_balloon joydev pcspkr snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea virtio_net sysfillrect net_failover virtio_blk sysimgblt fb_sys_fops virtio_console ttm failover drm crc32c_intel serio_raw ata_piix libata floppy virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_bpf]
 CR2: 0000000000000000

Validating the control action within tcf_gact_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action ife encode allow mark pass index 90
 # tc actions replace action ife \
 > encode allow mark goto chain 42 index 90 cookie c1a0c1a0
 # tc action show action ife

had the following output:

 IFE type 0xED3E
 IFE type 0xED3E
 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: ife encode action goto chain 42 type 0XED3E
         allow mark
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000007b4e7067 P4D 800000007b4e7067 PUD 7b4e6067 PMD 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 2 PID: 164 Comm: kworker/2:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffa6a7c0553ad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9796ee1bbd00 RCX: 0000000000000001
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffa6a7c0553b70 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffff9797385bb038 R12: ffff9796ead9d700
 R13: ffff9796ead9d708 R14: 0000000000000001 R15: ffff9796ead9d800
 FS:  0000000000000000(0000) GS:ffff97973db00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007c41e006 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_gact act_meta_mark act_ife dummy veth ip6table_filter ip6_tables iptable_filter binfmt_misc snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hwdep snd_hda_core ghash_clmulni_intel snd_seq snd_seq_device snd_pcm snd_timer aesni_intel crypto_simd snd cryptd glue_helper virtio_balloon joydev pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl virtio_net drm_kms_helper virtio_blk net_failover syscopyarea failover sysfillrect virtio_console sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw ata_piix virtio_pci virtio_ring libata virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_ife]
 CR2: 0000000000000000

Validating the control action within tcf_ife_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action mirred ingress mirror dev lo pass
 # tc actions replace action mirred \
 > ingress mirror dev lo goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action mirred

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: mirred (Ingress Mirror to device lo) goto chain 42
         index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 Mirror/redirect action on
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 3 PID: 47 Comm: kworker/3:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffa772404b7ad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9c5afc3f4300 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9c5afdba9380 RDI: 0000000000029380
 RBP: ffffa772404b7b70 R08: ffff9c5af7010028 R09: ffff9c5af7010029
 R10: 0000000000000000 R11: ffff9c5af94c6a38 R12: ffff9c5af7953000
 R13: ffff9c5af7953008 R14: 0000000000000001 R15: ffff9c5af7953d00
 FS:  0000000000000000(0000) GS:ffff9c5afdb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007c514004 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_mirred veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel snd_hda_codec mbcache ghash_clmulni_intel jbd2 snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer snd crypto_simd cryptd glue_helper soundcore virtio_balloon joydev pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net ttm virtio_blk net_failover virtio_console failover drm ata_piix crc32c_intel virtio_pci serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_mirred_init() proved to fix the
above issue. For the same reason, postpone the assignment of tcfa_action
and tcfm_eaction to avoid partial reconfiguration of a mirred rule when
it's replaced by another one that mirrors to a device that does not
exist. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action connmark pass index 90
 # tc actions replace action connmark \
 > goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action connmark

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: connmark zone 0 goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
 RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
 R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
 R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
 FS:  0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action nat ingress 1.18.1.1 1.18.2.2 pass index 90
 # tc actions replace action nat \
 > ingress 1.18.1.1 1.18.2.2 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action nat

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0:  nat ingress 1.18.1.1/32 1.18.2.2 goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000002d180067 P4D 800000002d180067 PUD 7cb8b067 PMD 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 3 PID: 164 Comm: kworker/3:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffae4500e2fad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9fa52e28c800 RCX: 0000000001011201
 RDX: 0000000000000000 RSI: 0000000000000056 RDI: ffff9fa52ca12800
 RBP: ffffae4500e2fb70 R08: 0000000000000022 R09: 000000000000000e
 R10: 00000000ffffffff R11: 0000000001011201 R12: ffff9fa52cbc9c00
 R13: ffff9fa52cbc9c08 R14: 0000000000000001 R15: ffff9fa52ca12780
 FS:  0000000000000000(0000) GS:ffff9fa57db80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 0000000073f8c004 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_nat veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs qxl ata_generic pata_acpi drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk net_failover failover virtio_console drm crc32c_intel floppy ata_piix libata virtio_pci virtio_ring virtio serio_raw dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_nat_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc filter add dev crash0 egress matchall \
 > action pedit ex munge ip ttl set 10 pass index 90
 # tc actions replace action pedit \
 > ex munge ip ttl set 10 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action pedit

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0:  pedit action goto chain 42 keys 1
          index 90 ref 2 bind 1
          key #0  at ipv4+8: val 0a000000 mask 00ffffff
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff94a73db03be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff94a6ee4c0700 RCX: 000000000000000a
 RDX: 0000000000000000 RSI: ffff94a6ed22c800 RDI: 0000000000000000
 RBP: ffff94a73db03c80 R08: ffff94a7386fa4c8 R09: ffff94a73229ea20
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff94a6ed22cb00
 R13: ffff94a6ed22cb08 R14: 0000000000000001 R15: ffff94a6ed22c800
 FS:  0000000000000000(0000) GS:ffff94a73db00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007120e002 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 4e ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffab1740387eb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffffb18184f0 RBX: 0000000000000002 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000002
 RBP: 0000000000000002 R08: 000f168fa695f9a9 R09: 0000000000000020
 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_pedit veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep aesni_intel snd_hda_core crypto_simd snd_seq cryptd glue_helper snd_seq_device snd_pcm joydev snd_timer pcspkr virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs qxl ata_generic pata_acpi drm_kms_helper virtio_net net_failover syscopyarea sysfillrect sysimgblt failover virtio_blk fb_sys_fops virtio_console ttm drm crc32c_intel serio_raw ata_piix virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_pedit_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action police rate 3mbit burst 250k pass index 90
 # tc actions replace action police \
 > rate 3mbit burst 250k goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action police rate 3mbit burst

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0:  police 0x5a rate 3Mbit burst 250Kb mtu 2Kb  action goto chain 42 overhead 0b
         ref 2 bind 1
         cookie c1a0c1a0

Then, when crash0 starts transmitting more than 3Mbit/s, the following
kernel crash is observed:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000007a779067 P4D 800000007a779067 PUD 2ad96067 PMD 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 3 PID: 5032 Comm: netperf Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffb0e04064fa60 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff93bb3322cce0 RCX: 0000000000000005
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff93bb3322cce0
 RBP: ffffb0e04064fb00 R08: 0000000000000022 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: ffff93bb3beed300
 R13: ffff93bb3beed308 R14: 0000000000000001 R15: ffff93bb3b64d000
 FS:  00007f0bc6be5740(0000) GS:ffff93bb3db80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000746a8001 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ipt_do_table+0x31c/0x420 [ip_tables]
  ? ip_finish_output2+0x16f/0x430
  ip_finish_output2+0x16f/0x430
  ? ip_output+0x69/0xe0
  ip_output+0x69/0xe0
  ? ip_forward_options+0x1a0/0x1a0
  __tcp_transmit_skb+0x563/0xa40
  tcp_write_xmit+0x243/0xfa0
  __tcp_push_pending_frames+0x32/0xf0
  tcp_sendmsg_locked+0x404/0xd30
  tcp_sendmsg+0x27/0x40
  sock_sendmsg+0x36/0x40
  __sys_sendto+0x10e/0x140
  ? __sys_connect+0x87/0xf0
  ? syscall_trace_enter+0x1df/0x2e0
  ? __audit_syscall_exit+0x216/0x260
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f0bc5ffbafd
 Code: 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 8b 05 ae c4 2c 00 85 c0 75 2d 45 31 c9 45 31 c0 4c 63 d1 48 63 ff b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 63 63 2c 00 f7 d8 64 89 02 48
 RSP: 002b:00007fffef94b7f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f0bc5ffbafd
 RDX: 0000000000004000 RSI: 00000000017e5420 RDI: 0000000000000004
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
 R13: 00000000017e51d0 R14: 0000000000000010 R15: 0000000000000006
 Modules linked in: act_police veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_blk virtio_net virtio_console net_failover failover crc32c_intel ata_piix libata serio_raw virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_police_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action sample rate 1024 group 4 pass index 90
 # tc actions replace action sample \
 > rate 1024 group 4 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action sample

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: sample rate 1/1024 group 4 goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 8000000079966067 P4D 8000000079966067 PUD 7987b067 PMD 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffbee60033fad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff99d7ae6e3b00 RCX: 00000000e555df9b
 RDX: 0000000000000000 RSI: 00000000b0352718 RDI: ffff99d7fda1fcf0
 RBP: ffffbee60033fb70 R08: 0000000070731ab1 R09: 0000000000000400
 R10: 0000000000000000 R11: ffff99d7ac733838 R12: ffff99d7f3c2be00
 R13: ffff99d7f3c2be08 R14: 0000000000000001 R15: ffff99d7f3c2b600
 FS:  0000000000000000(0000) GS:ffff99d7fda00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000797de006 CR4: 00000000001606f0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_sample psample veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd snd_pcm cryptd glue_helper snd_timer joydev snd pcspkr virtio_balloon i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover ttm failover virtio_blk virtio_console drm ata_piix serio_raw crc32c_intel libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_sample_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action simple sdata hello pass index 90
 # tc actions replace action simple \
 > sdata world goto chain 42 index 90 cookie c1a0c1a0
 # tc action show action simple

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: Simple <world>
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000006a6fb067 P4D 800000006a6fb067 PUD 6aed6067 PMD 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 2 PID: 3241 Comm: kworker/2:0 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffbe6781763ad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9e59bdb80e00 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9e59b4716738 RDI: ffff9e59ab12d140
 RBP: ffffbe6781763b70 R08: 0000000000000234 R09: 0000000000aaaaaa
 R10: 0000000000000000 R11: ffff9e59b247cd50 R12: ffff9e59b112f100
 R13: ffff9e59b112f108 R14: 0000000000000001 R15: ffff9e59ab12d0c0
 FS:  0000000000000000(0000) GS:ffff9e59b4700000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000006af92004 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_simple veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net ttm net_failover virtio_console virtio_blk failover drm crc32c_intel serio_raw floppy ata_piix libata virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_simple_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action skbedit ptype host pass index 90
 # tc actions replace action skbedit \
 > ptype host goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action skbedit

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: skbedit  ptype host goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 3 PID: 3467 Comm: kworker/3:3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffb50a81e1fad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9aa47ba4ea00 RCX: 0000000000000001
 RDX: 0000000000000000 RSI: ffff9aa469eeb3c0 RDI: ffff9aa47ba4ea00
 RBP: ffffb50a81e1fb70 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffff9aa47bce0638 R12: ffff9aa4793b0c00
 R13: ffff9aa4793b0c08 R14: 0000000000000001 R15: ffff9aa469eeb3c0
 FS:  0000000000000000(0000) GS:ffff9aa474780000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007360e005 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_skbedit veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev soundcore pcspkr virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net net_failover drm failover virtio_blk virtio_console ata_piix virtio_pci crc32c_intel serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_skbedit_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action skbmod set smac 00:c1:a0:c1:a0:00 pass index 90
 # tc actions replace action skbmod \
 > set smac 00:c1:a0:c1:a0:00 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action skbmod

had the following output:

 src MAC address <00:c1:a0:c1:a0:00>
 src MAC address <00:c1:a0:c1:a0:00>
 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: skbmod goto chain 42 set smac 00:c1:a0:c1:a0:00
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000002d5c7067 P4D 800000002d5c7067 PUD 77e16067 PMD 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff8987ffd83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff8987aeb68800 RCX: ffff8987fa263640
 RDX: 0000000000000000 RSI: ffff8987f51c8802 RDI: 00000000000000a0
 RBP: ffff8987ffd83c80 R08: ffff8987f939bac8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8987f5c77d00
 R13: ffff8987f5c77d08 R14: 0000000000000001 R15: ffff8987f0c29f00
 FS:  0000000000000000(0000) GS:ffff8987ffd80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007832c004 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 56 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa2a1c038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffffa94184f0 RBX: 0000000000000003 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003
 RBP: 0000000000000003 R08: 001123cfc2ba71ac R09: 0000000000000000
 R10: 0000000000000000 R11: 00000000000f4240 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_skbmod veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd cryptd glue_helper snd_pcm joydev pcspkr virtio_balloon snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover virtio_console ttm virtio_blk failover drm crc32c_intel serio_raw ata_piix virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_skbmod_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 \
 > nocsum id 1 pass index 90
 # tc actions replace action tunnel_key \
 > set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 nocsum id 1 \
 > goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action tunnel_key

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: tunnel_key  set
         src_ip 10.10.10.1
         dst_ip 20.20.2.0
         key_id 1
         dst_port 3128
         nocsum goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000002aba4067 P4D 800000002aba4067 PUD 795f9067 PMD 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff9346bdb83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9346bb795c00 RCX: 0000000000000002
 RDX: 0000000000000000 RSI: ffff93466c881700 RDI: 0000000000000246
 RBP: ffff9346bdb83c80 R08: ffff9346b3e1e0c8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9346b978f000
 R13: ffff9346b978f008 R14: 0000000000000001 R15: ffff93466dceeb40
 FS:  0000000000000000(0000) GS:ffff9346bdb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007a6c2002 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 55 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa48a8038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffffaa8184f0 RBX: 0000000000000003 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003
 RBP: 0000000000000003 R08: 0011251c6fcfac49 R09: ffff9346b995be00
 R10: ffffa48a805e7ce8 R11: 00000000024c38dd R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_tunnel_key veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel mbcache snd_hda_intel jbd2 snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer snd pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops ttm net_failover virtio_console virtio_blk failover drm serio_raw crc32c_intel ata_piix virtio_pci floppy virtio_ring libata virtio dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_tunnel_key_init() proved to fix
the above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action vlan pop pass index 90
 # tc actions replace action vlan \
 > pop goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action vlan

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: vlan  pop goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000007974f067 P4D 800000007974f067 PUD 79638067 PMD 0
 Oops: 0000 [riscvarchive#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff982dfdb83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff982dfc55db00 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff982df97099c0 RDI: ffff982dfc55db00
 RBP: ffff982dfdb83c80 R08: ffff982df983fec8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff982df5aacd00
 R13: ffff982df5aacd08 R14: 0000000000000001 R15: ffff982df97099c0
 FS:  0000000000000000(0000) GS:ffff982dfdb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000796d0005 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? enqueue_hrtimer+0x39/0x90
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 7b ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa4714038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffff840184f0 RBX: 0000000000000003 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000001e57d3f387
 RBP: 0000000000000003 R08: 001125d9ca39e1eb R09: 0000000000000000
 R10: 000000000000027d R11: 000000000009f400 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_vlan veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer virtio_balloon snd pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops virtio_blk ttm net_failover virtio_console failover ata_piix drm libata crc32c_intel virtio_pci serio_raw virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_vlan_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
A new mirred action is created by the tcf_mirred_init function. This
contains a list head struct which is inserted into a global list on
successful creation of a new action. However, after a creation, it is
still possible to error out and call the tcf_idr_release function. This,
in turn, calls the act_mirr cleanup function via __tcf_idr_release and
__tcf_action_put. This cleanup function tries to delete the list entry
which is as yet uninitialised, leading to a NULL pointer exception.

Fix this by initialising the list entry on creation of a new action.

Bug report:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000840c73067 P4D 8000000840c73067 PUD 858dcc067 PMD 0
Oops: 0002 [riscvarchive#1] SMP PTI
CPU: 32 PID: 5636 Comm: handler194 Tainted: G           OE     5.0.0+ #186
Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015
RIP: 0010:tcf_mirred_release+0x42/0xa7 [act_mirred]
Code: f0 90 39 c0 e8 52 04 57 c8 48 c7 c7 b8 80 39 c0 e8 94 fa d4 c7 48 8b 93 d0 00 00 00 48 8b 83 d8 00 00 00 48 c7 c7 f0 90 39 c0 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 d0 00
RSP: 0018:ffffac4aa059f688 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9dcd1b214d00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9dcd1fa165f8 RDI: ffffffffc03990f0
RBP: ffff9dccf9c7af80 R08: 0000000000000a3b R09: 0000000000000000
R10: ffff9dccfa11f420 R11: 0000000000000000 R12: 0000000000000001
R13: ffff9dcd16b433c0 R14: ffff9dcd1b214d80 R15: 0000000000000000
FS:  00007f441bfff700(0000) GS:ffff9dcd1fa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000839e64004 CR4: 00000000001606e0
Call Trace:
tcf_action_cleanup+0x59/0xca
__tcf_action_put+0x54/0x6b
__tcf_idr_release.cold.33+0x9/0x12
tcf_mirred_init.cold.20+0x22e/0x3b0 [act_mirred]
tcf_action_init_1+0x3d0/0x4c0
tcf_action_init+0x9c/0x130
tcf_exts_validate+0xab/0xc0
fl_change+0x1ca/0x982 [cls_flower]
tc_new_tfilter+0x647/0x8d0
? load_balance+0x14b/0x9e0
rtnetlink_rcv_msg+0xe3/0x370
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1d4/0x2b0
? rtnl_calcit.isra.31+0xf0/0xf0
netlink_rcv_skb+0x49/0x110
netlink_unicast+0x16f/0x210
netlink_sendmsg+0x1df/0x390
sock_sendmsg+0x36/0x40
___sys_sendmsg+0x27b/0x2c0
? futex_wake+0x80/0x140
? do_futex+0x2b9/0xac0
? ep_scan_ready_list.constprop.22+0x1f2/0x210
? ep_poll+0x7a/0x430
__sys_sendmsg+0x47/0x80
do_syscall_64+0x55/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 4e23281 ("net: sched: act_mirred: remove dependency on rtnl lock")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
Chandan reported that fstests' generic/026 test hit a crash:

  BUG: Unable to handle kernel data access at 0xc00000062ac40000
  Faulting instruction address: 0xc000000000092240
  Oops: Kernel access of bad area, sig: 11 [riscvarchive#1]
  LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
  CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-00001-g6de6dba64dda riscvarchive#1
  NIP:  c000000000092240 LR: c00000000066a55c CTR: 0000000000000000
  REGS: c00000062c0c3430 TRAP: 0300   Not tainted  (5.0.0-rc2-next-20190115-00001-g6de6dba64dda)
  MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000842  XER: 20000000
  CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: 40000000 IRQMASK: 0
  GPR00: 0000000000000000 c00000062c0c36c0 c0000000017f4c00 c00000000121a660
  GPR04: c00000062ac3fff9 0000000000000004 0000000000000020 00000000275b19c4
  GPR08: 000000000000000c 46494c4500000000 5347495f41434c5f c0000000026073a0
  GPR12: 0000000000000000 c0000000027a0000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: c00000062ea70020 c00000062c0c38d0 0000000000000002 0000000000000002
  GPR24: c00000062ac3ffe8 00000000275b19c4 0000000000000001 c00000062ac30000
  GPR28: c00000062c0c38d0 c00000062ac30050 c00000062ac30058 0000000000000000
  NIP memcmp+0x120/0x690
  LR  xfs_attr3_leaf_lookup_int+0x53c/0x5b0
  Call Trace:
    xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable)
    xfs_da3_node_lookup_int+0x32c/0x5a0
    xfs_attr_node_addname+0x170/0x6b0
    xfs_attr_set+0x2ac/0x340
    __xfs_set_acl+0xf0/0x230
    xfs_set_acl+0xd0/0x160
    set_posix_acl+0xc0/0x130
    posix_acl_xattr_set+0x68/0x110
    __vfs_setxattr+0xa4/0x110
    __vfs_setxattr_noperm+0xac/0x240
    vfs_setxattr+0x128/0x130
    setxattr+0x248/0x600
    path_setxattr+0x108/0x120
    sys_setxattr+0x28/0x40
    system_call+0x5c/0x70
  Instruction dump:
  7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c050000
  4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040

The instruction dump decodes as:
  subfic  r6,r5,8
  rlwinm  r6,r6,3,0,28
  ldbrx   r9,0,r3
  ldbrx   r10,0,r4      <-

Which shows us doing an 8 byte load from c00000062ac3fff9, which
crosses the page boundary at c00000062ac40000 and faults.

It's not OK for memcmp to read past the end of the source or
destination buffers if that would cross a page boundary, because we
don't know that the next page is mapped.

As pointed out by Segher, we can read past the end of the source or
destination as long as we don't cross a 4K boundary, because that's
our minimum page size on all platforms.

The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get
there we know that s1 is 8-byte aligned and we have at least 1 byte to
read, so a single 8-byte load won't read past the end of s1 and cross
a page boundary.

But we have to be more careful with s2. So check if it's within 8
bytes of a 4K boundary and if so go to the byte-by-byte loop.

Fixes: 2d9ee32 ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()")
Cc: stable@vger.kernel.org # v4.19+
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Tested-by: Chandan Rajendra <chandan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
Inside sbitmap_queue_clear(), once the clear bit is set, it will be
visiable to allocation path immediately. Meantime READ/WRITE on old
associated instance(such as request in case of blk-mq) may be
out-of-order with the setting clear bit, so race with re-allocation
may be triggered.

Adds one memory barrier for ordering READ/WRITE of the freed associated
instance with setting clear bit for avoiding race with re-allocation.

The following kernel oops triggerd by block/006 on aarch64 may be fixed:

[  142.330954] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000330
[  142.338794] Mem abort info:
[  142.341554]   ESR = 0x96000005
[  142.344632]   Exception class = DABT (current EL), IL = 32 bits
[  142.350500]   SET = 0, FnV = 0
[  142.353544]   EA = 0, S1PTW = 0
[  142.356678] Data abort info:
[  142.359528]   ISV = 0, ISS = 0x00000005
[  142.363343]   CM = 0, WnR = 0
[  142.366305] user pgtable: 64k pages, 48-bit VAs, pgdp = 000000002a3c51c0
[  142.372983] [0000000000000330] pgd=0000000000000000, pud=0000000000000000
[  142.379777] Internal error: Oops: 96000005 [riscvarchive#1] SMP
[  142.384613] Modules linked in: null_blk ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp vfat fat rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_umad scsi_transport_iscsi ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core sbsa_gwdt crct10dif_ce ghash_ce ipmi_ssif sha2_ce ipmi_devintf sha256_arm64 sg sha1_ce ipmi_msghandler ip_tables xfs libcrc32c mlx5_core sdhci_acpi mlxfw ahci_platform at803x sdhci libahci_platform qcom_emac mmc_core hdma hdma_mgmt i2c_dev [last unloaded: null_blk]
[  142.429753] CPU: 7 PID: 1983 Comm: fio Not tainted 5.0.0.cki riscvarchive#2
[  142.449458] pstate: 00400005 (nzcv daif +PAN -UAO)
[  142.454239] pc : __blk_mq_free_request+0x4c/0xa8
[  142.458830] lr : blk_mq_free_request+0xec/0x118
[  142.463344] sp : ffff00003360f6a0
[  142.466646] x29: ffff00003360f6a0 x28: ffff000010e70000
[  142.471941] x27: ffff801729a50048 x26: 0000000000010000
[  142.477232] x25: ffff00003360f954 x24: ffff7bdfff021440
[  142.482529] x23: 0000000000000000 x22: 00000000ffffffff
[  142.487830] x21: ffff801729810000 x20: 0000000000000000
[  142.493123] x19: ffff801729a50000 x18: 0000000000000000
[  142.498413] x17: 0000000000000000 x16: 0000000000000001
[  142.503709] x15: 00000000000000ff x14: ffff7fe000000000
[  142.509003] x13: ffff8017dcde09a0 x12: 0000000000000000
[  142.514308] x11: 0000000000000001 x10: 0000000000000008
[  142.519597] x9 : ffff8017dcde09a0 x8 : 0000000000002000
[  142.524889] x7 : ffff8017dcde0a00 x6 : 000000015388f9be
[  142.530187] x5 : 0000000000000001 x4 : 0000000000000000
[  142.535478] x3 : 0000000000000000 x2 : 0000000000000000
[  142.540777] x1 : 0000000000000001 x0 : ffff00001041b194
[  142.546071] Process fio (pid: 1983, stack limit = 0x000000006460a0ea)
[  142.552500] Call trace:
[  142.554926]  __blk_mq_free_request+0x4c/0xa8
[  142.559181]  blk_mq_free_request+0xec/0x118
[  142.563352]  blk_mq_end_request+0xfc/0x120
[  142.567444]  end_cmd+0x3c/0xa8 [null_blk]
[  142.571434]  null_complete_rq+0x20/0x30 [null_blk]
[  142.576194]  blk_mq_complete_request+0x108/0x148
[  142.580797]  null_handle_cmd+0x1d4/0x718 [null_blk]
[  142.585662]  null_queue_rq+0x60/0xa8 [null_blk]
[  142.590171]  blk_mq_try_issue_directly+0x148/0x280
[  142.594949]  blk_mq_try_issue_list_directly+0x9c/0x108
[  142.600064]  blk_mq_sched_insert_requests+0xb0/0xd0
[  142.604926]  blk_mq_flush_plug_list+0x16c/0x2a0
[  142.609441]  blk_flush_plug_list+0xec/0x118
[  142.613608]  blk_finish_plug+0x3c/0x4c
[  142.617348]  blkdev_direct_IO+0x3b4/0x428
[  142.621336]  generic_file_read_iter+0x84/0x180
[  142.625761]  blkdev_read_iter+0x50/0x78
[  142.629579]  aio_read.isra.6+0xf8/0x190
[  142.633409]  __io_submit_one.isra.8+0x148/0x738
[  142.637912]  io_submit_one.isra.9+0x88/0xb8
[  142.642078]  __arm64_sys_io_submit+0xe0/0x238
[  142.646428]  el0_svc_handler+0xa0/0x128
[  142.650238]  el0_svc+0x8/0xc
[  142.653104] Code: b9402a63 f9000a7f 3100047f 540000a0 (f9419a81)
[  142.659202] ---[ end trace 467586bc175eb09d ]---

Fixes: ea86ea2 ("sbitmap: ammortize cost of clearing bits")
Reported-and-bisected_and_tested-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
Function __hw_perf_event_init() used a CPU variable without
ensuring CPU preemption has been disabled. This caused the
following warning in the kernel log:

  [ 7.277085] BUG: using smp_processor_id() in preemptible
                 [00000000] code: cf-csdiag/1892
  [ 7.277111] caller is cf_diag_event_init+0x13a/0x338
  [ 7.277122] CPU: 10 PID: 1892 Comm: cf-csdiag Not tainted
                 5.0.0-20190318.rc0.git0.9e1a11e0f602.300.fc29.s390x+debug riscvarchive#1
  [ 7.277131] Hardware name: IBM 2964 NC9 712 (LPAR)
  [ 7.277139] Call Trace:
  [ 7.277150] ([<000000000011385a>] show_stack+0x82/0xd0)
  [ 7.277161]  [<0000000000b7a71a>] dump_stack+0x92/0xd0
  [ 7.277174]  [<00000000007b7e9c>] check_preemption_disabled+0xe4/0x100
  [ 7.277183]  [<00000000001228aa>] cf_diag_event_init+0x13a/0x338
  [ 7.277195]  [<00000000002cf3aa>] perf_try_init_event+0x72/0xf0
  [ 7.277204]  [<00000000002d0bba>] perf_event_alloc+0x6fa/0xce0
  [ 7.277214]  [<00000000002dc4a8>] __s390x_sys_perf_event_open+0x398/0xd50
  [ 7.277224]  [<0000000000b9e8f0>] system_call+0xdc/0x2d8
  [ 7.277233] 2 locks held by cf-csdiag/1892:
  [ 7.277241]  #0: 00000000976f5510 (&sig->cred_guard_mutex){+.+.},
                  at: __s390x_sys_perf_event_open+0xd2e/0xd50
  [ 7.277257]  riscvarchive#1: 00000000363b11bd (&pmus_srcu){....},
                  at: perf_event_alloc+0x52e/0xce0

The variable is now accessed in proper context. Use
get_cpu_var()/put_cpu_var() pair to disable
preemption during access.
As the hardware authorization settings apply to all CPUs, it
does not matter which CPU is used to check the authorization setting.

Remove the event->count assignment. It is not needed as function
perf_event_alloc() allocates memory for the event with kzalloc() and
thus count is already set to zero.

Fixes: fe5908b ("s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace")

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
Syzkaller reports:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [riscvarchive#1] SMP KASAN PTI
CPU: 1 PID: 5373 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ riscvarchive#3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:put_links+0x101/0x440 fs/proc/proc_sysctl.c:1599
Code: 00 0f 85 3a 03 00 00 48 8b 43 38 48 89 44 24 20 48 83 c0 38 48 89 c2 48 89 44 24 28 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 fe 02 00 00 48 8b 74 24 20 48 c7 c7 60 2a 9d 91
RSP: 0018:ffff8881d828f238 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8881e01b1140 RCX: ffffffff8ee98267
RDX: 0000000000000007 RSI: ffffc90001479000 RDI: ffff8881e01b1178
RBP: dffffc0000000000 R08: ffffed103ee27259 R09: ffffed103ee27259
R10: 0000000000000001 R11: ffffed103ee27258 R12: fffffffffffffff4
R13: 0000000000000006 R14: ffff8881f59838c0 R15: dffffc0000000000
FS:  00007f072254f700(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff8b286668 CR3: 00000001f0542002 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 drop_sysctl_table+0x152/0x9f0 fs/proc/proc_sysctl.c:1629
 get_subdir fs/proc/proc_sysctl.c:1022 [inline]
 __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
 br_netfilter_init+0xbc/0x1000 [br_netfilter]
 do_one_initcall+0xfa/0x5ca init/main.c:887
 do_init_module+0x204/0x5f6 kernel/module.c:3460
 load_module+0x66b2/0x8570 kernel/module.c:3808
 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f072254ec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007f072254ec70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f072254f6bc
R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
Modules linked in: br_netfilter(+) dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb_dw2102 dvb_usb classmate_laptop palmas_regulator cn videobuf2_v4l2 v4l2_common snd_soc_bd28623 mptbase snd_usb_usx2y snd_usbmidi_lib snd_rawmidi wmi libnvdimm lockd sunrpc grace rc_kworld_pc150u rc_core rtc_da9063 sha1_ssse3 i2c_cros_ec_tunnel adxl34x_spi adxl34x nfnetlink lib80211 i5500_temp dvb_as102 dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops udc_core lnbp22 leds_lp3952 hid_roccat_ryos s1d13xxxfb mtd vport_geneve openvswitch nf_conncount nf_nat_ipv6 nsh geneve udp_tunnel ip6_udp_tunnel snd_soc_mt6351 sis_agp phylink snd_soc_adau1761_spi snd_soc_adau1761 snd_soc_adau17x1 snd_soc_core snd_pcm_dmaengine ac97_bus snd_compress snd_soc_adau_utils snd_soc_sigmadsp_regmap snd_soc_sigmadsp raid_class hid_roccat_konepure hid_roccat_common hid_roccat c2port_duramar2150 core mdio_bcm_unimac iptable_security iptable_raw iptable_mangle
 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim devlink vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev mousedev ide_pci_generic piix aesni_intel aes_x86_64 ide_core crypto_simd atkbd cryptd glue_helper serio_raw ata_generic pata_acpi i2c_piix4 floppy sch_fq_codel ip_tables x_tables ipv6 [last unloaded: lm73]
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace 770020de38961fd0 ]---

A new dir entry can be created in get_subdir and its 'header->parent' is
set to NULL.  Only after insert_header success, it will be set to 'dir',
otherwise 'header->parent' is set to NULL and drop_sysctl_table is called.
However in err handling path of get_subdir, drop_sysctl_table also be
called on 'new->header' regardless its value of parent pointer.  Then
put_links is called, which triggers NULL-ptr deref when access member of
header->parent.

In fact we have multiple error paths which call drop_sysctl_table() there,
upon failure on insert_links() we also call drop_sysctl_table().And even
in the successful case on __register_sysctl_table() we still always call
drop_sysctl_table().This patch fix it.

Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com
Fixes: 0e47c99 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>    [3.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
…mory

When halting a guest, QEMU flushes the virtual ITS caches, which
amounts to writing to the various tables that the guest has allocated.

When doing this, we fail to take the srcu lock, and the kernel
shouts loudly if running a lockdep kernel:

[   69.680416] =============================
[   69.680819] WARNING: suspicious RCU usage
[   69.681526] 5.1.0-rc1-00008-g600025238f51-dirty riscvarchive#18 Not tainted
[   69.682096] -----------------------------
[   69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
[   69.683225]
[   69.683225] other info that might help us debug this:
[   69.683225]
[   69.683975]
[   69.683975] rcu_scheduler_active = 2, debug_locks = 1
[   69.684598] 6 locks held by qemu-system-aar/4097:
[   69.685059]  #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
[   69.686087]  riscvarchive#1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
[   69.686919]  riscvarchive#2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.687698]  riscvarchive#3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.688475]  riscvarchive#4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.689978]  riscvarchive#5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[   69.690729]
[   69.690729] stack backtrace:
[   69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty riscvarchive#18
[   69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
[   69.692831] Call trace:
[   69.694072]  lockdep_rcu_suspicious+0xcc/0x110
[   69.694490]  gfn_to_memslot+0x174/0x190
[   69.694853]  kvm_write_guest+0x50/0xb0
[   69.695209]  vgic_its_save_tables_v0+0x248/0x330
[   69.695639]  vgic_its_set_attr+0x298/0x3a0
[   69.696024]  kvm_device_ioctl_attr+0x9c/0xd8
[   69.696424]  kvm_device_ioctl+0x8c/0xf8
[   69.696788]  do_vfs_ioctl+0xc8/0x960
[   69.697128]  ksys_ioctl+0x8c/0xa0
[   69.697445]  __arm64_sys_ioctl+0x28/0x38
[   69.697817]  el0_svc_common+0xd8/0x138
[   69.698173]  el0_svc_handler+0x38/0x78
[   69.698528]  el0_svc+0x8/0xc

The fix is to obviously take the srcu lock, just like we do on the
read side of things since bf30824. One wonders why this wasn't
fixed at the same time, but hey...

Fixes: bf30824 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
AlexGhiti pushed a commit to AlexGhiti/riscv-linux that referenced this issue Apr 4, 2019
Calling kvm_is_visible_gfn() implies that we're parsing the memslots,
and doing this without the srcu lock is frown upon:

[12704.164532] =============================
[12704.164544] WARNING: suspicious RCU usage
[12704.164560] 5.1.0-rc1-00008-g600025238f51-dirty riscvarchive#16 Tainted: G        W
[12704.164573] -----------------------------
[12704.164589] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
[12704.164602] other info that might help us debug this:
[12704.164616] rcu_scheduler_active = 2, debug_locks = 1
[12704.164631] 6 locks held by qemu-system-aar/13968:
[12704.164644]  #0: 000000007ebdae4f (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
[12704.164691]  riscvarchive#1: 000000007d751022 (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
[12704.164726]  riscvarchive#2: 00000000219d2706 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164761]  riscvarchive#3: 00000000a760aecd (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164794]  riscvarchive#4: 000000000ef8e31d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164827]  riscvarchive#5: 000000007a872093 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
[12704.164861] stack backtrace:
[12704.164878] CPU: 2 PID: 13968 Comm: qemu-system-aar Tainted: G        W         5.1.0-rc1-00008-g600025238f51-dirty riscvarchive#16
[12704.164887] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
[12704.164896] Call trace:
[12704.164910]  dump_backtrace+0x0/0x138
[12704.164920]  show_stack+0x24/0x30
[12704.164934]  dump_stack+0xbc/0x104
[12704.164946]  lockdep_rcu_suspicious+0xcc/0x110
[12704.164958]  gfn_to_memslot+0x174/0x190
[12704.164969]  kvm_is_visible_gfn+0x28/0x70
[12704.164980]  vgic_its_check_id.isra.0+0xec/0x1e8
[12704.164991]  vgic_its_save_tables_v0+0x1ac/0x330
[12704.165001]  vgic_its_set_attr+0x298/0x3a0
[12704.165012]  kvm_device_ioctl_attr+0x9c/0xd8
[12704.165022]  kvm_device_ioctl+0x8c/0xf8
[12704.165035]  do_vfs_ioctl+0xc8/0x960
[12704.165045]  ksys_ioctl+0x8c/0xa0
[12704.165055]  __arm64_sys_ioctl+0x28/0x38
[12704.165067]  el0_svc_common+0xd8/0x138
[12704.165078]  el0_svc_handler+0x38/0x78
[12704.165089]  el0_svc+0x8/0xc

Make sure the lock is taken when doing this.

Fixes: bf30824 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants