Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Received tx completion for invalid msdu_id: 0 #1

Open
dasnapkinman opened this issue Jan 19, 2017 · 3 comments
Open

Received tx completion for invalid msdu_id: 0 #1

dasnapkinman opened this issue Jan 19, 2017 · 3 comments

Comments

@dasnapkinman
Copy link

I've compiled this experimental USB support and tested with the Linksys WUSB6100 USB adapter (QCA9377). My purpose is for P2P testing (WiFi Display Development). The driver picks up the device and also initialized properly. I am able to broadcast the correct IEs for WFD, however during or immediately after GO negotiation with any device (Nexus 4 and Surface 3 tested) the situation fails and the driver begins multiple log lines of "Received tx completion for invalid msdu_id: 0" . I see a bug in http://lists.infradead.org/pipermail/ath10k/2016-March/006969.html as a possible culprit but am no expert in ath10k. How can I provide more useful feedback?

@erstrom
Copy link
Owner

erstrom commented Jan 19, 2017

Can you please provide some debug logs. HTT logging would be useful:

core.c:
unsigned int ath10k_debug_mask = ATH10K_DBG_HTT; /ATH10K_DBG_HTT = 0x00000008,/

Make sure you build with CONFIG_ATH10K_DEBUG.

Can you also check if you have trigged any of the below prints in usb.c?:
ath10k_warn(ar, "Zero length frame received! Possible fw crash?\n");
ath10k_warn(ar, "Malformed frame received! Possible fw crash?\n");

@dasnapkinman
Copy link
Author

dasnapkinman commented Jan 20, 2017 via email

@erstrom
Copy link
Owner

erstrom commented Jan 22, 2017

I had expected to see "htt tx alloc msdu_id 0" prints in log, but they are not present...
Yet there are plenty of "htt rx, msg_type: 0xE" prints which indicates that the device transmits Mgmt TX completion messages to the host.
I always thought these were responses to Mgmt TX messages so it is a little odd that there are no "htt tx alloc msdu_id 0" prints.

Unfortunately I am not familiar with your setup.
Can you provide me some more info about your setup. Which commands you are running (iw, wpa_cli etc.)?
If I can find the time, I might give it a shot.

erstrom pushed a commit that referenced this issue Jan 22, 2017
The following crash may be seen if bad data is received from the
touchscreen.

[ 2189.425150] elants_i2c i2c-ELAN0001:00: unknown packet ff ff ff ff
[ 2189.430738] divide error: 0000 [#1] PREEMPT SMP
[ 2189.434679] gsmi: Log Shutdown Reason 0x03
[ 2189.434689] Modules linked in: ip6t_REJECT nf_reject_ipv6 rfcomm evdi
uinput uvcvideo cmac videobuf2_vmalloc videobuf2_memops snd_hda_codec_hdmi
i2c_dev videobuf2_core snd_soc_sst_cht_bsw_rt5645 snd_hda_intel
snd_intel_sst_acpi btusb btrtl btbcm btintel bluetooth snd_soc_sst_acpi
snd_hda_codec snd_intel_sst_core snd_hwdep snd_soc_sst_mfld_platform
snd_hda_core snd_soc_rt5645 memconsole_x86_legacy memconsole zram snd_soc_rl6231
fuse ip6table_filter iwlmvm iwlwifi iwl7000_mac80211 cfg80211 iio_trig_sysfs
joydev cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
kfifo_buf industrialio snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq
snd_seq_device ppp_async ppp_generic slhc tun
[ 2189.434866] CPU: 0 PID: 106 Comm: irq/184-ELAN000 Tainted: G        W
3.18.0-13101-g57e8190 #1
[ 2189.434883] Hardware name: GOOGLE Ultima, BIOS Google_Ultima.7287.131.43 07/20/2016
[ 2189.434898] task: ffff88017a0b6d80 ti: ffff88017a2bc000 task.ti: ffff88017a2bc000
[ 2189.434913] RIP: 0010:[<ffffffffbecc48d5>]  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
[ 2189.434937] RSP: 0018:ffff88017a2bfd98  EFLAGS: 00010293
[ 2189.434948] RAX: 0000000000000000 RBX: ffff88017a967828 RCX: ffff88017a9678e8
[ 2189.434962] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000000
[ 2189.434975] RBP: ffff88017a2bfdd8 R08: 00000000000003e8 R09: 0000000000000000
[ 2189.434989] R10: 0000000000000000 R11: 000000000044a2bd R12: ffff88017a991800
[ 2189.435001] R13: ffffffffbe8a2a53 R14: ffff88017a0b6d80 R15: ffff88017a0b6d80
[ 2189.435011] FS:  0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
[ 2189.435022] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2189.435030] CR2: 00007f678d94b000 CR3: 000000003f41a000 CR4: 00000000001007f0
[ 2189.435039] Stack:
[ 2189.435044]  ffff88017a2bfda8 ffff88017a9678e8 646464647a2bfdd8 0000000006e09574
[ 2189.435060]  0000000000000000 ffff88017a088b80 ffff88017a921000 ffffffffbe8a2a53
[ 2189.435074]  ffff88017a2bfe08 ffffffffbe8a2a73 ffff88017a0b6d80 0000000006e09574
[ 2189.435089] Call Trace:
[ 2189.435101]  [<ffffffffbe8a2a53>] ? irq_thread_dtor+0xa9/0xa9
[ 2189.435112]  [<ffffffffbe8a2a73>] irq_thread_fn+0x20/0x40
[ 2189.435123]  [<ffffffffbe8a2be1>] irq_thread+0x14e/0x222
[ 2189.435135]  [<ffffffffbee8cbeb>] ? __schedule+0x3b3/0x57a
[ 2189.435145]  [<ffffffffbe8a29aa>] ? wake_threads_waitq+0x2d/0x2d
[ 2189.435156]  [<ffffffffbe8a2a93>] ? irq_thread_fn+0x40/0x40
[ 2189.435168]  [<ffffffffbe87c385>] kthread+0x10e/0x116
[ 2189.435178]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
[ 2189.435189]  [<ffffffffbee900ac>] ret_from_fork+0x7c/0xb0
[ 2189.435199]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
[ 2189.435208] Code: ff ff eb 73 0f b6 bb c1 00 00 00 83 ff 03 7e 13 49 8d 7c
24 20 ba 04 00 00 00 48 c7 c6 8a cd 21 bf eb 4d 0f b6 83 c2 00 00 00 99 <f7> ff
83 f8 37 75 15 48 6b f7 37 4c 8d a3 c4 00 00 00 4c 8d ac
[ 2189.435312] RIP  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
[ 2189.435323]  RSP <ffff88017a2bfd98>
[ 2189.435350] ---[ end trace f4945345a75d96dd ]---
[ 2189.443841] Kernel panic - not syncing: Fatal exception
[ 2189.444307] Kernel Offset: 0x3d800000 from 0xffffffff81000000
	(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2189.444519] gsmi: Log Shutdown Reason 0x02

The problem was seen with a 3.18 based kernel, but there is no reason
to believe that the upstream code is safe.

Fixes: 66aee90 ("Input: add support for Elan eKTH I2C touchscreens")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
erstrom pushed a commit that referenced this issue Jan 22, 2017
The crash happens rather often when we reset some cluster nodes while
nodes contend fiercely to do truncate and append.

The crash backtrace is below:

   dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
   dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
   ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
   ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
   ocfs2: Beginning quota recovery on device (253,18) for slot 2
   ocfs2: Finishing quota recovery on device (253,18) for slot 2
   (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
   (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
   ------------[ cut here ]------------
   kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
   invalid opcode: 0000 [#1] SMP
   Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod    iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport      joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix               drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd       usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
   Supported: No, Unsupported modules are loaded
   CPU: 1 PID: 30154 Comm: truncate Tainted: G           OE   N  4.4.21-69-default #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
   task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
   RIP: 0010:[<ffffffffa05c8c30>]  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
   RSP: 0018:ffff880074e6bd50  EFLAGS: 00010282
   RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
   RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
   RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
   R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
   R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
   FS:  00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
   CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
   Call Trace:
     ocfs2_setattr+0x698/0xa90 [ocfs2]
     notify_change+0x1ae/0x380
     do_truncate+0x5e/0x90
     do_sys_ftruncate.constprop.11+0x108/0x160
     entry_SYSCALL_64_fastpath+0x12/0x6d
   Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
   RIP  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]

It's because ocfs2_inode_lock() get us stale LVB in which the i_size is
not equal to the disk i_size.  We mistakenly trust the LVB because the
underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with
DLM_SBF_VALNOTVALID properly for us.  But, why?

The current code tries to downconvert lock without DLM_LKF_VALBLK flag
to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even
if the lock resource type needs LVB.  This is not the right way for
fsdlm.

The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on
DLM_LKF_VALBLK to decide if we care about the LVB in the LKB.  If
DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from
this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node
failure happens.

The following diagram briefly illustrates how this crash happens:

RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;

The 1st round:

             Node1                                    Node2
RSB1: PR
                                                  RSB1(master): NULL->EX
ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
  ocfs2_dlm_lock(no DLM_LKF_VALBLK)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

dlm_lock(no DLM_LKF_VALBLK)
  convert_lock(overwrite lkb->lkb_exflags
               with no DLM_LKF_VALBLK)

RSB1: NULL                                        RSB1: EX
                                                  reset Node2
dlm_recover_rsbs()
  recover_lvb()

/* The LVB is not trustable if the node with EX fails and
 * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
 */

 if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
           return;                   * to invalid the LVB here.
                                     */

The 2nd round:

         Node 1                                Node2
RSB1(become master from recovery)

ocfs2_setattr()
  ocfs2_inode_lock(NULL->EX)
    /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
    ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */
  ocfs2_truncate_file()
      mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */

The fix is quite straightforward.  We keep to set DLM_LKF_VALBLK flag
for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin
is uesed.

Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
erstrom pushed a commit that referenced this issue Jan 22, 2017
Both arch_add_memory() and arch_remove_memory() expect a single threaded
context.

For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
not hold any locks over this check and branch:

    if (pgd_val(*pgd)) {
    	pud = (pud_t *)pgd_page_vaddr(*pgd);
    	paddr_last = phys_pud_init(pud, __pa(vaddr),
    				   __pa(vaddr_end),
    				   page_size_mask);
    	continue;
    }

    pud = alloc_low_page();
    paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
    			   page_size_mask);

The result is that two threads calling devm_memremap_pages()
simultaneously can end up colliding on pgd initialization.  This leads
to crash signatures like the following where the loser of the race
initializes the wrong pgd entry:

    BUG: unable to handle kernel paging request at ffff888ebfff0000
    IP: memcpy_erms+0x6/0x10
    PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13
    task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000
    RIP: memcpy_erms+0x6/0x10
    [..]
    Call Trace:
      ? pmem_do_bvec+0x205/0x370 [nd_pmem]
      ? blk_queue_enter+0x3a/0x280
      pmem_rw_page+0x38/0x80 [nd_pmem]
      bdev_read_page+0x84/0xb0

Hold the standard memory hotplug mutex over calls to
arch_{add,remove}_memory().

Fixes: 41e94a8 ("add devm_memremap_pages")
Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
erstrom pushed a commit that referenced this issue Jan 22, 2017
Reported syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    PGD 0

    Oops: 0002 [#1] SMP
    CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
    task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
    RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    Call Trace:
     irqfd_shutdown+0x66/0xa0 [kvm]
     process_one_work+0x16b/0x480
     worker_thread+0x4b/0x500
     kthread+0x101/0x140
     ? process_one_work+0x480/0x480
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x25/0x30
    RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
    CR2: 0000000000000008

The syzkaller folks reported a NULL pointer dereference that due to
unregister an consumer which fails registration before. The syzkaller
creates two VMs w/ an equal eventfd occasionally. So the second VM
fails to register an irqbypass consumer. It will make irqfd as inactive
and queue an workqueue work to shutdown irqfd and unregister the irqbypass
consumer when eventfd is closed. However, the second consumer has been
initialized though it fails registration. So the token(same as the first
VM's) is taken to unregister the consumer through the workqueue, the
consumer of the first VM is found and unregistered, then NULL deref incurred
in the path of deleting consumer from the consumers list.

This patch fixes it by making irq_bypass_register/unregister_consumer()
looks for the consumer entry based on consumer pointer itself instead of
token matching.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
erstrom pushed a commit that referenced this issue Jan 22, 2017
Reported by syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
    IP: _raw_spin_lock+0xc/0x30
    PGD 3e28eb067
    PUD 3f0ac6067
    PMD 0
    Oops: 0002 [#1] SMP
    CPU: 0 PID: 2431 Comm: test Tainted: G           OE   4.10.0-rc1+ #3
    Call Trace:
     ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
     kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
     ? pick_next_task_fair+0xe1/0x4e0
     ? kvm_arch_vcpu_load+0xea/0x260 [kvm]
     kvm_vcpu_ioctl+0x33a/0x600 [kvm]
     ? hrtimer_try_to_cancel+0x29/0x130
     ? do_nanosleep+0x97/0xf0
     do_vfs_ioctl+0xa1/0x5d0
     ? __hrtimer_init+0x90/0x90
     ? do_nanosleep+0x5b/0xf0
     SyS_ioctl+0x79/0x90
     do_syscall_64+0x6e/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0

The syzkaller folks reported a NULL pointer dereference due to
ENABLE_CAP succeeding even without an irqchip.  The Hyper-V
synthetic interrupt controller is activated, resulting in a
wrong request to rescan the ioapic and a NULL pointer dereference.

    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <linux/kvm.h>
    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #ifndef KVM_CAP_HYPERV_SYNIC
    #define KVM_CAP_HYPERV_SYNIC 123
    #endif

    void* thr(void* arg)
    {
	struct kvm_enable_cap cap;
	cap.flags = 0;
	cap.cap = KVM_CAP_HYPERV_SYNIC;
	ioctl((long)arg, KVM_ENABLE_CAP, &cap);
	return 0;
    }

    int main()
    {
	void *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE,
			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	int kvmfd = open("/dev/kvm", 0);
	int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
	struct kvm_userspace_memory_region memreg;
	memreg.slot = 0;
	memreg.flags = 0;
	memreg.guest_phys_addr = 0;
	memreg.memory_size = 0x1000;
	memreg.userspace_addr = (unsigned long)host_mem;
	host_mem[0] = 0xf4;
	ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
	int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
	struct kvm_sregs sregs;
	ioctl(cpufd, KVM_GET_SREGS, &sregs);
	sregs.cr0 = 0;
	sregs.cr4 = 0;
	sregs.efer = 0;
	sregs.cs.selector = 0;
	sregs.cs.base = 0;
	ioctl(cpufd, KVM_SET_SREGS, &sregs);
	struct kvm_regs regs = { .rflags = 2 };
	ioctl(cpufd, KVM_SET_REGS, &regs);
	ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
	pthread_t th;
	pthread_create(&th, 0, thr, (void*)(long)cpufd);
	usleep(rand() % 10000);
	ioctl(cpufd, KVM_RUN, 0);
	pthread_join(th, 0);
	return 0;
    }

This patch fixes it by failing ENABLE_CAP if without an irqchip.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 5c91941 (kvm/x86: Hyper-V synthetic interrupt controller)
Cc: stable@vger.kernel.org # 4.5+
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
erstrom pushed a commit that referenced this issue Feb 16, 2017
[ Upstream commit c6894de ]

After promisc mode management was introduced a bridge device could do
dev_set_promiscuity from its ndo_change_rx_flags() callback which in
turn can be called after the bridge's addr_list_lock has been taken
(e.g. by dev_uc_add). This causes a false positive lockdep splat because
the port interfaces' addr_list_lock is taken when br_manage_promisc()
runs after the bridge's addr list lock was already taken.
To remove the false positive introduce a custom bridge addr_list_lock
class and set it on bridge init.
A simple way to reproduce this is with the following:
$ brctl addbr br0
$ ip l add l br0 br0.100 type vlan id 100
$ ip l set br0 up
$ ip l set br0.100 up
$ echo 1 > /sys/class/net/br0/bridge/vlan_filtering
$ brctl addif br0 eth0
Splat:
[   43.684325] =============================================
[   43.684485] [ INFO: possible recursive locking detected ]
[   43.684636] 4.4.0-rc8+ #54 Not tainted
[   43.684755] ---------------------------------------------
[   43.684906] brctl/1187 is trying to acquire lock:
[   43.685047]  (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
[   43.685460]  but task is already holding lock:
[   43.685618]  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
[   43.686015]  other info that might help us debug this:
[   43.686316]  Possible unsafe locking scenario:

[   43.686743]        CPU0
[   43.686967]        ----
[   43.687197]   lock(_xmit_ETHER);
[   43.687544]   lock(_xmit_ETHER);
[   43.687886] *** DEADLOCK ***

[   43.688438]  May be due to missing lock nesting notation

[   43.688882] 2 locks held by brctl/1187:
[   43.689134]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81510317>] rtnl_lock+0x17/0x20
[   43.689852]  #1:  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
[   43.690575] stack backtrace:
[   43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54
[   43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[   43.691770]  ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0
[   43.692425]  ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139
[   43.693071]  ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020
[   43.693709] Call Trace:
[   43.693931]  [<ffffffff81360ceb>] dump_stack+0x4b/0x70
[   43.694199]  [<ffffffff810d0466>] __lock_acquire+0x1e46/0x1e90
[   43.694483]  [<ffffffff81537139>] ? netlink_broadcast_filtered+0x139/0x3e0
[   43.694789]  [<ffffffff8153b5da>] ? nlmsg_notify+0x5a/0xc0
[   43.695064]  [<ffffffff810d10f5>] lock_acquire+0xe5/0x1f0
[   43.695340]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
[   43.695623]  [<ffffffff815edea5>] _raw_spin_lock_bh+0x45/0x80
[   43.695901]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
[   43.696180]  [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
[   43.696460]  [<ffffffff8150189c>] dev_set_promiscuity+0x3c/0x50
[   43.696750]  [<ffffffffa0586845>] br_port_set_promisc+0x25/0x50 [bridge]
[   43.697052]  [<ffffffffa05869aa>] br_manage_promisc+0x8a/0xe0 [bridge]
[   43.697348]  [<ffffffffa05826ee>] br_dev_change_rx_flags+0x1e/0x20 [bridge]
[   43.697655]  [<ffffffff81501532>] __dev_set_promiscuity+0x132/0x1f0
[   43.697943]  [<ffffffff81501672>] __dev_set_rx_mode+0x82/0x90
[   43.698223]  [<ffffffff815072de>] dev_uc_add+0x5e/0x80
[   43.698498]  [<ffffffffa05b3c62>] vlan_device_event+0x542/0x650 [8021q]
[   43.698798]  [<ffffffff8109886d>] notifier_call_chain+0x5d/0x80
[   43.699083]  [<ffffffff810988b6>] raw_notifier_call_chain+0x16/0x20
[   43.699374]  [<ffffffff814f456e>] call_netdevice_notifiers_info+0x6e/0x80
[   43.699678]  [<ffffffff814f4596>] call_netdevice_notifiers+0x16/0x20
[   43.699973]  [<ffffffffa05872be>] br_add_if+0x47e/0x4c0 [bridge]
[   43.700259]  [<ffffffffa058801e>] add_del_if+0x6e/0x80 [bridge]
[   43.700548]  [<ffffffffa0588b5f>] br_dev_ioctl+0xaf/0xc0 [bridge]
[   43.700836]  [<ffffffff8151a7ac>] dev_ifsioc+0x30c/0x3c0
[   43.701106]  [<ffffffff8151aac9>] dev_ioctl+0xf9/0x6f0
[   43.701379]  [<ffffffff81254345>] ? mntput_no_expire+0x5/0x450
[   43.701665]  [<ffffffff812543ee>] ? mntput_no_expire+0xae/0x450
[   43.701947]  [<ffffffff814d7b02>] sock_do_ioctl+0x42/0x50
[   43.702219]  [<ffffffff814d8175>] sock_ioctl+0x1e5/0x290
[   43.702500]  [<ffffffff81242d0b>] do_vfs_ioctl+0x2cb/0x5c0
[   43.702771]  [<ffffffff81243079>] SyS_ioctl+0x79/0x90
[   43.703033]  [<ffffffff815eebb6>] entry_SYSCALL_64_fastpath+0x16/0x7a

CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Bridge list <bridge@lists.linux-foundation.org>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Fixes: 2796d0c ("bridge: Automatically manage port promiscuous mode.")
Reported-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit 9b2761c upstream.

The maximum chunks used by the function is
(SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE + 1).
The original commands array had space for
(SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE) commands.
When the last chunk is used (len > 4 * WSPI_MAX_CHUNK_SIZE), the last
command is stored outside the bounds of the commands array.

Oops 5 (page fault) is generated during current wl1271 firmware load
attempt:

root@debian-armhf:~# ifconfig wlan0 up
[  294.312399] Unable to handle kernel paging request at virtual address
00203fc4
[  294.320173] pgd = de528000
[  294.323028] [00203fc4] *pgd=00000000
[  294.326916] Internal error: Oops: 5 [#1] SMP ARM
[  294.331789] Modules linked in: bnep rfcomm bluetooth ipv6 arc4 wl12xx
wlcore mac80211 musb_dsps cfg80211 musb_hdrc usbcore usb_common
wlcore_spi omap_rng rng_core musb_am335x omap_wdt cpufreq_dt thermal_sys
hwmon
[  294.351838] CPU: 0 PID: 1827 Comm: ifconfig Not tainted
4.2.0-00002-g3e9ad27-dirty #78
[  294.360154] Hardware name: Generic AM33XX (Flattened Device Tree)
[  294.366557] task: dc9d6d40 ti: de550000 task.ti: de550000
[  294.372236] PC is at __spi_validate+0xa8/0x2ac
[  294.376902] LR is at __spi_sync+0x78/0x210
[  294.381200] pc : [<c049c760>]    lr : [<c049ebe0>]    psr: 60000013
[  294.381200] sp : de551998  ip : de5519d8  fp : 00200000
[  294.393242] r10: de551c8c  r9 : de5519d8  r8 : de3a9000
[  294.398730] r7 : de3a9258  r6 : de3a9400  r5 : de551a48  r4 :
00203fbc
[  294.405577] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 :
de3a9000
[  294.412420] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
Segment user
[  294.419918] Control: 10c5387d  Table: 9e528019  DAC: 00000015
[  294.425954] Process ifconfig (pid: 1827, stack limit = 0xde550218)
[  294.432437] Stack: (0xde551998 to 0xde552000)

...

[  294.883613] [<c049c760>] (__spi_validate) from [<c049ebe0>]
(__spi_sync+0x78/0x210)
[  294.891670] [<c049ebe0>] (__spi_sync) from [<bf036598>]
(wl12xx_spi_raw_write+0xfc/0x148 [wlcore_spi])
[  294.901661] [<bf036598>] (wl12xx_spi_raw_write [wlcore_spi]) from
[<bf21c694>] (wlcore_boot_upload_firmware+0x1ec/0x458 [wlcore])
[  294.914038] [<bf21c694>] (wlcore_boot_upload_firmware [wlcore]) from
[<bf24532c>] (wl12xx_boot+0xc10/0xfac [wl12xx])
[  294.925161] [<bf24532c>] (wl12xx_boot [wl12xx]) from [<bf20d5cc>]
(wl1271_op_add_interface+0x5b0/0x910 [wlcore])
[  294.936364] [<bf20d5cc>] (wl1271_op_add_interface [wlcore]) from
[<bf15c4ac>] (ieee80211_do_open+0x44c/0xf7c [mac80211])
[  294.947963] [<bf15c4ac>] (ieee80211_do_open [mac80211]) from
[<c0537978>] (__dev_open+0xa8/0x110)
[  294.957307] [<c0537978>] (__dev_open) from [<c0537bf8>]
(__dev_change_flags+0x88/0x148)
[  294.965713] [<c0537bf8>] (__dev_change_flags) from [<c0537cd0>]
(dev_change_flags+0x18/0x48)
[  294.974576] [<c0537cd0>] (dev_change_flags) from [<c05a55a0>]
(devinet_ioctl+0x6b4/0x7d0)
[  294.983191] [<c05a55a0>] (devinet_ioctl) from [<c0517040>]
(sock_ioctl+0x1e4/0x2bc)
[  294.991244] [<c0517040>] (sock_ioctl) from [<c017d378>]
(do_vfs_ioctl+0x420/0x6b0)
[  294.999208] [<c017d378>] (do_vfs_ioctl) from [<c017d674>]
(SyS_ioctl+0x6c/0x7c)
[  295.006880] [<c017d674>] (SyS_ioctl) from [<c000f4c0>]
(ret_fast_syscall+0x0/0x54)
[  295.014835] Code: e1550004 e2444034 0a00007d e5953018 (e5942008)
[  295.021544] ---[ end trace 66ed188198f4e24e ]---

Signed-off-by: Uri Mashiach <uri.mashiach@compulab.co.il>
Acked-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit e47301b upstream.

Fix the below Oops when trying to modprobe wlcore_spi.
The oops occurs because the wl1271_power_{off,on}()
function doesn't check the power() function pointer.

[   23.401447] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
[   23.409954] pgd = c0004000
[   23.412922] [00000000] *pgd=00000000
[   23.416693] Internal error: Oops: 80000007 [#1] SMP ARM
[   23.422168] Modules linked in: wl12xx wlcore mac80211 cfg80211
musb_dsps musb_hdrc usbcore usb_common snd_soc_simple_card evdev joydev
omap_rng wlcore_spi snd_soc_tlv320aic23_i2c rng_core snd_soc_tlv320aic23
c_can_platform c_can can_dev snd_soc_davinci_mcasp snd_soc_edma
snd_soc_omap omap_wdt musb_am335x cpufreq_dt thermal_sys hwmon
[   23.453253] CPU: 0 PID: 36 Comm: kworker/0:2 Not tainted
4.2.0-00002-g951efee-dirty #233
[   23.461720] Hardware name: Generic AM33XX (Flattened Device Tree)
[   23.468123] Workqueue: events request_firmware_work_func
[   23.473690] task: de32efc0 ti: de4ee000 task.ti: de4ee000
[   23.479341] PC is at 0x0
[   23.482112] LR is at wl12xx_set_power_on+0x28/0x124 [wlcore]
[   23.488074] pc : [<00000000>]    lr : [<bf2581f0>]    psr: 60000013
[   23.488074] sp : de4efe50  ip : 00000002  fp : 00000000
[   23.500162] r10: de7cdd00  r9 : dc848800  r8 : bf27af00
[   23.505663] r7 : bf27a1a8  r6 : dcbd8a80  r5 : dce0e2e0  r4 :
dce0d2e0
[   23.512536] r3 : 00000000  r2 : 00000000  r1 : 00000001  r0 :
dc848810
[   23.519412] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
Segment kernel
[   23.527109] Control: 10c5387d  Table: 9cb78019  DAC: 00000015
[   23.533160] Process kworker/0:2 (pid: 36, stack limit = 0xde4ee218)
[   23.539760] Stack: (0xde4efe50 to 0xde4f0000)

[...]

[   23.665030] [<bf2581f0>] (wl12xx_set_power_on [wlcore]) from
[<bf25f7ac>] (wlcore_nvs_cb+0x118/0xa4c [wlcore])
[   23.675604] [<bf25f7ac>] (wlcore_nvs_cb [wlcore]) from [<c04387ec>]
(request_firmware_work_func+0x30/0x58)
[   23.685784] [<c04387ec>] (request_firmware_work_func) from
[<c0058e2c>] (process_one_work+0x1b4/0x4b4)
[   23.695591] [<c0058e2c>] (process_one_work) from [<c0059168>]
(worker_thread+0x3c/0x4a4)
[   23.704124] [<c0059168>] (worker_thread) from [<c005ee68>]
(kthread+0xd4/0xf0)
[   23.711747] [<c005ee68>] (kthread) from [<c000f598>]
(ret_from_fork+0x14/0x3c)
[   23.719357] Code: bad PC value
[   23.722760] ---[ end trace 981be8510db9b3a9 ]---

Prevent oops by validationg power() pointer value before
calling the function.

Signed-off-by: Uri Mashiach <uri.mashiach@compulab.co.il>
Acked-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit ac75fe5 upstream.

That prevents this bug:
[ 2382.269496] BUG: unable to handle kernel NULL pointer dereference at 0000000000000540
[ 2382.270013] IP: [<ffffffffa01fe616>] snd_card_free+0x36/0x70 [snd]
[ 2382.270013] PGD 0
[ 2382.270013] Oops: 0002 [#1] SMP
[ 2382.270013] Modules linked in: saa7134_alsa(-) tda1004x saa7134_dvb videobuf2_dvb dvb_core tda827x tda8290 tuner saa7134 tveeprom videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 videobuf2_core v4l2_common videodev media auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc tun bridge stp llc ebtables ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack it87 hwmon_vid snd_hda_codec_idt snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq pcspkr i2c_i801 snd_seq_device snd_pcm snd_timer lpc_ich snd mfd_core soundcore binfmt_misc i915 video i2c_algo_bit drm_kms_helper drm r8169 ata_generic serio_raw pata_acpi mii i2c_core [last unloaded: videobuf2_memops]
[ 2382.270013] CPU: 0 PID: 4899 Comm: rmmod Not tainted 4.5.0-rc1+ #4
[ 2382.270013] Hardware name: PCCHIPS P17G/P17G, BIOS 080012  05/14/2008
[ 2382.270013] task: ffff880039c38000 ti: ffff88003c764000 task.ti: ffff88003c764000
[ 2382.270013] RIP: 0010:[<ffffffffa01fe616>]  [<ffffffffa01fe616>] snd_card_free+0x36/0x70 [snd]
[ 2382.270013] RSP: 0018:ffff88003c767ea0  EFLAGS: 00010286
[ 2382.270013] RAX: ffff88003c767eb8 RBX: 0000000000000000 RCX: 0000000000006260
[ 2382.270013] RDX: ffffffffa020a060 RSI: ffffffffa0206de1 RDI: ffff88003c767eb0
[ 2382.270013] RBP: ffff88003c767ed8 R08: 0000000000019960 R09: ffffffff811a5412
[ 2382.270013] R10: ffffea0000d7c200 R11: 0000000000000000 R12: ffff88003c767ea8
[ 2382.270013] R13: 00007ffe760617f7 R14: 0000000000000000 R15: 0000557625d7f1e0
[ 2382.270013] FS:  00007f80bb1c0700(0000) GS:ffff88003f400000(0000) knlGS:0000000000000000
[ 2382.270013] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2382.270013] CR2: 0000000000000540 CR3: 000000003c00f000 CR4: 00000000000006f0
[ 2382.270013] Stack:
[ 2382.270013]  000000003c767ed8 ffffffff00000000 ffff880000000000 ffff88003c767eb8
[ 2382.270013]  ffff88003c767eb8 ffffffffa049a890 00007ffe76060060 ffff88003c767ef0
[ 2382.270013]  ffffffffa049889d ffffffffa049a500 ffff88003c767f48 ffffffff8111079c
[ 2382.270013] Call Trace:
[ 2382.270013]  [<ffffffffa049889d>] saa7134_alsa_exit+0x1d/0x780 [saa7134_alsa]
[ 2382.270013]  [<ffffffff8111079c>] SyS_delete_module+0x19c/0x1f0
[ 2382.270013]  [<ffffffff8170fc2e>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 2382.270013] Code: 20 a0 48 c7 c6 e1 6d 20 a0 48 89 e5 41 54 53 4c 8d 65 d0 48 89 fb 48 83 ec 28 c7 45 d0 00 00 00 00 49 8d 7c 24 08 e8 7a 55 ed e0 <4c> 89 a3 40 05 00 00 48 89 df e8 eb fd ff ff 85 c0 75 1a 48 8d
[ 2382.270013] RIP  [<ffffffffa01fe616>] snd_card_free+0x36/0x70 [snd]
[ 2382.270013]  RSP <ffff88003c767ea0>
[ 2382.270013] CR2: 0000000000000540

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit a38a08d upstream.

This driver registers for extcon events as part of its probe, but
never unregisters them in case of error in the probe path.

There were multiple issues noticed due to this missing error handling.
One of them is random crashes if the regulators are not ready yet by the
time probe is invoked.

Ivan's previous attempt [1] to fix this issue, did not really address
all the failure cases like regualtor/get_irq failures.

[1] https://lkml.org/lkml/2015/9/7/62

Without this patch the kernel would carsh with log:
...
Unable to handle kernel paging request at virtual address 17d78410
pgd = ffffffc001a5c000
[17d78410] *pgd=00000000b6806003, *pud=00000000b6806003, *pmd=0000000000000000
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.4.0+ #48
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
Workqueue: deferwq deferred_probe_work_func
task: ffffffc03686e900 ti: ffffffc0368b0000 task.ti: ffffffc0368b0000
PC is at raw_notifier_chain_register+0x1c/0x44
LR is at extcon_register_notifier+0x88/0xc8
pc : [<ffffffc0000da43c>] lr : [<ffffffc000606298>] pstate: 80000085
sp : ffffffc0368b3a70
x29: ffffffc0368b3a70 x28: ffffffc03680c310
x27: ffffffc035518000 x26: ffffffc035518000
x25: ffffffc03bfa20e0 x24: ffffffc035580a18
x23: 0000000000000000 x22: ffffffc035518458
x21: ffffffc0355e9a60 x20: ffffffc035518000
x19: 0000000000000000 x18: 0000000000000028
x17: 0000000000000003 x16: ffffffc0018153c8
x15: 0000000000000001 x14: ffffffc03686f0f8
x13: ffffffc03686f0f8 x12: 0000000000000003
x11: 0000000000000001 x10: 0000000000000001
x9 : ffffffc03686f0f8 x8 : 0000e3872014c1a1
x7 : 0000000000000028 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000000
x3 : 00000000354fb170 x2 : 0000000017d78400
x1 : ffffffc0355e9a60 x0 : ffffffc0354fb268

Fixes: 	591fc11 ("usb: phy: msm: Use extcon framework for VBUS and ID detection")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit 63e41eb upstream.

We miss to take the crypto_alg_sem semaphore when traversing the
crypto_alg_list for CRYPTO_MSG_GETALG dumps. This allows a race with
crypto_unregister_alg() removing algorithms from the list while we're
still traversing it, thereby leading to a use-after-free as show below:

[ 3482.071639] general protection fault: 0000 [#1] SMP
[ 3482.075639] Modules linked in: aes_x86_64 glue_helper lrw ablk_helper cryptd gf128mul ipv6 pcspkr serio_raw virtio_net microcode virtio_pci virtio_ring virtio sr_mod cdrom [last unloaded: aesni_intel]
[ 3482.075639] CPU: 1 PID: 11065 Comm: crconf Not tainted 4.3.4-grsec+ #126
[ 3482.075639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 3482.075639] task: ffff88001cd41a40 ti: ffff88001cd422c8 task.ti: ffff88001cd422c8
[ 3482.075639] RIP: 0010:[<ffffffff93722bd3>]  [<ffffffff93722bd3>] strncpy+0x13/0x30
[ 3482.075639] RSP: 0018:ffff88001f713b60  EFLAGS: 00010202
[ 3482.075639] RAX: ffff88001f6c4430 RBX: ffff88001f6c43a0 RCX: ffff88001f6c4430
[ 3482.075639] RDX: 0000000000000040 RSI: fefefefefefeff16 RDI: ffff88001f6c4430
[ 3482.075639] RBP: ffff88001f713b60 R08: ffff88001f6c4470 R09: ffff88001f6c4480
[ 3482.075639] R10: 0000000000000002 R11: 0000000000000246 R12: ffff88001ce2aa28
[ 3482.075639] R13: ffff880000093700 R14: ffff88001f5e4bf8 R15: 0000000000003b20
[ 3482.075639] FS:  0000033826fa2700(0000) GS:ffff88001e900000(0000) knlGS:0000000000000000
[ 3482.075639] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3482.075639] CR2: ffffffffff600400 CR3: 00000000139ec000 CR4: 00000000001606f0
[ 3482.075639] Stack:
[ 3482.075639]  ffff88001f713bd8 ffffffff936ccd00 ffff88001e5c4200 ffff880000093700
[ 3482.075639]  ffff88001f713bd0 ffffffff938ef4bf 0000000000000000 0000000000003b20
[ 3482.075639]  ffff88001f5e4bf8 ffff88001f5e4848 0000000000000000 0000000000003b20
[ 3482.075639] Call Trace:
[ 3482.075639]  [<ffffffff936ccd00>] crypto_report_alg+0xc0/0x3e0
[ 3482.075639]  [<ffffffff938ef4bf>] ? __alloc_skb+0x16f/0x300
[ 3482.075639]  [<ffffffff936cd08a>] crypto_dump_report+0x6a/0x90
[ 3482.075639]  [<ffffffff93935707>] netlink_dump+0x147/0x2e0
[ 3482.075639]  [<ffffffff93935f99>] __netlink_dump_start+0x159/0x190
[ 3482.075639]  [<ffffffff936ccb13>] crypto_user_rcv_msg+0xc3/0x130
[ 3482.075639]  [<ffffffff936cd020>] ? crypto_report_alg+0x3e0/0x3e0
[ 3482.075639]  [<ffffffff936cc4b0>] ? alg_test_crc32c+0x120/0x120
[ 3482.075639]  [<ffffffff93933145>] ? __netlink_lookup+0xd5/0x120
[ 3482.075639]  [<ffffffff936cca50>] ? crypto_add_alg+0x1d0/0x1d0
[ 3482.075639]  [<ffffffff93938141>] netlink_rcv_skb+0xe1/0x130
[ 3482.075639]  [<ffffffff936cc4f8>] crypto_netlink_rcv+0x28/0x40
[ 3482.075639]  [<ffffffff939375a8>] netlink_unicast+0x108/0x180
[ 3482.075639]  [<ffffffff93937c21>] netlink_sendmsg+0x541/0x770
[ 3482.075639]  [<ffffffff938e31e1>] sock_sendmsg+0x21/0x40
[ 3482.075639]  [<ffffffff938e4763>] SyS_sendto+0xf3/0x130
[ 3482.075639]  [<ffffffff93444203>] ? bad_area_nosemaphore+0x13/0x20
[ 3482.075639]  [<ffffffff93444470>] ? __do_page_fault+0x80/0x3a0
[ 3482.075639]  [<ffffffff939d80cb>] entry_SYSCALL_64_fastpath+0x12/0x6e
[ 3482.075639] Code: 88 4a ff 75 ed 5d 48 0f ba 2c 24 3f c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 d2 48 89 f8 48 89 f9 4c 8d 04 17 48 89 e5 74 15 <0f> b6 16 80 fa 01 88 11 48 83 de ff 48 83 c1 01 4c 39 c1 75 eb
[ 3482.075639] RIP  [<ffffffff93722bd3>] strncpy+0x13/0x30

To trigger the race run the following loops simultaneously for a while:
  $ while : ; do modprobe aesni-intel; rmmod aesni-intel; done
  $ while : ; do crconf show all > /dev/null; done

Fix the race by taking the crypto_alg_sem read lock, thereby preventing
crypto_unregister_alg() from modifying the algorithm list during the
dump.

This bug has been detected by the PaX memory sanitize feature.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit 3d5fe03 upstream.

We can end up allocating a new compression stream with GFP_KERNEL from
within the IO path, which may result is nested (recursive) IO
operations.  That can introduce problems if the IO path in question is a
reclaimer, holding some locks that will deadlock nested IOs.

Allocate streams and working memory using GFP_NOIO flag, forbidding
recursive IO and FS operations.

An example:

  inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
  git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
   (jbd2_handle){+.+.?.}, at:  start_this_handle+0x4ca/0x555
  {IN-RECLAIM_FS-W} state was registered at:
     __lock_acquire+0x8da/0x117b
     lock_acquire+0x10c/0x1a7
     start_this_handle+0x52d/0x555
     jbd2__journal_start+0xb4/0x237
     __ext4_journal_start_sb+0x108/0x17e
     ext4_dirty_inode+0x32/0x61
     __mark_inode_dirty+0x16b/0x60c
     iput+0x11e/0x274
     __dentry_kill+0x148/0x1b8
     shrink_dentry_list+0x274/0x44a
     prune_dcache_sb+0x4a/0x55
     super_cache_scan+0xfc/0x176
     shrink_slab.part.14.constprop.25+0x2a2/0x4d3
     shrink_zone+0x74/0x140
     kswapd+0x6b7/0x930
     kthread+0x107/0x10f
     ret_from_fork+0x3f/0x70
  irq event stamp: 138297
  hardirqs last  enabled at (138297):  debug_check_no_locks_freed+0x113/0x12f
  hardirqs last disabled at (138296):  debug_check_no_locks_freed+0x33/0x12f
  softirqs last  enabled at (137818):  __do_softirq+0x2d3/0x3e9
  softirqs last disabled at (137813):  irq_exit+0x41/0x95

               other info that might help us debug this:
   Possible unsafe locking scenario:
         CPU0
         ----
    lock(jbd2_handle);
    <Interrupt>
      lock(jbd2_handle);

                *** DEADLOCK ***
  5 locks held by git/20158:
   #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
   #1:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
   #2:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
   #3:  (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
   #4:  (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555

               stack backtrace:
  CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
  Call Trace:
    dump_stack+0x4c/0x6e
    mark_lock+0x384/0x56d
    mark_held_locks+0x5f/0x76
    lockdep_trace_alloc+0xb2/0xb5
    kmem_cache_alloc_trace+0x32/0x1e2
    zcomp_strm_alloc+0x25/0x73 [zram]
    zcomp_strm_multi_find+0xe7/0x173 [zram]
    zcomp_strm_find+0xc/0xe [zram]
    zram_bvec_rw+0x2ca/0x7e0 [zram]
    zram_make_request+0x1fa/0x301 [zram]
    generic_make_request+0x9c/0xdb
    submit_bio+0xf7/0x120
    ext4_io_submit+0x2e/0x43
    ext4_bio_write_page+0x1b7/0x300
    mpage_submit_page+0x60/0x77
    mpage_map_and_submit_buffers+0x10f/0x21d
    ext4_writepages+0xc8c/0xe1b
    do_writepages+0x23/0x2c
    __filemap_fdatawrite_range+0x84/0x8b
    filemap_flush+0x1c/0x1e
    ext4_alloc_da_blocks+0xb8/0x117
    ext4_rename+0x132/0x6dc
    ? mark_held_locks+0x5f/0x76
    ext4_rename2+0x29/0x2b
    vfs_rename+0x540/0x636
    SyS_renameat2+0x359/0x44d
    SyS_rename+0x1e/0x20
    entry_SYSCALL_64_fastpath+0x12/0x6f

[minchan@kernel.org: add stable mark]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit f4eafd8 upstream.

A kernel page fault oops with the callstack below was observed
when a read syscall was made to a pmem device after a huge amount
(>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
system:

     BUG: unable to handle kernel paging request at ffff880840000ff8
     IP: vmalloc_fault+0x1be/0x300
     PGD c7f03a067 PUD 0
     Oops: 0000 [#1] SM
     Call Trace:
        __do_page_fault+0x285/0x3e0
        do_page_fault+0x2f/0x80
        ? put_prev_entity+0x35/0x7a0
        page_fault+0x28/0x30
        ? memcpy_erms+0x6/0x10
        ? schedule+0x35/0x80
        ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
        ? schedule_timeout+0x183/0x240
        btt_log_read+0x63/0x140 [nd_btt]
         :
        ? __symbol_put+0x60/0x60
        ? kernel_read+0x50/0x80
        SyS_finit_module+0xb9/0xf0
        entry_SYSCALL_64_fastpath+0x1a/0xa4

Since v4.1, ioremap() supports large page (pud/pmd) mappings in
x86_64 and PAE.  vmalloc_fault() however assumes that the vmalloc
range is limited to pte mappings.

vmalloc faults do not normally happen in ioremap'd ranges since
ioremap() sets up the kernel page tables, which are shared by
user processes.  pgd_ctor() sets the kernel's PGD entries to
user's during fork().  When allocation of the vmalloc ranges
crosses a 512GB boundary, ioremap() allocates a new pud table
and updates the kernel PGD entry to point it.  If user process's
PGD entry does not have this update yet, a read/write syscall
to the range will cause a vmalloc fault, which hits the Oops
above as it does not handle a large page properly.

Following changes are made to vmalloc_fault().

64-bit:

 - No change for the PGD sync operation as it handles large
   pages already.
 - Add pud_huge() and pmd_huge() to the validation code to
   handle large pages.
 - Change pud_page_vaddr() to pud_pfn() since an ioremap range
   is not directly mapped (while the if-statement still works
   with a bogus addr).
 - Change pmd_page() to pmd_pfn() since an ioremap range is not
   backed by struct page (while the if-statement still works
   with a bogus addr).

32-bit:
 - No change for the sync operation since the index3 PGD entry
   covers the entire vmalloc range, which is always valid.
   (A separate change to sync PGD entry is necessary if this
    memory layout is changed regardless of the page size.)
 - Add pmd_huge() to the validation code to handle large pages.
   This is for completeness since vmalloc_fault() won't happen
   in ioremap'd ranges as its PGD entry is always valid.

Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit bc4ef75 upstream.

The value of ctx->pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
larger value, then it's LLONG_MAX.

There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
the increment.

We can get to that situation like that:

* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
  bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX

Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.

The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
print shows that pattern:

 Overflow: e
 Overflow: 7fffffff
 Overflow: 7fffffffffffffff
 PAX: size overflow detected in function btrfs_real_readdir
   fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
   context: dir_context;
 CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
 Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
  ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
  ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
  ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
 Call Trace:
  [<ffffffff81742f0f>] dump_stack+0x4c/0x7f
  [<ffffffff811cb706>] report_size_overflow+0x36/0x40
  [<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0
  [<ffffffff811dafc8>] iterate_dir+0xa8/0x150
  [<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70
  [<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0
 Overflow: 1a
  [<ffffffff811db070>] ? iterate_dir+0x150/0x150
  [<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83

The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.

The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.

References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
Reported-by: Victor <services@swwu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit e0bd70c upstream.

In the extent_same ioctl we are getting the pages for the source and
target ranges and unlocking them immediately after, which is incorrect
because later we attempt to map them (with kmap_atomic) and access their
contents at btrfs_cmp_data(). When we do such access the pages might have
been relocated or removed from memory, which leads to an invalid memory
access. This issue is detected on a kernel with CONFIG_DEBUG_PAGEALLOC=y
which produces a trace like the following:

186736.677437] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[186736.680382] Modules linked in: btrfs dm_flakey dm_mod ppdev xor raid6_pq sha256_generic hmac drbg ansi_cprng acpi_cpufreq evdev sg aesni_intel aes_x86_64
parport_pc ablk_helper tpm_tis psmouse parport i2c_piix4 tpm cryptd i2c_core lrw processor button serio_raw pcspkr gf128mul glue_helper loop autofs4 ext4
crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last
unloaded: btrfs]
[186736.681319] CPU: 13 PID: 10222 Comm: duperemove Tainted: G        W       4.4.0-rc6-btrfs-next-18+ #1
[186736.681319] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[186736.681319] task: ffff880132600400 ti: ffff880362284000 task.ti: ffff880362284000
[186736.681319] RIP: 0010:[<ffffffff81264d00>]  [<ffffffff81264d00>] memcmp+0xb/0x22
[186736.681319] RSP: 0018:ffff880362287d70  EFLAGS: 00010287
[186736.681319] RAX: 000002c002468acf RBX: 0000000012345678 RCX: 0000000000000000
[186736.681319] RDX: 0000000000001000 RSI: 0005d129c5cf9000 RDI: 0005d129c5cf9000
[186736.681319] RBP: ffff880362287d70 R08: 0000000000000000 R09: 0000000000001000
[186736.681319] R10: ffff880000000000 R11: 0000000000000476 R12: 0000000000001000
[186736.681319] R13: ffff8802f91d4c88 R14: ffff8801f2a77830 R15: ffff880352e83e40
[186736.681319] FS:  00007f27b37fe700(0000) GS:ffff88043dda0000(0000) knlGS:0000000000000000
[186736.681319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[186736.681319] CR2: 00007f27a406a000 CR3: 0000000217421000 CR4: 00000000001406e0
[186736.681319] Stack:
[186736.681319]  ffff880362287ea0 ffffffffa048d0bd 000000000009f000 0000000000001000
[186736.681319]  0100000000000000 ffff8801f2a77850 ffff8802f91d49b0 ffff880132600400
[186736.681319]  00000000000004f8 ffff8801c1efbe41 0000000000000000 0000000000000038
[186736.681319] Call Trace:
[186736.681319]  [<ffffffffa048d0bd>] btrfs_ioctl+0x24cb/0x2731 [btrfs]
[186736.681319]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[186736.681319]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[186736.681319]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[186736.681319]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[186736.681319]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[186736.681319]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[186736.681319] Code: 0a 3c 6e 74 0d 3c 79 74 04 3c 59 75 0c c6 06 01 eb 03 c6 06 00 31 c0 eb 05 b8 ea ff ff ff 5d c3 55 31 c9 48 89 e5 48 39 d1 74 13 <0f> b6
04 0f 44 0f b6 04 0e 48 ff c1 44 29 c0 74 ea eb 02 31 c0

(gdb) list *(btrfs_ioctl+0x24cb)
0x5e0e1 is in btrfs_ioctl (fs/btrfs/ioctl.c:2972).
2967                    dst_addr = kmap_atomic(dst_page);
2968
2969                    flush_dcache_page(src_page);
2970                    flush_dcache_page(dst_page);
2971
2972                    if (memcmp(addr, dst_addr, cmp_len))
2973                            ret = BTRFS_SAME_DATA_DIFFERS;
2974
2975                    kunmap_atomic(addr);
2976                    kunmap_atomic(dst_addr);

So fix this by making sure we keep the pages locked and respect the same
locking order as everywhere else: get and lock the pages first and then
lock the range in the inode's io tree (like for example at
__btrfs_buffered_write() and extent_readpages()). If an ordered extent
is found after locking the range in the io tree, unlock the range,
unlock the pages, wait for the ordered extent to complete and repeat the
entire locking process until no overlapping ordered extents are found.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit 0c0fe3b upstream.

While doing some tests I ran into an hang on an extent buffer's rwlock
that produced the following trace:

[39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166]
[39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165]
[39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800016] irq event stamp: 0
[39389.800016] hardirqs last  enabled at (0): [<          (null)>]           (null)
[39389.800016] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800016] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800016] softirqs last disabled at (0): [<          (null)>]           (null)
[39389.800016] CPU: 14 PID: 32165 Comm: fdm-stress Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000
[39389.800016] RIP: 0010:[<ffffffff810902af>]  [<ffffffff810902af>] queued_spin_lock_slowpath+0x57/0x158
[39389.800016] RSP: 0018:ffff8800a185fb80  EFLAGS: 00000202
[39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101
[39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001
[39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000
[39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98
[39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40
[39389.800016] FS:  00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000
[39389.800016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800016] Stack:
[39389.800016]  ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0
[39389.800016]  ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895
[39389.800016]  ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c
[39389.800016] Call Trace:
[39389.800016]  [<ffffffff81091e11>] queued_read_lock_slowpath+0x46/0x60
[39389.800016]  [<ffffffff81091895>] do_raw_read_lock+0x3e/0x41
[39389.800016]  [<ffffffff81486c5c>] _raw_read_lock+0x3d/0x44
[39389.800016]  [<ffffffffa067288c>] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [<ffffffffa067288c>] btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [<ffffffffa0622ced>] ? btrfs_find_item+0xa7/0xd2 [btrfs]
[39389.800016]  [<ffffffffa069363f>] btrfs_ref_to_path+0xd6/0x174 [btrfs]
[39389.800016]  [<ffffffffa0693730>] inode_to_path+0x53/0xa2 [btrfs]
[39389.800016]  [<ffffffffa0693e2e>] paths_from_inode+0x117/0x2ec [btrfs]
[39389.800016]  [<ffffffffa0670cff>] btrfs_ioctl+0xd5b/0x2793 [btrfs]
[39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800016]  [<ffffffff81276727>] ? __this_cpu_preempt_check+0x13/0x15
[39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800016]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[39389.800016]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800016]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[39389.800016]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800016]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8
[39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800012] irq event stamp: 0
[39389.800012] hardirqs last  enabled at (0): [<          (null)>]           (null)
[39389.800012] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800012] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800012] softirqs last disabled at (0): [<          (null)>]           (null)
[39389.800012] CPU: 15 PID: 32166 Comm: fdm-stress Tainted: G             L  4.4.0-rc6-btrfs-next-18+ #1
[39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000
[39389.800012] RIP: 0010:[<ffffffff81091e8d>]  [<ffffffff81091e8d>] queued_write_lock_slowpath+0x62/0x72
[39389.800012] RSP: 0018:ffff880034a639f0  EFLAGS: 00000206
[39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000
[39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c
[39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000
[39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98
[39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00
[39389.800012] FS:  00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000
[39389.800012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800012] Stack:
[39389.800012]  ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98
[39389.800012]  ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00
[39389.800012]  ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58
[39389.800012] Call Trace:
[39389.800012]  [<ffffffff81091963>] do_raw_write_lock+0x72/0x8c
[39389.800012]  [<ffffffff81486f1b>] _raw_write_lock+0x3a/0x41
[39389.800012]  [<ffffffffa0672cb3>] ? btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [<ffffffffa0672cb3>] btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [<ffffffffa061aeba>] ? rcu_read_unlock+0x5b/0x5d [btrfs]
[39389.800012]  [<ffffffffa061ce13>] ? btrfs_root_node+0xda/0xe6 [btrfs]
[39389.800012]  [<ffffffffa061ce83>] btrfs_lock_root_node+0x22/0x42 [btrfs]
[39389.800012]  [<ffffffffa062046b>] btrfs_search_slot+0x1b8/0x758 [btrfs]
[39389.800012]  [<ffffffff810fc6b0>] ? time_hardirqs_on+0x15/0x28
[39389.800012]  [<ffffffffa06365db>] btrfs_lookup_inode+0x31/0x95 [btrfs]
[39389.800012]  [<ffffffff8108d62f>] ? trace_hardirqs_on+0xd/0xf
[39389.800012]  [<ffffffff8148482b>] ? mutex_lock_nested+0x397/0x3bc
[39389.800012]  [<ffffffffa068821b>] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs]
[39389.800012]  [<ffffffffa068858e>] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs]
[39389.800012]  [<ffffffff81486ab7>] ? _raw_spin_unlock+0x31/0x44
[39389.800012]  [<ffffffffa0688a48>] __btrfs_run_delayed_items+0xa4/0x15c [btrfs]
[39389.800012]  [<ffffffffa0688d62>] btrfs_run_delayed_items+0x11/0x13 [btrfs]
[39389.800012]  [<ffffffffa064048e>] btrfs_commit_transaction+0x234/0x96e [btrfs]
[39389.800012]  [<ffffffffa0618d10>] btrfs_sync_fs+0x145/0x1ad [btrfs]
[39389.800012]  [<ffffffffa0671176>] btrfs_ioctl+0x11d2/0x2793 [btrfs]
[39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
[39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
[39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800012]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[39389.800012]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800012]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[39389.800012]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800012]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 <8b> 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00

This happens because in the code path executed by the inode_paths ioctl we
end up nesting two calls to read lock a leaf's rwlock when after the first
call to read_lock() and before the second call to read_lock(), another
task (running the delayed items as part of a transaction commit) has
already called write_lock() against the leaf's rwlock. This situation is
illustrated by the following diagram:

         Task A                       Task B

  btrfs_ref_to_path()               btrfs_commit_transaction()
    read_lock(&eb->lock);

                                      btrfs_run_delayed_items()
                                        __btrfs_commit_inode_delayed_items()
                                          __btrfs_update_delayed_inode()
                                            btrfs_lookup_inode()

                                              write_lock(&eb->lock);
                                                --> task waits for lock

    read_lock(&eb->lock);
    --> makes this task hang
        forever (and task B too
	of course)

So fix this by avoiding doing the nested read lock, which is easily
avoidable. This issue does not happen if task B calls write_lock() after
task A does the second call to read_lock(), however there does not seem
to exist anything in the documentation that mentions what is the expected
behaviour for recursive locking of rwlocks (leaving the idea that doing
so is not a good usage of rwlocks).

Also, as a side effect necessary for this fix, make sure we do not
needlessly read lock extent buffers when the input path has skip_locking
set (used when called from send).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit ec183d2 upstream.

Fixes segmentation fault using, for instance:

  (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".

 Program received signal SIGSEGV, Segmentation fault.
  0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  (gdb) bt
  #0  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  #1  0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:433
  #2  0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:498
  #3  0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:936
  #4  0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391
  #5  0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361
  #6  0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401
  #7  0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253
  #8  0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364
  #9  0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664
  #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539
  #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264
  #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390
  #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451
  #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495
  #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618
(gdb)

Intel PT attempts to find the sched:sched_switch tracepoint but that seg
faults if tracefs is not readable, because the error reporting structure
is null, as errors are not reported when automatically adding
tracepoints.  Fix by checking before using.

Committer note:

This doesn't take place in a kernel that supports
perf_event_attr.context_switch, that is the default way that will be
used for tracking context switches, only in older kernels, like 4.2, in
a machine with Intel PT (e.g. Broadwell) for non-priviledged users.

Further info from a similar patch by Wang:

The error is in tracepoint_error: it assumes the 'e' parameter is valid.

However, there are many situation a parse_event() can be called without
parse_events_error. See result of

  $ grep 'parse_events(.*NULL)' ./tools/perf/ -r'

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tong Zhang <ztong@vt.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit 90a88d6 upstream.

This softlockup is currently happening:

[  444.088002] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:29]
[  444.088002] Modules linked in: lpfc(-) qla2x00tgt(O) qla2xxx_scst(O) scst_vdisk(O) scsi_transport_fc libcrc32c scst(O) dlm configfs nfsd lockd grace nfs_acl auth_rpcgss sunrpc ed
d snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device dm_mod iTCO_wdt snd_hda_codec_realtek snd_hda_codec_generic gpio_ich iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hda
_core snd_hwdep tg3 snd_pcm snd_timer libphy lpc_ich parport_pc ptp acpi_cpufreq snd pps_core fjes parport i2c_i801 ehci_pci tpm_tis tpm sr_mod cdrom soundcore floppy hwmon sg 8250_
fintek pcspkr i915 drm_kms_helper uhci_hcd ehci_hcd drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit usbcore button video usb_common fan ata_generic ata_piix libata th
ermal
[  444.088002] CPU: 1 PID: 29 Comm: kworker/1:1 Tainted: G           O    4.4.0-rc5-2.g1e923a3-default #1
[  444.088002] Hardware name: FUJITSU SIEMENS ESPRIMO E           /D2164-A1, BIOS 5.00 R1.10.2164.A1               05/08/2006
[  444.088002] Workqueue: fc_wq_4 fc_rport_final_delete [scsi_transport_fc]
[  444.088002] task: f6266ec0 ti: f6268000 task.ti: f6268000
[  444.088002] EIP: 0060:[<c07e7044>] EFLAGS: 00000286 CPU: 1
[  444.088002] EIP is at _raw_spin_unlock_irqrestore+0x14/0x20
[  444.088002] EAX: 00000286 EBX: f20d3800 ECX: 00000002 EDX: 00000286
[  444.088002] ESI: f50ba800 EDI: f2146848 EBP: f6269ec8 ESP: f6269ec8
[  444.088002]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  444.088002] CR0: 8005003b CR2: 08f96600 CR3: 363ae000 CR4: 000006d0
[  444.088002] Stack:
[  444.088002]  f6269eec c066b0f7 00000286 f2146848 f50ba808 f50ba800 f50ba800 f2146a90
[  444.088002]  f2146848 f6269f08 f8f0a4ed f3141000 f2146800 f2146a90 f619fa00 00000040
[  444.088002]  f6269f40 c026cb25 00000001 166c6392 00000061 f6757140 f6136340 00000004
[  444.088002] Call Trace:
[  444.088002]  [<c066b0f7>] scsi_remove_target+0x167/0x1c0
[  444.088002]  [<f8f0a4ed>] fc_rport_final_delete+0x9d/0x1e0 [scsi_transport_fc]
[  444.088002]  [<c026cb25>] process_one_work+0x155/0x3e0
[  444.088002]  [<c026cde7>] worker_thread+0x37/0x490
[  444.088002]  [<c027214b>] kthread+0x9b/0xb0
[  444.088002]  [<c07e72c1>] ret_from_kernel_thread+0x21/0x40

What appears to be happening is that something has pinned the target
so it can't go into STARGET_DEL via final release and the loop in
scsi_remove_target spins endlessly until that happens.

The fix for this soft lockup is to not keep looping over a device that
we've called remove on but which hasn't gone into DEL state.  This
patch will retain a simplistic memory of the last target and not keep
looping over it.

Reported-by: Sebastian Herbszt <herbszt@gmx.de>
Tested-by: Sebastian Herbszt <herbszt@gmx.de>
Fixes: 4099819
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit 361cad3 upstream.

We've seen this in a packet capture - I've intermixed what I
think was going on. The fix here is to grab the so_lock sooner.

1964379 -> #1 open (for write) reply seqid=1
1964393 -> #2 open (for read) reply seqid=2

  __nfs4_close(), state->n_wronly--
  nfs4_state_set_mode_locked(), changes state->state = [R]
  state->flags is [RW]
  state->state is [R], state->n_wronly == 0, state->n_rdonly == 1

1964398 -> #3 open (for write) call -> because close is already running
1964399 -> downgrade (to read) call seqid=2 (close of #1)
1964402 -> #3 open (for write) reply seqid=3

 __update_open_stateid()
   nfs_set_open_stateid_locked(), changes state->flags
   state->flags is [RW]
   state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
   new sequence number is exposed now via nfs4_stateid_copy()

   next step would be update_open_stateflags(), pending so_lock

1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)

   nfs4_close_prepare() gets so_lock and recalcs flags -> send close

1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)

   __update_open_stateid() gets so_lock
 * update_open_stateflags() updates state->n_wronly.
   nfs4_state_set_mode_locked() updates state->state

   state->flags is [RW]
   state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1

 * should have suppressed the preceding nfs4_close_prepare() from
   sending open_downgrade

1964406 -> write call
1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)

   nfs_clear_open_stateid_locked()
   state->flags is [R]
   state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1

1964409 -> write reply (fails, openmode)

Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit 57adec8 upstream.

Calling apply_to_page_range with an empty range results in a BUG_ON
from the core code. This can be triggered by trying to load the st_drv
module with CONFIG_DEBUG_SET_MODULE_RONX enabled:

  kernel BUG at mm/memory.c:1874!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 3 PID: 1764 Comm: insmod Not tainted 4.5.0-rc1+ #2
  Hardware name: ARM Juno development board (r0) (DT)
  task: ffffffc9763b8000 ti: ffffffc975af8000 task.ti: ffffffc975af8000
  PC is at apply_to_page_range+0x2cc/0x2d0
  LR is at change_memory_common+0x80/0x108

This patch fixes the issue by making change_memory_common (called by the
set_memory_* functions) a NOP when numpages == 0, therefore avoiding the
erroneous call to apply_to_page_range and bringing us into line with x86
and s390.

Reviewed-by: Laura Abbott <labbott@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mika Penttilä <mika.penttila@nextfour.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit d96b339 upstream.

I saw the following BUG_ON triggered in a testcase where a process calls
madvise(MADV_SOFT_OFFLINE) on thps, along with a background process that
calls migratepages command repeatedly (doing ping-pong among different
NUMA nodes) for the first process:

   Soft offlining page 0x60000 at 0x700000600000
   __get_any_page: 0x60000 free buddy page
   page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
   flags: 0x1fffc0000000000()
   page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
   ------------[ cut here ]------------
   kernel BUG at /src/linux-dev/include/linux/mm.h:342!
   invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
   Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
   CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
   Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
   RIP: 0010:[<ffffffff8118998c>]  [<ffffffff8118998c>] put_page+0x5c/0x60
   RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
   Call Trace:
     put_hwpoison_page+0x4e/0x80
     soft_offline_page+0x501/0x520
     SyS_madvise+0x6bc/0x6f0
     entry_SYSCALL_64_fastpath+0x12/0x6a
   Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
   RIP  [<ffffffff8118998c>] put_page+0x5c/0x60
    RSP <ffff88007c213e00>

The root cause resides in get_any_page() which retries to get a refcount
of the page to be soft-offlined.  This function calls
put_hwpoison_page(), expecting that the target page is putback to LRU
list.  But it can be also freed to buddy.  So the second check need to
care about such case.

Fixes: af8fae7 ("mm/memory-failure.c: clean up soft_offline_page()")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit ddf1d39 upstream.

An unprivileged user can trigger an oops on a kernel with
CONFIG_CHECKPOINT_RESTORE.

proc_pid_cmdline_read takes mmap_sem for reading and obtains args + env
start/end values. These get sanity checked as follows:
        BUG_ON(arg_start > arg_end);
        BUG_ON(env_start > env_end);

These can be changed by prctl_set_mm. Turns out also takes the semaphore for
reading, effectively rendering it useless. This results in:

  kernel BUG at fs/proc/base.c:240!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: virtio_net
  CPU: 0 PID: 925 Comm: a.out Not tainted 4.4.0-rc8-next-20160105dupa+ #71
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff880077a68000 ti: ffff8800784d0000 task.ti: ffff8800784d0000
  RIP: proc_pid_cmdline_read+0x520/0x530
  RSP: 0018:ffff8800784d3db8  EFLAGS: 00010206
  RAX: ffff880077c5b6b0 RBX: ffff8800784d3f18 RCX: 0000000000000000
  RDX: 0000000000000002 RSI: 00007f78e8857000 RDI: 0000000000000246
  RBP: ffff8800784d3e40 R08: 0000000000000008 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000050
  R13: 00007f78e8857800 R14: ffff88006fcef000 R15: ffff880077c5b600
  FS:  00007f78e884a740(0000) GS:ffff88007b200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007f78e8361770 CR3: 00000000790a5000 CR4: 00000000000006f0
  Call Trace:
    __vfs_read+0x37/0x100
    vfs_read+0x82/0x130
    SyS_read+0x58/0xd0
    entry_SYSCALL_64_fastpath+0x12/0x76
  Code: 4c 8b 7d a8 eb e9 48 8b 9d 78 ff ff ff 4c 8b 7d 90 48 8b 03 48 39 45 a8 0f 87 f0 fe ff ff e9 d1 fe ff ff 4c 8b 7d 90 eb c6 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
  RIP   proc_pid_cmdline_read+0x520/0x530
  ---[ end trace 97882617ae9c6818 ]---

Turns out there are instances where the code just reads aformentioned
values without locking whatsoever - namely environ_read and get_cmdline.

Interestingly these functions look quite resilient against bogus values,
but I don't believe this should be relied upon.

The first patch gets rid of the oops bug by grabbing mmap_sem for
writing.

The second patch is optional and puts locking around aformentioned
consumers for safety.  Consumers of other fields don't seem to benefit
from similar treatment and are left untouched.

This patch (of 2):

The code was taking the semaphore for reading, which does not protect
against readers nor concurrent modifications.

The problem could cause a sanity checks to fail in procfs's cmdline
reader, resulting in an OOPS.

Note that some functions perform an unlocked read of various mm fields,
but they seem to be fine despite possible modificaton.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anshuman Khandual <anshuman.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
[ Upstream commit 7716682 ]

Ilya reported following lockdep splat:

kernel: =========================
kernel: [ BUG: held lock freed! ]
kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
kernel: -------------------------
kernel: swapper/5/0 is freeing memory
ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
kernel: 4 locks held by swapper/5/0:
kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
netif_receive_skb_internal+0x4b/0x1f0
kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
ip_local_deliver_finish+0x3f/0x380
kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
sk_clone_lock+0x19b/0x440
kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0

To properly fix this issue, inet_csk_reqsk_queue_add() needs
to return to its callers if the child as been queued
into accept queue.

We also need to make sure listener is still there before
calling sk->sk_data_ready(), by holding a reference on it,
since the reference carried by the child can disappear as
soon as the child is put on accept queue.

Reported-by: Ilya Dryomov <idryomov@gmail.com>
Fixes: ebb516a ("tcp/dccp: fix race at listener dismantle phase")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
…vice

commit fecaee6 upstream.

This bug can be reproduced by the following script:

  #!/bin/bash

  bcache_sysfs="/sys/fs/bcache"

  function clear_cache()
  {
  	if [ ! -e $bcache_sysfs ]; then
  		echo "no bcache sysfs"
  		exit
  	fi

  	cset_uuid=$(ls -l $bcache_sysfs|head -n 2|tail -n 1|awk '{print $9}')
  	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/detach"
  	sleep 5
  	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/attach"
  }

  for ((i=0;i<10;i++)); do
  	clear_cache
  done

The warning messages look like below:
[  275.948611] ------------[ cut here ]------------
[  275.963840] WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xb8/0xd0() (Tainted: P        W
---------------   )
[  275.979253] Hardware name: Tecal RH2285
[  275.994106] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:09.0/0000:08:00.0/host4/target4:2:1/4:2:1:0/block/sdb/sdb1/bcache/cache'
[  276.024105] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler
bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801
i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas
pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
[  276.072643] Pid: 2765, comm: sh Tainted: P        W  ---------------    2.6.32 #1
[  276.089315] Call Trace:
[  276.105801]  [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
[  276.122650]  [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
[  276.139361]  [<ffffffff81205c08>] ? sysfs_add_one+0xb8/0xd0
[  276.156012]  [<ffffffff8120609b>] ? sysfs_do_create_link+0x12b/0x170
[  276.172682]  [<ffffffff81206113>] ? sysfs_create_link+0x13/0x20
[  276.189282]  [<ffffffffa03bda21>] ? bcache_device_link+0xc1/0x110 [bcache]
[  276.205993]  [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
[  276.222794]  [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
[  276.239680]  [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
[  276.256594]  [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
[  276.273364]  [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
[  276.290133]  [<ffffffff811890b1>] ? sys_write+0x51/0x90
[  276.306368]  [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
[  276.322301] ---[ end trace 9f5d4fcdd0c3edfb ]---
[  276.338241] ------------[ cut here ]------------
[  276.354109] WARNING: at /home/wenqing.lz/bcache/bcache/super.c:720
bcache_device_link+0xdf/0x110 [bcache]() (Tainted: P        W  ---------------   )
[  276.386017] Hardware name: Tecal RH2285
[  276.401430] Couldn't create device <-> cache set symlinks
[  276.401759] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler
bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801
i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas
pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
[  276.465477] Pid: 2765, comm: sh Tainted: P        W  ---------------    2.6.32 #1
[  276.482169] Call Trace:
[  276.498610]  [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
[  276.515405]  [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
[  276.532059]  [<ffffffffa03bda3f>] ? bcache_device_link+0xdf/0x110 [bcache]
[  276.548808]  [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
[  276.565569]  [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
[  276.582418]  [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
[  276.599341]  [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
[  276.616142]  [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
[  276.632607]  [<ffffffff811890b1>] ? sys_write+0x51/0x90
[  276.648671]  [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
[  276.664756] ---[ end trace 9f5d4fcdd0c3edfc ]---

We forget to clear BCACHE_DEV_UNLINK_DONE flag in bcache_device_attach()
function when we attach a backing device first time.  After detaching this
backing device, this flag will be true and sysfs_remove_link() isn't called in
bcache_device_unlink().  Then when we attach this backing device again,
sysfs_create_link() will return EEXIST error in bcache_device_link().

So the fix is trival and we clear this flag in bcache_device_link().

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Tested-by: Joshua Schmid <jschmid@suse.com>
Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit 9f2dfda upstream.

An inverted return value check in hostfs_mknod() caused the function
to return success after handling it as an error (and cleaning up).

It resulted in the following segfault when trying to bind() a named
unix socket:

  Pid: 198, comm: a.out Not tainted 4.4.0-rc4
  RIP: 0033:[<0000000061077df6>]
  RSP: 00000000daae5d60  EFLAGS: 00010202
  RAX: 0000000000000000 RBX: 000000006092a460 RCX: 00000000dfc54208
  RDX: 0000000061073ef1 RSI: 0000000000000070 RDI: 00000000e027d600
  RBP: 00000000daae5de0 R08: 00000000da980ac0 R09: 0000000000000000
  R10: 0000000000000003 R11: 00007fb1ae08f72a R12: 0000000000000000
  R13: 000000006092a460 R14: 00000000daaa97c0 R15: 00000000daaa9a88
  Kernel panic - not syncing: Kernel mode fault at addr 0x40, ip 0x61077df6
  CPU: 0 PID: 198 Comm: a.out Not tainted 4.4.0-rc4 #1
  Stack:
   e027d620 dfc54208 0000006f da981398
   61bee000 0000c1ed daae5de0 0000006e
   e027d620 dfcd4208 00000005 6092a460
  Call Trace:
   [<60dedc67>] SyS_bind+0xf7/0x110
   [<600587be>] handle_syscall+0x7e/0x80
   [<60066ad7>] userspace+0x3e7/0x4e0
   [<6006321f>] ? save_registers+0x1f/0x40
   [<6006c88e>] ? arch_prctl+0x1be/0x1f0
   [<60054985>] fork_handler+0x85/0x90

Let's also get rid of the "cosmic ray protection" while we're at it.

Fixes: e919305 "hostfs: fix races in dentry_name() and inode_name()"
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit dcc7fdb upstream.

v4l2-compliance sends a zeroed struct v4l2_streamparm in
v4l2-test-formats.cpp::testParmType(), and this results in a division by
0 in some gspca subdrivers:

  divide error: 0000 [#1] SMP
  Modules linked in: gspca_ov534 gspca_main ...
  CPU: 0 PID: 17201 Comm: v4l2-compliance Not tainted 4.3.0-rc2-ao2 #1
  Hardware name: System manufacturer System Product Name/M2N-E SLI, BIOS
    ASUS M2N-E SLI ACPI BIOS Revision 1301 09/16/2010
  task: ffff8800818306c0 ti: ffff880095c4c000 task.ti: ffff880095c4c000
  RIP: 0010:[<ffffffffa079bd62>]  [<ffffffffa079bd62>] sd_set_streamparm+0x12/0x60 [gspca_ov534]
  RSP: 0018:ffff880095c4fce8  EFLAGS: 00010296
  RAX: 0000000000000000 RBX: ffff8800c9522000 RCX: ffffffffa077a140
  RDX: 0000000000000000 RSI: ffff880095e0c100 RDI: ffff8800c9522000
  RBP: ffff880095e0c100 R08: ffffffffa077a100 R09: 00000000000000cc
  R10: ffff880067ec7740 R11: 0000000000000016 R12: ffffffffa07bb400
  R13: 0000000000000000 R14: ffff880081b6a800 R15: 0000000000000000
  FS:  00007fda0de78740(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000014630f8 CR3: 00000000cf349000 CR4: 00000000000006f0
  Stack:
   ffffffffa07a6431 ffff8800c9522000 ffffffffa077656e 00000000c0cc5616
   ffff8800c9522000 ffffffffa07a5e20 ffff880095e0c100 0000000000000000
   ffff880067ec7740 ffffffffa077a140 ffff880067ec7740 0000000000000016
  Call Trace:
   [<ffffffffa07a6431>] ? v4l_s_parm+0x21/0x50 [videodev]
   [<ffffffffa077656e>] ? vidioc_s_parm+0x4e/0x60 [gspca_main]
   [<ffffffffa07a5e20>] ? __video_do_ioctl+0x280/0x2f0 [videodev]
   [<ffffffffa07a5ba0>] ? video_ioctl2+0x20/0x20 [videodev]
   [<ffffffffa07a59b9>] ? video_usercopy+0x319/0x4e0 [videodev]
   [<ffffffff81182dc1>] ? page_add_new_anon_rmap+0x71/0xa0
   [<ffffffff811afb92>] ? mem_cgroup_commit_charge+0x52/0x90
   [<ffffffff81179b18>] ? handle_mm_fault+0xc18/0x1680
   [<ffffffffa07a15cc>] ? v4l2_ioctl+0xac/0xd0 [videodev]
   [<ffffffff811c846f>] ? do_vfs_ioctl+0x28f/0x480
   [<ffffffff811c86d4>] ? SyS_ioctl+0x74/0x80
   [<ffffffff8154a8b6>] ? entry_SYSCALL_64_fastpath+0x16/0x75
  Code: c7 93 d9 79 a0 5b 5d e9 f1 f3 9a e0 0f 1f 00 66 2e 0f 1f 84 00
    00 00 00 00 66 66 66 66 90 53 31 d2 48 89 fb 48 83 ec 08 8b 46 10 <f7>
    76 0c 80 bf ac 0c 00 00 00 88 87 4e 0e 00 00 74 09 80 bf 4f
  RIP  [<ffffffffa079bd62>] sd_set_streamparm+0x12/0x60 [gspca_ov534]
   RSP <ffff880095c4fce8>
  ---[ end trace 279710c2c6c72080 ]---

Following what the doc says about a zeroed timeperframe (see
http://www.linuxtv.org/downloads/v4l-dvb-apis/vidioc-g-parm.html):

  ...
  To reset manually applications can just set this field to zero.

fix the issue by resetting the frame rate to a default value in case of
an unusable timeperframe.

The fix is done in the subdrivers instead of gspca.c because only the
subdrivers have notion of a default frame rate to reset the camera to.

Signed-off-by: Antonio Ospite <ao2@ao2.it>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Feb 16, 2017
commit 4c58f32 upstream.

The fixes provided in this patch assigns a valid net_device structure to
skb before dispatching it for further processing.

Scenario #1:
============

Bluetooth 6lowpan receives an uncompressed IPv6 header, and dispatches it
to netif. The following error occurs:

Null pointer dereference error #1 crash log:

[  845.854013] BUG: unable to handle kernel NULL pointer dereference at
               0000000000000048
[  845.855785] IP: [<ffffffff816e3d36>] enqueue_to_backlog+0x56/0x240
...
[  845.909459] Call Trace:
[  845.911678]  [<ffffffff816e3f64>] netif_rx_internal+0x44/0xf0

The first modification fixes the NULL pointer dereference error by
assigning dev to the local_skb in order to set a valid net_device before
processing the skb by netif_rx_ni().

Scenario #2:
============

Bluetooth 6lowpan receives an UDP compressed message which needs further
decompression by nhc_udp. The following error occurs:

Null pointer dereference error #2 crash log:

[   63.295149] BUG: unable to handle kernel NULL pointer dereference at
               0000000000000840
[   63.295931] IP: [<ffffffffc0559540>] udp_uncompress+0x320/0x626
               [nhc_udp]

The second modification fixes the NULL pointer dereference error by
assigning dev to the local_skb in the case of a udp compressed packet.
The 6lowpan udp_uncompress function expects that the net_device is set in
the skb when checking lltype.

Signed-off-by: Glenn Ruben Bakke <glenn.ruben.bakke@nordicsemi.no>
Signed-off-by: Lukasz Duda <lukasz.duda@nordicsemi.no>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
erstrom pushed a commit that referenced this issue Sep 18, 2017
netif_napi_del() should be paired with netif_napi_add(), however no
such call takes place in ftgmac100_probe(). This triggers a NULL
pointer dereference if e.g. no PHY is found by the MDIO probe:

	 [ 2.770000] libphy: Fixed MDIO Bus: probed
	 [ 2.770000] ftgmac100 1e660000.ethernet: Generated random MAC address 66:58:c0:5a:50:b8
	 [ 2.790000] libphy: ftgmac100_mdio: probed
	 [ 2.790000] ftgmac100 1e660000.ethernet (unnamed net_device) (uninitialized): eth%d: no PHY found
	 [ 2.790000] ftgmac100 1e660000.ethernet: MII Probe failed!
	 [ 2.810000] Unable to handle kernel NULL pointer dereference at virtual address 00000004
	 [ 2.810000] pgd = 80004000
	 [ 2.810000] [00000004] *pgd=00000000
	 [ 2.810000] Internal error: Oops: 805 [#1] ARM
	 [ 2.810000] CPU: 0 PID: 1 Comm: swapper Not tainted 4.10.17-1a4df30c39cf5ee0e3d2528c409787ccbb4a672a #1
	 [ 2.810000] Hardware name: ASpeed SoC
	 [ 2.810000] task: 9e421b60 task.stack: 9e4a0000
	 [ 2.810000] PC is at netif_napi_del+0x74/0xa4
	 [ 2.810000] LR is at ftgmac100_probe+0x290/0x674
	 [ 2.810000] pc : [<80331004>] lr : [<80292b30>] psr: 60000013
	 [ 2.810000] sp : 9e4a1d70 ip : 9e4a1d88 fp : 9e4a1d84
	 [ 2.810000] r10: 9e565000 r9 : ffffffed r8 : 00000007
	 [ 2.810000] r7 : 9e565480 r6 : 9ec072c0 r5 : 00000000 r4 : 9e5654d8
	 [ 2.810000] r3 : 9e565530 r2 : 00000000 r1 : 00000000 r0 : 9e5654d8
	 [ 2.810000] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
	 [ 2.810000] Control: 00c5387d Table: 80004008 DAC: 00000055
	 [ 2.810000] Process swapper (pid: 1, stack limit = 0x9e4a0188)
	 [ 2.810000] Stack: (0x9e4a1d70 to 0x9e4a2000)
	 [ 2.810000] 1d60: 9e565000 9e549e10 9e4a1dcc 9e4a1d88
	 [ 2.810000] 1d80: 80292b30 80330f9c ffffffff 9e4a1d98 80146058 9ec072c0 00009e10 00000000
	 [ 2.810000] 1da0: 9e549e18 9e549e10 ffffffed 805f81f4 fffffdfb 00000000 00000000 00000000
	 [ 2.810000] 1dc0: 9e4a1dec 9e4a1dd0 80243df8 802928ac 9e549e10 8062cbd8 8062cbe0 805f81f4
	 [ 2.810000] 1de0: 9e4a1e24 9e4a1df0 80242178 80243da4 803001d0 802ffa60 9e4a1e24 9e549e10
	 [ 2.810000] 1e00: 9e549e44 805f81f4 00000000 00000000 805b8840 8058a6b0 9e4a1e44 9e4a1e28
	 [ 2.810000] 1e20: 80242434 80241f04 00000000 805f81f4 80242344 00000000 9e4a1e6c 9e4a1e48
	 [ 2.810000] 1e40: 80240148 80242350 9e425bac 9e4fdc90 9e790e94 805f81f4 9e790e60 805f5640
	 [ 2.810000] 1e60: 9e4a1e7c 9e4a1e70 802425dc 802400d8 9e4a1ea4 9e4a1e80 80240ba8 802425c0
	 [ 2.810000] 1e80: 8050b6ac 9e4a1e90 805f81f4 ffffe000 805b8838 80616720 9e4a1ebc 9e4a1ea8
	 [ 2.810000] 1ea0: 80243068 80240a68 805ab24c ffffe000 9e4a1ecc 9e4a1ec0 80244a38 80242fec
	 [ 2.810000] 1ec0: 9e4a1edc 9e4a1ed0 805ab264 80244a04 9e4a1f4c 9e4a1ee0 8058ae70 805ab258
	 [ 2.810000] 1ee0: 80032c68 801e3fd8 8052f800 8041af2c 9e4a1f4c 9e4a1f00 80032f90 8058a6bc
	 [ 2.810000] 1f00: 9e4a1f2c 9e4a1f10 00000006 00000006 00000000 8052f220 805112f0 00000000
	 [ 2.810000] 1f20: 9e4a1f4c 00000006 80616720 805cf400 80616720 805b8838 80616720 00000057
	 [ 2.810000] 1f40: 9e4a1f94 9e4a1f50 8058b040 8058add0 00000006 00000006 00000000 8058a6b0
	 [ 2.810000] 1f60: 3940bf3d 00000007 f115c2e8 00000000 803fd158 00000000 00000000 00000000
	 [ 2.810000] 1f80: 00000000 00000000 9e4a1fac 9e4a1f98 803fd170 8058af38 00000000 803fd158
	 [ 2.810000] 1fa0: 00000000 9e4a1fb0 8000a5e8 803fd164 00000000 00000000 00000000 00000000
	 [ 2.810000] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
	 [ 2.810000] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 d11dcae8 af8ddec5
	 [ 2.810000] [<80331004>] (netif_napi_del) from [<80292b30>] (ftgmac100_probe+0x290/0x674)
	 [ 2.810000] [<80292b30>] (ftgmac100_probe) from [<80243df8>] (platform_drv_probe+0x60/0xc0)
	 [ 2.810000] [<80243df8>] (platform_drv_probe) from [<80242178>] (driver_probe_device+0x280/0x44c)
	 [ 2.810000] [<80242178>] (driver_probe_device) from [<80242434>] (__driver_attach+0xf0/0x104)
	 [ 2.810000] [<80242434>] (__driver_attach) from [<80240148>] (bus_for_each_dev+0x7c/0xb0)
	 [ 2.810000] [<80240148>] (bus_for_each_dev) from [<802425dc>] (driver_attach+0x28/0x30)
	 [ 2.810000] [<802425dc>] (driver_attach) from [<80240ba8>] (bus_add_driver+0x14c/0x268)
	 [ 2.810000] [<80240ba8>] (bus_add_driver) from [<80243068>] (driver_register+0x88/0x104)
	 [ 2.810000] [<80243068>] (driver_register) from [<80244a38>] (__platform_driver_register+0x40/0x54)
	 [ 2.810000] [<80244a38>] (__platform_driver_register) from [<805ab264>] (ftgmac100_driver_init+0x18/0x20)
	 [ 2.810000] [<805ab264>] (ftgmac100_driver_init) from [<8058ae70>] (do_one_initcall+0xac/0x168)
	 [ 2.810000] [<8058ae70>] (do_one_initcall) from [<8058b040>] (kernel_init_freeable+0x114/0x1cc)
	 [ 2.810000] [<8058b040>] (kernel_init_freeable) from [<803fd170>] (kernel_init+0x18/0x104)
	 [ 2.810000] [<803fd170>] (kernel_init) from [<8000a5e8>] (ret_from_fork+0x14/0x2c)
	 [ 2.810000] Code: e594205c e5941058 e2843058 e3a05000 (e5812004)
	 [ 3.210000] ---[ end trace f32811052fd3860c ]---

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
In commit, 139bb36 ("tipc: advance the time of deleting
subscription from subscriber->subscrp_list"), we delete the
subscription from the subscribers list and from nametable
unconditionally. This leads to the following bug if the timer
running tipc_subscrp_timeout() in another CPU accesses the
subscription list after the subscription delete request.

[39.570] general protection fault: 0000 [#1] SMP
::
[39.574] task: ffffffff81c10540 task.stack: ffffffff81c00000
[39.575] RIP: 0010:tipc_subscrp_timeout+0x32/0x80 [tipc]
[39.576] RSP: 0018:ffff88003ba03e90 EFLAGS: 00010282
[39.576] RAX: dead000000000200 RBX: ffff88003f0f3600 RCX: 0000000000000101
[39.577] RDX: dead000000000100 RSI: 0000000000000201 RDI: ffff88003f0d7948
[39.578] RBP: ffff88003ba03ea0 R08: 0000000000000001 R09: ffff88003ba03ef8
[39.579] R10: 000000000000014f R11: 0000000000000000 R12: ffff88003f0d7948
[39.580] R13: ffff88003f0f3618 R14: ffffffffa006c250 R15: ffff88003f0f3600
[39.581] FS:  0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000
[39.582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39.583] CR2: 00007f831c6e0714 CR3: 000000003d3b0000 CR4: 00000000000006f0
[39.584] Call Trace:
[39.584]  <IRQ>
[39.585]  call_timer_fn+0x3d/0x180
[39.585]  ? tipc_subscrb_rcv_cb+0x260/0x260 [tipc]
[39.586]  run_timer_softirq+0x168/0x1f0
[39.586]  ? sched_clock_cpu+0x16/0xc0
[39.587]  __do_softirq+0x9b/0x2de
[39.587]  irq_exit+0x60/0x70
[39.588]  smp_apic_timer_interrupt+0x3d/0x50
[39.588]  apic_timer_interrupt+0x86/0x90
[39.589] RIP: 0010:default_idle+0x20/0xf0
[39.589] RSP: 0018:ffffffff81c03e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[39.590] RAX: 0000000000000000 RBX: ffffffff81c10540 RCX: 0000000000000000
[39.591] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[39.592] RBP: ffffffff81c03e68 R08: 0000000000000000 R09: 0000000000000000
[39.593] R10: ffffc90001cbbe00 R11: 0000000000000000 R12: 0000000000000000
[39.594] R13: ffffffff81c10540 R14: 0000000000000000 R15: 0000000000000000
[39.595]  </IRQ>
::
[39.603] RIP: tipc_subscrp_timeout+0x32/0x80 [tipc] RSP: ffff88003ba03e90
[39.604] ---[ end trace 79ce94b7216cb459 ]---

Fixes: 139bb36 ("tipc: advance the time of deleting subscription from subscriber->subscrp_list")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
No matter whether a request is inserted into workqueue as a work item
to cancel a subscription or to delete a subscription's subscriber
asynchronously, the work items may be executed in different workers.
As a result, it doesn't mean that one request which is raised prior to
another request is definitely handled before the latter. By contrast,
if the latter request is executed before the former request, below
error may happen:

[  656.183644] BUG: spinlock bad magic on CPU#0, kworker/u8:0/12117
[  656.184487] general protection fault: 0000 [#1] SMP
[  656.185160] Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio [last unloaded: ip6_udp_tunnel]
[  656.187003] CPU: 0 PID: 12117 Comm: kworker/u8:0 Not tainted 4.11.0-rc7+ #6
[  656.187920] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  656.188690] Workqueue: tipc_rcv tipc_recv_work [tipc]
[  656.189371] task: ffff88003f5cec40 task.stack: ffffc90004448000
[  656.190157] RIP: 0010:spin_bug+0xdd/0xf0
[  656.190678] RSP: 0018:ffffc9000444bcb8 EFLAGS: 00010202
[  656.191375] RAX: 0000000000000034 RBX: ffff88003f8d1388 RCX: 0000000000000000
[  656.192321] RDX: ffff88003ba13708 RSI: ffff88003ba0cd08 RDI: ffff88003ba0cd08
[  656.193265] RBP: ffffc9000444bcd0 R08: 0000000000000030 R09: 000000006b6b6b6b
[  656.194208] R10: ffff8800bde3e000 R11: 00000000000001b4 R12: 6b6b6b6b6b6b6b6b
[  656.195157] R13: ffffffff81a3ca64 R14: ffff88003f8d1388 R15: ffff88003f8d13a0
[  656.196101] FS:  0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000
[  656.197172] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  656.197935] CR2: 00007f0b3d2e6000 CR3: 000000003ef9e000 CR4: 00000000000006f0
[  656.198873] Call Trace:
[  656.199210]  do_raw_spin_lock+0x66/0xa0
[  656.199735]  _raw_spin_lock_bh+0x19/0x20
[  656.200258]  tipc_subscrb_subscrp_delete+0x28/0xf0 [tipc]
[  656.200990]  tipc_subscrb_rcv_cb+0x45/0x260 [tipc]
[  656.201632]  tipc_receive_from_sock+0xaf/0x100 [tipc]
[  656.202299]  tipc_recv_work+0x2b/0x60 [tipc]
[  656.202872]  process_one_work+0x157/0x420
[  656.203404]  worker_thread+0x69/0x4c0
[  656.203898]  kthread+0x138/0x170
[  656.204328]  ? process_one_work+0x420/0x420
[  656.204889]  ? kthread_create_on_node+0x40/0x40
[  656.205527]  ret_from_fork+0x29/0x40
[  656.206012] Code: 48 8b 0c 25 00 c5 00 00 48 c7 c7 f0 24 a3 81 48 81 c1 f0 05 00 00 65 8b 15 61 ef f5 7e e8 9a 4c 09 00 4d 85 e4 44 8b 4b 08 74 92 <45> 8b 84 24 40 04 00 00 49 8d 8c 24 f0 05 00 00 eb 8d 90 0f 1f
[  656.208504] RIP: spin_bug+0xdd/0xf0 RSP: ffffc9000444bcb8
[  656.209798] ---[ end trace e2a800e6eb0770be ]---

In above scenario, the request of deleting subscriber was performed
earlier than the request of canceling a subscription although the
latter was issued before the former, which means tipc_subscrb_delete()
was called before tipc_subscrp_cancel(). As a result, when
tipc_subscrb_subscrp_delete() called by tipc_subscrp_cancel() was
executed to cancel a subscription, the subscription's subscriber
refcnt had been decreased to 1. After tipc_subscrp_delete() where
the subscriber was freed because its refcnt was decremented to zero,
but the subscriber's lock had to be released, as a consequence, panic
happened.

By contrast, if we increase subscriber's refcnt before
tipc_subscrb_subscrp_delete() is called in tipc_subscrp_cancel(),
the panic issue can be avoided.

Fixes: d094c4d ("tipc: add subscription refcount to avoid invalid delete")
Reported-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
Priit reported that stmmac was crashing with the trace below. This is because
phy_attached_print() is called too early right after the PHY device has been
found, but before it has a driver attached, since that is only done in
phy_probe() which occurs later.

Fix this by dealing with a possibly NULL phydev->drv point since that can
happen here, but could also happen if we voluntarily did an unbind of the
PHY device with the PHY driver.

sun7i-dwmac 1c50000.ethernet: PTP uses main clock
sun7i-dwmac 1c50000.ethernet: no reset control found
sun7i-dwmac 1c50000.ethernet: no regulator found
sun7i-dwmac 1c50000.ethernet: Ring mode enabled
sun7i-dwmac 1c50000.ethernet: DMA HW capability register supported
sun7i-dwmac 1c50000.ethernet: Normal descriptors
libphy: stmmac: probed
Unable to handle kernel NULL pointer dereference at virtual address 00000048
pgd = c0004000
[00000048] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc6-00318-g0065bd7fa384 #1
Hardware name: Allwinner sun7i (A20) Family
task: ee868000 task.stack: ee85c000
PC is at phy_attached_print+0x1c/0x8c
LR is at stmmac_mdio_register+0x12c/0x200
pc : [<c04510ac>]    lr : [<c045e6b4>]    psr: 60000013
sp : ee85ddc8  ip : 00000000  fp : c07dfb5c
r10: ee981210  r9 : 00000001  r8 : eea73000
r7 : eeaa6dd0  r6 : eeb49800  r5 : 00000000  r4 : 00000000
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : eeb49800
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 4000406a  DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0xee85c210)
Stack: (0xee85ddc8 to 0xee85e000)
ddc0:                   00000000 00000002 eeb49400 eea72000 00000000 eeb49400
dde0: c045e6b4 00000000 ffffffff eeab0810 00000000 c08051f8 ee9292c0 c016d480
de00: eea725c0 eea73000 eea72000 00000001 eea726c0 c0457d0c 00000040 00000020
de20: 00000000 c045b850 00000001 00000000 ee981200 eeab0810 eeaa6ed0 ee981210
de40: 00000000 c094a4a0 00000000 c0465180 eeaa7550 f08d0000 c9ffb90c 00000032
de60: fffffffa 00000032 ee981210 ffffffed c0a46620 fffffdfb c0a46620 c03f7be8
de80: ee981210 c0a9a388 00000000 00000000 c0a46620 c03f63e0 ee981210 c0a46620
dea0: ee981244 00000000 00000007 000000c6 c094a4a0 c03f6534 00000000 c0a46620
dec0: c03f6490 c03f49ec ee828a58 ee9217b4 c0a46620 eeaa4b00 c0a43230 c03f59fc
dee0: c08051f8 c094a49c c0a46620 c0a46620 00000000 c091c668 c093783c c03f6dfc
df00: ffffe000 00000000 c091c668 c010177c eefe0938 eefe0935 c085e200 000000c6
df20: 00000005 c0136bc8 60000013 c080b3a4 00000006 00000006 c07ce7b4 00000000
df40: c07d7ddc c07cef28 eefe0938 eefe093e c0a0b2f0 c0a641c0 c0a641c0 c0a641c0
df60: c0937834 00000007 000000c6 c094a4a0 00000000 c0900d88 00000006 00000006
df80: 00000000 c09005a8 00000000 c060ecf4 00000000 00000000 00000000 00000000
dfa0: 00000000 c060ecfc 00000000 c0107738 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffdeffff ffffffff
[<c04510ac>] (phy_attached_print) from [<c045e6b4>] (stmmac_mdio_register+0x12c/0x200)
[<c045e6b4>] (stmmac_mdio_register) from [<c045b850>] (stmmac_dvr_probe+0x850/0x96c)
[<c045b850>] (stmmac_dvr_probe) from [<c0465180>] (sun7i_gmac_probe+0x120/0x180)
[<c0465180>] (sun7i_gmac_probe) from [<c03f7be8>] (platform_drv_probe+0x50/0xac)
[<c03f7be8>] (platform_drv_probe) from [<c03f63e0>] (driver_probe_device+0x234/0x2e4)
[<c03f63e0>] (driver_probe_device) from [<c03f6534>] (__driver_attach+0xa4/0xa8)
[<c03f6534>] (__driver_attach) from [<c03f49ec>] (bus_for_each_dev+0x4c/0x9c)
[<c03f49ec>] (bus_for_each_dev) from [<c03f59fc>] (bus_add_driver+0x190/0x214)
[<c03f59fc>] (bus_add_driver) from [<c03f6dfc>] (driver_register+0x78/0xf4)
[<c03f6dfc>] (driver_register) from [<c010177c>] (do_one_initcall+0x44/0x168)
[<c010177c>] (do_one_initcall) from [<c0900d88>] (kernel_init_freeable+0x144/0x1d0)
[<c0900d88>] (kernel_init_freeable) from [<c060ecfc>] (kernel_init+0x8/0x110)
[<c060ecfc>] (kernel_init) from [<c0107738>] (ret_from_fork+0x14/0x3c)
Code: e59021c8 e59d401c e590302c e3540000 (e5922048)
---[ end trace 39ae87c7923562d0 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Tested-By: Priit Laes <plaes@plaes.org>
Fixes: fbca164 ("net: stmmac: Use the right logging function in stmmac_mdio_register")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy
sizeof(sockaddr_storage) bytes to fill in sockaddr structs used
to export diagnostic information to userspace.

However, the memory allocated to store sockaddr information is
smaller than that and depends on the address family, so we leak
up to 100 uninitialized bytes to userspace. Just use the size of
the source structs instead, in all the three cases this is what
userspace expects. Zero out the remaining memory.

Unused bytes (i.e. when IPv4 addresses are used) in source
structs sctp_sockaddr_entry and sctp_transport are already
cleared by sctp_add_bind_addr() and sctp_transport_new(),
respectively.

Noticed while testing KASAN-enabled kernel with 'ss':

[ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800
[ 2326.896800] Read of size 128 by task ss/9527
[ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1
[ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
[ 2326.917585] Call Trace:
[ 2326.920312]  dump_stack+0x63/0x8d
[ 2326.924014]  kasan_object_err+0x21/0x70
[ 2326.928295]  kasan_report+0x288/0x540
[ 2326.932380]  ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
[ 2326.938500]  ? skb_put+0x8b/0xd0
[ 2326.942098]  ? memset+0x31/0x40
[ 2326.945599]  check_memory_region+0x13c/0x1a0
[ 2326.950362]  memcpy+0x23/0x50
[ 2326.953669]  inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
[ 2326.959596]  ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag]
[ 2326.966495]  ? __lock_sock+0x102/0x150
[ 2326.970671]  ? sock_def_wakeup+0x60/0x60
[ 2326.975048]  ? remove_wait_queue+0xc0/0xc0
[ 2326.979619]  sctp_diag_dump+0x44a/0x760 [sctp_diag]
[ 2326.985063]  ? sctp_ep_dump+0x280/0x280 [sctp_diag]
[ 2326.990504]  ? memset+0x31/0x40
[ 2326.994007]  ? mutex_lock+0x12/0x40
[ 2326.997900]  __inet_diag_dump+0x57/0xb0 [inet_diag]
[ 2327.003340]  ? __sys_sendmsg+0x150/0x150
[ 2327.007715]  inet_diag_dump+0x4d/0x80 [inet_diag]
[ 2327.012979]  netlink_dump+0x1e6/0x490
[ 2327.017064]  __netlink_dump_start+0x28e/0x2c0
[ 2327.021924]  inet_diag_handler_cmd+0x189/0x1a0 [inet_diag]
[ 2327.028045]  ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag]
[ 2327.034651]  ? inet_diag_dump_compat+0x190/0x190 [inet_diag]
[ 2327.040965]  ? __netlink_lookup+0x1b9/0x260
[ 2327.045631]  sock_diag_rcv_msg+0x18b/0x1e0
[ 2327.050199]  netlink_rcv_skb+0x14b/0x180
[ 2327.054574]  ? sock_diag_bind+0x60/0x60
[ 2327.058850]  sock_diag_rcv+0x28/0x40
[ 2327.062837]  netlink_unicast+0x2e7/0x3b0
[ 2327.067212]  ? netlink_attachskb+0x330/0x330
[ 2327.071975]  ? kasan_check_write+0x14/0x20
[ 2327.076544]  netlink_sendmsg+0x5be/0x730
[ 2327.080918]  ? netlink_unicast+0x3b0/0x3b0
[ 2327.085486]  ? kasan_check_write+0x14/0x20
[ 2327.090057]  ? selinux_socket_sendmsg+0x24/0x30
[ 2327.095109]  ? netlink_unicast+0x3b0/0x3b0
[ 2327.099678]  sock_sendmsg+0x74/0x80
[ 2327.103567]  ___sys_sendmsg+0x520/0x530
[ 2327.107844]  ? __get_locked_pte+0x178/0x200
[ 2327.112510]  ? copy_msghdr_from_user+0x270/0x270
[ 2327.117660]  ? vm_insert_page+0x360/0x360
[ 2327.122133]  ? vm_insert_pfn_prot+0xb4/0x150
[ 2327.126895]  ? vm_insert_pfn+0x32/0x40
[ 2327.131077]  ? vvar_fault+0x71/0xd0
[ 2327.134968]  ? special_mapping_fault+0x69/0x110
[ 2327.140022]  ? __do_fault+0x42/0x120
[ 2327.144008]  ? __handle_mm_fault+0x1062/0x17a0
[ 2327.148965]  ? __fget_light+0xa7/0xc0
[ 2327.153049]  __sys_sendmsg+0xcb/0x150
[ 2327.157133]  ? __sys_sendmsg+0xcb/0x150
[ 2327.161409]  ? SyS_shutdown+0x140/0x140
[ 2327.165688]  ? exit_to_usermode_loop+0xd0/0xd0
[ 2327.170646]  ? __do_page_fault+0x55d/0x620
[ 2327.175216]  ? __sys_sendmsg+0x150/0x150
[ 2327.179591]  SyS_sendmsg+0x12/0x20
[ 2327.183384]  do_syscall_64+0xe3/0x230
[ 2327.187471]  entry_SYSCALL64_slow_path+0x25/0x25
[ 2327.192622] RIP: 0033:0x7f41d18fa3b0
[ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0
[ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003
[ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040
[ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003
[ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084
[ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64
[ 2327.251953] Allocated:
[ 2327.254581] PID = 9484
[ 2327.257215]  save_stack_trace+0x1b/0x20
[ 2327.261485]  save_stack+0x46/0xd0
[ 2327.265179]  kasan_kmalloc+0xad/0xe0
[ 2327.269165]  kmem_cache_alloc_trace+0xe6/0x1d0
[ 2327.274138]  sctp_add_bind_addr+0x58/0x180 [sctp]
[ 2327.279400]  sctp_do_bind+0x208/0x310 [sctp]
[ 2327.284176]  sctp_bind+0x61/0xa0 [sctp]
[ 2327.288455]  inet_bind+0x5f/0x3a0
[ 2327.292151]  SYSC_bind+0x1a4/0x1e0
[ 2327.295944]  SyS_bind+0xe/0x10
[ 2327.299349]  do_syscall_64+0xe3/0x230
[ 2327.303433]  return_from_SYSCALL_64+0x0/0x6a
[ 2327.308194] Freed:
[ 2327.310434] PID = 4131
[ 2327.313065]  save_stack_trace+0x1b/0x20
[ 2327.317344]  save_stack+0x46/0xd0
[ 2327.321040]  kasan_slab_free+0x73/0xc0
[ 2327.325220]  kfree+0x96/0x1a0
[ 2327.328530]  dynamic_kobj_release+0x15/0x40
[ 2327.333195]  kobject_release+0x99/0x1e0
[ 2327.337472]  kobject_put+0x38/0x70
[ 2327.341266]  free_notes_attrs+0x66/0x80
[ 2327.345545]  mod_sysfs_teardown+0x1a5/0x270
[ 2327.350211]  free_module+0x20/0x2a0
[ 2327.354099]  SyS_delete_module+0x2cb/0x2f0
[ 2327.358667]  do_syscall_64+0xe3/0x230
[ 2327.362750]  return_from_SYSCALL_64+0x0/0x6a
[ 2327.367510] Memory state around the buggy address:
[ 2327.372855]  ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc
[ 2327.380914]  ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00
[ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb
[ 2327.397031]                                ^
[ 2327.401792]  ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
[ 2327.409850]  ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00
[ 2327.417907] ==================================================================

This fixes CVE-2017-7558.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266
Fixes: 8f840e4 ("sctp: add the sctp_diag.c file")
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
Edward Cree says:

====================
bpf: verifier fixes

Fix a couple of bugs introduced in my recent verifier patches.
Patch #2 does slightly increase the insn count on bpf_lxc.o, but only by
 about a hundred insns (i.e. 0.2%).

v2: added test for write-marks bug (patch #1); reworded comment on
 propagate_liveness() for clarity.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
commit bbb0302 ("strparser: Generalize strparser") added more
function pointers to 'struct strp_callbacks'; however, kcm_attach() was
not updated to initialize them.  This could cause the ->lock() and/or
->unlock() function pointers to be set to garbage values, causing a
crash in strp_work().

Fix the bug by moving the callback structs into static memory, so
unspecified members are zeroed.  Also constify them while we're at it.

This bug was found by syzkaller, which encountered the following splat:

    IP: 0x55
    PGD 3b1ca067
    P4D 3b1ca067
    PUD 3b12f067
    PMD 0

    Oops: 0010 [#1] SMP KASAN
    Dumping ftrace buffer:
       (ftrace buffer empty)
    Modules linked in:
    CPU: 2 PID: 1194 Comm: kworker/u8:1 Not tainted 4.13.0-rc4-next-20170811 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Workqueue: kstrp strp_work
    task: ffff88006bb0e480 task.stack: ffff88006bb10000
    RIP: 0010:0x55
    RSP: 0018:ffff88006bb17540 EFLAGS: 00010246
    RAX: dffffc0000000000 RBX: ffff88006ce4bd60 RCX: 0000000000000000
    RDX: 1ffff1000d9c97bd RSI: 0000000000000000 RDI: ffff88006ce4bc48
    RBP: ffff88006bb17558 R08: ffffffff81467ab2 R09: 0000000000000000
    R10: ffff88006bb17438 R11: ffff88006bb17940 R12: ffff88006ce4bc48
    R13: ffff88003c683018 R14: ffff88006bb17980 R15: ffff88003c683000
    FS:  0000000000000000(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000055 CR3: 000000003c145000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2098
     worker_thread+0x223/0x1860 kernel/workqueue.c:2233
     kthread+0x35e/0x430 kernel/kthread.c:231
     ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
    Code:  Bad RIP value.
    RIP: 0x55 RSP: ffff88006bb17540
    CR2: 0000000000000055
    ---[ end trace f0e4920047069cee ]---

Here is a C reproducer (requires CONFIG_BPF_SYSCALL=y and
CONFIG_AF_KCM=y):

    #include <linux/bpf.h>
    #include <linux/kcm.h>
    #include <linux/types.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static const struct bpf_insn bpf_insns[3] = {
        { .code = 0xb7 }, /* BPF_MOV64_IMM(0, 0) */
        { .code = 0x95 }, /* BPF_EXIT_INSN() */
    };

    static const union bpf_attr bpf_attr = {
        .prog_type = 1,
        .insn_cnt = 2,
        .insns = (uintptr_t)&bpf_insns,
        .license = (uintptr_t)"",
    };

    int main(void)
    {
        int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD,
                             &bpf_attr, sizeof(bpf_attr));
        int inet_fd = socket(AF_INET, SOCK_STREAM, 0);
        int kcm_fd = socket(AF_KCM, SOCK_DGRAM, 0);

        ioctl(kcm_fd, SIOCKCMATTACH,
              &(struct kcm_attach) { .fd = inet_fd, .bpf_fd = bpf_fd });
    }

Fixes: bbb0302 ("strparser: Generalize strparser")
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
…/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-08-25

This series contains updates to i40e and i40evf only.

Mitch adjusts the max packet size to account for two VLAN tags.

Sudheer provides a fix to ensure that the watchdog timer is scheduled
immediately after admin queue operations are scheduled in i40evf_down().
Fixes an issue by adding locking around the admin queue command and
update of state variables so that adminq_subtask will have the accurate
information whenever it gets scheduled.

Anjali fixes a bug where the PF flag setup should happen before the VMDq
RSS queue count is initialized for VMDq VSI to get the right number of
queues for RSS in the case of x722 devices.  Fixed a problem with the
hardware ATR eviction feature where the NVM setting was incorrect.

Jake separates the flags into two types, hw_features and flags.  The
hw_features flags contain a set of features which are enabled at init
time and will not contain feature flags that can be toggled.  Everything
else will remain in the flags variable, and can be modified anytime
during run time.  We should not be directly copying a cpumask_t, since
it is bitmap and might not be copied correctly, so use cpumask_copy()
instead.

Stefan Assmann makes vf _offload_flags more "generic" by renaming it to
vf_cap_flags, which allows other capabilities besides offloading to be
added.

Alan makes it such that if adaptive-rx/tx is enabled, the user cannot
make any manual adjustments to interrupt moderation.  Also makes it so
that if ITR is disabled by adaptive-rx/tx is then enabled, ITR will be
re-enabled.

v2: Dropped patches #1 & #8 from the original patch series submission,
    while Jesse and Jake re-work their patches based on feedback from
    David Miller.  Also removed the duplicate patch 3 that was
    accidentally sent out twice in the previous submission.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
This reverts commit aa8db49.

Early demux structs can not be made const. Doing so results in:
[   84.967355] BUG: unable to handle kernel paging request at ffffffff81684b10
[   84.969272] IP: proc_configure_early_demux+0x1e/0x3d
[   84.970544] PGD 1a0a067
[   84.970546] P4D 1a0a067
[   84.971212] PUD 1a0b063
[   84.971733] PMD 80000000016001e1

[   84.972669] Oops: 0003 [#1] SMP
[   84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf
[   84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22
[   84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   84.975855] task: ffff88003854ce00 task.stack: ffffc900005a4000
[   84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d
[   84.977253] RSP: 0018:ffffc900005a7dd0 EFLAGS: 00010246
[   84.977891] RAX: ffffffff81684b10 RBX: 0000000000000001 RCX: 0000000000000000
[   84.978759] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
[   84.979628] RBP: ffffc900005a7dd0 R08: 0000000000000000 R09: 0000000000000000
[   84.980501] R10: 0000000000000001 R11: 0000000000000008 R12: 0000000000000001
[   84.981373] R13: ffffffffffffffea R14: ffffffff81a9b4c0 R15: 0000000000000002
[   84.982249] FS:  00007feb237b7700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   84.983231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.983941] CR2: ffffffff81684b10 CR3: 0000000038492000 CR4: 00000000000406f0
[   84.984817] Call Trace:
[   84.985133]  proc_tcp_early_demux+0x29/0x30

I think this is the second time such a patch has been reverted.

Cc: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
current switchdev drivers dont seem to support offloading fdb
entries pointing to the bridge device which have fdb->dst
not set to any port. This patch adds a NULL fdb->dst check in
the switchdev notifier code.

This patch fixes the below NULL ptr dereference:
$bridge fdb add 00:02:00:00:00:33 dev br0 self

[   69.953374] BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
[   69.954044] IP: br_switchdev_fdb_notify+0x29/0x80
[   69.954044] PGD 66527067
[   69.954044] P4D 66527067
[   69.954044] PUD 7899c067
[   69.954044] PMD 0
[   69.954044]
[   69.954044] Oops: 0000 [#1] SMP
[   69.954044] Modules linked in:
[   69.954044] CPU: 1 PID: 3074 Comm: bridge Not tainted 4.13.0-rc6+ #1
[   69.954044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org
04/01/2014
[   69.954044] task: ffff88007b827140 task.stack: ffffc90001564000
[   69.954044] RIP: 0010:br_switchdev_fdb_notify+0x29/0x80
[   69.954044] RSP: 0018:ffffc90001567918 EFLAGS: 00010246
[   69.954044] RAX: 0000000000000000 RBX: ffff8800795e0880 RCX:
00000000000000c0
[   69.954044] RDX: ffffc90001567920 RSI: 000000000000001c RDI:
ffff8800795d0600
[   69.954044] RBP: ffffc90001567938 R08: ffff8800795d0600 R09:
0000000000000000
[   69.954044] R10: ffffc90001567a88 R11: ffff88007b849400 R12:
ffff8800795e0880
[   69.954044] R13: ffff8800795d0600 R14: ffffffff81ef8880 R15:
000000000000001c
[   69.954044] FS:  00007f93d3085700(0000) GS:ffff88007fd00000(0000)
knlGS:0000000000000000
[   69.954044] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   69.954044] CR2: 0000000000000008 CR3: 0000000066551000 CR4:
00000000000006e0
[   69.954044] Call Trace:
[   69.954044]  fdb_notify+0x3f/0xf0
[   69.954044]  __br_fdb_add.isra.12+0x1a7/0x370
[   69.954044]  br_fdb_add+0x178/0x280
[   69.954044]  rtnl_fdb_add+0x10a/0x200
[   69.954044]  rtnetlink_rcv_msg+0x1b4/0x240
[   69.954044]  ? skb_free_head+0x21/0x40
[   69.954044]  ? rtnl_calcit.isra.18+0xf0/0xf0
[   69.954044]  netlink_rcv_skb+0xed/0x120
[   69.954044]  rtnetlink_rcv+0x15/0x20
[   69.954044]  netlink_unicast+0x180/0x200
[   69.954044]  netlink_sendmsg+0x291/0x370
[   69.954044]  ___sys_sendmsg+0x180/0x2e0
[   69.954044]  ? filemap_map_pages+0x2db/0x370
[   69.954044]  ? do_wp_page+0x11d/0x420
[   69.954044]  ? __handle_mm_fault+0x794/0xd80
[   69.954044]  ? vma_link+0xcb/0xd0
[   69.954044]  __sys_sendmsg+0x4c/0x90
[   69.954044]  SyS_sendmsg+0x12/0x20
[   69.954044]  do_syscall_64+0x63/0xe0
[   69.954044]  entry_SYSCALL64_slow_path+0x25/0x25
[   69.954044] RIP: 0033:0x7f93d2bad690
[   69.954044] RSP: 002b:00007ffc7217a638 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[   69.954044] RAX: ffffffffffffffda RBX: 00007ffc72182eac RCX:
00007f93d2bad690
[   69.954044] RDX: 0000000000000000 RSI: 00007ffc7217a670 RDI:
0000000000000003
[   69.954044] RBP: 0000000059a1f7f8 R08: 0000000000000006 R09:
000000000000000a
[   69.954044] R10: 00007ffc7217a400 R11: 0000000000000246 R12:
00007ffc7217a670
[   69.954044] R13: 00007ffc72182a98 R14: 00000000006114c0 R15:
00007ffc72182aa0
[   69.954044] Code: 1f 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 f6 47
20 04 74 0a 83 fe 1c 74 09 83 fe 1d 74 2c c9 66 90 c3 48 8b 47 10 48 8d
55 e8 <48> 8b 70 08 0f b7 47 1e 48 83 c7 18 48 89 7d f0 bf 03 00 00 00
[   69.954044] RIP: br_switchdev_fdb_notify+0x29/0x80 RSP:
ffffc90001567918
[   69.954044] CR2: 0000000000000008
[   69.954044] ---[ end trace 03e9eec4a82c238b ]---

Fixes: 6b26b51 ("net: bridge: Add support for notifying devices about FDB add/del")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
If we do not have a master network device attached dst->cpu_dp will be
NULL and accessing cpu_dp->netdev will create a trace similar to the one
below. The correct check is on dst->cpu_dp period.

[    1.004650] DSA: switch 0 0 parsed
[    1.008078] Unable to handle kernel NULL pointer dereference at
virtual address 00000010
[    1.016195] pgd = c0003000
[    1.018918] [00000010] *pgd=80000000004003, *pmd=00000000
[    1.024349] Internal error: Oops: 206 [#1] SMP ARM
[    1.029157] Modules linked in:
[    1.032228] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.13.0-rc6-00071-g45b45afab9bd-dirty #7
[    1.040772] Hardware name: Broadcom STB (Flattened Device Tree)
[    1.046704] task: ee08f840 task.stack: ee090000
[    1.051258] PC is at dsa_register_switch+0x5e0/0x9dc
[    1.056234] LR is at dsa_register_switch+0x5d0/0x9dc
[    1.061211] pc : [<c08fb28c>]    lr : [<c08fb27c>]    psr: 60000213
[    1.067491] sp : ee091d88  ip : 00000000  fp : 0000000c
[    1.072728] r10: 00000000  r9 : 00000001  r8 : ee208010
[    1.077965] r7 : ee2b57b0  r6 : ee2b5780  r5 : 00000000  r4 :
ee208e0c
[    1.084506] r3 : 00000000  r2 : 00040d00  r1 : 2d1b2000  r0 :
00000016
[    1.091050] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
Segment user
[    1.098199] Control: 32c5387d  Table: 00003000  DAC: fffffffd
[    1.103957] Process swapper/0 (pid: 1, stack limit = 0xee090210)

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 6d3c8c0 ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
The commit below added a call to the ->destroy() callback for all qdiscs
which failed in their ->init(), but some were not prepared for such
change and can't handle partially initialized qdisc. HTB is one of them
and if any error occurs before the qdisc watchdog timer and qdisc work are
initialized then we can hit either a null ptr deref (timer->base) when
canceling in ->destroy or lockdep error info about trying to register
a non-static key and a stack dump. So to fix these two move the watchdog
timer and workqueue init before anything that can err out.
To reproduce userspace needs to send broken htb qdisc create request,
tested with a modified tc (q_htb.c).

Trace log:
[ 2710.897602] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2710.897977] IP: hrtimer_active+0x17/0x8a
[ 2710.898174] PGD 58fab067
[ 2710.898175] P4D 58fab067
[ 2710.898353] PUD 586c0067
[ 2710.898531] PMD 0
[ 2710.898710]
[ 2710.899045] Oops: 0000 [#1] SMP
[ 2710.899232] Modules linked in:
[ 2710.899419] CPU: 1 PID: 950 Comm: tc Not tainted 4.13.0-rc6+ #54
[ 2710.899646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 2710.900035] task: ffff880059ed2700 task.stack: ffff88005ad4c000
[ 2710.900262] RIP: 0010:hrtimer_active+0x17/0x8a
[ 2710.900467] RSP: 0018:ffff88005ad4f960 EFLAGS: 00010246
[ 2710.900684] RAX: 0000000000000000 RBX: ffff88003701e298 RCX: 0000000000000000
[ 2710.900933] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003701e298
[ 2710.901177] RBP: ffff88005ad4f980 R08: 0000000000000001 R09: 0000000000000001
[ 2710.901419] R10: ffff88005ad4f800 R11: 0000000000000400 R12: 0000000000000000
[ 2710.901663] R13: ffff88003701e298 R14: ffffffff822a4540 R15: ffff88005ad4fac0
[ 2710.901907] FS:  00007f2f5e90f740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
[ 2710.902277] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2710.902500] CR2: 0000000000000000 CR3: 0000000058ca3000 CR4: 00000000000406e0
[ 2710.902744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2710.902977] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2710.903180] Call Trace:
[ 2710.903332]  hrtimer_try_to_cancel+0x1a/0x93
[ 2710.903504]  hrtimer_cancel+0x15/0x20
[ 2710.903667]  qdisc_watchdog_cancel+0x12/0x14
[ 2710.903866]  htb_destroy+0x2e/0xf7
[ 2710.904097]  qdisc_create+0x377/0x3fd
[ 2710.904330]  tc_modify_qdisc+0x4d2/0x4fd
[ 2710.904511]  rtnetlink_rcv_msg+0x188/0x197
[ 2710.904682]  ? rcu_read_unlock+0x3e/0x5f
[ 2710.904849]  ? rtnl_newlink+0x729/0x729
[ 2710.905017]  netlink_rcv_skb+0x6c/0xce
[ 2710.905183]  rtnetlink_rcv+0x23/0x2a
[ 2710.905345]  netlink_unicast+0x103/0x181
[ 2710.905511]  netlink_sendmsg+0x326/0x337
[ 2710.905679]  sock_sendmsg_nosec+0x14/0x3f
[ 2710.905847]  sock_sendmsg+0x29/0x2e
[ 2710.906010]  ___sys_sendmsg+0x209/0x28b
[ 2710.906176]  ? do_raw_spin_unlock+0xcd/0xf8
[ 2710.906346]  ? _raw_spin_unlock+0x27/0x31
[ 2710.906514]  ? __handle_mm_fault+0x651/0xdb1
[ 2710.906685]  ? check_chain_key+0xb0/0xfd
[ 2710.906855]  __sys_sendmsg+0x45/0x63
[ 2710.907018]  ? __sys_sendmsg+0x45/0x63
[ 2710.907185]  SyS_sendmsg+0x19/0x1b
[ 2710.907344]  entry_SYSCALL_64_fastpath+0x23/0xc2

Note that probably this bug goes further back because the default qdisc
handling always calls ->destroy on init failure too.

Fixes: 87b60cf ("net_sched: fix error recovery at qdisc creation")
Fixes: 0fbbeb1 ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
The below commit added a call to ->destroy() on init failure, but multiq
still frees ->queues on error in init, but ->queues is also freed by
->destroy() thus we get double free and corrupted memory.

Very easy to reproduce (eth0 not multiqueue):
$ tc qdisc add dev eth0 root multiq
RTNETLINK answers: Operation not supported
$ ip l add dumdum type dummy
(crash)

Trace log:
[ 3929.467747] general protection fault: 0000 [#1] SMP
[ 3929.468083] Modules linked in:
[ 3929.468302] CPU: 3 PID: 967 Comm: ip Not tainted 4.13.0-rc6+ #56
[ 3929.468625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 3929.469124] task: ffff88003716a700 task.stack: ffff88005872c000
[ 3929.469449] RIP: 0010:__kmalloc_track_caller+0x117/0x1be
[ 3929.469746] RSP: 0018:ffff88005872f6a0 EFLAGS: 00010246
[ 3929.470042] RAX: 00000000000002de RBX: 0000000058a59000 RCX: 00000000000002df
[ 3929.470406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff821f7020
[ 3929.470770] RBP: ffff88005872f6e8 R08: 000000000001f010 R09: 0000000000000000
[ 3929.471133] R10: ffff88005872f730 R11: 0000000000008cdd R12: ff006d75646d7564
[ 3929.471496] R13: 00000000014000c0 R14: ffff88005b403c00 R15: ffff88005b403c00
[ 3929.471869] FS:  00007f0b70480740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
[ 3929.472286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3929.472677] CR2: 00007ffcee4f3000 CR3: 0000000059d45000 CR4: 00000000000406e0
[ 3929.473209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3929.474109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3929.474873] Call Trace:
[ 3929.475337]  ? kstrdup_const+0x23/0x25
[ 3929.475863]  kstrdup+0x2e/0x4b
[ 3929.476338]  kstrdup_const+0x23/0x25
[ 3929.478084]  __kernfs_new_node+0x28/0xbc
[ 3929.478478]  kernfs_new_node+0x35/0x55
[ 3929.478929]  kernfs_create_link+0x23/0x76
[ 3929.479478]  sysfs_do_create_link_sd.isra.2+0x85/0xd7
[ 3929.480096]  sysfs_create_link+0x33/0x35
[ 3929.480649]  device_add+0x200/0x589
[ 3929.481184]  netdev_register_kobject+0x7c/0x12f
[ 3929.481711]  register_netdevice+0x373/0x471
[ 3929.482174]  rtnl_newlink+0x614/0x729
[ 3929.482610]  ? rtnl_newlink+0x17f/0x729
[ 3929.483080]  rtnetlink_rcv_msg+0x188/0x197
[ 3929.483533]  ? rcu_read_unlock+0x3e/0x5f
[ 3929.483984]  ? rtnl_newlink+0x729/0x729
[ 3929.484420]  netlink_rcv_skb+0x6c/0xce
[ 3929.484858]  rtnetlink_rcv+0x23/0x2a
[ 3929.485291]  netlink_unicast+0x103/0x181
[ 3929.485735]  netlink_sendmsg+0x326/0x337
[ 3929.486181]  sock_sendmsg_nosec+0x14/0x3f
[ 3929.486614]  sock_sendmsg+0x29/0x2e
[ 3929.486973]  ___sys_sendmsg+0x209/0x28b
[ 3929.487340]  ? do_raw_spin_unlock+0xcd/0xf8
[ 3929.487719]  ? _raw_spin_unlock+0x27/0x31
[ 3929.488092]  ? __handle_mm_fault+0x651/0xdb1
[ 3929.488471]  ? check_chain_key+0xb0/0xfd
[ 3929.488847]  __sys_sendmsg+0x45/0x63
[ 3929.489206]  ? __sys_sendmsg+0x45/0x63
[ 3929.489576]  SyS_sendmsg+0x19/0x1b
[ 3929.489901]  entry_SYSCALL_64_fastpath+0x23/0xc2
[ 3929.490172] RIP: 0033:0x7f0b6fb93690
[ 3929.490423] RSP: 002b:00007ffcee4ed588 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3929.490881] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f0b6fb93690
[ 3929.491198] RDX: 0000000000000000 RSI: 00007ffcee4ed5d0 RDI: 0000000000000003
[ 3929.491521] RBP: ffff88005872ff98 R08: 0000000000000001 R09: 0000000000000000
[ 3929.491801] R10: 00007ffcee4ed350 R11: 0000000000000246 R12: 0000000000000002
[ 3929.492075] R13: 000000000066f1a0 R14: 00007ffcee4f5680 R15: 0000000000000000
[ 3929.492352]  ? trace_hardirqs_off_caller+0xa7/0xcf
[ 3929.492590] Code: 8b 45 c0 48 8b 45 b8 74 17 48 8b 4d c8 83 ca ff 44
89 ee 4c 89 f7 e8 83 ca ff ff 49 89 c4 eb 49 49 63 56 20 48 8d 48 01 4d
8b 06 <49> 8b 1c 14 48 89 c2 4c 89 e0 65 49 0f c7 08 0f 94 c0 83 f0 01
[ 3929.493335] RIP: __kmalloc_track_caller+0x117/0x1be RSP: ffff88005872f6a0

Fixes: 87b60cf ("net_sched: fix error recovery at qdisc creation")
Fixes: f07d150 ("multiq: Further multiqueue cleanup")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
If sch_hhf fails in its ->init() function (either due to wrong
user-space arguments as below or memory alloc failure of hh_flows) it
will do a null pointer deref of q->hh_flows in its ->destroy() function.

To reproduce the crash:
$ tc qdisc add dev eth0 root hhf quantum 2000000 non_hh_weight 10000000

Crash log:
[  690.654882] BUG: unable to handle kernel NULL pointer dereference at (null)
[  690.655565] IP: hhf_destroy+0x48/0xbc
[  690.655944] PGD 37345067
[  690.655948] P4D 37345067
[  690.656252] PUD 58402067
[  690.656554] PMD 0
[  690.656857]
[  690.657362] Oops: 0000 [#1] SMP
[  690.657696] Modules linked in:
[  690.658032] CPU: 3 PID: 920 Comm: tc Not tainted 4.13.0-rc6+ #57
[  690.658525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[  690.659255] task: ffff880058578000 task.stack: ffff88005acbc000
[  690.659747] RIP: 0010:hhf_destroy+0x48/0xbc
[  690.660146] RSP: 0018:ffff88005acbf9e0 EFLAGS: 00010246
[  690.660601] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
[  690.661155] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff821f63f0
[  690.661710] RBP: ffff88005acbfa08 R08: ffffffff81b10a90 R09: 0000000000000000
[  690.662267] R10: 00000000f42b7019 R11: ffff880058578000 R12: 00000000ffffffea
[  690.662820] R13: ffff8800372f6400 R14: 0000000000000000 R15: 0000000000000000
[  690.663769] FS:  00007f8ae5e8b740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
[  690.667069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  690.667965] CR2: 0000000000000000 CR3: 0000000058523000 CR4: 00000000000406e0
[  690.668918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  690.669945] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  690.671003] Call Trace:
[  690.671743]  qdisc_create+0x377/0x3fd
[  690.672534]  tc_modify_qdisc+0x4d2/0x4fd
[  690.673324]  rtnetlink_rcv_msg+0x188/0x197
[  690.674204]  ? rcu_read_unlock+0x3e/0x5f
[  690.675091]  ? rtnl_newlink+0x729/0x729
[  690.675877]  netlink_rcv_skb+0x6c/0xce
[  690.676648]  rtnetlink_rcv+0x23/0x2a
[  690.677405]  netlink_unicast+0x103/0x181
[  690.678179]  netlink_sendmsg+0x326/0x337
[  690.678958]  sock_sendmsg_nosec+0x14/0x3f
[  690.679743]  sock_sendmsg+0x29/0x2e
[  690.680506]  ___sys_sendmsg+0x209/0x28b
[  690.681283]  ? __handle_mm_fault+0xc7d/0xdb1
[  690.681915]  ? check_chain_key+0xb0/0xfd
[  690.682449]  __sys_sendmsg+0x45/0x63
[  690.682954]  ? __sys_sendmsg+0x45/0x63
[  690.683471]  SyS_sendmsg+0x19/0x1b
[  690.683974]  entry_SYSCALL_64_fastpath+0x23/0xc2
[  690.684516] RIP: 0033:0x7f8ae529d690
[  690.685016] RSP: 002b:00007fff26d2d6b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  690.685931] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f8ae529d690
[  690.686573] RDX: 0000000000000000 RSI: 00007fff26d2d700 RDI: 0000000000000003
[  690.687047] RBP: ffff88005acbff98 R08: 0000000000000001 R09: 0000000000000000
[  690.687519] R10: 00007fff26d2d480 R11: 0000000000000246 R12: 0000000000000002
[  690.687996] R13: 0000000001258070 R14: 0000000000000001 R15: 0000000000000000
[  690.688475]  ? trace_hardirqs_off_caller+0xa7/0xcf
[  690.688887] Code: 00 00 e8 2a 02 ae ff 49 8b bc 1d 60 02 00 00 48 83
c3 08 e8 19 02 ae ff 48 83 fb 20 75 dc 45 31 f6 4d 89 f7 4d 03 bd 20 02
00 00 <49> 8b 07 49 39 c7 75 24 49 83 c6 10 49 81 fe 00 40 00 00 75 e1
[  690.690200] RIP: hhf_destroy+0x48/0xbc RSP: ffff88005acbf9e0
[  690.690636] CR2: 0000000000000000

Fixes: 87b60cf ("net_sched: fix error recovery at qdisc creation")
Fixes: 10239ed ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
CBQ can fail on ->init by wrong nl attributes or simply for missing any,
f.e. if it's set as a default qdisc then TCA_OPTIONS (opt) will be NULL
when it is activated. The first thing init does is parse opt but it will
dereference a null pointer if used as a default qdisc, also since init
failure at default qdisc invokes ->reset() which cancels all timers then
we'll also dereference two more null pointers (timer->base) as they were
never initialized.

To reproduce:
$ sysctl net.core.default_qdisc=cbq
$ ip l set ethX up

Crash log of the first null ptr deref:
[44727.907454] BUG: unable to handle kernel NULL pointer dereference at (null)
[44727.907600] IP: cbq_init+0x27/0x205
[44727.907676] PGD 59ff4067
[44727.907677] P4D 59ff4067
[44727.907742] PUD 59c70067
[44727.907807] PMD 0
[44727.907873]
[44727.907982] Oops: 0000 [#1] SMP
[44727.908054] Modules linked in:
[44727.908126] CPU: 1 PID: 21312 Comm: ip Not tainted 4.13.0-rc6+ #60
[44727.908235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[44727.908477] task: ffff88005ad42700 task.stack: ffff880037214000
[44727.908672] RIP: 0010:cbq_init+0x27/0x205
[44727.908838] RSP: 0018:ffff8800372175f0 EFLAGS: 00010286
[44727.909018] RAX: ffffffff816c3852 RBX: ffff880058c53800 RCX: 0000000000000000
[44727.909222] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff8800372175f8
[44727.909427] RBP: ffff880037217650 R08: ffffffff81b0f380 R09: 0000000000000000
[44727.909631] R10: ffff880037217660 R11: 0000000000000020 R12: ffffffff822a44c0
[44727.909835] R13: ffff880058b92000 R14: 00000000ffffffff R15: 0000000000000001
[44727.910040] FS:  00007ff8bc583740(0000) GS:ffff88005d880000(0000) knlGS:0000000000000000
[44727.910339] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44727.910525] CR2: 0000000000000000 CR3: 00000000371e5000 CR4: 00000000000406e0
[44727.910731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[44727.910936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[44727.911141] Call Trace:
[44727.911291]  ? lockdep_init_map+0xb6/0x1ba
[44727.911461]  ? qdisc_alloc+0x14e/0x187
[44727.911626]  qdisc_create_dflt+0x7a/0x94
[44727.911794]  ? dev_activate+0x129/0x129
[44727.911959]  attach_one_default_qdisc+0x36/0x63
[44727.912132]  netdev_for_each_tx_queue+0x3d/0x48
[44727.912305]  dev_activate+0x4b/0x129
[44727.912468]  __dev_open+0xe7/0x104
[44727.912631]  __dev_change_flags+0xc6/0x15c
[44727.912799]  dev_change_flags+0x25/0x59
[44727.912966]  do_setlink+0x30c/0xb3f
[44727.913129]  ? check_chain_key+0xb0/0xfd
[44727.913294]  ? check_chain_key+0xb0/0xfd
[44727.913463]  rtnl_newlink+0x3a4/0x729
[44727.913626]  ? rtnl_newlink+0x117/0x729
[44727.913801]  ? ns_capable_common+0xd/0xb1
[44727.913968]  ? ns_capable+0x13/0x15
[44727.914131]  rtnetlink_rcv_msg+0x188/0x197
[44727.914300]  ? rcu_read_unlock+0x3e/0x5f
[44727.914465]  ? rtnl_newlink+0x729/0x729
[44727.914630]  netlink_rcv_skb+0x6c/0xce
[44727.914796]  rtnetlink_rcv+0x23/0x2a
[44727.914956]  netlink_unicast+0x103/0x181
[44727.915122]  netlink_sendmsg+0x326/0x337
[44727.915291]  sock_sendmsg_nosec+0x14/0x3f
[44727.915459]  sock_sendmsg+0x29/0x2e
[44727.915619]  ___sys_sendmsg+0x209/0x28b
[44727.915784]  ? do_raw_spin_unlock+0xcd/0xf8
[44727.915954]  ? _raw_spin_unlock+0x27/0x31
[44727.916121]  ? __handle_mm_fault+0x651/0xdb1
[44727.916290]  ? check_chain_key+0xb0/0xfd
[44727.916461]  __sys_sendmsg+0x45/0x63
[44727.916626]  ? __sys_sendmsg+0x45/0x63
[44727.916792]  SyS_sendmsg+0x19/0x1b
[44727.916950]  entry_SYSCALL_64_fastpath+0x23/0xc2
[44727.917125] RIP: 0033:0x7ff8bbc96690
[44727.917286] RSP: 002b:00007ffc360991e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[44727.917579] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007ff8bbc96690
[44727.917783] RDX: 0000000000000000 RSI: 00007ffc36099230 RDI: 0000000000000003
[44727.917987] RBP: ffff880037217f98 R08: 0000000000000001 R09: 0000000000000003
[44727.918190] R10: 00007ffc36098fb0 R11: 0000000000000246 R12: 0000000000000006
[44727.918393] R13: 000000000066f1a0 R14: 00007ffc360a12e0 R15: 0000000000000000
[44727.918597]  ? trace_hardirqs_off_caller+0xa7/0xcf
[44727.918774] Code: 41 5f 5d c3 66 66 66 66 90 55 48 8d 56 04 45 31 c9
49 c7 c0 80 f3 b0 81 48 89 e5 41 55 41 54 53 48 89 fb 48 8d 7d a8 48 83
ec 48 <0f> b7 0e be 07 00 00 00 83 e9 04 e8 e6 f7 d8 ff 85 c0 0f 88 bb
[44727.919332] RIP: cbq_init+0x27/0x205 RSP: ffff8800372175f0
[44727.919516] CR2: 0000000000000000

Fixes: 0fbbeb1 ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
netem can fail in ->init due to missing options (either not supplied by
user-space or used as a default qdisc) causing a timer->base null
pointer deref in its ->destroy() and ->reset() callbacks.

Reproduce:
$ sysctl net.core.default_qdisc=netem
$ ip l set ethX up

Crash log:
[ 1814.846943] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1814.847181] IP: hrtimer_active+0x17/0x8a
[ 1814.847270] PGD 59c34067
[ 1814.847271] P4D 59c34067
[ 1814.847337] PUD 37374067
[ 1814.847403] PMD 0
[ 1814.847468]
[ 1814.847582] Oops: 0000 [#1] SMP
[ 1814.847655] Modules linked in: sch_netem(O) sch_fq_codel(O)
[ 1814.847761] CPU: 3 PID: 1573 Comm: ip Tainted: G           O 4.13.0-rc6+ #62
[ 1814.847884] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 1814.848043] task: ffff88003723a700 task.stack: ffff88005adc8000
[ 1814.848235] RIP: 0010:hrtimer_active+0x17/0x8a
[ 1814.848407] RSP: 0018:ffff88005adcb590 EFLAGS: 00010246
[ 1814.848590] RAX: 0000000000000000 RBX: ffff880058e359d8 RCX: 0000000000000000
[ 1814.848793] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880058e359d8
[ 1814.848998] RBP: ffff88005adcb5b0 R08: 00000000014080c0 R09: 00000000ffffffff
[ 1814.849204] R10: ffff88005adcb660 R11: 0000000000000020 R12: 0000000000000000
[ 1814.849410] R13: ffff880058e359d8 R14: 00000000ffffffff R15: 0000000000000001
[ 1814.849616] FS:  00007f733bbca740(0000) GS:ffff88005d980000(0000) knlGS:0000000000000000
[ 1814.849919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1814.850107] CR2: 0000000000000000 CR3: 0000000059f0d000 CR4: 00000000000406e0
[ 1814.850313] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1814.850518] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1814.850723] Call Trace:
[ 1814.850875]  hrtimer_try_to_cancel+0x1a/0x93
[ 1814.851047]  hrtimer_cancel+0x15/0x20
[ 1814.851211]  qdisc_watchdog_cancel+0x12/0x14
[ 1814.851383]  netem_reset+0xe6/0xed [sch_netem]
[ 1814.851561]  qdisc_destroy+0x8b/0xe5
[ 1814.851723]  qdisc_create_dflt+0x86/0x94
[ 1814.851890]  ? dev_activate+0x129/0x129
[ 1814.852057]  attach_one_default_qdisc+0x36/0x63
[ 1814.852232]  netdev_for_each_tx_queue+0x3d/0x48
[ 1814.852406]  dev_activate+0x4b/0x129
[ 1814.852569]  __dev_open+0xe7/0x104
[ 1814.852730]  __dev_change_flags+0xc6/0x15c
[ 1814.852899]  dev_change_flags+0x25/0x59
[ 1814.853064]  do_setlink+0x30c/0xb3f
[ 1814.853228]  ? check_chain_key+0xb0/0xfd
[ 1814.853396]  ? check_chain_key+0xb0/0xfd
[ 1814.853565]  rtnl_newlink+0x3a4/0x729
[ 1814.853728]  ? rtnl_newlink+0x117/0x729
[ 1814.853905]  ? ns_capable_common+0xd/0xb1
[ 1814.854072]  ? ns_capable+0x13/0x15
[ 1814.854234]  rtnetlink_rcv_msg+0x188/0x197
[ 1814.854404]  ? rcu_read_unlock+0x3e/0x5f
[ 1814.854572]  ? rtnl_newlink+0x729/0x729
[ 1814.854737]  netlink_rcv_skb+0x6c/0xce
[ 1814.854902]  rtnetlink_rcv+0x23/0x2a
[ 1814.855064]  netlink_unicast+0x103/0x181
[ 1814.855230]  netlink_sendmsg+0x326/0x337
[ 1814.855398]  sock_sendmsg_nosec+0x14/0x3f
[ 1814.855584]  sock_sendmsg+0x29/0x2e
[ 1814.855747]  ___sys_sendmsg+0x209/0x28b
[ 1814.855912]  ? do_raw_spin_unlock+0xcd/0xf8
[ 1814.856082]  ? _raw_spin_unlock+0x27/0x31
[ 1814.856251]  ? __handle_mm_fault+0x651/0xdb1
[ 1814.856421]  ? check_chain_key+0xb0/0xfd
[ 1814.856592]  __sys_sendmsg+0x45/0x63
[ 1814.856755]  ? __sys_sendmsg+0x45/0x63
[ 1814.856923]  SyS_sendmsg+0x19/0x1b
[ 1814.857083]  entry_SYSCALL_64_fastpath+0x23/0xc2
[ 1814.857256] RIP: 0033:0x7f733b2dd690
[ 1814.857419] RSP: 002b:00007ffe1d3387d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1814.858238] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007f733b2dd690
[ 1814.858445] RDX: 0000000000000000 RSI: 00007ffe1d338820 RDI: 0000000000000003
[ 1814.858651] RBP: ffff88005adcbf98 R08: 0000000000000001 R09: 0000000000000003
[ 1814.858856] R10: 00007ffe1d3385a0 R11: 0000000000000246 R12: 0000000000000002
[ 1814.859060] R13: 000000000066f1a0 R14: 00007ffe1d3408d0 R15: 0000000000000000
[ 1814.859267]  ? trace_hardirqs_off_caller+0xa7/0xcf
[ 1814.859446] Code: 10 55 48 89 c7 48 89 e5 e8 45 a1 fb ff 31 c0 5d c3
31 c0 c3 66 66 66 66 90 55 48 89 e5 41 56 41 55 41 54 53 49 89 fd 49 8b
45 30 <4c> 8b 20 41 8b 5c 24 38 31 c9 31 d2 48 c7 c7 50 8e 1d 82 41 89
[ 1814.860022] RIP: hrtimer_active+0x17/0x8a RSP: ffff88005adcb590
[ 1814.860214] CR2: 0000000000000000

Fixes: 87b60cf ("net_sched: fix error recovery at qdisc creation")
Fixes: 0fbbeb1 ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
sch_tbf calls qdisc_watchdog_cancel() in both its ->reset and ->destroy
callbacks but it may fail before the timer is initialized due to missing
options (either not supplied by user-space or set as a default qdisc),
also q->qdisc is used by ->reset and ->destroy so we need it initialized.

Reproduce:
$ sysctl net.core.default_qdisc=tbf
$ ip l set ethX up

Crash log:
[  959.160172] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[  959.160323] IP: qdisc_reset+0xa/0x5c
[  959.160400] PGD 59cdb067
[  959.160401] P4D 59cdb067
[  959.160466] PUD 59ccb067
[  959.160532] PMD 0
[  959.160597]
[  959.160706] Oops: 0000 [#1] SMP
[  959.160778] Modules linked in: sch_tbf sch_sfb sch_prio sch_netem
[  959.160891] CPU: 2 PID: 1562 Comm: ip Not tainted 4.13.0-rc6+ #62
[  959.160998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[  959.161157] task: ffff880059c9a700 task.stack: ffff8800376d0000
[  959.161263] RIP: 0010:qdisc_reset+0xa/0x5c
[  959.161347] RSP: 0018:ffff8800376d3610 EFLAGS: 00010286
[  959.161531] RAX: ffffffffa001b1dd RBX: ffff8800373a2800 RCX: 0000000000000000
[  959.161733] RDX: ffffffff8215f160 RSI: ffffffff8215f160 RDI: 0000000000000000
[  959.161939] RBP: ffff8800376d3618 R08: 00000000014080c0 R09: 00000000ffffffff
[  959.162141] R10: ffff8800376d3578 R11: 0000000000000020 R12: ffffffffa001d2c0
[  959.162343] R13: ffff880037538000 R14: 00000000ffffffff R15: 0000000000000001
[  959.162546] FS:  00007fcc5126b740(0000) GS:ffff88005d900000(0000) knlGS:0000000000000000
[  959.162844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  959.163030] CR2: 0000000000000018 CR3: 000000005abc4000 CR4: 00000000000406e0
[  959.163233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  959.163436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  959.163638] Call Trace:
[  959.163788]  tbf_reset+0x19/0x64 [sch_tbf]
[  959.163957]  qdisc_destroy+0x8b/0xe5
[  959.164119]  qdisc_create_dflt+0x86/0x94
[  959.164284]  ? dev_activate+0x129/0x129
[  959.164449]  attach_one_default_qdisc+0x36/0x63
[  959.164623]  netdev_for_each_tx_queue+0x3d/0x48
[  959.164795]  dev_activate+0x4b/0x129
[  959.164957]  __dev_open+0xe7/0x104
[  959.165118]  __dev_change_flags+0xc6/0x15c
[  959.165287]  dev_change_flags+0x25/0x59
[  959.165451]  do_setlink+0x30c/0xb3f
[  959.165613]  ? check_chain_key+0xb0/0xfd
[  959.165782]  rtnl_newlink+0x3a4/0x729
[  959.165947]  ? rtnl_newlink+0x117/0x729
[  959.166121]  ? ns_capable_common+0xd/0xb1
[  959.166288]  ? ns_capable+0x13/0x15
[  959.166450]  rtnetlink_rcv_msg+0x188/0x197
[  959.166617]  ? rcu_read_unlock+0x3e/0x5f
[  959.166783]  ? rtnl_newlink+0x729/0x729
[  959.166948]  netlink_rcv_skb+0x6c/0xce
[  959.167113]  rtnetlink_rcv+0x23/0x2a
[  959.167273]  netlink_unicast+0x103/0x181
[  959.167439]  netlink_sendmsg+0x326/0x337
[  959.167607]  sock_sendmsg_nosec+0x14/0x3f
[  959.167772]  sock_sendmsg+0x29/0x2e
[  959.167932]  ___sys_sendmsg+0x209/0x28b
[  959.168098]  ? do_raw_spin_unlock+0xcd/0xf8
[  959.168267]  ? _raw_spin_unlock+0x27/0x31
[  959.168432]  ? __handle_mm_fault+0x651/0xdb1
[  959.168602]  ? check_chain_key+0xb0/0xfd
[  959.168773]  __sys_sendmsg+0x45/0x63
[  959.168934]  ? __sys_sendmsg+0x45/0x63
[  959.169100]  SyS_sendmsg+0x19/0x1b
[  959.169260]  entry_SYSCALL_64_fastpath+0x23/0xc2
[  959.169432] RIP: 0033:0x7fcc5097e690
[  959.169592] RSP: 002b:00007ffd0d5c7b48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  959.169887] RAX: ffffffffffffffda RBX: ffffffff810d278c RCX: 00007fcc5097e690
[  959.170089] RDX: 0000000000000000 RSI: 00007ffd0d5c7b90 RDI: 0000000000000003
[  959.170292] RBP: ffff8800376d3f98 R08: 0000000000000001 R09: 0000000000000003
[  959.170494] R10: 00007ffd0d5c7910 R11: 0000000000000246 R12: 0000000000000006
[  959.170697] R13: 000000000066f1a0 R14: 00007ffd0d5cfc40 R15: 0000000000000000
[  959.170900]  ? trace_hardirqs_off_caller+0xa7/0xcf
[  959.171076] Code: 00 41 c7 84 24 14 01 00 00 00 00 00 00 41 c7 84 24
98 00 00 00 00 00 00 00 41 5c 41 5d 41 5e 5d c3 66 66 66 66 90 55 48 89
e5 53 <48> 8b 47 18 48 89 fb 48 8b 40 48 48 85 c0 74 02 ff d0 48 8b bb
[  959.171637] RIP: qdisc_reset+0xa/0x5c RSP: ffff8800376d3610
[  959.171821] CR2: 0000000000000018

Fixes: 87b60cf ("net_sched: fix error recovery at qdisc creation")
Fixes: 0fbbeb1 ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
This reverts commit 7ad813f ("net: phy:
Correctly process PHY_HALTED in phy_stop_machine()") because it is
creating the possibility for a NULL pointer dereference.

David Daney provide the following call trace and diagram of events:

When ndo_stop() is called we call:

 phy_disconnect()
    +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
    +---> phy_stop_machine()
    |      +---> phy_state_machine()
    |              +----> queue_delayed_work(): Work queued.
    +--->phy_detach() implies: phydev->attached_dev = NULL;

Now at a later time the queued work does:

 phy_state_machine()
    +---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:

 CPU 12 Unable to handle kernel paging request at virtual address
0000000000000048, epc == ffffffff80de37ec, ra == ffffffff80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 80000004021ed100 task.stack: 8000000409d70000
$ 0   : 0000000000000000 ffffffff84720060 0000000000000048 0000000000000004
$ 4   : 0000000000000000 0000000000000001 0000000000000004 0000000000000000
$ 8   : 0000000000000000 0000000000000000 00000000ffff98f3 0000000000000000
$12   : 8000000409d73fe0 0000000000009c00 ffffffff846547c8 000000000000af3b
$16   : 80000004096bab68 80000004096babd0 0000000000000000 80000004096ba800
$20   : 0000000000000000 0000000000000000 ffffffff81090000 0000000000000008
$24   : 0000000000000061 ffffffff808637b0
$28   : 8000000409d70000 8000000409d73cf0 80000000271bd300 ffffffff80c7804c
Hi    : 000000000000002a
Lo    : 000000000000003f
epc   : ffffffff80de37ec netif_carrier_off+0xc/0x58
ra    : ffffffff80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3        KX SX UX KERNEL EXL IE
Cause : 00800008 (ExcCode 02)
BadVA : 0000000000000048
PrId  : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=8000000409d70000,
task=80000004021ed100, tls=0000000000000000)
Stack : 8000000409a54000 80000004096bab68 80000000271bd300 80000000271c1e00
        0000000000000000 ffffffff808a1708 8000000409a54000 80000000271bd300
        80000000271bd320 8000000409a54030 ffffffff80ff0f00 0000000000000001
        ffffffff81090000 ffffffff808a1ac0 8000000402182080 ffffffff84650000
        8000000402182080 ffffffff84650000 ffffffff80ff0000 8000000409a54000
        ffffffff808a1970 0000000000000000 80000004099e8000 8000000402099240
        0000000000000000 ffffffff808a8598 0000000000000000 8000000408eeeb00
        8000000409a54000 00000000810a1d00 0000000000000000 8000000409d73de8
        8000000409d73de8 0000000000000088 000000000c009c00 8000000409d73e08
        8000000409d73e08 8000000402182080 ffffffff808a84d0 8000000402182080
        ...
Call Trace:
[<ffffffff80de37ec>] netif_carrier_off+0xc/0x58
[<ffffffff80c7804c>] phy_state_machine+0x48c/0x4f8
[<ffffffff808a1708>] process_one_work+0x158/0x368
[<ffffffff808a1ac0>] worker_thread+0x150/0x4c0
[<ffffffff808a8598>] kthread+0xc8/0xe0
[<ffffffff808617f0>] ret_from_kernel_thread+0x14/0x1c

The original motivation for this change originated from Marc Gonzales
indicating that his network driver did not have its adjust_link callback
executing with phydev->link = 0 while he was expecting it.

PHYLIB has never made any such guarantees ever because phy_stop() merely just
tells the workqueue to move into PHY_HALTED state which will happen
asynchronously.

Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reported-by: David Daney <ddaney.cavm@gmail.com>
Fixes: 7ad813f ("net: phy: Correctly process PHY_HALTED in phy_stop_machine()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue Sep 18, 2017
Guillaume Nault says:

====================
l2tp: session creation fixes

The session creation process has a few issues wrt. concurrent tunnel
deletion.

Patch #1 avoids creating sessions in tunnels that are getting removed.
This prevents races where sessions could try to take tunnel resources
that were already released.

Patch #2 removes some racy l2tp_tunnel_find() calls in session creation
callbacks. Together with path #1 it ensures that sessions can only
access tunnel resources that are guaranteed to remain valid during the
session creation process.

There are other problems with how sessions are created: pseudo-wire
specific data are set after the session is added to the tunnel. So
the session can be used, or deleted, before it has been completely
initialised. Separating session allocation from session registration
would be necessary, but we'd still have circular dependencies
preventing race-free registration. I'll consider this issue in future
series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom added a commit that referenced this issue Sep 9, 2018
The reason for this patch is make sure that we can reach the point where
we call ath10k_mac_tx_lock if we have too many pending msdus.

If the lock limit is higher than max_num_pending, we will not reach this
point in the code without this patch.

This patch should be squashed with:

"ath10k: increase TX lock limit for high latency devices"
"ath10k: add htt_tx num_pending window"

It needs more testing, as the following problem occured when powering
off a NITROGEN6 board:

[ 2444.953411] wlan0: deauthenticating from 60:38:e0:c7:6b:3a by local choice (Reason: 3=DEAUTH_LEAVING)
[ 2444.976335] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[ 2444.984658] pgd = 11e5d79d
[ 2444.987384] [00000004] *pgd=00000000
[ 2444.990987] Internal error: Oops: 817 [#1] SMP ARM
[ 2444.995789] Modules linked in: ath10k_sdio ath10k_core ath coda imx_vdoa v4l2_mem2mem videobuf2_vmalloc dw_hdmi_ahb_audio evbug
[ 2445.007332] CPU: 0 PID: 89 Comm: irq/64-mmc0 Not tainted 4.18.0-wt-ath+ #8
[ 2445.014213] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 2445.020757] PC is at ieee80211_tx_dequeue+0x4c4/0xb64
[ 2445.025825] LR is at lock_is_held_type+0x48/0x6c
[ 2445.030451] pc : [<c0bbd5a8>]    lr : [<c0182df8>]    psr: 600f0013
[ 2445.036723] sp : e2ea3d18  ip : e2ea3cc8  fp : e2ea3dbc
[ 2445.041954] r10: e276c0d0  r9 : e276d740  r8 : e276e000
[ 2445.047187] r7 : 00000000  r6 : e276c000  r5 : df801c38  r4 : e09c4aa8
[ 2445.053722] r3 : 00000010  r2 : 00000000  r1 : 00000000  r0 : e276c008
[ 2445.060258] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[ 2445.067401] Control: 10c5387d  Table: 2f4dc04a  DAC: 00000051
[ 2445.073156] Process irq/64-mmc0 (pid: 89, stack limit = 0x1fa91abb)
[ 2445.079430] Stack: (0xe2ea3d18 to 0xe2ea4000)
[ 2445.083796] 3d00:                                                       df800640 e276d768
[ 2445.091984] 3d20: e276c0a0 df8006dc c0185750 e276c08c df800640 e276c000 e276e03c df801d30
[ 2445.100171] 3d40: 00000000 df800640 e2ea3d74 e2ea3d58 c0185828 c0185708 000001ff ffffe000
[ 2445.108359] 3d60: bf05d20c df801d30 e2ea3d84 e2ea3d78 c0185978 c0185780 e2ea3da4 e2ea3d88
[ 2445.116547] 3d80: c0135288 c0185970 bf05d20c c498e019 df801420 e276c0d0 df801c38 df801420
[ 2445.124734] 3da0: df801d30 00000000 df800640 e276d768 e2ea3e04 e2ea3dc0 bf05d22c c0bbd0f0
[ 2445.132920] 3dc0: c0c21e6c bf056970 df801420 c1208908 00000000 e2e74f68 bf056970 e276c0dc
[ 2445.141107] 3de0: df800640 e276c0d0 df801420 df8022e0 0000000f 00000000 e2ea3e4c e2ea3e08
[ 2445.149294] 3e00: bf05d564 bf05d1bc 00000000 00000000 bf05d400 df88314c df8022c0 e276c0dc
[ 2445.157481] 3e20: df801420 df88314c df882420 00000000 df801420 00180200 00036cf2 00000001
[ 2445.165669] 3e40: e2ea3e94 e2ea3e50 bf0c8ec8 bf05d3d4 00000001 df805868 e2ea3e84 01ea3e68
[ 2445.173855] 3e60: c0663f08 c498e019 00000000 e2e5a000 c1208908 e2e81000 00000001 e285b200
[ 2445.182043] 3e80: e2e57e80 00000001 e2ea3ed4 e2ea3e98 c0853214 bf0c8e34 c0161f90 00000100
[ 2445.190229] 3ea0: 00000200 c498e019 600f0013 e2e5a000 00000100 e2e5a648 00000001 e285b200
[ 2445.198416] 3ec0: e2e57e80 00000001 e2ea3eec e2ea3ed8 c08533d0 c08531d4 e2e5a480 00000100
[ 2445.206602] 3ee0: e2ea3f0c e2ea3ef0 c085e64c c0853390 e2e57e80 e285b200 e2e57ea4 00000001
[ 2445.214790] 3f00: e2ea3f2c e2ea3f10 c019804c c085e5d8 ffffe000 00000000 e2e57ea4 00000001
[ 2445.222976] 3f20: e2ea3f74 e2ea3f30 c01983ac c019802c c0c21eb0 c0198020 e2ea2000 00000000
[ 2445.231164] 3f40: c0198160 c498e019 c0154f7c 00000000 e2744780 e2e88000 e2ea2000 e2e57e80
[ 2445.239351] 3f60: c0198244 e227dc00 e2ea3fac e2ea3f78 c015544c c0198250 e27447b8 e27447b8
[ 2445.247538] 3f80: e2ea3fac e2e88000 c01552f8 00000000 00000000 00000000 00000000 00000000
[ 2445.255725] 3fa0: 00000000 e2ea3fb0 c01010b4 c0155304 00000000 00000000 00000000 00000000
[ 2445.263912] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2445.272098] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 2445.280279] Backtrace:
[ 2445.282897] [<c0bbd0e4>] (ieee80211_tx_dequeue) from [<bf05d22c>] (ath10k_mac_tx_push_txq+0x7c/0x218 [ath10k_core])
[ 2445.293344]  r10:e276d768 r9:df800640 r8:00000000 r7:df801d30 r6:df801420 r5:df801c38
[ 2445.301179]  r4:e276c0d0
[ 2445.303948] [<bf05d1b0>] (ath10k_mac_tx_push_txq [ath10k_core]) from [<bf05d564>] (ath10k_mac_tx_push_pending+0x19c/0x254 [ath10k_core])
[ 2445.316219]  r10:00000000 r9:0000000f r8:df8022e0 r7:df801420 r6:e276c0d0 r5:df800640
[ 2445.324056]  r4:e276c0dc
[ 2445.326721] [<bf05d3c8>] (ath10k_mac_tx_push_pending [ath10k_core]) from [<bf0c8ec8>] (ath10k_sdio_irq_handler+0xa0/0x3d0 [ath10k_sdio])
[ 2445.338990]  r10:00000001 r9:00036cf2 r8:00180200 r7:df801420 r6:00000000 r5:df882420
[ 2445.346826]  r4:df88314c
[ 2445.349384] [<bf0c8e28>] (ath10k_sdio_irq_handler [ath10k_sdio]) from [<c0853214>] (process_sdio_pending_irqs+0x4c/0x1bc)
[ 2445.360349]  r10:00000001 r9:e2e57e80 r8:e285b200 r7:00000001 r6:e2e81000 r5:c1208908
[ 2445.368184]  r4:e2e5a000
[ 2445.370732] [<c08531c8>] (process_sdio_pending_irqs) from [<c08533d0>] (sdio_run_irqs+0x4c/0x68)
[ 2445.379527]  r10:00000001 r9:e2e57e80 r8:e285b200 r7:00000001 r6:e2e5a648 r5:00000100
[ 2445.387362]  r4:e2e5a000
[ 2445.389911] [<c0853384>] (sdio_run_irqs) from [<c085e64c>] (sdhci_thread_irq+0x80/0xbc)
[ 2445.397921]  r5:00000100 r4:e2e5a480
[ 2445.401513] [<c085e5cc>] (sdhci_thread_irq) from [<c019804c>] (irq_thread_fn+0x2c/0x64)
[ 2445.409524]  r7:00000001 r6:e2e57ea4 r5:e285b200 r4:e2e57e80
[ 2445.415197] [<c0198020>] (irq_thread_fn) from [<c01983ac>] (irq_thread+0x168/0x24c)
[ 2445.422861]  r7:00000001 r6:e2e57ea4 r5:00000000 r4:ffffe000
[ 2445.428535] [<c0198244>] (irq_thread) from [<c015544c>] (kthread+0x154/0x16c)
[ 2445.435680]  r10:e227dc00 r9:c0198244 r8:e2e57e80 r7:e2ea2000 r6:e2e88000 r5:e2744780
[ 2445.443515]  r4:00000000
[ 2445.446063] [<c01552f8>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
[ 2445.453291] Exception stack(0xe2ea3fb0 to 0xe2ea3ff8)
[ 2445.458351] 3fa0:                                     00000000 00000000 00000000 00000000
[ 2445.466537] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2445.474722] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 2445.481346]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c01552f8
[ 2445.489181]  r4:e2e88000
[ 2445.491726] Code: e1560003 e1c820d0 0a000006 e3a01000 (e5823004)
[ 2445.497883] ---[ end trace 0e11545f0f62060d ]---
[ 2445.502511] Kernel panic - not syncing: Fatal exception in interrupt
[ 2445.508888] CPU1: stopping
[ 2445.511610] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D           4.18.0-wt-ath+ #8
[ 2445.519622] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 2445.526155] Backtrace:
[ 2445.528623] [<c010fa7c>] (dump_backtrace) from [<c010fd7c>] (show_stack+0x20/0x24)
[ 2445.536203]  r7:00000000 r6:60040193 r5:00000000 r4:c12ce41c
[ 2445.541876] [<c010fd5c>] (show_stack) from [<c0c03074>] (dump_stack+0xb4/0xec)
[ 2445.549114] [<c0c02fc0>] (dump_stack) from [<c0113eec>] (handle_IPI+0x344/0x38c)
[ 2445.556522]  r10:00000001 r9:ffffe000 r8:c12d6804 r7:00000000 r6:00000004 r5:c1208e7c
[ 2445.564359]  r4:c11d90e8 r3:c498e019
[ 2445.567948] [<c0113ba8>] (handle_IPI) from [<c01024c4>] (gic_handle_irq+0xb8/0xcc)
[ 2445.575529]  r10:c125d6cc r9:e22c1ed8 r8:c12090a8 r7:f4000100 r6:000003ff r5:000003eb
[ 2445.583365]  r4:f400010c
[ 2445.585910] [<c010240c>] (gic_handle_irq) from [<c0101a30>] (__irq_svc+0x70/0x98)
[ 2445.593400] Exception stack(0xe22c1ed8 to 0xe22c1f20)
[ 2445.598460] 1ec0:                                                       c083f588 e22b9900
[ 2445.606648] 1ee0: 00000000 00000000 c12d7114 63b01e07 00000001 e5fab9f8 6352718f 00000239
[ 2445.614835] 1f00: 00000239 e22c1f6c e22c1f18 e22c1f28 c0185978 c083f58c 60040013 ffffffff
[ 2445.623022]  r10:00000239 r9:e22c0000 r8:6352718f r7:e22c1f0c r6:ffffffff r5:60040013
[ 2445.630858]  r4:c083f58c
[ 2445.633405] [<c083f404>] (cpuidle_enter_state) from [<c083f8c4>] (cpuidle_enter+0x24/0x28)
[ 2445.641679]  r10:c1208908 r9:c120f8c8 r8:e5fab9f8 r7:c1208970 r6:00000002 r5:c1208930
[ 2445.649515]  r4:ffffe000
[ 2445.652063] [<c083f8a0>] (cpuidle_enter) from [<c0167a08>] (call_cpuidle+0x30/0x4c)
[ 2445.659729] [<c01679d8>] (call_cpuidle) from [<c0167e38>] (do_idle+0x230/0x2c4)
[ 2445.667048] [<c0167c08>] (do_idle) from [<c0168298>] (cpu_startup_entry+0x28/0x30)
[ 2445.674628]  r10:00000000 r9:412fc09a r8:1000406a r7:c12f1720 r6:10c0387d r5:00000001
[ 2445.682463]  r4:00000085
[ 2445.685011] [<c0168270>] (cpu_startup_entry) from [<c01138d0>] (secondary_start_kernel+0x164/0x1ac)
[ 2445.694068] [<c011376c>] (secondary_start_kernel) from [<10102b2c>] (0x10102b2c)
[ 2445.701471]  r5:00000051 r4:322ac06a
[ 2445.705054] CPU3: stopping
[ 2445.707777] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D           4.18.0-wt-ath+ #8
[ 2445.715787] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 2445.722320] Backtrace:
[ 2445.724786] [<c010fa7c>] (dump_backtrace) from [<c010fd7c>] (show_stack+0x20/0x24)
[ 2445.732365]  r7:00000000 r6:60040193 r5:00000000 r4:c12ce41c
[ 2445.738035] [<c010fd5c>] (show_stack) from [<c0c03074>] (dump_stack+0xb4/0xec)
[ 2445.745269] [<c0c02fc0>] (dump_stack) from [<c0113eec>] (handle_IPI+0x344/0x38c)
[ 2445.752675]  r10:00000003 r9:ffffe000 r8:c12d6804 r7:00000000 r6:00000004 r5:c1208e7c
[ 2445.760512]  r4:c11d90e8 r3:c498e019
[ 2445.764100] [<c0113ba8>] (handle_IPI) from [<c01024c4>] (gic_handle_irq+0xb8/0xcc)
[ 2445.771680]  r10:c125d6cc r9:e22c5ed8 r8:c12090a8 r7:f4000100 r6:000003ff r5:000003eb
[ 2445.779516]  r4:f400010c
[ 2445.782060] [<c010240c>] (gic_handle_irq) from [<c0101a30>] (__irq_svc+0x70/0x98)
[ 2445.789549] Exception stack(0xe22c5ed8 to 0xe22c5f20)
[ 2445.794608] 5ec0:                                                       c083f588 e22bb200
[ 2445.802796] 5ee0: 00000000 00000000 c12d7114 63b01e07 00000001 e5fcf9f8 44ed8632 00000239
[ 2445.810984] 5f00: 00000239 e22c5f6c e22c5f18 e22c5f28 c0185978 c083f58c 60040013 ffffffff
[ 2445.819172]  r10:00000239 r9:e22c4000 r8:44ed8632 r7:e22c5f0c r6:ffffffff r5:60040013
[ 2445.827007]  r4:c083f58c
[ 2445.829552] [<c083f404>] (cpuidle_enter_state) from [<c083f8c4>] (cpuidle_enter+0x24/0x28)
[ 2445.837827]  r10:c1208908 r9:c120f8c8 r8:e5fcf9f8 r7:c1208970 r6:00000008 r5:c1208930
[ 2445.845663]  r4:ffffe000
[ 2445.848209] [<c083f8a0>] (cpuidle_enter) from [<c0167a08>] (call_cpuidle+0x30/0x4c)
[ 2445.855875] [<c01679d8>] (call_cpuidle) from [<c0167e38>] (do_idle+0x230/0x2c4)
[ 2445.863195] [<c0167c08>] (do_idle) from [<c0168298>] (cpu_startup_entry+0x28/0x30)
[ 2445.870774]  r10:00000000 r9:412fc09a r8:1000406a r7:c12f1720 r6:10c0387d r5:00000003
[ 2445.878611]  r4:00000085
[ 2445.881157] [<c0168270>] (cpu_startup_entry) from [<c01138d0>] (secondary_start_kernel+0x164/0x1ac)
[ 2445.890213] [<c011376c>] (secondary_start_kernel) from [<10102b2c>] (0x10102b2c)
[ 2445.897617]  r5:00000051 r4:322ac06a
[ 2445.901200] CPU2: stopping
[ 2445.903923] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D           4.18.0-wt-ath+ #8
[ 2445.911933] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 2445.918467] Backtrace:
[ 2445.920931] [<c010fa7c>] (dump_backtrace) from [<c010fd7c>] (show_stack+0x20/0x24)
[ 2445.928511]  r7:00000000 r6:60040193 r5:00000000 r4:c12ce41c
[ 2445.934183] [<c010fd5c>] (show_stack) from [<c0c03074>] (dump_stack+0xb4/0xec)
[ 2445.941417] [<c0c02fc0>] (dump_stack) from [<c0113eec>] (handle_IPI+0x344/0x38c)
[ 2445.948824]  r10:00000002 r9:ffffe000 r8:c12d6804 r7:00000000 r6:00000004 r5:c1208e7c
[ 2445.956661]  r4:c11d90e8 r3:c498e019
[ 2445.960248] [<c0113ba8>] (handle_IPI) from [<c01024c4>] (gic_handle_irq+0xb8/0xcc)
[ 2445.967828]  r10:c125d6cc r9:e22c3ed8 r8:c12090a8 r7:f4000100 r6:000003ff r5:000003eb
[ 2445.975664]  r4:f400010c
[ 2445.978208] [<c010240c>] (gic_handle_irq) from [<c0101a30>] (__irq_svc+0x70/0x98)
[ 2445.985699] Exception stack(0xe22c3ed8 to 0xe22c3f20)
[ 2445.990758] 3ec0:                                                       c083f588 e22ba580
[ 2445.998946] 3ee0: 00000000 00000000 c12d7114 63b01e07 00000001 e5fbd9f8 44ed95d2 00000239
[ 2446.007134] 3f00: 00000239 e22c3f6c e22c3f18 e22c3f28 c0185978 c083f58c 60040013 ffffffff
[ 2446.015321]  r10:00000239 r9:e22c2000 r8:44ed95d2 r7:e22c3f0c r6:ffffffff r5:60040013
[ 2446.023156]  r4:c083f58c
[ 2446.025701] [<c083f404>] (cpuidle_enter_state) from [<c083f8c4>] (cpuidle_enter+0x24/0x28)
[ 2446.033976]  r10:c1208908 r9:c120f8c8 r8:e5fbd9f8 r7:c1208970 r6:00000004 r5:c1208930
[ 2446.041811]  r4:ffffe000
[ 2446.044356] [<c083f8a0>] (cpuidle_enter) from [<c0167a08>] (call_cpuidle+0x30/0x4c)
[ 2446.052023] [<c01679d8>] (call_cpuidle) from [<c0167e38>] (do_idle+0x230/0x2c4)
[ 2446.059342] [<c0167c08>] (do_idle) from [<c0168298>] (cpu_startup_entry+0x28/0x30)
[ 2446.066921]  r10:00000000 r9:412fc09a r8:1000406a r7:c12f1720 r6:10c0387d r5:00000002
[ 2446.074756]  r4:00000085
[ 2446.077302] [<c0168270>] (cpu_startup_entry) from [<c01138d0>] (secondary_start_kernel+0x164/0x1ac)
[ 2446.086358] [<c011376c>] (secondary_start_kernel) from [<10102b2c>] (0x10102b2c)
[ 2446.093761]  r5:00000051 r4:322ac06a
[ 2446.097360] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
erstrom pushed a commit that referenced this issue May 19, 2019
Currently mm_iommu_do_alloc() is called in 2 cases:
- VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl() for normal memory:
	this locks &mem_list_mutex and then locks mm::mmap_sem
	several times when adjusting locked_vm or pinning pages;
- vfio_pci_nvgpu_regops::mmap() for GPU memory:
	this is called with mm::mmap_sem held already and it locks
	&mem_list_mutex.

So one can craft a userspace program to do special ioctl and mmap in
2 threads concurrently and cause a deadlock which lockdep warns about
(below).

We did not hit this yet because QEMU constructs the machine in a single
thread.

This moves the overlap check next to where the new entry is added and
reduces the amount of time spent with &mem_list_mutex held.

This moves locked_vm adjustment from under &mem_list_mutex.

This relies on mm_iommu_adjust_locked_vm() doing nothing when entries==0.

This is one of the lockdep warnings:

======================================================
WARNING: possible circular locking dependency detected
5.1.0-rc2-le_nv2_aikATfstn1-p1 #363 Not tainted
------------------------------------------------------
qemu-system-ppc/8038 is trying to acquire lock:
000000002ec6c453 (mem_list_mutex){+.+.}, at: mm_iommu_do_alloc+0x70/0x490

but task is already holding lock:
00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_sem){++++}:
       lock_acquire+0xf8/0x260
       down_write+0x44/0xa0
       mm_iommu_adjust_locked_vm.part.1+0x4c/0x190
       mm_iommu_do_alloc+0x310/0x490
       tce_iommu_ioctl.part.9+0xb84/0x1150 [vfio_iommu_spapr_tce]
       vfio_fops_unl_ioctl+0x94/0x430 [vfio]
       do_vfs_ioctl+0xe4/0x930
       ksys_ioctl+0xc4/0x110
       sys_ioctl+0x28/0x80
       system_call+0x5c/0x70

-> #0 (mem_list_mutex){+.+.}:
       __lock_acquire+0x1484/0x1900
       lock_acquire+0xf8/0x260
       __mutex_lock+0x88/0xa70
       mm_iommu_do_alloc+0x70/0x490
       vfio_pci_nvgpu_mmap+0xc0/0x130 [vfio_pci]
       vfio_pci_mmap+0x198/0x2a0 [vfio_pci]
       vfio_device_fops_mmap+0x44/0x70 [vfio]
       mmap_region+0x5d4/0x770
       do_mmap+0x42c/0x650
       vm_mmap_pgoff+0x124/0x160
       ksys_mmap_pgoff+0xdc/0x2f0
       sys_mmap+0x40/0x80
       system_call+0x5c/0x70

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(mem_list_mutex);
                               lock(&mm->mmap_sem);
  lock(mem_list_mutex);

 *** DEADLOCK ***

1 lock held by qemu-system-ppc/8038:
 #0: 00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160

Fixes: c10c21e ("powerpc/vfio/iommu/kvm: Do not pin device memory", 2018-12-19)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
erstrom pushed a commit that referenced this issue May 19, 2019
There is a UBSAN report as below:
UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
signed integer overflow:
2147483647 * 1000 cannot be represented in type 'int'
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1
Call Trace:
 <IRQ>
 dump_stack+0x8c/0xba
 ubsan_epilogue+0x11/0x60
 handle_overflow+0x12d/0x170
 ? ttwu_do_wakeup+0x21/0x320
 __ubsan_handle_mul_overflow+0x12/0x20
 tcp_ack_update_rtt+0x76c/0x780
 tcp_clean_rtx_queue+0x499/0x14d0
 tcp_ack+0x69e/0x1240
 ? __wake_up_sync_key+0x2c/0x50
 ? update_group_capacity+0x50/0x680
 tcp_rcv_established+0x4e2/0xe10
 tcp_v4_do_rcv+0x22b/0x420
 tcp_v4_rcv+0xfe8/0x1190
 ip_protocol_deliver_rcu+0x36/0x180
 ip_local_deliver+0x15b/0x1a0
 ip_rcv+0xac/0xd0
 __netif_receive_skb_one_core+0x7f/0xb0
 __netif_receive_skb+0x33/0xc0
 netif_receive_skb_internal+0x84/0x1c0
 napi_gro_receive+0x2a0/0x300
 receive_buf+0x3d4/0x2350
 ? detach_buf_split+0x159/0x390
 virtnet_poll+0x198/0x840
 ? reweight_entity+0x243/0x4b0
 net_rx_action+0x25c/0x770
 __do_softirq+0x19b/0x66d
 irq_exit+0x1eb/0x230
 do_IRQ+0x7a/0x150
 common_interrupt+0xf/0xf
 </IRQ>

It can be reproduced by:
  echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen

Fixes: f672258 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue May 19, 2019
Ido Schimmel says:

====================
mlxsw: Few small fixes

Patch #1, from Petr, adjusts mlxsw to provide the same QoS behavior for
both Spectrum-1 and Spectrum-2. The fix is required due to a difference
in the behavior of Spectrum-2 compared to Spectrum-1. The problem and
solution are described in the detail in the changelog.

Patch #2 increases the time period in which the driver waits for the
firmware to signal it has finished its initialization. The issue will be
fixed in future firmware versions and the timeout will be decreased.

Patch #3, from Amit, fixes a display problem where the autoneg status in
ethtool is not updated in case the netdev is not running.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue May 19, 2019
The way of getting private imx_i2c_struct in i2c_imx_clk_notifier_call()
is incorrect, should use clk_change_nb element to get correct address
and avoid below kernel dump during POST_RATE_CHANGE notify by clk
framework:

Unable to handle kernel paging request at virtual address 03ef1488
pgd = (ptrval)
[03ef1488] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: events reduce_bus_freq_handler
PC is at i2c_imx_set_clk+0x10/0xb8
LR is at i2c_imx_clk_notifier_call+0x20/0x28
pc : [<806a893c>]    lr : [<806a8a04>]    psr: a0080013
sp : bf399dd8  ip : bf3432ac  fp : bf7c1dc0
r10: 00000002  r9 : 00000000  r8 : 00000000
r7 : 03ef1480  r6 : bf399e50  r5 : ffffffff  r4 : 00000000
r3 : bf025300  r2 : bf399e50  r1 : 00b71b00  r0 : bf399be8
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 4e03004a  DAC: 00000051
Process kworker/2:1 (pid: 38, stack limit = 0x(ptrval))
Stack: (0xbf399dd8 to 0xbf39a000)
9dc0:                                                       806a89e4 00000000
9de0: ffffffff bf399e50 00000002 806a8a04 806a89e4 80142900 ffffffff 00000000
9e00: bf34ef18 bf34ef04 00000000 ffffffff bf399e50 80142d84 00000000 bf399e6c
9e20: bf34ef00 80f214c4 bf025300 00000002 80f08d08 bf017480 00000000 80142df0
9e40: 00000000 80166ed8 80c27638 8045de58 bf352340 03ef1480 00b71b00 0f82e242
9e60: bf025300 00000002 03ef1480 80f60e5c 00000001 8045edf0 00000002 8045eb08
9e80: bf025300 00000002 03ef1480 8045ee10 03ef1480 8045eb08 bf01be40 00000002
9ea0: 03ef1480 8045ee10 07de2900 8045eb08 bf01b780 00000002 07de2900 8045ee10
9ec0: 80c27898 bf399ee4 bf020a80 00000002 1f78a400 8045ee10 80f60e5c 80460514
9ee0: 80f60e5c bf01b600 bf01b480 80460460 0f82e242 bf383a80 bf383a00 80f60e5c
9f00: 00000000 bf7c1dc0 80f60e70 80460564 80f60df0 80f60d24 80f60df0 8011e72c
9f20: 00000000 80f60df0 80f60e6c bf7c4f00 00000000 8011e7ac bf274000 8013bd84
9f40: bf7c1dd8 80f03d00 bf274000 bf7c1dc0 bf274014 bf7c1dd8 80f03d00 bf398000
9f60: 00000008 8013bfb4 00000000 bf25d100 bf25d0c0 00000000 bf274000 8013bf88
9f80: bf25d11c bf0cfebc 00000000 8014140c bf25d0c0 801412ec 00000000 00000000
9fa0: 00000000 00000000 00000000 801010e8 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<806a893c>] (i2c_imx_set_clk) from [<806a8a04>] (i2c_imx_clk_notifier_call+0x20/0x28)
[<806a8a04>] (i2c_imx_clk_notifier_call) from [<80142900>] (notifier_call_chain+0x44/0x84)
[<80142900>] (notifier_call_chain) from [<80142d84>] (__srcu_notifier_call_chain+0x44/0x98)
[<80142d84>] (__srcu_notifier_call_chain) from [<80142df0>] (srcu_notifier_call_chain+0x18/0x20)
[<80142df0>] (srcu_notifier_call_chain) from [<8045de58>] (__clk_notify+0x78/0xa4)
[<8045de58>] (__clk_notify) from [<8045edf0>] (__clk_recalc_rates+0x60/0xb4)
[<8045edf0>] (__clk_recalc_rates) from [<8045ee10>] (__clk_recalc_rates+0x80/0xb4)
Code: e92d40f8 e5903298 e59072a0 e1530001 (e5975008)
---[ end trace fc7f5514b97b6cbb ]---

Fixes: 90ad2cb ("i2c: imx: use clk notifier for rate changes")
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
erstrom pushed a commit that referenced this issue May 19, 2019
After commit 5271953 ("rxrpc: Use the UDP encap_rcv hook"),
rxrpc_input_packet() is directly called from lockless UDP receive
path, under rcu_read_lock() protection.

It must therefore use RCU rules :

- udp_sk->sk_user_data can be cleared at any point in this function.
  rcu_dereference_sk_user_data() is what we need here.

- Also, since sk_user_data might have been set in rxrpc_open_socket()
  we must observe a proper RCU grace period before kfree(local) in
  rxrpc_lookup_local()

v4: @Local can be NULL in xrpc_lookup_local() as reported by kbuild test robot <lkp@intel.com>
        and Julia Lawall <julia.lawall@lip6.fr>, thanks !

v3,v2 : addressed David Howells feedback, thanks !

syzbot reported :

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 19236 Comm: syz-executor703 Not tainted 5.1.0-rc6 #79
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__lock_acquire+0xbef/0x3fb0 kernel/locking/lockdep.c:3573
Code: 00 0f 85 a5 1f 00 00 48 81 c4 10 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 21 00 00 49 81 7d 00 20 54 9c 89 0f 84 cf f4
RSP: 0018:ffff88809d7aef58 EFLAGS: 00010002
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000026 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88809d7af090 R08: 0000000000000001 R09: 0000000000000001
R10: ffffed1015d05bc7 R11: ffff888089428600 R12: 0000000000000000
R13: 0000000000000130 R14: 0000000000000001 R15: 0000000000000001
FS:  00007f059044d700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004b6040 CR3: 00000000955ca000 CR4: 00000000001406f0
Call Trace:
 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4211
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
 skb_queue_tail+0x26/0x150 net/core/skbuff.c:2972
 rxrpc_reject_packet net/rxrpc/input.c:1126 [inline]
 rxrpc_input_packet+0x4a0/0x5536 net/rxrpc/input.c:1414
 udp_queue_rcv_one_skb+0xaf2/0x1780 net/ipv4/udp.c:2011
 udp_queue_rcv_skb+0x128/0x730 net/ipv4/udp.c:2085
 udp_unicast_rcv_skb.isra.0+0xb9/0x360 net/ipv4/udp.c:2245
 __udp4_lib_rcv+0x701/0x2ca0 net/ipv4/udp.c:2301
 udp_rcv+0x22/0x30 net/ipv4/udp.c:2482
 ip_protocol_deliver_rcu+0x60/0x8f0 net/ipv4/ip_input.c:208
 ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:255
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x1e1/0x300 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0x115/0x1a0 net/core/dev.c:4987
 __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5099
 netif_receive_skb_internal+0x117/0x660 net/core/dev.c:5202
 napi_frags_finish net/core/dev.c:5769 [inline]
 napi_gro_frags+0xade/0xd10 net/core/dev.c:5843
 tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
 call_write_iter include/linux/fs.h:1866 [inline]
 do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
 do_iter_write fs/read_write.c:957 [inline]
 do_iter_write+0x184/0x610 fs/read_write.c:938
 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
 do_writev+0x15e/0x370 fs/read_write.c:1037
 __do_sys_writev fs/read_write.c:1110 [inline]
 __se_sys_writev fs/read_write.c:1107 [inline]
 __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 5271953 ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erstrom pushed a commit that referenced this issue May 19, 2019
When ddc-i2c-bus property is used, a NULL pointer dereference is reported:

[   31.041669] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[   31.041671] pgd = 4d3c16f6
[   31.041673] [00000008] *pgd=00000000
[   31.041678] Internal error: Oops: 5 [#1] SMP ARM

[   31.041711] Hardware name: Rockchip (Device Tree)
[   31.041718] PC is at i2c_transfer+0x8/0xe4
[   31.041721] LR is at drm_scdc_read+0x54/0x84
[   31.041723] pc : [<c073273c>]    lr : [<c05926c4>]    psr: 280f0013
[   31.041725] sp : edffdad0  ip : 5ccb5511  fp : 00000058
[   31.041727] r10: 00000780  r9 : edf91608  r8 : c11b0f48
[   31.041728] r7 : 00000438  r6 : 00000000  r5 : 00000000  r4 : 00000000
[   31.041730] r3 : edffdae7  r2 : 00000002  r1 : edffdaec  r0 : 00000000

[   31.041908] [<c073273c>] (i2c_transfer) from [<c05926c4>] (drm_scdc_read+0x54/0x84)
[   31.041913] [<c05926c4>] (drm_scdc_read) from [<c0592858>] (drm_scdc_set_scrambling+0x30/0xbc)
[   31.041919] [<c0592858>] (drm_scdc_set_scrambling) from [<c05cc0f4>] (dw_hdmi_update_power+0x1440/0x1610)
[   31.041926] [<c05cc0f4>] (dw_hdmi_update_power) from [<c05cc574>] (dw_hdmi_bridge_enable+0x2c/0x70)
[   31.041932] [<c05cc574>] (dw_hdmi_bridge_enable) from [<c05aed48>] (drm_bridge_enable+0x24/0x34)
[   31.041938] [<c05aed48>] (drm_bridge_enable) from [<c0591060>] (drm_atomic_helper_commit_modeset_enables+0x114/0x220)
[   31.041943] [<c0591060>] (drm_atomic_helper_commit_modeset_enables) from [<c05c3fe0>] (rockchip_atomic_helper_commit_tail_rpm+0x28/0x64)

hdmi->i2c may not be set when ddc-i2c-bus property is used in device tree.
Fix this by using hdmi->ddc as the i2c adapter when calling drm_scdc_*().
Also report that SCDC is not supported when there is no DDC bus.

Fixes: 264fce6 ("drm/bridge: dw-hdmi: Add SCDC and TMDS Scrambling support")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/VE1PR03MB59031814B5BCAB2152923BDAAC210@VE1PR03MB5903.eurprd03.prod.outlook.com
erstrom pushed a commit that referenced this issue May 19, 2019
…esult

During the development of commit 5e1f0f0 ("mm, compaction: capture
a page under direct compaction"), a paranoid check was added to ensure
that if a captured page was available after compaction that it was
consistent with the final state of compaction.  The intent was to catch
serious programming bugs such as using a stale page pointer and causing
corruption problems.

However, it is possible to get a captured page even if compaction was
unsuccessful if an interrupt triggered and happened to free pages in
interrupt context that got merged into a suitable high-order page.  It's
highly unlikely but Li Wang did report the following warning on s390
occuring when testing OOM handling.  Note that the warning is slightly
edited for clarity.

  WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_direct_compact+0x182/0x190
  Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs
    lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390
    des_s390 des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc
    ip_tables xfs libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod
    qeth qdio lcs ctcm ccwgroup fsm dm_mirror dm_region_hash dm_log
    dm_mod
  CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc 5 #1

This patch simply removes the check entirely instead of trying to be
clever about pages freed from interrupt context.  If a serious
programming error was introduced, it is highly likely to be caught by
prep_new_page() instead.

Link: http://lkml.kernel.org/r/20190419085133.GH18914@techsingularity.net
Fixes: 5e1f0f0 ("mm, compaction: capture a page under direct compaction")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Li Wang <liwang@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
erstrom pushed a commit that referenced this issue May 19, 2019
Syzkaller report this:

  sysctl could not get directory: /net//bridge -12
  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN PTI
  CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G         C        5.1.0-rc3+ #8
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline]
  RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline]
  RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline]
  RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459
  Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48
  RSP: 0018:ffff8881bb507778 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a
  RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568
  RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4
  R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   erase_entry fs/proc/proc_sysctl.c:178 [inline]
   erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207
   start_unregistering fs/proc/proc_sysctl.c:331 [inline]
   drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631
   get_subdir fs/proc/proc_sysctl.c:1022 [inline]
   __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
   br_netfilter_init+0x68/0x1000 [br_netfilter]
   do_one_initcall+0xbc/0x47d init/main.c:901
   do_init_module+0x1b5/0x547 kernel/module.c:3456
   load_module+0x6405/0x8c10 kernel/module.c:3804
   __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
   do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle
   iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter]
  Dumping ftrace buffer:
     (ftrace buffer empty)
  ---[ end trace 68741688d5fbfe85 ]---

commit 23da958 ("fs/proc/proc_sysctl.c: fix NULL pointer
dereference in put_links") forgot to handle start_unregistering() case,
while header->parent is NULL, it calls erase_header() and as seen in the
above syzkaller call trace, accessing &header->parent->root will trigger
a NULL pointer dereference.

As that commit explained, there is also no need to call
start_unregistering() if header->parent is NULL.

Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com
Fixes: 23da958 ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links")
Fixes: 0e47c99 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
erstrom pushed a commit that referenced this issue May 19, 2019
We don't check for the validity of the lengths in the packet received
from the firmware.  If the MPDU length received in the rx descriptor
is too short to contain the header length and the crypt length
together, we may end up trying to copy a negative number of bytes
(headlen - hdrlen < 0) which will underflow and cause us to try to
copy a huge amount of data.  This causes oopses such as this one:

BUG: unable to handle kernel paging request at ffff896be2970000
PGD 5e201067 P4D 5e201067 PUD 5e205067 PMD 16110d063 PTE 8000000162970161
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 1824 Comm: irq/134-iwlwifi Not tainted 4.19.33-04308-geea41cf4930f #1
Hardware name: [...]
RIP: 0010:memcpy_erms+0x6/0x10
Code: 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3
 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
RSP: 0018:ffffa4630196fc60 EFLAGS: 00010287
RAX: ffff896be2924618 RBX: ffff896bc8ecc600 RCX: 00000000fffb4610
RDX: 00000000fffffff8 RSI: ffff896a835e2a38 RDI: ffff896be2970000
RBP: ffffa4630196fd30 R08: ffff896bc8ecc600 R09: ffff896a83597000
R10: ffff896bd6998400 R11: 000000000200407f R12: ffff896a83597050
R13: 00000000fffffff8 R14: 0000000000000010 R15: ffff896a83597038
FS:  0000000000000000(0000) GS:ffff896be8280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff896be2970000 CR3: 000000005dc12002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 iwl_mvm_rx_mpdu_mq+0xb51/0x121b [iwlmvm]
 iwl_pcie_rx_handle+0x58c/0xa89 [iwlwifi]
 iwl_pcie_irq_rx_msix_handler+0xd9/0x12a [iwlwifi]
 irq_thread_fn+0x24/0x49
 irq_thread+0xb0/0x122
 kthread+0x138/0x140
 ret_from_fork+0x1f/0x40

Fix that by checking the lengths for correctness and trigger a warning
to show that we have received wrong data.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
erstrom pushed a commit that referenced this issue May 19, 2019
If io_allocate_scq_urings() fails to allocate an sq_* region, it will
call io_mem_free() for any previously allocated regions, but leave
dangling pointers to these regions in the ctx. Any regions which have
not yet been allocated are left NULL. Note that when returning
-EOVERFLOW, the previously allocated sq_ring is not freed, which appears
to be an unintentional leak.

When io_allocate_scq_urings() fails, io_uring_create() will call
io_ring_ctx_wait_and_kill(), which calls io_mem_free() on all the sq_*
regions, assuming the pointers are valid and not NULL.

This can result in pages being freed multiple times, which has been
observed to corrupt the page state, leading to subsequent fun. This can
also result in virt_to_page() on NULL, resulting in the use of bogus
page addresses, and yet more subsequent fun. The latter can be detected
with CONFIG_DEBUG_VIRTUAL on arm64.

Adding a cleanup path to io_allocate_scq_urings() complicates the logic,
so let's leave it to io_ring_ctx_free() to consistently free these
pointers, and simplify the io_allocate_scq_urings() error paths.

Full splats from before this patch below. Note that the pointer logged
by the DEBUG_VIRTUAL "non-linear address" warning has been hashed, and
is actually NULL.

[   26.098129] page:ffff80000e949a00 count:0 mapcount:-128 mapping:0000000000000000 index:0x0
[   26.102976] flags: 0x63fffc000000()
[   26.104373] raw: 000063fffc000000 ffff80000e86c188 ffff80000ea3df08 0000000000000000
[   26.108917] raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
[   26.137235] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
[   26.143960] ------------[ cut here ]------------
[   26.146020] kernel BUG at include/linux/mm.h:547!
[   26.147586] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[   26.149163] Modules linked in:
[   26.150287] Process syz-executor.21 (pid: 20204, stack limit = 0x000000000e9cefeb)
[   26.153307] CPU: 2 PID: 20204 Comm: syz-executor.21 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #18
[   26.156566] Hardware name: linux,dummy-virt (DT)
[   26.158089] pstate: 40400005 (nZcv daif +PAN -UAO)
[   26.159869] pc : io_mem_free+0x9c/0xa8
[   26.161436] lr : io_mem_free+0x9c/0xa8
[   26.162720] sp : ffff000013003d60
[   26.164048] x29: ffff000013003d60 x28: ffff800025048040
[   26.165804] x27: 0000000000000000 x26: ffff800025048040
[   26.167352] x25: 00000000000000c0 x24: ffff0000112c2820
[   26.169682] x23: 0000000000000000 x22: 0000000020000080
[   26.171899] x21: ffff80002143b418 x20: ffff80002143b400
[   26.174236] x19: ffff80002143b280 x18: 0000000000000000
[   26.176607] x17: 0000000000000000 x16: 0000000000000000
[   26.178997] x15: 0000000000000000 x14: 0000000000000000
[   26.181508] x13: 00009178a5e077b2 x12: 0000000000000001
[   26.183863] x11: 0000000000000000 x10: 0000000000000980
[   26.186437] x9 : ffff000013003a80 x8 : ffff800025048a20
[   26.189006] x7 : ffff8000250481c0 x6 : ffff80002ffe9118
[   26.191359] x5 : ffff80002ffe9118 x4 : 0000000000000000
[   26.193863] x3 : ffff80002ffefe98 x2 : 44c06ddd107d1f00
[   26.196642] x1 : 0000000000000000 x0 : 000000000000003e
[   26.198892] Call trace:
[   26.199893]  io_mem_free+0x9c/0xa8
[   26.201155]  io_ring_ctx_wait_and_kill+0xec/0x180
[   26.202688]  io_uring_setup+0x6c4/0x6f0
[   26.204091]  __arm64_sys_io_uring_setup+0x18/0x20
[   26.205576]  el0_svc_common.constprop.0+0x7c/0xe8
[   26.207186]  el0_svc_handler+0x28/0x78
[   26.208389]  el0_svc+0x8/0xc
[   26.209408] Code: aa0203e0 d0006861 9133a021 97fcdc3c (d4210000)
[   26.211995] ---[ end trace bdb81cd43a21e50d ]---

[   81.770626] ------------[ cut here ]------------
[   81.825015] virt_to_phys used for non-linear address: 000000000d42f2c7 (          (null))
[   81.827860] WARNING: CPU: 1 PID: 30171 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x48/0x68
[   81.831202] Modules linked in:
[   81.832212] CPU: 1 PID: 30171 Comm: syz-executor.20 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #19
[   81.835616] Hardware name: linux,dummy-virt (DT)
[   81.836863] pstate: 60400005 (nZCv daif +PAN -UAO)
[   81.838727] pc : __virt_to_phys+0x48/0x68
[   81.840572] lr : __virt_to_phys+0x48/0x68
[   81.842264] sp : ffff80002cf67c70
[   81.843858] x29: ffff80002cf67c70 x28: ffff800014358e18
[   81.846463] x27: 0000000000000000 x26: 0000000020000080
[   81.849148] x25: 0000000000000000 x24: ffff80001bb01f40
[   81.851986] x23: ffff200011db06c8 x22: ffff2000127e3c60
[   81.854351] x21: ffff800014358cc0 x20: ffff800014358d98
[   81.856711] x19: 0000000000000000 x18: 0000000000000000
[   81.859132] x17: 0000000000000000 x16: 0000000000000000
[   81.861586] x15: 0000000000000000 x14: 0000000000000000
[   81.863905] x13: 0000000000000000 x12: ffff1000037603e9
[   81.866226] x11: 1ffff000037603e8 x10: 0000000000000980
[   81.868776] x9 : ffff80002cf67840 x8 : ffff80001bb02920
[   81.873272] x7 : ffff1000037603e9 x6 : ffff80001bb01f47
[   81.875266] x5 : ffff1000037603e9 x4 : dfff200000000000
[   81.876875] x3 : ffff200010087528 x2 : ffff1000059ecf58
[   81.878751] x1 : 44c06ddd107d1f00 x0 : 0000000000000000
[   81.880453] Call trace:
[   81.881164]  __virt_to_phys+0x48/0x68
[   81.882919]  io_mem_free+0x18/0x110
[   81.886585]  io_ring_ctx_wait_and_kill+0x13c/0x1f0
[   81.891212]  io_uring_setup+0xa60/0xad0
[   81.892881]  __arm64_sys_io_uring_setup+0x2c/0x38
[   81.894398]  el0_svc_common.constprop.0+0xac/0x150
[   81.896306]  el0_svc_handler+0x34/0x88
[   81.897744]  el0_svc+0x8/0xc
[   81.898715] ---[ end trace b4a703802243cbba ]---

Fixes: 2b188cc ("Add io_uring IO interface")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-block@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants