
Device tree population #24

Closed
lp0 opened this Issue May 17, 2012 · 62 comments


@lp0
lp0 commented May 17, 2012

Example template .dts: http://s85.org/6o9R3Ev2:view
Example compiled .dtb: http://s85.org/6A9h2c6X:raw

Could you automatically populate the following fields in the device tree file?

/memreserve/ (offset and size of VideoCore memory, each stored as a u64 in the reserve map - this is not a normal attribute)
/memory/reg with the u32 memory range (e.g. 0x0 0x10000000 for 256MB)
/chosen/bootargs with the contents of the command line file (without any additional arguments)

/system/revision with the u32 system revision
/system/serial with the high u32 followed by the low u32 of the serial number

/axi/usb/hub/ethernet/mac-address with the system mac address (this is six u8 bytes in network byte order, e.g. 00:11:22:33:44:55)

/axi/dma/broadcom,channels with the u32 mask of allowed channels (this is currently on the command line as dma.dmachans)

/axi/sdhci/clock-frequency with the u32 clock frequency in Hz
/axi/uart0/clock-frequency with the u32 clock frequency in Hz

/display/broadcom,width with the u32 display pixel width (this is currently on the command line as bcm2708_fb.fbwidth)
/display/broadcom,height with the u32 display pixel height (this is currently on the command line as bcm2708_fb.fbheight)
/display/broadcom,depth with the u32 display bit depth (this is currently on the command line as bcm2708_fb.fbdepth)

If there's anything you think I've missed let me know (the setting for dwc_otg.lpm_enable=0 can be put in the static template so the GPU doesn't need to provide it).
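Everything the GPU fills in here is big-endian. As a rough illustration, the Python sketch below packs a few of the requested values the way they would appear in the .dtb: u32 cells for /memory/reg and for the serial (high word first, as requested above), and six raw bytes for mac-address. The helper names (`cells`, `serial_cells`, `mac_bytes`) are hypothetical and only demonstrate the byte layout. The example serial and MAC values are made up.

```python
import struct


def cells(*values: int) -> bytes:
    """Pack values as consecutive big-endian u32 cells (FDT cell format)."""
    return struct.pack(">%dI" % len(values), *values)


def serial_cells(serial: int) -> bytes:
    """Split a 64-bit serial into the high u32 followed by the low u32."""
    return cells((serial >> 32) & 0xFFFFFFFF, serial & 0xFFFFFFFF)


def mac_bytes(mac: str) -> bytes:
    """Six u8 bytes in network byte order, e.g. '00:11:22:33:44:55'."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError("expected six octets")
    return raw


# /memory/reg for 256MB: base 0x0, size 0x10000000
print(cells(0x0, 0x10000000).hex())         # 0000000010000000
print(serial_cells(0xE).hex())              # 000000000000000e
print(mac_bytes("00:11:22:33:44:55").hex()) # 001122334455
```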

@lp0
lp0 commented May 18, 2012

Updated to use /system-serial instead of separate /system-serial-high and /system-serial-low (although I haven't changed the .dts/.dtb).

@lp0
lp0 commented May 22, 2012

Updated to move the mac address to /axi/usb/hub/ethernet/mac-address.

I've been advised that putting a mac-address at / is going to be a problem we don't want to re-open. Putting it at this location exactly matches the physical hardware devices.

@popcornmix

You did earlier say: "Everything's be32 so it should be simple to overwrite default values as required. There's already a placeholder for the memory"

But I think the bootargs breaks that. Looks like I have to decompile then compile the dtb file. Unless I've missed something?

@lp0
lp0 commented May 26, 2012

I was sure I had seen padding in one of the .dts for this but I can't find one now, so you may need to do that. I'm not sure exactly how the format works but you might be able to append another string to the end.

@popcornmix

What happens if you say:
chosen { bootargs = " "; };
(for a sufficiently large number of spaces)?

@lp0
lp0 commented May 26, 2012

Problem solved!

Put this in arch/arm/mach-bcm2708/Makefile.boot:
DTC_FLAGS ?= -p 4096
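The -p flag makes dtc append that many bytes of free space after the strings block and count them in the header's totalsize, which is what lets a tool such as fdtput grow a property in place. A minimal Python sketch for checking how much slack a .dtb has, assuming a standard version-17 header (ten big-endian u32 fields). `read_header` and `fdt_free_space` are hypothetical helpers, not part of dtc or libfdt.

```python
import struct

FDT_MAGIC = 0xD00DFEED
HEADER_FIELDS = (
    "magic", "totalsize", "off_dt_struct", "off_dt_strings",
    "off_mem_rsvmap", "version", "last_comp_version",
    "boot_cpuid_phys", "size_dt_strings", "size_dt_struct",
)


def read_header(blob: bytes) -> dict:
    """Decode the 40-byte FDT header (all fields are big-endian u32)."""
    header = dict(zip(HEADER_FIELDS, struct.unpack_from(">10I", blob, 0)))
    if header["magic"] != FDT_MAGIC:
        raise ValueError("not a flattened device tree")
    return header


def fdt_free_space(blob: bytes) -> int:
    """Bytes between the end of the last block and totalsize (the -p padding)."""
    h = read_header(blob)
    used = max(h["off_dt_struct"] + h["size_dt_struct"],
               h["off_dt_strings"] + h["size_dt_strings"])
    return h["totalsize"] - used
```

For a freshly compiled blob, `fdt_free_space(open("bcm2835-rpi-b.dtb", "rb").read())` should report the value passed to -p.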

@popcornmix

Great. I'll start work on this issue.

@popcornmix

Okay, I've had a go at filling in the required device tree fields. Let me know if there are any problems:
https://dl.dropbox.com/u/3669512/start.elf

@bootc
bootc commented May 27, 2012

Nearly. I haven't spent much time on this yet, but with disable_cmdline_tags=1 I get no command-line options in /chosen - or at least my kernel doesn't pick them up. With disable_cmdline_tags=0 it uses ATAGs I think, and the kernel doesn't boot because the device tree isn't there.

@popcornmix

Can you try making space in the:
chosen { bootargs = " "; };
in bcm2835.dtsi. I will only fill in up to the original length of the bootargs string.

@bootc
bootc commented May 27, 2012

Oh it doesn't use the padding space left at the end of the dtb? I'll see if I can't give it a go in a few, but may not be able to until about 20:30.

@popcornmix

From my limited understanding, the data in the bootargs is just contained inline with the devtree tags, so it's not possible to store that in the padding space without recompiling a new devtree.

It does seem to work if you store a string of, say, 1024 spaces in the bootargs, and I overwrite that with real command line (leaving length as 1024 and spaces intact).

I've updated the start.elf to not store the '\0' inside string, which I think is more correct (it makes is_printable_string return true).

P.S. I'm expecting you to define disable_cmdline_tags=1.
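A sketch of the fill-in-place approach described above. The new command line is written over the existing bootargs value without changing its length and is truncated if it doesn't fit. Any unused padding spaces are left intact, and the final byte stays as the terminating NUL. This is a Python illustration under those assumptions, not the firmware's code. `fill_in_place` is a hypothetical name.

```python
def fill_in_place(old_value: bytes, cmdline: str) -> bytes:
    """Overwrite a padded string property without changing its length.

    old_value is assumed to be a dtc-compiled string of spaces followed by a
    NUL byte. At most len(old_value) - 1 bytes of cmdline are copied, any
    unused spaces stay, and the trailing NUL is preserved.
    """
    if not old_value or old_value[-1] != 0:
        raise ValueError("placeholder must be NUL-terminated")
    room = len(old_value) - 1
    text = cmdline.encode("ascii")[:room]
    return text + old_value[len(text):]


placeholder = b" " * 7 + b"\x00"
print(fill_in_place(placeholder, "root=x"))      # b'root=x \x00'
print(fill_in_place(placeholder, "0123456789"))  # b'0123456\x00'
```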

@bootc
bootc commented May 27, 2012

Trouble with the big empty string full of spaces is I doubt we'd ever be able to get a device tree with that into mainline. How does fdtput do it? Is http://git.jdl.com/gitweb/?p=dtc.git any help?

@popcornmix

Looking at the code, it won't:
* TODO: add options to:
* - expand fdt if value doesn't fit

and:
./fdtput -ts bcm2835.dtb /chosen bootargs "root=nfs"
Error at 'bootargs': FDT_ERR_NOSPACE

If the big empty string is a blocker for upstreaming, then I'm sure we'll find a solution.
However, it may serve as a temporary solution for the current development.

@bootc
bootc commented May 27, 2012

Right, with the extra padding that lp0 has added when generating the .dtb (DTC_FLAGS ?= -p 4096) those should never trigger. The padding is added specifically so that fdtput can add in the command-line args without having to expand the fdt. Consider my code for building the .dtb files manually, below:

dtc -O dtb -o bcm2835-rpi-b.dtb -p 128 ~/Projects/linux-lp0/arch/arm/boot/dts/bcm2835-rpi-b.dts
fdtput -t x bcm2835-rpi-b.dtb /memory reg 0 0e000000
fdtput bcm2835-rpi-b.dtb / system-rev 2
fdtput -t x bcm2835-rpi-b.dtb / system-serial 0 12345678
fdtput -t hhx bcm2835-rpi-b.dtb / mac-address 11 22 33 44 55 66
fdtput bcm2835-rpi-b.dtb /display width 656
fdtput bcm2835-rpi-b.dtb /display height 416
fdtput bcm2835-rpi-b.dtb /display depth 16
fdtput -t x bcm2835-rpi-b.dtb /axi/dma channels 3c
fdtput -t s bcm2835-rpi-b.dtb /chosen bootargs "console=ttyAMA0,115200 debug earlyprintk root=/dev/mmcblk0p2 rootwait"

This works, including the bootargs bit at the bottom, because of the added 128 bytes of padding when I called dtc.
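This explains why 128 bytes is enough. In the structure block each property value is padded to a 4-byte boundary, so growing an existing property costs only the difference between the aligned old and new lengths. Adding a property that doesn't exist yet costs a 12-byte header (token, length, name offset) plus its aligned value, plus its name and a NUL in the strings block if that name isn't already there. The Python sketch below totals that growth. The helper names are hypothetical, and the 1-byte bootargs placeholder (an empty string, just the NUL) is an assumption about the template, not something measured from bcm2835-rpi-b.dts.

```python
def align4(n: int) -> int:
    """Round up to the 4-byte boundary used for FDT property values."""
    return (n + 3) & ~3


def grow_existing(old_len: int, new_len: int) -> int:
    """Extra bytes needed to replace an existing property's value."""
    return max(0, align4(new_len) - align4(old_len))


def grow_new(name: str, value_len: int, name_in_strings: bool = False) -> int:
    """Extra bytes needed to add a new property (12-byte header + value + name)."""
    extra = 12 + align4(value_len)
    if not name_in_strings:
        extra += len(name) + 1  # name plus NUL; the strings block is unaligned
    return extra


bootargs = "console=ttyAMA0,115200 debug earlyprintk root=/dev/mmcblk0p2 rootwait"
# Assumption: the template has bootargs = "" (1 byte: just the NUL).
needed = grow_existing(1, len(bootargs) + 1)
print(needed)  # 68
```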

@lp0
lp0 commented May 27, 2012

It works ok for me with the padding:
$ fdtput -t s arch/arm/boot/bcm2835-rpi-b.dtb /chosen bootargs "test"

Here's a sample .dtb with 4KB of padding: http://s85.org/EkRirRIb:raw

@popcornmix

Okay, I'll look more closely at how fdtput does it.

@lp0
lp0 commented Jun 2, 2012

The memory value should be changed to reserve the GPU's memory area instead of pretending it doesn't exist:

On 01/06/12 02:47, Stephen Warren wrote:

On 05/30/2012 07:25 PM, Stephen Warren wrote:

On 05/30/2012 05:12 AM, Simon Arlott wrote:

On Wed, May 30, 2012 04:45, Stephen Warren wrote:
...
These are not required. If not using the future bootloader, fdtput can be
used to set up those values. Memory is not fixed (256MB below is invalid as
the GPU requires some memory).

One thing I forgot to mention on the memory front. It's typical for the
memory property to represent the entire physical RAM on the board, even
if something else is using parts of it, so Linux can't touch it. You can
prevent Linux from using parts of the memory with the /memreserve/
syntax in the .dts file, which the bootloader should be able to add or
adjust as appropriate based on system configuration.

@lp0
lp0 commented Jun 3, 2012

libfdt has functions to delete and add reserved memory but fdtput has no functionality to use them.

128MB:
(echo -ne "\x00\x00\x00\x00\x08\x00\x00\x00"; echo -ne "\x00\x00\x00\x00\x08\x00\x00\x00") | \
dd conv=notrunc bs=8 count=2 seek=5 of=arch/arm/boot/bcm2835-rpi-b.dtb

192MB:
(echo -ne "\x00\x00\x00\x00\x0c\x00\x00\x00"; echo -ne "\x00\x00\x00\x00\x04\x00\x00\x00") | \
dd conv=notrunc bs=8 count=2 seek=5 of=arch/arm/boot/bcm2835-rpi-b.dtb

224MB:
(echo -ne "\x00\x00\x00\x00\x0e\x00\x00\x00"; echo -ne "\x00\x00\x00\x00\x02\x00\x00\x00") | \
dd conv=notrunc bs=8 count=2 seek=5 of=arch/arm/boot/bcm2835-rpi-b.dtb
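Each reserve-map entry is a pair of big-endian 64-bit values (address, size), and an entry with size 0 ends the map, which is why each dd command writes exactly 16 bytes. The hard-coded seek=5 (5 × 8 = byte 40) relies on dtc placing the map right after the 40-byte header and on the template already containing one non-zero /memreserve/ placeholder. Overwriting the terminator instead would corrupt the blob. A Python sketch that reads off_mem_rsvmap from the header rather than assuming 40 and refuses to overwrite the terminator. `rsv_entry` and `patch_first_memreserve` are hypothetical helpers.

```python
import struct


def rsv_entry(address: int, size: int) -> bytes:
    """One FDT memory-reservation entry: big-endian u64 address, u64 size."""
    return struct.pack(">QQ", address, size)


def patch_first_memreserve(blob: bytearray, address: int, size: int) -> None:
    """Overwrite the first /memreserve/ entry in place.

    Assumes the .dts template already has a /memreserve/ placeholder with a
    non-zero size, so the first entry is not the map terminator.
    """
    off = struct.unpack_from(">I", blob, 16)[0]  # off_mem_rsvmap header field
    _, old_size = struct.unpack_from(">QQ", blob, off)
    if old_size == 0:
        raise ValueError("first entry is the map terminator; add a placeholder")
    blob[off:off + 16] = rsv_entry(address, size)


# VideoCore in the top 128MB of a 256MB board (same bytes as the dd example)
print(rsv_entry(0x08000000, 0x08000000).hex())
```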

@lp0
lp0 commented Jun 6, 2012

Could you explain the purpose of the DMA channels mask?

Channels 11 and 12 generate unexpected interrupts constantly.
Channels 13 and 14 haven't got interrupts.
Channel 15 never generates any interrupts (and it has an AXI ID of 0).

If I try to use channels 0-10 then 0-1 and 6 don't generate any interrupts but 2-5 and 7-10 work.
If I try to use channels 2-5 and 7-10 then only 2-5 work and 7-10 don't generate any interrupts.

@lp0
lp0 commented Jun 6, 2012

I've updated the location and name of the /system attributes, and the name of the /display and /axi/dma attributes to include "broadcom," prefixes.

@lp0
lp0 commented Jun 7, 2012

Channel 11 generates interrupts as normal
Channels 12-14 all generate interrupts on the IRQ for channels 11 and 12
Channel 15 isn't enabled and can't be enabled

Unless I try to use channel 6 (which never works), channels 7-10 don't work... channels 2-5 and 11-14 always work.

@popcornmix

Some info first. DMA is shared with GPU. This is typical usage:

DMA1: PWM audio
DMA2: ARM SDCARD
DMA6: GPU SDCARD
DMA15: GPU dma_memcpy

Once the ARM boots, DMA6 won't be used again by GPU (although I haven't tested what happens if interrupts go off - possibly something bad).

The DMA mask is the channels the GPU guarantees not to use. So 0x3c means GPU won't use 2,3,4,5.

So 2, 3, 4 and 5 should definitely work. Any other channel is not designed to be used by the ARM, and possibly the GPU will also respond to its interrupts, making behaviour undefined.

However as the GPU is never going to use more than a few DMA channels, it might be worth increasing the DMA mask to allow the ARM to make use of the channels, and see if the extra channels start working.
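Each set bit in the mask is a channel number the GPU promises not to touch, so 0x3c = 0b111100 selects channels 2 to 5. A small Python sketch for converting between the two forms, for example when widening the mask as suggested. The 16-channel range reflects channels 0 to 15 discussed above. The helper names are hypothetical.

```python
def channels_from_mask(mask: int) -> list:
    """List the DMA channel numbers whose bits are set in the mask."""
    return [ch for ch in range(16) if mask & (1 << ch)]


def mask_from_channels(channels) -> int:
    """Build a broadcom,channels mask from a list of channel numbers."""
    mask = 0
    for ch in channels:
        if not 0 <= ch < 16:
            raise ValueError(f"channel {ch} out of range")
        mask |= 1 << ch
    return mask


print(hex(mask_from_channels([2, 3, 4, 5])))  # 0x3c
print(channels_from_mask(0x3C))               # [2, 3, 4, 5]
```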

As background, here's some info I discovered for teh-orph about burst sizes and lockups:

"If you do a big read burst then the DMA will ask for the required number
of beats but only consume two 128-bit words. The DMA will then stall its
read bus, write out the data and then consume some more.
This runs the risk of a system wide lockup where the stalled read is
preventing the DMA write from completing due to some circular
dependency somewhere in the AXI system.

To make this safe, DMA0 and the VPU DMA (DMA15) have an 8 deep FIFO fitted to
the read data path. This absorbs the extra read words and the read
bus then completes and becomes ready. The DMA then sucks the data out
of the fifo as it needs it.

You should be able to do read bursts of 9 with DMA0. 8 beats in the
fifo and 1 gets eaten by the DMA. 10 might also make it if the data
is aligned, else it gets stuck in the DMA arbiter which may stall all
the other channels until it clears.

However if he is managing 5 on a dma with no fifo then I assume that 2
in the DMA, and 1 in the DMA arbiter, and then 1 in the system
arbiter, 1 in the L2 or sdram arbiter and then you have a possible
system lock up so 6 would make sense."

@lp0
lp0 commented Jun 8, 2012

Could the GPU use one of the Lite channels for audio? Presumably this is related to audio/video playback by the GPU as the audio driver could do DMA itself.

It appears that the GPU is doing something odd with DMA6 causing DMA7-10 to stop working.

@lp0
lp0 commented Jun 10, 2012

Added /axi/sdhci/clock-frequency

@lp0
lp0 commented Jun 16, 2012

Added /axi/uart0/clock-frequency

The kernel has a subsystem to manage clock frequencies and enable/disable clocks so an interface to do that would be better than having to pass it through device tree.

@lp0
lp0 commented Sep 8, 2012

It needs to work with "-p 4096" given to dtc (this is in my branch), instead of a command line with lots of spaces.

@popcornmix

I'll see what I can do.

@lp0
lp0 commented Sep 9, 2012

It's also not clear if the reserved memory/dma channels/sdhci clock frequency has been added or not (I haven't tried using the latest firmware yet).

@popcornmix

These appear to be currently supported:
"/memory/reg";
"/chosen/bootargs";
"/system-rev";
"/system-serial-low";
"/axi/usb/hub/ethernet/mac-address";
"/axi/dma/channels";
"/display/width";
"/display/height";
"/display/depth";

The clocks are available through the mailbox interface, but I can trivially add them here too if useful.

@lp0
lp0 commented Sep 9, 2012

/system-rev should now be /system/linux,revision
/system-serial-low should now be /system/linux,serial (u32 high followed by u32 low)
/axi/dma/channels should be /axi/dma/broadcom,channels
/display/width should be /display/broadcom,width
/display/height should be /display/broadcom,height
/display/depth should be /display/broadcom,depth

It would be useful if the sdhci and uart0 clocks were in the device tree too - I still need to write the Linux implementation for the mailbox clock interface.

@popcornmix

I've updated to these field names:

"/memory/reg";
"/chosen/bootargs";
"/system/linux,revision";
"/system/linux,serial";
"/axi/usb/hub/ethernet/mac-address";
"/axi/dma/broadcom,channels";
"/display/broadcom,width";
"/display/broadcom,height";
"/display/broadcom,depth";
"/axi/sdhci/clock-frequency";
"/axi/uart0/clock-frequency";

https://dl.dropbox.com/u/3669512/temp/start_devtree.elf

Does the example dtb have the updated field names?

@lp0
lp0 commented Sep 11, 2012

Sample dtb: http://s85.org/45uN9REW:raw
Modified dtb: http://s85.org/o1e1rXmO:raw
It's not booting beyond uncompressing the kernel

@lp0
lp0 commented Sep 11, 2012

It boots once I add an appropriate cmdline.txt, but it's missing the dma channels mask:

[   12.506201] calling  bcm2708_dma_driver_init+0x0/0xc @ 1
[   12.511623] bus: 'platform': add driver bcm2708_dma
[   12.516999] bus: 'platform': driver_probe_device: matched device 20007000.dma with driver bcm2708_dma
[   12.526360] bus: 'platform': really_probe: probing driver bcm2708_dma with device 20007000.dma
[   12.537637] bcm2708_dma 20007000.dma: no usable channels
[   12.544248] bcm2708_dma: probe of 20007000.dma rejects match -6
[   12.554396] initcall bcm2708_dma_driver_init+0x0/0xc returned 0 after 41747 usecs
@popcornmix

I can't see what's wrong with the dma channels. If I run with your "modified dtb", and dump the dt after modifying it, I get:

MESS: 00:00:00.808067:0:          dma {
MESS: 00:00:00.808696:0:              compatible = [62 72 6f 61 64 63 6f 6d 2c 62 63 6d 32 38 33 35 2d 64 6d 61 00 62 72 6f 61 64 63 6f 6d 2c 62 63 6d 32 37 30 38 2d 64 6d 61 00];
MESS: 00:00:00.808779:0:              reg = <0x00007000 0x00001000 0x00e05000 0x00001000>;
MESS: 00:00:00.809730:0:              interrupts = <0x00000001 0x00000010 0x00000001 0x00000011 0x00000001 0x00000012 0x00000001 0x00000013 0x00000001 0x00000014 0x00000001 0x00000015 0x00000001 0x00000016 0x00000001 0x00000017 0x00000001 0x00000018 0x00000001 0x00000019 0x00000001 0x0000001a 0x00000001 0x0000001b 0x00000001 0x0000001b 0x00000001 0x0000001b 0x00000001 0x0000001b 0x00000001 0x0000001f>;
MESS: 00:00:00.809779:0:              broadcom,channels = <0x0000003c>;
MESS: 00:00:00.809794:0:          };
@lp0
lp0 commented Sep 12, 2012

Could you run "sample dtb" through your bootloader and send me the resulting dtb file?

I'm not sure I really ran it with your start.elf as that doesn't boot when I try it again.

@lp0
lp0 commented Sep 12, 2012

Forcing the kernel command line shows that you're removing the timer somehow:

Uncompressing Linux... done, booting the kernel.
[    0.000000] On node 0 totalpages: 49152
[    0.000000] free_area_init_node: node 0, pgdat c104e40c, node_mem_map c158f000
[    0.000000]   Normal zone: 384 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 48768 pages, LIFO batch:15
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 48768
[    0.000000] Kernel command line: console=ttyAMA0,115200 debug earlyprintk initcall_debug=1 sysrq_always_enabled dmatest.test_buf_size=131072 dmatest.threads_per_chan=1 dmatest.timeout=30000
[    0.000000] sysrq: sysrq always enabled.
[    0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Memory: 192MB = 192MB total
[    0.000000] Memory: 156328k/156328k available, 40280k reserved, 0K highmem
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xcc800000 - 0xff000000   ( 808 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xcc000000   ( 192 MB)
[    0.000000]       .text : 0xc0008000 - 0xc0310660   (3106 kB)
[    0.000000]       .init : 0xc0311000 - 0xc102e000   (13428 kB)
[    0.000000]       .data : 0xc102e000 - 0xc104f060   ( 133 kB)
[    0.000000]        .bss : 0xc104f084 - 0xc158ed38   (5376 kB)
[    0.000000] SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS:16 nr_irqs:16 16
[    0.000000] Kernel panic - not syncing: can't find system timer
[    0.000000] 
[    0.000000] [<c000deec>] (unwind_backtrace+0x0/0xe0) from [<c026b334>] (panic+0x80/0x1d0)
[    0.000000] [<c026b334>] (panic+0x80/0x1d0) from [<c03169cc>] (bcm2708_time_init+0x210/0x26c)
[    0.000000] [<c03169cc>] (bcm2708_time_init+0x210/0x26c) from [<c031650c>] (bcm2708_timer_init+0xb0/0xc4)
[    0.000000] [<c031650c>] (bcm2708_timer_init+0xb0/0xc4) from [<c03141c4>] (time_init+0x20/0x30)
[    0.000000] [<c03141c4>] (time_init+0x20/0x30) from [<c0311634>] (start_kernel+0x1a8/0x2f0)
[    0.000000] [<c0311634>] (start_kernel+0x1a8/0x2f0) from [<00008040>] (0x8040)
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at kernel/lockdep.c:2585 panic+0x158/0x1d0()
[    0.000000] [<c000deec>] (unwind_backtrace+0x0/0xe0) from [<c0014a00>] (warn_slowpath_common+0x48/0x60)
[    0.000000] [<c0014a00>] (warn_slowpath_common+0x48/0x60) from [<c0014ad0>] (warn_slowpath_null+0x18/0x1c)
[    0.000000] [<c0014ad0>] (warn_slowpath_null+0x18/0x1c) from [<c026b40c>] (panic+0x158/0x1d0)
[    0.000000] [<c026b40c>] (panic+0x158/0x1d0) from [<c03169cc>] (bcm2708_time_init+0x210/0x26c)
[    0.000000] [<c03169cc>] (bcm2708_time_init+0x210/0x26c) from [<c031650c>] (bcm2708_timer_init+0xb0/0xc4)
[    0.000000] [<c031650c>] (bcm2708_timer_init+0xb0/0xc4) from [<c03141c4>] (time_init+0x20/0x30)
[    0.000000] [<c03141c4>] (time_init+0x20/0x30) from [<c0311634>] (start_kernel+0x1a8/0x2f0)
[    0.000000] [<c0311634>] (start_kernel+0x1a8/0x2f0) from [<00008040>] (0x8040)
[    0.000000] ---[ end trace 1b75b31a2719ed1c ]---

The memory should always be 256MB with memreserve configured to cover the memory used by the VideoCore.

@popcornmix

This is my modded form of your sample dtb
https://dl.dropbox.com/u/3669512/temp/devtree.dat

The latest official firmware has the recent devtree mods in, so you could try this start.elf:
https://github.com/raspberrypi/firmware/blob/master/boot/start.elf

@lp0
lp0 commented Sep 12, 2012

The revision has the wrong endianness:

    system {
-       linux,revision = <0x0>;
-       linux,serial = <0x0 0x0>;
+       linux,revision = <0x1000000>;
+       linux,serial = <0x0 0xe>;
    };

Both clock-frequency values are being set to 0.

Is your serial definitely 14?

-                   mac-address = [00 00 00 00 00 00];
+                   mac-address = [b8 27 eb 00 00 0e];

I don't know why the kernel isn't finding the timer device. It works ok when the unmodified .dtb is put in place by the bootloader.

@lp0
lp0 commented Sep 12, 2012

I think this may be the problem:

-       bootargs = [00];
+       bootargs = [64];

The only change here is 0x00 to 0x64 and there's no command line string in the .dtb:

@@ -16 +16 @@
-0000240 00 00 00 01 00 00 00 2c 00 00 00 00 00 00 00 02
+0000240 00 00 00 01 00 00 00 2c 64 00 00 00 00 00 00 02

If I change that back to 00 it then boots a bit further, but fails because of:

[    0.000000] Memory: 384MB = 384MB total
[    0.000000] Memory: 220092k/220092k available, 173124k reserved, 0K highmem

You're specifying 384MB of RAM instead of 256MB:

    memory {
        device_type = "memory";
-       reg = <0x0 0x10000000>;
+       reg = <0x0 0x18000000>;
    };
@lp0 lp0 closed this Sep 12, 2012
@lp0 lp0 reopened this Sep 12, 2012
@popcornmix

Yes my serial is 14.
The revision is correct. My board's not been through factory programming so doesn't have a board rev. It does have the overclock/overvolt bit set (bit 24).
The clocks are my fault. It will report them if they are specified in config.txt, but I forgot to report the default. Fixed in my source tree.

(start.elf dropbox link is updated, if you are able to boot).
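The two readings are easy to confuse because 0x01000000 is both a word with only bit 24 set and the byte-swapped form of 0x00000001. A quick Python check of both interpretations. The only field layout assumed here is the one stated above: bit 24 is the overclock/overvolt flag, and this unprogrammed board has nothing in the remaining bits. The helper names are hypothetical.

```python
import struct

OVERCLOCK_BIT = 1 << 24  # overclock/overvolt flag, per the comment above


def describe_revision(word: int) -> dict:
    """Split a revision word into the bit-24 flag and the remaining bits."""
    return {
        "overclock_or_overvolt": bool(word & OVERCLOCK_BIT),
        "other_bits": word & ~OVERCLOCK_BIT & 0xFFFFFFFF,
    }


def byteswap32(word: int) -> int:
    """Reverse the byte order of a u32 (what a wrong-endian write would produce)."""
    return struct.unpack("<I", struct.pack(">I", word))[0]


print(describe_revision(0x01000000))  # {'overclock_or_overvolt': True, 'other_bits': 0}
print(hex(byteswap32(0x00000001)))    # 0x1000000
```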

@popcornmix

I'm not putting the command line in as it won't fit, and I haven't implemented the devtree decompile/compile.
I think if you pad your "/chosen/bootargs" to, say, 256 spaces, you will get a command line filled in.

The start.elf on github/dropbox will fill in the correct memory size.

@popcornmix

Okay, I've switched to using libfdt.
I was under the impression I needed dtc, which was under GPL, and so not suitable for linking with GPU code.
But I only actually needed libfdt, which seems usable with the BSD licence.
It seems to be working, and your command line text is filled in now.
I've moved back to a 128M memory split. The last one on dropbox was 192M.

These are updated:
https://dl.dropbox.com/u/3669512/temp/start_devtree.elf
https://dl.dropbox.com/u/3669512/temp/devtree.dat

@lp0
lp0 commented Sep 13, 2012

The memory size is still wrong, it should always be 256MB:
[ 0.000000] Memory: 128MB = 128MB total

I don't see why but it's still losing the system timer for me:
[ 0.000000] Kernel panic - not syncing: can't find system timer

Perhaps the boot args need to be null terminated?

@popcornmix

If I always report 256M, how do you determine if the arm has 128M/192M/224M/240M?

@lp0
lp0 commented Sep 13, 2012

Because the memreserve value will reserve the range used by the VideoCore.
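With /memory/reg fixed at the full 256MB, the ARM's share is what remains after subtracting the reserved ranges. The bootloader only has to turn the configured split into one /memreserve/ entry covering the top of RAM. A minimal Python sketch of both directions. It assumes the VideoCore region always sits at the top of RAM, as in the dd examples above, and that reserved ranges don't overlap. The helper names are hypothetical.

```python
TOTAL_RAM = 256 * 1024 * 1024  # /memory/reg size, 0x10000000


def memreserve_for_split(arm_bytes: int, total: int = TOTAL_RAM) -> tuple:
    """Return (address, size) reserving everything above the ARM's share."""
    if not 0 < arm_bytes < total:
        raise ValueError("ARM share must leave some memory for the VideoCore")
    return arm_bytes, total - arm_bytes


def arm_usable(total: int, reserved: list) -> int:
    """Memory left for the kernel after subtracting non-overlapping reservations."""
    return total - sum(size for _, size in reserved)


for mb in (128, 192, 224, 240):
    addr, size = memreserve_for_split(mb * 1024 * 1024)
    print(mb, hex(addr), hex(size))  # e.g. 192 0xc000000 0x4000000
```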

@popcornmix

Okay, updated:
https://dl.dropbox.com/u/3669512/temp/start_devtree.elf
https://dl.dropbox.com/u/3669512/temp/devtree.dat

These should have /memory/reg and /memreserve filled in as described, plus a null at the end of the boot args.

@lp0
lp0 commented Sep 13, 2012

That sample device tree doesn't appear to be modified at all, but the start.elf is still giving me the same 128MB memory and kernel error.

@popcornmix

My mistake. The files are updated again.

@lp0
lp0 commented Sep 14, 2012

They're still the same:

--2012-09-13 23:19:35--  http://dl.dropbox.com/u/3669512/temp/start_devtree.elf
Resolving dl.dropbox.com... 174.129.218.194, 107.20.142.191, 174.129.199.91, ...
Connecting to dl.dropbox.com|174.129.218.194|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2466672 (2.4M) [application/octet-stream]
Saving to: `start_devtree.elf'

     0K ........ ........ ........ ........ .....            100% 1.06M=2.2s

2012-09-13 23:19:38 (1.06 MB/s) - `start_devtree.elf' saved [2466672/2466672]
--2012-09-14 07:36:40--  http://dl.dropbox.com/u/3669512/temp/start_devtree.elf
Resolving dl.dropbox.com... 107.20.235.144, 23.23.134.47, 50.19.106.181, ...
Connecting to dl.dropbox.com|107.20.235.144|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2466672 (2.4M) [application/octet-stream]
Saving to: `start_devtree.elf'

     0K ........ ........ ........ ........ .....            100% 1.02M=2.3s

2012-09-14 07:36:44 (1.02 MB/s) - `start_devtree.elf' saved [2466672/2466672]
@popcornmix
 wget http://dl.dropbox.com/u/3669512/temp/devtree.dat
--2012-09-14 10:54:09--  http://dl.dropbox.com/u/3669512/temp/devtree.dat
Resolving dl.dropbox.com (dl.dropbox.com)... 107.20.138.135, 107.20.207.68, 107.20.235.144, ...
Connecting to dl.dropbox.com (dl.dropbox.com)|107.20.138.135|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12279 (12K) [application/x-ns-proxy-autoconfig]
Saving to: `devtree.dat'

100%[==============================================================================================>] 12,279      65.8K/s   in 0.2s

2012-09-14 10:54:09 (65.8 KB/s) - `devtree.dat' saved [12279/12279]
fdtdump.exe devtree.dat > /cygdrive/c/Users/dc4/Dropbox/Public/temp/devtree.txt

And this file:
https://dl.dropbox.com/u/3669512/temp/devtree.txt
contains:

    memreserve = <0x08000000 0x08000000>;
    chosen {
        bootargs = "dwc_otg.lpm_enable=0 dwc_otg.nak_holdoff_enable=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 nfsroot=10.177.65.90:/home/dc4/rootfs ip=dhcp rootwait";
    };
    memory {
        device_type = "memory";
        reg = <0x00000000 0x10000000>;
    };
@lp0
lp0 commented Sep 14, 2012

Ok, but the start.elf appears to be the same as it still reports the memory as 128MB.

$ sha1sum start_devtree.elf
3f505c4a0d4cab79cfdfc48fdb0224bf5ea817fe  start_devtree.elf
@popcornmix

Your sha1sum is correct. However, I have just downloaded that dropbox elf file, run it on my system and extracted the modified devtree file, and it is binary-identical to the modified devtree file I posted (with 128M of memory).
Are you sure you have renamed start_devtree.elf to start.elf?

@lp0
lp0 commented Sep 14, 2012

That dtb is now correct:

@@ -4,0 +5 @@
+   memreserve = <0x8000000 0x8000000>;
@@ -11 +12 @@
-       bootargs = [64 77 63 5f 6f 74 67 2e 6c 70 6d 5f 65 6e 61 62 6c 65 3d 30 20 64 77 63 5f 6f 74 67 2e 66 69 71 5f 66 69 78 5f 65 6e 61 62 6c 65 3d 31 20 63 6f 6e 73 6f 6c 65 3d 74 74 79 41 4d 41 30 2c 31 31 35 32 30 30 20 6b 67 64 62 6f 63 3d 74 74 79 41 4d 41 30 2c 31 31 35 32 30 30 20 63 6f 6e 73 6f 6c 65 3d 74 74 79 31 20 6e 66 73 72 6f 6f 74 3d 31 30 2e 31 37 37 2e 36 35 2e 39 30 3a 2f 68 6f 6d 65 2f 64 63 34 2f 72 6f 6f 74 66 73 20 69 70 3d 64 68 63 70 20 72 6f 6f 74 77 61 69 74];
+       bootargs = "dwc_otg.lpm_enable=0 dwc_otg.nak_holdoff_enable=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 nfsroot=10.177.65.90:/home/dc4/rootfs ip=dhcp rootwait";
@@ -19 +20 @@
-       reg = <0x0 0x8000000>;
+       reg = <0x0 0x10000000>;
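The old bootargs value in this diff is printed by fdtdump as a byte list rather than a quoted string. Decoding it shows a similar command line, but with dwc_otg.fiq_fix_enable=1 where the new one has dwc_otg.nak_holdoff_enable=1, and the list does not end in a 00 byte. Device tree string properties are NUL-terminated, and my understanding is that fdtdump only prints a property as a string when it ends in NUL, so a missing terminator is a plausible (unconfirmed) reason for the byte-list output. A minimal C sketch of the decoding; decode_hex_bytes() and old_bootargs_hex are hypothetical helpers, not part of dtc or fdtdump:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* The old bootargs bytes, copied from the fdtdump diff above. */
static const char old_bootargs_hex[] =
    "64 77 63 5f 6f 74 67 2e 6c 70 6d 5f 65 6e 61 62 6c 65 3d 30 20 "
    "64 77 63 5f 6f 74 67 2e 66 69 71 5f 66 69 78 5f 65 6e 61 62 6c 65 3d 31 20 "
    "63 6f 6e 73 6f 6c 65 3d 74 74 79 41 4d 41 30 2c 31 31 35 32 30 30 20 "
    "6b 67 64 62 6f 63 3d 74 74 79 41 4d 41 30 2c 31 31 35 32 30 30 20 "
    "63 6f 6e 73 6f 6c 65 3d 74 74 79 31 20 "
    "6e 66 73 72 6f 6f 74 3d 31 30 2e 31 37 37 2e 36 35 2e 39 30 3a "
    "2f 68 6f 6d 65 2f 64 63 34 2f 72 6f 6f 74 66 73 20 "
    "69 70 3d 64 68 63 70 20 72 6f 6f 74 77 61 69 74";

/* Decode whitespace-separated hex byte values (as printed inside [ ... ]
 * by fdtdump) into `out`. Returns the number of bytes decoded, or -1 if
 * a token is not a valid byte or `out` is too small. */
static int decode_hex_bytes(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    const char *p = text;

    for (;;) {
        char *end;
        unsigned long v;

        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        v = strtoul(p, &end, 16);
        if (end == p || v > 0xff || n >= cap)
            return -1;
        out[n++] = (unsigned char)v;
        p = end;
    }
    return (int)n;
}
```

After decoding, checking whether the last byte is 0 tells you whether the property was stored as a proper device tree string.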

I have your latest start.elf:

simon@redrum /media/disk $ ls -l
total 18076
-rw-r--r-- 1 simon root    12279 Sep 13 22:37 bcm2835.dtb
-rw-r--r-- 1 simon root    16528 Jun 15 21:15 bootcode.bin
-rw-r--r-- 1 simon root      165 Sep 13 22:36 cmdline.txt
-rw-r--r-- 1 simon root      171 Jun 15 21:18 config.txt
-rw-r--r-- 1 simon root 11181344 Sep 13 22:38 kernel.img
-rw-r--r-- 1 simon root   314691 Jun 15 21:15 loader.bin
-rw-r--r-- 1 simon root  2466672 Sep 14 17:26 start_devtree.elf
-rw-r--r-- 1 simon root  2466672 Sep 14 07:35 start.elf
-rw-r--r-- 1 simon root  2040328 Jun 15 21:15 start_normal.elf
simon@redrum /media/disk $ sha1sum start.elf 
3f505c4a0d4cab79cfdfc48fdb0224bf5ea817fe  start.elf

However, I'm still getting the old behaviour:

Uncompressing Linux... done, booting the kernel.
[    0.000000] On node 0 totalpages: 32768
[    0.000000] free_area_init_node: node 0, pgdat c104e40c, node_mem_map c158f000
[    0.000000]   Normal zone: 256 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 32512 pages, LIFO batch:7
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
[    0.000000] Kernel command line: console=ttyAMA0,115200 debug earlyprintk initcall_debug=1 sysrq_always_enabled dmatest.test_buf_size=131072 dmatest.threads_per_chan=1 dmatest.timeout=30000
[    0.000000] sysrq: sysrq always enabled.
[    0.000000] PID hash table entries: 512 (order: -1, 2048 bytes)
[    0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
[    0.000000] Memory: 128MB = 128MB total
[    0.000000] Memory: 91408k/91408k available, 39664k reserved, 0K highmem
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xc8800000 - 0xff000000   ( 872 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xc8000000   ( 128 MB)
[    0.000000]       .text : 0xc0008000 - 0xc0310660   (3106 kB)
[    0.000000]       .init : 0xc0311000 - 0xc102e000   (13428 kB)
[    0.000000]       .data : 0xc102e000 - 0xc104f060   ( 133 kB)
[    0.000000]        .bss : 0xc104f084 - 0xc158ed38   (5376 kB)
[    0.000000] SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS:16 nr_irqs:16 16
[    0.000000] Kernel panic - not syncing: can't find system timer
[    0.000000] 
[    0.000000] [<c000deec>] (unwind_backtrace+0x0/0xe0) from [<c026b334>] (panic+0x80/0x1d0)
[    0.000000] [<c026b334>] (panic+0x80/0x1d0) from [<c03169cc>] (bcm2708_time_init+0x210/0x26c)
[    0.000000] [<c03169cc>] (bcm2708_time_init+0x210/0x26c) from [<c031650c>] (bcm2708_timer_init+0xb0/0xc4)
[    0.000000] [<c031650c>] (bcm2708_timer_init+0xb0/0xc4) from [<c03141c4>] (time_init+0x20/0x30)
[    0.000000] [<c03141c4>] (time_init+0x20/0x30) from [<c0311634>] (start_kernel+0x1a8/0x2f0)
[    0.000000] [<c0311634>] (start_kernel+0x1a8/0x2f0) from [<00008040>] (0x8040)
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at kernel/lockdep.c:2585 panic+0x158/0x1d0()
[    0.000000] [<c000deec>] (unwind_backtrace+0x0/0xe0) from [<c0014a00>] (warn_slowpath_common+0x48/0x60)
[    0.000000] [<c0014a00>] (warn_slowpath_common+0x48/0x60) from [<c0014ad0>] (warn_slowpath_null+0x18/0x1c)
[    0.000000] [<c0014ad0>] (warn_slowpath_null+0x18/0x1c) from [<c026b40c>] (panic+0x158/0x1d0)
[    0.000000] [<c026b40c>] (panic+0x158/0x1d0) from [<c03169cc>] (bcm2708_time_init+0x210/0x26c)
[    0.000000] [<c03169cc>] (bcm2708_time_init+0x210/0x26c) from [<c031650c>] (bcm2708_timer_init+0xb0/0xc4)
[    0.000000] [<c031650c>] (bcm2708_timer_init+0xb0/0xc4) from [<c03141c4>] (time_init+0x20/0x30)
[    0.000000] [<c03141c4>] (time_init+0x20/0x30) from [<c0311634>] (start_kernel+0x1a8/0x2f0)
[    0.000000] [<c0311634>] (start_kernel+0x1a8/0x2f0) from [<00008040>] (0x8040)
[    0.000000] ---[ end trace 1b75b31a2719ed1c ]---
@popcornmix

Can you send me a kernel that has the problem?

@lp0
lp0 commented Sep 18, 2012

http://redrum.lp0.eu/rpi-zImage

This currently has the following forced command line: "console=ttyAMA0,115200 debug earlyprintk initcall_debug=1 sysrq_always_enabled dmatest.test_buf_size=131072 dmatest.threads_per_chan=1 dmatest.timeout=30000"

@popcornmix

And can you confirm what you want displayed here:
[ 0.000000] Memory: 128MB = 128MB total

if there is a 192M split for ARM (64M for GPU) and a total of 256M?

@lp0
lp0 commented Sep 18, 2012

It should look like this:

[    0.000000] Memory: 256MB = 256MB total
[    0.000000] Memory: 155784k/155784k available, 106360k reserved, 0K highmem
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xd0800000 - 0xff000000   ( 744 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xd0000000   ( 256 MB)
[    0.000000]       .text : 0xc0008000 - 0xc0310660   (3106 kB)
[    0.000000]       .init : 0xc0311000 - 0xc102e000   (13428 kB)
[    0.000000]       .data : 0xc102e000 - 0xc104f060   ( 133 kB)
[    0.000000]        .bss : 0xc104f084 - 0xc158ed38   (5376 kB)
# head /sys/kernel/debug/memblock/*
==> /sys/kernel/debug/memblock/memory <==
   0: 0x00000000..0x0fffffff

==> /sys/kernel/debug/memblock/reserved <==
   0: 0x00000100..0x000030f6
   1: 0x00004000..0x00007fff
   2: 0x00008200..0x0158ed37
   3: 0x0aff1000..0x0affefff
   4: 0x0affffc0..0x0fffffff
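The memblock debugfs files list inclusive ranges (base..base+size-1), so each region's size is hi - lo + 1. As a worked check, memory region 0 comes out at exactly 0x10000000 bytes (256 MiB), and reserved region 0 (0x00000100..0x000030f6) at 12279 bytes, the same size as bcm2835.dtb and devtree.dat earlier in the thread. That fits the DTB having been loaded at 0x100, although nobody in the thread states this. A small C helper for the arithmetic; range_size() and bytes_to_mib() are hypothetical names:

```c
#include <stdint.h>

/* memblock prints each region as an inclusive "lo..hi" range, so the
 * size is hi - lo + 1. 64-bit arithmetic avoids wrapping to 0 for a
 * range that covers the whole 32-bit address space. */
static uint64_t range_size(uint64_t lo, uint64_t hi)
{
    return hi - lo + 1;
}

/* Convert a byte count to whole MiB. */
static uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}
```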
@popcornmix

I'm a bit confused. Your kernel doesn't seem to use the device tree for this:
[ 0.000000] Memory: 128MB = 128MB total
That value comes from the ATAGS. Was that intended?
Can you confirm whether you are running with:
disable_cmdline_tags=1
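For context on the ATAGS question: as I understand the ARM boot code of this period, the kernel treats the pointer it receives in r2 as a device tree only if the blob starts with the FDT magic 0xd00dfeed (stored big-endian), and otherwise falls back to parsing it as an ATAG list. Below is a userspace C sketch of that check, plus a walk of the /memreserve/ block (big-endian u64 address/size pairs ending in a 0/0 entry, located via the header's off_mem_rsvmap field at byte offset 16). looks_like_fdt() and for_each_memreserve() are hypothetical names, not kernel or libfdt APIs, and the walk assumes looks_like_fdt() has already accepted the buffer:

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC       0xd00dfeedu
#define FDT_HEADER_SIZE 40u   /* v17 header: ten big-endian u32 fields */

/* FDT headers and the reservation block are always big-endian. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const unsigned char *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* 1 if `buf` starts with an FDT header whose totalsize fits in `len`,
 * 0 otherwise (for example, an ATAG list). */
static int looks_like_fdt(const unsigned char *buf, size_t len)
{
    if (len < FDT_HEADER_SIZE || be32(buf) != FDT_MAGIC)
        return 0;
    return be32(buf + 4) <= len;   /* totalsize */
}

/* Call `fn` for every memory reservation entry. Returns the number of
 * entries, or -1 if the block runs past the end of the buffer. */
static int for_each_memreserve(const unsigned char *buf, size_t len,
                               void (*fn)(uint64_t addr, uint64_t size, void *ctx),
                               void *ctx)
{
    size_t off;
    int count = 0;

    if (len < FDT_HEADER_SIZE)
        return -1;
    off = be32(buf + 16);          /* off_mem_rsvmap */
    for (;;) {
        uint64_t addr, size;

        if (off + 16 > len)
            return -1;
        addr = be64(buf + off);
        size = be64(buf + off + 8);
        if (addr == 0 && size == 0)   /* terminating entry */
            return count;
        if (fn)
            fn(addr, size, ctx);
        count++;
        off += 16;
    }
}
```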

@lp0
lp0 commented Sep 19, 2012

I have "disable_commandline_tags=1" but it still says 128MB even after changing that to "disable_cmdline_tags=1".

@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
Luck, Tony x86: Remove some noise from boot log when starting cpus
Printing the "start_ip" for every secondary cpu is very noisy on a large
system - and doesn't add any value. Drop this message.

Console log before:
Booting Node   0, Processors  #1
smpboot cpu 1: start_ip = 96000
 #2
smpboot cpu 2: start_ip = 96000
 #3
smpboot cpu 3: start_ip = 96000
 #4
smpboot cpu 4: start_ip = 96000
       ...
 #31
smpboot cpu 31: start_ip = 96000
Brought up 32 CPUs

Console log after:
Booting Node   0, Processors  #1 #2 #3 #4 #5 #6 #7 Ok.
Booting Node   1, Processors  #8 #9 #10 #11 #12 #13 #14 #15 Ok.
Booting Node   0, Processors  #16 #17 #18 #19 #20 #21 #22 #23 Ok.
Booting Node   1, Processors  #24 #25 #26 #27 #28 #29 #30 #31
Brought up 32 CPUs

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
140f190
@popcornmix popcornmix pushed a commit that referenced this issue Oct 13, 2012
Jeff Layton nfs: skip commit in releasepage if we're freeing memory for fs-relate…
…d reasons

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.
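The fix is a flag check: the task that allocates the socket marks itself, and the page-release path refuses to start a blocking commit when it sees the mark. The C model below only illustrates that control flow; MODEL_PF_FSTRANS, struct model_task and the helper functions are hypothetical stand-ins, not kernel code, and saving/restoring the previous flag value is a design choice of this sketch:

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for PF_FSTRANS. */
#define MODEL_PF_FSTRANS 0x1u

struct model_task {
    unsigned int flags;
};

/* Model of the decision in the release-page path: only attempt a
 * blocking commit if the task is not already doing fs/transport work. */
static bool release_page_may_commit(const struct model_task *task)
{
    return !(task->flags & MODEL_PF_FSTRANS);
}

/* Mark the task before an allocation that may enter reclaim.
 * Returns the previous flags so the caller can restore them. */
static unsigned int begin_fs_transaction(struct model_task *task)
{
    unsigned int old = task->flags;

    task->flags |= MODEL_PF_FSTRANS;
    return old;
}

/* Restore only the flag bit to its previous state, so nested use works. */
static void end_fs_transaction(struct model_task *task, unsigned int old)
{
    task->flags = (task->flags & ~MODEL_PF_FSTRANS) | (old & MODEL_PF_FSTRANS);
}
```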

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
5cf02d0
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Dec 24, 2012
Eric Dumazet ipv4: arp: fix a lockdep splat in arp_solicit()
Yan Burman reported following lockdep warning :

=============================================
[ INFO: possible recursive locking detected ]
3.7.0+ #24 Not tainted
---------------------------------------------
swapper/1/0 is trying to acquire lock:
  (&n->lock){++--..}, at: [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0

but task is already holding lock:
  (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&n->lock);
   lock(&n->lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

4 locks held by swapper/1/0:
  #0:  (((&n->timer))){+.-...}, at: [<ffffffff8104b350>] call_timer_fn+0x0/0x1c0
  #1:  (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280
  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81395400>] dev_queue_xmit+0x0/0x5d0
  #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff813cb41e>] ip_finish_output+0x13e/0x640

stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.7.0+ #24
Call Trace:
  <IRQ>  [<ffffffff8108c7ac>] validate_chain+0xdcc/0x11f0
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff81120565>] ? kmem_cache_free+0xe5/0x1c0
  [<ffffffff8108d570>] __lock_acquire+0x440/0xc30
  [<ffffffff813c3570>] ? inet_getpeer+0x40/0x600
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8108ddf5>] lock_acquire+0x95/0x140
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff81448d4b>] _raw_write_lock_bh+0x3b/0x50
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0
  [<ffffffff8139f99b>] neigh_resolve_output+0x16b/0x270
  [<ffffffff813cb62d>] ip_finish_output+0x34d/0x640
  [<ffffffff813cb41e>] ? ip_finish_output+0x13e/0x640
  [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
  [<ffffffff813cb9a0>] ip_output+0x80/0xf0
  [<ffffffff813ca368>] ip_local_out+0x28/0x80
  [<ffffffffa046f25a>] vxlan_xmit+0x66a/0xbec [vxlan]
  [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
  [<ffffffff81394a50>] ? skb_gso_segment+0x2b0/0x2b0
  [<ffffffff81449355>] ? _raw_spin_unlock_irqrestore+0x65/0x80
  [<ffffffff81394c57>] ? dev_queue_xmit_nit+0x207/0x270
  [<ffffffff813950c8>] dev_hard_start_xmit+0x298/0x5d0
  [<ffffffff813956f3>] dev_queue_xmit+0x2f3/0x5d0
  [<ffffffff81395400>] ? dev_hard_start_xmit+0x5d0/0x5d0
  [<ffffffff813f5788>] arp_xmit+0x58/0x60
  [<ffffffff813f59db>] arp_send+0x3b/0x40
  [<ffffffff813f6424>] arp_solicit+0x204/0x280
  [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
  [<ffffffff8139f515>] neigh_probe+0x45/0x70
  [<ffffffff813a1c10>] neigh_timer_handler+0x1a0/0x2a0
  [<ffffffff8104b3cf>] call_timer_fn+0x7f/0x1c0
  [<ffffffff8104b350>] ? detach_if_pending+0x120/0x120
  [<ffffffff8104b748>] run_timer_softirq+0x238/0x2b0
  [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
  [<ffffffff81043e51>] __do_softirq+0x101/0x280
  [<ffffffff814518cc>] call_softirq+0x1c/0x30
  [<ffffffff81003b65>] do_softirq+0x85/0xc0
  [<ffffffff81043a7e>] irq_exit+0x9e/0xc0
  [<ffffffff810264f8>] smp_apic_timer_interrupt+0x68/0xa0
  [<ffffffff8145122f>] apic_timer_interrupt+0x6f/0x80
  <EOI>  [<ffffffff8100a054>] ? mwait_idle+0xa4/0x1c0
  [<ffffffff8100a04b>] ? mwait_idle+0x9b/0x1c0
  [<ffffffff8100a6a9>] cpu_idle+0x89/0xe0
  [<ffffffff81441127>] start_secondary+0x1b2/0x1b6

The bug is in arp_solicit(), which releases the neigh lock only after arp_send().
In the vxlan case, we eventually need to write-lock a neigh lock later.

It's a false positive, but we can get rid of it without lockdep
annotations.

We can instead use neigh_ha_snapshot() helper.
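A toy single-threaded C model of the pattern: holding a non-recursive lock while calling into a transmit path that takes a lock of the same class gets reported as recursive locking, whereas copying the hardware address under the lock, dropping it, and then sending does not. In the real case the second lock belongs to a different neighbour entry of the same lock class (hence the false positive), and neigh_ha_snapshot() copies the address under the neighbour's ha_lock seqlock rather than a plain lock; this sketch collapses both into one hypothetical toy lock for illustration:

```c
#include <stdbool.h>
#include <string.h>

/* A toy non-recursive lock: acquiring it while held models the
 * "possible recursive locking detected" report. */
struct toy_lock { bool held; };

static int toy_acquire(struct toy_lock *l)
{
    if (l->held)
        return -1;              /* would self-deadlock */
    l->held = true;
    return 0;
}

static void toy_release(struct toy_lock *l) { l->held = false; }

struct toy_neigh {
    struct toy_lock lock;
    unsigned char ha[6];        /* hardware (MAC) address */
};

/* Stand-in for the vxlan transmit path, which needs the lock again.
 * Returns -1 if that would deadlock. */
static int toy_xmit(struct toy_neigh *n, const unsigned char *dst_ha)
{
    (void)dst_ha;
    if (toy_acquire(&n->lock) != 0)
        return -1;
    toy_release(&n->lock);
    return 0;
}

/* Old pattern: send while still holding the lock. */
static int solicit_holding_lock(struct toy_neigh *n)
{
    int rc;

    toy_acquire(&n->lock);
    rc = toy_xmit(n, n->ha);
    toy_release(&n->lock);
    return rc;
}

/* Fixed pattern: snapshot the address under the lock, drop it, then send. */
static int solicit_with_snapshot(struct toy_neigh *n)
{
    unsigned char ha[6];

    toy_acquire(&n->lock);
    memcpy(ha, n->ha, sizeof ha);
    toy_release(&n->lock);
    return toy_xmit(n, ha);
}
```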

Reported-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9650388
@Olipro Olipro pushed a commit to Olipro/linux-RPi that referenced this issue Dec 26, 2012
Rob Herring ARM: 7493/1: use generic unaligned.h
This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.

As Arnd Bergmann points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianness. The current ARM version however uses the
"byteshift" implementation for both.
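The two strategies can be sketched in portable C. The byteshift version assembles the value one byte at a time; the native-endian version copies through memcpy(), which, like the kernel's packed-struct variant, tells the compiler the access may be unaligned and lets it emit a single load on targets that support one, such as ARMv7. Function names are hypothetical, and the byteshift helper assumes little-endian data:

```c
#include <stdint.h>
#include <string.h>

/* "byteshift" style: build a little-endian 32-bit value from four byte
 * loads. Correct for any alignment and any host byte order. */
static uint32_t get_unaligned_le32_byteshift(const void *p)
{
    const unsigned char *b = p;

    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Native-endian load through memcpy(): the compiler may turn this into
 * a single (possibly unaligned) word load where the target allows it. */
static uint32_t get_unaligned_native32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```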

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal.  One may wonder why
r2 is wasted on the constant 0 in the second case, for example.  And all
the movs could be folded into subsequent orrs, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, #4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.  This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:

long long bar (long long *x) {return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned cases do have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly from a pre-ARMv6 compilation.  Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
d25c881
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue May 8, 2013
@htejun htejun kthread: implement probe_kthread_data()
One of the problems that arise when converting dedicated custom threadpool
to workqueue is that the shared worker pool used by workqueue anonymizes
each worker making it more difficult to identify what the worker was doing
on which target from the output of sysrq-t or debug dump from oops, BUG()
and friends.

For example, after writeback is converted to use workqueue instead of
private thread pool, there's no easy way to tell which backing device a
writeback work item was working on at the time of task dump, which,
according to our writeback brethren, is important in tracking down issues
with a lot of mounted file systems on a lot of different devices.

This patchset implements a way for a work function to mark its execution
instance so that task dump of the worker task includes information to
indicate what the work item was doing.

An example WARN dump would look like the following.

 WARNING: at fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0()
 Modules linked in:
 CPU: 0 Pid: 28 Comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
 Workqueue: writeback bdi_writeback_workfn (flush-8:16)
  ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8
  ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0
  ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08
 Call Trace:
  [<ffffffff81c61855>] dump_stack+0x19/0x1b
  [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
  ...

This patch:

Implement probe_kthread_data() which returns kthread_data if accessible.
The function is equivalent to kthread_data() except that the specified
@task may not be a kthread or its vfork_done is already cleared rendering
struct kthread inaccessible.  In the former case, probe_kthread_data() may
return any value.  In the latter, NULL.

This will be used to safely print debug information without affecting
synchronization in the normal paths.  Workqueue debug info printing on
dump_stack() and friends will make use of it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cd42d55
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue May 8, 2013
@htejun htejun writeback: set worker desc to identify writeback workers in task dumps
Writeback has been recently converted to use workqueue instead of its
private thread pool implementation.  One negative side effect of this
conversion is that there's no easy way to tell which backing device a
writeback work item was working on at the time of task dump, be it
sysrq-t, BUG, WARN or whatever, which, according to our writeback
brethren, is important in tracking down issues with a lot of mounted
file systems on a lot of different devices.

This patch restores that information using the new worker description
facility.  bdi_writeback_workfn() calls set_work_desc() to identify
which bdi it's working on.  The description is printed out together with
the workqueue name and worker function as in the following example dump.

 WARNING: at fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0()
 Modules linked in:
 Pid: 28, comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24 empty empty/S3992
 Workqueue: writeback bdi_writeback_workfn (flush-8:16)
  ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8
  ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0
  ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08
 Call Trace:
  [<ffffffff81c61855>] dump_stack+0x19/0x1b
  [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
  [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff81200144>] bdi_writeback_workfn+0x2b4/0x3c0
  [<ffffffff810b4c87>] process_one_work+0x1d7/0x660
  [<ffffffff810b5c72>] worker_thread+0x122/0x380
  [<ffffffff810bdfea>] kthread+0xea/0xf0
  [<ffffffff81c6cedc>] ret_from_fork+0x7c/0xb0

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ef3b101
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Jul 14, 2013
Cong Wang bridge: fix some kernel warning in multicast timer
Several people reported the warning: "kernel BUG at kernel/timer.c:729!"
and the stack trace is:

	#7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
	#8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
	#9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
	#10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge]
	#11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge]
	#12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc
	#13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6
	#14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad
	#15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17
	#16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68
	#17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101
	#18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8
	#19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
	#20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
	#21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1
	#22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe
	#23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f
	#24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1
	#25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292

This is because I forgot to check whether mp->timer is armed in
br_multicast_del_pg(). This bug was introduced by
commit 9f00b2e (bridge: only expire the mdb entry
when query is received).

Same for __br_mdb_del().

Tested-by: poma <pomidorabelisima@gmail.com>
Reported-by: LiYonghua <809674045@qq.com>
Reported-by: Robert Hancock <hancockrwd@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c7e8e8a
@ghollingworth

This looks a bit off topic and old now!

@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Sep 10, 2013
Andrew Morton lib-crc32-update-the-comments-of-crc32_bele_generic-checkpatch-fixes
WARNING: please, no space before tabs
#20: FILE: lib/crc32.c:135:
+ * ^I^I^ICRC32/CRC32C$

WARNING: line over 80 characters
#24: FILE: lib/crc32.c:137:
+ *	other uses, or the previous crc32/crc32c value if computing incrementally.

total: 0 errors, 2 warnings, 32 lines checked

./patches/lib-crc32-update-the-comments-of-crc32_bele_generic.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
aaac712
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
Borislav Petkov x86: Improve the printout of the SMP bootup CPU table
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
646e29a
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
Borislav Petkov x86/boot: Further compress CPUs bootup message
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.
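A sketch of a division-free digit count in that spirit, used to right-align CPU numbers to the width of the largest one. This is not the kernel's exact code: num_digits() here follows the same multiply-instead-of-divide idea, and format_cpu() is a hypothetical helper. The threshold m is a long long so the multiplication cannot overflow for any int input:

```c
#include <stdio.h>

/* Count decimal digits, plus one for a minus sign, without dividing. */
static int num_digits(int val)
{
    long long v = val;
    long long m = 10;
    int d = 1;

    if (v < 0) {          /* the sign takes one column */
        d++;
        v = -v;
    }
    while (v >= m) {      /* grow the threshold instead of dividing v */
        m *= 10;
        d++;
    }
    return d;
}

/* Right-align "#cpu" to the width of "#max_cpu". */
static int format_cpu(char *buf, size_t len, int cpu, int max_cpu)
{
    int pad = num_digits(max_cpu) - num_digits(cpu);

    if (pad < 0)
        pad = 0;
    return snprintf(buf, len, "%*s#%d", pad, "", cpu);
}
```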

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
a17bce4
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
@majianpeng majianpeng md/raid5: Use conf->device_lock to protect changing of multi-thread resources.

When group_thread_cnt is changed through its sysfs entry, the kernel can oops.

The kernel messages are:
[  135.299021] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  135.299073] IP: [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[  135.299107] PGD 0
[  135.299122] Oops: 0000 [#1] SMP
[  135.299144] Modules linked in: netconsole e1000e ptp pps_core
[  135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24
[  135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[  135.299255] task: ffff8800b9638f80 ti: ffff8800b77a4000 task.ti: ffff8800b77a4000
[  135.299283] RIP: 0010:[<ffffffff815188ab>]  [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[  135.299323] RSP: 0018:ffff8800b77a5c48  EFLAGS: 00010002
[  135.299344] RAX: ffff880037bb5c70 RBX: 0000000000000000 RCX: 0000000000000008
[  135.299371] RDX: ffff880037bb5cb8 RSI: 0000000000000001 RDI: ffff880037bb5c00
[  135.299398] RBP: ffff8800b77a5d08 R08: 0000000000000001 R09: 0000000000000000
[  135.299425] R10: ffff8800b77a5c98 R11: 00000000ffffffff R12: ffff880037bb5c00
[  135.299452] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880037bb5c70
[  135.299479] FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
[  135.299510] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  135.299532] CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000000407e0
[  135.299559] Stack:
[  135.299570]  ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300
[  135.299611]  000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8
[  135.299654]  000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98
[  135.299696] Call Trace:
[  135.299711]  [<ffffffff8107383e>] ? __wake_up+0x4e/0x70
[  135.299733]  [<ffffffff81518f88>] raid5d+0x4c8/0x680
[  135.299756]  [<ffffffff817174ed>] ? schedule_timeout+0x15d/0x1f0
[  135.299781]  [<ffffffff81524c9f>] md_thread+0x11f/0x170
[  135.299804]  [<ffffffff81069cd0>] ? wake_up_bit+0x40/0x40
[  135.299826]  [<ffffffff81524b80>] ? md_rdev_init+0x110/0x110
[  135.299850]  [<ffffffff81069656>] kthread+0xc6/0xd0
[  135.299871]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  135.299899]  [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[  135.299923]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90
[  135.300005] RIP  [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[  135.300005]  RSP <ffff8800b77a5c48>
[  135.300005] CR2: 0000000000000000
[  135.300005] ---[ end trace 504854e5bb7562ed ]---
[  135.300005] Kernel panic - not syncing: Fatal exception

This is because raid5d() can be running while the multi-thread
resources are changed via sysfs, so we need to provide locking.

conf->device_lock is suitable, but we cannot simply call
alloc_thread_groups() under this lock, as we cannot allocate memory
while holding a spinlock.
So change alloc_thread_groups() to allocate and return the data
structures, then raid5_store_group_thread_cnt() can take the lock
while updating the pointers to the data structures.

This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x
stable series.

Fixes: b721420
Cc: stable@vger.kernel.org (3.12)
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
60aaf93
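The raid5 fix above follows a common kernel pattern for spinlock-protected data:

- Do anything that might sleep, such as memory allocation, before taking the lock.
- Swap the shared pointers while holding the lock.
- Free the old data after dropping the lock.

The user-space sketch below shows that pattern under stated assumptions. `toy_spinlock` is a C11 `atomic_flag` stand-in for the kernel's spinlock. `toy_conf`, `toy_alloc_groups()` and `toy_store_group_cnt()` are hypothetical names, loosely modelled on the r5conf, alloc_thread_groups() and raid5_store_group_thread_cnt() mentioned in the commit. They are not the real raid5 code.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: a toy busy-wait lock standing in for the kernel's spinlock_t. */
struct toy_spinlock {
	atomic_flag flag;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		; /* spin; code holding this lock must not sleep or allocate */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical per-group worker state. */
struct toy_group {
	int id;
};

/* Hypothetical stand-in for the raid5 conf shared with the worker thread. */
struct toy_conf {
	struct toy_spinlock device_lock;
	struct toy_group *groups;  /* protected by device_lock */
	int group_cnt;             /* protected by device_lock */
};

static void toy_conf_init(struct toy_conf *conf)
{
	atomic_flag_clear(&conf->device_lock.flag);
	conf->groups = NULL;
	conf->group_cnt = 0;
}

/* Allocate and initialise a new group array; may block, so no lock is held. */
static struct toy_group *toy_alloc_groups(int cnt)
{
	struct toy_group *g = calloc((size_t)cnt, sizeof(*g));

	if (!g)
		return NULL;
	for (int i = 0; i < cnt; i++)
		g[i].id = i;
	return g;
}

/* Allocate outside the lock, swap pointers under it, free outside it. */
static int toy_store_group_cnt(struct toy_conf *conf, int cnt)
{
	struct toy_group *new_groups = NULL, *old_groups;

	if (cnt < 0)
		return -1;
	if (cnt > 0) {
		new_groups = toy_alloc_groups(cnt);
		if (!new_groups)
			return -1;
	}

	toy_spin_lock(&conf->device_lock);
	old_groups = conf->groups;
	conf->groups = new_groups;
	conf->group_cnt = cnt;
	toy_spin_unlock(&conf->device_lock);

	/* Readers only dereference groups under device_lock, so this is safe. */
	free(old_groups);
	return 0;
}

/* Worker-side read: touch the shared pointers only while holding the lock. */
static int toy_group_count(struct toy_conf *conf)
{
	int n;

	toy_spin_lock(&conf->device_lock);
	n = conf->groups ? conf->group_cnt : 0;
	toy_spin_unlock(&conf->device_lock);
	return n;
}
```

Holding the lock only for the pointer swap keeps the critical section short. It also keeps allocation and freeing out of the non-sleeping region.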
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Dec 4, 2013
@linusw linusw net: smc91: fix crash regression on the versatile
After commit e9e4ea7
("net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data"),
the Versatile SMSC LAN91C111 crashes like this:

------------[ cut here ]------------
kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24
task: c6ccfaa0 ti: c6cd0000 task.ti: c6cd0000
PC is at smc_hardware_send_pkt+0x198/0x22c
LR is at smc_hardware_send_pkt+0x24/0x22c
pc : [<c01be324>]    lr : [<c01be1b0>]    psr: 20000013
sp : c6cd1d08  ip : 00000001  fp : 00000000
r10: c02adb08  r9 : 00000000  r8 : c6ced802
r7 : c786fba0  r6 : 00000146  r5 : c8800000  r4 : c78d6000
r3 : 0000000f  r2 : 00000146  r1 : 00000000  r0 : 00000031
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 06cf4000  DAC: 00000015
Process udhcpc (pid: 43, stack limit = 0xc6cd01c0)
Stack: (0xc6cd1d08 to 0xc6cd2000)
1d00:                   00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868
1d20: c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60
1d40: 00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000
1d60: c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000
1d80: c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000
1da0: c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000
1dc0: 00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000
1de0: 00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740
1e00: 01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff
1e20: 43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc
1e40: 00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88
1e60: c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1e80: 00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0
1ea0: 00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08
1ec0: 00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000
1ee0: 00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000
1f00: 00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002
1f20: 06000000 ffffffff 0000ffff c00928c8 c065c520 c6cd1f58 00000003 c009299c
1f40: 00000003 c065c520 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0
1f60: 00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0
1f80: 00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8
1fa0: c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000
1fc0: be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000
1fe0: 00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000
[<c01be324>] (smc_hardware_send_pkt+0x198/0x22c) from [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8)
[<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) from [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc)
[<c0208554>] (dev_hard_start_xmit+0x460/0x4cc) from [<c021d1d8>] (sch_direct_xmit+0x94/0x18c)
[<c021d1d8>] (sch_direct_xmit+0x94/0x18c) from [<c02087f8>] (dev_queue_xmit+0x238/0x42c)
[<c02087f8>] (dev_queue_xmit+0x238/0x42c) from [<c027ba74>] (packet_sendmsg+0xbe8/0xd28)
[<c027ba74>] (packet_sendmsg+0xbe8/0xd28) from [<c01f2870>] (sock_sendmsg+0x84/0xa8)
[<c01f2870>] (sock_sendmsg+0x84/0xa8) from [<c01f4628>] (SyS_sendto+0xb8/0xdc)
[<c01f4628>] (SyS_sendto+0xb8/0xdc) from [<c0013840>] (ret_fast_syscall+0x0/0x2c)
Code: e3130002 1a000001 e3130001 0affffcd (e7f001f2)
---[ end trace 81104fe70e8da7fe ]---
Kernel panic - not syncing: Fatal exception in interrupt

This is because the macro operations defined in smc91x.h
for Versatile are missing SMC_outsw(), which that commit
started using.

The Versatile needs the same accessors as the other
platforms in the first if(...) clause, so just switch it to
using that clause and we have one less problem to worry about.

Checkpatch complains about spacing, but I have opted to
follow the style of this .h-file.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
b268daf
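The smc91x fix above relies on per-platform accessor macros in a header. A platform whose clause lacks a string accessor ends up trapping when the driver starts calling it. The sketch below models that selection pattern and the fix of moving the platform into the shared clause. All names (`TOY_PLATFORM_*`, `TOY_outsw`, `toy_chip`) are hypothetical and are not the real smc91x.h contents. The abort() fallback is an assumption meant to mirror the BUG() seen in the trace above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: model the fixed configuration, where Versatile uses the shared clause. */
#define TOY_PLATFORM_VERSATILE 1

/* Model of the chip's data register: each 16-bit write is appended to a FIFO. */
struct toy_chip {
	uint16_t fifo[64];
	size_t len;
};

static void toy_writew(struct toy_chip *c, uint16_t v)
{
	if (c->len < sizeof(c->fifo) / sizeof(c->fifo[0]))
		c->fifo[c->len++] = v;
}

#if defined(TOY_PLATFORM_PXA) || defined(TOY_PLATFORM_VERSATILE)
/* Shared clause: provides the string (repeated) 16-bit write accessor. */
#define TOY_outsw(c, src, n)                                   \
	do {                                                   \
		const uint16_t *p_ = (src);                    \
		for (size_t i_ = 0; i_ < (size_t)(n); i_++)    \
			toy_writew((c), p_[i_]);               \
	} while (0)
#endif

#ifndef TOY_outsw
/* Assumption: platforms without the accessor fall back to a trap, like BUG(). */
#define TOY_outsw(c, src, n) abort()
#endif

/* Transmit path: pushes a packet to the chip as halfwords. */
static void toy_send(struct toy_chip *c, const uint16_t *buf, size_t n)
{
	TOY_outsw(c, buf, n);
}
```

Removing the `TOY_PLATFORM_VERSATILE` define in this model selects the trapping fallback. That corresponds to the crash the commit describes.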
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Dec 4, 2013
@linusw linusw net: smc91: fix crash regression on the versatile
After commit e9e4ea7
("net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data"),
the Versatile SMSC LAN91C111 crashes like this:

------------[ cut here ]------------
kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24
task: c6ccfaa0 ti: c6cd0000 task.ti: c6cd0000
PC is at smc_hardware_send_pkt+0x198/0x22c
LR is at smc_hardware_send_pkt+0x24/0x22c
pc : [<c01be324>]    lr : [<c01be1b0>]    psr: 20000013
sp : c6cd1d08  ip : 00000001  fp : 00000000
r10: c02adb08  r9 : 00000000  r8 : c6ced802
r7 : c786fba0  r6 : 00000146  r5 : c8800000  r4 : c78d6000
r3 : 0000000f  r2 : 00000146  r1 : 00000000  r0 : 00000031
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 06cf4000  DAC: 00000015
Process udhcpc (pid: 43, stack limit = 0xc6cd01c0)
Stack: (0xc6cd1d08 to 0xc6cd2000)
1d00:                   00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868
1d20: c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60
1d40: 00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000
1d60: c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000
1d80: c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000
1da0: c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000
1dc0: 00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000
1de0: 00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740
1e00: 01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff
1e20: 43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc
1e40: 00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88
1e60: c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1e80: 00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0
1ea0: 00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08
1ec0: 00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000
1ee0: 00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000
1f00: 00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002
1f20: 06000000 ffffffff 0000ffff c00928c8 c065c520 c6cd1f58 00000003 c009299c
1f40: 00000003 c065c520 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0
1f60: 00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0
1f80: 00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8
1fa0: c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000
1fc0: be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000
1fe0: 00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000
[<c01be324>] (smc_hardware_send_pkt+0x198/0x22c) from [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8)
[<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) from [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc)
[<c0208554>] (dev_hard_start_xmit+0x460/0x4cc) from [<c021d1d8>] (sch_direct_xmit+0x94/0x18c)
[<c021d1d8>] (sch_direct_xmit+0x94/0x18c) from [<c02087f8>] (dev_queue_xmit+0x238/0x42c)
[<c02087f8>] (dev_queue_xmit+0x238/0x42c) from [<c027ba74>] (packet_sendmsg+0xbe8/0xd28)
[<c027ba74>] (packet_sendmsg+0xbe8/0xd28) from [<c01f2870>] (sock_sendmsg+0x84/0xa8)
[<c01f2870>] (sock_sendmsg+0x84/0xa8) from [<c01f4628>] (SyS_sendto+0xb8/0xdc)
[<c01f4628>] (SyS_sendto+0xb8/0xdc) from [<c0013840>] (ret_fast_syscall+0x0/0x2c)
Code: e3130002 1a000001 e3130001 0affffcd (e7f001f2)
---[ end trace 81104fe70e8da7fe ]---
Kernel panic - not syncing: Fatal exception in interrupt

This is because the macro operations defined in smc91x.h
for Versatile are missing SMC_outsw(), which that commit
started using.

The Versatile needs the same accessors as the other
platforms in the first if(...) clause, so just switch it to
using that clause and we have one less problem to worry about.

This includes a hunk of a patch from Will Deacon fixing
the other 32-bit platforms as well: Innokom, Ramses, PXA,
PCM027.

Checkpatch complains about spacing, but I have opted to
follow the style of this .h-file.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
a0c20fb
@davet321 davet321 pushed a commit to davet321/rpi-linux that referenced this issue Dec 9, 2013
@linusw linusw net: smc91: fix crash regression on the versatile
[ Upstream commit a0c20fb ]

After commit e9e4ea7
("net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data"),
the Versatile SMSC LAN91C111 crashes like this:

------------[ cut here ]------------
kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24
task: c6ccfaa0 ti: c6cd0000 task.ti: c6cd0000
PC is at smc_hardware_send_pkt+0x198/0x22c
LR is at smc_hardware_send_pkt+0x24/0x22c
pc : [<c01be324>]    lr : [<c01be1b0>]    psr: 20000013
sp : c6cd1d08  ip : 00000001  fp : 00000000
r10: c02adb08  r9 : 00000000  r8 : c6ced802
r7 : c786fba0  r6 : 00000146  r5 : c8800000  r4 : c78d6000
r3 : 0000000f  r2 : 00000146  r1 : 00000000  r0 : 00000031
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 06cf4000  DAC: 00000015
Process udhcpc (pid: 43, stack limit = 0xc6cd01c0)
Stack: (0xc6cd1d08 to 0xc6cd2000)
1d00:                   00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868
1d20: c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60
1d40: 00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000
1d60: c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000
1d80: c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000
1da0: c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000
1dc0: 00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000
1de0: 00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740
1e00: 01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff
1e20: 43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc
1e40: 00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88
1e60: c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1e80: 00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0
1ea0: 00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08
1ec0: 00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000
1ee0: 00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000
1f00: 00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002
1f20: 06000000 ffffffff 0000ffff c00928c8 c065c520 c6cd1f58 00000003 c009299c
1f40: 00000003 c065c520 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0
1f60: 00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0
1f80: 00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8
1fa0: c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000
1fc0: be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000
1fe0: 00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000
[<c01be324>] (smc_hardware_send_pkt+0x198/0x22c) from [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8)
[<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) from [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc)
[<c0208554>] (dev_hard_start_xmit+0x460/0x4cc) from [<c021d1d8>] (sch_direct_xmit+0x94/0x18c)
[<c021d1d8>] (sch_direct_xmit+0x94/0x18c) from [<c02087f8>] (dev_queue_xmit+0x238/0x42c)
[<c02087f8>] (dev_queue_xmit+0x238/0x42c) from [<c027ba74>] (packet_sendmsg+0xbe8/0xd28)
[<c027ba74>] (packet_sendmsg+0xbe8/0xd28) from [<c01f2870>] (sock_sendmsg+0x84/0xa8)
[<c01f2870>] (sock_sendmsg+0x84/0xa8) from [<c01f4628>] (SyS_sendto+0xb8/0xdc)
[<c01f4628>] (SyS_sendto+0xb8/0xdc) from [<c0013840>] (ret_fast_syscall+0x0/0x2c)
Code: e3130002 1a000001 e3130001 0affffcd (e7f001f2)
---[ end trace 81104fe70e8da7fe ]---
Kernel panic - not syncing: Fatal exception in interrupt

This is because the macro operations defined in smc91x.h
for Versatile are missing SMC_outsw(), which that commit
started using.

The Versatile needs the same accessors as the other
platforms in the first if(...) clause, so just switch it to
using that clause and we have one less problem to worry about.

This includes a hunk of a patch from Will Deacon fixing
the other 32-bit platforms as well: Innokom, Ramses, PXA,
PCM027.

Checkpatch complains about spacing, but I have opted to
follow the style of this .h-file.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2c9681e
@davet321 davet321 pushed a commit to davet321/rpi-linux that referenced this issue Dec 9, 2013
@majianpeng majianpeng md/raid5: Use conf->device_lock to protect changing of multi-thread resources.

commit 60aaf93 upstream.
and commit 0c775d5

When group_thread_cnt is changed through its sysfs entry, the kernel can oops.

The kernel messages are:
[  135.299021] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  135.299073] IP: [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[  135.299107] PGD 0
[  135.299122] Oops: 0000 [#1] SMP
[  135.299144] Modules linked in: netconsole e1000e ptp pps_core
[  135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24
[  135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[  135.299255] task: ffff8800b9638f80 ti: ffff8800b77a4000 task.ti: ffff8800b77a4000
[  135.299283] RIP: 0010:[<ffffffff815188ab>]  [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[  135.299323] RSP: 0018:ffff8800b77a5c48  EFLAGS: 00010002
[  135.299344] RAX: ffff880037bb5c70 RBX: 0000000000000000 RCX: 0000000000000008
[  135.299371] RDX: ffff880037bb5cb8 RSI: 0000000000000001 RDI: ffff880037bb5c00
[  135.299398] RBP: ffff8800b77a5d08 R08: 0000000000000001 R09: 0000000000000000
[  135.299425] R10: ffff8800b77a5c98 R11: 00000000ffffffff R12: ffff880037bb5c00
[  135.299452] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880037bb5c70
[  135.299479] FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
[  135.299510] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  135.299532] CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000000407e0
[  135.299559] Stack:
[  135.299570]  ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300
[  135.299611]  000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8
[  135.299654]  000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98
[  135.299696] Call Trace:
[  135.299711]  [<ffffffff8107383e>] ? __wake_up+0x4e/0x70
[  135.299733]  [<ffffffff81518f88>] raid5d+0x4c8/0x680
[  135.299756]  [<ffffffff817174ed>] ? schedule_timeout+0x15d/0x1f0
[  135.299781]  [<ffffffff81524c9f>] md_thread+0x11f/0x170
[  135.299804]  [<ffffffff81069cd0>] ? wake_up_bit+0x40/0x40
[  135.299826]  [<ffffffff81524b80>] ? md_rdev_init+0x110/0x110
[  135.299850]  [<ffffffff81069656>] kthread+0xc6/0xd0
[  135.299871]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  135.299899]  [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[  135.299923]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90
[  135.300005] RIP  [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
[  135.300005]  RSP <ffff8800b77a5c48>
[  135.300005] CR2: 0000000000000000
[  135.300005] ---[ end trace 504854e5bb7562ed ]---
[  135.300005] Kernel panic - not syncing: Fatal exception

This is because raid5d() can be running while the multi-thread
resources are changed via sysfs, so we need to provide locking.

conf->device_lock is suitable, but we cannot simply call
alloc_thread_groups() under this lock, as we cannot allocate memory
while holding a spinlock.
So change alloc_thread_groups() to allocate and return the data
structures, then raid5_store_group_thread_cnt() can take the lock
while updating the pointers to the data structures.

This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x
stable series.

Fixes: b721420
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8d3022e
@popcornmix popcornmix pushed a commit that referenced this issue Dec 12, 2013
@linusw linusw net: smc91: fix crash regression on the versatile
commit a0c20fb upstream.

After commit e9e4ea7
("net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data"),
the Versatile SMSC LAN91C111 crashes like this:

------------[ cut here ]------------
kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24
task: c6ccfaa0 ti: c6cd0000 task.ti: c6cd0000
PC is at smc_hardware_send_pkt+0x198/0x22c
LR is at smc_hardware_send_pkt+0x24/0x22c
pc : [<c01be324>]    lr : [<c01be1b0>]    psr: 20000013
sp : c6cd1d08  ip : 00000001  fp : 00000000
r10: c02adb08  r9 : 00000000  r8 : c6ced802
r7 : c786fba0  r6 : 00000146  r5 : c8800000  r4 : c78d6000
r3 : 0000000f  r2 : 00000146  r1 : 00000000  r0 : 00000031
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 06cf4000  DAC: 00000015
Process udhcpc (pid: 43, stack limit = 0xc6cd01c0)
Stack: (0xc6cd1d08 to 0xc6cd2000)
1d00:                   00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868
1d20: c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60
1d40: 00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000
1d60: c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000
1d80: c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000
1da0: c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000
1dc0: 00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000
1de0: 00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740
1e00: 01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff
1e20: 43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc
1e40: 00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88
1e60: c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1e80: 00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0
1ea0: 00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08
1ec0: 00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000
1ee0: 00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000
1f00: 00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002
1f20: 06000000 ffffffff 0000ffff c00928c8 c065c520 c6cd1f58 00000003 c009299c
1f40: 00000003 c065c520 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0
1f60: 00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0
1f80: 00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8
1fa0: c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000
1fc0: be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000
1fe0: 00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000
[<c01be324>] (smc_hardware_send_pkt+0x198/0x22c) from [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8)
[<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) from [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc)
[<c0208554>] (dev_hard_start_xmit+0x460/0x4cc) from [<c021d1d8>] (sch_direct_xmit+0x94/0x18c)
[<c021d1d8>] (sch_direct_xmit+0x94/0x18c) from [<c02087f8>] (dev_queue_xmit+0x238/0x42c)
[<c02087f8>] (dev_queue_xmit+0x238/0x42c) from [<c027ba74>] (packet_sendmsg+0xbe8/0xd28)
[<c027ba74>] (packet_sendmsg+0xbe8/0xd28) from [<c01f2870>] (sock_sendmsg+0x84/0xa8)
[<c01f2870>] (sock_sendmsg+0x84/0xa8) from [<c01f4628>] (SyS_sendto+0xb8/0xdc)
[<c01f4628>] (SyS_sendto+0xb8/0xdc) from [<c0013840>] (ret_fast_syscall+0x0/0x2c)
Code: e3130002 1a000001 e3130001 0affffcd (e7f001f2)
---[ end trace 81104fe70e8da7fe ]---
Kernel panic - not syncing: Fatal exception in interrupt

This is because the macro operations defined in smc91x.h
for Versatile are missing SMC_outsw(), which that commit
started using.

The Versatile needs the same accessors as the other
platforms in the first if(...) clause, so just switch it to
using that clause and we have one less problem to worry about.

This includes a hunk of a patch from Will Deacon fixing
the other 32-bit platforms as well: Innokom, Ramses, PXA,
PCM027.

Checkpatch complains about spacing, but I have opted to
follow the style of this .h-file.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Eric Miao <eric.y.miao@gmail.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9fac703
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Feb 1, 2014
@mchehab mchehab [media] em28xx: push mutex down to extensions on .fini callback
Avoid a circular locking dependency by pushing the dev->lock down
into the .fini callback of each extension.

As em28xx-dvb, em28xx-alsa and em28xx-rc have their own data
structures and don't touch the common structure during .fini,
only em28xx-v4l needs to be locked.

[   90.994317] ======================================================
[   90.994356] [ INFO: possible circular locking dependency detected ]
[   90.994395] 3.13.0-rc1+ #24 Not tainted
[   90.994427] -------------------------------------------------------
[   90.994458] khubd/54 is trying to acquire lock:
[   90.994490]  (&card->controls_rwsem){++++.+}, at: [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.994656]
[   90.994656] but task is already holding lock:
[   90.994688]  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]
[   90.994843]
[   90.994843] which lock already depends on the new lock.
[   90.994843]
[   90.994874]
[   90.994874] the existing dependency chain (in reverse order) is:
[   90.994905]
-> #1 (&dev->lock){+.+.+.}:
[   90.995057]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995121]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995182]        [<ffffffff816a5b6c>] mutex_lock_nested+0x5c/0x3c0
[   90.995245]        [<ffffffffa0422cca>] em28xx_vol_put_mute+0x1ba/0x1d0 [em28xx_alsa]
[   90.995309]        [<ffffffffa017813d>] snd_ctl_elem_write+0xfd/0x140 [snd]
[   90.995376]        [<ffffffffa01791c2>] snd_ctl_ioctl+0xe2/0x810 [snd]
[   90.995442]        [<ffffffff811db8b0>] do_vfs_ioctl+0x300/0x520
[   90.995504]        [<ffffffff811dbb51>] SyS_ioctl+0x81/0xa0
[   90.995568]        [<ffffffff816b1929>] system_call_fastpath+0x16/0x1b
[   90.995630]
-> #0 (&card->controls_rwsem){++++.+}:
[   90.995780]        [<ffffffff810b7a47>] check_prevs_add+0x947/0x950
[   90.995841]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995901]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995962]        [<ffffffff816a762b>] down_write+0x3b/0xa0
[   90.996022]        [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.996088]        [<ffffffffa017a255>] snd_device_free+0x65/0x140 [snd]
[   90.996154]        [<ffffffffa017a751>] snd_device_free_all+0x61/0xa0 [snd]
[   90.996219]        [<ffffffffa0173af4>] snd_card_do_free+0x14/0x130 [snd]
[   90.996283]        [<ffffffffa0173f14>] snd_card_free+0x84/0x90 [snd]
[   90.996349]        [<ffffffffa0423397>] em28xx_audio_fini+0x97/0xb0 [em28xx_alsa]
[   90.996411]        [<ffffffffa040dba6>] em28xx_close_extension+0x56/0x90 [em28xx]
[   90.996475]        [<ffffffffa040f639>] em28xx_usb_disconnect+0x79/0x90 [em28xx]
[   90.996539]        [<ffffffff814a06e7>] usb_unbind_interface+0x67/0x1d0
[   90.996620]        [<ffffffff8142920f>] __device_release_driver+0x7f/0xf0
[   90.996682]        [<ffffffff814292a5>] device_release_driver+0x25/0x40
[   90.996742]        [<ffffffff81428b0c>] bus_remove_device+0x11c/0x1a0
[   90.996801]        [<ffffffff81425536>] device_del+0x136/0x1d0
[   90.996863]        [<ffffffff8149e0c0>] usb_disable_device+0xb0/0x290
[   90.996923]        [<ffffffff814930c5>] usb_disconnect+0xb5/0x1d0
[   90.996984]        [<ffffffff81495ab6>] hub_port_connect_change+0xd6/0xad0
[   90.997044]        [<ffffffff814967c3>] hub_events+0x313/0x9b0
[   90.997105]        [<ffffffff81496e95>] hub_thread+0x35/0x170
[   90.997165]        [<ffffffff8108ea2f>] kthread+0xff/0x120
[   90.997226]        [<ffffffff816b187c>] ret_from_fork+0x7c/0xb0
[   90.997287]
[   90.997287] other info that might help us debug this:
[   90.997287]
[   90.997318]  Possible unsafe locking scenario:
[   90.997318]
[   90.997348]        CPU0                    CPU1
[   90.997378]        ----                    ----
[   90.997408]   lock(&dev->lock);
[   90.997497]                                lock(&card->controls_rwsem);
[   90.997607]                                lock(&dev->lock);
[   90.997697]   lock(&card->controls_rwsem);
[   90.997786]
[   90.997786]  *** DEADLOCK ***
[   90.997786]
[   90.997817] 5 locks held by khubd/54:
[   90.997847]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81496564>] hub_events+0xb4/0x9b0
[   90.998025]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81493076>] usb_disconnect+0x66/0x1d0
[   90.998204]  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff8142929d>] device_release_driver+0x1d/0x40
[   90.998383]  #3:  (em28xx_devlist_mutex){+.+.+.}, at: [<ffffffffa040db77>] em28xx_close_extension+0x27/0x90 [em28xx]
[   90.998567]  #4:  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]

Reviewed-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Tested-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
ebbfbc2
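The em28xx commit above removes a lock-order inversion. The core no longer holds dev->lock while calling each extension's .fini, and only the V4L extension takes that lock itself. The single-threaded model below illustrates that reasoning. It is not lockdep and not the em28xx code. `toy_lock`, `toy_acquire()` and the extension callbacks are hypothetical. The only ordering rule checked is the one from the report above: the controls lock must not be acquired while the dev lock is held.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only records whether it is held (single-threaded model). */
struct toy_lock {
	bool held;
};

static struct toy_lock dev_lock;      /* stands in for dev->lock */
static struct toy_lock controls_lock; /* stands in for card->controls_rwsem */
static bool order_violation;          /* set on the bad order lockdep reported */

static void toy_acquire(struct toy_lock *l)
{
	if (l == &controls_lock && dev_lock.held)
		order_violation = true;
	l->held = true;
}

static void toy_release(struct toy_lock *l)
{
	l->held = false;
}

/* Hypothetical extension with a .fini-style teardown callback. */
struct toy_extension {
	void (*fini)(void);
};

/* V4L-like extension touches shared device state, so it takes dev_lock itself. */
static void toy_v4l_fini(void)
{
	toy_acquire(&dev_lock);
	/* ... tear down state shared with the core ... */
	toy_release(&dev_lock);
}

/* ALSA-like extension frees its card, which takes the controls lock. */
static void toy_alsa_fini(void)
{
	toy_acquire(&controls_lock);
	toy_release(&controls_lock);
}

/* Before: the core held dev_lock around every .fini call. */
static void toy_close_extensions_old(const struct toy_extension *ext, size_t n)
{
	toy_acquire(&dev_lock);
	for (size_t i = 0; i < n; i++)
		ext[i].fini();
	toy_release(&dev_lock);
}

/* After: the core does not hold dev_lock; each .fini locks what it needs. */
static void toy_close_extensions(const struct toy_extension *ext, size_t n)
{
	for (size_t i = 0; i < n; i++)
		ext[i].fini();
}
```

In this model, the old path trips the ordering check as soon as an ALSA-like extension is closed. The new path does not trip it even with both extensions present.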
@popcornmix popcornmix pushed a commit that referenced this issue Jun 7, 2014
Vlad Yasevich macvlan: Fix lockdep warnings with stacked macvlan devices
[ Upstream commit c674ac3 ]

Macvlan devices try to avoid stacking, but that's not always
successful or even desired. As an example, the following
configuration is perfectly legal and valid:

eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1

However, this configuration produces the following lockdep
trace:
[  115.620418] ======================================================
[  115.620477] [ INFO: possible circular locking dependency detected ]
[  115.620516] 3.15.0-rc1+ #24 Not tainted
[  115.620540] -------------------------------------------------------
[  115.620577] ip/1704 is trying to acquire lock:
[  115.620604]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
[  115.620686]
but task is already holding lock:
[  115.620723]  (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40
[  115.620795]
which lock already depends on the new lock.

[  115.620853]
the existing dependency chain (in reverse order) is:
[  115.620894]
-> #1 (&macvlan_netdev_addr_lock_key){+.....}:
[  115.620935]        [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
[  115.620974]        [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
[  115.621019]        [<ffffffffa07296c3>] vlan_dev_set_rx_mode+0x53/0x110 [8021q]
[  115.621066]        [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
[  115.621105]        [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
[  115.621143]        [<ffffffff815da6be>] __dev_open+0xde/0x140
[  115.621174]        [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
[  115.621174]        [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
[  115.621174]        [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
[  115.621174]        [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
[  115.621174]        [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
[  115.621174]        [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
[  115.621174]        [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
[  115.621174]        [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
[  115.621174]        [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
[  115.621174]        [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
[  115.621174]        [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
[  115.621174]        [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
[  115.621174]        [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
[  115.621174]        [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b
[  115.621174]
-> #0 (&vlan_netdev_addr_lock_key/1){+.....}:
[  115.621174]        [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60
[  115.621174]        [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
[  115.621174]        [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
[  115.621174]        [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
[  115.621174]        [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan]
[  115.621174]        [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
[  115.621174]        [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
[  115.621174]        [<ffffffff815da6be>] __dev_open+0xde/0x140
[  115.621174]        [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
[  115.621174]        [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
[  115.621174]        [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
[  115.621174]        [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
[  115.621174]        [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
[  115.621174]        [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
[  115.621174]        [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
[  115.621174]        [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
[  115.621174]        [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
[  115.621174]        [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
[  115.621174]        [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
[  115.621174]        [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
[  115.621174]        [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
[  115.621174]        [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b
[  115.621174]
other info that might help us debug this:

[  115.621174]  Possible unsafe locking scenario:

[  115.621174]        CPU0                    CPU1
[  115.621174]        ----                    ----
[  115.621174]   lock(&macvlan_netdev_addr_lock_key);
[  115.621174]                                lock(&vlan_netdev_addr_lock_key/1);
[  115.621174]                                lock(&macvlan_netdev_addr_lock_key);
[  115.621174]   lock(&vlan_netdev_addr_lock_key/1);
[  115.621174]
 *** DEADLOCK ***

[  115.621174] 2 locks held by ip/1704:
[  115.621174]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815e6dbb>] rtnetlink_rcv+0x1b/0x40
[  115.621174]  #1:  (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40
[  115.621174]
stack backtrace:
[  115.621174] CPU: 3 PID: 1704 Comm: ip Not tainted 3.15.0-rc1+ #24
[  115.621174] Hardware name: Hewlett-Packard HP xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010
[  115.621174]  ffffffff82339ae0 ffff880465f79568 ffffffff816ee20c ffffffff82339ae0
[  115.621174]  ffff880465f795a8 ffffffff816e9e1b ffff880465f79600 ffff880465b019c8
[  115.621174]  0000000000000001 0000000000000002 ffff880465b019c8 ffff880465b01230
[  115.621174] Call Trace:
[  115.621174]  [<ffffffff816ee20c>] dump_stack+0x4d/0x66
[  115.621174]  [<ffffffff816e9e1b>] print_circular_bug+0x200/0x20e
[  115.621174]  [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60
[  115.621174]  [<ffffffff810d3172>] ? trace_hardirqs_on_caller+0xb2/0x1d0
[  115.621174]  [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
[  115.621174]  [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80
[  115.621174]  [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
[  115.621174]  [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80
[  115.621174]  [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
[  115.621174]  [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan]
[  115.621174]  [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
[  115.621174]  [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
[  115.621174]  [<ffffffff815da6be>] __dev_open+0xde/0x140
[  115.621174]  [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
[  115.621174]  [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
[  115.621174]  [<ffffffff811e1db1>] ? mem_cgroup_bad_page_check+0x21/0x30
[  115.621174]  [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
[  115.621174]  [<ffffffff810d394c>] ? __lock_acquire+0x37c/0x1a60
[  115.621174]  [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
[  115.621174]  [<ffffffff815ea169>] ? rtnl_newlink+0xe9/0x730
[  115.621174]  [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
[  115.621174]  [<ffffffff810d329d>] ? trace_hardirqs_on+0xd/0x10
[  115.621174]  [<ffffffff815e6dbb>] ? rtnetlink_rcv+0x1b/0x40
[  115.621174]  [<ffffffff815e6de0>] ? rtnetlink_rcv+0x40/0x40
[  115.621174]  [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
[  115.621174]  [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
[  115.621174]  [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
[  115.621174]  [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
[  115.621174]  [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
[  115.621174]  [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0
[  115.621174]  [<ffffffff8119d4f8>] ? might_fault+0xa8/0xb0
[  115.621174]  [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0
[  115.621174]  [<ffffffff815cb51e>] ? verify_iovec+0x5e/0xe0
[  115.621174]  [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
[  115.621174]  [<ffffffff816faa0d>] ? __do_page_fault+0x11d/0x570
[  115.621174]  [<ffffffff810cfe9f>] ? up_read+0x1f/0x40
[  115.621174]  [<ffffffff816fab04>] ? __do_page_fault+0x214/0x570
[  115.621174]  [<ffffffff8120a10b>] ? mntput_no_expire+0x6b/0x1c0
[  115.621174]  [<ffffffff8120a0b7>] ? mntput_no_expire+0x17/0x1c0
[  115.621174]  [<ffffffff8120a284>] ? mntput+0x24/0x40
[  115.621174]  [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
[  115.621174]  [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
[  115.621174]  [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b

Fix this by correctly providing a macvlan lockdep class.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1c030bf
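The lockdep trace in this macvlan commit arises because macvlan0 and macvlan1 share one lock class, so lockdep sees macvlan -> vlan (macvlan1 syncing to vlan0.10) and vlan -> macvlan (vlan0.10 syncing to macvlan0) as a cycle, even though two different devices at different depths are involved. The following is a minimal user-space sketch of the idea of telling stacked locks apart by stacking depth. `model_dev`, `nest_level()` and `same_lock_class()` are hypothetical names, not the kernel's API, and the actual fix may differ in detail:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a network device (not struct net_device). */
struct model_dev {
    const char *name;
    const char *lock_key;          /* stands in for the lockdep key, e.g. "macvlan" */
    const struct model_dev *lower; /* device this one is stacked on, or NULL */
};

/* Nesting level: 0 for a physical device, plus one per device stacked below it. */
static unsigned int nest_level(const struct model_dev *dev)
{
    unsigned int level = 0;

    assert(dev != NULL);
    for (const struct model_dev *d = dev->lower; d != NULL; d = d->lower)
        level++;
    return level;
}

/* In this model, two address-list locks belong to the same lockdep class
 * only when both the key and the subclass (here: the nesting level) match. */
static int same_lock_class(const struct model_dev *a, const struct model_dev *b)
{
    return strcmp(a->lock_key, b->lock_key) == 0 &&
           nest_level(a) == nest_level(b);
}
```

For the stack eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1, this model places macvlan0 at level 1 and macvlan1 at level 3, so the two macvlan locks no longer share a class and the apparent circular dependency disappears.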
@popcornmix popcornmix pushed a commit that referenced this issue Jun 8, 2014
@westeri westeri gpio / ACPI: Don't crash on NULL chip->dev
Commit aa92b6f (gpio / ACPI: Allocate ACPI specific data directly in
acpi_gpiochip_add()) moved ACPI handle checking to acpi_gpiochip_add() but
forgot to check whether chip->dev is NULL before dereferencing it.

Since the chip->dev pointer is optional, we can end up with a crash like the following:

 BUG: unable to handle kernel NULL pointer dereference at 00000138
 IP: [<c126c2b3>] acpi_gpiochip_add+0x13/0x190
 *pde = 00000000
 Oops: 0000 [#1] PREEMPT SMP
 Modules linked in: ssb(+) ...
 CPU: 0 PID: 512 Comm: modprobe Tainted: G        W     3.14.0-rc7-next-20140324-t1 #24
 Hardware name: Dell Inc. Latitude D830                   /0UY141, BIOS A02 06/07/2007
 task: f5799900 ti: f543e000 task.ti: f543e000
 EIP: 0060:[<c126c2b3>] EFLAGS: 00010282 CPU: 0
 EIP is at acpi_gpiochip_add+0x13/0x190
 EAX: 00000000 EBX: f57824c4 ECX: 00000000 EDX: 00000000
 ESI: f57824c4 EDI: 00000010 EBP: f543fc54 ESP: f543fc40
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 CR0: 8005003b CR2: 00000138 CR3: 355f8000 CR4: 000007d0
 Stack:
  f543fc5c fd1f7790 f57824c4 000000be 00000010 f543fc84 c1269f4e f543fc74
  fd1f78bd 00008002 f57822b0 f5782090 fd1f8400 00000286 fd1f9994 00000000
  f5782000 f543fc8c fd1f7e39 f543fcc8 fd1f0bd8 000000c0 00000000 00000000
 Call Trace:
  [<fd1f7790>] ? ssb_pcie_mdio_write+0xa0/0xd0 [ssb]
  [<c1269f4e>] gpiochip_add+0xee/0x300
  [<fd1f78bd>] ? ssb_pcicore_serdes_workaround+0xfd/0x140 [ssb]
  [<fd1f7e39>] ssb_gpio_init+0x89/0xa0 [ssb]
  [<fd1f0bd8>] ssb_attach_queued_buses+0xc8/0x2d0 [ssb]
  [<fd1f0f65>] ssb_bus_register+0x185/0x1f0 [ssb]
  [<fd1f3120>] ? ssb_pci_xtal+0x220/0x220 [ssb]
  [<fd1f106c>] ssb_bus_pcibus_register+0x2c/0x80 [ssb]
  [<fd1f40dc>] ssb_pcihost_probe+0x9c/0x110 [ssb]
  [<c1276c8f>] pci_device_probe+0x6f/0xc0
  [<c11bdb55>] ? sysfs_create_link+0x25/0x40
  [<c131d8b9>] driver_probe_device+0x79/0x360
  [<c1276512>] ? pci_match_device+0xb2/0xc0
  [<c131dc51>] __driver_attach+0x71/0x80
  [<c131dbe0>] ? __device_attach+0x40/0x40
  [<c131bd87>] bus_for_each_dev+0x47/0x80
  [<c131d3ae>] driver_attach+0x1e/0x20
  [<c131dbe0>] ? __device_attach+0x40/0x40
  [<c131d007>] bus_add_driver+0x157/0x230
  [<c131e219>] driver_register+0x59/0xe0
  ...

Fix this by checking the chip->dev pointer against NULL first. We can now also
remove the redundant check in acpi_gpiochip_request/free_interrupts().

Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
e9595f8
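The gpio / ACPI fix amounts to an early return before `chip->dev` is dereferenced. Below is a user-space sketch of that guard. The `model_*` types and the `acpi_handle` field are hypothetical stand-ins for the kernel's `struct gpio_chip`, `struct device` and the ACPI handle lookup; the real function's signature and body may differ:

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device. */
struct model_device {
    void *acpi_handle;           /* NULL when the device has no ACPI companion */
};

/* Hypothetical stand-in for struct gpio_chip. */
struct model_gpio_chip {
    struct model_device *dev;    /* optional: may legitimately be NULL */
};

/* Returns 1 if ACPI-specific setup would proceed, 0 if it is skipped. */
static int model_acpi_gpiochip_add(const struct model_gpio_chip *chip)
{
    /* Check chip->dev before dereferencing it; reading through a NULL
     * chip->dev is what produced the oops in the commit message. */
    if (chip == NULL || chip->dev == NULL)
        return 0;

    /* Only now is it safe to look at the device's ACPI handle. */
    if (chip->dev->acpi_handle == NULL)
        return 0;

    return 1;
}
```

A chip registered without a parent device, as in the ssb case from the trace, simply skips the ACPI setup instead of crashing.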
@popcornmix popcornmix pushed a commit that referenced this issue Jun 8, 2014
Vlad Yasevich macvlan: Fix lockdep warnings with stacked macvlan devices
Macvlan devices try to avoid stacking, but that's not always
successful or even desired.  As an example, the following
configuration is perfectly legal and valid:

eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1

However, this configuration produces the following lockdep
trace:
[  115.620418] ======================================================
[  115.620477] [ INFO: possible circular locking dependency detected ]
[  115.620516] 3.15.0-rc1+ #24 Not tainted
[  115.620540] -------------------------------------------------------
[  115.620577] ip/1704 is trying to acquire lock:
[  115.620604]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
[  115.620686]
but task is already holding lock:
[  115.620723]  (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40
[  115.620795]
which lock already depends on the new lock.

[  115.620853]
the existing dependency chain (in reverse order) is:
[  115.620894]
-> #1 (&macvlan_netdev_addr_lock_key){+.....}:
[  115.620935]        [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
[  115.620974]        [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
[  115.621019]        [<ffffffffa07296c3>] vlan_dev_set_rx_mode+0x53/0x110 [8021q]
[  115.621066]        [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
[  115.621105]        [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
[  115.621143]        [<ffffffff815da6be>] __dev_open+0xde/0x140
[  115.621174]        [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
[  115.621174]        [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
[  115.621174]        [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
[  115.621174]        [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
[  115.621174]        [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
[  115.621174]        [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
[  115.621174]        [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
[  115.621174]        [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
[  115.621174]        [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
[  115.621174]        [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
[  115.621174]        [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
[  115.621174]        [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
[  115.621174]        [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
[  115.621174]        [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b
[  115.621174]
-> #0 (&vlan_netdev_addr_lock_key/1){+.....}:
[  115.621174]        [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60
[  115.621174]        [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
[  115.621174]        [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
[  115.621174]        [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
[  115.621174]        [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan]
[  115.621174]        [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
[  115.621174]        [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
[  115.621174]        [<ffffffff815da6be>] __dev_open+0xde/0x140
[  115.621174]        [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
[  115.621174]        [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
[  115.621174]        [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
[  115.621174]        [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
[  115.621174]        [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
[  115.621174]        [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
[  115.621174]        [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
[  115.621174]        [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
[  115.621174]        [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
[  115.621174]        [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
[  115.621174]        [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
[  115.621174]        [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
[  115.621174]        [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
[  115.621174]        [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b
[  115.621174]
other info that might help us debug this:

[  115.621174]  Possible unsafe locking scenario:

[  115.621174]        CPU0                    CPU1
[  115.621174]        ----                    ----
[  115.621174]   lock(&macvlan_netdev_addr_lock_key);
[  115.621174]                                lock(&vlan_netdev_addr_lock_key/1);
[  115.621174]                                lock(&macvlan_netdev_addr_lock_key);
[  115.621174]   lock(&vlan_netdev_addr_lock_key/1);
[  115.621174]
 *** DEADLOCK ***

[  115.621174] 2 locks held by ip/1704:
[  115.621174]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815e6dbb>] rtnetlink_rcv+0x1b/0x40
[  115.621174]  #1:  (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40
[  115.621174]
stack backtrace:
[  115.621174] CPU: 3 PID: 1704 Comm: ip Not tainted 3.15.0-rc1+ #24
[  115.621174] Hardware name: Hewlett-Packard HP xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010
[  115.621174]  ffffffff82339ae0 ffff880465f79568 ffffffff816ee20c ffffffff82339ae0
[  115.621174]  ffff880465f795a8 ffffffff816e9e1b ffff880465f79600 ffff880465b019c8
[  115.621174]  0000000000000001 0000000000000002 ffff880465b019c8 ffff880465b01230
[  115.621174] Call Trace:
[  115.621174]  [<ffffffff816ee20c>] dump_stack+0x4d/0x66
[  115.621174]  [<ffffffff816e9e1b>] print_circular_bug+0x200/0x20e
[  115.621174]  [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60
[  115.621174]  [<ffffffff810d3172>] ? trace_hardirqs_on_caller+0xb2/0x1d0
[  115.621174]  [<ffffffff810d57f2>] lock_acquire+0xa2/0x130
[  115.621174]  [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80
[  115.621174]  [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50
[  115.621174]  [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80
[  115.621174]  [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80
[  115.621174]  [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan]
[  115.621174]  [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0
[  115.621174]  [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40
[  115.621174]  [<ffffffff815da6be>] __dev_open+0xde/0x140
[  115.621174]  [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170
[  115.621174]  [<ffffffff815daaa9>] dev_change_flags+0x29/0x60
[  115.621174]  [<ffffffff811e1db1>] ? mem_cgroup_bad_page_check+0x21/0x30
[  115.621174]  [<ffffffff815e7f11>] do_setlink+0x321/0x9a0
[  115.621174]  [<ffffffff810d394c>] ? __lock_acquire+0x37c/0x1a60
[  115.621174]  [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730
[  115.621174]  [<ffffffff815ea169>] ? rtnl_newlink+0xe9/0x730
[  115.621174]  [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250
[  115.621174]  [<ffffffff810d329d>] ? trace_hardirqs_on+0xd/0x10
[  115.621174]  [<ffffffff815e6dbb>] ? rtnetlink_rcv+0x1b/0x40
[  115.621174]  [<ffffffff815e6de0>] ? rtnetlink_rcv+0x40/0x40
[  115.621174]  [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0
[  115.621174]  [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40
[  115.621174]  [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0
[  115.621174]  [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740
[  115.621174]  [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0
[  115.621174]  [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0
[  115.621174]  [<ffffffff8119d4f8>] ? might_fault+0xa8/0xb0
[  115.621174]  [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0
[  115.621174]  [<ffffffff815cb51e>] ? verify_iovec+0x5e/0xe0
[  115.621174]  [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380
[  115.621174]  [<ffffffff816faa0d>] ? __do_page_fault+0x11d/0x570
[  115.621174]  [<ffffffff810cfe9f>] ? up_read+0x1f/0x40
[  115.621174]  [<ffffffff816fab04>] ? __do_page_fault+0x214/0x570
[  115.621174]  [<ffffffff8120a10b>] ? mntput_no_expire+0x6b/0x1c0
[  115.621174]  [<ffffffff8120a0b7>] ? mntput_no_expire+0x17/0x1c0
[  115.621174]  [<ffffffff8120a284>] ? mntput+0x24/0x40
[  115.621174]  [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80
[  115.621174]  [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20
[  115.621174]  [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b

Fix this by correctly providing a macvlan lockdep class.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c674ac3
@popcornmix popcornmix pushed a commit that referenced this issue Aug 4, 2014
Li RongQing cxgb4: Not need to hold the adap_rcu_lock lock when read adap_rcu_list
cxgb4_netdev may lead to a deadlock, since it takes a spin lock and is called
in both thread and softirq context without disabling BH; the lockdep report is
below. In fact, cxgb4_netdev only reads adap_rcu_list under RCU protection, so
there is no need to also hold the spin lock.
	=================================
	[ INFO: inconsistent lock state ]
	3.14.7+ #24 Tainted: G         C O
	---------------------------------
	inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
	radvd/3794 [HC0[0]:SC1[1]:HE1:SE0] takes:
	 (adap_rcu_lock){+.?...}, at: [<ffffffffa09989ea>] clip_add+0x2c/0x116 [cxgb4]
	{SOFTIRQ-ON-W} state was registered at:
	  [<ffffffff810fca81>] __lock_acquire+0x34a/0xe48
	  [<ffffffff810fd98b>] lock_acquire+0x82/0x9d
	  [<ffffffff815d6ff8>] _raw_spin_lock+0x34/0x43
	  [<ffffffffa09989ea>] clip_add+0x2c/0x116 [cxgb4]
	  [<ffffffffa0998beb>] cxgb4_inet6addr_handler+0x117/0x12c [cxgb4]
	  [<ffffffff815da98b>] notifier_call_chain+0x32/0x5c
	  [<ffffffff815da9f9>] __atomic_notifier_call_chain+0x44/0x6e
	  [<ffffffff815daa32>] atomic_notifier_call_chain+0xf/0x11
	  [<ffffffff815b1356>] inet6addr_notifier_call_chain+0x16/0x18
	  [<ffffffffa01f72e5>] ipv6_add_addr+0x404/0x46e [ipv6]
	  [<ffffffffa01f8df0>] addrconf_add_linklocal+0x5f/0x95 [ipv6]
	  [<ffffffffa01fc3e9>] addrconf_notify+0x632/0x841 [ipv6]
	  [<ffffffff815da98b>] notifier_call_chain+0x32/0x5c
	  [<ffffffff810e09a1>] __raw_notifier_call_chain+0x9/0xb
	  [<ffffffff810e09b2>] raw_notifier_call_chain+0xf/0x11
	  [<ffffffff8151b3b7>] call_netdevice_notifiers_info+0x4e/0x56
	  [<ffffffff8151b3d0>] call_netdevice_notifiers+0x11/0x13
	  [<ffffffff8151c0a6>] netdev_state_change+0x1f/0x38
	  [<ffffffff8152f004>] linkwatch_do_dev+0x3b/0x49
	  [<ffffffff8152f184>] __linkwatch_run_queue+0x10b/0x144
	  [<ffffffff8152f1dd>] linkwatch_event+0x20/0x27
	  [<ffffffff810d7bc0>] process_one_work+0x1cb/0x2ee
	  [<ffffffff810d7e3b>] worker_thread+0x12e/0x1fc
	  [<ffffffff810dd391>] kthread+0xc4/0xcc
	  [<ffffffff815dc48c>] ret_from_fork+0x7c/0xb0
	irq event stamp: 3388
	hardirqs last  enabled at (3388): [<ffffffff810c6c85>]
	__local_bh_enable_ip+0xaa/0xd9
	hardirqs last disabled at (3387): [<ffffffff810c6c2d>]
	__local_bh_enable_ip+0x52/0xd9
	softirqs last  enabled at (3288): [<ffffffffa01f1d5b>]
	rcu_read_unlock_bh+0x0/0x2f [ipv6]
	softirqs last disabled at (3289): [<ffffffff815ddafc>]
	do_softirq_own_stack+0x1c/0x30

	other info that might help us debug this:
	 Possible unsafe locking scenario:

	       CPU0
	       ----
	  lock(adap_rcu_lock);
	  <Interrupt>
	    lock(adap_rcu_lock);

	 *** DEADLOCK ***

	5 locks held by radvd/3794:
	 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffffa020b85a>]
	rawv6_sendmsg+0x74b/0xa4d [ipv6]
	 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8151ac6b>]
	rcu_lock_acquire+0x0/0x29
	 #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa01f4cca>]
	rcu_lock_acquire.constprop.16+0x0/0x30 [ipv6]
	 #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff810e09b4>]
	rcu_lock_acquire+0x0/0x29
	 #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0998782>]
	rcu_lock_acquire.constprop.40+0x0/0x30 [cxgb4]

	stack backtrace:
	CPU: 7 PID: 3794 Comm: radvd Tainted: G         C O 3.14.7+ #24
	Hardware name: Supermicro X7DBU/X7DBU, BIOS 6.00 12/03/2007
	 ffffffff81f15990 ffff88012fdc36a8 ffffffff815d0016 0000000000000006
	 ffff8800c80dc2a0 ffff88012fdc3708 ffffffff815cc727 0000000000000001
	 0000000000000001 ffff880100000000 ffffffff81015b02 ffff8800c80dcb58
	Call Trace:
	 <IRQ>  [<ffffffff815d0016>] dump_stack+0x4e/0x71
	 [<ffffffff815cc727>] print_usage_bug+0x1ec/0x1fd
	 [<ffffffff81015b02>] ? save_stack_trace+0x27/0x44
	 [<ffffffff810fbfaa>] ? check_usage_backwards+0xa0/0xa0
	 [<ffffffff810fc640>] mark_lock+0x11b/0x212
	 [<ffffffff810fca0b>] __lock_acquire+0x2d4/0xe48
	 [<ffffffff810fbfaa>] ? check_usage_backwards+0xa0/0xa0
	 [<ffffffff810fbff6>] ? check_usage_forwards+0x4c/0xa6
	 [<ffffffff810c6c8a>] ? __local_bh_enable_ip+0xaf/0xd9
	 [<ffffffff810fd98b>] lock_acquire+0x82/0x9d
	 [<ffffffffa09989ea>] ? clip_add+0x2c/0x116 [cxgb4]
	 [<ffffffffa0998782>] ? rcu_read_unlock+0x23/0x23 [cxgb4]
	 [<ffffffff815d6ff8>] _raw_spin_lock+0x34/0x43
	 [<ffffffffa09989ea>] ? clip_add+0x2c/0x116 [cxgb4]
	 [<ffffffffa09987b0>] ? rcu_lock_acquire.constprop.40+0x2e/0x30 [cxgb4]
	 [<ffffffffa0998782>] ? rcu_read_unlock+0x23/0x23 [cxgb4]
	 [<ffffffffa09989ea>] clip_add+0x2c/0x116 [cxgb4]
	 [<ffffffffa0998beb>] cxgb4_inet6addr_handler+0x117/0x12c [cxgb4]
	 [<ffffffff810fd99d>] ? lock_acquire+0x94/0x9d
	 [<ffffffff810e09b4>] ? raw_notifier_call_chain+0x11/0x11
	 [<ffffffff815da98b>] notifier_call_chain+0x32/0x5c
	 [<ffffffff815da9f9>] __atomic_notifier_call_chain+0x44/0x6e
	 [<ffffffff815daa32>] atomic_notifier_call_chain+0xf/0x11
	 [<ffffffff815b1356>] inet6addr_notifier_call_chain+0x16/0x18
	 [<ffffffffa01f72e5>] ipv6_add_addr+0x404/0x46e [ipv6]
	 [<ffffffff810fde6a>] ? trace_hardirqs_on+0xd/0xf
	 [<ffffffffa01fb634>] addrconf_prefix_rcv+0x385/0x6ea [ipv6]
	 [<ffffffffa0207950>] ndisc_rcv+0x9d3/0xd76 [ipv6]
	 [<ffffffffa020d536>] icmpv6_rcv+0x592/0x67b [ipv6]
	 [<ffffffff810c6c85>] ? __local_bh_enable_ip+0xaa/0xd9
	 [<ffffffff810c6c85>] ? __local_bh_enable_ip+0xaa/0xd9
	 [<ffffffff810fd8dc>] ? lock_release+0x14e/0x17b
	 [<ffffffffa020df97>] ? rcu_read_unlock+0x21/0x23 [ipv6]
	 [<ffffffff8150df52>] ? rcu_read_unlock+0x23/0x23
	 [<ffffffffa01f4ede>] ip6_input_finish+0x1e4/0x2fc [ipv6]
	 [<ffffffffa01f540b>] ip6_input+0x33/0x38 [ipv6]
	 [<ffffffffa01f5557>] ip6_mc_input+0x147/0x160 [ipv6]
	 [<ffffffffa01f4ba3>] ip6_rcv_finish+0x7c/0x81 [ipv6]
	 [<ffffffffa01f5397>] ipv6_rcv+0x3a1/0x3e2 [ipv6]
	 [<ffffffff8151ef96>] __netif_receive_skb_core+0x4ab/0x511
	 [<ffffffff810fdc94>] ? mark_held_locks+0x71/0x99
	 [<ffffffff8151f0c0>] ? process_backlog+0x69/0x15e
	 [<ffffffff8151f045>] __netif_receive_skb+0x49/0x5b
	 [<ffffffff8151f0cf>] process_backlog+0x78/0x15e
	 [<ffffffff8151f571>] ? net_rx_action+0x1a2/0x1cc
	 [<ffffffff8151f47b>] net_rx_action+0xac/0x1cc
	 [<ffffffff810c69b7>] ? __do_softirq+0xad/0x218
	 [<ffffffff810c69ff>] __do_softirq+0xf5/0x218
	 [<ffffffff815ddafc>] do_softirq_own_stack+0x1c/0x30
	 <EOI>  [<ffffffff810c6bb6>] do_softirq+0x38/0x5d
	 [<ffffffffa01f1d5b>] ? ip6_copy_metadata+0x156/0x156 [ipv6]
	 [<ffffffff810c6c78>] __local_bh_enable_ip+0x9d/0xd9
	 [<ffffffffa01f1d88>] rcu_read_unlock_bh+0x2d/0x2f [ipv6]
	 [<ffffffffa01f28b4>] ip6_finish_output2+0x381/0x3d8 [ipv6]
	 [<ffffffffa01f49ef>] ip6_finish_output+0x6e/0x73 [ipv6]
	 [<ffffffffa01f4a70>] ip6_output+0x7c/0xa8 [ipv6]
	 [<ffffffff815b1bfa>] dst_output+0x18/0x1c
	 [<ffffffff815b1c9e>] ip6_local_out+0x1c/0x21
	 [<ffffffffa01f2489>] ip6_push_pending_frames+0x37d/0x427 [ipv6]
	 [<ffffffff81558af8>] ? skb_orphan+0x39/0x39
	 [<ffffffffa020b85a>] ? rawv6_sendmsg+0x74b/0xa4d [ipv6]
	 [<ffffffffa020ba51>] rawv6_sendmsg+0x942/0xa4d [ipv6]
	 [<ffffffff81584cd2>] inet_sendmsg+0x3d/0x66
	 [<ffffffff81508930>] __sock_sendmsg_nosec+0x25/0x27
	 [<ffffffff8150b0d7>] sock_sendmsg+0x5a/0x7b
	 [<ffffffff810fd8dc>] ? lock_release+0x14e/0x17b
	 [<ffffffff8116d756>] ? might_fault+0x9e/0xa5
	 [<ffffffff8116d70d>] ? might_fault+0x55/0xa5
	 [<ffffffff81508cb1>] ? copy_from_user+0x2a/0x2c
	 [<ffffffff8150b70c>] ___sys_sendmsg+0x226/0x2d9
	 [<ffffffff810fcd25>] ? __lock_acquire+0x5ee/0xe48
	 [<ffffffff810fde01>] ? trace_hardirqs_on_caller+0x145/0x1a1
	 [<ffffffff8118efcb>] ? slab_free_hook.isra.71+0x50/0x59
	 [<ffffffff8115c81f>] ? release_pages+0xbc/0x181
	 [<ffffffff810fd99d>] ? lock_acquire+0x94/0x9d
	 [<ffffffff81115e97>] ? read_seqcount_begin.constprop.25+0x73/0x90
	 [<ffffffff8150c408>] __sys_sendmsg+0x3d/0x5b
	 [<ffffffff8150c433>] SyS_sendmsg+0xd/0x19
	 [<ffffffff815dc53d>] system_call_fastpath+0x1a/0x1f

Reported-by: Ben Greear <greearb@candelatech.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ee9a33b
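Lockdep's complaint in the cxgb4 report is an "inconsistent lock state": adap_rcu_lock was first recorded as taken with softirqs enabled ({SOFTIRQ-ON-W}) and was later taken from softirq context ({IN-SOFTIRQ-W}), which permits the single-CPU deadlock in the "Possible unsafe locking scenario". The following is a simplified model of just that one rule. `lock_usage`, `record_acquire()` and `inconsistent()` are hypothetical names; real lockdep tracks many more states, including hardirq and read-side usage:

```c
#include <stdbool.h>

/* Hypothetical per-lock usage record, loosely modelled on two lockdep states. */
struct lock_usage {
    bool softirq_on_w;   /* write-acquired at least once with softirqs enabled */
    bool in_softirq_w;   /* write-acquired at least once from softirq context */
};

/* Record one acquisition of the lock in the given context. */
static void record_acquire(struct lock_usage *u, bool in_softirq, bool softirqs_enabled)
{
    if (in_softirq)
        u->in_softirq_w = true;
    else if (softirqs_enabled)
        u->softirq_on_w = true;
}

/* The combination flagged in the report: a softirq can interrupt a holder
 * on the same CPU and then spin on the same lock forever. */
static bool inconsistent(const struct lock_usage *u)
{
    return u->softirq_on_w && u->in_softirq_w;
}
```

Taking the lock with BH disabled in process context would avoid recording the first state. The patch takes the other route described in its message: the read path drops the spin lock and relies on RCU alone.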
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Oct 17, 2014
@arun-chandran arun-chandran arm64: convert part of soft_restart() to assembly
The current soft_restart() and setup_restart implementations incorrectly
assume that the compiler will not spill/fill values to/from the stack. This
assumption appears to be wrong, as revealed by the disassembly of the
currently existing code (v3.16) built with Linaro GCC 4.9-2014.05.

ffffffc000085224 <soft_restart>:
ffffffc000085224:  a9be7bfd  stp    x29, x30, [sp,#-32]!
ffffffc000085228:  910003fd  mov    x29, sp
ffffffc00008522c:  f9000fa0  str    x0, [x29,#24]
ffffffc000085230:  94003d21  bl     ffffffc0000946b4 <setup_mm_for_reboot>
ffffffc000085234:  94003b33  bl     ffffffc000093f00 <flush_cache_all>
ffffffc000085238:  94003dfa  bl     ffffffc000094a20 <cpu_cache_off>
ffffffc00008523c:  94003b31  bl     ffffffc000093f00 <flush_cache_all>
ffffffc000085240:  b0003321  adrp   x1, ffffffc0006ea000 <reset_devices>

ffffffc000085244:  f9400fa0  ldr    x0, [x29,#24] ----> spilled addr
ffffffc000085248:  f942fc22  ldr    x2, [x1,#1528] ----> global memstart_addr

ffffffc00008524c:  f0000061  adrp   x1, ffffffc000094000 <__inval_cache_range+0x40>
ffffffc000085250:  91290021  add    x1, x1, #0xa40
ffffffc000085254:  8b010041  add    x1, x2, x1
ffffffc000085258:  d2c00802  mov    x2, #0x4000000000           // #274877906944
ffffffc00008525c:  8b020021  add    x1, x1, x2
ffffffc000085260:  d63f0020  blr    x1
...

Here the compiler generates memory accesses after the cache is disabled,
loading stale values for the spilled value and global variable. As we cannot
control when the compiler will access memory we must rewrite the
functions in assembly to stash values we need in registers prior to
disabling the cache, avoiding the use of memory.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Arun Chandran <achandran@mvista.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
5e05153
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Oct 17, 2014
Dave Hansen x86, sched: Add new topology for multi-NUMA-node CPUs
I'm getting the spew below when booting with Haswell (Xeon
E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled
in the BIOS.  It seems similar to the issue that some folks from
AMD ran into on their systems and addressed in this commit:

  161270f ("x86/smp: Fix topology checks on AMD MCM CPUs")

Both these Intel and AMD systems break an assumption which is
being enforced by topology_sane(): a socket may not contain more
than one NUMA node.

AMD special-cased their system by looking for a cpuid flag.  The
Intel mode is dependent on BIOS options and I do not know of any
way in which it is enumerated other than via the tables parsed
during the CPU bringup process.  In other words, we have to trust
the ACPI tables <shudder>.

This detects the situation where a NUMA node occurs at a place in
the middle of the "CPU" sched domains.  It replaces the default
topology with one that relies on the NUMA information from the
firmware (SRAT table) for all levels of sched domains above the
hyperthreads.

This also fixes a sysfs bug.  We used to freak out when we saw
the "mc" group cross a node boundary, so we stopped building the
MC group.  MC gets exported as the 'core_siblings_list' in
/sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with
the same 'physical_package_id' to not be listed together in
'core_siblings_list'.  This violates a statement from
Documentation/ABI/testing/sysfs-devices-system-cpu:

	core_siblings: internal kernel map of cpu#'s hardware threads
	within the same physical_package_id.

	core_siblings_list: human-readable list of the logical CPU
	numbers within the same physical_package_id as cpu#.

The sysfs effects here cause an issue with the hwloc tool where
it gets confused and thinks there are more sockets than are
physically present.

Before this patch, there are two packages:

# cd /sys/devices/system/cpu/
# cat cpu*/topology/physical_package_id | sort | uniq -c
     18 0
     18 1

But 4 _sets_ of core siblings:

# cat cpu*/topology/core_siblings_list | sort | uniq -c
      9 0-8
      9 18-26
      9 27-35
      9 9-17

After this patch set, there are only 2 sets of core siblings, which
is what we expect for a 2-socket system.

# cat cpu*/topology/physical_package_id | sort | uniq -c
     18 0
     18 1
# cat cpu*/topology/core_siblings_list | sort | uniq -c
     18 0-17
     18 18-35

Example spew:
...
	NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
	 #2  #3  #4  #5  #6  #7  #8
	.... node  #1, CPUs:    #9
	------------[ cut here ]------------
	WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90()
	sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
	Modules linked in:
	CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty #631
	Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014
	0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48
	ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009
	ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98
	Call Trace:
	[<ffffffff8172e485>] dump_stack+0x45/0x56
	[<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0
	[<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50
	[<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90
	[<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0
	[<ffffffff8107568d>] start_secondary+0x1ad/0x240
	---[ end trace 3fe5f587a9fcde61 ]---
	#10 #11 #12 #13 #14 #15 #16 #17
	.... node  #2, CPUs:   #18 #19 #20 #21 #22 #23 #24 #25 #26
	.... node  #3, CPUs:   #27 #28 #29 #30 #31 #32 #33 #34 #35

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
[ Added LLC domain and s/match_mc/match_die/ ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brice.goglin@gmail.com
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cebf15e
@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 5, 2014
@pranith pranith powerpc: Wire up sys_bpf() syscall
This patch wires up the new syscall sys_bpf() on powerpc.

Passes the tests in samples/bpf:

    #0 add+sub+mul OK
    #1 unreachable OK
    #2 unreachable2 OK
    #3 out of range jump OK
    #4 out of range jump2 OK
    #5 test1 ld_imm64 OK
    #6 test2 ld_imm64 OK
    #7 test3 ld_imm64 OK
    #8 test4 ld_imm64 OK
    #9 test5 ld_imm64 OK
    #10 no bpf_exit OK
    #11 loop (back-edge) OK
    #12 loop2 (back-edge) OK
    #13 conditional loop OK
    #14 read uninitialized register OK
    #15 read invalid register OK
    #16 program doesn't init R0 before exit OK
    #17 stack out of bounds OK
    #18 invalid call insn1 OK
    #19 invalid call insn2 OK
    #20 invalid function call OK
    #21 uninitialized stack1 OK
    #22 uninitialized stack2 OK
    #23 check valid spill/fill OK
    #24 check corrupted spill/fill OK
    #25 invalid src register in STX OK
    #26 invalid dst register in STX OK
    #27 invalid dst register in ST OK
    #28 invalid src register in LDX OK
    #29 invalid dst register in LDX OK
    #30 junk insn OK
    #31 junk insn2 OK
    #32 junk insn3 OK
    #33 junk insn4 OK
    #34 junk insn5 OK
    #35 misaligned read from stack OK
    #36 invalid map_fd for function call OK
    #37 don't check return value before access OK
    #38 access memory with incorrect alignment OK
    #39 sometimes access memory with incorrect alignment OK
    #40 jump test 1 OK
    #41 jump test 2 OK
    #42 jump test 3 OK
    #43 jump test 4 OK
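Once the syscall is wired up, user space reaches it through the generic syscall(2) entry point. The following is a minimal sketch. It assumes the C library's <sys/syscall.h> defines __NR_bpf for the target architecture. The wrapper name is hypothetical, since libbpf provides its own helpers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical thin wrapper around the bpf(2) system call.
 * cmd is a BPF command (e.g. BPF_MAP_CREATE from <linux/bpf.h>) and
 * attr points to a union bpf_attr of 'size' bytes.
 * Returns the raw syscall result, or -1 with errno = ENOSYS when the
 * headers for this architecture do not define __NR_bpf.
 */
long sys_bpf_sketch(int cmd, void *attr, unsigned int size)
{
#ifdef __NR_bpf
    return syscall(__NR_bpf, cmd, attr, size);
#else
    (void)cmd;
    (void)attr;
    (void)size;
    errno = ENOSYS;
    return -1;
#endif
}
```

An invalid command is expected to fail with -1 (EINVAL, or EPERM/ENOSYS when the call is restricted or unavailable), which makes for a cheap smoke test that the entry point exists.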

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[mpe: test using samples/bpf]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
fcbb539
@popcornmix popcornmix pushed a commit that referenced this issue Nov 22, 2014
@davem330 davem330 sparc64: Fix crashes in schizo_pcierr_intr_other().
[ Upstream commit 7da89a2 ]

Meelis Roos reports crashes during bootup on a V480 that look like
this:

====================
[   61.300577] PCI: Scanning PBM /pci@9,600000
[   61.304867] schizo f009b070: PCI host bridge to bus 0003:00
[   61.310385] pci_bus 0003:00: root bus resource [io  0x7ffe9000000-0x7ffe9ffffff] (bus address [0x0000-0xffffff])
[   61.320515] pci_bus 0003:00: root bus resource [mem 0x7fb00000000-0x7fbffffffff] (bus address [0x00000000-0xffffffff])
[   61.331173] pci_bus 0003:00: root bus resource [bus 00]
[   61.385344] Unable to handle kernel NULL pointer dereference
[   61.390970] tsk->{mm,active_mm}->context = 0000000000000000
[   61.396515] tsk->{mm,active_mm}->pgd = fff000b000002000
[   61.401716]               \|/ ____ \|/
[   61.401716]               "@'/ .. \`@"
[   61.401716]               /_| \__/ |_\
[   61.401716]                  \__U_/
[   61.416362] swapper/0(0): Oops [#1]
[   61.419837] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc1-00422-g2cc9188-dirty #24
[   61.427975] task: fff000b0fd8e9c40 ti: fff000b0fd928000 task.ti: fff000b0fd928000
[   61.435426] TSTATE: 0000004480e01602 TPC: 00000000004455e4 TNPC: 00000000004455e8 Y: 00000000    Not tainted
[   61.445230] TPC: <schizo_pcierr_intr+0x104/0x560>
[   61.449897] g0: 0000000000000000 g1: 0000000000000000 g2: 0000000000a10f78 g3: 000000000000000a
[   61.458563] g4: fff000b0fd8e9c40 g5: fff000b0fdd82000 g6: fff000b0fd928000 g7: 000000000000000a
[   61.467229] o0: 000000000000003d o1: 0000000000000000 o2: 0000000000000006 o3: fff000b0ffa5fc7e
[   61.475894] o4: 0000000000060000 o5: c000000000000000 sp: fff000b0ffa5f3c1 ret_pc: 00000000004455cc
[   61.484909] RPC: <schizo_pcierr_intr+0xec/0x560>
[   61.489500] l0: fff000b0fd8e9c40 l1: 0000000000a20800 l2: 0000000000000000 l3: 000000000119a430
[   61.498164] l4: 0000000001742400 l5: 00000000011cfbe0 l6: 00000000011319c0 l7: fff000b0fd8ea348
[   61.506830] i0: 0000000000000000 i1: fff000b0fdb34000 i2: 0000000320000000 i3: 0000000000000000
[   61.515497] i4: 00060002010b003f i5: 0000040004e02000 i6: fff000b0ffa5f481 i7: 00000000004a9920
[   61.524175] I7: <handle_irq_event_percpu+0x40/0x140>
[   61.529099] Call Trace:
[   61.531531]  [00000000004a9920] handle_irq_event_percpu+0x40/0x140
[   61.537681]  [00000000004a9a58] handle_irq_event+0x38/0x80
[   61.543145]  [00000000004ac77c] handle_fasteoi_irq+0xbc/0x200
[   61.548860]  [00000000004a9084] generic_handle_irq+0x24/0x40
[   61.554500]  [000000000042be0c] handler_irq+0xac/0x100
====================

The problem is that pbm->pci_bus->self is NULL.

This code is trying to go through the standard PCI config space
interfaces to read the PCI controller's PCI_STATUS register.

This doesn't work because, more often than not, we do not enumerate
the PCI controller as a bona fide PCI device during the OF device
node scan.  Therefore bus->self remains NULL.

Existing common code for PSYCHO and PSYCHO-like PCI controllers
handles this properly, by doing the config space access directly.

Do the same here, using pbm->pci_ops->{read,write}().
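The following user-space mock shows the pattern described above. It reads the controller's PCI_STATUS through the PBM's own config-space operations and clears any error bits by writing ones back, without ever dereferencing pbm->pci_bus->self. The structures are simplified, hypothetical stand-ins. Only the read/write callback signature follows the kernel's struct pci_ops. The register offset and status bits are the standard PCI values.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

#define PCI_STATUS                  0x06   /* 16-bit status register */
#define PCI_STATUS_PARITY           0x0100
#define PCI_STATUS_SIG_TARGET_ABORT 0x0800
#define PCI_STATUS_REC_TARGET_ABORT 0x1000
#define PCI_STATUS_REC_MASTER_ABORT 0x2000
#define PCI_STATUS_SIG_SYSTEM_ERROR 0x4000
#define PCI_STATUS_DETECTED_PARITY  0x8000
#define PCI_STATUS_ERR_BITS (PCI_STATUS_PARITY | PCI_STATUS_SIG_TARGET_ABORT | \
                             PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
                             PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)

struct pci_dev;
/* Mock bus: 'self' may be NULL; 'cfg' is the host bridge's config space. */
struct pci_bus { struct pci_dev *self; uint16_t *cfg; };

struct pci_ops {
    int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
    int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};

struct pbm_sketch { struct pci_bus *pci_bus; struct pci_ops *pci_ops; };

/* Mock 16-bit config read of the controller itself. */
static int mock_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val)
{
    (void)devfn;
    if (size != 2)
        return -1;
    *val = bus->cfg[where / 2];
    return 0;
}

/* Mock 16-bit config write; status error bits are write-1-to-clear. */
static int mock_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val)
{
    (void)devfn;
    if (size != 2)
        return -1;
    if (where == PCI_STATUS)
        bus->cfg[where / 2] &= (uint16_t)~(val & PCI_STATUS_ERR_BITS);
    else
        bus->cfg[where / 2] = (uint16_t)val;
    return 0;
}

struct pci_ops mock_ops = { mock_read, mock_write };

/* Check and clear PCI_STATUS errors via pbm->pci_ops; returns the error bits seen. */
u32 pbm_check_status_sketch(struct pbm_sketch *pbm)
{
    u32 stat = 0;

    pbm->pci_ops->read(pbm->pci_bus, 0, PCI_STATUS, 2, &stat);
    stat &= PCI_STATUS_ERR_BITS;
    if (stat)
        pbm->pci_ops->write(pbm->pci_bus, 0, PCI_STATUS, 2, 0xffff);
    return stat;
}
```

Because the access goes through the bus and ops pointers only, a NULL bus->self is harmless here. That is the property the fix relies on.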

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
76bbecb
@popcornmix popcornmix pushed a commit that referenced this issue Dec 7, 2014
@davem330 davem330 sparc64: Fix crashes in schizo_pcierr_intr_other().
[ Upstream commit 7da89a2 ]

Meelis Roos reports crashes during bootup on a V480 that look like
this:

====================
[   61.300577] PCI: Scanning PBM /pci@9,600000
[   61.304867] schizo f009b070: PCI host bridge to bus 0003:00
[   61.310385] pci_bus 0003:00: root bus resource [io  0x7ffe9000000-0x7ffe9ffffff] (bus address [0x0000-0xffffff])
[   61.320515] pci_bus 0003:00: root bus resource [mem 0x7fb00000000-0x7fbffffffff] (bus address [0x00000000-0xffffffff])
[   61.331173] pci_bus 0003:00: root bus resource [bus 00]
[   61.385344] Unable to handle kernel NULL pointer dereference
[   61.390970] tsk->{mm,active_mm}->context = 0000000000000000
[   61.396515] tsk->{mm,active_mm}->pgd = fff000b000002000
[   61.401716]               \|/ ____ \|/
[   61.401716]               "@'/ .. \`@"
[   61.401716]               /_| \__/ |_\
[   61.401716]                  \__U_/
[   61.416362] swapper/0(0): Oops [#1]
[   61.419837] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc1-00422-g2cc9188-dirty #24
[   61.427975] task: fff000b0fd8e9c40 ti: fff000b0fd928000 task.ti: fff000b0fd928000
[   61.435426] TSTATE: 0000004480e01602 TPC: 00000000004455e4 TNPC: 00000000004455e8 Y: 00000000    Not tainted
[   61.445230] TPC: <schizo_pcierr_intr+0x104/0x560>
[   61.449897] g0: 0000000000000000 g1: 0000000000000000 g2: 0000000000a10f78 g3: 000000000000000a
[   61.458563] g4: fff000b0fd8e9c40 g5: fff000b0fdd82000 g6: fff000b0fd928000 g7: 000000000000000a
[   61.467229] o0: 000000000000003d o1: 0000000000000000 o2: 0000000000000006 o3: fff000b0ffa5fc7e
[   61.475894] o4: 0000000000060000 o5: c000000000000000 sp: fff000b0ffa5f3c1 ret_pc: 00000000004455cc
[   61.484909] RPC: <schizo_pcierr_intr+0xec/0x560>
[   61.489500] l0: fff000b0fd8e9c40 l1: 0000000000a20800 l2: 0000000000000000 l3: 000000000119a430
[   61.498164] l4: 0000000001742400 l5: 00000000011cfbe0 l6: 00000000011319c0 l7: fff000b0fd8ea348
[   61.506830] i0: 0000000000000000 i1: fff000b0fdb34000 i2: 0000000320000000 i3: 0000000000000000
[   61.515497] i4: 00060002010b003f i5: 0000040004e02000 i6: fff000b0ffa5f481 i7: 00000000004a9920
[   61.524175] I7: <handle_irq_event_percpu+0x40/0x140>
[   61.529099] Call Trace:
[   61.531531]  [00000000004a9920] handle_irq_event_percpu+0x40/0x140
[   61.537681]  [00000000004a9a58] handle_irq_event+0x38/0x80
[   61.543145]  [00000000004ac77c] handle_fasteoi_irq+0xbc/0x200
[   61.548860]  [00000000004a9084] generic_handle_irq+0x24/0x40
[   61.554500]  [000000000042be0c] handler_irq+0xac/0x100
====================

The problem is that pbm->pci_bus->self is NULL.

This code is trying to go through the standard PCI config space
interfaces to read the PCI controller's PCI_STATUS register.

This doesn't work because, more often than not, we do not enumerate
the PCI controller as a bona fide PCI device during the OF device
node scan.  Therefore bus->self remains NULL.

Existing common code for PSYCHO and PSYCHO-like PCI controllers
handles this properly, by doing the config space access directly.

Do the same here, using pbm->pci_ops->{read,write}().

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
80059d6
@popcornmix popcornmix pushed a commit that referenced this issue Dec 20, 2014
@davem330 davem330 sparc64: Fix crashes in schizo_pcierr_intr_other().
[ Upstream commit 7da89a2 ]

Meelis Roos reports crashes during bootup on a V480 that look like
this:

====================
[   61.300577] PCI: Scanning PBM /pci@9,600000
[   61.304867] schizo f009b070: PCI host bridge to bus 0003:00
[   61.310385] pci_bus 0003:00: root bus resource [io  0x7ffe9000000-0x7ffe9ffffff] (bus address [0x0000-0xffffff])
[   61.320515] pci_bus 0003:00: root bus resource [mem 0x7fb00000000-0x7fbffffffff] (bus address [0x00000000-0xffffffff])
[   61.331173] pci_bus 0003:00: root bus resource [bus 00]
[   61.385344] Unable to handle kernel NULL pointer dereference
[   61.390970] tsk->{mm,active_mm}->context = 0000000000000000
[   61.396515] tsk->{mm,active_mm}->pgd = fff000b000002000
[   61.401716]               \|/ ____ \|/
[   61.401716]               "@'/ .. \`@"
[   61.401716]               /_| \__/ |_\
[   61.401716]                  \__U_/
[   61.416362] swapper/0(0): Oops [#1]
[   61.419837] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc1-00422-g2cc9188-dirty #24
[   61.427975] task: fff000b0fd8e9c40 ti: fff000b0fd928000 task.ti: fff000b0fd928000
[   61.435426] TSTATE: 0000004480e01602 TPC: 00000000004455e4 TNPC: 00000000004455e8 Y: 00000000    Not tainted
[   61.445230] TPC: <schizo_pcierr_intr+0x104/0x560>
[   61.449897] g0: 0000000000000000 g1: 0000000000000000 g2: 0000000000a10f78 g3: 000000000000000a
[   61.458563] g4: fff000b0fd8e9c40 g5: fff000b0fdd82000 g6: fff000b0fd928000 g7: 000000000000000a
[   61.467229] o0: 000000000000003d o1: 0000000000000000 o2: 0000000000000006 o3: fff000b0ffa5fc7e
[   61.475894] o4: 0000000000060000 o5: c000000000000000 sp: fff000b0ffa5f3c1 ret_pc: 00000000004455cc
[   61.484909] RPC: <schizo_pcierr_intr+0xec/0x560>
[   61.489500] l0: fff000b0fd8e9c40 l1: 0000000000a20800 l2: 0000000000000000 l3: 000000000119a430
[   61.498164] l4: 0000000001742400 l5: 00000000011cfbe0 l6: 00000000011319c0 l7: fff000b0fd8ea348
[   61.506830] i0: 0000000000000000 i1: fff000b0fdb34000 i2: 0000000320000000 i3: 0000000000000000
[   61.515497] i4: 00060002010b003f i5: 0000040004e02000 i6: fff000b0ffa5f481 i7: 00000000004a9920
[   61.524175] I7: <handle_irq_event_percpu+0x40/0x140>
[   61.529099] Call Trace:
[   61.531531]  [00000000004a9920] handle_irq_event_percpu+0x40/0x140
[   61.537681]  [00000000004a9a58] handle_irq_event+0x38/0x80
[   61.543145]  [00000000004ac77c] handle_fasteoi_irq+0xbc/0x200
[   61.548860]  [00000000004a9084] generic_handle_irq+0x24/0x40
[   61.554500]  [000000000042be0c] handler_irq+0xac/0x100
====================

The problem is that pbm->pci_bus->self is NULL.

This code is trying to go through the standard PCI config space
interfaces to read the PCI controller's PCI_STATUS register.

This doesn't work because, more often than not, we do not enumerate
the PCI controller as a bona fide PCI device during the OF device
node scan.  Therefore bus->self remains NULL.

Existing common code for PSYCHO and PSYCHO-like PCI controllers
handles this properly, by doing the config space access directly.

Do the same here, using pbm->pci_ops->{read,write}().

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
b51182e
@julianscheel julianscheel pushed a commit to julianscheel/linux that referenced this issue Mar 10, 2015
@davem330 davem330 sparc64: Fix crashes in schizo_pcierr_intr_other().
Meelis Roos reports crashes during bootup on a V480 that look like
this:

====================
[   61.300577] PCI: Scanning PBM /pci@9,600000
[   61.304867] schizo f009b070: PCI host bridge to bus 0003:00
[   61.310385] pci_bus 0003:00: root bus resource [io  0x7ffe9000000-0x7ffe9ffffff] (bus address [0x0000-0xffffff])
[   61.320515] pci_bus 0003:00: root bus resource [mem 0x7fb00000000-0x7fbffffffff] (bus address [0x00000000-0xffffffff])
[   61.331173] pci_bus 0003:00: root bus resource [bus 00]
[   61.385344] Unable to handle kernel NULL pointer dereference
[   61.390970] tsk->{mm,active_mm}->context = 0000000000000000
[   61.396515] tsk->{mm,active_mm}->pgd = fff000b000002000
[   61.401716]               \|/ ____ \|/
[   61.401716]               "@'/ .. \`@"
[   61.401716]               /_| \__/ |_\
[   61.401716]                  \__U_/
[   61.416362] swapper/0(0): Oops [#1]
[   61.419837] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc1-00422-g2cc9188-dirty #24
[   61.427975] task: fff000b0fd8e9c40 ti: fff000b0fd928000 task.ti: fff000b0fd928000
[   61.435426] TSTATE: 0000004480e01602 TPC: 00000000004455e4 TNPC: 00000000004455e8 Y: 00000000    Not tainted
[   61.445230] TPC: <schizo_pcierr_intr+0x104/0x560>
[   61.449897] g0: 0000000000000000 g1: 0000000000000000 g2: 0000000000a10f78 g3: 000000000000000a
[   61.458563] g4: fff000b0fd8e9c40 g5: fff000b0fdd82000 g6: fff000b0fd928000 g7: 000000000000000a
[   61.467229] o0: 000000000000003d o1: 0000000000000000 o2: 0000000000000006 o3: fff000b0ffa5fc7e
[   61.475894] o4: 0000000000060000 o5: c000000000000000 sp: fff000b0ffa5f3c1 ret_pc: 00000000004455cc
[   61.484909] RPC: <schizo_pcierr_intr+0xec/0x560>
[   61.489500] l0: fff000b0fd8e9c40 l1: 0000000000a20800 l2: 0000000000000000 l3: 000000000119a430
[   61.498164] l4: 0000000001742400 l5: 00000000011cfbe0 l6: 00000000011319c0 l7: fff000b0fd8ea348
[   61.506830] i0: 0000000000000000 i1: fff000b0fdb34000 i2: 0000000320000000 i3: 0000000000000000
[   61.515497] i4: 00060002010b003f i5: 0000040004e02000 i6: fff000b0ffa5f481 i7: 00000000004a9920
[   61.524175] I7: <handle_irq_event_percpu+0x40/0x140>
[   61.529099] Call Trace:
[   61.531531]  [00000000004a9920] handle_irq_event_percpu+0x40/0x140
[   61.537681]  [00000000004a9a58] handle_irq_event+0x38/0x80
[   61.543145]  [00000000004ac77c] handle_fasteoi_irq+0xbc/0x200
[   61.548860]  [00000000004a9084] generic_handle_irq+0x24/0x40
[   61.554500]  [000000000042be0c] handler_irq+0xac/0x100
====================

The problem is that pbm->pci_bus->self is NULL.

This code is trying to go through the standard PCI config space
interfaces to read the PCI controller's PCI_STATUS register.

This doesn't work because, more often than not, we do not enumerate
the PCI controller as a bona fide PCI device during the OF device
node scan.  Therefore bus->self remains NULL.

Existing common code for PSYCHO and PSYCHO-like PCI controllers
handles this properly, by doing the config space access directly.

Do the same here, using pbm->pci_ops->{read,write}().

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
7da89a2
@popcornmix popcornmix pushed a commit that referenced this issue Jul 27, 2015
@grygoriyS grygoriyS pinctrl: single: ensure pcs irq will not be forced threaded
The PCS IRQ is requested using the request_irq() API and, as a result, it can
be forced to be a threaded IRQ in an RT kernel if PCS_QUIRK_HAS_SHARED_IRQ
is enabled for the pinctrl domain.

As a result, the following 'possible irq lock inversion dependency' report
can be seen:
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.14.43-rt42-00360-g96ff499-dirty #24 Not tainted
---------------------------------------------------------
irq/369-pinctrl/927 just changed the state of lock:
 (&pcs->lock){+.....}, at: [<c0375b54>] pcs_irq_handle+0x48/0x9c
but this lock was taken by another, HARDIRQ-safe lock in the past:
 (&irq_desc_lock_class){-.....}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&pcs->lock);
                               local_irq_disable();
                               lock(&irq_desc_lock_class);
                               lock(&pcs->lock);
  <Interrupt>
    lock(&irq_desc_lock_class);

 *** DEADLOCK ***

no locks held by irq/369-pinctrl/927.

the shortest dependencies between 2nd lock and 1st lock:
  -> (&irq_desc_lock_class){-.....} ops: 58724 {
     IN-HARDIRQ-W at:
                       [<c0090040>] lock_acquire+0x9c/0x158
                       [<c07065c8>] _raw_spin_lock+0x48/0x58
                       [<c009edac>] handle_fasteoi_irq+0x24/0x15c
                       [<c009abb0>] generic_handle_irq+0x3c/0x4c
                       [<c000f83c>] handle_IRQ+0x50/0xa0
                       [<c0008674>] gic_handle_irq+0x3c/0x6c
                       [<c0707a04>] __irq_svc+0x44/0x8c
                       [<c000fc44>] arch_cpu_idle+0x40/0x4c
                       [<c009aadc>] cpu_startup_entry+0x270/0x2e0
                       [<c06fcbf8>] rest_init+0xd4/0xe4
                       [<c0a44bfc>] start_kernel+0x3d0/0x3dc
                       [<80008084>] 0x80008084
     INITIAL USE at:
                      [<c0090040>] lock_acquire+0x9c/0x158
                      [<c070674c>] _raw_spin_lock_irqsave+0x54/0x68
                      [<c009aff8>] __irq_get_desc_lock+0x64/0xa4
                      [<c009e38c>] irq_set_chip+0x30/0x78
                      [<c009ec30>] irq_set_chip_and_handler_name+0x24/0x3c
                      [<c036ca10>] gic_irq_domain_map+0x48/0xb4
                      [<c00a0a80>] irq_domain_associate+0x84/0x1d4
                      [<c00a1154>] irq_create_mapping+0x80/0x11c
                      [<c00a1270>] irq_create_of_mapping+0x80/0x120
                      [<c05cdaa8>] irq_of_parse_and_map+0x34/0x3c
                      [<c0a4ea24>] omap_dm_timer_init_one+0x90/0x30c
                      [<c0a4eef0>] omap5_realtime_timer_init+0x8c/0x48c
                      [<c0a486b0>] time_init+0x28/0x38
                      [<c0a44a6c>] start_kernel+0x240/0x3dc
                      [<80008084>] 0x80008084
   }
   ... key      at: [<c1049ce0>] irq_desc_lock_class+0x0/0x8
   ... acquired at:
   [<c07065c8>] _raw_spin_lock+0x48/0x58
   [<c0375a90>] pcs_irq_unmask+0x58/0xa0
   [<c009ea48>] irq_enable+0x38/0x48
   [<c009ead0>] irq_startup+0x78/0x7c
   [<c009d440>] __setup_irq+0x4a8/0x4f4
   [<c009d5dc>] request_threaded_irq+0xb8/0x138
   [<c0415a5c>] omap_8250_startup+0x4c/0x148
   [<c041276c>] serial8250_startup+0x24/0x30
   [<c040d0ec>] uart_startup.part.9+0x5c/0x1b4
   [<c040dbcc>] uart_open+0xf4/0x16c
   [<c03f0540>] tty_open+0x170/0x61c
   [<c0157028>] chrdev_open+0xbc/0x1b4
   [<c0150494>] do_dentry_open+0x1e8/0x2bc
   [<c0150a84>] finish_open+0x44/0x5c
   [<c0160d50>] do_last.isra.47+0x710/0xca0
   [<c01613a4>] path_openat+0xc4/0x640
   [<c0162904>] do_filp_open+0x3c/0x98
   [<c0151bdc>] do_sys_open+0x114/0x1d8
   [<c0151cc8>] SyS_open+0x28/0x2c
   [<c0a44d70>] kernel_init_freeable+0x168/0x1e4
   [<c06fcc24>] kernel_init+0x1c/0xf8
   [<c000eee8>] ret_from_fork+0x14/0x20

-> (&pcs->lock){+.....} ops: 65 {
   HARDIRQ-ON-W at:
                    [<c0090040>] lock_acquire+0x9c/0x158
                    [<c07065c8>] _raw_spin_lock+0x48/0x58
                    [<c0375b54>] pcs_irq_handle+0x48/0x9c
                    [<c0375c5c>] pcs_irq_handler+0x1c/0x28
                    [<c009c458>] irq_forced_thread_fn+0x30/0x74
                    [<c009c784>] irq_thread+0x158/0x1c4
                    [<c0063fc4>] kthread+0xd4/0xe8
                    [<c000eee8>] ret_from_fork+0x14/0x20
   INITIAL USE at:
                   [<c0090040>] lock_acquire+0x9c/0x158
                   [<c070674c>] _raw_spin_lock_irqsave+0x54/0x68
                   [<c0375344>] pcs_enable+0x7c/0xe8
                   [<c0372a44>] pinmux_enable_setting+0x178/0x220
                   [<c036fecc>] pinctrl_select_state+0x110/0x194
                   [<c04732dc>] pinctrl_bind_pins+0x7c/0x108
                   [<c045853c>] driver_probe_device+0x70/0x254
                   [<c0458810>] __driver_attach+0x9c/0xa0
                   [<c045674c>] bus_for_each_dev+0x78/0xac
                   [<c0458030>] driver_attach+0x2c/0x30
                   [<c0457c78>] bus_add_driver+0x15c/0x204
                   [<c0458ee0>] driver_register+0x88/0x108
                   [<c045a168>] __platform_driver_register+0x64/0x6c
                   [<c0a8170c>] omap_hsmmc_driver_init+0x1c/0x20
                   [<c0008a94>] do_one_initcall+0x110/0x170
                   [<c0a44d48>] kernel_init_freeable+0x140/0x1e4
                   [<c06fcc24>] kernel_init+0x1c/0xf8
                   [<c000eee8>] ret_from_fork+0x14/0x20
 }
 ... key      at: [<c1088a8c>] __key.18572+0x0/0x8
 ... acquired at:
   [<c008cdd4>] mark_lock+0x388/0x76c
   [<c008df40>] __lock_acquire+0x6d0/0x1f98
   [<c0090040>] lock_acquire+0x9c/0x158
   [<c07065c8>] _raw_spin_lock+0x48/0x58
   [<c0375b54>] pcs_irq_handle+0x48/0x9c
   [<c0375c5c>] pcs_irq_handler+0x1c/0x28
   [<c009c458>] irq_forced_thread_fn+0x30/0x74
   [<c009c784>] irq_thread+0x158/0x1c4
   [<c0063fc4>] kthread+0xd4/0xe8
   [<c000eee8>] ret_from_fork+0x14/0x20

stack backtrace:
CPU: 1 PID: 927 Comm: irq/369-pinctrl Not tainted 3.14.43-rt42-00360-g96ff499-dirty #24
[<c00177e0>] (unwind_backtrace) from [<c00130b0>] (show_stack+0x20/0x24)
[<c00130b0>] (show_stack) from [<c0702958>] (dump_stack+0x84/0xd0)
[<c0702958>] (dump_stack) from [<c008bcfc>] (print_irq_inversion_bug+0x1d0/0x21c)
[<c008bcfc>] (print_irq_inversion_bug) from [<c008bf18>] (check_usage_backwards+0xb4/0x11c)
[<c008bf18>] (check_usage_backwards) from [<c008cdd4>] (mark_lock+0x388/0x76c)
[<c008cdd4>] (mark_lock) from [<c008df40>] (__lock_acquire+0x6d0/0x1f98)
[<c008df40>] (__lock_acquire) from [<c0090040>] (lock_acquire+0x9c/0x158)
[<c0090040>] (lock_acquire) from [<c07065c8>] (_raw_spin_lock+0x48/0x58)
[<c07065c8>] (_raw_spin_lock) from [<c0375b54>] (pcs_irq_handle+0x48/0x9c)
[<c0375b54>] (pcs_irq_handle) from [<c0375c5c>] (pcs_irq_handler+0x1c/0x28)
[<c0375c5c>] (pcs_irq_handler) from [<c009c458>] (irq_forced_thread_fn+0x30/0x74)
[<c009c458>] (irq_forced_thread_fn) from [<c009c784>] (irq_thread+0x158/0x1c4)
[<c009c784>] (irq_thread) from [<c0063fc4>] (kthread+0xd4/0xe8)
[<c0063fc4>] (kthread) from [<c000eee8>] (ret_from_fork+0x14/0x20)

To fix it, use IRQF_NO_THREAD to ensure that the pcs irq will not be forced threaded.
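The forced-threading decision that IRQF_NO_THREAD short-circuits can be modelled in user space as follows. This is a simplified sketch. It assumes the behaviour of irq_setup_forced_threading() in kernels of this era, where actions flagged IRQF_NO_THREAD, IRQF_PERCPU or IRQF_ONESHOT are left alone, and it assumes the action was registered with request_irq(), so there is no separate thread_fn. The flag values mirror include/linux/interrupt.h at the time and should be checked against the tree in use. The function name is hypothetical.

```c
#include <stdbool.h>

/* Flag values as in include/linux/interrupt.h of this era (verify in your tree). */
#define IRQF_SHARED     0x00000080
#define IRQF_PERCPU     0x00000400
#define IRQF_ONESHOT    0x00002000
#define IRQF_NO_THREAD  0x00010000

/*
 * Returns true if, with forced irq threading enabled (threadirqs or RT),
 * a request_irq() handler registered with these flags would be moved out
 * of hard-irq context into an irq thread.
 */
bool would_force_thread(bool force_irqthreads, unsigned long flags)
{
    if (!force_irqthreads)
        return false;
    /* Opted-out actions keep running in hard-irq context. */
    if (flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))
        return false;
    return true;
}
```

In this model, a shared PCS handler is threaded unless IRQF_NO_THREAD is added. Keeping it in hard-irq context means pcs->lock is no longer taken with interrupts enabled from a thread, which is the inversion lockdep reports above.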

Cc: Tony Lindgren <tony@atomide.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
c10372e
@davet321 davet321 pushed a commit to davet321/rpi-linux that referenced this issue Aug 17, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with the reasoning that we do not get into the fs
code.  But this is apparently not sufficient because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh, but that doesn't necessarily
submit the bio.  Instead, it tries to map more pages into the bio, and
mpage_map_one_extent might trigger a memcg charge, which might end up
waiting on a page that is marked PG_writeback but hasn't been submitted
yet, so we would end up waiting for something that never finishes.

Fix this issue by replacing the __GFP_IO check with a may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which has been
the only path that triggers the memcg OOM killer since 3.12, shouldn't
require GFP_NOFS, so we shouldn't reintroduce the premature OOM
killer issue that was originally addressed by the heuristic.
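The handling of a page under writeback in shrink_page_list() after this change can be modelled roughly as follows. This is a simplified, hypothetical user-space model: the enum, struct and function names are illustrative, and the case numbers follow the "case 2" wording above. may_enter_fs is treated as a plain input. In the kernel, it is true when the allocation allows __GFP_FS, or when the page is in the swap cache and __GFP_IO is allowed.

```c
#include <stdbool.h>

enum wb_action {
    WB_KEEP_IMMEDIATE,   /* case 1: kswapd, PageReclaim page, zone flagged for writeback: skip */
    WB_MARK_AND_KEEP,    /* case 2: set PageReclaim, count as writeback, skip                  */
    WB_WAIT,             /* case 3: block until writeback completes                            */
};

struct reclaim_ctx {
    bool is_kswapd;        /* current_is_kswapd()                     */
    bool global_reclaim;   /* reclaim not restricted to a memcg       */
    bool may_enter_fs;     /* false for GFP_NOFS allocations          */
    bool zone_writeback;   /* zone flagged as having heavy writeback  */
};

/* Decide what reclaim does with a page that has PG_writeback set. */
enum wb_action writeback_action(const struct reclaim_ctx *rc, bool page_reclaim)
{
    if (rc->is_kswapd && page_reclaim && rc->zone_writeback)
        return WB_KEEP_IMMEDIATE;
    /*
     * The fix: callers that may not enter the fs never wait here, because
     * the fs may hold PG_writeback pages that it has not submitted yet.
     */
    if (rc->global_reclaim || !page_reclaim || !rc->may_enter_fs)
        return WB_MARK_AND_KEEP;
    return WB_WAIT;
}
```

Under this model, the rsync trace above (memcg reclaim from a GFP_NOFS context) takes the case 2 path and skips the page instead of sleeping in wait_on_page_bit.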

According to David Chinner, XFS has been doing a similar thing since 2.6.15,
so ext4 is not the only affected filesystem.  Moreover, he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7f488aa
@popcornmix popcornmix pushed a commit that referenced this issue Aug 20, 2015
Michal Hocko mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with the reasoning that we do not get into the fs
code.  But this is apparently not sufficient because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh, but that doesn't necessarily
submit the bio.  Instead, it tries to map more pages into the bio, and
mpage_map_one_extent might trigger a memcg charge, which might end up
waiting on a page that is marked PG_writeback but hasn't been submitted
yet, so we would end up waiting for something that never finishes.

Fix this issue by replacing the __GFP_IO check with a may_enter_fs check
(for case 2) before we go on to wait on the writeback.  The page fault
path, which has been the only path that triggers the memcg OOM killer
since 3.12, shouldn't require GFP_NOFS, so we shouldn't reintroduce the
premature OOM killer issue that the heuristic originally addressed.

According to David Chinner, XFS has been doing a similar thing since
2.6.15 already, so ext4 is not the only affected filesystem.  Moreover,
he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ecf5fc6
@popcornmix popcornmix pushed a commit that referenced this issue Dec 8, 2015
Nikolay Aleksandrov vrf: fix double free and memory corruption on register_netdevice failure
When vrf's ->newlink is called and register_netdevice() fails, it
calls free_netdev(), but rtnl_newlink() does that as well, so a second
free happens and memory gets corrupted.  To reproduce, execute the
following line a couple of times (1 - 5 is usually enough):
$ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done;
This works because register_netdevice() fails on the invalid name
"vrf:".

And here's a trace of one crash:
[   28.792157] ------------[ cut here ]------------
[   28.792407] kernel BUG at fs/namei.c:246!
[   28.792608] invalid opcode: 0000 [#1] SMP
[   28.793240] Modules linked in: vrf nfsd auth_rpcgss oid_registry
nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul
crc32c_intel qxl drm_kms_helper ttm drm aesni_intel aes_x86_64 psmouse
glue_helper lrw evdev gf128mul i2c_piix4 ablk_helper cryptd ppdev
parport_pc parport serio_raw pcspkr virtio_balloon virtio_console
i2c_core acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4
ext4 crc16 mbcache jbd2 virtio_blk virtio_net sg sr_mod cdrom
ata_generic ehci_pci uhci_hcd ehci_hcd e1000 usbcore usb_common ata_piix
libata virtio_pci virtio_ring virtio scsi_mod floppy
[   28.796016] CPU: 0 PID: 1148 Comm: ld-linux-x86-64 Not tainted 4.4.0-rc1+ #24
[   28.796016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[   28.796016] task: ffff8800352561c0 ti: ffff88003592c000 task.ti: ffff88003592c000
[   28.796016] RIP: 0010:[<ffffffff812187b3>]  [<ffffffff812187b3>] putname+0x43/0x60
[   28.796016] RSP: 0018:ffff88003592fe88  EFLAGS: 00010246
[   28.796016] RAX: 0000000000000000 RBX: ffff8800352561c0 RCX: 0000000000000001
[   28.796016] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003784f000
[   28.796016] RBP: ffff88003592ff08 R08: 0000000000000001 R09: 0000000000000000
[   28.796016] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[   28.796016] R13: 000000000000047c R14: ffff88003784f000 R15: ffff8800358c4a00
[   28.796016] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   28.796016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.796016] CR2: 00007ffd583bc2d9 CR3: 0000000035a99000 CR4: 00000000000406f0
[   28.796016] Stack:
[   28.796016]  ffffffff8121045d ffffffff812102d3 ffff8800352561c0 ffff880035a91660
[   28.796016]  ffff8800008a9880 0000000000000000 ffffffff81a49940 00ffffff81218684
[   28.796016]  ffff8800352561c0 000000000000047c 0000000000000000 ffff880035b36d80
[   28.796016] Call Trace:
[   28.796016]  [<ffffffff8121045d>] ? do_execveat_common.isra.34+0x74d/0x930
[   28.796016]  [<ffffffff812102d3>] ? do_execveat_common.isra.34+0x5c3/0x930
[   28.796016]  [<ffffffff8121066c>] do_execve+0x2c/0x30
[   28.796016]  [<ffffffff810939a0>] call_usermodehelper_exec_async+0xf0/0x140
[   28.796016]  [<ffffffff810938b0>] ? umh_complete+0x40/0x40
[   28.796016]  [<ffffffff815cb1af>] ret_from_fork+0x3f/0x70
[   28.796016] Code: 48 8d 47 1c 48 89 e5 53 48 8b 37 48 89 fb 48 39 c6
74 1a 48 8b 3d 7e e9 8f 00 e8 49 fa fc ff 48 89 df e8 f1 01 fd ff 5b 5d
f3 c3 <0f> 0b 48 89 fe 48 8b 3d 61 e9 8f 00 e8 2c fa fc ff 5b 5d eb e9
[   28.796016] RIP  [<ffffffff812187b3>] putname+0x43/0x60
[   28.796016]  RSP <ffff88003592fe88>

Fixes: 193125d ("net: Introduce VRF device driver")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7f109f7
@popcornmix popcornmix pushed a commit that referenced this issue Dec 15, 2015
Nikolay Aleksandrov vrf: fix double free and memory corruption on register_netdevice failure
[ Upstream commit 7f109f7 ]

When vrf's ->newlink is called and register_netdevice() fails, it
calls free_netdev(), but rtnl_newlink() does that as well, so a second
free happens and memory gets corrupted.  To reproduce, execute the
following line a couple of times (1 - 5 is usually enough):
$ for i in `seq 1 5`; do ip link add vrf: type vrf table 1; done;
This works because register_netdevice() fails on the invalid name
"vrf:".

Fixes: 193125d ("net: Introduce VRF device driver")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
b3abad3