USB Audio Dropouts #575

Closed
lnicola opened this issue Apr 29, 2014 · 45 comments

@lnicola

lnicola commented Apr 29, 2014

Since the new FIQ work, (31a8f3f12c, actually), the audio on my USB DAC is full of crackles.

dmesg: http://pastebin.com/DVjm57k0
lsusb -v: http://pastebin.com/VPQa17hE
vmstat 2 a while before and during playback: http://pastebin.com/5SqwfwR9
/proc/asound/card0/stream0: http://pastebin.com/r0hQijyD

The stream I'm playing is 16-bit at 44.1 kHz. I've also unbound the HID endpoint of the DAC because it made things better at some point. This issue still occurs even if I keep it.

@ghollingworth

Have you tried rpi-update since last night? The new fiq_fsm was pushed to the head branch, so this should have been fixed already.

@lnicola
Author

lnicola commented Apr 29, 2014

Yes, it's the version from the master branch.

@kursusHC

I have the same issue, both with the recent Master branch and the recent Next.
I'm running Arch up to date with a Hifimediy Sabre USB DAC ES9023 (CMA options disabled).

When I revert to an old Next commit (April 15th), the problem disappears.

@P33M
Contributor

P33M commented Apr 29, 2014

I have 3 DACs using different manufacturers of ASIC that do not exhibit this problem.

It would be good to get a minimal set of conditions that cause this to happen. Does this happen with a simple "aplay test.wav" - it would be good to use a sinewave test file or similar:

http://www.audiocheck.net/audiofrequencysignalgenerator_sinetone.php

That webpage will let you generate a sinewave test tone up to 10 seconds long.
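If the web generator is not convenient, a test tone can also be produced locally. The following is a minimal sketch using only the Python standard library; the function name `write_sine_wav` and its defaults (16-bit stereo at 44.1 kHz, matching the stream described in this issue) are illustrative assumptions, not part of any tool mentioned in the thread.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz=1000.0, seconds=10.0, rate=44100,
                   channels=2, amplitude=0.5):
    """Write a 16-bit PCM sine tone to `path` and return the frame count."""
    n_frames = int(seconds * rate)
    peak = int(amplitude * 32767)
    frames = bytearray()
    for n in range(n_frames):
        sample = int(round(peak * math.sin(2.0 * math.pi * freq_hz * n / rate)))
        # Same sample on every channel, little-endian signed 16-bit.
        frames += struct.pack("<h", sample) * channels
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # bytes per sample (16-bit)
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return n_frames


if __name__ == "__main__":
    write_sine_wav("test.wav")
```

The resulting file can be played with "aplay test.wav"; passing a low `freq_hz` such as 20 gives a slow waveform in which short glitches are easier to spot on a recording.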

Also, is this DAC widely available for purchase in the EU? It would be easier if I just got my hands on one.

@lnicola
Author

lnicola commented Apr 29, 2014

Yes, it happens to me with a raw file (aplay -f cd foo.raw). Do you think it would help if I tried with a sine wave and perhaps send you a recording?

I do have some specific settings: snd-usb-audio.nrpacks=1 and

pcm.!default {
        type hw
        card 0
}

in asound.conf to prevent the sample rate defaulting to 48000 Hz (I don't remember exactly, but I think it's using plug:hw without it).

My DAC is the one from http://www.amazon.co.uk/Behringer-UCA202-U-Control-low-latency-Interface/dp/B000KW2YEI/ , but note that it's a bit finicky: both Windows and the Raspberry Pi tend to hang when I connect it -- or at least they did when I got it. If I keep it connected during boot it's fine, though.

@P33M
Contributor

P33M commented Apr 29, 2014

snd_usb_audio.nrpacks now has no effect in 3.12: the URB allocation is now completely automatic.

I have that device (or at least something with the same badge) on my desk. It works fine. It may well be that yours has a different codec inside: I will compare the device I have to see if there are any differences.

Please do send a recording of a sinewave test.

@andrea-iob

I have the same problem. My DAC is an HRT Music Streamer II+. I have the first version of the DAC (http://www.headfonia.com/hrt-music-streamer-ii-2496-usb-dac); the one they are selling now should differ only in its case and PCB layout (http://www.hirestech.com/product/?pid=128).

I've the latest revision of the DAC firmware. I've also tried with a firmware version specifically designed for use with NAS and similar lower processing power hosts, but the problem remains.

I hear the noise on all the files I play, also with a sinewave test file.

@lnicola
Author

lnicola commented Apr 29, 2014

Are you sure that change was in 3.12? I thought the argument was actually removed in 3.13: http://www.raspberrypi.org/forums/viewtopic.php?p=541704#p541704 .

It's too bad that the revision in which this issue appeared contained two changes; I suppose one thing to try would be to see which of those is actually causing it.

Regarding the recording, I don't think I have an audio cable to be able to record it properly. Would using a microphone suffice?

@P33M
Contributor

P33M commented Apr 29, 2014

My mistake: yes the patch was in the 3.13 release. Even so, altering nrpacks should have no effect since the glitches appear to happen more frequently than the ~10ms URB interval.

If you can get any sort of recording (of a sine test) that is of reasonable quality then it would help.

@andrea-iob

Looking at the 3.12.18 kernel sources I still see the nrpacks parameter being used (see line 82 of https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/sound/usb/card.c?id=refs/tags/v3.12.18).

However I've tried the "ALSA: improve buffer size computations for USB PCM audio" patch, but I didn't notice any improvements.

@lnicola
Author

lnicola commented Apr 29, 2014

It seems that I can't get a decent recording with my laptop microphone :(.

@andrea-iob

I connected the line out of my hi-fi to the mic in of my notebook. Here are the results: "https://www.dropbox.com/s/mz4wkilqj9xtyur/HRT_Music_streamer_Pi_Noise.wav".

Could it be useful to you?

@lnicola
Author

lnicola commented Apr 29, 2014

Mine sounds about the same. I noticed, however, that the noise is much more audible with a 20 Hz tone. Perhaps andrea-iob could repeat the test using that?

@andrea-iob

@lnicola
Author

lnicola commented Apr 29, 2014

Of course I have a cable, from my DAC to the amplifier. Sorry for the high volume, I can't seem to figure out how to reduce it.

Oh, my: http://i.imgur.com/EGYSFQz.png . Download link: https://www.wetransfer.com/downloads/a1d630b523384d2181aabfdf536afbaa20140429204918/2176f0f73803024943fe56222cd6cd1b20140429204918/e97241

@P33M
Contributor

P33M commented Apr 29, 2014

(ಠ_ಠ)

Well that puts paid to any theories I have about what's causing this.

If it were reads off the end of the bounce buffers, I would expect there to be a constant value transmitted (the buffers are rewritten with 0x6B on transaction completion). This isn't happening - in fact it appears that random corruption is happening at random times.
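The "constant value" check described here could be automated on a bit-exact capture of the transmitted payload (for example from a USB analyser; an analog recording through the DAC would not preserve the byte values). This is only a sketch: the helper name `find_constant_runs` and the minimum run length are hypothetical, and only the 0x6B fill value comes from the comment above.

```python
def find_constant_runs(data: bytes, value: int = 0x6B, min_len: int = 8):
    """Return (offset, length) for each run of `value` at least `min_len` bytes long."""
    runs = []
    start = None
    for i, b in enumerate(data):
        if b == value:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the buffer.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start))
    return runs
```

Long runs of 0x6B would point to reads past the end of a bounce buffer; an empty result alongside audible glitches would be consistent with the random corruption seen in the waveform.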

@lnicola
Author

lnicola commented Apr 29, 2014

Sorry, please consider the file above invalid. I had a "monitor" switch on, which supposedly mixes together the analog and the USB input; weird thing is, recording didn't work without the analog input, even with the switch on.

Anyway, that takes care of the clipping issue. I've also switched to mono recording to take that out of the picture. The waveform still looks bad, though.

Here is a new set: http://i.imgur.com/eckeYK5.png https://www.dropbox.com/s/qt6wlm34uafu9gu/quux.flac .

@ski522

ski522 commented Apr 29, 2014

I just upgraded my Pi and my USB PCM2704 is exhibiting similar symptoms. To temporarily resolve the issue I lowered my USB port speed by adding "dwc_otg.speed=1". Not ideal, as I use my Pi as a NAS, but it works for now. My post about the problem is on the Pi Forum: http://www.raspberrypi.org/forums/viewtopic.php?f=63&t=76026&p=543139#p543139

@lnicola
Author

lnicola commented Apr 29, 2014

You mention a constant value being transmitted on overruns. However, my source is 16-bit. Is the transfer size a multiple of the sample size? I wonder if the corruption could happen in the middle of a frame or sample.
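To put rough numbers on the transfer-size question, here is a sketch of the expected isochronous packet sizes for this stream. It assumes a full-speed device with one packet per 1 ms frame and that each packet carries a whole number of audio frames (as far as I know this is what USB audio class devices expect); `packet_sizes` is a hypothetical helper and does not reproduce the actual ALSA or dwc_otg scheduling code.

```python
def packet_sizes(rate_hz=44100, channels=2, bytes_per_sample=2,
                 packets=10, packets_per_second=1000):
    """Return the byte size of each of the next `packets` isochronous packets."""
    frame_bytes = channels * bytes_per_sample  # one audio frame (all channels)
    sizes = []
    remainder = 0
    for _ in range(packets):
        # Accumulate the fractional frames owed and emit whole frames only.
        remainder += rate_hz
        frames = remainder // packets_per_second
        remainder -= frames * packets_per_second
        sizes.append(frames * frame_bytes)
    return sizes


if __name__ == "__main__":
    print(packet_sizes())
```

Under these assumptions every packet is a multiple of the 4-byte frame size, so a cleanly dropped packet would not split a sample, whereas corruption inside a packet could.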

Now this sounds fun..

- /* Bugette: for some reason, memcpy corrupts the data in the bounce buffers. May be a
-  * cache coherency issue */

@kursusHC

The device is lost after a few minutes of playback/forward/volume change (just like it was before the huge work on FIQ drivers). But unlike @GrayShade it runs perfectly on all my linux desktops.

I've tried "dwc_otg.speed=1" but it prevents ALSA from detecting my DAC.

Can we provide anything to help?

@giddyhup

I am also running into this and probably related problems. I updated about twelve hours ago. USB sound is crackling, transferring data via WIFI (samba) times out. I get NYET messages related to my bluetooth dongle.

I updated because I hoped my SDR solution (based on a DVB-T USB stick) would improve, but everything has become worse. I will revert to my backup I created before the update.

lsusb -v http://pastebin.com/LRcq8xva
dmesg (up until WIFI related samba errors) http://pastebin.com/y21Skth5

@P33M
Contributor

P33M commented Apr 30, 2014

dwc_otg.speed=1 should not be required.

It should be sufficient to disable the FSM: put dwc_otg.fiq_fsm_enable=0 into /boot/cmdline.txt
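Since /boot/cmdline.txt is a single line of space-separated parameters, the edit can be scripted. The following is a minimal sketch; `set_kernel_param` is a hypothetical helper, and the sample command line in the usage block is made up for illustration.

```python
def set_kernel_param(cmdline: str, key: str, value: str) -> str:
    """Return `cmdline` with `key=value` set, replacing any existing `key=` entry."""
    # Drop any previous setting of the same key, then append the new one.
    tokens = [t for t in cmdline.split() if not t.startswith(key + "=")]
    tokens.append(f"{key}={value}")
    # Parameters stay on one line, separated by single spaces.
    return " ".join(tokens)


if __name__ == "__main__":
    example = "console=tty1 root=/dev/mmcblk0p2 rootwait"  # illustrative only
    print(set_kernel_param(example, "dwc_otg.fiq_fsm_enable", "0"))
```

To apply it for real, read the existing file contents, pass them through the function, write the result back as root, and reboot; keeping a backup copy of the original file is sensible.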

@ALL:
How many of you are running overclocked?

@lnicola
Author

lnicola commented Apr 30, 2014

I recently overclocked my Pi, however the issue also occurs when running at the normal clock rate.

@giddyhup

I run overclocked. I now disabled the FSM. Sound is back to normal and samba/WIFI throughput is better/without errors. I don't need to revert to my old image after all.

@ski522

ski522 commented Apr 30, 2014

Yes dwc_otg.fiq_fsm_enable=0 seemed to fix the problem as well for me.

I am overclocking my Pi at 900MHz.

@P33M
Contributor

P33M commented Apr 30, 2014

I can replicate this with an overclock.

Edit: I can also replicate this without an overclock.

@andrea-iob

My Pi is not overclocked.

Disabling the FSM does not fix the problem: with some files I still hear random pops, and with others I hear only noise. The files that are completely broken with the FSM disabled are 16/44 files that, for reasons I don't understand, are played as S24_3LE.

As reported in the FIQ_FSM forum thread, before the kernel upgrade the files played as S24_3LE worked well, whereas the files played as S16_LE were distorted (there were NYET errors). After the kernel upgrade I hear noise on all the files I play.

@P33M
Contributor

P33M commented Apr 30, 2014

You will get unreliable playback with the FSM disabled. Interrupt latency will mess up audio transfers from time to time. 24-bit audio is particularly vulnerable as you may get what is effectively an endian swap (and thus everything turns into noise) if you lose a frame of data.
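One way to picture why packed 24-bit audio is so fragile is the sketch below. It assumes the loss leaves the byte stream offset by something other than a multiple of three bytes, which is a simplification of whatever actually happens on the bus; the helper names are hypothetical.

```python
def pack_s24_3le(samples):
    """Pack signed integers as 3-byte little-endian samples (S24_3LE)."""
    out = bytearray()
    for s in samples:
        out += (s & 0xFFFFFF).to_bytes(3, "little")
    return bytes(out)


def unpack_s24_3le(data):
    """Decode S24_3LE bytes into signed integers, ignoring trailing partial samples."""
    usable = len(data) - len(data) % 3
    samples = []
    for i in range(0, usable, 3):
        v = int.from_bytes(data[i:i + 3], "little")
        if v & 0x800000:  # sign-extend from 24 bits
            v -= 1 << 24
        samples.append(v)
    return samples


if __name__ == "__main__":
    quiet = [100, 101, 102, 103]  # a very quiet signal
    stream = pack_s24_3le(quiet)
    print(unpack_s24_3le(stream))      # intact stream
    print(unpack_s24_3le(stream[1:]))  # same stream with the first byte lost
```

With one byte missing, the low byte of each sample lands in the most significant position of its neighbour, so a near-silent signal decodes as values in the millions and stays that way until the stream is realigned.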

@amtssp

amtssp commented Apr 30, 2014

Hi, I have the same problem (using the 3.14.1 kernel): constant crackling noise in my USB audio out.
But if I add dwc_otg.fiq_fsm_mask=0x1 to the cmdline file, the USB audio plays without problems.

If I add dwc_otg.fiq_fsm_mask=0x3 ....... I get the crackling noise
If I add dwc_otg.fiq_fsm_mask=0x7 ........I get the crackling noise
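The three mask values can be read as a bit field. In the sketch below, the bit 0 label matches the "Non-periodic Split Transactions" line in the boot log further down; the labels for bits 1 and 2 are given from memory of the driver's parameter description and should be treated as assumptions until checked against the dwc_otg source. `decode_fsm_mask` is a hypothetical helper.

```python
# Bit meanings: bit 0 is confirmed by the boot log in this comment,
# bits 1 and 2 are assumptions to be verified against the driver source.
FSM_MASK_BITS = {
    0: "non-periodic split transactions",
    1: "periodic split transactions",
    2: "high-speed isochronous transactions",
}


def decode_fsm_mask(mask: int):
    """Return the names of the transaction types enabled by `mask`."""
    return [name for bit, name in sorted(FSM_MASK_BITS.items())
            if mask & (1 << bit)]


if __name__ == "__main__":
    for mask in (0x1, 0x3, 0x7):
        print(hex(mask), decode_fsm_mask(mask))
```

If those labels are right, the crackling starts as soon as bit 1 is set, which would point at the periodic split-transaction path rather than the non-periodic one.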

My USB adapter is:
[ 4.746893] usb 1-1.2: new full-speed USB device number 4 using dwc_otg
[ 4.849367] usb 1-1.2: New USB device found, idVendor=0d8c, idProduct=000c
[ 4.849403] usb 1-1.2: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[ 4.849421] usb 1-1.2: Product: C-Media USB Headphone Set
[ 4.862167] input: C-Media USB Headphone Set as /devices/platform/bcm2708_usb/usb1/1-1/1-1.2/1-1.2:1.3/0003:0D8C:000C.0001/input/input0
[ 4.862840] hid-generic 0003:0D8C:000C.0001: input,hidraw0: USB HID v1.00 Device [C-Media USB Headphone Set ] on usb-bcm2708_usb-1.2/input3

And the new fiq-fsm is enabled:
[ 3.522091] OTG VER PARAM: 0, OTG VER FLAG: 0
[ 3.522109] Dedicated Tx FIFOs mode
[ 3.522357] WARN::dwc_otg_hcd_init:1040: FIQ DMA bounce buffers: virt = 0xdcc27000 dma = 0x5ab94000 len=9024
[ 3.522391] FIQ FSM acceleration enabled for :
[ 3.522391] Non-periodic Split Transactions
[ 3.522411] dwc_otg: Microframe scheduler enabled
[ 3.522452] WARN::hcd_init:473: FIQ at 0xc03329d0
[ 3.522469] WARN::hcd_init:474: FIQ ASM at 0xc0332c4c length 36
...
[ 3.524484] dwc_otg: FIQ enabled
[ 3.524502] dwc_otg: NAK holdoff enabled
[ 3.524512] dwc_otg: FIQ split-transaction FSM enabled

Hope this info is helpful.

@popcornmix
Collaborator

@P33M Does this reduce usable memory to linux by 16M?
Have you tried:
CONFIG_CMA_SIZE_MBYTES=8
(or 4 or 2 or 1?)

@lnicola
Author

lnicola commented May 1, 2014

Did anything related to CMA change in 3.12? I had to remove my CMA options from config.txt to be able to boot.

I'll test it when the new binaries come out.

@popcornmix
Collaborator

CONFIG_DMA_CMA=y seems to enable other disabled CMA config options, so it's possible adding this will make CMA work.

@P33M
Contributor

P33M commented May 1, 2014

CMA is already enabled (or at least the reservation happens) - therefore 16M is reserved on boot anyway.

Edit: by changing cma=xM in /boot/cmdline.txt, the minimum to boot is 5M.

popcornmix pushed a commit to raspberrypi/firmware that referenced this issue May 1, 2014
…erency issues

See: raspberrypi/linux#575

kernel: config: Add CONFIG_USB_HOS module (option branded GSM modem)
See: raspberrypi/linux#580

tvservice: Allow requesting NTSC frequency variants of hdmi modes
See: http://www.raspberrypi.org/forums/viewtopic.php?f=91&t=75589
popcornmix pushed a commit to Hexxeh/rpi-firmware that referenced this issue May 1, 2014
@lnicola
Author

lnicola commented May 1, 2014

20 Hz wave looks good. I'll try listening to something a bit later, but I think I'll leave this issue open until @andrea-iob or someone else gets to test it. Any ideas on why some people never had this issue?

Anyway, thanks for your work!

EDIT: Streaming over the network seems to work fine too.

I suppose the memory fragmentation problem reported by milhouse at the beginning of the FIQ_FSM rewrite testing thread will remain, right?

@P33M
Contributor

P33M commented May 1, 2014

Memory fragmentation is orthogonal to this issue. It will always happen with a buddy allocator, and is only a "Problem" because the ARM has to do a relatively large amount of work if it wants to shift pages around to reclaim a contiguous area of memory. The net effect is lowered throughput and higher CPU load with some workloads that particularly fragment memory.

The coherency issue disappears if there is enough cache activity that, in the 125 µs between a transaction being queued and the hardware executing it, all the affected cache lines are written out to main memory. You can make the issue go away by having the Pi do "other stuff" at the same time as playback.

@andrea-iob

No more noise, pops or clicks. The issue seems fixed for me.

Thanks a lot! I really appreciate your hard work!

@amtssp

amtssp commented May 1, 2014

Also fixed for me after building a new 3.14.1 kernel and updating the firmware. USB audio is good now.

Thank you..

@kursusHC

kursusHC commented May 1, 2014

Same here after a simple rpi-update. Thanks a lot, I can finally enjoy my Pi !!

@lnicola
Author

lnicola commented May 1, 2014

Seems fixed, then.

@ski522

ski522 commented May 1, 2014

Yup...PCM2704 USB DAC is back to normal...thank you for the prompt fix!!

@giddyhup

giddyhup commented May 4, 2014

The recent update from May 1 initially looked like an improvement. Even my USB-connected DPF became stable. Yet with my USB sound dongle some sounds were distorted or wouldn't play at all. dmesg showed some NYET messages related to the sound card, so I had to disable the driver again.

[  141.005422] Transfer to device 7 endpoint 0x1 frame 1150 failed - FIQ reported NYET. Data may have been lost.
[  141.325427] Transfer to device 7 endpoint 0x1 frame 1470 failed - FIQ reported NYET. Data may have been lost.
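To see how often these messages occur and which endpoints they affect, the dmesg output can be tallied. This is a small sketch whose regular expression is written against the two lines quoted above only, so other kernel versions may word the message differently; `count_nyet` is a hypothetical helper.

```python
import re
from collections import Counter

# Pattern derived from the log lines quoted in this comment.
NYET_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+Transfer to device (?P<dev>\d+) "
    r"endpoint (?P<ep>0x[0-9a-fA-F]+) frame (?P<frame>\d+) failed - "
    r"FIQ reported NYET"
)


def count_nyet(log_text: str) -> Counter:
    """Count NYET failures per (device number, endpoint number)."""
    counts = Counter()
    for m in NYET_RE.finditer(log_text):
        counts[(int(m.group("dev")), int(m.group("ep"), 16))] += 1
    return counts
```

Feeding it the saved output of dmesg gives a per-endpoint count, which makes it easier to compare kernels or fiq_fsm settings than reading the log by eye.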

davet321 added a commit to davet321/rpi-linux that referenced this issue May 11, 2014
commit a8c3930
Author: popcornmix <popcornmix@gmail.com>
Date:   Thu May 1 13:38:17 2014 +0100

    config: enable CONFIG_DMA_CMA - it may fix cache coherency issue in USB driver

    See: raspberrypi#575

commit 63cbbd4
Author: Gordon Garrity <gordon@iqaudio.com>
Date:   Sat Mar 8 16:56:57 2014 +0000

    Add IQaudIO Sound Card support for Raspberry Pi
lclausen-adi pushed a commit to analogdevicesinc/linux that referenced this issue Jun 24, 2014
neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue Feb 27, 2017
popcornmix pushed a commit that referenced this issue Jul 13, 2017
commit 1a3fc2c upstream.

There has been a report about a deadlock in the xenbus driver:

[  247.979498] ======================================================
[  247.985688] WARNING: possible circular locking dependency detected
[  247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
[  247.997040] ------------------------------------------------------
[  248.003232] xenbus/91 is trying to acquire lock:
[  248.007875]  (&u->msgbuffer_mutex){+.+.+.}, at: [<ffff00000863e904>]
xenbus_dev_queue_reply+0x3c/0x230
[  248.017163]
[  248.017163] but task is already holding lock:
[  248.023096]  (xb_write_mutex){+.+...}, at: [<ffff00000863a940>]
xenbus_thread+0x5f0/0x798
[  248.031267]
[  248.031267] which lock already depends on the new lock.
[  248.031267]
[  248.039615]
[  248.039615] the existing dependency chain (in reverse order) is:
[  248.047176]
[  248.047176] -> #1 (xb_write_mutex){+.+...}:
[  248.052943]        __lock_acquire+0x1728/0x1778
[  248.057498]        lock_acquire+0xc4/0x288
[  248.061630]        __mutex_lock+0x84/0x868
[  248.065755]        mutex_lock_nested+0x3c/0x50
[  248.070227]        xs_send+0x164/0x1f8
[  248.074015]        xenbus_dev_request_and_reply+0x6c/0x88
[  248.079427]        xenbus_file_write+0x260/0x420
[  248.084073]        __vfs_write+0x48/0x138
[  248.088113]        vfs_write+0xa8/0x1b8
[  248.091983]        SyS_write+0x54/0xb0
[  248.095768]        el0_svc_naked+0x24/0x28
[  248.099897]
[  248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
[  248.106088]        print_circular_bug+0x80/0x2e0
[  248.110730]        __lock_acquire+0x1768/0x1778
[  248.115288]        lock_acquire+0xc4/0x288
[  248.119417]        __mutex_lock+0x84/0x868
[  248.123545]        mutex_lock_nested+0x3c/0x50
[  248.128016]        xenbus_dev_queue_reply+0x3c/0x230
[  248.133005]        xenbus_thread+0x788/0x798
[  248.137306]        kthread+0x110/0x140
[  248.141087]        ret_from_fork+0x10/0x40

It is rather easy to avoid by dropping xb_write_mutex before calling
xenbus_dev_queue_reply().

Fixes: fd8aa90 ("xen: optimize xenbus
driver for multiple concurrent xenstore accesses").

Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jul 1, 2020
[ Upstream commit fb7861d ]

In the current code, ->ndo_start_xmit() can be executed recursively only
10 times because of stack memory.
But, in the case of the vxlan, 10 recursion limit value results in
a stack overflow.
In the current code, the nested interface is limited by 8 depth.
There is no critical reason that the recursion limitation value should
be 10.
So, it would be good to be the same value with the limitation value of
nesting interface depth.

Test commands:
    ip link add vxlan10 type vxlan vni 10 dstport 4789 srcport 4789 4789
    ip link set vxlan10 up
    ip a a 192.168.10.1/24 dev vxlan10
    ip n a 192.168.10.2 dev vxlan10 lladdr fc:22:33:44:55:66 nud permanent

    for i in {9..0}
    do
        let A=$i+1
	ip link add vxlan$i type vxlan vni $i dstport 4789 srcport 4789 4789
	ip link set vxlan$i up
	ip a a 192.168.$i.1/24 dev vxlan$i
	ip n a 192.168.$i.2 dev vxlan$i lladdr fc:22:33:44:55:66 nud permanent
	bridge fdb add fc:22:33:44:55:66 dev vxlan$A dst 192.168.$i.2 self
    done
    hping3 192.168.10.2 -2 -d 60000

Splat looks like:
[  103.814237][ T1127] =============================================================================
[  103.871955][ T1127] BUG kmalloc-2k (Tainted: G    B            ): Padding overwritten. 0x00000000897a2e4f-0x000
[  103.873187][ T1127] -----------------------------------------------------------------------------
[  103.873187][ T1127]
[  103.874252][ T1127] INFO: Slab 0x000000005cccc724 objects=5 used=5 fp=0x0000000000000000 flags=0x10000000001020
[  103.881323][ T1127] CPU: 3 PID: 1127 Comm: hping3 Tainted: G    B             5.7.0+ #575
[  103.882131][ T1127] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  103.883006][ T1127] Call Trace:
[  103.883324][ T1127]  dump_stack+0x96/0xdb
[  103.883716][ T1127]  slab_err+0xad/0xd0
[  103.884106][ T1127]  ? _raw_spin_unlock+0x1f/0x30
[  103.884620][ T1127]  ? get_partial_node.isra.78+0x140/0x360
[  103.885214][ T1127]  slab_pad_check.part.53+0xf7/0x160
[  103.885769][ T1127]  ? pskb_expand_head+0x110/0xe10
[  103.886316][ T1127]  check_slab+0x97/0xb0
[  103.886763][ T1127]  alloc_debug_processing+0x84/0x1a0
[  103.887308][ T1127]  ___slab_alloc+0x5a5/0x630
[  103.887765][ T1127]  ? pskb_expand_head+0x110/0xe10
[  103.888265][ T1127]  ? lock_downgrade+0x730/0x730
[  103.888762][ T1127]  ? pskb_expand_head+0x110/0xe10
[  103.889244][ T1127]  ? __slab_alloc+0x3e/0x80
[  103.889675][ T1127]  __slab_alloc+0x3e/0x80
[  103.890108][ T1127]  __kmalloc_node_track_caller+0xc7/0x420
[ ... ]

Fixes: 11a766c ("net: Increase xmit RECURSION_LIMIT to 10.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>