Commits on Feb 21, 2013
  1. @gregkh


    gregkh committed
  2. @igit @gregkh

    printk: fix buffer overflow when calling log_prefix function from cal…

    igit committed with gregkh
    This patch corrects a buffer overflow in kernels from 3.0 to 3.4 when calling
    log_prefix() function from call_console_drivers().
    This bug existed in previous releases but was revealed by commit
    162a7e7 (2.6.39 => 3.0), which changed how memory is
    allocated for the early printk buffer (use of memblock_alloc).
    It disappears with commit 7ff9554 (3.4 => 3.5),
    which refactors printk buffer management.
    In log_prefix(), the access to "p[0]", "p[1]", "p[2]" or
    "simple_strtoul(&p[1], &endp, 10)" may cause a buffer overflow as this
    function is called from call_console_drivers by passing "&LOG_BUF(cur_index)"
    where the index must be masked so that it does not exceed the buffer's boundary.
    The trick is to prepare in call_console_drivers() a buffer with the necessary
    data (PRI field of syslog message) to be safely evaluated in log_prefix().
    This patch can be applied to stable kernel branches 3.0.y, 3.2.y and 3.4.y.
    Without this patch, one can freeze a server by running this loop from a shell:
      $ export DUMMY=`cat /dev/urandom | tr -dc '12345AZERTYUIOPQSDFGHJKLMWXCVBNazertyuiopqsdfghjklmwxcvbn' | head -c255`
      $ while true ; do echo $DUMMY > /dev/kmsg ; done
    The "server freeze" depends on where memblock_alloc allocates the printk buffer:
    if the buffer overflow lands inside another kernel allocation the problem may not
    be revealed; otherwise the server may hang.
    Signed-off-by: Alexandre SIMON <>
    Signed-off-by: Greg Kroah-Hartman <>
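A minimal userspace sketch of the idea: read every byte of a record through a masked index into a small local copy, and let the parser look only at that copy. The names `log_buf`, `log_buf_at()`, `copy_prefix()` and `parse_level()` are hypothetical, the 16-byte buffer is illustrative, and the parser only handles single-digit `<N>` levels, unlike the `simple_strtoul()`-based kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring buffer. Its length must be a power of two so that
 * masking an index always keeps the access inside the array. */
#define LOG_BUF_LEN 16u
#define LOG_BUF_MASK (LOG_BUF_LEN - 1u)
static char log_buf[LOG_BUF_LEN];

/* Masked accessor: safe for any index, including ones past the end. */
static char log_buf_at(size_t idx)
{
	return log_buf[idx & LOG_BUF_MASK];
}

/* Copy the first three bytes of a record (room for a "<N>" syslog PRI
 * prefix) into a local buffer, following the wrap-around of the ring. */
static void copy_prefix(size_t start, char out[4])
{
	for (size_t i = 0; i < 3; i++)
		out[i] = log_buf_at(start + i);
	out[3] = '\0';
}

/* Parse a single-digit "<N>" level from the safe copy only.
 * Returns the level, or -1 if the record has no prefix. */
static int parse_level(const char p[4])
{
	if (p[0] == '<' && p[1] >= '0' && p[1] <= '7' && p[2] == '>')
		return p[1] - '0';
	return -1;
}
```

For instance, a `<3>` prefix that starts at index 14 wraps around to index 0; `copy_prefix()` follows the wrap instead of reading past the end of the array, which is exactly the access that indexing `p[2]` directly would get wrong.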
Commits on Feb 17, 2013
  1. @gregkh

    Linux 3.4.32

    gregkh committed
  2. @gregkh

    igb: Remove artificial restriction on RQDPC stat reading

    Alexander Duyck committed with gregkh
    commit ae1c07a upstream.
    For some reason the reading of the RQDPC register was being artificially
    limited to 4K.  Instead of limiting the value we should read the value and
    add the full amount.  Otherwise this can lead to a misleading number of
    dropped packets when the actual value is in fact much higher.
    Signed-off-by: Alexander Duyck <>
    Tested-by: Jeff Pieper   <>
    Signed-off-by: Jeff Kirsher <>
    Cc: Vinson Lee <>
    Signed-off-by: Greg Kroah-Hartman <>
  3. @gregkh

    efi: Clear EFI_RUNTIME_SERVICES rather than EFI_BOOT by "noefi" boot …

    Satoru Takeuchi committed with gregkh
    commit 1de63d6 upstream.
    There was a serious problem in samsung-laptop: its platform driver is
    designed to run under BIOS, and running it under EFI can brick the
    machine or cause Machine Check Exceptions.
        Discussion about this problem:
        The patches to fix this problem:
        efi: Make 'efi_enabled' a function to query EFI facilities
        samsung-laptop: Disable on EFI hardware
    Unfortunately this problem comes back if users specify the "noefi" option.
    This parameter clears EFI_BOOT, so that driver continues to run even when
    running under EFI. According to the documentation, this parameter should
    clear EFI_RUNTIME_SERVICES instead:
    	noefi		[X86] Disable EFI runtime services support.
    - If some or all EFI runtime services don't work, you can try following
      kernel command line parameters to turn off some or all EFI runtime
      services.
    	noefi		turn off all EFI runtime services
    Signed-off-by: Satoru Takeuchi <>
    Cc: Matt Fleming <>
    Signed-off-by: H. Peter Anvin <>
    Signed-off-by: Greg Kroah-Hartman <>
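The distinction between the two facilities can be sketched with a plain bitmask. This is an illustration only: `efi_flags`, `facility_enabled()`, `handle_noefi()` and `bios_only_driver_may_load()` are hypothetical, the bit positions are assumptions, and the real kernel keeps these bits in its own facility word queried through `efi_enabled()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Facility bits named in the commit; the bit positions are assumptions. */
enum efi_facility { EFI_BOOT = 0, EFI_RUNTIME_SERVICES = 1 };

/* Hypothetical facility word. */
static unsigned long efi_flags;

static bool facility_enabled(enum efi_facility f)
{
	return (efi_flags >> f) & 1ul;
}

/* After the fix, "noefi" turns off runtime services only; EFI_BOOT keeps
 * telling the truth about how the machine was booted. */
static void handle_noefi(void)
{
	efi_flags &= ~(1ul << EFI_RUNTIME_SERVICES);
}

/* A BIOS-only platform driver refuses to load whenever EFI_BOOT is set,
 * whatever "noefi" says about runtime services. */
static bool bios_only_driver_may_load(void)
{
	return !facility_enabled(EFI_BOOT);
}
```

Clearing EFI_BOOT instead would make `bios_only_driver_may_load()` return true on an EFI-booted machine, which is the regression the commit describes.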
  4. @rjwysocki @gregkh

    PCI/PM: Clean up PME state when removing a device

    rjwysocki committed with gregkh
    commit 249bfb8 upstream.
    Devices are added to pci_pme_list when drivers use pci_enable_wake()
    or pci_wake_from_d3(), but they aren't removed from the list unless
    the driver explicitly disables wakeup.  Many drivers never disable
    wakeup, so their devices remain on the list even after they are
    removed, e.g., via hotplug.  A subsequent PME poll will oops when
    it tries to touch the device.
    This patch disables PME# on a device before removing it, which removes
    the device from pci_pme_list.  This is safe even if the device never
    had PME# enabled.
    This oops can be triggered by unplugging a Thunderbolt ethernet adapter
    on a Macbook Pro, as reported by Daniel below.
    [bhelgaas: changelog]
    Reported-and-tested-by: Daniel J Blueman <>
    Signed-off-by: Rafael J. Wysocki <>
    Signed-off-by: Bjorn Helgaas <>
    Signed-off-by: Greg Kroah-Hartman <>
  5. @jbeulich @gregkh

    x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.

    jbeulich committed with gregkh
    commit 13d2b4d upstream.
    This fixes CVE-2013-0228 / XSA-42
    Drew Jones, while working on CVE-2013-0190, found that an unprivileged guest user
    in a 32-bit PV guest can crash the guest with a panic like this:
    general protection fault: 0000 [#1] SMP
    last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
    Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
    iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
    xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
    mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
    unloaded: scsi_wait_scan]
    Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
    EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
    EIP is at xen_iret+0x12/0x2b
    EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
    ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
     DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
    Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
     00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
    Call Trace:
    Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
    8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
    10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
    EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
    general protection fault: 0000 [#2]
    ---[ end trace ab0d29a492dcd330 ]---
    Kernel panic - not syncing: Fatal exception
    Pid: 1250, comm: r Tainted: G      D    ---------------
    2.6.32-356.el6.i686 #1
    Call Trace:
     [<c08476df>] ? panic+0x6e/0x122
     [<c084b63c>] ? oops_end+0xbc/0xd0
     [<c084b260>] ? do_general_protection+0x0/0x210
     [<c084a9b7>] ? error_code+0x73/
    Petr says: "
     I've analysed the bug and I think that xen_iret() cannot cope with
     mangled DS, in this case zeroed out (null selector/descriptor) by either
     xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
     entry was invalidated by the reproducer. "
    Jan took a look at the preliminary patch and came up with a fix that solves
    this problem:
    "This code gets called after all registers other than those handled by
    IRET got already restored, hence a null selector in %ds or a non-null
    one that got loaded from a code or read-only data descriptor would
    cause a kernel mode fault (with the potential of crashing the kernel
    as a whole, if panic_on_oops is set)."
    The way to fix this is to realize that we can only rely on the
    registers that IRET restores. The two that are guaranteed are the
    %cs and %ss as they are always fixed GDT selectors. Also they are
    inaccessible from user mode - so they cannot be altered. This is
    the approach taken in this patch.
    Another option suggested by Jan would be to rely on
    the subtle realization that %ebp- or %esp-relative references use
    the %ss segment, in which case we could switch from using %eax to %ebp and
    would not need the %ss overrides. That would also require one extra
    instruction to compensate for the one place where the register is used
    as a scaled index. However, Andrew pointed out that this is too subtle: if
    further work were done in this code path, it could escape folks' attention
    and lead to accidents.
    Reviewed-by: Petr Matousek <>
    Reported-by: Petr Matousek <>
    Reviewed-by: Andrew Cooper <>
    Signed-off-by: Jan Beulich <>
    Signed-off-by: Konrad Rzeszutek Wilk <>
    Signed-off-by: Greg Kroah-Hartman <>
  6. @gregkh

    x86/mm: Check if PUD is large when validating a kernel address

    Mel Gorman committed with gregkh
    commit 0ee364e upstream.
    A user reported the following oops when a backup process reads /proc/kcore:
     BUG: unable to handle kernel paging request at ffffbb00ff33b000
     IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
     Call Trace:
      [<ffffffff811b8aaa>] read_kcore+0x17a/0x370
      [<ffffffff811ad847>] proc_reg_read+0x77/0xc0
      [<ffffffff81151687>] vfs_read+0xc7/0x130
      [<ffffffff811517f3>] sys_read+0x53/0xa0
      [<ffffffff81449692>] system_call_fastpath+0x16/0x1b
    Investigation determined that the bug triggered when reading
    system RAM at the 4G mark. On this system, that was the first
    address using 1G pages for the virt->phys direct mapping so the
    PUD is pointing to a physical address, not a PMD page.
    The problem is that the page table walker in kern_addr_valid() is
    not checking pud_large() and treats the physical address as if
    it was a PMD.  If it happens to look like pmd_none then it'll
    silently fail, probably returning zeros instead of real data. If
    the data happens to look like a present PMD though, it will be
    walked resulting in the oops above.
    This patch adds the necessary pud_large() check.
    Unfortunately the problem was not readily reproducible and now
    they are running the backup program without accessing
    /proc/kcore so the patch has not been validated but I think it
    makes sense.
    Signed-off-by: Mel Gorman <>
    Reviewed-by: Rik van Riel <riel@redhat.coM>
    Reviewed-by: Michal Hocko <>
    Acked-by: Johannes Weiner <>
    Signed-off-by: Ingo Molnar <>
    Signed-off-by: Greg Kroah-Hartman <>
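The missing check can be illustrated with a toy two-level walk in which an upper-level entry either points to a lower table or maps memory directly, as a 1G PUD does. Everything here (`struct upper_entry`, `addr_valid()`, the table sizes) is hypothetical. Where this sketch simply returns true for a large entry, a real walker such as `kern_addr_valid()` would still need to validate the mapped page frame after its `pud_large()` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOWER_ENTRIES 4

/* Hypothetical upper-level entry: empty, a pointer to a lower table, or a
 * "large" entry that maps a whole region directly (like a 1G PUD). */
struct upper_entry {
	bool present;
	bool large;            /* maps physical memory directly */
	const bool *lower;     /* lower-level table, meaningful only if !large */
};

/* Walk both levels. Without the 'large' test, a large entry would be
 * misread as a pointer to a lower-level table, which is the bug above. */
static bool addr_valid(const struct upper_entry *upper, size_t ui, size_t li)
{
	const struct upper_entry *e = &upper[ui];

	if (!e->present)
		return false;
	if (e->large)          /* the added check: stop the walk here */
		return true;
	assert(li < LOWER_ENTRIES);
	return e->lower[li];
}
```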
  7. @gregkh

    x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server sys…

    Stoney Wang committed with gregkh
    commit cb214ed upstream.
    When a HP ProLiant DL980 G7 Server boots a regular kernel,
    there will be intermittent lost interrupts which could
    result in a hang or (in extreme cases) data loss.
    The reason is that this system only supports x2apic physical
    mode, while the kernel boots with a logical-cluster default setting.
    This bug can be worked around by specifying the "x2apic_phys" or
    "nox2apic" boot option, but we want to handle this system
    without requiring manual workarounds.
    As all apicids are smaller than 255, the BIOS needs to pass
    control to the OS in xapic mode, according to the x2apic spec,
    chapter 2.9.
    The current code handles x2apic when the BIOS passes control in xapic mode.
    When the user specifies x2apic_phys, or the FADT indicates PHYSICAL:
    1. During the MADT OEM check, the apic driver is first set to the
       xapic logical or xapic phys driver.
    2. enable_IR_x2apic() will enable x2apic_mode.
    3. If the user specifies x2apic_phys on the boot line, x2apic_phys_probe()
       will install the correct x2apic phys driver and use x2apic phys mode.
       Otherwise it will skip the driver and let x2apic_cluster_probe
       take over to install the x2apic cluster driver (the wrong one) even
       though the FADT indicates PHYSICAL, because x2apic_phys_probe does not
       check the FADT PHYSICAL flag.
    Add a check of x2apic_fadt_phys in x2apic_phys_probe() to fix the
    problem.
    Signed-off-by: Stoney Wang <>
    [ updated the changelog and simplified the code ]
    Signed-off-by: Yinghai Lu <>
    Signed-off-by: Ingo Molnar <>
    Signed-off-by: Greg Kroah-Hartman <>
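A sketch of the probe decision, with the driver probe order modelled explicitly. The flag variables and the `phys_driver_probe()`/`cluster_driver_probe()` names are hypothetical stand-ins for `x2apic_mode`, the `x2apic_phys` boot option and `x2apic_fadt_phys()`; the kernel's real probe functions differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state described in the commit. */
static bool x2apic_mode;     /* enable_IR_x2apic() switched to x2apic */
static bool x2apic_phys;     /* "x2apic_phys" given on the boot line */
static bool fadt_physical;   /* FADT indicates PHYSICAL destination mode */

/* Physical driver, probed first: after the fix it also honours the FADT. */
static bool phys_driver_probe(void)
{
	return x2apic_mode && (x2apic_phys || fadt_physical);
}

/* Cluster driver, probed afterwards: it only gets the system if the
 * physical driver declined. */
static bool cluster_driver_probe(void)
{
	return x2apic_mode && !phys_driver_probe();
}
```

On a DL980-like system the firmware sets the FADT flag, so the physical driver wins without any boot option; before the fix only `x2apic_phys` could produce that outcome.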
  8. @kees @gregkh

    x86: Do not leak kernel page mapping locations

    kees committed with gregkh
    commit e575a86 upstream.
    Without this patch, it is trivial to determine kernel page
    mappings by examining the error code reported to dmesg[1].
    Instead, declare the entire kernel memory space as a violation
    of a present page.
    Additionally, since show_unhandled_signals is enabled by
    default, switch branch hinting to the more realistic
    expectation, and unobfuscate the setting of the PF_PROT bit to
    improve readability.
    Reported-by: Dan Rosenberg <>
    Suggested-by: Brad Spengler <>
    Signed-off-by: Kees Cook <>
    Acked-by: H. Peter Anvin <>
    Cc: Paul E. McKenney <>
    Cc: Frederic Weisbecker <>
    Cc: Eric W. Biederman <>
    Cc: Linus Torvalds <>
    Cc: Andrew Morton <>
    Cc: Peter Zijlstra <>
    Signed-off-by: Ingo Molnar <>
    Signed-off-by: Greg Kroah-Hartman <>
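The core of the change can be sketched as a small adjustment to the error code before it is reported. The bit values follow the x86 page-fault error code (bit 0 is the protection-violation bit the kernel calls `PF_PROT`); `TASK_SIZE_SKETCH` and `reported_error_code()` are hypothetical, and the boundary value assumes a 47-bit x86-64 user address space with 4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>

/* x86 page-fault error-code bits. */
#define PF_PROT  (1ul << 0)   /* fault on a present page */
#define PF_WRITE (1ul << 1)
#define PF_USER  (1ul << 2)

/* Hypothetical user/kernel split, for illustration only. */
#define TASK_SIZE_SKETCH 0x00007ffffffff000ull

/* Error code to report: for any kernel address, always claim "present
 * page, protection violation", so the report no longer reveals whether
 * that kernel address is actually mapped. */
static unsigned long reported_error_code(uint64_t address, unsigned long hw_code)
{
	if (address >= TASK_SIZE_SKETCH)
		hw_code |= PF_PROT;
	return hw_code;
}
```

A user-mode probe of a mapped and an unmapped kernel address now yields the same reported code, while faults on user addresses are reported unchanged.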
  9. @gregkh

    s390/timer: avoid overflow when programming clock comparator

    Heiko Carstens committed with gregkh
    commit d911e03 upstream.
    Since ed4f209 "s390/time: fix sched_clock() overflow", a new helper function
    is used to avoid overflows when converting TOD format values to nanoseconds.
    However, the kvm interrupt code formerly worked only by accident, because of
    an overflow. It tried to program a timer that would expire in more than ~29
    years. Because of the old TOD-to-nanoseconds overflow bug, the real expiry
    value was much smaller; now it no longer is.
    This, however, triggers yet another bug in the function that programs the clock
    comparator, s390_next_ktime(): if the absolute "expires" value is after 2042,
    the computation overflows and the programmed value is lower than the
    current TOD value, which immediately triggers a clock comparator (= timer)
    interrupt. Since the timer isn't expired, it will be programmed immediately
    again, and so on... the result is a dead system.
    To fix this, simply program the maximum possible value if an overflow is
    detected.
    Reported-by: Christian Borntraeger <>
    Tested-by: Christian Borntraeger <>
    Signed-off-by: Heiko Carstens <>
    Signed-off-by: Greg Kroah-Hartman <>
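The fix amounts to a saturating addition. A sketch, assuming the comparator value is computed as an unsigned 64-bit base plus a delta; `comparator_value()` is a hypothetical helper, not the s390 code.

```c
#include <assert.h>
#include <stdint.h>

/* Compute base + delta for a 64-bit comparator. If the unsigned addition
 * wraps, the result would lie *below* the current time and fire at once,
 * so clamp it to the maximum value instead. */
static uint64_t comparator_value(uint64_t base, uint64_t delta)
{
	uint64_t cc = base + delta;   /* unsigned overflow wraps modulo 2^64 */

	if (cc < base)                /* wrapped: program the maximum instead */
		cc = UINT64_MAX;
	return cc;
}
```

Detecting the wrap by comparing the sum with one operand works because unsigned arithmetic in C is defined to wrap, so no wider type is needed.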
Commits on Feb 14, 2013
  1. @gregkh

    Linux 3.4.31

    gregkh committed
  2. @gregkh

    be2net: Fix to trim skb for padded vlan packets to workaround an ASIC Bug

    Somnath Kotur committed with gregkh
    commit 93040ae upstream.
    Fixed spelling error in a comment as pointed out by DaveM.
    Also refactored existing code a bit to provide placeholders for another ASIC
    Bug workaround that will be checked-in soon after this.
    Signed-off-by: Somnath Kotur <>
    Signed-off-by: David S. Miller <>
    Cc: Jacek Luczak <>
    Signed-off-by: Greg Kroah-Hartman <>
  3. @gregkh

    tg3: Fix crc errors on jumbo frame receive

    Nithin Nayak Sujir committed with gregkh
    [ Upstream commit daf3ec6 ]
    TG3_PHY_AUXCTL_SMDSP_ENABLE/DISABLE macros do a blind write to the phy
    auxiliary control register and overwrite the EXT_PKT_LEN (bit 14) resulting
    in intermittent crc errors on jumbo frames with some link partners. Change
    the code to do a read/modify/write.
    Signed-off-by: Nithin Nayak Sujir <>
    Signed-off-by: Michael Chan <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
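The difference between a blind write and a read/modify/write can be shown on a plain variable standing in for the PHY register. `phy_auxctl`, `SMDSP_ENA_SKETCH` (its value is an assumption) and the two helpers are hypothetical; only bit 14 as EXT_PKT_LEN comes from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_PKT_LEN_BIT  (1u << 14)  /* bit 14, per the commit */
#define SMDSP_ENA_SKETCH 0x0800u     /* assumed enable bit, illustration only */

/* Plain variable standing in for the PHY auxiliary control register. */
static uint16_t phy_auxctl;

/* Blind write: replaces the whole register and silently clears bit 14. */
static void auxctl_write_blind(uint16_t value)
{
	phy_auxctl = value;
}

/* Read/modify/write: only the bits in set_mask and clear_mask change. */
static void auxctl_update(uint16_t set_mask, uint16_t clear_mask)
{
	uint16_t v = phy_auxctl;                        /* read */

	v = (uint16_t)((v & ~clear_mask) | set_mask);   /* modify */
	phy_auxctl = v;                                 /* write back */
}
```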
  4. @gregkh

    tg3: Avoid null pointer dereference in tg3_interrupt in netconsole mode

    Nithin Nayak Sujir committed with gregkh
    [ Upstream commit 9c13cb8 ]
    When netconsole is enabled, logging messages generated during tg3_open
    can result in a null pointer dereference for the uninitialized tg3
    status block. Use the irq_sync flag to disable polling in the early
    stages. irq_sync is cleared when the driver is enabling interrupts after
    all initialization is completed.
    Signed-off-by: Nithin Nayak Sujir <>
    Signed-off-by: Michael Chan <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  5. @gregkh

    bridge: Pull ip header into skb->data before looking into ip header.

    Sarveshwar Bandi committed with gregkh
    [ Upstream commit 6caab7b ]
    If the lower layer driver leaves the ip header in the skb fragment, it needs to
    be pulled into skb->data before inspecting the ip header length or ip version.
    Signed-off-by: Sarveshwar Bandi <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  6. @gregkh

    tcp: fix for zero packets_in_flight was too broad

    Ilpo Järvinen committed with gregkh
    [ Upstream commit 6731d20 ]
    There are transients during the normal FRTO procedure during which
    the packets_in_flight can go to zero between write_queue state
    updates and firing the resulting segments out. As FRTO processing
    occurs during that window the check must be more precise to
    not match "spuriously" :-). More specifically, e.g., when
    packets_in_flight is zero but FLAG_DATA_ACKED is true the problematic
    branch that set cwnd into zero would not be taken and new segments
    might be sent out later.
    Signed-off-by: Ilpo Järvinen <>
    Tested-by: Eric Dumazet <>
    Acked-by: Neal Cardwell <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  7. @gregkh

    tcp: frto should not set snd_cwnd to 0

    Eric Dumazet committed with gregkh
    [ Upstream commit 2e5f421 ]
    Commit 9dc2741 (tcp: fix ABC in tcp_slow_start())
    uncovered a bug in the FRTO code:
    tcp_process_frto() is setting snd_cwnd to 0 if the number
    of in flight packets is 0.
    As Neal pointed out, if no packet is in flight we lost our
    chance to disambiguate whether a loss timeout was spurious.
    We should assume it was a proper loss.
    Reported-by: Pasi Kärkkäinen <>
    Signed-off-by: Neal Cardwell <>
    Signed-off-by: Eric Dumazet <>
    Cc: Ilpo Järvinen <>
    Cc: Yuchung Cheng <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  8. @gregkh

    netback: correct netbk_tx_err to handle wrap around.

    Ian Campbell committed with gregkh
    [ Upstream commit b914972 ]
    Signed-off-by: Ian Campbell <>
    Acked-by: Jan Beulich <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  9. @gregkh

    xen/netback: free already allocated memory on failure in xen_netbk_ge…

    Ian Campbell committed with gregkh
    [ Upstream commit 4cc7c1c ]
    Signed-off-by: Ian Campbell <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  10. @gregkh

    xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.

    Matthew Daley committed with gregkh
    [ Upstream commit 7d5145d ]
    Signed-off-by: Matthew Daley <>
    Reviewed-by: Konrad Rzeszutek Wilk <>
    Acked-by: Ian Campbell <>
    Acked-by: Jan Beulich <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  11. @gregkh

    xen/netback: shutdown the ring if it contains garbage.

    Ian Campbell committed with gregkh
    [ Upstream commit 4885628 ]
    A buggy or malicious frontend should not be able to confuse netback.
    If we spot anything which is not as it should be then shut down the
    device and don't try to continue with the ring in a potentially
    hostile state. Well behaved and non-hostile frontends will not be
    penalised.
    As well as making the existing checks for such errors fatal, also add a
    new check that ensures that there isn't an insane number of requests
    on the ring (i.e. more than would fit in the ring). If the ring
    contains garbage then previously it was possible to loop over this
    insane number, getting an error each time and therefore not generating
    any more pending requests and therefore not exiting the loop in
    xen_netbk_tx_build_gops for an extended period.
    Also turn various netdev_dbg calls which now precipitate a fatal error
    into netdev_err; they are rate limited because the device is shut down
    afterwards.
    This fixes at least one known DoS/softlockup of the backend domain.
    Signed-off-by: Ian Campbell <>
    Reviewed-by: Konrad Rzeszutek Wilk <>
    Acked-by: Jan Beulich <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
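A minimal sketch of the new "insane number of requests" check, assuming free-running 32-bit producer/consumer indices as used by Xen shared rings. `TX_RING_SIZE` (256 here) and `ring_is_sane()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_RING_SIZE 256u   /* hypothetical ring size */

/* Producer and consumer indices are free-running counters, so their
 * difference is taken with unsigned arithmetic, which stays correct across
 * wrap-around. A well-behaved frontend can never have more than
 * TX_RING_SIZE requests outstanding; anything larger means the shared
 * ring holds garbage and should be treated as fatal, not looped over. */
static bool ring_is_sane(uint32_t req_prod, uint32_t req_cons)
{
	return (uint32_t)(req_prod - req_cons) <= TX_RING_SIZE;
}
```

A producer index that has wrapped past zero still yields a small, sane difference, while a consumer index "ahead" of the producer produces a huge unsigned difference and is rejected.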
  12. @borkmann @gregkh

    net: sctp: sctp_endpoint_free: zero out secret key data

    borkmann committed with gregkh
    [ Upstream commit b5c37fe ]
    On sctp_endpoint_destroy, previously used sensitive keying material
    should be zeroed out before the memory is returned, as we already do
    with e.g. auth keys when released.
    Signed-off-by: Daniel Borkmann <>
    Acked-by: Vlad Yasevich <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  13. @borkmann @gregkh

    net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree

    borkmann committed with gregkh
    [ Upstream commit 6ba542a ]
    In sctp_setsockopt_auth_key, we create a temporary copy of the user
    passed shared auth key for the endpoint or association and after
    internal setup, we free it right away. Since it's sensitive data, we
    should zero out the key before returning the memory back to the
    allocator. Thus, use kzfree instead of kfree, just as we do in
    Signed-off-by: Daniel Borkmann <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
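Both SCTP changes follow the same pattern: wipe the key material, then free it. A userspace sketch of a `kzfree()`-like helper; `secure_zero()` and `zfree()` are hypothetical, and the volatile-pointer loop is a commonly used way to keep the compiler from eliding the wipe, not something the kernel patches use.

```c
#include <assert.h>
#include <stdlib.h>

/* Zero a buffer through a volatile pointer so the compiler cannot drop
 * the stores as dead just because the memory is about to be freed. */
static void secure_zero(void *p, size_t n)
{
	volatile unsigned char *vp = p;

	while (n--)
		*vp++ = 0;
}

/* Userspace analogue of kzfree(): wipe, then free. Unlike kzfree(), the
 * caller passes the length, since malloc() offers no portable way to
 * query an allocation's size. */
static void zfree(void *p, size_t n)
{
	if (!p)
		return;
	secure_zero(p, n);
	free(p);
}
```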
  14. @gregkh

    sctp: refactor sctp_outq_teardown to insure proper re-initalization

    Neil Horman committed with gregkh
    [ Upstream commit 2f94aab ]
    Jamie Parsons reported a problem recently, in which the re-initialization of an
    association (the duplicate init case) resulted in a loss of receive window
    space.  He tracked down the root cause to sctp_outq_teardown, which discarded
    all the data on an outq during a re-initialization of the corresponding
    association, but never reset the outq->outstanding_data field to zero.  I wrote,
    and he tested, this fix, which does a proper full re-initialization of the outq,
    fixing this problem, and hopefully future-proofing us from similar issues down
    the road.
    Signed-off-by: Neil Horman <>
    Reported-by: Jamie Parsons <>
    Tested-by: Jamie Parsons <>
    CC: Jamie Parsons <>
    CC: Vlad Yasevich <>
    CC: "David S. Miller" <>
    Acked-by: Vlad Yasevich <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  15. @gregkh

    atm/iphase: rename fregt_t -> ffreg_t

    Heiko Carstens committed with gregkh
    [ Upstream commit ab54ee8 ]
    We have conflicting type qualifiers for "freg_t" in s390's ptrace.h and the
    iphase atm device driver, which causes the compile error below.
    Unfortunately the s390 typedef can't be renamed, since it's a user visible api,
    nor can I change the include order in s390 code to avoid the conflict.
    So simply rename the iphase typedef to a new name. Fixes this compile error:
    In file included from drivers/atm/iphase.c:66:0:
    drivers/atm/iphase.h:639:25: error: conflicting type qualifiers for 'freg_t'
    In file included from next/arch/s390/include/asm/ptrace.h:9:0,
                     from next/arch/s390/include/asm/lowcore.h:12,
                     from next/arch/s390/include/asm/thread_info.h:30,
                     from include/linux/thread_info.h:54,
                     from include/linux/preempt.h:9,
                     from include/linux/spinlock.h:50,
                     from include/linux/seqlock.h:29,
                     from include/linux/time.h:5,
                     from include/linux/stat.h:18,
                     from include/linux/module.h:10,
                     from drivers/atm/iphase.c:43:
    next/arch/s390/include/uapi/asm/ptrace.h:197:3: note: previous declaration of 'freg_t' was here
    Signed-off-by: Heiko Carstens <>
    Acked-by: chas williams - CONTRACTOR <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  16. @gregkh

    packet: fix leakage of tx_ring memory

    Phil Sutter committed with gregkh
    [ Upstream commit 9665d5d ]
    When releasing a packet socket, the routine packet_set_ring() is reused
    to free rings instead of allocating them. But when calling it for the
    first time, it fills req->tp_block_nr with the value of rb->pg_vec_len
    which in the second invocation makes it bail out since req->tp_block_nr
    is greater than zero but req->tp_block_size is zero.
    This patch solves the problem by passing a zeroed auto-variable to
    packet_set_ring() upon each invocation from packet_release().
    As far as I can tell, this issue has existed since 69e3c75 (net: TX_RING
    and packet mmap), i.e. the original inclusion of TX ring support into
    af_packet, but applies only to sockets with both RX and TX ring
    allocated, which is probably why it went unnoticed all this time.
    Signed-off-by: Phil Sutter <>
    Cc: Johann Baudy <>
    Cc: Daniel Borkmann <>
    Acked-by: Daniel Borkmann <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
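The failure mode can be reproduced with a toy model in which one request structure is reused across two calls. `struct ring_req`, `set_ring()`, `rx_len` and `tx_len` are hypothetical, and modelling the write-back as a swap between the request and the ring's block count is an assumption based on the description above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down ring request with the two fields named above. */
struct ring_req {
	unsigned int tp_block_nr;
	unsigned int tp_block_size;
};

/* Hypothetical block counts for the RX and TX rings of one socket. */
static unsigned int rx_len, tx_len;

/* Toy packet_set_ring(): a request with zero blocks frees the ring, and
 * the old block count is written back into the request. A request with
 * blocks but no block size is rejected. */
static int set_ring(struct ring_req *req, unsigned int *vec_len)
{
	unsigned int old;

	if (req->tp_block_nr > 0 && req->tp_block_size == 0)
		return -1;                /* bail out: ring left allocated */
	old = *vec_len;
	*vec_len = req->tp_block_nr;      /* 0 here, i.e. ring freed */
	req->tp_block_nr = old;           /* side effect behind the bug */
	return 0;
}

/* Before the fix: one request reused for both rings. */
static void release_buggy(void)
{
	struct ring_req req;

	memset(&req, 0, sizeof(req));
	(void)set_ring(&req, &rx_len);  /* req now holds the old RX count */
	(void)set_ring(&req, &tx_len);  /* rejected: the TX ring leaks */
}

/* After the fix: a freshly zeroed request for every call. */
static void release_fixed(void)
{
	struct ring_req req;

	memset(&req, 0, sizeof(req));
	(void)set_ring(&req, &rx_len);
	memset(&req, 0, sizeof(req));
	(void)set_ring(&req, &tx_len);
}
```

The leak only appears when both rings are allocated: with an empty RX ring the write-back stores zero and the second call succeeds, which matches the observation that the bug went unnoticed.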
  17. @davem330 @gregkh

    via-rhine: Fix bugs in NAPI support.

    davem330 committed with gregkh
    [ Upstream commit 559bcac ]
    1) rhine_tx() should use dev_kfree_skb() not dev_kfree_skb_irq()
    2) rhine_slow_event_task's NAPI triggering logic is racy; it
       should just hit the interrupt mask register.  This is the
       same as commit 7dbb491
       ("r8169: avoid NAPI scheduling delay.") made to fix the same
       problem in the r8169 driver.  From Francois Romieu.
    Reported-by: Jamie Gloudon <>
    Tested-by: Jamie Gloudon <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  18. @gregkh

    ipv6: do not create neighbor entries for local delivery

    Marcelo Ricardo Leitner committed with gregkh
    [ Upstream commit bd30e94 ]
    They will be created at output, if ever needed. This avoids creating
    empty neighbor entries when TPROXYing/Forwarding packets for addresses
    that are not even directly reachable.
    Note that IPv4 already handles it this way. No neighbor entries are
    created for local input.
    Tested by myself and customer.
    Signed-off-by: Jiri Pirko <>
    Signed-off-by: Marcelo Ricardo Leitner <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  19. @gregkh

    pktgen: correctly handle failures when adding a device

    Cong Wang committed with gregkh
    [ Upstream commit 604dfd6 ]
    The return value of pktgen_add_device() is not checked, so
    even if we fail to add some device, for example a non-existent one,
    we still see "OK:...". This patch fixes it.
    After this patch, I got:
    	# echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0
    	-bash: echo: write error: No such device
    	# cat /proc/net/pktgen/kpktgend_0
    	Result: ERROR: can not add device non-exist
    	# echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
    	# cat /proc/net/pktgen/kpktgend_0
    	Stopped: eth0
    	Result: OK: add_device=eth0
    (Candidate for -stable)
    Cc: David S. Miller <>
    Signed-off-by: Cong Wang <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  20. @gregkh

    net: loopback: fix a dst refcounting issue

    Eric Dumazet committed with gregkh
    [ Upstream commit 794ed39 ]
    Ben Greear reported crashes in ip_rcv_finish() on a stress
    test involving many macvlans.
    We tracked the bug to a dst use after free. ip_rcv_finish()
    was calling dst->input() and got garbage for dst->input value.
    It appears the bug is in loopback driver, lacking
    a skb_dst_force() before calling netif_rx().
    As a result, a non refcounted dst, normally protected by a
    RCU read_lock section, was escaping this section and could
    be freed before the packet being processed.
      [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
      [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
      [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
      [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
      [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
      [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
      [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
      [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
      [<ffffffff814ad90e>] ip_finish_output+0x63/0x68
      [<ffffffff814ae412>] ip_output+0x61/0x67
      [<ffffffff814ab904>] dst_output+0x17/0x1b
      [<ffffffff814adb6d>] ip_local_out+0x1e/0x23
      [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
      [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
      [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
      [<ffffffff814c3571>] tcp_connect+0x53c/0x587
      [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
      [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
      [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
      [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
      [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
      [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
      [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
      [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
      [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
      [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
      [<ffffffff814632c6>] sys_connect+0x75/0x98
    This bug was introduced in linux-2.6.35, in commit
    7fee226 (net: add a noref bit on skb dst).
    skb_dst_force() is enforced in dev_queue_xmit() for devices having a qdisc.
    Reported-by: Ben Greear <>
    Signed-off-by: Eric Dumazet <>
    Tested-by: Ben Greear <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  21. @fabled @gregkh

    r8169: remove the obsolete and incorrect AMD workaround

    fabled committed with gregkh
    [ Upstream commit 5d0feaf ]
    This was introduced in commit 6dccd16 "r8169: merge with version
    6.001.00 of Realtek's r8169 driver". I did not find the version
    6.001.00 online, but in 6.002.00 or any later r8169 from Realtek
    this hunk is no longer present.
    Also commit 05af214 "r8169: fix Ethernet Hangup for RTL8110SC
    rev d" claims to have fixed this issue otherwise.
    The magic compare mask of 0xfffe000 is dubious as it masks
    parts of the Reserved part, and parts of the VLAN tag. But this
    does not make much sense as the VLAN tag parts are perfectly
    valid there. As a matter of fact, this seems to be triggered with
    any VLAN tagged packet, as the RxVlanTag bit is matched. I would
    suspect 0xfffe0000 was intended to test the reserved part only.
    Finally, this hunk is evil as it can cause more packets to be
    handled than the NAPI quota allows, causing the
    WARN_ON_ONCE(work > weight) in net/core/dev.c: net_rx_action() to
    trigger, and messing up the NAPI state so that the device hangs.
    As a result, any system using VLANs and having high receive
    traffic (so that the NAPI poll budget limits rtl_rx) would
    see the device hang.
    Signed-off-by: Timo Teräs <>
    Acked-by: Francois Romieu <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
  22. @gregkh

    netxen: fix off by one bug in netxen_release_tx_buffer()

    Eric Dumazet committed with gregkh
    [ Upstream commit a05948f ]
    Christoph Paasch found netxen could trigger a BUG in its dismantle
    phase, in netxen_release_tx_buffer(), using full size TSO packets.
    cmd_buf->frag_count includes the skb->data part, so the loop must
    start at index 1 instead of 0, or else we can make an out-of-bounds
    access to cmd_buff->frag_array[MAX_SKB_FRAGS + 2].
    Christoph provided the fixes in the netxen_map_tx_skb() function.
    In case of a dma mapping error, it's better to clear the dma fields
    so that we don't try to unmap them again in netxen_release_tx_buffer().
    Reported-by: Christoph Paasch <>
    Signed-off-by: Eric Dumazet <>
    Tested-by: Christoph Paasch <>
    Cc: Sony Chacko <>
    Cc: Rajesh Borundia <>
    Signed-off-by: Christoph Paasch <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
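A sketch of the corrected loop bounds, assuming entry 0 of the fragment array describes the linear `skb->data` part and `frag_count` includes it. `struct cmd_buf`, `MAX_FRAGS` and `release_tx_buffer()` are hypothetical simplifications; the real structure holds DMA addresses rather than flags.

```c
#include <assert.h>

#define MAX_FRAGS 4  /* hypothetical stand-in for MAX_SKB_FRAGS */

/* Hypothetical command buffer: entry 0 describes the linear skb->data
 * part, entries 1 .. frag_count-1 the page fragments. */
struct cmd_buf {
	int frag_count;                 /* includes the linear part */
	int unmapped[MAX_FRAGS + 1];    /* 1 once an entry has been released */
};

static void release_tx_buffer(struct cmd_buf *cb)
{
	assert(cb->frag_count >= 1 && cb->frag_count <= MAX_FRAGS + 1);

	cb->unmapped[0] = 1;            /* linear part, released on its own */

	/* Page fragments start at index 1. A loop that ran frag_count times
	 * over the fragments alone would visit one entry too many, because
	 * frag_count already counts the linear part. */
	for (int i = 1; i < cb->frag_count; i++)
		cb->unmapped[i] = 1;
}
```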
  23. @tilmanschmidt @gregkh

    isdn/gigaset: fix zero size border case in debug dump

    tilmanschmidt committed with gregkh
    [ Upstream commit d721a17 ]
    If subtracting 12 from l leaves zero we'd do a zero size allocation,
    leading to an oops later when we try to set the NUL terminator.
    Reported-by: Dan Carpenter <>
    Signed-off-by: Tilman Schmidt <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
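The same border case shows up in any code that sizes a buffer from a length minus a fixed header. A sketch with a hypothetical `hex_dump_tail()` helper, where `skip` plays the role of the 12 subtracted in the commit. Returning early when nothing is left avoids the zero-size allocation: in userspace `malloc(0)` may return a non-NULL pointer to zero usable bytes, much as the kernel's `kmalloc(0)` returns a special non-NULL pointer, so writing a NUL through it is out of bounds.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format bytes [skip, len) as space-separated hex into a new string, or
 * return NULL when there is nothing left to dump. */
static char *hex_dump_tail(const unsigned char *data, size_t len, size_t skip)
{
	if (len <= skip)
		return NULL;            /* nothing left: never allocate 0 bytes */

	size_t n = len - skip;
	char *out = malloc(3 * n + 1);  /* "xx " per byte plus a NUL */

	if (!out)
		return NULL;
	for (size_t i = 0; i < n; i++)
		snprintf(out + 3 * i, 4, "%02x ", data[skip + i]);
	out[3 * n - 1] = '\0';          /* drop the trailing space */
	return out;
}
```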
  24. @gregkh

    net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults

    Or Gerlitz committed with gregkh
    [ Upstream commit ca4c7b3 ]
    The lines
    	if (mlx4_is_mfunc(dev)) {
    		nreq = 2;
    	} else {
    which hard code the number of requested msi-x vectors under multi-function
    mode to two can be removed completely, since the firmware sets num_eqs and
    reserved_eqs appropriately. Thus, the code line:
    	nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs, nreq);
    is by itself sufficient and correct for all cases. Currently, for mfunc
    mode num_eqs = 32 and reserved_eqs = 28, hence four vectors will be enabled.
    This triples (one vector is used for the async events and commands EQ) the
    horsepower provided for processing of incoming packets on the netdev RSS scheme,
    IO initiators/targets commands processing flows, etc.
    Reviewed-by: Jack Morgenstein <>
    Signed-off-by: Amir Vadai <>
    Signed-off-by: Or Gerlitz <>
    Signed-off-by: David S. Miller <>
    Signed-off-by: Greg Kroah-Hartman <>
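The vector arithmetic from the mlx4 change, as a small check. `msix_vectors()` is a hypothetical helper mirroring the quoted `min_t()` line, and a driver request of 16 vectors is an illustrative assumption.

```c
#include <assert.h>

/* MSI-X vectors actually requested: the driver's wish (nreq) capped by
 * what firmware makes available, num_eqs - reserved_eqs. */
static int msix_vectors(int num_eqs, int reserved_eqs, int nreq)
{
	int avail = num_eqs - reserved_eqs;

	return avail < nreq ? avail : nreq;
}
```

With num_eqs = 32 and reserved_eqs = 28, `msix_vectors(32, 28, 16)` gives 4: one for the async/command EQ and three for completions, versus a single completion vector under the old hard-coded `nreq = 2`, hence "triples".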