Commits on Dec 5, 2011
  1. sysctl: add register_net_sysctl_table_net_cookie

    authored
    Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
  2. sysctl: add cookie to __register_sysctl_paths

    authored
    Extend the sysctl registration APIs to receive a cookie + cookie handler.
    
    Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
  3. sysctl: replace netns corresp list with rbtree

    authored
    Similar to the last patch that replaced the subdirectory list with a rbtree.
    
    Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
  4. sysctl: faster tree-based sysctl implementation

    authored
    The old implementation used inefficient algorithms for lookup, readdir
    and registration.
    
    This patch introduces an improved algorithm:
    - lower memory consumption,
    - better time complexity for lookup/readdir/registration.
    
    Locking is a bit heavier in this algorithm (in this patch: reader
    locks for lookup/readdir, writer locks for register/unregister; in a
    later patch in this series: RCU + spin-lock). I'll address this
    locking issue later in this commit.
    
    I will shortly describe the previous algorithm, the new one and brag
    at the end with an endless list of improvements and new limitations.
    
    = Old algorithm =
    
    == Description ==
    We created a ctl_table_header for each registered sysctl table. The
    header's role is to maintain sysctl internal data, reference counting
    and as a token to unregister the table.
    
    All headers were put in a list in the order of registration without
    regard to the position of the tables in the sysctl tree. Headers were
    also 'attached' one to another to (somewhat) speed up lookup/readdir.
    
    Attaching a header meant looking at each other already registered
    header and comparing the paths to the tables. A newly registered
    header would be attached to the first header with which it would share
    most of its path.
    
    e.g. paths registered: /, /a/b/c, /a/b/c/d, /a/x, /a/x/y, /a/z
         tree:
      /
      + /a/b/c
         |   + /a/b/c/d
         + /a/x
         | /a/x/y
         + /a/z
    
    == Time complexity ==
    
    - register N tables would take O(N^2) steps (see above)
    
    - lookup: if the item searched for is not found in the current header,
      iterate the list of headers until you find another header that's
      attached to the current position in the header's table. Lookups for
      elements that are in a header registered under the current position
      or nonexistent elements would take O(N) steps each.
    
    - readdir: after searching the current header's table in the current
      position, always do an O(N) search for a header attached to the
      current table position.
    
    == Memory ==
    
    Each header was allocated some data and a variable-length path.
    O(1) with kzalloc/kfree.
    
    = New algorithm =
    
    == Description ==
    
    Reuses the 'ctl_table_header' concept but with two distinct meanings:
    - as a wrapper of a table registered by the user
    - as a directory entry.
    
    Registering the paths from the above example gives this tree:
     paths: /, /a/b/c, /a/b/c/d, /a/x, /a/x/y, /a/z
     tree:
         /: .subdirs = a
           a: .subdirs = b x z
             b: .subdirs = c
               c: .subdirs = d
                 d:
             x: .subdirs = y
               y:
             z:
    
    Each directory gets a header. Each header has a parent (except root)
    and two lists:
     - ctl_subdirs: list of sub-directories - other headers
     - ctl_tables: list of headers that wrap a ctl_table array
    
    Because the directory structure is now maintained as ctl_table_header
    objects, we needed to remove the .child from ctl_tables (this explains
    the previous patches). A ctl_table array represents a list of files.
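
    The directory tree described above can be sketched in userspace C. This
    is only an illustration under simplifying assumptions: struct dir_header,
    dir_find, dir_get_or_create and register_path are hypothetical names, not
    the kernel's, and locking, table wrappers and netns handling are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a directory ctl_table_header. */
struct dir_header {
	char name[32];
	int refs;                    /* number of registrations using this dir */
	struct dir_header *parent;   /* NULL for the root */
	struct dir_header *subdirs;  /* head of the list of sub-directories */
	struct dir_header *next;     /* next sibling in the parent's list */
};

/* Linear scan of the subdir list: O(len(subdirs)). */
static struct dir_header *dir_find(struct dir_header *dir, const char *name)
{
	struct dir_header *d;

	for (d = dir->subdirs; d; d = d->next)
		if (strcmp(d->name, name) == 0)
			return d;
	return NULL;
}

/* Reuse an existing directory (bumping its refcount) or create a new one. */
static struct dir_header *dir_get_or_create(struct dir_header *parent,
					    const char *name)
{
	struct dir_header *d = dir_find(parent, name);

	if (d) {
		d->refs++;
		return d;
	}
	d = calloc(1, sizeof(*d));
	assert(d != NULL);
	strncpy(d->name, name, sizeof(d->name) - 1);
	d->refs = 1;
	d->parent = parent;
	d->next = parent->subdirs;
	parent->subdirs = d;
	return d;
}

/* Register a path given as a NULL-terminated array of components. */
static struct dir_header *register_path(struct dir_header *root,
					const char *const *path)
{
	struct dir_header *dir = root;

	for (; *path; path++)
		dir = dir_get_or_create(dir, *path);
	return dir;
}
```

    Registering /a/b/c, /a/x and /a/z with this sketch leaves a single 'a'
    directory with a refcount of 3 and three sub-directories.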
    
    == Time complexity ==
    
    - registration of N headers. Registration means adding new directories
      at each level or incrementing an existing directory's refcount.
    
      - O(N * lnN) - if the paths to the headers are evenly distributed
    
      - O(N^2) - if most of the headers registered are children of the
        same parent directory (searching the list of subdirs takes O(N)).
        There are cases where this happens (e.g. registering sysctl
        entries for net devices under /proc/sys/net/ipv4|6/conf/device).
    
        A few later patches will add an optimisation, to fix locations
        that might trigger the O(N^2) issue.
    
    - lookup: O(len(subdirs) + sum(len(tarr) for each tarr in ctl_tables))
      - could be made better:
         - sort ctl_subdirs (for binary search)
         - replace ctl_subdirs with a hash-table (increase memory footprint)
         - sort ctl_table entries at registration time (for binary search).
        Could be done, but I'm too lazy to do it now.
    
    - readdir: O(len(subdirs) + sum(len(tarr) for each tarr in ctl_tables))
       - can't get any better than this :)
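
    To make the lookup cost concrete, here is a small userspace sketch that
    counts comparisons. The types (struct entry, struct table_wrapper,
    struct dir) and dir_lookup are hypothetical simplifications, not the
    kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical file entry; a NULL name terminates an array. */
struct entry {
	const char *name;
};

/* Hypothetical wrapper around one registered array of entries. */
struct table_wrapper {
	const struct entry *tarr;
	const struct table_wrapper *next;
};

struct dir {
	const char *const *subdirs;          /* NULL-terminated names */
	const struct table_wrapper *tables;  /* list of wrapped arrays */
};

/*
 * Returns 1 if name is a sub-directory, 2 if it is a file, 0 if missing.
 * *steps receives the number of name comparisons performed.
 */
static int dir_lookup(const struct dir *d, const char *name, size_t *steps)
{
	const char *const *s;
	const struct table_wrapper *t;
	const struct entry *e;

	assert(d != NULL && steps != NULL);
	*steps = 0;
	for (s = d->subdirs; *s; s++) {
		(*steps)++;
		if (strcmp(*s, name) == 0)
			return 1;
	}
	for (t = d->tables; t; t = t->next) {
		for (e = t->tarr; e->name; e++) {
			(*steps)++;
			if (strcmp(e->name, name) == 0)
				return 2;
		}
	}
	return 0;
}
```

    A miss always walks every sub-directory name and every entry of every
    table, which is the bound stated above.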
    
    == Memory complexity ==
    
    Although we create more ctl_table_header (one for each directory, one
    for each table, and because we deleted the .child from ctl_table there
    are more tables registered than before this patch) we remove the need
    to store a full path (from the top to the table) as was done in the old
    solution => a small O(N) memory gain with respect to the old algorithm.
    
    = Limitations =
    
    == ctl_table will lose .child => some code uglification ==
    
    Registering tables with multiple directories and files cannot be done
    in a single operation: there must be at least a table registered for
    each directory. This makes code that registers sysctls uglier.
    
    The first patches in this series made the conversion from paths
    encoded with .child to paths specified by ctl_path. Later patches will
    convert all users of .child to ctl_path and the conversion layer will
    be deleted.
    
    == Handling of netns specific paths is weirder ==
    
    The algorithm descriptions from above are simplified. In reality the
    code needs to handle directories and files that must be visible only
    in some net namespaces. E.g. the /proc/sys/net/ipv4/conf/DEVICENAME/
    directory and its files must be visible only in the netns of the
    'DEVICENAME' device.
    
    The old algorithm used secondary lists that indexed all netns specific
    headers (one such list per netns). The old-algorithm description is
    still valid, with the mention that besides searching the global list,
    the algorithm would also look into the current netns' list of
    headers. This scales perfectly with respect to the number of network
    namespaces.
    
    The new algorithm does something similar, but a bit more complicated.
    We also use netns specific lists of directories/tables and store them
    in a special directory ctl_table_header (which I dubbed the
    "netns-correspondent" of another directory - I'm not very pleased with
    the name either).
    
    When registering a netns specific table, we will create a
    "netns-correspondent" to the last directory that is not netns specific
    in that path.
    
    E.g.: we're registering a netns specific table for 'lo':
          common path: /proc/sys/net/ipv4/
           netns path: /proc/sys/net/ipv4/conf/lo/
    
       We'll create an (unnamed) netns correspondent for 'ipv4' which will
       have 'conf' as its subdir.
    
    E.g.: We're registering a netns specific file in /proc/sys/net/core/somaxconn
          common path: /proc/sys/net/core/
           netns path: /proc/sys/net/core/
    
    We'll create an (unnamed) netns correspondent for 'core' with the
    table containing 'somaxconn' in ctl_tables.
    
    All netns correspondents of one netns are held in a single list, and
    each netns gets its own list. This keeps the algorithm complexity
    independent of the number of network namespaces (as was the old one).
    
    However, now only a smaller part of directories are members of this
    list, improving register/lookup/readdir time complexity.
    
    There is one ugly limitation that stems from this approach.
    E.g.: register these files in this order:
     - register common         /dir1/file-common1
     - register netns specific /dir1/dir2/file-netns
     - register common         /dir1/dir2/file-common2
    
      We'll have this tree:
       'dir1' { .subdirs = ['dir2'], .tables = ['file-common1'] }
         ^                    |
         |                    -> { .subdirs = [], .tables = ['file-common2'] }
         |
         | (unnamed netns-corresp for dir1)
         -> { .subdirs = ['dir2'] }
                            |
                            -> { .subdirs = [], .tables = ['file-netns'] }
    
    readdir: when we list the contents of 'dir1' we'll see it has two
             sub-directories named 'dir2' each with a file in it.
    
    lookup: lookup of /dir1/dir2/file-netns will not work because we find
            'dir2' as a subdir of 'dir1' and stick with it and never look
            into the netns correspondent of 'dir1'.
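
    The lookup failure can be reproduced with a small userspace model. This
    is a sketch under assumptions: struct node and the helper functions are
    hypothetical, and the only rule modelled is "search the common subdirs
    first and consult the netns correspondent only on a miss".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory node; corresp is its netns correspondent. */
struct node {
	const char *name;                   /* NULL for an unnamed correspondent */
	const struct node *const *subdirs;  /* NULL-terminated, may be NULL */
	const char *const *files;           /* NULL-terminated, may be NULL */
	const struct node *corresp;         /* may be NULL */
};

static const struct node *find_subdir(const struct node *dir, const char *name)
{
	const struct node *const *s;

	for (s = dir->subdirs; s && *s; s++)
		if (strcmp((*s)->name, name) == 0)
			return *s;
	return NULL;
}

static int has_file(const struct node *dir, const char *name)
{
	const char *const *f;

	for (f = dir->files; f && *f; f++)
		if (strcmp(*f, name) == 0)
			return 1;
	return 0;
}

/* A common sub-directory wins; the correspondent is tried only on a miss. */
static const struct node *lookup_dir(const struct node *dir, const char *name)
{
	const struct node *d = find_subdir(dir, name);

	if (!d && dir->corresp)
		d = find_subdir(dir->corresp, name);
	return d;
}

static int lookup_file(const struct node *dir, const char *name)
{
	if (has_file(dir, name))
		return 1;
	return dir->corresp ? has_file(dir->corresp, name) : 0;
}

/* Resolve dir/sub/file; returns 1 when the file is reachable. */
static int resolve(const struct node *dir, const char *sub, const char *file)
{
	const struct node *d;

	assert(dir != NULL);
	d = lookup_dir(dir, sub);
	return d ? lookup_file(d, file) : 0;
}
```

    In this model /dir1/dir2/file-netns resolves while only the netns 'dir2'
    exists, and stops resolving once a common 'dir2' is added to 'dir1'.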
    
    This can be fixed in two ways:
    
    - A) by making sure to never register a netns specific directory and
      after that register that directory as a common one. From what I can
      tell there isn't such a problem in the kernel at the moment, but I
      did not study the source in detail.
    
    - B) by increasing the complexity of the code:
    
      - readdir: looking at both lists and comparing if we have already
                 listed a directory as common, so we don't list twice.
                 -> For imbalanced trees this can make readdir O(N^2) :(
    
      - register: the netns 'dir2' from the example above needs to be
                  connected to the common 'dir2' when 'dir2' is
                  registered. I'm not even going to think of how time
                  complexity/ugliness is going to explode here.
    
    A later patch will implement version A): checks to make sure the
    registration order is maintained (a non-netns specific directory will
    not be added after netns specific directory with the same path was
    already added).
    
    = Change summary =
    
    * include/linux/sysctl.h
      - removed _set and _root, replaced with _group
    
      - netns correspondent directories are held in each netns's
        group->corresp_list
    
      - reused the header structure to represent directories which don't
        use ctl_table_arg, but store the directory name directly.
    
      - each directory header also gets two lists: subdirs and tables
    
    * fs/proc/proc_sysctl.c
      - a proc inode has ->sysctl_entry set only for files, not
        directories as these store the dirname directly
    
      - lookup:
         - take the dirs read-lock and iterate through subdirs and tables
         - if nothing is found, try the dir's netns-correspondent
    
      - scan: list every subdir and file that was not listed before
    
      - readdir: scan the current dir and its netns correspondent
    
    * kernel/sysctl.c
      - inlines the code of use_table/unuse_table as they are not used
        elsewhere (they used to be called from __register, but are not any more)
    
      - adds routines to get/set the netns-correspondent
    
      - adds routines to protect the subdirs/tables lists (rwsem)
    
      - __register_sysctl_paths:
        - preallocate ctl_table_header for every dir in 'path'
        - increase the ctl_header_refs of every existing directory
        - if the group needs a netns-correspondent it is created for the
          last existing directory that is part of the non-netns specific
          path.
        - all the non-existing directories are added as children of their
          parent's subdir lists.
    
       - unregister:
         - wait until no one uses the header
         - for normal directories and table-wrapper headers take the
           parent's write lock to be able to delete something from one of
           its lists (ctl_subdirs or ctl_tables).
         - netns-correspondent headers must take the netns group list lock
           before deleting.
    
    Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
  5. sysctl: introduce ctl_table_group and ctl_table_group_ops

    authored
    ctl_table_group will in the future replace ctl_table_root and ctl_table_set.
    
    Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
  6. sysctl: simplify ->permissions hook

    authored
    The @root parameter was not used at all.
    
    The @namespaces parameter was used to transmit current->nsproxy. We
    can access current->nsproxy directly in the ->permissions function, no
    need to send it.
    
    Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
  7. sysctl: call sysctl_init before the first sysctl registration

    authored
    In the next patch key_init() will be changed to register a sysctl
    table. In preparation, we call sysctl_init() before it.
    
    Also, rename net/sysctl_net.c's sysctl_init so the two don't clash.
    
    Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Commits on Nov 22, 2011
  1. SUNRPC: Ensure we return EAGAIN in xs_nospace if congestion is cleared

    Trond Myklebust authored
    By returning '0' instead of 'EAGAIN' when the tests in xs_nospace() fail
    to find evidence of socket congestion, we are making the RPC engine believe
    that the message was incorrectly sent and so it disconnects the socket
    instead of just retrying.
    
    The bug appears to have been introduced by commit
    5e3771c (SUNRPC: Ensure that xs_nospace
    return values are propagated).
    
    Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Cc: stable@vger.kernel.org [>= 2.6.30]
    Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
  2. @torvalds

    Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/l…

    torvalds authored
    …inux-nfs
    
    * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
      NFS: Revert pnfs ugliness from the generic NFS read code path
      SUNRPC: destroy freshly allocated transport in case of sockaddr init error
      NFS: Fix a regression in the referral code
      nfs: move nfs_file_operations declaration to bottom of file.c (try #2)
      nfs: when attempting to open a directory, fall back on normal lookup (try #5)
Commits on Nov 21, 2011
  1. @torvalds

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel…

    torvalds authored
    …/git/sage/ceph-client
    
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
      libceph: Allocate larger oid buffer in request msgs
      ceph: initialize root dentry
      ceph: fix iput race when queueing inode work
Commits on Nov 18, 2011
  1. @davem330

    ipv4: fix redirect handling

    Eric Dumazet authored davem330 committed
    commit f39925d (ipv4: Cache learned redirect information in
    inetpeer.) introduced a regression in ICMP redirect handling.
    
    It assumed ipv4_dst_check() would be called because all possible routes
    were attached to the inetpeer we modify in ip_rt_redirect(), but that's
    not true.
    
    commit 7cc9150 (route: fix ICMP redirect validation) tried to fix
    this but the solution was not complete. (It fixed only one route)
    
    So we must lookup existing routes (including different TOS values) and
    call check_peer_redir() on them.
    
    Reported-by: Ivan Zahariev <famzah@icdsoft.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    CC: Flavio Leitner <fbl@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  2. @davem330

    ping: dont increment ICMP_MIB_INERRORS

    Eric Dumazet authored davem330 committed
    ping module incorrectly increments ICMP_MIB_INERRORS if fed with a
    frame not belonging to its own sockets.
    
    RFC 2011 states that ICMP_MIB_INERRORS should count "the number of ICMP
    messages which the entity received but determined as having
    ICMP-specific errors (bad ICMP checksums, bad length, etc.)."
    
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    CC: Vasiliy Kulikov <segoon@openwall.com>
    Acked-by: Flavio Leitner <fbl@redhat.com>
    Acked-by: Vasiliy Kulikov <segoon@openwall.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Nov 16, 2011
  1. @davem330

    bridge: correct IPv6 checksum after pull

    stephen hemminger authored davem330 committed
    Bridge multicast snooping of ICMPv6 would incorrectly report a checksum problem
    when used with Ethernet devices like sky2 that use CHECKSUM_COMPLETE.
    When bytes are removed from skb, the computed checksum needs to be adjusted.
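
    The adjustment can be illustrated with plain ones' complement
    arithmetic. This is a userspace sketch, not the bridge code: the
    sketch_* helpers are hypothetical names, and the subtraction shown is
    only valid when the removed prefix has an even length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum a buffer as big-endian 16-bit words (unfolded ones' complement sum). */
static uint32_t sketch_csum_partial(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)                       /* odd trailing byte */
		sum += (uint32_t)buf[i] << 8;
	return sum;
}

/* Fold the carries back in until the sum fits in 16 bits. */
static uint16_t sketch_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

static uint16_t sketch_csum(const uint8_t *buf, size_t len)
{
	return sketch_csum_fold(sketch_csum_partial(buf, len));
}

/* Ones' complement subtraction: csum - removed == csum + ~removed. */
static uint16_t sketch_csum_sub(uint16_t csum, uint16_t removed)
{
	return sketch_csum_fold((uint32_t)csum + (uint16_t)~removed);
}
```

    After removing the first 8 bytes of a 16-byte buffer, the sum computed
    over the whole buffer no longer matches the remaining data until the sum
    of the removed bytes has been subtracted from it.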
    
    Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
    Tested-by: Martin Volf <martin.volf.42@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  2. @davem330

    tcp: clear xmit timers in tcp_v4_syn_recv_sock()

    Eric Dumazet authored davem330 committed
    Simon Kirby reported divides by zero errors in __tcp_select_window()
    
    This happens when inet_csk_route_child_sock() returns a NULL pointer :
    
    We free new socket while we eventually armed keepalive timer in
    tcp_create_openreq_child()
    
    Fix this by a call to tcp_clear_xmit_timers()
    
    [ This is a followup to commit 918eb39 (net: add missing
    bh_unlock_sock() calls) ]
    
    Reported-by: Simon Kirby <sim@hostway.ca>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Tested-by: Simon Kirby <sim@hostway.ca>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Nov 15, 2011
  1. @jjuhl @davem330

    net/packet: Revert incorrect dead-code changes to prb_setup_retire_bl…

    jjuhl authored davem330 committed
    …k_timer
    
    Signed-off-by: Jesper Juhl <jj@chaosbits.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Nov 14, 2011
  1. @davem330
  2. @zenczykowski @davem330

    net-netlink: Add a new attribute to expose TCLASS values via netlink

    zenczykowski authored davem330 committed
    commit 3ceca74 added a TOS attribute.
    
    Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket,
    furthermore they can have different values.  As such one cannot in a
    sane way expose both through a single attribute.
    
    Signed-off-by: Maciej Żenczyowski <maze@google.com>
    CC: Murali Raja <muralira@google.com>
    CC: Stephen Hemminger <shemminger@vyatta.com>
    CC: Eric Dumazet <eric.dumazet@gmail.com>
    CC: David S. Miller <davem@davemloft.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  3. @avagin @davem330

    bridge: Fix potential deadlock on br->multicast_lock

    avagin authored davem330 committed
    multicast_lock is taken in softirq context, so we should use
    spin_lock_bh() in userspace.
    
    call-chain in softirq context:
    run_timer_softirq()
    	br_multicast_query_expired()
    
    call-chain in userspace:
    sysfs_write_file()
    	store_multicast_snooping()
    		br_multicast_toggle()
    
    Signed-off-by: Andrew Vagin <avagin@openvz.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  4. @davem330

    ip6_tunnel: copy parms.name after register_netdevice

    Josh Boyer authored davem330 committed
    Commit 1c5cae8 removed an explicit call to dev_alloc_name in ip6_tnl_create
    because register_netdevice will now create a valid name.  This works for the
    net_device itself.
    
    However the tunnel keeps a copy of the name in the parms structure for the
    ip6_tnl associated with the tunnel.  parms.name is set by copying the net_device
    name in ip6_tnl_dev_init_gen.  That function is called from ip6_tnl_dev_init in
    ip6_tnl_create, but it is done before register_netdevice is called so the name
    is set to a bogus value in the parms.name structure.
    
    This shows up if you do a simple tunnel add, followed by a tunnel show:
    
    [root@localhost ~]# ip -6 tunnel add remote fec0::100 local fec0::200
    [root@localhost ~]# ip -6 tunnel show
    ip6tnl0: ipv6/ipv6 remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
    ip6tnl%d: ipv6/ipv6 remote fec0::100 local fec0::200 encaplimit 4 hoplimit 64 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
    [root@localhost ~]#
    
    Fix this by moving the strcpy out of ip6_tnl_dev_init_gen, and calling it after
    register_netdevice has successfully returned.
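
    The ordering problem can be shown with a userspace model. It is a sketch
    under assumptions: struct sketch_dev, struct sketch_tunnel and
    sketch_register are hypothetical stand-ins, and sketch_register merely
    expands a "%d" name template the way registration is described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

struct sketch_dev {
	char name[NAME_LEN];
};

struct sketch_tunnel {
	struct sketch_dev dev;
	char parms_name[NAME_LEN];   /* the tunnel's own copy of the name */
};

/* Stand-in for registration: turns a "%d" template into a concrete name. */
static void sketch_register(struct sketch_dev *dev, int index)
{
	char *p = strstr(dev->name, "%d");

	assert(index >= 0);
	if (p)
		snprintf(p, sizeof(dev->name) - (size_t)(p - dev->name),
			 "%d", index);
}

/* Old order: the copy is taken while the name is still a template. */
static void create_copy_before_register(struct sketch_tunnel *t, int index)
{
	strcpy(t->dev.name, "ip6tnl%d");
	strcpy(t->parms_name, t->dev.name);
	sketch_register(&t->dev, index);
}

/* Fixed order: the copy is taken after the name has been resolved. */
static void create_copy_after_register(struct sketch_tunnel *t, int index)
{
	strcpy(t->dev.name, "ip6tnl%d");
	sketch_register(&t->dev, index);
	strcpy(t->parms_name, t->dev.name);
}
```

    With the old order the copy keeps the literal "ip6tnl%d", matching the
    bogus name in the tunnel show output above.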
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Josh Boyer <jwboyer@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  5. @pebolle @davem330

    rds: drop "select LLIST"

    pebolle authored davem330 committed
    Commit 1bc144b ("net, rds, Replace xlist in net/rds/xlist.h with
    llist") added "select LLIST" to the RDS_RDMA Kconfig entry. But there is
    no Kconfig symbol named LLIST. The select statement for that symbol is a
    nop. Drop it.
    
    lib/llist.o is builtin, so all that's needed to use the llist
    functionality is to include linux/llist.h, which this commit also did.
    
    Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  6. @jjuhl @davem330

    net/packet: remove dead code and unneeded variable from prb_setup_ret…

    jjuhl authored davem330 committed
    …ire_blk_timer()
    
    We test for 'tx_ring' being != zero and BUG() if that's the case. So after
    that check there is no way that 'tx_ring' could be anything _but_ zero, so
    testing it again is just dead code. Once that dead code is removed, the
    'pkc' local variable becomes entirely redundant, so remove that as well.
    
    Signed-off-by: Jesper Juhl <jj@chaosbits.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Nov 12, 2011
  1. @davem330

    ah: Don't return NET_XMIT_DROP on input.

    Nick Bowler authored davem330 committed
    When the ahash driver returns -EBUSY, AH4/6 input functions return
    NET_XMIT_DROP, presumably copied from the output code path.  But
    returning transmit codes on input doesn't make a lot of sense.
    Since NET_XMIT_DROP is a positive int, this gets interpreted as
    the next header type (i.e., success).  As that can only end badly,
    remove the check.
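
    A minimal sketch of why a positive return value is misread on input. The
    constants and classify_input_result are assumptions made for this
    illustration; only the convention (negative means error, anything else
    is a next header type) is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values; assumptions made for this sketch only. */
#define SKETCH_NET_XMIT_DROP 1   /* a positive transmit status code */
#define SKETCH_EBUSY 16          /* errno-style value, returned negated */

enum verdict {
	VERDICT_ERROR,
	VERDICT_NEXT_HEADER
};

/* Input convention: negative is an error, non-negative is a header type. */
static enum verdict classify_input_result(int ret, int *nexthdr)
{
	assert(nexthdr != NULL);
	if (ret < 0)
		return VERDICT_ERROR;
	*nexthdr = ret;
	return VERDICT_NEXT_HEADER;
}
```

    Passing SKETCH_NET_XMIT_DROP yields VERDICT_NEXT_HEADER with a header
    type of 1, while -SKETCH_EBUSY is correctly classified as an error.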
    
    Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Nov 11, 2011
  1. @psomas @liewegas

    libceph: Allocate larger oid buffer in request msgs

    psomas authored liewegas committed
    ceph_osd_request struct allocates a 40-byte buffer for object names.
    RBD image names can be up to 96 chars long (100 with the .rbd suffix),
    which results in the object name for the image being truncated, and a
    subsequent map failure.
    
    Increase the oid buffer in request messages, in order to avoid the
    truncation.
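
    The truncation is easy to reproduce in userspace. This sketch assumes a
    hypothetical format_oid helper built on snprintf; the 40-byte size comes
    from the description above, while the larger buffer size used in the
    usage note is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<image>.rbd" into buf.
 * Returns 1 if the whole name fit, 0 if it was truncated.
 */
static int format_oid(char *buf, size_t buflen, const char *image)
{
	int n;

	assert(buf != NULL && buflen > 0 && image != NULL);
	n = snprintf(buf, buflen, "%s.rbd", image);
	return n >= 0 && (size_t)n < buflen;
}
```

    With a 96-character image name, a 40-byte buffer keeps only the first 39
    characters, whereas a 128-byte buffer holds all 100.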
    
    Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
    Signed-off-by: Sage Weil <sage@newdream.net>
  2. @linvjw

    Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/gi…

    linvjw authored
    …t/linville/wireless into for-davem
Commits on Nov 10, 2011
  1. SUNRPC: destroy freshly allocated transport in case of sockaddr init …

    Stanislav Kinsbursky authored Trond Myklebust committed
    …error
    
    Otherwise we will leak xprt structure and struct net reference.
    
    Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Commits on Nov 9, 2011
  1. @davem330
  2. @davem330

    ipv4: fix for ip_options_rcv_srr() daddr update.

    Li Wei authored davem330 committed
    When opt->srr_is_hit is set, skb_rtable(skb) has been updated for
    'nexthop'. Since iph->daddr should always equal skb_rtable->rt_dst,
    we need to update iph->daddr as well.
    
    Signed-off-by: Li Wei <lw@cn.fujitsu.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  3. @davem330

    ah: Read nexthdr value before overwriting it in ahash input callback.

    Nick Bowler authored davem330 committed
    The AH4/6 ahash input callbacks read out the nexthdr field from the AH
    header *after* they overwrite that header.  This is obviously not going
    to end well.  Fix it up.
    
    Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  4. @davem330

    ah: Correctly pass error codes in ahash output callback.

    Nick Bowler authored davem330 committed
    The AH4/6 ahash output callbacks pass nexthdr to xfrm_output_resume
    instead of the error code.  This appears to be a copy+paste error from
    the input case, where nexthdr is expected.  This causes the driver to
    continuously add AH headers to the datagram until either an allocation
    fails and the packet is dropped or the ahash driver hits a synchronous
    fallback and the resulting monstrosity is transmitted.
    
    Correct this issue by simply passing the error code unadulterated.
    
    Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
  5. @jmberg @linvjw

    mac80211: fix race between connection monitor & suspend

    jmberg authored linvjw committed
    When the connection monitor timer fires right before
    suspend, the following will happen:
     timer fires -> monitor_work gets queued
     suspend calls ieee80211_sta_quiesce
     ieee80211_sta_quiesce:
      - deletes timer
      - cancels monitor_work synchronously, running it
      [note wrong order of these steps]
     monitor_work runs, re-arming the timer
     later, timer fires while system should be quiesced
    
    This causes a warning:
    
    WARNING: at net/mac80211/util.c:540 ieee80211_can_queue_work+0x35/0x40 [mac80211]()
    
    but is otherwise harmless. I'm not completely sure
    this is the scenario Thomas stumbled across, but it
    is the only way I can right now see the warning in
    a scenario like the one he reported.
    
    Reported-by: Thomas Meyer <thomas@m3y3r.de>
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
  6. @mcgrof @linvjw

    cfg80211: fix bug on regulatory core exit on access to last_request

    mcgrof authored linvjw committed
    Commit 4d9d88d by Scott James Remnant <keybuk@google.com> added
    the .uevent() callback for the regulatory device used during
    the platform device registration. The change was done to account
    for queuing up udev change requests through udevadm triggers.
    The change also meant that upon regulatory core exit we will now
    send a uevent() but the uevent() callback, reg_device_uevent(),
    also accessed last_request. Right before committing device suicide
    we free'd last_request but never set it to NULL so
    platform_device_unregister() would lead to bogus kernel paging
    request. Fix this and also simply suppress uevents right before
    we commit suicide as they are pointless.
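
    The free-without-reset pattern can be sketched in userspace. All names
    here (struct sketch_request, sketch_init, sketch_uevent, sketch_exit)
    and the "COUNTRY=" output are hypothetical; the sketch only shows a
    callback that stays safe once the pointer is cleared after being freed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct sketch_request {
	char alpha2[3];   /* two-letter code plus terminating NUL */
};

static struct sketch_request *last_request;

static int sketch_init(const char *alpha2)
{
	assert(alpha2 != NULL && strlen(alpha2) >= 2);
	last_request = calloc(1, sizeof(*last_request));
	if (!last_request)
		return -1;
	memcpy(last_request->alpha2, alpha2, 2);
	return 0;
}

/* Event callback: returns 1 and fills out if a request exists, else 0. */
static int sketch_uevent(char *out, size_t len)
{
	if (!last_request)
		return 0;
	snprintf(out, len, "COUNTRY=%s", last_request->alpha2);
	return 1;
}

/* Free the request and clear the pointer so later callbacks see NULL. */
static void sketch_exit(void)
{
	free(last_request);
	last_request = NULL;
}
```

    Without the reset in sketch_exit, a callback running after the free
    would read through a dangling pointer.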
    
    This fix is required for kernels >= v2.6.39
    
    $ git describe --contains 4d9d88d
    v2.6.39-rc1~468^2~25^2^2~21
    
    The impact of not having this present is that a bogus paging
    access may occur (only read) upon cfg80211 unload time. You
    may also get this BUG complaint below. Although Johannes
    could not reproduce the issue this fix is theoretically correct.
    
    mac80211_hwsim: unregister radios
    mac80211_hwsim: closing netlink
    BUG: unable to handle kernel paging request at ffff88001a06b5ab
    IP: [<ffffffffa030df9a>] reg_device_uevent+0x1a/0x50 [cfg80211]
    PGD 1836063 PUD 183a063 PMD 1ffcb067 PTE 1a06b160
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    CPU 0
    Modules linked in: cfg80211(-) [last unloaded: mac80211]
    
    Pid: 2279, comm: rmmod Tainted: G        W   3.1.0-wl+ #663 Bochs Bochs
    RIP: 0010:[<ffffffffa030df9a>]  [<ffffffffa030df9a>] reg_device_uevent+0x1a/0x50 [cfg80211]
    RSP: 0000:ffff88001c5f9d58  EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff88001d2eda88 RCX: ffff88001c7468fc
    RDX: ffff88001a06b5a0 RSI: ffff88001c7467b0 RDI: ffff88001c7467b0
    RBP: ffff88001c5f9d58 R08: 000000000000ffff R09: 000000000000ffff
    R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001c7467b0
    R13: ffff88001d2eda78 R14: ffffffff8164a840 R15: 0000000000000001
    FS:  00007f8a91d8a6e0(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: ffff88001a06b5ab CR3: 000000001c62e000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process rmmod (pid: 2279, threadinfo ffff88001c5f8000, task ffff88000023c780)
    Stack:
     ffff88001c5f9d98 ffffffff812ff7e5 ffffffff8176ab3d ffff88001c7468c2
     000000000000ffff ffff88001d2eda88 ffff88001c7467b0 ffff880000114820
     ffff88001c5f9e38 ffffffff81241dc7 ffff88001c5f9db8 ffffffff81040189
    Call Trace:
     [<ffffffff812ff7e5>] dev_uevent+0xc5/0x170
     [<ffffffff81241dc7>] kobject_uevent_env+0x1f7/0x490
     [<ffffffff81040189>] ? sub_preempt_count+0x29/0x60
     [<ffffffff814cab1a>] ? _raw_spin_unlock_irqrestore+0x4a/0x90
     [<ffffffff81305307>] ? devres_release_all+0x27/0x60
     [<ffffffff8124206b>] kobject_uevent+0xb/0x10
     [<ffffffff812fee27>] device_del+0x157/0x1b0
     [<ffffffff8130377d>] platform_device_del+0x1d/0x90
     [<ffffffff81303b76>] platform_device_unregister+0x16/0x30
     [<ffffffffa030fffd>] regulatory_exit+0x5d/0x180 [cfg80211]
     [<ffffffffa032bec3>] cfg80211_exit+0x2b/0x45 [cfg80211]
     [<ffffffff8109a84c>] sys_delete_module+0x16c/0x220
     [<ffffffff8108a23e>] ? trace_hardirqs_on_caller+0x7e/0x120
     [<ffffffff814cba02>] system_call_fastpath+0x16/0x1b
    Code: <all your base are belong to me>
    RIP  [<ffffffffa030df9a>] reg_device_uevent+0x1a/0x50 [cfg80211]
     RSP <ffff88001c5f9d58>
    CR2: ffff88001a06b5ab
    ---[ end trace 147c5099a411e8c0 ]---
    
    Reported-by: Johannes Berg <johannes@sipsolutions.net>
    Cc: Scott James Remnant <keybuk@google.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
  7. @jmberg @linvjw

    mac80211: fix bug in ieee80211_build_probe_req

    jmberg authored linvjw committed
    ieee80211_probereq_get() can return NULL in
    which case we should clean up & return NULL
    in ieee80211_build_probe_req() as well.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
  8. @jmberg @linvjw

    mac80211: fix NULL dereference in radiotap code

    jmberg authored linvjw committed
    When receiving failed PLCP frames is enabled, there
    won't be a rate pointer when we add the radiotap
    header and thus the kernel will crash. Fix this by
    not assuming the rate pointer is always valid. It's
    still always valid for frames that have good PLCP
    though, and that is checked & enforced.
    
    This was broken by my
    commit fc88518
    Author: Johannes Berg <johannes.berg@intel.com>
    Date:   Fri Jul 30 13:23:12 2010 +0200
    
        mac80211: don't check rates on PLCP error frames
    
    where I removed the check in this case but didn't
    take into account that the rate info would be used.
    
    Reported-by: Xiaokang Qin <xiaokang.qin@intel.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
  9. @linvjw

    Merge branch 'master' of ssh://ra.kernel.org/pub/scm/linux/kernel/git…

    linvjw authored
    …/linville/wireless into for-davem
Commits on Nov 8, 2011
  1. @linvjw