Commits on Sep 20, 2010
  1. @gregkh

    Linux 2.6.32.22

    gregkh committed Sep 20, 2010
  2. @ickle @gregkh

    drm: Only decouple the old_fb from the crtc if we call mode_set*

    commit 356ad3c upstream.
    
    Otherwise when disabling the output we switch to the new fb (which is
    likely NULL) and skip the call to mode_set -- leaking driver private
    state on the old_fb.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29857
    Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Dave Airlie <airlied@redhat.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    ickle committed with gregkh Sep 9, 2010
  3. @ickle @gregkh

    drm/i915: Prevent double dpms on

    commit 032d2a0 upstream.
    
    Arguably this is a bug in drm-core in that we should not be called
    twice in succession with DPMS_ON; however, this is still occurring, and
    we see FDI link training failures on the second call, leading to the
    occasional blank display.  For the time being, ignore the repeated
    call.
    
    Original patch by Dave Airlie <airlied@redhat.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    ickle committed with gregkh Sep 6, 2010
  4. @error27 @gregkh

    i915_gem: return -EFAULT if copy_to_user fails

    commit c877cdc upstream.
    
    copy_to_user() returns the number of bytes remaining to be copied, and
    we want to return a negative error code here.  The idiom is sketched
    after this entry; the next fix addresses the same pattern.
    
    Signed-off-by: Dan Carpenter <error27@gmail.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    error27 committed with gregkh Jun 23, 2010
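
    A minimal sketch of the idiom behind this fix and the next one (the
    buffer and variable names are illustrative, not the exact driver code):

    	if (copy_to_user(user_buf, kernel_buf, len)) {
    		/*
    		 * copy_to_user() returns the number of bytes that could
    		 * NOT be copied, not an errno, so a nonzero result must
    		 * be translated into a real error code by hand.
    		 */
    		ret = -EFAULT;
    		goto out;
    	}
    	ret = 0;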
  5. @error27 @gregkh

    i915: return -EFAULT if copy_to_user fails

    commit 9927a40 upstream.
    
    copy_to_user returns the number of bytes remaining to be copied, but we
    want to return a negative error code here.  These are returned to
    userspace.
    
    Signed-off-by: Dan Carpenter <error27@gmail.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    error27 committed with gregkh Jun 19, 2010
  6. @gregkh

    SUNRPC: Fix race corrupting rpc upcall

    commit 5a67657 upstream.
    
    If rpc_queue_upcall() adds a new upcall to the rpci->pipe list just
    after rpc_pipe_release calls rpc_purge_list(), but before it calls
    gss_pipe_release (as rpci->ops->release_pipe(inode)), then the latter
    will free a message without deleting it from the rpci->pipe list.
    
    We will be left with a freed object on the rpci->pipe list.  Most
    frequent symptoms are kernel crashes in rpc.gssd system calls on the
    pipe in question.
    
    Reported-by: J. Bruce Fields <bfields@redhat.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Trond Myklebust committed with gregkh Sep 12, 2010
  7. @gregkh

    NFS: Fix a typo in nfs_sockaddr_match_ipaddr6

    commit b20d37c upstream.
    
    Reported-by: Ben Greear <greearb@candelatech.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Trond Myklebust committed with gregkh Sep 12, 2010
  8. @enomsg @gregkh

    apm_power: Add missing break statement

    commit 1d22033 upstream.
    
    The missing break statement causes a wrong capacity calculation for
    batteries that report energy (see the sketch after this entry).
    
    Reported-by: d binderman <dcb314@hotmail.com>
    Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    enomsg committed with gregkh Sep 8, 2010
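
    A sketch of the failure mode (all identifiers here are hypothetical,
    not the driver's actual code): without the break, the energy case
    falls through and its result is clobbered by the charge calculation.

    	switch (units) {
    	case UNITS_ENERGY:	/* battery reports energy (µWh) */
    		capacity = energy_now * 100 / energy_full;
    		break;		/* the missing break statement */
    	case UNITS_CHARGE:	/* battery reports charge (µAh) */
    		capacity = charge_now * 100 / charge_full;
    		break;
    	}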
  9. @guillemj @gregkh

    hwmon: (f75375s) Do not overwrite values read from registers

    commit c3b327d upstream.
    
    All bits in the values read from registers were being overwritten
    before the next write; avoid doing so, so as not to mess with the
    current configuration.  The restored read-modify-write idiom is
    sketched after this entry.
    
    Signed-off-by: Guillem Jover <guillem@hadrons.org>
    Cc: Riku Voipio <riku.voipio@iki.fi>
    Signed-off-by: Jean Delvare <khali@linux-fr.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    guillemj committed with gregkh Sep 17, 2010
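
    The read-modify-write idiom, as a sketch (the register, mask, and
    helper names are assumptions, not the driver's actual defines):

    	u8 val = f75375_read8(client, REG_FAN_MODE);	/* current bits */

    	val &= ~FAN_MODE_MASK;			/* clear only our field */
    	val |= new_mode << FAN_MODE_SHIFT;	/* leave the rest intact */
    	f75375_write8(client, REG_FAN_MODE, val);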
  10. @guillemj @gregkh

    hwmon: (f75375s) Shift control mode to the correct bit position

    commit 96f3640 upstream.
    
    The spec notes that the fan0 and fan1 control mode bits are located in
    bits 7-6 and 5-4 respectively, but the FAN_CTRL_MODE macro was making
    the bits shift by 5 instead of by 4 (see the sketch after this entry).
    
    Signed-off-by: Guillem Jover <guillem@hadrons.org>
    Cc: Riku Voipio <riku.voipio@iki.fi>
    Signed-off-by: Jean Delvare <khali@linux-fr.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    guillemj committed with gregkh Sep 17, 2010
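
    Sketched from the description above (the macro's exact shape in the
    driver may differ): with two bits per fan, a base shift of 5 puts each
    field one bit too high, so the base must be 4.

    	#define FAN_CTRL_MODE(nr)	(4 + ((nr) * 2))  /* was 5 + ((nr) * 2) */

    	val &= ~(0x3 << FAN_CTRL_MODE(nr));	/* clear the 2-bit mode field */
    	val |= mode << FAN_CTRL_MODE(nr);	/* set the new control mode */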
  11. @gregkh

    arm: fix really nasty sigreturn bug

    commit 653d48b upstream.
    
    If a signal hits us outside of a syscall and another gets delivered
    when we are in sigreturn (e.g. because it had been in sa_mask for
    the first one and got sent to us while we'd been in the first handler),
    we have a chance of returning from the second handler to a location one
    insn prior to where we ought to return.  If r0 happens to contain -513
    (-ERESTARTNOINTR), sigreturn will get confused into doing restart
    syscall song and dance.
    
    Incredible joy to debug, since it manifests as random, infrequent and
    very hard to reproduce double execution of instructions in userland
    code...
    
    The fix is simple - mark it "don't bother with restarts" in wrapper,
    i.e. set r8 to 0 in sys_sigreturn and sys_rt_sigreturn wrappers,
    suppressing the syscall restart handling on return from these guys.
    They can't legitimately return a restart-worthy error anyway.
    
    Testcase:
    	#include <unistd.h>
    	#include <signal.h>
    	#include <stdlib.h>
    	#include <sys/time.h>
    	#include <errno.h>
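
    	/*
    	 * Normal flow: load n into r0, then spin at the "1: b ." loop.
    	 * If buggy sigreturn handling backs the PC up by one instruction
    	 * on return from the nested handler, execution resumes at "b 2f"
    	 * instead and escapes the loop, so main() reaches the "buggered"
    	 * write below.
    	 */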
    
    	void f(int n)
    	{
    		__asm__ __volatile__(
    			"ldr r0, [%0]\n"
    			"b 1f\n"
    			"b 2f\n"
    			"1:b .\n"
    			"2:\n" : : "r"(&n));
    	}
    
    	void handler1(int sig) { }
    	void handler2(int sig) { raise(1); }
    	void handler3(int sig) { exit(0); }
    
    	int main(void)
    	{
    		struct sigaction s = {.sa_handler = handler2};
    		struct itimerval t1 = { .it_value = {1} };
    		struct itimerval t2 = { .it_value = {2} };
    
    		signal(1, handler1);
    
    		sigemptyset(&s.sa_mask);
    		sigaddset(&s.sa_mask, 1);
    		sigaction(SIGALRM, &s, NULL);
    
    		signal(SIGVTALRM, handler3);
    
    		setitimer(ITIMER_REAL, &t1, NULL);
    		setitimer(ITIMER_VIRTUAL, &t2, NULL);
    
    		f(-513); /* -ERESTARTNOINTR */
    
    		write(1, "buggered\n", 9);
    		return 1;
    	}
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Al Viro committed with gregkh Sep 17, 2010
  12. @tiwai @gregkh

    ALSA: hda - Handle pin NID 0x1a on ALC259/269

    commit b08b163 upstream.
    
    The pin NID 0x1a should be handled as well as NID 0x1b.
    Also added comments.
    
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Cc: David Henningsson <david.henningsson@canonical.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    tiwai committed with gregkh Jul 30, 2010
  13. @tiwai @gregkh

    ALSA: hda - Handle missing NID 0x1b on ALC259 codec

    commit 5d4abf9 upstream.
    
    Since ALC259/269 use the same parser as ALC268, the pin 0x1b was
    ignored as an invalid widget.  Just add this NID so that it is handled
    properly.  This adds the missing mixer controls for some devices.
    
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Cc: David Henningsson <david.henningsson@canonical.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    tiwai committed with gregkh Jul 30, 2010
  14. @antonblanchard @gregkh

    sched: cpuacct: Use bigger percpu counter batch values for stats counters
    
    commit fa535a7 upstream
    
    When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are
    enabled we can call cpuacct_update_stats with values much larger
    than percpu_counter_batch.  This means the call to
    percpu_counter_add will always add to the global count which is
    protected by a spinlock and we end up with a global spinlock in
    the scheduler.
    
    Based on an idea by KOSAKI Motohiro, this patch scales the batch
    value by cputime_one_jiffy such that we have the same batch
    limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled.
    His patch did this once at boot but that initialisation happened
    too early on PowerPC (before time_init) and it was never updated
    at runtime as a result of a hotplug cpu add/remove.
    
    This patch instead scales percpu_counter_batch by cputime_one_jiffy at
    runtime, which keeps the batch correct even after cpu hotplug
    operations.  We cap it at INT_MAX in case of overflow (see the sketch
    after this entry).
    
    For architectures that do not support CONFIG_VIRT_CPU_ACCOUNTING,
    cputime_one_jiffy is the constant 1, and gcc is smart enough to
    optimise min_t(s32, percpu_counter_batch, INT_MAX) down to just
    percpu_counter_batch, at least on x86 and PowerPC.  So there is no
    need to add an #ifdef.
    
    On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
    CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark
    is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT
    disabled kernel:
    
     CONFIG_CGROUP_CPUACCT disabled:   16906698 ctx switches/sec
     CONFIG_CGROUP_CPUACCT enabled:       61720 ctx switches/sec
     CONFIG_CGROUP_CPUACCT + patch:    16663217 ctx switches/sec
    
    Tested with:
    
     wget http://ozlabs.org/~anton/junkcode/context_switch.c
     make context_switch
     for i in `seq 0 63`; do taskset -c $i ./context_switch & done
     vmstat 1
    
    Signed-off-by: Anton Blanchard <anton@samba.org>
    Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    antonblanchard committed with gregkh Feb 2, 2010
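
    The core of the change, roughly (reconstructed from the description
    above; the exact applied patch, and the ca/idx/val names taken from the
    cpuacct context, may differ in detail):

    	/* scale the batch so that VIRT_CPU_ACCOUNTING-sized values
    	 * still hit the fast per-cpu path; cap at INT_MAX on overflow */
    	int batch = min_t(long, percpu_counter_batch * cputime_one_jiffy,
    			  INT_MAX);

    	__percpu_counter_add(&ca->cpustat[idx], val, batch);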
  15. @gregkh

    sched: Fix select_idle_sibling() logic in select_task_rq_fair()

    commit 99bd5e2 upstream
    
    Issues in the current select_idle_sibling() logic in select_task_rq_fair()
    in the context of a task wake-up:
    
    a) Once we select the idle sibling, we use that domain (spanning the cpu
       on which the task is woken up and the idle sibling that we found) in
       our wake_affine() decisions. This domain is completely different from
       the domain we are supposed to use: the one spanning the cpu on which
       the task is woken up and the cpu where the task previously ran.
    
    b) We do the select_idle_sibling() check only for the cpu on which the
       task is woken up. If select_task_rq_fair() selects the previously-run
       cpu for waking the task, doing a select_idle_sibling() check for that
       cpu would also help, and we don't do this currently.
    
    c) In scenarios where the cpu on which the task is woken up is busy but
       its HT siblings are idle, we select the idle HT sibling for the
       wake-up instead of a core where the task previously ran and which is
       now completely idle. That is, we are not basing the decision on
       wake_affine() but directly selecting an idle sibling, which can cause
       an imbalance at the SMT/MC level that is only corrected later by the
       periodic load balancer.
    
    Fix this by first going through the load-imbalance calculations using
    wake_affine(), and only after deciding between the woken-up cpu and the
    previously-ran cpu, choosing a possible idle sibling for waking up the
    task on.
    
    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <1270079265.7835.8.camel@sbs-t61.sc.intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Suresh Siddha committed with gregkh Mar 31, 2010
  16. @gregkh

    sched: Pre-compute cpumask_weight(sched_domain_span(sd))

    commit 669c55e upstream
    
    Dave reported that his large SPARC machines spend lots of time in
    hweight64(); try to optimize away some of those needless
    cpumask_weight() invocations (especially with large offstack cpumasks,
    these are very expensive indeed).  The weight is now computed once when
    the domain is built and cached, as sketched after this entry.
    
    Reported-by: David Miller <davem@davemloft.net>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Peter Zijlstra committed with gregkh Apr 16, 2010
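
    The shape of the optimization, as a sketch (the span_weight field name
    is an assumption based on the title):

    	/* at domain build time: hweight the span exactly once */
    	sd->span_weight = cpumask_weight(sched_domain_span(sd));

    	/* in the wakeup hot path: read the cached value instead */
    	unsigned int weight = sd->span_weight;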
  17. @gregkh

    sched: Fix select_idle_sibling()

    commit 8b911ac upstream
    
    Don't bother with selection when the current cpu is idle.  Recent load
    balancing changes also make it no longer necessary to check wake_affine()
    success before returning the selected sibling, so we now always use it.
    
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <1268301369.6785.36.camel@marge.simson.net>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Mike Galbraith committed with gregkh Mar 11, 2010
  18. @gregkh

    sched: Fix vmark regression on big machines

    commit 50b926e upstream
    
    SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
    enabled, leading to many cache misses on large machines as we traverse
    looking for an idle shared cache to wake to.  Change the enabler of
    select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
    sibling domain level.
    
    Reported-by: Lin Ming <ming.m.lin@intel.com>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <1262612696.15495.15.camel@marge.simson.net>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Mike Galbraith committed with gregkh Jan 4, 2010
  19. @gregkh

    sched: More generic WAKE_AFFINE vs select_idle_sibling()

    commit fe3bcfe upstream
    
    Instead of only considering SD_WAKE_AFFINE | SD_PREFER_SIBLING
    domains, also allow all SD_PREFER_SIBLING domains below a
    SD_WAKE_AFFINE domain to change the affinity target.
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Mike Galbraith <efault@gmx.de>
    LKML-Reference: <20091112145610.909723612@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Peter Zijlstra committed with gregkh Nov 12, 2009
  20. @gregkh

    sched: Cleanup select_task_rq_fair()

    commit a50bde5 upstream
    
    Clean up the new affine-to-idle-sibling bits while trying to
    grok them. Should not have any functional differences.
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Mike Galbraith <efault@gmx.de>
    LKML-Reference: <20091112145610.832503781@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Peter Zijlstra committed with gregkh Nov 12, 2009
  21. @gregkh

    sched: apply RCU protection to wake_affine()

    commit f3b577d upstream
    
    The task_group() function returns a pointer that must be protected
    by either RCU, the ->alloc_lock, or the cgroup lock (see the
    rcu_dereference_check() in task_subsys_state(), which is invoked by
    task_group()).  The wake_affine() function currently does none of these,
    which means that a concurrent update would be within its rights to free
    the structure returned by task_group().  Because wake_affine() uses this
    structure only to compute load-balancing heuristics, there is no reason
    to acquire either of the two locks.
    
    Therefore, this commit introduces an RCU read-side critical section
    that starts before the first call to task_group() and ends after the
    last use of the "tg" pointer returned from task_group(), as sketched
    after this entry.  Thanks to Li Zefan for pointing out the need to
    extend the RCU read-side critical section from that proposed by the
    original patch.
    
    Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Daniel J Blueman committed with gregkh Jun 1, 2010
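
    A sketch of the resulting read-side critical section in wake_affine()
    (simplified; the effective_load() calls are schematic and the
    surrounding declarations are omitted):

    	rcu_read_lock();		/* protects task_group()'s result */
    	tg = task_group(p);
    	weight = p->se.load.weight;

    	this_load += effective_load(tg, this_cpu, -weight, -weight);
    	load      += effective_load(tg, prev_cpu, 0, -weight);
    	rcu_read_unlock();		/* after the last use of tg */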
  22. @gregkh

    sched: Remove unnecessary RCU exclusion

    commit fb58bac upstream
    
    As Nick pointed out, and as I realized myself when doing
       sched: Fix balance vs hotplug race
    the patch
       sched: for_each_domain() vs RCU
    is wrong: sched_domains are freed after synchronize_sched(), which
    means disabling preemption is enough.
    
    Reported-by: Nick Piggin <npiggin@suse.de>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Peter Zijlstra committed with gregkh Dec 1, 2009
  23. @gregkh

    sched: Fix rq->clock synchronization when migrating tasks

    commit 861d034 upstream
    
    sched_fork() -- we do task placement in ->task_fork_fair(); ensure we
      call update_rq_clock() so we work with current time. We leave the
      vruntime in relative state, so the time delay until
      wake_up_new_task() doesn't matter.
    
    wake_up_new_task() -- since task_fork_fair() left p->vruntime in
      relative state, we can safely migrate; the activate_task() on the
      remote rq will call update_rq_clock() and cause the clock to be
      synced (enough).
    
    Tested-by: Jack Daniel <wanders.thirst@gmail.com>
    Tested-by: Philby John <pjohn@mvista.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <1281002322.1923.1708.camel@laptop>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Peter Zijlstra committed with gregkh Aug 19, 2010
  24. @gregkh

    sched: Fix nr_uninterruptible count

    commit cc87f76 upstream
    
    The cpuload calculation in calc_load_account_active() assumes
    rq->nr_uninterruptible will not change on an offline cpu after
    migrate_nr_uninterruptible(). However, the recent migrate-on-wakeup
    changes broke that and would result in decrementing the offline cpu's
    rq->nr_uninterruptible.
    
    Fix this by accounting the nr_uninterruptible on the waking cpu.
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Peter Zijlstra committed with gregkh Mar 26, 2010
  25. @gregkh

    sched: Optimize task_rq_lock()

    commit 65cc8e4 upstream
    
    Now that we hold the rq->lock over set_task_cpu() again, we can do
    away with most of the TASK_WAKING checks and reduce them again to
    set_cpus_allowed_ptr().
    
    Removes some conditionals from scheduling hot-paths.
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Oleg Nesterov <oleg@redhat.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Peter Zijlstra committed with gregkh Mar 25, 2010
  26. @gregkh

    sched: Fix TASK_WAKING vs fork deadlock

    commit 0017d73 upstream
    
    Oleg noticed a few races with the TASK_WAKING usage on fork.
    
     - since TASK_WAKING is basically a spinlock, it should be IRQ safe
     - since we set TASK_WAKING (*) without holding rq->lock, it could be
       that there still is an rq->lock holder, thereby not actually
       providing full serialization.
    
    (*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.
    
    Cure the second issue by not setting TASK_WAKING in sched_fork(), but
    only temporarily in wake_up_new_task() while calling select_task_rq().
    
    Cure the first by holding rq->lock around the select_task_rq() call,
    this will disable IRQs, this however requires that we push down the
    rq->lock release into select_task_rq_fair()'s cgroup stuff.
    
    Because select_task_rq_fair() still needs to drop the rq->lock we
    cannot fully get rid of TASK_WAKING.
    
    Reported-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Peter Zijlstra committed with gregkh Mar 24, 2010
  27. @utrace @gregkh

    sched: Make select_fallback_rq() cpuset friendly

    commit 9084bb8 upstream
    
    Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems
    with select_fallback_rq(). It can be called from any context and can't use
    any cpuset locks including task_lock(). It is called when the task doesn't
    have online cpus in ->cpus_allowed but ttwu/etc must be able to find a
    suitable cpu.
    
    I am not proud of this patch. Everything which needs such a fat comment
    can't be good even if correct. But I'd prefer not to change the locking
    rules in code I hardly understand, and in any case I believe this
    simple change makes the code much more correct compared to the
    deadlocks we currently have.
    
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <20100315091027.GA9155@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    utrace committed with gregkh Mar 15, 2010
  28. @utrace @gregkh

    sched: _cpu_down(): Don't play with current->cpus_allowed

    commit 6a1bdc1 upstream
    
    _cpu_down() changes the current task's affinity and then recovers it at
    the end. The problems are well known: we can't restore old_allowed if
    it was bound to the now-dead cpu, and we can race with userspace, which
    can change cpu affinity during unplug.
    
    _cpu_down() should not play with current->cpus_allowed at all. Instead,
    take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
    removes the dying cpu from cpu_online_mask.
    
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <20100315091023.GA9148@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    utrace committed with gregkh Mar 15, 2010
  29. @utrace @gregkh

    sched: sched_exec(): Remove the select_fallback_rq() logic

    commit 30da688 upstream.
    
    sched_exec()->select_task_rq() reads/updates ->cpus_allowed locklessly.
    This can race with other CPUs updating our ->cpus_allowed, and this
    looks meaningless to me.
    
    The task is current and running; it must have online cpus in
    ->cpus_allowed, so the fallback mode is bogus. And if ->sched_class
    returns the "wrong" cpu, this likely means we raced with
    set_cpus_allowed() which was called for a reason, so why should
    sched_exec() retry and call ->select_task_rq() again?
    
    Change the code to call sched_class->select_task_rq() directly and do
    nothing if the returned cpu is wrong after re-checking under rq->lock.
    
    From now on, task_struct->cpus_allowed is always stable under
    TASK_WAKING, and select_fallback_rq() is always called either under
    rq->lock or by the owner of TASK_WAKING (select_task_rq).
    
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <20100315091019.GA9141@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    utrace committed with gregkh Mar 15, 2010
  30. @utrace @gregkh

    sched: move_task_off_dead_cpu(): Remove retry logic

    commit c1804d5 upstream
    
    The previous patch preserved the retry logic, but it looks unneeded.
    
    __migrate_task() can only fail if we raced with migration after we dropped
    the lock, but in this case the caller of set_cpus_allowed/etc must initiate
    migration itself if ->on_rq == T.
    
    We already fixed p->cpus_allowed; the changes in the active/online
    masks must be visible to the racer, and it should migrate the task to
    an online cpu correctly.
    
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <20100315091014.GA9138@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    utrace committed with gregkh Mar 15, 2010
  31. @utrace @gregkh

    sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq()
    
    commit 1445c08 upstream
    
    move_task_off_dead_cpu()->select_fallback_rq() reads/updates
    ->cpus_allowed locklessly. We can race with set_cpus_allowed() running
    in parallel.
    
    Change it to take rq->lock around select_fallback_rq(). Note that it is
    not trivial to move this spin_lock() into select_fallback_rq(); we must
    recheck that the task was not migrated after we take the lock, and
    other callers do not need this lock.
    
    To avoid the races with other callers of select_fallback_rq() which rely on
    TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
    The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
    CPU anyway, and the subsequent __migrate_task() is just meaningless because
    p->se.on_rq must be false.
    
    Alternatively, we could change select_task_rq() to take rq->lock right
    after it calls sched_class->select_task_rq(), but this looks a bit ugly.
    
    Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
    
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <20100315091010.GA9131@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    utrace committed with gregkh Mar 15, 2010
  32. @utrace @gregkh

    sched: Kill the broken and deadlockable cpuset_lock/cpuset_cpus_allowed_locked code
    
    commit 897f0b3 upstream
    
    This patch just states the fact that the cpusets/cpuhotplug interaction
    is broken and removes the deadlockable code which only pretends to work.
    
    - cpuset_lock() doesn't really work. It is needed for
      cpuset_cpus_allowed_locked() but we can't take this lock in
      try_to_wake_up()->select_fallback_rq() path.
    
    - cpuset_lock() is deadlockable. Suppose that a task T bound to a CPU
      takes callback_mutex. If cpu_down(CPU) happens before T drops
      callback_mutex, stop_machine() preempts T, and then
      migration_call(CPU_DEAD) tries to take cpuset_lock() and hangs
      forever because the CPU is already dead and thus T can't be
      scheduled.
    
    - cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
      which is not irq-safe, but try_to_wake_up() can be called from irq.
    
    Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
    we currently do without CONFIG_CPUSETS.
    
    Also, with or without this patch, with or without CONFIG_CPUSETS, the
    callers of select_fallback_rq() can race with each other or with
    set_cpus_allowed() paths.
    
    The subsequent patches try to fix these problems.
    
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <20100315091003.GA9123@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    utrace committed with gregkh Mar 15, 2010
  33. @utrace @gregkh

    sched: set_cpus_allowed_ptr(): Don't use rq->migration_thread after unlock
    
    commit 47a7098 upstream
    
    Trivial typo fix. rq->migration_thread can be NULL after
    task_rq_unlock(); this is why we have "mt", which should be used
    instead (see the sketch after this entry).
    
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <20100330165829.GA18284@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    utrace committed with gregkh Mar 30, 2010
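
    The fix, sketched (the shape follows the description above; the
    surrounding set_cpus_allowed_ptr() context is omitted):

    	struct task_struct *mt = rq->migration_thread;	/* snapshot under lock */

    	get_task_struct(mt);
    	task_rq_unlock(rq, &flags);
    	wake_up_process(mt);	/* not rq->migration_thread, which may */
    	put_task_struct(mt);	/* already be NULL after the unlock    */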
  34. @gregkh

    sched: Queue a deboosted task to the head of the RT prio queue

    commit 60db48c upstream
    
    rtmutex_set_prio() is used to implement priority inheritance for
    futexes. When a task is deboosted it gets enqueued at the tail of its
    RT priority list. This is violating the POSIX scheduling semantics:
    
    rt priority list X contains two runnable tasks A and B
    
    task A	 runs with priority X and holds mutex M
    task C	 preempts A and is blocked on mutex M
         	 -> task A is boosted to priority of task C (Y)
    task A	 unlocks the mutex M and deboosts itself
         	 -> A is dequeued from rt priority list Y
    	 -> A is enqueued to the tail of rt priority list X
    task C	 schedules away
    task B	 runs
    
    This is wrong, as task A did not schedule away and therefore violates
    the POSIX scheduling semantics.
    
    Enqueue the task at the head of the priority list instead (see the
    sketch after this entry).
    
    Reported-by: Mathias Weber <mathias.weber.mw1@roche.com>
    Reported-by: Carsten Emde <cbe@osadl.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Tested-by: Carsten Emde <cbe@osadl.org>
    Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
    LKML-Reference: <20100120171629.809074113@linutronix.de>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Thomas Gleixner committed with gregkh Jan 20, 2010
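
    The call site in rt_mutex_setprio(), sketched (the enqueue_task()
    argument order and head flag are assumptions; see the next entry for
    the head-queueing plumbing itself):

    	if (on_rq)
    		/* a deboost (oldprio < prio, i.e. the new prio value is
    		 * numerically larger) requeues the task at the HEAD of
    		 * its priority list, so it keeps running as POSIX
    		 * requires */
    		enqueue_task(rq, p, 0, oldprio < prio);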
  35. @gregkh

    sched: Implement head queueing for sched_rt

    commit 37dad3f upstream
    
    The ability to enqueue a task at the head of a SCHED_FIFO priority
    list is required to fix some violations of POSIX scheduling policy.
    
    Implement the functionality in sched_rt (sketched after this entry).
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Tested-by: Carsten Emde <cbe@osadl.org>
    Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
    LKML-Reference: <20100120171629.772169931@linutronix.de>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    Thomas Gleixner committed with gregkh Jan 20, 2010
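
    The gist of the sched_rt change, as a sketch (an added "head" flag is
    threaded down to the enqueue path; the names follow the usual sched_rt
    conventions but are assumptions here):

    	struct list_head *queue = array->queue + rt_se_prio(rt_se);

    	if (head)
    		list_add(&rt_se->run_list, queue);	/* front: runs next */
    	else
    		list_add_tail(&rt_se->run_list, queue);	/* default: join the back */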