Kernel Concurrency Sanitizer (KCSAN)
Kernel Concurrency Sanitizer (KCSAN) is a sampling watchpoint-based data-race detector. More details can be found in Documentation.
There is one main branch related to KCSAN:
- kcsan: Contains KCSAN only, rebased on top of a stable upstream kernel (for as long as KCSAN is not yet upstream).
Some tags of interest:
- kcsan_v5.3-with-fixes: Contains the initial KCSAN release plus bug-fixes for some of the races it detected. The fixes are not exhaustive: they cover only the most frequently reported bugs, so that those reports are suppressed in initial testing. The commit message of each bugfix includes the KCSAN report as-is.
Testing & Fuzzing
Initially (Sep 2019), we ran KCSAN via Syzkaller for several weeks and found numerous bugs (running for just 2 days found over 300 unique data-races).
We now have a public Syzkaller instance. Bugs will slowly appear on the dashboard, currently after moderation only, to keep the volume of reported bugs manageable.
Note on Data-Races
KCSAN detects data-races according to the Linux Kernel Memory Model (LKMM). Although some data-races are due to logic bugs or missed synchronization elsewhere, we expect a large number of data-races to be due to concurrent plain accesses.
We want to point out that racing accesses that are not considered logic bugs ("benign") are still data-races as long as they are not marked atomic (atomic_t, etc.), and behaviour may still be undefined -- this LWN article provides good background on why such data-races are problems, with some more info on our wiki page.
To rebase KCSAN against the latest upstream tree:
git rebase <upstream> kcsan
- Then force-push the rebased kcsan branch.
Rebasing KCSAN with fixes on top can be done as follows:
git rebase -i --onto kcsan $(git log --format=oneline <my-branch-with-fixes> | grep -v 'BUG: KCSAN' | head -n 1 | cut -d " " -f 1) <my-branch-with-fixes>
Upstream Fixes of Data-Races found by KCSAN
This is a list of known upstream fixes for data-races found by KCSAN. Last updated: Oct 21, 2019.
- proc: fix inode uid/gid writeback race
- netfilter: conntrack: avoid possible false sharing
- tcp: address KCSAN reports in tcp_poll() (part I)
- rcu: Fix data-race due to atomic_t copy-by-value
- rcu: avoid data-race in rcu_gp_fqs_check_wake()
- rcu: exp: Avoid race on lockless rcu_node::expmask loop
- stop_machine: avoid potential race behaviour
- taskstats: fix data-race | report