Commits
dm-write-same
Name already in use
Commits on Dec 20, 2012
-
dm: fix is_split_required_for_discard so intent of split_discard_requ…
…ests isn't inverted (fold into previous "dm: add WRITE SAME support" commit)
-
dm stripe: add WRITE SAME support
Rename stripe_map_discard to stripe_map_range and reuse it for WRITE SAME bio processing. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
dm linear: add WRITE SAME support
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
WRITE SAME bios have a payload that contain a single page. When cloning WRITE SAME bios DM has no need to modify the bi_io_vec attributes (doing so would be detrimental). DM need only alter the start and end of the WRITE SAME bio accordingly. Rather than duplicate __clone_and_map_discard, factor out a common function that is also used by __clone_and_map_write_same. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
dm: default to disabling WRITE SAME support for all targets
Best to disable WRITE SAME support for all targets and then require each target to opt-in by setting 'num_write_same_requests' in the dm_target structure. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Commits on Dec 18, 2012
Commits on Dec 14, 2012
-
persistent data: fix dm_btree_value_type const compiler warnings
(in dm-array and dm-thin-metadata)
-
-
-
Fix a race condition between discard bios and ordinary bios.
The deferred_set entries should not be incremented until the bio prison cells are held. Otherwise quiescing a block for discard may end up waiting for a bio that's held in the discard bios cell.
-
-
Fix a bug in btree_del, and another bug that was compensating for it.
When deleting nested btrees the inner most btree wasn't being deleted. The thin-metadata code was serendipitously compensating for this by claiming there was one extra layer in the tree.
Commits on Dec 13, 2012
Commits on Dec 11, 2012
-
Input: matrix-keymap - provide proper module license
The matrix-keymap module is currently lacking a proper module license, add one so we don't have this module tainting the entire kernel. This issue has been present since commit 1932811 ("Input: matrix-keymap - uninline and prepare for device tree support") Signed-off-by: Florian Fainelli <florian@openwrt.org> CC: stable@vger.kernel.org # v3.5+ Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller: 1) Netlink socket dumping had several missing verifications and checks. In particular, address comparisons in the request byte code interpreter could access past the end of the address in the inet_request_sock. Also, address family and address prefix lengths were not validated properly at all. This means arbitrary applications can read past the end of certain kernel data structures. Fixes from Neal Cardwell. 2) ip_check_defrag() operates in contexts where we're in the process of, or about to, input the packet into the real protocols (specifically macvlan and AF_PACKET snooping). Unfortunately, it does a pskb_may_pull() which can modify the backing packet data which is not legal if the SKB is shared. It very much can be shared in this context. Deal with the possibility that the SKB is segmented by using skb_copy_bits(). Fix from Johannes Berg based upon a report by Eric Leblond. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: ipv4: ip_check_defrag must not modify skb before unsharing inet_diag: validate port comparison byte code to prevent unsafe reads inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run() inet_diag: validate byte code to prevent oops in inet_diag_bc_run() inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
Commits on Dec 10, 2012
-
Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated …
…damage This reverts commits a509153 and d7c3b93. This is a revert of a revert of a revert. In addition, it reverts the even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the original commits in linux-next. It turns out that the original patch really was bogus, and that the original revert was the correct thing to do after all. We thought we had fixed the problem, and then reverted the revert, but the problem really is fundamental: waking up kswapd simply isn't the right thing to do, and direct reclaim sometimes simply _is_ the right thing to do. When certain allocations fail, we simply should try some direct reclaim, and if that fails, fail the allocation. That's the right thing to do for THP allocations, which can easily fail, and the GPU allocations want to do that too. So starting kswapd is sometimes simply wrong, and removing the flag that said "don't start kswapd" was a mistake. Let's hope we never revisit this mistake again - and certainly not this many times ;) Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
ipv4: ip_check_defrag must not modify skb before unsharing
ip_check_defrag() might be called from af_packet within the RX path where shared SKBs are used, so it must not modify the input SKB before it has unshared it for defragmentation. Use skb_copy_bits() to get the IP header and only pull in everything later. The same is true for the other caller in macvlan as it is called from dev->rx_handler which can also get a shared SKB. Reported-by: Eric Leblond <eric@regit.org> Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Revert "mm: avoid waking kswapd for THP allocations when compaction i…
…s deferred or contended" This reverts commit 782fd30. We are going to reinstate the __GFP_NO_KSWAPD flag that has been removed, the removal reverted, and then removed again. Making this commit a pointless fixup for a problem that was caused by the removal of __GFP_NO_KSWAPD flag. The thing is, we really don't want to wake up kswapd for THP allocations (because they fail quite commonly under any kind of memory pressure, including when there is tons of memory free), and these patches were just trying to fix up the underlying bug: the original removal of __GFP_NO_KSWAPD in commit c654345 ("mm: remove __GFP_NO_KSWAPD") was simply bogus. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
dm thin: emit "ignore_discard" in status if discards are disabled
If "ignore_discard" is specified when creating the thin pool device then discard support is disabled for that device. The pool device's status should reflect this fact rather than stating "no_discard_passdown" (which implies discards are enabled but passdown is disabled). Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
dm: additional cleanup for clone_bio
Port split up of clone_bio as originally coded by Joe (prior to Mikulas' per-bio patches). In doing so it became clear there is no need to pass bio_set to clone_bio. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Joe Thornber <ejt@redhat.com>
-
dm: introduce get_num_duplicates method in dm_target
This allows a target to dynamically tell dm core how many copies of a bio it needs. The dm-cache target will make use of this for it's write-through support. Introduce __split_and_issue_bio_to_target to allow split_bvec to be used with get_num_duplicates. We already have hard coded instances of this (eg, num_flush_requests, num_discard_requests). These have been left for now, we can switch everything over to use get_num_duplicates later. (Based heavily on Joe's original work; ported to work with Mikulas' new bio cloning via front padding) Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Joe Thornber <ejt@redhat.com>
-
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/gi…
…t/torvalds/linux-2.6 into dm-for-3.8
-
inet_diag: validate port comparison byte code to prevent unsafe reads
Add logic to verify that a port comparison byte code operation actually has the second inet_diag_bc_op from which we read the port for such operations. Previously the code blindly referenced op[1] without first checking whether a second inet_diag_bc_op struct could fit there. So a malicious user could make the kernel read 4 bytes beyond the end of the bytecode array by claiming to have a whole port comparison byte code (2 inet_diag_bc_op structs) when in fact the bytecode was not long enough to hold both. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Dec 9, 2012
-
inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_b…
…c_run() Add logic to check the address family of the user-supplied conditional and the address family of the connection entry. We now do not do prefix matching of addresses from different address families (AF_INET vs AF_INET6), except for the previously existing support for having an IPv4 prefix match an IPv4-mapped IPv6 address (which this commit maintains as-is). This change is needed for two reasons: (1) The addresses are different lengths, so comparing a 128-bit IPv6 prefix match condition to a 32-bit IPv4 connection address can cause us to unwittingly walk off the end of the IPv4 address and read garbage or oops. (2) The IPv4 and IPv6 address spaces are semantically distinct, so a simple bit-wise comparison of the prefixes is not meaningful, and would lead to bogus results (except for the IPv4-mapped IPv6 case, which this commit maintains). Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND operations. Previously we did not validate the inet_diag_hostcond, address family, address length, and prefix length. So a malicious user could make the kernel read beyond the end of the bytecode array by claiming to have a whole inet_diag_hostcond when the bytecode was not long enough to contain a whole inet_diag_hostcond of the given address family. Or they could make the kernel read up to about 27 bytes beyond the end of a connection address by passing a prefix length that exceeded the length of addresses of the given family. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
Fix inet_diag to be aware of the fact that AF_INET6 TCP connections instantiated for IPv4 traffic and in the SYN-RECV state were actually created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This means that for such connections inet6_rsk(req) returns a pointer to a random spot in memory up to roughly 64KB beyond the end of the request_sock. With this bug, for a server using AF_INET6 TCP sockets and serving IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to inet_diag_fill_req() causing an oops or the export to user space of 16 bytes of kernel memory as a garbage IPv6 address, depending on where the garbage inet6_rsk(req) pointed. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Dec 8, 2012
-
mm: vmscan: fix inappropriate zone congestion clearing
commit c702418 ("mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones") removed zone watermark checks from the compaction code in kswapd but left in the zone congestion clearing, which now happens unconditionally on higher order reclaim. This messes up the reclaim throttling logic for zones with dirty/writeback pages, where zones should only lose their congestion status when their watermarks have been restored. Remove the clearing from the zone compaction section entirely. The preliminary zone check and the reclaim loop in kswapd will clear it if the zone is considered balanced. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
vfs: fix O_DIRECT read past end of block device
The direct-IO write path already had the i_size checks in mm/filemap.c, but it turns out the read path did not, and removing the block size checks in fs/block_dev.c (commit bbec027: "blkdev_max_block: make private to fs/buffer.c") removed the magic "shrink IO to past the end of the device" code there. Fix it by truncating the IO to the size of the block device, like the write path already does. NOTE! I suspect the write path would be *much* better off doing it this way in fs/block_dev.c, rather than hidden deep in mm/filemap.c. The mm/filemap.c code is extremely hard to follow, and has various conditionals on the target being a block device (ie the flag passed in to 'generic_write_checks()', along with a conditional update of the inode timestamp etc). It is also quite possible that we should treat this whole block device size as a "s_maxbytes" issue, and try to make the logic even more generic. However, in the meantime this is the fairly minimal targeted fix. Noted by Milan Broz thanks to a regression test for the cryptsetup reencrypt tool. Reported-and-tested-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller: "Two stragglers: 1) The new code that adds new flushing semantics to GRO can cause SKB pointer list corruption, manage the lists differently to avoid the OOPS. Fix from Eric Dumazet. 2) When TCP fast open does a retransmit of data in a SYN-ACK or similar, we update retransmit state that we shouldn't triggering a WARN_ON later. Fix from Yuchung Cheng." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: gro: fix possible panic in skb_gro_receive() tcp: bug fix Fast Open client retransmission
Commits on Dec 7, 2012
-
net: gro: fix possible panic in skb_gro_receive()
commit 2e71a6f (net: gro: selective flush of packets) added a bug for skbs using frag_list. This part of the GRO stack is rarely used, as it needs skb not using a page fragment for their skb->head. Most drivers do use a page fragment, but some of them use GFP_KERNEL allocations for the initial fill of their RX ring buffer. napi_gro_flush() overwrite skb->prev that was used for these skb to point to the last skb in frag_list. Fix this using a separate field in struct napi_gro_cb to point to the last fragment. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>