Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

W zeke test #1

Open
wants to merge 13 commits into
base: lollipop
Choose a base branch
from
Open

W zeke test #1

wants to merge 13 commits into from

Conversation

WZeke
Copy link

@WZeke WZeke commented Mar 14, 2015

No description provided.

Maxime Poulain and others added 13 commits January 5, 2015 19:35
Imported from upstream branch with the following patches:
f2fs: support 3.4
f2fs: potential shift wrapping buf in f2fs_trim_fs()
f2fs: call f2fs_unlock_op after error was handled
f2fs: clean up f2fs_ioctl functions
f2fs: support atomic_write feature for database
f2fs: check the use of macros on block counts and addresses
f2fs: refactor flush_nat_entries to remove costly reorganizing ops
f2fs: introduce FITRIM in f2fs_ioctl
f2fs: introduce cp_control structure
f2fs: use more free segments until SSR is activated
f2fs: change the ipu_policy option to enable combinations
f2fs: fix to search whole dirty segmap when get_victim
f2fs: fix to clean previous mount option when remount_fs
f2fs: skip punching hole in special condition
f2fs: support large sector size
f2fs: fix to truncate blocks past EOF in ->setattr
f2fs: update i_size when __allocate_data_block
f2fs: use MAX_BIO_BLOCKS(sbi)
f2fs: remove redundant operation during roll-forward recovery
f2fs: do not skip latest inode information
f2fs: fix roll-forward missing scenarios
f2fs: fix conditions to remain recovery information in f2fs_sync_file
f2fs: introduce a flag to represent each nat entry information
f2fs: use meta_inode cache to improve roll-forward speed
f2fs: fix double lock for inode page during roll-foward recovery
f2fs: fix a race condition in next_free_nid
f2fs: use nm_i->next_scan_nid as default for next_free_nid
f2fs: give an option to enable in-place-updates during fsync to users
f2fs: expand counting dirty pages in the inode page cache
f2fs: remove lengthy inode->i_ino
f2fs: fix negative value for lseek offset
f2fs: avoid node page to be written twice in gc_node_segment
f2fs: use lock-less list(llist) to simplify the flush cmd management
f2fs: refactor flush_sit_entries codes for reducing SIT writes
f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO
f2fs: need fsck.f2fs if the recovery was failed
f2fs: handle bug cases by letting fsck.f2fs initiate
f2fs: add BUG cases to initiate fsck.f2fs
f2fs: need fsck.f2fs when f2fs_bug_on is triggered
f2fs: retain inconsistency information to initiate fsck.f2fs
f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB
Merge tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: reposition unlock_new_inode to prevent accessing invalid inode
f2fs: fix wrong casting for dentry name
f2fs: simplify by using a literal
f2fs: truncate stale block for inline_data
f2fs: use macro for code readability
f2fs: introduce need_do_checkpoint for readability
f2fs: fix incorrect calculation with total/free inode num
f2fs: remove rename and use rename2
f2fs: skip if inline_data was converted already
f2fs: remove rewrite_node_page
f2fs: avoid double lock in truncate_blocks
f2fs: prevent checkpoint during roll-forward
f2fs: add WARN_ON in f2fs_bug_on
f2fs: handle EIO not to break fs consistency
f2fs: check s_dirty under cp_mutex
f2fs: unlock_page when node page is redirtied out
f2fs: introduce f2fs_cp_error for readability
f2fs: give a chance to mount again when encountering errors
f2fs: trigger release_dirty_inode in f2fs_put_super
f2fs: don't skip checkpoint if there is no dirty node pages
f2fs: avoid bug_on when error is occurred
f2fs: fix to recover inline_xattr/data and blocks
f2fs: should clear the inline_xattr flag
f2fs: clear FI_INC_LINK during the recovery
f2fs: fix the initial inode page for recovery
f2fs: make clear on test condition and return types
f2fs: should convert inline_data during the mkwrite
f2fs: fix typo
Merge tag 'for-f2fs-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: use for_each_set_bit to simplify the code
f2fs: add f2fs_balance_fs for expand_inode_data
f2fs: invalidate xattr node page when evict inode
f2fs: avoid skipping recover_inline_xattr after recover_inline_data
f2fs: add tracepoint for f2fs_direct_IO
f2fs: reduce competition among node page writes
f2fs: fix coding style
f2fs: remove redundant lines in allocate_data_block
f2fs: add tracepoint for f2fs_issue_flush
f2fs: avoid retrying wrong recovery routine when error was occurred
f2fs: test before set/clear bits
f2fs: fix wrong condition for unlikely
f2fs: enable in-place-update for fdatasync
f2fs: skip unnecessary data writes during fsync
f2fs: add info of appended or updated data writes
f2fs: use radix_tree for ino management
f2fs: add infra for ino management
f2fs: punch the core function for inode management
f2fs: add nobarrier mount option
f2fs: fix to put root inode in error path of fill_super
f2fs: avoid use invalid mapping of node_inode when evict meta inode
f2fs: support ->rename2()
f2fs: add f2fs_balance_fs for direct IO
f2fs: reduce searching region of segmap when free section
f2fs: remove the unused stat_lock
f2fs: cleanup the needless return of f2fs_create_root_stats
f2fs: check name_len of dir entry to prevent from deadloop
f2fs: use inner macro and function to clean up codes
f2fs: introduce f2fs_write_failed to handle error case when write
f2fs: arguments cleanup of finding file flow functions
f2fs: remove the needless point-cast
f2fs: remove the redundant validation check of acl
f2fs: reduce region of f2fs_lock_op covered for better concurrency
f2fs: replace count*size kzalloc by kcalloc
f2fs: refactor flush_nat_entries codes for reducing NAT writes
f2fs: clean up an unused parameter and assignment
f2fs: introduce f2fs_do_tmpfile for code consistency
f2fs: support ->tmpfile()
f2fs: avoid to truncate non-updated page partially
f2fs: avoid unneeded SetPageUptodate in f2fs_write_end
Merge tag 'f2fs-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: avoid to access NULL pointer in issue_flush_thread
f2fs: check bdi->dirty_exceeded when trying to skip data writes
f2fs: do checkpoint for the renamed inode
f2fs: release new entry page correctly in error path of f2fs_rename
f2fs: fix error path in init_inode_metadata
f2fs: check lower bound nid value in check_nid_range
f2fs: remove unused variables in f2fs_sm_info
f2fs: fix not to allocate unnecessary blocks during fallocate
f2fs: recover fallocated data and its i_size together
f2fs: fix to report newly allocate region as extent
Merge tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: support f2fs_fiemap
f2fs: avoid not to call remove_dirty_inode
f2fs: recover fallocated space
f2fs: fix to recover data written by dio
f2fs: large volume support
f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
f2fs: avoid overflow when large directory feathure is enabled
f2fs: fix recursive lock by f2fs_setxattr
MAINTAINERS: change the email address for f2fs
f2fs: use inode_init_owner() to simplify codes
f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency
f2fs: add a tracepoint for f2fs_read_data_page
f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages
f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page
f2fs: add a tracepoint for f2fs_write_end
f2fs: add a tracepoint for f2fs_write_begin
f2fs: fix checkpatch warning
f2fs: deactivate inode page if the inode is evicted
f2fs: decrease the lock granularity during write_begin
f2fs: no need to wait on page writebck to meta pages
f2fs: avoid grab_cache_page_write_begin for data pages
f2fs: split grab_cache_page and wait_on_page_writeback for node pages
f2fs: fix to truncate inline data in inode page when setattr
f2fs: readahead multi pages of directory for performance
f2fs: set errno when f2fs_iget failed in recover_dentry
f2fs: consider fallocated space for SEEK_DATA
f2fs: return i_size if the hole is outside of i_size
f2fs: introduce f2fs_seek_block to support SEEK_{DATA, HOLE} in llseek
f2fs: introduce help function {create,destroy}_flush_cmd_control
f2fs: introduce struct flush_cmd_control to wrap the flush_merge fields
f2fs: introduce help macro ADDRS_PER_PAGE()
f2fs: submit bio at the reclaim path
f2fs: return errors right after checking them
f2fs: pass flags field to setxattr functions
f2fs: clean up long variable names
f2fs: handle inline data independently in f2fs_bmap
f2fs: adjust free mem size to flush dentry blocks
f2fs: avoid BUG_ON when mouting corrupted image having garbage blocks
f2fs: add available_nids to fix handling max_nid correctly
f2fs: add static to get_max_meta_blks
f2fs: introduce raw_nat_from_node_info() to simplfy codes
f2fs: add the flush_merge handle in the remount flow
f2fs: atomically set inode->i_flags in f2fs_set_inode_flags()
f2fs: make recover_inline_xattr() static
f2fs: remove costly dirty_dir_inode operations
f2fs: fix to unlock f2fs_lock at the omitted error case
f2fs: call redirty_page_for_writepage
f2fs: avoid to conduct roll-forward due to the remained garbage blocks
f2fs: enable flush_merge only in f2fs is not read-only
f2fs: use __GFP_ZERO to avoid appending set-NULL
f2fs: put the bio when issue_flush completed
f2fs: switch to iov_iter_alignment()
Merge tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: fix wrong statistics of inline data
f2fs: check the acl's validity before setting
f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
f2fs: fix to cover io->bio with io_rwsem
f2fs: fix error path when fail to read inline data
f2fs: use list_for_each_entry{_safe} for simplyfying code
f2fs: avoid free slab cache under spinlock
f2fs: avoid unneeded lookup when xattr name length is too long
f2fs: avoid unnecessary bio submit when wait page writeback
f2fs: return -EIO when node id is not matched
f2fs: avoid RECLAIM_FS-ON-W warning
f2fs: skip unnecessary node writes during fsync
f2fs: introduce fi->i_sem to protect fi's info
f2fs: change reclaim rate in percentage
f2fs: add missing documentation for dir_level
f2fs: remove unnecessary threshold
f2fs: throttle the memory footprint with a sysfs entry
f2fs: avoid to drop nat entries due to the negative nr_shrink
f2fs: call f2fs_wait_on_page_writeback instead of native function
f2fs: introduce nr_pages_to_write for segment alignment
f2fs: increase pages_skipped when skipping writepages
f2fs: avoid small data writes by skipping writepages
f2fs: introduce get_dirty_dents for readability
f2fs: fix incorrect parsing with option string
f2fs: avoid to return incorrect errno of read_normal_summaries
f2fs: introduce f2fs_has_xattr_block for better readability
f2fs: print type for each segment in segment_info's show
f2fs: check upper bound of ino value in f2fs_nfs_get_inode
f2fs: introduce f2fs_has_inline_xattr for better readability
f2fs: recover inline xattr data in roll-forward process
f2fs: optimize restore_node_summary slightly
f2fs: format segment_info's show for better legibility
f2fs: remove the unused ctor argument of f2fs_kmem_cache_create()
f2fs: update start nid only once each circle
f2fs: fix wrong kernel coding style
f2fs: fix to write node pages with WRITE_SYNC
f2fs: fix dirty page accounting when redirty
f2fs: use existing macro to clean up some codes
f2fs: readahead contiguous SSA blocks for f2fs_gc
f2fs: add an sysfs entry to control the directory level
f2fs: introduce large directory support
f2fs: remove costly bit operations for f2fs_find_entry
f2fs: implement a lock-free stat_show
f2fs: introduce a radix_tree for the free_nid list
f2fs: introduce help macro on_build_free_nids()
f2fs: fix to mark the checkpointed nat entry correctly
f2fs: fix to do build_stat prior to the recovery procedure
f2fs: fix not to write data pages on the page reclaiming path
f2fs: fix the calculation of max_nids
f2fs: show counts of checkpoint in status
f2fs: introduce ra_meta_pages to readahead CP/NAT/SIT pages
f2fs: use inode mutex to keep atomicity of f2fs_falloc
f2fs: clean up redundant function call
f2fs: fix f2fs_write_meta_page at no checkpoint status
f2fs: fix to truncate dentry pages in the error case
f2fs: fix a build warning
f2fs: clean up with a macro
f2fs: fix the potential mismatch between dir's i_size and i_blocks
f2fs: remove the ugly pointer conversion
f2fs: fix to recover xattr node block
f2fs: handle dirty segments inside refresh_sit_entry
f2fs: update_inode_page should be done all the time
f2fs: use generic posix ACL infrastructure
Merge tag 'for-f2fs-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: drop obsolete node page when it is truncated
f2fs: introduce NODE_MAPPING for code consistency
f2fs: remove the orphan block page array
f2fs: add help function META_MAPPING
f2fs: move a branch for code redability
f2fs: call mark_inode_dirty to flush dirty pages
f2fs: clean checkpatch warnings
f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH)
f2fs: avoid f2fs_balance_fs call during pageout
f2fs: add delimiter to seperate name and value in debug phrase
f2fs: use spinlock rather than mutex for better speed
f2fs: move alloc new orphan node out of lock protection region
f2fs: move grabing orphan pages out of protection region
f2fs: remove the needless parameter of f2fs_wait_on_page_writeback
f2fs: update documents and a MAINTAINERS entry
f2fs: add a sysfs entry to control max_victim_search
f2fs: improve write performance under frequent fsync calls
f2fs: avoid to read inline data except first page
f2fs: avoid to left uninitialized data in page when read inline data
f2fs: fix truncate_partial_nodes bug
f2fs: handle errors correctly during f2fs_reserve_block
f2fs: add inline_data recovery routine
f2fs: add the number of inline_data files to status info
f2fs: refactor f2fs_convert_inline_data
f2fs: call f2fs_put_page at the error case
f2fs: convert inline_data for punch_hole
f2fs: don't need to get f2fs_lock_op for the inline_data test
f2fs: update f2fs Documentation
f2fs: handle inline data operations
f2fs: key functions to handle inline data
f2fs: convert max_orphans to a field of f2fs_sb_info
f2fs: check the blocksize before calling generic_direct_IO path
f2fs: should put the dnode when NEW_ADDR is detected
f2fs: introduce F2FS_INODE macro to get f2fs_inode
f2fs: check filename length in recover_dentry
f2fs: avoid to set wrong pino of inode when rename dir
f2fs: update several comments
f2fs: remove the rw_flag domain from f2fs_io_info
f2fs: move all the bio initialization into __bio_alloc
f2fs: add description about small_discards in document
f2fs: write dirty meta pages collectively
f2fs: introduce a new direct_IO write path
f2fs: introduce sysfs entry to control in-place-update policy
f2fs: missing kmem_cache_destroy for discard_entry
f2fs: fix the location of tracepoint
f2fs: refactor bio->rw handling
f2fs: merge pages with the same sync_mode flag
f2fs: add unlikely() macro for compiler more aggressively
f2fs: add unlikely() macro for compiler optimization
f2fs: avoid unneeded page release for correct _count of page
f2fs: use inner macro GFP_F2FS_ZERO for simplification
f2fs: replace the debugfs_root with f2fs_debugfs_root
f2fs: remove debufs dir if debugfs_create_file() failed
f2fs: readahead contiguous pages for restore_node_summary
f2fs: refactor bio-related operations
f2fs: remove the own bi_private allocation
f2fs: convert recover_orphan_inodes to void
f2fs: check return value of f2fs_readpage in find_data_page
f2fs: use true and false for boolean variable
f2fs: correct type of wait in struct bio_private
f2fs: avoid to calculate incorrect max orphan number
f2fs: remove unneeded code in punch_hole
f2fs: remove unnecessary condition checks
f2fs: bug fix on bit overflow from 32bits to 64bits
f2fs: fix a potential out of range issue
f2fs: remove unnecessary return value
f2fs: add a new mount option: inline_data
f2fs: add flags and helpers to support inline data
f2fs: send REQ_META or REQ_PRIO when reading meta area
f2fs: add detailed information of bio types in the tracepoints
f2fs: add a new function: f2fs_reserve_block()
f2fs: avoid lock debugging overhead
f2fs: read contiguous sit entry pages by merging for mount performance
f2fs: adds a tracepoint for f2fs_submit_read_bio
f2fs: adds a tracepoint for submit_read_page
f2fs: simplify IS_DATASEG and IS_NODESEG macro
f2fs: merge read IOs at ra_nat_pages()
f2fs: add a new function to support for merging contiguous read
f2fs: move the list_head initialization into the lock protection region
f2fs: simplify write_orphan_inodes for better readable
f2fs: convert inc/dec_valid_node_count to inc/dec one count
f2fs: convert dev_valid_block_count to void
f2fs: convert remove_inode_page to void
f2fs: introduce a bio array for per-page write bios
f2fs: disable the extent cache ops on high fragmented files
f2fs: use sbi->write_mutex for write bios
f2fs: clean up the do_submit_bio flow
f2fs: use f2fs_put_page to release page for uniform style
f2fs: add a tracepoint for f2fs_issue_discard
f2fs: introduce f2fs_issue_discard() to clean up
f2fs: add a sysfs entry to control max_discards
f2fs: add key functions for small discards
f2fs: add a slab cache entry for small discards
f2fs: improve searching speed of __next_free_blkoff
f2fs: introduce __find_rev_next(_zero)_bit
Merge tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: issue more large discard command
f2fs: fix memory leak after kobject init failed in fill_super
f2fs: cleanup waiting routine for writeback pages in cp
f2fs: avoid to use a NULL point in destroy_segment_manager
f2fs: remove unnecessary TestClearPageError when wait pages writeback
f2fs: update f2fs document
f2fs: avoid to wait all the node blocks during fsync
f2fs: check all ones or zeros bitmap with bitops for better mount performance
f2fs: change the method of calculating the number summary blocks
f2fs: fix calculating incorrect free size when update xattr in __f2fs_setxattr
f2fs: add an option to avoid unnecessary BUG_ONs
f2fs: introduce CONFIG_F2FS_CHECK_FS for BUG_ON control
f2fs: fix a deadlock during init_acl procedure
f2fs: clean up acl flow for better readability
f2fs: remove unnecessary segment bitmap updates
f2fs: add tracepoint for vm_page_mkwrite
f2fs: add tracepoint for set_page_dirty
f2fs: remove redundant set_page_dirty from write_compacted_summaries
f2fs: add reclaiming control by sysfs
f2fs: introduce f2fs_balance_fs_bg for some background jobs
f2fs: reclaim prefree segments periodically
f2fs: use bool for booleans
f2fs: clean up several status-related operations
f2fs: introduce f2fs_kmem_cache_alloc to hide the unfailed, kmem cache allocation
f2fs: no need to check other dirty_segmap when the seg has been found
f2fs: use true and false for boolean value
f2fs: avoid to write during the recovery
f2fs: avoid wait if IO end up when do_checkpoint for better performance
f2fs: introduce function read_raw_super_block()
f2fs: fix the starvation problem on cp_rwsem
f2fs: fix to store and retrieve i_rdev correctly
f2fs: fix writing incorrect orphan blocks
f2fs: avoid unnecessary checkpoints
f2fs: handle remount options correctly
f2fs: use rw_sem instead of fs_lock(locks mutex)
f2fs: account for orphan inodes during recovery
f2fs: don't GC or take an fs_lock from f2fs_initxattrs()
f2fs: don't let the orphan inode counter underflow
f2fs: remove unneeded write checkpoint in recover_fsync_data
f2fs: avoid allocating failure in bio_alloc
f2fs: optimize the victim searching loop slightly
f2fs: optimize fs_lock for better performance
Merge tag 'for-f2fs-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: optimize gc for better performance
f2fs: merge more bios of node block writes
f2fs: avoid an overflow during utilization calculation
f2fs: trigger GC when there are prefree segments
f2fs: use strncasecmp() simplify the string comparison
f2fs: fix omitting to update inode page
f2fs: support the inline xattrs
f2fs: add the truncate_xattr_node function
f2fs: introduce __find_xattr for readability
f2fs: reserve the xattr space dynamically
f2fs: add flags for inline xattrs
f2fs: fix error return code in init_f2fs_fs()
f2fs: fix wrong BUG_ON condition
f2fs: fix memory leak when init f2fs filesystem fail
f2fs: fix a compound statement label error
f2fs: avoid writing inode redundantly when creating a file
f2fs: alloc_page() doesn't return an ERR_PTR
f2fs: should cover i_xattr_nid with its xattr node page lock
f2fs: check the free space first in new_node_page
f2fs: clean up the needless end 'return' of void function
f2fs: introduce cur_cp_version function to reduce code size
f2fs: fix inconsistency between xattr node blocks and its inode
f2fs: fix the use of XATTR_NODE_OFFSET
f2fs: fix a build failure due to missing the kobject header
f2fs: fix a deadlock in fsync
f2fs: remove redundant code from f2fs_write_begin
f2fs: add sysfs entries to select the gc policy
f2fs: add sysfs support for controlling the gc_thread
f2fs: remove an unneeded kfree(NULL)
f2fs: fix handling orphan inodes
f2fs: move bio_private allocation out of f2fs_bio_alloc()
f2fs: use list_for_each rather than list_for_each_safe, in remove_orphan_inode()
f2fs: use seq_puts()/seq_putc() rather than seq_printf() where possible
f2fs: fix i_name during f2fs_sync_file
f2fs: update file name in the inode block during f2fs_rename
f2fs: introduce help function F2FS_NODE()
f2fs: add a help func F2FS_STAT() to get the f2fs_stat_info
f2fs: add proc entry to monitor current usage of segments
f2fs: add description for fsck.f2fs and dump.f2fs
f2fs: recover date requested by fdatasync
f2fs: fix readdir incorrectness
Merge tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: fix to recover i_size from roll-forward
f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes()
f2fs: remove reusing any prefree segments
f2fs: code cleanup and simplify in func {find/add}_gc_inode
f2fs: optimize the init_dirty_segmap function
f2fs: fix an endian conversion bug detected by sparse
f2fs: fix crc endian conversion
[readdir] convert f2fs
f2fs: add remount_fs callback support
f2fs: recover wrong pino after checkpoint during fsync
f2fs: optimize do_write_data_page()
f2fs: make locate_dirty_segment() as static
f2fs: remove unnecessary parameter "offset" from __add_sum_entry()
f2fs: avoid freqeunt write_inode calls
f2fs: optimise the truncate_data_blocks_range() range
f2fs: use the F2FS specific flags in f2fs_ioctl()
f2fs: sync dir->i_size with its block allocation
f2fs: fix i_blocks translation on various types of files
f2fs: set sb->s_fs_info before calling parse_options()
f2fs: support xattr security labels
f2fs: fix iget/iput of dir during recovery
f2fs: reorganise the function get_victim_by_default
f2fs: handle errors from get_node_page calls
f2fs: cover cp_file information with ilock
f2fs: fix incorrect iputs during the dentry recovery
f2fs: fix dentry recovery routine
f2fs: iput only if whole data blocks are flushed
f2fs: return proper error from start_gc_thread
f2fs: optimize several routines in node.h
f2fs: remove unneeded initializations in f2fs_parent_dir
f2fs: push some variables to debug part
f2fs: align data types between on-disk and in-memory block addresses
f2fs: dereferencing an ERR_PTR
f2fs: use ihold
f2fs: should not make_bad_inode on f2fs_link failure
f2fs: fix to handle do_recover_data errors
f2fs: reuse the locked dnode page and its inode
f2fs: fix wrong condition check
f2fs: add f2fs_readonly()
f2fs: avoid RECLAIM_FS-ON-W: deadlock
f2fs: don't do checkpoint if error is occurred
f2fs: fix to unlock page before exit
f2fs: remove unnecessary kmap/kunmap operations
f2fs: reorganize f2fs_vm_page_mkwrite
f2fs: use list_for_each_entry rather than list_for_each_entry_safe
f2fs: remove unecessary variable and code
f2fs, lockdep: annotate mutex_lock_all()
f2fs: add debug msgs in the recovery routine
f2fs: update inode page after creation
f2fs: change get_new_data_page to pass a locked node page
f2fs: skip get_node_page if locked node page is passed
f2fs: remove unnecessary por_doing check
f2fs: fix BUG_ON during f2fs_evict_inode(dir)
f2fs: fix por_doing variable coverage
f2fs: remove redundant assignment
f2fs: fix the inconsistent state of data pages
f2fs: fix inconsistency of block count during recovery
Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: cover free_nid management with spin_lock
f2fs: optimize scan_nat_page()
f2fs: code cleanup for scan_nat_page() and build_free_nids()
f2fs: bugfix for alloc_nid_failed()
f2fs: recover when journal contains deleted files
f2fs: continue to mount after failing recovery
f2fs: avoid deadlock during evict after f2fs_gc
f2fs: modify the number of issued pages to merge IOs
f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.
f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
f2fs: check truncation of mapping after lock_page
f2fs: enhance alloc_nid and build_free_nids flows
f2fs: add a tracepoint on f2fs_new_inode
f2fs: check nid == 0 in add_free_nid
f2fs: add REQ_META about metadata requests for submit
f2fs: give a chance to merge IOs by IO scheduler
f2fs: avoid frequent background GC
f2fs: add tracepoints to debug checkpoint request
f2fs: add tracepoints for write page operations
f2fs: add tracepoints to debug the block allocation
f2fs: add tracepoints for GC threads
f2fs: add tracepoint for tracing the page i/o
f2fs: add tracepoints for truncate operation
f2fs: add tracepoints for sync & inode operations
f2fs: make is_multimedia_file code align with its name
f2fs: fix error return code in f2fs_fill_super()
f2fs: use mnt_want_write_file() in ioctl
f2fs: fix typo mistakes
f2fs: write checkpoint before starting FG_GC
f2fs: fix the logic of IS_DNODE()
f2fs: introduce a new global lock scheme
f2fs: move f2fs_balance_fs from truncate to punch_hole
f2fs: reduce redundant spin_lock operations
f2fs: update f2fs.txt related with discard at mkfs
f2fs: add NULL pointer check
f2fs: fix the bitmap consistency of dirty segments
f2fs: avoid race for summary information
f2fs: allocate remained free segments in the LFS mode
f2fs: check completion of foreground GC
f2fs: change GC bitmaps to apply the section granularity
f2fs: allocate new segment aligned with sections
f2fs: remove redundant lock_page calls
f2fs: introduce TOTAL_SECS macro
f2fs: do not use duplicate names in a macro
f2fs: use kmemdup
f2fs: fix to give correct parent inode number for roll forward
f2fs: remain nat cache entries for further free nid allocation
f2fs: do not skip writing file meta during fsync
f2fs: fix the recovery flow to handle errors correctly
f2fs: fix typo in comments
f2fs: avoid BUG_ON from check_nid_range and update return path in do_read_inode
f2fs: fix return values from validate superblock
f2fs: reorganize f2fs_setxattr
f2fs: notify when discard is not supported
f2fs: fix to call WRITE_FLUSH at the end of fsync
f2fs: fix not to allocate max_nid
f2fs: fix return value of releasepage for node and data
f2fs: scan next nat page to reuse free nids in there
f2fs: should check the node page was truncated first
f2fs: reduce unncessary locking pages during read
f2fs: Fix typo in comments
f2fs: avoid extra ++ while returning from get_node_path
f2fs: align f2fs maximum name length to linux based filesystem
f2fs: optimize and change return path in lookup_free_nid_list
f2fs: optimize get node page readahead part
f2fs: check the level before calling get_nid function
f2fs: introduce readahead mode of node pages
f2fs: read with READ_SYNC when getting dnode page
f2fs: fix to unlock node page when it was truncated
f2fs: fix overflow when calculating utilization on 32-bit
Merge tag 'f2fs-for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: avoid build warning
Merge branch 'f2fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into dev
f2fs: add compat_ioctl to provide backward compatability
f2fs: fix calculation of max. gc cost in the SSR case
f2fs: clarify and enhance the f2fs_gc flow
f2fs: optimize the return condition for has_not_enough_free_secs
f2fs: make an accessor to get sections for particular block type
f2fs: mark gc_thread as NULL when thread creation is failed
f2fs: name gc task as per the block device
f2fs: remove unnecessary gc option check and balance_fs
f2fs: remove repeated F2FS_SET_SB_DIRT call
f2fs: when check superblock failed, try to check another superblock
f2fs: use F2FS_BLKSIZE to judge bloksize and page_cache_size
f2fs: add device name in debugfs
f2fs: stop repeated checking if cp is needed
f2fs: avoid balanc_fs during evict_inode
f2fs: remove the use of page_cache_release
f2fs: fix typo mistake for data_version description
f2fs: reorganize code for ra_node_page
f2fs: avoid redundant call to has_not_enough_free_secs in f2fs_gc
f2fs: add un/freeze_fs into super_operations
f2fs: clean up the add_orphan_inode func
f2fs: fix disable_ext_identify option spelling
f2fs: cover global locks for reserve_new_block
f2fs: prevent checkpoint once any IO failure is detected
f2fs: save device node number into f2fs_inode
f2fs: get rid of fake on-stack dentries
f2fs: switch init_inode_metadata() to passing parent and name separately
f2fs: switch new_inode_page() from dentry to qstr
f2fs: init_dent_inode() should take qstr
Merge tag 'f2fs-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: use _safe() version of list_for_each
f2fs: add comments of start_bidx_of_node
f2fs: avoid issuing small bios due to several dirty node pages
f2fs: support swapfile
f2fs: add remap_pages as generic_file_remap_pages
f2fs: add __init to functions in init_f2fs_fs
f2fs: fix the debugfs entry creation path
f2fs: add global mutex_lock to protect f2fs_stat_list
f2fs: remove the blk_plug usage in f2fs_write_data_pages
f2fs: avoid redundant time update for parent directory in f2fs_delete_entry
f2fs: remove redundant call to set_blocksize in f2fs_fill_super
f2fs: move f2fs_balance_fs to punch_hole
f2fs: add f2fs_balance_fs in several interfaces
f2fs: revisit the f2fs_gc flow
f2fs: check return value during recovery
f2fs: avoid null dereference in f2fs_acl_from_disk
f2fs: initialize newly allocated dnode structure
f2fs: update f2fs partition info about SIT/NAT layout
f2fs: update f2fs document to reflect SIT/NAT layout correctly
f2fs: remove unneeded INIT_LIST_HEAD at few places
f2fs: fix time update in case of f2fs fallocate
f2fs: introduce f2fs_msg to ease adding information prints
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: unify string length declarations and usage
f2fs: clean up unused variables and return values
f2fs: clean up the start_bidx_of_node function
f2fs: remove unneeded variable from f2fs_sync_fs
f2fs: fix fsync_inode list addition logic and avoid invalid access to memory
f2fs: remove unneeded initialization of nr_dirty in dirty_seglist_info
f2fs: handle error from f2fs_iget_nowait
f2fs: fix equation of has_not_enough_free_secs()
f2fs: add MAINTAINERS entry
f2fs: return a default value for non-void function
f2fs: invalidate the node page if allocation is failed
f2fs: add missing #include <linux/prefetch.h>
f2fs: Don't assign e_id in f2fs_acl_from_disk
f2fs: do f2fs_balance_fs in front of dir operations
f2fs: should recover orphan and fsync data
f2fs: fix handling errors got by f2fs_write_inode
f2fs: fix up f2fs_get_parent issue to retrieve correct parent inode number
f2fs: fix wrong calculation on f_files in statfs
f2fs: remove set_page_dirty for atomic f2fs_end_io_write
Merge tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
f2fs: fix tracking parent inode number
f2fs: cleanup the f2fs_bio_alloc routine
f2fs: introduce accessor to retrieve number of dentry slots
f2fs: remove redundant call to f2fs_put_page in delete entry
f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
f2fs: rewrite f2fs_bio_alloc to make it simpler
f2fs: fix a typo in f2fs documentation
f2fs: remove unused variable
f2fs: move error condition for mkdir at proper place
f2fs: remove unneeded initialization
f2fs: check read only condition before beginning write out
f2fs: remove unneeded memset from init_once
f2fs: show error in case of invalid mount arguments
f2fs: fix the compiler warning for uninitialized use of variable
f2fs: resolve build failures
f2fs: adjust kernel coding style
f2fs: fix endian conversion bugs reported by sparse
f2fs: remove unneeded version.h header file from f2fs.h
f2fs: update the f2fs document
f2fs: update Kconfig and Makefile
f2fs: move proc files to debugfs
f2fs: add recovery routines for roll-forward
f2fs: add garbage collection functions
f2fs: add xattr and acl functionalities
f2fs: add core directory operations
f2fs: add inode operations for special inodes
f2fs: add core inode operations
f2fs: add address space operations for data
f2fs: add file operations
f2fs: add segment operations
f2fs: add node operations
f2fs: add checkpoint operations
f2fs: add super block operations
f2fs: add superblock and major in-memory structure
f2fs: add on-disk layout
f2fs: add document
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
Prevent a Kernel panic that can be triggered after shooting an
HDR picture.

Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
…ine"

This reverts commit 0394650bb6f53d0271c9610e6cc2a7fc855432c6.

Conflicts:
	arch/arm/mach-msm/cpufreq.c
…we don't need to implement this everytime we need it.

Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
…ernor.

Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>

Conflicts:
	drivers/input/touchscreen/Makefile
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
…don't overlap"

This reverts commit 0f12c28fabf191e3539f6f6056bc23d96ef45cf3.
…events when updating the input time variable

Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
maxime-poulain pushed a commit that referenced this pull request Apr 4, 2015
We will encounter oops by executing below command.
getfattr -n system.advise /mnt/f2fs/file
Killed

message log:
BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<f8b54d69>] f2fs_xattr_advise_get+0x29/0x40 [f2fs]
*pdpt = 00000000319b7001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
Modules linked in: f2fs(O) snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev
snd_seq_device snd_timer bnep snd rfcomm microcode bluetooth soundcore i2c_piix4 mac_hid serio_raw parport_pc ppdev lp parport
binfmt_misc hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
CPU: 3 PID: 3134 Comm: getfattr Tainted: G           O    4.0.0-rc1 MiCode#6
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: f3a71b60 ti: f19a6000 task.ti: f19a6000
EIP: 0060:[<f8b54d69>] EFLAGS: 00010246 CPU: 3
EIP is at f2fs_xattr_advise_get+0x29/0x40 [f2fs]
EAX: 00000000 EBX: f19a7e71 ECX: 00000000 EDX: f8b5b467
ESI: 00000000 EDI: f2008570 EBP: f19a7e14 ESP: f19a7e08
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 00000000 CR3: 319b8000 CR4: 000007f0
Stack:
 f8b5a634 c0cbb580 00000000 f19a7e34 c1193850 00000000 00000007 f19a7e71
 f19a7e64 c0cbb580 c1193810 f19a7e50 c1193c00 00000000 00000000 00000000
 c0cbb580 00000000 f19a7f70 c1194097 00000000 00000000 00000000 74737973
Call Trace:
 [<c1193850>] generic_getxattr+0x40/0x50
 [<c1193810>] ? xattr_resolve_name+0x80/0x80
 [<c1193c00>] vfs_getxattr+0x70/0xa0
 [<c1194097>] getxattr+0x87/0x190
 [<c11801d7>] ? path_lookupat+0x57/0x5f0
 [<c11819d2>] ? putname+0x32/0x50
 [<c116653a>] ? kmem_cache_alloc+0x2a/0x130
 [<c11819d2>] ? putname+0x32/0x50
 [<c11819d2>] ? putname+0x32/0x50
 [<c11819d2>] ? putname+0x32/0x50
 [<c11827f9>] ? user_path_at_empty+0x49/0x70
 [<c118283f>] ? user_path_at+0x1f/0x30
 [<c11941e7>] path_getxattr+0x47/0x80
 [<c11948e7>] SyS_getxattr+0x27/0x30
 [<c163f748>] sysenter_do_call+0x12/0x12
Code: 66 90 55 89 e5 57 56 53 66 66 66 66 90 8b 78 20 89 d3 ba 67 b4 b5 f8 89 d8 89 ce e8 42 7c 7b c8 85 c0 75 16 0f b6 87 44 01 00
00 <88> 06 b8 01 00 00 00 5b 5e 5f 5d c3 8d 76 00 b8 ea ff ff ff eb
EIP: [<f8b54d69>] f2fs_xattr_advise_get+0x29/0x40 [f2fs] SS:ESP 0068:f19a7e08
CR2: 0000000000000000
---[ end trace 860260654f1f416a ]---

The reason is that in getfattr there are two steps which is indicated by strace info:
1) try to lookup and get size of specified xattr.
2) get value of the extented attribute.

strace info:
getxattr("/mnt/f2fs/file", "system.advise", 0x0, 0) = 1
getxattr("/mnt/f2fs/file", "system.advise", "\x00", 256) = 1

For the first step, getfattr may pass a NULL pointer in @value and zero in @SiZe
as parameters for ->getxattr, but we access this @value pointer directly without
checking whether the pointer is valid or not in f2fs_xattr_advise_get, so the
oops occurs.

This patch fixes this issue by verifying @value pointer before using.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
@maxime-poulain maxime-poulain force-pushed the lollipop branch 3 times, most recently from c812578 to dfcdc63 Compare April 5, 2015 16:00
@maxime-poulain maxime-poulain force-pushed the lollipop branch 2 times, most recently from 89cee7e to 915820c Compare April 24, 2015 14:29
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 4523e14 upstream.

hugetlb_reserve_pages() can be used for either normal file-backed
hugetlbfs mappings, or MAP_HUGETLB.  In the MAP_HUGETLB, semi-anonymous
mode, there is not a VMA around.  The new call to resv_map_put() assumed
that there was, and resulted in a NULL pointer dereference:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
  IP: vma_resv_map+0x9/0x30
  PGD 141453067 PUD 1421e1067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP
  ...
  Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ #36
  RIP: vma_resv_map+0x9/0x30
  ...
  Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0)
  Call Trace:
    resv_map_put+0xe/0x40
    hugetlb_reserve_pages+0xa6/0x1d0
    hugetlb_file_setup+0x102/0x2c0
    newseg+0x115/0x360
    ipcget+0x1ce/0x310
    sys_shmget+0x5a/0x60
    system_call_fastpath+0x16/0x1b

This was reported by Dave Jones, but was reproducible with the
libhugetlbfs test cases, so shame on me for not running them in the
first place.

With this, the oops is gone, and the output of libhugetlbfs's
run_tests.py is identical to plain 3.4 again.

[ Marked for stable, since this was introduced by commit c50ac05
  ("hugetlb: fix resv_map leak in error path") which was also marked for
  stable ]

Reported-by: Dave Jones <davej@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
[ Upstream commit e49cc0d ]

We hit a kernel OOPS.

<3>[23898.789643] BUG: sleeping function called from invalid context at
/data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103
<3>[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name:
Thread-6683
<4>[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G        W
3.0.8-137685-ge7742f9 #1
<4>[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.904225] Call Trace:
<4>[23899.989209]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.000416]  [<c1238c2a>] __might_sleep+0x10a/0x110
<4>[23900.007357]  [<c1228021>] do_page_fault+0xd1/0x3c0
<4>[23900.013764]  [<c18e9ba9>] ? restore_all+0xf/0xf
<4>[23900.024024]  [<c17c007b>] ? napi_complete+0x8b/0x690
<4>[23900.029297]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.123739]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.128955]  [<c18ea0c3>] error_code+0x5f/0x64
<4>[23900.133466]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.138450]  [<c17f6298>] ? __ip_route_output_key+0x698/0x7c0
<4>[23900.144312]  [<c17f5f8d>] ? __ip_route_output_key+0x38d/0x7c0
<4>[23900.150730]  [<c17f63df>] ip_route_output_flow+0x1f/0x60
<4>[23900.156261]  [<c181de58>] ip4_datagram_connect+0x188/0x2b0
<4>[23900.161960]  [<c18e981f>] ? _raw_spin_unlock_bh+0x1f/0x30
<4>[23900.167834]  [<c18298d6>] inet_dgram_connect+0x36/0x80
<4>[23900.173224]  [<c14f9e88>] ? _copy_from_user+0x48/0x140
<4>[23900.178817]  [<c17ab9da>] sys_connect+0x9a/0xd0
<4>[23900.183538]  [<c132e93c>] ? alloc_file+0xdc/0x240
<4>[23900.189111]  [<c123925d>] ? sub_preempt_count+0x3d/0x50

Function free_fib_info resets nexthop_nh->nh_dev to NULL before releasing
fi. Other cpu might be accessing fi. Fixing it by delaying the releasing.

With the patch, we ran MTBF testing on Android mobile for 12 hours
and didn't trigger the issue.

Thank Eric for very detailed review/checking the issue.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Kun Jiang <kunx.jiang@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit e5851da upstream.

Remove spinlock as atomic_t can be used instead. Note we use only 16
lower bits, upper bits are changed but we impilcilty cast to u16.

This fix possible deadlock on IBSS mode reproted by lockdep:

=================================
[ INFO: inconsistent lock state ]
3.4.0-wl+ MiCode#4 Not tainted
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/u:2/30374 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&intf->seqlock)->rlock){+.?...}, at: [<f9979a20>] rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
{IN-SOFTIRQ-W} state was registered at:
  [<c04978ab>] __lock_acquire+0x47b/0x1050
  [<c0498504>] lock_acquire+0x84/0xf0
  [<c0835733>] _raw_spin_lock+0x33/0x40
  [<f9979a20>] rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
  [<f9979f2a>] rt2x00queue_write_tx_frame+0x1a/0x300 [rt2x00lib]
  [<f997834f>] rt2x00mac_tx+0x7f/0x380 [rt2x00lib]
  [<f98fe363>] __ieee80211_tx+0x1b3/0x300 [mac80211]
  [<f98ffdf5>] ieee80211_tx+0x105/0x130 [mac80211]
  [<f99000dd>] ieee80211_xmit+0xad/0x100 [mac80211]
  [<f9900519>] ieee80211_subif_start_xmit+0x2d9/0x930 [mac80211]
  [<c0782e87>] dev_hard_start_xmit+0x307/0x660
  [<c079bb71>] sch_direct_xmit+0xa1/0x1e0
  [<c0784bb3>] dev_queue_xmit+0x183/0x730
  [<c078c27a>] neigh_resolve_output+0xfa/0x1e0
  [<c07b436a>] ip_finish_output+0x24a/0x460
  [<c07b4897>] ip_output+0xb7/0x100
  [<c07b2d60>] ip_local_out+0x20/0x60
  [<c07e01ff>] igmpv3_sendpack+0x4f/0x60
  [<c07e108f>] igmp_ifc_timer_expire+0x29f/0x330
  [<c04520fc>] run_timer_softirq+0x15c/0x2f0
  [<c0449e3e>] __do_softirq+0xae/0x1e0
irq event stamp: 18380437
hardirqs last  enabled at (18380437): [<c0526027>] __slab_alloc.clone.3+0x67/0x5f0
hardirqs last disabled at (18380436): [<c0525ff3>] __slab_alloc.clone.3+0x33/0x5f0
softirqs last  enabled at (18377616): [<c0449eb3>] __do_softirq+0x123/0x1e0
softirqs last disabled at (18377611): [<c041278d>] do_softirq+0x9d/0xe0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&intf->seqlock)->rlock);
  <Interrupt>
    lock(&(&intf->seqlock)->rlock);

 *** DEADLOCK ***

4 locks held by kworker/u:2/30374:
 #0:  (wiphy_name(local->hw.wiphy)){++++.+}, at: [<c045cf99>] process_one_work+0x109/0x3f0
 #1:  ((&sdata->work)){+.+.+.}, at: [<c045cf99>] process_one_work+0x109/0x3f0
 mitwo-dev#2:  (&ifibss->mtx){+.+.+.}, at: [<f98f005b>] ieee80211_ibss_work+0x1b/0x470 [mac80211]
 MiCode#3:  (&intf->beacon_skb_mutex){+.+...}, at: [<f997a644>] rt2x00queue_update_beacon+0x24/0x50 [rt2x00lib]

stack backtrace:
Pid: 30374, comm: kworker/u:2 Not tainted 3.4.0-wl+ MiCode#4
Call Trace:
 [<c04962a6>] print_usage_bug+0x1f6/0x220
 [<c0496a12>] mark_lock+0x2c2/0x300
 [<c0495ff0>] ? check_usage_forwards+0xc0/0xc0
 [<c04978ec>] __lock_acquire+0x4bc/0x1050
 [<c0527890>] ? __kmalloc_track_caller+0x1c0/0x1d0
 [<c0777fb6>] ? copy_skb_header+0x26/0x90
 [<c0498504>] lock_acquire+0x84/0xf0
 [<f9979a20>] ? rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
 [<c0835733>] _raw_spin_lock+0x33/0x40
 [<f9979a20>] ? rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
 [<f9979a20>] rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
 [<f997a5cf>] rt2x00queue_update_beacon_locked+0x5f/0xb0 [rt2x00lib]
 [<f997a64d>] rt2x00queue_update_beacon+0x2d/0x50 [rt2x00lib]
 [<f9977e3a>] rt2x00mac_bss_info_changed+0x1ca/0x200 [rt2x00lib]
 [<f9977c70>] ? rt2x00mac_remove_interface+0x70/0x70 [rt2x00lib]
 [<f98e4dd0>] ieee80211_bss_info_change_notify+0xe0/0x1d0 [mac80211]
 [<f98ef7b8>] __ieee80211_sta_join_ibss+0x3b8/0x610 [mac80211]
 [<c0496ab4>] ? mark_held_locks+0x64/0xc0
 [<c0440012>] ? virt_efi_query_capsule_caps+0x12/0x50
 [<f98efb09>] ieee80211_sta_join_ibss+0xf9/0x140 [mac80211]
 [<f98f0456>] ieee80211_ibss_work+0x416/0x470 [mac80211]
 [<c0496d8b>] ? trace_hardirqs_on+0xb/0x10
 [<c077683b>] ? skb_dequeue+0x4b/0x70
 [<f98f207f>] ieee80211_iface_work+0x13f/0x230 [mac80211]
 [<c045cf99>] ? process_one_work+0x109/0x3f0
 [<c045d015>] process_one_work+0x185/0x3f0
 [<c045cf99>] ? process_one_work+0x109/0x3f0
 [<f98f1f40>] ? ieee80211_teardown_sdata+0xa0/0xa0 [mac80211]
 [<c045ed86>] worker_thread+0x116/0x270
 [<c045ec70>] ? manage_workers+0x1e0/0x1e0
 [<c0462f64>] kthread+0x84/0x90
 [<c0462ee0>] ? __init_kthread_worker+0x60/0x60
 [<c083d382>] kernel_thread_helper+0x6/0x10

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 08f75bf upstream.

usb_ep_ops.disable must clear external copy of the endpoint descriptor,
otherwise musb crashes after loading/unloading several gadget modules
in a row:

Unable to handle kernel paging request at virtual address bf013730
pgd = c0004000
[bf013730] *pgd=8f26d811, *pte=00000000, *ppte=00000000
Internal error: Oops: 7 [#1]
Modules linked in: g_cdc [last unloaded: g_file_storage]
CPU: 0    Not tainted  (3.2.17 #647)
PC is at musb_gadget_enable+0x4c/0x24c
LR is at _raw_spin_lock_irqsave+0x4c/0x58
[<c027c030>] (musb_gadget_enable+0x4c/0x24c) from [<bf01b760>] (gether_connect+0x3c/0x19c [g_cdc])
[<bf01b760>] (gether_connect+0x3c/0x19c [g_cdc]) from [<bf01ba1c>] (ecm_set_alt+0x15c/0x180 [g_cdc])
[<bf01ba1c>] (ecm_set_alt+0x15c/0x180 [g_cdc]) from [<bf01ecd4>] (composite_setup+0x85c/0xac4 [g_cdc])
[<bf01ecd4>] (composite_setup+0x85c/0xac4 [g_cdc]) from [<c027b744>] (musb_g_ep0_irq+0x844/0x924)
[<c027b744>] (musb_g_ep0_irq+0x844/0x924) from [<c027a97c>] (musb_interrupt+0x79c/0x864)
[<c027a97c>] (musb_interrupt+0x79c/0x864) from [<c027aaa8>] (generic_interrupt+0x64/0x7c)
[<c027aaa8>] (generic_interrupt+0x64/0x7c) from [<c00797cc>] (handle_irq_event_percpu+0x28/0x178)
...

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 954c3f8 upstream.

We need to make sure that the USB serial driver we find
matches the USB driver whose probe we are currently
executing. Otherwise we will end up with USB serial
devices bound to the correct serial driver but wrong
USB driver.

An example of such cross-probing, where the usbserial_generic
USB driver has found the sierra serial driver:

May 29 18:26:15 nemi kernel: [ 4442.559246] usbserial_generic 4-4:1.0: Sierra USB modem converter detected
May 29 18:26:20 nemi kernel: [ 4447.556747] usbserial_generic 4-4:1.2: Sierra USB modem converter detected
May 29 18:26:25 nemi kernel: [ 4452.557288] usbserial_generic 4-4:1.3: Sierra USB modem converter detected

sysfs view of the same problem:

bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root    0 May 29 18:23 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root    0 May 29 18:23 module -> ../../../../module/sierra
-rw-r--r-- 1 root root 4096 May 29 18:23 new_id
lrwxrwxrwx 1 root root    0 May 29 18:32 ttyUSB0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/ttyUSB0
lrwxrwxrwx 1 root root    0 May 29 18:32 ttyUSB1 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2/ttyUSB1
lrwxrwxrwx 1 root root    0 May 29 18:32 ttyUSB2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3/ttyUSB2
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind

bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/usbserial_generic/
total 0
lrwxrwxrwx 1 root root    0 May 29 18:33 4-4:1.0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0
lrwxrwxrwx 1 root root    0 May 29 18:33 4-4:1.2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2
lrwxrwxrwx 1 root root    0 May 29 18:33 4-4:1.3 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root    0 May 29 18:33 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/generic/
total 0
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root    0 May 29 18:33 module -> ../../../../module/usbserial
-rw-r--r-- 1 root root 4096 May 29 18:33 new_id
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind

So we end up with a mismatch between the USB driver and the
USB serial driver.  The reason for the above is simple: The
USB driver probe will succeed if *any* registered serial
driver matches, and will use that serial driver for all
serial driver functions.

This makes ref counting go wrong. We count the USB driver
as used, but not the USB serial driver.  This may result
in Oops'es as demonstrated by Johan Hovold <jhovold@gmail.com>:

[11811.646396] drivers/usb/serial/usb-serial.c: get_free_serial 1
[11811.646443] drivers/usb/serial/usb-serial.c: get_free_serial - minor base = 0
[11811.646460] drivers/usb/serial/usb-serial.c: usb_serial_probe - registering ttyUSB0
[11811.646766] usb 6-1: pl2303 converter now attached to ttyUSB0
[11812.264197] USB Serial deregistering driver FTDI USB Serial Device
[11812.264865] usbcore: deregistering interface driver ftdi_sio
[11812.282180] USB Serial deregistering driver pl2303
[11812.283141] pl2303 ttyUSB0: pl2303 converter now disconnected from ttyUSB0
[11812.283272] usbcore: deregistering interface driver pl2303
[11812.301056] USB Serial deregistering driver generic
[11812.301186] usbcore: deregistering interface driver usbserial_generic
[11812.301259] drivers/usb/serial/usb-serial.c: usb_serial_disconnect
[11812.301823] BUG: unable to handle kernel paging request at f8e7438c
[11812.301845] IP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.301871] *pde = 357ef067 *pte = 00000000
[11812.301957] Oops: 0000 [#1] PREEMPT SMP
[11812.301983] Modules linked in: usbserial(-) [last unloaded: pl2303]
[11812.302008]
[11812.302019] Pid: 1323, comm: modprobe Tainted: G        W    3.4.0-rc7+ #101 Dell Inc. Vostro 1520/0T816J
[11812.302115] EIP: 0060:[<f8e38445>] EFLAGS: 00010246 CPU: 1
[11812.302130] EIP is at usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.302141] EAX: f508a180 EBX: f508a180 ECX: 00000000 EDX: f8e74300
[11812.302151] ESI: f5050800 EDI: 00000001 EBP: f5141e78 ESP: f5141e58
[11812.302160]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[11812.302170] CR0: 8005003b CR2: f8e7438c CR3: 34848000 CR4: 000007d0
[11812.302180] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[11812.302189] DR6: ffff0ff0 DR7: 00000400
[11812.302199] Process modprobe (pid: 1323, ti=f5140000 task=f61e2bc0 task.ti=f5140000)
[11812.302209] Stack:
[11812.302216]  f8e3be0f f8e3b29c f8e3ae00 00000000 f513641c f5136400 f513641c f507a540
[11812.302325]  f5141e98 c133d2c1 00000000 00000000 f509c400 f513641c f507a590 f5136450
[11812.302372]  f5141ea8 c12f0344 f513641c f507a590 f5141ebc c12f0c67 00000000 f507a590
[11812.302419] Call Trace:
[11812.302439]  [<c133d2c1>] usb_unbind_interface+0x51/0x190
[11812.302456]  [<c12f0344>] __device_release_driver+0x64/0xb0
[11812.302469]  [<c12f0c67>] driver_detach+0x97/0xa0
[11812.302483]  [<c12f001c>] bus_remove_driver+0x6c/0xe0
[11812.302500]  [<c145938d>] ? __mutex_unlock_slowpath+0xcd/0x140
[11812.302514]  [<c12f0ff9>] driver_unregister+0x49/0x80
[11812.302528]  [<c1457df6>] ? printk+0x1d/0x1f
[11812.302540]  [<c133c50d>] usb_deregister+0x5d/0xb0
[11812.302557]  [<f8e37c55>] ? usb_serial_deregister+0x45/0x50 [usbserial]
[11812.302575]  [<f8e37c8d>] usb_serial_deregister_drivers+0x2d/0x40 [usbserial]
[11812.302593]  [<f8e3a6e2>] usb_serial_generic_deregister+0x12/0x20 [usbserial]
[11812.302611]  [<f8e3acf0>] usb_serial_exit+0x8/0x32 [usbserial]
[11812.302716]  [<c1080b48>] sys_delete_module+0x158/0x260
[11812.302730]  [<c110594e>] ? mntput+0x1e/0x30
[11812.302746]  [<c145c3c3>] ? sysenter_exit+0xf/0x18
[11812.302746]  [<c107777c>] ? trace_hardirqs_on_caller+0xec/0x170
[11812.302746]  [<c145c390>] sysenter_do_call+0x12/0x36
[11812.302746] Code: 24 02 00 00 e8 dd f3 20 c8 f6 86 74 02 00 00 02 74 b4 8d 86 4c 02 00 00 47 e8 78 55 4b c8 0f b6 43 0e 39 f8 7f a9 8b 53 04 89 d8 <ff> 92 8c 00 00 00 89 d8 e8 0e ff ff ff 8b 45 f0 c7 44 24 04 2f
[11812.302746] EIP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial] SS:ESP 0068:f5141e58
[11812.302746] CR2: 00000000f8e7438c

Fix by only evaluating serial drivers pointing back to the
USB driver we are currently probing.  This still allows two
or more drivers to match the same device, running their
serial driver probes to sort out which one to use.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Tested-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
[ Upstream commit 6bc96d0 ]

Fixes:
[   15.470311] WARNING: at /local/scratch/ianc/devel/kernels/linux/fs/sysfs/file.c:498 sysfs_attr_ns+0x95/0xa0()
[   15.470326] sysfs: kobject eth0 without dirent
[   15.470333] Modules linked in:
[   15.470342] Pid: 12, comm: xenwatch Not tainted 3.4.0-x86_32p-xenU #93
and
[    9.150554] BUG: unable to handle kernel paging request at 2b359000
[    9.150577] IP: [<c1279561>] linkwatch_do_dev+0x81/0xc0
[    9.150592] *pdpt = 000000002c3c9027 *pde = 0000000000000000
[    9.150604] Oops: 0002 [#1] SMP
[    9.150613] Modules linked in:

This is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=675190

Reported-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Cc: stable@kernel.org
Cc: 675190@bugs.debian.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 03e934f upstream.

Sasha Levin reported following panic :

[ 2136.383310] BUG: unable to handle kernel NULL pointer dereference at
00000000000003b0
[ 2136.384022] IP: [<ffffffff8114e400>] __lock_acquire+0xc0/0x4b0
[ 2136.384022] PGD 131c4067 PUD 11c0c067 PMD 0
[ 2136.388106] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2136.388106] CPU 1
[ 2136.388106] Pid: 24855, comm: trinity-child1 Tainted: G        W
3.5.0-rc2-sasha-00015-g7b268f7 #374
[ 2136.388106] RIP: 0010:[<ffffffff8114e400>]  [<ffffffff8114e400>]
__lock_acquire+0xc0/0x4b0
[ 2136.388106] RSP: 0018:ffff8800130b3ca8  EFLAGS: 00010046
[ 2136.388106] RAX: 0000000000000086 RBX: ffff88001186b000 RCX:
0000000000000000
[ 2136.388106] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000000000000
[ 2136.388106] RBP: ffff8800130b3d08 R08: 0000000000000001 R09:
0000000000000000
[ 2136.388106] R10: 0000000000000000 R11: 0000000000000001 R12:
0000000000000002
[ 2136.388106] R13: 00000000000003b0 R14: 0000000000000000 R15:
0000000000000000
[ 2136.388106] FS:  00007fa5b1bd4700(0000) GS:ffff88001b800000(0000)
knlGS:0000000000000000
[ 2136.388106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2136.388106] CR2: 00000000000003b0 CR3: 0000000011d1f000 CR4:
00000000000406e0
[ 2136.388106] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 2136.388106] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 2136.388106] Process trinity-child1 (pid: 24855, threadinfo
ffff8800130b2000, task ffff88001186b000)
[ 2136.388106] Stack:
[ 2136.388106]  ffff8800130b3cd8 ffffffff81121785 ffffffff81236774
000080d000000001
[ 2136.388106]  ffff88001b9d6c00 00000000001d6c00 ffffffff130b3d08
ffff88001186b000
[ 2136.388106]  0000000000000000 0000000000000002 0000000000000000
0000000000000000
[ 2136.388106] Call Trace:
[ 2136.388106]  [<ffffffff81121785>] ? sched_clock_local+0x25/0x90
[ 2136.388106]  [<ffffffff81236774>] ? get_empty_filp+0x74/0x220
[ 2136.388106]  [<ffffffff8114e97a>] lock_acquire+0x18a/0x1e0
[ 2136.388106]  [<ffffffff836b37df>] ? rawsock_release+0x4f/0xa0
[ 2136.388106]  [<ffffffff837c0ef0>] _raw_write_lock_bh+0x40/0x80
[ 2136.388106]  [<ffffffff836b37df>] ? rawsock_release+0x4f/0xa0
[ 2136.388106]  [<ffffffff836b37df>] rawsock_release+0x4f/0xa0
[ 2136.388106]  [<ffffffff8321cfe8>] sock_release+0x18/0x70
[ 2136.388106]  [<ffffffff8321d069>] sock_close+0x29/0x30
[ 2136.388106]  [<ffffffff81236bca>] __fput+0x11a/0x2c0
[ 2136.388106]  [<ffffffff81236d85>] fput+0x15/0x20
[ 2136.388106]  [<ffffffff8321de34>] sys_accept4+0x1b4/0x200
[ 2136.388106]  [<ffffffff837c165c>] ? _raw_spin_unlock_irq+0x4c/0x80
[ 2136.388106]  [<ffffffff837c1669>] ? _raw_spin_unlock_irq+0x59/0x80
[ 2136.388106]  [<ffffffff837c2565>] ? sysret_check+0x22/0x5d
[ 2136.388106]  [<ffffffff8321de8b>] sys_accept+0xb/0x10
[ 2136.388106]  [<ffffffff837c2539>] system_call_fastpath+0x16/0x1b
[ 2136.388106] Code: ec 04 00 0f 85 ea 03 00 00 be d5 0b 00 00 48 c7 c7
8a c1 40 84 e8 b1 a5 f8 ff 31 c0 e9 d4 03 00 00 66 2e 0f 1f 84 00 00 00
00 00 <49> 81 7d 00 60 73 5e 85 b8 01 00 00 00 44 0f 44 e0 83 fe 01 77
[ 2136.388106] RIP  [<ffffffff8114e400>] __lock_acquire+0xc0/0x4b0
[ 2136.388106]  RSP <ffff8800130b3ca8>
[ 2136.388106] CR2: 00000000000003b0
[ 2136.388106] ---[ end trace 6d450e935ee18982 ]---
[ 2136.388106] Kernel panic - not syncing: Fatal exception in interrupt

rawsock_release() should test if sock->sk is NULL before calling
sock_orphan()/sock_put()

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit fe20b39 upstream.

reg_timeout_work() calls restore_regulatory_settings() which
takes cfg80211_mutex.

reg_set_request_processed() already holds cfg80211_mutex
before calling cancel_delayed_work_sync(reg_timeout),
so it might deadlock.

Call the async cancel_delayed_work instead, in order
to avoid the potential deadlock.

This is the relevant lockdep warning:

cfg80211: Calling CRDA for country: XX

======================================================
[ INFO: possible circular locking dependency detected ]
3.4.0-rc5-wl+ #26 Not tainted
-------------------------------------------------------
kworker/0:2/1391 is trying to acquire lock:
 (cfg80211_mutex){+.+.+.}, at: [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211]

but task is already holding lock:
 ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> mitwo-dev#2 ((reg_timeout).work){+.+...}:
       [<c008fd44>] validate_chain+0xb94/0x10f0
       [<c0090b68>] __lock_acquire+0x8c8/0x9b0
       [<c0090d40>] lock_acquire+0xf0/0x114
       [<c005b600>] wait_on_work+0x4c/0x154
       [<c005c000>] __cancel_work_timer+0xd4/0x11c
       [<c005c064>] cancel_delayed_work_sync+0x1c/0x20
       [<bf28b274>] reg_set_request_processed+0x50/0x78 [cfg80211]
       [<bf28bd84>] set_regdom+0x550/0x600 [cfg80211]
       [<bf294cd8>] nl80211_set_reg+0x218/0x258 [cfg80211]
       [<c03c7738>] genl_rcv_msg+0x1a8/0x1e8
       [<c03c6a00>] netlink_rcv_skb+0x5c/0xc0
       [<c03c7584>] genl_rcv+0x28/0x34
       [<c03c6720>] netlink_unicast+0x15c/0x228
       [<c03c6c7c>] netlink_sendmsg+0x218/0x298
       [<c03933c8>] sock_sendmsg+0xa4/0xc0
       [<c039406c>] __sys_sendmsg+0x1e4/0x268
       [<c0394228>] sys_sendmsg+0x4c/0x70
       [<c0013840>] ret_fast_syscall+0x0/0x3c

-> #1 (reg_mutex){+.+.+.}:
       [<c008fd44>] validate_chain+0xb94/0x10f0
       [<c0090b68>] __lock_acquire+0x8c8/0x9b0
       [<c0090d40>] lock_acquire+0xf0/0x114
       [<c04734dc>] mutex_lock_nested+0x48/0x320
       [<bf28b2cc>] reg_todo+0x30/0x538 [cfg80211]
       [<c0059f44>] process_one_work+0x2a0/0x480
       [<c005a4b4>] worker_thread+0x1bc/0x2bc
       [<c0061148>] kthread+0x98/0xa4
       [<c0014af4>] kernel_thread_exit+0x0/0x8

-> #0 (cfg80211_mutex){+.+.+.}:
       [<c008ed58>] print_circular_bug+0x68/0x2cc
       [<c008fb28>] validate_chain+0x978/0x10f0
       [<c0090b68>] __lock_acquire+0x8c8/0x9b0
       [<c0090d40>] lock_acquire+0xf0/0x114
       [<c04734dc>] mutex_lock_nested+0x48/0x320
       [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211]
       [<bf28b200>] reg_timeout_work+0x1c/0x20 [cfg80211]
       [<c0059f44>] process_one_work+0x2a0/0x480
       [<c005a4b4>] worker_thread+0x1bc/0x2bc
       [<c0061148>] kthread+0x98/0xa4
       [<c0014af4>] kernel_thread_exit+0x0/0x8

other info that might help us debug this:

Chain exists of:
  cfg80211_mutex --> reg_mutex --> (reg_timeout).work

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((reg_timeout).work);
                               lock(reg_mutex);
                               lock((reg_timeout).work);
  lock(cfg80211_mutex);

 *** DEADLOCK ***

2 locks held by kworker/0:2/1391:
 #0:  (events){.+.+.+}, at: [<c0059e94>] process_one_work+0x1f0/0x480
 #1:  ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480

stack backtrace:
[<c001b928>] (unwind_backtrace+0x0/0x12c) from [<c0471d3c>] (dump_stack+0x20/0x24)
[<c0471d3c>] (dump_stack+0x20/0x24) from [<c008ef70>] (print_circular_bug+0x280/0x2cc)
[<c008ef70>] (print_circular_bug+0x280/0x2cc) from [<c008fb28>] (validate_chain+0x978/0x10f0)
[<c008fb28>] (validate_chain+0x978/0x10f0) from [<c0090b68>] (__lock_acquire+0x8c8/0x9b0)
[<c0090b68>] (__lock_acquire+0x8c8/0x9b0) from [<c0090d40>] (lock_acquire+0xf0/0x114)
[<c0090d40>] (lock_acquire+0xf0/0x114) from [<c04734dc>] (mutex_lock_nested+0x48/0x320)
[<c04734dc>] (mutex_lock_nested+0x48/0x320) from [<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211])
[<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211]) from [<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211])
[<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211]) from [<c0059f44>] (process_one_work+0x2a0/0x480)
[<c0059f44>] (process_one_work+0x2a0/0x480) from [<c005a4b4>] (worker_thread+0x1bc/0x2bc)
[<c005a4b4>] (worker_thread+0x1bc/0x2bc) from [<c0061148>] (kthread+0x98/0xa4)
[<c0061148>] (kthread+0x98/0xa4) from [<c0014af4>] (kernel_thread_exit+0x0/0x8)
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit d9b8706 upstream.

usbnet_disconnect() will set intfdata to NULL before calling
the minidriver unbind function.  The cdc_wdm subdriver cannot
know that it is disconnecting until the qmi_wwan unbind
function has called its disconnect function.  This means that
we must be able to support the cdc_wdm subdriver operating
normally while usbnet_disconnect() is running, and in
particular that intfdata may be NULL.

The only place this matters is in qmi_wwan_cdc_wdm_manage_power
which is called from cdc_wdm.  Simply testing for NULL
intfdata there is sufficient to allow it to continue working
at all times.

Fixes this Oops where a cdc-wdm device was closed while the
USB device was disconnecting, causing wdm_release to call
qmi_wwan_cdc_wdm_manage_power after intfdata was set to
NULL by usbnet_disconnect:

[41819.087460] BUG: unable to handle kernel NULL pointer dereference at 00000080
[41819.087815] IP: [<f8640458>] qmi_wwan_manage_power+0x68/0x90 [qmi_wwan]
[41819.088028] *pdpt = 000000000314f001 *pde = 0000000000000000
[41819.088028] Oops: 0002 [#1] SMP
[41819.088028] Modules linked in: qmi_wwan option usb_wwan usbserial usbnet
cdc_wdm nls_iso8859_1 nls_cp437 vfat fat usb_storage bnep rfcomm bluetooth
parport_pc ppdev binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables
x_tables dm_crypt uvcvideo snd_hda_codec_realtek snd_hda_intel
videobuf2_core snd_hda_codec joydev videodev videobuf2_vmalloc
hid_multitouch snd_hwdep arc4 videobuf2_memops snd_pcm snd_seq_midi
snd_rawmidi snd_seq_midi_event ath9k mac80211 snd_seq ath9k_common ath9k_hw
ath snd_timer snd_seq_device sparse_keymap dm_multipath scsi_dh coretemp
mac_hid snd soundcore cfg80211 snd_page_alloc psmouse serio_raw microcode
lp parport dm_mirror dm_region_hash dm_log usbhid hid i915 drm_kms_helper
drm r8169 i2c_algo_bit wmi video [last unloaded: qmi_wwan]
[41819.088028]
[41819.088028] Pid: 23292, comm: qmicli Not tainted 3.4.0-5-generic MiCode#11-Ubuntu GIGABYTE T1005/T1005
[41819.088028] EIP: 0060:[<f8640458>] EFLAGS: 00010246 CPU: 1
[41819.088028] EIP is at qmi_wwan_manage_power+0x68/0x90 [qmi_wwan]
[41819.088028] EAX: 00000000 EBX: 00000000 ECX: 000000c3 EDX: 00000000
[41819.088028] ESI: c3b27658 EDI: 00000000 EBP: c298bea4 ESP: c298be98
[41819.088028]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[41819.088028] CR0: 8005003b CR2: 00000080 CR3: 3605e000 CR4: 000007f0
[41819.088028] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[41819.088028] DR6: ffff0ff0 DR7: 00000400
[41819.088028] Process qmicli (pid: 23292, ti=c298a000 task=f343b280 task.ti=c298a000)
[41819.088028] Stack:
[41819.088028]  00000000 c3b27658 e2a80d00 c298beb0 f864051a c3b27600 c298bec0 f9027099
[41819.088028]  c2fd6000 00000008 c298bef0 c1147f96 00000001 00000000 00000000 f4e54790
[41819.088028]  ecf43a00 ecf43a00 c2fd6008 c2fd6000 ebbd7600 ffffffb9 c298bf08 c1144474
[41819.088028] Call Trace:
[41819.088028]  [<f864051a>] qmi_wwan_cdc_wdm_manage_power+0x1a/0x20 [qmi_wwan]
[41819.088028]  [<f9027099>] wdm_release+0x69/0x70 [cdc_wdm]
[41819.088028]  [<c1147f96>] fput+0xe6/0x210
[41819.088028]  [<c1144474>] filp_close+0x54/0x80
[41819.088028]  [<c1046a65>] put_files_struct+0x75/0xc0
[41819.088028]  [<c1046b56>] exit_files+0x46/0x60
[41819.088028]  [<c1046f81>] do_exit+0x141/0x780
[41819.088028]  [<c107248f>] ? wake_up_state+0xf/0x20
[41819.088028]  [<c1053f48>] ? signal_wake_up+0x28/0x40
[41819.088028]  [<c1054f3b>] ? zap_other_threads+0x6b/0x80
[41819.088028]  [<c1047864>] do_group_exit+0x34/0xa0
[41819.088028]  [<c10478e8>] sys_exit_group+0x18/0x20
[41819.088028]  [<c15bb7df>] sysenter_do_call+0x12/0x28
[41819.088028] Code: 04 83 e7 01 c1 e7 03 0f b6 42 18 83 e0 f7 09 f8 88 42
18 8b 43 04 e8 48 9a dd c8 89 f0 8b 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 90
<f0> ff 88 80 00 00 00 0f 94 c0 84 c0 75 b7 31 f6 8b 5d f4 89 f0
[41819.088028] EIP: [<f8640458>] qmi_wwan_manage_power+0x68/0x90 [qmi_wwan] SS:ESP 0068:c298be98
[41819.088028] CR2: 0000000000000080
[41819.149492] ---[ end trace 0944479ff8257f55 ]---

Reported-by: Marius Bjørnstad Kotsbak <marius.kotsbak@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 6266230 upstream.

If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and dm_sm_checker_create()
fails, dm_tm_create_internal() would still return success even though it
cleaned up all resources it was supposed to have created.  This will
lead to a kernel crash:

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
...
RIP: 0010:[<ffffffff81593659>]  [<ffffffff81593659>] dm_bufio_get_block_size+0x9/0x20
Call Trace:
  [<ffffffff81599bae>] dm_bm_block_size+0xe/0x10
  [<ffffffff8159b8b8>] sm_ll_init+0x78/0xd0
  [<ffffffff8159c1a6>] sm_ll_new_disk+0x16/0xa0
  [<ffffffff8159c98e>] dm_sm_disk_create+0xfe/0x160
  [<ffffffff815abf6e>] dm_pool_metadata_open+0x16e/0x6a0
  [<ffffffff815aa010>] pool_ctr+0x3f0/0x900
  [<ffffffff8158d565>] dm_table_add_target+0x195/0x450
  [<ffffffff815904c4>] table_load+0xe4/0x330
  [<ffffffff815917ea>] ctl_ioctl+0x15a/0x2c0
  [<ffffffff81591963>] dm_ctl_ioctl+0x13/0x20
  [<ffffffff8116a4f8>] do_vfs_ioctl+0x98/0x560
  [<ffffffff8116aa51>] sys_ioctl+0x91/0xa0
  [<ffffffff81869f52>] system_call_fastpath+0x16/0x1b

Fix the space map checker code to return an appropriate ERR_PTR and have
dm_sm_disk_create() and dm_tm_create_internal() check for it with
IS_ERR.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 60d65f1 upstream.

Don't grab the daemon mutex while holding the message context mutex.
Addresses this lockdep warning:

 ecryptfsd/2141 is trying to acquire lock:
  (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]

 but task is already holding lock:
  (&(*daemon)->mux){+.+...}, at: [<ffffffffa029c2ec>] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(*daemon)->mux){+.+...}:
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c5d7>] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
        [<ffffffffa029b744>] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
        [<ffffffffa029a24e>] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
        [<ffffffffa02960f8>] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
        [<ffffffffa0290f80>] ecryptfs_create+0x130/0x250 [ecryptfs]
        [<ffffffff811963a4>] vfs_create+0xb4/0x120
        [<ffffffff81197865>] do_last+0x8c5/0xa10
        [<ffffffff811998f9>] path_openat+0xd9/0x460
        [<ffffffff81199da2>] do_filp_open+0x42/0xa0
        [<ffffffff81187998>] do_sys_open+0xf8/0x1d0
        [<ffffffff81187a91>] sys_open+0x21/0x30
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

 -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
        [<ffffffff810a3418>] __lock_acquire+0x1bf8/0x1c50
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
        [<ffffffff811887d3>] vfs_read+0xb3/0x180
        [<ffffffff811888ed>] sys_read+0x4d/0x90
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
…condition

commit 26c1917 upstream.

When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
 #0 [f06a9dd8] crash_kexec at c049b5ec
 #1 [f06a9e2c] oops_end at c083d1c2
 mitwo-dev#2 [f06a9e40] no_context at c0433ded
 MiCode#3 [f06a9e64] bad_area_nosemaphore at c043401a
 MiCode#4 [f06a9e6c] __do_page_fault at c0434493
 MiCode#5 [f06a9eec] do_page_fault at c083eb45
 MiCode#6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
    00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
 MiCode#7 [f06a9f38] _spin_lock at c083bc14
 MiCode#8 [f06a9f44] sys_mincore at c0507b7d
 MiCode#9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
    *pmd)
    497 {
    498         /* depend on compiler for an atomic pmd read */
    499         pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----

Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 0fde0a8 upstream.

Fix:

BUG: sleeping function called from invalid context at kernel/workqueue.c:2547
in_atomic(): 1, irqs_disabled(): 0, pid: 629, name: wpa_supplicant
2 locks held by wpa_supplicant/629:
 #0:  (rtnl_mutex){+.+.+.}, at: [<c08b2b84>] rtnl_lock+0x14/0x20
 #1:  (&trigger->leddev_list_lock){.+.?..}, at: [<c0867f41>] led_trigger_event+0x21/0x80
Pid: 629, comm: wpa_supplicant Not tainted 3.3.0-0.rc3.git5.1.fc17.i686
Call Trace:
 [<c046a9f6>] __might_sleep+0x126/0x1d0
 [<c0457d6c>] wait_on_work+0x2c/0x1d0
 [<c045a09a>] __cancel_work_timer+0x6a/0x120
 [<c045a160>] cancel_delayed_work_sync+0x10/0x20
 [<f7dd3c22>] rtl8187_led_brightness_set+0x82/0xf0 [rtl8187]
 [<c0867f7c>] led_trigger_event+0x5c/0x80
 [<f7ff5e6d>] ieee80211_led_radio+0x1d/0x40 [mac80211]
 [<f7ff3583>] ieee80211_stop_device+0x13/0x230 [mac80211]

Removing _sync is ok, because if led_on work is currently running
it will be finished before led_off work start to perform, since
they are always queued on the same mac80211 local->workqueue.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=795176

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit d8adde1 upstream.

kswapd_stop() is called to destroy the kswapd work thread when all memory
of a NUMA node has been offlined.  But kswapd_stop() only terminates the
work thread without resetting NODE_DATA(nid)->kswapd to NULL.  The stale
pointer will prevent kswapd_run() from creating a new work thread when
adding memory to the memory-less NUMA node again.  Eventually the stale
pointer may cause invalid memory access.

An example stack dump as below. It's reproduced with 2.6.32, but latest
kernel has the same issue.

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<ffffffff81051a94>] exit_creds+0x12/0x78
  PGD 0
  Oops: 0000 [#1] SMP
  last sysfs file: /sys/devices/system/memory/memory391/state
  CPU 11
  Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod
  Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285
  RIP: 0010:exit_creds+0x12/0x78
  RSP: 0018:ffff8806044f1d78  EFLAGS: 00010202
  RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502
  RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
  RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10
  R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000
  R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001
  FS:  00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600)
  Stack:
   ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e
   ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1
   0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003
  Call Trace:
    __put_task_struct+0x5d/0x97
    kthread_stop+0x50/0x58
    offline_pages+0x324/0x3da
    memory_block_change_state+0x179/0x1db
    store_mem_state+0x9e/0xbb
    sysfs_write_file+0xd0/0x107
    vfs_write+0xad/0x169
    sys_write+0x45/0x6e
    system_call_fastpath+0x16/0x1b
  Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0
  RIP  exit_creds+0x12/0x78
   RSP <ffff8806044f1d78>
  CR2: 0000000000000000

[akpm@linux-foundation.org: add pglist_data.kswapd locking comments]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 29f6738 upstream.

memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() would double reserved.regions too, so we could free the
old range for reserved.regions.

Also tj said there is another bug which could be related to this.

| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing.  We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.

in that case, when DEBUG_PAGEALLOC, will get panic:

     memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
  BUG: unable to handle kernel paging request at ffff88102febd948
  IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
  PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  CPU 0
  Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
  RIP: 0010:[<ffffffff836a5774>]  [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155

See the discussion at https://lkml.org/lkml/2012/6/13/469

So try to allocate with PAGE_SIZE alignment and free it later.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 8d657eb upstream.

This can be trivially triggered from userspace by passing in something unexpected.

    kernel BUG at fs/locks.c:1468!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:generic_setlease+0xc2/0x100
    Call Trace:
      __vfs_setlease+0x35/0x40
      fcntl_setlease+0x76/0x150
      sys_fcntl+0x1c6/0x810
      system_call_fastpath+0x1a/0x1f

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit 3cf003c upstream.

[The async read code was broadened to include uncached reads in 3.5, so
the mainline patch did not apply directly. This patch is just a backport
to account for that change.]

Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 mitwo-dev#2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 MiCode#3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 MiCode#4 [eed63e50] do_writepages at c04f3e32
 MiCode#5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 MiCode#6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 MiCode#7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 MiCode#8 [eed63ecc] do_sync_write at c052d202
 MiCode#9 [eed63f74] vfs_write at c052d4ee
MiCode#10 [eed63f94] sys_write at c052df4c
MiCode#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
…d reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     mitwo-dev#2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     MiCode#3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     MiCode#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     MiCode#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     MiCode#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     MiCode#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     MiCode#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     MiCode#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    MiCode#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    MiCode#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    MiCode#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    MiCode#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit a427b9e upstream.

Fix a number of bugs in the NFS idmapper code:

 (1) Only registered key types can be passed to the core keys code, so
     register the legacy idmapper key type.

     This is a requirement because the unregister function cleans up keys
     belonging to that key type so that there aren't dangling pointers to the
     module left behind - including the key->type pointer.

 (2) Rename the legacy key type.  You can't have two key types with the same
     name, and (1) would otherwise require that.

 (3) complete_request_key() must be called in the error path of
     nfs_idmap_legacy_upcall().

 (4) There is one idmap struct for each nfs_client struct.  This means that
     idmap->idmap_key_cons is shared without the use of a lock.  This is a
     problem because key_instantiate_and_link() - as called indirectly by
     idmap_pipe_downcall() - releases anyone waiting for the key to be
     instantiated.

     What happens is that idmap_pipe_downcall() running in the rpc.idmapd
     thread, releases the NFS filesystem in whatever thread that is running in
     to continue.  This may then make another idmapper call, overwriting
     idmap_key_cons before idmap_pipe_downcall() gets the chance to call
     complete_request_key().

     I *think* that reading idmap_key_cons only once, before
     key_instantiate_and_link() is called, and then caching the result in a
     variable is sufficient.

Bug (4) is the cause of:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] SMP
CPU 1
Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff8801a3645d40  EFLAGS: 00010246
RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80
RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50
RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0
R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80
R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006
FS:  00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710)
Stack:
 ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90
 ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75
 ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8
Call Trace:
 [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100
 [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0
 [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs]
 [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc]
 [<ffffffff8117f833>] vfs_write+0xb3/0x180
 [<ffffffff8117fb5a>] sys_write+0x4a/0x90
 [<ffffffff81600329>] system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
 RSP <ffff8801a3645d40>
CR2: 0000000000000000

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
…list

[ Upstream commit 2eebc1e ]

A few days ago Dave Jones reported this oops:

[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140]  ffff880147c03a10
[22766.387140]  ffffffffa169f2b6
[22766.387140]  ffff88013ed95728
[22766.387143]  0000000000000002
[22766.387143]  0000000000000000
[22766.387143]  ffff880003fad062
[22766.387144]  ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145]  <IRQ>
[22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283]  <EOI>
[22766.387284]
[22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311]  RSP <ffff880147c039b0>
[22766.387142]  ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt

It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable.  As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.

Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance.  What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table.  the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.

I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete.  That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.

I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.

I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop.  I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising).  Given the trace above
however, I think its likely that this is what we hit.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
…bles

commit d833352 upstream.

If a process creates a large hugetlbfs mapping that is eligible for page
table sharing and forks heavily with children some of whom fault and
others which destroy the mapping then it is possible for page tables to
get corrupted.  Some teardowns of the mapping encounter a "bad pmd" and
output a message to the kernel log.  The final teardown will trigger a
BUG_ON in mm/filemap.c.

This was reproduced in 3.4 but is known to have existed for a long time
and goes back at least as far as 2.6.37.  It was probably was introduced
in 2.6.20 by [39dde65: shared page table for hugetlb page].  The messages
look like this;

[  ..........] Lots of bad pmd messages followed by this
[  127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
[  127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
[  127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
[  127.186778] ------------[ cut here ]------------
[  127.186781] kernel BUG at mm/filemap.c:134!
[  127.186782] invalid opcode: 0000 [#1] SMP
[  127.186783] CPU 7
[  127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
[  127.186801]
[  127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
[  127.186804] RIP: 0010:[<ffffffff810ed6ce>]  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[  127.186809] RSP: 0000:ffff8804144b5c08  EFLAGS: 00010002
[  127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
[  127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
[  127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
[  127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
[  127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
[  127.186815] FS:  00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
[  127.186816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
[  127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
[  127.186821] Stack:
[  127.186822]  ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
[  127.186824]  ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
[  127.186825]  ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
[  127.186827] Call Trace:
[  127.186829]  [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
[  127.186832]  [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
[  127.186834]  [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
[  127.186837]  [<ffffffff811655c7>] evict+0xa7/0x1b0
[  127.186839]  [<ffffffff811657a3>] iput_final+0xd3/0x1f0
[  127.186840]  [<ffffffff811658f9>] iput+0x39/0x50
[  127.186842]  [<ffffffff81162708>] d_kill+0xf8/0x130
[  127.186843]  [<ffffffff81162812>] dput+0xd2/0x1a0
[  127.186845]  [<ffffffff8114e2d0>] __fput+0x170/0x230
[  127.186848]  [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
[  127.186849]  [<ffffffff8114e3ad>] fput+0x1d/0x30
[  127.186851]  [<ffffffff81117db7>] remove_vma+0x37/0x80
[  127.186853]  [<ffffffff81119182>] do_munmap+0x2d2/0x360
[  127.186855]  [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
[  127.186857]  [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
[  127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
[  127.186868] RIP  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[  127.186870]  RSP <ffff8804144b5c08>
[  127.186871] ---[ end trace 7cbac5d1db69f426 ]---

The bug is a race and not always easy to reproduce.  To reproduce it I was
doing the following on a single socket I7-based machine with 16G of RAM.

$ hugeadm --pool-pages-max DEFAULT:13G
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
$ for i in `seq 1 9000`; do ./hugetlbfs-test; done

On my particular machine, it usually triggers within 10 minutes but
enabling debug options can change the timing such that it never hits.
Once the bug is triggered, the machine is in trouble and needs to be
rebooted.  The machine will respond but processes accessing proc like "ps
aux" will hang due to the BUG_ON.  shutdown will also hang and needs a
hard reset or a sysrq-b.

The basic problem is a race between page table sharing and teardown.  For
the most part page table sharing depends on i_mmap_mutex.  In some cases,
it is also taking the mm->page_table_lock for the PTE updates but with
shared page tables, it is the i_mmap_mutex that is more important.

Unfortunately it appears to be also insufficient. Consider the following
situation

Process A					Process B
---------					---------
hugetlb_fault					shmdt
  						LockWrite(mmap_sem)
    						  do_munmap
						    unmap_region
						      unmap_vmas
						        unmap_single_vma
						          unmap_hugepage_range
      						            Lock(i_mmap_mutex)
							    Lock(mm->page_table_lock)
							    huge_pmd_unshare/unmap tables <--- (1)
							    Unlock(mm->page_table_lock)
      						            Unlock(i_mmap_mutex)
  huge_pte_alloc				      ...
    Lock(i_mmap_mutex)				      ...
    vma_prio_walk, find svma, spte		      ...
    Lock(mm->page_table_lock)			      ...
    share spte					      ...
    Unlock(mm->page_table_lock)			      ...
    Unlock(i_mmap_mutex)			      ...
  hugetlb_no_page									  <--- (2)
						      free_pgtables
						        unlink_file_vma
							hugetlb_free_pgd_range
						    remove_vma_list

In this scenario, it is possible for Process A to share page tables with
Process B that is trying to tear them down.  The i_mmap_mutex on its own
does not prevent Process A walking Process B's page tables.  At (1) above,
the page tables are not shared yet so it unmaps the PMDs.  Process A sets
up page table sharing and at (2) faults a new entry.  Process B then trips
up on it in free_pgtables.

This patch fixes the problem by adding a new function
__unmap_hugepage_range_final that is only called when the VMA is about to
be destroyed.  This function clears VM_MAYSHARE during
unmap_hugepage_range() under the i_mmap_mutex.  This makes the VMA
ineligible for sharing and avoids the race.  Superficially this looks like
it would then be vunerable to truncate and madvise issues but hugetlbfs
has its own truncate handlers so does not use unmap_mapping_range() and
does not support madvise(DONTNEED).

This should be treated as a -stable candidate if it is merged.

Test program is as follows. The test case was mostly written by Michal
Hocko with a few minor changes to reproduce this bug.

==== CUT HERE ====

static size_t huge_page_size = (2UL << 20);
static size_t nr_huge_page_A = 512;
static size_t nr_huge_page_B = 5632;

unsigned int get_random(unsigned int max)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	srandom(tv.tv_usec);
	return random() % max;
}

static void play(void *addr, size_t size)
{
	unsigned char *start = addr,
		      *end = start + size,
		      *a;
	start += get_random(size/2);

	/* we could itterate on huge pages but let's give it more time. */
	for (a = start; a < end; a += 4096)
		*a = 0;
}

int main(int argc, char **argv)
{
	key_t key = IPC_PRIVATE;
	size_t sizeA = nr_huge_page_A * huge_page_size;
	size_t sizeB = nr_huge_page_B * huge_page_size;
	int shmidA, shmidB;
	void *addrA = NULL, *addrB = NULL;
	int nr_children = 300, n = 0;

	if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
		perror("shmget:");
		return 1;
	}

	if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
		perror("shmat");
		return 1;
	}
	if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
		perror("shmget:");
		return 1;
	}

	if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
		perror("shmat");
		return 1;
	}

fork_child:
	switch(fork()) {
		case 0:
			switch (n%3) {
			case 0:
				play(addrA, sizeA);
				break;
			case 1:
				play(addrB, sizeB);
				break;
			case 2:
				break;
			}
			break;
		case -1:
			perror("fork:");
			break;
		default:
			if (++n < nr_children)
				goto fork_child;
			play(addrA, sizeA);
			break;
	}
	shmdt(addrA);
	shmdt(addrB);
	do {
		wait(NULL);
	} while (--n > 0);
	shmctl(shmidA, IPC_RMID, NULL);
	shmctl(shmidB, IPC_RMID, NULL);
	return 0;
}

[akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxime-poulain pushed a commit that referenced this pull request May 25, 2015
commit a2367db upstream.

With the new i.MX clock infrastructure we need to request the dma clocks
seperately: ahb and ipg clocks.

This fixes the following kernel crash and make audio to be functional again:

root@freescale /home$ aplay audio48k16S.wav
Playing WAVE 'audio48k16S.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c7b74000
[00000000] *pgd=a7bb5831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in:
CPU: 0    Not tainted  (3.5.0-rc5-next-20120702-00007-g3028b64 #1128)
PC is at snd_dmaengine_pcm_get_chan+0x8/0x10
LR is at snd_imx_pcm_hw_params+0x18/0xdc
pc : [<c02d3cf8>]    lr : [<c02e95ec>]    psr: a0000013
sp : c7b45e30  ip : ffffffff  fp : c7ae58e0
r10: 00000000  r9 : c7ae981c  r8 : c7b88800
r7 : c7ae5a60  r6 : c7ae5b20  r5 : c7ae9810  r4 : c7afa060
r3 : 00000000  r2 : 00000001  r1 : c7b88800  r0 : c7afa060
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: a7b74000  DAC: 00000015
Process aplay (pid: 701, stack limit = 0xc7b44270)
Stack: (0xc7b45e30 to 0xc7b46000)
5e20:                                     00100000 00000029 c7b88800 c02db870
5e40: c7ae5a60 c02d4594 00000010 01ae5a60 c7ae5a60 c7ae9810 c7ae9810 c7afa060
5e60: c7ae5b20 c7ae5a60 c7b88800 c02e3ef0 c02e3e08 c7b1e400 c7afa060 c7b88800
5e80: 00000000 c0014da8 c7b44000 00000000 bec566ac c02cd400 c7afa060 c7afa060
5ea0: bec56800 c7b88800 c0014da8 c02cdd7c c04ee710 c04ee7b8 00000003 c005fc74
5ec0: 00000000 7fffffff c7b45f00 c7afa060 c7b67420 c7ba3070 00000004 c0014da8
5ee0: c7b44000 00000000 bec566ac c02ced88 c04e95f8 b6f5ab04 c7b45fb0 0145a468
5f00: 0145a600 bec566bc bec56800 c7b67420 c7ba3070 c00d499c c7b45f18 c7b45f18
5f20: 0000001a 00000004 00000001 c7b44000 c0527f40 00000009 00000008 00000000
5f40: c7b44000 c002c9ec 00000001 c04f0ab0 c04ebec0 00000101 00000000 0000000a
5f60: 60000093 c7b67420 bec56800 c25c4111 00000004 c0014da8 c7b44000 00000000
5f80: bec566ac c00d4f38 b6ffb658 00000000 c0522d80 0145a468 b6fd5000 0145a418
5fa0: 00000036 c0014c00 0145a468 b6fd5000 00000004 c25c4111 bec56800 00020001
5fc0: 0145a468 b6fd5000 0145a418 00000036 0145a468 0145a600 bec566bc bec566ac
5fe0: 0145a468 bec56388 b6f65ce4 b6dcebec 20000010 00000004 00000000 00000000
[<c02d3cf8>] (snd_dmaengine_pcm_get_chan+0x8/0x10) from [<c02e95ec>] (snd_imx_pcm_hw_params+0x18/0xdc)
[<c02e95ec>] (snd_imx_pcm_hw_params+0x18/0xdc) from [<c02e3ef0>] (soc_pcm_hw_params+0xe8/0x1f0)
[<c02e3ef0>] (soc_pcm_hw_params+0xe8/0x1f0) from [<c02cd400>] (snd_pcm_hw_params+0x124/0x474)
[<c02cd400>] (snd_pcm_hw_params+0x124/0x474) from [<c02cdd7c>] (snd_pcm_common_ioctl1+0x4b4/0xf74)
[<c02cdd7c>] (snd_pcm_common_ioctl1+0x4b4/0xf74) from [<c02ced88>] (snd_pcm_playback_ioctl1+0x30/0x510)
[<c02ced88>] (snd_pcm_playback_ioctl1+0x30/0x510) from [<c00d499c>] (do_vfs_ioctl+0x80/0x5e4)
[<c00d499c>] (do_vfs_ioctl+0x80/0x5e4) from [<c00d4f38>] (sys_ioctl+0x38/0x60)
[<c00d4f38>] (sys_ioctl+0x38/0x60) from [<c0014c00>] (ret_fast_syscall+0x0/0x2c)
Code: e593000c e12fff1e e59030a0 e59330bc (e5930000)
---[ end trace fa518c8ba3a74e97 ]--

Reported-by: Javier Martin <javier.martin@vista-silicon.com>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants