Gavin-Shan/KVM…

Commits on Sep 16, 2022

  1. KVM: selftests: Automate choosing dirty ring size in dirty_log_test

    In the dirty ring case, we rely on the vcpu exiting due to the full
    dirty ring state. On an ARM64 system, there are 4096 host pages when
    the host page size is 64KB. In this case, the vcpu never exits due to
    the full dirty ring state. A similar case is a 4KB page size on the
    host and a 64KB page size on the guest. The vcpu corrupts the same set
    of host pages, but the dirty page information isn't collected in the
    main thread. This leads to an infinite loop, as the following log shows.
    
      # ./dirty_log_test -M dirty-ring -c 65536 -m 5
      Setting log mode to: 'dirty-ring'
      Test iterations: 32, interval: 10 (ms)
      Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
      guest physical test memory offset: 0xffbffe0000
      vcpu stops because vcpu is kicked out...
      Notifying vcpu to continue
      vcpu continues now.
      Iteration 1 collected 576 pages
      <No more output afterwards>
    
    Fix the issue by automatically choosing the best dirty ring size,
    to ensure that the vcpu exits due to the full dirty ring state. The
    option '-c' becomes a hint for the dirty ring count instead of its
    exact value.
    
    Signed-off-by: Gavin Shan <gshan@redhat.com>
    Gavin Shan authored and intel-lab-lkp committed Sep 16, 2022
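    The idea of treating '-c' as a hint can be sketched as below. The
    helper name pick_dirty_ring_count() and the clamping rule are
    assumptions for illustration, not the logic of the patch: the sketch
    assumes the ring count should be a power of two and should not exceed
    the number of host pages backing the test memory.

```c
#include <stdint.h>

/* Hypothetical helper: largest power of two <= min(hint, host_pages).
 * Returns 0 when either input is 0. */
static uint32_t pick_dirty_ring_count(uint32_t hint, uint64_t host_pages)
{
	uint64_t limit = hint < host_pages ? hint : host_pages;
	uint32_t count;

	if (limit == 0)
		return 0;

	/* Keep doubling while the next power of two still fits. */
	count = 1;
	while ((uint64_t)count * 2 <= limit)
		count *= 2;

	return count;
}
```

    With a hint of 65536 and 4096 host pages, this sketch settles on 4096
    entries, so the ring is never larger than the set of pages the vcpu
    can dirty.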
  2. KVM: selftests: Clear dirty ring states between two modes in dirty_lo…

    …g_test
    
    There are two states that need to be cleared before the next mode
    is executed. Otherwise, we hit the failure shown in the messages
    below.
    
    - The variable 'dirty_ring_vcpu_ring_full' is shared by the main and
      vcpu threads. It indicates whether the vcpu exited due to a full
      ring buffer. The value can be carried over from the previous mode
      (VM_MODE_P40V48_4K) to the current one (VM_MODE_P40V48_64K) when
      VM_MODE_P40V48_16K isn't supported.
    
    - The current ring buffer index needs to be reset before the next mode
      (VM_MODE_P40V48_64K) is executed. Otherwise, the stale value is
      carried over from the previous mode (VM_MODE_P40V48_4K).
    
      # ./dirty_log_test -M dirty-ring
      Setting log mode to: 'dirty-ring'
      Test iterations: 32, interval: 10 (ms)
      Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
      guest physical test memory offset: 0xffbfffc000
        :
      Dirtied 995328 pages
      Total bits checked: dirty (1012434), clear (7114123), track_next (966700)
      Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
      guest physical test memory offset: 0xffbffc0000
      vcpu stops because vcpu is kicked out...
      vcpu continues now.
      Notifying vcpu to continue
      Iteration 1 collected 0 pages
      vcpu stops because dirty ring is full...
      vcpu continues now.
      vcpu stops because dirty ring is full...
      vcpu continues now.
      vcpu stops because dirty ring is full...
      ==== Test Assertion Failure ====
      dirty_log_test.c:369: cleared == count
      pid=10541 tid=10541 errno=22 - Invalid argument
         1	0x0000000000403087: dirty_ring_collect_dirty_pages at dirty_log_test.c:369
         2	0x0000000000402a0b: log_mode_collect_dirty_pages at dirty_log_test.c:492
         3	 (inlined by) run_test at dirty_log_test.c:795
         4	 (inlined by) run_test at dirty_log_test.c:705
         5	0x0000000000403a37: for_each_guest_mode at guest_modes.c:100
         6	0x0000000000401ccf: main at dirty_log_test.c:938
         7	0x0000ffff9ecd279b: ?? ??:0
         8	0x0000ffff9ecd286b: ?? ??:0
         9	0x0000000000401def: _start at ??:?
      Reset dirty pages (0) mismatch with collected (35566)
    
    Fix the issues by clearing 'dirty_ring_vcpu_ring_full' and the ring
    buffer index before the next mode is executed.
    
    Signed-off-by: Gavin Shan <gshan@redhat.com>
    Gavin Shan authored and intel-lab-lkp committed Sep 16, 2022
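    A minimal sketch of the per-mode reset described in this commit. The
    struct and function names are hypothetical; only the two pieces of
    state (the ring-full flag and the ring buffer index) come from the
    commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two pieces of state named above. */
struct dirty_ring_test_state {
	bool vcpu_ring_full;   /* models 'dirty_ring_vcpu_ring_full' */
	uint32_t fetch_index;  /* current ring buffer index */
};

/* Call before each guest mode so nothing leaks from the previous run. */
static void dirty_ring_reset_state(struct dirty_ring_test_state *s)
{
	s->vcpu_ring_full = false;
	s->fetch_index = 0;
}
```

    Calling the reset at the start of every mode, rather than at the end
    of the previous one, keeps the next run clean even if a mode bails out
    early.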
  3. KVM: selftests: Use host page size to map ring buffer in dirty_log_test

    In vcpu_map_dirty_ring(), the guest's page size is used to figure out
    the offset in the virtual area. It works fine when the host and guest
    have the same page size. However, it fails when the page sizes on the
    host and guest differ on arm64, as the error messages below indicate.
    
      # ./dirty_log_test -M dirty-ring -m 7
      Setting log mode to: 'dirty-ring'
      Test iterations: 32, interval: 10 (ms)
      Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
      guest physical test memory offset: 0xffbffc0000
      vcpu stops because vcpu is kicked out...
      Notifying vcpu to continue
      vcpu continues now.
      ==== Test Assertion Failure ====
      lib/kvm_util.c:1477: addr == MAP_FAILED
      pid=9000 tid=9000 errno=0 - Success
      1  0x0000000000405f5b: vcpu_map_dirty_ring at kvm_util.c:1477
      2  0x0000000000402ebb: dirty_ring_collect_dirty_pages at dirty_log_test.c:349
      3  0x00000000004029b3: log_mode_collect_dirty_pages at dirty_log_test.c:478
      4  (inlined by) run_test at dirty_log_test.c:778
      5  (inlined by) run_test at dirty_log_test.c:691
      6  0x0000000000403a57: for_each_guest_mode at guest_modes.c:105
      7  0x0000000000401ccf: main at dirty_log_test.c:921
      8  0x0000ffffb06ec79b: ?? ??:0
      9  0x0000ffffb06ec86b: ?? ??:0
      10 0x0000000000401def: _start at ??:?
      Dirty ring mapped private
    
    Fix the issue by using host's page size to map the ring buffer.
    
    Signed-off-by: Gavin Shan <gshan@redhat.com>
    Gavin Shan authored and intel-lab-lkp committed Sep 16, 2022
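    The fix can be pictured as computing the mmap() offset from the host
    page size. This is a sketch, not the selftest code: the constant below
    is an assumed value standing in for KVM_DIRTY_LOG_PAGE_OFFSET, and
    dirty_ring_mmap_offset() is a hypothetical helper.

```c
#include <unistd.h>

/* Assumed value for illustration; the real constant comes from the
 * kernel's uapi headers for the architecture. */
#define SKETCH_DIRTY_LOG_PAGE_OFFSET 64

/* Byte offset of the dirty ring in the vcpu fd mapping, computed from the
 * host page size (the unit mmap() works in), not the guest page size. */
static long dirty_ring_mmap_offset(void)
{
	long host_page_size = sysconf(_SC_PAGESIZE);

	return host_page_size * SKETCH_DIRTY_LOG_PAGE_OFFSET;
}
```

    Because the offset is derived from sysconf(_SC_PAGESIZE), it stays
    page-aligned on the host regardless of which guest page size the test
    mode selects.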
  4. KVM: arm64: Enable ring-based dirty memory tracking

    This enables the ring-based dirty memory tracking on ARM64. The
    feature is configured by CONFIG_HAVE_KVM_DIRTY_RING, detected and
    enabled by KVM_CAP_DIRTY_LOG_RING. A ring buffer is created on every
    VCPU when the feature is enabled. Each entry in the ring buffer is
    described by 'struct kvm_dirty_gfn'.
    
    A ring buffer entry is pushed when a page becomes dirty on the host,
    and pulled by userspace after the ring buffer is mapped at physical
    page offset KVM_DIRTY_LOG_PAGE_OFFSET. The VCPU is forced to exit if
    its ring buffer becomes softly full. In addition, the ring buffer can
    be reset by the ioctl command KVM_RESET_DIRTY_RINGS to release the
    ring buffer entries that have been pulled.
    
    Signed-off-by: Gavin Shan <gshan@redhat.com>
    Gavin Shan authored and intel-lab-lkp committed Sep 16, 2022
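    The push, pull and reset cycle described above can be modelled with a
    small userspace toy. Everything here is a simplified assumption: the
    toy_* types, the ring size and the soft limit are made up for the
    sketch and do not mirror the kernel's layout or thresholds.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE  8u   /* entries; power of two so indices can be masked */
#define SOFT_LIMIT 6u   /* assumed threshold below RING_SIZE */

/* Simplified stand-in for 'struct kvm_dirty_gfn'. */
struct toy_dirty_gfn {
	uint32_t slot;
	uint64_t offset;
};

struct toy_dirty_ring {
	struct toy_dirty_gfn gfns[RING_SIZE];
	uint32_t dirty_index;  /* next entry the producer writes */
	uint32_t reset_index;  /* first entry not yet released */
};

static uint32_t ring_used(const struct toy_dirty_ring *r)
{
	return r->dirty_index - r->reset_index;
}

static bool ring_soft_full(const struct toy_dirty_ring *r)
{
	return ring_used(r) >= SOFT_LIMIT;
}

/* Producer side: record one dirty page. Returns true when the ring is
 * softly full afterwards, i.e. the vcpu should exit to userspace. */
static bool ring_push(struct toy_dirty_ring *r, uint32_t slot, uint64_t offset)
{
	struct toy_dirty_gfn *e = &r->gfns[r->dirty_index & (RING_SIZE - 1)];

	e->slot = slot;
	e->offset = offset;
	r->dirty_index++;
	return ring_soft_full(r);
}

/* Consumer side: release 'harvested' entries, as a reset would. */
static void ring_reset(struct toy_dirty_ring *r, uint32_t harvested)
{
	r->reset_index += harvested;
}
```

    The soft limit sits below the ring size so that the producer still has
    room for a few more entries between raising the exit condition and
    actually stopping.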
  5. KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL

    This adds KVM_REQ_RING_SOFT_FULL, which is raised in
    kvm_dirty_ring_push() when the dirty ring of the specific VCPU becomes
    softly full. The VCPU is forced to exit when the request is raised and
    its dirty ring is softly full on entry.
    
    Suggested-by: Marc Zyngier <maz@kernel.org>
    Signed-off-by: Gavin Shan <gshan@redhat.com>
    Gavin Shan authored and intel-lab-lkp committed Sep 16, 2022

Commits on Aug 19, 2022

  1. KVM: selftests: Fix ambiguous mov in KVM_ASM_SAFE()

    Change the mov in KVM_ASM_SAFE() that zeroes @vector to a movb to
    make it unambiguous.
    
    This fixes a build failure with Clang since, unlike the GNU assembler,
    the LLVM integrated assembler rejects ambiguous X86 instructions that
    don't have suffixes:
    
      In file included from x86_64/hyperv_features.c:13:
      include/x86_64/processor.h:825:9: error: ambiguous instructions require an explicit suffix (could be 'movb', 'movw', 'movl', or 'movq')
              return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
                     ^
      include/x86_64/processor.h:802:15: note: expanded from macro 'kvm_asm_safe'
              asm volatile(KVM_ASM_SAFE(insn)                 \
                           ^
      include/x86_64/processor.h:788:16: note: expanded from macro 'KVM_ASM_SAFE'
              "1: " insn "\n\t"                                       \
                            ^
      <inline asm>:5:2: note: instantiated into assembly here
              mov $0, 15(%rsp)
              ^
    
    It seems like this change could introduce undesirable behavior in the
    future, e.g. if someone used a type larger than a u8 for @vector, since
    KVM_ASM_SAFE() will only zero the bottom byte. I tried changing the type
    of @vector to an int to see what would happen. GCC failed to compile due
    to a size mismatch between `movb` and `%eax`. Clang succeeded in
    compiling, and the generated code looked correct, so perhaps it will not
    be an issue. That being said, it seems like there could be a better
    solution to this issue that does not assume @vector is a u8.
    
    Fixes: 3b23054 ("KVM: selftests: Add x86-64 support for exception fixup")
    Signed-off-by: David Matlack <dmatlack@google.com>
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220722234838.2160385-3-dmatlack@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    dmatlack authored and bonzini committed Aug 19, 2022
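    The concern about types wider than a u8 can be illustrated with a
    byte-sized store into a 32-bit variable. zero_low_byte() is a
    hypothetical helper written for this sketch, not the KVM_ASM_SAFE()
    macro, and the non-x86 branch is a portable stand-in with the same
    effect.

```c
#include <stdint.h>

/* Zero only the lowest byte of a 32-bit value held in memory. */
static uint32_t zero_low_byte(uint32_t value)
{
#if defined(__x86_64__) || defined(__i386__)
	/* The 'b' suffix tells the assembler the store is one byte wide. */
	__asm__ volatile("movb $0, %0" : "+m"(value));
#else
	/* Portable equivalent for non-x86 builds. */
	value &= ~(uint32_t)0xff;
#endif
	return value;
}
```

    With an explicit suffix the assembler no longer has to guess the
    operand size, but the upper three bytes of a wider variable are left
    untouched, which is the behavior the commit message warns about.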
  2. KVM: selftests: Fix KVM_EXCEPTION_MAGIC build with Clang

    Change KVM_EXCEPTION_MAGIC to use the all-caps "ULL", rather than lower
    case. This fixes a build failure with Clang:
    
      In file included from x86_64/hyperv_features.c:13:
      include/x86_64/processor.h:825:9: error: unexpected token in argument list
              return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
                     ^
      include/x86_64/processor.h:802:15: note: expanded from macro 'kvm_asm_safe'
              asm volatile(KVM_ASM_SAFE(insn)                 \
                           ^
      include/x86_64/processor.h:785:2: note: expanded from macro 'KVM_ASM_SAFE'
              "mov $" __stringify(KVM_EXCEPTION_MAGIC) ", %%r9\n\t"   \
              ^
      <inline asm>:1:18: note: instantiated into assembly here
              mov $0xabacadabaull, %r9
                              ^
    
    Fixes: 3b23054 ("KVM: selftests: Add x86-64 support for exception fixup")
    Signed-off-by: David Matlack <dmatlack@google.com>
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220722234838.2160385-2-dmatlack@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    dmatlack authored and bonzini committed Aug 19, 2022
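    The suffix reaches the assembler because the constant is stringified
    into the asm template. The sketch below reproduces that mechanism with
    local SKETCH_* macros, which are stand-ins for the kernel's
    __stringify() and for KVM_EXCEPTION_MAGIC.

```c
#include <string.h>

/* Two-level stringification, in the style of the kernel's __stringify(). */
#define SKETCH_STRINGIFY_1(x) #x
#define SKETCH_STRINGIFY(x)   SKETCH_STRINGIFY_1(x)

#define SKETCH_MAGIC 0xabacadabaULL

/* The text that would be spliced after "mov $" in the asm template. */
static const char *magic_as_asm_text(void)
{
	return SKETCH_STRINGIFY(SKETCH_MAGIC);
}
```

    The integer suffix is part of the token the preprocessor stringifies,
    so whatever spelling the macro uses is exactly what the assembler has
    to parse.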
  3. KVM: VMX: Heed the 'msr' argument in msr_write_intercepted()

    Regardless of the 'msr' argument passed to the VMX version of
    msr_write_intercepted(), the function always checks to see if a
    specific MSR (IA32_SPEC_CTRL) is intercepted for write.  This behavior
    seems unintentional and unexpected.
    
    Modify the function so that it checks to see if the provided 'msr'
    index is intercepted for write.
    
    Fixes: 67f4b99 ("KVM: nVMX: Handle dynamic MSR intercept toggling")
    Cc: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Jim Mattson <jmattson@google.com>
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220810213050.2655000-1-jmattson@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    jsmattsonjr authored and bonzini committed Aug 19, 2022
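    The shape of the bug and of the fix can be shown with a toy bitmap.
    The toy_* and SKETCH_* names, the bitmap layout and the out-of-range
    policy are assumptions for this sketch; they do not reproduce the VMX
    MSR bitmap format.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSR_IA32_SPEC_CTRL 0x48u
#define SKETCH_NR_MSRS            0x100u

/* Toy write-intercept bitmap: one bit per MSR index below SKETCH_NR_MSRS. */
struct toy_msr_bitmap {
	uint8_t write_intercept[SKETCH_NR_MSRS / 8];
};

static bool test_write_bit(const struct toy_msr_bitmap *b, uint32_t msr)
{
	if (msr >= SKETCH_NR_MSRS)
		return true;   /* treat MSRs outside the bitmap as intercepted */
	return (b->write_intercept[msr / 8] & (1u << (msr % 8))) != 0;
}

/* Buggy shape: 'msr' is accepted but never consulted. */
static bool msr_write_intercepted_buggy(const struct toy_msr_bitmap *b,
					uint32_t msr)
{
	(void)msr;
	return test_write_bit(b, SKETCH_MSR_IA32_SPEC_CTRL);
}

/* Fixed shape: the caller's 'msr' selects the bit. */
static bool msr_write_intercepted_fixed(const struct toy_msr_bitmap *b,
					uint32_t msr)
{
	return test_write_bit(b, msr);
}
```

    The buggy variant gives the right answer only when the caller happens
    to ask about IA32_SPEC_CTRL, which is why the problem could go
    unnoticed.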
  4. kvm: x86: mmu: Always flush TLBs when enabling dirty logging

    When A/D bits are not available, KVM uses a software access tracking
    mechanism, which involves making the SPTEs inaccessible. However,
    the clear_young() MMU notifier does not flush TLBs. So it is possible
    that there may still be stale, potentially writable, TLB entries.
    This is usually fine, but can be problematic when enabling dirty
    logging, because it currently only does a TLB flush if any SPTEs were
    modified. But if all SPTEs are in access-tracked state, then there
    won't be a TLB flush, which means that the guest could still possibly
    write to memory and not have it reflected in the dirty bitmap.
    
    So just unconditionally flush the TLBs when enabling dirty logging.
    As an alternative, KVM could explicitly check the MMU-Writable bit when
    write-protecting SPTEs to decide if a flush is needed (instead of
    checking the Writable bit), but given that a flush almost always happens
    anyway, just making it unconditional seems simpler.
    
    Signed-off-by: Junaid Shahid <junaids@google.com>
    Message-Id: <20220810224939.2611160-1-junaids@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Junaid Shahid authored and bonzini committed Aug 19, 2022
  5. kvm: x86: mmu: Drop the need_remote_flush() function

    This is only used by kvm_mmu_pte_write(), which no longer actually
    creates the new SPTE and instead just clears the old SPTE. So we
    just need to check if the old SPTE was shadow-present instead of
    calling need_remote_flush(). Hence we can drop this function. It was
    incomplete anyway as it didn't take access-tracking into account.
    
    This patch should not result in any functional change.
    
    Signed-off-by: Junaid Shahid <junaids@google.com>
    Reviewed-by: David Matlack <dmatlack@google.com>
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220723024316.2725328-1-junaids@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Junaid Shahid authored and bonzini committed Aug 19, 2022
  6. Merge tag 'kvmarm-fixes-6.0-1' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/kvmarm/kvmarm into HEAD
    
    KVM/arm64 fixes for 6.0, take #1
    
    - Fix unexpected sign extension of KVM_ARM_DEVICE_ID_MASK
    
    - Tidy-up handling of AArch32 on asymmetric systems
    bonzini committed Aug 19, 2022
  7. KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_dev…

    …ice()
    
    The variable is initialized but it is only used after its assignment.
    
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Li kunyu <kunyu@nfschina.com>
    Message-Id: <20220819021535.483702-1-kunyu@nfschina.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    likunyur authored and bonzini committed Aug 19, 2022
  8. KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow()

    The variable is initialized but it is only used after its assignment.
    
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Li kunyu <kunyu@nfschina.com>
    Message-Id: <20220819022804.483914-1-kunyu@nfschina.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    likunyur authored and bonzini committed Aug 19, 2022
  9. x86/kvm: Fix "missing ENDBR" BUG for fastop functions

    The following BUG was reported:
    
      traps: Missing ENDBR: andw_ax_dx+0x0/0x10 [kvm]
      ------------[ cut here ]------------
      kernel BUG at arch/x86/kernel/traps.c:253!
      invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
       <TASK>
       asm_exc_control_protection+0x2b/0x30
      RIP: 0010:andw_ax_dx+0x0/0x10 [kvm]
      Code: c3 cc cc cc cc 0f 1f 44 00 00 66 0f 1f 00 48 19 d0 c3 cc cc cc
            cc 0f 1f 40 00 f3 0f 1e fa 20 d0 c3 cc cc cc cc 0f 1f 44 00 00
            <66> 0f 1f 00 66 21 d0 c3 cc cc cc cc 0f 1f 40 00 66 0f 1f 00 21
            d0
    
       ? andb_al_dl+0x10/0x10 [kvm]
       ? fastop+0x5d/0xa0 [kvm]
       x86_emulate_insn+0x822/0x1060 [kvm]
       x86_emulate_instruction+0x46f/0x750 [kvm]
       complete_emulated_mmio+0x216/0x2c0 [kvm]
       kvm_arch_vcpu_ioctl_run+0x604/0x650 [kvm]
       kvm_vcpu_ioctl+0x2f4/0x6b0 [kvm]
       ? wake_up_q+0xa0/0xa0
    
    The BUG occurred because the ENDBR in the andw_ax_dx() fastop function
    had been incorrectly "sealed" (converted to a NOP) by apply_ibt_endbr().
    
    Objtool marked it to be sealed because KVM has no compile-time
    references to the function.  Instead KVM calculates its address at
    runtime.
    
    Prevent objtool from annotating fastop functions as sealable by creating
    throwaway dummy compile-time references to the functions.
    
    Fixes: 6649fa8 ("x86/ibt,kvm: Add ENDBR to fastops")
    Reported-by: Pengfei Xu <pengfei.xu@intel.com>
    Debugged-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Message-Id: <0d4116f90e9d0c1b754bb90c585e6f0415a1c508.1660837839.git.jpoimboe@kernel.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Josh Poimboeuf authored and bonzini committed Aug 19, 2022
  10. x86/kvm: Simplify FOP_SETCC()

    SETCC_ALIGN and FOP_ALIGN are both 16.  Remove the special casing for
    FOP_SETCC() and just make it a normal fastop.
    
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Message-Id: <7c13d94d1a775156f7e36eed30509b274a229140.1660837839.git.jpoimboe@kernel.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Josh Poimboeuf authored and bonzini committed Aug 19, 2022
  11. x86/ibt, objtool: Add IBT_NOSEAL()

    Add a macro which prevents a function from getting sealed if there are
    no compile-time references to it.
    
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Message-Id: <20220818213927.e44fmxkoq4yj6ybn@treble>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Josh Poimboeuf authored and bonzini committed Aug 19, 2022
  12. KVM: Rename mmu_notifier_* to mmu_invalidate_*

    The motivation for this renaming is to make these variables and related
    helper functions less bound to mmu_notifier, so that they can also be
    used for page invalidation that is not based on mmu_notifier.
    mmu_invalidate_* was chosen to better describe the purpose of
    'invalidating' a page, which is what those variables are used for.
    
      - mmu_notifier_seq/range_start/range_end are renamed to
        mmu_invalidate_seq/range_start/range_end.
    
      - mmu_notifier_retry{_hva} helper functions are renamed to
        mmu_invalidate_retry{_hva}.
    
      - mmu_notifier_count is renamed to mmu_invalidate_in_progress to
        avoid confusion with mn_active_invalidate_count.
    
      - While here, also update kvm_inc/dec_notifier_count() to
        kvm_mmu_invalidate_begin/end() to match the change for
        mmu_notifier_count.
    
    No functional change intended.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    chao-p authored and bonzini committed Aug 19, 2022
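    How the renamed fields fit together can be sketched with a
    single-threaded toy. The toy_* names are hypothetical, and the sketch
    leaves out the locking and memory barriers the kernel needs; it only
    shows the begin/end/retry relationship between the sequence counter
    and the in-progress count.

```c
#include <stdbool.h>

/* Toy stand-in for the renamed fields in 'struct kvm'. */
struct toy_kvm {
	unsigned long mmu_invalidate_seq;
	long mmu_invalidate_in_progress;
};

static void toy_mmu_invalidate_begin(struct toy_kvm *kvm)
{
	kvm->mmu_invalidate_in_progress++;
}

static void toy_mmu_invalidate_end(struct toy_kvm *kvm)
{
	/* Bump the sequence so earlier snapshots are seen as stale. */
	kvm->mmu_invalidate_seq++;
	kvm->mmu_invalidate_in_progress--;
}

/* True when a fault handler that sampled 'mmu_seq' earlier must retry. */
static bool toy_mmu_invalidate_retry(const struct toy_kvm *kvm,
				     unsigned long mmu_seq)
{
	if (kvm->mmu_invalidate_in_progress)
		return true;
	return kvm->mmu_invalidate_seq != mmu_seq;
}
```

    A caller samples the sequence number before doing its lookup and asks
    for a retry afterwards; either an invalidation still in flight or a
    changed sequence number means the lookup result may be stale.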
  13. KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS

    KVM_INTERNAL_MEM_SLOTS better reflects the fact that those slots are
    used internally by KVM (invisible to userspace) and avoids confusion
    with future private slots, which can have a different meaning.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Message-Id: <20220816125322.1110439-2-chao.p.peng@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    chao-p authored and bonzini committed Aug 19, 2022
  14. KVM: MIPS: remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS

    KVM_PRIVATE_MEM_SLOTS defaults to zero, so it is not necessary to
    define it in MIPS's asm/kvm_host.h.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 19, 2022
  15. KVM: Move coalesced MMIO initialization (back) into kvm_create_vm()

    Invoke kvm_coalesced_mmio_init() from kvm_create_vm() now that allocating
    and initializing coalesced MMIO objects is separate from registering any
    associated devices.  Moving coalesced MMIO cleans up the last oddity
    where KVM does VM creation/initialization after kvm_create_vm(), and more
    importantly after kvm_arch_post_init_vm() is called and the VM is added
    to the global vm_list, i.e. after the VM is fully created as far as KVM
    is concerned.
    
    Originally, kvm_coalesced_mmio_init() was called by kvm_create_vm(), but
    the original implementation was completely devoid of error handling.
    Commit 6ce5a09 ("KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s
    error handling") fixed the various bugs, and in doing so rightly moved the
    call to after kvm_create_vm() because kvm_coalesced_mmio_init() also
    registered the coalesced MMIO device.  Commit 2b3c246 ("KVM: Make
    coalesced mmio use a device per zone") cleaned up that mess by having
    each zone register a separate device, i.e. moved device registration to
    its logical home in kvm_vm_ioctl_register_coalesced_mmio().  As a result,
    kvm_coalesced_mmio_init() is now a "pure" initialization helper and can
    be safely called from kvm_create_vm().
    
    Opportunistically drop the #ifdef, as KVM provides stubs for
    kvm_coalesced_mmio_{init,free}() when CONFIG_KVM_MMIO=n (s390).
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220816053937.2477106-4-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Aug 19, 2022
  16. KVM: Unconditionally get a ref to /dev/kvm module when creating a VM

    Unconditionally get a reference to the /dev/kvm module when creating a VM
    instead of using try_module_get(), which will fail if the module is in
    the process of being forcefully unloaded.  The error handling when
    try_module_get() fails doesn't properly unwind all that has been done,
    e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM
    from the global list.  Not removing VMs from the global list tends to be
    fatal, e.g. leads to use-after-free explosions.
    
    The obvious alternative would be to add proper unwinding, but the
    justification for using try_module_get(), "rmmod --wait", is completely
    bogus as support for "rmmod --wait", i.e. delete_module() without
    O_NONBLOCK, was removed by commit 3f2b9c9 ("module: remove rmmod
    --wait option.") nearly a decade ago.
    
    It's still possible for try_module_get() to fail due to the module dying
    (more like being killed), as the module will be tagged MODULE_STATE_GOING
    by "rmmod --force", i.e. delete_module(..., O_TRUNC), but playing nice
    with forced unloading is an exercise in futility and gives a false sense
    of security.  Using try_module_get() only prevents acquiring _new_
    references, it doesn't magically put the references held by other VMs,
    and forced unloading doesn't wait, i.e. "rmmod --force" on KVM is all but
    guaranteed to cause spectacular fireworks; the window where KVM will fail
    try_module_get() is tiny compared to the window where KVM is building and
    running the VM with an elevated module refcount.
    
    Addressing KVM's inability to play nice with "rmmod --force" is firmly
    out-of-scope.  Forcefully unloading any module taints the kernel (for
    obvious reasons) _and_ requires the kernel to be built with
    CONFIG_MODULE_FORCE_UNLOAD=y, which is off by default and comes with the
    amusing disclaimer that it's "mainly for kernel developers and desperate
    users".  In other words, KVM is free to scoff at bug reports due to using
    "rmmod --force" while VMs may be running.
    
    Fixes: 5f6de5c ("KVM: Prevent module exit until all VMs are freed")
    Cc: stable@vger.kernel.org
    Cc: David Matlack <dmatlack@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220816053937.2477106-3-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Aug 19, 2022
  17. KVM: Properly unwind VM creation if creating debugfs fails

    Properly unwind VM creation if kvm_create_vm_debugfs() fails.  A recent
    change to invoke kvm_create_vm_debugfs() in kvm_create_vm() was led astray
    by buggy try_module_get() handling added by commit 5f6de5c ("KVM:
    Prevent module exit until all VMs are freed").  The debugfs error path
    effectively inherits the bad error path of try_module_get(), e.g. KVM
    leaves the to-be-freed VM on vm_list even though KVM appears to do the
    right thing by calling module_put() and falling through.
    
    Opportunistically hoist kvm_create_vm_debugfs() above the call to
    kvm_arch_post_init_vm() so that the "post-init" arch hook is actually
    invoked after the VM is initialized (ignoring kvm_coalesced_mmio_init()
    for the moment).  x86 is the only non-nop implementation of the post-init
    hook, and it doesn't allocate/initialize any objects that are reachable
    via debugfs code (spawns a kthread worker for the NX huge page mitigation).
    
    Leave the buggy try_module_get() alone for now; it will be fixed in a
    separate commit.
    
    Fixes: b74ed7a ("KVM: Actually create debugfs in kvm_create_vm()")
    Reported-by: syzbot+744e173caec2e1627ee0@syzkaller.appspotmail.com
    Cc: Oliver Upton <oliver.upton@linux.dev>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
    Message-Id: <20220816053937.2477106-2-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Aug 19, 2022
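    The kind of unwinding discussed in this commit follows the usual
    goto-based error path, where each completed step is undone in reverse
    order. The sketch below is a toy: the toy_* names, the step ordering
    and the counters are assumptions for illustration and do not mirror
    kvm_create_vm().

```c
static int toy_vm_list_len;     /* models the global vm_list */
static int toy_arch_inits;      /* models live arch-specific state */

/* Returns 0 on success, -1 on failure. On failure every earlier step is
 * undone in reverse order, including removal from the list. */
static int toy_create_vm(int debugfs_fails)
{
	toy_arch_inits++;           /* step 1: arch init */
	toy_vm_list_len++;          /* step 2: publish on the list */

	if (debugfs_fails)          /* step 3: debugfs creation */
		goto out_err;

	return 0;

out_err:
	toy_vm_list_len--;          /* undo step 2 */
	toy_arch_inits--;           /* undo step 1 */
	return -1;
}
```

    The point of the pattern is that a failed creation leaves no trace
    behind; in particular, nothing half-built stays reachable through the
    list.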

Commits on Aug 17, 2022

  1. KVM: arm64: Reject 32bit user PSTATE on asymmetric systems

    KVM does not support AArch32 EL0 on asymmetric systems. To that end,
    prevent userspace from configuring a vCPU in such a state through
    setting PSTATE.
    
    It is already ABI that KVM rejects such a write on a system where
    AArch32 EL0 is unsupported. Though the kernel's definition of a 32bit
    system changed in commit 2122a83 ("arm64: Allow mismatched
    32-bit EL0 support"), KVM's did not.
    
    Fixes: 2122a83 ("arm64: Allow mismatched 32-bit EL0 support")
    Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20220816192554.1455559-3-oliver.upton@linux.dev
    Oliver Upton authored and Marc Zyngier committed Aug 17, 2022
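    The check described in this commit amounts to refusing a PSTATE value
    that selects AArch32 when 32-bit EL0 is not supported. The sketch
    below is a userspace toy: toy_validate_pstate() and the
    supports_32bit_el0 parameter are hypothetical, and the bit constant is
    a local copy defined for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PSR_MODE32_BIT 0x10u  /* PSTATE bit 4: AArch32 execution state */

/* Reject a 32-bit PSTATE unless 32-bit EL0 is supported on this system. */
static int toy_validate_pstate(uint64_t pstate, bool supports_32bit_el0)
{
	if ((pstate & SKETCH_PSR_MODE32_BIT) && !supports_32bit_el0)
		return -EINVAL;
	return 0;
}
```

    On an asymmetric system the caller would pass false for
    supports_32bit_el0, so a userspace attempt to set a 32-bit PSTATE
    fails with -EINVAL while 64-bit values are still accepted.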
  2. KVM: arm64: Treat PMCR_EL1.LC as RES1 on asymmetric systems

    KVM does not support AArch32 on asymmetric systems. To that end, enforce
    AArch64-only behavior on PMCR_EL1.LC when on an asymmetric system.
    
    Fixes: 2122a83 ("arm64: Allow mismatched 32-bit EL0 support")
    Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20220816192554.1455559-2-oliver.upton@linux.dev
    Oliver Upton authored and Marc Zyngier committed Aug 17, 2022

Commits on Aug 14, 2022

  1. Linux 6.0-rc1

    torvalds committed Aug 14, 2022
  2. radix-tree: replace gfp.h inclusion with gfp_types.h

    Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we
    have gfp_types.h for this.
    
    Fixes powerpc allmodconfig build:
    
       In file included from include/linux/nodemask.h:97,
                        from include/linux/mmzone.h:17,
                        from include/linux/gfp.h:7,
                        from include/linux/radix-tree.h:12,
                        from include/linux/idr.h:15,
                        from include/linux/kernfs.h:12,
                        from include/linux/sysfs.h:16,
                        from include/linux/kobject.h:20,
                        from include/linux/pci.h:35,
                        from arch/powerpc/kernel/prom_init.c:24:
       include/linux/random.h: In function 'add_latent_entropy':
    >> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'?
          25 |         add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy));
             |                                              ^~~~~~~~~~~~~~
             |                                              add_latent_entropy
       include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in
    
    Reported-by: kernel test robot <lkp@intel.com>
    CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    CC: Andrew Morton <akpm@linux-foundation.org>
    CC: Jason A. Donenfeld <Jason@zx2c4.com>
    Signed-off-by: Yury Norov <yury.norov@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    YuryNorov authored and torvalds committed Aug 14, 2022
  3. Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/g…

    …it/viro/vfs
    
    Pull vfs lseek fix from Al Viro:
     "Fix proc_reg_llseek() breakage. Always had been possible if somebody
      left NULL ->proc_lseek, became a practical issue now"
    
    * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      take care to handle NULL ->proc_lseek()
    torvalds committed Aug 14, 2022
  4. take care to handle NULL ->proc_lseek()

    Easily done now, just by clearing FMODE_LSEEK in ->f_mode
    during proc_reg_open() for such entries.
    
    Fixes: 868941b "fs: remove no_llseek"
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Al Viro committed Aug 14, 2022
  5. Merge tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/li…

    …nux/kernel/git/xen/tip
    
    Pull more xen updates from Juergen Gross:
    
     - fix the handling of the "persistent grants" feature negotiation
       between Xen blkfront and Xen blkback drivers
    
     - a cleanup of xen.config and adding xen.config to Xen section in
       MAINTAINERS
    
     - support HVMOP_set_evtchn_upcall_vector, which is more compliant with
       "normal" interrupt handling than the global callback used up to now
    
     - further small cleanups
    
    * tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
      MAINTAINERS: add xen config fragments to XEN HYPERVISOR sections
      xen: remove XEN_SCRUB_PAGES in xen.config
      xen/pciback: Fix comment typo
      xen/xenbus: fix return type in xenbus_file_read()
      xen-blkfront: Apply 'feature_persistent' parameter when connect
      xen-blkback: Apply 'feature_persistent' parameter when connect
      xen-blkback: fix persistent grants negotiation
      x86/xen: Add support for HVMOP_set_evtchn_upcall_vector
    torvalds committed Aug 14, 2022
  6. Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.…

    …org/pub/scm/linux/kernel/git/acme/linux
    
    Pull more perf tool updates from Arnaldo Carvalho de Melo:
    
     - 'perf c2c' now supports ARM64, adjust its output to cope with
       differences with what is in x86_64. Now go find false sharing on
       ARM64 (at least Neoverse) as well!
    
     - Refactor the JSON processing, making the output more compact and thus
       reducing the size of the resulting perf binary
    
     - Improvements for 'perf offcpu' profiling, including tracking child
       processes
    
     - Update Intel JSON metrics and events files for broadwellde,
       broadwellx, cascadelakex, haswellx, icelakex, ivytown, jaketown,
       knightslanding, sapphirerapids, skylakex and snowridgex
    
     - Add 'perf stat' JSON output and a 'perf test' entry for it
    
     - Ignore memfd and anonymous mmap events if jitdump present
    
     - Refactor 'perf test' shell tests allowing subdirs
    
     - Fix an error handling path in 'parse_perf_probe_command()'
    
     - Fixes for the guest Intel PT tracing patchkit in the 1st batch of
       this merge window
    
     - Print debuginfod queries if -v option is used, to explain delays in
       processing when debuginfo servers are enabled to fetch DSOs with
       richer symbol tables
    
     - Improve error message for 'perf record -p not_existing_pid'
    
     - Fix openssl and libbpf feature detection
    
     - Add PMU pai_crypto event description for IBM z16 on 'perf list'
    
     - Fix typos and duplicated words on comments in various places
    
    * tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (81 commits)
      perf test: Refactor shell tests allowing subdirs
      perf vendor events: Update events for snowridgex
      perf vendor events: Update events and metrics for skylakex
      perf vendor events: Update metrics for sapphirerapids
      perf vendor events: Update events for knightslanding
      perf vendor events: Update metrics for jaketown
      perf vendor events: Update metrics for ivytown
      perf vendor events: Update events and metrics for icelakex
      perf vendor events: Update events and metrics for haswellx
      perf vendor events: Update events and metrics for cascadelakex
      perf vendor events: Update events and metrics for broadwellx
      perf vendor events: Update metrics for broadwellde
      perf jevents: Fold strings optimization
      perf jevents: Compress the pmu_events_table
      perf metrics: Copy entire pmu_event in find metric
      perf pmu-events: Hide the pmu_events
      perf pmu-events: Don't assume pmu_event is an array
      perf pmu-events: Move test events/metrics to JSON
      perf test: Use full metric resolution
      perf pmu-events: Hide pmu_events_map
      ...
    torvalds committed Aug 14, 2022
  7. Merge tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
    
    Pull powerpc fixes from Michael Ellerman:
    
     - Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit
       CPUs trap on it rather than ignoring it as they should.
    
     - Fix ftrace when building with clang, which was broken by some
       refactoring.
    
     - A couple of other minor fixes.
    
    Thanks to Christophe Leroy, Naveen N. Rao, Nick Desaulniers, Ondrej
    Mosnacek, Pali Rohár, Russell Currey, and Segher Boessenkool.
    
    * tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/kexec: Fix build failure from uninitialised variable
      powerpc/ppc-opcode: Fix PPC_RAW_TW()
      powerpc64/ftrace: Fix ftrace for clang builds
      powerpc: Make eh value more explicit when using lwarx
      powerpc: Don't hide eh field of lwarx behind a macro
      powerpc: Fix eh field when calling lwarx on PPC32
    torvalds committed Aug 14, 2022
  8. Merge tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
    
    Pull /proc/mounts fix from Al Viro:
     "Fix for /proc/mounts escaping - escape the '#' character too"
    
    * tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      vfs: escape hash as well
    torvalds committed Aug 14, 2022
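
The escaping fix above can be illustrated with a minimal sketch. It assumes the usual /proc/mounts convention of replacing each special character with a backslash followed by its three-digit octal code; the function name `mangle` and the exact character set are assumptions made for illustration, not a copy of the kernel code. The '#' matters because parsers that treat /proc/mounts like fstab skip lines starting with '#' as comments.

```python
# Hypothetical sketch of octal escaping for a /proc/mounts field.
# The character set below is an assumption: whitespace, backslash,
# and (per the fix above) the '#' character.
def mangle(field: str, escaped: str = " \t\n\\#") -> str:
    """Return `field` with every character in `escaped` replaced by \\ooo."""
    return "".join(
        "\\%03o" % ord(ch) if ch in escaped else ch
        for ch in field
    )


if __name__ == "__main__":
    print(mangle("/mnt/my disk"))  # space becomes \040
    print(mangle("#name"))         # leading '#' becomes \043
```

With '#' in the escaped set, a mount source named `#name` no longer produces a line that looks like a comment to fstab-style parsers.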
  9. Merge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
    
    Pull more cifs updates from Steve French:
    
     - two fixes for stable: one for a lock length miscalculation and
       another for a lease break timeout bug
    
     - an improvement to lease handling that allows the close timeout to
       be configured more safely
    
     - five restructuring/cleanup patches
    
    * tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
      cifs: Do not access tcon->cfids->cfid directly from is_path_accessible
      cifs: Add constructor/destructors for tcon->cfid
      SMB3: fix lease break timeout when multiple deferred close handles for the same file.
      smb3: allow deferred close timeout to be configurable
      cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir
      cifs: Move cached-dir functions into a separate file
      cifs: Remove {cifs,nfs}_fscache_release_page()
      cifs: fix lock length calculation
    torvalds committed Aug 14, 2022
  10. afs: Enable multipage folio support

    Enable multipage folio support for the afs filesystem.
    
    Support has already been implemented in netfslib, fscache and cachefiles
    and in most of afs, but I've waited for Matthew Wilcox's latest folio
    changes.
    
    Note that it does require a change to afs_write_begin() to return the
    correct subpage.  This is a "temporary" change as we're working on
    getting rid of the need for ->write_begin() and ->write_end()
    completely, at least as far as network filesystems are concerned - but
    it doesn't prevent afs from making use of the capability.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: kafs-testing@auristor.com
    Cc: Marc Dionne <marc.dionne@auristor.com>
    Cc: linux-afs@lists.infradead.org
    Link: https://lore.kernel.org/lkml/2274528.1645833226@warthog.procyon.org.uk/
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    dhowells authored and torvalds committed Aug 14, 2022

Commits on Aug 13, 2022

  1. Merge tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull timer fixes from Ingo Molnar:
     "Misc timer fixes:
    
       - fix a potential use-after-free bug in posix timers
    
       - correct a prototype
    
       - address a build warning"
    
    * tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      posix-cpu-timers: Cleanup CPU timers before freeing them during exec
      time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64
      posix-timers: Make do_clock_gettime() static
    torvalds committed Aug 13, 2022