Tree: d34f9519da
Commits on Apr 17, 2019
  1. vhost, kcov: annotate vhost_worker

    xairy committed Jan 17, 2019
    Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
  2. usb, kcov: annotate hub_event

    xairy committed Jan 17, 2019
    Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
  3. kcov: remote coverage support

    xairy committed Jan 17, 2019
    Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Commits on Apr 16, 2019
  1. Merge tag 'riscv-for-linus-5.1-rc6' of git://git.kernel.org/pub/scm/l…

    torvalds committed Apr 16, 2019
    …inux/kernel/git/palmer/riscv-linux
    
    Pull RISC-V fixes from Palmer Dabbelt:
     "This contains an assortment of RISC-V-related fixups that we found
      after rc4. They're all really unrelated:
    
       - The addition of a 32-bit defconfig, to emphasize testing the 32-bit
         port.
    
       - A device tree bindings patch, which is pre-work for some patches
         that target 5.2.
    
       - A fix to support booting on systems with more physical memory than
         the maximum supported by the kernel"
    
    * tag 'riscv-for-linus-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
      RISC-V: Fix Maximum Physical Memory 2GiB option for 64bit systems
      dt-bindings: clock: sifive: add FU540-C000 PRCI clock constants
      RISC-V: Add separate defconfig for 32bit systems
  2. Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

    torvalds committed Apr 16, 2019
    Pull KVM fixes from Paolo Bonzini:
     "5.1 keeps its reputation as a big bugfix release for KVM x86.
    
       - Fix for a memory leak introduced during the merge window
    
       - Fixes for nested VMX with ept=0
    
       - Fixes for AMD (APIC virtualization, NMI injection)
    
       - Fixes for Hyper-V under KVM and KVM under Hyper-V
    
       - Fixes for 32-bit SMM and tests for SMM virtualization
    
       - More array_index_nospec peppering"
    
    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
      KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
      KVM: fix spectrev1 gadgets
      KVM: x86: fix warning Using plain integer as NULL pointer
      selftests: kvm: add a selftest for SMM
      selftests: kvm: fix for compilers that do not support -no-pie
      selftests: kvm/evmcs_test: complete I/O before migrating guest state
      KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
      KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
      KVM: x86: clear SMM flags before loading state while leaving SMM
      KVM: x86: Open code kvm_set_hflags
      KVM: x86: Load SMRAM in a single shot when leaving SMM
      KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU
      KVM: x86: Raise #GP when guest vCPU do not support PMU
      x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
      KVM: x86: svm: make sure NMI is injected after nmi_singlestep
      svm/avic: Fix invalidate logical APIC id entry
      Revert "svm: Fix AVIC incomplete IPI emulation"
      kvm: mmu: Fix overflow on kvm mmu page limit calculation
      KVM: nVMX: always use early vmcs check when EPT is disabled
      KVM: nVMX: allow tests to use bad virtual-APIC page address
      ...
  3. KVM: x86: avoid misreporting level-triggered irqs as edge-triggered i…

    Vitaly Kuznetsov authored and bonzini committed Mar 27, 2019
    …n tracing
    
    In the __apic_accept_irq() interface trig_mode is an int, and on some
    code paths it is actually set to values that do not fit in a u8:
    
    kvm_apic_set_irq() extracts it from 'struct kvm_lapic_irq' where trig_mode
    is u16. This is done on purpose as e.g. kvm_set_msi_irq() sets it to
    (1 << 15) & e->msi.data
    
    kvm_apic_local_deliver sets it to reg & (1 << 15).
    
    Fix the immediate issue by making 'tm' into u16. We may also want to adjust
    __apic_accept_irq() interface and use proper sizes for vector, level,
    trig_mode but this is not urgent.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
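    The bit-15 extraction described above can be sketched in plain C: truncating the masked value to u8 always yields zero, which is why the tracing variable needs to be u16. Function names here are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Trigger mode is carried in bit 15 of the MSI data / LVT register
 * value, so the raw masked value is 0x8000 for a level-triggered
 * interrupt.  Truncating it to u8 silently drops the bit. */
static uint16_t trig_mode_u16(uint32_t data)
{
    return (1 << 15) & data;             /* 0 = edge, 0x8000 = level */
}

static uint8_t trig_mode_u8(uint32_t data)
{
    return (uint8_t)((1 << 15) & data);  /* always 0: the bug */
}
```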
  4. KVM: fix spectrev1 gadgets

    bonzini committed Apr 11, 2019
    These were found with smatch, and then generalized when applicable.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
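    The array_index_nospec() pattern applied by these fixes clamps an attacker-influenced index with a branchless mask, so a mispredicted bounds check cannot speculatively index out of bounds. A simplified user-space model for illustration only; the real kernel macro uses arch-specific helpers:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of array_index_nospec(): return the index when it
 * is in bounds and 0 otherwise, without a conditional branch the CPU
 * could mispredict.  mask is all-ones iff index < size. */
static size_t index_nospec(size_t index, size_t size)
{
    size_t mask = (size_t)0 - (size_t)(index < size);
    return index & mask;
}
```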
  5. KVM: x86: fix warning Using plain integer as NULL pointer

    Hariprasad Kelam authored and bonzini committed Apr 6, 2019
    Change the passed argument from 0 to NULL, which resolves the sparse warning below:
    
    arch/x86/kvm/x86.c:3096:61: warning: Using plain integer as NULL pointer
    
    Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. selftests: kvm: add a selftest for SMM

    Vitaly Kuznetsov authored and bonzini committed Apr 10, 2019
    Add a simple test for SMM, based on VMX.  The test implements its own
    sync between the guest and the host, as using our ucall library seemed
    too cumbersome: the SMI handler runs in real-address mode.
    
    This patch also fixes KVM_SET_NESTED_STATE to happen after
    KVM_SET_VCPU_EVENTS; in fact, it places it last.  This is because
    KVM needs to know whether the processor is in SMM or not.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. selftests: kvm: fix for compilers that do not support -no-pie

    bonzini committed Apr 11, 2019
    -no-pie was added to GCC at the same time as its configuration option
    --enable-default-pie.  Compilers built before then do not have
    -no-pie, but they also do not need it.  Detect the option at build
    time.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. selftests: kvm/evmcs_test: complete I/O before migrating guest state

    bonzini committed Apr 11, 2019
    Starting state migration after an IO exit without first completing IO
    may result in test failures.  We already have two tests that need this
    (this patch in fact fixes evmcs_test, similar to what was fixed for
    state_test in commit 0f73bbc, "KVM: selftests: complete IO before
    migrating guest state", 2019-03-13) and a third is coming.  So, move the
    code to vcpu_save_state, and while at it do not access register state
    until after I/O is complete.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels

    sean-jc authored and bonzini committed Apr 2, 2019
    Invoking the 64-bit variant on a 32-bit kernel will crash the guest,
    trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
    rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
    thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
    
    KVM allows userspace to report long mode support via CPUID, even though
    the guest is all but guaranteed to crash if it actually tries to enable
    long mode.  But, a pure 32-bit guest that is ignorant of long mode will
    happily plod along.
    
    SMM complicates things as 64-bit CPUs use a different SMRAM save state
    area.  KVM handles this correctly for 64-bit kernels, e.g. uses the
    legacy save state map if userspace has hidden long mode from the guest,
    but doesn't fare well when userspace reports long mode support on a
    32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
    
    Since the alternative is to crash the guest, e.g. by not loading state
    or explicitly requesting shutdown, unconditionally use the legacy SMRAM
    save state map for 32-bit KVM.  If a guest has managed to get far enough
    to handle SMIs when running under a weird/buggy userspace hypervisor,
    then don't deliberately crash the guest, since there are no downsides
    (from KVM's perspective) to allowing it to continue running.
    
    Fixes: 660a5d5 ("KVM: x86: save/load state on SMM switch")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU

    sean-jc authored and bonzini committed Apr 2, 2019
    Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
    state area, i.e. don't save/restore EFER across SMM transitions.  KVM
    somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
    guest doesn't support long mode.  But during RSM, KVM unconditionally
    clears EFER so that it can get back to pure 32-bit mode in order to
    start loading CRs with their actual non-SMM values.
    
    Clear EFER only when it will be written when loading the non-SMM state
    so as to preserve bits that can theoretically be set on 32-bit vCPUs,
    e.g. KVM always emulates EFER_SCE.
    
    And because CR4.PAE is cleared only to play nice with EFER, wrap that
    code in the long mode check as well.  Note, this may result in a
    compiler warning about cr4 being consumed uninitialized.  Re-read CR4
    even though it's technically unnecessary, as doing so allows for more
    readable code and RSM emulation is not a performance critical path.
    
    Fixes: 660a5d5 ("KVM: x86: save/load state on SMM switch")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  11. KVM: x86: clear SMM flags before loading state while leaving SMM

    sean-jc authored and bonzini committed Apr 2, 2019
    RSM emulation is currently broken on VMX when the interrupted guest has
    CR4.VMXE=1.  Stop dancing around the issue of HF_SMM_MASK being set when
    loading SMSTATE into architectural state, e.g. by toggling it for
    problematic flows, and simply clear HF_SMM_MASK prior to loading
    architectural state (from SMRAM save state area).
    
    Reported-by: Jon Doron <arilou@gmail.com>
    Cc: Jim Mattson <jmattson@google.com>
    Cc: Liran Alon <liran.alon@oracle.com>
    Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
    Fixes: 5bea512 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. KVM: x86: Open code kvm_set_hflags

    sean-jc authored and bonzini committed Apr 2, 2019
    Prepare for clearing HF_SMM_MASK prior to loading state from the SMRAM
    save state map, i.e. kvm_smm_changed() needs to be called after state
    has been loaded and so cannot be done automatically when setting
    hflags from RSM.
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  13. KVM: x86: Load SMRAM in a single shot when leaving SMM

    sean-jc authored and bonzini committed Apr 2, 2019
    RSM emulation is currently broken on VMX when the interrupted guest has
    CR4.VMXE=1.  Rather than dance around the issue of HF_SMM_MASK being set
    when loading SMSTATE into architectural state, ideally RSM emulation
    itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM
    architectural state.
    
    Ostensibly, the only motivation for having HF_SMM_MASK set throughout
    the loading of state from the SMRAM save state area is so that the
    memory accesses from GET_SMSTATE() are tagged with role.smm.  Load
    all of the SMRAM save state area from guest memory at the beginning of
    RSM emulation, and load state from the buffer instead of reading guest
    memory one-by-one.
    
    This paves the way for clearing HF_SMM_MASK prior to loading state,
    and also aligns RSM with the enter_smm() behavior, which fills a
    buffer and writes SMRAM save state in a single go.
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  14. KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU

    Liran Alon authored and bonzini committed Mar 25, 2019
    The issue was discovered when running kvm-unit-tests on KVM running as
    L1 on top of Hyper-V.
    
    When the vmx_instruction_intercept unit test attempts to run RDPMC to
    test RDPMC-exiting, it is intercepted by L1 KVM, whose
    EXIT_REASON_RDPMC handler raises #GP because the vCPU exposed by
    Hyper-V doesn't support a PMU, instead of the exit being reflected to
    the unit test with EXIT_REASON_RDPMC as expected.
    
    The reason the vmx_instruction_intercept unit test attempts to run
    RDPMC even though Hyper-V doesn't support a PMU is that L1 exposes
    RDPMC-exiting support to L2, and it is reasonable to assume this is
    supported only when the CPU supports a PMU to begin with.
    
    The above issue can easily be reproduced by modifying the
    vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with
    "-cpu host,+vmx,-pmu" and running the unit test.
    
    To handle the issue, change KVM to expose RDPMC-exiting only when the
    guest supports a PMU.
    
    Reported-by: Saar Amar <saaramar@microsoft.com>
    Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
    Reviewed-by: Jim Mattson <jmattson@google.com>
    Signed-off-by: Liran Alon <liran.alon@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  15. KVM: x86: Raise #GP when guest vCPU do not support PMU

    Liran Alon authored and bonzini committed Mar 25, 2019
    Before this change, reading a VMware pseudo PMC would succeed even
    when the PMU is not supported by the guest. This can easily be seen by
    running the kvm-unit-test vmware_backdoors with the "-cpu host,-pmu"
    option.
    
    Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
    Signed-off-by: Liran Alon <liran.alon@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  16. x86/kvm: move kvm_load/put_guest_xcr0 into atomic context

    wcwxyz authored and bonzini committed Apr 12, 2019
    Guest xcr0 can leak into the host when an MCE happens in guest mode,
    because do_machine_check() can schedule out at a few places.
    
    For example:
    
    kvm_load_guest_xcr0
    ...
    kvm_x86_ops->run(vcpu) {
      vmx_vcpu_run
        vmx_complete_atomic_exit
          kvm_machine_check
            do_machine_check
              do_memory_failure
                memory_failure
                  lock_page
    
    In this case, host_xcr0 is 0x2ff and the guest vcpu xcr0 is 0xff.
    After scheduling out, the host cpu has the guest xcr0 (0xff) loaded.
    
    In __switch_to {
         switch_fpu_finish
           copy_kernel_to_fpregs
             XRSTORS
    
    If any bit i has XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
    generate #GP (in this case, bit 9). Then ex_handler_fprestore kicks in
    and tries to reinitialize the fpu by restoring the init fpu state.
    Same story as the last #GP, except we get a DOUBLE FAULT this time.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
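    The fault condition quoted above can be written as a one-line predicate; with the commit's numbers, XSTATE_BV = 0x2ff against xcr0 = 0xff leaves bit 9 requested but disabled, hence the #GP. This is a model for illustration, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XRSTORS raises #GP if any state component requested by XSTATE_BV is
 * not enabled in XCR0, i.e. some bit i has XSTATE_BV[i] == 1 and
 * xcr0[i] == 0. */
static bool xrstors_would_fault(uint64_t xstate_bv, uint64_t xcr0)
{
    return (xstate_bv & ~xcr0) != 0;
}
```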
  17. KVM: x86: svm: make sure NMI is injected after nmi_singlestep

    Vitaly Kuznetsov authored and bonzini committed Apr 3, 2019
    I noticed that the apic test from kvm-unit-tests always hangs on my
    EPYC 7401P: the hanging test, nmi-after-sti, tries to deliver 30000
    NMIs, and tracing shows that we're sometimes able to deliver a few but
    never all.
    
    When we're trying to inject an NMI we may fail to do so immediately for
    various reasons, however, we still need to inject it so enable_nmi_window()
    arms nmi_singlestep mode. #DB occurs as expected, but we're not checking
    for pending NMIs before entering the guest and unless there's a different
    event to process, the NMI will never get delivered.
    
    Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure
    pending NMIs are checked and possibly injected.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  18. svm/avic: Fix invalidate logical APIC id entry

    Suthikulpanit, Suravee authored and bonzini committed Mar 26, 2019
    Only clear the valid bit when invalidating a logical APIC id entry.
    The current logic clears the valid bit, but also sets the rest of the
    bits (including reserved bits) to 1.
    
    Fixes: 98d9058 ('svm: Fix AVIC DFR and LDR handling')
    Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
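    The bug pattern described above, in miniature: writing the complement of the valid-bit mask clears the valid bit but sets every other bit, including reserved ones, whereas the fix clears only the valid bit with a read-modify-write. The bit position below is illustrative, not the hardware layout:

```c
#include <assert.h>
#include <stdint.h>

#define VALID_BIT (UINT32_C(1) << 31)   /* illustrative position */

/* Buggy: ignores the current entry and writes ~VALID_BIT, setting
 * every bit except the valid bit (reserved bits included). */
static uint32_t invalidate_buggy(uint32_t entry)
{
    (void)entry;
    return ~VALID_BIT;
}

/* Fixed: clear only the valid bit, preserving everything else. */
static uint32_t invalidate_fixed(uint32_t entry)
{
    return entry & ~VALID_BIT;
}
```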
  19. Revert "svm: Fix AVIC incomplete IPI emulation"

    Suthikulpanit, Suravee authored and bonzini committed Mar 20, 2019
    This reverts commit bb218fb.
    
    As Oren Twaig pointed out in an old discussion:
    
      https://patchwork.kernel.org/patch/8292231/
    
    the change could potentially cause an extra IPI to be sent to the
    destination vcpu, because the AVIC hardware has already set the IRR
    bit before the incomplete-IPI #VMEXIT with id=1 (target vcpu is not
    running), and writing to ICR and ICR2 also sets the IRR.  If something
    triggers the destination vcpu to be scheduled before the emulation
    finishes, this could result in an additional IPI.
    
    Also, the issue mentioned in commit bb218fb was misdiagnosed.
    
    Cc: Radim Krčmář <rkrcmar@redhat.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Reported-by: Oren Twaig <oren@scalemp.com>
    Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  20. kvm: mmu: Fix overflow on kvm mmu page limit calculation

    Ben Gardon authored and bonzini committed Apr 8, 2019
    KVM bases its memory usage limits on the total number of guest pages
    across all memslots. However, those limits, and the calculations to
    produce them, use 32 bit unsigned integers. This can result in overflow
    if a VM has more guest pages than can be represented by a u32. As a
    result of this overflow, KVM can use a low limit on the number of MMU
    pages it will allocate. This makes KVM unable to map all of guest memory
    at once, prompting spurious faults.
    
    Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
    	introduced no new failures.
    
    Signed-off-by: Ben Gardon <bgardon@google.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
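    The overflow is easy to demonstrate: with u32 arithmetic, a guest of 2^32 or more pages wraps the page count before the limit is computed, yielding a tiny budget. The ratio below stands in for KVM's per-mille calculation and is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Limit calculation modeled with 32-bit vs. 64-bit counters.  With a
 * u32, a guest page count of 2^32 wraps to 0 before the ratio is
 * applied; widening the counter fixes it. */
static uint32_t mmu_limit_u32(uint64_t guest_pages)
{
    uint32_t pages = (uint32_t)guest_pages;   /* wraps for huge VMs */
    return pages / 1000 * 20;                 /* illustrative ratio */
}

static uint64_t mmu_limit_u64(uint64_t guest_pages)
{
    return guest_pages / 1000 * 20;
}
```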
  21. KVM: nVMX: always use early vmcs check when EPT is disabled

    bonzini committed Apr 15, 2019
    The remaining failures of vmx.flat when EPT is disabled are caused by
    incorrectly reflecting VMfails to the L1 hypervisor.  What happens is
    that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
    with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
    from the vmcs01.
    
    For simplicity let's just always use hardware VMCS checks when EPT is
    disabled.  This way, nested_vmx_restore_host_state is not reached at
    all (or at least shouldn't be reached).
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  22. KVM: nVMX: allow tests to use bad virtual-APIC page address

    bonzini committed Apr 15, 2019
    As mentioned in the comment, there are some special cases where we can simply
    clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
    Handle them so that we can remove some XFAILs from vmx.flat.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commits on Apr 15, 2019
  1. Merge tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/l…

    torvalds committed Apr 15, 2019
    …inux/kernel/git/nvdimm/nvdimm
    
    Pull libnvdimm fixes from Dan Williams:
     "I debated holding this back for the v5.2 merge window due to the size
      of the "zero-key" changes, but affected users would benefit from
      having the fixes sooner. It did not make sense to change the zero-key
      semantic in isolation for the "secure-erase" command, but instead
      include it for all security commands.
    
      The short background on the need for these changes is that some NVDIMM
      platforms enable security with a default zero-key rather than let the
      OS specify the initial key. This makes the security enabling that
      landed in v5.0 unusable for some users.
    
      Summary:
    
       - Compatibility fix for nvdimm-security implementations with a
         default zero-key.
    
       - Miscellaneous small fixes for out-of-bound accesses, cleanup after
         initialization failures, and missing debug messages"
    
    * tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
      tools/testing/nvdimm: Retain security state after overwrite
      libnvdimm/pmem: fix a possible OOB access when read and write pmem
      libnvdimm/security, acpi/nfit: unify zero-key for all security commands
      libnvdimm/security: provide fix for secure-erase to use zero-key
      libnvdimm/btt: Fix a kmemdup failure check
      libnvdimm/namespace: Fix a potential NULL pointer dereference
      acpi/nfit: Always dump _DSM output payload
  2. Merge tag 'fsdax-fix-5.1-rc6' of git://git.kernel.org/pub/scm/linux/k…

    torvalds committed Apr 15, 2019
    …ernel/git/nvdimm/nvdimm
    
    Pull fsdax fix from Dan Williams:
     "A single filesystem-dax fix. It has been lingering in -next for a long
      while and there are no other fsdax fixes on the horizon:
    
       - Avoid a crash scenario with architectures like powerpc that require
         'pgtable_deposit' for the zero page"
    
    * tag 'fsdax-fix-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
      fs/dax: Deposit pagetable even when installing zero page
  3. KVM: x86/mmu: Fix an inverted list_empty() check when zapping sptes

    sean-jc authored and bonzini committed Apr 13, 2019
    A recently introduced helper for handling zap vs. remote flush
    incorrectly bails early, effectively leaking defunct shadow pages.
    Manifests as a slab BUG when exiting KVM due to the shadow pages
    being alive when their associated cache is destroyed.
    
    ==========================================================================
    BUG kvm_mmu_page_header: Objects remaining in kvm_mmu_page_header on ...
    --------------------------------------------------------------------------
    Disabling lock debugging due to kernel taint
    INFO: Slab 0x00000000fc436387 objects=26 used=23 fp=0x00000000d023caee ...
    CPU: 6 PID: 4315 Comm: rmmod Tainted: G    B             5.1.0-rc2+ #19
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
    Call Trace:
     dump_stack+0x46/0x5b
     slab_err+0xad/0xd0
     ? on_each_cpu_mask+0x3c/0x50
     ? ksm_migrate_page+0x60/0x60
     ? on_each_cpu_cond_mask+0x7c/0xa0
     ? __kmalloc+0x1ca/0x1e0
     __kmem_cache_shutdown+0x13a/0x310
     shutdown_cache+0xf/0x130
     kmem_cache_destroy+0x1d5/0x200
     kvm_mmu_module_exit+0xa/0x30 [kvm]
     kvm_arch_exit+0x45/0x60 [kvm]
     kvm_exit+0x6f/0x80 [kvm]
     vmx_exit+0x1a/0x50 [kvm_intel]
     __x64_sys_delete_module+0x153/0x1f0
     ? exit_to_usermode_loop+0x88/0xc0
     do_syscall_64+0x4f/0x100
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    
    Fixes: a211363 ("KVM: x86/mmu: Split remote_flush+zap case out of kvm_mmu_flush_or_zap()")
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commits on Apr 14, 2019
  1. Linux 5.1-rc5

    torvalds committed Apr 14, 2019
  2. Merge branch 'page-refs' (page ref overflow)

    torvalds committed Apr 14, 2019
    Merge page ref overflow branch.
    
    Jann Horn reported that he can overflow the page ref count with
    sufficient memory (and a filesystem that is intentionally extremely
    slow).
    
    Admittedly it's not exactly easy.  To have more than four billion
    references to a page requires a minimum of 32GB of kernel memory just
    for the pointers to the pages, much less any metadata to keep track of
    those pointers.  Jann needed a total of 140GB of memory and a specially
    crafted filesystem that leaves all reads pending (in order to not ever
    free the page references and just keep adding more).
    
    Still, we have a fairly straightforward way to limit the two obvious
    user-controllable sources of page references: direct-IO like page
    references gotten through get_user_pages(), and the splice pipe page
    duplication.  So let's just do that.
    
    * branch page-refs:
      fs: prevent page refcount overflow in pipe_buf_get
      mm: prevent get_user_pages() from overflowing page refcount
      mm: add 'try_get_page()' helper function
      mm: make page ref count overflow check tighter and more explicit
  3. fs: prevent page refcount overflow in pipe_buf_get

    Matthew Wilcox authored and torvalds committed Apr 5, 2019
    Change pipe_buf_get() to return a bool indicating whether it succeeded
    in raising the refcount of the page (if the thing in the pipe is a page).
    This removes another mechanism for overflowing the page refcount.  All
    callers converted to handle a failure.
    
    Reported-by: Jann Horn <jannh@google.com>
    Signed-off-by: Matthew Wilcox <willy@infradead.org>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. mm: prevent get_user_pages() from overflowing page refcount

    torvalds committed Apr 11, 2019
    If the page refcount wraps around past zero, it will be freed while
    there are still four billion references to it.  One of the possible
    avenues for an attacker to try to make this happen is by doing direct IO
    on a page multiple times.  This patch makes get_user_pages() refuse to
    take a new page reference if there are already more than two billion
    references to the page.
    
    Reported-by: Jann Horn <jannh@google.com>
    Acked-by: Matthew Wilcox <willy@infradead.org>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. mm: add 'try_get_page()' helper function

    torvalds committed Apr 11, 2019
    This is the same as the traditional 'get_page()' function, but instead
    of unconditionally incrementing the reference count of the page, it only
    does so if the count was "safe".  It returns whether the reference count
    was incremented (and is marked __must_check, since the caller obviously
    has to be aware of it).
    
    Also like 'get_page()', you can't use this function unless you already
    had a reference to the page.  The intent is that you can use this
    exactly like get_page(), but in situations where you want to limit the
    maximum reference count.
    
    The code currently does an unconditional WARN_ON_ONCE() if we ever hit
    the reference count issues (either zero or negative), as a notification
    that the conditional non-increment actually happened.
    
    NOTE! The count access for the "safety" check is inherently racy, but
    that doesn't matter since the buffer we use is basically half the range
    of the reference count (ie we look at the sign of the count).
    
    Acked-by: Matthew Wilcox <willy@infradead.org>
    Cc: Jann Horn <jannh@google.com>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
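    A user-space sketch of the idea: take the reference only when the count is still "safe" (positive), and refuse once it is zero or has wrapped negative. The real helper works on struct page refcounts; this models just the sign check:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of try_get_page(): increment the count only if it is safe to
 * do so, and report whether the reference was actually taken.  A
 * count of zero or below (including one that overflowed into
 * negative territory) refuses the new reference. */
static bool try_get_ref(int *count)
{
    if (*count <= 0)
        return false;
    (*count)++;
    return true;
}
```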
  6. mm: make page ref count overflow check tighter and more explicit

    torvalds committed Apr 11, 2019
    We have a VM_BUG_ON() to check that the page reference count doesn't
    underflow (or get close to overflow) by checking the sign of the count.
    
    That's all fine, but we actually want to allow people to use a "get page
    ref unless it's already very high" helper function, and we want that one
    to use the sign of the page ref (without triggering this VM_BUG_ON).
    
    Change the VM_BUG_ON to only check for small underflows (or _very_ close
    to overflowing), and ignore overflows which have strayed into negative
    territory.
    
    Acked-by: Matthew Wilcox <willy@infradead.org>
    Cc: Jann Horn <jannh@google.com>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
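    The tightened check can be modeled as a window test on the sign of the count: flag a count of zero or a small underflow (equivalently, an unsigned count within a hair of wrapping), while tolerating counts that have strayed far into negative territory, which the try_get_page-style sign check handles. The window width here is illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the tightened ref count sanity check: with unsigned
 * arithmetic, (unsigned)count + 127 <= 127 is true exactly when the
 * signed count is in [-127, 0], i.e. zero or a small underflow.
 * Large negative counts are deliberately left unflagged. */
static bool ref_count_suspicious(int count)
{
    return (unsigned int)count + 127u <= 127u;
}
```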