Skip to content

[pull] master from torvalds:master#123

Merged
pull[bot] merged 71 commits intosandutsar:masterfrom
torvalds:master
Aug 21, 2021
Merged

[pull] master from torvalds:master#123
pull[bot] merged 71 commits intosandutsar:masterfrom
torvalds:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Aug 21, 2021

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

Rahul Tanwar and others added 30 commits July 23, 2021 16:05
Add Rahul Tanwar as maintainer for PCIe RC controller driver for the Intel
Lightning Mountain (LGM) Gateway SoC.

Link: https://lore.kernel.org/r/b3249e08155e04ac08d820be3b8da29a913c472a.1625559158.git.rtanwar@maxlinear.com
Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The Qualcomm SC8180x platform uses the qcom-cpufreq-hw driver, so
it in the cpufreq-dt-platdev driver's blocklist.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Currently, iommu_dma_alloc_noncontiguous() allocates a
struct dma_sgt_handle object to hold some state needed for
iommu_dma_free_noncontiguous().

However, the handle is neither freed nor returned explicitly by
the ->alloc_noncontiguous method, and therefore seems leaked.
This was found by code inspection, so please review carefully and test.

As a side note, it appears the struct dma_sgt_handle type is exposed
to users of the DMA-API by linux/dma-map-ops.h, but is has no users
or functions returning the type explicitly.

This may indicate it's a good idea to move the struct dma_sgt_handle type
to drivers/iommu/dma-iommu.c. The decision is left to maintainers :-)

Cc: stable@vger.kernel.org
Fixes: e817ee5 ("dma-iommu: implement ->alloc_noncontiguous")
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210723010552.50969-1-ezequiel@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If probe_device is failing, iommu_group is not initialized because
iommu_group_add_device is not reached, so freeing it will result
in NULL pointer access.

iommu_bus_init
  ->bus_iommu_probe
      ->probe_iommu_group in for each:/* return -22 in fail case */
          ->iommu_probe_device
              ->__iommu_probe_device       /* return -22 here.*/
                  -> ops->probe_device          /* return -22 here.*/
                  -> iommu_group_get_for_dev
                        -> ops->device_group
                        -> iommu_group_add_device //good case
  ->remove_iommu_group  //in fail case, it will remove group
     ->iommu_release_device
         ->iommu_group_remove_device // here we don't have group

In my case ops->probe_device (mtk_iommu_probe_device from
mtk_iommu_v1.c) is due to failing fwspec->ops mismatch.

Fixes: d72e31c ("iommu: IOMMU Groups")
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Link: https://lore.kernel.org/r/20210731074737.4573-1-linux@fw-web.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This WARN can be triggered per-core and the stack trace is not useful.
Replace it with plain dev_err(). Fix a comment while at it.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Stop the initialization when cpumask allocation failed and return an
error.

Fixes: 80a064d ("scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM")
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
When a Data CRC interrupt is received, the driver disables the DMA, then
sends the stop/abort command and then waits for Data Transfer Over.

However, sometimes, when a data CRC error is received in the middle of a
multi-block write transfer, the Data Transfer Over interrupt is never
received, and the driver hangs and never completes the request.

The driver sets the BMOD.SWR bit (SDMMC_IDMAC_SWRESET) when stopping the
DMA, but according to the manual CMD.STOP_ABORT_CMD should be programmed
"before assertion of SWR".  Do these operations in the recommended
order.  With this change the Data Transfer Over is always received
correctly in my tests.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Jaehoon Chung <jh80.chung@samsung.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210630102232.16011-1-vincent.whitchurch@axis.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
If the card has not been power cycled, it may still be using 1.8V
signaling. This situation is detected in mmc_sd_init_card function and
should be handled in mmci stm32 variant.  The host->pwr_reg variable is
also correctly protected with spin locks.

Fixes: 94b94a9 ("mmc: mmci_sdmmc: Implement signal voltage callbacks")
Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210701143353.13188-1-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Whenever SDHC run at clock rate 50MHZ or below, the hardware data
timeout value will be 21.47secs, which is approx. 22secs and we have
a current software timeout value as 10secs. We have to set software
timeout value more than the hardware data timeout value to avioid seeing
the below register dumps.

[  332.953670] mmc2: Timeout waiting for hardware interrupt.
[  332.959608] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[  332.966450] mmc2: sdhci: Sys addr:  0x00000000 | Version:  0x00007202
[  332.973256] mmc2: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000001
[  332.980054] mmc2: sdhci: Argument:  0x00000000 | Trn mode: 0x00000027
[  332.986864] mmc2: sdhci: Present:   0x01f801f6 | Host ctl: 0x0000001f
[  332.993671] mmc2: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[  333.000583] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
[  333.007386] mmc2: sdhci: Timeout:   0x0000000e | Int stat: 0x00000000
[  333.014182] mmc2: sdhci: Int enab:  0x03ff100b | Sig enab: 0x03ff100b
[  333.020976] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[  333.027771] mmc2: sdhci: Caps:      0x322dc8b2 | Caps_1:   0x0000808f
[  333.034561] mmc2: sdhci: Cmd:       0x0000183a | Max curr: 0x00000000
[  333.041359] mmc2: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
[  333.048157] mmc2: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[  333.054945] mmc2: sdhci: Host ctl2: 0x00000000
[  333.059657] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr:
0x0000000ffffff218
[  333.067178] mmc2: sdhci_msm: ----------- VENDOR REGISTER DUMP
-----------
[  333.074343] mmc2: sdhci_msm: DLL sts: 0x00000000 | DLL cfg:
0x6000642c | DLL cfg2: 0x0020a000
[  333.083417] mmc2: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl:
0x00000000 | DDR cfg: 0x80040873
[  333.092850] mmc2: sdhci_msm: Vndr func: 0x00008a9c | Vndr func2 :
0xf88218a8 Vndr func3: 0x02626040
[  333.102371] mmc2: sdhci: ============================================

So, set software timeout value more than hardware timeout value.

Signed-off-by: Shaik Sajida Bhanu <sbhanu@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1626435974-14462-1-git-send-email-sbhanu@codeaurora.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The Qualcomm sm8150 platform uses the qcom-cpufreq-hw driver, so
add it to the cpufreq-dt-platdev driver's blocklist.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS: when
the SOC boots, the WTMI firmware sets clocks and AVS values that work
correctly with 1.2 GHz CPU frequency, but random crashes occur once
cpufreq driver starts scaling.

We do not know currently what is the reason:
- it may be that the voltage value for L0 for 1.2 GHz variant provided
  by the vendor in the OTP is simply incorrect when scaling is used,
- it may be that some delay is needed somewhere,
- it may be something else.

The most sane solution now seems to be to simply forbid the cpufreq
driver on 1.2 GHz variant.

Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: 92ce45f ("cpufreq: Add DVFS support for Armada 37xx")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
commit a5b8ca9 ("arm64: do not descend to vdso directories twice")
changes the cleaning behavior of arm64's vdso files, in that vdso.lds,
vdso.so, and vdso.so.dbg are not removed upon a 'make clean/mrproper':

$ make defconfig ARCH=arm64
$ make ARCH=arm64
$ make mrproper ARCH=arm64
$ git clean -nxdf
Would remove arch/arm64/kernel/vdso/vdso.lds
Would remove arch/arm64/kernel/vdso/vdso.so
Would remove arch/arm64/kernel/vdso/vdso.so.dbg

To remedy this, manually descend into arch/arm64/kernel/vdso upon
cleaning.

After this commit:
$ make defconfig ARCH=arm64
$ make ARCH=arm64
$ make mrproper ARCH=arm64
$ git clean -nxdf
<empty>

Similar results are obtained for the vdso32 equivalent.

Signed-off-by: Andrew Delgadillo <adelg@google.com>
Cc: stable@vger.kernel.org
Fixes: a5b8ca9 ("arm64: do not descend to vdso directories twice")
Link: https://lore.kernel.org/r/20210810231755.1743524-1-adelg@google.com
Signed-off-by: Will Deacon <will@kernel.org>
To avoid the output width and height is incorrect,
AAL_OUTPUT_SIZE configuration should be set.

Fixes: 0664d13 ("drm/mediatek: Add AAL engine basic function")
Signed-off-by: jason-jh.lin <jason-jh.lin@mediatek.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Add component_del in OVL and COLOR remove function.

Fixes: ff13956 ("drm/mediatek: Move mtk_ddp_comp_init() from sub driver to DRM driver")
Signed-off-by: jason-jh.lin <jason-jh.lin@mediatek.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
The current_opp is released only when whole OPP table is released,
otherwise it's only marked as removed by dev_pm_opp_remove_table().
Functions like dev_pm_opp_put_clkname() and dev_pm_opp_put_supported_hw()
are checking whether OPP table is empty and it's not if current_opp is
set since it holds the refcount of OPP, this produces a noisy warning
from these functions about busy OPP table. Remove the checks to fix it.

Cc: stable@vger.kernel.org
Fixes: 81c4d8a ("opp: Keep track of currently programmed OPP")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
There is a known bug on BCM2711's SDHCI core integration where the
controller will hang when the difference between the core clock and the
bus clock is too great. Specifically this can be reproduced under the
following conditions:

- No SD card plugged in, polling thread is running, probing cards at
  100 kHz.
- BCM2711's core clock configured at 500MHz or more.

So set 200 kHz as the minimum clock frequency available for that board.

For more information on the issue see this:
https://lore.kernel.org/linux-mmc/20210322185816.27582-1-nsaenz@kernel.org/T/#m11f2783a09b581da6b8a15f302625b43a6ecdeca

Fixes: f84e411 ("mmc: sdhci-iproc: Add support for emmc2 of the BCM2711")
Signed-off-by: Nicolas Saenz Julienne <nsaenz@kernel.org>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1628334401-6577-5-git-send-email-stefan.wahren@i2se.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The controller doesn't seem to pick-up on clock changes, so set the
SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN flag to query the clock frequency
directly from the clock.

Fixes: f84e411 ("mmc: sdhci-iproc: Add support for emmc2 of the BCM2711")
Signed-off-by: Nicolas Saenz Julienne <nsaenz@kernel.org>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1628334401-6577-6-git-send-email-stefan.wahren@i2se.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In 69de442 ("drm/ttm: Initialize debugfs from
ttm_global_init()"), ttm_global_init was changed so that if creation
of the debugfs global root directory fails, ttm_global_init will bail
out early and return an error, leading to initialization failure of
DRM drivers. However, not every system will be using debugfs. On such
a system, debugfs directory creation can be expected to fail, but DRM
drivers must still be usable. This changes it so that if creation of
TTM's debugfs root directory fails, then no biggie: keep calm and
carry on.

Fixes: 69de442 ("drm/ttm: Initialize debugfs from ttm_global_init()")
Signed-off-by: Dan Moulding <dmoulding@me.com>
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210810195906.22220-2-dmoulding@me.com
Signed-off-by: Christian König <christian.koenig@amd.com>
…/git/vireshk/pm

Pull operating performance points (OPP) framework fixes for v5.14
from iresh Kumar:

"This removes few WARN() statements from the OPP core."

* 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  opp: Drop empty-table checks from _put functions
  opp: remove WARN when no valid OPPs remain
On the system PRMT table is not present, dmesg output:

	$ dmesg | grep PRM
	[    1.532237] ACPI: PRMT not present
	[    1.532237] PRM: found 4294967277 modules

The result of acpi_table_parse_entries need to be checked and return
immediately if PRMT table is not present or no PRM module found.

Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It was reported by a user with a Dell m15 R5 (5800H) that
the keyboard backlight was turning on when entering suspend
and turning off when exiting (the opposite of how it should be).

The user bisected it back to commit 5dbf509 ("ACPI: PM:
s2idle: Add support for new Microsoft UUID").  Previous to that
commit the LEDs didn't turn off at all.  Confirming in the spec,
these were reversed when introduced.

Fix them to match the spec.

BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1230#note_1021836
Fixes: 5dbf509 ("ACPI: PM: s2idle: Add support for new Microsoft UUID")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This reverts commit 0979d43.
Revert this because it does not apply to all the cards.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
change the workload type for some cards as it is needed.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KFDSVMRangeTest.SetGetAttributesTest randomly fails in stress test.

Note: Google Test filter = KFDSVMRangeTest.*
[==========] Running 18 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 18 tests from KFDSVMRangeTest
[ RUN      ] KFDSVMRangeTest.BasicSystemMemTest
[       OK ] KFDSVMRangeTest.BasicSystemMemTest (30 ms)
[ RUN      ] KFDSVMRangeTest.SetGetAttributesTest
[          ] Get default atrributes
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:152: Failure
Value of: expectedDefaultResults[i]
  Actual: 4
Expected: outputAttributes[i].type
Which is: 2
[          ] Setting/Getting atrributes
[  FAILED  ]

the root cause is that svm work queue has not finished when svm_range_get_attr is called, thus
some garbage svm interval tree data make svm_range_get_attr get wrong result. Flush work queue before
iterate svm interval tree.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
DM initializes VM context after DMCUB initialization.
This results in loss of DCN_VM_CONTEXT registers after z10.

[How]
Notify DMCUB when VM setup is complete, and have DMCUB
save init registers.

v2: squash in CONFIG_DRM_AMD_DC_DCN3_1 fix

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Jake Wang <haonan.wang2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Userspace should get back a copy of drm_wait_vblank that's been modified
even when drm_wait_vblank_ioctl returns a failure.

Rationale:
drm_wait_vblank_ioctl modifies the request and expects the user to read
it back. When the type is RELATIVE, it modifies it to ABSOLUTE and updates
the sequence to become current_vblank_count + sequence (which was
RELATIVE), but now it became ABSOLUTE.
drmWaitVBlank (in libdrm) expects this to be the case as it modifies
the request to be Absolute so it expects the sequence to would have been
updated.

The change is in compat_drm_wait_vblank, which is called by
drm_compat_ioctl. This change of copying the data back regardless of the
return number makes it en par with drm_ioctl, which always copies the
data before returning.

[How]
Return from the function after everything has been copied to user.

Fixes IGT:kms_flip::modeset-vs-vblank-race-interruptible
Tested on ChromeOS Trogdor(msm)

Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Mark Yacoub <markyacoub@chromium.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210812194917.1703356-1-markyacoub@chromium.org
…x/kernel/git/vireshk/pm

Pull ARM cpufreq fixes for v5.14 from Viresh Kumar:

"This contains:

 - Addition of SoCs to blocklist for cpufreq-dt driver (Bjorn Andersson
   and Thara Gopinath).

 - Fix error path for scmi driver (Lukasz Luba).

 - Temporarily disable highest frequency for armada, its unsafe and
   breaks stuff."

* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
  cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
  cpufreq: arm_scmi: Fix error path when allocation failed
  cpufreq: blacklist Qualcomm sc8180x in cpufreq-dt-platdev
Still no GA106 as I don't have HW to verif.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
When booted with multiple displays attached, the EFI GOP driver on (at
least) Ampere, can leave DP links powered up that aren't being used to
display anything.  This confuses our tracking of SOR routing, with the
likely result being a failed modeset and display engine hang.

Fix this by (ab?)using the DisableLT IED script to power-down the link,
restoring HW to a state the driver expects.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
…ences

Should fix some initial modeset failures on (at least) Ampere boards.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
tiwai and others added 27 commits August 19, 2021 17:57
The commit 2e6b836 ("ASoC: intel: atom: Fix reference to PCM
buffer address") changed the reference of PCM buffer address to
substream->runtime->dma_addr as the buffer address may change
dynamically.  However, I forgot that the dma_addr field is still not
set up for the CONTINUOUS buffer type (that this driver uses) yet in
5.14 and earlier kernels, and it resulted in garbage I/O.  The problem
will be fixed in 5.15, but we need to address it quickly for now.

The fix is to deduce the address again from the DMA pointer with
virt_to_phys(), but from the right one, substream->runtime->dma_area.

Fixes: 2e6b836 ("ASoC: intel: atom: Fix reference to PCM buffer address")
Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com>
Cc: <stable@vger.kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/2048c6aa-2187-46bd-6772-36a4fb3c5aeb@redhat.com
Link: https://lore.kernel.org/r/20210819152945.8510-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
…g/drm/drm-misc into drm-fixes

Short summary of fixes pull:

 * UAPI: Return results for failed drm_wait_vblank_ioctl()
 * ttm: Fix debugfs initialization

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YR1c7cG1IaL+g8EN@linux-uq9g.fritz.box
…fixes

- Ampere display fixes
- Fix longstanding MM race issue by removing unused code.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv5jtUFkHsGe-pf-=RceDOgKygjPnCi=6d5vCLM_f5aeMQ@mail.gmail.com
…rg/drm/drm-intel into drm-fixes

- Expand a tweaked display workaround for all PCHs. (Anshuman)
- Fix eDP MSO pipe sanity checks for ADL-P. (Jani)
- Remove superfluous EXPORT_SYMBOL(). (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YR137zkSAIbun1Ed@intel.com
…cm/linux/kernel/git/chunkuang.hu/linux into drm-fixes

Mediatek DRM Fixes for Linux 5.14-2

1. Fix AAL output size setting.
2. Delete component in remove function.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210819001635.14803-1-chunkuang.hu@kernel.org
…op.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.14-2021-08-18:

amdgpu:
- vega10 SMU workload fix
- DCN VM fix
- DCN 3.01 watermark fix

amdkfd:
- SVM fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210818225137.4070-1-alexander.deucher@amd.com
Due to the change about how block layer detects congestion the
justification of commit 8fd2e0b ("mm: swap: check if swap backing
device is congested or not") doesn't stand anymore, so the commit could
be just reverted in order to solve the race reported by commit
2efa33f ("mm/shmem: fix shmem_swapin() race with swapoff"), so the
fix commit could be just reverted as well.

And that fix is also kind of buggy as discussed by [1] and [2].

[1] https://lore.kernel.org/linux-mm/24187e5e-069-9f3f-cefe-39ac70783753@google.com/
[2] https://lore.kernel.org/linux-mm/e82380b9-3ad4-4a52-be50-6d45c7f2b5da@google.com/

Link: https://lkml.kernel.org/r/20210810202936.2672-2-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to the change about how block layer detects congestion the
justification of commit 8fd2e0b ("mm: swap: check if swap backing
device is congested or not") doesn't stand anymore, so the commit could
be just reverted in order to solve the race reported by commit
2efa33f ("mm/shmem: fix shmem_swapin() race with swapoff").  The
fix was reverted by the previous patch.

Link: https://lkml.kernel.org/r/20210810202936.2672-3-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When placing pages on a pcp list, migratetype values over
MIGRATE_PCPTYPES get added to the MIGRATE_MOVABLE pcp list.

However, the actual migratetype is preserved in the page and should
not be changed to MIGRATE_MOVABLE or the page may end up on the wrong
free_list.

The impact is that HIGHATOMIC or CMA pages getting bulk freed from the
PCP lists could potentially end up on the wrong buddy list.  There are
various consequences but minimally NR_FREE_CMA_PAGES accounting could
get screwed up.

[mgorman@techsingularity.net: changelog update]

Link: https://lkml.kernel.org/r/20210811182917.2607994-1-opendmb@gmail.com
Fixes: df1acc8 ("mm/page_alloc: avoid conflating IRQs disabled with zone->lock")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
printk("%pGg") outputs these two flags as hexadecimal number, rather
than as a string, e.g:

	GFP_KERNEL|0x1800000

Fix this by adding missing names of __GFP_ZEROTAGS and
__GFP_SKIP_KASAN_POISON flags to __def_gfpflag_names.

Link: https://lkml.kernel.org/r/20210816133502.590-1-rppt@kernel.org
Fixes: 013bb59 ("arm64: mte: handle tags zeroing at page allocation time")
Fixes: c275c5c ("kasan: disable freed user page poisoning with HW tags")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Everyone has moved from Freenode to Libera so updated the channel entry
for MAINTAINERS.

Link: ClangBuiltLinux#1402
Link: https://lkml.kernel.org/r/20210818022339.3863058-1-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…claim

We've noticed occasional OOM killing when memory.low settings are in
effect for cgroups.  This is unexpected and undesirable as memory.low is
supposed to express non-OOMing memory priorities between cgroups.

The reason for this is proportional memory.low reclaim.  When cgroups
are below their memory.low threshold, reclaim passes them over in the
first round, and then retries if it couldn't find pages anywhere else.
But when cgroups are slightly above their memory.low setting, page scan
force is scaled down and diminished in proportion to the overage, to the
point where it can cause reclaim to fail as well - only in that case we
currently don't retry, and instead trigger OOM.

To fix this, hook proportional reclaim into the same retry logic we have
in place for when cgroups are skipped entirely.  This way if reclaim
fails and some cgroups were scanned with diminished pressure, we'll try
another full-force cycle before giving up and OOMing.

[akpm@linux-foundation.org: coding-style fixes]

Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org
Fixes: 9783aa9 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Leon Yang <lnyng@fb.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>		[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HWPoisonHandlable() sometimes returns false for typical user pages due
to races with average memory events like transfers over LRU lists.  This
causes failures in hwpoison handling.

There's retry code for such a case but does not work because the retry
loop reaches the retry limit too quickly before the page settles down to
handlable state.  Let get_any_page() call shake_page() to fix it.

[naoya.horiguchi@nec.com: get_any_page(): return -EIO when retry limit reached]
  Link: https://lkml.kernel.org/r/20210819001958.2365157-1-naoya.horiguchi@linux.dev

Link: https://lkml.kernel.org/r/20210817053703.2267588-1-naoya.horiguchi@linux.dev
Fixes: 25182f0 ("mm,hwpoison: fix race with hugetlb page allocation")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>		[5.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In a debugging session the other day, Rik noticed that node_reclaim()
was missing memstall annotations.  This means we'll miss pressure and
lost productivity resulting from reclaim on an overloaded local NUMA
node when vm.zone_reclaim_mode is enabled.

There haven't been any reports, but that's likely because
vm.zone_reclaim_mode hasn't been a commonly used feature recently, and
the intersection between such setups and psi users is probably nil.

But secondary memory such as CXL-connected DIMMS, persistent memory etc,
and the page demotion patches that handle them
(https://lore.kernel.org/lkml/20210401183216.443C4443@viggo.jf.intel.com/)
could soon make this a more common codepath again.

Link: https://lkml.kernel.org/r/20210818152457.35846-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally the addr != NULL check was meant to take care of the case
where __kfence_pool == NULL (KFENCE is disabled).  However, this does
not work for addresses where addr > 0 && addr < KFENCE_POOL_SIZE.

This can be the case on NULL-deref where addr > 0 && addr < PAGE_SIZE or
any other faulting access with addr < KFENCE_POOL_SIZE.  While the
kernel would likely crash, the stack traces and report might be
confusing due to double faults upon KFENCE's attempt to unprotect such
an address.

Fix it by just checking that __kfence_pool != NULL instead.

Link: https://lkml.kernel.org/r/20210818130300.2482437-1-elver@google.com
Fixes: 0ce20dd ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>    [5.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
syzbot hit kernel BUG at fs/hugetlbfs/inode.c:532 as described in [1].
This BUG triggers if the HPageRestoreReserve flag is set on a page in
the page cache.  It should never be set, as the routine
huge_add_to_page_cache explicitly clears the flag after adding a page to
the cache.

The only code other than huge page allocation which sets the flag is
restore_reserve_on_error.  It will potentially set the flag in rare out
of memory conditions.  syzbot was injecting errors to cause memory
allocation errors which exercised this specific path.

The code in restore_reserve_on_error is doing the right thing.  However,
there are instances where pages in the page cache were being passed to
restore_reserve_on_error.  This is incorrect, as once a page goes into
the cache reservation information will not be modified for the page
until it is removed from the cache.  Error paths do not remove pages
from the cache, so even in the case of error, the page will remain in
the cache and no reservation adjustment is needed.

Modify routines that potentially call restore_reserve_on_error with a
page cache page to no longer do so.

Note on fixes tag: Prior to commit 846be08 ("mm/hugetlb: expand
restore_reserve_on_error functionality") the routine would not process
page cache pages because the HPageRestoreReserve flag is not set on such
pages.  Therefore, this issue could not be trigggered.  The code added
by commit 846be08 ("mm/hugetlb: expand restore_reserve_on_error
functionality") is needed and correct.  It exposed incorrect calls to
restore_reserve_on_error which is the root cause addressed by this
commit.

[1] https://lore.kernel.org/linux-mm/00000000000050776d05c9b7c7f0@google.com/

Link: https://lkml.kernel.org/r/20210818213304.37038-1-mike.kravetz@oracle.com
Fixes: 846be08 ("mm/hugetlb: expand restore_reserve_on_error functionality")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: <syzbot+67654e51e54455f1c585@syzkaller.appspotmail.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pm-opp:
  opp: Drop empty-table checks from _put functions
  opp: remove WARN when no valid OPPs remain
…ux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Fix for a potential NULL-ptr dereference in IOMMU core code

 - Two resource leak fixes

 - Cache flush fix in the Intel VT-d driver

* tag 'iommu-fixes-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry()
  iommu/vt-d: Fix PASID reference leak
  iommu: Check if group is NULL before remove device
  iommu/dma: Fix leak in non-contiguous API
* acpi-pm:
  ACPI: PM: s2idle: Invert Microsoft UUID entry and exit
…git/arm64/linux

Pull arm64 fixes from Will Deacon:

 - Fix cleaning of vDSO directories

 - Ensure CNTHCTL_EL2 is fully initialised when booting at EL2

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: initialize all of CNTHCTL_EL2
  arm64: clean vdso & vdso32 files
…rnel/git/tiwai/sound

Pull more sound fixes from Takashi Iwai:
 "This is a quick follow up for 5.14: a fix for a very recently
  introduced regression on ASoC Intel Atom driver, and another trivial
  HD-audio quirk for HP laptops"

* tag 'sound-5.14-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: intel: atom: Fix breakage for PCM buffer address setup
  ALSA: hda/realtek: Limit mic boost on HP ProBook 445 G8
…l/git/ulfh/mmc

Pull MMC host fixes from Ulf Hansson:

 - dw_mmc: Fix hang on data CRC error

 - mmci: Fix voltage switch procedure for the stm32 variant

 - sdhci-iproc: Fix some clock issues for BCM2711

 - sdhci-msm: Fixup software timeout value

* tag 'mmc-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711
  mmc: sdhci-iproc: Cap min clock frequency on BCM2711
  mmc: sdhci-msm: Update the software timeout value for sdhc
  mmc: mmci: stm32: Check when the voltage switch procedure should be done
  mmc: dw_mmc: Fix hang on data CRC error
…ernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Add Rahul Tanwar as Intel LGM Gateway PCIe maintainer (Rahul Tanwar)

 - Add Jim Quinlan et al as Broadcom STB PCIe maintainers (Jim Quinlan)

 - Increase D3hot-to-D0 delay for AMD Renoir/Cezanne XHCI (Marcin
   Bachry)

 - Correct iomem_get_mapping() usage for legacy_mem sysfs (Krzysztof
   Wilczyński)

* tag 'pci-v5.14-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/sysfs: Use correct variable for the legacy_mem sysfs object
  PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI
  MAINTAINERS: Add Jim Quinlan et al as Broadcom STB PCIe maintainers
  MAINTAINERS: Add Rahul Tanwar as Intel LGM Gateway PCIe maintainer
…rm/drm

Pull drm fixes from Dave Airlie:
 "Regularly scheduled fixes. The ttm one solves a problem of GPU drivers
  failing to load if debugfs is off in Kconfig, otherwise the i915 and
  mediatek, and amdgpu fixes all fairly normal.

  Nouveau has a couple of display fixes, but it has a fix for a
  longstanding race condition in it's memory manager code, and the fix
  mostly removes some code that wasn't working properly and has no
  userspace users. This fix makes the diffstat kinda larger but in a
  good (negative line-count) way.

  core:
   - fix drm_wait_vblank uapi copying bug

  ttm:
   - fix debugfs init when debugfs is off

  amdgpu:
   - vega10 SMU workload fix
   - DCN VM fix
   - DCN 3.01 watermark fix

  amdkfd:
   - SVM fix

  nouveau:
   - ampere display fixes
   - remove MM misfeature to fix a longstanding race condition

  i915:
   - tweaked display workaround for all PCHs
   - eDP MSO pipe sanity for ADL-P fix
   - remove unused symbol export

  mediatek:
   - AAL output size setting
   - Delete component in remove function"

* tag 'drm-fixes-2021-08-20-3' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Use DCN30 watermark calc for DCN301
  drm/i915/dp: remove superfluous EXPORT_SYMBOL()
  drm/i915/edp: fix eDP MSO pipe sanity checks for ADL-P
  drm/i915: Tweaked Wa_14010685332 for all PCHs
  drm/nouveau: rip out nvkm_client.super
  drm/nouveau: block a bunch of classes from userspace
  drm/nouveau/fifo/nv50-: rip out dma channels
  drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences
  drm/nouveau/disp: power down unused DP links during init
  drm/nouveau: recognise GA107
  drm: Copy drm_wait_vblank to user before returning
  drm/amd/display: Ensure DCN save after VM setup
  drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure
  drm/amd/pm: change the workload type for some cards
  Revert "drm/amd/pm: fix workload mismatch on vega10"
  drm: ttm: Don't bail from ttm_global_init if debugfs_create_dir fails
  drm/mediatek: Add component_del in OVL and COLOR remove function
  drm/mediatek: Add AAL output size configuration
Merge misc fixes from Andrew Morton:
 "10 patches.

  Subsystems affected by this patch series: MAINTAINERS and mm (shmem,
  pagealloc, tracing, memcg, memory-failure, vmscan, kfence, and
  hugetlb)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  hugetlb: don't pass page cache pages to restore_reserve_on_error
  kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE
  mm: vmscan: fix missing psi annotation for node_reclaim()
  mm/hwpoison: retry with shake_page() for unhandlable pages
  mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
  MAINTAINERS: update ClangBuiltLinux IRC chat
  mmflags.h: add missing __GFP_ZEROTAGS and __GFP_SKIP_KASAN_POISON names
  mm/page_alloc: don't corrupt pcppage_migratetype
  Revert "mm: swap: check if swap backing device is congested or not"
  Revert "mm/shmem: fix shmem_swapin() race with swapoff"
…git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix some issues in the ARM cpufreq drivers and in the operating
  performance points (OPP) framework.

  Specifics:

   - Fix useless WARN() in the OPP core and prevent a noisy warning
     from being printed by OPP _put functions (Dmitry Osipenko).

   - Fix error path when allocation failed in the arm_scmi cpufreq
     driver (Lukasz Luba).

   - Blacklist Qualcomm sc8180x and Qualcomm sm8150 in
     cpufreq-dt-platdev (Bjorn Andersson, Thara Gopinath).

   - Forbid cpufreq for 1.2 GHz variant in the armada-37xx cpufreq
     driver (Marek Behún)"

* tag 'pm-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  opp: Drop empty-table checks from _put functions
  cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
  cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
  cpufreq: arm_scmi: Fix error path when allocation failed
  opp: remove WARN when no valid OPPs remain
  cpufreq: blacklist Qualcomm sc8180x in cpufreq-dt-platdev
…l/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix two mistakes in new code.

  Specifics:

   - Prevent confusing messages from being printed if the PRMT table is
     not present or there are no PRM modules (Aubrey Li).

   - Fix the handling of suspend-to-idle entry and exit in the case when
     the Microsoft UUID is used with the Low-Power S0 Idle _DSM
     interface (Mario Limonciello)"

* tag 'acpi-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PM: s2idle: Invert Microsoft UUID entry and exit
  ACPI: PRM: Deal with table not present or no module found
@pull pull bot added the ⤵️ pull label Aug 21, 2021
@pull pull bot merged commit fa54d36 into sandutsar:master Aug 21, 2021
pull bot pushed a commit that referenced this pull request Jan 12, 2022
We got issue as follows when run syzkaller:
[  167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
[  167.938306] EXT4-fs (loop0): Remounting filesystem read-only
[  167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
[  167.983601] ------------[ cut here ]------------
[  167.984245] kernel BUG at fs/ext4/inode.c:847!
[  167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G    B             5.16.0-rc5-next-20211217+ #123
[  167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
[  167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
[  167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
[  167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
[  167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
[  167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
[  167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
[  167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
[  167.997158] FS:  00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
[  167.998238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
[  167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  168.001899] Call Trace:
[  168.002235]  <TASK>
[  168.007167]  ext4_bread+0xd/0x53
[  168.007612]  ext4_quota_write+0x20c/0x5c0
[  168.010457]  write_blk+0x100/0x220
[  168.010944]  remove_free_dqentry+0x1c6/0x440
[  168.011525]  free_dqentry.isra.0+0x565/0x830
[  168.012133]  remove_tree+0x318/0x6d0
[  168.014744]  remove_tree+0x1eb/0x6d0
[  168.017346]  remove_tree+0x1eb/0x6d0
[  168.019969]  remove_tree+0x1eb/0x6d0
[  168.022128]  qtree_release_dquot+0x291/0x340
[  168.023297]  v2_release_dquot+0xce/0x120
[  168.023847]  dquot_release+0x197/0x3e0
[  168.024358]  ext4_release_dquot+0x22a/0x2d0
[  168.024932]  dqput.part.0+0x1c9/0x900
[  168.025430]  __dquot_drop+0x120/0x190
[  168.025942]  ext4_clear_inode+0x86/0x220
[  168.026472]  ext4_evict_inode+0x9e8/0xa22
[  168.028200]  evict+0x29e/0x4f0
[  168.028625]  dispose_list+0x102/0x1f0
[  168.029148]  evict_inodes+0x2c1/0x3e0
[  168.030188]  generic_shutdown_super+0xa4/0x3b0
[  168.030817]  kill_block_super+0x95/0xd0
[  168.031360]  deactivate_locked_super+0x85/0xd0
[  168.031977]  cleanup_mnt+0x2bc/0x480
[  168.033062]  task_work_run+0xd1/0x170
[  168.033565]  do_exit+0xa4f/0x2b50
[  168.037155]  do_group_exit+0xef/0x2d0
[  168.037666]  __x64_sys_exit_group+0x3a/0x50
[  168.038237]  do_syscall_64+0x3b/0x90
[  168.038751]  entry_SYSCALL_64_after_hwframe+0x44/0xae

In order to reproduce this problem, the following conditions need to be met:
1. Ext4 filesystem with no journal;
2. Filesystem image with incorrect quota data;
3. Abort filesystem forced by user;
4. umount filesystem;

As in ext4_quota_write:
...
         if (EXT4_SB(sb)->s_journal && !handle) {
                 ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
                         " cancelled because transaction is not started",
                         (unsigned long long)off, (unsigned long long)len);
                 return -EIO;
         }
...
We only check handle if NULL when filesystem has journal. There is need
check handle if NULL even when filesystem has no journal.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.