Skip to content

[pull] master from torvalds:master#1052

Merged
pull[bot] merged 3860 commits intojamlee-t:masterfrom
torvalds:master
Oct 31, 2023
Merged

[pull] master from torvalds:master#1052
pull[bot] merged 3860 commits intojamlee-t:masterfrom
torvalds:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Oct 31, 2023

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

Kent Overstreet and others added 30 commits October 22, 2023 17:10
We were failing to upgrade to the latest compatible version - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now we also print the open_buckets owned by each write_point - this is
to help with debugging a shutdown hang.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This will be used when we need to re-hash a directory tree when setting
flags.

It is not possible to have concurrent btree_trans on a thread.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Six locks do lock handoff via the wakeup path: the thread doing the
wakeup also takes the lock on behalf of the waiter, which means the
waiter only has to look at its waitlist entry, and doesn't have to touch
the lock cacheline while another thread is using it.

Linus noticed that this needs a real barrier, which this patch fixes.

Also add a comment for the should_sleep_fn() error path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: linux-bcachefs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're hunting for an open_bucket leak, add an assertion to help track it
down: also, we can't use the bch_fs after dropping our write ref to it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A nice cleanup that avoids a bunch of open-coding name/string usage
around dirent usage.

Will be used by casefolding impl in future commits.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Avoids doing a full strnlen for getting the length of the name of a
dirent entry.

Given the fact that the name of dirents is stored at the end of the
bkey's value, and we know the length of that in u64s, we can find the
last u64 and figure out how many NUL bytes are at the end of the string.

On little endian systems this ends up being the leading zeros of the
last u64, whereas on big endian systems this ends up being the trailing
zeros of the last u64.
We can take that value in bits and divide it by 8 to get the number of
NUL bytes at the end.

There is no endian-fixup or other compatibility here as this is string
data interpreted as a u64.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
To ensure we aren't shooting ourselves in the foot after merge for
potentially doing future revisions for dirent or for storing multiple
names for casefolding, limit this to 512 for now.

Previously this define was linked to the max size a d_name in
bch_dirent could be.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes the device removal tests, which have been failing at random
due to the fact that when we're running the .key_invalid checks in the
write path the key may actually no longer exist - we might be racing
with the keys being deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug in the cycle detector, bch2_check_for_deadlock() - we
have to make sure the node pointers in the btree paths array are set to
something not-garbage before another thread may see them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
 - There was no need for a retry loop in bch2_extent_fallocate(); if we
   have to retry we may be overwriting something different and we need
   to return an error and let the caller retry.
 - The bch2_alloc_sectors_start() error path was wrong, and wasn't
   running our cleanup at the end of the function

This also fixes a very rare open bucket leak due to the missing cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
For extents, we increase the number of bits of the size field to allow
extents to get bigger due to merging - but this code didn't check for
overflow.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The folio_hole_offset() helper returns a mix of bool and int types.
The latter is to support a possible -EAGAIN error code when using
nonblocking locks. This is not only confusing, but the only caller
also essentially ignores errors outside of stopping the range
iteration. This means an -EAGAIN error can't return directly from
folio_hole_offset() and may be lost via bch2_clamp_data_hole().

Fix up the error handling and make it more readable.
__filemap_get_folio() returns -ENOENT instead of NULL when no folio
exists, so reuse the same error code in folio_hole_offset(). Fix up
bch2_seek_pagecache_hole() to return the current offset on -ENOENT,
but otherwise return unexpected error code up to the caller.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In __bch2_buffered_write, if we fail to write to an entire !uptodate
folio, we have to back out the write, bail out and retry.

But we were missing an iov_iter_revert() call, so the data written to
the folio was lost and the rest of the write shifted to the wrong
offset.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
subvolume.c has gotten a bit large, this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes koverstreet/bcachefs-tools#159

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In koverstreet/bcachefs#450, we're seeing
unexplained btree_path_relock_fail events - according to the information
currently in the tracepoint, it appears the relock should be succeeding.

This adds lock counts to the tracepoint to help track it down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the bch2_mount() error path, we were calling
deactivate_locked_super(), which calls ->kill_sb(), which in our case
was calling bch2_fs_free() without __bch2_fs_stop().

This changes bch2_mount() to just call bch2_fs_stop() directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The is_ancestor bitmap is at optimization for bch2_snapshot_is_ancestor;
once we get sufficiently close to the ancestor ID we're searching for we
test a bitmap.

But initialization of the is_ancestor bitmap was broken; we do it by
using bch2_snapshot_parent(), but we call that on nodes that haven't
been initialized yet with bch2_mark_snapshot().

Fix this by adding a separate loop in bch2_snapshots_read() for
initializing the is_ancestor bitmap, and also add some new debug asserts
for checking this sort of breakage in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
After deleteing snapshots, we may be left with a snapshot tree where
some nodes only have one child, and we have a linear chain.

Interior snapshot nodes are never used directly (i.e. they never have
subvolumes that point to them), they are only referered to by child
snapshot nodes - hence, they are redundant.

The existing code talks about redundant snapshot nodes as forming and
equivalence class; i.e. nodes for which snapshot_t->equiv is equal. In a
given equivalence class, we only ever need a single key at a given
position - i.e. multiple versions with different snapshot fields are
redundant.

The existing snapshot cleanup code deletes these redundant keys, but not
redundant nodes. It turns out this is buggy, because we assume that
after snapshot deletion finishes we should only have a single key per
equivalence class, but the btree update path doesn't preserve this -
overwriting keys in old snapshots doesn't check for the equivalence
class being equal, and thus we can end up with duplicate keys in the
same equivalence class and fsck complaining about snapshot deletion not
having run correctly.

The equivalence class notion has been leaking out of the core snapshots
code and into too much other code, i.e. fsck, so this patch takes a
different approach: snapshot deletion now moves keys to the node in an
equivalence class being kept (the leafiest node) and then deletes the
redundant nodes in the equivalance class.

Some work has to be done to correctly delete interior snapshot nodes;
snapshot node depth and skiplist fields for descendent nodes have to be
fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If fsck finds a key that needs work done, the primary example being an
unlinked inode that needs to be deleted, and the key is in an internal
snapshot node, we have a bit of a conundrum.

The conundrum is that internal snapshot nodes are shared, and we in
general do updates in internal snapshot nodes because there may be
overwrites in some snapshots and not others, and this may affect other
keys referenced by this key (i.e. extents).

For example, we might be seeing an unlinked inode in an internal
snapshot node, but then in one child snapshot the inode might have been
reattached and might not be unlinked. Deleting the inode in the internal
snapshot node would be wrong, because then we'll delete all the extents
that the child snapshot references.

But if an unlinked inode does not have any overwrites in child
snapshots, we're fine: the inode is overwrritten in all child snapshots,
so we can do the deletion at the point of comonality in the snapshot
tree, i.e. the node where we found it.

This patch adds a new helper, bch2_propagate_key_to_snapshot_leaves(),
to handle the case where we need a to update a key that does have
overwrites in child snapshots: we copy the key to leaf snapshot nodes,
and then rewind fsck and process the needed updates there.

With this, fsck can now always correctly handle unlinked inodes found in
internal snapshot nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
…kernel/git/tip/tip

Pull x86 assembly code updates from Ingo Molnar:

 - Micro-optimize the x86 bitops code

 - Define target-specific {raw,this}_cpu_try_cmpxchg{64,128}() to
   improve code generation

 - Define and use raw_cpu_try_cmpxchg() preempt_count_set()

 - Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op

 - Remove the unused __sw_hweight64() implementation on x86-32

 - Misc fixes and cleanups

* tag 'x86-asm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/lib: Address kernel-doc warnings
  x86/entry: Fix typos in comments
  x86/entry: Remove unused argument %rsi passed to exc_nmi()
  x86/bitops: Remove unused __sw_hweight64() assembly implementation on x86-32
  x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op
  x86/percpu: Use raw_cpu_try_cmpxchg() in preempt_count_set()
  x86/percpu: Define raw_cpu_try_cmpxchg and this_cpu_try_cmpxchg()
  x86/percpu: Define {raw,this}_cpu_try_cmpxchg{64,128}
  x86/asm/bitops: Use __builtin_clz{l|ll} to evaluate constant expressions
…x/kernel/git/tip/tip

Pull x86 entry updates from Ingo Molnar:

 - Make IA32_EMULATION boot time configurable with
   the new ia32_emulation=<bool> boot option

 - Clean up fast syscall return validation code: convert
   it to C and refactor the code

 - As part of this, optimize the canonical RIP test code

* tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/32: Clean up syscall fast exit tests
  x86/entry/64: Use TASK_SIZE_MAX for canonical RIP test
  x86/entry/64: Convert SYSRET validation tests to C
  x86/entry/32: Remove SEP test for SYSEXIT
  x86/entry/32: Convert do_fast_syscall_32() to bool return type
  x86/entry/compat: Combine return value test from syscall handler
  x86/entry/64: Remove obsolete comment on tracing vs. SYSRET
  x86: Make IA32_EMULATION boot time configurable
  x86/entry: Make IA32 syscalls' availability depend on ia32_enabled()
  x86/elf: Make loading of 32bit processes depend on ia32_enabled()
  x86/entry: Compile entry_SYSCALL32_ignore() unconditionally
  x86/entry: Rename ignore_sysret()
  x86: Introduce ia32_enabled()
…kernel/git/tip/tip

Pull x86 irq fix from Ingo Molnar:
 "Fix out-of-order NMI nesting checks resulting in false positive
  warnings"

* tag 'x86-irq-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Fix out-of-order NMI nesting checks & false positive warning
…ernel/git/tip/tip

Pull x86 mm handling updates from Ingo Molnar:

 - Add new NX-stack self-test

 - Improve NUMA partial-CFMWS handling

 - Fix #VC handler bugs resulting in SEV-SNP boot failures

 - Drop the 4MB memory size restriction on minimal NUMA nodes

 - Reorganize headers a bit, in preparation to header dependency
   reduction efforts

 - Misc cleanups & fixes

* tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
  selftests/x86/lam: Zero out buffer for readlink()
  x86/sev: Drop unneeded #include
  x86/sev: Move sev_setup_arch() to mem_encrypt.c
  x86/tdx: Replace deprecated strncpy() with strtomem_pad()
  selftests/x86/mm: Add new test that userspace stack is in fact NX
  x86/sev: Make boot_ghcb_page[] static
  x86/boot: Move x86_cache_alignment initialization to correct spot
  x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach
  x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot
  x86_64: Show CR4.PSE on auxiliaries like on BSP
  x86/iommu/docs: Update AMD IOMMU specification document URL
  x86/sev/docs: Update document URL in amd-memory-encryption.rst
  x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h>
  ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window
  x86/numa: Introduce numa_fill_memblks()
…x/kernel/git/tip/tip

Pull x86 build update from Ingo Molnar:
 "Enable CONFIG_DEBUG_ENTRY=y in the x86 defconfigs"

* tag 'x86-build-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/defconfig: Enable CONFIG_DEBUG_ENTRY=y
…inux/kernel/git/tip/tip

Pull core updates from Thomas Gleixner:
 "Two small updates to ptrace_stop():

   - Add a comment to explain that the preempt_disable() before
     unlocking tasklist lock is not a correctness problem and just
     avoids the tracer to preempt the tracee before the tracee schedules
     out.

   - Make that preempt_disable() conditional on PREEMPT_RT=n.

     RT enabled kernels cannot disable preemption at this point because
     cgroup_enter_frozen() and sched_submit_work() acquire spinlocks or
     rwlocks which are substituted by sleeping locks on RT. Acquiring a
     sleeping lock in a preemption disable region is obviously not
     possible.

     This obviously brings back the potential slowdown of ptrace() for
     RT enabled kernels, but that's a price to be paid for latency
     guarantees"

* tag 'core-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT
  signal: Add a proper comment about preempt_disable() in ptrace_stop()
…nux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Core:

   - Exclude managed interrupts in the calculation of interrupts which
     are targeted to a CPU which is about to be offlined to ensure that
     there are enough free vectors on the still online CPUs to migrate
     them over.

     Managed interrupts do not need to be accounted because they are
     either shut down on offline or migrated to an already reserved and
     guaranteed slot on a still online CPU in the interrupts affinity
     mask.

     Including managed interrupts is overaccounting and can result in
     needlessly aborting hibernation on large server machines.

   - The usual set of small improvements

  Drivers:

   - Make the generic interrupt chip implementation handle interrupt
     domains correctly and initialize the name pointers correctly

   - Add interrupt affinity setting support to the Renesas RZG2L chip
     driver.

   - Prevent registering syscore operations multiple times in the SiFive
     PLIC chip driver.

   - Update device tree handling in the NXP Layerscape MSI chip driver"

* tag 'irq-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/sifive-plic: Fix syscore registration for multi-socket systems
  irqchip/ls-scfg-msi: Use device_get_match_data()
  genirq/generic_chip: Make irq_remove_generic_chip() irqdomain aware
  genirq/matrix: Exclude managed interrupts in irq_matrix_allocated()
  PCI/MSI: Provide stubs for IMS functions
  irqchip/renesas-rzg2l: Enhance driver to support interrupt affinity setting
  genirq/generic-chip: Fix the irq_chip name for /proc/interrupts
  irqdomain: Annotate struct irq_domain with __counted_by
…nux/kernel/git/tip/tip

Pull SMP and CPU hotplug updates from Thomas Gleixner:

 - Switch the smp_call_function*() @csd argument to call_single_data_t
   type, which is a cache-line aligned typedef of the underlying struct
   __call_single_data.

   This ensures that the call data is not crossing a cacheline which
   avoids bouncing an extra cache-line for the SMP function call

 - Prevent offlining of the last housekeeping CPU when CPU isolation is
   active.

   Offlining the last housekeeping CPU makes no sense in general, but
   also caused the scheduler to panic due to the empty CPU mask when
   rebuilding the scheduler domains.

 - Remove an unused CPU hotplug state

* tag 'smp-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Don't offline the last non-isolated CPU
  cpu/hotplug: Remove unused cpuhp_state CPUHP_AP_X86_VDSO_VMA_ONLINE
  smp: Change function signatures to use call_single_data_t
…/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Updates for time, timekeeping and timers:

  Core:

   - Avoid superfluous deactivation of the tick in the low resolution
     tick NOHZ interrupt handler as the deactivation is handled already
     in the idle loop and on interrupt exit.

   - Update stale comments in the tick NOHZ code and rename the tick
     handler functions to be self-explanatory.

   - Remove an unused function in the tick NOHZ code, which was
     forgotten when the last user went away.

   - Handle RTC alarms which exceed the maximum alarm time of the
     underlying RTC hardware gracefully.

     Setting RTC alarms which exceed the maximum alarm time of the RTC
     hardware failed so far and caused suspend operations to abort.

     Cure this by limiting the alarm to the maximum alarm time of the
     RTC hardware, which is provided by the driver. This causes early
     resume wakeups, but that's way better than not suspending at all.

  Drivers:

   - Add a proper clocksource/event driver for the ancient Cirrus Logic
     EP93xx SoC family, which is one of the last non device-tree
     holdouts in arch/arm.

   - The usual boring device tree bindings updates and small fixes and
     enhancements all over the place"

* tag 'timers-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: ep93xx: Add driver for Cirrus Logic EP93xx
  dt-bindings: timers: Add Cirrus EP93xx
  clocksource/drivers/timer-atmel-tcb: Fix initialization on SAM9 hardware
  clocksource/timer-riscv: ACPI: Add timer_cannot_wakeup_cpu
  clocksource/drivers/sun5i: Remove surplus dev_err() when using platform_get_irq()
  drivers/clocksource/timer-ti-dm: Don't call clk_get_rate() in stop function
  clocksource/drivers/timer-imx-gpt: Fix potential memory leak
  dt-bindings: timer: renesas,rz-mtu3: Document RZ/{G2UL,Five} SoCs
  dt-bindings: timer: renesas,rz-mtu3: Improve documentation
  dt-bindings: timer: renesas,rz-mtu3: Fix overflow/underflow interrupt names
  alarmtimer: Use maximum alarm time for suspend
  rtc: Add API function to return alarm time bound by hardware limit
  tick/nohz: Update comments some more
  tick/nohz: Remove unused tick_nohz_idle_stop_tick_protected()
  tick/nohz: Don't shutdown the lowres tick from itself
  tick/nohz: Update obsolete comments
  tick/nohz: Rename the tick handlers to more self-explanatory names
…nux/kernel/git/tip/tip

Pull x86 APIC updates from Thomas Gleixner:

 - Make the quirk for non-maskable MSI interrupts in the affinity setter
   functional again.

   It was broken by a MSI core code update, which restructured the code
   in a way that the quirk flag was not longer set correctly.

   Trying to restore the core logic caused a deeper inspection and it
   turned out that the extra quirk flag is not required at all because
   it's the inverse of the reservation mode bit, which only can be set
   when the MSI interrupt is maskable.

   So the trivial fix is to use the reservation mode check in the
   affinity setter function and remove almost 40 lines of code related
   to the no-mask quirk flag.

 - Cure a Kconfig dependency issue which causes compile failures by
   correcting the conditionals in the affected header files.

 - Clean up coding style in the UV APIC driver.

* tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/msi: Fix misconfigured non-maskable MSI quirk
  x86/msi: Fix compile error caused by CONFIG_GENERIC_MSI_IRQ=y && !CONFIG_X86_LOCAL_APIC
  x86/platform/uv/apic: Clean up inconsistent indenting
…nux/kernel/git/tip/tip

Pull x86 core updates from Thomas Gleixner:

 - Limit the hardcoded topology quirk for Hygon CPUs to those which have
   a model ID less than 4.

   The newer models have the topology CPUID leaf 0xB correctly
   implemented and are not affected.

 - Make SMT control more robust against enumeration failures

   SMT control was added to allow controlling SMT at boottime or
   runtime. The primary purpose was to provide a simple mechanism to
   disable SMT in the light of speculation attack vectors.

   It turned out that the code is sensible to enumeration failures and
   worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
   which means the primary thread mask is not set up correctly. By
   chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes
   the hotplug control stay at its default value "enabled". So the mask
   is never evaluated.

   The ongoing rework of the topology evaluation caused XEN/PV to end up
   with smp_num_siblings == 1, which sets the SMT control to "not
   supported" and the empty primary thread mask causes the hotplug core
   to deny the bringup of the APS.

   Make the decision logic more robust and take 'not supported' and 'not
   implemented' into account for the decision whether a CPU should be
   booted or not.

 - Fake primary thread mask for XEN/PV

   Pretend that all XEN/PV vCPUs are primary threads, which makes the
   usage of the primary thread mask valid on XEN/PV. That is consistent
   with because all of the topology information on XEN/PV is fake or
   even non-existent.

 - Encapsulate topology information in cpuinfo_x86

   Move the randomly scattered topology data into a separate data
   structure for readability and as a preparatory step for the topology
   evaluation overhaul.

 - Consolidate APIC ID data type to u32

   It's fixed width hardware data and not randomly u16, int, unsigned
   long or whatever developers decided to use.

 - Cure the abuse of cpuinfo for persisting logical IDs.

   Per CPU cpuinfo is used to persist the logical package and die IDs.
   That's really not the right place simply because cpuinfo is subject
   to be reinitialized when a CPU goes through an offline/online cycle.

   Use separate per CPU data for the persisting to enable the further
   topology management rework. It will be removed once the new topology
   management is in place.

 - Provide a debug interface for inspecting topology information

   Useful in general and extremly helpful for validating the topology
   management rework in terms of correctness or "bug" compatibility.

* tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too
  x86/cpu: Provide debug interface
  x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids
  x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
  x86/apic: Use u32 for [gs]et_apic_id()
  x86/apic: Use u32 for phys_pkg_id()
  x86/apic: Use u32 for cpu_present_to_apicid()
  x86/apic: Use u32 for check_apicid_used()
  x86/apic: Use u32 for APIC IDs in global data
  x86/apic: Use BAD_APICID consistently
  x86/cpu: Move cpu_l[l2]c_id into topology info
  x86/cpu: Move logical package and die IDs into topology info
  x86/cpu: Remove pointless evaluation of x86_coreid_bits
  x86/cpu: Move cu_id into topology info
  x86/cpu: Move cpu_core_id into topology info
  hwmon: (fam15h_power) Use topology_core_id()
  scsi: lpfc: Use topology_core_id()
  x86/cpu: Move cpu_die_id into topology info
  x86/cpu: Move phys_proc_id into topology info
  x86/cpu: Encapsulate topology information in cpuinfo_x86
  ...
…kernel/git/paulmck/linux-rcu

Pull nolibc updates from Paul McKenney:

 - Add stdarg.h header and a few additional system-call upgrades

 - Add support for constructors and destructors

 - Add tests to verify the ability to link multiple .o files against
   nolibc

 - Numerous string-function optimizations and improvements

 - Prevent redundant kernel relinks by avoiding embedding of initramfs
   into the kernel image

 - Allow building i386 with multiarch compiler and make ppc64le use
   qemu-system-ppc64

 - Miscellaneous fixups, including addition of -nostdinc for
   nolibc-test, avoiding -Wstringop-overflow warnings, and avoiding
   unused parameter warnings for ENOSYS fallbacks

* tag 'nolibc.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  selftests/nolibc: add tests for multi-object linkage
  selftests/nolibc: use qemu-system-ppc64 for ppc64le
  tools/nolibc: add support for constructors and destructors
  tools/nolibc: drop test for getauxval(AT_PAGESZ)
  tools/nolibc: automatically detect necessity to use pselect6
  tools/nolibc: don't define new syscall number
  tools/nolibc: avoid unused parameter warnings for ENOSYS fallbacks
  selftests/nolibc: allow building i386 with multiarch compiler
  selftests/nolibc: don't embed initramfs into kernel image
  selftests/nolibc: libc-test: avoid -Wstringop-overflow warnings
  tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function
  tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
  tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  selftests/nolibc: use -nostdinc for nolibc-test
  tools/nolibc: add stdarg.h header
…rnel/git/paulmck/linux-rcu

Pull Linux Kernel Memory Model updates from Paul McKenney:
 "This update adds paragraphs to the portions of memory-barriers.txt
  that have been marked historical due to changes in the way that the
  Linux kernel handles DEC Alpha. These paragraphs includes information
  on where to find the corresponding up-to-date information"

* tag 'lkmm.2023.10.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  docs: memory-barriers: Add note on compiler transformation and address deps
…x/kernel/git/paulmck/linux-rcu

Pull CSD lock update from Paul McKenney:
 "This adds a kernel boot parameter that causes the kernel to panic if
  one of the call_smp_function() APIs is stalled for more than the
  specified duration.

  This is useful in deployments in which a clean panic is preferable to
  an indefinite stall"

* tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  smp,csd: Throw an error if a CSD lock is stuck for too long
…l/git/frederic/linux-dynticks

Pull RCU updates from Frederic Weisbecker:

 - RCU torture, locktorture and generic torture infrastructure updates
   that include various fixes, cleanups and consolidations.

   Among the user visible things, ftrace dumps can now be found into
   their own file, and module parameters get better documented and
   reported on dumps.

 - Generic and misc fixes all over the place. Some highlights:

     * Hotplug handling has seen some light cleanups and comments

     * An RCU barrier can now be triggered through sysfs to serialize
       memory stress testing and avoid OOM

     * Object information is now dumped in case of invalid callback
       invocation

     * Also various SRCU issues, too hard to trigger to deserve urgent
       pull requests, have been fixed

 - RCU documentation updates

 - RCU reference scalability test minor fixes and doc improvements.

 - RCU tasks minor fixes

 - Stall detection updates. Introduce RCU CPU Stall notifiers that
   allows a subsystem to provide informations to help debugging. Also
   cure some false positive stalls.

* tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (56 commits)
  srcu: Only accelerate on enqueue time
  locktorture: Check the correct variable for allocation failure
  srcu: Fix callbacks acceleration mishandling
  rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
  rcu: Standardize explicit CPU-hotplug calls
  rcu: Conditionally build CPU-hotplug teardown callbacks
  rcu: Remove references to rcu_migrate_callbacks() from diagrams
  rcu: Assume rcu_report_dead() is always called locally
  rcu: Assume IRQS disabled from rcu_report_dead()
  rcu: Use rcu_segcblist_segempty() instead of open coding it
  rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
  srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
  torture: Convert parse-console.sh to mktemp
  rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle()
  rcutorture: Replace schedule_timeout*() 1-jiffy waits with HZ/20
  torture: Add kvm.sh --debug-info argument
  locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writers
  doc: Catch-up update for locktorture module parameters
  locktorture: Add call_rcu_chains module parameter
  locktorture: Add new module parameters to lock_torture_print_module_parms()
  ...
…/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - SLUB: slab order calculation refactoring (Vlastimil Babka, Feng Tang)

   Recent proposals to tune the slab order calculations have prompted us
   to look at the current code and refactor it to make it easier to
   follow and eliminate some odd corner cases.

   The refactoring is mostly non-functional changes, but should make the
   actual tuning easier to implement and review.

* tag 'slab-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slub: refactor calculate_order() and calc_slab_order()
  mm/slub: attempt to find layouts up to 1/2 waste in calculate_order()
  mm/slub: remove min_objects loop from calculate_order()
  mm/slub: simplify the last resort slab order calculation
  mm/slub: add sanity check for slub_min/max_order cmdline setup
…kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "One of the more voluminous set of changes is for adding the new
  __counted_by annotation[1] to gain run-time bounds checking of
  dynamically sized arrays with UBSan.

   - Add LKDTM test for stuck CPUs (Mark Rutland)

   - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo)

   - Refactor more 1-element arrays into flexible arrays (Gustavo A. R.
     Silva)

   - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem
     Shaikh)

   - Convert group_info.usage to refcount_t (Elena Reshetova)

   - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva)

   - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas
     Bulwahn)

   - Fix randstruct GCC plugin performance mode to stay in groups (Kees
     Cook)

   - Fix strtomem() compile-time check for small sources (Kees Cook)"

* tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits)
  hwmon: (acpi_power_meter) replace open-coded kmemdup_nul
  reset: Annotate struct reset_control_array with __counted_by
  kexec: Annotate struct crash_mem with __counted_by
  virtio_console: Annotate struct port_buffer with __counted_by
  ima: Add __counted_by for struct modsig and use struct_size()
  MAINTAINERS: Include stackleak paths in hardening entry
  string: Adjust strtomem() logic to allow for smaller sources
  hardening: x86: drop reference to removed config AMD_IOMMU_V2
  randstruct: Fix gcc-plugin performance mode to stay in group
  mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by
  drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by
  irqchip/imx-intmux: Annotate struct intmux_data with __counted_by
  KVM: Annotate struct kvm_irq_routing_table with __counted_by
  virt: acrn: Annotate struct vm_memory_region_batch with __counted_by
  hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by
  sparc: Annotate struct cpuinfo_tree with __counted_by
  isdn: kcapi: replace deprecated strncpy with strscpy_pad
  isdn: replace deprecated strncpy with strscpy
  NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by
  nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by
  ...
@pull pull bot added the ⤵️ pull label Oct 31, 2023
…nel/git/kees/linux

Pull pstore updates from Kees Cook:

 - Check for out-of-memory condition during initialization (Jiasheng
   Jiang)

 - Fix documentation typos (Tudor Ambarus)

* tag 'pstore-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/platform: Add check for kstrdup
  docs: pstore-blk.rst: fix typo, s/console/ftrace
  docs: pstore-blk.rst: use "about" as a preposition after "care"
…nel/git/kees/linux

Pull execve updates from Kees Cook:

 - Support non-BSS ELF segments with zero filesz

   Eric Biederman and I refactored ELF segment loading to handle the
   case where a segment has a smaller filesz than memsz. Traditionally
   linkers only did this for .bss and it was always the last segment. As
   a result, the kernel only handled this case when it was the last
   segment. We've had two recent cases where linkers were trying to use
   these kinds of segments for other reasons, and the were in the middle
   of the segment list. There was no good reason for the kernel not to
   support this, and the refactor actually ends up making things more
   readable too.

 - Enable namespaced binfmt_misc

   Christian Brauner has made it possible to use binfmt_misc with mount
   namespaces. This means some traditionally root-only interfaces (for
   adding/removing formats) are now more exposed (but believed to be
   safe).

 - Remove struct tag 'dynamic' from ELF UAPI

   Alejandro Colomar noticed that the ELF UAPI has been polluting the
   struct namespace with an unused and overly generic tag named
   "dynamic" for no discernible reason for many many years. After
   double-checking various distro source repositories, it has been
   removed.

 - Clean up binfmt_elf_fdpic debug output (Greg Ungerer)

* tag 'execve-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_misc: enable sandboxed mounts
  binfmt_misc: cleanup on filesystem umount
  binfmt_elf_fdpic: clean up debug warnings
  mm: Remove unused vm_brk()
  binfmt_elf: Only report padzero() errors when PROT_WRITE
  binfmt_elf: Use elf_load() for library
  binfmt_elf: Use elf_load() for interpreter
  binfmt_elf: elf_bss no longer used by load_elf_binary()
  binfmt_elf: Support segments with 0 filesz and misaligned starts
  elf, uapi: Remove struct tag 'dynamic'
…it/jarkko/linux-tpmdd

Pull tpm updates from Jarkko Sakkinen:
 "This is a small sized pull request. One commit I would like to
  pinpoint is my fix for init_trusted() rollback, as for actual patch I
  did not receive any feedback"

* tag 'tpmdd-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  keys: Remove unused extern declarations
  integrity: powerpc: Do not select CA_MACHINE_KEYRING
  KEYS: trusted: tee: Refactor register SHM usage
  KEYS: trusted: Rollback init_trusted() consistently
…ernel/git/pcmoore/audit

Pull audit update from Paul Moore:
 "Only two audit patches for v6.7, both fairly small with a combined 11
  lines of changes.

  The first patch is a simple __counted_by annontation, and the second
  fixes a a problem where audit could deadlock on task_lock() when an
  exe filter is configured. More information is available in the commit
  description and the patch is tagged for stable"

* tag 'audit-pr-20231030' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: don't take task_lock() in audit_exe_compare() code path
  audit: Annotate struct audit_chunk with __counted_by
…/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

 - improve the SELinux debugging configuration controls in Kconfig

 - print additional information about the hash table chain lengths when
   when printing SELinux debugging information

 - simplify the SELinux access vector hash table calcaulations

 - use a better hashing function for the SELinux role tansition hash
   table

 - improve SELinux load policy time through the use of optimized
   functions for calculating the number of bits set in a field

 - addition of a __counted_by annotation

 - simplify the avtab_inert_node() function through a simplified
   prototype

* tag 'selinux-pr-20231030' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: simplify avtab_insert_node() prototype
  selinux: hweight optimization in avtab_read_item
  selinux: improve role transition hashing
  selinux: simplify avtab slot calculation
  selinux: improve debug configuration
  selinux: print sum of chain lengths^2 for hash tables
  selinux: Annotate struct sidtab_str_cache with __counted_by
…nel/git/pcmoore/lsm

Pull LSM updates from Paul Moore:

 - Add new credential functions, get_cred_many() and put_cred_many() to
   save some atomic_t operations for a few operations.

   While not strictly LSM related, this patchset had been rotting on the
   mailing lists for some time and since the LSMs do care a lot about
   credentials I thought it reasonable to give this patch a home.

 - Five patches to constify different LSM hook parameters.

 - Fix a spelling mistake.

* tag 'lsm-pr-20231030' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lsm: fix a spelling mistake
  cred: add get_cred_many and put_cred_many
  lsm: constify 'sb' parameter in security_sb_kern_mount()
  lsm: constify 'bprm' parameter in security_bprm_committed_creds()
  lsm: constify 'bprm' parameter in security_bprm_committing_creds()
  lsm: constify 'file' parameter in security_bprm_creds_from_file()
  lsm: constify 'sb' parameter in security_quotactl()
Pull rust updates from Miguel Ojeda:
 "A small one compared to the previous one in terms of features. In
  terms of lines, as usual, the 'alloc' version upgrade accounts for
  most of them.

  Toolchain and infrastructure:

   - Upgrade to Rust 1.73.0

     This time around, due to how the kernel and Rust schedules have
     aligned, there are two upgrades in fact. They contain the fixes for
     a few issues we reported to the Rust project.

     In addition, a few cleanups indicated by the upgraded compiler or
     possible thanks to it. For instance, the compiler now detects
     redundant explicit links.

   - A couple changes to the Rust 'Makefile' so that it can be used with
     toybox tools, allowing Rust to be used in the Android kernel build.

  x86:

   - Enable IBT if enabled in C

  Documentation:

   - Add "The Rust experiment" section to the Rust index page

  MAINTAINERS:

   - Add Maintainer Entry Profile field ('P:').

   - Update our 'W:' field to point to the webpage we have been building
     this year"

* tag 'rust-6.7' of https://github.com/Rust-for-Linux/linux:
  docs: rust: add "The Rust experiment" section
  x86: Enable IBT in Rust if enabled in C
  rust: Use grep -Ev rather than relying on GNU grep
  rust: Use awk instead of recent xargs
  rust: upgrade to Rust 1.73.0
  rust: print: use explicit link in documentation
  rust: task: remove redundant explicit link
  rust: kernel: remove `#[allow(clippy::new_ret_no_self)]`
  MAINTAINERS: add Maintainer Entry Profile field for Rust
  MAINTAINERS: update Rust webpage
  rust: upgrade to Rust 1.72.1
  rust: arc: add explicit `drop()` around `Box::from_raw()`
…linux/kernel/git/tj/wq

Pull workqueue rust bindings from Tejun Heo:
 "Add rust bindings to allow rust code to schedule work items on
  workqueues.

  While the current bindings don't cover all of the workqueue API, it
  provides enough for basic usage and can be expanded as needed"

* tag 'wq-for-6.7-rust-bindings' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  rust: workqueue: add examples
  rust: workqueue: add `try_spawn` helper method
  rust: workqueue: implement `WorkItemPointer` for pointer types
  rust: workqueue: add helper for defining work_struct fields
  rust: workqueue: define built-in queues
  rust: workqueue: add low-level workqueue bindings
  rust: sync: add `Arc::{from_raw, into_raw}`
…it/tj/wq

Pull workqueue update from Tejun Heo:
 "Just one commit to improve lockdep annotation for work_on_cpu() to
  avoid spurious warnings"

* tag 'wq-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Provide one lock class key per work_on_cpu() callsite
…el/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - cpuset now supports remote partitions where CPUs can be reserved for
   exclusive use down the tree without requiring all the intermediate
   nodes to be partitions. This makes it easier to use partitions
   without modifying existing cgroup hierarchy.

 - cpuset partition configuration behavior improvement

 - cgroup_favordynmods= boot param added to allow setting the flag on
   boot on cgroup1

 - Misc code and doc updates

* tag 'cgroup-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs/cgroup: Add the list of threaded controllers to cgroup-v2.rst
  cgroup: use legacy_name for cgroup v1 disable info
  cgroup/cpuset: Cleanup signedness issue in cpu_exclusive_check()
  cgroup/cpuset: Enable invalid to valid local partition transition
  cgroup: add cgroup_favordynmods= command-line option
  cgroup/cpuset: Extend test_cpuset_prs.sh to test remote partition
  cgroup/cpuset: Documentation update for partition
  cgroup/cpuset: Check partition conflict with housekeeping setup
  cgroup/cpuset: Introduce remote partition
  cgroup/cpuset: Add cpuset.cpus.exclusive for v2
  cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2
  cgroup/cpuset: Fix load balance state in update_partition_sd_lb()
  cgroup: Avoid extra dereference in css_populate_dir()
  cgroup: Check for ret during cgroup1_base_files cft addition
@pull pull bot merged commit 5a6a09e into jamlee-t:master Oct 31, 2023
pull bot pushed a commit that referenced this pull request Dec 2, 2024
When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected,
cpu_max_bits_warn() generates a runtime warning similar as below when
showing /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit)
instead of NR_CPUS to iterate CPUs.

[    3.052463] ------------[ cut here ]------------
[    3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0
[    3.070072] Modules linked in: efivarfs autofs4
[    3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052
[    3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000
[    3.109127]         9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430
[    3.118774]         90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff
[    3.128412]         0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890
[    3.138056]         0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa
[    3.147711]         ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000
[    3.157364]         900000000101c998 0000000000000004 9000000000ef7430 0000000000000000
[    3.167012]         0000000000000009 000000000000006c 0000000000000000 0000000000000000
[    3.176641]         9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286
[    3.186260]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c
[    3.195868]         ...
[    3.199917] Call Trace:
[    3.203941] [<90000000002086d8>] show_stack+0x38/0x14c
[    3.210666] [<9000000000cf846c>] dump_stack_lvl+0x60/0x88
[    3.217625] [<900000000023d268>] __warn+0xd0/0x100
[    3.223958] [<9000000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc
[    3.231150] [<9000000000210220>] show_cpuinfo+0x5e8/0x5f0
[    3.238080] [<90000000004f578c>] seq_read_iter+0x354/0x4b4
[    3.245098] [<90000000004c2e90>] new_sync_read+0x17c/0x1c4
[    3.252114] [<90000000004c5174>] vfs_read+0x138/0x1d0
[    3.258694] [<90000000004c55f8>] ksys_read+0x70/0x100
[    3.265265] [<9000000000cfde9c>] do_syscall+0x7c/0x94
[    3.271820] [<9000000000202fe4>] handle_syscall+0xc4/0x160
[    3.281824] ---[ end trace 8b484262b4b8c24c ]---

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.