Releases: redpanda-data/redpanda
v23.3.16
Features
- Schema Registry: Support for the `deleted=true` query parameter on `POST /subjects/<subject>`. by @BenPope in #18432
- #18460 rpk: ability to transfer partition leadership by @daisukebe in #18461
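As a rough illustration of the Schema Registry feature above, here is a minimal sketch that builds such a lookup request with Python's standard library. It follows the Confluent-compatible API, where `POST /subjects/<subject>` checks whether a schema is already registered under a subject and `deleted=true` extends that lookup to soft-deleted schemas. The address `http://localhost:8081` (Redpanda's default Schema Registry port), the subject `orders-value` and the helper name `build_lookup_request` are assumptions for illustration. The sketch only builds the request and does not send it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder address; 8081 is Redpanda's default Schema Registry port.
BASE_URL = "http://localhost:8081"


def build_lookup_request(base_url: str, subject: str, schema: str,
                         include_deleted: bool = True) -> urllib.request.Request:
    """Build a POST /subjects/<subject> lookup, optionally including soft-deleted schemas."""
    # Percent-encode the subject so a name containing '/' stays a single path segment.
    path = "/subjects/" + urllib.parse.quote(subject, safe="")
    query = "?deleted=true" if include_deleted else ""
    body = json.dumps({"schema": schema}).encode("utf-8")
    return urllib.request.Request(
        base_url + path + query,
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical Avro schema and subject name, for illustration only.
    avro = json.dumps({"type": "record", "name": "Order",
                       "fields": [{"name": "id", "type": "long"}]})
    req = build_lookup_request(BASE_URL, "orders-value", avro)
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Passing `include_deleted=False` produces the plain lookup that ignores soft-deleted schemas.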
Bug Fixes
- Fix initial_leader_epoch/KIP-320 handling in fetch requests. It was previously ignored, which prevented consumers from correctly detecting suffix truncation. For Redpanda (and Raft), this is a minor problem since suffix truncation is a very improbable event. by @nvartolomei in #17728
- #17957 Fix incorrect log truncations caused by delayed replication requests. by @ztlpn in #18523
- #18282 #18566 Fix a scenario where list_offset with a timestamp could return a lower offset than partition start after a trim-prefix command. This could lead to consumers being stuck with an out-of-range-offset exception if they began consuming from an offset below the one which was used in the trim-prefix command. by @nvartolomei in #18599
- #18282 #18566 Fix an edge case where a timequery returns no results if it races with tiered storage retention and garbage collection. This matters at least for consumers that fall behind retention: they interpret such a response as meaning the partition is empty and jump to the high watermark (HWM) instead of resuming consumption from the first available message. by @nvartolomei in #18599
- #18443 Fixed an assertion triggering in a full-disk scenario by @andijcr in #18444
- #18517 Don't mark partition rebalance complete if some partitions are not moveable (e.g. due to partial recovery mode) by @ztlpn in #18522
- #18569 Enforce client quota throttling in a Kafka-compatible way, meaning we enforce the throttle delay on the next request if the client did not enforce it on its side. by @pgellert in #18575
- Concurrent `set_log_level` requests combined with log level expiration now work as expected by @andijcr in #18438
- Fixes a possible stall in `raft::state_machine_manager` by @mmaslankaprv in #18637
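The client quota change in #18575 (enforce the throttle delay on the next request if the client did not enforce it itself) can be pictured with the simplified sketch below. It is not Redpanda's implementation: the per-connection `ConnectionThrottle` record and its method names are hypothetical, and all times are plain integers in milliseconds. The broker responds with `throttle_time_ms`, remembers when that back-off window ends, and if the next request arrives early, waits out only the remainder.

```python
from dataclasses import dataclass


@dataclass
class ConnectionThrottle:
    """Hypothetical per-connection throttle state (times in milliseconds)."""
    throttled_until_ms: int = 0

    def on_response(self, now_ms: int, throttle_time_ms: int) -> int:
        # Respond immediately, tell the client how long to back off,
        # and remember when that back-off window ends.
        self.throttled_until_ms = now_ms + throttle_time_ms
        return throttle_time_ms

    def delay_for_next_request(self, now_ms: int) -> int:
        # A client that honored throttle_time_ms arrives after the window: no delay.
        # A client that ignored it arrives early: the broker waits out the rest.
        return max(0, self.throttled_until_ms - now_ms)


if __name__ == "__main__":
    conn = ConnectionThrottle()
    conn.on_response(now_ms=1_000, throttle_time_ms=200)
    print(conn.delay_for_next_request(now_ms=1_050))  # 150: client ignored the throttle
    print(conn.delay_for_next_request(now_ms=1_250))  # 0: client honored the throttle
```

A well-behaved client therefore never sees an extra delay, while a client that ignores `throttle_time_ms` is still held to the quota.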
Improvements
- Made electing a leader faster by @mmaslankaprv in #18625
- #17951 Schema Registry: Improve retry logic for `delete_config` and `delete_subject_permanent` by @BenPope in #18624
- #17951 Schema Registry: Improve tombstoning when deleting a subject by @BenPope in #18624
Full Changelog: v23.3.15...v23.3.16
v24.1.2
Features
- Re-adds the `fetch_read_strategy` cluster config property to select between `polling` and `non-polling` fetch implementations. The `non-polling` fetch implementation is used by default. by @StephanDollberg in #18176
- #18163 `rpk container start` now starts a Redpanda Console container connected to the cluster. by @r-vasquez in #18164
- rpk container now has a set of flags to specify ports for node to start on. by @r-vasquez in #18148
Bug Fixes
- Fix a bug validating WebAssembly when global constants are specific values that have the encoded byte 0x0B. by @rockwotj in #18108
- Fix a bug where an invalid buffer passed into the WebAssembly host from the guest could cause Redpanda to abort. by @rockwotj in #18234
- Fix a scenario where list_offset with a timestamp could return a lower offset than partition start after a trim-prefix command. This could lead to consumers being stuck with an out-of-range-offset exception if they began consuming from an offset below the one which was used in the trim-prefix command. by @nvartolomei in #18281
- #18100 Better mapping of REST error codes by @mmaslankaprv in #18102
- #18158 Fix issuing timequeries to cloud storage if `remote.read` is not enabled. by @WillemKauf in #18159
- #18240 Fixes a crash caused by a race between a client disconnect and a segment reader in tiered storage. by @andrwng in #18241
- #18317 Fixes expiration for transactions that have begun and not produced any data batches. This prevents a stalling LSO. by @bharathv in #18324
- PR #18051 [v24.1.x] Address oversized allocs across kafka API and schema registry by @oleiman
- PR #18125 [v24.1.x] cluster_recovery_backend_test: fix unsafe iteration by @andrwng
- PR #18141 [v24.1.x] Fixes for wait_ms cpu profiler mode by @StephanDollberg
- PR #18216 [v24.1.x] controller_backend: prevent busy-looping when removing partitions by @ztlpn
- PR #18222 [v24.1.x] tx/tm_stm: fix unboundedness of _pid_tx_id by @bharathv
- PR #18328 [v24.1.x] Change information stored in `_topic_node_index` to avoid oversized alloc by @ballard26
- PR #18406 [v24.1.x] Fix some concurrent memory access problems in partition balancer by @ztlpn
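The `list_offset` fix above (#18281) protects one invariant: a timestamp lookup must never return an offset below the partition's current start offset, which `trim-prefix` can advance into the middle of a batch. The sketch below is a hypothetical illustration, not Redpanda's code. It assumes a sorted in-memory index of `(base_offset, last_offset, max_timestamp)` batches, answers at batch granularity, and returns `None` when no batch matches.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical batch index entry: (base_offset, last_offset, max_timestamp_ms).
Batch = Tuple[int, int, int]


def list_offset_for_timestamp(batches: Sequence[Batch], start_offset: int,
                              target_ts: int) -> Optional[int]:
    """Return the first offset of the first batch with max_timestamp >= target_ts,
    clamped so it is never below the partition start offset.

    Simplification: answers at batch granularity and returns None when no
    batch matches.
    """
    for base, last, max_ts in batches:  # batches are sorted by offset
        if last < start_offset:
            continue  # batch lies entirely before the trimmed start: not readable
        if max_ts >= target_ts:
            # After trim-prefix, a batch can straddle start_offset; returning `base`
            # here would hand the consumer an out-of-range offset.
            return max(base, start_offset)
    return None
```

Without the final clamp, a batch that starts at offset 10 after a trim to offset 15 would send consumers to offset 10, which is exactly the out-of-range situation the fix describes.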
Improvements
- Improve cloud storage cache to prevent readers from being blocked during cache eviction. by @Lazin in #18134
- #18150 `rpk container start`: You can now select the subnet and gateway used to create your 'redpanda' network. by @r-vasquez in #18151
- Allow interpreting `'retention_duration' = -1` in a topic_manifest.json file as infinite time retention by @andijcr in #18243
- rpk container now starts the seed broker using the default listener ports. by @r-vasquez in #18148
- PR #18117 [v24.1.x] wasm/parser: better global support by @rockwotj
- PR #18128 [v24.1.x] c/balancer_backend: first initialize planner and then call plan by @mmaslankaprv
- PR #18194 [v24.1.x] configuration to enable delete retention for consumer offsets by @bharathv
- PR #18228 [v24.1.x] CORE-1752: cst: Downgrade error logs to debug by @abhijat
- PR #18269 [v24.1.x] [CORE-2581] cst: move chunk downloads to remote segment bg loop by @abhijat
- PR #18321 [v24.1.x] rpk: stop using args[0] in cloud cluster select by @r-vasquez
- PR #18318 [v24.1.x] offline_log_viewer: fix get_control_record_type by @bharathv
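The `retention_duration` entry above treats `-1` as "retain forever". The minimal sketch below shows that interpretation when reading a manifest. It assumes that `retention_duration` is an integer number of milliseconds at the top level of the JSON document. Both the unit and the JSON shape are assumptions for illustration, and `None` stands in for infinite retention.

```python
import json
from datetime import timedelta
from typing import Optional

INFINITE_RETENTION = -1


def parse_retention(manifest_json: str) -> Optional[timedelta]:
    """Return the retention period from a topic manifest, or None for infinite.

    Assumption for this sketch: retention_duration is an integer in milliseconds.
    """
    value = json.loads(manifest_json)["retention_duration"]
    if value == INFINITE_RETENTION:
        return None  # -1 means "retain forever"
    if value < 0:
        raise ValueError(f"invalid retention_duration: {value}")
    return timedelta(milliseconds=value)
```

Mapping the sentinel to `None` keeps callers from accidentally treating `-1` as a negative duration.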
Full Changelog: v24.1.1...v24.1.2
v23.3.15
Bug Fixes
- Fix a bug where an invalid buffer passed into the WebAssembly host from the guest could cause Redpanda to abort. by @rockwotj in #18235
- Fixes expiration for transactions that have begun and not produced any data batches. This prevents a stalling LSO. by @bharathv in #18248
- #18237 Fixes a crash caused by a race between a client disconnect and a segment reader in tiered storage. by @andrwng in #18238
- PR #18223 [v23.3.x] tx/tm_stm: fix unboundedness of _pid_tx_id by @bharathv
Improvements
- Allow interpreting `'retention_duration' = -1` in a topic_manifest.json file as infinite time retention by @andijcr in #18242
Full Changelog: v23.3.14...v23.3.15
v23.3.14
Features
- rpk: `rpk cluster partitions list` now supports filtering with broker IDs. by @daisukebe in #18104
Bug Fixes
- Fix a bug validating WebAssembly when global constants are specific values that have the encoded byte 0x0B. by @rockwotj in #18109
- #18081 Fixes a crash that could happen when reading from local storage with a large number of segments, none of which contain user data. by @andrwng in #18088
- #18101 Better mapping of REST error codes by @mmaslankaprv in #18103
- #18155 Fix issuing timequeries to cloud storage if `remote.read` is not enabled. by @WillemKauf in #18156
- PR #18123 [v23.3.x] cluster_recovery_backend_test: fix unsafe iteration by @andrwng
Improvements
- #18133 Improve cloud storage cache to prevent readers from being blocked during cache eviction. by @Lazin in #18138
- PR #18111 [v23.3.x] gh: fix lint-cpp for ubuntu noble by @dotnwat
- PR #18118 [v23.3.x] wasm/parser: better global support by @rockwotj
- PR #18154 [v23.3.x] c/topic_table: do not log duplicated lifecycle marker command by @mmaslankaprv
- PR #18166 [v23.3.x] rpk: bump docker version by @r-vasquez
- PR #18186 [v23.3.x] rpk: bump go deps by @r-vasquez
- PR #18191 [backport][23.3.x] configuration to enable delete retention for consumer offsets #18140 by @bharathv
Full Changelog: v23.3.13...v23.3.14
v23.2.29
Bug Fixes
- Fixed a problem leading to a use-after-free (UAF) error while calculating cloud storage usage by @mmaslankaprv in #17981
Full Changelog: v23.2.28...v23.2.29
v24.1.1
New Features
- Adds new cluster and topic level configurations for write caching feature. by @bharathv in #16924
- PR #17009 write caching - raft implementation by @bharathv
- Enables write caching by default in dev container mode. by @bharathv in #17677
- Add `rpk security roles`, a new command space to manage your Redpanda roles. by @r-vasquez in #17538
- Introduce `--allow-role` and `--deny-role` flags for `rpk acl` commands by @oleiman in #17416
- Introduces GET /v1/security/users/roles (Admin API) by @oleiman in #17155
- Introduces the `/v1/security/roles/{role}/members` Admin API endpoint for reading and updating RBAC role members. by @oleiman in #17153
- #17679 `rpk security acl list` now supports `--format=json` by @rockwotj in #17684
- Data Transforms now support writing to multiple output topics. The `REDPANDA_OUTPUT_TOPIC` environment variable exposed in transforms has been removed in favor of `REDPANDA_OUTPUT_TOPIC_%d` for each output topic specified. by @rockwotj in #16946
- `rpk transform deploy` now supports multiple output topics by @rockwotj in #16950
- The golang transform-sdk gains the ability to write to multiple output topics. This feature can only be used in Redpanda v24.1.x or newer. by @rockwotj in #16978
- The rust transform-sdk gains the ability to write to multiple output topics. This feature can only be used in Redpanda v24.1.x or newer. by @rockwotj in #17007
- Publish log (i.e. stderr/stdout) output from data transforms exclusively to an internally managed Redpanda topic (`_redpanda.transform_logs`). Data transform logs will no longer appear in broker logs. by @oleiman in #16485
- Introduce `rpk transform logs NAME` to view logs for a transform by @rockwotj in #16923
- #16075 Data Transform's Rust SDK now supports a Schema Registry Client. by @rockwotj in #16464
- Topic-aware partition balancing, which attempts to spread partition replicas topic-wise across a cluster. This behavior is controlled by the `partition_autobalancing_topic_aware` config property (enabled by default). by @ztlpn in #17263
- Tiered Storage now supports using Azure VM user-assigned managed identities for securely accessing Azure Blob Storage by @andijcr in #17157
- Topic recovery and ‘whole-cluster restore’ from Tiered Storage now perform integrity checks on metadata to ensure that each partition can be recovered successfully by @andijcr in #16915
- You can now create namespaces in Redpanda Cloud using rpk cloud namespace. by @r-vasquez in #16685
- #13175 `rpk debug bundle` now includes a CPU profile of the requested nodes. by @r-vasquez in #16414
- #16107 You can now print a schema using `rpk registry schema get --print-schema`. by @r-vasquez in #16109
- #16623 `rpk redpanda config bootstrap` now supports bootstrapping your advertised addresses configuration. by @r-vasquez in #16652
- Added new metric vectorized_storage_log_compacted_away_bytes for compaction observability in local storage by @andijcr in #17579
- Added new public metric `redpanda_cluster_latest_cluster_metadata_manifest_age` to track the age of the cluster_metadata_manifest in cloud storage by @andijcr in #17404
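With multiple output topics, a transform discovers its outputs through numbered `REDPANDA_OUTPUT_TOPIC_%d` variables instead of the single `REDPANDA_OUTPUT_TOPIC`. The sketch below collects them from an environment mapping. It assumes that numbering starts at 0 and has no gaps, which the entries above do not state, and the helper name `output_topics` is hypothetical. Real transforms would normally rely on the Go or Rust transform SDKs rather than reading the environment directly.

```python
import os
from typing import List, Mapping, Optional


def output_topics(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Collect REDPANDA_OUTPUT_TOPIC_0, _1, ... up to the first missing index.

    Assumption: numbering starts at 0 and is contiguous.
    """
    env = os.environ if env is None else env
    topics: List[str] = []
    i = 0
    while f"REDPANDA_OUTPUT_TOPIC_{i}" in env:
        topics.append(env[f"REDPANDA_OUTPUT_TOPIC_{i}"])
        i += 1
    return topics
```

Note that the removed single-topic variable is intentionally ignored, so code still reading `REDPANDA_OUTPUT_TOPIC` would find nothing.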
Bug Fixes
- Aggregates partitions in some cloud storage metrics when the `aggregate_metrics` cluster config is set to true. by @ballard26 in #16336
- Fix a bug that could lead to raft log inconsistencies when 2 out of 3 nodes in a configuration are changed. by @ztlpn in #17675
- Fix a bug where Redpanda ignored config values that were reset to their defaults until the next restart. by @ztlpn in #16504
- Fix a bug where logging in a transform could cause the transform to not make progress. by @rockwotj in #17186
- Fix a crash that happened when a cluster that was partially in recovery mode tried to upload consumer offsets to cloud storage. by @ztlpn in #17013
- Fix a memory leak when using transactions with many different producer IDs. by @rockwotj in #15797
- Fix a potential cloud storage cache access time tracker file corruption during shutdown. by @nvartolomei in #16648
- Fix a race condition between suffix truncation / delete records and adjacent segment compaction that can lead to crashes and data loss. by @nvartolomei in #17019
- Fix a rare bug where http client connections would vanish from the connection pool leading to various operations hanging while waiting for an http client. by @nvartolomei in #15681
- Fix an issue where `rpk transform logs` waits for new records even when the `--follow` flag is not specified. by @rockwotj in #17832
- Fix an issue with `Cargo.toml` when initializing a Rust Data Transform project via `rpk transform init` by @rockwotj in #15934
- Fix initial_leader_epoch/KIP-320 handling in fetch requests. It was previously ignored, which prevented consumers from correctly detecting suffix truncation. For Redpanda (and Raft), this is a minor problem since suffix truncation is a very improbable event. by @nvartolomei in #17674
- Fix internal RPC client connection stall after more than 2^32 requests are sent. by @ztlpn in #16156
- Fix large allocation in partition manifest. by @dotnwat in #16160
- Fix oversized allocation in storage. by @Lazin in #16642
- Fix the starter code for Rust projects in `rpk transform init` by @rockwotj in #16180
- Fix a tiered storage housekeeping problem that may cause replaced segments to pile up if spillover is enabled. by @Lazin in #16163
- Fixed a few oversized allocations for some admin server endpoints. by @rockwotj in #16551
- Fixed the values for the rpc client in/out bytes metric by @ballard26 in #17933
- Fixes `rpk transform init --install-deps` so that an explicit true value is not needed. by @rockwotj in #17831
- Fixes a bug in windowed compaction that could cause Redpanda to crash when an error occurs while reading batches. by @andrwng in #16928
- Fixes a bug where config_frontend methods were called on shards other than the controller shard. by @pgellert in #17088
- Fixes a bug that may prevent redpanda from shutting down cleanly when auditing is enabled by @graphcareful in #16315
- Fixes a concurrency issue in transform offset commits pertaining to taking/applying snapshots. by @bharathv in #17383
- Fixes a crash if a WebAssembly function is deployed that immediately crashes. by @rockwotj in #15939
- Fixes a crash that could happen when reading from local storage with a large number of segments, none of which contain user data. by @andrwng in #18075
- Fixes a plausible correctness issue with idempotent requests during replication failures. by @bharathv in #16706
- Fixes a race between compaction and Raft recovery for compacted topics that could result in aborted transactional data batches being visible. by @andrwng in #16295
- Fixes an improper initialization of metrics related to controller snapshot uploads. by @andrwng in #16070
- Fixes an issue where using the CPU profiler with running Data Transforms could cause the process to deadlock. by @rockwotj in #17877
- Fixes issue that causes the connection to hang when an unsupported compression type is passed via an incremental_alter_configs request by @graphcareful in #16399
- Fixes lock starvation during transform offset commits. by @bharathv in #17402
- Have fetch handler ensure rack awareness is enabled before performing follower fetching by @michael-redpanda in #15883
- Prevent an assertion from being triggered when Wasm VMs fail immediately. by @rockwotj in #15933
- Prevent detecting leader epoch advancement when state is not up to date by @mmaslankaprv in #16560
- Prevent reactor stalls querying leadership information for large clusters by @rockwotj in #17473
- Protect against a very rare scenario where after node restart, some of the partition replicas hosted on that node could not take part in leader elections. by @ztlpn in #16068
- Redpanda used to accept an empty string in `redpanda.rack` in node config. This would cause issues in Kafka operations. Redpanda will now error on startup if `redpanda.rack` is set to an empty string. by @michael-redpanda in #15835
- Redpanda will now correctly handle an empty rack ID provided in a fetch request by @michael-redpanda in #15846
- Reduces maximum log line size from `1MiB` to `128KiB` to reduce occurrences of memory allocation failures by @michael-redpanda in #17922
- Report runtime public metrics by task queue for all cores, not just core 0 by @rockwotj in #16154
- Return an HTTP 400 error code instead of a 500 when deploying a transform to a topic that doesn't exist by @rockwotj in #17011
- Schema Registry: Deleted schemas no longer reappear after certain compaction patterns on the `_schemas` topic. by @BenPope in #17091
- #15042 Fixes a bug in the tiered storage time-based query implementation that could result in a consumer hang when consuming very old data. by @andrwng in #16645
- #15201 Fix assertion triggered by interleaving of log flush and log truncation followed by append by @Lazin in #16105
- #15603 Cluster config aliases are now accepted when reading from YAML by @andijcr in #15605
- #15674 Fix an issue where new configs would continually revert to legacy defaults after an upgrade. by @oleiman in #15761
- #15722 #7946 Fix an issue where create topics responses would show incorrect partition count and replication factor by @oleiman in #16410
- #15811 Several additional metrics will have their "partition" label aggregated away (i.e., into a single series per remaining label set with no partition label,...
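The `redpanda.rack` startup check above (#15835) amounts to rejecting an explicitly empty rack ID while still allowing the property to be left unset. The minimal sketch below works on an already-parsed node config dictionary. The dictionary shape, the `ConfigError` type and the `validate_rack` name are assumptions for illustration, since Redpanda itself reads this setting from its YAML node config.

```python
from typing import Any, Mapping, Optional


class ConfigError(ValueError):
    """Raised when the node config is invalid (hypothetical error type)."""


def validate_rack(node_config: Mapping[str, Any]) -> Optional[str]:
    """Return the configured rack ID, or None if unset; reject an empty string."""
    rack = node_config.get("redpanda", {}).get("rack")
    if rack is None:
        return None  # rack awareness is simply not configured for this broker
    if rack == "":
        # Fail fast at startup instead of causing issues in Kafka operations later.
        raise ConfigError("redpanda.rack must not be an empty string")
    return rack
```

Failing at startup surfaces the misconfiguration immediately rather than through confusing rack-aware behavior later.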
v23.3.13
Bug Fixes
- Fixed a problem leading to a use-after-free (UAF) error while calculating cloud storage usage by @mmaslankaprv in #17980
- prevents partial consumer group recovery by @mmaslankaprv in #18016
Improvements
- Changes the `kafka_latency_fetch_latency` metric to measure the time the first `fetch_ntps_in_parallel` takes. by @ballard26 in #17977
- Greatly reduced the number of health report copies by @mmaslankaprv in #18017
None
No release notes explicitly specified.
- PR #17917 [v23.3.x] archival: Start housekeeping jobs only on a leader by @Lazin
- PR #17927 [v23.3.x] c/hm_backend: cache the collected report by @mmaslankaprv
- PR #17960 [v23.3.x] rptest: fix test_exceed_broker_limit flake by @travisdowns
- PR #17979 [v23.3.x] [CORE-2400] kafka/server: Disable quota balancer by @BenPope
- PR #18019 [v23.3.x] CORE-1752: cst: Downgrade error logs to debug by @abhijat
- PR #18023 [v23.3.x] Backport of #16243 by @mmaslankaprv
- PR #18039 [v23.3.x] c/controller_backend: try to force-abort reconfiguration only on leaders by @ztlpn
- PR #18053 [v23.3.x] Address oversized allocs across kafka API and schema registry by @oleiman
Full Changelog: v23.3.12...v23.3.13
v23.2.28
Bug Fixes
- Fix a race condition between suffix truncation / delete records and adjacent segment compaction that can lead to crashes and data loss. by @nvartolomei in #17254
- Fix initial_leader_epoch/KIP-320 handling in fetch requests. It was previously ignored, which prevented consumers from correctly detecting suffix truncation. For Redpanda (and Raft), this is a minor problem since suffix truncation is a very improbable event. by @nvartolomei in #17727
- Fixes a bug where config_frontend methods were called on shards other than the controller shard. by @pgellert in #17211
- Prevent detecting leader epoch advancement when state is not up to date by @mmaslankaprv in #17882
- Reduces maximum log line size from `1MiB` to `128KiB` to reduce occurrences of memory allocation failures by @michael-redpanda in #17924
- #16612 Fixes a small inconsistency between Kafka and Redpanda when trying to query end_offset of an empty log by @mmaslankaprv in #17881
- #17238 Fixes a bug in CreateTopicsResponse to now return all the configs of the topic, not just the topic-specific override configs. by @pgellert in #17241
- #17790 Fix a bug that could lead to raft log inconsistencies when 2 out of 3 nodes in a configuration are changed. by @ztlpn in #17797
- prevents partial consumer group recovery by @mmaslankaprv in #17882
- PR #17160 [v23.2.x] compression: Allocate memory for LZ4_compressEnd by @abhijat
- PR #17826 [v23.2.x] CORE-1722: compression: Use preallocated decompression buffers for lz4 by @abhijat
- PR #17881 [v23.2.x] k/replicated_partition: fixed querying end offset of an empty log by @nvartolomei
- PR #17882 Backport of #17673 #17498 #16560 by @mmaslankaprv
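For context on the KIP-320 entries in these notes: under KIP-320, a fetch can carry the consumer's view of the current leader epoch, and the leader fences mismatches. This lets a consumer notice a leader change and then check for truncation instead of reading on silently. The sketch below mirrors those standard Kafka rules in simplified form. It is background rather than Redpanda's code, and the `validate_leader_epoch` name is hypothetical.

```python
from enum import Enum


class FetchError(Enum):
    NONE = "NONE"
    FENCED_LEADER_EPOCH = "FENCED_LEADER_EPOCH"    # client's epoch is older
    UNKNOWN_LEADER_EPOCH = "UNKNOWN_LEADER_EPOCH"  # client's epoch is newer


NO_EPOCH = -1  # the client did not send a current leader epoch


def validate_leader_epoch(request_epoch: int, leader_epoch: int) -> FetchError:
    """KIP-320 style check of a fetch request's current leader epoch."""
    if request_epoch == NO_EPOCH:
        return FetchError.NONE  # older clients: nothing to validate
    if request_epoch < leader_epoch:
        return FetchError.FENCED_LEADER_EPOCH  # consumer must refresh metadata
    if request_epoch > leader_epoch:
        return FetchError.UNKNOWN_LEADER_EPOCH  # this replica is behind; retry
    return FetchError.NONE
```

If the epoch sent by the client is not checked at all, a consumer that missed a leader change keeps fetching without being prompted to look for truncation.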
Improvements
- Adds a new public metric redpanda_raft_recovery_partition_movement_consumed_bandwidth that tracks how much bandwidth is currently in use for raft recovery. This helps tune raft_learner_recovery_rate. by @bharathv in #17217
- PR #17397 [v23.2.x] k/group: recover leader epoch on leader change by @nvartolomei
- PR #17448 [v23.2.x] tx: fix param ordering in log statement by @nvartolomei
- PR #17577 [v23.2.x] c/topics_frontend: break the loop when dispatching to current leader by @mmaslankaprv
Full Changelog: v23.2.27...v23.2.28
v23.3.12
Bug Fixes
- Fix an issue where `rpk transform logs` waits for new records even when the `--follow` flag is not specified. by @rockwotj in #17837
- Fixes `rpk transform init --install-deps` so that an explicit true value is not needed. by @rockwotj in #17867
- Fixes a crash when data transforms error and restart by @rockwotj in #17696
- Reduces maximum log line size from `1MiB` to `128KiB` to reduce occurrences of memory allocation failures by @michael-redpanda in #17923
- #16612 Fixes a small inconsistency between Kafka and Redpanda when trying to query end_offset of an empty log by @mmaslankaprv in #17809
- #17718 Fix reported config source for cleanup.policy by reporting DEFAULT_CONFIG instead of DYNAMIC_TOPIC_CONFIG for the default value. by @pgellert in #17719
- #17791 Fix a bug that could lead to raft log inconsistencies when 2 out of 3 nodes in a configuration are changed. by @ztlpn in #17796
- #17817 Fix problem in Tiered-Storage that could potentially cause consumers to get stuck by @Lazin in #17818
- #17891 fix a race between eviction and producer registration that results in an invalid transaction state. by @bharathv in #17900
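The `cleanup.policy` config-source fix (#17719) follows the Kafka DescribeConfigs convention. In the simplified two-source case sketched here, a topic property reports `DYNAMIC_TOPIC_CONFIG` only when the topic carries an explicit override, and `DEFAULT_CONFIG` when the value comes from the default. The names, the defaults table and the `"delete"` default value below are illustrative assumptions, not Redpanda's code.

```python
from typing import Mapping, Tuple

DEFAULT_CONFIG = "DEFAULT_CONFIG"
DYNAMIC_TOPIC_CONFIG = "DYNAMIC_TOPIC_CONFIG"

# Illustrative default only.
TOPIC_DEFAULTS = {"cleanup.policy": "delete"}


def describe_topic_config(overrides: Mapping[str, str], name: str) -> Tuple[str, str]:
    """Return (value, source) for a topic property."""
    if name in overrides:
        # The topic explicitly set this property.
        return overrides[name], DYNAMIC_TOPIC_CONFIG
    # No override: the value comes from the default, so report it as such.
    return TOPIC_DEFAULTS[name], DEFAULT_CONFIG
```

Reporting the source accurately lets tools distinguish "left at the default" from "explicitly set to the same value".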
Improvements
- Handle missing data transform logs topic in `rpk transform logs` by @rockwotj in #17835
- #17197 More accurate node status reporting by @mmaslankaprv in #17698
- Skip the overhead of collecting the node health report for each node separately. by @mmaslankaprv in #17864
- PR #17756 [v23.3.x] kafka: chunked_vector for config responses by @pgellert
- PR #17792 [v23.3.x] CORE-1752: cst: improved logging by @abhijat
- PR #17825 [v23.3.x] CORE-1722: compression: Use preallocated decompression buffers for lz4 by @abhijat
- PR #17888 [v23.3.x] CORE-2365: storage: increase size of offset key map fragment size by @dotnwat
Full Changelog: v23.3.11...v23.3.12
v23.3.11
Features
- Introduce "trust_file_crc32c" metric to export a checksum for each trust file in the system. by @oleiman in #17587
Bug Fixes
- #16650 Fix oversized allocation in storage. by @Lazin in #17541
- #17459 Fixes a bug with TLS metrics where expiration timestamps would not advance on certificate reload by @oleiman in #17460
- rpk: prevent a segfault when creating a profile from a cloud that is not in ready state. by @r-vasquez in #17585
- PR #17435 [v23.3.x] c/frag_vector: added `get_allocator()` method to fragmented vector by @mmaslankaprv
- PR #17449 [v23.3.x] tx: fix param ordering in log statement by @nvartolomei
- PR #17572 [v23.3.x] Fixed `node_hash_map` caused oversized allocations in cluster module by @mmaslankaprv
- PR #17573 [v23.3.x] use chunked vector as batches cache in `raft::replicate_batcher` by @mmaslankaprv
- PR #17576 [v23.3.x] c/topics_frontend: break the loop when dispatching to current leader by @mmaslankaprv
- PR #17578 [v23.3.x] rm_stm: do not hold producer lock for the duration of the barrier by @bharathv
- PR #17584 [v23.3.x] k/group_manager: used chunked_vector when cleaning groups by @mmaslankaprv
Improvements
- #17428 Improves error feedback when Redpanda is given an invalid number of partitions during either topic creation or when the partition count for a topic is increased. by @michael-redpanda in #17431
- PR #17574 [v23.3.x] Improved validation of Fetch requests when reading from follower by @mmaslankaprv
Full Changelog: v23.3.10...v23.3.11