Releases: redpanda-data/redpanda
v23.3.2
Bug Fixes
- Fix an issue with `Cargo.toml` when initializing a Rust Data Transform project via `rpk transform init` by @rockwotj in #15947
- Fixes a crash if a WebAssembly function is deployed that immediately crashes. by @rockwotj in #15943
- Fixes an improper initialization of metrics related to controller snapshot uploads. by @andrwng in #16074
- Have fetch handler ensure rack awareness is enabled before performing follower fetching by @michael-redpanda in #15915
- Prevent an assertion from being triggered when Wasm VMs fail immediately. by @rockwotj in #15941
- Redpanda used to accept an empty string in `redpanda.rack` in node config. This would cause issues in Kafka operations. Redpanda will now error on startup if `redpanda.rack` is set to an empty string. by @michael-redpanda in #15849
- Redpanda will now correctly handle an empty rack ID provided in a fetch request by @michael-redpanda in #15860
- #15928 Prevent oversized allocs when group fetching from many partitions. by @rockwotj in #15929
- ext4 is no longer incorrectly detected as ext2 (all of ext2, 3 and 4 are assumed to be ext4). by @travisdowns in #15855
- Safer handling of unknown properties in local state by @andijcr in #15838
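The `redpanda.rack` fix above distinguishes a rack that is not set from one that is set to an empty string. A minimal sketch of that kind of startup check, assuming a plain dictionary stands in for the parsed node config (the function name `validate_rack` and the dictionary layout are hypothetical, not Redpanda's actual code):

```python
from typing import Optional


def validate_rack(node_config: dict) -> Optional[str]:
    """Return the configured rack, or None if no rack is set.

    Hypothetical helper: raises ValueError when the rack is present but
    empty, mirroring the "error on startup" behavior described above.
    """
    rack = node_config.get("redpanda", {}).get("rack")
    if rack is None:
        return None  # rack not configured: allowed
    if rack == "":
        raise ValueError("redpanda.rack must not be an empty string")
    return rack
```

Rejecting the empty string early keeps an invalid rack ID from reaching rack-aware code paths such as follower fetching.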
Improvements
- Caches the connection's local address, avoiding the need to make a system call to fetch this value when auditing events. by @graphcareful in #15958
- Data Transforms written in Golang now use a non-buffered write mechanism. Transforms that used to be written as by @rockwotj in #15936
- Support changing the timeout for WebAssembly functions by @rockwotj in #15984
- Support dynamically changing the limit for WebAssembly binary size by @rockwotj in #15984
- This PR partially reverts the change such that strict retention remains enabled after upgrade unless it had been explicitly disabled before the upgrade. by @dotnwat in #16084
- #15974 Internal kafka client now uses asynchronous compression (when possible) to reduce possibility of oversized allocations and reactor stalls by @michael-redpanda in #15976
- [rpk] more informative error message display on create topic failure by @michael-redpanda in #15847
- `rpk transform deploy` takes a `--file` flag to deploy a compiled WebAssembly binary. by @rockwotj in #15954
- PR #15843 [v23.3.x] Increase audit buffer sizes for audit scale test by @graphcareful
- PR #15866 [v23.3.x] rptest: log error on failure to delete bucket by @andrwng
- PR #15867 [v23.3.x] rptest: allow the new version of xfs/ext4 fs error msg by @nvartolomei
- PR #15871 [v23.3.x] c/s/leader_balancer: prevent oversized alloc by @rockwotj
- PR #15876 [v23.3.x] t/kgo: upgrade kgo to do a full run after /last_pass by @nvartolomei
- PR #15885 [v23.3.x] gh/workflow: add build message in promote trigger by @gousteris
- PR #15888 [v23.3.x] gha: s/git.ref_name/github.ref_name by @rockwotj
- PR #15930 [v23.3.x] Skip assertion in audit log tests if results beat the baseline by @graphcareful
- PR #15970 [v23.3.x] dt: Fixed flaky schemas test by @michael-redpanda
- PR #15980 [v23.3.x] tx_migration: avoid ping pong of requests between brokers by @bharathv
- PR #15988 [v23.3.x] r/offset_translator: remove unsafe bootstrap code by @ztlpn
- PR #15998 [v23.3.x] c/partition_balancer: use full partition move when disk is full by @mmaslankaprv
- PR #15999 [v23.3.x] security/OIDC: Enable licence check and telemetry by @BenPope
- PR #16065 [v23.3.x] cloud_storage: Improve scrubber by @Lazin
- PR #16073 [v23.3.x] archival: avoid division by 0 when computing slow down rate by @nvartolomei
- PR #16052 [v23.3.x] storage: enable space management by default by @dotnwat
Full Changelog: v23.3.1...v23.3.2
v23.2.22
Bug Fixes
- Redpanda used to accept an empty string in `redpanda.rack` in node config. This would cause issues in Kafka operations. Redpanda will now error on startup if `redpanda.rack` is set to an empty string. by @michael-redpanda in #15848
- Redpanda will now correctly handle an empty rack ID provided in a fetch request by @michael-redpanda in #15861
- #15785 Fix a rare bug where http client connections would vanish from the connection pool leading to various operations hanging while waiting for an http client. by @nvartolomei in #15834
- PR #15873 [v23.2.x] c/s/leader_balancer: prevent oversized alloc by @rockwotj
Full Changelog: v23.2.21...v23.2.22
v23.3.1
Features
- Add recovery mode - an option to start redpanda in "metadata-only" mode, skipping loading user partitions and allowing only metadata operations to enable recovery from fatal misconfiguration or resource exhaustion situations. Enabled by the `recovery_mode_enabled` node config property. by @ztlpn in #14236
- Add broker support for SASL reauthentication. To enable, set `connections_max_reauth_ms > 0`. by @oleiman in #13822
- Adds cloud storage scrubbing capabilities to Redpanda. In brief, the scrubber runs in the background and verifies the integrity of the cloud storage metadata and the existence of data referenced by it. When an issue is discovered, the `redpanda_cloud_storage_anomalies` metric will increment its counters based on the anomaly type. Per-partition anomalies can be queried via the `v1/cloud_storage/anomalies/` admin API endpoint. by @VladLazar in #13253
- Add per-shard download throughput limit to tiered-storage by @Lazin in #13552
- Added a Golang SDK for Data Transforms. by @rockwotj in #12322
- Added the ability to disable whole topics or specific topic partitions. Disabled partitions are ignored by Redpanda and producing/consuming to them is impossible. This is useful when only a handful of partitions prevent the cluster from becoming healthy (e.g. if the data in them is corrupted in a way that causes Redpanda to crash). by @ztlpn in #15141
- Adds an admin API to reset the crash loop prevention counter. Additionally the tracking metadata is reset every time the broker boots up in recovery mode. by @bharathv in #15064
- Adds node-wise partition recovery functionality. Given an input set of node IDs, force reconfigures all partitions that would lose majority if the specified set of nodes are dead. Intended to bulk recover partitions that are stuck when a majority of the brokers hosting their replicas are dead and irrecoverable. by @bharathv in #13943 and #15394
- Admin API: Support OpenID connect Authentication by @BenPope in #14378
- Allow template prefix in advertised addresses by @RafalKorepta in #14177
- K8s: Allow to move pods into new nodes if needed, and allow down scaling Redpanda custom resource. by @alejandroEsc in #12847
- Force reconfiguration now supports recovering from an ‘all replicas lost’ scenario. The replicas are reset with empty logs (no data) on the (broker, shard) destinations passed in the reconfiguration command. by @bharathv in #13661
- HTTP Proxy: Support OpenID connect authentication by @BenPope in #14378
- Adds fast partition movement to speed up cluster scaling operations by @mmaslankaprv in #14305
- Implements audit logging capabilities for the admin server. To configure, set `audit_enabled` to `True` and ensure one of the enabled auditing types in `audit_enabled_event_types` is set to `management`. by @graphcareful in #14158
- Metadata backup: In addition to topic data, Redpanda will now upload cluster-related metadata, such as cluster configs, security settings, and consumer offsets into cloud storage. This metadata can be restored alongside topic data, allowing for the functional recovery of the entire cluster. by @andrwng in #15188
- Introduce Prometheus metrics for tracking TLS cert/CA expiration by @oleiman in #13477
- Kafka API: Adds SASL/OAUTHBEARER authentication to the Kafka API by @BenPope in #14378
- Redpanda is now compatible with Azure Storage Accounts that enable Hierarchical Namespaces. by @VladLazar in #12632
- K8s: Redpanda Node ID will be reported as label and annotation in Pod metadata by @RafalKorepta in #12524
- Schema Registry: Support OpenID connect by @BenPope in #14378
- You can now decode schema registry encoded messages (AVRO or Protobuf) using `rpk topic consume --use-schema-registry` by @r-vasquez in #14498
- #12912 Adds `rpc_client_connections_per_shard` cluster property that allows for the number of clients a broker opens to a given peer to be user configurable. by @ballard26 in #12906
- #12934 #13617 Time-based retention now uses broker-based timestamps for determining when to purge new data. This reduces the risk of retention not removing segments when a misbehaving client produces messages with incorrect timestamps (e.g. a timestamp in the future). by @andijcr in #12991
  - The configuration option `storage_ignore_timestamps_in_future_secs` is retained to deal with bad segments produced before v23.3
  - This changes the behavior for messages with a timestamp in the past. Before, retention would use this timestamp to delete data. Now, the retention window starts when the message arrives in the broker.
- #12989 Schema Registry: Support `GET /schemas/ids/{id}/subjects` by @oleiman in #13020
- #13229 Schema Registry: Add `DELETE /config/{subject}` endpoint by @oleiman in #13557
- #14191 Audit Kafka API authentication and authorization events by @michael-redpanda in #14452
- #14394 rpk: adds `rpk cluster partitions enable/disable`: now users can disable or enable partitions of a topic using rpk. by @r-vasquez in #14909
- #14483 rpk debug bundle: now you can use the `--partition` flag to request additional debugging information from the admin API for the provided partitions. by @r-vasquez in #15136
- #9128 K8s: New Topic Custom Resource that can manage single topic by @RafalKorepta in #11208
- #9205 rpk: new 'rpk cluster partitions move' command to reassign replicas by @daisukebe in #13684
- #9205 rpk: new ‘rpk cluster partitions move-status’ to show ongoing partition movements by @daisukebe in #13258
- adds `rpk redpanda mode recovery` command to put redpanda into recovery mode. by @r-vasquez in #14431
- ability to change number of partitions in tx manager topic by @mmaslankaprv in #15121
- adds support to the Cluster API (deprecated) for a user-specified ServiceAccount by @joejulian in #12864
- adds support to the Console API (deprecated) for a user-specified ServiceAccount by @joejulian in #13120
- crash_loop_limit now defaults to 5. If a broker shuts down uncleanly 5 times back to back, it is considered to be in a crash loop, and Redpanda refuses to start up and may need manual intervention. This enforcement is disabled in `developer` mode and rpk's `dev-container` mode. by @bharathv in #13431
- rpk introduces `rpk cluster partitions list` which lets users query cluster-level metadata of all partitions in the cluster. by @r-vasquez in #14862
- rpk now has a `cluster txn` command space by @twmb in #7557
- rpk now supports encoding messages with a given schema stored in the schema registry when producing via rpk topic produce. by @r-vasquez in #13543
- rpk: Introduce a new flag `--label-selector` to `rpk debug bundle` and have a way to filter the bundled resources by label. by @r-vasquez in #15185
- rpk: new `rpk registry` command to manage schema registry with rpk by @r-vasquez in #12669
- PR #14283 Compaction lossy hash by @dotnwat...
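The time-based retention change listed above can be illustrated with a simplified model. This is a sketch only: it assumes a per-batch decision, whereas real retention works on segments, and the `Batch` type and `is_expired` function are hypothetical names, not Redpanda code.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    client_timestamp_ms: int  # timestamp set by the producer (may be wrong)
    broker_timestamp_ms: int  # time the broker received the batch


def is_expired(batch: Batch, now_ms: int, retention_ms: int) -> bool:
    """Retention decision based only on the broker-side arrival time."""
    return now_ms - batch.broker_timestamp_ms > retention_ms


now = 1_000_000
retention = 60_000

# A client stamped this batch far in the future; it still expires once it
# has been in the broker longer than the retention window.
future_stamped = Batch(client_timestamp_ms=now + 10**9,
                       broker_timestamp_ms=now - 120_000)

# A client stamped this batch far in the past; it is kept because it only
# just arrived at the broker.
past_stamped = Batch(client_timestamp_ms=0,
                     broker_timestamp_ms=now - 1_000)
```

In this model the client timestamp never enters the decision, which is why neither a future-dated nor a past-dated message can distort the retention window.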
v23.2.21
Bug Fixes
- Fix a memory leak when transactions are used with many different producer IDs. by @rockwotj in #15796
- #15688 Topic remote recovery via redpanda.remote.recovery now correctly handles disabled retention by @andijcr in #15689
- #15786 Fix an issue where new configs would continually revert to legacy defaults after an upgrade. by @oleiman in #15787
- rpk: Redpanda log collection is now possible in pods with multiple containers. By default, rpk gathers logs from the 'redpanda' container. by @r-vasquez in #15684
- PR #15597 [v23.2.x] r/state_machine: ignore broken semaphore and condition var exceptions by @mmaslankaprv
- PR #15599 [v23.2.x] tests/e2e_shadow_indexing_test: wait until segments has one entry by @andijcr
- PR #15600 [v23.2.x] archival: Disable cross term merging of adjacent segments by @Lazin
- PR #15602 [v23.2.x] storage/disk_log_impl: defensive reverse iteration of _segs by @andijcr
- PR #15772 [v23.2.x] c/members_member: don't ignore update_broker_client future by @rockwotj
- PR #15780 [v23.2.x] Fix 128K iobuf zero-copy by @travisdowns
- PR #15800 [v23.2.x] c/topic_table: do not log duplicated lifecycle marker command by @mmaslankaprv
Full Changelog: v23.2.20...v23.2.21
v23.2.20
Bug Fixes
- Fixed-limit truncation for request/response logging in the Kafka API by @oleiman in #15621
- Fixes a bug in compacted segment reuploads that could result in overlapping remote segments in the cloud manifest. by @andrwng in #15152
- PR #15682 [v23.2.x] r/persisted_stm: remove persistent state from previous incarnations by @ztlpn
- PR #15686 [v23.2.x] c/partition: fix acquiring _archiver_reset_mutex by @ztlpn
- PR #15699 [v23.2.x] codeowners: update consistency tests codeowner by @r-vasquez
- PR #15732 [v23.2.x] cloud_storage: Make wait for hydration abortable by @andrwng
Full Changelog: v23.2.19...v23.2.20
v23.2.19
Bug Fixes
- #15331 Speeds up convergence of leadership metadata in Redpanda by @mmaslankaprv in #15333
- #15435 kafka/client: More robust error handling during initial connection by @BenPope in #15436
Improvements
- Modifications to avoid stale responses returned from DescribeACLs requests by @graphcareful in #15437
- #15358 Improves space management by helping to reduce bias towards certain partitions when reclaiming space. by @dotnwat in #15359
- #15587 kafka/client: Configuration for consume min and max bytes by @BenPope in #15589
- PR #15246 [v23.2.x] remote_partition_fuzz_test: measure shutdown time directly by @andijcr
- PR #15380 [v23.2.x] rpk produce: Adjust timeout and maxBytes options by @r-vasquez
- PR #15389 [v23.2.x] rptest: Ignore transient log error message by @Lazin
- PR #15419 [v23.2.x] c/partition_balancer: handle members_table update during planning by @ztlpn
- PR #15421 [v23.2.x] tests: fix FollowerFetchingTest.test_with_leadership_transfers by @ztlpn
Full Changelog: v23.2.18...v23.2.19
v23.1.21
Bug Fixes
- Fixes a bug in read replicas that were subject to unstable leadership that could create corrupted local segment files (note, segments in the cloud are safe). by @andrwng in #14634
- #14800 Fix a bug where producer would get `INVALID_PRODUCER_ID_MAPPING` if the leader of the transaction coordinator partition would change. by @nvartolomei in #14802
- #15048 Adds previously missing authorization checks to Transactions API by @oleiman in #15060
- #15239 Fix NotFound error handling when using Google Cloud Storage backend for Tiered Storage. by @nvartolomei in #15241
- #15280 Fixes an issue where lookup would fail for URL encoded username parameter (`DELETE/PUT /v1/security/users/{user}`) by @oleiman in #15283
- #15433 kafka/client: More robust error handling during initial connection by @BenPope in #15434
- Prevents consumers from reading data that is not majority-acknowledged by @mmaslankaprv in #15330
Improvements
- Avoid 100% reactor utilization in case of state machine errors. by @ztlpn in #15286
- Schema Registry: Improve compatibility of reading schemas that were created by another registry. by @BenPope in #14876
- #14833 Redpanda: Lower the log level if `GNUTLS_E_DECRYPTION_FAILED` is encountered. by @BenPope in #14869
- #15275 CreatePartitionsAPI responds with REASSIGNMENT_IN_PROGRESS when servicing a request for a topic that also has an active partition reassignment. by @NyaliaLui in #15276
- #15588 kafka/client: Configuration for consume min and max bytes by @BenPope in #15590
- redpanda: Make `redpanda_cpu_busy_seconds_total` metric a counter instead of a gauge. by @BenPope in #14928
- PR #14293 [v23.1.x] Fix for CI Failure (invalid timer interval) in `UsageTest.test_usage_metrics_collection` by @graphcareful
- PR #14497 [v23.1.x] Use trunk for promote job by @nk-87
- PR #14613 [v23.1.x] cmake: exclude operator tags when determining version by @rockwotj
- PR #14753 [v23.1.x] dt/memory_stress_test: Mark `test_fetch_with_many_partitions` `@ok_to_fail` by @michael-redpanda
- PR #14902 [v23.1.x] Remove k8s operator code from monorepo by @nk-87
- PR #14932 [v23.1.x] tests: disable minio console by @jcsp
- PR #15387 Revert "[v23.1.x] redpanda: Make `redpanda_cpu_busy_seconds_total` a counter" by @rockwotj
Full Changelog: v23.1.20...v23.1.21
v23.2.18
Features
- #15074 rpk introduces `rpk cluster partitions list` which lets the user query the list of the partitions for a topic. by @r-vasquez in #15119
- #15235 rpk: add 'partitions move' to reassign replicas by @daisukebe in #15236
Bug Fixes
- Fixes an issue where adjacent segment compaction concatenates with an empty index, resulting in an incorrect concatenated segment. by @bharathv in #15372
- Prevent data balancing from scheduling partition moves back and forth when rack awareness and preventing full disk are in conflict (the latter gets a priority). by @ztlpn in #15292
- #15047 Adds previously missing authorization checks to Transactions API by @oleiman in #15061
- #15173 #15367 Fix a bug that resulted in `offset_out_of_range` errors for valid fetch offsets when fetching from followers that had recently lost leadership. by @ztlpn in #15369
- #15238 Fix NotFound error handling when using Google Cloud Storage backend for Tiered Storage. by @nvartolomei in #15240
- #15279 Fixes an issue where lookup would fail for URL encoded username parameter (`DELETE/PUT /v1/security/users/{user}`) by @oleiman in #15282
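The URL-encoded username fix above concerns usernames that contain characters which must be percent-encoded in a path segment. A minimal client-side sketch using Python's standard library, assuming the `/v1/security/users/{user}` path from the note (the helper names are hypothetical, and this is not Redpanda's server-side implementation):

```python
from urllib.parse import quote, unquote


def user_path(username: str) -> str:
    """Build the request path, percent-encoding the username segment."""
    # safe="" also encodes "/" so the username stays a single path segment.
    return "/v1/security/users/" + quote(username, safe="")


def username_from_path(path: str) -> str:
    """Recover the username by decoding the last path segment."""
    return unquote(path.rsplit("/", 1)[-1])
```

A server that compares the still-encoded segment against stored usernames would miss the match, which is the kind of lookup failure the fix describes.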
Improvements
- Avoid 100% reactor utilization in case of state machine errors. by @ztlpn in #15259
- #10963 Expose metric for total sent/received bytes from clients on public_metrics by @StephanDollberg in #15315
- #13992 #15080 #14829 #14820 easier to understand transaction related logs by @mmaslankaprv in #15139
- #14826 Adds a metric to track fetch plan and execute latency by @StephanDollberg in #15129
- #15173 #15367 more information when investigating offset out of range errors by @mmaslankaprv in #15369
- #15272 CreatePartitionsAPI responds with REASSIGNMENT_IN_PROGRESS when servicing a request for a topic that also has an active partition reassignment. by @NyaliaLui in #15274
- PR #14903 [v23.2.x] Remove k8s operator code from monorepo by @nk-87
- PR #15105 [v23.2.x] c/topic_table: fix force_update replicas_revisions update by @ztlpn
- PR #15125 raft/state_machine: log error for unknown exception by @andijcr
- PR #15126 [v23.2.x] tests/cluster_config_test: _check_value_everywhere retry and better error message by @andijcr
- PR #15139 [v23.2.x] Backport of #13964 #14739 #14803 #15049 #15032 by @mmaslankaprv
- PR #15148 [v23.2.x] bytes/io_iterator_consumer: add backtrace to std::out_of_range error by @andijcr
- PR #15156 Cleanup k8s operator code by @nk-87
- PR #15166 [v23.2.x] Feat/partition manifest log path by @andijcr
- PR #15175 Backport pr 14434 v23.2.x 808 by @graphcareful
- PR #15177 [v23.2.x] archival/test: Ensure archiver is stopped by @abhijat
- PR #15204 [v23.2.x] format exception fixes by @andijcr
- PR #15329 [v23.2.x] segment_meta_cstore: further restriction on hints when materializing the iterator by @andijcr
- PR #15345 [v23.2.x] Simplified mapping transaction id to coordinator partition by @mmaslankaprv
- PR #15364 [v23.2.x] rpk: manually update golangci-lint version by @r-vasquez
- PR #15384 Revert "[v23.2.x] redpanda: Make `redpanda_cpu_busy_seconds_total` a counter" by @rockwotj
Full Changelog: v23.2.17...v23.2.18
v23.2.17
Features
- Adds `rpc_client_connections_per_shard` cluster property that allows for the number of clients a broker opens to a given peer to be user configurable. by @ballard26 in #14907
Bug Fixes
- Fixed transaction status being incorrectly reported when listing transactions via Kafka API by @mmaslankaprv in #14886
- Fixes a bug in segment reupload that could cause the operation to fail after a delete-records request. by @andrwng in #14839
- #15087 Fix a rare bug causing compaction to fail when lz4 compression is in use with `lz4f_compressupdate error:ERROR_dstMaxSize_tooSmall`. by @nvartolomei in #15088
- Prevents consumers from reading data that is not majority-acknowledged by @mmaslankaprv in #14918
- PR #14881 [v23.2.x] Add proper exception handler to around eviction_stm's call to replicate by @graphcareful
- PR #14981 [v23.2.x] Fixed handling Raft snapshot by @mmaslankaprv
- PR #14997 [v23.2.x] c/archival_stm: translate sync error to not_leader error code by @mmaslankaprv
- PR #15008 [v23.2.x] Better error message when topic creation fails by @travisdowns
- PR #15025 storage: fix variable name typo in gc logging statement by @dotnwat
Full Changelog: v23.2.16...v23.2.17
v23.2.16
Bug Fixes
- #14799 Fix a bug where producer would get `INVALID_PRODUCER_ID_MAPPING` if the leader of the transaction coordinator partition would change. by @nvartolomei in #14801
- removed assumptions of no gaps and alignment from segment_meta_cstore by @andijcr in #14954
- PR #14851 [v23.2.x] cloud_storage_clients: Fix xml parsing when searching for error code in response by @abhijat
- PR #14934 [v23.2.x] tests: retry on timeout in partition balancer test by @mmaslankaprv
- PR #14944 [v23.2.x] rpk: add no-browser to cloud login by @r-vasquez
- PR #14952 [v23.2.x] cloud_storage: Add client address to fetch and log reader config by @abhijat
- PR #14958 [v23.2.x] cloud_storage: Log client address in batch parser by @abhijat
Improvements
- Schema Registry: Improve compatibility of reading schemas that were created by another registry. by @BenPope in #14875
- #14832 Redpanda: Lower the log level if `GNUTLS_E_DECRYPTION_FAILED` is encountered. by @BenPope in #14868
- redpanda: Make `redpanda_cpu_busy_seconds_total` metric a counter instead of a gauge. by @BenPope in #14926
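A counter only increases (apart from resets on restart), so a consumer derives utilization from the difference between two samples rather than reading the value directly. A minimal sketch, assuming two scrapes of `redpanda_cpu_busy_seconds_total` taken at known times (the function and parameter names are hypothetical):

```python
def cpu_utilization(prev_busy_s: float, curr_busy_s: float,
                    prev_time_s: float, curr_time_s: float) -> float:
    """Fraction of wall-clock time spent busy between two counter samples."""
    elapsed = curr_time_s - prev_time_s
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (curr_busy_s - prev_busy_s) / elapsed
```

For example, a counter that moves from 100 to 130 busy seconds over a 60 second interval corresponds to 50% utilization.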
Full Changelog: v23.2.15...v23.2.16