EMQX Enterprise 6.1.2 #17527
zmstone
started this conversation in
Show and tell
Breaking Changes
Breaking change in a bug fix due to 1) Security enhancement; 2) Feature not used before.
#17157 Introduced a new Rule Engine configuration, rule_engine.limit_selects_in_namespace, whose default value is true. When enabled, rules will only trigger on messages published by clients on the same namespace as the rule itself.
#17325 Removed the hot-upgrade REST API endpoints (/api/v5/relup/*).
Hot-upgrade is now operated exclusively through the emqx ctl relup CLI on each node; there is no dashboard surface.
The target release tarball can be placed anywhere readable by the EMQX process (no special staging directory). Trigger with emqx ctl relup upgrade <TarballPath>. A <TarballPath>.sha256 sidecar is required next to the tarball; the target version is read from the tarball's own releases/emqx_vars (REL_VSN).
Enhancements
Security Hardening
#17040 Restricted API key access to dashboard user-account management endpoints.
Previously, an API key with the administrator role could call the dashboard user management endpoints POST/DELETE /users/:username/mfa and POST /users/:username/change_pwd via HTTP Basic authentication. This meant an API key could reset or disable another dashboard user's MFA, or change another dashboard user's password, bypassing the intended separation between human dashboard sessions and machine API keys.
These endpoints now return 401 API_KEY_NOT_ALLOW when accessed via an API key, matching the existing policy that already blocks API key access to /users, /users/:username, /logout, and /api_key. Dashboard users can still manage their own MFA and password from the dashboard UI using bearer-token (JWT) sessions as before.
#17065 Added SSRF protection for rule-engine-reachable connector and bridge configurations.
When rule_engine.ssrf.enable is set to true, EMQX applies an outbound SSRF policy to connector, bridge and action configurations. Exact matches in rule_engine.ssrf.deny_hosts are rejected immediately, resolved target IPs are checked against rule_engine.ssrf.allow_cidrs before rule_engine.ssrf.deny_cidrs, and the default denied ranges cover loopback, link-local (including cloud instance-metadata endpoints), RFC1918, ULA, unspecified and multicast ranges. The check runs at config-update time and covers HTTP url fields as well as server / servers / bootstrap_hosts style fields across all connector families.
The feature is disabled by default to preserve compatibility with deployments whose connectors legitimately point at internal services. Operators in multi-tenant or externally-exposed setups are encouraged to enable it, alongside a network-layer egress firewall.
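The evaluation order described for #17065 can be sketched as follows. This is a minimal illustration and not EMQX's implementation: the function name is_allowed is hypothetical, the caller is assumed to have already resolved the host to IP addresses, and the CIDR list is an assumed rendering of the default denied ranges named above (the exact list EMQX ships is not given in these notes).

```python
import ipaddress

# Assumed CIDR equivalents of the default denied ranges named in the notes
# (loopback, link-local, RFC1918, ULA, unspecified, multicast).
DEFAULT_DENY_CIDRS = (
    "127.0.0.0/8", "::1/128",                          # loopback
    "169.254.0.0/16", "fe80::/10",                     # link-local (incl. metadata endpoints)
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # RFC1918
    "fc00::/7",                                        # ULA
    "0.0.0.0/32", "::/128",                            # unspecified
    "224.0.0.0/4", "ff00::/8",                         # multicast
)


def is_allowed(host, resolved_ips, deny_hosts=(), allow_cidrs=(),
               deny_cidrs=DEFAULT_DENY_CIDRS):
    """Return True if a target passes the sketched outbound policy."""
    # 1. Exact host matches in deny_hosts are rejected immediately.
    if host in deny_hosts:
        return False
    allow = [ipaddress.ip_network(c) for c in allow_cidrs]
    deny = [ipaddress.ip_network(c) for c in deny_cidrs]
    for text in resolved_ips:
        ip = ipaddress.ip_address(text)
        # 2. allow_cidrs is consulted before deny_cidrs, so an allow match wins.
        if any(ip in net for net in allow):
            continue
        # 3. Otherwise any deny match rejects the whole target.
        if any(ip in net for net in deny):
            return False
    return True
```

With these assumptions, a connector pointing at 10.0.0.5 is rejected by default and accepted once 10.0.0.0/24 is added to the allow list, which mirrors the "allow before deny" ordering in the entry.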
#17173 Restrict API keys from exporting or importing dashboard accounts and API keys via the data backup endpoints.
- POST /data/export called with an API key now silently omits the dashboard_users and api_keys mnesia table sets from the resulting archive.
- POST /data/import called with an API key now returns 403 FORBIDDEN when the uploaded backup contains either of those table sets.
Dashboard bearer-token (login) callers are unaffected and continue to be able to back up and restore the full database, including dashboard users and API keys.
This closes a privilege-escalation gap where an API key holder could read or write dashboard login credentials and API key records, which the existing /users and /api_key endpoints already deny to API keys, by going through the data backup endpoints instead.
#17187 Removed the EMQX release version (rel_vsn) from the unauthenticated GET /status?format=json response to avoid disclosing the broker version to unauthenticated callers. The version remains available via the authenticated node-info APIs.
#17201 Hardened the plugin install endpoint against path traversal in uploaded tarballs and tightened the install allowlist.
emqx ctl plugins allow <name-vsn> entries now expire 5 minutes after they are issued, and may be pinned to a SHA-256 hash of the package via emqx ctl plugins allow <name-vsn> sha256:<HEX>. Uploads whose contents do not hash to the pinned value are rejected with 403 Forbidden. The previous behavior of accepting any payload named <name-vsn>.tar.gz is preserved when the optional sha256: argument is omitted.
#17252 Publish .sha256 checksum sidecars alongside plugin packages on the official download site, so users can verify the integrity of downloaded plugin archives.
#17271 Hardened the official EMQX docker image to clear image-scanner findings:
- [...] libssl3t64.
- Removed the libgnutls30t64 package. EMQX talks TLS via OpenSSL through Erlang/OTP and never links GnuTLS, so it was only present as a transitive dependency of curl and showed up in scanner reports.
- Replaced the curl package, which would have transitively re-introduced libgnutls30t64 via librtmp1, with a statically-linked curl binary from https://github.com/stunnel/static-curl (OpenSSL, HTTP/2, HTTP/3; no RTMP, no GnuTLS). Container healthchecks that call curl continue to work unchanged.
#17309 Sanitize PROXY-Protocol v2 SSL Common Name / Subject before they enter client identity.
When a listener is configured with proxy_protocol = true, the broker now rejects connections whose PROXY-Protocol SSL TLV bytes contain ASCII control characters (the same byte class already rejected on MQTT-ingested clientid/username/password). This blocks attacker-controlled bytes from being smuggled into outbound HTTP authentication, authorization, or rule-engine header values via ${cert_common_name} and ${cert_subject} templates.
As an additional defense layer, the HTTP authentication and authorization clients now refuse to send a request when a rendered header name or value contains a CR, LF, or NUL byte.
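The two checks described in #17309 can be illustrated with a short sketch. Both function names are hypothetical, and treating DEL (0x7F) as a control character alongside 0x00-0x1F is an assumption; the notes only say "ASCII control characters" without listing the exact byte class EMQX rejects.

```python
def has_ascii_control(data: bytes) -> bool:
    """True if any byte is an ASCII control character.

    Assumption: the class is 0x00-0x1F plus DEL (0x7F).
    """
    return any(b < 0x20 or b == 0x7F for b in data)


def header_is_sendable(name: str, value: str) -> bool:
    """Second layer: refuse a rendered header containing CR, LF, or NUL."""
    forbidden = ("\r", "\n", "\x00")
    return not any(ch in name or ch in value for ch in forbidden)
```

A Common Name such as "device-1\r\nX-Admin: true" would fail both checks, so it could neither enter client identity nor be rendered into an outbound header.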
#17315 Extend the byte-class check applied to MQTT clientid / username / password to other fields that feed ClientInfo and HTTP request templating:
- peersni (TLS Server Name Indication; also accepted from the PROXY-Protocol v2 authority TLV) is now validated at the connection ingestion boundary. Control characters cause the connection to be rejected and a warning logged.
- mqtt.client_attrs_init Variform expressions are dropped (with a warning) when they contain control characters, so templates such as ${client_attrs.tns} cannot carry injected bytes downstream.
#17440 Restricted downloading of stored backup files (GET /api/v5/data/files/<filename>) to the global dashboard administrator. Backup archives may contain dashboard accounts (with password hashes and MFA / TOTP state) and API key records, so API key callers, dashboard viewers, and namespaced administrators are no longer permitted to download them. Listing the backup directory (GET /api/v5/data/files) remains available to all roles that previously had access.
#17491 Fixed password and secret leaks in gateway authentication APIs, gateway error paths, and gateway debug logs. Gateway authentication API responses now redact secrets while preserving raw configuration. Gateway authentication failures, listener start errors, ExProto authentication logs, CoAP token-required logs, and LwM2M invalid-register logs no longer print raw passwords or secrets.
#17501 Block namespaced dashboard users from reading MQTT message content that crosses namespace boundaries.
- [...] 403 FORBIDDEN: GET /clients/:clientid/mqueue_messages, GET /clients/:clientid/inflight_messages, GET|DELETE /mqtt/retainer/messages, GET|DELETE /mqtt/retainer/message/:topic, GET /mqtt/delayed/messages, GET|DELETE /mqtt/delayed/messages/:node/:msgid, DELETE /mqtt/delayed/messages/:topic. Previously a namespaced user could read or delete messages produced by other namespaces.
- GET /trace lists only traces created by the caller's namespace, and the per-name endpoints (/trace/:name, /trace/:name/download, /trace/:name/log, /trace/:name/log_detail, /trace/:name/stop) return 404 when the trace belongs to a different namespace (so the response does not leak that other-namespace traces exist). The bulk DELETE /trace (clear-all) is reserved for the global administrator; namespaced callers get 403. Namespaced administrators can still create, list, download, stream, stop and delete their own traces, which keeps the existing per-namespace event filtering (#17406) useful end to end.
Clustering
#17076 Introduced routing schema v3 with per-node ownership of route table entries.
With schema v3, each node (core or replicant) takes full ownership of route entries pointing toward it, giving peer nodes read-only access. This improves partition tolerance of the cluster (peer nodes in a partitioned cluster cannot change route entries on behalf of other nodes) and reduces SUBACK latency on replicant nodes.
Backward compatibility: when a node supporting v3 joins a cluster of nodes only supporting v2, it keeps using v2 for compatibility. To switch the cluster to v3, perform a full restart of the cluster after upgrade. Set broker.routing.storage_schema = v2 to opt out. Note that rolling downgrade becomes impossible after the cluster switches to v3.
Check the active schema with emqx eval 'emqx_router:get_schema_vsn()'.
#17152 Added support for configuring Erlang inet port options for the distribution port, with a default buffer size of 1 MB.
Previously, the Erlang distribution port used an extremely small default port buffer (1460 bytes, or ~9 KB on some platforms), which caused performance bottlenecks even when the distribution port buffer (+zdbbl) was configured to a much larger value (e.g., 32 MB). This affected cluster communication reliability and could manifest as erpc timeout errors, Mnesia transaction congestions, and degraded multi-core node support.
Observability
#16911 Reduce the overhead of Prometheus metrics collection by avoiding accidental repeated queries of Mria statistics.
#16916 The emqx_cert_expiry_at Prometheus metric now takes into account the expiry date of certificates that belong to managed certificate bundles, when they are used in MQTT listeners.
#16958 Added focused /api-spec endpoints and a dashboard API spec explorer page for easier browsing of EMQX HTTP API documentation.
The dashboard now serves tag-scoped and drill-down OpenAPI slices, and these endpoints are disabled together with Swagger when dashboard.swagger_support is set to false. Added emqx ctl api_keys CLI commands to list, show, add, delete, enable, and disable API keys from the command line.
#17018 Reduced the number of calls to other nodes performed when calling the Prometheus scraping API endpoint. This makes the API call return faster and reduces the chance of it timing out when the cluster is under strain.
Specifically, the emqx_mria_lag metric, which is of interest to replicant nodes, is now refreshed periodically (every 10 seconds by default) instead of on demand for each API call.
#17162 Exposed per-node license info via Prometheus gauges (emqx_license_max_sessions, emqx_license_expiry_at, emqx_license_issued_at) so cluster-wide license consistency can be alerted on without per-node CLI checks.
Operators can now alert on license inconsistencies across cluster nodes by comparing these gauges. The implementation fetches all three values from a single emqx_license_checker:dump/0 gen_server call, eliminating a redundant round-trip on every Prometheus scrape.
#17176 Added emqx_routes_count and emqx_routes_max Prometheus metrics to export the number of route table entries per node.
#17329 Added two node-wide gauge metrics to the /api/v5/prometheus/stats endpoint:
- emqx_vm_uptime_ms reports the EMQX node uptime in milliseconds.
- emqx_vm_max_fds reports the maximum number of file descriptors available to the node.
Access Control
#16849 Added cookie-based authentication fallback for plugin API endpoints.
Plugin UI iframes served by the dashboard can now authenticate via the emqx_auth cookie when no Authorization header is present. This only applies to /api/v5/plugin_api/... paths.
#16942 #17235 Introduced fine-grained scope-based access control for both API keys and dashboard login users.
API keys can now be restricted to specific API path categories using scopes derived from OpenAPI tags. Keys without scopes retain full access (backward compatible). An empty scopes list denies all scoped API paths. The publisher API-key role is now constrained to [publish] only.
Dashboard login users now also carry an optional scopes field; when set, requests are authorized against the same path-to-scope catalog used for API keys, layered on top of the existing role-based check. Four new scopes (user_management, mfa_management, sso_management, api_key_management) cover dashboard-only endpoints and are admin-only except mfa_management, which any role may hold for self-exemption from forced MFA. API keys cannot hold any of the four login-only scopes. Both checks apply to the HTTP API and to bootstrap-file loading (incompatible scopes are dropped with a warning).
New public catalog endpoints expose the scope vocabulary for UI consumption: GET /api_key_scopes and GET /user_scopes, both accessible to any bearer-authenticated caller. The scopes field is also surfaced in GET /users, POST /users, and PUT /users/:username responses; when not explicitly set, the response projects the role-default scope list.
Additional behavior changes that follow from the new scope model:
- The dashboard.default_username user is protected as a break-glass account. It cannot be deleted, demoted from administrator, or have its scopes field set; only its description may be changed. This guarantees an operator always retains administrative access if other administrators lose or misconfigure their scopes.
- [...] PUT /users/:self are subject to the user's scopes.
- PUT /users/:username and PUT /api_key/:name validate role changes against the effective persisted scopes when the request body omits the scopes field. Demoting a user or changing an API key role is rejected if the persisted scopes are incompatible with the new role.
- [...] (key:secret:role:scopes). Unknown or role-incompatible scope names are dropped with a warning rather than rejecting the whole file, so existing three-column bootstrap files remain loadable.
- [...] (GET /sso/saml/metadata) is now reachable without authentication, matching /sso/saml/acs.
#16943 Added per-backend force_mfa option for SSO (OIDC/SAML/LDAP).
When enabled, SSO users must complete TOTP MFA setup or verification before receiving a dashboard token, regardless of IDP-side MFA settings. Supports three MFA states: not_configured (force setup), enabled (require verification), and admin_disabled (skip MFA). New API endpoints POST /sso/mfa/setup and POST /sso/mfa/verify handle the MFA flow.
Existing users can be exempted or required individually by an administrator via DELETE/POST on /users/:username/mfa, and that decision overrides the live backend policy until the administrator changes it. SSO users on a force_mfa = true backend who disable their own MFA are required to set MFA up again on the next login; only an administrator-initiated disable exempts a user from the live policy.
#17178 The emqx ctl api_keys add CLI command now accepts a --scopes <scope1,scope2,...> option, matching the scope-based permission control already supported by the REST API.
#17218 Added an ACME client plugin (emqx_acme) that issues and renews TLS certificates from any RFC 8555 ACME CA (e.g. Let's Encrypt) into an EMQX managed certificate bundle, and rewrites the configured SSL/WSS and/or dashboard HTTPS listeners to consume that bundle.
Multi-tenancy
#17053 Added a new multi-tenancy configuration option multi_tenancy.post_auth_tns_expression.
When configured, it is a Variform expression evaluated after the authentication chain completes. Its rendered value is written into client_attrs.tns, the tenant namespace key used by multi-tenancy quota and routing decisions.
This lets operators derive the tenant namespace from authentication-response attributes (for example, a tag field returned by an HTTP auth backend) instead of relying only on pre-authentication mqtt.client_attrs_init. Example expressions: client_attrs.tag, or with a fallback coalesce(client_attrs.tag, username).
When the expression is empty (default), behavior is unchanged.
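To make the fallback expression in #17053 concrete, here is a rough Python model of what coalesce(client_attrs.tag, username) is meant to compute. It is only a sketch: the helper names are hypothetical, and it assumes that Variform's coalesce yields the first argument that is neither missing nor an empty string, which these notes do not spell out.

```python
def coalesce(*values):
    """Return the first value that is neither None nor an empty string.

    Assumption: this mirrors the Variform coalesce semantics.
    """
    for value in values:
        if value not in (None, ""):
            return value
    return ""


def post_auth_tns(client_attrs: dict, username: str) -> str:
    """Model of coalesce(client_attrs.tag, username), evaluated after auth."""
    return coalesce(client_attrs.get("tag"), username)


# After authentication, an HTTP auth backend may have supplied a tag attribute.
attrs = {"tag": "tenant-a"}
attrs["tns"] = post_auth_tns(attrs, "alice")
```

When the auth backend returns no tag, the same expression falls back to the username, so every authenticated client still ends up with a namespace value.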
#17078 Inlined each managed namespace's configuration (session and limiter) in the response of GET /api/v5/mt/managed_ns_list_details, so management UIs can render a list of namespaces with their configuration in a single request instead of one additional call per namespace.
Gateway
#17013 Added GBT32960-2025 protocol support to the GBT32960 gateway.
The gateway now automatically detects the protocol version by frame header (## for 2016, $$ for 2025) and handles version-specific parsing and serialization, including: [...]
Data Integration
#16929 Two new limiter kinds are introduced: delivery_messages and delivery_bytes. In contrast to the existing messages and bytes limiters, which limit messages published by a single client, the new limiters throttle messages received by a single client from any source. If the limit is hit, QoS 0 messages are dropped, QoS > 0 messages are queued internally, and a retry is scheduled. The retry time is derived from the limiter's configuration.
The new limiters are only supported for memory sessions (durable_sessions.enable = false).
If unspecified, the default values are unlimited, thus keeping backwards compatibility.
#16962 Improved Kafka source polling behavior by ensuring fetch requests wait briefly for data instead of returning empty batches immediately when no records are available. This reduces unnecessary polling delays and helps Kafka consumers receive new records more consistently.
#17011 Added ts_column and ttl configuration fields to the EMQX Tables (Rust NIF driver) connector.
- ts_column: Specifies a custom timestamp column name for auto-created tables (defaults to ts if not set).
- ttl: Sets the time-to-live hint for auto-created tables (e.g., 3 days).
These fields were already supported by the underlying greptimedb-ingester-erlnif driver (since 0.1.8) and are now exposed in the EMQX Tables connector configuration.
#17025 Changed the way health checks and credential verification are performed for InfluxDB.
It no longer performs checks by executing SHOW DATABASES, which could be falsely flagged as a system penetration by some auditing systems.
See also emqx/influxdb-client-erl#54
#17031 Added session high-watermark history for license usage auditing.
EMQX now records the daily peak session count and retains at least 24 months of history. Operators can query this data via emqx ctl license history with optional --period daily|monthly and --json flags. A new license.high_watermark_timezone config controls the day boundary for bucketing.
#17046 Added a new metric actions.messages (and the corresponding actions_messages_rate in the dashboard monitor API) that counts the total number of messages handled by rule-engine action executions.
Because a single action execution may handle a batch of messages, actions.messages is greater than or equal to actions.executed, and actions_messages_rate reflects the true per-message throughput of actions.
#17089 MQTT ingress bridges now support consuming from remote message queues exposed as $queue/{name}/{bind-filter} when the remote broker supports MQTT 5 Subscription Identifiers. Queue subscriptions are rejected when Subscription Identifiers are unavailable, and regular topic subscriptions automatically retry without Subscription Identifiers if the remote broker does not accept them.
#17104 Blob name templates in aggregated upload actions (Azure Blob Storage, Amazon S3, GCS, Snowflake, S3 Tables) now accept date-part placeholders ${datetime.YYYY}, ${datetime.MM}, ${datetime.DD}, ${datetime.hh}, ${datetime.mm}, ${datetime.ss}, and ${datetime.DOY} (day of year), defaulting to UTC and rendered against the aggregation start time. Each part token may be prefixed with an explicit timezone, either utc (same as no prefix) or local (EMQX node's system timezone), e.g. ${datetime.local.YYYY} or ${datetime.utc.hh}. This enables Hive-partitioned object layouts (e.g. year=2025/month=04/day=22/hour=07/...) that are directly consumable by Spark, Databricks, and Synapse.
#17120 Added a new query string filter option to GET /clients_v2: node. When specified, online clients connected to the supplied node will be returned, as well as disconnected clients last connected to it.
#17136 Added the ping_with_auth option for InfluxDB connectors. When enabled, health checks include the configured credentials for InfluxDB-compatible services that require authenticated health check requests. Also fixed the InfluxDB connector/action to preserve Unicode text when writing values from write_syntax literals or MQTT payloads.
#17165 Added the resource_opts.dispatch_strategy option for actions.
The new option defaults to per_clientid, preserving the previous buffer worker dispatch behavior. Setting it to random makes queries without an explicit pick_key use a random dispatch key, which helps spread traffic across multiple buffer workers when a small number of clients publish a large number of messages.
#17170 #17282 #17297 Added tcp_opts (nodelay, sndbuf, recbuf, buffer, keepalive, delay_send, active_n) to the MQTT bridge connector and Cluster Link configurations, so the outbound MQTT client TCP socket can be tuned per connection. Unset fields keep the operating system / gen_tcp defaults. delay_send (off by default) coalesces small writes for better throughput at the cost of a small latency increase.
#17221 Improved Cluster Linking diagnostics for MQTT message forwarding.
When message forwarding connections experience connectivity issues, the link resource status and respective alarms now include the disconnect reason, making configuration problems easier to identify.
#17245 Added Chinese and English translations for the MQTT Disk-Queue bridge plugin's configuration UI in the Dashboard.
Deployment
[...] service.wsEnabled option to the Helm chart to suppress the ws/wss Service port entries when MQTT WebSocket listeners are disabled. Defaults to true to preserve existing behavior.
Bug Fixes
Core MQTT Functionalities
#16779 Improve handling of malformed first packets by classifying them as invalid CONNECT packets and adding better protocol hints in logs.
#16781 Fixed CONNECT validation when retained messages are unavailable.
When mqtt.retain_available is set to false, CONNECT packets with Will Retain set are now correctly rejected with CONNACK reason Retain not supported (0x9A).
#16783 Fixed MQTT v5 SUBSCRIBE validation for Subscription-Identifier upper bound.
EMQX now accepts 268435455 (0x0FFFFFFF), which is the maximum valid Subscription Identifier value defined by the MQTT spec.
#16847 Fixed a crash when a non-ASCII Unicode string is used in a message transformation expression.
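On the Subscription Identifier bound in #16783: MQTT 5 encodes the identifier as a Variable Byte Integer of at most four bytes with seven payload bits each, which is where 268435455 (2^28 - 1) comes from. The sketch below shows that encoding; the function and constant names are illustrative and are not taken from EMQX.

```python
MAX_VARIABLE_BYTE_INTEGER = 0x0FFFFFFF  # 268435455, four bytes of 7 bits each


def encode_variable_byte_integer(value: int) -> bytes:
    """Encode an integer as an MQTT Variable Byte Integer."""
    if not 0 <= value <= MAX_VARIABLE_BYTE_INTEGER:
        raise ValueError("value does not fit in a Variable Byte Integer")
    out = bytearray()
    while True:
        byte = value % 128
        value //= 128
        if value > 0:
            byte |= 0x80  # continuation bit: more bytes follow
        out.append(byte)
        if value == 0:
            return bytes(out)


def is_valid_subscription_id(value: int) -> bool:
    """MQTT 5 allows Subscription Identifiers from 1 to 268435455."""
    return 1 <= value <= MAX_VARIABLE_BYTE_INTEGER
```

The upper bound is inclusive, so a validator that rejects 268435455 itself is off by one, which is the class of bug this entry describes.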
#16874 Fixed a rare issue where Durable Storage backed by DS Raft could stop accepting new messages after a sequence of quick cluster leadership changes, requiring a node restart to recover.
#16876 Changed log message 'msg_publish_not_allowed' to 'msg_not_routed_to_subscribers'.
#16974 In EMQX 6.1.1, when a session was subscribed to a topic filter containing retained messages and was later taken over or resumed without re-subscribing to the same topic filter, it would receive the already-delivered retained messages again. The previous behavior is now restored: upon session resumption or takeover without explicit re-subscription, retained message iteration ceases.
#17139 Restored retainer.enable as a real runtime switch for the retainer subsystem.
This allows deployments to keep MQTT retained-message protocol support enabled while disabling retained-message storage, instead of relying on mqtt.retain_available, which can reject retained publishes at the protocol layer.
#17172 Fixed an issue where MQTT packets (such as PUBACK) sent by a client right before disconnecting could be lost when the connection process had pending outbound messages in its mailbox. Now the connection process correctly drains its mailbox before shutting down, ensuring that inbound packets are processed even after the socket is closed.
#17175 Fixed an issue where messages delivered from Streams did not apply subscription options such as Subscription Identifier from the stream subscription.
#17353 Fixed an issue in the socket TCP backend where outbound MQTT packets could be sent in the wrong order when a client connection experienced repeated send congestion. This scenario was practically very unlikely to occur.
#17383 After a session takeover, the channel info reflected by the dashboard and REST API (mqueue_len, inflight_cnt) now updates immediately after the takeover replay completes, rather than waiting for the next 15-second stats refresh tick.
Rule Engine
#16699 Previously, under certain race conditions, long and cryptic logs like the following could be printed:
Now, we print more meaningful information to help debug the issue.
#16780 Fixed an issue in authorization source validation where requests missing the type field could trigger an internal error.
Now EMQX returns a clear BAD_REQUEST validation error for this case.
#16796 Fixed handling of multiline SQL statements in connector actions.
#16805 Added support for authz hook results to opt out of authorization cache storage for dynamic ACL decisions.
#17211 Added the connected_at field to the $events/client/connack Rule Event, which was stated in the documentation but missing from the actual data.
Data Integration
#16936 Fixed an issue where the health check of an Azure Blob Storage Action in aggregate mode could time out if the container contained too many blobs.
#16955 Eliminated false health check warning logs from the Kafka producer action.
Previously, if a Kafka producer idled for too long, Kafka could close the connection (the default is typically 10 minutes). If a Kafka producer action health check happened to run around the same moment, a false warning with the message "not_all_kafka_partitions_connected" could be logged.
#16972 HTTP and GCP PubSub Actions were patched to treat transient connection errors with reason closing as recoverable errors, reducing log noise.
#17001 Fixed an issue where MQTT source failed to receive messages from $queue/ subscriptions when the remote broker has the Message Queue (mq) feature enabled.
The root cause was that the MQ message delivery did not include the MQTT v5 Subscription-Identifier property in PUBLISH packets, which the MQTT bridge ingress relies on to route messages from queue subscriptions.
#17068 Fixed EMQX Tables TLS connector startup when ssl.verify is verify_none and cert file paths are left empty, and aligned Rust NIF TLS verify propagation with connector config.
#17084 Fixed an issue with MQTT Sources in which, if the Connector used clean_start = false and reconnected to a broker with a session containing messages, those messages would not trigger rule actions.
#17111 Fixed query execution for PostgreSQL connectors in disable prepared statements mode. Previously, concurrent queries could interleave and produce errors.
#17113 Fixed RocketMQ connector isolation: a misconfigured or unreachable RocketMQ connector no longer destabilises other RocketMQ connectors on the same node. Previously, one connector with an unreachable broker could stall the shared client supervisor for up to 60 seconds, causing sibling connectors to flap with resource_health_check_timed_out and dashboard operations on them to hang.
The default TCP/TLS connect timeout is also lowered from 60 seconds to 10 seconds so a misconfigured server surfaces as failed quickly instead of appearing stuck.
#17180 Fixed an issue where, under heavy load, a timed out call to a MongoDB process would be interpreted as an unrecoverable error and wouldn't be retried. Now, the message will be retried on such events.
#17216 Fixed Timescale/PostgreSQL actions to report a structured bad parameter error instead of crashing the database connection process when a quoted JSON numeric string is mapped to a FLOAT column.
#17250 Fixed Redis Sentinel connectors to support separate authentication settings for Redis data nodes and Sentinel nodes.
#17293 Fixed an issue where, when writing a Parquet file with an object containing a required key but with an undefined / null value, a corrupt file would be written instead of raising an error.
#17303 Upgraded Kafka client libraries: brod from 4.5.2 to 4.5.4 and wolff from 4.1.9 to 4.1.10.
Notable fixes picked up from upstream:
- brod: fix a race condition during Kafka connection re-authentication (via kafka_protocol 4.3.4).
- wolff: under high-memory load control (drop_if_highmem), keep a minimum buffer reserve so the producer is not starved of in-flight data; only bytes exceeding the reserve are dropped.
#17343 Fixed a clustered-config replication bug where importing a data backup (or loading a HOCON config via emqx ctl conf load / PUT /api/v5/configs) that contained a file-type authorization source could leave peer nodes lagging with a cluster_rpc_apply_failed / failed_to_read_acl_file error.
The importer used to write the ACL file locally and replace inline rules with a path, then ship the path-form config across the cluster. Peer nodes have no such file on disk and so could not apply the change. The config sent to the cluster now keeps rules inline, so each peer writes its own copy of the ACL file from the replicated content.
#17347 Upgraded the RocketMQ client dependency to v0.7.2 to fix memory growth in async producer requests.
#17439 Fixed an issue where the health check of an Azure Blob Storage Connector could time out, or generate large bandwidth costs, if the storage account contained too many containers. Companion fix to #16935.
#17450 Fixed an issue where the /prometheus/data_integration Prometheus endpoint could respond with a 500 status when using mode=node. This issue would only arise when the configuration for Actions and Connectors was manually edited and inconsistent, having an Action whose Connector does not exist.
#17474 Reduced the overhead of IoTDB REST API connector health checks by using a bounded version query instead of listing all databases on each check.
Clustering
#17132 Fixed an issue where adding or removing topic metrics could fail on a replicate node when its raw config or runtime state had drifted, raising a cluster_rpc_apply_failed alarm and stalling cluster RPC replication. Duplicate-add and missing-remove are now rejected on the initiator only, while replicates apply the change idempotently.
#17182 Bump to emqx-OTP 27.3.4.2-8 for mria.
Without this change, during EMQX startup, Mria app boot may get stuck if it's not connected to the cluster.
#17214 Removed cryptic error-level logging of disconnect events from Cluster Link message forwarding MQTT clients, in favor of more user-friendly messages with enough context for troubleshooting. Events similar to this one should no longer appear in the error logs:
#17218 Prevented bin/emqx and bin/emqx_ctl invocations from triggering nodeup / nodedown events on the running broker, which previously surfaced as misleading cm_registry_node_down warnings in the broker log. The temporary helper nodes started by these scripts now register as hidden Erlang nodes, as intended.
#17269 Improved cluster recovery after a network partition.
Previously, some of the clients connected to the replicant nodes could be lost from the global registry. This could lead to inconsistent behavior during takeover and incorrect information displayed in the dashboard.
This fix adds a background process that re-registers the existing clients when the network partition is healed. It also adds a new alarm: "Broker is recovering after a network partition", which is raised while the global registry is being rebuilt.
Introduced a new cluster auto-heal algorithm that can automatically recover overlapping network partitions.
#17342 Fixed cluster configuration import failing with a "required_field: node.cookie" schema check error when the exported cluster.hocon contained a partial node section. Read-only roots (node, rpc) are not part of the data import anyway, so they are now dropped from the imported config before the pre-flight schema check, letting the running node's own values be used for the validation.
#17348 Fixed noisy and misleading emqx ctl conf cluster_sync status diagnostics when clustered nodes have the same effective checked configuration but different raw configuration representations.
The command now suppresses raw-only representation differences that do not correspond to checked configuration changes, while still warning when checked configuration is inconsistent. It also avoids crashing when a raw configuration key exists on one node but is missing from another node.
It also ignores timestamp-only metadata differences in created_at and last_modified_at for actions, sources, bridges, and rule metadata. Data import or boot-time configuration loading can refresh these generated timestamps on only some nodes even when the effective runtime configuration is otherwise identical.
#17349 Improved responsiveness of a Cluster Link in situations when route replication was stuck connecting to an unresponsive target cluster. Now, deleting such Cluster Link should finish slightly sooner.
#17382 Fixed corruption of the global channel registry that could occur when the cluster experiences a network partition.
#17424 Fixed a global session registry leak that could leave duplicate or stale entries for the same client ID after a network partition followed by Mnesia autoheal.
Discard and takeover-kick RPC handlers now also remove the registry row when the target process is no longer alive, and the registration throttle on the connect path now recognizes tombstone rows (no local channel state) and reaps them instead of blocking new connections for the same client ID indefinitely.
#17432 Fixed an issue where concurrent Cluster Link API requests could return generic error responses, instead of returning either success or not found.
#17469 Fixed the issue where warnings similar to those below are emitted when enabling or disabling an active Cluster Link.
Access Control
#17045 Fixed password-based authentication backends to let the auth chain continue when the CONNECT packet has no password, instead of rejecting the connection immediately.
Previously, if a client connected without a password, the first password-based authenticator (built-in database, MySQL, PostgreSQL, MongoDB, Redis, or LDAP) in the chain would return an error, blocking any subsequent authenticators from being tried.
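The chain behavior described in #17045 can be sketched roughly as follows. This is a minimal Python illustration, not EMQX's implementation: the result names ("ok", "deny", "ignore"), the helper functions, and the final deny-by-default are all assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical result names: "ok", "deny", or "ignore" (pass to the next authenticator).
Authenticator = Callable[[str, Optional[str]], str]


def password_backend(users: dict[str, str]) -> Authenticator:
    """Build a password-based authenticator over an in-memory user table."""
    def check(username: str, password: Optional[str]) -> str:
        if password is None:
            # No password in CONNECT: step aside instead of rejecting,
            # so that later authenticators in the chain still get a turn.
            return "ignore"
        return "ok" if users.get(username) == password else "deny"
    return check


def allow_list_backend(allowed: set[str]) -> Authenticator:
    """A stand-in for any non-password authenticator later in the chain."""
    def check(username: str, password: Optional[str]) -> str:
        return "ok" if username in allowed else "ignore"
    return check


def run_chain(chain: list[Authenticator], username: str, password: Optional[str]) -> str:
    """Return the first decisive result; deny if no authenticator decides (assumed default)."""
    for authenticator in chain:
        result = authenticator(username, password)
        if result != "ignore":
            return result
    return "deny"
```

With a chain of `[password_backend({"alice": "s3cret"}), allow_list_backend({"sensor-1"})]`, a passwordless connection from `sensor-1` reaches the second authenticator rather than being rejected by the first.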
#17064 Closed an authorization gap in the /authentication/:id/users REST endpoint so that a namespaced administrator can no longer list or create users in the global (or another tenant's) namespace by omitting the ns query parameter or the namespace body field. Authentication users in a non-global namespace can no longer be marked as is_superuser; requests to create or update such a user are rejected so that explicit ACL rules are always enforced for tenant MQTT clients.
#17100 Fixed OIDC SSO login failing with provider_not_ready when the identity provider returns a JWKS response whose Content-Type uses the +json structured syntax suffix (e.g. application/jwk-set+json; charset=utf-8). Such responses are now accepted as valid JWKS content.
#17122 Fixed dashboard RBAC checks for SSO users with URL-encoded usernames such as email addresses, so viewer self-service MFA disable requests work correctly when force_mfa is disabled.
#17140 Fixed a silent failure when EMQX fetched a Certificate Revocation List (CRL) over HTTP from a server that returns a DER-encoded body (Content-Type: application/pkix-crl, the format mandated by RFC 5280 §5).
Previously, EMQX only decoded PEM-encoded CRL bodies; a DER body was silently treated as zero CRLs and cached as an empty list, causing every TLS handshake on enable_crl_check = true listeners to fail with bad_crls, no_relevant_crls and no log line indicating what went wrong.
EMQX now decodes both PEM and DER CRL bodies. When a fetched body is neither, a warning is logged with the URL so the misconfiguration is visible.
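To illustrate the PEM-or-DER handling described in #17140, here is a rough Python sketch. It is not EMQX's decoder: the function name is hypothetical, and the DER check is only a heuristic (a DER-encoded ASN.1 SEQUENCE starts with the tag byte 0x30) rather than a full ASN.1 parse.

```python
import base64
import re

# PEM armor used for CRLs.
PEM_CRL = re.compile(
    rb"-----BEGIN X509 CRL-----(.*?)-----END X509 CRL-----", re.DOTALL
)


def decode_crl_body(body: bytes, url: str) -> list[bytes]:
    """Return the DER-encoded CRLs found in an HTTP response body."""
    pem_blocks = PEM_CRL.findall(body)
    if pem_blocks:
        # Strip whitespace inside each block, then base64-decode it to DER.
        return [base64.b64decode(b"".join(block.split())) for block in pem_blocks]
    if body[:1] == b"\x30":
        # Heuristic only: looks like a DER SEQUENCE, so treat the body as one CRL.
        return [body]
    # Neither form: make the misconfiguration visible instead of caching "zero CRLs".
    print(f"warning: CRL body fetched from {url} is neither PEM nor DER")
    return []
```

The point of the sketch is the third branch: an unrecognized body produces a visible warning naming the URL instead of silently becoming an empty list.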
#17171 Fixed an RBAC issue that prevented namespaced dashboard administrators from enabling or disabling MFA for their own account.
Namespaced administrators remain restricted from managing MFA settings for other dashboard users.
#17177 Dashboard-created REST API keys are now generated randomly instead of being derived from the API key name.
#17223 Fixed missing client certificate when a TCP-passthrough proxy (e.g. GCP TCP Proxy NLB, AWS NLB) is placed in front of an SSL listener with proxy_protocol = true. The TLS handshake at the listener was completing successfully and the client certificate was present, but it was not exposed to authentication or rule events. Functions, ACL rules, and authentication backends that depend on the client certificate (CN, subject, full PEM) now work correctly in this deployment shape.
#17330 Hardened the PROXY Protocol v2 TLV parser on TCP and SSL listeners with proxy_protocol enabled. Previously, a TLV whose declared length overran the buffer caused the parser to silently truncate the TLV stream, dropping any trailing fields. The parser is now strict: malformed TLV streams cause the connection to be rejected with a warning log entry instead of being accepted with a partially parsed PROXY header.
#17428 Fixed a Dashboard OIDC SSO crash that prevented EMQX from completing the OpenID provider discovery when the provider's .well-known/openid-configuration response included a Cache-Control header such as max-age=0 (observed with Kanidm). The crash caused the OIDC supervisor to exhaust its restart budget after a single failure, leaving SSO unable to recover without a config re-save. The cache-control parser is now tolerant of these values, the worker no longer hard-crashes on a bad expiry, and the OIDC supervisor allows several restarts within a minute so transient failures retry cleanly.
Gateway
#17141 Fixed CoAP connection-mode token takeover so reconnecting UDP/DTLS clients can resume with a valid token while invalid token/clientid combinations are rejected. Also ensured required connection info fields are present before running CoAP takeover connected hooks.
#17258 Fixed an issue in the MQTT-SN gateway where a connected client sending a second CONNECT packet on the same session would crash its connection process. The gateway now responds with a DISCONNECT and closes the session gracefully.
#17287 Fixed MQTT-SN client crashes caused by packets received in unexpected connection or Will states, including DISCONNECT during connection setup, REGISTER before the Will handshake completes, and WILLMSGUPD before a Will topic exists.
#17419 Fixed CoAP gateway observe notifications to honor the gateway.coap.notify_type setting.
Observe notifications now use a per-session confirmable in-flight window of 1 and a fixed pending queue of 100 entries shared by all observe tokens. When a confirmable notification is in flight, later observe notifications are queued instead of being silently lost. When the queue is full, the oldest pending notification is dropped, delivery.dropped.queue_full is incremented, and a throttled warning is logged.
Cancelling an observe relation now also removes pending notifications for that observed topic/filter and observe token, so queued notifications are not delivered after the client has cancelled the observe, including wildcard observe filters.
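The queueing rules in #17419 (in-flight window of 1, bounded pending queue, drop-oldest on overflow, purge on cancel) can be modelled with a small Python sketch. The class and method names are hypothetical, the counter merely stands in for delivery.dropped.queue_full, and cancellation is keyed on the observe token alone for brevity.

```python
from collections import deque


class ObserveWindow:
    """Per-session confirmable window of 1 with a bounded pending queue."""

    def __init__(self, max_pending: int = 100) -> None:
        self.in_flight = None         # (token, payload) awaiting ACK, or None
        self.pending = deque()        # queued (token, payload) pairs, oldest first
        self.max_pending = max_pending
        self.dropped_queue_full = 0   # stand-in for delivery.dropped.queue_full

    def notify(self, token: str, payload: str):
        """Send immediately if the window is free; otherwise queue the notification."""
        if self.in_flight is None:
            self.in_flight = (token, payload)
            return self.in_flight
        if len(self.pending) >= self.max_pending:
            self.pending.popleft()    # queue full: drop the oldest pending entry
            self.dropped_queue_full += 1
        self.pending.append((token, payload))
        return None

    def ack(self):
        """On ACK of the in-flight message, promote the next pending one, if any."""
        self.in_flight = self.pending.popleft() if self.pending else None
        return self.in_flight

    def cancel(self, token: str) -> None:
        """Cancelling an observe removes its queued notifications."""
        self.pending = deque(p for p in self.pending if p[0] != token)
```

Because the queue is shared by all observe tokens, one busy token can push out another token's older notifications; the drop counter is what makes that visible.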
Observability
#16842 Reduced noisy plugin config warning logs when no peer node has the plugin config yet.
Previously, when a node tried to fetch plugin config from peer nodes during startup, it would log a warning even when all peers simply didn't have the config (e.g., first node to load the plugin). Now this benign case is logged at debug level, and only genuine errors (RPC failures, timeouts) remain as warnings.
#16843 Fixed an issue where HTTP headers and query string parameters were not passed through to plugin API handlers, causing plugins to receive empty headers and missing query parameters.
#16863 Added a warning log when an async reply is received for an already-expired request.
#16868 Improved REST API authentication error messages to guide programmatic clients toward using API keys (Basic auth) instead of repeatedly logging in for bearer tokens. Error responses now mention the api_key.bootstrap_file configuration option and the POST /api_key endpoint for creating persistent API keys.
#16879 Added log.audit.cache_size as the primary config key for the audit log DB cache size, while keeping log.audit.max_filter_size for backward compatibility.
#16890 Fixed an ExHook issue where successful reconnect reloads could duplicate the same server name in the running list and trigger repeated callback dispatches.
#16904 Prevent enabling or starting multiple versions of the same plugin at once. When a newer version is enabled, older configured versions of that plugin are automatically disabled, and management API actions now return a clear error instead of reporting success while another version is still active.
#16939 Fixed the built-in database authenticator so it no longer logs a warning when the default bootstrap file path is configured but the file does not exist.
#16956 Log client connection termination at warning level instead of info when the reason is emsgsize (received packet exceeds mqtt.max_packet_size).
#17002 Updated minirest library to version 1.4.12. This version fixes a bug that caused the EMQX API to produce malformed API responses with a 204 No Content status line, emitting an invalid content-length header.
#17024 Dashboard HTTP listener now automatically uses IPv6 when the bind address is an IPv6 address, removing the need to explicitly set inet6 = true.
#17054 Fixed GET /api/v5/configs?key=... returning incomplete data when Accept: application/json was set.
Previously, the JSON response ignored the key query parameter and always returned a fixed subset of root configurations, which excluded keys like multi_tenancy. The endpoint now honors the key parameter in JSON responses consistently with the hocon (text/plain) response.
#17118 Improved pagination on multi-tenancy list endpoints (/mt/ns_list, /mt/ns_list_details, /mt/managed_ns_list, /mt/managed_ns_list_details, /mt/ns/{ns}/client_list):
Added a Link: <?...>; rel="next" response header. When more pages are available the header carries the query-only URI-reference of the next page; when absent, the current response is the last page. This removes the prior ambiguity where a full page (len(results) == limit) could not be distinguished from the exact-boundary "no more data" case without an extra request.
Added inclusive cursors (first_ns, first_clientid) alongside the existing exclusive cursors (last_ns, last_clientid). The inclusive form supports exact-match lookup (e.g. ?first_ns=foo&limit=1) and is preserved across paginated Link headers when the caller opts in. The two forms are mutually exclusive on a single request; supplying both returns HTTP 400.
#17134 Fixed invalid json term error returned by the banned clients listing API for client ID and username regex bans created before 6.2.0. The compiled regex retained in the database from the older release is now translated back to the original pattern string when serializing the response.
#17227 Cluster config file save errors now name the file and the underlying reason.
When cluster.hocon (or its directory) is read-only, immutable, or otherwise unwritable (e.g. mounted read-only into a container), changing config via the dashboard or REST API previously returned an opaque HTTP 400 with body {config_update_crashed,{badmatch,{error,ebusy}}} and only logged a badmatch crash that did not name the file.
The error now reports failed_to_save_conf_file with the actual file path and reason (eacces, eperm, ebusy, ...) plus a hint listing common operator-side causes.
Previously, when only the temporary file write failed (e.g. read-only directory), the API silently returned HTTP 200 even though the change was not persisted to disk. The API now correctly reports failure in this case as well.
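For the Link-header pagination introduced in #17118, a client loop might look like the following Python sketch. The transport is deliberately abstracted: fetch is a hypothetical callable that takes a query string and returns the page items plus the raw Link header value (or None), so nothing here assumes a particular HTTP library or response body shape.

```python
import re
from typing import Callable, Iterator, Optional

# Matches the target of a rel="next" link, e.g. <?last_ns=b&limit=2>; rel="next"
NEXT_LINK = re.compile(r'<([^>]*)>\s*;\s*rel="next"')


def next_query(link_header: Optional[str]) -> Optional[str]:
    """Extract the query-only URI-reference of the next page, if present."""
    if not link_header:
        return None
    match = NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def iterate_pages(
    fetch: Callable[[str], tuple[list, Optional[str]]], first_query: str
) -> Iterator:
    """Yield items page by page until a response carries no rel="next" link."""
    query: Optional[str] = first_query
    while query is not None:
        items, link_header = fetch(query)
        yield from items
        # An absent header means the current response was the last page.
        query = next_query(link_header)
```

The loop never compares the page length against the limit; it relies only on the presence or absence of the header, which is what removes the exact-boundary ambiguity.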
#17246 Upgraded jose library from 1.11.10 to 1.11.12, picking up EC and EdDSA key fixes for newer OTP releases.
#17247 When a plugin's REST API callback crashes or runs over its timeout budget, the broker now logs the failing API method and path together with the configured timeout, so the offending call is identifiable in mixed-traffic logs. A timeout is logged as a warning (not an error) and includes a hint pointing at plugins.api_endpoint.timeout, the config key to raise when a plugin callback legitimately needs more time.
#17254 Improved memory-usage reporting inside containers. The broker now picks the most constraining memory reading among cgroup v2, cgroup v1, and the host's /proc/meminfo (smallest non-zero total wins, larger usage ratio breaks ties). Previously the reading could be misleading in two ways: on containers with a tight cgroup limit, the host view could indicate >70% while the cgroup limit was <10% (or the reverse); and on hosts where a cgroup is mounted with no memory limit set, the cgroup reading could collapse the reported usage ratio to ~0%. Overload-protection thresholds and the Memory used metric now reflect the limit that actually constrains the process.
#17319 GET /api/v5/schemas/{hotconf,actions,connectors} now returns the response with Content-Type: application/json. Previously the response body was valid JSON but the header was text/plain; charset=utf-8, which broke clients that dispatch on the response content type.
#17406 Now, events captured by a trace initiated by a namespaced admin are limited to the namespace of such admin, for traces of types topic, IP address, and clientid. Traces of type rule ID already had such behavior.
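The selection rule stated in #17254 (smallest non-zero total wins, larger usage ratio breaks ties) is compact enough to write down. The Python below is only a sketch of that rule; the Reading type, its field names, and the source labels are assumptions for illustration.

```python
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    source: str  # hypothetical label, e.g. "cgroup_v2", "cgroup_v1", "proc_meminfo"
    used: int    # bytes in use
    total: int   # limit or total in bytes; 0 means no limit was reported


def most_constraining(readings: list[Reading]) -> Optional[Reading]:
    """Pick the reading with the smallest non-zero total; larger used/total breaks ties."""
    usable = [r for r in readings if r.total > 0]
    if not usable:
        return None
    # Sort key: ascending total, then descending usage ratio.
    return min(usable, key=lambda r: (r.total, -(r.used / r.total)))
```

Discarding zero totals up front is what prevents an unlimited cgroup from collapsing the reported ratio to roughly 0%.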
#17473 Lowered the log level of unabled_to_stop_plugin_apps from warning to info when the plugin's Erlang applications cannot be stopped because other running applications still depend on them. This is an expected, non-actionable condition during plugin unload and no longer raises a warning.
Deployment
#16901 Fixed RPM package OpenSSL dependency for RHEL 9.6 LTS: pinned openssl >= 3.5.1 for RHEL >= 9.7 and openssl >= 3.0.7 for older RHEL 9 versions.
#17311 Fixed Docker startup when the container hostname cannot be resolved. The entrypoint now falls back to the interface IP address before auto-generating the node name, and fails with a clear error if no node host can be determined.
#17369 Moved the dashboard listener defaults (http.bind and the placeholder HTTPS ssl_options) from the user-editable etc/emqx.conf into the shipped etc/base.hocon. Runtime updates -- including those made through the dashboard, the REST API, or the emqx_acme plugin's automatic HTTPS configuration -- are now correctly preserved across restarts instead of being silently reverted to the default self-signed certificate by the hardcoded emqx.conf block.
#17504 Fixed bin/emqx failing to detect a running node when its command line is wider than the terminal. The process discovery ps -ef call has been switched to ps -efww so that long -root <path> arguments are not truncated and the running EMQX process is reliably matched.
This discussion was created from the release EMQX Enterprise 6.1.2.