EMQX Enterprise 6.1.2 #17527

zmstone · 2026-06-09T11:07:24Z

Breaking Changes

These breaking changes ship in a bug-fix release because they are either 1) security enhancements or 2) changes to features that were not previously in use.

#17157 Introduced a new Rule Engine configuration, rule_engine.limit_selects_in_namespace, whose default value is true. When enabled, rules will only trigger on messages published by clients on the same namespace as the rule itself.
#17325 Removed the hot-upgrade REST API endpoints (/api/v5/relup/*).

Hot-upgrade is now operated exclusively through the emqx ctl relup CLI on each node — there is no dashboard surface.

The target release tarball can be placed anywhere readable by the EMQX process (no special staging directory). Trigger with emqx ctl relup upgrade <TarballPath>. A <TarballPath>.sha256 sidecar is required next to the tarball; the target version is read from the tarball's own releases/emqx_vars (REL_VSN).
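The sidecar requirement above can be prepared ahead of the upgrade. The following is a minimal Python sketch, assuming the sidecar holds the tarball's hex-encoded SHA-256 digest in the common `sha256sum` layout (`<hex>  <filename>`). The exact sidecar format EMQX expects is not stated in these notes, so treat the layout as an assumption; the function name `write_sha256_sidecar` is hypothetical.

```python
import hashlib
from pathlib import Path


def write_sha256_sidecar(tarball: str) -> Path:
    """Write <tarball>.sha256 next to the tarball and return the sidecar path.

    Assumption: the sidecar uses the `sha256sum` layout "<hex>  <name>".
    """
    src = Path(tarball)
    digest = hashlib.sha256()
    with src.open("rb") as f:
        # Hash in 1 MiB chunks so large release tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    sidecar = src.with_name(src.name + ".sha256")
    sidecar.write_text(f"{digest.hexdigest()}  {src.name}\n")
    return sidecar
```

After writing the sidecar, the upgrade is triggered on each node with emqx ctl relup upgrade <TarballPath> as described above.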

Enhancements

Security Hardening

#17040 Restricted API key access to dashboard user-account management endpoints.

Previously, an API key with the administrator role could call the dashboard user management endpoints POST/DELETE /users/:username/mfa and POST /users/:username/change_pwd via HTTP Basic authentication. This meant an API key could reset or disable another dashboard user's MFA, or change another dashboard user's password, bypassing the intended separation between human dashboard sessions and machine API keys.

These endpoints now return 401 API_KEY_NOT_ALLOW when accessed via an API key, matching the existing policy that already blocks API key access to /users, /users/:username, /logout, and /api_key. Dashboard users can still manage their own MFA and password from the dashboard UI using bearer-token (JWT) sessions as before.
#17065 Added SSRF protection for rule-engine-reachable connector and bridge configurations.

When rule_engine.ssrf.enable is set to true, EMQX applies an outbound SSRF policy to connector, bridge and action configurations. Exact matches in rule_engine.ssrf.deny_hosts are rejected immediately, resolved target IPs are checked against rule_engine.ssrf.allow_cidrs before rule_engine.ssrf.deny_cidrs, and the default denied ranges cover loopback, link-local (including cloud instance-metadata endpoints), RFC1918, ULA, unspecified and multicast ranges. The check runs at config-update time and covers HTTP url fields as well as server / servers / bootstrap_hosts style fields across all connector families.
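The evaluation order described above (exact host deny, then allow CIDRs, then deny CIDRs) can be sketched as follows. This is an illustrative Python model, not EMQX's implementation: the host name, the allow range and the subset of denied ranges are hypothetical sample values, and DNS resolution is left to the caller.

```python
import ipaddress

# Hypothetical sample policy; the real defaults are defined by EMQX.
DENY_HOSTS = {"metadata.google.internal"}
ALLOW_CIDRS = [ipaddress.ip_network("10.20.0.0/16")]
DENY_CIDRS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "169.254.0.0/16",                  # loopback, link-local
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
        "::1/128", "fe80::/10", "fc00::/7",               # IPv6 loopback, link-local, ULA
    )
]


def is_allowed(host: str, resolved_ips: list[str]) -> bool:
    """Return True if a connector target passes the sketched SSRF policy."""
    # 1. Exact host matches are rejected immediately.
    if host in DENY_HOSTS:
        return False
    for raw in resolved_ips:
        ip = ipaddress.ip_address(raw)
        # 2. An allow-CIDR match wins over any deny-CIDR match.
        if any(ip in net for net in ALLOW_CIDRS):
            continue
        # 3. Otherwise a deny-CIDR match rejects the target.
        if any(ip in net for net in DENY_CIDRS):
            return False
    return True
```

With this sample policy, a target resolving to 10.20.1.5 is accepted because the allow range is checked first, while 10.1.1.1 and 169.254.169.254 are rejected.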

The feature is disabled by default to preserve compatibility with deployments whose connectors legitimately point at internal services. Operators in multi-tenant or externally-exposed setups are encouraged to enable it, alongside a network-layer egress firewall.
#17173 Restrict API keys from exporting or importing dashboard accounts and API keys via the data backup endpoints.

POST /data/export called with an API key now silently omits the dashboard_users and api_keys mnesia table sets from the resulting archive. POST /data/import called with an API key now returns 403 FORBIDDEN when the uploaded backup contains either of those table sets.

Dashboard bearer-token (login) callers are unaffected and continue to be able to back up and restore the full database, including dashboard users and API keys.

This closes a privilege-escalation gap where an API key holder could read or write dashboard login credentials and API key records, which the existing /users and /api_key endpoints already deny to API keys, by going through the data backup endpoints instead.
#17187 Removed the EMQX release version (rel_vsn) from the unauthenticated GET /status?format=json response to avoid disclosing the broker version to unauthenticated callers. The version remains available via the authenticated node-info APIs.
#17201 Hardened the plugin install endpoint against path traversal in uploaded tarballs and tightened the install allowlist.
- The install path now refuses to extract any tarball whose entries would resolve outside the plugin install directory.
- emqx ctl plugins allow <name-vsn> entries now expire 5 minutes after they are issued, and may be pinned to a SHA-256 hash of the package via emqx ctl plugins allow <name-vsn> sha256:<HEX>. Uploads whose contents do not hash to the pinned value are rejected with 403 Forbidden. The previous behavior of accepting any payload named <name-vsn>.tar.gz is preserved when the optional sha256: argument is omitted.
- A successful install via the HTTP plugin install endpoint (and the dashboard upload that wraps it) immediately revokes the allow entry cluster-wide, so the same grant cannot be reused for a subsequent (potentially different) tarball.
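The path-traversal refusal in the first bullet can be illustrated with a short Python sketch. It models only the idea of rejecting entries that resolve outside the install directory; it does not inspect symlink or hard-link targets, and the function name `unsafe_members` is hypothetical rather than anything EMQX exposes.

```python
import io
import os
import tarfile


def unsafe_members(archive: bytes, install_dir: str) -> list[str]:
    """Return the names of tar entries that would resolve outside install_dir."""
    root = os.path.realpath(install_dir)
    bad = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar.getmembers():
            # Resolve "..", absolute names and existing symlinks before comparing.
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                bad.append(member.name)
    return bad
```

An installer following this approach would refuse the whole upload when the returned list is non-empty, instead of skipping individual entries.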
#17252 Publish .sha256 checksum sidecars alongside plugin packages on the official download site, so users can verify the integrity of downloaded plugin archives.
#17271 Hardened the official EMQX docker image to clear image-scanner findings:
- Applied Debian security upgrades during the runtime image build, so the image picks up the latest patched libssl3t64.
- Removed the unused libgnutls30t64 package. EMQX talks TLS via OpenSSL through Erlang/OTP and never links GnuTLS, so it was only present as a transitive dependency of curl and showed up in scanner reports.
- Replaced the Debian curl package, which would have transitively re-introduced libgnutls30t64 via librtmp1, with a statically-linked curl binary from https://github.com/stunnel/static-curl (OpenSSL, HTTP/2, HTTP/3; no RTMP, no GnuTLS). Container healthchecks that call curl continue to work unchanged.
#17309 Sanitize PROXY-Protocol v2 SSL Common Name / Subject before they enter client identity.

When a listener is configured with proxy_protocol = true, the broker now rejects connections whose PROXY-Protocol SSL TLV bytes contain ASCII control characters (the same byte class already rejected on MQTT-ingested clientid/username/password). This blocks attacker-controlled bytes from being smuggled into outbound HTTP authentication, authorization, or rule-engine header values via ${cert_common_name} and ${cert_subject} templates.

As an additional defense layer, the HTTP authentication and authorization clients now refuse to send a request when a rendered header name or value contains a CR, LF, or NUL byte.
#17315 Extend the byte-class check applied to MQTT clientid / username / password to other fields that feed ClientInfo and HTTP request templating:
- peersni (TLS Server Name Indication; also accepted from the PROXY-Protocol v2 authority TLV) is now validated at the connection ingestion boundary. Control characters cause the connection to be rejected and a warning logged.
- Client attribute values produced by mqtt.client_attrs_init Variform expressions are dropped (with a warning) when they contain control characters, so templates such as ${client_attrs.tns} cannot carry injected bytes downstream.
- HTTP action / bridge connector header rendering now drops any header whose rendered name or value contains NUL, CR, or LF.
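The two checks above (rejecting control characters at ingestion, and dropping headers that contain NUL, CR, or LF) can be sketched in Python. The byte class used here, 0x00-0x1F plus 0x7F, is the usual definition of ASCII control characters and is an assumption about the exact set EMQX rejects; both function names are hypothetical.

```python
def has_control_chars(value: bytes) -> bool:
    """True if value contains an ASCII control byte (assumed: 0x00-0x1F or 0x7F)."""
    return any(b < 0x20 or b == 0x7F for b in value)


def safe_headers(headers: dict[str, str]) -> dict[str, str]:
    """Drop any header whose rendered name or value contains NUL, CR, or LF."""
    forbidden = ("\x00", "\r", "\n")
    return {
        name: value
        for name, value in headers.items()
        if not any(c in name or c in value for c in forbidden)
    }
```

For example, a certificate common name of `dev\r\nX-Role: admin` would be caught by `has_control_chars` at the ingestion boundary, and a header rendered from it would be removed by `safe_headers` before any request is sent.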
#17440 Restricted downloading of stored backup files (GET /api/v5/data/files/<filename>) to the global dashboard administrator. Backup archives may contain dashboard accounts (with password hashes and MFA / TOTP state) and API key records, so API key callers, dashboard viewers, and namespaced administrators are no longer permitted to download them. Listing the backup directory (GET /api/v5/data/files) remains available to all roles that previously had access.
#17491 Fixed password and secret leaks in gateway authentication APIs, gateway error paths, and gateway debug logs. Gateway authentication API responses now redact secrets while preserving raw configuration. Gateway authentication failures, listener start errors, ExProto authentication logs, CoAP token-required logs, and LwM2M invalid-register logs no longer print raw passwords or secrets.
#17501 Block namespaced dashboard users from reading MQTT message content that crosses namespace boundaries.
- Endpoints that return MQTT payloads outside the caller's namespace now reject non-global callers with 403 FORBIDDEN: GET /clients/:clientid/mqueue_messages, GET /clients/:clientid/inflight_messages, GET|DELETE /mqtt/retainer/messages, GET|DELETE /mqtt/retainer/message/:topic, GET /mqtt/delayed/messages, GET|DELETE /mqtt/delayed/messages/:node/:msgid, DELETE /mqtt/delayed/messages/:topic. Previously a namespaced user could read or delete messages produced by other namespaces.
- Trace APIs are now namespace-scoped per trace: GET /trace lists only traces created by the caller's namespace, and the per-name endpoints (/trace/:name, /trace/:name/download, /trace/:name/log, /trace/:name/log_detail, /trace/:name/stop) return 404 when the trace belongs to a different namespace (so the response does not leak that other-namespace traces exist). The bulk DELETE /trace (clear-all) is reserved for the global administrator; namespaced callers get 403. Namespaced administrators can still create, list, download, stream, stop and delete their own traces, which keeps the existing per-namespace event filtering (#17406) useful end to end.

Clustering

#17076 Introduced routing schema v3 with per-node ownership of route table entries.

With schema v3, each node (core or replicant) takes full ownership of route entries pointing toward it, giving peer nodes read-only access. This improves partition tolerance of the cluster (peer nodes in a partitioned cluster cannot change route entries on behalf of other nodes) and reduces SUBACK latency on replicant nodes.

Backward compatibility: when a node supporting v3 joins a cluster of nodes only supporting v2, it keeps using v2 for compatibility. To switch the cluster to v3, perform a full restart of the cluster after upgrade. Set broker.routing.storage_schema = v2 to opt out. Note that rolling downgrade becomes impossible after the cluster switches to v3.

Check the active schema with emqx eval 'emqx_router:get_schema_vsn()'.
#17152 Added support for configuring Erlang inet port options for the distribution port, with a default buffer size of 1 MB.

Previously, the Erlang distribution port used an extremely small default port buffer (1460 bytes, or ~9 KB on some platforms), which caused performance bottlenecks even when the distribution port buffer (+zdbbl) was configured to a much larger value (e.g., 32 MB). This affected cluster communication reliability and could manifest as erpc timeout errors, Mnesia transaction congestion, and degraded multi-core node support.

Observability

#16911 Reduce the overhead of Prometheus metrics collection by avoiding accidental repeated queries of Mria statistics.
#16916 Now, the emqx_cert_expiry_at Prometheus metric takes into account the expiry date of certificates that belong to managed certificate bundles, when they are used in MQTT listeners.
#16958 Added focused /api-spec endpoints and a dashboard API spec explorer page for easier browsing of EMQX HTTP API documentation.

The dashboard now serves tag-scoped and drill-down OpenAPI slices, and these endpoints are disabled together with Swagger when dashboard.swagger_support is set to false. Added emqx ctl api_keys CLI commands to list, show, add, delete, enable, and disable API keys from the command line.
#17018 Reduced the number of calls to other nodes performed when calling the Prometheus scraping API endpoint. This makes the API call return faster and reduces the chance of it timing out when the cluster is under strain.

Specifically, the emqx_mria_lag metric, which is of interest on replicant nodes, is now refreshed periodically (every 10 seconds by default) instead of on demand for each API call.
#17162 Exposed per-node license info via Prometheus gauges (emqx_license_max_sessions, emqx_license_expiry_at, emqx_license_issued_at) so cluster-wide license consistency can be alerted on without per-node CLI checks.

Operators can now alert on license inconsistencies across cluster nodes by comparing these gauges. The implementation fetches all three values from a single emqx_license_checker:dump/0 gen_server call, eliminating a redundant round-trip on every Prometheus scrape.
#17176 Added emqx_routes_count and emqx_routes_max Prometheus metrics to export the number of route table entries per node.
#17329 Added two node-wide gauge metrics to the /api/v5/prometheus/stats endpoint:
- emqx_vm_uptime_ms reports the EMQX node uptime in milliseconds.
- emqx_vm_max_fds reports the maximum number of file descriptors available to the node.

Access Control

#16849 Added cookie-based authentication fallback for plugin API endpoints.

Plugin UI iframes served by the dashboard can now authenticate via the emqx_auth cookie when no Authorization header is present. This only applies to /api/v5/plugin_api/... paths.
#16942 #17235 Introduced fine-grained scope-based access control for both API keys and dashboard login users.

API keys can now be restricted to specific API path categories using scopes derived from OpenAPI tags. Keys without scopes retain full access (backward compatible). An empty scopes list denies all scoped API paths. The publisher API-key role is now constrained to [publish] only.
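The difference between a key without scopes and a key with an empty scopes list can be illustrated with a small Python sketch. It models only the rule stated above; the function name and the `clients` scope name are hypothetical, while `publish` is the scope mentioned for the publisher role.

```python
from typing import Optional


def scope_allows(key_scopes: Optional[list[str]], required_scope: str) -> bool:
    """Decide whether an API key may call a path that requires required_scope."""
    if key_scopes is None:
        # No scopes configured: full access, for backward compatibility.
        return True
    # An empty list contains nothing, so it denies every scoped path.
    return required_scope in key_scopes
```

Under this model, `scope_allows(None, "clients")` is allowed, `scope_allows([], "clients")` is denied, and a key limited to `["publish"]` can reach only paths in the `publish` category.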

Dashboard login users now also carry an optional scopes field; when set, requests are authorized against the same path-to-scope catalog used for API keys, layered on top of the existing role-based check. Four new scopes (user_management, mfa_management, sso_management, api_key_management) cover dashboard-only endpoints and are admin-only except mfa_management, which any role may hold for self-exemption from forced MFA. API keys cannot hold any of the four login-only scopes. Both checks apply to the HTTP API and to bootstrap-file loading (incompatible scopes are dropped with a warning).

New public catalog endpoints expose the scope vocabulary for UI consumption: GET /api_key_scopes and GET /user_scopes, both accessible to any bearer-authenticated caller. The scopes field is also surfaced in GET /users, POST /users, and PUT /users/:username responses; when not explicitly set, the response projects the role-default scope list.

Additional behavior changes that follow from the new scope model:
- The dashboard.default_username user is protected as a break-glass account. It cannot be deleted, demoted from administrator, or have its scopes field set; only its description may be changed. This guarantees an operator always retains administrative access if other administrators lose or misconfigure their scopes.
- Self-service on a user's own record now respects scopes. Only the dedicated change-password and MFA self endpoints still bypass scope checks; other operations such as PUT /users/:self are subject to the user's scopes.
- PUT /users/:username and PUT /api_key/:name validate role changes against the effective persisted scopes when the request body omits the scopes field. Demoting a user or changing an API key role is rejected if the persisted scopes are incompatible with the new role.
- API key bootstrap files accept an optional fourth column for scopes (key:secret:role:scopes). Unknown or role-incompatible scope names are dropped with a warning rather than rejecting the whole file, so existing three-column bootstrap files remain loadable.
- The SAML SP metadata endpoint (GET /sso/saml/metadata) is now reachable without authentication, matching /sso/saml/acs.
#16943 Added per-backend force_mfa option for SSO (OIDC/SAML/LDAP).

When enabled, SSO users must complete TOTP MFA setup or verification before receiving a dashboard token, regardless of IDP-side MFA settings. Supports three MFA states: not_configured (force setup), enabled (require verification), and admin_disabled (skip MFA). New API endpoints POST /sso/mfa/setup and POST /sso/mfa/verify handle the MFA flow.

Existing users can be exempted or required individually by an administrator via DELETE/POST on /users/:username/mfa, and that decision overrides the live backend policy until the administrator changes it. SSO users on a force_mfa = true backend who disable their own MFA are required to set MFA up again on the next login; only an administrator-initiated disable exempts a user from the live policy.
#17178 The emqx ctl api_keys add CLI command now accepts a --scopes <scope1,scope2,...> option, matching the scope-based permission control already supported by the REST API.
#17218 Added an ACME client plugin (emqx_acme) that issues and renews TLS certificates from any RFC 8555 ACME CA (e.g. Let's Encrypt) into an EMQX managed certificate bundle, and rewrites the configured SSL/WSS and/or dashboard HTTPS listeners to consume that bundle.

Multi-tenancy

#17053 Added a new multi-tenancy configuration option multi_tenancy.post_auth_tns_expression.

When configured, it is a Variform expression evaluated after the authentication chain completes. Its rendered value is written into client_attrs.tns, the tenant namespace key used by multi-tenancy quota and routing decisions.

This lets operators derive the tenant namespace from authentication-response attributes (for example, a tag field returned by an HTTP auth backend) instead of relying only on pre-authentication mqtt.client_attrs_init. Example expressions: client_attrs.tag, or with a fallback coalesce(client_attrs.tag, username).

When the expression is empty (default), behavior is unchanged.
#17078 Inlined each managed namespace's configuration (session and limiter) in the response of GET /api/v5/mt/managed_ns_list_details, so management UIs can render a list of namespaces with their configuration in a single request instead of one additional call per namespace.

Gateway

#17013 Added GBT32960-2025 protocol support to the GBT32960 gateway.

The gateway now automatically detects the protocol version by frame header (## for 2016, $$ for 2025) and handles version-specific parsing and serialization, including:
- New 2025 info types: Vehicle, DriveMotor, FuelCell, Engine, Location, Alarm, PowerBatteryVoltage/Temp, FuelCellStack, SuperCapacitor, SuperCapacitorExtreme, and digital Signature.
- New command: Activation (0x09/0x0A).
- Version-aware parameter sizes for parameter query/setting (0x02/0x03: BYTE in 2025 vs WORD in 2016).
- 2025 vehicle login with BMS battery pack encoding fields.
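The header-based version detection described above can be sketched in Python as follows. Only the two start markers come from these notes; the function name is hypothetical and the sketch does no further frame validation.

```python
def detect_gbt32960_version(frame: bytes) -> str:
    """Return "2016" or "2025" based on the two-byte frame header."""
    header = frame[:2]
    if header == b"##":
        return "2016"
    if header == b"$$":
        return "2025"
    raise ValueError(f"unknown GBT32960 frame header: {header!r}")
```

A parser built this way would pick the version-specific field sizes (for example BYTE versus WORD for parameter query/setting) only after this check has run.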

Data Integration

#16929 Two new limiter kinds are introduced: delivery_messages and delivery_bytes. In contrast to the existing messages and bytes limiters, which limit messages published by a single client, the new limiters throttle messages received by a single client from any source. If the limit is hit, QoS 0 messages are dropped, QoS > 0 messages are queued internally, and a retry is scheduled. The retry time is derived from the limiter's configuration.

The new limiters are only supported for memory sessions (durable_sessions.enable = false).

If unspecified, the default values are unlimited, thus keeping backwards compatibility.
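The per-QoS behavior when a delivery limit is hit can be modeled with a short Python sketch. This is a simplified illustration with a fixed token budget; the class, its fields and the absence of retry scheduling are assumptions, not the EMQX limiter implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DeliveryThrottle:
    tokens: int                                   # remaining delivery budget (hypothetical model)
    queued: deque = field(default_factory=deque)  # QoS > 0 messages awaiting retry
    dropped: int = 0                              # count of dropped QoS 0 messages

    def deliver(self, qos: int, payload: bytes) -> bool:
        """Return True if the message is delivered now."""
        if self.tokens > 0:
            self.tokens -= 1
            return True
        if qos == 0:
            # Limit hit: QoS 0 messages are dropped.
            self.dropped += 1
        else:
            # Limit hit: QoS 1 and 2 messages are kept for a later retry.
            self.queued.append((qos, payload))
        return False
```

In a fuller model, a timer derived from the limiter configuration would refill `tokens` and drain `queued` in order.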
#16962 Improved Kafka source polling behavior by ensuring fetch requests wait briefly for data instead of returning empty batches immediately when no records are available. This reduces unnecessary polling delays and helps Kafka consumers receive new records more consistently.
#17011 Added ts_column and ttl configuration fields to the EMQX Tables (Rust NIF driver) connector.
- ts_column: Specifies a custom timestamp column name for auto-created tables (defaults to ts if not set).
- ttl: Sets the time-to-live hint for auto-created tables (e.g., 3 days).
These fields were already supported by the underlying greptimedb-ingester-erlnif driver (since 0.1.8) and are now exposed in the EMQX Tables connector configuration.
#17025 Changed how the InfluxDB connector performs health checks and credential verification.

It no longer performs checks by executing SHOW DATABASES, which could be falsely flagged as a system penetration by some auditing systems.

See also emqx/influxdb-client-erl#54
#17031 Added session high-watermark history for license usage auditing.

EMQX now records the daily peak session count and retains at least 24 months of history. Operators can query this data via emqx ctl license history with optional --period daily|monthly and --json flags. A new license.high_watermark_timezone config controls the day boundary for bucketing.
#17046 Added a new metric actions.messages (and the corresponding actions_messages_rate in the dashboard monitor API) that counts the total number of messages handled by rule-engine action executions.

Because a single action execution may handle a batch of messages, actions.messages is greater than or equal to actions.executed, and actions_messages_rate reflects the true per-message throughput of actions.
#17089 MQTT ingress bridges now support consuming from remote message queues exposed as $queue/{name}/{bind-filter} when the remote broker supports MQTT 5 Subscription Identifiers. Queue subscriptions are rejected when Subscription Identifiers are unavailable, and regular topic subscriptions automatically retry without Subscription Identifiers if the remote broker does not accept them.
#17104 Blob name templates in aggregated upload actions (Azure Blob Storage, Amazon S3, GCS, Snowflake, S3 Tables) now accept date-part placeholders ${datetime.YYYY}, ${datetime.MM}, ${datetime.DD}, ${datetime.hh}, ${datetime.mm}, ${datetime.ss}, and ${datetime.DOY} (day of year), defaulting to UTC and rendered against the aggregation start time. Each part token may be prefixed with an explicit timezone — utc (same as no prefix) or local (EMQX node's system timezone) — e.g. ${datetime.local.YYYY} or ${datetime.utc.hh}. This enables Hive-partitioned object layouts (e.g. year=2025/month=04/day=22/hour=07/...) that are directly consumable by Spark, Databricks, and Synapse.
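To make the placeholder semantics concrete, the following Python sketch renders a blob name template against an aggregation start time. It is an approximation for illustration only: zero-padding follows `strftime` (for example a three-digit day of year), which is an assumption about EMQX's output, and the function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Placeholder part -> strftime directive (padding behavior is an assumption).
_FORMATS = {"YYYY": "%Y", "MM": "%m", "DD": "%d",
            "hh": "%H", "mm": "%M", "ss": "%S", "DOY": "%j"}
_TOKEN = re.compile(r"\$\{datetime\.(?:(utc|local)\.)?(YYYY|MM|DD|hh|mm|ss|DOY)\}")


def render_blob_name(template: str, started_at: datetime) -> str:
    """Render date-part placeholders; started_at must be timezone-aware."""
    def substitute(match: re.Match) -> str:
        zone, part = match.groups()
        if zone == "local":
            moment = started_at.astimezone()              # system timezone
        else:
            moment = started_at.astimezone(timezone.utc)  # no prefix means UTC
        return moment.strftime(_FORMATS[part])

    return _TOKEN.sub(substitute, template)


start = datetime(2025, 4, 22, 7, 30, tzinfo=timezone.utc)
template = ("year=${datetime.YYYY}/month=${datetime.MM}/"
            "day=${datetime.DD}/hour=${datetime.utc.hh}/data")
print(render_blob_name(template, start))
# prints: year=2025/month=04/day=22/hour=07/data
```

Placeholders other than the date parts, such as `${action}` style variables, are left untouched by this sketch.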
#17120 Added a new query string filter option to GET /clients_v2: node. When specified, online clients connected to the supplied node will be returned, as well as disconnected clients that were last connected to it.
#17136 Added the ping_with_auth option for InfluxDB connectors. When enabled, health checks include the configured credentials for InfluxDB-compatible services that require authenticated health check requests. Also fixed the InfluxDB connector/action to preserve Unicode text when writing values from write_syntax literals or MQTT payloads.
#17165 Added the resource_opts.dispatch_strategy option for actions.

The new option defaults to per_clientid, preserving the previous buffer worker dispatch behavior. Setting it to random makes queries without an explicit pick_key use a random dispatch key, which helps spread traffic across multiple buffer workers when a small number of clients publish a large number of messages.
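The effect of the two strategies can be illustrated with a Python sketch. The CRC32-based hashing and the function name are assumptions chosen for the example; EMQX's actual key-to-worker mapping is not described in these notes.

```python
import random
import zlib


def pick_worker(strategy: str, clientid: str, pool_size: int) -> int:
    """Return the index of the buffer worker that handles a query."""
    if strategy == "per_clientid":
        # Same client always maps to the same worker (hash choice is an assumption).
        return zlib.crc32(clientid.encode("utf-8")) % pool_size
    if strategy == "random":
        # Each query may land on a different worker, spreading a hot client's load.
        return random.randrange(pool_size)
    raise ValueError(f"unknown dispatch strategy: {strategy}")
```

With per_clientid, a single busy publisher keeps one worker saturated while the others idle; with random, its queries are spread across the pool at the cost of per-client ordering through a single worker.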
#17170 #17282 #17297 Added tcp_opts (nodelay, sndbuf, recbuf, buffer, keepalive, delay_send, active_n) to the MQTT bridge connector and Cluster Link configurations, so the outbound MQTT client TCP socket can be tuned per connection. Unset fields keep the operating system / gen_tcp defaults. delay_send (off by default) coalesces small writes for better throughput at the cost of a small latency increase.
#17221 Improved Cluster Linking diagnostics for MQTT message forwarding.

When message forwarding connections experience connectivity issues, the link resource status and respective alarms now include the disconnect reason, making configuration problems easier to identify.
#17245 Added Chinese and English translations for the MQTT Disk-Queue bridge plugin's configuration UI in the Dashboard.

Deployment

#17079 Add service.wsEnabled option to the Helm chart to suppress the ws/wss Service port entries when MQTT WebSocket listeners are disabled. Defaults to true to preserve existing behavior.

Bug Fixes

Core MQTT Functionalities

#16779 Improve handling of malformed first packets by classifying them as invalid CONNECT packets and adding better protocol hints in logs.
#16781 Fixed CONNECT validation when retained messages are unavailable.

When mqtt.retain_available is set to false, CONNECT packets with Will Retain set are now correctly rejected with CONNACK reason Retain not supported (0x9A).
#16783 Fixed MQTT v5 SUBSCRIBE validation for Subscription-Identifier upper bound.

EMQX now accepts 268435455 (0x0FFFFFFF), which is the maximum valid Subscription Identifier value defined by the MQTT spec.
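The upper bound follows from the MQTT Variable Byte Integer encoding, which carries 7 value bits per byte in at most four bytes, so the largest representable value is 2^28 - 1 = 268435455. A minimal Python encoder, written for illustration with hypothetical names, shows the boundary case:

```python
MAX_VARIABLE_BYTE_INTEGER = 268_435_455  # 0x0FFFFFFF, i.e. 2**28 - 1


def encode_variable_byte_integer(value: int) -> bytes:
    """Encode value as an MQTT Variable Byte Integer (1 to 4 bytes)."""
    if not 0 <= value <= MAX_VARIABLE_BYTE_INTEGER:
        raise ValueError("value does not fit in a Variable Byte Integer")
    out = bytearray()
    while True:
        byte = value % 128
        value //= 128
        if value > 0:
            byte |= 0x80  # continuation bit: more bytes follow
        out.append(byte)
        if value == 0:
            return bytes(out)


print(encode_variable_byte_integer(268_435_455).hex())
# prints: ffffff7f
```

The encoder accepts 0 because the encoding itself allows it; a Subscription Identifier of 0 is still invalid under the MQTT v5 specification, so a SUBSCRIBE validator needs a separate lower-bound check.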
#16847 Fixed a crash when a non-ASCII Unicode string is used in a message transformation expression.
#16874 Fixed a rare issue where Durable Storage backed by DS Raft could stop accepting new messages after a sequence of quick cluster leadership changes, requiring a node restart to recover.
#16876 Changed log message 'msg_publish_not_allowed' to 'msg_not_routed_to_subscribers'.
#16974 In EMQX 6.1.1, when a session was subscribed to a topic filter with retained messages and was later taken over or resumed without re-subscribing to the same topic filter, it would receive the already-delivered retained messages again. The previous behavior is now restored: upon session resumption or takeover without explicit re-subscription, retained message iteration ceases.
#17139 Restored retainer.enable as a real runtime switch for the retainer subsystem.

This allows deployments to keep MQTT retained-message protocol support enabled while disabling retained-message storage, instead of relying on mqtt.retain_available, which can reject retained publishes at the protocol layer.
#17172 Fixed an issue where MQTT packets (such as PUBACK) sent by a client right before disconnecting could be lost when the connection process had pending outbound messages in its mailbox. Now the connection process correctly drains its mailbox before shutting down, ensuring that inbound packets are processed even after the socket is closed.
#17175 Fixed an issue where messages delivered from Streams did not apply subscription options such as Subscription Identifier from the stream subscription.
#17353 Fixed an issue in the socket TCP backend where outbound MQTT packets could be sent in the wrong order when a client connection experienced repeated send congestion. This scenario was practically very unlikely to occur.
#17383 After a session takeover, the channel info reflected by the dashboard and REST API (mqueue_len, inflight_cnt) now updates immediately after the takeover replay completes, rather than waiting for the next 15-second stats refresh tick.

Rule Engine

#16699 Previously, under certain race conditions, long and cryptic logs like the following could be printed:

2026-02-03T13:53:54.576326+00:00 [error] Generic server <0.11323236.0> terminating. Reason: {{badkey,'actions.success'},[{erlang,map_get,['actions.success',#{}],[{error_info,#{module => erl_erts_errors}}]},{emqx_metrics_worker,idx_metric,4,[{file,"emqx_metrics_worker.erl"},{line,683}]},{emqx_metrics_worker,inc,4,[{file,"emqx_metrics_worker.erl"},{line,322}]},{emqx_rule_runtime,do_eval_action_reply_t...

Now, we print more meaningful information to help debug the issue.

#16780 Fixed an issue in authorization source validation where requests missing the type field could trigger an internal error.

Now EMQX returns a clear BAD_REQUEST validation error for this case.
#16796 Fixed handling of multiline SQL statements in connector actions.
#16805 Added support for authz hook results to opt out of authorization cache storage for dynamic ACL decisions.
#17211 Added the connected_at field to the $events/client/connack Rule Event, which was stated in the documentation but missing from the actual data.

Data Integration

#16936 Fixed an issue where the health check of an Azure Blob Storage Action in aggregate mode could time out if the container contained too many blobs.
#16955 Eliminate Kafka producer action false health check warning logs.

Previously, if a Kafka producer idled for too long, Kafka could close the connection (the default idle timeout is typically 10 minutes). If a Kafka producer action health check happened to run around the same moment, a false warning could be logged with the message "not_all_kafka_partitions_connected".
#16972 HTTP and GCP PubSub Actions were patched to treat transient connection errors with reason closing as recoverable errors, reducing log noise.
#17001 Fixed an issue where MQTT source failed to receive messages from $queue/ subscriptions when the remote broker has the Message Queue (mq) feature enabled.

The root cause was that the MQ message delivery did not include the MQTT v5 Subscription-Identifier property in PUBLISH packets, which the MQTT bridge ingress relies on to route messages from queue subscriptions.
#17068 Fixed EMQX Tables TLS connector startup when ssl.verify is verify_none and cert file paths are left empty, and aligned Rust NIF TLS verify propagation with connector config.
#17084 Fixed an issue with MQTT Sources in which, if its Connector used clean_start = false and reconnected to a broker with a session containing messages, those messages would not trigger rule actions.
#17111 Fix query execution for PostgreSQL connectors in disable prepared statements mode. Previously, concurrent queries could interleave and produce errors.
#17113 Fixed RocketMQ connector isolation: a misconfigured or unreachable RocketMQ connector no longer destabilises other RocketMQ connectors on the same node. Previously, one connector with an unreachable broker could stall the shared client supervisor for up to 60 seconds, causing sibling connectors to flap with resource_health_check_timed_out and dashboard operations on them to hang.

The default TCP/TLS connect timeout is also lowered from 60 seconds to 10 seconds so a misconfigured server surfaces as failed quickly instead of appearing stuck.
#17180 Fixed an issue where, under heavy load, a timed out call to a MongoDB process would be interpreted as an unrecoverable error and wouldn't be retried. Now, the message will be retried on such events.
#17216 Fixed Timescale/PostgreSQL actions to report a structured bad parameter error instead of crashing the database connection process when a quoted JSON numeric string is mapped to a FLOAT column.
#17250 Fixed Redis Sentinel connectors to support separate authentication settings for Redis data nodes and Sentinel nodes.
#17293 Fixed an issue where, when writing a Parquet file with an object containing a required key but with an undefined/null value, a corrupt file would be written instead of raising an error.
#17303 Upgraded Kafka client libraries: brod from 4.5.2 to 4.5.4 and wolff from 4.1.9 to 4.1.10.

Notable fixes picked up from upstream:
- brod: fix a race condition during Kafka connection re-authentication (via kafka_protocol 4.3.4).
- wolff: under high-memory load control (drop_if_highmem), keep a minimum buffer reserve so the producer is not starved of in-flight data; only bytes exceeding the reserve are dropped.
#17343 Fixed a clustered-config replication bug where importing a data backup (or loading a HOCON config via emqx ctl conf load / PUT /api/v5/configs) that contained a file-type authorization source could leave peer nodes lagging with a cluster_rpc_apply_failed / failed_to_read_acl_file error.

The importer used to write the ACL file locally and replace inline rules with a path, then ship the path-form config across the cluster. Peer nodes have no such file on disk and so could not apply the change. The config sent to the cluster now keeps rules inline, so each peer writes its own copy of the ACL file from the replicated content.
#17347 Upgraded the RocketMQ client dependency to v0.7.2 to fix memory growth in async producer requests.
#17439 Fixed an issue where the health check of an Azure Blob Storage Connector could time out, or generate large bandwidth costs, if the storage account contained too many containers. Companion fix to #16935.
#17450 Fixed an issue where the /prometheus/data_integration Prometheus endpoint could respond with a 500 status when using mode=node. This issue would only arise when the configuration for Actions and Connectors was manually edited and inconsistent, having an Action whose Connector does not exist.
#17474 Reduced the overhead of IoTDB REST API connector health checks by using a bounded version query instead of listing all databases on each check.

Clustering

#17132 Fixed an issue where adding or removing topic metrics could fail on a replicate node when its raw config or runtime state had drifted, raising a cluster_rpc_apply_failed alarm and stalling cluster RPC replication. Duplicate-add and missing-remove are now rejected on the initiator only, while replicates apply the change idempotently.
#17182 Bumped emqx-OTP to 27.3.4.2-8 for Mria.

Without this change, the Mria application could get stuck while booting during EMQX startup if the node was not connected to the cluster.
#17214 Removed cryptic error-level logging of disconnect events from Cluster Link message forwarding MQTT clients, in favor of more user-friendly messages with enough context for troubleshooting. Events similar to this one should no longer appear in the error logs:
```
2026-05-06T03:00:48.738654+00:00 [error] [PoolWorker] unexpected info: {disconnected,141,#{}}
```
#17218 Prevented bin/emqx and bin/emqx_ctl invocations from triggering nodeup/nodedown events on the running broker, which previously surfaced as misleading cm_registry_node_down warnings in the broker log. The temporary helper nodes started by these scripts now register as hidden Erlang nodes, as intended.
#17269 Improved cluster recovery after a network partition.
1. Previously, some of the clients connected to replicant nodes could be lost from the global registry. This could lead to inconsistent behavior during takeover and incorrect information displayed in the dashboard.

  This fix adds a background process that re-registers the existing clients when the network partition is healed. It also adds a new alarm, "Broker is recovering after a network partition", which is raised while the global registry is being rebuilt.
2. Introduced a new cluster auto-heal algorithm that can automatically recover overlapping network partitions.
#17342 Fixed cluster configuration import failing with a "required_field: node.cookie" schema check error when the exported cluster.hocon contained a partial node section. Read-only roots (node, rpc) are not part of the data import anyway, so they are now dropped from the imported config before the pre-flight schema check, letting the running node's own values be used for the validation.
#17348 Fixed noisy and misleading emqx ctl conf cluster_sync status diagnostics when clustered nodes have the same effective checked configuration but different raw configuration representations.

The command now suppresses raw-only representation differences that do not correspond to checked configuration changes, while still warning when checked configuration is inconsistent. It also avoids crashing when a raw configuration key exists on one node but is missing from another node.

It also ignores timestamp-only metadata differences in created_at and last_modified_at for actions, sources, bridges, and rule metadata. Data import or boot-time configuration loading can refresh these generated timestamps on only some nodes even when the effective runtime configuration is otherwise identical.
#17349 Improved responsiveness of a Cluster Link when route replication was stuck connecting to an unresponsive target cluster. Deleting such a Cluster Link should now finish slightly sooner.
#17382 Fixed corruption of the global channel registry that could occur when the cluster experiences a network partition.
#17424 Fixed a global session registry leak that could leave duplicate or stale entries for the same client ID after a network partition followed by Mnesia autoheal.

Discard and takeover-kick RPC handlers now also remove the registry row when the target process is no longer alive, and the registration throttle on the connect path now recognizes tombstone rows (no local channel state) and reaps them instead of blocking new connections for the same client ID indefinitely.
#17432 Fixed an issue where concurrent Cluster Link API requests could return generic error responses, instead of returning either success or not found.
#17469 Fixed an issue where warnings similar to the one below were emitted when enabling or disabling an active Cluster Link.
```
[warning] tag: RESOURCE, msg: handle_resource_metrics_failed, reason: {badkey, matched}, event: matched, ...
```

Access Control

#17045 Fixed password-based authentication backends to let the auth chain continue when the CONNECT packet has no password, instead of rejecting the connection immediately.

Previously, if a client connected without a password, the first password-based authenticator (built-in database, MySQL, PostgreSQL, MongoDB, Redis, or LDAP) in the chain would return an error, blocking any subsequent authenticators from being tried.
#17064 Closed an authorization gap in the /authentication/:id/users REST endpoint so that a namespaced administrator can no longer list or create users in the global (or another tenant's) namespace by omitting the ns query parameter or the namespace body field. Authentication users in a non-global namespace can no longer be marked as is_superuser; requests to create or update such a user are rejected so that explicit ACL rules are always enforced for tenant MQTT clients.
#17100 Fixed OIDC SSO login failing with provider_not_ready when the identity provider returns a JWKS response whose Content-Type uses the +json structured syntax suffix (e.g. application/jwk-set+json; charset=utf-8). Such responses are now accepted as valid JWKS content.
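The acceptance rule described above can be illustrated with a small Python sketch. This is not EMQX's implementation (EMQX is written in Erlang); the function name is hypothetical and the check only demonstrates how a media type with the +json structured syntax suffix can be recognized once parameters such as charset are stripped.

```python
def is_json_media_type(content_type: str) -> bool:
    """Return True for application/json and for any type using the +json suffix.

    Hypothetical helper for illustration only.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if "/" not in media_type:
        return False
    _type, subtype = media_type.split("/", 1)
    return subtype == "json" or subtype.endswith("+json")
```

With this check, both application/json and application/jwk-set+json; charset=utf-8 are treated as JSON, while text/html is not.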
#17122 Fixed dashboard RBAC checks for SSO users with URL-encoded usernames such as email addresses, so viewer self-service MFA disable requests work correctly when force_mfa is disabled.
#17140 Fixed a silent failure when EMQX fetched a Certificate Revocation List (CRL) over HTTP from a server that returns a DER-encoded body (Content-Type: application/pkix-crl, the format mandated by RFC 5280 §5).

Previously, EMQX only decoded PEM-encoded CRL bodies; a DER body was silently treated as zero CRLs and cached as an empty list, causing every TLS handshake on enable_crl_check = true listeners to fail with bad_crls, no_relevant_crls, with no log line indicating what went wrong.

EMQX now decodes both PEM and DER CRL bodies. When a fetched body is neither, a warning is logged with the URL so the misconfiguration is visible.
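As a rough illustration of the "PEM or DER, otherwise warn" behaviour, the following Python sketch classifies a fetched CRL body. It is an assumption-laden simplification, not EMQX's code: the function name is hypothetical, and the DER branch only checks the leading ASN.1 SEQUENCE tag (0x30) rather than fully parsing the structure.

```python
import base64
import re

# PEM armour used for CRLs.
PEM_CRL = re.compile(
    rb"-----BEGIN X509 CRL-----(.*?)-----END X509 CRL-----", re.DOTALL
)


def crl_bodies_to_der(body: bytes) -> list[bytes]:
    """Return the DER blobs found in body.

    An empty list means the body looked like neither PEM nor DER, which is
    the case where a caller would log a warning with the URL.
    """
    pem_blocks = PEM_CRL.findall(body)
    if pem_blocks:
        # Strip line breaks inside each block before base64-decoding it.
        return [base64.b64decode(b"".join(block.split())) for block in pem_blocks]
    # Heuristic only: a DER-encoded CRL is an ASN.1 SEQUENCE (tag 0x30).
    if body[:1] == b"\x30":
        return [body]
    return []
```

The important design point is that an unrecognized body is reported as such instead of being cached as "zero CRLs".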
#17171 Fixed an RBAC issue that prevented namespaced dashboard administrators from enabling or disabling MFA for their own account.

Namespaced administrators remain restricted from managing MFA settings for other dashboard users.
#17177 Dashboard-created REST API keys are now generated randomly instead of being derived from the API key name.
#17223 Fixed missing client certificate when a TCP-passthrough proxy (e.g. GCP TCP Proxy NLB, AWS NLB) is placed in front of an SSL listener with proxy_protocol = true. The TLS handshake at the listener was completing successfully and the client certificate was present, but it was not exposed to authentication or rule events. Functions, ACL rules, and authentication backends that depend on the client certificate (CN, subject, full PEM) now work correctly in this deployment shape.
#17330 Hardened the PROXY Protocol v2 TLV parser on TCP and SSL listeners with proxy_protocol enabled. Previously, a TLV whose declared length overran the buffer caused the parser to silently truncate the TLV stream, dropping any trailing fields. The parser is now strict: malformed TLV streams cause the connection to be rejected with a warning log entry instead of being accepted with a partially parsed PROXY header.
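The difference between lenient and strict TLV parsing can be sketched in Python as follows. In PROXY Protocol v2 each TLV is a 1-byte type followed by a 2-byte big-endian length and the value. The class and function names below are hypothetical and the sketch is not EMQX's parser; it only shows the strict policy of rejecting a stream whose declared length overruns the buffer.

```python
import struct


class MalformedTLV(ValueError):
    """Raised when a TLV stream cannot be parsed completely."""


def parse_tlvs(buf: bytes) -> list[tuple[int, bytes]]:
    """Strictly parse a PROXY Protocol v2 TLV stream into (type, value) pairs."""
    tlvs = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < 3:
            raise MalformedTLV("truncated TLV header")
        # 1-byte type, 2-byte big-endian length.
        tlv_type, length = struct.unpack_from("!BH", buf, offset)
        offset += 3
        if length > len(buf) - offset:
            # A lenient parser would stop here and keep what it has;
            # the strict policy rejects the whole stream instead.
            raise MalformedTLV("TLV length overruns buffer")
        tlvs.append((tlv_type, buf[offset:offset + length]))
        offset += length
    return tlvs
```

A caller following the strict policy would close the connection and log a warning when MalformedTLV is raised, rather than continuing with a partially parsed header.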
#17428 Fixed a Dashboard OIDC SSO crash that prevented EMQX from completing the OpenID provider discovery when the provider's .well-known/openid-configuration response included a Cache-Control header such as max-age=0 (observed with Kanidm). The crash caused the OIDC supervisor to exhaust its restart budget after a single failure, leaving SSO unable to recover without a config re-save. The cache-control parser is now tolerant of these values, the worker no longer hard-crashes on a bad expiry, and the OIDC supervisor allows several restarts within a minute so transient failures retry cleanly.

Gateway

#17141 Fixed CoAP connection-mode token takeover so reconnecting UDP/DTLS clients can resume with a valid token while invalid token/clientid combinations are rejected. Also ensured required connection info fields are present before running CoAP takeover connected hooks.
#17258 Fixed an issue in the MQTT-SN gateway where a connected client sending a second CONNECT packet on the same session would crash its connection process. The gateway now responds with a DISCONNECT and closes the session gracefully.
#17287 Fixed MQTT-SN client crashes caused by packets received in unexpected connection or Will states, including DISCONNECT during connection setup, REGISTER before the Will handshake completes, and WILLMSGUPD before a Will topic exists.
#17419 Fixed CoAP gateway observe notifications to honor the gateway.coap.notify_type setting.

Observe notifications now use a per-session confirmable in-flight window of 1 and a fixed pending queue of 100 entries shared by all observe tokens. When a confirmable notification is in flight, later observe notifications are queued instead of being silently lost. When the queue is full, the oldest pending notification is dropped, delivery.dropped.queue_full is incremented, and a throttled warning is logged.

Cancelling an observe relation now also removes pending notifications for that observed topic/filter and observe token, so queued notifications are not delivered after the client has cancelled the observe, including wildcard observe filters.
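The queueing behaviour described above (one confirmable notification in flight, a bounded pending queue that drops the oldest entry when full, and cancellation that purges queued entries) can be modelled with a short Python sketch. All names are hypothetical, the model keys cancellation on the observe token only, and it is not the gateway's actual implementation.

```python
from collections import deque


class ObserveNotifier:
    """Toy model of a per-session observe notification window."""

    def __init__(self, max_pending: int = 100):
        self.in_flight = None            # at most one confirmable notification
        self.pending = deque()           # shared by all observe tokens
        self.max_pending = max_pending
        self.dropped_queue_full = 0      # stands in for delivery.dropped.queue_full

    def notify(self, token, payload):
        """Return the notification to send now, or None if it was queued."""
        item = (token, payload)
        if self.in_flight is None:
            self.in_flight = item
            return item
        if len(self.pending) >= self.max_pending:
            self.pending.popleft()       # drop the oldest pending notification
            self.dropped_queue_full += 1
        self.pending.append(item)
        return None

    def ack(self):
        """The in-flight notification was acknowledged; return the next one."""
        self.in_flight = self.pending.popleft() if self.pending else None
        return self.in_flight

    def cancel(self, token):
        """Remove queued notifications that belong to a cancelled observe."""
        self.pending = deque(i for i in self.pending if i[0] != token)
```

Dropping the oldest entry rather than the newest keeps the queue biased toward the most recent resource state, which is usually what an observer cares about.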

Observability

#16842 Reduced noisy plugin config warning logs when no peer node has the plugin config yet.

Previously, when a node tried to fetch plugin config from peer nodes during startup, it would log a warning even when all peers simply didn't have the config (e.g., first node to load the plugin). Now this benign case is logged at debug level, and only genuine errors (RPC failures, timeouts) remain as warnings.
#16843 Fixed an issue where HTTP headers and query string parameters were not passed through to plugin API handlers, causing plugins to receive empty headers and missing query parameters.
#16863 Added a warning log when an async reply is received for an already-expired request.
#16868 Improved REST API authentication error messages to guide programmatic clients toward using API keys (Basic auth) instead of repeatedly logging in for bearer tokens. Error responses now mention the api_key.bootstrap_file configuration option and the POST /api_key endpoint for creating persistent API keys.
#16879 Added log.audit.cache_size as the primary config key for the audit log DB cache size, while keeping log.audit.max_filter_size for backward compatibility.
#16890 Fixed an ExHook issue where successful reconnect reloads could duplicate the same server name in the running list and trigger repeated callback dispatches.
#16904 Prevented enabling or starting multiple versions of the same plugin at once. When a newer version is enabled, older configured versions of that plugin are automatically disabled, and management API actions now return a clear error instead of reporting success while another version is still active.
#16939 Fixed the built-in database authenticator so it no longer logs a warning when the default bootstrap file path is configured but the file does not exist.
#16956 Client connection termination is now logged at warning level instead of info when the reason is emsgsize (received packet exceeds mqtt.max_packet_size).
#17002 Updated the minirest library to version 1.4.12. This version fixes a bug that caused the EMQX API to produce malformed responses with a 204 No Content status line by emitting an invalid content-length header.
#17024 Dashboard HTTP listener now automatically uses IPv6 when the bind address is an IPv6 address, removing the need to explicitly set inet6 = true.
#17054 Fixed GET /api/v5/configs?key=... returning incomplete data when Accept: application/json was set.

Previously, the JSON response ignored the key query parameter and always returned a fixed subset of root configurations, which excluded keys like multi_tenancy. The endpoint now honors the key parameter in JSON responses consistently with the hocon (text/plain) response.
#17118 Improved pagination on multi-tenancy list endpoints (/mt/ns_list, /mt/ns_list_details, /mt/managed_ns_list, /mt/managed_ns_list_details, /mt/ns/{ns}/client_list):
- Added an RFC 8288 Link: <?...>; rel="next" response header. When more pages are available the header carries the query-only URI-reference of the next page; when absent, the current response is the last page. This removes the prior ambiguity where a full page (len(results) == limit) could not be distinguished from the exact-boundary "no more data" case without an extra request.
- Added inclusive keyset cursor query parameters (first_ns, first_clientid) alongside the existing exclusive cursors (last_ns, last_clientid). The inclusive form supports exact-match lookup (e.g. ?first_ns=foo&limit=1) and is preserved across paginated Link headers when the caller opts in. The two forms are mutually exclusive on a single request; supplying both returns HTTP 400.
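A client can follow the new Link header roughly as in the Python sketch below. It is illustrative only: the function names are hypothetical, the regular expression handles just the simple rel="next" case rather than the full RFC 8288 grammar, and the host and port in the usage note are assumed example values.

```python
import re
from urllib.parse import urljoin

# Simplified matcher for a single `<...>; rel="next"` link value.
NEXT_LINK = re.compile(
    r'<([^>]*)>\s*;\s*rel=(?:"next"|next)(?=\s*(?:[;,]|$))', re.IGNORECASE
)


def next_page_ref(link_header):
    """Return the URI-reference of the next page, or None on the last page."""
    if not link_header:
        return None
    match = NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def next_page_url(current_url, link_header):
    """Resolve the query-only reference against the URL that was just fetched."""
    ref = next_page_ref(link_header)
    return urljoin(current_url, ref) if ref is not None else None
```

For example, if a request to http://localhost:18083/api/v5/mt/ns_list?limit=100 returned the header value `<?last_ns=ns42&limit=100>; rel="next"`, next_page_url would resolve it to the same path with the new query string; a missing header means there is nothing more to fetch.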
#17134 Fixed an invalid json term error returned by the banned clients listing API for client ID and username regex bans created before 6.2.0. The compiled regex retained in the database from the older release is now translated back to the original pattern string when serializing the response.
#17227 Cluster config file save errors now name the file and the underlying reason.

When cluster.hocon (or its directory) is read-only, immutable, or otherwise unwritable (e.g. mounted read-only into a container), changing config via the dashboard or REST API previously returned an opaque HTTP 400 with body {config_update_crashed,{badmatch,{error,ebusy}}} and only logged a badmatch crash that did not name the file.

The error now:
- Logs failed_to_save_conf_file with the actual file path and reason (eacces, eperm, ebusy, ...) plus a hint listing common operator-side causes.
- Returns a structured HTTP 400 body that names both the file and the reason, so the cause is visible in the dashboard without digging through node logs.
Previously, when only the temporary file write failed (e.g. read-only directory), the API silently returned HTTP 200 even though the change was not persisted to disk. The API now correctly reports failure in this case as well.
#17246 Upgraded jose library from 1.11.10 to 1.11.12, picking up EC and EdDSA key fixes for newer OTP releases.
#17247 When a plugin's REST API callback crashes or runs over its timeout budget, the broker now logs the failing API method and path together with the configured timeout, so the offending call is identifiable in mixed-traffic logs. A timeout is logged as a warning (not an error) and includes a hint pointing at plugins.api_endpoint.timeout, the config key to raise when a plugin callback legitimately needs more time.
#17254 Improved memory-usage reporting inside containers. The broker now picks the most constraining memory reading among cgroup v2, cgroup v1, and the host's /proc/meminfo (smallest non-zero total wins, larger usage ratio breaks ties). Previously the reading could be misleading in two ways: on containers with a tight cgroup limit, the host view could indicate >70% while the cgroup limit was <10% (or the reverse); and on hosts where a cgroup is mounted with no memory limit set, the cgroup reading could collapse the reported usage ratio to ~0%. Overload-protection thresholds and the Memory used metric now reflect the limit that actually constrains the process.
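The selection rule ("smallest non-zero total wins, larger usage ratio breaks ties") can be expressed as a short Python sketch. The type and function names are hypothetical, and representing "no limit set" as a total of 0 is an assumption made for illustration; this is not the broker's actual code.

```python
from typing import NamedTuple, Optional


class MemReading(NamedTuple):
    source: str   # e.g. "cgroup_v2", "cgroup_v1", "host"
    total: int    # bytes; 0 is assumed here to mean "no usable limit"
    used: int     # bytes


def most_constraining(readings) -> Optional[MemReading]:
    """Pick the reading that actually constrains the process."""
    candidates = [r for r in readings if r.total > 0]
    if not candidates:
        return None
    # Smallest total first; on equal totals prefer the higher used/total ratio.
    return min(candidates, key=lambda r: (r.total, -(r.used / r.total)))
```

With a 64 GiB host, a 2 GiB cgroup v2 limit, and an unset cgroup v1 limit, the sketch selects the cgroup v2 reading, so the reported usage ratio is computed against the 2 GiB limit.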
#17319 GET /api/v5/schemas/{hotconf,actions,connectors} now returns the response with Content-Type: application/json. Previously the response body was valid JSON but the header was text/plain; charset=utf-8, which broke clients that dispatch on the response content type.
#17406 Events captured by a trace started by a namespaced admin are now limited to that admin's namespace for traces of type topic, IP address, and clientid. Traces of type rule ID already behaved this way.
#17473 Lowered the log level of unabled_to_stop_plugin_apps from warning to info when the plugin's Erlang applications cannot be stopped because other running applications still depend on them. This is an expected, non-actionable condition during plugin unload and no longer raises a warning.

Deployment

#16901 Fixed RPM package OpenSSL dependency for RHEL 9.6 LTS: pinned openssl >= 3.5.1 for RHEL >= 9.7 and openssl >= 3.0.7 for older RHEL 9 versions.
#17311 Fixed Docker startup when the container hostname cannot be resolved. The entrypoint now falls back to the interface IP address before auto-generating the node name, and fails with a clear error if no node host can be determined.
#17369 Moved the dashboard listener defaults (http.bind and the placeholder HTTPS ssl_options) from the user-editable etc/emqx.conf into the shipped etc/base.hocon. Runtime updates -- including those made through the dashboard, the REST API, or the emqx_acme plugin's automatic HTTPS configuration -- are now correctly preserved across restarts instead of being silently reverted to the default self-signed certificate by the hardcoded emqx.conf block.
#17504 Fixed bin/emqx failing to detect a running node when its command line is wider than the terminal. The process discovery ps -ef call has been switched to ps -efww so that long -root <path> arguments are not truncated and the running EMQX process is reliably matched.

This discussion was created from the release EMQX Enterprise 6.1.2.
