Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Js leaf deny #2693

Merged
merged 18 commits into from
Dec 16, 2021
Merged

Js leaf deny #2693

merged 18 commits into from
Dec 16, 2021

Conversation

matthiashanel
Copy link
Contributor

@matthiashanel matthiashanel commented Nov 17, 2021

From now on JS API traffic will be denied across leaf nodes, unless the system account is shared and the availability domain is identical.
The main reason we start to enforce this is that enabling jetstream in an availability domain would alter the api traffic flow.
For example, a leaf node without jetstream was able to communicate with a JS in the hub without using it's domain name.
If for an unrelated reason jetstream had to be enabled in a the leaf node, this default traffic would all of a sudden go some place else (remain local) where before it'd go to a hub.

To avoid this from the get go we suggest setting the domain in client applications as soon as you need to communicate with a JS other than the local one.
If domain names are already used in such a way, there is nothing to do.

We do also recommend setting domain names for each availability zone.
(Availability zones can only change when crossing leaf nodes. If the leaf node share the system account and specify the same domain name, they are considered within the same availability zone. When this is done, meta group leadership is pinned to server that accept leaf node connections. This is to avoid extra hops.)

If you are currently not using domain names and rely on the old behavior, the server contains a new option default_js_domain that contains a map from account name to default domain name.

default_js_domain:{
 ACC1:domain,
 ACC2:domain,
}

In case the JetStream you want to reach does not have a domain name set just specify "" as domain name.

Please be aware that if you are currently not using domain names, and have to specify the empty domain name, in order to connect accross leaf nodes both ends have to be configured in such a way. This is because with this change, each server is going to make it's individual decission about which traffic to deny. For example. If you have a hub without jetstream and a leaf node that offers JS.
Until you enabled a domain name in the leaf node and change your clients to provide this name, you have to set
default_js_domain:{your-account-in-hub:""} in the hub and default_js_domain:{your-account-in-leaf:""}
This will instruct both server to not deny default api traffic.

Alternatively you could also restart your leaf node and provide a domain name. Then, in the hub set default_js_domain:{your-account-in-hub:"leaf-domain-name"}

In mixed mode setups (rare use case) you will have to set the domain name in the non JS enabled server as well.
Basically telling them, to which availability domain they belong.

jetstream: {
	enabled: false
	domain: domain-this-server-belongs
}

In the rare case where a JS availability domain is extended via a non clustered leaf node, you need to explicitly instruct the server to be in clustered mode with this new JetStream config option extension_hint: will_extend.

There are two more situations where the extension_hint can be used, but the server will print when this option is applicable. Primarily they are there to assist eh server during initial startup until it was able to decide if it's leaf node connection was part of the same or a different availability domain.

@ripienaar
Copy link
Contributor

This default_js_domain:{ ACC1:domain, ACC2:domain, } seems really weird, we have nowhere this kind of structure in the configuration, seems weird to introduce a new style of flag.

why can default_js_domain not be set on the account?

@matthiashanel
Copy link
Contributor Author

matthiashanel commented Nov 17, 2021

This default_js_domain:{ ACC1:domain, ACC2:domain, } seems really weird, we have nowhere this kind of structure in the configuration, seems weird to introduce a new style of flag.

why can default_js_domain not be set on the account?

It covers jwt as well. (without a jwt change)
I suggested something like that early on and it was rejected.

When the same jwt based account is used on both ends of a leaf node, only having a single value might not be flexible enough. Meaning there might be different defaults depending on where you connect.

Lastly, is only to assist in the transition and is expected to be removed in a subsequent release.
Long term we want apps to provide a domain name if they are communicating with a js not in their (availability domain (in short cluster/supercluster))

Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments on the MQTT code/tests for now.

server/mqtt_test.go Outdated Show resolved Hide resolved
replies sync.Map
nuid *nuid.NUID
quitCh chan struct{}
domain string // Domain or possibly empty. This is added to session subject.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted it would probably be more work, but I thought we discussed that we would actually store a prefix here (either $JS.API or actual $JS.<domain>.API) and add this prefix to all API subjects. We would have to replace existing references to the constants such as JSApiStreamCreateT to something like mqttAPIStreamCreateT that would be defined as the same than JSApiStreamCreateT but without the prefix, in this case %s.STREAM.CREATE.%s. So in place where we currently have:

jsa.newRequest(mqttJSAStreamCreate, fmt.Sprintf(JSApiStreamCreateT, cfg.Name), 0, cfgb)

we would do:

jsa.newRequest(mqttJSAStreamCreate, fmt.Sprintf(mqttAPIStreamCreateT, jsa.apiPfx, cfg.Name), 0, cfgb)

or something similar.

I agree that the places impacted currently should be "low volume", but still, that makes use of more sprintf/memory allocs if you trim the prefix and create the actual prefix in each requests (in newRequestEx).

We could go with your approach first and improve later if you think it is best to minimize changes in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will submit a separate PR once this one is oked.

@matthiashanel matthiashanel force-pushed the js-leaf-deny branch 3 times, most recently from b6d15bc to 71aa074 Compare December 2, 2021 21:25
server/jetstream.go Outdated Show resolved Hide resolved
server/jetstream.go Outdated Show resolved Hide resolved
server/jetstream_cluster.go Outdated Show resolved Hide resolved
server/leafnode.go Outdated Show resolved Hide resolved
server/leafnode.go Outdated Show resolved Hide resolved
server/server.go Outdated Show resolved Hide resolved
server/mqtt_test.go Outdated Show resolved Hide resolved
@matthiashanel matthiashanel force-pushed the js-leaf-deny branch 2 times, most recently from 38672ab to 48b4bc5 Compare December 11, 2021 18:10
Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have noticed a DATA RACE in Travis, which I can reproduce locally. Not saying that this is introduced by your changes but instead maybe the new tests show this.

I believe that where we have leafnode.go:1486:

siReply := acc.siReply

we should replace with:

siReply := copyBytes(acc.siReply)

Also, I noticed some missing defer nc.Close().

server/leafnode_test.go Show resolved Hide resolved
server/leafnode_test.go Show resolved Hide resolved
server/leafnode_test.go Show resolved Hide resolved
server/leafnode_test.go Show resolved Hide resolved
server/leafnode_test.go Show resolved Hide resolved
server/leafnode_test.go Show resolved Hide resolved
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
default_js_domain is a mapping from account to domain, settable when
JetStream is not enabled in an accountn

Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Basically all server decided that they do not campaign.
Needed to manually kick them once they are no longer observer.

Signed-off-by: Matthias Hanel <mh@synadia.com>
Signed-off-by: Matthias Hanel <mh@synadia.com>
Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@matthiashanel
Copy link
Contributor Author

matthiashanel commented Dec 16, 2021

Early releases of the Jetstream feature attempted to "do the right thing" in a variety of circumstances, but the logic used proved less resilient to certain changes, allowing some actions-at-a-distance to break previously functioning deployments. We're updating the NATS server logic to provide a more resilient model.

Our advice for developers is that:

  1. You should always set a Jetstream Domain in the server
  2. If you only ever want to talk to Jetstream in the same server/cluster to which you connect, then the clients do not need to specify a Jetstream Domain
  3. In all other cases, the client should set the Jetstream Domain explicitly

Note in particular that a deployment model with a backbone to which leafnodes or leafnode clusters connect needs particular care in enabling Jetstream in the backbone: if clients are not setting a Jetstream Domain and expecting to talk to the one leafnode they have in their account, such an upgrade can break working configurations. Administrators can use default_js_domain/mqtt.js_domain to provide mappings to assist in keeping such clients functioning across a transition. However, we recommend that this be considered a transient upgrade assistance mode, not a long term configuration.

We added js_domain to permanently support mqtt running on the non JetStream enabled end of a leaf node configuration.
As you upgrade these servers, we recommend specifying the domain for mqtt to use, absent a local one.
Regular nats clients need to specify the nats.Domain option.
For mqtt, js_domain is the way to configure that.

For example:

mqtt: {
    js_domain: target-domain
}

Updating clients with a new domain setting may not be as quick or easy as updating nats-server.
That is why we added the default_js_domain. It is a map from account name (or account nkey) to domain (or "" when none was set) to use for client's that don't set a domain.
To avoid outages we recommend first upgrading and re-configuring the server in which accounts and clusters do not have JetStream enabled.
In ngs, if a leaf node enabled JetStream for clients connected to ngs, the domain name is now required.
Existing ngs customers are not affected, and only have to care about upgrading their leaf nodes and connected clients.

For example:

default_js_domain:{
 ACC1:other-domain-name,
 ACC2:my-default-js-traffic-goes-to-this-domain,
 ACC3:""
}

You should only add to default_js_domain accounts that do not have Jetstream enabled.
Once you have changed all your clients to either use the local JetStream or provide the appropriate Domain name and mqtt has js_domain set where needed, you can remove default_js_domain from your configurations.

In the rare case where a JetStream availability domain is extended via a NON-clustered leaf node, you need to explicitly instruct the server to be in clustered mode with the new extension_hint JetStream configuration option.
All potential usages of extension_hint are related to NON-clustered server. Any other potential usages to speed up the startup process will be printed on server startup.
Possible values to set are: will_extend, no_extend

For example:

jetstream {
    extension_hint: will_extend
}

If your cluster consists of a mix of servers with and without JetStream enabled, servers without JetStream enabled need to set the domain name to the same value as servers with JetStream enabled.

jetstream: {
	enabled: false
	domain: name
}

@matthiashanel matthiashanel marked this pull request as ready for review December 16, 2021 21:38
@matthiashanel matthiashanel merged commit 3e8b662 into main Dec 16, 2021
@matthiashanel matthiashanel deleted the js-leaf-deny branch December 16, 2021 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants