
Specify NATS stream replicas #7023

Closed
wkloucek opened this issue Aug 11, 2023 · 13 comments

@wkloucek
Contributor

wkloucek commented Aug 11, 2023

Is your feature request related to a problem? Please describe.

I would like to have more than 1 replica for the oCIS NATS streams. See also https://docs.nats.io/nats-concepts/jetstream/streams

Describe the solution you'd like

Have a setting that lets me set the number of replicas to any value > 1 that I choose.

Describe alternatives you've considered

Manually edit the streams in NATS, e.g. nats stream edit main-queue --replicas=3.

Additional context

~ # nats s report
Obtaining Stream stats

╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                    Stream Report                                                     │
├────────────────────┬─────────┬───────────┬───────────┬──────────┬─────────┬──────┬─────────┬─────────────────────────┤
│ Stream             │ Storage │ Placement │ Consumers │ Messages │ Bytes   │ Lost │ Deleted │ Replicas                │
├────────────────────┼─────────┼───────────┼───────────┼──────────┼─────────┼──────┼─────────┼─────────────────────────┤
│ OBJ_userlog        │ File    │           │ 0         │ 2        │ 541 B   │ 0    │ 0       │ nats-1*                 │
│ OBJ_postprocessing │ File    │           │ 0         │ 308      │ 150 KiB │ 0    │ 372     │ nats-0*                 │
│ main-queue         │ File    │           │ 7         │ 1,025    │ 1.0 MiB │ 0    │ 0       │ nats-2*                 │
│ OBJ_eventhistory   │ File    │           │ 0         │ 1,884    │ 1.3 MiB │ 0    │ 0       │ nats-1*                 │
╰────────────────────┴─────────┴───────────┴───────────┴──────────┴─────────┴──────┴─────────┴─────────────────────────╯
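Until oCIS can set this itself, the manual alternative described above has to be repeated for every stream in the report. A minimal Go sketch that only prints the corresponding nats stream edit commands (it does not execute them), assuming the four stream names from the report are the complete set and 3 is the desired replica count:

```go
package main

import "fmt"

// editCommands builds one `nats stream edit` invocation per stream,
// raising its replica count. It only formats strings; nothing is executed.
func editCommands(streams []string, replicas int) []string {
	cmds := make([]string, 0, len(streams))
	for _, s := range streams {
		cmds = append(cmds, fmt.Sprintf("nats stream edit %s --replicas=%d", s, replicas))
	}
	return cmds
}

func main() {
	// Assumption: these are all oCIS streams (taken from the report above).
	streams := []string{"OBJ_userlog", "OBJ_postprocessing", "main-queue", "OBJ_eventhistory"}
	for _, c := range editCommands(streams, 3) {
		fmt.Println(c)
	}
}
```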

The current stream looks like this:

~ # nats stream info main-queue
Information for Stream main-queue created 2023-08-11 12:58:41

             Subjects: main-queue
             Replicas: 1
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


Cluster Information:

                 Name: nats
               Leader: nats-2

State:

             Messages: 0
                Bytes: 0 B
             FirstSeq: 0
              LastSeq: 0
     Active Consumers: 7

@wkloucek wkloucek added Category:Enhancement Add new functionality Type:Bug and removed Category:Enhancement Add new functionality labels Aug 11, 2023
@wkloucek
Contributor Author

@kobergj From what I understand, having replicas = 1 means that a stream is dead when the NATS node where the stream lives is gone?

@kobergj
Collaborator

kobergj commented Aug 16, 2023

Isn't that always the case? From what I understand, if you set replicas=3, then 3 copies of the same event will be stored. But if the node is dead, it cannot reach any of these 3 copies, so the stream is dead too?! Or maybe I'm just misunderstanding...

@wkloucek
Contributor Author

See NATS documentation:

https://docs.nats.io/nats-concepts/jetstream#persistent-distributed-storage (Section: Stream replication factor)

Rather than defaulting to the maximum, we suggest selecting the best option based on use case behind the stream. This optimizes resource usage to create a more resilient system at scale.
Replicas=1 - Cannot operate during an outage of the server servicing the stream. Highly performant.
Replicas=2 - No significant benefit at this time. We recommend using Replicas=3 instead.
Replicas=3 - Can tolerate loss of one server servicing the stream. An ideal balance between risk and performance.
Replicas=4 - No significant benefit over Replicas=3 except marginally in a 5 node cluster.
Replicas=5 - Can tolerate simultaneous loss of two servers servicing the stream. Mitigates risk at the expense of performance.

So we definitely need 3 replicas for every stream when we want to achieve HA.
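The quoted table follows from majority quorums. Assuming JetStream's Raft-based replication needs a strict majority of a stream's replicas to stay reachable, a stream with R replicas survives the loss of R - (floor(R/2) + 1) servers. This also answers the question above: with replicas=3 the copies live on three different servers, so losing one node still leaves a majority. A small sketch of that arithmetic:

```go
package main

import "fmt"

// quorum returns how many replicas must be reachable (a strict majority)
// for a replicated stream to keep operating.
func quorum(replicas int) int {
	return replicas/2 + 1
}

// tolerableFailures returns how many replica servers can be lost while a
// majority is still reachable.
func tolerableFailures(replicas int) int {
	return replicas - quorum(replicas)
}

func main() {
	for r := 1; r <= 5; r++ {
		fmt.Printf("Replicas=%d: quorum %d, tolerates %d failure(s)\n",
			r, quorum(r), tolerableFailures(r))
	}
}
```

Replicas=2 and Replicas=4 add a server without raising the number of tolerated failures, which matches the "no significant benefit" entries in the quote.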

@wkloucek wkloucek added Severity:sev2-high operations severely restricted, workaround available Priority:p3-medium Normal priority labels Aug 16, 2023
@kobergj
Collaborator

kobergj commented Aug 16, 2023

But isn't that a setting of the nats js server? We probably don't need it for the single binary, right?

I cannot define the value when I publish something because the stream already exists?

@wkloucek
Contributor Author

But isn't that a setting of the nats js server? We probably don't need it for the single binary, right?

It's a setting of the stream. So when creating a stream, oCIS should specify the replicas. Currently it is set to 1 replica, but for some oCIS installations it should be 3 or 5. One replica is fine for the single-process use case because there is only one NATS node.

I cannot define the value when I publish something because the stream already exists?

Right, it's a property of the stream. It can be set when the stream is created, or changed at any time by anyone who is allowed to modify the stream. This is possible via e.g. nats-box, but it requires additional stream management, either manual or automated. Therefore it would be much nicer to have an AUDIT_EVENTS_REPLICATION_FACTOR setting that initially sets, and later on changes, the replicas of a stream. AUDIT_EVENTS_REPLICATION_FACTOR would default to 1 for the oCIS single-process use case.
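A sketch of how the proposed setting could be read, assuming AUDIT_EVENTS_REPLICATION_FACTOR as named above (a proposal in this thread, not an existing oCIS variable), a default of 1 for the single-process case, and JetStream's upper limit of 5 replicas per stream. The lookup function is injected so that os.LookupEnv can be replaced in tests:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// maxReplicas is the upper bound JetStream accepts for a stream's replicas.
const maxReplicas = 5

// replicationFactor reads the proposed (hypothetical) setting and falls back
// to 1, the single-process default, when it is unset or empty.
func replicationFactor(lookup func(string) (string, bool)) (int, error) {
	raw, ok := lookup("AUDIT_EVENTS_REPLICATION_FACTOR")
	if !ok || raw == "" {
		return 1, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid replication factor %q: %w", raw, err)
	}
	if n < 1 || n > maxReplicas {
		return 0, fmt.Errorf("replication factor %d out of range 1..%d", n, maxReplicas)
	}
	return n, nil
}

func main() {
	n, err := replicationFactor(os.LookupEnv)
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("stream replicas:", n)
}
```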

@kobergj
Copy link
Collaborator

kobergj commented Aug 17, 2023

Ah, now I get it. go-micro currently only creates a new stream if it doesn't exist. We can add an option upstream to pass the replica setting.

Updating the stream is another thing. Should we add a "create-or-update" logic to go-micro? Or do you see downsides with that?

@wkloucek
Contributor Author

Ah, now I get it. go-micro currently only creates a new stream if it doesn't exist. We can add an option upstream to pass the replica setting.

Yes, probably this would apply to nats-js caches, events and store implementations, since they all leverage streams.

Updating the stream is another thing. Should we add a "create-or-update" logic to go-micro? Or do you see downsides with that?

I would prefer the "create-or-update" logic. At least increasing replicas works flawlessly (when enough NATS nodes are available). I have not tested reducing replicas or setting replicas higher than the number of available nodes.
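The create-or-update idea could look roughly like this. The method names are loosely modelled on the AddStream/UpdateStream pair of the nats.go JetStream manager, but the streamManager interface, errStreamExists and the in-memory memManager are hypothetical stand-ins, not go-micro or nats.go code. Any error returned for the untested cases (reducing replicas, or more replicas than nodes) is simply passed through:

```go
package main

import (
	"errors"
	"fmt"
)

// StreamConfig holds only the stream properties relevant here.
type StreamConfig struct {
	Name     string
	Subjects []string
	Replicas int
}

// errStreamExists is a hypothetical sentinel for "stream name already in use".
var errStreamExists = errors.New("stream name already in use")

// streamManager is a hypothetical abstraction over the JetStream stream API.
type streamManager interface {
	AddStream(cfg StreamConfig) error
	UpdateStream(cfg StreamConfig) error
}

// createOrUpdate creates the stream; if it already exists, it updates it so
// that a changed Replicas value also reaches existing installations.
func createOrUpdate(m streamManager, cfg StreamConfig) error {
	err := m.AddStream(cfg)
	if errors.Is(err, errStreamExists) {
		return m.UpdateStream(cfg)
	}
	return err
}

// memManager is an in-memory stand-in used only to exercise createOrUpdate.
type memManager struct {
	streams map[string]StreamConfig
}

func (m *memManager) AddStream(cfg StreamConfig) error {
	if _, ok := m.streams[cfg.Name]; ok {
		return errStreamExists
	}
	m.streams[cfg.Name] = cfg
	return nil
}

func (m *memManager) UpdateStream(cfg StreamConfig) error {
	if _, ok := m.streams[cfg.Name]; !ok {
		return fmt.Errorf("stream %q not found", cfg.Name)
	}
	m.streams[cfg.Name] = cfg
	return nil
}

func main() {
	m := &memManager{streams: map[string]StreamConfig{}}
	cfg := StreamConfig{Name: "main-queue", Subjects: []string{"main-queue"}, Replicas: 1}
	// First start: the stream does not exist yet and is created with 1 replica.
	if err := createOrUpdate(m, cfg); err != nil {
		panic(err)
	}
	// Later start with a raised setting: the existing stream is updated.
	cfg.Replicas = 3
	if err := createOrUpdate(m, cfg); err != nil {
		panic(err)
	}
	fmt.Println(m.streams["main-queue"].Replicas)
}
```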

@wkloucek
Contributor Author

For Kubernetes, NACK (NATS Controllers for Kubernetes) exists: https://docs.nats.io/running-a-nats-service/configuration/resource_management/configuration_mgmt/kubernetes_controller

We could also decide that stream configuration has to be done outside of oCIS.

@wkloucek wkloucek added Category:Enhancement Add new functionality Severity:sev4-low no loss of service, req. for docs info or enhancement and removed Type:Bug Severity:sev2-high operations severely restricted, workaround available labels Aug 30, 2023
@wkloucek
Contributor Author

Working on it in owncloud/ocis-charts#388.

@wkloucek wkloucek added p4-low Topic:Documentation and removed Priority:p3-medium Normal priority Severity:sev4-low no loss of service, req. for docs info or enhancement p4-low Category:Enhancement Add new functionality labels Sep 5, 2023
@wkloucek
Contributor Author

wkloucek commented Sep 5, 2023

@mmattel I would consider this a documentation-only task.

We should document that oCIS creates streams with replicas set to 1. If you run a NATS cluster (multiple nodes), you must make sure to raise the replicas of the streams as well. We'll add this to the Helm Chart deployment example in owncloud/ocis-charts#388.
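For the documentation, the rule of thumb could be expressed as a small check. This is a sketch assuming the NATS guidance quoted earlier in this thread (R=1 for a single node, R=3 once at least three nodes exist, R=2 skipped); recommendedReplicas and needsRaise are hypothetical helpers, not part of oCIS or the Helm chart:

```go
package main

import "fmt"

// recommendedReplicas picks a replica count for a NATS cluster with the given
// number of nodes: 1 for small clusters, 3 once three or more nodes exist.
// R=2 is skipped because the quoted guidance sees no significant benefit in it.
func recommendedReplicas(nodes int) int {
	if nodes >= 3 {
		return 3
	}
	return 1
}

// needsRaise reports whether a stream with `current` replicas should be
// raised in a cluster of `nodes` nodes.
func needsRaise(current, nodes int) bool {
	return current < recommendedReplicas(nodes)
}

func main() {
	// The report at the top of this issue shows every oCIS stream with a
	// single replica while the cluster has at least three nodes.
	fmt.Println(needsRaise(1, 3))
}
```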

@wkloucek
Contributor Author

This will be superseded if #7272 is implemented.


stale bot commented Dec 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status:Stale label Dec 15, 2023
@micbar micbar added the Category:Enhancement Add new functionality label Dec 15, 2023
@stale stale bot removed the Status:Stale label Dec 15, 2023
@micbar
Copy link
Contributor

micbar commented Jan 26, 2024

Closed by owncloud/ocis-charts#472

@micbar micbar closed this as completed Jan 26, 2024
3 participants