

Store the events for riemann in an external database #1003

Open
aratik711 opened this issue Dec 21, 2021 · 3 comments

Comments

@aratik711

Is your feature request related to a problem? Please describe.
Currently there is no way to scale Riemann. I wanted to know whether we can store the events in a separate DB/cache and have multiple Riemann instances use it, making Riemann scalable.

Describe the solution you'd like
Use of a separate DB to store events, TTLs, etc. before they are sent to their destination (e.g. InfluxDB).

Describe alternatives you've considered
I haven't figured out any alternatives yet. Suggestions are welcome.

@jarpy
Contributor

jarpy commented Dec 26, 2021

You might want to try the Riemann Users mailing list to have a conversation about architectural patterns that could help you achieve your goals.

My team, for example, uses Logstash as a routing and queuing layer in front of Riemann. It is configured to send most events to Elasticsearch for storage, and also sends some of them to Riemann. If we needed to, we could use the routing layer to route subsets of events to multiple Riemann instances. Riemann itself is inherently not a distributed application, doing everything in memory. That makes it really fast, but leaves distributed architecture decisions in the hands of the operator.
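As a hedged illustration of that pattern (not jarpy's actual configuration; the hostnames and the `[type]` condition are assumptions), a Logstash output section that stores everything in Elasticsearch and forwards a subset to Riemann via the logstash-output-riemann plugin might look like:

```
output {
  # Store all events in Elasticsearch.
  elasticsearch {
    hosts => ["es.example.com:9200"]
  }
  # Forward only some events on to Riemann.
  if [type] == "metric" {
    riemann {
      host   => "riemann.example.com"
      sender => "%{host}"
    }
  }
}
```

The conditional is where the "routing layer" decisions live; it could just as well select between multiple `riemann` outputs pointing at different instances.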

@sanel
Contributor

sanel commented Jan 13, 2022

AFAIK you can't scale Riemann this way, because there are two things to store:

  1. the index, which you might or might not use. This is just a hashmap of internal metrics, kept until they expire. This could be sourced relatively easily to external storage, like Redis.
  2. core state held inside function calls (stream closures). I don't think this can be easily put somewhere else.
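To make point 1 concrete, here is a hypothetical sketch (not a stock Riemann feature) of mirroring events into Redis from a stream, assuming the com.taoensso/carmine client is on the classpath and a Redis host of `redis.example.com`:

```clojure
;; Hypothetical sketch: mirror each event into Redis, keyed by
;; host/service, using the event TTL as the Redis expiry.
(require '[taoensso.carmine :as car])

(def redis-conn {:pool {} :spec {:host "redis.example.com" :port 6379}})

(defn redis-index [event]
  (car/wcar redis-conn
    (car/setex (str (:host event) "/" (:service event))
               (long (or (:ttl event) 60))   ; assumed 60 s default TTL
               (pr-str event))))

(streams
  redis-index)
```

This only externalizes the indexed snapshot of events; the in-memory state inside stream closures (point 2) is untouched, which is why it does not make Riemann itself distributed.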

I think the only "proper" way to scale Riemann is federation, something like Prometheus does [1] and @jarpy mentioned: have one Riemann that accepts all metrics and passes them down to other Riemann instances that do the specific logic, calculations, or storing things in a database. Diagram:

                                +--------> riemann #2
            +------------+      |
  metric -> | riemann #1 | -----+
            +------------+      |
                                +--------> riemann #3

code:

;; riemann-2 and riemann-3 are assumed to be TCP clients, e.g.:
;; (def riemann-2 (tcp-client {:host "riemann-2.example.com"}))
(streams
  (where (service #"^cpu")
    (forward riemann-2))

  (where (service #"^disk")
    (forward riemann-3)))

Now, you could scale riemann #1 this way by adding multiple nodes behind e.g. HAProxy, as long as they just forward events around. Also, if you happen to lose riemann #2, you might not get "cpu" events, but you'll still get "disk" events. Not ideal, but better than a single instance.
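A minimal HAProxy sketch of that front tier, assuming two stateless riemann #1 forwarder nodes (the hostnames and port are assumptions; 5555 is Riemann's default TCP port):

```
# TCP load balancing across two forwarder-only Riemann nodes.
frontend riemann_in
    bind *:5555
    mode tcp
    default_backend riemann_forwarders

backend riemann_forwarders
    mode tcp
    balance roundrobin
    server riemann-1a riemann-1a.example.com:5555 check
    server riemann-1b riemann-1b.example.com:5555 check
```

This only works because the #1 tier keeps no state of its own; the stateful logic lives in the downstream riemann #2 and #3 instances.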

[1] https://prometheus.io/docs/prometheus/latest/federation/

@vipinvkmenon

Currently, our approach is the one mentioned by @sanel, which we use as a pseudo-multi-AZ setup:
we have multiple instances of riemann #1 behind an LB.
