
Streaming adr #1157

Merged: 8 commits into cardano-scaling:master on Dec 14, 2023
Conversation

@cardenaso11 (Contributor) commented Nov 8, 2023

An ADR on improving the internal interfaces used in hydra-node to allow for better extension points to consume and produce StateChanged events.


  • CHANGELOG updated or not needed
  • Documentation updated or not needed
  • Haddocks updated or not needed
  • No new TODOs introduced or explained hereafter

@cardenaso11 force-pushed the streaming-adr branch 2 times, most recently from 69d3cfb to 39e7606 on November 8, 2023 10:40
@ch1bo (Collaborator) left a comment:

I like the context you give and most of the rationale for this aligns with ADR25.

I left some comments to prompt further detail in the decision section. It would be great if you could explain why some of the proposed changes are needed for your project, so we can distill that into the key decision points.


- Client must be flexible and ready to handle many different events

- There is no way to simply hand off transactions to the hydra-node currently; a full connection must be initiated before observed transactions can be applied, and bulky JSON objects can slow down "bulk" operations
Collaborator:

Note: there is a way to connect to the websocket without receiving the full history (at least) by using the history=0 query parameter.

- Many applications, like SundaeSwap's Gummiworm Protocol, do not need the full general API, and benefit from any performance improvement

- The challenge of subscribing to event types is complex to handle, but would be applicable to many parts of Hydra, not just for subscribing to new transactions. It's a good fit for passing stuff off to a message queue (MQTT, Kafka, whatever), probably from a dedicated process (see "Chain Server" below)
- Could also just directly use a "compatible" spec like STOMP or MQTT, but that would lock in implementation details
Collaborator:

I think ADR25 is a related draft proposal.

- This mock chain was used to mock the L1 chain, not the L2 ledger
- Attempt in February 2023 to externalize chainsync server as part of [#230][#230]
- Similar to [#119][#119], but with the "Chain Server" component in charge of translating Hydra Websocket messages into direct chain, mock chain, or any hypothetical message queue, not necessarily just ZeroMQ
- Deemed low priority due to ambiguous use-case at the time. SundaeSwap's Gummiworm Protocol would benefit from the additional control enabled by the Event Server
Collaborator:

Why are these two points relevant? The chain layer is mostly unrelated to the Client API.


A new abstraction, the EventServer, will be introduced. Each internal hydra event will be sent to the event server, which will be responsible for persisting that event and returning a durable monotonically increasing global event ID. Different implementations of the event server can handle this event differently. Initially, we will implement the following:
- MockEventServer, which increments a counter for the ID, and discards the event
- FileEventServer, which increments a counter for the ID, and encapsulates the existing file persistence mechanism
Collaborator:

Persisting the event stream should not be optional, as that would change the node's capability of retaining Head state (and it does not compose well: loading the state happens in a completely different part of the application than consuming these events).

Contributor:

I think it should definitely be the default, and perhaps required when running online, but in offline mode, when using it as a tool in a larger component, the state is persisted / managed externally. When using it as a component to do integration testing, saving the state would slow down the tests.

That being said, so long as the infrastructural piece is here, we can maintain that capability in our fork specifically for the Gummiworm protocol.

A new abstraction, the EventServer, will be introduced. Each internal hydra event will be sent to the event server, which will be responsible for persisting that event and returning a durable monotonically increasing global event ID. Different implementations of the event server can handle this event differently. Initially, we will implement the following:
- MockEventServer, which increments a counter for the ID, and discards the event
- FileEventServer, which increments a counter for the ID, and encapsulates the existing file persistence mechanism
- SocketEventServer, which increments a counter for the ID, and writes the event to an arbitrary unix socket
Collaborator:

How is this different to the websocket variant?

Contributor:

Namely, it could write to a unix socket in a binary format, rather than incurring the overhead of a web socket + tcp + serialization. We can also expose all internal events to the socket, which would be useful in ecosystem integrations, whereas the websocket currently exposes only a subset of events (from my understanding, I may be wrong here)

Collaborator:

The websocket currently contains all information, even a bit more. The shape is different and additional persistence is hacky. So it's not perfect, but we should be able to unify things here.

I don't think there is much of an overhead in using WebSockets over unix domain sockets. Plus there is additional complexity of message framing on domain sockets.

It sounds like what you want is rather a more compact encoding (binary instead of JSON)? I have suggested a way to select the encoding via a content-type in ADR25

Contributor:

OK; my understanding from talking with Arnaud was that some events are modelled internally but not exposed to the websocket, so we were trying to provide a deeper integration without breaking the websocket API. But yes, in that case we can unify the websocket and unix socket APIs for now.
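For concreteness, here is a minimal Haskell sketch of the EventServer handle discussed above. The names and shape are hypothetical, not an agreed interface; only the contract (accept an event, hand back a durable, monotonically increasing ID) comes from the quoted decision text:

import Data.IORef (atomicModifyIORef', newIORef)

newtype EventId = EventId Word
  deriving (Show, Eq, Ord)

-- The handle every variant (mock, file, socket, ...) would provide.
data EventServer e m = EventServer
  { sendEvent :: e -> m EventId
  }

-- The MockEventServer variant: bump a counter for the ID, discard the event.
mockEventServer :: IO (EventServer e IO)
mockEventServer = do
  counter <- newIORef (0 :: Word)
  pure EventServer
    { sendEvent = \_discarded ->
        EventId <$> atomicModifyIORef' counter (\n -> (n + 1, n + 1))
    }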

- Generalizes the UTxO persistence mechanism introduced in [offline mode][offline-mode]
- May be configured to only persist UTxO state periodically and/or on SIGTERM, allowing for performant in-memory Hydra usage. This is intended for offline mode use cases where the transactions ingested are already persisted elsewhere.
- One configuration which we expect to be common and useful is the use of a MultiplexingEventServer configured with a primary MockEventServer and a secondary UTxOStateServer, which will persist the UTxO state to disk.
- MultiplexingEventServer, which has a primary event server (which it writes to first, and uses its ID) and a list of secondary event servers (which it writes to in sequence, but discards the ID)
Collaborator:

Does this mean that instead of one, we might have multiple event servers configured for event consumption? Maybe this can be decided without going into too much details how a MultiplexingEventServer is structured (i.e. why compose with a mock one if you could rather say that "zero or more event servers can be registered")

Contributor:

The idea here was that we may want to multiplex out the results (save to disk and write to dynamodb), but we structured it this way for two reasons:

  • if we have this requirement of an ID, then the IDs that different servers provide may conflict, so we need to choose one unambiguously
  • some of these servers may be required for ACID properties (persist to durable storage), while for others "fire and forget" may be sufficient for informational purposes (like analytics), so the secondaries were in that spirit

That being said, if we don't have the id requirement, and we don't care about the ability to do fire and forget (or we can implement that via a wrapper), then I'd be happy to simplify this

Collaborator:

I think the event already has an ID and we can simplify things here (as commented above as well)
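As a sketch of the primary/secondary composition debated in this thread, reusing the hypothetical EventServer handle from the sketch above (again illustrative, not an agreed design):

import Control.Monad (void)

-- Write to the primary first and return its ID; then fan out to the
-- secondaries in sequence, discarding their IDs (the "fire and forget"
-- spirit mentioned above).
multiplexingEventServer ::
  Monad m =>
  EventServer e m ->
  [EventServer e m] ->
  EventServer e m
multiplexingEventServer primary secondaries = EventServer
  { sendEvent = \event -> do
      eventId <- sendEvent primary event
      mapM_ (\secondary -> void (sendEvent secondary event)) secondaries
      pure eventId
  }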

- UTxOStateServer, which increments a counter for the ID, and updates a file with the latest UTxO after the event
- Generalizes the UTxO persistence mechanism introduced in [offline mode][offline-mode]
- May be configured to only persist UTxO state periodically and/or on SIGTERM, allowing for performant in-memory Hydra usage. This is intended for offline mode use cases where the transactions ingested are already persisted elsewhere.
- One configuration which we expect to be common and useful is the use of a MultiplexingEventServer configured with a primary MockEventServer and a secondary UTxOStateServer, which will persist the UTxO state to disk.
Collaborator:

This appears to be an oddly specific decision to be taken. But I understand this is what you actually implemented in #1118 and aim to be using in gummiworm.

Contributor (author):

Yes, exactly. I'm going to try to rephrase it to make the use-case a little more coherent. It's intended to be used to easily maintain a UTxO that can later be used as a starting UTxO

Collaborator:

See my comment here about re-using UTxO in offline mode and why that actually should not be needed

- One configuration which we expect will be common and useful, is the usage of a MultiplexingEventServer configured with a primary MockEventServer, and a secondary UTxOStateServer, which will persist the UTxO state to disk.
- MultiplexingEventServer, which has a primary event server (which it writes to first, and uses its ID) and a list of secondary event servers (which it writes to in sequence, but discards the ID)

New configuration options will be introduced for choosing between and configuring these options for the event server. The default will be a multiplexing event server, using the file event server as its primary event server, and a websocket broadcast event server as its secondary event server.
Collaborator:

I think this is the main point of the decision block.. how will the behavior of the hydra-node change from the outside? How much configurability needs to be introduced and how can the default event consumers be turned off?

slug: 29
title: |
29. EventServer abstraction for event persistence
authors: []
Collaborator:

Suggested change
authors: []
authors: [@cardenaso11]

---

## Status
N/A
Collaborator:

Suggested change
N/A
Draft

@Quantumplation (Contributor):

@ch1bo @abailly after struggling to make the pieces fit for this ADR for a while, we significantly simplified it by separating the persistence changes from the websocket changes; we still plan to provide both, so expect a second ADR, but rather than trying to unify those mechanisms, we'll just make separate improvements to each.

Let me know if you want to discuss these on a call!

@ghost commented Dec 6, 2023

@Quantumplation Thanks a lot for this, I will have a look later on today and get back to you but I think it's all pretty clear.

@ch1bo self-requested a review on December 6, 2023 10:46
@ghost left a comment:

I have left some comments, I think we should clarify the intention behind the "at-least-once" semantics.


The Hydra node represents a significant engineering asset, providing layer 1 monitoring, peer-to-peer consensus, durable persistence, and an isomorphic Cardano ledger. Because of this, it is being eyed as a key building block not just in Hydra-based applications, but in other protocols as well.

One remaining difficulty in integrating Hydra into a fully productionalized software stack is the persistence model. Currently, the Hydra node achieves durability by writing a sequence of "StateChanged" events to disk. If a node is interrupted, upon restarting, it can replay just these events, ignoring their corresponding effects, to recover the same internal state it had when it was interrupted. However, downstream consumers don't have the same luxury.
Reviewer:

Would be good to complete this paragraph and fully motivate the change with one example of how this would be useful in production, if possible from your experience

Collaborator:

Yes, please. I do not understand the context / motivation for this.

The decision section does (to my understanding) imply that "persistence mechanisms" would get events re-submitted upon startup of the node. This, however, does sound more like we would want to have "event sinks" for the internal event stream (StateChanged + likely more stuff). That would make sense to me, especially as the Hydra.API.Server could be modeled as such an event sink.

Contributor:

@ch1bo That sounds like what we had before, but when we went to actually define how it would work, it was unclear how we would actually unify the two (right now they're treated very differently, and defining the exact recovery semantics, like in the example below, revealed just how different they are). After talking with @abailly-iohk, he suggested a much simpler step, focused just on extending the persistence mechanism.

Let me elaborate here first, and once we reach understanding, i'll codify it in the ADR.

Currently, we want to fork the hydra node for the gummi-worm protocol, to produce a stream of "archives" of the transactions we see confirmed. We have 3 ways to do this:

  • Build an application that connects to the websocket API, and bundle up and persist the transactions from the websocket messages we see
  • Build an application that tails the persistence log file on disk, and performs the relevant logic
  • Fork the hydra node, and replace the persistence mechanism with our own

You might imagine someone that wants to run this in a cloud native setting, where events are persisted to kinesis so they can be consumed from an AWS lambda, for example, would have a similar set of options.

However, approaches 1 and 2 have a number of downsides: they are a fully separate component to deploy, maintain, monitor, restart, etc.; 2 in particular is dependent on a file format that is likely to change (at least, more so than the websocket API).

Approach 3 would be difficult to maintain long term, because as the hydra node changes underneath it, it would be more and more work to merge the upstream changes into our fork.

The idea here is to genericize the persistence mechanism so that the fork can extend it in a fairly unobtrusive way.

It's a little frustrating, because we've been going back and forth on this design for a while now, and we keep seeming to reach agreement on what would be a good thing to build while we're on a call together, and then that has diverged by the time we get the ADR up for review.

Collaborator:

There are more aspects to your analysis of approaches. IMO approach 1 would be most favorable because it does not increase the scope to be maintained and allows for more loose coupling with a clear interface. For example, I do not see it favorable to add AWS dependencies to the hydra-node just for optional storing on S3 (or kafka deps, or ...). Not sure about a plugin mechanism either, but maybe an option.


I do understand your frustration. Maybe we should use less ambiguous modes of communication like drawing/amending an architecture diagram to clarify one or the other approach?

Or prototyping what you actually need for your solution and chat about code?

Anyhow, please take my comments as early feedback; we can also merge the ADR as is.. the actual implementation will result in discussions about details and further refinement anyhow.

Contributor:

For example, I do not see it favorable to add AWS dependencies to the hydra-node just for optional storing on S3 (or kafka deps, or ...)

Agreed, we do not plan to add this directly to the hydra node; the goal is to make it possible to add in a fork (or through a plugin mechanism), while at the same time reducing the overall burden of keeping that fork up to date.

As it is now, we'd have to make enough changes to the hydra code base that each time you touched the persistence mechanism at all, we'd have to go through a potentially painful merge process.

With this mechanism, we (and other users) would simply be implementing a standard interface in the fork, and the chance of merge conflicts is much smaller.


One remaining difficulty in integrating Hydra into a fully productionalized software stack is the persistence model. Currently, the Hydra node achieves durability by writing a sequence of "StateChanged" events to disk. If a node is interrupted, upon restarting, it can replay just these events, ignoring their corresponding effects, to recover the same internal state it had when it was interrupted. However, downstream consumers don't have the same luxury.

We propose generalizing the persistence mechanism to open the door to a plugin based approach to extending the Hydra node.
Reviewer:

This sentence seems more relevant to the Decision section. I would suggest replacing it with concrete examples.
eg.

For example, a hydra-node deployed as part of a larger system on k8s might want to store its
event stream using some persistence service afforded by the stack it is part of, instead of
plain, "naive", disk-based persistence. This would allow scenarios like graceful shutdown and
restart across redeployments, or redundancy of hydra-node-based services through a
primary-secondary architecture.


# Decision

We propose adding a "persistence combinator", which can combine one or more "persistenceIncremental" instances.
Reviewer:
Suggested change
We propose adding a "persistence combinator", which can combine one or more "persistenceIncremental" instances.
We propose adding a `CombinedPersistence` mechanism, which can combine one or more `PersistenceIncremental` instances.


When appending to the combinator, it will forward the append to each persistence mechanism.

As the node starts up, as part of recovering the node state, it will also ensure "at least once" semantics for each persistence mechanism. It will maintain a notion of a "primary" instance, from which events are loaded, and "secondary instances", which are given the opportunity to re-observe each event.
Reviewer:
This is not entirely clear to me what you mean here, perhaps a diagram (sequence?) would help understanding how would that work? There's both the append and the loadAll part to consider.

Contributor:

Imagine we have a setup that uses the CombinedPersistence mechanism to:

  • save the StateChanged events to the log file
  • build up a larger archive of transactions (say, 10,000 transactions at a time), and then save them to S3
  • publish a message for each StateChanged on kinesis

Now imagine we write the StateChanged event to disk, and crash when saving to S3. On startup, we will load all events from StateChanged and replay them against the hydra state, recovering our place. However:

  • the S3 destination wouldn't see the StateChanged event, and would lose progress on the most recent archive it was producing; downstream consumers would see a gap in the stream
  • The kinesis stream would also not see that last StateChanged event, and would also see a gap.

Perhaps, you might suggest, you solve this by reversing the order of these sinks: we save the event to each of the other sinks first, and to the persistence on disk last. Now, when we recover, the state change has never happened according to the node, but we've emitted it to S3 and kinesis, letting downstream actors react to state that the node then might not consider final.

So, if we build this combinator, it needs to feed the StateChanged events on startup not just to the hydra state (the outcomes of which get thrown away), but also to the other PersistenceIncremental instances. This gives "at least once" delivery semantics for each PersistenceIncremental instance, allowing each to recover its partial state and upgrade to "exactly once" delivery semantics if the underlying medium supports it (for example, by checking whether a file exists in S3, keeping track of its own delivery cursor, etc.).
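A minimal sketch of the combinator semantics just described, assuming a simplified PersistenceIncremental shape (the real handle in hydra-node also carries serialization constraints); combinePersistence is an illustrative name:

data PersistenceIncremental a m = PersistenceIncremental
  { append  :: a -> m ()
  , loadAll :: m [a]
  }

-- Appends fan out to every instance; loads come from the primary only,
-- but every loaded event is re-offered to the secondaries, which yields
-- the "at least once" delivery described above.
combinePersistence ::
  Monad m =>
  PersistenceIncremental a m ->   -- primary: loaded from on startup
  [PersistenceIncremental a m] -> -- secondaries: append-only
  PersistenceIncremental a m
combinePersistence primary secondaries = PersistenceIncremental
  { append = \x -> do
      append primary x
      mapM_ (`append` x) secondaries
  , loadAll = do
      events <- loadAll primary
      mapM_ (\e -> mapM_ (`append` e) secondaries) events
      pure events
  }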


Each persistence mechanism will be responsible for its own durability; for example, it may maintain its own checkpoint, and only re-broadcast the event if it is after the checkpoint.

The exact implementation details of the above are left unspecified, to allow some flexibility and experimentation on the exact mechanism to realize these goals.
Reviewer:
Fair enough :) But at least the ADR should be clear about the overall behaviour of the combination of persistence: is it "transactional"? Probably not, as we mentioned "at least once", but then some persistence backends might become unreliable, and we need to clarify what they should expect upon reloading.
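One way a backend could stay reliable under those semantics is the per-instance checkpoint mentioned in the quoted text. A hypothetical sketch, reusing the PersistenceIncremental shape from the sketch above and assuming events carry a monotonically increasing ID:

-- Skip events the wrapped instance has already durably recorded, turning
-- at-least-once redelivery into effectively-once processing for this sink.
withCheckpoint ::
  Monad m =>
  m Word ->          -- read this sink's persisted checkpoint
  (Word -> m ()) ->  -- persist a new checkpoint
  (e -> Word) ->     -- extract the event's monotonically increasing ID
  PersistenceIncremental e m ->
  PersistenceIncremental e m
withCheckpoint readCheckpoint writeCheckpoint eventId wrapped =
  wrapped
    { append = \e -> do
        checkpoint <- readCheckpoint
        if eventId e <= checkpoint
          then pure () -- already seen during a previous run
          else do
            append wrapped e
            writeCheckpoint (eventId e)
    }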


## Consequences

Here are the consequences we forsee from this change:
Reviewer:
Suggested change
Here are the consequences we forsee from this change:
Here are the consequences we foresee from this change:

- The default operation of the node remains unchanged
- Projects forking the hydra node have a natively supported mechanism to extend node persistence
- These extensions can preserve robust "at least once" semantics for each hydra event
- Sundae Labs will build a "Save transaction batches to S3" proof of concept extension
Reviewer:
👍


When appending to the combinator, it will forward the append to each persistence mechanism.

As the node starts up, as part of recovering the node state, it will also ensure "at least once" semantics for each persistence mechanism. It will maintain a notion of a "primary" instance, from which events are loaded, and "secondary instances", which are given the opportunity to re-observe each event.
Collaborator:

If "secondary instances" are not even used to load anything.. are these even "persistence mechanisms"?

I do understand this idea rather as an "event sink".

Practically this would mean that instead of piggy-backing the append interface, we would want the node to submit events to zero or more event sinks.

I assume the event in question is what is called StateChanged today + any information we currently model as Effect (at least ClientEffects). That is, any information which you could observe internally on the persisted state file + what is sent on the websocket today (ServerOutput).

Note that the term is heavily overloaded, but let's postulate such a data Event.

A useful design IMO would be to have an

data EventSink m = EventSink { process :: [Event] -> m () }

handle, where the node has a list of those:

data HydraNode m = HydraNode { ..., eventSinks :: [EventSink m] }

and stepHydraNode (its main loop) would process them in something like:

-- [...]
outcome <- logic
-- Not all hydra logic outcomes may yield an event
deriveEvents outcome $ \events ->
  forM_ eventSinks $ \EventSink{process} -> process events

That way, we could have multiple implementations for EventSink. I could see sending things off to S3, or through some IPC into a scrolls filter.

We could even model the current API data Server = Server { sendOutput :: ServerOutput -> m () } as such an event sink.
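For instance, a hypothetical adapter for that last point, reusing the EventSink and Server shapes from this comment; toServerOutput is an assumed projection, since not every internal event necessarily has a client-facing output:

-- Assumed projection from internal events to API outputs (illustration only).
toServerOutput :: Event -> Maybe ServerOutput
toServerOutput = undefined -- hypothetical

-- Wrap the existing API server as one more sink in the list.
serverAsEventSink :: Monad m => Server m -> EventSink m
serverAsEventSink server =
  EventSink { process = mapM_ (mapM_ (sendOutput server) . toServerOutput) }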

Collaborator:

OTOH.. not sure if we want to really implement those sinks in the hydra-node package at all? Maybe having one generic event stream exposed on some socket to which other processes could attach and consume event messages is more flexible?

(I do believe that websockets using binary encoded data is a good candidate for such a socket)

Contributor:

As I mentioned above, in a conversation with @abailly-iohk, we were talked out of this "EventSink" idea, for two reasons:

  • the "StateChanged" events and other outcomes were handled so differently, so unifying them seemed awkward
  • the semantics of how the replay after a crash works, in that case, get very confusing (see my comment with an example scenario above)

I'm happy to implement that if we're circling back to it, but I'm starting to feel like a ping-pong ball between you and Arnaud 😅

@ch1bo (Collaborator) commented Dec 12, 2023

Rewrote several parts based on our discussion today. Also, here is a half-way updated architecture diagram I scribbled together (red stuff is new):

[architecture diagram omitted]

@ch1bo requested review from Quantumplation and a user on December 13, 2023 10:42
@ch1bo removed their assignment on Dec 13, 2023
@Quantumplation (Contributor):

I like the revision; splitting "EventSource" and "EventSink" makes it pretty clear what the distinction is.

To me, the only open question is: what are the es that the EventSource and EventSink are producing/consuming? Is it the outcome type, or the StateChanged type, or some refactor of those types?

@ghost commented Dec 13, 2023

To me, the only open question is: what are the es that the EventSource and EventSink are producing/consuming? Is it the outcome type, or the StateChanged type, or some refactor of those types?

I think it can be left out of the ADR, and the implementation will be telling.

  • Implementing a first version with only the StateChanged will be straightforward, because there won't be any filtering involved should one want to restore a node from an event stream.
  • Then I suspect opportunities for factoring and simplifying the node's logic will surface, with all the Outcomes being part of the event stream and filtered according to usage. This should allow us, for example, to change how the Server receives its events.

@ghost left a comment:

Left some comments before I go on holidays, seems like we are getting there


* We realize that the `EventSource` and `EventSink` handles, as well as their aggregation in `HydraNode` are used as an API by forks of the `hydra-node` and try to minimize changes to it.

## Consequences
Reviewer:
An important consequence I would add, which is also in #1213, is that it becomes necessary to version the external representation of data we share with the outside world. Perhaps this is worth mentioning here?

Collaborator:

Yes, that should be the concern of a given EventSink and is especially valuable if we hope to have EventSink and EventSource be compatible across revisions of their implementations.


* The `stepHydraNode` main loop does call `putEvent` on all `eventSinks` in sequence.

- TBD: transaction semantics? what happens if one fails?
Reviewer:
Currently, we expect the append to succeed, and any exception thrown will kill the node. For the purpose of making the node restartable, this is probably a good enough guarantee?
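Under those semantics, the main-loop dispatch could be as simple as the following sketch. The EventSink/putEvent names follow the quoted ADR text; the shape is illustrative, not the final interface:

data EventSink e m = EventSink { putEvent :: e -> m () }

-- Call every sink in sequence; an exception from any sink propagates and
-- stops the node, so on restart the event gets re-submitted (at least once).
sendToSinks :: Monad m => [EventSink e m] -> e -> m ()
sendToSinks sinks event = mapM_ (\sink -> putEvent sink event) sinks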

* Sundae Labs can build a "Save transaction batches to S3" proof of concept `EventSink`.
* Sundae Labs can build a "Scrolls source" `EventSink`.
* Sundae Labs can build a "Amazon Kinesis" `EventSource` and `EventSink`.
* Extension points like `EventSource` and `EventSink` could be dynamically loaded as plugins without having to fork `hydra-node` (maybe in a future ADR).
Reviewer:
A possibility discussed on discord would be to make the sources and sinks configurable and "generic" through a URL-based mechanism, eg. you would start a node with:

hydra-node .... --data-source file:///some/file/path --data-sink tcp://some.host:1234 --data-sink https://some.service/

Then the code of the node knows how to talk to these external components but does not control them.

Collaborator:

I like the idea of using a URI to select & configure available source and sink implementations. I'll add a note that sources & sinks could be configurable (first step) or dynamically loaded (later step), but I think we agree that we should approach this incrementally and only build what we (also @Quantumplation) actually need.
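A hypothetical sketch of that URL-based selection, using the network-uri package; the schemes and constructors are illustrative, not an agreed interface:

import Network.URI (URI (..), URIAuth (..), parseURI)

data SinkConfig
  = FileSink FilePath      -- e.g. file:///some/file/path
  | TcpSink String String  -- host and port, e.g. tcp://some.host:1234
  deriving (Show)

-- Map a --data-sink argument to a configured sink, rejecting unknown schemes.
parseSinkConfig :: String -> Either String SinkConfig
parseSinkConfig raw = do
  uri <- maybe (Left ("not a URI: " <> raw)) Right (parseURI raw)
  case uriScheme uri of
    "file:" -> Right (FileSink (uriPath uri))
    "tcp:" -> do
      auth <- maybe (Left "tcp sink needs host:port") Right (uriAuthority uri)
      Right (TcpSink (uriRegName auth) (dropWhile (== ':') (uriPort auth)))
    other -> Left ("unsupported sink scheme: " <> other)

With this shape, --data-sink tcp://some.host:1234 would parse to TcpSink "some.host" "1234".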

@ch1bo (Collaborator) left a comment:

Specified that all loaded events on startup should be (re-)submitted to all configured eventSinks and incorporated other reviewer comments.

We likely need to update one or the other things still when implementing this, but this is a very good Draft state ADR now!

@Quantumplation (Contributor) left a comment:

Looks good to me! This is largely what I was envisioning a while back, and the split between "Source" and "Sink" really solves the challenges I was having. Thanks for taking a pass at revising this!

@ch1bo merged commit 5d1593a into cardano-scaling:master on Dec 14, 2023
17 of 18 checks passed
@Quantumplation deleted the streaming-adr branch on December 14, 2023 19:13
@ch1bo added this to the 0.15.0 milestone on Dec 22, 2023
@ch1bo mentioned this pull request on Feb 26, 2024