Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a design for SP -> Host information flow #91

Open
Aaron-Hartwig opened this issue May 12, 2023 · 5 comments
Open

Need a design for SP -> Host information flow #91

Aaron-Hartwig opened this issue May 12, 2023 · 5 comments

Comments

@Aaron-Hartwig
Copy link
Contributor

We don't really have a design for the SP -> Host class of requests yet to my knowledge. We could have the host poll regularly (via a new HostRequest message )and get the updates in bulk, but I'd rather we build out SpRequest messages to indicate presence changes, interrupts from modules, and hardware faults. This also feels like an important thing to solve imminently, so I'd propose we add it to the FCS milestone.

@bnaecker thoughts?

@bnaecker
Copy link
Collaborator

How apropos. I'm currently adding a polling loop into Dendrite now, which fetches the state every so often. I might argue that's good enough at this point, but I don't really know what's involved in getting the SP to emit these updates. Does that happen in the FPGA? Or is there a polling loop in the SP code? Another question is, would the SP emitting these messages impact correctness, or just latency? I suppose if state is changing more quickly than the polling loop, we'd lose resolution around those transients. But for steady-state behavior we'd learn what we needed.

@Aaron-Hartwig
Copy link
Contributor Author

It's polling the whole way down right now, so doing that at another layer would keep the model simple and that has been good enough. The SP just polls the FPGAs to get updated state, which it would then pass on to Dendrite at some frequency. The only impact here is latency. Any issues which require immediate action (thermal/power problems) have action taken directly by the SP or FPGA without needing Dendrite in the loop anyway.

@bnaecker
Copy link
Collaborator

bnaecker commented May 16, 2023

So, thinking through the design a bit. If we have the SP poll, we need to decide at least two things: (1) how often to poll and (2) where to send messages. I think that means we need a host request message for configuring those things, something like:

struct NotificationSetup {
    pub interval: std::time::Duration,
    pub address: std::net::SocketAddrV6,
}

After the SP acknowledges that, it would need to continually send messages, though we may want it to decide to remove that subscriber at some point if enough of the messages fail or are not acknowledged. It seems like we'd also want a way to deregister as a subscriber too.

Does that all make sense and sound reasonable? I think my worry is that it's a good bit of extra code and state to put into the SP. We'd need to keep track of subscribers, though I guess that could be limited to at most one at a time. (Or some other small number.) We'd need logic for polling, and for handling the case where we fail to send updates, possibly removing the subscriber at some point.

Actually, now that I think about it, we'd also need to decide which states we want to be notified about. Is it just status changes?

Another issue that comes to mind is that we might need to put the logic for polling into the host-side anyway. The main reason is so that we could tell the difference between getting no updates from the SP, and a network partition that prevents any updates from getting through.

Anyway, that's just a few random thoughts, let me know if that shakes anything loose.

@Aaron-Hartwig
Copy link
Contributor Author

If we have the SP poll, we need to decide at least two things: (1) how often to poll and (2) where to send messages.

Why does this need to be configurable? Also, isn't there only one place to send the message (the attached Scrimlet)?

After the SP acknowledges that, it would need to continually send messages, though we may want it to decide to remove that subscriber at some point if enough of the messages fail or are not acknowledged. It seems like we'd also want a way to deregister as a subscriber too.

Pub/sub seems complicated given we already have a model for this type of SP/Host communication (one request, one response).

Does that all make sense and sound reasonable?

Not really, I don't think we are on the same page here. I must be missing something.

I don't see why this warrants an entirely new paradigm to what we already have in place. My vision (which I acknowledge is not up to date with where you are at currently) would be to choose one of the following:

  • SP-driven: Make a SpRequest which contains some TBD state change information. This could be sent when relevant state has changed and could be filtered a bit (send at max every 500ms, for example). The Host would HostResponse ACK this.
  • Host-driven: In my head this is the polling implementation. The Host would send a HostRequest at some frequency to request some TBD state from the SP. The SP would then supply the requested information via a SpResponse. A simple implementation of this would be doing a HostRequest::Status every 1000ms or something.

Lets sync on this sometime soon so we can get on the same page, I think I'm lacking a piece of understanding that you have.

@bnaecker
Copy link
Collaborator

bnaecker commented Jun 7, 2023

@Aaron-Hartwig and I talked about this in person today. We're both in agreement that this should be deferred for now. We'll wait until we have good reason that polling isn't sufficient, which is what Dendrite and the Tofino SDE do today. That'll hopefully inform the design of the updates, both to the message protocol and the SP code as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants