
public API for SID #103

Open
trgill opened this issue May 7, 2021 · 11 comments

@trgill
Contributor

trgill commented May 7, 2021

Consider making the CLIs thin layers above a publicly usable API.

See:

#102 (comment)

#89 (comment)

@bmarzins bmarzins self-assigned this May 18, 2021
@bmarzins
Member

@prajnoha I've been making the response formatting happen in the sid daemon itself, as mentioned in #102 (comment), and I have an issue I'm not sure we need to solve. Currently, usid version prints its output in environment variable format as NAME=VALUE, while sidctl (and the formatter code in general) outputs TABLE format data as NAME: VALUE. Obviously, if udev is importing the data, it needs to be in environment variable format, and commands with CMD_KV_EXPORT_UDEV will ignore the requested format when exporting the udev keys. But do we care whether something like the version output is in environment variable format? If so, I can add a new format type to the formatter code.

@prajnoha
Member

Sure, usid is supposed to be used in udev, so that one requires KEY=VALUE for udev to import it correctly. On the other hand, sidctl is mainly for users, where the : doesn't matter at all and is probably more readable than =. And if sidctl is used in scripts, we can either use the existing json format... or, if anybody really needs it, we can add the extra format option to have it in KEY=VALUE too, as you say.

Right now, I don't think we need to add the extra formatting option - maybe if we have more sidctl commands (or other components using the formatter code) that need KEY=VALUE, we can add it then, but it's probably not needed right now just for sidctl version.

@bmarzins
Member

The issue with not adding a separate ENV formatting type is that the SID daemon is doing the formatting. So we can't use NAME=VALUE in usid and NAME: VALUE in sidctl unless the daemon is able to format its replies with either type. Since we want usid to always use NAME=VALUE, we either give up on using NAME: VALUE in sidctl, or we add the new format type. But adding a new format type is easy, so I'll just do that, unless there are objections.
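
Just to illustrate what a new format type could look like, here's a minimal sketch of a format selector; the names (fmt_t, FMT_ENV, fmt_kv) are made up for this example and are not SID's actual formatter API:

```c
#include <stdio.h>

/* Hypothetical output format selector - illustrative only. */
typedef enum {
	FMT_TABLE, /* NAME: VALUE - human-readable, used by sidctl  */
	FMT_ENV,   /* NAME=VALUE  - environment format, used by usid */
	FMT_JSON   /* "NAME": "VALUE" fragment                      */
} fmt_t;

/* Render one key/value pair in the requested format into buf. */
static int fmt_kv(char *buf, size_t size, fmt_t fmt, const char *key, const char *val)
{
	switch (fmt) {
		case FMT_TABLE:
			return snprintf(buf, size, "%s: %s\n", key, val);
		case FMT_ENV:
			return snprintf(buf, size, "%s=%s\n", key, val);
		case FMT_JSON:
			return snprintf(buf, size, "\"%s\": \"%s\"", key, val);
	}
	return -1;
}
```

With something like this, the daemon can format replies either way, and the requested format just travels in the command request.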

@prajnoha
Member

Ah yes, I see - I got misled by the fact that usid uses a plain fprintf right now; this is going to be shared formatting code used for both. OK then, let's just add the extra format option...

@prajnoha
Member

Ben, just wanted to ask - with your changes, we're still going to use the memfds, right? I mean:

  • daemon putting the formatted result into a memfd-backed buffer
  • then daemon sends the FD over to the other end (API endpoint) for the command/client/other...
  • then daemon closing its instance of FD
  • then the other side reading the received FD, doing output (or whatever else processing the other side needs)

Or you want to follow a different scheme?
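
The daemon side of the scheme above could be sketched roughly like this (Linux-only; the function name send_result_fd is illustrative, not SID's actual API): write the formatted result into a memfd, pass the fd over the connection socket via SCM_RIGHTS, then close the daemon's own copy.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

static int send_result_fd(int sock, const char *data, size_t len)
{
	/* Put the formatted result into a memfd-backed buffer. */
	int mfd = memfd_create("sid-result", MFD_CLOEXEC);

	if (mfd < 0)
		return -1;
	if (write(mfd, data, len) != (ssize_t) len) {
		close(mfd);
		return -1;
	}

	/* Attach the memfd as SCM_RIGHTS ancillary data on a 1-byte message. */
	char dummy = 'r';
	struct iovec iov = {.iov_base = &dummy, .iov_len = 1};
	union {
		char           buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {.msg_iov        = &iov,
	                     .msg_iovlen     = 1,
	                     .msg_control    = u.buf,
	                     .msg_controllen = sizeof(u.buf)};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type  = SCM_RIGHTS;
	cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &mfd, sizeof(int));

	int r = sendmsg(sock, &msg, 0) == 1 ? 0 : -1;

	close(mfd); /* daemon closes its instance of the fd */
	return r;
}
```

The receiving side then pulls the fd out of the ancillary data with recvmsg and reads the content at its leisure; the memfd stays alive until the last fd referencing it is closed.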

@bmarzins
Member

> Ben, just wanted to ask - with your changes, we're still going to use the memfds, right? I mean:
>
> * daemon putting the formatted result into a memfd-backed buffer
> * then daemon sends the FD over to the other end (API endpoint) for the command/client/other...
> * then daemon closing its instance of FD
> * then the other side reading the received FD, doing output (or whatever else processing the other side needs)
>
> Or you want to follow a different scheme?

I was planning on only having the kv store dump use a memfd, like things currently are. The response header is already using a regular buffer, and it doesn't really make sense to push that to the memfd. For things like the stats and version commands, there's not really enough information being passed to make a difference. But passing the full ascii-formatted kv store dump could be pretty big.

If we weren't using a memfd for that, then we probably shouldn't be passing that data as formatted ascii - we should pass it in as compact a format as possible. And if we pass that data in a binary format, I don't really see the point in passing the rest of the commands as ascii. But there really isn't a reason why we couldn't pass all the command reply data in a memfd. It seems like overkill just to pass a couple of lines of text, but it would keep the interface common for all the commands.

Actually, I can think of one reason why we might want to treat the kv store dump differently from the other commands. It's possible that callers will want to query sid for more specific information than just a complete dump. The library callers likely don't want to grab the complete dump and process it themselves to answer their queries. SID understands how the information is stored in the kv store, and can more easily do this processing itself.

The question is where to process it: in the client library or in the daemon itself. The downside to processing it in the daemon is that we need to extend the SID client/server API whenever we add new commands or options to process the data differently. If the client just works from a dump of the whole kv store (or, in the future, possibly a partial dump), the client/server API doesn't need to change when the client adds new ways to process the dump. However, having the server pass formatted ascii output just adds extra work for the client to do. Formatting the output on the server side means that the client would have to parse the formatted output, then process it, and then reformat the processed output. Of course, the question of whether we want to pass the dump command reply data as binary or formatted ascii is orthogonal to the question of whether we want to pass command reply data in memfds or not.

@prajnoha
Member

> I was planning on only having the kv store dump use a memfd, like things currently are. The response header is already using a regular buffer, and it doesn't really make sense to push that to the memfd.

That makes sense, yes. That way, we can have it clearly separate - the metadata/header (which won't be actually used by the very end user of API - it's more like flags/directions for the API code itself). Then the data part can be directly passed to the API user.

> For things like the stats and version commands, there's not really enough information being passed to make a difference. But passing the full ascii formatted kv store dump could be pretty big. If we weren't using a memfd for that, then we probably shouldn't be passing that data as formatted ascii. We should pass it in as compact a format as possible. And if we pass that data in a binary format, I don't really see the point in passing the rest of the commands as ascii. But there really isn't a reason why we couldn't pass all the command reply data in a memfd. It seems like overkill to just pass a couple of lines of text, but it would keep the interface common for all the commands.

Sure, for commands like stats, it would really be overkill. The dump is clear - that can get huge. The tree can get a bit bigger too, if we count the resource tree for the main process as well as all worker processes at a moment when several workers are active (and if it's ascii-formatted on the daemon side). So we'll simply keep both ways in the protocol - the fd (memfd) passing as well as direct data sending.

> Actually, I can think of one reason why we might want to treat the kv store dump differently from the other commands. It's possible that callers will want to query sid for more specific information than just a complete dump. The library callers likely don't want to grab the complete dump and process it themselves to answer their queries. SID understands how the information is stored in the kv store, and can more easily do this processing itself.

This is a must - I'm counting on this being implemented in the future. Right now, we use the full dump mostly for our own debugging purposes, but in an API for wider use, we really need to support queries that select both records and individual properties within those records - that's the second most wanted feature of SID besides instantiating devices/stacks and collecting info.

This is going to be a whole chapter for us to support. For example, one such query could be "give me a list of all devices in the stack above that are affected if I change device X", and similar. We need to come up with a sane query format. That's certainly for a separate discussion; for now, we just need to keep in mind that something like this will come.
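
Just to make that concrete, such a query could hypothetically be expressed as structured input, e.g. JSON; every key below is made up purely for illustration, nothing like this exists in SID yet:

```json
{
  "select": "devices",
  "where": { "affected_by_change_of": "X" },
  "properties": ["name", "alias"]
}
```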

> The question is where to process it: in the client library or in the daemon itself. The downside to processing it in the daemon is that we need to extend the SID client/server API whenever we add new commands or options to process the data differently.
> If the client just works from a dump of the whole kv store (or in the future possibly a partial dump) the client/server API doesn't need to change if the client adds new ways to process the dump.

...I'm more inclined to processing on the daemon side and supporting that query format in the request. If the query format is powerful and generic enough, I don't expect many changes in the API itself.

> However, having the server pass formatted ascii output just adds extra work for the client to do. Formatting the output on the server side means that the client would have to parse the formatted output, then process it, and then reformat the processed output.

The best would be if the client can get the result in a form that needs minimal or even no extra processing on the client side, so the clients are really thin. Sure, if the client wants to put the data into some report with rows and columns, that extra processing is unavoidable (just like if we wanted to present the records in some nice form in a GUI). So it's more a question of choosing a good output format for which clients can use existing external libraries to parse it easily (which JSON certainly is).

> Of course, the question of whether we want to pass the dump command reply data as binary or formatted ascii is orthogonal to the question of whether we want to pass command reply data in memfds or not.

Yes.

BTW, one of the supported formats could just as well be a binary one, besides the ascii ones... If it was binary, we could possibly do it in a way that makes it easy and straightforward for clients to use a predefined structure to iterate over the data. That's more prone to changes in layout and ordering, but we have the protocol versioned, and that versioning can cover formats too.
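
A binary format like that could look something like the following sketch: a fixed per-record header followed by key and value bytes, so a client can walk the buffer with a predefined structure. The struct kv_rec_hdr layout and kv_iter_count helper are purely illustrative, not an actual SID wire format:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical versioned record layout for a binary reply:
 * [kv_rec_hdr][key bytes][value bytes], repeated. */
struct kv_rec_hdr {
	uint32_t key_len;
	uint32_t val_len;
};

/* Count complete records in a packed buffer, stopping at the first
 * truncated record - the shape a client-side iterator would take. */
static size_t kv_iter_count(const char *buf, size_t len)
{
	size_t off = 0, count = 0;

	while (off + sizeof(struct kv_rec_hdr) <= len) {
		struct kv_rec_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		if (off + h.key_len + h.val_len > len)
			break; /* truncated record */
		off += h.key_len + h.val_len;
		count++;
	}
	return count;
}
```

This is exactly where the "prone to changes and ordering" concern comes in: endianness and struct layout become part of the protocol, which is why tying the format to the protocol version matters.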

@prajnoha
Member

prajnoha commented May 20, 2021

That said, right now we're considering static data - working on some instantaneous state or db snapshot. But in the protocol and the API, we should also allow for possible streaming support in the future - like monitoring incoming events, monitoring triggers being fired, etc...

@trgill
Contributor Author

trgill commented May 20, 2021

What are we thinking for the KV store moving forward? Are we going to consider moving to something like Lightning DB (LMDB), or extend the code we've got to sync to disk?

@bmarzins
Member

> > The question is where to process it: in the client library or in the daemon itself. The downside to processing it in the daemon is that we need to extend the SID client/server API whenever we add new commands or options to process the data differently.
> > If the client just works from a dump of the whole kv store (or in the future possibly a partial dump) the client/server API doesn't need to change if the client adds new ways to process the dump.
>
> ...I'm more inclined to processing on the daemon side and supporting that query format in the request. If the query format is powerful and generic enough, I don't expect many changes in the API itself.

Sure. I'll do that then.

@prajnoha
Member

> What are we thinking for the KV store moving forward? Are we going to consider moving to something like Lightning DB (LMDB), or extend the code we've got to sync to disk?

As for syncing to disk, I think we should be able to provide that functionality even with the existing db backend without any major problems - in the manner of the dump command. But here, instead of sending the buffer/memfd over to the client, we'd copy the buffer content to disk. The buffer_write already calls sendfile to copy the memfd to a specified fd, so that should be a pretty quick operation.
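
The core of that copy could look roughly like this sketch (the memfd_to_file name is made up; the real buffer_write path differs, but the sendfile-driven loop is the idea):

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>     /* memfd_create() for the caller */
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Flush len bytes of a memfd-backed buffer to out_fd (e.g. a file on
 * disk) via sendfile, avoiding a round trip through userspace. */
static int memfd_to_file(int mfd, int out_fd, size_t len)
{
	off_t off = 0;

	while (len > 0) {
		ssize_t n = sendfile(out_fd, mfd, &off, len);

		if (n <= 0)
			return -1;
		len -= (size_t) n;
	}
	return 0;
}
```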

A more efficient alternative (if we wanted to avoid even the extra copy between the memfd and the file) would be for the buffer code to support writing to a file directly, as a new type of buffer backend using an mmap-ed file (instead of anonymous memory as with memfd). The code would then be almost the same as the "dump" code, only changing the buffer backend to that mmap-ed file and skipping the part where the "dump" code sends the memfd to the client.

Then we'd add the part which reads the file back and reconstructs the db out of it.
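
A minimal sketch of such a file-backed buffer backend (the function name is hypothetical, not SID's buffer API): the same write path as the memfd buffer, but the backing memory is an mmap-ed file, so whatever the dump code writes lands on disk without an extra copy.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a buffer whose backing store is an mmap-ed file at path.
 * Writes through the returned pointer are persisted on msync/munmap. */
static void *buffer_create_file_backed(const char *path, size_t size, int *fd_out)
{
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t) size) < 0) {
		close(fd);
		return NULL;
	}

	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (mem == MAP_FAILED) {
		close(fd);
		return NULL;
	}
	*fd_out = fd;
	return mem;
}
```

Reconstructing the db would then be the reverse: mmap the file read-only and walk the serialized records.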

However, we need to take care that certain records/properties are not persistent over reboots, like major:minor. So we need to use persistent identifiers whenever we reference devices in the db, which we don't do at the moment - we're using exactly that major:minor...

That's actually something I'm working on right now, where I'm trying to generate a persistent unique identifier for a device ("unique" for the purpose of SID) which would allow us to:

  • Define device aliases (same device can be identified by more than one identifier).
  • Make the records persistent.
  • Define triggers on sets of devices in persistent way.

(That also includes the logic to match the persistent identifier with the actual device with major:minor as seen on the system.)

What I'm more concerned about is the ability to do efficient selection/querying on the database, where our current db backend falls short - it's simply a hash. We'd need to be able to select subsets of records, ranges. For that, we need a different backend than the hash (or at least to extend the existing hash heavily). Whether we do it on our own or reuse an existing database is a question of whether we can find a database that suits our needs without restricting us more than it helps. It's true that LMDB comes very close to this, but I haven't tried it in enough depth to say it's the one we'd use for sure. It's still an open question...
