Multiple Recordings Extension #151
This is an improvement, and is significantly simplified in that it does not modify the core specification, which I believe we have agreed is not something any extension should do.
My suggestions are as follows:
- Make the `global` `primary` field required (see inline comment).
- I agree, while this field was originally added at my suggestion, I believe it largely serves as a convenient way for users to shoot themselves in the foot.
- We can add something like the following to cover the intent of the `channel` field:
### The `streams` field
The `multirecordings:streams` field is a JSON array
of `recording` strings, as defined by this extension, that indicate multiple
streams of data that were captured as part of the same event.
This field MUST only appear in the `primary` recording's metadata file. The
primary recording must appear first in this list, and the order of the array
can be used to infer channel order.
If we do suggestion 2 above, then the logic to determine which channel a recording belongs to is a little clunky but explicit:
primary_recording = this_file["global"]["multirecordings:primary"]
load primary_recording_metadata
streams = primary_recording_metadata["global"]["multirecordings:streams"]
channel_num = index of this_file in streams
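The lookup above can be sketched in Python as follows. This is only an illustration under assumptions: the `metadata_by_name` dictionary is a hypothetical in-memory stand-in for parsed `.sigmf-meta` files, and the `multirecordings:*` field names follow the proposal in this comment rather than any released specification.

```python
def channel_index(this_file, metadata_by_name):
    """Return the position of `this_file` in its primary recording's streams list."""
    this_global = metadata_by_name[this_file]["global"]
    # Every recording names its primary; the primary names itself.
    primary = this_global["multirecordings:primary"]
    primary_global = metadata_by_name[primary]["global"]
    # Only the primary recording carries the streams array.
    streams = primary_global["multirecordings:streams"]
    return streams.index(this_file)


# Hypothetical stand-in for two parsed metadata files.
metadata_by_name = {
    "example-0": {
        "global": {
            "multirecordings:primary": "example-0",
            "multirecordings:streams": ["example-0", "example-1"],
        }
    },
    "example-1": {"global": {"multirecordings:primary": "example-0"}},
}

print(channel_index("example-1", metadata_by_name))
```

Because the primary appears first in its own `streams` array, it always resolves to channel 0 under this scheme.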
Thanks Ben!
| name | required | type | description |
| ------------- | -------- | --------- | ----------------------------------- |
| `primary` | false | recording | The primary recording this recording is linked to. |
I think this field should be required if using this extension, and if the data is the primary channel, the field holds its own filename. This will enable code implementing multirecordings to always be capable of reliably referencing the primary channel, which may be the only file holding certain metadata.
| name | required | type | description |
| ------------- | -------- | --------- | ----------------------------------- |
| `primary` | false | recording | The primary recording this recording is linked to. |
| `channel` | false | uint | The channel index of this recording. |
| `streams` | false | object | List of SigMF recordings that represent multiple streams. |
Maybe this should be slightly reworded to be more specific:
| `streams` | false | object | List of SigMF `recording` objects that represent multiple streams. |
"example-channel-0",
"example-channel-1",
These two lines should be:
"example-0",
"example-1"
I had misread and misunderstood that part, so my apologies. I do think that is somewhat complicating. Is the primary purpose of this to avoid duplicate copies of annotations and captures? If so, it may be simpler to just add This certainly does not cover every use case, and there are edge cases we might want to address, but what do you think about this general idea?
If I'm understanding this right, you want to bundle recordings together by including metadata about the bundling in the individual recordings. If bundling is a goal, then I think there should be a higher-level metadata object, rather than an extension object, that describes this, leaving the individual recordings unmodified. Then you can remap channels, put together recordings from different sources, etc., without modifying anything.
@jacobagilbert - Yes, exactly. So, if normally you would describe a field, say, geolocation, with a single value (in this case a Geopoint), but instead want to describe it with a recording of Geopoints because it's a time-varying field (i.e., the receiver is moving), how do you go about that? In my mind, that's the toughest problem this PR has to solve. We have discussed three potential options thus far:
@willcode raises an interesting idea... it's actually kind of related to an idea that @dbouquin and I discussed at Hat Creek Observatory a couple of years ago regarding a top-level metadata file showing relationships between recordings. We had never talked about using it for this specific use-case, but now that Jeff calls it out, it does seem super relevant. So, in this context, would the value in the specific recording (again, let's say geolocation) be the "start point", and then in a top-level metadata file, we say "the geolocation field in recording A is described by recording B"?
Been thinking about this. I think @willcode's suggestion is the right path, and it's actually the same core concept as #108 from Daina about having a top-level metadata file that describes relationships between recordings. In addition to other benefits, the fact that each individual Recording remains the same as it would have been otherwise is huge. I'm experimenting a bit with how to make this work; the working name is a "collection". Right now, the working model is that fields from other namespaces that can be described by recordings get added to the "collections" namespace, and in the key/value pair, the value is a [base filename, dataset hash] JSON array. There are a couple of new fields that need to get added, such as "streams" for MIMO systems. (Note: in the above image, in the values of ["string", "hash"], the first string is meant to be a base filename like N.sigmf-[meta|data].) The downside of this is that every field that can be used at the top level must get added to the extension. BUT, if we don't do that, then we have to overwrite the definition of those fields to allow pointing to a Recording as the value for a key, which I think is worse. Better to redefine the key in a new namespace attached to a new value datatype than create confusion in the original namespace, IMO. Posting this to gather thoughts / feedback on the approach.
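To illustrate the [base filename, dataset hash] value described above, here is a minimal Python sketch. It rests on several assumptions that the thread does not settle: SHA-512 is used as the dataset hash, the `collections:` key names are hypothetical, and the byte strings are placeholders standing in for the contents of each `.sigmf-data` file.

```python
import hashlib
import json


def recording_ref(base_name: str, dataset: bytes) -> list:
    """Return a [base filename, dataset hash] pair (SHA-512 is an assumption)."""
    return [base_name, hashlib.sha512(dataset).hexdigest()]


def verify_ref(ref: list, dataset: bytes) -> bool:
    """Check that a dataset still matches the hash stored in its reference."""
    return ref[1] == hashlib.sha512(dataset).hexdigest()


# Placeholder bytes standing in for the contents of each dataset file.
datasets = {
    "latlong": b"\x00\x01",
    "example-channel-0": b"\x02\x03",
    "example-channel-1": b"\x04\x05",
}

# Hypothetical collection object; key names are illustrative only.
collection = {
    "collections:geolocation": recording_ref("latlong", datasets["latlong"]),
    "collections:streams": [
        recording_ref(name, datasets[name])
        for name in ("example-channel-0", "example-channel-1")
    ],
}

print(json.dumps(collection, indent=2))
```

Storing the hash next to the filename lets a reader detect that a referenced recording was swapped or modified without touching the individual recordings' own metadata.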
Certainly from a practical perspective, this seems much more manageable than having individual collects optionally reference each other. Collection seems like a reasonable name.
A parallel application is video/audio editing MLT/EDL
I do believe allowing arbitrary type overloading will be a parsing burden (mostly in libsigmf), and I think that something like this is reasonable. Adding a file to manage simpler, more common use cases (like 2 phase-coherent channels) does seem a bit heavy-handed, but I've been thinking about this and there are drawbacks to all approaches I can come up with, and I do not find this latest suggestion any more onerous than the others.
This is a reasonable idea, but I think we should require an unambiguous way to identify that a file is part of a collection, and to identify and access the other files in that collection given any file that is part of said collection. This could be a required (or possibly optional...) global object which defines the top-level collection file (and the presence of this field indicates it is part of a collection). If the objective is that specific files be part of multiple independent collections, then this won't work (though at that point they are all part of the same collection, right? It's entirely possible I am missing a use case here). Final thought: @willcode, was your suggestion that this be implemented as part of the core spec (and not as an extension)? It sounds like that, but I am not sure.
I actually didn't see this in Jeff's note, but I was thinking about this this morning and I think that's exactly what we should do - especially since it preserves existing backwards compatibility. BUT, where I think this breaks is that we need the fields in their own namespace, rather than
Hmm, I don't have a real opinion about core vs ext for collections, but it's a vertical extension, where "extensions" are horizontal. So does it really fit the concept of an extension?
I also do not think this fits the concept of an extension for the same reason Jeff mentioned. We could implement something like the following structure:
{
    "global":
    {
        "core:collections":
        {
            "geolocation": "geo_data_basefile",
            "hagl": "hagl_data_basefile",
            "streams": ["channel-0", "channel-1", "channel-2", "channel-3"]
        }
    },
    ...?
}
And define the top level collections file as just the appropriate namespace Of course at that point we can probably just put this object directly in the 'primary' collection data file and avoid the need for a top level file altogether (full circle to your idea 3 above that has not been seriously considered). I also think we should be explicit about what values can take on a Do you envision use of this concept be confined to
Responding to specific points from @willcode and @jacobagilbert -
Okay, so, let's say these are our requirements:
Thinking about it, right now we've got three top-level objects: global, captures, annotations. It sort of feels like we are describing a fourth, honestly, but one that is optional. So, if we pull on that thread, the next question is -- how do you specify the fields in that object? Other than top-level objects, all other fields are namespaced, and breaking consistency there seems confusing. Also, I would argue we /have/ to namespace them, because user extensions will certainly add new fields to the collection object under their own namespace. I think what this distills down to is that the main spec now specifies two namespaces, core and collection, and four top-level objects. It defines the fields from core that are part of collection. Users may add new fields under extension namespaces to the collection object. Mock-up file:
{
    "collection": {
        "collection:extensions" : {
            "antenna": "v0.1.0"
        },
        "collection:geolocation": ["latlong", "hash"],
        "collection:hagl": ["hagl", "hash"],
        "collection:azimuth": ["antenna", "hash"],
        "collection:streams": [
            ["example-channel-0", "hash"],
            ["example-channel-1", "hash"]
        ],
        // this one is from a user extension
        "userdef:somefield": ["foobar", "hash"]
    }
}
I think I like this... the only problem I have with it is that calling the top-level object "collection" and the namespace "collection" is confusing. Maybe we change the namespace defined by the primary spec to something else to be less confusing? Otherwise, thoughts overall?
I agree this is effectively a 4th top level object. You raise some very good points, but I am still not sure it is necessary to specify these in a namespace other than User extensions could then also define fields as valid in Another possibility for example.sigmf-collection:
{
    "collection": {
        "core:extensions" : {
            "antenna": "v0.1.0"
        },
        "core:geolocation": ["latlong", "hash"],
        "core:hagl": ["hagl", "hash"],
        "antenna:azimuth": ["antenna_az", "hash"],
        "core:streams": [
            ["example-channel-0", "hash"],
            ["example-channel-1", "hash"]
        ],
        // this one is from a user extension
        "extension:field": ["datafile_for_field", "hash"]
    }
}
I may be missing the point of confusion with doing something like the above, so please let me know if so.
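As a rough illustration of how a reader might consume a collection object keyed this way, the Python sketch below groups the fields by their namespace prefix. The JSON text mirrors the mock-up above with the comment line and the placeholder user-extension field removed so that it parses as strict JSON; the helper name `fields_by_namespace` is hypothetical.

```python
import json
from collections import defaultdict

# Mirrors the mock-up above, minus the comment and the placeholder user field.
COLLECTION_TEXT = """
{
    "collection": {
        "core:extensions": {"antenna": "v0.1.0"},
        "core:geolocation": ["latlong", "hash"],
        "core:hagl": ["hagl", "hash"],
        "antenna:azimuth": ["antenna_az", "hash"],
        "core:streams": [
            ["example-channel-0", "hash"],
            ["example-channel-1", "hash"]
        ]
    }
}
"""


def fields_by_namespace(collection_obj):
    """Group the field names of a collection object by namespace prefix."""
    grouped = defaultdict(list)
    for key in collection_obj:
        namespace, _, field = key.partition(":")
        grouped[namespace].append(field)
    return dict(grouped)


collection = json.loads(COLLECTION_TEXT)["collection"]
grouped = fields_by_namespace(collection)
print(grouped)
```

With namespaced keys, a reader can tell which extension defines each field from the key alone, which matters if the value type differs between the `global` and `collection` objects.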
I don't think you are missing anything -- I think I'm just more worried about having the same "key" name in the "core" extension but with two different datatypes for the "value" conditionalized on whether it's in "global" or "collection". I would love to get input from other folks, especially if it's more people that disagree with me. @n-west, @storborg, @willcode, @mormj, @Teque5? This is really, I think, the final detail before we hammer out the last draft of this, which is itself the last outstanding feature of v1.0.0 😊
Continued in #157.
This PR replaces #99
New draft of the multi-recordings extension based on all of the discussion we have had to date. I have a few issues with the current draft, but I am sharing it for general comment / discussion so we can converge on the best solution.
My issues with it:
- `primary`, `channel`, and `streams` fields. For example, the existence of the `streams` array implies that the current recording is the primary.
- `channel` field is still useful when paired with the `streams` array.
Here's my biggest concern:
The way the previous draft of this extension worked was by specifying exactly which fields in `core` and extension namespaces could have a multirecording link. That was clunky and painful. The new draft allows you to use it in place of any non-string datatype, which I think is cleaner and realistically the right move for future extensions. That means, though, that anytime this extension is included, parsers cannot assume that fields that are normally ints or bools are actually those -- they have to be checked to see if they are strings first. This seems painful to me. You could actually consider this a violation of the "don't modify `core`" rule for extensions, since it enables a new datatype for `core` fields.
I would especially love to hear thoughts on this from the broader group. As always, we're trying to figure out how to strike the right balance between simplicity in the design and simplicity for applications working with those recordings.
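To make the parsing concern concrete, here is a minimal Python sketch of the check a parser would need for every non-string field once this extension is in use. The function name `read_field`, the recording name, and the stand-in loader are all hypothetical; none of this is taken from an existing SigMF library.

```python
def read_field(value, expected_type, load_recording):
    """Return a field's value, following a recording link if one is present.

    A string found where a non-string type is expected is treated as the
    name of a recording that holds the (time-varying) values for the field.
    """
    if isinstance(value, str):
        return load_recording(value)
    # bool is a subclass of int in Python, so compare exact types.
    if type(value) is not expected_type:
        raise TypeError(f"expected {expected_type.__name__}, got {type(value).__name__}")
    return value


def fake_loader(name):
    """Hypothetical stand-in for reading the samples of a linked recording."""
    return [10, 12, 11]


print(read_field(5, int, fake_loader))
print(read_field("gain-recording", int, fake_loader))
```

Every consumer of an int or bool field would need a branch like the first `if`, which is the burden described above.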
Would be great to get comments / thoughts / perspectives, @jacobagilbert , @Teque5 , @storborg , @pwicks86 , @n-west , @gmabey