`length` field in core #79
On the one hand, I dislike the idea because it is redundant with the data file, and that means it could be incorrect. On the other, we already have sample indexes for various positions, so there are already possible consistency issues, of course.

But I'm thinking of a simple recorder that creates a metadata file, starts a thing that streams samples to disk, then exits (or does not have the ability to ask the data-streamer when it is done, or the whole process/machine is killed abruptly, such as by the battery running out in an 'embedded' recorder). This would be unable to know what the final length is. In the streaming-SigMF applications, presumably length could be required for that purpose in particular, but not necessarily in all recordings. Does the additional case you noted also need length to be a required field in all recordings, or could it be optional?
@kpreid - You have a good point regarding redundancy with the data file. I also have an intense distaste for duplicated data, as it inevitably leads to errors. The one issue I see with not having it, though, is in the metadata-only distribution case, where you don't have access to the datafile. That said, if we put it in

And great point regarding Writers that don't have the ability to ask the data-streamer when it's done. When this happens, though, wouldn't some other process need to come back around later and clean up the metadata to make it a compliant recording? For example, it may need to concatenate the written

That said, if the above is too burdensome, I think making it optional is a good midpoint, allowing Writers to remain very basic if they need to be, while also providing the extra utility for those applications that can handle it. Thanks for your continued input and feedback, @kpreid!
If the recorder is not doing any annotation (because it is just a signal recorder with no smarts) and there is no required length field, then it only needs to write one capture segment and other metadata which can all be known at the start. (I am tangentially reminded of the issue of clock skew between RF sample clock and wall clock, which is a reason one would want to have an end timestamp for long recordings. But that should be optional, is not the same as a sample length value, and is another issue entirely.)

The use case I am imagining is like: "Hi, I am a black-box recorder who was turned on at [GPS position and timestamp], here is what I recorded since that instant." Such devices could be useful for a situation where you want to record RF in a large area from multiple points. (Drone competition? Foxhunt? Wildlife tracking? I dunno, but it seems like something to do.)

(Oh, another slight complication to going back to fix stuff later: writing until your disk is literally full. It would be easy, as such things go, to reserve a block, but it's still more things to get right.) Certainly a cleanup-later tool could add lengths, but it makes the process less "just drag the files off the SD card" and creates an extra opportunity for Technically Invalid But The Information Is All There files.
@kpreid - Great point regarding a Writer just being a recorder. Also, disk space. Okay, let's talk about

While it would be more convenient in many cases (including mine) to have it in both

In #60, @drewlio originally requested a

@kpreid - What are your thoughts? I have a feeling I know what your opinion will be, but guessing is for when you can't get canonical answers.
Assuming capture segments are still intended to be a partition of the dataset (exactly one capture segment applies to any sample), then the least-redundant schema is to have a global length and capture segments only have start points.
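For concreteness, here is a minimal sketch of what that least-redundant layout could look like, written as a Python dict mirroring the JSON metadata. The field names (`core:sample_count`, `sample_start`, `core:datatype`) follow the snippets later in this thread and are assumptions for illustration, not spec text:

```python
# Illustrative only: one global length, per-capture start points, and no
# per-capture length. Field names and the example datatype are assumed.
md = {
    "global": {
        "core:datatype": "cf32_le",      # assumed example datatype
        "core:sample_count": 1_000_000,  # total samples in the dataset
    },
    "captures": [
        {"sample_start": 0},        # first capture covers [0, 400000)
        {"sample_start": 400_000},  # second capture covers [400000, 1000000)
    ],
}

# The partition property: capture n spans from captures[n]["sample_start"]
# up to captures[n+1]["sample_start"], with the last capture ending at
# global "core:sample_count".
```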
Love it.
In principle I agree with @kpreid's argument that we should minimize redundancy and the chances for error, but also in purely practical terms: getting the length of the `n`th capture becomes

```python
if n == len(md['captures']) - 1:
    len_nth_capture = md['global']['core:sample_count'] - md['captures'][n]['sample_start']
else:
    len_nth_capture = md['captures'][n + 1]['sample_start'] - md['captures'][n]['sample_start']
```

compared to, if each capture segment carried its own sample count,

```python
len_nth_capture = md['captures'][n]['sample_count']
```

So we pay for reducing redundancy in the spec with seriously complicating each instance of what will be a very common computation.
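As a side note, the branching above can be wrapped in a small helper so the cost is paid in one place, though it still depends on the global count being present. A sketch under the same assumed field names:

```python
def capture_length(md: dict, n: int) -> int:
    """Length in samples of the n-th capture segment, derived from the
    per-capture start indices and the global sample count (same assumed
    field names as the snippets above)."""
    captures = md['captures']
    if n == len(captures) - 1:
        end = md['global']['core:sample_count']
    else:
        end = captures[n + 1]['sample_start']
    return end - captures[n]['sample_start']
```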
I have added a
So, over in #60, a number of people have commented on the need for a `length` field. I just hit a case in my own SigMF work, completely independent of streaming, where having a `length` field would make things significantly easier.

We decided not to include a `length` field in the `captures` segment, originally, because we thought it would complicate streaming `captures` segments to-disk. Some thoughts after working with our current design for a bit:

- I don't think a `length` field in each `captures` segment would actually impede writing the segment to disk. If a Writer application realizes it needs to write a new capture segment, it can add a `length` field to the current segment before proceeding on to writing the next segment. Since, pragmatically, the `captures` and `annotations` arrays have to be buffered / written independently anyway to support streaming, I don't think this causes any additional seeking, really.
- I also don't see a problem with a `length` field in the `global` object. We've already made decisions that would require modifying a metadata file if a data file is split, so I don't think this causes any more burden than already exists.

You could argue that it's up to the Reader application to grab the global length from the datatype and the size of the file, but (a) that's not possible for metadata-only distributions (per #76), and (b) it would only cover the `global` field, not the `captures` field.

Thoughts on the above? Anyone agree? Anyone disagree? What problems exist that I'm not thinking about?
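As a rough illustration of point (a): deriving the global length from the data file is only a few lines for a Reader that actually has the file, which is exactly what a metadata-only distribution lacks. A sketch, assuming a handful of example datatype strings and their sample sizes (the mapping below is illustrative, not a complete datatype list):

```python
import os

# Illustrative mapping from datatype string to bytes per sample; a real
# Reader would need to cover the full set of datatypes in the spec.
BYTES_PER_SAMPLE = {
    'cf32_le': 8,  # complex float32: 2 * 4 bytes
    'ci16_le': 4,  # complex int16:   2 * 2 bytes
    'rf32_le': 4,  # real float32
}

def global_sample_count(data_path: str, datatype: str) -> int:
    """Infer the total sample count from the size of the data file."""
    return os.path.getsize(data_path) // BYTES_PER_SAMPLE[datatype]
```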