-
Notifications
You must be signed in to change notification settings - Fork 110
Improve consistency of timestamp usage #690
Conversation
+1 Great & fast response - thanks @kozbo & @david4096! |
84e5cce
to
b9effec
Compare
b1fdbb7
to
0729a4b
Compare
Are timestamps necessary for Features? How would they differ from the FeatureSet timestamps? Similarly, do individual variants need timestamps? What is the difference between the timestamps in Datasets and those in FeatureSets? |
No, but they are always optional. The BRCA Exchange has recently put in some effort to exploring how to satisfy the FDA revision requirements for data used for medical purposes. The idea is that the date accessed becomes important when the annotations are updated so often. I think that this timestamp model fits the most basic use case of a client being able to determine if a variant has changed from last access. This does not completely satisfy the needs of reliable replication, but it is a step in the right direction. The reference server's implementation would not have variants modified without modifying a variant set, but the BRCA Exchange's implementation might. The same could be true of GENCODE feature sets, wherein I will be able to see if any new featuresets have been added newer than the ones I have already downloaded. An implementor doesn't need to return timestamps, but it needs to be a possible field. I would assume that if a nested document didn't have a timestamp that it inherited from its ancestor. Perhaps stating this clearly in the documentation would be helpful. I can't construe a situation where the reference server ought to return |
Exactly. We cannot force complex data scenarios, but have to enable their implementation. |
OK. Thanks. I would be careful with the its always optional argument. Unnecessary or poorly defined fields could become a pain to maintain and also affect usability. Adding fields can be easier than deleting them. |
The comments for timestamps need to be rewritten to define what the timestamp is for. The current comments are too vague. Does creation time refer to when the message was added to the server? Why would anyone care when the data was loaded? If not, is there existing data with timestamps or are we being wishful? Why do all the comments say record? Should this be message? The current server uses very few, if any, of the timestamps. Some are just set to the current time. The input methods seem to be an afterthought. Are timestamps required in any cases? Has there been any discussion about enforcing those? If not, why is this a field and not just metadata? These fields might optional in the message. In the server, they will take up disk space. Do we want to add them to the server if no one is using them? |
Record seems to apply to biosample or individual. I don't think it applies to other things, like datasets. It should be more semantically clear. |
We are proposing to put created and updated into a metadata message. That way, any changes are just in one place, rather than in every message. Additionally, the description needs to be better. I propose: These fields describe the data. Any dates associated with determining what data was available on a particular server at a particular time would be separate. |
@ejacox The specification "underlying data" is a good change, to separate from server timestamps. Apart from that, any record should have these as optional; however, the API may be designed as to evaluate records for their overall latest change (but that wouldn't make sense for creation ... anyway confuses things, and I don't really see the problem?). |
Modification: Created, Updated don't have to be part of Variant (but Variantset, Callset). The comments were not specific enough to make this clear. |
+1 to @ejacox that create/update should refer to the data. My only remaining question is if there really a practical differentiation between created and updated at the data level. I mean if the analysis/alignment/etc is rerun if that really an update or is it just all new data? But, (to argue with myself) in the case of phenotype data, or sample attribute collection, I could see the need to differentiate between creation and update (eg we mislabeled all the samples and needed to correct the error). So I'd say to be consistent across the api keep both everywhere and make them optional. I think it's important to keep in mind that the schema is only defining what should come back in the message and not how the data is stored. |
Timestamps are available across the API and this PR replaces the mixture of timestamp implementations by settling on ISO 8601. Created and updated fields were added to the dataset, feature, and feature set messages. Field names in assay metadata were renamed to be consistent with the remainder of the data model. Closes ga4gh#682 and closes ga4gh#685 Found mention of timestamp in intro.rst and replaced with newer definition
Fix dataset comment
rename VariantSet.metadata to VariantSet.variant_set_metadata
I've placed the The VariantSet.metadata message was renamed to VariantSet.variant_set_metadata because of the name collision. |
+1 |
This PR is on hold while we review how best to approach timestamps in the API. #839 |
Timestamps are available across the API and this PR replaces the mixture of timestamp implementations by settling on ISO 8601. Created and updated fields were added to the dataset, feature, and feature set messages. Field names in assay metadata were renamed to be consistent with the remainder of the data model. Closes #682 and closes #685 and closes #690
Found mention of timestamp in intro.rst and replaced with newer definition. Thanks @mbaudis