Improve consistency of timestamp usage #690

david4096 · 2016-08-17T17:24:57Z

Timestamps are available across the API and this PR replaces the mixture of timestamp implementations by settling on ISO 8601. Created and updated fields were added to the dataset, feature, and feature set messages. Field names in assay metadata were renamed to be consistent with the remainder of the data model. Closes #682 and closes #685 and closes #690

Found mention of timestamp in intro.rst and replaced with newer definition. Thanks @mbaudis

mbaudis · 2016-08-17T18:47:19Z

+1

Great & fast response - thanks @kozbo & @david4096!

ejacox · 2017-01-12T17:07:23Z

Are timestamps necessary for Features? How would they differ from the FeatureSet timestamps?

Similarly, do individual variants need timestamps?

What is the difference between the timestamps in Datasets and those in FeatureSets?

david4096 · 2017-01-12T17:31:14Z

No, but they are always optional.

The BRCA Exchange has recently put some effort into exploring how to satisfy the FDA revision requirements for data used for medical purposes. The idea is that the date accessed becomes important when the annotations are updated so often. I think this timestamp model fits the most basic use case: a client being able to determine whether a variant has changed since its last access.

This does not completely satisfy the needs of reliable replication, but it is a step in the right direction.
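As a rough illustration of that basic use case, here is a minimal Python sketch of a client comparing a record's `updated` value against its own last access time. The dictionary layout, the `changed_since` helper, and the record ids are hypothetical and not taken from the schema; the sketch only assumes ISO 8601 strings in the `YYYY-MM-DDTHH:MM:SSZ` form.

```python
from datetime import datetime, timezone


def parse_iso8601(text):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp into an aware UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def changed_since(record, last_access):
    """Return True if the record reports an update after last_access.

    Returns None when the record carries no 'updated' value: the field is
    optional, so its absence says nothing either way.
    """
    updated = record.get("updated")
    if not updated:
        return None
    return parse_iso8601(updated) > parse_iso8601(last_access)


# Hypothetical records, for illustration only.
variant = {"id": "hypothetical-variant-1", "updated": "2017-01-10T08:00:00Z"}
changed_after_old_access = changed_since(variant, "2016-12-22T21:10:00Z")
changed_after_new_access = changed_since(variant, "2017-01-12T17:31:14Z")
no_information = changed_since({"id": "hypothetical-variant-2"}, "2017-01-12T17:31:14Z")
```

Returning `None` for a missing timestamp keeps "unchanged" and "unknown" apart, which matters if the field stays optional.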

The reference server's implementation would not modify variants without modifying a variant set, but the BRCA Exchange's implementation might. The same could be true of GENCODE feature sets, where I would be able to see whether any feature sets newer than the ones I have already downloaded have been added.

An implementor doesn't need to return timestamps, but it needs to be a possible field. I would assume that a nested document without a timestamp inherits it from its ancestor. Perhaps stating this clearly in the documentation would be helpful. I can't think of a situation where the reference server ought to return created timestamps for feature messages, but I can imagine other implementations that might.
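The inheritance assumption could be sketched roughly as follows in Python. This is only one possible reading, not documented behaviour: the `effective_timestamp` helper and the dictionary-based messages are hypothetical, and the caller is assumed to supply the chain of ancestors from most nested to outermost.

```python
def effective_timestamp(chain, field):
    """Return the first value of `field` found while walking from the most
    nested message outward through its ancestors, or None if none set it."""
    for message in chain:
        value = message.get(field)
        if value:
            return value
    return None


# Hypothetical messages: only the outer ones carry timestamps.
dataset = {"id": "ds", "created": "2016-08-17T17:24:57Z"}
feature_set = {"id": "fs", "updated": "2017-01-10T08:00:00Z"}
feature = {"id": "f"}

chain = [feature, feature_set, dataset]
feature_created = effective_timestamp(chain, "created")
feature_updated = effective_timestamp(chain, "updated")
```

Here the feature has no timestamps of its own, so `created` resolves to the dataset's value and `updated` to the feature set's.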

mbaudis · 2017-01-12T19:06:01Z

An implementor doesn't need to return timestamps ... needs to be a possible field

Exactly. We cannot force complex data scenarios, but have to enable their implementation.

ejacox · 2017-01-13T07:25:51Z

OK. Thanks. I would be careful with the "it's always optional" argument. Unnecessary or poorly defined fields could become a pain to maintain and also affect usability. Adding fields can be easier than deleting them.

ejacox · 2017-01-13T15:11:16Z

The comments for timestamps need to be rewritten to define what the timestamp is for. The current comments are too vague. Does creation time refer to when the message was added to the server? Why would anyone care when the data was loaded? If not, is there existing data with timestamps or are we being wishful? Why do all the comments say record? Should this be message?

The current server uses very few, if any, of the timestamps. Some are just set to the current time. The input methods seem to be an afterthought. Are timestamps required in any cases? Has there been any discussion about enforcing those? If not, why is this a field and not just metadata?

These fields might be optional in the message, but in the server they will take up disk space. Do we want to add them to the server if no one is using them?

ejacox · 2017-01-13T15:23:01Z

Record seems to apply to biosample or individual. I don't think it applies to other things, like datasets. It should be more semantically clear.

ejacox · 2017-01-25T19:10:30Z

We are proposing to put created and updated into a metadata message. That way, any changes are just in one place, rather than in every message. Additionally, the description needs to be better. I propose:
created = "The time the underlying data was produced (i.e. when the experiment or analysis was run)."
updated = "The time the underlying data changed. This time is independent of when the data was loaded."

These fields describe the data. Any dates associated with determining what data was available on a particular server at a particular time would be separate.
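To make the proposal concrete, here is a minimal Python sketch of what a shared metadata message might look like. It is not the actual schema definition: the `Metadata` and `Dataset` class names and their fields are assumptions for illustration, with both timestamps optional and documented using the descriptions proposed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metadata:
    """Hypothetical shared message holding timestamps about the underlying data."""

    # The time the underlying data was produced (ISO 8601 string), if known.
    created: Optional[str] = None
    # The time the underlying data changed, independent of when it was loaded.
    updated: Optional[str] = None


@dataclass
class Dataset:
    """Hypothetical message embedding the shared metadata in one place."""

    id: str
    name: str
    metadata: Optional[Metadata] = None


without_timestamps = Dataset(id="ds-1", name="example")
with_timestamps = Dataset(
    id="ds-2",
    name="example",
    metadata=Metadata(created="2016-08-17T17:24:57Z"),
)
```

With this shape, a change to the timestamp definitions touches only `Metadata`, and a server that has nothing to report simply leaves the field unset.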

mbaudis · 2017-01-25T20:07:29Z

@ejacox Specifying "underlying data" is a good change, as it separates these from server timestamps. Apart from that, any record should have these as optional; however, the API may be designed so as to evaluate records for their overall latest change (but that wouldn't make sense for creation ... anyway, it confuses things, and I don't really see the problem?).

mbaudis · 2017-01-25T20:13:52Z

Modification: Created and Updated don't have to be part of Variant (but should be part of VariantSet and CallSet). The comments were not specific enough to make this clear.

andrewjesaitis · 2017-01-26T23:58:52Z

+1 to @ejacox that create/update should refer to the data.
+1 to @mbaudis that they should be optional

My only remaining question is whether there is really a practical distinction between created and updated at the data level. I mean, if the analysis/alignment/etc. is rerun, is that really an update or is it just all new data? But (to argue with myself) in the case of phenotype data or sample attribute collection, I could see the need to differentiate between creation and update (e.g. we mislabeled all the samples and needed to correct the error). So I'd say, to be consistent across the API, keep both everywhere and make them optional.

I think it's important to keep in mind that the schema is only defining what should come back in the message and not how the data is stored.

david4096 · 2017-02-01T21:34:56Z

I've placed the created and updated fields in their own message in common.proto and used that message throughout the API.

The VariantSet.metadata message was renamed to VariantSet.variant_set_metadata because of the name collision.

ejacox · 2017-02-02T19:58:35Z

+1

david4096 · 2017-03-06T21:10:39Z

This PR is on hold while we review how best to approach timestamps in the API. #839

david4096 added the in progress label Aug 17, 2016

mbaudis mentioned this pull request Aug 17, 2016

message_create_time inconsistent and ambiguous in Experiment #637

Closed

david4096 force-pushed the timestamps branch 2 times, most recently from 84e5cce to b9effec Compare August 17, 2016 20:47

david4096 added the ready label Oct 26, 2016

This was referenced Nov 21, 2016

Extend dataset record structure to include fields used by Beacon #626

Open

Choose consistent time format #568

Open

This was referenced Dec 7, 2016

Update tests for timestamps ga4gh/compliance#230

Open

Timestamp fixes ga4gh/ga4gh-server#1488

Open

david4096 modified the milestone: v0.6.0a9 Dec 7, 2016

kozbo modified the milestones: 2016-02, Formerly - v0.6.0a9 Dec 22, 2016

david4096 force-pushed the timestamps branch from b1fdbb7 to 0729a4b Compare December 22, 2016 21:10

david4096 mentioned this pull request Jan 3, 2017

added timestamps to metadata #765

Closed

mbaudis mentioned this pull request Jan 9, 2017

time format fixes #729

Closed

david4096 force-pushed the timestamps branch from 0729a4b to 6129dff Compare January 10, 2017 08:00

david4096 mentioned this pull request Jan 25, 2017

Rich type representation in Attributes #700

Merged

david4096 added 3 commits February 1, 2017 11:11

Add timestamps to RNA, reference, reads

0a9e8eb

Fix dataset comment

Move to metadata field

531ca1e

Place created/updated behind metadata key

b02c218

rename VariantSet.metadata to VariantSet.variant_set_metadata

david4096 force-pushed the timestamps branch from a2f5f8d to b02c218 Compare February 1, 2017 21:33

Change doc

a85d7b5

david4096 mentioned this pull request Feb 6, 2017

Integrating external identifiers #807

Merged

Merge remote-tracking branch 'upstream/master' into timestamps

6e3f6b5

david4096 force-pushed the timestamps branch from 15456c7 to 6e3f6b5 Compare February 6, 2017 19:39

kozbo modified the milestones: 2017-01 v0.6.0 stable release, 2017-00 v0.6.0a10 Feb 25, 2017

andrewjesaitis mentioned this pull request Mar 6, 2017

Remove created and updated timestamps from API #839

Open

kozbo removed this from the 2017-01 v0.6.0 stable release milestone Mar 8, 2017

reece closed this Oct 28, 2022
