Consider ISO 8601 for datetime, which would be much easier to parse and validate across systems #5
Comments
@cmungall Since this is such a far-reaching question on what we have as our consistent datetime format, I'm assigning to you to chime in. Should we stick with the time format we have? Is it inherited from something else like MIxS?
I fully support doing this. There are a number of ISO 8601 formats which can be used. We will also need to be explicit about which version. I believe the latest is ISO 8601-2:2019.
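For a quick illustration of the round trip, Python's standard library produces and parses the common ISO 8601 profile directly (the sample timestamp below is made up):

```python
from datetime import datetime, timezone

# A made-up sample timestamp; isoformat() emits an ISO 8601 string.
dt = datetime(2021, 5, 3, 14, 30, 0, tzinfo=timezone.utc)
iso_string = dt.isoformat()
print(iso_string)  # 2021-05-03T14:30:00+00:00

# Round trip: fromisoformat() parses the same representation back.
parsed = datetime.fromisoformat(iso_string)
assert parsed == dt
```

Because the string sorts lexicographically in chronological order, consumers can also compare and sort these values without parsing them at all.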
After looking at some of the logic of biolinkml and the generated schemas, this might be a broader issue. The schemas for attributes generally contain a […]. It seems to me the intention is that the ETL pipeline should normalize terms like this so that consumers (such as me) don't have to, but perhaps I'm missing something.
We had consensus that ISO 8601 should be used for all datetime values. Further, the nmdc-schema should have a single ISO 8601 conformant string value and no raw value, and the nmdc-schema should regex validate the value. The responsibility for transforming sources to ISO 8601 conformant values lies with the ETL (at least presently in the GOLD-as-source case). There is agreement that transforming date-time should NOT be the responsibility of the search application ingest but should happen upstream. (Ideally, in a post-ETL world, the source would provide conformant values.)
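As a sketch of what the regex validation could look like, the pattern below is a simplified stand-in, not the actual nmdc-schema pattern:

```python
import re

# Simplified pattern for ISO 8601 "YYYY-MM-DDTHH:MM:SS" with optional
# fractional seconds and timezone offset; the real schema pattern may differ.
ISO8601_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"        # calendar date
    r"T\d{2}:\d{2}:\d{2}"        # time of day
    r"(\.\d+)?"                  # optional fractional seconds
    r"(Z|[+-]\d{2}:\d{2})?$"     # optional zone designator
)

def is_iso8601(value: str) -> bool:
    """Return True if value looks like an ISO 8601 datetime string."""
    return bool(ISO8601_RE.match(value))

print(is_iso8601("2021-05-03T14:30:00Z"))  # True
print(is_iso8601("03/05/2021 2:30 PM"))    # False
```

Note that a regex only checks shape, not calendar validity (e.g. it accepts month 13), so a parser-based check is a useful second line of defense.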
The current fields with datetime values (e.g. […]) would need to be converted. As for pushing to the next sprint, I vote "yes".
@jeffbaumes @wdduncan I feel like there is agreement on what is intended, but implementing it in the ETL and then the portal is work to be done. Moved this to the May sprint.
@jeffbaumes I can do this for the GOLD fields that I process. Do you happen to have a list of the fields you need to be converted?
I think it should be validated for any date/datetime field in the schema.
@jbeezley it would help if you had a list of the fields that you were especially concerned with.
This is primarily about validating that the data is correct. According to the schema, datetimes should be provided as xsd:dateTime, which is defined in terms of ISO 8601. If that validation is not occurring for all datetime fields, I would classify it as a bug in the schema code.
@jbeezley I understand that. I have to write a script to transform the values. The only data types I have to work with are strings (i.e., all the data in the GOLD dump are strings). Moreover, many fields are empty (the data can be very sparse). So, I was just wondering if there were certain fields that were causing you problems so that I could prioritize tasks to suit your needs. We already identified […].
My ingest goes to great lengths to recognize a wide array of datetime formats, so I don't have the information myself. I would think if the validation were performed correctly, then getting that information from the errors emitted would be straightforward.
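A minimal sketch of the kind of lenient normalization described above, using only the standard library; the candidate formats shown are illustrative, not the actual list from the ingest:

```python
from datetime import datetime
from typing import Optional

# Hypothetical source formats observed in a dump; extend as new ones appear.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%d-%b-%Y %H:%M",
    "%Y-%m-%d",
]

def normalize_datetime(raw: str) -> Optional[str]:
    """Try each known format in turn; return an ISO 8601 string or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).isoformat()
        except ValueError:
            continue
    return None

print(normalize_datetime("03-Apr-2014 11:22"))  # 2014-04-03T11:22:00
print(normalize_datetime("not a date"))         # None
```

Running this upstream in the ETL, and logging every input that comes back `None`, would surface exactly the non-conformant source values the validation discussion above is asking for.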
@wdduncan the only other one I notice at a glance is the database object's date_created slot. biosample collection_date, extreme_event, fire, and flooding are already annotated to be timestamp values. I'm not sure why the latter three are timestamp values, but I guess that's MIxS? @jbeezley FYI, the issue is upstream of validation because add_date and mod_date currently have assigned ranges of […].
I went through the schema looking for timestamp fields. In MIxS, I found the following fields took dates as values: […]

Of these terms, only […]. As for the data that I am pulling in from GOLD, the only fields I found with datetime values were the […]. I also extract data for the […].
It is now in metadata-translation/src/bin/lib in the nmdc-runtime repo as well. I had it there in my local repository, but it wasn't under version control because of a conflicting […]. You should be all set now. Let me know if anything else is blocking for you.
@ssarrafan this is pretty much finished, but waiting on @dwinston to close the ticket. So, I'll move this to the June sprint.
As the code for this is written, and what remains is for the metadata to be re-processed, I think this issue can be closed for now. It should be re-opened if any datetimes are discovered to be improperly encoded.
Basically, the portal team would like to review with the metadata team the choice of this more difficult-to-parse datetime format. If there are good reasons for this format, we can keep it. If there are standard libraries for parsing these somewhat odd timestamps, we should use them.