Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-30124: Use native timestamp comparison for ingest_date #544

Merged
merged 4 commits into from Jul 5, 2021

Conversation

andy-slac
Copy link
Contributor

Do not convert ingest_date to nanoseconds, instead use native time
functions for all operations. This requires some time literals to be
converted from astropy Time to datetime. The change implies that we
cannot use ingest_date with other TAI-based timestamp columns in the
same sub-expression. If we need that feature then ingest_date needs to
be migrated to TAI nanoseconds.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

dt = value.to_datetime()
return ScalarWhereClauseConverter.fromLiteral(dt)
else:
return ScalarWhereClauseConverter.fromLiteral(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach you've taken here is fine, but I'll mention another one here in case you like it better:

  • Make a new TimeLiteralWhereClauseConverter subclass for time literals that only stores the str from the expression, and does not coerce it to either astropy.time.Time, but still declares its dtype to be astropy.time.Time.
  • Remove Time from the registerBinary calls for other comparison operators, and instead register Time-only ones that delegate to a func that coerces those literals according to the other operand.

I think that generalizes better to other kinds of context-dependent coercion, though of course it'd be nice if we didn't need any more of that.

Copy link
Contributor Author

@andy-slac andy-slac Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TallJimbo, thanks for suggestion! I have decided to simplify things based on that idea and remove two-pass tree analysis in favor of doing what you suggest based on type of operands in comparison operators. This greatly simplifies the whole thing but with a caveat of not supporting some complex expressions of a kind ingest_date < MAX(time_literal1, time_literal2) but we do not currently support this syntax anyways. I think we'll be OK with this for now, and if we need to extend our language support for more complex expressions we'll need to rethink how the whole conversion works (or do something with Time/datetime duality).

One thing that I noticed is that we do not handle timestamps in IN operator, even though documentation says it should work. I'll open a new ticket for that.

Could you look at it once again, should be shorter commit this time.

@andy-slac andy-slac force-pushed the tickets/DM-30124 branch 3 times, most recently from 8030f0a to 93e94df Compare July 1, 2021 20:38
@timj
Copy link
Member

timj commented Jul 1, 2021

Can we have a test where we put a dataset and then use ingest date in a queryDatasets call constrained to the now and 10 seconds in the past. If everything works we will get the dataset back, if we have the 37s offset we won't. Also check that this works with bind since that's what is used for oods.

@andy-slac
Copy link
Contributor Author

@timj, I'll try to make this sort of test, need to remember how to make a working registry for this test.

@timj
Copy link
Member

timj commented Jul 1, 2021

@andy-slac I think you can probably add it in test_butler.py immediately after one of the butler.put calls in there.

@andy-slac
Copy link
Contributor Author

Looks like registry test class already has a testIngestTimeQuery() method, I'll extend that instead.

@timj
Copy link
Member

timj commented Jul 1, 2021

None of those tests do a butler.put but I think loading an external YAML always resets the ingest date so I think they would all be written with the current UTC (also would be great if bind was used in one of them).

@andy-slac
Copy link
Contributor Author

I think we always use current timestamp in UTC for ingest_date, butler.put() has no control over that, it just calls registry.insertDataset() so the result is exactly the same. Actual value stored is determined by database assigning default value to a column (CURRENT_TIMESTAMP for sqlite or NOW() for Postgres). For Postgres testing we need clocks synchronized of course.

Do not convert ingest_date to nanoseconds, instead use native time
functions for all operations. This requires some time literals to be
converted from astropy Time to datetime. The change implies that we
cannot use ingest_date with other TAI-based timestamp columns in the
same sub-expression. If we need that feature then ingest_date needs to
be migrated to TAI nanoseconds.
@andy-slac
Copy link
Contributor Author

I have extended unit test for ingest date but now have to run it in Postgres which means that I have to recall how to build everything on my desktop.

@timj
Copy link
Member

timj commented Jul 1, 2021

I'm pretty sure the github action runs the postgres unittests.

@andy-slac
Copy link
Contributor Author

Indeed, even better. Tests are all successful.

@andy-slac andy-slac requested a review from TallJimbo July 1, 2021 23:11
Co-authored-by: Jim Bosch <jbosch@astro.princeton.edu>
@andy-slac andy-slac merged commit 900adb8 into master Jul 5, 2021
@andy-slac andy-slac deleted the tickets/DM-30124 branch July 5, 2021 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants