
Interval type improvements #1067

Merged
merged 31 commits into from
Feb 16, 2022

Conversation

@mathemancer (Contributor) commented Feb 14, 2022

Fixes #430

This adds a custom interval type at the SQLAlchemy level that maps to PostgreSQL's default INTERVAL type. We can now also accept precision and fields arguments when creating or altering columns of the INTERVAL type.

Technical details

The precision type option takes an integer from 1 to 6: the number of fractional digits kept in the seconds field. The fields type option takes a string that defines which fields the interval stores. Acceptable strings are:

YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
YEAR TO MONTH
DAY TO HOUR
DAY TO MINUTE
DAY TO SECOND
HOUR TO MINUTE
HOUR TO SECOND
MINUTE TO SECOND

If both precision and fields are specified, fields must include SECOND (i.e., SECOND, DAY TO SECOND, HOUR TO SECOND, or MINUTE TO SECOND), since precision applies only to the seconds field.
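A minimal sketch of how these rules could be checked, and what the resulting PostgreSQL type string looks like. The function name interval_type_sql and its error messages are hypothetical, not the code in this PR; the sketch follows the 1-6 precision range stated above (PostgreSQL itself also accepts 0).

```python
from typing import Optional

# The fields strings accepted by the type option (see the list above).
VALID_FIELDS = {
    "YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND",
    "YEAR TO MONTH", "DAY TO HOUR", "DAY TO MINUTE", "DAY TO SECOND",
    "HOUR TO MINUTE", "HOUR TO SECOND", "MINUTE TO SECOND",
}


def interval_type_sql(precision: Optional[int] = None, fields: Optional[str] = None) -> str:
    """Hypothetical helper: validate the options and render an INTERVAL type string."""
    if fields is not None:
        fields = " ".join(fields.upper().split())  # normalize case and whitespace
        if fields not in VALID_FIELDS:
            raise ValueError(f'fields "{fields}" is not in {sorted(VALID_FIELDS)}')
    if precision is not None:
        if not isinstance(precision, int) or not 1 <= precision <= 6:
            raise ValueError("precision must be an integer from 1 to 6")
        # Precision only applies to seconds, so fields (if given) must end in SECOND.
        if fields is not None and not fields.endswith("SECOND"):
            raise ValueError("precision requires fields that include SECOND")
    type_sql = "INTERVAL"
    if fields is not None:
        type_sql += f" {fields}"
    if precision is not None:
        type_sql += f"({precision})"
    return type_sql


print(interval_type_sql(precision=3, fields="day to second"))  # INTERVAL DAY TO SECOND(3)
```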

The custom type was originally introduced to keep PostgreSQL INTERVAL values from being converted into Python timedeltas, since that conversion is lossy. It also standardizes the output, and some aspects of the input, for intervals. Future PRs will standardize the output of the other time-related types in the same way.

For reference, we'll use the ISO 8601 format as profiled by RFC 3339, both for standardized output and as an input format that the DB layer always accepts. We will also attempt to parse any other input string using PostgreSQL's default date/time/duration parsing. For intervals, the standardized strings look like:

f"P{years}Y{months}M{days}DT{hours}H{minutes}M{seconds}S"
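PostgreSQL stores an interval internally as three integers: months, days, and microseconds. A hypothetical formatter that fills the template above from those three parts might look like the following. It is a sketch, not the PR's implementation, and it assumes all components are non-negative.

```python
def format_interval(months: int, days: int, microseconds: int) -> str:
    """Render P{Y}Y{M}M{D}DT{H}H{M}M{S}S, always emitting every unit (hypothetical)."""
    # Sketch assumption: negative intervals are out of scope.
    if min(months, days, microseconds) < 0:
        raise ValueError("this sketch only handles non-negative intervals")
    years, months = divmod(months, 12)                 # months carry into years
    hours, rem = divmod(microseconds, 3_600_000_000)   # hours never carry into days
    minutes, rem = divmod(rem, 60_000_000)
    whole, frac = divmod(rem, 1_000_000)
    # Seconds are the only unit that may have a fractional part.
    seconds = f"{whole}.{frac:06d}".rstrip("0") if frac else str(whole)
    return f"P{years}Y{months}M{days}DT{hours}H{minutes}M{seconds}S"


print(format_interval(14, 37, 5_400_500_000))  # P1Y2M37DT1H30M0.5S
print(format_interval(0, 0, 0))                # P0Y0M0DT0H0M0S
```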

Each variable is an integer, except that seconds may have a fractional part in output. In input, decimals are allowed for any unit and are converted appropriately.

Inputs also "carry over" when possible. Seconds and minutes aggregate into hours, but hours don't aggregate into days, since days around DST changes can have a different number of hours. Likewise, days don't aggregate into months (months vary in length), but months do aggregate into years.

In output, missing units (e.g., zero minutes) are written as explicit zeroes, so the client can count on every part being present in the returned string. Input doesn't need every part, but you must include the T separator between the date and time sections if you include any time values.
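For the input side, here is a hypothetical parser that returns PostgreSQL's (months, days, microseconds) triple. Fractional units spill into smaller units the way the PostgreSQL documentation describes: fractional years round to whole months, fractional months become days at 30 days per month, and fractional days become time at 24 hours per day. Weeks and negative values are left out to keep it short; this is a sketch, not the parsing code in this PR.

```python
import re
from decimal import Decimal
from typing import Tuple

_DURATION_RE = re.compile(
    r"P(?:(?P<years>\d+(?:\.\d+)?)Y)?"
    r"(?:(?P<months>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<days>\d+(?:\.\d+)?)D)?"
    r"(?:T(?:(?P<hours>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<minutes>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_interval(text: str) -> Tuple[int, int, int]:
    """Parse an ISO 8601 duration into (months, days, microseconds) (hypothetical)."""
    text = text.strip()
    match = _DURATION_RE.fullmatch(text)
    # Reject a bare "P", a dangling "T", and time units given without the T separator.
    if match is None or text.endswith("T") or not any(match.groupdict().values()):
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    p = {k: Decimal(v or "0") for k, v in match.groupdict().items()}

    # Fractional years round to whole months.
    months = int(round(p["years"] * 12)) + int(p["months"])
    # Fractional months spill into days at 30 days per month.
    days = p["days"] + (p["months"] - int(p["months"])) * 30
    whole_days = int(days)
    # Fractional days spill into time; hours are never folded into days.
    seconds = (days - whole_days) * 86_400 + p["hours"] * 3_600 + p["minutes"] * 60 + p["seconds"]
    return months, whole_days, int(seconds * 1_000_000)
```

Since minutes and seconds land in the single microseconds field, "PT90M" is stored as 1 hour 30 minutes, while "P1DT25H" keeps its 25 hours separate from the day.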

As a bonus, this PR also fixes some bugs in the constraints API tests that were preventing the pipeline from passing.

Finally, there

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the master branch of the repository
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

If SQLAlchemy doesn't have this hint, it uses default psycopg2
behavior, which loses information in the case of intervals.
Performance drag, but rarely used.
@codecov-commenter commented Feb 15, 2022

Codecov Report

Merging #1067 (c93359f) into master (a7591cb) will increase coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1067      +/-   ##
==========================================
+ Coverage   92.70%   92.77%   +0.06%     
==========================================
  Files         108      109       +1     
  Lines        3852     3888      +36     
==========================================
+ Hits         3571     3607      +36     
  Misses        281      281              
Flag Coverage Δ
pytest-backend 92.77% <100.00%> (+0.06%) ⬆️

Impacted Files Coverage Δ
db/columns/base.py 92.06% <ø> (ø)
db/columns/operations/select.py 100.00% <100.00%> (ø)
db/types/__init__.py 100.00% <100.00%> (ø)
db/types/exceptions.py 100.00% <100.00%> (ø)
db/types/interval.py 100.00% <100.00%> (ø)
db/types/operations/cast.py 100.00% <100.00%> (ø)
mathesar/api/db/viewsets/columns.py 91.39% <100.00%> (+0.09%) ⬆️
mathesar/api/serializers/columns.py 98.79% <100.00%> (+0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a7591cb...c93359f.

@mathemancer mathemancer marked this pull request as ready for review February 15, 2022 06:39
@mathemancer mathemancer requested a review from a team February 15, 2022 06:40

@kgodey (Contributor) left a comment

Looks good to me overall, I added a couple of comments. @mathemancer, please resolve before merge.

I also think @silentninja should review this since he more recently worked on date and time types.

db/filters/operations/apply.py (outdated, resolved)
db/tests/types/test_interval.py (resolved)
@kgodey kgodey removed their assignment Feb 15, 2022
@kgodey kgodey added the pr-status: review A PR awaiting review label Feb 15, 2022

@silentninja (Contributor) left a comment

Looks good to me. Nice work @mathemancer, I learned a few SQLAlchemy tricks, thanks. Can you add the interval standard as ISO 8601 to the Engineering decision?

@@ -69,19 +70,13 @@ def create(self, request, table_pk=None):
)
else:
raise database_base_api_exceptions.ProgrammingAPIException(e)
except TypeError as e:

Contributor

Why have you removed capturing this exception?

Contributor Author

Ah, good catch. I changed that so I could get more debugging output at some point and forgot to revert the change.

@@ -29,6 +29,7 @@ class TypeOptionSerializer(MathesarErrorMessageMixin, serializers.Serializer):
length = serializers.IntegerField(required=False)
precision = serializers.IntegerField(required=False)
scale = serializers.IntegerField(required=False)
fields = serializers.CharField(required=False)

Contributor

It would be helpful to add this api change to the PR description.

f'fields "{self.impl.fields}" is not in {all_fields}'
)

def column_expression(self, col):

Contributor

I don't have any issues with using column_expression, I am just curious to know if you have thought about setting a local intervalstyle instead.

@mathemancer (Contributor Author), Feb 16, 2022

The problem with that is that we need to keep psycopg2 from interpreting the result as an actual INTERVAL. The column expression casts to TEXT at the DB layer, which psycopg2 then picks up as a Python str. If psycopg2 picks up INTERVALs directly, it represents them as Python timedeltas, and that conversion makes different choices than PostgreSQL does (e.g., treating 1 month as 30 days), so it is inaccurate. This would pass erroneous info to the UI, and would in fact make some values (e.g., 37 days) impossible to view, since they can't be distinguished from others (e.g., 1 month 7 days).
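To illustrate the loss, the sketch below models the approximation psycopg2's interval typecaster applies (as we understand it: each year counted as 365 days and each month as 30 days). The helper name is hypothetical and stands in for psycopg2's C-level conversion; it shows two distinct PostgreSQL intervals collapsing into the same timedelta.

```python
from datetime import timedelta


def psycopg2_style_timedelta(months: int, days: int, microseconds: int) -> timedelta:
    """Hypothetical model of psycopg2's INTERVAL -> timedelta conversion."""
    # PostgreSQL prints 14 months as "1 year 2 mons"; the conversion then
    # approximates each year as 365 days and each month as 30 days (assumption).
    years, months = divmod(months, 12)
    return timedelta(days=years * 365 + months * 30 + days, microseconds=microseconds)


one_month_seven_days = psycopg2_style_timedelta(1, 7, 0)  # '1 mon 7 days'
thirty_seven_days = psycopg2_style_timedelta(0, 37, 0)    # '37 days'
# Two different PostgreSQL values become indistinguishable on the Python side.
print(one_month_seven_days == thirty_seven_days)  # True
```

Casting to TEXT in the column expression sidesteps this, because the (months, days, microseconds) structure is preserved in the string PostgreSQL returns.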

intervalstyle doesn't change the actual return type, so it doesn't solve the problem by itself; we'd still need to cast to text using a column_expression. Given that, I opted to handle the formatting ourselves, since PostgreSQL's iso_8601 style is slightly out of spec and makes a few choices that don't work that well for our use case (IMO). It also keeps the formatting logic fully visible in our code, rather than hidden in the PostgreSQL docs.

Contributor

That makes sense, thanks for the explanation!

@silentninja silentninja added pr-status: revision A PR awaiting follow-up work from its author after review and removed pr-status: review A PR awaiting review labels Feb 16, 2022

@mathemancer (Contributor Author)

Looks good to me. Nice work @mathemancer, I learned a few SQLAlchemy tricks, thanks. Can you add the interval standard as ISO 8601 to the Engineering decision?

I'm currently writing up a wiki page that will cover the spec. It's a little more involved.

Labels: pr-status: revision (A PR awaiting follow-up work from its author after review)
Linked issue: Handle INTERVAL type in the backend
4 participants