Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record grouping #315

Merged
merged 21 commits into from
Jul 7, 2021
Merged

Record grouping #315

merged 21 commits into from
Jul 7, 2021

Conversation

eito-fis
Copy link
Contributor

@eito-fis eito-fis commented Jul 1, 2021

Fixes #216

Adds the ability to get group count information from the record list endpoint.

Technical details
Adds a get_group_counts() function to the db module, which takes in a list of fields to group by and returns a dictionary mapping tuples of fields to their counts. Also adds a group_count_by parameter to the record list endpoint, where a user can pass a list of fields to group by and get

'group_count': {
    'group_count_by': [fields_used_for_grouping],
    'results': {dictionary_mapping_fields_to_counts},
}

including in the return data. The keys for results are strings, with composite fields names joined by ,s.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the master branch of the repository
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@eito-fis
Copy link
Contributor Author

eito-fis commented Jul 1, 2021

Question before I open this for review - do we have a set place to handle errors for the functions were exposing in the API? As of now we don't have much explicit error handling except for some of the stuff that's automatically included in sqlalchemy-filters, but it would be nice to check the validity of the format and fields for the group_count_by parameter.

In practice, I think the most principled place for validation would be in the form object that we're using to parse the initial url parameters (This would also remove the awkward error handling in the middle of the list view right now). However, if we want to validate using try-catch patterns then handling things in the db layer seems more appropriate. I'm leaning towards doing the former and re-implementing the validation sqlalchemy-filters does on our end, but would appreciate input on what would be proper.

@kgodey
Copy link
Contributor

kgodey commented Jul 1, 2021

@eito-fis I don't think I have enough information to make an informed recommendation. Questions:

  1. What does validating using try-catch patterns involve, and how does it differ from validation in the form object?
  2. What validation from sqlalchemy-filters would we need to reimplement if we went with the first option?

@eito-fis
Copy link
Contributor Author

eito-fis commented Jul 1, 2021

@kgodey

  1. I guess by try-catch I mean literally catching formatting errors from whatever we're running in the db layers, in contrast to manually checking whether our inputs are valid. Ex: catching a BadFilterFormat error out of sqlalchemy-filters or a ProgrammingError out of psycopg2.
  2. The primary ones are around validating that the input is in the proper format (iterable or dictionary), that each dictionary contains the required fields, and that the field being targeted in the operation actually exists.

Having written this out, re-implementing the validation doesn't seem like a great idea...
I think I'll try and implement group validation in the db layer - in hindsight, having form validation separate from the actual grouping function is a bit weird since there would only be an implicit connection between the two.

@kgodey
Copy link
Contributor

kgodey commented Jul 1, 2021

@eito-fis Sounds like you've answered your own question, that all makes sense to me.

@eito-fis eito-fis marked this pull request as ready for review July 1, 2021 22:36
@eito-fis eito-fis requested review from a team, kgodey and mathemancer and removed request for a team July 1, 2021 22:36
@eito-fis
Copy link
Contributor Author

eito-fis commented Jul 1, 2021

This should be reviewed after #300 is merged.

@eito-fis eito-fis force-pushed the record_sorting branch 2 times, most recently from 9fe6036 to 0df171e Compare July 2, 2021 19:37
Base automatically changed from record_sorting to master July 2, 2021 19:51
Copy link
Contributor

@mathemancer mathemancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall approach, and agree that pushing the validation down to the DB layer makes sense. Unfortunately, there's a bug occurring when grouping by a single column. See the comment in the pagination.py file.

filters=filters, order_by=order_by
)
# Convert the tuple keys into strings so it can be converted to JSON
group_count = {','.join(k): v for k, v in group_count.items()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug if a string rather than a tuple is returned by table.get_group_counts, i.e., if we want to group by only one column. I suggest modifying that function so it only has one return type(ish), a tuple. This will be easier to deal with, and less prone to causing bugs. There should also be a test for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, this results in splitting the value of that column up into its constituent characters in the output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks

@eito-fis eito-fis requested a review from mathemancer July 6, 2021 18:52
Copy link
Contributor

@mathemancer mathemancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Record API should support grouping results by a given field.
3 participants