Implement range based grouping for numeric types. #384

Closed
Tracked by #248
kgodey opened this issue Jul 15, 2021 · 5 comments · Fixed by #1312
Labels
type: enhancement (New feature or request), work: backend (Related to Python, Django, and simple SQL)

Comments

@kgodey
Contributor

kgodey commented Jul 15, 2021

Problem

Users may want to group their records by ranges such as 0-1, 1-2, 2-3, or 0-10, 10-20, 20-30, or 0-100, 100-200, 200-300, etc. The range that they will want to group on will depend on the kind of data they have. For example, if the values in a column range from 0.3 to 5.2, then grouping by 0-10, 10-20 etc. is not very useful.

We need to provide both useful ranges for a given column and the ability to group by them.

Solution

  • Columns of Number types should provide useful grouping ranges for that column in the Column API based on either:
    • the number of groups desired
    • the group size desired
  • We should extend our record grouping mechanism to accept ranges for columns of Number types. We should be able to accept arbitrary ranges, not just the ranges suggested in the Column API.
    • Users should be able to set a min and max value for a range and the increment size
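
A minimal sketch of how suggested ranges could be computed from a column's min and max, given either a desired number of groups or a desired group size. The function name `suggest_ranges` and its parameters are hypothetical, not part of the Mathesar API, and the half-open `[start, end)` convention is an assumption:

```python
from decimal import Decimal
from typing import List, Optional, Tuple


def suggest_ranges(
    low: Decimal,
    high: Decimal,
    num_groups: Optional[int] = None,
    group_size: Optional[Decimal] = None,
) -> List[Tuple[Decimal, Decimal]]:
    """Return consecutive (start, end) ranges covering low..high.

    Hypothetical helper: exactly one of num_groups or group_size is given.
    """
    if (num_groups is None) == (group_size is None):
        raise ValueError("pass exactly one of num_groups or group_size")
    if high <= low:
        raise ValueError("high must be greater than low")

    if num_groups is not None:
        if num_groups < 1:
            raise ValueError("num_groups must be at least 1")
        # Compute each boundary from low directly so rounding does not accumulate.
        bounds = [low + (high - low) * i / num_groups for i in range(num_groups)]
    else:
        if group_size <= 0:
            raise ValueError("group_size must be positive")
        # Step from low in increments of group_size until high is reached.
        bounds = []
        edge = low
        while edge < high:
            bounds.append(edge)
            edge += group_size
    bounds.append(high)  # the last range is clipped to end at high
    return list(zip(bounds, bounds[1:]))
```

For the column in the example above (values from 0.3 to 5.2), `suggest_ranges(Decimal("0.3"), Decimal("5.2"), group_size=Decimal("1"))` yields five ranges, the last one clipped to end at 5.2. A user-supplied min, max, and increment would go through the same `group_size` path.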

This involves:

  • Implementing the grouping in the backend
  • Updating the /api/v0/databases/<id>/types/ endpoint to store the groupings available for this type. (This will be done in "Grouping description endpoint", #1310.)

Additional context

@kgodey added this to the 07. Initial Data Types milestone Jul 15, 2021
@kgodey added the labels ready (Ready for implementation), type: enhancement (New feature or request), work: backend (Related to Python, Django, and simple SQL), and work: database Jul 15, 2021
@mathemancer
Contributor

@kgodey @ghislaineguerin Should the ranges have the same size on the number line, or the same number of entries? For example, if a column has entries

mycol
-----
   1
   2
   3
   4
   5
   6
   7
  20

Should splitting into two groups result in

 1-10
-----
   1
   2
   3
   4
   5
   6
   7
-----
11-20
-----
  20

or

 1-4
-----
   1
   2
   3
   4
-----
 5-20
-----
   5
   6
   7
  20
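
The two options can be sketched as follows. Both function names are hypothetical, and the equal-width version splits at the midpoint of the observed min and max (10.5 here) rather than at the rounded labels 1-10 and 11-20 used above, which is an assumption about how the boundaries would be chosen:

```python
from typing import List, Sequence


def split_equal_width(values: Sequence[float], num_groups: int) -> List[List[float]]:
    """Split values into groups that span equal intervals on the number line."""
    low, high = min(values), max(values)
    width = (high - low) / num_groups
    groups: List[List[float]] = [[] for _ in range(num_groups)]
    for value in sorted(values):
        if width == 0:
            index = 0  # all values are identical
        else:
            # Clamp so the maximum value lands in the last group.
            index = min(int((value - low) / width), num_groups - 1)
        groups[index].append(value)
    return groups


def split_equal_count(values: Sequence[float], num_groups: int) -> List[List[float]]:
    """Split values into groups holding (as nearly as possible) equal numbers of entries."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), num_groups)
    groups, start = [], 0
    for i in range(num_groups):
        # The first `extra` groups take one additional entry each.
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


mycol = [1, 2, 3, 4, 5, 6, 7, 20]
print(split_equal_width(mycol, 2))  # [[1, 2, 3, 4, 5, 6, 7], [20]]
print(split_equal_count(mycol, 2))  # [[1, 2, 3, 4], [5, 6, 7, 20]]
```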

@kgodey
Contributor Author

kgodey commented Jul 16, 2021

@mathemancer Good question. From the backend perspective, I think it makes sense to provide both options. The frontend (or other Mathesar clients) can use whichever would be best in the use case.

@kgodey
Contributor Author

kgodey commented Sep 13, 2021

@powellc FYI I updated the issue description with more details.

@kgodey assigned mathemancer and unassigned powellc Oct 5, 2021
@kgodey added the label status: started and removed the label ready (Ready for implementation) Oct 12, 2021
@mathemancer
Contributor

As I've gotten into this:

  • Implementing "range-based" grouping turns out to work the same way for any type that can be ordered (for example, text types could be grouped as a-c, d-f, ..., x-z).
  • Grouping by distinct tuples is a special case of range-based grouping, where the ranges are very small (one element each).

I'm currently working on the assumption that the former bullet point makes sense, and we can have that functionality (even if we don't want it in the front end at the moment). For the second point, I think I'll start by just getting the range grouping working, then we can deprecate the other grouping form.
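
To illustrate both points, here is a rough sketch (not the actual implementation; `group_by_ranges` is a hypothetical name) that groups any orderable values by a sorted list of lower boundaries. It assumes half-open ranges, with the last range unbounded above:

```python
from bisect import bisect_right
from collections import defaultdict


def group_by_ranges(values, boundaries):
    """Group values by range, keyed by each range's lower boundary.

    `boundaries` must be sorted; range i covers boundaries[i] up to
    (but not including) boundaries[i + 1].
    """
    groups = defaultdict(list)
    for value in values:
        # Index of the last boundary that is <= value.
        index = bisect_right(boundaries, value) - 1
        if index < 0:
            raise ValueError(f"{value!r} is below the first boundary")
        groups[boundaries[index]].append(value)
    return dict(groups)


# Works for text as well as numbers, since only ordering is needed.
words = ["apple", "cat", "dog", "fig", "zebra"]
print(group_by_ranges(words, ["a", "d", "g"]))
# {'a': ['apple', 'cat'], 'd': ['dog', 'fig'], 'g': ['zebra']}

# Distinct-value grouping: every distinct value is its own boundary.
rows = [(1, "a"), (1, "b"), (1, "a")]
print(group_by_ranges(rows, sorted(set(rows))))
# {(1, 'a'): [(1, 'a'), (1, 'a')], (1, 'b'): [(1, 'b')]}
```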

@kgodey Does that make sense to you?

@kgodey
Contributor Author

kgodey commented Oct 13, 2021

@mathemancer That sounds fine to me. I think we'll want different frontend experiences for grouping by distinct tuples vs. range-based grouping, but there's no reason why the API can't be the same for both.

3 participants