Ability to sort (and paginate) by column #189
Given how unlikely it is that this will pose a real problem I think I like option 1: enable sort-by-column by default for all tables, then allow power users to instead switch to explicit enabling of the functionality in their …
I think this can work with a …
I'd like to support "sort by X descending, then by Y ascending if there are dupes for X" as well. Two ways that could work:
Or...
The second option is probably better in that it makes it easier for columns to have a comma in their name. Is it possible for a SQLite column to start with a …
Might have to do something special to get sort-by-nulls-last: https://stackoverflow.com/questions/12503120/how-to-do-nulls-last-in-sqlite
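For reference, a minimal sketch of the usual workaround, run against a throwaway in-memory table (the table `t` and its data are illustrative). SQLite sorts NULL before every other value in ascending order; sorting on `score IS NULL` first pushes the NULL rows to the end. SQLite 3.30.0 and later also accept `NULLS LAST` directly, but this form works on older versions too.

```python
import sqlite3

# Illustrative table with some NULLs in the sort column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany(
    "INSERT INTO t (score) VALUES (?)", [(3,), (None,), (1,), (None,), (2,)]
)

# "score IS NULL" is 0 for real values and 1 for NULLs, so real values
# come first; id breaks ties so the order is fully deterministic.
rows = conn.execute(
    "SELECT id, score FROM t ORDER BY score IS NULL, score, id"
).fetchall()
print(rows)  # [(3, 1), (5, 2), (1, 3), (2, None), (4, None)]
```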
Would need to figure out a smart way to get the default value - maybe by running a min() or max() against the column first?
This is a better pattern as you don't have to pick a minimum value:
I think there are actually four kinds of sort order we need to support:
It looks like [-blah] is a valid SQLite table name, so marking descending order with a hyphen prefix isn't a good idea. Instead, maybe this:
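The same holds for column names: a quoted identifier can start with a hyphen, so `_sort=-blah` could not tell "descending by blah" apart from "ascending by -blah". Separate `_sort` and `_sort_desc` arguments (the names used by the later commits in this thread) avoid the ambiguity. `parse_sort`, and its rule that the two arguments can't be combined, are hypothetical and only illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both names are legal once quoted.
conn.execute('CREATE TABLE t ("blah" INTEGER, "-blah" INTEGER)')
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(t)")]
print(columns)  # ['blah', '-blah']


def parse_sort(args, columns):
    """Hypothetical helper: return (column, descending) or None."""
    if "_sort" in args and "_sort_desc" in args:
        raise ValueError("Cannot use _sort and _sort_desc at the same time")
    if "_sort" in args:
        column, descending = args["_sort"], False
    elif "_sort_desc" in args:
        column, descending = args["_sort_desc"], True
    else:
        return None
    # The column name is taken literally, so "-blah" is just another column.
    if column not in columns:
        raise ValueError(f"Cannot sort by {column!r}")
    return column, descending
```

With this scheme `parse_sort({"_sort_desc": "-blah"}, columns)` unambiguously means "descending by the column named -blah".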
I'd like to continue to support _next=token pagination even for custom sort orders. To do that I should include rowid (or, more generally, the primary key) as the tie-breaker on all sorts, so I can incorporate it into the _next= token.
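A minimal sketch of a token that carries both the last sort value and the tie-breaking primary key, assuming a simple "value,pk" format. The demo URL later in this thread has a token of the form `8556%2C121799`, i.e. two comma-separated values, but the helper names and exact format here are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_next_token(sort_value, pk):
    # Hypothetical format: last sort value, then the primary key tie-breaker.
    return quote(f"{sort_value},{pk}", safe="")


def decode_next_token(token):
    # Split on the LAST comma so a sort value containing commas survives
    # (this assumes a single-column primary key without commas).
    sort_value, pk = unquote(token).rsplit(",", 1)
    return sort_value, pk


token = encode_next_token(8556, 121799)
print(token)                     # 8556%2C121799
print(decode_next_token(token))  # ('8556', '121799')
```

Both values come back as strings. When they are compared against an INTEGER column in a WHERE clause, SQLite's type affinity rules convert the text operand to a number, so the comparison still behaves numerically.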
In terms of user interface: the obvious place to put this is as a drop down menu on the column headers. This also means the UI can support combined sort orders. Assuming you are already sorted by county descending and you select the candidate column header, the options could be:
I'm tempted to put these verbose sorting options inline in the page HTML but have them in the table footer so they don't clog up the top half of the page with uninteresting links - then use JavaScript to hoik them out into a dropdown menu attached to each column header.
There is one other interesting option for auto-enabling/disabling sort: the inspect command could include data about column index presence and whether or not a column has any null values in it. This would allow us to dynamically include a "nulls last" option but only for columns that contain at least one null. It's quite a lot of additional engineering for a very minor feature though, so I think I'll punt on that for the moment. We may find that the _group_count feature can benefit from column value statistics later on though.
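A rough sketch of how those two per-column facts could be collected with plain SQLite PRAGMAs, assuming "indexed" means "is the first column of some index" (or a lone INTEGER PRIMARY KEY, which is the rowid). `column_stats` is a hypothetical helper, not something the inspect command actually does.

```python
import sqlite3


def quote_ident(name):
    # Double-quote an SQLite identifier, escaping embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def column_stats(conn, table):
    """Return {column: {"indexed": bool, "has_nulls": bool}}."""
    t = quote_ident(table)
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    table_info = conn.execute(f"PRAGMA table_info({t})").fetchall()
    leading = set()
    # index_list rows: (seq, name, unique, origin, partial)
    for index_row in conn.execute(f"PRAGMA index_list({t})").fetchall():
        info = conn.execute(
            f"PRAGMA index_info({quote_ident(index_row[1])})"
        ).fetchall()
        if info:
            # index_info rows are (seqno, cid, name); take the lowest seqno.
            leading.add(min(info)[2])
    # A single INTEGER PRIMARY KEY aliases rowid, so it is already ordered.
    pk_cols = [r for r in table_info if r[5]]
    if len(pk_cols) == 1 and pk_cols[0][2].upper() == "INTEGER":
        leading.add(pk_cols[0][1])
    stats = {}
    for row in table_info:
        name = row[1]
        has_nulls = conn.execute(
            f"SELECT EXISTS(SELECT 1 FROM {t} WHERE {quote_ident(name)} IS NULL)"
        ).fetchone()[0]
        stats[name] = {"indexed": name in leading, "has_nulls": bool(has_nulls)}
    return stats


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.execute("CREATE INDEX t_a ON t (a)")
conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", [("x", None), ("y", "z")])
print(column_stats(conn, "t"))
```

The `EXISTS` check stops at the first NULL it finds, but on an unindexed column it can still scan the whole table, which is part of why this is "quite a lot of additional engineering".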
Alternative idea: by default enable all sorting in the UI. If a table has more than 100,000 rows disable sorting UI except for columns that have an index. Allow this to be overridden in metadata.json.
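That rule is small enough to express as a pure function. This is a hypothetical sketch: the name, the parameters, and the choice to drop override entries that don't match a real column are all assumptions.

```python
def sortable_columns(row_count, columns, indexed_columns, override=None,
                     threshold=100_000):
    """Hypothetical helper: which columns should show a sort option."""
    if override is not None:
        # An explicit list (e.g. from metadata.json) always wins.
        return [c for c in columns if c in override]
    if row_count <= threshold:
        # Small tables: sorting any column is cheap enough.
        return list(columns)
    # Large tables: only offer sorting where an index can help.
    return [c for c in columns if c in indexed_columns]


print(sortable_columns(500, ["a", "b"], {"a"}))      # ['a', 'b']
print(sortable_columns(200_000, ["a", "b"], {"a"}))  # ['a']
```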
I'm not entirely sure how to get …
Consider this data: …
If the page size was set to 9 rather than 11, the page divide would fall between those two rows with the same value in the …
The problem is that our … So I think this is the right SQL:
But how do I encode a …
Maybe the answer here is that anything that's encoded in the next token is treated as >= with the exception of columns known to be primary keys, which are treated as >.
Pushed some work-in-progress with failing unit tests here: 2f8359c. Here's a demo: https://datasette-column-sort-wip.now.sh/sortable-4bbaa6f/sortable?_sort=sortable - note that the …
I'm going to combine the code for explicit sorting with the existing code for _next= pagination - so even tables without an explicit sort order will run through the same code since they are ordered and paginated by primary key.
A common problem with keyset pagination is that it can distort the "total number of rows" logic - every time you navigate to a further page the total rows count can decrease due to the extra arguments in the …
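A small demonstration of the effect and the obvious fix, assuming an illustrative table `t` and a `_next=` position of (sortable=2, id=8): the total should be counted with only the user's own filters, leaving the keyset clause out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, sortable INTEGER)")
# ids 1..20 with sortable values cycling 0, 1, 2, 3, 4.
conn.executemany(
    "INSERT INTO t (sortable) VALUES (?)", [(n % 5,) for n in range(20)]
)

user_where = "sortable >= 1"  # filter the user actually asked for
keyset_where = "(sortable > ? OR (sortable = ? AND id > ?))"  # pagination only

# Wrong: including the keyset clause shrinks the "total" on every page.
shrinking = conn.execute(
    f"SELECT count(*) FROM t WHERE {user_where} AND {keyset_where}", (2, 2, 8)
).fetchone()[0]

# Right: count with the user's filters only, ignoring the _next= position.
total = conn.execute(f"SELECT count(*) FROM t WHERE {user_where}").fetchone()[0]

print(shrinking, total)  # 10 16
```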
A note about views: a view cannot be paginated using keyset pagination because records returned from a view don't have a primary key - so there's no way to reliably distinguish between _next= records when the sorted column has duplicates with the same value. Datasette already takes this into account: views are paginated using offset/limit instead. We can continue to do that even for views that have been sorted using a …
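A minimal sketch of offset/limit pagination over a sorted view, assuming an illustrative view `v`. Fetching one extra row tells us whether a next page exists; the secondary sort on `name` is an assumed tie-breaker that keeps rows with equal scores in a stable order across pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO t (name, score) VALUES (?, ?)",
    [("a", 3), ("b", 1), ("c", 3), ("d", 2), ("e", 3)],
)
# The view exposes no primary key, so there is nothing unique for a keyset token.
conn.execute("CREATE VIEW v AS SELECT name, score FROM t")


def view_page(conn, offset, limit):
    # Ask for one extra row: if it comes back, there is another page.
    rows = conn.execute(
        "SELECT name, score FROM v ORDER BY score DESC, name LIMIT ? OFFSET ?",
        (limit + 1, offset),
    ).fetchall()
    next_offset = offset + limit if len(rows) > limit else None
    return rows[:limit], next_offset


print(view_page(conn, 0, 2))  # ([('a', 3), ('c', 3)], 2)
print(view_page(conn, 4, 2))  # ([('b', 1)], None)
```

The trade-off is that large offsets get slower, because SQLite still has to walk past every skipped row.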
To break this up into smaller units, the first implementation of this will only support a single …
Actually next page SQL when sorting looks more like this:
The next page after row 190 with sortable value 111 should show either records that are greater than 111 or records that match 111 but have a greater primary key than the last one seen.
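A minimal sketch of that next-page condition, assuming a table `t` with an integer primary key `pk` and a `sortable` column (names and data are illustrative). For a descending sort the comparison flips to `<`, while ties are still broken by ascending primary key, so the condition matches the `ORDER BY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, sortable INTEGER)")
conn.executemany(
    "INSERT INTO t (pk, sortable) VALUES (?, ?)",
    [(1, 110), (2, 111), (3, 111), (4, 112), (5, 111)],
)


def next_page(conn, last_value, last_pk, page_size, descending=False):
    # op and direction come from fixed strings, never from user input.
    op = "<" if descending else ">"
    direction = "DESC" if descending else "ASC"
    sql = f"""
        SELECT pk, sortable FROM t
        WHERE sortable {op} :value OR (sortable = :value AND pk > :pk)
        ORDER BY sortable {direction}, pk ASC
        LIMIT :limit
    """
    params = {"value": last_value, "pk": last_pk, "limit": page_size}
    return conn.execute(sql, params).fetchall()


# The last row seen had sortable=111 and pk=2.
print(next_page(conn, 111, 2, 10))        # [(3, 111), (5, 111), (4, 112)]
print(next_page(conn, 111, 2, 10, True))  # [(3, 111), (5, 111), (1, 110)]
```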
Allows for paginated sorted results based on a specified column. Refs #189
Demo: senator tweets ordered by number of replies. Page 2 (note that since Senators retweet things, there are tweets with the same text/number-of-replies but retweeted by different senators that span the page break): https://datasette-issue-189-demo.now.sh/fivethirtyeight-2628db9/twitter-ratio%2Fsenators?_next=8556%2C121799&_sort_desc=replies
Plus renamed human_description to human_description_en Refs #189
Small bug: "201 rows where sorted by sortable_with_nulls" shouldn't have the word "where" in it.
I'm going to split the following out into separate tickets:
Actually I think I always want nulls last when ordering asc, nulls first when ordering desc.
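SQLite treats NULL as smaller than every other value, so its default is the opposite: NULLs first when ascending, last when descending. A hypothetical helper that produces the desired order in both directions, using `IS NULL` as a leading sort key:

```python
import sqlite3


def order_by_clause(column, descending=False):
    """Hypothetical helper: NULLs last when ascending, first when descending."""
    col = '"' + column.replace('"', '""') + '"'
    if descending:
        # "IS NULL" is 1 for NULLs; sorting it DESC puts those rows first.
        return f"ORDER BY {col} IS NULL DESC, {col} DESC"
    # "IS NULL" is 0 for real values, so they come before the NULLs.
    return f"ORDER BY {col} IS NULL, {col}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])
asc = [r[0] for r in conn.execute(f"SELECT v FROM t {order_by_clause('v')}")]
desc = [r[0] for r in conn.execute(f"SELECT v FROM t {order_by_clause('v', True)}")]
print(asc)   # [1, 3, None]
print(desc)  # [None, 3, 1]
```

In effect this treats NULL as larger than every real value, which keeps the two directions exact mirror images of each other.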
Here's a demo of the new clickable column headers: https://datasette-issue-189-demo-3.now.sh/salaries-7859114-7859114/2017+Maryland+state+salaries?_search=university&_sort_desc=last_name
You can now explicitly set which columns in a table can be used for sorting using the _sort and _sort_desc arguments using metadata.json: { "databases": { "database1": { "tables": { "example_table": { "sortable_columns": [ "height", "weight" ] } } } } } Refs #189
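A sketch of looking up that setting, using the metadata.json structure from the commit message above. `allowed_sort_columns` is hypothetical: falling back to every column when `sortable_columns` is absent matches the default described earlier in the thread, while dropping configured names that don't exist in the table is an assumption.

```python
import json

METADATA = json.loads("""
{"databases": {"database1": {"tables": {"example_table": {
    "sortable_columns": ["height", "weight"]}}}}}
""")


def allowed_sort_columns(metadata, database, table, all_columns):
    """Hypothetical helper: configured sortable_columns, else every column."""
    table_meta = (
        metadata.get("databases", {})
        .get(database, {})
        .get("tables", {})
        .get(table, {})
    )
    configured = table_meta.get("sortable_columns")
    if configured is None:
        return list(all_columns)
    return [c for c in configured if c in all_columns]


cols = ["id", "height", "weight", "name"]
print(allowed_sort_columns(METADATA, "database1", "example_table", cols))
# ['height', 'weight']
```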
We were showing this: 201 rows where sorted by sortable_with_nulls We now show this: 201 rows sorted by sortable_with_nulls
I've merged this into master.
Awesome!
This is now released in Datasette 0.15 https://github.com/simonw/datasette/releases/tag/0.15
I think I found a bug. I tried to sort by middle initial in my salaries set, and many middle initials are null. The next_url gets set by Datasette to: … But then …
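The rest of this report is cut off, so the exact failure isn't recoverable here. One general pitfall with keyset pagination over a nullable column is that any comparison with NULL (such as `middle > NULL`) yields NULL, so it matches no rows. Below is a sketch of a null-aware next-page condition, assuming a nulls-last ascending order with `pk` as the tie-breaker. The table and helper are hypothetical, and this is not presented as Datasette's fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, middle TEXT)")
conn.executemany(
    "INSERT INTO t (pk, middle) VALUES (?, ?)",
    [(1, "A"), (2, None), (3, "B"), (4, None), (5, "A")],
)

# Any comparison with NULL is NULL, so this matches nothing.
print(conn.execute("SELECT count(*) FROM t WHERE middle > NULL").fetchone())  # (0,)


def next_page_nulls_last(conn, last_value, last_pk, limit):
    # Order: non-NULL values ascending, then the NULL rows, ties broken by pk.
    order = "ORDER BY middle IS NULL, middle, pk"
    if last_value is None:
        # Already inside the NULL block: only later NULL rows remain.
        where = "middle IS NULL AND pk > :pk"
    else:
        # Every NULL row still lies ahead, plus later non-NULL rows.
        where = "middle IS NULL OR middle > :value OR (middle = :value AND pk > :pk)"
    sql = f"SELECT pk, middle FROM t WHERE {where} {order} LIMIT :limit"
    params = {"value": last_value, "pk": last_pk, "limit": limit}
    return conn.execute(sql, params).fetchall()


print(next_page_nulls_last(conn, "A", 1, 10))  # [(5, 'A'), (3, 'B'), (2, None), (4, None)]
print(next_page_nulls_last(conn, None, 2, 10))  # [(4, None)]
```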
As requested in #185 (comment)
I've previously avoided this for performance reasons: sort-by-column on a column without an index is likely to perform badly for hundreds of thousands of rows.
That's not a good enough reason to avoid the feature entirely though. A few options:
… metadata.json … to enable it for specific tables/columns.
We already have the mechanism in place to cut off SQL queries that take more than X seconds, so if someone DOES try to sort by a column that's too expensive it won't actually hurt anything - but it would be nice not to show people a "sort" option that is guaranteed to throw a timeout error.
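The thread doesn't show how that cut-off works. The sketch below reproduces the idea with the standard library's `sqlite3.Connection.set_progress_handler`: the handler runs every N SQLite virtual machine instructions, and a non-zero return value aborts the query with `sqlite3.OperationalError`. The `time_limit` name and the 1000-instruction interval are assumptions.

```python
import sqlite3
import time
from contextlib import contextmanager


@contextmanager
def time_limit(conn, ms):
    """Abort any query on conn that runs for longer than ms milliseconds."""
    deadline = time.perf_counter() + ms / 1000

    def handler():
        # A non-zero return value tells SQLite to interrupt the query.
        return 1 if time.perf_counter() > deadline else 0

    conn.set_progress_handler(handler, 1000)  # check every 1000 VM instructions
    try:
        yield
    finally:
        conn.set_progress_handler(None, 1000)  # remove the handler again


conn = sqlite3.connect(":memory:")
slow_sql = (
    "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c "
    "WHERE x < 100000000) SELECT count(*) FROM c"
)
try:
    with time_limit(conn, 20):
        conn.execute(slow_sql).fetchone()
    timed_out = False
except sqlite3.OperationalError:  # raised as "interrupted"
    timed_out = True
print(timed_out)  # True
```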
The vast majority of datasette usage that I've seen so far is on smaller datasets where the performance penalties of sort-by-column are extremely unlikely to show up.
Still left to do: