-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Add sort
parameter to st.bar_chart
#12339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
✅ PR preview is ready!
|
st.bar_chart
sort
parameter to st.bar_chart
+ disable default sorting
📈 Python coverage change detectedThe Python unit test coverage has increased by 0.0165%
✅ Coverage change is within normal range. Coverage by files
|
assert chart_spec["mark"] in [altair_type, {"type": altair_type}] | ||
assert chart_spec["encoding"]["x"]["type"] == "ordinal" | ||
assert chart_spec["encoding"]["x"]["sort"] == ["c", "b", "a"] | ||
assert chart_spec["encoding"]["x"]["sort"] is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note sure if this test still makes sense now or if there's another way to check that the ordered categories are applied properly, given that we're disabling sorting now by setting sort=None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just ran this manually and it seems like it's ordered b -> c -> a, i.e. in the order of the data. Not sure how before this even worked to get the right order from the Pandas dataframe, given that Altair should sort the categories alphabetically by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how before this even worked to get the right order from the Pandas dataframe, given that Altair should sort the categories alphabetically by default?
I believe Altair has some extra treatment if the column is categorical with ordered=True. Also see this old issue and PR related to this aspect:
- Using ordered categorical columns with built-in charts causes an error #7776
- Fix handling of ordinal columns for builtin-charts #7771
I'm wondering if we should add some special handling in this case (if the column is categorical + ordered=True)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved this by not doing any special casing but allowing sort=True
to get the old behavior with Altair's automatic sorting back. (The drawback of special casing is that then there's no way to disable this sorting...).
sort : str or None | ||
How to sort the bars. If ``None`` (default), bars are shown in data | ||
order (no sorting). If this is the name of a column (e.g. ``"col1"``), | ||
bars are sorted by that column in ascending order. If this is the | ||
name of a column prefixed with a minus sign (e.g. ``"-col1"``), bars are | ||
sorted by that column in descending order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One consideration: We could also support bool
here with True
being the current behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah pretty neat idea. I changed it now so it uses False
by default and allows setting it to True
to have Altair's automatic sorting. This also resolves the ordered categorical issue above, since you can just set sort=True
to have the same sorting of ordered categorical values as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also been thinking about changing the default to True
to break fewer existing charts, but I think False
by default is much less confusing. Plotly and matplotlib also don't apply any automatic sorting for bar charts by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 but it might be good to validate if it makes sense to add special treatment for ordered categorical columns, see this comment: #12339 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new sort
parameter to st.bar_chart
to give users explicit control over how bars are ordered, while also disabling Altair's default alphabetical sorting behavior. This addresses user confusion and provides more predictable chart behavior.
Key changes:
- Adds
sort
parameter supporting boolean values and column name strings (with optional "-" prefix for descending order) - Disables automatic sorting by default (
sort=False
) for more predictable behavior - Updates existing snapshot tests to reflect the new default behavior
Reviewed Changes
Copilot reviewed 6 out of 57 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
lib/streamlit/elements/vega_charts.py | Adds the sort parameter to the bar_chart function signature and documentation |
lib/streamlit/elements/lib/built_in_chart_utils.py | Core implementation of sort functionality with column parsing, validation, and encoding logic |
lib/tests/streamlit/elements/vega_charts_test.py | Comprehensive unit tests for the new sort parameter behavior |
lib/streamlit/elements/arrow.py | Updates add_rows functionality to preserve sort parameter |
e2e_playwright/st_bar_chart.py | Adds test cases for various sort scenarios |
e2e_playwright/st_bar_chart_test.py | Updates test expectations and adds snapshot tests for sort behavior |
def _parse_sort_column(df: pd.DataFrame, sort_from_user: bool | str) -> str | None: | ||
if sort_from_user is False or sort_from_user is True: | ||
return None | ||
|
||
sort_column = sort_from_user.removeprefix("-") | ||
if sort_column not in df.columns: | ||
raise StreamlitColumnNotFoundError(df, sort_column) | ||
|
||
return sort_column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function doesn't handle the case where sort_from_user
is an empty string. An empty string would pass the boolean checks and then removeprefix('-')
would return an empty string, which would likely not be found in df.columns
and raise an error. Consider adding an explicit check for empty strings.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would already trigger an error because the empty string wouldn't be in the dataframe columns.
# String: sort by column name (optional '-' prefix for descending) | ||
sort_order: Literal["ascending", "descending"] | ||
if sort_from_user.startswith("-"): | ||
sort_order = "descending" | ||
else: | ||
sort_order = "ascending" | ||
sort_field = sort_from_user.removeprefix("-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to _parse_sort_column
, this function doesn't handle empty strings. If sort_from_user
is an empty string, it would create a SortField
with an empty field name, which could cause issues in Altair. Consider adding validation to ensure the string is not empty.
# String: sort by column name (optional '-' prefix for descending) | |
sort_order: Literal["ascending", "descending"] | |
if sort_from_user.startswith("-"): | |
sort_order = "descending" | |
else: | |
sort_order = "ascending" | |
sort_field = sort_from_user.removeprefix("-") | |
sort_field = sort_from_user.removeprefix("-") | |
if not sort_field: | |
raise StreamlitAPIException( | |
"Sort column name cannot be empty. Please provide a valid column name for sorting." | |
) |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already caught by the check above.
def _parse_sort_column(df: pd.DataFrame, sort_from_user: bool | str) -> str | None: | ||
if sort_from_user is False or sort_from_user is True: | ||
return None | ||
|
||
sort_column = sort_from_user.removeprefix("-") | ||
if sort_column not in df.columns: | ||
raise StreamlitColumnNotFoundError(df, sort_column) | ||
|
||
return sort_column | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function _parse_sort_column
lacks a docstring. Per Python Guide docstring requirements, functions should include docstrings using Numpydoc style. Although this function is prefixed with underscore indicating private scope, it supports chart utilities that users access through st.bar_chart
. Add a docstring explaining the function's purpose, parameters, and return value in Numpydoc format.
def _parse_sort_column(df: pd.DataFrame, sort_from_user: bool | str) -> str | None: | |
if sort_from_user is False or sort_from_user is True: | |
return None | |
sort_column = sort_from_user.removeprefix("-") | |
if sort_column not in df.columns: | |
raise StreamlitColumnNotFoundError(df, sort_column) | |
return sort_column | |
def _parse_sort_column(df: pd.DataFrame, sort_from_user: bool | str) -> str | None: | |
""" | |
Parse the sort column parameter for chart utilities. | |
Parameters | |
---------- | |
df : pd.DataFrame | |
The DataFrame containing the data to be plotted. | |
sort_from_user : bool or str | |
The sort parameter provided by the user. If a string, it specifies | |
the column to sort by (with optional "-" prefix for descending order). | |
If a boolean, it's ignored and None is returned. | |
Returns | |
------- | |
str or None | |
The name of the column to sort by without any direction prefix, | |
or None if sort_from_user is a boolean. | |
Raises | |
------ | |
StreamlitColumnNotFoundError | |
If the specified sort column doesn't exist in the DataFrame. | |
""" | |
if sort_from_user is False or sort_from_user is True: | |
return None | |
sort_column = sort_from_user.removeprefix("-") | |
if sort_column not in df.columns: | |
raise StreamlitColumnNotFoundError(df, sort_column) | |
return sort_column | |
Spotted by Diamond (based on custom rule: Python Guide)
Is this helpful? React 👍 or 👎 to let us know.
sort
parameter to st.bar_chart
+ disable default sortingsort
parameter to st.bar_chart
the chart's Altair spec. As a result this is easier to use for many | ||
"just plot this" scenarios, while being less customizable. | ||
If ``st.line_chart`` does not guess the data specification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing these lines because they aren't really accurate anymore today given that you can use x
and y
to specify which columns to use.
Describe your changes
This PR adds a
sort
parameter tost.bar_chart
to sort the bars by a column (sort="col1"
for ascending andsort="-col1"
for descending order).The default is
sort=True
for now, which keeps Altair's default sorting. We might change this later on to disable sorting by default but want to discuss it first.GitHub Issue Link (if applicable)
Closes #385
Closes #7111
Testing Plan
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.