Add count labels and ordering options to visualize_feature bar plots
#78
Merged: idanmoradarthas merged 11 commits into master from 77-add-count-labels-to-bar-plots-in-visualize_feature-method on Nov 2, 2025
- Bump pyarrow to version 22.0.0 and ruff to version 0.14.3 in pyproject.toml.
- Increment package version to 1.10.0rc2 in __init__.py.
- Adjust version test in test_version.py to reflect the new version.

- Introduced a new parameter `show_counts` to the `visualize_feature` function, allowing users to display count values on top of bars in count plots.
- Replaced the seaborn countplot with a custom matplotlib bar plot for improved flexibility and control over the visualization.
- Updated documentation to reflect the new parameter and its functionality.

- Added an `order` parameter to the `visualize_feature` function, allowing users to specify the order of categorical levels in count plots.
- Updated documentation to detail the new ordering options, including sorting by count and alphabetical order, as well as accepting explicit lists.
- Modified tests to validate the new ordering functionality for various feature types.
- Adjusted plot sizes in tests for better visualization of results.

- Changed the boolean feature visualization image in README.md and preprocess documentation to reflect the new count display format.
- Updated the test for boolean feature visualization to use the new parameter and added parameterization for testing both display options.
- Removed the old boolean visualization image as it is no longer needed.

- Replaced outdated visualization images in README.md and preprocess documentation with new images reflecting updated float, integer, datetime, and categorical feature visualizations.
- Modified tests to accommodate new visualization formats and added parameterization for object and category features.
- Removed obsolete image files that are no longer needed.

- Updated the `visualize_feature` function to sort value counts in descending order when the `order` parameter is set to "count_desc".
- Added a new test to validate the `visualize_feature` function with various ordering options, including "count_desc", "count_asc", and "alpha_asc".
- Adjusted plot size in the test for improved visualization.

- Introduced a new test to validate the behavior of the `visualize_feature` function when provided with an invalid order parameter, ensuring it raises a ValueError with an appropriate message.
- This enhances the robustness of the feature visualization by confirming that incorrect inputs are properly managed.

- Introduced a new test to validate the `visualize_feature` function when provided with a list of order parameters, enhancing the testing coverage for ordering functionality.
- Adjusted plot size in the test for improved visualization consistency.
- This update ensures that the function behaves correctly with multiple ordering configurations.

- Updated README.md and preprocess documentation to include detailed descriptions of the `visualize_feature` function's capabilities, particularly for handling high-cardinality categorical features and customizing sorting and count display options.
- Added examples demonstrating the use of the `remove_na`, `show_counts`, and `order` parameters for various feature types.
- Improved formatting and clarity in the documentation to better guide users in utilizing the visualization features effectively.

- Adjusted the plot size in the `test_visualize_feature_float_datetime_int` test to improve visualization clarity, changing the height from 8 to 11 inches.
- Updated the corresponding baseline image to reflect this change in the test output.

- Introduced a new helper function `_plot_count_bar` to streamline the creation of bar charts for categorical data, allowing for customizable ordering and optional count labels.
- Refactored the `visualize_feature` function to utilize `_plot_count_bar`, enhancing code clarity and maintainability.
- This update improves the flexibility of visualizations by consolidating bar plotting logic into a dedicated function.
Overview
This pull request enhances the `visualize_feature` function in the `ds_utils/preprocess.py` module by introducing new parameters for displaying count labels on bar plots and customizing the order of categories. These changes address the limitations highlighted in issue #77, where users noted the difficulty in estimating exact counts from the y-axis in categorical visualizations. By replacing the Seaborn `countplot` with a custom Matplotlib-based implementation, we gain greater flexibility, including optional count annotations and sorting options. This improves readability, especially for high-cardinality data, and makes the visualizations more suitable for reports and presentations.

The updates also include dependency upgrades, a package version bump, refreshed documentation with examples, updated test cases, and new baseline images to reflect the changes.
Key Changes
New Features in `visualize_feature`
- Added `show_counts` parameter:
  - Displays count values on top of bars in count plots (default: `True`).
  - Addresses issue #77 (count labels for bar plots in the `visualize_feature` method) by making frequencies explicit without relying on axis scales.
  - `show_counts=False` reverts to the previous behavior.
- Added `order` parameter:
  - `"count_desc"`: Sort by descending count (most frequent first).
  - `"count_asc"`: Sort by ascending count.
  - `"alpha_asc"`: Sort alphabetically in ascending order.
  - `"alpha_desc"`: Sort alphabetically in descending order.
  - `None`: Uses the default index order from `value_counts()`.
- Added `ax` parameter.
Code Refactoring
- Introduced a new helper function `_plot_count_bar` in `ds_utils/preprocess.py`:
  - Takes a `pandas.Series` of value counts.
  - Handles the `order` and `show_counts` logic, including sorting the series and annotating bars with `ax.bar_label()`.
  - Replaces `sns.countplot`, improving maintainability and customization without Seaborn dependencies for this plot.
- Updated `visualize_feature` to use `_plot_count_bar` for relevant feature types (categorical, object, boolean, integer).
- Minor adjustments for handling high-cardinality features (e.g., limiting to top 10 categories with a warning, as before).
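The refactoring described above can be illustrated with a minimal sketch. It is not the code merged in this PR: the names `sort_counts` and `plot_count_bar`, their signatures, and the zero-fill for categories missing from an explicit list are assumptions made for illustration. Only the `order` string options, the `ValueError` on invalid input, and the use of `ax.bar_label()` come from the description.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# String options named in the PR description.
_ORDER_OPTIONS = ("count_desc", "count_asc", "alpha_asc", "alpha_desc")


def sort_counts(counts, order=None):
    """Reorder a Series of value counts (hypothetical helper, not the PR's code)."""
    if order is None:
        # Keep whatever order value_counts() produced.
        return counts
    if isinstance(order, str):
        if order == "count_desc":
            return counts.sort_values(ascending=False)
        if order == "count_asc":
            return counts.sort_values(ascending=True)
        if order == "alpha_asc":
            return counts.sort_index(ascending=True)
        if order == "alpha_desc":
            return counts.sort_index(ascending=False)
        raise ValueError(
            f"Invalid order {order!r}; expected one of {_ORDER_OPTIONS} or a list"
        )
    # Explicit list of categories; absent categories get a count of 0 (assumption).
    return counts.reindex(list(order), fill_value=0)


def plot_count_bar(counts, order=None, show_counts=True, ax=None):
    """Draw a bar chart from value counts with optional count labels (sketch)."""
    if ax is None:
        _, ax = plt.subplots()
    ordered = sort_counts(counts, order)
    labels = [str(label) for label in ordered.index]
    bars = ax.bar(labels, ordered.to_numpy())
    if show_counts:
        # Annotate each bar with its height (requires matplotlib >= 3.4).
        ax.bar_label(bars)
    ax.set_ylabel("count")
    return ax


if __name__ == "__main__":
    series = pd.Series(["a", "b", "a", "c", "a", "b"])
    axes = plot_count_bar(series.value_counts(), order="alpha_asc")
    axes.figure.savefig("count_bar_sketch.png")
```

Keeping the sorting step separate from the drawing step means the ordering rules can be checked without rendering a figure, which is one plausible reason to consolidate this logic in a dedicated helper.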
Documentation Updates
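- Updated the documentation of `visualize_feature` to describe the new parameters (`show_counts`, `order`).
- Expanded the README.md and preprocess documentation for `visualize_feature`, including parameter descriptions and behavior notes.
- Added examples, including the use of `remove_na`.
Testing Enhancements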
- Tests cover `show_counts=True`/`False` outputs.
- Tests cover `order` values, including string options, lists, and invalid inputs (raises `ValueError` with a clear message).
- Tests exercise `_plot_count_bar` behavior indirectly through `visualize_feature`.
Testing and Validation
- Using `show_counts=False` and `order=None` reproduces the original behavior.
Related Issue
Closes #77 by adding the requested count labels and extending functionality with ordering options for better usability.
This PR improves the overall utility of the `DataScienceUtils` package for data exploration and visualization tasks. Feedback welcome!