Unify categorical plots #466

Merged
merged 48 commits into from
Mar 13, 2015


mwaskom (Owner) commented Mar 9, 2015

This is a follow-on to #410. Most of the changes concern the implementation, but there are a few important new features and changes.

In short, the categorical distribution plots (boxplot, violinplot, and stripplot) have been unified with the categorical estimation plots (barplot, pointplot, and countplot). From the user perspective, that means they all share a basic API. From the developer perspective, it means that as many common operations as possible were abstracted out into the underlying _CategoricalPlotter and _CategoricalStatPlotter classes, so adding further plot kinds in the future should be easier and more robust.
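To illustrate the kind of sharing described above, here is a minimal sketch of a base class that owns the common steps (establishing the level order, splitting values by level) and a subclass that only adds a statistic. The class and method names (`CategoricalPlotterSketch`, `StatPlotterSketch`, `grouped_values`, `statistics`) are hypothetical; this is not seaborn's private `_CategoricalPlotter` / `_CategoricalStatPlotter` code, it handles only the vertical case, and it draws nothing.

```python
from statistics import mean


class CategoricalPlotterSketch:
    """Hypothetical base class: data handling shared by all categorical plots."""

    def __init__(self, data, x, y, order=None):
        # data is a list of dict records; x names the categorical column and
        # y names the numeric column (vertical orientation only, for brevity)
        self.data = data
        self.x, self.y = x, y
        levels = [row[x] for row in data]
        # Default order: levels as they first appear in the data, not sorted
        self.order = list(order) if order is not None else list(dict.fromkeys(levels))

    def grouped_values(self):
        """Split the y values into one list per categorical level, in order."""
        groups = {level: [] for level in self.order}
        for row in self.data:
            if row[self.x] in groups:
                groups[row[self.x]].append(row[self.y])
        return [groups[level] for level in self.order]


class StatPlotterSketch(CategoricalPlotterSketch):
    """Hypothetical subclass for estimation plots: only adds the statistic."""

    def __init__(self, data, x, y, order=None, estimator=mean):
        super().__init__(data, x, y, order)
        self.estimator = estimator

    def statistics(self):
        # Levels with no observations get None instead of raising an error
        return [self.estimator(vals) if vals else None
                for vals in self.grouped_values()]
```

For example, with the records `[{"day": "Sat", "tip": 3.0}, {"day": "Thur", "tip": 2.0}, {"day": "Sat", "tip": 5.0}]`, the default order is `["Sat", "Thur"]` and `statistics()` returns `[4.0, 2.0]`. A distribution-plot subclass would reuse `grouped_values()` the same way and differ only in what it does with each group.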

Demonstration of major new features

The main highlight in terms of new functionality is that barplot and pointplot can now draw horizontally:

sns.barplot(x="tip", y="day", data=tips)

[Figure: seaborn-barplot-4]
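Drawing horizontally depends on working out which variable is the categorical one. Below is a minimal sketch of one plausible rule, assuming numbers mean "quantitative" and anything else means "categorical": an explicit orientation wins; otherwise, if `x` is numeric and `y` is not, draw horizontally, and in every other case draw vertically. The helper name `infer_orient_sketch` is hypothetical, and seaborn's actual rules may differ.

```python
from numbers import Number


def _is_numeric(values):
    """True if every non-missing value is a real number (bools excluded)."""
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )


def infer_orient_sketch(x_values, y_values, orient=None):
    """Return "v" or "h" for a categorical plot (hypothetical helper)."""
    if orient is not None:
        # An explicit choice always wins
        if orient not in ("v", "h"):
            raise ValueError("orient must be 'v' or 'h'")
        return orient
    if _is_numeric(x_values) and not _is_numeric(y_values):
        # Quantitative x with categorical y: bars extend along the x axis
        return "h"
    return "v"
```

For a call shaped like `sns.barplot(x="tip", y="day", data=tips)`, `x` holds floats and `y` holds day names, so the sketch returns `"h"`.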

Additionally, factorplot can use any of the categorical plot kinds:

sns.factorplot(x="age", y="embark_town", hue="sex", row="class",
               data=titanic[titanic.embark_town.notnull()],
               size=2, aspect=3.5, palette="Set3",
               kind="violin", split=True, cut=0, bw=.2)

[Figure: seaborn-factorplot-6]
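Letting factorplot use any categorical plot kind can be sketched as a lookup from the `kind` string to a plotting function, with leftover keyword arguments (such as `split`, `cut`, and `bw` above) forwarded untouched. The stand-in functions below only return what they receive; they are placeholders rather than seaborn functions, and `factorplot_sketch` and `PLOT_KINDS` are hypothetical names.

```python
def _recorder(name):
    """Make a stand-in plotting function that just returns its arguments."""
    def plot(**kwargs):
        return name, kwargs
    return plot


# Hypothetical registry mapping each `kind` string to a plotting function
PLOT_KINDS = {
    "point": _recorder("pointplot"),
    "bar": _recorder("barplot"),
    "count": _recorder("countplot"),
    "box": _recorder("boxplot"),
    "violin": _recorder("violinplot"),
    "strip": _recorder("stripplot"),
}


def factorplot_sketch(x=None, y=None, hue=None, row=None, col=None,
                      kind="point", **kwargs):
    """Dispatch to the function registered for `kind` (sketch only).

    In a real implementation, `row` and `col` would set up the facet grid
    and the function would be called once per facet; here they are ignored.
    Everything left in kwargs is forwarded unchanged.
    """
    if kind not in PLOT_KINDS:
        raise ValueError(f"Invalid kind {kind!r}; choose from {sorted(PLOT_KINDS)}")
    plot_func = PLOT_KINDS[kind]
    return plot_func(x=x, y=y, hue=hue, **kwargs)
```

The design choice this illustrates is that the faceting layer never needs to know about violin-specific options: adding a new kind means adding one registry entry.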

API changes and new features

  • The major API change in {bar,point,factor}plot is that the x_order parameter has been renamed to order, reflecting the fact that the categorical variable can be on either the x or the y axis.
  • The ability to draw counts by passing only an x variable to barplot has been removed, but the new countplot function has been added to retain this functionality.
  • In all of the categorical plots, the representation and order of categorical levels is inferred from the data, using pandas Categorical information if it exists. Otherwise, the default is now to use the order in which levels appear in the dataframe rather than sorting them. This behavior has not yet been changed in FacetGrid; that will likely be a separate PR.
  • Two parameters, hline and dropna, have been removed from the estimation plots (with attempts to catch them and warn).
  • This PR adds a compatibility layer for some of the API breaks in #410 (Overhaul of categorical distribution plots).
  • Added the scale parameter to pointplot, which scales the point size and linewidths by a single factor.
  • Added the errcolor parameter to barplot; extra **kwargs are now also passed down to plt.bar.
  • Added the ax property to FacetGrid, which will give access to the main Axes in the case where there are no row or col facets.
  • Added a more informative exception when a variable is named by a string that doesn't appear in data.
  • Fixed how the proxy artist used for generating the violinplot and boxplot legends is drawn to avoid errors when the first category has no data for a hue level.
  • There are longer narrative explanations in the docstrings for all of the categorical plotting functions, along with copious examples that will be executed and shown in the online API docs.
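A compatibility layer of the kind described in the list above can be sketched as a small function that rewrites old keyword arguments before any plotting happens: the renamed x_order becomes order, and the removed hline and dropna are dropped with a warning. The function name `translate_old_kwargs`, the warning category, and the messages are assumptions for illustration, not the code in this PR.

```python
import warnings

# Hypothetical tables of the API changes listed above
RENAMED = {"x_order": "order"}
REMOVED = ("hline", "dropna")


def translate_old_kwargs(kwargs):
    """Return a copy of kwargs with old parameter names handled (sketch)."""
    kwargs = dict(kwargs)
    for old, new in RENAMED.items():
        if old in kwargs:
            value = kwargs.pop(old)
            warnings.warn(f"`{old}` has been renamed to `{new}`",
                          UserWarning, stacklevel=2)
            # Only fill in the new name if the caller did not also pass it
            kwargs.setdefault(new, value)
    for name in REMOVED:
        if name in kwargs:
            kwargs.pop(name)
            warnings.warn(f"`{name}` is no longer supported and was ignored",
                          UserWarning, stacklevel=2)
    return kwargs
```

Calling `translate_old_kwargs({"x_order": ["Thur", "Fri"], "hline": 0, "ci": 95})` returns `{"order": ["Thur", "Fri"], "ci": 95}` and emits two warnings, so old scripts keep running while pointing users at the new names.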

Linked issues:

This PR addresses several issues: #361, #416, #425, #435, #445, #448

Outstanding issues

  • Update the release notes
  • Update and reorganize the API docs homepage
  • countplot needs some docstring examples
  • Update the tutorials (may get punted on)

mwaskom (Owner, Author) commented Mar 9, 2015

It would be great if people could pull this branch and throw all their weird, real-life data at it to check corner cases I may have missed with the test data/my own generally pretty clean experimental data.

phobson (Contributor) commented Mar 9, 2015

Just want to say that this + the fact that factorplot takes row|col|hue arguments is huge.

phobson (Contributor) commented Mar 9, 2015

What's the timeline for this? I can maybe take a crack at the tutorials (separate branch & PR) while testing. But that'll definitely have to wait until later in the week, spilling over into next week.

mwaskom (Owner, Author) commented Mar 9, 2015

The tutorial changes I have in mind are broader than just this PR -- now that I have an infrastructure for examples in the API docs, I want to move much of the nitty-gritty stuff there and make the tutorials more high-level and integrative. But I don't quite have a vision for what that will look like yet, so I'm not gonna hold up this PR for it.

mwaskom (Owner, Author) commented Mar 12, 2015

Something I am considering is changing hue= in the categorical plots to by=, or possibly something different, to better differentiate it from what FacetGrid is doing with hue, which in almost all cases (except pointplot) is different.

This is a mess but I don't have time to finish it at the moment.
See matplotlib/matplotlib#4162 for details on a bug in matplotlib on Python 3.4 that motivates this.
    This will include levels that appear in the `category` list, but that
    do not appear in the data.

    See #361

    """
    if order is None:
        if hasattr(values, "categories"):


Just for reference: this only supports pandas >= 0.15; in earlier versions, Categorical.levels has to be used...
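The ordering rule from the PR description (use pandas Categorical levels when present, including levels with no data, otherwise the order of first appearance) and the reviewer's note about older pandas can be combined in a sketch that duck-types its input: it looks for `categories`, then `cat.categories` (a Series with categorical dtype), then the pre-0.15 `levels` attribute. The function name `categorical_order_sketch` is hypothetical, and skipping `None`/NaN in the fallback is an assumption.

```python
import math


def categorical_order_sketch(values, order=None):
    """Return the list of categorical levels to plot, in order (sketch)."""
    if order is not None:
        # An explicit order from the user always wins
        return list(order)
    # pandas >= 0.15: Categorical.categories (includes unused levels)
    if hasattr(values, "categories"):
        return list(values.categories)
    # A Series with categorical dtype exposes categories via the .cat accessor;
    # getattr's default also covers non-categorical Series, whose .cat raises
    cat = getattr(values, "cat", None)
    if cat is not None and hasattr(cat, "categories"):
        return list(cat.categories)
    # pandas < 0.15, per the review comment: Categorical.levels
    if hasattr(values, "levels"):
        return list(values.levels)
    # Fallback: order of first appearance, skipping missing values
    seen = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue
        if v not in seen:
            seen.append(v)
    return seen
```

With pandas installed, `categorical_order_sketch(pd.Categorical(["b", "a"], categories=["c", "b", "a"]))` would return `["c", "b", "a"]`, keeping the unused level `"c"` as the docstring above describes, while a plain list such as `["Sun", "Sat", "Sun"]` gives `["Sun", "Sat"]`.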

This pull request was closed.
4 participants