Coloring scatter plots by factor/string #11422

Open
kasuteru opened this issue Jun 12, 2018 · 10 comments

Comments

@kasuteru

Bug report

This may be either a bug report or a feature request, depending on how you view it. The issue is that you cannot pass a factor/string column of a pandas DataFrame as the color argument without awkward workarounds.

Code for reproduction

import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame(data={"x":[1,2,3,2,4], "y":[2,1,2,3,1], "z":["a", "b", "b", "a", "c"]})

plt.scatter(x="x", y="y", data=data)  # This works nicely and as expected
plt.scatter(x="x", y="y", c="z", data=data) # This throws the following error:

Actual outcome

ValueError: c of shape (5,) not acceptable as a color sequence for x with size 5, y with size 5

Expected outcome
An image where each point is colored according to its value in "z", cycling through a predefined color map. A shining example of how easy such a commonly used visualization should be is ggplot, see the code in this post: The color option automatically notices that this is a factor variable and assigns each level a color. Also, a legend is automatically shown.

Matplotlib version
'2.1.0'

@kasuteru
Author

After checking the documentation again, this is definitely a feature request. Judging by how many posts on Stack Overflow are about exactly this problem, I think this is something that does not work the way users expect. Intuition (and implementations in R and, for example, seaborn) suggests that matplotlib should automatically deduce what the user wants in this case instead of throwing an error.

@jklymak
Member

jklymak commented Jun 12, 2018

OK, if you do data={"x":[1,2,3,2,4], "y":[2,1,2,3,1], "z":[5, 4, 3, 2, 1]} this works, right? We don't have categorical data for the z axis.

@kasuteru
Author

kasuteru commented Jun 12, 2018 via email

@kasuteru
Author

kasuteru commented Jun 12, 2018 via email

@jklymak
Member

jklymak commented Jun 12, 2018

Yes, right now categorical axes work by converting the string "units" to numbers. But we don't have the equivalent unit conversion machinery in place for z-data. I think one of the reasons seaborn is around is to scratch that itch. We are thinking about how to do it, led by @story645, but it's not trivial.
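
To illustrate the string-to-number conversion on axes mentioned above, here is a minimal sketch (not taken from the thread) that passes strings as x values and then inspects where the ticks ended up. It assumes a reasonably recent Matplotlib with categorical axis support; the `Agg` backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only so this runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Strings are accepted for x because the axis converts each distinct
# category to a numeric position behind the scenes.
ax.scatter(["a", "b", "b", "a", "c"], [2, 1, 2, 3, 1])
fig.canvas.draw()  # make sure tick labels are populated

# Numeric positions the categories were mapped to, and their labels.
tick_positions = list(ax.get_xticks())
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_positions, tick_labels)
```

The same conversion does not happen for `c`, which is why the call in the original report fails.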

@jklymak jklymak added this to the v3.1 milestone Jun 12, 2018
@ImportanceOfBeingErnest
Member

This is how I would currently do it:

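import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.DataFrame(data={"x":[1,2,3,2,4], "y":[2,1,2,3,1], "z":["a", "b", "b", "a", "c"]})
# Map each distinct string in z to an integer code; labels holds the sorted unique strings
labels, data["znum"] = np.unique(data.z, return_inverse=True)

scatter = plt.scatter(x="x", y="y", c="znum", data=data)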

Concerning the legend, there is a proposal, #11127, still under review, to simplify legend creation. If it passes in its currently proposed form, a legend could be created simply with

plt.legend(scatter.legend_elements()[0],labels)

However, as of now you would need to create the legend manually, e.g. like

handle = lambda c : plt.Line2D([0],[0], ls="", marker="o", color=scatter.cmap(scatter.norm(c)))
handles = [handle(c) for c in np.unique(data.znum)]
plt.legend(handles, labels)
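
Put together, the workaround above can be sketched as one self-contained script. This is only an illustration under a few assumptions: the `Agg` backend is used so it runs without a display, the column is converted with `to_numpy(dtype=str)` before calling `np.unique`, and the names `codes`, `fig`, `ax` and `legend` are introduced here and do not appear in the thread.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.DataFrame(
    {"x": [1, 2, 3, 2, 4], "y": [2, 1, 2, 3, 1], "z": ["a", "b", "b", "a", "c"]}
)

# Map each distinct string to an integer code; labels holds the sorted
# unique strings, codes the index of each row's string within labels.
labels, codes = np.unique(data["z"].to_numpy(dtype=str), return_inverse=True)
data["znum"] = codes

fig, ax = plt.subplots()
scatter = ax.scatter("x", "y", c="znum", data=data)

# One proxy handle per category, colored the same way the scatter
# colors that integer code (norm first, then colormap lookup).
handles = [
    plt.Line2D([0], [0], ls="", marker="o", color=scatter.cmap(scatter.norm(code)))
    for code in range(len(labels))
]
legend = ax.legend(handles, labels.tolist())
```

Building the handles from `scatter.cmap` and `scatter.norm` keeps the legend colors in sync with the points even if a different colormap is passed to `scatter`.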

@ImportanceOfBeingErnest
Member

Xref: This request is also very similar to the request in #6214.

@kasuteru
Author

@ImportanceOfBeingErnest : That indeed seems to be the same issue, with broader scope.

@jklymak jklymak modified the milestones: v3.1.0, v3.2.0 Feb 4, 2019
@timhoffm timhoffm modified the milestones: v3.2.0, unassigned Aug 11, 2019
@story645 story645 modified the milestones: unassigned, needs sorting Oct 6, 2022
@github-actions

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label Oct 11, 2023
@story645 story645 added topic: color/color & colormaps and removed status: inactive Marked by the “Stale” Github Action labels Oct 11, 2023
@story645
Member

story645 commented Oct 11, 2023

I think this is still wanted, and one way to tackle it is to try to make NoNorm unit-aware (#7383).
