Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Categorical: Unsorted, String only, fix overwrite bug #9783

Merged
merged 6 commits into from Feb 11, 2018

Conversation

story645
Copy link
Member

Alternative to #9774 and supersedes #9340 that preserves more of the existing behavior. Open discussion on whether that should be preserved, but closes out #9350, #9312, #9336 .

import numpy as np
dog = ["bored", "happy","happy","bored", "bored","happy"]
cat = ["bored", "bored","bored","happy","bored","bored"]
time = ["combing","feeding","drinking","napping","washing","playing"]

fig, ax = plt.subplots()
ax.plot(time, dog, label="dog")
ax.plot(time, cat, label="cat")
ax.legend()

index

And it's supporting missing values/nans in scatter but not in plot (in plot it's treating them as a categorical). My guess is this is tied into #9713 (plot doesn't behave like the other functions).

[1, 1, -1, 2, 100, 0, 200])]
ids = ["unicode", "single", "basic", "mixed"]

test_cases = {"unicode": ("Здравствуйте мир",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be an ordered dict so that they are always in the same order so that pytest-xdist works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda surprised since I was always taught that tests should be independent of each other/order but ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are, but xdist imports the tests in both processes and then sorts out who does what (instead of trying to pickle and ship tests across the wire). One of it's start up checks is that it finds exactly the same tests in both processes (to make sure nothing funny is happening). In 3.5 the order is random which is fine from a testing point of view but not fine from xdist counting it's fingers and toes point of view.

@tacaswell
Copy link
Member

please leave for me to merge

@tacaswell tacaswell mentioned this pull request Nov 20, 2017
6 tasks
Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks okay, but the public facing methods could do with some short docstrings (which would make reviewing easier too!)

def __init__(self, data):
"""Create mapping between unique categorical values
and numerical identifier

Parameters
----------
data: iterable
sequence of values
sequence of values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sequence can contain anything including mixed types right? Might be worth adding a note here if that's true.


@staticmethod
def axisinfo(unit, axis):
majloc = StrCategoryLocator(axis.unit_data.locs)
majfmt = StrCategoryFormatter(axis.unit_data.seq)
majloc = StrCategoryLocator(axis.unit_data._locs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this get a short docstring too?

data = [d.encode('utf-8', 'ignore').decode('utf-8')
for d in data]
return np.array(data, dtype=np.unicode)
def to_str(value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short docstring here too please!

@tacaswell tacaswell modified the milestones: v2.1.1, v2.2 Dec 6, 2017
@tacaswell tacaswell added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Feb 5, 2018
@story645
Copy link
Member Author

story645 commented Feb 6, 2018

@dstansby, plan is to address your comments, just wanna run through this the CI first.

@story645 story645 changed the title Categorical: Unsorted, Mixed Type, Support NaN on Scatter, fix overwrite bug Categorical: Unsorted, String only, fix overwrite bug Feb 6, 2018
.appveyor.yml Outdated
@@ -66,7 +66,7 @@ install:
- activate test-environment
- echo %PYTHON_VERSION% %TARGET_ARCH%
# pytest-cov>=2.3.1 due to https://github.com/pytest-dev/pytest-cov/issues/124
- pip install -q "pytest!=3.3.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist
- pip install -q "pytest=3.4.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should >= this so we aren't stuck on 3.4 for ever.... Not sure why Appveryor isn't getting 3.4 anyways.

@story645
Copy link
Member Author

story645 commented Feb 6, 2018

Pretty sure this PR mostly ended up same as #10212, but here the failing tests are marked as xfail. But this PR also cleans up/consolidates the tests in #10212 (lots of overlap between me and @anntzer ). As for the failing tests, they're due to inconsistencies in unit implementation between scatter, bar, and plot, and that seems to be a fix better suited for a standalone PR. (And an audit of how conversion is done in all the plotting functions).
cc: @jklymak @tacaswell

@jklymak
Copy link
Member

jklymak commented Feb 6, 2018

xref #9713

@tacaswell
Copy link
Member

Can you also deprecated unit_data? There was already a instance attribute for holding that data (self.units) that we missed when doing the initial implemenation

https://github.com/matplotlib/matplotlib/blob/0ec46336fd142313fa12de936e002e94e2a0dce5/lib/matplotlib/axis.py#L799-L820

if pos is None:
return ""
r_mapping = {v: to_str(k) for k, v in self._unit_data._mapping.items()}
return r_mapping.get(int(x), '')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to use int(np.round(x)) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah...probably

@@ -1,257 +1,273 @@
# -*- coding: utf-8 -*-
# -*- coding: utf-8 -*-A
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra 'A'


def __init__(self, data):
def __init__(self, data=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to remove passing in data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the way that the mapping is updated below I think you have to not take input (as the counter is just incremented) so it will step on user values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐑 I am 😪 and can not read. Carry on this is 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we needed that? In this case, it's UnitData(['a', 'b', 'c']). Granted, I dunno if it ever gets called with data, or if it's always an indirect via an update...

@@ -66,7 +66,7 @@ install:
- activate test-environment
- echo %PYTHON_VERSION% %TARGET_ARCH%
# pytest-cov>=2.3.1 due to https://github.com/pytest-dev/pytest-cov/issues/124
- pip install -q "pytest!=3.3.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried about requiring the newest version of pytest? Are there new features we need or just bug fixes and this was the easiest way to make sure we got them?

Copy link
Member Author

@story645 story645 Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is what I needed to get pytest.param to work, which I used to mark the individual failing tests. I can do a rewrite to get around that, it just makes the tests even clunkier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets see if we get push back from the packagers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mark parametrized tests as xfail without using pytest.param all over the place. For example, needs_usetex used in this parameter is really an xfail.

So I'm not so sure you need to bump requirements here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to go back to an old version of the docs, e.g. 3.0.0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Then will do that and undo the travis/appveyor changes.

Copy link
Member Author

@story645 story645 Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, doesn't work/fails spectacularly when I try. Technically params was introduced in 3.2, but I dunno how to say version >=3.2 and !=3.3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd that xfail doesn't seem to catch did-not-raise errors, but Fedora has 3.2, so if that's all you need, we'd probably be fine with that. Debian other-than-stable is probably okay too.

Copy link
Member Author

@story645 story645 Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solution for the how to specify problem: pytest!=3.3.0, >=3.20 and travis didn't break this time. 🤞 for appveyor.

@tacaswell
Copy link
Member

I am overall 👍 on this. It fixes several real bugs, and does not make anything worse (plot and bar still take mixed types, but we now have xfail tests to fix).

@story645 can you squash this down to fewer commits / get rid of the merges?

@story645
Copy link
Member Author

story645 commented Feb 8, 2018

@tacaswell tried but I'm doing the "verified" merges via the website and have now broken my repo like twice...

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really needs a couple of paragraphs at the top explaining what is going on, and helping future folks follow the code. I get it now, but it took me a while to understand that the str2num mapping is being put in axis.units. I don't know if we want that paragraph to be in the public API or not, but something would sure help.

def unit_data(self, unit_data):
self._unit_data = unit_data
self.set_units = unit_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this. set_units is a method on the axis class. What is unit_data?

(bytes, six.text_type, np.str_, np.bytes_)))


def to_str(value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be private?

value: string or iterable
value or list of values to be converted
unit: None
units to use for an axis with string data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit doesn't do anything, so the doc string shouldn't make it sound like it will...

# default_units->axis_info->convert
if axis.unit_data is None:
axis.unit_data = UnitData(data)
if axis.units is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this needs some explanation, somewhere. If I use the jpl toy example, ax.yaxis.units returns something like 'meters'. dates.py returns the timezone (if set). Here, you load axis.units up with the data map. I guess thats OK. But its a little mysterious. Some description of why would be appreciated. Is this really the only place we can carry that map around?

Second, the name UnitData doesn't help me know whats going on. I'd consider changing this name to CategoricalUnitsMap or something that makes it clear its categorical that is involved and that its a map. If I want to query an axis as to its units, axis.units is a useful place to look, and UnitData doesn't quite convey that (though it will usually be <matplotlib.category.UnitData object at 0x11eda2d68>, so take as you will.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that 'units' is a place to stash what ever the handler feels like stashing there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although it could do with documenting, I think units is a place for the converter to store any variables that affect how it does the conversion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see thats how its being used. But the naive user (me as of a couple of months ago) would have a though time understanding that, and might think that this property of the axis class, namedaxis.units, might actually be the units of the axis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the initial implementation of categorical we missed this and added unit_data (which is new being deprecated) to stash the mapping.

Copy link
Member Author

@story645 story645 Feb 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the wrinkle to this is that it'd probably be useful to store the sort of sorting info the jpl folks do in their implemention of StrConvertor but that's probably a refactor away. And just another attribute on the unit object...

@tacaswell
Copy link
Member

I took the liberty of pushing commits to address the latest round of comments.


import unittest

def _to_str(value):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of having two versions of the exact same function. Can the _to_str inside the StrFormatter of call be pulled out into a private class method of StrFormatter so that it can be called in the same say convert can be called?



class StrCategoryConverter(units.ConversionInterface):
@staticmethod
def convert(value, unit, axis):
"""Uses axis.unit_data map to encode
data as floats
"""Uses axis.units to encode string data as floats
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc string isn't correct in this implementation. It uses unit to encode the string.

value : string or iterable
value or list of values to be converted
unit : :class:`.UnitData`
units to use for an axis with string data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest "UnitData contains map between category strings and floats.

"""
Parameters
----------
units: dict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't right either, is it? Its a UnitData object as well, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, updating this now

Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one tiny change, but this looks 👍 overall to me

@@ -0,0 +1,10 @@
Deprecated `Axis.unt_data`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unt_data --> unit_data

@tacaswell
Copy link
Member

fixed the typo @dstansby found.

@dstansby dstansby dismissed their stale review February 11, 2018 15:19

Changes made

@jklymak
Copy link
Member

jklymak commented Feb 11, 2018

Took the liberty of pushing a bit more docs for the preamble. I think this is an improvement over the previous version, and basically think it can be merged. There are still going to be some confused folks, but at least the rules are clearer now.

@jklymak
Copy link
Member

jklymak commented Feb 11, 2018

Won't merge w/o someone else checking the pre-amble change I made.

@tacaswell tacaswell merged commit ace9663 into matplotlib:master Feb 11, 2018
@tacaswell
Copy link
Member

Did not wait for CI, as the last commit is docs-only.

@dstansby
Copy link
Member

🎉 thanks a lot @story645 for putting up with all our reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. topic: categorical
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants