Categorical: Unsorted, String only, fix overwrite bug #9783

story645 · 2017-11-14T20:35:04Z

Alternative to #9774; supersedes #9340 while preserving more of the existing behavior. Whether that behavior should be preserved is open for discussion, but this closes out #9350, #9312, and #9336.

import matplotlib.pyplot as plt
import numpy as np

dog = ["bored", "happy", "happy", "bored", "bored", "happy"]
cat = ["bored", "bored", "bored", "happy", "bored", "bored"]
time = ["combing", "feeding", "drinking", "napping", "washing", "playing"]

fig, ax = plt.subplots()
ax.plot(time, dog, label="dog")
ax.plot(time, cat, label="cat")
ax.legend()

And it's supporting missing values/nans in scatter but not in plot (in plot it's treating them as a categorical). My guess is this is tied into #9713 (plot doesn't behave like the other functions).

tacaswell · 2017-11-15T00:42:10Z

lib/matplotlib/tests/test_category.py

-                 [1, 1, -1, 2, 100, 0, 200])]
-    ids = ["unicode", "single", "basic", "mixed"]
+
+    test_cases = {"unicode": ("Здравствуйте мир",


This needs to be an ordered dict so that they are always in the same order so that pytest-xdist works.

Kinda surprised since I was always taught that tests should be independent of each other/order but ok.

They are, but xdist imports the tests in both processes and then sorts out who does what (instead of trying to pickle and ship tests across the wire). One of its start-up checks is that it finds exactly the same tests in both processes (to make sure nothing funny is happening). In Python 3.5 the dict iteration order is random, which is fine from a testing point of view but not fine from xdist's counting-its-fingers-and-toes point of view.
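The ordering concern can be sketched with the standard library alone. This is a minimal illustration, not the PR's actual test code: the case names and values below are made up. An OrderedDict yields its keys and values in insertion order on every interpreter, so each xdist worker would collect the parametrized cases in the same sequence.

```python
from collections import OrderedDict

# Hypothetical test cases; the names and values are illustrative only.
test_cases = OrderedDict([
    ("single", (["hello world"], [0])),
    ("basic", (["a", "b", "c"], [0, 1, 2])),
    ("repeated", (["a", "b", "a"], [0, 1, 0])),
])

# Split into parallel sequences; insertion order is preserved.
ids, cases = zip(*test_cases.items())

print(ids)  # ('single', 'basic', 'repeated')
```

The two sequences could then be handed to `pytest.mark.parametrize("data, locs", cases, ids=ids)`, where the argument names `data` and `locs` are again placeholders.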

tacaswell · 2017-11-16T00:32:21Z

please leave for me to merge

dstansby

I think this looks okay, but the public facing methods could do with some short docstrings (which would make reviewing easier too!)

dstansby · 2017-12-03T12:59:13Z

lib/matplotlib/category.py

    def __init__(self, data):
        """Create mapping between unique categorical values
        and numerical identifier

        Parameters
        ----------
        data: iterable
-            sequence of values
+              sequence of values


The sequence can contain anything including mixed types right? Might be worth adding a note here if that's true.

dstansby · 2017-12-03T13:00:25Z

lib/matplotlib/category.py


    @staticmethod
    def axisinfo(unit, axis):
-        majloc = StrCategoryLocator(axis.unit_data.locs)
-        majfmt = StrCategoryFormatter(axis.unit_data.seq)
+        majloc = StrCategoryLocator(axis.unit_data._locs)


Can this get a short docstring too?

dstansby · 2017-12-03T13:00:47Z

lib/matplotlib/category.py

-            data = [d.encode('utf-8', 'ignore').decode('utf-8')
-                    for d in data]
-        return np.array(data, dtype=np.unicode)
+def to_str(value):


Short docstring here too please!

story645 · 2018-02-06T17:54:51Z

@dstansby, the plan is to address your comments; I just wanna run this through the CI first.

jklymak · 2018-02-06T19:13:52Z

.appveyor.yml

@@ -66,7 +66,7 @@ install:
  - activate test-environment
  - echo %PYTHON_VERSION% %TARGET_ARCH%
  # pytest-cov>=2.3.1 due to https://github.com/pytest-dev/pytest-cov/issues/124
-  - pip install -q "pytest!=3.3.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist
+  - pip install -q "pytest=3.4.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist


I think you should use >= here so we aren't stuck on 3.4 forever. Not sure why AppVeyor isn't getting 3.4 anyway.

Old review

story645 · 2018-02-06T23:10:20Z

Pretty sure this PR mostly ended up the same as #10212, but here the failing tests are marked as xfail. This PR also cleans up/consolidates the tests in #10212 (lots of overlap between me and @anntzer). As for the failing tests, they're due to inconsistencies in the unit implementation between scatter, bar, and plot, and that seems to be a fix better suited to a standalone PR (and an audit of how conversion is done in all the plotting functions).
cc: @jklymak @tacaswell

jklymak · 2018-02-06T23:11:33Z

xref #9713

tacaswell · 2018-02-08T01:24:18Z

Can you also deprecate unit_data? There was already an instance attribute for holding that data (self.units) that we missed when doing the initial implementation.

https://github.com/matplotlib/matplotlib/blob/0ec46336fd142313fa12de936e002e94e2a0dce5/lib/matplotlib/axis.py#L799-L820

tacaswell · 2018-02-08T01:21:50Z

lib/matplotlib/category.py

+        if pos is None:
+            return ""
+        r_mapping = {v: to_str(k) for k, v in self._unit_data._mapping.items()}
+        return r_mapping.get(int(x), '')


Probably better to use int(np.round(x)) ?

Oh, yeah...probably
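The difference is easy to see with a tick position that carries a little floating-point error. The sketch below uses a made-up reverse mapping and the built-in `round` as a stand-in for the `np.round` suggested above; it is not the formatter code from the PR.

```python
# Hypothetical reverse mapping from tick position to label.
r_mapping = {0: "a", 1: "b", 2: "c"}

# A tick position that should be 2 but carries floating-point error.
x = 1.9999999999

truncated = r_mapping.get(int(x), "")        # int() truncates toward zero
rounded = r_mapping.get(int(round(x)), "")   # round first, then convert

print(truncated)  # b
print(rounded)    # c
```

Truncation silently picks the neighbouring category, while rounding first recovers the intended one.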

tacaswell · 2018-02-08T01:29:32Z

lib/matplotlib/tests/test_category.py

@@ -1,257 +1,273 @@
-# -*- coding: utf-8 -*-
+# -*- coding: utf-8 -*-A


tacaswell · 2018-02-08T02:47:08Z

lib/matplotlib/category.py


-    def __init__(self, data):
+    def __init__(self, data=None):


Do you want to remove passing in data?

Given the way that the mapping is updated below I think you have to not take input (as the counter is just incremented) so it will step on user values.

🐑 I am 😪 and can not read. Carry on this is 👍

I thought we needed that? In this case, it's UnitData(['a', 'b', 'c']). Granted, I dunno if it ever gets called with data, or if it's always an indirect via an update...
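For readers following the exchange, here is a minimal sketch of a counter-based mapping of the kind being discussed. `CategoryMap` is a hypothetical stand-in written for this note, not the `UnitData` class from the PR; it only assumes what the thread states, namely that new values are assigned by incrementing a counter and that data may be passed at construction or through a later update.

```python
from collections import OrderedDict
import itertools


class CategoryMap:
    """Hypothetical stand-in for the mapping discussed above."""

    def __init__(self, data=None):
        self._mapping = OrderedDict()
        self._counter = itertools.count()
        if data is not None:
            self.update(data)

    def update(self, data):
        # Only unseen values get a new integer; existing ones keep theirs.
        for value in data:
            if value not in self._mapping:
                self._mapping[value] = next(self._counter)


units = CategoryMap(["c", "a", "b"])   # order of first appearance, unsorted
units.update(["a", "d"])               # "a" is not overwritten

print(list(units._mapping.items()))
# [('c', 0), ('a', 1), ('b', 2), ('d', 3)]
```

Because the constructor simply delegates to `update`, passing data up front and adding it later end up in the same state, which is why accepting `data=None` does not step on existing values in this sketch.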

tacaswell · 2018-02-08T03:01:07Z

.appveyor.yml

@@ -66,7 +66,7 @@ install:
  - activate test-environment
  - echo %PYTHON_VERSION% %TARGET_ARCH%
  # pytest-cov>=2.3.1 due to https://github.com/pytest-dev/pytest-cov/issues/124
-  - pip install -q "pytest!=3.3.0" "pytest-cov>=2.3.1" pytest-rerunfailures pytest-timeout pytest-xdist


I am a bit worried about requiring the newest version of pytest? Are there new features we need or just bug fixes and this was the easiest way to make sure we got them?

I think this is what I needed to get pytest.param to work, which I used to mark the individual failing tests. I can do a rewrite to get around that, it just makes the tests even clunkier.

This is where it's from: https://github.com/matplotlib/matplotlib/pull/9783/files#diff-f36a7b45d6a24734ba38d2da8f52f138R257

Let's see if we get push back from the packagers.

We mark parametrized tests as xfail without using pytest.param all over the place. For example, needs_usetex used in this parameter is really an xfail.

So I'm not so sure you need to bump requirements here.

You have to go back to an old version of the docs, e.g. 3.0.0.

Thanks! Then will do that and undo the travis/appveyor changes.

Ugh, doesn't work/fails spectacularly when I try. Technically params was introduced in 3.2, but I dunno how to say version >=3.2 and !=3.3

Odd that xfail doesn't seem to catch did-not-raise errors, but Fedora has 3.2, so if that's all you need, we'd probably be fine with that. Debian other-than-stable is probably okay too.

Solution for the how-to-specify problem: pytest!=3.3.0,>=3.2.0, and Travis didn't break this time. 🤞 for AppVeyor.
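For context, this is roughly what marking a single parametrized case as an expected failure with `pytest.param` looks like. The converter, the cases, and the test name are hypothetical and are not taken from test_category.py; the sketch only assumes a pytest version that provides `pytest.param`.

```python
import pytest


def to_number(value, mapping):
    """Hypothetical converter: look up a string category in a mapping."""
    return mapping[value]


CASES = [
    pytest.param("a", 0),
    pytest.param("b", 1),
    # Known failure, marked individually rather than for the whole test.
    pytest.param("missing", 2,
                 marks=pytest.mark.xfail(raises=KeyError, strict=True)),
]


@pytest.mark.parametrize("value, expected", CASES)
def test_to_number(value, expected):
    assert to_number(value, {"a": 0, "b": 1}) == expected
```

With `strict=True`, the marked case is reported as a failure if it ever starts passing, which is a convenient reminder to remove the mark once the underlying inconsistency is fixed.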

tacaswell · 2018-02-08T03:48:29Z

I am overall 👍 on this. It fixes several real bugs, and does not make anything worse (plot and bar still take mixed types, but we now have xfail tests to fix).

@story645 can you squash this down to fewer commits / get rid of the merges?

story645 · 2018-02-08T05:09:42Z

@tacaswell tried but I'm doing the "verified" merges via the website and have now broken my repo like twice...

jklymak

This really needs a couple of paragraphs at the top explaining what is going on, and helping future folks follow the code. I get it now, but it took me a while to understand that the str2num mapping is being put in axis.units. I don't know if we want that paragraph to be in the public API or not, but something would sure help.

jklymak · 2018-02-09T22:39:18Z

lib/matplotlib/axis.py

    def unit_data(self, unit_data):
-        self._unit_data = unit_data
+        self.set_units = unit_data


I'm confused by this. set_units is a method on the axis class. What is unit_data?

jklymak · 2018-02-09T22:42:29Z

lib/matplotlib/category.py

+                        (bytes, six.text_type, np.str_, np.bytes_)))
+
+
+def to_str(value):


Should this be private?

jklymak · 2018-02-09T23:05:46Z

lib/matplotlib/category.py

+        value: string or iterable
+            value or list of values to be converted
+        unit: None
+            units to use for an axis with string data


unit doesn't do anything, so the doc string shouldn't make it sound like it will...

jklymak · 2018-02-09T23:19:14Z

lib/matplotlib/category.py

        # default_units->axis_info->convert
-        if axis.unit_data is None:
-            axis.unit_data = UnitData(data)
+        if axis.units is None:


OK, this needs some explanation, somewhere. If I use the jpl toy example, ax.yaxis.units returns something like 'meters'. dates.py returns the timezone (if set). Here, you load axis.units up with the data map. I guess that's OK. But it's a little mysterious. Some description of why would be appreciated. Is this really the only place we can carry that map around?

Second, the name UnitData doesn't help me know what's going on. I'd consider changing this name to CategoricalUnitsMap or something that makes it clear it's categorical that is involved and that it's a map. If I want to query an axis as to its units, axis.units is a useful place to look, and UnitData doesn't quite convey that (though it will usually be <matplotlib.category.UnitData object at 0x11eda2d68>, so take as you will).

My understanding is that 'units' is a place to stash whatever the handler feels like stashing there.

Yes, although it could do with documenting, I think units is a place for the converter to store any variables that affect how it does the conversion.

Yes, I see that's how it's being used. But the naive user (me as of a couple of months ago) would have a tough time understanding that, and might think that this property of the axis class, named axis.units, might actually be the units of the axis.

With the initial implementation of categorical we missed this and added unit_data (which is now being deprecated) to stash the mapping.

The wrinkle to this is that it'd probably be useful to store the sort of sorting info the jpl folks do in their implementation of StrConvertor, but that's probably a refactor away. And just another attribute on the unit object...
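To make the point about axis.units concrete, the following sketch plots string data and then inspects what ends up on the axis. It assumes a Matplotlib release that includes this PR; `_mapping` is a private attribute and is read here only for illustration, so it may change without notice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(["combing", "feeding", "napping"], [1, 3, 2])

# After plotting strings, the x-axis carries the category map as its units.
units = ax.xaxis.units
print(type(units).__name__)

# _mapping is a private attribute, inspected here only for illustration.
print(list(units._mapping.items()))
```

So for categorical data, axis.units is not a unit label such as 'meters' but the object the converter uses to translate strings into positions.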

tacaswell · 2018-02-10T21:25:53Z

I took the liberty of pushing commits to address the latest round of comments.

story645 · 2018-02-11T00:45:17Z

lib/matplotlib/tests/test_category.py


-import unittest
+
+def _to_str(value):


Not a fan of having two versions of the exact same function. Can the _to_str inside StrFormatter's __call__ be pulled out into a private class method of StrFormatter so that it can be called in the same way convert can be called?

jklymak · 2018-02-11T01:05:09Z

lib/matplotlib/category.py



 class StrCategoryConverter(units.ConversionInterface):
    @staticmethod
    def convert(value, unit, axis):
-        """Uses axis.unit_data map to encode
-        data as floats
+        """Uses axis.units to encode string data as floats


This doc string isn't correct in this implementation. It uses unit to encode the string.

jklymak · 2018-02-11T01:06:22Z

lib/matplotlib/category.py

+        value : string or iterable
+            value or list of values to be converted
+        unit : :class:`.UnitData`
+            units to use for an axis with string data


Suggest "UnitData contains map between category strings and floats."

jklymak · 2018-02-11T01:08:02Z

lib/matplotlib/category.py

+        """
+        Parameters
+        ----------
+        units: dict


This isn't right either, is it? It's a UnitData object as well, right?

yeah, updating this now

dstansby

Just one tiny change, but this looks 👍 overall to me

dstansby · 2018-02-11T10:58:26Z

doc/api/next_api_changes/2018-02-10-HA.rst

@@ -0,0 +1,10 @@
+Deprecated `Axis.unt_data`


unt_data --> unit_data

tacaswell · 2018-02-11T15:04:48Z

fixed the typo @dstansby found.

Changes made

jklymak · 2018-02-11T21:42:57Z

Took the liberty of pushing a bit more docs for the preamble. I think this is an improvement over the previous version, and basically think it can be merged. There are still going to be some confused folks, but at least the rules are clearer now.

jklymak · 2018-02-11T21:47:35Z

Won't merge w/o someone else checking the pre-amble change I made.

tacaswell · 2018-02-11T22:01:46Z

Did not wait for CI, as the last commit is docs-only.

dstansby · 2018-02-11T22:02:47Z

🎉 thanks a lot @story645 for putting up with all our reviews!

updated tests to dicts

4c9353f

story645 added topic: categorical status: duplicate and removed status: duplicate labels Nov 14, 2017

tacaswell added this to the v2.1.1 milestone Nov 15, 2017

tacaswell reviewed Nov 15, 2017

tacaswell mentioned this pull request Nov 20, 2017

Don't sort categorical keys. #9318

Closed

dstansby previously requested changes Dec 3, 2017

tacaswell modified the milestones: v2.1.1, v2.2 Dec 6, 2017

story645 mentioned this pull request Jan 3, 2018

Error Handling of Non-Ints/Floats for postion of xticks #10147

Closed

tacaswell added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Feb 5, 2018

story645 changed the title ~~Categorical: Unsorted, Mixed Type, Support NaN on Scatter, fix overwrite bug~~ Categorical: Unsorted, String only, fix overwrite bug Feb 6, 2018

jklymak reviewed Feb 6, 2018

tacaswell reviewed Feb 8, 2018

category bug fix + new tests + refactor

34b8eb4

story645 force-pushed the cat branch from 4491883 to 34b8eb4 Compare February 8, 2018 05:01

Merge branch 'master' into cat

543d235

story645 force-pushed the cat branch 2 times, most recently from 506ca48 to c07b9a3 Compare February 8, 2018 16:43

story645 force-pushed the cat branch from b0a1184 to efec2de Compare February 9, 2018 17:43

jklymak reviewed Feb 9, 2018

tacaswell approved these changes Feb 11, 2018

story645 commented Feb 11, 2018

jklymak reviewed Feb 11, 2018

story645 force-pushed the cat branch 2 times, most recently from 627f50f to cfdbd9b Compare February 11, 2018 03:03

addressing documentation comments + more use of units

4d57690

story645 force-pushed the cat branch from cfdbd9b to 4d57690 Compare February 11, 2018 04:03

dstansby previously requested changes Feb 11, 2018

DOC: fix typo

22e3a66

Suggested pre-amble change

c7d57f6

jklymak approved these changes Feb 11, 2018

tacaswell merged commit ace9663 into matplotlib:master Feb 11, 2018

This was referenced Feb 12, 2018

Are categorical plots with single letter strings limited to show 10 categories? #9843

Closed

Data types not preserved in categoricals #9350

Closed

Integer Categorical Values Not Getting Mapped Correctly #9336

Closed

story645 mentioned this pull request Feb 12, 2018

categorical axis sorts its keys #9312

Closed

QuLogic modified the milestones: needs sorting, v2.2.0 Feb 12, 2018

jklymak mentioned this pull request Feb 21, 2018

Categorical refactor #10212

Closed

borgesaugusto mentioned this pull request Oct 8, 2023

List of lists of categorical data failing: Scatter ravel is performed before _process_unit_info() is called. #27035

Open

story645 deleted the cat branch October 10, 2023 02:03
