Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Categorical refactor #10212

Closed
wants to merge 22 commits into from
Closed

Conversation

tacaswell
Copy link
Member

PR Summary

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@tacaswell tacaswell added Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. topic: categorical labels Jan 10, 2018
@tacaswell tacaswell added this to the v2.2 milestone Jan 10, 2018
@@ -1,128 +1,109 @@
# -*- coding: utf-8 OA-*-za
# -*- coding: utf-8 OA -*-
Copy link
Contributor

@anntzer anntzer Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's "OA"?
actually not even sure why this file needs an encoding cookie

return vmap[value]

vals = shim_array(value)
"""Use axis.units mapping tncode data as floats."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

raise TypeError("{val!r} is not a string".format(val=k))
self._mapping[k] = int(v)
if self._mapping:
start = max(self._mapping.values()) + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like if you pass an explicit mapping then either

  1. you don't expect new categories to come in and you'd rather error out on new categories, or
  2. you may want to have your own index-generator as well.

thoughts?

@@ -750,15 +749,14 @@ def limit_range_for_scale(self, vmin, vmax):
return self._scale.limit_range_for_scale(vmin, vmax, self.get_minpos())

@property
@cbook.deprecated("2.1.1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.1.2, below the same

@tacaswell tacaswell force-pushed the categorical_refactor branch 2 times, most recently from efe305e to 7d336f7 Compare January 10, 2018 04:07
@tacaswell
Copy link
Member Author

@anntzer 's comment about not letting further updates if the user provides an explicit initial mapping. I don't really see the down side of letting user expand on initial data. Once it is loaded and you have loaded the initial data up, it's not different than the case were after the first round of plotting we see some new categories.

@story645
Copy link
Member

story645 commented Jan 10, 2018

I agree with @anntzer that I don't get why you'd let the user put in an initial mapping - or how they'd even do that. I agree that at that point they should just be doing it by hand, and it adds an I think unnecessary layer of complexity to the init.


@staticmethod
def axisinfo(unit, axis):
majloc = StrCategoryLocator(axis.unit_data.locs)
majfmt = StrCategoryFormatter(axis.unit_data.seq)
majloc = StrCategoryLocator(axis.units)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It bugs me that locator and formatter are getting the mapping and there's not a separation, so at the least I think there should be a comment saying that this is because the Formatter and Locator require references rather than objects/copies.


class StrCategoryLocator(ticker.Locator):
def __init__(self, unit_data):
self._unit_data = unit_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency, shouldn't this be units?

def __call__(self, x, pos=None):
if pos is None:
return ""
r_mapping = {v: k for k, v in self._unit_data._mapping.items()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the mapping is now an ordered dict starting at 0, isn't it really just an array where the index is the mapping? so basically list(self._unit_data.keys())[int(x)] ? (which makes me wonder if that shouldn't just be how the mapping is stored anyway....)

Copy link
Member

@story645 story645 Jan 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe the self._unit_data. should be a proper dict/hash like object? That has the added benefit of moving the getter function into the object implementation and then

if not isinstance(k, six.text_type):             
    k = k.decode('utf-8')
    return axis.units._mapping[k]

can be in one place.

@anntzer
Copy link
Contributor

anntzer commented Jan 15, 2018

The idea with user-provided mapping is that we don't even check that the converted values they provide are indeed consecutive integers starting with zero (and I'm not sure we should either), so (unless we do) incrementing a counter may or may not be a reasonable way to provide "new" converted values.

@@ -211,32 +211,24 @@ def set_patchprops(self, fill_poly, **kwargs):
fill_poly.set(**kwargs)

def _xy_from_xy(self, x, y):
def _check_numeric(inp):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is raising all sorts of test errors for non-categorical...

Simplify String Categorical handling
------------------------------------

- Only handling `str` (not `bytes` in pyhon3). This disallows using integers or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp: python3

(sorry, I know we're supposed to push minor editorial changes directly, but I'm away from my main machine ATM)

story645 and others added 10 commits February 4, 2018 21:47
 - Expect that `axis.units` has a `_mapping` attribute.
 - Allow only strings.  This prevents issues with mixed input where
   plotting different sub-sets of the data will or will not trigger
   the correct converter / unit behavior.
 - Do not allow missing data.  Do not special case nan or inf for the
   same reasons as above.
 - deprecate `Axis.unit_data`.  Axis already has a `units` attribute
   which hold a UnitData instance.
Change to using direct sub-classes of Locator and Formatter which hold
a reference to the UnitData object.  This makes the classes a bit less
brittle and avoids having extra methods on the classes which will not
work as intended.
Reattach the unitful data to Line2D.
@tacaswell
Copy link
Member Author

The explicit for number post conversion was prematurely masking a different exception that we are testing for related to the shape so I just reverted that change.

The other changes to _base.py where somehow breaking pandas + datetime64, but reverting that seems to have broken some other things.

@tacaswell
Copy link
Member Author

@tacaswell 's homework

  • rip out pre-defining the mapping
  • fix the tests
  • update the docs

@tacaswell tacaswell modified the milestones: v2.2, v3.0 Feb 5, 2018
story645 added a commit to story645/matplotlib that referenced this pull request Feb 6, 2018
@jklymak
Copy link
Member

jklymak commented Feb 21, 2018

Should this still be open given that #9783 is merged?

@dstansby
Copy link
Member

What's the status of this? I'm happy to put time into reviewing since it's labelled as release critical for 3.0, but just wanted to double check first!

@story645
Copy link
Member

Close for now because most of this went in via #9783 and much of what didn't is related to general unit inconsistencies and so should be a stand along PR anyway.

@story645 story645 closed this May 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. status: needs revision topic: categorical
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants