Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept constant memoryviews in HashTable.lookup #21688

Merged
merged 13 commits into from Jul 7, 2018

Conversation

Projects
None yet
6 participants
@xhochy
Copy link
Contributor

commented Jun 30, 2018

xref #10070
xref #12813

building an ExtensionArray for nullable ints using a pyarrow.Array, the underlying memoryview is marked as constant. The HashTable Cython code uses the memoryview in the lookup function but in a read-only fashion. Without the const specifier Cython raises an error that the memoryview is read-only.

Not sure how to test this in a small scope (i.e. without using pyarrow) or if this small change is worth mentioning in the changelog.

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@xhochy

This comment has been minimized.

Copy link
Contributor Author

commented Jun 30, 2018

The Travis failure seems quite silent. Is there a way to extract more information?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jun 30, 2018

I restarted the one. I haven't seen that error before.

@TomAugspurger TomAugspurger added the Algos label Jun 30, 2018

@codecov

This comment has been minimized.

Copy link

commented Jun 30, 2018

Codecov Report

Merging #21688 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21688      +/-   ##
==========================================
- Coverage   91.95%   91.95%   -0.01%     
==========================================
  Files         160      160              
  Lines       49837    49820      -17     
==========================================
- Hits        45830    45812      -18     
- Misses       4007     4008       +1
Flag Coverage Δ
#multiple 90.33% <ø> (-0.01%) ⬇️
#single 42.07% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/category.py 97.01% <0%> (-0.28%) ⬇️
pandas/core/arrays/datetimelike.py 94.38% <0%> (-0.25%) ⬇️
pandas/core/indexes/period.py 93.35% <0%> (-0.23%) ⬇️
pandas/core/indexes/multi.py 94.96% <0%> (-0.09%) ⬇️
pandas/core/groupby/groupby.py 92.65% <0%> (-0.05%) ⬇️
pandas/core/indexes/timedeltas.py 91% <0%> (-0.03%) ⬇️
pandas/core/frame.py 97.19% <0%> (-0.01%) ⬇️
pandas/core/strings.py 98.63% <0%> (ø) ⬆️
pandas/core/arrays/datetimes.py 100% <0%> (ø) ⬆️
pandas/util/testing.py 85.33% <0%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e84cf2a...a4cb629. Read the comment docs.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jun 30, 2018

Must have been a one-off failure.

cc @chris-b1 if you have any thoughts on this?

@@ -379,7 +379,7 @@ cdef class {{name}}HashTable(HashTable):
self.table.vals[k] = i

@cython.boundscheck(False)
def lookup(self, {{dtype}}_t[:] values):
def lookup(self, const {{dtype}}_t[:] values):

This comment has been minimized.

Copy link
@jreback

jreback Jun 30, 2018

Contributor

we have multiple lookup routines for different drapes - is there a reason to not change them all?

why just lookup? we have many routines which couldnin theory benefit from this

This comment has been minimized.

Copy link
@xhochy

xhochy Jun 30, 2018

Author Contributor

I can make a pass over hashtable_class_helper.pxi.in and add const specifiers where applicable.

@chris-b1

This comment has been minimized.

Copy link
Contributor

commented Jun 30, 2018

Now that cython supports const memoryviews we probably should go ahead and use them everywhere possible, #18825 example of working around this.

Looks like would bump our required version up to 0.28 but shouldn't really be a problem since we distribute c files on PyPI?
https://github.com/cython/cython/blob/master/CHANGES.rst#028-2018-03-13

@@ -343,6 +343,8 @@ Other
^^^^^

- :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`)
- Building pandas for development now requires ``cython >= 0.28``
- Require at least 0.28 version of cython to support read-only memoryviews (:issue:`21688`)

This comment has been minimized.

Copy link
@xhochy

xhochy Jun 30, 2018

Author Contributor

Referenced the PR as issue here. Is that sufficient or would you like to have separate issues for each line for documentation purposes?

This comment has been minimized.

Copy link
@jreback

jreback Jul 1, 2018

Contributor

so we really need to bump cython here?

This comment has been minimized.

Copy link
@jschendel

jschendel Jul 1, 2018

Member

Assuming we need to bump Cython, it looks like there are a few things missing in order to fully bump it: pin some CI builds to the minimum Cython version, update version number in install.rst.

See the changes in #18623 for where I bumped to 0.24; should show all the places that need updating.

This comment has been minimized.

Copy link
@xhochy

xhochy Jul 1, 2018

Author Contributor

Reading through the changelog of Cython, we need to bump to 0.28 to support the const specifier for memoryviews. Looking a bit through the Travis setup, we only pin Cython for the 2.7 builds and these are the ones that failed. (sadly with a weird error message).

This comment has been minimized.

Copy link
@jreback

jreback Jul 2, 2018

Contributor

we need to bump to 0.28 to support the const specifier for memoryviews

is this an actual change in cython?

I agree its nice to change this, but cython 0.28 is pretty new.

This comment has been minimized.

Copy link
@jreback

jreback Jul 7, 2018

Contributor

can you move the building pandas to a new sub-section (like we did previously when we bumped deps, as going to do this for more deps this cycle).

also is 0.28 min, or 0.28.2?

This comment has been minimized.

Copy link
@xhochy

xhochy Jul 7, 2018

Author Contributor

0.28.0 works but we should build with 0.28.2 due to the mentioned problems in sklearn. I would at least bump the Cython version on Travis & CricleCI.

Should we then allow 0.28.0 or require 0.28.2 so that we don't get the bug reports w.r.t. problems that are fixes in .2?

@@ -343,6 +343,8 @@ Other
^^^^^

- :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`)
- Building pandas for development now requires ``cython >= 0.28``
- Require at least 0.28 version of cython to support read-only memoryviews (:issue:`21688`)

This comment has been minimized.

Copy link
@jreback

jreback Jul 1, 2018

Contributor

so we really need to bump cython here?

@jreback jreback added the Dependencies label Jul 2, 2018

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jul 2, 2018

is this an actual change in cython?

Yes, it is a new cython feature that it can now work on read-only memoryviews (see the third bullet point here: https://github.com/cython/cython/blob/master/CHANGES.rst#028-2018-03-13). This is also a feature much wanted by sklearn: scikit-learn/scikit-learn#10624

As @chris-b1 said, if we bump the cython version requirement, we should try to simplify our code base and use memoryviews in more places (#10070 as another example where two version of a function are added, one with memoryviews and other with ndarray interface to support read-only arrays)

IMO, ideally, if we bump the version, it would be good to test for a few internal cases that the const memoryviews are actually working (since this is new functionality in cython, there might be still some rough edges. In sklearn they found a bug in const memoryviews that was fixed in 0.28.2).
(but since I am not going to do that work, no strong opinion on doing this before actually bumping the version requirement)

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jul 2, 2018

Looks like would bump our required version up to 0.28 but shouldn't really be a problem since we distribute c files on PyPI?

That's correct.

@xhochy

This comment has been minimized.

Copy link
Contributor Author

commented Jul 3, 2018

Happy to implement the necessary changes when we bump the Cython version but it would be nice to know whether this is the preferred way forward?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 3, 2018

yeah am ok with bumping the cython version i guess for 0.24

since this is not user facing really

we should also again bump numpy min but that’s a different issue

@xhochy xhochy force-pushed the xhochy:accept-constant-memoryview-lookup branch from 6d38e08 to 36626d8 Jul 7, 2018

@@ -343,6 +343,8 @@ Other
^^^^^

- :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`)
- Building pandas for development now requires ``cython >= 0.28``
- Require at least 0.28 version of cython to support read-only memoryviews (:issue:`21688`)

This comment has been minimized.

Copy link
@jreback

jreback Jul 7, 2018

Contributor

can you move the building pandas to a new sub-section (like we did previously when we bumped deps, as going to do this for more deps this cycle).

also is 0.28 min, or 0.28.2?

@@ -1079,13 +1079,17 @@ class TestHashTable(object):

def test_lookup_nan(self):

This comment has been minimized.

Copy link
@jreback

jreback Jul 7, 2018

Contributor

can you make this a fixture and use this for False/True

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 7, 2018

after we merge this can try to remove the other read-only memory fixes that are in take routines
xrref #10070 and #12813

@xhochy xhochy force-pushed the xhochy:accept-constant-memoryview-lookup branch from 36626d8 to 20c181f Jul 7, 2018

jreback added some commits Jul 7, 2018

@jreback jreback added this to the 0.24.0 milestone Jul 7, 2018

@jreback

jreback approved these changes Jul 7, 2018

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 7, 2018

pushed some small doc changes. ping on green for merge.

@jreback jreback merged commit dcbf8b5 into pandas-dev:master Jul 7, 2018

2 of 3 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 7, 2018

thanks @xhochy nice patch!

@xhochy xhochy deleted the xhochy:accept-constant-memoryview-lookup branch Jul 8, 2018

alimcmaster1 added a commit to alimcmaster1/pandas that referenced this pull request Aug 12, 2018

gfyoung added a commit to forking-repos/pandas that referenced this pull request Sep 10, 2018

gfyoung added a commit that referenced this pull request Sep 11, 2018

CI / BLD: Various CI Backports (#22637)
* CI: Bump NumPy to 1.9.3

Backport of gh-22499.

* BLD: Fix openpyxl to 2.5.5

Backport of gh-22601.

* CI: Resolve timeout issue on Travis

Backported from gh-22429.

* CI: Migrate to CircleCI 2.0

Backport of gh-21814.

* Upgrade Cython to >=0.28.2

Backported from gh-21688.

* TST: Patch locale handling

Backported from gh-21739.
Backport of gh-22213.

Sup3rGeo added a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.