Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Hashtable size hint cap #22805

Merged
merged 7 commits into from
Jan 15, 2019
Merged

Conversation

chris-b1
Copy link
Contributor

@chris-b1 chris-b1 commented Sep 22, 2018

Chris Bartak and others added 2 commits September 21, 2018 10:30
@pep8speaks
Copy link

pep8speaks commented Sep 22, 2018

Hello @chris-b1! Thanks for updating the PR.

Line 1419:45: E261 at least two spaces before inline comment

Comment last updated on January 14, 2019 at 20:09 Hours UTC

@@ -266,6 +266,7 @@ cdef class {{name}}HashTable(HashTable):
def __cinit__(self, size_hint=1):
self.table = kh_init_{{dtype}}()
if size_hint is not None:
size_hint = min(size_hint, _SIZE_HINT_LIMIT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can type the size_hint as a uint_t I think to avoid this problem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting typed as a uint on the next line - we actually want it untyped here so it could accept arbitrarily large values, just needs capped before before being converted to the c-type (what was causing the original error)

@gfyoung gfyoung added Bug Groupby Internals Related to non-user accessible pandas implementation labels Sep 23, 2018
@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

can you rebase

@jreback
Copy link
Contributor

jreback commented Dec 12, 2018

@chris-b1 can you rebase / update

@codecov
Copy link

codecov bot commented Dec 12, 2018

Codecov Report

Merging #22805 into master will decrease coverage by 0.03%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22805      +/-   ##
==========================================
- Coverage   92.21%   92.18%   -0.04%     
==========================================
  Files         162      169       +7     
  Lines       51768    50810     -958     
==========================================
- Hits        47739    46839     -900     
+ Misses       4029     3971      -58
Flag Coverage Δ
#multiple 90.6% <ø> (-0.02%) ⬇️
#single 42.37% <ø> (-0.63%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/html.py 90.68% <0%> (-7.93%) ⬇️
pandas/core/arrays/period.py 91.98% <0%> (-6.31%) ⬇️
pandas/plotting/_tools.py 72.62% <0%> (-6.15%) ⬇️
pandas/io/feather_format.py 85.71% <0%> (-4.03%) ⬇️
pandas/io/parquet.py 73.72% <0%> (-3.2%) ⬇️
pandas/core/arrays/base.py 94.24% <0%> (-3.18%) ⬇️
pandas/plotting/_style.py 74.28% <0%> (-3.14%) ⬇️
pandas/core/sparse/scipy_sparse.py 97.05% <0%> (-2.95%) ⬇️
pandas/core/arrays/datetimes.py 96.82% <0%> (-1.74%) ⬇️
pandas/core/indexes/range.py 95.73% <0%> (-1.6%) ⬇️
... and 136 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9bc42f9...74b3b45. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 12, 2018

Codecov Report

Merging #22805 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #22805   +/-   ##
=======================================
  Coverage   92.38%   92.38%           
=======================================
  Files         166      166           
  Lines       52363    52363           
=======================================
  Hits        48377    48377           
  Misses       3986     3986
Flag Coverage Δ
#multiple 90.81% <ø> (ø) ⬆️
#single 42.91% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ff40c2f...d3cbb2b. Read the comment docs.

@chris-b1
Copy link
Contributor Author

@jreback - I think this is good now

@@ -573,9 +574,10 @@ cdef class StringHashTable(HashTable):
# or a sentinel np.nan / None missing value
na_string_sentinel = '__nan__'

def __init__(self, int size_hint=1):
def __init__(self, size_hint=1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just type this as int64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, that'll work

@mroeschke
Copy link
Member

This just needs a rebase and it looks good to go (Travis failure was a timeout)

@jreback
Copy link
Contributor

jreback commented Jan 1, 2019

can you merge master and update

@jreback
Copy link
Contributor

jreback commented Jan 14, 2019

@chris-b1 can you merge master

@chris-b1
Copy link
Contributor Author

@jreback - should be good now!

@jreback jreback added this to the 0.24.0 milestone Jan 15, 2019
@jreback jreback merged commit f512a61 into pandas-dev:master Jan 15, 2019
@jreback
Copy link
Contributor

jreback commented Jan 15, 2019

thanks @chris-b1

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Groupby Get Groups fails with "OverflowError: value too large to convert to int"
5 participants