ENH: ExtensionArray.searchsorted #24350

Merged
merged 6 commits into from Dec 28, 2018

5 participants
@TomAugspurger
Contributor

commented Dec 19, 2018

No description provided.

@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Dec 19, 2018

@pep8speaks


commented Dec 19, 2018

Hello @TomAugspurger! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 28, 2018 at 19:54 Hours UTC
@codecov


commented Dec 19, 2018

Codecov Report

Merging #24350 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #24350      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%     
==========================================
  Files         162      162              
  Lines       51806    51816      +10     
==========================================
+ Hits        47815    47825      +10     
  Misses       3991     3991
| Flag | Coverage Δ |
|---|---|
| #multiple | 90.7% <100%> (ø) ⬆️ |
| #single | 42.99% <20%> (-0.01%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/arrays/base.py | 97.46% <100%> (+0.04%) ⬆️ |
| pandas/core/arrays/sparse.py | 92.15% <100%> (+0.06%) ⬆️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c230f29...58418ab. Read the comment docs.

@codecov


commented Dec 19, 2018

Codecov Report

Merging #24350 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #24350      +/-   ##
==========================================
- Coverage    92.3%    92.3%   -0.01%     
==========================================
  Files         165      165              
  Lines       52176    52186      +10     
==========================================
+ Hits        48161    48170       +9     
- Misses       4015     4016       +1
| Flag | Coverage Δ |
|---|---|
| #multiple | 90.72% <100%> (ø) ⬆️ |
| #single | 42.96% <27.27%> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/arrays/base.py | 98.23% <100%> (+0.03%) ⬆️ |
| pandas/core/base.py | 97.7% <100%> (ø) ⬆️ |
| pandas/core/arrays/sparse.py | 92.17% <100%> (+0.06%) ⬆️ |
| pandas/util/testing.py | 87.75% <0%> (-0.1%) ⬇️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1b2a52...a91fcec. Read the comment docs.

@TomAugspurger TomAugspurger referenced this pull request Dec 19, 2018

Merged

REF: DatetimeLikeArray #24024

7 of 12 tasks complete
"""
Find indices where elements should be inserted to maintain order.
.. versionadded:: 0.25.0

@jreback

jreback Dec 19, 2018

Contributor

0.24.0

pandas/core/arrays/base.py
# 2. Values between the values in the `data_for_sorting` fixture
# 3. Missing values.
arr = self.astype(object)
return arr.searchsorted(v, side=side, sorter=sorter)

@jreback

jreback Dec 19, 2018

Contributor

do we need to astype to object?

@TomAugspurger

TomAugspurger Dec 19, 2018

Author Contributor

We need an ndarray. I suppose we could do np.asarray(self), which will convert to the best possible ndarray? But that could be lossy and so you wouldn't get the correct answer.

So yes, I think we do need object.
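
To make the "lossy" concern concrete, here is a minimal sketch, assuming an array whose elements are `decimal.Decimal` values (the sample values and variable names are illustrative, not taken from the PR). Two distinct elements collapse to the same float64, so a float-based search can land on a different index than an object-based one.

```python
from decimal import Decimal

import numpy as np

# Three sorted values; the first two differ far beyond float64 precision.
values = [Decimal("0.1"), Decimal("0.1000000000000000000001"), Decimal("0.2")]
needle = Decimal("0.1000000000000000000001")

# Object-dtype ndarray: elements stay Decimal, so comparisons are exact.
as_object = np.array(values, dtype=object)
exact = int(as_object.searchsorted(needle, side="left"))

# float64 ndarray: the first two values round to the same float.
as_float = np.array([float(x) for x in values])
lossy = int(as_float.searchsorted(float(needle), side="left"))

print(exact, lossy)
```

The object-dtype search should find the needle at its true position, while the float search reports the position of the first collapsed value.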

@jreback

jreback Dec 23, 2018

Contributor

this is going to be very inefficient, so subclasses would almost certainly need to override. I would rather add this as an abstract method then.

@TomAugspurger

TomAugspurger Dec 26, 2018

Author Contributor

This is similar to other methods. #24433
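
A rough sketch of the trade-off discussed in this thread, using hypothetical stand-in classes rather than the real `ExtensionArray` API: the generic fallback boxes every element into an object array, and a subclass that knows its storage can override it to search the native buffer directly.

```python
import numpy as np


class GenericArray:
    """Hypothetical stand-in for an array type with a generic fallback."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype="int64")

    def to_object(self):
        # Boxes every element into a Python object: correct but slow.
        return self._data.astype(object)

    def searchsorted(self, value, side="left", sorter=None):
        # Generic fallback, mirroring the object-dtype approach in the diff.
        return self.to_object().searchsorted(value, side=side, sorter=sorter)


class NativeArray(GenericArray):
    """Hypothetical subclass that knows its own storage."""

    def searchsorted(self, value, side="left", sorter=None):
        # Override: search the int64 buffer directly, with no boxing.
        return self._data.searchsorted(value, side=side, sorter=sorter)


generic = GenericArray([10, 20, 30, 40])
native = NativeArray([10, 20, 30, 40])
generic_result = [int(i) for i in generic.searchsorted([25, 40])]
native_result = [int(i) for i in native.searchsorted([25, 40])]
```

Both paths should give the same answer; only the cost differs, which is why a default implementation can be offered while subclasses are still encouraged to override it.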

@@ -505,6 +506,54 @@ def unique(self):
uniques = unique(self.astype(object))
return self._from_sequence(uniques, dtype=self.dtype)

def searchsorted(self, v, side="left", sorter=None):

@topper-123

topper-123 Dec 20, 2018

Contributor

I personally don't like one-letter parameter names. Use values instead?

@TomAugspurger

TomAugspurger Dec 20, 2018

Author Contributor

I don't like it either, but there's value in matching NumPy here.

@topper-123

topper-123 Dec 20, 2018

Contributor

Sorry, meant value, as that is the name used in Series.searchsorted and various other searchsorted implementations in pandas (Categorical.searchsorted at least, haven’t checked all impl.).

I think it's inconsistent to follow numpy naming here, when pandas’s is better :-)

@jreback

jreback Dec 23, 2018

Contributor

we use value currently in both Index and Series. let's be consistent here (in fact I think we went thru a deprecation cycle on those a while back).

@TomAugspurger

TomAugspurger Dec 26, 2018

Author Contributor

Did you see Stephan's comment in
#24350 (comment)?

Ideally, np.searchsorted(extension_array) would always work. If we do np.searchsorted(v=extension_array), I think a TypeError will be raised.

@jreback

jreback Dec 26, 2018

Contributor

see my comment above; this would be an inconsistency in naming,
which is much worse than having np.searchsorted(ea) not working, which is just a convenience anyhow

@TomAugspurger

TomAugspurger Dec 28, 2018

Author Contributor

It's not just a convenience. @shoyer do you have thoughts here?

Subclasses implementing __array_function__ will be allowed to override ExtensionArray.searchsorted with the correct function signature, but it'd be nice if things worked out of the box.

@TomAugspurger

Contributor Author

commented Dec 20, 2018

@shoyer

Member

commented Dec 20, 2018

if I were writing an ExtensionArray that also wanted to implement __array_function__, is it OK for the name of positional arguments to differ?

__array_function__ passes on the exact positional and keyword arguments from how the function is called. So in practice this would mean that anyone who uses keyword arguments with NumPy's names would get a TypeError, unless you add keyword-only arguments for NumPy's names, too. Using positional arguments would be fine, though.
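
The naming problem can be sketched in plain Python. Everything here is hypothetical: `forward` only imitates a dispatcher that passes arguments through unchanged (it is not NumPy's `__array_function__` machinery), and the two classes are illustrative, not pandas code.

```python
import bisect


def forward(obj, *args, **kwargs):
    # Hypothetical dispatcher: hands over arguments exactly as received.
    return obj.searchsorted(*args, **kwargs)


class ValueNamed:
    """Hypothetical array-like using the pandas-style name `value`."""

    def __init__(self, data):
        self._data = sorted(data)

    def searchsorted(self, value, side="left", sorter=None):
        # `sorter` is accepted for signature parity but ignored in this sketch.
        find = bisect.bisect_left if side == "left" else bisect.bisect_right
        return find(self._data, value)


class AcceptsBoth(ValueNamed):
    """Adds a keyword-only alias `v` for the NumPy-style name."""

    def searchsorted(self, value=None, side="left", sorter=None, *, v=None):
        if value is None:
            value = v
        return super().searchsorted(value, side=side, sorter=sorter)


arr = ValueNamed([1, 3, 5])
positional = forward(arr, 4)  # positional call: names never matter

try:
    forward(arr, v=4)  # NumPy-style keyword against a `value` signature
    keyword_error = None
except TypeError as exc:
    keyword_error = exc

both = forward(AcceptsBoth([1, 3, 5]), v=4)
```

Positional calls work with either name; only callers who spell out the keyword hit the mismatch, and a keyword-only alias is one way to accept both spellings.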

@TomAugspurger

Contributor Author

commented Dec 20, 2018

@TomAugspurger

Contributor Author

commented Dec 28, 2018

Changed v to value.

# 2. Values between the values in the `data_for_sorting` fixture
# 3. Missing values.
arr = self.astype(object)
return arr.searchsorted(value, side=side, sorter=sorter)

@jreback

jreback Dec 28, 2018

Contributor

IIRC you have an issue to show a warning here for EAs that don't redefine this, I think?

@jreback jreback merged commit 7617ed1 into pandas-dev:master Dec 28, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
pandas-dev.pandas Build #20181228.64 succeeded
@jreback

Contributor

commented Dec 28, 2018

@TomAugspurger

Contributor Author

commented Dec 28, 2018

@TomAugspurger TomAugspurger deleted the TomAugspurger:ea-searchsorted branch Jan 2, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
