ENH: ExtensionArray.searchsorted #24350

TomAugspurger · 2018-12-19T13:22:05Z

No description provided.

pep8speaks · 2018-12-19T13:22:12Z

Hello @TomAugspurger! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 28, 2018 at 19:54 Hours UTC

codecov · 2018-12-19T13:57:00Z

Codecov Report

Merging #24350 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24350      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%     
==========================================
  Files         162      162              
  Lines       51806    51816      +10     
==========================================
+ Hits        47815    47825      +10     
  Misses       3991     3991

Flag	Coverage Δ
#multiple	`90.7% <100%> (ø)`	⬆️
#single	`42.99% <20%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/arrays/base.py	`97.46% <100%> (+0.04%)`	⬆️
pandas/core/arrays/sparse.py	`92.15% <100%> (+0.06%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c230f29...58418ab. Read the comment docs.

codecov · 2018-12-19T13:57:02Z

Codecov Report

Merging #24350 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24350      +/-   ##
==========================================
- Coverage    92.3%    92.3%   -0.01%     
==========================================
  Files         165      165              
  Lines       52176    52186      +10     
==========================================
+ Hits        48161    48170       +9     
- Misses       4015     4016       +1

Flag	Coverage Δ
#multiple	`90.72% <100%> (ø)`	⬆️
#single	`42.96% <27.27%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/base.py	`98.23% <100%> (+0.03%)`	⬆️
pandas/core/base.py	`97.7% <100%> (ø)`	⬆️
pandas/core/arrays/sparse.py	`92.17% <100%> (+0.06%)`	⬆️
pandas/util/testing.py	`87.75% <0%> (-0.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1b2a52...a91fcec. Read the comment docs.

jreback · 2018-12-19T14:18:56Z

pandas/core/arrays/base.py

+        """
+        Find indices where elements should be inserted to maintain order.
+
+        .. versionadded:: 0.25.0


jreback · 2018-12-19T14:33:31Z

pandas/core/arrays/base.py

+        # 2. Values between the values in the `data_for_sorting` fixture
+        # 3. Missing values.
+        arr = self.astype(object)
+        return arr.searchsorted(v, side=side, sorter=sorter)


do we need to astype to object?

We need an ndarray. I suppose we could do np.asarray(self), which will convert to the best possible ndarray? But that could be lossy and so you wouldn't get the correct answer.

So yes, I think we do need object.

this is going to be very inefficient, so subclasses would almost certainly need to override. I would rather add this as an abstract method then.

This is similar to other methods. #24433
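The lossy-conversion concern in this thread (`np.asarray(self)` versus `astype(object)`) can be illustrated without pandas. The sketch below is only a stand-in: it uses the standard library's `bisect` in place of `ndarray.searchsorted`, and plain Python integers in place of extension-array elements. The helper names `searchsorted_exact` and `searchsorted_via_float` are hypothetical and do not appear in the PR.

```python
from bisect import bisect_left, bisect_right


def searchsorted_exact(sorted_values, value, side="left"):
    """Insertion index found by comparing the original objects directly."""
    find = bisect_left if side == "left" else bisect_right
    return find(list(sorted_values), value)


def searchsorted_via_float(sorted_values, value, side="left"):
    """Same search, but after a lossy conversion of every element to float."""
    find = bisect_left if side == "left" else bisect_right
    return find([float(x) for x in sorted_values], float(value))


# Three distinct, sorted integers just above the range where float64
# can represent every integer exactly.
data = [2**53, 2**53 + 1, 2**53 + 2]
needle = 2**53 + 1

exact_index = searchsorted_exact(data, needle)      # compares Python ints
lossy_index = searchsorted_via_float(data, needle)  # 2**53 + 1 rounds to 2**53

print(exact_index, lossy_index)  # 1 0
```

Comparing the original objects gives the correct insertion point, while the float conversion merges two distinct elements and shifts the answer. That is the trade-off behind keeping the object-dtype fallback even though it is slow.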

topper-123 · 2018-12-20T11:56:01Z

pandas/core/arrays/base.py

@@ -505,6 +506,54 @@ def unique(self):
        uniques = unique(self.astype(object))
        return self._from_sequence(uniques, dtype=self.dtype)

+    def searchsorted(self, v, side="left", sorter=None):


I personally don't like one-letter parameter names. Use values instead?

I don't like it either, but there's value in matching NumPy here.

Sorry, meant value, as that is the name used in Series.searchsorted and various other searchsorted implementations in pandas (Categorical.searchsorted at least, haven’t checked all impl.).

I think it's inconsistent to follow numpy naming here, when pandas’s is better:-)

we use value currently in both Index and Series. let's be consistent here (in fact I think we went thru a deprecation cycle on those a while back).

Did you see Stephan's comment in
#24350 (comment)?

Ideally, np.searchsorted(extension_array) would always work. If we do np.searchsorted(v=extension_array) I think a TypeError will be raised.

see my comment above; this would be an inconsistency in naming,
which is much worse than having np.searchsorted(ea) not work, which is just a convenience anyhow

It's not just a convenience. @shoyer do you have thoughts here?

Subclasses implementing __array_function__ will be allowed to override ExtensionArray.searchsorted with the correct function signature, but it'd be nice if things worked out of the box.

TomAugspurger · 2018-12-20T18:20:35Z

Hmm, I'm not sure what to do then. ExtensionArrays are more at the level of an ndarray than a Series. @shoyer, if I were writing an ExtensionArray that also wanted to implement `__array_function__`, is it OK for the name of positional arguments to differ?

shoyer · 2018-12-20T19:01:00Z

if I were writing an ExtensionArray that also wanted to implement __array_function__, is it OK for the name of positional arguments to differ?

__array_function__ passes on the exact positional and keyword argument from how the function is called. So in practice this would mean that anyone who uses keyword arguments with NumPy's names would get a TypeError, unless you add keyword-only arguments for NumPy's names, too. Using positional arguments would be fine, though.
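The keyword-name problem described above can be sketched in plain Python. This is a simplified illustration, not NumPy's actual dispatch machinery: `dispatch_searchsorted` is a hypothetical stand-in that forwards the caller's positional and keyword arguments unchanged, and both array classes are toy examples built on `bisect`.

```python
import bisect


def dispatch_searchsorted(a, *args, **kwargs):
    """Hypothetical dispatcher: forward the call exactly as it was made."""
    return a.searchsorted(*args, **kwargs)


class ValueNamedArray:
    """Toy array whose method uses the name `value` for the searched item."""

    def __init__(self, data):
        self._data = sorted(data)

    def searchsorted(self, value, side="left", sorter=None):
        # `sorter` is accepted for signature compatibility but ignored here.
        find = bisect.bisect_left if side == "left" else bisect.bisect_right
        return find(self._data, value)


class BothNamesArray(ValueNamedArray):
    """Toy variant that also accepts NumPy's name `v` as keyword-only."""

    def searchsorted(self, value=None, side="left", sorter=None, *, v=None):
        if v is not None:
            if value is not None:
                raise TypeError("pass either 'value' or 'v', not both")
            value = v
        return super().searchsorted(value, side=side, sorter=sorter)


arr = ValueNamedArray([1, 3, 5])

# Positional use works regardless of what the parameter is called.
positional_result = dispatch_searchsorted(arr, 4)

# Using NumPy's keyword name fails, because the method has no `v` parameter.
try:
    dispatch_searchsorted(arr, v=4)
    keyword_error = None
except TypeError as exc:
    keyword_error = exc

# Accepting both names makes the keyword call work again.
both_result = dispatch_searchsorted(BothNamesArray([1, 3, 5]), v=4)

print(positional_result, type(keyword_error).__name__, both_result)
```

The second class shows the workaround mentioned in the comment: keeping the pandas-style name while accepting NumPy's name as an extra keyword-only argument.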

TomAugspurger · 2018-12-20T19:12:55Z

Thanks. Given that, it's probably best to follow NumPy's signature exactly wherever possible.

jreback · 2018-12-23T15:10:17Z

pandas/core/arrays/base.py

@@ -505,6 +506,54 @@ def unique(self):
        uniques = unique(self.astype(object))
        return self._from_sequence(uniques, dtype=self.dtype)

+    def searchsorted(self, v, side="left", sorter=None):


we use value currently in both Index and Series. let's be consistent here (in fact I think we went thru a deprecation cycle on those a while back).

jreback · 2018-12-23T15:13:20Z

pandas/core/arrays/base.py

+        # 2. Values between the values in the `data_for_sorting` fixture
+        # 3. Missing values.
+        arr = self.astype(object)
+        return arr.searchsorted(v, side=side, sorter=sorter)


this is going to be very inefficient, so subclasses would almost certainly need to override. I would rather add this as an abstract method then.

TomAugspurger · 2018-12-28T19:55:28Z

Changed v to value.

jreback · 2018-12-28T21:15:06Z

pandas/core/arrays/base.py

+        # 2. Values between the values in the `data_for_sorting` fixture
+        # 3. Missing values.
+        arr = self.astype(object)
+        return arr.searchsorted(value, side=side, sorter=sorter)


IIRC you have an issue open to show a warning here for EAs that don't redefine this, I think?

jreback · 2018-12-28T21:15:22Z

thanks @TomAugspurger

TomAugspurger · 2018-12-28T21:21:12Z

Yes (for developers). I'm hoping to do that later today.
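The thread does not say how such a developer-facing warning would be implemented. The following is a purely hypothetical sketch of one way a base class could warn when its generic fallback is used, while subclasses that override the method stay silent. The class names, the warning category, and the message are all assumptions, and `bisect` again stands in for the real search.

```python
import warnings
from bisect import bisect_left, bisect_right


class BaseArray:
    """Hypothetical base class with a slow, generic searchsorted fallback."""

    def __init__(self, data):
        self._data = list(data)

    def searchsorted(self, value, side="left", sorter=None):
        # Only reached when a subclass has not overridden this method.
        warnings.warn(
            f"{type(self).__name__} relies on the generic searchsorted "
            "fallback; consider overriding it.",
            RuntimeWarning,
            stacklevel=2,
        )
        find = bisect_left if side == "left" else bisect_right
        return find(self._data, value)


class LazyArray(BaseArray):
    """Subclass that inherits the fallback and therefore triggers the warning."""


class FastArray(BaseArray):
    """Subclass with its own implementation, so no warning is emitted."""

    def searchsorted(self, value, side="left", sorter=None):
        find = bisect_left if side == "left" else bisect_right
        return find(self._data, value)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lazy_result = LazyArray([1, 3, 5]).searchsorted(4)
    fast_result = FastArray([1, 3, 5]).searchsorted(4)

warning_count = len(caught)
print(lazy_result, fast_result, warning_count)
```

Both calls return the same index; only the array that inherits the fallback produces a warning.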

ENH: ExtensionArray.searchsorted

ac94414

TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Dec 19, 2018

TomAugspurger added this to the 0.24.0 milestone Dec 19, 2018

PR number

58418ab

TomAugspurger mentioned this pull request Dec 19, 2018

REF: DatetimeLikeArray #24024

Merged

12 tasks

32-bit compat

fee9c1a

jreback requested changes Dec 19, 2018

updates

ff8bbc3

jreback requested changes Dec 19, 2018

topper-123 reviewed Dec 20, 2018

jreback requested changes Dec 23, 2018

TomAugspurger added 2 commits December 28, 2018 13:53

Merge remote-tracking branch 'upstream/master' into ea-searchsorted

e6adcd9

v -> value

a91fcec

jreback mentioned this pull request Dec 28, 2018

searchsorted, repeat broken off from #24024 #24461

Merged

jreback approved these changes Dec 28, 2018

jreback reviewed Dec 28, 2018

jreback merged commit 7617ed1 into pandas-dev:master Dec 28, 2018

TomAugspurger deleted the ea-searchsorted branch January 2, 2019 20:17

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

ENH: ExtensionArray.searchsorted (pandas-dev#24350)

e69f635

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

ENH: ExtensionArray.searchsorted (pandas-dev#24350)

3dfcaec
