Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Make Series.searchsorted return a scalar, when supplied a scalar #23801

Merged
merged 4 commits into from Dec 21, 2018

Conversation

Projects
None yet
5 participants
@topper-123
Copy link
Contributor

commented Nov 20, 2018

  • xref #23466 & #22034
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Series.searchsorted returns the wrong shape for scalar input. Numpy arrays and all other array types return a scalar if the input is a scalar, but Series does not.

For example:

>>> import numpy as np
>>> np.array([1, 2, 3]).searchsorted(1)
0
>>> np.array([1, 2, 3]).searchsorted([1])
array([0])
>>> import pandas as pd
>>> d = pd.date_range('2018', periods=4)
>>> d.searchsorted(d[0])
0
>>> d.searchsorted(d[:1])
array([0])

>>> n = 100_000
>>> s = pd.Series(([0]*n + [1]*n + + [2]*n))
>>> s.searchsorted(1)
array([100000], dtype=int32)  # master
100000  # this PR. Scalar input should lead to scalar output
>>> s.searchsorted([1])
array([100000], dtype=int32)  # master and this PR
@pep8speaks

This comment has been minimized.

Copy link

commented Nov 20, 2018

Hello @topper-123! Thanks for updating the PR.

Comment last updated on November 20, 2018 at 00:41 Hours UTC

@topper-123 topper-123 force-pushed the topper-123:Series.searchsorted_api branch from 9323f92 to ef5c259 Nov 20, 2018

@@ -27,7 +27,8 @@
is_bool, is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype,
is_datetimelike_v_numeric, is_datetimelike_v_object,
is_extension_array_dtype, is_interval_dtype, is_list_like, is_number,
is_period_dtype, is_sequence, is_timedelta64_dtype, needs_i8_conversion)
is_period_dtype, is_scalar, is_sequence, is_timedelta64_dtype,
needs_i8_conversion) # noqa

This comment has been minimized.

Copy link
@jreback

jreback Nov 20, 2018

Contributor

why did you change this?

This comment has been minimized.

Copy link
@topper-123

topper-123 Nov 20, 2018

Author Contributor

´´is_scalar`` wasn't in testing, I need it in my tests and think it's helpful to have in this common testing module.

I need to to for scalar because assert np.array([2]) == 2 passes without an AssertionError...

This comment has been minimized.

Copy link
@topper-123

topper-123 Nov 20, 2018

Author Contributor

Though I do get an flake8 error for unused import. Trying to resolve that...

@@ -1008,6 +1008,7 @@ Other API Changes
- Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
- :class:`DateOffset` attribute `_cacheable` and method `_should_cache` have been removed (:issue:`23118`)
- Comparing :class:`Timedelta` to be less or greater than unknown types now raises a ``TypeError`` instead of returning ``False`` (:issue:`20829`)
- :meth:`Series.searchsorted`, when supplied a scalar value to search for, now returns a scalar instead of an array (:issue:`xxxxx`).

This comment has been minimized.

Copy link
@jreback

jreback Nov 20, 2018

Contributor

use this PR number

same shape as `value`.
.. versionchanged :: 0.24.0
Ìf `value`is a scalar, an int is now always returned.

This comment has been minimized.

Copy link
@jschendel

jschendel Nov 20, 2018

Member
  • "Ìf" -- > "If"
  • missing a space between "`value`" and "is"

@topper-123 topper-123 force-pushed the topper-123:Series.searchsorted_api branch 2 times, most recently from c9b3c91 to e998d7c Nov 20, 2018

@@ -86,9 +86,11 @@ def test_searchsorted(self):
# Searching for single item argument, side='left' (default)
res_cat = c1.searchsorted('apple')
assert res_cat == 2
assert tm.is_scalar(res_cat)

This comment has been minimized.

Copy link
@topper-123

topper-123 Nov 20, 2018

Author Contributor

tm.is_scalar guards against res_cat being an array, as assert np.array(...) == 2 would always pass the assertion test, as long as the array is non-empty.

Another way would be to to do assert (res_cat == 2) is True.

@topper-123

This comment has been minimized.

Copy link
Contributor Author

commented Nov 20, 2018

TheTravis failure is unrelated (hypothesis related)

>   @given(index=indices(max_length=5), num_columns=integers(0, 5))
>   def test_frequency_is_original(self, index, num_columns):
pandas/tests/frame/test_apply.py:1160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../miniconda3/envs/pandas/lib/python2.7/site-packages/hypothesis/core.py:638: in execute
    ) % (test.__name__, text_repr[0],))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <hypothesis.core.StateForActualGivenExecution object at 0x7f2c035ae050>
message = "Hypothesis test_frequency_is_original(self=<pandas.tests.frame.test_apply.TestDataFrameAggregate instance at 0x7f2c3f...', freq=None), num_columns=3) produces unreliable results: Falsified on the first call but did not on a subsequent one"
    def __flaky(self, message):
        if len(self.falsifying_examples) <= 1:
>           raise Flaky(message)
E           Flaky: Hypothesis test_frequency_is_original(self=<pandas.tests.frame.test_apply.TestDataFrameAggregate instance at 0x7f2c3f1955a8>, index=DatetimeIndex(['2237-05-12', '2237-05-13', '2237-05-14', '2237-05-15'], dtype='datetime64[ns]', freq=None), num_columns=3) produces unreliable results: Falsified on the first call but did not on a subsequent one
../../../miniconda3/envs/pandas/lib/python2.7/site-packages/hypothesis/core.py:884: Flaky
@@ -182,22 +183,28 @@ def test_searchsorted_monotonic(indices):
# test searchsorted only for increasing
if indices.is_monotonic_increasing:

This comment has been minimized.

Copy link
@jreback

jreback Nov 20, 2018

Contributor

pls don't import is_scalar from testing, import it from the canonical location

pandas.api.types.is_scalar.

@@ -28,6 +28,7 @@
is_datetimelike_v_numeric, is_datetimelike_v_object,
is_extension_array_dtype, is_interval_dtype, is_list_like, is_number,
is_period_dtype, is_sequence, is_timedelta64_dtype, needs_i8_conversion)
from pandas.core.dtypes.common import is_scalar # noqa: F401

This comment has been minimized.

Copy link
@jreback

jreback Nov 20, 2018

Contributor

don't add this

This comment has been minimized.

Copy link
@jreback

jreback Nov 20, 2018

Contributor

these are internally used imports. this is not exporting these functions.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 9, 2018

@topper-123 happy to have this if it can be done in a reasonbale way. pls update or close.

@topper-123 topper-123 force-pushed the topper-123:Series.searchsorted_api branch from e998d7c to 9d74e8c Dec 9, 2018

@codecov

This comment has been minimized.

Copy link

commented Dec 9, 2018

Codecov Report

Merging #23801 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23801      +/-   ##
==========================================
+ Coverage   92.21%   92.28%   +0.07%     
==========================================
  Files         162      161       -1     
  Lines       51723    51459     -264     
==========================================
- Hits        47695    47491     -204     
+ Misses       4028     3968      -60
Flag Coverage Δ
#multiple 90.68% <100%> (+0.07%) ⬆️
#single 42.31% <33.33%> (-0.73%) ⬇️
Impacted Files Coverage Δ
pandas/core/base.py 97.61% <ø> (-0.03%) ⬇️
pandas/core/series.py 93.68% <100%> (-0.02%) ⬇️
pandas/util/testing.py 86.05% <100%> (-1.36%) ⬇️
pandas/core/sparse/scipy_sparse.py 97.05% <0%> (-2.95%) ⬇️
pandas/io/common.py 70.54% <0%> (-1.56%) ⬇️
pandas/io/html.py 91.2% <0%> (-1.47%) ⬇️
pandas/core/dtypes/common.py 94.37% <0%> (-1.26%) ⬇️
pandas/core/indexes/timedeltas.py 89.26% <0%> (-1.16%) ⬇️
pandas/io/formats/html.py 97.68% <0%> (-0.93%) ⬇️
... and 89 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77278ba...9d74e8c. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Dec 9, 2018

Codecov Report

Merging #23801 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23801      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%     
==========================================
  Files         162      162              
  Lines       51845    51846       +1     
==========================================
+ Hits        47852    47853       +1     
  Misses       3993     3993
Flag Coverage Δ
#multiple 90.7% <100%> (ø) ⬆️
#single 42.97% <16.66%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 93.71% <100%> (ø) ⬆️
pandas/core/base.py 97.66% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04a0eac...c09a411. Read the comment docs.

@topper-123 topper-123 force-pushed the topper-123:Series.searchsorted_api branch from 9d74e8c to 49bcca5 Dec 10, 2018

@topper-123

This comment has been minimized.

Copy link
Contributor Author

commented Dec 10, 2018

I've updated the PR.

@topper-123

This comment has been minimized.

Copy link
Contributor Author

commented Dec 13, 2018

Is thia ok? This would make the searchsorted method for all of the pandas objects return scalars for scalar inputs and make the API consistent.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 15, 2018

this PR looked ok, merge master.

@topper-123 topper-123 force-pushed the topper-123:Series.searchsorted_api branch from 49bcca5 to 4b31c78 Dec 16, 2018

@topper-123

This comment has been minimized.

Copy link
Contributor Author

commented Dec 16, 2018

Rebased.

Array of insertion points with the same shape as `value`.
int or array of ints
A scalar or array of insertion points with the
same shape as `value`.

This comment has been minimized.

Copy link
@jreback

jreback Dec 17, 2018

Contributor

@datapythonista is this the correct format?

This comment has been minimized.

Copy link
@datapythonista

datapythonista Dec 18, 2018

Member

looks good, but can you use int or array of int (I'd like to parse at some point the types from these, so I prefer the type name int over ints)

@jreback jreback added this to the 0.24.0 milestone Dec 17, 2018

Array of insertion points with the same shape as `value`.
int or array of ints
A scalar or array of insertion points with the
same shape as `value`.

This comment has been minimized.

Copy link
@datapythonista

datapythonista Dec 18, 2018

Member

looks good, but can you use int or array of int (I'd like to parse at some point the types from these, so I prefer the type name int over ints)

value : array_like
Values to insert into `self`.
value : scalar or array_like
Value(s) to insert into `self`.t

This comment has been minimized.

Copy link
@datapythonista

datapythonista Dec 18, 2018

Member
Suggested change
Value(s) to insert into `self`.t
Value(s) to insert into ``self.t``.

I think render the whole self.t as code makes more sense (also, we end parameter descriptions with a period).

This comment has been minimized.

Copy link
@topper-123

topper-123 Dec 18, 2018

Author Contributor

I think this is actually better: Value(s) to search for.

As we're nor actually inserting, but searching?

@topper-123 topper-123 force-pushed the topper-123:Series.searchsorted_api branch from b7e502e to 27af08e Dec 18, 2018

@topper-123 topper-123 force-pushed the topper-123:Series.searchsorted_api branch 2 times, most recently from 2545f4a to 8f9c917 Dec 21, 2018

@topper-123 topper-123 force-pushed the topper-123:Series.searchsorted_api branch from 8f9c917 to c09a411 Dec 21, 2018

@jreback jreback merged commit 8c58817 into pandas-dev:master Dec 21, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20181221.18 succeeded
Details

@topper-123 topper-123 deleted the topper-123:Series.searchsorted_api branch Dec 21, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.