Pandas Series.ne operator returning unexpected result against two slices of same Series #19855

Closed
hunterjackson opened this issue Feb 23, 2018 · 1 comment
Labels: Indexing, Numeric Operations, Usage Question

Comments

@hunterjackson

So I have this series of integers shown below

    from pandas import Series
    s = Series([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

And I want to see how many times the number changes over the series, so I compare two slices of the same Series to one another.

    s[1:].ne(s[:-1])
    Out[4]: 
    0      True
    1     False
    2     False
    3     False
    4     False
    5     False
    6     False
    7     False
    8     False
    9     False
    10    False
    11    False
    12    False
    13    False
    14    False
    15    False
    16    False
    17    False
    18    False
    19    False
    20    False
    21    False
    22    False
    23    False
    24    False
    25    False
    26    False
    27    False
    28    False
    29    False
    30    False
    31    False
    32    False
    33    False
    34    False
    35    False
    36    False
    37    False
    38    False
    39     True
    dtype: bool

Not only does the output of the Series.ne method not make any logical sense to me, but the output is also longer than either of the inputs, which is especially confusing.

I think this might be related to #1134.

Apologies if this isn't an issue, but I haven't been able to find any satisfactory explanation for this behavior.

tl;dr:

Where s is a pandas.Series of ints:

    [i != s[:-1][idx] for idx, i in enumerate(s[1:])] != s[:-1].ne(s[1:]).tolist()

@jreback
Contributor

jreback commented Feb 23, 2018

This was changed here: #13894

Using a shorter sequence just to make this fit in a reasonable amount of space:

    In [38]: s = s[0:5]

    In [39]: s
    Out[39]: 
    0    1
    1    2
    2    3
    3    1
    4    2
    dtype: int64

    In [40]: a = s[1:]

    In [41]: b = s[:-1]

    In [42]: a
    Out[42]: 
    1    2
    2    3
    3    1
    4    2
    dtype: int64

    In [43]: b
    Out[43]: 
    0    1
    1    2
    2    3
    3    1
    dtype: int64

These are aligned before the comparison, meaning that both are reindexed to the union of their indexes, with NaN filled in wherever a label is missing from one side:

    In [44]: a, b = a.align(b)

    In [45]: a
    Out[45]: 
    0    NaN
    1    2.0
    2    3.0
    3    1.0
    4    2.0
    dtype: float64

    In [46]: b
    Out[46]: 
    0    1.0
    1    2.0
    2    3.0
    3    1.0
    4    NaN
    dtype: float64

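The comparison is then clear (using == for clarity here). Labels 0 and 4 are compared against NaN, so they come out False, and every other label compares an element with itself: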

    In [47]: a == b
    Out[47]: 
    0    False
    1     True
    2     True
    3     True
    4    False
    dtype: bool
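
If the goal is the original one of counting how often consecutive values differ, one option is to compare by position rather than by label. The following is a minimal sketch and not part of the thread itself: the example series and the variable names (`aligned`, `positional`, `n_changes`) are chosen purely for illustration.

```python
import pandas as pd

# Illustrative series (an assumption, not the data from the issue).
s = pd.Series([1, 2, 2, 3, 3])

# Label-aligned comparison, as Series.ne does it: the result covers the
# union of both indexes, and labels missing on one side compare against NaN.
aligned = s.iloc[1:].ne(s.iloc[:-1])

# Positional comparison: converting both sides to ndarrays drops the index,
# so element i of one slice is compared with element i of the other.
positional = s.iloc[1:].to_numpy() != s.iloc[:-1].to_numpy()

# Staying in pandas: compare each element with its predecessor via shift(),
# then skip the first row, which has no predecessor.
n_changes = int(s.ne(s.shift()).iloc[1:].sum())
```

Here `aligned` has five entries (one per label in the union), while `positional` has four, one per adjacent pair.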

jreback closed this as completed Feb 23, 2018
jreback added the Indexing and Numeric Operations labels Feb 23, 2018
jreback added this to the No action milestone Feb 23, 2018
ctberthiaume added a commit to seaflow-uw/seaflowpy that referenced this issue Sep 24, 2020
After updating to pandas 1.1.0 the bead location summary plots broke. These matplotlib
timeseries plots used to be able to take pandas DateTime objects directly. After the
update it became necessary to explicitly convert to matplotlib dates first with
matplotlib.dates.date2num.

When bead finding input data is above the user specified event limit it will be randomly
sampled to fit within the limit. This rearranges the dataframe index, which had unexpected
effects during rough filtering because mark_noise and mark_saturated would return
Series with new indexes. Subsequent comparison with the original dataframe would align
along the indexes, which now did not match.

Fixed this by 1) resetting the index after sampling and 2) returning ndarrays from
mark_noise and mark_saturated so that index alignment won't happen anyway. When comparing
pandas Series, be aware of your indexes!
e.g. pandas-dev/pandas#19855
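
The two fixes described in that commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the DataFrame, the column `x`, and the helper `mark_zero` are hypothetical stand-ins and are not taken from seaflowpy.

```python
import numpy as np
import pandas as pd


def mark_zero(frame: pd.DataFrame) -> np.ndarray:
    """Hypothetical marking helper: returns a plain boolean ndarray, not a Series."""
    return (frame["x"] == 0).to_numpy()


df = pd.DataFrame({"x": [5, 0, 7, 0, 9, 3]})

# Sampling keeps the original row labels, so the index generally no longer
# runs 0..n-1 in order.
sampled = df.sample(n=4, random_state=0)

# Fix 1: reset the index so that labels match positions again.
sampled = sampled.reset_index(drop=True)

# Fix 2: use an ndarray mask, which is applied by position with no label alignment.
mask = mark_zero(sampled)
kept = sampled[~mask]
```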