Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix wrong row indexing in the result for 'window after filter' for timecontext adjustment #2696

Merged

Conversation

LeeTZ
Copy link
Contributor

@LeeTZ LeeTZ commented Mar 19, 2021

Overview

When time context is adjusted, the result of 'window after filer' is wrong for pandas backend. This PR fixes the bug.

Example:

        window = ibis.trailing_window(ibis.interval(days=3), order_by='time')
        # add a filter before window
        expr = exp[expr['bool_col']]
        expr = expr.mutate(v1=expr['value'].count().over(window))
        result = expr.execute(timecontext=context)

The result looks like:

 index  Unnamed: 0    id bool_col  tinyint_col  smallint_col  ...  date_string_col  string_col           timestamp_col    year month    v1
0   3938.0      3938.0  40.0     True          0.0           0.0  ...         01/05/09           0 2009-01-05 00:40:01.800  2009.0   1.0  11.0
1      NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  12.0
2   3940.0      3940.0  42.0     True          2.0           2.0  ...         01/05/09           2 2009-01-05 00:42:01.810  2009.0   1.0  13.0
3      NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  14.0
4   3942.0      3942.0  44.0     True          4.0           4.0  ...         01/05/09           4 2009-01-05 00:44:01.860  2009.0   1.0  15.0
5      NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  11.0
6   3944.0      3944.0  46.0     True          6.0           6.0  ...         01/05/09           6 2009-01-05 00:46:01.950  2009.0   1.0  12.0
7      NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  13.0
8   3946.0      3946.0  48.0     True          8.0           8.0  ...         01/05/09           8 2009-01-05 00:48:02.800  2009.0   1.0  14.0
9      NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  15.0
10  3948.0      3948.0  50.0     True          0.0           0.0  ...         01/06/09           0 2009-01-06 00:50:02.250  2009.0   1.0  11.0
11     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  12.0
12  3950.0      3950.0  52.0     True          2.0           2.0  ...         01/06/09           2 2009-01-06 00:52:02.260  2009.0   1.0  13.0
13     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  14.0
14  3952.0      3952.0  54.0     True          4.0           4.0  ...         01/06/09           4 2009-01-06 00:54:02.310  2009.0   1.0  15.0
15     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  11.0
16  3954.0      3954.0  56.0     True          6.0           6.0  ...         01/06/09           6 2009-01-06 00:56:02.400  2009.0   1.0  12.0
17     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  13.0
18  3956.0      3956.0  58.0     True          8.0           8.0  ...         01/06/09           8 2009-01-06 00:58:02.530  2009.0   1.0  14.0
19     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  15.0
20  3958.0      3958.0  60.0     True          0.0           0.0  ...         01/07/09           0 2009-01-07 01:00:02.700  2009.0   1.0  11.0
21     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  12.0
22  3960.0      3960.0  62.0     True          2.0           2.0  ...         01/07/09           2 2009-01-07 01:02:02.710  2009.0   1.0  13.0
23     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  14.0
24  3962.0      3962.0  64.0     True          4.0           4.0  ...         01/07/09           4 2009-01-07 01:04:02.760  2009.0   1.0  15.0
25     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  11.0
26  3964.0      3964.0  66.0     True          6.0           6.0  ...         01/07/09           6 2009-01-07 01:06:02.850  2009.0   1.0  12.0
27     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  13.0
28  3966.0      3966.0  68.0     True          8.0           8.0  ...         01/07/09           8 2009-01-07 01:08:02.980  2009.0   1.0  14.0
29     NaN         NaN   NaN      NaN          NaN           NaN  ...              NaN         NaN                     NaT     NaN   NaN  15.0
30  3968.0      3968.0  70.0     True          0.0           0.0  ...         01/08/09           0 2009-01-08 01:10:03.150  2009.0   1.0   NaN
31  3970.0      3970.0  72.0     True          2.0           2.0  ...         01/08/09           2 2009-01-08 01:12:03.160  2009.0   1.0   NaN
32  3972.0      3972.0  74.0     True          4.0           4.0  ...         01/08/09           4 2009-01-08 01:14:03.210  2009.0   1.0   NaN
33  3974.0      3974.0  76.0     True          6.0           6.0  ...         01/08/09           6 2009-01-08 01:16:03.300  2009.0   1.0   NaN
34  3976.0      3976.0  78.0     True          8.0           8.0  ...         01/08/09           8 2009-01-08 01:18:03.430  2009.0   1.0   NaN
35  3978.0      3978.0  80.0     True          0.0           0.0  ...         01/09/09           0 2009-01-09 01:20:03.600  2009.0   1.0   NaN
.....

Note that there are many NaN rows. This is because indexes for the result series are misaligned. Caused by reset_index in time context trimming.

Another minor change in this PR is to rename trim_with_timecontext to be trim_window_series. The method is used to trim the resulting series of window execution only. The method name should not sound that general.

Test

A new test is added for this case in ibis/backends/test/test_timecontext.py

# re-indexing index to count from 0
subset = subset.reset_index(drop=True).reset_index()
subset = subset.reset_index(drop=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need this

@LeeTZ LeeTZ force-pushed the tiezheng/pandas_context_adjustment_nan_rows branch from 04cef7f to 841dd3e Compare March 19, 2021 19:03
@LeeTZ LeeTZ changed the title [WIP] Fix wrong row indexing in the result for 'window after filter' for timecontext adjustment Fix wrong row indexing in the result for 'window after filter' for timecontext adjustment Mar 22, 2021
@icexelloss icexelloss changed the title Fix wrong row indexing in the result for 'window after filter' for timecontext adjustment BUG: Fix wrong row indexing in the result for 'window after filter' for timecontext adjustment Mar 22, 2021
Copy link
Contributor

@icexelloss icexelloss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. can you add a whatsnew note.

@jreback jreback added this to the Next release milestone Mar 22, 2021
@jreback
Copy link
Contributor

jreback commented Mar 22, 2021

@LeeTZ can you merge master one more time

@LeeTZ
Copy link
Contributor Author

LeeTZ commented Mar 23, 2021

@jreback thanks I rebased and checks are green now, also added release note

@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

lgtm @LeeTZ can you merge master and ping on green.

@LeeTZ
Copy link
Contributor Author

LeeTZ commented Mar 23, 2021

@jreback Sure all green now

@jreback jreback merged commit 2fd8cc1 into ibis-project:master Mar 23, 2021
@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

thanks @LeeTZ

@cpcloud cpcloud removed this from the Next release milestone Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pandas The pandas backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants