Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

date_ranges: ComputeError and different behaviour than int_ranges #11892

Closed
2 tasks done
Julian-J-S opened this issue Oct 20, 2023 · 7 comments · Fixed by #11900
Closed
2 tasks done

date_ranges: ComputeError and different behaviour than int_ranges #11892

Julian-J-S opened this issue Oct 20, 2023 · 7 comments · Fixed by #11900
Assignees
Labels
A-timeseries Area: date/time functionality accepted Ready for implementation bug Something isn't working python Related to Python Polars

Comments

@Julian-J-S
Copy link
Contributor

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

(
    pl.DataFrame({
        'asc': [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)],
        'desc': [date(2021, 1, 3), date(2021, 1, 2), date(2021, 1, 1)],
        'end': date(2021, 1, 3),
    })
    .with_columns(
        asc_end=pl.date_ranges(start='asc', end='end'),

        # <<< ComputeError: `start` and `end` must have the same length >>>
        # asc_end_manual=pl.date_ranges(start='asc', end=date(2021, 1, 3)),

        # <<< ComputeError: `end` must be equal to or greater than `start` >>>
        # asc_desc=pl.date_ranges(start='asc', end='desc'),
        # desc_asc=pl.date_ranges(start='desc', end='asc'),
    )
)
┌────────────┬────────────┬────────────┬───────────────────────────────────┐
│ ascdescendasc_end                           │
│ ------------                               │
│ datedatedatelist[date]                        │
╞════════════╪════════════╪════════════╪═══════════════════════════════════╡
│ 2021-01-012021-01-032021-01-03 ┆ [2021-01-01, 2021-01-02, 2021-01… │
│ 2021-01-022021-01-022021-01-03 ┆ [2021-01-02, 2021-01-03]          │
│ 2021-01-032021-01-012021-01-03 ┆ [2021-01-03]                      │
└────────────┴────────────┴────────────┴───────────────────────────────────┘

# working with int_ranges
(
    pl.DataFrame({
        'asc': [1, 2, 3],
        'desc': [3, 2, 1],
        'end': 3,
    })
    .with_columns(
        asc_end=pl.int_ranges(start='asc', end='end'),
        asc_end_manual=pl.int_ranges(start='asc', end=3),
        asc_desc=pl.int_ranges(start='asc', end='desc'),
        desc_asc=pl.int_ranges(start='desc', end='asc'),
    )
)
┌─────┬──────┬─────┬───────────┬────────────────┬───────────┬───────────┐
│ ascdescendasc_endasc_end_manualasc_descdesc_asc  │
│ ---------------------       │
│ i64i64i64list[i64] ┆ list[i64]      ┆ list[i64] ┆ list[i64] │
╞═════╪══════╪═════╪═══════════╪════════════════╪═══════════╪═══════════╡
│ 133   ┆ [1, 2]    ┆ [1, 2]         ┆ [1, 2]    ┆ []        │
│ 223   ┆ [2]       ┆ [2]            ┆ []        ┆ []        │
│ 313   ┆ []        ┆ []             ┆ []        ┆ [1, 2]    │
└─────┴──────┴─────┴───────────┴────────────────┴───────────┴───────────┘

Log output

No response

Issue description

ComputeError: start and end must have the same length

"manually" specifying start/end should work but is does not

ComputeError: end must be equal to or greater than start

int_ranges behaves different here. I think this should be consistent and using the int version. If end is smaller than start the range should be empty.

Expected behavior

examples above should all work.

  • "manually" specifying start/end should work (according to documentation and like in int_ranges)
  • if end is smaller than start an empty list should be returned (consistent with int_ranges)

Installed versions

Replace this line with the output of pl.show_versions(). Leave the backticks in place.
@Julian-J-S Julian-J-S added bug Something isn't working python Related to Python Polars labels Oct 20, 2023
@stinodego
Copy link
Member

"manually" specifying start/end should work but is does not

Yes it should, this is a bug.

if end is smaller than start an empty list should be returned (consistent with int_ranges)

I'm not so sure about this one though...

@stinodego stinodego added the accepted Ready for implementation label Oct 20, 2023
@stinodego stinodego self-assigned this Oct 20, 2023
@deanm0000
Copy link
Collaborator

I'm not so sure about this one though...

Maybe a strict=False mode. Given that people might want to use this in a when/then/otherwise, I like the idea of preventing ignoring the error like we have with cast

As a work around OP could do something like:

desc_asc=pl.date_ranges(start=(pl.col('desc')-pl.duration(days=1)).clip_max(pl.col('asc')), end='asc', closed='right')

@Julian-J-S
Copy link
Contributor Author

if end is smaller than start an empty list should be returned (consistent with int_ranges)

I'm not so sure about this one though...

I think both are reasonable options but it currently is inconsistent

  • date_ranges: 2022-01-03 to 2022-01-01 -> ERROR
  • int_ranges: 3 to 1 -> []

I personally prefer the int_ranges behaviour. If you want an ascending range and the end is lower than the start then there is no item in that range, ergo an empty list. No need to panic/error. An empty list is a very reasonable result here.

@stinodego
Copy link
Member

stinodego commented Oct 20, 2023

Hm yeah, I think I agree...

@MarcoGorelli what are your thoughts on this one?

@stinodego stinodego added the A-timeseries Area: date/time functionality label Oct 20, 2023
@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Oct 20, 2023

If you want an ascending range and the end is lower than the start then there is no item in that range, ergo an empty list. No need to panic/error. An empty list is a very reasonable result here.

Yup, this is what I'd expect; matches both Python and Rust behaviour with ranges (where start > end), so would be consistent with both of our main underlying languages/syntaxes. I don't think anyone would be surprised by it? 👍

@MarcoGorelli
Copy link
Collaborator

Sounds reasonable - just one thing to note, I think that for start == end and closed='both', the output would be a single-element series, rather than an empty one?

@deanm0000
Copy link
Collaborator

Sounds reasonable - just one thing to note, I think that for start == end and closed='both', the output would be a single-element series, rather than an empty one?

Seems correct to me, that's how it is now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-timeseries Area: date/time functionality accepted Ready for implementation bug Something isn't working python Related to Python Polars
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants