Unexpected behavior when creating a new Series column with pl.date_range #7110

Closed
2 tasks done
guzmanlopez opened this issue Feb 22, 2023 · 6 comments · Fixed by #8513
Labels: bug (Something isn't working), python (Related to Python Polars)


guzmanlopez commented Feb 22, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

Creating a new Series column with pl.date_range works when the DataFrame has more than one row. If the DataFrame has exactly one row, a ShapeError is raised and the new column cannot be added.

Note:

  • using pl.arange to create a Series of integers works fine in both scenarios.

  • the current behavior of pl.date_range could prevent its use inside a groupby when a group contains only one row.

Reproducible example

from datetime import date

import polars as pl


# Create a dummy dataframe
df = pl.DataFrame(
    {
        "name": ["A", "B", "C"],
        "from": [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3)],
        "to": [date(2020, 1, 2), date(2020, 1, 3), date(2020, 1, 4)],
    }
)

# Check it works as expected with all the rows:
(
    df.with_columns(
        pl.date_range(
            low=pl.col("from"),
            high=pl.col("to"),
            interval="1d",
            lazy=True,
        ).alias("date_range")
    ).explode("date_range")
)

shape: (6, 4)
┌──────┬────────────┬────────────┬────────────┐
│ name ┆ from       ┆ to         ┆ date_range │
│ ---  ┆ ---        ┆ ---        ┆ ---        │
│ str  ┆ date       ┆ date       ┆ date       │
╞══════╪════════════╪════════════╪════════════╡
│ A    ┆ 2020-01-01 ┆ 2020-01-02 ┆ 2020-01-01 │
│ A    ┆ 2020-01-01 ┆ 2020-01-02 ┆ 2020-01-02 │
│ B    ┆ 2020-01-02 ┆ 2020-01-03 ┆ 2020-01-02 │
│ B    ┆ 2020-01-02 ┆ 2020-01-03 ┆ 2020-01-03 │
│ C    ┆ 2020-01-03 ┆ 2020-01-04 ┆ 2020-01-03 │
│ C    ┆ 2020-01-03 ┆ 2020-01-04 ┆ 2020-01-04 │
└──────┴────────────┴────────────┴────────────┘

# Check that it does not work as expected if we select only one row:
(
    df.limit(1)
    .with_columns(
        pl.date_range(
            low=pl.col("from"),
            high=pl.col("to"),
            interval="1d",
            lazy=True,
        ).alias("date_range")
    )
    .explode("date_range")
)


ShapeError: Could not add column. The Series length 2 differs from the DataFrame height: 1

Expected behavior

# The expected output should be:

shape: (2, 4)
┌──────┬────────────┬────────────┬────────────┐
│ name ┆ from       ┆ to         ┆ date_range │
│ ---  ┆ ---        ┆ ---        ┆ ---        │
│ str  ┆ date       ┆ date       ┆ date       │
╞══════╪════════════╪════════════╪════════════╡
│ A    ┆ 2020-01-01 ┆ 2020-01-02 ┆ 2020-01-01 │
│ A    ┆ 2020-01-01 ┆ 2020-01-02 ┆ 2020-01-02 │
└──────┴────────────┴────────────┴────────────┘
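
The expected semantics can be sketched in plain Python, without Polars: build one inclusive daily range per row, then emit one output row per date. This is only an illustration of the requested behavior; `daily_range` and `with_exploded_range` are hypothetical helper names, not Polars functions.

```python
from __future__ import annotations

from datetime import date, timedelta


def daily_range(start: date, end: date) -> list[date]:
    """Return every date from start to end, inclusive, one day apart."""
    n_days = (end - start).days
    return [start + timedelta(days=i) for i in range(n_days + 1)]


def with_exploded_range(rows: list[dict]) -> list[dict]:
    """Build one range per input row, then emit one output row per date."""
    out = []
    for row in rows:
        for d in daily_range(row["from"], row["to"]):
            out.append({**row, "date_range": d})
    return out


rows = [
    {"name": "A", "from": date(2020, 1, 1), "to": date(2020, 1, 2)},
    {"name": "B", "from": date(2020, 1, 2), "to": date(2020, 1, 3)},
    {"name": "C", "from": date(2020, 1, 3), "to": date(2020, 1, 4)},
]

all_rows = with_exploded_range(rows)     # full frame: two dates per row
one_row = with_exploded_range(rows[:1])  # single row: handled the same way
```

Under this model the number of input rows never changes how a row is processed, which is the property the report asks of pl.date_range.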

Installed versions

---Version info---
Polars: 0.16.8
Index type: UInt32
Platform: Linux-6.1.12-arch1-1-x86_64-with-arch
Python: 3.7.13 (default, Jun 13 2022, 16:06:34) 
[GCC 12.1.0]
---Optional dependencies---
pyarrow: 10.0.1
pandas: 1.3.5
numpy: 1.21.6
fsspec: 2023.1.0
connectorx: <not installed>
xlsx2csv: <not installed>
deltalake: <not installed>
matplotlib: 3.5.3
@guzmanlopez guzmanlopez added bug Something isn't working python Related to Python Polars labels Feb 22, 2023
@ritchie46
Member

I think you should ask a Stack Overflow question on how to achieve what you want. This doesn't seem like a bug to me.

@guzmanlopez
Author

guzmanlopez commented Feb 23, 2023

Hello @ritchie46, I have already solved what I wanted to achieve, because my dataset has multiple entries, so that is no problem. However, the unexpected behavior appeared while I was debugging with just one entry. I would expect pl.date_range to behave like pl.arange: store the range in the DataFrame so that it can be exploded in a later step, as in the following example:

df = pl.DataFrame(
    {
        "name": ["A", "B", "C"],
        "from": [1, 2, 3],
        "to": [3, 4, 5],
    }
)

# This works as expected for pl.arange:
(
    df.limit(1)
    .with_columns(
        pl.arange(
            low=pl.col("from"),
            high=pl.col("to"),
            step=1,
        ).alias("integers_range")
    )
    .explode("integers_range")
)

shape: (2, 4)
┌──────┬──────┬─────┬────────────────┐
│ name ┆ from ┆ to  ┆ integers_range │
│ ---  ┆ ---  ┆ --- ┆ ---            │
│ str  ┆ i64  ┆ i64 ┆ i64            │
╞══════╪══════╪═════╪════════════════╡
│ A    ┆ 1    ┆ 3   ┆ 1              │
│ A    ┆ 1    ┆ 3   ┆ 2              │
└──────┴──────┴─────┴────────────────┘

Using pl.date_range should not raise a ShapeError. Instead, it should return a DataFrame with the date range stored in it, regardless of the shape of the DataFrame. Both functions return ranges, but pl.date_range only allows storing the range in a column when the DataFrame has more than one row; with a single row it fails, which seems unexpected to me.

@ritchie46 ritchie46 self-assigned this Feb 24, 2023
@ritchie46
Member

Right, I see the problem now. I will take a look later.

@wangkev

wangkev commented Apr 2, 2023

I am also experiencing this issue in 0.16.16. Has there been a fix/workaround?

@MarcoGorelli
Collaborator

@wangkev I'm taking a look - for now you could use .implode() after the date_range call
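
Why `.implode()` helps can be seen from the lengths alone. The sketch below is a plain-Python model, not Polars code: `check_can_add` stands in for the length check behind the ShapeError and `implode` for gathering all values of a column into a single list entry. Both are hypothetical helpers written for this illustration.

```python
from datetime import date


def check_can_add(height: int, column: list) -> None:
    """Mimic the length check: a new column needs one entry per row."""
    if len(column) != height:
        raise ValueError(
            f"column length {len(column)} differs from frame height {height}"
        )


def implode(column: list) -> list:
    """Wrap all values into one list entry, giving a column of length 1."""
    return [column]


# The one-row call in the report produced two dates for a frame of height 1.
flat = [date(2020, 1, 1), date(2020, 1, 2)]

try:
    check_can_add(1, flat)
    flat_fits = True
except ValueError:
    flat_fits = False  # length 2 vs height 1, as in the reported error

imploded = implode(flat)
check_can_add(1, imploded)  # passes: one list entry for the single row
```

After the imploded column is added, exploding it would yield the two expected rows.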

@ritchie46
Member

> @wangkev I'm taking a look - for now you could use .implode() after the date_range call

@MarcoGorelli we do something naive in the function: we treat a single-value Series as a different operation. This is wrong, and we probably need to break some existing behavior here. Let me know if you need anything from me.
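
A toy model of that special case, assuming it amounts to returning a flat column when there is exactly one input row; this is an illustration in plain Python, not the Polars source, and `naive_ranges` and `consistent_ranges` are hypothetical names:

```python
from datetime import date, timedelta


def _days(start, end):
    """Inclusive daily dates from start to end."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def naive_ranges(starts, ends):
    """Special-case a single input row: return a flat column of dates."""
    if len(starts) == 1:
        return _days(starts[0], ends[0])  # length = number of dates
    return [_days(s, e) for s, e in zip(starts, ends)]  # length = number of rows


def consistent_ranges(starts, ends):
    """Always return one list per input row, whatever the row count."""
    return [_days(s, e) for s, e in zip(starts, ends)]


starts = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3)]
ends = [date(2020, 1, 2), date(2020, 1, 3), date(2020, 1, 4)]

naive_one = naive_ranges(starts[:1], ends[:1])
consistent_one = consistent_ranges(starts[:1], ends[:1])
```

In this model the two versions agree for multi-row input and differ only for a single row, where the naive one returns a column whose length no longer matches the frame height.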
