with_context not present for pl.DataFrame #14775

Closed
h4ck4l1 opened this issue Feb 29, 2024 · 9 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@h4ck4l1

h4ck4l1 commented Feb 29, 2024

Description

Hello polars team,
Big fan. I just noticed that LazyFrames have the method with_context while eager DataFrames do not.
I just wanted to know the reason behind that.
Thank you!

@h4ck4l1 h4ck4l1 added the enhancement New feature or an improvement of an existing feature label Feb 29, 2024
@h4ck4l1
Author

h4ck4l1 commented Feb 29, 2024

It makes no difference to me, though, as it's only a matter of calling .lazy() before using .with_context. I am just curious.

@cmdlineluser
Contributor

I think .with_context existed as a workaround for the lack of a horizontal concat on LazyFrames.

That was recently added:

I recall reading in one of the previous issues that .with_context may end up being deprecated now?

@mcrumiller
Contributor

mcrumiller commented Feb 29, 2024

> I recall reading in one of the previous issues that .with_context may end up being deprecated now?

I don't think it should, with_context allows for aggregate operations on different sized frames, which is nice:

>>> import polars as pl
>>> ldf1 = pl.LazyFrame({"a": [1, 2, 3]})
>>> ldf2 = pl.LazyFrame({"b": [1, 2, 3, 4]})
>>> ldf1.with_context(ldf2).select(pl.all().sum()).collect()
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 6   ┆ 10  │
└─────┴─────┘

@h4ck4l1
Author

h4ck4l1 commented Feb 29, 2024

Oh, I didn't know that. So it was a workaround.
But doesn't .with_context have its own unique advantages?
Let me give you an example where I am struggling without .with_context on DataFrames/LazyFrames.

Say I want to build a mask from a different DataFrame and plot it directly.

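import numpy as np
import polars as pl

# Frame A: 1000 consecutive ids with a random string column.
A = (
    pl.DataFrame({
        "id": np.arange(1000),
        "some_random_strcol": np.random.choice(a=["a", "b", "c"], size=1000)
    })
)

# Frame B: 500 ids sampled without replacement from the same range, sorted by id.
B = (
    pl.DataFrame({
        "id": np.random.choice(1000, size=500, replace=False),
        "some_random_strcol": np.random.choice(a=["a", "b", "c"], size=500)
    })
    .sort(by="id")
)

# Expose B's columns (suffixed with "_b") in A's context, flag which ids of A
# also occur in B, and plot the 0/1 flag against id.
(
    A
    .lazy()
    .with_context(
        # .name.suffix replaces the deprecated Expr.suffix
        B.lazy().select(pl.all().name.suffix("_b"))
    )
    .select(
        pl.col("id"),
        pl.col("id").is_in(pl.col("id_b")).cast(pl.Int8).alias("new_col")
    )
    .collect()
    .plot.line(
        x="id",
        y="new_col"
    )
)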

Is there an alternative? I learnt .with_context in a Udemy course, and it's quite helpful when I want to use another DataFrame's columns for a quick visualization (though they didn't teach that).
The two frames have different sizes, so .with_context removes the need to select/extract Series from both frames and operate on them.

This is why I was curious as to why eager frames don't have .with_context. It's such a unique and novel feature compared to pandas.

@h4ck4l1
Author

h4ck4l1 commented Feb 29, 2024

> > I recall reading in one of the previous issues that .with_context may end up being deprecated now?
>
> I don't think it should, with_context allows for aggregate operations on different sized frames, which is nice:
>
> >>> import polars as pl
> >>> ldf1 = pl.LazyFrame({"a": [1, 2, 3]})
> >>> ldf2 = pl.LazyFrame({"b": [1, 2, 3, 4]})
> >>> ldf1.with_context(ldf2).select(pl.all().sum()).collect()
> shape: (1, 2)
> ┌─────┬─────┐
> │ a   ┆ b   │
> │ --- ┆ --- │
> │ i64 ┆ i64 │
> ╞═════╪═════╡
> │ 6   ┆ 10  │
> └─────┴─────┘

Yes, exactly. And .with_context should also be extended to pl.DataFrame if there are no downsides to it.

@stinodego
Member

LazyFrame.with_context will be deprecated with the release of 1.0.0 (#16860). We will not fix any bugs associated with it and we will not be adding it to DataFrame.

Please use pl.concat(..., how="horizontal") instead to combine DataFrames/LazyFrames into a single context.

@stinodego stinodego closed this as not planned Jun 10, 2024
@h4ck4l1
Author

h4ck4l1 commented Jun 25, 2024

I wish I at least knew the reason, since I opened this issue out of curiosity, but thank you for communicating the changes. Now I know what to use and what not to.

@cmdlineluser
Contributor

#16444 (comment)

Yes! I want to deprecate and remove with_context. It is hacky and buggy design.

@h4ck4l1
Author

h4ck4l1 commented Jun 25, 2024

@cmdlineluser Thank you!
