$join() with join_nulls = TRUE doesn't produce expected result with DataFrames #945
Comments
Thanks for reporting! If it does not reproduce in Python, it is a problem with this repository and I would be happy for you to work on it; if you get the same result in Python, we will have to report the problem upstream.
Thanks! I checked in Python and it returned the same (expected) output for LazyFrames and DataFrames, so it seems to be a problem with r-polars.

import polars as pl

df1 = pl.DataFrame(
    {
        "x": ["John"],
        "y": [None]
    }
)

df2 = pl.DataFrame(
    {
        "x": ["John"],
        "y": [None],
        "value": [1]
    }
)

print(
    df1
    .join(
        other = df2,
        how = "left",
        on = ["x", "y"],
        join_nulls = True
    )
)
#> shape: (1, 3)
#> ┌──────┬──────┬───────┐
#> │ x    ┆ y    ┆ value │
#> │ ---  ┆ ---  ┆ ---   │
#> │ str  ┆ null ┆ i64   │
#> ╞══════╪══════╪═══════╡
#> │ John ┆ null ┆ 1     │
#> └──────┴──────┴───────┘

# lazy
print(
    df1.lazy()
    .join(
        other = df2.lazy(),
        how = "left",
        on = ["x", "y"],
        join_nulls = True
    )
    .collect()
)
#> shape: (1, 3)
#> ┌──────┬──────┬───────┐
#> │ x    ┆ y    ┆ value │
#> │ ---  ┆ ---  ┆ ---   │
#> │ str  ┆ null ┆ i64   │
#> ╞══════╪══════╪═══════╡
#> │ John ┆ null ┆ 1     │
#> └──────┴──────┴───────┘

Created at 2024-03-19 16:43:14 GMT by reprexlite v0.5.0
The bug is actually very simple: the user can provide the Lines 1008 to 1029 in d30005b
@george-wood I think this is a good first issue. Do you want to fix this and add a test? Feel free to open an incomplete PR if you struggle somewhere, and we can see how to carry it over the finish line.
@etiennebacher I'd be happy to.
I noticed that the |
I was using the join_nulls argument in $join(), added in #716. Joining LazyFrames with join_nulls = TRUE produces the expected result. However, join_nulls seems to be ignored when joining DataFrames. I'd be happy to look into this if it might be a good first issue?

Created on 2024-03-19 with reprex v2.1.0
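For anyone unsure what join_nulls is meant to change, here is a rough pure-Python sketch of the semantics as I understand them. It is not how polars implements joins: left_join and the list-of-dicts "frames" are made up for illustration, with None standing in for null. With join_nulls off, a key containing a null never matches; with it on, null keys compare equal to each other.

```python
from typing import Any

Row = dict[str, Any]


def left_join(
    left: list[Row], right: list[Row], on: list[str], join_nulls: bool = False
) -> list[Row]:
    """Hypothetical helper: left join over lists of row dicts (None = null)."""
    # Columns that only the right table contributes to the result.
    right_cols = [c for row in right[:1] for c in row if c not in on]
    out: list[Row] = []
    for lrow in left:
        lkey = tuple(lrow[c] for c in on)
        if not join_nulls and any(v is None for v in lkey):
            # Default behaviour: a null in the key matches nothing.
            matches: list[Row] = []
        else:
            # None == None is True in Python, so null keys match each other here.
            matches = [r for r in right if tuple(r[c] for c in on) == lkey]
        if matches:
            for r in matches:
                out.append({**lrow, **{c: r[c] for c in right_cols}})
        else:
            # Unmatched left rows are kept, with nulls for the right columns.
            out.append({**lrow, **{c: None for c in right_cols}})
    return out


df1 = [{"x": "John", "y": None}]
df2 = [{"x": "John", "y": None, "value": 1}]

with_nulls = left_join(df1, df2, on=["x", "y"], join_nulls=True)
without_nulls = left_join(df1, df2, on=["x", "y"], join_nulls=False)
```

In this sketch, with_nulls carries value 1 for John, while without_nulls leaves value as None, which is the kind of result you would see if join_nulls were silently dropped.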
There is a test for join_nulls = TRUE using LazyFrames, but not DataFrames:
r-polars/tests/testthat/test-joins.R, line 173 in 0f98aa2