Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to select right table join key from join output #13130

Closed
2 tasks done
cpcloud opened this issue Dec 19, 2023 · 13 comments
Closed
2 tasks done

Unable to select right table join key from join output #13130

cpcloud opened this issue Dec 19, 2023 · 13 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@cpcloud
Copy link

cpcloud commented Dec 19, 2023

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

left = pl.DataFrame({"x": [1, 2, 3, 4]}).lazy()
right = pl.DataFrame({"x": [3, 2, 1]}).lazy()

left.join(right, on="x", how="left").select(
    # what goes here to select right["x"]
)

Log output

No response

Issue description

I would like to be able to operate on columns coming from the right side of a join, without having to rename them beforehand.

Expected behavior

I would expect there to be some way to select the right side join key without having to rename.

Installed versions

--------Version info---------
Polars:               0.20.0
Index type:           UInt32
Platform:             Linux-6.1.67-x86_64-with-glibc2.38
Python:               3.10.13 (main, Aug 24 2023, 12:59:26) [GCC 12.3.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            0.14.0
fsspec:               2023.6.0
gevent:               <not installed>
matplotlib:           3.8.2
numpy:                1.26.2
openpyxl:             <not installed>
pandas:               2.1.4
pyarrow:              14.0.1
pydantic:             2.5.2
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           1.4.50
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@cpcloud cpcloud added bug Something isn't working python Related to Python Polars labels Dec 19, 2023
@ritchie46
Copy link
Member

The right key is redundant they will always be the same so polars doesn't create it. Select "x".

@ritchie46 ritchie46 added invalid A bug report that is not actually a bug and removed bug Something isn't working labels Dec 19, 2023
@cpcloud
Copy link
Author

cpcloud commented Dec 19, 2023

The right key is redundant they will always be the same so polars doesn't create it. Select "x".

Is it?

Joins in other systems don't work that way and I think that's only true when the join is an inner join and the predicates are all equality predicates. Here's DuckDB:

D create or replace table t as select unnest([1, 2, 3, 4]) x;
D create or replace table s as select unnest([3, 2, 1]) x;
D select t.x as t_x, s.x as s_x from t left join s using (x);
┌───────┬───────┐
│  t_x  │  s_x  │
│ int32 │ int32 │
├───────┼───────┤
│     1 │     1 │
│     2 │     2 │
│     3 │     3 │
│     4 │       │
└───────┴───────┘

s.x, the right key, has different values than t.x.

@cmdlineluser
Copy link
Contributor

This functionality was also asked about on SO earlier: https://stackoverflow.com/questions/77684593/

User wanted ham_right without having to manually .with_columns

df = pl.DataFrame({
    "foo": [1, 2, 3],
    "bar": [6.0, 7.0, 8.0],
    "ham": ["a", "b", "c"],
})

other_df = pl.DataFrame({
    "apple": ["x", None, "z"],
    "ham": ["a", "b", "d"],
})

df.join(other_df.with_columns(pl.col("ham").name.suffix("_right")), on="ham", how="left")

# shape: (3, 5)
# ┌─────┬─────┬─────┬───────┬───────────┐
# │ foo ┆ bar ┆ ham ┆ apple ┆ ham_right │
# │ --- ┆ --- ┆ --- ┆ ---   ┆ ---       │
# │ i64 ┆ f64 ┆ str ┆ str   ┆ str       │
# ╞═════╪═════╪═════╪═══════╪═══════════╡
# │ 1   ┆ 6.0 ┆ a   ┆ x     ┆ a         │
# │ 2   ┆ 7.0 ┆ b   ┆ null  ┆ b         │
# │ 3   ┆ 8.0 ┆ c   ┆ null  ┆ null      │
# └─────┴─────┴─────┴───────┴───────────┘

@ritchie46
Copy link
Member

Oh, right. I thought about inner join. My bad.

@ritchie46 ritchie46 removed the invalid A bug report that is not actually a bug label Dec 19, 2023
@ritchie46
Copy link
Member

Yeap, good point. We changed this behavior for outer joins. Might need to do that for left as well.

@ritchie46 ritchie46 added enhancement New feature or an improvement of an existing feature and removed python Related to Python Polars labels Dec 19, 2023
@deanm0000
Copy link
Collaborator

deanm0000 commented Dec 19, 2023

join_asof too please.

#6165

@mcrumiller
Copy link
Contributor

If someone is diving in to determine which columns are returned, #9621 is another good one to address. Currently when you join on an expression, you may or may not end up with an unexpected literal column returned.

@mcrumiller
Copy link
Contributor

@cpcloud for the time being, left joins will always either return the left-column's value or a null, so you can proxy this with something like:

import polars as pl
from polars import col, when

left = pl.DataFrame({"x": [1, 2, 3, 4]})
right = pl.DataFrame({"x": [3, 2, 1]})

def join_with_right_columns(left, right, on, *args, **kwargs):
    return left.join(right.with_columns(join_success=pl.lit(True)), on=on, *args, **kwargs).with_columns(
        when(col("join_success").is_not_null()).then(x).alias(f"{x}_right")
        for x in on
    ).drop("join_success")

out = join_with_right_columns(left, right, on="x", how="left")
print(out)
shape: (4, 2)
┌─────┬─────────┐
│ x   ┆ x_right │
│ --- ┆ ---     │
│ i64 ┆ i64     │
╞═════╪═════════╡
│ 1   ┆ 1       │
│ 2   ┆ 2       │
│ 3   ┆ 3       │
│ 4   ┆ null    │
└─────┴─────────┘

@Arengard
Copy link

Hi Phillip just copy the the join column :)

left = pl.DataFrame({"x": [1, 2, 3, 4]}).lazy()
right = pl.DataFrame({"x": [3, 2, 1], "right_x": [3, 2, 1]}).lazy()

left.join(right, on="x", how="left").collect()

@deanm0000
Copy link
Collaborator

deanm0000 commented Dec 20, 2023

import polars as pl
from polars import col, when

left = pl.DataFrame({"x": [1, 2, 3, 4]})
right = pl.DataFrame({"x": [3, 2, 1]})

def join_with_right_columns(left, right, on, *args, **kwargs):
    return left.join(right.with_columns(join_success=pl.lit(True)), on=on, *args, **kwargs).with_columns(
        when(col("join_success").is_not_null()).then(x).alias(f"{x}_right")
        for x in on
    ).drop("join_success")

out = join_with_right_columns(left, right, on="x", how="left")
print(out)
shape: (4, 2)
┌─────┬─────────┐
│ x   ┆ x_right │
│ --- ┆ ---     │
│ i64 ┆ i64     │
╞═════╪═════════╡
│ 1   ┆ 1       │
│ 2   ┆ 2       │
│ 3   ┆ 3       │
│ 4   ┆ null    │
└─────┴─────────┘

Wouldn't it be simpler to just do right.with_columns(pl.col(on).name.suffix("_right")

@stinodego
Copy link
Member

stinodego commented Jan 8, 2024

Closing in favor of #13441

@stinodego stinodego closed this as not planned Won't fix, can't repro, duplicate, stale Jan 8, 2024
@cpcloud
Copy link
Author

cpcloud commented Jan 8, 2024

@stinodego Did you mean to link a different issue?

@stinodego
Copy link
Member

Yes, updated. We will implement a left_coalesce join and the default join will be as you request it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature
Projects
None yet
Development

No branches or pull requests

7 participants