left_join with by.x and by.y #62

Closed
Mrostgaard opened this issue Sep 23, 2021 · 7 comments
Labels
doc Improvements or additions to documentation

Comments

@Mrostgaard

Hello

So I have some R code that looks like this:

new_df = df1 %>% merge(df2, by.x = "col1", by.y = "col2", all.x = TRUE)
I'm trying to do a left merge on two differently named columns. Both data frames have the by.y column (col2), but only df1 has col1.
When I try to do it the datar way, like this:
new_df = df1 >> left_join(df2, by=["col1", "col2"])

I get the error:
KeyError: 'col1'

Am I doing something wrong? Or is it not possible to do by.x, by.y like in R?

When doing it the pandas way like:

new_df = pd.merge(df1, df2, left_on = "col1", right_on = "col2", how="left")

It returns col2_y and col2_x, which I'm not interested in. This is not a problem in the R code.
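
For reference, the suffixed columns can be reproduced and avoided in plain pandas. The sketch below uses made-up toy frames (the `val` column and all values are assumptions, since the real data was not shared) that mirror the layout described above: `df1` has both `col1` and `col2`, `df2` has only `col2`. It keeps the left frame's `col2` unsuffixed and drops the right-hand key, which roughly matches what R's `merge` returns with `by.x`/`by.y`.

```python
import pandas as pd

# Toy stand-ins for the real data (hypothetical values).
df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
df2 = pd.DataFrame({"col2": [1, 2], "val": [10.0, 20.0]})

# Naive merge: both frames own a "col2", so pandas suffixes them.
naive = pd.merge(df1, df2, left_on="col1", right_on="col2", how="left")
# naive columns: col1, col2_x, col2_y, val

# Leave the left "col2" unsuffixed, then drop the redundant right-hand key.
merged = pd.merge(
    df1, df2,
    left_on="col1", right_on="col2",
    how="left",
    suffixes=("", "_y"),
).drop(columns="col2_y")
# merged columns: col1, col2, val
```

Rows of `df1` without a match in `df2` keep their `col1` and `col2` values and get a missing `val`, as in a left join.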

@pwwang
Owner

pwwang commented Sep 23, 2021

You need to use dict for the column mapping:

[image]

See also examples at:

https://pwwang.github.io/datar/notebooks/mutate-joins/
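
The screenshot from this reply is not preserved. In datar the dict form is presumably along the lines of `by={"col1": "col2"}` (left column mapped to right column); treat that exact spelling as an assumption and check the linked notebook. To illustrate what such a mapping means, here is a minimal pandas-only sketch. `left_join_by` is a hypothetical helper written for this example, not datar's implementation, and the toy frames are made up.

```python
import pandas as pd

def left_join_by(left, right, by):
    """Left-join `right` onto `left`; `by` maps left key names to right key names."""
    # Rename the right-hand keys to the left-hand names so a plain `on=` works.
    rename_map = {right_col: left_col for left_col, right_col in by.items()}
    # Refuse to rename onto a non-key column that already exists in `right`.
    clash = (set(rename_map.values()) & set(right.columns)) - set(rename_map)
    if clash:
        raise ValueError(f"right frame already has column(s): {sorted(clash)}")
    right_renamed = right.rename(columns=rename_map)
    return left.merge(right_renamed, on=list(by), how="left")

# Toy frames with the column layout from the question (hypothetical values).
df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
df2 = pd.DataFrame({"col2": [1, 2], "val": [10.0, 20.0]})

new_df = left_join_by(df1, df2, {"col1": "col2"})
# new_df columns: col1, col2, val
```

Because the right-hand key is renamed before merging, the result keeps a single key column under the left-hand name and no `_x`/`_y` suffixes appear for `col2`.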

@pwwang added the doc (Improvements or additions to documentation) label Sep 23, 2021
@Mrostgaard
Author

Mrostgaard commented Sep 23, 2021

Doing that raises KeyError: {'col2', 'col1'} (in that order).
Adding keep=True raises KeyError: 'col1_y'.

@pwwang
Owner

pwwang commented Sep 23, 2021

Do you have an example dataset that I can investigate?

@Mrostgaard
Author

@pwwang I'm sorry, but unfortunately I'm not at liberty to share the datasets. My question was mostly aimed at finding out whether there was something in the documentation I had missed. I will just have to see if I can do it some other way 🙂 Once again, I appreciate your insanely quick response time.

@pwwang
Owner

pwwang commented Sep 23, 2021

No problem at all. You can also play with that notebook on binder to see if it works in your case.

@Mrostgaard
Author

Thank you for that

@pwwang
Owner

pwwang commented Sep 26, 2021

I assume this is solved. If the problem persists, feel free to reopen.

@pwwang closed this as completed Sep 26, 2021
@pwwang mentioned this issue Oct 5, 2021
@pwwang added a commit that referenced this issue Oct 5, 2021
* 🔧 Add metadata for datasets

* 📝 Mention datar-cli in README

* 🔊 Send logs to stderr

* 📌 Pin dependency versions; 🚨 Switch to flake8;

* 🔖 0.5.2

* 🔊 Update CHANGELOG

* ⚡️ Optimize dplyr.arrange when data are series from the df

* 🔧 Update coveragerc

* 🐛 Fix #63

* 📝 Update doc for argument `by` for join functions (#62)

* 🐛 Fix #65

* 🔖 0.5.3

* 🔥 Remove prints from tests