-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More general joining #68
Comments
Nice work! I remember I had trouble getting joins to work on multiple
columns. I ended up computing a "key" column which was a string
concatenation of three other columns, then joining on that. Perhaps this PR
will make this workaround unnecessary.
…On Tue, 4 Sep 2018 07:47 Jan Kislinger, ***@***.***> wrote:
I have fixed a typo (right_in -> right_on) and reversed logic in one if
statement in function that creates join parameters. See the changes here:
master...jankislinger:fix-join-multiple-by
<master...jankislinger:fix-join-multiple-by>
Now it can be used to join tables on columns with different names:
import pandas as pd
from dfply import *
a = pd.DataFrame({
'x1': ['A', 'B', 'C'],
'x2': [1, 2, 3]
})
b = pd.DataFrame({
'x4': ['A', 'B', 'D'],
'x3': [True, False, True]
})
a >> inner_join(b, by=('x1', 'x4'))
It would be also convenient to be able to use multiple by statements. For
example expression
a >> inner_join(b, by=['x1', ('x2', 'x3')])
could be used as
a.merge(b, left_on=['x1', 'x2'], right_on=['x1', 'x3'])
If you agree I would modify the code and create a PR.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#68>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABOypD5n48_Si53piWQPApCUMiqCARs5ks5uXiHfgaJpZM4WYYqg>
.
|
The only issue I can see is if you use by as list of length 2, e.g. List (two columns to join by, same names in bot data frames):
Tuple (single column, different names):
|
By convention, by=['x1', 'x2'] would use both columns from both tables.
If you want to use column x1 from table 1 and x2 from table 2, then create
a common column by setting x1 = x2 in table 2.
…On Tue, 4 Sep 2018 08:25 Jan Kislinger, ***@***.***> wrote:
The only issue I can see is if you use by as list of length 2, e.g. by=['x1',
'x2']. It is not clear whether you want to use x1 from left table and x2
from the right one or both columns from both tables. I would suggest using
list for multiple columns and tuple for different names of the same column.
The third option is to use dictionary (which is btw the closest to the
implementation in dplyr).
List (two columns to join by, same names in bot data frames):
a >> inner_join(b, by=['x1', 'x2'])
a.merge(b, left_on=['x1', 'x2'], right_on=['x1', 'x2'])
Tuple (single column, different names):
a >> inner_join(b, by=('x1', 'x2'))
a.merge(b, left_on='x1', right_on='x2')
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#68 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABOypJKRSeCk2cH8zRj9UqEtFs4SPNKpks5uXirCgaJpZM4WYYqg>
.
|
I agree that See this commit: |
Cool thank you. Will merge the PR now. |
I have fixed a typo (
right_in
->right_on
) and reversed logic in one if statement in function that creates join parameters. See the changes here:master...jankislinger:fix-join-multiple-by
Now it can be used to join tables on columns with different names:
It would be also convenient to be able to use multiple by statements. For example expression
could be used as
If you agree I would modify the code and create a PR.
The text was updated successfully, but these errors were encountered: