Possible problem in left join matching if distances between two values are identical #65
Comments
This is a problem with the implementation of Lines 137 to 138 in 562df32
So for Lines 144 to 149 in 562df32
That's why in this special case it looks like the algorithm chooses the rightmost match instead of the leftmost one.
Just another clarification: this only happens if the difference |
- Add .cjoinLeft function which avoids/fixes issue #65.
The left join matching should find for each value in `x` the index of the value in `y` with the smallest difference (also given that the difference is smaller than `tolerance`).

The current implementation of the left join (`MsCoreUtils:::.joinLeft`) seems to perform a non-intuitive match if the distance of a value in `x` to two values in `y` is the same.

Example:
with a `tolerance = 1` we have the following matches:

- `x[1]` (1): `NA`
- `x[2]` (3): matches both `y[1]` (3) and `y[2]` (4).
- `x[3]` (5): matches both `y[2]` (4) and `y[3]` (5).
- `x[4]` (6): matches both `y[3]` (5) and `y[4]` (7).
- `x[5]` (8): matches `y[4]` (7).

Now, given that we aim to find for each element in `x` the closest element in `y`, we expect to get:

- `x[1]` - `NA`
- `x[2]` - `y[1]`
- `x[3]` - `y[3]`
- `x[4]` - `y[4]`
- `x[5]` - `NA`
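To make the expectation above concrete, here is a minimal sketch in base R. It assumes that `x` and `y` consist of exactly the values listed in parentheses above, and `closest_left` is a hypothetical helper written only for this illustration. It is not the `MsCoreUtils` implementation, just a simple greedy pass that gives each `x` its closest still-unused `y` within `tolerance` and takes the leftmost candidate on ties.

```r
# Values reconstructed from the lists above (assumed to be the full vectors).
x <- c(1, 3, 5, 6, 8)
y <- c(3, 4, 5, 7)
tolerance <- 1

# Candidate matches: for each x, all y indices within tolerance.
d_all <- abs(outer(x, y, "-"))
candidates <- lapply(seq_along(x), function(i) which(d_all[i, ] <= tolerance))

# Hypothetical helper: greedy one-to-one left matching.
# Each y can be used only once; ties go to the leftmost y.
closest_left <- function(x, y, tolerance) {
  used <- rep(FALSE, length(y))
  res <- rep(NA_integer_, length(x))
  for (i in seq_along(x)) {
    d <- abs(x[i] - y)
    # Exclude y values that are already taken or too far away.
    d[used | d > tolerance] <- Inf
    j <- which.min(d)  # index of the first minimum, i.e. leftmost on ties
    if (length(j) && is.finite(d[j])) {
      res[i] <- j
      used[j] <- TRUE
    }
  }
  res
}

closest_left(x, y, tolerance)
```

Under these assumptions the call returns `NA 1 3 4 NA`, i.e. the expected matching listed above. A greedy pass like this is not guaranteed to be optimal for arbitrary input; it is only meant to show the tie handling.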
The result of `MsCoreUtils:::.joinLeft(x, y, 1, 0)` is however:

i.e. it does not match `x[4]` (6) to `y[4]` (7), which would be the first match, but matches `x[5]` (8) to `y[4]` (7). While this is not wrong, I would expect a left join to match `x[4]` to `y[4]`, i.e. to use the first match for each `x` in case there are ties.