Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different (incorrect) results for sortedMergeInnerJoin vs. hashInnerJoin #43

Closed
puddleglum56 opened this issue Sep 7, 2022 · 5 comments

Comments

@puddleglum56
Copy link

puddleglum56 commented Sep 7, 2022

Hello!

Great library, very helpful to avoid having to implement SQL joins myself in Typescript. Smaller and easier to work with than AlaSQL.

Recently I was writing some tests for some joins I am doing with this library (installed via NPM) and found that sortedMergeInnerJoin is giving incorrect results.

Here is the code to reproduce:

const leftAccessor = (row: any) => {
        return row["id"];
};

const rightAccessor = (row: any) => {
        return row["id];
};

const left = [
    { id: 1, name: "John", sameColumn: "left", leftId: 1 },
    { id: 2, name: "Jane", sameColumn: "left", leftId: 2 },
    { id: "left", name: "Left", sameColumn: "left", leftId: "left" },
];

const right = [
    { id: 1, age: 20, sameColumn: "right", rightId: 1 },
    { id: 2, age: 30, sameColumn: "right", rightId: 2 },
    { id: "right", age: 1000, sameColumn: "right", rightId: "right" },
];

const results = sortedMergeInnerJoin(left, leftAccessor, right, rightAccessor);

This is producing:

    [
      {
        id: 1,
        name: 'John',
        sameColumn: 'right',
        leftId: 1,
        age: 20,
        rightId: 1
      },
      {
        id: 2,
        name: 'Left',
        sameColumn: 'right',
        leftId: 'left',
        age: 30,
        rightId: 2
      }
    ]

, instead of expected:

    [
      {
        id: 1,
        name: 'John',
        sameColumn: 'right',
        leftId: 1,
        age: 20,
        rightId: 1
      },
      {
        id: 2,
        name: 'Jane',
        sameColumn: 'right',
        leftId: 2,
        age: 30,
        rightId: 2
      }
    ]

It looks like the expected join is returned by hashInnerJoin.

@mtraynham
Copy link
Owner

mtraynham commented Sep 7, 2022

Hey @puddleglum56, thanks for the kind words and taking an interest in the library!

I ran a test locally to check. If I'm reading this right, these two rows are producing the unexpected output.

{ id: 2, name: "Jane", sameColumn: "left", leftId: 2 }

{ id: "left", name: "Left", sameColumn: "left", leftId: "left" }

You're right, I threw together a test and it definitely errors.

describe('#43', () => {
    const leftAccessor = (row: any) => {
        return row["id"];
    };
    const rightAccessor = (row: any) => {
        return row["id"];
    };
    const merger = (l: any, r: any): any => assign({}, l, r)
    const left = [
        { id: 1, name: "John", sameColumn: "left", leftId: 1 },
        { id: 2, name: "Jane", sameColumn: "left", leftId: 2 },
        { id: "left", name: "Left", sameColumn: "left", leftId: "left" },
    ];
    const right = [
        { id: 1, age: 20, sameColumn: "right", rightId: 1 },
        { id: 2, age: 30, sameColumn: "right", rightId: 2 },
        { id: "right", age: 1000, sameColumn: "right", rightId: "right" },
    ];
    const result = joins.sortedMergeInnerJoin(left, leftAccessor, right, rightAccessor, merger);
    const expected = [
        {
            id: 1,
            name: 'John',
            sameColumn: 'right',
            leftId: 1,
            age: 20,
            rightId: 1
        },
        {
            id: 2,
            name: 'Jane',
            sameColumn: 'right',
            leftId: 2,
            age: 30,
            rightId: 2
        }
    ];
    it('should match the expected output', () =>
        expect(result).toEqual(expected));
});
Failures:
1) #43 should match the expected output
  Message:
    Expected $[1].name = 'Left' to equal 'Jane'.
    Expected $[1].leftId = 'left' to equal 2.
  Stack:
        at <Jasmine>
        at UserContext.<anonymous> (lodash-joins/lib/index.spec.ts:260:24)
        at <Jasmine>

I wonder if this has to do with the sort or comparison of the id field producing the incorrect comparisons. If I switch the *Accessor functions to using string variants, so all the id fields are the same type, it works.

    const leftAccessor = (row: any) => {
        return `${row["id"]}`;
    };
    const rightAccessor = (row: any) => {
        return `${row["id"]}`;
    };

I'll look into a bit further, but good find!

@mtraynham
Copy link
Owner

mtraynham commented Sep 7, 2022

I see why, it's caused by the mixture of numbers and strings as the key accessor. Comparisons on the keys are done using < or >, https://github.com/mtraynham/lodash-joins/blob/master/lib/sortedMerge/sortedMergeInnerJoin.ts#L27-L30. If both are false, it merges.

If I remember correctly, the expectation instead was using natural comparisons, if < is false and > is false, than it must be true, but that's not really the case for 2 < "left" or 2 > "left", where both produce false 😅 .

Let me think about a better way to perform comparisons when the types aren't the same. For now you could update the accessor to return strings always.

    const leftAccessor = (row: any) => {
        return `${row["id"]}`;
    };
    const rightAccessor = (row: any) => {
        return `${row["id"]}`;
    };

The hash implementation works because it puts the keys into a dictionary, which coerces them to strings. The nested loop implementation also works because the 2 != 'left'. This case stands out because there's no natural sort for number vs string.

@mtraynham
Copy link
Owner

mtraynham commented Sep 8, 2022

Interesting as well, Lodash's sortBy function is used here to sort by the accessor prior to comparison. Seems if you return different types there, it also breaks the sort.

> _.sortBy([{'id': 'b'}, {'id': 'c'}, {'id': 'a'}], 'id')
[ { id: 'a' }, { id: 'b' }, { id: 'c' } ]
> _.sortBy([{'id': 'b'}, {'id': 'c'}, {'id': 2}, {'id': 'a'}], 'id')
[ { id: 'b' }, { id: 'c' }, { id: 2 }, { id: 'a' } ]

Notice the last example has unchanged order.


Maybe converting the keys to strings and using localeCompare for both the sort and comparisons would work, but then I'm wondering if it's worth the change.

Seems like you would want your keys to be the same domain/type. Is there a reason they aren't or a reason they can't be, to help produce a determinate sort?

@puddleglum56
Copy link
Author

Hi @mtraynham,

Thanks so much for looking into this. Sorry for the late reply.

Ah, what you found makes sense! It hadn't even occurred to me that I was using different key types, it was just a random selection to help me write the tests.

To answer your question, for my use case there is no reason to expect keys of different types. It seems that this is an extreme edge case that wouldn't really make sense in the real world. I will simply update my tests and switch back to sortedMerge join functions.

Thank you again for your time and effort!

@mtraynham
Copy link
Owner

mtraynham commented Sep 8, 2022

Not a problem! I'll retain what's currently implemented, and added a caveat to the readme about this finding, https://github.com/mtraynham/lodash-joins#implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants