In dataflows no other join type besides an inner join works #2241

Closed
siepkes opened this issue Jun 16, 2019 · 8 comments

2 participants
@siepkes (Contributor) commented Jun 16, 2019

Describe the bug

A join created in a dataflow always seems to behave as an inner join, regardless of the join type you select. For example, if you select a left join you still get the result of an inner join.

To Reproduce

Create 2 datasets and import them in a data flow. For example datasets A and B:

Dataset A:

| id | name    |
|----|---------|
| 1  | Pete    |
| 2  | John    |
| 3  | Carry   |
| 4  | Jenkins |

Dataset B (same IDs as dataset A, except that there is no record with ID 4 and there is an extra record with ID 5):

| id | occupation       |
|----|------------------|
| 1  | Farmer           |
| 2  | Electrician      |
| 3  | Rocket scientist |
| 5  | Teacher          |

When I select dataset A in the dataflow window and create a left join with dataset B on the ID column, I expect the following result:

| id | name    | occupation       |
|----|---------|------------------|
| 1  | Pete    | Farmer           |
| 2  | John    | Electrician      |
| 3  | Carry   | Rocket scientist |
| 4  | Jenkins |                  |

However the actual result is:

| id | name  | occupation       |
|----|-------|------------------|
| 1  | Pete  | Farmer           |
| 2  | John  | Electrician      |
| 3  | Carry | Rocket scientist |
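
The gap between the expected and the actual result can be reproduced outside Metatron with a minimal Python sketch. The `join` helper below is hypothetical and written only for illustration; it is not Metatron's join code, and it assumes that `id` is unique in dataset B.

```python
# Datasets A and B from the report, as lists of dicts.
dataset_a = [
    {"id": 1, "name": "Pete"},
    {"id": 2, "name": "John"},
    {"id": 3, "name": "Carry"},
    {"id": 4, "name": "Jenkins"},
]
dataset_b = [
    {"id": 1, "occupation": "Farmer"},
    {"id": 2, "occupation": "Electrician"},
    {"id": 3, "occupation": "Rocket scientist"},
    {"id": 5, "occupation": "Teacher"},
]


def join(left, right, key, how="inner"):
    """Hypothetical helper: join two lists of dicts on `key`.

    Supports how="inner" and how="left"; assumes `key` is unique in `right`.
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how}")
    right_by_key = {row[key]: row for row in right}
    # Columns contributed by the right side, excluding the join key.
    right_cols = [c for c in right[0] if c != key] if right else []
    result = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            result.append({**row, **{c: match[c] for c in right_cols}})
        elif how == "left":
            # A left join keeps unmatched left rows and fills right columns with None.
            result.append({**row, **{c: None for c in right_cols}})
    return result


inner_rows = join(dataset_a, dataset_b, "id", how="inner")
left_rows = join(dataset_a, dataset_b, "id", how="left")

print([r["id"] for r in inner_rows])  # [1, 2, 3]       (the actual result)
print([r["id"] for r in left_rows])   # [1, 2, 3, 4]    (the expected result)
```

The only difference between the two results is the row for ID 4 (Jenkins), which a left join keeps with an empty occupation and an inner join drops.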

Screenshots

Select left join:

[screenshot: select_join]

Result (an inner join):

[screenshot: results]

Desktop:

  • OS: CentOS Linux
  • Browser: Chrome 75.0.3770.80 (latest at time of writing)
  • Version: 3.2.4 (latest at time of writing)

metatron-app deleted a comment from teamsprint Jun 17, 2019

@joohokim1 (Contributor) commented Jun 17, 2019

Thank you for your sincere report.

Actually, joins other than the inner join were a low priority, but I’ve found that supporting them is not that difficult.

Besides, I’ve recently changed the join code to remove potential bugs and speed it up, so this is the right time to support all the join types shown in the join pop-up window.

I’ll finish this task by the end of June.
Thanks for your help and interest!

joohokim1 self-assigned this Jun 17, 2019

joohokim1 added this to the 3.3.0 milestone Jun 17, 2019

siepkes changed the title from "In dataflows no other join type besides in an inner join works" to "In dataflows no other join type besides an inner join works" Jun 17, 2019

@siepkes (Contributor, Author) commented Jun 17, 2019

That's great to hear! Let me know if I can help.

joohokim1 added a commit that referenced this issue Jun 18, 2019

joohokim1 referenced this issue Jun 18, 2019

Closed

#2241 implement left, right, full outer joins #2254

1 of 7 tasks complete

joohokim1 added a commit that referenced this issue Jun 18, 2019

joohokim1 added a commit that referenced this issue Jun 18, 2019

@joohokim1 (Contributor) commented Jun 19, 2019

@siepkes I've made a PR for this issue (#2255)
I'd be grateful if you could double-check it (at the code level, the UI level, or both).
Some minor UI issues have been reported, but they will be resolved soon.

@siepkes (Contributor, Author) commented Jun 19, 2019

Great to hear! I'll take a look at it this coming Friday (today and tomorrow I'm on the road).

@siepkes (Contributor, Author) commented Jun 20, 2019

@joohokim1 I had some time to spare today and looked at the PR. I created a build based on master plus the PR and ran some basic tests with left, right and outer joins; it all seems to work as expected!

I'm not really familiar with the inner workings of Metatron, so commenting on the implementation itself is somewhat harder for me. However, I skimmed over the code in the PR and it looked good to me!

Thanks for your speedy reaction!
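
For reference, the results one would expect from right and full outer joins on datasets A and B from the report can be sketched in a few lines of Python. This is only an illustration of standard join semantics, not the test that was actually run; the `outer_join` helper is hypothetical and assumes IDs are unique on both sides.

```python
# Datasets A and B from the report, keyed by id.
dataset_a = {1: "Pete", 2: "John", 3: "Carry", 4: "Jenkins"}  # id -> name
dataset_b = {1: "Farmer", 2: "Electrician", 3: "Rocket scientist", 5: "Teacher"}  # id -> occupation


def outer_join(left, right, how):
    """Hypothetical helper: join two id-keyed dicts.

    `how` is "left", "right" or "full"; rows are (id, name, occupation) tuples.
    """
    if how == "left":
        ids = list(left)
    elif how == "right":
        ids = list(right)
    elif how == "full":
        # Left ids first, then the ids that exist only on the right.
        ids = list(left) + [i for i in right if i not in left]
    else:
        raise ValueError(f"unsupported join type: {how}")
    # A value missing on either side becomes None.
    return [(i, left.get(i), right.get(i)) for i in ids]


right_rows = outer_join(dataset_a, dataset_b, "right")
full_rows = outer_join(dataset_a, dataset_b, "full")

print(right_rows[-1])  # (5, None, 'Teacher')
print(len(full_rows))  # 5
```

The right join keeps ID 5 (Teacher) with no name, and the full outer join contains both that row and ID 4 (Jenkins) with no occupation.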

ksparknot pushed a commit that referenced this issue Jun 24, 2019

ksparknot pushed a commit that referenced this issue Jun 24, 2019

joohokim1 added a commit that referenced this issue Jun 24, 2019

joohokim1 added a commit that referenced this issue Jun 24, 2019

@joohokim1 (Contributor) commented Jun 25, 2019

All comments about lineage above are my mistake. (They are for another issue.)

ksparknot pushed a commit that referenced this issue Jun 27, 2019

ksparknot pushed a commit that referenced this issue Jun 28, 2019

joohokim1 added a commit that referenced this issue Jun 28, 2019

joohokim1 added a commit that referenced this issue Jul 1, 2019

#2241 implement left, right, full outer joins (+ limit control)
* #2241 implement left, right, full outer joins

* #2241 fix a typo

* #2241 add limit control

* #2241 keep the join type selected & use a default selection for right dataset

* #2241 all columns will be selected when right dataset is loaded

* #2241 If the right dataset is selected, it is dynamically excluded

* #2241 choosing the right predicate changes the left predicate

* #2241 bug-fix on stageIdx of PREVIEW action

@joohokim1 (Contributor) commented Jul 1, 2019

Work done.

joohokim1 closed this Jul 1, 2019

@joohokim1 (Contributor) commented Jul 1, 2019

Integrated test #1 passed.
A wrong patch was made in this branch, but it will be fixed very soon by another PR.
That problem is unrelated to the substance of this issue.

ksparknot pushed a commit that referenced this issue Jul 1, 2019

joohokim1 added a commit that referenced this issue Jul 10, 2019

#2142 lineage view - Metadata Level
* #2142 Add metadata dependency entities

* #2142 add lineage map API

* #2142 refactoring (mainly, rename)

* #2142 fix errors from previous commit

* #2142 Process infinite loop case (circuit dependency)

* #2142 the component shows lineageEdges on UI

* #2142 mark the main node as distinguishable

* #2142 test data injection

* #2142 add loadLineageMap API

* #2241 finish batch load lineage data map (CSV)

* #2241 fix bug on lineage map load

* #2241 reformat code

* #2241 rename confusing names in lineage code

* #2142 make lineages with preparation dataset

* #2142 to match server and client APIs

* #2142 organize diagrams according to the plan

* #2142 added a lineage column view

* #2142 prototype of lineage diagram

* #2142 closeInfo method should be public

* #2142 the button for creating lineage was moved into the more option

* #2142 remove test area and merge to here

joohokim1 added a commit that referenced this issue Jul 15, 2019

#2335 column info in snapshot
* #fn add .gitignore

* #fn apply google java code style

* #2241 bug-fix on outer joins. they cannot be parallelized.

* #2335 save column type info in snapshot entity
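
The commit message above does not say why outer joins cannot be parallelized. One common reason, sketched below purely as an assumption about the general technique and not as a description of Metatron's code, is that a plan which splits the left dataset into chunks and joins each chunk against the whole right dataset emits right-only rows once per chunk. The `full_outer_join` helper and the chunking are hypothetical.

```python
def full_outer_join(left, right):
    """Full outer join of two id-keyed dicts; a missing value becomes None."""
    ids = list(left) + [i for i in right if i not in left]
    return [(i, left.get(i), right.get(i)) for i in ids]


left = {1: "Pete", 2: "John", 3: "Carry", 4: "Jenkins"}
right = {1: "Farmer", 2: "Electrician", 3: "Rocket scientist", 5: "Teacher"}

# Reference result: join the whole left dataset in one pass.
sequential = full_outer_join(left, right)

# Naive "parallel" plan (assumed for illustration): split the left side into
# chunks, join each chunk against the full right side, and concatenate.
chunks = [{1: "Pete", 2: "John"}, {3: "Carry", 4: "Jenkins"}]
naive = [row for chunk in chunks for row in full_outer_join(chunk, right)]

print(len(sequential))  # 5
print(len(naive))       # 9

# Each chunk treats every right row it cannot match locally as unmatched,
# so the Teacher row (id 5) is emitted once per chunk.
teacher_rows = [row for row in naive if row[0] == 5]
print(len(teacher_rows))  # 2
```

Under the same chunking plan an inner or left join stays correct, because every output row originates from exactly one left row; only the right-only rows of right and full outer joins need a global view of what has been matched.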

ufoscw added a commit that referenced this issue Jul 16, 2019

#2142 lineage view - Metadata Level
* #2142 Add metadata dependency entities

* #2142 add lineage map API

* #2142 refactoring (mainly, rename)

* #2142 fix errors from previous commit

* #2142 Process infinite loop case (circuit dependency)

* #2142 the component shows lineageEdges on UI

* #2142 mark the main node as distinguishable

* #2142 test data injection

* #2142 add loadLineageMap API

* #2241 finish batch load lineage data map (CSV)

* #2241 fix bug on lineage map load

* #2241 reformat code

* #2241 rename confusing names in lineage code

* #2142 make lineages with preparation dataset

* #2142 to match server and client APIs

* #2142 organize diagrams according to the plan

* #2142 added a lineage column view

* #2142 prototype of lineage diagram

* #2142 closeInfo method should be public

* #2142 the button for creating lineage was moved into the more option

* #2142 remove test area and merge to here

ufoscw added a commit that referenced this issue Jul 16, 2019

#2335 column info in snapshot
* #fn add .gitignore

* #fn apply google java code style

* #2241 bug-fix on outer joins. they cannot be parallelized.

* #2335 save column type info in snapshot entity