Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to merge two lists that have a one to many relationship #1451

Closed
bradllj opened this issue Jul 16, 2017 · 3 comments

Comments

Projects
None yet
2 participants
@bradllj
Copy link

commented Jul 16, 2017

I have two files:

File_1

{"business_id": "A",
"full_address": "12345 address city state",
"city": "Charlotte",
"name": "business name A"
}

{"business_id": "B",
"full_address": "12345 address city state",
"city": "Charlotte",
"name": "business name B"
}

File_2 - it has many rows for each business.

{"review_id": "kkkk",
"stars": 1,
"date": "2016-07-11",
"text": "a long comment",
"business_id": "A"}

{"review_id": "llll",
"stars": 2,
"date": "2017-02-22",
"text": "a long comment 2",
"business_id": "A"}

{"review_id": "mmm",
"stars": 2,
"date": "2017-03-22",
"text": "a long comment 3",
"business_id": "B"}

The output should look like

{"business_id": "A",
"full_address": "12345 address city state",
"city": "Charlotte",
"name": "business name A",
"review_id": "kkkk",
"stars": 1,
"date": "2016-07-11",
"text": "a long comment"
}

{"business_id": "A",
"full_address": "12345 address city state",
"city": "Charlotte",
"name": "business name A",
"review_id": "llll",
"stars": 2,
"date": "2017-02-22",
"text": "a long comment 2"
}

{"business_id": "B",
"full_address": "12345 address city state",
"city": "Charlotte",
"name": "business name B",
"review_id": "mmm",
"stars": 2,
"date": "2017-03-22",
"text": "a long comment 3"
}

I am using this code

def hashJoin(a1; a2; field):
  # hash phase:
  (reduce a1[] as $o ({};  . + { ($o | field): $o } )) as $h1
  | (reduce a2[] as $o ({};  . + { ($o | field): $o } )) as $h2
  # join phase:
  | reduce ($h1|keys[]) as $key
      ([]; if $h2|has($key) then . + [ $h1[$key] + $h2[$key] ] else . + $h1[$key] end) ;

hashJoin( $file1; $file2; .business_id)[]

jq -nc --slurpfile file1 file1.json --slurpfile file2 file2.json -f join.jq

But it keeps only joining the last item in the review file like this

"business_id": "A",
"full_address": "12345 address city state",
"city": "Charlotte",
"name": "business name A",
"review_id": "lll",
"stars": 1,
"date": "2017-02-22",
"text": "a long comment 2"
}
{"business_id": "B",
"full_address": "12345 address city state",
"city": "Charlotte",
"name": "business name B",
"review_id": "mmm",
"stars": 2,
"date": "2017-03-22",
"text": "a long comment 2"
}

How do I put a loop in that will iterate all the items in the second file?

@pkoppstein

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2017

I believe the following variant of your program will do what you want:

# leftJoin(a1; a2; field) expects a1 and a2 to be arrays of JSON objects
# and that for each of the objects, the field value is a string.
# A left join is performed on "field".
def leftJoin(a1; a2; field):
  # hash phase:
  (reduce a2[] as $o ({}; . + { ($o | field): $o } )) as $h2
  # join phase:
  | reduce a1[] as $o ([]; . +  [$h2[$o | field] + $o ]) ;

leftJoin( $file2; $file1; .business_id)

@bradllj

This comment has been minimized.

Copy link
Author

commented Jul 16, 2017

Thanks for helping me learn jq. This answer works great.

@pkoppstein

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2017

@brandlij - Thanks. If you intend your question to be a feature request, then please clarify that, otherwise you could CLOSE the issue. For future reference, please ask ordinary usage questions at stackoverflow.com with the jq tag as stackoverflow.com is better suited to the Q&A format.

Also for future reference, if something like leftJoin were to become a builtin, the definition would presumably use the new builtin INDEX/1 and therefore be closer to:

 def leftJoin(a1; a2; field):
   def wrap: if type == "string" then . else tojson end;
   # hash phase:
   (a2 | INDEX(field)) as $h2
   # join phase:
   | reduce a1[] as $o ([]; . +  [$h2[$o | field | wrap] + $o ]) ;

@bradllj bradllj closed this Jul 16, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.