feature request: function which returns array pairs(permutations/combinations) #714

Closed
neilkod opened this issue May 3, 2019 · 5 comments

Comments

3 participants
@neilkod

commented May 3, 2019

It's useful to aggregate on combinations of items in an array.

A toy example:

User 1 bought [eggs, potatoes, milk, grapefruit]

We want to aggregate all of the pairs (or triplets, etc.) of items bought together.

So ideally, given this input, it would return:

[(eggs, potatoes), (eggs, milk), (eggs, grapefruit), (potatoes, milk), (potatoes, grapefruit)...(milk, grapefruit)]

Ideally a flag for ordered vs unordered pairs would also be useful.
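
To make the requested behavior concrete, here is a minimal Python sketch of the semantics only; it is not a Presto function. The helper name `item_pairs` and its `k` and `ordered` parameters are hypothetical, chosen for illustration, and the work is done by the standard `itertools.combinations` and `itertools.permutations`.

```python
from itertools import combinations, permutations


def item_pairs(items, k=2, ordered=False):
    """Hypothetical helper: all k-element groupings of `items`.

    ordered=False -> k-combinations (order within a group is ignored)
    ordered=True  -> k-permutations (each ordering counts separately)
    """
    pick = permutations if ordered else combinations
    return list(pick(items, k))


basket = ["eggs", "potatoes", "milk", "grapefruit"]
unordered = item_pairs(basket)                # 4 choose 2 = 6 pairs
ordered = item_pairs(basket, ordered=True)    # 4 * 3 = 12 pairs
triplets = item_pairs(basket, k=3)            # 4 choose 3 = 4 triplets
```

With the flag off, (eggs, milk) and (milk, eggs) count as the same pair; with it on, both appear.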

A workaround is to UNNEST the items, self-join the unnested data on a join key (in this case, the user id), and filter out the rows where item_left = item_right.
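
A rough sketch of that self-join workaround, using Python's built-in `sqlite3` as a stand-in for Presto. The `purchases` table and its column names are assumptions for illustration, and the data is inserted already unnested (one row per item) because SQLite has no UNNEST.

```python
import sqlite3

# Assumption: the array is already unnested into one row per (user_id, item).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "eggs"), (1, "potatoes"), (1, "milk"), (1, "grapefruit")],
)

# Self-join on user_id; `<>` drops an item paired with itself,
# which leaves ordered pairs (both (a, b) and (b, a) appear).
ordered_pairs = conn.execute(
    """
    SELECT l.item, r.item
    FROM purchases AS l
    JOIN purchases AS r ON l.user_id = r.user_id
    WHERE l.item <> r.item
    """
).fetchall()

# Using `<` instead keeps each unordered pair exactly once.
unordered_pairs = conn.execute(
    """
    SELECT l.item, r.item
    FROM purchases AS l
    JOIN purchases AS r ON l.user_id = r.user_id
    WHERE l.item < r.item
    ORDER BY l.item, r.item
    """
).fetchall()
```

Both filters compare item values, so a basket containing the same item twice would need a position or row id to compare instead.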

@martint added the enhancement label May 3, 2019

@findepi

Member

commented May 4, 2019

@neilkod is ngrams (https://prestosql.io/docs/current/functions/array.html#ngrams) something similar to what you want?

@martint

Member

commented May 4, 2019

@findepi, no, ngrams produces sequences of n adjacent elements. This issue is about adding something to produce permutations or k-combinations (i.e., n choose k).
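
The difference can be sketched in a few lines of Python. The `ngrams` function below is a simplified plain-Python stand-in for the sliding-window behavior described here, not Presto's implementation, and edge cases such as n larger than the array are not modeled.

```python
from itertools import combinations


def ngrams(items, n):
    """Simplified stand-in: every run of n adjacent elements."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


basket = ["eggs", "potatoes", "milk", "grapefruit"]
adjacent = ngrams(basket, 2)                 # 3 sliding windows
all_pairs = list(combinations(basket, 2))    # 4 choose 2 = 6 pairs

# Pairs that a 2-gram never produces because the items are not neighbors.
missing = [p for p in all_pairs if p not in adjacent]
```

For this four-item basket, the 2-grams cover only three of the six pairs; the non-adjacent ones, such as (eggs, milk), never show up.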

@findepi

Member

commented May 4, 2019

@neilkod please see #718

@neilkod

Author

commented May 5, 2019

@findepi thanks for jumping on this so quickly!

re: ngrams - @martint said it best: ngrams is useful for sequences of n adjacent elements, which is most useful when analyzing text. Many people incorrectly assume that ngrams will return combinations. This might be due to the lack of descriptive documentation for ngrams().

neilkod added a commit to neilkod/presto that referenced this issue May 5, 2019

add small explanation for ngram
ngrams are adjacent elements of an array; let's update the documentation to reflect what they are.

Given that ngrams can potentially confuse end users (I see this all the time) and even presto developers (see prestosql#714), I think it's worth briefly explaining what an ngram is.

@neilkod

Author

commented May 5, 2019

@findepi I submitted #719 to briefly explain what an ngram is (in the docs), based on @martint's comments above. I need to go through the CLA process though. Feel free to take this over.

@findepi closed this in #718 May 8, 2019

@findepi added this to the 311 milestone May 8, 2019

@findepi referenced this issue May 8, 2019

Closed

Release notes for 311 #716

5 of 5 tasks complete