Feature Request : Compare results of two queries #28639

Closed
tblake84 opened this issue Feb 12, 2018 · 1 comment

tblake84 commented Feb 12, 2018

This feature request is a result of a post on Reddit in which r/warkolm asked that I open a feature request.

I would like a simpler way to compare the results of two queries. What I am trying to do is correlate two data sets: take a specific field from data set 'A' and check whether its value exists in a specific field of data set 'B'. Right now I do this with a fairly rough bash script that pulls the results of my first query into an array and iterates through it, querying data set 'B' for each value. Below is an example of what the script does with two indices, "twitter" and "facebook" ("twitter" contains user tweet history and "facebook" contains user account information). Here I find the users who tweeted a specific message, then query "facebook" to see whether they also have an active account there and, if so, write the user to a logfile. As an analogy, this is exactly what VLOOKUP does in Excel.

# Query the "twitter" index for users who sent the message "foobar".
# The JSON body is single-quoted so its double quotes reach Elasticsearch intact.
users=($(curl -s -XGET http://localhost:9200/twitter/tweet/_search \
  -H 'Content-Type: application/json' -d '{
  "_source": ["user"],
  "size": 300,
  "query": {
    "bool": {
      "must": [
        {"match": { "message" : "foobar" }}
      ]
    }
  }
}' | jq -r '.hits.hits[]._source.user' | sort -u))

# For each user who tweeted "foobar", look up their account status in "facebook"
for user in "${users[@]}"; do
  fbaccount=($(curl -s -XGET http://localhost:9200/facebook/accounts/_search \
    -H 'Content-Type: application/json' -d "{
    \"_source\": [\"status\"],
    \"query\": {
      \"term\": {\"user\" : \"${user}\"}
    }
  }" | jq -r '.hits.hits[]._source.status' | sort -u))

  # Log every user who tweeted "foobar" and has an active facebook account
  if [[ ${fbaccount[0]} == "active" ]]; then
    printf '%s\n' "${user}" >> logfile
  fi
done
s1monw added the discuss label Feb 12, 2018
tblake84 changed the title from "Compare results of two queries" to "Feature Request : Compare results of two queries" Feb 12, 2018
DaveCTurner (Contributor) commented

It sounds like you're after the ability to join on the user field. We've looked into this functionality a number of times before:

Unfortunately the proposed implementations only work for small datasets, so we won't be adding this to Elasticsearch. The best approach is to do this client-side as you are. You may find things like the multi-search API to be useful if you want to reduce the number of round-trips to your cluster.
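
As a rough illustration of the multi-search suggestion, the per-user lookups from the script above could be batched into a single `_msearch` request. This is only a sketch under assumptions: the function names `build_msearch_body` and `lookup_active_users` are made up for this example, the cluster is assumed to be at `localhost:9200` with `jq` installed, and user names are assumed not to contain quotes or backslashes (they are pasted into the JSON unescaped).

```shell
#!/usr/bin/env bash

# Build an NDJSON body for _msearch: one header line naming the index,
# followed by one query line, for every user passed as an argument.
# (Hypothetical helper; assumes user names need no JSON escaping.)
build_msearch_body() {
  local user
  for user in "$@"; do
    printf '{"index":"facebook"}\n'
    printf '{"_source":["user","status"],"query":{"term":{"user":"%s"}}}\n' "$user"
  done
}

# Send all lookups in one round-trip and print the users whose status is "active".
# (Hypothetical helper; assumes Elasticsearch on localhost:9200 and jq on PATH.)
lookup_active_users() {
  build_msearch_body "$@" |
    curl -s -XPOST http://localhost:9200/_msearch \
      -H 'Content-Type: application/x-ndjson' --data-binary @- |
    jq -r '.responses[].hits.hits[]?._source | select(.status == "active") | .user' |
    sort -u
}
```

With the `users` array from the original script, `lookup_active_users "${users[@]}" >> logfile` would replace the whole `for` loop. The join itself still happens client-side; only the number of requests changes.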
