# Query DSL Lab

Before launching the notebook, remember to load the required environment variables:

```
source bash/opensearch
```

**You will have to edit the file and set the remote address of your OpenSearch instance.**
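The cells below read five variables: `OPENSEARCH_HOST`, `OPENSEARCH_PORT`, `OPENSEARCH_USER`, `OPENSEARCH_PASSWD` and `DATASET_LOCATION`. As a rough guide, `bash/opensearch` could look like the following sketch. Every value is a placeholder (host name, credentials and dataset path are assumptions, not the lab's real settings).

```shell
# bash/opensearch: hypothetical example, the file shipped with the lab may differ.
# Replace every value with the settings of your own instance.
export OPENSEARCH_HOST="opensearch.example.com"   # remote address of your OpenSearch node
export OPENSEARCH_PORT="9200"                     # REST port (9200 is the usual default)
export OPENSEARCH_USER="admin"                    # user for HTTP basic authentication
export OPENSEARCH_PASSWD="change-me"              # password of that user
export DATASET_LOCATION="${HOME}/datasets"        # folder that contains movielens-latest-small/
```

The variables must be exported in the shell that starts Jupyter, because each `%%bash` cell runs in a child process that only inherits the kernel's environment. Note also that `--insecure` tells curl not to verify the server's TLS certificate: convenient with the self-signed certificate of a lab cluster, but not something to keep in production.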

## Loading the datasets

In [7]:
%%bash

cd ${DATASET_LOCATION}/movielens-latest-small

# movies
curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X DELETE \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/movies?pretty"
    
curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X PUT -H "Content-Type: application/json" \
    --data-binary @movies-index.json \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/movies"
    
curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X PUT -H "Content-Type: application/json" \
    --data-binary @movies-bulk.json \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/_bulk" > /dev/null

{
  "acknowledged" : true
}
{"acknowledged":true,"shards_acknowledged":true,"index":"movies"}

In [12]:
%%bash

cd ${DATASET_LOCATION}/movielens-latest-small

# movies-tuned
curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X DELETE \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/movies-tuned?pretty"
    
curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X PUT -H "Content-Type: application/json" \
    --data-binary @movies-tuned-index.json \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/movies-tuned"
    
curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X PUT -H "Content-Type: application/json" \
    --data-binary @movies-tuned-bulk.json \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/_bulk" > /dev/null

{
  "acknowledged" : true
}
{"acknowledged":true,"shards_acknowledged":true,"index":"movies-tuned"}

In [13]:
%%bash

cd ${DATASET_LOCATION}/movielens-latest-small

# ratings
curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X DELETE \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/ratings?pretty"
    
curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X PUT -H "Content-Type: application/json" \
    --data-binary @ratings-index.json \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/ratings"
    
curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X PUT -H "Content-Type: application/json" \
    --data-binary @ratings-bulk.json \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/_bulk" > /dev/null

{
  "acknowledged" : true
}
{"acknowledged":true,"shards_acknowledged":true,"index":"ratings"}
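The three loading cells differ only in the index name. A hypothetical `load_index` function could factor them out, assuming the same `<name>-index.json` / `<name>-bulk.json` naming used above. Unlike the cells above, this sketch keeps the `_bulk` response and checks its top-level `errors` flag, since `> /dev/null` otherwise hides a failed bulk load.

```shell
# Hypothetical helper (not part of the lab): recreate one index from the
# <name>-index.json and <name>-bulk.json files in the dataset folder.
# Assumes the variables from bash/opensearch are already exported.
load_index() {
    local name="$1"
    local dir="${DATASET_LOCATION}/movielens-latest-small"
    local base="https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}"
    local auth="${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}"
    local resp

    # Drop the previous copy; a 404 on the very first run is harmless
    curl --silent --insecure -u "$auth" -X DELETE "${base}/${name}?pretty"

    # Create the index with its settings and mappings
    curl --silent --insecure -u "$auth" \
        -X PUT -H "Content-Type: application/json" \
        --data-binary "@${dir}/${name}-index.json" \
        "${base}/${name}"
    echo

    # Bulk-load the documents, keeping the response instead of discarding it
    resp=$(curl --silent --insecure -u "$auth" \
        -X PUT -H "Content-Type: application/json" \
        --data-binary "@${dir}/${name}-bulk.json" \
        "${base}/_bulk")

    # The compact bulk response contains "errors":false when every item succeeded
    case "$resp" in
        *'"errors":false'*) echo "${name}: bulk load OK" ;;
        *) echo "${name}: bulk load reported errors" >&2; return 1 ;;
    esac
}
```

Usage: `for idx in movies movies-tuned ratings; do load_index "$idx"; done`.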

## Let's practice

### Total number of ratings
How many ratings do we have in the `ratings` index?

In [2]:
%%bash

curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X GET \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/ratings/_count?pretty"

{
  "count" : 100836,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
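`_count` also accepts a request body with a `query`, so the same endpoint can count a subset of the documents. The sketch below is a hypothetical helper that counts the ratings at or above a threshold with a `range` query on `rating`, the numeric field that is averaged later in this lab.

```shell
# Hypothetical helper: count the ratings whose value is >= $1,
# using a range query on the "rating" field of the ratings index.
count_ratings_at_least() {
    curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
        -X GET -H "Content-Type: application/json" \
        "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/ratings/_count?pretty" \
        -d "{\"query\": {\"range\": {\"rating\": {\"gte\": $1}}}}"
}
```

Usage: `count_ratings_at_least 4.5`. The response has the same shape as above, with a smaller `count`.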


### Star Trek films sorted by year
Using the `movies-tuned` index, look for Star Trek films sorted by year (most recent first). Show the title and the year of each film.

In [3]:
%%bash

curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X GET -H "Content-Type: application/json" \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/movies-tuned/_search?pretty" -d '
{
    "query" : {
        "match_phrase" : {
            "title": "star trek"
        }
    },
    "sort": [
        {"year": {"order": "desc"}}
    ],
    "_source": ["title", "year"]
}'

{
  "took" : 67,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 13,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "movies-tuned",
        "_type" : "_doc",
        "_id" : "135569",
        "_score" : null,
        "_source" : {
          "year" : "2016",
          "title" : "Star Trek Beyond"
        },
        "sort" : [
          1451606400000
        ]
      },
      {
        "_index" : "movies-tuned",
        "_type" : "_doc",
        "_id" : "102445",
        "_score" : null,
        "_source" : {
          "year" : "2013",
          "title" : "Star Trek Into Darkness"
        },
        "sort" : [
          1356998400000
        ]
      },
      {
        "_index" : "movies-tuned",
        "_type" : "_doc",
        "_id" : "68358",
        "_score" : null,
        "_source" : {
          "year" : "2009",
        

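The response reports 13 matching films, but a search returns only 10 hits unless `size` is set, so only the first 10 of them are listed. Notice also that the `sort` values are epoch milliseconds (1451606400000 is 2016-01-01 UTC), which suggests that `year` is mapped as a `date` in `movies-tuned`. A hypothetical helper that asks for more hits (20 by default) could look like this:

```shell
# Hypothetical helper: Star Trek films, newest first, returning up to $1 hits
# (default 20). Without "size" the search API returns at most 10 hits.
star_trek_by_year() {
    local size="${1:-20}"
    curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
        -X GET -H "Content-Type: application/json" \
        "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/movies-tuned/_search?pretty" \
        --data-binary @- <<EOF
{
    "size": ${size},
    "query": {"match_phrase": {"title": "star trek"}},
    "sort": [{"year": {"order": "desc"}}],
    "_source": ["title", "year"]
}
EOF
}
```

Reading the body from a heredoc (`--data-binary @-`) avoids the quoting gymnastics needed to splice a shell variable into a single-quoted JSON string.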
### Star Trek films average rating
Using the `ratings` index, compute the average rating of the Star Trek films.

Show the title and the year of the film.

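Notice that the aggregation has to use the `title.raw` field. If you use the analyzed `title` field instead, you will get an error, because `text` fields do not have fielddata enabled by default and therefore cannot be aggregated on.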
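You can check how `title` is mapped with the field mapping API (`GET /<index>/_mapping/field/<field>`). The helper name below is hypothetical. If the mapping follows the usual pattern, `title` should appear as a `text` field with `raw` as a `keyword` sub-field, but rely on the actual response rather than on this assumption.

```shell
# Hypothetical helper: print the mapping of field $2 (default: title)
# in index $1 (default: ratings), including its sub-fields.
show_field_mapping() {
    curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
        -X GET \
        "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/${1:-ratings}/_mapping/field/${2:-title}?pretty"
}
```

Usage: `show_field_mapping ratings title`.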

In [5]:
%%bash

curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X GET -H "Content-Type: application/json" \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/ratings/_search?pretty" -d '
{
    "query" : {
        "match_phrase" : {
            "title": "star trek"
        }
    },
    "aggs": {
        "per_film": {
            "terms": {
                "field": "title.raw"
            },
            "aggs": {
                "avg_rating": {
                    "avg": {
                        "field": "rating"
                    }
                }
            }
        }
    },
    "size": 0
}'

{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 616,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "per_film" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 61,
      "buckets" : [
        {
          "key" : "Star Trek: Generations",
          "doc_count" : 108,
          "avg_rating" : {
            "value" : 3.3425925925925926
          }
        },
        {
          "key" : "Star Trek: First Contact",
          "doc_count" : 91,
          "avg_rating" : {
            "value" : 3.78021978021978
          }
        },
        {
          "key" : "Star Trek II: The Wrath of Khan",
          "doc_count" : 62,
          "avg_rating" : {
            "value" : 3.661290322580645
          }
        },
        {
          "key" : "Star Trek",
          "doc_count" : 59,
     

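In this response `sum_other_doc_count` is 61: the `terms` aggregation returns only its top 10 buckets by default, ordered by `doc_count`, so 61 matching ratings belong to films that were left out. A hypothetical variant raises the bucket `size` and orders the buckets by the `avg_rating` sub-aggregation, so the best-rated films come first:

```shell
# Hypothetical helper: average rating per Star Trek film, best-rated first.
# "size" raises the default of 10 buckets (default here: 50), and "order"
# sorts the buckets by the avg_rating sub-aggregation instead of doc_count.
star_trek_avg_rating() {
    curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
        -X GET -H "Content-Type: application/json" \
        "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/ratings/_search?pretty" \
        --data-binary @- <<EOF
{
    "size": 0,
    "query": {"match_phrase": {"title": "star trek"}},
    "aggs": {
        "per_film": {
            "terms": {
                "field": "title.raw",
                "size": ${1:-50},
                "order": {"avg_rating": "desc"}
            },
            "aggs": {
                "avg_rating": {"avg": {"field": "rating"}}
            }
        }
    }
}
EOF
}
```

On indices with several shards, ordering `terms` buckets by a sub-aggregation can make the results approximate; `ratings` has a single shard (`"total" : 1` above), so here the ordering should be exact.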
### Sci-Fi and Action films from 2000 to 2010, excluding Star Trek, preferably Superman or Batman
Using the `movies-tuned` index, look for Sci-Fi and Action films released from 2000 to 2010 that are not Star Trek movies, preferring Superman or Batman movies.
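Two details of the `bool` query below are worth noting. Because it has `filter` clauses and no `must` clause, the `should` clauses are optional (`minimum_should_match` defaults to 0) and only raise the score, which is what makes Superman and Batman films "preferred". Also, `terms` matches documents that have any of the listed values, so a film qualifies with either genre: the first hit below is tagged Action but not Sci-Fi. If you read the task as films tagged with both genres, a hypothetical stricter variant uses one `term` filter per genre, together with the more common `gte`/`lte` range parameters:

```shell
# Hypothetical helper: same intent as the query below, but a film must carry
# BOTH the Action and the Sci-Fi genre (one term filter per genre).
sci_fi_action_2000s() {
    curl --silent --insecure -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWD}" \
        -X GET -H "Content-Type: application/json" \
        "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/movies-tuned/_search?pretty" \
        --data-binary @- <<EOF
{
    "query": {
        "bool": {
            "must_not": [{"match_phrase": {"title": "star trek"}}],
            "should": [
                {"match": {"title": "superman"}},
                {"match": {"title": "batman"}}
            ],
            "filter": [
                {"range": {"year": {"gte": 2000, "lte": 2010}}},
                {"term": {"genres": "Action"}},
                {"term": {"genres": "Sci-Fi"}}
            ]
        }
    }
}
EOF
}
```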

In [17]:
%%bash

curl --silent --insecure -u ${OPENSEARCH_USER}:${OPENSEARCH_PASSWD} \
    -X GET -H "Content-Type: application/json" \
    "https://${OPENSEARCH_HOST}:${OPENSEARCH_PORT}/movies-tuned/_search?pretty" -d '
{
    "query" : {
        "bool": {
            "must_not": [
                {"match_phrase": {"title": "star trek"}}
            ],
            "should": [
                {"match": {"title": "superman"}},
                {"match": {"title": "batman"}}
            ],
            "filter": [
                {"range": {"year": {"from": 2000, "to": 2010}}},
                {"terms": {"genres": ["Action", "Sci-Fi"]}}
            ]            
        }
    }
}'

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 741,
      "relation" : "eq"
    },
    "max_score" : 11.802275,
    "hits" : [
      {
        "_index" : "movies-tuned",
        "_type" : "_doc",
        "_id" : "95149",
        "_score" : 11.802275,
        "_source" : {
          "movieId" : 95149,
          "title" : "Superman/Batman: Public Enemies",
          "year" : "2009",
          "genres" : [
            "Action",
            "Animation",
            "Fantasy"
          ]
        }
      },
      {
        "_index" : "movies-tuned",
        "_type" : "_doc",
        "_id" : "46530",
        "_score" : 8.042772,
        "_source" : {
          "movieId" : 46530,
          "title" : "Superman Returns",
          "year" : "2006",
          "genres" : [
            "Action",
            "Adventure",
            "Sci-Fi",
            "IMAX"
       