In [None]:
import pymongo
import json

In [None]:
course_cluster_uri = "mongodb://agg-student:agg-password@cluster0-shard-00-00-jxeqq.mongodb.net:27017,cluster0-shard-00-01-jxeqq.mongodb.net:27017,cluster0-shard-00-02-jxeqq.mongodb.net:27017/test?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin"
course_client = pymongo.MongoClient(course_cluster_uri)

In [None]:
movies = course_client['aggregations']['movies']

# Lab: Expression Composition

## For this lab, you'll be composing expressions together.

#### The dataset for this lab can be downloaded [here](https://s3.amazonaws.com/edu-static.mongodb.com/lessons/coursera/aggregation/movies.json) for upload to your own cluster.

### Prelude

This lab will have you work with data within arrays, a common operation.

Specifically, one of the arrays you'll work with is ``writers``, from the
**movies** collection.

There are times when we want to make sure that the field is an array, and that
it is not empty. We can do this within ``$match``:

  `{ "$match": { "writers": { "$elemMatch": { "$exists": true } } } }`
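To make the effect of that predicate concrete, here is a rough local stand-in written in plain Python. It only approximates what the server does, and the helper name `has_non_empty_array` and the sample documents are made up for illustration.

```python
def has_non_empty_array(doc, field):
    # Approximates {field: {"$elemMatch": {"$exists": True}}}:
    # the field must be an array holding at least one element.
    value = doc.get(field)
    return isinstance(value, list) and len(value) > 0

# Hypothetical sample documents, not taken from the movies collection.
samples = [
    {"title": "A", "writers": ["Charles Tait"]},  # non-empty array: kept
    {"title": "B", "writers": []},                 # empty array: dropped
    {"title": "C"},                                # field missing: dropped
    {"title": "D", "writers": "Charles Tait"},     # not an array: dropped
]

kept = [doc["title"] for doc in samples if has_non_empty_array(doc, "writers")]
print(kept)  # ['A']
```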

However, the entries within ``writers`` present another problem. Many entries
in ``writers`` look something like the following, where the writer is
attributed with their specific contribution:

  `"writers" : [ "Vincenzo Cerami (story)", "Roberto Benigni (story)" ]`

But the writer also appears in the ``cast`` array as "Roberto Benigni"!

Give it a look with the following query.

In [61]:
result = movies.find_one({"title": "Life Is Beautiful"}, { "_id": 0, "cast": 1, "writers": 1})
print(json.dumps(result, indent=4))

{
    "cast": [
        "Roberto Benigni",
        "Nicoletta Braschi",
        "Giustino Durano",
        "Giorgio Cantarini"
    ],
    "writers": [
        "Vincenzo Cerami (story)",
        "Roberto Benigni (story)"
    ]
}


This presents a problem, since comparing ``"Roberto Benigni"`` to
``"Roberto Benigni (story)"`` will definitely result in a difference.

Thankfully there is a powerful expression to help us, ``$map``. ``$map`` lets us
iterate over an array, element by element, performing some transformation on
each element. The result of that transformation will be returned in the same
place as the original element.
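As a loose analogy, ``$map`` behaves much like a list comprehension in Python: one output element per input element, in the same order. The sketch below runs locally and does not touch the database; `map_array` is a hypothetical helper, and upper-casing is just a stand-in transformation.

```python
def map_array(values, transform):
    # Apply transform to every element, keeping order and length,
    # which is the behaviour described for $map.
    return [transform(value) for value in values]

writers = ["Vincenzo Cerami (story)", "Roberto Benigni (story)"]
shouted = map_array(writers, str.upper)
print(shouted)
```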

Within ``$map``, the argument to ``input`` can be any expression as long as it
resolves to an array. The argument to ``as`` is the name we want to use to refer
to each element of the array. Inside ``in``, we reference that name as a quoted
string prefixed with two `$` signs, for example ``"$$writer"``. The field ``as`` is
optional, and if omitted each element must be referred to as ``"$$this"``.

      "writers": {
        "$map": {
          "input": "$writers",
          "as": "writer",
          "in": "$$writer"
        }
      }


``in`` is where the work is performed. Here, we use the ``$split`` expression to
split each value on ``" ("``, and then the ``$arrayElemAt`` expression, which
takes two arguments, the array and the index of the element we want, to keep
the first piece.

If the string does not contain the pattern specified, the only modification is
that it is wrapped in an array, so ``$arrayElemAt`` will always work.

      "writers": {
        "$map": {
          "input": "$writers",
          "as": "writer",
          "in": {
            "$arrayElemAt": [
              {
                "$split": [ "$$writer", " (" ]
              },
              0
            ]
          }
        }
      }
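The same split-then-take-first idea can be tried locally in plain Python before running it on the server. This is only an analogy: `str.split` plays the role of ``$split`` and indexing with `[0]` plays the role of ``$arrayElemAt``; the function name `strip_contribution` is made up for this sketch.

```python
def strip_contribution(writers):
    # Split each entry on " (" and keep the first piece.
    # A name without " (" comes back as a one-element list, so [0] is safe.
    return [writer.split(" (")[0] for writer in writers]

cleaned = strip_contribution(["Vincenzo Cerami (story)", "Roberto Benigni (story)"])
print(cleaned)    # ['Vincenzo Cerami', 'Roberto Benigni']

unchanged = strip_contribution(["Charles Tait"])
print(unchanged)  # ['Charles Tait']
```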
      
Let's see it in action to get a full sense of what it does.

In [62]:
# this stage is provided for you, use it later as well
mapping = {
    "$project": {
        "_id": 0,
        "cast": 1,
        "directors": 1,
        "writers": {
            "$map": {
                "input": "$writers",
                "as": "writer",
                "in": {
                    "$arrayElemAt": [
                        { "$split": ["$$writer", " ("] },
                        0
                    ]
                }
            }
        }
    }
}

In [63]:

result = movies.aggregate([
    {
        "$match": {"title": "Life Is Beautiful"}
    },
    mapping
])
print(json.dumps(list(result), indent=4))

[
    {
        "cast": [
            "Roberto Benigni",
            "Nicoletta Braschi",
            "Giustino Durano",
            "Giorgio Cantarini"
        ],
        "directors": [
            "Roberto Benigni"
        ],
        "writers": [
            "Vincenzo Cerami",
            "Roberto Benigni"
        ]
    }
]


## Question

Let's find how many movies in our **movies** collection are a "labor of love",
where the same person appears in ``cast``, ``directors``, and ``writers``.
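One way to picture the "labor of love" condition is as a set intersection across the three arrays. The sketch below checks it locally for the cleaned *Life Is Beautiful* document shown earlier; it is an illustration in plain Python, not the aggregation itself.

```python
# The mapped document from the earlier output (writers already cleaned).
movie = {
    "cast": ["Roberto Benigni", "Nicoletta Braschi", "Giustino Durano", "Giorgio Cantarini"],
    "directors": ["Roberto Benigni"],
    "writers": ["Vincenzo Cerami", "Roberto Benigni"],
}

# Names that appear in all three arrays.
shared = set(movie["cast"]) & set(movie["directors"]) & set(movie["writers"])
is_labor_of_love = len(shared) > 0
print(shared, is_labor_of_love)  # {'Roberto Benigni'} True
```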


How many movies are "labors of love"?

To get a count, ensure you add the following to the end of your pipeline list.

In [107]:
counting = {
    "$count": "labors_of_love"
}

In [108]:
movies.aggregate([{"$project":{ "_id": 0,"title":1,"cast": 1,"directors":1,"writers": 1}},{"$limit":20}])


<pymongo.command_cursor.CommandCursor at 0x1a51235b5b0>

The necessary mapping stage is provided for you.

In [109]:
mapping = {
    "$project": {
        "_id": 0,
        "cast": 1,
        "directors": 1,
        "writers": {
            "$map": {
                "input": "$writers",
                "as": "writer",
                "in": {
                    "$arrayElemAt": [
                        { "$split": ["$$writer", " ("] },
                        0
                    ]
                }
            }
        }
    }
}

In [110]:
# Multi-level matching: writers, cast, and directors must all be non-empty arrays
predicate = {"$match": {
    "writers": {"$elemMatch": {"$exists": True}},
    "cast": {"$elemMatch": {"$exists": True}},
    "directors": {"$elemMatch": {"$exists": True}},
}}

In [111]:
list(movies.aggregate([predicate,{"$limit":1}]))

[{'_id': ObjectId('573a1390f29313caabcd4cf1'),
  'title': 'Ingeborg Holm',
  'year': 1913,
  'runtime': 96,
  'released': datetime.datetime(1913, 10, 27, 0, 0),
  'cast': ['Hilda Borgström',
   'Aron Lindgren',
   'Erik Lindholm',
   'Georg Grönroos'],
  'poster': 'http://ia.media-imdb.com/images/M/MV5BMTI5MjYzMTY3Ml5BMl5BanBnXkFtZTcwMzY1NDE2Mw@@._V1_SX300.jpg',
  'plot': "Ingeborg Holm's husband opens up a grocery store and life is on the sunny side for them and their three children. But her husband becomes sick and dies. Ingeborg tries to keep the store, ...",
  'fullplot': "Ingeborg Holm's husband opens up a grocery store and life is on the sunny side for them and their three children. But her husband becomes sick and dies. Ingeborg tries to keep the store, but because of the lazy, wasteful staff she eventually has to close it. With no money left, she has to move to the poor-house and she is separated from her children. Her children are taken care of by foster-parents, but Ingeborg 

In [112]:
# Count the names shared by cast, directors, and writers; $cond falls back to 0
# when $setIntersection does not produce an array (e.g. a field is missing)
projection = {"$project": {
    "_id": 0, "directors": 1, "writers": 1, "cast": 1,
    "cast_directors_writers": {"$cond": {
        "if": {"$isArray": {"$setIntersection": ["$cast", "$directors", "$writers"]}},
        "then": {"$size": {"$setIntersection": ["$cast", "$directors", "$writers"]}},
        "else": 0}}}}

In [113]:
list(movies.aggregate([projection,{"$limit":10}]))

[{'cast': ['Hilda Borgström',
   'Aron Lindgren',
   'Erik Lindholm',
   'Georg Grönroos'],
  'directors': ['Victor Sjöström'],
  'writers': ['Nils Krok (play)', 'Victor Sjöström'],
  'cast_directors_writers': 0},
 {'directors': ['Émile Reynaud'], 'cast_directors_writers': 0},
 {'cast': ['Georges Méliès'],
  'directors': ['Georges Méliès'],
  'cast_directors_writers': 0},
 {'cast': ["Jeanne d'Alcy", 'Georges Méliès'],
  'directors': ['Georges Méliès'],
  'cast_directors_writers': 0},
 {'cast': ['Mrs. Auguste Lumiere', 'Andrée Lumière', 'Auguste Lumière'],
  'directors': ['Louis Lumière'],
  'cast_directors_writers': 0},
 {'cast': ['Charles Kayser', 'Jtestn Ott'],
  'directors': ['William K.L. Dickson'],
  'cast_directors_writers': 0},
 {'cast': ['Elizabeth Tait', 'Jtestn Tait', 'Norman Campbell', 'Bella Cola'],
  'directors': ['Charles Tait'],
  'writers': ['Charles Tait'],
  'cast_directors_writers': 0},
 {'directors': ['Émile Ctestl'], 'cast_directors_writers': 0},
 {'cast': ['Mary F

In [114]:
matching = {"$match": {"cast_directors_writers": {"$gt": 0}}}
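The guarded count and the ``$gt`` filter can be emulated locally to see why the guard matters. This sketch assumes that ``$setIntersection`` yields null when one of its inputs is missing, which is consistent with the zero counts in the output above. The helpers `set_intersection` and `shared_count` are hypothetical, and the documents are a small hand-made sample loosely based on the earlier outputs.

```python
def set_intersection(*arrays):
    # Assumption: mirror $setIntersection by returning None
    # when any input array is missing.
    if any(array is None for array in arrays):
        return None
    common = set(arrays[0])
    for array in arrays[1:]:
        common &= set(array)
    return sorted(common)

def shared_count(doc):
    # Same shape as the $cond: size of the intersection if it is an array, else 0.
    common = set_intersection(doc.get("cast"), doc.get("directors"), doc.get("writers"))
    return len(common) if isinstance(common, list) else 0

# Hand-made sample documents (trimmed for brevity).
docs = [
    {"cast": ["Roberto Benigni", "Nicoletta Braschi"],
     "directors": ["Roberto Benigni"],
     "writers": ["Vincenzo Cerami", "Roberto Benigni"]},
    {"cast": ["Elizabeth Tait", "Norman Campbell"],
     "directors": ["Charles Tait"],
     "writers": ["Charles Tait"]},
    {"directors": ["William K.L. Dickson"]},  # cast and writers missing
]

counts = [shared_count(doc) for doc in docs]
matched = [doc for doc in docs if shared_count(doc) > 0]  # the $gt: 0 filter
print(counts)        # [1, 0, 0]
print(len(matched))  # 1
```

Without the guard, taking the size of a missing intersection would have nothing to measure, which is why the ``else`` branch returns 0 instead.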

In [115]:
pipeline = [
    predicate,
    mapping,
    projection,
    matching,
    counting
]

display(list(movies.aggregate(pipeline)))

[{'labors_of_love': 1596}]