# `Querying an Array`

---
Match an array field :-

`{ ..., <field>: [<value>, <value>, ...], ... }`

----

[Array query operators](https://docs.mongodb.com/manual/reference/operator/query-array/) :- For querying on an array field.

- [$all](https://docs.mongodb.com/manual/reference/operator/query/all/#mongodb-query-op.-all) - Matches arrays that contain all elements specified in the query.

- [$size](https://docs.mongodb.com/manual/reference/operator/query/size/#mongodb-query-op.-size) - Selects documents if the array field is a specified size.

[Array projection operators](https://docs.mongodb.com/manual/reference/operator/projection/_) :- Project elements from an array field.

- [$slice](https://docs.mongodb.com/manual/reference/operator/projection/slice/#proj._S_slice) - Limits the number of elements projected from an array. 

---
----
### Connecting to MongoDB

----

In [1]:
# Importing the required libraries
import pymongo
import pprint as pp


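# pprint sorts dict keys by default; shadow `sorted` inside the pprint module
# with a no-op so fields print in document order
pp.sorted = lambda x, key=None: x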

In [2]:
# Connect to the mongo client - Atlas Cluster
# Replace '<connection_string>' with your own connection string
client = pymongo.MongoClient('<connection_string>')

In [3]:
# Choose a database
db = client.sample_analytics

---
Sample document from `accounts` collection.

----

In [4]:
# Sample document
pp.pprint(
    db.accounts.find_one()
)

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
 'account_id': 371138,
 'limit': 9000,
 'products': ['Derivatives', 'InvestmentStock']}


----
### Querying using array operators

----

Query an array that contains a given value.

For example, find documents where the `products` array contains the element `Derivatives`.

---

In [5]:
# Find documents that contain a specific array element

cur = db.accounts.find(
                        # query expression
                        {
                            'products':'Derivatives'
                        },
                        # projection
                        {
                            'products':1,
                            '_id':0
                        })

# Print documents
for doc in cur:
    pp.pprint(doc)

{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'CurrencyService', 'InvestmentStock']}
{'products': ['CurrencyService',
              'Derivatives',
              'InvestmentFund',
              'Commodity',
              'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock', 'CurrencyService']}
{'products': ['InvestmentFund', 'Derivatives', 'InvestmentStock']}
{'products': ['Brokerage', 'CurrencyService', 'InvestmentStock', 'Derivatives']}
{'products': ['Derivatives', 'Brokerage', 'Commodity', 'InvestmentStock']}
{'products': ['InvestmentFund',
              'Derivatives',
              'InvestmentStock',
              'CurrencyService']}
{'products': ['Commodity', 'CurrencyService', 'Derivatives', 'InvestmentStock']}
{'products': ['CurrencyService',
              'InvestmentFund',
              'InvestmentStock',
              'Derivatives']}
{'products': ['Brokerage', 'CurrencyService', 'Derivatives', 'InvestmentStock']}
{'products': ['Brok

---

Query a collection for documents whose array holds **exactly the specified values, in the specified order**.

For example, find documents whose `products` array contains only the elements `[Derivatives, InvestmentStock]`, in that order.

---
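The difference between matching a single element and matching the whole array can be mimicked in plain Python, without a database. This is only a sketch of the semantics described above: `matches_element` and `matches_exact` are hypothetical helper names (not PyMongo functions), the sample documents are made up, and the field is assumed to always hold a list.

```python
# Plain-Python sketch of the two matching rules (no MongoDB needed).
# Helper names and sample documents are illustrative assumptions.

def matches_element(doc, field, value):
    """Mimic {'<field>': value}: true if the array contains value."""
    return value in doc.get(field, [])


def matches_exact(doc, field, values):
    """Mimic {'<field>': [v1, v2, ...]}: same elements in the same order."""
    return doc.get(field) == list(values)


docs = [
    {'products': ['Derivatives', 'InvestmentStock']},
    {'products': ['InvestmentStock', 'Derivatives']},
    {'products': ['Derivatives', 'CurrencyService', 'InvestmentStock']},
]

contains = [d for d in docs if matches_element(d, 'products', 'Derivatives')]
exact = [d for d in docs
         if matches_exact(d, 'products', ['Derivatives', 'InvestmentStock'])]

print(len(contains))  # every sample document contains 'Derivatives'
print(len(exact))     # only the first one has exactly that array
```

The second sample document holds the same two values in reverse order, so it passes the element check but fails the exact match.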

In [6]:
# Document that contains exact array values - same values and same order

cur = db.accounts.find(
                        # query expression
                        {
                            'products':['Derivatives', 'InvestmentStock']
                        },
                        # projection
                        {
                            'products':1,
                            '_id': 0
                        })

# Print documents
for doc in cur:
    pp.pprint(doc)

{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Deriv

---
**`$all` array operator**

[$all](https://docs.mongodb.com/manual/reference/operator/query/all/#op._S_all) matches arrays that contain all elements specified in the query.

**Syntax -** `{ <field>: { $all: [ <value1> , <value2> ... ] } }`

----

For example, find documents that contain `Commodity` and `InvestmentStock` in the `products` array field.

Documents whose array field contains at least these elements are retrieved. The order of the elements does not matter.

---
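As a rough illustration of the "at least these elements, in any order" rule, the check can be written in plain Python. The helper `matches_all` is a hypothetical name introduced here, the sample documents are invented, and the field is assumed to hold a list.

```python
# Plain-Python sketch of the $all rule (no MongoDB needed).
# `matches_all` and the sample documents are illustrative assumptions.

def matches_all(doc, field, required):
    """Mimic {'<field>': {'$all': required}} for a list-valued field."""
    present = doc.get(field, [])
    return all(item in present for item in required)


docs = [
    {'products': ['Commodity', 'InvestmentStock']},
    {'products': ['InvestmentStock', 'Brokerage', 'Commodity']},
    {'products': ['Commodity', 'Derivatives']},
]

matched = [d for d in docs
           if matches_all(d, 'products', ['InvestmentStock', 'Commodity'])]
print(matched)
```

The third sample document is left out because it lacks `InvestmentStock`; extra elements and element order make no difference.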

In [7]:
# Documents that contain multiple array elements in any order

cur = db.accounts.find(
                        # query expression
                        {
                            'products':{
                                        '$all':['InvestmentStock', 'Commodity']
                                        }
                        },
                        # projection
                        {
                            'products':1,
                            '_id':0
                        })

# Print documents

for doc in cur:
    pp.pprint(doc)

{'products': ['InvestmentStock', 'Commodity', 'Brokerage', 'CurrencyService']}
{'products': ['Commodity', 'InvestmentStock']}
{'products': ['CurrencyService',
              'Derivatives',
              'InvestmentFund',
              'Commodity',
              'InvestmentStock']}
{'products': ['CurrencyService', 'Brokerage', 'InvestmentStock', 'Commodity']}
{'products': ['Brokerage', 'Commodity', 'InvestmentStock']}
{'products': ['Derivatives', 'Brokerage', 'Commodity', 'InvestmentStock']}
{'products': ['Commodity', 'InvestmentStock']}
{'products': ['Commodity', 'InvestmentStock']}
{'products': ['Brokerage', 'Commodity', 'InvestmentStock']}
{'products': ['Commodity', 'CurrencyService', 'Derivatives', 'InvestmentStock']}
{'products': ['CurrencyService', 'Commodity', 'InvestmentStock']}
{'products': ['Brokerage', 'InvestmentStock', 'Commodity', 'Derivatives']}
{'products': ['InvestmentFund', 'InvestmentStock', 'Derivatives', 'Commodity']}
{'products': ['Brokerage',
              'Commodi

---
**`$size` array operator**

[$size](https://docs.mongodb.com/manual/reference/operator/query/size/#op._S_size) selects documents if the array field is a specified size.

**Syntax -** `{ <field>: { $size: <size> } }`

---

For example, we can retrieve all documents that have exactly two elements in the `products` array field.

---
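The `$size` rule is an exact length check, which a short plain-Python sketch can mirror. `matches_size` is a hypothetical helper name and the sample documents are invented for illustration.

```python
# Plain-Python sketch of the $size rule (no MongoDB needed).
# `matches_size` and the sample documents are illustrative assumptions.

def matches_size(doc, field, size):
    """Mimic {'<field>': {'$size': size}}: the array has exactly `size` items."""
    value = doc.get(field)
    return isinstance(value, list) and len(value) == size


docs = [
    {'products': ['Derivatives', 'InvestmentStock']},
    {'products': ['Brokerage', 'Commodity', 'InvestmentStock']},
    {'products': ['InvestmentStock']},
]

two_products = [d for d in docs if matches_size(d, 'products', 2)]
print(two_products)
```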

In [8]:
# Documents with array of size 2

cur = db.accounts.find(
                        # query expression
                        {
                            'products':{
                                        '$size':2
                                        }
                        },
                        # projection
                        {
                            'products':1,
                            '_id':0
                        })

# Print documents

for doc in cur:
    pp.pprint(doc)

{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Commodity', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['Brokerage', 'InvestmentStock']}
{'products': ['Brokerage', 'InvestmentStock']}
{'products': ['Commodity', 'InvestmentStock']}
{'products': ['Commodity', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['CurrencyService', 'InvestmentStock']}
{'products': ['InvestmentStock', 'CurrencyService']}
{'products': ['CurrencyService', 'InvestmentStock']}
{'products': ['Brokerage', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['Commodity', 'InvestmentStock']}
{'products': ['Brokerage', 'InvestmentStock']}
{'products': ['Commodity', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Brokerage', 'InvestmentStock']}
{'products': 

---
---
**`$slice` projection operator**

Using [$slice](https://docs.mongodb.com/manual/reference/operator/projection/slice/#proj._S_slice) you can limit the elements of an array to return in the query result.

**Syntax -** `find(<query>,{ <arrayField>: { $slice: <number> } })`

----

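For instance, return only the first two elements of the `products` array for documents where the array has 5 elements. The `$slice` projection on its own does not exclude the other fields, so `account_id` and `limit` still appear in the output.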

---

In [9]:
# Limit the number of array elements to return

cur = db.accounts.find(
                        # query expression
                        {'products':{
                                        '$size':5
                                    }
                        },
                        # projection
                        {
                            'products':{'$slice':2},
                            '_id':0
                        })

# Print documents

for doc in cur:
    pp.pprint(doc)

{'account_id': 383777,
 'limit': 10000,
 'products': ['CurrencyService', 'Derivatives']}
{'account_id': 599752, 'limit': 10000, 'products': ['Brokerage', 'Commodity']}
{'account_id': 515844,
 'limit': 10000,
 'products': ['Commodity', 'CurrencyService']}
{'account_id': 627629, 'limit': 10000, 'products': ['Brokerage', 'Derivatives']}
{'account_id': 571279, 'limit': 10000, 'products': ['Commodity', 'Derivatives']}
{'account_id': 161714,
 'limit': 9000,
 'products': ['Commodity', 'CurrencyService']}
{'account_id': 472963,
 'limit': 10000,
 'products': ['InvestmentFund', 'Derivatives']}
{'account_id': 839927,
 'limit': 10000,
 'products': ['InvestmentFund', 'CurrencyService']}
{'account_id': 475102,
 'limit': 10000,
 'products': ['CurrencyService', 'InvestmentFund']}
{'account_id': 452778,
 'limit': 10000,
 'products': ['InvestmentFund', 'Derivatives']}
{'account_id': 332179,
 'limit': 10000,
 'products': ['Commodity', 'CurrencyService']}
{'account_id': 387979, 'limit': 10000, 'products':

---
To return the last element(s) of an array, pass a negative number to `$slice`.

---
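The single-number form of `$slice` behaves much like Python list slicing, which gives a quick way to reason about it offline. In the sketch below, `slice_count` is a hypothetical helper name, and the array is copied from the `account_id` 383777 document that appears in this notebook.

```python
# Plain-Python sketch of {'$slice': n} (no MongoDB needed).
# `slice_count` is an illustrative helper, not a PyMongo function.

def slice_count(array, n):
    """First n elements when n >= 0, last |n| elements when n < 0."""
    return array[:n] if n >= 0 else array[n:]


products = ['CurrencyService', 'Derivatives', 'InvestmentFund',
            'Commodity', 'InvestmentStock']

first_two = slice_count(products, 2)
last_two = slice_count(products, -2)

print(first_two)
print(last_two)
```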

In [10]:
# Complete document
pp.pprint(
    db.accounts.find_one({
                            'products':{
                                            '$size':5
                                        }
                        })
)

{'_id': ObjectId('5ca4bbc7a2dd94ee58162391'),
 'account_id': 383777,
 'limit': 10000,
 'products': ['CurrencyService',
              'Derivatives',
              'InvestmentFund',
              'Commodity',
              'InvestmentStock']}


In [11]:
# Return last 2 array elements

pp.pprint(
    db.accounts.find_one(
                        # query expression
                        {
                            'products':{
                                            '$size':5
                                        }
                        },
                        # projection
                        {
                            'products':{
                                            '$slice':-2
                                        }
                        })
)

{'_id': ObjectId('5ca4bbc7a2dd94ee58162391'),
 'account_id': 383777,
 'limit': 10000,
 'products': ['Commodity', 'InvestmentStock']}


---
`$slice` can also skip a number of elements before returning some: `{ <arrayField>: { $slice: [ <skip>, <limit> ] } }`.

For example, retrieve the 3rd element of the `products` field where `account_id` is 599752.

---
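The skip-and-limit form can be pictured as a Python slice that starts at `skip` and takes `limit` items. This is a minimal sketch that only covers a non-negative skip; `slice_skip_limit` is a hypothetical helper name, and the array is copied from the `account_id` 599752 document in this notebook.

```python
# Plain-Python sketch of {'$slice': [skip, limit]} (no MongoDB needed).
# Assumption: skip >= 0 and limit > 0; other cases are not modelled here.

def slice_skip_limit(array, skip, limit):
    """Skip `skip` elements from the start, then return up to `limit` elements."""
    if skip < 0 or limit <= 0:
        raise ValueError('this sketch only handles skip >= 0 and limit > 0')
    return array[skip:skip + limit]


products = ['Brokerage', 'Commodity', 'Derivatives',
            'CurrencyService', 'InvestmentStock']

third = slice_skip_limit(products, 2, 1)
print(third)
```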

In [12]:
# Original doc
pp.pprint(
    db.accounts.find_one({
                            'account_id': 599752
                        })
)

{'_id': ObjectId('5ca4bbc7a2dd94ee581623b7'),
 'account_id': 599752,
 'limit': 10000,
 'products': ['Brokerage',
              'Commodity',
              'Derivatives',
              'CurrencyService',
              'InvestmentStock']}


In [13]:
# Return third array element
# skip 2 and return 1 element

pp.pprint(
    db.accounts.find_one(
                            # query expression
                            {
                                'account_id': 599752
                            },
                            # projection
                            {
                                'products':{
                                                '$slice': [2, 1]
                                            }
                            })
)

{'_id': ObjectId('5ca4bbc7a2dd94ee581623b7'),
 'account_id': 599752,
 'limit': 10000,
 'products': ['Derivatives']}


---
---
**Query an array based on the index of an element.**

Using dot notation (`<field>.<index>`, zero-based), you can match the element at a specific position in an array.

For example, retrieve those documents where the first element of the `products` array is `Derivatives`.

---
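Because the index in the dot-notation key is zero-based, it can help to build the key programmatically and check the rule on local data. Both helpers below, `position_query` and `matches_position`, are hypothetical names introduced for this sketch, and the sample document is invented.

```python
# Sketch: build a positional query dict and mimic its matching rule locally.
# Helper names and the sample document are illustrative assumptions.

def position_query(field, index, value):
    """Build {'<field>.<index>': value}; `index` is zero-based."""
    return {f'{field}.{index}': value}


def matches_position(doc, field, index, value):
    """True if the array has an element at `index` equal to `value`."""
    array = doc.get(field, [])
    return index < len(array) and array[index] == value


first = position_query('products', 0, 'Derivatives')
second = position_query('products', 1, 'Derivatives')
print(first)
print(second)

doc = {'products': ['InvestmentFund', 'Derivatives', 'InvestmentStock']}
print(matches_position(doc, 'products', 1, 'Derivatives'))
```

A dict produced by `position_query` has the same shape as the query expression passed to `find` in the cells that follow.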

In [14]:
# First element `Derivatives`
cur = db.accounts.find(
                        # query expression
                        {
                            'products.0': 'Derivatives', 
                        },
                        # projection
                        {
                            'products': 1,
                            '_id':0
                        })

for doc in cur:
    pp.pprint(doc)

{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'CurrencyService', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock', 'CurrencyService']}
{'products': ['Derivatives', 'Brokerage', 'Commodity', 'InvestmentStock']}
{'products': ['Derivatives', 'Brokerage', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives',
              'InvestmentStock',
              'InvestmentFund',
              'CurrencyService']}
{'products': ['Derivatives', 'Brokerage', 'Commodity', 'InvestmentStock']}
{'products': ['Derivatives',
              'CurrencyService',
              'InvestmentFund',
              'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'InvestmentStock']}
{'products': ['Derivatives', 'Commodity', 'InvestmentStock']}
{'products': ['Derivatives', 'Brokerage', 'InvestmentStock', 'InvestmentFund']}
{'products': ['Derivatives', 'InvestmentFund', 'Investment

---
Similarly, retrieve those documents where the second element of the `products` array is `Derivatives`.

---

In [15]:
# Second element `Derivatives`
cur = db.accounts.find(
                        # query expression
                        {
                            'products.1': 'Derivatives', 
                        },
                        # projection
                        {
                            'products': 1,
                            '_id': 0
                        })

for doc in cur:
    pp.pprint(doc)

{'products': ['CurrencyService',
              'Derivatives',
              'InvestmentFund',
              'Commodity',
              'InvestmentStock']}
{'products': ['InvestmentFund', 'Derivatives', 'InvestmentStock']}
{'products': ['InvestmentFund',
              'Derivatives',
              'InvestmentStock',
              'CurrencyService']}
{'products': ['InvestmentFund', 'Derivatives', 'InvestmentStock']}
{'products': ['InvestmentFund', 'Derivatives', 'InvestmentStock']}
{'products': ['Commodity', 'Derivatives', 'InvestmentStock']}
{'products': ['Brokerage',
              'Derivatives',
              'InvestmentFund',
              'Commodity',
              'InvestmentStock']}
{'products': ['Brokerage', 'Derivatives', 'InvestmentStock']}
{'products': ['Brokerage', 'Derivatives', 'InvestmentFund', 'InvestmentStock']}
{'products': ['Commodity',
              'Derivatives',
              'Brokerage',
              'CurrencyService',
              'InvestmentStock']}
{'products': 

---
### Question - 

Retrieve those documents where the `products` array contains only the elements `InvestmentStock` and `InvestmentFund`, in any order.

---
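One way to reason about this question: "only these elements, in any order" amounts to "contains every listed value" combined with "has exactly that many elements". The plain-Python sketch below checks that reasoning on invented sample data; `matches_only` is a hypothetical helper name, and the required values are assumed to be distinct.

```python
# Plain-Python sketch of combining the $all and $size rules (no MongoDB needed).
# Assumption: the values in `required` are distinct.

def matches_only(doc, field, required):
    """True if the array holds exactly the required values, in any order."""
    array = doc.get(field, [])
    return len(array) == len(required) and all(v in array for v in required)


docs = [
    {'products': ['InvestmentFund', 'InvestmentStock']},
    {'products': ['InvestmentStock', 'InvestmentFund']},
    {'products': ['InvestmentFund', 'InvestmentStock', 'Derivatives']},
    {'products': ['InvestmentStock', 'Derivatives']},
]

matched = [d for d in docs
           if matches_only(d, 'products', ['InvestmentStock', 'InvestmentFund'])]
print(matched)
```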

In [16]:
# Question
cur = db.accounts.find(
                        # query expression
                        {
                            'products':{
                                        '$all':['InvestmentStock', 'InvestmentFund'],
                                        '$size': 2
                                        }
                        },
                        # projection
                        {
                            'products':1,
                            '_id':0,
                        })

# Print documents

for doc in cur:
    pp.pprint(doc)

{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentStock', 'InvestmentFund']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products': ['InvestmentFund', 'InvestmentStock']}
{'products':

---
### Question - 

Count the number of documents where the `products` field contains `[Derivatives, InvestmentStock]` in that exact order.

---

In [17]:
# Question
# Cursor.count() is deprecated (removed in PyMongo 4); use count_documents() instead
db.accounts.count_documents(
                    # query expression
                    {
                        'products':['Derivatives', 'InvestmentStock']
                    })


92

---
### Exercise 1 - 

Retrieve those documents where the first product of the customer is `Derivatives` and `7000 <= limit <= 9000`.

---

---
### Exercise 2 - 

Retrieve those documents where the customers have a `limit` less than or equal to 5000 and have purchased 5 `products`.

---