# Querying a NoSQL database (MongoDB)

MongoDB supports both basic and advanced CRUD operations (**C**reate, **R**ead, **U**pdate, and **D**elete).

In this notebook, we'll explore the **Read** operations - specifically, how to query an existing collection of documents in MongoDB.

>NOTE: For more information on MongoDB's support for query operations, see [here](https://docs.mongodb.com/manual/tutorial/query-documents/).

## Step 1: Establish a connection to the Database

You will need to set up the client so that it connects to your project.

Then, you'll need to specify the database within this project.

Finally, you'll need to specify the collection within which you wish to make the query.


In [1]:
import pymongo # pymongo is a python driver for MongoDB

client = pymongo.MongoClient('mongodb://localhost:27017/')

db = client.cardio # this is a database that was created in a previous notebook (4)
collection = db.patients # this is a collection that was created in a previous notebook (4)
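
The attribute-style access used above (`client.cardio`, `db.patients`) has a dictionary-style equivalent in pymongo, which can help when a database or collection name is held in a variable. A minimal sketch follows; `get_collection` is a hypothetical helper name, not part of pymongo.

```python
def get_collection(client, db_name, collection_name):
    """Return a collection using dictionary-style access.

    With a pymongo MongoClient, client[db_name][collection_name] refers to
    the same collection as client.<db_name>.<collection_name>.
    """
    return client[db_name][collection_name]

# Intended use with the client created above (names from this notebook):
# collection = get_collection(client, "cardio", "patients")
```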

## Step 2: Query the collection

You can use many different approaches to querying the collection.

### Get all documents in a collection

In [2]:
import json  
query = {} # select all documents (same as select * from patients)
doc = collection.find(query)
for record in doc: # 'doc' is a cursor that we can iterate over
    print(record)

{'_id': 1, 'office_num': [3], 'patient_info': {'fname': 'Jordan', 'lname': 'Smith', 'mname': 'Jane'}, 'patient_numbers': ['177-763-1706', '690-743-9150', '008-063-6083'], 'weights': [343, 373, 209, 266], 'height': 129, 'bp_readings': [{'Systolic': 148, 'Diastolic': 84, 'Rate': 78}, {'Systolic': 101, 'Diastolic': 83, 'Rate': 95}, {'Systolic': 141, 'Diastolic': 73, 'Rate': 71}, {'Systolic': 140, 'Diastolic': 93, 'Rate': 78}, {'Systolic': 107, 'Diastolic': 93, 'Rate': 81}], 'comments': 'good patient', 'readings': [{'Systolic': 999, 'Diastolic': 999, 'Rate': 999}]}
{'_id': 2, 'office_num': [9], 'patient_info': {'fname': 'Liam', 'lname': 'Hale'}, 'patient_numbers': ['344-978-6907', '366-258-5178', '128-657-0704', '999-622-8303'], 'weights': [370, 208, 297, 353, 266], 'height': 173, 'bp_readings': [{'Systolic': 129, 'Diastolic': 99, 'Rate': 77}, {'Systolic': 142, 'Diastolic': 87, 'Rate': 98}, {'Systolic': 146, 'Diastolic': 70, 'Rate': 84}, {'Systolic': 150, 'Diastolic': 97, 'Rate': 107}, {'S
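
Printing every field of every document gets unwieldy quickly. One option is to pass a projection as the second argument to `find()` and cap the number of results with `limit()`. The sketch below assumes the `collection` object from Step 1; `preview` is a hypothetical helper name.

```python
def preview(collection, fields, limit=3):
    """Return up to `limit` documents, keeping only the named fields."""
    projection = {name: 1 for name in fields}  # 1 means "include this field"
    cursor = collection.find({}, projection).limit(limit)
    return list(cursor)

# Intended use with the collection from Step 1:
# for record in preview(collection, ["patient_info", "height"]):
#     print(record)
```

Note that `_id` is returned by default even when it is not listed in the projection; adding `"_id": 0` to the projection suppresses it.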

### Find a document based on a (first level) field value 

In [3]:
query = { "_id": 1 } # search for the document whose _id is 1
doc = collection.find(query)
for record in doc:
  print(json.dumps(record, indent=4)) # using json.dumps() to make the output more readable

{
    "_id": 1,
    "office_num": [
        3
    ],
    "patient_info": {
        "fname": "Jordan",
        "lname": "Smith",
        "mname": "Jane"
    },
    "patient_numbers": [
        "177-763-1706",
        "690-743-9150",
        "008-063-6083"
    ],
    "weights": [
        343,
        373,
        209,
        266
    ],
    "height": 129,
    "bp_readings": [
        {
            "Systolic": 148,
            "Diastolic": 84,
            "Rate": 78
        },
        {
            "Systolic": 101,
            "Diastolic": 83,
            "Rate": 95
        },
        {
            "Systolic": 141,
            "Diastolic": 73,
            "Rate": 71
        },
        {
            "Systolic": 140,
            "Diastolic": 93,
            "Rate": 78
        },
        {
            "Systolic": 107,
            "Diastolic": 93,
            "Rate": 81
        }
    ],
    "comments": "good patient",
    "readings": [
        {
            "Systolic": 999,
            "Diast

In [4]:
query = { "comments": "good patient" }
doc = collection.find(query)
for record in doc:
  print(json.dumps(record, indent=4))

{
    "_id": 1,
    "office_num": [
        3
    ],
    "patient_info": {
        "fname": "Jordan",
        "lname": "Smith",
        "mname": "Jane"
    },
    "patient_numbers": [
        "177-763-1706",
        "690-743-9150",
        "008-063-6083"
    ],
    "weights": [
        343,
        373,
        209,
        266
    ],
    "height": 129,
    "bp_readings": [
        {
            "Systolic": 148,
            "Diastolic": 84,
            "Rate": 78
        },
        {
            "Systolic": 101,
            "Diastolic": 83,
            "Rate": 95
        },
        {
            "Systolic": 141,
            "Diastolic": 73,
            "Rate": 71
        },
        {
            "Systolic": 140,
            "Diastolic": 93,
            "Rate": 78
        },
        {
            "Systolic": 107,
            "Diastolic": 93,
            "Rate": 81
        }
    ],
    "comments": "good patient",
    "readings": [
        {
            "Systolic": 999,
            "Diast

In [5]:
query = { "comments": {"$regex": "ood" }} # https://en.wikipedia.org/wiki/Regular_expression
doc = collection.find(query)
for record in doc:
  print(json.dumps(record, indent=4))

{
    "_id": 1,
    "office_num": [
        3
    ],
    "patient_info": {
        "fname": "Jordan",
        "lname": "Smith",
        "mname": "Jane"
    },
    "patient_numbers": [
        "177-763-1706",
        "690-743-9150",
        "008-063-6083"
    ],
    "weights": [
        343,
        373,
        209,
        266
    ],
    "height": 129,
    "bp_readings": [
        {
            "Systolic": 148,
            "Diastolic": 84,
            "Rate": 78
        },
        {
            "Systolic": 101,
            "Diastolic": 83,
            "Rate": 95
        },
        {
            "Systolic": 141,
            "Diastolic": 73,
            "Rate": 71
        },
        {
            "Systolic": 140,
            "Diastolic": 93,
            "Rate": 78
        },
        {
            "Systolic": 107,
            "Diastolic": 93,
            "Rate": 81
        }
    ],
    "comments": "good patient",
    "readings": [
        {
            "Systolic": 999,
            "Diast

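The `$regex` filter above is case-sensitive and treats characters such as `.` as pattern syntax. A hedged sketch of a small builder that escapes the search text and optionally adds MongoDB's `"i"` (case-insensitive) option is shown below. `contains_query` is a hypothetical name, and the use of Python's `re.escape` assumes its escaping is also acceptable to MongoDB's regex engine, which should hold for common punctuation.

```python
import re

def contains_query(field, text, ignore_case=True):
    """Build a filter matching documents whose `field` contains `text`."""
    # re.escape makes characters such as "." or "(" match literally.
    condition = {"$regex": re.escape(text)}
    if ignore_case:
        condition["$options"] = "i"  # MongoDB's case-insensitive flag
    return {field: condition}

# Intended use with the collection from Step 1:
# doc = collection.find(contains_query("comments", "ood"))
```
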
### Query values in an embedded array
In our sample data, we have an embedded array of phone numbers. Here's how we can search an embedded array.

In [6]:
query = { "patient_numbers": {"$regex": '^177'}} # search for records that have this area code
doc = collection.find(query)
for record in doc:
  print(json.dumps(record, indent=4))

{
    "_id": 1,
    "office_num": [
        3
    ],
    "patient_info": {
        "fname": "Jordan",
        "lname": "Smith",
        "mname": "Jane"
    },
    "patient_numbers": [
        "177-763-1706",
        "690-743-9150",
        "008-063-6083"
    ],
    "weights": [
        343,
        373,
        209,
        266
    ],
    "height": 129,
    "bp_readings": [
        {
            "Systolic": 148,
            "Diastolic": 84,
            "Rate": 78
        },
        {
            "Systolic": 101,
            "Diastolic": 83,
            "Rate": 95
        },
        {
            "Systolic": 141,
            "Diastolic": 73,
            "Rate": 71
        },
        {
            "Systolic": 140,
            "Diastolic": 93,
            "Rate": 78
        },
        {
            "Systolic": 107,
            "Diastolic": 93,
            "Rate": 81
        }
    ],
    "comments": "good patient",
    "readings": [
        {
            "Systolic": 999,
            "Diast

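When a field holds an array, a condition on that field matches if at least one element satisfies it. The sketch below models that rule in plain Python, without a database. `array_field_matches` is a hypothetical name, and the sample values are copied from the first patient's document.

```python
import re

def array_field_matches(values, predicate):
    """Model of an array-field condition: true if any element satisfies it."""
    return any(predicate(value) for value in values)

patient_numbers = ["177-763-1706", "690-743-9150", "008-063-6083"]
weights = [343, 373, 209, 266]

# Counterpart of {"patient_numbers": {"$regex": "^177"}}
starts_with_177 = array_field_matches(
    patient_numbers, lambda number: re.search(r"^177", number) is not None
)

# Counterpart of {"weights": {"$gt": 350}}
has_weight_over_350 = array_field_matches(weights, lambda w: w > 350)

print(starts_with_177, has_weight_over_350)
```
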
In [7]:
query = { "bp_readings": {"Systolic" : 148 , "Diastolic": 84, "Rate": 78}} # exact match on an embedded document: all fields are required, in the same order
doc = collection.find(query)
for record in doc:
  print(json.dumps(record, indent=4))

{
    "_id": 1,
    "office_num": [
        3
    ],
    "patient_info": {
        "fname": "Jordan",
        "lname": "Smith",
        "mname": "Jane"
    },
    "patient_numbers": [
        "177-763-1706",
        "690-743-9150",
        "008-063-6083"
    ],
    "weights": [
        343,
        373,
        209,
        266
    ],
    "height": 129,
    "bp_readings": [
        {
            "Systolic": 148,
            "Diastolic": 84,
            "Rate": 78
        },
        {
            "Systolic": 101,
            "Diastolic": 83,
            "Rate": 95
        },
        {
            "Systolic": 141,
            "Diastolic": 73,
            "Rate": 71
        },
        {
            "Systolic": 140,
            "Diastolic": 93,
            "Rate": 78
        },
        {
            "Systolic": 107,
            "Diastolic": 93,
            "Rate": 81
        }
    ],
    "comments": "good patient",
    "readings": [
        {
            "Systolic": 999,
            "Diast

### Querying embedded documents

In our sample data, the bp_readings field contains an array of embedded documents. Let's explore some examples showing how we can search an embedded document.

In [8]:
# in this style, you need all fields - so this one won't return anything
query = { "bp_readings": {"Systolic" : 148}} 
doc = collection.find(query)
for record in doc:
    print(json.dumps(record, indent=4))
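
The empty result above follows from how MongoDB compares whole embedded documents: the query document must equal an array element exactly, including the set of fields and their order. The plain-Python sketch below models that comparison; `exact_embedded_match` is a hypothetical name, and the reading is copied from the first patient's document.

```python
def exact_embedded_match(element, query_doc):
    """Model of whole-document equality: same fields, same values, same order."""
    return list(element.items()) == list(query_doc.items())

reading = {"Systolic": 148, "Diastolic": 84, "Rate": 78}

full = exact_embedded_match(reading, {"Systolic": 148, "Diastolic": 84, "Rate": 78})
partial = exact_embedded_match(reading, {"Systolic": 148})
reordered = exact_embedded_match(reading, {"Diastolic": 84, "Systolic": 148, "Rate": 78})

print(full, partial, reordered)
```

Only the first comparison succeeds. Because Python dictionaries keep insertion order, the order in which keys are written in the query dictionary matters for this style of query.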

In [9]:
# If you want to query on one field, you can use the following approach
query = { "bp_readings.Systolic" : 148} 
doc = collection.find(query)
for record in doc:
    print(json.dumps(record, indent=4))

{
    "_id": 1,
    "office_num": [
        3
    ],
    "patient_info": {
        "fname": "Jordan",
        "lname": "Smith",
        "mname": "Jane"
    },
    "patient_numbers": [
        "177-763-1706",
        "690-743-9150",
        "008-063-6083"
    ],
    "weights": [
        343,
        373,
        209,
        266
    ],
    "height": 129,
    "bp_readings": [
        {
            "Systolic": 148,
            "Diastolic": 84,
            "Rate": 78
        },
        {
            "Systolic": 101,
            "Diastolic": 83,
            "Rate": 95
        },
        {
            "Systolic": 141,
            "Diastolic": 73,
            "Rate": 71
        },
        {
            "Systolic": 140,
            "Diastolic": 93,
            "Rate": 78
        },
        {
            "Systolic": 107,
            "Diastolic": 93,
            "Rate": 81
        }
    ],
    "comments": "good patient",
    "readings": [
        {
            "Systolic": 999,
            "Diast

In [10]:
# we can also check for greater than ($gt), greater than or equal to ($gte), less than ($lt), and less than or equal to ($lte)
query = { "bp_readings.Systolic" : {"$gte" : 130}} # matches documents with at least one systolic reading of 130 or more
doc = collection.find(query)
for record in doc:
    print(json.dumps(record, indent=4))

{
    "_id": 1,
    "office_num": [
        3
    ],
    "patient_info": {
        "fname": "Jordan",
        "lname": "Smith",
        "mname": "Jane"
    },
    "patient_numbers": [
        "177-763-1706",
        "690-743-9150",
        "008-063-6083"
    ],
    "weights": [
        343,
        373,
        209,
        266
    ],
    "height": 129,
    "bp_readings": [
        {
            "Systolic": 148,
            "Diastolic": 84,
            "Rate": 78
        },
        {
            "Systolic": 101,
            "Diastolic": 83,
            "Rate": 95
        },
        {
            "Systolic": 141,
            "Diastolic": 73,
            "Rate": 71
        },
        {
            "Systolic": 140,
            "Diastolic": 93,
            "Rate": 78
        },
        {
            "Systolic": 107,
            "Diastolic": 93,
            "Rate": 81
        }
    ],
    "comments": "good patient",
    "readings": [
        {
            "Systolic": 999,
            "Diast

Another approach is to use \$elemMatch. This matches documents in which at least one element of an array of embedded documents satisfies all of the specified conditions.

In [11]:
query = { "bp_readings" : {"$elemMatch" : {'Systolic' : 104}}} 
doc = collection.find(query)
for record in doc:
    print(json.dumps(record, indent=4))

{
    "_id": 12,
    "office_num": [
        5
    ],
    "patient_info": {
        "fname": "July",
        "lname": "Hale"
    },
    "patient_numbers": [
        "449-472-8576",
        "139-634-1019"
    ],
    "weights": [
        369
    ],
    "height": 137,
    "bp_readings": [
        {
            "Systolic": 104,
            "Diastolic": 86,
            "Rate": 83
        },
        {
            "Systolic": 136,
            "Diastolic": 95,
            "Rate": 79
        },
        {
            "Systolic": 127,
            "Diastolic": 86,
            "Rate": 103
        },
        {
            "Systolic": 122,
            "Diastolic": 94,
            "Rate": 93
        }
    ]
}
{
    "_id": 31,
    "office_num": [
        3
    ],
    "patient_info": {
        "fname": "July",
        "lname": " Fox"
    },
    "patient_numbers": [
        "319-893-8639",
        "287-631-1200"
    ],
    "weights": [
        285,
        303,
        160,
        392
    ],
    "height

In [12]:
query = { "bp_readings" : {"$elemMatch" : {'Systolic' : {'$gte': 150}}}} 
doc = collection.find(query)
for record in doc:
    print(json.dumps(record, indent=4))

{
    "_id": 2,
    "office_num": [
        9
    ],
    "patient_info": {
        "fname": "Liam",
        "lname": "Hale"
    },
    "patient_numbers": [
        "344-978-6907",
        "366-258-5178",
        "128-657-0704",
        "999-622-8303"
    ],
    "weights": [
        370,
        208,
        297,
        353,
        266
    ],
    "height": 173,
    "bp_readings": [
        {
            "Systolic": 129,
            "Diastolic": 99,
            "Rate": 77
        },
        {
            "Systolic": 142,
            "Diastolic": 87,
            "Rate": 98
        },
        {
            "Systolic": 146,
            "Diastolic": 70,
            "Rate": 84
        },
        {
            "Systolic": 150,
            "Diastolic": 97,
            "Rate": 107
        },
        {
            "Systolic": 132,
            "Diastolic": 95,
            "Rate": 68
        },
        {
            "Systolic": 133,
            "Diastolic": 94,
            "Rate": 95
        },
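
With a single condition, `$elemMatch` and dot notation generally select the same documents. They differ once several conditions are combined: with dot notation each condition may be satisfied by a different array element, whereas `$elemMatch` requires one element to satisfy all of them. The plain-Python sketch below models both rules for "greater than or equal to" conditions only; the function names are hypothetical, and the readings are copied from the first patient's document.

```python
def dot_notation_match(readings, minimums):
    """Each condition may be met by a different array element."""
    return all(
        any(reading[field] >= minimum for reading in readings)
        for field, minimum in minimums.items()
    )

def elem_match(readings, minimums):
    """A single array element must meet all conditions at once."""
    return any(
        all(reading[field] >= minimum for field, minimum in minimums.items())
        for reading in readings
    )

bp_readings = [
    {"Systolic": 148, "Diastolic": 84, "Rate": 78},
    {"Systolic": 101, "Diastolic": 83, "Rate": 95},
    {"Systolic": 141, "Diastolic": 73, "Rate": 71},
    {"Systolic": 140, "Diastolic": 93, "Rate": 78},
    {"Systolic": 107, "Diastolic": 93, "Rate": 81},
]
minimums = {"Systolic": 145, "Diastolic": 90}

# Counterpart of {"bp_readings.Systolic": {"$gte": 145},
#                 "bp_readings.Diastolic": {"$gte": 90}}
matches_with_dots = dot_notation_match(bp_readings, minimums)

# Counterpart of {"bp_readings": {"$elemMatch": {"Systolic": {"$gte": 145},
#                                                "Diastolic": {"$gte": 90}}}}
matches_with_elem_match = elem_match(bp_readings, minimums)

print(matches_with_dots, matches_with_elem_match)
```

Here the dot-notation model matches (one reading has a systolic value of 148 and a different one has a diastolic value of 93), while the `$elemMatch` model does not, because no single reading meets both thresholds.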
 

In [13]:
client.close()

## Step 3: Review other possible queries

There are many other ways to query the data in MongoDB; see https://docs.mongodb.com/manual/tutorial/query-documents/ for more examples. I'd also encourage you to experiment and try queries through the MongoDB online interface.
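
As one example of combining conditions, the sketch below builds a filter that uses `$in` on the `office_num` array together with a closed range on `height`. Conditions on different fields in the same filter are implicitly AND-ed. `office_and_height_query` is a hypothetical helper name, and the threshold values are arbitrary.

```python
def office_and_height_query(offices, min_height, max_height):
    """Patients seen in any of `offices` whose height lies in a closed range."""
    return {
        # matches if the office_num array contains any of these values
        "office_num": {"$in": list(offices)},
        # both bounds must hold for the height field
        "height": {"$gte": min_height, "$lte": max_height},
    }

query = office_and_height_query([3, 9], 120, 180)
print(query)

# Intended use with an open connection to the collection from Step 1:
# for record in collection.find(query).sort("height", -1):
#     print(record)
```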

## Use of Indices

Indices can be used to improve the performance of queries. See [here](https://docs.mongodb.com/manual/indexes/) for more information on indices.
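
As a rough sketch of what this might look like for the queries in this notebook, the helper below creates an ascending index on the nested `bp_readings.Systolic` field. `ensure_systolic_index` is a hypothetical name; it assumes an open connection and the `collection` object from Step 1, and whether such an index actually helps depends on the data and the workload.

```python
def ensure_systolic_index(collection):
    """Create an ascending index on the nested systolic field."""
    # 1 means ascending; pymongo also exposes this value as pymongo.ASCENDING.
    return collection.create_index([("bp_readings.Systolic", 1)])

# Intended use with the collection from Step 1:
# index_name = ensure_systolic_index(collection)
# plan = collection.find({"bp_readings.Systolic": {"$gte": 130}}).explain()
```

`create_index` returns the name of the index, and calling `explain()` on a cursor returns the query plan, which can be inspected to see whether the index is used.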