# Lab: MongoDB Indexing

![https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-WD0231EN-SkillsNetwork/IDSN-logo.png](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-WD0231EN-SkillsNetwork/IDSN-logo.png)

Estimated time needed: **30** minutes

## **Objectives**

After completing this lab, you will be able to:

- Measure the time it takes to execute a query with the explain function
- Describe the process of creating, listing and deleting indexes
- Evaluate the effectiveness of an index

Open a new terminal and run the command below to start the MongoDB shell. In the lab environment, the copy button at the bottom right of the code block copies the command so that you can paste it where needed.



In [1]:

#mongosh -u root -p PASSWORD --authenticationDatabase admin local




![https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-DB0151EN-edX/images/mongo-shell.png](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-DB0151EN-edX/images/mongo-shell.png)

The command contains the username and password used to connect to the MongoDB server (the text after the `-p` option is the password). Your output will differ from the one shown above. Copy the command given to you and keep it handy; you will need it in the next step.

Alternatively, click MongoDB CLI, which runs this command for you.

![https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-DB0151EN-edX/images/mongodb-cli.png](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-DB0151EN-edX/images/mongodb-cli.png)

In the MongoDB CLI (for example, the mongo shell), switch the context to the `training` database.



In [2]:

#use training




Then create a collection called `bigdata`.



In [3]:

#db.createCollection("bigdata")


# **PyMongo**

In [4]:
import pymongo

# Connect to MongoDB
client = pymongo.MongoClient("mongodb://localhost:27017/")

# Select the training database
db = client["training"]

In [5]:
db.create_collection("bigdata")

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'training'), 'bigdata')



# **Exercise 1 - Insert documents**

- Let's insert 200k documents into the newly created collection.
- This should take a few seconds to complete.
- The code given below will insert these documents into the `bigdata` collection.
- Each document has a field named `account_no`, which is set to the incrementing loop variable `i`.
- Another field, `balance`, contains a randomly generated number between 0 and 20000 to simulate the bank balance for the account.

Copy the code below and paste it into the mongo client.



In [None]:

# docsToInsert = []
# for (i = 1; i <= 200000; i++) { docsToInsert[i-1] = { "account_no": i, "balance": Math.round(Math.random() * 20000) } }
# db.bigdata.insertMany(docsToInsert);


In [7]:
import random

# List to store documents to insert
docs_to_insert = []

# Generate and add documents to the list; balances are whole numbers,
# matching Math.round() in the mongosh version above
for i in range(1, 200001):
    doc = {"account_no": i, "balance": round(random.random() * 20000)}
    docs_to_insert.append(doc)

# Insert documents in batches of 1000 for better performance
batch_size = 1000
for i in range(0, len(docs_to_insert), batch_size):
    db.bigdata.insert_many(docs_to_insert[i:i + batch_size])
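
The insertion step can also be sketched with generators, so that the 200,000 documents never have to sit in one list. This is a minimal sketch rather than part of the original lab: `generate_accounts` and `batched` are hypothetical helper names, and the demonstration uses a small count and a fixed seed so that it runs without a database connection.

```python
import random
from itertools import islice


def generate_accounts(count, seed=None):
    """Yield one account document at a time instead of building a full list."""
    rng = random.Random(seed)
    for account_no in range(1, count + 1):
        # Whole-number balance between 0 and 20000, as in the mongosh version.
        yield {"account_no": account_no, "balance": round(rng.random() * 20000)}


def batched(iterable, batch_size):
    """Yield successive lists holding at most batch_size items each."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Small demonstration that needs no database connection.
batches = list(batched(generate_accounts(2500, seed=42), 1000))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)

# With a live connection, each batch could be passed to insert_many:
# for batch in batched(generate_accounts(200000), 1000):
#     db.bigdata.insert_many(batch)
```

Only one batch is held in memory at a time, which matters more as the number of documents grows.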



Verify that `200000` documents were inserted by running the command below.



In [8]:

#db.bigdata.countDocuments()


In [9]:
db.bigdata.count_documents({})

200000



# **Exercise 2 - Measure the time taken by a query**

Let's run a query and find out how much time it takes to complete. You will query for the details of account number 58982.

We will make use of the `explain` function to find the time taken to run the query in milliseconds.

> The `db.collection.explain("executionStats")` method provides statistics about the performance of a query. These statistics can be useful in measuring if and how a query uses an index. See `db.collection.explain()` for details.
>

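Run the command below. The PyMongo version calls `explain()` without an argument because `Cursor.explain()` uses the explain command's default verbosity, `allPlansExecution`, which also includes `executionStats`.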



In [None]:

#db.bigdata.find({"account_no":58982}).explain("executionStats").executionStats.executionTimeMillis


In [11]:
db.bigdata.find({"account_no": 58982}).explain()["executionStats"]["executionTimeMillis"]

101



Make a note of the milliseconds it took to run the query. We will need this at a later stage.
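
Execution time varies from run to run, so it can help to note a few other fields from the same explain result. The sketch below is one possible helper, not part of the original lab. `summarize_explain` is a hypothetical name, and `sample_plan` is a hand-written, heavily trimmed stand-in for a real explain document (its `totalDocsExamined` value is an assumption for illustration). The field names are those MongoDB reports; the `queryPlan` lookup covers server versions that nest the plan one level deeper.

```python
def summarize_explain(plan):
    """Pull the figures most relevant to index usage out of an explain() result."""
    stats = plan["executionStats"]
    stage = plan["queryPlanner"]["winningPlan"]
    # Some server versions wrap the plan in a "queryPlan" sub-document.
    stage = stage.get("queryPlan", stage)
    # Walk down to the innermost stage, which shows how documents were located.
    while "inputStage" in stage:
        stage = stage["inputStage"]
    return {
        "access_stage": stage["stage"],
        "time_ms": stats["executionTimeMillis"],
        "docs_examined": stats["totalDocsExamined"],
        "keys_examined": stats["totalKeysExamined"],
        "returned": stats["nReturned"],
    }


# Hypothetical, trimmed explain output for a query that has no index to use.
sample_plan = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}},
    "executionStats": {
        "nReturned": 1,
        "executionTimeMillis": 101,
        "totalKeysExamined": 0,
        "totalDocsExamined": 200000,
    },
}

summary = summarize_explain(sample_plan)
print(summary)

# With a live connection, the same helper could be applied to a real plan:
# summarize_explain(db.bigdata.find({"account_no": 58982}).explain())
```

An access stage of `COLLSCAN` means every document was read, while `IXSCAN` means an index was used to locate the matches.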

# **Exercise 3 - Working with indexes**

Before you create an index, choose the field you wish to create an index on. It is usually the field that you query most.
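
To see why an index on a frequently queried field pays off, the following sketch compares a full scan with a lookup in a sorted structure. It is an illustration only: MongoDB indexes are B-tree structures, and a sorted Python list with binary search stands in for one here. The function names are hypothetical.

```python
def collection_scan(docs, field, value):
    """Check every document in turn, as a query without an index must."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc[field] == value:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Return (key, position) pairs sorted by key: a stand-in for an index."""
    return sorted((doc[field], position) for position, doc in enumerate(docs))


def index_lookup(index, docs, value):
    """Binary-search the sorted index, counting how many entries are probed."""
    low, high, probes = 0, len(index) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        key, position = index[mid]
        if key == value:
            return docs[position], probes
        if key < value:
            low = mid + 1
        else:
            high = mid - 1
    return None, probes


docs = [{"account_no": n, "balance": 0} for n in range(1, 200001)]
index = build_index(docs, "account_no")

scan_matches, scanned = collection_scan(docs, "account_no", 58982)
found, probed = index_lookup(index, docs, 58982)
print(f"scan examined {scanned} documents, index lookup probed {probed} entries")
```

The scan touches all 200,000 documents, while the binary search needs at most 18 probes for a collection of this size. The trade-off is that the index has to be built and kept up to date on every write.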

Run the below command to create an index on the field `account_no`.



In [None]:

#db.bigdata.createIndex({"account_no":1})


In [12]:
db.bigdata.create_index([("account_no", pymongo.ASCENDING)])

'account_no_1'



Here, `1` specifies ascending order. In PyMongo, the constant `pymongo.ASCENDING` has the same value.

Run the below command to get a list of indexes on the `bigdata` collection.



In [None]:

#db.bigdata.getIndexes()


In [13]:
db.bigdata.index_information()

{'_id_': {'v': 2, 'key': [('_id', 1)]},
 'account_no_1': {'v': 2, 'key': [('account_no', 1)]}}



You should see an index named `account_no_1`, alongside the default `_id_` index.
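
The name `account_no_1` follows MongoDB's default naming rule: each indexed field is joined to its direction, and the pairs are joined with underscores. A minimal sketch of that rule, using a hypothetical helper name and the same key format that `create_index` accepts:

```python
def default_index_name(keys):
    """Build the default index name from a list of (field, direction) pairs."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


single = default_index_name([("account_no", 1)])
compound = default_index_name([("account_no", 1), ("balance", -1)])
print(single)
print(compound)
```

Knowing the generated name is useful later, because an index can be dropped by name as well as by definition.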

# **Exercise 4 - Find out how effective an index is**

You will now run the same query for account `58982` and compare its execution time with the time recorded in Exercise 2. The difference shows how much the index improves the query.

Run the below command.



In [None]:

#db.bigdata.find({"account_no":58982}).explain("executionStats").executionStats.executionTimeMillis


In [14]:
db.bigdata.find({"account_no": 58982}).explain()["executionStats"]["executionTimeMillis"]

17



This time, the execution time should be much lower than before. If you see `0`, the query completed in under 1 millisecond.
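
The two recorded timings can be compared with a few lines of Python. This is a small sketch with a hypothetical helper name; it uses the values shown in this notebook (101 ms and 17 ms), and your own numbers will differ. A reading of `0` is reported as "under 1 ms" rather than used as a divisor.

```python
def describe_speedup(before_ms, after_ms):
    """Summarize the change between two executionTimeMillis readings."""
    if after_ms == 0:
        # executionTimeMillis is a whole number, so 0 means less than 1 ms.
        return f"{before_ms} ms -> under 1 ms"
    ratio = before_ms / after_ms
    return f"{before_ms} ms -> {after_ms} ms (about {ratio:.1f}x)"


message = describe_speedup(101, 17)
print(message)
```

Because single measurements fluctuate, repeating the query a few times before drawing conclusions gives a more reliable comparison.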

# **Exercise 5 - Delete an index**

Use the command below to delete the index created earlier. You can identify the index either by its definition, as shown here, or by its name (`account_no_1`).



In [None]:

#db.bigdata.dropIndex({"account_no":1})


In [15]:
db.bigdata.drop_index([("account_no", pymongo.ASCENDING)])

In [16]:
db.bigdata.index_information()

{'_id_': {'v': 2, 'key': [('_id', 1)]}}
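
Dropping an index that does not exist raises an error in PyMongo, so rerunning the cell above fails the second time. One way to make the step repeatable is to check `index_information()` first, as in the sketch below. The helper name `drop_index_if_exists` is hypothetical, and `FakeCollection` is an in-memory stand-in included only so that the example runs without a server; a real PyMongo collection offers the same two methods.

```python
def drop_index_if_exists(collection, index_name):
    """Drop index_name from collection if present; return True if it was dropped."""
    if index_name in collection.index_information():
        collection.drop_index(index_name)
        return True
    return False


class FakeCollection:
    """In-memory stand-in (hypothetical) so the sketch runs without a server."""

    def __init__(self, index_names):
        self._indexes = {name: {} for name in index_names}

    def index_information(self):
        return dict(self._indexes)

    def drop_index(self, index_name):
        del self._indexes[index_name]


fake = FakeCollection(["_id_", "account_no_1"])
first_attempt = drop_index_if_exists(fake, "account_no_1")
second_attempt = drop_index_if_exists(fake, "account_no_1")
remaining = sorted(fake.index_information())
print(first_attempt, second_attempt, remaining)

# With a live connection, the call would look like this:
# drop_index_if_exists(db.bigdata, "account_no_1")
```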



# **Bonus information**

MongoDB creates a unique index on the `_id` field during the creation of a collection. The `_id` index prevents clients from inserting two documents with the same value for the `_id` field. You cannot drop this index on the `_id` field.

# **Practice exercises**

1. Problem:

> Create an index on the `balance` field.
>
- Hint

> Use the command `db.collection.createIndex()`.
>
- Solution

On the mongo client, run the command below.



In [None]:

#db.bigdata.createIndex({"balance":1})


In [17]:
db.bigdata.create_index([("balance", pymongo.ASCENDING)])

'balance_1'



2. Problem:

> Query for documents with a balance of 10000 and record the time taken.
>
- Hint

> Use the command `db.collection.find().explain()`.
>
- Solution



In [18]:

#db.bigdata.find({"balance":10000}).explain("executionStats").executionStats.executionTimeMillis


In [20]:
db.bigdata.find({"balance": 10000}).explain()["executionStats"]["executionTimeMillis"]


0



3. Problem:

> Drop the index you have created.
>
- Hint

> Use the command `db.collection.dropIndex()`.
>
- Solution



In [None]:

#db.bigdata.dropIndex({"balance":1})


In [21]:
db.bigdata.drop_index([("balance", pymongo.ASCENDING)])



4. Problem:

> Query for documents with a balance of 10000, record the time taken, and compare it with the previously recorded time.
>
- Hint

> Use the command `db.collection.find().explain()`.
>
- Solution



In [22]:

#db.bigdata.find({"balance": 10000}).explain("executionStats").executionStats.executionTimeMillis


In [23]:
db.bigdata.find({"balance": 10000}).explain()["executionStats"]["executionTimeMillis"]


117