
# 1 Setting Up the MongoDB Environment

In [1]:
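# The MongoDB 4.4 packages for Ubuntu bionic depend on libssl1.1, which newer Ubuntu releases no longer ship
!sudo wget http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
!sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb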

# Import the public key used by the package management system
!wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | apt-key add -

# Create a list file for MongoDB
!echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-4.4.list

# Reload the local package database
!apt-get update > /dev/null

# Install the MongoDB packages
!apt-get install -y mongodb-org > /dev/null

# Install pymongo
!pip install -q pymongo

# Create Data Folder
!mkdir -p /data/db

# Start MongoDB
!mongod --fork --logpath /var/log/mongodb.log --dbpath /data/db

--2024-02-02 16:39:26--  http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
Resolving archive.ubuntu.com (archive.ubuntu.com)... 185.125.190.36, 185.125.190.39, 91.189.91.83, ...
Connecting to archive.ubuntu.com (archive.ubuntu.com)|185.125.190.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1318204 (1.3M) [application/x-debian-package]
Saving to: ‘libssl1.1_1.1.1f-1ubuntu2_amd64.deb’


2024-02-02 16:39:27 (1.34 MB/s) - ‘libssl1.1_1.1.1f-1ubuntu2_amd64.deb’ saved [1318204/1318204]

Selecting previously unselected package libssl1.1:amd64.
(Reading database ... 121730 files and directories currently installed.)
Preparing to unpack libssl1.1_1.1.1f-1ubuntu2_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.1f-1ubuntu2) ...
Setting up libssl1.1:amd64 (1.1.1f-1ubuntu2) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/

In [2]:
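from pymongo import MongoClient

# Establish connection to MongoDB. MongoClient connects lazily, so send a
# "ping" command to confirm that the server is actually reachable.
try:
    client = MongoClient('localhost', 27017, serverSelectionTimeoutMS=5000)
    client.admin.command("ping")
    print("Connected to MongoDB")
except Exception as e:
    print("Error connecting to MongoDB: ", e)
    raise  # re-raise to stop the cell with the original error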

# List databases to check the connection
try:
    databases = client.list_database_names()
    print("Databases:", databases)
except Exception as e:
    print("Error listing databases: ", e)

# Retrieve server status
try:
    server_status = client.admin.command("serverStatus")
    print("Server Status:", server_status)
except Exception as e:
    print("Error retrieving server status: ", e)

# Perform basic database operations (Create, Read)
try:
    db = client.test_db
    collection = db.test_collection
    # Insert a document
    insert_result = collection.insert_one({"name": "test", "value": 123})
    print("Insert operation result:", insert_result.inserted_id)
    # Read a document
    read_result = collection.find_one({"name": "test"})
    print("Read operation result:", read_result)
except Exception as e:
    print("Error performing database operations: ", e)

Connected to MongoDB
Databases: ['admin', 'config', 'local']
Insert operation result: 65bd1af00b0595e3f9e8e334
Read operation result: {'_id': ObjectId('65bd1af00b0595e3f9e8e334'), 'name': 'test', 'value': 123}


# 2 Preparations

Databases and collections in MongoDB are created implicitly when data is first inserted into them. In this tutorial, you will create a collection of *films*. The collection does not exist yet, so create it by inserting a document:

In [3]:
query = """
db.films.insert({
    "title": "Star Trek Into Darkness",
    "year": 2013,
    "genre": [
        "Action",
        "Adventure",
        "Sci-Fi",
    ],
    "actors": [
        "Pine, Chris",
        "Quinto, Zachary",
        "Saldana, Zoe",
    ],
    "releases": [
        {
            "country": "USA",
            "date": ISODate("2013-05-17"),
            "prerelease": true
        },
        {
            "country": "Germany",
            "date": ISODate("2003-05-16"),
            "prerelease": false
        }
    ]
})"""

!mongo --quiet --eval '{query}'

WriteResult({ "nInserted" : 1 })


Now there is a *films* collection. You can list its contents by calling the <code>find()</code> method.

In [4]:
query = """db.films.find()"""

In [5]:
!mongo --quiet --eval '{query}'

{ "_id" : ObjectId("65bd1b0529b338f6aa92aa44"), "title" : "Star Trek Into Darkness", "year" : 2013, "genre" : [ "Action", "Adventure", "Sci-Fi" ], "actors" : [ "Pine, Chris", "Quinto, Zachary", "Saldana, Zoe" ], "releases" : [ { "country" : "USA", "date" : ISODate("2013-05-17T00:00:00Z"), "prerelease" : true }, { "country" : "Germany", "date" : ISODate("2003-05-16T00:00:00Z"), "prerelease" : false } ] }


If you prefer your result nicely formatted, use <code>pretty()</code>:

In [6]:
query = """db.films.find().pretty()"""

In [7]:
!mongo --quiet --eval '{query}'

{
	"_id" : ObjectId("65bd1b0529b338f6aa92aa44"),
	"title" : "Star Trek Into Darkness",
	"year" : 2013,
	"genre" : [
		"Action",
		"Adventure",
		"Sci-Fi"
	],
	"actors" : [
		"Pine, Chris",
		"Quinto, Zachary",
		"Saldana, Zoe"
	],
	"releases" : [
		{
			"country" : "USA",
			"date" : ISODate("2013-05-17T00:00:00Z"),
			"prerelease" : true
		},
		{
			"country" : "Germany",
			"date" : ISODate("2003-05-16T00:00:00Z"),
			"prerelease" : false
		}
	]
}
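If you work from Python instead of the shell, there is no <code>pretty()</code>; a common substitute is the standard-library <code>pprint</code> module. The sketch below is a minimal helper under that assumption: <code>format_docs</code> is a hypothetical name, and it accepts any iterable of dicts, such as the cursor returned by pymongo's <code>find()</code>.

```python
from pprint import pformat


def format_docs(docs):
    """Pretty-print each document and separate documents with a blank line.

    `docs` can be any iterable of dicts, e.g. a pymongo cursor from
    collection.find(). sort_dicts=False keeps the field order as stored.
    """
    return "\n\n".join(pformat(doc, sort_dicts=False) for doc in docs)


# Hypothetical usage against the running server (assumes pymongo is installed).
# The legacy mongo shell connects to the "test" database by default, so that
# is where the films collection created above lives:
# from pymongo import MongoClient
# films = MongoClient("localhost", 27017)["test"]["films"]
# print(format_docs(films.find()))
```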


As you can see, MongoDB has automatically added an <code>_id</code> field (an <code>ObjectId</code>), which uniquely identifies each document in the collection.

Now insert some more films:

In [9]:
query = """
db.films.insert({
    "title": "Iron Man 3",
    "year": 2013,
    "genre": [
        "Action",
        "Adventure",
        "Sci-Fi",
    ],
    "actors": [
        "Downey Jr., Robert",
        "Paltrow, Gwyneth",
    ]
})
""" # no releases

!mongo --quiet --eval '{query}'

WriteResult({ "nInserted" : 1 })


In [10]:
query = """
db.films.insert({
    "title": "This Means War",
    "year": 2011,
    "genre": [
        "Action",
        "Comedy",
        "Romance",
    ],
    "actors": [
        "Pine, Chris",
        "Witherspoon, Reese",
        "Hardy, Tom",
    ],
    "releases": [
        {
            "country": "USA",
            "date": ISODate("2011-02-17"),
            "prerelease": false
        },
        {
            "country": "UK",
            "date": ISODate("2011-03-01"),
            "prerelease": true
        }
    ]
})
"""

!mongo --quiet --eval '{query}'

WriteResult({ "nInserted" : 1 })
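For comparison, here is a sketch of how the same film could be inserted with pymongo. The main difference is the dates: the shell's <code>ISODate("2011-02-17")</code> is midnight UTC, and the Python counterpart is a timezone-aware <code>datetime</code>. The variable <code>this_means_war</code> and the helper <code>insert_film</code> are illustrative names, and the collection argument is assumed to be a pymongo <code>Collection</code>.

```python
from datetime import datetime, timezone

# Same document as the shell cell above; Python uses True/False and datetime
# objects where the shell uses true/false and ISODate(...).
this_means_war = {
    "title": "This Means War",
    "year": 2011,
    "genre": ["Action", "Comedy", "Romance"],
    "actors": ["Pine, Chris", "Witherspoon, Reese", "Hardy, Tom"],
    "releases": [
        {"country": "USA", "date": datetime(2011, 2, 17, tzinfo=timezone.utc), "prerelease": False},
        {"country": "UK", "date": datetime(2011, 3, 1, tzinfo=timezone.utc), "prerelease": True},
    ],
}


def insert_film(collection, film):
    """Insert one film and return its generated _id.

    `collection` is assumed to be a pymongo Collection, e.g.
    MongoClient()["test"]["films"]. insert_one() also adds the "_id" key
    to `film` in place. When reading dates back, pymongo returns naive UTC
    datetimes unless the client was created with tz_aware=True.
    """
    return collection.insert_one(film).inserted_id
```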


In [11]:
query = """
db.films.insert({
    "title": "The Amazing Spider - Man 2",
    "year": 2014,
    "genre": [
        "Action",
        "Adventure",
        "Fantasy",
    ],
    "actors": [
        "Stone, Emma",
        "Woodley, Shailene"
    ]
})
""" # also no releases

!mongo --quiet --eval '{query}'

WriteResult({ "nInserted" : 1 })


# 3 Querying

Now query your collection. To have MongoDB return all films titled **"Iron Man 3"**, call:

In [12]:
query = """
db.films.find({"title": "Iron Man 3"})
"""

!mongo --quiet --eval '{query}'

{ "_id" : ObjectId("65bd1b3d575321dceac8a1f9"), "title" : "Iron Man 3", "year" : 2013, "genre" : [ "Action", "Adventure", "Sci-Fi" ], "actors" : [ "Downey Jr., Robert", "Paltrow, Gwyneth" ] }


Using <code>findOne()</code> instead of <code>find()</code> returns at most one document, which the shell prints in pretty format:

In [13]:
query = """
db.films.findOne({"title": "Iron Man 3"})
"""

!mongo --quiet --eval '{query}'

{
	"_id" : ObjectId("65bd1b3d575321dceac8a1f9"),
	"title" : "Iron Man 3",
	"year" : 2013,
	"genre" : [
		"Action",
		"Adventure",
		"Sci-Fi"
	],
	"actors" : [
		"Downey Jr., Robert",
		"Paltrow, Gwyneth"
	]
}
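In pymongo the same distinction exists as <code>find_one()</code> (a single dict or <code>None</code>) versus <code>find()</code> (a cursor over all matches). A minimal sketch, assuming a pymongo <code>Collection</code> is passed in; <code>title_filter</code>, <code>get_film</code> and <code>get_films</code> are hypothetical helper names.

```python
def title_filter(title):
    """Build the filter document for an exact title match, e.g. {"title": "Iron Man 3"}."""
    return {"title": title}


def get_film(collection, title):
    """Return the first matching film as a dict, or None if nothing matches.

    pymongo's find_one() is the counterpart of the shell's findOne().
    """
    return collection.find_one(title_filter(title))


def get_films(collection, title):
    """Return every matching film as a list (the counterpart of find())."""
    return list(collection.find(title_filter(title)))
```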


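Regular expressions can also be used to query a collection. This tutorial uses the shorthand notation in which the regular expression is delimited by slashes (/). The following call returns all films whose titles start with the letter T. The cell after it expresses the same filter with the <code>$regex</code> operator, which takes the pattern as a string.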

In [14]:
query = """
db.films.find({"title": /^T/})
"""

!mongo --quiet --eval '{query}'

{ "_id" : ObjectId("65bd1b4315cff144411008db"), "title" : "This Means War", "year" : 2011, "genre" : [ "Action", "Comedy", "Romance" ], "actors" : [ "Pine, Chris", "Witherspoon, Reese", "Hardy, Tom" ], "releases" : [ { "country" : "USA", "date" : ISODate("2011-02-17T00:00:00Z"), "prerelease" : false }, { "country" : "UK", "date" : ISODate("2011-03-01T00:00:00Z"), "prerelease" : true } ] }
{ "_id" : ObjectId("65bd1b4837a5513a8b8e539f"), "title" : "The Amazing Spider - Man 2", "year" : 2014, "genre" : [ "Action", "Adventure", "Fantasy" ], "actors" : [ "Stone, Emma", "Woodley, Shailene" ] }


In [16]:
query = """
db.films.find({"title": {"$regex": "^T"}})
"""

!mongo --quiet --eval '{query}'

{ "_id" : ObjectId("65bd1b4315cff144411008db"), "title" : "This Means War", "year" : 2011, "genre" : [ "Action", "Comedy", "Romance" ], "actors" : [ "Pine, Chris", "Witherspoon, Reese", "Hardy, Tom" ], "releases" : [ { "country" : "USA", "date" : ISODate("2011-02-17T00:00:00Z"), "prerelease" : false }, { "country" : "UK", "date" : ISODate("2011-03-01T00:00:00Z"), "prerelease" : true } ] }
{ "_id" : ObjectId("65bd1b4837a5513a8b8e539f"), "title" : "The Amazing Spider - Man 2", "year" : 2014, "genre" : [ "Action", "Adventure", "Fantasy" ], "actors" : [ "Stone, Emma", "Woodley, Shailene" ] }
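Both notations have a Python counterpart: pymongo sends a compiled <code>re</code> pattern to the server as a BSON regular expression, and the <code>$regex</code> form is an ordinary nested dict. The sketch below builds both filters and, purely as a local sanity check (not a database query), applies the pattern to the four titles inserted earlier.

```python
import re

# Shell: {"title": /^T/}  ->  a compiled Python pattern
starts_with_t = {"title": re.compile("^T")}
# Shell: {"title": {"$regex": "^T"}}  ->  the $regex operator with a string pattern
starts_with_t_op = {"title": {"$regex": "^T"}}

titles = [
    "Star Trek Into Darkness",
    "Iron Man 3",
    "This Means War",
    "The Amazing Spider - Man 2",
]

# Local check of the pattern itself; "^" anchors the match at the start of the title.
print([t for t in titles if starts_with_t["title"].search(t)])
```

Either filter can be passed to <code>collection.find(...)</code> in pymongo.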


If you are only interested in certain fields, you can use a projection to thin out the result. The selection criteria are given by the first argument of <code>find()</code>, and the projection by the second. For example:

In [17]:
query = """
db.films.find({"title": /^T/}, {"title": 1})
"""

!mongo --quiet --eval '{query}'

{ "_id" : ObjectId("65bd1b4315cff144411008db"), "title" : "This Means War" }
{ "_id" : ObjectId("65bd1b4837a5513a8b8e539f"), "title" : "The Amazing Spider - Man 2" }


In [18]:
query = """
db.films.find({"title": {"$regex": "^T"}}, {"title": 1})
"""

!mongo --quiet --eval '{query}'

{ "_id" : ObjectId("65bd1b4315cff144411008db"), "title" : "This Means War" }
{ "_id" : ObjectId("65bd1b4837a5513a8b8e539f"), "title" : "The Amazing Spider - Man 2" }
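The same two-argument pattern applies in pymongo. This sketch assumes a pymongo <code>Collection</code> is passed to the hypothetical helper <code>find_titles</code>; as in the shell output above, <code>_id</code> still comes back because it has not been excluded.

```python
import re

# First argument: selection (which documents); second: projection (which fields).
selection = {"title": re.compile("^T")}
projection = {"title": 1}  # _id is still returned unless it is excluded explicitly


def find_titles(collection):
    """Run the projected query and return the documents as a list.

    pymongo also accepts a list of field names as the projection,
    e.g. collection.find(selection, ["title"]), which is equivalent here.
    """
    return list(collection.find(selection, projection))
```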


By default, <code>_id</code> is always included in the output, so if you don't want MongoDB to return it, you have to suppress it explicitly with <code>"_id": 0</code>:

In [19]:
query = """
db.films.find({"title": /^T/}, {"_id": 0, "title": 1})
"""

!mongo --quiet --eval '{query}'

{ "title" : "This Means War" }
{ "title" : "The Amazing Spider - Man 2" }


In [20]:
query = """
db.films.find({"title": {"$regex": "^T"}}, {"_id": 0, "title": 1})
"""

!mongo --quiet --eval '{query}'

{ "title" : "This Means War" }
{ "title" : "The Amazing Spider - Man 2" }
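With <code>_id</code> suppressed, each result holds only the projected field, which makes it easy to flatten the results into a plain list in Python. A minimal sketch; <code>title_list</code> is a hypothetical helper, and the commented usage assumes a pymongo collection named <code>films</code>.

```python
# "_id": 0 suppresses the field that MongoDB otherwise always returns.
selection = {"title": {"$regex": "^T"}}
projection = {"_id": 0, "title": 1}


def title_list(docs):
    """Flatten projected documents such as {"title": ...} into a list of titles."""
    return [doc["title"] for doc in docs]


# Hypothetical usage with pymongo:
# title_list(films.find(selection, projection))
```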


You can also use conditional operators, for example to perform range queries. Conditions on different fields in the same filter document must all hold. The following returns the titles of all films starting with the letter T whose <code>year</code> is greater than 2009 and less than or equal to 2011:

In [21]:
query = """
db.films.find({
    "year": {
        $gt: 2009,
        $lte: 2011
    },
    "title": /^T/
},
{
    "_id": 0,
    "title": 1
})
"""

!mongo --quiet --eval '{query}'

{ "title" : "This Means War" }


In [22]:
query = """
db.films.find({
    "year": {
        $gt: 2009,
        $lte: 2011
    },
    "title": {"$regex": "^T"}
},
{
    "_id": 0,
    "title": 1
})
"""

!mongo --quiet --eval '{query}'

{ "title" : "This Means War" }
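In Python the operator names have to be quoted strings, but the filter is otherwise identical. The sketch below builds it for pymongo and then mirrors its logic in plain Python on the four films' years, only to make the AND semantics visible; <code>film_years</code> and <code>matches_locally</code> are illustrative names, not part of MongoDB.

```python
import re

# Both bounds sit in one sub-document, so both must hold: 2009 < year <= 2011.
range_filter = {
    "year": {"$gt": 2009, "$lte": 2011},
    "title": re.compile("^T"),
}
projection = {"_id": 0, "title": 1}

film_years = {
    "Star Trek Into Darkness": 2013,
    "Iron Man 3": 2013,
    "This Means War": 2011,
    "The Amazing Spider - Man 2": 2014,
}


def matches_locally(title, year):
    """Mirror range_filter in plain Python: every condition must hold (logical AND)."""
    bounds = range_filter["year"]
    in_range = bounds["$gt"] < year <= bounds["$lte"]
    return in_range and range_filter["title"].search(title) is not None


print([t for t, y in film_years.items() if matches_locally(t, y)])
```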


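For a logical disjunction (OR) of selection criteria, use the <code>$or</code> operator. It takes an array of filter documents and matches a document if at least one of them is satisfied. Here, a film is returned if its year lies in the range above *or* its title starts with T: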

In [23]:
query = """
db.films.find({
    $or: [
        {
            "year": {
                $gt: 2009,
                $lte: 2011
            }
        },
        {
            "title": /^T/
        }
    ]
},
{
    "_id": 0,
    "title": 1
})
"""

!mongo --quiet --eval '{query}'

{ "title" : "This Means War" }
{ "title" : "The Amazing Spider - Man 2" }


In [24]:
query = """
db.films.find({
    $or: [
        {
            "year": {
                $gt: 2009,
                $lte: 2011
            }
        },
        {
            "title": {"$regex": "^T"}
        }
    ]
},
{
    "_id": 0,
    "title": 1
})
"""

!mongo --quiet --eval '{query}'

{ "title" : "This Means War" }
{ "title" : "The Amazing Spider - Man 2" }
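The pymongo version of the <code>$or</code> filter is a dict whose <code>"$or"</code> key holds a list of sub-filters. As before, the sketch builds the filter for <code>collection.find(or_filter, projection)</code> and then mirrors the OR logic locally on the four films; <code>film_years</code> and <code>matches_any_locally</code> are illustrative names only.

```python
import re

# $or takes a list of sub-filters; a document matches if any one of them does.
or_filter = {
    "$or": [
        {"year": {"$gt": 2009, "$lte": 2011}},
        {"title": re.compile("^T")},
    ]
}
projection = {"_id": 0, "title": 1}

film_years = {
    "Star Trek Into Darkness": 2013,
    "Iron Man 3": 2013,
    "This Means War": 2011,
    "The Amazing Spider - Man 2": 2014,
}


def matches_any_locally(title, year):
    """Mirror or_filter in plain Python: at least one sub-filter must hold."""
    year_clause, title_clause = or_filter["$or"]
    bounds = year_clause["year"]
    in_range = bounds["$gt"] < year <= bounds["$lte"]
    return in_range or title_clause["title"].search(title) is not None


print([t for t, y in film_years.items() if matches_any_locally(t, y)])
```

"The Amazing Spider - Man 2" (2014) is outside the year range and matches only through the title clause.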
