In [None]:
CHAPTER 11 Python – PyMongo
11.1 What is NoSQL?
In today’s era of real-time web applications, NoSQL databases are becoming
increasingly popular. The term NoSQL originally referred to “non SQL” or
“non-relational”, but its supporters prefer to call it “Not only SQL” to indicate
that SQL-like query languages may be supported alongside.
NoSQL is touted as an open-source, distributed and horizontally scalable
schema-free database architecture. NoSQL databases are generally easier to
scale out than an RDBMS and can provide better performance for large volumes
of data. One reason is that an RDBMS requires that schemas (table structure,
relationships, etc.) be defined before you can add data, whereas NoSQL
databases do not. Some of the popular NoSQL databases extensively in use
today include MongoDB, CouchDB, Cassandra, HBase, etc.


In [None]:
Several NoSQL products are available in the market. They are classified into
four categories based on the data model they use.
Key-Value store: Uses a hash table (also called a dictionary). Each item in
the database is stored as a unique attribute name (or ‘key’) associated with
its value. The key-value model is the simplest type of NoSQL database.
Examples of key-value databases are Amazon SimpleDB, Oracle BDB,
Riak, and Berkeley DB.
Column Oriented: Databases of this type store and process very large
amounts of data distributed over multiple machines. Keys point to multiple
columns, and the columns are arranged in column families. Examples of
column-oriented databases are Cassandra and HBase.
Document Oriented: Databases in this category are an advanced form of
key-value store. Semi-structured documents are stored in formats like
JSON. Documents can contain many different key-value pairs, key-array
pairs, or even nested documents.

In [None]:
Graph Based: Databases of this type are used to store information about
networks of data, such as social connections. A flexible graph model can
scale across multiple machines. Graph stores include Neo4j and Giraph.
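As a minimal sketch of the document model, the Python dictionary below (all field names and values are made up for illustration) mixes plain key-value pairs, a key-array pair and a nested document, and is serialized to JSON with the standard json module. In a plain key-value store the whole value would typically be an opaque blob under one key; a document database can look inside it.

```python
import json

# A hypothetical customer document: plain key-value pairs,
# a key-array pair and a nested (embedded) document.
customer = {
    "CustID": 1,
    "Name": "Ravikumar",
    "tags": ["retail", "priority"],                   # key-array pair
    "address": {"city": "Mumbai", "pin": "400001"},   # nested document
}

# Document databases keep such semi-structured records in JSON-like
# formats (MongoDB itself uses BSON, a binary form of JSON).
text = json.dumps(customer)
restored = json.loads(text)

# The nested structure survives the round trip intact.
print(restored["address"]["city"])   # Mumbai
print(len(restored["tags"]))         # 2
```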
In this chapter, we shall get acquainted with a hugely popular document-oriented
database, MongoDB, and learn how it can be interfaced with Python
through the PyMongo module.

In [None]:
11.2 MongoDB
NoSQL databases typically hold huge amounts of data. Hence, more often
than not, the capacity of a single machine is not enough when it comes to fetching
data corresponding to a query. MongoDB uses a technique called sharding, which
splits data sets across multiple instances. A large collection of data is split
across multiple physical servers called ‘shards’, even though they behave as
one collection. A query request from the application is routed to the appropriate
shard and the result is served. Thus, MongoDB achieves horizontal
scalability (Figure 11.1).
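The routing idea can be illustrated with a toy model. The sketch below is only a simplified picture of hashed partitioning, assuming a fixed number of shards (NUM_SHARDS and the helper names are made up); real MongoDB sharding divides data into chunks by ranges of a shard key (or of its hash) and routes queries through the mongos router, which is considerably more involved.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size for this illustration

def shard_for(key):
    """Map a shard-key value to a shard number by hashing it."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Distribute documents across the shards by their shard key (_id here).
shards = {n: [] for n in range(NUM_SHARDS)}
for doc_id in range(1, 11):
    shards[shard_for(doc_id)].append({"_id": doc_id})

def find_by_id(doc_id):
    """Route an equality query on the shard key to exactly one shard."""
    target = shards[shard_for(doc_id)]
    return next((d for d in target if d["_id"] == doc_id), None)

print(find_by_id(7))   # {'_id': 7}
```

Because the shard is computed from the key, an equality query on the shard key touches a single shard, while a query on any other field would have to be sent to every shard and the results merged.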

In [None]:
The Document is the heart of a MongoDB database. It is a collection of key-value
pairs, similar to Python’s dictionary object. We can also think of it as
being similar to a single row in a table of an SQL-based relational database.
A Collection in MongoDB is analogous to a table in a relational database.
However, it doesn’t have a predefined schema. The Collection has a
dynamic schema, in the sense that each document may have a variable number of
k-v pairs, not necessarily with the same keys in each document.
Each document is characterized by a special key called “_id” having a
unique value, again similar to a primary key in the entity table of a relational
database.
MongoDB comes with a command-line client, the Mongo shell, from inside which
different database operations can be performed.

In [None]:
11.3 Installation of MongoDB
MongoDB server software is available in two forms: Community edition
(open source release) and Enterprise edition (with additional features such
as administration and monitoring).
The MongoDB community edition is available for Windows, Linux as well
as MacOS operating systems at https://www.mongodb.com/download-center/community.
Choose the appropriate version for the OS and architecture of your machine
and install it according to the instructions on the official website. Examples
in this chapter assume that MongoDB is installed on Windows in the
e:\mongodb folder.
Start the MongoDB server from a command terminal using the following
command:
E:\mongodb\bin>mongod
The server is now listening for connection requests from clients at port number
27017 on localhost. (The server’s startup logs are omitted in the above
display.) To stop it, press Ctrl+C. By default, MongoDB stores its databases in
the \data\db directory (on Windows, on the drive from which mongod is started).
You can specify an alternative location with the --dbpath option, as follows:
Example 11.1
E:\mongodb\bin>mongod --dbpath e:\test
Now, start the Mongo shell in another terminal.
The Mongo shell is a JavaScript interface to the MongoDB server. It is similar to the
SQLite shell or the MsSQL console that we have seen in earlier chapters. The CRUD
operations on a MongoDB database can be performed from here.

In [None]:
11.4 MongoDB - Create Database
To display the database currently in use, there’s the db command. The default
database in use is test.
The ‘use’ command sets any other database as the current one. If the named
database doesn’t exist, a new one is created.
However, the database is not actually created until you store data (such as a
collection or document) in it. The following command inserts a document in the
‘products’ collection under the current database.

In [None]:
11.5 MongoDB - Insert Document
Appropriately, the insertOne() method is available to a collection object in a
database. A document is provided to it as a parameter.
The legacy insert() method (and, for that matter, the legacy update and delete
methods) returns a WriteResult object. The insert() method inserts
multiple documents if the argument is an array of documents. In that case,
the result is a BulkWriteResult object.
In short, the insert() function accepts either a single document or an array,
whereas a single document is inserted with the insertOne() method and an
array with the insertMany() method.
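For comparison, PyMongo (covered later in this chapter) exposes the same two operations as insert_one() and insert_many(), returning an InsertOneResult with an inserted_id attribute and an InsertManyResult with an inserted_ids list. The helper insert_docs below is hypothetical, not part of PyMongo: it mimics the shell's insert() by accepting either one document or a list, and it expects any PyMongo Collection object.

```python
def insert_docs(collection, data):
    """Insert one document (dict) or many (list of dicts).

    Hypothetical helper mirroring the shell's insert(); it relies only
    on the PyMongo Collection methods insert_one() and insert_many().
    Returns a list of the inserted _id values.
    """
    if isinstance(data, dict):
        result = collection.insert_one(data)       # InsertOneResult
        return [result.inserted_id]
    result = collection.insert_many(list(data))    # InsertManyResult
    return list(result.inserted_ids)

# Usage (assumes a running server; the product document is illustrative):
# from pymongo import MongoClient
# products = MongoClient().newdb.products
# insert_docs(products, {'ProductID': 6, 'Name': 'Mouse', 'price': 500})
```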

In [None]:
11.6 MongoDB - Querying Collection
Retrieving data from the database is always an important operation.
A MongoDB collection has the find() method with which documents are
fetched. Without any argument, the find() method returns a result set of all
documents in a collection. In effect, it is equivalent to ‘SELECT * FROM
<table>’ in SQL.
Note that the ‘_id’ key is automatically added to each document (unless you
supply one). The value of each such _id is of ObjectId type and is unique for
each document.
Invariably, you would want to apply a filter to the result set returned by find().
This is done by putting the key-value pair inside its parentheses. In its
generalized form, the conditional query is written as follows:
Example 11.2
db.collection.find({"key":"value"})
The following statement retrieves a document whose ‘Name’ key has ‘TV’
value.
Example 11.3
> db.products.find({"Name":"TV"})
{ "_id" : ObjectId("5c8d420c7bebaca49b767db4"), "ProductID" :
2, "Name" : "TV", "price" : 40000 }
MongoDB doesn’t use the traditional comparison operator symbols. Instead, it has its
own operators, as listed in Table 11.1.

In [None]:
Table 11.1 Comparison operators
MongoDB operator    Description
$eq                 equal to (==)
$gt                 greater than (>)
$gte                greater than or equal to (>=)
$in                 equal to any value in an array
$lt                 less than (<)
$lte                less than or equal to (<=)
$ne                 not equal to (!=)
$nin                not equal to any value in an array
These operators are used in the find() method to apply a filter. The following
statement returns products with price>10000.
> db.products.find({"price":{$gt:10000}})
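To see what each operator in Table 11.1 means, the toy matcher below evaluates a filter such as {'price': {'$gt': 10000}} against plain Python dictionaries. It is only an in-memory illustration of the semantics (the names OPS and matches are hypothetical) and it assumes every queried key exists in each document; MongoDB evaluates filters on the server, and its real matching rules, for example for arrays, missing fields and mixed types, are richer. The sample data mirrors the products used later in this chapter.

```python
import operator

# Comparison operators from Table 11.1 mapped to Python equivalents.
OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, choices: value in choices,
    "$nin": lambda value, choices: value not in choices,
}

def matches(doc, flt):
    """Return True if doc satisfies every condition in flt.

    Simplification: assumes each queried key is present in doc.
    """
    for key, cond in flt.items():
        value = doc.get(key)
        if isinstance(cond, dict):           # e.g. {'$gt': 10000}
            for op, arg in cond.items():
                if not OPS[op](value, arg):
                    return False
        elif value != cond:                  # plain {'key': 'value'}
            return False
    return True

products = [
    {"ProductID": 1, "Name": "Laptop", "price": 25000},
    {"ProductID": 2, "Name": "TV", "price": 40000},
    {"ProductID": 3, "Name": "Router", "price": 2000},
    {"ProductID": 4, "Name": "Scanner", "price": 5000},
    {"ProductID": 5, "Name": "Printer", "price": 9000},
]

expensive = [p["Name"] for p in products if matches(p, {"price": {"$gt": 10000}})]
print(expensive)   # ['Laptop', 'TV']
```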

In [None]:
The $and as well as $or operators are available for compound logical
expressions. Their usage is as follows:
Example 11.4
db.collection.find({$and:[{"key1":"value1"}, {"key2":"value2"}]})
Use the following command to fetch products with price between 1000 and
10000.
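In PyMongo the same compound filter is simply a nested dictionary. The sketch below builds the filter for a price range with an explicit $and; the function names are hypothetical, and "between" is taken here to mean strictly greater than low and strictly less than high. It also shows the shorter form that puts both operators in one sub-document, which MongoDB treats as an implicit AND. Note that in Python, keys such as '$and' and '$gt' must be quoted strings.

```python
def price_range_filter(low, high):
    """Filter for low < price < high, written with an explicit $and."""
    return {"$and": [{"price": {"$gt": low}}, {"price": {"$lt": high}}]}

def price_range_filter_short(low, high):
    """Equivalent filter: several operators on one key are ANDed."""
    return {"price": {"$gt": low, "$lt": high}}

print(price_range_filter(1000, 10000))
# {'$and': [{'price': {'$gt': 1000}}, {'price': {'$lt': 10000}}]}

# Usage with PyMongo (assumes a running server and the 'products' collection):
# for doc in db.products.find(price_range_filter(1000, 10000)):
#     print(doc['Name'], doc['price'])
```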

In [None]:
11.7 MongoDB - Update Document
Predictably, there is an update() method available to the collection object. Just
as with the SET clause of an SQL UPDATE statement, the $set operator assigns an
updated value to a specified key. Its basic usage is as below:
Example 11.5
db.collection.update({"key":"value"}, {$set:{"key":"newvalue"}})
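For example, the following statement changes the price of ‘TV’ to 50000.
> db.products.update({"Name":"TV"}, {$set:{"price":50000}})
The returned WriteResult object confirms the modification. You can also use the
comparison operators in the update criteria. By default, update() modifies only
the first matching document; to perform an update on multiple documents,
use the updateMany() method. The following command uses the $inc operator to
increment the price by 500 for all products with ProductID greater than 3.
> db.products.updateMany({"ProductID":{$gt:3}}, {$inc:{"price":500}})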
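The effect of the two update operators can be mimicked on plain dictionaries. The function apply_update below is a hypothetical, in-memory illustration that supports only $set and $inc; it is not how the server applies updates. Applied only to products with ProductID greater than 3, it reproduces the $inc example described above on a slice of the sample product data.

```python
import copy

def apply_update(doc, update):
    """Return a copy of doc with a {'$set': ...}/{'$inc': ...} update applied."""
    new_doc = copy.deepcopy(doc)
    for op, fields in update.items():
        for key, value in fields.items():
            if op == "$set":
                new_doc[key] = value                         # assign new value
            elif op == "$inc":
                new_doc[key] = new_doc.get(key, 0) + value   # missing key starts at 0
            else:
                raise ValueError(f"unsupported operator: {op}")
    return new_doc

products = [
    {"ProductID": 3, "Name": "Router", "price": 2000},
    {"ProductID": 4, "Name": "Scanner", "price": 5000},
    {"ProductID": 5, "Name": "Printer", "price": 9000},
]

# Like updateMany({"ProductID": {$gt: 3}}, {$inc: {"price": 500}})
products = [
    apply_update(p, {"$inc": {"price": 500}}) if p["ProductID"] > 3 else p
    for p in products
]
print([p["price"] for p in products])   # [2000, 5500, 9500]
```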

In [None]:
11.8 MongoDB - Delete Document
The remove() method deletes one or more documents from the collection
based on the provided criteria. The following statement results in the
removal of the document with price>40000 (in our data, it happens to
be the one with Name ‘TV’).
> db.products.remove({"price":{$gt:40000}})
Run the find() method in the shell to verify the removal.
Now that we have attained some familiarity with MongoDB through
shell commands, let us concentrate on our main objective: using
MongoDB in Python.
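Looking ahead to PyMongo, the shell's remove() corresponds to the collection methods delete_one() and delete_many(), both of which return a DeleteResult whose deleted_count attribute reports how many documents were removed. The helper name remove_above is hypothetical; it accepts any PyMongo Collection object.

```python
def remove_above(collection, threshold):
    """Delete every document whose price exceeds threshold.

    Uses PyMongo's Collection.delete_many(); delete_one() would remove
    only the first matching document. Returns the number deleted.
    """
    result = collection.delete_many({"price": {"$gt": threshold}})
    return result.deleted_count

# Usage (assumes a running server and the 'products' collection):
# from pymongo import MongoClient
# client = MongoClient()
# print(remove_above(client.newdb.products, 40000))
# client.close()
```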

In [None]:
11.9 PyMongo Module
PyMongo module is the official Python driver for the MongoDB database,
developed by MongoDB Inc. It can be used on Windows, Linux, as well as
MacOS. As always, you need to install this module using the pip3 utility
(pip3 install pymongo).
Before attempting to perform any operation on a database, ensure that you
have started the server using the ‘mongod’ command and that the server is
listening at port number 27017.
To let your Python interpreter interact with the server, establish a connection
by creating an object of the MongoClient class.
Example 11.6
>>> from pymongo import MongoClient
>>> client=MongoClient()

In [None]:
The following syntax is also valid for setting up a connection with the server.
Example 11.7
>>> client = MongoClient('localhost', 27017)
>>> # or
>>> client = MongoClient('mongodb://localhost:27017')
To display the currently available databases, use the list_database_names()
method of the MongoClient class.
Example 11.8
>>> client.list_database_names()
['admin', 'config', 'local', 'mydb']

In [None]:
11.10 PyMongo – Add Collection
Create a new database object by using any name currently not in the list.
Example 11.9
>>> db=client.newdb
The database is actually created only when the first document is inserted. The
following statement implicitly creates a ‘products’ collection and inserts
multiple documents from the given list of dictionary objects.
Example 11.10
>>> pricelist=[{'ProductID':1, 'Name':'Laptop', 'price':25000},
{'ProductID':2, 'Name':'TV', 'price':40000},
{'ProductID':3, 'Name':'Router', 'price':2000},
{'ProductID':4, 'Name':'Scanner', 'price':5000},
{'ProductID':5, 'Name':'Printer', 'price':9000}]
>>> db.products.insert_many(pricelist)
You can confirm the insertion operation by running the find() method in the
Mongo shell, as we have done earlier.
We can also create a collection explicitly by using the create_collection()
method of the database object.

In [None]:
Example 11.11
>>> db.create_collection('customers')
Now, we can add one or more documents to it. The following script adds
documents to the ‘customers’ collection.
Example 11.12
from pymongo import MongoClient
client=MongoClient()
db=client.newdb
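# create the collection only if it doesn't exist yet (see Example 11.11)
if 'customers' not in db.list_collection_names():
    db.create_collection("customers")
cust=db['customers']
custlist=[{'CustID':1,'Name':'Ravikumar','GSTIN':'27AAJPL7103N1ZF'},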
{'CustID':2,'Name':'Patel','GSTIN':'24ASDFG1234N1ZN'},
{'CustID':3,'Name':'Nitin','GSTIN':'27AABBC7895N1ZT'},
{'CustID':4,'Name':'Nair','GSTIN':'32MMAF8963N1ZK'},
{'CustID':5,'Name':'Shah','GSTIN':'24BADEF2002N1ZB'},
{'CustID':6,'Name':'Khurana','GSTIN':'07KABCS1002N1ZV'},
{'CustID':7,'Name':'Irfan','GSTIN':'05IIAAV5103N1ZA'},
{'CustID':8,'Name':'Kiran','GSTIN':'12PPSDF22431ZC'},
{'CustID':9,'Name':'Divya','GSTIN':'15ABCDE1101N1ZA'},
{'CustID':10,'Name':'John','GSTIN':'29AAEEC4258E1ZK'}]
cust.insert_many(custlist)
client.close()

In [None]:
11.11 PyMongo - Querying Collection
The PyMongo module defines the find() method to be used with a collection object.
It returns a cursor object that gives access to all documents in the
collection.
Example 11.13
>>> products=db['products']
>>> docs=products.find()
>>> list(docs)
[{'_id': ObjectId('5c8dec275405c12e3402423c'), 'ProductID': 1, 'Name': 'Laptop', 'price': 25000},
{'_id': ObjectId('5c8dec275405c12e3402423d'), 'ProductID': 2, 'Name': 'TV', 'price': 50000},
{'_id': ObjectId('5c8dec275405c12e3402423e'), 'ProductID': 3, 'Name': 'Router', 'price': 2000},
{'_id': ObjectId('5c8dec275405c12e3402423f'), 'ProductID': 4, 'Name': 'Scanner', 'price': 5000},
{'_id': ObjectId('5c8dec275405c12e34024240'), 'ProductID': 5, 'Name': 'Printer', 'price': 9000}]
This cursor object is an iterator that serves one document for every call of
the next() method. Each document is a dictionary object of k-v pairs. The
following code displays the name and GSTIN of all customers.

In [None]:
Example 11.14
#mongofind.py
from pymongo import MongoClient
client=MongoClient()
db=client.newdb
cust=db['customers']
docs=cust.find()
while True:
    try:
        doc=docs.next()
        print (doc['Name'], doc['GSTIN'])
    except StopIteration:
        break
client.close()
Run the above script from the command prompt.

In [None]:
You can, of course, employ a regular ‘for’ loop to traverse the cursor object
to obtain one document at a time.
Example 11.15
docs=cust.find()   # a cursor can be traversed only once, so fetch a fresh one
for doc in docs:
    print (doc['Name'], doc['GSTIN'])
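When only a few fields are needed, as in the loops above, find() can also take a projection document as its second argument so that the server returns just those keys. A minimal sketch, where the function name name_gstin_pairs is hypothetical and the argument is assumed to be the 'customers' Collection object:

```python
def name_gstin_pairs(cust):
    """Return (Name, GSTIN) tuples for all customers.

    The projection {'_id': 0, 'Name': 1, 'GSTIN': 1} asks the server to
    send back only Name and GSTIN (and to suppress the default _id).
    """
    cursor = cust.find({}, {"_id": 0, "Name": 1, "GSTIN": 1})
    return [(doc["Name"], doc["GSTIN"]) for doc in cursor]

# Usage (assumes a running server and the 'customers' collection):
# from pymongo import MongoClient
# client = MongoClient()
# for name, gstin in name_gstin_pairs(client.newdb.customers):
#     print(name, gstin)
# client.close()
```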
The comparison operators of MongoDB (described earlier in this chapter) are
used to apply filter criteria for the find() method. As an example, products with
price>10000 are fetched with the following statement:
Example 11.16
>>> products=db['products']
>>> docs=products.find({'price':{'$gt':10000}})
>>> for doc in docs:
    print (doc.get('Name'), doc.get('price'))
Laptop 25000
TV 50000

In [None]:
11.12 PyMongo – Update Document
PyMongo offers two collection methods for modifying data in one or
more documents: update_one() and update_many(). Both require
filter criteria and an update document that uses an operator such as $set to
give the new value of one or more keys. The update_one() method
updates only the first document that satisfies the filter criteria. On the other
hand, update_many() performs the update on all documents that satisfy the filter
criteria.
Example 11.17
collection.update_one(filter, newval)
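The difference between the two methods shows up in the UpdateResult each returns, through its matched_count and modified_count attributes. The wrappers set_price and raise_prices below are hypothetical helpers around these PyMongo methods. Note that the second argument must use an update operator such as $set or $inc: PyMongo rejects a plain replacement dictionary in update_one() and update_many() (replace_one() is meant for that).

```python
def set_price(products, name, price):
    """Set the price of the first product called name.

    Returns True if a matching document was found.
    """
    result = products.update_one({"Name": name}, {"$set": {"price": price}})
    return result.matched_count == 1

def raise_prices(products, min_id, amount):
    """Add amount to the price of every product with ProductID > min_id.

    Returns how many documents were actually modified.
    """
    result = products.update_many({"ProductID": {"$gt": min_id}},
                                  {"$inc": {"price": amount}})
    return result.modified_count

# Usage (assumes a running server and the 'products' collection):
# from pymongo import MongoClient
# client = MongoClient()
# products = client.newdb.products
# set_price(products, 'TV', 50000)
# raise_prices(products, 3, 500)
# client.close()
```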
The following Python script accepts the name of a product from the user and
displays its current price. It then updates the price to the new value input by the user.
Example 11.18
#mongoupdate.py
from pymongo import MongoClient
client=MongoClient()
db=client.newdb
prod=input('enter name:')
doc=db.products.find_one({'Name':prod})
if doc is None:
    # find_one() returns None when no document matches
    print ('no such product')
else:
    print (doc['Name'], doc['price'])
    price=int(input('enter price:'))
    db['products'].update_one({'Name':prod},{"$set":{'price':price}})
client.close()
Execute the above script from the command prompt:

In [None]:
11.13 PyMongo - Relationships
MongoDB is a non-relational database. However, you can still establish
relationships between documents in a database. MongoDB uses two different
approaches for this purpose. One is an embedded approach and the other is a
referencing approach.

In [None]:
Embedded Relationship
In this case, the documents appear in a nested manner where another
document is used as the value of a certain key. The following code
represents a ‘customer’ document showing a customer (with ‘_id’=1) buys
two products. A list of two product documents is the value of ‘prods’ key.
Example 11.19
>>> cust.insert_one({'_id':1,'name':'Ravi',
'prods':[
{ 'Name':'TV', 'price':40000},
{'Name':'Scanner','price':5000}
]
})
Querying such an embedded document is straightforward as all data is
available in the parent document itself.

In [None]:
Example 11.20
>>> doc=cust.find_one({'_id':1},{'prods':1})
>>> doc
{'_id': 1, 'prods': [{'Name': 'TV', 'price': 40000}, {'Name':
'Scanner', 'price': 5000}]}
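Fields inside embedded documents can also be used in filters through dot notation: the key 'prods.Name' matches any element of the 'prods' array whose Name equals the given value. A sketch, assuming the 'cust' collection from Example 11.19; the function name buyers_of is hypothetical.

```python
def buyers_of(cust, product_name):
    """Return names of customers whose embedded 'prods' list
    contains a product with the given Name (dot-notation query)."""
    cursor = cust.find({"prods.Name": product_name}, {"name": 1, "_id": 0})
    return [doc["name"] for doc in cursor]

# Usage with the document inserted in Example 11.19:
# buyers_of(cust, 'TV')   # should include 'Ravi'
```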
The embedded approach has a major drawback. The database is not
normalized and, hence, data redundancy arises. As documents grow in size,
the performance of read/write operations may suffer.
Reference Relationship
This approach is somewhat similar to the relations in an SQL-based database.
The collections (equivalent to RDBMS tables) are normalized for optimum
performance. One document refers to another through its ‘_id’ key.
Recall that instead of automatically generated values for ‘_id’, explicit
values can be specified while inserting documents in a collection. The
‘products’ collection is constituted as follows.

In [None]:
Example 11.21
>>> prod=db['products']
>>> list(prod.find())
[{'_id': 1, 'Name': 'Laptop', 'price': 25000}, {'_id': 2,
'Name': 'TV', 'price': 40000}, {'_id': 3, 'Name': 'Router',
'price': 2000}, {'_id': 4, 'Name': 'Scanner', 'price': 5000},
{'_id': 5, 'Name': 'Printer', 'price': 9000}]
We now create ‘customers’ collection.
Example 11.22
>>> db.create_collection('customers')
>>> cust=db['customers']
The following document is inserted with one key ‘prods’ being a list of
'_id's from products collection.
Example 11.23
>>> cust.insert_one({'_id':1, 'Name':'Ravi', 'prods':[2,4]})
However, in such a case, you may have to run two queries: one on the parent
collection, and another on the related collection. First, fetch the _ids of the
related documents.

In [None]:
Example 11.24
>>> doc=cust.find_one({'_id':1},{'prods':1})
>>> prods=doc['prods']
>>> prods
[2, 4]
Then, iterate over the list and access required field from the related
document.
Example 11.25
>>> for each in prods:
    doc=prod.find_one({'_id':each})
    print (doc['Name'])

TV
Scanner
The reference approach can be used to build one-to-one or one-to-many
relationships. The choice of approach (embedded or reference) largely
depends on data usage, the projected growth in document size and the
atomicity of the transaction.
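The loop in Example 11.25 issues one find_one() call per referenced product. Using the $in operator from Table 11.1, the second step can instead be done in a single query. A sketch, assuming the 'cust' and 'prod' collections defined above; the function name products_of is hypothetical.

```python
def products_of(cust, prod, cust_id):
    """Resolve a customer's referenced product _ids with two queries:
    one on the parent collection, one on the related collection."""
    doc = cust.find_one({"_id": cust_id}, {"prods": 1})
    if doc is None:
        return []                       # no such customer
    ids = doc.get("prods", [])
    # A single $in query fetches all referenced products at once.
    return [p["Name"] for p in prod.find({"_id": {"$in": ids}})]

# Usage with the collections from Examples 11.21 to 11.23:
# products_of(cust, prod, 1)   # names of products 2 and 4
```

Unlike the loop, the $in query does not guarantee that results come back in the order of the ids list; sort them on the client if order matters.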
In this chapter, we had an overview of MongoDB database and its Python
interface in the form of PyMongo module. In the next chapter, another
NoSQL database – Cassandra – is going to be explained along with its
association with Python.