Reference: https://www.datacamp.com/community/tutorials/introduction-mongodb-python

SQL vs NoSQL Difference 

Data in an RDBMS is stored in database objects called tables.
In NoSQL, data is stored in one of several ways: column-oriented, document-oriented, graph-based, or key-value store.

NoSQL advantages:
1. Documents can be created without having to first define their structure
2. Each document can have its own unique structure
3. The query syntax varies from database to database
4. Handles large volumes of structured, semi-structured, and unstructured data
5. Works naturally with object-oriented programming
6. Horizontally scalable
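Points 1 and 2 can be illustrated without a database. The sketch below uses plain Python dicts as stand-ins for documents; the `collection` list is only a placeholder, not a real MongoDB collection.

```python
import json

# Two documents destined for the same (stand-in) collection.
# Neither needs a schema declared up front, and their fields differ.
doc_a = {"learner": "rh2835", "tags": ["mongodb", "python"]}
doc_b = {"author": "yuanchen", "about": "knn and python"}

collection = [doc_a, doc_b]  # placeholder for a document collection

# Fields present in one document but not the other.
only_in_a = set(doc_a) - set(doc_b)
only_in_b = set(doc_b) - set(doc_a)

for doc in collection:
    print(json.dumps(doc))
print(sorted(only_in_a), sorted(only_in_b))
```

In a relational table, both rows would have to share the same columns; here each document carries its own field names.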

Document db: MongoDB<br/>
Graph stores: Neo4J, Giraph(social connections data)<br/>
Key-value stores: Redis<br/>
Wide-column stores: HBase (optimized for queries over large datasets)<br/>

In [3]:
import pymongo

In [4]:
from pymongo import MongoClient
client = MongoClient()

In [16]:
#specify host and port (these are the defaults)
client = MongoClient('localhost', 27017)

In [17]:
#create a database
db = client['datacampdb']

In [18]:
#inserting a document
article = {
    "learner": "rh2835",
    "about": "integration of mongodb and python",
    "tags": ["mongodb", "python", "pymongo"]
}
articles = db.articles
result = articles.insert_one(article)

In [19]:
print("first article key is: {}".format(result.inserted_id))

first article key is: 5be765eec600ebe49eedb18f


In [20]:
#confirm the article collection is created
db.list_collection_names()

['articles']

In [35]:
#insert more documents
article1 = {
    "learner": "yuanchen",
    "about": "knn and python",
    "tags": ["knn", "pymongo"]
}

article2 = {
    "learner": "lingling",
    "about": "web development and python",
    "tags": ["web","design","html"]
}

In [36]:
new_articles = articles.insert_many([article1,article2])

In [37]:
print("the new article IDs are {}".format(new_articles.inserted_ids))

the new article IDs are [ObjectId('5be78395c600ebe49eedb192'), ObjectId('5be78395c600ebe49eedb193')]


In [38]:
#retrieve a single document
print(articles.find_one())

{'_id': ObjectId('5be765eec600ebe49eedb18f'), 'learner': 'rh2835', 'about': 'integration of mongodb and python', 'tags': ['mongodb', 'python', 'pymongo']}


In [39]:
#retrieve all documents
for article in articles.find():
    print(article)

{'_id': ObjectId('5be765eec600ebe49eedb18f'), 'learner': 'rh2835', 'about': 'integration of mongodb and python', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('5be77d7fc600ebe49eedb190'), 'author': 'yuanchen', 'about': 'knn and python', 'tags': ['knn', 'pymongo']}
{'_id': ObjectId('5be77d7fc600ebe49eedb191'), 'author': 'lingling', 'about': 'web development and python', 'tags': ['web', 'design', 'html']}
{'_id': ObjectId('5be78395c600ebe49eedb192'), 'learner': 'yuanchen', 'about': 'knn and python', 'tags': ['knn', 'pymongo']}
{'_id': ObjectId('5be78395c600ebe49eedb193'), 'learner': 'lingling', 'about': 'web development and python', 'tags': ['web', 'design', 'html']}


In [40]:
#for web applications, an ID taken from a URL arrives as a string
#convert the string ID into an ObjectId before querying
from bson.objectid import ObjectId
def get(post_id):
    #assumes the articles collection defined above is the one being queried
    document = articles.find_one({
        '_id': ObjectId(post_id)
    })
    return document
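Since the ID string comes from a URL, it may be malformed. The sketch below checks the string before it is converted, using only the standard library. The helper names `is_valid_object_id` and `object_id_created_at` are made up for illustration; the `bson` package offers its own `ObjectId.is_valid` check, which is likely the better choice in real code.

```python
import re
from datetime import datetime, timezone

# An ObjectId is 12 bytes, usually written as 24 hexadecimal characters.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")

def is_valid_object_id(text):
    """Return True if text looks like a 24-character hex ObjectId string."""
    return isinstance(text, str) and _OBJECT_ID_RE.fullmatch(text) is not None

def object_id_created_at(text):
    """Decode the creation time stored in the first 4 bytes (8 hex chars)."""
    if not is_valid_object_id(text):
        raise ValueError("not a valid ObjectId string: {!r}".format(text))
    seconds = int(text[:8], 16)  # seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(is_valid_object_id("5be765eec600ebe49eedb18f"))
print(is_valid_object_id("not-an-id"))
print(object_id_created_at("5be765eec600ebe49eedb18f").year)
```

Rejecting a bad string early lets a web handler return a "not found" response instead of failing inside the conversion.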

In [41]:
#return selected fields
for article in articles.find(
    {},{"_id":0, "learner":1, "about":1}
):
    print(article)

{'learner': 'rh2835', 'about': 'integration of mongodb and python'}
{'about': 'knn and python'}
{'about': 'web development and python'}
{'learner': 'yuanchen', 'about': 'knn and python'}
{'learner': 'lingling', 'about': 'web development and python'}
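Two of the results above have no `learner` key because those documents were stored with an `author` field instead, and a projection simply skips fields a document does not have. The pure-Python sketch below imitates that behaviour. `project` is a made-up helper and the `_id` values are placeholders; this is not how pymongo implements projections.

```python
def project(doc, include):
    """Keep only the listed fields; fields missing from doc are skipped."""
    return {field: doc[field] for field in include if field in doc}

docs = [
    {"_id": 1, "learner": "rh2835", "about": "integration of mongodb and python"},
    {"_id": 2, "author": "yuanchen", "about": "knn and python"},
]

projected = [project(d, ["learner", "about"]) for d in docs]
for d in projected:
    print(d)
```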


In [42]:
#sorting results: 1 ascending, -1 descending
doc = articles.find().sort("learner",-1)
for x in doc:
    print(x)

{'_id': ObjectId('5be78395c600ebe49eedb192'), 'learner': 'yuanchen', 'about': 'knn and python', 'tags': ['knn', 'pymongo']}
{'_id': ObjectId('5be765eec600ebe49eedb18f'), 'learner': 'rh2835', 'about': 'integration of mongodb and python', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('5be78395c600ebe49eedb193'), 'learner': 'lingling', 'about': 'web development and python', 'tags': ['web', 'design', 'html']}
{'_id': ObjectId('5be77d7fc600ebe49eedb190'), 'author': 'yuanchen', 'about': 'knn and python', 'tags': ['knn', 'pymongo']}
{'_id': ObjectId('5be77d7fc600ebe49eedb191'), 'author': 'lingling', 'about': 'web development and python', 'tags': ['web', 'design', 'html']}
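In the output above, the documents that only have an `author` field come last. MongoDB treats a missing sort field like a null value, which ranks below any string, so a descending sort puts those documents at the end. The stdlib sketch below reproduces that ordering on trimmed-down copies of the documents; `sort_key` is a hypothetical helper, not part of pymongo.

```python
docs = [
    {"learner": "rh2835"},
    {"author": "yuanchen"},
    {"author": "lingling"},
    {"learner": "yuanchen"},
    {"learner": "lingling"},
]

def sort_key(doc):
    value = doc.get("learner")
    # (False, "") for a missing field ranks below every (True, name) pair.
    return (value is not None, value or "")

# reverse=True gives descending order and keeps ties in their original order.
descending = sorted(docs, key=sort_key, reverse=True)
order = [d.get("learner") for d in descending]
print(order)
```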


In [43]:
#update a doc
query = {"learner": "rh2835"}
new_author = {"$set": {"learner": "john"}}
articles.update_one(query,new_author)
for article in articles.find():
    print(article)

{'_id': ObjectId('5be765eec600ebe49eedb18f'), 'learner': 'john', 'about': 'integration of mongodb and python', 'tags': ['mongodb', 'python', 'pymongo']}
{'_id': ObjectId('5be77d7fc600ebe49eedb190'), 'author': 'yuanchen', 'about': 'knn and python', 'tags': ['knn', 'pymongo']}
{'_id': ObjectId('5be77d7fc600ebe49eedb191'), 'author': 'lingling', 'about': 'web development and python', 'tags': ['web', 'design', 'html']}
{'_id': ObjectId('5be78395c600ebe49eedb192'), 'learner': 'yuanchen', 'about': 'knn and python', 'tags': ['knn', 'pymongo']}
{'_id': ObjectId('5be78395c600ebe49eedb193'), 'learner': 'lingling', 'about': 'web development and python', 'tags': ['web', 'design', 'html']}


In [44]:
#limit the number of results
limited_result = articles.find().limit(1)
for x in limited_result:
    print(x)

{'_id': ObjectId('5be765eec600ebe49eedb18f'), 'learner': 'john', 'about': 'integration of mongodb and python', 'tags': ['mongodb', 'python', 'pymongo']}


In [47]:
#delete a doc (use delete_many to delete multiple docs)
delete_articles = db.articles.delete_one(
    {"_id": ObjectId('5be765eec600ebe49eedb18f')}
)
print(delete_articles.deleted_count," articles deleted")

0  articles deleted


In [49]:
#drop a collection
#articles.drop()

#confirm dropping
#db.list_collection_names()

In [50]:
#ODM: object document mapper
from mongoengine import connect, Document, StringField, ReferenceField
connect('datacampdb')

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary())

In [51]:
#define the fields a user will have and their data types
#required=True means the field must be specified when creating a user
class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=30)
    last_name = StringField(max_length=30)

In [52]:
#define a Post document that references a User document
class Post(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField(User)

In [53]:
user = User(email="abc@gmail.com", 
           first_name="yuan",
           last_name="chen")
user.save()

<User: User object>

In [54]:
print(user.id, user.email)

5be78cd5c600ebe49eedb195 abc@gmail.com
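The field options used in `User` (`required=True`, `max_length=30`) describe checks that the ODM applies on your behalf. The sketch below shows the same kind of checks written by hand in plain Python. `USER_RULES` and `validate` are illustrative names only, and this is not mongoengine's actual implementation.

```python
# Hypothetical field rules mirroring the User class above.
USER_RULES = {
    "email": {"required": True, "max_length": None},
    "first_name": {"required": False, "max_length": 30},
    "last_name": {"required": False, "max_length": 30},
}

def validate(data, rules):
    """Return a list of problems; an empty list means the data passes."""
    problems = []
    for name, rule in rules.items():
        value = data.get(name)
        if value is None:
            if rule["required"]:
                problems.append("{} is required".format(name))
            continue
        limit = rule["max_length"]
        if limit is not None and len(value) > limit:
            problems.append("{} is longer than {} characters".format(name, limit))
    return problems

ok = validate({"email": "abc@gmail.com", "first_name": "yuan"}, USER_RULES)
bad = validate({"first_name": "y" * 31}, USER_RULES)
print(ok)
print(bad)
```

With an ODM these rules live next to the field definitions, so they do not need to be repeated wherever a user is created.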


Pros and cons of an ORM (object-relational mapper)<br/>

Using an ORM saves a lot of time because:<br/>
    1. DRY (don't repeat yourself)<br/>
    2. a lot of stuff is done automatically<br/>
    3. no poorly-formed SQL<br/>

Using an ORM library is more flexible:<br/>
    1. it abstracts the DB system, so you can change it whenever you want<br/>

Using an ORM can be a pain:<br/>
    1. you have to learn it<br/>
    2. you have to set it up<br/>
    3. performance is OK for usual queries, but not for complex queries in big projects<br/>
    4. it's a trap for new programmers, who can easily write very greedy statements
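To see what an ORM automates, it helps to look at the hand-written alternative. The sketch below uses the standard library's `sqlite3` with an in-memory database. The `users` table and the `find_user_by_email` helper are invented for illustration and are not part of the tutorial's code.

```python
import sqlite3
from collections import namedtuple

# A plain record type; an ORM would generate this mapping from a model class.
User = namedtuple("User", ["id", "email", "first_name"])

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, first_name TEXT)"
)
conn.execute(
    "INSERT INTO users (email, first_name) VALUES (?, ?)",
    ("abc@gmail.com", "yuan"),
)

def find_user_by_email(connection, email):
    """Hand-written query plus row-to-object mapping."""
    row = connection.execute(
        "SELECT id, email, first_name FROM users WHERE email = ?",
        (email,),  # parameter binding instead of string formatting
    ).fetchone()
    return User(*row) if row is not None else None

found = find_user_by_email(conn, "abc@gmail.com")
missing = find_user_by_email(conn, "nobody@example.com")
print(found)
print(missing)
conn.close()
```

Every new query needs its own SQL string and mapping code like this, which is the repetition the DRY point refers to.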