# MongoDB Example for Accountants: Reinsurance Loss Data Analysis

<!-- PELICAN_BEGIN_SUMMARY -->

MongoDB is a document-oriented database. Instead of storing your data in tables made out of individual rows,
like a relational database does, it stores your data in collections made out of individual documents.
In MongoDB, a document is a JSON-like object (stored as BSON) with no enforced format or schema.

You can keep all your data in a single collection, MongoDB's counterpart to a table.

<!-- PELICAN_END_SUMMARY -->

### MongoDB for Reinsurance Data Analysis
- Create sample data in MongoDB
- Read webpage data into MongoDB and retrieve it for data analysis

In [39]:
## ref: http://api.mongodb.com/python/current/tutorial.html
# Import pymongo
# The first step when working with PyMongo is to create a MongoClient to the running mongod instance
# Making a Connection with MongoClient
import pymongo
from pymongo import MongoClient


# Connect on the default host and port.
client = MongoClient()


# We can also specify the host and port explicitly
client = MongoClient('localhost', 27017)


# Getting a Database
# A single instance of MongoDB can support multiple independent databases.
# When working with PyMongo you access databases using attribute style access on MongoClient instances:
db = client.test_database


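# If your database name is such that using attribute style access won’t work (like test-database),
# you can use dictionary style access instead, e.g. client['test-database'].

# Getting a Collection
# A collection is a group of documents, roughly the equivalent of a table in a relational database.
# Collections are accessed the same way as databases:
collection = db.test_collection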

collection

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'test_database'), 'test_collection')
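
Dictionary-style access is the form to use when a database or collection name is not a valid Python attribute name, for example one containing a hyphen. A minimal sketch, assuming a hypothetical helper `get_collection` and the made-up names `test-database` and `loss-data`; it relies only on `client[name]` and `db[name]` indexing, which `MongoClient` and `Database` objects support:

```python
def get_collection(client, db_name, collection_name):
    """Return a collection using dictionary-style access.

    `client` is expected to be a pymongo.MongoClient. This helper is
    hypothetical and not part of PyMongo. Both lookups use [] indexing, so
    names such as "test-database" work even though client.test-database
    would not parse as an attribute.
    """
    database = client[db_name]          # same as client.test_database for valid names
    return database[collection_name]    # same as database.test_collection


# Example usage against a running mongod (assumed to be on localhost:27017):
# from pymongo import MongoClient
# losses = get_collection(MongoClient("localhost", 27017), "test-database", "loss-data")
```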

In [40]:
# Data in MongoDB is represented (and stored) using JSON-style documents.
# In PyMongo we use dictionaries to represent documents.
# As an example, the following dictionary might be used to represent a reinsurance treaty:
import datetime
post = {"reinsurer": "AIG",
        "treaty": "XOL layer",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime.utcnow()}

In [None]:
# Inserting a Document
- When a document is inserted, a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key.
- The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult.

In [41]:
# To insert a document into a collection we can use the insert_one() method:
posts = db.posts
post_id = posts.insert_one(post).inserted_id
post_id


ObjectId('5b720f16a68b144f903802fa')

In [None]:
# After inserting the first document, the posts collection has actually been created on the server.
# We can verify this by listing all of the collections in our database:

In [42]:
# collection_names() is deprecated since PyMongo 3.7 and removed in 4.0; use list_collection_names()
db.list_collection_names()

['posts', 'profiles']

In [43]:
post2 = {"reinsurer": "Swiss Re",
        "treaty": "Clash Layer",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime.utcnow()}

post_id = posts.insert_one(post2).inserted_id
post_id

ObjectId('5b720f1ea68b144f903802fb')
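
Both documents above take their timestamp from `datetime.datetime.utcnow()`, which returns a naive datetime and is deprecated as of Python 3.12. A sketch of an alternative, assuming a hypothetical helper `build_treaty_doc` that is not part of the original notebook; it stamps each document with a timezone-aware UTC time, which fits BSON dates since they are stored in UTC:

```python
import datetime


def build_treaty_doc(reinsurer, treaty, tags=None):
    """Build a treaty document as a plain dict, ready for insert_one()."""
    return {
        "reinsurer": reinsurer,
        "treaty": treaty,
        "tags": list(tags or []),  # copy so the caller's list is not shared
        # Timezone-aware replacement for the deprecated datetime.utcnow()
        "date": datetime.datetime.now(datetime.timezone.utc),
    }


post = build_treaty_doc("AIG", "XOL layer", ["mongodb", "python", "pymongo"])
```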

In [None]:
# Getting a Single Document With find_one()
- The most basic type of query that can be performed in MongoDB is find_one().
- This method returns a single document matching a query (or None if there are no matches).
- It is useful when you know there is only one matching document, or are only interested in the first match.
- Here we use find_one() to get the first document from the posts collection:

In [44]:
import pprint
pprint.pprint(posts.find_one())

{'_id': ObjectId('5b6e411ba68b144f903802ee'),
 'author': 'Mike',
 'date': datetime.datetime(2018, 8, 11, 1, 47, 39, 710000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


In [45]:
pprint.pprint(posts.find_one({"reinsurer": "AIG"}))

{'_id': ObjectId('5b720f16a68b144f903802fa'),
 'date': datetime.datetime(2018, 8, 13, 23, 6, 57, 506000),
 'reinsurer': 'AIG',
 'tags': ['mongodb', 'python', 'pymongo'],
 'treaty': 'XOL layer'}


In [46]:
pprint.pprint(posts.find_one({"reinsurer": "ACE"}))

None


In [None]:
# Querying By ObjectId
- We can also find a post by its _id, which in our example is an ObjectId:

In [47]:
post_id  ##output is an object

ObjectId('5b720f1ea68b144f903802fb')

In [48]:
pprint.pprint(posts.find_one({"_id": post_id}))

{'_id': ObjectId('5b720f1ea68b144f903802fb'),
 'date': datetime.datetime(2018, 8, 13, 23, 7, 10, 339000),
 'reinsurer': 'Swiss Re',
 'tags': ['mongodb', 'python', 'pymongo'],
 'treaty': 'Clash Layer'}


In [None]:
# Note that an ObjectId is not the same as its string representation:

In [49]:
post_id_as_str = str(post_id)
post_id_as_str  ## output is a string


'5b720f1ea68b144f903802fb'

In [50]:
posts.find_one({"_id": post_id_as_str}) # No result

In [None]:
# Get URL data
- A common task in web applications is to get an ObjectId from the request URL and find the matching document.
- It’s necessary in this case to convert the ObjectId from a string before passing it to find_one:

In [51]:
from bson.objectid import ObjectId

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId before querying, then return the match (or None):
    document = posts.find_one({'_id': ObjectId(post_id)})
    return document
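
The conversion above raises `bson.errors.InvalidId` when the string taken from the URL is not a valid ObjectId. A sketch of a more defensive lookup, assuming the hypothetical helpers `looks_like_object_id` and `find_by_id_string`; the 24-hex-character check mirrors the string form shown earlier (`'5b720f1ea68b144f903802fb'`):

```python
import re

_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """True if value is a 24-character hexadecimal string."""
    return isinstance(value, str) and _HEX24.fullmatch(value) is not None


def find_by_id_string(collection, id_str):
    """Return the document whose _id matches id_str, or None."""
    if not looks_like_object_id(id_str):
        return None  # malformed id: skip the query instead of raising
    from bson.objectid import ObjectId  # the bson package ships with PyMongo
    return collection.find_one({"_id": ObjectId(id_str)})
```

The `bson` package also provides `ObjectId.is_valid()`, which could replace the regular expression if importing `bson` up front is acceptable.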

In [None]:
# A Note On Unicode Strings
- Under Python 2, the regular strings we stored earlier look different when retrieved from the server (e.g. u'Mike' instead of 'Mike'). The output in this notebook shows no u prefix, which is what Python 3 produces, because every Python 3 str is already Unicode.
- MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only valid UTF-8 data.
- In Python 2, regular strings (<type 'str'>) are validated and stored unaltered, while Unicode strings (<type 'unicode'>) are encoded to UTF-8 first.
- The Python 2 shell shows u'Mike' instead of 'Mike' because PyMongo decodes each BSON string to a Python unicode string, not a regular str.

In [None]:
# Bulk Inserts
- We can also perform bulk insert operations by passing a list as the first argument to insert_many().
- This will insert each document in the list, sending only a single command to the server.
- Documents in the list can have different shapes (for example, a new field) and still go into the same collection, because a collection is not a table with a fixed schema.

In [52]:
# new_posts[1] has a different “shape” than the other posts:
# there is no "tags" field and we’ve added a new field, "text".
# This is what we mean when we say that MongoDB is schema-free.


new_posts = [{"reinsurer": "AIG",
              "treaty": "XOL Layer 2018",
              "tags": ["bulk", "insert"],
              "date": datetime.datetime(2018, 11, 12, 11, 14)},
             {"reinsurer": "Munich Re",
              "treaty": "QS 2018",
              "text": "QS 20% for US business",
              "date": datetime.datetime(2018, 11, 10, 10, 45)}]

result = posts.insert_many(new_posts)
result.inserted_ids

[ObjectId('5b720f37a68b144f903802fc'), ObjectId('5b720f37a68b144f903802fd')]

In [None]:
# The result from insert_many() now returns two ObjectId instances, one for each inserted document.
- new_posts[1] has a different “shape” than the other posts: there is no "tags" field and we’ve added a new field, "text".
- This is what we mean when we say that MongoDB is schema-free.
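
Because documents in one collection can differ in shape, analysis code should not assume that a field is present. A small sketch using plain dicts that mirror the two documents in `new_posts` (dates omitted for brevity); the helper name `field_coverage` is hypothetical:

```python
from collections import Counter


def field_coverage(documents):
    """Count how many documents contain each field name."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.keys())
    return dict(counts)


docs = [
    {"reinsurer": "AIG", "treaty": "XOL Layer 2018", "tags": ["bulk", "insert"]},
    {"reinsurer": "Munich Re", "treaty": "QS 2018", "text": "QS 20% for US business"},
]

# Shows which fields are shared and which appear in only some documents
print(field_coverage(docs))

# Use .get() with a default when a field may be missing
print([doc.get("tags", []) for doc in docs])
```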

In [53]:
for post in posts.find():
    pprint.pprint(post)

{'_id': ObjectId('5b6e411ba68b144f903802ee'),
 'author': 'Mike',
 'date': datetime.datetime(2018, 8, 11, 1, 47, 39, 710000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
{'_id': ObjectId('5b6e4225a68b144f903802f0'),
 'author': 'Jasmin',
 'date': datetime.datetime(2018, 8, 11, 1, 55, 49, 411000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My second blog post!'}
{'_id': ObjectId('5b6e4490a68b144f903802f1'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}
{'_id': ObjectId('5b6e4490a68b144f903802f2'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}
{'_id': ObjectId('5b720f16a68b144f903802fa'),
 'date': datetime.datetime(2018, 8, 13, 23, 6, 57, 506000),
 'reinsurer': 'AIG',
 'tags': ['mongodb', 'python', 'pymongo'],
 'treaty': 'XOL layer'}
{'_id': ObjectId('5b720f1ea68b144f903802fb'),
 'date': d

In [54]:
# Unlike find_one(), find() returns a cursor over every matching document
for post in posts.find({"reinsurer": "AIG"}):
    pprint.pprint(post)

{'_id': ObjectId('5b720f16a68b144f903802fa'),
 'date': datetime.datetime(2018, 8, 13, 23, 6, 57, 506000),
 'reinsurer': 'AIG',
 'tags': ['mongodb', 'python', 'pymongo'],
 'treaty': 'XOL layer'}
{'_id': ObjectId('5b720f37a68b144f903802fc'),
 'date': datetime.datetime(2018, 11, 12, 11, 14),
 'reinsurer': 'AIG',
 'tags': ['bulk', 'insert'],
 'treaty': 'XOL Layer 2018'}


In [55]:
pprint.pprint(posts.find_one({"reinsurer": "AIG"}))

{'_id': ObjectId('5b720f16a68b144f903802fa'),
 'date': datetime.datetime(2018, 8, 13, 23, 6, 57, 506000),
 'reinsurer': 'AIG',
 'tags': ['mongodb', 'python', 'pymongo'],
 'treaty': 'XOL layer'}


In [None]:
# Counting
- If we just want to know how many documents match a query, we can perform a count operation instead of a full query: count_documents() in current PyMongo, count() in versions before 3.7.
- We can get a count of all of the documents in a collection:

In [56]:
posts.count_documents({})  # Collection.count() was removed in PyMongo 4.0

8

In [57]:
posts.count_documents({"reinsurer": "AIG"})  # replaces find(...).count(), removed in PyMongo 4.0

2
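
For loss data analysis it is often more useful to count documents per reinsurer in one pass than to run one count per name. A sketch of an aggregation pipeline for that, assuming a hypothetical helper `count_by_field_pipeline`; `$group`, `$sum`, and `$sort` are standard MongoDB aggregation operators, and the resulting list would be passed to `posts.aggregate()`:

```python
def count_by_field_pipeline(field):
    """Build an aggregation pipeline that counts documents per value of `field`."""
    return [
        # Group on the field's value; documents missing the field share a null group
        {"$group": {"_id": "$" + field, "count": {"$sum": 1}}},
        # Largest groups first
        {"$sort": {"count": -1}},
    ]


pipeline = count_by_field_pipeline("reinsurer")

# With a live collection:
# for row in posts.aggregate(pipeline):
#     print(row["_id"], row["count"])
```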

In [None]:
# Range Queries
- MongoDB supports many different types of advanced queries.
- As an example, let's perform a query where we limit results to posts older than a certain date, but also sort the results by reinsurer:

In [58]:
d = datetime.datetime(2018, 11, 12, 12)
for post in posts.find({"date": {"$lt": d}}).sort("reinsurer"):
    pprint.pprint(post)

{'_id': ObjectId('5b6e411ba68b144f903802ee'),
 'author': 'Mike',
 'date': datetime.datetime(2018, 8, 11, 1, 47, 39, 710000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
{'_id': ObjectId('5b6e4225a68b144f903802f0'),
 'author': 'Jasmin',
 'date': datetime.datetime(2018, 8, 11, 1, 55, 49, 411000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My second blog post!'}
{'_id': ObjectId('5b6e4490a68b144f903802f1'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}
{'_id': ObjectId('5b6e4490a68b144f903802f2'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}
{'_id': ObjectId('5b720f16a68b144f903802fa'),
 'date': datetime.datetime(2018, 8, 13, 23, 6, 57, 506000),
 'reinsurer': 'AIG',
 'tags': ['mongodb', 'python', 'pymongo'],
 'treaty': 'XOL layer'}
{'_id': ObjectId('5b720f37a68b144f903802fc'),
 'date': d
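
A range query is just a nested dict of comparison operators, so it can be built programmatically. A sketch, assuming a hypothetical helper `date_range_filter` that produces a half-open interval with `$gte` and `$lt`:

```python
import datetime


def date_range_filter(start=None, end=None, field="date"):
    """Build a filter for start <= field < end; either bound may be omitted."""
    bounds = {}
    if start is not None:
        bounds["$gte"] = start
    if end is not None:
        bounds["$lt"] = end
    # An empty filter ({}) matches every document
    return {field: bounds} if bounds else {}


query = date_range_filter(
    start=datetime.datetime(2018, 1, 1),
    end=datetime.datetime(2018, 11, 12, 12),
)

# With a live collection, sorted by reinsurer and then newest first:
# posts.find(query).sort([("reinsurer", 1), ("date", -1)])
```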

In [None]:
# Indexing
- Adding indexes can help accelerate certain queries and can also add additional functionality to querying and storing documents.
- In this example, we’ll demonstrate how to create a unique index on a key that rejects documents whose value for that key already exists in the index.
- First, we’ll need to create the index:

In [59]:
result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
                                  unique=True)
sorted(list(db.profiles.index_information()))

['_id_', 'user_id_1']

In [None]:
# Notice that we have two indexes now: one is the index on _id that MongoDB creates automatically,
# and the other is the index on user_id we just created.

In [67]:
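# Set up some user profiles:
user_profiles = [
    {'user_id': 211, 'reinsurer': 'AIG'},
    {'user_id': 212, 'reinsurer': 'SCOR'}]
# Re-running this cell once these user_ids exist violates the unique index,
# which is presumably what raised the BulkWriteError shown below.
result = db.profiles.insert_many(user_profiles)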

BulkWriteError: batch op errors occurred

In [38]:
new_profile = {'user_id': 213, 'reinsurer': 'XL American'}
duplicate_profile = {'user_id': 212, 'reinsurer': 'SCOR S.E'}
result = db.profiles.insert_one(new_profile)  # This is fine.
# The next insert violates the unique index on user_id and raises:
# DuplicateKeyError: E11000 duplicate key error collection: test_database.profiles index: user_id_1 dup key: { : 212 }
result = db.profiles.insert_one(duplicate_profile)
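
The insert of `duplicate_profile` fails because `user_id` 212 is already in the unique index. A sketch of handling that case instead of letting the exception stop the cell, assuming a hypothetical helper `insert_if_new`; the fallback exception class exists only so the snippet loads where PyMongo is not installed:

```python
try:
    from pymongo.errors import DuplicateKeyError
except ImportError:  # fallback so the sketch can be loaded without PyMongo
    class DuplicateKeyError(Exception):
        pass


def insert_if_new(collection, document):
    """Insert document; return its _id, or None if a unique index rejects it."""
    try:
        return collection.insert_one(document).inserted_id
    except DuplicateKeyError:
        return None


# With a live collection carrying the unique index on user_id:
# insert_if_new(db.profiles, {'user_id': 212, 'reinsurer': 'SCOR S.E'})
```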

In [None]:
#### *Send webpage data to MongoDB using Node.js, then retrieve and display it*
#https://stackoverflow.com/questions/17256710/display-the-data-onto-webpage-retrieved-from-mongodb-using-node-js#