# Demonstration of reading BSON files and inserting the data into MongoDB

## Load the BSON file into a Python data structure

In [1]:
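import bson  # the bson package that ships with PyMongo

# Read the whole file and decode it into a list of dicts, one per tweet
with open('./data/tweets.bson', 'rb') as fin:
    tweets = bson.decode_all(fin.read())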
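
decode_all() needs the whole file in memory. A BSON file is a concatenation of documents, each beginning with a little-endian int32 that gives the document's total length, so a large dump can also be walked one document at a time. The following is a minimal sketch using only the standard library. The helper name iter_bson_documents is hypothetical, and each yielded chunk would still need decoding (for example with bson.decode, assuming a recent PyMongo).

```python
import io
import struct


def iter_bson_documents(stream):
    """Yield the raw bytes of each BSON document in a binary stream.

    Hypothetical helper: it only splits the stream on the length prefix
    and does not decode or validate the documents.
    """
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated BSON length prefix")
        # The first 4 bytes are a little-endian int32: the total document
        # size, including these 4 bytes and the trailing NUL byte.
        (size,) = struct.unpack("<i", header)
        if size < 5:
            raise ValueError(f"invalid BSON document size: {size}")
        body = stream.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated BSON document")
        yield header + body


# Usage with an in-memory stream holding two empty documents;
# with a real file, pass the object returned by open(path, 'rb').
empty_doc = b"\x05\x00\x00\x00\x00"
raw_docs = list(iter_bson_documents(io.BytesIO(empty_doc * 2)))
print(len(raw_docs))  # 2
```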

## Create a connection to our MongoDB instance

In [2]:
import pymongo  # PyMongo is the Python driver for MongoDB

client = pymongo.MongoClient('mongodb://localhost:27017/')

# MongoDB creates databases and collections lazily, on the first write
db = client.tweets  # get a handle to a database called tweets, to store, um ... tweets
collection = db.tweet_set01  # get a handle to a collection called tweet_set01

## Insert our data into MongoDB

We can insert a single document into MongoDB using the insert_one() method, or multiple documents at once using the insert_many() method, which takes a list of documents as its argument.

Let's insert the tweets into the collection one by one.

In [3]:
collection.drop() # make sure this is empty before we start inserting tweets

for tweet in tweets:
    result = collection.insert_one(tweet)  

In [4]:
results = collection.find().limit(10)  # this will get the first 10 tweets in the collection tweet_set01

for result in results:
    print(result)

{'_id': ObjectId('553bbecae8f1e57878b72a1c'), 'created_at': 'Sat Apr 25 16:19:03 +0000 2015', 'id': 5.919998745409044e+17, 'id_str': '591999874540904448', 'text': 'RT @webinara: RT: http://t.co/tgxDJSOrHb #webinar #TrueTwit #TechTip.\nA Node.js API development webinar:\nhttps://t.co/nBjkk4MnuN', 'source': '<a href="http://ameanmagazine.blogspot.com" rel="nofollow">A Mean Magazine Bot</a>', 'truncated': False, 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'follow_request_sent': False, 'screen_name': 'ameanmbot', 'description': 'Non curated list of relevant tweets about the MEAN stack. For a curated one visit @ameanmagazine', 'statuses_count': 197348.0, 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'location': '', 'geo_enabled': False, 'default_profile': True, 'following': True, 'favourites_count': 93.0, 'verified': False, 'lang

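Now insert all the tweets at once with insert_many(). This is usually much faster, because the driver sends the documents in batches instead of making one round trip per tweet.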

In [5]:
collection.drop() # clear out the collection of all tweets
collection.insert_many(tweets)

results = collection.find().limit(10)

for result in results:
    print(result)

{'_id': ObjectId('553bbecae8f1e57878b72a1c'), 'created_at': 'Sat Apr 25 16:19:03 +0000 2015', 'id': 5.919998745409044e+17, 'id_str': '591999874540904448', 'text': 'RT @webinara: RT: http://t.co/tgxDJSOrHb #webinar #TrueTwit #TechTip.\nA Node.js API development webinar:\nhttps://t.co/nBjkk4MnuN', 'source': '<a href="http://ameanmagazine.blogspot.com" rel="nofollow">A Mean Magazine Bot</a>', 'truncated': False, 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'follow_request_sent': False, 'screen_name': 'ameanmbot', 'description': 'Non curated list of relevant tweets about the MEAN stack. For a curated one visit @ameanmagazine', 'statuses_count': 197348.0, 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'location': '', 'geo_enabled': False, 'default_profile': True, 'following': True, 'favourites_count': 93.0, 'verified': False, 'lang
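
The cell above uses insert_many() but does not measure anything. To compare the two approaches, both can be timed with time.perf_counter(). This is a rough sketch, assuming a PyMongo-style collection. The helper time_inserts is hypothetical, and the stand-in collection only lets the snippet run without a server, so its timings say nothing about MongoDB.

```python
import copy
import time


def time_inserts(collection, documents):
    """Return (seconds_one_by_one, seconds_insert_many) for the same documents."""
    # Deep copies: PyMongo adds an _id to documents that lack one, so each
    # run gets its own fresh copy of the input.
    docs_a = copy.deepcopy(documents)
    docs_b = copy.deepcopy(documents)

    collection.drop()  # start from an empty collection
    start = time.perf_counter()
    for doc in docs_a:
        collection.insert_one(doc)
    one_by_one = time.perf_counter() - start

    collection.drop()  # empty it again before the bulk insert
    start = time.perf_counter()
    collection.insert_many(docs_b)
    bulk = time.perf_counter() - start

    return one_by_one, bulk


class _FakeCollection:
    """Stand-in so the sketch runs without a MongoDB server."""

    def __init__(self):
        self.docs = []

    def drop(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)

    def insert_many(self, docs):
        self.docs.extend(docs)


demo = _FakeCollection()
one_by_one, bulk = time_inserts(demo, [{"n": i} for i in range(1000)])
print(len(demo.docs))  # 1000
```

With the real objects from this notebook, the call would be time_inserts(collection, tweets).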

In [6]:
client.close()