# Sample code from lecture 5

The following notebook contains Python examples of using MongoDB. 

## Accessing MongoDB using Python

**(a)** Connecting to the database

In [1]:
import pymongo

connection = pymongo.MongoClient('localhost', 27017)
db = connection['test']

**(b)** Before executing the code below, make sure you have created a collection named profiles in your database. To do this, perform the following steps:
- Download the file named "users.json" from the class website.
- Open a terminal/command prompt and use mongoimport to load the users.json file into a collection named profiles (see the lecture slide on "Bulk Import of JSON file").

    prompt> mongoimport -d test -c profiles --file users.json 
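
If the class file is not at hand, a small stand-in can be generated. The sketch below writes the four documents printed in step (d) as one JSON document per line, which is the layout mongoimport reads by default. The real users.json may be laid out differently (for example as a single JSON array, which needs the `--jsonArray` flag), and the names `SAMPLE_USERS`, `to_json_lines`, and `write_users_file` are illustrative, not part of the lecture material.

```python
import json

# The four documents printed in step (d) of this notebook.
SAMPLE_USERS = [
    {"_id": 1, "Name": "Bob", "City": "Detroit", "Age": 41},
    {"_id": 2, "Name": "Mary", "City": "Chicago", "Age": 45},
    {"_id": 3, "Name": "John", "City": "Lansing", "Age": 24},
    {"_id": 4, "Name": "Jill", "City": "Lansing", "Age": 22},
]


def to_json_lines(docs):
    """Serialize each document as one line of JSON (newline-delimited JSON)."""
    return "\n".join(json.dumps(doc) for doc in docs) + "\n"


def write_users_file(path, docs=SAMPLE_USERS):
    """Write the documents to `path`, one JSON document per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_json_lines(docs))
```

Calling `write_users_file("users.json")` in the working directory should produce a file that the mongoimport command can load.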

**(c)** Counting the number of documents in a collection:

In [3]:
# db.profiles.count() is deprecated (and removed in PyMongo 4.0); use count_documents instead.

db.profiles.count_documents({})

4

**(d)** Displaying the documents in the collection

In [4]:
for record in db.profiles.find():
    print(record)

{'_id': 3, 'Name': 'John', 'City': 'Lansing', 'Age': 24}
{'_id': 2, 'Name': 'Mary', 'City': 'Chicago', 'Age': 45}
{'_id': 1, 'Name': 'Bob', 'City': 'Detroit', 'Age': 41}
{'_id': 4, 'Name': 'Jill', 'City': 'Lansing', 'Age': 22}


**(e)** Display only those who live in Lansing

In [5]:
for record in db.profiles.find({"City": "Lansing"}):
    print(record)

{'_id': 3, 'Name': 'John', 'City': 'Lansing', 'Age': 24}
{'_id': 4, 'Name': 'Jill', 'City': 'Lansing', 'Age': 22}
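
Filters are not limited to exact matches. The sketch below combines a comparison operator (`$gt`) with a projection, so that only the Name field comes back. The helper names are illustrative, and `collection` is assumed to be a PyMongo collection such as `db.profiles`.

```python
def older_than(min_age):
    """Build a filter matching documents whose Age is strictly greater than min_age."""
    return {"Age": {"$gt": min_age}}


def names_only():
    """Build a projection that returns Name and suppresses the default _id field."""
    return {"Name": 1, "_id": 0}


def find_names_older_than(collection, min_age):
    """Return the Name of every matching document as a list."""
    cursor = collection.find(older_than(min_age), names_only())
    return [doc["Name"] for doc in cursor]
```

With the four sample documents shown in step (d), `find_names_older_than(db.profiles, 30)` would be expected to return Mary and Bob, in whatever order the server yields them.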


**(f)** Add records to the collection

In [6]:
new_recs = [{'Name':'Mike','City':'Okemos','Interests':['music','bowling']},
           {'Name':'Lee', 'City': 'Okemos', 'Groups': 'AAA'},
           {'Name':'Dee', 'City':'Okemos'}]
result = db.profiles.insert_many(new_recs)
db.profiles.count_documents({})

7
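
The object returned by `insert_many` carries the generated `_id` values in its `inserted_ids` attribute. PyMongo also adds an `_id` key to each dictionary passed in that lacks one, so inserting the same list object twice fails with a duplicate-key error. A minimal sketch that avoids this by inserting copies (the helper name `insert_copies` is illustrative):

```python
def insert_copies(collection, records):
    """Insert shallow copies of `records` and return the generated _id values.

    Copying keeps the caller's dictionaries free of the `_id` key that
    PyMongo adds to the documents it inserts.
    """
    docs = [dict(rec) for rec in records]
    result = collection.insert_many(docs)
    return result.inserted_ids
```

For example, `insert_copies(db.profiles, new_recs)` could be called repeatedly, each call adding a fresh set of documents.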

**(g)** Delete the records of individuals from Okemos.

In [7]:
db.profiles.delete_many({'City':'Okemos'})

<pymongo.results.DeleteResult at 0x111c53c48>

In [8]:
for record in db.profiles.find():
    print(record)

{'_id': 3, 'Name': 'John', 'City': 'Lansing', 'Age': 24}
{'_id': 2, 'Name': 'Mary', 'City': 'Chicago', 'Age': 45}
{'_id': 1, 'Name': 'Bob', 'City': 'Detroit', 'Age': 41}
{'_id': 4, 'Name': 'Jill', 'City': 'Lansing', 'Age': 22}
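The `DeleteResult` printed in step (g) is not very informative on its own; its `deleted_count` attribute reports how many documents were removed. A small sketch, with `delete_by_city` as an illustrative helper name:

```python
def delete_by_city(collection, city):
    """Delete every document whose City equals `city`; return how many were removed."""
    result = collection.delete_many({"City": city})
    return result.deleted_count
```

Run in place of the `delete_many` cell, `delete_by_city(db.profiles, 'Okemos')` would be expected to return 3, one for each record added in step (f).
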


**(h)** Calculating the average age of individuals in each city

In [9]:
cursor = db.profiles.aggregate([
            {"$group": {"_id": "$City", "avgAge": {"$avg": "$Age"}}}
         ])

for record in cursor:
    print(record)

{'_id': 'Detroit', 'avgAge': 41.0}
{'_id': 'Chicago', 'avgAge': 45.0}
{'_id': 'Lansing', 'avgAge': 23.0}
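To see what the `$group` stage computes, the same averages can be recomputed in plain Python from the four documents listed in step (d). This is only a cross-check of the result, not a description of how MongoDB executes the pipeline; the names `PROFILES` and `average_age_by_city` are illustrative.

```python
from collections import defaultdict

# The four documents remaining in the collection after step (g).
PROFILES = [
    {"_id": 3, "Name": "John", "City": "Lansing", "Age": 24},
    {"_id": 2, "Name": "Mary", "City": "Chicago", "Age": 45},
    {"_id": 1, "Name": "Bob", "City": "Detroit", "Age": 41},
    {"_id": 4, "Name": "Jill", "City": "Lansing", "Age": 22},
]


def average_age_by_city(docs):
    """Group documents by City and average the Age values in each group."""
    ages = defaultdict(list)
    for doc in docs:
        if "Age" in doc:  # documents without an Age are left out of the average
            ages[doc.get("City")].append(doc["Age"])
    return {city: sum(values) / len(values) for city, values in ages.items()}
```

`average_age_by_city(PROFILES)` gives 23.0 for Lansing, 45.0 for Chicago, and 41.0 for Detroit, matching the aggregation output.
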


## Using MongoDB to store tweets

In [9]:
import tweepy

C_KEY = 'XXXXXXX'
C_SECRET = 'XXXXXXX'
A_TOKEN_KEY = 'XXXXXX'
A_TOKEN_SECRET = 'XXXXXXX'

auth = tweepy.OAuthHandler(C_KEY, C_SECRET)
auth.set_access_token(A_TOKEN_KEY, A_TOKEN_SECRET)
api = tweepy.API(auth)

keyword = 'from:CDCgov'
posts = api.search(q=keyword,count=50)
for tweet in posts:
    print("* "+str(tweet.text.encode("utf-8")))

* b'Mothers: #Flu is one type of #infection that can lead to #sepsis. Protect yourself and your child from flu by getti\xe2\x80\xa6 https://t.co/GmbZ2PUlr3'
* b'@astvansh Please contact CDC-INFO at 800-232-4636 or send your questions via email at https://t.co/pUF2JIBq19.'
* b'Get the latest updates on #influenza and this #fluseason by following @CDCFlu. https://t.co/8u5kvApkJD'
* b'@crucial_yt Measles is 1 of the most contagious diseases on Earth. Infected ppl can spread measles 4 days before th\xe2\x80\xa6 https://t.co/GaHm8ODweg'
* b'@zerosafespace Measles spreads through the air when an infected person coughs or sneezes. The virus is not shed in\xe2\x80\xa6 https://t.co/W2nLy7DVsm'
* b'RT @CDCInjury: Help prevent intimate partner violence by learning more about promoting healthy behaviors in relationships. Get more info he\xe2\x80\xa6'
* b'FREE CE: Watch our latest #CDCGrandRounds on cervical cancer. Go #BeyondtheData as CDC\xe2\x80\x99s Dr. Phoebe Thorpe and Dr.\xe2\x80\xa6 https:/
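
The `api.search` call in the cell above matches older Tweepy releases; in Tweepy 4.0 and later the method is named `search_tweets`. A hedged sketch of a helper that works with either name (`search_posts` is an illustrative name, and `api` is assumed to be a `tweepy.API` instance):

```python
def search_posts(api, query, count=50):
    """Run a search with whichever method this Tweepy version provides."""
    # Assumption: newer releases expose `search_tweets`, older ones `search`.
    search = getattr(api, "search_tweets", None) or getattr(api, "search")
    return search(q=query, count=count)
```

In the cell above, `posts = search_posts(api, keyword)` could then replace the direct `api.search` call.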

In [10]:
import pymongo

try:
    connection = pymongo.MongoClient('localhost', 27017)
    db = connection['test']

    # Store the raw JSON payload of each tweet as one document.
    for tweet in posts:
        db.twitter.insert_one(tweet._json)

    connection.close()
except pymongo.errors.ConnectionFailure as e:
    print("Could not connect to MongoDB: %s" % e)
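
Re-running the cell above inserts every tweet again, so the collection accumulates duplicates. One possible way around this, sketched under the assumption that each tweet document carries Twitter's own `id` field (it appears in the key listing at the end of this notebook), is to upsert on that field with `replace_one`. The helper name `upsert_tweets` is illustrative.

```python
def upsert_tweets(collection, tweet_docs):
    """Store each tweet once, keyed on Twitter's own `id` field."""
    stored = 0
    for tweet in tweet_docs:
        # Copy without any MongoDB `_id` so the replacement never tries to change it.
        doc = {key: value for key, value in tweet.items() if key != "_id"}
        # upsert=True inserts the document when no existing one matches the filter.
        collection.replace_one({"id": doc["id"]}, doc, upsert=True)
        stored += 1
    return stored
```

A call such as `upsert_tweets(db.twitter, (t._json for t in posts))` can then be repeated without growing the collection.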

In [11]:
import pymongo

try:
    connection = pymongo.MongoClient('localhost', 27017)
    db = connection['test']

    tweets = db.twitter
    print("Found", tweets.count_documents({}), "tweets:")

    for tweet in tweets.find():
        print(tweet['text'])

    # The connection is left open because the cells below reuse `tweets`.
except pymongo.errors.ConnectionFailure as e:
    print("Could not connect to MongoDB: %s" % e)

Found 50 tweets:
Mothers: #Flu is one type of #infection that can lead to #sepsis. Protect yourself and your child from flu by getti… https://t.co/GmbZ2PUlr3
@astvansh Please contact CDC-INFO at 800-232-4636 or send your questions via email at https://t.co/pUF2JIBq19.
Get the latest updates on #influenza and this #fluseason by following @CDCFlu. https://t.co/8u5kvApkJD
@crucial_yt Measles is 1 of the most contagious diseases on Earth. Infected ppl can spread measles 4 days before th… https://t.co/GaHm8ODweg
@zerosafespace Measles spreads through the air when an infected person coughs or sneezes. The virus is not shed in… https://t.co/W2nLy7DVsm
RT @CDCInjury: Help prevent intimate partner violence by learning more about promoting healthy behaviors in relationships. Get more info he…
FREE CE: Watch our latest #CDCGrandRounds on cervical cancer. Go #BeyondtheData as CDC’s Dr. Phoebe Thorpe and Dr.… https://t.co/dmg5fp4JSq
Here’s what to do if you or a loved one gets sick wi

In [12]:
for tweet in tweets.find({"text": {"$regex": ".*flu.*"}}):
    print(tweet['text'])

Mothers: #Flu is one type of #infection that can lead to #sepsis. Protect yourself and your child from flu by getti… https://t.co/GmbZ2PUlr3
Get the latest updates on #influenza and this #fluseason by following @CDCFlu. https://t.co/8u5kvApkJD
Here’s what to do if you or a loved one gets sick with #flu. https://t.co/e2i1idxp6j https://t.co/4bhuGpJSsy
3⃣If you do get #flu symptoms, and are at high risk of complications or are experiencing severe symptoms, talk with… https://t.co/FP1kLT8GC7
1⃣For everyone 6 months and older, the first and most important step to #fightflu is to get a #fluvaccine.… https://t.co/sCrtHiXKjr
.@CDCFlu recommends 3 steps to #fightflu. https://t.co/qy68oF6KSP https://t.co/H1gJwcgjZL
#Clinicians: Help #fightflu this season by spreading the word about #flu vaccination.

Check out @CDCFlu’s digital… https://t.co/iTrvCtB0ce
#Flu severity indicators (illnesses, flu hospitalization rates, and % of deaths resulting from pneumonia or flu) ar… https://t
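
The pattern `.*flu.*` in the query above is case-sensitive, so a tweet containing only "#Flu" would be missed. Because `$regex` matches anywhere in the string unless anchored, the surrounding `.*` is also unnecessary. The sketch below builds a case-insensitive filter using the `$options` field. The helper name `text_contains` is illustrative, and it assumes that Python's `re.escape` output is acceptable to MongoDB's regex engine for the simple words used here.

```python
import re


def text_contains(word, ignore_case=True):
    """Build a filter matching documents whose `text` field contains `word`."""
    condition = {"$regex": re.escape(word)}
    if ignore_case:
        condition["$options"] = "i"  # "i" requests case-insensitive matching
    return {"text": condition}
```

The result can be passed straight to `find`, for example `tweets.find(text_contains("flu"))`.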

In [13]:
tweet = tweets.find_one()
for key in tweet:
    print("key: %s" % key)

key: _id
key: created_at
key: id
key: id_str
key: text
key: truncated
key: entities
key: metadata
key: source
key: in_reply_to_status_id
key: in_reply_to_status_id_str
key: in_reply_to_user_id
key: in_reply_to_user_id_str
key: in_reply_to_screen_name
key: user
key: geo
key: coordinates
key: place
key: contributors
key: is_quote_status
key: retweet_count
key: favorite_count
key: favorited
key: retweeted
key: possibly_sensitive
key: lang
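
Several of these keys, such as `user` and `entities`, hold nested documents. The sketch below lists nested keys as dotted paths, the same notation MongoDB accepts for querying fields inside embedded documents. The helper name `dotted_keys` is illustrative, and it does not descend into lists.

```python
def dotted_keys(doc, prefix=""):
    """Yield every key path in a nested dict, joined with dots."""
    for key, value in doc.items():
        path = prefix + key
        yield path
        if isinstance(value, dict):
            # Recurse into embedded documents, extending the dotted prefix.
            yield from dotted_keys(value, prefix=path + ".")
```

For example, `for path in dotted_keys(tweets.find_one()): print(path)` would print the top-level keys along with paths under `user` and the other embedded documents; the exact subfield names depend on the payload Twitter returned.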
