#**MongoDB With Python**

MongoDB stores data in JSON-like documents, which makes the database very flexible and scalable.
To be able to experiment with the code examples in this Notebook, you will need access to a MongoDB database.
You can download a free MongoDB database at https://www.mongodb.com.
Or get started right away with a MongoDB cloud service at https://www.mongodb.com/cloud/atlas.

##**PyMongo**

In [1]:
!pip install pymongo



In [2]:
!pip install pymongo[srv]

Collecting dnspython<3.0.0,>=1.16.0
  Downloading dnspython-2.2.0-py3-none-any.whl (266 kB)
Installing collected packages: dnspython
Successfully installed dnspython-2.2.0


##**1. Create Database**

To create a database in MongoDB, start by creating a `MongoClient` object with a connection URL that contains the correct IP address and port, then select the database you want to create by name.
MongoDB will create the database if it does not exist, and make a connection to it.

In [311]:
# Create a database called "mydatabase":
import pymongo

myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")

mydb = myclient["mydatabase"]
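
The connection URL in the cell above has no credentials. When a server does require a username and password, special characters in them need to be percent-escaped before they go into the URL. A minimal sketch of that step, using only the standard library; the helper name `build_mongo_uri` and the sample credentials are made up for illustration:

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port=27017, username=None, password=None):
    """Return a mongodb:// connection string for a single host."""
    if username is None:
        return f"mongodb://{host}:{port}/"
    # Percent-escape credentials so characters such as "@" or ":" are safe.
    user = quote_plus(username)
    pwd = quote_plus(password or "")
    return f"mongodb://{user}:{pwd}@{host}:{port}/"


local_uri = build_mongo_uri("127.0.0.1")
auth_uri = build_mongo_uri("127.0.0.1", username="app_user", password="p@ss:word")
print(local_uri)
print(auth_uri)
```

The resulting string can be passed to `pymongo.MongoClient(...)` in the same way as the literal URL used in this notebook.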

```
Note:
In MongoDB, a database is not created until it gets content!
MongoDB waits until you have created a collection (table) with at least one document (record) before it actually creates the database (and collection).
```



##**2. Check if Database Exists**

Remember:
In MongoDB, a database is not created until it gets content, so if this is your first time creating a database, you should complete the next two sections (create collection and insert document) before you check if the database exists!
You can check if a database exists by listing all databases in your system:

In [312]:
# Return a list of your system's databases:
print(myclient.list_database_names())

['admin', 'config', 'local', 'test']


In [3]:
# Check if "mydatabase" exists:

dblist = myclient.list_database_names()
if "mydatabase" in dblist:
    print("The database exists.")

The database exists.


##**3. Create Collection**

* To create a collection in MongoDB, use the database object and specify the name of the collection you want to create.

* MongoDB will create the collection if it does not exist.

In [315]:
# Create a collection called "customers":
import pymongo

myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")
mydb = myclient["mydatabase"]

mycol = mydb["customers"]

##**4. Insert Document**

* A document in MongoDB is the same as a record in SQL databases.

* To insert a record, or document as it is called in MongoDB, into a collection, we use the insert_one() method.

* The first parameter of the insert_one() method is a dictionary containing the name(s) and value(s) of each field in the document you want to insert.

In [317]:
# Insert a record in the "customers" collection:

mydict = { "name": "John", "address": "Highway 37" }

x = mycol.insert_one(mydict)


##**5. Return the _id Field**

* The insert_one() method returns an InsertOneResult object, which has a property, inserted_id, that holds the id of the inserted document.

* In the example above no _id field was specified, so MongoDB assigned a unique _id for the record (document).

* Insert another record in the "customers" collection, and return the value of the _id field:

In [318]:
mydict = { "name": "Peter", "address": "Lowstreet 27" }

x = mycol.insert_one(mydict)

print(x.inserted_id)

620123974039d142fbd12c33
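
An automatically assigned `_id` like the one printed above is an ObjectId, and its first 4 bytes encode the creation time as seconds since the Unix epoch. The following sketch decodes that timestamp from the hexadecimal string using only the standard library; the function name `object_id_created_at` is an illustrative choice, not part of PyMongo:

```python
from datetime import datetime, timezone


def object_id_created_at(object_id_hex):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    seconds = int(object_id_hex[:8], 16)  # first 8 hex characters = 4 bytes
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = object_id_created_at("620123974039d142fbd12c33")
print(created.isoformat())
```

With PyMongo installed, the `generation_time` attribute of an `ObjectId` exposes the same information without manual decoding.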


##**6. Insert Multiple Documents**
* To insert multiple documents into a collection in MongoDB, we use the insert_many() method.

* If you do not want MongoDB to assign unique ids for your documents, you can specify the _id field when you insert the document(s).

* Remember that the values have to be unique. Two documents cannot have the same _id.

* The first parameter of the insert_many() method is a list containing dictionaries with the data you want to insert:

In [319]:
from pprint import pprint


mylist = [
  { "name": "Amy", "address": "Apple st 652"},
  { "name": "Hannah", "address": "Mountain 21"},
  { "name": "Michael", "address": "Valley 345"},
  { "name": "Sandy", "address": "Ocean blvd 2"},
  { "name": "Betty", "address": "Green Grass 1"},
  { "name": "Richard", "address": "Sky st 331"},
  { "name": "Susan", "address": "One way 98"},
  { "name": "Vicky", "address": "Yellow Garden 2"},
  { "name": "Ben", "address": "Park Lane 38"},
  { "name": "William", "address": "Central st 954"},
  { "name": "Chuck", "address": "Main Road 989"},
  { "name": "Viola", "address": "Sideway 1633"}
]

x = mycol.insert_many(mylist)

#print list of the _id values of the inserted documents:
pprint(x.inserted_ids)

[ObjectId('620123aa4039d142fbd12c34'),
 ObjectId('620123aa4039d142fbd12c35'),
 ObjectId('620123aa4039d142fbd12c36'),
 ObjectId('620123aa4039d142fbd12c37'),
 ObjectId('620123aa4039d142fbd12c38'),
 ObjectId('620123aa4039d142fbd12c39'),
 ObjectId('620123aa4039d142fbd12c3a'),
 ObjectId('620123aa4039d142fbd12c3b'),
 ObjectId('620123aa4039d142fbd12c3c'),
 ObjectId('620123aa4039d142fbd12c3d'),
 ObjectId('620123aa4039d142fbd12c3e'),
 ObjectId('620123aa4039d142fbd12c3f')]


##**7. Insert Multiple Documents, with Specified IDs**

* If you do not want MongoDB to assign unique ids for your documents, you can specify the _id field when you insert the document(s).

* Remember that the values have to be unique. Two documents cannot have the same _id.

In [321]:
import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customersa"]

mylist = [
  { "_id": 1, "name": "John", "address": "Highway 37"},
  { "_id": 2, "name": "Peter", "address": "Lowstreet 27"},
  { "_id": 3, "name": "Amy", "address": "Apple st 652"},
  { "_id": 4, "name": "Hannah", "address": "Mountain 21"},
  { "_id": 5, "name": "Michael", "address": "Valley 345"},
  { "_id": 6, "name": "Sandy", "address": "Ocean blvd 2"},
  { "_id": 7, "name": "Betty", "address": "Green Grass 1"},
  { "_id": 8, "name": "Richard", "address": "Sky st 331"},
  { "_id": 9, "name": "Susan", "address": "One way 98"},
  { "_id": 10, "name": "Vicky", "address": "Yellow Garden 2"},
  { "_id": 11, "name": "Ben", "address": "Park Lane 38"},
  { "_id": 12, "name": "William", "address": "Central st 954"},
  { "_id": 13, "name": "Chuck", "address": "Main Road 989"},
  { "_id": 14, "name": "Viola", "address": "Sideway 1633"}
]

x = mycol.insert_many(mylist)

#print list of the _id values of the inserted documents:
print(x.inserted_ids)

BulkWriteError: batch op errors occurred, full error: {'writeErrors': [{'index': 0, 'code': 11000, 'keyPattern': {'_id': 1}, 'keyValue': {'_id': 1}, 'errmsg': 'E11000 duplicate key error collection: mydatabase.customersa index: _id_ dup key: { _id: 1 }', 'op': {'_id': 1, 'name': 'John', 'address': 'Highway 37'}}], 'writeConcernErrors': [], 'nInserted': 0, 'nUpserted': 0, 'nMatched': 0, 'nModified': 0, 'nRemoved': 0, 'upserted': []}
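
The `E11000 duplicate key error` above is what MongoDB reports when an `_id` being inserted already exists in the collection, which is likely what happens when this cell is run a second time. Duplicates inside the list itself can be caught locally before any insert is attempted. A minimal sketch, with a hypothetical helper `find_duplicate_ids` and a small made-up sample:

```python
from collections import Counter


def find_duplicate_ids(documents):
    """Return the _id values that occur more than once in a list of documents."""
    counts = Counter(doc["_id"] for doc in documents if "_id" in doc)
    return sorted(value for value, n in counts.items() if n > 1)


sample = [
    {"_id": 1, "name": "John"},
    {"_id": 2, "name": "Peter"},
    {"_id": 1, "name": "Amy"},  # clashes with the first document
]
duplicates = find_duplicate_ids(sample)
print(duplicates)
```

This check only looks at the list passed in. It cannot see ids that are already stored in the collection, so the server-side error can still occur.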

##**8. Find**

* In PyMongo we use the `find()` and `find_one()` methods to find data in a collection.

* They play the same role as the SELECT statement, which is used to find data in a table in an SQL database.

#####**Find One**
* To select data from a collection in MongoDB, we can use the find_one() method.

* The find_one() method returns the first occurrence in the selection.

In [196]:
# Find the first document in the customers collection:

x = mycol.find_one()

print(x)

{'_id': ObjectId('620104994039d142fbd0da63'), 'name': 'John', 'address': 'Highway 37'}


#####**Find All**

* To select data from a collection in MongoDB, we can also use the find() method.

* The find() method returns all occurrences in the selection.

* The first parameter of the find() method is a query object. In this example we use an empty query object, which selects all documents in the collection.

* No parameters in the find() method gives you the same result as SELECT * in SQL.

In [322]:
# Return all documents in the "customers" collection, and print each document:

for x in mycol.find():
    print(x)

{'_id': 1, 'name': 'John', 'address': 'Highway 37'}
{'_id': 2, 'name': 'Peter', 'address': 'Lowstreet 27'}
{'_id': 3, 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': 4, 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': 5, 'name': 'Michael', 'address': 'Valley 345'}
{'_id': 6, 'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'_id': 7, 'name': 'Betty', 'address': 'Green Grass 1'}
{'_id': 8, 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': 9, 'name': 'Susan', 'address': 'One way 98'}
{'_id': 10, 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': 11, 'name': 'Ben', 'address': 'Park Lane 38'}
{'_id': 12, 'name': 'William', 'address': 'Central st 954'}
{'_id': 13, 'name': 'Chuck', 'address': 'Main Road 989'}
{'_id': 14, 'name': 'Viola', 'address': 'Sideway 1633'}


#####**Return Only Some Fields**
* The second parameter of the find() method is an object describing which fields to include in the result.

* This parameter is optional, and if omitted, all fields will be included in the result.

In [323]:
# Return only the names and addresses, not the _ids:

for x in mycol.find({},{ "_id": 0, "name": 1, "address": 1 }):
    print(x)

{'name': 'John', 'address': 'Highway 37'}
{'name': 'Peter', 'address': 'Lowstreet 27'}
{'name': 'Amy', 'address': 'Apple st 652'}
{'name': 'Hannah', 'address': 'Mountain 21'}
{'name': 'Michael', 'address': 'Valley 345'}
{'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'name': 'Betty', 'address': 'Green Grass 1'}
{'name': 'Richard', 'address': 'Sky st 331'}
{'name': 'Susan', 'address': 'One way 98'}
{'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'name': 'Ben', 'address': 'Park Lane 38'}
{'name': 'William', 'address': 'Central st 954'}
{'name': 'Chuck', 'address': 'Main Road 989'}
{'name': 'Viola', 'address': 'Sideway 1633'}


You are not allowed to specify both 0 and 1 values in the same object (except if one of the fields is the _id field). If you specify a field with the value 0, all other fields get the value 1, and vice versa:

In [324]:
# This example will exclude "address" from the result:

for x in mycol.find({},{ "address": 0 }):
    print(x)

{'_id': 1, 'name': 'John'}
{'_id': 2, 'name': 'Peter'}
{'_id': 3, 'name': 'Amy'}
{'_id': 4, 'name': 'Hannah'}
{'_id': 5, 'name': 'Michael'}
{'_id': 6, 'name': 'Sandy'}
{'_id': 7, 'name': 'Betty'}
{'_id': 8, 'name': 'Richard'}
{'_id': 9, 'name': 'Susan'}
{'_id': 10, 'name': 'Vicky'}
{'_id': 11, 'name': 'Ben'}
{'_id': 12, 'name': 'William'}
{'_id': 13, 'name': 'Chuck'}
{'_id': 14, 'name': 'Viola'}
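
The inclusion and exclusion rules described above can be illustrated without a database. The sketch below is a simplified emulation for flat documents only, written to make the rule concrete; it is not how MongoDB implements projections, and `apply_projection` is a hypothetical name:

```python
def apply_projection(document, projection):
    """Emulate simple include/exclude projections on a flat document."""
    # _id is handled separately: it is returned unless explicitly set to 0.
    flags = {k: bool(v) for k, v in projection.items() if k != "_id"}
    keep_id = bool(projection.get("_id", 1))
    if len(set(flags.values())) > 1:
        raise ValueError("cannot mix inclusion and exclusion (except for _id)")
    if flags and all(flags.values()):
        # Inclusion mode: only the listed fields survive.
        result = {k: v for k, v in document.items() if k in flags}
    else:
        # Exclusion mode (or empty projection): drop the listed fields.
        result = {k: v for k, v in document.items()
                  if k not in flags and k != "_id"}
    if keep_id and "_id" in document:
        result = {"_id": document["_id"], **result}
    return result


doc = {"_id": 1, "name": "John", "address": "Highway 37"}
print(apply_projection(doc, {"_id": 0, "name": 1, "address": 1}))
print(apply_projection(doc, {"address": 0}))
```

Passing `{"name": 1, "address": 0}` to this helper raises a `ValueError`, mirroring the restriction that 0 and 1 cannot be mixed for fields other than `_id`.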


##**9. Query**

#####**9.1. Filter the Result**

* When finding documents in a collection, you can filter the result by using a query object.

* The first argument of the `find()` method is a query object, and is used to limit the search.

In [325]:
# Find document(s) with the address "Park Lane 38":

myquery = { "address": "Park Lane 38" }

mydoc = mycol.find(myquery)

for x in mydoc:
    print(x)

{'_id': 11, 'name': 'Ben', 'address': 'Park Lane 38'}


#####**9.2. Advanced Query**

* To make advanced queries you can use modifiers as values in the query object.

* E.g. to find the documents where the "address" field starts with the letter `"S"` or higher (alphabetically), use the greater than modifier: `{"$gt": "S"}`:

In [326]:
# Find documents where the address starts with the letter "S" or higher:

myquery = { "address": { "$gt": "S" } }

mydoc = mycol.find(myquery)

for x in mydoc:
    print(x)

{'_id': 5, 'name': 'Michael', 'address': 'Valley 345'}
{'_id': 8, 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': 10, 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': 14, 'name': 'Viola', 'address': 'Sideway 1633'}
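
The result above can be cross-checked locally. Assuming the default collation, `$gt` on strings behaves like a plain character-by-character comparison, which for these ASCII addresses matches Python's own string ordering. The list below repeats the addresses inserted earlier:

```python
addresses = [
    "Highway 37", "Lowstreet 27", "Apple st 652", "Mountain 21",
    "Valley 345", "Ocean blvd 2", "Green Grass 1", "Sky st 331",
    "One way 98", "Yellow Garden 2", "Park Lane 38", "Central st 954",
    "Main Road 989", "Sideway 1633",
]

# Same comparison the {"$gt": "S"} filter expresses, done locally.
greater_than_s = [a for a in addresses if a > "S"]
print(greater_than_s)
```

Note that "Sky st 331" and "Sideway 1633" qualify because a longer string with the prefix "S" compares greater than "S" itself. Lowercase letters also sort after uppercase ones, so an address such as "apple st" would match this filter too.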


##**10. Sort**

**Sort the Result**

* Use the `sort()` method to sort the result in ascending or descending order.

* The `sort()` method takes one parameter for "fieldname" and one parameter for "direction" (ascending is the default direction).

In [327]:
# Sort the result alphabetically by name:

mydoc = mycol.find().sort("name")

for x in mydoc:
    print(x)

{'_id': 3, 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': 11, 'name': 'Ben', 'address': 'Park Lane 38'}
{'_id': 7, 'name': 'Betty', 'address': 'Green Grass 1'}
{'_id': 13, 'name': 'Chuck', 'address': 'Main Road 989'}
{'_id': 4, 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': 1, 'name': 'John', 'address': 'Highway 37'}
{'_id': 5, 'name': 'Michael', 'address': 'Valley 345'}
{'_id': 2, 'name': 'Peter', 'address': 'Lowstreet 27'}
{'_id': 8, 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': 6, 'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'_id': 9, 'name': 'Susan', 'address': 'One way 98'}
{'_id': 10, 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': 14, 'name': 'Viola', 'address': 'Sideway 1633'}
{'_id': 12, 'name': 'William', 'address': 'Central st 954'}


**Sort Descending**

* Use the value -1 as the second parameter to sort descending.

* `sort("name", 1)` sorts ascending.

* `sort("name", -1)` sorts descending.

In [None]:
# Sort the result reverse alphabetically by name:

mydoc = mycol.find().sort("name", -1)

for x in mydoc:
    print(x)
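
The descending cell above has no recorded output. As a rough local stand-in, Python's `sorted()` with `reverse=True` gives the ordering that `sort("name", -1)` is expected to produce for these names, assuming the default collation:

```python
names = ["John", "Peter", "Amy", "Hannah", "Michael", "Sandy", "Betty",
         "Richard", "Susan", "Vicky", "Ben", "William", "Chuck", "Viola"]

ascending = sorted(names)                 # corresponds to sort("name", 1)
descending = sorted(names, reverse=True)  # corresponds to sort("name", -1)
print(descending[:3])
```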

##**11. Delete Document**

* To delete one document, we use the `delete_one()` method.

* The first parameter of the `delete_one()` method is a query object defining which document to delete.

`Note:` If the query finds more than one document, only the first occurrence is deleted.

In [220]:
# Delete the document with the address "Mountain 21":

myquery = { "address": "Mountain 21" }

mycol.delete_one(myquery)

* To delete more than one document, use the delete_many() method.

* The first parameter of the delete_many() method is a query object defining which documents to delete.
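
No `delete_many()` cell is shown above, so here is a small sketch of how it could be wrapped. The helper name `delete_by_address_prefix` is hypothetical, and `collection` is assumed to be a PyMongo collection such as `mycol`:

```python
import re


def delete_by_address_prefix(collection, prefix):
    """Delete all documents whose "address" starts with `prefix`.

    Returns the number of documents removed.
    """
    # re.escape keeps characters like "." in the prefix from acting as regex syntax.
    query = {"address": {"$regex": "^" + re.escape(prefix)}}
    result = collection.delete_many(query)
    return result.deleted_count
```

Calling `delete_by_address_prefix(mycol, "S")` would remove every customer whose address starts with "S". Be careful with an empty query: `delete_many({})` removes all documents in the collection.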

## **12. Drop Collection**

**Delete Collection:**
You can delete a table, or collection as it is called in MongoDB, by using the `drop()` method.

* In PyMongo, the `drop()` method returns `None`. It does not report success or failure, and dropping a collection that does not exist raises no error.


In [221]:
# Delete the "customers" collection:

mycol.drop()

##**13. Update**

**Update Collection:**
You can update a record, or document as it is called in MongoDB, by using the `update_one()` method.

* The first parameter of the `update_one()` method is a query object defining which document to update.

`Note`: If the query finds more than one record, only the first occurrence is updated.

* The second parameter is an object defining the new values of the document.

In [226]:
# Change the address from "Valley 345" to "Canyon 123":

myquery = { "address": "Valley 345" }
newvalues = { "$set": { "address": "Canyon 123" } }

mycol.update_one(myquery, newvalues)

#print "customers" after the update:
for x in mycol.find():
    print(x)

{'_id': ObjectId('620108e04039d142fbd0dab8'), 'name': 'John', 'address': 'Highway 37'}
{'_id': ObjectId('620108ee4039d142fbd0dab9'), 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': ObjectId('620108ee4039d142fbd0daba'), 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': ObjectId('620108ee4039d142fbd0dabb'), 'name': 'Michael', 'address': 'Canyon 123'}
{'_id': ObjectId('620108ee4039d142fbd0dabc'), 'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'_id': ObjectId('620108ee4039d142fbd0dabd'), 'name': 'Betty', 'address': 'Green Grass 1'}
{'_id': ObjectId('620108ee4039d142fbd0dabe'), 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': ObjectId('620108ee4039d142fbd0dabf'), 'name': 'Susan', 'address': 'One way 98'}
{'_id': ObjectId('620108ee4039d142fbd0dac0'), 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': ObjectId('620108ee4039d142fbd0dac1'), 'name': 'Ben', 'address': 'Park Lane 38'}
{'_id': ObjectId('620108ee4039d142fbd0dac2'), 'name': 'William', 'address': 'Central st 954'}
{'_id': Obj

**Update Many:**
To update all documents that meet the criteria of the query, use the `update_many()` method.

In [None]:
# Update all documents where the address starts with the letter "S":

myquery = { "address": { "$regex": "^S" } }
newvalues = { "$set": { "name": "Minnie" } }

x = mycol.update_many(myquery, newvalues)

print(x.modified_count, "documents updated.")
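
The result object returned by `update_many()` carries two counters that are easy to confuse. The sketch below wraps the update from the cell above and returns both; the helper name `set_name_for_prefix` is hypothetical and `collection` is assumed to be a PyMongo collection:

```python
import re


def set_name_for_prefix(collection, prefix, new_name):
    """Set "name" on every document whose address starts with `prefix`."""
    query = {"address": {"$regex": "^" + re.escape(prefix)}}
    result = collection.update_many(query, {"$set": {"name": new_name}})
    # matched_count: documents selected by the query;
    # modified_count: documents whose content actually changed.
    return result.matched_count, result.modified_count
```

Running the same update twice would typically report the same `matched_count` both times but a `modified_count` of 0 on the second run, because the documents already hold the new value.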

##**14. Limit**

**Limit the Result:**
To limit the result in MongoDB, we use the `limit()` method.

* The `limit()` method takes one parameter, a number defining how many documents to return.

Assume you have a "customers" collection:

In [227]:
# Limit the result to only return 5 documents:

myresult = mycol.find().limit(5)

#print the result:
for x in myresult:
    print(x)

{'_id': ObjectId('620108e04039d142fbd0dab8'), 'name': 'John', 'address': 'Highway 37'}
{'_id': ObjectId('620108ee4039d142fbd0dab9'), 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': ObjectId('620108ee4039d142fbd0daba'), 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': ObjectId('620108ee4039d142fbd0dabb'), 'name': 'Michael', 'address': 'Canyon 123'}
{'_id': ObjectId('620108ee4039d142fbd0dabc'), 'name': 'Sandy', 'address': 'Ocean blvd 2'}


## **Collect Data with API and store it in MongoDB**




**1- Game Data**
 https://www.gamespot.com/api/? 
 

In [328]:
from pymongo import MongoClient
import requests
import pandas as pd
import pymongo 

In [329]:
# client = MongoClient('127.0.0.1', 27017)
# db_name = 'gamespot_reviews'

# # connect to the database
# db = client['Kaggle']


client = pymongo.MongoClient('mongodb://127.0.0.1:27017/')

# connect to the database
db = client['Games']

reviews = db.gameReviews


In [330]:
games_base = "http://www.gamespot.com/api/reviews/?api_key=[YOUR KEY]&format=json"
# HTTP header names use hyphens; "user_agent" would be sent as an unrelated custom header.
headers = {
    "User-Agent": "nasrineshraghi API Access"
}

In [331]:
review_fields = "id,title,score,deck,body,good,bad"

In [332]:
pages = list(range(0, 20))
# Take every 10th value: these are the result offsets sent to the API, here [0, 10].
pages_list = pages[0:20:10]

In [333]:
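def get_games(url_base, num_pages, fields, collection):
    # num_pages is an iterable of result offsets; one request is made per offset.
    # Note: this function relies on the module-level `headers` dict defined above.
    field_list = "&field_list=" + fields + "&sort=score:desc" + "&offset="

    for page in num_pages:
        url = url_base + field_list + str(page)
        response = requests.get(url, headers=headers).json()
        video_games = response['results']
        # Store each review as its own document.
        for i in video_games:
            collection.insert_one(i)
            print("Data Inserted")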

In [334]:
# get_games() prints its own progress and returns nothing, so it is simply called.
get_games(games_base, pages_list, review_fields, reviews)

Data Inserted
Data Inserted
Data Inserted
...
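
The request URL in `get_games()` is assembled by string concatenation. An alternative is to let the standard library encode the query parameters. The following is a sketch only: the helper name `build_reviews_url` and the placeholder key are assumptions, while the parameter names are the ones used in the cells above:

```python
from urllib.parse import urlencode


def build_reviews_url(api_key, fields, offset,
                      base="http://www.gamespot.com/api/reviews/"):
    """Build one paginated request URL for the reviews endpoint."""
    params = {
        "api_key": api_key,
        "format": "json",
        "field_list": fields,
        "sort": "score:desc",
        "offset": offset,
    }
    # safe=",:" keeps the comma-separated field list and the sort key readable.
    return base + "?" + urlencode(params, safe=",:")


url = build_reviews_url("YOUR_KEY", "id,title,score", 100)
print(url)
```

With `requests`, the same dictionary could instead be passed through the `params` argument of `requests.get()`, which performs the encoding itself.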

In [335]:
from pprint import pprint

# Project only the "score" field (1 = include, 0 = exclude).
scores = list(reviews.find({}, {"_id": 0, "score": 1}))

pprint(scores[:5])

[{'score': '10.0'},
 {'score': '10.0'},
 {'score': '10.0'},
 {'score': '10.0'},
 {'score': '10.0'}]
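
The output above shows that the scores are stored as strings. For numeric work they would need converting first, since strings compare character by character. A small sketch; the values other than '10.0' are made up for illustration:

```python
sample_scores = [{"score": "10.0"}, {"score": "9.5"}, {"score": "8.0"}]

# As strings, "10.0" sorts before "9.5" because "1" comes before "9".
string_order_is_misleading = "10.0" < "9.5"

# Converting to float restores the expected numeric ordering.
numeric = [float(doc["score"]) for doc in sample_scores]
highest = max(numeric)
print(string_order_is_misleading, highest)
```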


**2- Nobel Prize Data: https://nobelprize.readme.io/docs**

In [336]:
import requests
import pandas as pd
from pymongo import MongoClient

In [338]:
client = MongoClient('127.0.0.1', 27017)

# connect to the database
db = client['nobel']

In [339]:
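# The endpoint name is singular ("laureate"), while the JSON key and the
# collection name used below are plural ("laureates").
for collection_name in ["laureates"]:
    response = requests.get(
        "http://api.nobelprize.org/v1/{}.json".format(collection_name[:-1]))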

In [340]:
documents = response.json()[collection_name]

In [341]:
db[collection_name].insert_many(documents)

<pymongo.results.InsertManyResult at 0x28815b9bd08>

In [342]:
# Reconnect and get a handle to the "laureates" collection in the "nobel" database:
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["nobel"]
nobel_data = mydb["laureates"]


In [343]:
# Return only the first names and birth dates, not the _ids:
for x in nobel_data.find({},{ "_id": 0, "firstname": 1, "born": 1 }):
  print(x)

{'firstname': 'Wilhelm Conrad', 'born': '1845-03-27'}
{'firstname': 'Hendrik A.', 'born': '1853-07-18'}
{'firstname': 'Pieter', 'born': '1865-05-25'}
{'firstname': 'Henri', 'born': '1852-12-15'}
{'firstname': 'Pierre', 'born': '1859-05-15'}
{'firstname': 'Marie', 'born': '1867-11-07'}
{'firstname': 'Lord', 'born': '1842-11-12'}
{'firstname': 'Philipp', 'born': '1862-06-07'}
{'firstname': 'J.J.', 'born': '1856-12-18'}
{'firstname': 'Albert A.', 'born': '1852-12-19'}
{'firstname': 'Gabriel', 'born': '1845-08-16'}
{'firstname': 'Guglielmo', 'born': '1874-04-25'}
{'firstname': 'Ferdinand', 'born': '1850-06-06'}
{'firstname': 'Johannes Diderik', 'born': '1837-11-23'}
{'firstname': 'Wilhelm', 'born': '1864-01-13'}
{'firstname': 'Gustaf', 'born': '1869-11-30'}
{'firstname': 'Heike', 'born': '1853-09-21'}
{'firstname': 'Max', 'born': '1879-10-09'}
{'firstname': 'William', 'born': '1862-07-02'}
{'firstname': 'Lawrence', 'born': '1890-03-31'}
{'firstname': 'Charles Glover', 'born': '1877-06-07'}

In [158]:
# This example will exclude "bornCity" from the result:
for x in nobel_data.find({},{ "bornCity": 0 }):
  pprint(x)

{'_id': ObjectId('61fd96d54039d142fbd0d691'),
 'born': '1845-03-27',
 'bornCountry': 'Prussia (now Germany)',
 'bornCountryCode': 'DE',
 'died': '1923-02-10',
 'diedCity': 'Munich',
 'diedCountry': 'Germany',
 'diedCountryCode': 'DE',
 'firstname': 'Wilhelm Conrad',
 'gender': 'male',
 'id': '1',
 'prizes': [{'affiliations': [{'city': 'Munich',
                               'country': 'Germany',
                               'name': 'Munich University'}],
             'category': 'physics',
             'motivation': '"in recognition of the extraordinary services he '
                           'has rendered by the discovery of the remarkable '
                           'rays subsequently named after him"',
             'share': '1',
             'year': '1901'}],
 'surname': 'Röntgen'}
{'_id': ObjectId('61fd96d54039d142fbd0d692'),
 'born': '1853-07-18',
 'bornCountry': 'the Netherlands',
 'bornCountryCode': 'NL',
 'died': '1928-02-04',
 'diedCountry': 'the Netherlands',
 'diedCount
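
Each laureate document nests its prizes in a list, as the output above shows. For tabular analysis it can help to flatten that list into one row per prize. A minimal sketch, assuming the field names visible in the output; `prize_rows` is a hypothetical helper and the sample document is a trimmed copy of the first record:

```python
def prize_rows(laureate):
    """Yield one flat dict per prize held by a laureate document."""
    for prize in laureate.get("prizes", []):
        yield {
            "firstname": laureate.get("firstname"),
            "surname": laureate.get("surname"),
            "year": prize.get("year"),
            "category": prize.get("category"),
        }


rontgen = {
    "firstname": "Wilhelm Conrad",
    "surname": "Röntgen",
    "prizes": [{"category": "physics", "share": "1", "year": "1901"}],
}
rows = list(prize_rows(rontgen))
print(rows)
```

A list of such rows could then be handed to `pd.DataFrame(...)`, since pandas is already imported in this notebook.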

**3- Collect Twitter Data and Store It in MongoDB**

In [276]:
# !pip install dnspython

In [277]:
# !pip install tweepy

In [278]:
# !pip install twitter

In [344]:
import pymongo
from pymongo import MongoClient
import json
import tweepy
import twitter
from pprint import pprint
import configparser
import pandas as pd

In [345]:
CONSUMER_KEY      = ""
CONSUMER_SECRET   = ""
OAUTH_TOKEN       = ""
OATH_TOKEN_SECRET = ""

mongod_connect = 'mongodb://127.0.0.1:27017/'

In [346]:
client = MongoClient(mongod_connect)
db = client.Twitter # use or create a database named Twitter
tweet_collection = db.tweetData # use or create a collection named tweetData
# tweet_collection.create_index([("id", pymongo.ASCENDING)],unique = True) # make sure the collected tweets are unique

In [347]:
# stream_auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
# stream_auth.set_access_token(OAUTH_TOKEN, OATH_TOKEN_SECRET)

# strem_api = tweepy.API(stream_auth)

In [348]:
rest_auth = twitter.oauth.OAuth(OAUTH_TOKEN,OATH_TOKEN_SECRET,CONSUMER_KEY,CONSUMER_SECRET)
rest_api = twitter.Twitter(auth=rest_auth)

In [349]:
count = 100 # number of tweets to return per request (maximum is 100)
q = "covid19"  # search keyword; returned tweets contain this term

In [350]:
search_results = rest_api.search.tweets(count=count, q=q)  # geocode can be passed as well
statuses = search_results["statuses"]
since_id_new = statuses[-1]['id']  # id of the last tweet in this batch
for status in statuses:
    try:
        tweet_collection.insert_one(status)
        pprint(status['created_at'])  # print the date of each collected tweet
    except pymongo.errors.PyMongoError:
        # Skip tweets that cannot be stored (e.g. duplicates under a unique index).
        pass
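
The loop above silently skips tweets that fail to insert. One alternative is to upsert each tweet keyed on its `id` field, so re-running the search does not store the same tweet twice. This is a sketch under assumptions: `store_statuses` is a hypothetical helper, and `collection` is assumed to be a PyMongo collection such as `tweet_collection`:

```python
def store_statuses(collection, statuses):
    """Upsert tweets keyed on their "id" field; return how many were new."""
    new_count = 0
    for status in statuses:
        # replace_one with upsert=True inserts the tweet if no document with
        # this id exists, otherwise it overwrites the stored copy.
        result = collection.replace_one({"id": status["id"]}, status, upsert=True)
        if result.upserted_id is not None:
            new_count += 1
    return new_count
```

Calling `store_statuses(tweet_collection, statuses)` would return the number of tweets that were not yet in the collection. Creating an index on `id` would keep the lookups fast as the collection grows.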

'Mon Feb 07 14:03:57 +0000 2022'
'Mon Feb 07 14:03:57 +0000 2022'
'Mon Feb 07 14:03:57 +0000 2022'
'Mon Feb 07 14:03:56 +0000 2022'
'Mon Feb 07 14:03:55 +0000 2022'
'Mon Feb 07 14:03:55 +0000 2022'
'Mon Feb 07 14:03:55 +0000 2022'
'Mon Feb 07 14:03:54 +0000 2022'
'Mon Feb 07 14:03:53 +0000 2022'
'Mon Feb 07 14:03:52 +0000 2022'
'Mon Feb 07 14:03:52 +0000 2022'
'Mon Feb 07 14:03:52 +0000 2022'
'Mon Feb 07 14:03:52 +0000 2022'
'Mon Feb 07 14:03:51 +0000 2022'
'Mon Feb 07 14:03:51 +0000 2022'
'Mon Feb 07 14:03:51 +0000 2022'
'Mon Feb 07 14:03:51 +0000 2022'
'Mon Feb 07 14:03:49 +0000 2022'
'Mon Feb 07 14:03:49 +0000 2022'
'Mon Feb 07 14:03:48 +0000 2022'
'Mon Feb 07 14:03:48 +0000 2022'
'Mon Feb 07 14:03:48 +0000 2022'
'Mon Feb 07 14:03:48 +0000 2022'
'Mon Feb 07 14:03:48 +0000 2022'
'Mon Feb 07 14:03:47 +0000 2022'
'Mon Feb 07 14:03:46 +0000 2022'
'Mon Feb 07 14:03:46 +0000 2022'
'Mon Feb 07 14:03:46 +0000 2022'
'Mon Feb 07 14:03:46 +0000 2022'
'Mon Feb 07 14:03:45 +0000 2022'
'Mon Feb 0

In [351]:
print(tweet_collection.estimated_document_count())# number of tweets collected


100


In [298]:
tweet_collection.create_index([("text", pymongo.TEXT)], name='text_index', default_language='english') # create a text index


'text_index'