In [1]:
import pandas as pd
import numpy as np

# 8.5. Introduction to MongoDB.

- Install pymongo.
- Docs: https://api.mongodb.com/python/current/tutorial.html

In [8]:
!pip install pymongo
!pip install pymongo[srv]



- Import MongoClient:

In [5]:
from pymongo import MongoClient

- Create a MongoDB Atlas account: https://www.mongodb.com/cloud/atlas
- Connect with MongoClient, passing the connection string of your cluster.

In [7]:
client = MongoClient(
    "PUT YOUR MONGODB CONNECTION STRING HERE"
)
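
An Atlas connection string embeds the username and password, and both need to be percent-escaped when they contain reserved characters such as `@` or `/`. The sketch below builds such a URI with the standard library's `urllib.parse.quote_plus`. The helper `build_atlas_uri` and the user, password, and host values are hypothetical placeholders, not values from this notebook.

```python
from urllib.parse import quote_plus


def build_atlas_uri(user: str, password: str, host: str) -> str:
    """Build a mongodb+srv URI, escaping reserved characters in the credentials."""
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


# Hypothetical values; replace them with the ones from your own Atlas cluster.
uri = build_atlas_uri("ana", "p@ss/word", "cluster0.example.mongodb.net")
print(uri)

# The resulting string is what would be passed to the client:
# client = MongoClient(uri)
```

Keeping the escaping in one small function avoids hard-to-read connection errors caused by a password that happens to contain a reserved character.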

### Databases
- Create and get a database.
- A single MongoDB instance can support multiple independent databases.
- We can access them using attribute style or dictionary style.

In [9]:
db = client.test_database

In [10]:
db = client['test-database']
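
Note that the two cells above refer to two different databases, `test_database` and `test-database`. Dictionary style is required whenever the name is not a valid Python attribute name: `client.test-database` would be parsed as a subtraction. The check below is a plain-Python sketch of that rule; `needs_dictionary_style` is a hypothetical helper, not part of pymongo.

```python
import keyword


def needs_dictionary_style(name: str) -> bool:
    """Return True when `name` cannot be written as `client.<name>` in Python."""
    return not name.isidentifier() or keyword.iskeyword(name)


for name in ["test_database", "test-database", "class"]:
    style = "client[...]" if needs_dictionary_style(name) else "client.<name>"
    print(f"{name!r}: use {style}")
```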

### Collections
- A collection is a group of documents stored in MongoDB; it is roughly the equivalent of a table in a relational database.

In [11]:
collection = db.test_collection
collection = db['test-collection']

### Documents

- Data in MongoDB is represented and stored as JSON-style documents (BSON under the hood). In pymongo we use dictionaries to represent them.

In [30]:
import datetime
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.utcnow()
}

In [31]:
post

{'author': 'Mike',
 'text': 'My first blog post!',
 'tags': ['mongodb', 'python', 'pymongo'],
 'date': datetime.datetime(2020, 8, 1, 8, 23, 6, 807268)}
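
The dictionary above is JSON-style rather than strict JSON: it holds a `datetime.datetime`, which plain JSON cannot express. The sketch below uses only the standard library `json` module (not pymongo) to make that difference visible. BSON, the format pymongo encodes to, does have a date type, which is why the same dictionary can be inserted as-is.

```python
import datetime
import json

post = {
    "author": "Mike",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime(2020, 8, 1, 8, 23, 6, 807268),
}

try:
    json.dumps(post)
    plain_json_ok = True
except TypeError as exc:
    # The standard json encoder has no representation for datetime objects.
    plain_json_ok = False
    print(f"Not plain JSON: {exc}")

# Without the datetime field, the same dictionary serializes fine.
without_date = {key: value for key, value in post.items() if key != "date"}
print(json.dumps(without_date))
```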

- Documents can contain native Python types (such as datetime.datetime instances), which are automatically converted to and from the appropriate BSON types.
- To insert a document into a collection, we can use the insert_one() method.

In [35]:
posts = db.posts
# insert_one() returns an InsertOneResult; the new id is in its inserted_id attribute
post_id = posts.insert_one(post)
post_id

<pymongo.results.InsertOneResult at 0x112246208>

In [18]:
post_id.inserted_id

ObjectId('5f2523fd51d0dc1fd8ff1d51')
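
One visible effect of the automatic type conversion: BSON stores dates with millisecond precision, so the microseconds of the original `post` (807268) come back as 807000 in the query outputs later in this notebook. The helper below illustrates that truncation in plain Python; `truncate_to_milliseconds` is a hypothetical name, not a pymongo function.

```python
import datetime


def truncate_to_milliseconds(dt: datetime.datetime) -> datetime.datetime:
    """Drop sub-millisecond digits, mimicking what survives a round trip through BSON."""
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)


original = datetime.datetime(2020, 8, 1, 8, 23, 6, 807268)
stored = truncate_to_milliseconds(original)

print(stored)              # 2020-08-01 08:23:06.807000
print(stored == original)  # False
```

This matters when comparing a value read back from the database with the Python object that was inserted: an equality check can fail even though the document is the same.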

- After the first document is inserted, the collection is created:

In [19]:
db.list_collection_names()

['posts']

### Query

- Get a single document:

In [36]:
posts.find_one()

{'_id': ObjectId('5f2523c451d0dc1fd8ff1d50'),
 'author': 'Mike',
 'text': 'My first blog post!',
 'tags': ['mongodb', 'python', 'pymongo'],
 'date': datetime.datetime(2020, 8, 1, 8, 11, 23, 330000)}

- Useful when you know there is only one matching document, or when you only want the first document in the collection.
- We can also retrieve a document by specifying the value of a field:

In [37]:
posts.find_one({"author": "Mike"})

{'_id': ObjectId('5f2523c451d0dc1fd8ff1d50'),
 'author': 'Mike',
 'text': 'My first blog post!',
 'tags': ['mongodb', 'python', 'pymongo'],
 'date': datetime.datetime(2020, 8, 1, 8, 11, 23, 330000)}

In [38]:
posts.find_one({"author": "Eliot"})  # no post by Eliot yet, so this returns None (no output)
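
Because `find_one()` returns `None` when nothing matches, code that reads fields from the result usually guards against that case. A minimal sketch, where `text_by_author` is a hypothetical helper and `collection` is assumed to be a pymongo collection such as `posts`:

```python
from typing import Any, Optional


def text_by_author(collection: Any, author: str) -> Optional[str]:
    """Return the text of the first post by `author`, or None if there is none."""
    doc = collection.find_one({"author": author})
    if doc is None:
        return None  # no document matched the filter
    return doc.get("text")


# Usage in this notebook (requires the connection created earlier):
# text_by_author(posts, "Mike")   # text of a post by Mike
# text_by_author(posts, "Eliot")  # None until a post by Eliot is inserted
```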

### Bulk Inserts

In [39]:
new_posts = [
    {"author": "Mike",
     "text": "Another post!",
     "tags": ["bulk", "insert"],
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot",
     "title": "MongoDB is fun",
     "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)}
]
result = posts.insert_many(new_posts)
result.inserted_ids

[ObjectId('5f2526d451d0dc1fd8ff1d54'), ObjectId('5f2526d451d0dc1fd8ff1d55')]
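
Note that the two documents do not share the same fields: the first has `tags` and the second has `title`. MongoDB collections do not enforce a fixed schema by default, so both are accepted in the same collection. A quick plain-Python check of which fields differ (the documents are copied from the cell above):

```python
import datetime

new_posts = [
    {"author": "Mike",
     "text": "Another post!",
     "tags": ["bulk", "insert"],
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot",
     "title": "MongoDB is fun",
     "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)},
]

# Iterating over a dict yields its keys, so set(doc) is the set of field names.
fields_first = set(new_posts[0])
fields_second = set(new_posts[1])

only_in_first = sorted(fields_first - fields_second)
only_in_second = sorted(fields_second - fields_first)

print(only_in_first)   # ['tags']
print(only_in_second)  # ['title']
```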

### Querying for more than one document

In [40]:
for post in posts.find():
    print(post)

{'_id': ObjectId('5f2523c451d0dc1fd8ff1d50'), 'author': 'Mike', 'text': 'My first blog post!', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2020, 8, 1, 8, 11, 23, 330000)}
{'_id': ObjectId('5f2523fd51d0dc1fd8ff1d51'), 'author': 'Mike 2', 'text': 'My first blog post!', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2020, 8, 1, 8, 12, 41, 588000)}
{'_id': ObjectId('5f25250651d0dc1fd8ff1d52'), 'author': 'Mike 3', 'text': 'My first blog post!', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2020, 8, 1, 8, 17, 4, 18000)}
{'_id': ObjectId('5f25266d51d0dc1fd8ff1d53'), 'author': 'Mike', 'text': 'My first blog post!', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2020, 8, 1, 8, 23, 6, 807000)}
{'_id': ObjectId('5f2526d451d0dc1fd8ff1d54'), 'author': 'Mike', 'text': 'Another post!', 'tags': ['bulk', 'insert'], 'date': datetime.datetime(2009, 11, 12, 11, 14)}
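{'_id': ObjectId('5f2526d451d0dc1fd8ff1d55'), 'author': 'Eliot', 'title': 'MongoDB is fun', 'text': 'and pretty easy too!', 'date': datetime.datetime(2009, 11, 10, 10, 45)}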

In [41]:
for post in posts.find({"author": "Mike"}):
    print(post)

{'_id': ObjectId('5f2523c451d0dc1fd8ff1d50'), 'author': 'Mike', 'text': 'My first blog post!', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2020, 8, 1, 8, 11, 23, 330000)}
{'_id': ObjectId('5f25266d51d0dc1fd8ff1d53'), 'author': 'Mike', 'text': 'My first blog post!', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2020, 8, 1, 8, 23, 6, 807000)}
{'_id': ObjectId('5f2526d451d0dc1fd8ff1d54'), 'author': 'Mike', 'text': 'Another post!', 'tags': ['bulk', 'insert'], 'date': datetime.datetime(2009, 11, 12, 11, 14)}


### Counting

In [42]:
posts.count_documents({})

6

In [43]:
posts.count_documents({"author": "Mike"})

3

### Range Queries

- MongoDB supports advanced queries through query operators such as `$lt`: https://docs.mongodb.com/manual/reference/operator/

In [44]:
d = datetime.datetime(2009, 11, 12, 12)
for post in posts.find({"date": {"$lt": d}}).sort("author"):
    print(post)

{'_id': ObjectId('5f2526d451d0dc1fd8ff1d55'), 'author': 'Eliot', 'title': 'MongoDB is fun', 'text': 'and pretty easy too!', 'date': datetime.datetime(2009, 11, 10, 10, 45)}
{'_id': ObjectId('5f2526d451d0dc1fd8ff1d54'), 'author': 'Mike', 'text': 'Another post!', 'tags': ['bulk', 'insert'], 'date': datetime.datetime(2009, 11, 12, 11, 14)}
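
Comparison operators can be combined in a single condition. As a sketch, the hypothetical helper below builds a filter for a half-open date range using `$gte` and `$lt`. It only constructs the dictionary; the commented lines show how it could be passed to `find()` on the `posts` collection from this notebook.

```python
import datetime


def date_range_filter(start: datetime.datetime, end: datetime.datetime) -> dict:
    """Filter for documents whose `date` field falls in [start, end)."""
    return {"date": {"$gte": start, "$lt": end}}


query = date_range_filter(
    datetime.datetime(2009, 11, 1),
    datetime.datetime(2009, 12, 1),
)
print(query)

# With a live connection, the filter is used like any other query:
# for post in posts.find(query).sort("date"):
#     print(post)
```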


### Exercise

- Create a DataFrame from new_posts:

In [45]:
new_posts = [
    {"author": "Mike",
     "text": "Another post!",
     "tags": ["bulk", "insert"],
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot",
     "title": "MongoDB is fun",
     "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)}
]

- Find a way to insert this DataFrame into MongoDB, creating a new database and a new collection; use df.to_dict("records").

- Run a query on that collection.
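
One detail to watch for in this exercise: because the two posts have different fields, the DataFrame fills the gaps with NaN, and `df.to_dict("records")` carries those NaN values (and pandas Timestamp objects) into the records. The sketch below is one possible way to clean the records before inserting them, not the only solution. `clean_record`, `exercise_database`, and `exercise_posts` are hypothetical names.

```python
import datetime
import math

import pandas as pd

new_posts = [
    {"author": "Mike",
     "text": "Another post!",
     "tags": ["bulk", "insert"],
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot",
     "title": "MongoDB is fun",
     "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)},
]

df = pd.DataFrame(new_posts)


def clean_record(record: dict) -> dict:
    """Drop NaN placeholders and convert pandas Timestamps to plain datetimes."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, float) and math.isnan(value):
            continue  # the field was absent in the original document
        if isinstance(value, pd.Timestamp):
            value = value.to_pydatetime()
        cleaned[key] = value
    return cleaned


records = [clean_record(record) for record in df.to_dict("records")]
print(records)

# With a live connection, the cleaned records could then be inserted:
# db = client["exercise_database"]
# db["exercise_posts"].insert_many(records)
```

Skipping the cleaning step would store a literal NaN in the `title` and `tags` fields instead of leaving them out, which changes what queries on those fields return.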