# Getting started with MongoDB

In this notebook we will use a local Docker instance of MongoDB.

We will then connect to the server using the Python driver (PyMongo) and create a database and a collection.

Before you begin, keep in mind the overall structure of MongoDB: a server hosts databases, each database contains collections, and each collection contains documents.

## Step 1: Start the local MongoDB server

Start the MongoDB server.

In your terminal, navigate to the MongoDB-1node folder where you have the docker-compose.yml file and run the following command:

```bash
docker-compose up
```

This will download a MongoDB Docker image and start a MongoDB server on your local machine.
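
If you want to confirm that the container is accepting connections before moving on, a quick check along these lines may help. This is a minimal sketch that uses only the Python standard library. The helper name `is_port_open` is hypothetical, and it assumes the server is published on MongoDB's default port, 27017, on localhost.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 27017, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumption: docker-compose maps the container to localhost:27017
print(is_port_open())
```

An open port only shows that something is listening on it, not that it is MongoDB, so treat a `True` result as a first sanity check rather than proof.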

## Step 2: Install PyMongo
    
If you are connecting to a local MongoDB server, you can use conda to install the pymongo module.

If you do not already have pymongo installed, run the following in your terminal. (If you previously installed pymongo for MongoDB Atlas, you do not need to reinstall it.)

```bash
conda install -c conda-forge pymongo
```

NOTE: If you wish to connect to a MongoDB Atlas server, see the W05 notebooks, which cover this.

## Step 3: Test your connection
    
The following code connects to your local MongoDB server and selects a database called ism6562 and a collection called blogger. MongoDB creates both lazily, when the first document is inserted. If they already exist, the code simply uses the existing ones.

In [1]:
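import pymongo  # PyMongo is the Python driver for MongoDB

# Connect to the local server on the default port (27017)
client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client.ism6562      # select the database (created on first insert)
collection = db.blogger  # select the collection (created on first insert)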


In MongoDB, a collection is a group of documents. A document is a set of key-value pairs, stored as BSON (Binary JSON). JSON looks very much like a Python data structure you should be familiar with: a dictionary.
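
To make the dictionary comparison concrete, here is a minimal sketch using only the standard `json` module. The document contents are hypothetical. Plain JSON has no date type, so the sketch converts the `datetime` value to a string, whereas BSON can store dates natively.

```python
import datetime
import json

# A hypothetical document, written as an ordinary Python dictionary
doc = {
    "author": "Prof Smith",
    "tags": ["mongodb", "python"],
    "date": datetime.datetime(2023, 9, 26, 22, 5, 2),
}

# default=str tells json.dumps to fall back to str() for values that
# JSON cannot represent directly (here, the datetime)
as_json = json.dumps(doc, default=str)
print(as_json)

# Parsing the JSON text gives back a dictionary, but the date is now a string
round_trip = json.loads(as_json)
print(type(round_trip).__name__)
print(type(round_trip["date"]).__name__)
```

PyMongo handles the conversion between dictionaries and BSON for you, so you normally pass dictionaries straight to the driver without serializing them yourself.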

Let's say we are creating an application that allows users to create blog posts...

In [2]:
import datetime # datetime is a Python module for working with dates and times
post = {"author": "Prof Smith",
        "title": "My first blog post!",
        "tags": ["ism6562", "Big Data", "mongodb", "python", "pymongo"],
        "date": datetime.datetime.utcnow(), # datetime.datetime.utcnow() returns the current date and time in UTC
        "text": "This is my first blog post. I am excited to be teaching this class!"
}

Let's now store this 'document' in the 'blogger' collection.

In [3]:
# If you run this notebook multiple times, the same data will be inserted multiple times.
# Dropping the collection first removes any data left over from previous runs.
collection.drop()

post_id = collection.insert_one(post).inserted_id # this inserts the post into the collection, then returns the id of the post
post_id

ObjectId('6513558ef216a0f4f7f8559f')
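
The id is not purely random: the first 4 bytes of an ObjectId hold the creation time as a big-endian Unix timestamp in seconds. The sketch below decodes that timestamp from the hex string printed above, using only the standard library. The variable names are illustrative.

```python
import datetime

object_id_hex = "6513558ef216a0f4f7f8559f"  # the id printed above

# The first 4 bytes (8 hex characters) encode seconds since the Unix epoch
seconds = int(object_id_hex[:8], 16)
created = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
print(created)  # 2023-09-26 22:05:02+00:00
```

With PyMongo you would not normally decode this by hand, since an `ObjectId` exposes the same value through its `generation_time` attribute.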

In [4]:
results = collection.find() # with no filter, find() returns a cursor over every document in the 'blogger' collection

for result in results:
    print(result)

{'_id': ObjectId('6513558ef216a0f4f7f8559f'), 'author': 'Prof Smith', 'title': 'My first blog post!', 'tags': ['ism6562', 'Big Data', 'mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2023, 9, 26, 22, 5, 2, 5000), 'text': 'This is my first blog post. I am excited to be teaching this class!'}


In [5]:
client.close() # close the connection to the database

![](images/first_insert.png)