# Getting started with MongoDB

In this notebook we will create an account on MongoDB Atlas and create a free cluster. 

We will then connect to the cluster using the Python driver and create a database and a collection.

Before you begin, keep in mind the overall structure of a MongoDB Atlas deployment.

## MongoDB Atlas

MongoDB Atlas is a cloud-hosted MongoDB service. It is a fully managed database as a service (DBaaS) that hosts your data on MongoDB instances in the cloud. Atlas is available on AWS, Azure, and Google Cloud Platform.

## MongoDB Atlas Structure 

MongoDB Atlas organizes resources differently from a traditional relational database schema.

In MongoDB Atlas, an organization is a group of users. A project is a group of databases that share the same security settings and network access. A cluster is a group of database servers that store your data. A database is a container for collections. A collection is a group of documents. A document is a set of key-value pairs. A field is a key in a document, and a value is the data stored under that key.

This structure is represented in the following diagram:

```
Organizations
    |
    |---Projects (set your database and network access at this level)
            |
            |---Clusters (get your remote connection details at this level)
                    |
                    |---Databases 
                            |
                            |---Collections
                                    |
                                    |---Documents
                                            |
                                            |---Fields
                                                    |
                                                    |---Values
```
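
To make the lower levels of this hierarchy concrete, here is a minimal sketch that models a database, a collection, and a document with plain Python objects. The names and values are made up for illustration, and this is only a mental model, not how MongoDB stores data internally.

```python
# A hypothetical, in-memory picture of the database -> collection -> document levels.
# Plain dicts and lists stand in for MongoDB objects; all names are illustrative.
document = {
    "author": "Prof Smith",          # field "author" with a string value
    "tags": ["mongodb", "python"],   # field "tags" with a list value
}

collection = [document]              # a collection is a group of documents
database = {"posts": collection}     # a database is a container for collections

# Walk the structure from the database down to a single value.
first_doc = database["posts"][0]
fields = list(first_doc.keys())
value = first_doc["author"]

print(fields)
print(value)
```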

## Step 1: Create a MongoDB Atlas account, organization, cluster and database

Create a MongoDB Atlas account, organization, cluster, and database by following the steps in [this tutorial](https://www.mongodb.com/basics/mongodb-atlas-tutorial):

* Create a MongoDB Cloud account
    * Create a new organization
    * Create a new project
* Create a MongoDB Atlas cluster
    * Select the free tier
    * Select the AWS region closest to you
    * Select allow access from anywhere
* Connect to the cluster

Each project you create can have separate user and network access settings. As part of the process of creating a cluster, you entered a username and password. Store them in a file called `credentials.py`, in the same folder as this notebook. We will use this file to connect to the cluster later.

* Create a file called `credentials.py`
* In `credentials.py`, enter the following (be sure to use your own username and password):
```python
username = 'someusername'
password = 'somepassword'
```

> NOTE: Atlas uses two kinds of credentials. The database username and password you entered when you created the cluster authenticate connections to the cluster, and they are what this notebook uses. API keys authenticate programmatic access to Atlas itself, for example to manage projects and clusters. You can read more about API keys [here](https://docs.atlas.mongodb.com/configure-api-access/).

## Step 2: Install PyMongo
    
On your local computer (with Miniconda/Anaconda and JupyterLab installed), you will need to install the pymongo package. This package allows you to connect to the MongoDB Atlas service (or any other MongoDB deployment).

To install pymongo, open a terminal and enter the following command:

`python -m pip install 'pymongo[srv]'` (on macOS and Linux, use `python3` instead of `python`)
                                   
>NOTE1: Install pymongo with pip. A pymongo installed with Conda will only work with a local installation of MongoDB. To access the cloud-based cluster that you created in Step 1, install pymongo using the pip command shown above.

>NOTE2: If you already installed pymongo using Conda, uninstall it with `conda remove pymongo` and then install it again using the pip command above.
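
Before moving on, it can help to confirm that the packages are visible to the Python interpreter your notebook uses. The sketch below relies only on the standard library; `check_install` is a hypothetical helper name. It looks for `dns` because the `srv` extra pulls in the dnspython package, whose importable module is named `dns`.

```python
import importlib.util

def check_install():
    """Report whether pymongo and dnspython can be imported in this environment."""
    return {
        "pymongo": importlib.util.find_spec("pymongo") is not None,
        # dnspython installs a top-level module named "dns";
        # mongodb+srv:// connection strings need it for DNS lookups.
        "dnspython": importlib.util.find_spec("dns") is not None,
    }

status = check_install()
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If either package is reported as missing, rerun the pip command above from the same environment that launches Jupyter.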

## Step 3: Test your connection
    
Test that everything is set up correctly by running the following code. To do this we will:
* Import the pymongo package
* Import the credentials.py file
* Create a connection string
* Connect to the database
  * In this example, I have created a database called 'ism6562_w05' using the Atlas web interface. You will need to use your own database name.
* Create a collection called 'blogger'
* Create a document in the collection 

If you do not see any errors, you have successfully configured your environment and are ready to begin working on the next notebook.

In [1]:
import pymongo # pymongo is a python driver for MongoDB
import credentials # load username and password from credentials.py
connection_string = f"mongodb+srv://{credentials.username}:{credentials.password}@cluster0.o3xshsn.mongodb.net/?retryWrites=true&w=majority"
# replace cluster0.o3xshsn.mongodb.net with your own cluster address from the MongoDB Atlas UI
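
If the password contains characters such as `@`, `:`, or `/`, placing it in the connection string unescaped makes the URI ambiguous. Here is a minimal sketch of escaping the credentials with the standard library's `urllib.parse.quote_plus`. The `build_connection_string` helper, the example host, and the credentials are all hypothetical.

```python
from urllib.parse import quote_plus

def build_connection_string(username, password, host):
    """Build a mongodb+srv URI with percent-escaped credentials (hypothetical helper)."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?retryWrites=true&w=majority"

# Made-up credentials and host, for illustration only.
uri = build_connection_string("someusername", "p@ss/word", "cluster0.example.mongodb.net")
print(uri)
```

With real values, the call would take `credentials.username`, `credentials.password`, and the cluster address shown in the Atlas UI.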


In [2]:
client = pymongo.MongoClient(connection_string) # create a client object to connect to the cluster. get the cluster address from the MongoDB Atlas UI
db = client['ism6562_w05'] # this refers to the database called ism6562_w05; if it does not exist, MongoDB creates it when data is first written to it.
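
Creating a `MongoClient` does not by itself prove that the cluster is reachable, because the driver connects in the background and a problem only surfaces on the first real operation. A minimal sketch of an explicit check follows. `ping_cluster` is a hypothetical helper, not part of pymongo, and it catches a broad `Exception` only to keep the example short.

```python
def ping_cluster(client):
    """Return True if the server answers a ping, False otherwise.

    `client` is expected to be a pymongo.MongoClient; this helper is a
    hypothetical convenience, not part of pymongo.
    """
    try:
        # "ping" is a cheap server command that forces a round trip.
        client.admin.command("ping")
    except Exception as exc:  # broad on purpose; pymongo raises PyMongoError subclasses
        print(f"Could not reach the cluster: {exc!r}")
        return False
    return True
```

With the `client` from the cell above, `ping_cluster(client)` should return `True` within a few seconds. Passing `serverSelectionTimeoutMS=5000` to `pymongo.MongoClient` shortens the default 30-second wait before a failure is reported.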

In MongoDB, a collection is a group of documents. A document is a set of key-value pairs. These key-value pairs are stored as BSON (Binary JSON). JSON looks very much like a Python data structure you should be familiar with: a dictionary.
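
The resemblance between a Python dictionary and JSON can be seen with the standard `json` module. This sketch uses a made-up document and sticks to strings and lists, since BSON supports types such as dates that plain JSON lacks.

```python
import json

# A made-up document expressed as a Python dictionary.
doc = {"author": "Prof Smith", "tags": ["mongodb", "python"]}

as_json = json.dumps(doc)         # dictionary -> JSON text
round_trip = json.loads(as_json)  # JSON text -> dictionary

print(as_json)
print(round_trip == doc)
```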

Let's say we are creating an application that allows users to create blog posts.

In [3]:
import datetime # datetime is a python module for working with dates and times
post = {"author": "Prof Smith",
        "title": "My first blog post!",
        "tags": ["ism6562", "Big Data", "mongodb", "python", "pymongo"],
        "date": datetime.datetime.now(datetime.timezone.utc), # current date and time in UTC (datetime.datetime.utcnow() is deprecated since Python 3.12)
        "text": "This is my first blog post. I am excited to be teaching this class!"
}

Let's now store this 'document' in the 'blogger' collection.

In [4]:
posts = db['blogger'] # this refers to a collection called 'blogger' in our database; MongoDB creates it on the first insert.
post_id = posts.insert_one(post).inserted_id # this inserts the post into the collection, then returns the id of the new document
post_id

If the insert fails with a `ServerSelectionTimeoutError` like the one below, the driver could not reach any server in the cluster within 30 seconds. Check that your current IP address is allowed in the project's network access settings and that the cluster address in the connection string is correct.

ServerSelectionTimeoutError: ac-g3fkke3-shard-00-02.o3xshsn.mongodb.net:27017: timed out,ac-g3fkke3-shard-00-01.o3xshsn.mongodb.net:27017: timed out,ac-g3fkke3-shard-00-00.o3xshsn.mongodb.net:27017: timed out, Timeout: 30s, Topology Description: <TopologyDescription id: 651af7f647ae295066b997a0, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('ac-g3fkke3-shard-00-00.o3xshsn.mongodb.net', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('ac-g3fkke3-shard-00-00.o3xshsn.mongodb.net:27017: timed out')>, <ServerDescription ('ac-g3fkke3-shard-00-01.o3xshsn.mongodb.net', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('ac-g3fkke3-shard-00-01.o3xshsn.mongodb.net:27017: timed out')>, <ServerDescription ('ac-g3fkke3-shard-00-02.o3xshsn.mongodb.net', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('ac-g3fkke3-shard-00-02.o3xshsn.mongodb.net:27017: timed out')>]>

In [None]:
client.close() # close the connection to the database

## Step 4: Review your cluster (using the Atlas web interface) to see the data you just created

To review the data you just created, go to the Atlas web interface and select the collection you added a document to. You should see the document you inserted, displayed in JSON format.

![](images/first_insert.png)
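
The same check can also be made from Python rather than the web interface. Here is a minimal sketch, assuming the `posts` collection and `post_id` from the insert cell and a client that is still open. `fetch_post` is a hypothetical helper name, while `find_one` is pymongo's method for retrieving a single matching document.

```python
def fetch_post(collection, post_id):
    """Return the document whose _id equals post_id, or None if there is no match."""
    # find_one takes a filter document and returns the first match (or None).
    return collection.find_one({"_id": post_id})
```

Calling `fetch_post(posts, post_id)` should return the dictionary that was inserted, including the `_id` field that MongoDB added.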