# MongoDB

Based on https://docs.mongodb.com/getting-started 


MongoDB is an open-source **NoSQL** **document database**. It scales horizontally by partitioning (sharding) the data across multiple nodes, and it replicates the data across nodes. Sharding improves the scalability of the system, and replication improves its reliability.

A record in MongoDB is a **document**, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects or Python dictionaries. The values of fields may include other documents, arrays, and arrays of documents.

This is an example of a document:
```JSON
{
   "_id" : ObjectId("54c955492b7c8eb21818bd09"),
   "address" : {
      "street" : "2 Avenue",
      "zipcode" : "10075",
      "building" : "1480",
      "coord" : [ -73.9557413, 40.7720266 ]
   },
   "borough" : "Manhattan",
   "cuisine" : "Italian",
   "grades" : [
      {
         "date" : ISODate("2014-10-01T00:00:00Z"),
         "grade" : "A",
         "score" : 11
      },
      {
         "date" : ISODate("2014-01-16T00:00:00Z"),
         "grade" : "B",
         "score" : 17
      }
   ],
   "name" : "Vella",
   "restaurant_id" : "41704620"
}
```
In MongoDB, every document has a unique **_id** field that acts as a primary key. If you do not provide an _id yourself, MongoDB adds one automatically.

MongoDB stores documents in **collections**. Collections are analogous to tables in relational databases. Unlike a table, however, a collection does not require its documents to have the same schema.
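
As a rough illustration in plain Python (no database needed), the sketch below treats a list of dictionaries as a stand-in for a collection. The first document reuses values from the restaurant example above; the second one, `minimal`, is hypothetical and deliberately has fewer fields.

```python
# Plain-Python stand-in for a collection: a list of dict "documents".
# This only mimics the data model; it does not talk to MongoDB.
restaurant = {
    "name": "Vella",
    "borough": "Manhattan",
    "address": {"street": "2 Avenue", "zipcode": "10075"},
    "grades": [
        {"grade": "A", "score": 11},
        {"grade": "B", "score": 17},
    ],
}
# A second, hypothetical document with fewer fields.
minimal = {"name": "Example Diner"}

collection = [restaurant, minimal]

# Embedded documents and arrays are reached like nested dicts and lists.
zipcode = restaurant["address"]["zipcode"]
scores = [g["score"] for g in restaurant["grades"]]

# Documents in the same collection may have different sets of fields.
field_sets = [sorted(doc) for doc in collection]

print(zipcode, scores, field_sets)
```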

You can start a Docker image with MongoDB like this:
```bash
docker run -p 27017:27017 -d mongo
```

In production you must enable authentication with a username and password; for development purposes, running without it is fine.

Create a new Python environment with uv and install the following packages:

```
pymongo
requests
jupyterlab
```

In [None]:
from pymongo import MongoClient
from pprint import pprint
import json
import requests

Use `MongoClient` to create a connection. If you do not pass any arguments, `MongoClient` connects to the MongoDB instance running on localhost on port 27017. You can also pass a complete MongoDB URI that explicitly specifies the host and port. For example, the following creates a connection to a MongoDB instance that runs on mongodb0.example.net on port 27017: `client = MongoClient("mongodb://mongodb0.example.net:27017")`
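
Once authentication is enabled, the credentials usually become part of the URI. The following is a minimal sketch of assembling such a URI; the helper name `build_mongo_uri` and the sample user and password are made up for illustration. Percent-escaping with `urllib.parse.quote_plus` is the approach the PyMongo documentation suggests for reserved characters in user names and passwords.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host="localhost", port=27017):
    """Return a mongodb:// URI with percent-escaped credentials."""
    return "mongodb://{}:{}@{}:{}".format(
        quote_plus(user), quote_plus(password), host, port
    )


# Hypothetical credentials; '@' and '/' must be escaped inside a URI.
uri = build_mongo_uri("student", "p@ss/word", host="mongodb0.example.net")
print(uri)
```

The resulting string could then be passed to `MongoClient(uri)`.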

In [None]:
# Client connects to "localhost" by default
client = MongoClient()

The first fundamental class of objects you will interact with in pymongo is `Database`, which represents the database construct in MongoDB. Databases hold groups of logically related collections. MongoDB creates new databases implicitly upon their first use. Connect to (and thereby create) a database named after yourself, e.g.

```python
db = client["rolandmueller"]
``` 
or 

```python
db = client.rolandmueller
```

In [None]:
# Create local "bipm" database on the fly
db = client['bipm']

In [None]:
# When we rerun the whole notebook, we start from scratch
# by dropping the collection "courses"
db.courses.drop()

In [None]:
# Create a list of Python dictionaries
courses = [
    {'title': 'Data Science',
     'lecturer': {
         'name': 'Markus Löcher',
         'department': 'Math',
         'status': 'Professor'
     },
     'semester': 1},
    {'title': 'Data Warehousing',
     'lecturer': {
         'name': 'Roland M. Mueller',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 1},
    {'title': 'Business Process Management',
     'lecturer': {
         'name': 'Frank Habermann',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 1},
    {'title': 'Complexity and Organizational Decision-Making',
     'lecturer': {
         'name': 'Frank Habermann',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 1},
    {'title': 'Natural Language Processing Lab',
     'lecturer': {
         'name': 'Diana Hristova',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 2},
    {'title': 'Enterprise Architectures for Big Data',
     'lecturer': {
         'name': 'Roland M. Mueller',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 2},
    {'title': 'Business Process Innovation Lab',
     'lecturer': {
         'name': 'Frank Habermann',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 2},
    {'title': 'Edge IoT & AI',
     'lecturer': {
         'name': 'Alexander Eck',
         'department': 'Information Systems',
         'status': 'External'
     },
     'semester': 2},
    {'title': 'Research Methods',
     'lecturer': {
         'name': 'Marcus Birkenkrahe',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 2},
]

In [7]:
pprint(courses)

[{'lecturer': {'department': 'Math',
               'name': 'Markus Löcher',
               'status': 'Professor'},
  'semester': 1,
  'title': 'Data Science'},
 {'lecturer': {'department': 'Information Systems',
               'name': 'Roland M. Mueller',
               'status': 'Professor'},
  'semester': 1,
  'title': 'Data Warehousing'},
 {'lecturer': {'department': 'Information Systems',
               'name': 'Frank Habermann',
               'status': 'Professor'},
  'semester': 1,
  'title': 'Business Process Management'},
 {'lecturer': {'department': 'Information Systems',
               'name': 'Frank Habermann',
               'status': 'Professor'},
  'semester': 1,
  'title': 'Complexity and Organizational Decision-Making'},
 {'lecturer': {'department': 'Information Systems',
               'name': 'Diana Hristova',
               'status': 'Professor'},
  'semester': 2,
  'title': 'Natural Language Processing Lab'},
 {'lecturer': {'department': 'Information Systems',
   

## insert_many()

You can use the `insert_one()` method and the `insert_many()` method to add documents to a collection in MongoDB. If you attempt to add documents to a collection that does not exist, MongoDB will create the collection for you.

In [8]:
db.courses.insert_many(courses)

<pymongo.results.InsertManyResult at 0x10ef67ac8>

## find()

You can use the find() method to issue a query to retrieve data from a collection in MongoDB. All queries in MongoDB have the scope of a single collection.
Queries can return all documents in a collection or only the documents that match a specified filter or criteria. You can specify the filter or criteria in a document and pass as a parameter to the find() method. With no parameter, find() returns all documents in the collection.

The find() method returns query results in a cursor, which is an iterable object that yields documents. Then you can print all documents.

```python
cursor = db.my_collection.find()

for document in cursor:
    pprint(document)
```
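
One practical consequence of the cursor being an iterator: once a loop has consumed it, looping over the same cursor again yields nothing, so results that are needed more than once are usually copied into a list first. The sketch below uses a plain Python generator as a stand-in for the cursor so that it runs without a database; `fake_find` is a hypothetical helper, not part of PyMongo.

```python
def fake_find(documents):
    """Hypothetical stand-in for collection.find(): yields documents one by one."""
    for document in documents:
        yield document


cursor = fake_find([{"title": "Data Science"}, {"title": "Data Warehousing"}])

first_pass = [doc["title"] for doc in cursor]
second_pass = [doc["title"] for doc in cursor]  # the iterator is already exhausted

print(first_pass)
print(second_pass)

# Materialising the results once allows repeated use.
results = list(fake_find([{"title": "Data Science"}]))
print(len(results), len(results))
```

With a real collection the same idea would read `results = list(db.my_collection.find())`.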


In [32]:
# TODO: Print all courses

## JSON

You can store a JSON document if you convert it before to a Python dictionary:

In [10]:
my_json = '{"title": "Master Thesis", "semester": 3}'
another_course = json.loads(my_json)
another_course

{'semester': 3, 'title': 'Master Thesis'}
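
The same conversion works for several documents at once: a JSON array becomes a Python list of dictionaries, which is the shape `insert_many()` expects. The JSON text below is a hypothetical example (the second course is invented), and the sketch needs no database.

```python
import json

# Hypothetical JSON array with two course documents
courses_json = (
    '[{"title": "Master Thesis", "semester": 3},'
    ' {"title": "Colloquium", "semester": 3}]'
)

more_courses = json.loads(courses_json)  # list of dicts

print(type(more_courses).__name__, len(more_courses))
print(more_courses[0]["title"])

# json.dumps() goes the other way, from Python objects back to JSON text.
round_trip = json.loads(json.dumps(more_courses))
print(round_trip == more_courses)
```

With a running server, `db.courses.insert_many(more_courses)` would store both documents.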

## insert_one()

The `insert_one()` method adds the document into the collection.


In [33]:
# TODO: Store `another_course` as another course:

In [34]:
# TODO: Print all courses

## find_one() and find()

`find_one()` returns the first match. `find()` returns all matches.

The query condition for `find_one()` and `find()` for an equality match on fields has the following form:
```python
{ <field1>: <value1>, <field2>: <value2>, ... } 
```

The following operation finds the first document whose name field equals "Manhattan".

```python
# find_one() returns a single document (or None), not a cursor
restaurant = db.restaurants.find_one({"name": "Manhattan"})
```


In [None]:
# TODO: Find the course with the title "Data Science"
# save the result in a variable result
# and pprint the result.

In [14]:
print(result["_id"])
print(result["lecturer"]["name"])

5cc57fee6879c205b3f26031
Markus Löcher


In [36]:
# TODO: Find the first course (one course) in the second semester
# and print it

In [37]:
# TODO: Find all courses in the second semester
# and print the course titles

In [38]:
# TODO: Find all courses in the second semester
# and print the lecturers' names

## Subelements

Sometimes a document contains embedded documents as its elements. To specify a condition on a field in these embedded documents, use dot notation. Dot notation requires quotes around the whole dotted field name. The following query finds documents whose grades array contains an embedded document with a field grade equal to "B".

```python
cursor = db.restaurants.find({"grades.grade": "B"})
```
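
To make the matching rule more concrete, here is a simplified plain-Python illustration of how a dotted path can be resolved against a document, descending into arrays of embedded documents. It is only a sketch of the idea, not MongoDB's actual implementation; `values_at` and `matches` are hypothetical helpers.

```python
def values_at(document, dotted_path):
    """Collect every value reachable via a dotted path, descending into arrays."""
    current = [document]
    for key in dotted_path.split("."):
        next_level = []
        for item in current:
            # An array of embedded documents is searched element by element.
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_level.append(candidate[key])
        current = next_level
    return current


def matches(document, dotted_path, expected):
    """True if any value found at the path equals the expected value."""
    return expected in values_at(document, dotted_path)


restaurant = {
    "address": {"zipcode": "10075"},
    "grades": [{"grade": "A", "score": 11}, {"grade": "B", "score": 17}],
}

print(values_at(restaurant, "grades.grade"))
print(matches(restaurant, "grades.grade", "B"))
print(matches(restaurant, "address.zipcode", "10075"))
print(matches(restaurant, "grades.grade", "C"))
```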

In [39]:
# TODO: Find all courses of Frank Habermann
# and print the title and the semester

## Logical AND

You can specify a logical conjunction (AND) for a list of query conditions by separating the conditions with a comma in the conditions document.

```python
cursor = db.restaurants.find({"cuisine": "Italian", "address.zipcode": "10075"})
```

In [40]:
# TODO: Find all courses from Frank Habermann in the second semester
# and print the title and the semester

## Logical OR

You can specify a logical disjunction (OR) for a list of query conditions by using the $or query operator.

```python
cursor = db.restaurants.find({"$or": [{"cuisine": "Italian"}, {"address.zipcode": "10075"}]})
```
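
Since a filter is an ordinary Python dictionary, it can also be assembled programmatically, and a plain field condition can sit next to `$or`, in which case both must hold. The sketch below only builds and prints such a filter; the list of cuisines is a hypothetical input.

```python
from pprint import pprint

cuisines = ["Italian", "French"]  # hypothetical input

# One sub-condition per cuisine, combined with $or
any_cuisine = {"$or": [{"cuisine": c} for c in cuisines]}

# A plain field condition next to $or means:
# the zipcode matches AND one of the cuisines matches.
query = {"address.zipcode": "10075", **any_cuisine}

pprint(query)
# With a running server: cursor = db.restaurants.find(query)
```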


In [41]:
# TODO: Find all courses from Frank Habermann or Markus Löcher
# and print the title and the semester

## Greater than, Less than

MongoDB provides operators to specify query conditions, such as comparison operators. Query conditions using operators generally have the following form:
```python
{ <field1>: { <operator1>: <value1> } }
```

Greater Than Operator (`$gt`). Query for documents whose grades array contains an embedded document with a field score greater than 30.

```python
cursor = db.restaurants.find({"grades.score": {"$gt": 30}})
```

Less Than Operator (`$lt`). Query for documents whose grades array contains an embedded document with a field score less than 10.

```python
cursor = db.restaurants.find({"grades.score": {"$lt": 10}})
```
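
Two operators on the same field belong in one operator document. Writing the field twice does not work in Python, because a dictionary literal keeps only the last value for a repeated key. The sketch uses the `semester` field of the courses collection and needs no database.

```python
# Repeating a key in a dict literal silently keeps only the last entry,
# so the "$gt" condition is lost here.
overwritten = {"semester": {"$gt": 1}, "semester": {"$lt": 3}}
print(overwritten)

# Put both operators into one operator document instead.
between = {"semester": {"$gt": 1, "$lt": 3}}
print(between)

# An explicit $and over separate conditions is an alternative spelling.
explicit = {"$and": [{"semester": {"$gt": 1}}, {"semester": {"$lt": 3}}]}
print(explicit)
```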



In [42]:
# TODO: Find all courses in semester greater than 1
# and print the title and the semester

## Counting

`count_documents()` takes a filter like `find()` but returns the number of matching documents.

In [43]:
# TODO: How many courses are in the second semester?

# Downloading Nobel Prize Winners with an API and storing them in MongoDB

![](https://upload.wikimedia.org/wikipedia/en/e/ed/Nobel_Prize.png)
The Nobel Prize offers a Web API: https://nobelprize.readme.io/docs/prize

Because the API returns JSON and MongoDB stores documents in a JSON-like format, a document database like MongoDB is a good fit for storing the API results. You can get all laureates at http://api.nobelprize.org/v1/laureate.json and all prizes at http://api.nobelprize.org/v1/prize.json

We will just download all laureates and prizes and store them in MongoDB!

In [None]:
# Create local "nobel" database on the fly
db = client["nobel"]
db.prizes.drop()
db.laureates.drop()
# API documented at https://nobelprize.readme.io/docs/prize
for collection_name in ["prizes", "laureates"]:
    singular = collection_name[:-1] # the API uses singular
    response = requests.get("http://api.nobelprize.org/v1/{}.json".format(singular))
    documents = response.json()[collection_name]
    # Create collections on the fly
    db[collection_name].insert_many(documents)

In [45]:
pprint(db.laureates.find_one())

{'_id': ObjectId('5cc5806d6879c205b3f2687d'),
 'born': '1845-03-27',
 'bornCity': 'Lennep (now Remscheid)',
 'bornCountry': 'Prussia (now Germany)',
 'bornCountryCode': 'DE',
 'died': '1923-02-10',
 'diedCity': 'Munich',
 'diedCountry': 'Germany',
 'diedCountryCode': 'DE',
 'firstname': 'Wilhelm Conrad',
 'gender': 'male',
 'id': '1',
 'prizes': [{'affiliations': [{'city': 'Munich',
                               'country': 'Germany',
                               'name': 'Munich University'}],
             'category': 'physics',
             'motivation': '"in recognition of the extraordinary services he '
                           'has rendered by the discovery of the remarkable '
                           'rays subsequently named after him"',
             'share': '1',
             'year': '1901'}],
 'surname': 'Röntgen'}


In [46]:
# TODO: Print the first name of the first document

With `count_documents` you can count the number of matching documents. 

In [None]:
# TODO: How many female laureates exist?

With the `$regex` operator you can match a field against a regular expression. `distinct()` returns only the distinct values of a field.

In [48]:
db.laureates.distinct("bornCountry", {"bornCountry": {"$regex": "Germany"}})

['Prussia (now Germany)',
 'Hesse-Kassel (now Germany)',
 'Germany',
 'Schleswig (now Germany)',
 'Germany (now Poland)',
 'Germany (now France)',
 'West Germany (now Germany)',
 'Bavaria (now Germany)',
 'Germany (now Russia)',
 'Mecklenburg (now Germany)',
 'W&uuml;rttemberg (now Germany)',
 'East Friesland (now Germany)']
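
A `$regex` pattern matches anywhere in the string unless it is anchored. Python's `re.search` behaves similarly for simple patterns like these, so the effect of anchoring can be previewed locally. The sketch reuses a few `bornCountry` values from the output above and is only an approximation, since MongoDB uses its own Perl-compatible regular expression engine; `preview` is a hypothetical helper.

```python
import re

born_countries = [
    "Prussia (now Germany)",
    "Germany",
    "Germany (now Poland)",
    "West Germany (now Germany)",
]


def preview(pattern, values):
    """Return the values in which the pattern is found (unanchored search)."""
    return [value for value in values if re.search(pattern, value)]


print(preview("Germany", born_countries))    # found anywhere in the string
print(preview("^Germany", born_countries))   # must start with "Germany"
print(preview("^Germany$", born_countries))  # must be exactly "Germany"
```

The corresponding filter for the last case would be `{"bornCountry": {"$regex": "^Germany$"}}`.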

In [49]:
# TODO: How many laureates are from Germany?

In [50]:
# TODO: Find all physics nobel laureates that are from Germany
# print the year of the first prize, the first name, and surname

In [51]:
# TODO: find and print the document for "Malala" (firstname)

## sort()

With `sort()` you can sort the documents returned by a query. The parameter of `sort()` is a list of tuples. Each tuple consists of a field name and an integer, 1 or -1, which states whether to sort by that field in ascending (1) or descending (-1) order.

Sort all restaurants according to the grade in ascending order:
```python
cursor = db.restaurants.find().sort([("grades.grade", 1)])
```
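
With several tuples, the first one is the primary sort key and later ones break ties. The sketch below imitates that ordering on a plain Python list so it can be tried without a database; `sort_like_mongo` is a hypothetical helper, and the three courses are taken from the course list earlier in this notebook.

```python
def sort_like_mongo(documents, sort_spec):
    """Order dicts by a list of (field, direction) tuples, direction being 1 or -1."""
    ordered = list(documents)
    # Python's sort is stable, so sorting by the last key first
    # leaves the first key as the primary ordering.
    for field, direction in reversed(sort_spec):
        ordered.sort(key=lambda doc: doc[field], reverse=(direction == -1))
    return ordered


courses = [
    {"title": "Data Science", "semester": 1},
    {"title": "Research Methods", "semester": 2},
    {"title": "Edge IoT & AI", "semester": 2},
]

# Semester descending, then title ascending within the same semester
for course in sort_like_mongo(courses, [("semester", -1), ("title", 1)]):
    print(course["semester"], course["title"])
```

The equivalent PyMongo call would be `db.courses.find().sort([("semester", -1), ("title", 1)])`.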

In [None]:
# TODO: Find only female nobel laureates
# and sort them according to the prize year in ascending order
# print year of the first prize, firstname, and surname