# Big Data Technologies - NoSQL Databases: MongoDB

MongoDB was installed using Docker with the following command in a terminal:

docker run -d -p 27017:27017 --name mongodb mongo:7

## Task 1

In [1]:
# Start container if needed
!docker start mongodb

mongodb


## Task 2

The MongoDB database and collection were created using the Shell with these commands:

1. docker exec -it mongodb mongosh
2. use university
3. db.createCollection("students")


## Task 3

Step 1: Install MongoDB Python Client Library:

In [4]:
import pip
from gettext import install
!pip install pymongo

Collecting pymongo
  Using cached pymongo-4.12.1-cp312-cp312-win_amd64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Using cached dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Using cached pymongo-4.12.1-cp312-cp312-win_amd64.whl (897 kB)
Using cached dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.12.1



[notice] A new release of pip is available: 24.2 -> 25.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip


Step 2: Import all packages for the subsequent steps and task:

In [70]:
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure
from faker import Faker
import random
import pandas as pd

print("Imported all packages!")

Imported all packages!


Step 3: Test the database connection. **Note:** For security reasons, the API token is read from an external file ('MongoDB_API.txt'), which is not included in this submission:

In [71]:
try:
    # Connect to MongoDB
    client = MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=3000)

    # Force connection by pinging the server
    client.admin.command("ping")

    # Access database and collection
    db = client["university"]
    collection = db["students"]

    print("Connection to MongoDB was successful.")
    print("Databases:", client.list_database_names())
    print("Collections in 'university':", db.list_collection_names())

except ConnectionFailure:
    print("Failed to connect to MongoDB.")

Connection to MongoDB was successful.
Databases: ['admin', 'config', 'local', 'university']
Collections in 'university': ['students']


## Task 4

Create random JSON-style documents. 200 random student documents are generated using the 'faker' library and stored in a list.  
Each document includes: 'student_id', 'ame', 'age', 'major', and 'grade'.

In [None]:
# Generate fake student data
faker = Faker()
programs = ["Computer Science", "Mathematics", "Data Science",  "Finance", "Economics", "Biology", "History"]
# Generate 200 unique 6-digit student IDs
student_ids = random.sample(range(100000, 999999), 200)

students = []
for student_id in student_ids:
    student = {
        "student_id": student_id,
        "name": faker.name(),
        "age": random.randint(18, 30),
        "program": random.choice(programs),
        "grade": round(random.uniform(1.0, 5.0), 2)
    }
    students.append(student)

# Show entries
for s in students:
    print(s)

{'student_id': 315275, 'name': 'Jennifer Johnson', 'age': 28, 'program': 'Computer Science', 'grade': 4.08}
{'student_id': 880015, 'name': 'Leslie Mendez', 'age': 27, 'program': 'Mathematics', 'grade': 4.12}
{'student_id': 984245, 'name': 'Pamela Nguyen', 'age': 27, 'program': 'Computer Science', 'grade': 2.65}
{'student_id': 289131, 'name': 'Jodi Dunn', 'age': 29, 'program': 'Data Science', 'grade': 1.77}
{'student_id': 982494, 'name': 'Dan Hartman', 'age': 23, 'program': 'Data Science', 'grade': 1.47}
{'student_id': 440893, 'name': 'Nancy Gibson', 'age': 23, 'program': 'Data Science', 'grade': 2.97}
{'student_id': 215396, 'name': 'Brandon Travis', 'age': 28, 'program': 'Data Science', 'grade': 1.86}
{'student_id': 961264, 'name': 'Luis Ford', 'age': 27, 'program': 'Computer Science', 'grade': 4.4}
{'student_id': 549186, 'name': 'Grant Mills', 'age': 27, 'program': 'Finance', 'grade': 1.65}
{'student_id': 301362, 'name': 'Brian Smith', 'age': 30, 'program': 'Mathematics', 'grade': 1.4

## Task 5

Step 1: **Insert** all 200 documents

In [73]:
# Insert the student documents into MongoDB
insert_result = collection.insert_many(students) # collection defined in step 2 of task 3 
print(f"Inserted {len(insert_result.inserted_ids)} student documents.")

# Real check if the documents are in the database
count = collection.count_documents({})
print(f"Total documents in collection 'students': {count}")

Inserted 200 student documents.
Total documents in collection 'students': 200


Step 2: **Update** all 'Mathematics' students by improving their grade by 0.3 (improving meaning decreasing by 0.3)

In [74]:
# 1. Get all Mathematics students and show first 3 as examples before the update
math_students_before = list(collection.find({"program": "Mathematics"}))
print("Grades for first 3 Mathematics students before update:")
for student in math_students_before[:3]:
    print(f"ID: {student['student_id']}, grade: {round(student['grade'], 2)}")
print()

# Update all students majoring in Mathematics: improve grade by 0.3
update_result = collection.update_many(
    {"program": "Mathematics"},  # filter condition
    {"$inc": {"grade": -0.3}}    # improve (decrease) grade by 0.3
)

# 3. Get all Mathematics students and show first 3 as examples after the update
math_students_after = list(collection.find({"program": "Mathematics"}))
print("Grades for first 3 Mathematics students after update::")
for student in math_students_after[:3]:
    print(f"ID: {student['student_id']}, grade: {round(student['grade'], 2)}")

Grades for first 3 Mathematics students before update:
ID: 880015, grade: 4.12
ID: 301362, grade: 1.46
ID: 127037, grade: 3.41

Grades for first 3 Mathematics students after update::
ID: 880015, grade: 3.82
ID: 301362, grade: 1.16
ID: 127037, grade: 3.11


Step 3: **Query** all 'Economics' students with a grade better than (lower than) 2.0

In [75]:
results = collection.find({
    "program": "Economics",
    "grade": {"$lt": 2.0}
})

# Print how many matched the query
econ_students_list = list(results)
print(f"Found {len(econ_students_list)} Economics students with grade better than 2.0.\n")

# Show first 3 as examples
print("Example results:")
for student in econ_students_list[:3]:
    print(student)


Found 14 Economics students with grade better than 2.0.

Example results:
{'_id': ObjectId('681c3c833ea16bfb05eacb4c'), 'student_id': 604245, 'name': 'Linda Herrera MD', 'age': 23, 'program': 'Economics', 'grade': 1.69}
{'_id': ObjectId('681c3c833ea16bfb05eacb54'), 'student_id': 876786, 'name': 'Debbie Valenzuela MD', 'age': 25, 'program': 'Economics', 'grade': 1.84}
{'_id': ObjectId('681c3c833ea16bfb05eacb55'), 'student_id': 667335, 'name': 'Michael Newton', 'age': 21, 'program': 'Economics', 'grade': 1.82}


Step 4: **Delete** all students with a grade worse than (higher than) 4.0

In [76]:
# 1. Delete documents 
delete_result = collection.delete_many({
    "grade": {"$gt": 4.0}
})
students_deleted_count = delete_result.deleted_count
print(f"Deleted {students_deleted_count} students with a grade worse than 4.0.")

# Show how may documents remain (should be 200 - students_deleted_count)
students_remaining_count = collection.count_documents({"grade": {"$lte": 4.0}})  # lte = less than equal
difference = 200 - students_deleted_count
print(f"Documents with a grade better than 4.0 still remaining: {students_remaining_count}")
print(f"The number of remaining students matches 200 - number of deleted students: {difference == students_remaining_count}")

Deleted 43 students with a grade worse than 4.0.
Documents with a grade better than 4.0 still remaining: 157
The number of remaining students matches 200 - number of deleted students: True


(Alternatively Step 5: **Delete** all entries)

In [77]:
#delete_result = collection.delete_many({})
#print(f"Deleted {delete_result.deleted_count} documents from the collection.")