# MongoDB Lesson - Working Notes & Q&A

This notebook contains my questions and explanations from learning MongoDB concepts. It serves as a reference guide for understanding NoSQL database operations.

## Question 1: What does `client.list_database_names()` mean?

### My Question:
> "What does this code mean?"

### Explanation:
This code calls a **method** on the MongoDB client connection to retrieve a list of all database names available in the MongoDB cluster.

**Think of it like:** Asking your MongoDB server *"Hey, what databases do you have stored?"* - similar to asking someone *"What folders do you have in your filing cabinet?"*

**Breaking it down:**
- **`client`** - Connection object to the MongoDB cluster (like having a phone line to the database)
- **`.list_database_names()`** - Built-in PyMongo method that returns all database names as a list
- **Parentheses `()`** - Mean you're calling/executing this method

**Returns:** A Python list containing strings of all database names, like:
```python
['admin', 'sample_mflix', 'local', 'myapp_db']
```

**Real-world use:** This is typically one of the first things you do when connecting to a MongoDB cluster - it's like getting your bearings and seeing what's available to work with.
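
The list you get back mixes your own databases with the ones MongoDB uses internally (`admin`, `local`, and sometimes `config`). Here is a minimal sketch, assuming you only want the user-created databases. `user_databases` is a hypothetical helper, and the hard-coded list stands in for what `client.list_database_names()` would return.

```python
# Databases MongoDB uses for its own bookkeeping, not for your data.
SYSTEM_DATABASES = {"admin", "local", "config"}


def user_databases(names):
    """Return only the non-system database names (hypothetical helper)."""
    return [name for name in names if name not in SYSTEM_DATABASES]


# Stand-in for: names = client.list_database_names()
names = ["admin", "sample_mflix", "local", "myapp_db"]
print(user_databases(names))  # ['sample_mflix', 'myapp_db']
```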

In [None]:
# Example: List all databases in the cluster
client.list_database_names()

## Question 2: How do MongoDB collections relate to each other?

### My Question:
> "In my db, there are 6 tables. Will this code extract data from all 6 tables? How do the different tables link to each other?"

### Key Understanding:
**Important:** `movies.find_one()` will **only** extract data from **one single collection** - specifically the `movies` collection. It won't touch the other 5 collections.

### The 6 Collections in sample_mflix:
- `theaters` - Physical theater locations
- `sessions` - Login sessions for the Mflix demo app (a `user_id` plus a JWT token), not movie showtimes
- `users` - Customer/user accounts
- `movies` - Movie catalog/content
- `comments` - User comments on movies
- `embedded_movies` - A subset of movies that also stores a vector embedding of each plot (for vector search examples)

### How Collections Link Together:
**Think of it like a movie review website:**

1. **`movies` ↔ `comments`**: Movies have user comments
   - Link: `comments.movie_id` references `movies._id`

2. **`users` ↔ `comments`**: Users write comments about movies
   - Link: `comments.email` matches `users.email`

3. **`users` ↔ `sessions`**: Users log in to the app
   - Link: `sessions.user_id` stores a user's email address, the same kind of value as `users.email`

4. **`theaters`** stands on its own: no other collection references it

### MongoDB vs SQL:
- **SQL tables** are often connected through foreign keys and require JOINs
- **MongoDB collections** are more independent - they don't automatically "link" to each other
- You need to use **aggregation pipelines** with `$lookup` to join collections (like JOINs in SQL)
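
Conceptually, for each input document `$lookup` collects every document from the other collection whose field equals the local field, and stores them in a new array. The plain-Python sketch below mimics that. The data and the `lookup` helper are hypothetical and cover only the basic `localField`/`foreignField` equality form. The real stage runs on the server.

```python
# Tiny in-memory stand-ins for two collections (made-up sample data).
sample_movies = [
    {"_id": 1, "title": "Traffic in Souls"},
    {"_id": 2, "title": "A Star Is Born"},
]
sample_comments = [
    {"_id": 10, "movie_id": 1, "text": "Great silent film."},
    {"_id": 11, "movie_id": 1, "text": "A bit long."},
    {"_id": 12, "movie_id": 3, "text": "Comment on some other movie."},
]


def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Mimic the basic equality form of a $lookup stage in plain Python."""
    joined = []
    for doc in local_docs:
        matches = [f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)]
        # $lookup always adds an array, which is empty when nothing matches
        joined.append({**doc, as_field: matches})
    return joined


for movie in lookup(sample_movies, sample_comments, "_id", "movie_id", "related_comments"):
    print(movie["title"], len(movie["related_comments"]))
# Traffic in Souls 2
# A Star Is Born 0
```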

In [None]:
# Example: Access each collection separately
movie = db.movies.find_one()
comment = db.comments.find_one()
theater = db.theaters.find_one()

print("Movie:", movie['title'] if movie else "No movies found")
print("Comment:", comment['text'][:50] if comment else "No comments found")
print("Theater:", theater['theaterId'] if theater else "No theaters found")  # theaters documents have theaterId and location, no name

## Question 3: What is ObjectId and why do we need to import it?

### My Question:
> "What is this? why do we need to import this?"

### What is ObjectId?
`ObjectId` is MongoDB's **special data type** for unique identifiers: a 12-byte value, usually displayed as a 24-character hexadecimal string. Think of it like a super-powered primary key that carries more information than a simple auto-incrementing number.

### The Problem:
In MongoDB, every document has a unique `_id` field that looks like:
```
_id: ObjectId("573a1390f29313caabcd4eaf")
```

When you work with this ID as a plain Python string (like `"573a1390f29313caabcd4eaf"`), the query does not match, because a string and an `ObjectId` are different BSON types and MongoDB never treats them as equal.

### The Solution:
The import `from bson.objectid import ObjectId` gives you a **converter tool** that transforms a string into the proper MongoDB ObjectId format.

### Real-World Analogy:
Think of it like **international phone numbers**:
- You can't just dial "1234567890" to call someone internationally
- You need the proper format: "+1-123-456-7890"
- `ObjectId()` is like adding the country code and formatting - it converts your "string" into the format MongoDB actually understands

### When You Need It:
- **Searching by ID**: `movies.find_one({'_id': ObjectId('573a1390f29313caabcd4eaf')})`
- **Updating specific documents**: Using the exact `_id` to target one document
- **Creating relationships**: When one collection references another by `_id`

### What Happens Without It:
If you tried `movies.find_one({'_id': '573a1390f29313caabcd4eaf'})` (without ObjectId), MongoDB would return `None` because it's looking for an ObjectId but you gave it a string.
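
`ObjectId('...')` raises an error when the string is not a valid ObjectId, for example a user-typed ID, so it can help to check the string first. PyMongo's `bson` package provides `ObjectId.is_valid()` for this. The stdlib-only sketch below approximates that check for strings, using a hypothetical `looks_like_object_id` helper.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def looks_like_object_id(value):
    """True if value is a 24-character hex string (the text form of a 12-byte ObjectId)."""
    return isinstance(value, str) and len(value) == 24 and all(c in HEX_DIGITS for c in value)


print(looks_like_object_id("573a1390f29313caabcd4eaf"))  # True
print(looks_like_object_id("custom123"))                 # False
print(looks_like_object_id("573a1390f29313caabcd4eaZ"))  # False ('Z' is not hex)
```

In real code you would call `ObjectId(value)` only when the check passes, and otherwise treat the ID as "not found".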

In [None]:
from bson.objectid import ObjectId

# Example: Correct way to search by _id
movie = movies.find_one({'_id': ObjectId('573a1390f29313caabcd4eaf')})

# This would NOT work (returns None):
# movie = movies.find_one({'_id': '573a1390f29313caabcd4eaf'})

## Question 4: Is the _id auto-generated and unique?

### My Questions:
> "Is this id in MongoDB auto-generated? And is it the unique key for each collection?"

### Is the _id Auto-Generated?
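**Yes!** Every document you insert **automatically gets** a unique `_id`, unless you specifically provide one yourself. (PyMongo actually generates the `ObjectId` on the client before sending the document; the server adds one if a document arrives without it.)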

Think of it like getting an automatic ticket number when you take a number at the DMV - you don't have to create it, the system does it for you.

### Is it the Unique Key for Each Collection?
**Absolutely!** The `_id` field serves as the **primary key** for each collection.

### Key Properties:
- **Unique within each collection**: No two documents in the same collection can have the same `_id`
- **Immutable**: Once created, you cannot change a document's `_id`
- **Always indexed**: MongoDB automatically creates an index on `_id` for fast lookups
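
The two key properties (auto-generated when missing, unique within a collection) can be modelled with a dictionary keyed by `_id`. This is a toy sketch, not how MongoDB stores data. `ToyCollection` is hypothetical, the integer counter stands in for a real `ObjectId`, and `ValueError` stands in for PyMongo's `DuplicateKeyError`.

```python
import itertools


class ToyCollection:
    """In-memory stand-in (hypothetical) that mimics how _id behaves on insert."""

    def __init__(self):
        self._docs = {}                     # _id -> document, like the built-in _id index
        self._next_id = itertools.count(1)  # simple counter instead of a real ObjectId

    def insert_one(self, doc):
        doc = dict(doc)
        if "_id" not in doc:
            doc["_id"] = next(self._next_id)  # auto-generate when none is given
        if doc["_id"] in self._docs:
            # PyMongo would raise pymongo.errors.DuplicateKeyError here
            raise ValueError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]


toy = ToyCollection()
print(toy.insert_one({"title": "My Movie Auto"}))                        # 1
print(toy.insert_one({"_id": "custom123", "title": "My Movie Manual"}))  # custom123
try:
    toy.insert_one({"_id": "custom123", "title": "Another Movie"})
except ValueError as err:
    print("Rejected:", err)  # Rejected: duplicate _id: 'custom123'
```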

### ObjectId Structure:
When an `_id` is auto-generated, it is a 12-byte `ObjectId` made of:
- **Timestamp** (4 bytes): seconds since the Unix epoch when the ObjectId was created
- **Random value** (5 bytes): generated once per process, so IDs from different machines and processes don't collide (older versions stored a machine identifier and a process ID here)
- **Counter** (3 bytes): an incrementing number, starting from a random value

This makes it a smart ID: you can read back **when** it was created, and newer IDs sort roughly after older ones.
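
The timestamp is the easiest part to read back. The sketch below decodes it with the standard library from the 24-character hex form. PyMongo exposes the same value as the `ObjectId.generation_time` property, so in practice you would use that. `object_id_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone


def object_id_timestamp(hex_id):
    """Decode the creation time stored in the first 4 bytes of an ObjectId hex string."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], "big")  # big-endian seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = object_id_timestamp("573a1390f29313caabcd4eaf")
print(created.isoformat())  # 2016-05-16T18:38:08+00:00
```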

### Real-World Analogy:
Think of `_id` like a **Social Security Number** for documents:
- Every person (document) gets exactly one
- It's automatically assigned at "birth" (creation)
- It uniquely identifies that person (document) within the system (its collection)
- You can't change it once it's assigned

In [None]:
# Example: Auto-generated vs Manual _id

# Auto-generated _id (MongoDB creates it)
result1 = movies.insert_one({'title': 'My Movie Auto', 'year': 2024})
print(f"Auto-generated _id: {result1.inserted_id}")

# Manual _id (you provide it)
result2 = movies.insert_one({'_id': 'custom123', 'title': 'My Movie Manual', 'year': 2024})
print(f"Manual _id: {result2.inserted_id}")

# Clean up the test documents
movies.delete_one({'_id': result1.inserted_id})
movies.delete_one({'_id': 'custom123'})

## Key Learning Points Summary

### MongoDB Collection Concepts:
1. **Collections are independent** - one `find()` or `find_one()` call only reads one collection
2. **Relationships exist** but require explicit linking (via aggregation or application logic)
3. **Each collection has its own unique _id system**

### ObjectId Essentials:
1. **Import it when converting strings** - use `from bson.objectid import ObjectId` whenever you query by an `_id` you have as a string
2. **Auto-generated by default** - you don't need to create unique IDs
3. **Contains a timestamp** - the creation time can be read back from the ObjectId itself
4. **Immutable and unique** within each collection

### Practical Applications:
- Use `find_one()` for single document retrieval
- Use aggregation pipelines with `$lookup` for joining collections
- Always convert string IDs to ObjectId when querying by _id
- Let MongoDB auto-generate _id unless you have specific requirements

## Question 5: What does the `$gt` operator do in MongoDB queries?

### My Question:
> "Explain this" (referring to: `for m in movies.find({"released": {"$gt": d}}):`)

### What This Code Does:
This code searches for all movies that were **released after** December 1, 2015 and prints each one.

```python
for m in movies.find({"released": {"$gt": d}}):
    print(m)
```

### Breaking It Down:

#### **The Query Structure:**
- **`movies.find()`** - Searches through the movies collection
- **`{"released": {"$gt": d}}`** - The filter criteria

#### **The `$gt` Operator:**
- **`$gt`** stands for **"greater than"**
- It's a **comparison operator** in MongoDB
- **`d`** is the datetime object `datetime(2015, 12, 1)` from the previous line

### Real-World Analogy:
Think of this like asking a librarian: *"Show me all movies that came out after December 1, 2015."* The librarian goes through the catalog and hands you every movie released from December 2, 2015 onwards.

### How It Works Step-by-Step:
1. **MongoDB looks at each movie document**
2. **Checks the `released` field** (which contains a date)
3. **Compares it to December 1, 2015**
4. **If the movie's release date is later** → includes it in results
5. **Loops through all matching movies** and prints them
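
The same steps can be mimicked on a plain Python list. This only illustrates the comparison logic on made-up documents; MongoDB does the work on the server and can use an index instead of scanning every document. Note how a movie stored with a release date of exactly midnight on December 1 is excluded by `$gt` but included by `$gte`, and how a document without a `released` field never matches.

```python
from datetime import datetime

# Made-up documents; only the "released" field matters here.
sample_docs = [
    {"title": "Released on the cutoff", "released": datetime(2015, 12, 1)},
    {"title": "Released one day later", "released": datetime(2015, 12, 2)},
    {"title": "Older movie", "released": datetime(2010, 5, 1)},
    {"title": "No release date"},
]
cutoff = datetime(2015, 12, 1)

# Steps 1-4: look at each document, read "released", compare, keep it if later.
after_cutoff = [
    doc["title"]
    for doc in sample_docs
    if "released" in doc and doc["released"] > cutoff   # like {"$gt": cutoff}
]
on_or_after = [
    doc["title"]
    for doc in sample_docs
    if "released" in doc and doc["released"] >= cutoff  # like {"$gte": cutoff}
]
print(after_cutoff)  # ['Released one day later']
print(on_or_after)   # ['Released on the cutoff', 'Released one day later']
```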

### Why Use Date Filtering?
This is extremely useful in data science for:
- **Analyzing recent trends** - "What movies came out in the last 5 years?"
- **Time-series analysis** - Studying patterns over specific periods
- **Business intelligence** - "Show me all sales after Q4 2020"

### Related Query Operators:
- **`$gte`** - "greater than or equal" (includes December 1st)
- **`$lt`** - "less than" 
- **`$lte`** - "less than or equal"
- **`$ne`** - "not equal"
- **`$in`** - "in" (matches any value in an array)
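
A rough mental model for these operators: each one compares the stored field value with the value you pass, and several operators inside one condition must all hold (so `{"$gte": a, "$lte": b}` is a range). The toy matcher below is a hypothetical helper, not part of PyMongo, and only handles single scalar values. Real MongoDB also has rules for array fields, missing fields, and comparing values of different types.

```python
import operator

# Hypothetical mapping from a few MongoDB comparison operators to Python functions.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
}


def matches(value, condition):
    """Check one field value against a condition like {"$gte": 2015, "$lte": 2017}."""
    return all(OPERATORS[op](value, target) for op, target in condition.items())


years = [2009, 2015, 2016, 2018]
print([y for y in years if matches(y, {"$gte": 2015, "$lte": 2017})])  # [2015, 2016]
print([y for y in years if matches(y, {"$in": [2009, 2018]})])          # [2009, 2018]
print([y for y in years if matches(y, {"$ne": 2015})])                  # [2009, 2016, 2018]
```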

In [None]:
# Example: Date filtering with $gt operator
from datetime import datetime

# Set the comparison date
d = datetime(2015, 12, 1)

# Find all movies released after December 1, 2015
for m in movies.find({"released": {"$gt": d}}):
    print(f"Title: {m['title']}, Released: {m['released']}")

# Related query operators examples:

# 1. Greater than or equal ($gte) - includes the exact date
start_date = datetime(2015, 12, 1)
movies_from_dec_1st = movies.find({"released": {"$gte": start_date}})

# 2. Less than ($lt) - movies before a date
old_movies = movies.find({"released": {"$lt": datetime(2010, 1, 1)}})

# 3. Date range query (between two dates)
start_date = datetime(2015, 12, 1)
end_date = datetime(2015, 12, 15)
movies_in_range = movies.find({
    "released": {
        "$gte": start_date, 
        "$lte": end_date
    }
})

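# 4. Not equal ($ne) - on an array field like genres, this matches movies whose
#    genres do not contain "Action" (and also movies with no genres field at all)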
not_action_movies = movies.find({"genres": {"$ne": "Action"}})

# 5. In array ($in) - match any value in a list
specific_years = movies.find({"year": {"$in": [2015, 2016, 2017]}})

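# Note: find() returns a lazy cursor, so none of the queries above has run yet;
# iterating over a cursor (for loop, list(...)) is what sends the query to the server
print("Query cursors created (run them by iterating)")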

In [None]:
# Template for common MongoDB operations

# 1. List all collections
collections = db.list_collection_names()
print("Available collections:", collections)

# 2. Access a specific collection
movies = db.movies

# 3. Find one document (returns None if the collection is empty)
sample_movie = movies.find_one()

# 4. Query by ObjectId (guard first, so an empty collection doesn't raise a TypeError)
if sample_movie:
    print("Sample movie _id:", sample_movie['_id'])
    same_movie = movies.find_one({'_id': sample_movie['_id']})
    print("Found same movie:", same_movie['title'])

## MongoDB Functions Reference

This section compiles all the MongoDB functions and methods discussed in the main lesson, organized by category for easy reference.

### 🔗 **Connection & Setup Functions**

#### `MongoClient(uri, server_api)`
- **Purpose**: Creates a connection to MongoDB cluster
- **Example**: `client = MongoClient(uri, server_api=ServerApi('1'))`
- **When to use**: Initial setup to connect to your MongoDB database

#### `client.admin.command('ping')`
- **Purpose**: Tests if connection to MongoDB is successful
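- **Returns**: A small response document such as `{'ok': 1.0}`; raises an exception (e.g. a connection or timeout error) if the server cannot be reached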
- **When to use**: To verify your connection is working

---

### 🗂️ **Database & Collection Discovery Functions**

#### `client.list_database_names()`
- **Purpose**: Lists all databases available in the cluster
- **Returns**: Python list of database names
- **Example**: `['admin', 'sample_mflix', 'local']`
- **When to use**: To explore what databases are available

#### `db.list_collection_names()`
- **Purpose**: Lists all collections within a specific database
- **Returns**: Python list of collection names
- **Example**: `['movies', 'comments', 'users', 'theaters']`
- **When to use**: To see what collections exist in your database

---

### 📖 **Read (Query) Functions**

#### `collection.find_one()`
- **Purpose**: Retrieves a single document from collection
- **Returns**: First document found or None
- **Example**: `movies.find_one()` - gets any one movie
- **When to use**: When you need just one example document

#### `collection.find_one(query)`
- **Purpose**: Retrieves a single document matching specific criteria
- **Returns**: First matching document or None
- **Example**: `movies.find_one({'title': 'Traffic in Souls'})`
- **When to use**: When searching for a specific document

#### `collection.find()`
- **Purpose**: Retrieves multiple documents (returns cursor)
- **Returns**: Cursor object (iterable)
- **Example**: `for movie in movies.find(): print(movie)`
- **When to use**: When you need multiple documents

#### `collection.find().limit(n)`
- **Purpose**: Limits number of results returned
- **Returns**: Cursor with maximum n documents
- **Example**: `movies.find().limit(5)`
- **When to use**: To prevent overwhelming output or for pagination

#### `collection.find().sort(field, direction)`
- **Purpose**: Sorts results by specified field
- **Returns**: Sorted cursor
- **Example**: `movies.find().sort('year', pymongo.ASCENDING)`
- **When to use**: When you need ordered results

---

### 🔍 **Query Operators**

#### `{"field": {"$gt": value}}`
- **Purpose**: Greater than comparison
- **Example**: `{"released": {"$gt": datetime(2015, 12, 1)}}`
- **When to use**: For date/number range queries

#### `{"field": {"$gte": value}}`
- **Purpose**: Greater than or equal comparison
- **Example**: `{"comment_count": {"$gte": 1}}`
- **When to use**: For inclusive range queries

#### `{"field": {"$regex": "pattern"}}`
- **Purpose**: Text pattern matching
- **Example**: `{"plot": {"$regex": "spy"}}`
- **When to use**: For text search within fields
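
MongoDB evaluates `$regex` with PCRE-style regular expressions. For simple patterns like these, Python's `re.search` behaves the same way, so it is a handy way to preview what a pattern will match before running the query. The plot strings below are made up. Note that matching is case-sensitive unless you add `"$options": "i"`.

```python
import re

plots = [
    "A spy is sent behind enemy lines.",
    "Once upon a time, a Spy retired.",
    "A quiet family drama.",
]

# {"plot": {"$regex": "spy"}} -> substring match anywhere, case-sensitive
spy_matches = [p for p in plots if re.search("spy", p)]
# {"plot": {"$regex": "^Once upon a time"}} -> "^" anchors the match to the start
starts_with = [p for p in plots if re.search("^Once upon a time", p)]
# {"plot": {"$regex": "spy", "$options": "i"}} -> case-insensitive
spy_any_case = [p for p in plots if re.search("spy", p, re.IGNORECASE)]

print(spy_matches)   # ['A spy is sent behind enemy lines.']
print(starts_with)   # ['Once upon a time, a Spy retired.']
print(spy_any_case)  # ['A spy is sent behind enemy lines.', 'Once upon a time, a Spy retired.']
```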

#### Related Operators
- `$lt` - Less than comparison
- `$lte` - Less than or equal
- `$ne` - Not equal (exclude values)
- `$in` - Match any value in an array
- `$nin` - Not in array (exclude multiple values)
- `$exists` - Check whether a field exists
- `$regex` with `^` - Start-of-string pattern matching
- Range queries - Combining multiple operators on one field, e.g. `{"$gte": start, "$lte": end}`

---

### 🔧 **Data Manipulation Functions**

#### `collection.insert_one(document)`
- **Purpose**: Inserts a single new document
- **Returns**: InsertOneResult with inserted_id
- **Example**: `movies.insert_one({'title': 'New Movie', 'year': 2024})`
- **When to use**: Adding new records

#### `collection.insert_many(documents)`
- **Purpose**: Inserts multiple documents at once
- **Returns**: InsertManyResult with inserted_ids
- **Example**: `movies.insert_many([{'title': 'Movie A'}, {'title': 'Movie B'}])`
- **When to use**: Bulk data insertion

#### `collection.update_one(filter, update)`
- **Purpose**: Updates a single document matching filter
- **Returns**: UpdateResult with modification info
- **Example**: `movies.update_one({'title': 'Old Title'}, {'$set': {'title': 'New Title'}})`
- **When to use**: Modifying specific documents

#### `collection.update_many(filter, update)`
- **Purpose**: Updates multiple documents matching filter
- **Returns**: UpdateResult with modification info
- **Example**: `movies.update_many({'year': 2024}, {'$set': {'type': 'movie'}})`
- **When to use**: Bulk updates across multiple documents

#### `collection.delete_one(filter)`
- **Purpose**: Deletes a single document matching filter
- **Returns**: DeleteResult with deletion info
- **Example**: `movies.delete_one({'title': 'Movie to Delete'})`
- **When to use**: Removing specific records

#### `collection.delete_many(filter)`
- **Purpose**: Deletes multiple documents matching filter
- **Returns**: DeleteResult with deletion info
- **Example**: `movies.delete_many({'year': 2024})`
- **When to use**: Bulk deletion operations

---

### 🔀 **Advanced Operations - Aggregation**

#### `collection.aggregate(pipeline)`
- **Purpose**: Runs complex data processing pipeline
- **Returns**: Cursor with processed results
- **Example**: `movies.aggregate([{"$match": {"year": 2020}}])`
- **When to use**: For complex data analysis, joins, grouping

#### `{"$match": {query}}`
- **Purpose**: Filters documents in aggregation pipeline
- **Example**: `{"$match": {"title": "A Star Is Born"}}`
- **When to use**: Filtering inside a pipeline; placing it as early as possible reduces the work later stages have to do

#### `{"$sort": {field: direction}}`
- **Purpose**: Sorts documents in aggregation pipeline
- **Example**: `{"$sort": {"year": pymongo.ASCENDING}}`
- **When to use**: Ordering results in pipelines

#### `{"$lookup": {options}}`
- **Purpose**: Joins data from different collections (like SQL JOIN)
- **Example**: `{"$lookup": {"from": "comments", "localField": "_id", "foreignField": "movie_id", "as": "related_comments"}}`
- **When to use**: Combining data from multiple collections

#### `{"$group": {options}}`
- **Purpose**: Groups documents and performs aggregations
- **Example**: `{"$group": {"_id": "$year", "movie_count": {"$sum": 1}}}`
- **When to use**: For counting, averaging, grouping operations
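
`$group` collapses all documents that share the same `_id` expression into one output document. The plain-Python sketch below reproduces the example above on made-up data: `$year` becomes the grouping key, and `{"$sum": 1}` adds 1 per document, which gives a count. `$group` does not promise any output order, which is why a `$sort` stage usually follows it.

```python
from collections import Counter

# Made-up movie documents; only "year" matters for this grouping.
sample_movies = [
    {"title": "A", "year": 2015},
    {"title": "B", "year": 2016},
    {"title": "C", "year": 2015},
]

# {"$group": {"_id": "$year", "movie_count": {"$sum": 1}}}
counts = Counter(doc["year"] for doc in sample_movies)
grouped = [{"_id": year, "movie_count": n} for year, n in counts.items()]

# {"$sort": {"_id": 1}} to make the order predictable
grouped.sort(key=lambda doc: doc["_id"])
print(grouped)  # [{'_id': 2015, 'movie_count': 2}, {'_id': 2016, 'movie_count': 1}]
```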

#### `{"$addFields": {new_fields}}`
- **Purpose**: Adds new computed fields to documents
- **Example**: `{"$addFields": {"comment_count": {"$size": "$related_comments"}}}`
- **When to use**: Creating calculated fields

#### `{"$limit": number}`
- **Purpose**: Limits number of documents in pipeline
- **Example**: `{"$limit": 5}`
- **When to use**: Controlling output size in pipelines

---

### 🛠️ **Utility Functions**

#### `ObjectId(string)`
- **Purpose**: Converts string to MongoDB ObjectId format
- **Example**: `ObjectId('573a1390f29313caabcd4eaf')`
- **When to use**: When querying by _id field

#### `datetime(year, month, day)`
- **Purpose**: Creates Python datetime object for date queries
- **Example**: `datetime(2015, 12, 1)`
- **When to use**: For date-based filtering

---

### 📊 **Update & Aggregation Operators**

#### `{"$set": {field: new_value}}`
- **Purpose**: Sets field to new value
- **Example**: `{"$set": {"title": "New Title"}}`
- **When to use**: Updating specific fields

#### `{"$sum": 1}` or `{"$sum": "$field"}`
- **Purpose**: Aggregation accumulator (used inside `$group`) that counts documents or sums field values
- **Example**: `{"movie_count": {"$sum": 1}}`
- **When to use**: In aggregation for counting/summing

#### `{"$size": "$array_field"}`
- **Purpose**: Aggregation expression that returns the length of an array field
- **Example**: `{"$size": "$related_comments"}`
- **When to use**: Counting array elements

## Question 6: What does `.aggregate()` do in MongoDB?

### Understanding `.aggregate()` - The Data Processing Pipeline

Think of `.aggregate()` like a **data processing factory assembly line**. Each stage performs a specific operation on your data as it flows through the pipeline, transforming and analyzing documents in sophisticated ways.

### What `.aggregate()` Does:
- **Processes data in stages** - like an assembly line with multiple workstations
- **Transforms and analyzes** documents through a series of operations  
- **Combines multiple operations** that would require separate queries in simpler databases
- **Returns a cursor** you can iterate through, just like `find()`

### Why Use Aggregation vs Simple Queries?

#### Simple Queries (`find()`):
- **Purpose**: Basic filtering and retrieval
- **Like asking**: *"Show me all red cars"*
- **Best for**: Single-step operations, direct document matching

#### Aggregation Pipelines (`aggregate()`):
- **Purpose**: Complex data analysis and transformation
- **Like asking**: *"Show me all red cars, group them by year, count how many per year, sort by year, and only show years with more than 10 cars"*
- **Best for**: Multi-step operations, joins, calculations, analytics

### Real-World Analogy - Library Organization:

Imagine you're organizing a massive library with multiple sections:

1. **Stage 1** (`$lookup`): *"Bring related books from different sections together"*
   - Like gathering a main book + all its reviews from the review section

2. **Stage 2** (`$addFields`): *"Calculate how many reviews each book has"*  
   - Like counting review slips and writing the total on each book

3. **Stage 3** (`$match`): *"Only keep books with at least 1 review"*
   - Like filtering out books that nobody has reviewed

4. **Stage 4** (`$limit`): *"Take only the first 5 books"*
   - Like selecting just 5 books for display

Each stage receives the output from the previous stage and passes its results to the next stage.
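
The library example maps onto a chain of functions, each taking a list of documents and returning a new list. This plain-Python sketch uses made-up data and hypothetical stage functions. It starts from data that a `$lookup` stage would already have joined, then runs the other three stages in order.

```python
# Made-up books with their review slips already gathered (what $lookup would produce).
books = [
    {"title": "Book A", "reviews": ["good", "great"]},
    {"title": "Book B", "reviews": []},
    {"title": "Book C", "reviews": ["ok"]},
]


def add_review_count(docs):  # like {"$addFields": {"review_count": {"$size": "$reviews"}}}
    return [{**doc, "review_count": len(doc["reviews"])} for doc in docs]


def keep_reviewed(docs):     # like {"$match": {"review_count": {"$gte": 1}}}
    return [doc for doc in docs if doc["review_count"] >= 1]


def first_five(docs):        # like {"$limit": 5}
    return docs[:5]


toy_pipeline = [add_review_count, keep_reviewed, first_five]

result = books
for stage in toy_pipeline:
    result = stage(result)   # each stage receives the previous stage's output

print([(doc["title"], doc["review_count"]) for doc in result])  # [('Book A', 2), ('Book C', 1)]
```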

### How Aggregation Pipelines Work:

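```python
# Pipeline structure - it's just a list of stage documents!
pipeline = [
    {"$match": {"year": 2015}},  # First stage: keep only 2015 movies
    {"$sort": {"title": 1}},     # Second stage: order them by title
    {"$limit": 5},               # Third stage: keep the first 5
]

# Execute the pipeline (returns a cursor, just like find())
results = collection.aggregate(pipeline)
```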

### Key Benefits of Aggregation:
- **Server-side processing** - MongoDB does the heavy lifting
- **Efficient data flow** - Each stage only receives the documents passed on by the previous stage, so filtering early reduces the work done later
- **Complex analytics** - Joins, grouping, calculations in one operation
- **Flexible and powerful** - Can handle sophisticated data analysis tasks

### When to Use Aggregation:
- **Joining collections** (like SQL JOINs)
- **Grouping and counting** (like SQL GROUP BY)
- **Complex filtering** with multiple conditions
- **Data transformation** and calculation
- **Analytics and reporting** operations

The `.aggregate()` method is what makes MongoDB extremely powerful for data analysis - it's like having a built-in data science toolkit!

In [None]:
# Quick Reference - Common MongoDB Code Patterns

# === CONNECTION SETUP ===
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi
from bson.objectid import ObjectId
from datetime import datetime
import pymongo

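# Connect to MongoDB (`uri` is your connection string, defined earlier)
client = MongoClient(uri, server_api=ServerApi('1'))
db = client.database_name          # replace database_name with your database, e.g. sample_mflix
collection = db.collection_name    # replace collection_name with your collection, e.g. movies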

# === BASIC QUERIES ===
# Get one document
doc = collection.find_one()
doc = collection.find_one({"field": "value"})
doc = collection.find_one({"_id": ObjectId("573a1390f29313caabcd4eaf")})  # must be a 24-character hex string

# Get multiple documents
for doc in collection.find():
    print(doc)

for doc in collection.find({"field": "value"}).limit(5):
    print(doc)

# === COMMON QUERY PATTERNS ===
# Date range queries
start_date = datetime(2015, 12, 1)
end_date = datetime(2015, 12, 15)
collection.find({"released": {"$gt": start_date}})
collection.find({"released": {"$gte": start_date, "$lte": end_date}})

# Text search
collection.find({"plot": {"$regex": "spy"}})
collection.find({"plot": {"$regex": "^Once upon a time"}})  # starts with

# Sorting
collection.find().sort("year", pymongo.ASCENDING)
collection.find().sort("year", pymongo.DESCENDING)

# === CRUD OPERATIONS ===
# Create
result = collection.insert_one({"title": "New Movie", "year": 2024})
print(f"Inserted ID: {result.inserted_id}")

# Update
collection.update_one(
    {"title": "Old Title"}, 
    {"$set": {"title": "New Title"}}
)

# Delete
collection.delete_one({"title": "Movie to Delete"})

# === AGGREGATION PATTERNS ===
# Simple aggregation
pipeline = [
    {"$match": {"year": 2020}},
    {"$sort": {"title": 1}},
    {"$limit": 10}
]
results = collection.aggregate(pipeline)

# Join collections (lookup)
pipeline = [
    {"$lookup": {
        "from": "comments",
        "localField": "_id",
        "foreignField": "movie_id",
        "as": "movie_comments"
    }},
    {"$limit": 5}
]
results = collection.aggregate(pipeline)

# Group and count
pipeline = [
    {"$group": {
        "_id": "$year",
        "count": {"$sum": 1}
    }},
    {"$sort": {"_id": 1}}
]
results = collection.aggregate(pipeline)