## 🛠️ Project Kickoff: Setting Up MongoDB + Jupyter

Let’s get this thing going. First, we connect to MongoDB and confirm that our dataset is loaded.

In [None]:
# Import necessary libraries
from pymongo import MongoClient
import pprint

In [None]:
# Create a MongoClient instance
client = MongoClient()

In [None]:
# List all databases to confirm uk_food is listed
print(client.list_database_names())

In [None]:
# Access the uk_food database and list its collections
db = client['uk_food']
print(db.list_collection_names())

In [None]:
# Access and preview a document from the establishments collection
establishments = db['establishments']
pprint.pprint(establishments.find_one())

# 🧼 Eat Safe, Love – Data Cleaning Edition

## Notebook Setup
This notebook walks through the setup and cleanup of our food inspection dataset, making sure it’s polished before we do any analysis.

In [None]:
from pymongo import MongoClient
import pandas as pd
from pprint import pprint

In [None]:
# Create an instance of MongoClient
mongo = MongoClient(port=27017)

In [None]:
# assign the uk_food database to a variable name
db = mongo['uk_food']

In [None]:
# review the collections in our database
print(db.list_collection_names())

In [None]:
# assign the collection to a variable
establishments = db['establishments']

## Part 3: Exploratory Analysis Setup
Before we dig into specific questions, here’s the general plan:
- Use `count_documents` to see how many results we’re working with
- Use `pprint` to peek at the first entry
- Convert to a DataFrame, check row count, and preview top rows
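
One wrinkle in the last bullet: documents in this collection have nested fields (`scores`, `geocode`), so a plain `pd.DataFrame(...)` leaves them as dictionary-valued columns. Below is a minimal sketch of the flattening idea in plain Python. `flatten_document` is a hypothetical helper (not part of pymongo or pandas), and the sample document is a trimmed copy of the Penang Flavours record. In practice, `pd.json_normalize` does the same job.

```python
def flatten_document(document, parent_key="", separator="."):
    """Return a copy of `document` with nested dicts collapsed into dotted keys."""
    flat = {}
    for key, value in document.items():
        full_key = f"{parent_key}{separator}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-documents such as "scores" or "geocode"
            flat.update(flatten_document(value, full_key, separator))
        else:
            flat[full_key] = value
    return flat


# Trimmed sample document, modelled on the Penang Flavours record (illustration only)
sample = {
    "BusinessName": "Penang Flavours",
    "scores": {"Hygiene": "", "Structural": ""},
    "geocode": {"longitude": "0.08384000", "latitude": "51.49014200"},
}

flat_sample = flatten_document(sample)
print(sorted(flat_sample))
```

Each flattened key (for example `geocode.latitude`) would become its own DataFrame column.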

### 🔍 Who’s got a hygiene score of 20?

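In [None]:
# Find the establishments with a hygiene score of 20
# (assumes scores.Hygiene is stored as a number)
query = {"scores.Hygiene": 20}

# Use count_documents to display the number of documents in the result
print("Number of documents in result:", establishments.count_documents(query))

# Display the first document in the results using pprint
pprint(establishments.find_one(query))

In [None]:
# Convert the result to a Pandas DataFrame
hygiene_df = pd.DataFrame(list(establishments.find(query)))

# Display the number of rows in the DataFrame
print("Rows in DataFrame:", len(hygiene_df))

# Display the first 10 rows of the DataFrame
hygiene_df.head(10)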


### 🏙️ What are London's cleaner spots? (RatingValue >= 4)

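In [None]:
# Find the establishments that have London in the LocalAuthorityName and a RatingValue greater than or equal to 4.
# Assumes RatingValue has been converted to integers (Step 6 of the cleanup).
query = {"LocalAuthorityName": {"$regex": "London"}, "RatingValue": {"$gte": 4}}

# Use count_documents to display the number of documents in the result
print("Number of documents in result:", establishments.count_documents(query))

# Display the first document in the results using pprint
pprint(establishments.find_one(query))

In [None]:
# Convert the result to a Pandas DataFrame
london_df = pd.DataFrame(list(establishments.find(query)))

# Display the number of rows in the DataFrame
print("Rows in DataFrame:", len(london_df))

# Display the first 10 rows of the DataFrame
london_df.head(10)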


### ⭐ Closest, cleanest 5-star joints near Penang Flavours

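In [None]:
# Search within 0.01 degree on either side of the latitude and longitude.
# Rating value must equal 5
# Sort by hygiene score (ascending, on the assumption that a lower score is cleaner)

degree_search = 0.01
# Coordinates of Penang Flavours, taken from the record inserted during cleanup
latitude = 51.490142
longitude = 0.08384

query = {
    "RatingValue": 5,
    "geocode.latitude": {"$gte": latitude - degree_search, "$lte": latitude + degree_search},
    "geocode.longitude": {"$gte": longitude - degree_search, "$lte": longitude + degree_search},
}
sort = [("scores.Hygiene", 1)]
limit = 5  # assumption: the notebook does not say how many results to keep

# Print the results
results = list(establishments.find(query).sort(sort).limit(limit))
pprint(results)

In [None]:
# Convert result to Pandas DataFrame
pd.DataFrame(results)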
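
The 0.01 degree box only approximates "closest". If an actual distance ranking is wanted, here is a rough sketch using the haversine formula. `haversine_km` and `rank_by_distance` are hypothetical helpers, 6371 km is the usual mean Earth radius, the candidate documents are made up, and `geocode.latitude` and `geocode.longitude` are assumed to be numeric already (that is, after the cleanup conversion).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_distance(documents, latitude, longitude):
    """Sort documents by distance from (latitude, longitude), nearest first."""
    return sorted(
        documents,
        key=lambda doc: haversine_km(
            latitude, longitude, doc["geocode"]["latitude"], doc["geocode"]["longitude"]
        ),
    )


# Made-up documents for illustration only
candidates = [
    {"BusinessName": "Far", "geocode": {"latitude": 51.50, "longitude": 0.09}},
    {"BusinessName": "Near", "geocode": {"latitude": 51.4901, "longitude": 0.0838}},
]
ranked = rank_by_distance(candidates, 51.490142, 0.08384)
print([doc["BusinessName"] for doc in ranked])
```

In the notebook, something like `rank_by_distance(results, latitude, longitude)` could be applied to the documents returned by the bounding-box query.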


### 🧼 How many zero-hygiene places per local authority?

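In [None]:
# Create a pipeline that:
# 1. Matches establishments with a hygiene score of 0
# 2. Groups the matches by Local Authority
# 3. Sorts the matches from highest to lowest
pipeline = [
    {"$match": {"scores.Hygiene": 0}},
    {"$group": {"_id": "$LocalAuthorityName", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
results = list(establishments.aggregate(pipeline))

# Print the number of documents in the result
print("Number of local authorities in result:", len(results))

# Print the first 10 results
pprint(results[:10])

In [None]:
# Convert the result to a Pandas DataFrame
authority_df = pd.DataFrame(results)

# Display the number of rows in the DataFrame
print("Rows in DataFrame:", len(authority_df))

# Display the first 10 rows of the DataFrame
authority_df.head(10)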


## 🧽 Part 2: Data Cleanup – Fix It Before We Flip It

In [None]:
# New restaurant data
new_restaurant = {
    "BusinessName": "Penang Flavours",
    "BusinessType": "Restaurant/Cafe/Canteen",
    "BusinessTypeID": "",
    "AddressLine1": "Penang Flavours",
    "AddressLine2": "146A Plumstead Rd",
    "AddressLine3": "London",
    "AddressLine4": "",
    "PostCode": "SE18 7DY",
    "Phone": "",
    "LocalAuthorityCode": "511",
    "LocalAuthorityName": "Greenwich",
    "LocalAuthorityWebSite": "http://www.royalgreenwich.gov.uk",
    "LocalAuthorityEmailAddress": "health@royalgreenwich.gov.uk",
    "scores": {
        "Hygiene": "",
        "Structural": "",
        "ConfidenceInManagement": ""
    },
    "SchemeType": "FHRS",
    "geocode": {
        "longitude": "0.08384000",
        "latitude": "51.49014200"
    },
    "RightToReply": "",
    "Distance": 4623.9723280747176,
    "NewRatingPending": True
}
establishments.insert_one(new_restaurant)
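
Re-running the insert cell adds another copy of the restaurant each time. One way to keep the cell re-runnable is sketched below, under assumptions: `insert_if_missing` is a hypothetical helper, and treating `BusinessName` as the identifying field is a guess made for this example. It uses only `count_documents` and `insert_one`, which the notebook already relies on.

```python
def insert_if_missing(collection, document, key_field="BusinessName"):
    """Insert `document` unless one with the same `key_field` value already exists.

    Returns True when an insert happened, False when it was skipped.
    """
    match = {key_field: document[key_field]}
    if collection.count_documents(match) > 0:
        # A matching document is already stored, so leave the collection untouched
        return False
    collection.insert_one(document)
    return True
```

With this in place, the last line of the cell could become `insert_if_missing(establishments, new_restaurant)`.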

### 🍽️ Step 2: What’s the BusinessTypeID for 'Restaurant/Cafe/Canteen'?

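In [None]:
# Import here so the call works regardless of how pprint was imported earlier
from pprint import pprint

# Skip documents with an empty BusinessTypeID (such as the record inserted above)
pprint(establishments.find_one(
    {"BusinessType": "Restaurant/Cafe/Canteen", "BusinessTypeID": {"$ne": ""}},
    {"BusinessType": 1, "BusinessTypeID": 1, "_id": 0}
))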

### 🔄 Step 3: Let’s update Penang Flavours with that info

In [None]:
# Replace with actual ID from above after running
establishments.update_one(
    {"BusinessName": "Penang Flavours"},
    {"$set": {"BusinessTypeID": "1"}}  # <- Replace '1' with actual ID
)
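
To avoid copying the ID by hand, the lookup and the update can be chained. This is a sketch only: `copy_business_type_id` is a hypothetical helper, and it assumes at least one existing document of that business type carries a non-empty `BusinessTypeID`.

```python
def copy_business_type_id(collection, business_type, business_name):
    """Copy the BusinessTypeID used for `business_type` onto the named business.

    Returns the ID that was applied, or None if no reference document was found.
    """
    # Find any document of this type whose ID is already filled in
    reference = collection.find_one(
        {"BusinessType": business_type, "BusinessTypeID": {"$ne": ""}},
        {"BusinessTypeID": 1, "_id": 0},
    )
    if reference is None or "BusinessTypeID" not in reference:
        return None
    # Reuse the stored value as-is so its type (int or string) matches the rest of the data
    collection.update_one(
        {"BusinessName": business_name},
        {"$set": {"BusinessTypeID": reference["BusinessTypeID"]}},
    )
    return reference["BusinessTypeID"]
```

A call such as `copy_business_type_id(establishments, "Restaurant/Cafe/Canteen", "Penang Flavours")` would then replace the hard-coded value.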

### 🗑️ Step 4: Say goodbye to Dover’s establishments

In [None]:
dover_count = establishments.count_documents({"LocalAuthorityName": "Dover"})
print(f"Establishments in Dover before deletion: {dover_count}")

establishments.delete_many({"LocalAuthorityName": "Dover"})

dover_count_after = establishments.count_documents({"LocalAuthorityName": "Dover"})
print(f"Establishments in Dover after deletion: {dover_count_after}")
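
`delete_many` returns a result object whose `deleted_count` reports how many documents were removed, so the before and after counts can be cross-checked in one place. A sketch, with `delete_by_authority` as a hypothetical helper name:

```python
def delete_by_authority(collection, authority_name):
    """Delete every document for one local authority and return how many were removed."""
    query = {"LocalAuthorityName": authority_name}
    expected = collection.count_documents(query)
    result = collection.delete_many(query)
    # deleted_count is an attribute of pymongo's DeleteResult
    if result.deleted_count != expected:
        print(f"Warning: expected to delete {expected}, deleted {result.deleted_count}")
    return result.deleted_count
```

For example, `delete_by_authority(establishments, "Dover")` returns the number of Dover documents that were removed.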

### 📍 Step 5: Convert latitude/longitude to usable decimals

In [None]:
# Pipeline-style update (requires MongoDB 4.2+): overwrite the string coordinates with doubles
establishments.update_many(
    {},
    [{
        "$set": {
            "geocode.latitude": {"$toDouble": "$geocode.latitude"},
            "geocode.longitude": {"$toDouble": "$geocode.longitude"}
        }
    }]
)

### 🔢 Step 6: Convert `RatingValue` to integers

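In [None]:
# $toInt raises an error on any string that is not a whole number,
# so use $convert and store null for values that cannot be converted.
establishments.update_many(
    {},
    [{"$set": {"RatingValue": {
        "$convert": {"input": "$RatingValue", "to": "int", "onError": None, "onNull": None}
    }}}]
)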
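
After both conversions, it is worth checking that no string values are left behind. A small sketch: `count_remaining_strings` is a hypothetical helper; the `$type` query operator with the `"string"` alias is standard MongoDB.

```python
def count_remaining_strings(collection, fields):
    """Return {field: number of documents where that field is still stored as a string}."""
    return {
        field: collection.count_documents({field: {"$type": "string"}})
        for field in fields
    }
```

Calling `count_remaining_strings(establishments, ["geocode.latitude", "geocode.longitude", "RatingValue"])` should give a count of 0 for every field once the conversions have been applied.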