# 🎬 AI-Powered Movie Recommendation Engine (MongoDB + AI)

This notebook recommends movies based on:
- Similar **cast**
- Same **director/production**
- Similar **plot** using vector search (MongoDB Atlas Vector Search)
- Streaming availability using TMDb API


In [1]:
# 📦 Install required packages
!pip install "pymongo[srv]" sentence-transformers requests




In [6]:
# 🔗 Connect to MongoDB Atlas
from pymongo import MongoClient

# Replace <username> and <password> with your own Atlas credentials;
# never commit real credentials to a shared notebook
client = MongoClient('mongodb+srv://<username>:<password>@clusterhack.aea5ryf.mongodb.net/?retryWrites=true&w=majority&appName=Clusterhack')
db = client['sample_mflix']
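
One way to keep the connection string out of the notebook is to read it from an environment variable. The sketch below is only an illustration: the helper `load_mongo_uri` and the variable name `MONGODB_URI` are assumptions, not part of the original notebook or of PyMongo.

```python
import os
from typing import Mapping, Optional


def load_mongo_uri(env: Optional[Mapping[str, str]] = None,
                   var_name: str = "MONGODB_URI") -> str:
    """Return the MongoDB connection string stored in an environment variable.

    `var_name` is a hypothetical name chosen for this sketch.
    """
    source = os.environ if env is None else env
    uri = source.get(var_name, "").strip()
    if not uri:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    # Both the standard and the DNS seed list (SRV) schemes are accepted
    if not uri.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError(f"{var_name} does not look like a MongoDB URI")
    return uri


# Possible usage in the cell above:
# client = MongoClient(load_mongo_uri())
```

With this approach the notebook can be shared or committed without exposing a username or password.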

In [7]:
# 🎯 Fetch the movie document by title
movie_title = 'The Perils of Pauline'
movie_doc = db.movies.find_one({"title": movie_title})

if movie_doc:
    print(f"Found: {movie_doc['title']}")
else:
    print("Movie not found")

Found: The Perils of Pauline


In [8]:
# 🎭 Movies with similar cast
cast_members = movie_doc.get("cast", [])
similar_by_cast = db.movies.find({
    "cast": {"$in": cast_members},
    "title": {"$ne": movie_title}
}).limit(5)

print("\nSimilar by Cast:")
for movie in similar_by_cast:
    print(movie.get("title"))


Similar by Cast:
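
The `$in` query treats a single shared actor the same as several. As a possible refinement, candidates could be ranked by how many cast members they share with the seed movie. The sketch below works on plain dictionaries; `rank_by_shared_cast` is a hypothetical helper and the sample documents in the usage comment are invented for illustration.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def rank_by_shared_cast(seed_cast: Sequence[str],
                        candidates: Iterable[Dict[str, Any]],
                        top_n: int = 5) -> List[Tuple[str, int]]:
    """Return (title, shared_count) pairs, most shared cast members first."""
    seed = set(seed_cast)
    scored = []
    for doc in candidates:
        # Documents without a "cast" field simply share nothing
        shared = len(seed & set(doc.get("cast") or []))
        if shared:
            scored.append((doc.get("title", ""), shared))
    # Sort by overlap (descending), then by title for a stable order
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Possible usage with the cursor from the cell above (made-up example data):
# rank_by_shared_cast(cast_members, db.movies.find({"cast": {"$in": cast_members}}))
```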


In [9]:
# 🎬 Same director or production
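directors = movie_doc.get("directors", [])
# .get() returns None when the document has no "production" field; a filter of
# {"production": None} then matches every document where that field is missing or null
production = movie_doc.get("production")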

similar_by_team = db.movies.find({
    "$or": [
        {"directors": {"$in": directors}},
        {"production": production}
    ],
    "title": {"$ne": movie_title}
}).limit(5)

print("\nSimilar by Director/Production:")
for movie in similar_by_team:
    print(movie.get("title"))


Similar by Director/Production:
The Great Train Robbery
A Corner in Wheat
Winsor McCay, the Famous Cartoonist of the N.Y. Herald and His Moving Comics
Traffic in Souls
Gertie the Dinosaur
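
The titles above may not be genuine matches. If the seed document has no `production` field, `production` is `None`, and in MongoDB the filter `{"production": None}` matches documents where the field is null or absent, which could explain a list of unrelated films. A minimal sketch of a filter builder that skips empty criteria follows; `build_team_filter` is a hypothetical helper, not part of the original notebook.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_team_filter(directors: Optional[Sequence[str]],
                      production: Optional[str],
                      exclude_title: str) -> Optional[Dict[str, Any]]:
    """Build a find() filter from whichever criteria are actually present.

    Returns None when there is nothing to match on, so the caller can
    skip the query instead of matching nearly everything.
    """
    clauses: List[Dict[str, Any]] = []
    if directors:
        clauses.append({"directors": {"$in": list(directors)}})
    if production:
        clauses.append({"production": production})
    if not clauses:
        return None
    return {"$or": clauses, "title": {"$ne": exclude_title}}


# Possible usage:
# query = build_team_filter(directors, production, movie_title)
# similar_by_team = db.movies.find(query).limit(5) if query else []
```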


In [10]:
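# 🧠 Similar plot using vector search (MongoDB Atlas)
# Note: .get() returns None if the document has no "plot_embedding" field,
# and $vectorSearch rejects a queryVector of None
embedding = movie_doc.get("plot_embedding")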

similar_by_plot = db.movies.aggregate([
    {
        "$vectorSearch": {
            "index": "plot_vector_index",
            "path": "plot_embedding",
            "queryVector": embedding,
            "numCandidates": 100,
            "limit": 5
        }
    },
    # Runs after the limit above, so fewer than 5 titles may come back
    {"$match": {"title": {"$ne": movie_title}}}
])

print("\nSimilar by Plot:")
for movie in similar_by_plot:
    print(movie.get("title"))

OperationFailure: PlanExecutor error during aggregation :: caused by :: "queryVector" Unexpected type found when parsing vector (code 8, codeName 'UnknownError')
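
The error above is consistent with `queryVector` not being an array of numbers, for example `None` when the seed document has no `plot_embedding` field. A minimal sketch of a pipeline builder that validates the vector first is shown below. `build_plot_search_pipeline` is a hypothetical helper, and it assumes the embedding is stored as a plain array of numbers. It also requests one extra result so that `k` titles remain after the seed movie is filtered out.

```python
from numbers import Real
from typing import Any, Dict, List, Sequence


def build_plot_search_pipeline(embedding: Any,
                               exclude_title: str,
                               k: int = 5,
                               index: str = "plot_vector_index",
                               path: str = "plot_embedding",
                               num_candidates: int = 100) -> List[Dict[str, Any]]:
    """Return an aggregation pipeline for Atlas Vector Search, or raise ValueError."""
    if not isinstance(embedding, (list, tuple)) or not embedding:
        raise ValueError("embedding must be a non-empty list of numbers")
    if not all(isinstance(x, Real) and not isinstance(x, bool) for x in embedding):
        raise ValueError("embedding must contain only numbers")
    return [
        {"$vectorSearch": {
            "index": index,
            "path": path,
            "queryVector": list(embedding),
            # numCandidates must be at least as large as limit
            "numCandidates": max(num_candidates, k + 1),
            "limit": k + 1,  # one extra, since the seed movie is its own nearest match
        }},
        {"$match": {"title": {"$ne": exclude_title}}},
        {"$limit": k},
        {"$project": {"_id": 0, "title": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


# Possible usage:
# pipeline = build_plot_search_pipeline(embedding, movie_title)
# for movie in db.movies.aggregate(pipeline): print(movie["title"], movie["score"])
```

If the document has no stored embedding, one would have to be computed with the same model and dimensionality that the `plot_vector_index` index was built for; which model that is cannot be determined from this notebook.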

In [11]:
# 📺 Get streaming platform (TMDb API)
import requests

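def get_streaming_services(title):
    """Return TMDb watch-provider data for the first search hit, keyed by country code."""
    api_key = 'your_tmdb_api_key'  # placeholder: replace with a real TMDb API key
    res = requests.get('https://api.themoviedb.org/3/search/movie',
        params={"api_key": api_key, "query": title}, timeout=10)
    # An error response (e.g. invalid key) has no 'results' key, so this also returns {}
    results = res.json().get('results')
    if not results:
        return {}
    movie_id = results[0]['id']

    prov_url = f'https://api.themoviedb.org/3/movie/{movie_id}/watch/providers'
    watch = requests.get(prov_url, params={"api_key": api_key}, timeout=10)
    return watch.json().get("results", {})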

streaming_info = get_streaming_services(movie_title)
print("\nStreaming Platforms:")
print(streaming_info)


Streaming Platforms:
{}
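
An empty dictionary is what the function returns whenever the search yields no results, which includes the case where the placeholder API key has not been replaced. Once real data comes back, the per-country payload can be reduced to provider names. The sketch below assumes the response shape as I understand it from TMDb (country code, then `flatrate`/`rent`/`buy` lists of objects with a `provider_name`); `providers_for_region` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def providers_for_region(watch_results: Optional[Dict[str, Any]],
                         region: str = "US") -> Dict[str, List[str]]:
    """Extract provider names per offer type for one country code."""
    region_data = (watch_results or {}).get(region, {})
    summary: Dict[str, List[str]] = {}
    for offer_type in ("flatrate", "rent", "buy"):
        names = [p["provider_name"]
                 for p in region_data.get(offer_type, [])
                 if "provider_name" in p]
        if names:
            summary[offer_type] = names
    return summary


# Possible usage:
# print(providers_for_region(streaming_info, region="US"))
```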
