In [1]:
# semantic.py
!pip install spacy
!python -m spacy download en_core_web_md

import spacy
from google.colab import files

# Upload the movies.txt file
uploaded = files.upload()  # This will prompt you to choose the file from your computer

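# Read every uploaded file, one movie per line, skipping blank lines
movies = []
for filename in uploaded:
    with open(filename, "r", encoding="utf-8") as f:
        # extend() keeps the lines from earlier files instead of overwriting them
        movies.extend(line.strip() for line in f if line.strip())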

# Load SpaCy medium model
nlp = spacy.load("en_core_web_md")

# Planet Hulk description
planet_hulk_desc = (
    "Will he save their world or destroy it? "
    "When the Hulk becomes too dangerous for the Earth, the Illuminati trick Hulk into a shuttle "
    "and launch him into space to a planet where the Hulk can live in peace. "
    "Unfortunately, Hulk lands on the planet Sakaar where he is sold into slavery and trained as a gladiator."
)

# Function to find the most similar movie
def most_similar_movie(description, movie_list):
    """Returns the movie with the description most similar to the input."""
    desc_doc = nlp(description)
    max_similarity = -1
    best_movie = None
    for movie in movie_list:
        movie_doc = nlp(movie)
        similarity = desc_doc.similarity(movie_doc)
        if similarity > max_similarity:
            max_similarity = similarity
            best_movie = movie
    return best_movie, max_similarity

# Run the function
recommended_movie, similarity_score = most_similar_movie(planet_hulk_desc, movies)
print(f"Recommended movie: {recommended_movie}")
print(f"Similarity score: {similarity_score:.3f}")

Collecting en-core-web-md==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.8.0/en_core_web_md-3.8.0-py3-none-any.whl (33.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 33.5/33.5 MB 57.3 MB/s eta 0:00:00
Installing collected packages: en-core-web-md
Successfully installed en-core-web-md-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


Saving movies.txt to movies.txt
Recommended movie: Movie F :In the last moments of World War II, a young German soldier fighting for survival finds a Nazi captain's uniform. Impersonating an officer, the man quickly takes on the monstrous identity of the perpetrators he is trying to escape from.
Similarity score: 0.947


spaCy’s medium model (en_core_web_md) ships with word vectors. By default, its similarity score is the cosine similarity between the averaged word vectors of the two texts.

Even though the two plots are very different (a WWII story vs. Planet Hulk), the score is high. spaCy is likely picking up on:

Words associated with action, conflict, danger, or survival (“soldier fighting for survival”, “monstrous identity”, etc.).

Terms such as “soldier” and “monstrous”, whose vectors may lie close to those of “Hulk”, “gladiator”, and “dangerous”.

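spaCy is not reading plot logic; it only measures how close the words of the two texts are in vector space. Because all token vectors are averaged, common function words contribute as well, which tends to push the score for any two English paragraphs toward the high end.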
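
To make the point above concrete, here is a minimal sketch of the arithmetic behind a vector-based score: average the word vectors of each text, then take the cosine of the two averages, which is spaCy's documented default for Doc.similarity. The three-dimensional vectors and the names TOY_VECTORS, average_vector, and cosine are invented for illustration; the real model's vectors are 300-dimensional and have different values.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
# They are NOT taken from en_core_web_md.
TOY_VECTORS = {
    "hulk": [0.9, 0.1, 0.3],
    "gladiator": [0.8, 0.2, 0.4],
    "soldier": [0.7, 0.3, 0.5],
    "the": [0.1, 0.9, 0.1],
    "a": [0.1, 0.8, 0.2],
}


def average_vector(words):
    """Average the toy vectors of the known words, component by component."""
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        raise ValueError("none of the words has a vector")
    return [sum(component) / len(known) for component in zip(*known)]


def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


text_a = ["the", "hulk", "a", "gladiator"]
text_b = ["the", "soldier", "a"]
score = cosine(average_vector(text_a), average_vector(text_b))
print(f"Similarity: {score:.3f}")

# Word order plays no role: reversing text_a gives the same average
# (up to floating-point rounding).
reversed_a = list(reversed(text_a))
same = all(
    math.isclose(x, y)
    for x, y in zip(average_vector(text_a), average_vector(reversed_a))
)
print(f"Same vector after reordering: {same}")
```

Every token contributes equally to the average and word order is ignored, so two texts can score highly without sharing a storyline.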

The model captures general similarity between words and concepts, but it does not understand context the way a human reader does.
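
Since a single top score says little on its own, one option is to rank every movie and compare the scores side by side, while also keeping the "Movie F :" style label out of the compared text. The sketch below is one possible way to do that, not part of the original notebook. The helpers split_label, rank_movies, and word_overlap and the two sample lines are hypothetical. The similarity function is passed in as a parameter so the example runs without a model; in the notebook you could pass lambda a, b: nlp(a).similarity(nlp(b)) instead of the simple word-overlap stand-in used here.

```python
def split_label(line):
    """Split 'Movie F :description' into ('Movie F', 'description').

    Assumes the label ends at the first colon, as in the output above.
    Lines without a colon are treated as description only.
    """
    label, sep, description = line.partition(":")
    if not sep:
        return "", line.strip()
    return label.strip(), description.strip()


def rank_movies(query, lines, similarity_fn, top_n=3):
    """Return up to top_n (score, label) pairs, highest score first."""
    scored = []
    for line in lines:
        label, description = split_label(line)
        if not description:
            continue  # nothing to compare against
        scored.append((similarity_fn(query, description), label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def word_overlap(a, b):
    """Stand-in similarity (NOT spaCy): Jaccard overlap of lowercase words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


# Hypothetical sample lines, invented for this example.
sample_lines = [
    "Movie X :A gladiator fights for freedom on a distant planet",
    "Movie Y :Two friends open a bakery in Paris",
]
query = "Hulk is trained as a gladiator on the planet Sakaar"

for score, label in rank_movies(query, sample_lines, word_overlap):
    print(f"{label}: {score:.3f}")
```

Seeing the full ranking makes it easier to judge whether the winning score actually stands out or whether all candidates score about the same.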