<h2 style="font-weight:bold; text-align:center; padding:20px; border:4px solid yellow; border-radius:30px; background-color:#fffde7;">
🎬 TMDB 5000+ Movies Dataset Analysis
</h2>

<p align="center">
  <img src="https://cdn.pixabay.com/photo/2016/03/31/19/14/popcorn-1295546_1280.png" alt="Movies Banner" width="600"/>
</p>

---

### 📌 <span style="font-weight:bold; border-bottom: 3px solid yellow;">About the Dataset</span>

The <strong>TMDB 5000+ Movies Dataset</strong> contains detailed information from two CSV files:
- `tmdb_5000_movies.csv`
- `tmdb_5000_credits.csv`

It includes:

- 🎥 Movie titles  
- ⭐ Ratings (vote average and vote count)  
- 📅 Release dates  
- 🎭 Genres  
- 🧑‍🤝‍🧑 Cast & Crew  
- 💰 Budget and Revenue  
- 🏆 Taglines and Production Data  

---

### 📂 Dataset Columns Preview

#### `tmdb_5000_movies.csv`

| Column Name        | Description                            |
|--------------------|----------------------------------------|
| `title`            | Movie name                             |
| `release_date`     | Release date                           |
| `budget`           | Estimated budget                       |
| `revenue`          | Gross worldwide revenue                |
| `genres`           | Main genres (JSON format)              |
| `overview`         | Movie summary                          |
| `original_language`| Language of the movie                  |
| `runtime`          | Duration in minutes                    |
| `vote_average`     | Average rating                         |
| `vote_count`       | Total number of votes                  |

#### `tmdb_5000_credits.csv`

| Column Name | Description                                |
|-------------|--------------------------------------------|
| `movie_id`  | Movie ID (matches `id` in the movies file) |
| `title`     | Movie name                                 |
| `cast`      | Cast details (JSON)                        |
| `crew`      | Crew details (JSON)                        |

---




In [1]:
# 📚 Data Handling
import numpy as np
import pandas as pd

# 🛠️ File Handling
import os
import ast
import pickle

# 📊 Feature Extraction
from sklearn.feature_extraction.text import CountVectorizer

# 🧠 Similarity Calculation
from sklearn.metrics.pairwise import cosine_similarity


![Movie Banner](https://i.pinimg.com/1200x/e2/89/78/e28978b0c7977dfb32dd77947ab3881a.jpg)

<div style="display: flex; gap: 10px;">
  <img src="https://i.pinimg.com/736x/87/ce/f0/87cef0375e611123a7a53f157b2cf196.jpg" style="width: 25%;">
  <img src="https://i.pinimg.com/736x/96/0e/a6/960ea61c1e824f8c98055cf8f4efaf85.jpg" style="width: 25%;">
  <img src="https://i.pinimg.com/1200x/e3/c5/35/e3c535ac3bb37e216ebad126f578a6ff.jpg" style="width: 25%;">
  <img src="https://i.pinimg.com/1200x/05/a9/95/05a9951138f51d653a00b27423c4b7e4.jpg" style="width: 25%;">
</div>


<h2 style="font-weight:bold; text-align:center; padding:20px; border:4px solid green; border-radius:30px; background-color:#fffde7;">
🎬 Dataset Loading
</h2>

In [2]:
movies = pd.read_csv('/kaggle/input/tmdb_5000_movies.csv')
credits = pd.read_csv('/kaggle/input/tmdb_5000_credits.csv')


In [3]:
import os
print(os.listdir('/kaggle/input/'))


['tmdb_5000_movies.csv', 'tmdb_5000_credits.csv']


In [4]:
movies.head(2)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500


In [5]:
movies.shape

(4803, 20)

In [6]:
credits.head()

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


<h2 style="font-weight:bold; text-align:center; padding:20px; border:4px solid green; border-radius:30px; background-color:#fffde7;">
🎬 Data Merging
</h2>

In [7]:
movies = movies.merge(credits,on='title')
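Joining on `title` can produce extra rows whenever two different movies share a title, which may be why the later array has 4,806 rows while the raw movies file has 4,803. The sketch below illustrates the effect on toy frames; the rows are hypothetical, and joining on `id`/`movie_id` is shown only as one possible alternative, not as what this notebook does.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (hypothetical rows, not real data).
toy_movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Batman", "Batman", "Up"],
})
toy_credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Batman", "Batman", "Up"],
    "cast": ["cast A", "cast B", "cast C"],
})

# Joining on title pairs every "Batman" with every "Batman": 2 * 2 + 1 = 5 rows.
by_title = toy_movies.merge(toy_credits, on="title")

# Joining on the ID columns keeps exactly one row per movie: 3 rows.
by_id = toy_movies.merge(
    toy_credits.drop(columns="title"), left_on="id", right_on="movie_id"
)

print(len(by_title), len(by_id))
```

Running `movies['title'].duplicated().sum()` on the real data would show how many titles are affected.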

In [8]:
movies.head()
# Columns that look unnecessary for a content-based recommender:
# budget
# homepage
# id
# original_language
# original_title
# popularity
# production_companies
# production_countries
# release_date (not sure)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,206647,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,49026,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,49529,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [9]:
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]

In [10]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


<h2 style="font-weight:bold; text-align:center; padding:20px; border:4px solid green; border-radius:30px; background-color:#fffde7;">
🧹 Textual Data Transformation and Feature Engineering
</h2>

In [11]:
import ast

In [12]:
def convert(text):
    # Parse the JSON-like string and keep only each entry's 'name'
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])
    return L

In [13]:
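# Drop rows with a missing value in any selected column
# (reassigning avoids modifying a column slice in place)
movies = movies.dropna()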
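Dropping rows leaves gaps in the index labels, so a label obtained with a lookup such as `new[new['title'] == ...].index[0]` no longer has to equal the row's position. That matters once a NumPy array built from the frame is indexed by that number. A minimal sketch on a toy frame (hypothetical data) shows the difference and one way to realign the two with `reset_index`:

```python
import pandas as pd

# Toy frame with one missing overview (hypothetical data).
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "overview": ["first", None, "third"],
})

cleaned = toy.dropna()

# The index label of "C" is still 2, but it is now the second row (position 1).
label = cleaned[cleaned["title"] == "C"].index[0]
position = cleaned["title"].tolist().index("C")
print(label, position)

# Resetting the index makes labels and positions agree again.
realigned = cleaned.reset_index(drop=True)
print(realigned[realigned["title"] == "C"].index[0])
```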

In [14]:
movies['genres'] = movies['genres'].apply(convert)
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [15]:
movies['keywords'] = movies['keywords'].apply(convert)
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [16]:
import ast
ast.literal_eval('[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]')

[{'id': 28, 'name': 'Action'},
 {'id': 12, 'name': 'Adventure'},
 {'id': 14, 'name': 'Fantasy'},
 {'id': 878, 'name': 'Science Fiction'}]
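Because these columns hold JSON text, `json.loads` is an alternative to `ast.literal_eval`. The sketch below uses a hypothetical helper name, `names_from_json`, and a made-up string containing `null` to show where the two parsers differ; whether such values occur in this dataset is not established here.

```python
import ast
import json

raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

def names_from_json(text):
    """Return the 'name' field of every object in a JSON array string."""
    return [item["name"] for item in json.loads(text)]

print(names_from_json(raw))

# JSON-only literals such as null are valid JSON but not valid Python literals.
with_null = '[{"id": 1, "name": null}]'
parsed = json.loads(with_null)          # null becomes None
try:
    ast.literal_eval(with_null)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False

print(parsed, literal_eval_ok)
```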

In [17]:
def convert3(text):
    # Variant of convert that keeps only the first three names.
    # Not used below: the cast column is converted with convert and then sliced.
    return [i['name'] for i in ast.literal_eval(text)[:3]]

In [18]:
movies['cast'] = movies['cast'].apply(convert)
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weave...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley, ...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux, R...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman, A...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton,...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [19]:
movies['cast'] = movies['cast'].apply(lambda x:x[0:3])

In [20]:
def fetch_director(text):
    # Keep only crew members whose job is 'Director'
    L = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
    return L

In [21]:
movies['crew'] = movies['crew'].apply(fetch_director)

In [22]:
def collapse(L):
    # Remove spaces so a multi-word name becomes one token, e.g. "Sam Worthington" -> "SamWorthington"
    L1 = []
    for i in L:
        L1.append(i.replace(" ",""))
    return L1

In [23]:
movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)

In [24]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski]
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan]
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton]
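Removing the spaces matters because the vectorizer used later splits text on word boundaries. A small sketch with made-up inputs shows how `CountVectorizer` tokenizes names with and without spaces; the variable names are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# With spaces, the shared first name "Sam" becomes a token of its own,
# so two unrelated people would look partly similar.
spaced = CountVectorizer().fit(["Sam Worthington", "Sam Mendes"])
spaced_vocab = sorted(spaced.vocabulary_)

# Without spaces, each person maps to a single distinct token.
collapsed = CountVectorizer().fit(["SamWorthington", "SamMendes"])
collapsed_vocab = sorted(collapsed.vocabulary_)

print(spaced_vocab)
print(collapsed_vocab)
```

Tokens come out lowercased because `CountVectorizer` lowercases input by default.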


In [25]:
movies['overview'] = movies['overview'].apply(lambda x:x.split())

In [26]:
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

In [27]:
new = movies.drop(columns=['overview','genres','keywords','cast','crew'])
#new.head()

In [28]:
new['tags'] = new['tags'].apply(lambda x: " ".join(x))
new.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."


<h2 style="font-weight:bold; text-align:center; padding:20px; border:4px solid green; border-radius:30px; background-color:#fffde7;">
🧠 Text Vectorization Using CountVectorizer
</h2>

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>TF-IDF Explained</title>
  <style>
    body {
      background-color: #fffbea;
      font-family: 'Georgia', serif;
      padding: 50px;
      color: #3e3e3e;
    }
    h1 {
      font-size: 42px;
      color: #b8860b;
      border-bottom: 4px solid #f7ce5b;
      margin-bottom: 25px;
    }
    .formula-box {
      background-color: #fff8dc;
      border-left: 10px solid #f0ad4e;
      padding: 20px;
      font-size: 22px;
      margin-bottom: 20px;
      border-radius: 8px;
    }
    .explanation {
      font-size: 20px;
      line-height: 1.6;
      margin-bottom: 30px;
    }
  </style>
</head>
<body>

  <h1>📘 TF-IDF Formula with Explanation</h1>

  <div class="formula-box">
    <strong>TF(t,d)</strong> = (Number of times term <em>t</em> appears in document <em>d</em>) / (Total number of terms in <em>d</em>)
  </div>
  <p class="explanation">
    <strong>Term Frequency (TF)</strong> measures how often a specific word appears in a document. The frequency is normalized by dividing the count of the term by the total number of terms in the document, so that longer documents don’t have unfairly high term counts.
  </p>

  <div class="formula-box">
    <strong>IDF(t)</strong> = log (N / (1 + df<sub>t</sub>))
  </div>
  <p class="explanation">
    <strong>Inverse Document Frequency (IDF)</strong> measures how informative a word is across the whole collection. Here <em>N</em> is the total number of documents and df<sub>t</sub> is the number of documents that contain term <em>t</em>; adding 1 to the denominator avoids division by zero. A word that occurs in many documents gets a lower IDF, while a rare word gets a higher weight, which reduces the influence of common words such as “the” or “is”.
  </p>

  <div class="formula-box">
    <strong>TF-IDF(t,d)</strong> = TF(t,d) × IDF(t)
  </div>
  <p class="explanation">
    <strong>TF-IDF</strong> is the product of TF and IDF. It gives high scores to terms that occur often in one document but in few documents overall, which makes it useful for keyword extraction, search engines, and text classification. Note that the code in this notebook uses <code>CountVectorizer</code>, which produces raw term counts rather than TF-IDF weights.
  </p>

</body>
</html>
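The three formulas above can be worked through by hand on a tiny corpus. The sketch below implements them exactly as written, with IDF as log(N / (1 + df)); the documents and function names are made up for illustration. scikit-learn's `TfidfVectorizer` uses a differently smoothed IDF, so its numbers would not match these.

```python
import math
from collections import Counter

# Three tiny tokenized "documents" (hypothetical).
docs = [
    "space war future space".split(),
    "future crime drama".split(),
    "crime drama thriller".split(),
]
N = len(docs)

# df[t]: number of documents that contain term t at least once.
df = Counter(term for doc in docs for term in set(doc))

def tf(term, doc):
    """Term count divided by the total number of terms in the document."""
    return doc.count(term) / len(doc)

def idf(term):
    """log(N / (1 + df_t)), as in the formula above."""
    return math.log(N / (1 + df[term]))

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

print(tf("space", docs[0]))      # 2 of 4 terms
print(idf("space"))              # log(3 / 2)
print(idf("future"))             # log(3 / 3)
print(tf_idf("space", docs[0]))  # tf * idf
```

With this variant, a term found in two of the three documents already gets an IDF of zero, and a term found in all of them would get a negative one.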


In [29]:
from sklearn.feature_extraction.text import CountVectorizer
# Keep the 5,000 most frequent terms and ignore common English stop words
cv = CountVectorizer(max_features=5000, stop_words='english')

In [30]:
vector = cv.fit_transform(new['tags']).toarray()

In [31]:
vector.shape

(4806, 5000)
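Each row of this matrix is one movie and each column is one vocabulary term. A toy run with three made-up tag strings and `max_features=3` (chosen only to keep the output small) shows what `max_features` and `stop_words` do:

```python
from sklearn.feature_extraction.text import CountVectorizer

toy_tags = [
    "the future of space war",
    "a crime drama in the future",
    "space travel and space war",
]

# Keep only the 3 most frequent non-stop-word terms across the corpus.
toy_cv = CountVectorizer(max_features=3, stop_words="english")
toy_vectors = toy_cv.fit_transform(toy_tags).toarray()

# Columns follow the alphabetical order of the kept vocabulary.
toy_vocab = sorted(toy_cv.vocabulary_)
print(toy_vocab)
print(toy_vectors)
```

Terms such as "crime" and "travel" appear only once in the corpus, so they fall outside the three kept features.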

<h2 style="font-weight:bold; text-align:center; padding:20px; border:4px solid green; border-radius:30px; background-color:#fffde7;">
🔗 Measuring Similarity with Cosine Similarity
</h2>

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Cosine Similarity Explained</title>
  <style>
    body {
      background-color: #fffdf2;
      font-family: 'Georgia', serif;
      padding: 50px;
      color: #3e3e3e;
    }
    h1 {
      font-size: 42px;
      color: #d2691e;
      border-bottom: 4px solid #f0c36d;
      margin-bottom: 25px;
    }
    .formula-box {
      background-color: #fef9e7;
      border-left: 10px solid #ffcc00;
      padding: 20px;
      font-size: 24px;
      margin-bottom: 20px;
      border-radius: 10px;
    }
    .explanation {
      font-size: 20px;
      line-height: 1.6;
      margin-bottom: 30px;
    }
  </style>
</head>
<body>

  <h1>📏 Cosine Similarity Formula with Explanation</h1>

  <div class="formula-box">
    cos(θ) = (A · B) / (||A|| × ||B||)
  </div>
  <p class="explanation">
    Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space. It is commonly used to compare documents in NLP and recommender systems. If the angle is small (cosine value near 1), the vectors are similar.
  </p>

  <div class="formula-box">
    A · B = Σ A<sub>i</sub> × B<sub>i</sub><br>
    ||A|| = √(Σ A<sub>i</sub><sup>2</sup>)<br>
    ||B|| = √(Σ B<sub>i</sub><sup>2</sup>)
  </div>
  <p class="explanation">
    The dot product (A · B) is the numerator and measures the overlap between the values in vectors A and B. The denominator normalizes by the magnitudes (lengths) of both vectors. Cosine similarity is bounded between -1 and 1, with 1 meaning the vectors point in the same direction. For non-negative count vectors such as those used here, it lies between 0 and 1.
  </p>

</body>
</html>
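The formula can be checked directly with NumPy on two small count vectors. This is a sketch with made-up vectors; `cosine` is a hypothetical helper written for the comparison, and the scikit-learn call is the same `cosine_similarity` imported at the top of the notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

A = np.array([1, 1, 1])
B = np.array([0, 2, 1])

def cosine(a, b):
    """(a . b) / (||a|| * ||b||), following the formula above."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

manual = cosine(A, B)

# scikit-learn expects 2-D inputs: one row per vector.
library = float(cosine_similarity(A.reshape(1, -1), B.reshape(1, -1))[0, 0])

print(manual, library)
```

Here the dot product is 3 and the norms are √3 and √5, so both values equal 3 / √15.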


In [32]:
from sklearn.metrics.pairwise import cosine_similarity

In [33]:
similarity = cosine_similarity(vector)

In [34]:
similarity

array([[1.        , 0.08964215, 0.06071767, ..., 0.02519763, 0.0277885 ,
        0.        ],
       [0.08964215, 1.        , 0.06350006, ..., 0.02635231, 0.        ,
        0.        ],
       [0.06071767, 0.06350006, 1.        , ..., 0.02677398, 0.        ,
        0.        ],
       ...,
       [0.02519763, 0.02635231, 0.02677398, ..., 1.        , 0.07352146,
        0.04774099],
       [0.0277885 , 0.        , 0.        , ..., 0.07352146, 1.        ,
        0.05264981],
       [0.        , 0.        , 0.        , ..., 0.04774099, 0.05264981,
        1.        ]])

In [35]:
new[new['title'] == 'The Lego Movie'].index[0]

744

<h2 style="font-weight:bold; text-align:center; padding:20px; border:4px solid green; border-radius:30px; background-color:#fffde7;">
🎯 Movie Recommendation Function
</h2>

In [36]:
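def recommend(movie, top_n=5):
    # Use the row position, not the index label: dropna() left gaps in the
    # labels, while `similarity` and `new.iloc` are both positional.
    matches = np.flatnonzero(new['title'].to_numpy() == movie)
    if len(matches) == 0:
        print(f"'{movie}' was not found in the dataset")
        return
    index = matches[0]
    distances = sorted(enumerate(similarity[index]), reverse=True, key=lambda x: x[1])
    # Skip the first entry, which is the movie itself
    for i in distances[1:top_n + 1]:
        print(new.iloc[i[0]].title)
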

In [37]:
recommend('Gandhi')

Gandhi, My Father
The Wind That Shakes the Barley
A Passage to India
Guiana 1838
Ramanujan
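The same ranking can also be done with `np.argsort` and returned as a list instead of printed, which may be easier to reuse in an app. The sketch below runs on a toy similarity matrix; `top_similar`, the titles, and the numbers are all hypothetical.

```python
import numpy as np

titles = ["A", "B", "C", "D"]
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

def top_similar(position, sim_matrix, n=2):
    """Return the positions of the n most similar rows, excluding the row itself."""
    order = np.argsort(sim_matrix[position])[::-1]  # highest similarity first
    order = order[order != position]                # drop the query row
    return order[:n].tolist()

closest = top_similar(0, sim)
print([titles[i] for i in closest])
```

Filtering out the query row explicitly avoids relying on it being ranked first, which is not guaranteed when another row has an identical vector.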


<h2 style="font-weight:bold; text-align:center; padding:20px; border:4px solid green; border-radius:30px; background-color:#fffde7;">
💾 Saving Recommendation Data for Deployment
</h2>

In [38]:
import pickle

In [39]:
# Context managers make sure the files are flushed and closed
with open('movie_list.pkl', 'wb') as f:
    pickle.dump(new, f)
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
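A deployment script would read these two files back with `pickle.load`. The sketch below round-trips toy objects through a temporary directory so it can run on its own; the toy frame and matrix are stand-ins for `new` and `similarity`, and only the file names are taken from the cell above.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Toy stand-ins for the real objects (hypothetical data).
toy_list = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"]})
toy_similarity = np.array([[1.0, 0.3], [0.3, 1.0]])

with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "movie_list.pkl")
    sim_path = os.path.join(tmp, "similarity.pkl")

    # Save, as in the cell above.
    with open(list_path, "wb") as f:
        pickle.dump(toy_list, f)
    with open(sim_path, "wb") as f:
        pickle.dump(toy_similarity, f)

    # Load, as a separate app would.
    with open(list_path, "rb") as f:
        loaded_list = pickle.load(f)
    with open(sim_path, "rb") as f:
        loaded_similarity = pickle.load(f)

print(loaded_list.equals(toy_list))
print(np.array_equal(loaded_similarity, toy_similarity))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.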