

| Library             | Meaning                          | Use in Project                                                                                                             |
| ------------------- | -------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| **`numpy` (`np`)**  | Numerical Python                 | Handles numbers, arrays, and math (used for similarity calculations).                                                      |
| **`pandas` (`pd`)** | Data analysis library            | Reads CSVs, cleans, merges, and manages movie data in tables (DataFrames).                                                 |
| **`ast`**           | Abstract Syntax Trees (built-in) | Converts string data like `"[{'id': 28, 'name': 'Action'}]"` into real Python lists/dicts so you can extract names easily. |

👉 In short:

* **pandas** → manage data
* **numpy** → handle math
* **ast** → fix string lists into real data
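
To see what `ast` contributes, here is a minimal sketch that turns a stringified list of dictionaries into real Python objects. The sample string is made up for illustration; the dataset stores values of the same shape in columns such as `genres`.

```python
import ast

# Hypothetical value shaped like the dataset's "genres" column (stored as text)
raw = "[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}]"

# ast.literal_eval parses Python literals (lists, dicts, strings, numbers)
# without executing arbitrary code, unlike eval()
parsed = ast.literal_eval(raw)

print(type(raw).__name__)     # str
print(type(parsed).__name__)  # list
names = [item["name"] for item in parsed]
print(names)                  # ['Action', 'Adventure']
```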


In [1]:
import numpy as np
import pandas as pd 
import ast


movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')


| Part                      | Meaning                                                                                                                         |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `pd.read_csv()`           | A **pandas function** that reads a **CSV (comma-separated values)** file and loads it as a **DataFrame** (like an Excel table). |
| `'tmdb_5000_movies.csv'`  | The **file name** of the movies dataset.                                                                                        |
| `'tmdb_5000_credits.csv'` | The **file name** of the credits dataset (cast, crew info).                                                                     |
| `movies`, `credits`       | Variables that now store those datasets as **pandas DataFrames**, so you can analyze or merge them later.                       |
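
A trailing comma after `pd.read_csv(...)` is easy to miss. When it is present, Python builds a one-element tuple instead of a DataFrame, and calls such as `movies.head()` then fail. The sketch below reproduces this with a small in-memory CSV so it runs without the TMDB files. The CSV text is invented for illustration.

```python
import io
import pandas as pd

csv_text = "title,budget\nAvatar,237000000\nSpectre,245000000\n"

# Trailing comma: the right-hand side becomes a 1-element tuple
wrong = pd.read_csv(io.StringIO(csv_text)),
print(type(wrong))  # <class 'tuple'>

# Without the comma you get the DataFrame you expect
right = pd.read_csv(io.StringIO(csv_text))
print(type(right))  # <class 'pandas.core.frame.DataFrame'>
print(right.shape)  # (2, 2)
```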




In [2]:
movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')



```python
movies.head(2)
```

| Part       | Meaning                                                               |
| ---------- | --------------------------------------------------------------------- |
| `movies`   | The pandas **DataFrame** that stores your movie dataset.              |
| `.head(2)` | A pandas **method** that shows the **first 2 rows** of the DataFrame. |





### 💡 Example Output:


|   | budget    | genres                                                                | homepage                                                       | id    | keywords                                                                     | ... | title                                    | vote_average | vote_count |
| - | --------- | --------------------------------------------------------------------- | -------------------------------------------------------------- | ----- | ---------------------------------------------------------------------------- | --- | ---------------------------------------- | ------------ | ---------- |
| 0 | 237000000 | [{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, ...]  | http://www.avatarmovie.com/                  | 19995 | [{'id': 1463, 'name': 'culture clash'}, {'id': 2964, 'name': 'future'}, ...] | ... | Avatar                                   | 7.2          | 11800      |
| 1 | 300000000 | [{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, ...] | http://disney.go.com/disneypictures/pirates/ | 285   | [{'id': 270, 'name': 'ocean'}, {'id': 726, 'name': 'drug abuse'}, ...]       | ... | Pirates of the Caribbean: At World's End | 6.9          | 4500       |




In [3]:
movies.head(2)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500




```python
movies.shape
```

| Part     | Meaning                                                                           |
| -------- | --------------------------------------------------------------------------------- |
| `movies` | Your pandas **DataFrame** containing the movies data.                             |
| `.shape` | A **property** (not a function) that returns the **dimensions** of the DataFrame. |

### 💡 Example Output:

```
(4803, 20)
```

| Number | Meaning                                 |
| ------ | --------------------------------------- |
| `4803` | Total number of **rows** (movies).      |
| `20`   | Total number of **columns** (features). |
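
Because `.shape` is a plain `(rows, columns)` tuple, you can unpack it directly. Here is a small sketch on a toy DataFrame standing in for `movies`; the values are illustrative.

```python
import pandas as pd

# Toy DataFrame standing in for `movies` (values are illustrative)
df = pd.DataFrame({
    "title": ["Avatar", "Spectre", "Titanic"],
    "budget": [237000000, 245000000, 200000000],
})

n_rows, n_cols = df.shape  # .shape is an attribute, so no parentheses
print(n_rows, n_cols)      # 3 2
print(len(df))             # 3, same as df.shape[0]
```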



In [4]:
movies.shape

(4803, 20)



#### **🧾 1️⃣ `credits.head(2)`**

| Part       | Meaning                                                                            |
| ---------- | ---------------------------------------------------------------------------------- |
| `credits`  | Your **DataFrame** that contains information about each movie’s **cast and crew**. |
| `.head(2)` | Shows the **first 2 rows** of that DataFrame.                                      |

✅ **Example output:**

|   | movie_id | title                                    | cast                                                                            | crew                                                                                                  |
| - | -------- | ---------------------------------------- | ------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
| 0 | 19995    | Avatar                                   | `[{'cast_id': 242, 'character': 'Jake Sully', 'name': 'Sam Worthington'}, ...]` | `[{'credit_id': '...', 'department': 'Directing', 'job': 'Director', 'name': 'James Cameron'}, ...]`  |
| 1 | 285      | Pirates of the Caribbean: At World's End | `[{'cast_id': 4, 'character': 'Captain Jack Sparrow', 'name': 'Johnny Depp'}, ...]` | `[{'credit_id': '...', 'department': 'Directing', 'job': 'Director', 'name': 'Gore Verbinski'}, ...]` |

This shows who acted in and who worked on each movie.

---

#### **🎭 2️⃣ `credits.head(2)['cast'].values`**

| Part              | Meaning                                                         |
| ----------------- | --------------------------------------------------------------- |
| `credits.head(2)` | First 2 rows again.                                             |
| `['cast']`        | Selects only the **“cast” column**.                             |
| `.values`         | Converts that column into a **NumPy array** (list-like format). |

✅ **Example output:**

```python
array(["[{'cast_id': 242, 'character': 'Jake Sully', 'name': 'Sam Worthington'}, ...]",
       "[{'cast_id': 4, 'character': 'Captain Jack Sparrow', 'name': 'Johnny Depp'}, ...]"], dtype=object)
```

👉 In short:

* `credits.head(2)` → shows the top 2 records from the credits table.
* `credits.head(2)['cast'].values` → extracts only the “cast” data (as text strings) for those 2 movies.
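
Because `.values` returns the raw strings, you still need to parse them before you can read actor names. The sketch below uses a made-up two-row `credits` table with JSON-style double quotes, matching the shape of the real CSV output. Newer pandas documentation recommends `.to_numpy()` as the equivalent of `.values`.

```python
import ast
import pandas as pd

# Hypothetical two-row stand-in for the real credits table
credits = pd.DataFrame({
    "title": ["Avatar", "Pirates of the Caribbean: At World's End"],
    "cast": [
        '[{"cast_id": 242, "character": "Jake Sully", "name": "Sam Worthington"}]',
        '[{"cast_id": 4, "character": "Captain Jack Sparrow", "name": "Johnny Depp"}]',
    ],
})

cast_strings = credits.head(2)["cast"].values   # array holding the raw cast strings
first_cast = ast.literal_eval(cast_strings[0])  # now a real list of dicts
print(type(cast_strings[0]).__name__)  # str
print(first_cast[0]["name"])           # Sam Worthington
```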


In [5]:
credits.head(2)
#credits.head(2)['cast'].values

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."




```python
credits.shape
```

| Part      | Meaning                                                                               |
| --------- | ------------------------------------------------------------------------------------- |
| `credits` | The **DataFrame** that stores movie cast and crew information.                        |
| `.shape`  | A pandas **property** that shows the **dimensions** (rows, columns) of the DataFrame. |

### 💡 Example Output:

```
(4803, 4)
```

| Number | Meaning                                                                     |
| ------ | --------------------------------------------------------------------------- |
| `4803` | Total number of **rows**: one per movie in the credits dataset.             |
| `4`    | Total number of **columns**: `movie_id`, `title`, `cast`, `crew`.           |

✅ **In short:**
`credits.shape` tells you that the credits dataset has **4803 movies** and **4 columns** of information.


In [6]:
credits.shape

(4803, 4)



```python
movies = movies.merge(credits, on='title')
```

| Part                         | Meaning                                                                         |
| ---------------------------- | ------------------------------------------------------------------------------- |
| `movies.merge(credits, ...)` | Joins (combines) the **movies** and **credits** DataFrames into one table.      |
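| `on='title'`                 | Matches rows on the **movie title** column, which exists in both files. The join is inner by default, so a title that appears more than once in either table produces one row per matching pair. |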
| `movies = ...`               | Stores the merged result back into the `movies` variable.                       |

### 💡 Example:

`movies` DataFrame:

| title  | genres | ... |
| ------ | ------ | --- |
| Avatar | Action | ... |

`credits` DataFrame:

| title  | cast            | crew          |
| ------ | --------------- | ------------- |
| Avatar | Sam Worthington | James Cameron |

✅ After merging → you get **one DataFrame** with all columns together:

| title  | genres | ... | cast            | crew          |
| ------ | ------ | --- | --------------- | ------------- |
| Avatar | Action | ... | Sam Worthington | James Cameron |
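
Joining on `title` pairs every matching title in every combination. That is why the merged `movies` has 4809 rows instead of 4803: some titles are shared by different films. The toy sketch below reproduces the effect with hypothetical IDs. It also shows pandas' `validate` argument catching the fan-out, and an alternative that joins on the numeric IDs (`id` in movies, `movie_id` in credits). That alternative is not the approach this notebook uses.

```python
import pandas as pd

# Hypothetical toy tables: two different films share the title "The Host"
movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Avatar", "The Host", "The Host"],
})
credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Avatar", "The Host", "The Host"],
})

# Inner join on title: each "The Host" row pairs with both credits rows
by_title = movies.merge(credits, on="title")
print(len(by_title))  # 5 (1 for Avatar + 2 x 2 for "The Host")

# validate="one_to_one" makes pandas raise instead of silently fanning out
try:
    movies.merge(credits, on="title", validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)  # True

# Alternative (not the notebook's approach): join on the numeric IDs
by_id = movies.merge(credits, left_on="id", right_on="movie_id")
print(len(by_id))  # 3
```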



In [7]:
movies = movies.merge(credits,on = 'title')

In [8]:
movies.shape

(4809, 23)

In [9]:
movies.head(2)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."




```python
movies['original_language'].value_counts()
```

| Part                          | Meaning                                                                 |
| ----------------------------- | ----------------------------------------------------------------------- |
| `movies['original_language']` | Selects the **“original_language”** column from the `movies` DataFrame. |
| `.value_counts()`             | Counts how many times each language appears (frequency count).          |

### 💡 Example Output:

```
en    4510
fr      70
es      32
zh      27
de      27
hi      19
...
```

| Code | Language | Count       |
| ---- | -------- | ----------- |
| `en` | English  | 4510 movies |
| `fr` | French   | 70 movies   |
| `es` | Spanish  | 32 movies   |
| `zh` | Chinese  | 27 movies   |
| `de` | German   | 27 movies   |
| `hi` | Hindi    | 19 movies   |

✅ **In short:**
This command shows **how many movies were made in each original language**. The large majority (4510 of the 4809 rows) are **English (`en`)**.
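
To see how dominant one language is as a proportion rather than a raw count, `value_counts` accepts `normalize=True`. Here is a small sketch on made-up data. On the real column, `en` would come out at roughly 0.938 (4510 / 4809).

```python
import pandas as pd

# Illustrative language codes, not the real dataset
langs = pd.Series(["en", "en", "en", "fr", "es", "en"], name="original_language")

counts = langs.value_counts()                # absolute counts, most frequent first
shares = langs.value_counts(normalize=True)  # fractions that sum to 1.0

print(counts["en"])            # 4
print(round(shares["en"], 3))  # 0.667
```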


In [10]:
movies['original_language'].value_counts()

original_language
en    4510
fr      70
es      32
zh      27
de      27
hi      19
ja      16
it      14
ko      12
cn      12
ru      11
pt       9
da       7
sv       5
nl       4
fa       4
th       3
he       3
id       2
cs       2
ta       2
ro       2
ar       2
te       1
hu       1
xx       1
af       1
is       1
tr       1
vi       1
pl       1
nb       1
ky       1
no       1
sl       1
ps       1
el       1
Name: count, dtype: int64



```python
movies.info()
```

| Part      | Meaning                                                                    |
| --------- | -------------------------------------------------------------------------- |
| `movies`  | Your pandas **DataFrame** containing the movie dataset.                    |
| `.info()` | A **pandas method** that shows a quick summary of the dataset’s structure. |

---

### 💡 Example Output (abridged):

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4809 non-null   int64  
 1   genres                4809 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4809 non-null   int64  
 4   keywords              4809 non-null   object 
 5   original_language     4809 non-null   object 
 6   original_title        4809 non-null   object 
 7   overview              4806 non-null   object 
 8   popularity            4809 non-null   float64
 9   production_companies  4809 non-null   object 
 10  production_countries  4809 non-null   object 
 11  release_date          4808 non-null   object 
 12  revenue               4809 non-null   int64  
 13  runtime               4807 non-null   float64
 14  spoken_languages      4809 non-null   object 
 ... (columns 15-22: status, tagline, title, vote_average, vote_count, movie_id, cast, crew)
dtypes: float64(3), int64(5), object(15)
memory usage: 864.0+ KB
```

---

✅ **In short:**
`movies.info()` gives you:

* Total rows and columns
* Column names
* Number of non-missing values
* Each column‚Äôs data type
* Memory used

👉 It’s mainly used to **understand dataset structure and find missing values** quickly.


In [11]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4809 non-null   int64  
 1   genres                4809 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4809 non-null   int64  
 4   keywords              4809 non-null   object 
 5   original_language     4809 non-null   object 
 6   original_title        4809 non-null   object 
 7   overview              4806 non-null   object 
 8   popularity            4809 non-null   float64
 9   production_companies  4809 non-null   object 
 10  production_countries  4809 non-null   object 
 11  release_date          4808 non-null   object 
 12  revenue               4809 non-null   int64  
 13  runtime               4807 non-null   float64
 14  spoken_languages      4809 non-null   object 
 15  status               



```python
# Columns to keep for the recommender:
# movie_id, title, overview, genres, keywords, cast, crew

movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]
```

---

### 🧩 Explanation:

| Part                                                                  | Meaning                                                                 |
| --------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| `movies`                                                              | Your main DataFrame containing all movie data.                          |
| `[['movie_id','title','overview','genres','keywords','cast','crew']]` | A **list of column names** you want to **keep**.                        |
| `movies = ...`                                                        | Reassigns the DataFrame so it now only contains those selected columns. |

---

### 🎯 Purpose:

This line **reduces the dataset** to keep **only the useful information** needed for building the recommender system.

| Column     | Description                          |
| ---------- | ------------------------------------ |
| `movie_id` | Unique ID for each movie             |
| `title`    | Movie name                           |
| `overview` | Short summary of the movie           |
| `genres`   | List of genres (e.g. Action, Drama)  |
| `keywords` | Important words describing the movie |
| `cast`     | Cast list (actors and their characters) |
| `crew`     | Crew members (e.g. director)         |

---

✅ **In short:**
This line **filters the dataset** to keep **only 7 important columns** required for your movie recommendation model.
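
Here is a small sketch of the same column selection on a toy frame with illustrative values. It adds `.copy()`, which is not in the original notebook. The copy makes the result an independent DataFrame, which is a common way to avoid pandas' `SettingWithCopyWarning` when the selected columns are modified later (as the `apply` steps below do).

```python
import pandas as pd

# Toy frame with more columns than we need (values are illustrative)
movies = pd.DataFrame({
    "movie_id": [19995],
    "title": ["Avatar"],
    "budget": [237000000],
    "homepage": ["http://www.avatarmovie.com/"],
    "overview": ["In the 22nd century, a paraplegic Marine..."],
})

keep = ["movie_id", "title", "overview"]
# Outer [] indexes the DataFrame; the inner list names the columns to keep.
# .copy() (an addition here) returns an independent DataFrame.
movies = movies[keep].copy()
print(list(movies.columns))  # ['movie_id', 'title', 'overview']
```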


In [12]:
# genres
# id
# keyword
# title
# overview
# cast
# crew

movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]



```python
movies.head(3)
```

| Part       | Meaning                                              |
| ---------- | ---------------------------------------------------- |
| `movies`   | The pandas **DataFrame** containing your movie data. |
| `.head(3)` | Displays the **first 3 rows** of that DataFrame.     |

---

### 💡 Example Output:

|   | movie_id | title                                    | overview                                                                                      | genres                                                                  | keywords                                                                       | cast                                                                                        | crew                                                   |
| - | -------- | ---------------------------------------- | --------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- | ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------- | ------------------------------------------------------ |
| 0 | 19995    | Avatar                                   | In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora...                 | `[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, ...]`  | `[{'id': 1463, 'name': 'culture clash'}, {'id': 2964, 'name': 'future'}, ...]` | `[{'cast_id': 242, 'name': 'Sam Worthington'}, {'cast_id': 3, 'name': 'Zoe Saldana'}, ...]` | `[{'job': 'Director', 'name': 'James Cameron'}, ...]`  |
| 1 | 285      | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, has come back to life...                          | `[{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, ...]` | `[{'id': 270, 'name': 'ocean'}, {'id': 726, 'name': 'drug abuse'}, ...]`       | `[{'cast_id': 4, 'name': 'Johnny Depp'}, {'cast_id': 8, 'name': 'Orlando Bloom'}, ...]`     | `[{'job': 'Director', 'name': 'Gore Verbinski'}, ...]` |
| 2 | 206647   | Spectre                                  | A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization... | `[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, ...]`  | `[{'id': 470, 'name': 'spy'}, ...]`                                            | `[{'cast_id': 1, 'name': 'Daniel Craig'}, {'cast_id': 2, 'name': 'Christoph Waltz'}, ...]`  | `[{'job': 'Director', 'name': 'Sam Mendes'}, ...]`     |

---

✅ **In short:**
`movies.head(3)` → shows the **first 3 rows** of your filtered dataset so you can preview the movie data you kept (columns like title, genres, cast, etc.).


In [13]:
movies.head(3)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."




```python
movies.isnull().sum()
```

| Part        | Meaning                                                                                        |
| ----------- | ---------------------------------------------------------------------------------------------- |
| `movies`    | Your **DataFrame** containing movie data.                                                      |
| `.isnull()` | Checks each cell → returns **True** if the value is missing (`NaN`), otherwise **False**.      |
| `.sum()`    | Adds up the `True` values in each column → giving **the number of missing values per column**. |

---

### 💡 Example Output:

| Column   | Missing Values |
| -------- | -------------- |
| movie_id | 0              |
| title    | 0              |
| overview | 3              |
| genres   | 0              |
| keywords | 0              |
| cast     | 0              |
| crew     | 0              |

---

✅ **In short:**
This command tells you **how many missing (empty) values** each column has ‚Äî useful to find and clean `NaN` data before processing.
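
Here is a toy sketch of the same check. Both `np.nan` and `None` count as missing in an object column, and `sum()` treats `True` as 1. The data is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "title": ["Avatar", "Spectre", "Titanic"],
    "overview": ["A Marine on Pandora...", np.nan, None],
})

missing = df.isnull().sum()  # True counts as 1, False as 0
print(missing["overview"])   # 2
print(missing["title"])      # 0
```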


In [14]:
movies.isnull().sum()

movie_id    0
title       0
overview    3
genres      0
keywords    0
cast        0
crew        0
dtype: int64



```python
movies.dropna(inplace=True)
```

| Part           | Meaning                                                                         |
| -------------- | ------------------------------------------------------------------------------- |
| `movies`       | Your **DataFrame** containing movie data.                                       |
| `.dropna()`    | Removes (drops) any **rows that contain missing values (`NaN`)** in any column. |
| `inplace=True` | Modifies the **original DataFrame directly** instead of returning a new one.    |

---

### 💡 Example:

Before:

| title   | overview               | genres |
| ------- | ---------------------- | ------ |
| Avatar  | A Marine on Pandora... | Action |
| Titanic | *NaN*                  | Drama  |

After:

| title  | overview               | genres |
| ------ | ---------------------- | ------ |
| Avatar | A Marine on Pandora... | Action |

---

✅ **In short:**
`movies.dropna(inplace=True)` → **removes every row that has a missing value** (here, the 3 movies with no overview) and **updates your `movies` DataFrame** in place.
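
With no arguments, `dropna()` drops a row if *any* column is missing. If only some columns matter, `subset=` restricts the check. Assigning the result instead of using `inplace=True` is the other common style. Here is a sketch on toy data with placeholder text.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "title": ["Avatar", "Titanic", "Spectre"],
    "overview": ["A Marine on Pandora...", np.nan, "A cryptic message..."],
    "tagline": [np.nan, "tagline B", "tagline C"],
})

any_missing = df.dropna()                       # drops rows with a NaN in any column
overview_only = df.dropna(subset=["overview"])  # only checks the overview column

print(len(any_missing))    # 1
print(len(overview_only))  # 2
```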


In [15]:
movies.dropna(inplace=True)



```python
movies.isnull().sum()
```

Since you already ran

```python
movies.dropna(inplace=True)
```

all rows with missing values have been removed, so this check should now report `0` for every column.

---

### 💡 Example Output (after cleaning):

| Column   | Missing Values |
| -------- | -------------- |
| movie_id | 0              |
| title    | 0              |
| overview | 0              |
| genres   | 0              |
| keywords | 0              |
| cast     | 0              |
| crew     | 0              |

---

✅ **In short:**
Now every column shows **0 missing values**, so your `movies` DataFrame has no gaps left and is ready for the next steps (like text processing and feature creation).


In [16]:
movies.isnull().sum()

movie_id    0
title       0
overview    0
genres      0
keywords    0
cast        0
crew        0
dtype: int64


```python
movies.duplicated().sum()
```

| Part            | Meaning                                                                             |
| --------------- | ----------------------------------------------------------------------------------- |
| `movies`        | Your DataFrame containing movie data.                                               |
| `.duplicated()` | Checks for **duplicate rows**: returns `True` for each row that is identical, in every column, to an earlier row. |
| `.sum()`        | Counts how many `True` values there are → i.e., **total number of duplicate rows**.                                |

---

### 💡 Example Output:

```
0
```

✅ **In short:**
This tells you **how many duplicate rows** exist in your dataset.
If it shows `0`, there are **no fully identical rows**. Rows that share a title but differ in another column (for example, `movie_id`) are not counted as duplicates.
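
The difference between full-row duplicates and repeated titles is easiest to see on toy data. `subset=` restricts the comparison to the listed columns. The default `keep='first'` leaves the first occurrence unflagged. Run this check while the columns still hold strings: once they hold lists, hashing the values typically raises a `TypeError`.

```python
import pandas as pd

df = pd.DataFrame({
    "movie_id": [1, 2, 3, 1],
    "title": ["Avatar", "The Host", "The Host", "Avatar"],
})

print(df.duplicated().sum())                  # 1: row 3 repeats row 0 exactly
print(df.duplicated(subset=["title"]).sum())  # 2: rows 2 and 3 repeat an earlier title
```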


In [17]:
movies.duplicated().sum()

np.int64(0)


```python
movies.iloc[0].genres
```

| Part       | Meaning                                       |
| ---------- | --------------------------------------------- |
| `movies`   | Your **DataFrame** containing all movie info. |
| `.iloc[0]` | Selects the **first row** (row index 0).      |
| `.genres`  | Accesses the **“genres”** column of that row. |

---

### 💡 Example Output (before data cleaning):

```
'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
```

That means the **first movie (Avatar)** has the genres **Action, Adventure, Fantasy, and Science Fiction**, but notice 👉 it’s still a **string**, not an actual Python list yet.

---

✅ **In short:**
`movies.iloc[0].genres` → shows the **genres of the first movie (Avatar)**, currently stored as a **string version of a list of dictionaries**.


In [18]:
movies.iloc[0].genres

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

In [19]:
# From: '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
# To:   ['Action', 'Adventure', 'Fantasy', 'Science Fiction']



```python
def convert(obj):
    L = []
    for i in ast.literal_eval(obj):
        L.append(i['name'])
    return L
```

---

### 🧩 **Step-by-step meaning**

| Line                    | Explanation                                                                                                                |
| ----------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `def convert(obj):`     | Defines a function named **`convert`** that takes one argument `obj` (a string).                                           |
| `ast.literal_eval(obj)` | Converts the **string** (like `"[{'id': 28, 'name': 'Action'}, ...]"`) into a **real Python list of dictionaries** safely. |
| `for i in ...:`         | Loops through each dictionary in that list.                                                                                |
| `L.append(i['name'])`   | Extracts only the `'name'` value (like `'Action'`, `'Adventure'`) and adds it to list `L`.                                 |
| `return L`              | Returns the final list of names.                                                                                           |

---

### 💡 Example:

Input:

```python
obj = "[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}]"
convert(obj)
```

Output:

```python
['Action', 'Adventure']
```

---

✅ **In short:**
This function **converts a stringified list of dictionaries** (from the dataset) into a simple **Python list of genre or keyword names** like `['Action', 'Adventure', 'Fantasy']`.
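
The same logic fits in a list comprehension. Below is an equivalent rewrite, named `convert_names` here (a hypothetical name) so it does not clash with the notebook's `convert`. It is tried on the double-quoted string that `movies.iloc[0].genres` actually returns. `ast.literal_eval` accepts that string because double-quoted strings are also valid Python literals. `json.loads` would parse this particular value too.

```python
import ast

def convert_names(obj):
    """Return the 'name' field of every dict in a stringified list."""
    return [item["name"] for item in ast.literal_eval(obj)]

genres = ('[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, '
          '{"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]')
print(convert_names(genres))
# ['Action', 'Adventure', 'Fantasy', 'Science Fiction']
```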


In [20]:
def convert(obj):
    L = []
    for i in ast.literal_eval(obj):
        L.append(i['name'])
    return L



```python
movies['genres'] = movies['genres'].apply(convert)
```

---

### 🧩 Step-by-step meaning

| Part                     | Explanation                                                           |
| ------------------------ | --------------------------------------------------------------------- |
| `movies['genres']`       | Selects the **“genres” column** from your DataFrame.                  |
| `.apply(convert)`        | Applies the **`convert()` function** to **every row** in that column. |
| `movies['genres'] = ...` | Saves the cleaned (converted) results back into the same column.      |

---

### 💡 What happens:

Before applying `convert` →
`"[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}]"` *(string)*

After applying `convert` →
`['Action', 'Adventure']` *(real Python list)*

---

✅ **In short:**
This line **cleans the “genres” column**, converting all those **stringified lists** into **actual Python lists** of genre names (so you can use them later for text processing).
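
Here is a toy sketch of the `apply` step. The two rows are made up, and `convert` is redefined so the block runs on its own. It also shows a practical caveat: the step is not repeatable. Once a cell holds a list, `ast.literal_eval` rejects it, so re-running the cell raises an error instead of converting again.

```python
import ast
import pandas as pd

def convert(obj):
    # Same behavior as the notebook's convert(): keep only the 'name' values
    return [item["name"] for item in ast.literal_eval(obj)]

movies = pd.DataFrame({
    "title": ["Avatar", "Spectre"],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}]',
    ],
})

movies["genres"] = movies["genres"].apply(convert)  # runs once per cell
print(movies["genres"].tolist())  # [['Action', 'Adventure'], ['Action', 'Crime']]

# Running the same step a second time fails: the cells are already lists
try:
    movies["genres"].apply(convert)
    rerun_failed = False
except (ValueError, TypeError):
    rerun_failed = True
print(rerun_failed)  # True
```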


In [21]:
movies['genres'] = movies['genres'].apply(convert)




```python
movies['keywords'] = movies['keywords'].apply(convert)
```

| Part                       | Meaning                                                                           |
| -------------------------- | --------------------------------------------------------------------------------- |
| `movies['keywords']`       | Selects the **“keywords” column** from your DataFrame.                            |
| `.apply(convert)`          | Runs the previously defined **`convert()`** function on every row of that column. |
| `movies['keywords'] = ...` | Stores the cleaned data back into the same column.                                |

---

### 💡 What happens:

Before:
`"[{'id': 1463, 'name': 'culture clash'}, {'id': 2964, 'name': 'future'}]"` *(string)*

After:
`['culture clash', 'future']` *(real Python list)*

---

✅ **In short:**
This line cleans the **“keywords” column**, turning each string of dictionaries into a **real list of keyword names** you can easily use later in the recommendation model.


In [22]:
movies['keywords'] = movies['keywords'].apply(convert)



```python
def convert3(obj):
    L = []
    counter = 0
    for i in ast.literal_eval(obj):
        if counter != 3:
            L.append(i['name'])
            counter += 1
        else:
            break
    return L
```

---

### 🧩 Step-by-step meaning

| Line                    | Explanation                                                                                                     |
| ----------------------- | --------------------------------------------------------------------------------------------------------------- |
| `def convert3(obj):`    | Defines a function named **`convert3`** that takes one input (`obj`).                                           |
| `ast.literal_eval(obj)` | Converts a **string** (like `"[{'id': 1, 'name': 'Tom Cruise'}, ...]"`) into a **Python list of dictionaries**. |
| `for i in ...:`         | Loops through each dictionary (each actor/actress) in the list.                                                 |
| `if counter != 3:`      | Ensures only the **first 3 items** are taken.                                                                   |
| `L.append(i['name'])`   | Adds the actor’s name (value of the `'name'` key) to list `L`.                                                  |
| `counter += 1`          | Increases the counter each time one name is added.                                                              |
| `break`                 | Stops the loop after 3 names.                                                                                   |
| `return L`              | Returns the list of names (at most 3; fewer if the movie lists fewer cast members).                             |

---

### 💡 Example:

Input:

```python
obj = "[{'id': 1, 'name': 'Sam Worthington'}, {'id': 2, 'name': 'Zoe Saldana'}, {'id': 3, 'name': 'Sigourney Weaver'}, {'id': 4, 'name': 'Stephen Lang'}]"
convert3(obj)
```

Output:

```python
['Sam Worthington', 'Zoe Saldana', 'Sigourney Weaver']
```
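The manual counter in `convert3` can also be expressed with list slicing. The sketch below (`convert3_sliced` is a hypothetical name, not part of the notebook) returns the same names for the example above, and slicing naturally handles casts with fewer than 3 entries:

```python
import ast

def convert3_sliced(obj):
    # Same idea as convert3: the 'name' of at most the first 3 entries.
    # Slicing [:3] replaces the counter and never fails on shorter lists.
    return [d['name'] for d in ast.literal_eval(obj)[:3]]

obj = "[{'id': 1, 'name': 'Sam Worthington'}, {'id': 2, 'name': 'Zoe Saldana'}, {'id': 3, 'name': 'Sigourney Weaver'}, {'id': 4, 'name': 'Stephen Lang'}]"
print(convert3_sliced(obj))  # ['Sam Worthington', 'Zoe Saldana', 'Sigourney Weaver']

# Hypothetical one-actor cast: fewer than 3 names are simply returned as-is
print(convert3_sliced("[{'id': 5, 'name': 'Solo Actor'}]"))  # ['Solo Actor']
```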

---


In [23]:
def convert3(obj):
    L = []
    counter = 0
    for i in ast.literal_eval(obj):
        if counter != 3:
            L.append(i['name'])
            counter+=1
        else:
            break
    return L



```python
movies['cast'] = movies['cast'].apply(convert3)
```

---

### 🧩 What it does

| Part                   | Meaning                                                            |
| ---------------------- | ------------------------------------------------------------------ |
| `movies['cast']`       | Selects the **“cast” column** from your DataFrame.                 |
| `.apply(convert3)`     | Applies the **`convert3()` function** to every row in that column. |
| `movies['cast'] = ...` | Stores the cleaned list back into the same column.                 |

---

### 💡 What happens:

Before:
`"[{'id': 1, 'name': 'Sam Worthington'}, {'id': 2, 'name': 'Zoe Saldana'}, {'id': 3, 'name': 'Sigourney Weaver'}, ...]"` *(string)*

After:
`['Sam Worthington', 'Zoe Saldana', 'Sigourney Weaver']` *(real Python list)*

---

✅ **In short:**
This line cleans the **“cast” column** and keeps **only the first 3 actor names listed** (the top-billed cast) for each movie, making the data simpler and more useful for building the recommender system.


In [24]:
movies['cast'] = movies['cast'].apply(convert3)

In [25]:
def fetch_director(obj):
    L = []
    for i in ast.literal_eval(obj):
        if i['job'] == 'Director':
            L.append(i['name'])
            break
    return L

---

### 🧠 Function definition

```python
def fetch_director(obj):
    L = []
    for i in ast.literal_eval(obj):
        if i['job'] == 'Director':
            L.append(i['name'])
            break
    return L
```

### 🧩 Step-by-step meaning

| Line                         | Explanation                                                                                                                              |
| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `def fetch_director(obj):`   | Defines a function to extract the **director’s name** from a movie’s crew data.                                                          |
| `ast.literal_eval(obj)`      | Converts the **string** of crew info (like `"[{'job':'Director','name':'James Cameron'}, ...]"`) into a **Python list of dictionaries**. |
| `for i in ...:`              | Loops through each dictionary in that list.                                                                                              |
| `if i['job'] == 'Director':` | Checks whether the crew member’s job is **Director**.                                                                                    |
| `L.append(i['name'])`        | Adds the director’s name to the list `L`.                                                                                                |
| `break`                      | Stops after finding the first director (only one is kept, even if a film lists several).                                                |
| `return L`                   | Returns a list containing the director’s name, or an empty list if no director is listed.                                                |

---

### 💡 Example:

Input:

```python
obj = "[{'job': 'Director', 'name': 'James Cameron'}, {'job': 'Producer', 'name': 'Jon Landau'}]"
fetch_director(obj)
```

Output:

```python
['James Cameron']
```
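The same "first match or nothing" search can be written with `next()` and a default value. This is a sketch of an alternative, not the notebook's code; `fetch_director_next` is a hypothetical name, and it keeps the list shape (`[name]` or `[]`) that `fetch_director` returns:

```python
import ast

def fetch_director_next(obj):
    # next() yields the first Director's name, or the default None if there is none
    name = next((d['name'] for d in ast.literal_eval(obj) if d['job'] == 'Director'), None)
    # Keep fetch_director's output shape: a one-element list, or an empty list
    return [name] if name is not None else []

obj = "[{'job': 'Director', 'name': 'James Cameron'}, {'job': 'Producer', 'name': 'Jon Landau'}]"
print(fetch_director_next(obj))  # ['James Cameron']
print(fetch_director_next("[{'job': 'Producer', 'name': 'Jon Landau'}]"))  # []
```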

---

### 🧾 Applying the function

```python
movies['crew'] = movies['crew'].apply(fetch_director)
```

| Part                     | Meaning                                                            |
| ------------------------ | ------------------------------------------------------------------ |
| `movies['crew']`         | Selects the **“crew” column**.                                     |
| `.apply(fetch_director)` | Runs the function on each row to extract only the director’s name. |
| `movies['crew'] = ...`   | Replaces the old crew data with the cleaned director names.        |

---

✅ **In short:**
This code extracts **only the director’s name** from the `crew` column and stores it as a list, for example
`['James Cameron']` instead of the long string of crew details.


In [26]:
movies['crew'] = movies['crew'].apply(fetch_director)

In [27]:
# overview[0]
# In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.




```python
movies['overview'] = movies['overview'].apply(lambda x: x.split())
```

---

### 🧩 Step-by-step meaning

| Part                  | Explanation                                                                        |
| --------------------- | ---------------------------------------------------------------------------------- |
| `movies['overview']`  | Selects the **“overview”** column (movie description).                             |
| `.apply(...)`         | Applies a function to every row in that column.                                    |
| `lambda x: x.split()` | A small **anonymous function** that splits the overview text into a list of words. |

---

### 💡 Example:

Before:

```python
"In the 22nd century, a paraplegic Marine is dispatched to Pandora."
```

After:

```python
['In', 'the', '22nd', 'century,', 'a', 'paraplegic', 'Marine', 'is', 'dispatched', 'to', 'Pandora.']
```

---

✅ **In short:**
This line converts each movie’s **overview (string)** into a **list of words**, preparing the text for further processing (like combining with genres, keywords, etc.).
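pandas also offers a vectorized `.str.split()` that splits on whitespace just like `x.split()`. A sketch on a hypothetical two-value Series: punctuation stays attached to words (`'century,'`), and a missing overview is passed through as a missing value instead of raising an error, which `x.split()` would do on `None`/`NaN`.

```python
import pandas as pd

overview = pd.Series([
    "In the 22nd century, a paraplegic Marine is dispatched to Pandora.",
    None,  # a hypothetical missing overview
])

# Whitespace split per value; missing values stay missing instead of raising
tokens = overview.str.split()

print(tokens[0][:4])     # ['In', 'the', '22nd', 'century,']
print(tokens.isna()[1])  # True
```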


In [28]:
movies['overview'] = movies['overview'].apply(lambda x:x.split())



```python
movies.head()
```

---

### 🧩 Explanation

| Part      | Meaning                                                      |
| --------- | ------------------------------------------------------------ |
| `movies`  | Your pandas **DataFrame** containing all movie data.         |
| `.head()` | Displays the **first 5 rows** of the DataFrame (by default). |

---

### 💡 Example Output (after all cleaning so far):

|   | movie_id | title                                    | overview                                                               | genres                                                | keywords                                          | cast                                                   | crew                  |
| - | -------- | ---------------------------------------- | ---------------------------------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------- | ------------------------------------------------------ | --------------------- |
| 0 | 19995    | Avatar                                   | ['In', 'the', '22nd', 'century,', 'a', 'paraplegic', 'Marine', ...]    | ['Action', 'Adventure', 'Fantasy', 'Science Fiction'] | ['culture clash', 'future', 'space war', ...]     | ['Sam Worthington', 'Zoe Saldana', 'Sigourney Weaver'] | ['James Cameron']     |
| 1 | 285      | Pirates of the Caribbean: At World's End | ['Captain', 'Barbossa,', 'long', 'believed', ...]                      | ['Adventure', 'Fantasy', 'Action']                    | ['ocean', 'drug abuse', 'exotic island', ...]     | ['Johnny Depp', 'Orlando Bloom', 'Keira Knightley']    | ['Gore Verbinski']    |
| 2 | 206647   | Spectre                                  | ['A', 'cryptic', 'message', 'from', 'Bond’s', 'past', ...]             | ['Action', 'Adventure', 'Crime']                      | ['spy', 'based on novel', 'secret agent', ...]    | ['Daniel Craig', 'Christoph Waltz', 'Léa Seydoux']     | ['Sam Mendes']        |
| 3 | 49026    | The Dark Knight Rises                    | ['Following', 'the', 'death', 'of', 'District', 'Attorney', ...]       | ['Action', 'Crime', 'Drama', 'Thriller']              | ['dc comics', 'crime fighter', 'terrorist', ...]  | ['Christian Bale', 'Michael Caine', 'Gary Oldman']     | ['Christopher Nolan'] |
| 4 | 49529    | John Carter                              | ['John', 'Carter', 'is', 'a', 'war-weary,', 'former', 'military', ...] | ['Action', 'Adventure', 'Science Fiction']            | ['based on novel', 'mars', 'medallion', ...]      | ['Taylor Kitsch', 'Lynn Collins', 'Samantha Morton']   | ['Andrew Stanton']    |

---

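✅ **In short:**
`movies.head()` shows the **first 5 cleaned rows** of the movie dataset. `overview` is now a **list of words**, and `genres`, `keywords`, `cast`, and `crew` are **lists of names**. Multi-word names such as `'Science Fiction'` or `'Sam Worthington'` still contain spaces at this point; the next step removes them before everything is combined into a single “tags” column for text analysis.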


In [29]:
movies.head()

|   | movie_id | title | overview | genres | keywords | cast | crew |
| - | -------- | ----- | -------- | ------ | -------- | ---- | ---- |
| 0 | 19995 | Avatar | [In, the, 22nd, century,, a, paraplegic, Marin... | [Action, Adventure, Fantasy, Science Fiction] | [culture clash, future, space war, space colon... | [Sam Worthington, Zoe Saldana, Sigourney Weaver] | [James Cameron] |
| 1 | 285 | Pirates of the Caribbean: At World's End | [Captain, Barbossa,, long, believed, to, be, d... | [Adventure, Fantasy, Action] | [ocean, drug abuse, exotic island, east india ... | [Johnny Depp, Orlando Bloom, Keira Knightley] | [Gore Verbinski] |
| 2 | 206647 | Spectre | [A, cryptic, message, from, Bond’s, past, send... | [Action, Adventure, Crime] | [spy, based on novel, secret agent, sequel, mi... | [Daniel Craig, Christoph Waltz, Léa Seydoux] | [Sam Mendes] |
| 3 | 49026 | The Dark Knight Rises | [Following, the, death, of, District, Attorney... | [Action, Crime, Drama, Thriller] | [dc comics, crime fighter, terrorist, secret i... | [Christian Bale, Michael Caine, Gary Oldman] | [Christopher Nolan] |
| 4 | 49529 | John Carter | [John, Carter, is, a, war-weary,, former, mili... | [Action, Adventure, Science Fiction] | [based on novel, mars, medallion, space travel... | [Taylor Kitsch, Lynn Collins, Samantha Morton] | [Andrew Stanton] |


In [30]:
movies['genres'] = movies['genres'].apply(lambda x:[i.replace(" ","") for i in x])
movies['keywords'] = movies['keywords'].apply(lambda x:[i.replace(" ","") for i in x])
movies['cast'] = movies['cast'].apply(lambda x:[i.replace(" ","") for i in x])
movies['crew'] = movies['crew'].apply(lambda x:[i.replace(" ","") for i in x])
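Removing the spaces matters because the tags are later joined, split on whitespace, and counted word by word. Without it, two unrelated people who share a first name would share a token. A small sketch using names from the table above (`tokens` and `remove_spaces` are hypothetical helpers; `remove_spaces` does the same replacement as the cell above):

```python
def tokens(names):
    # Whitespace tokenization of the lowercased, space-joined names
    return set(" ".join(names).lower().split())

def remove_spaces(names):
    # Same transformation as the cell above: "Sam Worthington" -> "SamWorthington"
    return [n.replace(" ", "") for n in names]

cast = ['Sam Worthington']  # from Avatar's cast
crew = ['Sam Mendes']       # Spectre's director

# With spaces, two different people share the token 'sam'
print(tokens(cast) & tokens(crew))  # {'sam'}

# Without spaces, each full name is one distinct token
print(tokens(remove_spaces(cast)) & tokens(remove_spaces(crew)))  # set()
```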

In [31]:
movies.head()


|   | movie_id | title | overview | genres | keywords | cast | crew |
| - | -------- | ----- | -------- | ------ | -------- | ---- | ---- |
| 0 | 19995 | Avatar | [In, the, 22nd, century,, a, paraplegic, Marin... | [Action, Adventure, Fantasy, ScienceFiction] | [cultureclash, future, spacewar, spacecolony, ... | [SamWorthington, ZoeSaldana, SigourneyWeaver] | [JamesCameron] |
| 1 | 285 | Pirates of the Caribbean: At World's End | [Captain, Barbossa,, long, believed, to, be, d... | [Adventure, Fantasy, Action] | [ocean, drugabuse, exoticisland, eastindiatrad... | [JohnnyDepp, OrlandoBloom, KeiraKnightley] | [GoreVerbinski] |
| 2 | 206647 | Spectre | [A, cryptic, message, from, Bond’s, past, send... | [Action, Adventure, Crime] | [spy, basedonnovel, secretagent, sequel, mi6, ... | [DanielCraig, ChristophWaltz, LéaSeydoux] | [SamMendes] |
| 3 | 49026 | The Dark Knight Rises | [Following, the, death, of, District, Attorney... | [Action, Crime, Drama, Thriller] | [dccomics, crimefighter, terrorist, secretiden... | [ChristianBale, MichaelCaine, GaryOldman] | [ChristopherNolan] |
| 4 | 49529 | John Carter | [John, Carter, is, a, war-weary,, former, mili... | [Action, Adventure, ScienceFiction] | [basedonnovel, mars, medallion, spacetravel, p... | [TaylorKitsch, LynnCollins, SamanthaMorton] | [AndrewStanton] |


In [32]:
movies['tags'] = movies['overview'] + movies['genres'] + movies['cast'] + movies['crew'] + movies['keywords'] 
movies.head()

|   | movie_id | title | overview | genres | keywords | cast | crew | tags |
| - | -------- | ----- | -------- | ------ | -------- | ---- | ---- | ---- |
| 0 | 19995 | Avatar | [In, the, 22nd, century,, a, paraplegic, Marin... | [Action, Adventure, Fantasy, ScienceFiction] | [cultureclash, future, spacewar, spacecolony, ... | [SamWorthington, ZoeSaldana, SigourneyWeaver] | [JamesCameron] | [In, the, 22nd, century,, a, paraplegic, Marin... |
| 1 | 285 | Pirates of the Caribbean: At World's End | [Captain, Barbossa,, long, believed, to, be, d... | [Adventure, Fantasy, Action] | [ocean, drugabuse, exoticisland, eastindiatrad... | [JohnnyDepp, OrlandoBloom, KeiraKnightley] | [GoreVerbinski] | [Captain, Barbossa,, long, believed, to, be, d... |
| 2 | 206647 | Spectre | [A, cryptic, message, from, Bond’s, past, send... | [Action, Adventure, Crime] | [spy, basedonnovel, secretagent, sequel, mi6, ... | [DanielCraig, ChristophWaltz, LéaSeydoux] | [SamMendes] | [A, cryptic, message, from, Bond’s, past, send... |
| 3 | 49026 | The Dark Knight Rises | [Following, the, death, of, District, Attorney... | [Action, Crime, Drama, Thriller] | [dccomics, crimefighter, terrorist, secretiden... | [ChristianBale, MichaelCaine, GaryOldman] | [ChristopherNolan] | [Following, the, death, of, District, Attorney... |
| 4 | 49529 | John Carter | [John, Carter, is, a, war-weary,, former, mili... | [Action, Adventure, ScienceFiction] | [basedonnovel, mars, medallion, spacetravel, p... | [TaylorKitsch, LynnCollins, SamanthaMorton] | [AndrewStanton] | [John, Carter, is, a, war-weary,, former, mili... |
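In the `tags` cell above, every column holds Python lists, so `+` between two columns concatenates the lists row by row (pandas applies ordinary list `+` to each pair of values). A toy sketch with hypothetical, shortened rows:

```python
import pandas as pd

# One hypothetical row; each cell holds a Python list
df = pd.DataFrame({
    'overview': [['a', 'paraplegic', 'Marine']],
    'genres': [['Action', 'Adventure']],
    'crew': [['JamesCameron']],
})

# '+' on object columns of lists joins the lists element-wise, row by row
df['tags'] = df['overview'] + df['genres'] + df['crew']
print(df['tags'][0])  # ['a', 'paraplegic', 'Marine', 'Action', 'Adventure', 'JamesCameron']
```

The order of the operands is the order of the words in `tags`; for a bag-of-words model that order does not affect the counts.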


In [33]:
movies.head()['tags'].values

array([list(['In', 'the', '22nd', 'century,', 'a', 'paraplegic', 'Marine', 'is', 'dispatched', 'to', 'the', 'moon', 'Pandora', 'on', 'a', 'unique', 'mission,', 'but', 'becomes', 'torn', 'between', 'following', 'orders', 'and', 'protecting', 'an', 'alien', 'civilization.', 'Action', 'Adventure', 'Fantasy', 'ScienceFiction', 'SamWorthington', 'ZoeSaldana', 'SigourneyWeaver', 'JamesCameron', 'cultureclash', 'future', 'spacewar', 'spacecolony', 'society', 'spacetravel', 'futuristic', 'romance', 'space', 'alien', 'tribe', 'alienplanet', 'cgi', 'marine', 'soldier', 'battle', 'loveaffair', 'antiwar', 'powerrelations', 'mindandsoul', '3d']),
       list(['Captain', 'Barbossa,', 'long', 'believed', 'to', 'be', 'dead,', 'has', 'come', 'back', 'to', 'life', 'and', 'is', 'headed', 'to', 'the', 'edge', 'of', 'the', 'Earth', 'with', 'Will', 'Turner', 'and', 'Elizabeth', 'Swann.', 'But', 'nothing', 'is', 'quite', 'as', 'it', 'seems.', 'Adventure', 'Fantasy', 'Action', 'JohnnyDepp', 'OrlandoBloom', 'K

In [34]:
new_df = movies[['movie_id','title','tags']]
new_df

|   | movie_id | title | tags |
| - | -------- | ----- | ---- |
| 0 | 19995 | Avatar | [In, the, 22nd, century,, a, paraplegic, Marin... |
| 1 | 285 | Pirates of the Caribbean: At World's End | [Captain, Barbossa,, long, believed, to, be, d... |
| 2 | 206647 | Spectre | [A, cryptic, message, from, Bond’s, past, send... |
| 3 | 49026 | The Dark Knight Rises | [Following, the, death, of, District, Attorney... |
| 4 | 49529 | John Carter | [John, Carter, is, a, war-weary,, former, mili... |
| ... | ... | ... | ... |
| 4804 | 9367 | El Mariachi | [El, Mariachi, just, wants, to, play, his, gui... |
| 4805 | 72766 | Newlyweds | [A, newlywed, couple's, honeymoon, is, upended... |
| 4806 | 231617 | Signed, Sealed, Delivered | ["Signed,, Sealed,, Delivered", introduces, a,... |
| 4807 | 126186 | Shanghai Calling | [When, ambitious, New, York, attorney, Sam, is... |


In [35]:
new_df['tags'] = new_df['tags'].apply(lambda x:" ".join(x))
new_df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['tags'] = new_df['tags'].apply(lambda x:" ".join(x))


|   | movie_id | title | tags |
| - | -------- | ----- | ---- |
| 0 | 19995 | Avatar | In the 22nd century, a paraplegic Marine is di... |
| 1 | 285 | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... |
| 2 | 206647 | Spectre | A cryptic message from Bond’s past sends him o... |
| 3 | 49026 | The Dark Knight Rises | Following the death of District Attorney Harve... |
| 4 | 49529 | John Carter | John Carter is a war-weary, former military ca... |
| ... | ... | ... | ... |
| 4804 | 9367 | El Mariachi | El Mariachi just wants to play his guitar and ... |
| 4805 | 72766 | Newlyweds | A newlywed couple's honeymoon is upended by th... |
| 4806 | 231617 | Signed, Sealed, Delivered | "Signed, Sealed, Delivered" introduces a dedic... |
| 4807 | 126186 | Shanghai Calling | When ambitious New York attorney Sam is sent t... |


In [36]:
new_df['tags'][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction SamWorthington ZoeSaldana SigourneyWeaver JamesCameron cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d'

In [37]:
new_df['tags'] = new_df['tags'].apply(lambda x:x.lower())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['tags'] = new_df['tags'].apply(lambda x:x.lower())


In [38]:
import nltk
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()



In [39]:
def stem(text):
    y = []

    for i in text.split():
        y.append(ps.stem(i))

    return " ".join(y)

In [40]:
new_df['tags'] = new_df['tags'].apply(stem)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['tags'] = new_df['tags'].apply(stem)
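The repeated `SettingWithCopyWarning` comes from `new_df = movies[['movie_id','title','tags']]`: pandas flags `new_df` as derived from a slice of `movies` and warns that later assignments to `new_df['tags']` might not behave as intended. A common fix is to take an explicit `.copy()` when creating `new_df`, so the assignments apply to an independent DataFrame. A minimal sketch on a hypothetical one-row `movies`:

```python
import pandas as pd

# Hypothetical one-row stand-in for the notebook's movies DataFrame
movies = pd.DataFrame({
    'movie_id': [19995],
    'title': ['Avatar'],
    'tags': [['In', 'the', '22nd', 'century,']],
    'extra': [0],
})

# .copy() makes new_df independent of movies, so assigning a column
# to it does not trigger SettingWithCopyWarning
new_df = movies[['movie_id', 'title', 'tags']].copy()
new_df['tags'] = new_df['tags'].apply(lambda x: " ".join(x))

print(new_df['tags'][0])  # In the 22nd century,
print(movies['tags'][0])  # ['In', 'the', '22nd', 'century,'] (unchanged)
```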


In [41]:
new_df.head()

|   | movie_id | title | tags |
| - | -------- | ----- | ---- |
| 0 | 19995 | Avatar | in the 22nd century, a parapleg marin is dispa... |
| 1 | 285 | Pirates of the Caribbean: At World's End | captain barbossa, long believ to be dead, ha c... |
| 2 | 206647 | Spectre | a cryptic messag from bond’ past send him on a... |
| 3 | 49026 | The Dark Knight Rises | follow the death of district attorney harvey d... |
| 4 | 49529 | John Carter | john carter is a war-weary, former militari ca... |


In [42]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000,stop_words='english')
vectors = cv.fit_transform(new_df['tags']).toarray()

In [43]:
vectors.shape

(4806, 5000)
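`CountVectorizer` turns each tags string into a row of word counts over a fixed vocabulary; `vectors.shape == (4806, 5000)` means 4806 movies (rows) by 5000 terms (columns). To illustrate what `max_features` and `stop_words` do, here is a simplified pure-Python stand-in. `bag_of_words` is a hypothetical helper, not scikit-learn's implementation: it lowercases, tokenizes with CountVectorizer's default pattern `(?u)\b\w\w+\b` (tokens of 2+ word characters), drops stop words, keeps the `max_features` most frequent terms across all documents, and orders columns alphabetically. Tie-breaking between equally frequent terms may differ from scikit-learn.

```python
import re
from collections import Counter

def bag_of_words(docs, max_features, stop_words):
    # Lowercase, tokenize (2+ word characters), and drop stop words
    token_lists = [
        [t for t in re.findall(r"(?u)\b\w\w+\b", doc.lower()) if t not in stop_words]
        for doc in docs
    ]
    # Keep the max_features most frequent terms across the whole corpus
    totals = Counter(t for tokens in token_lists for t in tokens)
    vocab = sorted(term for term, _ in totals.most_common(max_features))
    column = {term: i for i, term in enumerate(vocab)}
    # One row per document, one column per vocabulary term (alphabetical)
    matrix = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for t in tokens:
            if t in column:
                row[column[t]] += 1
        matrix.append(row)
    return vocab, matrix

docs = ["space war alien", "alien alien romance", "the war"]
vocab, matrix = bag_of_words(docs, max_features=2, stop_words={"the"})
print(vocab)   # ['alien', 'war']
print(matrix)  # [[1, 1], [2, 0], [0, 1]]
```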

In [44]:
from sklearn.metrics.pairwise import cosine_similarity
similarity = cosine_similarity(vectors)

similarity[0]


array([1.        , 0.08346223, 0.0860309 , ..., 0.04499213, 0.        ,
       0.        ])
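`cosine_similarity(vectors)` compares every pair of count vectors by the cosine of the angle between them, cos(a, b) = (a · b) / (‖a‖ ‖b‖). A NumPy sketch of the same computation (`cosine_sim_matrix` is a hypothetical helper): normalize each row to unit length, then take all pairwise dot products. Every nonzero row has similarity 1 with itself, which is why `similarity[0]` starts with `1.` above.

```python
import numpy as np

def cosine_sim_matrix(X):
    X = np.asarray(X, dtype=float)
    # Euclidean length of each row vector
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Avoid division by zero: an all-zero row stays zero (similarity 0 to everything)
    norms[norms == 0] = 1.0
    unit = X / norms
    # Dot products of unit vectors = cosine of the angle between each pair of rows
    return unit @ unit.T

X = np.array([[1, 0], [1, 1], [0, 0]])
S = cosine_sim_matrix(X)
print(np.round(S, 4))
```

scikit-learn's `cosine_similarity` likewise returns 0 for an all-zero row. Note the size of the result: for 4806 movies the matrix has 4806 × 4806 ≈ 23.1 million entries, about 185 MB if stored as float64.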

In [45]:
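def recommend(movie):
    # Row *position* of the movie. new_df's index labels have gaps (they go past 4805
    # for 4806 rows), but `similarity` rows follow positions, so don't use the label.
    movie_index = np.flatnonzero(new_df['title'].to_numpy() == movie)[0]
    distances = similarity[movie_index]  # cosine similarities: higher = more similar
    # Sort by similarity (descending) and skip the first entry, the movie itself
    movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]

    for i in movies_list:
        print(new_df.iloc[i[0]].title)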
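`recommend()` sorts all (position, similarity) pairs and drops the first one, assuming it is the movie itself. A hedged alternative sketch: `numpy.argsort` on the similarity row, excluding the query's own position explicitly, which still works if another movie also scores 1.0. `top_k_similar`, the toy matrix, and the titles are hypothetical; in the notebook, `titles` would correspond to `new_df['title'].tolist()`.

```python
import numpy as np

def top_k_similar(similarity, titles, movie, k=5):
    pos = titles.index(movie)                  # row position of the query movie
    order = np.argsort(similarity[pos])[::-1]  # positions, most similar first
    # Exclude the query by position rather than assuming it sorts first
    return [titles[i] for i in order if i != pos][:k]

similarity = np.array([
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.1],
    [0.7, 0.1, 1.0],
])
titles = ['Movie A', 'Movie B', 'Movie C']
print(top_k_similar(similarity, titles, 'Movie A', k=2))  # ['Movie C', 'Movie B']
```

Like `new_df[new_df['title'] == movie].index[0]`, `list.index` picks the first match, so duplicate titles (such as the two “Batman” entries in the output below) resolve to the first one.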

In [46]:
recommend("Avatar")
print("......................")
recommend("Batman Begins")

Aliens vs Predator: Requiem
Aliens
Falcon Rising
Independence Day
Titan A.E.
......................
The Dark Knight
Batman
Batman
The Dark Knight Rises
10th & Wolf


In [47]:
new_df.iloc[1216]

movie_id                                                10641
title                                      Autumn in New York
tags        autumn in new york follow the sexual exploit o...
Name: 1216, dtype: object

In [52]:
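import pickle

# Save the processed data and the similarity matrix to disk;
# `with` closes each file once it has been written
with open('movie_list.pkl', 'wb') as f:
    pickle.dump(new_df, f)
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
with open('movie_dict.pkl', 'wb') as f:
    pickle.dump(new_df.to_dict(), f)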
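The saved files can be read back with `pickle.load` in binary read mode. A minimal round-trip sketch, using a hypothetical small dictionary in the same shape as `new_df.to_dict()` (`{column: {index: value}}`) and a temporary directory instead of the project folder. Python's documentation warns that unpickling data from untrusted sources can execute arbitrary code, so only load files you created yourself.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for new_df.to_dict(): {column: {index: value}}
data = {'title': {0: 'Avatar', 1: 'Spectre'}}

path = os.path.join(tempfile.mkdtemp(), 'movie_dict.pkl')
with open(path, 'wb') as f:
    pickle.dump(data, f)

# Read it back ('rb' = read binary); only unpickle trusted files
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == data)  # True
```

A dictionary loaded this way can be turned back into a DataFrame with `pd.DataFrame(loaded)`.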
