# 1. Importing Libraries
---
- A few basic libraries are imported below to get started.

- Additional libraries will be imported as required.

In [1]:
import numpy as np
import pandas as pd

# 2. Reading the CSV files
---

In [2]:
movie = pd.read_csv('tmdb_5000_movies.csv')
credit = pd.read_csv('tmdb_5000_credits.csv')

# 3. Data Acquisition & Description
---
- This dataset provides detailed information about the movies released from 1990 till 2016.

In [3]:
movie.sample(2)
# movie.columns

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
1991,19000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 35, ""nam...",,9737,"[{""id"": 416, ""name"": ""miami""}, {""id"": 703, ""na...",en,Bad Boys,Marcus Burnett is a hen-pecked family man. Mik...,33.872182,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",4/7/1995,141407024,118.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Whatcha gonna do?,Bad Boys,6.5,1699
3780,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",,14293,"[{""id"": 2348, ""name"": ""hustler""}, {""id"": 10183...",en,Poolhall Junkies,A retired pool hustler is forced to pick up th...,5.90259,"[{""name"": ""Gold Circle Films"", ""id"": 12026}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",6/7/2002,0,99.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,It's your shot. Take it.,Poolhall Junkies,6.5,25


In [4]:
credit.sample(2)

Unnamed: 0,movie_id,title,cast,crew
3486,347764,Goddess of Love,"[{""cast_id"": 3, ""character"": ""Venus"", ""credit_...","[{""credit_id"": ""559b17589251413ccc00050b"", ""de..."
3861,1667,March of the Penguins,"[{""cast_id"": 10, ""character"": ""Narrator"", ""cre...","[{""credit_id"": ""52fe430ac3a36847f8035fa9"", ""de..."


## 3.1 Data Description
---
- For a quick statistical summary of the numeric columns, use the `describe()` method from pandas.

In [5]:
# movie.describe(include='all')
movie.describe()

Unnamed: 0,budget,id,popularity,revenue,runtime,vote_average,vote_count
count,4803.0,4803.0,4803.0,4803.0,4801.0,4803.0,4803.0
mean,29045040.0,57165.484281,21.492301,82260640.0,106.875859,6.092172,690.217989
std,40722390.0,88694.614033,31.81665,162857100.0,22.611935,1.194612,1234.585891
min,0.0,5.0,0.0,0.0,0.0,0.0,0.0
25%,790000.0,9014.5,4.66807,0.0,94.0,5.6,54.0
50%,15000000.0,14629.0,12.921594,19170000.0,103.0,6.2,235.0
75%,40000000.0,58610.5,28.313505,92917190.0,118.0,6.8,737.0
max,380000000.0,459488.0,875.581305,2787965000.0,338.0,10.0,13752.0


### **Some Observations**
---

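- The **id** column serves as a unique identifier for each movie.
- A **few column names** are **not very descriptive** and could be **renamed for easier operations**.
- **Revenue** is **positively skewed**: the mean (about 82.3 million) is far higher than the median (about 19.2 million).
- **Revenue** also has **outliers**: the maximum (about 2.79 billion) is far above the 75th percentile (about 92.9 million).
- **vote_average** is roughly **symmetrically distributed** (mean 6.09, median 6.2) and **no outliers** were observed.
- **Runtime** is **slightly positively skewed** (mean 106.9, median 103) with some **outliers** (maximum 338).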
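
The skewness and outlier remarks above can be checked numerically. The following is a minimal sketch on a small made-up `revenue` column (not the TMDB data). It assumes the common 1.5 × IQR rule for flagging outliers, which the notebook itself does not specify.

```python
import pandas as pd

# Toy stand-in for movie['revenue']; the values are illustrative only.
toy = pd.DataFrame({"revenue": [0, 0, 5, 10, 12, 15, 20, 25, 30, 400]})

# Positive skew: the long right tail pulls the mean above the median.
mean_rev = toy["revenue"].mean()
median_rev = toy["revenue"].median()
skew_rev = toy["revenue"].skew()

# Flag outliers with the 1.5 * IQR rule (an assumed convention).
q1, q3 = toy["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (toy["revenue"] < q1 - 1.5 * iqr) | (toy["revenue"] > q3 + 1.5 * iqr)
outliers = toy[is_outlier]

print(mean_rev, median_rev, skew_rev)
print(outliers)
```

On the real data the same calls would be applied to `movie['revenue']` or `movie['runtime']`.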

In [6]:
# credit.describe(include='all')
# credit.describe()

## **3.2 Data Information**
---

In [7]:
movie.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

- `info()` gives the following insights about the movie data:

    - Some values are **missing** in the **homepage, overview, release_date, runtime and tagline** columns.
    - The dataset has **7 numeric** columns and **13 object** columns.
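
The counts read off `info()` can also be computed directly. A small sketch on a toy frame (column names mirror the dataset, values are made up) that lists only the columns with missing values and counts numeric versus non-numeric columns:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `movie`; values are illustrative only.
toy = pd.DataFrame({
    "budget": [100, 200, 300],
    "runtime": [90.0, np.nan, 120.0],
    "homepage": ["http://example.com", None, None],
    "title": ["A", "B", "C"],
})

# Missing values per column, keeping only columns that have any.
missing = toy.isnull().sum()
missing = missing[missing > 0]

# Numeric columns versus everything else.
n_numeric = toy.select_dtypes(include="number").shape[1]
n_other = toy.shape[1] - n_numeric

print(missing)
print(n_numeric, n_other)
```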
    

In [8]:
credit.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   movie_id  4803 non-null   int64 
 1   title     4803 non-null   object
 2   cast      4803 non-null   object
 3   crew      4803 non-null   object
dtypes: int64(1), object(3)
memory usage: 93.9+ KB


- The dataset has **1 numeric** column and **3 object** columns.
- There are no missing values in this dataset.



# **4. Data Pre-Processing**
---

- Here we will perform **data pre-processing** to make the data usable for EDA.

- Let's clean up the dataset first.

## **4.1 Merge the dataset**
---
- We **merge** both datasets on the **title** column.


In [9]:
movies = movie.merge(credit,on='title')
movies.sample(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
1168,40000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 35, ""...",http://www.backtothefuture.com/movies/backtoth...,196,"[{""id"": 386, ""name"": ""railroad robber""}, {""id""...",en,Back to the Future Part III,The final installment of the Back to the Futur...,45.769562,"[{""name"": ""Universal Pictures"", ""id"": 33}, {""n...",...,118.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,They've saved the best trip for last... But th...,Back to the Future Part III,7.1,2900,196,"[{""cast_id"": 14, ""character"": ""Marty McFly / S...","[{""credit_id"": ""52fe4225c3a36847f80078d5"", ""de..."
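
Merging on a text column pairs every row with every same-titled row in the other table, so repeated titles can inflate the row count. The sketch below uses toy frames to show the effect, plus one possible alternative: joining `id` to `movie_id`. It assumes, as the sample row above suggests, that `id` in the movies file and `movie_id` in the credits file refer to the same identifier.

```python
import pandas as pd

# Toy frames: two different films share the title "Batman".
movie_toy = pd.DataFrame({"id": [1, 2, 3], "title": ["Batman", "Batman", "Avatar"]})
credit_toy = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Batman", "Batman", "Avatar"],
    "cast": ["cast-1", "cast-2", "cast-3"],
})

# Title merge: each "Batman" row matches both "Batman" rows -> 2*2 + 1 = 5 rows.
by_title = movie_toy.merge(credit_toy, on="title")

# Key merge: one row per film; validate= raises if the keys are not unique.
by_id = movie_toy.merge(
    credit_toy.drop(columns="title"),
    left_on="id",
    right_on="movie_id",
    validate="one_to_one",
)

print(len(by_title), len(by_id))  # 5 3
```

Comparing `len(movies)` with `len(movie)` after the merge is a quick way to see whether this happened on the real data.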


## 4.2 Selecting the columns we need for our project
---

In [10]:
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]

## 4.3 Finding and removing null values from the movies dataset
---

In [11]:
movies.isnull().sum()

movie_id    0
title       0
overview    3
genres      0
keywords    0
cast        0
crew        0
dtype: int64

- The **overview** column has **3 null values**.

#### **Removing null values from overview**

In [12]:
movies.dropna(inplace=True)

In [13]:
movies.isnull().sum()

movie_id    0
title       0
overview    0
genres      0
keywords    0
cast        0
crew        0
dtype: int64

- Now there are **no null values** in the **movies dataset**.

#### **Checking duplicate values in dataset**

In [14]:
movies.duplicated().sum()

0

## 4.4 Fetching the genres of movies from the movies dataset
---

In [15]:
# movies.iloc[0].genres
movies['genres'][0]

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

#### Function for fetching genres

In [16]:
import ast
def convert(obj):
    # Parse the JSON-like string into a list of dicts and keep each 'name'.
    L = []
    for i in ast.literal_eval(obj):
        L.append(i['name'])
    return L

In [17]:
movies['genres'] = movies['genres'].apply(convert)

In [18]:
movies['genres'][0]

['Action', 'Adventure', 'Fantasy', 'Science Fiction']
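
As a standalone illustration of what `convert` does, the sketch below parses a shortened version of the string shown above. It also shows `json.loads` as a possible alternative, on the assumption that the column holds valid JSON (the sample cell suggests so, but this is not verified for every row).

```python
import ast
import json

raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

# ast.literal_eval parses the string as a Python literal (no code is executed).
names_ast = [d["name"] for d in ast.literal_eval(raw)]

# The same string is also valid JSON, so json.loads gives the same result here.
names_json = [d["name"] for d in json.loads(raw)]

print(names_ast)   # ['Action', 'Adventure']
print(names_json)  # ['Action', 'Adventure']
```

One difference to keep in mind: JSON values such as `null`, `true` and `false` are not Python literals, so `ast.literal_eval` would reject a string containing them while `json.loads` would accept it.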

In [19]:
movies['keywords'][0]

'[{"id": 1463, "name": "culture clash"}, {"id": 2964, "name": "future"}, {"id": 3386, "name": "space war"}, {"id": 3388, "name": "space colony"}, {"id": 3679, "name": "society"}, {"id": 3801, "name": "space travel"}, {"id": 9685, "name": "futuristic"}, {"id": 9840, "name": "romance"}, {"id": 9882, "name": "space"}, {"id": 9951, "name": "alien"}, {"id": 10148, "name": "tribe"}, {"id": 10158, "name": "alien planet"}, {"id": 10987, "name": "cgi"}, {"id": 11399, "name": "marine"}, {"id": 13065, "name": "soldier"}, {"id": 14643, "name": "battle"}, {"id": 14720, "name": "love affair"}, {"id": 165431, "name": "anti war"}, {"id": 193554, "name": "power relations"}, {"id": 206690, "name": "mind and soul"}, {"id": 209714, "name": "3d"}]'

In [20]:
movies['keywords'] = movies['keywords'].apply(convert)

In [21]:
movies.head(2)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


## 4.5 Fetching the actors of movies from the movies dataset
---


In [22]:
movies['cast'][0]

'[{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri", "credit_id": "52fe48009251416c750ac9cb", "gender": 1, "id": 8691, "name": "Zoe Saldana", "order": 1}, {"cast_id": 25, "character": "Dr. Grace Augustine", "credit_id": "52fe48009251416c750aca39", "gender": 1, "id": 10205, "name": "Sigourney Weaver", "order": 2}, {"cast_id": 4, "character": "Col. Quaritch", "credit_id": "52fe48009251416c750ac9cf", "gender": 2, "id": 32747, "name": "Stephen Lang", "order": 3}, {"cast_id": 5, "character": "Trudy Chacon", "credit_id": "52fe48009251416c750ac9d3", "gender": 1, "id": 17647, "name": "Michelle Rodriguez", "order": 4}, {"cast_id": 8, "character": "Selfridge", "credit_id": "52fe48009251416c750ac9e1", "gender": 2, "id": 1771, "name": "Giovanni Ribisi", "order": 5}, {"cast_id": 7, "character": "Norm Spellman", "credit_id": "52fe48009251416c750ac9dd", "gender": 

#### Function for fetching actors

In [23]:
def convert2(obj):
    # Keep only the names of the first three cast members.
    return [i['name'] for i in ast.literal_eval(obj)[:3]]

In [24]:
movies['cast'] = movies['cast'].apply(convert2)

In [25]:
movies.sample(2)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
347,22794,Cloudy with a Chance of Meatballs,Inventor Flint Lockwood creates a machine that...,"[Animation, Comedy, Family]","[weather, food, science]","[Bill Hader, Anna Faris, James Caan]","[{""credit_id"": ""52fe444fc3a368484e01be87"", ""de..."
4111,8885,Waltz with Bashir,"Much awarded animated documentary, in which di...","[Drama, Animation, War]","[israel, palestine, middle east, lebanon, nigh...","[Ari Folman, Ron Ben-Yishai, Dror Harazi]","[{""credit_id"": ""52fe44c4c3a36847f80a900d"", ""de..."


## 4.6 Fetching the director name of movies from the movies dataset
---

In [26]:
movies['crew'][0]

'[{"credit_id": "52fe48009251416c750aca23", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "539c47ecc3a36810e3001f87", "department": "Art", "gender": 2, "id": 496, "job": "Production Design", "name": "Rick Carter"}, {"credit_id": "54491c89c3a3680fb4001cf7", "department": "Sound", "gender": 0, "id": 900, "job": "Sound Designer", "name": "Christopher Boyes"}, {"credit_id": "54491cb70e0a267480001bd0", "department": "Sound", "gender": 0, "id": 900, "job": "Supervising Sound Editor", "name": "Christopher Boyes"}, {"credit_id": "539c4a4cc3a36810c9002101", "department": "Production", "gender": 1, "id": 1262, "job": "Casting", "name": "Mali Finn"}, {"credit_id": "5544ee3b925141499f0008fc", "department": "Sound", "gender": 2, "id": 1729, "job": "Original Music Composer", "name": "James Horner"}, {"credit_id": "52fe48009251416c750ac9c3", "department": "Directing", "gender": 2, "id": 2710, "job": "Director", "name": "James Cameron"},

#### Function for fetching the director name

In [27]:
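def fetch_director(obj):
    # Return the first crew member whose job is 'Director' as a one-item list
    # (an empty list if no director is listed).
    L = []
    for i in ast.literal_eval(obj):
        if i['job'] == 'Director':
            L.append(i['name'])
            break
    return L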

In [28]:
movies['crew'] = movies['crew'].apply(fetch_director)

In [29]:
movies.sample(2)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
2936,17708,Corky Romano,"Corky Romano is a bumbling, simpleton, veterin...","[Action, Comedy, Crime]",[],"[Chris Kattan, Vinessa Shaw, Peter Falk]",[Rob Pritts]
3758,8357,What the #$*! Do We (K)now!?,Amanda (Marlee Maitlin) is a divorced woman wh...,[Documentary],"[alternate dimension, new age, parallel world,...","[Marlee Matlin, Elaine Hendrix, Robert Blanche]",[William Arntz]


#### Converting the overview string into a list of words

In [30]:
movies['overview'] = movies['overview'].apply(lambda x:x.split())
# Split each overview string on whitespace into a list of words

In [31]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski]
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes]
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan]
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton]


#### Function for removing spaces from the cast, crew, genres and keywords columns

In [32]:
def collapse(L):
    L1 = []
    for i in L:
        L1.append(i.replace(" ",""))
    return L1

In [33]:
movies['cast'] = movies['cast'].apply(collapse)
movies['crew'] = movies['crew'].apply(collapse)
movies['genres'] = movies['genres'].apply(collapse)
movies['keywords'] = movies['keywords'].apply(collapse)

In [34]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski]
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan]
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton]
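
One likely reason for collapsing the spaces (the notebook does not state it) is to keep multi-word names as single tokens, so that two different people who share a first name do not look like a match. A minimal sketch of that effect, using lowercase whitespace splitting as a simplified stand-in for the vectorizer's tokenizer; the helper `tokens` is hypothetical:

```python
def tokens(names, collapse_spaces):
    # Lowercase and split on whitespace: a simplified stand-in for the
    # vectorizer's tokenizer, used here only for illustration.
    if collapse_spaces:
        names = [n.replace(" ", "") for n in names]
    return set(" ".join(names).lower().split())

film_a = ["Sam Worthington"]
film_b = ["Sam Mendes"]

# With spaces kept, both films share the token "sam" despite different people.
shared_raw = tokens(film_a, False) & tokens(film_b, False)

# With spaces removed, each full name is one token and nothing overlaps.
shared_collapsed = tokens(film_a, True) & tokens(film_b, True)

print(shared_raw)        # {'sam'}
print(shared_collapsed)  # set()
```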


## 4.7 Concatenation of columns
---
- Concatenate the **overview, genres, keywords, cast, crew** columns into a single **tags** column.

In [35]:
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

In [37]:
# movies['tags'][1]

#### Removing overview, genres, keywords, cast, crew from movies dataset

In [38]:
new = movies.drop(columns=['overview','genres','keywords','cast','crew'])

In [39]:
new

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili..."
...,...,...,...
4804,9367,El Mariachi,"[El, Mariachi, just, wants, to, play, his, gui..."
4805,72766,Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended..."
4806,231617,"Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,..."
4807,126186,Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is..."


#### Converting list into string

In [40]:
new['tags'] = new['tags'].apply(lambda x: " ".join(x))
new.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."


In [41]:
new['tags'][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver JamesCameron'

In [42]:
new['tags'][1]

"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems. Adventure Fantasy Action ocean drugabuse exoticisland eastindiatradingcompany loveofone'slife traitor shipwreck strongwoman ship alliance calypso afterlife fighter pirate swashbuckler aftercreditsstinger JohnnyDepp OrlandoBloom KeiraKnightley GoreVerbinski"

# 5. Model fitting 
---

## 5.1 Importing CountVectorizer
---


In [43]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000,stop_words='english')

- Here **max_features=5000** keeps only the **5000 most frequent terms** across all tags as the vocabulary.
- **stop_words='english'** removes common English stop words (such as "the", "and", "is") before counting.

## 5.2 Fitting the model
---

In [44]:
vector = cv.fit_transform(new['tags']).toarray()

In [45]:
vector.shape

(4806, 5000)

In [46]:
cv.get_feature_names_out().tolist()  # get_feature_names() was removed in scikit-learn 1.2

['000',
 '007',
 '10',
 '100',
 '11',
 '12',
 '13',
 '14',
 '15',
 '16',
 '17',
 '18',
 '18th',
 '19',
 '1930s',
 '1940s',
 '1944',
 '1950s',
 '1960s',
 '1970s',
 '1971',
 '1976',
 '1980',
 '1980s',
 '1985',
 '1990s',
 '19th',
 '19thcentury',
 '20',
 '200',
 '2003',
 '2009',
 '20th',
 '21st',
 '23',
 '24',
 '25',
 '30',
 '300',
 '3d',
 '40',
 '50',
 '500',
 '60',
 '60s',
 '70',
 'aaron',
 'aaroneckhart',
 'abandoned',
 'abducted',
 'abigailbreslin',
 'abilities',
 'ability',
 'able',
 'aboard',
 'abuse',
 'abusive',
 'academy',
 'accept',
 'accepted',
 'accepts',
 'access',
 'accident',
 'accidental',
 'accidentally',
 'accompanied',
 'accomplish',
 'account',
 'accountant',
 'accused',
 'ace',
 'achieve',
 'act',
 'acting',
 'action',
 'actionhero',
 'actions',
 'activist',
 'activities',
 'activity',
 'actor',
 'actors',
 'actress',
 'acts',
 'actual',
 'actually',
 'adam',
 'adams',
 'adamsandler',
 'adamshankman',
 'adaptation',
 'adapted',
 'addict',
 'addicted',
 'addiction',
 'a

## 5.3 Finding the cosine similarity between vectors
---

#### Importing cosine similarity

In [47]:
from sklearn.metrics.pairwise import cosine_similarity

#### Finding cosine similarity of each vector with another vector

In [48]:
similarity = cosine_similarity(vector)

In [49]:
similarity[0]

array([1.        , 0.09107651, 0.06071767, ..., 0.02548236, 0.02817181,
       0.        ])

In [50]:
similarity

array([[1.        , 0.09107651, 0.06071767, ..., 0.02548236, 0.02817181,
        0.        ],
       [0.09107651, 1.        , 0.06451613, ..., 0.02707652, 0.        ,
        0.        ],
       [0.06071767, 0.06451613, 1.        , ..., 0.02707652, 0.        ,
        0.        ],
       ...,
       [0.02548236, 0.02707652, 0.02707652, ..., 1.        , 0.07537784,
        0.04828045],
       [0.02817181, 0.        , 0.        , ..., 0.07537784, 1.        ,
        0.05337605],
       [0.        , 0.        , 0.        , ..., 0.04828045, 0.05337605,
        1.        ]])
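
For reference, the cosine similarity of two vectors a and b is a·b / (‖a‖ ‖b‖). The sketch below computes the full pairwise matrix by hand with NumPy on three toy count vectors, to show what each entry of `similarity` means. It assumes no row is all zeros, since that would divide by zero in this simplified version.

```python
import numpy as np

# Toy count vectors (rows = documents, columns = vocabulary terms).
X = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 2],
], dtype=float)

# Normalise each row to unit length; the dot products of the unit rows
# are then the pairwise cosine similarities.
unit = X / np.linalg.norm(X, axis=1, keepdims=True)
sim = unit @ unit.T

print(np.round(sim, 3))
```

The diagonal is 1 (each document compared with itself) and the matrix is symmetric, matching the structure of the output above.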

In [142]:
# new[new['title'] == 'The Lego Movie'].index[0]

# 6. **Creating function for movie recommendation**
---

In [51]:
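def recommend(movie):
    # Positional row of the movie in `new`. The index labels have gaps after
    # dropna(), so a label cannot be used to index `similarity` directly.
    index = new.index.get_loc(new[new['title'] == movie].index[0])
    distances = sorted(enumerate(similarity[index]), reverse=True, key=lambda x: x[1])
    # Skip the first entry (the movie itself) and print the 10 closest titles.
    for i in distances[1:11]:
        print(new.iloc[i[0]].title)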

#### Calling our recommend() function
- Pass the movie title to our **recommend()** function.

In [52]:
recommend('Batman Begins')

The Dark Knight
The Dark Knight Rises
Batman
Batman & Robin
Batman
Amidst the Devil's Wings
Batman v Superman: Dawn of Justice
Defendor
Batman Returns
Dead Man Down
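
A variant that returns the titles instead of printing them, and that tolerates an unknown title, may be easier to reuse elsewhere. The following is a sketch only: `top_similar`, `titles` and `similarity_toy` are hypothetical names, and the 4 × 4 matrix is made up.

```python
import numpy as np

def top_similar(title, titles, similarity, k=10):
    """Return up to k titles most similar to `title` (hypothetical helper)."""
    if title not in titles:
        return []  # unknown title: return nothing instead of raising
    pos = titles.index(title)
    # Sort positions by descending similarity; a stable sort keeps ties in order.
    order = np.argsort(-similarity[pos], kind="stable")
    # Drop the query itself by position rather than assuming it sorts first.
    return [titles[i] for i in order if i != pos][:k]

titles = ["A", "B", "C", "D"]
similarity_toy = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

print(top_similar("A", titles, similarity_toy, k=2))  # ['C', 'D']
print(top_similar("Z", titles, similarity_toy))       # []
```

With the notebook's objects this would be called with `new['title'].tolist()` and `similarity`. When a title appears more than once, `titles.index` picks the first occurrence.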


## Importing pickle and saving the results
---

In [53]:
import pickle

In [54]:
with open('movie_list.pkl', 'wb') as f:
    pickle.dump(new, f)
# pickle.dump(new.to_dict(), open('movie_dict.pkl', 'wb'))
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
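
The saved files can later be read back with `pickle.load`. A minimal round-trip sketch, using small stand-in objects and a temporary directory so it runs without the notebook's data; the `_toy` names are illustrative only:

```python
import os
import pickle
import tempfile

# Toy stand-ins for `new` and `similarity`; any picklable object works the same way.
movie_list_toy = {"title": ["Avatar", "Spectre"]}
similarity_toy = [[1.0, 0.06], [0.06, 1.0]]

with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "movie_list.pkl")
    sim_path = os.path.join(tmp, "similarity.pkl")

    # Write both objects; the context managers close the files.
    with open(list_path, "wb") as f:
        pickle.dump(movie_list_toy, f)
    with open(sim_path, "wb") as f:
        pickle.dump(similarity_toy, f)

    # Read them back in binary mode.
    with open(list_path, "rb") as f:
        loaded_list = pickle.load(f)
    with open(sim_path, "rb") as f:
        loaded_similarity = pickle.load(f)

print(loaded_list == movie_list_toy, loaded_similarity == similarity_toy)  # True True
```

Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.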