### Movie recommendation system

Can we predict which movies will or will not be a commercial success? This dataset contains a subset of the data available through the TMDB API: about 5,000 movies out of the full catalog. In this notebook, we use it to build a content-based recommender that suggests movies similar to a given title. The following resources are available:

tmdb_5000_movies:
https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/main/tmdb_5000_movies.csv


tmdb_5000_credits:
https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/main/tmdb_5000_credits.csv

In [28]:
# Libraries
import pandas as pd
import pandasql as psql
import numpy as np
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import accuracy_score
from sklearn.metrics.pairwise import cosine_similarity
from pickle import dump

In [15]:
# Load data
movies = pd.read_csv('https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/main/tmdb_5000_movies.csv')
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

In [16]:
movies.sample(10)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
2207,20000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.12rounds-movie.com/,17134,"[{""id"": 6149, ""name"": ""police""}, {""id"": 8233, ...",en,12 Rounds,When New Orleans cop Danny Fisher prevents a b...,15.66135,"[{""name"": ""The Mark Gordon Company"", ""id"": 155...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-03-19,17280326,108.0,"[{""iso_639_1"": ""cs"", ""name"": ""\u010cesk\u00fd""...",Released,Survive all 12,12 Rounds,5.7,220
2031,22000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 35, ""name...",http://theinformantmovie.warnerbros.com/,11323,"[{""id"": 5888, ""name"": ""agriculture""}, {""id"": 6...",en,The Informant!,A rising star at agri-industry giant Archer Da...,17.772518,"[{""name"": ""Section Eight"", ""id"": 129}, {""name""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-09-18,35424826,108.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Based on a tattle-tale.,The Informant!,6.0,301
852,70000000,"[{""id"": 53, ""name"": ""Thriller""}, {""id"": 18, ""n...",,9833,"[{""id"": 246, ""name"": ""dancing""}, {""id"": 1523, ...",en,The Phantom of the Opera,"Deformed since birth, a bitter man known only ...",18.927463,"[{""name"": ""Odyssey Entertainment"", ""id"": 3539}...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2004-12-08,154648887,143.0,"[{""iso_639_1"": ""it"", ""name"": ""Italiano""}, {""is...",Released,The classic musical comes to the big screen fo...,The Phantom of the Opera,7.0,438
2898,12000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 36, ""name...",http://www.cityoflifeanddeath.co.uk/,21345,"[{""id"": 1327, ""name"": ""war crimes""}, {""id"": 15...",zh,南京!南京!,"City of Life and Death takes place in 1937, du...",4.793348,"[{""name"": ""Media Asia Films"", ""id"": 5552}, {""n...","[{""iso_3166_1"": ""CN"", ""name"": ""China""}, {""iso_...",2009-04-22,10652498,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,,City of Life and Death,7.6,55
2770,0,"[{""id"": 18, ""name"": ""Drama""}]",,13074,"[{""id"": 6075, ""name"": ""sport""}]",en,Resurrecting the Champ,Up-and-coming sports reporter rescues a homele...,4.898437,"[{""name"": ""Battleplan Productions"", ""id"": 2108...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-06-14,0,112.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,Resurrecting the Champ,5.9,60
3960,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10751, ""...",https://www.facebook.com/pages/The-Deported/46...,170480,[],en,The Deported,An Italian-American actor is deported to Mexic...,0.194848,[],[],2010-06-15,0,90.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,The Deported,0.0,0
719,60000000,"[{""id"": 10402, ""name"": ""Music""}, {""id"": 99, ""n...",http://www.thisisit-movie.com,13576,"[{""id"": 3490, ""name"": ""pop star""}, {""id"": 6027...",en,Michael Jackson's This Is It,"A compilation of interviews, rehearsals and ba...",15.798622,"[{""name"": ""Columbia Pictures"", ""id"": 5}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-10-28,0,111.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Like You've Never Seen Him Before,This Is It,6.7,247
3744,4500000,"[{""id"": 27, ""name"": ""Horror""}, {""id"": 35, ""nam...",https://www.facebook.com/HanselGretelGetBaked,165864,"[{""id"": 616, ""name"": ""witch""}, {""id"": 10776, ""...",en,Hansel and Gretel Get Baked,An intense new marijuana strain named “Black F...,2.503612,"[{""name"": ""Tribeca Productions"", ""id"": 11391}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2013-02-19,0,86.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,Hansel and Gretel Get Baked,4.8,67
3036,10000000,"[{""id"": 99, ""name"": ""Documentary""}, {""id"": 104...",http://www.1dthisisus-movie.com/feature/mosaic/,164558,"[{""id"": 6029, ""name"": ""concert""}]",en,One Direction: This Is Us,Go behind the scenes during One Directions sel...,7.944457,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2013-08-28,0,92.0,"[{""iso_639_1"": ""pt"", ""name"": ""Portugu\u00eas""}...",Released,A motion picture event.,One Direction: This Is Us,8.0,201
878,90000000,"[{""id"": 10752, ""name"": ""War""}, {""id"": 18, ""nam...",,3683,"[{""id"": 1956, ""name"": ""world war ii""}, {""id"": ...",en,Flags of Our Fathers,There were five Marines and one Navy Corpsman ...,23.9366,"[{""name"": ""DreamWorks SKG"", ""id"": 27}, {""name""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2006-10-18,65900249,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,They fight for their country but they die for ...,Flags of Our Fathers,6.7,526


In [17]:
credits = pd.read_csv('https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/main/tmdb_5000_credits.csv')
credits.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   movie_id  4803 non-null   int64 
 1   title     4803 non-null   object
 2   cast      4803 non-null   object
 3   crew      4803 non-null   object
dtypes: int64(1), object(3)
memory usage: 150.2+ KB


In [18]:
credits.sample(10)

Unnamed: 0,movie_id,title,cast,crew
2524,16769,Coal Miner's Daughter,"[{""cast_id"": 2, ""character"": ""Loretta Lynn"", ""...","[{""credit_id"": ""589e87c7c3a3684bb7004dc7"", ""de..."
1407,10066,House of Wax,"[{""cast_id"": 22, ""character"": ""Carly Jones"", ""...","[{""credit_id"": ""52fe43199251416c750036c1"", ""de..."
1292,1257,Because I Said So,"[{""cast_id"": 6, ""character"": ""Daphne"", ""credit...","[{""credit_id"": ""52fe42ebc3a36847f802cb51"", ""de..."
743,6435,Practical Magic,"[{""cast_id"": 1, ""character"": ""Sally Owens"", ""c...","[{""credit_id"": ""52fe4451c3a36847f808eeb7"", ""de..."
2068,2196,Death at a Funeral,"[{""cast_id"": 4, ""character"": ""Daniel"", ""credit...","[{""credit_id"": ""52fe4340c3a36847f8045d0d"", ""de..."
753,5820,The Sentinel,"[{""cast_id"": 1, ""character"": ""Pete Garrison"", ...","[{""credit_id"": ""556ae1fdc3a3682725000fe5"", ""de..."
624,72331,Abraham Lincoln: Vampire Hunter,"[{""cast_id"": 8, ""character"": ""Abraham Lincoln""...","[{""credit_id"": ""52fe4864c3a368484e0f66a9"", ""de..."
915,9441,Stepmom,"[{""cast_id"": 9, ""character"": ""Isabel Kelly"", ""...","[{""credit_id"": ""594311409251417f77013a6c"", ""de..."
406,51052,Arthur Christmas,"[{""cast_id"": 2, ""character"": ""Arthur (voice)"",...","[{""credit_id"": ""5913a131925141580d00132d"", ""de..."
2174,47941,Under the Rainbow,"[{""cast_id"": 2, ""character"": ""Bruce Thorpe"", ""...","[{""credit_id"": ""52fe474fc3a36847f812e93d"", ""de..."


In [19]:
import sqlite3

conn = sqlite3.connect("../data/movies_database.db")

movies.to_sql("movies_table", conn, if_exists = "replace", index = False)
credits.to_sql("credits_table", conn, if_exists = "replace", index = False)

4803

In [20]:
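# Merge both tables into a single DataFrame.
# Note: titles are not unique, so joining on title returns 4809 rows from
# 4803 movies. Joining on movies_table.id = credits_table.movie_id should
# avoid the extra rows.

query = """
    SELECT *
    FROM movies_table
    INNER JOIN credits_table
    ON movies_table.title = credits_table.title;
"""

total_data = pd.read_sql_query(query, conn)
conn.close()

# Both tables have a "title" column; keep only the first occurrence
total_data = total_data.loc[:, ~total_data.columns.duplicated()]
total_data.info()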

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4809 non-null   int64  
 1   genres                4809 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4809 non-null   int64  
 4   keywords              4809 non-null   object 
 5   original_language     4809 non-null   object 
 6   original_title        4809 non-null   object 
 7   overview              4806 non-null   object 
 8   popularity            4809 non-null   float64
 9   production_companies  4809 non-null   object 
 10  production_countries  4809 non-null   object 
 11  release_date          4808 non-null   object 
 12  revenue               4809 non-null   int64  
 13  runtime               4807 non-null   float64
 14  spoken_languages      4809 non-null   object 
 15  status               

In [21]:
total_data.sample(10)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
1235,40000000,"[{""id"": 80, ""name"": ""Crime""}, {""id"": 28, ""name...",,11398,"[{""id"": 478, ""name"": ""china""}, {""id"": 3292, ""n...",en,The Art of War,When ruthless terrorists threaten to bring dow...,7.832337,"[{""name"": ""Amen Ra Films"", ""id"": 421}, {""name""...",...,117.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Who is your foe?,The Art of War,5.6,135,11398,"[{""cast_id"": 9, ""character"": ""Neil Shaw"", ""cre...","[{""credit_id"": ""52fe44389251416c7502cfeb"", ""de..."
3265,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",,209263,"[{""id"": 4501, ""name"": ""masseuse""}, {""id"": 4543...",en,Enough Said,Eva is a divorced soon-to-be empty-nester wond...,14.969093,"[{""name"": ""Fox Searchlight Pictures"", ""id"": 43}]",...,93.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,Enough Said,6.6,348,209263,"[{""cast_id"": 3, ""character"": ""Eva"", ""credit_id...","[{""credit_id"": ""52fe4d60c3a368484e1e5c23"", ""de..."
4606,0,"[{""id"": 99, ""name"": ""Documentary""}]",http://www.51birchstreet.com/index.php,79161,[],en,51 Birch Street,Documentary filmmaker Doug Block had every rea...,0.049921,"[{""name"": ""Copacetic Pictures"", ""id"": 8139}]",...,90.0,[],Released,Do you really want to know your parents?,51 Birch Street,6.8,6,79161,"[{""cast_id"": 1001, ""character"": ""Herself"", ""cr...","[{""credit_id"": ""52fe49c4c3a368484e13e9bb"", ""de..."
2853,12000000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10751, ""...",,34549,[],en,Max Keeble's Big Move,"Max Keeble, the victim of his 7th grade class,...",1.081822,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,86.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,His World. His Rules.,Max Keeble's Big Move,5.4,33,34549,"[{""cast_id"": 1, ""character"": ""Max Keeble"", ""cr...","[{""credit_id"": ""52fe456a9251416c91031963"", ""de..."
384,90000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 18, ""...",,8358,"[{""id"": 911, ""name"": ""exotic island""}, {""id"": ...",en,Cast Away,"Chuck, a top international manager for FedEx, ...",57.739713,"[{""name"": ""DreamWorks SKG"", ""id"": 27}, {""name""...",...,143.0,"[{""iso_639_1"": ""ru"", ""name"": ""P\u0443\u0441\u0...",Released,"At the edge of the world, his journey begins.",Cast Away,7.5,3218,8358,"[{""cast_id"": 3, ""character"": ""Chuck Noland"", ""...","[{""credit_id"": ""52fe44a2c3a36847f80a1505"", ""de..."
4314,1344000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 10749, ""n...",,86331,"[{""id"": 254, ""name"": ""france""}, {""id"": 293, ""n...",fr,Q,In a social context deteriorated by a countryw...,20.422246,"[{""name"": ""Acajou Films"", ""id"": 18519}, {""name...",...,103.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""}]",Released,,Desire,4.1,140,86331,"[{""cast_id"": 1005, ""character"": ""C\u00e9cile"",...","[{""credit_id"": ""577ce3ba9251413b63001f8c"", ""de..."
156,140000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 28, ""name...",,616,"[{""id"": 233, ""name"": ""japan""}, {""id"": 1327, ""n...",en,The Last Samurai,Nathan Algren is an American hired to instruct...,52.341226,"[{""name"": ""Cruise/Wagner Productions"", ""id"": 4...",...,154.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,"In the face of an enemy, in the heart of one m...",The Last Samurai,7.3,1895,616,"[{""cast_id"": 11, ""character"": ""Captain Nathan ...","[{""credit_id"": ""52fe425ec3a36847f8018e53"", ""de..."
3996,2700000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 10751...",,17264,"[{""id"": 383, ""name"": ""poker""}, {""id"": 2673, ""n...",en,The Black Stallion,"While traveling with his father, young Alec be...",11.35972,"[{""name"": ""United Artists"", ""id"": 60}]",...,118.0,"[{""iso_639_1"": ""ar"", ""name"": ""\u0627\u0644\u06...",Released,"From the moment he first saw the stallion, he ...",The Black Stallion,7.0,59,17264,"[{""cast_id"": 1, ""character"": ""Alec Ramsey"", ""c...","[{""credit_id"": ""54ee5070925141795f004b0c"", ""de..."
1663,30000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.antitrustthemovie.com/,9989,"[{""id"": 1576, ""name"": ""technology""}, {""id"": 21...",en,Antitrust,A computer programmer's dream job at a hot Por...,8.359599,"[{""name"": ""Industry Entertainment"", ""id"": 376}...",...,108.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Truth can be dangerous...Trust can be deadly.,Antitrust,5.8,153,9989,"[{""cast_id"": 18, ""character"": ""Milo Hoffman"", ...","[{""credit_id"": ""52fe4558c3a36847f80c90af"", ""de..."
1533,34000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.mgm.com/view/movie/1292/Moonraker/,698,"[{""id"": 110, ""name"": ""venice""}, {""id"": 1583, ""...",en,Moonraker,During the transportation of a Space Shuttle a...,29.887404,"[{""name"": ""United Artists"", ""id"": 60}, {""name""...",...,126.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Outer space now belongs to 007.,Moonraker,5.9,541,698,"[{""cast_id"": 15, ""character"": ""James Bond"", ""c...","[{""credit_id"": ""52fe426cc3a36847f801d72b"", ""de..."
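
The row count grows from 4803 to 4809 because the join key, `title`, is not unique: two different movies that share a title match each other as well as themselves. A minimal sketch with made-up rows shows the effect and how joining on the numeric identifiers avoids it. The toy data and the variable names are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Toy data (made up for illustration): two different movies share a title
movies_toy = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Batman", "Batman", "Alien"],
})
credits_toy = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Batman", "Batman", "Alien"],
    "cast": ["cast A", "cast B", "cast C"],
})

# Joining on the non-unique title pairs each "Batman" with both credit rows
by_title = movies_toy.merge(credits_toy, on="title", how="inner")

# Joining on the identifiers keeps exactly one row per movie
by_id = movies_toy.merge(
    credits_toy.drop(columns="title"),
    left_on="id",
    right_on="movie_id",
    how="inner",
)

print(len(by_title))  # 5
print(len(by_id))     # 3
```

The same idea applies to the SQL query: matching on the identifier columns rather than on the title would be expected to return one row per movie.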


In [23]:
# Create a DataFrame with only the selected columns
selected_columns = ["movie_id", "title", "overview", "genres", "keywords", "cast", "crew"]
df_selected = total_data[selected_columns]
df_selected.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4809 entries, 0 to 4808
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   movie_id  4809 non-null   int64 
 1   title     4809 non-null   object
 2   overview  4806 non-null   object
 3   genres    4809 non-null   object
 4   keywords  4809 non-null   object
 5   cast      4809 non-null   object
 6   crew      4809 non-null   object
dtypes: int64(1), object(6)
memory usage: 263.1+ KB


In [24]:
# Parse the JSON-encoded columns into plain Python values
import json

# Work on an independent copy so the assignments below do not trigger SettingWithCopyWarning
df_selected = df_selected.copy()

def load_json_safe(json_str, default_value = None):
    """Parse a JSON string; return default_value if it is missing or malformed."""
    try:
        return json.loads(json_str)
    except (TypeError, json.JSONDecodeError):
        return default_value

# Keep only the "name" field of each genre and keyword
df_selected["genres"] = df_selected["genres"].apply(lambda x: [item["name"] for item in load_json_safe(x, [])])
df_selected["keywords"] = df_selected["keywords"].apply(lambda x: [item["name"] for item in load_json_safe(x, [])])

# Keep the first three cast members
df_selected["cast"] = df_selected["cast"].apply(lambda x: [item["name"] for item in load_json_safe(x, [])][:3])

# Keep the director name(s) as a single string
df_selected["crew"] = df_selected["crew"].apply(lambda x: " ".join([crew_member["name"] for crew_member in load_json_safe(x, []) if crew_member["job"] == "Director"]))

# Wrap the overview in a list so it can be concatenated with the other columns
df_selected["overview"] = df_selected["overview"].apply(lambda x: [x])

df_selected.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In the 22nd century, a paraplegic Marine is d...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",James Cameron
1,285,Pirates of the Caribbean: At World's End,"[Captain Barbossa, long believed to be dead, h...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley]",Gore Verbinski
2,206647,Spectre,[A cryptic message from Bond’s past sends him ...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux]",Sam Mendes
3,49026,The Dark Knight Rises,[Following the death of District Attorney Harv...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman]",Christopher Nolan
4,49529,John Carter,"[John Carter is a war-weary, former military c...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton]",Andrew Stanton
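
Columns such as `genres` and `cast` store a JSON array of objects as text. The sketch below isolates the extraction step in a small helper. `extract_names` and the sample string are hypothetical, introduced only for illustration.

```python
import json

def extract_names(json_str, limit=None):
    """Return the "name" field of each object in a JSON array string.

    Hypothetical helper: returns an empty list when the input is missing
    or not valid JSON, and keeps at most `limit` names when given.
    """
    try:
        items = json.loads(json_str)
    except (TypeError, json.JSONDecodeError):
        return []
    names = [item["name"] for item in items]
    return names[:limit]

sample = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}]'

print(extract_names(sample))           # ['Action', 'Adventure', 'Fantasy']
print(extract_names(sample, limit=2))  # ['Action', 'Adventure']
print(extract_names(None))             # []
```

Returning an empty list instead of `None` for missing values keeps later list operations, such as concatenation, from failing.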


In [26]:
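# Make every column a list of strings so the columns can be concatenated.
# Caveat: "overview" is already a one-element list, so str(x) stores the repr
# of that list (hence the brackets and quotes at the start of the output).
df_selected["overview"] = df_selected["overview"].apply(lambda x: [str(x)])
df_selected["genres"] = df_selected["genres"].apply(lambda x: [str(genre) for genre in x])
df_selected["keywords"] = df_selected["keywords"].apply(lambda x: [str(keyword) for keyword in x])
df_selected["cast"] = df_selected["cast"].apply(lambda x: [str(actor) for actor in x])
# Caveat: "crew" is a plain string here, so iterating over it yields single
# characters ("J a m e s ..."). Using [x] instead would keep the name intact.
df_selected["crew"] = df_selected["crew"].apply(lambda x: [str(crew_member) for crew_member in x])

# Concatenate all lists and flatten them into one space-separated string
df_selected["tags"] = df_selected["overview"] + df_selected["genres"] + df_selected["keywords"] + df_selected["cast"] + df_selected["crew"]
df_selected["tags"] = df_selected["tags"].apply(lambda x: ",".join(x).replace(",", " "))

# Only the combined "tags" column is needed from here on
df_selected.drop(columns = ["genres", "keywords", "cast", "crew", "overview"], inplace = True)

df_selected.iloc[0].tags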

'["[\'In the 22nd century  a paraplegic Marine is dispatched to the moon Pandora on a unique mission  but becomes torn between following orders and protecting an alien civilization.\']"] Action Adventure Fantasy Science Fiction culture clash future space war space colony society space travel futuristic romance space alien tribe alien planet cgi marine soldier battle love affair anti war power relations mind and soul 3d Sam Worthington Zoe Saldana Sigourney Weaver J a m e s   C a m e r o n'
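
The tail of this output, `J a m e s   C a m e r o n`, comes from iterating over a string: a list comprehension over `"James Cameron"` yields one element per character. A small sketch of the difference follows, together with a cleaner way to assemble the tag string. The `build_tags` helper and its sample arguments are assumptions for illustration, not the notebook's code.

```python
def build_tags(overview, genres, keywords, cast, director):
    """Join all text fields into one space-separated, comma-free string."""
    parts = [overview] + genres + keywords + cast + [director]
    return " ".join(parts).replace(",", " ")

director = "James Cameron"

# Iterating over a string yields characters, not words
as_chars = [str(ch) for ch in director]
print(" ".join(as_chars))  # J a m e s   C a m e r o n

# Wrapping the string in a list keeps the name intact
print(" ".join([director]))  # James Cameron

tags = build_tags(
    "A paraplegic Marine is dispatched to the moon Pandora.",
    ["Action", "Adventure"],
    ["space war"],
    ["Sam Worthington"],
    director,
)
print(tags)
```

This matters for the model because `TfidfVectorizer`, with its default token pattern, ignores one-character tokens, so a name split into single letters contributes nothing to the features.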

In [34]:
# Turn each movie's tags into a TF-IDF vector
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(df_selected["tags"])

# Brute-force nearest neighbors with cosine distance.
# 6 neighbors = the query movie itself plus 5 recommendations.
model = NearestNeighbors(n_neighbors = 6, algorithm = "brute", metric = "cosine")
model.fit(tfidf_matrix)

def get_movie_recommendations(movie_title):
    # Index of the first movie with this title (raises IndexError if the title is unknown)
    movie_index = df_selected[df_selected["title"] == movie_title].index[0]
    distances, indices = model.kneighbors(tfidf_matrix[movie_index])
    similar_movies = [(df_selected["title"][i], distances[0][j]) for j, i in enumerate(indices[0])]
    # Skip the first neighbor, which is normally the query movie itself
    return similar_movies[1:]

input_movie = "Practical Magic"
recommendations = get_movie_recommendations(input_movie)
print("Film recommendations '{}'".format(input_movie))
for movie, distance in recommendations:
    print("- Film: {}".format(movie))

Film recommendations 'Practical Magic'
- Film: Into the Woods
- Film: Harry Potter and the Order of the Phoenix
- Film: ParaNorman
- Film: Harry Potter and the Chamber of Secrets
- Film: Hansel & Gretel: Witch Hunters
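
The title lookup in `get_movie_recommendations` raises an `IndexError` when the title is not in the DataFrame. A minimal sketch on a made-up corpus shows the same TF-IDF plus cosine nearest-neighbors pipeline with a guard for unknown titles. The toy catalog and the `recommend` function are assumptions for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy catalog (made up): a title and a bag of descriptive tags per movie
toy = pd.DataFrame({
    "title": ["Witch School", "Wizard Academy", "Space Battle", "Galaxy War"],
    "tags": [
        "witch magic school spell",
        "wizard magic school spell",
        "space ship battle alien",
        "space alien war galaxy",
    ],
})

tfidf = TfidfVectorizer().fit_transform(toy["tags"])
knn = NearestNeighbors(n_neighbors=3, algorithm="brute", metric="cosine").fit(tfidf)

def recommend(title):
    """Return (title, cosine distance) pairs for the nearest other movies."""
    matches = toy[toy["title"] == title].index
    if len(matches) == 0:
        return []  # unknown title: nothing to recommend
    query = matches[0]
    distances, indices = knn.kneighbors(tfidf[query])
    return [
        (toy["title"].iloc[i], float(d))
        for d, i in zip(distances[0], indices[0])
        if i != query  # drop the query movie itself
    ]

print(recommend("Witch School"))
print(recommend("Unknown Movie"))  # []
```

Filtering by index rather than slicing off the first result also covers the case where the query movie is not returned in first position, for example when another row has identical tags.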


In [35]:
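# Save the fitted model; the "with" block closes the file once it is written
with open("../models/knn_neighbors-6_algorithm-brute_metric-cosine.sav", "wb") as model_file:
    dump(model, model_file)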
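
To reuse the saved model on new text, the fitted `TfidfVectorizer` is needed as well, since it maps text into the feature space the model was fitted on. Below is a hedged sketch of saving and reloading both objects with `pickle`. The file name, the temporary directory, and the toy corpus are placeholders, not the notebook's artifacts.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy corpus standing in for the "tags" column
corpus = ["witch magic spell", "wizard magic school", "space alien war"]
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(corpus)
model = NearestNeighbors(n_neighbors=2, algorithm="brute", metric="cosine").fit(matrix)

# Placeholder path inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "recommender.pkl")

# Save both objects together; "with" closes the file even if an error occurs
with open(path, "wb") as f:
    pickle.dump({"vectorizer": vectorizer, "model": model}, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

# The reloaded pair can score text that was not in the corpus
query = loaded["vectorizer"].transform(["magic witch"])
distances, indices = loaded["model"].kneighbors(query)
print(indices[0][0])  # position of the closest corpus entry
```

Pickled scikit-learn objects are generally only safe to load with the same library version that created them, and pickle files from untrusted sources should not be loaded.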