# Movie recommendation system

## Problem Statement
Can we predict which movies will be a commercial success?<br>
This dataset is a subset of the data available through the TMDB API and contains only about 5,000 of its movies.

## Dictionary
movie_id<br>
title<br>
overview<br>
genres<br>
keywords<br>
cast<br>
crew<br>

In [37]:
# Imports

import json
import sqlite3

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

In [42]:
movies_raw = pd.read_csv("https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/main/tmdb_5000_movies.csv")
credits_raw = pd.read_csv("https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/main/tmdb_5000_credits.csv")

In [43]:
movies_raw.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800


In [44]:
credits_raw.head(1)

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [53]:
conn = sqlite3.connect('movie_database.db')

In [54]:
movies_raw.to_sql('movies', conn, if_exists='replace', index=False)
credits_raw.to_sql('credits', conn, if_exists='replace', index=False)

4803

In [55]:
query = """
SELECT c.movie_id, m.title, m.overview, m.genres, m.keywords, c.cast, c.crew
FROM movies m
JOIN credits c
ON m.title = c.title;
"""
df_raw = pd.read_sql_query(query, conn)
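
Joining on `title` assumes that titles are unique in both tables. If two films share a title, every pairing produces a row, so the result can contain more rows than either input table. The sketch below reproduces that effect on a toy in-memory database and compares it with a join on the numeric key. The table contents are made up for illustration, and it assumes that `movies.id` and `credits.movie_id` identify the same film.

```python
import sqlite3

# Toy data (hypothetical): two different films share the title "The Host".
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
CREATE TABLE movies (id INTEGER, title TEXT);
CREATE TABLE credits (movie_id INTEGER, title TEXT);
INSERT INTO movies VALUES (1, 'The Host'), (2, 'The Host'), (3, 'Avatar');
INSERT INTO credits VALUES (1, 'The Host'), (2, 'The Host'), (3, 'Avatar');
""")

def count_rows(join_condition):
    """Count the rows produced by joining movies and credits on the given condition."""
    sql = f"SELECT COUNT(*) FROM movies m JOIN credits c ON {join_condition}"
    return conn_demo.execute(sql).fetchone()[0]

rows_by_title = count_rows("m.title = c.title")  # the shared title pairs up 2 x 2 times
rows_by_id = count_rows("m.id = c.movie_id")     # exactly one row per film
conn_demo.close()

print(rows_by_title, rows_by_id)  # prints "5 3"
```

On the real tables, comparing `len(df_raw)` with `len(movies_raw)` is a quick way to check whether the title join introduced extra rows.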

In [56]:
conn.close()

In [57]:
df_raw.head(5)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [58]:
df_processed = df_raw.copy()

In [59]:
# genres, keywords, cast: keep only the 'name' field of each JSON entry (cast is limited to the first three names)
df_processed['genres'] = df_raw['genres'].apply(lambda x: [entry['name'] for entry in json.loads(x)] if pd.notna(x) else [])
df_processed['keywords'] = df_raw['keywords'].apply(lambda x: [entry['name'] for entry in json.loads(x)] if pd.notna(x) else [])
df_processed['cast'] = df_raw['cast'].apply(lambda x: [entry['name'] for entry in json.loads(x)][:3] if pd.notna(x) else [])
# crew: keep the first entry whose job is 'Director', or '' when none is listed (the JSON is parsed once per row)
df_processed['crew'] = df_raw['crew'].apply(lambda x: next((entry['name'] for entry in json.loads(x) if entry['job'] == 'Director'), '') if pd.notna(x) and x else '')
# overview: wrap the text in a one-element list so every feature column holds the same kind of value
df_processed['overview'] = df_raw['overview'].apply(lambda x: [x] if pd.notna(x) else [])
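
To make the parsing step concrete, here is a minimal sketch of what happens to a single cell. The two strings are shortened, hypothetical values in the same shape as the `cast` and `crew` columns; the names in them are placeholders, not data from the dataset.

```python
import json

# Hypothetical cell values shaped like the `cast` and `crew` columns.
cast_cell = (
    '[{"cast_id": 1, "name": "Actor A"}, {"cast_id": 2, "name": "Actor B"}, '
    '{"cast_id": 3, "name": "Actor C"}, {"cast_id": 4, "name": "Actor D"}]'
)
crew_cell = '[{"job": "Editor", "name": "Person X"}, {"job": "Director", "name": "Person Y"}]'

# json.loads turns the string into a list of dicts; keep the first three names.
top_cast = [entry["name"] for entry in json.loads(cast_cell)][:3]

# next(..., "") returns the first matching name, or "" when no director is listed.
director = next(
    (entry["name"] for entry in json.loads(crew_cell) if entry["job"] == "Director"), ""
)
no_director = next(
    (entry["name"] for entry in json.loads("[]") if entry["job"] == "Director"), ""
)

print(top_cast)      # ['Actor A', 'Actor B', 'Actor C']
print(director)      # Person Y
print(repr(no_director))  # ''
```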


In [60]:
df_processed.sample(5, random_state=1010)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
4160,98549,The Legend of Hell's Gate: An American Conspiracy,"[In 1870s Texas, a ruthless bounty hunter and ...","[Action, Adventure, History, Western]",[based on real events],"[Eric Balfour, Lou Taylor Pucci, Henry Thomas]",Tanner Beard
1587,36593,The Naked Gun 33⅓: The Final Insult,[Frank Drebin is persuaded out of retirement t...,"[Comedy, Crime]","[undercover, spoof, state prison]","[Leslie Nielsen, Priscilla Presley, George Ken...",Peter Segal
2022,14635,The Rookie,[Jim Morris never made it out of the minor lea...,"[Drama, Family]","[father son relationship, baseball, sports tea...","[Dennis Quaid, Rachel Griffiths, Beth Grant]",John Lee Hancock
2221,26171,Everybody's Fine,"[Eight months after the death of his wife, Fra...",[Drama],"[family relationships, doctor, retired, visit,...","[Robert De Niro, Drew Barrymore, Kate Beckinsale]",Kirk Jones
3822,37495,Four Lions,[Four Lions tells the story of a group of Brit...,"[Comedy, Crime, Drama]","[terrorism, british farce]","[Riz Ahmed, Nigel Lindsay, Kayvan Novak]",Chris Morris


In [61]:
def remove_spaces(text):
    """Remove spaces from a string, or from every string in a list."""
    if isinstance(text, list):
        return [entry.replace(' ', '') for entry in text]
    return text.replace(' ', '')

# Apply the function to the specified columns
df_processed['genres'] = df_processed['genres'].apply(remove_spaces)
df_processed['cast'] = df_processed['cast'].apply(remove_spaces)
df_processed['crew'] = df_processed['crew'].apply(remove_spaces)
df_processed['keywords'] = df_processed['keywords'].apply(remove_spaces)
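
The notebook does not say why spaces are removed. A common reason, assumed here, is that a later word-based comparison would otherwise split multi-word names into separate tokens, so two different people who share a first name would look partly alike. The sketch below illustrates this with a simple whitespace tokenizer; `tokens` is a hypothetical helper, not part of the notebook.

```python
def tokens(names):
    """Split each name on whitespace and lowercase it, like a basic bag-of-words tokenizer."""
    return {tok.lower() for name in names for tok in name.split()}

def squash(names):
    """Remove spaces so that each full name becomes a single token."""
    return [name.replace(" ", "") for name in names]

# Two different people who happen to share a first name.
movie_a = ["Sam Worthington"]
movie_b = ["Sam Mendes"]

shared_with_spaces = tokens(movie_a) & tokens(movie_b)
shared_without_spaces = tokens(squash(movie_a)) & tokens(squash(movie_b))

print(shared_with_spaces)     # {'sam'}: a spurious match
print(shared_without_spaces)  # set(): no match once names are single tokens
```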

In [62]:
df_processed.sample(5, random_state=1010)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
4160,98549,The Legend of Hell's Gate: An American Conspiracy,"[In 1870s Texas, a ruthless bounty hunter and ...","[Action, Adventure, History, Western]",[basedonrealevents],"[EricBalfour, LouTaylorPucci, HenryThomas]",TannerBeard
1587,36593,The Naked Gun 33⅓: The Final Insult,[Frank Drebin is persuaded out of retirement t...,"[Comedy, Crime]","[undercover, spoof, stateprison]","[LeslieNielsen, PriscillaPresley, GeorgeKennedy]",PeterSegal
2022,14635,The Rookie,[Jim Morris never made it out of the minor lea...,"[Drama, Family]","[fathersonrelationship, baseball, sportsteam, ...","[DennisQuaid, RachelGriffiths, BethGrant]",JohnLeeHancock
2221,26171,Everybody's Fine,"[Eight months after the death of his wife, Fra...",[Drama],"[familyrelationships, doctor, retired, visit, ...","[RobertDeNiro, DrewBarrymore, KateBeckinsale]",KirkJones
3822,37495,Four Lions,[Four Lions tells the story of a group of Brit...,"[Comedy, Crime, Drama]","[terrorism, britishfarce]","[RizAhmed, NigelLindsay, KayvanNovak]",ChrisMorris


In [68]:
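# Combine the descriptive columns into one comma-separated string per movie.
# Naming the columns explicitly keeps the cell safe to re-run once 'tags' exists.
tag_columns = ['title', 'overview', 'genres', 'keywords', 'cast', 'crew']
df_processed['tags'] = df_processed[tag_columns].apply(
    lambda x: ','.join(x.dropna().astype(str)),
    axis=1
)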

df_processed.sample(10)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew,tags
3549,13173,The Ten,"[Ten stories, each inspired by one of the ten ...",[Comedy],"[independentfilm, ventriloquistdummy, multiple...","[PaulRudd, AdamBrody, JonHamm]",DavidWain,"The Ten,['Ten stories, each inspired by one of..."
599,10592,Hart's War,[Fourth-generation Army Col. William McNamara ...,"[Drama, War]","[blackpeople, worldwarii, prisonersofwar, u.s....","[BruceWillis, ColinFarrell, TerrenceHoward]",GregoryHoblit,"Hart's War,[""Fourth-generation Army Col. Willi..."
3953,50942,Creature,[An amphibious shark-like monster terrorizes a...,"[Horror, ScienceFiction, Thriller]",[],"[CraigT.Nelson, KimCattrall, ColmFeore]",StuartGillard,"Creature,['An amphibious shark-like monster te..."
4422,258755,Hidden Away,[At the age of 14 the world around you changes...,"[Romance, Drama]",[],"[GermánAlcarazu, AdilKoukouh, AnaWagener]",MikelRueda,"Hidden Away,[""At the age of 14 the world aroun..."
3816,874,A Man for All Seasons,[A Man for All Seasons is the filmed version o...,"[Drama, History]","[england, pope, beheading, deathpenalty, thoma...","[PaulScofield, WendyHiller, LeoMcKern]",FredZinnemann,"A Man for All Seasons,['A Man for All Seasons ..."
4457,343409,Windsor Drive,"[River Miller, a mentally unstable actor haunt...","[Thriller, Mystery]",[womandirector],"[SamaireArmstrong, AnnaGurji, MattCohen]",NatalieBible',"Windsor Drive,['River Miller, a mentally unsta..."
3019,88036,Sparkle,"[Musical prodigy, Sparkle (Jordin Sparks) stru...","[Drama, Music]","[soongsisters, duringcreditsstinger]","[WhitneyHouston, CeeLoGreen, DerekLuke]",SalimAkil,"Sparkle,['Musical prodigy, Sparkle (Jordin Spa..."
409,16858,All That Jazz,[Bob Fosse's semi-autobiographical film celebr...,"[Drama, Music]","[showbusiness, filmmaking, tapdancing, moviein...","[RoyScheider, JessicaLange, LelandPalmer]",BobFosse,"All That Jazz,[""Bob Fosse's semi-autobiographi..."
4671,376010,Western Religion,[The year is 1879. Gunfighters from the far re...,[Western],[],"[PeterShinkoda, MerikTadros, PeterSherayko]",JamesO'Brien,"Western Religion,['The year is 1879. Gunfighte..."
1103,68734,Argo,[As the Iranian revolution reaches a boiling p...,"[Drama, Thriller]","[cia, wifehusbandrelationship, document, revol...","[BenAffleck, BryanCranston, AlanArkin]",BenAffleck,"Argo,[""As the Iranian revolution reaches a boi..."
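
In the `tags` column above, list values keep their brackets and quotes (for example `['Ten stories, ...`), because `astype(str)` converts each list to its printed form as a whole. If plain text is preferred, one option is to flatten list values before joining. The following is a minimal pure-Python sketch; `build_tags` is a hypothetical helper, not part of the notebook, and the sample row reuses the values shown earlier for "Four Lions" without the overview.

```python
def build_tags(values, sep=","):
    """Join scalars and list items into one string, skipping None and empty strings."""
    parts = []
    for value in values:
        if value is None or value == "":
            continue
        if isinstance(value, list):
            # Add each list item separately instead of the list's printed form.
            parts.extend(str(item) for item in value)
        else:
            parts.append(str(value))
    return sep.join(parts)

row = [
    "Four Lions",
    ["Comedy", "Crime", "Drama"],
    ["terrorism", "britishfarce"],
    ["RizAhmed", "NigelLindsay", "KayvanNovak"],
    "ChrisMorris",
]
tags = build_tags(row)
print(tags)
# Four Lions,Comedy,Crime,Drama,terrorism,britishfarce,RizAhmed,NigelLindsay,KayvanNovak,ChrisMorris
```

With pandas, a helper like this could be applied row by row, for instance `df_processed[cols].apply(lambda r: build_tags(r.tolist()), axis=1)`, where `cols` is the list of columns to combine. Missing values stored as NaN would need an extra `pd.isna` check.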
