# Movie recommendation system

- Recommendation systems have proven extremely useful, both commercially and in everyday life. Thanks to advances in machine learning and artificial intelligence, these systems can provide personalized, relevant suggestions that help users discover new products and increase customer engagement.

## 1. Business Problem

- A movie streaming company wants to build a recommendation system for its users. The goal is to provide personalized recommendations based on each user's viewing habits, encouraging users to discover new movies they are likely to enjoy.


- You decide to address this by building a content-based recommendation system. A content-based system recommends items whose attributes resemble those of an item the user already liked, for example: “if you liked this book, you may also like these books on similar subjects”.


- The dataset contains information on about 4,800 movies, including title, budget, genres, keywords, cast, and more, split across two files (`tmdb_5000_movies.csv` and `tmdb_5000_credits.csv`). It is available on [Kaggle](https://www.kaggle.com/datasets/akshaydattatraykhare/movies-dataset?select=tmdb_5000_movies.csv).


- Content-based recommendation systems often use a technique called cosine similarity to quantify how similar two products are.
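Cosine similarity compares the direction of two vectors rather than their length. For two count vectors $\mathbf{a}$ and $\mathbf{b}$ (for example, the word counts of two movies), the standard definition, followed by a small worked example with toy vectors $\mathbf{a} = (1, 1, 0)$ and $\mathbf{b} = (1, 0, 1)$, is:

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert},
\qquad
\cos\big((1,1,0), (1,0,1)\big) = \frac{1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1}{\sqrt{2}\,\sqrt{2}} = \frac{1}{2}
$$

Because word counts are never negative, the score always lies between 0 (no words in common) and 1 (vectors pointing in the same direction).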

## 2. Import libraries and dataset

In [1]:
```python
import pandas as pd
import numpy as np
import ast

# Load the two TMDB files: credits (cast and crew) and movie metadata
creditos = pd.read_csv("tmdb_5000_credits.csv")
filmes = pd.read_csv("tmdb_5000_movies.csv")

creditos.head()
```

|   | movie_id | title | cast | crew |
|---|---|---|---|---|
| 0 | 19995 | Avatar | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | 285 | Pirates of the Caribbean: At World's End | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | 206647 | Spectre | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | 49026 | The Dark Knight Rises | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | 49529 | John Carter | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


In [2]:
```python
filmes.head()
```

|   | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | [{"iso_3166_1": "US", "name": "United States o... | 2009-12-10 | 2787965087 | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 |
| 1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2007-05-19 | 961000000 | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 |
| 2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | [{"iso_3166_1": "GB", "name": "United Kingdom"... | 2015-10-26 | 880674609 | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 |
| 3 | 250000000 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | http://www.thedarkknightrises.com/ | 49026 | [{"id": 849, "name": "dc comics"}, {"id": 853,... | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.31295 | [{"name": "Legendary Pictures", "id": 923}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2012-07-16 | 1084939099 | 165.0 | [{"iso_639_1": "en", "name": "English"}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 |
| 4 | 260000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://movies.disney.com/john-carter | 49529 | [{"id": 818, "name": "based on novel"}, {"id":... | en | John Carter | John Carter is a war-weary, former military ca... | 43.926995 | [{"name": "Walt Disney Pictures", "id": 2}] | [{"iso_3166_1": "US", "name": "United States o... | 2012-03-07 | 284139100 | 132.0 | [{"iso_639_1": "en", "name": "English"}] | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 |


### Let's merge the two datasets. What column do they have in common? `title`!

In [3]:
```python
filmes_df = filmes.merge(creditos, on="title")

filmes_df.head()
```

|   | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | ... | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | movie_id | cast | crew |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | ... | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | 19995 | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | ... | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | 285 | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | ... | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | 206647 | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | 250000000 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | http://www.thedarkknightrises.com/ | 49026 | [{"id": 849, "name": "dc comics"}, {"id": 853,... | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.31295 | [{"name": "Legendary Pictures", "id": 923}, {"... | ... | 165.0 | [{"iso_639_1": "en", "name": "English"}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 | 49026 | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | 260000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://movies.disney.com/john-carter | 49529 | [{"id": 818, "name": "based on novel"}, {"id":... | en | John Carter | John Carter is a war-weary, former military ca... | 43.926995 | [{"name": "Walt Disney Pictures", "id": 2}] | ... | 132.0 | [{"iso_639_1": "en", "name": "English"}] | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 | 49529 | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |
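Merging on `title` works while titles are unique, but if two different movies share a title, each copy in one file gets paired with each copy in the other. The `info()` summary below reports 4,809 rows, which is slightly more than the roughly 4,800 movies in the dataset and could be explained by a few repeated titles. In the sample rows above, `id` (movies file) and `movie_id` (credits file) match for every movie. Assuming that holds across the whole dataset, merging on those IDs avoids the problem. The sketch below uses hypothetical toy data to show the difference:

```python
import pandas as pd

# Hypothetical toy data: two different movies share the title "Heist"
movies = pd.DataFrame({"id": [1, 2, 3], "title": ["Heist", "Heist", "Comet"]})
credits = pd.DataFrame({"movie_id": [1, 2, 3],
                        "title": ["Heist", "Heist", "Comet"],
                        "cast": ["Actor A", "Actor B", "Actor C"]})

# Merging on title pairs every "Heist" row with every "Heist" row: 2 x 2 + 1 = 5 rows
by_title = movies.merge(credits, on="title")

# Merging on the movie ID keeps exactly one row per movie: 3 rows
by_id = movies.merge(credits.drop(columns="title"), left_on="id", right_on="movie_id")

print(len(by_title), len(by_id))  # 5 3
```

With the real files, the analogous call would be `filmes.merge(creditos.drop(columns="title"), left_on="id", right_on="movie_id")`. This is a suggested variation, not the method used in this notebook.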


In [4]:
```python
# Get a summary of the data
filmes_df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4809 non-null   int64  
 1   genres                4809 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4809 non-null   int64  
 4   keywords              4809 non-null   object 
 5   original_language     4809 non-null   object 
 6   original_title        4809 non-null   object 
 7   overview              4806 non-null   object 
 8   popularity            4809 non-null   float64
 9   production_companies  4809 non-null   object 
 10  production_countries  4809 non-null   object 
 11  release_date          4808 non-null   object 
 12  revenue               4809 non-null   int64  
 13  runtime               4807 non-null   float64
 14  spoken_languages      4809 non-null   object 
 15  status
```

- The merged dataset has 23 columns, only a few of which are needed to describe a movie's content.

### Dropping every column except the ones used to quantify similarity between movies

In [5]:
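```python
# Keep only the content columns; .copy() makes later column assignments act on an independent DataFrame
filmes_df = filmes_df[["title", "genres", "keywords", "cast", "crew"]].copy()

filmes_df.head()
```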

|   | title | genres | keywords | cast | crew |
|---|---|---|---|---|---|
| 0 | Avatar | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | [{"id": 1463, "name": "culture clash"}, {"id":... | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | Pirates of the Caribbean: At World's End | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | Spectre | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | [{"id": 470, "name": "spy"}, {"id": 818, "name... | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | The Dark Knight Rises | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | [{"id": 849, "name": "dc comics"}, {"id": 853,... | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | John Carter | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | [{"id": 818, "name": "based on novel"}, {"id":... | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


## 3. Preprocessing

### Variable genres

In [6]:
```python
filmes_df.iloc[0].genres
```

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

- Many movies fit more than one genre, so let's extract the genre names for each movie.

In [7]:
```python
def convert(obj):
    """Parse a string holding a list of dicts and return the "name" value of each dict."""
    return [item["name"] for item in ast.literal_eval(obj)]
```
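The `genres`, `keywords`, `cast` and `crew` cells are stored as strings that look like JSON (double-quoted keys and values), which is why `convert` has to parse them before reading the names. Assuming every cell is valid JSON, as the samples shown here are, the standard library's `json.loads` is a JSON-specific alternative to `ast.literal_eval`. The helper name `names_from_json` is hypothetical:

```python
import json

# A genres string in the same format as the sample output above
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

def names_from_json(obj):
    # json.loads parses JSON text; it would reject Python-only syntax such as single quotes
    return [item["name"] for item in json.loads(obj)]

print(names_from_json(raw))  # ['Action', 'Adventure']
```

`ast.literal_eval` is more permissive because it accepts any Python literal, while `json.loads` fails loudly if a cell is not strict JSON. Either choice is reasonable for this dataset.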

In [8]:
```python
# Extract the genre names
filmes_df["genres"] = filmes_df["genres"].apply(convert)

# Join the list into one space-separated string (no brackets)
filmes_df["genres"] = filmes_df["genres"].apply(lambda x: " ".join(x))

filmes_df.head()
```

|   | title | genres | keywords | cast | crew |
|---|---|---|---|---|---|
| 0 | Avatar | Action Adventure Fantasy Science Fiction | [{"id": 1463, "name": "culture clash"}, {"id":... | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | Pirates of the Caribbean: At World's End | Adventure Fantasy Action | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | Spectre | Action Adventure Crime | [{"id": 470, "name": "spy"}, {"id": 818, "name... | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | The Dark Knight Rises | Action Crime Drama Thriller | [{"id": 849, "name": "dc comics"}, {"id": 853,... | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | John Carter | Action Adventure Science Fiction | [{"id": 818, "name": "based on novel"}, {"id":... | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


In [9]:
```python
# Avatar fits into 4 genres
filmes_df.iloc[0].genres
```

'Action Adventure Fantasy Science Fiction'

### Variable keywords

In [10]:
```python
filmes_df.iloc[0].keywords
```

'[{"id": 1463, "name": "culture clash"}, {"id": 2964, "name": "future"}, {"id": 3386, "name": "space war"}, {"id": 3388, "name": "space colony"}, {"id": 3679, "name": "society"}, {"id": 3801, "name": "space travel"}, {"id": 9685, "name": "futuristic"}, {"id": 9840, "name": "romance"}, {"id": 9882, "name": "space"}, {"id": 9951, "name": "alien"}, {"id": 10148, "name": "tribe"}, {"id": 10158, "name": "alien planet"}, {"id": 10987, "name": "cgi"}, {"id": 11399, "name": "marine"}, {"id": 13065, "name": "soldier"}, {"id": 14643, "name": "battle"}, {"id": 14720, "name": "love affair"}, {"id": 165431, "name": "anti war"}, {"id": 193554, "name": "power relations"}, {"id": 206690, "name": "mind and soul"}, {"id": 209714, "name": "3d"}]'

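- We keep the first five keywords of each movie. These are the first five in list order, which in the sample above follows the keyword `id` rather than any measure of relevance.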

In [11]:
```python
# Extract the keywords
filmes_df["keywords"] = filmes_df["keywords"].apply(convert)

# Keep the first five keywords of each movie
filmes_df["keywords"] = filmes_df["keywords"].apply(lambda x: x[:5])

# Join the list into one space-separated string (no brackets)
filmes_df["keywords"] = filmes_df["keywords"].apply(lambda x: " ".join(x))

filmes_df.head()
```

|   | title | genres | keywords | cast | crew |
|---|---|---|---|---|---|
| 0 | Avatar | Action Adventure Fantasy Science Fiction | culture clash future space war space colony so... | [{"cast_id": 242, "character": "Jake Sully", "... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | Pirates of the Caribbean: At World's End | Adventure Fantasy Action | ocean drug abuse exotic island east india trad... | [{"cast_id": 4, "character": "Captain Jack Spa... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | Spectre | Action Adventure Crime | spy based on novel secret agent sequel mi6 | [{"cast_id": 1, "character": "James Bond", "cr... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | The Dark Knight Rises | Action Crime Drama Thriller | dc comics crime fighter terrorist secret ident... | [{"cast_id": 2, "character": "Bruce Wayne / Ba... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | John Carter | Action Adventure Science Fiction | based on novel mars medallion space travel pri... | [{"cast_id": 5, "character": "John Carter", "c... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


In [12]:
```python
filmes_df.iloc[0].keywords
```

'culture clash future space war space colony society'

### Variable cast

In [13]:
```python
filmes_df.iloc[0].cast
```

'[{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri", "credit_id": "52fe48009251416c750ac9cb", "gender": 1, "id": 8691, "name": "Zoe Saldana", "order": 1}, {"cast_id": 25, "character": "Dr. Grace Augustine", "credit_id": "52fe48009251416c750aca39", "gender": 1, "id": 10205, "name": "Sigourney Weaver", "order": 2}, {"cast_id": 4, "character": "Col. Quaritch", "credit_id": "52fe48009251416c750ac9cf", "gender": 2, "id": 32747, "name": "Stephen Lang", "order": 3}, {"cast_id": 5, "character": "Trudy Chacon", "credit_id": "52fe48009251416c750ac9d3", "gender": 1, "id": 17647, "name": "Michelle Rodriguez", "order": 4}, {"cast_id": 8, "character": "Selfridge", "credit_id": "52fe48009251416c750ac9e1", "gender": 2, "id": 1771, "name": "Giovanni Ribisi", "order": 5}, {"cast_id": 7, "character": "Norm Spellman", "credit_id": "52fe48009251416c750ac9dd", "gender": 

- We keep the first five cast members of each movie. The list is in billing order, as the `order` field above shows.

In [14]:
```python
def convert1(obj):
    """Return the names of the first five cast members (the list is in billing order)."""
    return [member["name"] for member in ast.literal_eval(obj)[:5]]
```

In [15]:
```python
# Extract the cast (first five names)
filmes_df["cast"] = filmes_df["cast"].apply(convert1)

# Join the list into one space-separated string (no brackets)
filmes_df["cast"] = filmes_df["cast"].apply(lambda x: " ".join(x))

filmes_df.head()
```

|   | title | genres | keywords | cast | crew |
|---|---|---|---|---|---|
| 0 | Avatar | Action Adventure Fantasy Science Fiction | culture clash future space war space colony so... | Sam Worthington Zoe Saldana Sigourney Weaver S... | [{"credit_id": "52fe48009251416c750aca23", "de... |
| 1 | Pirates of the Caribbean: At World's End | Adventure Fantasy Action | ocean drug abuse exotic island east india trad... | Johnny Depp Orlando Bloom Keira Knightley Stel... | [{"credit_id": "52fe4232c3a36847f800b579", "de... |
| 2 | Spectre | Action Adventure Crime | spy based on novel secret agent sequel mi6 | Daniel Craig Christoph Waltz Léa Seydoux Ralph... | [{"credit_id": "54805967c3a36829b5002c41", "de... |
| 3 | The Dark Knight Rises | Action Crime Drama Thriller | dc comics crime fighter terrorist secret ident... | Christian Bale Michael Caine Gary Oldman Anne ... | [{"credit_id": "52fe4781c3a36847f81398c3", "de... |
| 4 | John Carter | Action Adventure Science Fiction | based on novel mars medallion space travel pri... | Taylor Kitsch Lynn Collins Samantha Morton Wil... | [{"credit_id": "52fe479ac3a36847f813eaa3", "de... |


In [16]:
```python
filmes_df.iloc[0].cast
```

'Sam Worthington Zoe Saldana Sigourney Weaver Stephen Lang Michelle Rodriguez'

### Variable crew

In [17]:
```python
filmes_df.iloc[1].crew
```

'[{"credit_id": "52fe4232c3a36847f800b579", "department": "Camera", "gender": 2, "id": 120, "job": "Director of Photography", "name": "Dariusz Wolski"}, {"credit_id": "52fe4232c3a36847f800b4fd", "department": "Directing", "gender": 2, "id": 1704, "job": "Director", "name": "Gore Verbinski"}, {"credit_id": "52fe4232c3a36847f800b54f", "department": "Production", "gender": 2, "id": 770, "job": "Producer", "name": "Jerry Bruckheimer"}, {"credit_id": "52fe4232c3a36847f800b503", "department": "Writing", "gender": 2, "id": 1705, "job": "Screenplay", "name": "Ted Elliott"}, {"credit_id": "52fe4232c3a36847f800b509", "department": "Writing", "gender": 2, "id": 1706, "job": "Screenplay", "name": "Terry Rossio"}, {"credit_id": "52fe4232c3a36847f800b57f", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "52fe4232c3a36847f800b585", "department": "Editing", "gender": 2, "id": 1722, "job": "Editor", "name": "Craig Wood"}, {"credit_id": "52f

- We extract each movie's director.

In [18]:
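```python
def buscar_diretor(obj):
    """Return a list holding the first crew member whose job is "Director" (empty if none)."""
    for member in ast.literal_eval(obj):
        if member["job"] == "Director":
            return [member["name"]]
    return []
```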
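`buscar_diretor` stops at the first crew member whose job is "Director", so a co-directed film keeps only one name. If every director should count toward similarity, a variant could collect all of them. In the sketch below, the function name `find_directors` and the toy crew data are hypothetical:

```python
import ast

def find_directors(obj):
    # Hypothetical variant of buscar_diretor: keep every crew member credited as "Director"
    return [member["name"] for member in ast.literal_eval(obj)
            if member.get("job") == "Director"]

# Toy crew string with two directors and one editor (hypothetical names)
crew = ('[{"job": "Director", "name": "Ann Lee"}, '
        '{"job": "Editor", "name": "Bo Park"}, '
        '{"job": "Director", "name": "Cy Diaz"}]')

print(find_directors(crew))  # ['Ann Lee', 'Cy Diaz']
print(find_directors("[]"))  # []
```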

In [19]:
```python
# Extract the director
filmes_df["crew"] = filmes_df["crew"].apply(buscar_diretor)

# Join the list into a string (an empty string when no director was found)
filmes_df["crew"] = filmes_df["crew"].apply(lambda x: " ".join(x))

filmes_df.head()
```

|   | title | genres | keywords | cast | crew |
|---|---|---|---|---|---|
| 0 | Avatar | Action Adventure Fantasy Science Fiction | culture clash future space war space colony so... | Sam Worthington Zoe Saldana Sigourney Weaver S... | James Cameron |
| 1 | Pirates of the Caribbean: At World's End | Adventure Fantasy Action | ocean drug abuse exotic island east india trad... | Johnny Depp Orlando Bloom Keira Knightley Stel... | Gore Verbinski |
| 2 | Spectre | Action Adventure Crime | spy based on novel secret agent sequel mi6 | Daniel Craig Christoph Waltz Léa Seydoux Ralph... | Sam Mendes |
| 3 | The Dark Knight Rises | Action Crime Drama Thriller | dc comics crime fighter terrorist secret ident... | Christian Bale Michael Caine Gary Oldman Anne ... | Christopher Nolan |
| 4 | John Carter | Action Adventure Science Fiction | based on novel mars medallion space travel pri... | Taylor Kitsch Lynn Collins Samantha Morton Wil... | Andrew Stanton |


In [20]:
```python
filmes_df.iloc[0].crew
```

'James Cameron'

In [21]:
```python
filmes_df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4809 entries, 0 to 4808
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   title     4809 non-null   object
 1   genres    4809 non-null   object
 2   keywords  4809 non-null   object
 3   cast      4809 non-null   object
 4   crew      4809 non-null   object
dtypes: object(5)
memory usage: 225.4+ KB
```


### Adding a new `features` column that combines all the words from the other columns

In [22]:
```python
# Concatenate the text columns into a single "features" string per movie
filmes_df['features'] = filmes_df['title'] + ' ' + filmes_df['genres'] + ' ' + filmes_df['keywords'] + ' ' + filmes_df['cast'] + ' ' + filmes_df['crew']
```
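Chaining `+` works here because `info()` reports no missing values in these five columns. However, if any column had a missing value, the `+` chain would produce NaN for that row. A slightly more defensive sketch fills gaps with empty strings before joining. The helper name `build_features` and the toy data are hypothetical:

```python
import pandas as pd

def build_features(df, columns):
    # Replace missing values with "" and join the chosen columns row by row with spaces
    return df[columns].fillna("").apply(" ".join, axis=1)

# Hypothetical toy data: the second movie has no director
toy = pd.DataFrame({"title": ["Movie A", "Movie B"],
                    "genres": ["Action Adventure", "Animation"],
                    "crew": ["Jane Doe", None]})

print(build_features(toy, ["title", "genres", "crew"]).tolist())
# ['Movie A Action Adventure Jane Doe', 'Movie B Animation ']
```

On the real data, `build_features(filmes_df, ["title", "genres", "keywords", "cast", "crew"])` should give the same strings as the `+` chain, since none of those columns has missing values.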

In [23]:
```python
filmes_df.head()
```

|   | title | genres | keywords | cast | crew | features |
|---|---|---|---|---|---|---|
| 0 | Avatar | Action Adventure Fantasy Science Fiction | culture clash future space war space colony so... | Sam Worthington Zoe Saldana Sigourney Weaver S... | James Cameron | Avatar Action Adventure Fantasy Science Fictio... |
| 1 | Pirates of the Caribbean: At World's End | Adventure Fantasy Action | ocean drug abuse exotic island east india trad... | Johnny Depp Orlando Bloom Keira Knightley Stel... | Gore Verbinski | Pirates of the Caribbean: At World's End Adven... |
| 2 | Spectre | Action Adventure Crime | spy based on novel secret agent sequel mi6 | Daniel Craig Christoph Waltz Léa Seydoux Ralph... | Sam Mendes | Spectre Action Adventure Crime spy based on no... |
| 3 | The Dark Knight Rises | Action Crime Drama Thriller | dc comics crime fighter terrorist secret ident... | Christian Bale Michael Caine Gary Oldman Anne ... | Christopher Nolan | The Dark Knight Rises Action Crime Drama Thril... |
| 4 | John Carter | Action Adventure Science Fiction | based on novel mars medallion space travel pri... | Taylor Kitsch Lynn Collins Samantha Morton Wil... | Andrew Stanton | John Carter Action Adventure Science Fiction b... |


In [24]:
```python
filmes_df.iloc[0].features
```

'Avatar Action Adventure Fantasy Science Fiction culture clash future space war space colony society Sam Worthington Zoe Saldana Sigourney Weaver Stephen Lang Michelle Rodriguez James Cameron'

## 4. Machine Learning

Content-based recommendation systems require:
- A way to vectorize the attributes that characterize a product or service, that is, to convert them into numbers (`CountVectorizer`).
- A way to compute the similarity between the resulting vectors (`cosine_similarity`).

### Using CountVectorizer to vectorize the text in the `features` column

In [25]:
```python
from sklearn.feature_extraction.text import CountVectorizer

# Drop English stop words and ignore terms that appear in fewer than 20 movies
vectorizer = CountVectorizer(stop_words='english', min_df=20)
word_matrix = vectorizer.fit_transform(filmes_df['features'])
word_matrix.shape
```

(4809, 911)

- The word-count matrix has 4809 rows, one per movie, and 911 columns, one per term that survives stop-word removal and appears in at least 20 movies.

### Computing the cosine similarity for every pair of rows

In [26]:
```python
from sklearn.metrics.pairwise import cosine_similarity

# sim[i, j] is the cosine similarity between movie i and movie j
sim = cosine_similarity(word_matrix)
```
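`cosine_similarity` returns a dense NumPy array even when its input is sparse. For 4,809 movies, `sim` therefore holds 4,809 × 4,809 ≈ 23.1 million float64 values, or roughly 185 MB. The result is equivalent to scaling each row to unit length and then taking dot products. The toy check below (a hypothetical 3 × 3 count matrix) illustrates this equivalence:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Toy count matrix: 3 "movies" x 3 terms, stored sparse like word_matrix
counts = csr_matrix(np.array([[1, 1, 0],
                              [1, 0, 1],
                              [0, 2, 0]]))

sim = cosine_similarity(counts)  # dense 3 x 3 float array

# Same result by hand: scale each row to unit length, then take dot products
dense = counts.toarray().astype(float)
unit = dense / np.linalg.norm(dense, axis=1, keepdims=True)
manual = unit @ unit.T

print(np.round(sim, 4))
print(np.allclose(sim, manual))  # True
```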

### Generating movie recommendations

- The goal of this system is to take a movie title as input and return the n movies most similar to it.

In [27]:
```python
def get_recommendations(title, filmes_df, sim, count=10):
    # Row position(s) whose title matches the requested one, ignoring case
    matches = np.flatnonzero((filmes_df['title'].str.lower() == title.lower()).to_numpy())

    # Return an empty list if the title is not in the DataFrame
    if len(matches) == 0:
        return []
    index = matches[0]

    # Pair each movie's row position with its similarity to the chosen movie
    similarities = list(enumerate(sim[index]))

    # Sort the similarity scores in descending order
    recommendations = sorted(similarities, key=lambda x: x[1], reverse=True)

    # Drop the chosen movie itself (rather than assuming it ranks first) and keep the top `count`
    top_recs = [pair for pair in recommendations if pair[0] != index][:count]

    # Map the row positions back to titles
    return [filmes_df.iloc[position]['title'] for position, _ in top_recs]
```
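`get_recommendations` builds and sorts a Python list of all 4,809 (position, score) pairs on every call. A vectorized sketch using NumPy's `argsort` should give the same ranking, because a stable sort on negated scores keeps ties in row order, just as Python's `sorted(..., reverse=True)` does. The function name `top_similar` and the toy matrix are hypothetical:

```python
import numpy as np
import pandas as pd

def top_similar(title, df, sim, count=10):
    # Row positions whose title matches, ignoring case (df rows must line up with sim rows)
    positions = np.flatnonzero((df["title"].str.lower() == title.lower()).to_numpy())
    if positions.size == 0:
        return []
    row = positions[0]
    # Order all rows by similarity, highest first; "stable" keeps ties in row order
    order = np.argsort(-sim[row], kind="stable")
    # Remove the query movie itself, then keep the first `count` rows
    order = order[order != row][:count]
    return df["title"].iloc[order].tolist()

# Hypothetical toy data: 4 movies and a symmetric similarity matrix
toy_df = pd.DataFrame({"title": ["A", "B", "C", "D"]})
toy_sim = np.array([[1.0, 0.2, 0.9, 0.5],
                    [0.2, 1.0, 0.1, 0.3],
                    [0.9, 0.1, 1.0, 0.4],
                    [0.5, 0.3, 0.4, 1.0]])

print(top_similar("a", toy_df, toy_sim, count=2))  # ['C', 'D']
```

On the real data, the call would be `top_similar('Shrek', filmes_df, sim)`.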

- This function sorts the cosine similarities in descending order to find the `count` movies most similar to the one named by the `title` parameter, then returns their titles.

### Using get_recommendations to search the dataset for similar movies

Which movies are similar to [Shrek](https://www.imdb.com/title/tt0126029/?ref_=fn_al_tt_1)?

In [28]:
```python
get_recommendations('Shrek', filmes_df, sim)
```

['Shrek 2',
 'Shrek Forever After',
 'Shrek the Third',
 'Thunder and the House of Magic',
 "The Emperor's New Groove",
 'The Haunted Mansion',
 'Aladdin',
 'Spirited Away',
 'Return to Never Land',
 'WALL·E']

Which movies are similar to [Die Hard](https://www.imdb.com/title/tt0095016/)?

In [29]:
```python
get_recommendations('Die Hard', filmes_df, sim)
```

['Die Hard 2',
 'The Prince',
 'Sphinx',
 'Die Hard: With a Vengeance',
 '13 Hours: The Secret Soldiers of Benghazi',
 'Act of Valor',
 'Live Free or Die Hard',
 'A Good Day to Die Hard',
 'Broken Arrow',
 'Surrogates']

## 5. Conclusion

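- The system does a reasonably good job of picking similar movies. For both queries, the top results include the film's own sequels (such as *Shrek 2* and *Die Hard 2*), although a few recommendations are less obviously related.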


- Adding the `overview` column (the plot summary) to the features is a promising way to improve the movie recommendation model.
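One way to act on this idea, sketched here under assumptions, is to append the `overview` text to `features` and weight words with TF-IDF, so that common plot words count less than distinctive ones. The missing overviews would be filled with empty strings; the first `info()` output reports 4,806 non-null overviews out of 4,809. Note that `TfidfVectorizer` is a swap for the notebook's `CountVectorizer`, not what this notebook uses, and the toy rows below are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy rows standing in for filmes_df plus an overview column
toy = pd.DataFrame({
    "title": ["Space Rescue", "Star Return", "Kitchen Love"],
    "features": ["Science Fiction space astronaut",
                 "Science Fiction space rescue",
                 "Romance cooking chef"],
    "overview": ["An astronaut is stranded on a distant planet.",
                 "A crew travels to a distant planet to rescue an astronaut.",
                 None],  # a missing overview, as a few real rows have
})

# Append the overview (empty string when missing) to the existing features
text = toy["features"] + " " + toy["overview"].fillna("")

# TF-IDF down-weights words that appear in many movies
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(text)
sim_overview = cosine_similarity(tfidf_matrix)

print(np.round(sim_overview, 3))
```

Overviews are much longer than the other fields, so they may dominate the combined text. Whether this improves the recommendations would need to be checked on the real data.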
