# 💻 Sistema de Recomendação de Animes
***

Os sistemas de recomendação são os sistemas projetados para recomendar coisas ao usuário com base em muitos fatores diferentes. Esses sistemas preveem o produto mais provável que os usuários provavelmente comprarão e são de interesse, utilizando uma série de algoritmos, análise de dados e inteligência artificial (IA). Empresas como Netflix, Amazon, etc. usam sistemas de recomendação para ajudar seus usuários a identificar o produto ou os filmes corretos para eles.

Sistemas de recomendação lidam com um grande volume de informações presentes filtrando as informações mais importantes com base nos dados fornecidos por um usuário e outros fatores que atendem à preferência e interesse do usuário. Ele descobre a correspondência entre usuário e item e imputa as semelhanças entre usuários e itens para recomendação.

Esse sistema implementa um sistema de **Recomendações Colaborativas de Animes**: O usuário receberá recomendações de animes que pessoas com gostos similares aos dele preferiram no passado. 

## 📚 Bibliotecas

In [98]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import requests
from io import StringIO
import scipy as sp
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

from ipywidgets import widgets, HBox, Layout
from IPython.display import display

## 💾 Conjuto de Dados

O Anime-Recommendation-Database-2020, conjunto de dados utilizado no projeto, reúne dados de recomendação de   320.0000 usuários e 16.000 animes do site myanimelist.net.

**MyAnimeList**, muitas vezes abreviado para MAL, é uma rede social focado nos consumidores de animes e mangás, na qual possui como maior característica a possibilidade de seus usuários criarem uma lista pessoal para que possam catalogar as obras e classificar-las através de notas.

Informações detalhadas sobre o cojunto de dados podem ser encontradas em: https://www.kaggle.com/hernan4444/anime-recommendation-database-2020.

Dois dataframes serão utilizados, ```animelist.csv``` e ```anime.csv```.

### 💾 Dataframe anime

```anime.csv``` contém informações gerais de todos os animes (17.562 animes diferentes) incluindo gênero, estatísticas, estúdio, etc. Este arquivo tem as seguintes colunas:

| Column         | Description                                                                                           |
|----------------|-------------------------------------------------------------------------------------------------------|
| MAL_ID         | MyAnimelist ID of the anime. (e.g. 1)                                                                 |
| Name           | full name of the anime. (e.g. Cowboy Bebop)                                                           |
| Score          | average score of the anime given from all users in MyAnimelist database. (e.g. 8.78)                  |
| Genres         | comma separated list of genres for this anime. (e.g. Action, Adventure, Comedy, Drama, Sci-Fi, Space) |
| English name   | full name in english of the anime. (e.g. Cowboy Bebop)                                                |
| Japanese name  | full name in japanses of the anime. (e.g. カウボーイビバップ)                                                  |
| Type           | TV, movie, OVA, etc. (e.g. TV)                                                                        |
| Episodes'      | number of chapters. (e.g. 26)                                                                         |
| Aired          | broadcast date. (e.g. Apr 3, 1998 to Apr 24, 1999)                                                    |
| Premiered      | season premiere. (e.g. Spring 1998)                                                                   |
| Producers      | comma separated list of produducers (e.g. Bandai Visual)                                              |
| Licensors      | comma separated list of licensors (e.g. Funimation, Bandai Entertainment)                             |
| Studios        | comma separated list of studios (e.g. Sunrise)                                                        |
| Source         | Manga, Light novel, Book, etc. (e.g Original)                                                         |
| Duration       | duration of the anime per episode (e.g 24 min. per ep.)                                               |
| Rating         | age rate (e.g. R - 17+ (violence & profanity))                                                        |
| Ranked         | position based in the score. (e.g 28)                                                                 |
| Popularity     | position based in the the number of users who have added the anime to their list. (e.g 39)            |
| Members        | number of community members that are in this anime's "group". (e.g. 1251960)                          |
| Favorites      | number of users who have the anime as "favorites". (e.g. 61,971)                                      |
| Watching       | number of users who are watching the anime. (e.g. 105808)                                             |
| Completed      | number of users who have complete the anime. (e.g. 718161)                                            |
| On-Hold        | number of users who have the anime on Hold. (e.g. 71513)                                              |
| Dropped        | number of users who have dropped the anime. (e.g. 26678)                                              |
| Plan to Watch' | number of users who plan to watch the anime. (e.g. 329800)                                            |
| Score-10'      | number of users who scored 10. (e.g. 229170)                                                          |
| Score-9'       | number of users who scored 9. (e.g. 182126)                                                           |
| Score-8'       | number of users who scored 8. (e.g. 131625)                                                           |
| Score-7'       | number of users who scored 7. (e.g. 62330)                                                            |
| Score-6'       | number of users who scored 6. (e.g. 20688)                                                            |
| Score-5'       | number of users who scored 5. (e.g. 8904)                                                             |
| Score-4'       | number of users who scored 4. (e.g. 3184)                                                             |
| Score-3'       | number of users who scored 3. (e.g. 1357)                                                             |
| Score-2'       | number of users who scored 2. (e.g. 741)                                                              |
| Score-1'       | number of users who scored 1. (e.g. 1580)                                                             |

De acordo com a documentação do [repositório no GitHub](https://github.com/Hernan4444/MyAnimeList-Database), o arquivo pode ser acessado pelo Google Drive.

In [99]:
# Importar anime.csv
url = 'https://drive.google.com/file/d/1vfmfi4dGAXBp0T8QTNVYhA5g8_irNbKs/view?usp=sharing'
id_arquivo = url.split('/')[-2]
dwn_url = 'https://drive.google.com/uc?export=download&id=' + id_arquivo
url2 = requests.get(dwn_url).text
csv_raw = StringIO(url2)
anime_df = pd.read_csv(csv_raw) # anima_data -> anime_df

### 💾 Dataframe animelist

```animelist.csv``` tem a lista de todos os animes registrados pelo usuário com a respectiva pontuação, status de exibição e número de episódios assistidos. Este conjunto de dados contém 109 milhões de linhas, 17.562 animes diferentes e 325.772 usuários diferentes. O arquivo tem as seguintes colunas:

| Column           | Description                                                                             |
|------------------|-----------------------------------------------------------------------------------------|
| user_id          | non identifiable randomly generated user id.                                            |
| anime_id         | MyAnemlist ID of the anime. (e.g. 1).                                                   |
| score            | score between 1 to 10 given by the user. 0 if the user didn't assign a score. (e.g. 10) |
| watching_status  | state ID from this anime in the anime list of this user. (e.g. 2)                       |
| watched_episodes | numbers of episodes watched by the user. (e.g. 24)                                      |



Devido âs limitaçãoes de processamento só os primeiros 5.000.000 regitros foram usados. Se você tiver acesso a uma boa estação de trabalho, poderá usar todos os 109 milhões de registros.

O arquivo csv completo pode ser baixado em: https://drive.google.com/drive/folders/1UhinqGrH2XytkpiD7LlLzMcn7MY2I_gt

In [100]:
# Importar animelist.csv
rating_df = pd.read_csv("animelist.csv", nrows=5000000)

# Por motivos de eficiência, usar esses DF para usar o merge()
anime_df = anime_df.rename(columns={"MAL_ID": "anime_id"})
anime_contact_df = anime_df[["anime_id", "Name"]]

## 📊 Processamento do Conjunto de Dados 

### 📊 Mesclar Conjunto de Dados

Aplicar a operação ```merge``` em ```rating_df``` e ```anime_contact_df``` (dados extraido de ```anime_df```) em termos do ```anime_id``` para crirar um conjunto de dados com ambas as informações.

In [107]:
# Mesclar Dataframes
rating_df = rating_df.merge(anime_contact_df, left_on = 'anime_id', right_on = 'anime_id', how = 'left')
rating_df = rating_df[["user_id", "Name", "anime_id","rating", "watching_status", "watched_episodes"]]

KeyError: "['Name'] not in index"

In [None]:
rating_df.head()

In [None]:
rating_df.shape

### 🚫 Verificando Dados Faltantes

In [None]:
print("Anime Missing Values:\n")
print(anime_df.isna().sum())
print("\nRatings Missing Values:\n")
print(rating_df.isna().sum())

Now I will take only that data in which a particular anime has more than 200Votes and if a user has gave in total more than 500Votes to the anime.  

In [None]:
count = rating_df['user_id'].value_counts()
count1 = rating_df['anime_id'].value_counts()
rating_df = rating_df[rating_df['user_id'].isin(count[count >= 500].index)].copy()
rating_df = rating_df[rating_df['anime_id'].isin(count1[count1 >= 200].index)].copy()

In [None]:
rating_df.shape

## 📈 Criação do Modelo

Vamos criar uma tabela dinâmica (Pivot Table) com base nas colunas ```Name``` e ```User_id``` e salvá-la em uma variável ```pivot_table```.

Uma tabela dinâmica é forma de agrupar as entradas em uma tabela bidimensional que fornece uma sumarização multidimensional dos dados, nesse caso, as notas de cada usuário para um anime diferente.

In [108]:
pivot_table = rating_data.pivot_table(index="Name",columns="user_id", values="rating").fillna(0)
pivot_table

user_id,6,12,17,19,21,42,44,47,53,60,...,16452,16453,16463,16471,16481,16487,16489,16492,16496,16507
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
"""Bungaku Shoujo"" Kyou no Oyatsu: Hatsukoi",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,281.0,0.0,...,0.0,0.0,281.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"""Bungaku Shoujo"" Memoire",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,354.0,0.0,...,0.0,0.0,354.0,0.0,0.0,354.0,0.0,0.0,0.0,0.0
"""Bungaku Shoujo"" Movie",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,627.0,627.0,...,0.0,0.0,627.0,0.0,0.0,627.0,0.0,0.0,627.0,0.0
.hack//G.U. Returner,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
.hack//G.U. Trilogy,0.0,221.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
xxxHOLiC Kei,0.0,0.0,713.0,0.0,713.0,0.0,0.0,0.0,713.0,0.0,...,0.0,0.0,0.0,0.0,713.0,0.0,0.0,0.0,0.0,0.0
xxxHOLiC Movie: Manatsu no Yoru no Yume,0.0,0.0,464.0,0.0,0.0,0.0,0.0,0.0,464.0,0.0,...,0.0,0.0,0.0,0.0,464.0,0.0,0.0,0.0,0.0,0.0
xxxHOLiC Rou,0.0,0.0,462.0,0.0,0.0,0.0,0.0,0.0,462.0,0.0,...,0.0,0.0,0.0,0.0,462.0,0.0,0.0,0.0,0.0,0.0
xxxHOLiC Shunmuki,0.0,0.0,495.0,0.0,0.0,0.0,0.0,0.0,495.0,0.0,...,0.0,0.0,0.0,0.0,495.0,0.0,0.0,0.0,0.0,0.0


A **similaridade por cosseno** é uma medida de similaridade de entre dois vetores num espaço vetorial que avalia o valor do cosseno do ângulo compreendido entre eles. Esta função trigonométrica proporciona um valor igual a 1 se o ângulo compreendido é zero, isto é se ambos vetores apontam a um mesmo lugar. Para qualquer ângulo diferente de 0, o valor de cosseno é inferior a um.

Uma tabela dinâmica é bidimensional, então enxergando as colunas como vetores, podemos usar a similaridade por cosseno para relacionar os animes

In [109]:
# Transforma a matriz em uma matriz esparação para otimizar as operações
pivot_table_csr = csr_matrix(pivot_table.values)

In [110]:
# Modelo de Similaridade entre os Animes
anime_similarity = cosine_similarity(pivot_table_csr)

In [111]:
# DataFrame de Similaridade entre os Animes
ani_sim_df = pd.DataFrame(anime_similarity, index = pivot_table.index, columns = pivot_table.index)

In [112]:
def anime_recommendation(ani_name):
    """
    This function will return the top 5 shows with the highest cosine similarity value and show match percent              
    """
    if ani_name in ani_sim_df:
        number = 1
        print('Recomendados porque você assistiu {}:\n'.format(ani_name))
        for anime in ani_sim_df.sort_values(by = ani_name, ascending = False).index[1:6]:
            print(f'#{number}: {anime}, {round(ani_sim_df[anime][ani_name]*100,2)}% de similaridade')
            number +=1  
    else:
        print('ERRO: {} não é um nome de anime válido ou não se encontra no conjunto de dados.\n'.format(ani_name)) 

## 📈 Utilizando o Modelo

In [113]:
style = {'description_width': 'initial'}
text = widgets.Text(description="Nome do Anime: ", style=style)
button = widgets.Button(description="Executar", )
output = widgets.Output()

inputs = HBox([text, button])

def on_button_clicked(b):
    output.clear_output()
    with output:
        anime_recommendation(text.value)

button.on_click(on_button_clicked)
display(inputs, output)



HBox(children=(Text(value='', description='Nome do Anime: ', style=DescriptionStyle(description_width='initial…

Output()