# Pokemon Recommender


This is a content-based recommender system that recommends Pokemon similar to an input Pokemon.

When I found this dataset, I thought it would be great to build a Pokemon recommender system, since I like Pokemon.

This dataset contains Pokemon from the first six generations along with their stats.
I use the stats and types of each Pokemon to measure the similarity between Pokemon.
I have also labelled each Pokemon with a Normal/Mega/Legendary tag and included it when calculating similarity, which effectively adds weight to the similarity score of Pokemon that share a tag.

Below I import the required libraries:

* numpy and pandas, the basic libraries
* seaborn and matplotlib for plotting
* cosine_similarity, the heart of this model, to find the similarities between Pokemon
* warnings, to suppress warning messages
* StandardScaler, to bring the stats of the Pokemon onto the same scale
* ipywidgets for interacting with plots


In [None]:
import numpy as np
import pandas as pd
import os
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity

import warnings
warnings.filterwarnings("ignore")

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

import ipywidgets as widgets
from ipywidgets import interact, interact_manual

In [None]:
poke_df = pd.read_csv('../input/pokemon/Pokemon.csv')
poke_df.head(10)

So there are 800 Pokemon in total.

In [None]:
poke_df.shape

There are two kinds of Pokemon:
* Single-type Pokemon
* Dual-type Pokemon

The null values in Type 2 below belong to Pokemon that have only one type.

In [None]:
poke_df.isnull().sum()

Filling the null values with the string 'None'.

Adding a new column to the dataframe with the tags Normal/Mega/Legendary:

* Normal for normal Pokemon
* Mega for Mega-evolved Pokemon
* Legendary for Legendary Pokemon

Since there is already a column that says whether a Pokemon is Legendary, I use it to tag Legendary Pokemon. For Mega, I use str.rfind('Mega ') to mark the Mega-evolved Pokemon, and the rest are tagged Normal.

Two Pokemon, Meganium and Yanmega, have "mega" in their names. Yanmega is not a problem because its "m" is lowercase, but Meganium starts with an uppercase "M". To avoid tagging it, I search for 'Mega ' with a trailing space, since all Mega Pokemon have a space after the word 'Mega'.
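The tagging rule can be tried on a small sample first. This is a minimal sketch: the four rows and the `tag_pokemon` helper are made up for illustration, and the Mega names only assume that the word 'Mega' is followed by a space, as described above.

```python
import pandas as pd

# Hypothetical four-row sample, not read from the real dataset
sample = pd.DataFrame({
    'Name': ['Meganium', 'Yanmega', 'VenusaurMega Venusaur', 'MewtwoMega Mewtwo X'],
    'Legendary': [False, False, False, True],
})

def tag_pokemon(row):
    # Legendary is checked first, so it wins over Mega
    if row['Legendary']:
        return 'Legendary'
    # 'Mega ' with a trailing space occurs in neither 'Meganium' nor 'Yanmega'
    if row['Name'].rfind('Mega ') != -1:
        return 'Mega'
    return 'Normal'

sample['Pokemon Type'] = sample.apply(tag_pokemon, axis=1)
print(sample)
```

Because the Legendary check runs first, a Pokemon that is both Mega-evolved and Legendary ends up with the Legendary tag.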

Finally, I drop the columns that are not required.

In [None]:
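# Single-type Pokemon have no secondary type; mark it with the string 'None'
poke_df['Type 2'] = poke_df['Type 2'].fillna('None')
# Legendary is checked first, so a Mega-evolved Legendary Pokemon is tagged 'Legendary'
poke_df['Pokemon Type'] = poke_df[['Name','Legendary']].apply(lambda x : 'Legendary' if x['Legendary'] else ('Mega' if x['Name'].rfind('Mega ')!=-1 else 'Normal'),axis=1)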

poke_df.drop(columns=['#','Legendary'],inplace=True)
poke_df.head()

Below is a plot of the counts for each feature.

The plot's values can be changed interactively using ipywidgets' interact.

In [None]:
@interact
def count_plot(Feature = ['Type 1','Type 2','Generation','Pokemon Type'],
               Hue = [None,'Type 1','Type 2','Generation','Pokemon Type'],
               Palette=plt.colormaps(),
               Style=plt.style.available, Width = (10,25,1), Height = (5,10,1), xTicks=(0,90,1)):
    
    
    # Apply the style first so it also affects the figure created on the next line
    plt.style.use(Style)
    plt.figure(figsize=(Width,Height))
    sns.countplot(x = Feature,
                  data = poke_df,
                  hue = Hue,
                  palette=Palette)
    plt.xticks(rotation=xTicks)

Below I build a count-vectorizer-style dataframe of Pokemon types. It starts as a zero matrix with 800 rows (one per row of poke_df) and 19 columns (the number of unique strings in Type 2).

Type 1 and Type 2 have the same categories, but I use Type 2 because it has the additional 'None' string that I added when filling the null values. This way the Type 2 of a single-type Pokemon is taken as 'None' when the for loop runs. The 'None' column is dropped from the dataframe later.

In the for loop below, I take Type 1 and Type 2 of every row of the original dataframe, pass them as a list to .loc for the same row of the zero matrix, and assign 1, so the columns for those types are marked with 1.

Initially I tried pd.get_dummies on the 'Type 1' and 'Type 2' columns, but it produced 37 columns because it treats the same type in Type 1 and Type 2 as different types. For example, if one Pokemon has "Fire" as Type 1 and another has "Fire" as Type 2, it creates two columns, Type 1_Fire and Type 2_Fire, which distorts the similarity score.
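The difference between the two encodings is easier to see on a toy frame. The sketch below uses three made-up rows (`toy`, `per_column` and `shared` are illustrative names, not part of the notebook) and compares per-column dummies with shared type columns.

```python
import pandas as pd

# Hypothetical rows: 'Fire' appears once as Type 1 and once as Type 2
toy = pd.DataFrame({
    'Type 1': ['Fire', 'Grass', 'Water'],
    'Type 2': ['Flying', 'Fire', 'None'],
})

# Separate dummies per column: 'Fire' lands in two unrelated columns
per_column = pd.get_dummies(toy[['Type 1', 'Type 2']], dtype=int)
print(per_column.columns.tolist())

# Shared columns: one indicator per type, whichever slot it appears in
all_types = sorted(set(toy['Type 1']) | set(toy['Type 2']))
shared = pd.DataFrame(0, index=toy.index, columns=all_types)
for i in toy.index:
    shared.loc[i, [toy.loc[i, 'Type 1'], toy.loc[i, 'Type 2']]] = 1

# 'None' only marks single-type rows, so it is not a real type
shared = shared.drop(columns=['None'])
print(shared)
```

With shared columns, the first two rows both have a 1 under 'Fire', so the shared type contributes to their similarity.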

In [None]:
type_df = pd.DataFrame(np.zeros((poke_df.shape[0],len(poke_df['Type 2'].unique())),dtype=int),
                      index = poke_df.index,columns = sorted(poke_df['Type 2'].unique().tolist()))

# Mark the columns for both types of every Pokemon with 1
for i in range(len(type_df)):
    types = [poke_df.loc[i,'Type 1'], poke_df.loc[i,'Type 2']]
    type_df.loc[i,types] = 1

type_df.head()

In [None]:
print(sorted(poke_df['Type 1'].unique().tolist()))
print(sorted(poke_df['Type 2'].unique().tolist()))

Here I apply StandardScaler to the Pokemon stats, so that each stat has zero mean and unit variance.

In [None]:
scaled_df = scaler.fit_transform(poke_df.drop(columns=['Name', 'Type 1', 'Type 2', 'Total', 'Generation', 'Pokemon Type']))
scaled_df = pd.DataFrame(scaled_df,columns=['HP', 'Attack', 'Defense','Sp. Atk', 'Sp. Def', 'Speed'])
scaled_df.head()
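To see what the scaler does, here is a small sketch on made-up numbers. The two columns stand in for stats such as HP and Speed; the values are hypothetical and not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stats for three Pokemon: column 0 ~ HP, column 1 ~ Speed
stats = np.array([[40.0, 30.0],
                  [60.0, 90.0],
                  [80.0, 150.0]])

scaled = StandardScaler().fit_transform(stats)

# The same result by hand: subtract the column mean, divide by the column
# standard deviation (population standard deviation, ddof=0)
manual = (stats - stats.mean(axis=0)) / stats.std(axis=0)

print(scaled)
print(np.allclose(scaled, manual))
```

After scaling, both columns have mean 0 and standard deviation 1, so a stat with a wide range no longer dominates the similarity score.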

I build a new dataframe from the previous two dataframes plus the dummy columns for Pokemon Type (Normal/Mega/Legendary). This new dataframe is passed to cosine similarity to find the similarities between Pokemon.

In [None]:
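# dtype=int keeps the tag dummies numeric (recent pandas versions return booleans by default)
new_poke_df = pd.concat([type_df.drop(columns=['None']),pd.get_dummies(poke_df['Pokemon Type'],dtype=int),scaled_df],axis=1)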
new_poke_df.head()
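The effect of the tag columns on the score can be checked with a tiny example. This is only a sketch: the vectors `a` and `b`, the tag order and the `sim` helper are all hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical feature vectors with no overlap, so their similarity is 0
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# One-hot tag columns in a made-up order: [Legendary, Normal]
legendary = np.array([1.0, 0.0])
normal = np.array([0.0, 1.0])

def sim(u, v):
    # cosine_similarity expects 2D inputs, so reshape each vector to one row
    return cosine_similarity(u.reshape(1, -1), v.reshape(1, -1))[0, 0]

without_tag = sim(a, b)
same_tag = sim(np.concatenate([a, normal]), np.concatenate([b, normal]))
different_tag = sim(np.concatenate([a, normal]), np.concatenate([b, legendary]))

print(without_tag, same_tag, different_tag)
```

Sharing a tag raises the score of this pair from 0 to 0.5, while different tags leave it at 0. This is the extra weight that the tag columns add.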

cosine_similarity calculates the similarity between every pair of Pokemon, which gives a similarity matrix of shape (800, 800).

So each row has 800 columns, and the 800 scores in a row are the similarity scores of that row's Pokemon with every Pokemon, including itself.
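For reference, the cosine similarity of two vectors is their dot product divided by the product of their lengths. The sketch below checks this on three made-up feature rows (`features` is a hypothetical array, not the Pokemon data).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

features = np.array([[1.0, 0.0, 2.0],
                     [2.0, 0.0, 4.0],   # same direction as row 0
                     [0.0, 3.0, 0.0]])  # orthogonal to rows 0 and 1

# Pairwise similarity of every row with every row: shape (3, 3)
sim_matrix = cosine_similarity(features)

# The same matrix by hand: dot products divided by the product of the norms
norms = np.linalg.norm(features, axis=1, keepdims=True)
manual = (features @ features.T) / (norms @ norms.T)

print(sim_matrix.round(2))
```

Rows 0 and 1 point in the same direction and score 1, row 2 is orthogonal to both and scores 0, and the diagonal holds each row's similarity with itself, which is 1.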

In [None]:
cos_sim = cosine_similarity(new_poke_df.values,new_poke_df.values)

In [None]:
cos_sim.shape

Making a Series that gives the index of a Pokemon when its name is passed.

In [None]:
poke_index = pd.Series(poke_df.index,index=poke_df['Name'])
poke_index['Venusaur']
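The lookup works because the names become the Series index and the row positions become the values. A minimal sketch on a three-row frame, where `toy` and `name_to_index` are illustrative names:

```python
import pandas as pd

toy = pd.DataFrame({'Name': ['Bulbasaur', 'Ivysaur', 'Venusaur']})

# Values are the row labels of the frame, the index holds the names
name_to_index = pd.Series(toy.index, index=toy['Name'])

position = name_to_index['Venusaur']
print(position)
```

This assumes that every name is unique. If a name occurred more than once, the lookup would return a Series of positions instead of a single number.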

Making a function that gives recommendations. By default the number of recommendations is set to 5, but this can be changed.

* The function below takes a Pokemon name as input and passes it to the index Series, which gives the Pokemon's index
* Using this index, we take the similarity scores of that particular Pokemon from the cosine similarity matrix
* While extracting the scores, I also number each item in that Pokemon's similarity array using enumerate. The items in the array are in the same order as the Pokemon in the original dataframe, so these numbers can later be used to select the Pokemon from the dataframe
* The pairs are sorted by similarity score in descending order, so that the most similar Pokemon are on top
* The similar_pokemon variable stores only as many scores as the requested number of recommendations, leaving out the first score (since it is the score of the Pokemon given to the function)
* Finally, the indices of those scores are collected in a list, passed to the original poke_df, and the result is returned

In [None]:
def recommend(pokemon,recommendations=5):
    index = poke_index[pokemon]
    similarity_score = list(enumerate(cos_sim[index]))
    sorted_score = sorted(similarity_score,key=lambda x : x[1],reverse=True)
    similar_pokemon = sorted_score[1:recommendations+1]
    poke_indices = [i[0] for i in similar_pokemon]
    return poke_df.iloc[poke_indices]

In [None]:
recommend('Charizard')
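The same ranking step can also be written with numpy. The following is a sketch of a variant, not the function used above: `top_similar` and `toy_sim` are hypothetical, and the variant drops the query Pokemon by its index instead of skipping the first sorted entry.

```python
import numpy as np

def top_similar(sim_matrix, index, k=5):
    """Return the row numbers of the k most similar items to `index`."""
    scores = sim_matrix[index]
    # Sorting the negated scores gives descending order of similarity
    order = np.argsort(-scores, kind='stable')
    # Remove the query itself wherever it ended up in the ranking
    order = order[order != index]
    return order[:k]

# Hypothetical 4 x 4 similarity matrix
toy_sim = np.array([[1.0, 0.2, 0.9, 0.5],
                    [0.2, 1.0, 0.1, 0.3],
                    [0.9, 0.1, 1.0, 0.4],
                    [0.5, 0.3, 0.4, 1.0]])

result = top_similar(toy_sim, 0, k=2)
print(result)
```

Dropping the query by index stays correct even if another row has exactly the same feature vector and ties with the query for first place.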

I also want to filter the results by type, generation and Pokemon Type, which can be done using the function below.

In [None]:
def rec_pokemon_byFilter(pokemon,
                         recommendations = 10,
                         include_original = False,
                         Type = None,
                         Type2 = None,
                         Generation = None,
                         pokemon_type = None):
    
    '''
    Recommends the Pokemon that are most similar to the given Pokemon
    
    By default the number of recommendations is set to 10
    
    pokemon          : Name of the Pokemon in string format
    recommendations  : Number of similar Pokemon in the output, value must be an integer
    include_original : Includes the given Pokemon in the output if it passes the filters, value must be boolean
    Type             : Keep only Pokemon with this type as Type 1 or Type 2
    Type2            : Keep only Pokemon that also have this type as Type 1 or Type 2
    Generation       : Filter output by Generation
    pokemon_type     : Filter output by Pokemon Type (Normal/Mega/Legendary)
    
    '''
    index = poke_index[pokemon]
    similarity_score = list(enumerate(cos_sim[index]))
    sorted_score = sorted(similarity_score,key=lambda x : x[1],reverse=True)
    
    
    # Keep or drop the first entry, which is the given Pokemon itself
    similar_pokemon = sorted_score if include_original else sorted_score[1:]
    
    poke_indices = [i[0] for i in similar_pokemon]
    df = poke_df.iloc[poke_indices]
    
    # Each type filter matches the requested type in either type column
    if Type is not None:
        df = df[(df['Type 1'] == Type)|(df['Type 2'] == Type)]
    
    if Type2 is not None:
        df = df[(df['Type 1'] == Type2)|(df['Type 2'] == Type2)]
    
    if Generation is not None:
        df = df[df['Generation'] == Generation]
    
    if pokemon_type is not None:
        df = df[df['Pokemon Type'] == pokemon_type]
    
    # One extra row leaves room for the given Pokemon when it is included
    return df.head(recommendations + 1) if include_original else df.head(recommendations)

In [None]:
rec_pokemon_byFilter('Dragonite',recommendations=5)

In [None]:
rec_pokemon_byFilter('Dragonite',recommendations=10,pokemon_type='Normal',include_original=True)

In [None]:
rec_pokemon_byFilter('Dragonite',recommendations=10,include_original=True,Type='Dragon')

In [None]:
rec_pokemon_byFilter('Dragonite',recommendations=10,include_original=True,Type='Fire')
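The filters in the function keep the similarity ranking intact, because boolean indexing preserves the row order of the dataframe. A minimal sketch on a made-up, already ranked frame (`ranked` and its rows are hypothetical):

```python
import pandas as pd

# Hypothetical frame that is already sorted by similarity, best match first;
# the index values stand in for row numbers of the original dataframe
ranked = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Type 1': ['Dragon', 'Fire', 'Water', 'Fire'],
    'Type 2': ['Flying', 'None', 'Dragon', 'Flying'],
}, index=[7, 2, 9, 4])

# Match the requested type in either type column, as the function does
mask = (ranked['Type 1'] == 'Dragon') | (ranked['Type 2'] == 'Dragon')
filtered = ranked[mask]

print(filtered)
```

The rows that survive the filter appear in the same order as before, so taking head(n) afterwards still returns the n most similar matches.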