### Pokemon with Images
This notebook explores a feature that is not present in the original dataset: the sprites of all Pokemon. I'm curious to see the color distribution among Pokemon types. The code won't work on Kaggle's platform (the function calls below are left commented out).
I don't think it's against the rules, but I'll delete it if it is.

If anyone would like to have the data, just send me a message!


<table border="0">
    <tr>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/pokesprites/025.png"/></td>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/pokesprites/094.png"/></td>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/pokesprites/558.png"/></td>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/pokesprites/654.png"/></td>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/pokesprites/009.png"/></td>
    </tr>
</table>

### Download all Pokemon Sprites
First, I'll use urllib to download the Pokemon sprites. Serebii.net has a pretty straightforward URL scheme for the images, so it wasn't that hard.

In [None]:
import os
import urllib.request


def download_all_sprites():
    POKEDIR = 'pokesprites/'
    N_POKEMON = 721
    BASE_URL = "http://serebii.net/xy/pokemon/"
    # Create the output directory if it does not exist yet.
    os.makedirs(POKEDIR, exist_ok=True)
    for i in range(1, N_POKEMON + 1):
        # Sprites are named by zero-padded Pokedex number: 001.png ... 721.png
        end = '%03d.png' % i
        urllib.request.urlretrieve(BASE_URL + end, POKEDIR + end)

# download_all_sprites()
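
As a quick check of the URL scheme, here is a minimal sketch that only builds the sprite URL for a given Pokedex number without downloading anything. The helper name `sprite_url` is hypothetical and not part of the notebook; the base URL is the one used above.

```python
BASE_URL = "http://serebii.net/xy/pokemon/"

def sprite_url(number):
    """Return the sprite URL for a Pokedex number (hypothetical helper)."""
    # '%03d' pads the number with zeros to three digits: 25 -> '025'
    return BASE_URL + '%03d.png' % number

for n in (9, 25, 654):
    print(sprite_url(n))
```

Building the name with a format string avoids handling one-, two- and three-digit numbers separately.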

## Helper functions
This cell holds the core of the code.

My idea was to count each pixel color in every sprite (excluding transparent pixels). However, with 3 bytes per color (the RGB format) there are over 16 million possible colors, which would make the color distribution too sparse.

To fix this issue, I indexed a palette of 216 colors (six levels per channel) in a KD tree and replaced every color outside this palette with its nearest neighbor in it.
Apart from that, I think the code is more or less straightforward.

Note: the code is in no way optimized.
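
The nearest-neighbor idea can be tried on its own before reading the full cell. The sketch below is not the notebook's code: it assumes the palette can be generated from the six levels `00, 33, 66, 99, CC, FF` per channel (which yields the same 216 colors as the hard-coded list) and it queries the tree for many pixels at once with SciPy's `cKDTree`. The names `LEVELS`, `palette` and `quantize` are made up for this example.

```python
import numpy as np
from scipy.spatial import cKDTree

# Six levels per channel give 6**3 = 216 palette colors.
LEVELS = [0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF]
palette = np.array([(r, g, b) for r in LEVELS for g in LEVELS for b in LEVELS])
tree = cKDTree(palette)

def quantize(pixels):
    """Map an (N, 3) array of RGB values to their nearest palette colors."""
    # query() returns (distances, indices); only the indices are needed here.
    _, idx = tree.query(np.asarray(pixels, dtype=float))
    return palette[idx]

sample = np.array([[250, 10, 5], [100, 100, 100], [0, 0, 0]])
print(quantize(sample))
```

Querying all pixels in one call is usually much faster than one `query` per pixel, which may matter when processing hundreds of sprites.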

In [None]:
import os
from PIL import Image,ImageColor
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
import seaborn as sns
import pandas as pd
from scipy.spatial import KDTree
import warnings

colors = ['#000000', '#000033', '#000066', '#000099', '#0000CC', '#0000FF',
          '#003300', '#003333', '#003366', '#003399', '#0033CC', '#0033FF', 
          '#006600', '#006633', '#006666', '#006699', '#0066CC', '#0066FF', 
          '#009900', '#009933', '#009966', '#009999', '#0099CC', '#0099FF', 
          '#00CC00', '#00CC33', '#00CC66', '#00CC99', '#00CCCC', '#00CCFF', 
          '#00FF00', '#00FF33', '#00FF66', '#00FF99', '#00FFCC', '#00FFFF', 
          '#330000', '#330033', '#330066', '#330099', '#3300CC', '#3300FF', 
          '#333300', '#333333', '#333366', '#333399', '#3333CC', '#3333FF', 
          '#336600', '#336633', '#336666', '#336699', '#3366CC', '#3366FF', 
          '#339900', '#339933', '#339966', '#339999', '#3399CC', '#3399FF', 
          '#33CC00', '#33CC33', '#33CC66', '#33CC99', '#33CCCC', '#33CCFF', 
          '#33FF00', '#33FF33', '#33FF66', '#33FF99', '#33FFCC', '#33FFFF', 
          '#660000', '#660033', '#660066', '#660099', '#6600CC', '#6600FF', 
          '#663300', '#663333', '#663366', '#663399', '#6633CC', '#6633FF', 
          '#666600', '#666633', '#666666', '#666699', '#6666CC', '#6666FF', 
          '#669900', '#669933', '#669966', '#669999', '#6699CC', '#6699FF', 
          '#66CC00', '#66CC33', '#66CC66', '#66CC99', '#66CCCC', '#66CCFF', 
          '#66FF00', '#66FF33', '#66FF66', '#66FF99', '#66FFCC', '#66FFFF', 
          '#990000', '#990033', '#990066', '#990099', '#9900CC', '#9900FF',
          '#993300', '#993333', '#993366', '#993399', '#9933CC', '#9933FF',
          '#996600', '#996633', '#996666', '#996699', '#9966CC', '#9966FF', 
          '#999900', '#999933', '#999966', '#999999', '#9999CC', '#9999FF',
          '#99CC00', '#99CC33', '#99CC66', '#99CC99', '#99CCCC', '#99CCFF', 
          '#99FF00', '#99FF33', '#99FF66', '#99FF99', '#99FFCC', '#99FFFF', 
          '#CC0000', '#CC0033', '#CC0066', '#CC0099', '#CC00CC', '#CC00FF', 
          '#CC3300', '#CC3333', '#CC3366', '#CC3399', '#CC33CC', '#CC33FF', 
          '#CC6600', '#CC6633', '#CC6666', '#CC6699', '#CC66CC', '#CC66FF',
          '#CC9900', '#CC9933', '#CC9966', '#CC9999', '#CC99CC', '#CC99FF', 
          '#CCCC00', '#CCCC33', '#CCCC66', '#CCCC99', '#CCCCCC', '#CCCCFF', 
          '#CCFF00', '#CCFF33', '#CCFF66', '#CCFF99', '#CCFFCC', '#CCFFFF', 
          '#FF0000', '#FF0033', '#FF0066', '#FF0099', '#FF00CC', '#FF00FF', 
          '#FF3300', '#FF3333', '#FF3366', '#FF3399', '#FF33CC', '#FF33FF', 
          '#FF6600', '#FF6633', '#FF6666', '#FF6699', '#FF66CC', '#FF66FF',
          '#FF9900', '#FF9933', '#FF9966', '#FF9999', '#FF99CC', '#FF99FF',
          '#FFCC00', '#FFCC33', '#FFCC66', '#FFCC99', '#FFCCCC', '#FFCCFF', 
          '#FFFF00', '#FFFF33', '#FFFF66', '#FFFF99', '#FFFFCC', '#FFFFFF']

colors_rgb = list(map(ImageColor.getrgb,colors))
approximate_color = KDTree(colors_rgb)


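def to_hex(rgb):
    # Cast to int: the KD tree stores its points as floats, which '%02x' rejects.
    return '#%02x%02x%02x' % tuple(int(c) for c in rgb)

def nearest_color(rgbcolor):
    # Look up the palette color closest to rgbcolor (Euclidean distance in RGB).
    _,ind = approximate_color.query(rgbcolor)
    return approximate_color.data[ind]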


def freqs_to_frame(freqs):
    # Build a (color, frequency) table and keep the 10 most frequent colors.
    tbl = pd.DataFrame.from_records(list(freqs.items()),
                                    columns=('color','frequency'))
    return tbl.sort_values(by='frequency',ascending=False)[0:10]

# Root folder containing 'pokesprites/' (assumed here to be the working directory).
BASE_DIR = ''

def generate_color_histogram(pokenumber):
    # Sprites are named by zero-padded Pokedex number, e.g. 025.png
    pokeimage = 'pokesprites/%03d.png' % pokenumber

    # Force RGBA so that every pixel has an alpha channel at index 3.
    img = Image.open(BASE_DIR + pokeimage).convert('RGBA')

    m,n = img.size
    color_freqs = {}
    npix = 0
    for i in range(m):
        for j in range(n):
            p = img.getpixel((i,j))
            if p[3] == 0: # transparent
                continue
            # Snap the pixel to the palette and count it.
            p_str = to_hex(nearest_color(p[0:3]))
            color_freqs[p_str] = color_freqs.get(p_str, 0) + 1
            npix += 1
    # normalize:
    for key in color_freqs:
        color_freqs[key] /= npix

    return freqs_to_frame(color_freqs)

def save_barplot(color_frame,filename):

    pal = sns.color_palette(color_frame['color'])
    fig, ax = plt.subplots( nrows=1, ncols=1 )
    
    ax = sns.barplot(x='color',y='frequency',
                         palette=pal,data=color_frame)
    ax.set_xticklabels([])
    ax.set_xlabel("")
    ax.set_ylabel("")
        

    fig.savefig(filename)
    plt.close(fig)  

# Create the output directory if it does not exist yet.
os.makedirs('barplots/', exist_ok=True)
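
Since the cell above is not optimized, here is a hedged sketch of how the counting step alone could be vectorized with NumPy. It is an alternative, not the notebook's method: it skips the palette-snapping step, and the function name `opaque_pixel_freqs` and the tiny demo image are invented for illustration.

```python
from collections import Counter

import numpy as np
from PIL import Image

def opaque_pixel_freqs(img):
    """Relative frequency of each exact RGB color among non-transparent pixels."""
    # One row per pixel, columns R, G, B, A.
    rgba = np.asarray(img.convert('RGBA')).reshape(-1, 4)
    # Keep only pixels whose alpha is non-zero, then drop the alpha column.
    rgb = rgba[rgba[:, 3] > 0, :3]
    counts = Counter(map(tuple, rgb.tolist()))
    total = sum(counts.values())
    return {'#%02x%02x%02x' % c: n / total for c, n in counts.items()}

# 2x2 demo image: two red pixels, one blue pixel, one transparent pixel.
demo = Image.new('RGBA', (2, 2), (0, 0, 0, 0))
demo.putpixel((0, 0), (255, 0, 0, 255))
demo.putpixel((1, 0), (255, 0, 0, 255))
demo.putpixel((0, 1), (0, 0, 255, 255))
print(opaque_pixel_freqs(demo))
```

The boolean mask on the alpha column replaces the nested `for` loops and the per-pixel `getpixel` calls.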


### Create Barplots for Every Pokemon

In [None]:
def create_pkmn_barplots():
    N_POKEMON = 721
    for i in range(1,N_POKEMON + 1):
        c = generate_color_histogram(i)
        # Same zero-padded naming as the sprites: barplots/001.png ... barplots/721.png
        save_barplot(c, 'barplots/%03d.png' % i)
        # Report progress every 20 Pokemon.
        if i%20 == 0:
            print(i)

            
#create_pkmn_barplots()

## Main Color Distributions for The Starters

<table border="0">
    <tr>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/pokesprites/003.png"/></td>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/barplots/003.png"></td>
    </tr>
    <tr>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/pokesprites/006.png"/></td>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/barplots/006.png"></td>
    </tr>
    <tr>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/pokesprites/009.png"/></td>
        <td><img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/barplots/009.png"></td>
    </tr>
</table>

## Color Distribution by First Type

In [None]:
import os
import pandas as pd

def create_barplots_types():
    # The first CSV column becomes the index and is used below as the Pokedex number.
    df = pd.read_csv('Pokemon.csv', index_col=0)

    tp_col_freqs = {}

    os.makedirs('bytype/', exist_ok=True)
    for tp,df_tp in df.groupby('Type 1'):
        tp_col_freqs[tp] = {}
        freqs = tp_col_freqs[tp]
        for num,row in df_tp.iterrows():
            if "Mega " in row['Name']:
                continue # Skip Mega Evolutions
            dist = generate_color_histogram(num)
            # Add this Pokemon's top colors to the running totals for its type.
            for _,color_row in dist.iterrows():
                color = color_row['color']
                freqs[color] = freqs.get(color, 0) + color_row['frequency']
        # Average over the rows of this type (the count includes the skipped Mega rows).
        for color in freqs:
            freqs[color] /= float(df_tp.shape[0])
        color_df = freqs_to_frame(freqs)
        filename = "bytype/" + tp + ".png"
        save_barplot(color_df,filename)
        print("%s finished." %tp)
        
#create_barplots_types()
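
The per-type averaging can also be written with a pandas `groupby`. The sketch below runs on made-up numbers rather than on real sprites, so the column values and the `n_per_type` counts are purely illustrative assumptions.

```python
import pandas as pd

# Hypothetical per-Pokemon color frequencies in long format.
per_pokemon = pd.DataFrame({
    'type': ['Fire', 'Fire', 'Water'],
    'color': ['#ff3300', '#ff3300', '#3366cc'],
    'frequency': [0.5, 0.3, 0.8],
})
# Hypothetical number of Pokemon per type, used as the divisor.
n_per_type = pd.Series({'Fire': 2, 'Water': 1})

# Sum the frequencies of each color within each type...
totals = per_pokemon.groupby(['type', 'color'], as_index=False)['frequency'].sum()
# ...then divide by the number of Pokemon of that type.
totals['frequency'] /= totals['type'].map(n_per_type)
print(totals)
```

This produces one averaged frequency per (type, color) pair, which is the same quantity the nested loops accumulate in the `freqs` dictionaries.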

## 10 Most Frequent Colors for Each Type

<table>
    <tr>    
        <h1>Bug</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Bug.png"/>
        
        
    </tr>
    <tr>
        <h1>Dark</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Dark.png"/>
    </tr>
    <tr>
                <h1>Dragon</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Dragon.png"/>
    </tr>
    <tr>
                <h1>Electric</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Electric.png"/>
    </tr>
    <tr>
                <h1>Fairy</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Fairy.png"/>
    </tr>
    <tr>
               <h1>Fighting</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Fighting.png"/>
    </tr>
    <tr>
              <h1>Fire</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Fire.png"/>
    </tr>
    <tr>
               <h1>Flying</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Flying.png"/>
    </tr>
    <tr>
               <h1>Ghost</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Ghost.png"/>
    </tr>
        <tr>
               <h1>Grass</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Grass.png"/>
    </tr>
    <tr>
               <h1>Ground</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Ground.png"/>
    </tr>
    <tr>
               <h1>Ice</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Ice.png"/>
    </tr>
    <tr>
               <h1>Normal</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Normal.png"/>
    </tr>
    <tr>
               <h1>Poison</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Poison.png"/>
    </tr>
    <tr>
               <h1>Psychic</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Psychic.png"/>
    </tr>
    <tr>
               <h1>Rock</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Rock.png"/>
    </tr>
    <tr>
               <h1>Steel</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Steel.png"/>
    </tr>
    <tr>
               <h1>Water</h1>
        <img src="https://raw.githubusercontent.com/egrinstein/pokeimages/master/bytype/Water.png"/>
    </tr>
    
    
    
</table>

<p>
    It's important to note that black and grey are present in almost every distribution for a simple reason: every image has its borders drawn in black, and all the shadows are made with a grey color.
    
    Although the distribution seems unexpressive for some types,
    it characterizes others really well: 
    <ul>
    <li>In Fairy, beige and white are the predominant colors, just like we'd imagine.</li>
    <li>In Poison, purple reigns.</li>
    <li>Ice is full of blue and whitish colors.</li>
    <li>Fire is filled with red/orange/yellow.</li>
    <li>Grass is populated with (drumroll) green.</li>
    </ul>
</p>
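
Because black outlines and grey shadows show up in almost every distribution, one possible variation (not something this notebook does) is to drop the achromatic palette colors before ranking. The sketch below assumes a frequency dictionary keyed by hex strings, like the ones built earlier; the helper names are hypothetical. Note that it also removes white, which is characteristic of types such as Fairy and Ice.

```python
def is_achromatic(hex_color):
    """True when red, green and blue are equal (black, greys, white)."""
    h = hex_color.lstrip('#').lower()
    return h[0:2] == h[2:4] == h[4:6]

def drop_achromatic(freqs):
    """Return a copy of freqs without black, grey and white entries."""
    return {c: f for c, f in freqs.items() if not is_achromatic(c)}

# Made-up frequencies, for illustration only.
example = {'#000000': 0.30, '#999999': 0.10, '#cc66ff': 0.35, '#ff3300': 0.25}
print(drop_achromatic(example))
```

The remaining frequencies no longer sum to 1, so they would need to be renormalized if they are compared across types.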


I hope you had as much fun reading this notebook as I had making it!
If you have any questions or requests, feel free to message me.