**Pokemon Wrangling Using Python**<br>
Let's catch 'em all

First, we import some commonly used libraries<br>The `as` keyword gives a module a short alias, so we can call it by that shorter name

In [None]:
import numpy as np # numpy is a library for linear algebra
import pandas as pd # pandas is for data processing, CSV file I/O (e.g. pd.read_csv)
import random as rd # for generating pseudo-random numbers
import datetime, pytz # for manipulating dates and times
import io # provides the Python interfaces to stream handling
import requests # allows you to send organic, grass-fed HTTP/1.1 requests
import seaborn as sb # visualization library based on matplotlib
import matplotlib as mpl # famous 2D plotting library
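import matplotlib.pyplot as plp # pyplot interface (commonly aliased as plt; this notebook uses plp)
import sklearn # implements machine learning, preprocessing, cross-validation and visualization algorithms
import sqlite3 # lets us run SQL queries against an SQLite database from Python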

**Data Import**

We can read the data using Pandas

In [None]:
poke = pd.read_csv('../input/Pokemon.csv')

Use the `head` method to preview the first five rows of the data

In [None]:
poke.head()

Description of each column

**#**: ID for each pokemon<br>
**Name**: Name of each pokemon<br>
**Type 1**: Each pokemon has a primary type, which determines its weakness/resistance to attacks<br>
**Type 2**: Some pokemon are dual type and have a second type (missing for single-type pokemon)<br>
**Total**: sum of the six stats that come after this, a general guide to how strong a pokemon is<br>
**HP**: hit points, or health, defines how much damage a pokemon can withstand before fainting<br>
**Attack**: the base modifier for normal attacks (e.g. Scratch, Punch)<br>
**Defense**: the base damage resistance against normal attacks<br>
**Sp. Atk**: special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam)<br>
**Sp. Def**: the base damage resistance against special attacks<br>
**Speed**: determines which pokemon attacks first each round<br>
**Generation**: the game generation in which the pokemon was introduced<br>
**Legendary**: whether the pokemon is legendary<br>
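
The claim that **Total** is the sum of the individual stats is easy to check. Here is a minimal sketch on a hand-typed sample of three rows rather than the full CSV. The names `sample`, `stat_cols` and `total_matches` are made up for this illustration, and the stat column names (`Sp. Atk`, `Sp. Def`) are assumed to match the ones used in the SQL queries later in this notebook.

```python
import pandas as pd

# Hand-typed sample (not read from Pokemon.csv), just to illustrate the check.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Total": [318, 309, 314],
    "HP": [45, 39, 44],
    "Attack": [49, 52, 48],
    "Defense": [49, 43, 65],
    "Sp. Atk": [65, 60, 50],
    "Sp. Def": [65, 50, 64],
    "Speed": [45, 65, 43],
})

# The six base stats that are expected to add up to Total.
stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Row-wise sum of the stats compared against the Total column.
total_matches = sample[stat_cols].sum(axis=1).eq(sample["Total"])
print(total_matches.all())
```

Running the same comparison on `poke` instead of `sample` would show whether the relationship holds for every row of the real dataset.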

Let's look at the summary statistics with the `describe` method. By default it covers the numeric columns and skips NaN values when computing each statistic

In [None]:
poke.describe()
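
To see what "excluding NaN values" means in practice, here is a small sketch on a toy frame. The frame `sample` is hypothetical, and one HP value has been blanked on purpose for the demo, so it is not real data. `isna().sum()` counts the gaps per column, and `describe` reports a `count` that leaves them out.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only; the NaN in HP is deliberate, not real data.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 2": ["Poison", np.nan, np.nan],
    "HP": [45, 39, np.nan],
})

# Number of missing values in each column.
missing = sample.isna().sum()

# describe() skips the NaN: count is 2, and the mean is taken over 45 and 39.
hp_summary = sample["HP"].describe()

print(missing)
print(hp_summary)
```

On the real data, `poke.isna().sum()` is a quick way to find out which columns have gaps before trusting the summary.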

Let's crunch the numbers: count how many Pokemon there are of each primary type, then visualize the proportions using `matplotlib`

In [None]:
poke["Type 1"].value_counts()

In [None]:
types = poke['Type 1']
colors = ['turquoise','white','lightgreen','green','purple','red','chocolate','yellow','brown','yellowgreen'
          ,'lavender','grey','lightcoral','darkgrey','silver','lightblue','pink','orange']
explode = np.arange(len(types.unique())) * 0.01

types.value_counts().plot.pie(
    explode=explode,
    colors=colors,
    title="Percentage of Different Types of Pokemon",
    autopct='%1.1f%%',
    shadow=True,
    startangle=90,
    figsize=(8,8)
)
plp.tight_layout()

Let's see how Attack relates to Defense, and whether legendary and non-legendary Pokemon separate in that plot<br>
Visualization using `seaborn`

In [None]:
sb.FacetGrid(poke, hue="Legendary", height=8) \
   .map(plp.scatter, "Defense", "Attack") \
   .add_legend()

We can see the average HP of each type, Dragon ftw

In [None]:
typehp = poke[['Type 1', 'HP']].groupby(['Type 1'], as_index=False).mean().sort_values(by='HP', ascending=False)
sb.barplot(x='Type 1', y='HP', data=typehp)

SQL sounds fun, let's load the data into an in-memory SQLite database

In [None]:
c = sqlite3.connect(':memory:')
pd.read_csv('../input/Pokemon.csv').to_sql('poke',c)
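
One thing worth knowing about `to_sql`: by default it also writes the DataFrame index as an extra column. The sketch below uses a tiny hand-typed frame and made-up table names (`with_index`, `no_index`) to show the difference that `index=False` makes. It is an illustration, not a change to the cell above.

```python
import sqlite3
import pandas as pd

# Tiny hand-typed frame; table names below are hypothetical.
sample = pd.DataFrame({"Name": ["Bulbasaur", "Charmander"], "HP": [45, 39]})

conn = sqlite3.connect(":memory:")

# Default behaviour: the unnamed index is stored as a column called "index".
sample.to_sql("with_index", conn)

# index=False keeps only the DataFrame's own columns.
sample.to_sql("no_index", conn, index=False)

cols_with = pd.read_sql("SELECT * FROM with_index", conn).columns.tolist()
cols_without = pd.read_sql("SELECT * FROM no_index", conn).columns.tolist()
conn.close()

print(cols_with)
print(cols_without)
```

The queries in this notebook select columns by name, so the extra `index` column does not affect them.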

For starters, let's query the Pokemon with the highest and the lowest HP

In [None]:
pd.read_sql("SELECT Name, `Type 1`, max(HP) FROM poke", c)

In [None]:
pd.read_sql("SELECT Name, `Type 1`, min(HP) FROM poke", c)
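
The two queries above rely on an SQLite-specific behaviour: when a query uses a single `max()` or `min()`, the other selected columns are taken from the row that holds that extreme value. Many other databases reject such a query. A more portable way to get the same answer is `ORDER BY ... LIMIT 1`, sketched here on a hand-typed sample with a hypothetical table name `poke_sample`.

```python
import sqlite3
import pandas as pd

# Hand-typed sample; `poke_sample` is a made-up table name for this sketch.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "HP": [45, 39, 44],
})

conn = sqlite3.connect(":memory:")
sample.to_sql("poke_sample", conn, index=False)

# Sort by HP and keep the first row instead of mixing max()/min() with bare columns.
highest = pd.read_sql(
    "SELECT Name, [Type 1], HP FROM poke_sample ORDER BY HP DESC LIMIT 1", conn
)
lowest = pd.read_sql(
    "SELECT Name, [Type 1], HP FROM poke_sample ORDER BY HP ASC LIMIT 1", conn
)
conn.close()

print(highest)
print(lowest)
```

If several Pokemon tie for the highest or lowest HP, both approaches return just one of them.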

Top 5 Legendary and Non Legendary Pokemon Based on Total

In [None]:
pd.read_sql("SELECT Name, [Type 1], [Type 2], Total, HP, Attack, Defense, Speed, [Sp. Atk], [Sp. Def], Legendary, Generation FROM poke WHERE Legendary = '1' ORDER BY 4 DESC LIMIT 5", c)

In [None]:
pd.read_sql("SELECT Name, [Type 1], [Type 2], Total, HP, Attack, Defense, Speed, [Sp. Atk], [Sp. Def], Legendary, Generation FROM poke WHERE Legendary = '0' ORDER BY 4 DESC LIMIT 5", c)

Top 5 Legendary and Non Legendary Pokemon Based on HP

In [None]:
pd.read_sql("SELECT Name, [Type 1], [Type 2], Total, HP, Attack, Defense, Speed, [Sp. Atk], [Sp. Def], Legendary, Generation FROM poke WHERE Legendary = '1' ORDER BY 5 DESC LIMIT 5", c)

In [None]:
pd.read_sql("SELECT Name, [Type 1], [Type 2], Total, HP, Attack, Defense, Speed, [Sp. Atk], [Sp. Def], Legendary, Generation FROM poke WHERE Legendary = '0' ORDER BY 5 DESC LIMIT 5", c)

Statistics of Types

In [None]:
pd.read_sql("SELECT [Type 1], avg(HP), avg(Attack), avg(Defense), avg(Speed), max(HP), min(HP), count(Name) FROM poke GROUP BY 1 ORDER BY 2 DESC", c)
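
For comparison, the same kind of per-type summary can be written in pandas with `groupby` and named aggregation. This is a minimal sketch on a hand-typed sample, covering only the HP columns and the count. The output column names (`avg_hp`, `max_hp`, `min_hp`, `n`) are made up for the example, and named aggregation assumes pandas 0.25 or newer.

```python
import pandas as pd

# Hand-typed sample, not the full dataset.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Grass", "Fire", "Water"],
    "HP": [45, 60, 39, 44],
})

# GROUP BY [Type 1] with avg/max/min/count, then ORDER BY avg(HP) DESC.
type_stats = (
    sample.groupby("Type 1")
    .agg(
        avg_hp=("HP", "mean"),
        max_hp=("HP", "max"),
        min_hp=("HP", "min"),
        n=("Name", "count"),
    )
    .sort_values("avg_hp", ascending=False)
)

print(type_stats)
```

Which version to use is mostly a matter of taste: the SQL one is handy if the data already lives in a database, the pandas one keeps everything in the DataFrame.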

Number of Pokemon of each generation

In [None]:
pd.read_sql("SELECT Generation, count(Name) FROM poke GROUP BY 1", c)

I'm not done yet, I'll be back to analyze the fun asap :) Thanks!<br>
brb leveling up my Primarina, Gengar and Lucario