## 1. Load libraries and data

In [None]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from ggplot import *
from sklearn.model_selection import train_test_split


In [None]:
pokemon = pd.read_csv('../input/Pokemon.csv')
pokemon.head()

In [None]:
pokemon.shape

In [None]:
pokemon2 = pokemon.drop(['#', 'Name', 'Type 1', 'Type 2', 'Generation', 'Legendary'], axis=1)

## 2. Visualization with ggplot

In [None]:
ggplot(pokemon, aes(x='HP', y='Total', color='Type 1')) + geom_point() + \
    xlab("HP") + ylab("Total") + ggtitle("Total vs HP")

In [None]:
ggplot(pokemon[pokemon['Type 1'] == 'Water'],
       aes(x='Attack', y='Defense', size='HP', color='Legendary')) + \
    geom_point(shape=5) + xlab('Attack') + ylab('Defense') + ggtitle("Water") + \
    theme_bw()

**This plot does not render well: the colors in its legend are wrong.**

In [None]:
ggplot(pokemon[pokemon['Type 1'].isin(['Bug', 'Dark', 'Water', 'Steel'])], 
       aes(x='Attack', y='Defense', size = 'HP', color = 'Type 1', group = 'Type 1')) + \
    geom_point() + \
    facet_wrap('Type 1') + \
    theme_bw()

**The faceted ggplot shows the same problem with the color legend.**

In [None]:
ggplot(aes(x='Type 1'), data=pokemon) + \
    geom_bar(stat='identity')+ coord_flip()+ \
    xlab("Type 1") + ylab("Count") + ggtitle("Pokemon Type Count")

**The x-axis labels overlap. When I tried to rotate the labels with ggplot, the string labels were converted into a series of floating-point numbers. Converting the vertical bars into horizontal bars also failed. In addition, the bars in this graph cannot be filled with color according to the 'Type 1' column.**

### From the graphs above, we can see that ggplot does not work well in Python. To draw the same graphs, the matplotlib and seaborn libraries give better results.
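As a rough illustration of that suggestion, here is a minimal seaborn sketch of two of the plots attempted above: a scatter plot colored by type and sized by HP, and a horizontal count of types. The `demo` dataframe and its values are hypothetical stand-ins so the cell runs on its own; with the real data, `pokemon` would be passed instead. The `Agg` backend line is only an assumption for running without a display and is not needed inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the Pokemon dataframe (illustrative values only).
demo = pd.DataFrame({
    "Type 1": ["Water", "Water", "Bug", "Bug", "Steel", "Dark"],
    "HP": [40, 60, 45, 50, 65, 55],
    "Attack": [50, 65, 30, 45, 80, 70],
    "Defense": [65, 80, 35, 50, 120, 60],
})

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: color encodes the type, marker size encodes HP.
sns.scatterplot(data=demo, x="Attack", y="Defense",
                hue="Type 1", size="HP", ax=ax_scatter)
ax_scatter.set_title("Defense vs Attack")

# Passing the category as y draws horizontal bars, so the labels do not overlap.
sns.countplot(data=demo, y="Type 1", ax=ax_bar)
ax_bar.set_title("Pokemon Type Count")

fig.tight_layout()
```

Putting the category on the y axis is one way around the overlapping-label problem, since no label rotation is needed.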

## 3. Correlation Heatmap

In [None]:
pokemon_df = pokemon.drop(['#','Name', 'Type 1', 'Type 2', 'Generation', 'Legendary'], axis = 1)

In [None]:
ax = plt.axes()
ax.set_title("Pokemon Total Rating Correlation")
# Pairwise correlations between the numeric stat columns.
corr = pokemon2.corr()
sns.heatmap(corr, cbar=True, annot=True, square=True, fmt='.2f',
            annot_kws={'size': 8},
            yticklabels=corr.columns.values, xticklabels=corr.columns.values)

**From the heatmap above, we can see that the Total rating of a Pokemon is more strongly correlated with Sp. Atk and Sp. Def than with the other attributes. However, even the lowest correlation, between Total and Speed, is 0.58, which is still relatively high. Therefore, we will use the other attributes to predict the Total rating.**
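The same comparison can also be read directly from the correlation matrix instead of from the heatmap colors. The sketch below ranks each attribute by its correlation with Total. The `stats` dataframe is a hypothetical mini dataset with illustrative values (its Total is simply the sum of the six columns), so the numbers it prints will not match the 0.58 reported above.

```python
import pandas as pd

# Hypothetical mini dataset with illustrative values only.
stats = pd.DataFrame({
    "HP": [45, 60, 80, 39, 58, 78],
    "Attack": [49, 62, 82, 52, 64, 84],
    "Defense": [49, 63, 83, 43, 58, 78],
    "Sp. Atk": [65, 80, 100, 60, 80, 109],
    "Sp. Def": [65, 80, 100, 50, 65, 85],
    "Speed": [45, 60, 80, 65, 80, 100],
})
# Assumption for this sketch: Total is the sum of the six stats.
stats["Total"] = stats.sum(axis=1)

# Take the Total column of the correlation matrix, drop its self-correlation,
# and sort so the most strongly correlated attributes come first.
total_corr = stats.corr()["Total"].drop("Total").sort_values(ascending=False)
print(total_corr)
```

On the real data, replacing `stats` with `pokemon2` should give the same ranking that the heatmap shows.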

## 4. Split Train and Test Dataset

In [None]:
# Features: the six individual stats. Target: the Total rating we want to predict.
x = pokemon[['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']]
y = pokemon['Total']

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=107)

In [None]:
x_train.shape

In [None]:
x_test.shape

In [None]:
y_train.shape

In [None]:
y_test.shape

**The `train_test_split` function splits the data into train and test sets. With `test_size=0.5`, each set receives half of the rows.**
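To make the effect of the split concrete, here is a small self-contained sketch on toy arrays. The names `features` and `target` and their values are hypothetical, chosen only so that the resulting shapes are easy to check by hand.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 rows with 2 feature columns; the target is the row sum.
features = np.arange(20).reshape(10, 2)
target = features.sum(axis=1)

# Same arguments as in the notebook: half of the rows go to the test set.
f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=0.5, random_state=107)

print(f_train.shape, f_test.shape)  # (5, 2) (5, 2)
print(t_train.shape, t_test.shape)  # (5,) (5,)
```

The rows are shuffled before splitting, but each feature row stays paired with its own target value, and fixing `random_state` makes the split reproducible.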