# 1. Introduction

#### 1.1 Dataset Overview
This dataset covers 721 Pokemon from the games. For each one it records the number, name, primary and secondary type, and base stats: HP, Attack, Defense, Special Attack, Special Defense, and Speed.

#### 1.2 Data Source
The original dataset comes from Kaggle: https://www.kaggle.com/abcsds/pokemon

#### 1.3 Objective
The objective of this analysis is to examine the "Hit Points" (or "Health") values of the Pokemon in the dataset.

#### 1.4 Features
* #: ID for each pokemon
* Name: Name of each pokemon
* Type 1: Each pokemon has a type, this determines weakness/resistance to attacks
* Type 2: Some pokemon are dual type and have 2
* Total: sum of all stats that come after this, a general guide to how strong a pokemon is
* HP: hit points, or health, defines how much damage a pokemon can withstand before fainting
* Attack: the base modifier for normal attacks (e.g. Scratch, Punch)
* Defense: the base damage resistance against normal attacks
* Sp. Atk: special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam)
* Sp. Def: special defense, the base damage resistance against special attacks
* Speed: determines which pokemon attacks first each round
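
The definition of `Total` as the sum of the six base stats can be checked directly. The following is a minimal sketch, assuming three rows copied by hand from the dataset preview shown later in this notebook rather than the full file:

```python
import pandas as pd

# Three rows copied from the dataset preview (not the full Pokemon.csv)
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "Total": [318, 405, 309],
    "HP": [45, 60, 39],
    "Attack": [49, 62, 52],
    "Defense": [49, 63, 43],
    "Sp. Atk": [65, 80, 60],
    "Sp. Def": [65, 80, 50],
    "Speed": [45, 60, 65],
})

stat_columns = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Total should equal the row-wise sum of the six base stats
recomputed_total = sample[stat_columns].sum(axis=1)
total_matches = bool((recomputed_total == sample["Total"]).all())
print(total_matches)
```

On the full data, the same comparison could be run on `df` once it has been loaded.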


# 2. Import Library

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# 3. Data Loading

In [2]:
df = pd.read_csv('Pokemon.csv')

#### 3.1 Loading Dataset

In [3]:
df

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


In [4]:
df.shape

(800, 13)
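
The 800 rows exceed the 721 Pokemon described in the introduction because alternate forms (for example `VenusaurMega Venusaur`) share the `#` value of their base form. A small sketch of how this could be counted, using five rows copied from the preview above as a stand-in for `df`:

```python
import pandas as pd

# Five rows copied from the preview; the real check would use the full df
sample = pd.DataFrame({
    "#": [1, 2, 3, 3, 4],
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur",
             "VenusaurMega Venusaur", "Charmander"],
})

n_rows = len(sample)
n_unique_ids = sample["#"].nunique()

# keep=False marks every row whose ID occurs more than once
shared_id_rows = sample[sample["#"].duplicated(keep=False)]
print(n_rows, n_unique_ids)
print(shared_id_rows["Name"].tolist())
```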

In [5]:
# show the first 10 rows of the dataset
df.head(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
7,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
8,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
9,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


In [6]:
# show the last 10 rows of the dataset
df.tail(10)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
790,714,Noibat,Flying,Dragon,245,40,30,35,45,40,55,6,False
791,715,Noivern,Flying,Dragon,535,85,70,80,97,80,123,6,False
792,716,Xerneas,Fairy,,680,126,131,95,131,98,99,6,True
793,717,Yveltal,Dark,Flying,680,126,131,95,131,98,99,6,True
794,718,Zygarde50% Forme,Dragon,Ground,600,108,100,121,81,95,95,6,True
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True
799,721,Volcanion,Fire,Water,600,80,110,120,130,90,70,6,True


In [7]:
# summary statistics for the numeric columns of the dataset
df.describe()

Unnamed: 0,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,362.81375,435.1025,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,208.343798,119.96304,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,184.75,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,364.5,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,539.25,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,721.0,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0


In [8]:
# numeric_only=True skips the text columns, which have no quantiles
df.quantile([0.25], numeric_only=True)

Unnamed: 0,#,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0.25,184.75,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0,0.0


In [9]:
# list all columns and their data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB


#### 3.2 Exploring Dataset

In [10]:
df.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


In [11]:
df['Total'].max()

780

In [12]:
df.loc[df['Total'] == df['Total'].max(),'Name']

163      MewtwoMega Mewtwo X
164      MewtwoMega Mewtwo Y
426    RayquazaMega Rayquaza
Name: Name, dtype: object

The dataset shows that the strongest Pokemon are **MewtwoMega Mewtwo X, MewtwoMega Mewtwo Y, and RayquazaMega Rayquaza**, each with a Total of **780**.
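
The boolean mask used here matters when several rows tie for the maximum. As an illustration, the sketch below contrasts it with `idxmax`, which returns only the first match. The five-row frame is a hand-made stand-in built from names and totals that appear in this notebook:

```python
import pandas as pd

# Hand-made sample: names and Total values taken from this notebook
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "MewtwoMega Mewtwo X", "MewtwoMega Mewtwo Y",
             "RayquazaMega Rayquaza", "Sunkern"],
    "Total": [318, 780, 780, 780, 180],
})

# idxmax returns the index label of the FIRST maximum only
first_max_name = sample.loc[sample["Total"].idxmax(), "Name"]

# A boolean mask keeps every row tied for the maximum
is_max = sample["Total"] == sample["Total"].max()
all_max_names = sample.loc[is_max, "Name"].tolist()

print(first_max_name)
print(all_max_names)
```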

In [13]:
df['Total'].min()

180

In [14]:
df.loc[df['Total'] == df['Total'].min(),'Name']

206    Sunkern
Name: Name, dtype: object

The dataset shows that the weakest Pokemon is **Sunkern**, with a Total of **180**.

In [15]:
df['HP'].max()

255

In [16]:
df.loc[df['HP'] == df['HP'].max(),'Name']

261    Blissey
Name: Name, dtype: object

The dataset shows that **Blissey** is the Pokemon with the highest Hit Points (255).

In [17]:
df['HP'].min()

1

In [18]:
df.loc[df['HP'] == df['HP'].min(),'Name']

316    Shedinja
Name: Name, dtype: object

The dataset shows that **Shedinja** is the Pokemon with the lowest Hit Points (1).

In [19]:
df['Defense'].max()

230

In [20]:
df.loc[df['Defense'] == df['Defense'].max(),'Type 1']

224    Steel
230      Bug
333    Steel
Name: Type 1, dtype: object

The highest Defense in the dataset (230) is shared by three Pokemon whose primary types are **Steel** (two of them) and **Bug**, so the top Defense values belong to Steel- and Bug-type Pokemon.

In [21]:
df['Attack'].max()

190

In [22]:
df.loc[df['Attack'] == df['Attack'].max(),'Type 1']

163    Psychic
Name: Type 1, dtype: object

The highest Attack in the dataset (190) belongs to a single Pokemon whose primary type is **Psychic**.

In [23]:
df['Sp. Atk'].max()

194

In [24]:
df.loc[df['Sp. Atk'] == df['Sp. Atk'].max(),'Type 1']

164    Psychic
Name: Type 1, dtype: object

The highest Special Attack in the dataset (194) also belongs to a Pokemon whose primary type is **Psychic**.

In [25]:
df['Sp. Def'].max()

230

In [26]:
df.loc[df['Sp. Def'] == df['Sp. Def'].max(),'Type 1']

230    Bug
Name: Type 1, dtype: object

The highest Special Defense in the dataset (230) belongs to a Pokemon whose primary type is **Bug**, making it the best at withstanding an opponent's special attacks.

In [27]:
df['Speed'].max()

180

In [28]:
df.loc[df['Speed'] == df['Speed'].max(),'Type 1']

431    Psychic
Name: Type 1, dtype: object

The fastest Pokemon in the dataset (Speed 180) has the primary type **Psychic**, so it would attack first in each round.

In [29]:
df['Speed'].min()

5

In [30]:
df.loc[df['Speed'] == df['Speed'].min(),'Type 1']

230       Bug
495    Normal
Name: Type 1, dtype: object

The slowest Pokemon in the dataset (Speed 5) have the primary types **Bug** and **Normal**, so they would attack last in each round.
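
The per-stat lookups above can be collapsed into one loop. A single record holder also says little about its type as a whole, so a per-type mean is a useful companion. The sketch below shows both ideas on six rows copied from the preview, not on the full dataset, so its results are illustrative only:

```python
import pandas as pd

stat_columns = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# Six rows copied from the dataset preview (illustrative sample only)
sample = pd.DataFrame(
    [
        ["Bulbasaur", "Grass", 45, 49, 49, 65, 65, 45],
        ["VenusaurMega Venusaur", "Grass", 80, 100, 123, 122, 120, 80],
        ["Charmander", "Fire", 39, 52, 43, 60, 50, 65],
        ["CharizardMega Charizard X", "Fire", 78, 130, 111, 130, 85, 100],
        ["CharizardMega Charizard Y", "Fire", 78, 104, 78, 159, 115, 100],
        ["Squirtle", "Water", 44, 48, 65, 50, 64, 43],
    ],
    columns=["Name", "Type 1"] + stat_columns,
)

# For every stat, collect all rows tied for the maximum value
record_holders = {}
for stat in stat_columns:
    is_max = sample[stat] == sample[stat].max()
    record_holders[stat] = sample.loc[is_max, "Name"].tolist()

# Mean Attack per primary type describes the type, not one outlier
type_mean_attack = sample.groupby("Type 1")["Attack"].mean()

print(record_holders)
print(type_mean_attack)
```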

## 4. Data Cleaning

#### 4.1 Renaming Columns

In [31]:
# check the column names
df.columns

Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
       'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')

In [32]:
# rename columns
df = df.rename(
  columns={'#': "ID", "Name": "Pokemon", "Type 1": "Main Type", 'Type 2': "Second Type", 'Total': "Full Power", "HP": "Hit Points", "Sp. Atk": "Special Attack", "Sp. Def": "Special Defense"}
)

In [33]:
df.head()

Unnamed: 0,ID,Pokemon,Main Type,Second Type,Full Power,Hit Points,Attack,Defense,Special Attack,Special Defense,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


#### 4.2 Missing Values

In [34]:
# check for unexpected missing values by printing the first 10 unique values of each column
for var in df.columns:
    print(var, df[var].unique()[0:10], '\n')

ID [ 1  2  3  4  5  6  7  8  9 10] 

Pokemon ['Bulbasaur' 'Ivysaur' 'Venusaur' 'VenusaurMega Venusaur' 'Charmander'
 'Charmeleon' 'Charizard' 'CharizardMega Charizard X'
 'CharizardMega Charizard Y' 'Squirtle'] 

Main Type ['Grass' 'Fire' 'Water' 'Bug' 'Normal' 'Poison' 'Electric' 'Ground'
 'Fairy' 'Fighting'] 

Second Type ['Poison' nan 'Flying' 'Dragon' 'Ground' 'Fairy' 'Grass' 'Fighting'
 'Psychic' 'Steel'] 

Full Power [318 405 525 625 309 534 634 314 530 630] 

Hit Points [45 60 80 39 58 78 44 59 79 50] 

Attack [ 49  62  82 100  52  64  84 130 104  48] 

Defense [ 49  63  83 123  43  58  78 111  65  80] 

Special Attack [ 65  80 100 122  60 109 130 159  50  85] 

Special Defense [ 65  80 100 120  50  85 115  64 105  20] 

Speed [ 45  60  80  65 100  43  58  78  30  70] 

Generation [1 2 3 4 5 6] 

Legendary [False  True] 



In [35]:
# count the missing values in each column
df.isna().sum()

ID                   0
Pokemon              0
Main Type            0
Second Type        386
Full Power           0
Hit Points           0
Attack               0
Defense              0
Special Attack       0
Special Defense      0
Speed                0
Generation           0
Legendary            0
dtype: int64

In [36]:
# fraction of missing values in each column
df.isnull().mean()

ID                 0.0000
Pokemon            0.0000
Main Type          0.0000
Second Type        0.4825
Full Power         0.0000
Hit Points         0.0000
Attack             0.0000
Defense            0.0000
Special Attack     0.0000
Special Defense    0.0000
Speed              0.0000
Generation         0.0000
Legendary          0.0000
dtype: float64

The `Second Type` column is missing in **48.25%** of the rows (386 of 800); every other column is complete.

In [37]:
# check whether the missing values are standard or non-standard missing values
df['Second Type'].unique()

array(['Poison', nan, 'Flying', 'Dragon', 'Ground', 'Fairy', 'Grass',
       'Fighting', 'Psychic', 'Steel', 'Ice', 'Rock', 'Dark', 'Water',
       'Electric', 'Fire', 'Ghost', 'Bug', 'Normal'], dtype=object)

The missing values are **standard missing values**: they appear as `nan` rather than as placeholder strings.
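
For contrast, non-standard missing values are placeholders that pandas does not recognize on its own. The sketch below uses a hypothetical CSV in which missing second types are written as `--` or `unknown`; these placeholders are assumptions for illustration and do not occur in `Pokemon.csv`:

```python
import io
import pandas as pd

# Hypothetical CSV: "--" and "unknown" stand in for missing values
raw_csv = (
    "Name,Type 1,Type 2\n"
    "Bulbasaur,Grass,Poison\n"
    "Charmander,Fire,--\n"
    "Squirtle,Water,unknown\n"
)

# Without na_values, the placeholders are read as ordinary strings
df_plain = pd.read_csv(io.StringIO(raw_csv))
missing_plain = int(df_plain["Type 2"].isna().sum())

# na_values lists extra strings that read_csv should treat as missing
df_marked = pd.read_csv(io.StringIO(raw_csv), na_values=["--", "unknown"])
missing_marked = int(df_marked["Type 2"].isna().sum())

print(missing_plain, missing_marked)
```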

In [38]:
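# fill the missing values with 'Missing'
# (assigning the result back avoids the chained inplace call, which recent pandas versions warn about)
df['Second Type'] = df['Second Type'].fillna('Missing')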

In [39]:
# count the missing values again after filling
df.isna().sum()

ID                 0
Pokemon            0
Main Type          0
Second Type        0
Full Power         0
Hit Points         0
Attack             0
Defense            0
Special Attack     0
Special Defense    0
Speed              0
Generation         0
Legendary          0
dtype: int64

#### 4.3 Dropping Unused Columns

In [40]:
# list the columns to be dropped
del_columns = ['Generation', 'Legendary']

# drop the unused columns
df.drop(del_columns, inplace=True, axis=1)

In [41]:
# show the dataset after dropping the unused columns
df

Unnamed: 0,ID,Pokemon,Main Type,Second Type,Full Power,Hit Points,Attack,Defense,Special Attack,Special Defense,Speed
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80
4,4,Charmander,Fire,Missing,309,39,52,43,60,50,65
...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80


## 5. Data Exploration

For this exploration, the **"Hit Points"** feature is split into three categories, **"High"**, **"Medium"**, and **"Low"**, based on its quantile values.

In [42]:
# threshold for the 'High' category of "Hit Points" (75th percentile)
df['Hit Points'].quantile([0.75])

0.75    80.0
Name: Hit Points, dtype: float64

In [43]:
# threshold for the 'Low' category of "Hit Points" (25th percentile)
df['Hit Points'].quantile([0.25])

0.25    50.0
Name: Hit Points, dtype: float64

Based on the quantile values above, the "Hit Points" categories are defined as follows:
* **High_Hit**: "Hit Points" of **80 or above**
* **Medium_Hit**: "Hit Points" from **51 to 79**
* **Low_Hit**: "Hit Points" of **50 or below**
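
The same three categories can also be attached to each row as a label instead of being split into separate frames. The following is a minimal sketch using `pd.cut`, with the boundaries used by the queries in section 5.1 (80 or above is High, 50 or below is Low). The six values are made up to sit on both sides of each boundary:

```python
import numpy as np
import pandas as pd

# Made-up Hit Points values chosen to sit on both sides of each boundary
hit_points = pd.Series([45, 50, 51, 79, 80, 255], name="Hit Points")

# pd.cut uses right-closed bins by default: (-inf, 50], (50, 79], (79, inf)
# This matches the integer-valued "Hit Points" column
hit_category = pd.cut(
    hit_points,
    bins=[-np.inf, 50, 79, np.inf],
    labels=["Low_Hit", "Medium_Hit", "High_Hit"],
)

print(hit_category.tolist())
```

On the notebook's data, the result could be stored as a new column, for example `df["Hit Category"]` (a hypothetical column name), and then used directly in `groupby`.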

#### 5.1 Query

In [44]:
df_query = df.drop('ID', axis=1)
df_query

Unnamed: 0,Pokemon,Main Type,Second Type,Full Power,Hit Points,Attack,Defense,Special Attack,Special Defense,Speed
0,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45
1,Ivysaur,Grass,Poison,405,60,62,63,80,80,60
2,Venusaur,Grass,Poison,525,80,82,83,100,100,80
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80
4,Charmander,Fire,Missing,309,39,52,43,60,50,65
...,...,...,...,...,...,...,...,...,...,...
795,Diancie,Rock,Fairy,600,50,100,150,100,150,50
796,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110
797,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70
798,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80


In [45]:
# query the high_hit category: "Hit Points" of 80 or above
high_hit = df_query[df_query['Hit Points'] >= 80]
high_hit

Unnamed: 0,Pokemon,Main Type,Second Type,Full Power,Hit Points,Attack,Defense,Special Attack,Special Defense,Speed
2,Venusaur,Grass,Poison,525,80,82,83,100,100,80
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80
22,Pidgeot,Normal,Flying,479,83,80,75,70,70,101
23,PidgeotMega Pidgeot,Normal,Flying,579,83,80,80,135,80,121
36,Nidoqueen,Poison,Ground,505,90,92,87,75,85,76
...,...,...,...,...,...,...,...,...,...,...
793,Yveltal,Dark,Flying,680,126,131,95,131,98,99
794,Zygarde50% Forme,Dragon,Ground,600,108,100,121,81,95,95
797,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70
798,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80


In [46]:
# query the medium_hit category: "Hit Points" from 51 to 79
medium_hit = df_query.loc[ (df_query['Hit Points'] > 50) & (df_query['Hit Points'] < 80)]
medium_hit

Unnamed: 0,Pokemon,Main Type,Second Type,Full Power,Hit Points,Attack,Defense,Special Attack,Special Defense,Speed
1,Ivysaur,Grass,Poison,405,60,62,63,80,80,60
5,Charmeleon,Fire,Missing,405,58,64,58,80,65,80
6,Charizard,Fire,Flying,534,78,84,78,109,85,100
7,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100
8,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100
...,...,...,...,...,...,...,...,...,...,...
783,PumpkabooSuper Size,Ghost,Grass,335,59,66,70,44,55,41
784,GourgeistAverage Size,Ghost,Grass,494,65,90,122,58,75,84
785,GourgeistSmall Size,Ghost,Grass,494,55,85,122,58,75,99
786,GourgeistLarge Size,Ghost,Grass,494,75,95,122,58,75,69


In [47]:
# query the low_hit category: "Hit Points" of 50 or below
low_hit = df_query.loc[df_query['Hit Points'] <= 50 ]
low_hit

Unnamed: 0,Pokemon,Main Type,Second Type,Full Power,Hit Points,Attack,Defense,Special Attack,Special Defense,Speed
0,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45
4,Charmander,Fire,Missing,309,39,52,43,60,50,65
9,Squirtle,Water,Missing,314,44,48,65,50,64,43
13,Caterpie,Bug,Missing,195,45,30,35,20,20,45
14,Metapod,Bug,Missing,205,50,20,55,25,25,30
...,...,...,...,...,...,...,...,...,...,...
780,PumpkabooAverage Size,Ghost,Grass,335,49,66,70,44,55,51
781,PumpkabooSmall Size,Ghost,Grass,335,44,66,70,44,55,56
790,Noibat,Flying,Dragon,245,40,30,35,45,40,55
795,Diancie,Rock,Fairy,600,50,100,150,100,150,50


#### 5.2 Grouping

In [48]:
# group by multiple columns: count the high-HP Pokemon for each type combination
df[
    df["Hit Points"] >= 80
].groupby(["Main Type", "Second Type"])["Pokemon"].count()
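
As a further sketch of grouping on this dataset's own columns, named aggregation can return several summaries per group in one call. The six rows below are copied from the preview, and the output column names `pokemon_count` and `mean_hit_points` are hypothetical:

```python
import pandas as pd

# Six rows copied from the dataset preview (illustrative sample only)
sample = pd.DataFrame({
    "Pokemon": ["Bulbasaur", "Ivysaur", "Venusaur",
                "Charmander", "Charmeleon", "Squirtle"],
    "Main Type": ["Grass", "Grass", "Grass", "Fire", "Fire", "Water"],
    "Hit Points": [45, 60, 80, 39, 58, 44],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = sample.groupby("Main Type").agg(
    pokemon_count=("Pokemon", "count"),
    mean_hit_points=("Hit Points", "mean"),
)
print(summary)
```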

#### 5.3 Visualisasi 

In [None]:
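# numeric_only=True skips the text columns (Pokemon, Main Type, Second Type)
hit_points = pd.DataFrame({"df": df_query.mean(numeric_only=True),
                     "high_hit": high_hit.mean(numeric_only=True),
                     "medium_hit": medium_hit.mean(numeric_only=True),
                     "low_hit": low_hit.mean(numeric_only=True)
                    })

hit_points.plot(kind='bar', figsize=(15,8));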
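
Because `Full Power` is the sum of the other stats, its bars are several times taller than the rest and compress the chart. One possible variation, sketched here on five rows copied from the outputs above, builds the same table of means and drops `Full Power` before plotting:

```python
import pandas as pd

# Five rows copied from the outputs above (illustrative sample only)
sample = pd.DataFrame({
    "Pokemon": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Metapod"],
    "Full Power": [318, 405, 525, 309, 205],
    "Hit Points": [45, 60, 80, 39, 50],
    "Attack": [49, 62, 82, 52, 20],
    "Defense": [49, 63, 83, 43, 55],
})

hp = sample["Hit Points"]
groups = {
    "all": sample,
    "high_hit": sample[hp >= 80],
    "medium_hit": sample[(hp > 50) & (hp < 80)],
    "low_hit": sample[hp <= 50],
}

# One column of per-stat means for each group; text columns are skipped
mean_table = pd.DataFrame(
    {name: frame.mean(numeric_only=True) for name, frame in groups.items()}
)

# Drop the summed stat so the individual stats share a comparable scale
mean_table = mean_table.drop(index="Full Power")
print(mean_table)
```

Calling `mean_table.plot(kind="bar")` would then draw one group of bars per stat.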

## 6. Conclusions

* Handled missing values in the dataset
* Manipulated columns (renaming and dropping)
* Queried the dataset
* Grouped the dataset
* Created a simple visualization with pandas

Analysis conclusion: find which Pokemon have both high Attack and high Defense values, then visualize them.
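
One way to approach that last step is sketched below on the first ten rows of the dataset, copied from the preview. Treating "high" as at or above the 75th percentile of each stat is an assumption made here for illustration, not a rule taken from the analysis above:

```python
import pandas as pd

# First ten rows of the dataset, copied from the preview
sample = pd.DataFrame({
    "Pokemon": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur",
                "Charmander", "Charmeleon", "Charizard",
                "CharizardMega Charizard X", "CharizardMega Charizard Y",
                "Squirtle"],
    "Attack": [49, 62, 82, 100, 52, 64, 84, 130, 104, 48],
    "Defense": [49, 63, 83, 123, 43, 58, 78, 111, 78, 65],
})

# Assumption: "high" means at or above the 75th percentile of the stat
attack_cutoff = sample["Attack"].quantile(0.75)
defense_cutoff = sample["Defense"].quantile(0.75)

# Keep only the rows that clear both cutoffs
strong_both = sample[
    (sample["Attack"] >= attack_cutoff)
    & (sample["Defense"] >= defense_cutoff)
]
print(strong_both["Pokemon"].tolist())
```

On the full data, `sample` would be replaced by `df`, and `strong_both.plot(x="Pokemon", y=["Attack", "Defense"], kind="bar")` could serve as the visualization.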