<a href="https://colab.research.google.com/github/sahdahx/project-dsf24/blob/master/Sahdahx_DSF24.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Study Case Assignment - Pokemon Data Analysis Project**
This Pokemon data table contains detailed information about a set of Pokemon, with each row representing one Pokemon entry. The columns are described briefly below:

1. **#**: Identification number of the Pokemon. Alternate forms share the number of their base Pokemon (for example, Venusaur and Mega Venusaur both have # 3), so it is not unique per row.
2. **Name**: Name of the Pokemon.
3. **Type 1**: Primary type of the Pokemon (for example: Grass, Fire, Rock).
4. **Type 2**: Secondary type of the Pokemon (some Pokemon have only one type).
5. **Total**: Total of the Pokemon's stat values.
6. **HP (Hit Points)**: Health points, or how much damage the Pokemon can withstand.
7. **Attack**: Physical attack strength.
8. **Defense**: Physical defense, or resistance to physical attacks.
9. **Sp. Atk (Special Attack)**: Special attack strength.
10. **Sp. Def (Special Defense)**: Special defense, or resistance to special attacks.
11. **Speed**: Speed of the Pokemon, which affects the order of action in battle.
12. **Generation**: Generation in which the Pokemon was introduced.
13. **Legendary**: Legendary status; True if the Pokemon is legendary, False otherwise.

This table gives a comprehensive picture of each Pokemon's characteristics, including its types, stats, and legendary status. The data can be used to compare Pokemon with one another or to study trends within a generation.

The data table can be accessed [here](https://bit.ly/data-pokemon-dsf).

**Project by: Sahda Huwaidah Estiningtyas**



In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Data Analysis with Machine Learning Regression on the Pokemon Data
Machine Learning Regression analysis uses machine learning techniques, here linear regression, to understand and model the relationship between selected input variables and an output variable in the Pokemon dataset.

I analyze the Pokemon data with Linear Regression to build a model that predicts a Pokemon's 'Total' value from selected features. Linear Regression models a linear relationship between the independent variables (features) and the dependent variable (Total). The goal is a model that predicts 'Total' with high accuracy from the selected features.
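
As a minimal sketch of what fitting a linear model means, the example below recovers the parameters of a noise-free linear relationship with ordinary least squares. The data is synthetic, and the intercept and coefficient values (3, 2, -1) are assumptions chosen only for illustration; NumPy's `np.linalg.lstsq` stands in for scikit-learn's `LinearRegression` here.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the Pokemon dataset)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(50, 2))
true_intercept = 3.0
true_coefs = np.array([2.0, -1.0])
y_demo = true_intercept + X_demo @ true_coefs

# Prepend a column of ones so the first fitted parameter acts as the intercept
design = np.column_stack([np.ones(len(X_demo)), X_demo])
params, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

fitted_intercept, fitted_coefs = params[0], params[1:]
print("Intercept:", fitted_intercept)
print("Coefficients:", fitted_coefs)
```

Because the synthetic target contains no noise, the fitted parameters should match the assumed ones up to floating-point rounding.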

In [None]:
# Read the data
pokemon = pd.read_csv("Pokemon.csv")
pokemon

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
795,719,Diancie,Rock,Fairy,600,50,100,150,100,150,50,6,True
796,719,DiancieMega Diancie,Rock,Fairy,700,50,160,110,160,110,110,6,True
797,720,HoopaHoopa Confined,Psychic,Ghost,600,80,110,60,150,130,70,6,True
798,720,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.impute import SimpleImputer


In [None]:
# Feature selection: choose the variables used as model features.
features = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
X = pokemon[features]
y = pokemon['Total']

In [None]:
# Missing-value handling: fill any missing feature values with the column mean.
X = X.fillna(X.mean())
X

Unnamed: 0,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed
0,45,49,49,65,65,45
1,60,62,63,80,80,60
2,80,82,83,100,100,80
3,80,100,123,122,120,80
4,39,52,43,60,50,65
...,...,...,...,...,...,...
795,50,100,150,100,150,50
796,50,160,110,160,110,110
797,80,110,60,150,130,70
798,80,160,60,170,130,80


In [None]:
y = y.fillna(y.mean())
y

0      318
1      405
2      525
3      625
4      309
      ... 
795    600
796    700
797    600
798    680
799    600
Name: Total, Length: 800, dtype: int64
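
Before imputing, it can help to count the missing values per column. The sketch below does this on a small sample copied from rows shown in the table output earlier (Charmander has no second type); the name `sample` is hypothetical. The integer dtype of `y` in the output above suggests the numeric columns had no missing values, so `fillna` likely changed nothing in this notebook.

```python
import numpy as np
import pandas as pd

# Small sample copied from rows displayed earlier (not the full dataset)
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "Type 2": ["Poison", "Poison", np.nan],
    "HP": [45, 60, 39],
    "Attack": [49, 62, 52],
    "Total": [318, 405, 309],
})

# Count missing values in every column
missing_counts = sample.isna().sum()
print(missing_counts)

# Mean imputation only changes numeric columns that actually contain NaN
numeric = sample[["HP", "Attack", "Total"]]
imputed = numeric.fillna(numeric.mean())
unchanged = bool((imputed == numeric).all().all())
print("Imputation left the numeric columns unchanged:", unchanged)
```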

In [None]:
# Data split: divide the dataset into a training set and a test set.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Show the result of the split
print("X_train:")
print(X_train.head())
print("X_test:")
print(X_test.head())
print("y_train:")
print(y_train.head())
print("y_test:")
print(y_test.head())

X_train:
      HP  Attack  Defense  Sp. Atk  Sp. Def  Speed
264  100      75      115       90      115     85
615  105     140       55       30       55     95
329   50     105      125       55       95     50
342   65      73       55       47       75     85
394   95      23       48       23       48     23

X_test:
     HP  Attack  Defense  Sp. Atk  Sp. Def  Speed
696  92     105       90      125       90     98
667  75      75       75      125       95     40
63   55      70       45       70       50     60
533  50      65      107      105      107     86
66   65      65       65       50       50     90

y_train:
264    580
615    480
329    480
342    400
394    260
Name: Total, dtype: int64

y_test:
696    600
667    485
63     350
533    520
66     385
Name: Total, dtype: int64


In [None]:
# Linear Regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Show the fitted parameters
print("Coefficients:")
print(model.coef_)
print("Intercept:")
print(model.intercept_)

Coefficients:
[1. 1. 1. 1. 1. 1.]

Intercept:
2.2737367544323206e-13
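
Coefficients of 1 with an intercept near zero are what one would expect if `Total` is simply the sum of the six stats. The sketch below checks this on the first five rows displayed earlier in this notebook; `sample` is a hypothetical name, and the same check would have to be run on the full `pokemon` DataFrame to confirm it for all 800 rows.

```python
import pandas as pd

features = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

# Rows copied from the table output shown earlier (first five entries)
sample = pd.DataFrame(
    [
        ["Bulbasaur", 318, 45, 49, 49, 65, 65, 45],
        ["Ivysaur", 405, 60, 62, 63, 80, 80, 60],
        ["Venusaur", 525, 80, 82, 83, 100, 100, 80],
        ["VenusaurMega Venusaur", 625, 80, 100, 123, 122, 120, 80],
        ["Charmander", 309, 39, 52, 43, 60, 50, 65],
    ],
    columns=["Name", "Total"] + features,
)

# Compare the row-wise sum of the six stats with the Total column
stat_sum = sample[features].sum(axis=1)
matches = bool((stat_sum == sample["Total"]).all())
print(matches)  # True for these five rows
```

If the same holds for the full dataset, the perfect fit reflects an arithmetic identity between the features and the target rather than a pattern the model had to learn.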


In [None]:
# Prediction
y_pred = model.predict(X_test)
y_pred

array([600., 485., 350., 520., 385., 488., 467., 485., 320., 495., 300.,
       240., 290., 510., 480., 300., 465., 355., 460., 518., 305., 525.,
       490., 307., 340., 320., 490., 490., 466., 316., 435., 335., 264.,
       420., 505., 579., 484., 290., 700., 531., 200., 330., 530., 375.,
       294., 535., 670., 384., 360., 520., 300., 480., 335., 340., 510.,
       458., 194., 630., 250., 525., 285., 534., 470., 618., 500., 770.,
       395., 494., 470., 490., 680., 470., 600., 450., 485., 420., 490.,
       290., 505., 281., 600., 600., 305., 490., 510., 335., 494., 630.,
       520., 500., 305., 600., 405., 565., 567., 280., 580., 341., 450.,
       520., 540., 318., 305., 485., 390., 460., 305., 289., 390., 458.,
       680., 535., 500., 405., 530., 325., 525., 470., 210., 500., 236.,
       295., 490., 405., 430., 250., 545., 438., 410., 319., 520., 220.,
       390., 350., 430., 330., 306., 266., 395., 700., 525., 520., 479.,
       455., 634., 405., 500., 380., 464., 580., 49

In [None]:
# Model evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

Mean Squared Error: 9.784399579637254e-27
R-squared: 1.0
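
For reference, both metrics can be computed by hand. The sketch below uses NumPy with the first five test targets and predictions shown in the outputs above; the helper names `mse_manual` and `r2_manual` are hypothetical. An MSE on the order of 1e-27 is consistent with floating-point rounding rather than genuine prediction error.

```python
import numpy as np

def mse_manual(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2_manual(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# First five values of y_test and y_pred from the outputs above
y_true = [600, 485, 350, 520, 385]
y_hat = [600.0, 485.0, 350.0, 520.0, 385.0]

print("MSE:", mse_manual(y_true, y_hat))
print("R-squared:", r2_manual(y_true, y_hat))
```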


In [None]:
# Visualize predicted vs. actual values
import matplotlib.pyplot as plt  # imported here so this cell runs on its own

plt.scatter(y_test, y_pred, c='orange', marker='*')  # '*' draws star-shaped markers
plt.xlabel("Actual Total", c='magenta')
plt.ylabel("Predicted Total", c='magenta')
plt.title("Pokemon Total predictions using Linear Regression", c='purple')
plt.show()