<a href="https://colab.research.google.com/github/hamzaharmanhusni/ProjectSkillAcademyPro/blob/main/02_machine_learning_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project: Machine Learning

**Instructions for Students:**

Please carefully follow these steps to complete and submit your project:

1. **Completing the Project**: You are required to work on and complete all tasks in the provided project. Be disciplined and ensure that you thoroughly engage with each task.
   
2. **Creating a Google Drive Folder**: Each of you must create a new folder on your Google Drive if you haven't already. This will be the repository for all your completed assignment and project files, aiding you in keeping your work organized and accessible.
   
3. **Uploading Completed Project**: Upon completion of your project, make sure to upload all necessary files, including code, reports, and related documents, into the created Google Drive folder. Save this link in the 'Student Identity' section and also provide it as the last parameter in the `submit` function that has been provided.
   
4. **Sharing Folder Link**: You're required to share the link to your project Google Drive folder. This is crucial for the submission and evaluation of your project.
   
5. **Setting Permission to Public**: Please make sure your Google Drive folder is set to public. This allows your instructor to access your solutions and assess your work correctly.

Adhering to these procedures will facilitate a smooth project evaluation process for you and the reviewers.

## Student Identity

In [None]:
# @title #### Student Identity
student_id = "REAJGDG4" # @param {type:"string"}
name = "Hamzah Arman Husni" # @param {type:"string"}
drive_link = "https://colab.research.google.com/drive/1yDKPtjXUd8I15jsEwR_Sv0KMUDASr4Gs#scrollTo=9fc57472-0432-474a-b1f7-c825edfc007a"  # @param {type:"string"}

assignment_id = "00_ml_project"

# Import grader package
!pip install rggrader
from rggrader import submit, submit_image



## Project Description

In this Machine Learning Project, you will create your own supervised Machine Learning (ML) model. We will use the full FIFA21 Dataset and we will identify players that are above average.

We will use the column "Overall" with a threshold of 75 to define players that are 'Valuable'. This will become our target output, which we need for a supervised ML model. Because we use "Overall" as our target output, you cannot use "Overall" in your features; this is explained further below.

This project will provide a comprehensive overview of your abilities in machine learning, from understanding the problem, choosing the right model, training, and optimizing it.

## Grading Criteria

Your score will be awarded based on the following criteria:
* 100: The model has an accuracy of more than 80% and an F1 score of more than 85%. This model is excellent and demonstrates a strong understanding of the task.
* 90: The model has an accuracy of more than 75% and an F1 score of more than 80%. This model is very good, with some room for improvement.
* 80: The model has an accuracy of more than 70% and an F1 score between 70% and 80%. This model is fairly good but needs improvement in balancing precision and recall.
* 70: The model has an accuracy of more than 65% and an F1 score between 60% and 70%. This model is below average and needs significant improvement.
* 60 or below: The model has an accuracy of less than 65% or an F1 score of less than 60%, or the student did not submit the accuracy and F1 score. This model is poor and needs considerable improvement.

Remember to make a copy of this notebook in your Google Drive and work in your own copy.

Happy modeling!

>Note: If you get the accuracy of 100% and F1 score of 100%, while it may earn you good grades, it's an indication of overfitting.
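
One way to check whether a high score generalizes is k-fold cross-validation, which scores the model on several held-out folds instead of a single split. The following is a minimal sketch on synthetic data: the dataset from `make_classification`, the 5 folds, and the 50 trees are illustrative assumptions, not project requirements.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced binary data standing in for the real features (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# With an integer cv and a classifier, scikit-learn uses stratified folds.
scores = cross_val_score(model, X, y, cv=5, scoring="f1")
print("F1 per fold:", scores)
print("Mean F1:", scores.mean(), "Std:", scores.std())
```

If the fold scores are much lower than the score on the training data, or vary widely between folds, the model is probably fitting noise rather than a stable pattern.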

In [None]:
# Write any package/module installation that you need
# pip install goes here, this helps declutter your output below

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Load the dataset and clean it

In this task, you will prepare and load your dataset. You need to download the full FIFA 21 Dataset from the link here: [Kaggle FIFA Player Stats Database](https://www.kaggle.com/datasets/bryanb/fifa-player-stats-database?resource=download&select=FIFA21_official_data.csv).

>Note: Make sure you download the FIFA 21 dataset.
>
>![FIFA21 Dataset](https://storage.googleapis.com/rg-ai-bootcamp/projects/fifa21_dataset-min.png)

After downloading the dataset, import it and clean the data. For example, some cells may be empty and need to be filled, and some columns may need to be converted to numeric values before analysis. Identify the incomplete data and fix it.
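
As an illustration of converting text to numeric values, money columns in this dataset hold strings such as `€64.6M` or `€50K`. The sketch below turns them into plain numbers. The helper name `parse_euro` and the new column name are hypothetical, and the code assumes only the `M` and `K` suffixes occur.

```python
import pandas as pd

def parse_euro(value):
    """Convert strings such as '€64.6M' or '€50K' to a float amount in euros."""
    if pd.isna(value):
        return float("nan")
    text = str(value).replace("€", "").strip()
    multiplier = 1.0
    if text.endswith("M"):
        multiplier, text = 1_000_000.0, text[:-1]
    elif text.endswith("K"):
        multiplier, text = 1_000.0, text[:-1]
    return float(text) * multiplier

# Small sample using values that appear in the dataset preview, plus a missing cell.
sample = pd.DataFrame({"Release Clause": ["€64.6M", "€161M", "€50K", None]})
sample["Release Clause (EUR)"] = sample["Release Clause"].apply(parse_euro)
print(sample)
```

Missing cells stay as `NaN` after the conversion, so they can still be filled or dropped in a later step.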

In the code block below, you can use the comments to guide you on what to do.

In [None]:
df = pd.read_csv("FIFA21_official_data.csv")

In [None]:
df.head()

Unnamed: 0,ID,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club,Club Logo,...,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Best Position,Best Overall Rating,Release Clause,DefensiveAwareness
0,176580,L. Suárez,33,https://cdn.sofifa.com/players/176/580/20_60.png,Uruguay,https://cdn.sofifa.com/flags/uy.png,87,87,Atlético Madrid,https://cdn.sofifa.com/teams/240/light_30.png,...,38.0,27.0,25.0,31.0,33.0,37.0,ST,87.0,€64.6M,57.0
1,192985,K. De Bruyne,29,https://cdn.sofifa.com/players/192/985/20_60.png,Belgium,https://cdn.sofifa.com/flags/be.png,91,91,Manchester City,https://cdn.sofifa.com/teams/10/light_30.png,...,53.0,15.0,13.0,5.0,10.0,13.0,CAM,91.0,€161M,68.0
2,212198,Bruno Fernandes,25,https://cdn.sofifa.com/players/212/198/20_60.png,Portugal,https://cdn.sofifa.com/flags/pt.png,87,90,Manchester United,https://cdn.sofifa.com/teams/11/light_30.png,...,55.0,12.0,14.0,15.0,8.0,14.0,CAM,88.0,€124.4M,72.0
3,194765,A. Griezmann,29,https://cdn.sofifa.com/players/194/765/20_60.png,France,https://cdn.sofifa.com/flags/fr.png,87,87,FC Barcelona,https://cdn.sofifa.com/teams/241/light_30.png,...,49.0,14.0,8.0,14.0,13.0,14.0,ST,87.0,€103.5M,59.0
4,224334,M. Acuña,28,https://cdn.sofifa.com/players/224/334/20_60.png,Argentina,https://cdn.sofifa.com/flags/ar.png,83,83,Sevilla FC,https://cdn.sofifa.com/teams/481/light_30.png,...,79.0,8.0,14.0,13.0,13.0,14.0,LB,83.0,€46.2M,79.0


In [None]:
df.isnull().sum()

ID                        0
Name                      0
Age                       0
Photo                     0
Nationality               0
                       ... 
GKReflexes                0
Best Position             0
Best Overall Rating       0
Release Clause         1629
DefensiveAwareness      942
Length: 65, dtype: int64

In [None]:
# Detect the numeric columns
numeric_columns = df.select_dtypes(include=['number']).columns

# Fill null values in the numeric columns with the column mean
df[numeric_columns] = df[numeric_columns].fillna(df[numeric_columns].mean())

df.isnull().sum()

ID                        0
Name                      0
Age                       0
Photo                     0
Nationality               0
                       ... 
GKReflexes                0
Best Position             0
Best Overall Rating       0
Release Clause         1629
DefensiveAwareness        0
Length: 65, dtype: int64

In [None]:
nama_kolom = df.columns

print("Column names:")
print(nama_kolom)

Column names:
Index(['ID', 'Name', 'Age', 'Photo', 'Nationality', 'Flag', 'Overall',
       'Potential', 'Club', 'Club Logo', 'Value', 'Wage', 'Special',
       'Preferred Foot', 'International Reputation', 'Weak Foot',
       'Skill Moves', 'Work Rate', 'Body Type', 'Real Face', 'Position',
       'Jersey Number', 'Joined', 'Loaned From', 'Contract Valid Until',
       'Height', 'Weight', 'Crossing', 'Finishing', 'HeadingAccuracy',
       'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy',
       'LongPassing', 'BallControl', 'Acceleration', 'SprintSpeed', 'Agility',
       'Reactions', 'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength',
       'LongShots', 'Aggression', 'Interceptions', 'Positioning', 'Vision',
       'Penalties', 'Composure', 'Marking', 'StandingTackle', 'SlidingTackle',
       'GKDiving', 'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes',
       'Best Position', 'Best Overall Rating', 'Release Clause',
       'DefensiveAwareness'],
      dtype

In [None]:
# Drop the columns that still contain null values
df = df.dropna(axis=1)

# Display the DataFrame after dropping those columns
df

Unnamed: 0,ID,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club Logo,Value,...,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Best Position,Best Overall Rating,DefensiveAwareness
0,176580,L. Suárez,33,https://cdn.sofifa.com/players/176/580/20_60.png,Uruguay,https://cdn.sofifa.com/flags/uy.png,87,87,https://cdn.sofifa.com/teams/240/light_30.png,€31.5M,...,45.0,38.0,27.0,25.0,31.0,33.0,37.0,ST,87.0,57.000000
1,192985,K. De Bruyne,29,https://cdn.sofifa.com/players/192/985/20_60.png,Belgium,https://cdn.sofifa.com/flags/be.png,91,91,https://cdn.sofifa.com/teams/10/light_30.png,€87M,...,65.0,53.0,15.0,13.0,5.0,10.0,13.0,CAM,91.0,68.000000
2,212198,Bruno Fernandes,25,https://cdn.sofifa.com/players/212/198/20_60.png,Portugal,https://cdn.sofifa.com/flags/pt.png,87,90,https://cdn.sofifa.com/teams/11/light_30.png,€63M,...,67.0,55.0,12.0,14.0,15.0,8.0,14.0,CAM,88.0,72.000000
3,194765,A. Griezmann,29,https://cdn.sofifa.com/players/194/765/20_60.png,France,https://cdn.sofifa.com/flags/fr.png,87,87,https://cdn.sofifa.com/teams/241/light_30.png,€50.5M,...,54.0,49.0,14.0,8.0,14.0,13.0,14.0,ST,87.0,59.000000
4,224334,M. Acuña,28,https://cdn.sofifa.com/players/224/334/20_60.png,Argentina,https://cdn.sofifa.com/flags/ar.png,83,83,https://cdn.sofifa.com/teams/481/light_30.png,€22M,...,82.0,79.0,8.0,14.0,13.0,13.0,14.0,LB,83.0,79.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17103,247866,19 C. Miszta,16,https://cdn.sofifa.com/players/247/866/19_60.png,Poland,https://cdn.sofifa.com/flags/pl.png,50,70,https://cdn.sofifa.com/teams/1871/light_30.png,€50K,...,11.0,13.0,48.0,51.0,56.0,40.0,56.0,GK,49.0,47.014475
17104,251433,B. Voll,19,https://cdn.sofifa.com/players/251/433/20_60.png,Germany,https://cdn.sofifa.com/flags/de.png,51,63,https://cdn.sofifa.com/teams/27/light_30.png,€50K,...,12.0,10.0,52.0,55.0,52.0,42.0,57.0,GK,51.0,5.000000
17105,252420,T. Parker,18,https://cdn.sofifa.com/players/252/420/20_60.png,Northern Ireland,https://cdn.sofifa.com/flags/gb-nir.png,51,70,https://cdn.sofifa.com/teams/1923/light_30.png,€60K,...,10.0,11.0,50.0,49.0,50.0,53.0,55.0,GK,51.0,8.000000
17106,248182,H. Sveijer,18,https://cdn.sofifa.com/players/248/182/20_60.png,Sweden,https://cdn.sofifa.com/flags/se.png,49,63,https://cdn.sofifa.com/teams/113458/light_30.png,€50K,...,10.0,10.0,50.0,51.0,49.0,50.0,51.0,GK,49.0,8.000000


In [None]:
df.isnull().sum()

ID                          0
Name                        0
Age                         0
Photo                       0
Nationality                 0
Flag                        0
Overall                     0
Potential                   0
Club Logo                   0
Value                       0
Wage                        0
Special                     0
Preferred Foot              0
International Reputation    0
Weak Foot                   0
Skill Moves                 0
Work Rate                   0
Jersey Number               0
Height                      0
Weight                      0
Crossing                    0
Finishing                   0
HeadingAccuracy             0
ShortPassing                0
Volleys                     0
Dribbling                   0
Curve                       0
FKAccuracy                  0
LongPassing                 0
BallControl                 0
Acceleration                0
SprintSpeed                 0
Agility                     0
Reactions 

In [None]:
# Find the numeric columns (float64 and int64) to identify candidate features
kolom_float64 = df.select_dtypes(include=['float64','int64']).columns
kolom_float64

Index(['ID', 'Age', 'Overall', 'Potential', 'Special',
       'International Reputation', 'Weak Foot', 'Skill Moves', 'Jersey Number',
       'Crossing', 'Finishing', 'HeadingAccuracy', 'ShortPassing', 'Volleys',
       'Dribbling', 'Curve', 'FKAccuracy', 'LongPassing', 'BallControl',
       'Acceleration', 'SprintSpeed', 'Agility', 'Reactions', 'Balance',
       'ShotPower', 'Jumping', 'Stamina', 'Strength', 'LongShots',
       'Aggression', 'Interceptions', 'Positioning', 'Vision', 'Penalties',
       'Composure', 'Marking', 'StandingTackle', 'SlidingTackle', 'GKDiving',
       'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes',
       'Best Overall Rating', 'DefensiveAwareness'],
      dtype='object')

In [None]:
df['Agility'] # Check the Agility values to see whether they can be used as a feature. They can.

0        76.0
1        78.0
2        79.0
3        91.0
4        82.0
         ... 
17103    22.0
17104    27.0
17105    28.0
17106    33.0
17107    39.0
Name: Agility, Length: 17108, dtype: float64

In [None]:
# Double-check whether any column still contains null values
kolom_null = np.array(df.columns[df.isnull().any()])

print("Columns with null values:")
print(kolom_null)

Columns with null values:
[]


In [None]:
# Double-check the columns that will be used as features
# Select the columns with dtype int64
kolom_int64 = df.select_dtypes(include='int64')
print("Columns with dtype int64:")
print(kolom_int64)

# Select the columns with dtype float64
kolom_float64 = df.select_dtypes(include='float64')

print("\nColumns with dtype float64:")
print(kolom_float64)

Columns with dtype int64:
           ID  Age  Overall  Potential  Special
0      176580   33       87         87     2316
1      192985   29       91         91     2304
2      212198   25       87         90     2303
3      194765   29       87         87     2288
4      224334   28       83         83     2280
...       ...  ...      ...        ...      ...
17103  247866   16       50         70      766
17104  251433   19       51         63      760
17105  252420   18       51         70      753
17106  248182   18       49         63      747
17107  245862   18       47         65      731

[17108 rows x 5 columns]

Columns with dtype float64:
       International Reputation  Weak Foot  Skill Moves  Jersey Number  \
0                           5.0        4.0          3.0            9.0   
1                           4.0        5.0          4.0           17.0   
2                           2.0        4.0          4.0           18.0   
3                           4.0        

## Build and Train your model

In this task you will analyze the data and select the features that are best at predicting whether a player is a 'Valuable' player or not.

The first step is to **define the target output** that you will use for training. Here's an example of how to create a target output:
- `df['OK Player'] = df['Overall'].apply(lambda x: 1 if x >= 50 else 0) # Define the OK Player using a threshold of 50.`
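
The same idea with this project's threshold of 75 can be written as a vectorized comparison instead of `apply`. This is a small sketch on a toy DataFrame; the column name `Valuable` and the sample ratings are assumptions chosen for illustration.

```python
import pandas as pd

# Toy ratings; in the notebook this would be df['Overall'] from the FIFA 21 data.
toy = pd.DataFrame({"Overall": [91, 87, 75, 74, 51]})

THRESHOLD = 75  # project threshold for a 'Valuable' player

# The comparison yields booleans; astype(int) turns them into 1/0 labels.
toy["Valuable"] = (toy["Overall"] >= THRESHOLD).astype(int)

print(toy)
# Share of each class, useful for spotting class imbalance early.
print(toy["Valuable"].value_counts(normalize=True))
```

Checking the class shares right after defining the target helps explain later why accuracy and F1 can differ noticeably.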

Next you will **identify the features** that will best predict a 'Valuable' player. You are required to **submit the features you selected** in the Submission section below. Because we use the "Overall" as our target output, the use of "Overall" in your features is not allowed. You will automatically get 0 if you submit "Overall" in your features.
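
One simple way to screen candidate features is to rank them by their absolute correlation with the target while leaving out "Overall" itself. The sketch below uses synthetic data, so the numbers carry no meaning for the real dataset. The way the toy rating is generated, the 80th-percentile cutoff, and the column name `Valuable` are all assumptions made for the demonstration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "Reactions": rng.normal(60, 10, n),
    "Composure": rng.normal(58, 12, n),
    "Jersey Number": rng.integers(1, 99, n).astype(float),
})

# Synthetic rating driven by Reactions and Composure only (assumption for the demo).
toy["Overall"] = 0.6 * toy["Reactions"] + 0.4 * toy["Composure"] + rng.normal(0, 3, n)
toy["Valuable"] = (toy["Overall"] >= toy["Overall"].quantile(0.8)).astype(int)

# Never offer the target, or the column it was derived from, as a feature.
excluded = {"Overall", "Valuable"}
candidates = [col for col in toy.columns if col not in excluded]

ranking = toy[candidates].corrwith(toy["Valuable"]).abs().sort_values(ascending=False)
print(ranking)
```

Correlation only captures linear, one-at-a-time relationships, so it is a first filter rather than a final selection method.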

Once you identify the features, you will then **split the data** into Training set and Testing/Validation set.
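
When the positive class is rare, passing `stratify` to `train_test_split` keeps the class proportions the same in both sets. This is a minimal sketch with synthetic arrays; the 10% positive rate and the 80/20 split are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = np.array([1] * 100 + [0] * 900)  # 10% positives, an imbalanced target

# stratify=y preserves the 10% positive rate in both the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train positives:", y_train.mean())
print("Test positives:", y_test.mean())
```

Fixing `random_state` makes the split reproducible, which matters when comparing models across runs.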

Depending on the features you selected, **you may need to scale the features**.
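
Scaling matters mostly for models that depend on distances or gradient-based optimization, such as logistic regression; tree-based models such as random forests are largely insensitive to it. The sketch below wraps the scaler and the model in a pipeline so the scaler is fitted on the training data only. The synthetic two-feature data and the choice of `LogisticRegression` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two features on very different scales (roughly 0-100 versus roughly 1000-3000).
X = np.column_stack([rng.normal(60, 10, 400), rng.normal(2000, 300, 400)])
y = (X[:, 0] + X[:, 1] / 30 > 126).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# fit() learns the scaling statistics from X_train only; score() reuses them on X_test.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
print("Test accuracy:", pipe.score(X_test, y_test))
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training.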

Now you will **train your model**. **Choose the algorithm** you are going to use carefully to make sure it gives the best result.

Once you have trained your model, you need to test the model's effectiveness. **Make predictions against your Testing/Validation set** and evaluate your model. You are required to **submit the Accuracy Score and F1 score** in the Submission section below.
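
To see how accuracy and F1 relate, it helps to look at a confusion matrix for a small hand-made example. The labels below are invented for illustration and are not taken from the project data.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Ten hypothetical players: 3 truly valuable (1) and 7 not (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
accuracy = accuracy_score(y_true, y_hat)
f1 = f1_score(y_true, y_hat)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Accuracy: {accuracy:.3f}")  # (TP + TN) / total
print(f"F1 score: {f1:.3f}")        # 2*TP / (2*TP + FP + FN)
```

Here accuracy is 0.8 while F1 is about 0.667. With an imbalanced target, the many correct negatives raise accuracy, whereas F1 only looks at how well the positive class is handled.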

In the code block below, you can use the comments to guide you on what to do.

We have also provided 3 variables that you must use in your code, `ml_features`, `ml_accuracy` and `ml_f1_score`. You can move the variables around your code, assign values to them, but you cannot delete them.

In [None]:
# Write your code here

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# Define the target output (Good >= 75)
df['Good Player'] = df['Overall'].apply(lambda x: 1 if x >= 75 else 0)

# Identify the features you will use in your model
ml_features  = ['Crossing', 'Finishing', 'HeadingAccuracy', 'ShortPassing', 'Volleys',
                'Dribbling', 'Curve', 'FKAccuracy', 'LongPassing', 'BallControl',
                'Acceleration', 'SprintSpeed', 'Agility', 'Reactions', 'Balance',
                'ShotPower', 'Jumping', 'Stamina', 'Strength', 'LongShots',
                'Aggression', 'Interceptions', 'Positioning', 'Vision', 'Penalties',
                'Composure', 'Marking', 'StandingTackle', 'SlidingTackle', 'GKDiving',
                'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes']


feature_columns = df[ml_features].astype(int)  # Convert to int

target_variable = df['Good Player']

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(feature_columns, target_variable, test_size=0.2, random_state=42)

# Scale the features (if needed, optional)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the model
model = RandomForestClassifier(random_state=42, n_estimators=1000)
model.fit(X_train_scaled, y_train)

# Make predictions using the test set
y_pred = model.predict(X_test_scaled)
print(f"Predictions: {y_pred}")
# Evaluate the model
ml_accuracy = accuracy_score(y_test, y_pred)
ml_f1_score = f1_score(y_test, y_pred)

# Display the evaluation metrics
print(f"Accuracy Score: {ml_accuracy}")
print(f"F1 Score: {ml_f1_score}")


Predictions: [0 0 0 ... 1 0 0]
Accuracy Score: 0.9742840444184687
F1 Score: 0.9004524886877828


In [None]:
y_pred.shape


(3422,)

In [None]:
X_test

Unnamed: 0,Crossing,Finishing,HeadingAccuracy,ShortPassing,Volleys,Dribbling,Curve,FKAccuracy,LongPassing,BallControl,...,Penalties,Composure,Marking,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes
3946,47,50,68,74,45,59,49,47,70,64,...,50,60,46,69,70,13,10,15,6,6
13607,21,45,65,64,26,45,24,21,54,55,...,37,54,46,65,64,5,15,10,6,7
4156,65,60,40,68,61,75,71,73,57,74,...,62,78,46,30,26,12,8,12,6,9
12581,33,68,70,54,64,53,33,28,48,56,...,67,69,46,12,11,9,9,10,13,7
17104,8,9,11,23,7,11,11,11,17,14,...,7,24,46,12,10,52,55,52,42,57
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10673,59,35,53,62,32,58,42,38,62,62,...,47,51,46,62,56,10,7,9,15,7
1602,73,71,57,71,74,82,76,75,66,78,...,66,80,46,35,28,12,6,9,15,8
457,73,72,75,74,64,76,74,57,68,77,...,71,74,46,61,57,10,6,12,7,15
4789,57,75,78,60,71,70,48,31,46,73,...,64,67,46,39,36,7,15,11,9,11


In [None]:
Tabel_1 = pd.merge(X_test, df["Name"],  left_index=True, right_index=True)
Tabel_1 = pd.merge(Tabel_1, df["Overall"], left_index=True, right_index=True)
Tabel_1 = pd.merge(Tabel_1, df['Good Player'], left_index=True, right_index=True)
Tabel_1

Unnamed: 0,Crossing,Finishing,HeadingAccuracy,ShortPassing,Volleys,Dribbling,Curve,FKAccuracy,LongPassing,BallControl,...,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Name,Overall,Good Player
3946,47,50,68,74,45,59,49,47,70,64,...,69,70,13,10,15,6,6,D. N'Dinga,71,0
13607,21,45,65,64,26,45,24,21,54,55,...,65,64,5,15,10,6,7,J. Cascante,65,0
4156,65,60,40,68,61,75,71,73,57,74,...,30,26,12,8,12,6,9,J. Iturbe,72,0
12581,33,68,70,54,64,53,33,28,48,56,...,12,11,9,9,10,13,7,20 Jon Ander,65,0
17104,8,9,11,23,7,11,11,11,17,14,...,12,10,52,55,52,42,57,B. Voll,51,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10673,59,35,53,62,32,58,42,38,62,62,...,62,56,10,7,9,15,7,E. Abedini,63,0
1602,73,71,57,71,74,82,76,75,66,78,...,35,28,12,6,9,15,8,Jovane Cabral,77,1
457,73,72,75,74,64,76,74,57,68,77,...,61,57,10,6,12,7,15,A. Ayew,76,1
4789,57,75,78,60,71,70,48,31,46,73,...,39,36,7,15,11,9,11,Deyverson,74,0


In [None]:
# Attach the predictions to the test-set table and add a readable category label
Tabel = Tabel_1.reset_index(drop=True)
Tabel.insert(loc=len(Tabel.columns), column='y_pred', value=y_pred)
Tabel['Kategori'] = Tabel['y_pred'].replace({0: 'Kurang Bagus', 1: 'Bagus'})
Tabel

Unnamed: 0,Crossing,Finishing,HeadingAccuracy,ShortPassing,Volleys,Dribbling,Curve,FKAccuracy,LongPassing,BallControl,...,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Name,Overall,Good Player,y_pred,Kategori
0,47,50,68,74,45,59,49,47,70,64,...,13,10,15,6,6,D. N'Dinga,71,0,0,Kurang Bagus
1,21,45,65,64,26,45,24,21,54,55,...,5,15,10,6,7,J. Cascante,65,0,0,Kurang Bagus
2,65,60,40,68,61,75,71,73,57,74,...,12,8,12,6,9,J. Iturbe,72,0,0,Kurang Bagus
3,33,68,70,54,64,53,33,28,48,56,...,9,9,10,13,7,20 Jon Ander,65,0,0,Kurang Bagus
4,8,9,11,23,7,11,11,11,17,14,...,52,55,52,42,57,B. Voll,51,0,0,Kurang Bagus
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3417,59,35,53,62,32,58,42,38,62,62,...,10,7,9,15,7,E. Abedini,63,0,0,Kurang Bagus
3418,73,71,57,71,74,82,76,75,66,78,...,12,6,9,15,8,Jovane Cabral,77,1,1,Bagus
3419,73,72,75,74,64,76,74,57,68,77,...,10,6,12,7,15,A. Ayew,76,1,1,Bagus
3420,57,75,78,60,71,70,48,31,46,73,...,7,15,11,9,11,Deyverson,74,0,0,Kurang Bagus


In [None]:
# Reorder the columns so that 'Name' becomes the first column
new_order = ['Name'] + [col for col in Tabel.columns if col != 'Name']
Tabel = Tabel[new_order].copy()  # copy so the assignment below does not trigger SettingWithCopyWarning
Tabel['Kebenaran'] = Tabel['Good Player'] == Tabel['y_pred']
# Display the DataFrame with the new column order
Tabel


Unnamed: 0,Name,Crossing,Finishing,HeadingAccuracy,ShortPassing,Volleys,Dribbling,Curve,FKAccuracy,LongPassing,...,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Overall,Good Player,y_pred,Kategori,Kebenaran
0,D. N'Dinga,47,50,68,74,45,59,49,47,70,...,13,10,15,6,6,71,0,0,Kurang Bagus,True
1,J. Cascante,21,45,65,64,26,45,24,21,54,...,5,15,10,6,7,65,0,0,Kurang Bagus,True
2,J. Iturbe,65,60,40,68,61,75,71,73,57,...,12,8,12,6,9,72,0,0,Kurang Bagus,True
3,20 Jon Ander,33,68,70,54,64,53,33,28,48,...,9,9,10,13,7,65,0,0,Kurang Bagus,True
4,B. Voll,8,9,11,23,7,11,11,11,17,...,52,55,52,42,57,51,0,0,Kurang Bagus,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3417,E. Abedini,59,35,53,62,32,58,42,38,62,...,10,7,9,15,7,63,0,0,Kurang Bagus,True
3418,Jovane Cabral,73,71,57,71,74,82,76,75,66,...,12,6,9,15,8,77,1,1,Bagus,True
3419,A. Ayew,73,72,75,74,64,76,74,57,68,...,10,6,12,7,15,76,1,1,Bagus,True
3420,Deyverson,57,75,78,60,71,70,48,31,46,...,7,15,11,9,11,74,0,0,Kurang Bagus,True


In [None]:
Tabel["Kebenaran"].value_counts()  # 3334 predictions are correct and 88 are incorrect.

True     3334
False      88
Name: Kebenaran, dtype: int64
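
These counts can be cross-checked against the accuracy reported earlier, since accuracy is the number of correct predictions divided by the total. A short arithmetic check using only the numbers shown above:

```python
correct, incorrect = 3334, 88

total = correct + incorrect   # size of the test set
accuracy = correct / total    # fraction of correct predictions

print("Test rows:", total)
print("Accuracy:", accuracy)
```

The total matches the 3422 rows of `y_pred`, and the ratio agrees with the accuracy score of about 0.9743 printed by the training cell.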

In [None]:
# Predict for a sample with high feature values
import random
angka_acak = [[random.randint(80, 100) for _ in range(34)]]
angka_acak = scaler.transform(angka_acak)
angka_acak



array([[1.87117211, 2.27642456, 1.97278913, 2.17895228, 2.6079342 ,
        1.45056876, 2.0747666 , 2.69167974, 2.39702478, 2.469822  ,
        1.80868854, 1.62880973, 1.64297091, 2.09385504, 2.16460334,
        1.82857793, 2.35728871, 1.55220664, 1.8160455 , 2.51945762,
        1.53863639, 1.72445288, 2.03213933, 3.05116728, 2.42008646,
        2.14881572, 6.67957183, 1.66059545, 1.74375219, 4.03705719,
        4.40183981, 4.41840667, 4.39692549, 4.41998242]])

In [None]:
# Good player
if model.predict(angka_acak)[0] == 1:
  print("Good performance")
else:
  print("Poor performance")

Good performance


In [None]:
# Predict for a sample with low feature values

angka_acak_2 = [[random.randint(30, 75) for _ in range(34)]]
# Scale the new low-valued sample (not the already-scaled angka_acak)
angka_acak_2 = scaler.transform(angka_acak_2)
angka_acak_2



array([[-2.75818853, -2.33216058, -3.01300733, -4.13082842, -2.37337339,
        -3.09057158, -2.59118102, -2.369385  , -3.44798676, -3.61496545,
        -4.31628078, -4.4027684 , -4.37629845, -6.58401836, -4.44515154,
        -4.28740976, -5.27561251, -4.00429847, -5.02669143, -2.41497581,
        -3.2232024 , -2.15688556, -2.60768302, -3.82031321, -3.02498347,
        -4.86864035, -7.74595519, -2.15950119, -2.08657499, -0.6876504 ,
        -0.68606737, -0.69028524, -0.6779024 , -0.65842395]])

In [None]:
# Not a good player
if model.predict(angka_acak_2)[0] == 1:
  print("Good performance")
else:
  print("Poor performance")

Poor performance


## Submission

Once you are satisfied with the performance of your model, run the code block below to submit your project.


In [None]:
# Submit Method

# Do not change the code below
question_id = "01_ml_project_features"
submit(student_id, name, assignment_id, str(ml_features), question_id, drive_link)
question_id = "02_ml_project_accuracy"
submit(student_id, name, assignment_id, str(ml_accuracy), question_id, drive_link)
question_id = "03_ml_project_f1score"
submit(student_id, name, assignment_id, str(ml_f1_score), question_id, drive_link)

'Assignment successfully submitted'

## FIN