# **Latar Belakang**

Prestasi belajar siswa merupakan indikator penting dalam menilai keberhasilan proses pendidikan. Namun, hasil akademik siswa tidak hanya dipengaruhi oleh kemampuan intelektual, tetapi juga oleh berbagai faktor sosial, ekonomi, dan lingkungan. Oleh karena itu, diperlukan analisis berbasis data untuk memahami faktor-faktor yang memengaruhi performa akademik siswa secara lebih komprehensif.

Dataset Student Performance dari UCI Machine Learning Repository menyediakan data nilai akademik siswa serta berbagai atribut pendukung seperti latar belakang keluarga, kebiasaan belajar, dan kondisi sosial. Dengan memanfaatkan dataset ini, proyek ini bertujuan untuk menganalisis hubungan antara faktor-faktor tersebut terhadap prestasi siswa serta membangun model prediksi nilai akhir. Hasil analisis diharapkan dapat membantu pendidik dalam mengambil keputusan yang lebih tepat guna meningkatkan kualitas pembelajaran.

source : https://archive.ics.uci.edu/dataset/320/student+performance

# **Import Library**

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Import library
import pandas as pd

# Eksekusi sintaks untuk custom tampilan
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.float_format', '{:.2f}'.format)

# **Ekstraksi Data**

In [4]:
# Proses ekstraksi data
StudentMentalHealth = pd.read_csv('StudentMentalHealth_dirty.csv')

# Tampilkan data
display(StudentMentalHealth)

Unnamed: 0,Name,Gender,Age,Education Level,Screen Time (hrs/day),Sleep Duration (hrs),Physical Activity (hrs/week),Stress Level,Anxious Before Exams,Academic Performance Change
0,Aarav,Male,15.0,Class 8,7.1,8.90,9.30,Medium,No,Same
1,Meera,Female,25.0,MSc,3.3,-5.00,0.20,Medium,No,Same
2,Ishaan,Male,20.0,BTech,9.5,150.00,6.20,Medium,No,Same
3,Aditya,Male,20.0,BA,10.8,5.60,5.50,High,Yes,Same
4,Anika,Female,17.0,Class 11,2.8,5.40,3.10,Medium,Yes,Same
...,...,...,...,...,...,...,...,...,...,...
1245,Vivaan,Male,21.0,BTech,6.1,6.20,4.00,Low,Yes,Declined
1246,Arjun,Male,15.0,Class 9,5.5,8.00,6.90,Medium,Yes,Same
1247,Aarav,Male,22.0,,5.5,4.40,-5.00,Medium,No,Declined
1248,Ananya,Female,16.0,Class 10,10.4,8.00,6.60,High,No,Declined


# **Data Dictionary**

<table border="1" cellspacing="0" cellpadding="4">
  <tr>
    <th>Kolom</th>
    <th>Deskripsi</th>
  </tr>

  <tr>
    <td>Name</td>
    <td>Nama mahasiswa/responden.</td>
  </tr>

  <tr>
    <td>Gender</td>
    <td>Jenis kelamin responden.</td>
  </tr>

  <tr>
    <td>Age</td>
    <td>Usia responden dalam tahun.</td>
  </tr>

  <tr>
    <td>Education Level</td>
    <td>Tingkat pendidikan yang sedang ditempuh.</td>
  </tr>

  <tr>
    <td>Screen Time (hrs/day)</td>
    <td>Durasi penggunaan layar per hari dalam jam.</td>
  </tr>

  <tr>
    <td>Sleep Duration (hrs)</td>
    <td>Rata-rata durasi tidur per hari dalam jam.</td>
  </tr>

  <tr>
    <td>Physical Activity (hrs/week)</td>
    <td>Total durasi aktivitas fisik per minggu dalam jam.</td>
  </tr>

  <tr>
    <td>Stress Level</td>
    <td>Tingkat stres (biasanya skala 1–10).</td>
  </tr>

  <tr>
    <td>Anxious Before Exams</td>
    <td>Indikasi apakah responden merasa cemas sebelum ujian (Yes/No).</td>
  </tr>

  <tr>
    <td>Academic Performance Change</td>
    <td>Perubahan performa akademik</td>
  </tr>
</table>


In [5]:
# Periksa informasi umum pada data
StudentMentalHealth.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1250 entries, 0 to 1249
Data columns (total 10 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Name                          1240 non-null   object 
 1   Gender                        1239 non-null   object 
 2   Age                           1240 non-null   object 
 3   Education Level               1240 non-null   object 
 4   Screen Time (hrs/day)         1236 non-null   object 
 5   Sleep Duration (hrs)          1232 non-null   float64
 6   Physical Activity (hrs/week)  1243 non-null   float64
 7   Stress Level                  1238 non-null   object 
 8   Anxious Before Exams          1233 non-null   object 
 9   Academic Performance Change   1235 non-null   object 
dtypes: float64(2), object(8)
memory usage: 97.8+ KB


# **Analisa Univariate**

# Kolom Kategorik

In [8]:
# Seleksi kolom dengan tipe object
kolom_kategorik = StudentMentalHealth.select_dtypes(include=['object'])

# Tampilkan frekuensi
for col in kolom_kategorik:
    print(f'> Frekuensi \033[93m{col}\033[0m')
    print(f'  Terdapat {StudentMentalHealth[col].nunique(dropna = False)} data unik\n')
    display(StudentMentalHealth[col].value_counts(dropna = False).reset_index())
    print('\n')

> Frekuensi [93mName[0m
  Terdapat 31 data unik



Unnamed: 0,Name,count
0,Shaurya,77
1,Kavya,68
2,Meera,66
3,Aadhya,66
4,Diya,64
5,Arjun,64
6,Anika,61
7,Krishna,61
8,Myra,60
9,Reyansh,60




> Frekuensi [93mGender[0m
  Terdapat 4 data unik



Unnamed: 0,Gender,count
0,Female,592
1,Male,578
2,Other,69
3,,11




> Frekuensi [93mAge[0m
  Terdapat 17 data unik



Unnamed: 0,Age,count
0,17.0,121
1,21.0,120
2,23.0,115
3,15.0,108
4,20.0,104
5,16.0,104
6,25.0,96
7,19.0,96
8,26.0,90
9,22.0,88




> Frekuensi [93mEducation Level[0m
  Terdapat 12 data unik



Unnamed: 0,Education Level,count
0,MSc,172
1,MTech,172
2,MA,164
3,Class 10,116
4,Class 11,115
5,BSc,105
6,BTech,104
7,Class 9,100
8,BA,78
9,Class 8,58




> Frekuensi [93mScreen Time (hrs/day)[0m
  Terdapat 106 data unik



Unnamed: 0,Screen Time (hrs/day),count
0,6.9,22
1,4.8,21
2,4.5,21
3,6.3,20
4,9.9,19
...,...,...
101,12.0,6
102,6.5,6
103,9.2,4
104,8.5,4




> Frekuensi [93mStress Level[0m
  Terdapat 4 data unik



Unnamed: 0,Stress Level,count
0,Medium,625
1,Low,398
2,High,215
3,,12




> Frekuensi [93mAnxious Before Exams[0m
  Terdapat 3 data unik



Unnamed: 0,Anxious Before Exams,count
0,Yes,634
1,No,599
2,,17




> Frekuensi [93mAcademic Performance Change[0m
  Terdapat 4 data unik



Unnamed: 0,Academic Performance Change,count
0,Same,486
1,Improved,377
2,Declined,372
3,,15






Ditemukan karakter pada kolom numerik Age dan Screen Time (hrs/day), sehingga perlu dilakukan Standarisasi Numerik.

In [9]:
# Ubah string 'twenty' menjadi numerik pada kolom Age
StudentMentalHealth['Age'] = StudentMentalHealth['Age'].replace('twenty', 20)

# Konversi tipe data
StudentMentalHealth['Age'] = pd.to_numeric(StudentMentalHealth['Age'])

In [11]:
# Ubah string 'unknown' pada kolom Screen Time (hrs/day) menjadi None (kosong)
StudentMentalHealth['Screen Time (hrs/day)'] = StudentMentalHealth['Screen Time (hrs/day)'].replace('unknown', None)

#Konversi tipe data
StudentMentalHealth['Screen Time (hrs/day)'] = pd.to_numeric(StudentMentalHealth['Screen Time (hrs/day)'])

# Kolom Numerik

In [12]:
# Seleksi kolom dengan tipe numerik
kolom_numerik = StudentMentalHealth.select_dtypes(include=['int64', 'float64'])

# Statistik deskriptif numerik
display(StudentMentalHealth[kolom_numerik.columns].describe().T)

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Age,1240.0,21.53,15.43,1.0,17.0,20.0,23.0,200.0
Screen Time (hrs/day),1235.0,8.17,13.47,-5.0,4.4,6.9,9.5,150.0
Sleep Duration (hrs),1232.0,8.2,15.18,-5.0,5.1,6.5,7.8,150.0
Physical Activity (hrs/week),1243.0,7.03,16.19,-5.0,2.7,5.1,7.7,150.0


# **Missing Value**

# Jumlah Missing Value

In [13]:
# Tampilkan missing value pada data
display(StudentMentalHealth.isna().sum())

Unnamed: 0,0
Name,10
Gender,11
Age,10
Education Level,10
Screen Time (hrs/day),15
Sleep Duration (hrs),18
Physical Activity (hrs/week),7
Stress Level,12
Anxious Before Exams,17
Academic Performance Change,15


# Handling Missing Value - Tipe Kategorik

In [14]:
# Tampilkan ukuran awal
StudentMentalHealth.shape

(1250, 10)

In [17]:
# Hapus baris yang Academic Performance Change kosong
StudentMentalHealth = StudentMentalHealth.dropna(subset = ['Academic Performance Change'])

StudentMentalHealth.shape

(1235, 10)

In [18]:
# Isi Gender yang kosong dengan 'Other'
StudentMentalHealth['Gender'] = StudentMentalHealth['Gender'].fillna('Other')

# Isi Education Level, Stress Level dan Anxious Before Exams yang kosong dengan 'Unknown'
StudentMentalHealth['Education Level'] = StudentMentalHealth['Education Level'].fillna('Unknown')
StudentMentalHealth['Stress Level'] = StudentMentalHealth['Stress Level'].fillna('Unknown')
StudentMentalHealth['Anxious Before Exams'] = StudentMentalHealth['Anxious Before Exams'].fillna('Unknown')

# Handling Missing Value - Tipe Numerik

In [19]:
# Isikan setiap missing value pada kolom numerik dengan data median
for col in kolom_numerik:
  median_data = StudentMentalHealth[col].median()
  StudentMentalHealth[col] = StudentMentalHealth[col].fillna(median_data)

# Periksa Kembali

In [20]:
# Periksa missing value
display(StudentMentalHealth.isna().sum())

Unnamed: 0,0
Name,10
Gender,0
Age,0
Education Level,0
Screen Time (hrs/day),0
Sleep Duration (hrs),0
Physical Activity (hrs/week),0
Stress Level,0
Anxious Before Exams,0
Academic Performance Change,0


# **Duplicated Data**

# Hitung Jumlah Duplikasi Data

In [21]:
print(f'Jumlah data saat ini : {StudentMentalHealth.shape}')
print(f'Jumlah data duplikat : {StudentMentalHealth.duplicated(keep = False).sum()}')

Jumlah data saat ini : (1235, 10)
Jumlah data duplikat : 342


# Hapus Duplikasi Data

In [22]:
# Hapus duplikasi data
StudentMentalHealth = StudentMentalHealth.drop_duplicates()
print(f'Jumlah data saat ini : {StudentMentalHealth.shape}')

Jumlah data saat ini : (1056, 10)


# **Outlier**

# Boxplot untuk mendeteksi Outlier

In [23]:
import plotly.express as px

def box_plot(series, column_name, color):
  # Buat horizontal box plot
  fig = px.box(
      series,
      orientation = 'h',
      color_discrete_sequence = [color]
  )

  # Update layout dan display plot
  fig.update_layout(
      title = f'<b>Box Plot {column_name}</b>',
      yaxis = dict(
          title = '',
          showgrid = False,
          showline = False,
          showticklabels = False,
          zeroline = False,
      ),
      xaxis = dict(
          title = '',
          showgrid = False,
          showline = True,
          showticklabels = True,
          zeroline = False,
      )
  )

  fig.show()

In [24]:
for col in kolom_numerik:
  box_plot(StudentMentalHealth[col], col, '#0072B2')

# Teknik Winsorizing untuk mengatasi Outlier

In [25]:
# Fungsi untuk teknik winsorizing
def teknik_winsorizing(series):
  # Hitung Q1, A3, dan IQR
  Q1 = series.quantile(0.25)
  Q3 = series.quantile(0.75)
  IQR = Q3 - Q1

  #Hitung lower dan upper bound
  lower_bound = Q1 - 1.5 * IQR
  upper_bound = Q3 + 1.5 * IQR

  # Jika lower bound negatif maka ubah menjadi 0
  if (lower_bound < 0):
    lower_bound = 0

  # Winsorizing: clip nilai ke batas bawah dan atas
  series = series.astype(pd.Float64Dtype())
  winsorized_series = series.clip(lower = lower_bound, upper = upper_bound)

  return (winsorized_series)

In [26]:
for col in kolom_numerik:
  StudentMentalHealth[col] = teknik_winsorizing(StudentMentalHealth[col])
  box_plot(StudentMentalHealth[col], col, '#0072B2')

# Distribusi

In [29]:
# Your code
import plotly.express as px

for col in kolom_numerik:
  fig = px.histogram(
      StudentMentalHealth,
      x = col,
      nbins = 25,
      color_discrete_sequence = ['#0072B2']
  )

  fig.update_yaxes(
      showgrid = False,
      showticklabels = False,
      title = ''
  )

  fig.update_layout(
      title = {
          'text': f'Distribusi <b><span style = "color: #0072B2"></span> {col} </b>',
          'y': 0.92,
          'x': 0.5,
          'xanchor': 'center',
          'yanchor': 'top'
      },
      plot_bgcolor = 'rgba(0,0,0,0)',
      bargap = 0.01,
      title_font = dict(size = 25)
  )

  fig.show()

# **Analisis Multivariate**

# Korelasi

In [31]:
# Hitung korelasi
data_corr = StudentMentalHealth.select_dtypes(include=['int64', 'float64']).corr()

# Tampilkan hasil
display(data_corr)

Unnamed: 0,Age,Screen Time (hrs/day),Sleep Duration (hrs),Physical Activity (hrs/week)
Age,1.0,0.01,0.02,-0.0
Screen Time (hrs/day),0.01,1.0,-0.0,0.02
Sleep Duration (hrs),0.02,-0.0,1.0,-0.02
Physical Activity (hrs/week),-0.0,0.02,-0.02,1.0


In [32]:
import plotly.express as px

# Buat grafik
fig = px.imshow(
    data_corr,
    color_continuous_scale = 'blues',
    title = '<b>Korelasi Kolom Numerik Data Student Mental Health</b><br>',
    text_auto = True
)

# Sembunyikan skala/rentang korelasi
fig.update_coloraxes(showscale = False)

# Atur judul heatmap
fig.update_layout(
    title = dict(
        x = 0.5,
        y = 0.9,
        xanchor = 'center',
        yanchor = 'top'
    ),
    width = 1000,
    height = 800
)

# Tampilkan heatmap
fig.show()

# **Statistik Tiap Kategori Academic Performance Change**

In [33]:
StudentMentalHealth.groupby(['Academic Performance Change']).agg(
    total_data = ('Name', 'count'),
    min_age = ('Age', 'min'),
    median_age = ('Age', 'median'),
    max_age = ('Age', 'max'),
    min_screen_time = ('Screen Time (hrs/day)', 'min'),
    median_screen_time = ('Screen Time (hrs/day)', 'median'),
    max_screen_time = ('Screen Time (hrs/day)', 'max'),
    min_sleep_duration = ('Sleep Duration (hrs)', 'min'),
    median_sleep_duration = ('Sleep Duration (hrs)', 'median'),
    max_sleep_duration = ('Sleep Duration (hrs)', 'max')
)

Unnamed: 0_level_0,total_data,min_age,median_age,max_age,min_screen_time,median_screen_time,max_screen_time,min_sleep_duration,median_sleep_duration,max_sleep_duration
Academic Performance Change,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Declined,310,8.0,20.0,32.0,0.0,6.9,17.0,1.3,6.35,11.7
Improved,317,8.0,21.0,32.0,0.0,7.0,17.0,1.3,6.5,11.7
Same,419,8.0,20.0,32.0,0.0,6.9,17.0,1.3,6.6,11.7


# **Data Preprocessing**

# Feature Selection

In [34]:
# Copy DataFrame
data_preprocessing = StudentMentalHealth.copy()

# Hapus kolom name
data_preprocessing = data_preprocessing.drop(columns = ['Name'])

# Tampilkan hasilnya
data_preprocessing.head()

Unnamed: 0,Gender,Age,Education Level,Screen Time (hrs/day),Sleep Duration (hrs),Physical Activity (hrs/week),Stress Level,Anxious Before Exams,Academic Performance Change
0,Male,15.0,Class 8,7.1,8.9,9.3,Medium,No,Same
1,Female,25.0,MSc,3.3,1.3,0.2,Medium,No,Same
2,Male,20.0,BTech,9.5,11.7,6.2,Medium,No,Same
3,Male,20.0,BA,10.8,5.6,5.5,High,Yes,Same
4,Female,17.0,Class 11,2.8,5.4,3.1,Medium,Yes,Same


# Encoding

In [35]:
# Definisikan ulang kolom kategorik dan kolom numerik
kolom_kategorik = data_preprocessing.select_dtypes(include=['object']).columns
kolom_numerik = data_preprocessing.select_dtypes(include=['int64', 'float64']).columns

In [36]:
# Import library
from sklearn.preprocessing import OrdinalEncoder

# Panggil object Ordinal Encoder
ord_enc = OrdinalEncoder()

# Terapkan
data_preprocessing[kolom_kategorik] = ord_enc.fit_transform(
    data_preprocessing[kolom_kategorik]
)

# Tampilkan hasilnya
display(data_preprocessing)

Unnamed: 0,Gender,Age,Education Level,Screen Time (hrs/day),Sleep Duration (hrs),Physical Activity (hrs/week),Stress Level,Anxious Before Exams,Academic Performance Change
0,1.00,15.00,6.00,7.10,8.90,9.30,2.00,0.00,2.00
1,0.00,25.00,9.00,3.30,1.30,0.20,2.00,0.00,2.00
2,1.00,20.00,2.00,9.50,11.70,6.20,2.00,0.00,2.00
3,1.00,20.00,0.00,10.80,5.60,5.50,0.00,2.00,2.00
4,0.00,17.00,4.00,2.80,5.40,3.10,2.00,2.00,2.00
...,...,...,...,...,...,...,...,...,...
1241,2.00,18.00,5.00,10.90,5.00,4.80,1.00,2.00,2.00
1242,0.00,22.00,9.00,5.70,1.30,3.40,2.00,0.00,2.00
1243,1.00,15.00,7.00,4.80,8.70,3.00,1.00,1.00,2.00
1247,1.00,22.00,11.00,5.50,4.40,0.00,2.00,0.00,0.00


In [38]:
for col, categories in zip(kolom_kategorik, ord_enc.categories_):
  print(f'Mapping untuk \033[93m{col}\033[0m:\n')
  for i, cat in enumerate(categories):
    print(f'{cat} → {i}')
  print()

Mapping untuk [93mGender[0m:

Female → 0
Male → 1
Other → 2

Mapping untuk [93mEducation Level[0m:

BA → 0
BSc → 1
BTech → 2
Class 10 → 3
Class 11 → 4
Class 12 → 5
Class 8 → 6
Class 9 → 7
MA → 8
MSc → 9
MTech → 10
Unknown → 11

Mapping untuk [93mStress Level[0m:

High → 0
Low → 1
Medium → 2
Unknown → 3

Mapping untuk [93mAnxious Before Exams[0m:

No → 0
Unknown → 1
Yes → 2

Mapping untuk [93mAcademic Performance Change[0m:

Declined → 0
Improved → 1
Same → 2



# Splitting Data

In [41]:
# Import library
from sklearn.model_selection import train_test_split

# Variabel X untuk fitur dan variabel y untuk target
X = data_preprocessing.drop(columns = ['Academic Performance Change'])
y = data_preprocessing['Academic Performance Change']


# Proses splitting data
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size = 0.25,
    random_state = 42
)

In [42]:
# Tampilkan frekuensi kemunculan target
display(y.value_counts())

Unnamed: 0_level_0,count
Academic Performance Change,Unnamed: 1_level_1
2.0,424
1.0,320
0.0,312


*   Declined → 0
*   Improved → 1
*   Same → 2

# Imbalance Target

In [43]:
# Import library
from imblearn.over_sampling import SMOTE

# Buat object SMOTE
smote = SMOTE(random_state = 42)

# Terapkan pada X_train dan X_test dan buat variabel baru hasil oversampling
X_train_resample, y_train_resample = smote.fit_resample(
    X_train, y_train
)

In [44]:
# Tampilkan frekuensi kemunculan target
display(y_train_resample.value_counts())

Unnamed: 0_level_0,count
Academic Performance Change,Unnamed: 1_level_1
1.0,322
0.0,322
2.0,322


# Modeling

In [46]:
from imblearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

pipeline = Pipeline(steps = [
    ('scaler', RobustScaler()),
    ('smote', SMOTE(random_state = 42)),
    ('model', DecisionTreeClassifier(random_state = 42))
])

param_grid = {
    'smote__k_neighbors': [3, 5, 7],
    'model__criterion': ['gini', 'entropy', 'log_loss'],
    'model__max_depth': [None, 5, 10, 20, 30],
    'model__min_samples_split': [2, 5, 10],
    'model__min_samples_leaf': [1, 2, 5],
    'model__class_weight': [None, 'balanced']
}

grid = GridSearchCV(
    estimator = pipeline,
    param_grid = param_grid,
    cv = 5,
    scoring = 'f1_macro',
    n_jobs = -1
)

grid.fit(X_train, y_train)

print('Best Params:', grid.best_params_)
print('Best CV Score:', grid.best_score_)

Best Params: {'model__class_weight': None, 'model__criterion': 'gini', 'model__max_depth': 20, 'model__min_samples_leaf': 1, 'model__min_samples_split': 2, 'smote__k_neighbors': 7}
Best CV Score: 0.4025499500162767


In [47]:
# Import library
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Masukkan parameter yang telah di-tuning
model_dtc = DecisionTreeClassifier(
    criterion = 'entropy',
    max_depth = 30,
    min_samples_leaf = 1,
    min_samples_split = 5,
    splitter = 'best'
)

# Definisikan pipeline baru
new_pipe = Pipeline([
    ('scaler', RobustScaler()),
    ('smote', SMOTE(random_state = 42)),
    ('model', model_dtc)
])

# Terapkan pada CV
cv = StratifiedKFold(
    n_splits = 5,
    shuffle = True,
    random_state = 42
)

scores = cross_val_score(new_pipe, X, y, cv = cv, scoring = 'accuracy')

print(f'Scores: {scores}')
print(f'Mean: {scores.mean()}')
print(f'Std: {scores.std()}')

Scores: [0.34433962 0.3507109  0.4028436  0.33649289 0.32701422]
Mean: 0.35228024680318343
Std: 0.026494459996775865


In [48]:
model_dtc.fit(X_train_resample, y_train_resample)

# Predict test set labels
y_pred = model_dtc.predict(X_test)

In [49]:
from sklearn import metrics

# Menghitung dan mencetak laporan klasifikasi
classification_report = metrics.classification_report(y_pred, y_test)

# Tampilkan hasil
print(classification_report)

              precision    recall  f1-score   support

         0.0       0.37      0.36      0.36        84
         1.0       0.33      0.31      0.32        85
         2.0       0.38      0.41      0.40        95

    accuracy                           0.36       264
   macro avg       0.36      0.36      0.36       264
weighted avg       0.36      0.36      0.36       264



In [53]:
import plotly.express as px
from sklearn.metrics import confusion_matrix

# Plot dengan Plotly
fig = px.imshow(
    confusion_matrix(y_test, y_pred),
    text_auto = True,
    color_continuous_scale = 'Blues',
    title = '<b>Confusion Matrix Model Decision Tree</b>',
)

# Ubah tampilan xticks dan yticks
fig.update_xaxes(
    tickmode = 'array',
    tickvals = [0, 1, 2],
    ticktext = ['Declined', 'Improved', 'Same']
)

fig.update_yaxes(
    tickmode = 'array',
    tickvals = [0, 1, 2],
    ticktext = ['Declined', 'Improved', 'Same'],
    tickangle = -90
)

# Hapus legend/colorbar
fig.update_layout(coloraxis_showscale = False)

# Judul dan label, serta ukuran figure
fig.update_layout(
    title = dict(
        x = 0.5,
        y = 0.9,
        xanchor = 'center',
        yanchor = 'top'
    ),
    xaxis_title = '<b>Prediksi</b>',
    yaxis_title = '<b>Nilai Sebenarnya </b>',
    width = 700,
    height = 700
)

# Tampilkan grafik
fig.show()