## **Muhammad Grandiv Lava Putra - Teknologi Informasi 22 - 22/493242/TK/54023**
## **Analysis objective:**
Predict Math, Reading, and Writing scores based on students' performance and demographic background.

In [None]:
# Import the required packages
import numpy as np
import pandas as pd
import datetime
import seaborn as sns
import matplotlib.pyplot as plt

## **Displaying Data Information**

In [None]:
df = pd.read_csv('exams.csv')
df.head()

| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | female | group D | some college | standard | completed | 59 | 70 | 78 |
| 1 | male | group D | associate's degree | standard | none | 96 | 93 | 87 |
| 2 | female | group D | some college | free/reduced | none | 57 | 76 | 77 |
| 3 | male | group B | some college | free/reduced | none | 70 | 70 | 63 |
| 4 | female | group D | associate's degree | standard | none | 83 | 85 | 86 |


In [None]:
df.shape

(1000, 8)

In [None]:
df.dtypes

gender                         object
race/ethnicity                 object
parental level of education    object
lunch                          object
test preparation course        object
math score                      int64
reading score                   int64
writing score                   int64
dtype: object

## **Duplicate Check**

In [None]:
df[df.duplicated()]

| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|


The result is empty, so the dataset contains no duplicate rows.

## **Null Value Check**

In [None]:
df.isnull().sum()

gender                         0
race/ethnicity                 0
parental level of education    0
lunch                          0
test preparation course        0
math score                     0
reading score                  0
writing score                  0
dtype: int64

Every column has a null count of 0, so the dataset contains no null values.

## **Encoding: Label Encoder**

In [None]:
from sklearn.preprocessing import LabelEncoder

In [None]:
kolom_kategorikal = ['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course']

LE = LabelEncoder()

for kolom in kolom_kategorikal:
  df[kolom] = LE.fit_transform(df[kolom])

In [None]:
df.head()

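| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 4 | 1 | 0 | 59 | 70 | 78 |
| 1 | 1 | 3 | 0 | 1 | 1 | 96 | 93 | 87 |
| 2 | 0 | 3 | 4 | 0 | 1 | 57 | 76 | 77 |
| 3 | 1 | 1 | 4 | 0 | 1 | 70 | 70 | 63 |
| 4 | 0 | 3 | 0 | 1 | 1 | 83 | 85 | 86 |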


In [None]:
df.dtypes

gender                         int64
race/ethnicity                 int64
parental level of education    int64
lunch                          int64
test preparation course        int64
math score                     int64
reading score                  int64
writing score                  int64
dtype: object

All columns now have a numeric data type.
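The encoding loop reuses a single `LabelEncoder` instance, so after it finishes only the mapping for the last column is still available. A minimal sketch of one alternative, keeping one encoder per column so the integer codes can be looked up later. The `sample` frame, the `encoders` dictionary, and the `mappings` dictionary are hypothetical names for illustration and are not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows; the notebook itself reads exams.csv.
sample = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "lunch": ["standard", "free/reduced", "standard"],
})

# Keep one fitted encoder per column so each mapping stays available.
encoders = {}
for column in sample.columns:
    encoders[column] = LabelEncoder()
    sample[column] = encoders[column].fit_transform(sample[column])

# LabelEncoder assigns integer codes in sorted order of the class labels.
mappings = {
    column: {str(label): code for code, label in enumerate(enc.classes_)}
    for column, enc in encoders.items()
}
print(mappings)
```

Note that the integer codes imply an ordering between categories (for example, group A < group B), which a linear model will treat as meaningful. One-hot encoding is a common alternative for nominal features, although the notebook does not use it.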

## **Splitting the Data for Modeling**

In [None]:
X = df.drop(['math score', 'reading score', 'writing score'], axis = 1)
y_math = df['math score']
y_reading = df['reading score']
y_writing = df['writing score']

Standardization with StandardScaler (fit here on the full feature matrix, before the train/test split)

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler()

scaler.fit(X)

X = scaler.transform(X)

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# A single call splits the features and all three targets with the same row assignment
(X_train, X_test,
 y_math_train, y_math_test,
 y_reading_train, y_reading_test,
 y_writing_train, y_writing_test) = train_test_split(
    X, y_math, y_reading, y_writing, test_size=0.2, random_state=92)
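In the cells above, the scaler is fit on all rows before the split, so the test rows contribute to the scaling statistics. A minimal sketch of one alternative, assuming synthetic stand-in data (`X_demo`, `y_demo`, and `model` are hypothetical names): split first, then let a pipeline fit `StandardScaler` on the training rows only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the five encoded features and one score column.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 5, size=(200, 5)).astype(float)
weights = np.array([1.0, 2.0, -1.0, 3.0, -2.0])
y_demo = 60 + X_demo @ weights + rng.normal(0, 5, size=200)

# Split first so the test rows never influence the scaler.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=92)

# The pipeline fits StandardScaler on X_tr only, then fits the regression.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_tr, y_tr)

# predict() applies the training-set scaling to new rows automatically.
predictions = model.predict(X_te)
print(predictions.shape)
```

For ordinary least squares with an intercept, predictions do not depend on how the features are scaled, so the practical effect on this notebook's results is likely small. The pattern matters more for regularized or distance-based models.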


## **Linear Regression**

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

In [None]:
# Fit a linear regression model for the math score
linreg = LinearRegression()
linreg.fit(X_train, y_math_train)

# Fit a linear regression model for the reading score
linregreading = LinearRegression()
linregreading.fit(X_train, y_reading_train)

# Fit a linear regression model for the writing score
linregwriting = LinearRegression()
linregwriting.fit(X_train, y_writing_train)

y_math_pred = linreg.predict(X_test)
y_reading_pred = linregreading.predict(X_test)
y_writing_pred = linregwriting.predict(X_test)
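`LinearRegression` also accepts a two-dimensional target, so the three separate fits can be expressed as one multi-output model. A minimal sketch on synthetic stand-in data (`X_demo`, `Y_demo`, `multi`, and `single` are hypothetical names); it is a compact alternative, not what the notebook does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
# Three target columns standing in for the math, reading, and writing scores.
Y_demo = X_demo @ rng.normal(size=(5, 3)) + 65 + rng.normal(0, 1, size=(100, 3))

# One model fit on all three targets at once.
multi = LinearRegression().fit(X_demo, Y_demo)
pred_all = multi.predict(X_demo[:4])  # shape (4, 3): one column per target

# Fitting the first target on its own gives the same predictions for it,
# because least squares solves each target column independently.
single = LinearRegression().fit(X_demo, Y_demo[:, 0])
pred_first = single.predict(X_demo[:4])
print(pred_all.shape, np.allclose(pred_all[:, 0], pred_first))
```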

**Accuracy Evaluation**

In [None]:
# Evaluate the model for the math score
mape_math = mean_absolute_percentage_error(y_math_test, y_math_pred) * 100
print(f'MAPE for Math Score: {mape_math:.2f}%')

# Evaluate the model for the reading score
mape_reading = mean_absolute_percentage_error(y_reading_test, y_reading_pred) * 100
print(f'MAPE for Reading Score: {mape_reading:.2f}%')

# Evaluate the model for the writing score
mape_writing = mean_absolute_percentage_error(y_writing_test, y_writing_pred) * 100
print(f'MAPE for Writing Score: {mape_writing:.2f}%')

MAPE for Math Score: 19.25%
MAPE for Reading Score: 16.92%
MAPE for Writing Score: 17.96%


Mean Absolute Percentage Error (MAPE) measures the average absolute difference between the actual and predicted values, expressed as a percentage of the actual values. The MAPE values obtained here (16.92% to 19.25%) are fairly low, which suggests that the models are reasonably good at predicting Math, Reading, and Writing scores.
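The MAPE definition can be checked directly: it is the mean of |actual − predicted| / |actual|. A minimal sketch with made-up numbers (`actual` and `predicted` are hypothetical values, not taken from the dataset) compares a manual calculation with scikit-learn's function and with a naive baseline that always predicts the mean. Comparing against such a baseline is one way to judge whether a given MAPE is low for the data at hand.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Made-up values for illustration only.
actual = np.array([50.0, 80.0, 100.0, 40.0])
predicted = np.array([55.0, 72.0, 90.0, 50.0])

# MAPE by hand: mean of the absolute errors relative to the actual values.
mape_manual = np.mean(np.abs(actual - predicted) / np.abs(actual)) * 100
mape_sklearn = mean_absolute_percentage_error(actual, predicted) * 100

# Naive baseline: always predict the mean of the actual values.
baseline = np.full_like(actual, actual.mean())
mape_baseline = mean_absolute_percentage_error(actual, baseline) * 100

print(f"manual: {mape_manual:.2f}%")
print(f"sklearn: {mape_sklearn:.2f}%")
print(f"baseline: {mape_baseline:.2f}%")
```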

## **Example Usage**

In [None]:
# Example demographic input, encoded in the same column order as X:
# gender, race/ethnicity, parental level of education, lunch, test preparation course
inputan = [[0, 4, 3, 1, 0]]

inputan = scaler.transform(inputan)

# Predict the scores for this input
y_math_pred = linreg.predict(inputan)
y_reading_pred = linregreading.predict(inputan)
y_writing_pred = linregwriting.predict(inputan)

# Display the predictions
print("Predicted math score: ", y_math_pred)
print("Predicted reading score: ", y_reading_pred)
print("Predicted writing score: ", y_writing_pred)


Predicted math score:  [79.18939121]
Predicted reading score:  [85.24529228]
Predicted writing score:  [88.43514206]


