
*Social Computing Big Data Laboratory - 2022*

----

## **Predicting Indonesia Stock Exchange Composite**

### **Import Packages**

In [1]:
# Import Packages
import pandas as pd
import numpy as np

# Import Modules
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

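# Seed: random.seed only affects Python's `random` module; MLPRegressor draws
# from NumPy's global generator when random_state is not set, so seed NumPy too
import random
random.seed(10)
np.random.seed(10)

# Import visualization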
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
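One caveat about the seeding above: `random.seed` only seeds Python's built-in `random` module. NumPy keeps its own global generator, and scikit-learn estimators such as `MLPRegressor` draw from NumPy's global generator when their `random_state` parameter is left as `None`. The small NumPy-only sketch below illustrates the difference. The helper names are hypothetical.

```python
import random

import numpy as np


def draws_after_python_seed(seed):
    """Seed only Python's `random` module, then sample from NumPy."""
    random.seed(seed)
    return np.random.rand(3)  # NumPy's global generator is NOT reset


def draws_after_numpy_seed(seed):
    """Seed NumPy's global generator, then sample from it."""
    np.random.seed(seed)
    return np.random.rand(3)


# Re-seeding `random` does not make NumPy draws repeat ...
print(np.array_equal(draws_after_python_seed(10), draws_after_python_seed(10)))  # False
# ... but re-seeding NumPy does
print(np.array_equal(draws_after_numpy_seed(10), draws_after_numpy_seed(10)))  # True
```

For a reproducible network, the more explicit option is to pass `random_state` directly to `MLPRegressor(...)`, as is already done for `train_test_split` with `random_state=1`.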

### **Load Data**

In [2]:
# Load the dataset directly from GitHub into a DataFrame
url = 'https://raw.githubusercontent.com/apriandito/dl-python/main/data/idx.csv'
df = pd.read_csv(url)

In [4]:
# Show the first 5 rows
df.head(5)

| | date | idx | exchange_rate_m2 | interest_rate_m2 | inflation_rate_m2 | money_supply_m2 |
|---|---|---|---|---|---|---|
| 0 | 2006-02-01 | 1216.14 | -0.336762 | 2.853 | 3.163729 | -1.379015 |
| 1 | 2006-03-01 | 1322.97 | -0.466187 | 2.853 | 3.139882 | -1.385178 |
| 2 | 2006-04-01 | 1464.4 | -0.72779 | 2.853 | 3.405172 | -1.382946 |
| 3 | 2006-05-01 | 1330.0 | -0.810401 | 2.853 | 2.755361 | -1.382177 |
| 4 | 2006-06-01 | 1310.26 | -0.879244 | 2.853 | 2.654015 | -1.383458 |


In [5]:
# Show data information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142 entries, 0 to 141
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   date               142 non-null    object 
 1   idx                142 non-null    float64
 2   exchange_rate_m2   142 non-null    float64
 3   interest_rate_m2   142 non-null    float64
 4   inflation_rate_m2  142 non-null    float64
 5   money_supply_m2    142 non-null    float64
dtypes: float64(5), object(1)
memory usage: 6.8+ KB
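`df.info()` shows that `date` is stored as a plain `object` (string) column. It is not used as a model feature here. If it is needed for ordering or plotting, `pd.read_csv` can parse it on load through its `parse_dates` argument. The sketch below uses the first rows from the `df.head()` output above instead of downloading the file. `df_sample` is a hypothetical name.

```python
import io

import pandas as pd

# First rows of the dataset, copied from the df.head() output above
csv_text = """date,idx,exchange_rate_m2,interest_rate_m2,inflation_rate_m2,money_supply_m2
2006-02-01,1216.14,-0.336762,2.853,3.163729,-1.379015
2006-03-01,1322.97,-0.466187,2.853,3.139882,-1.385178
2006-04-01,1464.4,-0.72779,2.853,3.405172,-1.382946
"""

# parse_dates converts the 'date' column to a datetime64 dtype instead of object
df_sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df_sample.dtypes)

# With real datetimes, chronological sorting and date accessors work directly
df_sample = df_sample.sort_values("date").reset_index(drop=True)
print(df_sample["date"].dt.year.tolist())  # [2006, 2006, 2006]
```

On the full dataset the equivalent call would be `pd.read_csv(url, parse_dates=['date'])`.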


### **Set Features**

In [6]:
# Select features by dropping the target (idx) and the non-numeric date column
feature = ['idx', 'date']
train_feature = df.drop(feature, axis=1)

# Set target
train_target = df['idx']

### **Split Data**

In [7]:
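# Split data 50/50 into training and test sets
# Note: shuffle=True mixes months from different periods into both sets
X_train, X_test, y_train, y_test = train_test_split(train_feature, train_target, shuffle=True, test_size=0.5, random_state=1)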

X_train

| | exchange_rate_m2 | interest_rate_m2 | inflation_rate_m2 | money_supply_m2 |
|---|---|---|---|---|
| 94 | 0.524049 | -0.040183 | 0.543620 | 0.491152 |
| 10 | -0.778458 | 1.800933 | -0.061479 | -1.279238 |
| 34 | -0.598365 | 1.143392 | 1.571991 | -0.898711 |
| 32 | -0.859417 | 0.880375 | 1.595837 | -1.000864 |
| 114 | 1.425616 | 0.091325 | 0.227657 | 1.107107 |
| ... | ... | ... | ... | ... |
| 133 | 1.566055 | -1.355266 | -0.896098 | 1.562481 |
| 137 | 1.472980 | -1.355266 | -0.645712 | 1.711747 |
| 72 | -0.857215 | -0.697725 | -0.806674 | -0.059987 |
| 140 | 1.474081 | -1.486774 | -0.797732 | 1.784303 |
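With `shuffle=True`, the monthly rows are scattered randomly between the two halves, so the test set contains months that fall between training months. If the goal is to check how well the model predicts later periods from earlier ones, a chronological split is an alternative. The sketch below assumes the rows are already in date order, as the `df.head()` output suggests. `chronological_split` is a hypothetical helper. scikit-learn's `train_test_split(..., shuffle=False)` performs the same kind of ordered split.

```python
import math

import numpy as np


def chronological_split(X, y, test_size=0.5):
    """Hypothetical helper: earliest rows go to training, latest rows to testing."""
    X = np.asarray(X)
    y = np.asarray(y)
    n_test = math.ceil(len(X) * test_size)  # size of the held-out tail
    split = len(X) - n_test                 # position of the first test row
    return X[:split], X[split:], y[:split], y[split:]


# Toy stand-in for 142 monthly observations with 4 features
X = np.arange(142 * 4).reshape(142, 4)
y = np.arange(142)  # row position doubles as a time index
X_tr, X_te, y_tr, y_te = chronological_split(X, y, test_size=0.5)
print(len(y_tr), len(y_te))     # 71 71
print(y_tr.max() < y_te.min())  # True: every training month precedes every test month
```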


### **Training**

In [8]:
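# Define and train a 4-5-5-1 MLP regressor (two hidden layers of 5 ReLU units)
# random_state is not passed, so weight initialization depends on NumPy's global random state
mlp = MLPRegressor(hidden_layer_sizes=(5, 5),
                   activation='relu',
                   solver='adam',
                   max_iter=50000,
                   verbose=True).fit(X_train, y_train)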

Streaming output truncated to the last 5000 lines.
Iteration 16406, loss = 30427.47771812
Iteration 16407, loss = 30426.83532502
Iteration 16408, loss = 30425.29053550
Iteration 16409, loss = 30423.42146148
Iteration 16410, loss = 30422.16793549
Iteration 16411, loss = 30421.74805707
Iteration 16412, loss = 30420.71958946
Iteration 16413, loss = 30418.65495324
Iteration 16414, loss = 30416.86784480
Iteration 16415, loss = 30416.89877246
Iteration 16416, loss = 30415.30939643
Iteration 16417, loss = 30413.05641244
Iteration 16418, loss = 30411.74278886
Iteration 16419, loss = 30411.44230630
Iteration 16420, loss = 30410.51979013
Iteration 16421, loss = 30407.97016630
Iteration 16422, loss = 30408.15307426
Iteration 16423, loss = 30407.98866122
Iteration 16424, loss = 30406.34486371
Iteration 16425, loss = 30404.76596459
Iteration 16426, loss = 30403.95607320
Iteration 16427, loss = 30402.25670920
Iteration 16428, loss = 30399.83324058
Iteration 16429, loss = 30399.02749792
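The loss in this log is still falling slowly. Training ends before `max_iter=50000` because of scikit-learn's documented convergence rule for the `adam` and `sgd` solvers: training stops once the loss fails to improve by at least `tol` (default `1e-4`) for `n_iter_no_change` (default 10) consecutive iterations. The pure-Python sketch below is a simplified illustration of that rule, not scikit-learn's own code. `stopping_epoch` is a hypothetical name.

```python
def stopping_epoch(losses, tol=1e-4, n_iter_no_change=10):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    no_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        # An epoch counts as "no improvement" unless it beats the best loss by at least tol
        if loss > best - tol:
            no_improvement += 1
        else:
            no_improvement = 0
        best = min(best, loss)
        # Stop once the streak of non-improving epochs exceeds n_iter_no_change
        if no_improvement > n_iter_no_change:
            return epoch
    return None


# Loss improves for 3 epochs, then plateaus: stop after 11 non-improving epochs
print(stopping_epoch([10.0, 9.0, 8.0] + [8.0] * 20))  # 14
# Loss keeps improving by 1.0 per epoch: never stops early
print(stopping_epoch([100.0 - i for i in range(50)]))  # None
```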

*The lower the loss, the better the model fits the training data. The goal of training is to reach a low loss so the model can make accurate predictions on new data it has not seen before. A low training loss alone does not guarantee this, which is why the model is evaluated on the held-out test set below.*
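For regression, scikit-learn's user guide describes the `MLPRegressor` objective as half the squared error plus an L2 penalty on the weights, scaled by `alpha` (default `1e-4`). The sketch below assumes that form, averaged over the `n` training samples. `mlp_squared_error_loss` is a hypothetical helper, and the exact scaling inside scikit-learn may differ between versions.

```python
import numpy as np


def mlp_squared_error_loss(y_true, y_pred, coefs, alpha=1e-4):
    """Half mean squared error plus an L2 weight penalty (sketch, assumed scaling)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    data_term = np.mean((y_true - y_pred) ** 2) / 2                 # half of the MSE
    penalty = 0.5 * alpha * sum(np.sum(W ** 2) for W in coefs) / n  # L2 over all weight matrices
    return data_term + penalty


# Two samples, each off by 2 -> MSE = 4, so the data term = 2
# One 2x2 weight matrix of ones (sum of squares = 4), alpha = 0.5 -> penalty = 0.5 * 0.5 * 4 / 2 = 0.5
print(mlp_squared_error_loss([0.0, 0.0], [2.0, 2.0], [np.ones((2, 2))], alpha=0.5))  # 2.5
```

Under this form, and with a small penalty, the reported loss is roughly half the training MSE. It is therefore not on the same scale as the MSE or RMSE computed in the Evaluation section.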

In [9]:
# Inspect the fitted model (n_layers_ includes the input and output layers)
print('Number of Layer =', mlp.n_layers_)
print('Number of Iteration =', mlp.n_iter_)
print('Current loss computed with the loss function =', mlp.loss_)

Number of Layer = 4
Number of Iteration = 21404
Current loss computed with the loss function = 28035.113010648427
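`n_layers_ = 4` counts the input layer, the two hidden layers from `hidden_layer_sizes=(5, 5)`, and the output layer. The NumPy sketch below mirrors that 4 → 5 → 5 → 1 shape, with ReLU hidden layers and an identity output layer, which is what `MLPRegressor` uses for regression. The weights are random placeholders, not the fitted `mlp.coefs_` and `mlp.intercepts_`, and `forward` is a hypothetical helper.

```python
import numpy as np

# Layer widths matching this notebook: 4 input features, hidden_layer_sizes=(5, 5), 1 output
layer_sizes = [4, 5, 5, 1]

rng = np.random.default_rng(0)
# Random placeholders shaped like mlp.coefs_ (fan_in, fan_out) and mlp.intercepts_ (fan_out,)
coefs = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
intercepts = [rng.normal(size=n) for n in layer_sizes[1:]]


def forward(X, coefs, intercepts):
    """ReLU on hidden layers, identity on the output layer."""
    a = np.asarray(X, dtype=float)
    last = len(coefs) - 1
    for i, (W, b) in enumerate(zip(coefs, intercepts)):
        z = a @ W + b
        a = z if i == last else np.maximum(z, 0.0)
    return a.ravel()  # one prediction per input row


X_demo = rng.normal(size=(3, 4))
print(forward(X_demo, coefs, intercepts).shape)  # (3,)

n_params = sum(W.size for W in coefs) + sum(b.size for b in intercepts)
print(len(layer_sizes), n_params)  # 4 61
```

Passing the fitted `mlp.coefs_` and `mlp.intercepts_` to `forward` should reproduce `mlp.predict` on the same inputs.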


### **Evaluation**

Make predictions on the test data.

In [10]:
# Make a prediction to test data
y_pred = mlp.predict(X_test)
y_pred

array([4349.55528691, 4782.10200909, 2012.39666307, 1906.58468814,
       5297.29592031, 2759.84278508, 2142.48428071, 2838.38107917,
       4070.91945014, 4972.85395852, 5380.62011675, 3101.36912768,
       4599.38969586, 4749.39868792, 2283.67560447, 2233.00288512,
       1372.50134735, 4934.25858184, 4489.7425912 , 2537.33791006,
       1902.20437393, 5025.55206398, 4686.52099118, 3811.62449411,
       3263.05598862, 2508.21002697, 2175.01221276, 4395.0995954 ,
       4201.64878017, 3870.46352611, 2349.0764028 , 2450.73493449,
       3792.12171191, 5023.44878879, 4789.88259446, 3913.341774  ,
       2514.13484361, 1636.45781212, 4706.02326368, 1698.11121289,
       1058.78763809, 5218.40288489, 5467.67178621, 2986.93196841,
       2977.29951149, 4187.95303301, 4318.49474799, 5551.47496696,
       1903.85780773, 3081.02376879, 5774.68586044, 1941.19484259,
       1616.65180681, 3251.71932608, 4931.75538324, 4522.96993805,
       1548.81596359, 4539.82613951, 3134.28111915, 4724.31113

#### Defining the MAPE function and evaluating the predictions

Mean Absolute Percentage Error (MAPE) is a common metric for measuring the accuracy of regression models. It measures how far the predicted values (y_pred) deviate from the actual values (y_test), expressed as a percentage of the actual values.
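The `mape` function defined in the next cell computes the formula below. Here $y_i$ is an actual value from `y_test`, $\hat{y}_i$ is the matching prediction, and $n$ is the number of test samples:

$$
\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|
$$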

In [11]:
# Define MAPE function
def mape(y_test, y_pred):
    return np.mean(np.abs((y_pred - y_test) / y_test)) * 100
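A small worked example shows one property of MAPE worth keeping in mind for this dataset, where the predictions above span roughly 1,000 to 5,800 index points: the same absolute error counts for more at lower index levels. The sketch repeats the `mape` definition so it runs on its own, and converts its inputs with `np.asarray`.

```python
import numpy as np


def mape(y_test, y_pred):
    """Mean absolute percentage error, in percent."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_pred - y_test) / y_test)) * 100


# Both predictions miss by 100 index points ...
y_true = [1000.0, 5000.0]
y_hat = [1100.0, 5100.0]
# ... but that is a 10% error on the first and only a 2% error on the second
print(mape(y_true[:1], y_hat[:1]))  # 10.0
print(mape(y_true[1:], y_hat[1:]))  # 2.0
print(mape(y_true, y_hat))          # about 6.0
```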

In [12]:
# Evaluation (mean_squared_error returns the MSE; its square root is the RMSE)
print('MAE =', mean_absolute_error(y_test, y_pred))
print('MSE =', mean_squared_error(y_test, y_pred))
print('R2 =', r2_score(y_test, y_pred))
print('MAPE =', mape(y_test, y_pred))

MAE = 296.1146332766306
MSE = 141440.14682391981
R2 = 0.9228589574124206
MAPE = 9.625783753058537



1. MAE (Mean Absolute Error): the average absolute difference between the actual values (y_test) and the predicted values (y_pred). MAE ≈ 296.11 means the predictions miss the actual index by about 296 points on average.
2. RMSE (Root Mean Square Error): the square root of the mean of the squared differences between the actual and predicted values. `mean_squared_error` returns the MSE rather than the RMSE, so the 141,440.15 printed above is the MSE. The corresponding RMSE is √141,440.15 ≈ 376.09 index points. Because errors are squared before averaging, RMSE weights large misses more heavily than MAE.
3. R-squared (R²): the proportion of the variance in the test data that the model explains. A value of 1 means the model explains all of the variance, and 0 means it does no better than always predicting the mean. The value can be negative for a model that does worse than that. R² ≈ 0.923 means the model explains about 92% of the variance in the test data.
4. MAPE (Mean Absolute Percentage Error): the average absolute error expressed as a percentage of the actual value. MAPE ≈ 9.63% means that, on average, the predictions deviate from the actual values by about 9.6%.
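The four metrics can also be written out directly in NumPy. Doing so makes explicit that RMSE is the square root of the MSE, which is what scikit-learn's `mean_squared_error` returns. This is a sketch for cross-checking, not a replacement for the `sklearn.metrics` functions. `regression_metrics` is a hypothetical helper, and the R² line follows the usual 1 - SS_res/SS_tot definition.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, R^2 and MAPE for 1-D regression targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    return {
        "MAE": np.mean(np.abs(err)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # back in the same units as the index
        "R2": 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
        "MAPE": np.mean(np.abs(err / y_true)) * 100,
    }


# Every prediction is off by exactly 100 points: MAE = RMSE = 100, R2 is about 0.992
m = regression_metrics([1000, 2000, 3000, 4000], [1100, 1900, 3100, 3900])
print(m["MAE"], m["RMSE"], m["R2"])

# Converting the MSE printed in the evaluation cell into an RMSE
print(np.sqrt(141440.14682391981))  # about 376.09
```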


**Lower MAE, RMSE, and MAPE, and a higher R-squared, indicate better model performance.**


Conclusion: the model predicts the test data very well (R² ≈ 0.92, MAPE ≈ 9.6%).

