## Comparison of the ARIMA and LSTM Methods in Determining the Factors That Influence the Growth Trend of PT. Telkom Stock
Name : Muhammad Akbar Ramadhan

Student ID (NIM) : 3332200108

Class : NiaPY

## Problem Scoping

### **1. Background**
This study aims to determine the factors that influence the growth trend of PT. Telkom stock.


### **2. Objective**
This study compares the ARIMA and LSTM methods to find out which factors influence the growth trend of PT. Telkom stock.

### **3. Problem Formulation**

*   What are the ARIMA and LSTM methods compared in this study?
*   Which method is the most appropriate for forecasting PT. Telkom stock data?
*   Which of the two methods gives the best accuracy and the smallest MSE?






### **4. Scope**
This study uses PT. Telkom stock data from 2017 to 2022, and considers the factors that influence the growth trend based on the variables available in the dataset.

### **5. Methodology**
This study uses historical PT. Telkom stock price data for the 2017 - 2022 period and applies the ARIMA and LSTM methods to analyze the factors that influence the growth trend of PT. Telkom stock.

## Data Collection

### **1. Import Libraries and Dataset**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_theme(style="whitegrid")
import warnings
warnings.filterwarnings("ignore")


In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/mhmmdakbar1812/Data_Historis_PT_Telkom/main/TLKM.JK.csv', header=None)
df.head(10)

In [None]:
df[['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']] = df[0].str.split(',', expand=True)
df.drop([0], axis=1, inplace=True)
df.head()

## Data Preparation

### **1. Display the Structure and Data Types**

In [None]:
df.isnull().sum()

In [None]:
df.info()

In [None]:
invalid_rows = pd.to_numeric(df['Open'], errors='coerce').isna()
invalid_df = df[invalid_rows]
print(invalid_df)

In [None]:
df = df.drop(index=[0, 637])
invalid_rows = pd.to_numeric(df['Open'], errors='coerce').isna()
invalid_df = df[invalid_rows]
print(invalid_df)

In [None]:
df[['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']] = df[['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']].astype(float)
df.head()

In [None]:
df.tail()

### **2. Filter the Columns Used**

In [None]:
# Keep only the date and adjusted close; .copy() avoids pandas' SettingWithCopyWarning
df = df[['Date','Adj Close']].copy()
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

In [None]:
plt.figure(figsize=(16,5))
plt.plot(df['Adj Close'], c='r', label='Price stock')
plt.plot(df['Adj Close'].rolling(22).mean(), c='g', label='Rolling - Mean')
plt.plot(df['Adj Close'].rolling(22).std(), c='b', label='Rolling - Std.')
plt.legend()

In [None]:
print(df.columns)

In [None]:
df.head()

## Data Exploration

### **1. Identify Outliers**

In [None]:
df.describe().round(2)

In [None]:
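# z-score of each adjusted close relative to the whole series
z_scores = (df - df.mean()) / df.std()
threshold = 3
# Keep only the dates whose |z-score| exceeds the threshold
# (masking the DataFrame directly would return every date, with NaN for non-outliers)
outlier = z_scores.index[np.abs(z_scores['Adj Close']) > threshold]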

In [None]:
plt.figure(figsize=(10,6))
plt.scatter(df.index, df, s=4)
plt.scatter(outlier, df.loc[outlier], color='r', s=10)  # outliers in red so they stand out
plt.xlabel('Date')
plt.ylabel('Adj Close')
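
The z-score rule used in this section can be checked by hand on a small series. The following is a minimal sketch in plain Python; the function name `zscore_outliers` and the sample prices are illustrative assumptions and do not come from the TLKM dataset.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return the positions whose |z-score| exceeds `threshold`.

    Uses the sample standard deviation (ddof=1), which is also the
    default of pandas' `DataFrame.std()`.
    """
    mu = mean(values)
    sigma = stdev(values)
    return [i for i, v in enumerate(values) if abs((v - mu) / sigma) > threshold]

# Hypothetical prices: twenty ordinary days and one extreme day at position 20.
prices = [3000.0 + 10.0 * (i % 5) for i in range(20)] + [6000.0]
print(zscore_outliers(prices))
```

Only the extreme value is flagged here. With a threshold of 3, a series without extreme moves may legitimately produce an empty list.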

### **2. Descriptive Analysis**

In [None]:
df.describe().round(2)

In [None]:
from statsmodels.tsa.stattools import adfuller, kpss
import warnings
warnings.filterwarnings('ignore')

def stationery_test(ts_data, alpha):
  p_value_adf = adfuller(ts_data)[1]
  p_value_kpss = kpss(ts_data)[1]

  if p_value_adf<alpha:
    adf_res = 'Reject H0 / data is stationary'
  else:
    adf_res = 'Fail to reject H0 / data is non-stationary'

  if p_value_kpss<alpha:
    kpss_res = 'Reject H0 / data is non-stationary'
  else:
    kpss_res = 'Fail to reject H0 / data is stationary'
  
  temp = pd.DataFrame({
      'Uji' : ['ADF', 'KPSS'],
      'P_value' : [p_value_adf, p_value_kpss],
      'Alpha' : [alpha, alpha],
      'Result' : [adf_res, kpss_res]
  })

  return temp

stationery_test(df['Adj Close'], 0.05)

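The ADF p-value is greater than alpha, so H0 is not rejected: the ADF test indicates that the series is non-stationary.

The KPSS p-value is less than alpha, so H0 is rejected: the KPSS test also indicates that the series is non-stationary.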
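
Because the two tests have opposite null hypotheses, it can help to read their results together. The sketch below is one possible way to combine them; the function name `combine_adf_kpss` and the p-values in the usage lines are hypothetical, and the labels for the disagreeing cases follow the usual textbook reading rather than anything computed in this notebook.

```python
def combine_adf_kpss(p_adf, p_kpss, alpha=0.05):
    """Summarise an ADF and a KPSS p-value into one verdict.

    ADF:  H0 = unit root (non-stationary); a small p-value rejects H0.
    KPSS: H0 = stationary;                 a small p-value rejects H0.
    """
    adf_stationary = p_adf < alpha
    kpss_stationary = p_kpss >= alpha
    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    if adf_stationary:
        # ADF says stationary, KPSS says non-stationary
        return "difference-stationary (tests disagree)"
    # ADF says non-stationary, KPSS says stationary
    return "trend-stationary (tests disagree)"

# Hypothetical p-values, for illustration only
print(combine_adf_kpss(0.40, 0.01))   # both tests point to non-stationarity
print(combine_adf_kpss(0.001, 0.10))  # both tests point to stationarity
```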


### **3. Time Series Analysis**

In [None]:
from scipy.stats import boxcox

# Apply Box-Cox transformation
df_boxcox, lam = boxcox(df['Adj Close'])
df['Adj Close Box-Cox'] = df_boxcox

# Perform first difference on Box-Cox transformed data
difference_boxcox = pd.Series(df_boxcox).diff().dropna()

# Plot transformed and differenced data
fig, ax = plt.subplots(2,1, figsize=(12,8))
ax[0].plot(df.index, df['Adj Close'], label='Original')
ax[0].plot(df.index, df_boxcox, label='Box-Cox Transformed')
ax[0].set_ylabel('Value')
ax[0].legend()
ax[1].plot(df.index[1:], difference_boxcox, label='First Difference of Box-Cox Transformed')
ax[1].set_ylabel('Value')
ax[1].legend()
plt.show()

In [None]:
difference_boxcox.head()
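
For reference, the two steps applied above (Box-Cox transformation, then first differencing) can be written out in plain Python. This is a minimal sketch: `boxcox_transform`, `first_difference`, the prices and the value of `lam` are all illustrative assumptions. In the notebook, `scipy.stats.boxcox` estimates lambda from the data instead of taking a fixed value.

```python
import math

def boxcox_transform(values, lam):
    """Box-Cox transform for strictly positive values.

    y = (x**lam - 1) / lam   if lam != 0
    y = log(x)               if lam == 0
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def first_difference(values):
    """Return y[t] - y[t-1]; the result is one element shorter."""
    return [b - a for a, b in zip(values, values[1:])]

prices = [4000.0, 4040.0, 4020.0, 4100.0]   # hypothetical prices
lam = 0.5                                    # hypothetical lambda
transformed = boxcox_transform(prices, lam)
diffs = first_difference(transformed)
print(len(prices), len(diffs))
```

The differenced series is one observation shorter than the input, which is why the plot above uses `df.index[1:]` for the x-axis.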

In [None]:
#differencing data
"""
window_size = 12
df_diff = df['Adj Close'].rolling(window_size).mean()
df_diff1 = df['Adj Close'] - df['Adj Close'].shift()
df_diff1.head()
"""

In [None]:
#df_diff1 = df_diff - df_diff.shift()
#df_diff1.dropna(inplace=True)
plt.figure(figsize=(16,5))
plt.plot(difference_boxcox, c='r', label='Differenced Box-Cox series')
plt.plot(difference_boxcox.rolling(22).mean(), c='g', label='Rolling - Mean')
plt.plot(difference_boxcox.rolling(22).std(), c='b', label='Rolling - Std.')
plt.legend()

In [None]:
stationery_test(difference_boxcox, 0.05)

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plt.figure(figsize=(15,6))
plt.rcParams['figure.figsize'] = (15,6)
# plt.subplot(121);
plot_acf(difference_boxcox);

# plt.subplot(122)
plot_pacf(difference_boxcox);

## Model Development

### **1. Splitting the Dataset into Train and Test Data**

In [None]:
from statsmodels.tsa.api import ARIMA

ts = difference_boxcox
split = int(.8 * len(ts))
train, test = ts[:split], ts[split:]

### **2. Building the ARIMA Model**

In [None]:
# Note: `train` is already first-differenced, so d=1 here differences it a second time
model = ARIMA(train, order=(1,1,1))
fit_model = model.fit()
print(fit_model.summary())
"""
import pmdarima as pm

# Create model
model = pm.auto_arima(difference_boxcox, seasonal=False, suppress_warnings=True)

# Print model summary
print(model.summary())
"""


In [None]:
fit_model.forecast()[:1]

#### **a. Data Prediction**

#### **b. Compare Accuracy Scores on the Test and Train Data**

In [None]:
from sklearn.metrics import mean_squared_error
import math


ts = difference_boxcox
split = int(.8 * len(ts))
train, test = ts[:split], ts[split:]
history = [i for i in train]
pred = []

for i in range(len(test)):
  model = ARIMA(history, order=(1,1,1))
  fit_model = model.fit()
  temp = fit_model.forecast()[0]  # one-step-ahead forecast as a scalar
  pred.append(temp)
  history.append(test.iloc[i])

mse = mean_squared_error(test, pred)
rmse = math.sqrt(mse)
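
The loop above is a walk-forward (rolling-origin) evaluation: the model is refitted at every step and only ever sees observations that precede the one being forecast. The pattern is easier to see with a trivial forecaster in place of ARIMA. In this sketch `walk_forward_rmse`, `last_value` and the numbers are illustrative assumptions.

```python
import math

def walk_forward_rmse(train, test, forecaster):
    """One-step-ahead walk-forward evaluation.

    `forecaster(history)` must return a single forecast for the next step.
    After each forecast the true observation is appended to the history,
    so every prediction only uses data available at that point in time.
    """
    history = list(train)
    squared_errors = []
    for actual in test:
        predicted = forecaster(history)
        squared_errors.append((actual - predicted) ** 2)
        history.append(actual)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def last_value(history):
    """Stand-in forecaster: repeat the most recent observation."""
    return history[-1]

series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]   # made-up numbers
split = int(0.5 * len(series))
print(walk_forward_rmse(series[:split], series[split:], last_value))
```

Passing a function that fits a model on `history` and returns its one-step forecast would reproduce the structure of the ARIMA loop.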


In [None]:
print("MSE value : ",mse)
print("RMSE value : ",rmse)

In [None]:
forecast = fit_model.forecast(steps=1)

# Print forecasted value
print('One-day forecast:', forecast[0])

#### **c. Check for Overfitting and Underfitting**

In [None]:
# Split data into train and test sets
train_size = int(len(ts) * 0.8)
train, test = ts[0:train_size], ts[train_size:]

# Fit ARIMA model on training data
model = ARIMA(train, order=(1,1,1))
model_fit = model.fit()

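# Generate predictions on train and test data
# (the current statsmodels ARIMA predicts in levels by default, so `typ` is not needed)
# In-sample predictions skip position 0, which has no earlier value to difference against
train_pred = model_fit.predict(start=1, end=len(train)-1)
# Out-of-sample multi-step forecast covering the test period
test_pred = model_fit.predict(start=len(train), end=len(ts)-1)

# Evaluate performance on train and test data (train targets aligned with train_pred)
train_mse = mean_squared_error(train[1:], train_pred)
test_mse = mean_squared_error(test, test_pred)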

train_rmse = np.sqrt(train_mse)
test_rmse = np.sqrt(test_mse)

print(f'Train MSE: {train_mse:.6f}, Train RMSE: {train_rmse:.4f}')
print(f'Test MSE: {test_mse:.6f}, Test RMSE: {test_rmse:.4f}')


In [None]:
# Plot train data and train predictions
plt.figure(figsize=(12,6))
plt.plot(train.index, train, label='Train Data')
plt.plot(train_pred.index, train_pred, label='Train Predictions')

# Plot test data and test predictions
plt.plot(test.index, test, label='Test Data')
plt.plot(test_pred.index, test_pred, label='Test Predictions')

# Set title and labels
plt.title('ARIMA Model Predictions')
plt.xlabel('Date')
plt.ylabel('Value')

# Set legend
plt.legend()

# Show plot
plt.show()


### **3. Building the LSTM Model**

In [None]:
df_LSTM = pd.DataFrame(df.drop('Adj Close Box-Cox', axis=1))
df_LSTM

In [None]:
df_LSTM.head()

In [None]:
test.shape

In [None]:
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import layers

In [None]:
df_lag = pd.DataFrame(index=df_LSTM.index)
df_lag['Adj Close Lag1'] = df_LSTM['Adj Close'].shift(1)

In [None]:
df_new = df.join(df_lag)

In [None]:
scaler = MinMaxScaler()
df_new[['Adj Close', 'Adj Close Lag1']] = scaler.fit_transform(df_new[['Adj Close', 'Adj Close Lag1']])
df_new.dropna(inplace=True)
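
The cell above fits `MinMaxScaler` on the full series, so the minimum and maximum of the later (validation) period also shape the scaled training values. A common alternative is to fit the scaler on the training portion only. The sketch below shows the idea in plain Python for a single column; the `MinMax` class, the variable `scaler_1d` and the prices are illustrative assumptions, not the notebook's scaler.

```python
class MinMax:
    """Tiny min-max scaler for a single column (illustrative only).

    Assumes the fitted values are not all identical, otherwise the
    span would be zero.
    """

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = self.hi - self.lo
        return [(v - self.lo) / span for v in values]

    def inverse_transform(self, scaled):
        span = self.hi - self.lo
        return [s * span + self.lo for s in scaled]

prices = [3000.0, 3200.0, 3100.0, 3400.0, 3600.0]   # made-up prices
split = int(0.8 * len(prices))
scaler_1d = MinMax().fit(prices[:split])             # training rows only
scaled = scaler_1d.transform(prices)
print(scaled)
print(scaler_1d.inverse_transform(scaled))
```

With this choice, later values can fall outside the range [0, 1] (the last value here does), which is expected when new prices exceed the training range.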

In [None]:
df_new = df_new.drop('Adj Close Box-Cox', axis=1)
df_new

In [None]:
def prepare_data_for_lstm(data, lag=1):
    # Build (input window, next value) pairs from column 0 of a 2-D array
    X, y = [], []
    for i in range(len(data)-lag):
        X.append(data[i:(i+lag), 0])
        y.append(data[i + lag, 0])
    X = np.array(X)
    y = np.array(y).reshape(-1, 1)  # reshape y to (n, 1)
    return X, y
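
The windowing done by `prepare_data_for_lstm` can be illustrated on a short list. This is a plain-Python sketch of the same idea; `make_windows` and the sample values are illustrative assumptions, and `lag=2` is used only to make the windows visible.

```python
def make_windows(series, lag=1):
    """Split a sequence into (inputs, targets) for one-step-ahead learning.

    Each input holds `lag` consecutive values and the target is the value
    that immediately follows them.
    """
    inputs, targets = [], []
    for i in range(len(series) - lag):
        inputs.append(series[i:i + lag])
        targets.append(series[i + lag])
    return inputs, targets

scaled_close = [0.10, 0.20, 0.30, 0.40, 0.50]   # made-up scaled prices
X_demo, y_demo = make_windows(scaled_close, lag=2)
for window, target in zip(X_demo, y_demo):
    print(window, "->", target)
```

A series of length n yields n - lag pairs, so the targets always line up with the last n - lag dates of the series.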

In [None]:
lag = 1
X, y = prepare_data_for_lstm(df_new[['Adj Close', 'Adj Close Lag1']].values, lag)
train_size = int(len(X) * 0.8)
X_train, y_train = X[:train_size], y[:train_size]
X_val, y_val = X[train_size:], y[train_size:]
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)

In [None]:
model = Sequential([Input((lag, 1)),
                    LSTM(64),
                    Dense(32, activation='relu'),
                    Dense(32, activation='relu'),
                    Dense(1)
])

model.compile(loss='mse', 
              optimizer=Adam(learning_rate=0.05),
              metrics=['mean_absolute_error'])

model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, batch_size = 32)
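
To get a sense of the size of the network defined above, its trainable parameters can be counted by hand. The sketch below uses the standard formulas for an LSTM layer and a dense layer with biases; the helper names are illustrative assumptions, and the layer sizes are taken from the model definition in this section.

```python
def lstm_params(input_dim, units):
    """Parameters of a standard LSTM layer: 4 gates, each with input
    weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    """Parameters of a fully connected layer: weight matrix plus bias."""
    return input_dim * units + units

layer_sizes = [
    ("LSTM(64)", lstm_params(1, 64)),
    ("Dense(32)", dense_params(64, 32)),
    ("Dense(32)", dense_params(32, 32)),
    ("Dense(1)", dense_params(32, 1)),
]
for name, count in layer_sizes:
    print(f"{name:10s} {count:6d}")
print("total", sum(count for _, count in layer_sizes))
```

These figures can be compared against the output of `model.summary()`.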

#### **a. Data Prediction**

In [None]:
y_train_pred = model.predict(X_train)
y_val_pred = model.predict(X_val)
y_train_pred.shape

In [None]:
y_val_pred.shape

In [None]:
df_LSTM.head()

In [None]:
# reset_index(inplace=True) returns None, so its result is not assigned
df_LSTM.reset_index(inplace=True)

In [None]:
date = df_LSTM['Date'].values
date = date.reshape(-1,1)
date.shape

In [None]:
y_train = y_train.reshape(-1,1)
y_train.shape

In [None]:
y_val = y_val.reshape(-1,1)
y_val.shape

In [None]:
df_new.head()

In [None]:
newdata=df_new.values

In [None]:
newdata=newdata.reshape(-1,1)

In [None]:
newdata.shape

In [None]:
import matplotlib.pyplot as plt

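# select the dates that line up with the targets: y covers the last len(y) rows,
# with the training targets first and the validation targets after them
date_train = date[-(len(y_train) + len(y_val)):-len(y_val)]
date_val = date[-len(y_val):]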

# plot the actual and predicted values for train and validation sets
plt.figure(figsize=(15, 5))
plt.plot(date_train, y_train, label='Actual Train')
plt.plot(date_train, y_train_pred, label='Predicted Train')
plt.plot(date_val, y_val, label='Actual Validation')
plt.plot(date_val, y_val_pred, label='Predicted Validation')
plt.title('PT. Telkom Stock Price Prediction')
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.legend()
plt.show()


In [None]:
train_size

In [None]:
import numpy as np

# predict the next day from the most recent `lag` scaled closing prices,
# matching the (lag, 1) input shape the model was trained on
real_data = df_new['Adj Close'].values[-lag:]
real_data = np.reshape(real_data, (1, lag, 1))


In [None]:
real_data.shape

In [None]:
prediction = model.predict(real_data)
prediction

In [None]:
prediction = np.array([prediction[0, 0], 0])
prediction = scaler.inverse_transform([prediction])
prediction = prediction[0, 0]
print(f"prediction: {prediction}")

#### **b. Compare Accuracy Scores on the Test and Train Data**

In [None]:
train_loss, train_mae = model.evaluate(X_train, y_train, verbose=0)
val_loss, val_mae = model.evaluate(X_val, y_val, verbose=0)
print('Train Loss: {:.4f}'.format(train_loss))
print('Validation Loss: {:.4f}'.format(val_loss))
print('Train MAE: {:.4f}'.format(train_mae))
print('Validation MAE: {:.4f}'.format(val_mae))

#### **c. Check for Overfitting and Underfitting**

In [None]:
import matplotlib.pyplot as plt

history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=25)

train_loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(train_loss) + 1)

plt.plot(epochs, train_loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
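
Besides reading the plot, the loss history can be summarised numerically. A validation loss that reaches its minimum early and then rises while the training loss keeps falling is a typical sign of overfitting. The helpers and loss values below are illustrative assumptions, not results from this notebook.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    lowest = min(val_losses)
    return val_losses.index(lowest) + 1

def epochs_since_best(val_losses):
    """Number of epochs trained after the best validation loss was reached."""
    return len(val_losses) - best_epoch(val_losses)

# Made-up validation losses that improve and then turn upward
demo_val = [0.055, 0.025, 0.015, 0.014, 0.018, 0.024]
print(best_epoch(demo_val), epochs_since_best(demo_val))
```

The same functions could be applied to `history.history['val_loss']` from the cell above.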


## Conclusion

The resulting values are:
*   MSE for ARIMA:

    MSE value :  0.0006948381605498469
    
    RMSE value :  0.026359783014088846

*   MSE for LSTM:

    Train Loss: 0.0006

    Validation Loss: 0.0012
    
Based on the MSE (which is also the loss function of the LSTM) of the two methods, it can be concluded that Deep Learning (LSTM) does not always perform far better than ARIMA, which is a classical statistical forecasting method. For datasets that are not very complex, ARIMA is already well suited to forecasting.
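
Note that the two figures are measured on different scales: the ARIMA error is computed on the differenced Box-Cox series, while the LSTM loss is computed on min-max scaled prices. One way to put them on a common footing is to map the one-step predictions of each model back to prices and compute the RMSE there. The sketch below shows the arithmetic only; all function names and numbers are illustrative assumptions, and none of the values come from the notebook's results.

```python
import math

def boxcox_scalar(x, lam):
    """Box-Cox transform of one positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def inv_boxcox_scalar(y, lam):
    """Inverse of the Box-Cox transform."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)

def price_from_diff_forecast(prev_price, diff_forecast, lam):
    """Undo differencing, then undo Box-Cox, for a one-step forecast."""
    return inv_boxcox_scalar(boxcox_scalar(prev_price, lam) + diff_forecast, lam)

def price_from_minmax(scaled, lo, hi):
    """Undo min-max scaling fitted with minimum `lo` and maximum `hi`."""
    return scaled * (hi - lo) + lo

def rmse(actual, predicted):
    """Root mean squared error of two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up example: two days of actual prices and forecasts from each scale
previous = [4000.0, 4050.0]
actual = [4050.0, 4030.0]
arima_like = [price_from_diff_forecast(p, 0.0, lam=0.5) for p in previous]
lstm_like = [price_from_minmax(s, lo=3000.0, hi=5000.0) for s in (0.52, 0.51)]
print(rmse(actual, arima_like), rmse(actual, lstm_like))
```

Both errors are then expressed in the same unit (price), which makes a side-by-side comparison meaningful.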

