## Implementasi Penggunaan Model Linear (Multilinear Regression untuk Memprediksi Risiko Penyakit Jantung Berdasarkan Data Medis)

Implementasi dari jurnal menggunakan Kemras, kali ini saya menggunakan Pytorch

In [809]:
# Import semua library yang dibutuh
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame as df
import pandas as pd
from sklearn.model_selection import train_test_split

from torch.autograd import Variable

## Import data set dan menampilan menggunakan Pandas

In [810]:
datasets =  pd.read_excel("Data/heart_disease_uci.xlsx")
dataFrame = df(datasets)
dataFrame.tail()

Unnamed: 0,id,age,sex,dataset,cp,trestbps,chol,fbs,restecg,thalch,exang,oldpeak,slope,ca,thal,num
915,916,54,Female,VA Long Beach,asymptomatic,127.0,333.0,1.0,st-t abnormality,154.0,0.0,0.0,,,,1
916,917,62,Male,VA Long Beach,typical angina,,139.0,0.0,st-t abnormality,,,,,,,0
917,918,55,Male,VA Long Beach,asymptomatic,122.0,223.0,1.0,st-t abnormality,100.0,0.0,0.0,,,fixed defect,2
918,919,58,Male,VA Long Beach,asymptomatic,,385.0,1.0,lv hypertrophy,,,,,,,0
919,920,62,Male,VA Long Beach,atypical angina,120.0,254.0,0.0,lv hypertrophy,93.0,1.0,0.0,,,,1


In [811]:
print('Data Types')
print('-'*10)
print(dataFrame.dtypes)
print("")
print ('Shape')
print('-'*10)
print (newShape:=dataFrame.shape)

Data Types
----------
id            int64
age           int64
sex          object
dataset      object
cp           object
trestbps    float64
chol        float64
fbs         float64
restecg      object
thalch      float64
exang       float64
oldpeak     float64
slope        object
ca          float64
thal         object
num           int64
dtype: object

Shape
----------
(920, 16)


## Clear the data
Karena kita melihat ada tipe data yg kosong di tandakan dengan `Nan` Maka kita perlu mendrop record yang ada Nan Nya


In [812]:
print("Data Shape setelah drop NA value")
dataFrame = dataFrame.dropna()
dataFrame.shape

print(f'data yang digunakan pada model sebesar {(dataFrame.shape[0]/newShape[0])*100}% dari total datasets yang tersedia pada kaggle')

Data Shape setelah drop NA value
data yang digunakan pada model sebesar 32.5% dari total datasets yang tersedia pada kaggle


## Ambil kolom yang digunakan dalam pembuatan modul
Berdasarakn jurnal tersebut kolom yang diambil adalah variabel independent yang digunakan adalah

* age
* chol
* tahlch
* ca
* thal

dan variable dependent adalah
* num (yang di bentuk dalam numrik)

Pengubahan yang dijalankan fixed defect diganti dengan 0, normal diganti dengan 1, dan reversable defect diganti dengan 2.


## Replace object
Merubah tipe data object `thal` menjadi numerik


In [813]:
dataFrame['thal'].head()

0         fixed defect
1               normal
2    reversable defect
3               normal
4               normal
Name: thal, dtype: object

In [814]:
thalUnique = dataFrame['thal'].unique()
print(thalUnique)
dataFrame.thal.replace(thalUnique,[0,1,2], inplace=True)

['fixed defect' 'normal' 'reversable defect']


The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  dataFrame.thal.replace(thalUnique,[0,1,2], inplace=True)
  dataFrame.thal.replace(thalUnique,[0,1,2], inplace=True)


In [815]:
# ind_var = dataFrame[['age', 'chol', 'thalch','ca','thal']]
ind_var = dataFrame[[ 'thalch']]
num_of_ind_var = ind_var.shape[1]
ind_var.head()

Unnamed: 0,thalch
0,150.0
1,108.0
2,129.0
3,187.0
4,172.0


In [816]:
dep_var = dataFrame[['num']]
dep_var.head()

Unnamed: 0,num
0,0
1,2
2,1
3,0
4,0


## Menentukan jumlah data sets yang digunakan untuk train dan test
Pemecahaan data untuk membuat model dan test menggunakan fucntion dari sklear, yaitu menggunakan method `tran_test_split`, pada penenlitan tersebut menggunakan variabel latihan sebanyak 80%

In [817]:
# x_train, x_test, y_train, y_test = train_test_split(ind_var, dep_var, test_size = 0.2, random_state = 4)

## Mari kita mulai dengen regression linear menggunakan pythor

In [818]:
print(f'Jumlah varaible input {5}')

Jumlah varaible input 5


In [819]:
# Parameter menerima input size dan outputnya
# 5 variabel dependen dan 1 variable independent
# model = nn.Linear(5, 1)
# model

In [820]:
# # Knversi data kedalam bentuk pytorch
# X_torch_train = torch.from_numpy(x_train.to_numpy().astype(np.float32))
# X_torch_test = torch.from_numpy(x_test.to_numpy().astype(np.float32))
# y_torch_train = torch.from_numpy(y_train.to_numpy().astype(np.float32))
# # dep_var_test = torch.from_numpy(y_test.to_numpy().astype(np.float32))

# # reshape the y from 1d array to a column vector
# y_torch_train = y_torch_train.view(y.shape[0], 1) # number of values is 1 and rows is 1

In [821]:
X = Variable(torch.from_numpy(ind_var.to_numpy().astype(np.float32)))
Y = Variable(torch.from_numpy(dep_var.to_numpy().astype(np.float32)))
# Y = torch.from_numpy(dep_var.to_numpy().astype(np.float32))

In [822]:
print(X.shape)
print(Y.shape)

torch.Size([299, 1])
torch.Size([299, 1])


In [823]:
df(X)

Unnamed: 0,0
0,150.0
1,108.0
2,129.0
3,187.0
4,172.0
...,...
294,141.0
295,115.0
296,174.0
297,98.0


In [824]:
df(Y)

Unnamed: 0,0
0,0.0
1,2.0
2,1.0
3,0.0
4,0.0
...,...
294,2.0
295,3.0
296,1.0
297,1.0


In [832]:
X = torch.from_numpy(np.array(range(0,10), dtype = np.float32))
Y = torch.from_numpy(np.array([2*a for a in X], dtype = np.float32))
df({'X':X, 'Y':Y})

Unnamed: 0,X,Y
0,0.0,0.0
1,1.0,2.0
2,2.0,4.0
3,3.0,6.0
4,4.0,8.0
5,5.0,10.0
6,6.0,12.0
7,7.0,14.0
8,8.0,16.0
9,9.0,18.0


In [826]:
class LinearRegressionModel (nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1,1)

    def forward(self,x):
        y_pred = self.linear(x)
        return y_pred

In [827]:
model = LinearRegressionModel()


In [828]:
## Membuat optimizer dan learning rate
learning_rate = 0.01
loss_function = nn.MSELoss()
# loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)

In [833]:
# Train Model
epochs = 100 # banyak percobaan
myMSE = list()
for epoch in range (epochs):

    # maju
    # print(epoch)
    predict_dep = model(X)
    # print(predict_dep)
    loss = loss_function(predict_dep, Y)
    # # loss = MSE(y_predicted = predict_dep, y_target = Y) # calculate MSE
    myMSE.append(loss)

    # # backward pass: calculate gradients
    loss.backward()
    
    # # perform weight updates
    optimizer.step()

    # # empty the gradients before the next iterations so they don't accumulate
    optimizer.zero_grad()
    
    

    if (epoch+1) % 5 == 0:
        print(f'Pelatihan ke : {epoch+1} loss: {loss.item():5f}')

# Plot kurva
# predicted = model(X).detach().numpy() # detach the tensor and then convert to numpy array again.
# # print(predicted)
# # print(myMSE.item())
# lost_list = [a.item() for a in myMSE]
# plt.plot(range(1, epochs+1),lost_list);
# plt.xlabel('Epoch (#)'), plt.ylabel('Mean squared Errors')



RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x10 and 1x1)