**Background**

COVID-19 is an infectious disease caused by the SARS-CoV-2 virus. Identifying infected individuals quickly is important so that appropriate preventive measures and treatment can be taken. Two algorithms commonly used to predict COVID-19 status (positive or negative) are Support Vector Machines (SVM) and logistic regression.

With these models, we can give an initial assessment of how likely a person is to be infected with COVID-19, which can support decision-making and follow-up care.

**SVM & Logistic Regression Algorithms**


Support Vector Machines (SVM) is a machine learning algorithm used for classification and regression. SVM looks for the optimal hyperplane, or separating boundary, between two classes of data. Logistic regression and SVM are both commonly used for binary classification problems, where the goal is to predict a target class that can take one of two values. In the context of predicting COVID-19 from symptoms, both can serve as classifiers that distinguish infected from non-infected individuals based on the symptoms they report.
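To make the two decision rules concrete, here is a minimal sketch in plain Python. The weights and bias are hypothetical values chosen for illustration, not parameters fitted to the dataset. A linear SVM classifies by the sign of the score `w·x + b`, while logistic regression passes the same kind of score through the sigmoid function to get a probability.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(weights, bias, features):
    """Signed score w.x + b; its sign says which side of the hyperplane x lies on."""
    return sum(w * x for w, x in zip(weights, features)) + bias

# Hypothetical weights for two symptoms (fever, cough); not fitted to the dataset.
weights = [1.5, 2.0]
bias = -2.0

patient = [1, 1]  # has both fever and cough
score = linear_score(weights, bias, patient)

# A linear SVM assigns the class from the sign of the score.
svm_label = 1 if score >= 0 else 0

# Logistic regression turns the score into a probability of the positive class.
logreg_probability = sigmoid(score)

print(score, svm_label, round(logreg_probability, 3))
```

Both models draw a linear boundary here; they differ in how the boundary is fitted and in whether the output is a hard label or a probability.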

**Data Preparation Stage**

In [None]:
# import libraries
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split

In [None]:
# import dataset
df = pd.read_csv('/content/covid_early_stage_symptoms.csv')
df.head()

Unnamed: 0,gender,age_year,fever,cough,runny_nose,muscle_soreness,pneumonia,diarrhea,lung_infection,travel_history,isolation_treatment,SARS-CoV-2 Positive
0,male,89,1,1,0,0,0,0,0,1,0,0
1,male,68,1,0,0,0,0,0,0,0,0,0
2,male,68,0,0,0,0,0,0,0,1,0,0
3,male,68,1,1,0,0,0,0,0,1,1,1
4,male,50,1,1,1,0,1,0,0,1,0,1


In [None]:
# rename the column SARS-CoV-2 Positive to covid_positive
df = df.rename(columns={'SARS-CoV-2 Positive': 'covid_positive'})

In [None]:
# encode the gender column as numbers: 1 for male and 0 for female
# (assign the result back instead of using inplace=True on a column selection)
df['gender'] = df['gender'].map({'male': 1, 'female': 0})
print(df)

      gender  age_year  fever  cough  runny_nose  muscle_soreness  pneumonia  \
0          1        89      1      1           0                0          0   
1          1        68      1      0           0                0          0   
2          1        68      0      0           0                0          0   
3          1        68      1      1           0                0          0   
4          1        50      1      1           1                0          1   
...      ...       ...    ...    ...         ...              ...        ...   
6507       0        44      1      1           0                0          0   
6508       0        44      1      1           0                0          0   
6509       0        58      0      0           0                0          0   
6510       0        58      1      1           0                0          0   
6511       1        12      1      1           0                0          0   

      diarrhea  lung_infection  travel_

In [None]:
# check the size of the data frame
df.shape

(6512, 12)

In [None]:
# display summary information about the data frame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6512 entries, 0 to 6511
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   gender               6512 non-null   int64
 1   age_year             6512 non-null   int64
 2   fever                6512 non-null   int64
 3   cough                6512 non-null   int64
 4   runny_nose           6512 non-null   int64
 5   muscle_soreness      6512 non-null   int64
 6   pneumonia            6512 non-null   int64
 7   diarrhea             6512 non-null   int64
 8   lung_infection       6512 non-null   int64
 9   travel_history       6512 non-null   int64
 10  isolation_treatment  6512 non-null   int64
 11  covid_positive       6512 non-null   int64
dtypes: int64(12)
memory usage: 610.6 KB


In [None]:
# display the data type of each column
df.dtypes

gender                 int64
age_year               int64
fever                  int64
cough                  int64
runny_nose             int64
muscle_soreness        int64
pneumonia              int64
diarrhea               int64
lung_infection         int64
travel_history         int64
isolation_treatment    int64
covid_positive         int64
dtype: object

In [None]:
# display summary statistics for each column
df.describe()

Unnamed: 0,gender,age_year,fever,cough,runny_nose,muscle_soreness,pneumonia,diarrhea,lung_infection,travel_history,isolation_treatment,covid_positive
count,6512.0,6512.0,6512.0,6512.0,6512.0,6512.0,6512.0,6512.0,6512.0,6512.0,6512.0,6512.0
mean,0.517045,44.019502,0.41078,0.303286,0.084306,0.003993,0.074785,0.005682,0.131296,0.650952,0.216984,0.2414
std,0.499748,16.112865,0.492013,0.459713,0.277867,0.063066,0.263064,0.075169,0.33775,0.476706,0.412223,0.427965
min,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,32.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1.0,43.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
75%,1.0,55.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
max,1.0,96.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [None]:
# check for missing values by counting the empty entries in each column
df.isnull().sum()

gender                 0
age_year               0
fever                  0
cough                  0
runny_nose             0
muscle_soreness        0
pneumonia              0
diarrhea               0
lung_infection         0
travel_history         0
isolation_treatment    0
covid_positive         0
dtype: int64

In [None]:
# compute the correlation between columns.
# Positive correlation: the two values tend to move in the same direction.
# Negative correlation: the two values tend to move in opposite directions.
df.corr()

Unnamed: 0,gender,age_year,fever,cough,runny_nose,muscle_soreness,pneumonia,diarrhea,lung_infection,travel_history,isolation_treatment,covid_positive
gender,1.0,-0.088342,0.008685,-0.006127,0.000157,0.002713,0.006074,-0.0128,-0.021905,0.010473,-0.025785,-0.007034
age_year,-0.088342,1.0,0.046861,0.072975,0.014143,-0.002193,0.014113,0.037316,0.043555,-0.085734,0.055691,0.040054
fever,0.008685,0.046861,1.0,0.763711,0.360032,0.070879,0.340502,0.036549,0.236403,-0.324988,0.230636,0.434909
cough,-0.006127,0.072975,0.763711,1.0,0.453879,0.090664,0.43091,0.047905,0.311282,-0.42795,0.257287,0.555225
runny_nose,0.000157,0.014143,0.360032,0.453879,1.0,0.024611,0.936984,-0.000877,0.507186,-0.197544,0.144646,0.477185
muscle_soreness,0.002713,-0.002193,0.070879,0.090664,0.024611,1.0,-0.008743,-0.004786,0.04028,-0.050702,0.055288,0.004118
pneumonia,0.006074,0.014113,0.340502,0.43091,0.936984,-0.008743,1.0,-0.013725,0.523868,-0.183726,0.135015,0.503991
diarrhea,-0.0128,0.037316,0.036549,0.047905,-0.000877,-0.004786,-0.013725,1.0,0.019008,-0.03894,0.034555,-0.013997
lung_infection,-0.021905,0.043555,0.236403,0.311282,0.507186,0.04028,0.523868,0.019008,1.0,-0.245693,0.26969,0.615855
travel_history,0.010473,-0.085734,-0.324988,-0.42795,-0.197544,-0.050702,-0.183726,-0.03894,-0.245693,1.0,-0.103789,-0.332971
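As a hedged illustration of what each cell in a correlation matrix means, the sketch below computes the Pearson coefficient by hand (Pearson is the default method of `df.corr()`). The three short lists are made-up toy values, not rows from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Made-up binary indicators for six hypothetical patients.
fever = [1, 1, 0, 1, 0, 0]
cough = [1, 1, 0, 0, 0, 0]
travel = [0, 0, 1, 0, 1, 1]  # exact opposite of fever in this toy data

r_fever_cough = pearson(fever, cough)    # positive: the two tend to occur together
r_fever_travel = pearson(fever, travel)  # negative: one is present when the other is absent

print(round(r_fever_cough, 4), round(r_fever_travel, 4))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.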


In [None]:
# display the correlation between columns as a color-coded matrix
# (Styler.set_precision is deprecated; format(precision=...) replaces it)
df.corr().style.background_gradient().format(precision=1)


Unnamed: 0,gender,age_year,fever,cough,runny_nose,muscle_soreness,pneumonia,diarrhea,lung_infection,travel_history,isolation_treatment,covid_positive
gender,1.0,-0.1,0.0,-0.0,0.0,0.0,0.0,-0.0,-0.0,0.0,-0.0,-0.0
age_year,-0.1,1.0,0.0,0.1,0.0,-0.0,0.0,0.0,0.0,-0.1,0.1,0.0
fever,0.0,0.0,1.0,0.8,0.4,0.1,0.3,0.0,0.2,-0.3,0.2,0.4
cough,-0.0,0.1,0.8,1.0,0.5,0.1,0.4,0.0,0.3,-0.4,0.3,0.6
runny_nose,0.0,0.0,0.4,0.5,1.0,0.0,0.9,-0.0,0.5,-0.2,0.1,0.5
muscle_soreness,0.0,-0.0,0.1,0.1,0.0,1.0,-0.0,-0.0,0.0,-0.1,0.1,0.0
pneumonia,0.0,0.0,0.3,0.4,0.9,-0.0,1.0,-0.0,0.5,-0.2,0.1,0.5
diarrhea,-0.0,0.0,0.0,0.0,-0.0,-0.0,-0.0,1.0,0.0,-0.0,0.0,-0.0
lung_infection,-0.0,0.0,0.2,0.3,0.5,0.0,0.5,0.0,1.0,-0.2,0.3,0.6
travel_history,0.0,-0.1,-0.3,-0.4,-0.2,-0.1,-0.2,-0.0,-0.2,1.0,-0.1,-0.3


**Modeling Stage**

In [None]:
# split the columns into features x and target y; covid_positive is the column to predict, so it goes in y
# (isolation_treatment is also left out of the features)
x = df.drop(columns=['covid_positive', 'isolation_treatment'])
y = df['covid_positive']

# split the data into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x,
                                                    y,
                                                    test_size=0.3,
                                                    random_state=0)
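Only about 24% of the rows are positive (the mean of `covid_positive` in the summary statistics), so a plain random split can leave the training and test sets with slightly different class ratios. The sketch below shows the `stratify` option of `train_test_split`, which keeps the ratio the same in both parts. It runs on synthetic stand-in data; `X_demo` and `y_demo` are hypothetical names, and the notebook itself does not use stratification.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: 1000 rows, 24% positive labels.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(1000, 3))
y_demo = np.array([1] * 240 + [0] * 760)

# stratify=y_demo preserves the positive/negative ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print(len(y_tr), len(y_te))
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))
```

In the notebook, the equivalent change would be passing `stratify=y` to the existing call. Doing so would change which rows land in the test set and therefore the reported scores.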

**Modeling & Training Stage**

In [None]:
# import the logistic regression module
from sklearn.linear_model import LogisticRegression
model_log = LogisticRegression()

# fit the model on the training data
model_log.fit(x_train, y_train)

# check the model accuracy on the test data (in percent)
model_log.score(x_test, y_test)*100

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


88.84339815762539
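The convergence warning above suggests raising `max_iter` or scaling the data. A minimal sketch of both remedies together, assuming a scikit-learn pipeline is acceptable here: `StandardScaler` puts every feature on a comparable scale before `LogisticRegression` is fitted. The data is synthetic (`make_classification`), with one column stretched to mimic an unscaled feature such as `age_year`; the variable names are hypothetical and the score it prints says nothing about the COVID-19 dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the real features.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_demo[:, 0] = X_demo[:, 0] * 20 + 45  # mimic a wide-ranged column such as age_year

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Scale first, then fit; max_iter is raised as extra headroom for the solver.
scaled_log = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_log.fit(X_tr, y_tr)

demo_accuracy = scaled_log.score(X_te, y_te)
iterations_used = scaled_log.named_steps['logisticregression'].n_iter_[0]
print(round(demo_accuracy * 100, 2), iterations_used)
```

Because the scaler is part of the pipeline, it is fitted on the training rows only and then applied to the test rows, which avoids leaking test statistics into training.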

In [None]:
# import the SVM module
from sklearn import svm

model_svm = svm.SVC(kernel='linear',decision_function_shape='ovr',
                    verbose=False)

model_svm.fit(x_train, y_train)
model_svm.score(x_test, y_test)*100

88.84339815762539

**Evaluation Stage**

In [None]:
y_pred = model_svm.predict(x_test)

In [None]:
# print the performance metrics of the classification model
from sklearn import metrics

print('Precision : ', round(metrics.precision_score(y_test, y_pred)*100, 2))
print('F1 score : ', round(metrics.f1_score(y_test, y_pred)*100, 2))
print('recall : ', round(metrics.recall_score(y_test, y_pred)*100, 2))
print('Accuracy : ', round(metrics.accuracy_score(y_test, y_pred)*100, 2))

Precision :  94.43
F1 score :  71.32
recall :  57.29
Accuracy :  88.84
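To show how the four metrics relate, the sketch below recomputes them from confusion-matrix counts. The counts are an inference, not notebook output: they are one set of values consistent with the reported metrics and with a test set of 1,954 rows (30% of 6,512, rounded up). The gap between precision and recall means the SVM rarely raises a false alarm but misses a large share of the true positive cases.

```python
# Inferred counts (assumption): chosen to match the reported metrics,
# not read from an actual confusion matrix in the notebook.
tp, fp, fn, tn = 271, 16, 202, 1465

precision = tp / (tp + fp)            # of predicted positives, how many are correct
recall = tp / (tp + fn)               # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)

print('Precision : ', round(precision * 100, 2))
print('F1 score : ', round(f1 * 100, 2))
print('Recall : ', round(recall * 100, 2))
print('Accuracy : ', round(accuracy * 100, 2))
```

For a screening task, recall is often the metric to watch, because every false negative is an infected person the model reports as negative.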


In [None]:
# display the columns of x as a reference for filling in the parameters of the prediction step
x.head(7)

Unnamed: 0,gender,age_year,fever,cough,runny_nose,muscle_soreness,pneumonia,diarrhea,lung_infection,travel_history
0,1,89,1,1,0,0,0,0,0,1
1,1,68,1,0,0,0,0,0,0,0
2,1,68,0,0,0,0,0,0,0,1
3,1,68,1,1,0,0,0,0,0,1
4,1,50,1,1,1,0,1,0,0,1
5,1,50,1,1,0,0,0,0,0,0
6,0,55,1,1,0,0,0,0,0,0


In [None]:
# test the model on a single case

# enter the parameters in the same column order as the feature set x
param = [[1, 50, 1, 1, 1, 0, 1, 0, 0, 1]]

# run the prediction
predicted = model_log.predict(param)
score = model_log.predict_proba(param)

# predict_proba returns class probabilities, so the value reported is a probability, not an accuracy
if predicted[0] == 1:
    result = 'Positive'
    probability = str(score[0][1]*100)
else:
    result = 'Negative'
    probability = str(score[0][0]*100)

print('{0:.5}'.format(probability)+'%', 'probability that you are', result, 'for Covid-19')

93.98% probability that you are Positive for Covid-19


