#### Table of Contents:
- [1. SVM Modeling, Stage 1](#1-pemodelan-svm-tahap-1)
- [2. Performance Measurement, Stage 1](#2-pengukuran-performa-tahap-1)
- [3. SVM Modeling, Stage 2](#3-pemodelan-svm-tahap-2)
- [4. Performance Measurement, Stage 2](#4-pengukuran-performa-tahap-2)
- [5. Testing](#5-pengujian)

***

In [1]:
from IPython.core.interactiveshell import InteractiveShell
import sklearnex

InteractiveShell.ast_node_interactivity = "all"
sklearnex.patch_sklearn()

%load_ext watermark
%watermark -a "F. Waskito" -n -t -u -v

Author: F. Waskito

Last updated: Wed Nov 29 2023 23:30:13

Python implementation: CPython
Python version       : 3.9.18
IPython version      : 8.15.0



Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


In [2]:
import pandas
from collection import analysis

path = "data/tweet/depresi_or_bipolar_tweets_id_01-10.csv"
tweets_table = pandas.read_csv(path)
texts = tweets_table.loc[:, "Text"].copy().to_list()
labels = tweets_table.loc[:, "Sentiment"].copy().to_list()

analysis.get_shape(texts)
analysis.get_distribution(labels)

Shape: (3409,)
Distribution:
	('positive', 864)
	('neutral', 1253)
	('negative', 1292)


Preprocessing

In [3]:
import time
from tqdm import tqdm
from preprocess.preprocessing import TextPreprocessor

preprocessor = TextPreprocessor()
# Preprocess each tweet in place: clean, standardize, tokenize, filter, then stem.
for i, text in enumerate(tqdm(texts)):
    text = preprocessor.clean(text)
    text = preprocessor.standardize(text)
    tokens = preprocessor.tokenize(text)
    tokens = preprocessor.filter(tokens)
    texts[i] = preprocessor.stem(tokens)
    time.sleep(0.001)

100%|██████████| 3409/3409 [12:16<00:00,  4.63it/s]


Feature Extraction

In [4]:
from preprocess.feature.extraction import TextVectorizer

In [35]:
# BOW
extractor = TextVectorizer(texts)
extractor.transform(target="bow", min_df=2, norm=True)
vector_texts = extractor.vectors

analysis.get_shape(vector_texts)

Shape: (3409, 2202)


In [5]:
# TF-IDF
extractor = TextVectorizer(texts)
extractor.transform(target="tfidf", min_df=2, norm=True)
vector_texts = extractor.vectors

analysis.get_shape(vector_texts)

Shape: (3409, 2202)
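
The BOW and TF-IDF matrices have the same shape, (3409, 2202), presumably because `min_df=2` prunes the vocabulary identically in both cases, so only the cell values differ. The internals of the project's `TextVectorizer` are not shown in this notebook, so the following is only a minimal sketch of the same idea using scikit-learn's `CountVectorizer` and `TfidfVectorizer`. The three-sentence corpus and all variable names are illustrative assumptions, not the notebook's data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus (not taken from the tweet dataset).
docs = [
    "saya merasa sedih hari ini",
    "hari ini saya senang",
    "saya lelah dan sedih",
]

# min_df=2 keeps only terms that occur in at least two documents.
bow = CountVectorizer(min_df=2)
bow_matrix = bow.fit_transform(docs)

# Same pruning rule, but cells hold L2-normalised TF-IDF weights.
tfidf = TfidfVectorizer(min_df=2, norm="l2")
tfidf_matrix = tfidf.fit_transform(docs)

vocabulary = sorted(bow.vocabulary_)
row_norms_sq = (tfidf_matrix.toarray() ** 2).sum(axis=1)

print(vocabulary)
print(bow_matrix.shape, tfidf_matrix.shape)
```

Both matrices share one vocabulary, so they have the same number of columns; the TF-IDF rows are additionally scaled to unit length.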


Label Encoding

In [6]:
from preprocess.encoding import LabelEncoder

In [7]:
encoder = LabelEncoder(labels)
encoder.transform(target="integer")
encoded_labels = encoder.encoded_labels

analysis.get_distribution(encoded_labels)

Distribution:
	(2, 864)
	(1, 1253)
	(0, 1292)
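
Matching the class counts before and after encoding shows that `negative` became 0, `neutral` became 1, and `positive` became 2. How the project's `LabelEncoder` chooses that order is not shown; the sketch below assumes a simple alphabetical ordering, which happens to reproduce the same mapping. The function name `encode_labels` is hypothetical.

```python
from collections import Counter


def encode_labels(labels):
    """Map each distinct label to an integer, in alphabetical order (assumed rule)."""
    classes = sorted(set(labels))
    mapping = {name: index for index, name in enumerate(classes)}
    return [mapping[name] for name in labels], mapping


# Hypothetical sample, not the notebook's label column.
sample = ["positive", "negative", "neutral", "negative"]
encoded, mapping = encode_labels(sample)

print(mapping)
print(Counter(encoded))
```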


Data Splitting

In [8]:
from sklearn.model_selection import train_test_split

In [9]:
X_train, X_test, y_train, y_test = train_test_split(
    vector_texts,
    encoded_labels,
    test_size = 0.3,
    random_state = 42
)

print("> Train set:")
analysis.get_shape(X_train)
analysis.get_distribution(y_train)
print("\n> Test set:")
analysis.get_shape(X_test)
analysis.get_distribution(y_test)

> Train set:
Shape: (2386, 2202)
Distribution:
	(1, 866)
	(2, 619)
	(0, 901)

> Test set:
Shape: (1023, 2202)
Distribution:
	(0, 391)
	(1, 387)
	(2, 245)
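
The split above does not pass `stratify`, so the class shares drift slightly between the two sets: label 2 makes up 619/2386 (about 25.9%) of the training set but 245/1023 (about 23.9%) of the test set. If equal shares are wanted, one option is the `stratify` argument of `train_test_split`. The sketch below uses a made-up label list purely for illustration.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 60 / 30 / 10 samples per class.
y = [0] * 60 + [1] * 30 + [2] * 10
X = [[i] for i in range(len(y))]

# stratify=y keeps the class proportions the same in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

train_counts = Counter(y_tr)
test_counts = Counter(y_te)
print(sorted(train_counts.items()))
print(sorted(test_counts.items()))
```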


***

### __1. SVM Modeling, Stage 1__

Jump to:
- [Table of Contents](#daftar-isi)
- [2. Performance Measurement, Stage 1](#2-pengukuran-performa-tahap-1)

In [10]:
from sklearn.svm import SVC

In [11]:
c = 100.0
gamma = 1.0
degree = 9

In [321]:
linear_svm = SVC(kernel="linear", C=c)

In [12]:
rbf_svm = SVC(kernel="rbf", C=c, gamma=gamma)

In [27]:
poly_svm = SVC(kernel="poly", C=c, gamma=gamma, degree=degree)
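
A note on how `C` and `gamma` interact for the polynomial kernel. scikit-learn's `SVC` leaves `coef0` at 0.0 unless told otherwise, so the kernel reduces to `(gamma * <x, z>) ** degree`. Multiplying `gamma` by 10 therefore multiplies every kernel value by `10 ** degree`, and in the SVM dual problem a kernel scaled by a factor s with bound `C` is equivalent to the unscaled kernel with bound `s * C`. The plain-Python sketch below only checks the kernel scaling; `poly_kernel` and the two vectors are illustrative assumptions.

```python
import math


def poly_kernel(x, z, gamma, degree, coef0=0.0):
    """Polynomial kernel (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


# Hypothetical feature vectors.
x = [0.2, 0.0, 0.5]
z = [0.1, 0.3, 0.4]
degree = 3

k_small = poly_kernel(x, z, gamma=0.1, degree=degree)
k_large = poly_kernel(x, z, gamma=1.0, degree=degree)

# With coef0 = 0, a tenfold gamma scales the kernel by 10 ** degree.
ratio = k_large / k_small
print(math.isclose(ratio, 10 ** degree))
```

This is consistent with the degree-3 results in section 2, where `C=1000.0, gamma=0.1` and `C=1.0, gamma=1.0` report identical scores.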

### __2. Performance Measurement, Stage 1__

Jump to:
- [1. SVM Modeling, Stage 1](#1-pemodelan-svm-tahap-1)
- [3. SVM Modeling, Stage 2](#3-pemodelan-svm-tahap-2)

In [13]:
from validation.cross import ImbalancedCV

In [14]:
n_fold = 5
scoring = ["accuracy", "precision", "recall", "f1",]
random_state = 42

In [324]:
linear_perform = ImbalancedCV(
    model = linear_svm,
    n_fold = n_fold,
    scoring = scoring,
    scoring_avg = "macro",
    random_state = random_state,
)

linear_perform.validate(X_train, y_train)

CV: 100%|██████████| 5/5 [00:16<00:00,  3.36s/it]


In [16]:
rbf_perform = ImbalancedCV(
    model = rbf_svm,
    n_fold = n_fold,
    scoring = scoring,
    scoring_avg = "macro",
    random_state = random_state,
)

rbf_perform.validate(X_train, y_train)

CV: 100%|██████████| 5/5 [00:19<00:00,  3.97s/it]


In [29]:
poly_perform = ImbalancedCV(
    model = poly_svm,
    n_fold = n_fold,
    scoring = scoring,
    scoring_avg = "macro",
    random_state = random_state,
)

poly_perform.validate(X_train, y_train)

CV: 100%|██████████| 5/5 [00:16<00:00,  3.32s/it]
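
`ImbalancedCV` is a project-specific class whose implementation is not part of this notebook. As a rough stand-in, and only under the assumption that it runs a 5-fold cross-validation and averages macro-averaged metrics over the folds, the same kind of measurement can be sketched with scikit-learn's `StratifiedKFold` and `cross_validate`. The synthetic dataset and the `mean_*` key names mirror the output format used here but are otherwise illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the vectorised tweets.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=42
)

# Stratified folds keep class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = {
    "accuracy": "accuracy",
    "precision": "precision_macro",
    "recall": "recall_macro",
    "f1": "f1_macro",
}

results = cross_validate(SVC(kernel="linear", C=1.0), X, y, cv=cv, scoring=scoring)

# Average each metric over the five folds.
mean_scores = {
    f"mean_{name}": round(float(results[f"test_{name}"].mean()), 3)
    for name in scoring
}
print(mean_scores)
```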


#### 2.1 Performance Measurement, Stage 1: SVM + BOW

##### 2.1.1 Linear SVM Performance

In [22]:
linear_perform.get_score() # C= 1.0

{'mean_accuracy': 0.672,
 'mean_precision': 0.662,
 'mean_recall': 0.657,
 'mean_f1': 0.654}

In [33]:
linear_perform.get_score() # C= 10.0

{'mean_accuracy': 0.679,
 'mean_precision': 0.667,
 'mean_recall': 0.665,
 'mean_f1': 0.665}

In [43]:
linear_perform.get_score() # C= 100.0

{'mean_accuracy': 0.653,
 'mean_precision': 0.641,
 'mean_recall': 0.639,
 'mean_f1': 0.639}

In [53]:
linear_perform.get_score() # C= 1000.0

{'mean_accuracy': 0.642,
 'mean_precision': 0.632,
 'mean_recall': 0.629,
 'mean_f1': 0.63}

In [64]:
linear_perform.get_score() # C= 10000.0

{'mean_accuracy': 0.636,
 'mean_precision': 0.627,
 'mean_recall': 0.623,
 'mean_f1': 0.623}

##### 2.1.2 RBF SVM Performance

In [23]:
rbf_perform.get_score() # C= 1.0; gamma= 0.01

{'mean_accuracy': 0.531,
 'mean_precision': 0.596,
 'mean_recall': 0.511,
 'mean_f1': 0.476}

In [34]:
rbf_perform.get_score() # C= 10.0; gamma= 0.01

{'mean_accuracy': 0.666,
 'mean_precision': 0.673,
 'mean_recall': 0.649,
 'mean_f1': 0.645}

In [44]:
rbf_perform.get_score() # C= 100.0; gamma= 0.01

{'mean_accuracy': 0.676,
 'mean_precision': 0.664,
 'mean_recall': 0.66,
 'mean_f1': 0.659}

In [54]:
rbf_perform.get_score() # C= 1000.0; gamma= 0.01

{'mean_accuracy': 0.679,
 'mean_precision': 0.666,
 'mean_recall': 0.66,
 'mean_f1': 0.661}

In [65]:
rbf_perform.get_score() # C= 10,000.0; gamma= 0.01

{'mean_accuracy': 0.647,
 'mean_precision': 0.636,
 'mean_recall': 0.631,
 'mean_f1': 0.632}

In [74]:
rbf_perform.get_score() # C= 1.0; gamma= 0.1

{'mean_accuracy': 0.648,
 'mean_precision': 0.651,
 'mean_recall': 0.627,
 'mean_f1': 0.625}

In [81]:
rbf_perform.get_score() # C= 10.0; gamma= 0.1

{'mean_accuracy': 0.68,
 'mean_precision': 0.669,
 'mean_recall': 0.661,
 'mean_f1': 0.661}

In [88]:
rbf_perform.get_score() # C= 100.0; gamma= 0.1

{'mean_accuracy': 0.678,
 'mean_precision': 0.665,
 'mean_recall': 0.658,
 'mean_f1': 0.658}

In [95]:
rbf_perform.get_score() # C= 1000.0; gamma= 0.1

{'mean_accuracy': 0.647,
 'mean_precision': 0.636,
 'mean_recall': 0.627,
 'mean_f1': 0.628}

In [102]:
rbf_perform.get_score() # C= 10,000.0; gamma= 0.1

{'mean_accuracy': 0.625,
 'mean_precision': 0.613,
 'mean_recall': 0.603,
 'mean_f1': 0.603}

In [109]:
rbf_perform.get_score() # C= 1.0; gamma= 1.0

{'mean_accuracy': 0.575,
 'mean_precision': 0.612,
 'mean_recall': 0.531,
 'mean_f1': 0.509}

In [116]:
rbf_perform.get_score() # C= 10.0; gamma= 1.0

{'mean_accuracy': 0.581,
 'mean_precision': 0.608,
 'mean_recall': 0.538,
 'mean_f1': 0.518}

In [123]:
rbf_perform.get_score() # C= 100.0; gamma= 1.0

{'mean_accuracy': 0.572,
 'mean_precision': 0.597,
 'mean_recall': 0.53,
 'mean_f1': 0.513}

In [130]:
rbf_perform.get_score() # C= 1000.0; gamma= 1.0

{'mean_accuracy': 0.57,
 'mean_precision': 0.601,
 'mean_recall': 0.528,
 'mean_f1': 0.511}

In [137]:
rbf_perform.get_score() # C= 10,000.0; gamma= 1.0

{'mean_accuracy': 0.57,
 'mean_precision': 0.601,
 'mean_recall': 0.528,
 'mean_f1': 0.51}

##### 2.1.3 Polynomial SVM Performance

In [24]:
poly_perform.get_score() # C= 1.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.365,
 'mean_precision': 0.305,
 'mean_recall': 0.336,
 'mean_f1': 0.182}

In [35]:
poly_perform.get_score() # C= 10.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.365,
 'mean_precision': 0.305,
 'mean_recall': 0.336,
 'mean_f1': 0.182}

In [45]:
poly_perform.get_score() # C= 100.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.365,
 'mean_precision': 0.305,
 'mean_recall': 0.336,
 'mean_f1': 0.182}

In [55]:
poly_perform.get_score() # C= 1000.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.368,
 'mean_precision': 0.566,
 'mean_recall': 0.338,
 'mean_f1': 0.188}

In [66]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.383,
 'mean_precision': 0.785,
 'mean_recall': 0.357,
 'mean_f1': 0.225}

In [75]:
poly_perform.get_score() # C= 1.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.368,
 'mean_precision': 0.566,
 'mean_recall': 0.338,
 'mean_f1': 0.188}

In [82]:
poly_perform.get_score() # C= 10.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.383,
 'mean_precision': 0.785,
 'mean_recall': 0.357,
 'mean_f1': 0.225}

In [89]:
poly_perform.get_score() # C= 100.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.405,
 'mean_precision': 0.706,
 'mean_recall': 0.379,
 'mean_f1': 0.269}

In [96]:
poly_perform.get_score() # C= 1000.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.495,
 'mean_precision': 0.606,
 'mean_recall': 0.473,
 'mean_f1': 0.432}

In [103]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.565,
 'mean_precision': 0.602,
 'mean_recall': 0.546,
 'mean_f1': 0.53}

In [110]:
poly_perform.get_score() # C= 1.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.495,
 'mean_precision': 0.606,
 'mean_recall': 0.473,
 'mean_f1': 0.432}

In [117]:
poly_perform.get_score() # C= 10.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.565,
 'mean_precision': 0.602,
 'mean_recall': 0.546,
 'mean_f1': 0.53}

In [124]:
poly_perform.get_score() # C= 100.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.599,
 'mean_precision': 0.597,
 'mean_recall': 0.579,
 'mean_f1': 0.57}

In [131]:
poly_perform.get_score() # C= 1000.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.595,
 'mean_precision': 0.582,
 'mean_recall': 0.573,
 'mean_f1': 0.567}

In [138]:
poly_perform.get_score() # C= 10,000.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.577,
 'mean_precision': 0.559,
 'mean_recall': 0.558,
 'mean_f1': 0.554}

In [142]:
poly_perform.get_score() # C= 1.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [146]:
poly_perform.get_score() # C= 10.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [150]:
poly_perform.get_score() # C= 100.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [154]:
poly_perform.get_score() # C= 1000.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [158]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [162]:
poly_perform.get_score() # C= 1.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.364,
 'mean_precision': 0.321,
 'mean_recall': 0.334,
 'mean_f1': 0.18}

In [166]:
poly_perform.get_score() # C= 10.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.368,
 'mean_precision': 0.566,
 'mean_recall': 0.338,
 'mean_f1': 0.188}

In [170]:
poly_perform.get_score() # C= 100.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.373,
 'mean_precision': 0.709,
 'mean_recall': 0.344,
 'mean_f1': 0.2}

In [174]:
poly_perform.get_score() # C= 1000.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.38,
 'mean_precision': 0.782,
 'mean_recall': 0.353,
 'mean_f1': 0.218}

In [178]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.385,
 'mean_precision': 0.785,
 'mean_recall': 0.359,
 'mean_f1': 0.229}

In [182]:
poly_perform.get_score() # C= 1.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.412,
 'mean_precision': 0.619,
 'mean_recall': 0.388,
 'mean_f1': 0.29}

In [186]:
poly_perform.get_score() # C= 10.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.429,
 'mean_precision': 0.596,
 'mean_recall': 0.404,
 'mean_f1': 0.322}

In [190]:
poly_perform.get_score() # C= 100.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.463,
 'mean_precision': 0.574,
 'mean_recall': 0.443,
 'mean_f1': 0.387}

In [194]:
poly_perform.get_score() # C= 1000.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.524,
 'mean_precision': 0.59,
 'mean_recall': 0.504,
 'mean_f1': 0.477}

In [198]:
poly_perform.get_score() # C= 10,000.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.546,
 'mean_precision': 0.582,
 'mean_recall': 0.528,
 'mean_f1': 0.509}

In [209]:
poly_perform.get_score() # C= 1.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [213]:
poly_perform.get_score() # C= 10.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [217]:
poly_perform.get_score() # C= 100.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [221]:
poly_perform.get_score() # C= 1000.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [225]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.363,
 'mean_precision': 0.188,
 'mean_recall': 0.334,
 'mean_f1': 0.178}

In [229]:
poly_perform.get_score() # C= 1.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.364,
 'mean_precision': 0.321,
 'mean_recall': 0.334,
 'mean_f1': 0.18}

In [233]:
poly_perform.get_score() # C= 10.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.364,
 'mean_precision': 0.321,
 'mean_recall': 0.334,
 'mean_f1': 0.18}

In [237]:
poly_perform.get_score() # C= 100.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.368,
 'mean_precision': 0.633,
 'mean_recall': 0.339,
 'mean_f1': 0.19}

In [241]:
poly_perform.get_score() # C= 1000.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.369,
 'mean_precision': 0.638,
 'mean_recall': 0.34,
 'mean_f1': 0.192}

In [245]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.38,
 'mean_precision': 0.782,
 'mean_recall': 0.352,
 'mean_f1': 0.217}

In [249]:
poly_perform.get_score() # C= 1.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.404,
 'mean_precision': 0.622,
 'mean_recall': 0.379,
 'mean_f1': 0.274}

In [253]:
poly_perform.get_score() # C= 10.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.406,
 'mean_precision': 0.578,
 'mean_recall': 0.382,
 'mean_f1': 0.281}

In [257]:
poly_perform.get_score() # C= 100.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.42,
 'mean_precision': 0.587,
 'mean_recall': 0.396,
 'mean_f1': 0.307}

In [261]:
poly_perform.get_score() # C= 1000.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.441,
 'mean_precision': 0.564,
 'mean_recall': 0.42,
 'mean_f1': 0.35}

In [265]:
poly_perform.get_score() # C= 10,000.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.47,
 'mean_precision': 0.561,
 'mean_recall': 0.448,
 'mean_f1': 0.399}

#### 2.2 Performance Measurement, Stage 1: SVM + TF-IDF

##### 2.2.1 Linear SVM Performance

In [277]:
linear_perform.get_score() # C= 1.0

{'mean_accuracy': 0.696,
 'mean_precision': 0.685,
 'mean_recall': 0.682,
 'mean_f1': 0.683}

In [287]:
linear_perform.get_score() # C= 10.0

{'mean_accuracy': 0.676,
 'mean_precision': 0.666,
 'mean_recall': 0.662,
 'mean_f1': 0.663}

In [297]:
linear_perform.get_score() # C= 100.0

{'mean_accuracy': 0.659,
 'mean_precision': 0.651,
 'mean_recall': 0.643,
 'mean_f1': 0.645}

In [307]:
linear_perform.get_score() # C= 1000.0

{'mean_accuracy': 0.653,
 'mean_precision': 0.641,
 'mean_recall': 0.636,
 'mean_f1': 0.636}

In [317]:
linear_perform.get_score() # C= 10,000.0

{'mean_accuracy': 0.632,
 'mean_precision': 0.627,
 'mean_recall': 0.618,
 'mean_f1': 0.618}

##### 2.2.2 RBF SVM Performance

In [278]:
rbf_perform.get_score() # C= 1.0; gamma= 0.01

{'mean_accuracy': 0.467,
 'mean_precision': 0.657,
 'mean_recall': 0.476,
 'mean_f1': 0.408}

In [288]:
rbf_perform.get_score() # C= 10.0; gamma= 0.01

{'mean_accuracy': 0.68,
 'mean_precision': 0.676,
 'mean_recall': 0.674,
 'mean_f1': 0.673}

In [298]:
rbf_perform.get_score() # C= 100.0; gamma= 0.01

{'mean_accuracy': 0.698,
 'mean_precision': 0.687,
 'mean_recall': 0.683,
 'mean_f1': 0.684}

In [308]:
rbf_perform.get_score() # C= 1000.0; gamma= 0.01

{'mean_accuracy': 0.668,
 'mean_precision': 0.66,
 'mean_recall': 0.654,
 'mean_f1': 0.655}

In [318]:
rbf_perform.get_score() # C= 10,000.0; gamma= 0.01

{'mean_accuracy': 0.658,
 'mean_precision': 0.648,
 'mean_recall': 0.642,
 'mean_f1': 0.643}

In [327]:
rbf_perform.get_score() # C= 1.0; gamma= 0.1

{'mean_accuracy': 0.675,
 'mean_precision': 0.678,
 'mean_recall': 0.668,
 'mean_f1': 0.668}

In [334]:
rbf_perform.get_score() # C= 10.0; gamma= 0.1

{'mean_accuracy': 0.701,
 'mean_precision': 0.691,
 'mean_recall': 0.686,
 'mean_f1': 0.687}

In [341]:
rbf_perform.get_score() # C= 100.0; gamma= 0.1

{'mean_accuracy': 0.684,
 'mean_precision': 0.675,
 'mean_recall': 0.67,
 'mean_f1': 0.671}

In [348]:
rbf_perform.get_score() # C= 1000.0; gamma= 0.1

{'mean_accuracy': 0.671,
 'mean_precision': 0.662,
 'mean_recall': 0.657,
 'mean_f1': 0.658}

In [355]:
rbf_perform.get_score() # C= 10,000.0; gamma= 0.1

{'mean_accuracy': 0.67,
 'mean_precision': 0.661,
 'mean_recall': 0.655,
 'mean_f1': 0.656}

In [362]:
rbf_perform.get_score() # C= 1.0; gamma= 1.0

{'mean_accuracy': 0.701,
 'mean_precision': 0.708,
 'mean_recall': 0.674,
 'mean_f1': 0.676}

In [18]:
rbf_perform.get_score() # C= 10.0; gamma= 1.0

{'mean_accuracy': 0.71,
 'mean_precision': 0.71,
 'mean_recall': 0.69,
 'mean_f1': 0.693}

In [17]:
rbf_perform.get_score() # C= 100.0; gamma= 1.0

{'mean_accuracy': 0.712,
 'mean_precision': 0.711,
 'mean_recall': 0.691,
 'mean_f1': 0.694}

In [383]:
rbf_perform.get_score() # C= 1000.0; gamma= 1.0

{'mean_accuracy': 0.711,
 'mean_precision': 0.709,
 'mean_recall': 0.69,
 'mean_f1': 0.692}

In [390]:
rbf_perform.get_score() # C= 10,000.0; gamma= 1.0

{'mean_accuracy': 0.711,
 'mean_precision': 0.709,
 'mean_recall': 0.69,
 'mean_f1': 0.692}

##### 2.2.3 Polynomial SVM Performance

In [279]:
poly_perform.get_score() # C= 1.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.43,
 'mean_precision': 0.766,
 'mean_recall': 0.387,
 'mean_f1': 0.288}

In [289]:
poly_perform.get_score() # C= 10.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.43,
 'mean_precision': 0.766,
 'mean_recall': 0.387,
 'mean_f1': 0.288}

In [299]:
poly_perform.get_score() # C= 100.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.43,
 'mean_precision': 0.766,
 'mean_recall': 0.387,
 'mean_f1': 0.288}

In [309]:
poly_perform.get_score() # C= 1000.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.43,
 'mean_precision': 0.766,
 'mean_recall': 0.387,
 'mean_f1': 0.288}

In [319]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.01; degree= 3

{'mean_accuracy': 0.43,
 'mean_precision': 0.766,
 'mean_recall': 0.387,
 'mean_f1': 0.288}

In [328]:
poly_perform.get_score() # C= 1.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.43,
 'mean_precision': 0.766,
 'mean_recall': 0.387,
 'mean_f1': 0.288}

In [335]:
poly_perform.get_score() # C= 10.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.43,
 'mean_precision': 0.766,
 'mean_recall': 0.387,
 'mean_f1': 0.288}

In [342]:
poly_perform.get_score() # C= 100.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.44,
 'mean_precision': 0.769,
 'mean_recall': 0.395,
 'mean_f1': 0.302}

In [349]:
poly_perform.get_score() # C= 1000.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.567,
 'mean_precision': 0.652,
 'mean_recall': 0.528,
 'mean_f1': 0.511}

In [356]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.1; degree= 3

{'mean_accuracy': 0.575,
 'mean_precision': 0.625,
 'mean_recall': 0.592,
 'mean_f1': 0.578}

In [363]:
poly_perform.get_score() # C= 1.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.567,
 'mean_precision': 0.652,
 'mean_recall': 0.528,
 'mean_f1': 0.511}

In [370]:
poly_perform.get_score() # C= 10.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.575,
 'mean_precision': 0.625,
 'mean_recall': 0.592,
 'mean_f1': 0.578}

In [377]:
poly_perform.get_score() # C= 100.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.576,
 'mean_precision': 0.625,
 'mean_recall': 0.593,
 'mean_f1': 0.578}

In [384]:
poly_perform.get_score() # C= 1000.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.575,
 'mean_precision': 0.626,
 'mean_recall': 0.592,
 'mean_f1': 0.577}

In [391]:
poly_perform.get_score() # C= 10,000.0; gamma= 1.0; degree= 3

{'mean_accuracy': 0.575,
 'mean_precision': 0.626,
 'mean_recall': 0.592,
 'mean_f1': 0.577}

In [395]:
poly_perform.get_score() # C= 1.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [399]:
poly_perform.get_score() # C= 10.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [403]:
poly_perform.get_score() # C= 100.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [407]:
poly_perform.get_score() # C= 1000.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [411]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.01; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [415]:
poly_perform.get_score() # C= 1.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [419]:
poly_perform.get_score() # C= 10.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [423]:
poly_perform.get_score() # C= 100.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [427]:
poly_perform.get_score() # C= 1000.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [431]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.1; degree= 6

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.384,
 'mean_f1': 0.281}

In [435]:
poly_perform.get_score() # C= 1.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.491,
 'mean_precision': 0.7,
 'mean_recall': 0.445,
 'mean_f1': 0.386}

In [439]:
poly_perform.get_score() # C= 10.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.358,
 'mean_precision': 0.69,
 'mean_recall': 0.422,
 'mean_f1': 0.301}

In [443]:
poly_perform.get_score() # C= 100.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.358,
 'mean_precision': 0.69,
 'mean_recall': 0.422,
 'mean_f1': 0.301}

In [447]:
poly_perform.get_score() # C= 1000.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.358,
 'mean_precision': 0.69,
 'mean_recall': 0.422,
 'mean_f1': 0.301}

In [451]:
poly_perform.get_score() # C= 10,000.0; gamma= 1.0; degree= 6

{'mean_accuracy': 0.358,
 'mean_precision': 0.69,
 'mean_recall': 0.422,
 'mean_f1': 0.301}

In [456]:
poly_perform.get_score() # C= 1.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.55,
 'mean_precision': 0.559,
 'mean_recall': 0.535,
 'mean_f1': 0.531}

In [460]:
poly_perform.get_score() # C= 10.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.55,
 'mean_precision': 0.559,
 'mean_recall': 0.535,
 'mean_f1': 0.531}

In [465]:
poly_perform.get_score() # C= 100.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.428,
 'mean_precision': 0.773,
 'mean_recall': 0.383,
 'mean_f1': 0.28}

In [469]:
poly_perform.get_score() # C= 1000.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.429,
 'mean_precision': 0.774,
 'mean_recall': 0.385,
 'mean_f1': 0.283}

In [473]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.01; degree= 9

{'mean_accuracy': 0.429,
 'mean_precision': 0.774,
 'mean_recall': 0.385,
 'mean_f1': 0.283}

In [477]:
poly_perform.get_score() # C= 1.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.429,
 'mean_precision': 0.774,
 'mean_recall': 0.385,
 'mean_f1': 0.283}

In [481]:
poly_perform.get_score() # C= 10.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.429,
 'mean_precision': 0.774,
 'mean_recall': 0.385,
 'mean_f1': 0.283}

In [485]:
poly_perform.get_score() # C= 100.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.429,
 'mean_precision': 0.774,
 'mean_recall': 0.385,
 'mean_f1': 0.283}

In [489]:
poly_perform.get_score() # C= 1000.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.429,
 'mean_precision': 0.774,
 'mean_recall': 0.385,
 'mean_f1': 0.283}

In [493]:
poly_perform.get_score() # C= 10,000.0; gamma= 0.1; degree= 9

{'mean_accuracy': 0.429,
 'mean_precision': 0.774,
 'mean_recall': 0.385,
 'mean_f1': 0.283}

In [497]:
poly_perform.get_score() # C= 1.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.478,
 'mean_precision': 0.723,
 'mean_recall': 0.432,
 'mean_f1': 0.365}

In [501]:
poly_perform.get_score() # C= 10.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.348,
 'mean_precision': 0.721,
 'mean_recall': 0.414,
 'mean_f1': 0.285}

In [17]:
poly_perform.get_score() # C= 100.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.349,
 'mean_precision': 0.719,
 'mean_recall': 0.414,
 'mean_f1': 0.285}

In [21]:
poly_perform.get_score() # C= 1000.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.349,
 'mean_precision': 0.719,
 'mean_recall': 0.414,
 'mean_f1': 0.285}

In [25]:
poly_perform.get_score() # C= 10,000.0; gamma= 1.0; degree= 9

{'mean_accuracy': 0.349,
 'mean_precision': 0.719,
 'mean_recall': 0.414,
 'mean_f1': 0.285}
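
Several polynomial runs above pair a high macro precision with a much lower macro recall and F1 (for example 0.785 precision against 0.225 F1). One pattern that produces such numbers, offered only as a possible reading and not as a diagnosis of these models, is a classifier that assigns almost every sample to a single class while its few predictions for the other classes are correct. The plain-Python sketch below computes macro-averaged scores for a made-up prediction vector of that kind; `macro_scores` is a hypothetical helper.

```python
def macro_scores(y_true, y_pred):
    """Unweighted mean of per-class precision, recall, and F1."""
    classes = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        actual = sum(1 for t in y_true if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels: the model predicts class 0 almost everywhere.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0]

scores = macro_scores(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

In this toy case classes 1 and 2 each have perfect precision on a single prediction, which lifts the macro precision even though most of their samples are missed.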

***

### __3. SVM Modeling, Stage 2__

Jump to:
- [2. Performance Measurement, Stage 1](#2-pengukuran-performa-tahap-1)
- [4. Performance Measurement, Stage 2](#4-pengukuran-performa-tahap-2)

In [10]:
from sklearn.svm import SVC

In [38]:
linear_svm = SVC(kernel="linear")

In [39]:
linear_params = {
    "C": [
        1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0,
    ],
}

In [11]:
rbf_svm = SVC(kernel="rbf")

In [12]:
rbf_params = {
    "C": [
        10, 20, 40, 80, 160, 320, 640, 1000,
    ],
    "gamma": [
        0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.0,
    ],
}

In [27]:
poly_svm = SVC(kernel="poly")

In [28]:
poly_params = {
    "C": [
        10, 30, 90, 270, 810, 2430, 7290, 10000,
    ],
    "degree": [
        3,
    ],
    "gamma": [
        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
    ]
}
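
The size of each search follows directly from the grids: every value of one parameter is combined with every value of the others. The sketch below expands the three grids defined above with `itertools.product`; the helper name `expand_grid` is hypothetical and the project's own tuner may enumerate combinations differently.

```python
from itertools import product


def expand_grid(grid):
    """Return every parameter combination of a grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


linear_params = {"C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]}
rbf_params = {
    "C": [10, 20, 40, 80, 160, 320, 640, 1000],
    "gamma": [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.0],
}
poly_params = {
    "C": [10, 30, 90, 270, 810, 2430, 7290, 10000],
    "degree": [3],
    "gamma": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
}

linear_grid = expand_grid(linear_params)
rbf_grid = expand_grid(rbf_params)
poly_grid = expand_grid(poly_params)

print(len(linear_grid), len(rbf_grid), len(poly_grid))
```

The counts are 10, 64 (8 x 8), and 80 (8 x 1 x 10), which agree with the "Total combination of parameters" lines printed in section 4.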

### __4. Performance Measurement, Stage 2__

Jump to:
- [3. SVM Modeling, Stage 2](#3-pemodelan-svm-tahap-2)
- [5. Testing](#5-pengujian)

In [13]:
from model.tuning import SVMGridSearchCV

In [40]:
n_fold = 5
scoring = ["accuracy", "precision", "recall", "f1",]
random_state = 42

In [41]:
linear_perform = SVMGridSearchCV(
    model= linear_svm,
    params= linear_params,
    cv= n_fold,
    scoring= scoring,
    scoring_avg= "macro",
)
linear_perform.random_state = random_state

In [15]:
rbf_perform = SVMGridSearchCV(
    model= rbf_svm,
    params= rbf_params,
    cv= n_fold,
    scoring= scoring,
    scoring_avg= "macro",
)
rbf_perform.random_state = random_state

In [29]:
poly_perform = SVMGridSearchCV(
    model= poly_svm,
    params= poly_params,
    cv= n_fold,
    scoring= scoring,
    scoring_avg= "macro",
)
poly_perform.random_state = random_state
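
`SVMGridSearchCV` comes from the project's own `model.tuning` module, and its implementation is not shown here. For readers without that module, a comparable multi-metric search can be sketched with scikit-learn's `GridSearchCV`. This is an assumption-laden stand-in: the synthetic data, the reduced grid, and the choice of `refit="accuracy"` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the vectorised tweets.
X, y = make_classification(
    n_samples=150, n_features=20, n_informative=5, n_classes=3, random_state=42
)

# A reduced RBF grid; the notebook's grid has 8 x 8 combinations.
param_grid = {"C": [10, 160], "gamma": [0.02, 1.0]}
scoring = {
    "accuracy": "accuracy",
    "precision": "precision_macro",
    "recall": "recall_macro",
    "f1": "f1_macro",
}

search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid,
    scoring=scoring,
    refit="accuracy",  # metric used to pick the best combination
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X, y)

print(search.best_params_)
print(search.cv_results_["mean_test_accuracy"])
```

With several metrics, `refit` has to name the one that decides the winner, which plays the same role as `base_on="accuracy"` in the cells below.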

#### 4.1 Performance Measurement, Stage 2: SVM + BOW

##### 4.1.1 Linear SVM Performance

In [42]:
linear_perform.fit(X_train, y_train)

Total combination of parameters: 10


i=0|{'kernel': 'linear', 'C': 1.0}  {'mean_accuracy': 0.673, 'mean_precision': 0.663, 'mean_recall': 0.659, 'mean_f1': 0.655}
i=1|{'kernel': 'linear', 'C': 2.0}  {'mean_accuracy': 0.676, 'mean_precision': 0.666, 'mean_recall': 0.663, 'mean_f1': 0.661}
i=2|{'kernel': 'linear', 'C': 3.0}  {'mean_accuracy': 0.671, 'mean_precision': 0.661, 'mean_recall': 0.66, 'mean_f1': 0.658}
i=3|{'kernel': 'linear', 'C': 4.0}  {'mean_accuracy': 0.676, 'mean_precision': 0.666, 'mean_recall': 0.665, 'mean_f1': 0.664}
i=4|{'kernel': 'linear', 'C': 5.0}  {'mean_accuracy': 0.676, 'mean_precision': 0.665, 'mean_recall': 0.664, 'mean_f1': 0.663}
i=5|{'kernel': 'linear', 'C': 6.0}  {'mean_accuracy': 0.675, 'mean_precision': 0.664, 'mean_recall': 0.664, 'mean_f1': 0.663}
i=6|{'kernel': 'linear', 'C': 7.0}  {'mean_accuracy': 0.678, 'mean_precision': 0.666, 'mean_recall': 0.666, 'mean_f1': 0.665}
i=7|{'kernel': 'linear', 'C': 8.0}  {'mean_accuracy': 0.678, 'mean_precision': 0.666, 'mean_recall': 0.666, 'mean_f1': 

In [45]:
print("Best and worst results of Linear-SVM (based on accuracy):")
linear_perform.get_best_result(base_on="accuracy")
linear_perform.get_worst_result(base_on="accuracy")

Best and worst results of Linear-SVM (based on accuracy):


[{'params': {'kernel': 'linear', 'C': 7.0},
  'scores': {'mean_accuracy': 0.678,
   'mean_precision': 0.666,
   'mean_recall': 0.666,
   'mean_f1': 0.665}}]

[{'params': {'kernel': 'linear', 'C': 3.0},
  'scores': {'mean_accuracy': 0.671,
   'mean_precision': 0.661,
   'mean_recall': 0.66,
   'mean_f1': 0.658}}]
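
The best and worst entries can be cross-checked by hand from the per-combination log. The sketch below copies the accuracy and F1 values of the seven fully printed rows (C = 1.0 to 7.0; the remaining rows are cut off in the log above) and selects the extremes. The helper `pick` and the simplified record layout are assumptions, not the project's API.

```python
# Rows i=0..6 of the Linear-SVM + BOW log (accuracy and F1 only).
results = [
    {"params": {"kernel": "linear", "C": 1.0}, "scores": {"mean_accuracy": 0.673, "mean_f1": 0.655}},
    {"params": {"kernel": "linear", "C": 2.0}, "scores": {"mean_accuracy": 0.676, "mean_f1": 0.661}},
    {"params": {"kernel": "linear", "C": 3.0}, "scores": {"mean_accuracy": 0.671, "mean_f1": 0.658}},
    {"params": {"kernel": "linear", "C": 4.0}, "scores": {"mean_accuracy": 0.676, "mean_f1": 0.664}},
    {"params": {"kernel": "linear", "C": 5.0}, "scores": {"mean_accuracy": 0.676, "mean_f1": 0.663}},
    {"params": {"kernel": "linear", "C": 6.0}, "scores": {"mean_accuracy": 0.675, "mean_f1": 0.663}},
    {"params": {"kernel": "linear", "C": 7.0}, "scores": {"mean_accuracy": 0.678, "mean_f1": 0.665}},
]


def pick(records, base_on, best=True):
    """Return the record with the highest (or lowest) mean score for one metric."""
    chooser = max if best else min
    return chooser(records, key=lambda record: record["scores"][f"mean_{base_on}"])


best = pick(results, base_on="accuracy", best=True)
worst = pick(results, base_on="accuracy", best=False)
print(best["params"], worst["params"])
```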

In [17]:
result_sequen = linear_perform.get_sequential_result()
result_table = pandas.DataFrame(result_sequen)
path = "data/result/train_result_linear_bow.csv"
result_table.to_csv(path, index=False)

##### 4.1.2 RBF SVM Performance

In [16]:
rbf_perform.fit(X_train, y_train)

Total combination of parameters: 64


i=0|{'kernel': 'rbf', 'C': 10, 'gamma': 0.01}  {'mean_accuracy': 0.657, 'mean_precision': 0.665, 'mean_recall': 0.641, 'mean_f1': 0.636}
i=1|{'kernel': 'rbf', 'C': 10, 'gamma': 0.02}  {'mean_accuracy': 0.668, 'mean_precision': 0.665, 'mean_recall': 0.653, 'mean_f1': 0.65}
i=2|{'kernel': 'rbf', 'C': 10, 'gamma': 0.04}  {'mean_accuracy': 0.676, 'mean_precision': 0.665, 'mean_recall': 0.659, 'mean_f1': 0.657}
i=3|{'kernel': 'rbf', 'C': 10, 'gamma': 0.08}  {'mean_accuracy': 0.673, 'mean_precision': 0.661, 'mean_recall': 0.654, 'mean_f1': 0.654}
i=4|{'kernel': 'rbf', 'C': 10, 'gamma': 0.16}  {'mean_accuracy': 0.675, 'mean_precision': 0.665, 'mean_recall': 0.654, 'mean_f1': 0.655}
i=5|{'kernel': 'rbf', 'C': 10, 'gamma': 0.32}  {'mean_accuracy': 0.661, 'mean_precision': 0.654, 'mean_recall': 0.632, 'mean_f1': 0.631}
i=6|{'kernel': 'rbf', 'C': 10, 'gamma': 0.64}  {'mean_accuracy': 0.619, 'mean_precision': 0.628, 'mean_recall': 0.58, 'mean_f1': 0.57}
i=7|{'kernel': 'rbf', 'C': 10, 'gamma': 1.0}

In [17]:
print("Best and worst results of RBF-SVM (based on accuracy):")
rbf_perform.get_best_result(base_on="accuracy")
rbf_perform.get_worst_result(base_on="accuracy")

Best and worst results of RBF-SVM (based on accuracy):


[{'params': {'kernel': 'rbf', 'C': 160, 'gamma': 0.02},
  'scores': {'mean_accuracy': 0.685,
   'mean_precision': 0.673,
   'mean_recall': 0.669,
   'mean_f1': 0.67}}]

[{'params': {'kernel': 'rbf', 'C': 320, 'gamma': 1.0},
  'scores': {'mean_accuracy': 0.566,
   'mean_precision': 0.592,
   'mean_recall': 0.524,
   'mean_f1': 0.506}}]

In [18]:
result_sequen = rbf_perform.get_sequential_result()
result_table = pandas.DataFrame(result_sequen)
path = "data/result/train_result_rbf_bow.csv"
result_table.to_csv(path, index=False)

##### 4.1.3 Polynomial SVM Performance

In [28]:
poly_perform.fit(X_train, y_train)

Total combination of parameters: 80


i=0|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.1}  {'mean_accuracy': 0.383, 'mean_precision': 0.785, 'mean_recall': 0.357, 'mean_f1': 0.225}
i=1|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.2}  {'mean_accuracy': 0.4, 'mean_precision': 0.713, 'mean_recall': 0.374, 'mean_f1': 0.259}
i=2|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.3}  {'mean_accuracy': 0.441, 'mean_precision': 0.637, 'mean_recall': 0.417, 'mean_f1': 0.337}
i=3|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.4}  {'mean_accuracy': 0.481, 'mean_precision': 0.612, 'mean_recall': 0.459, 'mean_f1': 0.408}
i=4|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.5}  {'mean_accuracy': 0.496, 'mean_precision': 0.598, 'mean_recall': 0.474, 'mean_f1': 0.436}
i=5|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.6}  {'mean_accuracy': 0.509, 'mean_precision': 0.593, 'mean_recall': 0.489, 'mean_f1': 0.457}
i=6|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.7}  {'mean_accuracy': 0.526, 'mean_precision':

In [29]:
print("Best and worst results of Polynomial-SVM (based on accuracy):")
poly_perform.get_best_result(base_on="accuracy")
poly_perform.get_worst_result(base_on="accuracy")

Best and worst results of Polynomial-SVM (based on accuracy):


[{'params': {'kernel': 'poly', 'C': 270, 'degree': 3, 'gamma': 0.9},
  'scores': {'mean_accuracy': 0.596,
   'mean_precision': 0.589,
   'mean_recall': 0.577,
   'mean_f1': 0.569}}]

[{'params': {'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.1},
  'scores': {'mean_accuracy': 0.383,
   'mean_precision': 0.785,
   'mean_recall': 0.357,
   'mean_f1': 0.225}}]

In [32]:
result_sequen = poly_perform.get_sequential_result()
result_table = pandas.DataFrame(result_sequen)
path = "data/result/train_result_poly_bow.csv"
result_table.to_csv(path, index=False)

#### 4.2 Stage 2 Performance Measurement: SVM + TF-IDF

##### 4.2.1 SVM-Linear Performance

In [40]:
linear_perform.fit(X_train, y_train)

Total combination of parameters: 10
i=0|{'kernel': 'linear', 'C': 1.0}  {'mean_accuracy': 0.697, 'mean_precision': 0.686, 'mean_recall': 0.684, 'mean_f1': 0.685}
i=1|{'kernel': 'linear', 'C': 2.0}  {'mean_accuracy': 0.698, 'mean_precision': 0.688, 'mean_recall': 0.684, 'mean_f1': 0.685}
i=2|{'kernel': 'linear', 'C': 3.0}  {'mean_accuracy': 0.699, 'mean_precision': 0.69, 'mean_recall': 0.685, 'mean_f1': 0.687}
i=3|{'kernel': 'linear', 'C': 4.0}  {'mean_accuracy': 0.69, 'mean_precision': 0.681, 'mean_recall': 0.675, 'mean_f1': 0.677}
i=4|{'kernel': 'linear', 'C': 5.0}  {'mean_accuracy': 0.684, 'mean_precision': 0.675, 'mean_recall': 0.669, 'mean_f1': 0.671}
i=5|{'kernel': 'linear', 'C': 6.0}  {'mean_accuracy': 0.682, 'mean_precision': 0.674, 'mean_recall': 0.668, 'mean_f1': 0.67}
i=6|{'kernel': 'linear', 'C': 7.0}  {'mean_accuracy': 0.685, 'mean_precision': 0.677, 'mean_recall': 0.671, 'mean_f1': 0.673}
i=7|{'kernel': 'linear', 'C': 8.0}  {'mean_accuracy': 0.68, 'mean_precision': 0.672, 

In [41]:
print("Best and worst results of Linear-SVM (based on accuracy):")
linear_perform.get_best_result(base_on="accuracy")
linear_perform.get_worst_result(base_on="accuracy")

Best and worst results of Linear-SVM (based on accuracy):


[{'params': {'kernel': 'linear', 'C': 3.0},
  'scores': {'mean_accuracy': 0.699,
   'mean_precision': 0.69,
   'mean_recall': 0.685,
   'mean_f1': 0.687}}]

[{'params': {'kernel': 'linear', 'C': 10.0},
  'scores': {'mean_accuracy': 0.675,
   'mean_precision': 0.666,
   'mean_recall': 0.662,
   'mean_f1': 0.664}}]

In [42]:
result_sequen = linear_perform.get_sequential_result()
result_table = pandas.DataFrame(result_sequen)
path = "data/result/train_result_linear_tfidf.csv"
result_table.to_csv(path, index=False)

##### 4.2.2 SVM-RBF Performance

In [16]:
rbf_perform.fit(X_train, y_train)

Total combination of parameters: 64


i=0|{'kernel': 'rbf', 'C': 10, 'gamma': 0.01}  {'mean_accuracy': 0.679, 'mean_precision': 0.68, 'mean_recall': 0.676, 'mean_f1': 0.674}
i=1|{'kernel': 'rbf', 'C': 10, 'gamma': 0.02}  {'mean_accuracy': 0.697, 'mean_precision': 0.686, 'mean_recall': 0.685, 'mean_f1': 0.685}
i=2|{'kernel': 'rbf', 'C': 10, 'gamma': 0.04}  {'mean_accuracy': 0.694, 'mean_precision': 0.684, 'mean_recall': 0.681, 'mean_f1': 0.682}
i=3|{'kernel': 'rbf', 'C': 10, 'gamma': 0.08}  {'mean_accuracy': 0.702, 'mean_precision': 0.693, 'mean_recall': 0.687, 'mean_f1': 0.688}
i=4|{'kernel': 'rbf', 'C': 10, 'gamma': 0.16}  {'mean_accuracy': 0.7, 'mean_precision': 0.691, 'mean_recall': 0.684, 'mean_f1': 0.686}
i=5|{'kernel': 'rbf', 'C': 10, 'gamma': 0.32}  {'mean_accuracy': 0.7, 'mean_precision': 0.692, 'mean_recall': 0.684, 'mean_f1': 0.686}
i=6|{'kernel': 'rbf', 'C': 10, 'gamma': 0.64}  {'mean_accuracy': 0.706, 'mean_precision': 0.701, 'mean_recall': 0.688, 'mean_f1': 0.691}
i=7|{'kernel': 'rbf', 'C': 10, 'gamma': 1.0}  

In [17]:
print("Best and worst results of RBF-SVM (based on accuracy):")
rbf_perform.get_best_result(base_on="accuracy")
rbf_perform.get_worst_result(base_on="accuracy")

Best and worst results of RBF-SVM (based on accuracy):


[{'params': {'kernel': 'rbf', 'C': 20, 'gamma': 1.0},
  'scores': {'mean_accuracy': 0.712,
   'mean_precision': 0.711,
   'mean_recall': 0.691,
   'mean_f1': 0.694}}]

[{'params': {'kernel': 'rbf', 'C': 1000, 'gamma': 0.02},
  'scores': {'mean_accuracy': 0.664,
   'mean_precision': 0.656,
   'mean_recall': 0.651,
   'mean_f1': 0.652}}]

In [18]:
result_sequen = rbf_perform.get_sequential_result()
result_table = pandas.DataFrame(result_sequen)
path = "data/result/train_result_rbf_tfidf.csv"
result_table.to_csv(path, index=False)

##### 4.2.3 SVM-Polynomial Performance

In [30]:
poly_perform.fit(X_train, y_train)

Total combination of parameters: 80
i=0|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.1}  {'mean_accuracy': 0.432, 'mean_precision': 0.758, 'mean_recall': 0.388, 'mean_f1': 0.291}
i=1|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.2}  {'mean_accuracy': 0.434, 'mean_precision': 0.758, 'mean_recall': 0.39, 'mean_f1': 0.293}
i=2|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.3}  {'mean_accuracy': 0.451, 'mean_precision': 0.75, 'mean_recall': 0.406, 'mean_f1': 0.322}
i=3|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.4}  {'mean_accuracy': 0.5, 'mean_precision': 0.695, 'mean_recall': 0.456, 'mean_f1': 0.404}
i=4|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.5}  {'mean_accuracy': 0.591, 'mean_precision': 0.641, 'mean_recall': 0.558, 'mean_f1': 0.554}
i=5|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.6}  {'mean_accuracy': 0.603, 'mean_precision': 0.62, 'mean_recall': 0.594, 'mean_f1': 0.593}
i=6|{'kernel': 'poly', 'C': 10, 'degree': 3, 'gamma': 0.7}  {'mean_ac

In [31]:
print("Best and worst results of Polynomial-SVM (based on accuracy):")
poly_perform.get_best_result(base_on="accuracy")
poly_perform.get_worst_result(base_on="accuracy")

Best and worst results of Polynomial-SVM (based on accuracy):


[{'params': {'kernel': 'poly', 'C': 30, 'degree': 3, 'gamma': 0.4},
  'scores': {'mean_accuracy': 0.607,
   'mean_precision': 0.628,
   'mean_recall': 0.593,
   'mean_f1': 0.593}}]

[{'params': {'kernel': 'poly', 'C': 30, 'degree': 3, 'gamma': 0.1},
  'scores': {'mean_accuracy': 0.432,
   'mean_precision': 0.758,
   'mean_recall': 0.388,
   'mean_f1': 0.291}}]

In [32]:
result_sequen = poly_perform.get_sequential_result()
result_table = pandas.DataFrame(result_sequen)
path = "data/result/train_result_poly_tfidf.csv"
result_table.to_csv(path, index=False)

***

### __5. Testing__

Jump to:
- [4. Stage 2 Performance Measurement](#4-pengukuran-performa-tahap-2)
- [Table of Contents](#daftar-isi)

In [20]:
from sklearn.metrics import confusion_matrix, classification_report
from imblearn.over_sampling import SMOTE

In [21]:
smoter = SMOTE(random_state=random_state)
X_train_res, y_train_res = smoter.fit_resample(X_train, y_train)
svm = SVC()
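
`SMOTE` balances the training classes by synthesizing new minority-class samples instead of duplicating existing ones. The sketch below illustrates only the core interpolation step in plain Python. It is a simplified stand-in, not imbalanced-learn's implementation, and `smote_like`, `k`, and the toy points are hypothetical names and data chosen for illustration.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=6)
print(len(new_points), "synthetic points")
```

Because `fit_resample` is called on `X_train` and `y_train` only, the test split keeps its original class distribution.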

#### 5.1 Testing SVM + BOW

##### 5.1.1 SVM-Linear

In [43]:
linear_best = linear_perform.get_best_result(base_on="accuracy")
linear_best_params = linear_best[0]["params"]
svm = svm.set_params(**linear_best_params)

svm.fit(X_train_res, y_train_res)
y_pred = svm.predict(X_test)

In [44]:
print(f"Confusion Matrix:\n {confusion_matrix(y_test, y_pred)}\n")
print(f"Report:\n {classification_report(y_test, y_pred)}")

Confusion Matrix:
 [[212 114  65]
 [ 23 332  32]
 [ 36  54 155]]

Report:
               precision    recall  f1-score   support

           0       0.78      0.54      0.64       391
           1       0.66      0.86      0.75       387
           2       0.62      0.63      0.62       245

    accuracy                           0.68      1023
   macro avg       0.69      0.68      0.67      1023
weighted avg       0.70      0.68      0.68      1023
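
The per-class figures in the report can be recomputed directly from the confusion matrix, whose rows are the true classes and whose columns are the predicted classes. The following sketch uses the matrix printed above; the variable names are illustrative.

```python
# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = [
    [212, 114, 65],
    [23, 332, 32],
    [36, 54, 155],
]

n = len(cm)
total = sum(sum(row) for row in cm)
precisions, recalls = [], []
for c in range(n):
    tp = cm[c][c]
    support = sum(cm[c])                          # true instances of class c
    predicted = sum(cm[r][c] for r in range(n))   # predictions of class c
    precisions.append(tp / predicted)
    recalls.append(tp / support)
    print(f"class {c}: precision={precisions[c]:.2f} "
          f"recall={recalls[c]:.2f} support={support}")

accuracy = sum(cm[c][c] for c in range(n)) / total
print(f"accuracy={accuracy:.2f}")
```

For class 0 this gives 212 / 271 for precision and 212 / 391 for recall, which shows that the low recall comes mostly from the 114 class-0 tweets predicted as class 1.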



##### 5.1.2 SVM-RBF

In [23]:
rbf_best = rbf_perform.get_best_result(base_on="accuracy")
rbf_best_params = rbf_best[0]["params"]
svm = svm.set_params(**rbf_best_params)

svm.fit(X_train_res, y_train_res)
y_pred = svm.predict(X_test)

In [24]:
print(f"Confusion Matrix:\n {confusion_matrix(y_test, y_pred)}\n")
print(f"Report:\n {classification_report(y_test, y_pred)}")

Confusion Matrix:
 [[298  53  40]
 [ 43 295  49]
 [ 72  37 136]]

Report:
               precision    recall  f1-score   support

           0       0.72      0.76      0.74       391
           1       0.77      0.76      0.76       387
           2       0.60      0.56      0.58       245

    accuracy                           0.71      1023
   macro avg       0.70      0.69      0.69      1023
weighted avg       0.71      0.71      0.71      1023
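
The `macro avg` and `weighted avg` rows differ because the macro average treats every class equally, while the weighted average scales each class by its support. A minimal sketch for the F1 column, using the confusion matrix printed above (variable names are illustrative):

```python
# Confusion matrix copied from the SVM-RBF + BOW output above
# (rows: true class, columns: predicted class).
cm = [
    [298, 53, 40],
    [43, 295, 49],
    [72, 37, 136],
]

n = len(cm)
supports = [sum(row) for row in cm]
f1_scores = []
for c in range(n):
    tp = cm[c][c]
    precision = tp / sum(cm[r][c] for r in range(n))
    recall = tp / supports[c]
    f1_scores.append(2 * precision * recall / (precision + recall))

# Macro: plain mean over classes. Weighted: mean weighted by class support.
macro_f1 = sum(f1_scores) / n
weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)
print(f"macro F1={macro_f1:.2f}, weighted F1={weighted_f1:.2f}")
```

The weighted value is higher here because class 2, which has the lowest F1, also has the smallest support (245 of 1023 test samples).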



##### 5.1.3 SVM-Polynomial

In [30]:
poly_best = poly_perform.get_best_result(base_on="accuracy")
poly_best_params = poly_best[0]["params"]
svm = svm.set_params(**poly_best_params)

svm.fit(X_train_res, y_train_res)
y_pred = svm.predict(X_test)

In [31]:
print(f"Confusion Matrix:\n {confusion_matrix(y_test, y_pred)}\n")
print(f"Report:\n {classification_report(y_test, y_pred)}")

Confusion Matrix:
 [[228 121  42]
 [ 36 315  36]
 [ 81  58 106]]

Report:
               precision    recall  f1-score   support

           0       0.66      0.58      0.62       391
           1       0.64      0.81      0.72       387
           2       0.58      0.43      0.49       245

    accuracy                           0.63      1023
   macro avg       0.62      0.61      0.61      1023
weighted avg       0.63      0.63      0.63      1023



#### 5.2 Testing SVM + TF-IDF

##### 5.2.1 SVM-Linear

In [44]:
linear_best = linear_perform.get_best_result(base_on="accuracy")
linear_best_params = linear_best[0]["params"]
svm = svm.set_params(**linear_best_params)

svm.fit(X_train_res, y_train_res)
y_pred = svm.predict(X_test)

In [45]:
print(f"Confusion Matrix:\n {confusion_matrix(y_test, y_pred)}\n")
print(f"Report:\n {classification_report(y_test, y_pred)}")

Confusion Matrix:
 [[307  40  44]
 [ 57 293  37]
 [ 65  35 145]]

Report:
               precision    recall  f1-score   support

           0       0.72      0.79      0.75       391
           1       0.80      0.76      0.78       387
           2       0.64      0.59      0.62       245

    accuracy                           0.73      1023
   macro avg       0.72      0.71      0.71      1023
weighted avg       0.73      0.73      0.73      1023



##### 5.2.2 SVM-RBF

In [22]:
rbf_best = rbf_perform.get_best_result(base_on="accuracy")
rbf_best_params = rbf_best[0]["params"]
svm = svm.set_params(**rbf_best_params)

svm.fit(X_train_res, y_train_res)
y_pred = svm.predict(X_test)

In [23]:
print(f"Confusion Matrix:\n {confusion_matrix(y_test, y_pred)}\n")
print(f"Report:\n {classification_report(y_test, y_pred)}")

Confusion Matrix:
 [[317  45  29]
 [ 51 307  29]
 [ 81  38 126]]

Report:
               precision    recall  f1-score   support

           0       0.71      0.81      0.75       391
           1       0.79      0.79      0.79       387
           2       0.68      0.51      0.59       245

    accuracy                           0.73      1023
   macro avg       0.73      0.71      0.71      1023
weighted avg       0.73      0.73      0.73      1023



##### 5.2.3 SVM-Polynomial

In [33]:
poly_best = poly_perform.get_best_result(base_on="accuracy")
poly_best_params = poly_best[0]["params"]
svm = svm.set_params(**poly_best_params)

svm.fit(X_train_res, y_train_res)
y_pred = svm.predict(X_test)

In [34]:
print(f"Confusion Matrix:\n {confusion_matrix(y_test, y_pred)}\n")
print(f"Report:\n {classification_report(y_test, y_pred)}")

Confusion Matrix:
 [[335  25  31]
 [124 207  56]
 [106  23 116]]

Report:
               precision    recall  f1-score   support

           0       0.59      0.86      0.70       391
           1       0.81      0.53      0.64       387
           2       0.57      0.47      0.52       245

    accuracy                           0.64      1023
   macro avg       0.66      0.62      0.62      1023
weighted avg       0.67      0.64      0.64      1023
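
To compare the six test runs side by side, the accuracy of each can be recomputed from the diagonal of its confusion matrix. The sketch below does this with the values printed in sections 5.1 and 5.2; the dictionary layout and labels are only an illustrative choice.

```python
# Correct predictions per class (confusion-matrix diagonals) copied from
# the six test outputs in this section.
correct = {
    ("BOW", "linear"): (212, 332, 155),
    ("BOW", "rbf"): (298, 295, 136),
    ("BOW", "poly"): (228, 315, 106),
    ("TF-IDF", "linear"): (307, 293, 145),
    ("TF-IDF", "rbf"): (317, 307, 126),
    ("TF-IDF", "poly"): (335, 207, 116),
}
n_test = 1023  # total support reported in every classification report

accuracy = {key: sum(diag) / n_test for key, diag in correct.items()}
ranking = sorted(accuracy, key=accuracy.get, reverse=True)
for features, kernel in ranking:
    print(f"{features:7s} {kernel:7s} accuracy={accuracy[(features, kernel)]:.3f}")
```

On this test split the RBF and linear kernels with TF-IDF both round to 0.73, and the polynomial kernel is the weakest for either feature set.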

