#### Table of Contents:
- [1. Data Acquisition](#1-akusisi-data)
- [2. Slang Dictionary Construction](#2-pembentukan-kamus-slang)
- [3. Data Annotation](#3-anotasi-data)
- [4. Preprocessing](#4-prapemrosesan)
- [5. SVM Modeling](#5-pemodelan-svm)
- [6. Performance Measurement](#6-pengukuran-performa)
    - [6.1 Performance Measurement Stage 1](#61-pengukuran-performa-tahap-1)
    - [6.2 Performance Measurement Stage 2](#62-pengukuran-performa-tahap-2)
- [7. Testing](#7-pengujian)

__*Internal links do not work on the GitHub page.*__

***

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

%load_ext watermark
%watermark -a "F. Waskito" -n -t -u -v

Author: F. Waskito

Last updated: Mon Jun 05 2023 21:26:49

Python implementation: CPython
Python version       : 3.9.16
IPython version      : 8.12.0



#### __1. Data Acquisition__

In [None]:
from collection.scrape import TweetScraper

scraper = TweetScraper(
    "depresi OR bipolar",
    "id",
    "2022-05-10",
    "2022-05-11",
)

scraper.scrape()

Scraping: 100%|██████████| 476/476 [00:39<00:00, 12.17it/s]


In [None]:
path = "data/tweet/scrape/depresi_or_bipolar_tweets_id_220510_with_irrelevant.csv"
scraper.tweets_table.to_csv(path, index=False)

In [None]:
print(f"Number of tweets before removal: {scraper.num_of_tweets}")
irrelevant_tweets_table = scraper.remove_irrelevant()
print(f"Number of tweets after removal: {scraper.num_of_tweets}")

Number of tweets before removal: 476


Removing irrelevant: 100%|██████████| 476/476 [07:44<00:00,  1.03it/s]

Number of tweets after removal: 428





In [None]:
path = "data/tweet/scrape/depresi_or_bipolar_tweets_id_220510_irrelevant.csv"
irrelevant_tweets_table.to_csv(path, index=False)

In [None]:
path = "data/scrape/depresi_or_bipolar_tweets_id_220510.csv"
scraper.tweets_table.to_csv(path, index=False)

#### __2. Slang Dictionary Construction__

Go to:
- [1. Data Acquisition](#1-akusisi-data)
- [3. Data Annotation](#3-anotasi-data)

In [1]:
import pandas

file_path = "data/tweet/scrape/depresi_or_bipolar_tweets_id_01-10.csv"
tweets_table = pandas.read_csv(file_path)
tweets_table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3409 entries, 0 to 3408
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Tweet_ID  3409 non-null   int64 
 1   Datetime  3409 non-null   object
 2   Username  3409 non-null   object
 3   Text      3409 non-null   object
dtypes: int64(1), object(3)
memory usage: 106.7+ KB


In [2]:
from collection.slang.template import KamusSlangTemplate

template = KamusSlangTemplate(tweets_table['Text'])
template.create()

100%|██████████| 3409/3409 [4:22:04<00:00,  4.61s/it]  


__Note__: Slang detection requires text preprocessing all the way through the *stemming* stage (using Sastrawi), so the long runtime is an expected consequence.

In [7]:
template.template.info()
template.template.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5357 entries, 0 to 5356
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Slang       5357 non-null   object 
 1   Makna       0 non-null      float64
 2   No_Konteks  5357 non-null   int64  
 3   Konteks     5357 non-null   object 
dtypes: float64(1), int64(1), object(2)
memory usage: 167.5+ KB


Unnamed: 0,Slang,Makna,No_Konteks,Konteks
0,ilux,,0,Padahal ilux baru KB/TK ya mana ngerti begitua...
1,kb,,0,Padahal ilux baru KB/TK ya mana ngerti begitua...
2,tk,,0,Padahal ilux baru KB/TK ya mana ngerti begitua...
3,ya,,0,Padahal ilux baru KB/TK ya mana ngerti begitua...
4,ngerti,,0,Padahal ilux baru KB/TK ya mana ngerti begitua...


In [9]:
template.template.tail(15)

Unnamed: 0,Slang,Makna,No_Konteks,Konteks
5342,mecicil,,3392,3:10 dan masih mecicil Goddamn bipolar ... Lol..
5343,goddamn,,3392,3:10 dan masih mecicil Goddamn bipolar ... Lol..
5344,pernqh,,3395,Insomnia dan depresi. Gak pernqh bisq tidur le...
5345,bisq,,3395,Insomnia dan depresi. Gak pernqh bisq tidur le...
5346,diseriusin,,3397,pantes idup lu pada depresi anime aja diserius...
5347,easier,,3398,"""Eren depresi krna ga cerita ke temen²nya, dan..."
5348,said,,3398,"""Eren depresi krna ga cerita ke temen²nya, dan..."
5349,than,,3398,"""Eren depresi krna ga cerita ke temen²nya, dan..."
5350,done,,3398,"""Eren depresi krna ga cerita ke temen²nya, dan..."
5351,isnt,,3398,"""Eren depresi krna ga cerita ke temen²nya, dan..."


In [None]:
path = "data/dictionary/kamus_slang.csv"
template.template.to_csv(path, index=False)

#### __3. Data Annotation__

Go to:
- [2. Slang Dictionary Construction](#2-pembentukan-kamus-slang)
- [4. Preprocessing](#4-prapemrosesan)

In [14]:
import pandas

path = "data/tweet/scrape/depresi_or_bipolar_tweets_id_01-10.csv"
tweets_table = pandas.read_csv(path)
tweets_table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3409 entries, 0 to 3408
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Tweet_ID  3409 non-null   float64
 1   Datetime  3409 non-null   object 
 2   Username  3409 non-null   object 
 3   Text      3409 non-null   object 
dtypes: float64(1), object(3)
memory usage: 106.7+ KB


In [1]:
import pandas

path = "data/tweet/depresi_or_bipolar_tweets_id_01-10.csv"
tweets_table = pandas.read_csv(path)

In [2]:
from collection.annotation import BlobLabeler

anotator = BlobLabeler(tweets_table["Text"])
anotator.generate()

labeling: 100%|██████████| 3409/3409 [31:51<00:00,  1.78it/s]  


__Note__: Labeling time is mostly affected by two factors: the internet connection and the Python version. This labeling run was almost 2 times faster than the previous one. With exactly the same *dependencies*, the previous run took more than 55 minutes when performed during peak hours on Python 3.10.x.

In [5]:
from collection import analysis

analysis.get_distribution(anotator.labels)

Distribution:
	('positive', 864)
	('neutral', 1253)
	('negative', 1292)


In [10]:
tweets_table["Sentiment"] = anotator.labels

tweets_table.info()
tweets_table.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3409 entries, 0 to 3408
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Tweet_ID   3409 non-null   float64
 1   Datetime   3409 non-null   object 
 2   Username   3409 non-null   object 
 3   Text       3409 non-null   object 
 4   Sentiment  3409 non-null   object 
dtypes: float64(1), object(4)
memory usage: 133.3+ KB


Unnamed: 0,Tweet_ID,Datetime,Username,Text,Sentiment
0,1.520554e+18,2022-05-01 00:00:12+00:00,yfnasa,Padahal ilux baru KB/TK ya mana ngerti begitua...,positive
1,1.520561e+18,2022-05-01 00:30:46+00:00,SoleilLumina,Et dah gw jadi ngefollow akun quotes depresi (...,neutral
2,1.520563e+18,2022-05-01 00:36:48+00:00,petitegeeky,"Rossy setahun di laut betah, cuma pas pulang a...",neutral
3,1.520565e+18,2022-05-01 00:43:40+00:00,raniapj,Sebenarnya aku jarang jbjb. Aku toh lagi stres...,neutral
4,1.520565e+18,2022-05-01 00:43:50+00:00,Jawaban,"Apakah kamu sedang banyak masalah, sampai-samp...",neutral


In [None]:
path = "data/tweet/depresi_or_bipolar_tweets_id_01-10.csv"
tweets_table.to_csv(path, index=False)

#### __4. Preprocessing__

Go to:
- [3. Data Annotation](#3-anotasi-data)
- [5. SVM Modeling](#5-pemodelan-svm)

In [1]:
from IPython.core.interactiveshell import InteractiveShell
import sklearnex

InteractiveShell.ast_node_interactivity = "all"
sklearnex.patch_sklearn()

%load_ext watermark
%watermark -a "F. Waskito" -n -t -u -v

Author: F. Waskito

Last updated: Mon Jun 05 2023 14:43:47

Python implementation: CPython
Python version       : 3.9.16
IPython version      : 8.12.0



Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


In [2]:
import pandas

file_path = "data/tweet/depresi_or_bipolar_tweets_id_01-10.csv"
tweets_table = pandas.read_csv(file_path)
tweets_table.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3409 entries, 0 to 3408
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Tweet_ID   3409 non-null   float64
 1   Datetime   3409 non-null   object 
 2   Username   3409 non-null   object 
 3   Text       3409 non-null   object 
 4   Sentiment  3409 non-null   object 
dtypes: float64(1), object(4)
memory usage: 133.3+ KB


In [4]:
from collection import analysis

texts = tweets_table.loc[:100, "Text"].copy().to_list()
labels = tweets_table.loc[:100, "Sentiment"].copy().to_list()

analysis.get_shape(texts)
analysis.get_distribution(labels)

Shape: (101,)
Distribution:
	('positive', 16)
	('neutral', 50)
	('negative', 35)


##### 4.1 Pre-Numeric Operations

In [5]:
import time
from tqdm import tqdm
from preprocess.preprocessing import TextPreprocessor

preprocessor = TextPreprocessor()
for i, text in enumerate(tqdm(texts)):
    text = preprocessor.clean(text)
    text = preprocessor.standardize(text)
    tokens = preprocessor.tokenize(text)
    tokens = preprocessor.filter(tokens)
    texts[i] = preprocessor.stem(tokens)
    time.sleep(0.001)

100%|██████████| 101/101 [00:44<00:00,  2.28it/s]


##### 4.2 Feature Extraction

4.2.1 Feature Extraction with Bag of Words (BOW)

In [7]:
from preprocess.feature_extraction import TextVectorizer

In [64]:
extractor = TextVectorizer(texts)
extractor.transform(target="bow", min_df=1)
vector_texts = extractor.vectors

analysis.get_shape(vector_texts)

Shape: (3409, 4965)


In [99]:
vector_texts[4][1000:1050]
vector_texts[1151][1000:1050]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0], dtype=int64)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0], dtype=int64)

In [100]:
extractor.transform(target="bow", min_df=1, norm=True)
extractor.vectors[4][1000:1050]
extractor.vectors[1151][1000:1050]

array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.5, 0. , 0. ,
       0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
       0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
       0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ])

array([0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.25,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.  , 0.  , 0.  , 0.  , 0.  ])

The BOW vectors used to represent the document texts (tweets) are BOW vectors that have gone through two further secondary processes:
- Normalization with "Min-Max" scaling
- Reduction of the number of features n by raising the minimum DF (the number of documents d in which *term* t occurs) to 2. In other words, any feature (*term*/element of each vector) that has a nonzero value in only 1 (one) vector is removed.
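The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the project's `TextVectorizer` internals are not shown in this notebook, so the helper names `bow_min_df` and `min_max_scale` are hypothetical, and the scaling is assumed to be applied per feature column.

```python
from collections import Counter


def bow_min_df(docs, min_df=2):
    """Count-vectorize tokenized docs, keeping terms found in >= min_df docs."""
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(term for term, n in df.items() if n >= min_df)
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for doc in docs:
        row = [0] * len(vocab)
        for term in doc:
            if term in index:
                row[index[term]] += 1
        vectors.append(row)
    return vocab, vectors


def min_max_scale(vectors):
    """Scale each feature column to [0, 1] using that column's min and max."""
    if not vectors:
        return []
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in vectors
    ]


# Toy, already-stemmed documents (hypothetical data).
docs = [
    ["depresi", "tidur", "depresi"],
    ["depresi", "cerita"],
    ["tidur", "insomnia"],
]
vocab, counts = bow_min_df(docs, min_df=2)
scaled = min_max_scale(counts)
print(vocab)
print(counts)
print(scaled)
```

In this toy corpus, "cerita" and "insomnia" each occur in a single document, so `min_df=2` drops them and only two features remain.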

In [8]:
extractor = TextVectorizer(texts)
extractor.transform(target="bow", min_df=2, norm=True)
vector_texts = extractor.vectors

analysis.get_shape(vector_texts)

Shape: (101, 136)


4.2.2 Feature Extraction with Term Frequency-Inverse Document Frequency (TF-IDF)

In [37]:
extractor = TextVectorizer(texts)
extractor.transform(target="tfidf", min_df=1)
vector_texts = extractor.vectors

analysis.get_shape(vector_texts)

Shape: (3409, 4965)


In [38]:
vector_texts[4][1000:1050]
vector_texts[1151][1000:1050]

array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       2.24158685, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ])

array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       1.12079343, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ])

In [39]:
extractor.transform(target="tfidf", min_df=1, norm=True)
extractor.vectors[4][1000:1050]
extractor.vectors[1151][1000:1050]

array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.11744302, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ])

array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.03381436, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ])

The TF-IDF vectors used to represent the document texts (tweets) are TF-IDF vectors that have gone through two further secondary processes:
- Normalization with "L2" scaling (Euclidean norm, i.e. the square root of the sum of squares)
- The same feature-count reduction that is applied to the BOW vectors.
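As a rough sketch of what TF-IDF weighting followed by L2 normalization computes, the function below uses the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1`, which is the default in scikit-learn's `TfidfTransformer`. Whether the project's `TextVectorizer` uses exactly this formula is an assumption, and `tfidf_l2` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_l2(docs, min_df=1):
    """TF-IDF with smoothed IDF, then L2-normalize each document vector."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(term for term, n in df.items() if n >= min_df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        # Documents with no surviving term stay as all-zero vectors.
        vectors.append([v / norm if norm > 0 else 0.0 for v in row])
    return vocab, vectors


# Toy, already-stemmed documents (hypothetical data).
docs = [
    ["depresi", "tidur", "depresi"],
    ["depresi", "cerita"],
    ["tidur", "insomnia"],
]
tfidf_vocab, tfidf_vectors = tfidf_l2(docs, min_df=2)
for row in tfidf_vectors:
    print([round(v, 4) for v in row])
```

After L2 normalization every non-empty document vector has unit length, so documents of different lengths become directly comparable.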

In [6]:
extractor = TextVectorizer(texts)
extractor.transform(target="tfidf", min_df=2, norm=True)
vector_texts = extractor.vectors

analysis.get_shape(vector_texts)

Shape: (3409, 2205)


##### 4.3 Label Transformation

In [9]:
from preprocess.encoding import LabelEncoder

In [10]:
encoder = LabelEncoder(labels)
encoder.transform(target="integer")
encoded_labels = encoder.encoded_labels

analysis.get_distribution(encoded_labels)

Distribution:
	(2, 16)
	(1, 50)
	(0, 35)
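The distributions printed before and after encoding (negative 35, neutral 50, positive 16) are consistent with an alphabetical mapping of class names to integers. A minimal sketch of such a mapping follows; `encode_labels` is a hypothetical helper and not necessarily how the project's `LabelEncoder` is implemented.

```python
def encode_labels(labels):
    """Map each distinct label to an integer in alphabetical order."""
    classes = sorted(set(labels))
    mapping = {label: i for i, label in enumerate(classes)}
    return mapping, [mapping[label] for label in labels]


mapping, encoded = encode_labels(["positive", "neutral", "negative", "neutral"])
print(mapping)
print(encoded)
```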


##### 4.4 Data Set Splitting

In [11]:
from sklearn.model_selection import train_test_split

In [12]:
X_train, X_test, y_train, y_test = train_test_split(
    vector_texts,
    encoded_labels,
    test_size=0.3
)

print("> Train set:")
analysis.get_shape(X_train)
analysis.get_distribution(y_train)
print("\n> Test set:")
analysis.get_shape(X_test)
analysis.get_distribution(y_test)

> Train set:
Shape: (70, 136)
Distribution:
	(1, 36)
	(0, 23)
	(2, 11)

> Test set:
Shape: (31, 136)
Distribution:
	(0, 12)
	(2, 5)
	(1, 14)


#### __5. SVM Modeling__

Go to:
- [4. Preprocessing](#4-prapemrosesan)
- [6. Performance Measurement](#6-pengukuran-performa)

In [13]:
from sklearn.svm import SVC

In [39]:
c = 1.0
degree = 4
gamma = 0.1

In [16]:
linear_svm = SVC(kernel="linear", C=c)

In [30]:
poly_svm = SVC(kernel="poly", C=c, degree=degree, gamma=gamma)

In [14]:
rbf_svm = SVC(kernel="rbf", C=c, gamma=gamma)
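For reference, the three kernels selected above can be written out directly. The sketch below follows the kernel definitions documented for scikit-learn's `SVC` (with `coef0` left at its default of 0 for the polynomial kernel); the function names and sample vectors are illustrative only.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return dot(x, y)


def poly_kernel(x, y, degree=4, gamma=0.1, coef0=0.0):
    # K(x, y) = (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.1):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x, y = [1.0, 0.0, 2.0], [0.5, 1.0, 1.0]
k_linear = linear_kernel(x, y)
k_poly = poly_kernel(x, y)
k_rbf = rbf_kernel(x, y)
print(k_linear, k_poly, k_rbf)
```

Note how a small `gamma` combined with a high `degree` pushes polynomial kernel values toward zero, which is one reason `gamma` and `C` may need to be tuned together for that kernel.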

#### __6. Performance Measurement__

Go to:
- [5. SVM Modeling](#5-pemodelan-svm)
- [6.1 Performance Measurement Stage 1](#61-pengukuran-performa-tahap-1)

In [17]:
from validation.cross import ImbalancedCV

In [41]:
n_fold = 5
scoring = ["accuracy", "precision", "recall", "f1"]
random_state = 42

In [23]:
linear_perform = ImbalancedCV(
    model = linear_svm,
    n_fold = n_fold,
    scoring = scoring,
    scoring_avg = "macro",
    random_state = random_state,
)

linear_perform.validate(X_train, y_train)

CV: 100%|██████████| 5/5 [00:15<00:00,  3.04s/it]
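The `scoring_avg = "macro"` setting means each class contributes equally to precision, recall, and F1, regardless of how many samples it has. The sketch below shows that computation for a single fold; `macro_scores` is a hypothetical helper, and undefined per-class ratios are set to 0 here as an assumption.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for one fold (hypothetical data).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
scores = macro_scores(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

Averaging these per-fold values over all folds would give figures comparable to the `mean_*` entries reported by `get_score()`.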


In [31]:
poly_perform = ImbalancedCV(
    model = poly_svm,
    n_fold = n_fold,
    scoring = scoring,
    scoring_avg = "macro",
    random_state = random_state,
)

poly_perform.validate(X_train, y_train)

CV: 100%|██████████| 5/5 [00:17<00:00,  3.45s/it]


In [42]:
rbf_perform = ImbalancedCV(
    model = rbf_svm,
    n_fold = n_fold,
    scoring = scoring,
    scoring_avg = "macro",
    random_state = random_state,
)

rbf_perform.validate(X_train, y_train)

CV: 100%|██████████| 5/5 [00:17<00:00,  3.52s/it]


***

##### 6.1 Performance Measurement Stage 1

>Main Model: (SVM + SMOTE) + BOW

Go to:
- [6. Performance Measurement](#6-pengukuran-performa)
- [6.2 Performance Measurement Stage 2](#62-pengukuran-performa-tahap-2)
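The main model pairs the SVM with SMOTE oversampling. How `ImbalancedCV` applies SMOTE is not shown in this notebook, so the following is only a simplified sketch of the SMOTE idea: each synthetic sample is an interpolation between a minority-class sample and one of its nearest same-class neighbors. The function name, `k`, and the toy data are assumptions.

```python
import random


def smote(X, y, k=3, seed=42):
    """Oversample every minority class up to the majority class size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < 2:
            continue  # at least one neighbor is needed to interpolate
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            # k nearest same-class neighbors by squared Euclidean distance.
            neighbors = sorted(
                (r for r in rows if r is not base),
                key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
            )[:k]
            other = rng.choice(neighbors)
            u = rng.random()
            # New point lies on the segment between base and its neighbor.
            X_out.append([a + u * (b - a) for a, b in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy training fold: class 0 has 4 samples, class 1 has 2 (hypothetical data).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = smote(X, y)
print(len(X_res), y_res.count(0), y_res.count(1))
```

In cross-validation, oversampling like this is normally applied to the training part of each fold only, so that the validation part still reflects the original class distribution.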

6.1.1 Linear SVM Performance

In [162]:
# linear_perform.get_score() # default params

{'mean_accuracy': 0.689,
 'mean_precison': 0.683,
 'mean_recall': 0.674,
 'mean_f1': 0.672}

In [30]:
linear_perform.get_score() # default params (trial 2)

{'mean_accuracy': 0.693,
 'mean_precison': 0.685,
 'mean_recall': 0.68,
 'mean_f1': 0.677}

In [178]:
linear_perform.get_score() # c= 2.0

{'mean_accuracy': 0.692,
 'mean_precison': 0.684,
 'mean_recall': 0.678,
 'mean_f1': 0.677}

In [51]:
linear_perform.get_score() # c= 2.0 (trial 2)

{'mean_accuracy': 0.699,
 'mean_precison': 0.688,
 'mean_recall': 0.685,
 'mean_f1': 0.684}

In [190]:
# linear_perform.get_score() # c= 5.0

{'mean_accuracy': 0.687,
 'mean_precison': 0.674,
 'mean_recall': 0.673,
 'mean_f1': 0.672}

In [56]:
linear_perform.get_score() # c= 5.0 (trial 2)

{'mean_accuracy': 0.698,
 'mean_precison': 0.686,
 'mean_recall': 0.685,
 'mean_f1': 0.684}

In [202]:
# linear_perform.get_score() # c= 7.0

{'mean_accuracy': 0.684,
 'mean_precison': 0.671,
 'mean_recall': 0.67,
 'mean_f1': 0.67}

In [66]:
linear_perform.get_score() # c= 7.0 (trial 2)

{'mean_accuracy': 0.699,
 'mean_precison': 0.688,
 'mean_recall': 0.686,
 'mean_f1': 0.686}

In [214]:
# linear_perform.get_score() # c= 10.0

{'mean_accuracy': 0.676,
 'mean_precison': 0.661,
 'mean_recall': 0.66,
 'mean_f1': 0.66}

In [71]:
linear_perform.get_score() # c= 10.0 (trial 2)

{'mean_accuracy': 0.693,
 'mean_precison': 0.682,
 'mean_recall': 0.679,
 'mean_f1': 0.679}

In [18]:
# linear_perform.get_score() # c= 15.0 <-- best in Stage 1

{'mean_accuracy': 0.693,
 'mean_precison': 0.68,
 'mean_recall': 0.679,
 'mean_f1': 0.678}

In [282]:
linear_perform.get_score() # c= 15.0 (trial 2) <-- best in Stage 1

{'mean_accuracy': 0.694,
 'mean_precison': 0.683,
 'mean_recall': 0.682,
 'mean_f1': 0.682}

In [23]:
# linear_perform.get_score() # c= 20.0

{'mean_accuracy': 0.682,
 'mean_precison': 0.668,
 'mean_recall': 0.667,
 'mean_f1': 0.667}

In [81]:
linear_perform.get_score() # c= 20.0 (trial 2)

{'mean_accuracy': 0.685,
 'mean_precison': 0.673,
 'mean_recall': 0.672,
 'mean_f1': 0.672}

In [28]:
# linear_perform.get_score() # c= 30.0

{'mean_accuracy': 0.677,
 'mean_precison': 0.664,
 'mean_recall': 0.663,
 'mean_f1': 0.663}

In [91]:
linear_perform.get_score() # c= 30.0 (trial 2)

{'mean_accuracy': 0.677,
 'mean_precison': 0.664,
 'mean_recall': 0.663,
 'mean_f1': 0.663}

6.1.2 Polynomial SVM Performance

Go to:
- [5. SVM Modeling](#5-pemodelan-svm)

In [163]:
# poly_perform.get_score() # default params

{'mean_accuracy': 0.394,
 'mean_precison': 0.754,
 'mean_recall': 0.364,
 'mean_f1': 0.242}

In [99]:
poly_perform.get_score() # default params (trial 2)

{'mean_accuracy': 0.395,
 'mean_precison': 0.754,
 'mean_recall': 0.366,
 'mean_f1': 0.244}

In [179]:
# poly_perform.get_score() # c= 2.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.465,
 'mean_precison': 0.619,
 'mean_recall': 0.437,
 'mean_f1': 0.377}

In [114]:
poly_perform.get_score() # c= 2.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.46,
 'mean_precison': 0.588,
 'mean_recall': 0.433,
 'mean_f1': 0.373}

In [191]:
# poly_perform.get_score() # c= 5.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.482,
 'mean_precison': 0.601,
 'mean_recall': 0.454,
 'mean_f1': 0.404}

In [122]:
poly_perform.get_score() # c= 5.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.474,
 'mean_precison': 0.567,
 'mean_recall': 0.448,
 'mean_f1': 0.396}

In [203]:
# poly_perform.get_score() # c= 7.0; degree= 5; gamma= 1.0

{'mean_accuracy': 0.456,
 'mean_precison': 0.615,
 'mean_recall': 0.429,
 'mean_f1': 0.365}

In [130]:
poly_perform.get_score() # c= 7.0; degree= 5; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.451,
 'mean_precison': 0.584,
 'mean_recall': 0.425,
 'mean_f1': 0.358}

In [215]:
# poly_perform.get_score() # c= 10.0; degree= 5; gamma= 1.0

{'mean_accuracy': 0.462,
 'mean_precison': 0.612,
 'mean_recall': 0.435,
 'mean_f1': 0.376}

In [138]:
poly_perform.get_score() # c= 10.0; degree= 5; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.455,
 'mean_precison': 0.581,
 'mean_recall': 0.429,
 'mean_f1': 0.366}

In [227]:
# poly_perform.get_score() # c= 2.0; degree= 4; gamma= 0.5

{'mean_accuracy': 0.401,
 'mean_precison': 0.737,
 'mean_recall': 0.371,
 'mean_f1': 0.255}

In [146]:
poly_perform.get_score() # c= 2.0; degree= 4; gamma= 0.5 (trial 2)

{'mean_accuracy': 0.399,
 'mean_precison': 0.688,
 'mean_recall': 0.37,
 'mean_f1': 0.254}

In [238]:
# poly_perform.get_score() # c= 2.0; degree= 4; gamma= 0.1

{'mean_accuracy': 0.372,
 'mean_precison': 0.573,
 'mean_recall': 0.34,
 'mean_f1': 0.193}

In [162]:
poly_perform.get_score() # c= 2.0; degree= 4; gamma= 0.1 (trial 2)

{'mean_accuracy': 0.37,
 'mean_precison': 0.5,
 'mean_recall': 0.336,
 'mean_f1': 0.186}

In [248]:
# poly_perform.get_score() # c= 5.0; degree= 4; gamma= 0.5

{'mean_accuracy': 0.418,
 'mean_precison': 0.668,
 'mean_recall': 0.39,
 'mean_f1': 0.292}

In [170]:
poly_perform.get_score() # c= 5.0; degree= 4; gamma= 0.5 (trial 2)

{'mean_accuracy': 0.419,
 'mean_precison': 0.637,
 'mean_recall': 0.392,
 'mean_f1': 0.297}

In [258]:
# poly_perform.get_score() # c= 5.0; degree= 4; gamma= 0.1

{'mean_accuracy': 0.373,
 'mean_precison': 0.573,
 'mean_recall': 0.34,
 'mean_f1': 0.194}

In [178]:
poly_perform.get_score() # c= 5.0; degree= 4; gamma= 0.1 (trial 2)

{'mean_accuracy': 0.373,
 'mean_precison': 0.701,
 'mean_recall': 0.34,
 'mean_f1': 0.194}

In [264]:
# poly_perform.get_score() # c= 2.0; degree= 6; gamma= 1.0

{'mean_accuracy': 0.428,
 'mean_precison': 0.638,
 'mean_recall': 0.401,
 'mean_f1': 0.314}

In [184]:
poly_perform.get_score() # c= 2.0; degree= 6; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.419,
 'mean_precison': 0.6,
 'mean_recall': 0.39,
 'mean_f1': 0.296}

In [269]:
# poly_perform.get_score() # c= 5.0; degree= 6; gamma= 1.0

{'mean_accuracy': 0.432,
 'mean_precison': 0.616,
 'mean_recall': 0.405,
 'mean_f1': 0.322}

In [189]:
poly_perform.get_score() # c= 5.0; degree= 6; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.428,
 'mean_precison': 0.59,
 'mean_recall': 0.4,
 'mean_f1': 0.315}

In [276]:
# poly_perform.get_score() # c= 7.0; degree= 6; gamma= 1.0

{'mean_accuracy': 0.437,
 'mean_precison': 0.618,
 'mean_recall': 0.41,
 'mean_f1': 0.331}

In [194]:
poly_perform.get_score() # c= 7.0; degree= 6; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.432,
 'mean_precison': 0.591,
 'mean_recall': 0.404,
 'mean_f1': 0.322}

In [281]:
# poly_perform.get_score() # c= 10.0; degree= 6; gamma= 1.0

{'mean_accuracy': 0.445,
 'mean_precison': 0.628,
 'mean_recall': 0.418,
 'mean_f1': 0.345}

In [199]:
poly_perform.get_score() # c= 10.0; degree= 6; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.439,
 'mean_precison': 0.594,
 'mean_recall': 0.412,
 'mean_f1': 0.335}

In [290]:
# poly_perform.get_score() # c= 7.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.489,
 'mean_precison': 0.6,
 'mean_recall': 0.461,
 'mean_f1': 0.417}

In [204]:
poly_perform.get_score() # c= 7.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.481,
 'mean_precison': 0.562,
 'mean_recall': 0.455,
 'mean_f1': 0.407}

In [295]:
# poly_perform.get_score() # c= 10.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.499,
 'mean_precison': 0.588,
 'mean_recall': 0.472,
 'mean_f1': 0.434}

In [209]:
poly_perform.get_score() # c= 10.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.491,
 'mean_precison': 0.557,
 'mean_recall': 0.464,
 'mean_f1': 0.42}

In [307]:
# poly_perform.get_score() # c= 15.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.506,
 'mean_precison': 0.589,
 'mean_recall': 0.48,
 'mean_f1': 0.447}

In [214]:
poly_perform.get_score() # c= 15.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.498,
 'mean_precison': 0.559,
 'mean_recall': 0.472,
 'mean_f1': 0.431}

In [314]:
# poly_perform.get_score() # c= 20.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.52,
 'mean_precison': 0.595,
 'mean_recall': 0.494,
 'mean_f1': 0.466}

In [219]:
poly_perform.get_score() # c= 20.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.512,
 'mean_precison': 0.569,
 'mean_recall': 0.487,
 'mean_f1': 0.453}

In [319]:
# poly_perform.get_score() # c= 30.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.531,
 'mean_precison': 0.59,
 'mean_recall': 0.505,
 'mean_f1': 0.481}

In [223]:
poly_perform.get_score() # c= 30.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.522,
 'mean_precison': 0.572,
 'mean_recall': 0.497,
 'mean_f1': 0.467}

In [323]:
# poly_perform.get_score() # c= 50.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.543,
 'mean_precison': 0.585,
 'mean_recall': 0.517,
 'mean_f1': 0.497}

In [227]:
poly_perform.get_score() # c= 50.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.54,
 'mean_precison': 0.574,
 'mean_recall': 0.515,
 'mean_f1': 0.491}

In [328]:
# poly_perform.get_score() # c= 100.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.557,
 'mean_precison': 0.578,
 'mean_recall': 0.531,
 'mean_f1': 0.515}

In [231]:
poly_perform.get_score() # c= 100.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.568,
 'mean_precison': 0.595,
 'mean_recall': 0.546,
 'mean_f1': 0.529}

In [333]:
# poly_perform.get_score() # c= 200.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.561,
 'mean_precison': 0.57,
 'mean_recall': 0.535,
 'mean_f1': 0.521}

In [236]:
poly_perform.get_score() # c= 200.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.579,
 'mean_precison': 0.596,
 'mean_recall': 0.559,
 'mean_f1': 0.545}

In [338]:
# poly_perform.get_score() # c= 500.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.565,
 'mean_precison': 0.566,
 'mean_recall': 0.539,
 'mean_f1': 0.526}

In [245]:
poly_perform.get_score() # c= 500.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.588,
 'mean_precison': 0.595,
 'mean_recall': 0.568,
 'mean_f1': 0.557}

In [343]:
# poly_perform.get_score() # c= 1000.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.571,
 'mean_precison': 0.566,
 'mean_recall': 0.545,
 'mean_f1': 0.534}

In [250]:
poly_perform.get_score() # c= 1000.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.598,
 'mean_precison': 0.597,
 'mean_recall': 0.58,
 'mean_f1': 0.57}

In [348]:
# poly_perform.get_score() # c= 2000.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.574,
 'mean_precison': 0.563,
 'mean_recall': 0.547,
 'mean_f1': 0.538}

In [255]:
poly_perform.get_score() # c= 2000.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.596,
 'mean_precison': 0.591,
 'mean_recall': 0.579,
 'mean_f1': 0.57}

In [352]:
# poly_perform.get_score() # c= 5000.0; degree= 4; gamma= 1.0 <-- best in Stage 1

{'mean_accuracy': 0.58,
 'mean_precison': 0.564,
 'mean_recall': 0.552,
 'mean_f1': 0.544}

In [260]:
poly_perform.get_score() # c= 5000.0; degree= 4; gamma= 1.0 (trial 2) <-- best in Stage 1

{'mean_accuracy': 0.599,
 'mean_precison': 0.589,
 'mean_recall': 0.582,
 'mean_f1': 0.574}

In [357]:
# poly_perform.get_score() # c= 10000.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.575,
 'mean_precison': 0.555,
 'mean_recall': 0.549,
 'mean_f1': 0.542}

In [265]:
poly_perform.get_score() # c= 10000.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.597,
 'mean_precison': 0.587,
 'mean_recall': 0.581,
 'mean_f1': 0.574}

In [363]:
# poly_perform.get_score() # c= 12000.0; degree= 4; gamma= 1.0

{'mean_accuracy': 0.57,
 'mean_precison': 0.546,
 'mean_recall': 0.542,
 'mean_f1': 0.535}

In [270]:
poly_perform.get_score() # c= 12000.0; degree= 4; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.596,
 'mean_precison': 0.586,
 'mean_recall': 0.58,
 'mean_f1': 0.573}

6.1.3 RBF SVM Performance

Go to:
- [5. SVM Modeling](#5-pemodelan-svm)

In [164]:
# rbf_perform.get_score() # default params

{'mean_accuracy': 0.641,
 'mean_precison': 0.646,
 'mean_recall': 0.605,
 'mean_f1': 0.602}

In [100]:
rbf_perform.get_score() # default params (trial 2)

{'mean_accuracy': 0.648,
 'mean_precison': 0.653,
 'mean_recall': 0.615,
 'mean_f1': 0.613}

In [180]:
# rbf_perform.get_score() # c= 2.0; gamma= 1.0

{'mean_accuracy': 0.586,
 'mean_precison': 0.613,
 'mean_recall': 0.538,
 'mean_f1': 0.52}

In [115]:
rbf_perform.get_score() # c= 2.0; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.604,
 'mean_precison': 0.659,
 'mean_recall': 0.557,
 'mean_f1': 0.544}

In [192]:
# rbf_perform.get_score() # c= 5.0; gamma= 1.0

{'mean_accuracy': 0.591,
 'mean_precison': 0.625,
 'mean_recall': 0.541,
 'mean_f1': 0.524}

In [123]:
rbf_perform.get_score() # c= 5.0; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.605,
 'mean_precison': 0.661,
 'mean_recall': 0.558,
 'mean_f1': 0.545}

In [204]:
# rbf_perform.get_score() # c= 7.0; gamma= 1.0

{'mean_accuracy': 0.588,
 'mean_precison': 0.625,
 'mean_recall': 0.539,
 'mean_f1': 0.522}

In [131]:
rbf_perform.get_score() # c= 7.0; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.603,
 'mean_precison': 0.656,
 'mean_recall': 0.556,
 'mean_f1': 0.543}

In [216]:
# rbf_perform.get_score() # c= 10.0; gamma= 1.0

{'mean_accuracy': 0.586,
 'mean_precison': 0.622,
 'mean_recall': 0.537,
 'mean_f1': 0.52}

In [139]:
rbf_perform.get_score() # c= 10.0; gamma= 1.0 (trial 2)

{'mean_accuracy': 0.602,
 'mean_precison': 0.656,
 'mean_recall': 0.555,
 'mean_f1': 0.542}

In [228]:
# rbf_perform.get_score() # c= 2.0; gamma= 0.5

{'mean_accuracy': 0.633,
 'mean_precison': 0.644,
 'mean_recall': 0.592,
 'mean_f1': 0.586}

In [147]:
rbf_perform.get_score() # c= 2.0; gamma= 0.5 (trial 2)

{'mean_accuracy': 0.649,
 'mean_precison': 0.659,
 'mean_recall': 0.609,
 'mean_f1': 0.605}

In [239]:
# rbf_perform.get_score() # c= 2.0; gamma= 0.1

{'mean_accuracy': 0.681,
 'mean_precison': 0.679,
 'mean_recall': 0.662,
 'mean_f1': 0.661}

In [163]:
rbf_perform.get_score() # c= 2.0; gamma= 0.1 (trial 2)

{'mean_accuracy': 0.677,
 'mean_precison': 0.672,
 'mean_recall': 0.659,
 'mean_f1': 0.657}

In [249]:
# rbf_perform.get_score() # c= 5.0; gamma= 0.5

{'mean_accuracy': 0.636,
 'mean_precison': 0.645,
 'mean_recall': 0.595,
 'mean_f1': 0.589}

In [171]:
rbf_perform.get_score() # c= 5.0; gamma= 0.5 (trial 2)

{'mean_accuracy': 0.652,
 'mean_precison': 0.664,
 'mean_recall': 0.612,
 'mean_f1': 0.61}

In [259]:
# rbf_perform.get_score() # c= 5.0; gamma= 0.1

{'mean_accuracy': 0.692,
 'mean_precison': 0.687,
 'mean_recall': 0.673,
 'mean_f1': 0.673}

In [179]:
rbf_perform.get_score() # c= 5.0; gamma= 0.1 (trial 2)

{'mean_accuracy': 0.698,
 'mean_precison': 0.69,
 'mean_recall': 0.679,
 'mean_f1': 0.679}

In [32]:
# rbf_perform.get_score() # c= 10.0; gamma= 0.1 <-- best in Stage 1

{'mean_accuracy': 0.7,
 'mean_precison': 0.692,
 'mean_recall': 0.678,
 'mean_f1': 0.679}

In [275]:
rbf_perform.get_score() # c= 10.0; gamma= 0.1 (trial 2)

{'mean_accuracy': 0.705,
 'mean_precison': 0.695,
 'mean_recall': 0.685,
 'mean_f1': 0.686}

In [308]:
# rbf_perform.get_score() # c = 15.0; gamma = 0.1

{'mean_accuracy': 0.583,
 'mean_precison': 0.616,
 'mean_recall': 0.535,
 'mean_f1': 0.517}

In [283]:
rbf_perform.get_score() # c= 15.0; gamma= 0.1 (trial 2) <-- best in Stage 1

{'mean_accuracy': 0.707,
 'mean_precison': 0.697,
 'mean_recall': 0.688,
 'mean_f1': 0.689}

In [315]:
# rbf_perform.get_score() # c = 20.0; gamma = 0.1

{'mean_accuracy': 0.582,
 'mean_precison': 0.613,
 'mean_recall': 0.534,
 'mean_f1': 0.517}

In [288]:
rbf_perform.get_score() # c = 20.0; gamma = 0.1 (trial 2)

{'mean_accuracy': 0.706,
 'mean_precison': 0.696,
 'mean_recall': 0.687,
 'mean_f1': 0.688}

In [293]:
rbf_perform.get_score() # c = 20.0; gamma = 0.1 (trial 2)

{'mean_accuracy': 0.702,
 'mean_precison': 0.69,
 'mean_recall': 0.682,
 'mean_f1': 0.683}
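
Because each configuration above was run more than once, averaging the repeated runs before comparing configurations makes the choice of "best" less sensitive to run-to-run noise. The snippet below is a minimal sketch of that idea; the dictionary layout and the helper name `summarize` are illustrative assumptions, and the numbers are the `mean_f1` values copied from three of the RBF outputs above.

```python
from statistics import mean

# mean_f1 per (C, gamma) for the RBF kernel, copied from the outputs above:
# [first run, trial 2]. The dictionary layout is an assumption for illustration.
rbf_f1 = {
    (2.0, 0.1): [0.661, 0.657],
    (5.0, 0.1): [0.673, 0.679],
    (10.0, 0.1): [0.679, 0.686],
}


def summarize(scores):
    """Return {params: mean score across the recorded runs}."""
    return {params: mean(values) for params, values in scores.items()}


summary = summarize(rbf_f1)
# Pick the configuration with the highest averaged F1.
best_params = max(summary, key=summary.get)

for params, avg in sorted(summary.items()):
    print(f"C={params[0]}, gamma={params[1]}: mean F1 = {avg:.3f}")
print("best:", best_params)
```

The same pattern extends to the other kernels by adding their recorded scores to the dictionary.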

***

##### 6.2 Pengukuran Performa Tahap 2

>Main model: (SVM + SMOTE) + TF-IDF

Jump to:
- [6.1 Pengukuran Performa Tahap 1](#61-pengukuran-performa-tahap-1)
- [7. Pengujian](#7-pengujian)

In [None]:
from sklearn.svm import SVC

In [90]:
c = 5.0
degree = 3
gamma = 0.5

In [66]:
linear_svm = SVC(kernel="linear", C=c)

In [52]:
poly_svm = SVC(kernel="poly", C=c, degree=degree, gamma=gamma)

In [91]:
rbf_svm = SVC(kernel="rbf", C=c, gamma=gamma)
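
For reference, the three `SVC` kernels configured above correspond to the standard kernel functions below, as documented by scikit-learn. Here $\gamma$ is `gamma`, $d$ is `degree`, and $r$ is `coef0`, which these cells leave at its default of 0.

$$
\begin{aligned}
K_{\text{linear}}(x, x') &= \langle x, x' \rangle \\
K_{\text{poly}}(x, x') &= \left(\gamma \langle x, x' \rangle + r\right)^{d} \\
K_{\text{rbf}}(x, x') &= \exp\left(-\gamma \lVert x - x' \rVert^{2}\right)
\end{aligned}
$$

`C` does not appear in any kernel: it scales the penalty on margin violations, so larger values fit the training data more tightly.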

In [None]:
from validation.cross import ImbalancedCV

In [87]:
n_fold = 5
scoring = ["accuracy", "precision", "recall", "f1"]
random_state = 42

In [68]:
linear_perform = ImbalancedCV(
    model = linear_svm,
    n_fold = n_fold,
    scoring = scoring,
    scoring_avg = "macro",
    random_state = random_state,
)

linear_perform.validate(X_train, y_train)

CV: 100%|██████████| 5/5 [00:17<00:00,  3.56s/it]


In [58]:
poly_perform = ImbalancedCV(
    model = poly_svm,
    n_fold = n_fold,
    scoring = scoring,
    scoring_avg = "macro",
    random_state = random_state,
)

poly_perform.validate(X_train, y_train)

CV: 100%|██████████| 5/5 [00:17<00:00,  3.42s/it]


In [92]:
rbf_perform = ImbalancedCV(
    model = rbf_svm,
    n_fold = n_fold,
    scoring = scoring,
    scoring_avg = "macro",
    random_state = random_state,
)

rbf_perform.validate(X_train, y_train)

CV: 100%|██████████| 5/5 [00:18<00:00,  3.80s/it]
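
The `scoring_avg = "macro"` setting presumably means each metric is computed per class and then averaged with equal weight, so minority classes count as much as the majority class. The internals of `ImbalancedCV` are not shown here, so the following is only a sketch of macro averaging itself, with hypothetical labels and a hypothetical helper name `macro_scores`.

```python
def macro_scores(y_true, y_pred):
    """Compute macro-averaged precision, recall and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero when a class is never predicted/present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    # Each class contributes equally, regardless of how many samples it has.
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels: class 0 is the majority, class 1 the minority.
example = macro_scores([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
print(example)
```

With macro averaging, a model that ignores a small class is penalised even when its overall accuracy stays high, which is why accuracy and macro recall can diverge in the results below.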


In [69]:
linear_perform.get_score() # c= 2.0 (trial 3)

{'mean_accuracy': 0.728,
 'mean_precision': 0.72,
 'mean_recall': 0.718,
 'mean_f1': 0.717}

In [631]:
poly_perform.get_score() # c=2; degree= 3; gamma='scale' # <-- best point in Stage 2

{'mean_accuracy': 0.604,
 'mean_precison': 0.619,
 'mean_recall': 0.603,
 'mean_f1': 0.596}

In [59]:
poly_perform.get_score() # c=2; degree= 3; gamma='scale' (trial 3)

{'mean_accuracy': 0.612,
 'mean_precision': 0.624,
 'mean_recall': 0.605,
 'mean_f1': 0.601}

In [601]:
rbf_perform.get_score() # <-- constant maximum point for RBF: gamma= 1.0 + C (trial 2)

{'mean_accuracy': 0.715,
 'mean_precison': 0.708,
 'mean_recall': 0.692,
 'mean_f1': 0.694}

In [79]:
rbf_perform.get_score() # c= 2.0, gamma= 1.0 (trial 3)

{'mean_accuracy': 0.72,
 'mean_precision': 0.716,
 'mean_recall': 0.702,
 'mean_f1': 0.703}

In [84]:
rbf_perform.get_score() # c= 2.0, gamma= 1.0 (trial 3)

{'mean_accuracy': 0.722,
 'mean_precision': 0.716,
 'mean_recall': 0.704,
 'mean_f1': 0.705}

6.2.1 Performa SVM-Linier

In [287]:
# linear_perform.get_score() # default params

{'mean_accuracy': 0.712,
 'mean_precison': 0.707,
 'mean_recall': 0.702,
 'mean_f1': 0.703}

In [302]:
linear_perform.get_score() # default params (trial 2) <-- best in Stage 2

{'mean_accuracy': 0.714,
 'mean_precison': 0.702,
 'mean_recall': 0.699,
 'mean_f1': 0.7}

In [274]:
# linear_perform.get_score() # c= 2.0 <-- best in Stage 2

{'mean_accuracy': 0.715,
 'mean_precison': 0.708,
 'mean_recall': 0.703,
 'mean_f1': 0.704}

In [307]:
linear_perform.get_score() # c= 2.0 (trial 2)

{'mean_accuracy': 0.712,
 'mean_precison': 0.702,
 'mean_recall': 0.697,
 'mean_f1': 0.699}

In [309]:
# linear_perform.get_score() # c= 5.0

{'mean_accuracy': 0.698,
 'mean_precison': 0.691,
 'mean_recall': 0.683,
 'mean_f1': 0.685}

In [312]:
linear_perform.get_score() # c= 5.0 (trial 2)

{'mean_accuracy': 0.703,
 'mean_precison': 0.693,
 'mean_recall': 0.689,
 'mean_f1': 0.69}

In [331]:
# linear_perform.get_score() # c= 7.0

{'mean_accuracy': 0.694,
 'mean_precison': 0.689,
 'mean_recall': 0.68,
 'mean_f1': 0.683}

In [317]:
linear_perform.get_score() # c= 7.0 (trial 2)

{'mean_accuracy': 0.694,
 'mean_precison': 0.685,
 'mean_recall': 0.68,
 'mean_f1': 0.681}

In [342]:
# linear_perform.get_score() # c= 10.0

{'mean_accuracy': 0.688,
 'mean_precison': 0.682,
 'mean_recall': 0.674,
 'mean_f1': 0.676}

In [322]:
linear_perform.get_score() # c= 10.0 (trial 2)

{'mean_accuracy': 0.689,
 'mean_precison': 0.68,
 'mean_recall': 0.674,
 'mean_f1': 0.676}

In [501]:
# linear_perform.get_score() # c= 15.0

{'mean_accuracy': 0.681,
 'mean_precison': 0.675,
 'mean_recall': 0.668,
 'mean_f1': 0.67}

In [327]:
linear_perform.get_score() # c= 15.0 (trial 2)

{'mean_accuracy': 0.678,
 'mean_precison': 0.668,
 'mean_recall': 0.662,
 'mean_f1': 0.664}

In [506]:
# linear_perform.get_score() # c= 20.0

{'mean_accuracy': 0.68,
 'mean_precison': 0.674,
 'mean_recall': 0.667,
 'mean_f1': 0.669}

In [332]:
linear_perform.get_score() # c= 20.0 (trial 2)

{'mean_accuracy': 0.672,
 'mean_precison': 0.663,
 'mean_recall': 0.657,
 'mean_f1': 0.658}

In [511]:
# linear_perform.get_score() # c= 30.0

{'mean_accuracy': 0.676,
 'mean_precison': 0.668,
 'mean_recall': 0.663,
 'mean_f1': 0.665}

In [336]:
linear_perform.get_score() # c= 30.0 (trial 2)

{'mean_accuracy': 0.668,
 'mean_precison': 0.658,
 'mean_recall': 0.652,
 'mean_f1': 0.654}

6.2.2 Performa SVM-Polinomial

Jump to:
- [6.2 Pengukuran Performa Tahap 2](#62-pengukuran-performa-tahap-2)
- [7. Pengujian](#7-pengujian)

In [288]:
# poly_perform.get_score() # default params <-- best in Stage 2

{'mean_accuracy': 0.595,
 'mean_precison': 0.658,
 'mean_recall': 0.558,
 'mean_f1': 0.549}

In [343]:
poly_perform.get_score() # default params (trial 2) <-- best in Stage 2

{'mean_accuracy': 0.584,
 'mean_precison': 0.627,
 'mean_recall': 0.548,
 'mean_f1': 0.535}

In [275]:
# poly_perform.get_score() # c= 2.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.57,
 'mean_precison': 0.635,
 'mean_recall': 0.57,
 'mean_f1': 0.557}

In [351]:
poly_perform.get_score() # c= 2.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.544,
 'mean_precison': 0.622,
 'mean_recall': 0.568,
 'mean_f1': 0.543}

In [310]:
# poly_perform.get_score() # c= 5.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [359]:
poly_perform.get_score() # c= 5.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.415,
 'mean_precison': 0.658,
 'mean_recall': 0.473,
 'mean_f1': 0.39}

In [332]:
# poly_perform.get_score() # c= 7.0, degree= 5, gamma= 1.0

{'mean_accuracy': 0.518,
 'mean_precison': 0.736,
 'mean_recall': 0.47,
 'mean_f1': 0.425}

In [367]:
poly_perform.get_score() # c= 7.0, degree= 5, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.368,
 'mean_precison': 0.67,
 'mean_recall': 0.434,
 'mean_f1': 0.321}

In [343]:
# poly_perform.get_score() # c= 10.0, degree= 5, gamma= 1.0

{'mean_accuracy': 0.518,
 'mean_precison': 0.736,
 'mean_recall': 0.47,
 'mean_f1': 0.425}

In [382]:
poly_perform.get_score() # c= 10.0, degree= 5, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.368,
 'mean_precison': 0.675,
 'mean_recall': 0.434,
 'mean_f1': 0.322}

In [367]:
# poly_perform.get_score() # c= 2.0, degree= 4, gamma= 0.5

{'mean_accuracy': 0.425,
 'mean_precison': 0.766,
 'mean_recall': 0.378,
 'mean_f1': 0.273}

In [390]:
poly_perform.get_score() # c= 2.0, degree= 4, gamma= 0.5 (trial 2)

{'mean_accuracy': 0.431,
 'mean_precison': 0.779,
 'mean_recall': 0.388,
 'mean_f1': 0.288}

In [359]:
# poly_perform.get_score() # c= 2.0, degree= 4, gamma= 0.1

{'mean_accuracy': 0.425,
 'mean_precison': 0.766,
 'mean_recall': 0.378,
 'mean_f1': 0.272}

In [398]:
poly_perform.get_score() # c= 2.0, degree= 4, gamma= 0.1 (trial 2)

{'mean_accuracy': 0.421,
 'mean_precison': 0.774,
 'mean_recall': 0.379,
 'mean_f1': 0.273}

In [375]:
# poly_perform.get_score() # c= 5.0, degree= 4, gamma= 0.5

{'mean_accuracy': 0.425,
 'mean_precison': 0.766,
 'mean_recall': 0.378,
 'mean_f1': 0.273}

In [406]:
poly_perform.get_score() # c= 5.0, degree= 4, gamma= 0.5 (trial 2)

{'mean_accuracy': 0.447,
 'mean_precison': 0.755,
 'mean_recall': 0.404,
 'mean_f1': 0.318}

In [383]:
# poly_perform.get_score() # c= 5.0, degree= 4, gamma= 0.1

{'mean_accuracy': 0.425,
 'mean_precison': 0.766,
 'mean_recall': 0.378,
 'mean_f1': 0.272}

In [414]:
poly_perform.get_score() # c= 5.0, degree= 4, gamma= 0.1 (trial 2)

{'mean_accuracy': 0.421,
 'mean_precison': 0.774,
 'mean_recall': 0.379,
 'mean_f1': 0.273}

In [389]:
# poly_perform.get_score() # c= 2.0, degree= 6, gamma= 1.0

{'mean_accuracy': 0.503,
 'mean_precison': 0.754,
 'mean_recall': 0.456,
 'mean_f1': 0.404}

In [420]:
poly_perform.get_score() # c= 2.0, degree= 6, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.394,
 'mean_precison': 0.646,
 'mean_recall': 0.455,
 'mean_f1': 0.364}

In [394]:
# poly_perform.get_score() # c= 5.0, degree= 6, gamma= 1.0

{'mean_accuracy': 0.503,
 'mean_precison': 0.754,
 'mean_recall': 0.456,
 'mean_f1': 0.404}

In [425]:
poly_perform.get_score() # c= 5.0, degree= 6, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.352,
 'mean_precison': 0.678,
 'mean_recall': 0.42,
 'mean_f1': 0.299}

In [399]:
# poly_perform.get_score() # c= 7.0, degree= 6, gamma= 1.0

{'mean_accuracy': 0.503,
 'mean_precison': 0.754,
 'mean_recall': 0.456,
 'mean_f1': 0.404}

In [430]:
poly_perform.get_score() # c= 7.0, degree= 6, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.348,
 'mean_precison': 0.698,
 'mean_recall': 0.417,
 'mean_f1': 0.292}

In [404]:
# poly_perform.get_score() # c= 10.0, degree= 6, gamma= 1.0

{'mean_accuracy': 0.503,
 'mean_precison': 0.754,
 'mean_recall': 0.456,
 'mean_f1': 0.404}

In [435]:
poly_perform.get_score() # c= 10.0, degree= 6, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.346,
 'mean_precison': 0.692,
 'mean_recall': 0.415,
 'mean_f1': 0.289}

In [409]:
# poly_perform.get_score() # c= 7.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [440]:
poly_perform.get_score() # c= 7.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.412,
 'mean_precison': 0.659,
 'mean_recall': 0.47,
 'mean_f1': 0.385}

In [414]:
# poly_perform.get_score() # c= 10.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [447]:
poly_perform.get_score() # c= 10.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.412,
 'mean_precison': 0.657,
 'mean_recall': 0.47,
 'mean_f1': 0.385}

In [419]:
# poly_perform.get_score() # c= 15.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [452]:
poly_perform.get_score() # c= 15.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.414,
 'mean_precison': 0.656,
 'mean_recall': 0.472,
 'mean_f1': 0.387}

In [424]:
# poly_perform.get_score() # c= 20.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [457]:
poly_perform.get_score() # c= 20.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.415,
 'mean_precison': 0.656,
 'mean_recall': 0.473,
 'mean_f1': 0.388}

In [429]:
# poly_perform.get_score() # c= 30.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [462]:
poly_perform.get_score() # c= 30.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.418,
 'mean_precison': 0.65,
 'mean_recall': 0.475,
 'mean_f1': 0.39}

In [434]:
poly_perform.get_score() # c= 50.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [467]:
# poly_perform.get_score() # c= 50.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.423,
 'mean_precison': 0.629,
 'mean_recall': 0.475,
 'mean_f1': 0.392}

In [439]:
# poly_perform.get_score() # c= 100.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [472]:
poly_perform.get_score() # c= 100.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.418,
 'mean_precison': 0.628,
 'mean_recall': 0.435,
 'mean_f1': 0.356}

In [444]:
# poly_perform.get_score() # c= 200.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [477]:
poly_perform.get_score() # c= 200.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.422,
 'mean_precison': 0.65,
 'mean_recall': 0.434,
 'mean_f1': 0.349}

In [450]:
# poly_perform.get_score() # c= 500.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [482]:
poly_perform.get_score() # c= 500.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.423,
 'mean_precison': 0.633,
 'mean_recall': 0.407,
 'mean_f1': 0.321}

In [455]:
# poly_perform.get_score() # c= 1000.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [487]:
poly_perform.get_score() # c= 1000.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.412,
 'mean_precison': 0.677,
 'mean_recall': 0.391,
 'mean_f1': 0.291}

In [460]:
# poly_perform.get_score() # c= 2000.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [492]:
poly_perform.get_score() # c= 2000.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.409,
 'mean_precison': 0.68,
 'mean_recall': 0.387,
 'mean_f1': 0.285}

In [465]:
poly_perform.get_score() # c= 5000.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [497]:
# poly_perform.get_score() # c= 5000.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.409,
 'mean_precison': 0.68,
 'mean_recall': 0.387,
 'mean_f1': 0.285}

In [472]:
poly_perform.get_score() # c= 10000.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [502]:
# poly_perform.get_score() # c= 10000.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.409,
 'mean_precison': 0.68,
 'mean_recall': 0.387,
 'mean_f1': 0.285}

In [476]:
# poly_perform.get_score() # c= 12000.0, degree= 4, gamma= 1.0

{'mean_accuracy': 0.533,
 'mean_precison': 0.697,
 'mean_recall': 0.488,
 'mean_f1': 0.454}

In [507]:
poly_perform.get_score() # c= 12000.0, degree= 4, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.409,
 'mean_precison': 0.68,
 'mean_recall': 0.387,
 'mean_f1': 0.285}

6.2.3 Performa SVM-RBF

Jump to:
- [6.2 Pengukuran Performa Tahap 2](#62-pengukuran-performa-tahap-2)

In [289]:
# rbf_perform.get_score() # default params

{'mean_accuracy': 0.707,
 'mean_precison': 0.718,
 'mean_recall': 0.679,
 'mean_f1': 0.682}

In [344]:
rbf_perform.get_score() # default params (trial 2)

{'mean_accuracy': 0.71,
 'mean_precison': 0.712,
 'mean_recall': 0.677,
 'mean_f1': 0.677}

In [276]:
# rbf_perform.get_score() # c= 2.0, gamma= 1.0 <-- best in Stage 2

{'mean_accuracy': 0.723,
 'mean_precison': 0.721,
 'mean_recall': 0.703,
 'mean_f1': 0.706}

In [352]:
rbf_perform.get_score() # c= 2.0, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.719,
 'mean_precison': 0.712,
 'mean_recall': 0.695,
 'mean_f1': 0.698}

In [311]:
# rbf_perform.get_score() # c= 5.0, gamma= 1.0

{'mean_accuracy': 0.722,
 'mean_precison': 0.719,
 'mean_recall': 0.703,
 'mean_f1': 0.706}

In [360]:
rbf_perform.get_score() # c= 5.0, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.714,
 'mean_precison': 0.706,
 'mean_recall': 0.691,
 'mean_f1': 0.693}

In [333]:
# rbf_perform.get_score() # c= 7.0, gamma= 1.0

{'mean_accuracy': 0.722,
 'mean_precison': 0.719,
 'mean_recall': 0.702,
 'mean_f1': 0.705}

In [368]:
rbf_perform.get_score() # c= 7.0, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.714,
 'mean_precison': 0.706,
 'mean_recall': 0.69,
 'mean_f1': 0.693}

In [344]:
# rbf_perform.get_score() # c= 10.0, gamma= 1.0

{'mean_accuracy': 0.721,
 'mean_precison': 0.718,
 'mean_recall': 0.701,
 'mean_f1': 0.705}

In [383]:
rbf_perform.get_score() # c= 10.0, gamma= 1.0 (trial 2)

{'mean_accuracy': 0.714,
 'mean_precison': 0.706,
 'mean_recall': 0.69,
 'mean_f1': 0.693}

In [368]:
# rbf_perform.get_score() # c= 2.0, gamma= 0.5

{'mean_accuracy': 0.722,
 'mean_precison': 0.718,
 'mean_recall': 0.707,
 'mean_f1': 0.71}

In [391]:
rbf_perform.get_score() # c= 2.0, gamma= 0.5 (trial 2)

{'mean_accuracy': 0.717,
 'mean_precison': 0.708,
 'mean_recall': 0.697,
 'mean_f1': 0.699}

In [360]:
# rbf_perform.get_score() # c= 2.0, gamma= 0.1

{'mean_accuracy': 0.702,
 'mean_precison': 0.695,
 'mean_recall': 0.69,
 'mean_f1': 0.691}

In [399]:
rbf_perform.get_score() # c= 2.0, gamma= 0.1 (trial 2)

{'mean_accuracy': 0.703,
 'mean_precison': 0.69,
 'mean_recall': 0.686,
 'mean_f1': 0.687}

In [376]:
# rbf_perform.get_score() # c= 5.0, gamma= 0.5

{'mean_accuracy': 0.721,
 'mean_precison': 0.715,
 'mean_recall': 0.706,
 'mean_f1': 0.709}

In [407]:
rbf_perform.get_score() # c= 5.0, gamma= 0.5 (trial 2)

{'mean_accuracy': 0.718,
 'mean_precison': 0.708,
 'mean_recall': 0.699,
 'mean_f1': 0.701}

In [384]:
# rbf_perform.get_score() # c= 5.0, gamma= 0.1

{'mean_accuracy': 0.714,
 'mean_precison': 0.709,
 'mean_recall': 0.702,
 'mean_f1': 0.704}

In [415]:
rbf_perform.get_score() # c= 5.0, gamma= 0.1 (trial 2)

{'mean_accuracy': 0.716,
 'mean_precison': 0.705,
 'mean_recall': 0.699,
 'mean_f1': 0.701}

In [481]:
# rbf_perform.get_score() # c= 10.0, gamma= 0.1

{'mean_accuracy': 0.718,
 'mean_precison': 0.713,
 'mean_recall': 0.704,
 'mean_f1': 0.707}

In [512]:
rbf_perform.get_score() # c= 10.0, gamma= 0.1 (trial 2)

{'mean_accuracy': 0.711,
 'mean_precison': 0.7,
 'mean_recall': 0.695,
 'mean_f1': 0.697}

In [486]:
# rbf_perform.get_score() # c= 15.0, gamma= 0.1

{'mean_accuracy': 0.711,
 'mean_precison': 0.706,
 'mean_recall': 0.697,
 'mean_f1': 0.7}

In [517]:
rbf_perform.get_score() # c= 15.0, gamma= 0.1 (trial 2)

{'mean_accuracy': 0.703,
 'mean_precison': 0.692,
 'mean_recall': 0.687,
 'mean_f1': 0.688}

In [491]:
# rbf_perform.get_score() # c= 20.0, gamma= 0.1

{'mean_accuracy': 0.71,
 'mean_precison': 0.705,
 'mean_recall': 0.695,
 'mean_f1': 0.698}

In [522]:
rbf_perform.get_score() # c= 20.0, gamma= 0.1 (trial 2)

{'mean_accuracy': 0.7,
 'mean_precison': 0.689,
 'mean_recall': 0.685,
 'mean_f1': 0.686}

In [496]:
# rbf_perform.get_score() # c= 30.0, gamma= 0.1

{'mean_accuracy': 0.703,
 'mean_precison': 0.697,
 'mean_recall': 0.689,
 'mean_f1': 0.691}

In [527]:
rbf_perform.get_score() # c= 30.0, gamma= 0.1 (trial 2)

{'mean_accuracy': 0.697,
 'mean_precison': 0.687,
 'mean_recall': 0.682,
 'mean_f1': 0.683}

In [532]:
rbf_perform.get_score() # c= 2.0, gamma= 'scale' (trial 2) <-- best in Stage 2

{'mean_accuracy': 0.72,
 'mean_precison': 0.713,
 'mean_recall': 0.696,
 'mean_f1': 0.698}

#### __7. Pengujian__

Jump to:
- [6.2 Pengukuran Performa Tahap 2](#62-pengukuran-performa-tahap-2)
- [Daftar Isi](#daftar-isi)

In [140]:
from sklearn.metrics import confusion_matrix, classification_report
from imblearn.over_sampling import SMOTE

In [None]:
smoter = SMOTE()
balanced_X_train, balanced_y_train = smoter.fit_resample(
    X_train,
    y_train
)

analysis.get_shape(X_train)
analysis.get_distribution(y_train)
analysis.get_shape(balanced_X_train)
analysis.get_distribution(balanced_y_train)
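
To make the resampling step less of a black box: SMOTE creates each synthetic minority sample by picking a minority point, choosing one of its nearest minority neighbours, and placing a new point somewhere on the line segment between them. The sketch below shows only that interpolation step in plain Python; the function names, the toy data and `k=2` are illustrative assumptions, and `imblearn`'s `SMOTE` handles neighbour search and class balancing far more carefully.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def synthesize(minority, k, rng):
    """Create one synthetic sample by interpolating towards a near neighbour."""
    base = rng.choice(minority)
    # Candidate neighbours: every other minority sample, closest first.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: squared_distance(base, p))
    neighbour = rng.choice(others[:k])
    # gap in [0, 1) decides how far along the segment the new point lies.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


# Hypothetical 2-D minority samples (the corners of a unit square).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rng = random.Random(0)
synthetic = [synthesize(minority_samples, k=2, rng=rng) for _ in range(20)]
print(synthetic[:3])
```

Because every synthetic point lies between two existing minority points, the oversampled set stays inside the region the minority class already occupies.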

In [None]:
# print(f"Confusion Matrix:\n {confusion_matrix(y_test, )}\n")
# print(f"Report:\n {classification_report(y_test, )}")
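
The commented-out report above still needs a predictions argument. As a reminder of what the confusion matrix will contain once predictions are available, here is a minimal plain-Python sketch that tallies (true, predicted) pairs in the same layout scikit-learn uses, with rows for true labels and columns for predicted labels. The label values and the `y_true_demo` and `y_pred_demo` lists are hypothetical, not results from this notebook.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def as_matrix(counts, labels):
    """Arrange counts with rows = true labels and columns = predicted labels."""
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical labels and predictions, for illustration only.
y_true_demo = [0, 0, 1, 1, 2, 2]
y_pred_demo = [0, 1, 1, 1, 2, 0]

matrix = as_matrix(confusion_counts(y_true_demo, y_pred_demo), labels=[0, 1, 2])
for row in matrix:
    print(row)
```

The diagonal holds the correctly classified samples; each off-diagonal cell shows which class a sample was mistaken for.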
