<a href="https://colab.research.google.com/github/fdhliakbar/IR-Lab/blob/main/Vector_Space_Model_Pertemuan5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 4: TF-IDF Weighting

<img src="https://www.infront.com/wp-content/uploads/2021/01/tf%E2%80%93idf-1080.jpg" alt="TF IDF Image"/>

In Information Retrieval (IR), we often want to know how important a word is to a document compared with the rest of the document collection.
For this, we use a weighting method called TF-IDF (Term Frequency - Inverse Document Frequency).

TF-IDF helps a system determine how relevant a word is to a document. It is widely used in:
- Search engines,
- Recommender systems,
- Text mining, and
- Natural Language Processing (NLP).

## Term Frequency
Term Frequency (TF) measures how often a word (term) appears in a document.

Formula:

$$
TF(t, d) = \frac{\text{number of occurrences of term } t \text{ in document } d}{\text{total number of terms in document } d}
$$

The more often a word appears in a document, the more important it is within that document. However, common words such as “yang” (which), “dan” (and), and “atau” (or) have a high TF in many documents, so an additional discriminating factor (IDF) is needed.
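A minimal Python sketch of the TF formula above. The sample document is hypothetical and is tokenized with a plain whitespace split (an assumption; the lab itself uses Sastrawi and `CountVectorizer`). It also shows the problem just described: the common word “dan” gets the highest TF.

```python
from collections import Counter

def term_frequency(tokens):
    """Return TF(t, d) = count of t in d / total number of terms in d."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

# Hypothetical document, tokenized by splitting on whitespace
doc = "pemilu dan suara dan partai".split()
tf = term_frequency(doc)

print(tf["dan"])     # 0.4 (2 occurrences out of 5 terms)
print(tf["pemilu"])  # 0.2 (1 occurrence out of 5 terms)
```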

## Inverse Document Frequency
Inverse Document Frequency (IDF) measures how distinctive, or rare, a word is across the entire document collection (corpus).

Formula: `IDF(t) = log(N / df(t))`

Where:

- N = total number of documents

- df(t) = number of documents that contain term t

A word that appears in many documents is less informative.
A word that appears in few documents is more distinctive and gets a high IDF value.
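The IDF formula can be sketched the same way. Assumptions: the natural logarithm (matching the `np.log` call used later in this lab), a hypothetical three-document corpus, and whitespace tokenization. A word that occurs in every document, such as “dan”, gets an IDF of 0.

```python
import math

def inverse_document_frequency(corpus):
    """IDF(t) = log(N / df(t)) using the natural logarithm."""
    n_docs = len(corpus)
    doc_freq = {}
    for tokens in corpus:
        for term in set(tokens):  # count each term at most once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Hypothetical corpus of three tokenized documents
corpus = [
    "pemilu dan suara".split(),
    "kereta dan stasiun".split(),
    "pantai dan pulau".split(),
]
idf = inverse_document_frequency(corpus)

print(idf["dan"])     # 0.0: appears in all 3 documents, log(3/3)
print(idf["pemilu"])  # log(3/1), about 1.0986: appears in only 1 document
```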

## TF-IDF
TF-IDF is the product of TF and IDF, weighting a word by:
- how often it appears in the document (TF), and
- how rarely it appears across the whole collection (IDF).

Formula: `TF-IDF(t, d) = TF(t, d) × IDF(t)`

A word that appears often in one document but rarely in the others gets a high TF-IDF value, which marks it as important for that document.

<img src="https://miro.medium.com/v2/resize:fit:1358/0*W3Rzv6djRGrftW7r.PNG" width="108%" alt="Contoh TF IDF" />
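Putting TF and IDF together, here is a minimal sketch of `TF-IDF(t, d) = TF(t, d) × IDF(t)` on a hypothetical corpus, using the two formulas exactly as written above. Note that scikit-learn's `TfidfTransformer`, used later in the lab, behaves differently by default: it uses raw counts for TF, a smoothed IDF, and L2-normalizes each document vector, so its numbers will not match this hand computation.

```python
import math
from collections import Counter

# Hypothetical tokenized corpus
corpus = [
    "pemilu pemilu dan suara".split(),
    "kereta dan stasiun".split(),
    "pantai dan pulau".split(),
]

n_docs = len(corpus)
# df(t): number of documents that contain term t
doc_freq = Counter(term for tokens in corpus for term in set(tokens))

def tf_idf(tokens):
    """TF-IDF(t, d) = TF(t, d) * IDF(t) for every term of one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }

weights = tf_idf(corpus[0])
print(weights["pemilu"])  # frequent here, absent elsewhere: highest weight
print(weights["dan"])     # 0.0: present in every document
```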

## Summary

| Component  | Function                                 | Indicates                             |
| ---------- | ---------------------------------------- | ------------------------------------- |
| **TF**     | Frequency of the word in a document      | Local importance                      |
| **IDF**    | Rarity of the word across all documents  | Global importance                     |
| **TF-IDF** | Combination of TF and IDF                | Overall importance weight of the word |


## Lanjut Langkah Praktikum ü´°

## Initialization and Library Imports

In [6]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.naive_bayes import MultinomialNB

In [8]:
!pip install Sastrawi



In [9]:
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory

## Read the Dataset

In [10]:
data =  pd.read_excel('dataKumparan1.xlsx')
data.head()

|   | Topic | Title | Content |
|---|---|---|---|
| 0 | Politik | Pelanggaran Pemilu, Tiga Caleg di Sulteng Dipr... | Komisioner Bawaslu Sigi, Sulawesi Tengah, Agus... |
| 1 | Politik | Pemilu Susulan di Kota Jayapura, Suara Jokowi ... | Walaupun dua dari lima distrik melakukan pemil... |
| 2 | Politik | Tsamara Amany Dipinang, Pengurus PSI Daerah Me... | Tsamara Amany, politisi Partai Solidaritas Ind... |
| 3 | Politik | Ada 47 TPS di Sulawesi Utara Berpotensi Pemili... | Badan Pengawas Pemilu (Bawaslu) Provinsi Sulaw... |
| 4 | Politik | Ketua KPPS di Sleman Ditemukan Tewas Gantung D... | Tugiman, Ketua Kelompok Penyelenggara Pemungut... |


### Dataset Size

In [11]:
print('Ukuran dataset: ', data.shape)

Ukuran dataset:  (60, 3)


### Splitting the Training & Testing Data

In [12]:
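# train_size=0.5 -> 30 of the 60 rows; test_size=0.16 -> 10 rows; the remaining 20 rows are not used.
# No random_state is set, so every run draws a different split.
x_train, x_test, y_train, y_test = train_test_split(data['Content'], data['Topic'], train_size=0.5, test_size=0.16)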
train_data = pd.DataFrame({'Content': x_train, 'Topic': y_train})
test_data = pd.DataFrame({'Content': x_test, 'Topic': y_test})
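How do `train_size=0.5` and `test_size=0.16` become 30 and 10 documents? scikit-learn rounds the training share down and the test share up, so 0.16 × 60 = 9.6 becomes 10, and the remaining 20 documents end up in neither set. The sketch below reproduces this on a hypothetical 60-row stand-in for the dataset; `random_state=42` is an added assumption (the lab cell omits it, so its split changes on every run).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 60-row Kumparan dataset
data = pd.DataFrame({
    "Content": [f"dokumen {i}" for i in range(60)],
    "Topic": ["Politik", "Teknologi", "Travel"] * 20,
})

x_train, x_test, y_train, y_test = train_test_split(
    data["Content"], data["Topic"],
    train_size=0.5, test_size=0.16,
    random_state=42,  # assumption: fixed seed for a reproducible split
)

print(len(x_train), len(x_test))  # 30 10
```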

### Training Data

In [13]:
df1 = pd.DataFrame(train_data)
print(df1)

                                              Content      Topic
24  Stasiun Kereta Api Stockholm, Swedia, merupaka...  Teknologi
55  Maskapai bersimbol singa merah, Lion Air kerap...     Travel
58  Setuju atau tidak, ruang bagasi penyimpanan da...     Travel
38  Kamu mungkin pernah merasa kesulitan untuk ber...  Teknologi
15  Mantan Ketua Mahkamah Konstitusi (MK), Mahfud ...    Politik
57  Kabar bahagia datang bagi para penyelam di sel...     Travel
9   Seorang Ketua KPPS bernama Baharuddin Effendi ...    Politik
16  Pemungutan suara Pemilu 2019 telah usai. Tapi ...    Politik
39  Perusahaan e-commerce marketplace Tokopedia ke...  Teknologi
28  Masyarakat Indonesia menggunakan hak pilihnya ...  Teknologi
25  Apple ternyata tidak main-main untuk terjun ke...  Teknologi
33  Pada awal bulan April, ada sebuah berita besar...  Teknologi
6   Peningkatan perolehan suara untuk Partai Keadi...    Politik
41  Berlibur menjadi kegiatan yang paling dinanti ...     Travel
36  Platform media sosial

### Testing Data

In [14]:
df2 = pd.DataFrame(test_data)
print(df2)

                                              Content      Topic
43  Anda suka dessert? Di berbagai media sosial, p...     Travel
2   Tsamara Amany, politisi Partai Solidaritas Ind...    Politik
7   Badan Pengawas Pemilu (Bawaslu) Kota Banjarmas...    Politik
18  Isu kecurangan di Pemilu 2019 terus menyeruak....    Politik
56  Sulawesi Utara (Sulut) menunjukkan komitmennya...     Travel
59  Di balik kemegahan Pegunungan Tianzhu China, a...     Travel
50  Untuk pertama kalinya dalam 300 tahun, Vatikan...     Travel
37  Belanda merupakan negara yang secara geografis...  Teknologi
32  Super hype! Penayangan perdana film 'Avengers:...  Teknologi
13  Aplikasi dan situs ayojagatps.com ikut meramai...    Politik


### Training & Testing Data Sizes

In [15]:
print('Ukuran data train : ', train_data.shape)
print('Ukuran data test : ', test_data.shape)
n_train = train_data.shape[0]
n_test = test_data.shape[0]

Ukuran data train :  (30, 2)
Ukuran data test :  (10, 2)


### Inspecting and Combining the Data

In [16]:
sparse_data = pd.concat([train_data, test_data], ignore_index=True)
# sparse_data.head()

df3 = pd.DataFrame(sparse_data)
print(df3)

                                              Content      Topic
0   Stasiun Kereta Api Stockholm, Swedia, merupaka...  Teknologi
1   Maskapai bersimbol singa merah, Lion Air kerap...     Travel
2   Setuju atau tidak, ruang bagasi penyimpanan da...     Travel
3   Kamu mungkin pernah merasa kesulitan untuk ber...  Teknologi
4   Mantan Ketua Mahkamah Konstitusi (MK), Mahfud ...    Politik
5   Kabar bahagia datang bagi para penyelam di sel...     Travel
6   Seorang Ketua KPPS bernama Baharuddin Effendi ...    Politik
7   Pemungutan suara Pemilu 2019 telah usai. Tapi ...    Politik
8   Perusahaan e-commerce marketplace Tokopedia ke...  Teknologi
9   Masyarakat Indonesia menggunakan hak pilihnya ...  Teknologi
10  Apple ternyata tidak main-main untuk terjun ke...  Teknologi
11  Pada awal bulan April, ada sebuah berita besar...  Teknologi
12  Peningkatan perolehan suara untuk Partai Keadi...    Politik
13  Berlibur menjadi kegiatan yang paling dinanti ...     Travel
14  Platform media sosial

In [17]:
sparse_data.head()
print('Ukuran data sparse : ', sparse_data.shape)
n_document = sparse_data.shape[0]

Ukuran data sparse :  (40, 2)


## With Stemming and Stopword Removal

### Preprocessing

### Stemming (Reducing Words to Their Base Form)

In [18]:
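# Copy the raw content into a 'teks' column that will hold the preprocessed text
sparse_data['teks'] = sparse_data['Content']

# Create the Sastrawi stemmer once and reuse it for every document
stemmerFactory = StemmerFactory()
stemmer = stemmerFactory.create_stemmer()

# Stem each document; the index runs 0..n_document-1 because of ignore_index=True above
for row in range(n_document):
  sparse_data.loc[row, 'teks'] = stemmer.stem(sparse_data.loc[row, 'teks'])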


In [19]:
# Sparse_data.head()
df4 = pd.DataFrame(sparse_data)
print(df4)


                                              Content      Topic  \
0   Stasiun Kereta Api Stockholm, Swedia, merupaka...  Teknologi   
1   Maskapai bersimbol singa merah, Lion Air kerap...     Travel   
2   Setuju atau tidak, ruang bagasi penyimpanan da...     Travel   
3   Kamu mungkin pernah merasa kesulitan untuk ber...  Teknologi   
4   Mantan Ketua Mahkamah Konstitusi (MK), Mahfud ...    Politik   
5   Kabar bahagia datang bagi para penyelam di sel...     Travel   
6   Seorang Ketua KPPS bernama Baharuddin Effendi ...    Politik   
7   Pemungutan suara Pemilu 2019 telah usai. Tapi ...    Politik   
8   Perusahaan e-commerce marketplace Tokopedia ke...  Teknologi   
9   Masyarakat Indonesia menggunakan hak pilihnya ...  Teknologi   
10  Apple ternyata tidak main-main untuk terjun ke...  Teknologi   
11  Pada awal bulan April, ada sebuah berita besar...  Teknologi   
12  Peningkatan perolehan suara untuk Partai Keadi...    Politik   
13  Berlibur menjadi kegiatan yang paling dinant

### TF-IDF Weight Computation

In [20]:
vectorize = CountVectorizer()
tf = vectorize.fit_transform(sparse_data['teks'])

print('Jumlah dokumen:', tf.shape[0])
print('Jumlah Term:', tf.shape[1])

Jumlah dokumen: 40
Jumlah Term: 2272


In [21]:
print('Daftar Term:')
vectorize.get_feature_names_out()

Daftar Term:


array(['00', '000', '0004', ..., 'zamih', 'zat', 'ziarah'], dtype=object)

## Stopword List

In [22]:
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory

factory = StopWordRemoverFactory()
stopword_list = factory.get_stop_words()


# For the post-test: remove Sastrawi's stopwords while counting terms
vectorize = CountVectorizer(stop_words=stopword_list)

tf = vectorize.fit_transform(sparse_data['teks'])

In [23]:
print('Jumlah Term:', tf.shape[1])

Jumlah Term: 2186


In [24]:
vectorize.get_feature_names_out()

array(['00', '000', '0004', ..., 'zamih', 'zat', 'ziarah'], dtype=object)

In [25]:
print('Matriks TF:')
tf_matrix = pd.DataFrame(tf.toarray(), columns=vectorize.get_feature_names_out())
tf_matrix

Matriks TF:


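|   | 00 | 000 | 0004 | 01 | 02 | 03 | 052 | 08 | 09 | 10 | ... | yakin | yasar | yerusalem | yesus | yogyakarta | yunani | zahid | zamih | zat | ziarah |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |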


In [26]:
print('Matriks TF (Khusus data train):')
tf_train = tf_matrix[:n_train]
tf_train.shape

Matriks TF (Khusus data train):


(30, 2186)

### Computing the TF-IDF Weights

In [27]:
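transformer = TfidfTransformer(use_idf=True)

# Intended adjustment: compute df from the training documents only, so that the
# queries (test data) do not affect df. Terms that occur only in test documents
# have df = 0, which makes n/df (and therefore idf) infinite.
n = n_train
df = tf_train.astype(bool).sum(axis=0)
idf = np.log(n/df)
transformer.idf_ = idf

# Convert the TF matrix into a TF-IDF weight matrix.
# Note: fit_transform() re-fits the transformer on all 40 documents, which overwrites
# the manual idf_ above, so the weights below use scikit-learn's default smoothed IDF
# computed over the whole corpus.
weight = transformer.fit_transform(tf)
print('Jumlah Dokumen:', weight.shape[0])
print('Jumlah Term:', weight.shape[1])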

Jumlah Dokumen: 40
Jumlah Term: 2186
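To actually compute df from the training documents only, as the comment in the cell above intends, one option is to `fit` the transformer on the training rows and then `transform` all rows, instead of calling `fit_transform` on everything. A minimal sketch on a hypothetical toy corpus (the first three documents play the training role). scikit-learn's default IDF is the smoothed `ln((1 + N) / (1 + df(t))) + 1` rather than the plain `log(N / df(t))` from the theory section; the smoothing keeps terms that occur only in test documents (df = 0) finite.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical corpus: first n_train docs = training data, the rest = test data
docs = [
    "pemilu suara partai",
    "kereta stasiun teknologi",
    "wisata pantai pulau",
    "pemilu teknologi aplikasi",
]
n_train = 3

# Vocabulary over all documents, as in the lab
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)
vocab = vectorizer.vocabulary_

# Learn IDF from the training rows only, then apply it to every row
transformer = TfidfTransformer(use_idf=True)  # smooth_idf=True, norm='l2' by default
transformer.fit(counts[:n_train])
weights = transformer.transform(counts)

print(transformer.idf_[vocab["aplikasi"]])  # df = 0 in training: ln(4/1) + 1
print(transformer.idf_[vocab["pemilu"]])    # df = 1 in training: ln(4/2) + 1
```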


### Final Weighting Result

In [28]:
weight_matrix = pd.DataFrame(weight.toarray(), columns=vectorize.get_feature_names_out())
weight_matrix

Unnamed: 0,00,000,0004,01,02,03,052,08,09,10,...,yakin,yasar,yerusalem,yesus,yogyakarta,yunani,zahid,zamih,zat,ziarah
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.054616,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.413064,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [29]:
# Split the weight matrix back into training rows (first n_train) and testing rows (the rest)

weight_train = weight_matrix[:n_train]
weight_test = weight_matrix[n_train:]

## Next: Computing Cosine Similarity with the Data from Lab 4

---

## POST-TEST 4 - Stopword Modification

1. Complete the post-test following the lab steps.
2. Build a **stopword list** from the data in the provided `Excel` file.
3. Analyze the `TF-IDF` results on the processed data and explain your findings.

`DONE`

---

# Session 5 - Vector Space Model



## Vector Space Model
VSM is a mathematical approach for measuring how similar documents are to a query. Each document and each query is represented as a vector in a high-dimensional space (one dimension per unique term). Similarity is computed with Cosine Similarity.
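The cosine similarity between a query vector $\mathbf{q}$ and a document vector $\mathbf{d}$ is $\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}$. For non-negative TF-IDF vectors it ranges from 0 (no shared terms) to 1 (same direction). A minimal sketch with hypothetical 4-term vectors, checked against scikit-learn's `cosine_similarity`. Because the score depends only on the angle, `doc_a` (twice as long as the query) still scores about 1.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine(q, d):
    """cos(q, d) = (q . d) / (||q|| * ||d||)"""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

# Hypothetical TF-IDF vectors over a 4-term vocabulary
query = np.array([1.0, 1.0, 0.0, 0.0])
doc_a = np.array([2.0, 2.0, 0.0, 0.0])  # same direction as the query
doc_b = np.array([0.0, 0.0, 3.0, 1.0])  # no terms in common with the query

print(cosine(query, doc_a))  # about 1.0 (up to floating-point rounding)
print(cosine(query, doc_b))  # 0.0

# scikit-learn scores every row of the first matrix against every row of the second
print(cosine_similarity(np.vstack([doc_a, doc_b]), query.reshape(1, -1)))
```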

## Expected Outputs

- A similarity matrix between the documents and the queries

- The label of the most relevant document

- An analysis of the results: do they match expectations?

## Tips

- Make sure preprocessing (stopword removal, stemming) has been done

- Normalize the vectors so that long documents do not score higher simply because they contain more terms

- Try different queries and compare how the relevance ranking changes

Imagine you are building a news-article search system. When a user types “harga BBM naik” (fuel prices rise), the system must return the most relevant articles. VSM lets the system judge which articles are most “similar” to that query.
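A small sketch of that scenario, assuming a hypothetical three-article collection: fit `TfidfVectorizer` on the articles, project the query into the same vector space with `transform`, and rank the articles by cosine similarity. For brevity it uses `TfidfVectorizer` instead of the lab's `CountVectorizer` + `TfidfTransformer` pair and skips stemming and stopword removal.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini news collection
articles = [
    "pemerintah umumkan harga bbm naik mulai besok",
    "harga tiket kereta turun saat libur panjang",
    "tim nasional menang di laga persahabatan",
]
query = "harga BBM naik"

# Learn vocabulary and IDF from the articles, then map the query into the same space
vectorizer = TfidfVectorizer()  # lowercases by default, so "BBM" matches "bbm"
doc_vectors = vectorizer.fit_transform(articles)
query_vector = vectorizer.transform([query])

# One similarity score per article; sort from most to least relevant
scores = cosine_similarity(query_vector, doc_vectors).ravel()
ranking = scores.argsort()[::-1]
for idx in ranking:
    print(f"{scores[idx]:.3f}  {articles[idx]}")
```

The first article shares all three query terms and ranks first; the sports article shares none and scores 0.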



## Lanjut Langkah PraktikumüòÄ

Computing Cosine Similarity

In [30]:
# Rows: the 30 training documents; columns: the 10 test documents (queries)
cosim = cosine_similarity(weight_train, weight_test)
print('Ukuran matriks cosine similarity: ', cosim.shape)

Ukuran matriks cosine similarity:  (30, 10)


In [31]:
# Column names for the test documents (queries): 'Dokumen 0' ... 'Dokumen 9'
name = ['Dokumen ' + str(i) for i in range(n_test)]

In [32]:
print('Matriks Cosine Similarity:')
cosim_matrix = pd.DataFrame(cosim, columns=name)
cosim_matrix.shape

Matriks Cosine Similarity:


(30, 10)

Cosine similarity matrix with labels

In [34]:
cosim_matrix['Label Train'] = train_data['Topic'].values
label_row = dict(zip(name, test_data['Topic'].values))

label_cosim = pd.concat([cosim_matrix, pd.DataFrame([label_row], index=['Label Test'])])
label_cosim.rename({label_cosim.index[-1]:'Label Test'}, inplace=True)


label_test = pd.DataFrame(label_cosim.iloc[-1])
label_test = label_test.T
label_test

|   | Dokumen 0 | Dokumen 1 | Dokumen 2 | Dokumen 3 | Dokumen 4 | Dokumen 5 | Dokumen 6 | Dokumen 7 | Dokumen 8 | Dokumen 9 | Label Train |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Label Test | Travel | Politik | Politik | Politik | Travel | Travel | Travel | Teknologi | Teknologi | Politik |   |


In [35]:
label_cosim

|   | Dokumen 0 | Dokumen 1 | Dokumen 2 | Dokumen 3 | Dokumen 4 | Dokumen 5 | Dokumen 6 | Dokumen 7 | Dokumen 8 | Dokumen 9 | Label Train |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.020849 | 0.026596 | 0.008291 | 0.030909 | 0.013937 | 0.039298 | 0.029285 | 0.115236 | 0.008344 | 0.043836 | Teknologi |
| 1 | 0.022123 | 0.011255 | 0.038283 | 0.020182 | 0.045175 | 0.032658 | 0.030721 | 0.079265 | 0.021858 | 0.052807 | Travel |
| 2 | 0.033714 | 0.022999 | 0.021006 | 0.020191 | 0.026653 | 0.024926 | 0.058337 | 0.024883 | 0.013566 | 0.037326 | Travel |
| 3 | 0.020035 | 0.014725 | 0.022957 | 0.01952 | 0.032389 | 0.049807 | 0.032593 | 0.031164 | 0.020646 | 0.050956 | Teknologi |
| 4 | 0.017825 | 0.051579 | 0.153388 | 0.127689 | 0.035755 | 0.030444 | 0.024771 | 0.02207 | 0.021004 | 0.273948 | Politik |
| 5 | 0.016318 | 0.028448 | 0.020087 | 0.019039 | 0.066842 | 0.037616 | 0.058583 | 0.124889 | 0.018489 | 0.036807 | Travel |
| 6 | 0.009318 | 0.036968 | 0.048041 | 0.029875 | 0.014355 | 0.087416 | 0.058214 | 0.029662 | 0.012841 | 0.079402 | Politik |
| 7 | 0.020053 | 0.020877 | 0.077334 | 0.043786 | 0.034681 | 0.017484 | 0.026493 | 0.012244 | 0.02149 | 0.083224 | Politik |
| 8 | 0.022948 | 0.02723 | 0.04898 | 0.024247 | 0.021141 | 0.024334 | 0.018265 | 0.017971 | 0.063573 | 0.041701 | Teknologi |
| 9 | 0.040675 | 0.052968 | 0.118013 | 0.041702 | 0.034513 | 0.084611 | 0.021914 | 0.030449 | 0.025219 | 0.124137 | Teknologi |


---

## POST-TEST 5

1. What does the `Vector Space Model` output look like?
2. Evaluate the results with the `Precision@N` metric and analyze whether the retrieved documents are relevant to their queries.
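Precision@N is the fraction of the top N retrieved documents that are relevant. In this lab a retrieved training document counts as relevant when its `Label Train` equals the query's test label, which is how `evaluate_vsm` below defines relevance:

$$
\text{Precision@}N = \frac{\left|\{\text{relevant documents among the top } N \text{ retrieved}\}\right|}{N}
$$

For example, the query `Dokumen 1` (label Politik) retrieves training documents labeled Politik, Teknologi, Politik, so

$$
\text{Precision@}3 = \frac{2}{3} \approx 0.67
$$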

In [40]:
import numpy as np
import pandas as pd
from IPython.display import display  # built into Jupyter; imported explicitly for clarity

def evaluate_vsm(cosim_matrix, label_test, n_retrieve=3):
    """Compute Precision@n_retrieve for every test document (query) and return the mean."""
    # Keep only the numeric rows (drop the last row, 'Label Test')
    numeric_part = cosim_matrix.iloc[:-1, :].copy()

    results = []
    print("=== Evaluasi Vector Space Model ===\n")

    for query in numeric_part.columns[:-1]:  # skip the last column, 'Label Train'
        # pd.concat with the label row turned the columns into object dtype; convert back to numbers
        numeric_part[query] = pd.to_numeric(numeric_part[query], errors='coerce')
        sorted_docs = numeric_part.sort_values(by=query, ascending=False)
        top_docs = sorted_docs.iloc[:n_retrieve]

        # A retrieved training document is relevant if its topic matches the query's topic
        relevan = (top_docs['Label Train'].values == label_test[query].values)
        precision = np.mean(relevan)
        results.append(precision)

        print(f"Query: {query}")
        print(f"Label Uji: {label_test[query].values[0]}")
        print(f"Top-{n_retrieve} Dokumen:")
        display(top_docs[[query, 'Label Train']])
        print(f"Precision@{n_retrieve}: {precision:.2f}\n")

    avg_precision = np.mean(results)
    print(f"Rata-rata Precision: {avg_precision:.2f}")
    return avg_precision

# Run the evaluation
evaluate_vsm(label_cosim, label_test, n_retrieve=3)


=== Evaluasi Vector Space Model ===

Query: Dokumen 0
Label Uji: Travel
Top-3 Dokumen:


|    | Dokumen 0 | Label Train |
|----|---|---|
| 11 | 0.087866 | Teknologi |
| 27 | 0.054666 | Travel |
| 18 | 0.048233 | Teknologi |


Precision@3: 0.33

Query: Dokumen 1
Label Uji: Politik
Top-3 Dokumen:


|    | Dokumen 1 | Label Train |
|----|---|---|
| 26 | 0.132333 | Politik |
| 9 | 0.052968 | Teknologi |
| 4 | 0.051579 | Politik |


Precision@3: 0.67

Query: Dokumen 2
Label Uji: Politik
Top-3 Dokumen:


|    | Dokumen 2 | Label Train |
|----|---|---|
| 4 | 0.153388 | Politik |
| 21 | 0.141193 | Politik |
| 26 | 0.132606 | Politik |


Precision@3: 1.00

Query: Dokumen 3
Label Uji: Politik
Top-3 Dokumen:


|    | Dokumen 3 | Label Train |
|----|---|---|
| 4 | 0.127689 | Politik |
| 21 | 0.098023 | Politik |
| 16 | 0.056736 | Politik |


Precision@3: 1.00

Query: Dokumen 4
Label Uji: Travel
Top-3 Dokumen:


|    | Dokumen 4 | Label Train |
|----|---|---|
| 27 | 0.188464 | Travel |
| 24 | 0.092408 | Travel |
| 5 | 0.066842 | Travel |


Precision@3: 1.00

Query: Dokumen 5
Label Uji: Travel
Top-3 Dokumen:


|    | Dokumen 5 | Label Train |
|----|---|---|
| 22 | 0.087792 | Politik |
| 6 | 0.087416 | Politik |
| 9 | 0.084611 | Teknologi |


Precision@3: 0.00

Query: Dokumen 6
Label Uji: Travel
Top-3 Dokumen:


|    | Dokumen 6 | Label Train |
|----|---|---|
| 5 | 0.058583 | Travel |
| 2 | 0.058337 | Travel |
| 6 | 0.058214 | Politik |


Precision@3: 0.67

Query: Dokumen 7
Label Uji: Teknologi
Top-3 Dokumen:


|    | Dokumen 7 | Label Train |
|----|---|---|
| 5 | 0.124889 | Travel |
| 0 | 0.115236 | Teknologi |
| 1 | 0.079265 | Travel |


Precision@3: 0.33

Query: Dokumen 8
Label Uji: Teknologi
Top-3 Dokumen:


|    | Dokumen 8 | Label Train |
|----|---|---|
| 8 | 0.063573 | Teknologi |
| 18 | 0.049985 | Teknologi |
| 23 | 0.037127 | Teknologi |


Precision@3: 1.00

Query: Dokumen 9
Label Uji: Politik
Top-3 Dokumen:


|    | Dokumen 9 | Label Train |
|----|---|---|
| 4 | 0.273948 | Politik |
| 21 | 0.219875 | Politik |
| 12 | 0.151498 | Politik |


Precision@3: 1.00

Rata-rata Precision: 0.70


np.float64(0.7)

**Note**: If you have any technical questions about the lab, don't hesitate to ask, either in the group or by private message to the assistants `Fadhli` & `Aufa`.

## Good Luck

<div align="center">
<img src="https://i.pinimg.com/originals/21/11/61/21116158daaeb1459b4ec0758505e1ad.gif" alt="Banner Haruhi suzumiya" />
</div>