<div style="background-color: #e3f2fd; padding: 20px; border-radius: 10px; border-left: 8px solid #2196f3;"> <h1 style="color: #1565c0; margin-bottom: 5px;">MindSchedule AI: Intent Classification</h1> <p style="font-size: 1.2em; color: #455a64;">NLP Model Training for Student Schedule Management & Mental Health</p> <hr> <strong>Project Status:</strong> Phase 1 - AI Modeling (Offline Training) </div>


[1] IMPORT LIBRARIES

We load the core libraries: scikit-learn for classical ML, NLTK for the stopword list and lemmatizer, joblib for saving the trained artifacts, and HuggingFace Datasets for downloading public datasets from the HuggingFace Hub.

In [1]:
print("⏳ Initializing system and libraries...")

import pandas as pd
import numpy as np
import re
import nltk
import joblib
import os
from datasets import load_dataset
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score

# Download supporting NLTK resources (stopword list and WordNet data for the lemmatizer)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

print("✅ Libraries and NLTK resources are ready!")

⏳ Initializing system and libraries...

✅ Libraries and NLTK resources are ready!


[3] LOAD DATASETS FROM HUGGINGFACE

We use two public datasets to broaden the assistant's context:

Mental Health Dataset: heliosbrahma/mental_health_chatbot_dataset - conversations about mental health issues.

Student Assistance: bot-remains/student-assistance-chatbot - academic assistance dialogues for university students.

In [2]:
print("üåê Menghubungkan ke HuggingFace...")
try:
    # Load Mental Health
    ds_mental = load_dataset("heliosbrahma/mental_health_chatbot_dataset", split="train")
    df_mental = pd.DataFrame(ds_mental)

    # Load Student Assistance
    ds_student = load_dataset("bot-remains/student-assistance-chatbot", split="train")
    df_student = pd.DataFrame(ds_student)

    print(f"‚úÖ Berhasil! Mental Health: {len(df_mental)} baris, Student: {len(df_student)} baris")
    
    print("\nüîç Preview Tabel Mental Health:")
    display(df_mental.head(3))
    
    print("\nüîç Preview Tabel Student Assistance:")
    display(df_student.head(3))
except Exception as e:
    print(f"‚ùå Error: {e}")

üåê Menghubungkan ke HuggingFace...
‚úÖ Berhasil! Mental Health: 172 baris, Student: 217 baris

üîç Preview Tabel Mental Health:


Unnamed: 0,text
0,<HUMAN>: What is a panic attack?\n<ASSISTANT>:...
1,<HUMAN>: What are symptoms of panic attack vs....
2,<HUMAN>: What are the types of Mental Illness?...



üîç Preview Tabel Student Assistance:


Unnamed: 0,category,instruction,input,output
0,Greetings and Farewells,Respond to greetings and farewells.,"Hi, how are you?","Hello! I'm doing great, thank you. How about you?"
1,Greetings and Farewells,Respond to greetings and farewells.,"Goodbye, see you later!",Goodbye! Take care and see you soon!
2,Greetings and Farewells,Respond to greetings and farewells.,Hi,Hello there! How can I help you today?
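The preview above shows that each Mental Health row stores the whole exchange, the `<HUMAN>` question and the `<ASSISTANT>` answer, in a single `text` field, whereas the Student Assistance data contributes only the user's `input`. Training on full transcripts means the mental_health examples contain answer text that a real user would never type. Below is a minimal sketch that keeps only the human turn. It assumes every row follows the `<HUMAN>: ...` / `<ASSISTANT>: ...` format seen in the preview and that the separator is a real newline (pandas escapes newlines as `\n` when displaying). `HUMAN_TURN` and `extract_human_turn` are hypothetical names. Applying it in the preprocessing cell would change the training data, so the accuracy reported later would have to be recomputed.

```python
import re

import pandas as pd

# Hypothetical pattern: capture text after "<HUMAN>:" up to "<ASSISTANT>:" (or end of string).
# Assumes the transcript format shown in the preview; not a documented dataset schema.
HUMAN_TURN = re.compile(r"<HUMAN>:\s*(.*?)\s*(?:<ASSISTANT>:|$)", re.DOTALL)


def extract_human_turn(transcript: str) -> str:
    """Return only the user's question; fall back to the full string if no <HUMAN> marker exists."""
    match = HUMAN_TURN.search(transcript)
    return match.group(1) if match else transcript


# Toy rows shaped like the preview; the answers are placeholders, not dataset content.
sample = pd.DataFrame({"text": [
    "<HUMAN>: What is a panic attack?\n<ASSISTANT>: (answer text)",
    "<HUMAN>: What are the types of Mental Illness?\n<ASSISTANT>: (answer text)",
]})
sample["text"] = sample["text"].apply(extract_human_turn)
print(sample["text"].tolist())
# ['What is a panic attack?', 'What are the types of Mental Illness?']
```

If the separator turns out to be a literal backslash followed by `n`, the pattern would need to treat `\\n` as whitespace as well.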


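🛠️ [4] PREPROCESSING & NORMALIZATION
This step is crucial. We align the column names of both datasets and then clean the text: lowercase it, remove every character that is not a letter (punctuation, symbols, digits), drop English stopwords (function words such as articles, pronouns, and conjunctions), and lemmatize the remaining words, so that the model focuses on the informative keywords.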
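The cleaning rules can be illustrated without downloading NLTK data. The sketch below reproduces the regex and stopword-filtering steps of `clean_process`, but it assumes a small hand-picked stopword set (`STOP_WORDS`, hypothetical) in place of NLTK's full English list and skips lemmatization; `clean_demo` is likewise a hypothetical name. In the real pipeline, WordNet lemmatization would additionally turn a word like "exams" into "exam".

```python
import re

# Illustrative subset of English stopwords (assumption); the notebook uses
# NLTK's full stopwords.words("english") list and also lemmatizes each token.
STOP_WORDS = {"what", "is", "a", "are", "the", "of", "i", "my"}


def clean_demo(text: str) -> str:
    """Lowercase, strip every non-letter character, and drop stopwords (no lemmatization)."""
    text = re.sub(r"[^a-z\s]", "", str(text).lower())
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


print(clean_demo("<HUMAN>: What is a panic attack?"))  # human panic attack
print(clean_demo("I failed 2 exams, my GPA is 2.5!"))  # failed exams gpa
```

Note that the speaker marker `<HUMAN>` survives cleaning as the token "human". If every Mental Health row carries that marker (and likewise "assistant"), those tokens alone would separate the mental_health class, which can make that class look easier to classify than it would be on real user messages.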

In [3]:
print("⏳ Aligning columns and cleaning text...")

# 1. Normalize Mental Health
# Find the text column: it may be named 'text', 'Context', or 'Questions'
text_candidates = [c for c in df_mental.columns if c in ['text', 'Context', 'Questions']]
if not text_candidates:
    raise KeyError(f"No text column found in the Mental Health data: {list(df_mental.columns)}")
mental_text_col = text_candidates[0]
df_mental = df_mental.rename(columns={mental_text_col: 'text'})
df_mental["intent"] = "mental_health"

# 2. Normalize Student Assistance: the user utterance becomes 'text', its category becomes the label
df_student = df_student.rename(columns={"input": "text", "category": "intent"})

# 3. Merge both sources and drop rows without text
df = pd.concat([df_mental[['text', 'intent']], df_student[['text', 'intent']]], ignore_index=True)
df = df.dropna(subset=['text']).reset_index(drop=True)

# 4. Cleaning function
stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_process(text):
    # Lowercase and keep only letters a-z and whitespace (drops punctuation, symbols, digits)
    text = re.sub(r"[^a-z\s]", "", str(text).lower())
    # Drop English stopwords and lemmatize the remaining tokens (WordNet default: noun POS)
    return " ".join([lemmatizer.lemmatize(w) for w in text.split() if w not in stop_words])

df["final_text"] = df["text"].apply(clean_process)

print("✅ Combined data is ready!")
display(df["intent"].value_counts().to_frame())

⏳ Aligning columns and cleaning text...
✅ Combined data is ready!


                      count
intent
mental_health           172
Course Information       25
Open-Ended Questions     20
General Questions        17
Placement Information    16
Scholarships             16
Apologies                16
Hostel Facilities        15
Contact Details          15
Alumni Information       14


💾 [6] SAVING MODEL ARTIFACTS
The final step is saving the model to the ../model/ folder. These are the files the FastAPI backend will load in the backend development phase.

In [4]:
print("⏳ Training the model...")

vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(df["final_text"])
y = df["intent"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"✅ Training complete! Accuracy: {accuracy_score(y_test, y_pred)*100:.2f}%")
# zero_division=0 gives the same 0.00 scores for never-predicted classes as the default, without UndefinedMetricWarning
print("\n📊 Classification report:\n", classification_report(y_test, y_pred, zero_division=0))

⏳ Training the model...
✅ Training complete! Accuracy: 73.08%

📊 Classification report:
                          precision    recall  f1-score   support

      Admission Process       0.00      0.00      0.00         1
     Alumni Information       1.00      1.00      1.00         3
              Apologies       0.00      0.00      0.00         3
            Campus Life       0.00      0.00      0.00         1
        Contact Details       1.00      1.00      1.00         3
     Course Information       0.83      1.00      0.91         5
   Eligibility Criteria       0.00      0.00      0.00         1
          Fee Structure       0.00      0.00      0.00         1
      General Questions       0.00      0.00      0.00         4
Greetings and Farewells       1.00      0.33      0.50         3
      Hostel Facilities       1.00      0.67      0.80         3
   Open-Ended Questions       0.00      0.00      0.00         4
  Placement Information       1.00      1.00      

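In the training cell, `vectorizer.fit_transform` runs on all of `df["final_text"]` before `train_test_split`, so the TF-IDF vocabulary and IDF weights are also computed from the test rows. This is a mild form of leakage that can make the reported accuracy optimistic. Below is a sketch of one standard remedy, assuming the same hyperparameters as the notebook: split the texts first and wrap TF-IDF and logistic regression in a scikit-learn `Pipeline`, so the vectorizer is fit on the training texts only. The corpus is placeholder data standing in for `df["final_text"]` and `df["intent"]`, and the `demo`-style variable names are chosen so they don't overwrite the notebook's variables.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Placeholder corpus standing in for df["final_text"] / df["intent"] (not real data).
texts = [
    "panic attack symptom", "feel anxious stressed", "cannot sleep worry", "depression help sad",
    "course syllabus semester", "course credit unit", "lecture schedule course", "course registration deadline",
    "hostel room fee", "hostel facility laundry", "hostel wifi room", "hostel mess food",
]
labels = ["mental_health"] * 4 + ["Course Information"] * 4 + ["Hostel Facilities"] * 4

# Split the cleaned texts first, so the vectorizer never sees the test rows.
texts_train, texts_test, labels_train, labels_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(texts_train, labels_train)  # vocabulary and IDF weights come from texts_train only
labels_pred = pipe.predict(texts_test)  # texts_test is only transformed, never fitted

print(classification_report(labels_test, labels_pred, zero_division=0))
```

The leakage only affects the evaluation. Refitting on all rows before saving the final artifacts is still reasonable. Because several classes in the report above have only 1 to 4 test samples, a stratified cross-validation of the same pipeline (for example with `sklearn.model_selection.cross_val_score`) would likely give a steadier estimate than a single 80/20 split.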


In [5]:
print("⏳ Saving project artifacts...")

# Create local folders
os.makedirs('../model', exist_ok=True)
os.makedirs('../data/processed', exist_ok=True)

# Save the model and vectorizer as .pkl files, plus the processed dataset as CSV
joblib.dump(model, "../model/intent_model.pkl")
joblib.dump(vectorizer, "../model/tfidf_vectorizer.pkl")
df.to_csv("../data/processed/intents_final.csv", index=False)

print("\n" + "="*40)
print("🚀 DONE! The following files are ready:")
print("1. ../model/intent_model.pkl")
print("2. ../model/tfidf_vectorizer.pkl")
print("3. ../data/processed/intents_final.csv")
print("="*40)

⏳ Saving project artifacts...

========================================
🚀 DONE! The following files are ready:
1. ../model/intent_model.pkl
2. ../model/tfidf_vectorizer.pkl
3. ../data/processed/intents_final.csv
========================================
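Section [6] states that these files will be loaded by a FastAPI backend. Below is a minimal sketch of that load-and-predict side. It assumes the backend loads both `.pkl` files once at startup and receives text that has already gone through the same cleaning as `final_text`. To keep the block self-contained, it trains a throwaway model on placeholder data and uses a temporary directory instead of `../model/`. `predict_intent` is a hypothetical helper, and no FastAPI code is shown.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Throwaway model on placeholder data so the sketch runs on its own (not the notebook's model).
demo_texts = ["panic attack anxious", "stress worry sleep", "course syllabus credit", "course lecture schedule"]
demo_labels = ["mental_health", "mental_health", "Course Information", "Course Information"]
demo_vectorizer = TfidfVectorizer()
demo_model = LogisticRegression(max_iter=1000).fit(demo_vectorizer.fit_transform(demo_texts), demo_labels)

with tempfile.TemporaryDirectory() as model_dir:
    # Same file names the notebook writes to ../model/
    joblib.dump(demo_model, os.path.join(model_dir, "intent_model.pkl"))
    joblib.dump(demo_vectorizer, os.path.join(model_dir, "tfidf_vectorizer.pkl"))

    # Backend side: load both artifacts once (e.g. at application startup)
    loaded_model = joblib.load(os.path.join(model_dir, "intent_model.pkl"))
    loaded_vectorizer = joblib.load(os.path.join(model_dir, "tfidf_vectorizer.pkl"))


def predict_intent(cleaned_text: str) -> str:
    """Hypothetical helper: cleaned_text must already be processed like final_text (clean_process)."""
    features = loaded_vectorizer.transform([cleaned_text])
    return str(loaded_model.predict(features)[0])


print(predict_intent("course syllabus"))  # Course Information
print(predict_intent("panic attack"))  # mental_health
```

Raw user input should be passed through the same cleaning function before `transform`. Otherwise the vocabulary will not match the one used during training. scikit-learn also recommends loading pickled estimators with the same library version that created them.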


In [6]:
import os

print("🧐 Checking that the files exist on disk...\n")

files = ["../model/intent_model.pkl", "../model/tfidf_vectorizer.pkl", "../data/processed/intents_final.csv"]

for f in files:
    if os.path.exists(f):
        size = os.path.getsize(f) / 1024  # Size in KB
        print(f"✅ {f} FOUND!")
        print(f"   Size: {size:.2f} KB")
    else:
        print(f"❌ {f} NOT FOUND!")

print("\n💡 If a file is 'NOT FOUND', check that you are running the notebook from the correct folder (the paths are relative to the working directory).")

🧐 Checking that the files exist on disk...

✅ ../model/intent_model.pkl FOUND!
   Size: 397.97 KB
✅ ../model/tfidf_vectorizer.pkl FOUND!
   Size: 99.06 KB
✅ ../data/processed/intents_final.csv FOUND!
   Size: 347.36 KB

💡 If a file is 'NOT FOUND', check that you are running the notebook from the correct folder (the paths are relative to the working directory).
