# Bengali NID Intent Classification with Qwen3-4B

Fine-tune Qwen3-4B on a Bengali NID (national ID) customer-service dataset for intent classification.

**Environment:** Google Colab Pro with L4 GPU (24GB)

| Component | Value |
|-----------|-------|
| Model | Qwen3-4B (~4B params; supports 119 languages, including Bengali) |
| Dataset | Bengali NID (407 intents, ~78k train, ~11k eval) |
| Method | Generative SFT + LoRA |
| Language | Bengali questions → English intent tags |
| Attention | Flash Attention 2 |

## 1. Install Dependencies

In [1]:
# Quote the version specifiers: unquoted, the shell treats `>=` as output redirection
!pip install -q "transformers>=4.51.0" "datasets>=3.0.0,<4.0.0" "trl>=0.17.0" "peft>=0.15.0" "accelerate>=1.6.0" "bitsandbytes>=0.45.0" scikit-learn pandas flash-attn --no-build-isolation

## 2. Check GPU & Mount Google Drive

In [2]:
import torch

# Check GPU
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU: {gpu_name}")
    print(f"VRAM: {gpu_memory:.1f} GB")
else:
    print("No GPU available! Go to Runtime > Change runtime type > GPU")
    raise RuntimeError("GPU required")

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Create output directory for model
import os
# NOTE: the folder name says "smollm2", but the model trained in this notebook is Qwen3-4B
OUTPUT_DIR = "/content/drive/MyDrive/models/smollm2-bengali-nid-intent"
os.makedirs(OUTPUT_DIR, exist_ok=True)
print(f"Model output directory: {OUTPUT_DIR}")

# ============================================================
# DATASET PATH - Root of Google Drive (My Drive)
# ============================================================
DATASET_DIR = "/content/drive/MyDrive"

print(f"\n>>> Upload your CSV files to: My Drive (root folder)")
print("    - sts_train.csv")
print("    - sts_eval.csv")
print("    - tag_answer.csv")

GPU: NVIDIA L4
VRAM: 23.8 GB
Mounted at /content/drive
Model output directory: /content/drive/MyDrive/models/smollm2-bengali-nid-intent

>>> Upload your CSV files to: My Drive (root folder)
    - sts_train.csv
    - sts_eval.csv
    - tag_answer.csv


## 3. Verify Dataset Files

Make sure your CSV files are in Google Drive at the path shown above.

In [3]:
import os

# Check if files exist in Google Drive
train_path = f"{DATASET_DIR}/sts_train.csv"
eval_path = f"{DATASET_DIR}/sts_eval.csv"
tag_path = f"{DATASET_DIR}/tag_answer.csv"

files_status = {
    "sts_train.csv": os.path.exists(train_path),
    "sts_eval.csv": os.path.exists(eval_path),
    "tag_answer.csv": os.path.exists(tag_path),
}

print("Dataset files status:")
for fname, exists in files_status.items():
    status = "✓ Found" if exists else "✗ Missing"
    print(f"  {status}: {fname}")

if not all(files_status.values()):
    missing = [f for f, exists in files_status.items() if not exists]
    print(f"\n❌ Missing files: {missing}")
    print(f"Please upload them to: {DATASET_DIR}")
    raise FileNotFoundError(f"Missing dataset files in {DATASET_DIR}")
else:
    print(f"\n✓ All files found in {DATASET_DIR}")

Dataset files status:
  ✓ Found: sts_train.csv
  ✓ Found: sts_eval.csv
  ✓ Found: tag_answer.csv

✓ All files found in /content/drive/MyDrive
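The cell above only checks that each file exists. A slightly stricter sketch also reads each CSV header and reports missing columns. The required column names are assumptions inferred from how later cells use the data (`question`/`tag` for the two splits, `tag`/`answer` in `get_answer`). Opening with `utf-8-sig` drops a leading byte-order mark, if present, so it does not end up glued to the first column name.

```python
import csv

# Assumed column requirements, inferred from how later cells read each file
REQUIRED_COLUMNS = {
    "sts_train.csv": {"question", "tag"},
    "sts_eval.csv": {"question", "tag"},
    "tag_answer.csv": {"tag", "answer"},
}

def missing_columns(path, required):
    """Return the required column names that are absent from the CSV header."""
    # utf-8-sig strips a byte-order mark so it cannot corrupt the first header name
    with open(path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    return set(required) - {name.strip() for name in header}
```

In the notebook this would be called as `missing_columns(f"{DATASET_DIR}/{fname}", cols)` for each entry of `REQUIRED_COLUMNS`, raising if any result is non-empty.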


## 4. Load and Analyze Dataset

In [4]:
import pandas as pd
from collections import Counter

# Load CSV files from Google Drive
print("Loading dataset files from Google Drive...")
train_df = pd.read_csv(train_path)
eval_df = pd.read_csv(eval_path)
tag_answer_df = pd.read_csv(tag_path)

print(f"Train samples: {len(train_df)}")
print(f"Eval samples: {len(eval_df)}")
print(f"Unique tags in train: {train_df['tag'].nunique()}")
print(f"Unique tags in eval: {eval_df['tag'].nunique()}")
print(f"Tags with answers: {len(tag_answer_df)}")

# Show sample
print(f"\nSample from training data:")
print(f"  Question: {train_df.iloc[0]['question']}")
print(f"  Tag: {train_df.iloc[0]['tag']}")

Loading dataset files from Google Drive...
Train samples: 78616
Eval samples: 11457
Unique tags in train: 407
Unique tags in eval: 403
Tags with answers: 407

Sample from training data:
  Question: "একাউন্ট লক করা হয়েছে" দেখাচ্ছে, সমাধান কী?
  Tag: account_locked


In [5]:
# Build intent labels from training data
INTENT_TAGS = sorted(train_df['tag'].unique().tolist())
print(f"Total unique intents: {len(INTENT_TAGS)}")

# Create mappings
ID2INTENT = {i: intent for i, intent in enumerate(INTENT_TAGS)}
INTENT2ID = {intent: i for i, intent in enumerate(INTENT_TAGS)}

# Show top 15 tags by frequency
print(f"\nTop 15 tags by frequency:")
tag_counts = train_df['tag'].value_counts()
for tag, count in tag_counts.head(15).items():
    print(f"  {tag}: {count}")

Total unique intents: 407

Top 15 tags by frequency:
  fraction: 494
  permanent_address_change_fees: 381
  spouse_name_correction_new: 231
  parent_spouse_name_correct_or_add_document_new: 229
  parents_name_correction_new: 226
  goodbye: 218
  picture_done_but_lost_or_no_sms_slip: 215
  service_provided: 213
  disability_no_hands_registration_procedure: 206
  abroad_smart_card_collection_return: 206
  reissue_urgent_card_delivery_time: 206
  signature_to_fingerprint_reversal_not_allowed: 206
  reissue_smart_card_download_not_available: 206
  abroad_illegal_resident_nid: 206
  abroad_embassy_walk_in_registration: 206
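`INTENT_TAGS` is built from the training split only, and the two splits report different tag counts (407 vs 403). A quick set comparison shows which tags fall on each side: an eval-only tag maps to `-1` via `INTENT2ID.get(t, -1)` in the metrics cell and can never be predicted correctly, while a train-only tag is never evaluated. A minimal sketch with a hypothetical helper name:

```python
def tag_coverage(train_tags, eval_tags):
    """Compare label sets of the two splits (INTENT_TAGS comes from train only)."""
    train_set, eval_set = set(train_tags), set(eval_tags)
    return {
        "eval_only": sorted(eval_set - train_set),   # can never be predicted correctly
        "train_only": sorted(train_set - eval_set),  # never evaluated
    }
```

Called as `tag_coverage(train_df['tag'], eval_df['tag'])`. Given the counts above, at least four training intents have no eval examples.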


## 5. Configuration

In [6]:
# ============================================================
# CONFIGURATION - Qwen3-4B for Bengali Intent Classification
# ============================================================

# Model - Qwen3-4B with native Bengali support (119 languages)
MODEL_NAME = "Qwen/Qwen3-4B"

# Data
MAX_SEQ_LENGTH = 512  # Max tokens per formatted example; Bengali script usually needs more tokens than English text of similar length

# Training - optimized for 4B model on L4 24GB
NUM_EPOCHS = 3
BATCH_SIZE = 4          # Reduced for larger model
GRAD_ACCUM_STEPS = 16   # Effective batch = 64
LEARNING_RATE = 2e-5    # Lower LR for larger model
WARMUP_RATIO = 0.03

# LoRA - rank 32 with alpha = 2 x rank, applied to every attention and MLP projection
LORA_R = 32
LORA_ALPHA = 64
LORA_DROPOUT = 0.05
LORA_TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

# Seed
SEED = 42

# System prompt for Qwen3 chat format
SYSTEM_PROMPT = "You are a Bengali NID customer service classifier. Given a Bengali customer query, output only the intent tag name. Do not include any explanation."

print(f"Model: {MODEL_NAME}")
print(f"Number of intents: {len(INTENT_TAGS)}")
print(f"Batch size: {BATCH_SIZE} x {GRAD_ACCUM_STEPS} = {BATCH_SIZE * GRAD_ACCUM_STEPS} effective")
print(f"LoRA rank: {LORA_R}")

Model: Qwen/Qwen3-4B
Number of intents: 407
Batch size: 4 x 16 = 64 effective
LoRA rank: 32
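The effective batch of 64 determines the number of optimizer steps. Plain floor division (78,616 // 64) gives 1,228 steps per epoch, but the Trainer used here also takes a step on the final, partial accumulation window of each epoch: the run's last checkpoint is `checkpoint-3687`, i.e. 1,229 x 3. A sketch of that arithmetic, assuming the dataloader keeps its last short batch (the default) and a single GPU unless `num_devices` says otherwise:

```python
import math

def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Return (steps_per_epoch, total_steps).

    Assumes the dataloader keeps the last short batch and that a final,
    partial accumulation window still produces one optimizer step.
    """
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch, steps_per_epoch * epochs

print(optimizer_steps(78616, 4, 16, 3))  # (1229, 3687)
print(78616 // (4 * 16))                 # 1228, the floor-division estimate
```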


## 6. Prepare Dataset for SFT

In [7]:
from datasets import Dataset
from transformers import AutoTokenizer

# Load tokenizer first (needed for chat template)
print(f"Loading tokenizer: {MODEL_NAME}")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

def format_for_sft(row):
    """
    Format example for generative SFT using Qwen3 chat template.

    Input: Bengali question + tag
    Output: Formatted chat conversation
    """
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": row['question']},
        {"role": "assistant", "content": row['tag']}
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    return {'text': text, 'intent': row['tag']}

# Convert DataFrames to formatted datasets
print("Formatting training data with Qwen3 chat template...")
train_formatted = [format_for_sft(row) for _, row in train_df.iterrows()]
train_dataset = Dataset.from_list(train_formatted)

print("Formatting evaluation data...")
eval_formatted = [format_for_sft(row) for _, row in eval_df.iterrows()]
eval_dataset = Dataset.from_list(eval_formatted)

print(f"\nTrain dataset: {len(train_dataset)} samples")
print(f"Eval dataset: {len(eval_dataset)} samples")

# Show formatted sample
print(f"\nFormatted sample (Qwen3 chat format):")
print(train_dataset[0]['text'][:500] + "..." if len(train_dataset[0]['text']) > 500 else train_dataset[0]['text'])

Loading tokenizer: Qwen/Qwen3-4B


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Formatting training data with Qwen3 chat template...
Formatting evaluation data...

Train dataset: 78616 samples
Eval dataset: 11457 samples

Formatted sample (Qwen3 chat format):
<|im_start|>system
You are a Bengali NID customer service classifier. Given a Bengali customer query, output only the intent tag name. Do not include any explanation.<|im_end|>
<|im_start|>user
"একাউন্ট লক করা হয়েছে" দেখাচ্ছে, সমাধান কী?<|im_end|>
<|im_start|>assistant
<think>

</think>

account_locked<|im_end|>
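Every formatted example ends with the intent tag, so right-truncation at `MAX_SEQ_LENGTH` would cut off exactly the tokens the model is supposed to learn. A small audit, written against a generic `count_tokens` callable (in the notebook that could be `lambda t: len(tokenizer(t)["input_ids"])`); the whitespace counter below is only a stand-in so the sketch runs without a tokenizer:

```python
def truncation_report(texts, count_tokens, max_length):
    """Count examples whose token length exceeds max_length.

    count_tokens: any callable mapping a string to its token count.
    """
    lengths = [count_tokens(t) for t in texts]
    too_long = sum(1 for n in lengths if n > max_length)
    return {
        "max_len": max(lengths, default=0),
        "num_truncated": too_long,
        "frac_truncated": too_long / len(lengths) if lengths else 0.0,
    }

def whitespace_count(text):
    """Stand-in counter for illustration only; use the real tokenizer in practice."""
    return len(text.split())
```

For example, `truncation_report(train_dataset["text"], lambda t: len(tokenizer(t)["input_ids"]), MAX_SEQ_LENGTH)` would report whether any training example is affected.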



## 7. Load Model and Apply LoRA

In [8]:
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Load model with flash attention for memory efficiency
print(f"Loading model: {MODEL_NAME}")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

# Enable gradient checkpointing for memory efficiency
model.gradient_checkpointing_enable()

print(f"Model loaded with Flash Attention 2")
print(f"Parameters: {model.num_parameters():,}")

Loading model: Qwen/Qwen3-4B


`torch_dtype` is deprecated! Use `dtype` instead!

Model loaded with Flash Attention 2
Parameters: 4,022,468,096


In [9]:
# Configure LoRA with the rank, alpha, and target modules set in the configuration cell
print("Configuring LoRA...")
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=LORA_TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

Configuring LoRA...
trainable params: 66,060,288 || all params: 4,088,528,384 || trainable%: 1.6157
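The trainable-parameter count printed above can be reproduced by hand. LoRA wraps each target linear layer of shape d_in to d_out with two low-rank factors (r x d_in and d_out x r), adding r * (d_in + d_out) parameters per layer. The dimensions below are assumptions taken from the Qwen3-4B config (hidden size 2560, MLP size 9728, 36 layers, 32 query heads and 8 key/value heads of size 128); they reproduce the 66,060,288 figure exactly.

```python
# Assumed Qwen3-4B dimensions (from its published config)
HIDDEN = 2560
INTERMEDIATE = 9728
NUM_LAYERS = 36
NUM_HEADS, NUM_KV_HEADS, HEAD_DIM = 32, 8, 128

def lora_trainable_params(r):
    """Sum r * (d_in + d_out) over the seven wrapped projections in every layer."""
    q_out = NUM_HEADS * HEAD_DIM      # 4096
    kv_out = NUM_KV_HEADS * HEAD_DIM  # 1024 (grouped-query attention)
    shapes = {  # (d_in, d_out) per target module
        "q_proj": (HIDDEN, q_out),
        "k_proj": (HIDDEN, kv_out),
        "v_proj": (HIDDEN, kv_out),
        "o_proj": (q_out, HIDDEN),
        "gate_proj": (HIDDEN, INTERMEDIATE),
        "up_proj": (HIDDEN, INTERMEDIATE),
        "down_proj": (INTERMEDIATE, HIDDEN),
    }
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * NUM_LAYERS

print(lora_trainable_params(32))  # 66060288
```

The count scales linearly with `r`; `lora_alpha` and `lora_dropout` change how the adapter behaves, not how many parameters it has.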


## 8. Train

In [10]:
from trl import SFTTrainer, SFTConfig
from transformers import set_seed

# Set seed
set_seed(SEED)

# Configure training - optimized for Qwen3-4B on L4 24GB
# NOTE: use max_length (not max_seq_length); SFTConfig renamed this argument in newer TRL releases
training_args = SFTConfig(
    output_dir=OUTPUT_DIR,

    # Training schedule
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRAD_ACCUM_STEPS,

    # Optimizer
    learning_rate=LEARNING_RATE,
    weight_decay=0.01,
    warmup_ratio=WARMUP_RATIO,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",

    # Mixed precision
    bf16=True,

    # Logging & saving
    logging_steps=25,
    eval_strategy="steps",
    eval_steps=250,
    save_strategy="steps",
    save_steps=500,
    save_total_limit=2,

    # Data - use max_length (NOT max_seq_length)
    max_length=MAX_SEQ_LENGTH,
    dataset_text_field="text",
    packing=False,

    # Memory optimization
    gradient_checkpointing=True,

    # Other
    seed=SEED,
    report_to="none",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)

# Initialize trainer
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=training_args,
)

# Calculate training steps. The last, partial gradient-accumulation window of
# each epoch still triggers an optimizer step, so round up at both levels.
import math
batches_per_epoch = math.ceil(len(train_dataset) / BATCH_SIZE)
steps_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUM_STEPS)
total_steps = steps_per_epoch * NUM_EPOCHS

print("Trainer initialized. Ready to train!")
print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total training steps: {total_steps}")
print("Estimated time on L4: ~12-13 hours")

The model is already on multiple devices. Skipping the move to device specified in `args`.


Trainer initialized. Ready to train!
Steps per epoch: 1229
Total training steps: 3687
Estimated time on L4: ~12-13 hours


In [11]:
# Train!
print("Starting training with Qwen3-4B...")
print(f"Dataset: ~78k Bengali NID queries, 407 intents")
print(f"This will take approximately 12-13 hours on L4 GPU.")
print("-" * 50)

trainer.train()

print("-" * 50)
print("Training complete!")

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.


Starting training with Qwen3-4B...
Dataset: ~78k Bengali NID queries, 407 intents
This will take approximately 12-13 hours on L4 GPU.
--------------------------------------------------


| Step | Training Loss | Validation Loss | Entropy | Num Tokens | Mean Token Accuracy |
|------|---------------|-----------------|---------|------------|---------------------|
| 250 | 0.5465 | 0.531768 | 0.572754 | 1878929 | 0.849627 |
| 500 | 0.3906 | 0.392649 | 0.397233 | 3756277 | 0.880999 |
| 750 | 0.3586 | 0.369788 | 0.376253 | 5634013 | 0.88684 |
| 1000 | 0.3431 | 0.356202 | 0.352035 | 7510359 | 0.890518 |
| 1250 | 0.3288 | 0.34792 | 0.345215 | 9382824 | 0.892943 |
| 1500 | 0.3199 | 0.341629 | 0.327387 | 11258614 | 0.894703 |
| 1750 | 0.3193 | 0.337402 | 0.315699 | 13136474 | 0.895749 |
| 2000 | 0.3069 | 0.332339 | 0.314857 | 15015314 | 0.897313 |
| 2250 | 0.3063 | 0.329154 | 0.32242 | 16893714 | 0.897692 |
| 2500 | 0.2905 | 0.328654 | 0.301022 | 18766197 | 0.898324 |


--------------------------------------------------
Training complete!
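With `dataset_text_field="text"`, each row is treated as plain language-modeling text, so the training loss and the Mean Token Accuracy column above are averaged over every token, including the fixed system prompt and the Bengali question, not only the intent tag. TRL's `SFTTrainer` also accepts a prompt/completion dataset, and recent releases then compute the loss on the completion tokens only (see the `completion_only_loss` option of `SFTConfig`; check the behavior of your installed version). The sketch below is not what this notebook ran: it converts the already formatted `text` into that shape by splitting at the final assistant header of the Qwen3 chat template.

```python
ASSISTANT_HEADER = "<|im_start|>assistant\n"

def split_prompt_completion(text):
    """Split a formatted chat string at the last assistant header.

    The prompt keeps the header (matching add_generation_prompt=True at
    inference); the completion is what the model should learn to emit,
    here the empty <think> block plus the intent tag.
    """
    idx = text.rfind(ASSISTANT_HEADER)
    if idx == -1:
        raise ValueError("no assistant turn found")
    cut = idx + len(ASSISTANT_HEADER)
    return {"prompt": text[:cut], "completion": text[cut:]}
```

Applied to the datasets, this would look roughly like `train_dataset.map(lambda ex: split_prompt_completion(ex["text"]), remove_columns=["text", "intent"])`, with `dataset_text_field` dropped from the config.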


## 9. Save Model

In [12]:
# Save model to Google Drive
print(f"Saving model to {OUTPUT_DIR}...")
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

# Also save intent mappings
import json
with open(f"{OUTPUT_DIR}/intent_mappings.json", "w", encoding="utf-8") as f:
    json.dump({"id2intent": ID2INTENT, "intent2id": INTENT2ID}, f, ensure_ascii=False, indent=2)

print(f"Model saved successfully!")
print(f"Files in {OUTPUT_DIR}:")
!ls -la {OUTPUT_DIR}

Saving model to /content/drive/MyDrive/models/smollm2-bengali-nid-intent...
Model saved successfully!
Files in /content/drive/MyDrive/models/smollm2-bengali-nid-intent:
total 273683
-rw------- 1 root root      1044 Dec 20 21:20 adapter_config.json
-rw------- 1 root root 264308896 Dec 20 21:20 adapter_model.safetensors
-rw------- 1 root root       707 Dec 20 21:20 added_tokens.json
-rw------- 1 root root      4168 Dec 20 21:20 chat_template.jinja
drwx------ 2 root root      4096 Dec 20 20:48 checkpoint-3500
drwx------ 2 root root      4096 Dec 20 21:19 checkpoint-3687
-rw------- 1 root root     38936 Dec 20 21:20 intent_mappings.json
-rw------- 1 root root   1671853 Dec 20 21:20 merges.txt
-rw------- 1 root root      1507 Dec 20 21:19 README.md
-rw------- 1 root root       613 Dec 20 21:20 special_tokens_map.json
-rw------- 1 root root      5404 Dec 20 21:20 tokenizer_config.json
-rw------- 1 root root  11422654 Dec 20 21:20 tokenizer.json
-rw------- 1 root root      6289 Dec 20 21:20 t
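`json.dump` converts integer dictionary keys to strings, so after reloading `intent_mappings.json`, `id2intent` is keyed by `"0"`, `"1"`, ... rather than `0`, `1`, .... A small sketch of the round trip and a hypothetical `load_id2intent` helper that restores integer keys:

```python
import json

# Two real tags from this dataset, standing in for the full ID2INTENT mapping
id2intent = {0: "account_locked", 1: "goodbye"}

# Round-trip through JSON the same way intent_mappings.json is written and read
reloaded = json.loads(json.dumps({"id2intent": id2intent}))["id2intent"]
print(reloaded)  # {'0': 'account_locked', '1': 'goodbye'}

def load_id2intent(raw):
    """Hypothetical helper: convert JSON string keys back to integer IDs."""
    return {int(k): v for k, v in raw.items()}

assert load_id2intent(reloaded) == id2intent
```

When reading the saved file, the same helper would wrap `json.load(f)["id2intent"]`; `intent2id` needs no conversion because its keys were strings to begin with.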

## 10. Evaluate

In [None]:
from tqdm.notebook import tqdm
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from collections import Counter

def extract_intent(response, intent_tags):
    """Extract intent from model response, handling Qwen3 thinking tokens."""
    response = response.strip()

    # Remove thinking tokens if present (Qwen3 may include <think>...</think>)
    if "</think>" in response:
        response = response.split("</think>")[-1].strip()

    # Try exact match first (case-insensitive)
    response_lower = response.lower()
    for intent in intent_tags:
        if intent.lower() == response_lower:
            return intent

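    # Fall back to a substring match, trying longer tags first so a short tag
    # that happens to be contained in a longer one cannot shadow it
    for intent in sorted(intent_tags, key=len, reverse=True):
        if intent.lower() in response_lower:
            return intent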

    return None

def evaluate_model(model, tokenizer, eval_df, num_samples=None):
    """Evaluate model on test set using Qwen3 chat format."""
    model.eval()

    if num_samples:
        eval_df = eval_df.head(num_samples)

    predictions = []
    true_labels = []

    for _, row in tqdm(eval_df.iterrows(), total=len(eval_df), desc="Evaluating"):
        question = row['question']
        true_intent = row['tag']

        # Use Qwen3 chat template
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ]
        prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

        # Tokenize
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=MAX_SEQ_LENGTH)
        inputs = {k: v.to(model.device) for k, v in inputs.items()}

        # Generate
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=32,
                do_sample=False,
                pad_token_id=tokenizer.pad_token_id,
            )

        # Decode
        response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        predicted_intent = extract_intent(response, INTENT_TAGS)

        predictions.append(predicted_intent)
        true_labels.append(true_intent)

    return predictions, true_labels

# Evaluate on a subset first: the full ~11k-sample eval runs one generate() call per example and is slow
print("Evaluating on 2000 samples (for speed)...")
print("For full evaluation, change num_samples=None")
predictions, true_labels = evaluate_model(model, tokenizer, eval_df, num_samples=2000)

Evaluating on 2000 samples (for speed)...
For full evaluation, change num_samples=None



In [14]:
# Compute metrics
valid_mask = [p is not None for p in predictions]
valid_preds = [INTENT2ID.get(p, -1) for p in predictions]
valid_true = [INTENT2ID.get(t, -1) for t in true_labels]

# Filter valid
filtered_preds = [p for p, m in zip(valid_preds, valid_mask) if m and p != -1]
filtered_true = [t for t, m, p in zip(valid_true, valid_mask, valid_preds) if m and p != -1]

# Calculate metrics
if len(filtered_preds) > 0:
    accuracy = accuracy_score(filtered_true, filtered_preds)
    precision, recall, f1, _ = precision_recall_fscore_support(
        filtered_true, filtered_preds, average="weighted", zero_division=0
    )
else:
    accuracy = precision = recall = f1 = 0.0

print("=" * 50)
print("EVALUATION RESULTS")
print("=" * 50)
print(f"Total samples: {len(predictions)}")
print(f"Valid predictions: {sum(valid_mask)} ({100*sum(valid_mask)/len(predictions):.1f}%)")
print(f"")
print(f"Accuracy:  {accuracy:.4f} ({accuracy*100:.2f}%)")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1 Score:  {f1:.4f}")
print("=" * 50)

# Show top confusions
print("\nTop 10 Confusions:")
confusions = [(t, p) for p, t in zip(predictions, true_labels) if p != t and p is not None]
for (true, pred), count in Counter(confusions).most_common(10):
    print(f"  {true} -> {pred}: {count}")

EVALUATION RESULTS
Total samples: 2000
Valid predictions: 1997 (99.8%)

Accuracy:  0.9224 (92.24%)
Precision: 0.9548
Recall:    0.9224
F1 Score:  0.9339

Top 10 Confusions:
  request_for_clear_pronunciation -> repeat_again: 11
  smart_card_fee -> smart_card_collection_fees: 7
  new_voter_online_what_documents_to_submit -> new_voter_apply_online_what_documents_needed: 5
  new_voter_apply_online_what_documents_needed -> new_voter_online_what_documents_to_submit: 4
  nid_information_misentry_correction_procedure -> card_information_correction: 3
  closing_statement -> request_for_feedback_or_suggestions: 3
  correction_application_status -> correction_application_current_status: 3
  card_correction_request_failed_query_regarding_refund_fees -> overpayment_refund_not_possible: 3
  greeting_and_request_for_question -> delayed_question_prompt: 3
  new_voter_fees -> new_registration_fee_not_required: 3
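The accuracy printed above divides by the number of parsed predictions (1,997 here), so the three outputs that did not map to a tag are left out rather than counted as errors. A stdlib sketch that reports both versions from the `predictions` and `true_labels` lists returned by `evaluate_model`; with only 3 unparsed outputs out of 2,000 the gap is small in this run, but it would grow if a checkpoint started emitting malformed tags.

```python
def accuracy_report(predictions, true_labels):
    """Accuracy two ways: over parsed predictions only, and over all samples
    (unparsed predictions, i.e. None, counted as wrong)."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels differ in length")
    parsed = [(p, t) for p, t in zip(predictions, true_labels) if p is not None]
    correct = sum(p == t for p, t in parsed)
    return {
        "parsed_only": correct / len(parsed) if parsed else 0.0,
        "strict": correct / len(true_labels) if true_labels else 0.0,
    }
```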


## 11. Interactive Inference

In [15]:
def classify_intent(query, model, tokenizer):
    """Classify intent for a single Bengali query using Qwen3 chat format."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=MAX_SEQ_LENGTH)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=32,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
        )

    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    intent = extract_intent(response, INTENT_TAGS)

    return intent, response

# Get answer for intent from tag_answer_df
def get_answer(intent):
    """Get Bengali answer for an intent."""
    row = tag_answer_df[tag_answer_df['tag'] == intent]
    if len(row) > 0:
        return row.iloc[0]['answer']
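    return "উত্তর পাওয়া যায়নি।"  # "Answer not found."

# Test with sample Bengali queries (English glosses in comments)
test_queries = [
    "আমার এনআইডি একাউন্ট লক হয়ে গেছে, কিভাবে আনলক করবো?",  # My NID account got locked, how do I unlock it?
    "কার্ড হারিয়ে গেলে কি করতে হবে?",  # What should I do if I lose my card?
    "জাতীয় পরিচয়পত্রে নাম সংশোধন করতে চাই",  # I want to correct the name on my national ID card
    "ভোটার আইডি কার্ডের ঠিকানা পরিবর্তন করতে কি কি লাগবে?",  # What is needed to change the address on a voter ID card?
    "স্মার্ট কার্ড কবে পাবো?",  # When will I get my smart card?
]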

print("Testing with Bengali queries (Qwen3-4B):")
print("=" * 70)
for query in test_queries:
    intent, raw = classify_intent(query, model, tokenizer)
    answer = get_answer(intent) if intent else "Intent not recognized"
    print(f"Query: {query}")
    print(f"Intent: {intent}")
    print(f"Raw response: {raw}")
    print(f"Answer: {answer[:100]}..." if len(answer) > 100 else f"Answer: {answer}")
    print("-" * 70)

Testing with Bengali queries (Qwen3-4B):
Query: আমার এনআইডি একাউন্ট লক হয়ে গেছে, কিভাবে আনলক করবো?
Intent: account_locked
Raw response: <think>

</think>

account_locked
Answer: ভুল তথ্য দিয়ে রেজিস্ট্রেশন বা লগইনের চেষ্টা করা হলে স্বয়ংক্রিয়ভাবে একাউন্টটি লক হয়ে যায়। অনুগ্রহপূর্...
----------------------------------------------------------------------
Query: কার্ড হারিয়ে গেলে কি করতে হবে?
Intent: voter_slip_lost_how_to_get_nid
Raw response: <think>

</think>

voter_slip_lost_how_to_get_nid
Answer: সংশ্লিষ্ট উপজেলা/থানা নির্বাচন অফিস গিয়ে এনআইডি নম্বর সংগ্রহ করে অনলাইনে রি-ইস্যু আবেদন করতে হবে। যদ...
----------------------------------------------------------------------
Query: জাতীয় পরিচয়পত্রে নাম সংশোধন করতে চাই
Intent: nid_information_misentry_correction_procedure
Raw response: <think>

</think>

nid_information_misentry_correction_procedure
Answer: সংশ্লিষ্ট থানা/উপজেলা নির্বাচন অফিসে যোগাযোগ করে ডেমোগ্রাফিক তথ্য সংশোধনের স্বপক্ষে পর্যাপ্ত দলিলাদি...
------------------------------
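`get_answer` filters the whole `tag_answer_df` on every call. For a loop over many queries, a dict built once is simpler and faster. A sketch with hypothetical helper names (`build_answer_lookup`, `get_answer_fast`) that keeps the same first-match behavior and the same Bengali fallback ("Answer not found."):

```python
FALLBACK_ANSWER = "উত্তর পাওয়া যায়নি।"  # "Answer not found."

def build_answer_lookup(tags, answers):
    """Map each tag to its first answer, mirroring get_answer()'s iloc[0]."""
    lookup = {}
    for tag, answer in zip(tags, answers):
        lookup.setdefault(tag, answer)
    return lookup

def get_answer_fast(intent, lookup):
    """Return the stored answer, or the fallback for unknown/None intents."""
    return lookup.get(intent, FALLBACK_ANSWER)
```

In the notebook the lookup would be built once with `build_answer_lookup(tag_answer_df['tag'], tag_answer_df['answer'])`.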

## 12. Push to HuggingFace Hub

In [19]:
from huggingface_hub import login
login()


In [20]:
# Push model to Hub
HF_REPO_NAME = "ehzawad/qwen3-4b-bengali-nid-intent"

print(f"Pushing model to HuggingFace Hub: {HF_REPO_NAME}")

model.push_to_hub(HF_REPO_NAME)
tokenizer.push_to_hub(HF_REPO_NAME)

print(f"\nModel uploaded successfully!")
print(f"View at: https://huggingface.co/{HF_REPO_NAME}")

Pushing model to HuggingFace Hub: ehzawad/qwen3-4b-bengali-nid-intent



Model uploaded successfully!
View at: https://huggingface.co/ehzawad/qwen3-4b-bengali-nid-intent


In [21]:
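import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base weights in bf16, as during training; without a dtype argument,
# transformers may load them in float32 (roughly twice the memory)
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base_model, "ehzawad/qwen3-4b-bengali-nid-intent")
tokenizer = AutoTokenizer.from_pretrained("ehzawad/qwen3-4b-bengali-nid-intent")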


## Done!

Your Bengali NID intent classification model (Qwen3-4B) is now:
- Saved to Google Drive at: `/content/drive/MyDrive/models/smollm2-bengali-nid-intent`
- Pushed to HuggingFace Hub

To load the model later:
```python
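import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base weights in bf16, as during training; without a dtype argument,
# transformers may load them in float32 (roughly twice the memory)
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base_model, "ehzawad/qwen3-4b-bengali-nid-intent")
tokenizer = AutoTokenizer.from_pretrained("ehzawad/qwen3-4b-bengali-nid-intent")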
```