<a href="https://colab.research.google.com/github/normalnon/Advanced-Data-Filtering/blob/main/Workshop2_PPL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Get your Hugging Face Token

If you don't already have one, create a **User Access Token** on the Hugging Face website. Make sure it has at least 'read' permissions.

- Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
- Click on 'New token'
- Give it a name (e.g., `Colab_HF_Token`)
- Set the Role to 'read' (or 'write' if you plan to push models)
- Click 'Generate a token' and copy the token.

## Add the token to Colab Secrets

To securely store your token and avoid hardcoding it in your notebook:

- Click on the '🔑' (key) icon in the left panel of Colab.
- Click on 'Add new secret'.
- For the 'Name' field, type `HF_TOKEN` (the name Hugging Face recommends).
- For the 'Value' field, paste your Hugging Face token.
- Ensure 'Notebook access' is toggled on for this notebook.


## Log in to Hugging Face programmatically

Now, run the following code cell to log in using the token stored in your Colab secrets. This will authenticate your session and allow you to access gated models.

In [None]:
# Install huggingface_hub if not already installed
!pip install -q huggingface_hub

from huggingface_hub import login
from google.colab import userdata

# Retrieve the token from Colab secrets (the secret must be named HF_TOKEN, as set up above)
hf_token = userdata.get('HF_TOKEN')

# Log in to Hugging Face
if hf_token:
    login(token=hf_token)
    print("Successfully logged in to Hugging Face!")
else:
    print("Hugging Face token not found in Colab secrets. Please add it as 'HF_TOKEN'.")


Successfully logged in to Hugging Face!
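
Outside Colab, `google.colab` cannot be imported, so the cell above fails before it reaches `login`. Below is a minimal sketch of a fallback, assuming the token may instead be exported as an environment variable named `HF_TOKEN`. The helper name `resolve_hf_token` is hypothetical and not part of `huggingface_hub`.

```python
import os


def resolve_hf_token(secret_name: str = "HF_TOKEN"):
    """Return a token from Colab secrets if possible, else from the environment.

    Hypothetical helper for illustration; returns None when nothing is found.
    """
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        return os.environ.get(secret_name)

    try:
        return userdata.get(secret_name)
    except Exception:
        # Assumption: Colab raises when the secret is missing or when
        # notebook access is switched off; fall back to the environment.
        return os.environ.get(secret_name)


token = resolve_hf_token()
print("Token found." if token else "No token found; public models remain accessible.")
```

The returned value can be passed to `login(token=...)` exactly as in the cell above; a `None` result means no token was found in either place.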


## Install Libraries

In [None]:
# Install the libraries needed to run the model and handle the data
!pip install transformers torch pandas accelerate --quiet

print("Installation complete.")

Installation complete.


## Imports & Setup


In [None]:
import math
import torch
import pandas as pd
import numpy as np
from typing import List, Tuple
from transformers import AutoTokenizer, AutoModelForCausalLM

# Select the model
MODEL_NAME = "Qwen/Qwen3-0.6B"

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
print(f"Selected Model: {MODEL_NAME}")

Using device: cuda
Selected Model: Qwen/Qwen3-0.6B


## Create Mock-up Data

In [None]:
# Prepare Mock-up Data
def get_mock_data():
    data = {
        "Group": [],
        "Text": []
    }

    # 1. Natural-language group (should get a low, i.e. good, PPL)
    normal_texts = [
        "กรุงเทพมหานครเป็นเมืองหลวงของประเทศไทย",  # "Bangkok is the capital of Thailand"
        "แมวชอบนอนอาบแดดในตอนเช้าเพราะอากาศอบอุ่น",  # "Cats like to sunbathe in the morning because the weather is warm"
        "การเรียนรู้เครื่องจักรเป็นสาขาหนึ่งของปัญญาประดิษฐ์"  # "Machine learning is a branch of artificial intelligence"
    ]
    for t in normal_texts:
        data["Group"].append("1. Normal (Natural)")
        data["Text"].append(t)

    # 2. Scrambled-word / bad-grammar group (should get a higher PPL)
    scrambled_texts = [
        "ของเมืองหลวงเป็นไทยกรุงเทพมหานครประเทศ",  # words of the first natural sentence, shuffled
        "เช้าแดดอาบชอบแมวนอนในอบอุ่นอากาศเพราะ",  # words of the second natural sentence, shuffled
        "สีเขียวนอนหลับอย่างรวดเร็วบนเพดานความคิด" # grammatical but semantically wrong ("green sleeps quickly on the ceiling of thought")
    ]
    for t in scrambled_texts:
        data["Group"].append("2. Scrambled/Weird")
        data["Text"].append(t)

    # 3. Mixed-language / code-switching group (PPL depends on how capable the model is)
    mixed_texts = [
        "วันนี้ Coding สนุกมาก Error กระจาย 555+",
        "Concept ของ Perplexity คือวัดความ Surprise ของ Model",
    ]
    for t in mixed_texts:
        data["Group"].append("3. Code-Switching")
        data["Text"].append(t)

    # 4. Gibberish / random group (PPL should be the highest)
    gibberish_texts = [
        "กขคง จฉชซ 1234 abcd !@#$",
        "sfdk jljlaksjd 9999 xxxxx"
    ]
    for t in gibberish_texts:
        data["Group"].append("4. Gibberish")
        data["Text"].append(t)

    return pd.DataFrame(data)

# Build and display the sample data
df_data = get_mock_data()
display(df_data)

| | Group | Text |
|---|---|---|
| 0 | 1. Normal (Natural) | กรุงเทพมหานครเป็นเมืองหลวงของประเทศไทย |
| 1 | 1. Normal (Natural) | แมวชอบนอนอาบแดดในตอนเช้าเพราะอากาศอบอุ่น |
| 2 | 1. Normal (Natural) | การเรียนรู้เครื่องจักรเป็นสาขาหนึ่งของปัญญาประ... |
| 3 | 2. Scrambled/Weird | ของเมืองหลวงเป็นไทยกรุงเทพมหานครประเทศ |
| 4 | 2. Scrambled/Weird | เช้าแดดอาบชอบแมวนอนในอบอุ่นอากาศเพราะ |
| 5 | 2. Scrambled/Weird | สีเขียวนอนหลับอย่างรวดเร็วบนเพดานความคิด |
| 6 | 3. Code-Switching | วันนี้ Coding สนุกมาก Error กระจาย 555+ |
| 7 | 3. Code-Switching | Concept ของ Perplexity คือวัดความ Surprise ของ... |
| 8 | 4. Gibberish | กขคง จฉชซ 1234 abcd !@#$ |
| 9 | 4. Gibberish | sfdk jljlaksjd 9999 xxxxx |


## Define Calculation Functions

In [None]:
# Define Calculation Functions

def mask_special_tokens(labels: torch.Tensor, special_ids: set[int]) -> torch.Tensor:
    """Set the labels of special tokens to -100 so they are excluded from the loss."""
    masked = labels.clone()
    for tid in special_ids:
        masked[masked == tid] = -100
    return masked

@torch.no_grad()
def calculate_ppl_by_sentence(model, tokenizer, texts: List[str]):
    """
    Compute PPL sentence by sentence and return the totals needed for the micro/macro averages.
    """
    model.eval()
    results = []

    # Collect the IDs of special tokens that should not count toward the loss (e.g. padding)
    special_ids = {
        tokenizer.eos_token_id,
        tokenizer.bos_token_id,
        tokenizer.pad_token_id
    }
    special_ids = {tid for tid in special_ids if tid is not None}

    total_nll_sum = 0.0      # running sum of NLL over all sentences (used for micro PPL)
    total_token_count = 0    # running count of tokens over all sentences (used for micro PPL)

    for text in texts:
        # 1. Tokenize
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024).to(device)

        # 2. Prepare Labels (Mask special tokens)
        labels = inputs.input_ids.clone()
        labels = mask_special_tokens(labels, special_ids)

        # 3. Forward Pass
        outputs = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, labels=labels)
        loss = outputs.loss.item() # mean NLL per scored token for this sentence

        # 4. Count valid tokens
        # Note: the model's loss is averaged over shifted targets, so the first
        # token is never scored; this count still includes it.
        valid_tokens = (labels != -100).sum().item()

        # 5. Calculate metrics for this sentence
        sentence_nll_sum = loss * valid_tokens # convert the mean back to (approximately) the sentence's NLL sum
        sentence_ppl = math.exp(loss)          # PPL = exp(NLL_mean)

        # Accumulate the running totals
        total_nll_sum += sentence_nll_sum
        total_token_count += valid_tokens

        results.append({
            "Token_Count": valid_tokens,
            "NLL_Sum": sentence_nll_sum,
            "Perplexity": sentence_ppl
        })

    return results, total_nll_sum, total_token_count
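
The relationship between the values stored per sentence can be checked by hand. The sketch below is plain Python and independent of the model. `token_probs` is a hypothetical list of probabilities assigned to each scored target token, and `sentence_stats` is an illustrative helper, not a `transformers` function. Keep in mind that a causal LM has no prediction for the first input token, so an N-token input contributes at most N-1 scored targets.

```python
import math


def sentence_stats(token_probs):
    """Derive NLL sum, mean NLL and perplexity from per-target probabilities.

    token_probs: hypothetical probabilities the model gave to each scored token.
    """
    nll_sum = -sum(math.log(p) for p in token_probs)  # total negative log-likelihood
    n_scored = len(token_probs)                       # number of scored targets
    mean_nll = nll_sum / n_scored                     # what a mean-reduced loss reports
    return {
        "Token_Count": n_scored,
        "NLL_Sum": nll_sum,
        "Mean_NLL": mean_nll,
        "Perplexity": math.exp(mean_nll),             # PPL = exp(mean NLL)
    }


stats = sentence_stats([0.5, 0.25, 0.125])
print(stats)
```

With probabilities 1/2, 1/4 and 1/8 the NLL sum is 6 ln 2 and the mean is 2 ln 2, so the perplexity is exp(2 ln 2) = 4. That is the geometric mean of the inverse probabilities 2, 4 and 8.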

## Load Model & Calculation

In [None]:
# Load Model & Run Calculation
print(f"Loading {MODEL_NAME}...")

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",
    device_map="auto"
)

print("Calculating Perplexity...")

sentence_results, total_nll, total_tokens = calculate_ppl_by_sentence(
    model,
    tokenizer,
    df_data["Text"].tolist()
)

# Merge the results with the original DataFrame
df_results = pd.concat([df_data, pd.DataFrame(sentence_results)], axis=1)

print("Calculation Done!")

Loading Qwen/Qwen3-0.6B...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!


Calculating Perplexity...
Calculation Done!


## Result Analysis

In [None]:
# Result Analysis

# --- 1. Calculate Micro vs Macro ---

# Micro: treat all sentences as one concatenated block (weighted by length)
# Formula: exp( total NLL / total token count )
micro_ppl = math.exp(total_nll / total_tokens)

# Macro: treat every sentence as equally important (average of PPLs)
# Formula: Average( PPL of each sentence )
macro_ppl = df_results["Perplexity"].mean()

print("="*40)
print(f"📊 Overall Results ({MODEL_NAME})")
print("="*40)
print(f"👉 Micro Average PPL:  {micro_ppl:.4f}")
print(f"👉 Macro Average PPL:  {macro_ppl:.4f}")
print("="*40)

# --- 2. Group Analysis ---
print("\n🔍 Average Perplexity by Group (Lower is Better):")
group_summary = df_results.groupby("Group")["Perplexity"].mean().sort_values()
display(group_summary.to_frame())

print("\n📋 Detailed Sentence Report:")
pd.set_option('display.max_colwidth', 60)
pd.set_option('display.float_format', '{:.2f}'.format)
display(df_results[["Group", "Text", "Token_Count", "Perplexity"]])

📊 Overall Results (Qwen/Qwen3-0.6B)
👉 Micro Average PPL:  73.0944
👉 Macro Average PPL:  204.1737

🔍 Average Perplexity by Group (Lower is Better):


| Group | Perplexity |
|---|---|
| 1. Normal (Natural) | 17.200837 |
| 2. Scrambled/Weird | 103.028627 |
| 3. Code-Switching | 257.745277 |
| 4. Gibberish | 582.779046 |

📋 Detailed Sentence Report:

| | Group | Text | Token_Count | Perplexity |
|---|---|---|---|---|
| 0 | 1. Normal (Natural) | กรุงเทพมหานครเป็นเมืองหลวงของประเทศไทย | 14 | 13.90 |
| 1 | 1. Normal (Natural) | แมวชอบนอนอาบแดดในตอนเช้าเพราะอากาศอบอุ่น | 18 | 32.14 |
| 2 | 1. Normal (Natural) | การเรียนรู้เครื่องจักรเป็นสาขาหนึ่งของปัญญาประดิษฐ์ | 28 | 5.57 |
| 3 | 2. Scrambled/Weird | ของเมืองหลวงเป็นไทยกรุงเทพมหานครประเทศ | 15 | 35.56 |
| 4 | 2. Scrambled/Weird | เช้าแดดอาบชอบแมวนอนในอบอุ่นอากาศเพราะ | 17 | 217.50 |
| 5 | 2. Scrambled/Weird | สีเขียวนอนหลับอย่างรวดเร็วบนเพดานความคิด | 21 | 56.03 |
| 6 | 3. Code-Switching | วันนี้ Coding สนุกมาก Error กระจาย 555+ | 18 | 251.39 |
| 7 | 3. Code-Switching | Concept ของ Perplexity คือวัดความ Surprise ของ Model | 16 | 264.10 |
| 8 | 4. Gibberish | กขคง จฉชซ 1234 abcd !@#$ | 18 | 479.15 |
| 9 | 4. Gibberish | sfdk jljlaksjd 9999 xxxxx | 14 | 686.41 |
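
Both overall figures can be reproduced from the detailed report alone. The function stores each sentence's NLL sum as mean NLL times token count, and the mean NLL is the log of that sentence's perplexity. The sketch below is plain Python and uses the rounded `Token_Count` and `Perplexity` values from the report, so it matches the printed results only approximately. The variable names are illustrative.

```python
import math

# (Token_Count, Perplexity) pairs copied from the detailed sentence report.
rows = [
    (14, 13.90), (18, 32.14), (28, 5.57),      # 1. Normal (Natural)
    (15, 35.56), (17, 217.50), (21, 56.03),    # 2. Scrambled/Weird
    (18, 251.39), (16, 264.10),                # 3. Code-Switching
    (18, 479.15), (14, 686.41),                # 4. Gibberish
]

# Micro: pool every token, i.e. a token-weighted average in log space.
total_tokens = sum(n for n, _ in rows)
total_nll = sum(n * math.log(ppl) for n, ppl in rows)
micro_ppl = math.exp(total_nll / total_tokens)

# Macro: plain arithmetic mean of the per-sentence perplexities.
macro_ppl = sum(ppl for _, ppl in rows) / len(rows)

print(f"tokens={total_tokens}  micro={micro_ppl:.2f}  macro={macro_ppl:.2f}")
```

Macro PPL is an arithmetic mean of perplexities, so the two gibberish sentences dominate it. Micro PPL averages in log space and weights by token count, which is why it comes out much lower: about 73 versus about 204 in this run.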
