# Zero-Shot mT5 Baseline Evaluation

### Introduction

This notebook documents the initial phase of the "Code-Mixed Query Auto-Suggestion" project. The objective of this phase is to establish a performance baseline with a pre-trained multilingual model, mT5 (`google/mt5-base`). Running the model on code-mixed queries in a zero-shot setting, that is, without any fine-tuning, shows how far a general-purpose model falls short on this specialized task. The poor results motivate the project's next steps: generating a high-quality synthetic corpus and fine-tuning a specialized model.



In [2]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Load the mT5 model and tokenizer.
model_name = "google/mt5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)


# Define the partial queries to complete for each language pair.
# The task prompt (prefix) is added later, in run_zero_shot_inference.
# Hinglish (Hindi-English)
hinglish_queries = [
    "mujhe aaj bahar jana hai, can you please tell me the",
    "kya aap mujhe bata sakte hain, what is the nearest",
    "meri car kharab ho gayi hai, what should I do",
    "yeh phone bahut expensive hai, is there any",
    "kal ka match kiska hai? I want to see the",
    "dosto, koi new movie aayi hai kya? I am so",
]

# Frenglish (French-English)
frenglish_queries = [
    "as good as vs as in",
    "did n't think covid-19",
    "he is married to or with"
]

# Chinese-English
chinese_queries = [
    "has got/have got 的中文翻译 have",
    "有 has got/have got 的中文翻译",
    "有 has got"
]


# A function to run the zero-shot inference.
def run_zero_shot_inference(queries, language_pair):
    print(f"--- Running Zero-Shot for {language_pair} ---")

    # Prepend a task-specific prefix with explicit language information.
    prefix = f"complete the query containing {language_pair.replace('-', ' and ')} code-mix words: "

    for query in queries:
        input_text = prefix + query

        # Tokenize the input text.
        input_ids = tokenizer.encode(input_text, return_tensors="pt")

        # Generate the output.
        output_ids = model.generate(
            input_ids,
            max_length=50,
            num_beams=5,
            early_stopping=True
        )

        # Decode the generated tokens back to text.
        generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

        print(f"Input:    '{query}'")
        print(f"Output:   '{generated_text}'\n")

# Run the tests for each language pair.
run_zero_shot_inference(hinglish_queries, "Hindi-English")
run_zero_shot_inference(frenglish_queries, "French-English")
run_zero_shot_inference(chinese_queries, "Chinese-English")



--- Running Zero-Shot for Hindi-English ---
Input:    'mujhe aaj bahar jana hai, can you please tell me the'
Output:   '<extra_id_0> aaj bahar jana.'

Input:    'kya aap mujhe bata sakte hain, what is the nearest'
Output:   '<extra_id_0> shuru karte hain.'

Input:    'meri car kharab ho gayi hai, what should I do'
Output:   '<extra_id_0> a query.'

Input:    'yeh phone bahut expensive hai, is there any'
Output:   '<extra_id_0> is query:'

Input:    'kal ka match kiska hai? I want to see the'
Output:   '<extra_id_0> is query:'

Input:    'dosto, koi new movie aayi hai kya? I am so'
Output:   '<extra_id_0>. I am sorry.'

--- Running Zero-Shot for French-English ---
Input:    'as good as vs as in'
Output:   '<extra_id_0> as good as good'

Input:    'did n't think covid-19'
Output:   '<extra_id_0> n't think'

Input:    'he is married to or with'
Output:   '<extra_id_0> married to or with'

--- Running Zero-Shot for Chinese-English ---
Input:    'has got/have got 的中文翻译 have'
Output:   '<ext

### Results Summary

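The zero-shot evaluation with the mT5-base model, even with improved and more explicit prompts, yielded poor results across all three language pairs. The model consistently failed to provide meaningful auto-suggestions for the code-mixed queries: every complete output shown above begins with `<extra_id_0>`, a sentinel token from mT5's span-corruption pre-training objective. mT5 was pre-trained only on that unsupervised objective, with no supervised tasks, so it treats the instruction prefix as ordinary text to infill rather than as a task description.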

* **Hinglish:** Despite the explicit prompt, the responses were generic. They either repeated words from the query (`<extra_id_0> aaj bahar jana.`) or echoed the word "query" from the prompt prefix (`<extra_id_0> a query.`, `<extra_id_0> is query:`).
* **French-English:** The outputs were similarly unhelpful. The model largely copied fragments of the input (`<extra_id_0> married to or with`) instead of completing it, falling back to its pre-training behavior. Note that the three sample queries contain only English words, so this set does not yet exercise French-English code-mixing.
* **Chinese-English:** The responses were again dominated by the sentinel token `<extra_id_0>` and did not form a coherent continuation.
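
Because `<extra_id_0>` is the first sentinel token of mT5's span-corruption objective, one way to read these outputs is as span fills rather than query completions. The sketch below is not part of the original notebook. It shows two hypothetical helpers: `to_infilling_prompt` appends a sentinel to a query so that the request resembles the pre-training format, and `extract_first_span` pulls out the text generated for that sentinel. Both function names, and the choice to keep only the first span, are assumptions made for illustration.

```python
import re

# Sentinel tokens used by T5-style span corruption: <extra_id_0>, <extra_id_1>, ...
SENTINEL = re.compile(r"<extra_id_\d+>")


def to_infilling_prompt(query: str) -> str:
    """Hypothetical helper: mark the end of the query as the span to fill."""
    return f"{query.rstrip()} <extra_id_0>"


def extract_first_span(generated: str) -> str:
    """Hypothetical helper: return the text predicted for the first sentinel.

    Text before the first sentinel is ignored; the span ends at the next
    sentinel or at the end of the string. If no sentinel is present, the
    whole string is returned unchanged (apart from surrounding whitespace).
    """
    parts = SENTINEL.split(generated)
    # parts[0] is whatever precedes the first sentinel (usually empty).
    return parts[1].strip() if len(parts) > 1 else generated.strip()
```

For example, `extract_first_span("<extra_id_0> aaj bahar jana.")` returns `"aaj bahar jana."`. Whether prompts built with `to_infilling_prompt` produce better suggestions than the instruction prefix has not been tested here.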

This outcome indicates that a general multilingual model, used without fine-tuning, lacks the specialized knowledge of code-switching patterns needed for this task. The poor baseline performance supports the project's central premise and the need to create and fine-tune a specialized model.
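
To make this baseline comparable with later fine-tuned models, the qualitative reading above could be backed by simple counts. The following is a minimal sketch, not the project's evaluation code: `sentinel_rate` and `echo_rate` are hypothetical metrics introduced here, and the three input/output pairs are copied from the Hindi-English run above.

```python
import re

SENTINEL = re.compile(r"<extra_id_\d+>")


def _normalize(word):
    """Lower-case a word and strip surrounding punctuation."""
    return word.strip(".,:;?!'\"").lower()


def sentinel_rate(outputs):
    """Fraction of outputs that contain at least one sentinel token."""
    if not outputs:
        return 0.0
    return sum(bool(SENTINEL.search(o)) for o in outputs) / len(outputs)


def echo_rate(query, output):
    """Fraction of output words (sentinels removed) already present in the query."""
    query_words = {_normalize(w) for w in query.split()}
    out_words = [_normalize(w) for w in SENTINEL.sub(" ", output).split()]
    out_words = [w for w in out_words if w]
    if not out_words:
        return 0.0
    return sum(w in query_words for w in out_words) / len(out_words)


# Pairs copied from the Hindi-English output above.
pairs = [
    ("mujhe aaj bahar jana hai, can you please tell me the", "<extra_id_0> aaj bahar jana."),
    ("kya aap mujhe bata sakte hain, what is the nearest", "<extra_id_0> shuru karte hain."),
    ("meri car kharab ho gayi hai, what should I do", "<extra_id_0> a query."),
]

print("sentinel rate:", sentinel_rate([out for _, out in pairs]))
for query, out in pairs:
    print(round(echo_rate(query, out), 2), repr(out))
```

A high sentinel rate together with a high echo rate would point to span infilling and input copying rather than genuine completion. These counts are only a starting point; a full evaluation would also need reference completions to score against.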