# **Multilingual Math Word Problem Solver**

This notebook demonstrates how to solve math word problems written in any of the eight
Indian languages listed below, or in Indian English, using Sarvam AI's Sarvam-M chat model
and Bulbul text-to-speech API.

### **Use Case**
Enable students, teachers, and exam-preparation platforms to submit math problems in
their native language and receive spoken, step-by-step solutions without switching to
English.

1. **Translate and Parse:** Use **Sarvam-M** to detect the input language and translate
   the problem to English for structured analysis.
2. **Solve:** Use **Sarvam-M** to work through the problem step by step and return a
   structured solution with individual calculation steps, a final answer, and a
   confidence score.
3. **Speak:** Use **Bulbul v3 TTS** to read the solution aloud in the original language.

### **Supported Languages**

| Language | Code |
| :--- | :--- |
| Hindi | hi-IN |
| Tamil | ta-IN |
| Telugu | te-IN |
| Kannada | kn-IN |
| Malayalam | ml-IN |
| Gujarati | gu-IN |
| Marathi | mr-IN |
| Bengali | bn-IN |
| English (India) | en-IN |


In [None]:
# Minimum versions for reproducibility; quoted so the shell does not treat ">" as a redirect
!pip install -Uqq "sarvamai>=0.1.24" "python-dotenv>=1.0.0"


### **1. Setup and API Key**

Obtain your API key from the [Sarvam AI Dashboard](https://dashboard.sarvam.ai).
Create a `.env` file in this directory with `SARVAM_API_KEY=your_key_here`, or set the
environment variable directly.
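If a `.env` file is inconvenient (for example on a shared or hosted notebook), the key can be entered interactively instead. Below is a minimal sketch built on a hypothetical helper, `ensure_api_key`, which is not part of the `sarvamai` SDK. It reads the key from the environment and prompts with the standard-library `getpass` only when the key is missing or still the placeholder, so the key is never echoed into cell output.

```python
import getpass
import os
from typing import Callable, MutableMapping

_PLACEHOLDER = "YOUR_SARVAM_API_KEY"


def ensure_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass.getpass,
    var: str = "SARVAM_API_KEY",
) -> str:
    """Return the key stored in env[var], prompting for it only when it is missing."""
    key = env.get(var, "").strip()
    if not key or key == _PLACEHOLDER:
        # getpass does not echo input, so the key never appears in cell output
        key = prompt(f"Enter {var}: ").strip()
        # Storing it in the environment lets os.environ.get(var) in the setup cell find it
        env[var] = key
    return key
```

Calling `ensure_api_key()` before the setup cell stores the key in `os.environ` for the current kernel only. Nothing is written to disk.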


In [None]:
from __future__ import annotations

import base64
import json
import os
import traceback
from pathlib import Path

from dotenv import load_dotenv
from sarvamai import SarvamAI

load_dotenv()

SARVAM_API_KEY = os.environ.get("SARVAM_API_KEY", "")
if not SARVAM_API_KEY or SARVAM_API_KEY == "YOUR_SARVAM_API_KEY":
    raise RuntimeError(
        "SARVAM_API_KEY is not set. Add it to your .env file or set the environment variable."
    )

client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

print("Client initialised.")


### **2. Step 1 — TRANSLATE and PARSE: Detect Language and Understand the Problem**

`translate_and_parse` sends the raw math problem text to **Sarvam-M** with a system
prompt that instructs the model to identify the BCP-47 language code and translate the
problem to English, returning a compact JSON object.

Supported input: plain text in any supported Indian language or English.
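Both `translate_and_parse` and `solve_problem` strip an optional Markdown fence before calling `json.loads`. That still fails if the model adds a sentence before or after the JSON. The hypothetical helper below, `extract_json_object`, is a sketch of a more tolerant parser that could replace the inline stripping in both functions. Its fallback to the outermost `{ ... }` span assumes the reply contains a single JSON object.

```python
import json
import re

# Matches a whole reply wrapped in ``` or ```json fences and captures the body
_FENCE_RE = re.compile(r"^```[a-zA-Z]*\s*\n?(.*?)\n?```$", re.DOTALL)


def extract_json_object(text: str) -> dict:
    """Parse a JSON object from model output that may carry fences or extra prose."""
    raw = text.strip()
    fenced = _FENCE_RE.match(raw)
    if fenced:
        raw = fenced.group(1).strip()
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the span between the first "{" and the last "}"
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            raise
        obj = json.loads(raw[start : end + 1])
    if not isinstance(obj, dict):
        raise ValueError(f"Expected a JSON object, got {type(obj).__name__}")
    return obj
```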


In [None]:
_LANGUAGE_LABELS: dict[str, str] = {
    "hi-IN": "Hindi",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "gu-IN": "Gujarati",
    "mr-IN": "Marathi",
    "bn-IN": "Bengali",
    "en-IN": "English (India)",
}

_TRANSLATE_SYSTEM_PROMPT = (
    "You are a multilingual math assistant that understands all major Indian languages.\n\n"
    "Given a math word problem in any language, you must:\n"
    "1. Identify the BCP-47 language code of the problem "
    "(supported codes: hi-IN, ta-IN, en-IN, mr-IN, bn-IN, gu-IN, kn-IN, ml-IN, te-IN).\n"
    "2. Translate the problem into clear, precise English.\n\n"
    "Return ONLY a valid JSON object with no markdown fences and no extra text.\n"
    "Required keys: problem_language (string, BCP-47 code) "
    "and problem_english (string, English translation of the problem)."
)


def translate_and_parse(problem_text: str) -> dict[str, str]:
    """Detect the language of a math problem and translate it to English.

    Args:
        problem_text: Math word problem in any supported Indian language or English.

    Returns:
        Dict with keys 'problem_language' (BCP-47 code) and 'problem_english'
        (English translation).

    Raises:
        ValueError: If Sarvam-M returns no usable response.
        json.JSONDecodeError: If the model's reply is not valid JSON.
    """
    response = client.chat.completions(
        messages=[
            {"role": "system", "content": _TRANSLATE_SYSTEM_PROMPT},
            {"role": "user",   "content": problem_text.strip()},
        ]
    )

    if not response or not response.choices:
        raise ValueError("Sarvam-M returned no response during the translate step.")

    content = response.choices[0].message.content
    if content is None:
        raise ValueError("Sarvam-M returned an empty message during the translate step.")

    raw = content.strip()
    # Strip optional markdown code fence (``` or ```json) returned by the model
    if raw.startswith("```"):
        raw = raw[raw.find("\n") + 1:]
    if raw.endswith("```"):
        raw = raw[: raw.rfind("\n")]
    raw = raw.strip()

    parsed = json.loads(raw)
    lang  = parsed.get("problem_language", "en-IN")
    label = _LANGUAGE_LABELS.get(lang, lang)

    print(f"Detected language : {label} ({lang})")
    print(f"Problem (English) : {parsed.get('problem_english', '')}")
    return parsed


print("translate_and_parse defined.")
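`translate_and_parse` passes the model's `problem_language` value through unchanged to both the solve and speak steps, so a loosely formatted code such as `hi` or `HI_in` could reach Bulbul as-is and may be rejected. A minimal sketch of a hypothetical normalizer follows. Falling back to `en-IN` for anything unrecognized is an assumption; raising an error instead is an equally reasonable choice.

```python
from __future__ import annotations

SUPPORTED_CODES = frozenset(
    {"hi-IN", "ta-IN", "te-IN", "kn-IN", "ml-IN", "gu-IN", "mr-IN", "bn-IN", "en-IN"}
)


def normalize_language_code(code: str | None, default: str = "en-IN") -> str:
    """Map loose model output such as 'hi', 'HI_in', or ' ta-in ' to a supported code."""
    if not code:
        return default
    cleaned = code.strip().replace("_", "-")
    # Keep only the language subtag; every supported code uses the IN region
    lang = cleaned.split("-")[0].lower()
    candidate = f"{lang}-IN"
    return candidate if candidate in SUPPORTED_CODES else default
```

It could be applied right after parsing, e.g. `parsed["problem_language"] = normalize_language_code(parsed.get("problem_language"))`.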


### **3. Step 2 — SOLVE: Generate a Step-by-Step Solution**

`solve_problem` sends the English problem and the original language code to
**Sarvam-M**. The model works through the problem step by step and returns a structured
JSON solution that includes individual calculation steps, a final answer, a confidence
score, and a spoken summary written in the original language for use by TTS in Step 3.

The confidence score is self-reported by the model, not a calibrated probability, so treat
it as a rough signal. A warning is printed when it falls below 0.85.
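Models sometimes report confidence as a string such as `"0.9"` or `"90%"` rather than a number. A hypothetical `parse_confidence` helper, sketched below, normalizes these forms into the range [0, 1]. It deliberately maps missing or unparseable values to `0.0`, so that they trigger the low-confidence warning instead of passing silently.

```python
import math

LOW_CONFIDENCE_THRESHOLD = 0.85  # same cut-off the notebook uses for its warning


def parse_confidence(value: object, default: float = 0.0) -> float:
    """Coerce a model-reported confidence (0.9, "0.9", "90%") into the range [0, 1]."""
    text = "" if value is None else str(value).strip()
    is_percent = text.endswith("%")
    if is_percent:
        text = text[:-1]
    try:
        score = float(text)
    except ValueError:
        return default  # e.g. "high", "", or other non-numeric replies
    if math.isnan(score):
        return default
    if is_percent:
        score /= 100.0
    # Clamp out-of-range values such as 1.5 or -0.2
    return min(max(score, 0.0), 1.0)
```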


In [None]:
_SOLVE_SYSTEM_PROMPT = (
    "You are an expert math tutor. Solve the given English math word problem "
    "step by step and return a structured JSON solution.\n\n"
    "The JSON must contain exactly these keys:\n"
    "- steps: array of objects, each with step_number (integer), description (string "
    "in English), and calculation (string in English showing the arithmetic)\n"
    "- final_answer: string, a concise answer in English\n"
    "- confidence: float between 0 and 1 reflecting certainty in the answer\n"
    "- solution_spoken: string, a natural conversational explanation of the solution "
    "written in the language identified by the BCP-47 code supplied by the user\n\n"
    "Rules:\n"
    "- solution_spoken must be in the student's language (the BCP-47 code given).\n"
    "- solution_spoken must be suitable for text-to-speech: no LaTeX or special math "
    "symbols; write fractions, operators, and expressions in natural spoken words.\n"
    "- Return ONLY the JSON object, no markdown fences, no extra text."
)


def solve_problem(
    problem_english: str,
    problem_language: str,
) -> dict:
    """Solve an English math problem and return a structured JSON solution.

    Args:
        problem_english:  The math problem translated to English.
        problem_language: BCP-47 code of the student's original language, used to
                          generate the spoken summary in solution_spoken.

    Returns:
        Dict with keys 'steps', 'final_answer', 'confidence', and 'solution_spoken'.

    Raises:
        ValueError: If Sarvam-M returns no usable response.
        json.JSONDecodeError: If the model's reply is not valid JSON.
    """
    user_message = (
        f"Language: {problem_language}\n"
        f"Problem: {problem_english.strip()}"
    )

    response = client.chat.completions(
        messages=[
            {"role": "system", "content": _SOLVE_SYSTEM_PROMPT},
            {"role": "user",   "content": user_message},
        ]
    )

    if not response or not response.choices:
        raise ValueError("Sarvam-M returned no response during the solve step.")

    content = response.choices[0].message.content
    if content is None:
        raise ValueError("Sarvam-M returned an empty message during the solve step.")

    raw = content.strip()
    # Strip optional markdown code fence (``` or ```json) returned by the model
    if raw.startswith("```"):
        raw = raw[raw.find("\n") + 1:]
    if raw.endswith("```"):
        raw = raw[: raw.rfind("\n")]
    raw = raw.strip()

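    solution = json.loads(raw)
    # Treat a missing or non-numeric score as 0.0 so the low-confidence warning fires,
    # and store the float back so later ":.2f" formatting always works
    try:
        confidence = float(solution.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    solution["confidence"] = confidence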

    if confidence < 0.85:
        print(f"WARNING: Low confidence ({confidence:.2f}). Review the solution carefully.")

    print(f"Final answer      : {solution.get('final_answer', '')}")
    print(f"Confidence        : {confidence:.2f}")
    print(f"Steps             : {len(solution.get('steps', []))}")
    return solution


print("solve_problem defined.")


### **4. Step 3 — SPEAK: Read the Solution Aloud**

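`speak_solution` converts the `solution_spoken` text returned by the solver to audio
using **Bulbul v3** and saves it in `output_dir` (default `outputs/`) as
`solution_<language_code>.wav`. Because the filename depends only on the language code,
a later run in the same language overwrites the earlier file.

Each language code is mapped to a speaker voice in `_SPEAKER_MAP`. A code missing from
the map falls back to the `shubh` speaker, but the code itself is still sent to Bulbul
unchanged, so it must be one that Bulbul supports.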
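`speak_solution` sends the whole `solution_spoken` string in one `convert` call and decodes only `audios[0]`. Text-to-speech APIs typically cap input length per request, so a long multi-step explanation might be rejected or cut short. The sketch below splits the text on sentence boundaries, including the Devanagari danda (।), and joins the resulting WAV clips with the standard-library `wave` module. `MAX_TTS_CHARS = 1500` is a placeholder assumption; check the Bulbul documentation for the real limit. The sketch also assumes Bulbul returns plain PCM WAV that `wave` can read.

```python
from __future__ import annotations

import io
import re
import wave

MAX_TTS_CHARS = 1500  # ASSUMPTION: placeholder; check Bulbul's documented per-request limit

# Split after Latin sentence punctuation or the Devanagari danda (U+0964)
_SENTENCE_END = re.compile(r"(?<=[.!?\u0964])\s+")


def split_for_tts(text: str, max_chars: int = MAX_TTS_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        # Hard-split any single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_wav(parts: list[bytes]) -> bytes:
    """Join WAV byte strings that share channels, sample width, and sample rate."""
    if not parts:
        return b""
    out = io.BytesIO()
    with wave.open(io.BytesIO(parts[0]), "rb") as first:
        params = first.getparams()
    with wave.open(out, "wb") as writer:
        writer.setparams(params)  # the frame count in the header is patched on close
        for part in parts:
            with wave.open(io.BytesIO(part), "rb") as reader:
                if reader.getparams()[:3] != params[:3]:
                    raise ValueError("parts must share channels, sample width and rate")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

Each chunk from `split_for_tts` would go through the same `client.text_to_speech.convert` call as in `speak_solution`. The base64-decoded bytes would then be collected into a list and passed to `concat_wav` before writing the file.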


In [None]:
_SPEAKER_MAP: dict[str, str] = {
    "hi-IN": "shubh",
    "ta-IN": "kavya",
    "te-IN": "priya",
    "kn-IN": "arvind",
    "ml-IN": "anu",
    "gu-IN": "priya",
    "mr-IN": "shubh",
    "bn-IN": "priya",
    "en-IN": "shubh",
}


def speak_solution(
    solution_spoken: str,
    language_code: str,
    output_dir: str = "outputs",
) -> str:
    """Convert the spoken solution summary to audio using Bulbul v3 TTS.

    Args:
        solution_spoken: Natural-language solution text in the target language,
                         suitable for text-to-speech (no LaTeX or special symbols).
        language_code:   BCP-47 language code (e.g. 'hi-IN').
        output_dir:      Directory where the WAV file is saved.

    Returns:
        Path to the saved WAV file.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    speaker = _SPEAKER_MAP.get(language_code, "shubh")

    tts_response = client.text_to_speech.convert(
        text=solution_spoken,
        target_language_code=language_code,
        model="bulbul:v3",
        speaker=speaker,
        speech_sample_rate=24000,
    )

    if not tts_response.audios:
        raise RuntimeError(
            f"Bulbul TTS returned no audio for language '{language_code}'. "
            "Check that the language code and speaker are supported."
        )

    audio_bytes = base64.b64decode(tts_response.audios[0])
    output_path = str(Path(output_dir) / f"solution_{language_code}.wav")
    with open(output_path, "wb") as f:
        f.write(audio_bytes)

    print(f"Audio saved to    : {output_path}")
    return output_path


print("speak_solution defined.")


### **5. End-to-End Pipeline**

`solve_math_problem` ties all three steps together.

Pass any math word problem in a supported language and receive a dict that conforms to
the full output schema below, or `None` if any step fails:

```python
{
    "problem_language": str,   # BCP-47 code of the input language
    "problem_english":  str,   # English translation of the problem
    "steps": [                 # Step-by-step solution
        {
            "step_number":  int,
            "description":  str,
            "calculation":  str,
        }
    ],
    "final_answer":    str,    # Concise answer in English
    "confidence":      float,  # Model confidence (0.0 to 1.0)
    "solution_spoken": str,    # TTS-ready explanation in the original language
    "audio_path":      str,    # Path to the saved WAV file
}
```
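`solve_math_problem` indexes the solver's dict directly (`solution["solution_spoken"]` and so on), so a reply that omits a key surfaces only as a `KeyError` caught by the broad `except`. The hypothetical `validate_solution` helper below is a sketch that checks the solve-step dict against the schema above before audio is synthesized. It is hand-rolled to avoid dependencies; a library such as `pydantic` or `jsonschema` would be the heavier-weight alternative.

```python
from __future__ import annotations

from typing import Any

# Keys and types the pipeline reads from the solve-step JSON (mirrors the schema above)
_REQUIRED_SOLUTION_KEYS: dict[str, type | tuple[type, ...]] = {
    "steps": list,
    "final_answer": str,
    "confidence": (int, float, str),
    "solution_spoken": str,
}
_REQUIRED_STEP_KEYS = ("step_number", "description", "calculation")


def validate_solution(solution: dict[str, Any]) -> list[str]:
    """Return a list of problems with a solve-step dict; an empty list means it looks valid."""
    problems: list[str] = []
    for key, expected in _REQUIRED_SOLUTION_KEYS.items():
        if key not in solution:
            problems.append(f"missing key: {key}")
        elif not isinstance(solution[key], expected):
            problems.append(f"{key} has type {type(solution[key]).__name__}")
    steps = solution.get("steps")
    if isinstance(steps, list):
        for i, step in enumerate(steps, start=1):
            if not isinstance(step, dict):
                problems.append(f"step {i} is not an object")
                continue
            missing = [k for k in _REQUIRED_STEP_KEYS if k not in step]
            if missing:
                problems.append(f"step {i} missing: {', '.join(missing)}")
    spoken = solution.get("solution_spoken")
    if isinstance(spoken, str) and not spoken.strip():
        problems.append("solution_spoken is empty")
    return problems
```

Inside the pipeline, a check such as `if problems := validate_solution(solution): raise ValueError("; ".join(problems))` would turn a bare `KeyError` into a readable message.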


In [None]:
def solve_math_problem(
    problem_text: str,
    output_dir: str = "outputs",
) -> dict | None:
    """Full pipeline: translate and parse -> solve -> speak.

    Args:
        problem_text: Math word problem in any supported Indian language or English.
        output_dir:   Directory where the TTS audio file is saved.

    Returns:
        Dict conforming to the output schema (see pipeline header), or None if the
        pipeline fails at any step.
    """
    print(f"Input problem     : {problem_text}")
    try:
        print("  Step 1/3 — Detecting language and translating with Sarvam-M...")
        parsed = translate_and_parse(problem_text)
        if parsed is None:
            raise ValueError("translate_and_parse returned None.")

        print("  Step 2/3 — Solving step by step with Sarvam-M...")
        solution = solve_problem(
            problem_english=parsed["problem_english"],
            problem_language=parsed["problem_language"],
        )
        if solution is None:
            raise ValueError("solve_problem returned None.")

        print("  Step 3/3 — Synthesising audio with Bulbul TTS...")
        audio_path = speak_solution(
            solution_spoken=solution["solution_spoken"],
            language_code=parsed["problem_language"],
            output_dir=output_dir,
        )

        return {
            "problem_language": parsed["problem_language"],
            "problem_english":  parsed["problem_english"],
            "steps":            solution["steps"],
            "final_answer":     solution["final_answer"],
            "confidence":       solution["confidence"],
            "solution_spoken":  solution["solution_spoken"],
            "audio_path":       audio_path,
        }

    except Exception as e:
        traceback.print_exc()
        print(f"ERROR: Pipeline failed: {e}")
        return None


print("solve_math_problem defined.")


### **6. Demo — Run the Pipeline**

The cell below runs the full pipeline on a sample Hindi math problem:

> **एक दुकानदार के पास 150 आम थे। उसने 2/3 आम बेच दिए। कितने आम बचे?**
> *(A shopkeeper had 150 mangoes. He sold two-thirds of them. How many mangoes were left?)*

No audio recording or external file is required. The pipeline works entirely on text
input and produces a spoken solution audio file as output.
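Because the demo has a known answer, it doubles as a smoke test. Two-thirds of 150 is 100 mangoes sold, which leaves 150 − 100 = 50. The sketch below computes this with exact `Fraction` arithmetic. It also defines a hypothetical, heuristic `answer_matches` helper that looks for that number anywhere in the model's `final_answer` string. Since the check only searches for a matching number, it can be fooled by answers like "not 50".

```python
import re
from fractions import Fraction

# Exact arithmetic for the demo: 150 mangoes, two-thirds sold
total = Fraction(150)
sold = total * Fraction(2, 3)   # 100
expected_left = total - sold    # 50


def answer_matches(final_answer: str, expected: Fraction) -> bool:
    """Heuristic check: does any integer, decimal, or a/b fraction in the text equal expected?"""
    for token in re.findall(r"-?\d+(?:\.\d+)?(?:/\d+)?", final_answer):
        try:
            if Fraction(token) == expected:
                return True
        except (ValueError, ZeroDivisionError):
            continue  # skip tokens such as "1.5/2" or "3/0"
    return False
```

After the demo cell runs successfully, `answer_matches(result["final_answer"], expected_left)` should return `True`.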


In [None]:
DEMO_PROBLEM = "एक दुकानदार के पास 150 आम थे। उसने 2/3 आम बेच दिए। कितने आम बचे?"

result = solve_math_problem(DEMO_PROBLEM)


### **7. Results**

Inspect the full solution schema, then listen to or download the spoken audio reply.
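The result dict exists only in the current kernel session. A small hypothetical `save_result` helper, sketched below, persists it as UTF-8 JSON next to the audio. Passing `ensure_ascii=False` to `json.dumps` keeps Devanagari and other Indic scripts readable in the file instead of writing `\uXXXX` escapes.

```python
from __future__ import annotations

import json
from pathlib import Path


def save_result(result: dict, path: str | Path) -> Path:
    """Write a pipeline result to UTF-8 JSON, keeping Indic script human-readable."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False writes native script as-is instead of \uXXXX escapes
    out.write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
    return out
```

For example, `save_result(result, "outputs/solution_hi-IN.json")` would store the Hindi demo result alongside its WAV file.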


In [None]:
from IPython.display import Audio, FileLink, display

if result:
    lang_label = _LANGUAGE_LABELS.get(result["problem_language"], result["problem_language"])

    print("=== Solution Schema ===")
    print(f"problem_language  : {lang_label} ({result['problem_language']})")
    print(f"problem_english   : {result['problem_english']}")
    print()
    print("steps:")
    for step in result["steps"]:
        print(
            f"  Step {step['step_number']}: {step['description']}"
            f"  |  {step['calculation']}"
        )
    print()
    print(f"final_answer      : {result['final_answer']}")
    print(f"confidence        : {result['confidence']:.2f}")
    print()
    print(f"solution_spoken   : {result['solution_spoken']}")
    print()
    print("Audio reply:")
    display(Audio(filename=result["audio_path"]))
    print()
    print("Download:")
    display(FileLink(result["audio_path"], result_html_prefix="Click to download: "))
else:
    print("Processing failed. Check the error messages above.")


### **8. Error Reference**

| Error | Cause | Solution |
| :--- | :--- | :--- |
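| `RuntimeError: SARVAM_API_KEY is not set` | Missing or placeholder API key | Add the key to `.env` and restart the kernel. |
| `ValueError: Sarvam-M returned no response` | The chat API returned a response with no choices or no message content | Re-run the cell; if it persists, check your quota at [dashboard.sarvam.ai](https://dashboard.sarvam.ai). |
| `json.JSONDecodeError` | Model returned malformed JSON | Re-run the cell; if persistent, inspect the raw model output. |
| `KeyError` (e.g. `'solution_spoken'`) | Model's JSON omitted a required key | Re-run the cell; if persistent, inspect the raw model output and tighten the system prompt. |
| `RuntimeError: Bulbul TTS returned no audio` | Unsupported language code or speaker | Verify that the language code is one of the supported codes and that its speaker in `_SPEAKER_MAP` is available for Bulbul v3. |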
| `WARNING: Low confidence` | Model is uncertain about the solution | Review the steps manually and verify the final answer. |
| `insufficient_quota_error` (HTTP 429) | API rate limit reached | Wait and retry, or upgrade your plan. |
| `internal_server_error` (HTTP 500) | Transient server-side issue | Wait briefly and retry the pipeline. |
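For the HTTP 429 and 500 rows, "wait and retry" can be automated with exponential backoff. Below is a minimal sketch built on a hypothetical `with_retries` helper. Note that `solve_math_problem` catches every exception and returns `None`, so a retry wrapper only sees errors if it wraps the individual SDK calls, such as the `client.chat.completions(...)` call inside `translate_and_parse`. The `is_transient` predicate assumes the SDK's HTTP errors expose an integer `status_code` attribute. That is an assumption, so check the `sarvamai` exception classes and adjust it.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_transient(exc: Exception) -> bool:
    # ASSUMPTION: SDK HTTP errors carry an integer `status_code`; adjust to the real classes
    return getattr(exc, "status_code", None) in {429, 500, 502, 503}


def with_retries(
    fn: Callable[[], T],
    should_retry: Callable[[Exception], bool],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff plus jitter when should_retry(exc) is True."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1 or not should_retry(exc):
                raise  # out of attempts, or a non-transient error such as HTTP 400
            # Delays of 1s, 2s, 4s, ... plus up to 10% jitter so clients do not retry in lockstep
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```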

### **9. Extending the Recipe**

- **Batch mode:** Wrap `solve_math_problem` in a loop to process a list of problems
  from a CSV file and write results to an Excel sheet using `openpyxl`.
- **Voice input:** Chain this recipe with **Saarika STT** to accept spoken problems
  recorded on a microphone — see `examples/multilingual-support-bot` for the STT
  integration pattern.
- **Additional languages:** Add new entries to `_SPEAKER_MAP` and `_LANGUAGE_LABELS`
  as Sarvam AI adds support for more languages.
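The batch-mode idea above could look roughly like the sketch below. To stay dependency-free, it writes a CSV with the standard-library `csv` module rather than an Excel sheet via `openpyxl`. `run_batch`, the input column name `problem`, and `OUTPUT_FIELDS` are all hypothetical. The solver callback receives the row index as well as the problem text. This matters because `speak_solution` names files only by language code, so giving each row its own `output_dir` keeps one Hindi problem from overwriting another's audio.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Callable, Optional

# Columns written to the output CSV; fields 1-4 are copied from each pipeline result
OUTPUT_FIELDS = ["problem", "problem_language", "final_answer", "confidence", "audio_path", "status"]


def run_batch(
    input_csv: str | Path,
    output_csv: str | Path,
    solve: Callable[[int, str], Optional[dict]],
    column: str = "problem",
) -> int:
    """Solve every non-empty input_csv[column] cell; return how many problems succeeded."""
    solved = 0
    with open(input_csv, newline="", encoding="utf-8") as src, open(
        output_csv, "w", newline="", encoding="utf-8"
    ) as dst:
        writer = csv.DictWriter(dst, fieldnames=OUTPUT_FIELDS)
        writer.writeheader()
        for index, row in enumerate(csv.DictReader(src)):
            problem = (row.get(column) or "").strip()
            if not problem:
                continue  # skip rows with an empty problem cell
            result = solve(index, problem)
            out_row = {"problem": problem, "status": "ok" if result else "failed"}
            if result:
                solved += 1
                # Copy the summary fields; DictWriter fills any missing ones with ""
                out_row.update({k: result.get(k, "") for k in OUTPUT_FIELDS[1:5]})
            writer.writerow(out_row)
    return solved
```

With the notebook's pipeline, a call might look like `run_batch("problems.csv", "results.csv", lambda i, p: solve_math_problem(p, output_dir=f"outputs/{i:04d}"))`.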

### **10. Conclusion and Resources**

This recipe chains **Sarvam-M** for language detection, translation, and step-by-step
math solving with **Bulbul v3 TTS** for spoken output, delivering a full multilingual
math tutoring loop in four short functions.

* [Sarvam AI Docs](https://docs.sarvam.ai)
* [Sarvam-M Chat API](https://docs.sarvam.ai/api-reference-docs/chat)
* [Bulbul TTS API](https://docs.sarvam.ai/api-reference-docs/text-to-speech)
* [Indic Language Support](https://docs.sarvam.ai/language-support)

**Keep Building!**
