In [1]:
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
!pip install python-dotenv
!pip install huggingface_hub
!pip install wandb
!pip install datasets

In [2]:
from dotenv import load_dotenv
import os
from huggingface_hub import login

load_dotenv()

hf_token = os.environ.get("HUGGINGFACE_TOKEN")
login(hf_token)


  from .autonotebook import tqdm as notebook_tqdm


In [3]:
import wandb

wb_token = os.environ.get("WANDB")

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Qwen-7B on Stack Experiment Dataset', 
    job_type="training", 
    anonymous="allow"
)

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/zhaiyl/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mzhaiyl[0m ([33mpingcap[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [4]:
from unsloth import FastLanguageModel

max_seq_length = 8192 
dtype = None 
load_in_4bit = False

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Qwen-7B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token, 
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.8: Fast Qwen2 patching. Transformers: 4.49.0.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.381 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards: 100%|██████████| 4/4 [00:05<00:00,  1.48s/it]


In [5]:
from typing import Dict, Any

def describe_goal(goal: str, metadata: Dict[str, Any]) -> str:
    """
    Describe the goal in a more detailed and structured way.

    Args:
        goal: The original task goal
        metadata: Task metadata containing response format, label path etc.

    Returns:
        A formatted string describing the complete goal context
    """
    description_parts = []

    # Add the main goal
    description_parts.append(f"Goal: {goal}")

    if metadata:
        response_format = metadata.get("response_format", {})

        # Add background information if present
        background = response_format.get("Background") or response_format.get(
            "background"
        )
        if background:
            description_parts.append(f"Background: {background}")

        # Add annotations if present
        annotations = response_format.get("Annotations") or response_format.get(
            "annotations"
        )
        if annotations:
            description_parts.append(f"Annotations: {annotations}")

        # Add language information if present
        lang = response_format.get("Lang") or response_format.get("lang")
        if lang:
            description_parts.append(f"Response Language: {lang}")

        # Add format requirements if present
        format_req = response_format.get("Format") or response_format.get("format")
        if format_req:
            description_parts.append(f"Response Format: {format_req}")

        # Add label path if present
        label_path = metadata.get("label_path")
        if label_path:
            if isinstance(label_path, list):
                # Handle both string list and dict list formats
                path_str = " -> ".join(
                    item["label"] if isinstance(item, dict) else item
                    for item in label_path
                )
                description_parts.append(f"Labels: {path_str}")

    return "\n".join(description_parts)

In [6]:
prompt_style = """Your task is to generate a detailed action plan to achieve the following goal:

Goal: {}

--------------------------------

**MUST follow the Specification**:
# Specification for Generating Executable Plans for the Virtual Machine (VM)

## Table of Contents
1. Overview of the VM
2. Instruction Format
3. Supported Instructions
4. Parameters and Variable References
5. Variables and Dependencies
6. Plan Structure
7. Best Practices
8. Common Errors
9. Available Tools for calling instruction
10. Example Plan

## 1. Overview of the VM
The VM executes plans consisting of a sequence of instructions. Each instruction performs a specific operation and may interact with variables stored in a variable store. The VM supports conditional execution and can handle dependencies between instructions through variable assignments and references.

### Key features:
- **Variable Store**: A key-value store where variables are stored and accessed by name.
- **Instruction Execution**: Instructions are executed sequentially unless control flow is altered by conditional statements.

## 2. Instruction Format
Each instruction in the plan is represented as a JSON object with the following keys:

- `seq_no`: A unique and AUTO-INCREMENT integer identifying the instruction's sequence within the plan, starting from 0.
- `type`: A string indicating the instruction type. See Supported Instructions.
- `parameters`: An object containing parameters required by the instruction.

```json
{{
  "seq_no": N,
  "type": "instruction_type",
  "parameters": {{
    "param1": "value_or_variable_reference",
    "param2": "value_or_variable_reference"
  }}
}}
```

## 3. Supported Instructions
### 3.1 assign
- **Purpose**: Assigns values to one or more variables.
- **Parameters**: An object where each key is a variable name. Each value can be:
  1. A direct value (number/string).
  2. A reference to an existing variable: use the syntax "${{variable_name}}".
  3. A template string that interpolates variables for string concatenation.
     - Example: "The reason is: ${{reason}}, and the solution is: ${{solution}}"
  4. A basic arithmetic expression involving numeric variables:
     - Supported operators: +, -, *, /, ** (pow), % (mod), unary +/-
     - Example: "${{var0}} / 3 + ${{var1}}"

The VM will:
1. Replace each "${{varName}}" with the current value of varName.
2. If the result is a pure numeric expression (e.g., 2+3, 5*6, or referencing numeric variables), it will be evaluated as a number.
3. If the result is a string with placeholders, it becomes a string concatenation or template filling.
4. Assign the final computed result back to the target variable(s).


**Examples:**

1. Direct Assignment
   ```json
   {{
     "seq_no": 0,
     "type": "assign",
     "parameters": {{
       "constant_number": 42,
       "message": "Hello World"
     }}
   }}
   ```

2. Template/String Interpolation
   ```json
   {{
     "seq_no": 1,
     "type": "assign",
     "parameters": {{
       "recommended_solution": "Reason: ${{reason}}\nSolution: ${{solution}}"
     }}
   }}
   ```

3. Basic Arithmetic
   ```json
   {{
     "seq_no": 2,
     "type": "assign",
     "parameters": {{
       "calculated_result": "${{num1}} + ${{num2}} / 3"
     }}
   }}
   ```

### 3.2 jmp
- **Purpose**: Jumps to a specified sequence number based on an optional condition.
- **Parameters**:
  - `condition_prompt` (optional): The prompt to evaluate the condition. If provided, the LLM evaluates whether to jump. **Must respond with a JSON object in the following format:**
    ```json
    {{
      "result": boolean,
      "explanation": string
    }}
    ```
  - `context` (optional): Additional context for the LLM. Can be a direct string or a variable reference.
  - `jump_if_true`: The `seq_no` to jump to if the condition evaluates to true. Required if `condition_prompt` is provided.
  - `jump_if_false` (optional): The `seq_no` to jump to if the condition evaluates to false. Required if `condition_prompt` is provided.
  - `target_seq` (optional): The `seq_no` to jump to if no condition is provided (unconditional jump).

**Example (Conditional Jump):**
```json
{{
  "seq_no": 4,
  "type": "jmp",
  "parameters": {{
    "condition_prompt": "Is ${{number}} even? Respond with a JSON object in the following format:\n{{\n  \"result\": boolean,\n  \"explanation\": string\n}}\nWhere 'result' is true if the number is even, false otherwise, and 'explanation' provides a brief reason for the result.",
    "context": null,
    "jump_if_true": 5,
    "jump_if_false": 6
  }}
}}
```

**Example (Unconditional Jump):**
```json
{{
  "seq_no": 5,
  "type": "jmp",
  "parameters": {{
    "target_seq": 7
  }}
}}
```

### 3.3 calling
- **Purpose**: Invokes a specific tool or function with the provided parameters.
- **Parameters**: Defines the specifications required to call a tool.
  - `tool_name`: The name of the tool to be called for `calling` instruction.
  - `tool_params`: An object containing key-value pairs that represent the arguments required by the specified tool.
    - Keys: Must match the argument names expected by the tool.
    - Values: Can be either a direct value or a variable reference.
  - `output_vars` (optional): An array specifying how the tool's output should be stored in the VM's variable store for later use.
    - If it is a string: The array contains one variable name. The entire tool's response is stored under this variable name.
    - If it is an array: The array contains variable names corresponding to the keys in the JSON response. Each variable name in the array maps to a key in the JSON object, and the value associated with each key will be extracted and stored under the corresponding variable name.

The structure of `calling` instruction:

```json
{{
  "seq_no": <unique_sequential_number>,
  "type": "calling",
  "parameters": {{
    "tool_name": "<tool_name>",
    "tool_params": {{
      <tool-specific parameters>
    }},
    "output_vars": [<list_of_output_variable_names>]
  }}
}}
```

**Example:**
```json
{{
  "seq_no": 1,
  "type": "calling",
  "parameters": {{
    "tool_name": "tool_name",
    "tool_params": {{
      "param1": "value_or_variable_reference",
      "param2": "value_or_variable_reference"
    }},
    "output_vars": ["variable_name_1", ...]
  }}
}}
```

## 4. Parameters and Variable References
Parameters can be either direct values or variable references. To reference a variable, use the format `${{variable_name}}`.

- **Direct Values** are used when you clearly know the corresponding parameter values. These values do not depend on the results of other instructions, ensuring clarity and simplicity. Using direct values helps improve query readability and maintainability, especially in scenarios where parameters do not need to change dynamically.

- **Variable References** are ideal for scenarios that require dynamic parameter value filling, enhancing the interconnectivity and data flow between instructions. By using variable references, parameters can be adjusted dynamically based on the results of previous steps, increasing the flexibility and automation of the workflow.

- **Don't Use Math Expressions in Parameters and tool_params**: The VM does not have the capability to compute or parse expressions within parameters. It can only perform simple reference substitutions. For example, avoid using expressions like value1 + value2 or value * 2 within parameters, and instead, calculate these values explicitly in a prior step and refer to the result in the parameter.


**Direct Value Example:**
```json
{{
  "seq_no": 1,
  "type": "calling",
  "parameters": {{
    "tool_name": "retrieve_knowledge_graph",
    "tool_params": {{
      "query": "TiDB latest stable version"
    }},
    "output_vars": ["latest_tidb_version_info"]
  }}
}}
```

**Variable Reference Example:**
```json
{{
  "seq_no": 4,
  "type": "calling",
  "parameters": {{
    "tool_name": "vector_search",
    "tool_params": {{
      "query": "What are the key features and improvements in TiDB version ${{latest_stable_tidb_version}}?",
      "top_k": 10
    }},
    "output_vars": ["tidb_key_features_and_improvements"]
  }}
}}
```

## 5. Variables and Dependencies
- **Variable Assignment**: Use the `assign` instruction or specify an `output_vars` in a `calling` instruction that produces outputs.
- **Variable Access**: Reference variables in parameters using the variable reference format.
- **Dependencies**: Manage dependencies by assigning outputs to variables and referencing them in subsequent instructions.

## 6. Plan Structure
- **Sequential Execution**: Instructions are executed in order based on their `seq_no`.
- **Control Flow**: Use the `jmp` instruction for branching logic and conditional loops.

## 7. Best Practices
- **Sequence Numbering**: Ensure that `seq_no` values are unique, sequential integers within the plan.
- **Variable Naming**: Use descriptive variable names to make the plan readable and maintainable.
- **Control Flow**: Use `jmp` instructions to create conditional logic, manage execution flow, and implement loops effectively.
- **Final answer**: The name of output var of The last instruction MUST be "final_answer".
- **Language Consistency**: All the instructions (e.g. `llm_generate`) that directly contribute to generating the `final_answer` must be written in the same language as the Response Language (if not specified, use the same language of the goal). This ensures the final output is consistent with the intended language.

- **Instruction type selection**: Available instruction types:[assign, jmp, calling].

- **Avoid variable dependencies within a single "assign" instruction**：Since the order of variable assignments within an "assign" instruction is not defined, do not rely on one variable being assigned before another within the same instruction. Instead, split assignments across multiple instructions if one depends on another. For example, this is incorrect:

```json
{{
  "seq_no": 3,
  "type": "assign",
  "parameters": {{
    "y": "${{x}}",
    "x": 10
  }}
}}
```

"y" might end up being undefined because we cannot guarantee that "x" will be set first. The correct approach is to split them:

```json
{{
  "seq_no": 3,
  "type": "assign",
  "parameters": {{
    "x": 10
  }}
}},
{{
  "seq_no": 4,
  "type": "assign",
  "parameters": {{
    "y": "${{x}}"
  }}
}}
```

- **Best Practices for Information Retrieval - Combining Knowledge Graph Search and Vector Search**:
  - Dual Retrieval: When retrieving information, utilize both Knowledge Graph Search and Vector Search simultaneously. This combination enhances the richness of the information by leveraging the structured data from the knowledge graph and the detailed insights from vector search.
  - Unified Summarization: After retrieving data from both tools, use an LLM generation tool to summarize the knowledge related to the query. Avoid directly using the loose data returned by the two tools; instead, ensure all retrieved information is processed through the LLM generation tool to create a coherent and well-structured final answer.
  - Tool Integration: Ensure that raw data retrieved from both Knowledge Graph Search and Vector Search is exclusively processed by the LLM generation tool. Do not pass this data to other tools, as doing so may result in an unreadable final answer or prevent other tools from effectively processing the data. This practice maintains the coherence, integrity, and quality of the final response.
  - Maintain Coherence: By processing all retrieved data through the LLM generation tool, you ensure that the final answer is a cohesive, single-language narrative. This avoids the inclusion of raw or fragmented data that could compromise the readability and consistency of the response.

- **Final Answer Alignment**:
  - **Goal-Centric Generation**: Ensure that the generated `final_answer` directly addresses the question or objective outlined in the goal. The `final_answer` should be focused and relevant to the goal and avoid general response.
  - **Contextual Consistency**: Since the tools in the plan (e.g., `llm_generate`) do not aware the goal, include the goal context when making tool calls if necessary. Maintain the alignment between the goal and all intermediate steps leading to the `final_answer`. This ensures that every instruction and tool interaction contributes towards achieving the desired outcome.
  - **Avoid Divergence**: Prevent the generation of information that, while relevant, does not serve to answer the primary goal. All synthesized and summarized data should reinforce the goal-centric `final_answer`.

## 8. Common Errors

**Case 1: Querying Specific Runtime/Environment Information**

**Error Example:**
```json
{{
  "seq_no": 1,
  "type": "calling",
  "parameters": {{
    "tool_name": "tool_name",
    "tool_params": {{
      "query": "Determin the current version of ..."
    }},
    "output_vars": [...]
  }}
}}
```

```json
{{
  "parameters": {{
    "output_vars": [
      "slow_query_log_explanation",
      "sample_slow_query_log"
    ],
    "tool_name": "llm_generate",
    "tool_params": {{
      "context": null,
      "prompt": "Please analyze the sql query: `SELECT * FROM INFORMATION_SCHEMA.SLOW_QUERY ORDER BY start_time DESC LIMIT 10;`. Explain the slow query and its relevant details(at least contain 'query', 'start_time', 'duration', 'plan_digest').\n\nPlease ensure that the generated text uses English."
    }}
  }},
  "seq_no": 2,
  "type": "calling"
}}
```

**Error Explanation**:

- **Not allowed to execute SQL**: Please do not use any tools, such as llm_generate, to attempt to obtain SQL execution results.
- **Do Not Assume Specific Environment Information**: Do not make assumptions about (or generate) specific details of the environment, such as their current system configuration, current versions of tidb, current tiup version, or private data. Plans should be designed to be adaptable and not rely on presumed specific environment information.
- **Avoid Obtain Specific Data with General Tools**: General tools like `retrieve_knowledge_graph`, `vector_search` and `llm_generate` can only access public documentation and general knowledge. They cannot access:
  - Current system configuration
  - Current version
  - Cluster status
  - Any private or runtime information
  Such specific environment information can only be obtained through specialized tools explicitly designed for that purpose, or should be provided by the user as part of their query.


## 9. Available Tools for `calling` instruction


Please use only the following tools in Calling Instruction:

### retrieve_knowledge_graph


    Retrieves TiDB related information from a knowledge graph, returning nodes and relationships between those nodes.

    This tool is designed to extract structured knowledge about TiDB from a knowledge graph. It excels at identifying entities and relationships, providing a rich context of interconnected information.

    Arguments:
      - `query`: The query string. This should be a question or statement about TiDB entities, concepts, or their relationships. Can be a direct string or a variable reference.

    Output:
      - Returns a single dictionary (`Dict`) representing the retrieved knowledge graph data. This dictionary contains a complex structure representing nodes and the relationships between them, extracted from the knowledge graph. 
        **Important:** The raw output of this tool, a complex dictionary representing graph data, is **not intended for direct use in the final answer.**  The knowledge graph data is returned in a structured format that requires further processing to be presented in a user-friendly and coherent manner.


    Best practices:
    - **Prioritize for Information Retrieval:** For most information retrieval tasks related to TiDB knowledge, the `retrieve_knowledge_graph` tool should be your **first choice**.  Consider using `retrieve_knowledge_graph` and `vector_search` together with the **same query** to retrieve complementary information and increase the richness of results.  `vector_search` can be used as a secondary option when graph-based knowledge is insufficient.
    - **Refine and Synthesize with `llm_generate`:** After retrieving information using `retrieve_knowledge_graph` (and optionally `vector_search`), **always** process the raw output using the `llm_generate` tool.  Use `llm_generate` to refine, summarize, and synthesize the retrieved knowledge graph data (and document snippets if using `vector_search` as well) into a concise and user-friendly answer. Do **not** directly use the raw output in the `final_answer`.
    - **Focus Queries on General TiDB Knowledge:**  Target your queries towards general, shared knowledge about TiDB concepts and relationships.  Avoid queries that are specific to a user's environment or seek private data like configurations or versions, which is out of scope of this tool.
    

### llm_generate


    Generates a response using the Language Model (LLM).

    This tool must be used within a "calling" instruction in the plan.

    Arguments:
    - `prompt`: The prompt to provide to the LLM. Can be a direct string or a variable reference.
        - **Language Matching**: Write the prompt in the same language as the goal.
        - **Language Confirmation**: Append a sentence to confirm the desired language of the generated text:
            - *For English goals*: "Please ensure that the generated text uses English."
            - *For Chinese goals*: "请确保生成的文本使用中文。"
            - *For Japanese goals*: "Please ensure that the generated text uses Japanese."
    - `context` (optional): Additional context for the LLM. Can be a direct string or a variable reference.

    Output: The output format (text or JSON) depends on your instructions.
    - Text Response: If you ask for a text answer, let output_vars be an array containing one variable name. The entire text response will be stored under this variable.
    - JSON Response: If you instruct the LLM to respond in JSON format, let output_vars be an array containing variable names that match the keys in the JSON response. Each variable name corresponds to a key in the JSON object, and the value associated with each key is stored under the corresponding variable name.

    Example usage in a plan:
    ```json
    {{
        "seq_no": 1,
        "type": "calling",
        "parameters": {{
            "tool_name": "llm_generate",
            "tool_params": {{
                "prompt": "Analyze the sales data and provide summary and insights, response a json object including keys ['summary', 'insights'].",
                "context": "${{sales_data}}"
            }},
            "output_vars": ["summary", "insights"]
        }}
    }}
    ```

    Best practices:
    - Always use llm_generate within a "calling" instruction in your plan.
    - Use variable references (${{variable_name}}) when you need to include dynamic content from previous steps.
    

### vector_search


    Retrieves the most relevant snippets of TiDB documentation based on embedding similarity to your query.

    This tool leverages vector embeddings to find document fragments from TiDB documentation that are most semantically similar to your query. It excels at finding relevant document snippets that provide rich context and detailed information.

    Arguments:
      - `query`: The query string. It should be a clear and simple statement or question, focusing on a single objective for best results.
      - `top_k`: The number of top document snippets to retrieve. Must be an integer or a variable referencing an integer.

    Output:
      - Returns a list of dictionaries (`List[Dict]`). Each dictionary represents a retrieved document chunk and contains information about the chunk (e.g., content, source). **Important:** The raw output of this tool, a list of dictionaries, is **not intended for direct use in the final answer.** The document chunks are returned as individual fragments and require further processing to form a coherent response.


    Example to call this tool:

    **Example:**
    ```json
    {{
        "seq_no": 3,
        "type": "calling",
        "parameters": {{
            "tool_name": "vector_search",
            "tool_params": {{
                "query": "Information about ...",
                "top_k": 10
            }},
            "output_vars": ["embedded_chunks"]
        }}
    }}
    ```

    Best practices:
      - **Process Output with `llm_generate`:**  The `vector_search` tool returns a list of document chunks. **Always** process this raw output using the `llm_generate` tool to summarize, synthesize, and refine the information into a coherent answer before using it in the final response.  Do **not** directly use the raw `vector_search` output in the `final_answer`.
      - **Use Clear, Focused Queries:** For the best search results, ensure your query is clear, concise, and focuses on a **single**, specific question or objective. Avoid multi-part or ambiguous queries.

-------------------------------

Now, let's generate the plan.

1. **Analyze the Request**:
   - Determine the primary intent behind the goal.
   - Identify any implicit requirements or necessary prerequisites.

2. **Break Down the Goal**:
   - Decompose the goal into smaller, manageable sub-goals or tasks.
   - Ensure each sub-goal is specific, actionable, and can be addressed with existing tools or data sources.
   - Identify dependencies between sub-goals to establish the correct execution order.

3. **Generate an Action Plan**:
   - For each sub-goal, create a corresponding action step to achieve it.
   - Ensure the plan follows the VM Specification.
   - Include a 'reasoning' step at the beginning of the plan that outlines the chain of thought and dependency analysis of the steps.
   - IMPORTANT: Always use tools within "calling" instructions. Never use tool functions directly in the plan.

4. **Tool Usage Guidelines**:
   - When using a tool, always wrap it in a "calling" instruction.
   - For calling instruction, Only select tools listed in the "Available Tools" section. Using tools outside this list will cause the plan to fail.
   - Ensure that the "tool_params" object contains all necessary parameters for the specific tool being called.

The final step of the plan must be assign the final output result to the 'final_answer' variable.
You should response in the following format:

<think>...</think>
```json
[
  {{
    "seq_no": 0,
    ...
  }},
  ...
]
```

where <think> is your detailed reasoning process in text format and the JSON array after <think> is a valid plan.

### Response:
<think>{}"""

In [7]:
goal = "what is tidb?"

FastLanguageModel.for_inference(model) 
inputs = tokenizer([prompt_style.format(goal, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>
Alright, let's tackle the goal: What is TiDB? I need to create a detailed action plan using the provided specifications. First, I should understand what TiDB is. TiDB is an open-source, real-time, in-memory, distributed database designed for handling complex queries and large datasets efficiently. It's built on top of TiDB's knowledge graph and vector search capabilities, making it highly scalable and suitable for applications requiring real-time analytics and machine learning.

To achieve this, I'll break down the goal into manageable steps. The primary task is to retrieve information about TiDB using the available tools:retrieve_knowledge_graph and vector_search. Then, process this information to generate a coherent summary using llm_generate.

1. **Retrieve Knowledge Graph Data**: Use retrieve_knowledge_graph to get structured information about TiDB. This will provide a comprehensive overview of TiDB's features, components, and use cases.

2. **Retrieve Vector Search Snippe

In [8]:
train_prompt_style = """Your task is to generate a detailed action plan to achieve the following goal:

Goal: {}

--------------------------------

**MUST follow the Specification**:
# Specification for Generating Executable Plans for the Virtual Machine (VM)

## Table of Contents
1. Overview of the VM
2. Instruction Format
3. Supported Instructions
4. Parameters and Variable References
5. Variables and Dependencies
6. Plan Structure
7. Best Practices
8. Common Errors
9. Available Tools for calling instruction
10. Example Plan

## 1. Overview of the VM
The VM executes plans consisting of a sequence of instructions. Each instruction performs a specific operation and may interact with variables stored in a variable store. The VM supports conditional execution and can handle dependencies between instructions through variable assignments and references.

### Key features:
- **Variable Store**: A key-value store where variables are stored and accessed by name.
- **Instruction Execution**: Instructions are executed sequentially unless control flow is altered by conditional statements.

## 2. Instruction Format
Each instruction in the plan is represented as a JSON object with the following keys:

- `seq_no`: A unique and AUTO-INCREMENT integer identifying the instruction's sequence within the plan, starting from 0.
- `type`: A string indicating the instruction type. See Supported Instructions.
- `parameters`: An object containing parameters required by the instruction.

```json
{{
  "seq_no": N,
  "type": "instruction_type",
  "parameters": {{
    "param1": "value_or_variable_reference",
    "param2": "value_or_variable_reference"
  }}
}}
```

## 3. Supported Instructions
### 3.1 assign
- **Purpose**: Assigns values to one or more variables.
- **Parameters**: An object where each key is a variable name. Each value can be:
  1. A direct value (number/string).
  2. A reference to an existing variable: use the syntax "${{variable_name}}".
  3. A template string that interpolates variables for string concatenation.
     - Example: "The reason is: ${{reason}}, and the solution is: ${{solution}}"
  4. A basic arithmetic expression involving numeric variables:
     - Supported operators: +, -, *, /, ** (pow), % (mod), unary +/-
     - Example: "${{var0}} / 3 + ${{var1}}"

The VM will:
1. Replace each "${{varName}}" with the current value of varName.
2. If the result is a pure numeric expression (e.g., 2+3, 5*6, or referencing numeric variables), it will be evaluated as a number.
3. If the result is a string with placeholders, it becomes a string concatenation or template filling.
4. Assign the final computed result back to the target variable(s).


**Examples:**

1. Direct Assignment
   ```json
   {{
     "seq_no": 0,
     "type": "assign",
     "parameters": {{
       "constant_number": 42,
       "message": "Hello World"
     }}
   }}
   ```

2. Template/String Interpolation
   ```json
   {{
     "seq_no": 1,
     "type": "assign",
     "parameters": {{
       "recommended_solution": "Reason: ${{reason}}\nSolution: ${{solution}}"
     }}
   }}
   ```

3. Basic Arithmetic
   ```json
   {{
     "seq_no": 2,
     "type": "assign",
     "parameters": {{
       "calculated_result": "${{num1}} + ${{num2}} / 3"
     }}
   }}
   ```

### 3.2 jmp
- **Purpose**: Jumps to a specified sequence number based on an optional condition.
- **Parameters**:
  - `condition_prompt` (optional): The prompt to evaluate the condition. If provided, the LLM evaluates whether to jump. **Must respond with a JSON object in the following format:**
    ```json
    {{
      "result": boolean,
      "explanation": string
    }}
    ```
  - `context` (optional): Additional context for the LLM. Can be a direct string or a variable reference.
  - `jump_if_true`: The `seq_no` to jump to if the condition evaluates to true. Required if `condition_prompt` is provided.
  - `jump_if_false` (optional): The `seq_no` to jump to if the condition evaluates to false. Required if `condition_prompt` is provided.
  - `target_seq` (optional): The `seq_no` to jump to if no condition is provided (unconditional jump).

**Example (Conditional Jump):**
```json
{{
  "seq_no": 4,
  "type": "jmp",
  "parameters": {{
    "condition_prompt": "Is ${{number}} even? Respond with a JSON object in the following format:\n{{\n  \"result\": boolean,\n  \"explanation\": string\n}}\nWhere 'result' is true if the number is even, false otherwise, and 'explanation' provides a brief reason for the result.",
    "context": null,
    "jump_if_true": 5,
    "jump_if_false": 6
  }}
}}
```

**Example (Unconditional Jump):**
```json
{{
  "seq_no": 5,
  "type": "jmp",
  "parameters": {{
    "target_seq": 7
  }}
}}
```

### 3.3 calling
- **Purpose**: Invokes a specific tool or function with the provided parameters.
- **Parameters**: Defines the specifications required to call a tool.
  - `tool_name`: The name of the tool to be called for `calling` instruction.
  - `tool_params`: An object containing key-value pairs that represent the arguments required by the specified tool.
    - Keys: Must match the argument names expected by the tool.
    - Values: Can be either a direct value or a variable reference.
  - `output_vars` (optional): An array specifying how the tool's output should be stored in the VM's variable store for later use.
    - If it is a string: The array contains one variable name. The entire tool's response is stored under this variable name.
    - If it is an array: The array contains variable names corresponding to the keys in the JSON response. Each variable name in the array maps to a key in the JSON object, and the value associated with each key will be extracted and stored under the corresponding variable name.

The structure of `calling` instruction:

```json
{{
  "seq_no": <unique_sequential_number>,
  "type": "calling",
  "parameters": {{
    "tool_name": "<tool_name>",
    "tool_params": {{
      <tool-specific parameters>
    }},
    "output_vars": [<list_of_output_variable_names>]
  }}
}}
```

**Example:**
```json
{{
  "seq_no": 1,
  "type": "calling",
  "parameters": {{
    "tool_name": "tool_name",
    "tool_params": {{
      "param1": "value_or_variable_reference",
      "param2": "value_or_variable_reference"
    }},
    "output_vars": ["variable_name_1", ...]
  }}
}}
```

## 4. Parameters and Variable References
Parameters can be either direct values or variable references. To reference a variable, use the format `${{variable_name}}`.

- **Direct Values** are used when you clearly know the corresponding parameter values. These values do not depend on the results of other instructions, ensuring clarity and simplicity. Using direct values helps improve query readability and maintainability, especially in scenarios where parameters do not need to change dynamically.

- **Variable References** are ideal for scenarios that require dynamic parameter value filling, enhancing the interconnectivity and data flow between instructions. By using variable references, parameters can be adjusted dynamically based on the results of previous steps, increasing the flexibility and automation of the workflow.

- **Don't Use Math Expressions in Parameters and tool_params**: The VM does not have the capability to compute or parse expressions within parameters. It can only perform simple reference substitutions. For example, avoid using expressions like value1 + value2 or value * 2 within parameters, and instead, calculate these values explicitly in a prior step and refer to the result in the parameter.


**Direct Value Example:**
```json
{{
  "seq_no": 1,
  "type": "calling",
  "parameters": {{
    "tool_name": "retrieve_knowledge_graph",
    "tool_params": {{
      "query": "TiDB latest stable version"
    }},
    "output_vars": ["latest_tidb_version_info"]
  }}
}}
```

**Variable Reference Example:**
```json
{{
  "seq_no": 4,
  "type": "calling",
  "parameters": {{
    "tool_name": "vector_search",
    "tool_params": {{
      "query": "What are the key features and improvements in TiDB version ${{latest_stable_tidb_version}}?",
      "top_k": 10
    }},
    "output_vars": ["tidb_key_features_and_improvements"]
  }}
}}
```

## 5. Variables and Dependencies
- **Variable Assignment**: Use the `assign` instruction or specify an `output_vars` in a `calling` instruction that produces outputs.
- **Variable Access**: Reference variables in parameters using the variable reference format.
- **Dependencies**: Manage dependencies by assigning outputs to variables and referencing them in subsequent instructions.

## 6. Plan Structure
- **Sequential Execution**: Instructions are executed in order based on their `seq_no`.
- **Control Flow**: Use the `jmp` instruction for branching logic and conditional loops.

## 7. Best Practices
- **Sequence Numbering**: Ensure that `seq_no` values are unique, sequential integers within the plan.
- **Variable Naming**: Use descriptive variable names to make the plan readable and maintainable.
- **Control Flow**: Use `jmp` instructions to create conditional logic, manage execution flow, and implement loops effectively.
- **Final answer**: The name of output var of The last instruction MUST be "final_answer".
- **Language Consistency**: All the instructions (e.g. `llm_generate`) that directly contribute to generating the `final_answer` must be written in the same language as the Response Language (if not specified, use the same language of the goal). This ensures the final output is consistent with the intended language.

- **Instruction type selection**: Available instruction types:[assign, jmp, calling].

- **Avoid variable dependencies within a single "assign" instruction**：Since the order of variable assignments within an "assign" instruction is not defined, do not rely on one variable being assigned before another within the same instruction. Instead, split assignments across multiple instructions if one depends on another. For example, this is incorrect:

```json
{{
  "seq_no": 3,
  "type": "assign",
  "parameters": {{
    "y": "${{x}}",
    "x": 10
  }}
}}
```

"y" might end up being undefined because we cannot guarantee that "x" will be set first. The correct approach is to split them:

```json
{{
  "seq_no": 3,
  "type": "assign",
  "parameters": {{
    "x": 10
  }}
}},
{{
  "seq_no": 4,
  "type": "assign",
  "parameters": {{
    "y": "${{x}}"
  }}
}}
```

- **Best Practices for Information Retrieval - Combining Knowledge Graph Search and Vector Search**:
  - Dual Retrieval: When retrieving information, utilize both Knowledge Graph Search and Vector Search simultaneously. This combination enhances the richness of the information by leveraging the structured data from the knowledge graph and the detailed insights from vector search.
  - Unified Summarization: After retrieving data from both tools, use an LLM generation tool to summarize the knowledge related to the query. Avoid directly using the loose data returned by the two tools; instead, ensure all retrieved information is processed through the LLM generation tool to create a coherent and well-structured final answer.
  - Tool Integration: Ensure that raw data retrieved from both Knowledge Graph Search and Vector Search is exclusively processed by the LLM generation tool. Do not pass this data to other tools, as doing so may result in an unreadable final answer or prevent other tools from effectively processing the data. This practice maintains the coherence, integrity, and quality of the final response.
  - Maintain Coherence: By processing all retrieved data through the LLM generation tool, you ensure that the final answer is a cohesive, single-language narrative. This avoids the inclusion of raw or fragmented data that could compromise the readability and consistency of the response.

- **Final Answer Alignment**:
  - **Goal-Centric Generation**: Ensure that the generated `final_answer` directly addresses the question or objective outlined in the goal. The `final_answer` should be focused and relevant to the goal and avoid general response.
  - **Contextual Consistency**: Since the tools in the plan (e.g., `llm_generate`) do not aware the goal, include the goal context when making tool calls if necessary. Maintain the alignment between the goal and all intermediate steps leading to the `final_answer`. This ensures that every instruction and tool interaction contributes towards achieving the desired outcome.
  - **Avoid Divergence**: Prevent the generation of information that, while relevant, does not serve to answer the primary goal. All synthesized and summarized data should reinforce the goal-centric `final_answer`.

## 8. Common Errors

**Case 1: Querying Specific Runtime/Environment Information**

**Error Example:**
```json
{{
  "seq_no": 1,
  "type": "calling",
  "parameters": {{
    "tool_name": "tool_name",
    "tool_params": {{
      "query": "Determin the current version of ..."
    }},
    "output_vars": [...]
  }}
}}
```

```json
{{
  "parameters": {{
    "output_vars": [
      "slow_query_log_explanation",
      "sample_slow_query_log"
    ],
    "tool_name": "llm_generate",
    "tool_params": {{
      "context": null,
      "prompt": "Please analyze the sql query: `SELECT * FROM INFORMATION_SCHEMA.SLOW_QUERY ORDER BY start_time DESC LIMIT 10;`. Explain the slow query and its relevant details(at least contain 'query', 'start_time', 'duration', 'plan_digest').\n\nPlease ensure that the generated text uses English."
    }}
  }},
  "seq_no": 2,
  "type": "calling"
}}
```

**Error Explanation**:

- **Not allowed to execute SQL**: Please do not use any tools, such as llm_generate, to attempt to obtain SQL execution results.
- **Do Not Assume Specific Environment Information**: Do not make assumptions about (or generate) specific details of the environment, such as their current system configuration, current versions of tidb, current tiup version, or private data. Plans should be designed to be adaptable and not rely on presumed specific environment information.
- **Avoid Obtain Specific Data with General Tools**: General tools like `retrieve_knowledge_graph`, `vector_search` and `llm_generate` can only access public documentation and general knowledge. They cannot access:
  - Current system configuration
  - Current version
  - Cluster status
  - Any private or runtime information
  Such specific environment information can only be obtained through specialized tools explicitly designed for that purpose, or should be provided by the user as part of their query.


## 9. Available Tools for `calling` instruction


Please use only the following tools in Calling Instruction:

### retrieve_knowledge_graph


    Retrieves TiDB related information from a knowledge graph, returning nodes and relationships between those nodes.

    This tool is designed to extract structured knowledge about TiDB from a knowledge graph. It excels at identifying entities and relationships, providing a rich context of interconnected information.

    Arguments:
      - `query`: The query string. This should be a question or statement about TiDB entities, concepts, or their relationships. Can be a direct string or a variable reference.

    Output:
      - Returns a single dictionary (`Dict`) representing the retrieved knowledge graph data. This dictionary contains a complex structure representing nodes and the relationships between them, extracted from the knowledge graph. 
        **Important:** The raw output of this tool, a complex dictionary representing graph data, is **not intended for direct use in the final answer.**  The knowledge graph data is returned in a structured format that requires further processing to be presented in a user-friendly and coherent manner.


    Best practices:
    - **Prioritize for Information Retrieval:** For most information retrieval tasks related to TiDB knowledge, the `retrieve_knowledge_graph` tool should be your **first choice**.  Consider using `retrieve_knowledge_graph` and `vector_search` together with the **same query** to retrieve complementary information and increase the richness of results.  `vector_search` can be used as a secondary option when graph-based knowledge is insufficient.
    - **Refine and Synthesize with `llm_generate`:** After retrieving information using `retrieve_knowledge_graph` (and optionally `vector_search`), **always** process the raw output using the `llm_generate` tool.  Use `llm_generate` to refine, summarize, and synthesize the retrieved knowledge graph data (and document snippets if using `vector_search` as well) into a concise and user-friendly answer. Do **not** directly use the raw output in the `final_answer`.
    - **Focus Queries on General TiDB Knowledge:**  Target your queries towards general, shared knowledge about TiDB concepts and relationships.  Avoid queries that are specific to a user's environment or seek private data like configurations or versions, which is out of scope of this tool.
    

### llm_generate


    Generates a response using the Language Model (LLM).

    This tool must be used within a "calling" instruction in the plan.

    Arguments:
    - `prompt`: The prompt to provide to the LLM. Can be a direct string or a variable reference.
        - **Language Matching**: Write the prompt in the same language as the goal.
        - **Language Confirmation**: Append a sentence to confirm the desired language of the generated text:
            - *For English goals*: "Please ensure that the generated text uses English."
            - *For Chinese goals*: "请确保生成的文本使用中文。"
            - *For Japanese goals*: "Please ensure that the generated text uses Japanese."
    - `context` (optional): Additional context for the LLM. Can be a direct string or a variable reference.

    Output: The output format (text or JSON) depends on your instructions.
    - Text Response: If you ask for a text answer, let output_vars be an array containing one variable name. The entire text response will be stored under this variable.
    - JSON Response: If you instruct the LLM to respond in JSON format, let output_vars be an array containing variable names that match the keys in the JSON response. Each variable name corresponds to a key in the JSON object, and the value associated with each key is stored under the corresponding variable name.

    Example usage in a plan:
    ```json
    {{
        "seq_no": 1,
        "type": "calling",
        "parameters": {{
            "tool_name": "llm_generate",
            "tool_params": {{
                "prompt": "Analyze the sales data and provide summary and insights, response a json object including keys ['summary', 'insights'].",
                "context": "${{sales_data}}"
            }},
            "output_vars": ["summary", "insights"]
        }}
    }}
    ```

    Best practices:
    - Always use llm_generate within a "calling" instruction in your plan.
    - Use variable references (${{variable_name}}) when you need to include dynamic content from previous steps.
    

### vector_search


    Retrieves the most relevant snippets of TiDB documentation based on embedding similarity to your query.

    This tool leverages vector embeddings to find document fragments from TiDB documentation that are most semantically similar to your query. It excels at finding relevant document snippets that provide rich context and detailed information.

    Arguments:
      - `query`: The query string. It should be a clear and simple statement or question, focusing on a single objective for best results.
      - `top_k`: The number of top document snippets to retrieve. Must be an integer or a variable referencing an integer.

    Output:
      - Returns a list of dictionaries (`List[Dict]`). Each dictionary represents a retrieved document chunk and contains information about the chunk (e.g., content, source). **Important:** The raw output of this tool, a list of dictionaries, is **not intended for direct use in the final answer.** The document chunks are returned as individual fragments and require further processing to form a coherent response.


    Example to call this tool:

    **Example:**
    ```json
    {{
        "seq_no": 3,
        "type": "calling",
        "parameters": {{
            "tool_name": "vector_search",
            "tool_params": {{
                "query": "Information about ...",
                "top_k": 10
            }},
            "output_vars": ["embedded_chunks"]
        }}
    }}
    ```

    Best practices:
      - **Process Output with `llm_generate`:**  The `vector_search` tool returns a list of document chunks. **Always** process this raw output using the `llm_generate` tool to summarize, synthesize, and refine the information into a coherent answer before using it in the final response.  Do **not** directly use the raw `vector_search` output in the `final_answer`.
      - **Use Clear, Focused Queries:** For the best search results, ensure your query is clear, concise, and focuses on a **single**, specific question or objective. Avoid multi-part or ambiguous queries.

-------------------------------

Now, let's generate the plan.

1. **Analyze the Request**:
   - Determine the primary intent behind the goal.
   - Identify any implicit requirements or necessary prerequisites.

2. **Break Down the Goal**:
   - Decompose the goal into smaller, manageable sub-goals or tasks.
   - Ensure each sub-goal is specific, actionable, and can be addressed with existing tools or data sources.
   - Identify dependencies between sub-goals to establish the correct execution order.

3. **Generate an Action Plan**:
   - For each sub-goal, create a corresponding action step to achieve it.
   - Ensure the plan follows the VM Specification.
   - Include a 'reasoning' step at the beginning of the plan that outlines the chain of thought and dependency analysis of the steps.
   - IMPORTANT: Always use tools within "calling" instructions. Never use tool functions directly in the plan.

4. **Tool Usage Guidelines**:
   - When using a tool, always wrap it in a "calling" instruction.
   - For calling instruction, Only select tools listed in the "Available Tools" section. Using tools outside this list will cause the plan to fail.
   - Ensure that the "tool_params" object contains all necessary parameters for the specific tool being called.

The final step of the plan must be assign the final output result to the 'final_answer' variable.
You should response in the following format:

<think>...</think>
```json
[
  {{
    "seq_no": 0,
    ...
  }},
  ...
]
```

where <think> is your detailed reasoning process in text format and the JSON array after <think> is a valid plan.

### Response:
<think>
{}
</think>
{}"""

In [9]:
import json
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN


def formatting_prompts_func(examples):
    goals = examples["goal"]
    metadatas = examples["metadata"]
    best_plans = examples["best_plan"]
    reasonings = examples["reasoning"]
    texts = []
    for goal, metadata, cot, plan in zip(goals, metadatas, reasonings, best_plans):
        metadata_json = json.loads(metadata)
        goal_description = describe_goal(goal, metadata_json)
        plan_json = json.loads(plan)
        plan_description = f"""```json{plan_json}```"""
        text = train_prompt_style.format(goal_description, cot, plan_description) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

In [10]:
from datasets import load_dataset
dataset = load_dataset("ianthereal-z/tidb_bot", split = "train[0:]",trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True)
dataset = dataset.train_test_split(test_size=0.1, seed=3407)  # 10% for validation
train_dataset = dataset["train"]
eval_dataset = dataset["test"]
print(train_dataset["text"][0])

Generating train split: 100%|██████████| 100/100 [00:00<00:00, 11265.32 examples/s]


Dataset({
    features: ['task_id', 'goal', 'metadata', 'best_plan', 'reasoning', 'final_answer'],
    num_rows: 100
})

In [13]:
FastLanguageModel.for_training(model)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.1.8 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


In [14]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        num_train_epochs = 3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=1e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=5,
        optim="adamw_8bit",
        weight_decay=0.1,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        evaluation_strategy="steps",
        eval_steps=5,
        save_strategy="steps",
        save_steps=5,
        load_best_model_at_end=True,
        metric_for_best_model="loss"
    ),
)

Map (num_proc=2): 100%|██████████| 100/100 [00:01<00:00, 55.30 examples/s]
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [15]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 100 | Num Epochs = 10
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
\        /    Total batch size = 16 | Total steps = 60
 "-____-"     Number of trainable parameters = 40,370,176


Step,Training Loss
5,1.799
10,1.7271
15,1.5423
20,1.4068
25,1.3071
30,1.1753
35,1.0621
40,0.9385
45,0.8184
50,0.6808


In [18]:
goal = "what is tidb?" 

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(goal, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])



<think>
Alright, let's break down the goal of finding tidb's latest stable version. The primary intent is to obtain the latest stable version of TiDB, which is a specific runtime/environment information. Since tools like vector_search and llm_generate cannot access current system configurations or runtime details, we must obtain this information through a specialized tool, or it should be provided by the user.

To achieve this, we can utilize the llm_generate tool within a "calling" instruction. The llm_generate tool can generate a response using the LLM, ensuring the answer is in the same language as the goal. Since the goal is in English, we'll use the English version of llm_generate.

We need to include a "reasoning" step at the beginning of the plan to outline the chain of thought and dependency analysis. The reasoning should explain that we cannot obtain specific environment information (like current system configuration or private data) using the tools provided. We must rely on 