# Datamapping Workflow: AI-Generated XSLT for M/TEXT

Generate XSLT 1.0 stylesheets that transform source XML data into M/TEXT datamapping structures using Large Language Models.

## M/TEXT Context

This workflow creates **XSLT transformations** that map external data sources into M/TEXT datamapping structures. In M/TEXT document generation:

1. You have a **BusinessData.datamodel** (target schema) defining the data structure for your template
2. You receive **source data** (XML) from external systems (CRM, ERP, databases)
3. You need an **XSLT transformation** to map source → BusinessData structure
4. M/TEXT uses this XSLT to transform incoming data before template rendering

**Traditional Approach:** Hand-write XSLT by analyzing both schemas (hours to days)  
**AI Approach:** LLM generates the XSLT from examples in seconds

## Who This Is For

**M/TEXT Developers**: Automate the tedious work of writing XSLT transformations when integrating new data sources with existing templates. Instead of manually mapping fields, provide example data and let the LLM generate the transformation rules.


## How It Works

The workflow follows this sequence:

1. **Input Validation**: Receive a datamodel XML (target schema using `<Node name="...">` notation) and a testcase XML (source data example)
2. **Prompt Composition**: Combine both XMLs with a strict system prompt defining output constraints
3. **LLM Generation**: Call Claude Sonnet 4, GPT-5, or Mistral Large to generate XSLT
4. **Extraction**: Parse the response to extract XSLT from markdown fences or prose
5. **Sanitization**: Apply guardrails to ensure well-formed output (close tags, add root template, escape illegal characters)
6. **Response**: Return validated XSLT ready for transformation

The entire flow typically completes in 5-15 seconds depending on model and complexity.


## Inputs and Requirements

### M/TEXT Resources Needed

1. **BusinessData.datamodel XML** (target schema)
   - Exported from M/TEXT Designer
   - Uses `<Node name="...">` notation to define the data structure
   - This is what M/TEXT templates expect as input
   - Example: `<DataModel><Node name="Partner"><Node name="PartnerID" type="String"/>...</Node></DataModel>`

2. **Source Data XML** (example testcase)
   - Sample XML from your external system
   - Represents the actual data structure you receive
   - Example: `<Partner><PartnerID>12345</PartnerID><UserID>jdoe</UserID></Partner>`

3. **Optional Instructions** (guidance for field mapping)
   - Short hints like "Ignore fax numbers" or "Prefer email over phone"
   - Helps the LLM prioritize which fields to map
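
The input validation step can be sketched with the standard library alone. The helper name `validate_inputs` and its return shape are assumptions made for illustration; the only checks shown are the ones implied above (both inputs parse as XML, and the datamodel uses `<Node name="...">` notation).

```python
import xml.etree.ElementTree as ET


def validate_inputs(datamodel_xml: str, testcase_xml: str) -> list[str]:
    """Return a list of problems; an empty list means both inputs look usable."""
    problems = []
    try:
        datamodel = ET.fromstring(datamodel_xml)
        # The target schema is expected to describe fields as <Node name="...">
        if not any(node.get("name") for node in datamodel.iter("Node")):
            problems.append("datamodel contains no <Node name=...> elements")
    except ET.ParseError as exc:
        problems.append(f"datamodel is not well-formed XML: {exc}")
    try:
        ET.fromstring(testcase_xml)
    except ET.ParseError as exc:
        problems.append(f"testcase is not well-formed XML: {exc}")
    return problems


datamodel = '<DataModel><Node name="Partner"><Node name="PartnerID" type="String"/></Node></DataModel>'
testcase = "<Partner><PartnerID>12345</PartnerID><UserID>jdoe</UserID></Partner>"
print(validate_inputs(datamodel, testcase))                # no problems expected
print(validate_inputs(datamodel, "<Partner><PartnerID>"))  # truncated testcase is reported
```

Rejecting malformed inputs before the LLM call avoids spending a request on data the model cannot map reliably.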

### Output

**XSLT 1.0 Stylesheet** that M/TEXT can use to transform source XML → BusinessData structure

The LLM generates **concise transformations** (12-15 fields) rather than exhaustive mappings, which is more reliable and easier to review/modify.


## System Prompt: The Core of the Pattern

### M/TEXT-Specific Constraints

The prompt teaches the LLM about M/TEXT's datamodel structure:
- **Datamodel uses `<Node name="...">`** as schema notation
- **Output must use element names** (PartnerID, Address), not `<Node>` tags
- **XSLT transforms source XML → M/TEXT BusinessData format**
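
To make the difference between schema notation and output concrete, the sketch below walks a datamodel and lists the element paths a generated stylesheet is expected to emit. It assumes the `<Node>` elements are direct children of their parent, as in the example datamodel; `target_paths` is a hypothetical helper, not part of M/TEXT.

```python
import xml.etree.ElementTree as ET


def target_paths(datamodel_xml: str) -> list[str]:
    """List the element paths implied by nested <Node name="..."> entries."""
    paths = []

    def walk(node, prefix):
        for child in node.findall("Node"):
            path = f"{prefix}/{child.get('name')}"
            paths.append(path)   # the @name value becomes the element name
            walk(child, path)    # descend into nested nodes

    walk(ET.fromstring(datamodel_xml), "")
    return paths


datamodel = (
    '<DataModel><Node name="Partner">'
    '<Node name="PartnerID" type="String"/>'
    '<Node name="UserID" type="String"/>'
    '</Node></DataModel>'
)
print(target_paths(datamodel))
# ['/Partner', '/Partner/PartnerID', '/Partner/UserID']
```

A list like this is also convenient for reviewing which target fields a generated stylesheet covers.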

### Key Design Principles

- **Completeness over coverage**: Generate 12-15 high-confidence fields rather than attempting all fields (prevents truncation and makes output reviewable)
- **Well-formedness guarantees**: Explicit rules about closing tags, escaping characters, valid XPath
- **Minimal prose**: No markdown fences or explanations - just the stylesheet
- **Self-contained templates**: Each field gets its own named template for clarity

### The Prompt (Copy-Paste Ready)


In [None]:
SYSTEM = """
You are an expert in XML and XSLT.

Goal:
- Produce a SHORT, WELL-FORMED XSLT 1.0 STYLESHEET that maps SOURCE = testcase XML to TARGET = datamodel structure.

Scaffolding-first output (prioritize completeness over coverage):
- Return only a single <xsl:stylesheet> with a root template (match="/") that constructs ONE root element (the server may wrap as <Result/> if needed).
- Include at most 12–15 of the HIGHEST-CONFIDENCE target fields to avoid truncation. It is OK to omit the rest.
- For each included field, create a self-contained named template and call it from the root in TARGET order.
- Template pattern (no placeholders with braces):
  <xsl:template name="TARGET_NAME">
    <xsl:element name="TARGET_NAME">
      <xsl:value-of select="SOURCE_XPATH"/>
    </xsl:element>
  </xsl:template>

Critical rules:
- Interpret the datamodel as a SCHEMA (uses <Node name="...">). DO NOT emit <Node> or <DataModel> in output—emit element names exactly as @name values (PartnerID, Address, Line, ...).
- NO prose, NO markdown fences, NO comments containing angle brackets.
- ABSOLUTELY close every <xsl:template> and every element. Avoid long lines; keep templates compact.
- Attribute values MUST NOT contain raw '<'. Inside XPath in select="..." or test="..." write &lt; (and preferably &gt; for '>').
- If unsure about an XPath, pick a safe, short absolute XPath from the testcase. Do not invent malformed constructs.

Reference style (structure only):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:call-template name="Partner"/>
  </xsl:template>
  <xsl:template name="Partner">
    <xsl:element name="Partner">
      <xsl:call-template name="Partner-PartnerID"/>
      <xsl:call-template name="Partner-UserID"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
"""



## User prompt (inputs + task)

We pass the datamodel and testcase as strings, along with optional instructions to bias the selection of fields.


In [None]:
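# Sample inputs for illustration, based on the examples above; replace with your own data
DATAMODEL_XML = '<DataModel><Node name="Partner"><Node name="PartnerID" type="String"/><Node name="UserID" type="String"/></Node></DataModel>'
TESTCASE_XML = "<Partner><PartnerID>12345</PartnerID><UserID>jdoe</UserID></Partner>"
INSTRUCTIONS = "Ignore fax numbers"

USER = f"""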
Datamodel (XML):
{DATAMODEL_XML}

Testcase (XML):
{TESTCASE_XML}

Optional instructions:
{INSTRUCTIONS}

Task: Produce a single valid XSLT 1.0 stylesheet that maps fields from the testcase (SOURCE) into the target structure defined by the datamodel (TARGET). Follow the reference style and return only the stylesheet.
""".strip()



## Implementation Pattern (Framework-Agnostic)

Here's the generic sequence for integrating this into any application:

1. **Receive inputs**: Datamodel XML, testcase XML, optional instructions (via API, file upload, or direct input)
2. **Compose prompts**: Combine the SYSTEM prompt above with your USER prompt containing the XMLs
3. **Call LLM**: Use your preferred provider's API (Claude, GPT, Mistral, etc.)
   - Timeout: 55-280s recommended (XSLT generation is fast, usually <15s)
   - Retries: 1 retry maximum (if it fails twice, the schemas are likely too complex)
4. **Extract XSLT**: Parse the response to get the `<xsl:stylesheet>...</xsl:stylesheet>` content
5. **Apply guardrails**: Run the sanitization logic (see next section) to ensure well-formedness
6. **Return result**: Provide the XSLT to the user for review/testing
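
The retry rule in step 3 can be sketched independently of any provider. In this sketch `call_llm` is an assumed zero-argument callable that wraps whatever client you use (with its own timeout configured) and returns the raw response text; `generate_with_retry` is a hypothetical name.

```python
from typing import Callable


def generate_with_retry(call_llm: Callable[[], str], max_retries: int = 1) -> str:
    """Call the model, retrying when the call fails or returns no stylesheet."""
    last_error = None
    attempts = 1 + max_retries
    for _ in range(attempts):
        try:
            text = call_llm()
            if "<xsl:stylesheet" in text:
                return text
            last_error = ValueError("response contains no <xsl:stylesheet>")
        except Exception as exc:  # provider errors, timeouts, and similar
            last_error = exc
    raise RuntimeError(f"XSLT generation failed after {attempts} attempts") from last_error


# Usage with a stand-in for a real provider call
print(generate_with_retry(lambda: "<xsl:stylesheet/>"))
```

Keeping the retry count at one matches the guidance above: a second failure usually points at the inputs rather than at a transient error.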




## Example Implementation (Python)

The examples below implement the pattern with two providers, Anthropic Claude and OpenAI GPT. The same structure applies to any other provider and framework, so adapt it to your stack.



In [None]:
# Example 1: Using Anthropic Claude (pip install anthropic)
import os
from anthropic import Anthropic

def generate_xslt_with_claude(datamodel_xml: str, testcase_xml: str, instructions: str = ""):
    """Generate M/TEXT XSLT using Claude"""
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    
    user_prompt = f"""Datamodel (XML):
{datamodel_xml}

Testcase (XML):
{testcase_xml}

Optional instructions:
{instructions}

Task: Produce a single valid XSLT 1.0 stylesheet that maps fields from the testcase (SOURCE) into the target structure defined by the datamodel (TARGET). Follow the reference style and return only the stylesheet."""
    
    response = client.messages.create(
        model="claude-sonnet-4-0",
        max_tokens=4000,
        system=SYSTEM,  # pass the system prompt separately instead of prepending it to the user turn
        messages=[{"role": "user", "content": user_prompt}]
    )
    
    return response.content[0].text

# Example 2: Using OpenAI GPT (pip install openai)
from openai import OpenAI

def generate_xslt_with_gpt(datamodel_xml: str, testcase_xml: str, instructions: str = ""):
    """Generate M/TEXT XSLT using GPT"""
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    user_prompt = f"""Datamodel (XML):
{datamodel_xml}

Testcase (XML):
{testcase_xml}

Optional instructions:
{instructions}

Task: Produce a single valid XSLT 1.0 stylesheet that maps fields from the testcase (SOURCE) into the target structure defined by the datamodel (TARGET). Follow the reference style and return only the stylesheet."""
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user_prompt}
        ]
    )
    
    return response.choices[0].message.content

# Usage example:
# xslt = generate_xslt_with_claude(DATAMODEL_XML, TESTCASE_XML, INSTRUCTIONS)
# Then apply sanitization (see next section) before using in M/TEXT

# Example output - what the LLM typically returns:
RAW_XSLT = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:element name="Result">
      <xsl:call-template name="Partner"/>
    </xsl:element>
  </xsl:template>
  <xsl:template name="Partner">
    <xsl:element name="Partner">
      <xsl:element name="PartnerID"><xsl:value-of select="/Partner/PartnerID"/></xsl:element>
      <xsl:element name="UserID"><xsl:value-of select="/Partner/UserID"/></xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>"""



## Minimal sanitization example (illustrative)

In the app, we ensure the stylesheet closes all tags and has a root template. Here we only show the idea.


In [None]:
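def sanitize_xslt(xslt: str) -> str:
    """Illustrative guardrail: close the stylesheet and ensure a root template exists."""
    s = xslt.strip()
    # Append the closing tag if the model output was cut off
    if not s.endswith("</xsl:stylesheet>"):
        s += "\n</xsl:stylesheet>"
    # Add a root template if missing (accept either quote style)
    if 'match="/"' not in s and "match='/'" not in s:
        i = s.rfind("</xsl:stylesheet>")
        if i != -1:
            s = s[:i] + "\n<xsl:template match=\"/\"><xsl:element name=\"Result\"/></xsl:template>\n" + s[i:]
    return s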

CLEAN_XSLT = sanitize_xslt(RAW_XSLT)
print(CLEAN_XSLT[:500] + ("..." if len(CLEAN_XSLT) > 500 else ""))
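
After sanitization, a quick parse confirms that the stylesheet is well-formed before it is returned. The following is a minimal sketch using only the standard library; `check_stylesheet` is a hypothetical helper, and it checks structure only, it does not execute the transformation.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def check_stylesheet(xslt: str) -> list[str]:
    """Return a list of structural problems; empty means the checks passed."""
    try:
        root = ET.fromstring(xslt)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    if root.tag != f"{{{XSL_NS}}}stylesheet":
        problems.append("root element is not xsl:stylesheet")
    templates = root.findall(f"{{{XSL_NS}}}template")
    if not any(t.get("match") == "/" for t in templates):
        problems.append('no template with match="/"')
    return problems


sample = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/"><Result/></xsl:template>
</xsl:stylesheet>"""
print(check_stylesheet(sample))                                  # passes both checks
print(check_stylesheet(sample.replace("</xsl:stylesheet>", "")))  # reports a parse error
```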


## Guardrails & Validation Logic

The backend applies several layers of post-processing to ensure production-ready output. These guardrails handle ~95% of common LLM errors:

### 1. XSLT Extraction
- Strips markdown code fences (` ```xml ` or ` ```xslt `)
- Extracts the `<xsl:stylesheet>...</xsl:stylesheet>` block from any surrounding prose
- Falls back to the raw text if it starts with `<xsl:` and contains the XSLT namespace
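
The three extraction rules could look roughly like this in Python. This is a sketch of the idea rather than the backend's actual code; `extract_xslt` and the two regular expressions are assumptions.

```python
import re
from typing import Optional

FENCE = "`" * 3  # markdown code fence marker
FENCE_RE = re.compile(FENCE + r"[a-zA-Z]*\n(.*?)" + FENCE, re.DOTALL)
STYLESHEET_RE = re.compile(r"<xsl:stylesheet\b.*?</xsl:stylesheet>", re.DOTALL)
XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def extract_xslt(response_text: str) -> Optional[str]:
    """Pull the stylesheet out of an LLM reply, or return None if none is found."""
    # 1. Prefer the content of a markdown fence when one is present
    fenced = FENCE_RE.search(response_text)
    text = fenced.group(1) if fenced else response_text
    # 2. Take the complete stylesheet element, ignoring surrounding prose
    match = STYLESHEET_RE.search(text)
    if match:
        return match.group(0)
    # 3. Fall back to raw text that looks like (possibly truncated) XSLT
    stripped = text.strip()
    if stripped.startswith("<xsl:") and XSL_NS in stripped:
        return stripped
    return None


reply = (
    "Here is the stylesheet:\n" + FENCE + "xml\n"
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">'
    "</xsl:stylesheet>\n" + FENCE + "\nLet me know if you need changes."
)
print(extract_xslt(reply))
```

The fallback in step 3 deliberately accepts truncated output so that the later guardrails get a chance to repair it.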

### 2. Tag Balancing
Count opening vs closing template tags and add missing closers:
```javascript
const openTemplates = (xslt.match(/<xsl:template\b/gi) || []).length;
const closeTemplates = (xslt.match(/<\/xsl:template>/gi) || []).length;
// Insert missing </xsl:template> tags before </xsl:stylesheet>
```
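
For Python backends, the same counting approach can be completed with the insertion step that the snippet above only describes in a comment. This is a sketch under the assumption that missing closers belong at the end of the stylesheet; `balance_templates` is a hypothetical name, and unclosed elements inside a template are not repaired.

```python
import re


def balance_templates(xslt: str) -> str:
    """Append missing </xsl:template> closers before </xsl:stylesheet>."""
    opened = len(re.findall(r"<xsl:template\b", xslt))
    closed = len(re.findall(r"</xsl:template>", xslt))
    missing = opened - closed
    if missing <= 0:
        return xslt
    closers = "</xsl:template>\n" * missing
    end = xslt.rfind("</xsl:stylesheet>")
    if end == -1:
        # The stylesheet itself was cut off: close templates, then the stylesheet
        return xslt + "\n" + closers + "</xsl:stylesheet>"
    return xslt[:end] + closers + xslt[end:]


broken = (
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">'
    '<xsl:template match="/"><Result/>'
    "</xsl:stylesheet>"
)
print(balance_templates(broken))
```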

### 3. Root Template Injection
If no `match="/"` template exists, add one that calls the first named template:
```xml
<xsl:template match="/">
  <xsl:call-template name="FirstNamedTemplate"/>
</xsl:template>
```
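
A Python sketch of this injection step follows. It assumes the first `name="..."` attribute found on an `<xsl:template>` identifies the entry point; `inject_root_template` is a hypothetical helper and leaves the input untouched when it cannot find a safe place to insert.

```python
import re

ROOT_RE = re.compile(r"""<xsl:template\b[^>]*\bmatch=["']/["']""")
NAMED_RE = re.compile(r"""<xsl:template\b[^>]*\bname=["']([^"']+)["']""")


def inject_root_template(xslt: str) -> str:
    """Add a match="/" template that calls the first named template, if needed."""
    if ROOT_RE.search(xslt):
        return xslt  # a root template already exists
    named = NAMED_RE.search(xslt)
    end = xslt.rfind("</xsl:stylesheet>")
    if named is None or end == -1:
        return xslt  # nothing safe to do
    root = (
        '<xsl:template match="/">\n'
        f'  <xsl:call-template name="{named.group(1)}"/>\n'
        "</xsl:template>\n"
    )
    return xslt[:end] + root + xslt[end:]


source = (
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">'
    '<xsl:template name="Partner"><Partner/></xsl:template>'
    "</xsl:stylesheet>"
)
print(inject_root_template(source))
```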

### 4. Character Escaping
Escape illegal `<` characters inside attribute values:
```javascript
xslt = xslt.replace(/="([^"]*<[^"]*)"/g, 
  (m, val) => '="' + val.replace(/</g, '&lt;') + '"'
);
```
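
The same replacement translates directly to Python. Like the JavaScript version, this sketch only handles double-quoted attribute values; `escape_lt_in_attributes` is a hypothetical name.

```python
import re

ATTR_WITH_LT = re.compile(r'="([^"]*<[^"]*)"')


def escape_lt_in_attributes(xslt: str) -> str:
    """Replace raw '<' inside double-quoted attribute values with &lt;."""
    def fix(match: re.Match) -> str:
        return '="' + match.group(1).replace("<", "&lt;") + '"'

    return ATTR_WITH_LT.sub(fix, xslt)


print(escape_lt_in_attributes('<xsl:if test="Amount < 100">low</xsl:if>'))
# <xsl:if test="Amount &lt; 100">low</xsl:if>
```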

### 5. Root Element Wrapping
Wraps the generated output in a single `<Result>` element so that the transformation result is a well-formed XML document with exactly one root element.
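
One way to apply the wrapping is to move the body of the root template into an `<xsl:element name="Result">`. The sketch below does this with `xml.etree.ElementTree`; it assumes the stylesheet is already well-formed and uses no namespace prefixes other than `xsl` (ElementTree may drop declarations it considers unused). `wrap_root_output` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)  # keep the familiar xsl: prefix when serializing


def wrap_root_output(xslt: str, wrapper: str = "Result") -> str:
    """Wrap the body of the match="/" template in a single wrapper element."""
    root = ET.fromstring(xslt)
    for template in root.findall(f"{{{XSL_NS}}}template"):
        if template.get("match") != "/":
            continue
        children = list(template)
        already_wrapped = (
            len(children) == 1
            and children[0].tag == f"{{{XSL_NS}}}element"
            and children[0].get("name") == wrapper
        )
        if already_wrapped:
            return xslt
        element = ET.Element(f"{{{XSL_NS}}}element", {"name": wrapper})
        for child in children:
            template.remove(child)
            element.append(child)
        template.append(element)
    return ET.tostring(root, encoding="unicode")


source = (
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">'
    '<xsl:template match="/">'
    '<xsl:call-template name="Partner"/><xsl:call-template name="Order"/>'
    "</xsl:template></xsl:stylesheet>"
)
print(wrap_root_output(source))
```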


## Adaptation Ideas for M/TEXT Projects

This LLM-generation pattern is highly adaptable for different M/TEXT integration scenarios:

### Different Source Data Formats

**JSON data sources** (REST APIs, web services):
- Convert JSON to XML first (simple transformation)
- Use this workflow to generate XSLT for XML → M/TEXT datamodel
- Alternative: Modify the system prompt to generate the JSON-to-XML conversion directly. XSLT 1.0 cannot read JSON, so that conversion has to be generated in another language.
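
The JSON-to-XML conversion step can be as small as the following sketch. It assumes that JSON keys are valid XML element names and that arrays become repeated elements; `json_to_xml` is a hypothetical helper, and real payloads may need extra handling for attributes, key sanitization, or type formatting.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, tag: str) -> ET.Element:
    """Convert a parsed JSON value into an element named `tag`."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if isinstance(child, list):
                # Arrays become repeated elements with the same name
                for item in child:
                    element.append(json_to_xml(item, key))
            else:
                element.append(json_to_xml(child, key))
    elif isinstance(value, list):
        # A bare list gets generic <item> children (an arbitrary choice)
        for item in value:
            element.append(json_to_xml(item, "item"))
    elif value is not None:
        element.text = str(value)
    return element


payload = json.loads('{"Partner": {"PartnerID": 12345, "UserID": "jdoe"}}')
root_tag, body = next(iter(payload.items()))
print(ET.tostring(json_to_xml(body, root_tag), encoding="unicode"))
# <Partner><PartnerID>12345</PartnerID><UserID>jdoe</UserID></Partner>
```

The resulting XML can then be used as the testcase input for the workflow described above.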


### M/TEXT-Specific Customizations

Add domain-specific rules to the system prompt:

```
Additional M/TEXT requirements:
- All dates must use M/TEXT's date format (YYYY-MM-DD)
- Phone numbers must include country code prefix
- Currency amounts must be in cents (no decimal point)
- Address fields must map to M/TEXT's Address node structure
```
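
Appending such rules programmatically keeps the base prompt unchanged across projects. A minimal sketch, in which `with_extra_rules` and the shortened `BASE_PROMPT` are stand-ins for illustration (in practice you would pass the full system prompt):

```python
def with_extra_rules(system_prompt: str, rules: list[str]) -> str:
    """Return the system prompt with project-specific rules appended."""
    if not rules:
        return system_prompt
    bullet_list = "\n".join(f"- {rule}" for rule in rules)
    return f"{system_prompt.rstrip()}\n\nAdditional M/TEXT requirements:\n{bullet_list}\n"


BASE_PROMPT = "You are an expert in XML and XSLT."  # stand-in for the full prompt
print(with_extra_rules(BASE_PROMPT, [
    "Phone numbers must include country code prefix",
    "Currency amounts must be in cents (no decimal point)",
]))
```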

### Complex M/TEXT Datamodels

For large M/TEXT datamodels (>50 fields):

**Option 1: Multi-Section Generation**
1. Split datamodel into logical sections (Customer, Order, Items)
2. Generate separate XSLT for each section
3. Combine XSLTs manually or with another LLM pass

**Option 2: Hierarchical Approach**
1. Generate high-level structure first (main nodes only)
2. Generate detail XSLTs for nested structures
3. Merge using `xsl:import` or `xsl:include`

**Option 3: Iterative Refinement**
1. Generate initial XSLT with instructions "Focus on customer data"
2. Generate second XSLT with "Focus on order line items"
3. Manually merge the best parts of each
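
For Option 1, splitting can be automated when the top-level `<Node>` elements already correspond to logical sections. The sketch below makes that assumption; `split_datamodel` is a hypothetical helper, and each resulting section would be sent through the workflow separately.

```python
import xml.etree.ElementTree as ET


def split_datamodel(datamodel_xml: str) -> dict[str, str]:
    """Return one small datamodel per top-level <Node>, keyed by its name."""
    root = ET.fromstring(datamodel_xml)
    sections = {}
    for node in root.findall("Node"):
        wrapper = ET.Element(root.tag, root.attrib)  # same root element as the original
        wrapper.append(node)
        sections[node.get("name")] = ET.tostring(wrapper, encoding="unicode")
    return sections


datamodel = (
    "<DataModel>"
    '<Node name="Customer"><Node name="CustomerID" type="String"/></Node>'
    '<Node name="Order"><Node name="OrderID" type="String"/></Node>'
    "</DataModel>"
)
for name, section in split_datamodel(datamodel).items():
    print(name, "->", section)
```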


The same LLM pattern works - just update the constraints in the system prompt.
