# Domain Specific Code Generation using the FormLang DSL

## Abstract

AI and LLM systems are being trained to perform a variety of complex tasks that require expertise in many software tools, languages and technology stacks. Challenges in this process include acquiring large amounts of high-quality data, the growing compute demands and costs that come with increasing model parameter counts, and emerging Data Privacy and Intellectual Property concerns related to using third-party cloud services for model training.

In this work we attempt to harness and combine the benefits of Abstraction and Determinism provided by Formal Domain Specific Languages (DSLs) with the innate ability of LLMs to learn new languages and their semantics. We propose a novel generation task called "Domain Specific Code Generation", which involves mapping user requests written in natural language to DSL code.

Using a specially crafted DSL called `FormLang` as a case study, we attempt to lay the groundwork for automated DSL dataset generation, training techniques and performance evaluation, with the end goal of creating an AI system capable of generating Web forms from a user request.

Our `FormLang` DSL expresses the semantics of Web forms using a simplified syntax that requires little, if any, Web programming knowledge or expertise.

Given an English prompt describing the desired form and its fields, the LLM produces syntactically valid `FormLang` code. This code is run through the accompanying FormLang parser and a hand-crafted React JSX compiler to produce the final implementation of the form in JavaScript and React.

The project includes a live demo which demonstrates the capabilities of the system.


## Referring to this work

If you use this work, please cite it as follows:

```bibtex
@misc{guyor2025dscodegenformlang,
      title={Domain Specific Code Generation using the FormLang DSL}, 
      author={Guy Or},
      year={2025}
}
```

The official repository of this work is hosted on GitHub at https://github.com/guyo13/Form-Lang **TBD** - Make the repo public

## Project Goals

* Define the task of Domain Specific Code Generation.
* Create an AI training pipeline for FormLang (implemented as a Jupyter notebook) which includes:
    * Automatic FormLang dataset generation using search algorithms and heuristics.
    * Baseline model selection and loading from Hugging Face.
    * Dataset preprocessing and loading.
    * Definition of performance KPIs for the system.
    * Model fine-tuning using the Transformers library.
    * Model adapter training using the PEFT library.
    * Model upload to the Hugging Face Hub and example usage from the Hub.
    * **(TBD)** Export to ONNX using Optimum and run on-device using Transformers.js.
* Create the “FormLang” language:
    * Describe the problem domain.
    * Define a minimal viable syntax and semantics, focused on research rather than completeness.
    * **(TBD)** Implement a “JavaScript React” compile target.
* **(TBD)** Create a live demo website:
    * **(TBD)** Users input a prompt.
    * **(TBD)** A FormLang editor is populated with the AI’s code generation results.
    * **(TBD)** The form is rendered alongside the generated code.
* Discuss the project results:
    * Performance and user acceptability.
    * **(TBD)** Viability of the implemented methods for the Domain Specific Code Generation task and their generalization to other domains.
    * **(TBD)** Potential enhancements to the system.
    * **(TBD)** Possible research directions for learning from user data.


**(TBD)** - Features marked as TBD depend on the project's progress and timeline constraints, as well as on proving the viability of the methods and system.
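The KPI item in the goals above is still open. As one possible starting point, here is a minimal sketch of two hypothetical metrics: the fraction of generated programs that a validator accepts, and whitespace-insensitive exact match against a reference program. The function names are illustrative and not part of the FormLang library; in this notebook the `is_valid` callable could wrap the FormLang parser together with `hasErrors`.

```python
from typing import Callable, Sequence


def syntactic_validity_rate(outputs: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of generated programs accepted by `is_valid` (hypothetical KPI)."""
    if not outputs:
        return 0.0
    return sum(1 for src in outputs if is_valid(src)) / len(outputs)


def _normalize(src: str) -> str:
    # Collapse all whitespace so formatting differences do not count as errors
    return " ".join(src.split())


def exact_match_rate(outputs: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of outputs identical to their reference up to whitespace (hypothetical KPI)."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not outputs:
        return 0.0
    hits = sum(_normalize(o) == _normalize(r) for o, r in zip(outputs, references))
    return hits / len(outputs)
```

Whitespace normalization is a deliberate simplification: two programs that differ only in the order of their fields would still count as a miss, so comparing parsed ASTs may be a better fit for a structural KPI.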

# Notebook Setup

## Project imports

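```python
import asyncio
from transformers import pipeline   # Hugging Face model inference
from huggingface_hub import login   # Hugging Face Hub authentication
import pythonmonkey as pm           # embeds the SpiderMonkey JavaScript engine in Python

# Load the compiled CommonJS build of the FormLang library
formlang_lib = pm.require("../out/cjs/lib/index")
```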

## Log in to the Hugging Face Hub

```python
# Prompts for a Hugging Face access token; in Jupyter this renders a login widget
login()
```


## PythonMonkey helpers

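```python
def js_dir(something):
    """Print a JavaScript value with console.dir (shows nested object structure)."""
    pm.globalThis.console.dir(something)

def js_log(something):
    """Print a JavaScript value with console.log."""
    pm.globalThis.console.log(something)
```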

# Dataset generation

### Example usage of the FormLang library

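```python
# Component declarations the generator can use as form containers
form_components = """
  component userDetailsContainer {}
  component formContainer {}
  component someOtherContainer {}
  component OtherContainer2 {}
"""
# Component declarations the generator can use for fields, with their props
field_components = """
  component myTextBox {
    props {
      textColor
      textSize
      textWeight
      borderColor
    }
  }
  component myCheckbox {
    props {
      size
    }
  }
  component otherTextBox {}
  component counter {
    props {
      style
    }
  }
"""

# Create a form generator with the library's default hyper-parameters
# (top-level await works inside Jupyter/IPython)
form_gen = await formlang_lib.newFormGen(formlang_lib.DEFAULT_GENERATOR_HYPER_PARAMETERS, form_components, field_components)
```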

```text
Validating model
Validating model
```
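The generator receives its building blocks as plain FormLang component declarations, like `form_components` and `field_components` above. As an illustration of what those strings encode, here is a hypothetical helper (not part of `formlang_lib`, which uses its own parser) that recovers each component's name and props. It assumes a component body is either empty or a single `props { ... }` block, as in the declarations above.

```python
import re
from typing import Dict, List, Optional


def parse_component_decls(src: str) -> Dict[str, List[str]]:
    """Map each declared component name to its list of prop names (hypothetical helper)."""
    # Identifiers and braces are the only tokens this subset of the syntax needs
    tokens = re.findall(r"[A-Za-z_]\w*|[{}]", src)
    pos = 0

    def take(expected: Optional[str] = None) -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    components: Dict[str, List[str]] = {}
    while pos < len(tokens):
        take("component")
        name = take()
        take("{")
        props: List[str] = []
        # Assumption: a body is either empty or a single `props { ... }` block
        if pos < len(tokens) and tokens[pos] == "props":
            take("props")
            take("{")
            while pos < len(tokens) and tokens[pos] != "}":
                props.append(take())
            take("}")
        take("}")
        components[name] = props
    return components


sample = """
  component myCheckbox {
    props {
      size
    }
  }
  component otherTextBox {}
"""
print(parse_component_decls(sample))  # {'myCheckbox': ['size'], 'otherTextBox': []}
```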


```python
# Sample a random form AST from the generator
a = form_gen.generateForm()
```

```python
js_dir(a)
```

```text
{ '$type': 'Form',
  name: 'xw',
  component:
   { component:
      { '$type': 'ComponentDef',
        name: 'formContainer',
        '$cstNode': [Object],
        props: [],
        '$container': [Object],
        '$containerProperty': 'components',
        '$containerIndex': 1 },
     propAssignments: {} },
  children:
   [ { '$type': 'Field',
       name: 'c',
       component: [Object],
       state: null,
       depth: 1 },
     { '$type': 'Field',
       name: 'e',
       component: [Object],
       state: [Object],
       depth: 1 },
     { '$type': 'Form',
       name: 'A',
       component: [Object],
       children: [Array],
       depth: 1 },
     { '$ty
```
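The generated AST above has a short random name, a container component, and children that are either `Field`s or nested `Form`s tagged with their depth. The sketch below shows one way such a random generator could work, together with serialization back to FormLang text in the style of the `HelloWorld` example further down. It is not the library's algorithm: the `Form`/`Field` dataclasses, the uniform sampling, the nesting probability and the name length are all assumptions. It also assumes that a nested `form` block is valid inside a form (the output lists a `Form` child), and it ignores props and state.

```python
import random
import string
from dataclasses import dataclass, field as dc_field
from typing import List, Set, Union

# Keywords seen in this notebook's FormLang snippets; generated names must avoid them
RESERVED = {"form", "field", "comp", "component", "props"}


@dataclass
class Field:
    name: str
    component: str


@dataclass
class Form:
    name: str
    component: str
    children: List[Union["Form", Field]] = dc_field(default_factory=list)


def unique_name(rng: random.Random, used: Set[str], max_len: int = 3) -> str:
    """Draw a short random identifier that is unused and not a keyword."""
    while True:
        length = rng.randint(1, max_len)
        name = rng.choice(string.ascii_letters) + "".join(
            rng.choice(string.ascii_letters + string.digits) for _ in range(length - 1)
        )
        if name not in used and name not in RESERVED:
            used.add(name)
            return name


def generate_form(rng, form_comps, field_comps, max_depth=2, max_children=4,
                  p_nested=0.25, depth=0, used=None) -> Form:
    """Randomly build a form tree; the root's children have depth 1."""
    used = set() if used is None else used
    form = Form(unique_name(rng, used), rng.choice(form_comps))
    for _ in range(rng.randint(1, max_children)):
        child_depth = depth + 1
        # A nested form is only created while its depth stays below max_depth
        if child_depth < max_depth and rng.random() < p_nested:
            form.children.append(generate_form(rng, form_comps, field_comps, max_depth,
                                               max_children, p_nested, child_depth, used))
        else:
            form.children.append(Field(unique_name(rng, used), rng.choice(field_comps)))
    return form


def to_formlang(node: Union[Form, Field], indent: int = 0) -> str:
    """Serialize a node to FormLang text, indenting nested blocks by four spaces."""
    pad = "    " * indent
    if isinstance(node, Field):
        return f"{pad}field {node.name} {{\n{pad}    comp {node.component}\n{pad}}}"
    lines = [f"{pad}    comp {node.component}"]
    lines += [to_formlang(child, indent + 1) for child in node.children]
    return f"{pad}form {node.name} {{\n" + "\n".join(lines) + f"\n{pad}}}"


def to_program(form: Form, form_comps, field_comps) -> str:
    # Declarations are emitted without props because this sketch never assigns props
    decls = [f"component {c} {{}}" for c in dict.fromkeys(list(form_comps) + list(field_comps))]
    return "\n".join(decls) + "\n" + to_formlang(form)


rng = random.Random(0)
example = generate_form(rng, ["formContainer"], ["myTextBox", "counter"])
print(to_program(example, ["formContainer"], ["myTextBox", "counter"]))
```

Generating the tree first and serializing it second keeps every sample syntactically well-formed by construction, which is the property a synthetic training set for this task needs most.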

```python
# Parse FormLang source text into a document (top-level await works in Jupyter)
fl_obj = await formlang_lib.getFormLangStringParser()("""
component comp1 {}
component comp2 {}
form HelloWorld {
    comp comp1
    field MyField {
        comp comp2
    }
}
""")
```

```text
Validating model
```


```python
formlang_lib.hasErrors(fl_obj)
```

```text
False
```

```python
js_dir(fl_obj)
```

```text
{ parseResult:
   { value:
      { '$type': 'Model',
        components: [Array],
        forms: [Array],
        '$cstNode': [Object],
        typeDefs: [],
        '$document': [Circular] },
     lexerErrors: [],
     lexerReport: { diagnostics: [] },
     parserErrors: [] },
  uri:
   l2 {
     scheme: 'file',
     authority: '',
     path: '/10.form',
     query: '',
     fragment: '',
     _formatted: 'file:///10.form',
     _fsPath: null },
  state: 6,
  references:
   [ { '$refNode': [Object],
       '$refText': 'comp1',
       ref: [Getter],
       '$nodeDescription': [Getter],
       error: [Getter],
       _ref: [Object],
       _nodeDescription: [Object] },
     { '$refNode': [Object],
```

```python
# Serialize the parsed AST to JSON; cross-references become "$ref" paths
formlang_lib.serializeAst(fl_obj.parseResult.value, formlang_lib.getServices().FormLang)
```

```text
'{"$type":"Model","components":[{"$type":"ComponentDef","name":"comp1","props":[]},{"$type":"ComponentDef","name":"comp2","props":[]}],"forms":[{"$type":"Form","name":"HelloWorld","component":{"$type":"FieldComponentDef","componentId":{"$ref":"#/components@0"},"componentPropsKeys":[],"componentPropsValues":[]},"children":[{"$type":"Field","name":"MyField","component":{"$type":"FieldComponentDef","componentId":{"$ref":"#/components@1"},"componentPropsKeys":[],"componentPropsValues":[]}}]}],"typeDefs":[]}'
```
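In the serialized JSON above, each `comp` reference is replaced by a path such as `"#/components@0"`, meaning "index 0 of the root's `components` property". The sketch below resolves those paths and then walks the form tree to emit a JSX-like string, as a toy stand-in for the React compile target listed as TBD in the goals. The element shape (a `name` attribute, no props) is a hypothetical choice, not the project's compiler output. Component names are capitalized because React treats lowercase JSX tags as built-in DOM elements.

```python
import json
from typing import Any, Dict

# Output of serializeAst for the HelloWorld example above
SERIALIZED = '{"$type":"Model","components":[{"$type":"ComponentDef","name":"comp1","props":[]},{"$type":"ComponentDef","name":"comp2","props":[]}],"forms":[{"$type":"Form","name":"HelloWorld","component":{"$type":"FieldComponentDef","componentId":{"$ref":"#/components@0"},"componentPropsKeys":[],"componentPropsValues":[]},"children":[{"$type":"Field","name":"MyField","component":{"$type":"FieldComponentDef","componentId":{"$ref":"#/components@1"},"componentPropsKeys":[],"componentPropsValues":[]}}]}],"typeDefs":[]}'


def resolve_ref(root: Dict[str, Any], ref: str) -> Any:
    """Follow a path such as "#/components@1" from the model root."""
    if not ref.startswith("#"):
        raise ValueError(f"unsupported reference: {ref!r}")
    node: Any = root
    for segment in ref[1:].split("/"):
        if not segment:
            continue  # skip the empty segment before the first "/"
        prop, _, index = segment.partition("@")
        node = node[prop]
        if index:
            node = node[int(index)]
    return node


def to_jsx(node: Dict[str, Any], root: Dict[str, Any], indent: int = 0) -> str:
    """Emit a JSX-like tree for a Form/Field node (props are ignored)."""
    pad = "  " * indent
    comp = resolve_ref(root, node["component"]["componentId"]["$ref"])["name"]
    # React treats lowercase tags as DOM elements, so capitalize component names
    tag = comp[:1].upper() + comp[1:]
    children = node.get("children", [])
    if not children:
        return f'{pad}<{tag} name="{node["name"]}" />'
    inner = "\n".join(to_jsx(child, root, indent + 1) for child in children)
    return f'{pad}<{tag} name="{node["name"]}">\n{inner}\n{pad}</{tag}>'


model = json.loads(SERIALIZED)
print(to_jsx(model["forms"][0], model))
```

For the `HelloWorld` model this prints a `<Comp1 name="HelloWorld">` element wrapping a self-closing `<Comp2 name="MyField" />`.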