Import and set up the FORMULA executor in a Jupyter notebook.

In [1]:
import formulallm.formula as f

In [None]:
f.help()

Load the `ParserDSL.4ml` source file and inspect its `GenericDataParser` domain. This domain provides a generic DSL for parsing data and will be extended to model a concrete parser in the FORMULA language.

In [None]:
code = f.load("./data/parser/ParserDSL.4ml")

In [None]:
f.details("GenericDataParser")


Read the following four files as text and feed them to the LLM:
1. A buggy C program, `untar1.c`, from ClamAV
2. A generic DSL for data parsing, `ParserDSL.4ml`
3. The FORMULA documentation, `formula.pdf`
4. A prompt, `prompt.txt`, with specific instructions for the agent to model a Tar parser in the FORMULA language

In [None]:
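from formulallm.agents import FormulaLLMAgent
from formulallm.utils.extraction import pdf2text, file2text

# Read each input as plain text so it can be passed to the agent.
text_code_in_c = file2text("./data/parser/untar1.c")        # buggy C source from ClamAV
text_parser_dsl = file2text("./data/parser/ParserDSL.4ml")  # generic parsing DSL
text_formula_doc = pdf2text("./data/parser/formula.pdf")    # FORMULA documentation
text_prompt = file2text("./data/parser/prompt.txt")         # instructions for the agent

# Show the prompt to confirm what the agent will be asked to do.
print(text_prompt)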

Run the agent interactively to model a Tar parser in the FORMULA language. The session continues until the user enters `q` or `quit`.

In [None]:
agent = FormulaLLMAgent(model="gpt-4o")
files = [text_code_in_c, text_parser_dsl, text_formula_doc]
agent.run(text_prompt, files)
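
The stop condition described above can be sketched independently of the agent. The block below is a minimal illustration, not the implementation of `FormulaLLMAgent.run`: `run_interactive`, `QUIT_COMMANDS`, and the echo responder are hypothetical names, the user inputs are scripted instead of read from the keyboard, and treating the quit commands as case-insensitive is an assumption.

```python
from typing import Callable, Iterable, List

# Hypothetical set of inputs that end the session (assumed case-insensitive).
QUIT_COMMANDS = {"q", "quit"}


def run_interactive(respond: Callable[[str], str], inputs: Iterable[str]) -> List[str]:
    """Pass each user input to `respond` until a quit command is seen."""
    replies: List[str] = []
    for raw in inputs:
        message = raw.strip()
        if message.lower() in QUIT_COMMANDS:
            break  # stop before sending the quit command to the responder
        replies.append(respond(message))
    return replies


# Scripted session: everything after "q" is never processed.
session = ["Model the tar header", "Add a size field", "q", "ignored"]
replies = run_interactive(lambda m: f"echo: {m}", session)
print(replies)
```

In a live notebook the scripted list would be replaced by repeated calls to `input()`, and the echo responder by a call to the model.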

Improve the generated parser DSL in FORMULA and load it into the FORMULA executor.

In [None]:
code = f.load("./data/parser/GeneratedTar.4ml")

Execute a query on the malicious input model to prove that the parser gets stuck in an infinite loop and never finishes.

In [None]:
f.query("maliciousInput", "parsingDone")
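
To illustrate the kind of non-termination this query targets, here is a toy Python sketch. It is not the actual bug in `untar1.c` and does not reproduce what `f.query` does: `parse_entries` and its input encoding are hypothetical. The sketch assumes a parser that advances its offset by a value declared in the input, so an input that declares a zero advance makes the parser revisit the same offset forever. The sketch reports this by detecting the repeated state.

```python
from typing import List, Tuple


def parse_entries(advances: List[int]) -> Tuple[bool, int]:
    """Toy parser loop (hypothetical): at offset i, move forward by advances[i].

    Returns (parsing_done, steps). parsing_done is False when the parser
    revisits an offset, which means the real loop would never terminate.
    """
    offset = 0
    steps = 0
    seen = set()
    while offset < len(advances):
        if offset in seen:
            return False, steps  # same state again: the loop cannot make progress
        seen.add(offset)
        advance = advances[offset]
        if advance < 0:
            raise ValueError("this sketch assumes non-negative advances")
        offset += advance
        steps += 1
    return True, steps


benign = [1, 1, 1]     # every entry moves the parser forward
malicious = [1, 0, 1]  # the second entry declares a zero advance

print(parse_entries(benign))
print(parse_entries(malicious))
```

The benign input reaches the end of the data, while the malicious one is flagged as soon as an offset repeats, which is the toy analogue of a "parsing done" state never being reached.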

In [None]:
f.list()