Start a HOWTO on writing target-language back ends.
#8 in the retargeting patch series
eric-s-raymond committed Sep 21, 2020
1 parent 673f2ca commit e58cdd1
Showing 2 changed files with 77 additions and 1 deletion.
3 changes: 2 additions & 1 deletion src/Makefile.am
Expand Up @@ -91,7 +91,8 @@ include_HEADERS = \
EXTRA_DIST = \
cpp-flex.skl \
mkskel.sh \
gettext.h
gettext.h \
backend.adoc

CLEANFILES = stage1scan.c stage1flex$(EXEEXT)

75 changes: 75 additions & 0 deletions src/backend.adoc
@@ -0,0 +1,75 @@
= How to add support for a new language to flex

= Theory

The flex code was historically written to generate parsers in C, but
it has been refactored to isolate knowledge of the specifics of each
target language from the logic for building the lexer state tables as
much as possible.

The only assumption that is absolutely baked into all of flex is that
the bodies of initializers for arrays of integers consist of decimal
numeric literals separated by commas (and optional whitespace).
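
As a rough illustration of that assumption, the sketch below formats an
integer table as the body of an array initializer: decimal literals
separated by commas. The helper name `format_int_array_body` and its
signature are hypothetical; they are not taken from the flex sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of flex: writes the body of an integer
 * array initializer (decimal literals separated by ", ") into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_int_array_body(const int *values, size_t count,
                          char *buf, size_t bufsize)
{
    size_t used = 0;

    assert(buf != NULL && bufsize > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        /* Every element after the first is preceded by a separator. */
        int n = snprintf(buf + used, bufsize - used, "%s%d",
                         i > 0 ? ", " : "", values[i]);
        if (n < 0 || (size_t)n >= bufsize - used)
            return -1;          /* output would not fit */
        used += (size_t)n;
    }
    return (int)used;
}
```

A back end would wrap a body like this in whatever array declaration
syntax its target language uses; only the body format is fixed.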

Otherwise, knowledge of each target language's syntax lives in two
places: (1) a table of language-specific syntax-generator methods,
and (2) a language-specific skeleton file.

For example, the methods for the C and C++ back end live in a source
file named cpp_backend.c (so named because both languages use the C
preprocessor), and the matching skeleton file is named cpp-flex.skl.

Syntactically C-like languages such as Go, Rust, and Java should be easy
targets. Almost anything generally descended from Algol shouldn't be
much more difficult; this certainly includes the whole
Pascal/Modula/Oberon family.

= Writing a new backend

All the code that accesses language-specific code generators goes
through a global pointer named "backend" to a method table. The
results of these generators are used to fill in some parts of the
language-specific skeleton file and to conditionalize others.
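
The dispatch pattern can be sketched roughly as follows. Everything in
this example is hypothetical: the struct name `demo_backend_t`, its
members, the `demo_backend` pointer, and the two comment generators are
illustrative only. The real method table is struct backend_t and may
look quite different.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical method table; stands in for the real struct backend_t. */
struct demo_backend_t {
    const char *name;
    /* Format a one-line comment in the target language's syntax. */
    int (*comment)(char *buf, size_t bufsize, const char *text);
};

static int c_comment(char *buf, size_t bufsize, const char *text)
{
    return snprintf(buf, bufsize, "/* %s */", text);
}

static int pascal_comment(char *buf, size_t bufsize, const char *text)
{
    return snprintf(buf, bufsize, "{ %s }", text);
}

struct demo_backend_t demo_c_backend = { "c", c_comment };
struct demo_backend_t demo_pascal_backend = { "pascal", pascal_comment };

/* Global pointer through which language-neutral code reaches the
 * language-specific generators. */
struct demo_backend_t *demo_backend = &demo_c_backend;

/* Language-neutral caller: it never names a specific target language. */
int emit_banner(char *buf, size_t bufsize)
{
    assert(demo_backend != NULL);
    return demo_backend->comment(buf, bufsize, "generated scanner");
}
```

Pointing `demo_backend` at a different table changes the output syntax
without touching `emit_banner`, which is the property the real design
relies on.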

Read the definition of struct backend_t in src/flexdefs.h, and the
attached comments, to get a feel for the methods. Don't worry
about understanding the table generator names at first.

To write support for a language, you'll want to do the following
steps:

1. Clone one of the existing back-end/skeleton pairs. If the language
you are supporting is named "foo", you should create files named
foo_backend.c and foo-flex.skl.

2. Add foo_backend.c to COMMON_SOURCES in src/Makefile.am. Add the
name of your skeleton file to EXTRA_DIST.

3. Add a production to src/Makefile.am parallel to the one that
produces cpp-skel.h. Your objective is to make a string list
initializer from your skeleton file that can be linked with flex
and is pointed at by the skel member of your language back end.

4. Add some logic to main.c that enables the new back end with a
new command-line option. Following this step you should be
able to run flex on a specification and get code out in the
language of whatever back end you cloned.

5. The interesting part: mutate your new back end and skeleton so they
produce code in your desired target language.

6. Write a test suite for your back end. You should be able to clone
one of the existing sets of test loads to get good coverage. Note
that it is highly unlikely your back end will be accepted into the
flex distribution without a test suite.
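
Steps 3 and 4 might look roughly like this in miniature. The skeleton
lines, the struct layout, the back-end names, and the `select_backend`
helper are all hypothetical placeholders. Only the idea of a skel member
pointing at a string list built from the skeleton file comes from the
steps above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Step 3 in miniature: a skeleton as a NULL-terminated string list.
 * In a real build this list would be generated from foo-flex.skl by a
 * Makefile rule; the contents here are placeholders. */
static const char *foo_skel[] = {
    "placeholder skeleton line 1",
    "placeholder skeleton line 2",
    NULL
};
static const char *cpp_skel[] = {
    "placeholder C/C++ skeleton line",
    NULL
};

/* Hypothetical, cut-down back end record with only a name and a skel
 * member. */
struct sketch_backend {
    const char *name;
    const char **skel;
};

static struct sketch_backend sketch_backends[] = {
    { "cpp", cpp_skel },
    { "foo", foo_skel },
};

/* Step 4 in miniature: map a command-line option argument to a back
 * end. Returns NULL when the name is not recognized. */
struct sketch_backend *select_backend(const char *name)
{
    size_t count = sizeof sketch_backends / sizeof sketch_backends[0];

    assert(name != NULL);
    for (size_t i = 0; i < count; i++)
        if (strcmp(sketch_backends[i].name, name) == 0)
            return &sketch_backends[i];
    return NULL;
}

/* Count the lines in a back end's skeleton list. */
size_t skel_line_count(const struct sketch_backend *be)
{
    size_t n = 0;

    while (be->skel[n] != NULL)
        n++;
    return n;
}
```

Because a freshly cloned back end still carries the skeleton it was
copied from, selecting it at this stage yields output in the original
language, which matches the checkpoint described in step 4.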

A hint about step 5:

* Don't bother supporting non-reentrant parser generation.
The interface of the original lex, with all those globals hanging out,
needs to be supported in C for backwards compatibility, but
there



