Start a HOWTO on writing target-language back ends.

#8 in the retargeting patch series
eric-s-raymond · Sep 21, 2020 · e58cdd1
1 parent 673f2ca
commit e58cdd1

Showing 2 changed files with 77 additions and 1 deletion.
diff --git a/src/Makefile.am b/src/Makefile.am
@@ -91,7 +91,8 @@ include_HEADERS = \
 EXTRA_DIST = \
 	cpp-flex.skl \
 	mkskel.sh \
-	gettext.h
+	gettext.h \
+	backend.adoc
 
 CLEANFILES = stage1scan.c stage1flex$(EXEEXT)
 

diff --git a/src/backend.adoc b/src/backend.adoc
@@ -0,0 +1,75 @@
+= How to add support for a new language to flex
+
+= Theory
+
+The flex code was historically written to generate scanners in C, but
+it has been factored to isolate knowledge of the specifics of each
+target language from the logic for building the lexer state tables as
+much as possible.
+
+The only assumption that is absolutely baked into all of flex is that
+the bodies of initializers for arrays of integers consist of decimal
+numeric literals separated by commas (and optional whitespace).
+
+Otherwise, knowledge of each target language's syntax lives in two
+places: (1) a table of language-specific syntax-generator methods,
+and (2) a language-specific skeleton file.
+
+For example, the methods for the C and C++ back end live in a source
+file named cpp_backend.c (so named because both languages use the C
+preprocessor), and its skeleton lives in a file named cpp-flex.skl.
+
+Syntactically C-like languages such as Go, Rust, and Java should be easy
+targets.  Almost anything generally descended from Algol shouldn't be
+much more difficult; this certainly includes the whole
+Pascal/Modula/Oberon family.
+
+= Writing a new backend
+
+All the code that accesses language-specific code generators goes
+through a global pointer named "backend" to a method table.  The
+results of these generators are used to fill in some parts of the
+language-specific skeleton file and conditionalize others.
+
+Read the definition of struct backend_t in src/flexdef.h, and
+attached comments, to get a feel for the methods.  Don't worry
+about understanding table generator names at first.
+
+To write support for a language, you'll want to do the following
+steps:
+
+1. Clone one of the existing back-end/skeleton pairs.  If the language
+   you are supporting is named "foo", you should create files named
+   foo_backend.c and foo-flex.skl.
+
+2. Add foo_backend.c to COMMON_SOURCES in src/Makefile.am.  Add the
+   name of your skeleton file to EXTRA_DIST.
+
+3. Add a production to src/Makefile.am parallel to the one that
+   produces cpp-skel.h.  Your objective is to make a string list
+   initializer from your skeleton file that can be linked with flex
+   and is pointed at by the skel member of your language back end.
+
+4. Add some logic to main.c that enables the new back end with a
+   new command-line option.  Following this step you should be
+   able to run flex on a specification and get code out in the
+   language of whatever back end you cloned.
+
+5. The interesting part: mutate your new back end and skeleton so they
+   produce code in your desired target language.
+
+6. Write a test suite for your back end.  You should be able to clone
+   one of the existing sets of test loads to get good coverage.  Note
+   that it is highly unlikely your back end will be accepted into the
+   flex distribution without a test suite.
+
+A hint about step 5:
+
+* Don't bother supporting non-reentrant parser generation.
+  The interface of original lex with all those globals hanging out
+  needs to be supported in C for backwards compatibility, but
+  there
+
+
+
+
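The method-table arrangement described under "Writing a new backend" can be sketched roughly as follows. This is an illustration only, not code from this commit: every name in it (demo_backend, demo_current, demo_select_backend, and the two generator members) is hypothetical, and the real struct backend_t has different and more numerous members. It assumes a back end is chosen by name, as step 4's command-line option might do, and that the skeleton is a NULL-terminated string list, as step 3 describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down method table.  The real struct backend_t
 * has different members; these two generators are placeholders. */
struct demo_backend {
    const char *name;                      /* name used to select the back end */
    const char **skel;                     /* skeleton as a NULL-terminated string list */
    const char *(*int_array_open)(void);   /* text opening an integer array initializer */
    const char *(*int_array_close)(void);  /* text closing it */
};

/* Stand-ins for the string lists generated from the skeleton files. */
static const char *cpp_skel[] = { "/* C skeleton line */", NULL };
static const char *foo_skel[] = { "-- foo skeleton line", NULL };

static const char *cpp_open(void)  { return "static const int t[] = {"; }
static const char *cpp_close(void) { return "};"; }
static const char *foo_open(void)  { return "t : array of integer := ("; }
static const char *foo_close(void) { return ");"; }

static struct demo_backend demo_backends[] = {
    { "cpp", cpp_skel, cpp_open, cpp_close },
    { "foo", foo_skel, foo_open, foo_close },
};

/* Global pointer through which all language-specific calls are made;
 * it defaults to the C/C++ entry. */
struct demo_backend *demo_current = &demo_backends[0];

/* Point demo_current at the named back end.
 * Returns 0 on success, -1 if no back end has that name. */
int demo_select_backend(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof demo_backends / sizeof demo_backends[0]; i++) {
        if (strcmp(demo_backends[i].name, name) == 0) {
            demo_current = &demo_backends[i];
            return 0;
        }
    }
    return -1;
}
```

Under these assumptions, the table-emitting code calls demo_current->int_array_open(), writes the comma-separated decimal literals, then calls demo_current->int_array_close(), so the only language-neutral part is the body of the initializer.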