Learning how to write a compiler
`npm run build` or `./build.sh`
There are several steps involved in running source code.
A series of passes over the data makes it easier to handle:
The first step scans through the source code as a string
and returns a series of tokens
identified by their:
1. `type` - defines syntactic usage, such as identifier, keyword, operator, brackets, etc.
2. `data` - typically the string represented by the type, but can be transformed by a preprocessor
For instance, a preprocessor could take several tokens:
{type: "parenthesis", data:"("},
{type: "parenthesis", data:")"},
{type: "operator", data:"="},
{type: "operator", data:">"},
And turn them into
`{type: "arrow-function", data:"()=>"}`,
3. line and char numbers (useful for debugging source)
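Put together, a token might be typed roughly like this. This is only a sketch; the field names `line` and `char` are assumptions based on the list above, not the project's actual type:

```ts
// A rough sketch of the token shape described above (not the project's actual type).
interface Token {
  type: string;  // syntactic usage: "identifier", "keyword", "operator", "parenthesis", ...
  data: string;  // the raw text (or preprocessor-transformed text) for this token
  line: number;  // line number in the source, for debugging
  char: number;  // character position in the source, for debugging
}
```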
This part is still in the works, but it will essentially
be a function that passes over tokens and returns a
modified set.
What modifications this actually entails is up to the preprocessor,
but some examples are:
- source directives
- `.babelrc`-style special language features that are not supported by a parser but can be broken down into lower-level code
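As a rough sketch of that "pass over tokens, return a modified set" idea (reusing the hypothetical `Token` shape above; the function name and matching logic are illustrative, not this project's API), here is a pass that performs the arrow-function merge from the tokenizer example:

```ts
// Hypothetical preprocessor pass: merges the token run "(", ")", "=", ">"
// into a single arrow-function token, as in the example above.
function collapseArrowFunctions(tokens: Token[]): Token[] {
  const out: Token[] = [];
  for (let i = 0; i < tokens.length; i++) {
    const [a, b, c, d] = [tokens[i], tokens[i + 1], tokens[i + 2], tokens[i + 3]];
    if (a && b && c && d && a.data === "(" && b.data === ")" && c.data === "=" && d.data === ">") {
      // Keep the position of the first merged token for debugging purposes.
      out.push({ type: "arrow-function", data: "()=>", line: a.line, char: a.char });
      i += 3; // skip the three tokens we just merged
    } else {
      out.push(a);
    }
  }
  return out;
}
```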
Creates a tree structure, called an Abstract Syntax Tree (AST), from the token array.
This is where the recursive descent part comes into play, and the part I came here to learn about.
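To make the recursive-descent idea concrete, here is a minimal sketch (not this project's parser; the node shapes and grammar are assumptions) that parses tokens for expressions like `1 + 2 * 3` into an AST. Each grammar rule is a function that can call the other rules, and itself, recursively:

```ts
// Minimal recursive-descent sketch over "+" and "*" expressions on number tokens.
type AstNode =
  | { kind: "number"; value: number }
  | { kind: "binary"; op: "+" | "*"; left: AstNode; right: AstNode };

function parseExpression(tokens: Token[]): AstNode {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // expression := term ("+" term)*
  function expression(): AstNode {
    let node = term();
    while (peek() && peek().data === "+") {
      next();
      node = { kind: "binary", op: "+", left: node, right: term() };
    }
    return node;
  }

  // term := primary ("*" primary)*
  function term(): AstNode {
    let node = primary();
    while (peek() && peek().data === "*") {
      next();
      node = { kind: "binary", op: "*", left: node, right: primary() };
    }
    return node;
  }

  // primary := number token
  function primary(): AstNode {
    return { kind: "number", value: Number(next().data) };
  }

  return expression();
}
```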
I plan on implementing both an interpreter and a code generator.
They will take an abstract syntax tree and either
- run it directly (interpreter), or
- compile it into some lower-level code, typically opcodes or machine code (codegen)
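For the interpreter half, a sketch (using the hypothetical `AstNode` shape from the parser example above) is just a recursive walk over the tree:

```ts
// Hypothetical tree-walking interpreter over the AstNode sketch above.
function evaluate(node: AstNode): number {
  switch (node.kind) {
    case "number":
      return node.value;
    case "binary": {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      return node.op === "+" ? left + right : left * right;
    }
  }
}
```

A code generator would walk the same tree, but instead of computing values it would emit equivalent lower-level instructions.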
In my process I've decided to take a language-agnostic
approach, even though my end goal is probably
something like typescript/javascript
For instance, the tokenizer actually relies
on a Scanner, which is where language syntax is handled,
and the tokenize function is already implemented for you.
To handle your own language, you'll need to implement
a Scanner subclass.
This is a class meant to be extended.
It provides functionality for scanning text
in a more standard way, which should make debugging easier.
- `addPass` - for adding more syntax handling

Where

```ts
addPass(name: string, pass: ScannerPass): this
```

`name` is the `token.type` used when the pass is successful,
and `pass` is a scanner pass.
Each scanner pass is meant to handle a single type of
language syntax.
Each pass has the signature:

```ts
(data: string, offset: number): ScannerData
```

Where `data` is the source code data,
`offset` is the offset in the source to read from,
and the return value is expected to be a `ScannerData`:
```ts
{
  success: boolean   // needs to be false when the data at offset does not satisfy this pass's type
  readChars: number  // chars that fit this type before we read something we didn't like
  readLines: number  // obsolete, this will be handled by internal code soon
  error?: string     // optional - for when an error is positively identified, not necessarily every time success == false
}
```

Note that `ScannerData` does not actually return the text that was read, only the char count.
This is to standardize the reading process, which should cause far fewer errors
between language implementations.
Basically: don't allow reading of chars that don't fit your specification, and don't count any that don't.
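Putting these pieces together, a pass and a scanner subclass might look something like the following. This is only a sketch based on the shapes described above: the `numberPass` and `MyLanguageScanner` names are made up, and the `Scanner` base class is assumed to be the one provided by this project.

```ts
// Typings matching the ScannerPass / ScannerData descriptions above (assumed, not the project's source).
interface ScannerData {
  success: boolean;
  readChars: number;
  readLines: number; // obsolete per the note above, but still part of the shape
  error?: string;
}
type ScannerPass = (data: string, offset: number) => ScannerData;

// A pass that counts a run of decimal digits starting at offset.
// It only counts chars that fit, and reports success only if at least one char fit.
const numberPass: ScannerPass = (data, offset) => {
  let readChars = 0;
  while (offset + readChars < data.length && /[0-9]/.test(data[offset + readChars])) {
    readChars++;
  }
  return { success: readChars > 0, readChars, readLines: 0 };
};

// Hypothetical subclass registering its own syntax, assuming the project's
// Scanner base class (which provides addPass) is in scope.
class MyLanguageScanner extends Scanner {
  constructor() {
    super();
    this.addPass("number", numberPass); // tokens matched by this pass get type "number"
  }
}
```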