Pattern Lexer

My personal implementation of a lexer.

Principle

This lexer is making use of the Pattern trait to find tokens.
The idea is to create Tokens, explain how to match them with a Pattern and build them from the matched String value.

Pattern

A string Pattern trait.

The type implementing it can be used as a pattern for &str, by default it is implemented for the following types.

Pattern type	Match condition
`char`	is contained in string
`&str`	is substring
`String`	is substring
`&[char]`	any `char` match
`&[&str]`	any `&str` match
`F: Fn(&str) -> bool`	`F` returns `true` for substring (slow)
`Regex`	`Regex` match substring

Usage

The lexer! macro match the following syntax.

lexer!(
    // Ordered by priority
    NAME(optional types, ...) {
        impl Pattern => |value: String| -> Token,
        ...,
    },
    ...,
);

It generates module gen which contains Token, LexerError, LexerResult and Lexer.

You can now call Token::tokenize to tokenize a &str, it should return a Lexer instance that implements Iterator.
Each iteration, the Lexer tries to match one of the given Pattern and returns a LexerResult<Token> built from the best match.

Example

Here is an example for a simple math lexer.

lexer!(
    // Different operators
    OPERATOR(char) {
        '+' => |_| Token::OPERATOR('+'),
        '-' => |_| Token::OPERATOR('-'),
        '*' => |_| Token::OPERATOR('*'),
        '/' => |_| Token::OPERATOR('/'),
        '=' => |_| Token::OPERATOR('='),
    },
    // Integer numbers
    NUMBER(usize) {
        |s: &str| s.chars().all(|c| c.is_digit(10))
            => |v: String| Token::NUMBER(v.parse().unwrap()),
    },
    // Variable names
    IDENTIFIER(String) {
        regex!(r"[a-zA-Z_$][a-zA-Z_$0-9]*")
            => |v: String| Token::IDENTIFIER(v),
    },
    WHITESPACE {
        [' ', '\n'] => |_| Token::WHITESPACE,
    },
);

That will expand to these enum and structs.

mod gen {
    pub enum Token {
        OPERATOR(char),
        NUMBER(usize),
        IDENTIFIER(String),
        WHITESPACE,
    }

    pub struct Lexer {...}
    pub struct LexerError {...}
    pub type LexerResult<T> = Result<T, LexerError>;
}

And you can use them afterwards.

use gen::*;

let mut lex = Token::tokenize("x_4 = 1 + 3 = 2 * 2");
assert_eq!(lex.nth(2), Some(Ok(Token::OPERATOR('='))));
assert_eq!(lex.nth(5), Some(Ok(Token::NUMBER(3))));

// Our lexer doesn't handle parenthesis...
let mut err = Token::tokenize("x_4 = (1 + 3)");
assert!(err.nth(4).is_some_and(|res| res.is_err()));

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
src		src
.gitignore		.gitignore
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pattern Lexer

Principle

Pattern

Usage

Example

About

Releases

Packages

Languages

License

emsquid/plexer

Folders and files

Latest commit

History

Repository files navigation

Pattern Lexer

Principle

Pattern

Usage

Example

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages