Unicode character classes #235

@terpstra

Firstly, thanks a lot for this tool. It saved me a lot of time! I am using re2c to create a parser for an as-yet-unpublished build tool. The input files are UTF-8 encoded. Everything works fine for the ASCII character set.

However, I'd like to expand my identifier space to allow Unicode letters in addition to [a-zA-Z]. Currently, the only way I can see to do this is to write a parser for UnicodeData.txt that grabs all of the code points in the letter categories and dumps them into a giant character class. That works, but now I have a generator for a generator for C++. It seems like this sort of Unicode character class functionality would be more naturally supported directly in re2c itself.
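
For illustration, the workaround described above might look roughly like the following Python sketch. It is not part of re2c. It assumes the usual semicolon-separated UnicodeData.txt layout (code point, name, general category, ...), treats the general categories Lu, Ll, Lt, Lm and Lo as "letters", and emits `\uXXXX` / `\UXXXXXXXX` escapes on the assumption that re2c accepts them inside a character class. The names `letter_ranges` and `re2c_class` are hypothetical, chosen only for this example.

```python
from typing import Iterable, List, Tuple

# Assumption: "letter" means these five Unicode general categories.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def letter_ranges(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Collect merged (low, high) code point ranges for letters.

    Expects lines in UnicodeData.txt format, sorted by code point.
    Large blocks are listed there as a "<..., First>" / "<..., Last>"
    pair instead of one line per code point, so both forms are handled.
    """
    ranges: List[Tuple[int, int]] = []
    range_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        name, category = fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = cp  # remember where the block begins
            continue
        if name.endswith(", Last>") and range_start is not None:
            lo, range_start = range_start, None
        else:
            lo = cp
        if category not in LETTER_CATEGORIES:
            continue
        if ranges and ranges[-1][1] + 1 == lo:
            # Adjacent to the previous range: extend it.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((lo, cp))
    return ranges


def escape(cp: int) -> str:
    """Format one code point as a \\uXXXX or \\UXXXXXXXX escape."""
    return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}"


def re2c_class(ranges: List[Tuple[int, int]]) -> str:
    """Render the ranges as a single bracketed character class."""
    parts = [
        escape(lo) if lo == hi else f"{escape(lo)}-{escape(hi)}"
        for lo, hi in ranges
    ]
    return "[" + "".join(parts) + "]"
```

Run against the real file, this would be something like `re2c_class(letter_ranges(open("UnicodeData.txt", encoding="utf-8")))`, with the resulting string pasted into (or included from) the re2c source as a named definition. This is exactly the "generator for a generator" step the request hopes to avoid.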

I was somewhat surprised this was not already supported, so I went looking for these classes in re2c and could not find them. Apologies if this is already supported and my grep-powers were insufficient.

Thanks!
