proposal: regexp/syntax: improve printing of flags #57950

theseion opened this issue Jan 22, 2023 · 3 comments


Parsing and printing expressions with flags produces needlessly verbose output. Example: (?i:ab*c|d?e) -> (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E). While the output is semantically correct, the resulting expression grows by a factor of approximately 3.

I am a core developer of and we use Go to generate regular expressions that can become very large (hundreds of KB). Growth by a factor of 3 is not acceptable for us. The issue for us is mostly about the fold case flag (i), but other flags are affected as well, e.g., ab*c.|d?e. -> ab*c(?-s:.)|d?e(?-s:.) (better: (?:ab*c|d?e)(?-s:.), or (?-s)(?:ab*c|d?e).).

I propose to improve printing of regular expressions, so that flags are only repeated if necessary.


Why wouldn't you just keep the original, like regexp.Regexp does?

cc @rsc


robpike commented Jan 22, 2023

How are you printing it? The existing code does exactly what you want:

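package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The body is cut off in this copy; assumed here: print a compiled
	// regexp.Regexp, which keeps the original source text.
	fmt.Println(regexp.MustCompile("(?i:ab*c|d?e)"))
}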


theseion commented Jan 23, 2023

We generate a single regular expression out of multiple smaller expressions. Here's an example:


Each line is treated as an alternation. The result would be \s+__\$|command1|command2. We do more complicated things too, such as substitutions and concatenation. In the end, we always end up with a list of alternations as above. We then feed these to a library (rassemble-go) for final assembly and optimisation.

This process is also recursive in some cases, so the output from rassemble-go may become one of the alternations.

rassemble-go uses regexp/syntax to parse and process the expression:

import "regexp/syntax"

r, _ := syntax.Parse("(?i:ab*c|d?e)", syntax.ClassNL|syntax.PerlX)

The result is (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E).

To summarize:

  1. the "original" expression may be embedded in another expression, and I don't know the position in the string where I would need to insert the flags
  2. rassemble-go uses regexp/syntax to parse and manipulate the regular expression, hence the output differs from the one produced by regexp.Regexp
