
[regexp feature request] Support 'Ş' character on word boundary search, like in .NET/Python3/Java regex engine. #58863

@caner-cetin

What version of Go are you using (go version)?

$ go version

go version go1.20.1 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

$ go env
GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOVERSION="go1.20.1"

What did you do?

package main

import (
	"fmt"
	"regexp"
	"strings"
)

// DeleteNonDate returns the first date-like match in toBeDeleted and
// the input with every date-like match removed.
func DeleteNonDate(toBeDeleted string) (string, string, error) {
	nonDateRegex, err := regexp.Compile(`(?mi)\b(\d{1,2}\s)?(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|[Oo]cak|January|February|March|April|May|June|July|August|September|October|November|December|[Şş]ubat|[Mm]art|[Nn]isan|[Mm]ayıs|[Hh]aziran|[Tt]emmuz|[Aa]ğustos|[Ee]ylül|[Ee]kim|[Kk]asım|[Aa]ralık)(\s\d{1,2},?)?(\s\d{4})?\b`)
	if err != nil {
		return "", "", err
	}
	// FindString returns "" when no part of the input matches.
	return strings.TrimSpace(nonDateRegex.FindString(toBeDeleted)), nonDateRegex.ReplaceAllString(toBeDeleted, ""), nil
}

func main() {
	match, _, err := DeleteNonDate(`asdasd \n \n Şubat 2023 Haziran 2023`)
	if err != nil {
		panic(err)
	}
	fmt.Println(match)
}

What did you expect to see?

Şubat 2023

What did you see instead?

Haziran 2023. The Şubat 2023 match is missed entirely.

Live demo to play around with: https://regex101.com/r/uq8ZR9/1. The Python, Java, and .NET engines match 'Ş' at a word boundary with the same regex. Can we please have this in Go's regexp (RE2) as well?
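For anyone hitting the same thing, here is a minimal sketch of what seems to be going on, plus a possible workaround. Go's regexp syntax documents `\b` as an ASCII word boundary, so the position between a space and 'Ş' is not a boundary (neither side is an ASCII word character). The names `asciiBoundary`, `unicodeBoundary`, `firstASCII`, and `firstUnicode` are made up for this illustration, and the pattern is cut down to two months. The workaround assumes the match ends in ASCII digits, so only the leading `\b` is replaced.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiBoundary uses \b, which Go's regexp treats as an ASCII-only
// word boundary.
var asciiBoundary = regexp.MustCompile(`\b(Şubat|Haziran)\s\d{4}\b`)

// unicodeBoundary replaces the leading \b with an explicit check:
// start of text, or one character that is not a Unicode letter,
// a Unicode number, or an underscore. The date itself is group 1.
var unicodeBoundary = regexp.MustCompile(`(?:^|[^\p{L}\p{N}_])((?:Şubat|Haziran)\s\d{4})\b`)

// firstASCII returns the first match of the \b-based pattern, or "".
func firstASCII(s string) string {
	return asciiBoundary.FindString(s)
}

// firstUnicode returns the first date found by the workaround
// pattern, without the leading separator character, or "".
func firstUnicode(s string) string {
	m := unicodeBoundary.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	input := "asdasd Şubat 2023 Haziran 2023"
	fmt.Println(firstASCII(input))   // Haziran 2023
	fmt.Println(firstUnicode(input)) // Şubat 2023
}
```

One caveat with this approach: the leading separator is consumed by the match, so with `ReplaceAllString` it would have to be captured and put back to avoid deleting the character before each date.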
