Allow an option for \b regexp regression in JDK >= 19

[JDK 19 changed the default behavior of \b in regular expressions](https://bugs.openjdk.org/browse/JDK-8264160): by default it now treats only ASCII characters as word characters, to be consistent with \w.
As a result, existing Java code whose regular expressions rely on \b next to non-ASCII (Unicode) characters is likely to break on JDK >= 19. See https://github.com/languagetool-org/languagetool/pull/9854 for an example.
The LanguageTool project mentioned above uses the segment project and [has a large number of \b occurrences in its segmentation rules for 25 languages](https://github.com/languagetool-org/languagetool/blob/master/languagetool-core/src/main/resources/org/languagetool/resource/segment.srx).
The fix would be to add (?U) to each regexp that uses \b.
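
To illustrate the per-regexp fix, here is a minimal sketch. The class name `InlineFlagDemo` and the helper `withUnicodeFlag` are made up for this example and are not part of segment or LanguageTool. Non-ASCII letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.regex.Pattern;

public class InlineFlagDemo {
    // Hypothetical helper: prepends the inline (?U) flag, which turns on
    // Pattern.UNICODE_CHARACTER_CLASS for the whole expression.
    public static Pattern withUnicodeFlag(String regex) {
        return Pattern.compile("(?U)" + regex);
    }

    public static void main(String[] args) {
        String text = "un \u00e9t\u00e9 chaud";        // "un été chaud"
        String regex = "\\b\u00e9t\u00e9\\b";          // \bété\b

        // Without the flag the result depends on the JDK version:
        // the change described in JDK-8264160 makes \b ASCII-only by default.
        System.out.println(Pattern.compile(regex).matcher(text).find());

        // With (?U), \b uses Unicode word characters on any JDK version.
        System.out.println(withUnicodeFlag(regex).matcher(text).find());
    }
}
```

The drawback is that every rule containing \b has to be edited this way.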
It would be nice to have an option in the net.loomchild.segment project to compile the rules from segment.srx with Pattern.UNICODE_CHARACTER_CLASS, so that there is no need to adjust each rule separately. We may not be able to turn on Pattern.UNICODE_CHARACTER_CLASS by default, as it could cause regressions in other projects, even though the probability is low.
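
A rough sketch of what such an option could look like. `RulePatternCompiler` and its `unicodeCharacterClass` switch are hypothetical names used only for illustration; this is not the existing segment API, and the real change would need to be wired into wherever segment compiles its rule patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical compiler for rule regexps with an opt-in Unicode switch.
public class RulePatternCompiler {
    private final int flags;

    // The option is off unless requested, so the default behavior stays unchanged.
    public RulePatternCompiler(boolean unicodeCharacterClass) {
        this.flags = unicodeCharacterClass ? Pattern.UNICODE_CHARACTER_CLASS : 0;
    }

    // Compiles every rule regexp with the same flags, so individual
    // rules do not need an inline (?U) prefix.
    public List<Pattern> compileAll(List<String> regexes) {
        List<Pattern> compiled = new ArrayList<>();
        for (String regex : regexes) {
            compiled.add(Pattern.compile(regex, flags));
        }
        return compiled;
    }
}
```

A caller such as LanguageTool would then construct the compiler with `true`, while other users keep the current behavior by default.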
