Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use GLib's regexp interface (backed by PCRE) #412

Merged
merged 7 commits into from Jan 3, 2017

Conversation

Projects
None yet
6 participants
@LemonBoy
Copy link
Member

commented Jan 29, 2016

Pros:

  • Uniform syntax between systems (as brought up by @wilhelmy)
  • More people are familiar with the syntax used by PCRE
  • Faster (?)
  • More robust utf8 support

Cons:

  • People may need to update their regexps
  • We need to make sure to feed utf8 encoded strings

This proposal is up for discussion 馃巿


@ailin-nemui
I would really like to see some test cases wrt. UTF8 and with and without the raw flags and some thoughts about how to go best about this (compile 2 regexen and match them depending on utf state?) there is also some secret unicode command flag in pcre iirc, does this apply?

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Feb 3, 2016

I think this is a good idea. what we need is

  • examples of breakage in regexes when using the new engine
  • documentation for users how to manually adjust their regexes

@LemonBoy LemonBoy force-pushed the LemonBoy:pcre-regexp branch 2 times, most recently from a31b5fd to 6aebf0a Jun 19, 2016

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jun 23, 2016

we (probably) need to conditionally use the raw flag. Two cases:

  • irssi is being used with 8bit term_type (-> everything is bytes)
  • incoming messages are not valid utf8
@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Aug 11, 2016

we should consider if it shouldn't be possible to disable the RAW mode? otherwise we cannot benefit from the unicode advantage in gregex

@foice

This comment has been minimized.

Copy link

commented Sep 10, 2016

Does this support negative lookahead? I'd like to try it, I get irssi from brew on OS X if that matters.

@vague666

This comment has been minimized.

Copy link
Member

commented Sep 10, 2016

@foice , see man pcresyntax, I assume that's the place to read about pcre and what it supports (it mentions negative look ahead)

@vague666

This comment has been minimized.

Copy link
Member

commented Sep 10, 2016

@LemonBoy @ailin-nemui how about using both regex systems in parallel for a while to let people adjust to the change and use a setting to switch between the two? It would make the codebase bigger for the time being but might not give users as much headache, or is there a library that you know of(I didn't find anything with google) that would allow conversion of POSIX regex to PCRE?

@dequis

This comment has been minimized.

Copy link
Member

commented Sep 10, 2016

is there a library that you know of(I didn't find anything with google) that would allow conversion of POSIX regex to PCRE?

Converting POSIX regex to PCRE would mean trying to emulate it while keeping bug-compatibility with it. Nope. Maybe it's a matter of adding backslashes to some characters and removing them from others, but we might end up learning about missed edge cases after releasing a new version breaks stuff. I don't want to go there.

Better allow switching regex engines with a global setting.

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Sep 13, 2016

I would really like to see some test cases wrt. UTF8 and with and without the raw flags and some thoughts about how to go best about this (compile 2 regexen and match them depending on utf state?) there is also some secret unicode command flag in pcre iirc, does this apply?

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Nov 15, 2016

I will merge this, issues caused by that can then be found and solved later

@dequis

This comment has been minimized.

Copy link
Member

commented Nov 16, 2016

I really don't like the idea of replacing the regexp engine.

@dequis

This comment has been minimized.

Copy link
Member

commented Nov 16, 2016

It's a breaking change, and a subtle one at that. People will upgrade irssi and find their stuff stopped working for no particular reason. Let's not do that. The improvements brought by this change aren't worth it. I'd rather have it under a setting.

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Nov 17, 2016

how about making it a compile time switch

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 2, 2017

LemonBoy and others added some commits Jan 14, 2016

Remove unused references to regex.h
Also remove the prototype for regex_match since it has been removed.
Remove the regexp_compiled field.
It was made redundant by the introduction of the pointer to the GRegex
structure.
Silence the compiler warning in textbuffer.c about preg being
initialized by setting it to NULL.
Use the RAW flag when building the regexps.
Also, plugged a memory leak when retrieving the match position.
Ailin Nemui

@ailin-nemui ailin-nemui force-pushed the LemonBoy:pcre-regexp branch from 27d51d0 to 1f72b8e Jan 3, 2017

@ailin-nemui ailin-nemui merged commit 5787e2b into irssi:master Jan 3, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@lotheac

This comment has been minimized.

Copy link
Contributor

commented Jan 9, 2017

dequis wrote:

It's a breaking change, and a subtle one at that. People will upgrade
irssi and find their stuff stopped working for no particular reason.
Let's not do that. The improvements brought by this change aren't
worth it. I'd rather have it under a setting.

Exactly this happened to me -- I realize that this is a compile-time
setting, but if subtle breakage was to be avoided, then it should not
have been the default.

edit: or, release notes should have been clearer about this.

@ailin-nemui

This comment has been minimized.

Copy link
Contributor

commented Jan 9, 2017

sorry for your inconvenience, I pushed for this change but dequis was against it. I added a further hint about it to the on-site release notes (cannot change the released notes anymore)

would you like to expand on what regex broke and how you had to modify it?

@lotheac

This comment has been minimized.

Copy link
Contributor

commented Jan 9, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can鈥檛 perform that action at this time.