New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PCRE1 to PCRE2 #1230

Merged
merged 14 commits into from Oct 12, 2018

Conversation

Projects
None yet
3 participants
@shawnw
Contributor

shawnw commented Aug 22, 2018

Background: In 2015 or so, PCRE 8.X (AKA PCRE1) was put into bug-fix only mode, with all new PCRE development in 10.X and up (AKA PCRE2). Now, I've updated Penn to use the new version.

The good: The Unicode work is my biggest motivator, since PCRE is used in it to get information about characters if ICU isn't installed - PCRE1 only has character tables for Unicode 7, while PCRE2 is up to 10. The new version also has performance and match efficiency improvements, which would be nice to have. Among new features, the only one actually being used is that it has a way to compile wildcard patterns into regular expressions (Which can then be JIT compiled for really fast matching). This is something I've had in the back of my mind for ages, so it's nice to get that without actually having to do any work.

The bad & ugly: Since this version of the library is nowhere near as ubiquitous as PCRE1, and because some LTS OSes only have versions that are too old to provide that wildcard thing, and because people hate hate hate the idea of installing a new package in their OS anyways, it's bundled with Penn. And since I don't want to be responsible for maintaining a pruned down version (That didn't go well the last time due to the amount of work needed to update it), it's all there - documentation, tests, utility programs, etc. Unfortunately there's no easy to embed single file version like with sqlite. The size of the patch and the benefits being nice but not overwhelmingly so is why I'm holding off just merging it until I get a second opinion on if people think it's worth it.

@talvo

This comment has been minimized.

Member

talvo commented Aug 23, 2018

Would it not be worth putting more work into unicode support first before bundling the lib for that purpose? While I can appreciate that bundling it complete, with all the docs, tests, utilities, etc, is a time-saver.. ergh.

@captdeaf

This comment has been minimized.

Contributor

captdeaf commented Aug 23, 2018

Maybe include this in the unicode branch first, then 'release' it with unicode?

Is there any other benefit to including PCRE2 before we have real unicode support?

@talvo

This comment has been minimized.

Member

talvo commented Aug 23, 2018

If this should go in, we really need a slightly more detailed changelog entry to explain why a Penn install is now 20 times the size and 400 files larger; "Updated PCRE version" doesn't quite do it justice.

That being said, is bundling this really necessary? Why not add a pennmush/lib directory, stick instructions in for how to download the PCRE2 source into it (if it's not available on your system already), and have the build script check for and compile it there, if present? That way, Penn itself isn't ginormous from the bundling, keeping it up to date isn't our responsibility, and people running an OS that does provide it don't need to do anything extra anyway.

@shawnw

This comment has been minimized.

Contributor

shawnw commented Aug 23, 2018

  1. The unicode branch uses the ICU library for most stuff, falling back on PCRE when possible if ICU isn't present (People hate having more mandatory dependencies, and it is way way way too big to directly include...). PCRE2 has support for more things than PCRE1 - for example, a big improvement is that it can do unicode-aware case mapping of strings, so it expands the fallback capabilities of the unicode version. Also with regards to "more work into unicode support" first, you can't build a house without a foundation. This falls into the foundation category - implementation of the core functionality everything else uses. This makes for a better foundation.

  2. Asking people to download and install it themselves will be as popular as saying 'you need to install X library' ever is. Plus already mentioned problems with relying on potentially older system versions that don't support useful features.

  3. I'd be good with merging it into the unicode-icu branch first instead of incrementally, but I think it's easier to do it as a standalone first.

@captdeaf

This comment has been minimized.

Contributor

captdeaf commented Aug 23, 2018

What about making it a git submodule ?

Configure can detect if a workable version of pcre2 is installed system-wide. If it's not, then pull the git submodule, compile in-place, and go from there.

Also, I have no problems with people who complain that we may have external dependencies - so does every other package out there. Will there be grumbling? Probably, yes. But I care less about those grumbles than about "We just quadrupled the size of our repo" grumbling.

@shawnw

This comment has been minimized.

Contributor

shawnw commented Aug 23, 2018

pcre is hosted in a SVN repository, so unless git can work with that, no joy on submodules. And no, I don't want to set up and maintain a mirror of it.

Hmm. I wonder if I can just delete the documentation directories or files without the installer failing...

@shawnw

This comment has been minimized.

Contributor

shawnw commented Aug 25, 2018

New version has configure downloads pcre2 source tarball as needed and unpack it, instead of being included in the Penn repository.

shawnw added some commits Aug 25, 2018

@shawnw shawnw merged commit 51beed3 into master Oct 12, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment