Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in GET_PREV_CHAR macro #278

Merged
merged 2 commits into from Mar 3, 2024

Conversation

chqrlie
Copy link
Collaborator

@chqrlie chqrlie commented Feb 19, 2024

  • pass cbuf_type variable to XXX_CHAR macros in lre_exec_backtrack()
  • improve readability of these macros
  • fix GET_PREV_CHAR macro: cptr was decremented twice on invalid high surrogate.

- pass `cbuf_type` variable to XXX_CHAR macros in `lre_exec_backtrack()`
- improve readability of these macros
- fix GET_PREV_CHAR macro: cptr was decrementes twice on invalid high surrogate.
@chqrlie
Copy link
Collaborator Author

chqrlie commented Feb 19, 2024

I found this while proof reading as I was merging some of the differences into quickjs. It is difficult to create a test case triggering this bug...

libregexp.c Outdated
if (_p < _end && is_lo_surrogate(*_p)) { \
c = from_surrogate(c, *_p++); \
} \
} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave out stylistic changes? They make the diff larger and more difficult to read.

As a reviewer, I first have to figure out if it's actually a functional change - and if not, if it's really equivalent to the old code. As a maintainer, style changes introduce churn that makes git blame less effective. As a contributor, it's rage inducing when your carefully crafted pull request ends up with a ton of merge conflicts because someone landed a style change the other day.

Changes that improve legibility are okay but a) sparingly, and b) should be split off into separate commits and marked NFC (Non-Functional Change.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, I was trying to reduce the macro size and combine the tests for consistency. I posted a patch with minimal NFC differences.

- pass `cbuf_type` variable to XXX_CHAR macros in `lre_exec_backtrack()`
- improve readability of these macros
- fix GET_PREV_CHAR macro: cptr was decrementes twice on invalid high surrogate.
- minimize non functional changes
@chqrlie
Copy link
Collaborator Author

chqrlie commented Mar 3, 2024

I am going to merge this patch. This is work in progress:

  • we might use unions for the cbuf_xxx pointers to avoid warnings about alignment in casts
  • there is substantial room for performance improvement in lre_exec_backtrack handling some opcodes more explicitly for the 3 different supported encodings.
  • we should have variable length opcodes for string fragment matches, or at least fixes length string fragments.
  • we could also use different opcodes to avoid testing s->ignore_case and s->unicode for each character.
  • these changes would pave the way to generate JIT code for regexp matching.

@chqrlie chqrlie merged commit 5abbeac into quickjs-ng:master Mar 3, 2024
35 checks passed
@chqrlie chqrlie deleted the fix-get-char-macro branch March 4, 2024 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants