
symbols= parameter doesn't work with compressed revisions ($wgCompressRevisions = true) #7

Closed
Xenareee opened this issue Mar 5, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@Xenareee

Xenareee commented Mar 5, 2023

This issue makes JsCalendar unusable with the "symbols" parameter.

I mistakenly reported this bug on the MediaWiki Phabricator. I explained it in depth there, so rather than repeat myself, here's the link to the report:
https://phabricator.wikimedia.org/T331228

@edwardspec
Owner

edwardspec commented Mar 5, 2023

I can't reproduce this locally. It might be caused by one of the pages in the wiki containing invalid UTF-8.

The error mentioned on https://phabricator.miraheze.org/T9688 is happening in the MediaWiki parser (not in the JsCalendar code):

preg_match_all error 4: Malformed UTF-8 characters, possibly incorrectly encoded
from /srv/mediawiki/w/includes/MagicWordArray.php(319)
#0 /srv/mediawiki/w/includes/parser/Parser.php(4116): MagicWordArray->matchAndRemove(string)
#1 /srv/mediawiki/w/includes/parser/Parser.php(1636): Parser->handleDoubleUnderscore(string)
#2 /srv/mediawiki/w/includes/parser/Parser.php(882): Parser->internalParse(string, boolean, boolean)
#3 /srv/mediawiki/w/includes/parser/Parser.php(906): Parser->recursiveTagParse(string, boolean)
#4 /srv/mediawiki/w/extensions/JsCalendar/includes/EventCalendar.php(192): Parser->recursiveTagParseFully(string)
...

It's theoretically possible that the page contained invalid UTF-8 (JsCalendar passes the contents of the page to the MediaWiki parser),
but according to the Phabricator report, your page contained only the word "Test" and nothing else (so it can't be invalid UTF-8).

JsCalendar doesn't transform the page text before passing it to Parser: it isn't truncated, wrapped, encoded, etc. JsCalendar takes the page text directly from the database and gives it to Parser.

@edwardspec edwardspec added the bug Something isn't working label Mar 5, 2023
@edwardspec
Owner

See https://phabricator.wikimedia.org/T321234
Closing, as this very same error happens without JsCalendar.

@edwardspec edwardspec added invalid This doesn't seem right and removed bug Something isn't working labels Mar 5, 2023
@bawolff
Contributor

bawolff commented Mar 6, 2023

FWIW, to reiterate what I said on the other bug: I do not believe that this issue is the same as https://phabricator.wikimedia.org/T321234. https://phabricator.wikimedia.org/T321234 is about old data that is not valid UTF-8 and for some reason was not normalized properly. Any page this is happening on that is less than 15 years old probably has a different cause (it may or may not be this extension's fault; I'm not familiar with this extension's code, and it could be some other issue at Miraheze).

@bawolff
Contributor

bawolff commented Mar 6, 2023

It may quite possibly be the fault of the code at https://github.com/edwardspec/mediawiki-extension-JsCalendar/blob/master/includes/FindEventPagesQuery.php#L188: this is not processing the old_flags field properly, which, depending on the wiki's config, could result in the exception described. (Processing old_flags properly is really hard; I would generally recommend using MW core's various classes to get page text instead if at all possible.)
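To illustrate what "processing old_flags properly" involves, here is a minimal Python sketch. It is not MediaWiki's code and not JsCalendar's fix. It expands one row of MediaWiki's text table (old_text plus the comma-separated old_flags) and handles only the gzip and utf-8 flags plus a legacy-encoding fallback. Rows flagged external or object are rejected rather than resolved, which hints at why core's own classes are the safer route. The function name is hypothetical, and so is the legacy_encoding parameter, which stands in for $wgLegacyEncoding. Accepting "utf8" as a spelling of "utf-8" is also an assumption.

```python
import zlib
from typing import Optional


def expand_text_row(old_text: bytes, old_flags: str,
                    legacy_encoding: Optional[str] = None) -> str:
    """Turn a raw row of MediaWiki's `text` table into page text (sketch only)."""
    flags = {f.strip() for f in old_flags.split(",") if f.strip()}

    if "external" in flags:
        # old_text is an address into external storage, not the text itself.
        raise NotImplementedError("external storage is not handled in this sketch")
    if "object" in flags:
        # old_text is a serialized PHP object, which Python cannot unpack here.
        raise NotImplementedError("serialized objects are not handled in this sketch")

    data = old_text
    if "gzip" in flags:
        # 'gzip' means PHP gzdeflate() output: raw DEFLATE with no header,
        # hence the negative wbits.
        data = zlib.decompress(data, -zlib.MAX_WBITS)

    if "utf-8" in flags or "utf8" in flags or legacy_encoding is None:
        # Flagged as UTF-8, or no legacy encoding configured: decode as UTF-8.
        return data.decode("utf-8")
    # Unflagged row on a wiki with a (hypothetical) legacy encoding configured.
    return data.decode(legacy_encoding)
```

With this, a row written with compression enabled (flags "utf-8,gzip") comes back as the original text, whereas handing old_text to the parser unchanged would pass compressed bytes.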

@edwardspec edwardspec reopened this Mar 7, 2023
@edwardspec edwardspec removed the invalid This doesn't seem right label Mar 7, 2023
@edwardspec
Owner

edwardspec commented Mar 7, 2023

> this is not processing the old_flags field properly

Thank you.

It turns out Miraheze had $wgCompressRevisions = true, which is why it's not working: all revisions are stored compressed.
https://github.com/miraheze/mw-config/blob/7b230dd2072bb2361b099adba3ba17b9edcdaf33/Database.php
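For readers unfamiliar with the setting: with $wgCompressRevisions enabled, MediaWiki deflates revision text before storing it (PHP's gzdeflate()) and records gzip in old_flags. A query that reads old_text directly therefore hands compressed bytes to the parser, and those bytes are generally not valid UTF-8. The following Python sketch roughly imitates that write path under this assumption. compress_for_storage is a hypothetical name, and core may set the flags in a different order or under different conditions.

```python
import zlib
from typing import Tuple


def compress_for_storage(text: str, compress_revisions: bool) -> Tuple[bytes, str]:
    """Return (old_text, old_flags) roughly as a wiki with this setting stores them."""
    data = text.encode("utf-8")
    flags = ["utf-8"]
    if compress_revisions:
        # PHP's gzdeflate() emits raw DEFLATE data; negative wbits does the same.
        compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
        data = compressor.compress(data) + compressor.flush()
        flags.append("gzip")
    return data, ",".join(flags)


stored, flags = compress_for_storage("Test", compress_revisions=True)
print(flags)              # utf-8,gzip
print(stored == b"Test")  # False: old_text no longer holds the page text itself
```

Even a page containing only "Test" is stored as compressed bytes under this setting.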

> I would generally recommend using MW core's various classes to get page text instead

Will investigate. This seems to be handled in SqlBlobStore::expandBlob(); it can also be done via SqlBlobStore::getBlobBatch().
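Since JsCalendar needs the text of many event pages at once, the batched route matters. The sketch below is a rough Python analogue of that idea, not MediaWiki's API. It loads many rows with a single query from an in-memory SQLite table shaped like MediaWiki's text table (old_id, old_text, old_flags) and expands each row by its flags. fetch_page_texts is a hypothetical name, and only gzip/utf-8 rows are supported; any other flag raises an error instead of being guessed at.

```python
import sqlite3
import zlib
from typing import Dict, Iterable


def fetch_page_texts(conn: sqlite3.Connection, text_ids: Iterable[int]) -> Dict[int, str]:
    """Load many `text` rows in one query and expand each according to old_flags."""
    ids = list(text_ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f'SELECT old_id, old_text, old_flags FROM "text" '
        f"WHERE old_id IN ({placeholders}) ORDER BY old_id",
        ids,
    )
    result: Dict[int, str] = {}
    for old_id, old_text, old_flags in rows:
        flags = {f for f in (old_flags or "").split(",") if f}
        unsupported = flags - {"utf-8", "gzip"}
        if unsupported:
            # external/object/legacy rows need core's logic; refuse rather than guess.
            raise ValueError(f"unsupported flags for old_id {old_id}: {sorted(unsupported)}")
        data = bytes(old_text)
        if "gzip" in flags:
            data = zlib.decompress(data, -zlib.MAX_WBITS)  # raw DEFLATE, as from gzdeflate()
        result[old_id] = data.decode("utf-8")
    return result


# Demo: one compressed row and one plain row.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "text" (old_id INTEGER PRIMARY KEY, old_text BLOB, old_flags TEXT)')
comp = zlib.compressobj(wbits=-zlib.MAX_WBITS)
conn.execute('INSERT INTO "text" VALUES (?, ?, ?)',
             (1, comp.compress(b"Test") + comp.flush(), "utf-8,gzip"))
conn.execute('INSERT INTO "text" VALUES (?, ?, ?)', (2, b"Plain", "utf-8"))
print(fetch_page_texts(conn, [1, 2]))  # {1: 'Test', 2: 'Plain'}
```

One query for all pages keeps the cost flat as the number of events grows. Expanding per row is still necessary because compressed and uncompressed revisions can coexist in the same table.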

@edwardspec edwardspec changed the title Internal error/parsing problems on recently added/edited pages symbols= parameter doesn't work with compressed revisions ($wgCompressRevisions = true) Mar 7, 2023
@edwardspec edwardspec added the enhancement New feature or request label Mar 7, 2023
edwardspec added a commit that referenced this issue Mar 7, 2023
@edwardspec
Owner

Should be working now.


3 participants