<nowiki> appearing where it should not #259
Comments
Ugh... This sounds like a parser issue. Is there an existing page where you're seeing this error? (I just want to get an idea of how widespread this issue is) As for the actual fix, I'll have to look at the nowiki implementation. This is a particularly complicated piece of code that I wrote early in the XOWA parser implementation. It's possible that either my impersonation wasn't good enough, or MediaWiki changed something recently. I'll look again at the code later this week, but depending on how widespread the above is, this may be my highest priority. Thanks! |
I stumbled across it when looking at Template:Reflist/doc - that is, I found it when looking at the documentation to Template:Reflist |
I have just scanned all the enwiki html databases (18 of them) and the only one with <nowiki> in it seems to be 1965–66_TSV_1860_Munich_season |
Thanks for the follow-up. I found the issue. It's related to the <tag> function. The simplified example wikitext would be the following:
... which outputs nowiki tags. This behavior is caused by the tag function wrapping the original contents in a UNIQ block and unwrapping later. I have to look at the MediaWiki code later to see what the proper fix is. A sloppy proof-of-concept hack would be to make the following change to https://github.com/gnosygnu/xowa/blob/master/400_xowa/src/gplx/xowa/xtns/pfuncs/strings/Pfunc_tag.java#L47
However, this won't work on a permanent basis b/c the Main() parser should not be invoked in nested calls. I'll comment again here when I have a more robust fix. On another note, how do you scan the html databases? I assume you have some ad hoc code that un-hzips each html page and then scans the full text? If so, how long does that take? I'd imagine it would take at least 2+ hours for each scan (unless you're saving the un-hzipped content as files somewhere) |
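For readers unfamiliar with the wrap-and-unwrap approach mentioned above, here is a minimal Python sketch of the general strip-marker technique. It is not XOWA's or MediaWiki's code: the `StripState` class, the marker format, and the `keep_tags` flag are all hypothetical and exist only to illustrate how literal nowiki tags could end up in the output if the whole element, rather than just its inner text, is stored behind the marker.

```python
import html
import re

# Hypothetical marker format, loosely modelled on UNIQ...QINU strip markers.
MARKER_RE = re.compile("\x7fUNIQ-(\\d+)-QINU\x7f")
NOWIKI_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)


class StripState:
    """Stores protected text and hands back opaque placeholder markers."""

    def __init__(self):
        self._items = []

    def add(self, text):
        # Remember the text and return a marker that the parser will not touch.
        self._items.append(text)
        return f"\x7fUNIQ-{len(self._items) - 1}-QINU\x7f"

    def unstrip(self, text):
        # Swap every marker back for the text stored under its index.
        return MARKER_RE.sub(lambda m: self._items[int(m.group(1))], text)


def strip_nowiki(wikitext, state, keep_tags=False):
    """Replace each nowiki element with a marker.

    keep_tags=True is an assumption used to mimic the symptom: the whole
    element is stored, so the escaped tags reappear after unstripping.
    """
    def repl(match):
        raw = match.group(0) if keep_tags else match.group(1)
        return state.add(html.escape(raw))

    return NOWIKI_RE.sub(repl, wikitext)


if __name__ == "__main__":
    source = "a <nowiki>[[b]]</nowiki> c"
    for keep in (False, True):
        state = StripState()
        print(state.unstrip(strip_nowiki(source, state, keep_tags=keep)))
```

With `keep_tags=False` only the inner text survives the round trip. With `keep_tags=True` the escaped tags are visible in the result, which resembles the symptom reported in this issue.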
To scan the html, I have a simple Python script that does essentially as you describe; see the gist checkhtml.py. On the machine I use, this takes about 30 mins |
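The gist itself is not reproduced in this thread. As a rough idea of what such a scan might look like, here is a minimal Python sketch. It is not the checkhtml.py script: the table name `html`, the column names `page_id` and `body`, and the assumption that bodies are stored as plain UTF-8 (not hzipped or otherwise compressed) are all hypothetical, and compressed bodies would need to be decoded before the search.

```python
import sqlite3

# Both the raw tag and its HTML-escaped form are treated as a hit.
NEEDLES = ("<nowiki>", "&lt;nowiki&gt;")


def find_nowiki_pages(conn, table="html", id_col="page_id", body_col="body"):
    """Return the ids of pages whose stored HTML contains a literal nowiki tag.

    The table and column names are assumptions; adjust them to the real schema.
    """
    hits = []
    query = f"SELECT {id_col}, {body_col} FROM {table} ORDER BY {id_col}"
    for page_id, body in conn.execute(query):
        if isinstance(body, bytes):
            # Assumes uncompressed UTF-8; decompress first if the body is packed.
            body = body.decode("utf-8", errors="replace")
        if body and any(needle in body for needle in NEEDLES):
            hits.append(page_id)
    return hits


if __name__ == "__main__":
    # Tiny in-memory database standing in for one html database file.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE html (page_id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO html VALUES (?, ?)",
        [(1, "<p>fine</p>"), (2, "<td>&lt;nowiki&gt;x&lt;/nowiki&gt;</td>")],
    )
    print(find_nowiki_pages(conn))
```

To cover several database files, the same function could be called once per file with `sqlite3.connect(path)`.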
Cool. This should pick up most of the errors, since they aren't hzipped. I'll give the python script a try when I get home later. It's interesting that your script is relatively concise yet powerful. One day, when I get rid of hzip, it'll be pretty useful in scanning through all the html pages |
Thanks for the example. Will take a look at it this weekend, but nowiki debugging always gives me a headache. |
I thought I would take a look at this and have noticed quite a lot of commented-out code regarding UNIQ. The example I was specifically tracking down was en.wikipedia.org/wiki/Template:Party of European Socialists/meta/color. It does seem to work with the current code (this is due to the nowiki text being 'escaped'). I tracked things to Xop_tblw_wkr.java. Instead, I took the tokens identified and effectively passed them through. However, I believe there is an underlying issue with the table tokens: they all assume that they refer to the original source |
Yeah, I added this a while ago. I forget why I left it commented out (probably did not want to risk changing behavior). Let me put it on tap for this weekend. Thanks. |
The following wikitext
In the Wikipedia sandbox produces
XOWA produces
Ignoring the red error, note the presence of the text '<nowiki>' and '</nowiki>' in the left-hand column.
The handling of <nowiki> does not seem quite correct.