<nowiki> appearing where it should not #259

desb42 · 2018-10-27T09:16:14Z

The following wikitext

{{markup|title=Using only footnote-style references
|<nowiki>Lorem ipsum.<ref>Source name, access date, etc.</ref>

Lorem ipsum dolor sit amet.<ref>Source name, access date, etc.</ref>

==References==
{{Reflist}}</nowiki>
|Lorem ipsum.<ref>Source name, access date, etc.</ref>
Lorem ipsum dolor sit amet.<ref>Source name, access date, etc.</ref>

{{fake heading|sub=3|References}}
{{Reflist}}
}}

In the wikipedia sandbox produces

Xowa produces

Ignoring the red error, note the presence of the text '<nowiki>' and '</nowiki>' in the left hand column

The handling of <nowiki> does not seem quite correct


gnosygnu · 2018-10-27T18:55:11Z

Ugh... This sounds like a parser issue. Is there an existing page where you're seeing this error? (I just want to get an idea of how widespread this issue is)

As for the actual fix, I'll have to look at the nowiki implementation. This is a particularly complicated piece of code that I wrote early in the XOWA parser implementation. It's possible that either my imitation of MediaWiki's behavior wasn't good enough, or MediaWiki changed something recently.

I'll look again at the code later this week, but depending on how widespread the above is, this may be my highest priority.

Thanks!

desb42 · 2018-10-28T09:00:36Z

I stumbled across it while looking at Template:Reflist/doc, the documentation page for Template:Reflist.
I cannot tell how widespread it is; however, I suspect it is an edge case.

desb42 · 2018-10-28T09:09:17Z

I have just scanned all the enwiki html databases (18 of them) and the only one with <nowiki> in it seems to be 1965–66_TSV_1860_Munich_season

gnosygnu · 2018-10-29T00:23:33Z

Thanks for the follow-up.

I found the issue. It's related to the {{#tag:}} parser function. The simplified example wikitext would be the following:

{{#tag:pre|<nowiki>A<b>B</b></nowiki>}}

... which outputs nowiki tags

This behavior is caused by the tag function wrapping the original contents in a UNIQ block and unwrapping later. I have to look at MediaWiki code later to see what is the proper fix. A sloppy proof of concept hack would be to make the following change to https://github.com/gnosygnu/xowa/blob/master/400_xowa/src/gplx/xowa/xtns/pfuncs/strings/Pfunc_tag.java#L47

if (args_len > 0) {	// handle no args; EX: "{{#tag:ref}}" -> "<ref></ref>"
	byte[] temp = Pf_func_.Eval_arg_or_empty(ctx, src, caller, self, args_len, 0);
	temp = ctx.Wiki().Parser_mgr().Main().Parse_text_to_html(Xop_ctx.New__sub__reuse_page(ctx), temp);
	tmp_bfr.Add(temp);
}

However, this won't work on a permanent basis because the Main() parser should not be invoked in nested calls.

I'll comment again here when I have a more robust fix.
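
For readers unfamiliar with the wrap/unwrap idea mentioned above, here is a minimal Python sketch of it. It is not XOWA's or MediaWiki's actual code: the marker format, the `tag_function` helper, and the choice to HTML-escape the inner text are all assumptions made for illustration. The point it tries to show is that if only the escaped inner text is stored behind the marker, the nowiki tags themselves cannot leak back into the output when the marker is expanded.

```python
import html
import re

NOWIKI_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)
# Made-up marker format for this sketch; real parsers use their own.
MARKER_RE = re.compile(r"\x7fUNIQ-(\d+)-QINU\x7f")


def strip_nowiki(text, store):
    """Swap each <nowiki>...</nowiki> span for an opaque marker.

    Only the escaped inner text is stored, so the tags themselves
    cannot reappear when the marker is expanded later.
    """
    def stash(match):
        store.append(html.escape(match.group(1)))
        return "\x7fUNIQ-%d-QINU\x7f" % (len(store) - 1)

    return NOWIKI_RE.sub(stash, text)


def unstrip(text, store):
    """Expand every marker back into its stored, already-escaped content."""
    return MARKER_RE.sub(lambda m: store[int(m.group(1))], text)


def tag_function(tag_name, inner):
    """Hypothetical stand-in for {{#tag:...}}: wrap content, hide nowiki."""
    store = []
    body = strip_nowiki(inner, store)
    return unstrip("<%s>%s</%s>" % (tag_name, body, tag_name), store)
```

Calling `tag_function("pre", "<nowiki>A<b>B</b></nowiki>")` on the simplified example yields a `pre` element whose content is the escaped text, with no nowiki tags in it.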

On another note, how do you scan the html databases? I assume you have some adhoc code that un-hzips each html page and then scans the full-text? If so, how long does that take? I'd imagine it would take at least 2+ hours for each scan (unless you're saving the un-hzipped content as files somewhere)

desb42 · 2018-10-29T06:03:14Z

To scan the HTML, I have a simple Python script that does essentially what you describe.

See the gist checkhtml.py.

On the machine I use, this takes about 30 minutes.
It produces 6059 entries.
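
The gist itself is not reproduced in this thread, so the following is only a rough sketch of the kind of check such a scan could perform, not the contents of checkhtml.py. The function name `find_nowiki_leaks` is hypothetical, and reading pages out of the XOWA html databases is deliberately left out: the function accepts any iterable of (title, html) pairs. It counts both literal nowiki tags and their entity-escaped form, since a leaked tag could show up either way in rendered HTML.

```python
import re

# Matches <nowiki>, </nowiki>, <nowiki/> and the same tags written with
# &lt; / &gt; entities.
LEAK_RE = re.compile(r"(?:<|&lt;)/?nowiki\s*/?(?:>|&gt;)", re.IGNORECASE)


def find_nowiki_leaks(pages):
    """Yield (title, count) for each page whose HTML contains nowiki tags.

    `pages` is any iterable of (title, html) pairs.
    """
    for title, html_text in pages:
        count = len(LEAK_RE.findall(html_text))
        if count:
            yield title, count
```

Because it is a generator, it can be fed pages one at a time without holding a whole database in memory.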

gnosygnu · 2018-11-01T14:29:44Z

Cool. This should pick up most of the errors, since they aren't hzipped.

I'll give the python script a try when I get home later. It's interesting that your script is relatively concise yet powerful. One day, when I get rid of hzip, it'll be pretty useful in scanning through all the html pages

desb42 · 2019-02-14T17:41:07Z

I have just found an instance of this <nowiki> issue which has broader consequences

Within the source of the page are the following lines:

<th scope="row" class="navbox-group" style="background: white; 
-moz-box-shadow: inset 2px 2px 0 <nowiki>#F0001C</nowiki>, inset -2px -2px 0 <nowiki>#F0001C</nowiki>; 
-webkit-box-shadow: inset 2px 2px 0 <nowiki>#F0001C</nowiki>, inset -2px -2px 0 <nowiki>#F0001C</nowiki>; 
box-shadow: inset 2px 2px 0 <nowiki>#F0001C</nowiki>, inset -2px -2px 0 <nowiki>#F0001C</nowiki>;;width:1%">

Note the presence of many <nowiki>

Looking at the wikitext the area under discussion is {{Party of European Socialists}}
This in turn contains three {{Party of European Socialists/meta/color}} entries
And that template contains the <nowiki> entry
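
To make the expected result concrete, here is a small Python sketch of what the attribute value should look like once the nowiki wrappers are dropped. The helper name `unwrap_nowiki` is made up, and this is only the end result one would expect to see in the style attribute, not a description of how MediaWiki or XOWA get there. (Presumably the template wraps the color in nowiki so that the leading `#` is not treated as list markup, but that is a guess.)

```python
import re

NOWIKI_WRAP_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)


def unwrap_nowiki(value):
    """Drop <nowiki> wrappers from a string, keeping the inner text."""
    return NOWIKI_WRAP_RE.sub(lambda m: m.group(1), value)
```

Applied to one of the declarations above, `box-shadow: inset 2px 2px 0 <nowiki>#F0001C</nowiki>, ...` becomes `box-shadow: inset 2px 2px 0 #F0001C, ...`, which is valid CSS.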

I think it needs a little boost in priority

gnosygnu · 2019-02-15T03:34:30Z

Thanks for the example. Will take a look at it this weekend, but nowiki debugging always gives me a headache.

desb42 · 2019-03-12T11:47:29Z

And here's another <nowiki> problem, the other way around. That is, the <nowiki> tags do not seem to be honored:
en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2011/March#CFI_and_vandalism

The wikitext of this section is:

== CFI and vandalism ==

Now this is a section CFI could do well without:

<div style="border-left: 1px solid #C00; border-left-width: 3px; padding-left: .5em; margin-left: 2em;">
<nowiki>==Vandalism==</nowiki>

From time to time, various parties will insert material into Wiktionary which clearly has nothing

XOWA treats the ==Vandalism== as a header; MediaWiki treats it as plain text.
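
As an illustration of the expected behavior, here is a minimal Python sketch of heading detection that respects nowiki. It is an assumption-laden toy, not either parser's real algorithm: the regexes and the `find_headings` name are invented for this example. The nowiki spans are blanked out before the heading pattern runs, so markup inside them is never seen as a heading.

```python
import re

NOWIKI_SPAN_RE = re.compile(r"<nowiki>.*?</nowiki>", re.DOTALL)
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def find_headings(wikitext):
    """Return (level, title) pairs, ignoring anything inside <nowiki>."""
    # Same-length filler keeps the positions of all later text unchanged.
    masked = NOWIKI_SPAN_RE.sub(lambda m: "\x00" * len(m.group(0)), wikitext)
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(masked)]
```

On the section above this reports only "CFI and vandalism". If the nowiki tags are removed from the text before heading detection, "Vandalism" is reported as a second heading, which matches the incorrect rendering described here.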

desb42 · 2019-05-13T12:11:33Z

I thought I would take a look at this and have noticed quite a lot of commented-out code regarding UNIQ
So I reinstated them to see what happens

The example I was specifically tracking down was en.wikipedia.org/wiki/Template:Party of European Socialists/meta/color

It does seem to work with the current code (this is due to the nowiki text being 'escaped')

I tracked things to Xop_tblw_wkr.java Atrs_make
This routine essentially finds all the tokens associated with the attributes to the table element, works out where they start and end and then throws them away.
For <nowiki>, there is a piece of commented-out code to use Uniq_mgr

Instead, I took the tokens identified and effectively passed them through Xot_tmpl_wtr.Write
This seemed to work in the short term

However, I believe there is an underlying issue with the table tokens - they all assume that they refer to the original source
Using the above approach I think the object prv_tblw should not only be adjusted for range but also for the potentially new and different sized source
(Or am I just rambling)
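
The offset concern can be shown with a short Python sketch. Nothing here is XOWA code: `apply_replacement` and the (begin, end) tuple representation of tokens are hypothetical, chosen only to illustrate that once a span of the source is replaced by text of a different length, every token range after that span has to be shifted by the size difference.

```python
def apply_replacement(src, tokens, start, end, new_text):
    """Replace src[start:end] with new_text and shift token ranges to match.

    `tokens` is a list of (begin, end) offsets into `src`. Tokens that lie
    wholly after the replaced span are moved by the size difference; tokens
    wholly before it are untouched. Overlapping tokens are rejected, since
    this sketch has no sensible way to remap them.
    """
    delta = len(new_text) - (end - start)
    shifted = []
    for tok_begin, tok_end in tokens:
        if tok_end <= start:
            shifted.append((tok_begin, tok_end))
        elif tok_begin >= end:
            shifted.append((tok_begin + delta, tok_end + delta))
        else:
            raise ValueError("token overlaps the replaced span")
    return src[:start] + new_text + src[end:], shifted
```

For example, replacing a 21-character nowiki span with its 4-character contents moves every later token 17 positions to the left; a token list that is not adjusted this way would point at the wrong text in the new source.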

gnosygnu · 2019-05-14T02:44:25Z

> I thought I would take a look at this and have noticed quite a lot of commented-out code regarding UNIQ
> So I reinstated them to see what happens

Yeah, I added this a while ago. I forget why I left it commented (probably did not want to risk changing behavior)

Let me put it on tap for this weekend. Thanks.

gnosygnu added the core - parser label Oct 27, 2018

gnosygnu mentioned this issue Dec 13, 2018

Template:DISPLAYTITLE/doc display variance #300

Open

gnosygnu added [type - bug] [priority 3 - moderate] [schedule 2 - within weeks] [risk 3 - moderate] [effort 3 - less than a week] labels Feb 15, 2019

gnosygnu added [schedule 1 - within days] and removed [schedule 2 - within weeks] labels May 14, 2019

This was referenced May 18, 2019

<table> column not showing complete text #466

Open

Template:Pre2 not behaving correctly #468

Open

desb42 mentioned this issue Jun 18, 2019

another <nowiki> issue #499

Open
