No parsing of text inside <ref>? #67

Closed
wetneb opened this issue Sep 23, 2017 · 2 comments
Comments

@wetneb
Member

wetneb commented Sep 23, 2017

Hi!
After some investigation, it looks like content enclosed in XML-style tags is not parsed any further. For instance:

Groundbreaking claim.<ref>see {{cite book|author=Chuck Norris|title=The Truth}}</ref>

The citation template will not be parsed at all: the content of the <ref>...</ref> is just represented as a String.

I understand this might be desirable for tags like <nowiki/>, but I'm not sure why it would apply to every tag. How could I modify the parser to recurse inside the <ref>?

@hannesd
Member

hannesd commented Sep 25, 2017

The behavior you describe only applies to tags that were configured as tag extensions in the parser (through the config). Tag extensions are basically functions that are free to choose what they want to do with their content. Therefore the parser cannot just parse their content because there's a good chance the content isn't even wikitext.
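To make the distinction concrete, here is a minimal toy model of that dispatch. It is not Sweble code: the class `TagExtensionSketch`, the method `registerTagExtension`, and the regex-based "parser" are all hypothetical names invented for illustration. The only point it shows is that a registered tag extension receives its body as a raw string, while an unregistered element has its body parsed recursively.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: none of these names come from Sweble.
public class TagExtensionSketch {
    // Registered tag extensions: tag name -> handler that receives the raw body.
    private final Map<String, Function<String, String>> tagExtensions = new HashMap<>();

    // Matches <name>body</name>; nested tags of the same name are not handled here.
    private static final Pattern ELEMENT =
            Pattern.compile("<(\\w+)>(.*?)</\\1>", Pattern.DOTALL);

    // Matches a template call such as {{cite book|...}}.
    private static final Pattern TEMPLATE =
            Pattern.compile("\\{\\{(.*?)\\}\\}", Pattern.DOTALL);

    public void registerTagExtension(String name, Function<String, String> handler) {
        tagExtensions.put(name, handler);
    }

    public String parse(String wikitext) {
        StringBuilder out = new StringBuilder();
        Matcher m = ELEMENT.matcher(wikitext);
        int last = 0;
        while (m.find()) {
            // Text before the element is ordinary wikitext.
            out.append(parseInline(wikitext.substring(last, m.start())));
            String name = m.group(1);
            String body = m.group(2);
            Function<String, String> handler = tagExtensions.get(name);
            if (handler != null) {
                // Registered extension: the body is handed over unparsed,
                // because it may not be wikitext at all.
                out.append(handler.apply(body));
            } else {
                // Unknown element: the body is parsed like any other wikitext.
                out.append('<').append(name).append('>')
                   .append(parse(body))
                   .append("</").append(name).append('>');
            }
            last = m.end();
        }
        out.append(parseInline(wikitext.substring(last)));
        return out.toString();
    }

    // Stand-in for real wikitext parsing: marks every template call it finds.
    private static String parseInline(String text) {
        return TEMPLATE.matcher(text).replaceAll("[TEMPLATE:$1]");
    }
}
```

In this sketch, registering a handler for `ref` leaves the `{{cite book|...}}` call untouched inside the raw body, whereas leaving `ref` unregistered lets the template be recognized. A handler can also opt back in to parsing by calling `parse(body)` itself.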

I took a look at the engine code, and the <math> and <ref> extensions have placeholder implementations (they just return null). You have two options here:

a) Remove the math and ref tag extension from the configuration. They should then be treated as unknown XML elements and their content would get parsed.

b) Properly implement the extension you require. Here's an example of how one could parse the extension's content in the MathTagExtImpl.invoke(...) method of the tag extension:

try
{
	EngProcessedPage processed = frame.getEngine().parseAndPostprocess(
			new PageId(frame.getTitle(), -1L),
			body.getContent(),
			null);
	EngPage page = processed.getPage();
	return nf().unwrap(page);
}
catch (EngineException e)
{
	return nf().softError("Processing of <math> tag failed");
}

I cannot guarantee that this code actually works as I have not tested it.

@wetneb
Member Author

wetneb commented Sep 25, 2017

@hannesd awesome, thanks a lot!

@wetneb wetneb closed this as completed Sep 25, 2017
wetneb added a commit to OpenRefine/OpenRefine that referenced this issue Oct 20, 2017
This is a temporary fix before we do full Wikitext parsing inside references
(this needs a change upstream). See sweble/sweble-wikitext#67 .