Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed language detection to support parsing of HTML fragments #121

Merged
merged 3 commits into from May 27, 2017

Conversation

jkphl
Copy link
Contributor

@jkphl jkphl commented May 25, 2017

The current way of language detection breaks when parsing an HTML fragment only (instead of a complete HTML document). The parser expects an <html> element to be present and will run into

Argument 1 passed to Mf2\Parser::language() must be an instance of DOMElement, instance of DOMDocument given, called in L:\micrometa\vendor\mf2\mf2\Mf2\Parser.php on line 513 and defined

otherwise. Making sure the parent node is a DOMElement before recursing fixes the problem.

@gRegorLove
Copy link
Member

This looks good (simple enough of an update), but can you add a test case? Specifically I'm wondering if this still parses language from the HTML fragments or skips it.

@jkphl
Copy link
Contributor Author

jkphl commented May 25, 2017

@gRegorLove I tried running the present tests but 2 of them fail for me (both having to do with fetching remote documents from Barnaby's and Aaron's sites, so I guess it's a problem on my side). I don't know if there's a test included that checks for the language, but if so, it should still work. The new release Aaron just pushed out broke my code (micrometa) because of the mandatory <html> element, adding the check for DOMElement fixes it again. I'll see if I can add a test proving that language parsing still works and also HTML fragments are supported (will take me a while thoug, please bear with me).

@jkphl
Copy link
Contributor Author

jkphl commented May 25, 2017

@gRegorLove Re-read your comment. Well, there should be no difference in language handling after adding the check for DOMElement, except that the recursion stops before hitting the DOMDocument. For fragments without an <html> element there will be no <meta http-equiv=""> check, but xml:langs will still be detected.

@jkphl
Copy link
Contributor Author

jkphl commented May 26, 2017

@gRegorLove I now added a simple test for parsing the language of an HTML fragment without enclosing <html> element. All the tests in ParseLanguageTest.php run totally fine for both complete HTML documents as well as HTML fragments.

@gRegorLove
Copy link
Member

Awesome, thanks! I think I misunderstood the update initially. Having another test is good though. :)

@aaronpk
Copy link
Member

aaronpk commented May 27, 2017

I tried running the test case without the change you submitted, but it doesn't fail the test. Can you provide example HTML that causes this error before the fix?

@aaronpk
Copy link
Member

aaronpk commented May 27, 2017

Got it, thanks!

@jkphl
Copy link
Contributor Author

jkphl commented May 27, 2017

Interesting! I just realized that the problem hits you only in an edge case: The parser allows to pass in a DOMDocument as $input. When this document has been constructed via ->loadXML() (instead of ->loadHTML()), then an HTML fragment won't be wrapped with the base HTML scaffold ... hence the error.

@aaronpk aaronpk merged commit 84bd6ef into microformats:master May 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants