Content of an element can also contain a tag terminator ('>') #22

matsumotosyu · 2020-04-07T09:25:41Z

The content of an element can contain a tag terminator ('>') even outside CDATA sections, comments, and processing instructions. (A tag opener ('<') cannot appear there.)

Citation.
https://www.w3.org/TR/xml/#NT-content

[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)

From the above, it follows that the following is well-formed XML. (I'm sorry.)
(I also verified it with this site ( https://www.w3schools.com/xml/xml_validator.asp ).)

<x some="・・>・・" any='・・"・・>・・'>・・・(A string that does not contain '<', but can contain '>'.)・・>・・</x>
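As a rough illustration of production [14], the check below tests whether a string is valid CharData. The function name `isCharData` is made up for this sketch and is not part of the library.

```javascript
// Hypothetical helper illustrating production [14] CharData:
// any run of characters without '<' or '&' that does not contain "]]>".
function isCharData(text) {
  // '<' and '&' are never allowed in character data.
  if (/[<&]/.test(text)) {
    return false;
  }
  // The CDATA close delimiter is excluded explicitly by the grammar.
  return !text.includes(']]>');
}

console.log(isCharData('text>more')); // '>' alone is allowed
console.log(isCharData('a<b'));       // '<' is not
```

Note that a bare '>' passes the check, which is why the parser cannot treat every '>' it sees as a tag boundary.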

So I reworked the logic for finding the end of a tag as follows.

・To find the tag end ('>') of the tag containing the current cursor, read forward from the cursor one character at a time and apply the following rules, skipping quoted strings along the way.

When the character at the cursor is a " (double quote), jump the cursor to the next " (double quote).†1†3
When the character at the cursor is a ' (single quote), jump the cursor to the next ' (single quote).†2†3
When the character at the cursor is '>', treat it as the end of the tag.

†1 A " (double quote) cannot appear in a string enclosed in " (double quotes).
†2 A ' (single quote) cannot appear in a string enclosed in ' (single quotes).
†3 If the matching quote does not exist, the error is handled in the same way as before.
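The rules above can be sketched roughly as follows. This is only an illustration, not the code in lib/parser.js: the function name `findTagEnd` and the choice to return -1 on failure are assumptions made for the example.

```javascript
// Hypothetical helper, not the actual implementation in lib/parser.js.
// Scans forward from `start` (an index inside a tag) and returns the index
// of the '>' that closes the tag, or -1 if a quote or the tag itself is
// left unterminated.
function findTagEnd(xml, start) {
  for (let i = start; i < xml.length; i++) {
    const ch = xml[i];

    if (ch === '"' || ch === "'") {
      // Jump to the matching quote. The same quote character cannot
      // appear inside the quoted value (notes †1 and †2).
      i = xml.indexOf(ch, i + 1);
      if (i === -1) {
        return -1; // no matching quote (note †3)
      }
    } else if (ch === '>') {
      return i; // first '>' outside quotes ends the tag
    }
  }
  return -1; // ran off the end without finding '>'
}

const sample = `<x some="a>b" any='c"d>e'>text>more</x>`;
const end = findTagEnd(sample, 1);
console.log(sample.slice(0, end + 1));
```

For the sample string, the '>' characters inside both attribute values are skipped, so the slice covers the whole start tag and stops before `text>more`.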

In working out the above logic, I took the following definitions into account.

・The BNF of the start tag (STag). (Apart from the delimiters, a start tag can contain only a tag name, Attributes, and whitespace (S).)

Citation.
https://www.w3.org/TR/xml/#sec-starttags

[40] STag ::= '<' Name (S Attribute)* S? '>' [WFC: Unique Att Spec]

The BNF of EmptyElemTag, which has the same structure as the start tag (STag)

[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [WFC: Unique Att Spec]

The BNF of Attribute

[41] Attribute ::= Name Eq AttValue [VC: Attribute Value Type] [WFC: No External Entity References] [WFC: No < in Attribute Values]

The BNF of AttValue

[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';' [WFC: Legal Character]
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';' [WFC: Entity Declared] [VC: Entity Declared] [WFC: Parsed Entity] [WFC: No Recursion]
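To make the AttValue production more concrete, here is a rough regular-expression approximation. It is a sketch under assumptions: `isAttValue` is a made-up name, and the `NAME` pattern covers only ASCII names, whereas the real Name production allows many more characters.

```javascript
// ASCII-only approximation of the XML Name production (an assumption
// for this sketch; the real production permits many Unicode ranges).
const NAME = "[A-Za-z_:][A-Za-z0-9_:.-]*";

// [67] Reference ::= EntityRef | CharRef
const REFERENCE = `&(?:${NAME}|#[0-9]+|#x[0-9a-fA-F]+);`;

// [10] AttValue: a double- or single-quoted value that excludes '<',
// a bare '&', and its own quote character.
const ATT_VALUE = new RegExp(
  `^(?:"(?:[^<&"]|${REFERENCE})*"|'(?:[^<&']|${REFERENCE})*')$`
);

// Hypothetical helper: true if `s` (including its quotes) matches AttValue.
function isAttValue(s) {
  return ATT_VALUE.test(s);
}

console.log(isAttValue('"a>b"'));    // '>' is allowed inside a value
console.log(isAttValue(`'c"d>e'`));  // the other quote type is allowed too
console.log(isAttValue('"a<b"'));    // '<' is not allowed
```

Because '>' is legal inside a quoted value while the enclosing quote character is not, jumping from one quote to its match is enough to skip an attribute value safely.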

I also updated the assumptions in the test code and added test cases.
I would appreciate it if you would consider incorporating these changes.

lib/parser.js

nikku · 2020-04-07T13:03:34Z

Thanks for the continued work on the topic.

I see that this will have a positive impact on the code base and usage.

nikku · 2020-04-08T07:58:58Z

Merged via f12ad15.

nikku · 2020-04-08T07:59:38Z

I was able to further simplify the skipping logic (and save some bytes) via 2f208e2.

Thanks for your great work 🙏.

Content of an element can also contain a tag terminator ('>')

d03ca67

nikku reviewed Apr 7, 2020

Change the declaration of a temporary variable

57cd937

nikku closed this Apr 8, 2020
