fix: overhaul SAX::Parser encoding handling #3288

Merged: flavorjones merged 6 commits into main from 918-sax-parser-encoding on Jul 7, 2024

@flavorjones (Member)

What problem is this PR intended to solve?

Previously, encoding overrides were not implemented for XML::SAX::Parser#parse_memory (as reported in #918) and XML::SAX::Parser#parse_file.
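
For readers unfamiliar with the symptom reported in #918, here is a minimal stdlib-only sketch of what "double encoding" looks like. It does not use Nokogiri and is not the parser's code path; the helper name `double_encode` is hypothetical, and ISO-8859-1 is assumed as the wrongly applied source encoding purely for illustration.

```ruby
# Illustration only: simulate a consumer that wrongly treats UTF-8 bytes
# as another encoding and then transcodes them to UTF-8 a second time.
# `double_encode` is a hypothetical helper, not part of Nokogiri.
def double_encode(text, misread_as = Encoding::ISO_8859_1)
  # Reinterpret the existing bytes under the wrong encoding (no bytes change),
  # then transcode that misreading to UTF-8 (bytes do change).
  text.dup.force_encoding(misread_as).encode(Encoding::UTF_8)
end

original = "caf\u00E9"          # "café", 5 bytes in UTF-8
mangled  = double_encode(original)

puts mangled                    # the "é" now shows up as two characters
puts original.bytesize          # 5
puts mangled.bytesize           # 7
```

An explicit encoding override is what lets a caller prevent this kind of second transcoding when the input is already known to be UTF-8.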

However, this commit goes further and significantly simplifies and unifies the two SAX::ParserContext implementations and the two SAX::Parser implementations.

This commit also allows Encoding objects and encoding names to be passed into the SAX::ParserContext methods, and the XML memory and file methods now accept and properly use passed encodings.
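
As a rough sketch of what accepting both forms involves, an encoding argument can be normalized to an `Encoding` object with Ruby's standard library. The helper `normalize_encoding` below is hypothetical and is not taken from Nokogiri's source; the exact method signatures are documented in the Nokogiri API docs.

```ruby
# Hypothetical helper, not Nokogiri's implementation: accept either an
# Encoding object or an encoding name and return an Encoding (or nil).
def normalize_encoding(encoding)
  case encoding
  when nil      then nil                          # no override requested
  when Encoding then encoding                     # already an Encoding object
  else               Encoding.find(encoding.to_s) # ArgumentError if the name is unknown
  end
end

p normalize_encoding("UTF-8")               # an Encoding object for UTF-8
p normalize_encoding(Encoding::ISO_8859_1)  # returned unchanged
p normalize_encoding(nil)                   # nil
```

Because `Encoding.find` raises `ArgumentError` for an unrecognized name, a caller passing a misspelled encoding fails early instead of silently falling back to a default.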

Finally, this commit backfills substantial test coverage for encoding handling in the XML and HTML4 SAX parsers.

Closes #918

Have you included adequate test coverage?

Yes.

Does this change affect the behavior of either the C or the Java implementations?

Yes. Both implementations change, and they are now more consistent with each other.

"it's" means "it is", "its" means "belonging to it"
We'll use this in an upcoming commit to simplify the sax parsers
and polyfill xmlSwitchEncodingName. We'll need this functionality in
the next commit.
flavorjones force-pushed the 918-sax-parser-encoding branch from 2e99210 to f67b294 on July 7, 2024 20:39
flavorjones enabled auto-merge on July 7, 2024 21:01
flavorjones merged commit 1ba1db1 into main on Jul 7, 2024
flavorjones deleted the 918-sax-parser-encoding branch on July 7, 2024 21:03
bihorco36 added commits to puzzle/prawn-markup that referenced this pull request (Dec 16 and Dec 17, 2024):
Due to changes in nokogiri 17, the SAX parser now needs a default
encoding: sparklemotion/nokogiri#3288

Development

Successfully merging this pull request may close these issues.

SAX Parser ignores explicitly set 'UTF-8' encoding and proceeds to reencode the document resulting in double-encoding artifacts
