indented-xml-string: simplify handling of Apple-restricted chars #2

Merged


LiberalArtist
Contributor

Instead of an ad-hoc scheme of pre-escaping and un-escaping,
convert string x-expressions containing these characters into
(lists of) equivalent x-expressions using the Racket representations
of the entity and character references Apple mandates,
and let the XML library take care of the rest.
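
As a rough illustration of the idea (not the code in this pull request), the conversion can be sketched as follows. The names `restricted-chars` and `string->xexprs` and the exact character table are assumptions made for this example; the only library behaviour relied on is that Racket's `xml` library treats a symbol in an x-expression as an entity reference and a valid character integer as a numeric character reference.

```racket
#lang racket/base
(require xml)

;; Hypothetical table: each restricted character maps to the x-expression
;; that stands for it (a symbol for a named entity, an integer for a
;; numeric character reference). The real library's table may differ.
(define restricted-chars
  (hasheqv #\& 'amp  #\< 'lt   #\> 'gt
           #\' 'apos #\" 'quot
           #\© 169   #\℗ 8471  #\™ 8482))

;; Split a string into a list of x-expressions: runs of ordinary
;; characters stay strings, restricted characters become symbols/integers.
(define (string->xexprs str)
  (let loop ([chars (string->list str)] [pending '()] [acc '()])
    ;; Push the pending run of ordinary characters (if any) onto acc.
    (define (flush)
      (if (null? pending)
          acc
          (cons (list->string (reverse pending)) acc)))
    (cond
      [(null? chars) (reverse (flush))]
      [(hash-ref restricted-chars (car chars) #f)
       => (lambda (ref) (loop (cdr chars) '() (cons ref (flush))))]
      [else (loop (cdr chars) (cons (car chars) pending) acc)])))

;; Splice the pieces into an element and let the XML library serialize it.
(display (xexpr->string `(title ,@(string->xexprs "Punch & Judy's © feed"))))
```

Because the result is an ordinary x-expression, no un-escaping pass is needed afterwards: the serializer writes the symbols and integers as references and escapes the remaining strings as usual.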

@LiberalArtist
Contributor Author

I started on some contract enhancements, too, but I need to think more thoroughly about what you said on the mailing list re CDATA. Again, I'm super excited you've made this library! I thought I was going to have to write one.

@otherjoel
Owner

Thanks! It might be a bit before I can properly process this; any delay is due to my schedule, not lack of interest.

@otherjoel
Owner

A slightly different test:

> (define test-xpr
    `(feed (author (name "Punch & Judy"))
           (title [[type "text"]] "Punch & Judy's <friend> © ℗ \"George\" Escapes!")
           (notes (div "Here & < > are escaped, but ' \" © ℗ ™ are not"))))

> (display (indented-xml-string test-xpr))
<feed>
  <author>
    <name>Punch &amp; Judy</name>
  </author>
  <title type="text">Punch &amp; Judy&apos;s &lt;friend&gt; &#169; &#8471; &quot;George&quot; Escapes!</title>
  <notes>
    <div>Here &amp; &lt; &gt; are escaped, but ' " © ℗ ™ are not</div>
  </notes>
</feed>

This should be correct, in a normal world. I’m 90% sure this will be fine. But I don’t know if Apple cares that the result uses decimal character references (e.g. &#169;) instead of hex ones.

In digging around a little more I found this, which seems to suggest that Unicode characters are actually fine and that Apple’s real issue is with named entities not being valid XML (see the part just before the example feed). If that’s the case then perhaps I’ve misread the Apple guidelines and the whole strategy is wrong; rather than replacing Unicode characters we should be scanning for any named entities and throwing exceptions.

Probably what I need to do is produce and submit a mock podcast to Apple Podcasts.

@otherjoel
Owner

Upon reflection I will just merge this since, regardless of Apple's foibles, we will need a way to properly escape the characters whose entities are defined by the XML spec (&apos; and &quot;), and this is a much better way of handling that.

Then I can reorganize to get cleaner contract boundaries, and then look at validating strings/x-expressions to ensure they will produce valid feeds after escaping. Part of that will mean ensuring there are no non-XML named entities present. (Or just translating them, similar to this gist)
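
A minimal sketch of what that validation step might look like, assuming it walks an x-expression and collects any entity symbols outside the five that XML predefines. The names `xml-predefined`, `content-of`, and `non-xml-entities` are hypothetical and do not come from the library.

```racket
#lang racket/base
(require racket/list)

;; The five named entities predefined by the XML spec.
(define xml-predefined '(amp lt gt apos quot))

;; Return the content of an element x-expression, skipping the
;; attribute list when one is present: (tag ((name "value") ...) content ...)
(define (content-of element)
  (define body (cdr element))
  (if (and (pair? body)
           (list? (car body))
           (or (null? (car body)) (pair? (caar body))))
      (cdr body)
      body))

;; Collect every symbol in content position that is not a predefined
;; XML entity. An empty result means no non-XML named entities were found.
(define (non-xml-entities x)
  (cond
    [(symbol? x) (if (memq x xml-predefined) '() (list x))]
    [(pair? x) (append-map non-xml-entities (content-of x))]
    [else '()]))

(non-xml-entities '(title ((type "text")) "Punch " amp " Judy" nbsp copy))
```

A caller could raise an exception when the returned list is non-empty, or translate each offending symbol to a numeric character reference instead.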

@otherjoel otherjoel merged commit b279470 into otherjoel:main Nov 2, 2021
@otherjoel
Owner

Thanks!
