New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the name of the normalize function #4

Closed
JesseKPhillips opened this Issue Feb 22, 2018 · 5 comments

Comments

3 participants
@JesseKPhillips

JesseKPhillips commented Feb 22, 2018

While D has good tools to handle name conflicts it can still be useful to avoid needing to use them. And std.uni.normalize seem likely to be a function that might be imported when dealing with XML and so I believe that avoiding the conflict all together would be a good choice.

@jmdavis

This comment has been minimized.

Owner

jmdavis commented Feb 23, 2018

I don't know. I don't entirely disagree, but I have no clue what else to name it, and the language does provide good ways to disambiguate. normalize is the best name that I could come up with, and I don't want to go with a bad name just to avoid naming conflicts. What would you suggest it be called?

@JesseKPhillips

This comment has been minimized.

JesseKPhillips commented Feb 23, 2018

Well, technically I think decode is closer as we have it std.xml already, though that doesn't solve the conflict problem because std.utf also has decode. unescape could work.

@jmdavis

This comment has been minimized.

Owner

jmdavis commented Feb 24, 2018

Hmm. Well, unescape seems decent, though it doesn't work as well in the sense that removing carriage returns doesn't involve escaping or unescaping anything. If we had unescape, then the reverse would be escape, whereas right now, we have normalize and the opposite (which hasn't been released yet) is currently denormalize, which I'm not a big fan of. Maybe escapeXML and unescapeXML? I don't like that they don't fit the carriage return behavior very well, but they could still be better overall. I'll have to think about it. Whatever I end up with, hopefully I can avoid a bunch of bikeshedding when it finally comes time to try and get dxml through the Phobos review process.

@jmdavis jmdavis added the enhancement label Feb 24, 2018

@ghost91-

This comment has been minimized.

ghost91- commented Feb 24, 2018

kxml, the XML library I was using before the release of dxml, calls those functions xmlDecode and xmlEncode, which seems a good fit (though I'd prefer encodeXML and decodeXML and escapeXML, unescapeXML would also be fine imo). However, those functions don't about \r. The fact that we have difficulty in finding a proper name for the function because of handling \r leads me to think that maybe it should not be handled in the same function.

A side note: I think handling of \r is not done correctly at the moment: If I understand the code correctly, normalize simply strips all occurrences of \r. However, the correct behaviour as by the XML spec is to replace both any 2 character sequences \r\n and any \r not followed by a \n by a \n (see https://www.w3.org/TR/REC-xml/#sec-line-ends).

@jmdavis

This comment has been minimized.

Owner

jmdavis commented Feb 24, 2018

However, those functions don't about \r. The fact that we have difficulty in finding a proper name for the function because of handling \r leads me to think that maybe it should not be handled in the same function.

Handling them in a separate function means processing the data twice to strip out the carriage returns and transform the character entities. So, I don't think that it really makes sense to separate that into two separate functions.

A side note: I think handling of \r is not done correctly at the moment: If I understand the code correctly, normalize simply strips all occurrences of \r. However, the correct behaviour as by the XML spec is to replace both any 2 character sequences \r\n and any \r not followed by a \n by a \n (see https://www.w3.org/TR/REC-xml/#sec-line-ends).

That's not quite true. https://www.w3.org/TR/REC-xml/#sec-common-syn also talks about either removing \r or replacing it with \n, so it would be correct for the parser to either ignore \r or treat it as \n. So, for everything except for counting lines, the parser treats it as \n, since that's the simplest. Regardless, it's impossible for the parser to actually do anything to \r while parsing, because it returns slices of the input, and so all of the carriage returns must be left in - which is why normalize then strips them out. For better or worse, the spec gives leeway on how to handle carriage returns, and what dxml does takes advantage of that.

@jmdavis jmdavis added the 0.3 label Apr 18, 2018

@jmdavis jmdavis closed this Apr 18, 2018

jmdavis added a commit that referenced this issue Apr 18, 2018

Rename dxml.util.normalize to dxml.util.decodeXML.
#4

normalize is now decodeXML and asNormalized is now asDecodedXML. The old
names are now deprecated aliases and are scheduled to be removed in dxml
0.4.0.

The new names fit better with the newly added writer functionality
(denormalize didn't seem like a great function name) and don't risk
conflicting with std.uni.normalize (which stands a fairly high chance of
being used in conjuction with dxml and thus would result in symbol
conflicts).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment