http-request when URI is a puri:uri -- more silent coercion problems #13

Open
mon-key opened this issue Mar 20, 2012 · 1 comment

mon-key commented Mar 20, 2012

In light of your recent fix, I've now noticed that when the URI argument to http-request is a puri:uri we get a situation similar to the one we had when URI is a string: the following two calls do not return equivalent results:

(http-request #u"http://id.loc.gov/vocabulary/graphicMaterials/label/Action%20%26%20adventure%20dramas"
              :preserve-uri t :method :head)

(drakma:http-request "http://id.loc.gov/vocabulary/graphicMaterials/label/Action%20%26%20adventure%20dramas"
                     :preserve-uri t :method :head)

It is beyond me whether this difference in return values constitutes a bug or not.

That said, I would like to point out that if it is considered a bug, the possible fixes around puri may not be quite as trivial as they were for strings, especially because puri:parse-uri breaks percent-encoded non-ASCII characters by silently coercing them to goo.

This returns:

(drakma:http-request "http://id.loc.gov/vocabulary/graphicMaterials/label/A%20la%20poup%C3%A9e%20prints"
                     :preserve-uri t 
                     :method :head)

These don't:

(drakma:http-request #u"http://id.loc.gov/vocabulary/graphicMaterials/label/A%20la%20poup%C3%A9e%20prints"
                     :preserve-uri t 
                     :method :head)

(drakma:http-request
 (puri:parse-uri "http://id.loc.gov/vocabulary/graphicMaterials/label/A%20la%20poup%C3%A9e%20prints")
 :preserve-uri t 
 :method :head)

Additional discussion here:
http://paste.lisp.org/+2R44

Also, maybe these are relevant:
https://github.com/archimag/puri-unicode
https://github.com/franzinc/uri

@tmccombs
Contributor

puri isn't converting the percent encodings into goo; it is decoding each percent-encoded octet as a single latin1 character. In your example, %C3 is Ã and %A9 is © in latin1. But the octets of "Ã©" in the latin1 encoding are the same as the octets of é in UTF-8.
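
To make the point above concrete, here is a minimal sketch in plain Common Lisp that decodes the same escapes both ways. The helpers PERCENT-OCTETS, DECODE-LATIN1 and DECODE-UTF8 are hypothetical names made up for this illustration; they are not part of puri or drakma, and the sketch assumes an implementation whose character codes are Unicode code points (SBCL and CCL, for example).

```lisp
;;; Illustrative only: these helpers are hypothetical, not puri or drakma API.

(defun percent-octets (string)
  "Return the octets of STRING as a list: each %XX escape becomes one octet,
every other character contributes its char-code (ASCII input assumed)."
  (loop with i = 0
        while (< i (length string))
        collect (if (char= (char string i) #\%)
                    (prog1 (parse-integer string :start (+ i 1) :end (+ i 3)
                                                 :radix 16)
                      (incf i 3))
                    (prog1 (char-code (char string i))
                      (incf i)))))

(defun decode-latin1 (octets)
  "Treat every octet as one character, which is what latin1 decoding does."
  (map 'string #'code-char octets))

(defun decode-utf8 (octets)
  "Decode the list OCTETS as UTF-8. No validation; well-formed input assumed."
  (let ((chars '()))
    (loop while octets
          do (let* ((lead (pop octets))
                    ;; Number of continuation octets implied by the lead octet.
                    (extra (cond ((< lead #x80) 0)
                                 ((< lead #xE0) 1)
                                 ((< lead #xF0) 2)
                                 (t 3)))
                    ;; Payload bits of the lead octet.
                    (code (if (zerop extra)
                              lead
                              (logand lead (ash #x3F (- extra))))))
               (dotimes (k extra)
                 (setf code (logior (ash code 6)
                                    (logand (pop octets) #x3F))))
               (push (code-char code) chars)))
    (coerce (nreverse chars) 'string)))

;; %C3%A9 names two octets, 195 and 169.
;; Read as latin1 they are two characters (code points 195 and 169);
;; read as UTF-8 they are one character (code point 233).
(let ((octets (percent-octets "%C3%A9")))
  (list (map 'list #'char-code (decode-latin1 octets))
        (map 'list #'char-code (decode-utf8 octets))))
```

So the same two octets give a two-character string under latin1 and a one-character string under UTF-8, which would explain why the string and puri:uri requests end up asking for different paths.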

However, the RFC for URLs (RFC 1738) says:

Octets must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.

So puri should not be decoding the percent encodings into non-ASCII characters.
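
For illustration, this is roughly what keeping a path ASCII-only looks like in plain Common Lisp: any non-ASCII character is written back out as percent-encoded UTF-8 octets. UTF8-OCTETS and ESCAPE-NON-ASCII are hypothetical helpers invented for this sketch, not functions from puri or drakma, and the sketch assumes character codes are Unicode code points.

```lisp
;;; Illustrative only: these helpers are hypothetical, not puri or drakma API.

(defun utf8-octets (code)
  "Return the UTF-8 octets for the code point CODE as a list."
  (cond ((< code #x80) (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

(defun escape-non-ascii (string)
  "Percent-encode every non-ASCII character of STRING as UTF-8 octets.
ASCII characters, including existing % escapes, are copied unchanged."
  (with-output-to-string (out)
    (loop for ch across string
          for code = (char-code ch)
          do (if (< code #x80)
                 (write-char ch out)
                 (dolist (octet (utf8-octets code))
                   ;; ~2,'0X prints the octet as two zero-padded hex digits.
                   (format out "%~2,'0X" octet))))))

;; A string holding a literal e-acute (code point 233) between ASCII letters
;; gets the two-octet escape for that character; the ASCII part is untouched.
(escape-non-ascii (coerce (list #\p (code-char 233) #\e) 'string))
```

Note that this only covers the non-ASCII half of the RFC sentence quoted above; unsafe and reserved ASCII characters would need their own handling.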
