
Resolved #4: Handles multi-byte characters with url-retrieve (i.e., without the external curl command) #29

Merged
merged 2 commits into karthink:master on Apr 1, 2023

Conversation

@algal (Contributor) commented Mar 31, 2023

These two commits fix handling of multi-byte characters in the prompt sent to OpenAI and in the response received from it.

As a result, you can now use non-Latin alphabets like Greek, or emojis.

To make requests work, it was necessary to encode the loaded API key as utf-8, for reasons that remain obscure, but may have to do with some subtlety concerning how the concat function combines multibyte and non-multibyte strings. To make replies work, it was necessary to decode the response body as utf-8 before, not after, parsing it as json.

I have tested this lightly on Emacs 28.1 running on Linux.

algal added 2 commits March 30, 2023 17:34
If the user provides the OpenAI key via a function, as is the case by
default if the user puts credentials in an auth-sources resource like
.authinfo or .authinfo.gpg, then it is necessary to encode the
function's returned value into utf-8 before passing it onward to build
the HTTP request. This commit ensures that happens.

Why is this necessary, given that the API key contains only
alphanumeric characters and therefore should be byte-for-byte the same
in utf-8 as in US-ASCII? I don't know. It may be because Emacs's
url-http.el library concats many strings together, and they all need
to be identically encoded before they can be combined correctly.

Whatever the reason, this fix works and allows you to send prompts
which include Unicode characters that require multi-byte encodings in UTF-8.
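The rule the commit relies on — encode every piece of the request consistently before combining them — can be sketched outside Emacs. The Python sketch below (placeholder key and body, not taken from the patch) makes the hazard explicit: Emacs Lisp's concat will silently mis-combine a multibyte string with unibyte data, whereas Python refuses the mix outright, which is the same mismatch surfacing as an error.

```python
# Sketch: why every part of an HTTP request should be encoded to the
# same representation (here, utf-8 bytes) before concatenation.
# Key and body below are hypothetical placeholders.

api_key = "sk-abc123"                                # plain str, ASCII-only
body = '{"prompt": "γειά σου 😀"}'.encode("utf-8")   # already-encoded bytes

try:
    request = api_key + body          # str + bytes: Python raises TypeError
except TypeError:
    request = api_key.encode("utf-8") + body  # encode the key first, then combine

# The combined request is now uniformly unibyte (bytes), safe to send.
assert isinstance(request, bytes)
```

In Emacs the analogous fix is to run the key through an encoding step (e.g. with utf-8 as the coding system) before it is spliced into the request headers, so that url-http.el never mixes multibyte and unibyte strings.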
This changes response parsing so that the response body is decoded as
UTF-8 and then parsed as JSON, rather than the other way around.

This fixes the handling of responses that contain Unicode characters
which are encoded with multiple bytes in UTF-8, such as emojis.
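The decode-then-parse ordering can also be illustrated in Python (a sketch of the principle, not the Emacs code; the JSON field names are hypothetical). Decoding the body as utf-8 first round-trips multi-byte characters cleanly; treating each byte as a whole character before parsing — which is what deferring the utf-8 decode amounts to — produces mojibake:

```python
import json

# Response body as it arrives on the wire: utf-8 encoded bytes.
raw = '{"choices": [{"text": "λ 😀"}]}'.encode("utf-8")

# Correct order: decode the body as utf-8, then parse the JSON.
good = json.loads(raw.decode("utf-8"))

# Wrong order, simulated: map each byte to one character (latin-1 does
# exactly that) and parse the result. The JSON structure survives, since
# the delimiters are ASCII, but every multi-byte character is garbled.
bad = json.loads(raw.decode("latin-1"))

assert good["choices"][0]["text"] == "λ 😀"   # multi-byte chars survive
assert bad["choices"][0]["text"] != "λ 😀"    # mojibake
```

This is why the commit moves the utf-8 decode in front of the JSON parse: once the parser has fixed the characters, no later decode can repair them.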
@algal (Contributor, Author) commented Mar 31, 2023

This is intended to resolve #4

@karthink (Owner) commented Apr 1, 2023

Appears to work for me too. This is great! I'm also on Emacs 28.2 on Linux. I'll assume that explicitly encoding all parts of the request as utf-8 is the conservative approach, and thus unlikely to cause issues on other versions or platforms.

> To make requests work, it was necessary to encode the loaded API key as utf-8, for reasons that remain obscure, but may have to do with some subtlety concerning...

Ah, I can see why this could be a problem.

Pushed.

@karthink karthink merged commit c2ad1c0 into karthink:master Apr 1, 2023