
Resolved #4: Handles multi-byte characters with url-retrieve (i.e., without the external curl command) #29

Merged
merged 2 commits into karthink:master on Apr 1, 2023

Conversation

@algal (Contributor) commented Mar 31, 2023

These two commits fix handling of multi-byte characters in the prompt sent to OpenAI and in the response received from it.

As a result, you can now use non-Latin alphabets like Greek, or emojis.

To make requests work, it was necessary to encode the loaded API key as utf-8, for reasons that remain obscure, but may have to do with some subtlety concerning how the concat function combines multibyte and non-multibyte strings. To make replies work, it was necessary to decode the response body as utf-8 before, not after, parsing it as json.

I have tested this lightly on Emacs 28.1 running on Linux.

algal added 2 commits March 30, 2023 17:34
If the user provides the OpenAI key via a function, as is the case by
default if the user puts credentials in an auth-sources resource like
.authinfo or .authinfo.gpg, then it is necessary to encode the
function's returned value into utf-8 before passing it onward to build
the HTTP request. This commit ensures that happens.

Why is this necessary, given that the API key contains only
alphanumeric characters and therefore should be byte-for-byte the same
in utf-8 as in US-ASCII? I don't know. It may be because Emacs's
url-http.el library concats many strings together, and they all need
to be identically encoded before they can be combined correctly.

Whatever the reason, this fix works and allows you to send prompts
which include Unicode characters that require multi-byte encodings in UTF-8.
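The rule the commit relies on — encode every piece of the request consistently before combining them — can be sketched outside Emacs. The Python sketch below (placeholder key and body, not taken from the patch) makes the hazard explicit: Emacs Lisp's concat will silently mis-combine a multibyte string with unibyte data, whereas Python refuses the mix outright, which is the same mismatch surfacing as an error.

```python
# Sketch: why every part of an HTTP request should be encoded to the
# same representation (here, utf-8 bytes) before concatenation.
# Key and body below are hypothetical placeholders.

api_key = "sk-abc123"                                # plain str, ASCII-only
body = '{"prompt": "γειά σου 😀"}'.encode("utf-8")   # already-encoded bytes

try:
    request = api_key + body          # str + bytes: Python raises TypeError
except TypeError:
    request = api_key.encode("utf-8") + body  # encode the key first, then combine

# The combined request is now uniformly unibyte (bytes), safe to send.
assert isinstance(request, bytes)
```

In Emacs the analogous fix is to run the key through an encoding step (e.g. with utf-8 as the coding system) before it is spliced into the request headers, so that url-http.el never mixes multibyte and unibyte strings.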
This changes response parsing so that the response body is decoded as
UTF-8 and then parsed as JSON, rather than the other way around.

This fixes the handling of responses that contain Unicode characters
which are encoded with multiple bytes in UTF-8, such as emojis.
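The decode-then-parse ordering can also be illustrated in Python (a sketch of the principle, not the Emacs code; the JSON field names are hypothetical). Decoding the body as utf-8 first round-trips multi-byte characters cleanly; treating each byte as a whole character before parsing — which is what deferring the utf-8 decode amounts to — produces mojibake:

```python
import json

# Response body as it arrives on the wire: utf-8 encoded bytes.
raw = '{"choices": [{"text": "λ 😀"}]}'.encode("utf-8")

# Correct order: decode the body as utf-8, then parse the JSON.
good = json.loads(raw.decode("utf-8"))

# Wrong order, simulated: map each byte to one character (latin-1 does
# exactly that) and parse the result. The JSON structure survives, since
# the delimiters are ASCII, but every multi-byte character is garbled.
bad = json.loads(raw.decode("latin-1"))

assert good["choices"][0]["text"] == "λ 😀"   # multi-byte chars survive
assert bad["choices"][0]["text"] != "λ 😀"    # mojibake
```

This is why the commit moves the utf-8 decode in front of the JSON parse: once the parser has fixed the characters, no later decode can repair them.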
@algal (Contributor, Author) commented Mar 31, 2023

This is intended to resolve #4

@karthink (Owner) commented Apr 1, 2023

Appears to work for me too. This is great! I'm also on Emacs 28.2 on Linux. I'll assume that explicitly encoding all parts of the request as utf-8 is the conservative approach, and thus unlikely to cause issues on other versions or platforms.

> To make requests work, it was necessary to encode the loaded API key as utf-8, for reasons that remain obscure, but may have to do with some subtlety concerning...

Ah, I can see why this could be a problem.

Pushed.

@karthink karthink merged commit c2ad1c0 into karthink:master Apr 1, 2023