Problem reading gzipped kong.service.response.get_raw_body() #156

Closed
dwoctor opened this issue Mar 8, 2022 · 11 comments · Fixed by #161
Labels
bug Something isn't working

Comments

@dwoctor
Contributor

dwoctor commented Mar 8, 2022

I have been trying to read kong.service.response.get_raw_body() in a Kong JS plugin.
I'm able to read the data fine when it's not gzipped.
However, when the response is gzipped, I'm unable to gunzip it.

const zlib = require("zlib");
const rawBodyCompressed = await kong.service.response.getRawBody();
const rawBodyUncompressed = zlib.gunzipSync(rawBodyCompressed);

I wrote a version of the plugin in Lua and was able to perform the task successfully.

local rawBodyCompressed = kong.service.response.get_raw_body()
local rawBodyUncompressed = zlib.inflate()(rawBodyCompressed)

When I compared the base64 encodings of the kong.service.response.get_raw_body() output in the JS and Lua plugins, I found they did not match. To me this points to an issue with the PDK, but I'm not 100% sure.

Any assistance in rectifying this problem would be greatly appreciated.

@dwoctor
Contributor Author

dwoctor commented Mar 8, 2022

cc @fffonion

@fffonion fffonion added the bug Something isn't working label Mar 9, 2022
@fffonion
Contributor

fffonion commented Mar 9, 2022

@dwoctor, could you share the base64-encoded output of both the Lua and JS plugins?

@dwoctor
Contributor Author

dwoctor commented Mar 9, 2022

@fffonion, I have been testing against https://rickandmortyapi.com/api/character/1 with Accept-Encoding: gzip.

In the JavaScript plugin, I used the following to get the base64:

const rawBodyCompressed = await kong.service.response.getRawBody();
const rawBodyCompressedBase64 = Buffer.from(rawBodyCompressed, "binary").toString("base64");

The base64 generated by the JavaScript plugin was as follows:

H/0IAAAAAAAAA/39T0v9MBgG/VJyUv2+Sf39/WQIXv39J/39Sxr9aP39Zv05/f1N/f03/UD9/dM3LT/9/ShcJUpKRd0V/Xh0/TN5/f00/Uv9YgocdlP99m5v/f1o/f1z/X79cR8r/TD9EzZ9ZX0cP3A7Zwf9asv9eG5+/T40/f12QWp9HQM7/f1qE/04/Vn9/f39V/39wUf9NEP9/XP9Dv39G/39/SkV/T9t/S5wZf1keE/93/39av39Ov39Bf1M3k39Pv09B/0ZLT9GW/1V/aH9PV79bf0T/dRe/f1AVgFZDWQLIP0C/Wsg/QH9N0D9ciT9/RFCR/0dIXj9/RH9R/0fIf39CEpEUEL9HiIoEUH9CEpEUCL9EhH9/f1EBBUi/RBBBX0+EUH9CCpEUCH9ChFU/f1CBDUi/RFB/Qhq/Q/9CGpEUCP9GhEI/UYEC0T9IP12/X79d/00/RtvOTj9FDL9/f1o/f1nlDoe/f0s/Rdx/Qb9/f04/QoAAA==

In the Lua plugin, I used the following to get the base64:

local rawBodyCompressed = kong.service.response.get_raw_body()
local rawBodyCompressedBase64 = base64.encode(rawBodyCompressed)

The base64 generated by the Lua plugin was as follows:

H4sIAAAAAAAAA5XWT0vDMBgG8K9SclLo1r5Jus3eZAhevKgnxcNLGtto/5Fmgzn23U3BoTf3QKHpy5M3LT/a5ihcJUpKRc+dFaV4dOYzeeLeNPZLpGIKHHZTrN+2bm/nwmiNs3PlftdxHyvhMM4T46i2fWV9HD9wO2cH72rXi/J4bn7HPjTJ1XZBan0dAzvfxmoTwjiVWebj0txX3eDDgUe3NEOXxXPWDoaDG/qMxCkV56s/bbcucGXbZHhP5tuf8M5q7uw6ru0Fs0zDnk2wPuM9B/YZLT9GW8dV7eimoYo9Xv9t8hONz5RenJVAVgFZDWQLILsCsmsguwGyN0CWciSMyBFCR4gdIXiE6BHCR4gfIYCECEpEUELvHiIoEUGJCEpEUCKCEhGUiKBEBBUiqBBBBX0+EUGFCCpEUCGCChFUiKBCBDUiqBFBjQhq6A+ICGpEUCOCGhHUiKBGBAtEsCDxdul+63fnNLMbbznYuPsUMqf1gmiR62falDoeq6Us8hdx+gaJzNI4nwoAAA==

@fffonion
Contributor

fffonion commented Mar 9, 2022

>>> a='H4sIAAAAAAAAA5XWT0vDMBgG8K9SclLo1r5Jus3eZAhevKgnxcNLGtto/5Fmgzn23U3BoTf3QKHpy5M3LT/a5ihcJUpKRc+dFaV4dOYzeeLeNPZLpGIKHHZTrN+2bm/nwmiNs3PlftdxHyvhMM4T46i2fWV9HD9wO2cH72rXi/J4bn7HPjTJ1XZBan0dAzvfxmoTwjiVWebj0txX3eDDgUe3NEOXxXPWDoaDG/qMxCkV56s/bbcucGXbZHhP5tuf8M5q7uw6ru0Fs0zDnk2wPuM9B/YZLT9GW8dV7eimoYo9Xv9t8hONz5RenJVAVgFZDWQLILsCsmsguwGyN0CWciSMyBFCR4gdIXiE6BHCR4gfIYCECEpEUELvHiIoEUGJCEpEUCKCEhGUiKBEBBUiqBBBBX0+EUGFCCpEUCGCChFUiKBCBDUiqBFBjQhq6A+ICGpEUCOCGhHUiKBGBAtEsCDxdul+63fnNLMbbznYuPsUMqf1gmiR62falDoeq6Us8hdx+gaJzNI4nwoAAA=='
>>> b='H/0IAAAAAAAAA/39T0v9MBgG/VJyUv2+Sf39/WQIXv39J/39Sxr9aP39Zv05/f1N/f03/UD9/dM3LT/9/ShcJUpKRd0V/Xh0/TN5/f00/Uv9YgocdlP99m5v/f1o/f1z/X79cR8r/TD9EzZ9ZX0cP3A7Zwf9asv9eG5+/T40/f12QWp9HQM7/f1qE/04/Vn9/f39V/39wUf9NEP9/XP9Dv39G/39/SkV/T9t/S5wZf1keE/93/39av39Ov39Bf1M3k39Pv09B/0ZLT9GW/1V/aH9PV79bf0T/dRe/f1AVgFZDWQLIP0C/Wsg/QH9N0D9ciT9/RFCR/0dIXj9/RH9R/0fIf39CEpEUEL9HiIoEUH9CEpEUCL9EhH9/f1EBBUi/RBBBX0+EUH9CCpEUCH9ChFU/f1CBDUi/RFB/Qhq/Q/9CGpEUCP9GhEI/UYEC0T9IP12/X79d/00/RtvOTj9FDL9/f1o/f1nlDoe/f0s/Rdx/Qb9/f04/QoAAA=='
>>> a.decode('base64')
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x95\xd6OK\xc30\x18\x06\xf0\xafRrR\xe8\xd6\xbeI\xba\xcd\xded\x08^\xbc\xa8\'\xc5\xc3K\x1a\xdbh\xff\x91f\x839\xf6\xddM\xc1\xa17\xf7@\xa1\xe9\xcb\x937-?\xda\xe6(\\%JJE\xcf\x9d\x15\xa5xt\xe63y\xe2\xde4\xf6K\xa4b\n\x1cvS\xac\xdf\xb6no\xe7\xc2h\x8d\xb3s\xe5~\xd7q\x1f+\xe10\xce\x13\xe3\xa8\xb6}e}\x1c?p;g\x07\xefj\xd7\x8b\xf2xn~\xc7>4\xc9\xd5vAj}\x1d\x03;\xdf\xc6j\x13\xc28\x95Y\xe6\xe3\xd2\xdcW\xdd\xe0\xc3\x81G\xb74C\x97\xc5s\xd6\x0e\x86\x83\x1b\xfa\x8c\xc4)\x15\xe7\xab?m\xb7.pe\xdbdxO\xe6\xdb\x9f\xf0\xcej\xee\xec:\xae\xed\x05\xb3L\xc3\x9eM\xb0>\xe3=\x07\xf6\x19-?F[\xc7U\xed\xe8\xa6\xa1\x8a=^\xffm\xf2\x13\x8d\xcf\x94^\x9c\x95@V\x01Y\rd\x0b \xbb\x02\xb2k \xbb\x01\xb27@\x96r$\x8c\xc8\x11BG\x88\x1d!x\x84\xe8\x11\xc2G\x88\x1f!\x80\x84\x08JDPB\xef\x1e"(\x11A\x89\x08JDP"\x82\x12\x11\x94\x88\xa0D\x04\x15"\xa8\x10A\x05}>\x11A\x85\x08*DP!\x82\n\x11T\x88\xa0B\x045"\xa8\x11A\x8d\x08j\xe8\x0f\x88\x08jDP#\x82\x1a\x11\xd4\x88\xa0F\x04\x0bD\xb0 \xf1v\xe9~\xebw\xe74\xb3\x1bo9\xd8\xb8\xfb\x142\xa7\xf5\x82h\x91\xebg\xda\x94:\x1e\xab\xa5,\xf2\x17q\xfa\x06\x89\xcc\xd28\x9f\n\x00\x00'
>>> b.decode('base64')
'\x1f\xfd\x08\x00\x00\x00\x00\x00\x00\x03\xfd\xfdOK\xfd0\x18\x06\xfdRrR\xfd\xbeI\xfd\xfd\xfdd\x08^\xfd\xfd\'\xfd\xfdK\x1a\xfdh\xfd\xfdf\xfd9\xfd\xfdM\xfd\xfd7\xfd@\xfd\xfd\xd37-?\xfd\xfd(\\%JJE\xdd\x15\xfdxt\xfd3y\xfd\xfd4\xfdK\xfdb\n\x1cvS\xfd\xf6no\xfd\xfdh\xfd\xfds\xfd~\xfdq\x1f+\xfd0\xfd\x136}e}\x1c?p;g\x07\xfdj\xcb\xfdxn~\xfd>4\xfd\xfdvAj}\x1d\x03;\xfd\xfdj\x13\xfd8\xfdY\xfd\xfd\xfd\xfdW\xfd\xfd\xc1G\xfd4C\xfd\xfds\xfd\x0e\xfd\xfd\x1b\xfd\xfd\xfd)\x15\xfd?m\xfd.pe\xfddxO\xfd\xdf\xfd\xfdj\xfd\xfd:\xfd\xfd\x05\xfdL\xdeM\xfd>\xfd=\x07\xfd\x19-?F[\xfdU\xfd\xa1\xfd=^\xfdm\xfd\x13\xfd\xd4^\xfd\xfd@V\x01Y\rd\x0b \xfd\x02\xfdk \xfd\x01\xfd7@\xfdr$\xfd\xfd\x11BG\xfd\x1d!x\xfd\xfd\x11\xfdG\xfd\x1f!\xfd\xfd\x08JDPB\xfd\x1e"(\x11A\xfd\x08JDP"\xfd\x12\x11\xfd\xfd\xfdD\x04\x15"\xfd\x10A\x05}>\x11A\xfd\x08*DP!\xfd\n\x11T\xfd\xfdB\x045"\xfd\x11A\xfd\x08j\xfd\x0f\xfd\x08jDP#\xfd\x1a\x11\x08\xfdF\x04\x0bD\xfd \xfdv\xfd~\xfdw\xfd4\xfd\x1bo98\xfd\x142\xfd\xfd\xfdh\xfd\xfdg\x94:\x1e\xfd\xfd,\xfd\x17q\xfd\x06\xfd\xfd\xfd8\xfd\n\x00\x00'

I'm seeing lots of bytes turned into \xfd; it looks like something is treating the body as Unicode text. Let me try to find out which part is doing this.
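The \xfd pattern is consistent with one specific lossy round trip. Assuming (not confirmed in this thread) that the bytes were decoded as UTF-8 with every invalid sequence replaced by U+FFFD, and that Buffer.from(..., "binary") in the JS plugin then kept only the low byte of each UTF-16 code unit (Node documents that "binary"/latin1 truncates characters outside U+0000 to U+00FF), U+FFFD would become 0xFD. The Python sketch below models that path; the helper name simulate_js_roundtrip is hypothetical. It reproduces the first bytes of the JS dump exactly, including valid multi-byte sequences surviving as their low byte (\xcb\x93, i.e. U+02D3, becomes \xd3, as in the "\xd37-?" seen in both dumps).

```python
import gzip


def simulate_js_roundtrip(raw: bytes) -> bytes:
    """Hypothetical model of the suspected corruption path (not actual PDK code)."""
    # Assumption: somewhere in the RPC layer the bytes are decoded as UTF-8 text,
    # and every invalid sequence becomes U+FFFD (REPLACEMENT CHARACTER).
    text = raw.decode("utf-8", errors="replace")
    # Buffer.from(text, "binary") keeps only the low byte of each UTF-16 code
    # unit, so U+FFFD turns into 0xFD and U+02D3 turns into 0xD3.
    return text.encode("utf-16-le")[::2]


# First bytes of the body as seen by the Lua plugin and by the JS plugin (from the dumps above).
lua_prefix = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x95\xd6OK\xc30\x18\x06\xf0\xafRrR\xe8\xd6\xbeI"
js_prefix = b"\x1f\xfd\x08\x00\x00\x00\x00\x00\x00\x03\xfd\xfdOK\xfd0\x18\x06\xfdRrR\xfd\xbeI"
print(simulate_js_roundtrip(lua_prefix) == js_prefix)  # True

# The gzip magic number 1f 8b becomes 1f fd, so gunzip cannot even parse the header.
corrupted = simulate_js_roundtrip(gzip.compress(b'{"id": 1}'))
try:
    gzip.decompress(corrupted)
except OSError as exc:  # gzip.BadGzipFile subclasses OSError (Python 3.8+)
    print("gunzip failed:", exc)
```

The loss is irreversible: once several different invalid bytes have all collapsed into U+FFFD, no decoder on the plugin side can recover the original body.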

@dwoctor
Contributor Author

dwoctor commented Mar 10, 2022

@fffonion, is there any news?

@StarlightIbuki
Contributor

StarlightIbuki commented Mar 29, 2022

I've tested with the Python PDK using:

import kong_pdk.pdk.kong as kong

Schema = (
    { "message": { "type": "string" } },
)
version = '0.1.0'
priority = 0
class Plugin(object):
    def __init__(self, config):
        self.config = config
    def access(self, kong: kong.kong):
        a = kong.request.get_raw_body()
        kong.log(a)

and a random binary string as the request body:

\x00\xff\xfa\xca\xbb\xcf

I got:

2022/03/29 15:44:41 [error] 10209#0: *4512 [kong] mp_rpc.lua:311 [test] no data, client: 127.0.0.1, server: kong, request: "GET / HTTP/1.1", host: "localhost:8000"

It doesn't seem to be related to gzip or JS: any binary request body can reproduce this behavior.
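The test body is not valid UTF-8 (bytes 0xff and 0xfa can never occur in UTF-8), so any layer that insists on turning it into a text string must either fail or substitute characters. This standalone Python check, independent of the PDK, shows both outcomes; whether the "no data" error above is caused by a failed decode is a guess, not something established in the thread.

```python
# The request body used in the test above.
body = b"\x00\xff\xfa\xca\xbb\xcf"

try:
    body.decode("utf-8")  # strict decoding, as a text-only (str) API requires
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc)

# Lenient decoding "succeeds" but silently replaces the invalid bytes with U+FFFD.
lossy = body.decode("utf-8", errors="replace")
print(lossy.encode("utf-8") == body)  # False: the original bytes are gone

# A bytes object (like a Lua or Go string) keeps the octets untouched.
print(list(body))  # [0, 255, 250, 202, 187, 207]
```

Note that \xca\xbb happens to be a valid two-byte sequence (U+02BB), which is why lossy decoding of binary data can look partly correct rather than failing outright.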

@dwoctor
Contributor Author

dwoctor commented Mar 29, 2022

The behaviour I was looking at was on the service response side, not the request side.
But it's interesting that you found problems on the request side with the Python PDK.
Do you think this is a common issue across the PDKs (i.e. JavaScript, Python, Go)?

@StarlightIbuki
Contributor

StarlightIbuki commented Mar 29, 2022


I have just spent some time going through the code. We use msgpack for RPC, and its default encoding for strings is "string_compact". A reasonable guess is that the JS and Go sides interpret every string as UTF-8 encoded (or similar), so all RPC calls (implemented with msgpack) are affected by this behavior.
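For context, the msgpack spec has a str family (fixstr 0xa0 to 0xbf, str8 0xd9, str16 0xda, str32 0xdb), whose payload is defined as UTF-8 text, and a separate bin family (bin8 0xc4, bin16 0xc5, bin32 0xc6) for raw bytes. The hypothetical encoder and decoder below are written against the spec, not taken from any PDK. The detail that "string_compact" emits the str family without str8 is my assumption about lua-MessagePack, and the lossy UTF-8 decoding of str payloads is an assumption about the receiving side.

```python
import struct


def pack_str_compact(data: bytes) -> bytes:
    """str family without str8 (assumed to match lua-MessagePack's "string_compact")."""
    n = len(data)
    if n < 32:
        return bytes([0xA0 | n]) + data  # fixstr
    if n < 2**16:
        return b"\xda" + struct.pack(">H", n) + data  # str16
    return b"\xdb" + struct.pack(">I", n) + data  # str32


def pack_bin(data: bytes) -> bytes:
    """bin family: the payload is explicitly raw bytes."""
    n = len(data)
    if n < 2**8:
        return b"\xc4" + struct.pack(">B", n) + data  # bin8
    if n < 2**16:
        return b"\xc5" + struct.pack(">H", n) + data  # bin16
    return b"\xc6" + struct.pack(">I", n) + data  # bin32


def unpack_raw(packed: bytes):
    """Decode one str/bin value the way a text-oriented client might (assumption)."""
    tag = packed[0]
    if 0xA0 <= tag <= 0xBF:
        n, start = tag & 0x1F, 1
    elif tag in (0xD9, 0xC4):
        n, start = packed[1], 2
    elif tag in (0xDA, 0xC5):
        n, start = struct.unpack(">H", packed[1:3])[0], 3
    elif tag in (0xDB, 0xC6):
        n, start = struct.unpack(">I", packed[1:5])[0], 5
    else:
        raise ValueError(f"not a str/bin value: {tag:#x}")
    payload = packed[start:start + n]
    if tag in (0xC4, 0xC5, 0xC6):
        return payload  # bin: bytes come back untouched
    return payload.decode("utf-8", errors="replace")  # str: decoded as text, lossy


body = b"\x1f\x8b\x08\x00"  # start of a gzip header
print(unpack_raw(pack_str_compact(body)) == "\x1f\ufffd\x08\x00")  # True: 0x8b was replaced
print(unpack_raw(pack_bin(body)) == body)  # True: bytes preserved
```

Under these assumptions, the same Lua string survives the trip only when it is tagged as bin, which is the direction the fix discussed below takes.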

So you can expect this behavior to appear in headers, get_ctx, ..., as well: basically any binary data passing through the PDK.

I will test the Go PDK later (it uses protobuf, not msgpack).

@StarlightIbuki
Contributor

I've tested the Go PDK, and it also uses msgpack. However, it doesn't have the same problem. The reason seems to be that Go strings can hold arbitrary bytes (just like Lua strings).

@StarlightIbuki
Contributor

The return type of get_raw_body (or GetRawBody) is str, string, string, and promise for Python, Lua, Go, and JS respectively.

Lua and Go strings are guaranteed to be able to hold arbitrary binary data;
The Python documentation states that strings are immutable sequences of Unicode code points, so by definition they cannot hold arbitrary bytes;
The JS documentation says strings are sequences of 16-bit integers (usually interpreted as UTF-16), therefore JS strings also fail to represent arbitrary binary data.

This problem is due to the design of the interface.

@StarlightIbuki
Contributor

We have 2 solutions:

  1. When decoding a returned value, try to decode it as a textual string and return a binary string only when this fails. This preserves most compatibility;
  2. Just make a breaking change: tag the exact type of each return value. This is more elegant than 1, especially because the string return type was a mistake.
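A rough client-side Python sketch of the two options follows. Option 1 falls back to bytes only when UTF-8 decoding fails; option 2 carries an explicit type tag. The names decode_with_fallback and TaggedValue and the "string"/"binary" tags are hypothetical, not the PDK's actual wire format.

```python
from dataclasses import dataclass
from typing import Literal, Union


def decode_with_fallback(raw: bytes) -> Union[str, bytes]:
    """Option 1 (hypothetical): guess the type from the content."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw  # only undecodable payloads come back as bytes


@dataclass(frozen=True)
class TaggedValue:
    """Option 2 (hypothetical wire format): the sender states the type explicitly."""

    kind: Literal["string", "binary"]
    payload: bytes

    def decode(self) -> Union[str, bytes]:
        if self.kind == "binary":
            return self.payload  # never passed through a text decoder
        return self.payload.decode("utf-8")


print(decode_with_fallback(b"\x1f\x8b\x08\x00"))  # b'\x1f\x8b\x08\x00'
print(decode_with_fallback(b"{}"))  # {}  (a binary body that happens to be valid UTF-8 becomes str)
print(TaggedValue("binary", b"{}").decode())  # b'{}'
```

The last two lines illustrate why option 2 is the cleaner contract: with content sniffing, a caller of get_raw_body() cannot know in advance whether it will receive str or bytes.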

We decided to apply solution 2.

However, there are still decisions to make for 2:

  1. Technically, header values can be arbitrary octets, which are not necessarily decodable as text. Should we tag them as binary or not? (Most of the time they should be strings.)
  2. How do we tag compound types, e.g. the table returned by get_headers?
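For question 2, one hypothetical approach is to tag every leaf of a compound value (such as a headers table mapping names to a value or a list of values) individually. The Python sketch below does that with the same hypothetical "string"/"binary" tags, classifying each header value by UTF-8 validity, which amounts to option 1 applied per leaf. It illustrates the trade-off only; it is not what the PDK does.

```python
from typing import Any


def tag_leaves(value: Any) -> Any:
    """Recursively wrap each bytes leaf as {"type": ..., "value": ...} (hypothetical format)."""
    if isinstance(value, dict):
        return {key: tag_leaves(item) for key, item in value.items()}
    if isinstance(value, list):
        return [tag_leaves(item) for item in value]
    if isinstance(value, bytes):
        try:
            return {"type": "string", "value": value.decode("utf-8")}
        except UnicodeDecodeError:
            return {"type": "binary", "value": value}
    return value  # numbers, booleans, None pass through unchanged


# Header names are assumed to already be text; only values may be arbitrary octets.
headers = {
    "content-type": b"application/json",
    "x-raw": [b"abc", b"\xff\xfe"],
}
print(tag_leaves(headers))
```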

Anyway, since we will migrate to protobuf, this will not matter much. So we can simply leave those behaviors unchanged, except for fixing raw_body.

3 participants