
response Transfer-Encoding: chunked error #9

Closed
LuPan2015 opened this issue May 10, 2019 · 25 comments


@LuPan2015

error info
java.lang.IllegalStateException: Invalid chunk-size (too big, more than 4 hex-digits)
	at rawhttp.core.body.ChunkedBodyParser.readChunkSize(ChunkedBodyParser.java:215)
	at rawhttp.core.body.ChunkedBodyParser.parseChunkedBody(ChunkedBodyParser.java:97)
	at rawhttp.core.body.BodyConsumer$ChunkedBodyConsumer.consumeInto(BodyConsumer.java:93)
	at rawhttp.core.body.BodyConsumer.consume(BodyConsumer.java:29)
	at rawhttp.core.body.EagerBodyReader.<init>(EagerBodyReader.java:35)
	at rawhttp.core.body.LazyBodyReader.eager(LazyBodyReader.java:68)
	at rawhttp.core.EagerHttpResponse.from(EagerHttpResponse.java:52)
	at rawhttp.core.RawHttpResponse.eagerly(RawHttpResponse.java:95)
	at rawhttp.core.RawHttpResponse.eagerly(RawHttpResponse.java:79)

@renatoathaydes
Owner

Where did you get this error from?
Please provide more information. I can't fix things by looking at a stack trace alone.

@WangGaofei

We are trying to parse a Google HTTP response with chunked transfer encoding and gzip compression, but what we get is:
java.lang.IllegalStateException: Invalid chunk-size (too big, more than 4 hex-digits)
at rawhttp.core.body.ChunkedBodyParser.readChunkSize(ChunkedBodyParser.java:215)
at rawhttp.core.body.ChunkedBodyParser.parseChunkedBody(ChunkedBodyParser.java:97)
at rawhttp.core.body.BodyConsumer$ChunkedBodyConsumer.consumeInto(BodyConsumer.java:93)
at rawhttp.core.body.BodyConsumer.consume(BodyConsumer.java:29)
at rawhttp.core.body.EagerBodyReader.<init>(EagerBodyReader.java:35)
at rawhttp.core.body.LazyBodyReader.eager(LazyBodyReader.java:68)
at rawhttp.core.EagerHttpResponse.from(EagerHttpResponse.java:52)
at rawhttp.core.RawHttpResponse.eagerly(RawHttpResponse.java:95)
at rawhttp.core.RawHttpResponse.eagerly(RawHttpResponse.java:79)
at com.bfd.engine.ParserEngine.extract(ParserEngine.java:197)
at com.bfd.util.SazUtil.main(SazUtil.java:69)
Parsing raw/706_s.txt fails with the same exception and stack trace.

Please check the attachments below.

144_c.txt
144_s.txt

[screenshot]

@WangGaofei

We are using the latest version of RawHttp. It seems like a great HTTP library; we love it.
The issue is similar to actix/actix-web#674. But it is said that the issue in actix-web project is fixed.

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.7</version>
</dependency>
<dependency>
    <groupId>com.athaydes.rawhttp</groupId>
    <artifactId>rawhttp-core</artifactId>
    <version>2.1</version>
</dependency>
<dependency>
    <groupId>com.athaydes.rawhttp</groupId>
    <artifactId>rawhttp-cli</artifactId>
    <version>1.1.1</version>
    <type>zip</type>
</dependency>

@renatoathaydes
Owner

Ok... This is indeed a strange chunk Google is sending... my interpretation of the spec, though, seems to have been incorrect now that I look into it more carefully.

RawHTTP does not allow chunk-sizes bigger than FFFF (65535 bytes), so it expects at most 4 hexadecimal characters for the chunk-size.
However, the spec does not actually mention this and Google's message seems to be valid!

I will change the code as soon as possible to fix this... It's an easy fix, so if you have urgency, you could probably do it yourselves for now.

@WangGaofei

We have been trying to solve the issue, but in vain; now we have no idea. I guess Google uses an async request, and 0000001 means Async Not Ready.

@renatoathaydes
Owner

@WangGaofei fork this project, change this method to stop reading the chunk-size when it finds either a new line or ;: https://github.com/renatoathaydes/rawhttp/blob/master/rawhttp-core/src/main/java/rawhttp/core/body/ChunkedBodyParser.java#L172
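A minimal sketch of that idea (a hypothetical standalone helper, not RawHTTP's actual method): read hex digits until CR, LF, or the ; that starts a chunk extension, with no cap on the number of digits:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkSizeSketch {

    // Reads the chunk-size line's hex digits, stopping at CR, LF or ';'
    // (the start of chunk extensions). No limit on digit count, so sizes
    // like "00000001" or "000000FF" parse fine.
    static int readChunkSize(InputStream in) throws IOException {
        StringBuilder hex = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r' || b == '\n' || b == ';') break;
            hex.append((char) b);
        }
        return Integer.parseInt(hex.toString(), 16);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("00000001\r\n".getBytes());
        System.out.println(readChunkSize(in)); // prints 1
    }
}
```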

Can you do that?

@WangGaofei

I am not sure what 00000001, 0001, 1, and 9 mean in the file; or maybe we can just skip that section.
I know why this exception is raised, but I don't understand the chunk format that Google generates.
As a result, I can modify the code as suggested, but I can't guarantee the data will be correct.

@renatoathaydes
Owner

What you want to do is parse the hex string, up to either a new line or ;, as a hexadecimal number.

So, 00000001 means 1 in decimal. Something like 000000FF would be 255, and so on... In Java this just means: Integer.parseInt(hexString, 16).
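To illustrate, leading zeros make no difference to the parsed value:

```java
public class HexSizeDemo {
    public static void main(String[] args) {
        // Chunk sizes are hexadecimal; leading zeros are insignificant.
        System.out.println(Integer.parseInt("00000001", 16)); // prints 1
        System.out.println(Integer.parseInt("000000FF", 16)); // prints 255
        System.out.println(Integer.parseInt("1", 16));        // prints 1
    }
}
```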

@renatoathaydes
Owner

I've fixed the issue in the dev branch. I would appreciate it if you could test whether it works for you now, before I release the change @WangGaofei @LuPan2015.

@WangGaofei

WangGaofei commented May 10, 2019

GOOD JOB. You are very kind and efficient. I need to go to Tianjin this morning; after that I will run a test. Thank you @renatoathaydes

@renatoathaydes
Owner

No problem! Happy to help. Have a good trip :)

@WangGaofei

Info update: file 144_c.txt is the request and file 144_s.txt is the response. Join the two files with \r\n, then pass the result into RawHttp. These files were generated from the .saz format using Fiddler.

@LuPan2015
Author

@renatoathaydes Thanks, I have pulled the latest code from the dev branch and it runs successfully. But the response body data I get is garbled.
My test code is as follows:

    String requestTxt = "src/main/resources/chunck/144_c.txt";
    String responseTxt = "/src/main/resources/chunck/144_s.txt";

    InputStream requestIn = new FileInputStream(new File(requestTxt));
    InputStream responseIn = new FileInputStream(new File(responseTxt));

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte data[] = new byte[4096];

    int count;
    while ((count = requestIn.read(data, 0, data.length)) != -1) {
        bos.write(data, 0, count);
    }
    bos.write("\r\n".getBytes("UTF-8"));

    while ((count = responseIn.read(data, 0, data.length)) != -1) {
        bos.write(data, 0, count);
    }

    RawHttpRequest request = null;
    RawHttpResponse response = null;
    RawHttp rawHttp = new RawHttp();
    ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
    request = rawHttp.parseRequest(bis).eagerly();
    response = rawHttp.parseResponse(bis).eagerly();
    System.out.println(response.getBody().toString());

The response body data I get is as follows:
[screenshot]

I don't know what the reason for this is.

@renatoathaydes
Owner

Can you try String textBody = response.getBody().get().decodeBodyToString(UTF_8); and print that? If the body is gzipped, then you need to decode it.

@LuPan2015
Author

LuPan2015 commented May 11, 2019

@renatoathaydes
OK, I tried the decodeBodyToString method; it still seems to be gibberish. My code is as follows:

String requestTxt = "/src/main/resources/chunck/144_c.txt";
String responseTxt = "/src/main/resources/chunck/144_s.txt";

InputStream requestIn = new FileInputStream(new File(requestTxt));
InputStream responseIn = new FileInputStream(new File(responseTxt));

ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte data[] = new byte[4096];

int count;
while ((count = requestIn.read(data, 0, data.length)) != -1) {
    bos.write(data, 0, count);
}
bos.write("\r\n".getBytes("UTF-8"));

while ((count = responseIn.read(data, 0, data.length)) != -1) {
    bos.write(data, 0, count);
}

RawHttpRequest request = null;
RawHttpResponse response = null;
RawHttp rawHttp = new RawHttp();
ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
request = rawHttp.parseRequest(bis).eagerly();
response = rawHttp.parseResponse(bis).eagerly();
System.out.println(response.getBody().get()
            .decodeBodyToString(Charset.forName("UTF-8")));

The results are as follows:
[screenshot]

@renatoathaydes
Owner

Can you send me the response body you're having trouble with? Also the response headers so I know the encoding used.

@LuPan2015
Author

LuPan2015 commented May 11, 2019

Ok. The request data is in the file 144_c.txt, and the response data is in the file 144_s.txt.
You can directly run the following test code to see the results:

String requestTxt = "/src/main/resources/chunck/144_c.txt";
String responseTxt = "/src/main/resources/chunck/144_s.txt";
InputStream requestIn = new FileInputStream(new File(requestTxt));
InputStream responseIn = new FileInputStream(new File(responseTxt));

ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte data[] = new byte[4096];

int count;
while ((count = requestIn.read(data, 0, data.length)) != -1) {
    bos.write(data, 0, count);
}
bos.write("\r\n".getBytes("UTF-8"));

while ((count = responseIn.read(data, 0, data.length)) != -1) {
    bos.write(data, 0, count);
}

RawHttpRequest request = null;
RawHttpResponse response = null;
RawHttp rawHttp = new RawHttp();
ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
request = rawHttp.parseRequest(bis).eagerly();
response = rawHttp.parseResponse(bis).eagerly();
System.out.println(response.getBody().get()
            .decodeBodyToString(Charset.forName("UTF-8")));

thanks!

@renatoathaydes
Owner

Ok, I found the problem.

The response uses the following relevant headers:

Transfer-Encoding: chunked
Content-Encoding: gzip

RawHTTP cares only about the transfer encoding because that's within the realm of transferring the HTTP message - which RawHTTP is responsible for doing. The Content-Encoding is not used because that's an encoding addressed to the receiver of the message instead.

The spec further says: "Typically, the representation is only decoded just prior to rendering or analogous usage."

So, I believe RawHTTP's behaviour is correct in not unzipping the message body. However, that's not very helpful, I agree. I think the best solution to this problem is to make the decodeBodyToString and decodeBody methods decode according to both the Transfer-Encoding and Content-Encoding headers, and to add new methods that keep the representation encoding but undo the transfer encoding, perhaps unframeBodyToString and unframeBody, because the transfer encoding can be seen, I believe, as part of framing the body.
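In the meantime, the Content-Encoding step can be applied by the caller on top of what decodeBody() returns. A minimal sketch of the gzip part (standard library only, assuming the body is gzip-compressed as in this response):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GunzipSketch {

    // Takes body bytes that have already been un-chunked (the
    // Transfer-Encoding removed) and undoes the gzip Content-Encoding
    // to recover the textual representation.
    static String gunzipToString(byte[] gzipped) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString(StandardCharsets.UTF_8.name());
        }
    }
}
```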

@LuPan2015
Author

Thank you for answering over the weekend. I don't know how to deal with this problem either; could you give me sample code for a solution? Thank you.

@LuPan2015
Author

@renatoathaydes
The problem is solved. After I get the response body, I decompress the chunks one by one. But I think the best solution would be to implement this inside the decodeBodyToString method. When would it be convenient for you to release the dev branch and publish it to the Maven repository? Thank you, have a nice weekend.

@renatoathaydes
Owner

You can get your data for now by decompressing the body:

byte[] body = response.getBody().get().decodeBody();

java.io.ByteArrayInputStream bytein = new java.io.ByteArrayInputStream(body);
java.util.zip.GZIPInputStream gzin = new java.util.zip.GZIPInputStream(bytein);
java.io.ByteArrayOutputStream byteout = new java.io.ByteArrayOutputStream();

int res = 0;
byte buf[] = new byte[1024];
while (res >= 0) {
    res = gzin.read(buf, 0, buf.length);
    if (res > 0) {
        byteout.write(buf, 0, res);
    }
}
byte[] uncompressed = byteout.toByteArray();

System.out.println(new String(uncompressed, StandardCharsets.UTF_8));

@LuPan2015
Author

LuPan2015 commented May 13, 2019

@renatoathaydes The data is transmitted in chunks, so it cannot be decompressed directly.
My test code is:

byte[] bytes = response.getBody().get().asRawBytes();
String data = uncompress(bytes, true, "UTF-8");

    /**
     * Decompress the body, de-chunking it first if necessary
     * @param str the raw body bytes
     * @param isChunk whether the body is chunked
     * @param outEncode the output character encoding
     * @return the decompressed text
     */
    public static String uncompress(byte[] str, boolean isChunk, String outEncode) throws Exception {
        if (isChunk) {
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

            int chunkSize = 0;
            boolean start = false; // whether we are inside a chunk's data
            byte[] data = null; // the current chunk's data
            int index = 0; // offset within the current chunk
            StringBuilder tmpChunckSize = new StringBuilder("0");

            int t = 0;

            // for chunked data, first collect all chunk data, then decompress it in one go
            for (int i = 0; i < str.length; i++) {
                int b = str[i];
                if (start) { // still inside the current chunk
                    if (index < chunkSize) {
                        data[index++] = str[i];
                    } else {
                        // chunk length reached
                        start = false;
                        byteArrayOutputStream.write(data); // save the chunk data
                        data = null;
                        chunkSize = 0;
                        index = 0;
                    }
                } else {
                    if (b == 10 && t == 13) { // t: \r, str[i]: \n
                        chunkSize = Integer.valueOf(tmpChunckSize.toString(), 16);
                        data = new byte[chunkSize];
                        tmpChunckSize = new StringBuilder("0"); // reset the size buffer
                        if (chunkSize > 0) {
                            start = true;
                        }
                    } else {
                        if (b != 13 && b >= 48 && b <= 122) { // skip \r
                            tmpChunckSize.append((char) str[i]);
                        }
                        if (t == 10 && b == 48) { // final chunk (size 0) reached
                            break;
                        }
                    }
                }

                t = str[i];
            }
            return uncompress(byteArrayOutputStream.toByteArray(), outEncode);
        } else {
            return uncompress(str, outEncode);
        }
    }

    /**
     * Decompress raw gzip bytes (the non-chunked case)
     * @param str the gzip-compressed bytes
     * @param outEncode the output character encoding
     * @return the decompressed text
     * @throws IOException
     */
    public static String uncompress(byte[] str, String outEncode) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayInputStream in = new ByteArrayInputStream(str);
        GZIPInputStream gunzip = new GZIPInputStream(in);
        byte[] buffer = new byte[256];
        int n;
        while ((n = gunzip.read(buffer)) >= 0) {
            out.write(buffer, 0, n);
        }
        return out.toString(outEncode);
    }

This test scheme has passed.

@renatoathaydes
Owner

The code I posted works... you're doing more in your code than is necessary.

@WangGaofei

@renatoathaydes Thank you for your help. Your library is excellent and flexible, and we have made huge progress based on it. We changed the code a bit, and it works like a charm.

Please look at the changes: it will unchunk and decompress the data automatically. You could make a new release, I think.

[screenshot]

rawhttp.core.RawHttp

public FramedBody getFramedBody(StartLine startLine, RawHttpHeaders headers) {
    List<String> encodings = new ArrayList<String>();
    List<String> contentEncoding = headers.get("Content-Encoding", ",\\s*");
    List<String> transferEncodings = headers.get("Transfer-Encoding", ",\\s*");
    encodings.addAll(contentEncoding);
    encodings.addAll(transferEncodings);
    BodyDecoder bodyDecoder = new BodyDecoder(options.getEncodingRegistry(), encodings);

    boolean isChunked = !transferEncodings.isEmpty() &&
            transferEncodings.get(transferEncodings.size() - 1).equalsIgnoreCase("chunked");

    if (isChunked) {
        return new FramedBody.Chunked(bodyDecoder, metadataParser);
    }
    List<String> lengthValues = headers.get("Content-Length");
    if (lengthValues.isEmpty()) {
        if (startLine instanceof StatusLine) {
            // response has no message framing information available
            return new FramedBody.CloseTerminated(bodyDecoder);
        }
        // request body without framing is not allowed
        throw new InvalidMessageFrame("The length of the request body cannot be determined. " +
                "The Content-Length header is missing and the Transfer-Encoding header does not " +
                "indicate the message is chunked");
    }
    if (lengthValues.size() > 1) {
        throw new InvalidMessageFrame("More than one Content-Length header value is present");
    }
    long bodyLength;
    try {
        bodyLength = Long.parseLong(lengthValues.get(0));
    } catch (NumberFormatException e) {
        throw new InvalidMessageFrame("Content-Length header value is not a valid number");
    }
    return new FramedBody.ContentLength(bodyDecoder, bodyLength);
}

@renatoathaydes
Owner

@LuPan2015 it's released now!
