Multiple "Skipping impossibly large" errors when working with http_body data #80

Open
alexv-anderson-uw opened this issue Jan 9, 2020 · 2 comments

@alexv-anderson-uw

I want to analyze the body of HTTP responses; however, I am seeing errors which say Skipping impossibly large 26003-byte #1 chunk, at offset 6/21013.

I can reproduce these errors when processing the http_get_reply_iframes.json.bz2 file provided in the samples directory using the following command:

bzcat http_get_reply_iframes.json.bz2 | dap json + select ip data + transform data=base64decode + decode_http_reply data + remove data data.http_raw_body + select ip + json

I am running DAP in Docker and mounting the samples directory. My Dockerfile is a duplicate of this repo's Dockerfile, but I removed the installation of MaxMind as it was throwing an error which I think is due to a licensing change...

How should I structure the DAP query to avoid the skipping?

@tsellers-r7
Contributor

tsellers-r7 commented Jun 22, 2020

@alexv-anderson-uw - Thanks for the report, sorry for the delay. We'll take a look.

Simple reproducer with output data:

bzcat http_get_reply_iframes.json.bz2 | grep 173.45.72.243 | \
    dap json + select ip data + transform data=base64decode + \
    decode_http_reply data + remove data + json | \
jq
Skipping impossibly large 26003-byte #1 chunk, at offset 6/21013

If you look at the body in that case (using the command below), you will see that the declared chunk size is 6593 in hex, i.e. 26,003 bytes, which is larger than the entire response (21,013 bytes).
The record for 173.45.72.243 is still emitted by dap, but the body value won't be populated or processed by later filters.

bzcat http_get_reply_iframes.json.bz2 | grep 173.45.72.243 | \
    dap json + select ip data + transform data=base64decode + \
    remove data.http_raw_body + json | \
jq
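
To illustrate the size check described above, here is a minimal Python sketch of a chunked-body decoder that refuses a chunk whose declared size exceeds the bytes remaining in the response. This is not dap's actual code: the function name `decode_chunked` and the return shape are hypothetical, and the warning text simply mimics the message quoted in this issue.

```python
def decode_chunked(body: bytes):
    """Decode an HTTP/1.1 chunked body (illustrative sketch, not dap's code).

    Returns (decoded_bytes, warnings). Decoding stops at the first chunk
    whose declared size cannot fit in the remaining data.
    """
    decoded = bytearray()
    warnings = []
    offset = 0
    chunk_no = 0

    while offset < len(body):
        # The chunk-size line ends with CRLF.
        eol = body.find(b"\r\n", offset)
        if eol == -1:
            break

        # Drop any chunk extensions (";name=value") and parse the hex size.
        size_field = body[offset:eol].split(b";", 1)[0].strip()
        try:
            size = int(size_field.decode("ascii", "replace"), 16)
        except ValueError:
            warnings.append(f"Invalid chunk size at offset {offset}/{len(body)}")
            break

        if size == 0:
            break  # a zero-length chunk terminates the body

        chunk_no += 1
        start = eol + 2  # the chunk data begins right after the CRLF

        # A chunk cannot be larger than what is left of the response.
        if size > len(body) - start:
            warnings.append(
                f"Skipping impossibly large {size}-byte #{chunk_no} chunk, "
                f"at offset {start}/{len(body)}"
            )
            break

        decoded += body[start:start + size]
        offset = start + size + 2  # skip the chunk data and its trailing CRLF

    return bytes(decoded), warnings
```

With a 21,013-byte response that starts with `6593\r\n`, the size line occupies 6 bytes and `int("6593", 16)` is 26003, so the check fails on the first chunk and nothing is decoded, which matches the behaviour reported here.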

@theblackturtle

Hi @tsellers-r7, I am seeing the same error. I tried your approach with a sonar.http response as the input, using this query:

wget -qO-  https://opendata.rapid7.com/sonar.http/2020-07-27-1595862118-http_get_80.json.gz | zcat | dap json + select host port data + transform data=base64decode + decode_http_reply data + remove data.http_raw_body + json
