HAR server replay missing content length in HEAD response #6547

Closed
zanieb opened this issue Dec 12, 2023 · 4 comments · Fixed by #6548

@zanieb
Contributor

zanieb commented Dec 12, 2023

Problem Description

When I use a HAR file for server response replay, my application fails with an error about missing content in the expected zip file response. It appears that the replayed HEAD response does not include the recorded content-length (it is sent as `content-length: 0`), so the client never makes the subsequent GET requests. There may also be a separate issue with chunked responses here.

When replaying from the HAR file (recorded with `hardump`):

[23:45:42.350][[::1]:56578] client connect
[::1]:56578: HEAD https://files.pythonhosted.org/packages/00/e5/f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl HTTP/2.0
    accept: */*
    user-agent: puffin
    accept-encoding: gzip, br

[replay] << HTTP/1.1 200 OK 0b
    last-modified: Thu, 10 Aug 2023 12:01:15 GMT
    etag: "a296c6e224c118b0d08cd77e8c08f4b1"
    x-amz-request-id: aeb4d3335548af85
    x-amz-id-2: aN65jxTFgNrlm8zEJMNdk7mYLYwUwTzh0
    x-amz-version-id: 4_z179c51e67f11a0ad8f6c0018_f10789ff3151435c8_d20230810_m113900_c005_v0501001_t0045_u01691667540984
    content-type: application/octet-stream
    cache-control: max-age=365000000, immutable, public
    accept-ranges: bytes
    date: Tue, 12 Dec 2023 05:45:42 GMT
    age: 2295687
    x-served-by: cache-iad-kcgs7200038-IAD, cache-stp9222-STP
    x-cache: HIT, HIT
    x-cache-hits: 21458, 104163
    x-timer: S1702358607.425873,VS0,VE0
    strict-transport-security: max-age=31536000; includeSubDomains; preload
    x-frame-options: deny
    x-xss-protection: 1; mode=block
    x-content-type-options: nosniff
    x-permitted-cross-domain-policies: none
    x-robots-header: noindex
    x-pypi-file-python-version: py3
    x-pypi-file-version: 4.66.1
    x-pypi-file-package-type: bdist_wheel
    x-pypi-file-project: tqdm
    content-length: 0

When I don't use HAR (replaying the `$path.dat` flow file instead), the content-length is set correctly and the client makes the subsequent ranged GET requests:

[23:46:53.539][[::1]:56610] client connect
[::1]:56610: HEAD https://files.pythonhosted.org/packages/00/e5/f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl HTTP/2.0
    accept: */*
    user-agent: puffin
    accept-encoding: gzip, br

[replay] << HTTP/2.0 200 OK 0b
    last-modified: Thu, 10 Aug 2023 12:02:26 GMT
    etag: "a296c6e224c118b0d08cd77e8c08f4b1"
    x-amz-request-id: aeb4d3335548af85
    x-amz-id-2: aN65jxTFgNrlm8zEJMNdk7mYLYwUwTzh0
    x-amz-version-id: 4_z179c51e67f11a0ad8f6c0018_f10789ff3151435c8_d20230810_m113900_c005_v0501001_t0045_u01691667540984
    content-type: application/octet-stream
    cache-control: max-age=365000000, immutable, public
    accept-ranges: bytes
    date: Tue, 12 Dec 2023 05:46:53 GMT
    age: 2295687
    x-served-by: cache-iad-kcgs7200038-IAD, cache-stp9222-STP
    x-cache: HIT, HIT
    x-cache-hits: 21458, 104163
    x-timer: S1702358607.425873,VS0,VE0
    strict-transport-security: max-age=31536000; includeSubDomains; preload
    x-frame-options: deny
    x-xss-protection: 1; mode=block
    x-content-type-options: nosniff
    x-permitted-cross-domain-policies: none
    x-robots-header: noindex
    x-pypi-file-python-version: py3
    x-pypi-file-version: 4.66.1
    x-pypi-file-package-type: bdist_wheel
    x-pypi-file-project: tqdm
    content-length: 78258

[::1]:56610: GET https://files.pythonhosted.org/packages/00/e5/f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl HTTP/2.0
    range: bytes=61874-78257
    accept: */*
    user-agent: puffin

[replay] << HTTP/2.0 206 Partial Content 16.0k
    last-modified: Thu, 10 Aug 2023 12:02:26 GMT
    etag: "a296c6e224c118b0d08cd77e8c08f4b1"
    x-amz-request-id: aeb4d3335548af85
    x-amz-id-2: aN65jxTFgNrlm8zEJMNdk7mYLYwUwTzh0
    x-amz-version-id: 4_z179c51e67f11a0ad8f6c0018_f10789ff3151435c8_d20230810_m113900_c005_v0501001_t0045_u01691667540984
    content-type: application/octet-stream
    cache-control: max-age=365000000, immutable, public
    accept-ranges: bytes
    date: Tue, 12 Dec 2023 05:46:53 GMT
    age: 2295687
    x-served-by: cache-iad-kcgs7200038-IAD, cache-stp9222-STP
    x-cache: HIT, HIT
    x-cache-hits: 21458, 104164
    x-timer: S1702358607.435964,VS0,VE0
    strict-transport-security: max-age=31536000; includeSubDomains; preload
    x-frame-options: deny
    x-xss-protection: 1; mode=block
    x-content-type-options: nosniff
    x-robots-header: noindex
    access-control-allow-methods: GET, OPTIONS
    access-control-allow-headers: Range
    access-control-allow-origin: *
    x-pypi-file-python-version: py3
    x-pypi-file-version: 4.66.1
    x-pypi-file-package-type: bdist_wheel
    x-pypi-file-project: tqdm
    content-range: bytes 61874-78257/78258
    content-length: 16384

    0000000000 (truncated)

[::1]:56610: GET https://files.pythonhosted.org/packages/00/e5/f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl HTTP/2.0
    range: bytes=55094-61873
    accept: */*
    user-agent: puffin

[replay] << HTTP/2.0 206 Partial Content 6.6k
    last-modified: Thu, 10 Aug 2023 12:02:26 GMT
    etag: "a296c6e224c118b0d08cd77e8c08f4b1"
    x-amz-request-id: aeb4d3335548af85
    x-amz-id-2: aN65jxTFgNrlm8zEJMNdk7mYLYwUwTzh0
    x-amz-version-id: 4_z179c51e67f11a0ad8f6c0018_f10789ff3151435c8_d20230810_m113900_c005_v0501001_t0045_u01691667540984
    content-type: application/octet-stream
    cache-control: max-age=365000000, immutable, public
    accept-ranges: bytes
    date: Tue, 12 Dec 2023 05:46:53 GMT
    age: 2295687
    x-served-by: cache-iad-kcgs7200038-IAD, cache-stp9222-STP
    x-cache: HIT, HIT
    x-cache-hits: 21458, 104165
    x-timer: S1702358607.454024,VS0,VE0
    strict-transport-security: max-age=31536000; includeSubDomains; preload
    x-frame-options: deny
    x-xss-protection: 1; mode=block
    x-content-type-options: nosniff
    x-robots-header: noindex
    access-control-allow-methods: GET, OPTIONS
    access-control-allow-headers: Range
    access-control-allow-origin: *
    x-pypi-file-python-version: py3
    x-pypi-file-version: 4.66.1
    x-pypi-file-package-type: bdist_wheel
    x-pypi-file-project: tqdm
    content-range: bytes 55094-61873/78258
    content-length: 6780

    0000000000 (truncated)

[23:46:53.586][[::1]:56610] client disconnect
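For context on why the HEAD response matters: in the non-HAR replay above, the first ranged GET asks for `bytes=61874-78257`, which is exactly the last 16,384 bytes of the 78,258-byte wheel (78258 - 16384 = 61874; range end offsets are inclusive). A zip reader needs the total size to locate the archive's tail this way, so `content-length: 0` leaves it nothing to request. The sketch below is hypothetical: `tail_range` and the 16 KiB chunk size are inferred from the log, not taken from puffin's code.

```python
from __future__ import annotations


def tail_range(content_length: int, chunk_size: int = 16384) -> str | None:
    """Build a Range header value for the last `chunk_size` bytes of a resource.

    Returns None when the length is zero or unknown, which is the position the
    HAR-replayed HEAD response (content-length: 0) leaves the client in.
    """
    if content_length <= 0:
        return None
    start = max(0, content_length - chunk_size)
    end = content_length - 1  # Range end offsets are inclusive
    return f"bytes={start}-{end}"


print(tail_range(78258))  # bytes=61874-78257, matching the first GET in the log
print(tail_range(0))      # None: no ranged request can be formed
```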
The content length appears to be correctly encoded in the HAR file:
{
    "startedDateTime": "2023-12-12T05:23:27.346896+00:00",
    "time": 16.900062561035156,
    "request": {
        "method": "HEAD",
        "url": "https://files.pythonhosted.org/packages/00/e5/f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl",
        "httpVersion": "HTTP/2.0",
        "cookies": [],
        "headers": [
            {
                "name": "accept",
                "value": "*/*"
            },
            {
                "name": "user-agent",
                "value": "puffin"
            },
            {
                "name": "accept-encoding",
                "value": "gzip, br"
            }
        ],
        "queryString": [],
        "headersSize": 91,
        "bodySize": 0
    },
    "response": {
        "status": 200,
        "statusText": "",
        "httpVersion": "HTTP/2.0",
        "cookies": [],
        "headers": [
            {
                "name": "last-modified",
                "value": "Thu, 10 Aug 2023 11:39:00 GMT"
            },
            {
                "name": "etag",
                "value": "\"a296c6e224c118b0d08cd77e8c08f4b1\""
            },
            {
                "name": "x-amz-request-id",
                "value": "aeb4d3335548af85"
            },
            {
                "name": "x-amz-id-2",
                "value": "aN65jxTFgNrlm8zEJMNdk7mYLYwUwTzh0"
            },
            {
                "name": "x-amz-version-id",
                "value": "4_z179c51e67f11a0ad8f6c0018_f10789ff3151435c8_d20230810_m113900_c005_v0501001_t0045_u01691667540984"
            },
            {
                "name": "content-type",
                "value": "application/octet-stream"
            },
            {
                "name": "cache-control",
                "value": "max-age=365000000, immutable, public"
            },
            {
                "name": "accept-ranges",
                "value": "bytes"
            },
            {
                "name": "date",
                "value": "Tue, 12 Dec 2023 05:23:27 GMT"
            },
            {
                "name": "age",
                "value": "2295687"
            },
            {
                "name": "x-served-by",
                "value": "cache-iad-kcgs7200038-IAD, cache-stp9222-STP"
            },
            {
                "name": "x-cache",
                "value": "HIT, HIT"
            },
            {
                "name": "x-cache-hits",
                "value": "21458, 104163"
            },
            {
                "name": "x-timer",
                "value": "S1702358607.425873,VS0,VE0"
            },
            {
                "name": "strict-transport-security",
                "value": "max-age=31536000; includeSubDomains; preload"
            },
            {
                "name": "x-frame-options",
                "value": "deny"
            },
            {
                "name": "x-xss-protection",
                "value": "1; mode=block"
            },
            {
                "name": "x-content-type-options",
                "value": "nosniff"
            },
            {
                "name": "x-permitted-cross-domain-policies",
                "value": "none"
            },
            {
                "name": "x-robots-header",
                "value": "noindex"
            },
            {
                "name": "x-pypi-file-python-version",
                "value": "py3"
            },
            {
                "name": "x-pypi-file-version",
                "value": "4.66.1"
            },
            {
                "name": "x-pypi-file-package-type",
                "value": "bdist_wheel"
            },
            {
                "name": "x-pypi-file-project",
                "value": "tqdm"
            },
            {
                "name": "content-length",
                "value": "78258"
            }
        ],
        "content": {
            "size": 0,
            "compression": 0,
            "mimeType": "application/octet-stream",
            "text": ""
        },
        "redirectURL": "",
        "headersSize": 1187,
        "bodySize": 0
    },
    "cache": {},
    "timings": {
        "connect": 5.411863327026367,
        "ssl": 6.536245346069336,
        "send": 0.23412704467773438,
        "receive": 0.20503997802734375,
        "wait": 4.512786865234375
    },
    "serverIPAddress": "199.232.28.223"
},
{
    "startedDateTime": "2023-12-12T05:23:27.356972+00:00",
    "time": 6.1588287353515625,
    "request": {
        "method": "GET",
        "url": "https://files.pythonhosted.org/packages/00/e5/f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/tqdm-4.66.1-py3-none-any.whl",
        "httpVersion": "HTTP/2.0",
        "cookies": [],
        "headers": [
            {
                "name": "range",
                "value": "bytes=61874-78257"
            },
            {
                "name": "accept",
                "value": "*/*"
            },
            {
                "name": "user-agent",
                "value": "puffin"
            }
        ],
        "queryString": [],
        "headersSize": 90,
        "bodySize": 0
    },
    "response": {
        "status": 206,
        "statusText": "",
        "httpVersion": "HTTP/2.0",
        "cookies": [],
        "headers": [
            {
                "name": "last-modified",
                "value": "Thu, 10 Aug 2023 11:39:00 GMT"
            },
            {
                "name": "etag",
                "value": "\"a296c6e224c118b0d08cd77e8c08f4b1\""
            },
            {
                "name": "x-amz-request-id",
                "value": "aeb4d3335548af85"
            },
            {
                "name": "x-amz-id-2",
                "value": "aN65jxTFgNrlm8zEJMNdk7mYLYwUwTzh0"
            },
            {
                "name": "x-amz-version-id",
                "value": "4_z179c51e67f11a0ad8f6c0018_f10789ff3151435c8_d20230810_m113900_c005_v0501001_t0045_u01691667540984"
            },
            {
                "name": "content-type",
                "value": "application/octet-stream"
            },
            {
                "name": "cache-control",
                "value": "max-age=365000000, immutable, public"
            },
            {
                "name": "accept-ranges",
                "value": "bytes"
            },
            {
                "name": "date",
                "value": "Tue, 12 Dec 2023 05:23:27 GMT"
            },
            {
                "name": "age",
                "value": "2295687"
            },
            {
                "name": "x-served-by",
                "value": "cache-iad-kcgs7200038-IAD, cache-stp9222-STP"
            },
            {
                "name": "x-cache",
                "value": "HIT, HIT"
            },
            {
                "name": "x-cache-hits",
                "value": "21458, 104164"
            },
            {
                "name": "x-timer",
                "value": "S1702358607.435964,VS0,VE0"
            },
            {
                "name": "strict-transport-security",
                "value": "max-age=31536000; includeSubDomains; preload"
            },
            {
                "name": "x-frame-options",
                "value": "deny"
            },
            {
                "name": "x-xss-protection",
                "value": "1; mode=block"
            },
            {
                "name": "x-content-type-options",
                "value": "nosniff"
            },
            {
                "name": "x-robots-header",
                "value": "noindex"
            },
            {
                "name": "access-control-allow-methods",
                "value": "GET, OPTIONS"
            },
            {
                "name": "access-control-allow-headers",
                "value": "Range"
            },
            {
                "name": "access-control-allow-origin",
                "value": "*"
            },
            {
                "name": "x-pypi-file-python-version",
                "value": "py3"
            },
            {
                "name": "x-pypi-file-version",
                "value": "4.66.1"
            },
            {
                "name": "x-pypi-file-package-type",
                "value": "bdist_wheel"
            },
            {
                "name": "x-pypi-file-project",
                "value": "tqdm"
            },
            {
                "name": "content-range",
                "value": "bytes 61874-78257/78258"
            },
            {
                "name": "content-length",
                "value": "16384"
            }
        ],
        "content": {
            "size": 16384,
            "compression": 0,
            "mimeType": "application/octet-stream",
            "text": "oJXk4Pn9+zNUNfvAMb1jD1cyGfag9xosQqBN3efa4rfTZT9G+M9Hy4QUs4/..."
        }
    }
}
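One way to spot entries affected by this bug is to compare each response's recorded `content-length` header against the HAR `content.size`. For the HEAD entry above they differ (78258 vs 0), so any replay path that recomputes the length from the body turns 78258 into 0. A minimal sketch assuming the HAR 1.2 layout (`log.entries[].response`); note that compressed bodies can also differ legitimately, since `content.size` is the decoded size, so treat the output as a diagnostic rather than a validator.

```python
from __future__ import annotations

import json
import sys


def content_length_mismatches(har: dict) -> list[tuple[str, str, int, int]]:
    """Return (method, url, header_length, body_size) for entries whose recorded
    content-length header differs from the HAR body size."""
    mismatches = []
    for entry in har["log"]["entries"]:
        response = entry["response"]
        header_value = next(
            (h["value"] for h in response["headers"] if h["name"].lower() == "content-length"),
            None,
        )
        if header_value is None:
            continue  # nothing recorded, nothing to compare
        body_size = response.get("content", {}).get("size", 0)
        if int(header_value) != body_size:
            mismatches.append(
                (entry["request"]["method"], entry["request"]["url"], int(header_value), body_size)
            )
    return mismatches


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_har.py path/to/file.har
    with open(sys.argv[1], encoding="utf-8") as f:
        for row in content_length_mismatches(json.load(f)):
            print(*row)
```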

Reproduction

I can work on a reproduction in the mitmproxy test suite, as well as an example extracted from my application.

Roughly:

mitmdump \
    -w "$path.dat" \
    --set stream_large_bodies=1000m \
    --set hardump="$path.har" \
    "~d pypi.org|files.pythonhosted.org|mitm.it"
# Request the wheel from PyPI
# TODO: write this with curl or a script
# Also replay "$path.dat" for comparison
mitmdump --server-replay "$path.har" \
    --flow-detail 3 \
    --server-replay-extra 500 \
    --set connection_strategy=lazy
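For the "request the wheel" step, a rough stand-in for the client (puffin itself isn't shown here) is a HEAD request through the replaying proxy that reports the `content-length` it gets back. A sketch under stated assumptions: mitmdump listens on its default `127.0.0.1:8080`, its CA certificate sits at the default `~/.mitmproxy/mitmproxy-ca-cert.pem`, and the `RUN_REPLAY_CHECK` environment variable is a hypothetical switch so the script only touches the network when asked.

```python
from __future__ import annotations

import os
import ssl
import urllib.request

# Assumptions (not from the issue): default mitmdump listen address and CA path.
PROXY = "http://127.0.0.1:8080"
CA_CERT = "~/.mitmproxy/mitmproxy-ca-cert.pem"
WHEEL_URL = (
    "https://files.pythonhosted.org/packages/00/e5/"
    "f12a80907d0884e6dff9c16d0c0114d81b8cd07dc3ae54c5e962cc83037e/"
    "tqdm-4.66.1-py3-none-any.whl"
)


def build_proxy_opener(proxy: str, cafile: str | None) -> urllib.request.OpenerDirector:
    """Route HTTP(S) through the proxy, trusting its CA certificate if a path is given."""
    context = ssl.create_default_context(
        cafile=os.path.expanduser(cafile) if cafile else None
    )
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}),
        urllib.request.HTTPSHandler(context=context),
    )


def make_head_request(url: str) -> urllib.request.Request:
    return urllib.request.Request(url, method="HEAD")


def replayed_content_length(opener: urllib.request.OpenerDirector, url: str) -> int | None:
    """Send a HEAD request and return the content-length the proxy answered with."""
    with opener.open(make_head_request(url), timeout=10) as resp:
        value = resp.headers.get("content-length")
    return int(value) if value is not None else None


if __name__ == "__main__" and os.environ.get("RUN_REPLAY_CHECK"):
    opener = build_proxy_opener(PROXY, CA_CERT)
    # Per the logs above: 78258 when replaying the .dat file, 0 when replaying the HAR.
    print(replayed_content_length(opener, WHEEL_URL))
```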

System Information

Mitmproxy: 11.0.0.dev (+16, commit bda9c4e)
Python:    3.11.4
OpenSSL:   OpenSSL 3.1.4 24 Oct 2023
Platform:  macOS-14.0-arm64-arm-64bit
@zanieb zanieb added the kind/triage Unclassified issues label Dec 12, 2023
@zanieb
Contributor Author

zanieb commented Dec 12, 2023

cc @stanleygvi: no pressure, I just saw that you contributed this feature recently <3

xref #6368 and #6335

@zanieb
Contributor Author

zanieb commented Dec 12, 2023

The following patch resolves this issue for me:

diff --git a/mitmproxy/http.py b/mitmproxy/http.py
index 986e6b898..d53f345b8 100644
--- a/mitmproxy/http.py
+++ b/mitmproxy/http.py
@@ -378,7 +378,7 @@ class Message(serializable.Serializable):
             # don't set content-length if a transfer-encoding is provided
             pass
         else:
-            self.headers["content-length"] = str(len(self.raw_content))
+            self.headers.setdefault("content-length", str(len(self.raw_content)))
 
     def get_content(self, strict: bool = True) -> bytes | None:
         """

ref

This behavior looks like it was intentionally added in #4827, but I'm confused about why it differs between HAR and the native binary flow format.

@mhils
Member

mhils commented Dec 12, 2023

Thanks for the detailed report! I think we should keep set_content as-is; we want

flow.request.content = flow.request.content.replace(b"a", b"bb")

to work in user addons. Many years ago we had it so that content-length wouldn't be updated, and that resulted in tons of bug reports.
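A toy model of the trade-off between the `setdefault` patch above and this objection. `SketchMessage` is hypothetical, not mitmproxy's `Message`; it assumes, as described in this thread, that assigning `.content` recomputes `content-length` unless a transfer-encoding is present, while the constructor stores the body without touching headers. With overwrite semantics the replayed HEAD loses its recorded length; with `setdefault` the addon edit leaves a stale length.

```python
from __future__ import annotations


class SketchMessage:
    """Toy HTTP message (hypothetical; not mitmproxy's Message class)."""

    def __init__(self, headers: dict[str, str], content: bytes, use_setdefault: bool) -> None:
        # Like a plain constructor: store headers and body without recomputing anything.
        self.headers = dict(headers)
        self._content = content
        self.use_setdefault = use_setdefault

    @property
    def content(self) -> bytes:
        return self._content

    @content.setter
    def content(self, value: bytes) -> None:
        self._content = value
        if "transfer-encoding" in self.headers:
            return  # don't set content-length if a transfer-encoding is provided
        length = str(len(value))
        if self.use_setdefault:
            self.headers.setdefault("content-length", length)  # proposed patch
        else:
            self.headers["content-length"] = length  # existing behavior


def demo(use_setdefault: bool) -> tuple[str, str]:
    # Replayed HEAD: recorded length 78258, empty body assigned afterwards (as Response.make does).
    head = SketchMessage({"content-length": "78258"}, b"", use_setdefault)
    head.content = b""
    # Addon edit from the comment above: b"aaa" becomes b"bbbbbb".
    addon = SketchMessage({"content-length": "3"}, b"aaa", use_setdefault)
    addon.content = addon.content.replace(b"a", b"bb")
    return head.headers["content-length"], addon.headers["content-length"]


print(demo(use_setdefault=False))  # ('0', '6'): HEAD length lost, addon length correct
print(demo(use_setdefault=True))   # ('78258', '3'): HEAD length kept, addon length stale
```

Neither setter behavior satisfies both cases, which is why the fix belongs in HAR loading rather than in `set_content`.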

Instead, we should fix HAR loading to not use Response.make, but construct a Response object manually so that we don't update the header automatically:
https://github.com/mitmproxy/mitmproxy/blob/10.1.5/mitmproxy/io/har.py#L92-L94. You can look at the implementation of Response.make to see how that works.
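A sketch of the suggested direction: read the HAR entry's fields and pass them straight to the `Response` constructor instead of going through `Response.make`. `har_response_fields` is a hypothetical helper, and the keyword names assume the mitmproxy 10.x constructor (`http_version`, `status_code`, `reason`, `headers`, `content`, `trailers`, `timestamp_start`, `timestamp_end`); check them against the linked source. It keeps the recorded `content-length` and `httpVersion` as-is and decodes `content.text` according to the HAR `encoding` field.

```python
from __future__ import annotations

import base64
from datetime import datetime


def har_response_fields(entry: dict) -> dict:
    """Map a HAR 1.2 entry to keyword arguments for a Response constructor.

    Headers are copied verbatim, so a recorded content-length survives even
    when the body is empty (as for the HEAD entry above).
    """
    response = entry["response"]
    content = response.get("content", {})
    text = content.get("text", "")
    if content.get("encoding") == "base64":
        body = base64.b64decode(text)
    else:
        body = text.encode()
    start = datetime.fromisoformat(entry["startedDateTime"]).timestamp()
    return {
        "http_version": response["httpVersion"].encode(),
        "status_code": response["status"],
        "reason": response.get("statusText", "").encode(),
        "headers": tuple((h["name"].encode(), h["value"].encode()) for h in response["headers"]),
        "content": body,
        "trailers": None,
        "timestamp_start": start,
        "timestamp_end": start + entry.get("time", 0) / 1000,  # HAR "time" is in milliseconds
    }

# With mitmproxy installed, this would presumably be used as:
#   from mitmproxy import http
#   resp = http.Response(**har_response_fields(entry))
```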

Contributions welcome!

@mhils mhils added kind/bug help wanted area/addons and removed kind/triage Unclassified issues labels Dec 12, 2023
@zanieb
Contributor Author

zanieb commented Dec 12, 2023

Thanks for the pointers! I was thinking about this later and came to a similar conclusion: we should not change the behavior of set_content. I'll put up a pull request that constructs the response manually.

mhils pushed a commit that referenced this issue Dec 12, 2023
#### Description

Closes #6547

Responses in flows constructed from HAR files were using the
`Response.make` utility, which recomputes the `content-length`
header from the body. When a `content-length` header was already
recorded (e.g. for a HEAD response with an empty body), it was
overwritten, which could cause failures during replay.

#### Checklist

 - [x] I have updated tests where applicable.
 - [x] I have added an entry to the CHANGELOG.

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>