Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't silently fix invalid JSON with duplicate keys #1133

Closed
1 task done
annettejanewilson opened this issue Sep 1, 2021 · 6 comments · Fixed by #1163
Closed
1 task done

Don't silently fix invalid JSON with duplicate keys #1133

annettejanewilson opened this issue Sep 1, 2021 · 6 comments · Fixed by #1163
Labels
enhancement New feature or enhancement planned Solution is being worked on

Comments

@annettejanewilson
Copy link

annettejanewilson commented Sep 1, 2021

Checklist

  • I've searched for similar feature requests.

What enhancement would you like to see?

When a server returns invalid JSON, the HTTPie formatter should either leave it untouched or fail loudly. As it stands, if the JSON contains duplicate keys, HTTPie will quietly remove the duplicates during formatting.

What problem does it solve?

When debugging broken services, it's important that I can trust HTTPie to faithfully show me what the server returned, so that I focus on figuring out what's going wrong with my services and not have to spend that time thinking about HTTPie.

From my point of view it's helpful for HTTPie to format and color text, I'd rather it didn't sort keys by default but I know that I can reconfigure that, but it's a problem if HTTPie is removing anything that isn't non-significant whitespace. If HTTPie is to be a general purpose tool it is important that it doesn't mask problems.

Provide any additional information, screenshots, or code examples below:

The below example is my motivation for this feature. This server is returning invalid JSON with duplicate keys in the "dps" object. But I could not tell it was doing this, because HTTPie had silently "corrected" the output during formatting. You can see that if I turn off formatting and only colorize the output, I see the true output.

annettewilson@ANNEWILS02M:~/scratch/undercity$ http GET 'http://opentsdb.skymetrics.prod.skyscanner.local:4242/api/query' m=='zimsum:prod.undercity.search_cycle.meter.count_delta{}{dataCenter=literal_or(EU_WEST_1),endpoint=literal_or(search),from_bot=literal_or(false)}' start==1630064720 end==1630064727
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 232
Content-Type: application/json;charset=utf-8
Date: Wed, 01 Sep 2021 09:07:45 GMT
Vary: Accept-Encoding
X-OpenTSDB-Query-Complexity: 318

[
    {
        "aggregateTags": [
            "ip",
            "traffic_source"
        ],
        "dps": {
            "1630064726": 6.0
        },
        "metric": "prod.undercity.search_cycle.meter.count_delta",
        "tags": {
            "dataCenter": "EU_WEST_1",
            "endpoint": "search",
            "from_bot": "false"
        }
    }
]


annettewilson@ANNEWILS02M:~/scratch/undercity$ http GET 'http://opentsdb.skymetrics.prod.skyscanner.local:4242/api/query' m=='zimsum:prod.undercity.search_cycle.meter.count_delta{}{dataCenter=literal_or(EU_WEST_1),endpoint=literal_or(search),from_bot=literal_or(false)}' start==1630064720 end==1630064727 --pretty colors
HTTP/1.1 200 OK
Content-Type: application/json;charset=utf-8
Date: Wed, 01 Sep 2021 09:07:59 GMT
Vary: Accept-Encoding
X-OpenTSDB-Query-Complexity: 318
Content-Length: 232
Connection: keep-alive

[{"metric":"prod.undercity.search_cycle.meter.count_delta","tags":{"endpoint":"search","dataCenter":"EU_WEST_1","from_bot":"false"},"aggregateTags":["ip","traffic_source"],"dps":{"1630064726":5.0,"1630064726":3.0,"1630064726":6.0}}]
@annettejanewilson annettejanewilson added enhancement New feature or enhancement new Needs triage. Comments are welcome! labels Sep 1, 2021
@jakubroztocil
Copy link
Member

jakubroztocil commented Sep 1, 2021

Thanks for the thorough description. I agree that the current behavior is not acceptable for the formatted output use case. It’s the default behavior of the underlying library we use.

We should customize the JSON loader/serializer we use for formatting to allow repeated keys. Ideally, we’d also extend the JSON lexer to mark repeated keys as errors, but that might be too big of a project for such a rare scenario.

@jakubroztocil jakubroztocil removed the new Needs triage. Comments are welcome! label Sep 1, 2021
@annettejanewilson
Copy link
Author

annettejanewilson commented Sep 1, 2021

Oh, it's thornier than I realised. Apparently the JSON specification doesn't require keys to be unique. It just leaves it up to JSON processors. 😱 Still, I don't think that remove the onus from HTTPie to do something sensible. It just makes it harder. It looks like it's super easy to parse using object_pairs_hook=multidict.MultiDict and largely impossible to serialise without rewriting the json library. I guess you could at least use that to detect such a case and fall back to something safe.

@jakubroztocil
Copy link
Member

jakubroztocil commented Sep 1, 2021

Interesting that JSON doesn’t prohibit duplicate keys. But it makes sense, though. It’s valid on the language syntax level, and semantically it could be explained as following the JS object notation behavior where the latest occurrence overwrites any earlier ones.

In any case, we shouldn’t alter the data for display. object_pairs_hook=multidict.MultiDict might be a good start. Cc @BoboTiG

@BoboTiG BoboTiG added the planned Solution is being worked on label Sep 1, 2021
@BoboTiG
Copy link
Contributor

BoboTiG commented Sep 1, 2021

@annettejanewilson Using a custom multidict, we can achieve that output:

{
    "dps": {
        "1630064726": [
            5.0,
            3.0,
            6.0
        ]
    }
}

It is still not the same output as for --pretty=colors in the sens we loose the repeated lines. Still, does such output would help in your case?

@jakubroztocil
Copy link
Member

jakubroztocil commented Sep 1, 2021

The data should not be interpreted/transformed in any way other than the documented one. So this doesn’t work.

@BoboTiG
Copy link
Contributor

BoboTiG commented Sep 2, 2021

OK I have a working patch.

Default behaviour (sorted):

[
    {
        "aggregateTags": [
            "ip",
            "traffic_source"
        ],
        "dps": {
            "1630064726": 3.0,
            "1630064726": 5.0,
            "1630064726": 6.0
        },
        "metric": "prod.undercity.search_cycle.meter.count_delta",
        "tags": {
            "dataCenter": "EU_WEST_1",
            "endpoint": "search",
            "from_bot": "false"
        }
    }
]

With --unsorted:

[
    {
        "metric": "prod.undercity.search_cycle.meter.count_delta",
        "tags": {
            "endpoint": "search",
            "dataCenter": "EU_WEST_1",
            "from_bot": "false"
        },
        "aggregateTags": [
            "ip",
            "traffic_source"
        ],
        "dps": {
            "1630064726": 5.0,
            "1630064726": 3.0,
            "1630064726": 6.0
        }
    }
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or enhancement planned Solution is being worked on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants