Body stored as binary for text/html content #64

Closed
cressie176 opened this Issue September 25, 2012 · 8 comments

Stephen Cresswell

The following code will create a binary tape rather than one you can edit...

@Betamax(tape='bbc')
def "Tape body"() {

    setup:
        HTTPBuilder http = new HTTPBuilder()
        BetamaxRoutePlanner.configure(http.client)

    when:
        http.get(uri: 'http://www.bbc.co.uk/news')

    then:
        1
}

This is because MemoryTape.isPrintable(...) returns false, causing Betamax to use the binary format. This behaviour might be correct, but it seems a little odd since the page's content type is "text/html".

Any idea if there is something genuinely unprintable in the response or is this a bug?

Rob Fletcher

The BBC are being a bit rubbish and not declaring a charset on their Content-Type header. Without that it's easy to misinterpret the data. The servlet spec says that the default should be ISO-8859-1, but some sites encode as UTF-8 and include multi-byte characters, which get mangled when parsed as ISO-8859-1.
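To make the missing-charset case concrete, here is a minimal Java sketch of resolving a charset from a Content-Type value with a configurable fallback. The class and method names (`ContentTypeCharset`, `resolve`) are hypothetical and are not taken from Betamax; the sketch only illustrates the general idea of "use the declared charset, otherwise a default".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; not Betamax's implementation.
final class ContentTypeCharset {

    // Returns the charset declared in a Content-Type value, or the fallback
    // when no charset parameter is present or the name is not recognised.
    static Charset resolve(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: fall back.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // No charset declared, as in the BBC response: the fallback is used.
        System.out.println(resolve("text/html", StandardCharsets.UTF_8));
        // A declared charset always wins over the fallback.
        System.out.println(resolve("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```

With a helper like this, the only open question is which fallback to pass in, which is what the rest of the thread discusses.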

I can't see anything obvious on that page but I'll do some digging & see if I can figure out what's doing it.

Rob Fletcher

Ok. Looking at the data, it appears that it's really UTF-8. It occurs to me that it might make sense to assume UTF-8, rather than ISO-8859-1, when there's no declared charset. Whilst you can get errors trying to interpret data that really is UTF-8 as ISO-8859-1, the reverse is not true as it's (AFAIK) a pure subset.

If I change AbstractMessage.DEFAULT_CHARSET to UTF-8 the page is recorded as text and doesn't cause any problems when played back.
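The effect of the default can be reproduced outside Betamax. The following Java sketch uses a hypothetical `looksPrintable` check (an assumption standing in for whatever `MemoryTape.isPrintable(...)` actually does) that decodes strictly and rejects non-whitespace control characters. It also shows a caveat: only the ASCII range is byte-identical in both encodings, so ISO-8859-1 bytes above 0x7F do not decode cleanly as UTF-8. A UTF-8 default is therefore a pragmatic choice rather than a lossless one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a "printable" test; Betamax's own check may differ.
final class PrintableCheck {

    static boolean looksPrintable(byte[] body, Charset charset) {
        String text;
        try {
            // Decode strictly so malformed byte sequences are reported, not replaced.
            text = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body))
                    .toString();
        } catch (CharacterCodingException e) {
            return false;
        }
        // Reject control characters other than whitespace such as tabs and newlines.
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isISOControl(c) && !Character.isWhitespace(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+2019 (right single quote) is E2 80 99 in UTF-8; read as ISO-8859-1,
        // the bytes 0x80 and 0x99 become C1 control characters.
        byte[] utf8Body = "It\u2019s".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksPrintable(utf8Body, StandardCharsets.UTF_8));
        System.out.println(looksPrintable(utf8Body, StandardCharsets.ISO_8859_1));

        // The reverse direction: a lone 0xE9 byte is not valid UTF-8.
        byte[] latin1Body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksPrintable(latin1Body, StandardCharsets.ISO_8859_1));
        System.out.println(looksPrintable(latin1Body, StandardCharsets.UTF_8));
    }
}
```

Under these assumptions, a UTF-8 page with typographic quotes would be classed as binary when the default is ISO-8859-1, which matches the behaviour reported above.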

Stephen Cresswell

What about having the ability to override the default charset in the tape options?

Rob Fletcher

@cressie176 The problem with that is that you may have multiple things going on in that one tape.

Stephen Cresswell

In principle, is this not just a limitation of the current tape format? Do you not currently support a mixture of binary and text in the same tape?

Rob Fletcher

Yes, you can have a mixture in the same tape. The issue is that the same tape can be used for multiple requests, each of which might have a different character encoding. If the @Betamax annotation specified the default encoding, it would apply across all requests in the tape & may be appropriate for some and not for others. See #52 for more on the fun & games that can ensue.

I think for anything that isn't standard ASCII or UTF-8 it would be madness not to declare a charset, so falling back to UTF-8 as a default is probably the best option.

Rob Fletcher

This should be fixed by 34cb4aa

robfletcher closed this September 25, 2012