
Body stored as binary for text/html content #64

cressie176 opened this Issue · 8 comments

2 participants


The following code will create a binary tape rather than one you can edit...

def "Tape body"() {
    HTTPBuilder http = new HTTPBuilder()
    http.get(uri: '')
}


This is because MemoryTape.isPrintable(...) is returning false, causing Betamax to use the binary format. This behaviour might be correct, but it seems a little odd since the page's content type is "text/html".

Any idea if there is something genuinely unprintable in the response or is this a bug?


The BBC are being a bit rubbish and not declaring a charset on their Content-Type header. Without that it's easy to misinterpret the data. The servlet spec says that the default should be ISO-8859-1, but some sites will encode as UTF-8 and include multi-byte characters, which break when parsed as ISO-8859-1.

I can't see anything obvious on that page but I'll do some digging & see if I can figure out what's doing it.
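A minimal sketch of how a missing charset could lead to a binary tape, assuming the printability check rejects control characters. `looksPrintable` is a hypothetical stand-in written for this illustration; the real `MemoryTape.isPrintable(...)` may apply different rules. The sample string is also made up and is not taken from the page in question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMisreadDemo {

    // Hypothetical printability check (an assumption, not Betamax's code):
    // any control character other than ordinary whitespace counts as unprintable.
    static boolean looksPrintable(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isISOControl(c) && !Character.isWhitespace(c)) {
                return false;
            }
        }
        return true;
    }

    // Encodes the text as UTF-8, then decodes those bytes with the given charset.
    static String decodeUtf8BytesAs(String original, Charset charset) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    public static void main(String[] args) {
        // U+2019 (right single quotation mark) is three bytes in UTF-8: E2 80 99.
        String original = "It\u2019s news";

        String asLatin1 = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);
        String asUtf8 = decodeUtf8BytesAs(original, StandardCharsets.UTF_8);

        // Read as ISO-8859-1, bytes 0x80 and 0x99 become the control
        // characters U+0080 and U+0099, so the check fails.
        System.out.println("ISO-8859-1 printable: " + looksPrintable(asLatin1));
        System.out.println("UTF-8 printable:      " + looksPrintable(asUtf8));
    }
}
```

Under these assumptions a single curly apostrophe is enough to make an otherwise ordinary HTML page look unprintable when it is decoded with the wrong charset.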


Ok. Looking at the data, it appears that it really is UTF-8. It occurs to me that it might make sense to assume UTF-8 rather than ISO-8859-1 when there's no declared charset: whilst you can get errors trying to interpret data that really is UTF-8 as ISO-8859-1, the reverse is not true as it's (AFAIK) a pure subset.

If I change AbstractMessage.DEFAULT_CHARSET to UTF-8 the page is recorded as text and doesn't cause any problems when played back.
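The fallback described above could look roughly like the following. This is a sketch under stated assumptions, not the actual Betamax change: `CharsetResolver` and `resolve` are hypothetical names, and the only thing taken from the thread is the idea of a `DEFAULT_CHARSET` of UTF-8 used when the Content-Type header declares nothing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class CharsetResolver {

    // Fallback used when the header does not declare a usable charset.
    static final Charset DEFAULT_CHARSET = StandardCharsets.UTF_8;

    // Returns the charset named in a Content-Type header value,
    // or DEFAULT_CHARSET if none is declared or the name is unknown.
    static Charset resolve(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        // Parameters follow the media type, separated by semicolons.
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length())
                                   .replace("\"", "")
                                   .trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the default.
                    return DEFAULT_CHARSET;
                }
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/html"));
        System.out.println(resolve("text/html; charset=ISO-8859-1"));
    }
}
```

One caveat worth keeping in mind: only the ASCII range is byte-identical between the two encodings, so bytes above 0x7F that really are ISO-8859-1 will usually decode as replacement characters under UTF-8. A declared charset should therefore always take precedence over the default, as it does in this sketch.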


What about having the ability to override the default charset in the tape options?


@cressie176 The problem with that is that you may have multiple things going on in that one tape.


In principle, is this not just a limitation of the current tape format? Do you not currently support a mixture of binary and text in the same tape?


Yes, you can have a mixture in the same tape. The issue is that the same tape can be used for multiple requests, which might each have a different character encoding. If the @Betamax annotation specified the default encoding, that would apply across all requests in the tape & may be appropriate for some and not for others. See #52 for more on the fun & games that can ensue.

I think for anything that isn't standard ASCII or UTF-8 it would be madness not to declare a charset, so falling back to UTF-8 as a default is probably the best option.


This should be fixed by 34cb4aa
