Data is corrupted when transferred as byte arrays from Java

I was baffled when my PDFs from JasperReports showed up blank. After a lot of head scratching, I found that if I wrote the PDF data to disk on the Java side, it was fine. But if I transferred the data as byte[] over the network link, it picked up "wrinkles" on the way that rendered it useless: certain bytes (in the 0xD8-0xDF range) were replaced with nulls (0x00).

It took me one long, sleepless night to track down the problem, but I finally nailed it. It happens on the Java side, when the response object is written to the network socket. The UTF-8 encoder rejects certain characters that fall in the range reserved for UTF-16 surrogates, which must always come in pairs. The encoder therefore silently replaces some characters with question marks, which in turn either silently corrupts the data or causes a decoding exception on the Python side. I concluded that this is a protocol design flaw.

More information at: http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates
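
To illustrate the surrogate problem described above, here is a minimal Python sketch. It is not the code path from this report. It only shows that a lone code unit from the surrogate range has no valid UTF-8 encoding, and what a lenient encoder does with it. Python's `errors="replace"` is used here as a stand-in for the Java encoder's silent replacement (an assumption for illustration), and the helper name `encode_leniently` is hypothetical.

```python
def encode_leniently(text: str) -> bytes:
    """Hypothetical helper: encode to UTF-8, substituting '?' for anything
    that cannot be encoded (stand-in for a silently replacing encoder)."""
    return text.encode("utf-8", errors="replace")


# A lone surrogate: one 16-bit code unit from the reserved range, with no partner.
lone_surrogate = "\ud800"

# A strict encoder refuses it outright.
try:
    lone_surrogate.encode("utf-8")
    strict_encoding_succeeded = True
except UnicodeEncodeError:
    strict_encoding_succeeded = False

# A lenient encoder replaces it, so the original value is lost on the wire.
lenient_result = encode_leniently("A" + lone_surrogate + "B")

# An ordinary non-ASCII character encodes without trouble.
ordinary_result = "\u00e9".encode("utf-8")

print(strict_encoding_succeeded, lenient_result, ordinary_result)
```

Once the replacement has happened, the receiving side cannot recover the original value, which matches the kind of silent corruption described above.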
