Description
I was baffled when the PDFs generated by JasperReports showed up blank. After a lot of head scratching, I found that if I wrote the PDF data to disk on the Java side, the file was fine. But if I transferred the same data as a byte[] over the network link, it picked up "wrinkles" on the way that rendered it useless. By wrinkles I mean that certain bytes (those in the 0xD8-0xDF range) arrived replaced with nulls (0x00).

It took me one long sleepless night to track this down, but I finally nailed it. The problem happens on the Java side, when the response object is written to the network socket. The UTF-8 encoder rejects characters in the range reserved for UTF-16 surrogates (U+D800-U+DFFF), which must always come in pairs. When it meets an unpaired one, it silently replaces it with a question mark. That either corrupts the data without any error or causes an exception when the Python side decodes it. I concluded that this is a flaw in the protocol design.
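The following Java sketch reproduces the encoder behavior described above. It assumes the payload ends up inside a Java String (for example, with a byte such as 0xD8 landing in the high half of a char) before being UTF-8 encoded for the socket. The class and method names (`SurrogateDemo`, `toWire`, `base64RoundTrip`) are hypothetical, and Base64 is shown only as one possible way to keep binary data away from the text encoder, not as the project's actual fix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class SurrogateDemo {

    // Hypothetical stand-in for a text-based protocol: the string is
    // UTF-8 encoded right before it is written to the socket.
    static byte[] toWire(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // One possible workaround (an assumption, not the library's API):
    // Base64 turns arbitrary bytes into plain ASCII, so no surrogate
    // code units ever reach the UTF-8 encoder.
    static byte[] base64RoundTrip(byte[] data) {
        String safe = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(safe);
    }

    public static void main(String[] args) {
        // A lone high surrogate (U+D800) between two ordinary letters.
        String payload = "A" + (char) 0xD800 + "B";

        // getBytes(UTF_8) cannot encode an unpaired surrogate, so it
        // silently substitutes '?' (0x3F) instead of throwing.
        byte[] wire = toWire(payload);
        System.out.println(Arrays.toString(wire)); // [65, 63, 66]

        // Decoding does not restore the original: the data is lost.
        String decoded = new String(wire, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(payload)); // false

        // Raw bytes in the 0xD8-0xDF range survive the Base64 path intact.
        byte[] pdfBytes = {(byte) 0x25, (byte) 0xD8, (byte) 0xDF, (byte) 0x00};
        System.out.println(Arrays.equals(pdfBytes, base64RoundTrip(pdfBytes))); // true
    }
}
```

The key point is that `String.getBytes(Charset)` always uses the charset's replacement bytes for input it cannot encode, so the corruption produces no exception on the Java side.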
More information at: http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates