Just spotted this in the Apache error log:

    DBD::mysql::st execute failed: Incorrect string value: '\xC6\xC6\xBD\xE2\xBA\xF3...' for column 'requester_user_agent' at row 1 at /usr/share/eprints/perl_lib/EPrints/Database.pm line 1184.
Not quite sure what this is meant to be, but it doesn't go into the database cleanly!
Might need to do some form of sanity check on the User-Agent.
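One possible sanity check is sketched below in Perl. It assumes the header arrives as raw octets and accepts the User-Agent only if it is well-formed UTF-8. `is_valid_utf8` is a hypothetical helper, not part of EPrints.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not an EPrints API): true if $octets is well-formed UTF-8.
# decode() with FB_CROAK dies on malformed input. It may modify its source in
# place, but $octets is already a private copy of the argument.
sub is_valid_utf8 {
    my ($octets) = @_;
    return eval { decode('UTF-8', $octets, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# The octets quoted in this thread appear to be GB2312, not UTF-8, so they fail.
my $ua = "\xC6\xC6\xBD\xE2\xBA\xF3\xB5\xC4";
print is_valid_utf8($ua) ? "accept\n" : "reject\n";    # prints "reject"
```

This only checks well-formedness. Whether a given column accepts the value also depends on that column's character set.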
From what I've read and understand, there shouldn't be a problem sanitising rubbish by C-style backslash-escaping the raw octets. I don't mind the database literally holding the string "\xc6\xc6\xbd\xe2\xba\xf3\xb5\xc4". That would work for non-ASCII UTF-8 too: \xf0\x9f\x98\x96
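The escaping idea could look roughly like this in Perl. `escape_octets` is a hypothetical name, and the sketch assumes it receives a byte string (the raw header), not decoded characters. Escaping the backslash as well keeps the result unambiguous.

```perl
use strict;
use warnings;

# Hypothetical helper: C-style escape every octet outside printable ASCII.
# Backslash (0x5C) is escaped too, so the output maps back to the original
# octets. Assumes a byte string: characters above 0xFF are not handled.
sub escape_octets {
    my ($octets) = @_;
    (my $escaped = $octets) =~ s/([^\x20-\x5B\x5D-\x7E])/sprintf('\\x%02x', ord $1)/ge;
    return $escaped;
}

print escape_octets("\xC6\xC6\xBD\xE2\xBA\xF3\xB5\xC4"), "\n";  # prints \xc6\xc6\xbd\xe2\xba\xf3\xb5\xc4
print escape_octets("Mozilla/5.0"), "\n";                         # prints Mozilla/5.0
```

Plain ASCII User-Agents pass through unchanged, so only the odd ones end up looking like this in the database.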
It appears that to get octets, we need the encode function from Encode (https://perldoc.perl.org/Encode.html):

    use Encode qw(encode);
    my $octets = encode('UTF-8', $characters, Encode::FB_CROAK);
However, having tried various combinations of encode, decode, utf8 and gb2312, I cannot get it to show the octets on my system (corrupted characters are saved in my latin1-encoded MySQL database).
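One way to see the octets regardless of the terminal or MySQL encoding is to print them as hex. The sketch below assumes the logged bytes are GB2312, which Perl's Encode module handles under the name `euc-cn` (from Encode::CN). Decoding yields characters, and `encode` turns those back into UTF-8 octets.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets quoted in this thread; assumed (not confirmed) to be GB2312 text.
my $raw = "\xC6\xC6\xBD\xE2\xBA\xF3\xB5\xC4";

# Octets -> characters: 'euc-cn' is Encode's name for GB2312.
my $chars = decode('euc-cn', $raw);

# Characters -> UTF-8 octets.
my $utf8 = encode('UTF-8', $chars);

# Print octets as hex so nothing depends on how the terminal renders them.
printf "raw:   %s\n", unpack('H*', $raw);
printf "utf-8: %s (%d chars)\n", unpack('H*', $utf8), length $chars;
```

Printing `unpack('H*', ...)` output sidesteps the question of which encoding the terminal or a latin1 column applies when displaying the bytes.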
I used the following to simulate the User-Agent string:
$ curl -A "破解后的" http://testgithub.local/id/eprint/77/
The problem could also be that my terminal encodes the string differently from the browser in the field.
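To test that hypothesis, one option is to send the GB2312 octets themselves rather than typing the characters, since a UTF-8 terminal passes curl the UTF-8 form. A minimal Perl sketch, assuming the logged value is GB2312 (`euc-cn` in Encode):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Characters recovered from the GB2312 ('euc-cn') octets quoted in this thread.
my $text = decode('euc-cn', "\xC6\xC6\xBD\xE2\xBA\xF3\xB5\xC4");

# What a UTF-8 terminal would hand to curl: valid UTF-8, 3 octets per character.
my $as_utf8 = encode('UTF-8', $text);

# What the logged request apparently carried: 2 octets per character, not valid UTF-8.
my $as_gb2312 = encode('euc-cn', $text);

# Write the raw GB2312 octets (no newline) so a shell can pass them to curl -A.
binmode STDOUT;
print $as_gb2312;
```

Saved as, say, `gb2312_ua.pl` (a hypothetical filename), it could be used as `curl -A "$(perl gb2312_ua.pl)" http://testgithub.local/id/eprint/77/` so the request carries the same bytes as the one in the error log.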