Mac OS sets the locale environment variable `LC_CTYPE=UTF-8`, which is not recognized on Linux. Linux would use `en_US.UTF-8` or similar, or leave it unset and use `LANG`.
When Java starts up on Linux with the Mac OS `LC_CTYPE=UTF-8`, `Charset.defaultCharset()` returns `US-ASCII`. This causes problems wherever the default character set is used: `System.out`, I/O streams without a specified character set, convenience classes like `FileReader` and `FileWriter`, etc.
In the case above, a `FileWriter` is used to output sorted DWCA data. With the mixed environment variables, the file is written as ASCII, which corrupts any non-ASCII characters in the data.
In other words, gbif-common assumes a correctly configured UTF-8 environment.
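
To illustrate the failure mode, here is a minimal sketch (not the gbif-common code; the class name `CharsetDemo`, the helper `encode` and the sample string are made up for this example). It encodes the same text through a `Writer` bound to `US-ASCII`, as would happen implicitly when that is the default character set, and through one bound explicitly to UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

  // Encodes text the way a Writer bound to the given charset would.
  // Hypothetical helper, used only for this illustration.
  static byte[] encode(String text, Charset charset) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (Writer writer = new OutputStreamWriter(out, charset)) {
      writer.write(text);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Sample value with one non-ASCII character (o with stroke, U+00F8).
    String name = "S\u00f8rensen";

    // What a writer produces when the default charset is US-ASCII:
    // the unmappable character is replaced with '?'.
    byte[] ascii = encode(name, StandardCharsets.US_ASCII);
    System.out.println(new String(ascii, StandardCharsets.US_ASCII));

    // With an explicit UTF-8 charset the text survives a round trip.
    byte[] utf8 = encode(name, StandardCharsets.UTF_8);
    System.out.println(name.equals(new String(utf8, StandardCharsets.UTF_8)));
  }
}
```

Passing the character set explicitly, as in the second call, removes the dependency on the locale environment variables the JVM happened to start with.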
The commit improves the code (removing an encoding assumption), and logs a warning if `FileUtils` is used where the default character set is ASCII.
I've also stopped the servers from accepting locale environment variables passed in over SSH.
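
A check like the warning mentioned above could look roughly like the following. This is only a sketch under assumptions, not the actual commit: the class name `DefaultCharsetCheck`, its methods and the log message are hypothetical, and it uses `java.util.logging` purely to stay self-contained.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

public class DefaultCharsetCheck {

  private static final Logger LOG =
      Logger.getLogger(DefaultCharsetCheck.class.getName());

  // True if the given charset is US-ASCII.
  static boolean isAscii(Charset charset) {
    return StandardCharsets.US_ASCII.equals(charset);
  }

  // Logs a warning when the JVM default charset is US-ASCII.
  // Returns true if the warning was logged.
  static boolean warnIfAsciiDefault() {
    Charset defaultCharset = Charset.defaultCharset();
    if (isAscii(defaultCharset)) {
      LOG.warning("Default character set is " + defaultCharset
          + "; non-ASCII data may be corrupted. Check LANG and LC_CTYPE.");
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println("Default charset: " + Charset.defaultCharset());
    warnIfAsciiDefault();
  }
}
```

Running `main` on an affected server would show which default character set the JVM picked up from its environment.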
Reported in gbif/portal-feedback#3191, but also affecting other datasets.