Some input files are encoded as UTF-8 but contain wrongly encoded data that has to be interpreted correctly.
We already faced this issue internally in the past and should apply the same fix here.
Working solution from the geoname_enrichment script:
https://github.com/MetaBelgica/geoname-enrichment/blob/d94851e9ae81e06a45ee2f53d4b8e952fe0182f7/geoname_enrichment/utils.py#L2-L23
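
For illustration, here is a minimal sketch of one common way to repair this kind of data. It is not a copy of the linked utility. It assumes the broken strings are UTF-8 bytes that were decoded as Windows-1252 somewhere upstream; the function name `fix_mojibake` and the `wrong_encoding` parameter are hypothetical.

```python
def fix_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Try to undo a wrong decoding step (hypothetical helper).

    Assumption: the text was originally UTF-8 but got decoded with
    `wrong_encoding`. Re-encoding with that codec recovers the original
    bytes, which are then decoded as UTF-8.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip failed, so the text was most likely fine already.
        return text


if __name__ == "__main__":
    print(fix_mojibake("Li\u00c3\u00a8ge"))  # wrongly decoded input
    print(fix_mojibake("Li\u00e8ge"))        # correct input, returned unchanged
```

Returning the input unchanged when the round trip fails keeps correctly encoded values intact, so the function can be applied to every field of a mixed file. Whether Windows-1252 is the right guess for our input files still has to be checked against real data.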