Broken encoding in OpenData_Slovakia_CovidAutomat.csv #51
Comments
+1 for this issue. The wrong encoding can cause problems for anyone who filters the raw data, since special characters in the names of some cities and villages are misinterpreted. @matejmisik, if you (or your team) are for whatever reason unable to fix the data before uploading it here on GitHub, please let me know and I'll create a simple Python script to fix the data.
Hello, it seems the encoding is still malformed and the file is not easily readable.
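A minimal sketch of what such a repair script could look like, assuming the damaged names contain one literal `?` per lost character and that a list of correctly spelled names is available. The `repair_name` helper and the short `DISTRICTS` list are hypothetical and only for illustration; a real script would need the full list of districts.

```python
import re

def repair_name(damaged, canonical_names):
    """Return the single canonical name that matches `damaged`,
    treating each '?' as exactly one unknown character.
    If there is no match or more than one, return `damaged` unchanged."""
    pattern = re.compile(
        "".join("." if ch == "?" else re.escape(ch) for ch in damaged)
    )
    matches = [name for name in canonical_names if pattern.fullmatch(name)]
    return matches[0] if len(matches) == 1 else damaged

# Illustrative subset only, not the full list of Slovak districts.
DISTRICTS = ["Stará Ľubovňa", "Čadca", "Bratislava I"]

print(repair_name("Stará ?ubov?a", DISTRICTS))
print(repair_name("?adca", DISTRICTS))
```

Ambiguous or unknown names are left untouched rather than guessed, so the output can be reviewed by hand.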
My solution is to convert OpenData_Slovakia_CovidAutomat.xlsx to CSV through the CloudConvert service. It's free and works perfectly.
# coding=UTF-8
import cloudconvert

api_key = 'XXXXXXX'  # replace with your own CloudConvert API key
sandbox = False
cloudconvert.configure(api_key=api_key, sandbox=sandbox)

# One job with three tasks: import the XLSX, convert it to CSV, export a download URL.
result = cloudconvert.Job.create(payload={
    'tasks': {
        'import-covid-data': {
            'operation': 'import/url',
            'url': 'https://github.com/Institut-Zdravotnych-Analyz/covid19-data/raw/main/OpenData_Slovakia_CovidAutomat.xlsx',
            'filename': 'OpenData_Slovakia_CovidAutomat.xlsx'
        },
        'convert-covid-data': {
            'operation': 'convert',
            'input': 'import-covid-data',
            'output_format': 'csv',
            'some_other_option': 'value'  # placeholder option, probably not needed
        },
        'export-covid-data': {
            'operation': 'export/url',
            'input': 'convert-covid-data',
            'inline': False,
            'archive_multiple_files': False
        }
    }
})

# The export task is the third task defined in the payload.
exported_url_task_id = result['tasks'][2]['id']
res = cloudconvert.Task.wait(id=exported_url_task_id)  # Wait for job completion
file = res.get("result").get("files")[0]
res = cloudconvert.download(filename=file['filename'], url=file['url'])
The result is a downloaded file, OpenData_Slovakia_CovidAutomat.csv, without any encoding errors.
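One way to double-check a converted file is sketched below, assuming the CSV is expected to be UTF-8. The `find_damaged_lines` helper is hypothetical; it flags lines where a `?` sits directly next to a letter, which is only a heuristic and would also flag legitimate text such as a question or a URL query string.

```python
import re

# A '?' touching a word character is a hint that a letter was lost.
SUSPICIOUS = re.compile(r"\?\w|\w\?")

def find_damaged_lines(raw):
    """Decode `raw` bytes as UTF-8 (raises UnicodeDecodeError if invalid)
    and return (line_number, line) pairs that look damaged."""
    text = raw.decode("utf-8-sig")  # also tolerates a leading BOM
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPICIOUS.search(line)
    ]

# Tiny in-memory sample instead of the real file.
sample = "Okres\nČadca\n?adca\n".encode("utf-8")
print(find_damaged_lines(sample))
```

For the real file, the bytes could be read with `pathlib.Path("OpenData_Slovakia_CovidAutomat.csv").read_bytes()` and passed to the same function.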
By the way, I also created a Java wrapper around automat.gov.sk.
Hello, the district names in the file OpenData_Slovakia_CovidAutomat.csv are somewhat broken.
For example, the district Stará ?ubov?a (Stará Ľubovňa) or ?adca (Čadca). In a hex editor I see all of those question-mark characters as 3F (00111111), which really is a question mark in ASCII.
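The symptom described here can be reproduced with a short sketch, assuming (this is a guess, the actual export pipeline is not known) that the names were encoded to a Western European code page such as Latin-1 with unencodable characters replaced. Latin-1 contains `á` but not `Ľ`, `ň` or `Č`, so exactly those letters turn into a literal `?` (byte 3F). The `lossy_roundtrip` helper is hypothetical.

```python
def lossy_roundtrip(name, codec="latin-1"):
    """Encode `name` to `codec`, replacing unencodable characters
    with '?', then decode it again to show what a reader would see."""
    return name.encode(codec, errors="replace").decode(codec)

for name in ["Stará Ľubovňa", "Čadca"]:
    encoded = name.encode("latin-1", errors="replace")
    # Print the visible result and the raw bytes in hex.
    print(lossy_roundtrip(name), "|", encoded.hex(" "))
```

Because the original letters are replaced by a real 3F byte at this step, re-decoding the CSV with a different encoding cannot bring them back; the file has to be regenerated from a source that still holds the correct characters.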