Broken encoding in OpenData_Slovakia_CovidAutomat.csv #51

Open
example-sk opened this issue Feb 9, 2021 · 5 comments


@example-sk

Hello, the district names in the file OpenData_Slovakia_CovidAutomat.csv are somewhat broken.

For example, the districts Stará ?ubov?a and ?adca (that is, Stará Ľubovňa and Čadca). In a hex editor, every one of those question-mark characters shows up as 3F (00111111), which really is the ASCII question mark.
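One plausible way to end up with literal 3F bytes is a conversion to a legacy code page with replacement of unmappable characters. The sketch below assumes Windows-1252 (`cp1252`) purely for illustration; the issue does not say which encoding the export actually went through. It shows why the damage cannot be undone by re-decoding: the original letters are gone and only `?` remains.

```python
def lossy_roundtrip(name: str, legacy_encoding: str = "cp1252") -> str:
    """Encode to a legacy code page, replacing unmappable characters, then decode back.

    The choice of cp1252 is an assumption made for this illustration.
    """
    # errors="replace" substitutes b"?" (0x3F) for every character the
    # target encoding cannot represent.
    raw = name.encode(legacy_encoding, errors="replace")
    return raw.decode(legacy_encoding)


for name in ("Stará Ľubovňa", "Čadca"):
    print(name, "->", lossy_roundtrip(name))
```

Note that "á" survives because cp1252 contains it, while "Ľ", "ň" and "Č" do not, which matches the pattern reported above.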

@neisor

neisor commented Mar 4, 2021

+1 for this issue.

The broken encoding causes problems for anyone who filters the raw data, because some special characters in the names of cities and villages are replaced.
If needed, I can write a Python script that performs a simple search and replace on the CSV file automatically. However, the user would have to run it manually, which is not ideal.

@matejmisik if you (or your team) are, for whatever reason, unable to fix the data before uploading it here on GitHub, please let me know and I'll write a simple Python script to fix it.
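A search-and-replace repair of the kind described here could look roughly like the sketch below. It is not a script posted in this thread. The `REPLACEMENTS` table is hypothetical and contains only the two names mentioned in the issue, and the `;` delimiter is an assumption about the CSV layout.

```python
import csv
import io

# Hypothetical lookup table: only these two pairs come from the issue text.
# A real table would need one entry for every damaged name in the file.
REPLACEMENTS = {
    "Stará ?ubov?a": "Stará Ľubovňa",
    "?adca": "Čadca",
}


def repair_csv_text(text: str, replacements: dict = REPLACEMENTS,
                    delimiter: str = ";") -> str:
    """Replace whole cells that exactly match a known damaged name."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        # Matching whole cells avoids touching a genuine "?" elsewhere.
        writer.writerow([replacements.get(cell, cell) for cell in row])
    return out.getvalue()
```

Matching complete cell values rather than substrings keeps the repair conservative: a damaged name that is missing from the table is left as-is instead of being guessed.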

@sakonn

sakonn commented Sep 16, 2021

Hello,

it seems the encoding is still broken and the names are not easily readable.
@neisor have you found a way to read this data automatically and reliably?

@achjaj

achjaj commented Sep 19, 2021

Hello,

I made a very simple Python script that can repair the broken file. I hope it helps someone.

@sakonn

sakonn commented Sep 20, 2021

My solution is to convert OpenData_Slovakia_CovidAutomat.xlsx to CSV through the CloudConvert service. It's free and works perfectly.

# coding=UTF-8

import cloudconvert

api_key = 'XXXXXXX'  # your CloudConvert API key
sandbox = False

cloudconvert.configure(api_key=api_key, sandbox=sandbox)

# One job with three chained tasks: import the XLSX from GitHub,
# convert it to CSV, and export the result as a downloadable URL.
result = cloudconvert.Job.create(payload={
    'tasks': {
        'import-covid-data': {
            'operation': 'import/url',
            'url': 'https://github.com/Institut-Zdravotnych-Analyz/covid19-data/raw/main/OpenData_Slovakia_CovidAutomat.xlsx',
            'filename': 'OpenData_Slovakia_CovidAutomat.xlsx'
        },
        'convert-covid-data': {
            'operation': 'convert',
            'input': 'import-covid-data',
            'output_format': 'csv'
        },
        'export-covid-data': {
            'operation': 'export/url',
            'input': 'convert-covid-data',
            'inline': False,
            'archive_multiple_files': False
        }
    }
})

# Look up the export task by name instead of relying on its position in the response.
export_task = next(t for t in result['tasks'] if t['name'] == 'export-covid-data')
res = cloudconvert.Task.wait(id=export_task['id'])  # Wait for the export task to finish
file = res.get('result').get('files')[0]
cloudconvert.download(filename=file['filename'], url=file['url'])

The result is a downloaded file, OpenData_Slovakia_CovidAutomat.csv, without any encoding errors.
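To sanity-check a converted file, one option is to scan it for question marks that sit directly in front of a letter, which is how the damaged names in this issue look. The sketch below is a heuristic with hypothetical names (`SUSPICIOUS`, `find_suspicious_lines`), not part of the CloudConvert workflow. It would also flag legitimate text such as URL query strings.

```python
import re

# A "?" immediately followed by a word character, as in "?adca" or "?ubov?a".
SUSPICIOUS = re.compile(r"\?\w")


def find_suspicious_lines(text: str) -> list:
    """Return (line_number, line) pairs that contain a likely replaced character."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPICIOUS.search(line)
    ]


# Usage sketch, assuming the CSV was saved as UTF-8 in the working directory:
# with open("OpenData_Slovakia_CovidAutomat.csv", encoding="utf-8") as f:
#     print(find_suspicious_lines(f.read()))
```

An empty result, together with the file opening cleanly as UTF-8, is a reasonable indication that the names came through intact.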

@achjaj

achjaj commented Sep 20, 2021

Btw, I also created a Java wrapper around automat.gov.sk.
