it shows me this error LibrdataError: Unable to convert string to the requested encoding (invalid byte sequence) #64

Open
69hed opened this issue Jan 28, 2021 · 12 comments
Labels
enhancement New feature or request waiting for librdata changes the issue needs some fixes to the C library librdata before it can be solved

Comments

@69hed

69hed commented Jan 28, 2021

I want to open the dataset below in Python, but it keeps raising an error. The code is:

  import pyreadr
  result = pyreadr.read_r(r"~/Desktop/review2020.rda")
  print(result.keys())
  df1 = result["df1"]

The error:
~/opt/anaconda3/lib/python3.8/site-packages/pyreadr/pyreadr.py in read_r(path, use_objects, timezone)
46 if not os.path.isfile(path):
47 raise PyreadrError("File {0} does not exist!".format(path))
---> 48 parser.parse(path)
49
50 result = OrderedDict()

~/opt/anaconda3/lib/python3.8/site-packages/pyreadr/librdata.pyx in pyreadr.librdata.Parser.parse()

~/opt/anaconda3/lib/python3.8/site-packages/pyreadr/librdata.pyx in pyreadr.librdata.Parser.parse()

LibrdataError: Unable to convert string to the requested encoding (invalid byte sequence)

How can I fix this?

@ofajardo
Owner

As suggested in the issue template, please include a file (with no sensitive data) so that I can reproduce the issue. If I cannot reproduce it, I cannot fix it.

@69hed
Author

69hed commented Jan 28, 2021 via email

@ofajardo
Owner

I can't access the file; it gives me an error. Please zip it and drag and drop it here directly.

@69hed
Author

69hed commented Jan 28, 2021 via email

@ofajardo
Owner

After signing in, it keeps giving me a permission denied error. Please attach the file here on GitHub (you need to zip it, not to reduce the size, but because GitHub only accepts it as a zip file) or find another way to share it.

@ofajardo ofajardo reopened this Jan 29, 2021
@69hed
Author

69hed commented Jan 31, 2021 via email

@ofajardo
Owner

ofajardo commented Feb 1, 2021

I managed to download the file and reproduce the error. Reading the first bytes of the file I got this:

b'RDX3\nX\n\x00\x00\x00\x03\x00\x03\x06\x01\x00\x03\x05\x00\x00\x00\x00\x06CP1252\x00'
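The header above can also be decoded programmatically. What follows is a minimal sketch, not part of pyreadr: the helper names `decompress_rdata` and `rdata_native_encoding` are hypothetical, and the field layout (magic `RDX3\n`, format marker `X\n`, three big-endian 32-bit integers for the serialization and R versions, then a length-prefixed encoding name) is an assumption based on R's version 3 serialization format and on the bytes shown here.

```python
import bz2
import gzip
import lzma
import struct


def decompress_rdata(raw: bytes) -> bytes:
    """Undo the compression R's save() may have applied (gzip, bzip2 or xz)."""
    if raw[:2] == b"\x1f\x8b":          # gzip magic
        return gzip.decompress(raw)
    if raw[:3] == b"BZh":               # bzip2 magic
        return bz2.decompress(raw)
    if raw[:6] == b"\xfd7zXZ\x00":      # xz magic
        return lzma.decompress(raw)
    return raw                          # assume the data is not compressed


def rdata_native_encoding(data: bytes):
    """Return the encoding name stored in an XDR version 3 header, else None."""
    offset = 5 if data.startswith(b"RDX3\n") else 0   # rds files lack the magic
    if data[offset:offset + 2] != b"X\n":             # only the XDR format is handled
        return None
    offset += 2
    if len(data) < offset + 16:                       # header is truncated
        return None
    # Serialization version, writer R version, minimum reader R version.
    version, _writer, _min_reader = struct.unpack(">3i", data[offset:offset + 12])
    offset += 12
    if version < 3:                                   # older versions store no encoding
        return None
    (length,) = struct.unpack(">i", data[offset:offset + 4])
    offset += 4
    return data[offset:offset + length].decode("ascii")


# The first bytes quoted in this comment.
header = b"RDX3\nX\n\x00\x00\x00\x03\x00\x03\x06\x01\x00\x03\x05\x00\x00\x00\x00\x06CP1252\x00"
print(rdata_native_encoding(header))  # CP1252
```

For a file on disk, the raw bytes would first go through `decompress_rdata`, since rda files are usually stored compressed.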

I think CP1252 is the encoding, meaning Windows-1252. Right now, as indicated in the Known limitations section of this repo's README, pyreadr does not support encodings other than UTF-8:

Cannot read RData or rds files in encodings other than utf-8.

That means this file is not supported.
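As a rough analogy for why such a file fails, the sketch below shows in plain Python that text encoded as Windows-1252 is generally not valid UTF-8 once it contains non-ASCII characters. This is only an illustration: it is not librdata's conversion code, and the helper `decodes_as_utf8` and the sample string are made up for the example.

```python
def decodes_as_utf8(raw: bytes) -> bool:
    """Report whether the bytes form a valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "café au lait"
as_cp1252 = text.encode("cp1252")   # "é" becomes the single byte 0xE9
as_utf8 = text.encode("utf-8")      # "é" becomes the two bytes 0xC3 0xA9

print(decodes_as_utf8(as_cp1252))   # False
print(decodes_as_utf8(as_utf8))     # True

# Decoding with the encoding named in the file header recovers the text.
print(as_cp1252.decode("cp1252") == text)  # True
```

Pure ASCII strings are identical in both encodings, so a CP1252 file would only be expected to trip over strings that contain accented or other non-ASCII characters.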

This limitation comes from the C backend librdata. Looking at the C source code I have the feeling the error message should be different, so I am going to make an issue there for them to take a look. I will also ask if other encodings could be supported. It may come at some point in the future.

If you have control over the generation of the rda files, then try saving them with utf-8 encoding.

@ofajardo ofajardo added enhancement New feature or request waiting for librdata changes the issue needs some fixes to the C library librdata before it can be solved labels Feb 1, 2021
@69hed
Author

69hed commented Feb 5, 2021 via email

@ofajardo
Owner

@69hed could you please share the file again? It has been deleted from Dropbox.

@ofajardo
Owner

@69hed I recovered the file and hosted it here: https://github.com/ofajardo/readstat_test_files/blob/master/tip2020.rda for easier sharing with the librdata people, who are looking at it.
