UnicodeDecodeError raised by get_rdataset("Guerry", "HistData") #1745

edschofield opened this Issue Jun 6, 2014 · 3 comments




The following line from the formulas example notebook (from Git master):

    dta = sm.datasets.get_rdataset("Guerry", "HistData", cache=True)

causes this exception to be raised on Py3.3 (with statsmodels v0.5.0):

    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-6-0b450e8cdfce> in <module>()
    ----> 1 dta = sm.datasets.get_rdataset("Guerry", "HistData", cache=True)

    /home/user/anaconda/lib/python3.3/site-packages/statsmodels/datasets/utils.py in get_rdataset(dataname, package, cache)
        252     title = _get_dataset_meta(dataname, package, cache)
    --> 253     doc, _ = _get_data(docs_base_url, dataname, cache, "rst")
        255     return Dataset(data=data, __doc__=doc.read(), package=package, title=title,

    /home/user/anaconda/lib/python3.3/site-packages/statsmodels/datasets/utils.py in _get_data(base_url, dataname, cache, extension)
        185     #Python 3, don't think there will be any unicode in r datasets
        186     if sys.version[0] == '3':  # pragma: no cover
    --> 187         data = data.decode('ascii', errors='strict')
        188     return StringIO(data), from_cache

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1288: ordinal not in range(128)

Notice the comment on line 185 of utils.py. It looks like that assumption is wrong.
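
For illustration, here is a minimal sketch of the failure mode outside statsmodels. It assumes the downloaded documentation is UTF-8 encoded, which the issue does not confirm (byte 0xc3 is only consistent with that, since it commonly starts a two-byte UTF-8 sequence). The helper `decode_dataset_text` and the sample text are hypothetical, and this is not necessarily how the problem was fixed in master.

```python
from io import StringIO


def decode_dataset_text(data: bytes, encoding: str = "utf-8") -> StringIO:
    """Hypothetical helper: decode downloaded bytes and wrap them for reading.

    The UTF-8 default is an assumption, not something the issue confirms.
    """
    return StringIO(data.decode(encoding, errors="strict"))


# Made-up sample text, not the actual Guerry documentation.
raw = "département".encode("utf-8")  # the second byte is 0xc3

# Strict ASCII decoding, as on line 187 of utils.py, rejects the non-ASCII byte.
try:
    decode_dataset_text(raw, encoding="ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print(exc)

# Decoding with the assumed encoding succeeds.
print(decode_dataset_text(raw).read())
```

If the real encoding were something else (Latin-1, say), the same bytes would decode to different text, so the encoding should be verified rather than guessed.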


See #1055.
This looks fixed in master.


Thanks, @josef-pkt. Shall we close this issue then?



@josef-pkt josef-pkt closed this Jun 6, 2014