UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence #9

Closed
df19900725 opened this issue May 3, 2017 · 4 comments

Comments

df19900725 commented May 3, 2017

from geotext import GeoText
places = GeoText("London is a great city")
places.cities
GeoText('New York, Texas, and also China').country_mentions

My system is Windows 10. Running the code fragment above throws an error:

"D:\Program Files\Python3\python.exe" D:/OneDrive/Programs/Jieba/ExtractLocation.py
Traceback (most recent call last):
  File "D:/OneDrive/Programs/Jieba/ExtractLocation.py", line 20, in <module>
    from geotext import GeoText
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\__init__.py", line 7, in <module>
    from .geotext import GeoText
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 87, in <module>
    class GeoText(object):
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 103, in GeoText
    index = build_index()
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 74, in build_index
    get_data_path('countryInfo.txt'), usecols=[4, 0], skip=1)
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 48, in read_table
    next(f)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

@DevinCharles

I believe this is the same issue I'm seeing. read_table in geotext.py takes an encoding argument, but it is never passed to the open() call, so the file is decoded with the system default codec (gbk in the traceback above).

At Line 45:

with open(filename, 'r') as f:
    # skip initial lines
    for _ in range(skip):
        next(f)

Should be:

with open(filename, 'rt', encoding=encoding) as f:
    # skip initial lines
    for _ in range(skip):
        next(f)
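
For anyone landing here later, a minimal sketch of why passing the encoding helps. It is not geotext's actual code: `read_lines` is a hypothetical stand-in for `read_table`, and the sample file contents are made up. The leading UTF-8 byte order mark (bytes EF BB BF) is also an assumption; it is simply consistent with the "byte 0xbf in position 2" part of the error message.

```python
import os
import tempfile


def read_lines(filename, encoding="utf-8", skip=0):
    """Hypothetical stand-in for read_table: return the lines after `skip` header lines."""
    # Passing encoding explicitly means the platform default (e.g. gbk) is never used.
    with open(filename, "rt", encoding=encoding) as f:
        for _ in range(skip):
            next(f)
        return [line.rstrip("\n") for line in f]


# Made-up sample data: a comment header, then tab-separated rows with non-ASCII text.
sample = "# header\nDE\tGermany\nCN\t\u4e2d\u56fd\n"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    # Assumption: the file starts with a UTF-8 BOM, followed by UTF-8 text.
    tmp.write(b"\xef\xbb\xbf" + sample.encode("utf-8"))
    path = tmp.name

try:
    # Simulate what happens when the default codec is gbk.
    try:
        with open(path, "rt", encoding="gbk") as f:
            next(f)
        gbk_failed = False
    except UnicodeDecodeError:
        gbk_failed = True

    # With the encoding passed through, the header is skipped and the rows decode.
    rows = read_lines(path, encoding="utf-8", skip=1)
finally:
    os.remove(path)
```

Note that plain "utf-8" leaves a BOM, if present, at the start of the first line as "\ufeff". Here that line is skipped anyway; when the first line matters, "utf-8-sig" strips the mark.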

@astrocrazy

Worked like a charm, thank you so much.

@DevinCharles

Unfortunately this doesn't solve all issues related to this, but it works in a pinch...

elyase pushed a commit that referenced this issue Jun 13, 2018
@elyase
Owner

elyase commented Jun 13, 2018

Should be fixed in master:

pip install https://github.com/elyase/geotext/archive/master.zip

Thanks for reporting, and thanks to @DevinCharles for the help.
