Pandas Merge Error #3

Closed
Chamartin3 opened this issue Aug 21, 2018 · 1 comment

@Chamartin3

Hello, I have been learning from your videos for a few days, and all the exercises are really cool. I am now working through the data merge exercises, but when I run the notebook I get an error that I don't know how to solve.

This is the script I am trying to run.
import pandas as pd
movie_url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.item'
movie_cols = ['movie_id', 'title']
movies = pd.read_table(movie_url, sep='|', header=None, names=movie_cols, usecols=[0, 1])
movies.head()

This is the output.

UnicodeDecodeError Traceback (most recent call last)
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError Traceback (most recent call last)
in ()
1 movie_url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.item'
2 movie_cols = ['movie_id', 'title']
----> 3 movies = pd.read_table(movie_url, sep='|', header=None, names=movie_cols, usecols=[0, 1])
4 movies.head()

c:\users\omar\appdata\local\programs\python\python37-32\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
676 skip_blank_lines=skip_blank_lines)
677
--> 678 return _read(filepath_or_buffer, kwds)
679
680 parser_f.__name__ = name

c:\users\omar\appdata\local\programs\python\python37-32\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
444
445 try:
--> 446 data = parser.read(nrows)
447 finally:
448 parser.close()

c:\users\omar\appdata\local\programs\python\python37-32\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
1034 raise ValueError('skipfooter not supported for iteration')
1035
-> 1036 ret = self._engine.read(nrows)
1037
1038 # May alter columns / col_dict

c:\users\omar\appdata\local\programs\python\python37-32\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
1846 def read(self, nrows=None):
1847 try:
-> 1848 data = self._reader.read(nrows)
1849 except StopIteration:
1850 if self._first_chunk:

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

@justmarkham
Owner

Maybe try specifying an encoding for read_table?
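
A minimal sketch of that suggestion, assuming the file is Latin-1 (ISO-8859-1) encoded. The thread does not confirm the encoding; the guess comes from the byte 0xe9 in the error, which is "é" in Latin-1 but is not a valid UTF-8 sequence where it appears. The sample rows below are made up for illustration, so the snippet runs without downloading anything.

```python
import io

import pandas as pd

# Two made-up rows in the same pipe-separated layout as u.item, stored as
# Latin-1 bytes. The byte 0xe9 is the one reported in the traceback.
sample = b"1|Toy Story (1995)\n2|Mis\xe9rables, Les (1995)\n"

movie_cols = ['movie_id', 'title']

# Same call as in the report, plus an explicit encoding (assumed: latin-1).
movies = pd.read_table(io.BytesIO(sample), sep='|', header=None,
                       names=movie_cols, usecols=[0, 1], encoding='latin-1')
print(movies)

# For the real file, the same argument would be added to the original call:
# movies = pd.read_table(movie_url, sep='|', header=None, names=movie_cols,
#                        usecols=[0, 1], encoding='latin-1')
```

If the titles come back with garbled accented characters, the assumed encoding is wrong and a different one should be tried.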
