python3 test failures #2331

Closed
juliantaylor opened this issue Nov 22, 2012 · 22 comments
Labels
Bug Testing pandas testing functions or related to the test suite
Milestone

Comments

@juliantaylor

on ubuntu 13.04 several tests fail with python 3.3 while they succeed with python2
13.04 has python 3.3 and numpy 1.7
current pandas head 61766ec

======================================================================
FAIL: test_quoting (pandas.io.tests.test_parsers.TestParsers)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/io/tests/test_parsers.py", line 528, in test_quoting
    sep='\t')
AssertionError: Exception not raised by read_table

======================================================================
FAIL: test_cant_compare_tz_naive_w_aware (pandas.tseries.tests.test_timeseries.TestTimestamp)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/tseries/tests/test_timeseries.py", line 2349, in test_cant_compare_tz_naive_w_aware
    self.assertRaises(Exception, a.__eq__, b.to_pydatetime())
AssertionError: Exception not raised by __eq__

----------------------------------------------------------------------

======================================================================
FAIL: test_more_flexible_frame_multi_function (__main__.TestGroupBy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/tests/test_groupby.py", line 1909, in test_more_flexible_frame_multi_function
    assert_frame_equal(result, expected)
  File "/home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/util/testing.py", line 167, in assert_frame_equal
    assert(left.columns.equals(right.columns))
AssertionError

----------------------------------------------------------------------

======================================================================
ERROR: test_yahoo (pandas.io.tests.test_yahoo.TestYahoo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/tests/test_yahoo.py", line 25, in test_yahoo
    pd.DataReader("F", 'yahoo', start, end)['Close'][-1],
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/data.py", line 58, in DataReader
    retry_count=retry_count, pause=pause)
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/data.py", line 149, in get_data_yahoo
    parse_dates=True)[::-1]
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/parsers.py", line 364, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/parsers.py", line 195, in _read
    return parser.read()
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/parsers.py", line 592, in read
    ret = self._engine.read(nrows)
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/parsers.py", line 844, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 597, in pandas._parser.TextReader.read (pandas/src/parser.c:5342)
  File "parser.pyx", line 619, in pandas._parser.TextReader._read_low_memory (pandas/src/parser.c:5562)
  File "parser.pyx", line 668, in pandas._parser.TextReader._read_rows (pandas/src/parser.c:6143)
  File "parser.pyx", line 655, in pandas._parser.TextReader._tokenize_rows (pandas/src/parser.c:6027)
  File "parser.pyx", line 1385, in pandas._parser.raise_parser_error (pandas/src/parser.c:14807)
pandas._parser.CParserError: Error tokenizing data. C error: Expected 7 fields in line 106, saw 3



test_more_flexible_frame_multi_function (__main__.TestGroupBy) ... > /home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/util/testing.py(167)assert_frame_equal()
-> assert(left.columns.equals(right.columns))
(Pdb) p left.columns

[(D, mean), (D, std), (C, mean), (C, std)]
(Pdb) p right.columns
MultiIndex
[(C, mean), (C, std), (D, mean), (D, std)]

this one is random, probably related to hash randomization

the yahoo one might be due to missing internet connection?

@juliantaylor
Author

another one with matplotlib installed

======================================================================
ERROR: test_bootstrap_plot (pandas.tests.test_graphics.TestSeriesPlots)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/tests/test_graphics.py", line 158, in test_bootstrap_plot
    _check_plot_works(bootstrap_plot, self.ts, size=10)
  File "/usr/lib/python3/dist-packages/pandas/tests/test_graphics.py", line 625, in _check_plot_works
    ret = f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 376, in bootstrap_plot
    samplings = [random.sample(data, size) for _ in range(samples)]
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 376, in <listcomp>
    samplings = [random.sample(data, size) for _ in range(samples)]
  File "/usr/lib/python3.3/random.py", line 298, in sample
    raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
TypeError: Population must be a sequence or set.  For dicts, use list(d).

seems like random.sample does not take np arrays, it's not an abc.Sequence (which might be a bug in numpy?)
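
A possible workaround is to hand random.sample a plain list. This is only a minimal sketch, assuming the data is any iterable; `sample_values` is a hypothetical helper name, not something that exists in pandas, and it is not the change that was actually made to plotting.py.

```python
import random


def sample_values(data, size, rng=random):
    """Draw `size` items without replacement from any iterable.

    random.sample only accepts real sequences, so the data is copied
    into a list first. (Hypothetical helper, for illustration only.)
    """
    return rng.sample(list(data), size)


# A dict values view is iterable but not a Sequence, much like the
# array case above; converting to a list sidesteps the TypeError.
prices = {"a": 1.0, "b": 2.5, "c": 4.0, "d": 8.0}
picked = sample_values(prices.values(), size=2)
print(picked)
```

The copy costs memory proportional to the data, which seems acceptable for a bootstrap plot but is a trade-off worth knowing about.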

@ghost

ghost commented Nov 25, 2012

I can reproduce intermittent failures on py3.3 with numpy '1.7.0rc1.dev-3a52aa0' (git master).
I cannot reproduce failures on py3.2 with numpy 1.7.
I cannot reproduce failures on py2.7 with numpy 1.7.

test_dict sometimes also fails.

@juliantaylor
Author

the random failures can only be seen with python 3.3 and python 2.7.3, hash randomization does not exist earlier
with python 2.7.3 you might have to enable it explicitly

@ghost

ghost commented Nov 25, 2012

verified. can reproduce on 2.7.3 with hash randomization enabled.
here's the announcement.
The change mitigates an algorithmic complexity attack.

@juliantaylor
Author

actually no, the python "fix" is completely ineffective (see issue14621)
nevertheless, relying on the ordering of dicts was never supported and should thus be fixed

@ghost

ghost commented Nov 25, 2012

afaict, the seed is only randomized between sessions, traversal order within a given process is
deterministic.
Of course relying on order should be fixed, but I'm not sure why the problem is surfacing,
I don't see any state being saved across runs.
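
The per-session seeding described above can be observed directly. A rough sketch, assuming the interpreter honours the PYTHONHASHSEED environment variable (it exists in 2.7.3 and 3.3); `child_hash` is a hypothetical helper written just for this illustration:

```python
import os
import subprocess
import sys


def child_hash(seed, text="foo"):
    """Return hash(text) as computed by a fresh interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % text],
        env=env,
    )
    return int(out.decode().strip())


# Same seed, separate processes: the string hash is reproducible.
same_a = child_hash(1)
same_b = child_hash(1)

# Different seed: the hash may change, and with it set traversal order
# (and dict order on the Python versions discussed in this thread).
other = child_hash(2)

print(same_a == same_b, same_a == other)
```

Pinning the seed this way may help when trying to reproduce one particular ordering of a flaky test.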

@ghost

ghost commented Nov 25, 2012

dear lord, the order of keys in this form of construction is not well-determined:

OrderedDict(foo=0,bar=1)
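
A sketch of the safer spelling. On the Python versions in this thread, keyword arguments are collected into a regular dict before OrderedDict ever sees them, so the order is already lost by then; a list of pairs is consumed in order. The names `foo` and `bar` are just the placeholders from the snippet above.

```python
from collections import OrderedDict

# Keyword arguments pass through a plain dict first, so on the Python
# versions discussed here the order below was not guaranteed:
#     OrderedDict(foo=0, bar=1)

# A sequence of (key, value) pairs is consumed in order instead.
spec = OrderedDict([("foo", 0), ("bar", 1)])
print(list(spec.keys()))
```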

@wesm
Member

wesm commented Dec 1, 2012

Python 3.3 with NumPy git master passing cleanly for me now. Could you give it a shot with your environment and let me know so I can close this issue?

@juliantaylor
Author

I still get this failure (f4b5cbb):

======================================================================
FAIL: test_quoting (pandas.io.tests.test_parsers.TestPythonParser)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pandas/build/lib.linux-x86_64-3.3/pandas/io/tests/test_parsers.py", line 462, in test_quoting
    sep='\t')
AssertionError: Exception not raised by read_table

----------------------------------------------------------------------

@juliantaylor
Author

that one only fails with LC_ALL=C
it works with C.UTF-8

edit: no, it fails with utf-8 too
it also fails in python2 for me now too

@wesm
Member

wesm commented Dec 2, 2012

I tried doing LC_ALL=C; nosetests pandas/io/tests/test_parsers.py and it worked-- I'm on 64-bit Ubuntu 11.04, maybe that's not the right way?
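
One thing worth checking: with a semicolon, `LC_ALL=C; nosetests ...` only sets a shell variable, and the test process sees it only if LC_ALL was already exported. A sketch of forcing the locale for a child process from Python follows; the child command here just echoes the variable and is a stand-in for the real nosetests call, not a claim about how the tests were run.

```python
import os
import subprocess
import sys

# Copy the current environment and override the locale for the child only.
env = dict(os.environ, LC_ALL="C")

# Stand-in for the test run: the child reports the LC_ALL value it sees.
seen = subprocess.check_output(
    [sys.executable, "-c", "import os; print(os.environ.get('LC_ALL'))"],
    env=env,
).decode().strip()
print(seen)
```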

@juliantaylor
Author

I can only reproduce it in ubuntu 13.04, not 12.10. both use python 2.7.3
as far as I know 13.04 uses a snapshot of python's stable 2.7.3 branch, maybe some change there causes the issue

@wesm
Member

wesm commented Dec 2, 2012

Bummer. I'll close this issue until it becomes a problem elsewhere-- the test case that is failing is very esoteric, I wouldn't expect you to run into it in the course of normal use. If you get motivated and want to pick through the C file tokenizer and figure out why it isn't falling into the error state, be my guest.

@wesm wesm closed this as completed Dec 2, 2012
@ghost

ghost commented Dec 2, 2012

@wesm, i get the same failure consistently on 64 bit debian testing with 3.3.

@juliantaylor
Author

I confirmed it works with ubuntu's 2.7.3-5ubuntu4 but not 2.7.3-5ubuntu5
the change log is unfortunately long:

@wesm
Member

wesm commented Dec 2, 2012

OK, I'll reopen.

@wesm wesm reopened this Dec 2, 2012
@wesm
Member

wesm commented Dec 2, 2012

@y-p if you have the stomach for it, you'll want to look at tokenize_delimited in pandas/src/parser/tokenizer.c. The error should be originating (I think?) because the state of the tokenizer is IN_QUOTED_FIELD when it reaches the end of the file, the result of a malformed line in the file. I bet if you change the string literal to a bytes literal it will fix the test. Otherwise have to dig

@ghost

ghost commented Dec 2, 2012

I will if I have to, but it looks like the failure is in the python rather than the c tokenizer:
I put print( locals()) at the start of _make_parser_function and got:

======================================================================
FAIL: test_quoting (pandas.io.tests.test_parsers.TestPythonParser)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user1/src/pandas/.tox/py33/lib/python3.3/site-packages/pandas-0.10.0.dev_3950684-py3.3-linux-x86_64.egg/pandas/io/tests/test_parsers.py", line 462, in test_quoting
    sep='\t')
AssertionError: AssertionError: Exception not raised by read_table
-------------------- >> begin captured stdout << ---------------------
{'index_col': None, 'chunksize': None, 'true_values': None, 'skiprows': None, 'use_unsigned': False, 'na_values': None, 'keep_default_na': True, 'names': None, 'date_parser': None, 'buffer_lines': None, 'converters': None, 'as_recarray': False, 'parse_dates': False, 'quoting': 0, 'squeeze': False, 'memory_map': False, 'na_filter': True, 'skip_footer': 0, 'filepath_or_buffer': <_io.StringIO object at 0x7f52b6cc93e0>, 'skipinitialspace': False, 'low_memory': True, 'skipfooter': None, 'usecols': None, 'header': 0, 'escapechar': None, 'false_values': None, 'comment': None, 'compact_ints': False, 'delim_whitespace': False, 'thousands': None, 'dayfirst': False, 'verbose': False, 'nrows': None, 'encoding': None, 'quotechar': '"', 'dialect': None, 'warn_bad_lines': True, 'delimiter': None, 'compression': None, 'engine': 'python', 'error_bad_lines': True, 'keep_date_col': False, 'sep': '\t', 'iterator': False, 'doublequote': True, 'dtype': None}

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
Ran 1 test in 0.002s

note the 'engine': 'python'.
and Issue #16013: Fix CSV Reader parsing issue with ending quote characters. looks suspicious.
is that enough to hit your fast-path for bug fixing?

edit:: wrong method but the conclusion is true, it's the python engine and not the c engine that the error comes from.

@ghost

ghost commented Dec 2, 2012

@juliantaylor , can you confirm the fix in bc6ad16?

@ghost ghost self-assigned this Dec 2, 2012
@wesm
Member

wesm commented Dec 2, 2012

Oh interesting. That makes sense that they altered the CSV module in Python 3.3

@juliantaylor
Author

the fix works, thanks

ghost pushed a commit that referenced this issue Dec 2, 2012
@ghost

ghost commented Dec 2, 2012

closed via bc6ad16 .
Test suite now passes on python 3.3

@ghost ghost closed this as completed Dec 2, 2012
@wesm wesm unassigned ghost Oct 12, 2016