python3 test failures #2331

Closed
juliantaylor opened this issue Nov 22, 2012 · 22 comments

Comments

@juliantaylor

commented Nov 22, 2012

on ubuntu 13.04 several tests fail with python 3.3 while they succeed with python2
13.04 has python 3.3 and numpy 1.7
current pandas head 61766ec

======================================================================
FAIL: test_quoting (pandas.io.tests.test_parsers.TestParsers)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/io/tests/test_parsers.py", line 528, in test_quoting
    sep='\t')
AssertionError: Exception not raised by read_table

======================================================================
FAIL: test_cant_compare_tz_naive_w_aware (pandas.tseries.tests.test_timeseries.TestTimestamp)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/tseries/tests/test_timeseries.py", line 2349, in test_cant_compare_tz_naive_w_aware
    self.assertRaises(Exception, a.__eq__, b.to_pydatetime())
AssertionError: Exception not raised by __eq__

----------------------------------------------------------------------

======================================================================
FAIL: test_more_flexible_frame_multi_function (__main__.TestGroupBy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/tests/test_groupby.py", line 1909, in test_more_flexible_frame_multi_function
    assert_frame_equal(result, expected)
  File "/home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/util/testing.py", line 167, in assert_frame_equal
    assert(left.columns.equals(right.columns))
AssertionError

----------------------------------------------------------------------

======================================================================
ERROR: test_yahoo (pandas.io.tests.test_yahoo.TestYahoo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/tests/test_yahoo.py", line 25, in test_yahoo
    pd.DataReader("F", 'yahoo', start, end)['Close'][-1],
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/data.py", line 58, in DataReader
    retry_count=retry_count, pause=pause)
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/data.py", line 149, in get_data_yahoo
    parse_dates=True)[::-1]
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/parsers.py", line 364, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/parsers.py", line 195, in _read
    return parser.read()
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/parsers.py", line 592, in read
    ret = self._engine.read(nrows)
  File "/tmp/pandas/build/lib.linux-x86_64-3.3/pandas/io/parsers.py", line 844, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 597, in pandas._parser.TextReader.read (pandas/src/parser.c:5342)
  File "parser.pyx", line 619, in pandas._parser.TextReader._read_low_memory (pandas/src/parser.c:5562)
  File "parser.pyx", line 668, in pandas._parser.TextReader._read_rows (pandas/src/parser.c:6143)
  File "parser.pyx", line 655, in pandas._parser.TextReader._tokenize_rows (pandas/src/parser.c:6027)
  File "parser.pyx", line 1385, in pandas._parser.raise_parser_error (pandas/src/parser.c:14807)
pandas._parser.CParserError: Error tokenizing data. C error: Expected 7 fields in line 106, saw 3



test_more_flexible_frame_multi_function (__main__.TestGroupBy) ... > /home/jtaylor/tmp/pandas-0.9.1/build/lib.linux-x86_64-3.3/pandas/util/testing.py(167)assert_frame_equal()
-> assert(left.columns.equals(right.columns))
(Pdb) p left.columns

[(D, mean), (D, std), (C, mean), (C, std)]
(Pdb) p right.columns
MultiIndex
[(C, mean), (C, std), (D, mean), (D, std)]

this one is random, probably related to hash randomization

the yahoo one might be due to missing internet connection?
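A rough way to check the hash randomization theory, assuming the column labels come from iterating a hash-based container of strings: run the same snippet in child interpreters with different PYTHONHASHSEED values and compare the iteration order. The sketch uses a set because dicts keep insertion order on Python 3.7 and later; the labels are taken from the pdb output above, and the helper name is made up for illustration.

```python
import os
import subprocess
import sys

# Snippet executed in each child interpreter.
SNIPPET = "print(list({'C', 'D', 'mean', 'std'}))"


def set_order(seed):
    """Run a child interpreter with a fixed PYTHONHASHSEED and return its output."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# The same seed always gives the same order; different seeds may not.
for seed in (1, 2, 3, 4, 5):
    print(seed, set_order(seed))
```

If the order changes between seeds, any test that compares against a fixed column order will pass or fail depending on the seed.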

@juliantaylor

Author

commented Nov 22, 2012

another one with matplotlib installed

======================================================================
ERROR: test_bootstrap_plot (pandas.tests.test_graphics.TestSeriesPlots)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/tests/test_graphics.py", line 158, in test_bootstrap_plot
    _check_plot_works(bootstrap_plot, self.ts, size=10)
  File "/usr/lib/python3/dist-packages/pandas/tests/test_graphics.py", line 625, in _check_plot_works
    ret = f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 376, in bootstrap_plot
    samplings = [random.sample(data, size) for _ in range(samples)]
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 376, in <listcomp>
    samplings = [random.sample(data, size) for _ in range(samples)]
  File "/usr/lib/python3.3/random.py", line 298, in sample
    raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
TypeError: Population must be a sequence or set.  For dicts, use list(d).

seems like random.sample does not take np arrays; an ndarray is not an abc.Sequence (which might be a bug in numpy?)
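A minimal sketch of the failure and one possible workaround, using only the standard library. `ArrayLike` is a hypothetical stand-in for an ndarray (sized and indexable, but not registered as a `collections.abc.Sequence`); this is not necessarily how pandas ended up fixing `bootstrap_plot`.

```python
import random


class ArrayLike:
    """Hypothetical stand-in for an ndarray: has a length and supports
    indexing, but is not registered as collections.abc.Sequence."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]


data = ArrayLike(range(100))

# random.sample checks the population type and rejects this object.
try:
    random.sample(data, 10)
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)

# One possible workaround: hand random.sample a real list.
sampling = random.sample(list(data), 10)
print(len(sampling))
```

Converting to a list copies the data, which should be acceptable for the small populations used in a plot test.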

@ghost

commented Nov 25, 2012

I can reproduce intermittent failures on py3.3 with numpy 1.7.0rc1.dev-3a52aa0 (git master).
I cannot reproduce failures on py3.2 with numpy 1.7.
I cannot reproduce failures on py2.7 with numpy 1.7.

test_dict sometimes also fails.

@juliantaylor

Author

commented Nov 25, 2012

the random failures can only be seen with python 3.3 and python 2.7.3, hash randomization does not exist in earlier versions
with python 2.7.3 you might have to enable it explicitly
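To see whether a given interpreter is actually running with hash randomization, `sys.flags` can be queried. A small sketch; the helper name is made up. On 2.7.3 randomization is off unless the interpreter is started with `-R` or with `PYTHONHASHSEED` set, while 3.3 turns it on by default.

```python
import sys


def hash_randomization_enabled():
    """Return True if this interpreter randomizes str/bytes hashes."""
    # The flag exists on 2.7.3+ and 3.2.3+; fall back to 0 on older versions.
    return bool(getattr(sys.flags, "hash_randomization", 0))


print(hash_randomization_enabled())
```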

@ghost

commented Nov 25, 2012

verified. can reproduce on 2.7.3 with hash randomization enabled.
here's the announcement.
The change mitigates an algorithmic complexity attack.

@juliantaylor

Author

commented Nov 25, 2012

actually no, the python "fix" is completely ineffective (see issue14621)
nevertheless relying on ordering of dicts was never supported and should thus be fixed

@ghost

commented Nov 25, 2012

afaict, the seed is only randomized between sessions, traversal order within a given process is
deterministic.
Of course relying on order should be fixed, but I'm not sure why the problem is surfacing,
I don't see any state being saved across runs.

@ghost

commented Nov 25, 2012

dear lord, the order of keys in this form of construction is not well-determined:

OrderedDict(foo=0,bar=1)
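The keyword arguments go through a plain dict before OrderedDict ever sees them, so on interpreters older than Python 3.6 their order is not guaranteed. A sketch of the order-preserving spelling, passing a sequence of pairs instead (the variable name is illustrative):

```python
from collections import OrderedDict

# A list of (key, value) pairs keeps its order on every Python version,
# unlike keyword arguments on interpreters older than 3.6.
spec = OrderedDict([("foo", 0), ("bar", 1)])
print(list(spec))
```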

@wesm

Member

commented Dec 1, 2012

Python 3.3 with NumPy git master passing cleanly for me now. Could you give it a shot with your environment and let me know so I can close this issue?

@juliantaylor

Author

commented Dec 1, 2012

I still get this failure (f4b5cbb):

======================================================================
FAIL: test_quoting (pandas.io.tests.test_parsers.TestPythonParser)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/pandas/build/lib.linux-x86_64-3.3/pandas/io/tests/test_parsers.py", line 462, in test_quoting
    sep='\t')
AssertionError: Exception not raised by read_table

----------------------------------------------------------------------
@juliantaylor

Author

commented Dec 1, 2012

that one only fails with LC_ALL=C
it works with C.UTF-8

edit: no, it fails with utf-8 too
it also fails in python2 for me now too

@wesm

Member

commented Dec 2, 2012

I tried doing LC_ALL=C; nosetests pandas/io/tests/test_parsers.py and it worked-- I'm on 64-bit Ubuntu 11.04, maybe that's not the right way?
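On the "right way" question: in a POSIX shell, `LC_ALL=C; cmd` only assigns a shell variable, and the child process sees it only if LC_ALL was already exported, whereas `LC_ALL=C cmd` (no semicolon) sets it for that one command. A sketch of the same idea from Python, where the environment is passed explicitly; the child here just prints what it received, and swapping in the test runner command is left as an assumption.

```python
import os
import subprocess
import sys


def child_locale(lc_all):
    """Start a child interpreter with LC_ALL set and report what it sees."""
    env = dict(os.environ, LC_ALL=lc_all)
    code = "import os; print(os.environ.get('LC_ALL'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


print(child_locale("C"))
```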

@juliantaylor

Author

commented Dec 2, 2012

I can only reproduce it in ubuntu 13.04, not 12.10. both use python 2.7.3
as far as I know 13.04 uses a snapshot of python's stable 2.7.3 branch, maybe some change there causes the issue

@wesm

Member

commented Dec 2, 2012

Bummer. I'll close this issue until it becomes a problem elsewhere-- the test case that is failing is very esoteric, I wouldn't expect you to run into it in the course of normal use. If you get motivated and want to pick through the C file tokenizer and figure out why it isn't falling into the error state, be my guest.

@wesm wesm closed this Dec 2, 2012

@ghost

commented Dec 2, 2012

@wesm, i get the same failure consistently on 64 bit debian testing with 3.3.

@juliantaylor

Author

commented Dec 2, 2012

I confirmed it works with ubuntu's 2.7.3-5ubuntu4 but not 2.7.3-5ubuntu5
the change log is unfortunately long:

  • Update to 20121128, taken from the 2.7 branch.
    • Issue #9011: Fix hacky AST code that modified the CST when compiling
      a negated numeric literal.
    • Issue #16306: Fix multiple error messages when unknown command line
      parameters where passed to the interpreter.
    • Issue #15379: Fix passing of non-BMP characters as integers for the
      charmap decoder (already working as unicode strings).
    • Issue #16453: Fix equality testing of dead weakref objects.
    • Issue #9535: Fix pending signals that have been received but not yet
      handled by Python to not persist after os.fork() in the child process.
    • Issue #15001: fix segfault on "del sys.modules['__main__']".
    • Issue #5057: the peepholer no longer optimizes subscription on unicode
      literals (e.g. u'foo'[0]) in order to produce compatible pyc files
      between narrow and wide builds.
    • Issue #8401: assigning an int to a bytearray slice (e.g. b[3:4] = 5)
      now raises an error.
    • Issue #14700: Fix buggy overflow checks for large width and precision
      in string formatting operations.
    • Issue #16345: Fix an infinite loop when fromkeys on a dict subclass
      received a nonempty dict from the constructor.
    • Issue #6074: Ensure cached bytecode files can always be updated by the
      user that created them, even when the source file is read-only.
    • Issue #14783: Improve int() and long() docstrings and switch docstrings
      for unicode(), slice(), range(), and xrange() to use multi-line
      signatures.
    • Issue #16030: Fix overflow bug in computing the repr of an xrange
      object with large start, step or length.
    • Issue #16029: Fix overflow bug occurring when pickling xranges with large
      start, step or length.
    • Issue #16037: Limit httplib's _read_status() function to work around
      broken HTTP servers and reduce memory usage (backport of a 3.2 fix).
    • Issue #13992: The trashcan mechanism is now thread-safe. This eliminates
      sporadic crashes in multi-thread programs when several long deallocator
      chains ran concurrently and involved subclasses of built-in container
      types.
    • Issue #15801: Make sure mappings passed to '%' formatting are actually
      subscriptable.
    • Issue #15604: Update uses of PyObject_IsTrue() to check for and handle
      errors correctly.
    • Issue #15897: zipimport.c doesn't check return value of fseek().
    • Issue #16369: Global PyTypeObjects not initialized with PyType_Ready(...).
    • Issue #15033: Fix the exit status bug when modules invoked using
      -m switch, return the proper failure return value (1).
    • Issue #1160: Fix compiling large regular expressions on UCS2 builds.
    • Issue #14313: zipfile now raises NotImplementedError when the compression
      type is unknown.
    • Issue #16408: Fix file descriptors not being closed in error conditions
      in the zipfile module.
    • Issue #16327: The subprocess module no longer leaks file descriptors
      used for stdin/stdout/stderr pipes to the child when fork() fails.
    • Issue #14396: Handle the odd rare case of waitpid returning 0 when not
      expected in subprocess.Popen.wait().
    • Issue #16411: Fix a bug where zlib.decompressobj().flush() might try
      to access previously-freed memory.
    • Issue #16350: zlib.decompressobj().decompress() now accumulates data from
      successive calls after EOF in unused_data, instead of only saving the
      argument to the last call. decompressobj().flush() now correctly sets
      unused_data and unconsumed_tail. A bug in the handling of MemoryError
      when setting the unconsumed_tail attribute has also been fixed.
    • Issue #12759: sre_parse now raises a proper error when the name
      of the group is missing.
    • Issue #16152: fix tokenize to ignore whitespace at the end of the code
      when no newline is found.
    • Issue #1207589: Add Cut/Copy/Paste items to IDLE right click
      Context Menu.
    • Issue #16230: Fix a crash in select.select() when one the lists changes
      size while iterated on.
    • Issue #16228: Fix a crash in the json module where a list changes size
      while it is being encoded.
    • Issue #14897: Enhance error messages of struct.pack and struct.pack_into.
    • Issue #12890: cgitb no longer prints spurious <p> tags in text
      mode when the logdir option is specified.
    • Issue #14398: Fix size truncation and overflow bugs in the bz2 module.
    • Issue #5148: Ignore 'U' in mode given to gzip.open() and gzip.GzipFile().
    • Issue #16220: wsgiref now always calls close() on an iterable response.
    • Issue #16461: Wave library should be able to deal with 4GB wav files,
      and sample rate of 44100 Hz.
    • Issue #16176: Properly identify Windows 8 via platform.platform().
    • Issue #15756: subprocess.poll() now properly handles errno.ECHILD to
      return a returncode of 0 when the child has already exited or cannot
      be waited on.
    • Issue #12376: Pass on parameters in TextTestResult.__init__ super call.
    • Issue #15222: Insert blank line after each message in mbox mailboxes.
    • Issue #16013: Fix CSV Reader parsing issue with ending quote characters.
    • Issue #15421: fix an OverflowError in Calendar.itermonthdates() after
      datetime.MAXYEAR.
    • Issue #15970: xml.etree.ElementTree now serializes correctly the
      empty HTML elements 'meta' and 'param'.
    • Issue #15676: Now "mmap" check for empty files before doing the
      offset check.
    • Issue #15340: Fix importing the random module when /dev/urandom cannot
      be opened. This was a regression caused by the hash randomization patch.
    • Issue #15841: The readable(), writable() and seekable() methods of
      io.BytesIO and io.StringIO objects now raise ValueError when the object
      has been closed.
    • Issue #16112: platform.architecture does not correctly escape argument to
      /usr/bin/file.
    • Issue #12776,#11839: call argparse type function (specified by
      add_argument) only once. Before, the type function was called twice in
      the case where the default was specified and the argument was given as
      well. This was especially problematic for the FileType type, as a
      default file would always be opened, even if a file argument was
      specified on the command line.
    • Issue #15906: Fix a regression in argparse caused by the preceding change,
      when action='append', type='str' and default=[].
    • Issue #13370: Ensure that ctypes works on Mac OS X when Python is
      compiled using the clang compiler
    • Issue #15544: Fix Decimal.__float__ to work with payload-carrying NaNs.
    • Issue #15199: Fix JavaScript's default MIME type to
      application/javascript.
    • Issue #15477: In cmath and math modules, add workaround for platforms
      whose system-supplied log1p function doesn't respect signs of zeros.
    • Issue #15908: Fix misbehaviour of the sha1 module when called on data
      larger than 2**32 bytes.
    • Issue #15910: Fix misbehaviour of _md5 and sha1 modules when "updating"
      on data larger than 2**32 bytes.
    • Fix the leak of a dict in the time module when used in an embedded
      interpreter that is repeatedly initialized and shutdown and reinitialized.
    • Issue #12268: File readline, readlines and read or readall methods
      no longer lose data when an underlying read system call is interrupted
      within an io module object. IOError is no longer raised due to a read
      system call returning EINTR from within these methods.
    • Issue #16012: Fix a regression in pyexpat. The parser's UseForeignDTD()
      method doesn't require an argument again.
    • Issue #16559: Add more tests for the json module, including some from the
      official test suite at json.org.
    • Issue #16274: Fix test_asyncore on Solaris.
    • Issue #15040: Close files in mailbox tests for PyPy compatibility.
    • Issue #15802: Fix test logic in TestMaildir.test_create_tmp.
    • Issue #15765: Extend a previous fix to Solaris and OpenBSD for quirky
      getcwd() behaviour (issue #9185) to NetBSD as well.
    • Issue #15615: Add some tests for the json module's handling of invalid
      input data.
    • Issue #15923: fix a mistake in asdl_c.py that resulted in a TypeError
      after 2801bf875a24 (see #15801).
    • Issue #11715: Fix multiarch detection without having Debian development
      tools (dpkg-dev) installed.
    • Issue #15819: Make sure we can build Python out-of-tree from a readonly
      source directory. (Somewhat related to Issue #9860.)
    • Issue #15822: Ensure 2to3 grammar pickles are properly installed.
    • Issue #13301: use ast.literal_eval() instead of eval()
      in Tools/i18n/msgfmt.py.
    • Issue #16400: Update the description of which versions of a given package
      PyPI displays.
    • Issue #15677: Document that zlib and gzip accept a compression level
      of 0 to mean 'no compression'.
    • Issue #8040: added a version switcher to the documentation.
    • Issue #16115: Improve subprocess.Popen() documentation around args,
      shell, and executable arguments.
    • Issue #15979: Improve timeit documentation.
    • Issue #16036: Improve documentation of built-in int()'s signature and
      arguments.
    • Issue #15935: Clarification of argparse docs, re: add_argument() type and
      default arguments.
    • Issue #13769: Document the effect of ensure_ascii to the return type
      of JSON decoding functions.
    • Issue #14880: Fix kwargs notation in csv.reader,
      .writer & .register_dialect.
    • Issue #14674: Add a discussion of the json module's standard compliance.
  • Clarify location of the gdbinit file. LP: #975676.
  • Fix traceback for missing distutils wininst .exe files. LP: #1081155.
    -- Matthias Klose doko@ubuntu.com Wed, 28 Nov 2012 13:49:46 +0100
@wesm

Member

commented Dec 2, 2012

OK, I'll reopen.

@wesm wesm reopened this Dec 2, 2012

@wesm

Member

commented Dec 2, 2012

@y-p if you have the stomach for it, you'll want to look at tokenize_delimited in pandas/src/parser/tokenizer.c. The error should be originating (I think?) because the state of the tokenizer is IN_QUOTED_FIELD when it reaches the end of the file, the result of a malformed line in the file. I bet if you change the string literal to a bytes literal it will fix the test. Otherwise have to dig
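To make the "IN_QUOTED_FIELD at end of file" idea concrete, here is a toy state machine in Python. It is an illustrative sketch only, not a port of tokenize_delimited from tokenizer.c; the state names other than IN_QUOTED_FIELD, the function name, and the error type are all made up.

```python
# Toy tokenizer states; only IN_QUOTED_FIELD is a name taken from the thread.
START_FIELD, IN_FIELD, IN_QUOTED_FIELD, QUOTE_IN_QUOTED_FIELD = range(4)


def tokenize_record(text, sep="\t", quote='"'):
    """Split one record into fields; raise if input ends inside quotes."""
    fields, buf, state = [], [], START_FIELD
    for ch in text:
        if state == IN_QUOTED_FIELD:
            if ch == quote:
                state = QUOTE_IN_QUOTED_FIELD
            else:
                buf.append(ch)  # separators are literal inside quotes
        elif state == QUOTE_IN_QUOTED_FIELD and ch == quote:
            buf.append(quote)  # doubled quote means a literal quote
            state = IN_QUOTED_FIELD
        elif ch == sep:
            fields.append("".join(buf))
            buf, state = [], START_FIELD
        elif ch == quote and state == START_FIELD:
            state = IN_QUOTED_FIELD
        else:
            buf.append(ch)
            state = IN_FIELD
    if state == IN_QUOTED_FIELD:
        # The malformed-input case: a quote was opened and never closed.
        raise ValueError("EOF inside quoted field")
    fields.append("".join(buf))
    return fields


print(tokenize_record('a\t"b\tc"\td'))
```

A test like test_quoting only passes if the parser reaches that final check and raises; a lenient parser that silently closes the field at end of input would make the test fail.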

@ghost

commented Dec 2, 2012

I will if I have to, but it looks like the failure is in the python rather than the c tokenizer:
I put print( locals()) at the start of _make_parser_function and got:

======================================================================
FAIL: test_quoting (pandas.io.tests.test_parsers.TestPythonParser)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user1/src/pandas/.tox/py33/lib/python3.3/site-packages/pandas-0.10.0.dev_3950684-py3.3-linux-x86_64.egg/pandas/io/tests/test_parsers.py", line 462, in test_quoting
    sep='\t')
AssertionError: AssertionError: Exception not raised by read_table
-------------------- >> begin captured stdout << ---------------------
{'index_col': None, 'chunksize': None, 'true_values': None, 'skiprows': None, 'use_unsigned': False, 'na_values': None, 'keep_default_na': True, 'names': None, 'date_parser': None, 'buffer_lines': None, 'converters': None, 'as_recarray': False, 'parse_dates': False, 'quoting': 0, 'squeeze': False, 'memory_map': False, 'na_filter': True, 'skip_footer': 0, 'filepath_or_buffer': <_io.StringIO object at 0x7f52b6cc93e0>, 'skipinitialspace': False, 'low_memory': True, 'skipfooter': None, 'usecols': None, 'header': 0, 'escapechar': None, 'false_values': None, 'comment': None, 'compact_ints': False, 'delim_whitespace': False, 'thousands': None, 'dayfirst': False, 'verbose': False, 'nrows': None, 'encoding': None, 'quotechar': '"', 'dialect': None, 'warn_bad_lines': True, 'delimiter': None, 'compression': None, 'engine': 'python', 'error_bad_lines': True, 'keep_date_col': False, 'sep': '\t', 'iterator': False, 'doublequote': True, 'dtype': None}

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
Ran 1 test in 0.002s

note the 'engine': 'python'.
and Issue #16013: Fix CSV Reader parsing issue with ending quote characters. looks suspicious.
is that enough to hit your fast-path for bug fixing?

edit: wrong method but the conclusion is true, it's the python engine and not the c engine that the error comes from.
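A quick probe of how the stdlib csv reader treats a quote that is still open at end of input, which is the area Issue #16013 touched. This is only a sketch for checking an interpreter's behaviour, not the change made in bc6ad16; the sample line is made up, and results may differ on interpreters that predate the csv fix.

```python
import csv

lines = ['foo,"']  # the last field opens a quote that is never closed

# Default dialect (strict=False): the reader is lenient about the dangling quote.
lenient_rows = list(csv.reader(lines))
print(lenient_rows)

# strict=True asks the reader to raise csv.Error on bad input instead.
try:
    list(csv.reader(lines, strict=True))
    strict_raised = False
except csv.Error as exc:
    strict_raised = True
    print("csv.Error:", exc)
```

If the lenient call returns rows without raising, a test that expects an exception from the python engine on such input needs either strict parsing or a different assertion.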

@ghost

commented Dec 2, 2012

@juliantaylor , can you confirm the fix in bc6ad16?

@ghost ghost self-assigned this Dec 2, 2012

@wesm

Member

commented Dec 2, 2012

Oh interesting. That makes sense that they altered the CSV module in Python 3.3

@juliantaylor

Author

commented Dec 2, 2012

the fix works, thanks

ghost pushed a commit that referenced this issue Dec 2, 2012
y-p
@ghost

commented Dec 2, 2012

closed via bc6ad16 .
Test suite now passes on python 3.3

@ghost ghost closed this Dec 2, 2012

@wesm wesm unassigned ghost Oct 12, 2016

This issue was closed.