
Can't write xml files via "lxml" package when using pyfakefs #713

@buhtz

I'm sorry for such a broad and unspecific bug report. I was able to narrow it down: the problem occurs only when I use pyfakefs. If I use the real filesystem for (nearly) the same test, everything works fine.

Do you have any idea what could cause this side effect, or how I could continue my investigation?

Maybe the file is not actually written to the fake filesystem? Can I check that somehow?
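One way to check this: an .xlsx file is a ZIP archive of XML parts, so the standard library's zipfile module can list every part together with its uncompressed size. The helper names `list_parts` and `find_empty_parts` are hypothetical. Called inside the pyfakefs test, they should read the fake file, assuming pyfakefs patches the file access that zipfile uses (as it normally does for pure-Python code). A zero-byte part under `xl/worksheets/` would be consistent with the "no element found: line 1, column 0" error shown below.

```python
import zipfile
from pathlib import Path
from typing import List, Tuple


def list_parts(xlsx_path: Path) -> List[Tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for every archive part."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def find_empty_parts(xlsx_path: Path) -> List[str]:
    """Return the names of XML/relationship parts that contain no bytes at all."""
    return [
        name
        for name, size in list_parts(xlsx_path)
        if name.endswith((".xml", ".rels")) and size == 0
    ]
```

In the `Problem` test, calling `print(list_parts(excel_path))` or `self.assertEqual(find_empty_parts(excel_path), [])` directly after `to_excel()` would show whether the archive itself already contains empty parts before `read_excel()` is reached.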

The last lines of the raised error

The raised error seems to have nothing to do with pyfakefs or my own package.

  File "/usr/lib/python3/dist-packages/openpyxl/reader/excel.py", line 219, in read_worksheets
    ws = ReadOnlyWorksheet(self.wb, sheet.name, rel.target, self.shared_strings)
  File "/usr/lib/python3/dist-packages/openpyxl/worksheet/_read_only.py", line 40, in __init__
    self._get_size()
  File "/usr/lib/python3/dist-packages/openpyxl/worksheet/_read_only.py", line 46, in _get_size
    dimensions = parser.parse_dimensions()
  File "/usr/lib/python3/dist-packages/openpyxl/worksheet/_reader.py", line 164, in parse_dimensions
    for _event, element in it:
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1260, in iterator
    root = pullparser._close_and_return_root()
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1307, in _close_and_return_root
    root = self._parser.close()
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1729, in close
    self._raiseerror(v)
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1629, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

----------------------------------------------------------------------
Ran 1 test in 0.691s

FAILED (errors=1)
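The final message can be reproduced with the standard library alone: `xml.etree.ElementTree` raises exactly this `ParseError` when it is given zero bytes. This suggests that the worksheet XML read back from the archive was empty rather than malformed. The sketch below mimics the `iterparse` loop visible in the traceback; `parse_part` is a hypothetical helper, not part of openpyxl.

```python
import io
import xml.etree.ElementTree as ET


def parse_part(data: bytes) -> ET.Element:
    """Pull-parse one XML part and return its root element."""
    root = None
    # The root element's 'end' event is always the last one emitted.
    for _event, element in ET.iterparse(io.BytesIO(data), events=("end",)):
        root = element
    return root


# Well-formed input parses fine.
print(parse_part(b"<worksheet><dimension ref='A1'/></worksheet>").tag)

# Empty input fails when the parser is closed at end of data.
try:
    parse_part(b"")
except ET.ParseError as err:
    print(err)
```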

Description

The unit test checks whether an Excel file can be read. The test performs these steps:

  1. Create a pandas.DataFrame.
  2. Store it as an Excel file (via pandas.DataFrame.to_excel()).
  3. Read the Excel file back in (via pandas.read_excel()).
  4. Compare the initial and the returned data frame.

Of course, the real tests do a lot more between steps 2 and 3, and I use my own wrapper around pandas.DataFrame.to_excel().

The unittests

import pathlib
import unittest

import pandas
import pyfakefs.fake_filesystem_unittest as pyfakefs_ut

class Works(unittest.TestCase):

    def test_simple(self):
        """Simple excel."""
        excel_path = pathlib.Path('foobar.xlsx')
        if excel_path.exists():
            excel_path.unlink()

        df_init = pandas.DataFrame({'FOO': range(3), 'BAR': list('ABC')})
        df_init.to_excel(excel_path)

        self.assertTrue(excel_path.exists())

        df = pandas.read_excel(excel_path)

        self.assertEqual(df.shape, (3, 3))


class Problem(pyfakefs_ut.TestCase):

    def setUp(self):
        self.setUpPyfakefs(allow_root_user=False)

    def test_simple(self):
        excel_path = pathlib.Path('foobar.xlsx')
        df_init = pandas.DataFrame({'FOO': range(3), 'BAR': list('ABC')})
        df_init.to_excel(excel_path)

        self.assertTrue(excel_path.exists())

        # HERE comes the ERROR
        df = pandas.read_excel(excel_path)

        self.assertEqual(df.shape, (3, 3))

Environment

The problem originally occurred with older versions of pandas (1.3.5), NumPy and openpyxl. For this bug report I updated everything I could (except the operating system and the Python interpreter) to the current stable releases, but the error persists.

  • Debian 11 (arm)
  • Python 3.9.2 (via debian repo)
  • Pandas 1.4.4 (via pip)
  • Numpy 1.19.5 (via pip)
  • openpyxl 3.0.10 (via pip)

Full error output

python3 -m unittest tests.test_bandas.Problem
E
======================================================================
ERROR: test_simple (tests.test_bandas.Problem)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1727, in close
    self.parser.Parse(b"", True) # end of data
xml.parsers.expat.ExpatError: no element found: line 1, column 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/ownCloud/my.work/buhtzology/tests/test_bandas.py", line 1139, in test_simple
    df = pandas.read_excel(excel_path)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1419, in __init__
    self._reader = self._engines[engine](self._io, storage_options=storage_options)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 525, in __init__
    super().__init__(filepath_or_buffer, storage_options=storage_options)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 518, in __init__
    self.book = self.load_workbook(self.handles.handle)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 536, in load_workbook
    return load_workbook(
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
    reader.read()
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 282, in read
    self.read_worksheets()
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 219, in read_worksheets
    ws = ReadOnlyWorksheet(self.wb, sheet.name, rel.target, self.shared_strings)
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 41, in __init__
    self._get_size()
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 47, in _get_size
    dimensions = parser.parse_dimensions()
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 166, in parse_dimensions
    for _event, element in it:
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1260, in iterator
    root = pullparser._close_and_return_root()
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1307, in _close_and_return_root
    root = self._parser.close()
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1729, in close
    self._raiseerror(v)
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1629, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

----------------------------------------------------------------------
Ran 1 test in 0.699s

FAILED (errors=1)

Misc

Referenced by https://codeberg.org/buhtz/buhtzology/issues/28
