BUG: `to_xml` raises `KeyError` for `index=False` when the index does not start at zero #42458

stephan-hesselmann-by · 2021-07-09T09:52:38Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

records = {"col1": [1, 2], "col2": [3, 4]}
df = pd.DataFrame(data=records)

df.index = range(1, 3)  # if this line is commented out it works as expected
xml = df.to_xml(index=False)
print(xml)

Problem description

This code snippet should not raise a `KeyError`. With `index=False`, the index should be ignored regardless of its content.

Traceback:

Traceback (most recent call last):
  File "/Users/xxx/xml.py", line 7, in <module>
    xml = df.to_xml(index=False)
  File "/Users/xxx/pandas/core/frame.py", line 2967, in to_xml
    xml_formatter = TreeBuilder(
  File "/Users/xxx/pandas/io/formats/xml.py", line 458, in __init__
    self.handle_indexes()
  File "/Users/xxx/pandas/io/formats/xml.py", line 199, in handle_indexes
    x for x in self.frame_dicts[0].keys() if x not in self.orig_cols
KeyError: 0

Expected Output

<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <col1>1</col1>
    <col2>3</col2>
  </row>
  <row>
    <col1>2</col1>
    <col2>4</col2>
  </row>
</data>

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : f00ed8f
python : 3.9.6.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.0
numpy : 1.21.0
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : None
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : 4.0.3
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : None
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.06.1
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.4.15
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None
None


Fixes pandas-dev#42458. The code assumed that the index contains the element `0`. This caused a failure when the index of the input DataFrame has an offset, which is a common use case when streaming DataFrames via generators. The fix is to stop relying on accessing the `0` element of `frame_dicts`.
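To make the failure mode concrete, here is a minimal sketch. It assumes that `frame_dicts` is a dict keyed by index label, as the `KeyError: 0` in the traceback suggests. A plain dict stands in for the real structure, and both helper functions are hypothetical illustrations, not the code from the merged fix.

```python
# Plain-dict stand-in for a per-row mapping keyed by index label.
# The labels are 1 and 2 (as in the report), so there is no key 0.
frame_dicts = {
    1: {"col1": 1, "col2": 3},
    2: {"col1": 2, "col2": 4},
}
orig_cols = ["col1", "col2"]


def extra_keys_hardcoded(frame_dicts, orig_cols):
    """Mirror the failing lookup: assumes a row labelled 0 exists."""
    return [x for x in frame_dicts[0].keys() if x not in orig_cols]


def extra_keys_first_row(frame_dicts, orig_cols):
    """Hypothetical alternative: inspect whichever row comes first."""
    if not frame_dicts:
        return []
    first_row = next(iter(frame_dicts.values()))
    return [x for x in first_row if x not in orig_cols]


try:
    extra_keys_hardcoded(frame_dicts, orig_cols)
except KeyError as exc:
    print("KeyError:", exc)  # prints "KeyError: 0"

print(extra_keys_first_row(frame_dicts, orig_cols))  # prints "[]"
```

Taking the first value of the mapping avoids any assumption about which labels the index contains, and the empty-dict guard keeps the helper safe for frames with no rows.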

ParfaitG · 2021-07-12T12:22:39Z

Good catch! Yes, zero should not be hard-coded when handling indexes: `self.frame_dicts[0]`.

stephan-hesselmann-by added the Bug and Needs Triage labels Jul 9, 2021

stephan-hesselmann-by mentioned this issue Jul 9, 2021

BUG: to_xml with index=False and offset input index #42464

Merged

4 tasks

jreback added the IO XML label and removed the Needs Triage label Jul 12, 2021

jreback added this to the 1.3.1 milestone Jul 12, 2021

jreback closed this as completed in #42464 Jul 12, 2021
