Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: return an OrderedDict from read_excel with sheetname=None/list #9930 #9944

Closed
wants to merge 1 commit into from

Conversation

s-celles
Copy link
Contributor

Fix #9930

read_excel with sheetname=None or sheetname=[0,1,”Sheet5”] returns an OrderedDict instead of a dict

Follow-up of #9933

Fix #9930

read_excel with sheetname=None or sheetname=[0,1,”Sheet5”] returns an OrderedDict instead of a dict
@shoyer
Copy link
Member

shoyer commented Apr 20, 2015

This still needs tests :). Take a look at the excel tests and update some or all of them to require sheets to appear in the original order.

@s-celles
Copy link
Contributor Author

How can I run only this test ?
(because I don't want to run all tests using tests.sh)

I tried

$ python pandas/io/tests/test_excel.py
cannot import name hashtable
Traceback (most recent call last):
  File "pandas/io/tests/test_excel.py", line 3, in <module>
    from pandas.compat import u, range, map, openpyxl_compat
  File "/Users/scls/github/scls19fr/pandas/pandas/__init__.py", line 7, in <module>
    from . import hashtable, tslib, lib
ImportError: cannot import name hashtable

I also tried

bash test_fast.sh
E
======================================================================
ERROR: Failure: ImportError (cannot import name hashtable)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "//anaconda/lib/python2.7/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "//anaconda/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "//anaconda/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/Users/scls/github/scls19fr/pandas/pandas/__init__.py", line 7, in <module>
    from . import hashtable, tslib, lib
ImportError: cannot import name hashtable

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)

@shoyer
Copy link
Member

shoyer commented Apr 20, 2015

@scls19fr Take a look at these sections of the contributing guidelines (might be worth reading other sections, too!):
http://pandas-docs.github.io/pandas-docs-travis/contributing.html#making-changes
http://pandas-docs.github.io/pandas-docs-travis/contributing.html#running-the-test-suite

(to be more specific, that error message means you need to rebuild pandas)

@s-celles
Copy link
Contributor Author

I think this test could be good

def test_reading_all_sheets(self):
    # Test reading all sheetnames by setting sheetname to None,
    # Ensure a dict is returned.
    # See PR #9450
    # Ensure sheet order is preserved.
    # See issue #9930

    _skip_if_no_xlrd()

    dfs = read_excel(self.multisheet,sheetname=None)
    expected_keys = ['Alpha','Beta','Charlie']
    tm.assert_contains_all(expected_keys,dfs.keys())
    for i, sheetname in enumerate(dfs):
        expected_sheetname = expected_keys[i]
        tm.assert_equal(sheetname, expected_sheetname)

But I won't send a new PR until I can run test on my side.
Any idea ?

@s-celles
Copy link
Contributor Author

I also wonder what should happen when a list is given as sheetname parameter. Should list order be preserved or should excel sheetname order be preserved ?

@shoyer
Copy link
Member

shoyer commented Apr 20, 2015

I would preserve the excel sheetname order. The user already supplied a list of desired sheets, so they can access them in that order if desired. But if we reorder them to the supplied order, there is no way for them to access them in the original order.

This rule is also simpler to explain.

@s-celles
Copy link
Contributor Author

Preserving order defined by user will be (in my mind) both much easier to explain and much easier to implement. I don't see how we can preserve excel sheetname order easily.

see ExcelFile._parse_excel

    if isinstance(sheetname, list):
        sheets = sheetname
        ret_dict = True
    elif sheetname is None:
        sheets = self.sheet_names
        ret_dict = True
    else:
        sheets = [sheetname]

....

for asheetname in sheets:
    ...

@jreback jreback added API Design IO Excel read_excel, to_excel labels Apr 20, 2015
@jreback jreback added this to the Next Major Release milestone Apr 20, 2015
@jreback
Copy link
Contributor

jreback commented Apr 20, 2015

@scls19fr you don't need to open a new pull requests each time you update. just push to the same one.

@s-celles
Copy link
Contributor Author

Sorry @jreback I still need to learn Git and GitHub

@shoyer
Copy link
Member

shoyer commented Apr 20, 2015

To preserve original sheet order, you could use something like this:

    if isinstance(sheetname, list):
        requested_sheets = set(sheetname)
        if not requested_sheets <= set(self.sheet_names):
            raise ValueError('...')
        sheets = [s for s in self.sheet_names if s in requested_sheets]
        ret_dict = True

@jreback
Copy link
Contributor

jreback commented May 9, 2015

@scls19fr pls add some tests.

@jreback jreback changed the title Update excel.py ENH: return an OrderedDict from read_excel with sheetname=None/list #9930 May 9, 2015
@s-celles
Copy link
Contributor Author

I'm sorry but I'm quite busy these days

@jreback
Copy link
Contributor

jreback commented May 10, 2015

closing if/when updated, pls reopen

@jreback jreback closed this May 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_excel with sheetname=None or sheetname=[0,1,”Sheet5”] should preserve sheets order
3 participants