Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Make ExcelWriter more pluggable #4750

Merged
merged 2 commits into from Sep 13, 2013

Conversation

@jtratner
Copy link
Contributor

commented Sep 5, 2013

FIxes #4745 and makes it easier to swap in other ExcelWriters.

Basically, you can subclass ExcelWriter (or not), register the engine with register_engine, and then either pass it as an engine keyword to read_excel or set it as the default writer using the config options (i.e., io.excel.xlsx.writer or io.excel.xls.writer). Engine names are validated by checking that they are already defined and that they are able to read files of that type (that said, if you can set up external config files to be parsed by pandas, this validation should be removed, because it won't work if pandas is loaded before an external dependency [e.g., a plugin that registers a writer class]). When you call ExcelWriter, the metaclass swaps in the appropriate engine class automatically [thereby being backwards compatible].

The great thing is that it should be really simple to add additional writers and relatively trivial to add additional readers by following the same format. (if, for example, we wanted to resurrect openpyxl for brave souls or something).

Here's the 'interface':

  • Mandatory (but not checked at run time):
    • write_cells(self, cells, sheet_name=None, startrow=0, startcol=0)
      --> called to write additional DataFrames to disk
    • supported_extensions (tuple of supported extensions), used to check
      that engine supports the given extension.
    • engine - string that gives the engine name. Necessary to
      instantiate class directly and bypass ExcelWriterMeta engine lookup.
    • save(self) --> called to save file to disk
  • Optional:
    • __init__(self, path, **kwargs) --> always called with path as first
      argument.
    • check_extension(cls, ext) --> called to check that the engine supports a
      given extension (and used to validate user selection of engine as option).
      This supercedes supported_exceptions.

You also need to register the class with register_engine(). If you don't subclass ExcelWriter,
you will have to implement check_extensions(ext) yourself.

There's a tiny bit of metaclass magic to make this backwards-compatible, but it's nothing particularly complicated.

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 5, 2013

@jreback @cpcloud @hayd this look okay to you? It's just a prep for adding xlsxwriter as a writer to pandas.

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2013

@jreback et al - does the API seem okay?

@jmcnamara - since you actually want to add a new writer, does this seem clear to you? It's similar to what was there before with a bit cleaner separation of concerns.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 6, 2013

@jtratner maybe put the methods you are supposed to override in the base class as NotImplemented?

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2013

I could also just make an abstract base class... would that be clearer?

Otherwise do you mean func = NotImplemented? Or raise NotImplemented?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 6, 2013

either work fine....its the person in the future who adds a new writer/reader and forgets to define something...

@cpcloud

View changes

pandas/io/excel.py Outdated

register_engine('xlwt', _XlwtWriter)


This comment has been minimized.

Copy link
@cpcloud

cpcloud Sep 6, 2013

Member

Minor point: can everything below this line be moved to core/config_init.py? otherwise 👌

This comment has been minimized.

Copy link
@jtratner

jtratner Sep 6, 2013

Author Contributor

yes - makes much more sense to do that :) - btw - does pandas support reading a config file? (if so, I'd need to just validate with str instead, since you might set an engine that doesn't exist at the moment...)

This comment has been minimized.

Copy link
@jtratner

jtratner Sep 6, 2013

Author Contributor

take that back...I'll have to just validate with string, because don't want to have to be sure that config is instantiated after io/excel

This comment has been minimized.

Copy link
@cpcloud

cpcloud Sep 6, 2013

Member

no config files AFAIK. could be a good first pr maybe?

This comment has been minimized.

Copy link
@cpcloud

cpcloud Sep 6, 2013

Member

also can u add a small test to test_config to make sure ur validator is doing what u think it is? thanks

This comment has been minimized.

Copy link
@jtratner

jtratner Sep 6, 2013

Author Contributor

@cpcloud I got rid of the validator(too dependent on timing of setting option), but added a test case for ExcelWriter('badfilename')

@jmcnamara

This comment has been minimized.

Copy link
Contributor

commented Sep 6, 2013

@jtratner

@jmcnamara - since you actually want to add a new writer, does this seem clear to you? It's similar to what was there before with a bit cleaner separation of concerns.

It looks fine. I'll subclass it and let you know how I get on.

Should I subclass it in another file or in excel.py?

About this:

# declare external properties you can count on
book = None
curr_sheet = None
path = None

That is good. I was going to ask for some interface to the workbook and worksheet since that could be useful to the end user for adding additional formatting or invoking other methods on them.

How about calling them workbook and worksheet (or current_worksheet) which would be more consistent with the existing Excel writers.

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2013

@jmcnamara huh? You would just put whatever setup you need in your __init__ method. Basically, as long as the engine takes a path, responds to write_cells appropriately and saves to disk on save(), you can do whatever you want in terms of saving workbook/worksheet, etc.

@jmcnamara

This comment has been minimized.

Copy link
Contributor

commented Sep 6, 2013

You would just put whatever setup you need in your init method.

Sure. What I meant was it that it would be useful (from my point of view) to have interfaces in the base class that would return workbook and worksheet objects that could be used to access APIs that aren't exposed by ExcelWriter.

For example, if I wanted to set a column width in an xlwt worksheet. Something like this:

writer = ExcelWriter('file.xls')
df.to_excel(writer, 'Sheet1')

# Access the underlying worksheet.
worksheet = writer.worksheet()
worksheet.col(0).width = 256 * 12

writer.save()
@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 6, 2013

You can already do this, right?

writer = ExcelWriter('file.xls')
df.to_excel(writer, 'Sheet1')

# Access the underlying worksheet.
worksheet = writer.sheets["Sheet1"]
worksheet.col(0).width = 256 * 12

writer.save()

(on current master)

@jmcnamara

This comment has been minimized.

Copy link
Contributor

commented Sep 6, 2013

worksheet = writer.sheets["Sheet1"]

Ok. That's good. Ignore above.

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 9, 2013

okay, I added some additional documentation for this and testing for registering writers, setting engines, etc. If no objections, going to merge at some point soon...

@jmcnamara

This comment has been minimized.

Copy link
Contributor

commented Sep 10, 2013

@jtratner For what it is worth this looks good to me. If you can get it merged I redo my PR for xlsxwriter support.

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 10, 2013

Yeah, I think it's close. Just want to look it over one more time. It
should be relatively straightforward for you, right?

@jmcnamara

This comment has been minimized.

Copy link
Contributor

commented Sep 10, 2013

It should be relatively straightforward for you, right?

It looks good from my end. I already tried it out on a clone from your fork.

When I add the xlsxwriter subclass do you want it in excel.py or in a separate file?

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 10, 2013

I'd put it in excel.py.

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2013

I just rebased this. I think it's in a pretty good state now. I could change ExcelReader slightly to do the same thing (for completeness sake) - is that worth doing right now even though we don't have any other readers in the pipeline at the moment?

The only other item is whether the formatting should be done by to_excel or by the individual Writer. At least for now, probably fine to leave it, but I could see getting speed improvements by leaving the default formatting as a classmethod on ExcelWriter that subclasses can overwrite...

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2013

I guess I will probably merge this tonight. So if you are interested/have time, please take a look.

@jreback

View changes

doc/source/io.rst Outdated
``xlwt`` for ``.xls`` files. It's very simple to add new Excel writer engines
to ``pandas``. If you have multiple engines installed, you can choose the
engine to use by default via the options ``io.excel.xlsx.writer`` and
``io.excel.xls.writer``. (side note: if you want to add an Excel writer,

This comment has been minimized.

Copy link
@jreback

jreback Sep 12, 2013

Contributor

I would take out this side note comment, that's for dev, not users; also Ithink we usually put the versionadded at the top of the new section?

@jreback

View changes

doc/source/release.rst Outdated
- It's now easier to hook new Excel writers into pandas (just subclass
``ExcelWriter`` and register your engine). You can specify an ``engine`` in
``to_excel`` or in ``ExcelWriter``. You can also specify which writers you want to
use by default with config optinos ``io.excel.xlsx.writer`` and

This comment has been minimized.

Copy link
@jreback

jreback Sep 12, 2013

Contributor

options

@jreback

View changes

pandas/core/panel.py Outdated
@@ -458,7 +458,7 @@ def to_sparse(self, fill_value=None, kind='block'):
default_kind=kind,
default_fill_value=fill_value)

def to_excel(self, path, na_rep=''):
def to_excel(self, path, na_rep='', engine=None):

This comment has been minimized.

Copy link
@jreback

jreback Sep 12, 2013

Contributor

I think you can easily move to_excel to core/generic.py (and remove from frame.py/panel.py)

This comment has been minimized.

Copy link
@jtratner

jtratner Sep 12, 2013

Author Contributor

would to_excel work with a series? If not, I don't want to touch that here.

This comment has been minimized.

Copy link
@jreback

jreback Sep 12, 2013

Contributor

not sure...you can easily just raise on self.ndim not in [2,3].....just cleaner to have it in generic, syncing is easier

This comment has been minimized.

Copy link
@jtratner

jtratner Sep 12, 2013

Author Contributor

actually panel and frame are completely different on this - frame just stores itself, whereas panel iterates through itself and stores each of its dataframes separately.

This comment has been minimized.

Copy link
@jreback

jreback Sep 12, 2013

Contributor

ok then

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2013

okay, I'm probably going to merge this today - nothing much changed from past few days.

jtratner added 2 commits Sep 6, 2013
ENH: Add 'add_metaclass' class decorator.
Taken from a new addition to the six library (again, license is in the
LICENSE directory).
CLN: Make ExcelWriter more pluggable
Make ExcelWriter an ABC and add ExcelWriter config to core/config_init

ENH: Allow Panel.to_excel to pass keyword arguments
jtratner added a commit that referenced this pull request Sep 13, 2013
Merge pull request #4750 from jtratner/refactor-excel-writer
CLN: Make ExcelWriter more pluggable

@jtratner jtratner merged commit d20d9bf into pandas-dev:master Sep 13, 2013

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 13, 2013

@jmcnamara okay, this has been merged in...go ahead :)

@jtratner jtratner deleted the jtratner:refactor-excel-writer branch Sep 13, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.