Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ExcelWriter() to add sheets to existing workbook #3441

Closed
ligon opened this issue Apr 23, 2013 · 40 comments
Closed

Allow ExcelWriter() to add sheets to existing workbook #3441

ligon opened this issue Apr 23, 2013 · 40 comments

Comments

@ligon
Copy link

@ligon ligon commented Apr 23, 2013

The ability of ExcelWriter to save different dataframes to different worksheets is great for sharing those dfs with the python-deficient. But this quickly leads to a need to add worksheets to an existing workbook, not just creating one from scratch; something like:

df0=pd.DataFrame(np.arange(3))
df0.to_excel('foo.xlsx','Data 0')

df1=pd.DataFrame(np.arange(2))
df1.to_excel('foo.xlsx','Data 1')

The following little diff to io/parsers.py implements this behavior for *.xlsx files:

diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
index 89f892d..7f010ee 100644
--- a/pandas/io/parsers.py
+++ b/pandas/io/parsers.py
@@ -2099,12 +2099,19 @@ class ExcelWriter(object):
             self.fm_date = xlwt.easyxf(num_format_str='YYYY-MM-DD')
         else:
             from openpyxl.workbook import Workbook
-            self.book = Workbook()  # optimized_write=True)
-            # open pyxl 1.6.1 adds a dummy sheet remove it
-            if self.book.worksheets:
-                self.book.remove_sheet(self.book.worksheets[0])
+            from openpyxl.reader.excel import load_workbook
+
+            try:
+               self.book=load_workbook(filename = path)
+               self.sheets={wks.title:wks for wks in self.book.worksheets}
+            except InvalidFileException:
+                self.book = Workbook()  # optimized_write=True)
+                # open pyxl 1.6.1 adds a dummy sheet remove it
+                if self.book.worksheets:
+                    self.book.remove_sheet(self.book.worksheets[0])
+                self.sheets = {}
+
         self.path = path
-        self.sheets = {}
         self.cur_sheet = None

Doing this for *.xls files is a little harder.

@ligon ligon closed this Apr 23, 2013
@ligon ligon reopened this Apr 23, 2013
@jreback
Copy link
Contributor

@jreback jreback commented Sep 22, 2013

@jtratner is this still a bug/needed enhancement?

Loading

@jtratner
Copy link
Contributor

@jtratner jtratner commented Sep 22, 2013

Because of how to_excel is set up, this would mean reading in and then writing the file each time (because to_excel with a path argument saves the file). The right way to do this is to use ExcelWriter:

import pandas as pd
writer = pd.ExcelWriter('foo.xlsx')
df.to_excel(writer, 'Data 0')
df.to_excel(writer, 'Data 1')
writer.save()

I could see (eventually) adding an option to ExcelWriter that doesn't overwrite the file. But, yet again, that may mean writing in the entire file first. I don't know.

Loading

@ghost ghost assigned jtratner Sep 22, 2013
@jtratner
Copy link
Contributor

@jtratner jtratner commented Sep 22, 2013

I'm going to add something to the docs about this, maybe a test case with this, and I'll look into adding an option to read in the file, but it depends on how xlwt and openpyxl work.

Loading

@jreback
Copy link
Contributor

@jreback jreback commented Sep 22, 2013

@jtratner what about a context manager get_excel?

with get_excel('foo.xlsx') as e:
    df.to_excel(e,'Data 0)
    df.to_excel(e,'Data 1)

?

Loading

@jtratner
Copy link
Contributor

@jtratner jtratner commented Sep 22, 2013

how about we just make ExcelWriter into a contextmanager instead? it'll just call save at the end. Much simpler.

Loading

@jtratner
Copy link
Contributor

@jtratner jtratner commented Sep 23, 2013

@ligon you can do this now this way:

with ExcelWriter('foo.xlsx') as writer:
    df.to_excel(writer, 'Data 0')
    df2.to_excel(writer, 'Data 1')

If you don't use the with statement, just have to call save() at the end.

Loading

@ligon
Copy link
Author

@ligon ligon commented Sep 23, 2013

Excellent. And great that it has an exit method.

Thanks,
-Ethan Ligon

Ethan Ligon, Associate Professor
Agricultural & Resource Economics
University of California, Berkeley

Loading

@dylancis
Copy link

@dylancis dylancis commented Jan 10, 2014

I was extremely interesting by the request made by @ligon - but seems this is already there.
However using 0.12.0 pd version, when I am doing:
df = DataFrame([1,2,3])
df2 = DataFrame([5,5,6])
with ExcelWriter('foo.xlsx') as writer:
df.to_excel(writer, 'Data 0')
df2.to_excel(writer, 'Data 1')

Assumning foo.xlsx was containing a sheet named 'bar', basgot delete after the command run. While as per your comment, i was expecting to keep it in my foo excel file. Is that a bug?

Loading

@frenet
Copy link

@frenet frenet commented Apr 4, 2014

is it hard to add sheets to an existing excel file on the disk?
import pandas as pd
import numpy as np
a=pd.DataFrame(np.random.random((3,1)))
excel_writer=pd.ExcelWriter('c:\excel.xlsx')
a.to_excel(excel_writer, 'a1')
excel_writer.save()

excel_writer=pd.ExcelWriter('c:\excel.xlsx')
a.to_excel(excel_writer, 'a2')
excel_writer.save()

here only sheet 'a2" is save, but I like to save both 'a1' and 'a2'.

I know it is possible to add sheets to an existing workbook.

Loading

@jtratner
Copy link
Contributor

@jtratner jtratner commented Apr 4, 2014

It's definitely possible to add sheets to an existing workbook, but it's
not necessarily easy to do it with pandas. I think you'd have to read the
workbook separately and then pass it into the ExcelWriter class... That
would be something we could consider supporting.

Loading

@jtratner
Copy link
Contributor

@jtratner jtratner commented Apr 4, 2014

And I think if you subclass the ExcelWriter instance you want to use and
overwrite its__init__ method, as long as you set self.book it should work.
That said, no guarantee that this would continue to work in future
versions, since it's only a quasi-public API

Loading

@ankostis
Copy link

@ankostis ankostis commented Dec 17, 2015

This stackoverflow workaround, which is based in openpyxl, may work
(EDIT: indeed works, checked with pandas-0.17.0):

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Loading

@jreback
Copy link
Contributor

@jreback jreback commented Dec 18, 2015

this would be pretty easy to implement inside ExcelWriter (Oder patch above)

prob just need to add a mode=kw and default to w and make a be append

Loading

@pylang
Copy link

@pylang pylang commented May 11, 2016

Was this ever patched?

Loading

@jreback
Copy link
Contributor

@jreback jreback commented May 11, 2016

it seems that you can work around it (see above), but I suppose would be nice to actually do it from pandas.

Loading

@andreacassioli
Copy link

@andreacassioli andreacassioli commented Nov 21, 2016

Hi, any follow up on this issue?
I can provide a use case: I have excel files with pivot tables and pivot graphs that I need to reach out people not proficient in Python.

My idea was to use pandas to add a sheet that contains the data for the pivot. But up to know I am stuck and the proposed workaround, thought not difficult, sounds a bit cumbersome . It would make sense to jsut have an option whether overwrite an existing file.

Loading

@zeluspudding
Copy link

@zeluspudding zeluspudding commented Dec 11, 2016

Let me echo @jreback , it would be super nice if I could just add a sheet to an excel workbook from pandas.

Loading

@jorisvandenbossche jorisvandenbossche added this to the Next Major Release milestone Dec 11, 2016
@jorisvandenbossche jorisvandenbossche removed this from the 0.13 milestone Dec 11, 2016
@jorisvandenbossche jorisvandenbossche added this to the Someday milestone Dec 11, 2016
@jorisvandenbossche jorisvandenbossche removed this from the Next Major Release milestone Dec 11, 2016
@jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Dec 11, 2016

To be clear, we agree that this would be a nice functionality, and would certainly welcome a contribution from the community.
Best way to have this in pandas is to make a PR!

Loading

@aa3222119
Copy link

@aa3222119 aa3222119 commented Feb 16, 2017

@jmcnamara how to use pandas.to_excel(Writer maybe use pandas.ExcelWriter) to add some data to an existed file , but not rewrite it??

Loading

@jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Feb 16, 2017

@aa3222119 That is exactly what this issue is about: an enhancement request to add this ability, which is not yet possible today.

(BTW, it is not needed to post the same question in multiple issues)

Loading

@aa3222119
Copy link

@aa3222119 aa3222119 commented Feb 16, 2017

sorry. i will delete that one . BTW, will that be possible some day later? @jorisvandenbossche

Loading

@jmcnamara
Copy link
Contributor

@jmcnamara jmcnamara commented Feb 16, 2017

BTW will that be possible some day later? @jmcnamara

This isn't and won't be possible when using XlsxWriter as the engine. It should be possible when using OpenPyXL. I've seen some examples on SO like this one: jmcnamara/excel-writer-xlsx#157

Loading

@aa3222119
Copy link

@aa3222119 aa3222119 commented Feb 16, 2017

3Q very much! @jmcnamara
it is exactly what you said . use openpyxl 👍

import pandas as pd
from openpyxl import load_workbook
book = load_workbook('text.xlsx')
writer = pd.ExcelWriter('text.xlsx', engine='openpyxl')
writer.book = book
pd.DataFrame(userProfile,index=[1]).to_excel(writer,'sheet111',startrow=7,startcol=7)
pd.DataFrame(userProfile,index=[1]).to_excel(writer,'sheet123',startrow=0,startcol=0)
writer.save()
pd.DataFrame(userProfile,index=[1]).to_excel(writer,'sheet123',startrow=3,startcol=3)
writer.save()

all can be added to text.xlsx.
https://github.com/pandas-dev/pandas/issues/3441

Loading

@Themanwithoutaplan
Copy link
Contributor

@Themanwithoutaplan Themanwithoutaplan commented Mar 4, 2017

Loading

@jgonzale
Copy link

@jgonzale jgonzale commented Mar 9, 2017

@ankostis, @aa3222119 when I follow the steps you comment, I always reach the following error:

Traceback (most recent call last):
File "./name_manipulation.py", line 60, in
df.to_excel(excel_writer, 'iadatasheet', startcol=0, startrow=5, columns=['codes', 'Zona Basica de Salud', month+"-"+year], index=False)
File "/Users/jgonzalez.iacs/Projects/SIIDI/PYTHON_ETLs/venv/lib/python3.4/site-packages/pandas/core/frame.py", line 1464, in to_excel
startrow=startrow, startcol=startcol)
File "/Users/jgonzalez.iacs/Projects/SIIDI/PYTHON_ETLs/venv/lib/python3.4/site-packages/pandas/io/excel.py", line 1306, in write_cells
wks = self.book.create_sheet()
AttributeError: 'str' object has no attribute 'create_sheet'

So, there is not solution yet, right?

Thanks

Loading

@ankostis
Copy link

@ankostis ankostis commented Mar 9, 2017

Maybe the API has changed - it definitely worked back then.

Loading

@Themanwithoutaplan
Copy link
Contributor

@Themanwithoutaplan Themanwithoutaplan commented Mar 9, 2017

@jgonzale which engine are you using?

Loading

@aa3222119
Copy link

@aa3222119 aa3222119 commented Mar 10, 2017

@jgonzale by what python said , your excel_writer.book maybe just a str but not a workbook?

Loading

@jgonzale
Copy link

@jgonzale jgonzale commented Mar 10, 2017

@aa3222119 Oh geez! You were right! Messing around with very similar names!

Thank you very much! 👏 👏 👏

Loading

@wxl3322335
Copy link

@wxl3322335 wxl3322335 commented Apr 3, 2017

thank you very much!

Loading

@BLMeltdown
Copy link

@BLMeltdown BLMeltdown commented Jun 12, 2017

Hello
I have some use case where it would be useful:
Even with the ExcelWriter trick as:

with ExcelWriter('foo.xlsx') as writer:
df.to_excel(writer, 'Data 0')
df2.to_excel(writer, 'Data 1')

you can't add a plot that you need without saving the file and reopening it. With the risk of meddling with any formatting you have in the workbook.

There is indeed the workaround to use the plotting functions from pandas to save these in the files, but (there is a but), when you need something a little more sophisticated like showing a PCA components graph you built from scikitlearn PCA and matplotlib, then it becomes tedious.

Hence
a pandas.nondf_manager (non df object or filename).to_excel(usual syntax)
would be exceedingly fine.
Thanks.

Loading

@orbitalz
Copy link

@orbitalz orbitalz commented Jul 24, 2017

I don't know how it is possible, however, it works for me

create_excel = 0
if plot_spectra != 0:
    for x in range(min_sigma, max_sigma, step_size):
        # apply gaussian
        df1 = gaussian_filter(df, sigma=x, mode=padding_mode)
        df2 = pd.DataFrame(df1)
        if save_file:
            if save_csv:
                df2.to_csv('{} {}{}.csv'.format(Output_file, 'sigma_', x,))
            if save_xlsx:
                if os.path.isfile('{}.xlsx'.format(Output_file)):
                    print("Warning! Excel file is exist")
                    break
                if create_excel == 0:
                    xlsx_writer = pd.ExcelWriter('{}.xlsx'.format(Output_file), engine='xlsxwriter')                                          
                    create_excel += 1
                df2.to_excel(xlsx_writer, '{}{}'.format('sigma_', x))
                if x == max_sigma-1:
                    xlsx_writer.save()

At the end, I got the excel file which have several work sheets.

Loading

@jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Jul 27, 2017

@orbitalz you are creating an excel file the first time (xlsx_writer = pd.ExcelWriter(..)), and then adding multiple sheets to that file object. That is supported, but this issue is about adding sheets to an existing excel file.

Loading

@orbitalz
Copy link

@orbitalz orbitalz commented Aug 1, 2017

I'm sorry for misunderstanding the topic and Thank you for pointing me out :)

Loading

@tlysecust
Copy link

@tlysecust tlysecust commented May 23, 2018

@orbitalz You solve my problem ,but I don't known how it works

Loading

@jreback jreback removed this from the Someday milestone Jun 19, 2018
@jreback jreback added this to the 0.24.0 milestone Jun 19, 2018
@ivoska
Copy link

@ivoska ivoska commented Sep 23, 2019

mode={'a'} does not work as the documentation suggests
this is still a buggy mess

Loading

@codewithpatch
Copy link

@codewithpatch codewithpatch commented Nov 4, 2019

Appending in the existing worksheet seems to work with
writer = pd.ExcelWriter('filename.xlsx', mode='a')

But, this only appends and does not overwrite sheets with the same sheetname

Example, my existing workbook has a sheetname 'mySheet'
If I try to do:
df.to_excel(writer, 'mySheet')
It will create a new sheet 'mySheet1' instead of rewriting the existing 'mySheet'

I wonder if there's any other way to append in the existing workbook, but overwriting sheets that you want to overwrite.

Hope someone helps.

Loading

@anvesha-nextsteps
Copy link

@anvesha-nextsteps anvesha-nextsteps commented Nov 28, 2019

By using openpyxl as engine in ExcelWriter
writer = pd.ExcelWriter(filename, engine='openpyxl')
df.to_excel(writer, sheet_name)
at writer.save() i am getting this error
TypeError: got invalid input value of type <class 'xml.etree.ElementTree.Element'>, expected string or Element

Loading

@irishun
Copy link

@irishun irishun commented Mar 15, 2020

By using openpyxl as engine in ExcelWriter
writer = pd.ExcelWriter(filename, engine='openpyxl')
df.to_excel(writer, sheet_name)
at writer.save() i am getting this error
TypeError: got invalid input value of type <class 'xml.etree.ElementTree.Element'>, expected string or Element

I have met the same error. Has anyone solved this issue?

Loading

@LittleMoDel
Copy link

@LittleMoDel LittleMoDel commented Mar 20, 2020

engine should change to openyxl,because the default engine'xlsxwriter' NOT support append mode !

`
import pandas as pd

df= pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5]})

#engine should change to openyxl,because the default engine'xlsxwriter' NOT support append mode !

writer = pd.ExcelWriter('exist.xlsx',mode='a',engine='openpyxl')

df.to_excel(writer, sheet_name ='NewSheet')

writer.save()

writer.close()

`


Pandas chooses an Excel writer via two methods:

the engine keyword argument
the filename extension (via the default specified in config options)
By default, pandas uses the XlsxWriter for .xlsx, openpyxl for .xlsm, and xlwt for .xls files. If you have multiple engines installed, you can set the default engine through setting the config options io.excel.xlsx.writer and io.excel.xls.writer. pandas will fall back on openpyxl for .xlsx files if Xlsxwriter is not available.

To specify which writer you want to use, you can pass an engine keyword argument to to_excel and to ExcelWriter. The built-in engines are:

  • openpyxl: version 2.4 or higher is required
  • xlsxwriter
  • xlwt

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.