Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ExcelWriter() to add sheets to existing workbook #3441

Closed
ligon opened this issue Apr 23, 2013 · 35 comments

Comments

Projects
None yet
@ligon
Copy link

commented Apr 23, 2013

The ability of ExcelWriter to save different dataframes to different worksheets is great for sharing those dfs with the python-deficient. But this quickly leads to a need to add worksheets to an existing workbook, not just creating one from scratch; something like:

df0=pd.DataFrame(np.arange(3))
df0.to_excel('foo.xlsx','Data 0')

df1=pd.DataFrame(np.arange(2))
df1.to_excel('foo.xlsx','Data 1')

The following little diff to io/parsers.py implements this behavior for *.xlsx files:

diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
index 89f892d..7f010ee 100644
--- a/pandas/io/parsers.py
+++ b/pandas/io/parsers.py
@@ -2099,12 +2099,19 @@ class ExcelWriter(object):
             self.fm_date = xlwt.easyxf(num_format_str='YYYY-MM-DD')
         else:
             from openpyxl.workbook import Workbook
-            self.book = Workbook()  # optimized_write=True)
-            # open pyxl 1.6.1 adds a dummy sheet remove it
-            if self.book.worksheets:
-                self.book.remove_sheet(self.book.worksheets[0])
+            from openpyxl.reader.excel import load_workbook
+
+            try:
+               self.book=load_workbook(filename = path)
+               self.sheets={wks.title:wks for wks in self.book.worksheets}
+            except InvalidFileException:
+                self.book = Workbook()  # optimized_write=True)
+                # open pyxl 1.6.1 adds a dummy sheet remove it
+                if self.book.worksheets:
+                    self.book.remove_sheet(self.book.worksheets[0])
+                self.sheets = {}
+
         self.path = path
-        self.sheets = {}
         self.cur_sheet = None

Doing this for *.xls files is a little harder.

@ligon ligon closed this Apr 23, 2013

@ligon ligon reopened this Apr 23, 2013

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2013

@jtratner is this still a bug/needed enhancement?

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2013

Because of how to_excel is set up, this would mean reading in and then writing the file each time (because to_excel with a path argument saves the file). The right way to do this is to use ExcelWriter:

import pandas as pd
writer = pd.ExcelWriter('foo.xlsx')
df.to_excel(writer, 'Data 0')
df.to_excel(writer, 'Data 1')
writer.save()

I could see (eventually) adding an option to ExcelWriter that doesn't overwrite the file. But, yet again, that may mean writing in the entire file first. I don't know.

@ghost ghost assigned jtratner Sep 22, 2013

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2013

I'm going to add something to the docs about this, maybe a test case with this, and I'll look into adding an option to read in the file, but it depends on how xlwt and openpyxl work.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2013

@jtratner what about a context manager get_excel?

with get_excel('foo.xlsx') as e:
    df.to_excel(e,'Data 0)
    df.to_excel(e,'Data 1)

?

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2013

how about we just make ExcelWriter into a contextmanager instead? it'll just call save at the end. Much simpler.

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Sep 23, 2013

@ligon you can do this now this way:

with ExcelWriter('foo.xlsx') as writer:
    df.to_excel(writer, 'Data 0')
    df2.to_excel(writer, 'Data 1')

If you don't use the with statement, just have to call save() at the end.

@ligon

This comment has been minimized.

Copy link
Author

commented Sep 23, 2013

Excellent. And great that it has an exit method.

Thanks,
-Ethan Ligon

Ethan Ligon, Associate Professor
Agricultural & Resource Economics
University of California, Berkeley

@dylancis

This comment has been minimized.

Copy link

commented Jan 10, 2014

I was extremely interesting by the request made by @ligon - but seems this is already there.
However using 0.12.0 pd version, when I am doing:
df = DataFrame([1,2,3])
df2 = DataFrame([5,5,6])
with ExcelWriter('foo.xlsx') as writer:
df.to_excel(writer, 'Data 0')
df2.to_excel(writer, 'Data 1')

Assumning foo.xlsx was containing a sheet named 'bar', basgot delete after the command run. While as per your comment, i was expecting to keep it in my foo excel file. Is that a bug?

@frenet

This comment has been minimized.

Copy link

commented Apr 4, 2014

is it hard to add sheets to an existing excel file on the disk?
import pandas as pd
import numpy as np
a=pd.DataFrame(np.random.random((3,1)))
excel_writer=pd.ExcelWriter('c:\excel.xlsx')
a.to_excel(excel_writer, 'a1')
excel_writer.save()

excel_writer=pd.ExcelWriter('c:\excel.xlsx')
a.to_excel(excel_writer, 'a2')
excel_writer.save()

here only sheet 'a2" is save, but I like to save both 'a1' and 'a2'.

I know it is possible to add sheets to an existing workbook.

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2014

It's definitely possible to add sheets to an existing workbook, but it's
not necessarily easy to do it with pandas. I think you'd have to read the
workbook separately and then pass it into the ExcelWriter class... That
would be something we could consider supporting.

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2014

And I think if you subclass the ExcelWriter instance you want to use and
overwrite its__init__ method, as long as you set self.book it should work.
That said, no guarantee that this would continue to work in future
versions, since it's only a quasi-public API

@ankostis

This comment has been minimized.

Copy link

commented Dec 17, 2015

This stackoverflow workaround, which is based in openpyxl, may work
(EDIT: indeed works, checked with pandas-0.17.0):

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()
@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 18, 2015

this would be pretty easy to implement inside ExcelWriter (Oder patch above)

prob just need to add a mode=kw and default to w and make a be append

@pylang

This comment has been minimized.

Copy link

commented May 11, 2016

Was this ever patched?

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 11, 2016

it seems that you can work around it (see above), but I suppose would be nice to actually do it from pandas.

@andreacassioli

This comment has been minimized.

Copy link

commented Nov 21, 2016

Hi, any follow up on this issue?
I can provide a use case: I have excel files with pivot tables and pivot graphs that I need to reach out people not proficient in Python.

My idea was to use pandas to add a sheet that contains the data for the pivot. But up to know I am stuck and the proposed workaround, thought not difficult, sounds a bit cumbersome . It would make sense to jsut have an option whether overwrite an existing file.

@zeluspudding

This comment has been minimized.

Copy link

commented Dec 11, 2016

Let me echo @jreback , it would be super nice if I could just add a sheet to an excel workbook from pandas.

@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.13, Someday Dec 11, 2016

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Dec 11, 2016

To be clear, we agree that this would be a nice functionality, and would certainly welcome a contribution from the community.
Best way to have this in pandas is to make a PR!

@aa3222119

This comment has been minimized.

Copy link

commented Feb 16, 2017

@jmcnamara how to use pandas.to_excel(Writer maybe use pandas.ExcelWriter) to add some data to an existed file , but not rewrite it??

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Feb 16, 2017

@aa3222119 That is exactly what this issue is about: an enhancement request to add this ability, which is not yet possible today.

(BTW, it is not needed to post the same question in multiple issues)

@aa3222119

This comment has been minimized.

Copy link

commented Feb 16, 2017

sorry. i will delete that one . BTW, will that be possible some day later? @jorisvandenbossche

@jmcnamara

This comment has been minimized.

Copy link
Contributor

commented Feb 16, 2017

BTW will that be possible some day later? @jmcnamara

This isn't and won't be possible when using XlsxWriter as the engine. It should be possible when using OpenPyXL. I've seen some examples on SO like this one: jmcnamara/excel-writer-xlsx#157

@aa3222119

This comment has been minimized.

Copy link

commented Feb 16, 2017

3Q very much! @jmcnamara
it is exactly what you said . use openpyxl 👍

import pandas as pd
from openpyxl import load_workbook
book = load_workbook('text.xlsx')
writer = pd.ExcelWriter('text.xlsx', engine='openpyxl')
writer.book = book
pd.DataFrame(userProfile,index=[1]).to_excel(writer,'sheet111',startrow=7,startcol=7)
pd.DataFrame(userProfile,index=[1]).to_excel(writer,'sheet123',startrow=0,startcol=0)
writer.save()
pd.DataFrame(userProfile,index=[1]).to_excel(writer,'sheet123',startrow=3,startcol=3)
writer.save()

all can be added to text.xlsx.
https://github.com/pandas-dev/pandas/issues/3441

@Themanwithoutaplan

This comment has been minimized.

@jgonzale

This comment has been minimized.

Copy link

commented Mar 9, 2017

@ankostis, @aa3222119 when I follow the steps you comment, I always reach the following error:

Traceback (most recent call last):
File "./name_manipulation.py", line 60, in
df.to_excel(excel_writer, 'iadatasheet', startcol=0, startrow=5, columns=['codes', 'Zona Basica de Salud', month+"-"+year], index=False)
File "/Users/jgonzalez.iacs/Projects/SIIDI/PYTHON_ETLs/venv/lib/python3.4/site-packages/pandas/core/frame.py", line 1464, in to_excel
startrow=startrow, startcol=startcol)
File "/Users/jgonzalez.iacs/Projects/SIIDI/PYTHON_ETLs/venv/lib/python3.4/site-packages/pandas/io/excel.py", line 1306, in write_cells
wks = self.book.create_sheet()
AttributeError: 'str' object has no attribute 'create_sheet'

So, there is not solution yet, right?

Thanks

@ankostis

This comment has been minimized.

Copy link

commented Mar 9, 2017

Maybe the API has changed - it definitely worked back then.

@Themanwithoutaplan

This comment has been minimized.

Copy link
Contributor

commented Mar 9, 2017

@jgonzale which engine are you using?

@aa3222119

This comment has been minimized.

Copy link

commented Mar 10, 2017

@jgonzale by what python said , your excel_writer.book maybe just a str but not a workbook?

@jgonzale

This comment has been minimized.

Copy link

commented Mar 10, 2017

@aa3222119 Oh geez! You were right! Messing around with very similar names!

Thank you very much! 👏 👏 👏

@wxl3322335

This comment has been minimized.

Copy link

commented Apr 3, 2017

thank you very much!

@BLMeltdown

This comment has been minimized.

Copy link

commented Jun 12, 2017

Hello
I have some use case where it would be useful:
Even with the ExcelWriter trick as:

with ExcelWriter('foo.xlsx') as writer:
df.to_excel(writer, 'Data 0')
df2.to_excel(writer, 'Data 1')

you can't add a plot that you need without saving the file and reopening it. With the risk of meddling with any formatting you have in the workbook.

There is indeed the workaround to use the plotting functions from pandas to save these in the files, but (there is a but), when you need something a little more sophisticated like showing a PCA components graph you built from scikitlearn PCA and matplotlib, then it becomes tedious.

Hence
a pandas.nondf_manager (non df object or filename).to_excel(usual syntax)
would be exceedingly fine.
Thanks.

@orbitalz

This comment has been minimized.

Copy link

commented Jul 24, 2017

I don't know how it is possible, however, it works for me

create_excel = 0
if plot_spectra != 0:
    for x in range(min_sigma, max_sigma, step_size):
        # apply gaussian
        df1 = gaussian_filter(df, sigma=x, mode=padding_mode)
        df2 = pd.DataFrame(df1)
        if save_file:
            if save_csv:
                df2.to_csv('{} {}{}.csv'.format(Output_file, 'sigma_', x,))
            if save_xlsx:
                if os.path.isfile('{}.xlsx'.format(Output_file)):
                    print("Warning! Excel file is exist")
                    break
                if create_excel == 0:
                    xlsx_writer = pd.ExcelWriter('{}.xlsx'.format(Output_file), engine='xlsxwriter')                                          
                    create_excel += 1
                df2.to_excel(xlsx_writer, '{}{}'.format('sigma_', x))
                if x == max_sigma-1:
                    xlsx_writer.save()

At the end, I got the excel file which have several work sheets.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Jul 27, 2017

@orbitalz you are creating an excel file the first time (xlsx_writer = pd.ExcelWriter(..)), and then adding multiple sheets to that file object. That is supported, but this issue is about adding sheets to an existing excel file.

@orbitalz

This comment has been minimized.

Copy link

commented Aug 1, 2017

I'm sorry for misunderstanding the topic and Thank you for pointing me out :)

@tlysecust

This comment has been minimized.

Copy link

commented May 23, 2018

@orbitalz You solve my problem ,but I don't known how it works

@WillAyd WillAyd referenced this issue May 29, 2018

Merged

Append Mode for ExcelWriter with openpyxl #21251

4 of 4 tasks complete

@jreback jreback modified the milestones: Someday, 0.24.0 Jun 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.