Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting a column #204

Closed
drorata opened this issue Dec 24, 2014 · 7 comments
Closed

Formatting a column #204

drorata opened this issue Dec 24, 2014 · 7 comments

Comments

@drorata
Copy link

drorata commented Dec 24, 2014

I fail to format a column when exporting from pandas.DataFrame. Consider the following minimal example:

import pandas as pd
from pandas import ExcelWriter
df =pd.DataFrame([[123456,2],[3,4]], columns=['First column', 'Second column'])
writer = ExcelWriter('test.xlsx', engine = 'xlsxwriter')
df.to_excel(writer)
formater = writer.book.add_format({'num_format': '#,##0'})
sheet = writer.book.worksheets()[0]
sheet.write('F5',1234.8888, formater)
sheet.set_column(1,1, 10, formater)
writer.save()

As you can note, the number in cell F5 is well-formatted, but the number in B2 (in column 1) is not formatted (and the width of the column is not changed).

In contrast, a similar approach works when it doesn't come from Pandas:

import xlsxwriter

# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('Expenses01.xlsx')
worksheet = workbook.add_worksheet()

# Some data we want to write to the worksheet.
expenses = (
    ['Rent', 1000],
    ['Gas',   100],
    ['Food',  300],
    ['Gym',    50],
)

# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0

# Iterate over the data and write it out row by row.
for item, cost in (expenses):
    worksheet.write(row, col,     item)
    worksheet.write(row, col + 1, cost)
    row += 1

worksheet.set_column(1,1,10,workbook.add_format({'bold': True}))
workbook.close()

What am I doing wrong?

Thanks in advance,
Dror

@jmcnamara
Copy link
Owner

Hi Dror,

Thank you for the detailed report and examples. The set_column() method is working. Consider the following variation on your first program:

import pandas as pd
from pandas import ExcelWriter

df = pd.DataFrame([[123456, 2], [3, 4]],
                  columns=['First column', 'Second column'])

writer = ExcelWriter('test.xlsx', engine='xlsxwriter')

df.to_excel(writer)

formater = writer.book.add_format({'num_format': '#,##0'})
sheet = writer.book.worksheets()[0]

sheet.write('B5', 1234.8888)
sheet.set_column(1, 1, 40, formater)
writer.save()

And output:
screenshot

The issue is that pandas already applies a format to the numbers in cells B1 and B3. In cases like this XlsxWriter behaves like Excel. Cell formatting overwrites Row formatting which overwrites Column formatting. So since there is already a cell format you won't get the column format.

Note however that cell B5 does have column format.

It is possible to specify the cell number format in pandas but that will apply to all cells with data.

Apart from that, or using XlsxWriter outside of pandas, there isn't a workaround.

Regards,

John

@drorata
Copy link
Author

drorata commented Dec 29, 2014

So set_column sets the width correctly (because Pandas didn't set it on a cell-base-level) but it fails to set the formatting since Pandas did ti on the cell-level-base. Is my understanding correct?

Can you provide some link to the Pandas documentation on the matter? Where is it setting the format of a cell?

@jmcnamara
Copy link
Owner

@drorata

So set_column sets the width correctly (because Pandas didn't set it on a cell-base-level) but it fails to set the formatting since Pandas did ti on the cell-level-base. Is my understanding correct

Yes. Pandas sets a cell level format and in Excel cell format overrides row format overrides column format.

Can you provide some link to the Pandas documentation on the matter? Where is it setting the format of a cell?

I was wrong about that. Pandas only sets a cell format for headers and for cells that contain either a datetime.date or datetime.datetime. So in your case the column format should be applied to the cells.

The reason that it isn't is due to a bug/feature: all cells in the Pandas XlsxWriter handler have a default general format applied to them. And since I wrote the XlsxWriter code in Pandas it is my bug. :-(

I'll submit a bug report and fix to Pandas. In the meantime you can workaround it by patching your local copy of the _XlwtWriter._convert_to_style() method in pandas/io/excel.py as follows:

diff --git a/pandas/io/excel.py b/pandas/io/excel.py
index e81c279..637937d 100644
--- a/pandas/io/excel.py
+++ b/pandas/io/excel.py
@@ -815,6 +815,10 @@ class _XlsxWriter(ExcelWriter):
         num_format_str: optional number format string
         """

+        # If there is no formatting we don't create a format object.
+        if num_format_str is None and style_dict is None:
+            return None
+
         # Create a XlsxWriter format object.
         xl_format = self.book.add_format()

I'll post an update here when I submit a fix to Pandas.

John

@drorata
Copy link
Author

drorata commented Dec 29, 2014

Wonderful! Worked like a charm.

@jmcnamara
Copy link
Owner

Hi Dror,

I've submitted a bug report and patch to pandas: pandas-dev/pandas#9167

I'll close this issue since it isn't strictly an XlsxWriter issue and you can add a +1 to the pandas issue so that you get an update when it is closed.

Thanks for the report.

John

@jmcnamara
Copy link
Owner

This is fixed and merged in pandas now.

@drorata
Copy link
Author

drorata commented Jan 6, 2015

Great news! Thanks for the effort.
On Jan 6, 2015 2:23 AM, "John McNamara" notifications@github.com wrote:

This is fixed and merged in pandas now.


Reply to this email directly or view it on GitHub
#204 (comment)
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants