Excel output in non-ascii encodings #3710

jtornero · 2013-05-29T23:06:10Z

First, I would like to thank the developers for this excellent work.

My patch is about the possibility of specify the workbook encoding when making output to excel xls files. By default, pandas hasn't it, so say-so non ascii language speakers weren't able to save files with special characters, receiving an error like, for instance:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

My change makes possible to specify both in ExcelWriter and DataFrame.to_excel the parameter encoding which is passed by both the ExcelWriter init and DataFrame.to_excel functions to the appropiate xlwt function to make the trick.

I'm sorry my explanations aren't very proffesional, but I think they are enough for this purpose.

Also I have to apologize in the sense that I've run the tests with nosetests and I lack of knowledge about tests interpretation, but I have checked the log and no error comes from the modules/files I've worked with.

Best regards,

Jorge Tornero

…using xlwt

…using xlwt, also for output without ExcelWriter - just filename

…using xlwt, also for output without ExcelWriter - just filename - final

jreback · 2013-05-29T23:55:19Z

@jtornero this is nice, but a couple of things.

the argument encoding needs to be moved to the end of the named arguments (as its a new argument)
needs some tests, e.g. look at _check_extension_mixed in test_excel.py; at the very least need to create a frame, write it with an encoding, then read it back and check for equality

jreback · 2013-05-31T22:27:32Z

@jtornero

couple of things

don't include the test output (test2.txt) as a file
put the tests in test_excel (don't create another test file)
don't include extra files in the repo, just write them out and then read them back in to test
(if you really need say 1 file to make sure the encoding is correct, ok)
is there a reason that the encoding should even have a default? shouldn't it be empty? (e.g. let the system encoding determine it), unless overriden by the user?

jreback · 2013-05-31T22:29:13Z

also...need to rebase off of master

git br your-branch --set-upstream master
git checkout master

git pull

git checkout your-branch
git pull --rebase

then merge conflcts

the excel stuff has been moved to pandas.io.excel (tests are in the same location)....these should show up as a merge conflict

fix and proceed

thanks

jtornero · 2013-06-01T01:11:21Z

@jreback

First, I think I should learn this git stuff a bit before doing the big mess. I have to recognize that git overwhelmes me a little.

I tried to do what you said and no way. Maybe it is better for me (and for all) to start in git from scratch (delete my fork and start again)

Anyway, about your suggest has two optionals parameters:ions (will be followed in the future)

I won't include the test results file in the future. I thought that maybe it will be useful for someone interested in the same issue
I created a separate test file apart from test_excel because (AFAIK) I must specify the # -- coding: -- specified in PEP 0263, taking into account that my non-ascii sample data must contain non ascii characters. I don't know if it a better idea to have such data in a file apart, as I think you suggest in your point 3), load it, and make the savings/conversions
In this line, the extra files are those generated by the test itself. I guess I should provide a way to get rid of them after test completion, shouldn't I?
pandas.ExcelWriter/.toExcel makes use of xlwt.Workbook(). This class has an optional parameter encoding, of which default is ascii and it what makes the magic here. As far as my python knowledge lets me go, I need to pass that parameter to ExcelWriter to be passed to xlwt_Workbook. And all the say-so ascii writer have it already made, I think.

Best regards,

Jorge Tornero

jreback · 2013-06-03T00:47:14Z

@jtornero the encoding my default in xlwt.Workbook(), but encoding=None should be the default in to_excel. so the idea is to test the following cases

encoding=None, the current behavior and the default
encoding='ascii', should equal the above case
encoding='another encoding'

these are easy enough to test without an actual data file; this will temporarily create a file and clean it up when done.

you will get read_excel if you rebase off of master (or start over), up to u, git is an animal!

with ensure_clean('test_encoding.xls') as path:
    df.to_excel(path,encoding=whatever)
    result = read_excel(path,encoding=whatever)
    assert_frame_equal(result,df)

jreback · 2013-08-23T01:58:44Z

@jtornero how's this coming along?

jtornero · 2013-08-28T08:41:16Z

Hello again. Sorry for the time been away. I'll be on holiday in a week, and I think I'll have some spare time to try to cope with it. I really need to start over with my git forking etc., because I made sort of mess with my repository. Maybe it's a big thing for my level of knowledge, but I'll try to do it again.

Best regards

jreback · 2013-09-26T00:52:14Z

ping!

jtornero · 2013-09-28T12:21:16Z

Yes, I'm still alive. I've started again and I'm going to try to make it work this weekend. Sorry.

jtornero · 2013-09-28T18:37:47Z

Well I've made it and it passes the tests. But I can't manage to commit my changes in this *****+ of git. I know it's my fault but I am sooooo disappointed... need help about this. Also, the changes are so simle after the new code of excel output. But I'm so stuck... guess I'll be able to put my changes here some time.

I don't know if it's the thing about Travis IC or what, but 've done this:

git clone https://github.com/jtornero/pandas.git

git checkout -b excelout_encodings

after my changes:

git add .

git status

I've got this:

On branch excelout_encodings
Changes to be committed:
(use "git reset HEAD ..." to unstage)

   modified:   pandas/core/frame.py
   modified:   pandas/io/excel.py
   modified:   pandas/io/tests/test_excel.py

Then
git commit -a -m "#3710 ENH/TST: Support for non-ascii encodings in DataFrame.to_excel"

and then:

git push origin master

Well... then nothing happens. I guess I should be able to see my commit in my repository... but nothing happens!!!

I don't know how to do it... maybe it is related to Travis IC?

Thank yo very much

jtratner · 2013-09-28T18:42:07Z

@jtornero it looks like you aren't making that many changes. Might be easier to go back and checkout pydata/master then apply just your changes to it.

jreback · 2013-09-29T23:05:41Z

@jtornero I believe the #5025 is the current PR, this should be closed, yes? (as its not an issue but another PR for the same issue)

jtornero · 2013-09-30T09:36:46Z

I guess everything could be "assigned" to #3710, but I had to re-git everything and started an new pull request, hence the #5025... Sorry for my unskillfullness

jreback · 2014-03-12T13:01:15Z

merged via 268ee80

jtornero added 3 commits May 29, 2013 23:46

Added support for encodings other than ascii when exporting to Excel …

d669e5c

…using xlwt

Added support for encodings other than ascii when exporting to Excel …

5b539ea

…using xlwt, also for output without ExcelWriter - just filename

Added support for encodings other than ascii when exporting to Excel …

8bf91ff

…using xlwt, also for output without ExcelWriter - just filename - final

Style changes and test for excel output with encoding

4a6e21e

jtornero mentioned this pull request Sep 29, 2013

ENH: Excel output in non-ascii encodings #5025

Closed

jreback closed this Sep 30, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Excel output in non-ascii encodings #3710

Excel output in non-ascii encodings #3710

jtornero commented May 29, 2013

jreback commented May 29, 2013

jreback commented May 31, 2013

jreback commented May 31, 2013

jtornero commented Jun 1, 2013

jreback commented Jun 3, 2013

jreback commented Aug 23, 2013

jtornero commented Aug 28, 2013

jreback commented Sep 26, 2013

jtornero commented Sep 28, 2013

jtornero commented Sep 28, 2013

jtratner commented Sep 28, 2013

jreback commented Sep 29, 2013

jtornero commented Sep 30, 2013

jreback commented Mar 12, 2014

Excel output in non-ascii encodings #3710

Excel output in non-ascii encodings #3710

Conversation

jtornero commented May 29, 2013

jreback commented May 29, 2013

jreback commented May 31, 2013

jreback commented May 31, 2013

jtornero commented Jun 1, 2013

jreback commented Jun 3, 2013

jreback commented Aug 23, 2013

jtornero commented Aug 28, 2013

jreback commented Sep 26, 2013

jtornero commented Sep 28, 2013

jtornero commented Sep 28, 2013

jtratner commented Sep 28, 2013

jreback commented Sep 29, 2013

jtornero commented Sep 30, 2013

jreback commented Mar 12, 2014