Unicode handling in `to_latex`. Needs encoding? #7061

Closed
jseabold opened this Issue May 6, 2014 · 10 comments

Contributor

jseabold commented May 6, 2014

I can't seem to get this one to work and to_latex doesn't allow a user-specified encoding. I think this might need a look.

I have the data as unicode, so I tried passing it in directly.

pd.DataFrame([[u'au\xdfgangen']]).to_latex('test.tex')

Nope. Ok, so let's encode it as a utf-8 string

pd.DataFrame([[u'au\xdfgangen']]).apply(lambda x : x.str.encode('utf-8')).to_latex('test.tex')

Nope. It looks like it's getting coerced back to unicode in formatter._to_str_columns() then tries to write it as ASCII...

jseabold added the Data IO label May 6, 2014

Contributor

jseabold commented May 6, 2014

Can pass a StringIO instance to buf, then encode and write this yourself as a workaround.
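For anyone landing here, a minimal sketch of that workaround. It avoids depending on pandas, so `render_table` is a hypothetical stand-in for a call like `df.to_latex(buf)`, and `write_encoded` is just an illustrative helper name; the point is only that the text is collected in memory and encoded explicitly before it touches the file.

```python
import io
import os
import tempfile


def render_table(buf):
    # Hypothetical stand-in for DataFrame.to_latex(buf):
    # it writes unicode text into the buffer it is given.
    buf.write(u"\\begin{tabular}{l}\n")
    buf.write(u"au\xdfgangen \\\\\n")
    buf.write(u"\\end{tabular}\n")


def write_encoded(path, encoding="utf-8"):
    # Collect the output in memory as unicode text.
    buf = io.StringIO()
    render_table(buf)
    # Encode explicitly, then write raw bytes so no implicit
    # ASCII conversion can happen.
    data = buf.getvalue().encode(encoding)
    with open(path, "wb") as f:
        f.write(data)
    return data


path = os.path.join(tempfile.mkdtemp(), "test.tex")
written = write_encoded(path)
```

With a real frame, the `render_table(buf)` call would presumably be replaced by passing the same buffer to `to_latex`.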

Contributor

TomAugspurger commented May 7, 2014

Works correctly in python 3 as well.

I've got a fix that seems to work for python 2. Changing

            with open(self.buf, 'w') as f:
                write(f, frame, column_format, strcols, longtable)

to

            import codecs
            with codecs.open(self.buf, 'wb', encoding=encoding) as f:
                write(f, frame, column_format, strcols, longtable)

along with adding an encoding kwarg to to_latex (default to utf-8?). I haven't done much with unicode, so I'm still reading about it. Let me know if this seems wrong to you, or if it needs to be done elsewhere.
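To illustrate what the `codecs.open` change does in isolation, here is a small standalone sketch. `write_text` and its `encoding` default are assumptions for illustration only, not pandas code; the file object returned by `codecs.open` accepts unicode text and encodes it on write.

```python
import codecs
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    # Hypothetical helper: codecs.open wraps the file so that
    # unicode text is encoded with `encoding` as it is written.
    with codecs.open(path, "w", encoding=encoding) as f:
        f.write(text)


out_dir = tempfile.mkdtemp()
utf8_path = os.path.join(out_dir, "utf8.tex")
latin1_path = os.path.join(out_dir, "latin1.tex")

write_text(utf8_path, u"au\xdfgangen")
write_text(latin1_path, u"au\xdfgangen", encoding="latin-1")
```

The same string ends up as different bytes on disk depending on the encoding passed, which is why exposing the argument to the caller seems useful.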

Contributor

jseabold commented May 7, 2014

Yes, I'm slowly trying to move to python 3 partially for this reason.

That seems reasonable to me. I assumed that the other functions just encoded the unicode/string according to the given encoding, but I'm not sure.

The default in to_csv and friends is encoding=None. I assume it falls back to the default encoding for the locale, but I'm not positive on that. I started to check but I'm under a deadline right now.

jreback added this to the 0.15.0 milestone May 7, 2014

@jreback jreback modified the milestone: 0.16.0, Next Major Release Mar 3, 2015

Contributor

nbonnotte commented Nov 20, 2015

I just encountered the same problem with pandas 0.17, so I guess the fix has not been included?

@TomAugspurger do you intend to make a PR?

Contributor

TomAugspurger commented Nov 20, 2015

I never got around to submitting a pull request. Feel free to do so if you want! My fix above might work (would need to be tested), but it might be better to tie this in with how to_csv handles encodings (not sure, haven't looked).

Contributor

nbonnotte commented Nov 20, 2015

Good point. In the same vein, I just noticed that the decimal option is available for to_csv but not for to_latex...

I'll have a look.

Contributor

TomAugspurger commented Nov 20, 2015

There’s a possibility that we’ll be able to replace some of the to_latex code with a Jinja template, similar to the Style stuff. So don’t spend too much time on it :)


Contributor

nbonnotte commented Nov 28, 2015

to_csv uses csv.writer, with an adapter to intercept the output and convert it to utf-8. It would be possible to factor that code out so that it can be shared by to_csv and to_latex, but it would require a bit of work.

Considering this, your previous remark, and the simplicity of your fix, I'll just implement the fix you proposed. But the encoding parameter of to_csv defaults to ascii with Python 2 and to utf-8 with Python 3, so I'll do the same for to_latex.
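A rough sketch of that version-dependent default, assuming the behaviour described above (ascii on Python 2, utf-8 on Python 3). The function names are made up for illustration and are not the ones used in pandas.

```python
import sys


def default_encoding():
    # Assumption taken from the comment above:
    # ascii on Python 2, utf-8 on Python 3.
    return "utf-8" if sys.version_info[0] >= 3 else "ascii"


def resolve_encoding(encoding=None):
    # An explicit encoding always wins; None falls back to the default.
    return encoding if encoding is not None else default_encoding()
```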

Contributor

jreback commented Nov 29, 2015

@nbonnotte yes, this just requires an encoding argument. You may want to add a LatexFormatter (as a subclass of DataFrameFormatter), as this will allow some refactoring to be done internally later on.

@jreback jreback modified the milestone: 0.18.0, Next Major Release Jan 11, 2016

Contributor

jreback commented Jan 15, 2016

closed by #11914

jreback closed this Jan 15, 2016
