Python 3 writing to_csv file ignores encoding argument. #13068
Comments
you would have to show a reproducible example. what does this have to do with excel? you are reporting a csv issue, no? excel being able to read something doesn't prove (or disprove) anything. |
furthermore show |
@jreback updated issue to remove Excel problem |
so will still need a copy-pastable example. |
|
@jreback updated with copy pastable |
This looks to be a design flaw in all "io" outputs that take encodings and file objects on Python 3. |
what's the problem? |
works on 0.18.0 as well. |
The first call ignores the encoding... The first assert should fail
|
hmm, you are opening it in text mode. Not really sure if a stream indicates it's text or binary. I don't know that this is a bug on pandas side. Can you repro using non-pandas? |
If I open the file in binary mode, pandas tries to write str to the file
|
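A non-pandas sketch of the two points above, using only the standard library. It assumes nothing about pandas internals: it only shows that a binary stream rejects `str`, and that a text stream identifies itself through `io.TextIOBase` and encodes with its own `encoding`.

```python
import io

# A binary stream accepts bytes only; handing it str raises TypeError.
binary_stream = io.BytesIO()
try:
    binary_stream.write("a,b\n")
except TypeError as exc:
    error = exc

# A text stream is an io.TextIOBase instance and carries its own encoding.
# newline="" disables newline translation so the bytes are platform independent.
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="utf-8", newline="")
text_stream.write("caf\u00e9\n")
text_stream.flush()

text_is_text = isinstance(text_stream, io.TextIOBase)
binary_is_text = isinstance(binary_stream, io.TextIOBase)
written = raw.getvalue()  # encoded by the stream itself, here as UTF-8
```

Whatever writes `str` into `text_stream` has no say over the bytes that reach `raw`; only the stream's own `encoding` matters.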
can you show what happens? eg that would be the test |
#!/usr/bin/env python3 On 3 May 2016 19:57, "Jeff Reback" notifications@github.com wrote:
|
ahh I see now. ok, it prob needs to be opened with a codec, so when the stream is created it should be inserted there. since you are familiar, want to do a PR? |
looks really similar to #9712 |
@jaidevd what's the desired behaviour? Crash on passing a Unicode writer, or deprecate the encoding keyword argument in favour of passing Unicode writers only? |
no i think I would raise a more informative message. If a user wants to pass a non-compat stream (and we can't do anything with it), then must raise. most usage does not pass a stream when writing. |
@jreback so crash if a unicode accepting stream is passed, and raise an informative error. |
well if passed a non unicode accepting stream when an encoding is passed I guess. I don't think there is a way to fix it? raising an exception that is helpful is just fine. |
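One way the "raise a helpful exception" idea could look, as a rough sketch only. `check_stream_encoding` is a hypothetical helper, not a pandas function, and the rule it applies (reject a text stream whose own encoding differs from the requested one) is an assumption about what "informative" might mean here.

```python
import codecs
import io


def check_stream_encoding(stream, encoding):
    """Hypothetical guard: raise if a text stream cannot honour `encoding`."""
    if encoding is None or not isinstance(stream, io.TextIOBase):
        return  # no encoding requested, or not a text stream: nothing to check
    stream_encoding = getattr(stream, "encoding", None)
    if stream_encoding is None:
        raise ValueError(
            f"encoding={encoding!r} was requested, but the text stream has no "
            "encoding of its own; pass a binary stream instead"
        )
    # Normalise aliases such as 'latin1' and 'ISO-8859-1' before comparing.
    if codecs.lookup(stream_encoding).name != codecs.lookup(encoding).name:
        raise ValueError(
            f"encoding={encoding!r} was requested, but the stream was opened "
            f"with encoding={stream_encoding!r}; reopen it or pass a binary stream"
        )


utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin1")

try:
    check_stream_encoding(utf8_stream, "latin1")
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

check_stream_encoding(latin1_stream, "ISO-8859-1")  # same codec, no error
check_stream_encoding(io.BytesIO(), "latin1")       # binary stream, no error
```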
@jreback we need the to_csv and related functions to support: either binary file objects and the encoding argument; or unicode objects without the encoding argument. |
I thought that's what I said. |
@jreback I thought you meant the status quo: neither binary file objects and the encoding argument, nor unicode objects without the encoding argument. But with better exceptions. |
oh you are saying 2 issues. I didn't really look too closely. I am all for writing things correctly, or raising if it's incorrect. As I said I suspect we have very little testing on writing unicode with streams now (maybe no tests), esp with alternate encodings. This is quite uncommon. Would be ok with complete tests and write if possible, raising if not. |
so always write bytes, regardless of Python version. With nice exceptions when writing to unicode streams. |
no, I believe the existing impl in py2 is correct. Write out tests for all cases, test them under both versions and you will have the answer. |
Hi ! To be more specific, the problem comes from the following code (modified to focus on the problem and be copy pastable):
If run with python2, I actually get two files with different encoding:
But with python3, they both are utf-8 encoded:
I know magic only guesses the encoding, but this seemed a clear way of showing the difference. A better proof is to try to decode the text written in the files. Using python codecs module, you get: python2:
python3:
For the record, using LibreOffice Calc to try to open both files gives the same result: the file written with python3 using latin1 encoding cannot be opened properly when you specify latin1 encoding, it must be opened with utf-8 encoding to be displayed correctly. @graingert @jreback have you made any progress on this? For information, I use:
|
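The decoding check described in this comment can be sketched without any files. The sample string and the helper name `decodes_to` are illustrative assumptions; the point is only that bytes written as latin1 fail to decode as utf-8, while utf-8 bytes decode "successfully" as latin1 but to the wrong text.

```python
import codecs

text = "caf\u00e9"
latin1_bytes = text.encode("latin1")  # one byte for the accented character
utf8_bytes = text.encode("utf-8")     # two bytes for the accented character


def decodes_to(data, encoding, expected):
    """Return True only if `data` decodes cleanly to `expected`."""
    try:
        return codecs.decode(data, encoding) == expected
    except UnicodeDecodeError:
        return False


latin1_read_as_latin1 = decodes_to(latin1_bytes, "latin1", text)
latin1_read_as_utf8 = decodes_to(latin1_bytes, "utf-8", text)
utf8_read_as_latin1 = decodes_to(utf8_bytes, "latin1", text)
utf8_read_as_utf8 = decodes_to(utf8_bytes, "utf-8", text)
```

Comparing against the expected text matters because latin1 maps every byte to a character, so a latin1 decode never raises by itself.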
I am confused by this. You are doing If you want to write in latin1, why don't you just open the file in latin1?
prints
as expected. |
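A self-contained sketch of the "open the file in latin1" suggestion. It uses the standard library `csv` module in place of a DataFrame so it runs without pandas; the rows and the file name are made up for illustration. Anything that writes `str` to the handle ends up encoded by the handle.

```python
import csv
import os
import tempfile

rows = [["name"], ["caf\u00e9"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")

    # The encoding is chosen when the file is opened, not by the writer.
    with open(path, "w", encoding="latin1", newline="") as handle:
        csv.writer(handle, lineterminator="\n").writerows(rows)

    # Read the raw bytes back to see what actually landed on disk.
    with open(path, "rb") as handle:
        payload = handle.read()
```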
Hi @watercrossing My point is that you can specify an encoding in There may be no solution but it is confusing: you can set an option that is (quietly) not used. |
@anuraagmadhavRacherla please don't hijack unrelated issues. This question would be better suited for Stack Overflow. Also you've stripped all the useful debugging information from your exception (the stack trace). |
data_frame.to_excel will need 'wb' and to_csv will need 'w' for my part. pandas==0.22.0 |
Hi folks, I wrote an article on my blog on how to Support Binary File Objects with |
I expect:
To crash with
TypeError: write() argument must be str, not bytes
and I expect:
To write the file correctly.
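The expectation above can be illustrated with a small stand-in writer. `write_csv` is a hypothetical helper built on the standard library, not pandas code; it assumes the rule "if an encoding is given, emit bytes, otherwise emit str", which makes a text-mode target fail with `TypeError` and a binary target succeed.

```python
import csv
import io


def write_csv(rows, fileobj, encoding=None):
    """Hypothetical writer: bytes when `encoding` is given, str otherwise."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    text = buffer.getvalue()
    fileobj.write(text.encode(encoding) if encoding else text)


rows = [["name"], ["caf\u00e9"]]

# Binary target plus encoding: the file is written correctly.
binary_target = io.BytesIO()
write_csv(rows, binary_target, encoding="latin1")
binary_result = binary_target.getvalue()

# Text target plus encoding: write() receives bytes and raises TypeError.
text_target = io.TextIOWrapper(io.BytesIO(), encoding="utf-8", newline="")
try:
    write_csv(rows, text_target, encoding="latin1")
    text_error = None
except TypeError as exc:
    text_error = exc
```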
Copy pasta