Why is interacting with files using rb and wb safer? #15

Closed
rdmurphy opened this issue Mar 9, 2015 · 7 comments

@rdmurphy

rdmurphy commented Mar 9, 2015

We had this come up in our session – people wanted to know why using the b was safer. It'd be nice for us to actually be able to explain that beyond just saying, "Everybody does it! Trust us." 😄

(Which is basically what we did.)

@chrislkeller

+1 indeed. I'd forgotten this came up last year until we ran into it again this year.

@hbillings

According to Serdar, who is the one who pounded this into @esagara and me, nothing unexpected happens if you read/write a regular file with the binary option, but if you try to read/write a binary file without the binary option, it causes Bad Things. So it's safer just to always use "rb" and "wb." (Note that I haven't researched this myself, so perhaps there's something else to it, but I tend to trust anyone who smokes cigars with me.)

@zstumgoren

The binary flag for reads and writes helps avoid corrupting binary files (e.g. jpeg) on Windows machines, but I've found that it also helps avoid cross-platform headaches when transferring text files between *nix and Windows. I'll admit I can't explain the precise reasons (I'm guessing it's because of differences in the newline character), but I've noticed the binary flag occasionally resolves problems on Windows when shuffling text files between operating systems. @hbillings That's why we decided a while back it'd just be safer to always use it across the board (and pound it into the heads of all our poor unfortunate students :)
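
For anyone who wants to see the newline issue concretely, here's a minimal sketch in Python 3. It assumes the guess above is right that newline translation is the culprit, and it passes `newline="\r\n"` to imitate, on any OS, what Windows text mode does by default. The file names and helper functions are made up for illustration.

```python
import os
import tempfile


def write_text_windows_style(path, text):
    # newline="\r\n" imitates Windows text mode: each "\n" is written as "\r\n".
    with open(path, "w", newline="\r\n") as f:
        f.write(text)


def write_binary(path, data):
    # Binary mode writes the bytes exactly as given, on every platform.
    with open(path, "wb") as f:
        f.write(data)


def read_binary(path):
    # Read the file back as raw bytes so no translation hides what is on disk.
    with open(path, "rb") as f:
        return f.read()


tmp_dir = tempfile.mkdtemp()
text_path = os.path.join(tmp_dir, "text_mode.txt")      # hypothetical file
binary_path = os.path.join(tmp_dir, "binary_mode.txt")  # hypothetical file

write_text_windows_style(text_path, "a\nb\n")
write_binary(binary_path, b"a\nb\n")

text_mode_bytes = read_binary(text_path)      # b"a\r\nb\r\n": two bytes were added
binary_mode_bytes = read_binary(binary_path)  # b"a\nb\n": unchanged
```

In a text file the extra `\r` is usually harmless. In a JPEG, a `0x0A` byte is just image data, so the same translation would quietly corrupt the file.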

That said, this might be causing more headaches than it's worth in a teaching context (I know we got questions about it every year as well). You might consider dropping it. If and when folks get bit by this in their Python careers, no doubt they'll be able to sort through it by pinging you all or PyJournos :)

Just be sure to test those scripts and data ahead of class if you wind up using Windows machines again!

@tommeagher

So in porting the exercises to Python3, I found that trying to read the text files with "rb" was throwing an error. I had to remove the "b" to get it to work. Does anyone know if Python3 handles the binary flag differently for text files? Is this no longer a concern?
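
A sketch of one likely way to hit this error, assuming the exercises hand the opened file to the `csv` module (the thread doesn't say which call actually failed). The sample file and its contents are hypothetical.

```python
import csv
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.csv")  # hypothetical file

# Create a small CSV to read back.
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["name", "city"], ["Ada", "London"]])

# Python 2 habit: open in binary mode. In Python 3 the csv module
# expects strings, so iterating over a bytes file raises csv.Error.
binary_error = None
with open(path, "rb") as f:
    try:
        next(csv.reader(f))
    except csv.Error as exc:
        binary_error = exc

# Python 3: open in text mode, with newline="" as the csv docs recommend.
with open(path, "r", newline="") as f:
    rows = list(csv.reader(f))
```

After this runs, `binary_error` holds the exception from the `"rb"` attempt and `rows` holds the parsed lines from the text-mode read.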

@zstumgoren

@tommeagher I got bit by this too. The "short" answer is that text vs. binary data handling is saner in Python3 -- in a way that now requires the usage of the binary mode only when you actually need bytes of data (rather than encoded characters). In olden 2.x days, the 'b' was often used to ensure file reads worked properly on certain platforms such as Windows, although the flag was generally ignored on Unix-like systems.

A more thorough explanation is here: http://python3porting.com/preparing.html#separate-binary-data-and-strings
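
To make the bytes-vs-characters distinction concrete, here's a small sketch in Python 3. The file name and the UTF-8 encoding are assumptions chosen for the example.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "cafe.txt")  # hypothetical file

# Write one word containing a non-ASCII character, encoded as UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write("café")

# Binary mode returns bytes: the raw, still-encoded data on disk.
with open(path, "rb") as f:
    raw = f.read()

# Text mode returns str: the bytes decoded into characters.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Decoding the bytes by hand gives the same string as text mode.
decoded = raw.decode("utf-8")
```

Here `raw` is a `bytes` object of length 5 (the `é` takes two bytes in UTF-8), while `text` is a `str` of length 4. Code that expects one type and gets the other is what tends to break when a Python 2 script using `"rb"` everywhere is ported.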

@zstumgoren

This is also helpful imho: https://docs.python.org/3/library/functions.html#open

@tommeagher

@zstumgoren got it. This is really helpful. So it seems then this discussion is now moot. Thanks for the advice!
