use universal newline mode in csv module examples #52634

Closed
sfinnie mannequin opened this issue Apr 13, 2010 · 11 comments
Labels
docs (Documentation in the Doc dir), type-feature (A feature request or enhancement)

Comments

@sfinnie
Mannequin

sfinnie mannequin commented Apr 13, 2010

BPO 8387
Nosy @birkenfeld, @terryjreedy, @pitrou, @bitdancer, @serhiy-storchaka
Files
  • test_csv.py
  • test.csv
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-05-31.13:41:07.882>
    created_at = <Date 2010-04-13.21:11:23.660>
    labels = ['type-feature', 'docs']
    title = 'use universal newline mode in csv module examples'
    updated_at = <Date 2020-05-31.13:41:07.881>
    user = 'https://bugs.python.org/sfinnie'

    bugs.python.org fields:

    activity = <Date 2020-05-31.13:41:07.881>
    actor = 'serhiy.storchaka'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2020-05-31.13:41:07.882>
    closer = 'serhiy.storchaka'
    components = ['Documentation']
    creation = <Date 2010-04-13.21:11:23.660>
    creator = 'sfinnie'
    dependencies = []
    files = ['34981', '34982']
    hgrepos = []
    issue_num = 8387
    keywords = []
    message_count = 11.0
    messages = ['103086', '113190', '113221', '216887', '216888', '216892', '216894', '216904', '216909', '221627', '370455']
    nosy_count = 8.0
    nosy_names = ['georg.brandl', 'terry.reedy', 'pitrou', 'r.david.murray', 'jesstess', 'sfinnie', 'docs@python', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue8387'
    versions = ['Python 2.7']

    @sfinnie
    Mannequin Author

    sfinnie mannequin commented Apr 13, 2010

    Running the examples in the csv module docs (http://docs.python.org/library/csv.html) causes problems reading files on a Mac. This is highlighted in bpo-1072404 (http://bugs.python.org/issue1072404).

    Commentary on that bug indicates it will not be fixed, meaning many people using a Mac will get an error if they use the sample code in the docs.

    A simpler solution would be to use universal newline mode in the doc examples. This is actually mentioned in commentary on the bug, and appears to work.

    Proposal
    --------
    In all example code blocks, use mode 'rU' when opening the file. The first code block, for example, would become:

    spamReader = csv.reader(open('eggs.csv', 'rU'), delimiter=' ', quotechar='|')

    That should solve the problem on mac without impacting compatibility on other operating systems. Note: Haven't been able to verify this on other platforms.
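For readers on Python 3, where the 'U' flag is not the way to request this behaviour, the same idea can be sketched with `open(..., newline=None)`, which enables universal-newline translation. This is only an illustration of the proposal: the helper names and the sample data are hypothetical, and the Python 3 csv docs themselves recommend `newline=''` rather than translation.

```python
import csv
import os
import tempfile


def read_rows_universal(path):
    """Read CSV rows with universal-newline translation enabled."""
    # newline=None (the default in Python 3) translates CR, LF and CRLF
    # line endings to LF before the csv reader sees each line.
    with open(path, "r", newline=None) as f:
        return list(csv.reader(f, delimiter=" ", quotechar="|"))


def demo_classic_mac_file():
    """Write a file with CR-only line endings and read it back."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        # Hypothetical sample data, written as bytes so the CR endings
        # reach the disk unchanged on every platform.
        with os.fdopen(fd, "wb") as f:
            f.write(b"Spam Spam |Baked Beans|\rSpam |Lovely Spam|\r")
        return read_rows_universal(path)
    finally:
        os.remove(path)


rows = demo_classic_mac_file()
print(rows)
```

The file is written in binary mode only so that the test data keeps its CR-only line endings; the read side is the part that corresponds to the proposal.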

    @sfinnie sfinnie mannequin assigned birkenfeld Apr 13, 2010
    @sfinnie sfinnie mannequin added the docs (Documentation in the Doc dir) and type-feature (A feature request or enhancement) labels Apr 13, 2010
    @terryjreedy
    Member

    In the current 2.7 docs, files are opened with 'rb' or 'wb'.
    In msg106210 of bpo-1072404, RDM says "The doc has been fixed;".
    I am not sure if this refers to a change in the open() call or just to removal of the reference to the non-working delimiter option.
    David?
    Any opinion on this request?

    @terryjreedy terryjreedy assigned docspython and unassigned birkenfeld Aug 7, 2010
    @bitdancer
    Member

    "The doc has been fixed" refers to the fact that the lineterminator dialect option is now documented as applying only to writing, not to reading.

    The docs could certainly be improved to discuss using universal newline mode. I'm not clear on whether or not there are disadvantages to using universal newline mode with the py2 version of the csv module, but I wouldn't be surprised if there are. Perhaps Skip can comment on whether changing the examples to use rU would be a good idea or not.

    Note that the situation for the py3k csv module is different, and it would be helpful if someone could test this issue there. Though in truth we have no resources to support non-OSX macs any longer, so if it doesn't work it may be just tough luck.

    @jesstess
    Member

    I ran some experiments to see what the state of the world is. I generated a test.csv by exporting a CSV file from Numbers on OSX. This generated a file with Windows-style \r\n-terminated lines. The attached test_csv.py tries to open this CSV file in binary and universal newlines modes. Here's what happens on various platforms:

    Python 3:

    • Linux: both binary and universal work
    • OSX: binary errors out, universal works
    • Windows: binary errors out, universal works

    In both cases, the error was:

    $ python3 test_csv.py
    Traceback (most recent call last):
      File "test_csv.py", line 5, in <module>
        for row in spamreader:
    _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

    Python 2:

    • Linux: both binary and universal work
    • OSX: both binary and universal work
    • Windows: wasn't readily able to test

    If I manually create a CSV file using TextEdit in plaintext mode on OSX, that produces a file with Mac-style \r-terminated lines. test_csv.py has the same results on this file on OSX (errors out in binary mode in Python 3).
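The binary-versus-text difference reported above can be reproduced with a short script. This is a minimal sketch, not the attached test_csv.py: the `try_read` helper and the sample data are hypothetical, and it assumes a current Python 3, where `csv.reader` rejects bytes on every platform.

```python
import csv
import os
import tempfile


def try_read(path, mode, **open_kwargs):
    """Return the parsed rows, or the csv.Error raised while iterating."""
    try:
        with open(path, mode, **open_kwargs) as f:
            return list(csv.reader(f))
    except csv.Error as exc:
        return exc


fd, sample_path = tempfile.mkstemp(suffix=".csv")
try:
    # Hypothetical data with Windows-style CRLF line endings.
    with os.fdopen(fd, "wb") as f:
        f.write(b"name,count\r\nspam,3\r\n")
    # Binary mode hands bytes to the reader, which Python 3 refuses.
    binary_result = try_read(sample_path, "rb")
    # Text mode with newline='' is what the Python 3 csv docs recommend.
    text_result = try_read(sample_path, "r", newline="")
finally:
    os.remove(sample_path)

print(type(binary_result).__name__, binary_result)
print(text_result)
```

The exact wording of the error message differs between Python versions, so the sketch only relies on the exception type.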

    @jesstess
    Member

    All of the examples from https://docs.python.org/3/library/csv.html run without issue on OSX, though.

    In summary, the Python 2 examples error out on OSX and switching them to use 'U' instead of 'b' would fix this. I don't think any action needs to be taken for Python 3.

    My one remaining question is about binary files on Windows. The Python 2 csv docs say "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference." I don't readily have a Windows machine to play with this -- do "binary" CSV files exist, or can we eliminate the 'b' language entirely and just talk about 'U'?

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Apr 20, 2014

    I think that it's complete nonsense to talk about binary csv files on Windows. They are just plain text files that can be manipulated with any old editor or a spreadsheet.

    @bitdancer
    Member

    The magic of newline='' in Python 3 is that it *preserves* the line end characters, which is the same thing binary mode does on Windows. The place that matters, as I remember it, is when there is a newline embedded inside a quoted string. I don't remember *why* that matters, though :(. But it had something to do with how the csv module processes the data internally.
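One way to see the embedded-newline case is to read the same bytes twice, once with `newline=''` and once with universal-newline translation. The sketch below uses hypothetical data and a hypothetical `read_with` helper; it only shows what reaches the caller in each mode and makes no claim about the csv module's internals.

```python
import csv
import os
import tempfile

# Hypothetical data: the quoted second field contains an embedded CRLF.
RAW = b'id,note\r\n1,"first line\r\nsecond line"\r\n'


def read_with(newline):
    """Write RAW to a temporary file and parse it with the given newline mode."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(RAW)
        with open(path, "r", newline=newline) as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


# newline='' passes line endings through untouched, so the embedded CRLF
# inside the quoted field survives.
preserved = read_with("")
# newline=None translates every CRLF to LF before the reader sees it,
# including the one inside the quoted field.
translated = read_with(None)

print(repr(preserved[1][1]))
print(repr(translated[1][1]))
```

Both modes split the records the same way here; the difference is only in the content of the quoted field, which matters when the data has to round-trip unchanged.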

    @pitrou
    Member

    pitrou commented Apr 20, 2014

    Note that 'U' is a no-op under Python 3, it's just there for compatibility reasons; i.e. 'rU' is the same as 'r'.

    Also, from a quick glance, the CSV parser in _csv.c looks newline-agnostic.

    @sfinnie: can you explain which problems you encountered running the examples? Please also post the resulting exception tracebacks, if any.

    @jesstess
    Member

    I realized that I typo'd 2 instead of 3 in http://bugs.python.org/issue8387#msg216888 which makes that message confusing. Here's a restatement of my findings:

    • All of the Python 3 csv examples work in Python 3 on all platforms.
    • The Python 2 binary-mode csv examples work in Python 2.7 on all platforms.
    • The Python 2 binary-mode csv examples error out on Windows and OSX when run under Python 3. We could do nothing to address this, or, if we determine that there's no negative impact to removing the 'b', update the examples to accommodate readers who are running Python 2 examples using Python 3 for whatever reason.

    Which does bring me to the same question as @pitrou, which is what data and code cause an error for @sfinnie on Python 2. :)

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Jun 26, 2014

    @sfinnie can we please have a response to the question first asked by Antoine and repeated by Jessica, thanks.

    @serhiy-storchaka
    Member

    Python 2.7 is no longer supported.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022