Allow the csv module to follow RFC 4180 #132073

@sls1005

Feature or enhancement

Proposal:

I am not a lawyer, and I know it may seem strange to adopt something that is not a formal standard, but please read on.

Background

Around 2003-2004, the csv module was introduced into Python's standard library with PEP 305. At the time PEP 305 was proposed, the module's default CSV dialect, "excel," was defined as the CSV format exported by Excel 97 and Excel 2000. It was one of the module's two predefined dialects. The other was "excel-tab."

Since then, things have changed a lot. In 2005, RFC 4180, a specification that is not a formal standard, was published. Around 2006, the software that would later be called "Google Sheets" was released. About a year later, Apple released a new application called "Numbers."

Description

Today, the use of "excel" as the default dialect of Python's csv module, despite its historical origin, may be seen as non-neutral. In a more open and competitive world, there seems to be no reason to favor one specific product over Numbers, Google Sheets, LibreOffice Calc, or a publicly available specification on the internet.

Although "excel" is indeed a common English word that can be found in dictionaries, Python's use of it, as described above and in PEP 305, is strongly associated with a Microsoft product or products.

Google, Apple, and users of their products could view it as a non-neutral act that favors or promotes a Microsoft product in this competitive world, or at least as a sign that this module is intended to be used with such a product, or that the CSV format is strongly associated with such a product.

For ordinary users, it may read as a false guarantee that this module is, and always will be, compatible with such a product.

Finally, it might be seen as not universal or not portable enough. Even if the dialect is identical to RFC 4180, people would still think that it is specific to Excel rather than cross-platform. We have only three predefined dialects: two of them are named after Excel ("excel" and "excel-tab") and one is "unix." Today, people would say, "It's so good. I can export and import data from Excel." Someday in the future, people may instead say, "What is an excel?"

At the time the csv module was introduced, it might have seemed logical to name the default mode after a well-known product; twenty years later, this decision should be reviewed.

Twenty years later, which is more common, Python or Excel? Did Microsoft standardize the CSV format? Did Microsoft publish a formal CSV specification for us to follow? As developers of open source projects, should we tie our projects to the name of a proprietary product, or to that of a publicly available specification? Do the world's governments use RFC 4180, "excel," or "unix" as their official CSV format? Will Python continue to support current and future versions of Microsoft products? (I mean Excel, not Windows.) If so, is the predefined "excel" dialect subject to change if Microsoft changes its format tomorrow?

Solution

Create a distinct dialect object, called rfc4180, that strictly follows RFC 4180, and then make it the default. The specification, despite not being a formal standard, is the closest thing to a universal standard. There will be essentially no compatibility issue, as the new object will be almost identical to the excel dialect. This is more of a naming issue.
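To illustrate how small the change would be, here is a rough sketch of what such a dialect might look like if a user defined it today with the existing csv API. The class name `Rfc4180`, the registered name `"rfc4180"`, and the helpers `differences_from_excel` and `write_rows` are all made up for this example and are not part of the csv module; the attribute values are my reading of section 2 of RFC 4180, not an authoritative mapping.

```python
import csv
import io


class Rfc4180(csv.Dialect):
    """Hypothetical dialect; values are one reading of RFC 4180, section 2."""
    delimiter = ","               # fields are separated by commas
    quotechar = '"'               # fields may be enclosed in double quotes
    doublequote = True            # an embedded quote is escaped by doubling it
    skipinitialspace = False      # spaces are part of the field
    lineterminator = "\r\n"       # records end with CRLF
    quoting = csv.QUOTE_MINIMAL   # quote only the fields that need it


# Register under a hypothetical name so it can be selected by string.
csv.register_dialect("rfc4180", Rfc4180)

FIELDS = ("delimiter", "quotechar", "doublequote",
          "skipinitialspace", "lineterminator", "quoting")


def differences_from_excel():
    """Return the names of attributes whose values differ from csv.excel."""
    return [name for name in FIELDS
            if getattr(Rfc4180, name) != getattr(csv.excel, name)]


def write_rows(rows, dialect="rfc4180"):
    """Serialize rows to a CSV string using the given dialect."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, dialect=dialect).writerows(rows)
    return buffer.getvalue()
```

With these particular values, `differences_from_excel()` returns an empty list, which is the sense in which the proposal is mostly about naming. A stricter interpretation of the RFC could of course choose different settings.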

Alternatively, the dialect could simply be named default, which is more neutral and can mean anything.

Do the same with excel-tab. For excel and excel-tab, it would be better if the supported Excel versions were specified (and tested against).

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Labels: stdlib (Standard Library Python modules in the Lib/ directory), type-feature (A feature request or enhancement)
