Allow use of utf-8-sig encoding for Excel-compatible CSV export #18

Closed
felciano opened this issue May 25, 2011 · 7 comments

@felciano

Exporting tables with Unicode values to CSV does not include the byte-order mark (BOM) that Excel needs to recognize that the CSV file contains Unicode. As a result, double-clicking the exported CSV file does not show the correct characters in cells with non-ASCII values. This is arguably an Excel limitation (see http://www.sqlsnippets.com/en/topic-13412.html), but given that tablib is presumably trying to make life easier for people dealing with Excel, this would be nice to fix.

I believe this can be addressed by using utf-8-sig instead of utf-8 as the encoding during export.

The following demonstrates the problem and the fix. The two files differ only in the encoding used to write them.

def testCSVandBOM():
    # Python 2 code; requires the UnicodeWriter class (see the Python 2 csv module docs)
    val = 'Etel\xc3\xa4-Suomi, Finland'.decode('utf-8')
    print val

    # double-clicking this file to open in Excel decodes correctly
    with open('with-BOM.csv', 'wb') as f:
        w = UnicodeWriter(f, delimiter = ",", encoding = 'utf-8-sig' )
        w.writerow(['Someplace I want to visit',val])

    # double-clicking this file to open in Excel does NOT decode correctly
    with open('without-BOM.csv', 'wb') as f:
        w = UnicodeWriter(f, delimiter = ",", encoding = 'utf-8' )
        w.writerow(['Someplace I want to visit',val])

If compatibility with current CSV export behavior is a concern, maybe this could be added as a new export format?
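
For reference, a minimal Python 3 sketch of the same comparison. It assumes the standard-library `csv` module in place of the Python 2 `UnicodeWriter` recipe; the helper name `write_csv` and the temporary directory are illustrative only and are not part of tablib.

```python
import csv
import os
import tempfile

BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def write_csv(path, rows, encoding):
    """Write rows to path as CSV using the given text encoding."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f, delimiter=",").writerows(rows)


rows = [["Someplace I want to visit", "Etel\u00e4-Suomi, Finland"]]
tmpdir = tempfile.mkdtemp()
with_bom = os.path.join(tmpdir, "with-BOM.csv")
without_bom = os.path.join(tmpdir, "without-BOM.csv")

write_csv(with_bom, rows, "utf-8-sig")  # codec prepends the BOM
write_csv(without_bom, rows, "utf-8")   # no BOM written

with open(with_bom, "rb") as f:
    data_with = f.read()
with open(without_bom, "rb") as f:
    data_without = f.read()

# The two files should differ only by the three leading BOM bytes.
print(data_with.startswith(BOM), data_without.startswith(BOM))
```

In Python 3 the encoding is chosen when the file is opened in text mode, so no wrapper class is needed.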

@ghost ghost assigned kennethreitz May 26, 2011
@kennethreitz
Contributor

Thanks!

I think replacing the current export behavior with this is best.

@kennethreitz
Contributor

Any objections?

@kennethreitz
Contributor

Done.

kennethreitz pushed a commit that referenced this issue Jun 21, 2011
pombredanne pushed a commit to pombredanne/tablib that referenced this issue Aug 20, 2012
@boatcoder

Whatever happened with this? I'm using tablib==3.2.0 and can't find anything about utf-8-sig, and Excel still can't display the emojis in the file correctly.

@claudep
Contributor

claudep commented Mar 31, 2022

When you export to CSV with tablib, you get a Unicode string, so AFAIR it's your business to write it to a file with the proper encoding.

@boatcoder

I'm using tablib via django-tables2 and this is what I had to do to get the encoding bytes into the file.

        if request.GET.get('_export', None):
            export = TableExport(export_format=request.GET['_export'], table=table)
            response = HttpResponse(content_type=export.content_type())
            filename = f"{self.export_name}.{request.GET['_export']}"
            response["Content-Disposition"] = f'attachment; filename="{filename}"'
            # These 3 bytes (the UTF-8 BOM) at the beginning of the file make Excel happy with the emojis
            response.write(b'\xef\xbb\xbf')
            response.write(export.export())
            return response

@claudep
Contributor

claudep commented Mar 31, 2022

You could do it with a single write call: response.write(export.export().encode('utf-8-sig'))
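
As a quick check that the two approaches produce the same bytes, here is a small sketch using only the standard library. The string `csv_text` is a hypothetical stand-in for the exported CSV text, and the sketch assumes the export returns `str`, as it does for CSV; a `bytes` result has no `.encode` method.

```python
# Stand-in for the text returned by the CSV export (hypothetical sample data).
csv_text = "name,reaction\r\nAda,\U0001F600\r\n"

manual = b"\xef\xbb\xbf" + csv_text.encode("utf-8")  # BOM written by hand
one_step = csv_text.encode("utf-8-sig")              # BOM added by the codec

# Both routes should yield identical bytes.
print(manual == one_step)

# Decoding with utf-8-sig strips the BOM again.
print(one_step.decode("utf-8-sig") == csv_text)
```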
