fix #30: document the usage of encoding='utf-8-sig' and verify the BOM header
chfw committed May 15, 2017
1 parent 9f99b1d commit f4bfdc2
Showing 2 changed files with 62 additions and 1 deletion.
47 changes: 46 additions & 1 deletion docs/source/plaincsv.rst
@@ -122,9 +122,54 @@ Continue from previous example:
{"csv": [[1, 2, 3], [4, 5, 6]]}


Encoding parameter
--------------------------------------------------------------------------------

If you would like to save your csv file in a custom encoding, you
can specify the 'encoding' parameter. Here is how you write verses of
a Finnish song, "Aurinko laskee länteen" [#f1]_, into a csv file:

.. code-block:: python

    >>> content = [[u'Aurinko laskee länteen', u'Näin sen ja ymmärsin sen', u'Poissa aika on rakkauden Kun aurinko laskee länteen']]
    >>> test_file = "test-utf16-encoding.csv"
    >>> save_data(test_file, content, encoding="utf-16", lineterminator="\n")

In the reverse direction, to read a csv file with a custom encoding back,
pass the same 'encoding' parameter to get_data:

.. code-block:: python

    >>> custom_encoded_content = get_data(test_file, encoding="utf-16")
    >>> assert custom_encoded_content[test_file] == content

.. [#f1] A Finnish song that was entered in Eurovision in 1965. You can check out its lyrics at `diggiloo.net <http://www.diggiloo.net/?1965fi>`_

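To see why the same encoding has to be given on both the write and the read side, here is a minimal standard-library sketch that does not use pyexcel-io at all. It assumes Python 3, and the names ``rows`` and ``path`` are illustrative only. It round-trips a row through a UTF-16 csv file, shows that Python's "utf-16" codec starts the file with a two-byte byte order mark, and shows that decoding the same bytes as UTF-8 fails.

```python
import codecs
import csv
import os
import tempfile

# Illustrative data and path; neither comes from pyexcel-io.
rows = [["Aurinko laskee länteen", "Näin sen ja ymmärsin sen"]]
path = os.path.join(tempfile.mkdtemp(), "utf16-sketch.csv")

# newline="" leaves line endings under the control of the csv module.
with open(path, "w", encoding="utf-16", newline="") as f:
    csv.writer(f, lineterminator="\n").writerows(rows)

# Reading with the matching encoding recovers the original rows.
with open(path, "r", encoding="utf-16", newline="") as f:
    round_trip = list(csv.reader(f))

# The raw file begins with a UTF-16 byte order mark (native byte order).
with open(path, "rb") as f:
    head = f.read(2)

# Reading with a mismatched encoding fails: the BOM bytes are not valid UTF-8.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

os.remove(path)
```

The byte order of the mark depends on the platform, so a check against it should accept both ``codecs.BOM_UTF16_LE`` and ``codecs.BOM_UTF16_BE``.
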
Byte order mark (BOM) in csv file
--------------------------------------------------------------------------------

By passing **encoding="utf-8-sig"**, you can write a UTF-8 BOM header into your csv file.
Here is an example that writes a sentence of "Shuidiao Getou" [#f2]_ into a csv file:

.. code-block:: python

    >>> content = [[u'人有悲歡離合', u'月有陰晴圓缺']]
    >>> test_file = "test-utf8-BOM.csv"
    >>> save_data(test_file, content, encoding="utf-8-sig", lineterminator="\n")

When you read it back, you have to specify the encoding too.

.. code-block:: python

    >>> custom_encoded_content = get_data(test_file, encoding="utf-8-sig")
    >>> assert custom_encoded_content[test_file] == content

.. [#f2] One of Su Shi's most famous poems. Here is the `wiki link <https://en.wikipedia.org/wiki/Shuidiao_Getou>`_

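The effect of "utf-8-sig" can also be observed with the standard library alone. The following is a hedged sketch, assuming Python 3 and using illustrative names (``text``, ``path``) that are not part of pyexcel-io. It writes a line with the "utf-8-sig" codec, confirms that the file starts with the three BOM bytes, and compares reading it back with "utf-8" against reading it back with "utf-8-sig".

```python
import codecs
import os
import tempfile

# Illustrative content and path; neither comes from pyexcel-io.
text = "人有悲歡離合,月有陰晴圓缺\n"
path = os.path.join(tempfile.mkdtemp(), "bom-sketch.csv")

# The "utf-8-sig" codec prepends the UTF-8 BOM when writing.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write(text)

with open(path, "rb") as f:
    raw = f.read()

# Plain "utf-8" keeps the BOM as a leading U+FEFF character...
with open(path, "r", encoding="utf-8", newline="") as f:
    read_as_utf8 = f.read()

# ...while "utf-8-sig" strips it on the way in.
with open(path, "r", encoding="utf-8-sig", newline="") as f:
    read_as_utf8_sig = f.read()

os.remove(path)
```

This is why the first cell can pick up a stray ``\ufeff`` prefix when such a file is read as plain "utf-8".
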
.. testcode::
   :hide:

   >>> import os
   >>> os.unlink("your_file.csv")

   >>> os.unlink(test_file)
16 changes: 16 additions & 0 deletions tests/test_issues.py
@@ -1,7 +1,11 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
from nose.tools import eq_
from pyexcel_io import get_data, save_data
from pyexcel_io._compact import PY26
import binascii


def test_issue_8():
@@ -63,5 +67,17 @@ def test_issue_33_34():
    eq_(data['csv'], expected)


def test_issue_30_utf8_BOM_header():
    content = [[u'人有悲歡離合', u'月有陰晴圓缺']]
    test_file = "test-utf8-BOM.csv"
    save_data(test_file, content, encoding="utf-8-sig", lineterminator="\n")
    custom_encoded_content = get_data(test_file, encoding="utf-8-sig")
    assert custom_encoded_content[test_file] == content
    with open(test_file, "rb") as f:
        content = f.read()
        assert content[0:3] == b'\xef\xbb\xbf'
    os.unlink(test_file)


def get_fixture(file_name):
    return os.path.join("tests", "fixtures", file_name)
