Skip to content
Permalink
dev
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time

Working with CSV format

Please note that csv reader load data in a lazy manner. It ignores excessive trailing cells that has None value. For example, the following csv content:

1,2,,,,,
3,4,,,,,
5,,,,,,,

would end up as:

1,2
3,4
5,

Write to a csv file

.. testcode::
   :hide:

    >>> import sys
    >>> if sys.version_info[0] < 3:
    ...     from StringIO import StringIO
    ... else:
    ...     from io import StringIO
    >>> from pyexcel_io._compact import OrderedDict

Here's the sample code to write an array to a csv file

>>> import datetime
>>> from pyexcel_io import save_data
>>> data = [
...     [1, 2.0, 3.0],
...     [
...         datetime.date(2016, 5, 4),
...         datetime.datetime(2016, 5, 4, 17, 39, 12),
...         datetime.datetime(2016, 5, 4, 17, 40, 12, 100)
...     ]
... ]
>>> save_data("your_file.csv", data)

Let's verify the file content:

>>> with open("your_file.csv", "r") as csvfile:
...     for line in csvfile.readlines():
...         print(line.strip())
1,2.0,3.0
2016-05-04,2016-05-04 17:39:12,2016-05-04 17:40:12.000100

Change line endings

By default, python csv module provides windows line ending 'rn'. In order to change it, you can do:

>>> save_data("your_file.csv", data, lineterminator='\n')

Read from a csv file

And we can read the written csv file back as the following code:

>>> from pyexcel_io import get_data
>>> import pprint
>>> data = get_data("your_file.csv")
>>> pprint.pprint(data['your_file.csv'])
[[1, 2.0, 3.0],
 [datetime.date(2016, 5, 4),
  datetime.datetime(2016, 5, 4, 17, 39, 12),
  datetime.datetime(2016, 5, 4, 17, 40, 12, 100)]]

As you can see, pyexcel-io not only reads the csv file back but also recognizes the data types: int, float, date and datetime. However, it does give your cpu some extra job. When you are handling a large csv file and the cpu budget is of your concern, you may switch off the type detection feature. For example, let's switch all off:

>>> data = get_data("your_file.csv", auto_detect_float=False, auto_detect_datetime=False)
    >>> import json
>>> json.dumps(data['your_file.csv'])
'[[1, "2.0", "3.0"], ["2016-05-04", "2016-05-04 17:39:12", "2016-05-04 17:40:12.000100"]]'

In addition to auto_detect_float and auto_detect_datetime, there is another flag named auto_detect_int, which becomes active only if auto_detect_float is True. Now, let's play a bit with auto_detect_int:

>>> data = get_data("your_file.csv", auto_detect_int=False)
>>> pprint.pprint(data['your_file.csv'])
[[1.0, 2.0, 3.0],
 [datetime.date(2016, 5, 4),
  datetime.datetime(2016, 5, 4, 17, 39, 12),
  datetime.datetime(2016, 5, 4, 17, 40, 12, 100)]]

As you see, all numeric data are identified as float type. If you looked a few paragraphs above, you would notice auto_detect_int affected [1, 2, ..] in the first row.

Write a csv to memory

Here's the sample code to write a dictionary as a csv into memory:

>>> from pyexcel_io import save_data
>>> data = [[1, 2, 3], [4, 5, 6]]
>>> io = StringIO()
>>> save_data(io, data)
>>> # do something with the io
>>> # In reality, you might give it to your http response
>>> # object for downloading

Read from a csv from memory

Continue from previous example:

>>> # This is just an illustration
>>> # In reality, you might deal with csv file upload
>>> # where you will read from requests.FILES['YOUR_XL_FILE']
>>> import json
>>> data = get_data(io)
>>> print(json.dumps(data))
{"csv": [[1, 2, 3], [4, 5, 6]]}

Encoding parameter

In general, if you would like to save your csv file into a custom encoding, you can specify 'encoding' parameter. Here is how you write verses of a finnish song, "Aurinko laskee länteen"[1] into a csv file

>>> content = [[u'Aurinko laskee länteen', u'Näin sen ja ymmärsin sen', u'Poissa aika on rakkauden Kun aurinko laskee länteen']]
>>> test_file = "test-utf16-encoding.csv"
    >>> save_data(test_file, content, encoding="utf-16", lineterminator="\n")

In the reverse direction, if you would like to read your csv file with custom encoding back, you do the same to get_data:

>>> custom_encoded_content = get_data(test_file, encoding="utf-16")
    >>> assert custom_encoded_content[test_file] == content
[1]A finnish song that was entered in Eurovision in 1965. You can check out its lyrics at diggiloo.net

Byte order mark (BOM) in csv file

By passing **encoding="utf-8-sig", You can write UTF-8 BOM header into your csv file. Here is an example to write a sentence of "Shui Dial Getou"[#f2] into a csv file:

>>> content = [['löyly', 'löyly']]
>>> test_file = "test-utf8-BOM.csv"
    >>> save_data(test_file, content, encoding="utf-8-sig", lineterminator="\n")

When you read it back you will have to specify encoding too.

>>> custom_encoded_content = get_data(test_file, encoding="utf-8-sig")
    >>> assert custom_encoded_content[test_file] == content
[2]One of Su shi's most famous poem. Here is the wiki link
.. testcode::
   :hide:

   >>> import os
   >>> os.unlink("your_file.csv")
   >>> os.unlink("test-utf16-encoding.csv")
   >>> os.unlink(test_file)