This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Feature/updated api #80

Merged
merged 26 commits into from
Sep 13, 2016
18 changes: 14 additions & 4 deletions .travis.yml
@@ -1,5 +1,13 @@
sudo: false
sudo: required

dist: trusty

addons:
  apt:
    packages:
      - pandoc

language: python
@@ -24,6 +32,11 @@ script:
after_success:
  - coveralls

before_deploy:
  - pandoc --version
  - pandoc -f markdown_github -t rst -o README.rst README.md
  - mv README.rst README.md

deploy:
  provider: pypi
  user: okfn
@@ -32,6 +45,3 @@ deploy:
  tags: true
  password:
secure: Iuf7V4+XHL6wwFYt4IyEe0vWLGO/uOpMJWQnO+1eUjmcQ1qi4E9vyEJvsJRzWKm5+/Lv9uFIRGlmpNWQzUPs5VnMc3LEBh7Clv/WIlRGvi+omCeWoEPAPUueF8qjBcvpT37QNzjB5QXJY074uAihmKh/DU2xA4K0yCB8YQefBHYeNBl0pNYVnELUW8BFmz0GE0lTwHOnM681vgR01LdPjrgIHVEvnTZkKYtDXc/cwkw610fqrFS10srnTX6KjjC/pgDm4WSuaUxbPycmriIhZR29QgAx24NO/wrdGdp5H8TIsvBFnNFlC4QuHfwiXdAKpjL6cMu2uMo639Sev/484XxTorg2QQvNhNAJtiESVAaqVviAlmUItGdmsw4xhZb0JK6NC8fOuOoccL4DBD6JtCyGurwSpznuGXh1DQUYZ7fTd5qaUDnzBuhYGc8XDvcj14XU4P5OKES4NdruRVJOwFiNSMOAT6wm8b2Ue6N+FvgsghjwUr9ESKBrPj0VoouC2+FGZWT65vt/3R9PhFuBdC6SgMLWHESBuU5GW9Bc2ucS3HUi+uUV1IGjpfIsc3qifojNJiaU7hSAggJs9QlXd7goH2fKhb9ro2klzcDKmpBLXmMk3uH0QRpv1dGUYFtgGeEFN93vP3cxYsXf8OvV+MuCxYYGgrGZu3h8fvbc5hY=

# after_script:
# - if [ "$TRAVIS_BRANCH" == "master" ]; then bash scripts/testsuite.sh; fi
80 changes: 42 additions & 38 deletions README.md
@@ -5,82 +5,86 @@
[![PyPi](https://img.shields.io/pypi/v/tabulator.svg)](https://pypi.python.org/pypi/tabulator)
[![Gitter](https://img.shields.io/gitter/room/frictionlessdata/chat.svg)](https://gitter.im/frictionlessdata/chat)

A utility library that provides a consistent interface for reading tabular data.
Consistent interface for stream reading and writing tabular data (csv/xls/json/etc).

## Features

- supports various formats: csv/tsv/xls/xlsx/json/native/etc
- reads data from variables, filesystem or Internet
- streams data instead of using a lot of memory
- processes data via simple user processors
- saves data using the same interface

## Getting Started

### Installation

To get started (under development):
To get started:

```
$ pip install tabulator
```

### Quick Start

Fast access to the table with the `topen` function (stands for `table open`):
Open tabular stream from csv source:

```python
from tabulator import topen, processors
from tabulator import Stream

with topen('path.csv', headers='row1') as table:
    for row in table:
        print(row)  # will print row tuple
with Stream('path.csv', headers=1) as stream:
    for row in stream:
        print(row)  # will print row values list
```

For most use cases the `topen` function is enough. It takes the
`source` argument:
`Stream` takes the `source` argument:

```
<scheme>://path/to/file.<format>
```
and uses the corresponding `Loader` and `Parser` to open and start iterating
over the table. The user can also pass `scheme` and `format` explicitly
as function arguments, and can force Tabulator to use an encoding of choice
by passing the `encoding` argument.

The `topen` function returns a `Table` instance. We use a context manager
to call `table.open()` on enter and `table.close()` on exit:
- the table can be iterated like a file-like object, returning row by row
- the table can be used for manual iterating with `table.iter(keyed/extended=False)`
- the table can be read into memory using the `read` function (returns a list
of row tuples) with a `limit` on the number of output rows as a parameter.
and uses the corresponding `Loader` and `Parser` to open and start iterating over the tabular stream. The user can also pass `scheme` and `format` explicitly as constructor arguments, and can force Tabulator to use an encoding of choice by passing the `encoding` argument.
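The `<scheme>://path/to/file.<format>` convention above can be sketched as a tiny helper. This is an illustration of the idea only, not tabulator's actual detection code; the `detect_scheme_and_format` name and the local-file default are assumptions:

```python
import os
from urllib.parse import urlparse

def detect_scheme_and_format(source):
    # Hypothetical helper: split 'http://example.com/table.csv'
    # into its scheme ('http') and format ('csv').
    parsed = urlparse(source)
    # A bare path like 'path.csv' has no scheme; treat it as a local file
    scheme = parsed.scheme or 'file'
    fmt = os.path.splitext(parsed.path)[1].lstrip('.') or None
    return scheme, fmt
```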

In this example we use a context manager to call `stream.open()` on enter and `stream.close()` on exit:
- the stream can be iterated like a file-like object, returning row by row
- the stream can be used for manual iterating with the `iter(keyed/extended)` function
- the stream can be read into memory using the `read(keyed/extended)` function with a row count `limit`
- headers can be accessed via the `headers` property
- a rows sample can be accessed via the `sample` property
- the stream pointer can be reset to the start via the `reset` method
- the stream can be saved to the filesystem using the `save` method
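The reading contract in the list above can be modeled in a few lines of plain Python. This `MiniStream` is a toy illustration of the interface only, not the real `Stream` (which adds loading, parsing, and encoding detection on top of this contract):

```python
class MiniStream:
    """Toy model of the Stream reading contract: headers, sample,
    iter(keyed/extended), read(limit) and reset."""

    def __init__(self, rows, headers=1, sample_size=2):
        self.headers = rows[headers - 1]   # header row by 1-based index
        self._data = rows[headers:]        # remaining rows are data
        self.sample = self._data[:sample_size]
        self._pos = 0

    def __iter__(self):
        return self.iter()

    def iter(self, keyed=False, extended=False):
        while self._pos < len(self._data):
            row = self._data[self._pos]
            self._pos += 1
            if extended:
                yield (self._pos, self.headers, row)
            elif keyed:
                yield dict(zip(self.headers, row))
            else:
                yield row

    def read(self, keyed=False, extended=False, limit=None):
        # toy: consumes the full iterator before applying limit
        rows = list(self.iter(keyed=keyed, extended=extended))
        return rows if limit is None else rows[:limit]

    def reset(self):
        self._pos = 0
```

For example, `MiniStream([['id', 'name'], ['1', 'a']]).read(keyed=True)` returns `[{'id': '1', 'name': 'a'}]`.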

### Advanced Usage

To get full control over the process you can use more parameters; a more
expanded example is presented below:

```python
from tabulator import topen, loaders, parsers, processors
from tabulator import Stream

def skip_even_rows(extended_rows):
    for number, headers, row in extended_rows:
        if number % 2:
            yield (number, headers, row)

table = topen('path.csv', headers='row1', encoding='utf-8', sample_size=1000,
              post_parse=[processors.skip_blank_rows, skip_even_rows],
              loader_options={'constructor': loaders.File},
              parser_options={'constructor': parsers.CSV, 'delimiter': ',', 'quotechar': '|'})
print(table.sample) # will print sample
print(table.headers) # will print headers list
print(table.read(limit=10)) # will print 10 rows
table.reset()
for keyed_row in table.iter(keyed=True):
stream = Stream('http://example.com/source.xls',
                headers=1, encoding='utf-8', sample_size=1000,
                post_parse=[skip_even_rows],
                parser_options={'delimiter': ',', 'quotechar': '|'})
stream.open()
print(stream.sample) # will print sample
print(stream.headers) # will print headers list
print(stream.read(limit=10)) # will print 10 rows
stream.reset()
for keyed_row in stream.iter(keyed=True):
    print(keyed_row)  # will print row dict
for extended_row in table.iter(extended=True):
    print(extended_row)  # will print (number, headers, row) list
table.close()
for extended_row in stream.iter(extended=True):
    print(extended_row)  # will print (number, headers, row)
stream.reset()
stream.save('target.csv')
stream.close()
```
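Because `post_parse` processors are plain generators over `(number, headers, row)` triples, they can be exercised in isolation. Here `skip_even_rows` from the example above is run on hand-made extended rows (the sample data is invented for illustration):

```python
def skip_even_rows(extended_rows):
    # keep only rows whose 1-based number is odd
    for number, headers, row in extended_rows:
        if number % 2:
            yield (number, headers, row)

# invented sample of extended rows
rows = [(1, ['id'], ['a']), (2, ['id'], ['b']), (3, ['id'], ['c'])]
kept = list(skip_even_rows(rows))
print(kept)  # rows 1 and 3 survive
```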

## Read more

- [Documentation](https://github.com/frictionlessdata/tabulator-py/tree/master/tabulator)
- [Docstrings](https://github.com/frictionlessdata/tabulator-py/tree/master/tabulator)
- [Changelog](https://github.com/frictionlessdata/tabulator-py/releases)
- [Contribute](CONTRIBUTING.md)

7 changes: 7 additions & 0 deletions data/special/long.csv
@@ -0,0 +1,7 @@
id,name
1,a
2,b
3,c
4,d
5,e
6,f
96 changes: 48 additions & 48 deletions examples/topen.py → examples/stream.py
@@ -6,126 +6,126 @@

import io
import sys
from tabulator import topen, loaders, parsers
from tabulator import Stream


print('Parse csv format:')
source = 'data/table.csv'
with topen(source, headers='row1') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row1') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nParse linear tsv format:')
source = 'data/table.tsv'
with topen(source, headers='row1') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row1') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nParse json with dicts:')
source = 'file://data/table-dicts.json'
with topen(source) as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source) as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nParse json with lists:')
source = 'file://data/table-lists.json'
with topen(source, headers='row1') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row1') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nParse xls format:')
source = 'data/table.xls'
with topen(source, headers='row1') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row1') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nParse xlsx format:')
source = 'data/table.xlsx'
with topen(source, headers='row1') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row1') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


# print('\nLoad from stream scheme:')
source = io.open('data/table.csv', mode='rb')
with topen(source, headers='row1', format='csv') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row1', format='csv') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nLoad from text scheme:')
source = 'text://id,name\n1,english\n2,中国人\n'
with topen(source, headers='row1', format='csv') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row1', format='csv') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nLoad from http scheme:')
source = 'https://raw.githubusercontent.com'
source += '/okfn/tabulator-py/master/data/table.csv'
with topen(source, headers='row1') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row1') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nUsage of native lists:')
source = [['id', 'name'], ['1', 'english'], ('2', '中国人')]
with topen(source, headers='row1') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row1') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nUsage of native lists (keyed):')
source = [{'id': '1', 'name': 'english'}, {'id': '2', 'name': '中国人'}]
with topen(source) as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source) as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nIter with keyed rows representation:')
source = [{'id': '1', 'name': 'english'}, {'id': '2', 'name': '中国人'}]
with topen(source) as table:
    print(table.headers)
    for row in table.iter(keyed=True):
        print(row)
with Stream(source, headers=1) as stream:
    print(stream.headers)
    for row in stream.iter(keyed=True):
        print(row)


print('\nTable reset and read limit:')
source = 'data/table.csv'
with topen(source, headers='row1') as table:
    print(table.headers)
    print(table.read(limit=1))
    table.reset()
    print(table.read(limit=1))
with Stream(source, headers='row1') as stream:
    print(stream.headers)
    print(stream.read(limit=1))
    stream.reset()
    print(stream.read(limit=1))


print('\nLate headers (on a second row):')
source = 'data/special/late_headers.csv'
with topen(source, headers='row2') as table:
    print(table.headers)
    for row in table:
        print(row)
with Stream(source, headers='row2') as stream:
    print(stream.headers)
    for row in stream:
        print(row)


print('\nSpaces in headers:')
source = 'https://raw.githubusercontent.com/datasets/gdp/master/data/gdp.csv'
with topen(source, headers='row1') as table:
    print(table.headers)
    for row in table.read(limit=5):
        print(row)
with Stream(source, headers='row1') as stream:
    print(stream.headers)
    for row in stream.read(limit=5):
        print(row)
1 change: 1 addition & 0 deletions pylama.ini
@@ -1,5 +1,6 @@
[pylama]
linters = pyflakes,mccabe,pep8
ignore = E731

[pylama:mccabe]
complexity = 16
19 changes: 10 additions & 9 deletions setup.py
@@ -20,14 +20,15 @@ def read(*paths):
# Prepare
PACKAGE = 'tabulator'
INSTALL_REQUIRES = [
    'six>=1.9',
    'xlrd>=0.9',
    'ijson>=2.0',
    'chardet>=2.0',
    'openpyxl>=2.0',
    'requests>=2.8',
    'linear-tsv>=0.99.1',
    'beautifulsoup4>=4.4',
    'six>=1.9,<2.0',
    'xlrd>=1.0,<2.0',
    'ijson>=2.0,<3.0',
    'chardet>=2.0,<3.0',
    'openpyxl>=2.0,<3.0',
    'requests>=2.8,<3.0',
    'beautifulsoup4>=4.4,<5.0',
    'linear-tsv>=0.99,<0.100',
    'unicodecsv>=0.14,<0.15',
]
TESTS_REQUIRE = [
    'pylama',
@@ -49,7 +50,7 @@ def read(*paths):
    extras_require={'develop': TESTS_REQUIRE},
    zip_safe=False,
    long_description=README,
    description='A utility library that provides a consistent interface for reading tabular data.',
    description='Consistent interface for stream reading and writing tabular data (csv/xls/json/etc)',
    author='Open Knowledge Foundation',
    author_email='info@okfn.org',
    url='https://github.com/frictionlessdata/tabulator-py',
8 changes: 5 additions & 3 deletions tabulator/__init__.py
@@ -4,7 +4,9 @@
from __future__ import print_function
from __future__ import unicode_literals

from .table import Table
from .topen import topen
from .stream import Stream
from . import exceptions
from . import processors

# Deprecated
from .topen import topen
from .stream import Stream as Table
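The `# Deprecated` aliases above keep old imports working. A common variant of this pattern (not what this file does — shown as a hypothetical alternative) emits a `DeprecationWarning` when the old name is used:

```python
import warnings

class Stream(object):
    """Stand-in for tabulator.Stream in this sketch."""

def topen(*args, **kwargs):
    # deprecated entry point kept for backwards compatibility
    warnings.warn('topen is deprecated, use Stream', DeprecationWarning)
    return Stream(*args, **kwargs)

Table = Stream  # old class name kept as a plain alias
```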