Improve reader interface (#90)
* 🔨 improve reader interface

* 🔨 shrink reader code

* This is an auto-commit, updating project meta data, such as changelog.rst, contributors.rst

* 🔥 remove redundant functionalities that will never be used; what's the point

* 📚 updated doc string and the tutorial

* 🔨 update import statements

* 🔬 more test coverage

* This is an auto-commit, updating project meta data, such as changelog.rst, contributors.rst

* 💚 fix unit test failure

* 📚 update reader plugin example

* 💄 update coding style

* 📚 fix index rst file

* This is an auto-commit, updating project meta data, such as changelog.rst, contributors.rst

Co-authored-by: chfw <chfw@users.noreply.github.com>
chfw and chfw committed Oct 4, 2020
1 parent 29c2668 commit fa80887
Showing 33 changed files with 337 additions and 76 deletions.
File renamed without changes.
2 changes: 1 addition & 1 deletion .moban.yml
@@ -6,5 +6,5 @@ targets:
- setup.py: io_setup.py.jj2
- .travis.yml: custom_travis.yml.jj2
- README.rst: io_readme.rst.jj2
- "docs/source/index.rst": "docs/source/index.rst"
- "docs/source/index.rst": "docs/source/index.rst.jj2"
- .gitignore: gitignore.jj2
1 change: 1 addition & 0 deletions CONTRIBUTORS.rst
@@ -1,4 +1,5 @@


5 contributors
================================================================================

28 changes: 28 additions & 0 deletions docs/source/extensions.rst
@@ -1,3 +1,31 @@
Extend pyexcel-io Tutorial
================================================================================

pyexcel-io itself comes with csv support.

Reader
--------------------------------------------------------------------------------

Suppose we have a yaml file containing a dictionary whose values are
two-dimensional arrays. The task is to write a reader plugin for pyexcel-io
so that we can use get_data() to read it.

Example yaml data:

.. literalinclude:: ../../examples/test.yaml
   :language: yaml

Example code:

.. literalinclude:: ../../examples/custom_yaml_reader.py
   :language: python


Writer
--------------------------------------------------------------------------------



Working with xls, xlsx, and ods formats
================================================================================

109 changes: 105 additions & 4 deletions docs/source/index.rst
@@ -3,7 +3,16 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
{%include "header.rst.jj2" %}
`pyexcel-io` - Lets you focus on data, instead of file formats
================================================================================

:Author: chfw
:Source code: http://github.com/pyexcel/pyexcel-io.git
:Issues: http://github.com/pyexcel/pyexcel-io/issues
:License: New BSD License
:Development: |release|
:Released: |version|
:Generated: |today|

Introduction
--------------------------------------------------------------------------------
@@ -33,11 +42,104 @@ as of 2014. They are invented and supported by `pyexcel-io`_.
Installation
--------------------------------------------------------------------------------

{%include "installation.rst.jj2" %}

You can install pyexcel-io via pip:

.. code-block:: bash

   $ pip install pyexcel-io

or clone it and install it:

.. code-block:: bash

   $ git clone https://github.com/pyexcel/pyexcel-io.git
   $ cd pyexcel-io
   $ python setup.py install

For individual excel file formats, please install the corresponding plugins as you wish:

{%include "io-plugins-list.rst.jj2" %}
.. _file-format-list:
.. _a-map-of-plugins-and-file-formats:

.. table:: A list of file formats supported by external plugins

   ======================== ======================= ================= ==================
   Package name             Supported file formats  Dependencies      Python versions
   ======================== ======================= ================= ==================
   `pyexcel-io`_ >=v0.6.0   csv, csvz [#f1]_, tsv,                    3.6+
                            tsvz [#f2]_
   `pyexcel-io`_ <=0.5.20   same as above                             2.6, 2.7, 3.3, 3.4, 3.5, 3.6, pypy
   `pyexcel-xls`_           xls, xlsx(read only),   `xlrd`_,          same as above
                            xlsm(read only)         `xlwt`_
   `pyexcel-xlsx`_          xlsx                    `openpyxl`_       same as above
   `pyexcel-ods3`_          ods                     `pyexcel-ezodf`_, 2.6, 2.7, 3.3, 3.4, 3.5, 3.6
                                                    lxml
   `pyexcel-ods`_           ods                     `odfpy`_          same as above
   ======================== ======================= ================= ==================

.. table:: Dedicated file readers and writers

   ======================== ======================= ================= ==================
   Package name             Supported file formats  Dependencies      Python versions
   ======================== ======================= ================= ==================
   `pyexcel-xlsxw`_         xlsx(write only)        `XlsxWriter`_     Python 2 and 3
   `pyexcel-xlsxr`_         xlsx(read only)         lxml              same as above
   `pyexcel-xlsbr`_         xlsb(read only)         pyxlsb            same as above
   `pyexcel-odsr`_          read only for ods, fods lxml              same as above
   `pyexcel-odsw`_          write only for ods      loxun             same as above
   `pyexcel-htmlr`_         html(read only)         lxml,html5lib     same as above
   `pyexcel-pdfr`_          pdf(read only)          pdftables         Python 2 only.
   ======================== ======================= ================= ==================


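Plugin shopping guide
--------------------------------------------------------------------------------

Unlike csv files, xlsx and ods files are zip archives containing a number of
xml files (xls is an older binary format).

The dedicated readers for these formats can read files in a streaming fashion.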


To manage the installed plugins, use pip to add or remove them. With
virtualenv, you can have a different set of plugins per virtual environment.
If several installed plugins handle the same file format, you need to tell
pyexcel which plugin to use on each function call. For example, if both
pyexcel-ods and pyexcel-odsr are installed and you want get_array to use
pyexcel-odsr, call ``get_array(..., library='pyexcel-odsr')``.
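The per-call library choice can be pictured with a small, self-contained sketch. It is purely illustrative: REGISTRY, pick_reader() and the two reader classes are hypothetical stand-ins, not the plugin loader pyexcel actually uses; only the idea of passing library='pyexcel-odsr' comes from the text above.

```python
# Hypothetical sketch: several plugins registered for one file type,
# with an optional per-call ``library`` override. Not pyexcel's real API.


class OdsReader:
    """Stand-in for a reader provided by pyexcel-ods."""


class OdsrReader:
    """Stand-in for a reader provided by pyexcel-odsr."""


# file type -> {library name: reader class}; insertion order sets the default
REGISTRY = {
    "ods": {
        "pyexcel-ods": OdsReader,
        "pyexcel-odsr": OdsrReader,
    },
}


def pick_reader(file_type, library=None):
    """Return the reader class for file_type, honouring an explicit library."""
    candidates = REGISTRY.get(file_type)
    if not candidates:
        raise ValueError(f"no plugin registered for {file_type!r}")
    if library is None:
        # No preference given: fall back to the first registered plugin.
        return next(iter(candidates.values()))
    try:
        return candidates[library]
    except KeyError:
        raise ValueError(f"{library!r} does not handle {file_type!r}") from None


if __name__ == "__main__":
    print(pick_reader("ods").__name__)
    print(pick_reader("ods", library="pyexcel-odsr").__name__)
```

The "first registered plugin wins" default is an assumption of this sketch; the point is only that an explicit library argument removes the ambiguity when two plugins claim the same format.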



.. _pyexcel-io: https://github.com/pyexcel/pyexcel-io
.. _pyexcel-xls: https://github.com/pyexcel/pyexcel-xls
.. _pyexcel-xlsx: https://github.com/pyexcel/pyexcel-xlsx
.. _pyexcel-ods: https://github.com/pyexcel/pyexcel-ods
.. _pyexcel-ods3: https://github.com/pyexcel/pyexcel-ods3
.. _pyexcel-odsr: https://github.com/pyexcel/pyexcel-odsr
.. _pyexcel-odsw: https://github.com/pyexcel/pyexcel-odsw
.. _pyexcel-pdfr: https://github.com/pyexcel/pyexcel-pdfr

.. _pyexcel-xlsxw: https://github.com/pyexcel/pyexcel-xlsxw
.. _pyexcel-xlsxr: https://github.com/pyexcel/pyexcel-xlsxr
.. _pyexcel-xlsbr: https://github.com/pyexcel/pyexcel-xlsbr
.. _pyexcel-htmlr: https://github.com/pyexcel/pyexcel-htmlr

.. _xlrd: https://github.com/python-excel/xlrd
.. _xlwt: https://github.com/python-excel/xlwt
.. _openpyxl: https://bitbucket.org/openpyxl/openpyxl
.. _XlsxWriter: https://github.com/jmcnamara/XlsxWriter
.. _pyexcel-ezodf: https://github.com/pyexcel/pyexcel-ezodf
.. _odfpy: https://github.com/eea/odfpy


.. rubric:: Footnotes

.. [#f1] zipped csv file
.. [#f2] zipped tsv file

After that, you can start getting and saving data in the loaded format. There
are two plugins for the same file format, e.g. pyexcel-ods3 and pyexcel-ods.
@@ -91,7 +193,6 @@ get_data(.., library='pyexcel-ods')
csvz
sqlalchemy
django
options
extensions


45 changes: 45 additions & 0 deletions examples/custom_yaml_reader.py
@@ -0,0 +1,45 @@
import yaml
from pyexcel_io import get_data
from pyexcel_io.sheet import NamedContent
from pyexcel_io.plugins import IOPluginInfoChainV2
from pyexcel_io.plugin_api import ISheet, IReader


class YourSingleSheet(ISheet):
    def __init__(self, your_native_sheet):
        self.two_dimensional_array = your_native_sheet

    def row_iterator(self):
        yield from self.two_dimensional_array

    def column_iterator(self, row):
        yield from row


class YourReader(IReader):
    def __init__(self, file_name, file_type, **keywords):
        self.file_handle = open(file_name, "r")
        self.native_book = yaml.safe_load(self.file_handle)
        self.content_array = [
            NamedContent(key, values)
            for key, values in self.native_book.items()
        ]

    def read_sheet(self, sheet_index):
        two_dimensional_array = self.content_array[sheet_index].payload
        return YourSingleSheet(two_dimensional_array)

    def close(self):
        self.file_handle.close()


IOPluginInfoChainV2(__name__).add_a_reader(
    relative_plugin_class_path="YourReader",
    locations=["file"],
    file_types=["yaml"],
    stream_type="text",
)

if __name__ == "__main__":
    data = get_data("test.yaml")
    print(data)
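To see the reader contract without installing PyYAML or pyexcel-io, here is a self-contained sketch. NamedContent, ISheet and IReader are local stand-ins that mirror the shapes shown in this commit (NamedContent as a name/payload pair; sheet_names() and __len__ built on content_array); DictReader and read_book() are hypothetical and only approximate what get_data() does with a registered reader.

```python
# Self-contained sketch of the reader contract; the base classes below are
# local stand-ins mirroring the diff, NOT imports from pyexcel_io.
from collections import OrderedDict, namedtuple

NamedContent = namedtuple("NamedContent", ["name", "payload"])


class ISheet:
    def row_iterator(self):
        raise NotImplementedError("")

    def column_iterator(self, row):
        raise NotImplementedError("")


class IReader:
    # content_array: list of NamedContent(name=sheet name, payload=native sheet)
    def read_sheet(self, sheet_index):
        raise NotImplementedError("")

    def sheet_names(self):
        return [content.name for content in self.content_array]

    def __len__(self):
        return len(self.content_array)


class ListSheet(ISheet):
    """A native sheet that is already a list of rows."""

    def __init__(self, rows):
        self.rows = rows

    def row_iterator(self):
        yield from self.rows

    def column_iterator(self, row):
        yield from row


class DictReader(IReader):
    """Hypothetical reader over an in-memory {sheet name: 2D list} dict."""

    def __init__(self, book):
        self.content_array = [NamedContent(k, v) for k, v in book.items()]

    def read_sheet(self, sheet_index):
        return ListSheet(self.content_array[sheet_index].payload)


def read_book(reader):
    """Hypothetical helper: walk every sheet via row/column iterators."""
    book = OrderedDict()
    for index, name in enumerate(reader.sheet_names()):
        sheet = reader.read_sheet(index)
        book[name] = [
            list(sheet.column_iterator(row)) for row in sheet.row_iterator()
        ]
    return book


if __name__ == "__main__":
    reader = DictReader(
        {"sheet 1": [[1, 2, 3], [2, 3, 4]], "sheet 2": [["A", "B", "C"]]}
    )
    print(len(reader), reader.sheet_names())
    print(read_book(reader))
```

The sample data uses the values from examples/test.yaml. A real plugin only has to fill content_array and implement read_sheet(); sheet names and the sheet count come for free from IReader.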
11 changes: 11 additions & 0 deletions examples/test.yaml
@@ -0,0 +1,11 @@
sheet 1:
- - 1
  - 2
  - 3
- - 2
  - 3
  - 4
sheet 2:
- - A
  - B
  - C
2 changes: 1 addition & 1 deletion pyexcel-io.yml
@@ -13,7 +13,7 @@ dependencies:
- lml>=0.0.4
test_dependencies:
- pyexcel
- pyexcel-xls
- pyexcel-xls==0.5.9
- SQLAlchemy
- pyexcel-xlsxw
extra_dependencies:
4 changes: 0 additions & 4 deletions pyexcel_io/_compact.py
@@ -51,8 +51,4 @@ def is_string(atype):
    if atype == str:
        return True

    elif PY2:
        if atype == unicode:
            return True

    return False
2 changes: 1 addition & 1 deletion pyexcel_io/database/exporters/django.py
@@ -7,8 +7,8 @@
:copyright: (c) 2014-2020 by Onni Software Ltd.
:license: New BSD License, see LICENSE for more details
"""
from pyexcel_io.plugin_api import IReader
from pyexcel_io.database.querysets import QuerysetsReader
from pyexcel_io.plugin_api.abstract_reader import IReader


class DjangoModelReader(QuerysetsReader):
2 changes: 1 addition & 1 deletion pyexcel_io/database/exporters/queryset.py
@@ -1,5 +1,5 @@
from pyexcel_io.plugin_api import IReader
from pyexcel_io.database.querysets import QuerysetsReader
from pyexcel_io.plugin_api.abstract_reader import IReader


class QueryReader(IReader):
2 changes: 1 addition & 1 deletion pyexcel_io/database/exporters/sqlalchemy.py
@@ -7,8 +7,8 @@
:copyright: (c) 2014-2020 by Onni Software Ltd.
:license: New BSD License, see LICENSE for more details
"""
from pyexcel_io.plugin_api import IReader
from pyexcel_io.database.querysets import QuerysetsReader
from pyexcel_io.plugin_api.abstract_reader import IReader


class SQLTableReader(QuerysetsReader):
3 changes: 1 addition & 2 deletions pyexcel_io/database/importers/django.py
@@ -11,8 +11,7 @@

import pyexcel_io.constants as constants
from pyexcel_io.utils import is_empty_array, swap_empty_string_for_none
from pyexcel_io.plugin_api.abstract_sheet import ISheetWriter
from pyexcel_io.plugin_api.abstract_writer import IWriter
from pyexcel_io.plugin_api import IWriter, ISheetWriter

log = logging.getLogger(__name__)

3 changes: 1 addition & 2 deletions pyexcel_io/database/importers/sqlalchemy.py
@@ -9,8 +9,7 @@
"""
import pyexcel_io.constants as constants
from pyexcel_io.utils import is_empty_array, swap_empty_string_for_none
from pyexcel_io.plugin_api.abstract_sheet import ISheetWriter
from pyexcel_io.plugin_api.abstract_writer import IWriter
from pyexcel_io.plugin_api import IWriter, ISheetWriter


class PyexcelSQLSkipRowException(Exception):
7 changes: 5 additions & 2 deletions pyexcel_io/database/querysets.py
@@ -32,8 +32,11 @@ def to_array(self):
        if len(self.__query_sets) == 0:
            yield []

        for element in ISheet.to_array(self):
            yield element
        for row in self.row_iterator():
            row_values = []
            for value in self.column_iterator(row):
                row_values.append(value)
            yield row_values

    def column_iterator(self, row):
        if self.__column_names is None:
3 changes: 3 additions & 0 deletions pyexcel_io/plugin_api/__init__.py
@@ -0,0 +1,3 @@
from .abstract_sheet import ISheet, ISheetWriter # noqa: F401
from .abstract_reader import IReader # noqa: F401
from .abstract_writer import IWriter # noqa: F401
21 changes: 15 additions & 6 deletions pyexcel_io/plugin_api/abstract_reader.py
@@ -1,9 +1,18 @@
from pyexcel_io._compact import OrderedDict
from .abstract_sheet import ISheet


class IReader(object):
    def read_all(self):
        result = OrderedDict()
        for index, sheet in enumerate(self.content_array):
            result.update({sheet.name: self.read_sheet(index).to_array()})
        return result
    """
    content_array should be a list of NamedContent
    where: name is the sheet name,
    payload is the native sheet.
    """

    def read_sheet(self, sheet_index) -> ISheet:
        raise NotImplementedError("")

    def sheet_names(self):
        return [content.name for content in self.content_array]

    def __len__(self):
        return len(self.content_array)
16 changes: 8 additions & 8 deletions pyexcel_io/plugin_api/abstract_sheet.py
@@ -1,15 +1,15 @@
class ISheet(object):
    def to_array(self):
        data = []
        for row in self.row_iterator():
            my_row = []
            for element in self.column_iterator(row):
                my_row.append(element)
            data.append(my_row)
        return data
    def row_iterator(self):
        raise NotImplementedError("")

    def column_iterator(self, row):
        raise NotImplementedError("")


class ISheetWriter(object):
    def write_row(self, data_row):
        raise NotImplementedError("How does your sheet write a row of data")

    def write_array(self, table):
        """
        For standalone usage, write an array
6 changes: 6 additions & 0 deletions pyexcel_io/plugin_api/abstract_writer.py
@@ -1,4 +1,10 @@
from .abstract_sheet import ISheetWriter


class IWriter(object):
    def create_sheet(self, sheet_name) -> ISheetWriter:
        raise NotImplementedError("Please implement a native sheet writer")

    def write(self, incoming_dict):
        for sheet_name in incoming_dict:
            sheet_writer = self.create_sheet(sheet_name)
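A writer plugin follows the same pattern in reverse. The sketch below rests on assumptions: the diff only shows create_sheet(), write_row() and the first lines of IWriter.write(), so the rest of write() and a row-by-row write_array() are guesses, and MemoryWriter and MemorySheetWriter are hypothetical classes that collect rows in a dict instead of writing a file.

```python
# Hypothetical in-memory writer following the IWriter / ISheetWriter shape.
# The base classes are local stand-ins, NOT imports from pyexcel_io.


class ISheetWriter:
    def write_row(self, data_row):
        raise NotImplementedError("How does your sheet write a row of data")

    def write_array(self, table):
        # Assumed behaviour: write a 2D array one row at a time.
        for row in table:
            self.write_row(row)


class IWriter:
    def create_sheet(self, sheet_name):
        raise NotImplementedError("Please implement a native sheet writer")

    def write(self, incoming_dict):
        for sheet_name in incoming_dict:
            sheet_writer = self.create_sheet(sheet_name)
            # Assumed continuation: hand the sheet's rows to its writer.
            sheet_writer.write_array(incoming_dict[sheet_name])


class MemorySheetWriter(ISheetWriter):
    """Appends each row to a caller-supplied list."""

    def __init__(self, target):
        self.target = target

    def write_row(self, data_row):
        self.target.append(list(data_row))


class MemoryWriter(IWriter):
    """Collects every sheet as {sheet name: list of rows}."""

    def __init__(self):
        self.book = {}

    def create_sheet(self, sheet_name):
        self.book[sheet_name] = []
        return MemorySheetWriter(self.book[sheet_name])


if __name__ == "__main__":
    writer = MemoryWriter()
    writer.write({"sheet 1": [[1, 2], [3, 4]]})
    print(writer.book)
```

A concrete plugin would only need to supply create_sheet() and a write_row() that talks to its native library; the iteration over sheets and rows stays in the base classes.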
