Commit

Add section on input formats
jpmckinney committed Aug 17, 2020
1 parent f5c54b5 commit 271313f
Showing 1 changed file with 20 additions and 0 deletions.
20 changes: 20 additions & 0 deletions docs/python/code.rst
@@ -134,6 +134,26 @@ Script patterns
- Examples: `extension_registry <https://github.com/open-contracting/extension_registry/blob/master/manage.py>`__, `deploy <https://github.com/open-contracting/deploy/blob/master/manage.py>`__
Input formats
-------------
JSON
~~~~
In most cases, use the ``json`` module from the `standard library <https://docs.python.org/3/library/json.html>`__.
For performance-critical code paths that handle small files, use `orjson <https://pypi.org/project/orjson/>`__.
.. note::

   We can switch to the Python bindings for simdjson, pending `benchmarks <https://github.com/TkTech/pysimdjson/issues/42>`__. For JSON documents with known structures, `JSON Link <https://github.com/beached/daw_json_link>`__ is fastest, but the files relevant to us have unknown structures.
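As a minimal sketch of the two options above, the following helpers use orjson when it is installed and fall back to the standard library otherwise. The function names ``loads`` and ``dumps`` are illustrative assumptions, not part of any project's API. Note that ``orjson.dumps`` returns ``bytes``, so the fallback encodes its output to match.

```python
import json

try:
    import orjson  # third-party and optional; faster for small documents
except ImportError:
    orjson = None


def loads(data):
    """Parse a JSON document from str or bytes."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)


def dumps(obj):
    """Serialize obj to compact UTF-8 encoded bytes."""
    if orjson is not None:
        return orjson.dumps(obj)
    # Match orjson's compact, non-escaped UTF-8 output as closely as possible.
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
```

Callers that need ``str`` output can decode the result with ``dumps(obj).decode("utf-8")``.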
For large files, use the `same techniques <https://ocdskit.readthedocs.io/en/latest/contributing.html#streaming>`__ as OCDS Kit: stream input using `ijson <https://pypi.org/project/ijson/>`__, stream output using `iterencode <https://docs.python.org/3/library/json.html#json.JSONEncoder.iterencode>`__, and postpone evaluation using iterators. See its `brief tutorial <https://ocdskit.readthedocs.io/en/latest/library.html#working-with-streams>`__ on streaming, and reuse its `default method <https://ocdskit.readthedocs.io/en/latest/_modules/ocdskit/util.html>`__.
.. note::

   ijson uses `Yajl <http://lloyd.github.io/yajl/>`__. `simdjson <https://simdjson.org>`__ is faster, but is limited to `files smaller than 4 GB <https://github.com/simdjson/simdjson/blob/master/doc/basics.md#newline-delimited-json-ndjson-and-json-lines>`__ and has no `streaming API <https://github.com/simdjson/simdjson/issues/31>`__.
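The streaming approach can be sketched as follows. This is an illustration under stated assumptions, not OCDS Kit's actual code: the names ``SerializableGenerator``, ``default``, ``iter_items`` and ``write_json`` are hypothetical here, and the list-subclass wrapper relies on ``iterencode`` using the pure-Python encoder, which iterates over list subclasses lazily. ijson is imported inside ``iter_items`` so that the output half works without it.

```python
import itertools
import json
from decimal import Decimal


class SerializableGenerator(list):
    """Wrap an iterator so that iterencode() streams it as a JSON array."""

    def __init__(self, iterable):
        super().__init__()
        iterator = iter(iterable)
        try:
            # Pull one item so the list is non-empty when the encoder tests it.
            self._head = [next(iterator)]
            self.append(iterator)
        except StopIteration:
            self._head = []

    def __iter__(self):
        # Yield the held item, then lazily consume the rest of the iterator.
        return itertools.chain(self._head, *self[:1])


def default(obj):
    """Hypothetical ``default`` hook: handle Decimal values and iterators."""
    if isinstance(obj, Decimal):  # ijson can yield Decimal for non-integer numbers
        return float(obj)
    if hasattr(obj, "__iter__"):
        return SerializableGenerator(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def iter_items(file, prefix):
    """Lazily yield the objects under ``prefix`` from a binary file object."""
    import ijson  # third-party

    return ijson.items(file, prefix)


def write_json(data, file):
    """Write data to a text file object chunk by chunk, without building one large string."""
    encoder = json.JSONEncoder(default=default, separators=(",", ":"))
    for chunk in encoder.iterencode(data):
        file.write(chunk)


# Usage, assuming hypothetical file names and a top-level "releases" array:
#
# with open("input.json", "rb") as f, open("output.json", "w") as g:
#     write_json({"releases": iter_items(f, "releases.item")}, g)
```

Because the releases are passed as an iterator, each one is parsed and written before the next is read, so memory use stays roughly constant regardless of file size.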
Output formats
--------------