Commit

Update links to simdjson
jpmckinney committed Oct 13, 2020
1 parent fc5a27c commit c7b65ef
Showing 1 changed file with 2 additions and 2 deletions.
docs/python/file_formats.rst
@@ -29,13 +29,13 @@ For critical paths involving small files, use `orjson <https://pypi.org/project/

.. note::

- We can switch to the Python bindings for simdjson, pending `benchmarks <https://github.com/TkTech/pysimdjson/issues/42>`__. For JSON documents with known structures, `JSON Link <https://github.com/beached/daw_json_link>`__ is fastest, but the files relevant to us have unknown structures.
+ We can switch to the Python bindings for simdjson: either `pysimdjson <https://github.com/TkTech/pysimdjson>`__ or `libpy_simdjson <https://github.com/gerrymanoim/libpy_simdjson>`__. For JSON documents with known structures, `JSON Link <https://github.com/beached/daw_json_link>`__ is fastest, but the files relevant to us have unknown structures.

For large files, use the `same techniques <https://ocdskit.readthedocs.io/en/latest/contributing.html#streaming>`__ as OCDS Kit to stream input using `ijson <https://pypi.org/project/ijson/>`__, stream output using `iterencode <https://docs.python.org/3/library/json.html#json.JSONEncoder.iterencode>`__, and postpone evaluation using iterators. See its `brief tutorial <https://ocdskit.readthedocs.io/en/latest/library.html#working-with-streams>`__ on streaming and re-use its `default method <https://ocdskit.readthedocs.io/en/latest/_modules/ocdskit/util.html>`__.

.. note::

- ijson uses `Yajl <http://lloyd.github.io/yajl/>`__. `simdjson <https://simdjson.org>`__ is faster, but is limited to `files smaller than 4 GB <https://github.com/simdjson/simdjson/blob/master/doc/basics.md#newline-delimited-json-ndjson-and-json-lines>`__ and has no `streaming API <https://github.com/simdjson/simdjson/issues/31>`__.
+ ijson uses `Yajl <http://lloyd.github.io/yajl/>`__. `simdjson <https://simdjson.org>`__ is faster, but is limited to `files smaller than 4 GB <https://github.com/simdjson/simdjson/issues/128>`__ and has no `streaming API <https://github.com/simdjson/simdjson/issues/670>`__.

Output
~~~~~~
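The unchanged paragraph in this diff recommends streaming output with `iterencode` and postponing evaluation with iterators. Below is a minimal, standard-library-only sketch of how those two ideas can fit together. It is not OCDS Kit's actual `default` method, and the names `SerializableGenerator`, `default`, `iter_json` and `write_json` are hypothetical. It assumes input records arrive as iterators (for example, from ijson, which by default parses non-integer numbers as `Decimal`). `json.dumps()` returns one complete string, so it would hold the whole output in memory; `iterencode()` instead yields chunks that can be written as they are produced, and its pure-Python encoder checks a list's truthiness and then iterates it, so a list subclass that overrides `__iter__` lets it consume an iterator lazily.

```python
import itertools
import json
from collections.abc import Iterator
from decimal import Decimal


class SerializableGenerator(list):
    """Hypothetical list subclass that lets json's pure-Python encoder
    consume an iterator lazily instead of materializing it with list()."""

    def __init__(self, iterable):
        super().__init__()
        rest = iter(iterable)
        try:
            # Peek at one item, so that an empty iterator stays an empty
            # (falsy) list, which the encoder writes as "[]".
            self._head = [next(rest)]
            # Store the remaining iterator, making the list non-empty (truthy).
            self.append(rest)
        except StopIteration:
            self._head = []

    def __iter__(self):
        # Yield the peeked item, then everything left in the original iterator.
        return itertools.chain(self._head, *self[:1])


def default(obj):
    """Hypothetical `default` hook for json.JSONEncoder."""
    if isinstance(obj, Iterator):
        # Generators, map objects, etc. are encoded as JSON arrays, lazily.
        return SerializableGenerator(obj)
    if isinstance(obj, Decimal):
        # Assumption: converting to float (lossy) is acceptable here.
        return float(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def iter_json(obj):
    """Return an iterator of JSON text chunks; nothing is encoded until it is consumed."""
    return json.JSONEncoder(default=default).iterencode(obj)


def write_json(obj, fp):
    """Write `obj` to the text file `fp` chunk by chunk."""
    for chunk in iter_json(obj):
        fp.write(chunk)
```

For example, `write_json({"releases": (r for r in records)}, sys.stdout)` (with a hypothetical `records` iterable) writes each release as it is pulled from the generator. Peeking at the first item is the key design choice: an always-truthy wrapper around an empty iterator would make the encoder emit `]` without the opening `[`, producing invalid JSON.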