
add the option for json.dumps to return newline delimited json #78710

Open
ronron mannequin opened this issue Aug 28, 2018 · 12 comments
Assignees
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments


ronron mannequin commented Aug 28, 2018

BPO 34529
Nosy @rhettinger, @etrepum, @serhiy-storchaka, @enedil

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/etrepum'
closed_at = None
created_at = <Date 2018-08-28.12:47:12.716>
labels = ['3.8', 'type-feature', 'library']
title = 'add the option for json.dumps to return newline delimited json'
updated_at = <Date 2019-08-18.21:49:41.600>
user = 'https://bugs.python.org/ronron'

bugs.python.org fields:

activity = <Date 2019-08-18.21:49:41.600>
actor = 'Thibault Molleman'
assignee = 'bob.ippolito'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2018-08-28.12:47:12.716>
creator = 'ronron'
dependencies = []
files = []
hgrepos = []
issue_num = 34529
keywords = []
message_count = 10.0
messages = ['324244', '324257', '324271', '324305', '324358', '324360', '324363', '324370', '324371', '324475']
nosy_count = 6.0
nosy_names = ['rhettinger', 'bob.ippolito', 'serhiy.storchaka', 'enedil', 'ronron', 'Thibault Molleman']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue34529'
versions = ['Python 3.8']


ronron mannequin commented Aug 28, 2018

Many service providers, such as Google BigQuery, do not accept plain JSON; they accept only newline-delimited JSON.

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#limitations

Please allow this format to be produced directly by the dump functions.

@ronron ronron mannequin added 3.7 (EOL) end of life type-feature A feature request or enhancement labels Aug 28, 2018

enedil mannequin commented Aug 28, 2018

So this format is just a series of JSON documents, delimited by newlines.
Instead of changing the API, you might consider this piece of code:

def ndjsondump(objects):
    return '\n'.join(json.dumps(obj) for obj in objects)

Conversely, to read it back, you first split the text with str.splitlines, then apply json.loads to each line separately.
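That reading side can be sketched the same way (the helper name ndjsonload is hypothetical, mirroring ndjsondump above):

```python
import json

def ndjsonload(text):
    # Parse each line of newline-delimited JSON as its own document.
    return [json.loads(line) for line in text.splitlines()]

ndjsonload('{"a": 1}\n{"b": 2}')  # -> [{'a': 1}, {'b': 2}]
```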

Does it satisfy your needs?

@rhettinger
Contributor

Would this need be fulfilled by the *separators* option?

    >>> from json import dumps
    >>> query = dict(system='primary', action='load')
    >>> print(dumps(query, separators=(',\n', ': ')))
    {"system": "primary",
    "action": "load"}


ronron mannequin commented Aug 29, 2018

Raymond Hettinger's answer is incorrect.

The main difference between JSON and newline-delimited JSON is that the newline-delimited form contains a complete, valid JSON document on each line. That means you can go to line #47 of the file and what you find on that line is valid JSON, unlike regular JSON, where one wrong bracket makes the whole file unreadable.

You cannot just add '\n' after one "object"; you also need to change the brackets.

Keep in mind that not all JSON documents are simple; some contain a huge number of nested objects. You must identify where each JSON document starts and ends without being confused by the nesting.

There are many programming solutions to this issue.
For example:
https://stackoverflow.com/questions/51595072/convert-json-to-newline-json-standard-using-python/

My point is that this is a new format which is going to be widely accepted, since Google adopted it for BigQuery.

Reversing a string can also be implemented easily, yet Python still provides a built-in way to do it for the user.

I think it's wise to support this within the json library, saving programmers the trouble of figuring out how to implement it.
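For what it's worth, the conversion the linked question asks about does not require tracking brackets by hand: assuming the input is a single top-level JSON array, the parser handles all the nesting. A minimal sketch (the function name is illustrative):

```python
import json

def array_to_ndjson(json_text):
    # Parse the whole document once, then re-serialize each top-level
    # element compactly on its own line; json.loads/json.dumps take
    # care of nested objects, so no manual bracket matching is needed.
    records = json.loads(json_text)
    return '\n'.join(json.dumps(r, separators=(',', ':')) for r in records)
```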

@rhettinger
Contributor

The main difference between Json and new line delimited json is that new line contains valid json in each line.

It is up to Bob to decide whether this feature request is within the scope of the module.

@rhettinger rhettinger added stdlib Python modules in the Lib dir 3.8 only security fixes and removed 3.7 (EOL) end of life labels Aug 29, 2018

etrepum mannequin commented Aug 29, 2018

I think the best start would be to add a bit of documentation with an example of how you could work with newline delimited json using the existing module as-is. On the encoding side you need to ensure that it's a compact representation without embedded newlines, e.g.:

    for obj in objs:
        yield json.dumps(obj, separators=(',', ':')) + '\n'

I don't think it would make sense to support this directly from dumps, as it's really multiple documents rather than the single document that every other form of dumps will output.

On the read side it would be something like:

    for doc in lines:
        yield json.loads(doc)

I'm not sure if this is common enough (and separable enough from I/O and error handling constraints) to be worth adding the functionality directly to the json module. I think it would be more appropriate in the short to medium term for each service (e.g. BigQuery) to have its own module with helper functions, or a framework that encapsulates the protocol that the particular service speaks.

@serhiy-storchaka
Member

This format is known as JSON Lines: http://jsonlines.org/. Its support in the user code is trivial -- just one or two lines of code.

Writing:

    for item in items:
        json.dump(item, file)
        file.write('\n')  # each document must end on its own line

or

jsondata = '\n'.join(json.dumps(item) for item in items)

Reading:

    items = [json.loads(line) for line in file]

or

    items = [json.loads(line) for line in jsondata.splitlines()]

See also bpo-31553 and bpo-34393. I think all these propositions should be rejected.


ronron mannequin commented Aug 30, 2018

I'm a bit confused here.

On one hand you say it's two lines of code; on the other hand you suggest that each service provider will implement its own functions.

What's the harm in adding a small, unbreakable piece of functionality?

The same argument about small code could have been raised against implementing reverse(), yet Python still implemented it and saved the developer those two lines of code.

In the end, small change or not, this is a new format.
Conversion between formats is expected from any programming language.


etrepum mannequin commented Aug 30, 2018

I suggested that each module would likely implement its own functions tailored to that project's IO and error handling requirements. The implementation may differ slightly depending on the protocol. This is consistent with how JSON is typically dealt with from a web framework, for example.


ronron mannequin commented Sep 2, 2018

Well... when handling GBs of data, it's preferable to generate the file directly in the required format rather than doing conversions afterwards.

Newline-delimited JSON is a format; protocols don't matter here.
I still think the library should allow the user to create this format directly.

Let's step outside the scope of Google or others. Newline-delimited JSON is a great format: it allows you to take a "row" from the middle of the file and look at it. You don't need to read a 1 GB file into a parser to see it; you can just copy one row. We are adopting this format for all our JSON, so it would be nice to get this directly from the library.
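The random-access property described here can be sketched with a reader that pulls a single record out of a large JSON Lines file without loading the rest into memory (read_record is a hypothetical helper, not part of the json module):

```python
import json
from itertools import islice

def read_record(path, n):
    # Fetch record n (0-based) from a JSON Lines file lazily:
    # islice advances the file iterator line by line, so only
    # the first n+1 lines are ever read.
    with open(path) as f:
        line = next(islice(f, n, n + 1))
    return json.loads(line)
```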

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@ChrisBarker-NOAA
Contributor

This has been sleeping for quite some time, but I happened upon it looking for something else, so I thought I'd put in a wrap-up comment / suggestion:

As @serhiy-storchaka pointed out, jsonlines is a different format -- it is not JSON with particular placement of newlines.

Anyway, the key difference from JSON is that it's a sequence of JSON blobs rather than one, and there are no newlines inside each blob.

To support this, there would be a new function similar to json.dump that would take an iterable of objects and serialize them to a file-like object (and presumably a load function that would return an iterable of objects).

@serhiy-storchaka and @etrepum have already provided some prototype options.
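A minimal sketch of what such a pair might look like, assuming the hypothetical names dump_lines/load_lines and the compact separators suggested earlier in the thread:

```python
import json

def dump_lines(objs, fp):
    # Hypothetical json.dump-style writer: one compact JSON document
    # per line, with no embedded newlines, as JSON Lines requires.
    for obj in objs:
        fp.write(json.dumps(obj, separators=(',', ':')) + '\n')

def load_lines(fp):
    # Hypothetical json.load-style reader: yields one object per
    # non-empty line of the file-like object.
    for line in fp:
        if line.strip():
            yield json.loads(line)
```

A round trip through io.StringIO shows the symmetry: dump_lines([{"a": 1}, [1, 2]], buf) writes two lines, and list(load_lines(buf)) recovers the original objects.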

So: would the core devs consider a PR with these functions?

If not, then a PR for, as @etrepum suggested, an addendum to the docs for how to read/write jsonlines?

@ChrisBarker-NOAA
Contributor

Not sure why I didn't do this yesterday, but of course there is at least one package on PyPI for JSON Lines:

https://pypi.org/project/jsonlines/

So the solution to this issue becomes one of:

  1. tell folks to use a third party package
    or
  2. add jsonlines support to the stdlib.

Personally, I think that if it's added to the stdlib, it should use a simpler API than the jsonlines package, one more similar to the existing json API; that may itself be an argument for simply pointing people to a third-party package.

Maybe a decision could be made by a core dev and this could be closed :-)
