
Add support for newline-delimited JSON #13

Merged 4 commits on May 16, 2021
Conversation

eyeseast
Contributor

☝️ does that. Closes #12

A couple caveats:

  • Features are streamed in with a generator, since this is intended for large datasets. That means we can't look ahead at the first 100 features.
  • We can't auto-detect feature IDs using ndjson. I'm ok with that. You can still pass --pk=id and get the same thing.
  • We can't use the first 100 features to build the initial table. Again, I think it's fine. I note in the README how to grab a subset using Fiona.
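For context, the streaming approach described above can be sketched as a generator that yields one feature per line. This is a minimal illustration of the newline-delimited GeoJSON pattern, not the PR's actual code; `stream_features` is a hypothetical helper name:

```python
import json


def stream_features(fp):
    # Newline-delimited GeoJSON: one Feature object per line.
    # Yield each feature as a dict without reading the whole file
    # into memory, which is the point for large datasets.
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Because this is a generator, downstream code sees each feature exactly once, which is why look-ahead (e.g. at the first 100 features) needs extra handling.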

I have one test here, but I'm very much open to suggestions for more.

@eyeseast eyeseast requested a review from simonw February 13, 2020 02:32
@simonw
Owner

simonw commented Feb 15, 2020

There's a pattern for peeking ahead in sqlite-utils here:

https://github.com/simonw/sqlite-utils/blob/e8b2b7383bd94659d3b7a857a1414328bc48bc19/sqlite_utils/db.py#L993-L1004

You can use itertools.islice to pull out the first 100 items (and turn them into a list with list()), then use itertools.chain(that_list, original_iterator) to loop through the first 100 items followed by the rest of the iterator.
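The `islice`/`chain` idea above can be sketched like this; `peek` is a hypothetical helper name, not the sqlite-utils function:

```python
import itertools


def peek(iterator, n=100):
    # Materialize the first n items, then return both the sample and
    # an iterator that replays the sample before the remainder, so
    # the caller can inspect the head without losing any items.
    first = list(itertools.islice(iterator, n))
    return first, itertools.chain(first, iterator)
```

The sample can be inspected (for example, to guess column types or a primary key) while the chained iterator still yields every item exactly once.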

@simonw
Owner

simonw commented Feb 15, 2020

This looks great - the tests look robust enough to me.

@eyeseast
Contributor Author

Cool. I'll see if I can get the peek-ahead to work in a reasonable way. I was thinking about islice, but wasn't totally sure where to apply it with yield_records also happening.

@eyeseast
Contributor Author

I got feature.id working by sampling the stream of features coming in, and then chaining that sample back into the original stream.

Getting it to work with processed features to guess column types starts to feel precarious, because lists and generators are going to operate differently. It might ultimately be easier to do features = iter(features) at the top, so it's always a one-way stream and everything operates the same, but I'm not sure that's worth it. I think collecting a subset into a feature collection, like I describe in the readme, actually feels a little easier and more deliberate.
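The sample-then-chain approach described above, including the `features = iter(features)` normalization so lists and generators behave the same, could look roughly like this. This is a sketch, not the merged code; `detect_pk` is a hypothetical helper name:

```python
import itertools


def detect_pk(features, sample_size=100):
    # Normalize to an iterator up front so a list and a generator
    # behave identically (a one-way stream either way).
    features = iter(features)
    # Pull a sample off the stream to check whether every feature
    # carries a top-level "id" we could use as a primary key.
    sample = list(itertools.islice(features, sample_size))
    has_ids = bool(sample) and all("id" in f for f in sample)
    # Chain the sample back so the caller still sees the full stream.
    return has_ids, itertools.chain(sample, features)
```

Without the `iter()` call at the top, passing a plain list would hand `islice` and `chain` two independent iterators over the same data and the sampled features would be yielded twice, which is exactly the list-versus-generator mismatch described above.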

simonw added a commit that referenced this pull request May 16, 2021
@simonw simonw merged commit 13c4e5a into simonw:master May 16, 2021
simonw added a commit that referenced this pull request May 17, 2021