Add support for newline-delimited JSON (NDJSON) #194

Closed
amanjeev opened this issue May 15, 2021 · 3 comments
Labels
perf Performance

Comments

@amanjeev

amanjeev commented May 15, 2021

Summary

Working through #188 as an exercise showed that support for newline-delimited JSON (NDJSON) is not implemented in this crate.

Why

This feature is helpful when you have a large number of records, each of which is a small JSON object on its own line. This is often the case with large JSON files, and looping over the lines and calling simd-json on each one is not going to help. @Licenser explains why in this comment:

Yes, the lines are fairly short too, so the advantages are a lot smaller (sometimes detrimental): there is an initial cost to pay for filling the registers, doing multiple runs, etc., which can overshadow the performance gain for very small payloads.

@Licenser also adds

NDJSON would be incredibly cool (especially if we manage to realize in a streaming fashion / as an iterator)

What

Upstream simdjson has this feature called parse_many. Porting that to this crate is the first step.

!!!NEEDS MORE DETAILS!!!

@Licenser
Member

Just sketching something here. An API that would be really nice would be something like this (not valid Rust syntax, just pseudocode!):

fn iter_lines(r: Read) -> Iter<simd_json::DeserializableType>;

for item in iter_lines(file) {
    do_stuff(item)
}
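
For illustration, the pseudocode above could be made concrete roughly as follows. This is only a sketch under assumptions, not an API this crate provides: the `LineRecords` type and this `iter_lines` signature are hypothetical, and the parser is passed in as a closure so the example depends on nothing but the standard library. The closure takes `&mut [u8]` because simd-json's parsing functions (for example `simd_json::to_owned_value`) take a mutable byte slice.

```rust
use std::io::{self, BufRead};

/// Hypothetical iterator that yields one parsed record per non-empty line.
pub struct LineRecords<R, F> {
    reader: R,
    parse: F,
    buf: Vec<u8>, // reused for every line to avoid a fresh allocation each time
}

/// Hypothetical constructor mirroring the `iter_lines` pseudocode.
/// `parse` stands in for whatever turns one line into a value.
pub fn iter_lines<R, F, T>(reader: R, parse: F) -> LineRecords<R, F>
where
    R: BufRead,
    F: FnMut(&mut [u8]) -> io::Result<T>,
{
    LineRecords { reader, parse, buf: Vec::new() }
}

impl<R, F, T> Iterator for LineRecords<R, F>
where
    R: BufRead,
    F: FnMut(&mut [u8]) -> io::Result<T>,
{
    type Item = io::Result<T>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            self.buf.clear();
            match self.reader.read_until(b'\n', &mut self.buf) {
                Ok(0) => return None, // end of input
                Ok(_) => {}
                Err(e) => return Some(Err(e)),
            }
            // Strip the trailing "\n" or "\r\n".
            while matches!(self.buf.last().copied(), Some(b'\n' | b'\r')) {
                self.buf.pop();
            }
            if self.buf.is_empty() {
                continue; // skip blank lines
            }
            return Some((self.parse)(self.buf.as_mut_slice()));
        }
    }
}
```

With something like this, `for item in iter_lines(reader, parse) { do_stuff(item?) }` reads much like the pseudocode. Note that it still parses one line at a time, so it only illustrates the API shape and does not address the per-line startup cost described above.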

@Licenser
Member

see #124 for additional details

@Licenser
Member

superseded by #349
