

flattened JSON access for the tape #91

Open
balajijinnah opened this issue Dec 23, 2019 · 13 comments

Comments

@balajijinnah balajijinnah commented Dec 23, 2019

Does simdjson have flattened JSON access? (similar to https://github.com/pikkr/pikkr)

Will there be any performance improvement if I use flattened JSON access?


Added by @Licenser as an issue description

The Tape struct should be queryable via a simplified version of JSONPath (section 3.2 in the paper linked below).

To achieve this we need, at minimum:

  • a parser that takes a query string and turns it into a digestible format
  • a function that takes said format and applies it to a Tape
  • support for .<field> to query an object field
  • support for [<index>] to query array indexes
  • support for nesting those two
  • sufficient tests to cover the code (sufficient here is defined as 'does not drop crate coverage' or better)

Additional JSONPath operators are welcome but optional.
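A minimal sketch of the query-string parser from the first bullet, covering .<field>, [<index>] and their nesting. It assumes a hand-written parser (pest is the alternative discussed later in the thread); the Segment type and parse_path function are hypothetical names, not part of simd-json, and the optional leading $ is borrowed from JSONPath syntax.

```rust
/// One step of a simplified JSONPath query (hypothetical type, not part of simd-json).
#[derive(Debug, PartialEq)]
pub enum Segment {
    /// `.name` selects an object field.
    Field(String),
    /// `[3]` selects an array element.
    Index(usize),
}

/// Parses queries such as `$.skills.language` or `.items[0].name` into segments.
/// The leading `$` is optional; errors report the byte offset of the problem.
pub fn parse_path(query: &str) -> Result<Vec<Segment>, String> {
    let bytes = query.as_bytes();
    let mut pos = if bytes.first() == Some(&b'$') { 1 } else { 0 };
    let mut segments = Vec::new();
    while pos < bytes.len() {
        match bytes[pos] {
            b'.' => {
                // A field name runs until the next '.' or '[' (no quoting or escapes here).
                let start = pos + 1;
                let end = query[start..]
                    .find(|c: char| c == '.' || c == '[')
                    .map_or(query.len(), |i| start + i);
                if end == start {
                    return Err(format!("empty field name at offset {}", pos));
                }
                segments.push(Segment::Field(query[start..end].to_string()));
                pos = end;
            }
            b'[' => {
                // An index is a run of decimal digits followed by ']'.
                let start = pos + 1;
                let close = query[start..]
                    .find(']')
                    .map(|i| start + i)
                    .ok_or_else(|| format!("unclosed '[' at offset {}", pos))?;
                let index = query[start..close]
                    .parse::<usize>()
                    .map_err(|_| format!("invalid array index at offset {}", start))?;
                segments.push(Segment::Index(index));
                pos = close + 1;
            }
            other => return Err(format!("unexpected byte {:#04x} at offset {}", other, pos)),
        }
    }
    Ok(segments)
}
```

With this sketch, parse_path(".items[2].name") yields Field("items"), Index(2), Field("name"), while a bare skills (no leading dot) is rejected with an error. Keeping the parsed form as a plain Vec makes it easy to apply the same query to many tapes.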

@Licenser Licenser commented Dec 23, 2019 · Member

I'm not super familiar with pikkr and will have to look at the paper it references once I find the time.

I think tape parsing with simd_json::to_tape will give you a flattened representation of the DOM, and it reaches up to 2GB/s in parsing speed that way. But I'm not sure that's exactly what you're looking for.

That said, I need to do a bit more research before I can say whether it would improve performance, and I sure will :) Thanks for bringing this up!
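To make the "flattened representation of the DOM" more concrete, here is a sketch of how a nested document can be laid out as a flat tape in pre-order. The Value and Node types are hypothetical and only loosely modelled on the idea behind simd_json's tape; the crate's real Node type and its fields differ. Each container records how many tape entries are nested under it, so a reader can later skip a whole subtree in one step.

```rust
/// A tiny in-memory DOM for illustration (strings only, for brevity).
#[derive(Debug)]
pub enum Value {
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Hypothetical flat tape node; not simd-json's actual `Node` type.
#[derive(Debug, PartialEq)]
pub enum Node {
    /// `len` = number of key/value pairs, `count` = tape entries nested under this node.
    Object { len: usize, count: usize },
    /// `len` = number of elements, `count` = tape entries nested under this node.
    Array { len: usize, count: usize },
    /// A string value, or an object key when it directly precedes a value.
    Str(String),
}

/// Appends `value` to `tape` in pre-order, back-filling each container's `count`.
pub fn flatten(value: &Value, tape: &mut Vec<Node>) {
    match value {
        Value::Str(s) => tape.push(Node::Str(s.clone())),
        Value::Array(items) => {
            let at = tape.len();
            tape.push(Node::Array { len: items.len(), count: 0 }); // placeholder
            for item in items {
                flatten(item, tape);
            }
            let count = tape.len() - at - 1;
            tape[at] = Node::Array { len: items.len(), count };
        }
        Value::Object(fields) => {
            let at = tape.len();
            tape.push(Node::Object { len: fields.len(), count: 0 }); // placeholder
            for (key, val) in fields {
                tape.push(Node::Str(key.clone()));
                flatten(val, tape);
            }
            let count = tape.len() - at - 1;
            tape[at] = Node::Object { len: fields.len(), count };
        }
    }
}
```

For the document {"name": "Licenser", "skills": {"language": "Rust"}} this produces seven nodes: the outer object (count 6), then name, Licenser, skills, the inner object (count 2), language and Rust.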

@balajijinnah balajijinnah commented Dec 23, 2019 · Author

I want to do something like this.

Example JSON:

{
	"name": "Licenser",
	"skills": {
		"language": "Rust"
	}
}

To get the language, the parser takes the flattened key skills.language as input and returns String("Rust").
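A minimal sketch of these lookup semantics, assuming a plain nested tree rather than the tape: the key is split on dots and each part selects a field of the current object. The Value type and get_flat function are hypothetical names used only for illustration.

```rust
/// A tiny nested DOM for illustration (hypothetical; strings and objects only).
#[derive(Debug, PartialEq)]
pub enum Value {
    Str(String),
    Object(Vec<(String, Value)>),
}

/// Resolves a flattened key such as `skills.language` by descending one object
/// level per dot-separated part. Keys that themselves contain '.' cannot be
/// addressed this way.
pub fn get_flat<'a>(root: &'a Value, key: &str) -> Option<&'a Value> {
    key.split('.').try_fold(root, |current, part| match current {
        Value::Object(fields) => fields.iter().find(|(k, _)| k == part).map(|(_, v)| v),
        // A string has no fields, so the lookup fails.
        Value::Str(_) => None,
    })
}
```

On the example above, get_flat(&doc, "skills.language") returns the "Rust" string, while "skills.missing" or "name.first" return None.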

I'm keeping this open for tracking <3.

Thanks for looking into the issue.

@Licenser Licenser commented Dec 23, 2019 · Member

With the tape, that shouldn't be too hard, I think. It would just be a traversal of the array while keeping track of the nesting.

I really like the idea; it would allow some flexibility in access. I'll mark it as a good first issue and help wanted. If you or anyone else is interested in grabbing it, I'll gladly put some time aside to pair on it or help otherwise.
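The tape traversal described here can be sketched as follows. It assumes a hypothetical tape layout where each object or array node stores its element count (len) and the number of tape entries nested under it (count), and where object keys are stored as string nodes directly before their values. Because count lets the walker jump over a whole subtree, this sketch never needs an explicit stack to keep nesting in check. None of these names are simd-json's actual API.

```rust
/// Hypothetical flat tape node (not simd-json's actual type): containers store
/// their element count `len` and the number of tape entries nested under them `count`.
#[derive(Debug, PartialEq)]
pub enum Node {
    Object { len: usize, count: usize },
    Array { len: usize, count: usize },
    /// A string value, or an object key when it directly precedes a value.
    Str(String),
}

/// One step of a path: an object field or an array index.
pub enum Segment<'a> {
    Field(&'a str),
    Index(usize),
}

/// Number of tape entries taken by the value starting at `i` (itself plus descendants).
fn span(tape: &[Node], i: usize) -> usize {
    match tape[i] {
        Node::Object { count, .. } | Node::Array { count, .. } => 1 + count,
        Node::Str(_) => 1,
    }
}

/// Walks a well-formed tape and returns the index of the node selected by `path`.
pub fn lookup(tape: &[Node], path: &[Segment<'_>]) -> Option<usize> {
    let mut i = 0;
    for seg in path {
        match (tape.get(i)?, seg) {
            (Node::Object { len, .. }, Segment::Field(name)) => {
                // Object entries are laid out as key, value, key, value, ...
                let mut cursor = i + 1;
                let mut found = None;
                for _ in 0..*len {
                    let value = cursor + 1;
                    if matches!(&tape[cursor], Node::Str(key) if key == name) {
                        found = Some(value);
                        break;
                    }
                    // Jump over the whole value, however deeply nested it is.
                    cursor = value + span(tape, value);
                }
                i = found?;
            }
            (Node::Array { len, .. }, Segment::Index(n)) => {
                if n >= len {
                    return None;
                }
                let mut cursor = i + 1;
                for _ in 0..*n {
                    cursor += span(tape, cursor);
                }
                i = cursor;
            }
            // Field on a non-object, index on a non-array, or anything on a string.
            _ => return None,
        }
    }
    Some(i)
}
```

For {"name": "Licenser", "skills": {"language": "Rust"}, "tags": ["a", "b"]}, the path skills, language skips the name/Licenser pair and lands on the "Rust" node, and tags, [1] skips the whole skills object in a single jump.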

@Licenser Licenser added this to To do in 0.3 via automation Dec 23, 2019
@Licenser Licenser changed the title Question: does simdjson support flattend json access? flattend json access for the tape Dec 23, 2019
@Licenser Licenser commented Dec 23, 2019 · Member

Renamed (as it moved from a question to a feature) and assigned to the 0.3 goal.

@sunnygleason sunnygleason commented Dec 23, 2019 · Member

@Licenser I was just looking at pikkr's benchmark(s); it looks like we might be able to do a quick apples-to-apples comparison pretty easily.

https://github.com/pikkr/pikkr/blob/master/benches/parser.rs
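A rough harness for such a comparison could time repeated parses of the same input and report throughput in GB/s, the unit used earlier in the thread. This is a sketch with hypothetical bench and throughput_gbps helpers; the parse closure stands in for whichever parser is being measured. Parsers that work in place on a mutable buffer (as simd-json does) would need a fresh copy of the input inside the closure, which adds copy time to the measurement.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Converts `bytes` processed in `elapsed` into GB/s (1 GB = 10^9 bytes here).
pub fn throughput_gbps(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / elapsed.as_secs_f64() / 1e9
}

/// Runs `parse` over `input` `iterations` times and returns the average throughput in GB/s.
/// `parse` is a placeholder for whichever parser is being compared.
pub fn bench<F: FnMut(&[u8])>(input: &[u8], iterations: usize, mut parse: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box discourages the optimizer from treating the input as a known constant.
        parse(black_box(input));
    }
    throughput_gbps(input.len() * iterations, start.elapsed())
}
```

A single timed loop like this is noisy; for published numbers, a statistics-aware harness such as the linked benches/parser.rs setup is the more reliable choice.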

@balajijinnah I also wanted to mention that a comparison of the approaches (not necessarily the current implementations) is provided in the "related work" section of @lemire's simdjson paper: https://arxiv.org/pdf/1902.08318.pdf

Thank you for sharing this!

@sunnygleason sunnygleason commented Dec 23, 2019 · Member

Also, a JSON Pointer tool is part of simdjson; presumably, this could be ported to simdjson-rs:

https://github.com/lemire/simdjson/blob/master/tools/jsonpointer.cpp
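If that tool is ported, a first step would be splitting a JSON Pointer (RFC 6901) such as /skills/language into reference tokens, which could then drive the same kind of tape walk as a field/index path. This is a sketch with a hypothetical pointer_tokens function; it follows the RFC's unescaping order but does not reject invalid escapes such as ~2.

```rust
/// Splits an RFC 6901 JSON Pointer into unescaped reference tokens.
/// Returns `None` if a non-empty pointer does not start with '/'.
/// This sketch does not reject invalid escapes such as "~2".
pub fn pointer_tokens(pointer: &str) -> Option<Vec<String>> {
    if pointer.is_empty() {
        // The empty pointer refers to the whole document.
        return Some(Vec::new());
    }
    let rest = pointer.strip_prefix('/')?;
    Some(
        rest.split('/')
            // Decode "~1" before "~0", as RFC 6901 requires, so "~01" becomes "~1" rather than "/".
            .map(|token| token.replace("~1", "/").replace("~0", "~"))
            .collect(),
    )
}
```

Each token would then be treated as a field name when the current node is an object, and as an array index when the current node is an array.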

@Licenser Licenser commented Dec 23, 2019 · Member

There is also this benchmark suite: https://github.com/pikkr/rust-json-parser-benchmark

@Licenser Licenser added the medium label Dec 23, 2019
@Licenser Licenser commented Dec 23, 2019 · Member

I put out a 'looking for contributors' call: https://users.rust-lang.org/t/twir-call-for-participation/4821/285

The issue is nicely self-contained and a great chance for someone to get their feet wet and perhaps learn or practice some Rust :)

@miker1423 miker1423 commented Dec 27, 2019

Hello, I would like to take this feature, is anyone already working on it?

@Licenser Licenser commented Dec 27, 2019 · Member

Hi @miker1423, that's awesome :) And no, not to my knowledge. Sunny and I stayed away from it since it's such a nice one to get started with.

If you have any questions, get stuck, or just have general questions, feel free to ask any time!

When you open a PR, just let us know how you'd prefer the review and what your goal is. If it's about learning, we'll gladly go over it line by line and add suggestions; if it's about contributing, we'll get it through with as little hassle as possible :).

@miker1423 miker1423 commented Dec 27, 2019

Thanks! I'll start as soon as possible.

@miker1423 miker1423 commented Jan 21, 2020

Hello!
I had trouble with my personal PC this month, and I can't use my work PC for anything other than my employer's stuff, so I couldn't do a lot of work during the holidays, but I'm back on track with the issue 😄.
Question: I've seen some implementations of JSONPath in Rust. Some use a library (https://crates.io/crates/pest) that takes a grammar and produces the parser (https://github.com/greyblake/jsonpath-rs), and others implement the parser by hand (https://github.com/freestrings/jsonpath). Which would be the expected implementation for this project?

@Licenser Licenser commented Jan 22, 2020 · Member

First of all no worries :) life happens to all of us and it should always have priority, we totally understand!

Pest vs. hand-rolled is a tough question. The syntax of JSONPath is quite simple compared to a full language, so building a custom parser isn't prohibitive (and might result in simpler code?). It also improves build time, since we can skip building pest itself. On the other hand, pest can be handy to make the grammar bulletproof, and since it's a well-known entity, it might make the code easier for people down the road to understand, and it probably has better error messages out of the box.

If I were writing this, I'd probably write my own parser, because saving compile time outweighs the simpler tooling pest would give me, but I'm also very comfortable with custom parsers, so I'm biased. Plus, I've had very little interaction with pest, and it would probably take me more time to learn its ins and outs than to write the parser. On the other hand, without time constraints I might have just picked pest for the sake of learning it :).

Neither would be a bad choice, and since you're implementing it, it makes sense to pick what seems the best fit for you. In my experience, a clear understanding of why something was picked is often more important than what was picked, unless there is some very heavyweight factor in favour of one or the other.

I suspect the JSONPath expression will be compiled to some kind of data structure before querying, so performance on that path is probably not a concern either.

I hope this non-answer is a helpful one :) I don't want to armchair-quarterback your implementation.

Projects: 0.3 (To do)
4 participants