Better errors and jsoo friendly #45
Conversation
This looks nice overall, and I guess it forced you to learn a bit more about the Jsonm implementation :-p
(Thanks for going to the trouble of working on the merge yourself!)
```ocaml
| #atom as v -> k (v :> value)
| l -> raise (Abort (`Unexpected (`Lexeme (loc decoder, l, "value")))) )
| Arr vs when is_jsoo ->
    k (unroll_arr vs decoder)
```
What is the reason for this special case? If it is performance, do you have performance numbers to justify it?
I am not fond of the extra complexity here:
- now you have two different behaviors depending on the environment, and we have to ensure that both are correct
- it is not clear to me why we cannot have one approach that works well for both
- the `unroll_arr` approach has the downside of not being tail-recursive, which may or may not be a problem, but if it's not, and if it really is faster, why not use it everywhere?
- what about the case of objects instead of arrays, why are they not treated in the same way?
@smondet gave me an example (see `jsoo/contracts.ml`) where we reach the stack overflow with a web browser (in my case, Firefox 85.0.1). It's a hot-fix for this special case: it avoids a systematic call to `loop` for each value of a huge flat array. You can reproduce the stack overflow if you remove the pattern and do:
```
$ dune build @app_with_jsoo
$ firefox jsoo/min.html
```
On the JS console, you can see the stack overflow. I'm not sure about the best solution, but it seems fair (even if it's not really right) to switch to another implementation for JSOO (as angstrom does). Of course, if you have a better solution, I will be happy to use it.
I think we should understand where the stack overflow comes from, and what's a proper way to fix it: (1) the fix may be simpler than the current proposal, and (2) there may be other, similar stack overflows in other places (in particular the parsing of objects) that would be fixed as well.
I'm not familiar with jsoo, but my guess would be that code such as
```ocaml
loop decoder (Value v) (fun v -> loop decoder (Arr (v :: vs)) k) )
```
correctly uses a tail call for the leftmost call to `loop`, but not for the second call inside the function.
The code that you wrote in this PR does not use constant stack space (unlike the original version when running on a backend with proper tail-call optimization); its stack usage is proportional to the maximum nesting depth of the input JSON, so it does well on long lists but not on deeply nested lists.
If this is an acceptable tradeoff (we are okay with stack usage proportional to nesting depth), there may be a simpler way to rewrite this function in the first place. If we really want to ensure that no (possibly malicious) JSON input can overflow the stack, then you need a different fix in any case.
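To make the tail-call distinction concrete, here is a toy CPS walk over a hypothetical JSON type; it is only an illustration of the calling pattern discussed above, not the library's actual code:

```ocaml
type json = Atom of int | Arr of json list

(* Both recursive calls below are syntactic tail calls, so with proper
   tail-call optimization the stack stays constant; without it
   (js_of_ocaml), every call pushes a JavaScript frame, so even a long
   flat array can overflow the stack. *)
let rec sum_cps (j : json) (k : int -> int) : int =
  match j with
  | Atom n -> k n
  | Arr [] -> k 0
  | Arr (x :: xs) ->
      sum_cps x (fun n -> sum_cps (Arr xs) (fun m -> k (n + m)))

let sum j = sum_cps j (fun x -> x)
```

Note that the nested call to `sum_cps` sits inside the continuation closure: it is still in tail position when that closure is eventually invoked, which is why a back-end with TCO runs this in constant stack while jsoo does not.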
Hmmhmm, so you advise re-implementing the whole `json_of_src` according to the target?
- keep the TCO one for the bytecode/native back-ends
- provide a new one which works with `js_of_ocaml`?
My impression is that all backends could use a version that works for jsoo. What is the justification for using a different one for the native backend?
But the question remains whether your non-tail-recursive implementation for jsoo is the right one: what about stack overflows caused by malicious inputs?
`jsoo` is not able to optimize tail-call recursion; that is where the difference between the native version and `unroll_arr` comes from. Because of this, the current version (without `unroll_arr`) grows the stack (no TCO) on long lists (as I observed with `jsoo/contract.ml`).
A common implementation should be:
- something that does not grow the stack (TCO or not)
- as fast as we can make it (or something similar, such as a `tailcall` version of `json_of_src`)

The `tailcall` version (the one before this PR) fits the native and bytecode back-ends well thanks to TCO, but it cannot work for `jsoo`, which is unable to execute this implementation on deeply nested JSON objects. The solution from @smondet (with your advice) is good for the native and bytecode back-ends and solves the issue of deeply nested JSON objects for `jsoo` (see `jsoo/script.ml`). But the problem remains for large flat arrays (`jsoo/contracts.ml`).
You are right that a deeply nested object which contains some large flat arrays is still a problem for us: we will get, as before, a stack overflow. But it seems clear from @smondet's corpus that such JSON objects do appear and they are valid. From my perspective, I would prefer to separate the `jsoo` implementation (which does not fit our functional style) from the native/bytecode implementation. I don't know how to implement something which can work on both 😕.
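The tradeoff can be sketched with a toy direct-style walk (a hypothetical stand-in for the role of `unroll_arr`, not the library's code): stack usage grows with nesting depth rather than with array length, so a long flat array is safe even without TCO, while a deeply nested input is not:

```ocaml
type json = Atom of int | Arr of json list

(* Direct style: List.fold_left walks a flat array in constant stack,
   while each level of nesting adds exactly one stack frame. *)
let rec sum (j : json) : int =
  match j with
  | Atom n -> n
  | Arr vs -> List.fold_left (fun acc v -> acc + sum v) 0 vs

(* A flat array of 100_000 atoms: fine even without TCO, because only
   the fold iterates; no recursion depth is involved. *)
let flat = Arr (List.init 100_000 (fun i -> Atom i))
```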
My proposal:
- merge #43 (*Provide `*_result` variants of reading functions with informative errors*)
- recreate this PR rebased on it, with just the changes for jsoo stack consumption, so that it's easier to look at this stuff
- then we can iterate a bit more to try to find a version that makes everyone happy
I rebased the pull request, so the diff should be easier to read 👍.
```ocaml
let identity x = x

let is_jsoo = match Sys.backend_type with
  | Native | Bytecode -> false | _ -> true
```
The way you define `is_jsoo` seems fragile. What if new backends show up in the future? What do BuckleScript and Wasm use?
Yes, this code comes from `angstrom`; maybe @hhugo knows a better way to identify the back-end?
You can pattern match on `Other "js_of_ocaml"`, I think. But I suspect the current behavior is better (e.g. BuckleScript/ReScript would probably have the same limitation as jsoo).
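For reference, that suggestion would look something like this (a sketch, assuming the standard `Sys.backend_type` API, where js_of_ocaml reports itself as `Other "js_of_ocaml"`):

```ocaml
(* Match the js_of_ocaml back-end explicitly instead of treating every
   non-native, non-bytecode back-end as jsoo. *)
let is_jsoo =
  match Sys.backend_type with
  | Sys.Other "js_of_ocaml" -> true
  | Sys.Native | Sys.Bytecode | Sys.Other _ -> false
```

As noted above, the broader `_ -> true` match may still be the safer default, since other JavaScript back-ends likely share the same limitation.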
Maybe rename to not mention jsoo explicitly
bcd01fc to a0a9caa (Compare)
I work with @smondet and he asked me to take a look at this. I have a few notes:
My preferences are, in descending order:
To make sure we're all on the same page: the reason for the stack overflow is that there is no general tail-call optimization with jsoo. In particular, CPS calls in tail position will not be optimized. I'm opening yet another PR to address this: #47
This should be closed.
This PR is a mix of #42 and #43 (due to conflicts) to unlock a new release of `ezjsonm`. I added a huge test with `js_of_ocaml` to check that we can parse huge JSON objects without a stack overflow (as requested by @smondet). I would like some feedback from @gasche and @smondet about this PR before merging.