Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

normalizeNumbers panic #212

Open
xakepp35 opened this issue Mar 20, 2023 · 12 comments
Open

normalizeNumbers panic #212

xakepp35 opened this issue Mar 20, 2023 · 12 comments

Comments

@xakepp35
Copy link

It looks like normalizeNumbers function is a problem, when unavoidably called from RunWithContext

Please, check out my reproduction example on go.dev

Is there any option to export it and let package user to decide wherever to call it, or not at all?

To rephrase - RunWithContext should be able to perform as a "pure" function - produce output, while never touching user input at all, because you cant know, if its safe to do any modification with it

Can RunWithContext behave as a "copy-on-write" executor, along whole execution path?

@xakepp35
Copy link
Author

can we please do 2 changes?

  1. add a funcction
func (c *Code) RunWithContextUnnormalized(ctx context.Context, v any, values ...any) Iter {
	switch {
	case len(values) > len(c.variables):
		return NewIter(&tooManyVariableValuesError{})
	case len(values) < len(c.variables):
		return NewIter(&expectedVariableError{c.variables[len(values)]})
	}
	return newEnv(ctx).execute(c, v, values...)
}
  1. capitalize NormalizeNumbers so it can be used by client - this one isnt very important because we can copy source code of normalize numbers, but without first change we cant proceed..

@itchyny
Copy link
Owner

itchyny commented Mar 20, 2023

Let me think carefully about changing (or adding) the API. Delaying the normalization could introduce undesired overhead due to increasing count of number conversions (e.x. echo '[0,1,2,3,4]' | gojq '[.[] * .[] * .[]]'), also messes the implementation to normalize here and there. I admit that the number normalization could bother some package users, but currently README.md clearly mentions about this limitation; do not give values sharing same data between multiple calls. However, this is the initial design, and we may improve this.

@itchyny
Copy link
Owner

itchyny commented Mar 20, 2023

Delaying the normalization could affect the package users adding custom internal functions with gojq.WithFunction.

@wader
Copy link
Contributor

wader commented Mar 20, 2023

Possible to make it a gojq.WithInputNormalize(<bool>) compile option etc? this would also apply to gojq.WithVariables values i guess?

@xakepp35
Copy link
Author

xakepp35 commented Mar 20, 2023

Its not nesesary "delaying".. in some scenarios its opposite - normalisation can be done in advance (caching is done on user side). In our scenario, for example, we have smart cache, which tries to do "best effort".. initially it stores data as messagepack'ed []byte payloads. If a request happens to some data part - it does unmarshalling to interface{}, which is now followed by immediate normalisation, and then caching the resulting object. Next jq operations are just reusing cached object without slow unmarshal or normalisation, while calls made from parallel goroutines serving requests.

Second good option (and basically even more preferred one) may be to expose NewEnv(ctx) and Execute funcs - then you can keep api, just allowing some lower level access. That way we will optimize even more. we have multiquery, and query injections, so basically we want to create env once and then call Execute, Execute, .. several times, providing precompiled Code from compiler cache, which we also have :-) because compile stage is slow.. thete are alot of things to make it work in performant manner, saving allocations. But the key is to open up some minor amount of internals, offering ability to operate withour reallocations, under some restricted conditions.. Indeed, Execute method - is obviously and logically a method of env VM, and not a Code bytecode, and its obvious that its unsafe to run parallel executes on same vm at same time, but call Execute in a sequene without reallocating env - is something that we definetely want to acheive

modularity is good - why not doing that? if i need ill call. if i dont need - I wont call and save save save ticks!

func (c *Code) ValidateInput(v any, values ...any) error {
	switch {
	case len(values) > len(c.variables):
		return &tooManyVariableValuesError{}
	case len(values) < len(c.variables):
		return &expectedVariableError{c.variables[len(values)]}
	default:
		return nil
	}
}

typically good thing is to have both high level (for beginners/simple use) and lower level available.. we started to demand lower one, and agree to take responsibilities to make it work good..

btw, running 2 Execute() in a row on a same vm , preallocated by NewEnv(ctx) - does not work, second run is not updating the output :-(

PS. @wader - we are now removing usage of gojq.WithVariables because it requires compiler stage - replacing it with custom function, obraining variables from context at run time

@xakepp35
Copy link
Author

xakepp35 commented Mar 20, 2023

sorry for such many info here, its actually worth multiple issues at once, all of them are arising when you re trying to make it perform fast, processing thousands and thousands of queries (of same format but with different arguments) per moment in a cloud env - you just want to reuse whatever theoretically can be reused!

@itchyny
Copy link
Owner

itchyny commented Mar 21, 2023

At this moment, I have no good idea to solve this issue without increasing the number of parsing or breaking the custom internal function APIs. Also, I think in the most use case of this package (like gh), only one querying is executed against one input, so I can't imagine specific application that needs to reuse the same input value.

@reyoung
Copy link

reyoung commented Mar 24, 2023

I actually face the same issue last year when using jq as a script language in an internal system.

I just patch JQ, and manually copy map and slice instead.

However, it creates a lot of memory and increase GC pressure.

reyoung@19cb611

@itchyny
Copy link
Owner

itchyny commented Apr 9, 2023

I have pushed an experimental fix for this issue to json-number branch. You can install with go install github.com/itchyny/gojq@json-number.

@xakepp35
Copy link
Author

xakepp35 commented May 30, 2023

Sorry for such late response. Looks like that fix is working good. We switched to use of json-number branch month ago. No issues were found with it after 1 month of testing.

@xakepp35
Copy link
Author

xakepp35 commented Jun 23, 2023

@itchyny Would json-number be merged into master? looks like it resolved our issue. We're using it in prod for 2 months, no problems were observed yet. Moreover, it allowed us to find and see the problems that we had with types on our side, because it started to preserve the type, thats good. Can we merge and close this one then?

@itchyny
Copy link
Owner

itchyny commented Nov 5, 2023

This issue is blocked by go-yaml/yaml#807.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants