Suggestion for rtree: convert a string into a formula a la TTreeFormula #634

Closed
rmadar opened this issue Apr 11, 2020 · 12 comments · Fixed by #639

Comments

@rmadar
Member

rmadar commented Apr 11, 2020

When building a small tool on top of the rtree module, I realized that it would be very convenient to have an equivalent of ROOT's TTreeFormula in order to do things like:

// Declare a string expression
equation_str := "var1+var2"
f := rtree.TreeFormula(tree, equation_str)

// When looping over events, access the value, even if 'reader'
// wasn't explicitly initialized with the 'var1' and 'var2' branches
err = reader.Read(func(ctx rtree.RCtx) error {
    x := f.Eval() // value of var1+var2 for the current event
    return nil
})

Would that be feasible? Would there be a performance cost compared to explicitly declaring a reader with the variables var1 and var2?

@sbinet
Member

sbinet commented Apr 11, 2020

it should be a relatively easy thing to implement, using e.g. the go/scanner, go/parser and/or go/ast packages (or even directly the yaegi interpreter or the gomacro one).
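
As a rough illustration of the go/parser and go/ast route mentioned above, the sketch below parses a formula string as a Go expression and collects the identifiers that would have to be resolved as branch names. The helper name identifiers is hypothetical and this is not how rtree implements formulas. It assumes that package-qualified names such as math.Abs are not branches and that predeclared Go names (float64, len, ...) can be filtered out with go/types.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/types"
	"sort"
)

// identifiers parses expr as a Go expression and returns the sorted,
// de-duplicated names it refers to, skipping selector expressions
// (e.g. math.Abs) and predeclared Go identifiers (float64, len, true, ...).
// The result approximates the set of branches a formula would need.
func identifiers(expr string) ([]string, error) {
	root, err := parser.ParseExpr(expr)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	ast.Inspect(root, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.SelectorExpr:
			// Simplification: treat pkg.Name as a package-level
			// symbol and do not descend into it.
			return false
		case *ast.Ident:
			if types.Universe.Lookup(n.Name) == nil {
				seen[n.Name] = true
			}
		}
		return true
	})
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	ids, err := identifiers("math.Abs(eta) > 2.5 && len(lep_pt)>0 && lep_pt[0] > 2.5")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // prints "[eta lep_pt]"
}
```

Evaluating the expression still requires an interpreter (or code generation) on top of this; the sketch only covers the "which variables does the string use" step.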

@rmadar
Member Author

rmadar commented Apr 15, 2020

Thanks for the update! I tried it and I like it. But ... :) there is something which is a bit annoying from the user side: the type assertion needed in the event loop. I am not sure if that can be shortcut, but in practice it prevents the user from only having to declare a string in order to get the value.

I based this statement only on this example, but maybe there are other syntaxes?

@sbinet
Member

sbinet commented Apr 15, 2020

well... Go is a statically typed language, w/o generics (think, template<class T> class MyClass;).
so the only way to be able to work with any type is to return interface{}.

one way around it is to wrap a closure around your rtree.Formula:

r, err := rtree.NewReader(...)
form, err := r.Formula("float64(br1) + br2", nil)
eval := func() float64 { return form.Eval().(float64) }

var sum float64
err = r.Read(func(ctx rtree.RCtx) error {
    sum += eval()
    return nil
})

@rmadar
Member Author

rmadar commented Apr 15, 2020

OK, I didn't realise that the syntax "float64(x)" and later f.Eval().(float64) works for all types ... So my concern above is not a concern anymore. Except maybe for booleans, for which the conversion into a number is forbidden in Go (if I got it right) ... But that should be manageable (at least it forces one to be a bit more "clean"). Thanks!

@sbinet
Member

sbinet commented Apr 15, 2020

converting a boolean into a number is relatively easy:

i := map[bool]int{true: 1, false: 0}[v]
f := map[bool]float64{true: 1, false: -666.6}[v]

it's a mouthful.
but it's doable :)

but you could also just create a rtree.Formula that returns a boolean:

r, err := rtree.NewReader(...)
form, err := r.Formula(
    "math.Abs(eta) > 2.5 && len(lep_pt)>0 && lep_pt[0] > 2.5", 
    []string{"math"},
)
filter := func() bool { return form.Eval().(bool) }
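
The boolean-to-number conversion discussed above can also be written as a small helper function instead of a map literal. This is only a sketch; boolToFloat64 is a hypothetical name and is not part of rtree.

```go
package main

import "fmt"

// boolToFloat64 maps true to 1 and false to 0. Go has no implicit or
// explicit conversion from bool to a numeric type, so a branch is needed.
func boolToFloat64(b bool) float64 {
	if b {
		return 1
	}
	return 0
}

func main() {
	// Count how many (made-up) events pass a selection.
	passed := []bool{true, false, true}
	var n float64
	for _, p := range passed {
		n += boolToFloat64(p)
	}
	fmt.Println(n) // prints "2"
}
```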

@rmadar
Member Author

rmadar commented Apr 15, 2020

Indeed, one has to decide in advance whether the string will correspond to a boolean or not (which is not the case in ROOT right now - but maybe this kind of thing is good, leading to a better organisation).

I have realized something else: if the variable used in the formula is not loaded explicitly in the tree.Reader, then it doesn't seem to work (unless I did something wrong). Here is an example:
https://github.com/rmadar/tree-gonalyzer/blob/master/analyzer-show/main.go#L34 . If I keep this var_test commented out, the tree formula will crash because I use vbar_pt as weight.

I think this point - if I did things correctly - is an important one for the formula. Do you think it is doable, even if it's done in a later iteration?

@sbinet
Member

sbinet commented Apr 16, 2020

see #642

@rmadar
Member Author

rmadar commented Apr 16, 2020

Perfect, thanks a lot!

@rmadar
Member Author

rmadar commented Apr 16, 2020

Here is a small (home-made) performance benchmark of reader.Formula, for 5 samples, 15 variables and 4 selections (i.e. 300 histograms, over 3.1 Mevts):

  • 00:01:55 with reader.Formula
  • 00:00:31 with the usual rtree.ReadVar

A few comments:

  • when I use only reader.Formula, I need to put at least one rtree.ReadVar in the reader, otherwise this is a nil pointer and the formula cannot be initialized.
  • in the second case, I still use reader.Formula to load the weights (1 per sample) and the 4 selections.

So it is relatively expensive - but ok, we cannot get everything. For comparison:

  • 1 variable with rtree.ReadVar and 5 cuts with reader.Formula --> 00:00:10
  • 5 variables with rtree.ReadVar and 1 cut with reader.Formula --> 00:00:27

But to be fair, when I decrease the number of cuts, I also decrease the number of if statements in the event loop, so the difference is probably not only due to the time spent in reader.Formula.

@sbinet
Member

sbinet commented Apr 16, 2020

so a x4 factor going from compiled to "compiled+interpreted" code. not too shabby for a first stab.

let me loosen a bit the requirement on the non-zero-len slice of ReadVar.

some additional performance could be recouped by analyzing the expression to determine its type (float64, []float64, bool, etc...) and generating the exact function signature:
func F() float64 instead of (currently) func F() interface{}, so one could retrieve the eval function, type-assert it once to the correct signature and remove the need to type-assert for every event:

func (form Form) Func() interface{} { return form.evalFunc }

mu, err := r.Formula("math.Abs(eta) > 2.5", []string{"math"})
cut := mu.Func().(func() bool)

err = r.Read(func (rtree.RCtx) error {
    if cut() { n++ }
    return nil
})
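
To make the idea above concrete outside of rtree, here is a self-contained sketch. The form type, its fn field and the countPassing helper are hypothetical stand-ins, not the rtree API. The point is only that the type assertion to func() bool happens once, before the loop, instead of once per event.

```go
package main

import (
	"fmt"
	"math"
)

// form is a hypothetical stand-in for a compiled formula. It stores its
// evaluation function with its exact signature behind an interface{}.
type form struct {
	fn interface{} // e.g. func() bool or func() float64
}

// Func returns the underlying evaluation function.
func (f form) Func() interface{} { return f.fn }

// countPassing counts the values of eta for which |eta| > 2.5,
// asserting the function type a single time before the loop.
func countPassing(etas []float64) int {
	var eta float64 // plays the role of the branch value
	f := form{fn: func() bool { return math.Abs(eta) > 2.5 }}

	cut, ok := f.Func().(func() bool)
	if !ok {
		panic("formula does not evaluate to a bool")
	}

	n := 0
	for _, v := range etas {
		eta = v // "load" the current event
		if cut() {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countPassing([]float64{-3.0, 0.1, 2.6, 2.5})) // prints "2"
}
```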

@rmadar
Member Author

rmadar commented May 4, 2020

After the improvement obtained in #690, the factor of ~4-5 is reduced to a factor of ~3, so not bad at all! For 15 variables, 1 cut and 4 samples:

before improvement (8b07e3f)

  • rtree.ReadVar: 3.9 ms/kEvt (00:00:09 for 2880 kEvts)
  • reader.Formula : 19.3 ms/kEvt (00:00:55 for 2880 kEvts)

after improvement (75db704)

  • rtree.ReadVar: 3.2 ms/kEvt (00:00:09 for 2880 kEvts)
  • reader.Formula : 9.1 ms/kEvt (00:00:26 for 2880 kEvts)
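
For reference, the ratios behind these statements can be recomputed directly from the per-kEvt timings quoted above. The ratio helper is just an illustrative name.

```go
package main

import "fmt"

// ratio returns how many times slower (or faster) a is compared to b.
func ratio(a, b float64) float64 { return a / b }

func main() {
	// Timings in ms/kEvt, copied from the lists above.
	fmt.Printf("Formula vs ReadVar, before: x%.1f\n", ratio(19.3, 3.9)) // x4.9
	fmt.Printf("Formula vs ReadVar, after:  x%.1f\n", ratio(9.1, 3.2))  // x2.8
	fmt.Printf("Formula, before vs after:   x%.1f\n", ratio(19.3, 9.1)) // x2.1
}
```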

@sbinet
Member

sbinet commented May 4, 2020

cool: almost a x2 improvement.
