keep_dims option #531

tachukao · 2020-06-16T22:33:54Z

As discussed in #529, I've added the keep_dims option for sum, mean, var, std, sem, min, max, log_sum_exp, prod and others.

The default option for keep_dims is true and is thus backward compatible.

@jzstark @ryanrhymes I'm not quite sure how keep_dims changes the computation graph in e.g., owl_base_computation_operation.ml. I've ignored the keep_dims option for now. Any ideas?

ryanrhymes · 2020-06-17T17:13:50Z

I cannot immediately say whether or not it will impact thing in CGraph, do we have corresponding unit tests for cgraph? @jzstark

ryanrhymes · 2020-06-17T17:15:28Z

@pvdhove Pierre can also comment if there is anything in his mind. Pierre did a lot of work on CGraph and is very familiar with it as well.

pvdhove · 2020-06-18T11:00:23Z

From what I recall, here are a few changes that would need to be made if you want the extra argument to have the expected behaviour with CGraph (all files that I mention are in owl/src/base/compute) (at the moment, everything should compile but the argument will not have any impact if the functions are evaluated with CGraph):

Adding bool arguments to the relevant functions in owl_computation_type.ml, and thus also in owl_computation_symbol.ml, owl_computation_optimiser.ml, and owl_computation_cpu_init.ml. Nothing else should change there (unless there is some optimisation to be done in the new case keep_dims=false, but it is not necessary and it is safer not to do any optimisation in the first place).
Using the new parameter in the relevant functions in owl_computation_operator.ml and in owl_computation_cpu_eval.ml.
If I understood correctly, the shape of the output depends on the keep_dims parameter; therefore, the function predicting the shape of the nodes should be changed in owl_computation_shape.ml -- I am not sure whether there is an existing _infer_shape_xx function that already returns the right shape with similar arguments.

For some more context about optimisation: what I had chosen in owl_computation_cpu_init.ml for the functions that you consider is not to be able to reuse the memory allocated to their inputs (called parents in the arguments, because the inputs of the functions correspond the parents in the computation graph). This behaviour is always safe even if you change the logic of the function, so nothing must be changed there. It might be possible to tell these functions that they can safely reuse the memory of their parents while they are computed for some extra memory optimisation, but that is very much implementation-dependent and a bit risky if changes are made later.

mseri

I think that before merging, the points addressed by @pvdhove for CGraph should be integrated to the PR

tachukao · 2020-07-14T09:04:33Z

I have just pushed some changes that shall allow users to use the keep_dims option in CGraph.

There are several aspects of the code that I do not fully understand, and for keep_dims=false I have followed the example provided by SumReduce, which hopefully is the right thing to do.

To infer shape in owl_computation_shape.ml I have written a new helper function _infer_shape_31. This function calls _infer_shape_04 (normally used by sum), when keep_dims=true. It calls _infer_shape_10 (normally used by sum_reduce), when keep_dims=false.

In owl_computation_cpu_evals.ml, @pvdhove am I correct in my interpretation that _eval_map_01 reuses the memory of the node by directly writing to it, and _eval_map_00 does not? Since I did not implement the keep_dims option for functions such as sum_, min_, and max_, I am calling _eval_map_00 when keep_dims=false.

Also, do we have a unit test for the operations in CGraph?

pvdhove · 2020-07-14T16:15:35Z

Everything looks good to me! Your interpretation of the _eval_maps is right, it is simply to distinguish functions that are pure or impure (further memory optimisation is done in owl_computation_cpu_init.ml). I haven't checked that the right shapes were computed because I got a bit lost in the discussion about the exact behaviour of keep_dims :-D but what you did seems reasonable.

For the optimisation part, I just don't understand why pattern_024 is called when keep_dims=false? It seems that the result is the same as pattern_000, because pattern_024 only does something special to SumReduce. But that does not change anything, as long as the optimisation is propagated to the parent.

To my knowledge, there are no unit tests for CG, only examples (both in the examples/ folder and in distinct repositories).

tachukao · 2020-07-14T18:24:09Z

oh right, I forgot about the optimization part. I meant to ask.

I didn't really understand what pattern_024 and pattern_000 do, and was simply coding by analogy. But now I gather pattern_024 is an optimization special to sum_reduce. Is that right?
And so in the latest commit, I have reverted back to pattern_000 for sum, min, max etc, which I assume means no optimization. Is that right @pvdhove ?

In any case, we should probably open an issue to add unit tests for operations in CG.

mseri · 2020-07-14T18:51:58Z

Thanks both for looking into it! I agree that a new issue for the unit tests would be good. @tachukao given that you are there, it may be a good idea to add some meta comments on why you are calling certain functions instead of others directly in the code (e.g. the optimisation in pattern_xxx or _eval_map)

pvdhove · 2020-07-15T09:44:58Z

I didn't really understand what pattern_024 and pattern_000 do, and was simply coding by analogy. But now I gather pattern_024 is an optimization special to sum_reduce. Is that right?

Yes, basically the optimization part performs a DFS of the computation graph before the evaluation of its nodes, and implements a few different techniques to make things more efficient: for instance, removing useless operations, pre-computing the parts of the graph that do not depend on the inputs (so that if the graph is evaluated multiple times with different inputs, those parts of the graph are only computed once), changing the order of the operations to make them faster or more precise, merging operations that can be performed in one go, etc. The pattern_024 appears to replace SumReduce by Sum when axis has a certain shape, probably because the implementation is faster in this specific case.

And so in the latest commit, I have reverted back to pattern_000 for sum, min, max etc, which I assume means no optimization. Is that right @pvdhove ?

Yes!

In any case, we should probably open an issue to add unit tests for operations in CG.

The various operations reuse the code written outside of the CGraph module, so I guess that is why unit tests with CG for specific operations have not seemed essential. However, it would be really good if there were unit tests for specific CG operations (the optimizations, creating the graph, manipulating the graph, etc.).

ryanrhymes · 2020-07-15T10:09:53Z

As Pierre mentioned, yes the real operations of CG are provided by the module we plug in, so they need to be tested with a specific backend. As you all said, the generic graph operations shall be tested at least, totally agree.

tachukao · 2020-07-15T10:14:46Z

@mseri I've just add some comments. Since we aren't do any optimization for these operations, we are going with the defaults mostly pattern_000 and eval_map_00

tachukao · 2020-07-20T13:08:47Z

@ryanrhymes @mseri shall we merge this if there's nothing else that needs changing right away?

ryanrhymes · 2020-07-20T13:55:57Z

The PR looks all good to me. Unit tests of CG is a separate issue we can take care of that in future.

@mseri do you have further comments on this?

@pveber

CHANGES: * various documentation improvements (thanks @pveber, @UnixJunkie, @Fourchaux) * Fix use of access operators (owlbarn/owl#543) * Upgrade to ocamlformat 0.15.0 (thanks @gpetiot owlbarn/owl#535) * keep_dims option (owlbarn/owl#531) * stats: fix infinite loop in ecdf * Use Fun.protect to ensure all file descriptors are being closed * owl_ndarray_maths: improve user experience in case of errors * owl_io: close file descriptors also in case of errors * owl_dense_ndarray_generic: fix error on printing 0-ary arrays * fixed bug in sub forward mode (owlbarn/owl#533) * Add stack to Algodiff (owlbarn/owl#528) * added log_sum_exp to Ndarray and Algodiff (owlbarn/owl#527) * added single-precision and double-precision Bessel functions to Ndarray (owlbarn/owl#526) * Fixes owlbarn/owl#518 by introducing another `/` to resolve data directory (@jotterbach owlbarn/owl#519) * Graph Slice node (resolves owlbarn/owl#483) (@mreppen owlbarn/owl#517) * Graph subnetwork: Multiple outputs (@mreppen owlbarn/owl#515) * Added kron and swap to Algodiff operations (owlbarn/owl#512) * various other small fixes

tachukao added 2 commits June 16, 2020 23:12

added keep_dims option to multiple functions

5bff1dd

keep_dims default is true

51f051d

tachukao mentioned this pull request Jun 17, 2020

keepdims option for sum, std, sem, var, mean etc. #529

Closed

ryanrhymes requested review from jzstark and ryanrhymes and removed request for jzstark June 17, 2020 17:10

ryanrhymes added the enhancement label Jun 17, 2020

ryanrhymes requested a review from mseri July 14, 2020 08:04

mseri reviewed Jul 14, 2020

View reviewed changes

tachukao added 2 commits July 14, 2020 09:54

updated keep_dims functions in CGraph

01c4407

ocamlformat

414688f

optimization pattern_024 -> pattern_000 for keep_dims functions

614d873

added some comments

73104a1

ryanrhymes approved these changes Jul 20, 2020

View reviewed changes

mseri approved these changes Jul 20, 2020

View reviewed changes

mseri merged commit 6be03ce into owlbarn:master Jul 20, 2020

tachukao mentioned this pull request Jul 24, 2020

Unit Tests for CGraph #538

Open

mseri mentioned this pull request Oct 4, 2020

[new release] owl-zoo, owl-top, owl-base, owl and owl-plplot (0.10.0) ocaml/opam-repository#17350

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

keep_dims option #531

keep_dims option #531

tachukao commented Jun 16, 2020

ryanrhymes commented Jun 17, 2020

ryanrhymes commented Jun 17, 2020

pvdhove commented Jun 18, 2020

mseri left a comment

tachukao commented Jul 14, 2020 •

edited

Loading

pvdhove commented Jul 14, 2020

tachukao commented Jul 14, 2020

mseri commented Jul 14, 2020 •

edited

Loading

pvdhove commented Jul 15, 2020

ryanrhymes commented Jul 15, 2020

tachukao commented Jul 15, 2020

tachukao commented Jul 20, 2020

ryanrhymes commented Jul 20, 2020

keep_dims option #531

keep_dims option #531

Conversation

tachukao commented Jun 16, 2020

ryanrhymes commented Jun 17, 2020

ryanrhymes commented Jun 17, 2020

pvdhove commented Jun 18, 2020

mseri left a comment

Choose a reason for hiding this comment

tachukao commented Jul 14, 2020 • edited Loading

pvdhove commented Jul 14, 2020

tachukao commented Jul 14, 2020

mseri commented Jul 14, 2020 • edited Loading

pvdhove commented Jul 15, 2020

ryanrhymes commented Jul 15, 2020

tachukao commented Jul 15, 2020

tachukao commented Jul 20, 2020

ryanrhymes commented Jul 20, 2020

tachukao commented Jul 14, 2020 •

edited

Loading

mseri commented Jul 14, 2020 •

edited

Loading