
The Broadcast function is exported but not usable outside of the package #223

Open
owulveryck opened this Issue Aug 17, 2018 · 16 comments


owulveryck commented Aug 17, 2018

I need to implement an "add" operator for two tensors with a broadcasting mechanism, as described here.

The Broadcast function seems to be a perfect fit for this. Moreover, the test partially implements what I am trying to do.
But neither the ʘBinaryOperatorType nor any of the binOp implementations are exported.

Therefore the Broadcast function can only be used within the Gorgonia package.
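
For reference, this is the behaviour I am after, sketched as a toy in plain Go (hardcoded to the (2,3)+(3) case; a stand-in for what Broadcast plus an exported add binOp would express):

```go
package main

import "fmt"

// Toy, hardcoded (2,3)+(3) broadcast-add: the vector b is reused
// ("broadcast") across every row of a. Inside Gorgonia this is what
// Broadcast plus the unexported addition binOp would express.
func broadcastAdd(a [][]float64, b []float64) [][]float64 {
	out := make([][]float64, len(a))
	for i, row := range a {
		out[i] = make([]float64, len(row))
		for j, v := range row {
			out[i][j] = v + b[j]
		}
	}
	return out
}

func main() {
	a := [][]float64{{1, 2, 3}, {4, 5, 6}}
	b := []float64{10, 20, 30}
	fmt.Println(broadcastAdd(a, b)) // [[11 22 33] [14 25 36]]
}
```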

Maybe we should make it private to avoid confusion in the documentation, and instead expose "broadcasted" versions of some operators? What do you think?

chewxy commented Aug 17, 2018

Huh, weird. Once upon a time ʘBinaryOperatorType was exported. Let me go find out more.

chewxy commented Aug 17, 2018

(why isn't ʘ a capital letter? Which charset is it from?)

chewxy commented Aug 17, 2018

I had a look at Broadcast - hadn't touched it in years. That and the elementwise operations should be hung in the Shitty Software Engineering Hall of Fame. Will get in contact with you to work on this further.

owulveryck self-assigned this Aug 18, 2018

chewxy commented Aug 19, 2018

https://github.com/gorgonia/gorgonia/releases/tag/v0.9.0-beta <-- this is out, so your PR can be based off it

deasmhumhna commented Oct 28, 2018

I'm trying to implement column-wise normalization and running into the same problem. I don't feel I have enough of an understanding of gorgonia yet to attempt a PR, but I am curious about the design choice of requiring an explicit broadcasting pattern rather than having an implicit set of rules that act on ndarrays based on their shapes (as NumPy does).

chewxy commented Oct 28, 2018

I put it to you that implicit broadcasting leads to subtle errors you may not catch, even when you are familiar with the rules.

The rule is quite simple: a lower-dimensioned tensor is an embedding within the higher-dimensioned tensor. Easy to state, but not easy to remember when working on complicated things. Having an explicit broadcast means that you know exactly where the broadcast happens in the code.

As for not making it usable (the function is exported, the op isn't)... that's really more of a technical issue. I believe at one point the circle with a dot was considered an exported identifier. Or I may be wrong.
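
To make the failure mode concrete, here is a toy, NumPy-style implicit subtraction in plain Go (nothing Gorgonia-specific). Pass the wrong vector and the shapes still "fit", so nothing errors:

```go
package main

import "fmt"

// Toy NumPy-style implicit broadcast-subtract, hardcoded for the
// (n,n)-matrix minus (n)-vector case. The vector is always applied
// along the last axis, as NumPy's implicit rules do.
func subVec(m [][]float64, v []float64) [][]float64 {
	out := make([][]float64, len(m))
	for i := range m {
		out[i] = make([]float64, len(m[i]))
		for j := range m[i] {
			out[i][j] = m[i][j] - v[j]
		}
	}
	return out
}

func main() {
	m := [][]float64{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
	colMeans := []float64{4, 5, 6} // mean of each column
	rowMeans := []float64{2, 5, 8} // mean of each row

	// Intended: centre each column. This does the right thing...
	fmt.Println(subVec(m, colMeans))
	// ...but pass rowMeans by mistake and the shapes still "fit":
	// no error is reported, the result is just silently wrong.
	fmt.Println(subVec(m, rowMeans))
}
```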

deasmhumhna commented Oct 28, 2018

I figured that was the primary motivation. It's a code-clarity vs. code-complexity trade-off, for sure. Also, as owulveryck mentioned, wouldn't you have to export the various lowercase implementations as well? It would also be nice if you could broadcast a custom BinaryOp.

chewxy commented Oct 28, 2018

Send PR! :) (you seem to be aware of the BinaryOp type)

chewxy commented Oct 28, 2018

oh I just realized you're not @owulveryck

deasmhumhna commented Oct 28, 2018

I can definitely try! I'm slowly becoming familiar with the library. First I'm going to try to implement some of the special functions in math as Ops.

chewxy commented Oct 28, 2018

Be sure to implement them as ADOp and SDOp as well. ADOp plays nice with the lispmachine, giving you autodifferentiation; SDOp plays nice with the compiled machines (*tapeMachine, ccsMachine, etc.).
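
Roughly, the two interfaces look like this - paraphrased from memory, so check the source for the exact signatures; the stub types stand in for the real gorgonia ones:

```go
package main

// Sketch of the two differentiation interfaces, paraphrased from
// memory; the exact signatures live in Gorgonia's source. Node, Nodes
// and ExecutionContext stand in for the gorgonia types.
type Node struct{}
type Nodes []*Node
type ExecutionContext struct{}

// SDOp: symbolic differentiation. SymDiff extends the expression graph
// with the gradient expressions at graph-construction time, which is
// what the compiled machines (*tapeMachine and friends) consume.
type SDOp interface {
	DiffWRT(inputs int) []bool
	SymDiff(inputs Nodes, output, grad *Node) (Nodes, error)
}

// ADOp: automatic differentiation. DoDiff computes gradient values as
// the *lispMachine walks the graph at run time.
type ADOp interface {
	DoDiff(ctx ExecutionContext, inputs Nodes, output *Node) error
}

func main() {}
```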

owulveryck commented Oct 28, 2018

I have thought about it. I really like the idea of Gorgonia staying as close as possible to the arithmetic, and therefore keeping the broadcasting function separate.
But the more I work with it, the more I realize it can be tricky to use.
I still haven't found an elegant solution.
I'd like something usable like:

Add(a, broadcast(b, broadcastPattern))

This would not break the API, but it is not currently possible.


chewxy commented Oct 28, 2018

I really like the idea of Gorgonia staying as close as possible to the arithmetic, and therefore keeping the broadcasting function separate.

Broadcasting is actually, technically, part of the arithmetic. If a and b are (2,3)- and (3)-shaped tensors, then technically a + b is defined (that is to say, we assume b is an embedding in the tensor space of a).

The problem is in programming. Implicit broadcasting causes errors. Sometimes, these errors are very subtle - the system does not report an error, because it's technically correct, and as a result your neural network becomes silly.

There is a solution - the longer-term thing I'm working on :P - better type systems to constrain these sorts of problems. If you trawl through the code, you may see scars of previous attempts.

In the past I sketched out some solutions. I have some solutions that I know work, but they are not suitable for production use. For example, if we were to go down the dependent-types path, then a new calculus is required. If we were to go down the refinement-types path, then a SAT solver is required. There are also some other esoteric calculi out there that handle these sorts of situations well, but those are not well-trodden paths and would possibly have performance issues.

As such, I eventually left the type system as a Frankenstein simply-typed system with some amount of HM inference.

The Problem In General

The problem can be described as such:

There are currently a few type signatures for Add in Gorgonia:

Add : (Floats a) ⇒ Tensor a → Tensor a → Tensor a
Add : (Floats a) ⇒ Tensor a → a → Tensor a
Add : (Floats a) ⇒ a → Tensor a → Tensor a
Add : (Floats a) ⇒ a → a → a
Add : (Floats a) ⇒ a → a → Bool
Add : (Floats a) ⇒ Tensor a → Tensor a → Tensor Bool
Add : (Floats a) ⇒ Tensor a → a → Tensor Bool
Add : (Floats a) ⇒ a → Tensor a → Tensor Bool

Through some magic of introspection and reflection, Gorgonia is able to correctly guess which type signature to use. This "magic" is not really nice, in my opinion. I do rather wish I could get rid of it.
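
A toy version of that dispatch, just to show the flavour (nothing like the real resolution code):

```go
package main

import "fmt"

// Toy type-switch dispatch: pick an Add "signature" by inspecting the
// runtime types of the operands. Gorgonia's real code is far hairier.
type Tensor struct{ shape []int }

func addKind(a, b interface{}) string {
	_, at := a.(Tensor)
	_, bt := b.(Tensor)
	switch {
	case at && bt:
		return "Tensor a → Tensor a → Tensor a"
	case at:
		return "Tensor a → a → Tensor a"
	case bt:
		return "a → Tensor a → Tensor a"
	default:
		return "a → a → a"
	}
}

func main() {
	t := Tensor{shape: []int{2, 3}}
	fmt.Println(addKind(t, 1.0)) // Tensor a → a → Tensor a
}
```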

Whichever type signature is used, Add has to pass the following tests. The notation is a bit complicated, but I wanted to show the input types of the arguments: Add(a, b (2,3)-Tensor) can be read as "call Add on a and b, both of which are Tensors with shape (2,3)". The -> shows the returned type.

  • Add(a, b (2,3)-Tensor) -> (2,3)-Tensor
  • Add(a (2,3)-Tensor, b Scalar) -> (2,3)-Tensor
  • Add(a Scalar, b (2,3)-Tensor) -> (2,3)-Tensor
  • Add(a, b Scalar) -> Scalar
  • Add(a (2,3)-Tensor, b (3)-Tensor) -> (2,3)-Tensor
  • Add(a (3)-Tensor, b (2,3)-Tensor) -> (2,3)-Tensor

And it should fail/error in these situations:

  • Add(a (2,3)-Tensor, b (3,2)-Tensor) -> error
  • Add(a (3,2)-Tensor, b (2,3)-Tensor) -> error
  • Add(a (2,3)-Tensor, b (6)-Tensor) -> error
  • Add(a (2,3)-Tensor, b (2)-Tensor) -> error

Of particular interest may be Add(a (2,3)-Tensor, b (6)-Tensor). In what world could anyone possibly do this? My answer is that once you allow reshape operations into a mathematical system, a (2,3)-Tensor and a (6)-Tensor become, to many people, the same thing... the discipline goes away. This obviously comes from real-life experience of dealing with other people's code 😝.
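
For concreteness, the whole pass/fail table above fits in a naive toy checker for the embedding rule - this is not Gorgonia's actual code:

```go
package main

import "fmt"

// Toy checker for the embedding rule: the lower-dimensional shape must
// match the trailing dims of the higher-dimensional one (or be a
// scalar). Just the pass/fail table above, written out naively.
func addShape(a, b []int) ([]int, error) {
	if len(b) > len(a) {
		a, b = b, a // result takes the higher-dimensional shape
	}
	if len(b) == 0 {
		return a, nil // scalar + tensor
	}
	off := len(a) - len(b)
	for i, d := range b {
		if a[off+i] != d {
			return nil, fmt.Errorf("cannot add %v and %v", a, b)
		}
	}
	return a, nil
}

func main() {
	cases := [][2][]int{
		{{2, 3}, {2, 3}}, // ok
		{{2, 3}, {}},     // ok: scalar
		{{2, 3}, {3}},    // ok: embedding
		{{2, 3}, {3, 2}}, // error
		{{2, 3}, {6}},    // error: reshape discipline!
		{{2, 3}, {2}},    // error
	}
	for _, c := range cases {
		s, err := addShape(c[0], c[1])
		fmt.Println(c[0], "+", c[1], "->", s, err)
	}
}
```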

How Does This Relate to Broadcast?

An ideal situation would remove Broadcast - let the underlying calculi of types handle the situation. Unfortunately, due to the crappy type system that Gorgonia has, this cannot currently be done without careful analysis of programs.

What Is To Be Done?

I don't particularly know. I keep experimenting with ideas. I hope others do so too.

chewxy commented Oct 29, 2018

@owulveryck

This would not break the API, but it is not currently possible.

APIs CAN break. Please do design a system that breaks the API if necessary.

deasmhumhna commented Oct 29, 2018

@owulveryck

Add(a, broadcast(b, broadcastPattern))

To work correctly, the broadcasting function needs to know about both a and b. So either both need to be passed to a one-shot function, or broadcasting becomes a two-step process: first a function figures out the broadcast shape for a and b, then a function broadcasts either/both operands to this new shape. This would be like using a broadcast object to get the shape and then broadcast_to to resize, as in NumPy. The good thing about this route is that it separates broadcasting from the operation itself: none of the currently private types or instances need to be exported, and it would work with any custom Op as well.
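
Sketching the first step at the shape level (hypothetical names; the real thing would operate on *gorgonia.Node):

```go
package main

import "fmt"

// Hypothetical step 1: compute the common broadcast shape of a and b,
// NumPy-style (align trailing dims; a dim of 1 stretches to match).
// Pure shape-level sketch, not Gorgonia code.
func broadcastShape(a, b []int) ([]int, error) {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	out := make([]int, n)
	for i := 1; i <= n; i++ {
		da, db := 1, 1
		if i <= len(a) {
			da = a[len(a)-i]
		}
		if i <= len(b) {
			db = b[len(b)-i]
		}
		switch {
		case da == db, db == 1:
			out[n-i] = da
		case da == 1:
			out[n-i] = db
		default:
			return nil, fmt.Errorf("shapes %v and %v do not broadcast", a, b)
		}
	}
	return out, nil
}

// Hypothetical step 2 would be a broadcastTo(node, shape) that repeats
// the data - like NumPy's broadcast_to - after which any Op, custom or
// not, runs on two same-shaped operands.

func main() {
	fmt.Println(broadcastShape([]int{2, 3}, []int{3})) // [2 3] <nil>
}
```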

owulveryck commented Nov 1, 2018

Thanks for all the explanations, @chewxy!
This is very helpful.

I propose a new API which is "semi-manual" for now. My understanding is that broadcasting only applies to binOps, so I have changed all the implementations to take the broadcastPattern as the third argument of each implemented operator (Add, HadamardProd, ...).

It does not let the underlying calculi of types handle the situation, but it is sufficient to close this issue while waiting for a better implementation.
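
Roughly, as a sketch only - names and types are illustrative, not the final API:

```go
// Sketch of the proposed "semi-manual" API: every binOp helper takes a
// broadcast pattern as an extra argument. Everything here is
// illustrative; *Node stands in for gorgonia's node type.
package gorgonia

type Node struct{}

// BroadcastPattern says which axes of which operand get broadcast.
type BroadcastPattern struct {
	Left, Right []byte // axes along which each operand is repeated
}

// Add with an explicit broadcast pattern as the third argument.
func Add(a, b *Node, pattern BroadcastPattern) (*Node, error) {
	// 1. check that the pattern makes the shapes of a and b compatible
	// 2. insert the broadcast into the graph
	// 3. apply the (unexported) addition binOp
	panic("sketch only")
}
```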

I will practice with it and see if I find a better solution. WDYT?
