From ec73e59dab2df1b227ed782e4bdf4c99026e40d0 Mon Sep 17 00:00:00 2001 From: chewxy Date: Tue, 9 Feb 2021 23:45:21 +1100 Subject: [PATCH 1/3] Added ARCHITECTURE.md --- ARCHITECTURE.md | 280 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 280 insertions(+) create mode 100644 ARCHITECTURE.md diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 00000000..99b394b1 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,280 @@ +# Architecture # + +This file describes the architecture of Gorgonia. Use this file as a guide to contributing to Gorgonia. + +# Subsystems View: Overview # + +Gorgonia consists of a few parts: + +- [A system to manage and manipulate mathematical expressions](#expr). +- [A system to perform backpropagation](#backprop). +- [A set of systems to evaluate mathematical expressions](#eval). +- [A set of systems to perform gradient descent](#solver). +- [A set of "utility" systems that support the above](#utils). + +Each of these parts have their own sub-parts. Let's explore. + + +## Mathematical Expressions ## + +Instead of tediously explaining what a mathematical expression, we'll use examples and rely on the reader's ability to perform induction. + +This is an example of a mathematical expression: + +![\sigma(\mathbf{W}'\mathbf{x} + \mathbf{1})](https://render.githubusercontent.com/render/math?math=%5CLarge+%5Cdisplaystyle+%5Csigma%28%5Cmathbf%7BW%7D%27%5Cmathbf%7Bx%7D+%2B+%5Cmathbf%7B1%7D%29) + +A mathematical expression is contained in an `*ExprGraph`. Each component of a mathematical expression is stored in a `*Node`.
+ +The usual definitions of a mathematical expression would break down the above example into: + +| Component | Type (Component Name) | +|---|---| +| ![\sigma](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Ctextstyle+%5Csigma) | function | +| ![\mathbf{W}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Cdisplaystyle+%5Cmathbf%7BW%7D) | variable | +| !['](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Ctextstyle+%27) | operator | +| ![\mathbf{x}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Ctextstyle+%5Cmathbf%7Bx%7D) | variable | +| ![+](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Ctextstyle+%2B) | operator | +| ![\mathbf{1}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Cdisplaystyle+%5Cmathbf%7B1%7D) | constant | + +In Gorgonia, these various component types are all represented as a `*Node`. The `*Node` data type provides methods for finding out what "type" of node it is. + +The `*Node` data type is a fairly broad data type - it is used in many different contexts: + +- Manipulating mathematical expressions. +- Storing the link between a term and its derivatives (and vice versa). +- Storing the results of computations. + +### The Graph ### + +The `*ExprGraph` data type stores an entire expression. As hinted, a mathematical expression is a directed acyclic graph (DAG), usually in the form of a tree. A DAG is preferred over a tree because a subexpression that is used more than once is stored, and therefore evaluated, only once. So `3*(x+y) + 4*(x+y)` will only execute `(x+y)` once. + +Further, having an entire expression in a graph also allows for easier reference when it comes to backpropagation. From this point onwards, "mathematical expression" may be used interchangeably with "graph". + +### Ops and Functions ### + +An operator is a notational shortcut representing an operation. A function is an abstract notion of a map from one set of values to another set.
The actual act of going from one set to another set is an operation. + +Hence for simplicity's sake, there is no difference between an operator and a function in Gorgonia. They are all treated as functions, but named `Op`. The `Op`s in Gorgonia may be found in files starting with `op_`. + +The definition of the `Op` interface may be found in `op.go`. + +From this point, "`Op`" may be interchangably used with "function". + + +### Variables, Weights and Constants ### + +In most deep learning frameworks, there is a separation of notions between weights and variables. Often, variables are what the user/programmer may set. So in the example above, only ![\mathbf{x}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Ctextstyle+%5Cmathbf%7Bx%7D) is a variable, while ![\mathbf{W}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Cdisplaystyle+%5Cmathbf%7BW%7D) is a generic tensor/node. + +In Gorgonia, there is no such separation. Mathematically speaking, ![\mathbf{W}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Cdisplaystyle+%5Cmathbf%7BW%7D) and ![\mathbf{x}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Ctextstyle+%5Cmathbf%7Bx%7D) are both variables in that they both do not have any values assigned to them (at least until the user assigns a value). + +We adopt the terms used by any high school calculus textbook: a variable is defined by the fact that its derivative varies with the values assigned to it. Conversely, the derivative of a constant will always be 0. + +Constants are implemented in Gorgonia as an `Op` denoting a value as a constant. + + +## Backpropagation ## + +One of the core abilites of Gorgonia is the ability to do compute partial derivatives. This is done in two ways in Gorgonia: + +- Symbolic Differentiation +- Forwards Mode Automatic Differentiation +- Reverse Mode Automatic Differentiation (FUTURE) + +Some functions are differentiable, and some functions are not. 
Gorgonia handles both kinds. + +### Symbollic Differentiation ### + +Symbolic differentiation is done by manipulating the graph. Consider the following expression: + +![c = a \times b](https://render.githubusercontent.com/render/math?math=%5CLarge+%5Cdisplaystyle+c+%3D+a+%5Ctimes+b) + +Here, ![\times](https://render.githubusercontent.com/render/math?math=%5Ctextstyle+%5Ctimes) is an `Op`, while ![c](https://render.githubusercontent.com/render/math?math=%5Ctextstyle+c) is the dependent variable; and ![a](https://render.githubusercontent.com/render/math?math=%5Ctextstyle+a) or ![b](https://render.githubusercontent.com/render/math?math=%5Ctextstyle+b) are independent variables. All four components are represented as `*Node` in a `*ExprGraph`. + +The partial derivatives are defined as follows: + +![\begin{aligned} +\frac{\partial c}{\partial a} &= b\\ +\frac{\partial c}{\partial b} &= a\\ +\end{aligned}](https://render.githubusercontent.com/render/math?math=%5CLarge+%5Cdisplaystyle+%5Cbegin%7Baligned%7D%0A%5Cfrac%7B%5Cpartial+c%7D%7B%5Cpartial+a%7D+%26%3D+b%5C%5C%0A%5Cfrac%7B%5Cpartial+c%7D%7B%5Cpartial+b%7D+%26%3D+a%5C%5C%0A%5Cend%7Baligned%7D) + +Hence when we perform a symbolic differentiation, we're adding new `*Node`s to the graph, each representing a partial derivative of ![c](https://render.githubusercontent.com/render/math?math=%5Ctextstyle+c) with regards to ![a](https://render.githubusercontent.com/render/math?math=%5Ctextstyle+a) or ![b](https://render.githubusercontent.com/render/math?math=%5Ctextstyle+b). + +An `Op` that supports symbolic differentiation must implement `SDOp`. + +### Automatic Differentiation ### + +Automatic differentiation does not modify the graph. Instead, differentiation is done on values. This is handled by the evaluating VM (see section on evaluation). + +To aid in automatic differentiation, a `dualValue` type is also used. 
A `*dualValue` is exactly what it suggests: a value that contains two values - usually the value and a gradient. + +An `Op` that supports automatic differentiation must implement `ADOp`. + + +## Evaluation of Mathematical Expressions ## + +A mathematical expression is useless by itself. Here the word "useless" is meant literally. By itself, a mathematical expression does nothing. However, the expression may be evaluated to get values out of the expression. + +The expression `1 + 2` does nothing. However, when we evaluate it, we get `3` as a result. `3` is a value. So are `1` and `2`. Specifically, `1` and `2` are constant values. + +A value in Gorgonia is defined by the `Value` interface. It is defined in `values.go`. + +To evaluate the graph, Gorgonia uses `VM`s (virtual machines). There are three main `VM`s: + +* `*tapeMachine` +* `*lispMachine` +* `*goMachine` + +The names of the `VM`s are suggestive of their operational semantics. `*tapeMachine` acts like a Turing machine with a finite tape. `*lispMachine` acts like a [Lisp machine](https://en.wikipedia.org/wiki/Lisp_machine). `*goMachine` acts as if everything is a concurrent process. + +In order to evaluate using a `*tapeMachine`, the mathematical expression needs to be first compiled into a program that runs on the `*tapeMachine`. `*goMachine` and `*lispMachine` runs off the graph directly, and both these machines support automatic differentiation. + +### `*tapeMachine` ### + +### `*lispMachine` ### + +### `*goMachine` ### + + +## Gradient Descent ## + +Gorgonia comes equipped with gradient descent functionalities. The main abstract data type that defines a gradient descent algorithm is the `Solver`. There are multiple `Solver`s implemented in Gorgonia. +All `Solver`s rely on a `ValueGrad`. A `ValueGrad` is anything that can provide a value and a gradient (also itself a value). + + +## Other Subsystems ## + +### Type System ### + +Gorgonia is heavily reliant on a type system (emphasis on _a_, not _the_).
The type system of the mathematical expressions is a [traditional Hindley-Milner style type system](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner) - it is powered by the [hm](https://github.com/chewxy/hm) library. It allows for the type of a new node to be inferred. However, it also means that `Op` implementors must contend with it. + +Here's a quick primer. Like all the previous definitions, it'll be defined by examples instead of rigorous inductive definitions. + +- `Float64` is a type. Specifically, it's a type constant: it has a name, and it describes what a bunch of bytes is supposed to represent. +- `a` is a type variable. It may be replaced by other types when inference is being done. +- `Matrix a` is a type. It is a type scheme (also occasionally called a polytype). +- `a → b` is a function type from type `a` to type `b`. What are `a` and `b`? They're type variables, so they can be anything. This is also called a function's type signature. + +The following table is a "translation" from Gorgonia's type system to the closest equivalent in Go's type system: + +| Gorgonia Type System Term | Go Type System Term | Notes | +|---|---|---| +| `Float64` | `float64` | Most types in Go are type constants | +| `a` | `interface{}` | This is not an entirely accurate analogy. An `interface{}` is resolved at runtime, while `a` is resolved at compile time (of the mathematical expression, not of the Go program). | +| `Vector a` | `[]T` | The `T` represents any data type, which is what `a` is. The difference is `T` has to be a concrete data type at the time of programming, while `a` doesn't have to be. | +| `a → b` | `func(a T) U` | `T` and `U` are meta variables that the programmer has to fill in at the time of programming | + +An `Op` has to return a `hm.Type` in order to fulfil the interface. Most often, you will want to return a function type. + +#### An Example: Addition #### +Let's say we're creating an addition `Op`. What inputs do additions take? 
An addition function is a function that takes two numbers of the same type (let's call this type `a`) and returns a result that is also of the same type. So, an addition `Op` would have the type `a → a → a` or `(a, a) → a`. + +The difference between `a → a → a` and `(a, a) → a` is subtle. Let us translate this type signature into a Go type signature for analogy's sake to help with understandability: + +| Gorgonia Type Signature | Go Type Signature | Notes | +|---|---|---| +| `a → a → a` | `func add(a interface{}) func(a interface{}) interface{}` | Also known as "curried" function | +| `(a, a) → a` | `func add(a, b interface{}) interface{}` | | + +While it's natural to gravitate towards `(a, a) → a`, Gorgonia strongly prefers `a → a → a`. Why? Because when an `Op` is defined with only one input and one output, it makes it easier to optimize the graph. + +Having defined an addition `Op` with signature `a → a → a`, we can now have a look at what Gorgonia does with that information. + +The primary thing that the type system is useful for is unification. Our addition `Op` has a type signature `a → a → a`. Now, let's say we want to create a `*Node` representing an addition between `x` and `y`, which are also both `*Node`s. `x` is the first argument, and has the type `Float64`. The first step is to replace the type variable in the first parameter of the type signature of the addition `Op` with `Float64`. This is better represented below: + +``` + a → a → a + ↑ +Float64 +``` + +Having replaced `a` with `Float64`, we find the remainder of the type signature to have been transformed into `Float64 → Float64`. + +Let's say `y`, the second argument, has the type `Float32`. The next step is to repeat the same steps above: replace the type variable in the first parameter of the remaining type signature with `Float32`. + +``` +Float64 → Float64 + ↑ +Float32 +``` + +This replacement cannot happen for two reasons: + +1. There is no type variable in the function type signature. +2. 
`Float32` is not `Float64`. + +Hence an error occurs. Following from this we can see that the addition `Op` is both generic and restrictive at the same time. The very same `Op` allows `Matrix a → Matrix a → Matrix a` to be a legal definition, while `Matrix a → Matrix b → Matrix a` immediately causes an error. + +#### Another Example: Scale #### + +Now, let's consider another `Op`: a scaling function. For simplicity's sake, let's say the scale `Op` only operates on vectors. We can define a type signature as follows: `Vector a → a → Vector a`. An analogous Go type signature would be `func(floats []float64, scalar float64) []float64`, except that the scale `Op` can work on any data type. If the first argument is `Vector Float64` then the remainder function will have the signature `Float64 → Vector Float64`. + +#### Where Is The Type System Used? #### + +The type system powers the creation of new `*Node`. `ApplyOp` is the function that takes an `Op`, and the children `*Node` and returns a new `*Node` representing the `Op`. + + + + +
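The unification walk-through in the addition example can be sketched in plain Go. This is a toy model only: `Type`, `Signature` and `Apply` are hypothetical names invented for this illustration and are not part of Gorgonia or the `hm` library. It handles just flat type constants and a single type variable, which is enough to reproduce the `Float64`/`Float32` example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Type is a toy stand-in for a type in the type system: either a type
// constant such as Float64, or a type variable such as a.
// It is NOT hm.Type; it only mirrors the walk-through in the text.
type Type struct {
	Name  string
	IsVar bool
}

var (
	Float64 = Type{Name: "Float64"}
	Float32 = Type{Name: "Float32"}
	A       = Type{Name: "a", IsVar: true}
)

// Signature is a curried signature: {A, A, A} stands for a → a → a.
type Signature []Type

func (s Signature) String() string {
	names := make([]string, len(s))
	for i, t := range s {
		names[i] = t.Name
	}
	return strings.Join(names, " → ")
}

// Apply unifies the first parameter of sig with arg and returns the
// remainder of the signature with the substitution applied.
func Apply(sig Signature, arg Type) (Signature, error) {
	if len(sig) < 2 {
		return nil, errors.New("not a function type")
	}
	param := sig[0]
	rest := make(Signature, len(sig)-1)
	copy(rest, sig[1:])
	switch {
	case param.IsVar:
		// Replace every remaining occurrence of the type variable with arg.
		for i, t := range rest {
			if t == param {
				rest[i] = arg
			}
		}
	case param != arg:
		// No type variable to replace, and the constants differ.
		return nil, fmt.Errorf("cannot unify %s with %s", param.Name, arg.Name)
	}
	return rest, nil
}

func main() {
	add := Signature{A, A, A}
	rest, err := Apply(add, Float64)
	fmt.Println(rest, err) // Float64 → Float64 <nil>

	_, err = Apply(rest, Float32)
	fmt.Println(err) // cannot unify Float64 with Float32
}
```

A real Hindley-Milner unifier also has to deal with compound types such as `Matrix a` and with arguments that are themselves type variables; the sketch deliberately leaves those out.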
Quick Recipes
+ +### Create a type constant ### + +The only "kind" of type constant we will really use with Gorgonia is the `tensor.Dtype`, which itself is just a wrapper around a `reflect.Type`. + +``` +T := reflect.TypeOf(v) +C := tensor.Dtype{T} +``` + +### Create a 3 dimensional Tensor Type of any underlying datatype ### + +Gorgonia comes with functions that allow you to define a Tensor type. + +``` +of := hm.TypeVariable('a') +dims := 3 +T := &gorgonia.TensorType{ + Dims: dims, + Of: of, +} +``` + +### Create a Matrix Type of Float64s ### + +``` +T := &gorgonia.TensorType { + Dims: 2, + Of: tensor.Float64, +} +``` + +### Create a Function Type `a → b` ### + +``` +a := hm.TypeVariable('a') +b := hm.TypeVariable('b') +T := hm.NewFnType(a, b) +``` + +
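For readers more at home in Go than in type-signature notation, the difference between the curried shape `a → a → a` and the tupled shape `(a, a) → a` can be mimicked with ordinary Go functions. The sketch below assumes Go 1.18 or later and uses a type parameter instead of `interface{}`, since a type parameter is fixed at compile time and is therefore arguably a closer analogy to a type variable. `Number`, `Add` and `AddUncurried` are hypothetical names for this illustration, not Gorgonia APIs.

```go
package main

import "fmt"

// Number is a hypothetical constraint standing in for "some numeric type".
type Number interface {
	~int | ~float32 | ~float64
}

// Add has the curried shape a → a → a: it takes one argument and
// returns the remainder function a → a.
func Add[T Number](x T) func(T) T {
	return func(y T) T { return x + y }
}

// AddUncurried has the tupled shape (a, a) → a.
func AddUncurried[T Number](x, y T) T {
	return x + y
}

func main() {
	// Supplying the first argument fixes T to float64, so the
	// remainder has the shape Float64 → Float64.
	addTwo := Add(2.0)
	fmt.Println(addTwo(3.5)) // 5.5

	fmt.Println(AddUncurried(2, 3)) // 5

	// addTwo(float32(1)) would be rejected by the Go compiler, much as
	// the Float32 argument is rejected in the addition example.
}
```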
+ +# File-Based View # + +Another way to get around this repository is via the files. The files are quite well named (barring a few files whose names came from a particularly childish developer, @chewxy). + +## API Files ## + +The majority of APIs of Gorgonia can be found in + +* `api_gen.go` +* `operations.go` +* `gorgonia.go` + +## Op Files ## + +All files pertaining to the implementation of `Op` can be found in files starting with `op_` + + +# How Gorgonia is Developed # + +There are large parts of Gorgonia that are machine generated. TODO From 69aee2750a5d7ac2813a506d0ebaca02772c7818 Mon Sep 17 00:00:00 2001 From: chewxy Date: Tue, 9 Feb 2021 23:50:06 +1100 Subject: [PATCH 2/3] Updated formatting --- ARCHITECTURE.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 99b394b1..51b972c7 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -15,6 +15,7 @@ Gorgonia consists of a few parts: Each of these parts have their own sub-parts. Let's explore.
+ ## Mathematical Expressions ## Instead of tediously explaining what a mathematical expression, we'll use examples and rely on the reader's ability to perform induction. @@ -72,6 +73,7 @@ We adopt the terms used by any high school calculus textbook: a variable is defi Constants are implemented in Gorgonia as an `Op` denoting a value as a constant. + ## Backpropagation ## One of the core abilites of Gorgonia is the ability to do compute partial derivatives. This is done in two ways in Gorgonia: @@ -110,6 +112,7 @@ To aid in automatic differentiation, a `dualValue` type is also used. A `*dualVa An `Op` that supports automatic differentiation must implement `ADOp`. + ## Evaluation of Mathematical Expressions ## A mathematical expression is useless by itself. Here the word "useless" is meant literally. By itself, a mathematical expression does nothing. However, the expression may be evaluated to get values out of the expression. @@ -129,18 +132,23 @@ The names of the `VM`s are suggestive of their operational semantics. `*tapeMach In order to evaluate using a `*tapeMachine`, the mathematical expression needs to be first compiled into a program that runs on the `*tapeMachine`. `*goMachine` and `*lispMachine` runs off the graph directly, and both these machines support automatic differentiation. ### `*tapeMachine` ### +TODO ### `*lispMachine` ### +TODO ### `*goMachine` ### +TODO + ## Gradient Descent ## Gorgonia comes equipped with gradient descent functionalities. The main abstract data type that defines a gradient descent algorithm is the `Solver`. There are multiple `Solver`s implemented in Gorgonia. All `Solver`s rely on a `ValueGrad`. A `ValueGrad` is anything that can provide a value and a gradient (also itself a value). 
+ ## Other Subsystems ## ### Type System ### From 04301b2bec1d366ba9ee50fe5b94492266b7bff0 Mon Sep 17 00:00:00 2001 From: chewxy Date: Tue, 9 Feb 2021 23:56:42 +1100 Subject: [PATCH 3/3] Fixed minor typos --- ARCHITECTURE.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 51b972c7..32c3e120 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -64,7 +64,7 @@ From this point, "`Op`" may be interchangably used with "function". ### Variables, Weights and Constants ### -In most deep learning frameworks, there is a separation of notions between weights and variables. Often, variables are what the user/programmer may set. So in the example above, only ![\mathbf{x}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Ctextstyle+%5Cmathbf%7Bx%7D) is a variable, while ![\mathbf{W}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Cdisplaystyle+%5Cmathbf%7BW%7D) is a generic tensor/node. +In most deep learning frameworks, there is a separation of notions between weights and variables. Often, variables are what the user/programmer may set. So in the example above, only ![\mathbf{x}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Ctextstyle+%5Cmathbf%7Bx%7D) is a variable, while ![\mathbf{W}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Cdisplaystyle+%5Cmathbf%7BW%7D) is a generic tensor/node/weights. In Gorgonia, there is no such separation. Mathematically speaking, ![\mathbf{W}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Cdisplaystyle+%5Cmathbf%7BW%7D) and ![\mathbf{x}](https://render.githubusercontent.com/render/math?math=%5Clarge+%5Ctextstyle+%5Cmathbf%7Bx%7D) are both variables in that they both do not have any values assigned to them (at least until the user assigns a value). 
@@ -84,7 +84,7 @@ One of the core abilites of Gorgonia is the ability to do compute partial deriva Some functions are differentiable, and some functions are not. Gorgonia handles both kinds. -### Symbollic Differentiation ### +### Symbolic Differentiation ### Symbolic differentiation is done by manipulating the graph. Consider the following expression: @@ -180,7 +180,7 @@ The difference between `a → a → a` and `(a, a) → a` is subtle. Let us tra | Gorgonia Type Signature | Go Type Signature | Notes | |---|---|---| -| `a → a → a` | `func add(a interface{}) func(a interface{}) interface{}` | Also known as "curried" function | +| `a → a → a` | `func add(a interface{}) func(a interface{}) interface{}` | Also known as "Curried" function | | `(a, a) → a` | `func add(a, b interface{}) interface{}` | | While it's natural to gravitate towards `(a, a) → a`, Gorgonia strongly prefers `a → a → a`. Why? Because when an `Op` is defined with only one input and one output, it makes it easier to optimize the graph. @@ -221,9 +221,9 @@ Now, let's consider another `Op`: a scaling function. For simplicity's sake, let The type system powers the creation of new `*Node`. `ApplyOp` is the function that takes an `Op`, and the children `*Node` and returns a new `*Node` representing the `Op`. - +
-
Quick Recipes
+ Quick Recipes ### Create a type constant ### @@ -264,7 +264,7 @@ b := hm.TypeVariable('b') T := hm.NewFnType(a, b) ``` -
+ # File-Based View #