![LOGO](logo.png)
# (eval '(HyTorch Tutorial))
Introduction to PyTorch Meta-Programming Using the Lisp Dialect Hy

Lead Maintainer: [Rafael Zamora-Resendiz](https://github.com/rz4)

**HyTorch** is a Hy (0.16.0) library running Python (3.7) and PyTorch (1.0.1)
for use in rapid low-level development of deep learning (DL) systems as well as
for experiments in DL meta-programming.

##### Table of Contents
1. [Motivation](#s1)
2. [Installation](#s2)
3. [Hy: Lisp Flavored Python](#s3)
4. [HyTorch in Action](#s4)
    1. [PyTorch Models as Hy-Expressions](#s41)
    2. [Expression Threading](#s42)
    3. [Pattern Matching](#s43)
    4. [Expression Refactoring](#s44)
5. [Network Analysis and Meta-Analysis Using HyTorch](#s5)
    1. [FUTURE: Fetching Internal Network Components](#s51)
    2. [FUTURE: Probing Networks Using Tests](#s52)
    3. [FUTURE: Loading Foreign PyTorch Models](#s53)
    4. [FUTURE: Comparing Network Architectures](#s54)
6. [FUTURE: Hyper-Parameter Search Using Genetic Programming]()
    
---

<a name="s1"></a>
## Motivation
The dynamic execution of PyTorch operations provides enough flexibility to change
computational graphs on the fly. This opens an avenue for Hy, a Lisp dialect
embedded in Python, to be used in establishing meta-programming practices in the
field of deep learning.

While the final goal of this project is to build a framework that gives DL systems
access to their own code, this coding paradigm also shows promise for accelerating the development of new deep learning models while allowing significant manipulation of low-level tensor operations at runtime. A common trend in current DL packages, such as Keras, is an abundance of object-oriented abstraction. This reduces transparency in NN systems that are already black boxes, and makes interpretability and reproducibility of models even more difficult.

In order to better understand NN models and allow for quick iterative design
over novel or esoteric architectures, a deep learning programmer needs an
environment that allows low-level definition of tensor graphs and provides methods to quickly access network components for analysis, while still providing a framework to manage large architectures. I believe that the added expressiveness of Lisp, in combination with PyTorch's functional API, allows for this type of programming paradigm and provides DL researchers with an extensible framework that more heavily abstracted NN packages cannot match.

<a name="s2"></a>
## Installation

The current project has been tested using Hy 0.16.0, PyTorch 1.0.1.post2 and
Python 3.7. The following ***Pip*** command can be used to install **HyTorch**:

```
$ pip3 install git+https://github.com/rz4/HyTorch
```
---

<a name="s3"></a>
## Hy: Lisp Flavored Python

"Hy is a dialect of the language Lisp designed to interact with Python by translating expressions into Python's abstract syntax tree (AST). Similar to Clojure's mapping of s-expressions onto the Java virtual machine (JVM), Hy is meant to operate as a transparent Lisp front end to Python. Lisp allows operating on code as data (metaprogramming). Thus, Hy can be used to write domain-specific languages. Hy also allows Python libraries, including the standard library, to be imported and accessed alongside Hy code with a compiling step converting the data structure of both into Python's AST." [Source: Wikipedia]()

It's recommended to look over [Hy's tutorial](http://docs.hylang.org/en/stable/tutorial.html), as it does a good job showcasing the various features of Hy. In short, Hy provides a Python-friendly Lisp that anyone who knows Python can easily pick up. Plus, you can import any Python code into Hy, as well as import Hy code into Python! Here is just a little taste of how Hy is structured:

In [1]:
```hy
; Lisp-style Comments

;; Define function hello-world
(defn hello-world [name] (print "Hello" name "! It's a great day to be Lisping!"))

;; Evaluate
(hello-world "FooManCHEW")
```

```
Hello FooManCHEW ! It's a great day to be Lisping!

[None, None]
```

In [2]:
```hy
; Importing NumPy
(import [numpy :as np])

; Attribute functions are still accessible using dot notation
(setv x (np.ones '(10 10)))

; Quasi-quote and unquote
(print `(+ (+ 1 2) ~(- 4 3)))
```

```
HyExpression([
  HySymbol('+'),
  HyExpression([
    HySymbol('+'),
    HyInteger(1),
    HyInteger(2)]),
  1])

[None, None, None]
```
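
The "code as data" idea in the quoted description can also be seen from the Python side. The sketch below is plain Python rather than Hy, and is not part of HyTorch: it uses the standard `ast` module to parse the Python equivalent of the quasi-quoted form above, inspects the tree as ordinary data, and then compiles and evaluates it.

```python
import ast

# Parse the Python equivalent of the Hy form (+ (+ 1 2) 1) into an AST.
tree = ast.parse("(1 + 2) + 1", mode="eval")

# The tree is ordinary data: the root is an addition whose left operand
# is itself a binary operation.
root = tree.body
is_nested_add = (
    isinstance(root, ast.BinOp)
    and isinstance(root.op, ast.Add)
    and isinstance(root.left, ast.BinOp)
)

# Compile the tree to a code object and evaluate it.
result = eval(compile(tree, filename="<ast>", mode="eval"))
print(is_nested_add, result)  # True 4
```

Hy works at this same level, which is why Hy and Python code can be mixed freely in one program.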

---
<a name="s4"></a>
## HyTorch In Action

<a name="s41"></a>
### PyTorch Models as Hy-Expressions:

First, let's load the necessary packages from HyTorch and PyTorch. We will also set our device according to the available resources.

In [3]:
```hy
; Importing HyTorch Tools
(import [hytorch.core [|gensym print-lisp]])
(require [hytorch.core [|setv]])
(require [hytorch.thread [|-> |->> *-> *->>]])

; Import PyTorch
(import torch)
(import [torch.nn.functional :as tfun])

; Checking for available cuda device
(setv device (torch.device (if (.is_available torch.cuda) "cuda:0" "cpu")))
```

```
[None, True, True, None, None, None]
```

Next, let's define a list of leaf tensors which will be our trainable parameters in the network. Hy allows us to write the leaf tensor definitions as Hy-expressions and store the code in an unevaluated list. We then generate a list of symbols that will be mapped to the tensors using `|gensym` and define an expression to assign the evaluated leaf tensor definitions to the list of generated symbols using `|setv`.

In [4]:
```hy
; Defining leaf tensors and variable names
(setv leaf-tensor-defs '[(torch.empty [10] :dtype torch.float32 :requires-grad True)
                         (torch.empty [10 10] :dtype torch.float32 :requires-grad True)
                         (torch.empty [10] :dtype torch.float32 :requires-grad True)
                         (torch.empty [1 10] :dtype torch.float32 :requires-grad True)
                         (torch.empty [1] :dtype torch.float32 :requires-grad True)])

; Generate symbols for tensor-defs
(setv leaf-tensors (|gensym leaf-tensor-defs "L_"))

; Define assign expression for leafs
(setv create-leafs `(|setv ~leaf-tensors ~leaf-tensor-defs))

; Print reference symbols
(print-lisp leaf-tensors)
```

```
[L_0 L_1 L_2 L_3 L_4]

[None, None, None, None]
```
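
For readers more comfortable with Python, the following is a rough analogue of the step above, not HyTorch's implementation. Definitions are kept as unevaluated source strings, a hypothetical helper `gen_names` plays the role of `|gensym`, and a dictionary stands in for the namespace that `|setv` assigns into. The definitions are plain list constructors so that the sketch runs without PyTorch.

```python
# Unevaluated definitions, kept as source strings
# (stand-ins for the quoted Hy forms).
leaf_defs = [
    "[0.0] * 10",
    "[[0.0] * 10 for _ in range(10)]",
    "[0.0] * 10",
]

def gen_names(defs, prefix):
    """Return one name per definition: prefix0, prefix1, ... (hypothetical helper)."""
    return [f"{prefix}{i}" for i in range(len(defs))]

leaf_names = gen_names(leaf_defs, "L_")

# Evaluate each definition only now, and bind it to its generated name.
namespace = {}
for name, definition in zip(leaf_names, leaf_defs):
    namespace[name] = eval(definition)

print(leaf_names)             # ['L_0', 'L_1', 'L_2']
print(len(namespace["L_1"]))  # 10
```

Keeping the definitions unevaluated until the binding step is what allows the same list to be re-run later to reset the parameters.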

Since our leaf tensors are empty at the moment, let's initialize each from a normal distribution and push each to our computing device. Again, the procedure for each leaf tensor can be defined in a list of unevaluated Hy-expressions. We can then apply these procedures by threading them to our leaf tensor symbols and store this expression for later use (thread macros are explained in the next section). Finally, we generate a new set of symbols for the initialized weights and define a setter expression for them.

In [5]:
```hy
; Define initialization procedures
(setv tensor-inits '[(-> torch.nn.init.normal (.to device))
                     (-> torch.nn.init.normal (.to device))
                     (-> torch.nn.init.normal (.to device))
                     (-> torch.nn.init.normal (.to device))
                     (-> torch.nn.init.normal (.to device))])

; Define init procedure application to leafs
(setv init-leafs (macroexpand `(|-> ~leaf-tensors ~tensor-inits)))

; Generate symbols for init weights
(setv w-tensors (|gensym leaf-tensor-defs "W_"))

; Define assign expression for weights
(setv init-weights `(|setv ~w-tensors ~init-leafs))

; Print
(print-lisp w-tensors)
(print-lisp init-leafs)
```

```
[W_0 W_1 W_2 W_3 W_4]
[(.to (torch.nn.init.normal L_0) device) (.to (torch.nn.init.normal L_1) device) (.to (torch.nn.init.normal L_2) device) (.to (torch.nn.init.normal L_3) device) (.to (torch.nn.init.normal L_4) device)]

[None, None, None, None, None, None]
```

Next, we can define the network as a separate expression. Notice the threading macro `->`, which takes its first argument and inserts it as the first argument of the next form in the series, then threads that result into the following form, and so on. This is a prebuilt threading macro in Hy, and it's very useful for defining long functional expressions in an inline format. The resulting expression is very clean and easy to follow. Thanks to PyTorch's functional API, we can take full advantage of threading macros for our network definitions. The next section will talk more about threading and the custom threading macros provided in HyTorch for more complex network designs.

In [6]:
```hy
; Defining a simple feed-forward NN as an S-Expression
(setv nn-def '(-> W_0
                  (tfun.linear W_1 W_2)
                  tfun.sigmoid
                  (tfun.linear W_3 W_4)
                  tfun.sigmoid))

; Print
(print-lisp nn-def)
(print-lisp (macroexpand nn-def))
```

```
(-> W_0 (tfun.linear W_1 W_2) tfun.sigmoid (tfun.linear W_3 W_4) tfun.sigmoid)
(tfun.sigmoid (tfun.linear (tfun.sigmoid (tfun.linear W_0 W_1 W_2)) W_3 W_4))

[None, None, None]
```

Macros are powerful tools that allow the source language to be extended into one that better accommodates the problem space. Here, we define a new macro which returns an expression for our parameter initialization routine.
By defining this procedure as a macro, the evaluated S-expressions will have access to the global variables already defined in the notebook. This macro can later be called to reset the network as needed.

In [7]:
```hy
; Define network parameter init procedure
(defmacro init-params []
  '(do (eval create-leafs)
       (eval init-weights)))

; Initiate Parameters
(init-params)
```

```
[<function init_params at 0x115db17b8>, [None, None, None, None, None]]
```

Finally, let's run forward propagation of the network graph by simply evaluating our network expression and storing the resulting output in a new variable. By keeping model components separate from one another, we have more modular control over what is being executed, making debugging and refactoring much easier.

In [8]:
```hy
; Running Forward Prop
(setv out (eval nn-def))
(print out)
```

```
tensor([0.3209], grad_fn=<SigmoidBackward>)

[None, None]
```
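
To make "evaluating the network expression" concrete, here is a small pure-Python sketch that is independent of Hy and PyTorch. Expressions are nested tuples whose first element names a function, and a hypothetical `evaluate` helper walks the tree against an environment. Scalars stand in for the tensors, so `linear` here is simply `x * w + b`; the weight values are made up for illustration.

```python
import math

def evaluate(expr, env):
    """Recursively evaluate a nested-tuple expression against env."""
    if isinstance(expr, tuple):
        fn = env[expr[0]]
        args = [evaluate(arg, env) for arg in expr[1:]]
        return fn(*args)
    if isinstance(expr, str):
        return env[expr]  # symbol lookup
    return expr           # literal value

env = {
    "linear": lambda x, w, b: x * w + b,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "W_0": 0.5, "W_1": 2.0, "W_2": -1.0, "W_3": 1.0, "W_4": 0.0,
}

# Same shape as the macroexpanded network definition above.
nn_def = ("sigmoid",
          ("linear",
           ("sigmoid", ("linear", "W_0", "W_1", "W_2")),
           "W_3", "W_4"))

out = evaluate(nn_def, env)
print(out)  # about 0.6225
```

Because the definition and the environment are separate objects, the same expression can be re-evaluated after the weights change.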

<a name="s42"></a>
### Expression Threading:
Hy has some pre-built threading macros to help write nested functions in inline
notation. This is a great start, but can be improved with some more advanced features to keep with inline notation while providing argument broadcasting and multidimensional threading for more complex computational graphs. 

Let's first review the two threading macros native to Hy, `->` and `->>`. These thread each expression into the first or last argument slot of the next expression, respectively. For example:

In [9]:
```hy
; Head Threading
(print-lisp (macroexpand '(-> 10 (+ 2) (* 5) print)))

; Tail Threading
(print-lisp (macroexpand '(->> 10 (+ 2) (* 5) print)))
```

```
(print (* (+ 10 2) 5))
(print (* 5 (+ 2 10)))

[None, None]
```
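
The rewriting that the two macros perform can be sketched in a few lines of Python. This is an illustration of the idea on nested tuples, not Hy's implementation; the function name `thread` and its `tail` flag are assumptions made for this example.

```python
def thread(value, forms, tail=False):
    """Nest forms around value, inserting it first (->) or last (->>)."""
    for form in forms:
        if not isinstance(form, tuple):
            form = (form,)  # a bare symbol becomes a one-element call
        if tail:
            value = form + (value,)
        else:
            value = (form[0], value) + form[1:]
    return value

forms = [("+", 2), ("*", 5), "print"]
print(thread(10, forms))             # ('print', ('*', ('+', 10, 2), 5))
print(thread(10, forms, tail=True))  # ('print', ('*', 5, ('+', 2, 10)))
```

The two results have the same nesting as the macro expansions printed above.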

This is great for defining sequential computational graphs, but it becomes a bit messy when dealing with branching architectures. For these cases, we can use the broadcasting threading macros `*->` and `*->>`. This set of macros allows for architectures with multiple inputs/outputs or branching intermediate layers.

In [10]:
```hy
; Head Broadcast Threading
(print-lisp (macroexpand '(*-> [input1 input2] tfun.matmul (tfun.add bias) [tfun.sigmoid tfun.relu])))

; Tail Broadcast Threading
(print-lisp (macroexpand '(*->> [input1 input2] tfun.matmul (tfun.add bias) [tfun.sigmoid tfun.relu])))
```

```
[(tfun.sigmoid (tfun.add (tfun.matmul input1 input2) bias)) (tfun.relu (tfun.add (tfun.matmul input1 input2) bias))]
[(tfun.sigmoid (tfun.add bias (tfun.matmul input1 input2))) (tfun.relu (tfun.add bias (tfun.matmul input1 input2)))]

[None, None]
```

When a list is threaded into the next argument, its elements are placed as the first or last N arguments, depending on head or tail threading. When the next argument is a list, the previous argument is broadcast to all elements of that list. In the previous example, we fed in two input tensors, applied matrix multiplication, added a bias tensor, and then output two tensors, each with a different activation applied.
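
The two rules just described can be sketched in Python on nested tuples, with Python lists standing in for Hy lists. This is an illustrative model of the behaviour, not HyTorch's code; `insert` and `broadcast_thread` are hypothetical names.

```python
def insert(value, form, tail=False):
    """Put value (or every item of a list value) into form's first or last slots."""
    if not isinstance(form, tuple):
        form = (form,)
    args = tuple(value) if isinstance(value, list) else (value,)
    return form + args if tail else (form[0],) + args + form[1:]

def broadcast_thread(value, forms, tail=False):
    """Thread value through forms; a list form receives value in every element."""
    for form in forms:
        if isinstance(form, list):
            value = [insert(value, f, tail) for f in form]
        else:
            value = insert(value, form, tail)
    return value

result = broadcast_thread(
    ["input1", "input2"],
    ["matmul", ("add", "bias"), ["sigmoid", "relu"]],
)
print(result[0])  # ('sigmoid', ('add', ('matmul', 'input1', 'input2'), 'bias'))
print(result[1])  # ('relu', ('add', ('matmul', 'input1', 'input2'), 'bias'))
```

Passing `tail=True` gives the `*->>` ordering, with threaded values appended after the existing arguments.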

There was another special threading macro used in the previous section called `|->` (with its tail counterpart being `|->>`). These macros are used for inline threading of lists. For example:

In [11]:
```hy
; Head List Inline Threading
(print-lisp (macroexpand '(|-> [input1 input2] [(tfun.linear w1 b1) (tfun.linear w2 b2)] tfun.sigmoid)))

; Tail List Inline Threading
(print-lisp (macroexpand '(|->> [input1 input2] [(tfun.linear w1 b1) (tfun.linear w2 b2)] tfun.sigmoid)))
```

```
[(tfun.sigmoid (tfun.linear input1 w1 b1)) (tfun.sigmoid (tfun.linear input2 w2 b2))]
[(tfun.sigmoid (tfun.linear w1 b1 input1)) (tfun.sigmoid (tfun.linear w2 b2 input2))]

[None, None]
```

Here, we made two neural networks that function in parallel. We applied the same final activation to both, but only needed to define it once. These macros were used in the previous section to apply the initialization operators to each leaf tensor. Using these macros, complex architectures can be easily defined while maintaining a readable inline syntax. Since these are macros, the expanded code is a legal Hy-expression which can be operated on by different components of your program.
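
As a rough Python model of list inline threading (again on nested tuples, and not HyTorch's implementation), each value in the list is threaded through its own form when the next step is a list, and through the shared form otherwise. The names `insert_one` and `list_thread` are assumptions for this sketch.

```python
def insert_one(value, form, tail=False):
    """Insert a single value as the first or last argument of form."""
    if not isinstance(form, tuple):
        form = (form,)
    return form + (value,) if tail else (form[0], value) + form[1:]

def list_thread(values, forms, tail=False):
    """Thread each value in parallel; a list form is paired element-wise."""
    for form in forms:
        if isinstance(form, list):
            values = [insert_one(v, f, tail) for v, f in zip(values, form)]
        else:
            values = [insert_one(v, form, tail) for v in values]
    return values

result = list_thread(
    ["input1", "input2"],
    [[("linear", "w1", "b1"), ("linear", "w2", "b2")], "sigmoid"],
)
print(result[0])  # ('sigmoid', ('linear', 'input1', 'w1', 'b1'))
print(result[1])  # ('sigmoid', ('linear', 'input2', 'w2', 'b2'))
```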

<a name="s43"></a>
### Pattern Matching:
Because network definitions are written as Hy-expressions, we can perform pattern matching to find internal components of an architecture. HyTorch contains pattern-matching tools which can be used to test whether two `HyExpressions` are equal, with class-based matching denoted by `HyKeywords`. We can test whether an expression matches a pattern as follows:

In [12]:
```hy
; Import Pattern Matching Functions and Macros
(import [hytorch.match [pat-match? pat-find]])
(require [hytorch.match [pat-refract]])
```

```
[None, True]
```

In [13]:
```hy
; pat-match? expr pattern
(pat-match? '(print (+ (+ 2 2) 3)) '(print (+ :HyExpression 3)))

; Match by parent-class association (ex. hy.models.HyExpression is a child of hy)
(pat-match? '(print (+ (+ 1 2) 3)) '(print (+ :hy 3)))

; Match by sub classes (ex. hy.models)
(pat-match? '(print (+ (+ 1 2) 3)) '(print (+ :hy:models 3)))
```

```
[True, True, True]
```

Using pattern matching, we can search our network definitions and retrieve subcomponents of the architecture. Here, we take the network definition from the previous section and try to match portions of the network against a desired pattern. We match expressions that start with a sigmoid activation followed by a Hy expression. We first expand the macro-based definition and then look for the first 2 results. The matches are returned with the shallowest matches first.

In [14]:
```hy
; Our network from the previous section
(setv nn-def '(-> W_0
                  (tfun.linear W_1 W_2)
                  tfun.sigmoid
                  (tfun.linear W_3 W_4)
                  tfun.sigmoid))

; Pattern match search over expanded definition and return first 2
(for [match (pat-find (macroexpand nn-def) '(tfun.sigmoid :HyExpression) :n 2)]
     (print "Sigmoid Match:")
     (print-lisp match))

; Find patterns using the &rest keyword.
(for [match (pat-find (macroexpand nn-def) '(tfun.linear &rest) :n 2)]
     (print "Linear Match:")
     (print-lisp match))
```

```
Sigmoid Match:
(tfun.sigmoid (tfun.linear (tfun.sigmoid (tfun.linear W_0 W_1 W_2)) W_3 W_4))
Sigmoid Match:
(tfun.sigmoid (tfun.linear W_0 W_1 W_2))
Linear Match:
(tfun.linear (tfun.sigmoid (tfun.linear W_0 W_1 W_2)) W_3 W_4)
Linear Match:
(tfun.linear W_0 W_1 W_2)

[None, None, None]
```
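
The matching and search behaviour shown above can be approximated in plain Python. In this sketch, which is not HyTorch's implementation, expressions are nested tuples, a Python class used as a pattern element matches any instance of that class (a stand-in for keywords such as `:HyExpression`), and the string `"&rest"` matches any remaining arguments. The search is breadth-first, which is one way to obtain the shallowest-first ordering described in the text. The names `pat_match` and `pat_find` mirror the Hy functions but are defined here from scratch.

```python
from collections import deque

REST = "&rest"  # marker meaning "any remaining arguments"

def pat_match(expr, pattern):
    """Return True if the nested-tuple expr matches pattern."""
    if isinstance(pattern, type):       # a class matches any instance of it
        return isinstance(expr, pattern)
    if not isinstance(pattern, tuple):  # a symbol or literal must be equal
        return expr == pattern
    if not isinstance(expr, tuple):
        return False
    if pattern and pattern[-1] == REST:  # compare only the leading elements
        pattern, expr = pattern[:-1], expr[:len(pattern) - 1]
    return len(expr) == len(pattern) and all(
        pat_match(e, p) for e, p in zip(expr, pattern))

def pat_find(expr, pattern, n=None):
    """Breadth-first search, so shallower matches are returned first."""
    matches, queue = [], deque([expr])
    while queue and (n is None or len(matches) < n):
        node = queue.popleft()
        if pat_match(node, pattern):
            matches.append(node)
        if isinstance(node, tuple):
            queue.extend(node)
    return matches

inner = ("sigmoid", ("linear", "W_0", "W_1", "W_2"))
nn_def = ("sigmoid", ("linear", inner, "W_3", "W_4"))

sigmoids = pat_find(nn_def, ("sigmoid", tuple), n=2)
linears = pat_find(nn_def, ("linear", REST), n=2)
print(sigmoids[1])  # ('sigmoid', ('linear', 'W_0', 'W_1', 'W_2'))
print(linears[1])   # ('linear', 'W_0', 'W_1', 'W_2')
```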

<a name="s44"></a>
### Hy-Expression Refactoring:
With pattern matching, we can also quickly refactor components of our network architecture without having to manually rewrite any of the original code. This is very useful when experimenting with different hyper-parameters while keeping track of previously explored designs. HyTorch implements this with the `pat-refract` macro, which takes the expression to search followed by any number of pattern/replacement pairs. Each pair has the pattern as its first element and the code that will replace it as its second element. Let's change those activations to ReLU and swap the weight reference symbols `W_2` and `W_4` for `B_2` and `B_4` in the same model definition. We can also match using the pattern-matching keywords and `&rest`.

In [15]:
```hy
; Refactor model definition
(print-lisp (macroexpand `(pat-refract ~(macroexpand nn-def) (tfun.sigmoid tfun.relu)
                                                             (W_2 B_2)
                                                             (W_4 B_4))))

; Refactor model using pattern matching forms
(print-lisp (macroexpand `(pat-refract ~(macroexpand nn-def) (tfun.sigmoid tfun.relu)
                                                             ((tfun.linear &rest) (tfun.conv2d W_0)))))
```

```
(tfun.relu (tfun.linear (tfun.relu (tfun.linear W_0 W_1 B_2)) W_3 B_4))
(tfun.relu (tfun.conv2d W_0))

[None, None]
```
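
A pattern-driven rewrite of this kind can be modelled in Python as a top-down walk over nested tuples. The sketch below is an assumption about how such a refactoring could work, not a description of `pat-refract` itself. The hypothetical `refactor` function replaces a node as soon as a rule matches it and does not rewrite the replacement further; the compact `pat_match` supports only literals and a trailing `"&rest"`.

```python
REST = "&rest"  # marker meaning "any remaining arguments"

def pat_match(expr, pattern):
    """Match nested tuples; a trailing REST accepts any remaining items."""
    if not isinstance(pattern, tuple):
        return expr == pattern
    if not isinstance(expr, tuple):
        return False
    if pattern and pattern[-1] == REST:
        pattern, expr = pattern[:-1], expr[:len(pattern) - 1]
    return len(expr) == len(pattern) and all(
        pat_match(e, p) for e, p in zip(expr, pattern))

def refactor(expr, rules):
    """Replace the first matching pattern at each node, walking top-down."""
    for pattern, replacement in rules:
        if pat_match(expr, pattern):
            return replacement
    if isinstance(expr, tuple):
        return tuple(refactor(sub, rules) for sub in expr)
    return expr

nn_def = ("sigmoid",
          ("linear",
           ("sigmoid", ("linear", "W_0", "W_1", "W_2")),
           "W_3", "W_4"))

renamed = refactor(nn_def, [("sigmoid", "relu"), ("W_2", "B_2"), ("W_4", "B_4")])
swapped = refactor(nn_def, [("sigmoid", "relu"),
                            (("linear", REST), ("conv2d", "W_0"))])
print(renamed)
print(swapped)  # ('relu', ('conv2d', 'W_0'))
```

Because the original `nn_def` tuple is never mutated, each refactored variant can be kept alongside it when comparing designs.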

---
<a name="s5"></a>
## Network Analysis and Meta-Analysis Using HyTorch

<a name="s51"></a>
### Fetching Internal Network Components:

When working on model interpretation and debugging, it's important to be able to fetch internal components of the model. While it's simple to get layer outputs by using the pattern search tool and then evaluating the sub-expression, how do we get gradients for tensors not defined as leaf tensors?

<a name="s52"></a>
### Probing Networks Using Tests:

<a name="s53"></a>
### Loading Foreign Pytorch Models:

<a name="s54"></a>
### Comparing Network Architectures: