![Deep Learning for Scientists in a hurry](./fig/Title.png)

In [1]:
%load_ext watermark

In [2]:
%watermark

Last updated: 2022-08-09T15:21:28.591012-04:00

Python implementation: CPython
Python version       : 3.8.10
IPython version      : 8.4.0

Compiler    : GCC 9.4.0
OS          : Linux
Release     : 3.10.0-1160.24.1.el7.x86_64
Machine     : x86_64
Processor   : x86_64
CPU cores   : 24
Architecture: 64bit



In [3]:
import time
start = time.time()
chapter_number = 2
import matplotlib
%matplotlib inline
%load_ext autoreload
%autoreload 2
import matplotlib.pyplot as plt

In [4]:
import numpy as np
import mxnet
from mxnet import nd
import math

In [5]:
%watermark -iv

matplotlib: 3.5.2
numpy     : 1.19.1
mxnet     : 1.9.1



# MXNet Multidimensional Arrays (NDArray)

Apache MXNet is a deep learning software framework. It is less well known than TensorFlow and PyTorch, but it offers some advantages for production systems and large-scale training. Its main features include:

  * **Multiple languages**: 
  
  MXNet supports frontend development in Python, R, Scala, Clojure, Julia, Perl, MATLAB and JavaScript; the backend is written in C++.
<br>

  * **Scalability and Cloud Support**:

    MXNet can be distributed on dynamic cloud infrastructure using a distributed parameter server. With multiple GPUs or CPUs, the framework approaches linear scaling.

    Several cloud computing services such as AWS and Azure offer support for MXNet.
<br>

  * **Flexible**:
  
    MXNet supports both imperative and symbolic programming. The framework allows developers to track, debug, save checkpoints, modify hyperparameters, and perform early stopping.
<br>

  * **Portable**:
  
    MXNet supports efficient deployment of a trained model to low-end devices for inference, such as mobile devices (using Amalgamation), Internet of Things devices (using AWS Greengrass), serverless computing (using AWS Lambda) or containers. These low-end environments may have only a weak CPU or limited memory (RAM), yet they should be able to use models that were trained in a more powerful environment (a GPU-based cluster, for example).
<br>


There are two main submodules:

  * **NDArray**: Basic manipulation of multidimensional arrays. This is a sort of replacement for NumPy.
  
  * **Gluon**: The actual neural network engine, which uses NDArray for its operations.

In this notebook we will concentrate on NDArray, the MXNet data structure for working with multidimensional arrays in CPU and GPU memory.

# The NDArray  data structure

In _MXNet_, `NDArray` is the core data structure for all mathematical
computations.  Similar to NumPy, `NDArray` represents a multidimensional, fixed-size homogeneous
array.  If you're familiar with the scientific computing Python package
[NumPy](http://www.numpy.org/), you might notice that `mxnet.ndarray` is similar
to `numpy.ndarray`.  

As with other frameworks such as TensorFlow and PyTorch, there are reasons to create an alternative data structure that is capable of manipulating arrays allocated in GPU memory.
MXNet's `NDArray` supports fast execution on a wide range of
hardware configurations, including CPU, GPU, and multi-GPU machines. 
It also scales to distributed systems in the cloud.  

Another advantage of MXNet's `NDArray` is the ability to execute code lazily.
Lazy execution has the benefit of allowing it to automatically parallelize multiple operations across the available hardware.

Like a NumPy `ndarray`, an `NDArray` is a multidimensional array of numbers with the same type.  We
could represent the coordinates of a point in 3D space, e.g. `[2, 1, 6]`, as a 1D
array with shape `(3,)`.  Similarly, we could represent a 2D array.  Below, we
present an array with length 2 along the first axis and length 3 along the
second axis.
```
[[0, 1, 2]
 [3, 4, 5]]
```
Note that here the use of "dimension" is overloaded.  When we say a 2D array, we
mean an array with 2 axes, not an array with two components.

Each NDArray supports some important attributes that you'll often want to query:

- **ndarray.shape**: The dimensions of the array. It is a tuple of integers
  indicating the length of the array along each axis. For a matrix with `n` rows
  and `m` columns, its `shape` will be `(n, m)`.
- **ndarray.dtype**: A `numpy` _type_ object describing the type of its
  elements.
- **ndarray.size**: The total number of components in the array - equal to the
  product of the components of its `shape`
- **ndarray.context**: The device on which this array is stored, e.g. `cpu()` or
  `gpu(1)`.
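
The relation between `shape` and `size` can be checked directly. The sketch below uses a NumPy array as a stand-in (an assumption made so that it runs without MXNet installed); `shape`, `dtype` and `size` carry the same meaning there, while `context` has no NumPy counterpart and is left out.

```python
import math
import numpy as np

# NumPy stand-in for an array with 2 rows and 3 columns
x = np.zeros((2, 3), dtype=np.float32)

print(x.shape)   # tuple with the length of the array along each axis
print(x.dtype)   # element type
print(x.size)    # total number of elements

# size is the product of the entries of shape
assert x.size == math.prod(x.shape)
```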


A section of this tutorial uses GPUs. If you don't have GPUs on your
machine, the code in the GPU Support section falls back to `mxnet.cpu()` and
stores the chosen device in the variable `my_device`.

## Array Creation

There are a few different ways to create an `NDArray`.

* We can create an NDArray from a regular Python list or tuple by using the `array` function:

In [6]:
# create a 1-dimensional array with a python list
a = nd.array([1,2,3])

[15:23:09] ../src/storage/storage.cc:196: Using Pooled (Naive) StorageManager for CPU


In [7]:
# create a 2-dimensional array with a nested python list
b = nd.array([[1,2,3], [2,3,4]])

In [8]:
{'a.shape':a.shape, 'b.shape':b.shape}

{'a.shape': (3,), 'b.shape': (2, 3)}

* We can also create an MXNet NDArray from a `numpy.ndarray` object:

In [9]:
c = np.arange(15).reshape(3,5)
# create a 2-dimensional array from a numpy.ndarray object
a = nd.array(c)
{'a.shape':a.shape}

{'a.shape': (3, 5)}

We can specify the element type with the option `dtype`, which accepts a numpy
type. By default, `float32` is used:

In [10]:
# float32 is used by default
a = nd.array([1,2,3])
a


[1. 2. 3.]
<NDArray 3 @cpu(0)>

In [11]:
# create an int32 array
b = nd.array([1,2,3], dtype=np.int32)
b


[1 2 3]
<NDArray 3 @cpu(0)>

In [13]:
# create a 16-bit float array
c = nd.array([1.2, 2.3], dtype=np.float16)
c


[1.2 2.3]
<NDArray 2 @cpu(0)>

In [14]:
(a.dtype, b.dtype, c.dtype)

(numpy.float32, numpy.int32, numpy.float16)

If we know the size of the desired NDArray, but not the element values, MXNet
offers several functions to create arrays with placeholder content:

In [15]:
# create a 2-dimensional array full of zeros with shape (2,3)
a = nd.zeros((2,3))
a


[[0. 0. 0.]
 [0. 0. 0.]]
<NDArray 2x3 @cpu(0)>

In [16]:
# create an array of the same shape full of ones
b = nd.ones((2,3))
b


[[1. 1. 1.]
 [1. 1. 1.]]
<NDArray 2x3 @cpu(0)>

In [17]:
# create an array of the same shape with all elements set to 7
c = nd.full((2,3), 7)
c


[[7. 7. 7.]
 [7. 7. 7.]]
<NDArray 2x3 @cpu(0)>

In [18]:
# create an array of the same shape whose initial content is
# uninitialized and depends on the state of the memory
d = nd.empty((2,3))
d


[[1.148e-41 0.000e+00 7.511e-43]
 [0.000e+00 0.000e+00 0.000e+00]]
<NDArray 2x3 @cpu(0)>

## Printing Arrays

When inspecting the contents of an `NDArray`, it's often convenient to first
extract its contents as a `numpy.ndarray` using the `asnumpy` function.  NumPy
prints arrays using the following layout:

- The last axis is printed from left to right,
- The second-to-last is printed from top to bottom,
- The rest are also printed from top to bottom, with each slice separated from
  the next by an empty line.

In [19]:
b = nd.arange(18).reshape((3,2,3))
b.asnumpy()

array([[[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]],

       [[ 6.,  7.,  8.],
        [ 9., 10., 11.]],

       [[12., 13., 14.],
        [15., 16., 17.]]], dtype=float32)

## Basic Operations

When applied to NDArrays, the standard arithmetic operators apply *elementwise*
calculations. The returned value is a new array whose content contains the
result.

In [20]:
a = nd.ones((2,3))
b = nd.ones((2,3))
# elementwise plus
c = a + b
c


[[2. 2. 2.]
 [2. 2. 2.]]
<NDArray 2x3 @cpu(0)>

In [21]:
# elementwise negation
d = - c
d


[[-2. -2. -2.]
 [-2. -2. -2.]]
<NDArray 2x3 @cpu(0)>

In [22]:
# elementwise pow and sin, and then transpose
e = nd.sin(c**2).T
e


[[-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]]
<NDArray 3x2 @cpu(0)>

In [23]:
# elementwise max
f = nd.maximum(a, c)
f.asnumpy()

array([[2., 2., 2.],
       [2., 2., 2.]], dtype=float32)

As in `NumPy`, `*` represents element-wise multiplication. For matrix-matrix
multiplication, use `dot`.
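
As a worked check of the difference, take the 2×2 array $A$ with entries 0 to 3, the same values that `nd.arange(4).reshape((2,2))` produces. The elementwise product (written $\odot$ here, a notation chosen only for this example) squares each entry, while the matrix product combines rows with columns:

$$
A = \begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix}, \qquad
A \odot A = \begin{bmatrix} 0\cdot 0 & 1\cdot 1 \\ 2\cdot 2 & 3\cdot 3 \end{bmatrix}
          = \begin{bmatrix} 0 & 1 \\ 4 & 9 \end{bmatrix}
$$

$$
\mathrm{dot}(A, A) =
\begin{bmatrix} 0\cdot 0 + 1\cdot 2 & 0\cdot 1 + 1\cdot 3 \\ 2\cdot 0 + 3\cdot 2 & 2\cdot 1 + 3\cdot 3 \end{bmatrix}
= \begin{bmatrix} 2 & 3 \\ 6 & 11 \end{bmatrix}
$$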

In [24]:
a = nd.arange(4).reshape((2,2))
b = a * a
c = nd.dot(a,a)
c


[[ 2.  3.]
 [ 6. 11.]]
<NDArray 2x2 @cpu(0)>

In [25]:
print("b: %s, \n\nc: %s" % (b.asnumpy(), c.asnumpy()))

b: [[0. 1.]
 [4. 9.]], 

c: [[ 2.  3.]
 [ 6. 11.]]


The assignment operators such as `+=` and `*=` modify arrays in place, and thus
don't allocate new memory to create a new array.

In [26]:
a = nd.ones((2,2))
b = nd.ones(a.shape)
b += a
b.asnumpy()

array([[2., 2.],
       [2., 2.]], dtype=float32)
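
One way to see the difference between an in-place update and an ordinary assignment is to keep a second name for the array and test object identity. The sketch below uses NumPy as a stand-in (an assumption, so that it runs without MXNet); the variable name `alias` is purely illustrative.

```python
import numpy as np

a = np.ones((2, 2), dtype=np.float32)
b = np.ones_like(a)

alias = b          # second name for the same array object
b += a             # in place: writes into the existing buffer
in_place_kept_object = alias is b

b = b + a          # out of place: builds a new array and rebinds the name b
rebinding_made_new_object = alias is not b

print(in_place_kept_object, rebinding_made_new_object)
print(alias)       # holds the result of the in-place step only
```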

## Indexing and Slicing

The slice operator `[]` applies on axis 0.

In [27]:
a = nd.array(np.arange(6).reshape(3,2))
a[1:2] = 1
a[:].asnumpy()

array([[0., 1.],
       [1., 1.],
       [4., 5.]], dtype=float32)

We can also slice along a particular axis with the function `slice_axis`:

In [28]:
d = nd.slice_axis(a, axis=1, begin=1, end=2)
d.asnumpy()

array([[1.],
       [1.],
       [5.]], dtype=float32)
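
To make the meaning of `axis`, `begin` and `end` concrete, here is a minimal NumPy imitation of slicing a single axis. The helper `slice_axis_np` is hypothetical and written only for this illustration; it is not part of MXNet or NumPy.

```python
import numpy as np

def slice_axis_np(arr, axis, begin, end):
    """NumPy imitation of slicing one axis (hypothetical helper)."""
    index = [slice(None)] * arr.ndim   # keep every axis whole...
    index[axis] = slice(begin, end)    # ...except the chosen one
    return arr[tuple(index)]

a = np.arange(6, dtype=np.float32).reshape(3, 2)
a[1:2] = 1                             # same assignment as in the cell above
d = slice_axis_np(a, axis=1, begin=1, end=2)
print(d.shape)
print(d)
```

The sliced axis keeps its place in the result, so a slice of width one still yields a 2D array with a single column.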

## Shape Manipulation

Using `reshape`, we can change any array's shape as long as the size remains
unchanged.

In [29]:
a = nd.array(np.arange(24))
b = a.reshape((2,3,4))
b.asnumpy()

array([[[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]],

       [[12., 13., 14., 15.],
        [16., 17., 18., 19.],
        [20., 21., 22., 23.]]], dtype=float32)

The `concat` function joins multiple arrays along one axis, selected with the
`dim` argument (axis 1 by default, as in the example below). Their
shapes must be the same along the other axes.

In [30]:
a = nd.ones((2,3))
b = nd.ones((2,3))*2
c = nd.concat(a,b)
c.asnumpy()

array([[1., 1., 1., 2., 2., 2.],
       [1., 1., 1., 2., 2., 2.]], dtype=float32)
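
The result above has shape `(2, 6)`, so the two arrays were joined along axis 1. The following NumPy stand-in (an assumption, so that the sketch runs without MXNet) contrasts joining along axis 1 with joining along axis 0 for the same inputs.

```python
import numpy as np

a = np.ones((2, 3), dtype=np.float32)
b = np.ones((2, 3), dtype=np.float32) * 2

side_by_side = np.concatenate([a, b], axis=1)  # columns are appended
stacked = np.concatenate([a, b], axis=0)       # rows are appended

print(side_by_side.shape)
print(stacked.shape)
```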

## Reduce

Some functions, like `sum` and `mean`, reduce an array to a single value, which MXNet returns as a one-element NDArray.

In [31]:
a = nd.ones((2,3))
b = nd.sum(a)
b.asnumpy()

array([6.], dtype=float32)

We can also reduce an array along a particular axis:

In [32]:
c = nd.sum_axis(a, axis=1)
c.asnumpy()

array([3., 3.], dtype=float32)
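
What "along an axis" means is easiest to see on a small array with distinct values. The sketch below uses plain Python lists rather than MXNet, purely to spell out which elements are added together.

```python
rows = [[1, 2, 3],
        [4, 5, 6]]          # shape (2, 3)

# reduce along axis 1: collapse the columns, one value per row
sum_axis1 = [sum(row) for row in rows]

# reduce along axis 0: collapse the rows, one value per column
sum_axis0 = [sum(col) for col in zip(*rows)]

# reduce over every axis: a single value
total = sum(sum_axis1)

print(sum_axis1, sum_axis0, total)
```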

## Broadcast

We can also broadcast an array. Broadcasting operations duplicate an array's
values along an axis with length 1. The following code broadcasts along axis 1:

In [33]:
a = nd.array(np.arange(6).reshape(6,1))
b = a.broadcast_to((6,4))  # repeat the single column 4 times
b.asnumpy()

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.]], dtype=float32)

It's possible to simultaneously broadcast along multiple axes. In the following example, we broadcast along axes 1 and 2:

In [34]:
c = a.reshape((2,1,1,3))
d = c.broadcast_to((2,2,2,3))
d.asnumpy()

array([[[[0., 1., 2.],
         [0., 1., 2.]],

        [[0., 1., 2.],
         [0., 1., 2.]]],


       [[[3., 4., 5.],
         [3., 4., 5.]],

        [[3., 4., 5.],
         [3., 4., 5.]]]], dtype=float32)
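
The rule used in both examples can be written down as a small shape check: an axis is either left alone (the lengths already match) or duplicated (the source length is 1). The helper `broadcast_axes` below is hypothetical, written in plain Python for illustration only, and assumes that source and target have the same number of axes, as they do in the cells above.

```python
def broadcast_axes(shape, target):
    """Return the axes along which `shape` is duplicated to reach `target`.

    Hypothetical helper for illustration; not part of MXNet.
    """
    if len(shape) != len(target):
        raise ValueError("shapes must have the same number of axes")
    axes = []
    for axis, (src, dst) in enumerate(zip(shape, target)):
        if src == dst:
            continue            # lengths already match on this axis
        if src != 1:
            raise ValueError(f"axis {axis}: cannot broadcast length {src} to {dst}")
        axes.append(axis)       # the length-1 axis is repeated dst times
    return axes

print(broadcast_axes((6, 1), (6, 4)))
print(broadcast_axes((2, 1, 1, 3), (2, 2, 2, 3)))
```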

Broadcasting can be applied automatically when executing some operations,
e.g. `*` and `+` on arrays of different shapes.

In [35]:
a = nd.ones((3,2))
b = nd.ones((1,2))
c = a + b
c.asnumpy()

array([[2., 2.],
       [2., 2.],
       [2., 2.]], dtype=float32)

## Copies

When assigning an NDArray to another Python variable, we copy a reference to the
*same* NDArray. However, we often need to make a copy of the data, so that we
can manipulate the new array without overwriting the original values.

In [36]:
a = nd.ones((2,2))
b = a
b is a # will be True

True

The `copy` method makes a deep copy of the array and its data:

In [37]:
b = a.copy()
b is a  # will be False

False

The above code allocates a new NDArray and then assigns it to `b`. When we do not
want to allocate additional memory, we can use the `copyto` method or the slice
operator `[]` instead.

In [38]:
b = nd.ones(a.shape)
c = b
c[:] = a
d = b
a.copyto(d)
(c is b, d is b)  # Both will be True

(True, True)
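
The practical consequence of reference versus copy shows up as soon as the original is modified. The sketch below uses NumPy as a stand-in (an assumption, so that it runs without MXNet); the names `alias`, `clone` and `target` are illustrative.

```python
import numpy as np

a = np.ones((2, 2), dtype=np.float32)
alias = a            # same object, no data copied
clone = a.copy()     # new object with its own data

a[0, 0] = 5.0        # modify the original in place

print(alias[0, 0])   # follows the change
print(clone[0, 0])   # keeps the old value

target = np.zeros_like(a)
same_target = target
target[:] = a        # overwrite the contents without creating a new object
print(same_target is target)
```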

## Advanced Topics

MXNet's NDArray offers some advanced features that differentiate it from the
offerings you'll find in most other libraries.

### GPU Support

By default, NDArray operators are executed on CPU. But with MXNet, it's easy to
switch to another computation resource, such as GPU, when available. Each
NDArray's device information is stored in `ndarray.context`. When MXNet is
compiled with flag `USE_CUDA=1` and the machine has at least one NVIDIA GPU, we
can cause all computations to run on GPU 0 by using context `mxnet.gpu(0)`, or
simply `mxnet.gpu()`. When we have access to two or more GPUs, the 2nd GPU is
represented by `mxnet.gpu(1)`, etc.

**Note** To run the following section without a GPU, no change is needed: the code below falls back to `mxnet.cpu()` and stores the selected device in `my_device`.

In [39]:
def gpu_device(gpu_number=0):
    try:
        _ = mxnet.nd.array([1, 2, 3], ctx=mxnet.gpu(gpu_number))
    except mxnet.MXNetError:
        return None
    return mxnet.gpu(gpu_number)

In [40]:
if gpu_device() is None:
    my_device=mxnet.cpu()
else:
    my_device=mxnet.gpu()
    
my_device

[15:27:14] ../src/storage/storage.cc:196: Using Pooled (Naive) StorageManager for GPU


gpu(0)

In [41]:
def f():
    a = nd.ones((100,100))
    b = nd.ones((100,100))
    c = a + b
    print(c)
# by default, mxnet.cpu() is used
f()


[[2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 ...
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]]
<NDArray 100x100 @cpu(0)>


In [42]:
# change the default context to my_device (the first GPU, if available)
with mxnet.Context(my_device):
    f()


[[2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 ...
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]]
<NDArray 100x100 @gpu(0)>


We can also explicitly specify the context when creating an array:

In [43]:
a = nd.ones((100, 100), my_device)
a


[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]]
<NDArray 100x100 @gpu(0)>

Currently, MXNet requires two arrays to sit on the same device for
computation. There are several methods for copying data between devices.

In [44]:
a = nd.ones((100,100), mxnet.cpu())
b = nd.ones((100,100), my_device)
c = nd.ones((100,100), my_device)
a.copyto(c)  # copy from CPU to GPU
d = b + c
e = b.as_in_context(c.context) + c  # same as above
{'d':d, 'e':e}

{'d': 
 [[2. 2. 2. ... 2. 2. 2.]
  [2. 2. 2. ... 2. 2. 2.]
  [2. 2. 2. ... 2. 2. 2.]
  ...
  [2. 2. 2. ... 2. 2. 2.]
  [2. 2. 2. ... 2. 2. 2.]
  [2. 2. 2. ... 2. 2. 2.]]
 <NDArray 100x100 @gpu(0)>,
 'e': 
 [[2. 2. 2. ... 2. 2. 2.]
  [2. 2. 2. ... 2. 2. 2.]
  [2. 2. 2. ... 2. 2. 2.]
  ...
  [2. 2. 2. ... 2. 2. 2.]
  [2. 2. 2. ... 2. 2. 2.]
  [2. 2. 2. ... 2. 2. 2.]]
 <NDArray 100x100 @gpu(0)>}

### Serialize From/To (Distributed) Filesystems

MXNet offers two simple ways to save (load) data to (from) disk. The first way
is to use `pickle`, as you might with any other Python objects. `NDArray` is
pickle-compatible.

In [45]:
import pickle as pkl
a = nd.ones((2, 3))
# serialize to bytes, then write the bytes to disk
data = pkl.dumps(a)
with open('tmp.pickle', 'wb') as fh:
    pkl.dump(data, fh)
# read the bytes back from disk, then deserialize
with open('tmp.pickle', 'rb') as fh:
    data = pkl.load(fh)
b = pkl.loads(data)
b.asnumpy()

array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float32)

The second way is to directly dump to disk in binary format by using the `save`
and `load` methods. We can save/load a single NDArray, or a list of NDArrays:

In [46]:
a = nd.ones((2,3))
b = nd.ones((5,6))
nd.save("temp.ndarray", [a,b])
c = nd.load("temp.ndarray")
c

[
 [[1. 1. 1.]
  [1. 1. 1.]]
 <NDArray 2x3 @cpu(0)>,
 
 [[1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1.]]
 <NDArray 5x6 @cpu(0)>]

It's also possible to save or load a dict of NDArrays in this fashion:

In [47]:
d = {'a':a, 'b':b}
nd.save("./output/temp.ndarray", d)
c = nd.load("./output/temp.ndarray")
c

{'a': 
 [[1. 1. 1.]
  [1. 1. 1.]]
 <NDArray 2x3 @cpu(0)>,
 'b': 
 [[1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1.]]
 <NDArray 5x6 @cpu(0)>}

The `load` and `save` methods are preferable to pickle in two respects:

1. When using these methods, you can save data from within the Python interface
   and then use it later from another language's binding. For example, if we save
   the data in Python:

In [48]:
a = nd.ones((2, 3))
nd.save("./output/temp.ndarray", [a,])

we can later load it from R:
```
a <- mx.nd.load("temp.ndarray")
as.array(a[[1]])
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
```

2. When a distributed filesystem such as Amazon S3 or Hadoop HDFS is set up, we
   can directly save to and load from it.

```
nd.save('s3://mybucket/mydata.ndarray', [a,])  # if compiled with USE_S3=1
nd.save('hdfs:///users/myname/mydata.bin', [a,])  # if compiled with USE_HDFS=1
```

### Lazy Evaluation and Automatic Parallelization

MXNet uses lazy evaluation to achieve superior performance.  When we run `a=b+1`
in Python, the Python thread just pushes this operation into the backend engine
and then returns.  There are two benefits to this approach:

1. The main Python thread can continue to execute other computations once the
   previous one is pushed. This is useful for frontend languages with heavy
   overheads.
2. It is easier for the backend engine to explore further optimization, such as
   auto parallelization.

The backend engine can resolve data dependencies and schedule the computations
correctly. It is transparent to frontend users. We can explicitly call the
method `wait_to_read` on the result array to wait until the computation
finishes. Operations that copy data from an array to other packages, such as
`asnumpy`, will implicitly call `wait_to_read`.

In [49]:
import time

def do(x, n):
    """push computation into the backend engine"""
    return [nd.dot(x,x) for i in range(n)]
def wait(x):
    """wait until all results are available"""
    for y in x:
        y.wait_to_read()

tic = time.time()
a = nd.ones((1000,1000))
b = do(a, 50)
print('time until all computations are pushed into the backend engine:\n %f sec' % (time.time() - tic))
wait(b)
print('time until all computations are finished:\n %f sec' % (time.time() - tic))

time until all computations are pushed into the backend engine:
 0.001362 sec
time until all computations are finished:
 0.255941 sec
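
The push-then-wait pattern is not specific to MXNet. As a loose analogy only, the sketch below imitates it with a thread pool from the Python standard library: `submit` returns immediately, and `result()` plays the role of `wait_to_read`. It does not describe how MXNet's backend engine is implemented, and `slow_double` is a made-up stand-in for an expensive operator.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_double(x):
    """Stand-in for an expensive backend operation."""
    time.sleep(0.05)
    return 2 * x

with ThreadPoolExecutor(max_workers=4) as engine:
    tic = time.time()
    # "push": submit returns a Future right away, before the work is done
    futures = [engine.submit(slow_double, i) for i in range(4)]
    pushed = time.time() - tic
    # "wait": result() blocks until the value is available
    results = [f.result() for f in futures]
    finished = time.time() - tic

print(results)
print(f"pushed after {pushed:.4f} s, finished after {finished:.4f} s")
```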


Besides analyzing data read and write dependencies, the backend engine is able
to schedule computations with no dependency in parallel. For example, in the
following code:

In [None]:
a = nd.ones((2,3))
b = a + 1
c = a + 2
d = b * c

the second and third lines can be executed in parallel. The following example
first runs on CPU and then on GPU:

In [None]:
n = 10
a = nd.ones((1000,1000))
b = nd.ones((3000,3000), my_device)
tic = time.time()
c = do(a, n)
wait(c)
print('Time to finish the CPU workload: %f sec' % (time.time() - tic))
d = do(b, n)
wait(d)
print('Time to finish both CPU/GPU workloads: %f sec' % (time.time() - tic))

Now we issue all workloads at the same time. The backend engine will try to
run the CPU and GPU computations in parallel.

In [None]:
tic = time.time()
c = do(a, n)
d = do(b, n)
wait(c)
wait(d)
print('Both finished in: %f sec' % (time.time() - tic))

---

# References

There are many books about Deep Learning and many more on Machine Learning. 
This list is by no means exhaustive. I am listing the books from which I took inspiration, as well as materials where I found better ways to present topics. Often I am amazed by how people can create approachable materials for seemingly dry subjects.

The books are ordered from the popular and practical to the more rigorous and mathematical. The slides, blogs, and videos are those I have found on the internet or that were suggested by others.

### Selection of Books on Deep Learning

<br>
<div style="clear: both; display: table;">
  <div style="border: none; float: left; width: 200; padding: 5px">
  <img alt="Deep Learning - Kelleher" 
       src="./fig/books/Deep Learning - Kelleher.jpg" 
       height="100" width="100"  />
  </div>
  <div style="border: none; float: left; width: 800; padding: 5px">
      Deep Learning<br>
      John D. Kelleher<br>
      2019<br>
  </div>
</div>

<br>
<div style="clear: both; display: table;">
  <div style="border: none; float: left; width: 200; padding: 5px">
  <img alt="Introduction to Deep Learning - Charniak" 
       src="./fig/books/Introduction to Deep Learning - Charniak.jpg" 
       height="100" width="100"  />
  </div>
  <div style="border: none; float: left; width: 800; padding: 5px">
      Introduction to Deep Learning<br>
      Eugene Charniak<br>
      2018<br>
  </div>
</div>

<br>
<div style="clear: both; display: table;">
  <div style="border: none; float: left; width: 200; padding: 5px">
  <img alt="Introduction to Deep Learning - Skansi" 
       src="./fig/books/Introduction to Deep Learning - Skansi.jpg" 
       height="100" width="100"  />
  </div>
  <div style="border: none; float: left; width: 800; padding: 5px">
      Introduction to Deep Learning<br>
      Sandro Skansi<br>
      2018<br>
  </div>
</div>

<br>
<div style="clear: both; display: table;">
  <div style="border: none; float: left; width: 200; padding: 5px">
  <img alt="Deep Learning with PyTorch - Subramanian" 
       src="./fig/books/Deep Learning with PyTorch - Subramanian.jpg" 
       height="100" width="100"  />
  </div>
  <div style="border: none; float: left; width: 800; padding: 5px">
      Deep Learning with PyTorch<br>
      Vishnu Subramanian<br>
      2018<br>
  </div>
</div>

<br>
<div style="clear: both; display: table;">
  <div style="border: none; float: left; width: 200; padding: 5px">
  <img alt="Deep Learning with PyTorch - Stevens" 
       src="./fig/books/Deep Learning with PyTorch - Stevens.png" 
       height="100" width="100"  />
  </div>
  <div style="border: none; float: left; width: 800; padding: 5px">
      Deep Learning with PyTorch<br>
      Eli Stevens, Luca Antiga and Thomas Viehmann<br>
      2020<br>
  </div>
</div>

<br>
<div style="clear: both; display: table;">
  <div style="border: none; float: left; width: 200; padding: 5px">
  <img alt="Deep Learning with Python - Chollet" 
       src="./fig/books/Deep Learning with Python - Chollet.jpg" 
       height="100" width="100" />
  </div>
  <div style="border: none; float: left; width: 800; padding: 5px">
      Deep Learning with Python (Second Edition)<br>
      François Chollet<br>
      2021<br>
  </div>
</div>

<br>
<div style="clear: both; display: table;">
  <div style="border: none; float: left; width: 200; padding: 5px">
  <img alt="Deep Learning - Patterson" 
       src="./fig/books/Deep Learning - Patterson.jpeg"
       height="100" width="100" />
  </div>
  <div style="border: none; float: left; width: 800; padding: 5px">
      Deep Learning, a practitioner's approach<br>
      Josh Patterson and Adam Gibson<br>
      2017<br>
  </div>
</div>

<br>
<div style="clear: both; display: table;">
  <div style="border: none; float: left; width: 200; padding: 5px">
  <img alt="Deep Learning - Goodfellow" 
       src="./fig/books/Deep Learning - Goodfellow.jpg" 
       height="100" width="100"  />
  </div>
  <div style="border: none; float: left; width: 800; padding: 5px">
      Deep Learning<br>
      Ian Goodfellow, Yoshua Bengio, and Aaron Courville<br>
      2016<br>
  </div>
</div>

### Interactive Books

  * [Dive into Deep Learning](https://d2l.ai/index.html)<br>
    Interactive deep learning book with code, math, and discussions<br> 
    Implemented with PyTorch, NumPy/MXNet, and TensorFlow<br>
    Adopted at 300 universities from 55 countries


### Slides

  * John Urbanic's ["Deep Learning in one Afternoon"](https://www.psc.edu/wp-content/uploads/2022/04/Deep-Learning.pdf)<br>
An excellent fast, condensed introduction to Deep Learning.<br>
John is a Parallel Computing Scientist at Pittsburgh Supercomputing Center

  * [Christopher Olah's Blog](http://colah.github.io) is very good. For example about [Back Propagation](http://colah.github.io/posts/2015-08-Backprop)

  * Adam W. Harley on his CMU page offers [An Interactive Node-Link Visualization of Convolutional Neural Networks](https://www.cs.cmu.edu/~aharley/vis/)



### Jupyter Notebooks

 * [Yale Digital Humanities Lab](https://github.com/YaleDHLab/lab-workshops)
 
 * Aurélien Géron's Hands-On Machine Learning with Scikit-Learn 
   [First Edition](https://github.com/ageron/handson-ml) and
   [Second Edition](https://github.com/ageron/handson-ml2)
   
 * [A progressive collection of notebooks of the Machine Learning course by the University of Turin](https://github.com/rugantio/MachineLearningCourse)
   
 * [A curated set of jupyter notebooks about many topics](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)
   
### Videos

 * [Caltech's "Learning from Data" by Professor Yaser Abu-Mostafa](https://work.caltech.edu/telecourse.html)
 
 * [3Blue1Brown Youtube Channel](https://www.youtube.com/watch?v=Ilg3gGewQ5U)
 
 ---

# Back of the Book

In [None]:
n = chapter_number
t = np.linspace(0, (2*(n-1)+1)*np.pi/2, 1000)
x = t*np.cos(t)**3
y = 9*t*np.sqrt(np.abs(np.cos(t))) + t*np.sin(0.3*t)*np.cos(2*t)
plt.plot(x, y, c="green")
plt.axis('off');

In [None]:
end = time.time()
print(f'Chapter {chapter_number} took {int(end - start):d} seconds')