___

# Machine Learning in Geosciences
Department of Applied Geoinformatics and Cartography, Charles University

Lukas Brodsky lukas.brodsky@natur.cuni.cz


### PyTorch installation

`pip install torch`
`pip install torchvision`

In [None]:
# Successfully installed torch-1.11.0 torchvision-0.12.0

The `torch.utils.data` module of PyTorch provides two basic classes, `Dataset` and `DataLoader`, which help with loading, transforming, and batching a dataset. The companion package torchvision (installed separately above) builds on them and adds ready-made datasets and image transformations.
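
As a minimal sketch of how `Dataset` and `DataLoader` fit together, the example below wraps two small tensors in a `TensorDataset` and iterates over them in batches. The toy data, the variable names, and the batch size are assumptions made for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 6 samples with 2 features each, plus one label per sample
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
labels = torch.tensor([0, 1, 0, 1, 0, 1])

# TensorDataset is a ready-made Dataset that pairs tensors along their first dimension
dataset = TensorDataset(features, labels)

# DataLoader iterates over the dataset in batches; shuffle=False keeps the order fixed
loader = DataLoader(dataset, batch_size=4, shuffle=False)

batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
print(batch_shapes)
```

With 6 samples and a batch size of 4, the loader yields one full batch and one smaller final batch.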

# PyTorch tensor

Tensors are the building blocks for representing data in PyTorch; they are its fundamental data structure. The term `tensor` comes bundled with the notion of spaces, but in the context of deep learning, tensors are simply the generalization of vectors and matrices to an arbitrary number of dimensions.

The torch package contains not only the data structures for **multi-dimensional arrays** but also defines mathematical operations over these tensors. Additionally, it provides utilities for efficient serialization of tensors and arbitrary types.
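
To make the idea of "an arbitrary number of dimensions" concrete, here is a small sketch that builds tensors with zero to three dimensions and inspects them. The variable names are illustrative, and the (bands, rows, columns) interpretation of the 3D tensor is only an assumed example of a raster layout.

```python
import torch

scalar = torch.tensor(3.0)               # 0 dimensions: a single number
vector = torch.tensor([1.0, 2.0, 3.0])   # 1 dimension
matrix = torch.ones(2, 3)                # 2 dimensions
raster = torch.zeros(4, 100, 100)        # 3 dimensions, e.g. (bands, rows, columns); hypothetical

tensors = (scalar, vector, matrix, raster)
dims = [t.dim() for t in tensors]        # number of dimensions of each tensor
shapes = [tuple(t.shape) for t in tensors]

print(dims)
print(shapes)
```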

### PyTorch tensor vs. NumPy array

A PyTorch tensor is similar to a NumPy array (the *lingua franca* of data science). However, PyTorch tensors can also live on GPUs to accelerate numeric computations. Tensors created in PyTorch can be used to fit a network, and the user can even implement the forward and backward passes through the network manually.

Note: PyTorch vs. TensorFlow

The major difference between PyTorch and TensorFlow lies in their computational graphs. TensorFlow (in its original 1.x design) uses static computational graphs, whereas PyTorch uses dynamic ones. This makes a difference when debugging the graphs.

### Dynamic Graphs
Static graphs are nice because the user can optimize the graph up front. If the same graph is re-used over and over, this potentially costly up-front optimization is amortized across all runs.

Dynamic graphs, in contrast, are built on the fly during each forward pass, so the graph can change from one iteration to the next and can be inspected with ordinary Python debugging tools.
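
The following sketch illustrates what a dynamic graph means in practice: the number of operations recorded by autograd depends on a Python `while` loop evaluated at run time. The function `forward` and the threshold of 10 are hypothetical and chosen only for illustration.

```python
import torch

def forward(x):
    # The graph is recorded while this Python code runs,
    # so ordinary control flow decides how many operations it contains.
    y = x
    steps = 0
    while y.norm() < 10:
        y = y * 2
        steps += 1
    return y.sum(), steps

x = torch.tensor([1.0, 2.0], requires_grad=True)
out, steps = forward(x)
out.backward()  # backpropagate through the graph that was just built

print(steps)    # number of doublings performed for this particular input
print(x.grad)   # gradient of out with respect to x
```

For this input the loop doubles `y` three times, so `out` equals `8 * x.sum()` and every gradient entry is 8; a different input could produce a graph of a different length.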

### The tensor API

A few pointers on where to find things: the PyTorch documentation (https://pytorch.org/docs/stable/index.html) is the main reference for the tensor API.

## Tensors: Multidimensional arrays

In [None]:
import torch 
import numpy as np

### PyTorch Tensors constructors 
Comparing NumPy and PyTorch

In [None]:
a = np.ones(3)
print(a)

In [None]:
# construct a one-dimensional tensor of size 3 filled with ones
b = torch.ones(3)
print(b)

In [None]:
# index a single element
a[1]

In [None]:
b[1]

In [None]:
# casting to float
float(b[1])

In [None]:
# overwrite a value in tensor 
b[2] = 2.0
b

#### Constructors from other containers

In [None]:
# passing a Python list to the constructor, the same effect!
points = torch.tensor([4.0, 1.0, 5.0, 3.0, 2.0, 1.0])
points

In [None]:
# 2D, (list of lists) passed to the constructor
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

In [None]:
# dimensionality 
points.shape

### Indexing tensors 

In [None]:
# Python list
some_list = list(range(5))

In [None]:
some_list[:]

In [None]:
some_list[1:3]

In [None]:
some_list[-1]

Tensors use the same notation. We can use range indexing for each of the tensor's dimensions.

In [None]:
# use two indices to access an element of a 2D tensor
points[0, 1]

In [None]:
# the first row
points[0]
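
As a short sketch of range indexing on each dimension, the example below applies list-style slices to the same `points` tensor. The variable names are illustrative.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

rows_after_first = points[1:]    # all rows after the first, all columns
first_column = points[:, 0]      # all rows, first column only
last_row = points[-1]            # negative indices count from the end
with_extra_dim = points[None]    # adds a leading dimension of size 1

print(rows_after_first)
print(first_column)
print(with_extra_dim.shape)
```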

### Tensors storage

Values in tensors are allocated in **contiguous chunks of memory** managed by `torch.Storage` instances. A storage is a one-dimensional array of numerical data: that is, a contiguous block of memory containing numbers of a given type, such as float (32 bits representing a floating-point number). A PyTorch `Tensor` instance is a view of such a `Storage` instance that is capable of indexing into that storage using an offset. Multiple tensors can index the same storage even if they index into the data differently.

Each element of the default float type is a 32-bit (4-byte) float. Storing a 1D tensor of 1,000,000 float numbers therefore requires 4,000,000 contiguous bytes, plus a small overhead for the metadata.
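
The memory estimate can be checked with a quick sketch that multiplies the number of elements by the size of one element. The tensor `values` is a hypothetical example.

```python
import torch

values = torch.zeros(1_000_000)               # default dtype is 32-bit float

bytes_per_element = values.element_size()     # size of one element in bytes
total_bytes = values.nelement() * bytes_per_element

# The same number of elements stored as 64-bit floats needs twice the memory
total_bytes_double = values.double().nelement() * values.double().element_size()

print(bytes_per_element, total_bytes, total_bytes_double)
```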

In [None]:
# indexing into the storage 
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
# use method storage() to see the content
points.storage()

In [None]:
points

In [None]:
# even though the tensor is 3x2, the storage is a contiguous array of size 6

In [None]:
# manual indexing into the storage
points_storage = points.storage()
points_storage[0]

In [None]:
# we cannot index a storage of a 2D tensor using two indices;
# the layout of a storage is always one-dimensional, so this raises an error
try:
    points.storage()[1, 1]
except Exception as err:
    print(type(err).__name__, err)

In [None]:
points.storage()[1]

In [None]:
# changing the values in a storage leads to change in the tensor 
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_storage = points.storage()
points_storage[0] = 2.0
points

In order to index into a storage, tensors rely on a few pieces of information that, together with their storage, unequivocally define them: **size**, **offset**, and **stride**. The size (or shape in the NumPy world) is a tuple indicating how many elements across each dimension the tensor represents. The storage offset is the index in the storage corresponding to the first element in the tensor. The stride is the number of elements in the storage that need to be skipped over to obtain the next element along each dimension.

In [None]:
# get second point in the tensor 
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
second_point = points[1]
print(second_point)

In [None]:
# offset
second_point.storage_offset()

In [None]:
# size
second_point.size()

In [None]:
# it is the same information contained in the shape
second_point.shape

In [None]:
# stride: number of elements in the storage that have to be skipped 
# when the index is increased by 1 in each dimension
points.stride()

Accessing an element i, j in a 2D tensor results in accessing the
`storage_offset + stride[0] * i + stride[1] * j` element in the storage.
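
The formula can be verified with a small sketch. The helper `storage_index` is a hypothetical function written for this illustration, and `flat` is used as a stand-in for the one-dimensional storage (for a contiguous tensor, `reshape(-1)` is a 1D view of the same memory).

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
flat = points.reshape(-1)  # 1D view of the same memory, in storage order

def storage_index(t, i, j):
    # storage_offset + stride[0] * i + stride[1] * j for a 2D tensor
    stride = t.stride()
    return t.storage_offset() + stride[0] * i + stride[1] * j

# Element [1, 0] of points: 0 + 2 * 1 + 1 * 0 = 2
idx = storage_index(points, 1, 0)

# The transposed tensor has strides (1, 2); its element [0, 1] is the same number
points_t = points.t()
idx_t = storage_index(points_t, 0, 1)

print(idx, flat[idx].item(), points[1, 0].item())
print(idx_t, flat[idx_t].item(), points_t[0, 1].item())
```

Both lookups land on position 2 of the underlying memory, which holds the value 5.0.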

This indirection between `Tensor` and `Storage` makes some operations inexpensive, like transposing a tensor or extracting a subtensor, because they do not lead to memory re-allocation! 

In [None]:
# see what happens to the size and stride when extracting a subtensor
second_point = points[1]
second_point.size()

In [None]:
second_point.storage_offset()

In [None]:
second_point.stride()

In [None]:
# one less dimension, still indexing the same storage

In [None]:
# changing value (be careful)
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
second_point = points[1]
second_point[0] = 10.0
points

In [None]:
# we can also clone the subtensor into a new tensor (another memory chunk)
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
second_point = points[1].clone()
second_point[0] = 10.0
points

In [None]:
second_point

### Transposing without copying

In [None]:
# Let's take the points tensor, which has individual points in the rows and X and Y coordinates in the columns,
# and turn it around so that the points are in the columns.

In [None]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

In [None]:
# use t function (transpose)
points_t = points.t()
points_t

In [None]:
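# check that the 'two' tensors share one storage by comparing memory addresses
# (id() of a temporary storage object is not a reliable test)
points.storage().data_ptr() == points_t.storage().data_ptr()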

In [None]:
# and the strides?
points.stride()

In [None]:
points_t.stride()

### Transposing in higher dimension

In [None]:
some_t = torch.ones(3, 4, 5)
some_t.shape

In [None]:
transpose_t = some_t.transpose(0, 2)
some_t.shape

In [None]:
transpose_t.shape

In [None]:
some_t

In [None]:
some_t.stride()

In [None]:
transpose_t.stride()

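A tensor whose values are laid out in the storage starting from the rightmost dimension onward (that is, moving along rows for a 2D tensor) is defined as `contiguous`. Contiguous tensors can be visited in order without jumping around in the storage, and this data locality improves performance because of the way memory access works on modern CPUs.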
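
One way to see what "starting from the rightmost dimension" implies is to compute the strides a contiguous tensor of a given shape would have and compare them with the actual strides. The helper `contiguous_strides` is a hypothetical function written for this sketch; it ignores the special case of size-1 dimensions.

```python
import torch

def contiguous_strides(shape):
    # Rightmost dimension has stride 1; each dimension to the left
    # steps over all elements of the dimensions to its right.
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))

some_t = torch.ones(3, 4, 5)
transpose_t = some_t.transpose(0, 2)

matches_original = some_t.stride() == contiguous_strides(some_t.shape)
matches_transposed = transpose_t.stride() == contiguous_strides(transpose_t.shape)

print(some_t.stride(), matches_original, some_t.is_contiguous())
print(transpose_t.stride(), matches_transposed, transpose_t.is_contiguous())
```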

In [None]:
points.is_contiguous()

In [None]:
points_t.is_contiguous()

In [None]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()
points_t

In [None]:
points_t.storage()

In [None]:
points_t.stride()

We can use the `contiguous` method to obtain a new contiguous tensor.

In [None]:
points_t_cont = points_t.contiguous()
points_t_cont

In [None]:
points_t_cont.is_contiguous()

In [None]:
points_t_cont.stride()

In [None]:
points_t_cont.storage()

In [None]:
# Notice the storage has been re-shuffled (to be laid out row-by-row in the new storage).  

### Specifying the numeric type with dtype 

The `dtype` argument to tensor constructors specifies the numerical data type (the "d" stands for data), similar to NumPy.

* `torch.float` .. 32-bit floating point number

* `torch.double` .. 64-bit

* `torch.float16` or `torch.half` .. 16-bit

* `torch.int8` .. signed 8-bit integers

* `torch.uint8` .. unsigned 8-bit 

* `torch.int16` or `torch.short` .. signed 16-bit 

* `torch.int32` or `torch.int` .. signed 32-bit 

* `torch.int64` or `torch.long` .. signed 64-bit 

* `torch.bool` .. Boolean

Computations happening in neural networks are typically executed with 32-bit floating-point precision (`torch.float` or `torch.float32`).

Tensors can be used as indexes into other tensors; PyTorch expects indexing tensors to have a 64-bit integer data type (`torch.int64`).

Predicates on tensors, such as `points > 1.0`, produce `bool` tensors (`torch.bool`).
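
The sketch below shows both cases on the `points` tensor used earlier: a predicate that yields a `torch.bool` mask and an integer tensor used as an index. The names `mask` and `row_index` are illustrative.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

# A predicate produces a bool tensor of the same shape
mask = points > 1.0
values_above_one = points[mask]        # 1D tensor of the selected elements

# Integer tensors created from Python ints are int64 by default
row_index = torch.tensor([2, 0])
selected_rows = points[row_index]      # rows 2 and 0, in that order

print(mask.dtype, values_above_one)
print(row_index.dtype, selected_rows)
```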

### Managing a tensor's dtype attribute

In [None]:
# 64-bit floating point
double_points = torch.ones(10, 2, dtype=torch.double)
# 16-bit integer
short_points = torch.tensor([[1, 2], [3, 4]], dtype=torch.short)

In [None]:
double_points.dtype

In [None]:
short_points.dtype

In [None]:
# casting 
double_points = torch.zeros(10, 2).double()
short_points = torch.ones(10, 2).short()

In [None]:
double_points.dtype

In [None]:
short_points.dtype

In [None]:
# or the more convenient and readable method .to()
double_points = torch.zeros(10, 2).to(torch.double)
short_points = torch.ones(10, 2).to(dtype=torch.short)

In [None]:
# mixing input data types in operations converts to the 'larger' type
points_64 = torch.rand(5, dtype=torch.double)
points_short = points_64.to(torch.short)
points_64 * points_short  # works from PyTorch 1.3 onwards
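
To see which type is the "larger" one, the result dtype can be inspected directly. This is a minimal sketch with illustrative variable names; it assumes PyTorch 1.3 or later, where mixed-type operations are supported.

```python
import torch

points_64 = torch.rand(5, dtype=torch.double)
points_short = torch.ones(5, dtype=torch.short)

# float64 combined with int16 gives float64
product = points_64 * points_short

# an integer tensor combined with a float tensor gives the float type
int_plus_float = torch.ones(3, dtype=torch.int32) + torch.ones(3, dtype=torch.float32)

# two integer types give the wider integer type
short_plus_long = points_short + torch.ones(5, dtype=torch.int64)

print(product.dtype, int_plus_float.dtype, short_plus_long.dtype)
```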

### Random values

In [None]:
# NumPy random 
np.random.rand(2,2)
# Torch random 
torch.rand(2,2)

### Math operations


In [None]:
# Element wise addition
a = torch.ones(2,2)
b = torch.ones(2,2)

c = a + b
c

In [None]:
c = torch.add(a, b)
c

In [None]:
# In-place addition
print(c)
c.add_(a)

In [None]:
# Multiplication, not in-place
print(torch.mul(a, b))

In [None]:
# Tensor Mean
a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=torch.float)  # mean() needs a floating-point dtype
print(a.size()) 

print(a.mean(dim=0))

The vast majority of operations on tensors are available in the `torch` module and can also be called as methods of a tensor object.
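
As a short sketch of the two calling styles, the example below applies the same operation as a `torch` function and as a tensor method, and contrasts both with the in-place variant (marked by a trailing underscore). The tensor values are chosen only for illustration.

```python
import torch

a = torch.tensor([[1.0, 4.0], [9.0, 16.0]])

root_function = torch.sqrt(a)    # function in the torch module
root_method = a.sqrt()           # the same operation as a method
column_means = a.mean(dim=0)     # reduction along the first dimension

# In-place variant: modifies a itself instead of returning a new tensor
address_before = a.data_ptr()
a.sqrt_()
address_after = a.data_ptr()

print(torch.equal(root_function, root_method))
print(column_means)
print(a, address_before == address_after)
```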

Look into the online documentation (https://pytorch.org/docs/stable/index.html) for:

* Math operations:
    * Pointwise ops
    * Reduction ops
    * Comparison ops
    * Spectral ops


* Random sampling 

* Parallelism 

#### PyTorch Abstraction

`Tensor`: like an array in NumPy, but can run on a GPU

`Variable`: stores data and gradient; a node in a computational graph. Since PyTorch 0.4, `Variable` is deprecated and merged into `Tensor`: a tensor created with `requires_grad=True` plays the same role.


### PyTorch Variables
Variables allow us to accumulate gradients!

When using autograd, the forward pass of your network will define a computational graph; nodes in the graph will be Tensors, and edges will be functions that produce output Tensors from input Tensors.
PyTorch Tensors can be created as variable objects, where a variable represents a node in the computational graph.

In [None]:
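# Variable is deprecated since PyTorch 0.4 and kept here for illustration only
from torch.autograd import Variable

# equivalent to torch.ones(2, 2, requires_grad=True)
a = Variable(torch.ones(2,2), requires_grad = True)
a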

In [None]:
# a plain tensor that does not track gradients (requires_grad=False)
torch.ones(2,2)

### What is requires_grad?
**Allows calculation of gradients w.r.t. the variable!**
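
The sketch below shows what `requires_grad=True` enables: after calling `backward()`, the gradient is stored in `.grad`, and a second backward pass accumulates into the same attribute. The two toy functions are assumptions chosen so that the gradients are easy to check by hand.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)

# First function: sum(3 * a^2); its derivative is 6 * a, i.e. 6 everywhere
loss = (3 * a ** 2).sum()
loss.backward()
first_grad = a.grad.clone()

# Second function: sum(2 * a); its derivative is 2 everywhere.
# Gradients accumulate, so a.grad now holds 6 + 2 = 8.
second_loss = (2 * a).sum()
second_loss.backward()
accumulated_grad = a.grad.clone()

# Reset the accumulated gradient before the next computation
a.grad.zero_()

print(first_grad)
print(accumulated_grad)
```

This accumulation is why training loops typically reset gradients before each backward pass.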

### NumPy interoperability

PyTorch tensors can be converted to `NumPy` arrays and vice versa very efficiently. This lets us take advantage of the huge swath of functionality in the wider Python ecosystem that has built up around the NumPy array type. The zero-copy interoperability with NumPy arrays is due to the storage system working with the Python buffer protocol (https://docs.python.org/3/c-api/buffer.html).

In [None]:
points = torch.ones(3, 4)
points_np = points.numpy()
points_np

In [None]:
# the returned array shares the same underlying buffer with the tensor storage

In [None]:
type(points_np)

We can use such conversions at basically no cost, as long as the data sits in CPU RAM. However, if the tensor is allocated on the GPU, it must first be copied to CPU memory with `.cpu()`; calling `.numpy()` directly on a GPU tensor raises an error.
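
The shared buffer can be demonstrated with a small sketch on CPU tensors: a write through the NumPy array is visible in the tensor and vice versa, whereas a clone is independent. Variable names are illustrative.

```python
import numpy as np
import torch

points = torch.ones(3, 4)
points_np = points.numpy()             # no copy: same underlying memory

points_np[0, 0] = 7.0                  # write through the NumPy array

points_back = torch.from_numpy(points_np)
points_back[1, 1] = -1.0               # write through the new tensor

# clone() allocates new memory, so this array is independent of points
points_copy = points.clone().numpy()
points_copy[2, 2] = 99.0

print(points)
print(points_np)
```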

In [None]:
# to torch Tensor 
points = torch.from_numpy(points_np)

In [None]:
points

In [None]:
points2 = torch.rand(3,4)

In [None]:
points2

In [None]:
points2.numpy()

### Serializing tensors

PyTorch uses `pickle` under the hood to serialize the tensor object, plus dedicated serialization code for the storage. 

In [None]:
points

In [None]:
# save our points tensor to a file
torch.save(points, './ourpoints.t')

In [None]:
# we can pass a file object in lieu of the file name
with open('./ourpoints2.t','wb') as f:
    torch.save(points, f)

In [None]:
points_in = torch.load('./ourpoints.t')

In [None]:
points_in

In [None]:
# loading also works with a file object opened in binary mode
with open('./ourpoints.t','rb') as f:
    points = torch.load(f)

In [None]:
points
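
Because `torch.save` and `torch.load` accept file-like objects, a round trip can also be sketched without touching the disk. Using an in-memory `io.BytesIO` buffer here is an assumption made to keep the example self-contained; it is not part of the workflow above.

```python
import io

import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

buffer = io.BytesIO()          # in-memory stand-in for a file opened in 'wb' mode
torch.save(points, buffer)

buffer.seek(0)                 # rewind before reading the bytes back
restored = torch.load(buffer)

print(torch.equal(points, restored), restored.dtype)
```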

### Serializing to HDF5 with h5py

HDF5 is a portable, widely supported format for representing serialized multidimensional arrays, organized in a nested key-value dictionary. Python supports HDF5 through the h5py library.

In [None]:
import h5py

In [None]:
# saving the hdf5 file
f = h5py.File('./ourpoints.hdf5', 'w')
# create_dataset writes the array; 'coords' is the key into the HDF5 file
dset = f.create_dataset('coords', data=points.numpy())
f.close()

In [None]:
dset

One of the interesting things in HDF5 is that we can index the dataset while on disk and access only the elements we are interested in. 

In [None]:
# load just the last two points 
f = h5py.File('./ourpoints.hdf5', 'r')
dset = f['coords']
last_points = dset[-2:]

In [None]:
last_points = torch.from_numpy(dset[-2:])
f.close()

In [None]:
last_points

### Moving tensors to the GPU

Every PyTorch tensor can be transferred to the GPU(s) in order to perform massively parallel, fast computations. In addition to `dtype`, a PyTorch `Tensor` also has the notion of `device`, which is where on the computer the tensor data is placed.

In [None]:
torch.cuda.is_available()

In [None]:
if torch.cuda.is_available(): 
    points_gpu = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device='cuda')
else: 
    print('CUDA is not available')

We can also use the `to` method. 

In [None]:
if torch.cuda.is_available(): 
    points_gpu = points.to(device='cuda')
else: 
    print('CUDA is not available')

The CPU- and GPU-based tensors expose the same user-facing API, making it much easier to write code that is agnostic to where, exactly, the heavy number crunching is running. 
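
A common way to write such device-agnostic code is to choose the device once and pass it along. The following is a minimal sketch of that pattern; the variable name `device` and the arithmetic are illustrative, and the code falls back to the CPU when CUDA is not available.

```python
import torch

# Pick the GPU when available, otherwise stay on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device=device)

# The same expression runs on whichever device holds the data
result = 2 * points + 4

# Bring the result back to the CPU, e.g. for printing or NumPy conversion
result_cpu = result.cpu()
print(device, result_cpu)
```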

In [None]:
# specify the number of the GPU device 
if torch.cuda.is_available(): 
    points_gpu = points.to(device='cuda:0')
else: 
    print('CUDA is not available')

In [None]:
# Some more GPU operations, if CUDA is installed 

In [None]:
if torch.cuda.is_available(): 
    points = 2 * points  # multiplication performed on the CPU
    points_gpu = 2 * points.to(device='cuda')  # multiplication performed on the GPU
else: 
    print('CUDA is not available')


In [None]:
if torch.cuda.is_available(): 
    points_gpu = points_gpu + 4

In [None]:
if torch.cuda.is_available(): 
    points_cpu = points_gpu.to(device='cpu')

We can also use the shorthand methods `cpu` and `cuda` instead of the `to` method.

In [None]:
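if torch.cuda.is_available(): 
    points_gpu = points.cuda()  # uses the current CUDA device (index 0 by default)
    points_gpu = points.cuda(0)
    points_cpu = points_gpu.cpu()
else: 
    print('CUDA is not available')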