# Book: Natural Language Processing with PyTorch

- The book aims to introduce newcomers to natural language processing and deep learning using PyTorch.

- Mathematics is avoided in most places because it would distract from the main goal of the book.
  

## Chapter 1: Introduction to NLP and PyTorch Basics

> Goal: 

- 1: Develop a clear understanding of the supervised learning paradigm
 
- 2: Learn how to encode inputs for the learning tasks. 
  
- 3: Master the basics of PyTorch.


## Machine Learning vs Deep Learning

![](./machine_vs_deep.png)

# Types of Learning

 - Supervised Learning: 
  
 - Unsupervised Learning:

 - Semi-supervised Learning:

![](./supervised_unsupervised_Semisuperved.png)

## Supervised Learning

> Supervised learning then becomes a process of finding the optimal parameters/weights w that will minimize the cumulative loss for all the n examples.

![](./supervised_learning.png)
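
Written as a formula, the objective reads roughly as follows. The notation is an assumption, not something fixed by these notes: $f(x; w)$ is the model, $L$ is a per-example loss, and $(x_i, y_i)$ are the $n$ labelled examples.

$$
\hat{w} = \arg\min_{w} \sum_{i=1}^{n} L\big(f(x_i; w),\, y_i\big)
$$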

## How to Represent Data: Data Encoding

- How can we represent our inputs and targets in NLP problems numerically so that we can train our model?

- We need to represent both the observations (text) and the targets numerically before a machine learning algorithm can use them. This process is called encoding.


![](./encoding1.png)

- There are many ways to perform this encoding (one-hot encoding, BoW, embeddings etc). 

- This book is dedicated to learning such representations for a task from data. However, we begin with some simple count-based representations that are based on heuristics.

- Though simple, they are incredibly powerful as they are and can serve as a starting point for richer representation learning. All of these count-based representations start with a vector of fixed dimension.

## One-Hot Representation

- The one-hot representation, as the name suggests, starts with a zero vector and sets the corresponding entry to 1 if the word is present in the sentence or document.
  
- It is a technique for representing categorical variables as binary vectors.

#### Example

![](./onehot1.png)
![](./onehot2.png)

- Tesla is represented as a vector of length 5 ([1,0,0,0,0])
  
-  Therefore the list of words in the sentence can be represented as an array of vectors or a matrix 

### Example
 
 Assume you have the following two sentences
 
 - Time flies like an arrow. 
 
 - Fruit flies like a banana.

Tokenizing the sentences, ignoring punctuation, and treating everything as lowercase yields a vocabulary of size 8: {time, fruit, flies, like, a, an, arrow, banana}.


![](./onehot.png)


The binary encoding for “like a banana” would then be: [0, 0, 0, 1, 1, 0, 0, 1].
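
The encoding above can be reproduced with a few lines of plain Python. This is a minimal sketch: the helper name `binary_encode` is hypothetical, and the vocabulary order is assumed to be the one listed above.

```python
# Vocabulary in the order listed above (any fixed order would work).
vocab = ["time", "fruit", "flies", "like", "a", "an", "arrow", "banana"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def binary_encode(text):
    """Return a 0/1 vector marking which vocabulary words occur in `text`."""
    vector = [0] * len(vocab)
    # Lowercase, drop the full stop, and split on whitespace.
    for token in text.lower().replace(".", "").split():
        vector[word_to_index[token]] = 1
    return vector


encoding = binary_encode("like a banana")
print(encoding)  # [0, 0, 0, 1, 1, 0, 0, 1]
```

Words outside the vocabulary would raise a `KeyError` here; a real pipeline usually maps them to a dedicated unknown token instead.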

## Term Frequency (TF)

- The term frequency (TF) is the number of times a word appears in a document.

- It is a simple measure of how important a word is to a document.

- Using the vocabulary from the previous example, the sentence “Fruit flies like time flies a fruit” has the following
  term frequency representation: [1, 2, 2, 1, 1, 0, 0, 0].
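
A short sketch of the same count, assuming the vocabulary order from the one-hot example; the helper name `term_frequency` is hypothetical.

```python
from collections import Counter

# Same vocabulary order as in the one-hot example.
vocab = ["time", "fruit", "flies", "like", "a", "an", "arrow", "banana"]


def term_frequency(text):
    """Count how often each vocabulary word appears in `text`."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that never occur.
    return [counts[word] for word in vocab]


tf_vector = term_frequency("Fruit flies like time flies a fruit")
print(tf_vector)  # [1, 2, 2, 1, 1, 0, 0, 0]
```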

## Term Frequency-Inverse Document Frequency (TF-IDF)

- The TF representation weights words proportionally to their frequency. 

- However, common words such as “the” do not add anything to our understanding of a specific document.

- Conversely, a rare word (such as "excellent") occurs less frequently but is quite likely to be indicative of the nature of the document, so we would want to give it a larger weight in our representation.

- The Inverse Document Frequency (IDF) is a heuristic that does exactly that: by taking the inverse document frequency, we reduce the weight of frequent terms and increase the impact of infrequent ones.

- Therefore, the IDF representation penalizes common tokens and rewards rare tokens in the vector representation.



![](./idf.png)

- TF-IDF stands for term frequency-inverse document frequency and it is a measure, used in the fields of information retrieval (IR) and machine learning, that can quantify the importance or relevance of string representations (words, phrases, lemmas, etc) in a document amongst a collection of documents. 

![](./tfidf.png)

- The key intuition motivating TF-IDF is that the importance of a term is inversely related to its frequency across documents.

- **TF** gives us information on how often a term appears in a document and **IDF** gives us information about the relative rarity of a term in the collection of documents. By multiplying these values together we can get our final TF-IDF value.

- The higher the TF-IDF score the more important or relevant the term is; as a term gets less relevant, its TF-IDF score will approach 0.

 

In [2]:
from sklearn.feature_extraction.text import TfidfTransformer  
from sklearn.feature_extraction.text import CountVectorizer  

corpus=["I come to China to travel", 
    "This is a car popular in China",          
    "I love tea and Apple ",   
    "The work is to write some papers in science"] 

vectorizer=CountVectorizer()

transformer = TfidfTransformer()
tfidf = transformer.fit_transform(vectorizer.fit_transform(corpus))  
print(tfidf)

  (0, 16)	0.4424621378947393
  (0, 15)	0.697684463383976
  (0, 4)	0.4424621378947393
  (0, 3)	0.348842231691988
  (1, 14)	0.45338639737285463
  (1, 9)	0.45338639737285463
  (1, 6)	0.3574550433419527
  (1, 5)	0.3574550433419527
  (1, 3)	0.3574550433419527
  (1, 2)	0.45338639737285463
  (2, 12)	0.5
  (2, 7)	0.5
  (2, 1)	0.5
  (2, 0)	0.5
  (3, 18)	0.3565798233381452
  (3, 17)	0.3565798233381452
  (3, 15)	0.2811316284405006
  (3, 13)	0.3565798233381452
  (3, 11)	0.3565798233381452
  (3, 10)	0.3565798233381452
  (3, 8)	0.3565798233381452
  (3, 6)	0.2811316284405006
  (3, 5)	0.2811316284405006
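
The scores printed above can be reproduced by hand. The sketch below assumes scikit-learn's default settings: tokens of two or more characters, the smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each document vector. The helper names `tokenize` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter

corpus = ["I come to China to travel",
          "This is a car popular in China",
          "I love tea and Apple ",
          "The work is to write some papers in science"]


def tokenize(text):
    # Lowercase and keep tokens of two or more word characters,
    # so single letters such as "I" and "a" are dropped.
    return re.findall(r"\b\w\w+\b", text.lower())


docs = [Counter(tokenize(text)) for text in corpus]
n_docs = len(docs)
# Number of documents that contain each term.
doc_freq = Counter(term for doc in docs for term in doc)


def tfidf(doc):
    """TF * smoothed IDF, followed by L2 normalization."""
    weights = {term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
               for term, tf in doc.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}


scores = [tfidf(doc) for doc in docs]
print(round(scores[0]["to"], 4))  # 0.6977
print(scores[2])                  # four terms, each with weight 0.5
```

In the first document "to" appears twice, which is why it receives the largest weight even though it also occurs in another document.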


## Word Vectors: Embeddings

- In deep learning, we use embeddings to learn a representation that is more robust to the nature of the data.

- Embeddings are a way to represent words in a vector space and capture semantic information about the words in a sentence.

- Examples of embeddings: Word2Vec, GloVe, fastText, WordEmbeddings, HauWE, etc.


> Word embedding, or word vectorization, is a methodology in NLP that maps words or phrases from a vocabulary to corresponding vectors of real numbers, which are then used for word prediction and for measuring word similarity/semantics. The process of converting words into numbers is called vectorization.

![](./wordembedding.png)
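
As a rough illustration of the lookup step only, the sketch below uses `torch.nn.Embedding` with a toy vocabulary (hypothetical). The vectors are randomly initialized, not trained, so they carry no semantic information yet; methods such as Word2Vec learn them from data.

```python
import torch

torch.manual_seed(0)

# Toy vocabulary mapping each word to an integer id (hypothetical).
vocab = {"time": 0, "flies": 1, "like": 2, "an": 3, "arrow": 4}

# A lookup table with one 3-dimensional vector per vocabulary entry.
embedding = torch.nn.Embedding(num_embeddings=len(vocab), embedding_dim=3)

token_ids = torch.tensor([vocab[w] for w in ["time", "flies", "like", "an", "arrow"]])
vectors = embedding(token_ids)  # one row per token

print(vectors.shape)  # torch.Size([5, 3])
```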

# PyTorch Basics

- PyTorch is an open source, community-driven deep learning toolkit for building neural networks, which can be applied to tasks such as image, text, and sequence classification.

- It is a dynamic graph-based framework that lets you define your neural network in a way that is easy to understand and debug.



## Why use PyTorch?

- PyTorch is one of the most widely used deep learning frameworks today; see the [Papers with Code trends](https://paperswithcode.com/trends).

- PyTorch also takes care of many things behind the scenes, such as GPU acceleration (making your code run faster).

 


## PyTorch Installation



- Prerequisite: a package manager (e.g. pip, conda)

- Python

- PyTorch version: the book supports 1.0, while this notebook was run with a much newer release (see the verification below), so expect some of the book's code to break.


## VERIFICATION


In [3]:
import torch
torch.manual_seed(1234)
torch.__version__


'1.13.0.dev20220611'

## What is Tensor 


- Tensors are the standard way of representing data such as text, images, and audio in PyTorch.

- Their job is to represent data in a numerical way.

![](./tensor_represent_data.png)


> You could have a vector [3, 2] to describe [bedrooms, bathrooms] in your house. Or you could have [3, 2, 2] to describe [bedrooms, bathrooms, car_parks] in your house.


![](./tensor_loop.png)

# Is a Tensor All You Need?

-  Data Structure for holding data:

   - Python List, 

   - Numpy Array, and 
  
   - Torch Tensor
  
- Let us review the basic Python data structures (list and NumPy array) before we start using PyTorch tensors.

### From Python lists to Numpy Array

 - Python does not have built-in support for Arrays, but Python Lists can be used instead.

In [4]:
a_list = [1,3,4] #A list is the Python equivalent of an array
a_list

[1, 3, 4]

In [5]:
type(a_list)

list

In [6]:
a_list[0]

1

In [7]:
import numpy as np
a_numpy = np.array([1,3,4])
a_numpy

array([1, 3, 4])

In [8]:
type(a_numpy)

numpy.ndarray

In [9]:
a_numpy

array([1, 3, 4])

In [10]:
a_numpy[0]

1

### Why numpy?

- Size: NumPy data structures take up less space.

- Performance: NumPy arrays are faster than lists for numerical work.

- Functionality: SciPy and NumPy have optimized functions, such as linear algebra operations, built in.


![](./contiguous.png)

In [11]:
import numpy as np
import time


size_of_vec = 1000

def pure_python_version():
    t1 = time.time()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X)) ]
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.time() - t1


t1 = pure_python_version()
t2 = numpy_version()
print(t1, t2)
print("Numpy is in this example " + str(t1/t2) + " times faster!")

0.0001919269561767578 3.5762786865234375e-05
Numpy is in this example 5.366666666666666 times faster!


### From Numpy Array to Torch Tensor 

- Tensors are like arrays, both are data structures that are used to store data.

- Tensors support GPU acceleration (speed) and gradients (backpropagation).

- NumPy arrays are mainly used in typical machine learning algorithms, whereas PyTorch tensors are mainly used in deep learning, which requires heavy matrix computation.


#### Are Tensors Really like Numpy Arrays?

- Yes, they are. Let us see how they are created.

> Numpy

In [12]:
import numpy as np
a_numpy = np.array([[1, 2, 3], [2,3,4]]) # Create a numpy array
print(a_numpy)

[[1 2 3]
 [2 3 4]]


In [13]:
a_numpy.shape

(2, 3)

In [14]:
a_numpy.size

6

In [15]:
a_numpy.dtype

dtype('int64')

In [16]:
type(a_numpy)

numpy.ndarray

In [17]:
a_numpy[0]

array([1, 2, 3])

> Tensor

In [18]:
import torch
a_tensor = torch.Tensor([[1, 2, 3], [2,3,4]]) # Create a PyTorch tensor
print(a_tensor)

tensor([[1., 2., 3.],
        [2., 3., 4.]])


In [19]:
a_tensor.shape

torch.Size([2, 3])

In [20]:
a_tensor.size()

torch.Size([2, 3])

In [21]:
a_tensor.dtype

torch.float32

In [22]:
type(a_tensor)

torch.Tensor

> A tensor is an array: that is, a data structure that stores a collection of numbers that are accessible individually using an index, and that can be indexed with multiple indices.

> So, the most visible difference between the two libraries is naming. NumPy calls tensors (high-dimensional matrices or vectors) arrays, while in PyTorch they are just called tensors. Most of the basic API is quite similar.


## But wait, tensors offer much more than just a data structure

- GPU acceleration, which is a great advantage for deep learning
  
- distributing operations across multiple devices or machines, and 

- keeping track of the graph of computations that created them (useful for backpropagation)
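
A minimal sketch of the first point: pick an accelerator if one is available and create the tensor there. It falls back to the CPU, so it should run on any machine; the check for the MPS backend is guarded because older PyTorch releases do not have it.

```python
import torch

# Pick the best available device and fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.ones(2, 3, device=device)  # the tensor lives on the chosen device
y = (x * 2).sum()                    # the computation runs on that device too

print(device, y.item())
```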
 

## Let us Learn more about Tensor

> Tensors are generalization of vectors and matrices to an arbitrary number of dimensions. 



![](./tensor_generalization.png)

![](./tensor.png)

### So, what can we do with Tensors?

Various operations are available on tensors.

- Creating tensors
  
- Operations with tensors
  
- Indexing, slicing, and joining with tensors
  
- Computing gradients with tensors
  
- Using CUDA/MPS tensors with GPUs

## Creating Tensors

- PyTorch allows us to create tensors in many different ways using the torch package. 

- The following are some of the ways to create tensors:

 

#### 1: Creating an uninitialized tensor with a specific size

In [23]:
a_random = torch.Tensor(size = (3,4)) # Allocates a 3x4 tensor without initializing it; the values are whatever was in memory, not random samples
# a_numpy = np.array([3,4])

print(a_random)

tensor([[0.0000e+00, 2.0000e+00, 0.0000e+00, 2.0000e+00],
        [4.8673e-39, 4.5810e-41, 6.1522e-36, 1.4013e-45],
        [8.4015e-40, 4.5810e-41, 4.7625e-10, 4.5810e-41]])


In [24]:
print(a_random.shape)
print(a_random.size())
print(type(a_random))
print(a_random.type())

torch.Size([3, 4])
torch.Size([3, 4])
<class 'torch.Tensor'>
torch.FloatTensor


>  Note: .shape is an alias for .size(), and was added to closely match numpy !

> Note: The default tensor type when you use the torch.Tensor constructor is torch.FloatTensor. 


In fact, `torch.Tensor` is an alias for the default tensor type (`torch.FloatTensor`).



In [25]:
a_random = torch.FloatTensor((3,4)) # A tuple is treated as data: this tensor holds the values 3. and 4.
print(a_random.type())

torch.FloatTensor


- But what if I want the tensor to keep the data type of the values I pass in?
  
  -  The `torch.tensor` constructor infers the dtype automatically.

In [26]:
a_random = torch.tensor((3,4)) # Holds the integers 3 and 4, so the dtype is inferred as int64
print(a_random.dtype)

torch.int64


But, you can also specify the dtype explicitly. 

In [27]:
a_torch = torch.tensor([1, 2, 3], dtype=torch.float32)

print(a_torch.type()) # Tensor type

torch.FloatTensor


I recommend sticking to `torch.tensor`; if you need a different type, pass `dtype` explicitly or convert the tensor afterwards.

- What if I already have a tensor and want to change its type?

In [28]:
a_torch = torch.tensor([1, 2, 3])

print(a_torch.type()) # Tensor type
print(a_torch.size()) # Tensor Size


torch.LongTensor
torch.Size([3])


In [29]:
a_short =  a_torch.short() # Convert to int16; float(), long(), etc. work the same way
print(a_short.type()) # Tensor type

torch.ShortTensor


[See the different PyTorch data types](https://pytorch.org/docs/stable/tensors.html#data-types): PyTorch defines several tensor types, each with CPU and GPU variants.



- The most common type (and generally the default) is torch.float32 or torch.float. This is referred to as "32-bit floating point".

- But there's also 16-bit floating point (torch.float16 or torch.half) and 64-bit floating point (torch.float64 or torch.double).

- The reason for all of these is to do with precision in computing. Precision is the amount of detail used to describe a number.

- The higher the precision value (8, 16, 32), the more detail and hence data used to express a number.

- This matters in deep learning and numerical computing because you perform so many operations: the more detail each number carries, the more compute and memory each operation needs.

- So lower precision datatypes are generally faster to compute on but sacrifice some performance on evaluation metrics like accuracy (faster to compute but less accurate).
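
The trade-off can be made concrete by looking at the storage size and the machine epsilon (the gap between 1.0 and the next representable number) of each floating point type. This is a small sketch; the dictionary name `bytes_per_value` is hypothetical.

```python
import torch

float_dtypes = (torch.float16, torch.float32, torch.float64)

# Bytes needed to store a single value of each dtype.
bytes_per_value = {
    dtype: torch.tensor([0.1], dtype=dtype).element_size()
    for dtype in float_dtypes
}

for dtype in float_dtypes:
    # A smaller eps means finer resolution, i.e. higher precision.
    print(dtype, bytes_per_value[dtype], "bytes, eps =", torch.finfo(dtype).eps)
```

Halving the number of bits halves the memory per value, at the cost of a much larger epsilon.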







 





In [30]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype

torch.float16

#### 2: Creating Tensors from Random Numbers


In [31]:
a_random_torch = torch.randn(2, 3) # samples from the standard normal distribution (mean 0, std 1)
# a_numpy_rand = np.random.randn(2,3) #numpy random normal distribution

print(a_random_torch)
# print(a_numpy_rand)

tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])


In [32]:
a_random_torch = torch.rand(2, 3) # samples from the uniform distribution on [0, 1)
# a_numpy_rand = np.random.rand(2,3) 

print(a_random_torch)
# print(a_numpy_rand)

tensor([[0.7749, 0.8208, 0.2793],
        [0.6817, 0.2837, 0.6567]])
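
The two functions are easy to mix up, so here is a quick empirical check on a larger sample: `torch.rand` draws from the uniform distribution on [0, 1), while `torch.randn` draws from the standard normal distribution. The sample size and seed are arbitrary choices.

```python
import torch

torch.manual_seed(0)

uniform = torch.rand(100_000)   # uniform on [0, 1)
normal = torch.randn(100_000)   # standard normal: mean 0, std 1

# Uniform samples never leave the unit interval.
print(uniform.min().item(), uniform.max().item())

# Normal samples can be negative and cluster around 0 with spread close to 1.
print(normal.mean().item(), normal.std().item())
```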


#### 3: Creating a filled tensor

In [33]:
a_same_scalar = torch.zeros(10,5)
print(a_same_scalar)
print(a_same_scalar.size())

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
torch.Size([10, 5])


In [34]:
torch.ones(6, 10) # torch.ones(size=(6, 10)) 

tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

In [35]:
a_zero = torch.zeros(2, 3)
print(a_zero)

tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [36]:
a_zero.fill_(5)

tensor([[5., 5., 5.],
        [5., 5., 5.]])

> Any PyTorch method with a trailing underscore (_) refers to an in-place operation.

In [37]:
a_zero.fill_(5).size()


torch.Size([2, 3])
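
To see the difference between the two variants, compare `add` with `add_`. This is a small sketch with hypothetical variable names.

```python
import torch

t = torch.zeros(2, 3)

result = t.add(1)      # out-of-place: returns a new tensor
unchanged = t.clone()  # snapshot showing that t is still all zeros here

t.add_(1)              # in-place: t itself now holds ones

print(unchanged.sum().item(), t.sum().item())  # 0.0 6.0
```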

#### 4: Creating and initializing a tensor from lists

In [38]:
a_list = torch.tensor([1, 2, 3])
a_list

tensor([1, 2, 3])

In [39]:
a_list = torch.tensor([[1, 2, 3], 
                      [4, 5, 6]])
a_list

tensor([[1, 2, 3],
        [4, 5, 6]])

#### 5: Creating and initializing a tensor from numpy arrays

- The values can either come from a list, as in the preceding example, or from a NumPy array

In [40]:
numpy_array = np.random.rand(2, 3) 
numpy_array

array([[0.31112954, 0.99590158, 0.73157034],
       [0.6366771 , 0.1105546 , 0.84468547]])

In [41]:
type(numpy_array)

numpy.ndarray

In [42]:
torch_tensor = torch.from_numpy(numpy_array)
torch_tensor

tensor([[0.3111, 0.9959, 0.7316],
        [0.6367, 0.1106, 0.8447]], dtype=torch.float64)

In [43]:
torch_tensor.type()

'torch.DoubleTensor'

> Note that the result is a DoubleTensor instead of the default FloatTensor. This corresponds to the data type of the NumPy random matrix, float64.

You can always convert a PyTorch tensor to a NumPy array with the tensor's `numpy()` method, e.g. `torch_tensor.numpy()`.
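
One detail worth knowing is that `torch.from_numpy()` and `Tensor.numpy()` share memory with the original array on the CPU, whereas `torch.tensor()` makes a copy. A minimal sketch, with hypothetical variable names:

```python
import numpy as np
import torch

array = np.array([1.0, 2.0, 3.0])

tensor = torch.from_numpy(array)   # shares memory with `array`
back = tensor.numpy()              # shares the same memory as well
independent = torch.tensor(array)  # copies the data

array[0] = 10.0  # visible through `tensor` and `back`, but not `independent`

print(tensor[0].item(), back[0], independent[0].item())
```

Use `torch.tensor()` (or `.clone()`) when later changes to the array should not leak into the tensor.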

#### 6: Creating a range of values

In [44]:
# torch.range() is deprecated; use torch.arange() instead
zero_to_ten_deprecated = torch.range(0, 10) # Note: includes the end value 10 and may raise an error in a future release


In [45]:
# Create a range of values 0 to 9 (the end value 10 is excluded)
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

#### 7: Creating a tensor with the same shape as another tensor

In [46]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [47]:
# Can also create a tensor of ones similar to another tensor
ten_ones = torch.ones_like(input=zero_to_ten) # will have same shape
ten_ones

tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

####  Dimensions of a tensor using the ndim attribute

In [48]:
# Scalar
scalar = torch.tensor(7)
scalar

tensor(7)

In [49]:
scalar.ndim

0

In [50]:
# Vector
vector = torch.tensor([1,2,3,4])
vector

tensor([1, 2, 3, 4])

In [51]:
vector.ndim

1

In [52]:
MATRIX = torch.tensor([[1,2,3,4],
                       [5,6,7,8]])

MATRIX

tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])

In [53]:
MATRIX.ndim

2

> You can tell the number of dimensions a tensor in PyTorch has by the number of square brackets on the outside ([) and you only need to count one side of the brackets.

In [54]:
# Tensor
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

In [55]:
TENSOR.ndim

3

>  In practice, you'll often see scalars and vectors denoted as lowercase letters such as y or a. And matrices and tensors denoted as uppercase letters such as X or W

### Creating Named Tensors

- Named tensors allow users to give explicit names to tensor dimensions. Note that they are an experimental feature, so the API may change.

- In most cases, operations that take dimension parameters will accept dimension names, avoiding the need to track dimensions by position.

In [56]:
torch.zeros(2, 3, names=('N', 'C'))


tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('N', 'C'))

- Use names to access tensor dimensions.

In [57]:
imgs = torch.randn(1, 2, 2, 3 , names=('N', 'C', 'H', 'W')) 
imgs.names

('N', 'C', 'H', 'W')

In [58]:
imgs.names[0]

'N'

In [59]:
imgs.names[1]

'C'

##  Indexing tensors

Indexing a tensor is similar to indexing a list.


> List indexing:

In [60]:
some_list = list(range(6))
some_list

[0, 1, 2, 3, 4, 5]

In [61]:
some_list[1:4]


[1, 2, 3]

In [62]:
some_list[1:]

[1, 2, 3, 4, 5]

In [63]:
some_list[1:4:2]

[1, 3]

In [64]:
some_list[:]

[0, 1, 2, 3, 4, 5]

In [65]:
print(some_list[:4])
print(some_list[:-1])

[0, 1, 2, 3]
[0, 1, 2, 3, 4]


> Pytorch tensor indexing:

In [66]:
torch_list = torch.tensor(some_list)
torch_list

tensor([0, 1, 2, 3, 4, 5])

In [67]:
torch_list[0]

tensor(0)

In [68]:
torch_list[1:4]

tensor([1, 2, 3])
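
The same slicing syntax extends to several dimensions, with one index or slice per dimension separated by commas. A short sketch with a hypothetical 2x3 tensor:

```python
import torch

m = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

first_row = m[0]      # tensor([1, 2, 3])
element = m[1, 2]     # tensor(6): row 1, column 2
second_col = m[:, 1]  # tensor([2, 5]): every row, column 1
block = m[:, 1:]      # tensor([[2, 3], [5, 6]]): every row, columns 1 onward

print(first_row, element, second_col, block, sep="\n")
```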

## Transposing Tensors

A 2D tensor can be transposed with the `t()` method.

In [69]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [90]:
points_t = points.t()
points_t

tensor([[4., 5., 2.],
        [1., 3., 1.]])

You can also transpose 3D and higher tensors using the `transpose` method by specifying the two dimensions along which transposing (flipping shape and stride) should occur:



In [93]:
some_t = torch.ones(3, 4, 5)
transpose_t = some_t.transpose(0, 2)
some_t.shape

torch.Size([3, 4, 5])
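
The cell above prints the shape of the original tensor. To see what `transpose` actually did, it helps to inspect the transposed tensor's shape and strides as well, as in this small sketch:

```python
import torch

some_t = torch.ones(3, 4, 5)
transpose_t = some_t.transpose(0, 2)  # swap the first and last dimensions

print(transpose_t.shape)            # torch.Size([5, 4, 3])
print(some_t.stride())              # (20, 5, 1)
print(transpose_t.stride())         # (1, 5, 20)
print(transpose_t.is_contiguous())  # False
```

Only the shape and stride metadata are swapped; both tensors still point at the same underlying storage.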

## Manipulating tensors (tensor operations)


- In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

- A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a representation of the patterns in the input data.

- After you have created your tensors, you can operate on them as you would with traditional programming language types, using operators like +, -, *, /.



> Addition: torch.add(tensor1, tensor2)

In [70]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [71]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

In [72]:
# Subtract and reassign
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [73]:
a = torch.randn(4)
a

tensor([-0.4790,  0.8539, -0.2285,  0.3081])

> PyTorch also has a bunch of built-in functions like torch.mul() (short for multiplication) and torch.add() to perform basic operations.



In [74]:
# Can also use torch functions
tensor = torch.tensor([1, 2, 3])
torch.multiply(tensor, 10)

tensor([10, 20, 30])

In [75]:
tensor = torch.tensor([1, 2, 3])

torch.add(tensor, 20)


tensor([21, 22, 23])

In [76]:
tensor = torch.tensor([1, 2, 3])

torch.div(tensor, 20)


tensor([0.0500, 0.1000, 0.1500])

In [77]:
torch.div(tensor, 20, rounding_mode='trunc')

tensor([0, 0, 0])

In [78]:
torch.div(tensor, 20, rounding_mode='floor')


tensor([0, 0, 0])

> Sum: torch.sum(tensor)

In [79]:
a = torch.randn(1, 3)
a

tensor([[ 1.1171,  0.1585, -0.8696]])

In [80]:
torch.sum(a)

tensor(0.4060)

> More operations can be found in the [Tensor Operations](https://pytorch.org/docs/stable/torch.html#torch-tensor-operations) section.

### Matrix multiplication (is all you need)


- One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

- PyTorch implements matrix multiplication functionality in the torch.matmul() method.



- The main two rules for matrix multiplication to remember are:

The inner dimensions must match:

 - (3, 2) @ (3, 2) won't work
 - (2, 3) @ (3, 2) will work
 - (3, 2) @ (2, 3) will work

The resulting matrix has the shape of the outer dimensions:

 - (2, 3) @ (3, 2) -> (2, 2)
 - (3, 2) @ (2, 3) -> (3, 3)

Note: "@" in Python is the symbol for matrix multiplication.

More information about matrix multiplication can be found in the [Matrix Multiplication](https://pytorch.org/docs/stable/torch.html#torch-matmul) section.

In [81]:
tensor1 = torch.randn(3, 4)
tensor2 = torch.randn(4)

print(tensor1.shape)
print(tensor2.shape)

torch.Size([3, 4])
torch.Size([4])


> Can we multiply these tensors?

In [82]:
result = torch.matmul(tensor1, tensor2)

In [83]:
result

tensor([ 9.1468,  4.9503, -1.1270])

In [84]:
result.shape

torch.Size([3])

>  The difference between element-wise multiplication (multiply) and matrix multiplication (matmul) is that matmul also sums the element-wise products along the shared inner dimension.


- matmul: matrix multiplication
    
- multiply: element-wise multiplication 





In [85]:
import torch
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

In [86]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [87]:
# Matrix multiplication
torch.matmul(tensor, tensor)


tensor(14)

In [88]:
# The "@" operator can also be used for matrix multiplication
tensor @ tensor

tensor(14)
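
The value 14 is the sum of the element-wise products, 1*1 + 2*2 + 3*3. The sketch below spells that out in three equivalent ways (variable names are hypothetical):

```python
import torch

tensor = torch.tensor([1, 2, 3])

elementwise = tensor * tensor  # tensor([1, 4, 9])
summed = elementwise.sum()     # 1 + 4 + 9 = 14

# The same computation in plain Python.
values = tensor.tolist()
manual = sum(a * b for a, b in zip(values, values))

print(summed.item(), manual, torch.matmul(tensor, tensor).item())  # 14 14 14
```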

> One of the most common errors in deep learning (shape errors)


In [89]:
# Shapes need to be compatible: the inner dimensions must match
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # (this will error)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

In [None]:
print(tensor_A.shape)
print(tensor_B.shape)
# The inner dimensions (2 and 3) do not match, so the multiplication fails.

torch.Size([3, 2])
torch.Size([3, 2])


> Solution: make matrix multiplication work between tensor_A and tensor_B by transposing one of them (here `tensor_B.T`) so that their inner dimensions match.

In [None]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [None]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [None]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output) 
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])


> You can also use torch.mm(), which performs the same multiplication as torch.matmul() for 2D matrices (it does not broadcast).



In [None]:
# torch.mm multiplies two 2D matrices
torch.mm(tensor_A, tensor_B.T)  # same as: torch.matmul(tensor_A, tensor_B.T)


tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

> Note: A matrix multiplication like this is also referred to as the dot product of two matrices. Neural networks are full of matrix multiplications and dot products.






For example, [`torch.nn.Linear()`](https://pytorch.org/docs/1.9.1/generated/torch.nn.Linear.html) module (we'll see this in action later on), also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input `x` and a weights matrix `A`.

$$
y = x\cdot{A^T} + b
$$
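
The formula can be checked against the layer directly. This is a sketch with arbitrary sizes (2 input features, 3 output features, a batch of 4); in `torch.nn.Linear` the matrix `A` is stored as `weight` and `b` as `bias`.

```python
import torch

torch.manual_seed(0)

linear = torch.nn.Linear(in_features=2, out_features=3)
x = torch.randn(4, 2)  # a batch of 4 inputs with 2 features each

y_layer = linear(x)                           # what the layer computes
y_manual = x @ linear.weight.T + linear.bias  # weight has shape (3, 2)

print(y_layer.shape)                                 # torch.Size([4, 3])
print(torch.allclose(y_layer, y_manual, atol=1e-6))  # True
```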


#### Tensor View Operation

> Returns a new tensor with the same data as the self tensor but of a different shape.



In [None]:
x = torch.randn(4, 4)
x


tensor([[ 0.2329, -1.1014, -1.2473, -0.7485],
        [-0.9792,  0.8285, -0.2501,  0.1602],
        [ 0.7295, -0.4441,  0.8214, -0.6015],
        [ 0.9069,  1.5691, -0.1108, -0.2573]])

In [None]:
x.size()

torch.Size([4, 4])

In [None]:
y = x.view(16)
y

tensor([ 0.2329, -1.1014, -1.2473, -0.7485, -0.9792,  0.8285, -0.2501,  0.1602,
         0.7295, -0.4441,  0.8214, -0.6015,  0.9069,  1.5691, -0.1108, -0.2573])

In [None]:
y.size()

torch.Size([16])

> Using -1 in the shape argument will automatically infer the correct size of the dimension.

In [None]:
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
z

tensor([[ 0.2329, -1.1014, -1.2473, -0.7485, -0.9792,  0.8285, -0.2501,  0.1602],
        [ 0.7295, -0.4441,  0.8214, -0.6015,  0.9069,  1.5691, -0.1108, -0.2573]])

In [None]:
z.size()


torch.Size([2, 8])

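> `view` does not change the tensor's layout in memory; it only reinterprets the shape.

`transpose()` does not move data either: it returns a view that shares the same storage but swaps the strides of the two dimensions, so the result is usually no longer contiguous.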

In [None]:
a = torch.randn(1, 2, 3, 4)
a.size()

torch.Size([1, 2, 3, 4])

In [None]:
b = a.transpose(1, 2)  # Swaps 2nd and 3rd dimension
b.size()

torch.Size([1, 3, 2, 4])
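
A practical consequence is that `view` often cannot be applied directly to a transposed tensor, because its strides no longer describe one contiguous block. The sketch below shows the failure and two common ways around it; the variable names are hypothetical.

```python
import torch

a = torch.randn(1, 2, 3, 4)
b = a.transpose(1, 2)

shares_storage = a.data_ptr() == b.data_ptr()  # transpose did not copy anything
contiguous = b.is_contiguous()                 # but the layout is no longer contiguous

try:
    b.view(-1)  # view needs compatible strides
    view_failed = False
except RuntimeError:
    view_failed = True

flat = b.contiguous().view(-1)  # copy into a contiguous layout first
flat_alt = b.reshape(-1)        # reshape copies only when it has to

print(shares_storage, contiguous, view_failed, flat.shape)
```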

> The Tensor API provides many more operations for working with tensors.

## Tensors and Computational Graphs

- Tensor.requires_grad is a boolean flag that indicates whether the tensor requires gradient.

- When you create a tensor with requires_grad=True, you are requiring PyTorch to manage bookkeeping information that computes gradients. 
  
- First, PyTorch will keep track of the values of the forward pass. Then, at the end of the computations, a single scalar is used to compute a backward pass. 

- The backward pass is initiated by using the backward() method on a tensor resulting from the evaluation of a loss function. The backward pass computes a gradient value for a tensor object that participated in the forward pass.

In [None]:
import torch
x = torch.ones(2, 2, requires_grad=True) 
print(x.grad is None)

True


In [None]:
x = torch.randn(5, requires_grad=True)
y = x.pow(2)
print(x.equal(y.grad_fn._saved_self))  # True
print(x is y.grad_fn._saved_self)  # True

True
True
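
A minimal end-to-end sketch of the forward and backward pass described above. The "loss" here is just the sum of squares, chosen so that the expected gradient, 2x, is easy to verify by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward pass: PyTorch records the operations that produce `loss`.
loss = (x ** 2).sum()  # 1 + 4 + 9 = 14

# Backward pass: computes d(loss)/dx and stores it in x.grad.
loss.backward()

print(x.grad)  # tensor([2., 4., 6.])
```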


-  Now let's run through a few ways to aggregate tensors (going from more values to fewer values).

In [None]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [None]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450


> Note: You may find some methods such as torch.mean() require tensors to be in torch.float32 (the most common) or another specific datatype, otherwise the operation will fail.

In [None]:
print(f"Mean: {x.mean()}") # this will error

RuntimeError: mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Long

You can also do the same as above with torch methods.



In [None]:
print(torch.max(x))

print(torch.min(x))

print(torch.mean(x.type(torch.float32)))

print(torch.sum(x))


tensor(90)
tensor(0)
tensor(45.)
tensor(450)


## Positional min/max


- You can also find the index of a tensor where the max or minimum occurs with torch.argmax() and torch.argmin() respectively.

- This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later section when using the softmax activation function).



In [None]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0


## Change tensor datatype



- A common issue with deep learning operations is having your tensors in different datatypes.

- If one tensor is in torch.float64 and another is in torch.float32, you might run into some errors.

- But there's a fix.

- You can change the datatypes of tensors using torch.Tensor.type(dtype=None) where the dtype parameter is the datatype you'd like to use

In [None]:
# Create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

In [None]:
# Create a float16 tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [None]:
# Create an int8 tensor
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)
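
Two related behaviours are worth a quick look. `Tensor.to()` is a common alternative to `Tensor.type()`, and element-wise operations between different floating point types promote to the wider type rather than failing. This is a small sketch with hypothetical variable names.

```python
import torch

a32 = torch.ones(3, dtype=torch.float32)
a64 = torch.ones(3, dtype=torch.float64)

promoted = a32 + a64               # element-wise ops promote to float64
converted = a64.to(torch.float32)  # explicit conversion with .to()

# Converting floats to integers truncates toward zero.
as_int = torch.tensor([1.9, -1.9]).to(torch.int32)

print(promoted.dtype, converted.dtype, as_int)
```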

### Reshaping, stacking, squeezing and unsqueezing


- Often you'll want to reshape or change the dimensions of your tensors without actually changing the values inside them.




| Method | One-line description |
| ----- | ----- |
| [`torch.reshape(input, shape)`](https://pytorch.org/docs/stable/generated/torch.reshape.html#torch.reshape) | Reshapes `input` to `shape` (if compatible), can also use `torch.Tensor.reshape()`. |
| [`torch.Tensor.view(shape)`](https://pytorch.org/docs/stable/generated/torch.Tensor.view.html) | Returns a view of the original tensor in a different `shape` but shares the same data as the original tensor. |
| [`torch.stack(tensors, dim=0)`](https://pytorch.org/docs/1.9.1/generated/torch.stack.html) | Concatenates a sequence of `tensors` along a new dimension (`dim`), all `tensors` must be same size. |
| [`torch.squeeze(input)`](https://pytorch.org/docs/stable/generated/torch.squeeze.html) | Squeezes `input` to remove all the dimensions of size `1`. |
| [`torch.unsqueeze(input, dim)`](https://pytorch.org/docs/1.9.1/generated/torch.unsqueeze.html) | Returns `input` with a dimension of size `1` added at `dim`. |
| [`torch.permute(input, dims)`](https://pytorch.org/docs/stable/generated/torch.permute.html) | Returns a *view* of the original `input` with its dimensions permuted (rearranged) to `dims`. | 

Why do any of these?

Because deep learning models (neural networks) are all about manipulating tensors in some way. And because of the rules of matrix multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are mixing with the right elements of other tensors.

Let's try them out.

First, we'll create a tensor.

##### Reshape and View

In [None]:
# Create a tensor
import torch
x = torch.arange(1., 8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

Now let's add an extra dimension with torch.reshape().



In [None]:
# Add an extra dimension
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

We can also change the view with torch.Tensor.view().



In [None]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = x.view(1, 7)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

##### Torch.View vs Torch.Reshape

- Both view() and reshape() can be used to change the size or shape of tensors. But they are slightly different.


- view() has existed for a long time. It returns a tensor with the new shape that shares its underlying data with the original tensor, so changing a value in one also changes the corresponding value in the other.

- Tensor.reshape() is more robust. It works on any tensor, while Tensor.view() requires the new shape to be compatible with the tensor's existing size and stride; a tensor t with t.is_contiguous() == True always qualifies.


In [None]:
z = torch.zeros(3, 2)
x = z.view(2, 3)

print(z)
print(x)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [None]:
z.fill_(1)


print(z)
print(x)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
tensor([[1., 1., 1.],
        [1., 1., 1.]])


According to [documentation](https://pytorch.org/docs/master/generated/torch.reshape.html#torch.reshape)

> Reshape() Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.



In [None]:
a = torch.arange(4.)

z = torch.reshape(a, (2, 2))
z

tensor([[0., 1.],
        [2., 3.]])

In [None]:
a.fill_(2)


tensor([2., 2., 2., 2.])

In [None]:
z

tensor([[2., 2.],
        [2., 2.]])

More about reshape vs view [here](https://discuss.pytorch.org/t/equivalent-of-np-reshape-in-pytorch/144/16) , [here](https://jdhao.github.io/2019/07/10/pytorch_view_reshape_transpose_permute/)
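
The difference shows up on a non-contiguous tensor. The sketch below uses a transposed tensor as one common way to get a non-contiguous layout; the variable names are illustrative:

```python
import torch

t = torch.arange(6).reshape(2, 3).t()  # transposing makes the tensor non-contiguous
print(t.is_contiguous())  # False

try:
    t.view(6)  # view cannot reinterpret this memory layout
except RuntimeError as err:
    print(f"view failed: {err}")

flat = t.reshape(6)  # reshape falls back to copying the data
print(flat)  # tensor([0, 3, 1, 4, 2, 5])
```

If you specifically need a view, calling `t.contiguous()` first produces a contiguous copy that `view()` accepts.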

### Stack Tensor

- If we want to stack tensors on top of each other, we can do so with torch.stack().

- It concatenates a sequence of tensors along a new dimension.

- All tensors need to be of the same size.




In [None]:
# creating tensors
x = torch.tensor([1.,3.,6.,10.])
y = torch.tensor([2.,7.,9.,13.])
  
# printing above created tensors
print("Tensor x:", x)
print("Tensor y:", y)
  
# join above tensor using "torch.stack()"
print("join tensors:")
t = torch.stack((x,y))
  
# print final tensor after join
print(t)
  

Tensor x: tensor([ 1.,  3.,  6., 10.])
Tensor y: tensor([ 2.,  7.,  9., 13.])
join tensors:
tensor([[ 1.,  3.,  6., 10.],
        [ 2.,  7.,  9., 13.]])


In [None]:
print("join tensors dimension 0:")
t = torch.stack((x,y), dim = 0)
print(t)

join tensors dimension 0:
tensor([[ 1.,  3.,  6., 10.],
        [ 2.,  7.,  9., 13.]])


In [None]:
print("join tensors dimension 1:")
t = torch.stack((x,y), dim = 1)
print(t)

join tensors dimension 1:
tensor([[ 1.,  2.],
        [ 3.,  7.],
        [ 6.,  9.],
        [10., 13.]])


> When dim=0, each input tensor becomes a row of the result (shape [2, 4] here). When dim=1, each input tensor becomes a column (shape [4, 2] here).



In [None]:
# Stacking along dim=1 places x and y side by side as columns
torch.stack((x,y), dim = 1).size()

torch.Size([4, 2])

### Adding and Removing a Single Dimension (Squeeze and Unsqueeze)

- Simply put, unsqueeze() adds a dimension of size 1 to a tensor (at the specified position), while squeeze() removes all dimensions of size 1 from a tensor.

In [None]:
tensor = torch.tensor([1, 0, 2, 3, 4])
tensor.shape # torch.Size([5])

torch.Size([5])

In [None]:
tensor.unsqueeze(dim=0).shape # [1, 5]

torch.Size([1, 5])

In [None]:
tensor.unsqueeze(dim=1).shape # [5, 1]

torch.Size([5, 1])

This is useful for passing a single sample to a network, which expects the first dimension to be the batch dimension. For an image it would be:



In [None]:
# 3 channels, 32 height, 32 width
tensor = torch.randn(3, 32, 32)
tensor.shape

torch.Size([3, 32, 32])

In [None]:
# 1 batch, 3 channels, 32 height, 32 width
tensor.unsqueeze(dim=0).shape

torch.Size([1, 3, 32, 32])

In [None]:
tensor.unsqueeze(dim=1).shape

torch.Size([3, 1, 32, 32])

- squeeze() removes all dimensions of size 1 from the tensor.

In [None]:
# 3 channels, 32 height, 32 width
tensor = torch.randn(3, 32, 32)
unsqueezed_tensor = tensor.unsqueeze(dim=0)
unsqueezed_tensor.shape

torch.Size([1, 3, 32, 32])

In [None]:
unsqueezed_tensor.squeeze().shape

torch.Size([3, 32, 32])
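
Because squeeze() with no arguments removes every size-1 dimension, it can remove more than intended. A short sketch of passing `dim` to remove only one specific dimension (the tensor `t` is illustrative):

```python
import torch

t = torch.zeros(1, 3, 1, 5)

print(t.squeeze().shape)       # torch.Size([3, 5])
print(t.squeeze(dim=0).shape)  # torch.Size([3, 1, 5])
print(t.squeeze(dim=1).shape)  # torch.Size([1, 3, 1, 5]) since dim 1 has size 3
```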

### Permute Tensor

- You can also rearrange the order of a tensor's axes with torch.permute(input, dims), which returns a view of the input with its dimensions reordered according to dims.



In [None]:
# Create tensor with specific shape
x_original = torch.rand(size=(224, 224, 3))

# Permute the original tensor to rearrange the axis order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])


> Note: Because permuting returns a view (shares the same data as the original), the values in the permuted tensor will be the same as the original tensor and if you change the values in the view, it will change the values of the original.
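
A quick way to check the shared-data behaviour described in the note is to write through the permuted view and read from the original. The value 42.0 is arbitrary:

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# Write through the view...
x_permuted[0, 0, 0] = 42.0

# ...and the original sees the same value at the matching position
print(x_original[0, 0, 0], x_permuted[0, 0, 0])
```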



## Indexing (selecting data from tensors)


  
- Indexing allows us to select data from tensors (e.g., select the first row of a tensor).

- Indexing a PyTorch tensor is similar to indexing a Python list. PyTorch tensor indexing is 0-based, i.e., the first element of the tensor has index 0.

- Syntax : tensor_name[index]
 





In [None]:
tensor = torch.tensor([2, 4, 1, 7, 0, 9])

print(tensor[0])
print(tensor[3])


tensor(2)
tensor(7)


Indexing Range : tensor_name[start_index : end_index]



In [None]:
tensor = torch.tensor([2, 4, 1, 7, 0, 9])

print(tensor[1 : 5])


tensor([4, 1, 7, 0])


In [None]:
tensor = torch.tensor([2, 4, 1, 7, 0, 9])
print(tensor[2 : ])


tensor([1, 7, 0, 9])


In [None]:
tensor = torch.tensor([[1, 2, 1], [3, 8, 4]])

print(tensor[1])
print(tensor[0])


tensor([3, 8, 4])
tensor([1, 2, 1])


In [None]:
tensor = torch.tensor([[1, 2, 1], [3, 8, 4]])

print(tensor[0][1])


tensor(2)


In [None]:
# Create a tensor 
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

Indexing values goes outer dimension -> inner dimension (check out the square brackets).



In [None]:
# Let's index bracket by bracket
print(f"First square bracket:\n{x[0]}") 
print(f"Second square bracket: {x[0][0]}") 
print(f"Third square bracket: {x[0][0][0]}")

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1
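
Instead of chaining brackets, you can also pass one index per dimension separated by commas, using `:` to mean "everything along this dimension". A small sketch on the same tensor:

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

print(x[:, 0])     # tensor([[1, 2, 3]])
print(x[:, :, 1])  # tensor([[2, 5, 8]])
print(x[:, 1, 1])  # tensor([5])
print(x[0, 0, :])  # tensor([1, 2, 3]), same as x[0][0]
```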


### PyTorch tensors & NumPy


- Since NumPy is a popular Python numerical computing library, PyTorch has functionality to interact with it nicely.

- The two main methods you'll want to use for NumPy to PyTorch (and back again) are:

   - torch.from_numpy(ndarray) - NumPy array -> PyTorch tensor.
   - torch.Tensor.numpy() - PyTorch tensor -> NumPy array.

In [None]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

> Note: By default, NumPy arrays are created with the datatype float64 and if you convert it to a PyTorch tensor, it'll keep the same datatype (as above). However, many PyTorch calculations default to using float32. So if you want to convert your NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor (float32), you can use tensor = torch.from_numpy(array).type(torch.float32).



In [None]:
tensor = torch.from_numpy(array).type(torch.float32)
tensor.dtype

torch.float32
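
One detail worth knowing: torch.from_numpy() shares memory with the source array, while a dtype conversion produces a separate copy. The sketch below illustrates this with illustrative variable names:

```python
import numpy as np
import torch

array = np.arange(1.0, 4.0)
shared = torch.from_numpy(array)                      # shares memory with array
copied = torch.from_numpy(array).type(torch.float32)  # dtype change makes a copy

array += 1  # in-place update of the NumPy array

print(array)   # [2. 3. 4.]
print(shared)  # tensor([2., 3., 4.], dtype=torch.float64)
print(copied)  # tensor([1., 2., 3.])
```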

### Running tensors on GPUs (and making faster computations)


- Deep learning algorithms require a lot of numerical operations, and by default these operations run on the CPU (central processing unit).

- However, there's another common piece of hardware called a GPU (graphics processing unit), which is often much faster than a CPU at performing the specific types of operations neural networks need (matrix multiplications).





![](./GPU.png)

> How can I get GPU?

- Buy hardware (NVIDIA). Guide for buying a [GPU](https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/)
- Cloud computing (AWS, GCP, Azure, etc.)

- Free: Colab, Kaggle


##### CUDA

In [None]:
# check if CUDA is available
torch.cuda.is_available()

False

In [None]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

In [None]:
torch.cuda.device_count() # count number of GPUs


0

#### MPS: Apple's Metal Performance Shaders (MPS) as the backend for PyTorch

In [None]:
# Check PyTorch has access to MPS (Metal Performance Shader, Apple's GPU architecture)
torch.backends.mps.is_available()

True

In [None]:

# Set the device      
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")


Using device: mps
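
The CUDA and MPS checks can be combined so the same notebook runs on any machine. This is only a sketch: `pick_device` is a hypothetical helper, not part of PyTorch, and the preference order (CUDA, then MPS, then CPU) is an assumption:

```python
import torch

def pick_device() -> str:
    """Return the name of the best available device (hypothetical helper)."""
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds, so check for it first
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
x = torch.rand(3, 4).to(device)
print(device, x.device)
```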


###  Putting tensors (and models) on the GPU



- You can put tensors (and models, we'll see this later) on a specific device by calling to(device) on them. 

In [None]:
import torch

# Set the device
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Create data and send it to the device
x = torch.rand(size=(3, 4)).to(device)
x

tensor([[0.8095, 0.0784, 0.3189, 0.0733],
        [0.7410, 0.3840, 0.9461, 0.1013],
        [0.1974, 0.8245, 0.5863, 0.9017]], device='mps:0')

In [None]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3]) cpu


tensor([1, 2, 3], device='mps:0')

> Notice the second tensor has device='mps:0', which means it's stored on the 0th GPU available. GPUs are 0-indexed: on a machine with two CUDA GPUs they would be 'cuda:0' and 'cuda:1' respectively, up to 'cuda:n'.

### Moving tensors back to the CPU


- We will use the .cpu() method (equivalent to .to('cpu')) to move tensors back to the CPU.

- If this object is already in CPU memory and on the correct device, then no copy is performed and the original object is returned.




In [None]:
tensor_on_gpu #on GPU

tensor([1, 2, 3], device='mps:0')

In [None]:
# Copy the tensor back to the CPU
tensor_back_on_cpu = tensor_on_gpu.cpu()
tensor_back_on_cpu

tensor([1, 2, 3])

In [None]:
# Copy the tensor back to the CPU, then convert it to a NumPy array
tensor_back_to_numpy = tensor_on_gpu.cpu().numpy()
tensor_back_to_numpy

array([1, 2, 3])

## More on Tensors

- PyTorch beginner tutorial [here](https://pytorch.org/tutorials/beginner/basics/intro.html). This is a great tutorial for learning about PyTorch, especially the Quickstart and Tensors sections.
  
- Learn more about Tensor representations [here](https://www.youtube.com/watch?v=f5liqUk0ZTw&ab_channel=DanFleisch)


## Exercises

The best way to master a topic is to solve problems. Here are some warm-up exercises. Many of the problems will require going through the official documentation and finding helpful functions.


1. Create a 2D tensor and then add a dimension of size 1 inserted at dimension 0.
   
2. Remove the extra dimension you just added to the previous tensor.
   
3. Create a random tensor of shape 5x3 in the interval [3, 7)
   
4. Create a tensor with values from a normal distribution (mean=0, std=1).
   
5. Retrieve the indexes of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).
   
6. Create a random tensor of size (3,1) and then horizontally stack four copies together.

7. Return the batch matrix-matrix product of two three-dimensional matrices
(a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

8. Return the batch matrix-matrix product of a 3D matrix and a 2D matrix
(a=torch.rand(3,4,5), b=torch.rand(5,4)).

##  Indexing, Slicing, and Joining

- PyTorch’s indexing and slicing is similar to NumPy’s.

### Special Tensor initializations

We can create a vector of incremental numbers

In [None]:
x = torch.arange(0, 10)
print(x)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


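Sometimes it's useful to have an integer-based arange for indexing. With integer arguments, torch.arange already returns an int64 (long) tensor, so .long() here simply makes the dtype explicit.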

In [None]:
x = torch.arange(0, 10).long()
print(x)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


## Operations

Using tensors to do linear algebra is a foundation of modern deep learning practice.

Reshaping lets you rearrange the numbers in a tensor into a new shape while preserving their order. In PyTorch, one way to reshape is `view`.

In [None]:
x = torch.arange(0, 20)

print(x.view(1, 20))
print(x.view(2, 10))
print(x.view(4, 5))
print(x.view(5, 4))
print(x.view(10, 2))
print(x.view(20, 1))

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
         18, 19]])
tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]])
tensor([[ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [10],
        [11],
        [12],
        [13],
        [14],
        [15],
        [16],
        [17],
        [18],
        [19]])


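We can use view to add size-1 dimensions, which can be useful for combining with other tensors. When two tensors are combined, their size-1 dimensions are automatically expanded to match the other tensor's shape. This is called broadcasting.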

In [None]:
x = torch.arange(12).view(3, 4)
y = torch.arange(4).view(1, 4)
z = torch.arange(3).view(3, 1)

print(x)
print(y)
print(z)
print(x + y)
print(x + z)

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[0, 1, 2, 3]])
tensor([[0],
        [1],
        [2]])
tensor([[ 0,  2,  4,  6],
        [ 4,  6,  8, 10],
        [ 8, 10, 12, 14]])
tensor([[ 0,  1,  2,  3],
        [ 5,  6,  7,  8],
        [10, 11, 12, 13]])
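
Broadcasting also works when both operands have a size-1 dimension in different positions. A minimal sketch, with illustrative names `col` and `row`:

```python
import torch

col = torch.arange(3).view(3, 1)  # shape [3, 1]
row = torch.arange(4).view(1, 4)  # shape [1, 4]

# Both size-1 dimensions are expanded, giving a [3, 4] result
table = col + row
print(table.shape)  # torch.Size([3, 4])
print(table)
# tensor([[0, 1, 2, 3],
#         [1, 2, 3, 4],
#         [2, 3, 4, 5]])
```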


Unsqueeze and squeeze add and remove size-1 dimensions.

In [None]:
x = torch.arange(12).view(3, 4)
print(x.shape)

x = x.unsqueeze(dim=1)
print(x.shape)

x = x.squeeze()
print(x.shape)

torch.Size([3, 4])
torch.Size([3, 1, 4])
torch.Size([3, 4])


All of the standard mathematical operations apply (such as `add` below):

In [None]:
x = torch.rand(3,4)
print("x: \n", x)
print("--")
print("torch.add(x, x): \n", torch.add(x, x))
print("--")
print("x+x: \n", x + x)

x: 
 tensor([[0.6662, 0.3343, 0.7893, 0.3216],
        [0.5247, 0.6688, 0.8436, 0.4265],
        [0.9561, 0.0770, 0.4108, 0.0014]])
--
torch.add(x, x): 
 tensor([[1.3324, 0.6686, 1.5786, 0.6433],
        [1.0494, 1.3377, 1.6872, 0.8530],
        [1.9123, 0.1540, 0.8216, 0.0028]])
--
x+x: 
 tensor([[1.3324, 0.6686, 1.5786, 0.6433],
        [1.0494, 1.3377, 1.6872, 0.8530],
        [1.9123, 0.1540, 0.8216, 0.0028]])


The convention of `_` indicating in-place operations continues:

In [None]:
x = torch.arange(12).reshape(3, 4)
print(x)
print(x.add_(x))

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[ 0,  2,  4,  6],
        [ 8, 10, 12, 14],
        [16, 18, 20, 22]])


Many operations reduce a dimension, such as sum:

In [None]:
x = torch.arange(12).reshape(3, 4)
print("x: \n", x)
print("---")
print("Summing across rows (dim=0): \n", x.sum(dim=0))
print("---")
print("Summing across columns (dim=1): \n", x.sum(dim=1))

x: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
---
Summing across rows (dim=0): 
 tensor([12, 15, 18, 21])
---
Summing across columns (dim=1): 
 tensor([ 6, 22, 38])
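
If you want the reduced result to broadcast back against the original tensor, `keepdim=True` keeps the reduced dimension with size 1. A sketch that normalizes each row to sum to 1 (the values are chosen only to avoid a zero row sum):

```python
import torch

x = torch.arange(1, 13).reshape(3, 4).float()

row_sums = x.sum(dim=1, keepdim=True)  # shape [3, 1] instead of [3]
print(row_sums.shape)  # torch.Size([3, 1])

# The kept dimension lets the division broadcast row by row
normalized = x / row_sums
print(normalized.sum(dim=1))  # each row now sums to 1 (up to floating-point error)
```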


#### Indexing, Slicing, Joining and Mutating

In [None]:
x = torch.arange(6).view(2, 3)
print("x: \n", x)
print("---")
print("x[:2, :2]: \n", x[:2, :2])
print("---")
print("x[0][1]: \n", x[0][1])
print("---")
print("Setting [0][1] to be 8")
x[0][1] = 8
print(x)

x: 
 tensor([[0, 1, 2],
        [3, 4, 5]])
---
x[:2, :2]: 
 tensor([[0, 1],
        [3, 4]])
---
x[0][1]: 
 tensor(1)
---
Setting [0][1] to be 8
tensor([[0, 8, 2],
        [3, 4, 5]])


We can select a subset of a tensor using the `index_select`

In [None]:
x = torch.arange(9).view(3,3)
print(x)

print("---")
indices = torch.LongTensor([0, 2])
print(torch.index_select(x, dim=0, index=indices))

print("---")
indices = torch.LongTensor([0, 2])
print(torch.index_select(x, dim=1, index=indices))

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
---
tensor([[0, 1, 2],
        [6, 7, 8]])
---
tensor([[0, 2],
        [3, 5],
        [6, 8]])


We can also use numpy-style advanced indexing:

In [None]:
x = torch.arange(9).view(3,3)
indices = torch.LongTensor([0, 2])

print(x[indices])
print("---")
print(x[indices, :])
print("---")
print(x[:, indices])

tensor([[0, 1, 2],
        [6, 7, 8]])
---
tensor([[0, 1, 2],
        [6, 7, 8]])
---
tensor([[0, 2],
        [3, 5],
        [6, 8]])
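
When index tensors are given for more than one dimension, they are paired up element by element rather than selecting whole rows and columns. A small sketch with illustrative index names:

```python
import torch

x = torch.arange(9).view(3, 3)
row_indices = torch.tensor([0, 2])
col_indices = torch.tensor([1, 0])

# Pairs the indices up: picks x[0, 1] and x[2, 0]
print(x[row_indices, col_indices])  # tensor([1, 6])
```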


We can combine tensors by concatenating them along the rows (dim=0) or the columns (dim=1), or by stacking them along a new dimension:

In [None]:
x = torch.arange(6).view(2,3)
describe(x)
describe(torch.cat([x, x], dim=0))
describe(torch.cat([x, x], dim=1))
describe(torch.stack([x, x]))

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([4, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 6])
Values: 
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 2, 3])
Values: 
tensor([[[0, 1, 2],
         [3, 4, 5]],

        [[0, 1, 2],
         [3, 4, 5]]])


We can concatenate along dimension 1, the columns:

In [None]:
x = torch.arange(9).view(3,3)

print(x)
print("---")
new_x = torch.cat([x, x, x], dim=1)
print(new_x.shape)
print(new_x)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
---
torch.Size([3, 9])
tensor([[0, 1, 2, 0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5, 3, 4, 5],
        [6, 7, 8, 6, 7, 8, 6, 7, 8]])


We can also concatenate on a new 0th dimension to "stack" the tensors:

In [None]:
x = torch.arange(9).view(3,3)
print(x)
print("---")
new_x = torch.stack([x, x, x])
print(new_x.shape)
print(new_x)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
---
torch.Size([3, 3, 3])
tensor([[[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]],

        [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]],

        [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]])


#### Linear Algebra Tensor Functions

Transposing allows you to swap two dimensions of a tensor. For a matrix, this turns all the rows into columns and vice versa.

In [None]:
x = torch.arange(0, 12).view(3,4)
print("x: \n", x) 
print("---")
print("x.transpose(1, 0): \n", x.transpose(1, 0))

x: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
---
x.transpose(1, 0): 
 tensor([[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]])


A three dimensional tensor would represent a batch of sequences, where each sequence item has a feature vector.  It is common to switch the batch and sequence dimensions so that we can more easily index the sequence in a sequence model. 

Note: transpose only lets you swap two axes. Permute (in the next cell) lets you reorder any number of axes at once.

In [None]:
batch_size = 3
seq_size = 4
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.transpose(1, 0).shape: \n", x.transpose(1, 0).shape)
print("x.transpose(1, 0): \n", x.transpose(1, 0))

x.shape: 
 torch.Size([3, 4, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29],
         [30, 31, 32, 33, 34],
         [35, 36, 37, 38, 39]],

        [[40, 41, 42, 43, 44],
         [45, 46, 47, 48, 49],
         [50, 51, 52, 53, 54],
         [55, 56, 57, 58, 59]]])
-----
x.transpose(1, 0).shape: 
 torch.Size([4, 3, 5])
x.transpose(1, 0): 
 tensor([[[ 0,  1,  2,  3,  4],
         [20, 21, 22, 23, 24],
         [40, 41, 42, 43, 44]],

        [[ 5,  6,  7,  8,  9],
         [25, 26, 27, 28, 29],
         [45, 46, 47, 48, 49]],

        [[10, 11, 12, 13, 14],
         [30, 31, 32, 33, 34],
         [50, 51, 52, 53, 54]],

        [[15, 16, 17, 18, 19],
         [35, 36, 37, 38, 39],
         [55, 56, 57, 58, 59]]])


Permute is a more general version of transpose:

In [None]:
batch_size = 3
seq_size = 4
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.permute(1, 0, 2).shape: \n", x.permute(1, 0, 2).shape)
print("x.permute(1, 0, 2): \n", x.permute(1, 0, 2))

x.shape: 
 torch.Size([3, 4, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29],
         [30, 31, 32, 33, 34],
         [35, 36, 37, 38, 39]],

        [[40, 41, 42, 43, 44],
         [45, 46, 47, 48, 49],
         [50, 51, 52, 53, 54],
         [55, 56, 57, 58, 59]]])
-----
x.permute(1, 0, 2).shape: 
 torch.Size([4, 3, 5])
x.permute(1, 0, 2): 
 tensor([[[ 0,  1,  2,  3,  4],
         [20, 21, 22, 23, 24],
         [40, 41, 42, 43, 44]],

        [[ 5,  6,  7,  8,  9],
         [25, 26, 27, 28, 29],
         [45, 46, 47, 48, 49]],

        [[10, 11, 12, 13, 14],
         [30, 31, 32, 33, 34],
         [50, 51, 52, 53, 54]],

        [[15, 16, 17, 18, 19],
         [35, 36, 37, 38, 39],
         [55, 56, 57, 58, 59]]])


Matrix multiplication is `mm`:

In [None]:
torch.randn(2, 3, requires_grad=True)

tensor([[-0.4790,  0.8539, -0.2285],
        [ 0.3081,  1.1171,  0.1585]], requires_grad=True)

In [None]:
x1 = torch.arange(6).view(2, 3).float()
describe(x1)

x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)

describe(torch.mm(x1, x2))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 1., 2.],
        [3., 4., 5.]])
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values: 
tensor([[1., 2.],
        [1., 2.],
        [1., 2.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[ 3.,  6.],
        [12., 24.]])


In [None]:
x = torch.arange(0, 12).view(3,4).float()
print(x)

x2 = torch.ones(4, 2)
x2[:, 1] += 1
print(x2)

print(x.mm(x2))

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
tensor([[1., 2.],
        [1., 2.],
        [1., 2.],
        [1., 2.]])
tensor([[ 6., 12.],
        [22., 44.],
        [38., 76.]])
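
For two 2D tensors, `torch.matmul` and the `@` operator give the same result as `mm`. A short sketch reusing the tensors from the cell above:

```python
import torch

x = torch.arange(0, 12).view(3, 4).float()
x2 = torch.ones(4, 2)
x2[:, 1] += 1

via_mm = torch.mm(x, x2)
via_matmul = torch.matmul(x, x2)
via_operator = x @ x2

print(torch.allclose(via_mm, via_matmul), torch.allclose(via_mm, via_operator))  # True True
```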


See the [PyTorch Math Operations Documentation](https://pytorch.org/docs/stable/torch.html#math-operations) for more!

## Computing Gradients

In [None]:
x = torch.tensor([[2.0, 3.0]], requires_grad=True)
z = 3 * x
print(z)

tensor([[6., 9.]], grad_fn=<MulBackward0>)


In this small snippet, you can see the gradient computations at work. We create a tensor and multiply it by 3. Then, we create a scalar output using `sum()`. A scalar output is needed to serve as the loss variable. Calling `backward` on the loss computes its rate of change with respect to the inputs. Since the scalar was created with `sum`, each position in z and x is independent with respect to the loss scalar.

The rate of change of the output with respect to x is just the constant 3 that we multiplied x by.

In [None]:
x = torch.tensor([[2.0, 3.0]], requires_grad=True)
print("x: \n", x)
print("---")
z = 3 * x
print("z = 3*x: \n", z)
print("---")

loss = z.sum()
print("loss = z.sum(): \n", loss)
print("---")

loss.backward()

print("after loss.backward(), x.grad: \n", x.grad)


x: 
 tensor([[2., 3.]], requires_grad=True)
---
z = 3*x: 
 tensor([[6., 9.]], grad_fn=<MulBackward0>)
---
loss = z.sum(): 
 tensor(15., grad_fn=<SumBackward0>)
---
after loss.backward(), x.grad: 
 tensor([[3., 3.]])
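
As a further check, here is a sketch with a non-constant gradient, followed by a reminder that gradients accumulate across `backward` calls. The name `first_grad` is illustrative:

```python
import torch

x = torch.tensor([[2.0, 3.0]], requires_grad=True)

loss = (x ** 2).sum()  # d(loss)/dx = 2 * x
loss.backward()
first_grad = x.grad.clone()
print(first_grad)  # tensor([[4., 6.]])

# Gradients accumulate across backward calls, so reset them before reusing x
x.grad.zero_()
loss = (3 * x).sum()
loss.backward()
print(x.grad)  # tensor([[3., 3.]])
```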


### Example: Computing a conditional gradient

Find the gradient of $f(x)$ at $x=1$, where

$$ f(x)=\begin{cases}
    \sin(x) & \text{if } x>0 \\
    \cos(x) & \text{otherwise}
\end{cases} $$

In [None]:
def f(x):
    if (x.data > 0).all():
        return torch.sin(x)
    else:
        return torch.cos(x)

In [None]:
x = torch.tensor([1.0], requires_grad=True)
y = f(x)
y.backward()
print(x.grad)

tensor([0.5403])


We could apply this to a larger vector too, but we need to make sure the output is a scalar:

In [None]:
x = torch.tensor([1.0, 0.5], requires_grad=True)
y = f(x)
# this is meant to break!
y.backward()
print(x.grad)

RuntimeError: grad can be implicitly created only for scalar outputs

Making the output a scalar:

In [None]:
x = torch.tensor([1.0, 0.5], requires_grad=True)
y = f(x)
y.sum().backward()
print(x.grad)

tensor([0.5403, 0.8776])


But there is an issue: the result isn't right when the vector mixes positive and non-positive values:

In [None]:
x = torch.tensor([1.0, -1], requires_grad=True)
y = f(x)
y.sum().backward()
print(x.grad)

tensor([-0.8415,  0.8415])


In [None]:
x = torch.tensor([-0.5, -1], requires_grad=True)
y = f(x)
y.sum().backward()
print(x.grad)

tensor([0.4794, 0.8415])


This is because we aren't doing the boolean computation and subsequent application of cos and sin on an elementwise basis.  So, to solve this, it is common to use masking:

In [None]:
def f2(x):
    mask = torch.gt(x, 0).float()
    return mask * torch.sin(x) + (1 - mask) * torch.cos(x)

x = torch.tensor([1.0, -1], requires_grad=True)
y = f2(x)
y.sum().backward()
print(x.grad)

tensor([0.5403, 0.8415])
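
An alternative to multiplying by a mask is `torch.where`, which selects between two tensors element by element. The function `f3` below is a hypothetical variant of `f2`, shown only as a sketch:

```python
import torch

def f3(x):
    # Hypothetical variant of f2: torch.where selects sin or cos per element
    return torch.where(x > 0, torch.sin(x), torch.cos(x))

x = torch.tensor([1.0, -1.0], requires_grad=True)
y = f3(x)
y.sum().backward()
print(x.grad)  # tensor([0.5403, 0.8415])
```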


In [None]:
def describe_grad(x):
    if x.grad is None:
        print("No gradient information")
    else:
        print("Gradient: \n{}".format(x.grad))
        print("Gradient Function: {}".format(x.grad_fn))

In [None]:
import torch
x = torch.ones(2, 2, requires_grad=True)
describe(x)
describe_grad(x)
print("--------")

y = (x + 2) * (x + 5) + 3
describe(y)
z = y.mean()
describe(z)
describe_grad(x)
print("--------")
z.backward(create_graph=True, retain_graph=True)
describe_grad(x)
print("--------")


Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
No gradient information
--------
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[21., 21.],
        [21., 21.]], grad_fn=<AddBackward0>)
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values: 
21.0
No gradient information
--------
Gradient: 
tensor([[2.2500, 2.2500],
        [2.2500, 2.2500]], grad_fn=<CloneBackward>)
Gradient Function: None
--------


In [None]:
x = torch.ones(2, 2, requires_grad=True)

In [None]:
y = x + 2

In [None]:
y.grad_fn

<AddBackward0 at 0x7f35ea134940>

### CUDA Tensors

PyTorch's operations can seamlessly be used on the GPU or on the CPU.  There are a couple of basic operations for interacting in this way.

In [None]:
print(torch.cuda.is_available())

True


In [None]:
x = torch.rand(3,3)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.9149, 0.3993, 0.1100],
        [0.2541, 0.4333, 0.4451],
        [0.4966, 0.7865, 0.6604]])


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [None]:
x = torch.rand(3, 3).to(device)
describe(x)
print(x.device)

Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.1303, 0.3498, 0.3824],
        [0.8043, 0.3186, 0.2908],
        [0.4196, 0.3728, 0.3769]], device='cuda:0')
cuda:0


In [None]:
cpu_device = torch.device("cpu")

In [None]:
# this will break!
y = torch.rand(3, 3)
x + y

RuntimeError: expected type torch.cuda.FloatTensor but got torch.FloatTensor

In [None]:
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y

tensor([[0.8394, 0.5273, 0.8267],
        [0.9273, 1.2824, 1.0603],
        [0.4574, 0.5968, 1.0541]])

In [None]:
if torch.cuda.is_available(): # only if a GPU is available
    a = torch.rand(3,3).to(device='cuda:0') #  CUDA Tensor
    print(a)
    
    b = torch.rand(3,3).cuda()
    print(b)

    print(a + b)

    a = a.cpu() # move a to the CPU; b is still on the GPU
    print(a + b) # error expected: the tensors are on different devices

tensor([[0.5274, 0.6325, 0.0910],
        [0.2323, 0.7269, 0.1187],
        [0.3951, 0.7199, 0.7595]], device='cuda:0')
tensor([[0.5311, 0.6449, 0.7224],
        [0.4416, 0.3634, 0.8818],
        [0.9874, 0.7316, 0.2814]], device='cuda:0')
tensor([[1.0585, 1.2775, 0.8134],
        [0.6739, 1.0903, 1.0006],
        [1.3825, 1.4515, 1.0409]], device='cuda:0')


RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor

### Exercises

Some of these exercises require operations not covered in the notebook.  You will have to look at [the documentation](https://pytorch.org/docs/) (on purpose!)


(Answers are at the bottom)

#### Exercise 1

Create a 2D tensor and then add a dimension of size 1 inserted at the 0th axis.

#### Exercise 2

Remove the extra dimension you just added to the previous tensor.

#### Exercise 3

Create a random tensor of shape 5x3 in the interval [3, 7)

#### Exercise 4

Create a tensor with values from a normal distribution (mean=0, std=1).

#### Exercise 5

Retrieve the indexes of all the non zero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).

#### Exercise 6

Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

#### Exercise 7

Return the batch matrix-matrix product of two 3 dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

#### Exercise 8

Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).

Answers below

Answers still below.. Keep Going

#### Exercise 1

Create a 2D tensor and then add a dimension of size 1 inserted at the 0th axis.

In [None]:
a = torch.rand(3,3)
a = a.unsqueeze(0)
print(a)
print(a.shape)

#### Exercise 2 

Remove the extra dimension you just added to the previous tensor.

In [None]:
a = a.squeeze(0)
print(a.shape)

#### Exercise 3

Create a random tensor of shape 5x3 in the interval [3, 7)

In [None]:
3 + torch.rand(5, 3) * 4

#### Exercise 4

Create a tensor with values from a normal distribution (mean=0, std=1).

In [None]:
a = torch.rand(3,3)
a.normal_(mean=0, std=1)

#### Exercise 5

Retrieve the indexes of all the non zero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).

In [None]:
a = torch.Tensor([1, 1, 1, 0, 1])
torch.nonzero(a)

#### Exercise 6

Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

In [None]:
a = torch.rand(3,1)
a.expand(3,4)
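
Note that `expand` returns a view rather than real copies. If separate copies are needed, `torch.cat` or `Tensor.repeat` are possible alternatives; the sketch below compares them with illustrative variable names:

```python
import torch

a = torch.rand(3, 1)

expanded = a.expand(3, 4)            # a view: no data is copied
stacked = torch.cat([a] * 4, dim=1)  # four real copies joined along the columns
repeated = a.repeat(1, 4)            # same result as the cat call

print(stacked.shape)  # torch.Size([3, 4])
print(torch.equal(expanded, stacked), torch.equal(stacked, repeated))  # True True
```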

#### Exercise 7

Return the batch matrix-matrix product of two 3 dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

In [None]:
a = torch.rand(3,4,5)
b = torch.rand(3,5,4)
torch.bmm(a, b)

#### Exercise 8

Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).

In [None]:
a = torch.rand(3,4,5)
b = torch.rand(5,4)
torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))
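
As an alternative sketch, `torch.matmul` broadcasts the 2D matrix across the batch dimension, so the explicit `unsqueeze` and `expand` can be skipped:

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

via_bmm = torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))
via_matmul = torch.matmul(a, b)  # b is broadcast across the batch dimension

print(via_matmul.shape)  # torch.Size([3, 4, 4])
print(torch.allclose(via_bmm, via_matmul, atol=1e-6))
```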

### END