# CSE447: Project 0, Python and Pytorch Tutorial + Review (Autumn '24)

Author: Kabir Ahuja

Thanks to Kavel Rao for feedback!


In this notebook we will review Python programming and give a basic introduction to PyTorch. We assume all students are familiar with Python, so the notebook mainly serves as a review and practice of important Python concepts. No familiarity with PyTorch is assumed. In this tutorial we focus only on the very basics, i.e. tensor operations. More advanced usage of PyTorch will be covered in the upcoming projects.

The Python tutorial is adapted from [Stanford's CS231N Python Numpy Tutorial](https://cs231n.github.io/python-numpy-tutorial/) originally written by [Justin Johnson](https://web.eecs.umich.edu/~justincj/) and updated for Colab by Kevin Zakka. It is compatible only with **Python 3**.

The Pytorch tutorial is adapted from Stanford's [CS224N PyTorch Tutorial](https://colab.research.google.com/drive/13HGy3-uIIy1KD_WFhG4nVrxJC-3nUUkP?usp=sharing) by Dilara Soylu and Ethan Chi and [UW's CSE 517 (Autumn'24) Assignment 0](https://arc.net/l/quote/pxssjbrt) by Yegor Kuznetsov.

The tutorials are accompanied by 4 exercises in total, which were created originally for this course.

## Part 1: Python Fundamentals

This part is adapted from Stanford's CS231n Python tutorial.

In this part of the assignment, we will cover Basic Python:

* Basic Data Types
* Functions
* Containers (data structures): Lists, Dictionaries, Sets, Tuples
* Classes

### Basic data types

#### Numbers

Integers and floats work as you would expect from other languages:

In [60]:
x = 3
print(x, type(x))

3 <class 'int'>


In [61]:
print(x + 1)  # Addition
print(x - 1)  # Subtraction
print(x * 2)  # Multiplication
print(x**2)  # Exponentiation

4
2
6
9


In [62]:
x += 1
print(x)
x *= 2
print(x)

4
8


Note that unlike many languages, Python does not have unary increment (x++) or decrement (x--) operators.

Python integers have arbitrary precision (Python 3 has no separate long integer type), and Python also has a built-in type for complex numbers. You can find all of the details in the [documentation](https://docs.python.org/3.7/library/stdtypes.html#numeric-types-int-float-long-complex).
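The following quick check illustrates these points. It is a sketch, and the values are chosen only for illustration. Python `int` values grow as needed instead of overflowing, and `complex` is built in. For division, `/` always performs true division, while `//` floors the result.

```python
big = 2**100  # no overflow: Python ints have arbitrary precision
print(big, type(big))  # 1267650600228229401496703205376 <class 'int'>

z = 3 + 4j  # complex literal; same as complex(3, 4)
print(z.real, z.imag, abs(z))  # 3.0 4.0 5.0

print(7 / 2)  # true division always returns a float: 3.5
print(7 // 2)  # floor division: 3
print(-7 // 2)  # floors toward negative infinity: -4
print(7 % 2)  # remainder: 1
```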

#### Booleans

Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (`&&`, `||`, etc.):

In [63]:
t, f = True, False
print(type(t))

<class 'bool'>


Now let's look at the operations:

In [64]:
print(t and f)  # Logical AND;
print(t or f)  # Logical OR;
print(not t)  # Logical NOT;
print(t != f)  # Logical XOR;

False
True
False
True


#### Strings

In [65]:
hello = 'hello'  # String literals can use single quotes
world = "world"  # or double quotes; it does not matter
print(hello, len(hello))

hello 5


In [66]:
hw = hello + " " + world  # String concatenation
print(hw)

hello world


In [67]:
hw12 = "{} {} {}".format(hello, world, 12)  # string formatting
print(hw12)

hello world 12
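Since Python 3.6, f-strings are a common alternative to `str.format`. Expressions inside `{}` are evaluated in place, and both styles accept the same format specifications after a colon. A minimal sketch:

```python
hello, world = "hello", "world"
hw12 = f"{hello} {world} {12}"  # f-string: expressions inside {} are evaluated
print(hw12)  # hello world 12

pi = 3.14159
print(f"{pi:.2f}")  # format spec after the colon: 3.14
print("{:.2f}".format(pi))  # the same spec works with str.format: 3.14
```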


String objects have a bunch of useful methods; for example:

In [68]:
s = "hello"
print(s.capitalize())  # Capitalize a string
print(s.upper())  # Convert a string to uppercase; prints "HELLO"
print(s.rjust(7))  # Right-justify a string, padding with spaces
print(s.center(7))  # Center a string, padding with spaces
print(s.replace("l", "(ell)"))  # Replace all instances of one substring with another
print("  world ".strip())  # Strip leading and trailing whitespace

Hello
HELLO
  hello
 hello 
he(ell)(ell)o
world


You can find a list of all string methods in the [documentation](https://docs.python.org/3.7/library/stdtypes.html#string-methods).
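Two more methods, `split` and `join`, come up constantly in text processing, including in the exercises below. The sketch below shows how they behave. Note that calling `split()` with no argument behaves differently from `split(" ")`:

```python
s = "  the quick   brown fox  "
words = s.split()  # no argument: split on any run of whitespace, dropping empty strings
print(words)  # ['the', 'quick', 'brown', 'fox']
print(s.split(" ")[:3])  # explicit separator keeps empty strings: ['', '', 'the']
print("-".join(words))  # the-quick-brown-fox
print("fox" in s, s.count("o"))  # substring test and count: True 2
```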

### Functions

Python functions are defined using the `def` keyword. For example:

In [69]:
def sign(x):
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"


print(sign(3))
print(sign(-3))
print(sign(0))

positive
negative
zero


We will often define functions to take optional keyword arguments, like this:

In [70]:
def hello(name, loud=False):
    if loud:
        print("HELLO, {}".format(name.upper()))
    else:
        print("Hello, {}!".format(name))


hello("Bob")
hello("Fred", loud=True)

Hello, Bob!
HELLO, FRED


### Exercise 1: Basic Text Processing Using String Methods (1 Point)

You will now use what we just learned to implement some basic text processing methods in Python. Text processing is very handy when training machine learning models on natural language data. Since it is typically done before training the models, it is also commonly referred to as *preprocessing*. More advanced preprocessing techniques will be covered in the lectures and upcoming projects. For this exercise, you will perform the following basic text processing steps:

- convert the text to lower case
- remove extra spaces (leading, trailing, or in between the words)
- remove punctuation

Note that more advanced libraries, like regular expressions (`re`) or the Natural Language Toolkit (NLTK), can also be used to implement these methods. However, these basic operations are straightforward to carry out with string methods, which is what we recommend you use.

**Important Note: Remove `raise NotImplementedError()` when you write your code in the functions**

In [71]:
def to_lowercase(text: str) -> str:
    """
    Convert a string to lowercase

    E.g. "Hello" -> "hello"

    Input:
        - text: string
    Output:
        - string

    """
    # YOUR CODE HERE
    lowercase_text = text.lower()
    return lowercase_text


def remove_extra_spaces(text: str) -> str:
    """
    Remove extra spaces from a string.

    E.g.: "  This is a    string   " -> "This is a string"

    Input:
        - text: string
    Output:
        - string

    """
    text_without_extra_spaces = text

    # YOUR CODE HERE
    text_without_extra_spaces = " ".join(text.split())

    return text_without_extra_spaces

def remove_punctuations(text):
    """
    Remove punctuations from a string

    E.g.: "Hello! How are you?" -> "Hello How are you"

    Input:
        - text: string
    Output:
        - string

    """
    # we can get a list of punctuations from the string module
    import string
    punctuations = string.punctuation

    text_without_punctuations = text

    # YOUR CODE HERE
    for p in punctuations:
        text_without_punctuations = text_without_punctuations.replace(p, "")

    return text_without_punctuations


def text_processing(text):
    """
    Process a text by converting it to lowercase, removing extra spaces and removing punctuations

    Input:
        - text: string
    Output:
        - string

    """
    processed_text = None

    # YOUR CODE HERE
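    # remove punctuation before collapsing spaces, so that e.g. "a - b" becomes "a b" rather than "a  b"
    processed_text = remove_extra_spaces(remove_punctuations(to_lowercase(text)))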

    return processed_text

In [72]:
def test_text_processing():
    assert text_processing("  This is a    string   ") == "this is a string"
    assert text_processing("Hello! How are you?") == "hello how are you"
    assert text_processing("  This is a    string   with punctuations! ") == "this is a string with punctuations"
    print("All tests pass")

test_text_processing()

All tests pass
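As an aside, the character-by-character `replace` loop in `remove_punctuations` can also be written with `str.maketrans` and `str.translate`, which delete all punctuation in a single pass. The function name below is hypothetical and only for illustration:

```python
import string


def remove_punctuations_translate(text: str) -> str:
    # maketrans("", "", chars) builds a table mapping every char in `chars` to None,
    # so translate() deletes them in one pass over the string
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)


print(remove_punctuations_translate("Hello! How are you?"))  # Hello How are you
print(repr(remove_punctuations_translate("a - b")))  # 'a  b' (note the double space)
```

The second call shows that deleting punctuation can leave double spaces behind. For this reason, it is safest to remove punctuation *before* collapsing whitespace.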


### Containers

Python includes several built-in container types: lists, dictionaries, sets, and tuples.

#### Lists

A list is the Python equivalent of an array, but is resizable and can contain elements of different types:

In [73]:
xs = [3, 1, 2]  # Create a list
print(xs, xs[2])
print(xs[-1])  # Negative indices count from the end of the list; prints "2"

[3, 1, 2] 2
2


In [74]:
xs[2] = "foo"  # Lists can contain elements of different types
print(xs)

[3, 1, 'foo']


In [75]:
xs.append("bar")  # Add a new element to the end of the list
print(xs)

[3, 1, 'foo', 'bar']


In [76]:
x = xs.pop()  # Remove and return the last element of the list
print(x, xs)

bar [3, 1, 'foo']


As usual, you can find all the gory details about lists in the [documentation](https://docs.python.org/3.7/tutorial/datastructures.html#more-on-lists).
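A frequent source of confusion is the difference between `sorted`, which returns a new list, and `list.sort`, which sorts in place and returns `None`. The exercises below use `sorted`. A short sketch with a few other common list methods:

```python
xs = [3, 1, 2]
ys = sorted(xs)  # returns a new sorted list; xs is unchanged
print(xs, ys)  # [3, 1, 2] [1, 2, 3]

xs.sort(reverse=True)  # sorts in place and returns None
print(xs)  # [3, 2, 1]

xs.extend([5, 4])  # append every element of another iterable
xs.insert(0, 0)  # insert the value 0 at index 0
print(xs, len(xs))  # [0, 3, 2, 1, 5, 4] 6
```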

#### Slicing

In addition to accessing list elements one at a time, Python provides concise syntax to access sublists; this is known as slicing:

In [77]:
nums = list(range(5))  # range is a built-in that yields a lazy sequence of integers; list() turns it into a list
print(nums)  # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4])  # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:])  # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2])  # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:])  # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print(nums[:-1])  # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9]  # Assign a new sublist to a slice
print(nums)  # Prints "[0, 1, 8, 9, 4]"

[0, 1, 2, 3, 4]
[2, 3]
[2, 3, 4]
[0, 1]
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
[0, 1, 8, 9, 4]
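Slices can also take an optional third value, the step. A negative step walks the list backwards. A small sketch:

```python
nums = list(range(5))
print(nums[::2])  # every second element: [0, 2, 4]
print(nums[::-1])  # negative step reverses the list: [4, 3, 2, 1, 0]
print(nums[1:4:2])  # start 1, stop 4 (exclusive), step 2: [1, 3]
print(nums[10:])  # out-of-range slices give an empty list, not an error: []
```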


#### Loops

You can loop over the elements of a list like this:

In [78]:
animals = ["cat", "dog", "monkey"]
for animal in animals:
    print(animal)

cat
dog
monkey


If you want access to the index of each element within the body of a loop, use the built-in `enumerate` function:

In [79]:
animals = ["cat", "dog", "monkey"]
for idx, animal in enumerate(animals):
    print("#{}: {}".format(idx + 1, animal))

#1: cat
#2: dog
#3: monkey
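`enumerate` combines well with `zip`, which pairs up elements from several lists, and with its `start` argument, which makes the manual `idx + 1` unnecessary. The `sounds` list below is made up for illustration:

```python
animals = ["cat", "dog", "monkey"]
sounds = ["meow", "woof", "ooh"]  # illustrative data
lines = []
# zip pairs up elements; enumerate(..., start=1) numbers them from 1
for idx, (animal, sound) in enumerate(zip(animals, sounds), start=1):
    lines.append("#{}: {} says {}".format(idx, animal, sound))
print("\n".join(lines))
```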


#### List comprehensions
When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:

In [80]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x**2)
print(squares)

[0, 1, 4, 9, 16]


You can make this code simpler using a list comprehension:

In [81]:
nums = [0, 1, 2, 3, 4]
squares = [x**2 for x in nums]
print(squares)

[0, 1, 4, 9, 16]


List comprehensions can also contain conditions:

In [82]:
nums = [0, 1, 2, 3, 4]
even_squares = [x**2 for x in nums if x % 2 == 0]
print(even_squares)

[0, 4, 16]


#### Dictionaries

A dictionary stores (key, value) pairs, similar to a `Map` in Java or an object in JavaScript. You can use it like this:

In [83]:
d = {"cat": "cute", "dog": "furry"}  # Create a new dictionary with some data
print(d["cat"])  # Get an entry from a dictionary; prints "cute"
print("cat" in d)  # Check if a dictionary has a given key; prints "True"

cute
True


In [84]:
d["fish"] = "wet"  # Set an entry in a dictionary
print(d["fish"])  # Prints "wet"

wet


In [85]:
print(d["monkey"])  # KeyError: 'monkey' not a key of d

KeyError: 'monkey'

In [None]:
print(d.get("monkey", "N/A"))  # Get an element with a default; prints "N/A"
print(d.get("fish", "N/A"))  # Get an element with a default; prints "wet"

N/A
wet


In [None]:
del d["fish"]  # Remove an element from a dictionary
print(d.get("fish", "N/A"))  # "fish" is no longer a key; prints "N/A"

N/A


You can find all you need to know about dictionaries in the [documentation](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict).
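Combining `get` with a default value gives a common idiom for counting, for example word frequencies in a text. The standard library's `collections.Counter` packages the same idea. A minimal sketch:

```python
from collections import Counter

text = "the cat sat on the mat"
counts = {}
for word in text.split():
    # get() returns 0 the first time a word is seen, so there is no KeyError
    counts[word] = counts.get(word, 0) + 1
print(counts)  # {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}

# collections.Counter builds the same counts in one call
print(Counter(text.split()).most_common(1))  # [('the', 2)]
```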

It is easy to iterate over the key-value pairs in a dictionary with `items()`:

In [None]:
d = {"person": 2, "cat": 4, "spider": 8}
for animal, legs in d.items():
    print("A {} has {} legs".format(animal, legs))

A person has 2 legs
A cat has 4 legs
A spider has 8 legs


Dictionary comprehensions: These are similar to list comprehensions, but allow you to easily construct dictionaries. For example:

In [None]:
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x**2 for x in nums if x % 2 == 0}
print(even_num_to_square)

{0: 0, 2: 4, 4: 16}


#### Sets
A set is an unordered collection of distinct elements. As a simple example, consider the following:

In [None]:
animals = {"cat", "dog"}
print("cat" in animals)  # Check if an element is in a set; prints "True"
print("fish" in animals)  # prints "False"

True
False


In [None]:
animals.add("fish")  # Add an element to a set
print("fish" in animals)
print(len(animals))  # Number of elements in a set;

True
3


In [None]:
animals.add("cat")  # Adding an element that is already in the set does nothing
print(len(animals))
animals.remove("cat")  # Remove an element from a set
print(len(animals))

3
2


_Loops_: Iterating over a set has the same syntax as iterating over a list; however since sets are unordered, you cannot make assumptions about the order in which you visit the elements of the set:

In [None]:
animals = {"cat", "dog", "fish"}
for idx, animal in enumerate(animals):
    print("#{}: {}".format(idx + 1, animal))

#1: dog
#2: cat
#3: fish


Set comprehensions: Like lists and dictionaries, we can easily construct sets using set comprehensions:

In [None]:
from math import sqrt

print({int(sqrt(x)) for x in range(30)})

{0, 1, 2, 3, 4, 5}
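Sets also support the usual set-algebra operators. These are handy, for example, for comparing the vocabularies of two corpora. Because sets are unordered, the sketch below sorts the results before printing:

```python
a = {"cat", "dog", "fish"}
b = {"dog", "bird"}
# sets are unordered, so sort before printing for a stable display
print(sorted(a | b))  # union: ['bird', 'cat', 'dog', 'fish']
print(sorted(a & b))  # intersection: ['dog']
print(sorted(a - b))  # difference: ['cat', 'fish']
print(sorted(a ^ b))  # symmetric difference: ['bird', 'cat', 'fish']
```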


#### Tuples

A tuple is an (immutable) ordered list of values. A tuple is in many ways similar to a list; one of the most important differences is that tuples can be used as keys in dictionaries and as elements of sets, while lists cannot. Here is a trivial example:

In [None]:
d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
t = (5, 6)  # Create a tuple
print(type(t))
print(d[t])
print(d[(1, 2)])

<class 'tuple'>
5
1


In [None]:
t[0] = 1

TypeError: 'tuple' object does not support item assignment
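Tuples also appear implicitly whenever a function returns several values, or when you write `t, f = True, False` as earlier in this notebook. The helper `min_max` below is hypothetical and serves only to illustrate returning and unpacking a tuple:

```python
def min_max(values):
    # returning several values actually returns a single tuple
    return min(values), max(values)


result = min_max([3, 1, 2])
print(result, type(result))  # (1, 3) <class 'tuple'>
lo, hi = result  # tuple unpacking
lo, hi = hi, lo  # swap without a temporary variable
print(lo, hi)  # 3 1
```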

### Exercise 2: Word Tokenization and Converting Tokens to IDs (1.5 Points)

When working with text data, most Machine Learning / Deep Learning algorithms treat text as a sequence of smaller units called tokens. There are different levels of granularity for what constitutes a token. In this homework we will focus on the classical approach of word tokenization, i.e. splitting the text into a sequence of words.

E.g. ``` "This is a sentence about to be word tokenized" -> ["This", "is", "a", "sentence", "about", "to", "be", "word", "tokenized"]```

Tokenization is usually followed by a step that converts the tokens into input ids. As you will learn during the course, machines have no built-in semantic knowledge of words, so learning algorithms represent each word or token by a number that identifies it (its ID). These IDs are typically assigned according to the index of a word in the vocabulary, which is a list of all unique words in the training corpus.

E.g. For a vocabulary ```vocab = ["A", "a", "cat", "mat", "on", "sits"]```, the sentence ```"A cat sits on a mat"``` with tokens ```["A", "cat", "sits", "on", "a", "mat"]``` will get converted to the list of ids ```[0, 2, 5, 4, 1, 3]```
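The example above can be reproduced in a few lines. It also explains why `"A"` gets id 0: Python compares strings by Unicode code point, so uppercase letters sort before lowercase ones. The same ordering appears in the `fit_vocab` test below. This is a sketch using the toy vocabulary from the example:

```python
vocab = ["A", "a", "cat", "mat", "on", "sits"]
# uppercase letters have smaller code points than lowercase ones, so this list is already sorted
print(sorted(vocab) == vocab)  # True

word_to_idx = {word: idx for idx, word in enumerate(vocab)}
tokens = "A cat sits on a mat".split()
ids = [word_to_idx[token] for token in tokens]
print(ids)  # [0, 2, 5, 4, 1, 3]
```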

Below you will implement word tokenization and the conversion of words into ids. Specifically, you will implement three functions:

- `word_tokenize` : Converts a text (represented as python string) into a sequence of words
- `fit_vocab`: Constructs vocabulary i.e. list of unique tokens from a corpus (a list of text documents)
- `convert_tokens_to_ids`: Converts a list of tokens to ids

You will be using functionality from the containers you just learned about, like lists, sets, and dictionaries. As in the previous exercise, we recommend avoiding external libraries for this exercise to maximize your learning of the underlying concepts.

**Important Note: Remove `raise NotImplementedError()` when you write your code in the functions**


In [86]:
def word_tokenize(text: str) -> list:
    """
    Tokenize a text into words. You can assume that no punctuations are present in the text.

    E.g.: "we all live in a yellow submarine" -> ["we", "all", "live", "in", "a", "yellow", "submarine"]

    Input:
        - text: string
    Output:
        - list of strings

    """
    words = text
    # YOUR CODE HERE
    words = text.split()
    return words

In [87]:
def test_word_tokenize():
    assert word_tokenize("we all live in a yellow submarine") == [
        "we",
        "all",
        "live",
        "in",
        "a",
        "yellow",
        "submarine",
    ]
    assert word_tokenize("This is a sentence about to be word tokenized?") == [
        "This",
        "is",
        "a",
        "sentence",
        "about",
        "to",
        "be",
        "word",
        "tokenized?",
    ]
    print("All tests pass")


test_word_tokenize()

All tests pass


In [88]:
def fit_vocab(corpus):
    """
    Creates a vocabulary from a corpus i.e. a list of documents.


    Input:
        - corpus: list of strings
    Output:
        - List of unique words in the corpus (sorted in alphabetical order)
        - Dictionary mapping each word to its index in the vocabulary

    Important: Remember to sort the vocabulary before using it to construct the dictionary mapping
    """

    vocab = None
    word_to_idx = None

    # YOUR CODE HERE
    vocab = set()
    for text in corpus:
        for word in word_tokenize(text):
            vocab.add(word)
    vocab = sorted(list(vocab))
    word_to_idx = {word: idx for idx, word in enumerate(vocab)}

    return vocab, word_to_idx

In [89]:
def test_fit_vocab():
    corpus = [
        "We all live in a yellow submarine",
        "I am the walrus",
        "Yellow Submarine",
    ]
    vocab, word_to_idx = fit_vocab(corpus)
    assert vocab == [
        "I",
        "Submarine",
        "We",
        "Yellow",
        "a",
        "all",
        "am",
        "in",
        "live",
        "submarine",
        "the",
        "walrus",
        "yellow",
    ]
    assert word_to_idx == {
        "I": 0,
        "Submarine": 1,
        "We": 2,
        "Yellow": 3,
        "a": 4,
        "all": 5,
        "am": 6,
        "in": 7,
        "live": 8,
        "submarine": 9,
        "the": 10,
        "walrus": 11,
        "yellow": 12,
    }

    print("All tests pass")


test_fit_vocab()

All tests pass


In [90]:
def convert_tokens_to_ids(tokens, word_to_idx):
    """
    Convert a list of tokens to a list of token ids using the word_to_idx dictionary.

    Input:
        - tokens: list of strings
        - word_to_idx: dictionary mapping words to their ids
    Output:
        - list of integers
    """

    token_ids = None
    # YOUR CODE HERE
    token_ids = [word_to_idx[token] for token in tokens]
    return token_ids

In [91]:
def test_convert_tokens_to_ids():
    word_to_idx = {
        "I": 0,
        "Submarine": 1,
        "We": 2,
        "Yellow": 3,
        "a": 4,
        "all": 5,
        "am": 6,
        "in": 7,
        "live": 8,
        "submarine": 9,
        "the": 10,
        "walrus": 11,
        "yellow": 12,
    }
    tokens = ["We", "all", "live", "in", "a", "yellow", "submarine"]
    assert convert_tokens_to_ids(tokens, word_to_idx) == [2, 5, 8, 7, 4, 12, 9]
    print("All tests pass")


test_convert_tokens_to_ids()

All tests pass
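As written, `convert_tokens_to_ids` raises a `KeyError` for any token that is not in the vocabulary. A common remedy, not required by this exercise, is to reserve an id for unknown words. The sketch below assumes a hypothetical `<unk>` vocabulary entry and a hypothetical `unk_id` parameter, and uses `dict.get` to fall back to it:

```python
def convert_tokens_to_ids_with_unk(tokens, word_to_idx, unk_id):
    # hypothetical variant: unseen tokens map to unk_id instead of raising KeyError
    return [word_to_idx.get(token, unk_id) for token in tokens]


word_to_idx = {"<unk>": 0, "a": 1, "cat": 2}  # toy vocabulary with a reserved unknown-word entry
ids = convert_tokens_to_ids_with_unk(["a", "dog"], word_to_idx, unk_id=word_to_idx["<unk>"])
print(ids)  # [1, 0]
```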


### Classes

The syntax for defining classes in Python is straightforward:

In [92]:
class Greeter:

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print("HELLO, {}".format(self.name.upper()))
        else:
            print("Hello, {}!".format(self.name))


g = Greeter("Fred")  # Construct an instance of the Greeter class
g.greet()  # Call an instance method; prints "Hello, Fred!"
g.greet(loud=True)  # Call an instance method; prints "HELLO, FRED"

Hello, Fred!
HELLO, FRED


### Exercise 3: Implementing the Full Tokenization Pipeline from Scratch (1.5 Points)

We will now use everything we have learned to implement a typical tokenization pipeline, very similar to the one used in the [Hugging Face Tokenizers library](https://huggingface.co/docs/tokenizers/en/pipeline). In particular, you will implement the class `WordTokenizer` with the following methods:

- `normalizer`: Applies the text processing techniques to the input text (we use the same 3 techniques implemented before; we call this step a normalizer rather than processing to stay close to the Hugging Face naming convention)
- `tokenize`: Splits the text into a list of words
- `convert_tokens_to_ids`: Converts a list of tokens to a list of ids
- `convert_ids_to_tokens`: Converts a list of ids to a list of tokens
- `encode`: Applies `tokenize` and `convert_tokens_to_ids` methods in succession to an input text
- `decode`: Converts the list of ids back to the text form
- `train`: Trains the tokenizer, which for the word tokenizer means simply fitting the vocabulary over the corpus
  
**Important Note: Remove `raise NotImplementedError()` when you write your code in the functions**

In [93]:
class WordTokenizer:

    def __init__(self):
        """
        Constructor for the WordTokenizer class.

        Initialize `vocab` attribute as an empty list and `word_to_idx` attribute as an empty dictionary.
        """

        # YOUR CODE HERE
        self.vocab = []
        self.word_to_idx = {}

    def normalizer(self, text):
        """
        Normalizes the input text by converting it to lowercase, removing extra spaces, and removing punctuations.

        Input:
            - text: string
        Output:
            - string

        """

        normalized_text = to_lowercase(remove_extra_spaces(remove_punctuations(text)))
        # YOUR CODE HERE
        return normalized_text

    def tokenize(self, text):
        """
        Tokenizes the input text into words.

        Input:
            - text: string
        Output:
            - list of strings

        """

        tokens = None
        # YOUR CODE HERE
        tokens = word_tokenize(text)
        return tokens

    def convert_tokens_to_ids(self, tokens):
        """
        Convert a list of tokens to a list of token ids using the word_to_idx dictionary.

        Input:
            - tokens: list of strings
        Output:
            - list of integers

        """

        # Throw an error if the word_to_idx dictionary is empty
        if not self.word_to_idx:
            raise ValueError("Tokenizer has not been fit on a vocabulary yet. Call `train` method with a corpus first!")

        token_ids = None
        # YOUR CODE HERE
        token_ids = convert_tokens_to_ids(tokens, self.word_to_idx)
        return token_ids

    def convert_ids_to_tokens(self, token_ids):
        """
        Convert a list of token ids to a list of tokens using the vocabulary list (`self.vocab`).

        Input:
            - token_ids: list of integers
        Output:
            - list of strings

        """

        # Throw an error if the vocabulary is empty
        if self.vocab == []:
            raise ValueError("Tokenizer has not been fit on a vocabulary yet. Call `train` method with a corpus first!")

        tokens = None
        # YOUR CODE HERE
        tokens = [self.vocab[token_id] for token_id in token_ids]
        return tokens

    def encode(self, text):
        """
        Encodes the input text into token ids. Do not forget to normalize the text before tokenizing it.

        Input:
            - text: string
        Output:
            - list of integers

        """

        token_ids = None
        # YOUR CODE HERE
        text = self.normalizer(text)
        # reuse the method so the "not trained yet" check also applies here
        token_ids = self.convert_tokens_to_ids(self.tokenize(text))
        return token_ids

    def decode(self, token_ids):
        """
        Decodes the input token ids into text.

        Input:
            - token_ids: list of integers
        Output:
            - string

        """
        text = None
        # YOUR CODE HERE
        text = " ".join(self.convert_ids_to_tokens(token_ids))
        return text

    def train(self, corpus):
        """
        Trains the tokenizer on a corpus i.e. a list of documents, filling in values of `self.vocab` and `self.word_to_idx`.

        Input:
            - corpus: list of strings

        Note: Do not forget to normalize the text before tokenizing it.
        """
        

        # YOUR CODE HERE
        corpus = [self.normalizer(text) for text in corpus]
        self.vocab, self.word_to_idx = fit_vocab(corpus)


In [94]:
def test_WordTokenizer():
    tokenizer = WordTokenizer()
    corpus = [
        "We all live in a yellow submarine",
        "I am the walrus",
        "Yellow Submarine",
    ]
    tokenizer.train(corpus)
    token_ids = tokenizer.encode("We all live in a yellow submarine")

    assert token_ids == [9, 1, 5, 4, 0, 10, 6]

    text = tokenizer.decode(token_ids)
    assert text == "we all live in a yellow submarine"

    print("All tests pass")


test_WordTokenizer()

All tests pass


## Part 2: PyTorch

Some tutorial content has been adapted from Stanford's CS224N PyTorch Tutorial and from Assignment 0 of CSE 517 (Winter 2024).

### Introduction
[PyTorch](https://pytorch.org/) is a deep learning framework and, alongside [TensorFlow](https://www.tensorflow.org/), one of the most widely used ones.
Let's start by installing and importing PyTorch:

In [None]:
!pip install torch

In [3]:
import torch
import torch.nn as nn

### Tensors

This part is adapted from Stanford's CS224N PyTorch Tutorial.

**Tensors** are PyTorch's most basic building block. Each tensor is a multi-dimensional array, a generalization of a matrix. For example, a 256x256 color image might be represented by a `3x256x256` tensor, where the first dimension indexes the color channels (e.g. red, green, blue). Here's how to create a tensor from a Python list:

In [96]:
list_of_lists = [
    [1, 2, 3],
    [4, 5, 6],
]

# This is just a normal python list
print(list_of_lists)

[[1, 2, 3], [4, 5, 6]]


In [97]:
# And now we make a PyTorch tensor with the same data
data = torch.tensor(list_of_lists)
print(data)

tensor([[1, 2, 3],
        [4, 5, 6]])


In [98]:
# We can, of course, create the tensor directly from a list (of lists)
data = torch.tensor(
    [
        [0, 1],
        [2, 3],
        [4, 5],
    ]
)
print(data)

tensor([[0, 1],
        [2, 3],
        [4, 5]])


In [99]:
# ValueError: tensors cannot have different lengths at different indices
torch.tensor(
    [
        [0, 1],
        [2, 3, 4],
    ]
)

ValueError: expected sequence of length 2 at dim 1 (got 3)

Each tensor has a **data type**. The major data types you'll need to worry about are floats (`torch.float32`) and integers (`torch.int64`, also called `torch.long`, which is the default integer type). You can specify the data type explicitly when you create the tensor, or let PyTorch infer it from the provided data.

In [100]:
# Initializing a tensor with an explicit data type
# Notice the dots after the numbers in the output, which specify that they're floats
data = torch.tensor(
    [
        [0, 1],
        [2, 3],
        [4, 5],
    ],
    dtype=torch.float32,
)
print(data)

tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])


In [101]:
# Initializing a tensor with an implicit data type
# A single float among the values is enough for PyTorch to infer float32 for the whole tensor
data = torch.tensor(
    [
        [0.11111111, 1],
        [2, 3],
        [4, 5],
    ]
)
print(data, data.dtype)

tensor([[0.1111, 1.0000],
        [2.0000, 3.0000],
        [4.0000, 5.0000]]) torch.float32


In [102]:
# Contrast that to this version, which contains ints
data = torch.tensor(
    [
        [0, 1],
        [2, 3],
        [4, 5],
    ]
)
print(data, data.dtype)

tensor([[0, 1],
        [2, 3],
        [4, 5]]) torch.int64
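You can also convert an existing tensor to another data type. The sketch below uses `.to`, for which `.float()` is a shorthand. It also checks the dtype aliases mentioned above:

```python
import torch

data = torch.tensor([[0, 1], [2, 3]])
print(data.dtype)  # torch.int64 (the default for Python ints)

as_float = data.to(torch.float32)  # data.float() is a shorthand for this
print(as_float.dtype)  # torch.float32

# dtype aliases: torch.long is int64, while torch.int is int32
print(torch.long == torch.int64, torch.int == torch.int32)  # True True
```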


From here on out, for brevity, we frequently rely on Jupyter Notebook's behavior of displaying the value of the last line of a cell.

Note that tensors are more flexible than matrices: they can have any number of dimensions.

Utility functions also exist to create tensors with given shapes and contents:

In [103]:
torch.zeros(2, 5)  # a tensor of all zeros

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [104]:
torch.ones(3, 4)  # a tensor of all ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [105]:
torch.rand(3, 3)  # a tensor with random numbers drawn uniformly from [0, 1)

tensor([[0.0202, 0.9506, 0.8732],
        [0.6783, 0.7848, 0.8923],
        [0.6374, 0.6257, 0.1092]])
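Because `torch.rand` draws new numbers on every call, the output above will differ from run to run. Setting the seed with `torch.manual_seed` makes results reproducible. The sketch below shows this, along with a few other creation helpers:

```python
import torch

torch.manual_seed(0)  # fix the random seed so results are reproducible
a = torch.rand(2, 2)
torch.manual_seed(0)
b = torch.rand(2, 2)
print(torch.equal(a, b))  # True: same seed, same numbers

print(torch.full((2, 3), 7.0))  # every entry set to 7.0
print(torch.zeros_like(a).shape)  # same shape (and dtype) as a: torch.Size([2, 2])
print(torch.randn(2, 2))  # samples from a standard normal rather than uniform
```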

In [106]:
rr = torch.arange(1, 10)  # range from [1, 10)
print(rr)

tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])


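Mathematical operations can be used with tensors fairly flexibly. Operations between a tensor and a number are easy to reason about, since the number is applied to every element. Operations between two tensors require their shapes to be compatible: element-wise operations need equal or broadcastable shapes, and matrix multiplication needs matching inner dimensions.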

In [107]:
print(rr + 2)
print(rr * 2)
print(rr**2)
print(rr % 3)

tensor([ 3,  4,  5,  6,  7,  8,  9, 10, 11])
tensor([ 2,  4,  6,  8, 10, 12, 14, 16, 18])
tensor([ 1,  4,  9, 16, 25, 36, 49, 64, 81])
tensor([1, 2, 0, 1, 2, 0, 1, 2, 0])
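For element-wise operations between two tensors, PyTorch follows broadcasting rules. Shapes are compared from the last dimension backwards, and a dimension of size 1 (or a missing dimension) is stretched to match. A small sketch:

```python
import torch

m = torch.arange(6).reshape(2, 3)  # tensor([[0, 1, 2], [3, 4, 5]])
row = torch.tensor([10, 20, 30])  # shape (3,): broadcast across both rows
col = torch.tensor([[100], [200]])  # shape (2, 1): broadcast across all columns
print(m + row)  # tensor([[10, 21, 32], [13, 24, 35]])
print(m + col)  # tensor([[100, 101, 102], [203, 204, 205]])
print(m * m)  # same shapes: element-wise product, tensor([[0, 1, 4], [9, 16, 25]])

try:
    m + torch.tensor([1, 2])  # trailing dims 3 vs 2 are incompatible
except RuntimeError as err:
    print("RuntimeError:", err)
```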


In [108]:
a = torch.tensor([[1, 2], [2, 3], [4, 5]])  # (3, 2)
b = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])  # (2, 4); so a @ b has shape (3, 4)

print("A is", a)
print("B is", b)

# a.matmul(b) and a@b do the same thing -- matrix multiply
print("a.matmul(b) is", a.matmul(b))
print("a @ b is", a @ b)

A is tensor([[1, 2],
        [2, 3],
        [4, 5]])
B is tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])
a.matmul(b) is tensor([[11, 14, 17, 20],
        [17, 22, 27, 32],
        [29, 38, 47, 56]])
a @ b is tensor([[11, 14, 17, 20],
        [17, 22, 27, 32],
        [29, 38, 47, 56]])
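Note that `*` is element-wise, while `@` is matrix multiplication. The latter requires the inner dimensions to match, so multiplying `a` by itself only works after transposing one copy. A sketch:

```python
import torch

a = torch.tensor([[1, 2], [2, 3], [4, 5]])  # (3, 2)
print(a * a)  # element-wise: tensor([[ 1,  4], [ 4,  9], [16, 25]])
print(a @ a.T)  # (3, 2) @ (2, 3) -> (3, 3)

try:
    a @ a  # (3, 2) @ (3, 2): inner dimensions 2 and 3 do not match
except RuntimeError as err:
    print("RuntimeError:", err)
```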


We can also compute the **inverse** of a square, invertible matrix in PyTorch:

In [109]:
A = torch.randn(4, 4)
Ainv = torch.linalg.inv(A)

print("A is", A)
print("Ainv is", Ainv)
print("A @ Ainv is", A @ Ainv)
print("Distance from I is", torch.linalg.norm(A @ Ainv - torch.eye(4)))

A is tensor([[-0.6422,  0.9027,  2.5695, -0.1949],
        [ 0.7295,  0.9331, -1.2526, -0.8328],
        [-0.0088, -0.5572, -1.9768, -1.3940],
        [-0.3536,  0.2622,  0.4919, -1.0026]])
Ainv is tensor([[-8.2242, -0.0343, -7.6255, 12.2292],
        [ 4.5985,  0.7939,  3.7800, -6.8088],
        [-3.0855, -0.2813, -3.0692,  5.1006],
        [ 2.5892,  0.0817,  2.1720, -4.5884]])
A @ Ainv is tensor([[ 1.0000e+00,  2.9802e-07,  7.1526e-07, -1.9073e-06],
        [-4.7684e-07,  1.0000e+00, -4.7684e-07,  1.4305e-06],
        [ 0.0000e+00, -1.7881e-07,  1.0000e+00,  9.5367e-07],
        [ 1.1921e-07,  7.4506e-08,  2.3842e-07,  1.0000e+00]])
Distance from I is tensor(2.9270e-06)
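When the actual goal is to solve a linear system $Ax = b$, `torch.linalg.solve` is generally preferred over forming $A^{-1}$ explicitly, since it is typically faster and numerically more stable. In the sketch below, `4 * torch.eye(4)` is added to keep the random matrix comfortably away from singular. This is an assumption made only for the demo:

```python
import torch

torch.manual_seed(0)
# adding 4*I keeps this random example well-conditioned (demo assumption)
A = torch.randn(4, 4) + 4 * torch.eye(4)
b = torch.randn(4)

x = torch.linalg.solve(A, b)  # solves A x = b without forming the inverse
x_via_inv = torch.linalg.inv(A) @ b
print(torch.allclose(A @ x, b, atol=1e-4))  # True
print(torch.allclose(x, x_via_inv, atol=1e-4))  # True
```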


We can also compute the **Moore-Penrose pseudoinverse** when a matrix is not invertible (for example, when it is not square). If you are not familiar with the pseudoinverse, you can refer to [Section 2.9 in this document](https://www.deeplearningbook.org/contents/linear_algebra.html).

In [110]:
A = torch.randn(3, 5)
Apinv = torch.linalg.pinv(A)

print("A is", A)
print("Apinv is", Apinv)
print("A @ Apinv is", A @ Apinv)
print("Distance from I is", torch.dist(A @ Apinv, torch.eye(3)))
print("Apinv @ A is", Apinv @ A)
print("Distance from I is", torch.dist(Apinv @ A, torch.eye(5)))

A is tensor([[-0.4108, -2.2564, -0.4473,  0.3267, -0.6969],
        [-0.7866, -0.8247, -1.1705, -0.2358,  0.7393],
        [ 0.1280,  0.1413,  0.2283, -1.0081,  0.7400]])
Apinv is tensor([[ 0.0868, -0.3153,  0.2040],
        [-0.4643,  0.0755, -0.2987],
        [ 0.1900, -0.5159,  0.3835],
        [-0.1341,  0.0818, -0.7320],
        [-0.1676,  0.3107,  0.2576]])
A @ Apinv is tensor([[ 1.0000e+00,  1.3411e-07, -1.6391e-07],
        [ 5.2154e-08,  1.0000e+00,  1.4901e-08],
        [-7.4506e-09,  2.3842e-07,  1.0000e+00]])
Distance from I is tensor(3.6485e-07)
Apinv @ A is tensor([[ 0.2385,  0.0931,  0.3769, -0.1030, -0.1426],
        [ 0.0931,  0.9431,  0.0510,  0.1316,  0.1584],
        [ 0.3769,  0.0510,  0.6064, -0.2029, -0.2299],
        [-0.1030,  0.1316, -0.2029,  0.6748, -0.3877],
        [-0.1426,  0.1584, -0.2299, -0.3877,  0.5372]])
Distance from I is tensor(1.4142)
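Why is `Apinv @ A` far from the identity? Consider a $3 \times 5$ matrix with full row rank, which a random Gaussian matrix has with probability 1. For such a matrix, $A A^+ = I_3$. However, $A^+ A$ is the orthogonal projection onto the 3-dimensional row space of $A$ inside $\mathbb{R}^5$. The residual $I_5 - A^+ A$ projects onto the remaining 2 dimensions, so its Frobenius norm is $\sqrt{2} \approx 1.4142$, matching the distance printed above. A quick numerical check (a sketch with a fixed seed, so the matrix differs from the one above):

```python
import math

import torch

torch.manual_seed(0)
A = torch.randn(3, 5)
P = torch.linalg.pinv(A) @ A  # (5, 5) projection onto the row space of A

print(torch.allclose(P @ P, P, atol=1e-4))  # idempotent: projecting twice changes nothing
print(torch.allclose(P, P.T, atol=1e-4))  # symmetric, i.e. an orthogonal projection
print(torch.trace(P).item())  # trace of a projection = its rank, approximately 3.0
dist = torch.linalg.norm(P - torch.eye(5)).item()  # Frobenius norm (default for a matrix)
print(dist, math.sqrt(2))  # both approximately 1.4142
```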


The **shape** of a tensor (accessible via `.shape`) lists the size of each of its dimensions. Here are some examples:

In [111]:
matr_2d = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(matr_2d.shape)
print(matr_2d)

torch.Size([2, 3])
tensor([[1, 2, 3],
        [4, 5, 6]])


In [112]:
matr_3d = torch.tensor(
    [
        [[1, 2, 3, 4], [-2, 5, 6, 9]],
        [[5, 6, 7, 2], [8, 9, 10, 4]],
        [[-3, 2, 2, 1], [4, 6, 5, 9]],
    ]
)
print(matr_3d)
print(matr_3d.shape)

tensor([[[ 1,  2,  3,  4],
         [-2,  5,  6,  9]],

        [[ 5,  6,  7,  2],
         [ 8,  9, 10,  4]],

        [[-3,  2,  2,  1],
         [ 4,  6,  5,  9]]])
torch.Size([3, 2, 4])
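`torch.Size` behaves like a Python tuple, so the size of an individual dimension can be read off by index. The sketch below reuses the `matr_3d` values to show this, together with a few related methods (`size`, `dim`, `numel`) that report the same information in different forms.

```python
import torch

matr_3d = torch.tensor(
    [
        [[1, 2, 3, 4], [-2, 5, 6, 9]],
        [[5, 6, 7, 2], [8, 9, 10, 4]],
        [[-3, 2, 2, 1], [4, 6, 5, 9]],
    ]
)

# torch.Size behaves like a tuple: index it to get the size of one dimension
print(matr_3d.shape[0])  # 3 -> number of outermost blocks
print(matr_3d.size(-1))  # 4 -> innermost length; size(dim) is an alternative to .shape[dim]
print(matr_3d.dim())     # 3 -> number of dimensions, i.e. len(matr_3d.shape)
print(matr_3d.numel())   # 24 -> total number of elements, 3 * 2 * 4
```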


**Reshaping** tensors can be used to make batch operations easier (more on that later), but be careful that the data is reshaped in the order you expect:

In [113]:
rr = torch.arange(1, 16)
print(rr)
print(rr.shape)

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
torch.Size([15])


In [114]:
rr = rr.view(5, 3)
print(rr)
print(rr.shape)

tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12],
        [13, 14, 15]])
torch.Size([5, 3])
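`view` fills the new shape in row-major order: the last dimension varies fastest, so each row of `rr.view(5, 3)` is a run of consecutive values. The sketch below illustrates the ordering pitfall: transposing a `(5, 3)` view gives the same shape as a `(3, 5)` view, but different contents. It also shows `-1` for an inferred dimension, and `reshape`, which returns a copy when a view of a non-contiguous tensor is not possible.

```python
import torch

rr = torch.arange(1, 16)

# view fills the new shape row by row, so each row holds consecutive values
a = rr.view(3, 5)    # rows: 1..5, 6..10, 11..15
# Transposing a (5, 3) view also has shape (3, 5), but the data is ordered differently
b = rr.view(5, 3).T  # rows: 1, 4, 7, 10, 13 / 2, 5, 8, 11, 14 / 3, 6, 9, 12, 15
print(a.shape == b.shape)  # True
print(torch.equal(a, b))   # False

# One dimension can be given as -1 and is inferred from the number of elements
print(rr.view(-1, 5).shape)  # torch.Size([3, 5])

# b is a transpose and therefore not contiguous, so b.view(-1) would raise an error;
# reshape falls back to copying the data in that case
print(b.is_contiguous())  # False
print(b.reshape(-1))      # values 1, 4, 7, 10, 13, 2, 5, 8, 11, 14, 3, 6, 9, 12, 15
```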


One of the reasons why we use **tensors** is *vectorized operations*: operations that can be applied in parallel along a particular dimension of a tensor.

In [115]:
data = torch.arange(1, 36, dtype=torch.float32).reshape(5, 7)
print("Data is:", data)

# Summing over dim=0 collapses the rows, giving one sum per column...
print("Sum of each column (dim=0):")
print(data.sum(dim=0))

# ...while summing over dim=1 collapses the columns, giving one sum per row.
print("Sum of each row (dim=1):")
print(data.sum(dim=1))

# Other reductions accept the same dim argument, e.g. the standard deviation of each row
# (torch.std computes the sample standard deviation, dividing by N - 1, by default):
print("Standard deviation of each row (dim=1):")
print(data.std(dim=1))

Data is: tensor([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11., 12., 13., 14.],
        [15., 16., 17., 18., 19., 20., 21.],
        [22., 23., 24., 25., 26., 27., 28.],
        [29., 30., 31., 32., 33., 34., 35.]])
Sum of each column (dim=0):
tensor([ 75.,  80.,  85.,  90.,  95., 100., 105.])
Sum of each row (dim=1):
tensor([ 28.,  77., 126., 175., 224.])
Standard deviation of each row (dim=1):
tensor([2.1602, 2.1602, 2.1602, 2.1602, 2.1602])


In [116]:
data.sum()  # With no dim argument, sum reduces over all elements

tensor(630.)
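Reductions also accept `keepdim=True`, which keeps the reduced dimension with size 1 instead of dropping it. The result can then be combined with the original tensor through broadcasting, for example to normalize every row of `data` so that it sums to 1 in a single vectorized step. The sketch below compares this with an explicit Python loop, which is included only as a reference.

```python
import torch

data = torch.arange(1, 36, dtype=torch.float32).reshape(5, 7)

# keepdim=True keeps the reduced axis with size 1: shape (5, 1) instead of (5,)
row_sums = data.sum(dim=1, keepdim=True)
print(row_sums.shape)  # torch.Size([5, 1])

# (5, 7) / (5, 1) broadcasts each row's sum across that row, so every row is
# normalized at once; without keepdim, (5, 7) / (5,) would fail since 7 != 5
normalized = data / row_sums
print(normalized.sum(dim=1))  # five values, each 1 up to float rounding

# Reference version with an explicit Python loop over the rows
looped = torch.stack([row / row.sum() for row in data])
print(torch.allclose(normalized, looped))  # True
```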

**Indexing**

You can access arbitrary elements of a tensor using the `[]` operator.

In [117]:
x = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]], dtype=torch.float32)
print(x)
print(x.shape)

tensor([[[ 1.,  2.],
         [ 3.,  4.]],

        [[ 5.,  6.],
         [ 7.,  8.]],

        [[ 9., 10.],
         [11., 12.]]])
torch.Size([3, 2, 2])


In [118]:
# Index 0 along the first dimension selects the first 2x2 matrix
x[0]  # Equivalent to x[0, :, :]

tensor([[1., 2.],
        [3., 4.]])

In [119]:
x[:, 0]  # Row 0 of every matrix; equivalent to x[:, 0, :]

tensor([[ 1.,  2.],
        [ 5.,  6.],
        [ 9., 10.]])

In [120]:
x[1, :, 0]  # Column 0 of the second matrix

tensor([5., 7.])
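A few more indexing patterns on the same `x`: one integer per dimension yields a 0-d tensor (use `.item()` to get a plain Python number), negative indices count from the end as with Python lists, and `start:stop:step` slices work along any dimension.

```python
import torch

x = torch.tensor(
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]], dtype=torch.float32
)

# One integer per dimension selects a single element as a 0-d tensor
elem = x[2, 1, 0]
print(elem)         # tensor(11.)
print(elem.item())  # 11.0, a plain Python float

# Negative indices count from the end, as with Python lists
print(x[-1, -1])    # tensor([11., 12.]): last row of the last matrix

# start:stop:step slices work along any dimension
print(x[::2, 0])    # first rows of matrices 0 and 2: [[1., 2.], [9., 10.]]
```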

### Exercise 4: Implementing Linear Regression Using Pytorch Tensor Operations (2 Points)

You will now use what we just learned to implement linear regression using Pytorch. Recall that linear regression is a supervised learning algorithm that learns a linear relationship between a scalar output variable and a set of input features, i.e.,

$$ y = w_0 + w_1x_1 + w_2x_2 + \cdots + w_nx_n$$

where $y \in \mathbb{R}$ is the scalar output variable and $x \in \mathbb{R}^n$ contains the input features. The weights $w \in \mathbb{R}^{n+1}$ define how the input features are combined into the scalar output. We can simplify the notation by setting $x_0 = 1$, which folds the bias $w_0$ into a single dot product:

$$y = w^Tx \quad \text{(note that here } x \in \mathbb{R}^{n+1}\text{)}$$


For $m$ samples, we can stack the inputs as the rows of a matrix $X$ (each row starting with $x_0 = 1$) and the outputs into a vector $Y$, giving the vectorized form:

$$Y = Xw$$

where $X \in \mathbb{R}^{m \times (n+1)}$ and $Y \in \mathbb{R}^m$.

Training a linear regression model means finding the weights $w$ that minimize the squared error on the observed data, $\|Y - Xw\|_2^2$.

While there are different methods for solving the linear regression problem, we will focus on the normal equations approach, which gives the exact (closed-form) solution:

$$w = (X^TX)^{-1}X^TY$$
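For reference, this closed form follows from setting the gradient of the squared error to zero. This is the standard derivation, sketched here for convenience; the last step assumes $X^TX$ is invertible:

$$
\begin{aligned}
L(w) &= \|Y - Xw\|_2^2 = (Y - Xw)^T(Y - Xw) \\
\nabla_w L(w) &= -2X^T(Y - Xw) \\
\nabla_w L(w) = 0 \;&\Longrightarrow\; X^TXw = X^TY \;\Longrightarrow\; w = (X^TX)^{-1}X^TY
\end{aligned}
$$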

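For numerical stability, we recommend computing the pseudo-inverse of $X^TX$ (e.g., with `torch.linalg.pinv`) instead. Unlike the inverse, the pseudo-inverse also exists when $X^TX$ is singular.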
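To see why the pseudo-inverse is the safer choice, consider a hypothetical design matrix whose last two columns are identical, so that $X^TX$ is singular and $(X^TX)^{-1}$ does not exist. `torch.linalg.pinv` still returns a solution: the minimum-norm one among all weights that fit the data, since $(X^TX)^{+}X^T = X^{+}$. The data, the seed, and the explicit `rtol` (which makes `pinv` reliably treat the numerically-zero singular value as zero) are illustrative assumptions.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility
m = 20
x1 = torch.randn(m, dtype=torch.float64)

# Hypothetical design matrix: a bias column, a feature x1, and an exact copy of x1.
# The duplicate column makes the 3x3 matrix X^T X singular.
X = torch.stack([torch.ones(m, dtype=torch.float64), x1, x1], dim=1)
Y = 1.0 + 2.0 * x1  # noiseless targets: intercept 1, total slope 2

XtX = X.T @ X
# An explicit rtol makes pinv discard the numerically-zero singular value of XtX
w = torch.linalg.pinv(XtX, rtol=1e-10) @ X.T @ Y

# Every w with w[0] = 1 and w[1] + w[2] = 2 fits the data exactly; pinv picks the
# minimum-norm one, which splits the slope evenly across the duplicate columns
print(w)                     # approximately [1., 1., 1.]
print(torch.dist(X @ w, Y))  # approximately 0
```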

We recommend checking [Stanford's CS229 notes on Linear Regression](https://see.stanford.edu/materials/aimlcs229/cs229-notes1.pdf) for those unfamiliar with the topic.

**Important Note: Remove `raise NotImplementedError()` when you write your code in the functions**

In [16]:
class LinearRegression:

    def __init__(self, n_features):
        """
        Constructor for the LinearRegression class.

        Initialize the weight vector w as zeros.

        Input:
            - n_features: int
        """
        # YOUR CODE HERE
        # One weight per feature plus the bias w_0, which is stored at index 0
        self.weights = torch.zeros(n_features + 1)

    def predict(self, X):
        """
        Predicts the target variable given the input features.

        Input:
            - X: either a 1d torch.Tensor of shape (n_features,) or a 2d torch.Tensor of shape (n_samples, n_features)

        Output:
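            - a 0-d torch.Tensor (scalar) if X is a 1d torch.Tensor, or a torch.Tensor of shape (n_samples,) if X is a 2d torch.Tensor
        """

        y_pred = None

        # YOUR CODE HERE
        is_single_sample = X.dim() == 1
        if is_single_sample:
            # A 1d input is one sample of shape (n_features,); treat it as a batch of size 1
            X = X.unsqueeze(0)
        # Prepend a column of ones so that weights[0] acts as the bias w_0;
        # casting to the weights' dtype also handles integer inputs such as torch.tensor([6])
        ones = torch.ones(X.shape[0], 1, dtype=self.weights.dtype)
        X_aug = torch.cat((ones, X.to(self.weights.dtype)), dim=1)
        y_pred = X_aug @ self.weights
        if is_single_sample:
            # Drop the batch dimension so that a single sample yields a scalar
            y_pred = y_pred.squeeze(0)
        return y_pred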

    def fit(self, X, Y):
        """
        Solves the linear regression problem using the normal equations.

        Input:
            - X: torch.Tensor of shape (n_samples, n_features)
            - Y: torch.Tensor of shape (n_samples,)

        """

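        # YOUR CODE HERE
        if X.dim() == 1:
            # Treat a 1d input as n_samples values of a single feature
            X = X.unsqueeze(1)
        # Prepend a column of ones for the bias term w_0
        X_aug = torch.cat((torch.ones((X.shape[0], 1)), X), dim=1)
        # Normal equations, using the pseudo-inverse of X^T X for numerical stability
        self.weights = torch.linalg.pinv(X_aug.T @ X_aug) @ X_aug.T @ Y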

    def mse_loss(self, Y_pred, Y):
        """
        Computes the mean squared error loss, i.e. ||Y - Y_pred||_2^2 divided by n_samples.

        Inputs:
            - Y_pred: torch.Tensor of shape (n_samples,) containing the predicted target variable
            - Y: torch.Tensor of shape (n_samples,) containing the true target variable

        Output:
            - 0-d torch.Tensor (scalar) of dtype torch.float32

        """

        mse = None
        # YOUR CODE HERE
        mse = torch.mean((Y - Y_pred)**2)
        return mse

In [17]:
def test_LinearRegression():

    X = torch.tensor([[1.0], [2.0], [3.0], [4.0], [5.0]])
    Y = torch.tensor([3.0, 5.0, 7.0, 9.0, 11.0])
    model = LinearRegression(1)
    # Test if weights are initialized correctly
    assert torch.all(model.weights == torch.tensor([0.0, 0.0]))

    # Test if training works correctly
    model.fit(X, Y)
    assert model.weights.shape == (2,)
    assert torch.allclose(model.weights, torch.tensor([1.0, 2.0]), atol=1e-4)

    # Test if prediction works correctly
    y_pred = model.predict(torch.tensor([6]))
    assert torch.allclose(y_pred, torch.tensor(13.0), atol=1e-4)

    # Test if prediction works correctly
    y_pred = model.predict(X)
    assert y_pred.shape == (5,)
    assert torch.allclose(y_pred, Y, atol=1e-4)

    # Test if mean squared error loss is computed correctly
    mse = model.mse_loss(y_pred, Y)
    assert torch.allclose(mse, torch.tensor(0.0), atol=1e-4)

    mse = model.mse_loss(y_pred + 2, Y)
    assert torch.allclose(mse, torch.tensor(4.0), atol=1e-4)

    print("All tests pass")

test_LinearRegression()

All tests pass
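Beyond the unit test, one way to sanity-check the normal-equations solution on noisier data is to compare it against PyTorch's least-squares solver, `torch.linalg.lstsq`. So that it runs on its own, the sketch below computes the same formula directly instead of calling the `LinearRegression` class. It uses hypothetical synthetic data with three features and small Gaussian noise.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility
n_samples, n_features = 100, 3

# Hypothetical synthetic data: known weights [w_0, w_1, w_2, w_3] plus small noise
true_w = torch.tensor([0.5, 1.0, -2.0, 3.0], dtype=torch.float64)
X = torch.randn(n_samples, n_features, dtype=torch.float64)
X_aug = torch.cat((torch.ones(n_samples, 1, dtype=torch.float64), X), dim=1)
Y = X_aug @ true_w + 0.01 * torch.randn(n_samples, dtype=torch.float64)

# Normal equations with the pseudo-inverse, as in the exercise
w_normal = torch.linalg.pinv(X_aug.T @ X_aug) @ X_aug.T @ Y

# PyTorch's least-squares solver; Y is passed as an (n_samples, 1) column
w_lstsq = torch.linalg.lstsq(X_aug, Y.unsqueeze(1)).solution.squeeze(1)

print(torch.allclose(w_normal, w_lstsq, atol=1e-8))  # the two solutions agree
print(w_normal)                                      # close to true_w
print(torch.mean((Y - X_aug @ w_normal) ** 2))       # MSE near the noise variance (1e-4)
```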


There is a lot more to Pytorch than what we covered today. Pytorch and other deep learning libraries are popular largely because they provide automatic differentiation, which, as you will learn in lectures, is extremely useful for training neural network models. We will introduce more advanced Pytorch usage in tutorials throughout the upcoming projects.