In [None]:
'''
 * Copyright (c) 2004 Radhamadhab Dalai
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
'''

# Introduction to Deep Learning and Tensors

## 1.7 The Essence of Deep Learning

Thus far, we have talked about machine learning broadly. **Deep learning** is the subset of machine learning concerned with models based on many-layered neural networks. It is deep in precisely the sense that its models learn many layers of transformations. While this might sound narrow, deep learning has given rise to a dizzying array of models, techniques, problem formulations, and applications.

Many intuitions have been developed to explain the benefits of depth. Arguably, all machine learning has many layers of computation, the first consisting of feature processing steps. What differentiates deep learning is that the operations learned at each of the many layers of representations are learned jointly from data.

The problems that we have discussed so far, such as learning from the raw audio signal, the raw pixel values of images, or mapping between sentences of arbitrary lengths and their counterparts in foreign languages, are those where deep learning excels and traditional methods falter. 

It turns out that these many-layered models are capable of addressing low-level perceptual data in a way that previous tools could not. Arguably the most significant commonality in deep learning methods is **end-to-end training**. That is, rather than assembling a system based on components that are individually tuned, one builds the system and then tunes its performance jointly.

For instance, in computer vision, scientists used to separate the process of feature engineering from the process of building machine learning models. The Canny edge detector (Canny, 1987) and Lowe’s SIFT feature extractor (Lowe, 2004) reigned supreme for over a decade as algorithms for mapping images into feature vectors. 

In bygone days, the crucial part of applying machine learning to these problems consisted of coming up with manually-engineered ways of transforming the data into some form amenable to shallow models. Unfortunately, there is only so much that humans can accomplish by ingenuity in comparison with a consistent evaluation over millions of choices carried out automatically by an algorithm. 

When deep learning took over, these feature extractors were replaced by automatically tuned filters, yielding superior accuracy. Thus, one key advantage of deep learning is that it replaces not only the shallow models at the end of traditional learning pipelines but also the labor-intensive process of feature engineering. 

Moreover, by replacing much of the domain-specific preprocessing, deep learning has eliminated many of the boundaries that previously separated computer vision, speech recognition, natural language processing, medical informatics, and other application areas, offering a unified set of tools for tackling diverse problems.

Beyond end-to-end training, we are experiencing a transition from parametric statistical descriptions to fully nonparametric models. When data is scarce, one needs to rely on simplifying assumptions about reality in order to obtain useful models. When data is abundant, these can be replaced by nonparametric models that better fit the data.

To some extent, this mirrors the progress that physics experienced in the middle of the previous century with the availability of computers. Rather than solving parametric approximations of how electrons behave by hand, one can now resort to numerical simulations of the associated partial differential equations. 

This has led to much more accurate models, albeit often at the expense of explainability. Another difference from previous work is the acceptance of suboptimal solutions when dealing with nonconvex nonlinear optimization problems, and the willingness to try things before proving them.

This newfound empiricism in dealing with statistical problems, combined with a rapid influx of talent, has led to rapid progress in practical algorithms, albeit in many cases at the expense of modifying and reinventing tools that existed for decades. 

In the end, the deep learning community prides itself on sharing tools across academic and corporate boundaries, releasing many excellent libraries, statistical models, and trained networks as open source. It is in this spirit that the notebooks forming this book are freely available for distribution and use. We have worked hard to lower the barriers of access for everyone to learn about deep learning, and we hope that our readers will benefit from this.

## 1.8 Summary

Machine learning studies how computer systems can leverage experience (often data) to improve performance at specific tasks. It combines ideas from statistics, data mining, and optimization. Often, it is used as a means of implementing AI solutions. As a class of machine learning, representation learning focuses on how to automatically find the appropriate way to represent data.

As multi-level representation learning through learning many layers of transformations, deep learning replaces not only the shallow models at the end of traditional machine learning pipelines but also the labor-intensive process of feature engineering. 

Much of the recent progress in deep learning has been triggered by an abundance of data arising from cheap sensors and Internet-scale applications, and by significant progress in computation, mostly through GPUs. 

In addition, the availability of efficient deep learning frameworks has made the design and implementation of whole-system optimization significantly easier, which is a key component in obtaining high performance.

## Preparing for Deep Learning

To prepare for your dive into deep learning, you will need a few survival skills:

1. Techniques for storing and manipulating data.
2. Libraries for ingesting and preprocessing data from a variety of sources.
3. Knowledge of the basic linear algebraic operations that we apply to high-dimensional data elements.
4. Just enough calculus to determine which direction to adjust each parameter in order to decrease the loss function.
5. The ability to automatically compute derivatives so that you can forget much of the calculus you just learned.
6. Some basic fluency in probability, our primary language for reasoning under uncertainty.
7. Some aptitude for finding answers in the official documentation when you get stuck.

In short, this chapter provides a rapid introduction to the basics that you will need to follow most of the technical content in this book.

## 2.1 Data Manipulation

In order to get anything done, we need some way to store and manipulate data. Generally, there are two important things we need to do with data: 

1. Acquire it.
2. Process it once it is inside the computer.

There is no point in acquiring data without some way to store it, so to start, let’s get our hands dirty with **n-dimensional arrays**, which we also call **tensors**. If you already know the NumPy scientific computing package, this will be a breeze. For all modern deep learning frameworks, the tensor class (ndarray in MXNet, Tensor in PyTorch and TensorFlow) resembles NumPy’s ndarray, with a few killer features added.

First, the tensor class supports **automatic differentiation**. Second, it leverages **GPUs** to accelerate numerical computation, whereas NumPy only runs on CPUs. These properties make neural networks both easy to code and fast to run.

### 2.1.1 Getting Started

To start, we import the PyTorch library. Note that the package name is `torch`.

```python
# Importing the PyTorch library
import torch
```


In [2]:
import torch
# Creating a tensor using arange
x = torch.arange(12, dtype=torch.float32)
print(x)


tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])


In [3]:
# Shape of tensor x
tensor_shape = x.shape
print(tensor_shape)


torch.Size([12])


In [4]:
# Reshaping tensor x to a matrix X
X = x.reshape(3, 4)
print(X)


tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])


### 2.1.2 Indexing and Slicing

As with Python lists, we can access tensor elements by indexing (starting with 0). To access an element based on its position relative to the end of the list, we can use **negative indexing**. Additionally, we can access whole ranges of indices via **slicing** (e.g., `X[start:stop]`), where the returned value includes the first index (`start`) but not the last (`stop`).

When only one index (or slice) is specified for a $k$-th order tensor, it is applied along axis 0. Thus, in the following code, `[-1]` selects the last row and `[1:3]` selects the second and third rows.


In [5]:
X = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
print(X[-1])   # Last row
print(X[1:3])  # Second and third rows



tensor([4, 3, 2, 1])
tensor([[1, 2, 3, 4],
        [4, 3, 2, 1]])


In [6]:
# Assigning a new value to an element
X[1, 2] = 17
print(X)

tensor([[ 2,  1,  4,  3],
        [ 1,  2, 17,  4],
        [ 4,  3,  2,  1]])


In [7]:
# Assigning a value to multiple elements
X[:2, :] = 12
print(X)

tensor([[12, 12, 12, 12],
        [12, 12, 12, 12],
        [ 4,  3,  2,  1]])


### 2.1.3 Operations

Now that we know how to construct tensors and how to read from and write to their elements, we can begin to manipulate them with various mathematical operations. Among the most useful tools are the **elementwise operations**. These apply a standard scalar operation to each element of a tensor. For functions that take two tensors as inputs, elementwise operations apply some standard binary operator on each pair of corresponding elements.

We can create an elementwise function from any function that maps from a scalar to a scalar. In mathematical notation, we denote such **unary scalar operators** (taking one input) by the signature $ f: \mathbb{R} \to \mathbb{R} $. This just means that the function maps from any real number onto some other real number. Most standard operators can be applied elementwise, including unary operators like $ e^x $.




In [8]:
import torch

# Example of applying the exponential function elementwise
x = torch.arange(12, dtype=torch.float32)
torch.exp(x)

tensor([1.0000e+00, 2.7183e+00, 7.3891e+00, 2.0086e+01, 5.4598e+01, 1.4841e+02,
        4.0343e+02, 1.0966e+03, 2.9810e+03, 8.1031e+03, 2.2026e+04, 5.9874e+04])

Likewise, we denote binary scalar operators, which map pairs of real numbers to a (single) real number, via the signature $f: \mathbb{R}, \mathbb{R} \to \mathbb{R}$. Given any two vectors $\mathbf{u}$ and $\mathbf{v}$ of the same shape, and a binary operator $f$, we can produce a vector $\mathbf{c} = F(\mathbf{u}, \mathbf{v})$ by setting $c_i \gets f(u_i, v_i)$ for all $i$, where $c_i$, $u_i$, and $v_i$ are the $i$-th elements of vectors $\mathbf{c}$, $\mathbf{u}$, and $\mathbf{v}$. Here, we produced the vector-valued $F: \mathbb{R}^d, \mathbb{R}^d \to \mathbb{R}^d$ by lifting the scalar function to an elementwise vector operation. The common standard arithmetic operators for addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), and exponentiation (`**`) have all been lifted to elementwise operations for identically-shaped tensors of arbitrary shape.


In [11]:
import torch

# Example tensors
x = torch.tensor([1.0, 2, 4, 8])  # Float tensor
y = torch.tensor([2, 2, 2, 2], dtype=torch.float32)  # Convert y to Float tensor

# Elementwise operations
sum_result = x + y
diff_result = x - y
prod_result = x * y
div_result = x / y
pow_result = x ** y

# Print results
print("x + y:", sum_result)
print("x - y:", diff_result)
print("x * y:", prod_result)
print("x / y:", div_result)
print("x ** y:", pow_result)



x + y: tensor([ 3.,  4.,  6., 10.])
x - y: tensor([-1.,  0.,  2.,  6.])
x * y: tensor([ 2.,  4.,  8., 16.])
x / y: tensor([0.5000, 1.0000, 2.0000, 4.0000])
x ** y: tensor([ 1.,  4., 16., 64.])


In [12]:
import torch

# Create a 2D tensor (matrix)
tensor_2d = torch.tensor([[1, 2, 3],
                           [4, 5, 6]])

# Transpose using .t() method
transposed_tensor = tensor_2d.t()

# Alternatively, you can use torch.transpose
# transposed_tensor = torch.transpose(tensor_2d, 0, 1)

# Print the original and transposed tensors
print("Original Tensor:\n", tensor_2d)
print("Transposed Tensor:\n", transposed_tensor)


Original Tensor:
 tensor([[1, 2, 3],
        [4, 5, 6]])
Transposed Tensor:
 tensor([[1, 4],
        [2, 5],
        [3, 6]])


## Tensor Operations in PyTorch

Given two tensors of the same shape:

- `X = torch.arange(12, dtype=torch.float32).reshape((3, 4))`
- `Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])`

### Concatenation

When we concatenate the tensors along different axes with `torch.cat`, we observe the following:

1. **Concatenating along axis 0 (rows)**: the axis-0 lengths of the two tensors are $3$ and $3$, so the result has axis-0 length $3 + 3 = 6$. The call `torch.cat((X, Y), dim=0)` returns:

   $$
   \begin{bmatrix}
   0. & 1. & 2. & 3. \\
   4. & 5. & 6. & 7. \\
   8. & 9. & 10. & 11. \\
   2. & 1. & 4. & 3. \\
   1. & 2. & 3. & 4. \\
   4. & 3. & 2. & 1.
   \end{bmatrix}
   $$

2. **Concatenating along axis 1 (columns)**: the axis-1 lengths of the two tensors are $4$ and $4$, so the result has axis-1 length $4 + 4 = 8$. The call `torch.cat((X, Y), dim=1)` returns:

   $$
   \begin{bmatrix}
   0. & 1. & 2. & 3. & 2. & 1. & 4. & 3. \\
   4. & 5. & 6. & 7. & 1. & 2. & 3. & 4. \\
   8. & 9. & 10. & 11. & 4. & 3. & 2. & 1.
   \end{bmatrix}
   $$

### Logical Comparison

We can construct a binary tensor by comparing the elements of `X` and `Y` with `X == Y`:

$$
\begin{bmatrix}
\text{False} & \text{True} & \text{False} & \text{True} \\
\text{False} & \text{False} & \text{False} & \text{False} \\
\text{False} & \text{False} & \text{False} & \text{False}
\end{bmatrix}
$$

For each position $(i, j)$, the corresponding entry in the result is `True` if `X[i, j]` is equal to `Y[i, j]` and `False` otherwise. Older PyTorch releases return the result as a `uint8` tensor, where these values appear as $1$ and $0$; newer releases return a `bool` tensor.

### Summation of Elements

Summing all the elements of `X` with `X.sum()` yields a tensor with a single element: `tensor(66.)`.


In [13]:
import torch

# Create tensor X
X = torch.arange(12, dtype=torch.float32).reshape((3, 4))  # Shape: (3, 4)

# Create tensor Y
Y = torch.tensor([[2.0, 1, 4, 3], 
                  [1, 2, 3, 4], 
                  [4, 3, 2, 1]])  # Shape: (3, 4)

# Concatenate along dimension 0 (rows)
result_dim0 = torch.cat((X, Y), dim=0)
print("Concatenated along dim=0:\n", result_dim0)

# Concatenate along dimension 1 (columns)
result_dim1 = torch.cat((X, Y), dim=1)
print("Concatenated along dim=1:\n", result_dim1)

# Element-wise logical comparison
logical_tensor = X == Y
print("Element-wise equality:\n", logical_tensor)

# Sum all elements in tensor X
sum_result = X.sum()
print("Sum of elements in X:", sum_result)


Concatenated along dim=0:
 tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]])
Concatenated along dim=1:
 tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]])
Element-wise equality:
 tensor([[0, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=torch.uint8)
Sum of elements in X: tensor(66.)


### 2.1.4 Broadcasting 

By now, you know how to perform elementwise binary operations on two tensors of the same shape. Under certain conditions, even when shapes differ, we can still perform elementwise binary operations by invoking the broadcasting mechanism. Broadcasting works according to the following two-step procedure: (i) expand one or both arrays by copying elements along axes with length 1 so that after this transformation, the two tensors have the same shape; (ii) perform an elementwise operation on the resulting arrays.
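The two-step procedure can be illustrated with a small sketch. The tensors `a` and `b` below are illustrative choices, not taken from the cells in this notebook: `a` has shape (3, 1) and `b` has shape (1, 2), so each is expanded along its length-1 axis before the elementwise addition.

```python
import torch

# Two tensors whose shapes differ but are compatible for broadcasting
a = torch.arange(3).reshape((3, 1))  # shape (3, 1): a column
b = torch.arange(2).reshape((1, 2))  # shape (1, 2): a row

# a is copied along axis 1 and b along axis 0, giving two (3, 2) tensors,
# which are then added elementwise
c = a + b

print(a.shape, b.shape, c.shape)
print(c)
```

Here entry `c[i, j]` equals `a[i, 0] + b[0, j]`, which is what you would get by first replicating the column of `a` twice and the row of `b` three times.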


### 2.1.5 Saving Memory

Running operations can cause new memory to be allocated to host results. For example, if we write `Y = X + Y`, we dereference the tensor that `Y` used to point to and instead point `Y` at the newly allocated memory. We can demonstrate this issue with Python’s `id()` function, which returns a unique identifier for the referenced object (in CPython, its address in memory).

Note that after we run `Y = Y + X`, `id(Y)` points to a different location. This is because Python first evaluates `Y + X`, allocating new memory for the result, and then points `Y` to this new location in memory.

```python
before = id(Y)
Y = Y + X
print(id(Y) == before)  # Output: False
```


In [17]:

# In-place update: assigning through the slice Y[:] writes the result
# into the existing tensor instead of rebinding the name Y to a new object.


import torch

# Create tensor X
X = torch.tensor([1.0, 2.0, 3.0])

# Create tensor Y
Y = torch.tensor([4.0, 5.0, 6.0])

# Before updating Y
before_id = id(Y)

# In-place operation
Y[:] = Y + X

# After updating Y
after_id = id(Y)

# Check if the ID changed
print("ID before:", before_id)
print("ID after:", after_id)
print("Y after in-place operation:", Y)


ID before: 139778107280672
ID after: 139778107280672
Y after in-place operation: tensor([5., 7., 9.])
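As a complementary sketch, the same idea can be checked at the level of the underlying buffer rather than the Python object. The names `Z`, `buffer_address`, and `x_address` are illustrative, and the example assumes that `Tensor.data_ptr()` (the address of the tensor's first element) is an acceptable proxy for "same memory".

```python
import torch

X = torch.tensor([1.0, 2.0, 3.0])
Y = torch.tensor([4.0, 5.0, 6.0])

# Preallocate a result buffer with the same shape and dtype as Y
Z = torch.zeros_like(Y)
buffer_address = Z.data_ptr()

# Slice assignment writes X + Y into the existing buffer
Z[:] = X + Y
same_buffer = Z.data_ptr() == buffer_address

# Augmented assignment also updates X in place
x_address = X.data_ptr()
X += Y
x_in_place = X.data_ptr() == x_address

print(same_buffer, x_in_place)
print(Z, X)
```

Note that `X + Y` on the right-hand side still creates a temporary tensor; what is avoided is rebinding `Z` or `X` to freshly allocated memory.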


### 2.1.6 Conversion to Other Python Objects

Converting to a NumPy array (ndarray), or vice versa, is easy. For tensors stored on the CPU, the PyTorch tensor and NumPy array will share their underlying memory, and changing one through an in-place operation will also change the other.

```python
# Example conversion to NumPy and back to a tensor
import torch

# Create a PyTorch tensor
X = torch.tensor([1.0, 2.0, 3.0])

# Convert to NumPy array
A = X.numpy()

# Convert back to a PyTorch tensor
B = torch.from_numpy(A)

# Check the types
print(type(A), type(B))  # Output: <class 'numpy.ndarray'> <class 'torch.Tensor'>
```
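The memory sharing described above can be observed directly. The following is a minimal sketch with illustrative values; it also shows one way to turn a size-1 tensor into a Python scalar, using `item()` and the built-in `float` and `int` functions.

```python
import torch

X = torch.tensor([1.0, 2.0, 3.0])
A = X.numpy()  # NumPy array backed by the same memory as X

X[0] = 100.0   # in-place change through the tensor
A[1] = -1.0    # in-place change through the array

# Both objects reflect both changes
print(X)
print(A)

# Converting a size-1 tensor to a Python scalar
a = torch.tensor([3.5])
print(a.item(), float(a), int(a))
```

If an independent copy is needed, one option is to call `X.clone()` before converting.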


## 2.2 Data Preprocessing

So far, we have been working with synthetic data that arrived in ready-made tensors. However, to apply deep learning in the wild, we must extract messy data stored in arbitrary formats and preprocess it to suit our needs. Fortunately, the `pandas` library can do much of the heavy lifting. This section, while no substitute for a proper pandas tutorial, will give you a crash course on some of the most common routines.

### 2.2.1 Reading the Dataset

Comma-separated values (CSV) files are ubiquitous for storing tabular (spreadsheet-like) data. Here, each line corresponds to one record and consists of several (comma-separated) fields, e.g., “Albert Einstein, March 14 1879, Ulm, Federal polytechnic school, Accomplishments in the field of gravitational physics.”

To demonstrate how to load CSV files with `pandas`, we create a CSV file below at `../data/house_tiny.csv`. This file represents a dataset of homes, where each row corresponds to a distinct home and the columns correspond to the number of rooms (`NumRooms`), the roof type (`RoofType`), and the price (`Price`).

In [20]:
import os

# Create the directory for the data file
os.makedirs(os.path.join('..', 'data'), exist_ok=True)

# Define the file path
data_file = os.path.join('..', 'data', 'house_tiny.csv')

# Write the CSV file
with open(data_file, 'w') as f:
    f.write('''NumRooms,RoofType,Price
NA,NA,127500
2,NA,106000
4,Slate,178100
NA,NA,140000
''')

In [21]:
import pandas as pd 
data = pd.read_csv(data_file) 
print(data)

   NumRooms RoofType   Price
0       NaN      NaN  127500
1       2.0      NaN  106000
2       4.0    Slate  178100
3       NaN      NaN  140000


### 2.2.2 Data Preparation

In supervised learning, we train models to predict a designated target value, given some set of input values. Our first step in processing the dataset is to separate out columns corresponding to input versus target values. We can select columns either by name or via integer-location based indexing (`iloc`).

You might have noticed that pandas replaced all CSV entries with value NA with a special NaN (not a number) value. This can also happen whenever an entry is empty, e.g., “3,,,270000”. These are called missing values and they are the “bed bugs” of data science, a persistent menace that you will confront throughout your career. Depending upon the context, missing values might be handled either via imputation or deletion. Imputation replaces missing values with estimates of their values, while deletion simply discards either those rows or those columns that contain missing values.

Here are some common imputation heuristics. For categorical input fields, we can treat NaN as a category. Since the `RoofType` column takes values Slate and NaN, pandas can convert this column into two columns, `RoofType_Slate` and `RoofType_nan`. A row whose roof type is Slate will set values of `RoofType_Slate` and `RoofType_nan` to 1 and 0, respectively. The converse holds for a row with a missing `RoofType` value. For numerical fields such as `NumRooms`, a common heuristic is to replace each NaN entry with the mean of the corresponding column.

In [22]:
import pandas as pd

# Load the dataset from the CSV file
data_file = '../data/house_tiny.csv'
data = pd.read_csv(data_file)

# Separate inputs and targets
inputs, targets = data.iloc[:, 0:2], data.iloc[:, 2]

# Apply one-hot encoding to the inputs
inputs = pd.get_dummies(inputs, dummy_na=True)

# Display the processed inputs
print(inputs)


   NumRooms  RoofType_Slate  RoofType_nan
0       NaN               0             1
1       2.0               0             1
2       4.0               1             0
3       NaN               0             1


In [23]:
# Fill missing values with the mean of each column
inputs = inputs.fillna(inputs.mean())

# Display the updated inputs
print(inputs)


   NumRooms  RoofType_Slate  RoofType_nan
0       3.0               0             1
1       2.0               0             1
2       4.0               1             0
3       3.0               0             1
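The cells above illustrate imputation. For comparison, here is a minimal sketch of the deletion strategy mentioned earlier. It rebuilds the same four-row dataset in memory with `io.StringIO` (so it does not depend on the file on disk) and uses `DataFrame.dropna`; the variable names are illustrative.

```python
import io
import pandas as pd

csv_text = """NumRooms,RoofType,Price
NA,NA,127500
2,NA,106000
4,Slate,178100
NA,NA,140000
"""
data = pd.read_csv(io.StringIO(csv_text))

# Count the missing values in each column
missing_per_column = data.isna().sum()

# Deletion by row: keep only rows without any missing value
complete_rows = data.dropna(axis=0)

# Deletion by column: keep only columns without any missing value
complete_columns = data.dropna(axis=1)

print(missing_per_column)
print(complete_rows)
print(complete_columns)
```

On a dataset this small, deletion discards most of the information (only one complete row and one complete column remain), which is one reason imputation is often preferred when data is scarce.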


### Conversion to the Tensor Format

Now that all the entries in `inputs` and `targets` are numerical, we can load them into tensors (recall Section 2.1).

In [24]:
import torch

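# Convert the inputs and targets to tensors.
# Casting the inputs to float keeps this working when pandas stores the
# one-hot columns as booleans rather than integers.
X = torch.tensor(inputs.to_numpy(dtype=float))
y = torch.tensor(targets.values)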

# Display the tensors
print(X, y)

tensor([[3., 0., 1.],
        [2., 0., 1.],
        [4., 1., 0.],
        [3., 0., 1.]], dtype=torch.float64) tensor([127500, 106000, 178100, 140000])


## Linear Algebra

By now, we can load datasets into tensors and manipulate these tensors with basic mathematical operations. To start building sophisticated models, we will also need a few tools from linear algebra. This section offers a gentle introduction to the most essential concepts, starting from scalar arithmetic and ramping up to matrix multiplication.

### Scalars

Most everyday mathematics consists of manipulating numbers one at a time. Formally, we call these values **scalars**. For example, the temperature in Palo Alto is a balmy 72 degrees Fahrenheit. If you wanted to convert the temperature to Celsius, you would evaluate the expression:

$$
c = \frac{5}{9} (f - 32)
$$

setting $ f $ to 72. In this equation, the values 5, 9, and 32 are scalars. The variables $ c $ and $ f $ represent unknown scalars.

We denote scalars by ordinary lower-cased letters (e.g., $ x, y, z $) and the space of all (continuous) real-valued scalars by $ \mathbb{R} $. For expedience, we will skip past rigorous definitions of spaces. Just remember that the expression $ x \in \mathbb{R} $ is a formal way to say that $ x $ is a real-valued scalar. The symbol $ \in $ (pronounced “in”) denotes membership in a set. For example, $ x, y \in \{0, 1\} $ indicates that $ x $ and $ y $ are variables that can only take values 0 or 1.
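As a quick worked instance of the temperature conversion above, the following sketch evaluates $c = \frac{5}{9}(f - 32)$ at $f = 72$ using a one-element tensor. The variable names mirror the symbols in the formula.

```python
import torch

# Temperature in Fahrenheit as a scalar (0th-order) tensor
f = torch.tensor(72.0)

# Apply c = 5/9 * (f - 32)
c = 5 / 9 * (f - 32)

print(c)  # roughly 22.2 degrees Celsius
```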


In [25]:
# Scalars are implemented as tensors that contain only one element.
# Below, we assign two scalars and perform the familiar addition,
# multiplication, division, and exponentiation operations.
import torch

# Assign two scalars
x = torch.tensor(3.0)
y = torch.tensor(2.0)

# Perform operations
addition = x + y
multiplication = x * y
division = x / y
exponentiation = x ** y

# Display the results
print(addition, multiplication, division, exponentiation)


### Vectors

For our purposes, you can think of vectors as fixed-length arrays of scalars. As with their code counterparts, we call these values the **elements** of the vector (synonyms include entries and components).

When vectors represent examples from real-world datasets, their values hold some real-world significance. For example, if we were training a model to predict the risk of a loan defaulting, we might associate each applicant with a vector whose components correspond to quantities like their income, length of employment, or number of previous defaults. If we were studying heart attack risk, each vector might represent a patient, and its components might correspond to their most recent vital signs, cholesterol levels, minutes of exercise per day, etc.

We denote vectors by bold lowercase letters (e.g., $\mathbf{x}, \mathbf{y}, \mathbf{z}$). Vectors are implemented as 1st-order tensors. In general, such tensors can have arbitrary lengths, subject to memory limitations.

**Caution**: In Python, like in most programming languages, vector indices start at 0 (zero-based indexing), whereas in linear algebra, subscripts begin at 1 (one-based indexing).



```python
import torch

# Create a vector using arange
x = torch.arange(3)

# Display the vector
print(x)
```


To indicate that a vector contains $n$ elements, we write $\mathbf{x} \in \mathbb{R}^n$. Formally, we call $n$ the dimensionality of the vector. In code, this corresponds to the tensor’s length, accessible via Python’s built-in `len` function. For the vector `x` above, `len(x)` returns `3`.

We can also access the length via the `shape` attribute. The shape is a tuple that indicates a tensor’s length along each axis. Tensors with just one axis have shapes with just one element: for the vector `x` above, `x.shape` is `torch.Size([3])`.

Oftentimes, the word “dimension” gets overloaded to mean both the number of axes and the length along a particular axis. To avoid this confusion, we use **order** to refer to the number of axes and **dimensionality** exclusively to refer to the number of components.
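The distinction between order and dimensionality can be checked in code. This short sketch recreates the three-element vector and queries both quantities; `ndim` reports the number of axes.

```python
import torch

x = torch.arange(3)   # vector with elements 0, 1, 2

last = x[2]           # zero-based indexing: third element
length = len(x)       # dimensionality: number of elements
shape = x.shape       # length along each axis
order = x.ndim        # order: number of axes

print(last, length, shape, order)
```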

### Matrices

Just as scalars are 0th-order tensors and vectors are 1st-order tensors, matrices are 2nd-order tensors. We denote matrices by bold capital letters (e.g., $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$), and represent them in code by tensors with two axes. The expression

$$ 
\mathbf{A} \in \mathbb{R}^{m \times n} 
$$

indicates that a matrix $ \mathbf{A} $ contains $ m \times n $ real-valued scalars, arranged as $ m $ rows and $ n $ columns. When $ m = n $, we say that a matrix is square. Visually, we can illustrate any matrix as a table. To refer to an individual element, we subscript both the row and column indices, e.g., $ a_{ij} $ is the value that belongs to $ \mathbf{A} $’s $ i $-th row and $ j $-th column:

$$
\mathbf{A} = 
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
$$

In code, we represent a matrix $ \mathbf{A} \in \mathbb{R}^{m \times n} $ by a 2nd-order tensor with shape $ (m, n) $. We can convert any appropriately sized $ m \times n $ tensor into an $ m \times n $ matrix by passing the desired shape to `reshape`.


For example, `A = torch.arange(6).reshape(3, 2)` produces the tensor

$$
\mathbf{A} =
\begin{bmatrix}
0 & 1 \\
2 & 3 \\
4 & 5
\end{bmatrix}
$$

Sometimes, we want to flip the axes. When we exchange a matrix’s rows and columns, the result is called its transpose. Formally, we signify a matrix $\mathbf{A}$’s transpose by $\mathbf{A}^\top$ and if $\mathbf{B} = \mathbf{A}^\top$, then

$$
b_{ij} = a_{ji}
$$

for all $i$ and $j$. Thus, the transpose of an $m \times n$ matrix is an $n \times m$ matrix:

$$
\mathbf{A}^\top = 
\begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{m1} \\
a_{12} & a_{22} & \cdots & a_{m2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{mn}
\end{bmatrix}
$$

In code, we can access any matrix’s transpose via its `T` attribute. For the $3 \times 2$ tensor `A` above, `A.T` results in the tensor

$$
\begin{bmatrix}
0 & 2 & 4 \\
1 & 3 & 5
\end{bmatrix}
$$

Symmetric matrices are the subset of square matrices that are equal to their own transposes:

$$
\mathbf{A} = \mathbf{A}^\top
$$

The following matrix, created with `A = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])`, is symmetric:

$$
\mathbf{A} =
\begin{bmatrix}
1 & 2 & 3 \\
2 & 0 & 4 \\
3 & 4 & 5
\end{bmatrix}
$$

We can verify symmetry with the elementwise comparison `A == A.T`, resulting in

$$
\begin{bmatrix}
\text{True} & \text{True} & \text{True} \\
\text{True} & \text{True} & \text{True} \\
\text{True} & \text{True} & \text{True}
\end{bmatrix}
$$

Matrices are useful for representing datasets. Typically, rows correspond to individual records and columns correspond to distinct attributes.
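The matrix operations in this section can be reproduced with a short sketch. It uses the same values as the text; the names `A_t`, `S`, and `is_symmetric` are illustrative, and `torch.equal` is used here as one convenient way to reduce the elementwise comparison to a single boolean.

```python
import torch

# A 3 x 2 matrix built by reshaping a vector of six elements
A = torch.arange(6).reshape(3, 2)

# Its transpose is a 2 x 3 matrix with b_ij = a_ji
A_t = A.T

# A symmetric matrix equals its own transpose
S = torch.tensor([[1, 2, 3],
                  [2, 0, 4],
                  [3, 4, 5]])
elementwise_check = S == S.T
is_symmetric = torch.equal(S, S.T)

print(A)
print(A_t)
print(elementwise_check)
print(is_symmetric)
```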
