# Tensor

Here are the basic types of tensors along with their dimensions and some of their common operations:

1. Scalar (0D Tensor):
    A scalar tensor is a single value with no dimensions.
    Example: 5, -3.14
    Operations: Scalars are used as the base elements for mathematical operations.

2. 1D Tensor (Vector):
    A 1D tensor, often referred to as a vector, has one dimension.
    Example: [1, 2, 3, 4, 5]
    Operations: You can perform vector addition, subtraction, element-wise multiplication, and more.

3. 2D Tensor (Matrix):
    A 2D tensor, or matrix, has two dimensions: rows and columns.
    Example:
    [
      [1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]
    ]
    Operations: Matrix multiplication, element-wise operations, transposition, and more.

4. 3D Tensor:
    A 3D tensor has three dimensions: depth, rows, and columns.
    Example:
    [
      [
        [1, 2, 3],
        [4, 5, 6]
      ],
      [
        [7, 8, 9],
        [10, 11, 12]
      ]
    ]
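    Operations: The same element-wise, indexing, and reduction operations apply along each of the three dimensions. 3D tensors commonly hold data such as an RGB image (channels, height, width) or a batch of time series.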
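The four cases above can be checked directly in PyTorch. This is a minimal sketch that reuses the example values from the list; the variable names are illustrative only.

```python
import torch

# One tensor per rank, using the example values from the list above.
scalar = torch.tensor(5.0)                                  # 0D
vector = torch.tensor([1, 2, 3, 4, 5])                      # 1D
matrix = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])    # 2D
cube = torch.tensor([[[1, 2, 3], [4, 5, 6]],
                     [[7, 8, 9], [10, 11, 12]]])            # 3D

# ndim is the number of dimensions; shape gives the size along each one.
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, t.ndim, tuple(t.shape))
```

The 3D example has shape (2, 2, 3): two blocks, each with two rows and three columns.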

Common operations on tensors include:
1. Element-wise Operations: Operations performed independently on each element of a tensor.
2. Matrix Multiplication: Computing the matrix product, where each output entry is the dot product of a row of the first matrix with a column of the second.
3. Transpose: Flipping a matrix along its main diagonal.
4. Indexing and Slicing: Accessing specific elements or sub-tensors within a tensor.
5. Reshaping: Changing the shape of a tensor while maintaining the same number of elements.
6. Reduction Operations: Calculating statistics like sum, mean, min, max along certain dimensions.
7. Broadcasting: Implicitly expanding tensors of smaller shape to match the shape of larger tensors.
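Each operation in the list above maps to a short PyTorch expression. The sketch below uses two small 2x2 matrices chosen only for illustration, so the results are easy to verify by hand.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[10., 20.], [30., 40.]])

elementwise = a * b            # element-wise product, same shape as a and b
matmul = a @ b                 # matrix product: rows of a dotted with columns of b
transposed = a.T               # swap rows and columns
first_row = a[0]               # indexing: first row
second_col = a[:, 1]           # slicing: second column
reshaped = a.reshape(4)        # same 4 elements, new shape (4,)
col_sums = a.sum(dim=0)        # reduction over rows, one sum per column
row_means = a.mean(dim=1)      # reduction over columns, one mean per row
# Broadcasting: the shape-(2,) tensor is added to every row of a.
broadcast = a + torch.tensor([100., 200.])

print(matmul)
print(broadcast)
```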

In [36]:
import torch

# 2D tensors: x has shape (100, 2), y has shape (2, 2)
x = torch.ones(100,2)
print(x.shape)
print(x[1])

y = torch.ones(2,2)
print(y.shape)
print(y[1])

torch.Size([100, 2])
tensor([1., 1.])
torch.Size([2, 2])
tensor([1., 1.])


### Derivatives of tensor

In [23]:
import torch
import matplotlib.pyplot as plt

Setting requires_grad=True tells PyTorch to track the operations applied to x, so that derivatives of functions of x can be evaluated at this value of x.

In [14]:
x = torch.tensor(2.0, requires_grad= True)

In [4]:
y=x**2

In [5]:
y.backward()

In [7]:
x.grad

tensor(4.)

In [8]:
z = x**2 + 2*x + 1

In [9]:
z.backward()

In [10]:
x.grad

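tensor(10.)

Note that the derivative of z at x = 2 is 2x + 2 = 6, not 10. PyTorch accumulates gradients in x.grad across calls to backward(), so the 4 from the earlier y.backward() is added to the 6 from z.backward(). Call x.grad.zero_() before the second backward pass to read the gradient of z alone.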
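A minimal sketch of reading each derivative separately, assuming the stored gradient is reset with x.grad.zero_() between the two backward passes (gradients otherwise accumulate in x.grad). The names grad_y and grad_z are illustrative.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# First function: y = x^2, so dy/dx = 2x = 4 at x = 2.
y = x ** 2
y.backward()
grad_y = x.grad.clone()

# Reset the accumulated gradient before differentiating a second function.
x.grad.zero_()

# Second function: z = x^2 + 2x + 1, so dz/dx = 2x + 2 = 6 at x = 2.
z = x ** 2 + 2 * x + 1
z.backward()
grad_z = x.grad.clone()

print(grad_y, grad_z)
```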

### Partial Derivatives

In [16]:
u = torch.tensor(1.0, requires_grad= True)
v = torch.tensor(2.0, requires_grad= True)

In [17]:
f= u*v + u**2

In [18]:
f.backward()

In [19]:
u.grad

tensor(4.)

In [20]:
v.grad

tensor(1.)
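These values can be checked against the hand-derived partial derivatives, df/du = v + 2u and df/dv = u. The sketch below uses torch.autograd.grad, which returns the gradients directly instead of storing them in .grad; it is one possible way to run the check, not the only one.

```python
import torch

u = torch.tensor(1.0, requires_grad=True)
v = torch.tensor(2.0, requires_grad=True)
f = u * v + u ** 2

# Returns one gradient per input, in the same order as (u, v).
df_du, df_dv = torch.autograd.grad(f, (u, v))

# Hand-derived partial derivatives evaluated at u = 1, v = 2.
expected_df_du = v.item() + 2 * u.item()   # v + 2u
expected_df_dv = u.item()                  # u

print(df_du.item(), expected_df_du)
print(df_dv.item(), expected_df_dv)
```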

### Differentiation wrt x at multiple values

In [26]:
x= torch.linspace(-10, 10, 20, requires_grad= True)
y= torch.sum(x**2)
y.backward()
x.grad

tensor([-20.0000, -17.8947, -15.7895, -13.6842, -11.5789,  -9.4737,  -7.3684,
         -5.2632,  -3.1579,  -1.0526,   1.0526,   3.1579,   5.2632,   7.3684,
          9.4737,  11.5789,  13.6842,  15.7895,  17.8947,  20.0000])
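The sum is what makes this work: backward() expects a scalar output by default, and since y is the sum of x_i^2, the gradient with respect to each x_i is 2 * x_i. A short sketch that confirms the output above matches 2x element by element:

```python
import torch

x = torch.linspace(-10, 10, 20, requires_grad=True)
y = torch.sum(x ** 2)   # scalar, so backward() needs no extra arguments
y.backward()

# d/dx_i of sum(x^2) is 2 * x_i; detach() gives a plain copy of the values.
expected = 2 * x.detach()
matches = torch.allclose(x.grad, expected)
print(matches)
```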

# Dataset and Dataloader

## Dataset:

### Dataset Class:

**1. Data Representation:**
The primary role of the Dataset class is to provide an interface for representing your data. It encapsulates your data points and makes them accessible in a structured manner. This includes loading and storing individual data points.

**2. Custom Data Loading:**
The Dataset class allows you to implement custom data loading and preprocessing logic. You can define how to read data from files, apply transformations, and prepare data points for training or evaluation.

**3. Data Access by Index:**
It enables you to access individual data points using their indices. The \_\_getitem\_\_() method in your custom dataset class defines how a specific data point is accessed.

**4. Size Information:**
The \_\_len\_\_() method in your Dataset class returns the total number of data points in the dataset, providing size information.

### DataLoader Class:
**1. Batching:**
The DataLoader class takes care of dividing your dataset into smaller batches. Batching is essential for efficient training, as models often learn better and faster when trained on batches of data rather than individual data points.

**2. Shuffling:**
With shuffle=True, it reshuffles the data at the start of each epoch so the model does not learn order-based patterns. Shuffling keeps batches random and reduces ordering bias during training.

**3. Parallel Loading:**
The DataLoader class can load and preprocess batches of data in parallel by using several worker processes (the num_workers argument), leveraging multiple CPU cores. This accelerates the data loading process, especially for large datasets.

**4. Iterating Through Batches:**
It provides an iterator interface that allows you to easily loop through batches of data. You can use a for loop to iterate through batches in your training or evaluation loop.

**5. Convenient Usage:**
The DataLoader abstracts away the complexity of batching, shuffling, and parallel loading. It provides a convenient way to access data in a format suitable for training and evaluation.

In summary, the Dataset class focuses on representing your data and providing customized data loading logic, while the DataLoader class handles the mechanics of batching, shuffling, parallel loading, and iteration. Together, they provide an efficient and convenient way to work with large datasets during machine learning model training and evaluation.
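The division of labour described above can be sketched in a few lines. SquaresDataset is a hypothetical toy dataset invented for this illustration; only Dataset and DataLoader come from torch.utils.data.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: inputs 0..length-1, targets are their squares."""

    def __init__(self, length=10):
        self.x = torch.arange(length, dtype=torch.float32).unsqueeze(1)  # shape (length, 1)
        self.y = self.x ** 2

    def __len__(self):
        # Size information used by the DataLoader.
        return len(self.x)

    def __getitem__(self, index):
        # Access to a single data point by index.
        return self.x[index], self.y[index]


dataset = SquaresDataset(length=10)
# The DataLoader handles batching and shuffling; num_workers=0 loads in the main process.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = []
for xb, yb in loader:
    batch_shapes.append(tuple(xb.shape))
    print(xb.shape, yb.shape)
```

With 10 samples and a batch size of 4, the loader yields two full batches and a final batch of 2. Passing drop_last=True would discard that smaller final batch.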

### Example of creating a dataset and applying a single transform manually, outside the dataset class

In [8]:
import torch
from torch.utils.data import Dataset

# dataset
class toy_set(Dataset):
    def __init__(self,length=100,transform=None):
        self.x= 2 * torch.ones(length,2)
        self.y= torch.ones(length,1)
        self.len= length
        self.transform= transform
    
    def __getitem__(self,index):
        sample= self.x[index],self.y[index]
        if self.transform:
            sample= self.transform(sample)
        return sample
    
    def __len__(self):
        return self.len
    
#transform
class add_mult(object):
    def __init__(self,addx=1,muly=1):
        self.addx=addx
        self.muly=muly
    def __call__(self,sample):
        x=sample[0]
        y=sample[1]
        x=x+self.addx
        y=y*self.muly
        sample=x,y
        return sample

#create dataset object
dataset= toy_set()
#create transform object
a_m= add_mult()
#apply transform to a datapoint
print(dataset[0])
x_,y_=a_m(dataset[0])
print(x_,y_)



    

(tensor([2., 2.]), tensor([1.]))
tensor([3., 3.]) tensor([1.])


### Example of creating a dataset and applying a single transform automatically when a data point is accessed

In [11]:
import torch
from torch.utils.data import Dataset

# dataset
class toy_set(Dataset):
    def __init__(self,length=100,transform=None):
        self.x= 2 * torch.ones(length,2)
        self.y= torch.ones(length,1)
        self.len= length
        self.transform= transform
    
    def __getitem__(self,index):
        sample= self.x[index],self.y[index]
        if self.transform:
            sample= self.transform(sample)
        return sample
    
    def __len__(self):
        return self.len
    
#transform
class add_mult(object):
    def __init__(self,addx=1,muly=1):
        self.addx=addx
        self.muly=muly
    def __call__(self,sample):
        x=sample[0]
        y=sample[1]
        x=x+self.addx
        y=y*self.muly
        sample=x,y
        return sample

#transform object
a_m= add_mult()

#dataset object
# '_' indicates that the transform has been applied
dataset_= toy_set(transform=a_m)
print(dataset_[0])


(tensor([3., 3.]), tensor([1.]))


### Transform Compose: applying several transforms manually, outside the dataset

In [13]:
import torch
from torch.utils.data import Dataset

# dataset
class toy_set(Dataset):
    def __init__(self,length=100,transform=None):
        self.x= 2 * torch.ones(length,2)
        self.y= torch.ones(length,1)
        self.len= length
        self.transform= transform
    
    def __getitem__(self,index):
        sample= self.x[index],self.y[index]
        if self.transform:
            sample= self.transform(sample)
        return sample
    
    def __len__(self):
        return self.len

class add_mult(object):
    def __init__(self,addx=1,muly=1):
        self.addx=addx
        self.muly=muly
    def __call__(self,sample):
        x=sample[0]
        y=sample[1]
        x=x+self.addx
        y=y*self.muly
        sample=x,y
        return sample

class mult(object):
    def __init__(self,mul=100):
        self.mul=mul
    def __call__(self,sample):
        x=sample[0]
        y=sample[1]
        x=x*self.mul
        y=y*self.mul
        sample=x,y
        return sample

from torchvision import transforms

#create a chain of transform
data_transform = transforms.Compose([add_mult(),mult()])

#create dataset object
dataset= toy_set()
print(dataset[0])
x_,y_= data_transform(dataset[0])
print(x_,y_)

(tensor([2., 2.]), tensor([1.]))
tensor([300., 300.]) tensor([100.])


### Transform Compose: applying several transforms automatically when a data point is accessed

In [15]:
import torch
from torch.utils.data import Dataset

# dataset
class toy_set(Dataset):
    def __init__(self,length=100,transform=None):
        self.x= 2 * torch.ones(length,2)
        self.y= torch.ones(length,1)
        self.len= length
        self.transform= transform
    
    def __getitem__(self,index):
        sample= self.x[index],self.y[index]
        if self.transform:
            sample= self.transform(sample)
        return sample
    
    def __len__(self):
        return self.len

class add_mult(object):
    def __init__(self,addx=1,muly=1):
        self.addx=addx
        self.muly=muly
    def __call__(self,sample):
        x=sample[0]
        y=sample[1]
        x=x+self.addx
        y=y*self.muly
        sample=x,y
        return sample

class mult(object):
    def __init__(self,mul=100):
        self.mul=mul
    def __call__(self,sample):
        x=sample[0]
        y=sample[1]
        x=x*self.mul
        y=y*self.mul
        sample=x,y
        return sample

from torchvision import transforms

#create a chain of transform
data_transform = transforms.Compose([add_mult(),mult()])

#create dataset object and pass transforms object
dataset_= toy_set(transform= data_transform)
print(dataset_[0])


(tensor([300., 300.]), tensor([100.]))


# torchvision

torchvision is a companion package in the PyTorch ecosystem that provides utilities and tools for computer vision tasks. Two of its key features are pre-built datasets and data transformations that are commonly used in computer vision. The torchvision.datasets module offers access to several popular datasets, and the torchvision.transforms module provides transformations for data preprocessing.

### torchvision dataset:

One of the significant use cases for torchvision.datasets is to provide standardized datasets that enable researchers and practitioners to compare different machine learning models in a consistent manner. These datasets act as a common ground for evaluating the performance of various algorithms and models on the same data, making comparisons more meaningful and fair.

When comparing models, using the same dataset allows researchers and practitioners to focus on the differences in algorithms and architectures rather than variations in the datasets themselves. This standardized approach enhances the credibility of the comparison and helps the community identify which methods perform better under specific conditions.

For example, when comparing image classification models, you might use datasets like MNIST, CIFAR-10, or ImageNet to evaluate how different models perform on different scales of complexity and diversity in the dataset. By using well-established datasets, you can ensure that the differences observed in model performance are likely due to the model itself and not the idiosyncrasies of the data.

Using torchvision.datasets for model comparison also streamlines the process, as you can readily access and preprocess these datasets without needing to manually curate, format, and preprocess the data yourself. This is especially valuable when you want to focus on the model's architecture, training process, and evaluation metrics.

### DataLoader with torchvision datasets

The DataLoader class used with torchvision datasets comes from torch.utils.data, not from torchvision itself. It loads data from a dataset efficiently for training, validation, and testing. The loader handles batching, shuffling, and parallel loading, while augmentation and preprocessing come from the transforms attached to the dataset.

Here are the main features and uses of data loaders with torchvision datasets:

**1. Batching:** Data loaders automatically batch the data, which means that instead of loading the entire dataset at once, they load a small batch of samples (images and labels) into memory. This is essential for training neural networks efficiently, as it enables batch processing, which can significantly speed up training.

**2. Shuffling:** Data loaders can shuffle the dataset before each epoch during training. Shuffling the data helps prevent the model from memorizing the order of samples and enhances its generalization ability.

**3. Parallel Loading:** Data loaders can load batches in parallel using multiple workers. This helps to utilize the CPU cores efficiently and reduces the time spent waiting for data loading, which can speed up training.

**4. Data Augmentation:** torchvision provides various data augmentation techniques through the transforms module. You can apply transformations like random cropping, resizing, flipping, and color jittering to augment your dataset, which can improve the model's ability to generalize. These transforms are attached to the dataset through its transform argument and run each time the loader fetches a sample.

**5. Normalization:** You can normalize the data with the transforms.Normalize transformation, which subtracts a per-channel mean and divides by a per-channel standard deviation. This gives pixel values a consistent scale, which can help stabilize the training process.

**6. Custom Dataset Support:** In addition to pre-built datasets, you can also use data loaders with your custom datasets by creating a custom dataset class that adheres to the torch.utils.data.Dataset interface.

In summary, combining torchvision datasets and transforms with torch.utils.data.DataLoader is an efficient way to load, preprocess, and augment data for training and evaluation in computer vision tasks. The loader batches samples and parallelizes loading, and the transforms apply the preprocessing and augmentation steps.

### Download the pre-built MNIST dataset and create data loaders

In [19]:
import torch
from torchvision import transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader

#Define data transformations
transform = transforms.Compose([
    transforms.ToTensor(),
])

#Download MNIST dataset
mnist_train_dataset = MNIST(root='./resources/data', train=True, transform=transform, download=True)
mnist_test_dataset = MNIST(root='./resources/data', train=False, transform=transform, download=True)

#Create data loaders
batch_size=64
train_loader= DataLoader(mnist_train_dataset, batch_size=batch_size, shuffle=True)
test_loader= DataLoader(mnist_test_dataset, batch_size=batch_size, shuffle=False)
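Once a loader exists, a batch can be pulled from it to inspect its shape. To keep the sketch runnable without a download, it uses a synthetic stand-in: random tensors with the same per-sample shape that ToTensor produces for MNIST (1 channel, 28x28 pixels) and integer labels from 0 to 9. The names fake_mnist and loader are hypothetical; with the real data, train_loader would take the place of loader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 256 random "images" and random digit labels.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
fake_mnist = TensorDataset(images, labels)

loader = DataLoader(fake_mnist, batch_size=64, shuffle=True)

# Fetch a single batch: images are stacked along a new leading batch dimension.
batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)   # torch.Size([64, 1, 28, 28])
print(batch_labels.shape)   # torch.Size([64])
```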

