<a href="https://colab.research.google.com/github/gitmystuff/DTSC5502/blob/main/Module_12-Learning/Basic_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basic PyTorch

Your name somewhere

## Runtime Processing

Here's a breakdown of the differences between CPUs, T4 GPUs, and TPU v2-8s:

**CPU (Central Processing Unit)**

* **The Brain:** This is the general-purpose processor found in every computer. It handles all the basic operations of the system, from running your operating system and applications to executing calculations and managing data.
* **Sequential Processing:** CPUs are designed to handle a wide variety of tasks and are optimized for fast sequential execution, running one stream of instructions after another on each core with low latency.
* **Limited Cores:** CPUs have relatively few cores (processing units) compared with GPUs: typically from 4 to a few dozen in desktop processors, with workstation and server parts going higher.

**T4 GPU (Graphics Processing Unit)**

* **Parallel Powerhouse:**  Originally designed for graphics rendering, GPUs have evolved into powerful parallel processors. They excel at handling tasks that can be broken down into many smaller, simultaneous operations.
* **Many Cores:** GPUs contain thousands of simple cores (the T4 has 2,560 CUDA cores plus 320 Tensor Cores), allowing them to run very large numbers of computations in parallel.
* **Deep Learning Applications:** This parallelism makes GPUs well suited to deep learning tasks such as image recognition and natural language processing, as well as to other parallel workloads like scientific simulations.
* **NVIDIA Tesla T4:** The T4 is a data-center GPU from NVIDIA's Tesla line, aimed mainly at AI inference. It pairs 16 GB of GDDR6 memory with a low 70 W power envelope, trading peak training speed for power efficiency.

**TPU v2-8 (Tensor Processing Unit)**

* **Google's AI Specialist:** TPUs are custom-designed processors developed by Google specifically for machine learning and AI workloads.
* **Matrix Multiplication Focus:** TPUs are optimized for matrix multiplication, a core operation in deep learning algorithms.
* **High Throughput:** They offer very high throughput for matrix operations, leading to faster training and inference of deep learning models.
* **TPU v2-8:** This names a generation (v2) and a configuration: the "-8" is the number of TPU cores, provided by four TPU v2 chips with two cores each on a single board.
* **Framework Support:** TPUs are programmed through Google's XLA compiler. They were developed alongside TensorFlow, which has the tightest integration; JAX also targets them, and PyTorch can use them through the separate PyTorch/XLA (`torch_xla`) package.
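To see why matrix multiplication matters so much, here is a minimal sketch showing that a PyTorch `nn.Linear` layer (the layer type used in the network below) computes `x @ W.T + b`: one matrix multiplication plus a bias. The layer sizes and the random input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 5)   # weight shape (5, 4), bias shape (5,)
x = torch.randn(2, 4)     # a batch of 2 samples with 4 features each

# The layer's forward pass...
out_layer = layer(x)
# ...matches one matrix multiplication plus a bias
out_manual = x @ layer.weight.T + layer.bias

print(out_layer.shape)                                   # torch.Size([2, 5])
print(torch.allclose(out_layer, out_manual, atol=1e-6))  # True
```

Stacking layers stacks these matrix products, which is the kind of work TPUs (and GPU Tensor Cores) are built to accelerate.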

**Key Differences**

| Feature | CPU | T4 GPU | TPU v2-8 |
|---|---|---|---|
| **Primary Purpose** | General-purpose processing | Graphics and parallel processing | Machine learning and AI |
| **Architecture** | Sequential processing, few cores | Parallel processing, many cores | Matrix multiplication focus, high throughput |
| **Strengths** | Versatile, handles various tasks | High performance for parallel workloads | Extremely fast for deep learning |
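| **Weaknesses** | Limited parallel processing power | Less efficient for sequential, branch-heavy code; modest speed for large-scale training | Specialized hardware; requires an XLA-based framework (TensorFlow, JAX, or PyTorch/XLA) |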

**When to Use Each**

* **CPU:** For general-purpose tasks, running applications, and tasks that don't require massive parallel computation.
* **T4 GPU:** For deep learning, scientific simulations, graphics rendering, and other tasks that benefit from parallel processing.
* **TPU v2-8:** For large-scale deep learning training and inference, especially with TensorFlow or JAX models.

**In Summary**

CPUs, T4 GPUs, and TPU v2-8s are all processors designed for different purposes and with varying strengths. CPUs are general-purpose workhorses, GPUs are parallel powerhouses, and TPUs are specialized for AI. Choosing the right processor depends on the specific workload and requirements.
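After choosing a runtime, a quick check shows which processor PyTorch can actually see. This is a minimal sketch using the standard device-selection idiom; note that a TPU runtime will report `cpu` here, because PyTorch reaches TPUs only through the separate PyTorch/XLA (`torch_xla`) package, which this snippet does not use.

```python
import torch

# Pick the GPU if the runtime has one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

if device.type == "cuda":
    # Name of the first visible GPU (on a Colab T4 runtime this should name the T4)
    print(torch.cuda.get_device_name(0))
```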

In [5]:
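import torch
import torch.nn as nn
import torch.optim as optim

# The neural net: 4 inputs -> 5 hidden -> 4 hidden -> 1 output,
# with a sigmoid after every layer so the output lies in (0, 1)
class Net(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc1 = nn.Linear(4, 5)
    self.fc2 = nn.Linear(5, 4)
    self.fc3 = nn.Linear(4, 1)

  def forward(self, x):
    x = torch.sigmoid(self.fc1(x))
    x = torch.sigmoid(self.fc2(x))
    x = torch.sigmoid(self.fc3(x))
    return x

# Two training samples with 4 features each, and their 0/1 targets
# (float32 so the targets match the network's float32 output for MSELoss)
X = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]])
y = torch.tensor([[0], [1]], dtype=torch.float32)

# No random seed is set, so the initial weights (and the losses below) differ between runs
net = Net()
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)

# Training loop: each epoch uses the full dataset, so there is one update per epoch
epochs = 10000
for epoch in range(epochs):
  outputs = net(X)                 # forward pass
  loss = criterion(outputs, y)     # compare predictions with targets
  optimizer.zero_grad()            # clear gradients from the previous step
  loss.backward()                  # compute gradients with autograd
  optimizer.step()                 # update weights and biases

  if epoch % 1000 == 0:
    print(f"Epoch {epoch}, Loss: {loss.item()}")

# Predictions on the training inputs after training
predictions = net(X)
print("Final Predictions:", [list(arr) for arr in predictions])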


Epoch 0, Loss: 0.25482961535453796
Epoch 1000, Loss: 0.24583175778388977
Epoch 2000, Loss: 0.22925221920013428
Epoch 3000, Loss: 0.09875045716762543
Epoch 4000, Loss: 0.013522750698029995
Epoch 5000, Loss: 0.005175087600946426
Epoch 6000, Loss: 0.0029665709007531404
Epoch 7000, Loss: 0.002019819337874651
Epoch 8000, Loss: 0.0015089728403836489
Epoch 9000, Loss: 0.001194087089970708
Final Predictions: [[tensor(0.0316, grad_fn=<UnbindBackward0>)], [tensor(0.9689, grad_fn=<UnbindBackward0>)]]


In [6]:
X = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]])
predictions = net(X)
print(predictions)
print("Final Predictions:", [list(arr) for arr in predictions])

tensor([[0.0316],
        [0.9689]], grad_fn=<SigmoidBackward0>)
Final Predictions: [[tensor(0.0316, grad_fn=<UnbindBackward0>)], [tensor(0.9689, grad_fn=<UnbindBackward0>)]]


In [9]:
nu_X = torch.tensor([0.3, 0.4, 0.5, 0.6], dtype=torch.float32)

nu_predictions = net(nu_X)
print(nu_predictions)
# .item() converts a one-element tensor to a Python float (it only works for one-element tensors)
print("Final Predictions:", [nu_predictions.item()])

tensor([0.6115], grad_fn=<SigmoidBackward0>)
Final Predictions: [0.6114510297775269]
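The cell above passes a single unbatched sample of shape `(4,)`, which `nn.Linear` accepts. The sketch below contrasts that with a batch of one, and shows a common inference idiom: `torch.no_grad()` plus `.tolist()` to get plain Python floats without `grad_fn`. The model here is a hypothetical, untrained stand-in with the same layer sizes as `Net` (so the snippet runs on its own); its outputs will not match the 0.6115 printed above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Untrained stand-in with Net's layer sizes (illustrative only)
model = nn.Sequential(
    nn.Linear(4, 5), nn.Sigmoid(),
    nn.Linear(5, 4), nn.Sigmoid(),
    nn.Linear(4, 1), nn.Sigmoid(),
)

single = torch.tensor([0.3, 0.4, 0.5, 0.6])  # shape (4,): one unbatched sample
batch = single.unsqueeze(0)                 # shape (1, 4): a batch of one

model.eval()
with torch.no_grad():            # no autograd graph is needed for inference
    out_single = model(single)   # shape (1,)
    out_batch = model(batch)     # shape (1, 1)

print(out_single.shape, out_batch.shape)  # torch.Size([1]) torch.Size([1, 1])
print(out_batch.squeeze(1).tolist())      # a list with one plain Python float
```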


In PyTorch, a `torch.Tensor` is a multi-dimensional array that is the fundamental building block for all operations and models. You can think of it as the PyTorch equivalent of a NumPy array, but with some key advantages for deep learning.

Here's a breakdown of what makes `torch.Tensor` special:

**1. GPU Support**

* **CUDA Integration:** One of the primary benefits of PyTorch tensors is their seamless integration with NVIDIA GPUs. This allows you to perform computations on the GPU, significantly accelerating training and inference of deep learning models.
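* **Explicit Transfers:** Tensors are created on the CPU by default, and you move them between the CPU and GPU explicitly with `.to(device)`, `.cuda()`, or `.cpu()`. PyTorch does not move data automatically; an operation generally requires all of its tensors (and the model's parameters) to be on the same device.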
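A minimal sketch of moving data between devices, assuming nothing beyond a standard PyTorch install; on a CPU-only runtime every `.to(device)` call is simply a no-op.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(3, 4)      # created on the CPU by default
b = a.to(device)           # explicit copy to the GPU (no-op if device is the CPU)
c = (b * 2).sum(dim=1)     # computed on whatever device b lives on
result = c.cpu()           # explicit copy back, e.g. before calling .numpy()

print(a.device, b.device, result.device)
```

The same `.to(device)` call works on models (`net.to(device)`), which moves all of their parameters.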

**2. Automatic Differentiation**

* **`autograd` Package:** PyTorch tensors are integrated with the `autograd` package, which enables automatic differentiation. This means PyTorch can automatically calculate gradients (derivatives) of operations performed on tensors, which is crucial for training neural networks using gradient-based optimization algorithms.
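A small worked example of autograd on a scalar function, where the derivative is easy to check by hand: for $y = x^2 + 2x$, $\frac{dy}{dx} = 2x + 2$, which is 8 at $x = 3$.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)  # ask autograd to track operations on x
y = x ** 2 + 2 * x    # y = x^2 + 2x, so dy/dx = 2x + 2
y.backward()          # autograd computes dy/dx and stores it in x.grad
print(x.grad)         # tensor(8.)
```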

**3. Optimized Operations**

* **Efficient Computations:** PyTorch tensors are optimized for efficient numerical computation. They provide a wide range of built-in functions for tensor manipulation, linear algebra, and other mathematical operations commonly used in deep learning.
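A short sketch of a few of these built-in operations: a matrix product, a reduction along one dimension, and broadcasting (combining tensors of different but compatible shapes). The values are chosen so the results can be checked by hand.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
B = torch.ones(3, 2)

C = A @ B                   # matrix product, shape (2, 2): [[3, 3], [12, 12]]
col_means = A.mean(dim=0)   # reduce over rows: [1.5, 2.5, 3.5]
centered = A - col_means    # broadcasting: the (3,) vector is subtracted from each row

print(C)
print(col_means)
print(centered)             # [[-1.5, -1.5, -1.5], [1.5, 1.5, 1.5]]
```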

**4. Neural Network Building Blocks**

* **`torch.nn` Module:** PyTorch tensors are used extensively in the `torch.nn` module, which provides a collection of pre-built layers, activation functions, and other components for building neural networks.
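Every layer in `torch.nn` stores its weights and biases as tensors with `requires_grad=True`. This sketch rebuilds the notebook's `Net` layer sizes with `nn.Sequential` (a stand-in written for brevity, not the notebook's class) and counts its trainable parameters.

```python
import torch.nn as nn

# Same layer sizes as the notebook's Net, written with nn.Sequential
model = nn.Sequential(
    nn.Linear(4, 5), nn.Sigmoid(),
    nn.Linear(5, 4), nn.Sigmoid(),
    nn.Linear(4, 1), nn.Sigmoid(),
)

for name, p in model.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

total = sum(p.numel() for p in model.parameters())
print("Trainable parameters:", total)  # (4*5 + 5) + (5*4 + 4) + (4*1 + 1) = 54
```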

**5. Dynamic Computation Graph**

* **Define-by-Run:** PyTorch uses a dynamic computation graph, which means the graph is constructed as you execute operations. This allows for flexibility in defining and modifying models during runtime, which is particularly useful for research and experimentation.
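Because the graph is built as the code runs, ordinary Python control flow can change the model's structure from one call to the next. In this minimal sketch, `RepeatNet` is a hypothetical toy model that applies the same hidden layer a variable number of times; backpropagation follows exactly the steps that were executed.

```python
import torch
import torch.nn as nn

class RepeatNet(nn.Module):
    """Hypothetical toy model: reuses one hidden layer a variable number of times."""

    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 4)
        self.out = nn.Linear(4, 1)

    def forward(self, x, n_steps):
        # A plain Python loop: the graph records however many steps actually run
        for _ in range(n_steps):
            x = torch.tanh(self.hidden(x))
        return self.out(x)

torch.manual_seed(0)
net = RepeatNet()
x = torch.randn(2, 4)

for n_steps in (1, 3):
    net.zero_grad()
    loss = net(x, n_steps).pow(2).mean()
    loss.backward()   # backpropagates through exactly n_steps applications
    print(n_steps, loss.item(), net.hidden.weight.grad.norm().item())
```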

**Creating Tensors**

You can create PyTorch tensors in various ways:

```python
import numpy as np
import torch

# From a list
tensor_from_list = torch.tensor([1, 2, 3, 4])

# From a NumPy array (torch.tensor copies the data)
numpy_array = np.array([5, 6, 7, 8])
tensor_from_numpy = torch.tensor(numpy_array)

# With random values
random_tensor = torch.randn(3, 4)  # Creates a 3x4 tensor sampled from a standard normal distribution

# With zeros
zeros_tensor = torch.zeros(2, 5)  # Creates a 2x5 tensor filled with zeros
```
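Two details of tensor creation are worth seeing directly. `torch.tensor` always copies its input, while `torch.from_numpy` shares memory with the NumPy array. Default dtypes also depend on the input: Python integers become `int64` and Python floats become `float32`, which is why the notebook creates `y` with `dtype=torch.float32` so it matches the network's output.

```python
import numpy as np
import torch

arr = np.array([5, 6, 7, 8])
copied = torch.tensor(arr)      # always copies the data
shared = torch.from_numpy(arr)  # shares memory with arr (no copy)

arr[0] = 50                     # modify the NumPy array in place
print(copied[0].item(), shared[0].item())   # 5 50

# Default dtypes: Python ints -> int64, Python floats -> float32
print(torch.tensor([1, 2, 3, 4]).dtype)     # torch.int64
print(torch.tensor([0.1, 0.2]).dtype)       # torch.float32
```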

**Key Takeaways**

* `torch.Tensor` is the fundamental data structure in PyTorch for numerical computation.
* It offers GPU support, automatic differentiation, optimized operations, and integration with neural network modules.
* PyTorch tensors are essential for building and training deep learning models efficiently.

```python
criterion = nn.MSELoss()  # Mean Squared Error loss
optimizer = optim.SGD(net.parameters(), lr=0.1)  # Stochastic Gradient Descent
```

These two lines of code are essential for training your neural network in PyTorch. They define the **loss function** and the **optimizer** that will be used to update the network's parameters during training.

**`criterion = nn.MSELoss()`**

* **Loss Function:** A loss function measures the difference between the network's predictions and the actual target values. It quantifies the error that the network is making.
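* **`nn.MSELoss()`:** This creates an instance of the `MSELoss` class from PyTorch's `nn` module. `MSELoss` stands for Mean Squared Error Loss, a common loss function for regression tasks. With its default `reduction="mean"`, it calculates the average of the squared differences between the predicted and target values. Here it is applied to 0/1 targets, which works for this small example, although `nn.BCELoss` (or `nn.BCEWithLogitsLoss` on raw, pre-sigmoid outputs) is the more conventional choice for binary classification.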
* **Purpose:** The `criterion` (loss function) will be used during training to calculate the loss between the network's output and the true labels. This loss value guides the optimization process, indicating how well the network is performing and how the parameters should be adjusted to improve accuracy.
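A quick check that `nn.MSELoss()` matches the formula $\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2$. The predictions below are made-up values chosen so the arithmetic is easy: $(0.2^2 + 0.3^2)/2 = 0.065$.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[0.2], [0.7]])     # hypothetical network outputs
target = torch.tensor([[0.0], [1.0]])   # true labels

loss = nn.MSELoss()(pred, target)        # default reduction="mean"
manual = ((pred - target) ** 2).mean()   # (0.04 + 0.09) / 2 = 0.065

print(loss.item(), manual.item())
```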

**`optimizer = optim.SGD(net.parameters(), lr=0.1)`**

* **Optimizer:** An optimizer is an algorithm that adjusts the network's parameters (weights and biases) to minimize the loss function.
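* **`optim.SGD(...)`:** This creates an instance of the `SGD` class from PyTorch's `optim` module. `SGD` stands for Stochastic Gradient Descent, a widely used optimization algorithm. Because the training loop above passes the whole two-sample dataset on every step, this run is effectively full-batch gradient descent; the "stochastic" part comes from sampling mini-batches, which this example does not do.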
* **`net.parameters()`:** This provides the optimizer with the parameters of your neural network (`net`) that need to be updated during training.
* **`lr=0.1`:** This sets the learning rate for the optimizer. The learning rate controls the step size taken in the direction of the negative gradient during each iteration of the optimization process.
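With its defaults (no momentum, no weight decay), one `SGD` step applies the update $w \leftarrow w - \text{lr} \cdot \nabla_w L$. This sketch checks that rule on a made-up two-parameter loss $L = (w \cdot x)^2$, whose gradient $2(w \cdot x)\,x$ can be worked out by hand.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)   # toy "parameters"
x = torch.tensor([0.5, 1.5])                        # toy input
lr = 0.1

loss = (w * x).sum() ** 2    # toy loss: (w . x)^2, with w . x = -2.5
loss.backward()              # w.grad = 2 * (w . x) * x = [-2.5, -7.5]

expected = w.detach() - lr * w.grad    # plain SGD rule, computed by hand
optimizer = torch.optim.SGD([w], lr=lr)
optimizer.step()                       # applies the same rule to w in place

print(w.detach(), expected)            # both approximately [1.25, -1.25]
```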

**In Summary**

* **`criterion`:** Defines the loss function (Mean Squared Error) to measure the network's prediction error.
* **`optimizer`:** Defines the optimization algorithm (Stochastic Gradient Descent) to update the network's parameters and minimize the loss.

These two components work together during the training loop:

1. **Forward Pass:** The input data is passed through the network to generate predictions.
2. **Loss Calculation:** The `criterion` (loss function) is used to calculate the error between the predictions and the true labels.
3. **Zero Gradients:** `optimizer.zero_grad()` clears the gradients left over from the previous iteration, because PyTorch accumulates gradients by default.
4. **Backward Pass:** `loss.backward()` uses autograd to compute the gradients of the loss with respect to every parameter and stores them in each parameter's `.grad` attribute.
5. **Parameter Update:** `optimizer.step()` updates the network's parameters based on those gradients and the learning rate.

This iterative process continues for a specified number of epochs, gradually improving the network's accuracy by minimizing the loss function.
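Clearing gradients matters because each `backward()` call adds to `.grad` rather than overwriting it. This minimal sketch calls `backward()` twice on $L = w^2$ at $w = 2$ (gradient 4 each time) and shows the accumulated value before `zero_grad()` clears it. Whether the cleared gradient is `None` or a zero tensor depends on the PyTorch version (recent versions default to `None`).

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for _ in range(2):
    loss = w ** 2        # d(loss)/dw = 2w = 4
    loss.backward()      # adds 4 to w.grad on each call

accumulated = w.grad.item()
print(accumulated)       # 8.0: the two gradients were summed

optimizer.zero_grad()    # clears the accumulated gradient before the next update
print(w.grad)            # None in recent PyTorch versions, or tensor(0.) in older ones
```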