
# Workshop: Introduction to PyTorch and Deep Learning on DNA Sequences

In this tutorial, you'll learn how to use PyTorch to build, train, and evaluate neural networks for taxonomic classification of DNA sequences.


Let's get started! To run the code cells in this notebook, make sure you have PyTorch installed, along with scikit-learn, which Section 1.3 uses to generate a toy dataset.



## 1: Introduction to PyTorch

In this part, we'll learn the fundamentals of PyTorch, starting with tensors, the building blocks for any deep learning framework.



### 1.1 Tensors and PyTorch Basics

Tensors are multi-dimensional arrays and the primary data structure in PyTorch. They behave much like NumPy arrays, but they can also be moved to a GPU and can track gradients for automatic differentiation.

Let's create some basic tensors and perform operations on them.


In [1]:

import torch

# Create a tensor
tensor_a = torch.tensor([[1, 2], [3, 4]])
print("Tensor A:\n", tensor_a)

# Creating a random tensor
random_tensor = torch.rand((3, 3))
print("\nRandom Tensor:\n", random_tensor)

# Basic tensor operations
sum_tensor = tensor_a + tensor_a
print("\nSum of Tensors:\n", sum_tensor)

Tensor A:
 tensor([[1, 2],
        [3, 4]])

Random Tensor:
 tensor([[0.3419, 0.3777, 0.2065],
        [0.0987, 0.0702, 0.2707],
        [0.4607, 0.2155, 0.4208]])

Sum of Tensors:
 tensor([[2, 4],
        [6, 8]])
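
The NumPy comparison above can be made concrete with a short sketch. It assumes NumPy is installed alongside PyTorch; the variable names are illustrative only. It converts an array to a tensor and back, then moves the tensor to a GPU if PyTorch can see one, falling back to the CPU otherwise.

```python
import numpy as np
import torch

# Convert a NumPy array to a tensor; torch.from_numpy shares memory with the array
np_array = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
tensor_from_np = torch.from_numpy(np_array)

# Convert back to NumPy (this works for CPU tensors)
back_to_np = tensor_from_np.numpy()

# Use a GPU if one is visible to PyTorch, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tensor_on_device = tensor_from_np.to(device)

print("dtype:", tensor_from_np.dtype)
print("shape:", tensor_from_np.shape)
print("device:", tensor_on_device.device)
```

On a machine without a GPU, the last line reports `cpu` and `.to(device)` simply returns the same tensor.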


> ### Exercise: Create your own tensors and try different operations!


### Additional Tensor Operations: Reshaping, Slicing, and Element-wise Operations

In this section, we'll explore a few more tensor operations that are commonly used in deep learning workflows.

#### Reshaping
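PyTorch can change the shape of a tensor without altering its data, using `.view()` or `.reshape()`. `.view()` never copies and therefore requires a compatible memory layout; `.reshape()` returns a view when it can and copies the data otherwise.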
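
As a rough illustration of how `.view()` and `.reshape()` differ, the sketch below shows that a view shares storage with the original tensor, and that `.view()` fails on a non-contiguous tensor (here, a transpose) where `.reshape()` still succeeds. The variable names are made up for this example.

```python
import torch

x = torch.arange(6)        # 1D tensor with values 0 to 5
as_2x3 = x.view(2, 3)      # a view: shares storage with x

# Writing through the view also changes the original tensor
as_2x3[0, 0] = 100
print("x[0] after editing the view:", x[0])  # tensor(100)

# Transposing returns a non-contiguous tensor
transposed = as_2x3.t()
print("Contiguous after transpose?", transposed.is_contiguous())  # False

# .view() cannot flatten a non-contiguous tensor
view_failed = False
try:
    transposed.view(6)
except RuntimeError:
    view_failed = True
print("view() raised an error:", view_failed)

# .reshape() falls back to copying the data, so it succeeds
flat = transposed.reshape(6)
print("Flattened with reshape():", flat)
```

If you need `.view()` on such a tensor, calling `.contiguous()` first makes an explicit copy with a compatible layout.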

#### Slicing and Indexing
You can select parts of tensors using slicing, similar to how you would slice arrays in NumPy.

#### Element-wise Operations
PyTorch supports element-wise operations, such as addition, subtraction, multiplication, etc.


In [2]:

# Reshaping Tensors
tensor_a = torch.arange(16)  # Create a 1D tensor with values 0 to 15
print("Original Tensor:", tensor_a)

reshaped_tensor = tensor_a.view(4, 4)  # Reshape to 4x4
print("Reshaped Tensor (4x4):\n", reshaped_tensor)

# Slicing and Indexing
print("\nSelecting the first row:", reshaped_tensor[0])         # Select first row
print("Selecting the first column:", reshaped_tensor[:, 0])      # Select first column
print("Selecting a sub-tensor:\n", reshaped_tensor[1:3, 1:3])  # Select a 2x2 sub-tensor

# Element-wise Operations
tensor_b = torch.tensor([[2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5]])
print("\nTensor B:\n", tensor_b)

added_tensor = reshaped_tensor + tensor_b
print("\nElement-wise Addition:\n", added_tensor)

multiplied_tensor = reshaped_tensor * tensor_b
print("Element-wise Multiplication:\n", multiplied_tensor)

Original Tensor: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
Reshaped Tensor (4x4):
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])

Selecting the first row: tensor([0, 1, 2, 3])
Selecting the first column: tensor([ 0,  4,  8, 12])
Selecting a sub-tensor:
 tensor([[ 5,  6],
        [ 9, 10]])

Tensor B:
 tensor([[2, 2, 2, 2],
        [3, 3, 3, 3],
        [4, 4, 4, 4],
        [5, 5, 5, 5]])

Element-wise Addition:
 tensor([[ 2,  3,  4,  5],
        [ 7,  8,  9, 10],
        [12, 13, 14, 15],
        [17, 18, 19, 20]])
Element-wise Multiplication:
 tensor([[ 0,  2,  4,  6],
        [12, 15, 18, 21],
        [32, 36, 40, 44],
        [60, 65, 70, 75]])
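
Element-wise operations also work when the two tensors have different but compatible shapes, through broadcasting. The sketch below is one possible starting point for the exercise that follows; the tensors `row` and `column` are invented for illustration.

```python
import torch

matrix = torch.arange(16).view(4, 4)          # shape (4, 4)
row = torch.tensor([10, 20, 30, 40])          # shape (4,)
column = torch.tensor([[1], [2], [3], [4]])   # shape (4, 1)

# The 1D tensor is added to every row of the matrix
added_row = matrix + row
print("Matrix + row:\n", added_row)

# The (4, 1) tensor scales row i of the matrix by column[i]
scaled_rows = matrix * column
print("Matrix * column:\n", scaled_rows)

# A Python scalar is applied to every element
doubled = matrix * 2
print("Matrix * 2:\n", doubled)
```

Shapes are compared from the last dimension backwards, and a dimension of size 1 (or a missing dimension) is stretched to match the other tensor.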


> ### Exercise: Try reshaping, slicing, and performing element-wise operations with different dimensions!


### 1.2 The `nn` Module and Building a Simple Neural Network

PyTorch provides the `torch.nn` module to help define neural networks. Let's define a simple fully connected neural network and explore its components.

Below is a simple feed-forward neural network with a couple of layers. We'll define this model and briefly discuss the components.


In [3]:

import torch.nn as nn

# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # input features -> hidden layer
        self.relu = nn.ReLU()                           # non-linear activation
        self.fc2 = nn.Linear(hidden_size, num_classes)  # hidden layer -> one score per class
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Initialize the model
model = SimpleNN(input_size=10, hidden_size=20, num_classes=3)
print(model)


SimpleNN(
  (fc1): Linear(in_features=10, out_features=20, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=20, out_features=3, bias=True)
)
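
To explore the components a little further, the sketch below repeats the model definition so it runs on its own, passes a random batch through the network, and lists the learnable parameters. The batch size of 4 and the name `dummy_batch` are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Same architecture as SimpleNN above, repeated so this cell is self-contained
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

model = SimpleNN(input_size=10, hidden_size=20, num_classes=3)

# Forward pass on a random batch of 4 samples with 10 features each
dummy_batch = torch.rand(4, 10)
logits = model(dummy_batch)
print("Output shape:", logits.shape)  # torch.Size([4, 3])

# Each Linear layer holds a weight matrix and a bias vector
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

num_params = sum(p.numel() for p in model.parameters())
print("Total parameters:", num_params)  # (10*20 + 20) + (20*3 + 3) = 283
```

The output has one row per sample and one column per class. These values are raw scores (logits), not probabilities.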



### 1.3 Creating a Toy Dataset

Let's create a small random dataset for classification. This will help us understand how the training loop and loss function work.


In [4]:

from sklearn.datasets import make_classification
import torch
from torch.utils.data import TensorDataset, DataLoader

# Generate a toy dataset with 3 classes
X, y = make_classification(n_samples=100, n_features=10, n_classes=3, n_informative=5)
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.long)

# Create a DataLoader for batching
dataset = TensorDataset(X_tensor, y_tensor)
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)

# Print a sample batch
for batch in dataloader:
    print("Sample batch:", batch)
    break


Sample batch: [tensor([[-3.8080e-01, -2.1356e-03,  8.1096e-01, -1.8996e+00,  8.4224e-02,
         -2.5316e+00,  9.5041e-01,  1.2297e+00, -2.1863e+00,  6.1889e-01],
        [ 1.4225e+00, -6.4338e-01, -3.1275e+00,  2.1705e+00, -1.0950e+00,
          5.8291e-01, -1.7202e+00, -7.3819e-01, -1.6831e+00, -1.4407e+00],
        [-1.4680e+00, -7.4585e-01,  1.6721e+00, -2.2160e+00,  2.1608e+00,
         -5.9191e-01,  3.2951e-01, -1.3191e+00,  1.6106e+00, -3.8513e-01],
        [-8.9634e-02,  6.7160e-01,  4.3028e-02,  1.8721e+00, -6.0660e-02,
          3.0383e+00, -8.3419e-01, -2.5616e+00,  2.3328e+00,  1.6661e-03],
        [-2.4882e+00,  5.8942e-01,  1.7997e+00,  3.1415e-01,  3.9802e+00,
          8.5156e-02,  7.1867e-01,  1.4972e+00,  4.0995e+00, -1.3317e+00],
        [ 7.9621e-01,  4.9196e-01,  8.5864e-02, -7.2046e-01, -6.0964e-01,
         -2.0936e+00, -1.1053e-01,  1.7644e+00, -1.2465e-01,  4.8280e-01],
        [-1.3949e+00, -2.5379e-01, -1.6233e-01, -2.7198e+00,  3.6985e+00,
          6.0304e
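
Printing a whole batch is verbose, so checking shapes and dtypes is often more useful. The sketch below does that. As an assumption made only to keep it free of the scikit-learn dependency, it uses random stand-in tensors with the same shapes as the toy dataset rather than `make_classification`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Stand-in data (assumption): 100 samples, 10 features, 3 classes
X_tensor = torch.randn(100, 10)
y_tensor = torch.randint(0, 3, (100,))

dataset = TensorDataset(X_tensor, y_tensor)
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)

# Each batch is a pair: (inputs, labels)
inputs, labels = next(iter(dataloader))
print("Inputs:", inputs.shape, inputs.dtype)
print("Labels:", labels.shape, labels.dtype)
print("Batches per epoch:", len(dataloader))  # 100 samples / batch size 10 = 10
```

Unpacking the batch as `inputs, labels` is the same pattern the training loop uses later.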


### 1.4 Setting Up a Training Loop with Cross-Entropy Loss

The training loop involves feeding data into the model, computing the loss, performing backpropagation, and updating model parameters.

#### Understanding Cross-Entropy Loss
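Cross-entropy loss is the standard loss for classification tasks. It measures how far the predicted class probabilities are from the true labels: the loss is small when the model assigns high probability to the correct class. Note that `nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and applies log-softmax internally, which is why `SimpleNN` has no softmax layer.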
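
To see what the loss computes, the sketch below compares `nn.CrossEntropyLoss` with a manual calculation on two hand-picked samples. The logits and targets are invented values for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two samples, three classes: raw scores (logits), not probabilities
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # index of the correct class for each sample

# Built-in loss: takes logits directly and averages over the batch by default
builtin_loss = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the log-probability of the true class,
# negate it, and average over the batch
log_probs = F.log_softmax(logits, dim=1)
picked = log_probs[torch.arange(len(targets)), targets]
manual_loss = -picked.mean()

print("Built-in loss:", builtin_loss.item())
print("Manual loss:  ", manual_loss.item())
print("Match:", torch.allclose(builtin_loss, manual_loss))
```

Because softmax is applied inside the loss, adding a softmax layer to the model before it would apply the normalization twice.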

Let's set up the training loop using cross-entropy loss.


In [5]:

# Set up the loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 5
for epoch in range(num_epochs):
    for inputs, labels in dataloader:
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
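    # Note: this reports the loss of the last batch in the epoch, not the epoch
    # average, so the printed value can fluctuate from one epoch to the next
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")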


Epoch [1/5], Loss: 1.1347
Epoch [2/5], Loss: 1.1040
Epoch [3/5], Loss: 1.0667
Epoch [4/5], Loss: 1.0530
Epoch [5/5], Loss: 1.0809
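
The printed loss comes from a single batch, so it can be noisy. One common alternative, sketched below, is to average the loss over all samples in each epoch and then measure accuracy with gradients disabled. As assumptions for this self-contained example, it uses random stand-in data and an `nn.Sequential` model with the same layer sizes as `SimpleNN`. The accuracy is computed on the training data itself, so it is not an estimate of performance on unseen data.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Stand-in data (assumption): 100 samples, 10 features, 3 random class labels
X = torch.randn(100, 10)
y = torch.randint(0, 3, (100,))
dataloader = DataLoader(TensorDataset(X, y), batch_size=10, shuffle=True)

# Same layer sizes as SimpleNN, written with nn.Sequential for brevity
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

num_epochs = 5
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for inputs, labels in dataloader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size so the average is per sample
        running_loss += loss.item() * inputs.size(0)
    epoch_loss = running_loss / len(dataloader.dataset)
    print(f"Epoch [{epoch+1}/{num_epochs}], Average loss: {epoch_loss:.4f}")

# Evaluation: no gradients are needed, so disable tracking
model.eval()
correct = 0
with torch.no_grad():
    for inputs, labels in dataloader:
        preds = model(inputs).argmax(dim=1)  # class with the highest logit
        correct += (preds == labels).sum().item()
accuracy = correct / len(dataloader.dataset)
print(f"Training accuracy: {accuracy:.2%}")
```

With random labels the accuracy should stay close to chance; on a real dataset you would evaluate on a held-out split instead.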
