<h1> CS506 Programming for Computing </h1>
<h2> PyTorch Tutorial </h2>
<h3> Team Members: Sukhmani Thukral, Lanxi Luo, Shruti Arvind Kherade </h3>

This tutorial provides a comprehensive guide to using PyTorch for image classification on the MNIST dataset. You'll learn how to work with tensors, load and transform data, define neural networks, and train models using `torch.autograd`. The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0–9), each of size 28x28 pixels.

Topics we will cover:
- Introduction to PyTorch and its Applications
- Tensors
- Autograd and Gradients
- Datasets and Data Loading
- Transforms
- Brief Introduction to Neural Networks
- Building a Neural Network
- Training a Neural Network using `torch.autograd`

# (1) Introduction to PyTorch and its Applications

PyTorch is an open-source deep learning framework originally developed by Facebook's AI Research lab (now part of Meta AI). It provides a flexible and efficient platform for building and training neural networks, which has made it popular among researchers and practitioners in machine learning and artificial intelligence.

## Key Features of PyTorch
- **Dynamic Computation Graphs:** PyTorch uses dynamic computation graphs (also known as define-by-run), allowing for more flexibility during model development and debugging.
- **Tensor Computation:** PyTorch offers a powerful N-dimensional array (tensor) library, similar to NumPy, with strong GPU acceleration.
- **Automatic Differentiation:** The `autograd` module automatically computes gradients, simplifying the process of backpropagation in neural networks.
- **Extensive Libraries:** The PyTorch ecosystem includes companion libraries for vision (`torchvision`), text (`torchtext`), and audio (`torchaudio`) tasks; these are installed as separate packages.
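
The define-by-run behaviour listed above can be illustrated with a small sketch. The function `piecewise` and the tensors `x_pos` and `x_neg` are hypothetical names chosen for this example; the point is only that ordinary Python control flow decides which operations are recorded, so each call can build a different graph.

```python
import torch

def piecewise(x):
    # Plain Python branching: only the branch that runs is recorded in the graph.
    if x.item() > 0:
        return x ** 2      # derivative: 2 * x
    return -3 * x          # derivative: -3

x_pos = torch.tensor(2.0, requires_grad=True)
x_neg = torch.tensor(-1.0, requires_grad=True)

piecewise(x_pos).backward()   # graph for x ** 2
piecewise(x_neg).backward()   # graph for -3 * x

print(x_pos.grad)   # tensor(4.)
print(x_neg.grad)   # tensor(-3.)
```

Because the graph is rebuilt on every call, a standard debugger or `print` statement can be placed inside `piecewise` like in any other Python function.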

## Applications of PyTorch
PyTorch is widely used in various domains, including:
- **Computer Vision:** Image classification, object detection, segmentation, and style transfer.
- **Natural Language Processing (NLP):** Text classification, sentiment analysis, machine translation, and language modeling.
- **Reinforcement Learning:** Training agents for games, robotics, and decision-making tasks.
- **Generative Models:** Building GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders).
- **Scientific Computing:** Simulations, time-series forecasting, and other research applications.

PyTorch's ease of use, strong community support, and integration with Python make it a preferred choice for both academic research and industry applications.

## (2) Tensors
Tensors are the fundamental data structures in PyTorch, similar to NumPy arrays but with powerful GPU acceleration and gradient tracking capabilities.

### Tensor Creation
You can create tensors from Python lists or using built-in PyTorch methods.

In [2]:
import torch

# Creating tensors
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print("Tensor a:", a)
print("Tensor b:", b)

Tensor a: tensor([1., 2., 3.])
Tensor b: tensor([4., 5., 6.])
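
Besides Python lists, tensors can be created with built-in factory functions. The sketch below shows a few common ones; the variable names (`zeros`, `ones`, `steps`, `noise`) are illustrative only.

```python
import torch

zeros = torch.zeros(2, 3)         # 2x3 tensor filled with 0.0
ones = torch.ones(3)              # 1-D tensor of three 1.0 values
steps = torch.arange(0, 10, 2)    # evenly spaced integers: 0, 2, 4, 6, 8

torch.manual_seed(0)              # fix the seed so the random draw is repeatable
noise = torch.rand(2, 2)          # uniform samples in [0, 1)

print(zeros.shape)   # torch.Size([2, 3])
print(ones)          # tensor([1., 1., 1.])
print(steps)         # tensor([0, 2, 4, 6, 8])
```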


### Arithmetic Operations
Tensors support standard arithmetic operations like addition, subtraction, multiplication, and dot product.

In [3]:
print("Addition:", a + b)
print("Dot product:", torch.dot(a, b))

Addition: tensor([5., 7., 9.])
Dot product: tensor(32.)
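
Subtraction and multiplication work the same way. The short sketch below redefines `a` and `b` so that it runs on its own, and also checks that the dot product equals the sum of the element-wise product.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

diff = b - a          # element-wise subtraction: 3, 3, 3
prod = a * b          # element-wise multiplication: 4, 10, 18
scaled = 2 * a        # scalar multiplication: 2, 4, 6

print("Subtraction:", diff)
print("Element-wise product:", prod)
print("Scaled:", scaled)

# The dot product is the sum of the element-wise product.
print(torch.equal(prod.sum(), torch.dot(a, b)))   # True
```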


### Shape and Data Types
Tensors have attributes for checking their shape and data types.

In [4]:
print("Shape of a:", a.shape)
print("Data type of a:", a.dtype)

Shape of a: torch.Size([3])
Data type of a: torch.float32
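
Shape and data type can also be changed. As a minimal sketch (the names `v`, `m`, and `ints` are made up for this example), `reshape` rearranges the same values into a new shape and `to` converts the data type:

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

m = v.reshape(2, 3)          # same six values arranged as 2 rows x 3 columns
ints = v.to(torch.int64)     # convert float32 -> int64

print(m.shape)      # torch.Size([2, 3])
print(ints.dtype)   # torch.int64
print(v.numel())    # 6 (total number of elements)
```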


## (3) Autograd and Gradients
PyTorch uses `autograd` to automatically compute gradients. This is useful for training neural networks using backpropagation.

### requires_grad
To track operations on tensors for automatic differentiation, set `requires_grad=True`.

In [5]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)

### .backward()
Calling `.backward()` on a scalar output computes its gradients with respect to every input tensor that has `requires_grad=True`.

In [None]:
z = torch.dot(x, y)
z.backward()

### Accessing .grad
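Once `.backward()` has run, the gradient is available through the `.grad` attribute. Because `z = x · y`, the gradient of `z` with respect to `x` is simply `y`. Until `.backward()` is called, `.grad` is `None`, so run the previous cell first.

In [6]:
print("Gradient of x:", x.grad)

Gradient of x: tensor([4., 5., 6.])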
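
The same mechanism works for nonlinear functions. The sketch below uses a hypothetical tensor `w` and shows two things: the gradient of a sum of squares, and the fact that gradients accumulate across `.backward()` calls unless they are reset.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# f(w) = sum(w_i ** 2), so df/dw_i = 2 * w_i
f = (w ** 2).sum()
f.backward()
print(w.grad)        # tensor([2., 4., 6.])

# Gradients accumulate, so clear them before the next backward pass.
w.grad.zero_()

# g(w) = sum(3 * w_i), so dg/dw_i = 3
g = (3 * w).sum()
g.backward()
print(w.grad)        # tensor([3., 3., 3.])
```

Without the call to `w.grad.zero_()`, the second result would be the sum of both gradients, which is why training loops usually reset gradients at every step.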


## (4) Datasets and Data Loading
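`torchvision.datasets` provides ready-made datasets such as MNIST, and `DataLoader` wraps a dataset to serve it in mini-batches, optionally shuffled. The cell below downloads MNIST into a local `data` folder and builds training and test loaders with a batch size of 64.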

In [1]:
!pip install torchvision
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor

train_data = datasets.MNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.MNIST(root="data", train=False, download=True, transform=ToTensor())

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)
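
To see what a `DataLoader` yields without downloading anything, here is a minimal sketch that uses a synthetic stand-in for MNIST. The tensors `images` and `labels` and the dataset `fake_data` are assumptions made for illustration; they only mimic the shape of MNIST samples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 100 blank 1x28x28 "images" with labels cycling through 0-9.
images = torch.zeros(100, 1, 28, 28)
labels = torch.arange(100) % 10
fake_data = TensorDataset(images, labels)

loader = DataLoader(fake_data, batch_size=64, shuffle=True)

# Each iteration yields one (images, labels) batch.
for batch_images, batch_labels in loader:
    print(batch_images.shape, batch_labels.shape)
# One batch has 64 samples and the final batch holds the remaining 36.
```

With 100 samples and `batch_size=64`, the loader produces two batches; the real MNIST loaders above behave the same way, just with more data.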


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m



## (5) Transforms
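We normalize each image using the mean (0.1307) and standard deviation (0.3081) of the MNIST training set, so that pixel values have roughly zero mean and unit variance. This typically helps the network converge faster.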

In [2]:
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_data = datasets.MNIST(root="data", train=True, download=True, transform=transform)
test_data = datasets.MNIST(root="data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)
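
As a rough sketch of what `transforms.Normalize` does per channel, the formula `(pixel - mean) / std` can be applied by hand. The tiny 1x2x2 tensor `img` below is a made-up example standing in for an image already scaled to the range [0, 1] by `ToTensor()`.

```python
import torch

mean, std = 0.1307, 0.3081   # MNIST constants used in the cell above

# Hypothetical 1x2x2 "image" with values in [0, 1].
img = torch.tensor([[[0.0, 0.1307],
                     [0.5, 1.0]]])

normalized = (img - mean) / std
print(normalized)
# A pixel equal to the mean maps to 0, black (0.0) maps to about -0.4242,
# and white (1.0) maps to about 2.8215.
```

After normalization the values are no longer confined to [0, 1], which is expected: the goal is centering and scaling, not clamping.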

In [None]:


# Step 1: Import the libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Step 2: Create a small sample animal dataset in memory
# (to use a real dataset, load your own Kaggle CSV with pd.read_csv instead)

data = {
    'Animal': ['cat', 'dog', 'rabbit', 'cat', 'dog', 'rabbit', 'cat', 'dog', 'rabbit'],
    'Weight': [4, 20, 2, 5, 22, 2.5, 4.5, 21, 3],
    'Color': ['black', 'brown', 'white', 'white', 'black', 'brown', 'brown', 'white', 'black']
}
df = pd.DataFrame(data)
print("Sample Data:")
print(df)

# Step 3: Encode categorical variables
le_color = LabelEncoder()
df['Color_encoded'] = le_color.fit_transform(df['Color'])
le_animal = LabelEncoder()
df['Animal_encoded'] = le_animal.fit_transform(df['Animal'])

# Step 4: Prepare features and target
X = df[['Weight', 'Color_encoded']]
y = df['Animal_encoded']

# Step 5: Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 6: Train a classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Step 7: Make predictions and evaluate
y_pred = clf.predict(X_test)
print("\nClassification Report:")
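print(classification_report(
    y_test, y_pred,
    labels=list(range(len(le_animal.classes_))),  # report every class, even one missing from the test split
    target_names=le_animal.classes_,
    zero_division=0,  # print 0.00 instead of warning when a metric is undefined
))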

# Explanation:
# - We created a sample animal dataset with features 'Weight' and 'Color'.
# - Used LabelEncoder to convert categorical data to numeric.
# - Split the data into training and testing sets.
# - Trained a RandomForestClassifier for multi-class classification.
# - Evaluated the model using a classification report.

Sample Data:
   Animal  Weight  Color
0     cat     4.0  black
1     dog    20.0  brown
2  rabbit     2.0  white
3     cat     5.0  white
4     dog    22.0  black
5  rabbit     2.5  brown
6     cat     4.5  brown
7     dog    21.0  white
8  rabbit     3.0  black

Classification Report:
              precision    recall  f1-score   support

         cat       0.00      0.00      0.00         0
         dog       0.00      0.00      0.00         2
      rabbit       1.00      1.00      1.00         1

    accuracy                           0.33         3
   macro avg       0.33      0.33      0.33         3
weighted avg       0.33      0.33      0.33         3



