# Advanced Neural Networks - Beginner Workshop

**Course:** CMSC 173 - Machine Learning  
**Institution:** University of the Philippines - Cebu  
**Instructor:** Noel Jeffrey Pinton

---

## Workshop Overview

This hands-on workshop introduces you to modern neural network architectures through **practical examples**. You'll learn by USING pre-trained models, not building them from scratch. Perfect for beginners!

**What you'll do:**
- Use a CNN for image classification
- See how diffusion models generate images (conceptually)
- Work with text using transformers
- See real-world applications

**Time:** 60-75 minutes  
**Prerequisites:** Basic Python, NumPy, basic ML concepts

## Section 1: Setup and Imports

Let's install and import everything we need. We'll use simple, beginner-friendly libraries.

In [None]:
# Install required packages (run this once)
# Uncomment if needed:
# !pip install torch torchvision transformers pillow matplotlib numpy

import warnings
warnings.filterwarnings('ignore')

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import torch
import torchvision.transforms as transforms
import torchvision.models as models
from transformers import pipeline

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Check if GPU is available (optional - CPU works fine for this workshop)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")
print(f"PyTorch version: {torch.__version__}")
print("\n✓ Setup complete! Ready to go.")

## Section 2: Motivation - Why These Architectures?

Before diving in, let's understand what makes each architecture special:

### CNNs (Convolutional Neural Networks)
- **Best for:** Images, videos, spatial data
- **Real examples:** Face recognition in your phone, medical image analysis, self-driving cars
- **Why they work:** They detect patterns like edges, textures, and shapes

### Transformers
- **Best for:** Text, sequences, language
- **Real examples:** ChatGPT, Google Translate, autocomplete
- **Why they work:** They understand context and relationships between words

### Diffusion Models
- **Best for:** Generating images from text
- **Real examples:** DALL-E 2, Midjourney, Stable Diffusion
- **Why they work:** They learn to remove noise and create realistic images

**Today's goal:** Get hands-on experience using these models!

## Section 3: Core Concepts - CNNs in Action

Let's start with CNNs. We'll use ResNet50, a powerful pre-trained model that can recognize 1000 different objects.

### What is ResNet50?
- A 50-layer CNN trained on roughly 1.2 million labeled ImageNet images
- Can recognize dogs, cats, vehicles, furniture, etc.
- CNNs like it power photo search and automatic tagging in real apps

### How CNNs "See" Images
Think of a CNN as having multiple detection layers:
1. **Layer 1:** Detects edges (horizontal, vertical, diagonal)
2. **Layer 2:** Combines edges into shapes (circles, squares)
3. **Layer 3:** Combines shapes into parts (eyes, wheels, windows)
4. **Layer 4:** Recognizes complete objects (cat, car, house)
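
To make "Layer 1 detects edges" concrete, here is a minimal NumPy sketch of a single convolution filter. The toy image, the helper name `conv2d_valid`, and the hand-written Sobel-style kernel are illustrative assumptions; a real CNN such as ResNet50 learns its kernels from data instead of using fixed ones.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the elementwise products.

    Deep learning libraries call this "convolution", although strictly speaking
    it is cross-correlation (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy 6x6 grayscale image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel-style kernel that responds to vertical edges (hand-picked, not learned)
vertical_edge_kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=float)

feature_map = conv2d_valid(image, vertical_edge_kernel)
print(feature_map)  # large values only where the dark-to-bright edge is
```

The output is zero over the flat regions and large only in the columns that straddle the edge. Deeper layers apply the same operation to these feature maps, which is how edges get combined into shapes and parts.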

In [None]:
# Load pre-trained ResNet50 model
print("Loading ResNet50 model...")
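# Note: `pretrained=True` is deprecated in torchvision >= 0.13; the `weights` argument replaces it.
# IMAGENET1K_V1 is the weight set that `pretrained=True` used to load.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)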
model.eval()  # Set to evaluation mode
print("✓ Model loaded!")

# Load ImageNet class labels
import urllib.request
import json

# Download class labels
url = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
try:
    with urllib.request.urlopen(url) as response:
        labels = json.loads(response.read())
    print(f"✓ Loaded {len(labels)} class labels")
except Exception:  # e.g. no internet connection; avoid a bare except that hides Ctrl+C
    print("Note: Could not load labels (offline?). Using indices instead.")
    labels = [f"Class {i}" for i in range(1000)]

# Define preprocessing transformation
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

### Create a Test Image

Let's create a simple test image to see what the model can recognize.

In [None]:
# Create a simple test image (a face-like pattern: a circle with two eyes)
def create_test_image():
    """Create a simple colored test pattern"""
    img_array = np.ones((224, 224, 3), dtype=np.uint8) * 200  # Light gray background
    
    # Draw a circle (representing a face)
    center_x, center_y = 112, 112
    radius = 60
    for i in range(224):
        for j in range(224):
            if (i - center_x)**2 + (j - center_y)**2 < radius**2:
                img_array[i, j] = [255, 200, 150]  # Orange/tan color
    
    # Draw two eyes on the same row (row 90), at columns 90 and 134
    eye_y = 90
    for eye_x in [90, 134]:
        for i in range(-8, 8):
            for j in range(-8, 8):
                if i**2 + j**2 < 40:
                    img_array[eye_y + i, eye_x + j] = [50, 50, 50]  # Dark eyes
    
    return Image.fromarray(img_array)

# Create and display test image
test_img = create_test_image()
plt.figure(figsize=(4, 4))
plt.imshow(test_img)
plt.title("Our Test Image")
plt.axis('off')
plt.tight_layout()
plt.show()

print("Test image created! Now let's see what ResNet50 thinks it is...")

### Make a Prediction

Now let's use ResNet50 to classify our test image!

In [None]:
# Preprocess the image
input_tensor = preprocess(test_img)
input_batch = input_tensor.unsqueeze(0)  # Add batch dimension

# Make prediction
with torch.no_grad():
    output = model(input_batch)

# Get top 5 predictions
probabilities = torch.nn.functional.softmax(output[0], dim=0)
top5_prob, top5_idx = torch.topk(probabilities, 5)

print("\n🔍 ResNet50's Top 5 Predictions:\n")
for i, (prob, idx) in enumerate(zip(top5_prob, top5_idx)):
    print(f"{i+1}. {labels[idx]:20s} - {prob.item()*100:5.2f}%")

print("\n✓ CNN classification complete!")
print("\nNote: Results may vary - our simple test image is quite abstract!")

### Understanding the Results

**What just happened?**
1. We created a simple image
2. ResNet50 processed it through 50 layers
3. Each layer detected different features
4. The final layer combined everything to make a prediction
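
The last step, turning the final layer's raw scores into ranked percentages, can be sketched without any model at all. The class names and scores below are made up for illustration (a real ResNet50 outputs 1000 scores), and `softmax` here is a small NumPy stand-in for `torch.nn.functional.softmax`.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical raw scores for 5 made-up classes
toy_labels = ["cat", "dog", "car", "house", "tree"]
toy_logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])

probs = softmax(toy_logits)
top3_idx = np.argsort(probs)[::-1][:3]  # indices of the 3 largest probabilities

for rank, idx in enumerate(top3_idx, start=1):
    print(f"{rank}. {toy_labels[idx]:6s} - {probs[idx] * 100:5.2f}%")
```

Because softmax always produces probabilities that sum to 1, the model reports a "best guess" even for images unlike anything it was trained on. That is why an abstract test image still gets confident-looking labels.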

**Real-world applications:**
- Medical imaging (detecting tumors)
- Facial recognition (unlocking phones)
- Quality control (detecting defects in manufacturing)
- Wildlife monitoring (counting animals in photos)

## Section 4: Implementation - Transformers for Text

Now let's explore Transformers - the technology behind ChatGPT and Google Translate!

### What are Transformers?
- Neural networks designed for understanding sequences (especially text)
- Use "attention" to understand context
- Power most modern AI chatbots
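
Here is a minimal NumPy sketch of the "attention" idea, assuming random stand-in embeddings for four tokens. In a real transformer, the queries, keys, and values come from learned projections of the embeddings; this sketch skips those and uses the embeddings directly to keep things short.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each token's output is a weighted average of all value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # similarity of every token pair
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights

rng = np.random.default_rng(0)
num_tokens, d_model = 4, 8
X = rng.normal(size=(num_tokens, d_model))  # stand-in token embeddings (random, not learned)

# Simplification: use X as queries, keys, and values (no learned projections)
output, weights = scaled_dot_product_attention(X, X, X)

print(weights.round(2))  # row i: how much token i "attends" to each token
print(output.shape)      # one context-aware vector per token
```

Each row of `weights` sums to 1, so every token's new representation is a blend of all tokens in the sentence. That blending is what lets the model use context.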

### Simple Example: Sentiment Analysis
Let's use a pre-trained transformer to classify text as positive or negative.

In [None]:
# Load a sentiment analysis pipeline
print("Loading sentiment analysis model...")
try:
    sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
    print("✓ Model loaded!\n")
    
    # Test sentences
    test_sentences = [
        "I love learning about neural networks!",
        "This workshop is really helpful and interesting.",
        "Machine learning is challenging but rewarding.",
        "I'm frustrated with this bug in my code.",
        "The weather is nice today.",
        "Neural networks are amazing tools for solving complex problems!"
    ]
    
    print("🤖 Analyzing sentiment of different sentences:\n")
    for sentence in test_sentences:
        result = sentiment_analyzer(sentence)[0]
        sentiment = result['label']
        confidence = result['score']
        
        emoji = "😊" if sentiment == "POSITIVE" else "😔"
        print(f"{emoji} {sentiment:8s} ({confidence*100:5.1f}%) - \"{sentence}\"")
    
    print("\n✓ Sentiment analysis complete!")
    
except Exception as e:
    print(f"Note: Could not load model. Error: {e}")
    print("This might happen if you're offline or have limited internet connectivity.")

### How Did It Work?

**The transformer analyzed each sentence by:**
1. Breaking it into word pieces (tokens)
2. Understanding relationships between words using "attention"
3. Determining overall emotional tone
4. Providing a confidence score
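
Step 1 (breaking text into word pieces) can be sketched in plain Python. The tiny vocabulary and the helper name `wordpiece_tokenize` are hypothetical; real tokenizers use vocabularies with tens of thousands of pieces, but the greedy longest-match idea shown here is the one WordPiece-style tokenizers are built on.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces.

    Pieces that continue a word are marked with a leading "##".
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1  # try a shorter piece
        if match is None:
            return [unk_token]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary for illustration only
toy_vocab = {"learn", "##ing", "net", "##work", "##s", "un", "##help", "##ful"}

for word in ["learning", "networks", "unhelpful", "xyz"]:
    print(word, "->", wordpiece_tokenize(word, toy_vocab))
```

Splitting into pieces means the model can handle words it has never seen as a whole, as long as it knows their parts.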

**Real-world applications:**
- Customer feedback analysis (understanding reviews)
- Social media monitoring (tracking brand sentiment)
- Content moderation (detecting toxic comments)
- Mental health apps (detecting distress signals)

### Another Transformer Task: Text Generation

Let's try another cool application - text completion!

In [None]:
# Load text generation pipeline
print("Loading text generation model (this may take a moment)...")
try:
    text_generator = pipeline("text-generation", model="distilgpt2")
    print("✓ Model loaded!\n")
    
    # Test prompts
    prompts = [
        "Machine learning is",
        "The future of artificial intelligence",
        "Neural networks can be used for"
    ]
    
    print("✍️  Generating text completions:\n")
    for prompt in prompts:
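        # do_sample=True makes the temperature setting take effect (sampling instead of greedy decoding)
        result = text_generator(prompt, max_length=30, num_return_sequences=1, do_sample=True, temperature=0.7)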
        generated_text = result[0]['generated_text']
        print(f"Prompt: \"{prompt}\"")
        print(f"Generated: \"{generated_text}\"\n")
    
    print("✓ Text generation complete!")
    print("\nNote: This is a small model. GPT-3 and GPT-4 are much larger and more capable!")
    
except Exception as e:
    print(f"Note: Could not load model. Error: {e}")
    print("This might happen if you're offline or have limited resources.")

## Section 5: Evaluation and Comparison

Let's compare what we've learned about different architectures:

### When to Use Each Architecture

| Task | Best Architecture | Example |
|------|------------------|----------|
| Image Classification | **CNN** | Identifying objects in photos |
| Object Detection | **CNN** | Self-driving cars detecting pedestrians |
| Text Analysis | **Transformer** | Sentiment analysis, spam detection |
| Language Translation | **Transformer** | Google Translate |
| Text-to-Image | **Diffusion Model** | DALL-E, Midjourney |
| Image Editing | **Diffusion Model** | Photoshop AI features |
| Chatbots | **Transformer** | ChatGPT, customer service bots |
| Video Recognition | **CNN + Transformer** | YouTube content analysis |

### Key Takeaways

**CNNs:**
- ✅ Excellent for images and spatial data
- ✅ Fast and efficient
- ❌ Less suited to text and long sequences, where distant context matters

**Transformers:**
- ✅ Best for text and language tasks
- ✅ Can handle long-range dependencies
- ❌ Computationally expensive
- ❌ Require lots of training data

**Diffusion Models:**
- ✅ Generate high-quality, realistic images
- ✅ Can edit existing images
- ❌ Slow to generate images
- ❌ Require powerful GPUs
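
We don't run a diffusion model in this workshop, but the "noise" half of the idea fits in a few lines of NumPy. This is a sketch under assumptions: the linear noise schedule and the 1000 steps are common choices rather than the settings of any particular product, and the 8x8 array is a toy stand-in for an image.

```python
import numpy as np

rng = np.random.default_rng(42)

num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)        # share of the original signal variance left at each step

def add_noise(x0, t):
    """Jump straight to step t by mixing the clean image with Gaussian noise."""
    noise = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

x0 = np.ones((8, 8))  # toy "image"
for t in [0, 250, 999]:
    x_t, _ = add_noise(x0, t)
    print(f"step {t:4d}: signal weight = {np.sqrt(alpha_bars[t]):.3f}, "
          f"noise weight = {np.sqrt(1.0 - alpha_bars[t]):.3f}")
```

During training, a network learns to predict the `noise` that was added. Generation then starts from pure noise and removes a little at each step, which is why producing one image takes many network passes and is slow.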

In [None]:
# Create a visualization comparing architecture characteristics
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

architectures = ['CNN', 'Transformer', 'Diffusion']
characteristics = {
    'Speed': [9, 6, 3],
    'Image Quality': [7, 4, 10],
    'Text Capability': [2, 10, 1],
    'Ease of Use': [9, 7, 6],
    'Training Cost': [5, 8, 9]
}

colors = ['#2E86AB', '#A23B72', '#F18F01']

for idx, arch in enumerate(architectures):
    ax = axes[idx]
    values = [characteristics[char][idx] for char in characteristics.keys()]
    bars = ax.barh(list(characteristics.keys()), values, color=colors[idx], alpha=0.7)
    ax.set_xlim(0, 10)
    ax.set_xlabel('Rating (0-10)', fontsize=10)
    ax.set_title(f'{arch}', fontsize=12, fontweight='bold')
    ax.grid(axis='x', alpha=0.3, linestyle='--')
    
    # Add value labels
    for bar in bars:
        width = bar.get_width()
        ax.text(width + 0.2, bar.get_y() + bar.get_height()/2, 
                f'{width:.0f}', ha='left', va='center', fontsize=9)

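plt.tight_layout()
# Create the output folder first; savefig raises an error if the directory does not exist
import os
os.makedirs('../figures', exist_ok=True)
plt.savefig('../figures/architecture_comparison_workshop.png', dpi=150, bbox_inches='tight')
plt.show()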

print("✓ Comparison chart created!")
print("\nNote: Ratings are illustrative, not measured, and depend on the use case.")
print("For 'Training Cost', a higher rating means MORE expensive to train.")

## Section 6: Advanced Topics - Combining Architectures

Real-world applications often combine multiple architectures! Let's see some examples:

### Example 1: Image Captioning
**Architecture:** CNN + Transformer
- CNN extracts features from the image
- Transformer generates a text description
- Used in: Accessibility tools, content moderation

### Example 2: Visual Question Answering
**Architecture:** CNN + Transformer
- CNN processes the image
- Transformer processes the question
- Combined model generates an answer
- Used in: Educational apps, virtual assistants

### Example 3: Text-to-Image (DALL-E 2)
**Architecture:** Transformer + Diffusion Model
- Transformer understands the text prompt
- Diffusion model generates the image
- Used in: Creative tools, marketing, game development

### Future Directions

**Multimodal Models** (like GPT-4 Vision):
- Process text, images, and even audio together
- Understand context across different data types
- Enable more natural human-AI interaction

**Efficient Architectures:**
- Smaller models that run on phones
- Faster inference for real-time applications
- Lower energy consumption

## Section 7: Challenge Problems

Now it's your turn! Try these challenges to deepen your understanding.

### Challenge 1: CNN Image Classification
**Task:** Modify the CNN example to classify a different type of test image.

**Hints:**
- Try creating a car-like shape (rectangle with circles)
- Or a house-like shape (square with triangle on top)
- See what ResNet50 predicts!

### Challenge 2: Transformer Sentiment
**Task:** Analyze sentiment of your own sentences.

**Try:**
- Write 3 positive sentences
- Write 3 negative sentences
- Write 3 neutral sentences (this model only outputs POSITIVE or NEGATIVE, so watch how it handles them)
- See if the model agrees with your assessment!

### Challenge 3: Architecture Selection
**Task:** For each application below, choose the best architecture and explain why:
1. Building a spam email detector
2. Creating an app that identifies plant species from photos
3. Developing a logo generator from text descriptions
4. Making a system that transcribes handwritten notes

### Challenge 4: Research Extension
**Task:** Pick one architecture and research a recent advancement:
- CNNs: Look up EfficientNet, or Vision Transformers (a transformer-based alternative to CNNs)
- Transformers: Research GPT-4 or Claude
- Diffusion: Investigate SDXL or DALL-E 3

Write a short paragraph (3-5 sentences) about what you learned.

In [None]:
# Challenge 1 Solution Template
# Uncomment and modify this code for Challenge 1

# def create_custom_image():
#     """Create your own test image here"""
#     img_array = np.ones((224, 224, 3), dtype=np.uint8) * 255
#     
#     # Your code here: Draw shapes to create an object
#     # Example: Draw a car, house, animal, etc.
#     
#     return Image.fromarray(img_array)
# 
# # Test your image
# my_image = create_custom_image()
# plt.imshow(my_image)
# plt.show()
# 
# # Classify it
# input_tensor = preprocess(my_image)
# input_batch = input_tensor.unsqueeze(0)
# with torch.no_grad():
#     output = model(input_batch)
# probabilities = torch.nn.functional.softmax(output[0], dim=0)
# top5_prob, top5_idx = torch.topk(probabilities, 5)
# for i, (prob, idx) in enumerate(zip(top5_prob, top5_idx)):
#     print(f"{i+1}. {labels[idx]:20s} - {prob.item()*100:5.2f}%")

print("Challenge 1: Uncomment the code above and try your own image!")

In [None]:
# Challenge 2 Solution Template
# Uncomment and modify for Challenge 2

# my_sentences = [
#     "Your positive sentence 1",
#     "Your positive sentence 2",
#     "Your positive sentence 3",
#     "Your negative sentence 1",
#     "Your negative sentence 2",
#     "Your negative sentence 3",
#     "Your neutral sentence 1",
#     "Your neutral sentence 2",
#     "Your neutral sentence 3",
# ]
# 
# for sentence in my_sentences:
#     result = sentiment_analyzer(sentence)[0]
#     print(f"{result['label']:8s} ({result['score']*100:5.1f}%) - {sentence}")

print("Challenge 2: Uncomment the code above and add your sentences!")

## Section 8: Solutions and Discussion

### Challenge 3 Solutions:

1. **Spam email detector:** **Transformer**
   - Spam detection requires understanding text context
   - Transformers excel at language understanding
   - Can detect subtle patterns in writing style

2. **Plant species identifier:** **CNN**
   - This is image classification
   - CNNs are designed for visual pattern recognition
   - Can detect leaves, flowers, stem patterns

3. **Logo generator from text:** **Diffusion Model**
   - Generating images from text descriptions
   - Diffusion models create high-quality images
   - Can be combined with transformers for better text understanding

4. **Handwriting transcription:** **CNN + Transformer**
   - CNN extracts features from handwritten text images
   - Transformer decodes features into text
   - Common architecture for OCR (Optical Character Recognition)

### Discussion Points:

**Why Pre-trained Models?**
- Training from scratch requires massive datasets and computing power
- Pre-trained models have learned general features
- You can fine-tune them for your specific task
- Democratizes AI - anyone can use powerful models!
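
Fine-tuning usually means keeping the pre-trained feature layers and training only a new final layer. The sketch below uses a tiny made-up network (`backbone`, `model_sketch`) as a stand-in for a real pre-trained model, so it runs instantly without downloading anything; the same two steps would apply to a model like ResNet50.

```python
import torch
import torch.nn as nn

# Tiny stand-in for a pre-trained network (hypothetical, for illustration only)
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
original_head = nn.Linear(8, 1000)  # pretend this was trained on 1000 classes
model_sketch = nn.Sequential(backbone, original_head)

# Step 1: freeze the feature extractor so its weights are not updated
for param in backbone.parameters():
    param.requires_grad = False

# Step 2: replace the final layer with a new one sized for our own task
num_new_classes = 5
model_sketch[1] = nn.Linear(8, num_new_classes)

trainable = sum(p.numel() for p in model_sketch.parameters() if p.requires_grad)
total = sum(p.numel() for p in model_sketch.parameters())
print(f"Trainable parameters: {trainable} of {total}")

# Only the new layer's parameters are handed to the optimizer
optimizer = torch.optim.Adam(
    (p for p in model_sketch.parameters() if p.requires_grad), lr=1e-3
)

logits = model_sketch(torch.randn(2, 3, 32, 32))  # batch of 2 random "images"
print(logits.shape)
```

Because only the small new layer is trained, fine-tuning needs far less data and compute than training the whole network from scratch.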

**Ethical Considerations:**
- **Bias:** Models learn from data, which may contain biases
- **Deepfakes:** Generative models can be misused
- **Privacy:** Facial recognition raises privacy concerns
- **Environmental impact:** Large models consume significant energy

**Best Practices:**
1. Understand your model's limitations
2. Test on diverse data
3. Be transparent about AI use
4. Consider ethical implications
5. Use appropriate model size for your task

## Section 9: Summary and Next Steps

### What You Learned Today

**Technical Skills:**
- ✅ Used a pre-trained CNN (ResNet50) for image classification
- ✅ Applied transformers for sentiment analysis and text generation
- ✅ Understood when to use each architecture
- ✅ Explored real-world applications

**Key Concepts:**
- **CNNs** detect visual patterns hierarchically (edges → shapes → objects)
- **Transformers** understand context in text using attention mechanisms
- **Diffusion Models** generate images by learning to remove noise
- **Multimodal models** combine architectures for complex tasks

### Quick Reference Guide

| Your Task | Architecture | Python Library | Pre-trained Model |
|-----------|--------------|----------------|-------------------|
| Classify images | CNN | torchvision | ResNet, EfficientNet |
| Detect objects | CNN | detectron2, ultralytics | Faster R-CNN, YOLO |
| Analyze sentiment | Transformer | transformers | DistilBERT, RoBERTa |
| Generate text | Transformer | transformers | GPT-2, DistilGPT-2 |
| Translate languages | Transformer | transformers | MarianMT, mBART |
| Generate images | Diffusion | diffusers | Stable Diffusion |

### Next Steps for Learning

**Beginner Level:**
1. Practice with more pre-trained models on Hugging Face
2. Try fine-tuning a model on your own small dataset
3. Build a simple image classifier for a personal project
4. Experiment with different transformer models

**Intermediate Level:**
1. Learn about transfer learning and fine-tuning techniques
2. Understand model architectures in detail
3. Try training a small model from scratch
4. Explore model optimization (quantization, pruning)

**Advanced Level:**
1. Implement custom architectures
2. Research recent papers on arXiv
3. Contribute to open-source ML projects
4. Experiment with multimodal models

### Recommended Resources

**Online Courses:**
- Fast.ai Practical Deep Learning for Coders (free)
- Hugging Face NLP Course (free)
- Stanford CS231n (CNNs) and CS224n (NLP)

**Documentation:**
- PyTorch tutorials: pytorch.org/tutorials
- Hugging Face docs: huggingface.co/docs
- Papers with Code: paperswithcode.com

**Communities:**
- r/MachineLearning (Reddit)
- Hugging Face Forums
- Fast.ai Forums
- Local ML meetups

### Final Thoughts

You've taken your first steps with modern neural networks! Remember:

- **Start simple:** Use pre-trained models before building your own
- **Practice regularly:** ML is a skill that improves with practice
- **Stay curious:** The field evolves rapidly - keep learning
- **Build projects:** Apply what you learn to real problems
- **Be ethical:** Consider the impact of your AI applications

**Thank you for participating in this workshop!**

---

*Questions? Contact: Noel Jeffrey Pinton*  
*Course: CMSC 173 - Machine Learning*  
*Institution: University of the Philippines - Cebu*

In [None]:
# Final code cell - Summary printout
print("="*60)
print(" " * 10 + "WORKSHOP COMPLETE!")
print("="*60)
print("\n📚 What you explored today:")
print("   ✓ CNNs (Convolutional Neural Networks)")
print("   ✓ Transformers")
print("   ✓ Diffusion Models (conceptually)")
print("\n🎯 Key takeaways:")
print("   • CNNs excel at image tasks")
print("   • Transformers are best for text")
print("   • Diffusion models generate high-quality images")
print("   • Real-world apps often combine architectures")
print("\n🚀 Keep practicing and building!")
print("="*60)