<a href="https://colab.research.google.com/github/gnoejh/AI/blob/main/Book/1.introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Introduction to Deep Learning

## 1.1 What is Deep Learning?

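Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers to learn hierarchical representations of data: early layers capture simple features, and later layers combine them into more abstract concepts.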

### Deep Learning Pipeline

<div class="zoomable-mermaid">

```mermaid
graph LR
    subgraph Input
        A[Raw Data]
    end
    subgraph Hidden Layers
        B[Simple Features]
        C[Complex Features]
        D[Abstract Concepts]
    end
    subgraph Output
        E[Predictions]
    end
    A --> B --> C --> D --> E
```

</div>

### Neural Networks Architecture

A neural network consists of:
1. Input Layer: Receives raw data
2. Hidden Layers: Processes and transforms data
3. Output Layer: Produces final predictions
4. Weights & Biases: Learnable parameters
5. Activation Functions: Non-linear transformations

In [1]:
import torch
import torch.nn as nn

class ModernNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_size),
            nn.Dropout(0.3),
            nn.Linear(hidden_size, num_classes),
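            # Softmax returns class probabilities; omit it when training with
            # nn.CrossEntropyLoss, which expects raw logits.
            nn.Softmax(dim=1)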
        )
    
    def forward(self, x):
        return self.layers(x)

# Example usage
model = ModernNN(784, 256, 10)  # MNIST-like architecture
print(model)  # Print the model architecture
x = torch.randn(64, 784)  # 64 samples with 784 features each
y = model(x)  # Forward pass
print(y.shape)  # torch.Size([64, 10])

ModernNN(
  (layers): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): ReLU()
    (2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.3, inplace=False)
    (4): Linear(in_features=256, out_features=10, bias=True)
    (5): Softmax(dim=1)
  )
)
torch.Size([64, 10])
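
To make the "Weights & Biases: Learnable parameters" item concrete, the sketch below counts the trainable parameters of a network with the same layer sizes as the example above. The helper name `count_parameters` is an illustrative choice, not a PyTorch function, and layers without parameters (ReLU, Dropout, Softmax) are left out for brevity.

```python
import torch.nn as nn

def count_parameters(module: nn.Module) -> int:
    """Sum the number of elements in all trainable tensors of a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Same layer sizes as the 784 -> 256 -> 10 example
net = nn.Sequential(
    nn.Linear(784, 256),   # 784*256 weights + 256 biases = 200,960
    nn.BatchNorm1d(256),   # 256 scales + 256 shifts      =     512
    nn.Linear(256, 10),    # 256*10 weights + 10 biases   =   2,570
)
total_params = count_parameters(net)
print(total_params)  # 204042
```

Almost all of the parameters sit in the first linear layer, which is typical when a wide input is mapped directly to a hidden layer.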


### Types of Deep Learning

| Type | Description | Common Applications | Key Architectures |
|------|-------------|---------------------|-------------------|
| Supervised | Learning from labeled data | Classification, Regression | CNN, RNN |
| Unsupervised | Finding patterns in unlabeled data | Clustering, Dimensionality Reduction | Autoencoder, GAN |
| Self-Supervised | Learning from data's inherent structure | Pre-training, Representation Learning | BERT, SimCLR |
| Reinforcement | Learning through environment interaction | Game AI, Robotics | DQN, PPO |
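
The main practical difference between the first two rows of the table is where the training target comes from. The following is a minimal sketch on synthetic data, with layer sizes and variable names chosen only for illustration: the supervised loss compares predictions with external labels, while the autoencoder loss compares the reconstruction with the input itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(8, 20)                # 8 synthetic samples with 20 features
labels = torch.randint(0, 3, (8,))    # synthetic class labels (supervised case only)

# Supervised: the target is an external label
classifier = nn.Linear(20, 3)
supervised_loss = F.cross_entropy(classifier(x), labels)

# Unsupervised (autoencoder): the target is the input itself
autoencoder = nn.Sequential(nn.Linear(20, 4), nn.ReLU(), nn.Linear(4, 20))
reconstruction_loss = F.mse_loss(autoencoder(x), x)

print(supervised_loss.item(), reconstruction_loss.item())
```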

### Evolution of Modern AI (2012-Present)

<div class="zoomable-mermaid">

```mermaid
timeline
    title Major Deep Learning & AI Breakthroughs
    2012 : AlexNet
         : Deep Learning Revolution Begins
    2014 : GANs Introduced
         : Deep Learning for Image Generation
    2017 : Transformer Architecture
         : Attention Is All You Need
    2018 : BERT
         : Transfer Learning in NLP
    2019 : GPT-2
         : Large Language Models Emerge
    2020 : GPT-3 & DDPM
         : Few-shot Learning & Novel Diffusion Models
    2021 : DALL-E
         : Text-to-Image Generation
    2022 : ChatGPT & Stable Diffusion
         : AI Goes Mainstream
    2023 : GPT-4 & Gemini
         : Multimodal AI Systems
    2024 : Sora & Claude 3
         : Text-to-Video & Advanced Reasoning
```

</div>

#### Key Modern AI Paradigms

| Year | Technology | Impact | Key Innovation |
|------|------------|---------|----------------|
| 2017-2019 | Transformers & BERT | NLP Revolution | Attention Mechanism |
| 2020-2022 | Large Language Models | General AI Assistants | Scale & Transfer Learning |
| 2022-2023 | Diffusion Models | Creative AI | Controlled Generation |
| 2023-2024 | Multimodal AI | Cross-domain Understanding | Multi-task Learning |

#### Modern AI Capabilities

<div class="zoomable-mermaid">

```mermaid
mindmap
  root((Modern AI))
    Language
      Chat & Dialogue
      Code Generation
      Translation
    Vision
      Image Generation
      Video Synthesis
      3D Modeling
    Audio
      Speech Recognition
      Music Generation
      Voice Synthesis
    Multimodal
      Text-to-Image
      Text-to-Video
      Cross-modal Understanding
```

</div>

#### Emerging Trends
- Agent-based AI: Autonomous systems that can plan and execute complex tasks
- Multimodal Learning: Integration of different types of data and modalities

## 1.2 Deep Learning vs Traditional Machine Learning

### Key Differences:

```mermaid
graph TB
    subgraph Traditional ML
        A1[Feature Extraction] --> B1[Feature Engineering]
        B1 --> C1[Model Training]
    end
    subgraph Deep Learning
        A2[Raw Data] --> B2[Automatic Feature Learning]
        B2 --> C2[End-to-End Training]
    end
```

| Aspect | Traditional ML | Deep Learning |
|--------|----------------|---------------|
| Feature Engineering | Manual | Automatic |
| Data Requirements | Small to Medium | Large |
| Interpretability | Higher | Lower |
| Training Time | Faster | Slower |
| Hardware Requirements | CPU sufficient | GPU/TPU preferred |

## 1.3 Modern Applications

### Computer Vision
- Object Detection
- Image Segmentation
- Face Recognition
- Medical Imaging

In [1]:
# Example: Using a pre-trained vision model
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
model.eval()

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

### Natural Language Processing
- Machine Translation
- Text Generation
- Sentiment Analysis
- Question Answering

In [3]:
# Example: Simple sentiment analysis implementation
# Note: Transformers library has dependency conflicts in this environment
import re

def simple_sentiment_analysis(text):
    """Simple rule-based sentiment analysis"""
    positive_words = ['love', 'great', 'amazing', 'excellent', 'revolutionizing', 
                     'fantastic', 'wonderful', 'good', 'awesome', 'brilliant']
    negative_words = ['hate', 'bad', 'terrible', 'awful', 'horrible', 
                     'disappointing', 'poor', 'worst', 'failed']
    
    # Tokenize into whole words so that e.g. 'bad' does not match inside 'badge'
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    pos_count = sum(1 for word in positive_words if word in tokens)
    neg_count = sum(1 for word in negative_words if word in tokens)
    
    if pos_count > neg_count:
        return {'label': 'POSITIVE', 'score': min(0.7 + pos_count * 0.1, 0.99)}
    elif neg_count > pos_count:
        return {'label': 'NEGATIVE', 'score': min(0.7 + neg_count * 0.1, 0.99)}
    else:
        return {'label': 'NEUTRAL', 'score': 0.5}

# Test the sentiment analysis
result1 = simple_sentiment_analysis('I love deep learning!')
print(f"Text: 'I love deep learning!'")
print(f"Result: {result1}")

result2 = simple_sentiment_analysis('Deep learning is revolutionizing AI!')
print(f"\nText: 'Deep learning is revolutionizing AI!'")
print(f"Result: {result2}")

result3 = simple_sentiment_analysis('This is a neutral statement.')
print(f"\nText: 'This is a neutral statement.'")
print(f"Result: {result3}")

Text: 'I love deep learning!'
Result: {'label': 'POSITIVE', 'score': 0.7999999999999999}

Text: 'Deep learning is revolutionizing AI!'
Result: {'label': 'POSITIVE', 'score': 0.7999999999999999}

Text: 'This is a neutral statement.'
Result: {'label': 'NEUTRAL', 'score': 0.5}


In [4]:
# Alternative: Using Hugging Face Transformers (when dependencies are properly installed)
# Uncomment and run this when transformers library is working:

"""
from transformers import pipeline

# Example: Advanced sentiment analysis with pre-trained models
classifier = pipeline('sentiment-analysis', 
                     model='cardiffnlp/twitter-roberta-base-sentiment-latest')

result = classifier('I love deep learning!')
print(f"Advanced result: {result}")

# You can also use other models:
# classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
"""

print("Note: The above code is commented out due to dependency conflicts.")
print("For production use, ensure proper installation of transformers and its dependencies.")

Note: The above code is commented out due to dependency conflicts.
For production use, ensure proper installation of transformers and its dependencies.


### Emerging Applications
- Autonomous Vehicles
- Drug Discovery
- Climate Modeling
- Creative Arts (AI Generation)

## 1.4 Modern Architectures

### Transformer Architecture

The Transformer architecture, introduced in the paper 'Attention Is All You Need', has revolutionized natural language processing (NLP) by enabling models to handle long-range dependencies and parallelize training.

Key components:
1. Self-Attention Mechanism: Allows the model to weigh the importance of different words in a sentence.
2. Positional Encoding: Adds information about the position of words in a sequence.
3. Multi-Head Attention: Improves the model's ability to focus on different parts of the input.
4. Feed-Forward Networks: Applies non-linear transformations to the input.
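
The first two components can be sketched in a few lines of PyTorch. This is a minimal illustration, not a full Transformer layer: it uses a single attention head, skips the learned query/key/value projections, and the function names and tensor sizes are chosen here for the example.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V and return the attention weights too."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v, weights

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sine/cosine position signal (d_model is assumed to be even)."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

torch.manual_seed(0)
batch, seq_len, d_model = 2, 5, 16
pe = sinusoidal_positional_encoding(seq_len, d_model)
x = torch.randn(batch, seq_len, d_model) + pe  # token embeddings plus position signal

# Simplification: a real layer would first project x into separate q, k, v tensors
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape, attn.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5, 5])
```

Multi-head attention repeats this computation on several lower-dimensional projections in parallel and concatenates the results; `torch.nn.MultiheadAttention` provides a ready-made version.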

### Generative Adversarial Networks (GANs)

GANs consist of two neural networks, a generator and a discriminator, that compete against each other. The generator creates fake data, while the discriminator tries to distinguish between real and fake data.

Applications:
- Image generation
- Data augmentation
- Style transfer
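
The competition between the two networks can be sketched as one training iteration. This is a toy setup under stated assumptions: the network sizes, learning rates, and the shifted Gaussian used as "real" data are all illustrative choices, and a usable GAN would repeat these two steps over many batches.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, batch = 8, 2, 32  # toy sizes chosen for illustration

generator = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 16), nn.ReLU(), nn.Linear(16, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(batch, data_dim) + 3.0  # stand-in for real data
ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

# 1) Discriminator step: push real -> 1 and fake -> 0 (generator is frozen via detach)
fake = generator(torch.randn(batch, latent_dim)).detach()
d_loss = bce(discriminator(real), ones) + bce(discriminator(fake), zeros)
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# 2) Generator step: try to make the discriminator output 1 on fresh fakes
fake = generator(torch.randn(batch, latent_dim))
g_loss = bce(discriminator(fake), ones)
opt_g.zero_grad()
g_loss.backward()
opt_g.step()

print(d_loss.item(), g_loss.item())
```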

### Variational Autoencoders (VAEs)

VAEs are generative models that learn to encode input data into a latent space and then decode it back to the original space. They are used for generating new data samples and learning data representations.

Applications:
- Image generation
- Anomaly detection
- Data compression
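
A minimal sketch of the encode, sample, decode cycle follows. The class name `TinyVAE`, the single-layer encoder and decoder, and the use of a summed squared error as the reconstruction term are simplifying assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, input_dim=20, latent_dim=4):
        super().__init__()
        self.encoder = nn.Linear(input_dim, 2 * latent_dim)  # outputs mean and log-variance
        self.decoder = nn.Linear(latent_dim, input_dim)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    recon_term = F.mse_loss(recon, x, reduction="sum")
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    kl_term = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl_term

torch.manual_seed(0)
x = torch.randn(8, 20)  # synthetic batch
vae = TinyVAE()
recon, mu, logvar = vae(x)
loss = vae_loss(recon, x, mu, logvar)
print(recon.shape, loss.item())
```

New samples are generated by drawing `z` from a standard normal distribution and passing it through the decoder alone.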

### Diffusion Models

Diffusion models are a class of generative models that learn to reverse a diffusion process, which gradually adds noise to data. They have shown impressive results in generating high-quality images.

Applications:
- Image generation
- Text-to-image synthesis
- Super-resolution
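
The noise-adding (forward) half of the process has a closed form, which the sketch below illustrates. It assumes a linear noise schedule with 1000 steps as used in DDPM; the function name `add_noise` and the tensor sizes are chosen for this example, and the denoising network that learns the reverse direction is not shown.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative fraction of signal kept

def add_noise(x0, t, noise):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    signal_scale = alpha_bars[t].sqrt().view(-1, 1)
    noise_scale = (1.0 - alpha_bars[t]).sqrt().view(-1, 1)
    return signal_scale * x0 + noise_scale * noise

torch.manual_seed(0)
x0 = torch.randn(4, 16)                 # stand-in for a batch of flattened images
t = torch.tensor([0, 250, 500, 999])    # one timestep per sample
noise = torch.randn_like(x0)
xt = add_noise(x0, t, noise)

# A denoising network (not shown) would be trained to predict `noise` from (xt, t),
# for example with an MSE loss; sampling then applies the learned reversal step by step.
print(alpha_bars[0].item(), alpha_bars[-1].item())
```

At `t = 0` the sample is almost unchanged, while at `t = 999` almost no signal remains, so the last step is close to pure Gaussian noise.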


## 1.5 Large Models

| Model | Company | Size (Parameters) | Properties |
|-------|---------|-------------------|------------|
| GPT-3 | OpenAI | 175 billion | Few-shot learning, Text generation |
| BERT | Google | 340 million (BERT-large) | Bidirectional, Pre-training for NLP |
| T5 | Google | 11 billion (largest variant) | Text-to-text framework |
| DALL-E | OpenAI | 12 billion | Text-to-image generation |
| GPT-4 | OpenAI | Not publicly disclosed | Multimodal capabilities |
| Gemini | Google DeepMind | Not publicly disclosed | Advanced reasoning, Multimodal |
| Claude 3 | Anthropic | Not publicly disclosed | Safety-focused, Conversational AI |
| Sora | OpenAI | Not publicly disclosed | Text-to-video generation |
| Titans | Google Research | Research architecture, not a single released model | Neural long-term memory for long contexts |
| Llama 3 | Meta | 8 billion and 70 billion (405 billion in Llama 3.1) | Open weights, Text generation |
| DeepSeek-V3 | DeepSeek | 671 billion total (mixture-of-experts, 37 billion active per token) | Open weights, Advanced reasoning |
| Grok-1 | xAI | 314 billion (mixture-of-experts) | Open weights, Conversational AI |
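
Parameter counts translate directly into memory requirements. As a rough back-of-the-envelope sketch (weights only, ignoring activations, optimizer state, and any serving overhead), the snippet below converts three of the published counts from the table into gigabytes. The helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Parameter counts taken from the table above
for name, params in [("BERT", 340e6), ("T5", 11e9), ("GPT-3", 175e9)]:
    fp32 = weight_memory_gb(params, 4)  # 32-bit floats use 4 bytes each
    fp16 = weight_memory_gb(params, 2)  # 16-bit floats use 2 bytes each
    print(f"{name}: {fp32:.2f} GB in fp32, {fp16:.2f} GB in fp16")
```

For GPT-3 this gives 700 GB in fp32 and 350 GB in fp16, which is why models of this size are split across many accelerators.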


## 1.6 GPU and AI Chips

| Chip | Manufacturer | Key Features |
|------|--------------|---------------|
| A100 | NVIDIA | High performance, Tensor Cores, Multi-instance GPU |
| V100 | NVIDIA | Tensor Cores, High memory bandwidth |
| TPU v4 | Google | Optimized for TensorFlow, High efficiency |
| Habana Gaudi | Intel | High throughput, Cost-effective |
| M1 | Apple | Unified memory architecture, High efficiency |
| Ascend 910 | Huawei | High performance, AI-specific optimizations |
| AMD Instinct MI100 | AMD | High performance, ROCm support |
| Cerebras CS-2 | Cerebras | Wafer-scale engine, High compute density |
| Blackwell | NVIDIA | Next-gen performance, Enhanced AI capabilities |
| HBM3E | SK hynix (also Samsung, Micron) | High-bandwidth memory stacked alongside AI accelerators, Low power consumption |
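
In PyTorch code, the choice of hardware usually comes down to selecting a device once and creating tensors on it. The sketch below is one possible way to do this; the helper name `pick_device` is an assumption for this example, and it only covers CUDA GPUs, Apple silicon (MPS), and the CPU fallback.

```python
import torch

def pick_device() -> torch.device:
    """Prefer an NVIDIA GPU (CUDA), then Apple silicon (MPS), then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # Older PyTorch builds have no MPS backend, so look it up defensively
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(64, 784, device=device)
w = torch.randn(784, 256, device=device)
y = x @ w  # the matrix multiply runs on whichever device was selected
print(device, y.shape)
```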


## 1.7 AI Developer Tools

### 1.7.1 Frameworks and Libraries
- TensorFlow: An open-source platform for machine learning.
- PyTorch: An open-source machine learning library based on the Torch library.
- Keras: A high-level neural networks API, written in Python and capable of running on top of TensorFlow.
- Scikit-learn: A machine learning library for the Python programming language.
- Hugging Face Transformers: A library for state-of-the-art NLP models.

### 1.7.2 Development Environments
- Jupyter Notebook: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
- Google Colab: A free Jupyter notebook environment that runs entirely in the cloud.
- VS Code: A source-code editor made by Microsoft for Windows, Linux, and macOS.
- PyCharm: An integrated development environment (IDE) used in computer programming, specifically for the Python language.

### 1.7.3 Model Deployment and Serving
- TensorFlow Serving: A flexible, high-performance serving system for machine learning models, designed for production environments.
- TorchServe: A flexible and easy-to-use tool for serving PyTorch models.
- ONNX Runtime: A cross-platform, high-performance scoring engine for Open Neural Network Exchange (ONNX) models.
- FastAPI: A modern, high-performance web framework for building APIs with Python, based on standard Python type hints.


### 1.7.4 Experiment Tracking and Management
- MLflow: An open-source platform for managing the end-to-end machine learning lifecycle.
- Weights & Biases: A tool for experiment tracking, model optimization, and dataset versioning.
- Neptune.ai: A metadata store for MLOps, built for research and production teams that run a lot of experiments.
- Comet.ml: A machine learning platform that allows data scientists and AI practitioners to track, compare, explain, and optimize experiments and models.

## 1.8 Quantum Computers

| Quantum Computer | Manufacturer | Qubits | Key Features |
|------------------|--------------|--------|--------------|
| Bristlecone | Google | 72 | Error correction, Scalable architecture |
| D-Wave Advantage | D-Wave Systems | 5000+ | Quantum annealing, Optimization problems |
| Google Quantum AI | Google | 72 | Quantum supremacy, Error correction |
| Honeywell H1 | Honeywell | 10 | High fidelity, Trapped ion technology |
| IBM Q System | IBM | 27 | High coherence times, Cloud access |
| IBM Quantum System One | IBM | 65 | High coherence times, Scalable architecture |
| IonQ Harmony | IonQ | 11 | Trapped ion technology, High fidelity |
| IonQ System | IonQ | 32 | Trapped ion technology, High fidelity |
| Origin Wukong | Origin Quantum | 72 | Superconducting qubits, Cloud access |
| Rigetti Aspen-9 | Rigetti Computing | 32 | Hybrid quantum-classical computing |
| Rigetti Quantum | Rigetti Computing | 40 | High coherence times, Scalable architecture |
| Sycamore | Google | 54 | Quantum supremacy, Error correction |
| Zuchongzhi | University of Science and Technology of China | 66 | Quantum supremacy, High performance |


## Discussions

Summary:
- This chapter introduced fundamental deep learning concepts and related technologies.
- We explored modern applications in computer vision, natural language processing, and emerging fields, along with the architectures, models, hardware, and tools behind them.

Questions:
1. How do diffusion models differ from transformer models?
2. What makes Transformer architectures a breakthrough compared to older NLP models?
