<a href="https://colab.research.google.com/github/gnoejh/AI/blob/main/Book/introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Network Architectures

This document catalogs neural network architectures that are widely used across different domains. Each one organizes the same fundamental building blocks in a different way to solve a specific type of problem.

## Architecture Library

### Feedforward Networks
- [Multilayer Perceptron](./mlp.ipynb) - Classic fully-connected neural networks
- [Deep Belief Networks](./dbn.ipynb) - Stacked Restricted Boltzmann Machines, typically pre-trained greedily one layer at a time

### Convolutional Networks
- [LeNet](./lenet.ipynb) - One of the earliest CNNs for digit recognition
- [AlexNet](./alexnet.ipynb) - Breakthrough CNN architecture that advanced image classification
- [VGG](./vgg.ipynb) - Deep CNN with uniform 3x3 convolutions throughout
- [ResNet](./resnet.ipynb) - Extremely deep networks with residual connections
- [Inception/GoogLeNet](./inception.ipynb) - Network architecture with parallel convolution paths
- [DenseNet](./densenet.ipynb) - Dense connections between layers to maximize information flow
- [EfficientNet](./efficientnet.ipynb) - Scaling networks for improved accuracy and efficiency
- [MobileNet](./mobilenet.ipynb) - Lightweight models optimized for mobile and embedded devices
- [U-Net](./unet.ipynb) - Encoder-decoder architecture with skip connections for image segmentation
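
The operation shared by every network in this list is the 2D convolution. The block below is a minimal pure-Python sketch of it, assuming no padding, stride 1, and a single channel. The helper name `conv2d_valid` is hypothetical and not taken from the linked notebooks. Like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of rows); no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Weighted sum of the kh x kw patch whose top-left corner is (i, j)
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 3x3 kernel (the size VGG uses throughout) applied to a 4x4 image
image = [[1, 1, 1, 1] for _ in range(4)]
kernel = [[1, 1, 1] for _ in range(3)]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

Without padding, a 3x3 kernel shrinks each spatial dimension by 2, so the 4x4 input above produces a 2x2 feature map.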

### Sequence Models
- [Recurrent Neural Network](./rnn.ipynb) - Networks with feedback connections for sequence data
- [LSTM](./lstm.ipynb) - Long Short-Term Memory networks for long-range dependencies
- [GRU](./gru.ipynb) - Gated Recurrent Units, simplified variant of LSTMs
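
The "feedback connection" in a recurrent network means that the hidden state from one step is fed back in at the next step. A minimal scalar sketch, assuming a plain tanh RNN with hypothetical weights `w_x` and `w_h` (LSTMs and GRUs add gates on top of this recurrence):

```python
import math

def rnn_forward(inputs, w_x, w_h, b=0.0, h0=0.0):
    """Run a scalar tanh RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The second input is zero, yet the second state is non-zero:
# the first input is still "remembered" through the feedback term.
states = rnn_forward([1.0, 0.0], w_x=1.0, w_h=1.0)
print(states)
```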

### Attention-Based Models
- [Transformer](./transformer.ipynb) - Attention-based architecture that revolutionized NLP
- [BERT](./bert.ipynb) - Bidirectional Encoder Representations from Transformers for language understanding
- [GPT](./gpt.ipynb) - Generative Pre-trained Transformer for text generation
- [Gemini](./gemini.ipynb) - Multimodal model capable of understanding and generating text, images and other modalities
- [Vision Transformer](./vit.ipynb) - Applying transformer architecture to computer vision tasks
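
The models in this list all build on scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`, from the Transformer paper. The block below is a minimal pure-Python sketch of that formula, assuming a single head, no masking, and no learned projections. The function names are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Each argument is a list of vectors (lists of floats)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Two identical keys receive equal weight, so the output is the mean of the values
out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
print(out)
```

Each query is scored against all keys independently of the other queries, so the whole computation can run in parallel rather than step by step as in an RNN.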

### Generative Models
- [Autoencoder](./autoencoder.ipynb) - Encoding and decoding for unsupervised learning
- [Variational Autoencoder](./vae.ipynb) - Probabilistic autoencoders for generating new data
- [GAN](./gan.ipynb) - Generative Adversarial Networks for realistic data generation
- [Diffusion Models](./diffusion.ipynb) - Noise-based generative models with high fidelity
- [Stable Diffusion](./stable_diffusion.ipynb) - Latent diffusion models for high-quality image generation

### Specialized Architectures
- [Siamese Networks](./siamese.ipynb) - Twin networks for similarity learning and one-shot learning
- [Graph Neural Networks](./gnn.ipynb) - Networks operating on graph-structured data

## Architecture Applications

Different neural network architectures excel in specific domains:

- **Computer Vision**: CNN-based architectures (LeNet, AlexNet, ResNet, etc.)
- **Natural Language Processing**: Sequence models and Transformers (LSTM, BERT, GPT)
- **Speech Recognition**: Hybrid CNN-RNN architectures
- **Generative Tasks**: GANs, VAEs, Diffusion Models
- **Reinforcement Learning**: Deep Q-Networks, Actor-Critic Networks
- **Graph Analysis**: Graph Neural Networks
- **Medical Imaging**: U-Net and its variants for segmentation tasks
- **Mobile Applications**: MobileNet, EfficientNet for resource-constrained environments

Each architecture notebook contains historical context, architectural details, and common use cases.

## Architecture Evolution

Neural network architectures have evolved to address limitations of previous designs:

- **Deeper Networks**: From shallow MLPs to extremely deep ResNets (152+ layers)
- **Vanishing Gradients**: RNNs → LSTMs/GRUs with gating mechanisms
- **Sequential Bottlenecks**: RNNs → Transformers with parallel processing
- **Information Flow**: Plain networks → Skip connections, Dense connections
- **Computational Efficiency**: Large models → Compact architectures such as MobileNet and EfficientNet

Understanding this evolutionary path helps in designing new architectures for emerging problems.
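
A toy calculation illustrates why skip connections help information flow in deep networks. The sketch below assumes a chain of scalar tanh layers with one shared weight `w`, a deliberate simplification and not how any of the listed networks is built. It compares the gradient of the output with respect to the input for a plain chain and for a chain with residual (identity) connections.

```python
import math

def input_gradient(depth, w=0.5, x=1.0, residual=False):
    """Return d(output)/d(input) for a chain of scalar tanh layers."""
    h, grad = x, 1.0
    for _ in range(depth):
        branch = math.tanh(w * h)
        local = w * (1.0 - branch ** 2)  # derivative of tanh(w * h) w.r.t. h
        if residual:
            h = h + branch               # skip connection: identity + branch
            grad *= 1.0 + local          # the identity path contributes the 1
        else:
            h = branch
            grad *= local                # every factor is at most w = 0.5
    return grad

plain_grad = input_gradient(depth=50)
residual_grad = input_gradient(depth=50, residual=True)
print(f"plain: {plain_grad:.3e}, residual: {residual_grad:.3e}")
```

In the plain chain every factor is at most 0.5, so the product shrinks toward zero as depth grows. With the skip connection every factor is at least 1, so the gradient cannot vanish in this toy setting.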

## Implementation Frameworks

Pre-trained implementations of many of these architectures are available in the standard deep learning frameworks:

### TensorFlow/Keras
```python
from tensorflow.keras.applications import ResNet50, VGG16, InceptionV3

# Load pre-trained model
model = ResNet50(weights='imagenet')
```

### PyTorch
```python
import torchvision.models as models

# Load pre-trained ImageNet weights (torchvision >= 0.13; `pretrained=True` is deprecated)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
```

### Hugging Face (for NLP models)
```python
from transformers import BertModel, GPT2Model

# Load pre-trained model
model = BertModel.from_pretrained('bert-base-uncased')
```

## References and Further Reading

- LeCun, Y., et al. (1998). [Gradient-based learning applied to document recognition](http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf). Proceedings of the IEEE.
- Krizhevsky, A., et al. (2012). [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf). NIPS.
- He, K., et al. (2016). [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385). CVPR.
- Hochreiter, S., & Schmidhuber, J. (1997). [Long Short-Term Memory](https://www.bioinf.jku.at/publications/older/2604.pdf). Neural Computation.
- Vaswani, A., et al. (2017). [Attention Is All You Need](https://arxiv.org/abs/1706.03762). NIPS.
- Goodfellow, I., et al. (2014). [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661). NIPS.
- Devlin, J., et al. (2018). [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805). arXiv.
- Ronneberger, O., et al. (2015). [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597). MICCAI.
- Howard, A. G., et al. (2017). [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861). arXiv.