
# DeepSeek Architecture

## Overview

DeepSeek is a family of large language models developed by DeepSeek AI, with a particular focus on coding. The models are known for strong performance on code generation, mathematical reasoning, and general language tasks, and their weights are released openly.

## Key Features

- **Code-First Design**: Particularly strong performance on programming tasks across multiple languages
- **Math Reasoning**: Enhanced capabilities for mathematical reasoning and problem-solving
- **Multiple Size Variants**: DeepSeek LLM is released at 7B and 67B parameters, and DeepSeek-Coder ranges from 1.3B to 33B
- **Open Weights**: Weights publicly available for research and development
- **Bilingual Training**: Strong performance in both English and Chinese

## Architecture Specifics

DeepSeek models are based on the Transformer architecture with several optimizations:

- Decoder-only design similar to GPT and Llama architectures
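- SwiGLU activation function in the feed-forward layers
- RMSNorm for pre-normalization (applied before each attention and feed-forward sub-layer)
- Rotary positional embeddings (RoPE)
- Grouped-Query Attention (GQA) in the larger models, e.g. DeepSeek LLM 67B (the 7B model uses standard multi-head attention)
- Context length that depends on the variant: 4K tokens for DeepSeek LLM, 16K for DeepSeek-Coder, and up to 128K for DeepSeek-V2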
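The components listed above fit together inside each decoder block. The NumPy sketch below wires them into a single pre-norm block as a minimal illustration under stated assumptions: the function names, toy dimensions, and random weights are hypothetical and not taken from DeepSeek's code; RoPE uses the "rotate-half" dimension pairing found in Llama-style Hugging Face implementations (the original RoPE formulation pairs adjacent dimensions instead); and the KV cache, dropout, and training details are omitted.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: divide by the root-mean-square of the last axis (no mean subtraction, no bias)
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def silu(x):
    # SiLU (Swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: (silu(x W_gate) * (x W_up)) W_down
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

def rope(x, positions, base=10000.0):
    # RoPE with "rotate-half" pairing: dimension i is rotated together with i + head_dim/2
    # x: (seq_len, n_heads, head_dim), positions: (seq_len,)
    half = x.shape[-1] // 2
    inv_freq = base ** (-np.arange(half) / half)      # theta_i = base^(-2i / head_dim)
    angles = positions[:, None] * inv_freq[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles)[:, None, :], np.sin(angles)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    # q: (seq, n_q_heads, d); k, v: (seq, n_kv_heads, d); n_q_heads must be a multiple of n_kv_heads
    seq_len, n_q, d = q.shape
    group = n_q // k.shape[1]
    # Each K/V head is shared by `group` consecutive query heads
    k, v = np.repeat(k, group, axis=1), np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d)          # (heads, seq, seq)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # causal mask
    weights = softmax(np.where(future, -np.inf, scores), axis=-1)
    return np.einsum("hqk,khd->qhd", weights, v)                   # (seq, heads, d)

# Toy pre-norm decoder block; all sizes and random weights are illustrative only
rng = np.random.default_rng(0)
seq_len, n_q_heads, n_kv_heads, head_dim, d_ff = 5, 8, 2, 16, 256
d_model = n_q_heads * head_dim  # 128
x = rng.standard_normal((seq_len, d_model))
w_q = rng.standard_normal((d_model, n_q_heads * head_dim)) * 0.05
w_k = rng.standard_normal((d_model, n_kv_heads * head_dim)) * 0.05
w_v = rng.standard_normal((d_model, n_kv_heads * head_dim)) * 0.05
w_o = rng.standard_normal((n_q_heads * head_dim, d_model)) * 0.05
w_gate = rng.standard_normal((d_model, d_ff)) * 0.05
w_up = rng.standard_normal((d_model, d_ff)) * 0.05
w_down = rng.standard_normal((d_ff, d_model)) * 0.05
ones = np.ones(d_model)
pos = np.arange(seq_len, dtype=float)

# Attention sub-layer: RMSNorm -> projections -> RoPE on q and k -> GQA -> residual
h = rms_norm(x, ones)
q = rope((h @ w_q).reshape(seq_len, n_q_heads, head_dim), pos)
k = rope((h @ w_k).reshape(seq_len, n_kv_heads, head_dim), pos)
v = (h @ w_v).reshape(seq_len, n_kv_heads, head_dim)
attn = grouped_query_attention(q, k, v)
x = x + attn.reshape(seq_len, -1) @ w_o

# Feed-forward sub-layer: RMSNorm -> SwiGLU -> residual
out = x + swiglu_ffn(rms_norm(x, ones), w_gate, w_up, w_down)
print(out.shape)  # (5, 128)
```

In this toy configuration, 8 query heads share 2 key/value heads, so the per-token key/value cache is 4x smaller than with standard multi-head attention. That memory saving is the efficiency gain GQA targets.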

## Variants

- **DeepSeek-LLM**: Base models focused on general language capabilities
- **DeepSeek-Coder**: Models specifically optimized for programming tasks
- **DeepSeek-Math**: Specialized for mathematical reasoning and problem-solving
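- **DeepSeek-V2**: Mixture-of-Experts model with 236B total parameters (21B activated per token) that replaces standard attention with Multi-head Latent Attention (MLA) to shrink the key-value cache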

## Usage Examples

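```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the instruction-tuned DeepSeek Coder 6.7B model
model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # half the memory of float32
    device_map="auto",           # place weights on available devices (needs `accelerate`)
)

# Format the prompt with the model's chat template and open the assistant turn
messages = [
    {"role": "user", "content": "Write a Python function to find the nth Fibonacci number using memoization."}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# The template already contains the special tokens (e.g. BOS), so don't add them again
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

# Generate up to 512 new tokens with greedy decoding
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False,
                         eos_token_id=tokenizer.eos_token_id)

# Decode only the generated continuation, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```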

## References

- DeepSeek AI. (2024). [DeepSeek LLM: Scaling Open-Source Language Models with Longtermism](https://arxiv.org/abs/2401.02954). arXiv.
- DeepSeek AI. (2023). [DeepSeek Coder: When the Large Language Model Meets Programming](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/README.md). GitHub.
