# 🚀 DeepSeek-R1 Model with Azure AI Inference 🧠

**DeepSeek-R1** is a state-of-the-art reasoning model that combines reinforcement learning with supervised fine-tuning. It excels at complex reasoning tasks, with 37B active parameters and a 128K context window.

In this Jupyter notebook you will learn how to:
1. **Initialize** the `ChatCompletionsClient` for Azure serverless endpoints
2. **Chat** with DeepSeek-R1 using reasoning extraction
3. **Implement** a travel planning example with step-by-step reasoning
4. **Leverage** the 128K context window for complex scenarios

## Why DeepSeek-R1?
- **Advanced Reasoning**: Specializes in chain-of-thought problem solving
- **Massive Context**: 128K-token window for detailed analysis
- **Efficient Architecture**: 37B active parameters out of 671B total
- **Safety Integrated**: Built-in content filtering capabilities


## 1. Setup & Authentication

Requirements:
- `azure-ai-inference`: For chat completions
- `python-dotenv`: For environment variables

Required `.env` file entries:
```bash
AZURE_INFERENCE_ENDPOINT=<your-endpoint-url>
AZURE_INFERENCE_KEY=<your-api-key>
```
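
The setup cell reads both `AZURE_INFERENCE_ENDPOINT` and `AZURE_INFERENCE_KEY` from the environment, and a missing value tends to surface only later as an authentication error. A minimal sketch of an early check follows; `find_missing_settings` and `REQUIRED_SETTINGS` are hypothetical names used for illustration, not part of `azure-ai-inference` or `python-dotenv`.

```python
import os

# Variable names read by the setup cell. The helper below is illustrative only.
REQUIRED_SETTINGS = ("AZURE_INFERENCE_ENDPOINT", "AZURE_INFERENCE_KEY")


def find_missing_settings(env=None, required=REQUIRED_SETTINGS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]


if __name__ == "__main__":
    missing = find_missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Calling this right after `load_dotenv(...)` and before constructing the client would make a misplaced `.env` file easier to spot.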

In [None]:
import os
import re
from dotenv import load_dotenv
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

from pathlib import Path

# Load environment variables
notebook_path = Path().absolute()
parent_dir = notebook_path.parent
load_dotenv(parent_dir / '../.env')

endpoint = os.getenv("AZURE_INFERENCE_ENDPOINT")
model_name = "DeepSeek-R1"
key = os.getenv("AZURE_INFERENCE_KEY")

# Initialize client
try:
    client = ChatCompletionsClient(
        endpoint=endpoint,
        credential=AzureKeyCredential(key)
    )
    print("✅ Client initialized | Model:", client.get_model_info().model_name)
except Exception as e:
    print("❌ Initialization failed:", e)



## 2. Intelligent Travel Planning ✈️

Demonstrate DeepSeek-R1's reasoning capabilities for trip planning:

In [None]:
def plan_trip_with_reasoning(query, show_thinking=False):
    """Get travel recommendations with reasoning extraction"""
    messages = [
        SystemMessage(content="You are a travel expert. Provide detailed plans with rationale."),
        UserMessage(content=f"{query} Include hidden gems and safety considerations.")
    ]
    
    response = client.complete(
        messages=messages,
        model=model_name,
        temperature=0.7,
        max_tokens=1024
    )
    
    content = response.choices[0].message.content
    
    # Extract reasoning if present
    if show_thinking:
        match = re.search(r"<think>(.*?)</think>(.*)", content, re.DOTALL)
        if match:
            return {"thinking": match.group(1).strip(), "answer": match.group(2).strip()}
    return content

# Example usage
query = "Plan a 5-day cultural trip to Kyoto in April"
result = plan_trip_with_reasoning(query, show_thinking=True)

print("🗺️ Query:", query)
if isinstance(result, dict):
    print("\n🧠 Thinking Process:", result["thinking"])
    print("\n📝 Final Answer:", result["answer"])
else:
    print("\n📝 Response:", result)
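
The regex in `plan_trip_with_reasoning` only matches when the closing `</think>` tag is present. With a small `max_tokens` value, a long reasoning trace could plausibly be cut off before that tag appears, in which case the function falls back to returning the raw string. The sketch below shows one way to handle that case explicitly; `split_reasoning` is a hypothetical helper and makes no call to the service.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content):
    """Split model output into a (thinking, answer) tuple.

    Covers three cases: a complete <think>...</think> block, an unclosed
    <think> tag (output cut off mid-reasoning), and no tag at all.
    """
    match = THINK_BLOCK.search(content)
    if match:
        # Everything outside the think block is treated as the answer.
        answer = content[:match.start()] + content[match.end():]
        return match.group(1).strip(), answer.strip()
    if "<think>" in content:
        # Reasoning never closed, so no final answer follows it.
        before, _, thinking = content.partition("<think>")
        return thinking.strip(), before.strip()
    return "", content.strip()


print(split_reasoning("<think>Check temple hours.</think>Day 1: Fushimi Inari."))
print(split_reasoning("<think>Still weighing options"))
```

An empty answer alongside non-empty thinking is a hint that the response was truncated and `max_tokens` may need to be raised.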

## 3. Technical Problem Solving 💻

Showcase coding/optimization capabilities:

In [None]:
def solve_technical_problem(problem):
    """Solve complex technical problems with structured reasoning"""
    response = client.complete(
        messages=[
            # Escape the backslash: "\b" in a normal string literal is a backspace character
            UserMessage(content=f"{problem} Please reason step by step, and put your final answer within \\boxed{{}}.")
        ],
        model=model_name,
        temperature=0.3,
        max_tokens=2048
    )
    
    return response.choices[0].message.content

# Database optimization example
problem = """How can I optimize a PostgreSQL database handling 10k transactions/second?
Consider indexing strategies, hardware requirements, and query optimization."""

print("🔧 Problem:", problem)
print("\n⚙️ Solution:", solve_technical_problem(problem))
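
Because the prompt asks for the final answer inside `\boxed{}`, the answer can be pulled out of the response text afterwards. The sketch below is one way to do that with a brace counter, so nested LaTeX such as `\frac{1}{2}` survives. `extract_boxed` is a hypothetical helper, and for open-ended questions like the database example the model may not produce a boxed answer at all, in which case it returns `None`.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # the opening brace of \boxed{ is already counted
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # braces never balanced, e.g. truncated output


sample = r"Adding the terms gives \boxed{\frac{1}{2}} as the result."
print(extract_boxed(sample))
```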

## 4. Best Practices & Considerations

1. **Reasoning Handling**: Use a regex to separate `<think>` content from the final answer
2. **Safety**: Content filtering is built in; handle `HttpResponseError` for violations
3. **Performance**:
   - Max tokens: 4096
   - Rate limit: 200K tokens/minute
4. **Cost**: Pay-as-you-go with serverless deployment
5. **Streaming**: Implement response streaming for long completions

```python
# Streaming example
response = client.complete(..., stream=True)
for chunk in response:
    # Some chunks may arrive with no choices; skip them
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
```
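
The rate limit listed above is enforced by the service, but a client-side estimate can help avoid sending requests that are likely to be rejected. The sketch below tracks tokens spent over a sliding one-minute window. `TokenBudget` is a hypothetical class, the 200K default simply mirrors the figure in the list, and the actual limit for a given deployment may differ.

```python
import time
from collections import deque


class TokenBudget:
    """Client-side sliding-window estimate of tokens used recently."""

    def __init__(self, tokens_per_minute=200_000, window_seconds=60.0,
                 clock=time.monotonic):
        self.limit = tokens_per_minute
        self.window = window_seconds
        self.clock = clock          # injectable so the class can be tested
        self.events = deque()       # (timestamp, tokens) pairs, oldest first
        self.used = 0

    def _expire(self, now):
        # Drop entries that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def can_spend(self, tokens):
        """Return True if spending `tokens` now stays within the limit."""
        self._expire(self.clock())
        return self.used + tokens <= self.limit

    def record(self, tokens):
        """Record tokens actually consumed by a completed request."""
        now = self.clock()
        self._expire(now)
        self.events.append((now, tokens))
        self.used += tokens


budget = TokenBudget()
budget.record(4096)
print(budget.can_spend(4096))
```

In practice the value passed to `record` could come from `response.usage.total_tokens` after each call.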

## 🎯 Key Takeaways
- Leverage 128K context for detailed analysis
- Extract reasoning steps for debugging/analysis
- Combine with Azure AI Content Safety for production
- Monitor token usage via `response.usage`
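
As a rough illustration of the last point, the sketch below sums token counts across several responses. It uses stand-in objects rather than real responses, and it assumes the usage object exposes `prompt_tokens`, `completion_tokens`, and `total_tokens` attributes; `add_usage` is a hypothetical helper.

```python
from types import SimpleNamespace


def add_usage(totals, usage):
    """Add one response's token counts to a running totals dict."""
    for field in ("prompt_tokens", "completion_tokens", "total_tokens"):
        totals[field] = totals.get(field, 0) + getattr(usage, field, 0)
    return totals


# Stand-ins for response.usage (attribute names are an assumption).
fake_usages = [
    SimpleNamespace(prompt_tokens=40, completion_tokens=900, total_tokens=940),
    SimpleNamespace(prompt_tokens=55, completion_tokens=1200, total_tokens=1255),
]

totals = {}
for usage in fake_usages:
    add_usage(totals, usage)
print(totals)
```

Reasoning models tend to spend most of their tokens on the completion side, since the `<think>` content counts toward the output, so tracking `completion_tokens` separately is often worthwhile.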

> Always validate model outputs for critical applications!