# 🚀 DeepSeek-R1 Model with Azure AI Inference 🧠

**DeepSeek-R1** is a reasoning model trained with a combination of reinforcement learning and supervised fine-tuning. It targets complex, multi-step reasoning tasks, activates 37B of its 671B parameters per token, and supports a 128K-token context window.

In this notebook, you'll learn to:
1. **Initialize** the ChatCompletionsClient for Azure serverless endpoints
2. **Chat** with DeepSeek-R1 and extract its reasoning from the response
3. **Implement** a travel planning example with step-by-step reasoning
4. **Leverage** the 128K context window for complex scenarios

## Why DeepSeek-R1?
- **Advanced Reasoning**: Specializes in chain-of-thought problem solving
- **Massive Context**: 128K token window for detailed analysis
- **Efficient Architecture**: 37B active parameters out of 671B total
- **Safety Integrated**: Built-in content filtering capabilities


## 1. Setup & Authentication

Required packages:
- `azure-ai-inference`: For chat completions
- `azure-identity`: For `DefaultAzureCredential` authentication
- `python-dotenv`: For environment variables

.env file requirements:
```bash
AZURE_INFERENCE_ENDPOINT=<your-endpoint-url>
AZURE_INFERENCE_KEY=<your-api-key>
MODEL_NAME=DeepSeek-R1
```

In [None]:
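import os
import re
from dotenv import load_dotenv
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential

# Load settings from .env (the defaults below apply when a variable is unset)
load_dotenv()
endpoint = os.getenv("AZURE_INFERENCE_ENDPOINT",
                     "https://aaisc.services.ai.azure.com/models")
api_key = os.getenv("AZURE_INFERENCE_KEY")
model_name = os.getenv("MODEL_NAME", "DeepSeek-R1")

# Initialize client: use the API key when one is set, otherwise Microsoft Entra ID
try:
  if api_key:
    client = ChatCompletionsClient(
        endpoint=endpoint, credential=AzureKeyCredential(api_key))
  else:
    client = ChatCompletionsClient(
        endpoint=endpoint,
        credential=DefaultAzureCredential(),
        credential_scopes=["https://cognitiveservices.azure.com/.default"])
except Exception as e:
  print("❌ Initialization failed:", e)
  raise  # later cells need `client`, so stop here instead of failing with a NameError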

## 2. Intelligent Travel Planning ✈️

Demonstrate DeepSeek-R1's reasoning capabilities for trip planning:

In [8]:
def plan_trip_with_reasoning(query, show_thinking=False):
  """Get travel recommendations with reasoning extraction"""
  messages = [
      SystemMessage(
          content="You are a travel expert. Provide detailed plans with rationale."),
      UserMessage(
          content=f"{query} Include hidden gems and safety considerations.")
  ]

  response = client.complete(
      messages=messages,
      model=model_name,
      temperature=0.7,
      max_tokens=1024
  )

  content = response.choices[0].message.content

  # Extract reasoning if present
  if show_thinking:
    match = re.search(r"<think>(.*?)</think>(.*)", content, re.DOTALL)
    if match:
      return {"thinking": match.group(1).strip(), "answer": match.group(2).strip()}
  return content


# Example usage
query = "Plan a 5-day cultural trip to Kyoto in April"
result = plan_trip_with_reasoning(query, show_thinking=True)

print("🗺️ Query:", query)
if isinstance(result, dict):
  print("\n🧠 Thinking Process:", result["thinking"])
  print("\n📝 Final Answer:", result["answer"])
else:
  print("\n📝 Response:", result)

🗺️ Query: Plan a 5-day cultural trip to Kyoto in April

🧠 Thinking Process: Okay, the user wants a 5-day cultural trip to Kyoto in April, including hidden gems and safety tips. Let me start by recalling Kyoto's main attractions and the best times to visit them. April is cherry blossom season, so some spots might be crowded. They mentioned hidden gems, so I need to include less touristy places. Also, safety considerations are important, especially regarding crowds, weather, and COVID-19 measures.

First, day one: Arrival. Kiyomizu-dera is a must, but maybe suggest going early to avoid crowds. Then Sannenzaka and Ninenzaka for traditional streets. Lunch in Gion, maybe a specific restaurant. Hidden gem: Ishibei-koji Lane. Evening in Gion, but mention being respectful in geisha areas. Safety tip: wear comfortable shoes for the stone paths.

Day two: Arashiyama. Bamboo Grove is popular, so early morning. Then Tenryu-ji. Hidden gem: Otagi Nenbutsu-ji Temple with the statues. Lunch in Arashiy
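
The regex in `plan_trip_with_reasoning` only matches when the closing `</think>` tag is present. If generation stops during reasoning (for example because `max_tokens` is reached), nothing matches and the raw text comes back unsplit. A minimal sketch of a more forgiving splitter follows; the name `split_reasoning` and the `truncated` flag are illustrative assumptions, not part of the Azure SDK.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(content):
    """Split a reply into (thinking, answer, truncated).

    truncated is True when a <think> block was opened but never closed.
    Any text before the opening tag is discarded.
    """
    open_at = content.find(THINK_OPEN)
    # If the opening tag is missing, reasoning (if any) starts at position 0.
    body_start = open_at + len(THINK_OPEN) if open_at != -1 else 0
    close_at = content.find(THINK_CLOSE, body_start)

    if close_at != -1:
        thinking = content[body_start:close_at].strip()
        answer = content[close_at + len(THINK_CLOSE):].strip()
        return thinking, answer, False
    if open_at != -1:
        # Opened but never closed: everything so far is unfinished reasoning.
        return content[body_start:].strip(), "", True
    # No tags at all: the whole reply is the answer.
    return "", content.strip(), False


thinking, answer, truncated = split_reasoning("<think>Check dates.</think>Go in April.")
print(thinking, "|", answer, "|", truncated)
```

When `truncated` is true, one option is to retry the request with a larger `max_tokens` rather than show the user an empty answer.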

## 3. Technical Problem Solving 💻

Showcase coding/optimization capabilities:

In [10]:
def solve_technical_problem(problem):
  """Solve complex technical problems with structured reasoning"""
  response = client.complete(
      messages=[
          UserMessage(
              content=f"{problem} Please reason step by step, and put your final answer within \\boxed{{}}.")
      ],
      model=model_name,
      temperature=0.3,
      max_tokens=2048
  )

  return response.choices[0].message.content


# Database optimization example
problem = """How can I optimize a PostgreSQL database handling 10k transactions/second?
Consider indexing strategies, hardware requirements, and query optimization."""

print("🔧 Problem:", problem)
print("\n⚙️ Solution:", solve_technical_problem(problem))

🔧 Problem: How can I optimize a PostgreSQL database handling 10k transactions/second?
Consider indexing strategies, hardware requirements, and query optimization.

⚙️ Solution: <think>
Okay, so I need to figure out how to optimize a PostgreSQL database that's handling 10k transactions per second. That's a pretty high load, so I need to make sure everything is tuned properly. Let me start by breaking down the problem into parts: indexing strategies, hardware requirements, and query optimization. Maybe there are other areas too, like configuration settings or replication for scaling.

First, indexing. Indexes are crucial for speeding up queries, but they can also slow down writes if overused. For 10k TPS, the right indexes are essential. I should consider the types of indexes. B-tree is the default and good for most cases. Maybe using partial indexes if some queries filter on a specific condition often. Also, covering indexes (INCLUDE) to include columns that are frequently accessed but 
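
Because the prompt asks for the final answer inside `\boxed{}`, that answer can be pulled out of the reply programmatically. The sketch below is one way to do it, assuming the last `\boxed{...}` in the text is the one that matters. The helper name `extract_boxed` is hypothetical. Brace matching is used instead of a regex so that nested braces, such as those in LaTeX fractions, survive.

```python
def extract_boxed(text):
    """Return the contents of the last boxed{...} group in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None

    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of the marker
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    # Braces never balanced, e.g. the reply was cut off mid-answer.
    return None


sample = r"Adding the parts gives \boxed{\frac{1}{2}} as the result."
print(extract_boxed(sample))
```

A `None` result is a useful signal that the reply was truncated or that the model ignored the formatting instruction.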

## 4. Best Practices & Considerations

1. **Reasoning Handling**: Use regex to separate `<think>` content from final answers
2. **Safety**: Built-in content filtering - handle `HttpResponseError` for violations
3. **Performance**:
   - Max tokens: 4096 (output limit; may vary by deployment)
   - Rate limit: 200K tokens/minute (check the quota on your own deployment)
4. **Cost**: Pay-as-you-go with serverless deployment
5. **Streaming**: Implement response streaming for long completions

```python
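# Streaming example (reuses client, model_name and query from the cells above)
response = client.complete(messages=[UserMessage(content=query)], model=model_name, stream=True)
for chunk in response:
    if chunk.choices:  # skip chunks that carry no choices
        print(chunk.choices[0].delta.content or "", end="")
```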
```
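
Printing chunks as they arrive discards the text, which makes it hard to separate reasoning from the answer afterwards. The sketch below collects the streamed pieces into one string while still allowing live display through a callback. `stream_reply` and `fake_chunk` are hypothetical names: `fake_chunk` only imitates the `choices[0].delta.content` shape used in the example above, so the code runs without a live endpoint.

```python
from types import SimpleNamespace


def stream_reply(chunks, on_text=None):
    """Consume a chat-completions stream and return the concatenated text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices at all
        piece = chunk.choices[0].delta.content
        if piece:  # content can be None or empty on role/finish chunks
            parts.append(piece)
            if on_text is not None:
                on_text(piece)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for a streamed update (an assumption for demonstration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    SimpleNamespace(choices=[]),
    fake_chunk("<think>plan"),
    fake_chunk(None),
    fake_chunk("</think>Done"),
]
full_text = stream_reply(demo_stream, on_text=lambda s: print(s, end=""))
```

With a real client, the iterable returned by `client.complete(..., stream=True)` would take the place of `demo_stream`, and the returned string can then be split on the `<think>` tags as usual.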

## 🎯 Key Takeaways
- Leverage 128K context for detailed analysis
- Extract reasoning steps for debugging/analysis
- Combine with Azure AI Content Safety for production
- Monitor token usage via `response.usage`
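
As a rough illustration of the token-monitoring point, the sketch below accumulates usage across several calls. It assumes each response exposes `usage.prompt_tokens` and `usage.completion_tokens`. The `UsageTracker` class is a hypothetical helper, and the responses here are simple stand-ins rather than real SDK objects.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTracker:
    """Running totals of token usage across chat completion calls."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0

    def add(self, response):
        usage = getattr(response, "usage", None)
        if usage is None:
            return  # nothing to record for this response
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.calls += 1

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


# Stand-in responses (assumed shape), used only to demonstrate the tracker.
tracker = UsageTracker()
tracker.add(SimpleNamespace(usage=SimpleNamespace(prompt_tokens=40, completion_tokens=900)))
tracker.add(SimpleNamespace(usage=SimpleNamespace(prompt_tokens=55, completion_tokens=1500)))
print(f"{tracker.calls} calls, {tracker.total_tokens} tokens")
```

In the notebook, `tracker.add(response)` could be called right after each `client.complete(...)` to compare running totals against the deployment's rate limit.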

> Always validate model outputs for critical applications!