# 🤖 DeepSeek R1 Chat Interface


# 📱 [GitHub Repository](https://github.com/edoardoavenia/deepseek-r1-gradio-env)

## Interactive environment to explore DeepSeek-R1 models on Google Colab Free GPUs. This project demonstrates the integration between Ollama, Gradio UI, and LangChain pipelines to create a smooth and accessible development experience.

### Important: Before starting, change the Colab runtime type to T4 GPU (Runtime > Change runtime type)

In [None]:
# Refresh package lists, install pciutils (provides lspci, which the Ollama installer uses to detect the GPU), then install Ollama

!sudo apt update
!sudo apt install -y pciutils
!curl -fsSL https://ollama.com/install.sh | sh

## 🔧 Initialize Ollama Service & Install Dependencies

In [None]:
import subprocess

# Start the Ollama server in the background.
# Popen returns immediately, so no extra thread is needed;
# the handle is kept so the process can be stopped later.
ollama_process = subprocess.Popen(["ollama", "serve"])

# Install the required Python packages

!pip install langchain_core langchain_ollama gradio llm_xml_parser
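
The server is started in the background, so a later cell can run before it is ready to accept requests. The following is a minimal sketch of a readiness check, assuming Ollama listens on its default address `http://localhost:11434`; the helper name `wait_for_server` and its parameters are hypothetical and not part of Ollama or this project.

```python
import time
import urllib.request


def wait_for_server(url="http://localhost:11434", timeout=30.0, interval=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True if the server answered in time, False otherwise.
    The default URL is an assumption (Ollama's default local address).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2.0) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: not ready yet.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` before `ollama pull` and treating a `False` result as a signal to rerun the service cell is one way to avoid intermittent connection errors.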

# Model Setup

This implementation supports multiple DeepSeek-R1 models that can run on Google Colab's T4 GPU (16GB):

- ```deepseek-r1:1.5b``` : A distilled model based on Qwen-2.5-Math-1.5B
- ```deepseek-r1:7b``` : A distilled model based on Qwen-2.5-Math-7B
- ```deepseek-r1:8b``` : A distilled model based on Llama-3.1-8B
- ```deepseek-r1:14b``` : A distilled model based on Qwen-2.5-14B

The default configuration uses `deepseek-r1:1.5b`, but you can scale up to the 14B model depending on your needs. Each variant is its listed base model fine-tuned on reasoning data generated by DeepSeek-R1, which is what "distilled" means here.
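
As a rough way to reason about which variant fits in 16GB, the weights-only footprint can be estimated as parameters times bits per weight divided by 8. The sketch below is illustrative only: the function name `estimate_weight_gb` is hypothetical, the parameter counts are the nominal values from the tag names, and the 4.5 bits per weight is an assumed value for a 4-bit quantized model. The download size shown on the Ollama Library page is the authoritative number, and the KV cache and runtime overhead come on top of it.

```python
def estimate_weight_gb(params_billion, bits_per_weight=4.5):
    """Weights-only size in GB: parameters * bits / 8 bits per byte.

    bits_per_weight=4.5 is an assumption for a 4-bit quantized model,
    not a figure taken from the Ollama documentation.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


# Nominal parameter counts taken from the model tag names (assumption).
for tag, params in [("1.5b", 1.5), ("7b", 7), ("8b", 8), ("14b", 14)]:
    print(f"deepseek-r1:{tag}: ~{estimate_weight_gb(params):.1f} GB of weights")
```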

In [None]:
# Pull the DeepSeek-R1 model

!ollama pull deepseek-r1:1.5b  # Or use deepseek-r1:7b, deepseek-r1:8b or deepseek-r1:14b

# 💬 Conversational Interface

## The following code implements a chat interface that showcases DeepSeek-R1's unique capabilities:
- Separates the model's reasoning from its final answer using llm_xml_parser
- Maintains conversation history with LangChain
- Provides a clean, user-friendly interface through Gradio
- Uses a temperature of 0.7
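
To illustrate what separating reasoning from the final answer involves: DeepSeek-R1 wraps its chain of thought in `<think>...</think>` tags ahead of the answer. The sketch below uses only the standard library and is not how `llm_xml_parser` works internally; the helper `split_reasoning` is a hypothetical stand-in, and it assumes the raw model output contains at most one `<think>` block.

```python
import re

# Non-greedy match across newlines for a single <think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) extracted from raw model output.

    If no <think> block is present, reasoning is an empty string
    and the whole text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


raw = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
```

Only the answer part should be fed back into the conversation history, which keeps the context short across turns.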

## If you get any errors, try rerunning the last two code blocks

#### Model info and other variants available on [Ollama Library ↗](https://ollama.com/library/deepseek-r1:1.5b)


In [None]:
import gradio as gr
from langchain_ollama import ChatOllama
from llm_xml_parser.core.parser import parse
from langchain_core.messages import HumanMessage, AIMessage

# Initialize model
llm = ChatOllama(model="deepseek-r1:1.5b", temperature=0.7)  # Must match the model pulled above (deepseek-r1:7b, deepseek-r1:8b or deepseek-r1:14b)

# Conversation memory
messages = []

def respond(message, history):
    # Add user message to history
    messages.append(HumanMessage(content=message))

    # Get model response
    response = llm.invoke(messages)

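    # Parse the response: extract the single <think> block, keep the rest as the answer
    config = {'think': 'single'}
    result = parse(response.content, config)
    answer = result.untagged

    # Add only the final answer (without the reasoning) to the history
    messages.append(AIMessage(content=answer))

    # Blank lines around '---' keep Markdown from rendering the last
    # reasoning line as a heading
    return f"# Reasoning:\n{result.think}\n\n---\n\n# Answer:\n{answer}"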


# Initialize Gradio interface
demo = gr.ChatInterface(
    fn=respond,
    title="DeepSeek R1 Chat",
    description="DeepSeek R1 • Ollama • LangChain • Gradio",
    theme="default"
)

# Launch the interface
demo.launch(debug=True)
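
The `respond` function above keeps the conversation in a module-level `messages` list, so every browser session shares one history and clearing the chat in the UI does not reset it. A minimal sketch of an alternative is to rebuild the message list on each call from the `history` argument that Gradio passes in. The helper `history_to_messages` is hypothetical; it assumes text-only chats, handles both the dict-based and the pair-based history formats, and relies on the `# Answer:` marker produced by `respond` to drop the displayed reasoning. It returns `(role, content)` tuples, which LangChain chat models accept in `invoke`.

```python
ANSWER_MARKER = "# Answer:\n"


def history_to_messages(history, message):
    """Build a list of (role, content) tuples from Gradio history plus the new message."""
    msgs = []
    for turn in history:
        if isinstance(turn, dict):
            # "messages" format: {"role": ..., "content": ...}
            pairs = [(turn["role"], turn["content"])]
        else:
            # Pair format: [user_text, assistant_text]
            user, assistant = turn
            pairs = [("user", user), ("assistant", assistant)]
        for role, content in pairs:
            if content is None:
                continue
            if role == "assistant":
                # Keep only the final answer, not the displayed reasoning.
                msgs.append(("ai", content.split(ANSWER_MARKER, 1)[-1]))
            else:
                msgs.append(("human", content))
    msgs.append(("human", message))
    return msgs
```

Inside `respond`, `llm.invoke(history_to_messages(history, message))` would then replace the calls that append to and read from the global list.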