# Installation

In [9]:
%%bash
pip install --upgrade \
    'vllm>=0.8.2' \
    'transformers>=4.50.3' \
    pyzmq \
    unsloth \
    accelerate \
    bitsandbytes \
    openai \
    langchain-text-splitters \
    peft \
    FlagEmbedding \
    datasets \
    faiss-cpu \
    tavily-python \
    "flashinfer-python>=0.2.4"  --extra-index-url https://flashinfer.ai/whl/cu124/torch2.6/
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp/gguf-py/ && pip install --editable .
pip install jupyter-kernel-gateway ipykernel
pip install --upgrade --no-deps numpy==1.26.4 pandas==2.2.2

Looking in indexes: https://pypi.org/simple, https://flashinfer.ai/whl/cu124/torch2.6/
Collecting tavily-python
  Downloading tavily_python-0.5.4-py3-none-any.whl.metadata (91 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 91.6/91.6 kB 6.4 MB/s eta 0:00:00
Collecting gguf==0.10.0 (from vllm>=0.8.2)
  Using cached gguf-0.10.0-py3-none-any.whl.metadata (3.5 kB)
Using cached gguf-0.10.0-py3-none-any.whl (71 kB)
Downloading tavily_python-0.5.4-py3-none-any.whl (44 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.4/44.4 kB 3.3 MB/s eta 0:00:00
Installing collected packages: gguf, tavily-python
  Attempting uninstall: gguf
    Found existing installation: gguf 0.16.0
    Uninstalling gguf-0.16.0:
      Successfully uninstalled gguf-0.16.0
Successfully installed gguf-0.10.0 tavily-python-0.5.4
Obtaining file:///content/llama.cpp/gguf-py
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable

fatal: destination path 'llama.cpp' already exists and is not an empty directory.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
vllm 0.8.3 requires gguf==0.10.0, but you have gguf 0.16.0 which is incompatible.


In [2]:
import os
from google.colab import userdata
os.environ["HF_TOKEN"] = userdata.get('HF_WRITE_TOKEN')
!huggingface-cli login --add-to-git-credential --token $HF_TOKEN
os.environ["TAVILY_API_KEY"] = userdata.get('TAVILY_API_KEY')

Token is valid (permission: write).
The token `WriteToken` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store' credential helper as default.

git config --global credential.helper store

Read https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage for more details.[0m
Token has not been saved to git credential helper.
Your token has been saved to /root/.cache/huggingface/token
Login successful.
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


```bash
VLLM_ATTENTION_BACKEND=FLASHINFER VLLM_USE_V1=1 VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 TOKENIZERS_PARALLELISM=true MAX_JOBS=2 vllm serve ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g --port 8877 --max-model-len 4096 --api-key token-abc123 --quantization compressed-tensors --max-num-seqs=1
```
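
The server started by the command above needs time to load the model weights, so client cells can fail if they run too early. The following is a minimal readiness-poll sketch, assuming the server exposes the OpenAI-compatible `/v1/models` route on port 8877 and accepts the same API key. `http_probe` and `wait_until_ready` are hypothetical helper names, not part of vLLM.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, api_key, timeout=5.0):
    """Return True if a GET on `url` answers with HTTP 200."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: treat as "not ready yet"
        return False


def wait_until_ready(probe, attempts=60, delay=5.0, sleep=time.sleep):
    """Call `probe()` until it returns True; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        sleep(delay)
    return None
```

In the notebook this could be called as `wait_until_ready(lambda: http_probe("http://localhost:8877/v1/models", "token-abc123"))` before the first chat request.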

# Web Search

In [10]:
from tavily import TavilyClient
import asyncio, os, requests, time, json
from IPython.display import display, Markdown, Latex

tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

In [77]:
from openai import OpenAI
import math
import time
import json

client = OpenAI(
    base_url="http://localhost:8877/v1",
    api_key="token-abc123",
)

In [188]:
def deduplicate_and_format_sources(search_response, max_tokens_per_source, include_raw_content=True):
    # Collect all results
    sources_list = []
    for response in search_response:
        sources_list.extend(response['results'])

    # Deduplicate by URL
    unique_sources = {source['url']: source for source in sources_list}

    # Format output
    formatted_text = "Content from sources:\n"
    for i, source in enumerate(unique_sources.values(), 1):
        formatted_text += f"{'='*80}\n"  # Clear section separator
        formatted_text += f"Source: {source['title']}\n"
        formatted_text += f"{'-'*80}\n"  # Subsection separator
        formatted_text += f"URL: {source['url']}\n===\n"
        formatted_text += f"Most relevant content from source: {source['content']}\n===\n"
        if include_raw_content:
            # Using rough estimate of 4 characters per token
            char_limit = max_tokens_per_source * 4
            # Handle None raw_content
            raw_content = source.get('raw_content', '')
            if raw_content is None:
                raw_content = ''
                print(f"Warning: No raw_content found for source {source['url']}")
            if len(raw_content) > char_limit:
                raw_content = raw_content[:char_limit] + "... [truncated]"
            formatted_text += f"Full source content limited to {max_tokens_per_source} tokens: {raw_content}\n\n"
        formatted_text += f"{'='*80}\n\n" # End section separator

    return formatted_text.strip()
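
The deduplication step above relies on a dict keyed by URL: when two queries return the same page, the later result replaces the earlier one while the key keeps its first-seen position. The sketch below isolates that behavior and the character-budget truncation, using made-up search responses (the URLs and titles are placeholders, and `unique_by_url` and `truncate` are illustrative names).

```python
def unique_by_url(search_responses):
    """Flatten Tavily-style responses and keep one result per URL."""
    results = []
    for response in search_responses:
        results.extend(response["results"])
    # Later duplicates overwrite the value, but the key keeps its first position
    return list({result["url"]: result for result in results}.values())


def truncate(text, max_tokens, chars_per_token=4):
    """Cut `text` to a rough character budget derived from a token limit."""
    limit = max_tokens * chars_per_token
    return text if len(text) <= limit else text[:limit] + "... [truncated]"


fake_responses = [
    {"results": [{"url": "https://example.com/a", "title": "A (first)"},
                 {"url": "https://example.com/b", "title": "B"}]},
    {"results": [{"url": "https://example.com/a", "title": "A (second)"}]},
]
unique = unique_by_url(fake_responses)
print([r["title"] for r in unique])        # ['A (second)', 'B']
print(truncate("x" * 10, max_tokens=2))    # xxxxxxxx... [truncated]
```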

In [189]:
def generate_response(message_list):
    completion = client.chat.completions.create(
        model = "ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g",
        messages = message_list,
        max_tokens=2048,
        frequency_penalty=0.3,
        temperature=0.6,
        stream=True,
    )

    final_answer = []
    assistant_response = ""

    start = time.time()

    # In streaming mode, iterate over the chunks yielded by the completion
    for chunk in completion:
        chunk_content = chunk.choices[0].delta.content

        if isinstance(chunk_content, str):
            final_answer.append(chunk_content)
            # Print the answer in real time, token by token
            print(chunk_content, end="")
            assistant_response += chunk_content

    end = time.time()
    print(f"\n\ninference time: {end - start:.5f} sec \n\n")
    return assistant_response
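
In streaming mode, the `content` of a chunk's `delta` can be `None` (for example in a role-only or final chunk), which is why the loop above checks the type before appending. The sketch below reproduces that accumulation logic offline. The chunk objects are stand-ins built with `SimpleNamespace`, not real API responses, and `collect_stream` and `fake_chunk` are hypothetical names.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a chat-completion stream, skipping empty ones."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # guard against chunks that carry no choices
            continue
        content = chunk.choices[0].delta.content
        if isinstance(content, str):
            parts.append(content)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in for one streamed chunk (illustration only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


stream = [fake_chunk(None), fake_chunk("Hel"), fake_chunk("lo"),
          SimpleNamespace(choices=[]), fake_chunk(None)]
print(collect_stream(stream))  # Hello
```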

In [190]:
import threading

def worker(query, search_result, req_num_result, include_raw, req_topic):
    print(f"Thread: {query}")
    search_result.append(
        tavily_client.search(
            query,
            max_results= req_num_result,
            include_raw_content= include_raw,
            topic= req_topic
        )
    )

In [191]:
def ask_tavily(search_queries, search_tasks, req_num_result, include_raw, req_topic):
    start_time = time.time()
    threads = []

    for query in search_queries:
        t = threading.Thread(target=worker, args=(query, search_tasks, req_num_result, include_raw, req_topic))
        threads.append(t)
        t.start()

    for thread in threads:
        thread.join()

    end_time = time.time()
    execution_time = end_time - start_time

    print(f"\nask_tavily task running time: {execution_time:.2f}초 \n")
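
With one thread per query, results are appended in completion order, which may differ from the query order. An alternative sketch uses `concurrent.futures.ThreadPoolExecutor`, whose `map` returns results in input order. Here `search_fn` stands in for a call such as `tavily_client.search`, and `fake_search` is a placeholder that only simulates latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def search_all(queries, search_fn, max_workers=4):
    """Run `search_fn` on every query concurrently; results keep the query order."""
    if not queries:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search_fn, queries))


def fake_search(query):
    """Placeholder for a real search call; sleeps to simulate network latency."""
    time.sleep(0.05 if query == "slow" else 0.01)
    return {"query": query, "results": []}


responses = search_all(["slow", "fast"], fake_search)
print([r["query"] for r in responses])  # ['slow', 'fast']
```

With the real client, the call might look like `search_all(queries, lambda q: tavily_client.search(q, max_results=3))`.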

In [192]:
def ask_plan_query_writer(topic, content):
    llm_prompt = """You are an expert technical writer crafting a section that synthesizes information
<section topic>
""" + topic + """
</section topic>

<section organization>
""" + content + """
</section organization>

<Task>
Your goal is to generate 3 web search queries that will help gather information for planning the sections.

The queries should:

1. Be related to the section topic
2. Help satisfy the requirements specified in the section organization

Make the queries specific enough to find high-quality, relevant sources while covering the breadth needed for the section structure.

Note 1: Today's date is """ + time.strftime("%Y-%m-%d") + """.
Note 2: Output your response in JSON format, with the following structure: { "queries": [ "query1", "query2", "query3" ] }
</Task>"""

    return llm_prompt

In [193]:
def ask_final_writer_instructions(topic, content, search_tasks):
    final_section_writer="""You are an expert technical writer.

<Section name>
""" + content + """
</Section name>

<Section topic>
""" + topic + """
</Section topic>

<Available Website Search Content>
""" + deduplicate_and_format_sources(search_tasks, max_tokens_per_source=4000, include_raw_content=True) + """
</Available Website Search Content>

<Task>
1. Section-Specific Approach:

For Introduction:
- Use # for Website Search title (Markdown format)
- Write in simple and clear language
- Focus on the core motivation for the Section in 1-2 paragraphs
- Use a clear narrative arc to introduce the Section
- Include NO structural elements (no lists or tables)
- No sources section needed

For Conclusion/Summary:
- Use ## for Conclusion/Summary title (Markdown format)
- For comparative Conclusion/Summary:
    * Must include a focused comparison table using Markdown table syntax
    * Table should distill insights from the Section
    * Keep table entries clear and concise
- For non-comparative Conclusion/Summary:
    * Only use ONE structural element IF it helps distill the points made in the Section:
    * Either a focused table comparing items present in the Section (using Markdown table syntax)
    * Or a short list using proper Markdown list syntax:
      - Use `*` or `-` for unordered lists
      - Use `1.` for ordered lists
      - Ensure proper indentation and spacing
- Sources and url section needed. (especially when expressing a URL, please provide the entire URL exactly as given in the content without abbreviating it.)
- End with specific next steps or implications

2. Writing Approach:
- Use concrete details over general statements
- Make every word count
- Focus on your single most important point
</Task>

<Quality Checks>
- Verify that EVERY claim is grounded in the provided Source material
- Confirm each URL appears ONLY ONCE in the Source list
- For introduction: # for Website Search title, no structural elements, no sources section
- For conclusion: ## for Conclusion/Summary title, only ONE structural element at most, add sources and url section
- Markdown format
- Do not include word count or any preamble in your response
</Quality Checks>

Always respond in Korean."""

    return final_section_writer

In [194]:
system_prompt = "You are a helpful assistant. Answers must be in Korean."

topic = "기술동향"
content = "MCP(model context protocol) 과 A2A(Agent to Agent) 는 어떤 차이가 있는것인지 알려줘."

messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": ask_plan_query_writer(topic, content)},
    ]

response_query = generate_response(messages)

```json
{
  "queries": [
    "MCP 프로토콜 A2A 에이전트 비교 분석 2024-2025",
    "모델 컨텍스트 프로토콜(MCP) 기술 동향 및 활용 사례",
    "A2A (Agent to Agent) 통신 프로토콜 최신 동향 및 MCP 와의 차이점"
  ]
}
```

inference time: 6.89265 sec 




In [195]:
if "```json" in response_query:
    response_query = response_query.split("```json")[1].strip()
    response_query = response_query.split("```")[0].strip()
json_data = json.loads(response_query)
queries = json_data['queries']

print("사용자 발화 기반으로 추출한 web query 문장 3건:")
print(queries)

search_tasks = []
req_topic = 'general'  # choose between 'general' and 'news'
req_num_result = 3     # number of sites to return per web query
include_raw = False    # whether to return each site's raw content

ask_tavily(queries, search_tasks, req_num_result, include_raw, req_topic)
print(search_tasks)

사용자 발화 기반으로 추출한 web query 문장 3건:
['MCP 프로토콜 A2A 에이전트 비교 분석 2024-2025', '모델 컨텍스트 프로토콜(MCP) 기술 동향 및 활용 사례', 'A2A (Agent to Agent) 통신 프로토콜 최신 동향 및 MCP 와의 차이점']
Thread: MCP 프로토콜 A2A 에이전트 비교 분석 2024-2025
Thread: 모델 컨텍스트 프로토콜(MCP) 기술 동향 및 활용 사례
Thread: A2A (Agent to Agent) 통신 프로토콜 최신 동향 및 MCP 와의 차이점

ask_tavily task running time: 2.91초 

[{'query': 'MCP 프로토콜 A2A 에이전트 비교 분석 2024-2025', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': 'Mcp 시장 지도 완벽 분석: 2025년 모델 컨텍스트 프로토콜 생태계 총정리', 'url': 'https://dma-ai.kr/81', 'content': 'MCP(Model Context Protocol)는 AI 모델과 에이전트가 다양한 도구 및 서비스와 원활하게 상호작용할 수 있도록 하는 표준화된 프로토콜입니다. 1. Top MCP Clients (주요 MCP 클라이언트) 2. Top MCP Servers (주요 MCP 서버) 특징: MCP 애플리케이션의 자원 관리 및 최적화를 위한 도구입니다. 특징: MCP 생태계 내의 통합 및 커뮤니케이션을 관리하는 도구입니다. 주요 MCP 생태계 참여자 클라이언트 | Cursor | https://www.cursor.com | AI 기반 코드 에디터 | Anthropic | https://www.anthropic.com | AI 안전 및 연구 회사 | OpenTools | https://opentools.com | AI 도구 디렉토리 MCP 시장 동향 및 전망 주요 MCP 클라이언트 주요 MCP 서버 
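
The parsing cell above handles a reply wrapped in a Markdown `json` code fence, but the model may also answer with bare JSON. A small helper can cover both cases. This is a sketch: `extract_json` is a hypothetical name, and the fence pattern is an assumption about how the model formats its reply.

```python
import json
import re

# "`{3}" matches a triple-backtick fence; written this way so the pattern
# does not contain a literal fence itself.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(text):
    """Parse JSON from a model reply, with or without a Markdown code fence."""
    match = FENCE.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


tick = "`" * 3
fenced = tick + 'json\n{"queries": ["q1", "q2"]}\n' + tick
print(extract_json(fenced)["queries"])                 # ['q1', 'q2']
print(extract_json('{"queries": ["q3"]}')["queries"])  # ['q3']
```

If the reply contains no valid JSON, `json.loads` raises `json.JSONDecodeError`, which the caller can catch to retry the request.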

In [196]:
print("\n\n=================================================================\n")
messages.append(
    {"role": "assistant", "content": " ".join(queries)})
messages.append(
    {"role": "user", "content": ask_final_writer_instructions(topic, content, search_tasks)}
)
response_query = generate_response(messages)

print("\n\n=========================  Search Report  ========================================\n")
display(Markdown(response_query))




# MCP와 A2A: 차세대 AI 에이전트 생태계를 위한 핵심 프로토콜

최근 AI 기술 발전과 함께 에이전트 간의 협업 및 정보 교환의 중요성이 부각되고 있습니다. 이러한 요구에 발맞춰 MCP(Model Context Protocol)와 A2A(Agent to Agent)라는 두 가지 프로토콜이 등장하며 AI 에이전트 생태계를 혁신하고 있습니다. MCP는 AI 모델과 도구 간의 연결을 표준화하여 효율적인 상호작용을 가능하게 하는 반면, A2A는 에이전트들이 직접 '행동 단위'로 상호작용할 수 있는 환경을 제공합니다. 본 섹션에서는 MCP와 A2A의 차이점을 심층적으로 분석하고, 각 프로토콜의 특징과 미래 전망을 살펴봅니다.

## 결론/요약

MCP와 A2A는 모두 AI 에이전트 생태계 발전에 기여하는 중요한 프로토콜이지만, 접근 방식과 목표에서 뚜렷한 차이를 보입니다. MCP는 AI 모델과 도구 간의 맥락 공유 및 자원 관리에 초점을 맞추는 반면, A2A는 에이전트 간의 직접적인 협업과 상호운용성을 강조합니다.

| 특징 | MCP (Model Context Protocol) | A2A (Agent to Agent) |
|---|---|---|
| **초점** | 맥락 공유, 자원 관리 | 에이전트 간 협업, 상호운용성 |
| **통신 방식** | 클라이언트-서버 모델 | 다중 에이전트 관리, 태스크 중심 |
| **주요 특징** | 통합 및 커뮤니케이션 관리 | 모듈성, 툴 재사용, 캐싱 |
| **생태계** | 다양한 클라이언트와 서버 존재 | 구글 중심, 초기 단계 |
| **활용 분야** | 코드 에디터, AI 안전 및 연구, AI 도구 디렉토리 | 에이전트 간 협업, LLM 기능 확장, 도구 연결 |

두 프로토콜은 상호 보완적인 관계를 가질 수 있습니다. MCP를 통해 AI 모델과 도구를 효율적으로 연결하고 관리하며, A2A를 통해 에이전트들이 서로 협력하여 복잡한 작업을 수행할 수 있습니다. 미래에는 MCP와 A2A 기반 인프라가 결합된 에이전트 제품 마켓플레이스가 등장하여 더욱 강력하고 유연한 AI 시스템을 구축할 수 있을 것으로 기대됩니다.

**출처:**

*   [Mcp 시장 지도 완벽 분석: 2025년 모델 컨텍스트 프로토콜 생태계 총정리](https://dma-ai.kr/81)
*   [\[A2a\] Ai 에이전트의 공용 언어, A2a가 여는 협업의 미래](https://infogalaxy.co.kr/entry/A2A-Agent2agent-AI-에이전트)
*   [A2A vs MCP: 새로운 에이전트 생태계를 위한 두 개의 보완적 프로토콜 · Logto 블로그](https://blog.logto.io/ko/a2a-mcp)
*   [구글의 A2a, Ai 에이전트 시대의 새로운 표준이 될까? - Mcp와의 비교부터 실제 사용 예시까지 한눈에 정리](https://digitalbourgeois.tistory.com/1039)
*   [Meet Google A2A: The Protocol That will Revolutionize Multi-Agent AI ...](https://medium.com/@the_manoj_desai/meet-google-a2a-the-protocol-that-will-revolutionize-multi-agent-ai-systems-80d55a4583ed)
*   [AI 실시간 통합을 위한 핵심 기술, MCP(Model Context Protocol) 완벽 가이드](https://the-see.tistory.com/195)
*   [Model Context Protocol(MCP) 완벽 가이드: AI 애플리케이션 개발 표준화하기](https://dma-ai.kr/77)

**다음 단계:**

AI 개발자는 MCP와 A2A의 특징을 이해하고 자신의 프로젝트에 적합한 프로토콜을 선택해야 합니다. 또한, 두 프로토콜을 결합하여 더욱 강력하고 유연한 AI 시스템을 구축하는 방안을 모색해야 할 것입니다.

# Web RAG

In [72]:
from openai import OpenAI
import math
import time
import json

client = OpenAI(
    base_url="http://localhost:8877/v1",
    api_key="token-abc123",
)

In [86]:
def generate_response(message_list):
    completion = client.chat.completions.create(
        model = "ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g",
        messages = message_list,
        max_tokens=1024,
        frequency_penalty=0.3,
        temperature=0.6,
        stream=True,
    )

    final_answer = []
    assistant_response = ""

    start = time.time()

    # In streaming mode, iterate over the chunks yielded by the completion
    for chunk in completion:
        chunk_content = chunk.choices[0].delta.content

        if isinstance(chunk_content, str):
            final_answer.append(chunk_content)
            # Print the answer in real time, token by token
            print(chunk_content, end="")
            assistant_response += chunk_content

    end = time.time()
    print(f"\n\ninference time: {end - start:.5f} sec \n\n")
    return assistant_response

In [87]:
message_list = [{"role": "system", "content": "당신은 유저의 질문에 최대한 정확하고 풍부한 정보를 전달하는 assistant 이다. 답변은 항상 한국어로 공손하게 답변해줘."}]

while True:
    user_prompt = input("USER > ")
    if user_prompt.lower() == "quit":
        break
    message_list.append({"role": "user", "content": user_prompt})

    assistant = generate_response(message_list)
    message_list.append({"role": "assistant", "content": assistant})

USER > ㅂㅈㅇㅂㅈ
죄송합니다만, "ㅂㅈㅇㅂㅈ"라는 표현이 무슨 의미인지 정확히 파악하지 못했습니다. 혹시 오타가 있거나 다른 의도로 사용하신 표현일까요? 

만약 특정 주제나 질문이 있으시다면, 좀 더 자세하게 설명해주시거나 다른 단어로 표현해주시면 제가 최대한 정확하고 풍부한 정보를 전달해 드리도록 노력하겠습니다. 

또한, 부적절하거나 불쾌감을 줄 수 있는 표현은 사용하지 않도록 주의 부탁드립니다.

inference time: 7.76852 sec 


USER > quit


In [88]:
system_prompt = "You are a helpful assistant. Answers must be in Korean."
user_prompt = """프로 테니스 대회에서 테니스 공은 한번에 6개를 사용합니다. 이 6개의 공을 처음에는 게임 수의 합이 7게임, 다음부터는 9게임마다 새 공으로 교체를 합니다.
만일 3세트 경기가 6:5 3:6 6:4 로 진행됐다고 하면 총 몇 개의 공을 사용했을까요?
답:
각 세트마다 게임 수를 더하면 11+9+10 = 30 으로 총 30게임이 진행됐습니다.
테니스 공은 7번째 교체 후 9번째 게임마다 교체되니 7,16,25 게임에 총 3회에 교체 됩니다.
최종적으로 경기시작 시 사용한 공 6개 + 교체 시 마다 6개의 새 공으로 교체 했으니 6 + (6 * 3) = 24, 사용된 공은 총 24개 입니다.

질문:
아마추어 테니스 대회에서는 테니스공을 한번에 2개 사용합니다. 그리고 이 2개의 공을 처음에는 게임 수의 합이 7게임, 다음부터는 9게임마다 새공으로 교체를 합니다.
만일 3세트 경기가 6:5 5:7 6:7 로 진행됐다고 하면 총 몇 개의 공을 사용했을까요?
"""

In [89]:
messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
generate_response(messages)

답:

각 세트마다 게임 수를 더하면 11+12+13 = 36 으로 총 36게임이 진행되었습니다.
테니스 공은 7번째 교체 후 9번째 게임마다 교체되니 7, 16, 25, 34 게임에 총 4회에 교체 됩니다.
최초 경기 시작 시 사용한 공 2개 + 교체 시 마다 2개의 새 공으로 교체 했으니 2 + (2 * 4) = 10, 사용된 공은 총 10개 입니다.


inference time: 9.94419 sec 




'답:\n\n각 세트마다 게임 수를 더하면 11+12+13 = 36 으로 총 36게임이 진행되었습니다.\n테니스 공은 7번째 교체 후 9번째 게임마다 교체되니 7, 16, 25, 34 게임에 총 4회에 교체 됩니다.\n최초 경기 시작 시 사용한 공 2개 + 교체 시 마다 2개의 새 공으로 교체 했으니 2 + (2 * 4) = 10, 사용된 공은 총 10개 입니다.\n'
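
The arithmetic in the model's answer can be checked directly. The sketch below assumes a ball change counts whenever its game number (7, 16, 25, ...) falls within the total number of games played. `balls_used` is an illustrative helper, not part of the notebook.

```python
def balls_used(set_scores, balls_per_change, first_change=7, interval=9):
    """Count balls used when new balls come in after game 7, then every 9 games."""
    total_games = sum(a + b for a, b in set_scores)
    # Change points are 7, 16, 25, ...; assume a change counts if it falls within the match
    change_points = list(range(first_change, total_games + 1, interval))
    return total_games, change_points, balls_per_change * (1 + len(change_points))


# Few-shot example from the prompt: 6 balls at a time
print(balls_used([(6, 5), (3, 6), (6, 4)], balls_per_change=6))  # (30, [7, 16, 25], 24)
# Question posed to the model: 2 balls at a time
print(balls_used([(6, 5), (5, 7), (6, 7)], balls_per_change=2))  # (36, [7, 16, 25, 34], 10)
```

Both results agree with the worked example in the prompt and with the model's reply of 10 balls.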

# Deep Search

In [90]:
from tavily import TavilyClient
import asyncio, os, requests, time, json
import threading, queue
from IPython.display import display, Markdown, Latex

tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

In [107]:
from openai import OpenAI
import math
import time
import json

client = OpenAI(
    base_url="http://localhost:8877/v1",
    api_key="token-abc123",
)

In [108]:
from typing import List, Optional
from pydantic import BaseModel, Field
import operator

class Section(BaseModel):
    name: str = Field(
        description="Name for this section of the report.",
    )
    description: str = Field(
        description="Brief overview of the main topics and concepts to be covered in this section.",
    )
    research: bool = Field(
        description="Whether to perform web research for this section of the report."
    )
    content: str = Field(
        description="The content of the section."
    )
    # Filled in later by the workers, so these fields default to None
    search_query: Optional[List[str]] = Field(None, description="Queries for web search.")
    query_content: Optional[str] = Field(None, description="Content of web search.")
    section_content: Optional[str] = Field(None, description="Content of section.")

In [109]:
def generate_response(message_list):
    completion = client.chat.completions.create(
        model = "ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g",
        messages = message_list,
        max_tokens=2048,
        frequency_penalty=0.3,
        temperature=0.6,
        stream=True,
    )

    final_answer = []
    assistant_response = ""

    start = time.time()

    # In streaming mode, iterate over the chunks yielded by the completion
    for chunk in completion:
        chunk_content = chunk.choices[0].delta.content

        if isinstance(chunk_content, str):
            final_answer.append(chunk_content)
            # Print the answer in real time, token by token
            print(chunk_content, end="")
            assistant_response += chunk_content

    end = time.time()
    print(f"\n\ninference time: {end - start:.5f} sec \n\n")
    return assistant_response

In [110]:
def report_planner_instructions(topic, report_organization, context, feedback):
    planner_writer="""You are performing research for a report.
<Report topic>
""" + topic + """
</Report topic>

<Report organization>
""" + report_organization + """
</Report organization>

<Context>
Here is context to use to plan the sections of the report:
""" + context + """
</Context>

<Task>
Generate a list of sections for the report. Your plan should be tight and focused with NO overlapping sections or unnecessary filler.

For example, a good report structure might look like:
1/ intro
2/ overview of topic A
3/ overview of topic B
4/ comparison between A and B
5/ conclusion

Each section should have the fields:

- Name - Name for this section of the report.
- Description - Brief overview of the main topics covered in this section.
- Research - Whether to perform web research for this section of the report.
- Content - The content of the section, which you will leave blank for now.

Integration guidelines:
- Include examples and implementation details within main topic sections, not as separate sections
- Ensure each section has a distinct purpose with no content overlap
- Combine related concepts rather than separating them

Before submitting, review your structure to ensure it has no redundant sections and follows a logical flow.
</Task>

<Feedback>
Here is feedback on the report structure from review (if any):
""" + feedback + """
</Feedback>

Note 1: Today's date is """ + time.strftime("%Y-%m-%d") + """.
Note 2: Output your response in JSON format, with the following structure: { "sections": [ "section1", "section2", "section3" ] }
Only output in JSON format when generating responses. Never include additional phrases such as "here is content in JSON format".
"""

    return planner_writer

In [111]:
def report_query_writer(topic, report_organization, num_queries):
    llm_prompt = """You are performing research for a report.

<Report topic>
""" + topic + """
</Report topic>

<Report organization>
""" + report_organization + """
</Report organization>

<Task>
Your goal is to generate """ + str(num_queries) + """ web search queries that will help gather information for planning the report sections.

The queries should:

1. Be related to the Report topic
2. Help satisfy the requirements specified in the report organization

Make the queries specific enough to find high-quality, relevant sources while covering the breadth needed for the report structure.

Note 1: Today's date is """ + time.strftime("%Y-%m-%d") + """.
Note 2: Output your response in JSON format, with the following structure: { "queries": [ "query1", "query2", "query3" ] }
Only output in JSON format when generating responses. Never include additional phrases such as "here is content in JSON format".
</Task>
"""

    return llm_prompt

In [112]:
def section_writer_inputs(topic, section_name, section_topic, context):
    section_writer_prompt="""
<Report topic>
""" + topic + """
</Report topic>

<Section name>
""" + section_name + """
</Section name>

<Section topic>
""" + section_topic + """
</Section topic>

<Source material>
""" + context + """
</Source material>
"""
    return section_writer_prompt

In [113]:
def final_section_writer_instructions(topic, section_name, section_topic, context):
    final_writer_prompt="""You are an expert technical writer crafting a section that synthesizes information from the rest of the report.

<Report topic>
""" + topic + """
</Report topic>

<Section name>
""" + section_name + """
</Section name>

<Section topic>
""" + section_topic + """
</Section topic>

<Available report content>
""" + context + """
</Available report content>

<Task>
1. Section-Specific Approach:

For Introduction:
- Use # for report title (Markdown format)
- 50-100 word limit
- Write in simple and clear language
- Focus on the core motivation for the report in 1-2 paragraphs
- Use a clear narrative arc to introduce the report
- Include NO structural elements (no lists or tables)
- No sources section needed

For Conclusion:
- Use ## for section title (Markdown format)
- 200-300 word limit
- For comparative reports:
    * Must include a focused comparison table using Markdown table syntax
    * Table should distill insights from the report
    * Keep table entries clear and concise
- For non-comparative reports:
    * Only use ONE structural element IF it helps distill the points made in the report:
    * Either a focused table comparing items present in the report (using Markdown table syntax)
    * Or a short list using proper Markdown list syntax:
      - Use `*` or `-` for unordered lists
      - Use `1.` for ordered lists
      - Ensure proper indentation and spacing
- End with specific next steps or implications
- No sources section needed

2. Writing Approach:
- Use concrete details over general statements
- Make every word count
- Focus on your single most important point
</Task>

<Quality Checks>
- For introduction: 50-100 word limit, # for report title, no structural elements, no sources section
- For conclusion: 200-300 word limit, ## for section title, only ONE structural element at most, no sources section
- Markdown format
- Do not include word count or any preamble in your response
</Quality Checks>

Always respond in Korean."""

    return final_writer_prompt

In [114]:
report_organization = """Use this structure to create a report on the user-provided topic:

1. Introduction (no research needed)
   - Brief overview of the topic area

2. Main Body Sections:
   - Each section should focus on a sub-topic of the user-provided topic

3. Conclusion
   - Aim for 1 structural element (either a list or a table) that distills the main body sections
   - Provide a concise summary of the report"""

In [166]:
section_writer_instructions = """Write one section of a research report.

<Task>
1. Review the report topic, section name, and section topic carefully.
2. If present, review any existing section content.
3. Then, look at the provided Source material.
4. Decide which sources you will use to write the report section.
5. Write the report section and list your sources.
</Task>

<Writing Guidelines>
- If existing section content is not populated, write from scratch
- If existing section content is populated, synthesize it with the source material
- Strict 150-200 word limit
- Use simple, clear language
- Use short paragraphs (2-3 sentences max)
- Use ## for section title (Markdown format)
</Writing Guidelines>

<Citation Rules>
- Assign each unique URL a single citation number in your text
- End with ### Sources that lists each source with corresponding numbers
- IMPORTANT: Number sources sequentially without gaps (1,2,3,4...) in the final list regardless of which sources you choose
- Example format:
  [1] Source Title: URL
  [2] Source Title: URL
</Citation Rules>

<Final Check>
1. Verify that EVERY claim is grounded in the provided Source material
2. Confirm each URL appears ONLY ONCE in the Source list
3. Verify that sources are numbered sequentially (1,2,3...) without any gaps
</Final Check>
"""


In [167]:
def worker(query, search_result, req_num_result, include_raw, req_topic):
    print(f"Thread: {query}")
    search_result.append(
        tavily_client.search(
            query,
            max_results= req_num_result,
            include_raw_content= include_raw,
            topic= req_topic
        )
    )

In [168]:
def ask_tavily(search_queries, search_tasks, req_num_result, include_raw, req_topic, opt_print=True):
    if opt_print:
        print("\nRun ask_tavily task: \n")

    threads = []
    start_time = time.time()

    for query in search_queries:
        t = threading.Thread(target=worker, args=(query, search_tasks, req_num_result, include_raw, req_topic))
        threads.append(t)
        t.start()

    for thread in threads:
        thread.join()

    end_time = time.time()
    execution_time = end_time - start_time

    if opt_print:
        print(f"\nask_tavily task running time: {execution_time:.2f}초 \n")
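
If a search call raises inside a worker thread, the exception is not propagated to the caller and that query's result is simply missing from the list. One way to make failures visible is to record them next to the results. This is a sketch: `run_queries` and `flaky_search` are hypothetical names, and `search_fn` stands in for the real search call.

```python
import threading


def run_queries(queries, search_fn):
    """Run one thread per query; collect results and errors separately."""
    results, errors = [], []
    lock = threading.Lock()

    def worker(query):
        try:
            response = search_fn(query)
        except Exception as exc:  # keep the failure instead of losing it in the thread
            with lock:
                errors.append((query, repr(exc)))
        else:
            with lock:
                results.append(response)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queries]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors


def flaky_search(query):
    """Placeholder search that fails for one query (illustration only)."""
    if query == "bad":
        raise RuntimeError("simulated API error")
    return {"query": query, "results": []}


results, errors = run_queries(["good", "bad"], flaky_search)
print(len(results), errors)  # 1 [('bad', "RuntimeError('simulated API error')")]
```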

In [169]:
def deduplicate_and_format_sources(search_response, max_tokens_per_source, include_raw_content=True):
    # Collect all results
    sources_list = []
    for response in search_response:
        sources_list.extend(response['results'])

    # Deduplicate by URL
    unique_sources = {source['url']: source for source in sources_list}

    # Format output
    formatted_text = "Content from sources:\n"
    for i, source in enumerate(unique_sources.values(), 1):
        formatted_text += f"{'='*80}\n"  # Clear section separator
        formatted_text += f"Source: {source['title']}\n"
        formatted_text += f"{'-'*80}\n"  # Subsection separator
        formatted_text += f"URL: {source['url']}\n===\n"
        formatted_text += f"Most relevant content from source: {source['content']}\n===\n"
        if include_raw_content:
            # Using a rough estimate of 2 characters per token
            char_limit = max_tokens_per_source * 2
            # Handle None raw_content
            raw_content = source.get('raw_content', '')
            if raw_content is None:
                raw_content = ''
                print(f"Warning: No raw_content found for source {source['url']}")
            if len(raw_content) > char_limit:
                raw_content = raw_content[:char_limit] + "... [truncated]"
            formatted_text += f"Full source content limited to {max_tokens_per_source} tokens: {raw_content}\n\n"
        formatted_text += f"{'='*80}\n\n" # End section separator

    return formatted_text.strip()

In [170]:
def web_search_worker(section: Section, opt_print=False):
    print(f"Thread: {section}")

    if section.research:
        section_query_prompt = report_query_writer(section.name, section.description, "3")

        messages = [
            {"role": "system", "content": section_query_prompt},
            {"role": "user", "content": "Generate search queries on the provided topic."},
        ]

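        response_section_queries = generate_response(messages)

        # The model often wraps its JSON in a Markdown code fence; strip it before parsing
        if "```json" in response_section_queries:
            response_section_queries = response_section_queries.split("```json")[1].strip()
            response_section_queries = response_section_queries.split("```")[0].strip()
        json_data = json.loads(response_section_queries)
        queries = json_data['queries']

        section.search_query = queries

        search_tasks = []
        req_topic = 'general'  # choose between 'general' and 'news'
        req_num_result = 2     # number of sites to return per web query
        include_raw = True     # whether to return each site's raw content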

        ask_tavily(queries, search_tasks, req_num_result, include_raw, req_topic, opt_print)
        source_str = deduplicate_and_format_sources(search_tasks, max_tokens_per_source=2000, include_raw_content=True)
        section.query_content = source_str

        messages = [
            {"role": "system", "content": section_writer_instructions},
            {"role": "user", "content": section_writer_inputs(topic, section.name, section.description, source_str)},
        ]
        section.section_content = generate_response(messages)


In [210]:
def final_section_writer_worker(section: Section, opt_print=True):
    user_prompt = "Generate a report section based on the provided sources."
    # Build the system prompt; `topic` and `source_str` are notebook-level globals
    system_instructions = final_section_writer_instructions(topic, section.name, section.description, source_str)

    messages = [
        {"role": "system", "content": system_instructions},
        {"role": "user", "content": user_prompt}
    ]

    section.section_content = generate_response(messages)

In [211]:
topic = "MCP(model context protocol) 과 A2A(Agent to Agent) 는 어떤 차이가 있는것인지 알려줘."
num_queries = "3"
model_id = 102
report_planner_query_prompt = report_query_writer(topic, report_organization, num_queries)

In [212]:
user_prompt = "Generate search queries that will help with planning the sections of the report."
messages = [
    {"role": "system", "content": report_planner_query_prompt},
    {"role": "user", "content": user_prompt}
]

response_query = generate_response(messages)
if "```json" in response_query:
    response_query = response_query.split("```json")[1].strip()
    response_query = response_query.split("```")[0].strip()
json_data = json.loads(response_query)
queries = json_data['queries']

print("3 web queries extracted from the user's request:")
print(queries)

```json
{
  "queries": [
    "MCP (Model Context Protocol) vs A2A (Agent to Agent) technical comparison 2024-2025",
    "A2A Agent to Agent communication frameworks and use cases",
    "MCP protocol implementation details and limitations agent interaction"
  ]
}
```

inference time: 5.48270 sec 


3 web queries extracted from the user's request:
['MCP (Model Context Protocol) vs A2A (Agent to Agent) technical comparison 2024-2025', 'A2A Agent to Agent communication frameworks and use cases', 'MCP protocol implementation details and limitations agent interaction']
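
The fence-stripping step above recurs wherever the notebook parses a model reply as JSON. A minimal sketch of factoring it into one helper follows; `parse_json_reply` and `FENCE` are hypothetical names that do not appear in the notebook, and the helper assumes the reply contains at most one fenced block.

```python
import json
import re

# Hypothetical helper, not part of the notebook.
FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def parse_json_reply(reply: str) -> dict:
    """Parse a model reply as JSON, with or without a surrounding code fence."""
    match = FENCED_BLOCK.search(reply)
    payload = match.group(1) if match else reply
    # json.loads raises json.JSONDecodeError if the payload is not valid JSON.
    return json.loads(payload.strip())


fenced_reply = FENCE + 'json\n{"queries": ["a", "b"]}\n' + FENCE
plain_reply = '{"queries": ["c"]}'

print(parse_json_reply(fenced_reply)["queries"])
print(parse_json_reply(plain_reply)["queries"])
```

Letting `json.JSONDecodeError` propagate, rather than returning a default, keeps a malformed reply visible to the caller.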


In [228]:
search_tasks = []
req_topic = 'general' # choose between 'general' and 'news'
req_num_result = 2    # number of sites to return per web query
include_raw = False   # whether to return each site's raw content

ask_tavily(queries, search_tasks, req_num_result, include_raw, req_topic)
source_str = deduplicate_and_format_sources(search_tasks, max_tokens_per_source=2000, include_raw_content=False)


Thread: MCP (Model Context Protocol) vs A2A (Agent to Agent) technical comparison 2024-2025
Thread: A2A Agent to Agent communication frameworks and use cases
Thread: MCP protocol implementation details and limitations agent interaction

ask_tavily task running time: 2.69 sec 



In [229]:
feedback = ""
planner_writer_prompt = report_planner_instructions(topic, report_organization, source_str, feedback)
print(planner_writer_prompt)

You are performing research for a report.
<Report topic>
MCP(model context protocol) 과 A2A(Agent to Agent) 는 어떤 차이가 있는것인지 알려줘.
</Report topic>

<Report organization>
Use this structure to create a report on the user-provided topic:

1. Introduction (no research needed)
   - Brief overview of the topic area

2. Main Body Sections:
   - Each section should focus on a sub-topic of the user-provided topic

3. Conclusion
   - Aim for 1 structural element (either a list of table) that distills the main body sections
   - Provide a concise summary of the report
</Report organization>

<Context>
Here is context to use to plan the sections of the report:
Content from sources:
Source: MCP Agents: The Open Standard Revolutionizing Context-Aware AI
--------------------------------------------------------------------------------
URL: https://www.luseratech.com/ai/mcp-agents-the-open-standard-revolutionizing-context-aware-ai
===
Most relevant content from source: The Model Context Protocol (MCP) Age

In [230]:
plan_user_prompt = """Generate the sections of the report. Your response must include a 'sections' field containing a list of sections.
                      Each section must have: name, description, research, and content fields.
                      Content must be filled in; it must not be None.
                      You must not add anything other than these fields under any circumstances."""

messages = [
    {"role": "system", "content": planner_writer_prompt},
    {"role": "user", "content": plan_user_prompt}
]

response_planner = generate_response(messages)
if "```json" in response_planner:
    response_planner = response_planner.split("```json")[1].strip()
    response_planner = response_planner.split("```")[0].strip()
json_planner_data = json.loads(response_planner)

```json
{
  "sections": [
    {
      "name": "Introduction",
      "description": "A brief overview of Model Context Protocol (MCP) and Agent to Agent (A2A) protocols, highlighting their emergence in the field of AI tooling and their significance for interoperability.",
      "research": false,
      "content": "The rapid development of agentic AI necessitates standardized communication and context provision. Two key protocols addressing this need are Model Context Protocol (MCP) and Agent to Agent (A2A). MCP focuses on enabling AI models to access and utilize external context, while A2A focuses on facilitating communication between independent AI agents. Both aim to resolve limitations in current AI systems, but approach the problem from different angles. This report will detail each protocol and highlight their key differences."
    },
    {
      "name": "Model Context Protocol (MCP) – Enabling Context-Aware AI",
      "description": "Detailed explanation of MCP's architecture, fun

In [231]:
plan_from_llm = json_planner_data['sections']
print(json.dumps(plan_from_llm, indent=4))

[
    {
        "name": "Introduction",
        "description": "A brief overview of Model Context Protocol (MCP) and Agent to Agent (A2A) protocols, highlighting their emergence in the field of AI tooling and their significance for interoperability.",
        "research": false,
        "content": "The rapid development of agentic AI necessitates standardized communication and context provision. Two key protocols addressing this need are Model Context Protocol (MCP) and Agent to Agent (A2A). MCP focuses on enabling AI models to access and utilize external context, while A2A focuses on facilitating communication between independent AI agents. Both aim to resolve limitations in current AI systems, but approach the problem from different angles. This report will detail each protocol and highlight their key differences."
    },
    {
        "name": "Model Context Protocol (MCP) \u2013 Enabling Context-Aware AI",
        "description": "Detailed explanation of MCP's architecture, functional

In [232]:
report_sections = []

for part in plan_from_llm:
    section = Section(
        name=part['name'],
        description=part['description'],
        content=part['content'],
        research=part['research']
    )
    report_sections.append(section)
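
The loop above assumes every planned section carries the four fields requested in the planner prompt. A minimal sketch of checking that before building `Section` objects follows; `REQUIRED_FIELDS`, `validate_plan`, and `sample_plan` are hypothetical names used only for illustration.

```python
# Hypothetical pre-check, not part of the notebook.
REQUIRED_FIELDS = ("name", "description", "research", "content")


def validate_plan(sections: list[dict]) -> list[str]:
    """Return a list of problems found in the plan; an empty list means it is usable."""
    problems = []
    for index, part in enumerate(sections):
        missing = [field for field in REQUIRED_FIELDS if field not in part]
        if missing:
            problems.append(f"section {index}: missing {', '.join(missing)}")
        elif not isinstance(part["research"], bool):
            problems.append(f"section {index}: 'research' must be true or false")
    return problems


sample_plan = [
    {"name": "Introduction", "description": "Overview", "research": False, "content": "..."},
    {"name": "Main body", "description": "Details", "research": "yes"},
]

for problem in validate_plan(sample_plan):
    print(problem)
```

Running the check on `plan_from_llm` before the loop would turn a missing key into a readable message instead of a `KeyError` partway through.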

In [233]:
for section in report_sections:
    print(f'{section} \n')

name='Introduction' description='A brief overview of Model Context Protocol (MCP) and Agent to Agent (A2A) protocols, highlighting their emergence in the field of AI tooling and their significance for interoperability.' research=False content='The rapid development of agentic AI necessitates standardized communication and context provision. Two key protocols addressing this need are Model Context Protocol (MCP) and Agent to Agent (A2A). MCP focuses on enabling AI models to access and utilize external context, while A2A focuses on facilitating communication between independent AI agents. Both aim to resolve limitations in current AI systems, but approach the problem from different angles. This report will detail each protocol and highlight their key differences.' search_query=None query_content=None section_content=None 

name='Model Context Protocol (MCP) – Enabling Context-Aware AI' description="Detailed explanation of MCP's architecture, functionality, and purpose. Focus on how it br

In [234]:
start_time = time.time()
threads = []

for section in report_sections:
    t = threading.Thread(target=web_search_worker, args=(section, True,))
    threads.append(t)
    t.start()

for thread in threads:
    thread.join()

end_time = time.time()
execution_time = end_time - start_time

print(f"Execution time: {execution_time:.2f} sec")

Thread: name='Introduction' description='A brief overview of Model Context Protocol (MCP) and Agent to Agent (A2A) protocols, highlighting their emergence in the field of AI tooling and their significance for interoperability.' research=False content='The rapid development of agentic AI necessitates standardized communication and context provision. Two key protocols addressing this need are Model Context Protocol (MCP) and Agent to Agent (A2A). MCP focuses on enabling AI models to access and utilize external context, while A2A focuses on facilitating communication between independent AI agents. Both aim to resolve limitations in current AI systems, but approach the problem from different angles. This report will detail each protocol and highlight their key differences.' search_query=None query_content=None section_content=None
Thread: name='Model Context Protocol (MCP) – Enabling Context-Aware AI' description="Detailed explanation of MCP's architecture, functionality, and purpose. Focu

In [235]:
for section in report_sections:
    print("section.name: " + section.name)
    print("section.description: " + section.description)
    print("section.search_query: ")
    print(section.search_query)
    print("section.section_content: ")
    print(section.section_content)
    print("====================================")

section.name: Introduction
section.description: A brief overview of Model Context Protocol (MCP) and Agent to Agent (A2A) protocols, highlighting their emergence in the field of AI tooling and their significance for interoperability.
section.search_query: 
None
section.section_content: 
None
====================================
section.name: Model Context Protocol (MCP) – Enabling Context-Aware AI
section.description: Detailed explanation of MCP's architecture, functionality, and purpose. Focus on how it bridges the gap between LLMs and external resources.
section.search_query: 
None
section.section_content: 
None
====================================
section.name: Agent to Agent (A2A) – Facilitating Agent Collaboration
section.description: Detailed explanation of A2A's architecture, functionality, and purpose. Focus on how it enables communication between different AI agents.
section.search_query: 
None
section.section_content: 
None
====================================
section.name: MCP vs. A2A: Key Differences & Synergies
section.description: Direct comparison of MCP and A2A protocols highlig
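
The output above shows `None` for `search_query` and `section_content` even on sections that call for research. One possible explanation, offered as an assumption because the captured output is truncated, is that a worker raised an exception before assigning those fields. An exception raised inside a `threading.Thread` target does not propagate to the main thread, whereas `concurrent.futures` re-raises it from `future.result()`. The sketch below illustrates this with a stand-in worker; `demo_worker` and `run_workers` are hypothetical names, not the notebook's functions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def demo_worker(name: str) -> str:
    # Stand-in for web_search_worker: fails for one input to show error capture.
    if name == "bad":
        raise ValueError(f"could not parse queries for {name!r}")
    return f"content for {name}"


def run_workers(names: list[str]) -> tuple[dict, dict]:
    """Run demo_worker for each name in parallel, collecting results and errors."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(demo_worker, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()  # re-raises the worker's exception
            except Exception as exc:
                errors[name] = exc
    return results, errors


results, errors = run_workers(["intro", "bad", "mcp"])
print(sorted(results))
print({name: str(exc) for name, exc in errors.items()})
```

Collecting errors per section makes it possible to retry only the failed sections instead of rerunning the whole batch.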

In [244]:
import markdown
from IPython.display import display, HTML, Markdown

for section in report_sections:
    if section.content:
        # Convert Markdown to HTML; the "tables" extension renders pipe tables as <table>.
        display(HTML(markdown.markdown(section.content, extensions=["tables"])))
    # display(Markdown(section.section_content))

<p>The rapid development of agentic AI necessitates standardized communication and context provision. Two key protocols addressing this need are Model Context Protocol (MCP) and Agent to Agent (A2A). MCP focuses on enabling AI models to access and utilize external context, while A2A focuses on facilitating communication between independent AI agents. Both aim to resolve limitations in current AI systems, but approach the problem from different angles. This report will detail each protocol and highlight their key differences.</p>

<p>MCP is an open protocol designed to provide a standardized way for systems to offer context to AI models. It acts as a middleware layer, allowing models to retrieve and process external information from databases, files, and APIs. Unlike simply providing static data, MCP enables dynamic interaction with these resources. This is achieved through a client-server model where AI agents (clients) interact with MCP servers providing access to tools and data. The core purpose of MCP is to enable richer responses, deeper integrations, and more intelligent applications by providing models with real-time information and the ability to execute complex tasks autonomously.</p>

<p>A2A is an open protocol developed by Google intended to enable seamless collaboration between AI agents across diverse frameworks and vendors. It establishes a common language for agents irrespective of their underlying technology. A2A operates via an HTTP endpoint implemented by A2A Servers which expose methods defined in a JSON specification. The key goal is interoperability – allowing agents built with different frameworks (e.g., LangGraph, Google ADK) to communicate effectively without requiring custom integrations or understanding of each other's internal workings.</p>

<p>While both protocols contribute to advancing agentic AI capabilities, they address different aspects of the challenge. <strong>MCP</strong> primarily concerns itself with <em>how</em> a model accesses external resources – standardizing the interface for tools, databases &amp; APIs. It equips an agent with the ability to <em>utilize</em> information effectively. <strong>A2A</strong>, on the other hand, focuses on <em>who</em> an agent communicates with – establishing a common language for agent-to-agent interactions across different ecosystems. Essentially, A2A enables agents to <em>collaborate</em>, while MCP empowers them with contextual awareness. They aren’t mutually exclusive; in fact they can be complementary – an agent utilizing MCP for data access could then use A2A to share insights gained with other agents.</p>

<p>In conclusion, both MCP and A2A represent significant steps toward more interoperable and powerful AI systems.</p>
<table>
<thead>
<tr><th>Feature</th><th>Model Context Protocol (MCP)</th><th>Agent to Agent (A2A)</th></tr>
</thead>
<tbody>
<tr><td><strong>Primary Focus</strong></td><td>Standardized access to external context</td><td>Inter-agent communication</td></tr>
<tr><td><strong>Functionality</strong></td><td>Middleware layer for tool/data interaction</td><td>Protocol for agent collaboration</td></tr>
<tr><td><strong>Key Benefit</strong></td><td>Context-aware AI applications</td><td>Seamless interoperability between agents</td></tr>
<tr><td><strong>Developed By</strong></td><td>Anthropic</td><td>Google</td></tr>
</tbody>
</table>
<p>The future of AI tooling likely involves integration of both protocols—leveraging MCP for robust context provision within individual agents alongside A2A for effective collaboration between them.</p>