# LangChain Tool Demo: Inspecting Available ICON-D2 Forecast Files

This notebook demonstrates how a Large Language Model (LLM) can use
**tools** to retrieve *real-world metadata* that it cannot know from its
training data: the ICON-D2 2 m temperature forecast files currently
available on DWD Open Data.

Key idea:
- LLMs reason
- Tools provide ground truth

In [1]:
import os

# DWD proxy (adjust or skip this cell if you are not behind this proxy)
os.environ["HTTP_PROXY"]  = "http://ofsquid.dwd.de:8080"
os.environ["HTTPS_PROXY"] = "http://ofsquid.dwd.de:8080"

# Some tools read only the lowercase variable names, so set those as well
os.environ["http_proxy"]  = os.environ["HTTP_PROXY"]
os.environ["https_proxy"] = os.environ["HTTPS_PROXY"]
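
The proxy address above is hard-coded, so the cell only makes sense on a machine that can reach that host. A minimal sketch of a more portable variant follows; `configure_proxy` is a hypothetical helper, not part of the original notebook, and it simply leaves the environment untouched when no proxy URL is given.

```python
import os


def configure_proxy(proxy_url=None):
    """Set the proxy variables only when a proxy URL is given (hypothetical helper)."""
    if not proxy_url:
        return False
    # Set both spellings, since some tools read only the lowercase names.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        os.environ[name] = proxy_url
    return True
```

Calling `configure_proxy("http://ofsquid.dwd.de:8080")` would reproduce the cell above, while `configure_proxy()` is a no-op.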

In [2]:
"""
Load environment variables and initialize the OpenAI-backed LLM.

This cell:
- loads variables from a .env file (using python-dotenv)
- checks that OPENAI_API_KEY is available
- initializes a ChatOpenAI model with temperature=0 (minimizes sampling randomness)

Nothing is stored as memory here:
each LLM call remains stateless unless we explicitly pass context.
"""

from dotenv import load_dotenv
import os

from langchain_openai import ChatOpenAI

# Load .env file
load_dotenv()

# Verify API key
api_key = os.getenv("OPENAI_API_KEY")
assert api_key and api_key.strip(), "❌ OPENAI_API_KEY not found in environment"
print("✅ OPENAI_API_KEY loaded")

# Initialize LLM
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0
)

print("✅ LLM initialized")

✅ OPENAI_API_KEY loaded
✅ LLM initialized


In [3]:
"""
Imports required for tool definitions and data access.

This cell:
- imports HTTP and regex utilities
- imports typing for tool schemas
- imports the LangChain tool decorator

No LLM logic yet.
"""

import requests
import re
from typing import List

from langchain.tools import tool

In [4]:
"""
Tool: list_icon_d2_t2m_files

This tool accesses the DWD Open Data server and returns the
ICON-D2 2 m temperature GRIB2 filenames listed for the 00 UTC run.

Responsibilities:
- HTTP access
- Extracting filenames from the HTML directory listing (regex)
- Returning raw filenames only

No interpretation is done here.
"""

@tool
def list_icon_d2_t2m_files() -> List[str]:
    """
    List available ICON-D2 2m temperature GRIB2 files (00 UTC run)
    from the DWD Open Data server.
    """
    url = "https://opendata.dwd.de/weather/nwp/icon-d2/grib/00/t_2m/"
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    filenames = re.findall(
        r'icon-d2_germany_icosahedral_single-level_\d{10}_\d{3}_2d_t_2m\.grib2\.bz2',
        response.text
    )

    return sorted(set(filenames))
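
The regular expression above encodes the filename layout but discards its parts. As a rough sketch, the same pattern with named groups can split a filename into its fields. `parse_icon_d2_filename` is a hypothetical helper, and reading the 10-digit field as the run identifier is an assumption based on the `/00/` directory and the filenames shown in this notebook.

```python
import re

# Same layout as the pattern in list_icon_d2_t2m_files, with named groups added.
# Assumption: the 10-digit field identifies the model run, the 3-digit field the lead time.
FILENAME_PATTERN = re.compile(
    r"icon-d2_germany_icosahedral_single-level_"
    r"(?P<run>\d{10})_(?P<lead>\d{3})_2d_t_2m\.grib2\.bz2"
)


def parse_icon_d2_filename(name):
    """Return {'run': str, 'lead': int}, or None if the name does not match."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return {"run": match.group("run"), "lead": int(match.group("lead"))}
```

Returning `None` for non-matching names keeps unrelated directory entries from raising errors downstream.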

In [5]:
"""
Tool: extract_leadtimes

This tool parses forecast lead times from ICON-D2 filenames.

Responsibilities:
- Filename parsing
- Returning structured lead times (e.g. '000', '003', ...)

This keeps string parsing out of the LLM.
"""

@tool
def extract_leadtimes(filenames: List[str]) -> List[str]:
    """
    Extract forecast lead times (e.g. 000, 003, 006)
    from ICON-D2 filenames.
    """
    leadtimes = set()

    for name in filenames:
        match = re.search(r'_(\d{3})_2d_t_2m\.grib2\.bz2$', name)
        if match:
            leadtimes.add(match.group(1))

    return sorted(leadtimes)
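
Because the lead times come back as zero-padded strings, checking the listing for gaps is a small deterministic step that also does not need the LLM. The sketch below is illustrative only: `missing_leadtimes` is a hypothetical helper, and the expected maximum lead time is supplied by the caller rather than known from the server.

```python
def missing_leadtimes(leadtimes, expected_max):
    """Return the zero-padded lead times in 000..expected_max that are absent."""
    present = set(leadtimes)
    expected = [f"{hour:03d}" for hour in range(expected_max + 1)]
    return [lead for lead in expected if lead not in present]
```

For example, `missing_leadtimes(["000", "001", "003"], 3)` reports `["002"]`, and an empty list means the listing is complete up to `expected_max`.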

In [6]:
"""
Bind tools to the LLM.

This step:
- exposes tool schemas to the LLM
- does NOT execute any tool
- does NOT create memory or state

The LLM can now decide to request tool calls.
"""

llm_with_tools = llm.bind_tools(
    [list_icon_d2_t2m_files, extract_leadtimes]
)

print("✅ Tools bound to LLM")

✅ Tools bound to LLM


In [7]:
"""
Ask a question that the LLM cannot answer from training data alone.

The correct answer depends on the *current* contents
of the DWD Open Data server.
"""

query = (
    "What ICON-D2 forecast lead times are currently available "
    "for 2m temperature?"
)

response = llm_with_tools.invoke(query)
response

AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 116, 'total_tokens': 132, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_c4585b5b9c', 'id': 'chatcmpl-CsozI4CvUDsqVWbVoT5GKIUxKZj5o', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--019b743d-7707-7351-a56e-0a210db17caf-0', tool_calls=[{'name': 'list_icon_d2_t2m_files', 'args': {}, 'id': 'call_QSMSR7A7HXzglZxitcBrHy5q', 'type': 'tool_call'}], usage_metadata={'input_tokens': 116, 'output_tokens': 16, 'total_tokens': 132, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [8]:
"""
Inspect tool calls requested by the LLM.

At this point:
- no tool has been executed yet
- the LLM only emits a structured request
"""

response.tool_calls

[{'name': 'list_icon_d2_t2m_files',
  'args': {},
  'id': 'call_QSMSR7A7HXzglZxitcBrHy5q',
  'type': 'tool_call'}]

In [9]:
"""
Manually execute the requested tool.

This keeps control in the Python runtime:
- the LLM does not execute code
- tools are called explicitly and safely
"""

tool_results = {}

for call in response.tool_calls:
    if call["name"] == "list_icon_d2_t2m_files":
        result = list_icon_d2_t2m_files.invoke(call["args"])
        tool_results["filenames"] = result
        print(f"Retrieved {len(result)} filenames")

Retrieved 49 filenames
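
The loop above handles a single tool name with an `if` branch. One way to generalize it, sketched here with plain Python functions as stand-ins, is a registry that maps tool names to callables. `run_tool_calls` and the stand-in tools are hypothetical; with the LangChain tools of this notebook the call inside the loop would be `tool.invoke(call["args"])` instead of `func(**call["args"])`.

```python
def run_tool_calls(tool_calls, registry):
    """Run every requested tool call; return results keyed by tool-call id."""
    results = {}
    for call in tool_calls:
        func = registry.get(call["name"])
        if func is None:
            # Refuse names that were never registered instead of guessing.
            raise KeyError(f"Unknown tool requested: {call['name']}")
        results[call["id"]] = func(**call["args"])
    return results


# Stand-in tools so the sketch runs without network access or an LLM.
def fake_list_files():
    return ["file_000", "file_001"]


def fake_count(filenames):
    return len(filenames)


registry = {"list_files": fake_list_files, "count": fake_count}
calls = [
    {"name": "list_files", "args": {}, "id": "call_1", "type": "tool_call"},
    {"name": "count", "args": {"filenames": ["a", "b", "c"]}, "id": "call_2", "type": "tool_call"},
]
dispatched = run_tool_calls(calls, registry)
```

Keying the results by tool-call id makes it straightforward to build one `ToolMessage` per request later, and an unknown tool name fails loudly rather than being silently skipped.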


In [10]:
"""
Feed tool output back to the LLM (correct content type).

Important rules:
- ToolMessage.content must be a STRING
- Structured data must be serialized explicitly (e.g. JSON)
- tool_call_id must match the original tool request
"""

import json
from langchain_core.messages import HumanMessage, ToolMessage

# Take the first (and only) tool call
tool_call = response.tool_calls[0]

# Serialize filenames to JSON
filenames_json = json.dumps(tool_results["filenames"], indent=2)

tool_message = ToolMessage(
    name=tool_call["name"],
    content=filenames_json,
    tool_call_id=tool_call["id"],
)

# Invoke the LLM again with the full exchange so far:
# - original user question (each call is stateless, so it must be resent)
# - AI message that requested the tool
# - tool result message
response2 = llm_with_tools.invoke(
    [
        HumanMessage(content=query),
        response,
        tool_message,
    ]
)

response2

AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 1584, 'prompt_tokens': 1794, 'total_tokens': 3378, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 1664}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_c4585b5b9c', 'id': 'chatcmpl-CsozJXzKR8V5OX0xcb0UuEztvWAe4', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--019b743d-7bbf-76e3-914a-40a4df926c7c-0', tool_calls=[{'name': 'extract_leadtimes', 'args': {'filenames': ['icon-d2_germany_icosahedral_single-level_2025123100_000_2d_t_2m.grib2.bz2', 'icon-d2_germany_icosahedral_single-level_2025123100_001_2d_t_2m.grib2.bz2', 'icon-d2_germany_icosahedral_single-level_2025123100_002_2d_t_2m.grib2.bz2', 'icon-d2_germany_icosahedral_single-level_20
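
The serialized tool output is sent to the model as part of the prompt, so its formatting affects the prompt size. A small sketch of a more compact serialization follows; `to_compact_json` is a hypothetical helper, and the actual token savings depend on the tokenizer and are not measured here.

```python
import json


def to_compact_json(data):
    """Serialize without indentation or spaces after separators."""
    return json.dumps(data, separators=(",", ":"))


sample = [
    "icon-d2_germany_icosahedral_single-level_2025123100_000_2d_t_2m.grib2.bz2",
    "icon-d2_germany_icosahedral_single-level_2025123100_001_2d_t_2m.grib2.bz2",
]
compact = to_compact_json(sample)
pretty = json.dumps(sample, indent=2)
```

Both strings decode to the same list; the compact form only drops whitespace that carries no information for the model.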

In [11]:
"""
The LLM now has filenames but still needs interpretation.

It should request the lead time extraction tool.
"""

response2.tool_calls


[{'name': 'extract_leadtimes',
  'args': {'filenames': ['icon-d2_germany_icosahedral_single-level_2025123100_000_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_001_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_002_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_003_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_004_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_005_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_006_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_007_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_008_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_009_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_010_2d_t_2m.grib2.bz2',
    'icon-d2_germany_icosahedral_single-level_2025123100_

In [12]:
"""
Execute the second tool: lead time extraction.

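Again:
- explicit execution
- deterministic
- no hidden control flow
- the input is the filename list retrieved in cell [9], not the copy
  the LLM echoed in its tool-call arguments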
"""

leadtimes = extract_leadtimes.invoke(
    {"filenames": tool_results["filenames"]}
)

leadtimes


['000',
 '001',
 '002',
 '003',
 '004',
 '005',
 '006',
 '007',
 '008',
 '009',
 '010',
 '011',
 '012',
 '013',
 '014',
 '015',
 '016',
 '017',
 '018',
 '019',
 '020',
 '021',
 '022',
 '023',
 '024',
 '025',
 '026',
 '027',
 '028',
 '029',
 '030',
 '031',
 '032',
 '033',
 '034',
 '035',
 '036',
 '037',
 '038',
 '039',
 '040',
 '041',
 '042',
 '043',
 '044',
 '045',
 '046',
 '047',
 '048']
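
Lead times are offsets from the model run, so they can be turned into absolute forecast times with plain date arithmetic. The sketch below assumes that the 10-digit field in the filenames is the run time as `YYYYMMDDHH` in UTC and that lead times are whole hours; `valid_times` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def valid_times(run, leadtimes):
    """Map lead-time strings to UTC datetimes: run time plus lead hours."""
    # Assumption: run is formatted as YYYYMMDDHH and refers to UTC.
    run_time = datetime.strptime(run, "%Y%m%d%H").replace(tzinfo=timezone.utc)
    return [run_time + timedelta(hours=int(lead)) for lead in leadtimes]
```

Under these assumptions, `valid_times("2025123100", ["000", "048"])` spans 2025-12-31 00:00 UTC to 2026-01-02 00:00 UTC.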

In [13]:
"""
Feed the second tool result (lead times) back to the LLM
and obtain the final grounded answer.

Important:
- ToolMessage requires tool_call_id
- Tool output must be serialized (JSON)
"""

import json
from langchain_core.messages import HumanMessage, ToolMessage

# The LLM requested the second tool in response2
tool_call_2 = response2.tool_calls[0]

# Serialize lead times
leadtimes_json = json.dumps(leadtimes, indent=2)

tool_message_2 = ToolMessage(
    name=tool_call_2["name"],
    content=leadtimes_json,
    tool_call_id=tool_call_2["id"],
)

# Final LLM invocation with the full exchange:
# - original user question
# - first AI message and first tool result
# - second AI message and second tool result
final_response = llm_with_tools.invoke(
    [
        HumanMessage(content=query),
        response,
        tool_message,
        response2,
        tool_message_2,
    ]
)

print(final_response.content)


The forecast lead times extracted from the ICON-D2 filenames are as follows:

- 000
- 001
- 002
- 003
- 004
- 005
- 006
- 007
- 008
- 009
- 010
- 011
- 012
- 013
- 014
- 015
- 016
- 017
- 018
- 019
- 020
- 021
- 022
- 023
- 024
- 025
- 026
- 027
- 028
- 029
- 030
- 031
- 032
- 033
- 034
- 035
- 036
- 037
- 038
- 039
- 040
- 041
- 042
- 043
- 044
- 045
- 046
- 047
- 048

This indicates that the forecast covers a total of 49 lead times from 000 to 048 hours.
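
Since the tools provide the ground truth, the final answer can be checked against the tool output instead of being trusted as written. A rough sketch: both helpers are hypothetical, and the check is deliberately naive, because any other standalone three-digit number in the answer would be reported as a mismatch.

```python
import re


def leadtimes_in_answer(answer):
    """Collect the standalone three-digit tokens that appear in the answer text."""
    return sorted(set(re.findall(r"\b\d{3}\b", answer)))


def answer_matches_tool_output(answer, leadtimes):
    """True if the answer mentions exactly the lead times the tool returned."""
    return leadtimes_in_answer(answer) == sorted(set(leadtimes))
```

In this notebook the check would be `answer_matches_tool_output(final_response.content, leadtimes)`.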
