## Config

In [5]:
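import os

from openai import OpenAI

# Prefer the OPENAI_API_KEY environment variable; the placeholder is only a fallback.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "insert_api_key")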

client = OpenAI(
    api_key=OPENAI_API_KEY
)

## Extraction prompt

In [6]:
prompt = """Your task is to extract controlled airspace boundary data from a provided document.

Target sections: FIR, CTA, and TMA. The boundary of each section is expressed as a sequence of latitude/longitude coordinates and occasional strings (e.g., “AT-HU border”, “along the latitude 55 00 00N to”). Extract all such sections present in the document.

If the document does not contain any FIR/CTA/TMA sections, return an object with an empty "areas" list.

For each detected section:
- name: Use the section title transformed to UPPERCASE with underscores instead of spaces/hyphens (e.g., “København FIR” → “KØBENHAVN_FIR”; “Aalborg CTA” → “AALBORG_CTA”; “BUDAPEST TMA 1” → “BUDAPEST_TMA_1”).
- type: One of "FIR", "CTA", "TMA".
- items: A SINGLE, ORDERED LIST of boundary items for that section. Preserve the original order across line breaks and separators (“-”).
- lower_limit and upper_limit: The lower and upper limit for the area (e.g., "FL 660" or "9500 FT ALT"), typically only present for CTA and TMA sections. Return empty strings if not present.

Item rules
1) Coordinates → capture exactly as written (e.g., "54 44 35N 010 10 00E" or "465209N 0160650E"). Do NOT convert, reformat, or split. Output each full coordinate as a single string item.
2) Strings →
   • Border labels: normalize country names to ISO alpha-2 and output as "XX-YY border" (case-insensitive; underscores/hyphens/spaces in the source are allowed). Examples:
       "Danish-German border" → "DK-DE border"
       "AUSTRIA_HUNGARY" → "AT-HU border"
       "HUNGARY_SLOVAKREPUBLIC" → "HU-SK border"
   • For unambiguous cross-references within the same document (e.g., “Lateral limits as for Budapest FIR”), duplicate the items from the referenced section in place of the reference string.
   • All other phrases or unclear references (e.g., "along the latitude 55 00 00N to", "West of the lines from") must be preserved verbatim as string items without substitution.

Do not include headings beyond the area name, altitude/level notes, frequencies, or unrelated remarks.

Output strictly as JSON matching the provided schema, as illustrated in the examples below. No extra keys, no commentary.

Below are two examples of the kind of inputs you might receive, and how the output should be formatted:

Example 1 input:

KØBENHAVN FIR
54 44 35N 010 10 00E - 54 45 54N 010 03 13E -
Danish-German border - 55 04 09N 008 23 31E -
55 04 00N 008 20 00E - 55 00 00N 008 00 00E -
along the latitude 55 00 00N to
55 00 00N 005 00 00E - 57 00 00N 005 00 00E

UNL/GND G except other regulated ATS airspace.
Designated as RMZ above FL 95.
Note: RVSM airspace is established within the entire
København FIR from FL 290 to FL 410 inclusive

KØBENHAVN CTA
A.
Lateral limits as for FIR.
FL 660/FL 195 C
B.
Lateral limits as for the FIR east of a line from
57 12 38N 007 53 53E - 55 36 58N 008 08 55E to
55 00 00N 007 42 57E.
FL 195/3500 FT MSL E except other regulated ATS
airspace. Designated as RMZ above FL 95.

Example 1 output:

{
  "areas": [
    {
      "name": "KØBENHAVN_FIR",
      "type": "FIR",
      "items": [
        "54 44 35N 010 10 00E",
        "54 45 54N 010 03 13E",
        "DK-DE border",
        "55 04 09N 008 23 31E",
        "55 04 00N 008 20 00E",
        "55 00 00N 008 00 00E",
        "along the latitude 55 00 00N to",
        "55 00 00N 005 00 00E",
        "57 00 00N 005 00 00E"
      ],
      "lower_limit": "GND",
      "upper_limit": "UNL"
    },
    {
      "name": "KØBENHAVN_CTA_A",
      "type": "CTA",
      "items": [
        "Lateral limits as for FIR."
      ],
      "lower_limit": "FL 195",
      "upper_limit": "FL 660"
    },
    {
      "name": "KØBENHAVN_CTA_B",
      "type": "CTA",
      "items": [
        "Lateral limits as for the FIR east of a line from",
        "57 12 38N 007 53 53E",
        "55 36 58N 008 08 55E",
        "55 00 00N 007 42 57E"
      ],
      "lower_limit": "3500 FT MSL",
      "upper_limit": "FL 195"
    }
  ]
}

---

Example 2 input (excerpt across three pages):

Page 1:
BUDAPEST FIR
465209N 0160650E along
border AUSTRIA_HUNGARY -
480024N 0170939E along
border
HUNGARY_SLOVAKREPUBLI
C - 482412N 0220919E along
border HUNGARY_UKRAINE -
475733N 0225422E along
border HUNGARY_ROMANIA -
460702N 0201602E along
border HUNGARY_SERBIA -
455515N 0185324E along
border CROATIA_HUNGARY -
462901N 0163358E along
border HUNGARY_SLOVENIA -
465209N 0160650E
FL 660
GND

Page 2:
Name
Lateral limits
Vertical limits
Class of airspace
BUDAPEST CTA
Lateral limits as for Budapest
FIR
FL 660
9500 FT ALT
C

Name
Lateral limits
Vertical limits
Class of airspace
Identification
of unit
providing
service
Call sign of
aeronautical station
Languages used
Area and conditions of
use
Frequencies
SATVOICE number
Purpose
Remarks
1 23 4 5
BUDAPEST TMA
For lateral and vertical limits see
BUDAPEST TMA PARTS table.
C
BUDAPEST
APP
BUDAPEST APPROACH
EN
122.975 MHZ Primary channel (also
usable by 8.33
exempted aircraft)
119.510 CH
123.860 CH
124.900 MHZ Standby channel
(also usable by 8.33
exempted aircraft)

Page 3:
BUDAPEST TMA PARTS
1
BUDAPEST TMA1
472011N 0181744E - 470220N 0182212E - 465337N 0190031E - 465726N 0185421E - 470324N 0184445E - 472011N
0181744E
FL 195
9500 FT ALT
C

Example 2 output:

{
  "areas": [
    {
      "name": "BUDAPEST_FIR",
      "type": "FIR",
      "items": [
        "465209N 0160650E",
        "AT-HU border",
        "480024N 0170939E",
        "HU-SK border",
        "482412N 0220919E",
        "HU-UA border",
        "475733N 0225422E",
        "HU-RO border",
        "460702N 0201602E",
        "HU-RS border",
        "455515N 0185324E",
        "HR-HU border",
        "462901N 0163358E",
        "HU-SI border",
        "465209N 0160650E"
      ],
      "lower_limit": "GND",
      "upper_limit": "FL 660"
    },
    {
      "name": "BUDAPEST_CTA",
      "type": "CTA",
      "items": [
        "465209N 0160650E",
        "AT-HU border",
        "480024N 0170939E",
        "HU-SK border",
        "482412N 0220919E",
        "HU-UA border",
        "475733N 0225422E",
        "HU-RO border",
        "460702N 0201602E",
        "HU-RS border",
        "455515N 0185324E",
        "HR-HU border",
        "462901N 0163358E",
        "HU-SI border",
        "465209N 0160650E"
      ],
      "lower_limit": "9500 FT ALT",
      "upper_limit": "FL 660"
    },
    {
      "name": "BUDAPEST_TMA1",
      "type": "TMA",
      "items": [
        "472011N 0181744E",
        "470220N 0182212E",
        "465337N 0190031E",
        "465726N 0185421E",
        "470324N 0184445E",
        "472011N 0181744E"
      ],
      "lower_limit": "9500 FT ALT",
      "upper_limit": "FL 195"
    }
  ]
}
"""

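The item rules in the prompt distinguish three kinds of boundary items: raw coordinates, normalized border labels, and verbatim phrases. A minimal sketch of a post-hoc check that sorts returned items into those three kinds is shown below. The regular expressions and the `classify_item` helper are assumptions for illustration, derived only from the two coordinate styles and the "XX-YY border" form quoted in the prompt; they are not part of the original notebook.

```python
import re

# Assumed patterns, based only on the formats quoted in the prompt:
#   spaced:  "54 44 35N 010 10 00E"
#   compact: "465209N 0160650E"
COORD_SPACED = re.compile(
    r"^\d{2} \d{2} \d{2}(?:\.\d+)?[NS] \d{3} \d{2} \d{2}(?:\.\d+)?[EW]$"
)
COORD_COMPACT = re.compile(r"^\d{6}(?:\.\d+)?[NS] \d{7}(?:\.\d+)?[EW]$")
# Normalized border label, e.g. "DK-DE border".
BORDER = re.compile(r"^[A-Z]{2}-[A-Z]{2} border$")


def classify_item(item: str) -> str:
    """Return "coordinate", "border", or "phrase" for one boundary item."""
    if COORD_SPACED.match(item) or COORD_COMPACT.match(item):
        return "coordinate"
    if BORDER.match(item):
        return "border"
    # Everything else is treated as a verbatim phrase.
    return "phrase"


sample_items = [
    "54 44 35N 010 10 00E",
    "DK-DE border",
    "along the latitude 55 00 00N to",
    "465209N 0160650E",
]
kinds = [classify_item(i) for i in sample_items]
print(kinds)
```

A check like this could flag items the model reformatted or merged, since anything that falls into "phrase" unexpectedly is worth a manual look.
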
## API LLM Call

In [7]:
from typing import List, Literal
from pydantic import BaseModel

class Area(BaseModel):
    name: str                     # e.g., "KØBENHAVN_FIR", "AALBORG_CTA", "BUDAPEST_TMA_1"
    type: Literal["FIR","CTA","TMA"]
    items: List[str]              # ordered; raw coordinate strings and string labels/phrases
    lower_limit: str = ""
    upper_limit: str = ""

class Areas(BaseModel):
    areas: List[Area]


response = client.responses.parse(
    model="gpt-5",
    input=[
        {
            "role": "system",
            "content": prompt,
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Apply the extraction logic to the below document",
                },
                {
                    "type": "input_file",
                    "file_url": "https://aim.naviair.dk/media/files/qxymq5bewol/EK_ENR_2_1_en.pdf"
                },
            ],
        },
    ],
    text_format=Areas
)

result: Areas = response.output_parsed

In [24]:
for area in result.areas:
    print(f"Name: {area.name}")
    print(f"Coords: {area.items}")
    print(f"Lower limit: {area.lower_limit}")
    print(f"Upper limit: {area.upper_limit}")
    print("\n")

Name: KØBENHAVN_FIR
Coords: ['58 30 00N 010 30 00E', '56 12 53N 012 22 05E', 'DK-SE border', '55 20 12N 012 38 27E', '54 55 00N 012 51 00E', '54 27 00N 012 00 00E', '54 26 45N 011 50 00E', '54 27 50N 011 40 00E', '54 30 00N 011 30 00E', '54 33 15N 011 20 00E', '54 36 10N 011 10 00E', '54 38 40N 011 00 00E', '54 39 10N 010 50 00E', '54 39 20N 010 40 00E', '54 39 30N 010 30 00E', '54 42 00N 010 20 00E', '54 44 35N 010 10 00E', '54 45 54N 010 03 13E', 'DK-DE border', '55 04 09N 008 23 31E', '55 04 00N 008 20 00E', '55 00 00N 008 00 00E', 'along the latitude 55 00 00N to', '55 00 00N 005 00 00E', '57 00 00N 005 00 00E', 'along the latitude 57 00 00N to', '57 00 00N 007 30 00E', '58 30 00N 010 30 00E']
Lower limit: GND
Upper limit: UNL


Name: KØBENHAVN_CTA_A
Coords: ['58 30 00N 010 30 00E', '56 12 53N 012 22 05E', 'DK-SE border', '55 20 12N 012 38 27E', '54 55 00N 012 51 00E', '54 27 00N 012 00 00E', '54 26 45N 011 50 00E', '54 27 50N 011 40 00E', '54 30 00N 011 30 00E', '54 33 15N 011 20 
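
Because the prompt asks the model to keep coordinates exactly as written, any numeric use (plotting, polygon checks) needs a separate conversion step. The sketch below converts the two degree-minute-second styles quoted in the prompt to decimal degrees. The `dms_to_decimal` helper and its regular expression are hypothetical and not part of the original notebook; other styles, such as degrees and minutes only, are rejected rather than guessed at.

```python
import re
from typing import Tuple

# Assumed pattern: DD MM SS[.s]H DDD MM SS[.s]H, with or without spaces
# between the degree, minute, and second fields.
_DMS = re.compile(
    r"^(\d{2}) ?(\d{2}) ?(\d{2}(?:\.\d+)?)([NS]) "
    r"(\d{3}) ?(\d{2}) ?(\d{2}(?:\.\d+)?)([EW])$"
)


def dms_to_decimal(coord: str) -> Tuple[float, float]:
    """Convert a raw coordinate string to (latitude, longitude) in degrees."""
    m = _DMS.match(coord.strip())
    if m is None:
        raise ValueError(f"Unsupported coordinate format: {coord!r}")
    lat_d, lat_m, lat_s, ns, lon_d, lon_m, lon_s, ew = m.groups()
    lat = int(lat_d) + int(lat_m) / 60 + float(lat_s) / 3600
    lon = int(lon_d) + int(lon_m) / 60 + float(lon_s) / 3600
    # Southern and western hemispheres are negative by convention.
    if ns == "S":
        lat = -lat
    if ew == "W":
        lon = -lon
    return lat, lon


print(dms_to_decimal("55 00 00N 005 00 00E"))  # (55.0, 5.0)
```

String items such as "DK-DE border" raise `ValueError`, so a caller would typically convert only the items it has already identified as coordinates.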

## Waypoint extraction

In [27]:
waypoint_prompt = """Your task is to extract waypoint objects from a provided document.

For each detected waypoint:
- name: Use the waypoint name transformed to UPPERCASE with underscores instead of spaces/hyphens
- coordinate: A SINGLE COORDINATE for the given waypoint

Item rules
1) Coordinates → capture exactly as written (e.g., "54 44 35N 010 10 00E" or "465209N 0160650E"). Do NOT convert, reformat, or split.
"""

In [28]:
waypoint_url = "https://aim.naviair.dk/media/files/qxymq5bewol/EK_ENR_2_1_en.pdf"

In [40]:
from typing import List
from pydantic import BaseModel

class Waypoint(BaseModel):
    name: str                     # e.g., "ABNED", "ROMIN", "UNKAR"
    coordinate: List[str]         # raw coordinate string(s), captured exactly as written

class Waypoints(BaseModel):
    waypoints: List[Waypoint]


response = client.responses.parse(
    model="gpt-4o", # Switch to gpt-5 for better performance
    input=[
        {
            "role": "system",
            "content": waypoint_prompt,
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Apply the extraction logic to the below document",
                },
                {
                    "type": "input_file",
                    "file_url": waypoint_url
                },
            ],
        },
    ],
    text_format=Waypoints
)

result: Waypoints = response.output_parsed

In [50]:
result

Waypoints(waypoints=[Waypoint(name='KØBENHAVN_FIR', coordinate=['54 44 35N 010 10 00E']), Waypoint(name='SECTOR_A', coordinate=['5501N 01201E']), Waypoint(name='SECTOR_B', coordinate=['5501N 01201E']), Waypoint(name='SECTOR_C', coordinate=['5501N 01201E']), Waypoint(name='SECTOR_D', coordinate=['5501N 01201E']), Waypoint(name='SECTOR_E', coordinate=['5534N 01152E']), Waypoint(name='SECTOR_I', coordinate=['5456N 01046E']), Waypoint(name='SECTOR_L', coordinate=['5642N 00953E']), Waypoint(name='SECTOR_N', coordinate=['5540N 00809E']), Waypoint(name='SECTOR_UN', coordinate=['5540N 00809E']), Waypoint(name='SECTOR_V', coordinate=['5705N 00953E']), Waypoint(name='SECTOR_UV', coordinate=['5705N 00953E']), Waypoint(name='SECTOR_UA', coordinate=['5536N 01237E']), Waypoint(name='SECTOR_UC', coordinate=['5536N 01237E']), Waypoint(name='AALBORG_CTA', coordinate=['57 38 58N 010 28 55E']), Waypoint(name='AALBORG_TMA', coordinate=['57 07 18N 009 13 55E']), Waypoint(name='AARHUS_CTA', coordinate=['56 

In [51]:
for wp in result.waypoints:
    print(f"{wp.name} — {', '.join(wp.coordinate)}")

KØBENHAVN_FIR — 54 44 35N 010 10 00E
SECTOR_A — 5501N 01201E
SECTOR_B — 5501N 01201E
SECTOR_C — 5501N 01201E
SECTOR_D — 5501N 01201E
SECTOR_E — 5534N 01152E
SECTOR_I — 5456N 01046E
SECTOR_L — 5642N 00953E
SECTOR_N — 5540N 00809E
SECTOR_UN — 5540N 00809E
SECTOR_V — 5705N 00953E
SECTOR_UV — 5705N 00953E
SECTOR_UA — 5536N 01237E
SECTOR_UC — 5536N 01237E
AALBORG_CTA — 57 38 58N 010 28 55E
AALBORG_TMA — 57 07 18N 009 13 55E
AARHUS_CTA — 56 51 38N 010 28 55E
AARHUS_TMA — 56 25 28N 010 02 55E
BILLUND_CTA — 56 03 16.8N 009 29 55.4E
BILLUND_TMA — 56 03 16.8N 009 29 55.4E
KARUP_CTA — 56 38 28N 009 42 25E
KARUP_TMA — 56 21 18N 008 30 25E
KØBENHAVN_TMA — 55 59 06N 011 49 33E
ROSKILDE_TMA — 55 59 06N 011 49 33E
RØNNE_TMA — 55 17 26N 014 18 28E
SKRYDSTRUP_CTA — 55 29 58N 009 54 56E
SKRYDSTRUP_TMA — 55 09 28N 008 39 55E
SYLT_TMA — 55 10 00N 008 03 45E
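
Several entries in the printed output share one coordinate (for example SECTOR_A through SECTOR_D), and the names are airspace designators rather than waypoint identifiers like those in the schema comment. A quick sanity check that groups names by coordinate can surface this kind of result. The sketch below uses a few rows copied from the output above; `find_shared_coordinates` is a hypothetical helper, not part of the original notebook.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A few (name, coordinate) rows copied from the printed output above.
rows = [
    ("SECTOR_A", "5501N 01201E"),
    ("SECTOR_B", "5501N 01201E"),
    ("SECTOR_E", "5534N 01152E"),
    ("BILLUND_CTA", "56 03 16.8N 009 29 55.4E"),
    ("BILLUND_TMA", "56 03 16.8N 009 29 55.4E"),
]


def find_shared_coordinates(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each coordinate used by more than one name to those names."""
    by_coord: Dict[str, List[str]] = defaultdict(list)
    for name, coord in pairs:
        by_coord[coord].append(name)
    # Keep only coordinates that occur under two or more names.
    return {c: names for c, names in by_coord.items() if len(names) > 1}


shared = find_shared_coordinates(rows)
for coord, names in shared.items():
    print(f"{coord}: {', '.join(names)}")
```

With the real result, the same function could be fed `(wp.name, c)` pairs for every `c` in `wp.coordinate`.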
