For Nadiia: discover and query Derzhstat SDMX data with a CLI and a local LLM #21

aborruso · 2026-05-01T18:16:51Z

aborruso
May 1, 2026
Maintainer

Hi Nadiia,

I follow your work with great interest, and your recent LinkedIn post about building a local MCP server for Derzhstat really hit home for me. You described exactly the problems that motivated us to build opensdmx.

You hit every wall:

data is there, but hard to discover without insider knowledge
the exact dataflow ID (DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE) is invisible unless you reverse-engineer StatGPT's steps
404s and empty responses, with no explanation of why
and your conclusion: "you have to be a coder, a statistician, have a powerful machine to run it"

We're a small Italian open data association (onData) and we've been building an open source Python CLI — opensdmx — designed to lower exactly these barriers. I'd love for you to try it and tell us what you think.

Install the CLI

uv tool install opensdmx

(or pip install opensdmx if you prefer pip)

Try it on Derzhstat

Derzhstat is a built-in provider, so no endpoint configuration is needed. Let's start with your inflation question.

Step 1 — Discover the dataset

opensdmx search "price" --provider derzhstat

Output:

Search: price (8)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ df_id                                              ┃ df_description                                 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ DF_PRICE_CHANGE_HOUSING_MARKET                     │ Changes in housing market prices               │
│ DF_PRICE_CHANGES_IMPORT                            │ Changes in import prices                       │
│ DF_PRICE_CHANGE_MANUFACTURER_INDUSTRIAL_PRODUCT    │ Price changes of manufacturers of industrial…  │
│ DF_PRICE_CHANGES_OF_SERVICE_PRODUCERS              │ Price changes of service producers             │
│ DF_AGRICULTURAL_PRODUCTS_AT_CONSTANT_PRICES        │ Agricultural products at constant prices       │
│ DF_CONSUMER_PRICES_FOR_NATURAL_GAS_AND_ELECTRICITY │ Prices for natural gas and electricity…        │
│ DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE             │ Changes in prices (tariffs) for consumer…      │
│ DF_PRICE_CHANGE_CONSTRUCTION                       │ Changes in construction prices                 │
└────────────────────────────────────────────────────┴────────────────────────────────────────────────┘

Note: on first run, the CLI downloads and caches the full Derzhstat dataflow catalog (~112 datasets). This takes a moment, but subsequent searches are instant — the catalog is cached locally and reused.

DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE — the dataset you had to reverse-engineer from StatGPT — appears right there, no insider knowledge required.

Step 2 — Inspect the dataflow structure

opensdmx info DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE --provider derzhstat

Output:

╭───────────────────────────── Dataset Info ─────────────────────────────╮
│ ID:          DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE                    │
│ Description: Changes in prices (tariffs) for consumer goods (services) │
╰────────────────────────────────────────────────────────────────────────╯
                        Dimensions
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ dimension_id        ┃ position ┃ description                          ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ INDICATOR           │ 0        │ Price changes indicators codelist    │
│ BASE_PERIOD         │ 1        │ Base period                          │
│ REGION              │ 2        │ KATOTTG                              │
│ GOODS_SERVICES_TYPE │ 3        │ Goods and services type              │
│ FREQ                │ 4        │ Frequency                            │
└─────────────────────┴──────────┴──────────────────────────────────────┘

Now you know the five dimensions and their order. The next step is to see which values are actually available for each one.

Step 3 — Explore available filters

opensdmx constraints DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE --provider derzhstat

Output:

              Constraints: DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ dimension_id        ┃ n_values ┃ sample                                      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ BASE_PERIOD         │       11 │ BASE_YEAR_2001, BASE_YEAR_2007, PREV_YEAR…  │
│ FREQ                │        2 │ A, M                                        │
│ GOODS_SERVICES_TYPE │      117 │ 0 (total CPI), 01 (food & beverages)…       │
│ INDICATOR           │        3 │ AVG_CONS_PRCS, CORE_INFL, INDEX_CONSUMPRICE │
│ REGION              │       28 │ UA00000000000000000 (national), regions…    │
└─────────────────────┴──────────┴─────────────────────────────────────────────┘

If you want to see the full list of codes and labels for a specific dimension — for example to understand what GOODS_SERVICES_TYPE code 0 actually means — add the dimension name to the command:

opensdmx constraints DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE GOODS_SERVICES_TYPE --provider derzhstat

Output (first rows):

   DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE / GOODS_SERVICES_TYPE (constrained)
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ id           ┃ name                                                          ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0            │ Consumer price indices  ← the overall CPI total              │
│ 01           │ Food and non-alcoholic beverages                              │
│ 01_1         │ Bread and cereals                                             │
│ ...          │ (117 codes total, following the COICOP classification)        │
└──────────────┴───────────────────────────────────────────────────────────────┘

So 0 is the overall CPI — which is what we want. Same pattern works for any dimension.

Step 4 — Get the data

opensdmx get DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE \
  --provider derzhstat \
  --INDICATOR INDEX_CONSUMPRICE \
  --BASE_PERIOD PREV_YEAR \
  --REGION UA00000000000000000 \
  --GOODS_SERVICES_TYPE 0 \
  --FREQ A \
  --start-period 2010 --end-period 2016 \
  --out ua_cpi.csv

Result (filtered from CSV):

TIME_PERIOD  │  OBS_VALUE
─────────────┼───────────
2010         │  109.4 %
2011         │  108.0 %
2012         │  100.6 %
2013         │   99.7 %
2014         │  112.1 %
2015         │  148.7 %   ← the value you needed
2016         │  113.9 %

148.7% year-over-year — the exact number StatGPT gave you, retrieved from the same source, with no authentication, no BankID, no account.

A note on wages

You mentioned wages too. I looked — and this is where opensdmx gives you an honest answer rather than a hallucination.

First, find the relevant datasets:

opensdmx search "salary" --provider derzhstat

Search: salary (2)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ df_id                        ┃ df_description            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ DF_SALARY_PAYMENT_STATUS     │ Salary payment status     │
│ DF_SALARY_LEVEL_OF_EMPLOYEES │ Salary level of employees │
└──────────────────────────────┴───────────────────────────┘

Then check what's actually inside DF_SALARY_LEVEL_OF_EMPLOYEES:

opensdmx constraints DF_SALARY_LEVEL_OF_EMPLOYEES --provider derzhstat

              Constraints: DF_SALARY_LEVEL_OF_EMPLOYEES
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ dimension_id       ┃ n_values ┃ sample                                       ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ BREAKDOWN          │      229 │ A, B, B-E                                    │
│ BREAKDOWN_CATEGORY │        7 │ AGE_GROUP, ECONOMIC_ACTIVITY_TYPE, …         │
│ FREQ               │        1 │ A                                            │
│ INDICATOR          │        1 │ ACC_SAL_PAYM                                 │
│ PERIOD_OF_TIME     │        2 │ HOUR, MONTH                                  │
│ REGION             │       28 │ UA00000000000000000, …                       │
│ SEX                │        3 │ FEMALE, MALE, _T                             │
└────────────────────┴──────────┴──────────────────────────────────────────────┘

The constraints don't directly list available years. To check the actual time coverage, you can download a small sample:

opensdmx get DF_SALARY_LEVEL_OF_EMPLOYEES \
  --provider derzhstat \
  --INDICATOR ACC_SAL_PAYM --PERIOD_OF_TIME MONTH \
  --SEX _T --REGION UA00000000000000000 --FREQ A \
  --out ua_wages.csv

The dataset metadata says "annual (once every four years)", and the data confirms it: the SDMX endpoint currently has 2016 and 2020 only — not 2015.

(DF_SALARY_LEVEL_OF_EMPLOYEES is a sample survey on salary levels by sex, age group, education, and economic activity — useful for structural breakdowns, but not a monthly/annual time series.)

The same story applies to DF_EXPENSES_OF_ENTERPRISES_FOR_WORKFORCE_MAINTENANCE (average monthly labour costs per employee, by sector): only 2018 is available.

So the 404s and empty responses you got for wages in 2015 weren't a tooling problem — the data simply isn't published in SDMX for that year. That's useful to know. A clear "not available" is far better than a silent failure or a hallucination.

The real superpower: skill + CLI together

The CLI alone is already useful, but what you described — wanting a local model to navigate statistical data naturally — is where things get interesting.

opensdmx ships an sdmx-explorer skill — a set of instructions that teaches any AI agent how to use the CLI interactively: discover dataflows, explore constraints, retrieve and interpret data, step by step.

Install it in one command:

npx skills add ondata/opensdmx --skill sdmx-explorer

(Or copy the skills/sdmx-explorer/ folder from the repo directly into your project if you prefer.)

Then tell your local LLM:

"use sdmx-explorer skill: I want to find data on inflation (consumer prices) and average wages in Ukraine for 2015, using the Derzhstat SDMX endpoint"

I ran this exact prompt on my machine. The agent:

Checked opensdmx providers — found derzhstat built-in
Ran opensdmx search "price" --provider derzhstat — found DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE immediately
Explored constraints — understood the available indicators, base periods, and regions
Retrieved 148.7% for 2015 — correct, no hallucination
For wages: searched all salary-related datasets, found them, and clearly reported that 2015 data is absent from the SDMX endpoint

No reverse-engineering. No guessing dataflow IDs. No 404s without explanation.

This is the design philosophy behind opensdmx: a CLI built to be orchestrated by AI. Not an MCP (which requires a running server and a capable machine), but a lightweight tool any local model can call, read, and reason about.

This release is yours

Reading your post pushed us to fix two real problems in the CLI — Derzhstat blocks non-browser clients (fixed with a configurable User-Agent), and some providers reject the standard SDMX key-filter format (fixed with data_key_format). We added Derzhstat as a built-in preset.

Release v0.6.0 is dedicated to you. Thank you for writing so honestly about what doesn't work.

Would love to hear what you think — especially if you try it with your Qwen 2.5 7B setup. Does the skill + CLI approach work on your hardware? Are there other Derzhstat datasets you hit walls on?

— Andrea, onData

virnadiia · 2026-05-02T09:52:22Z

virnadiia
May 2, 2026

Dear Andrea! Thank you so much for your post! It helped me to not give up in my attempts to build connection with our Derzhstat)

So, I used all recommendations. I changed it here: uv tool run opensdmx search "price" --provider derzhstat (added uv tool run). Everything works perfect in cmd.

The next stel - is skills integration. I should have installed git on my windows firstly in order to run this npx skills add ondata/opensdmx --skill sdmx-explorer. I used this
winget install --id Git.Git -e --source winget

After this I desabled my mcp server on qwen 2.5 7B model and run this prompt: "use sdmx-explorer skill: I want to find data on inflation (consumer prices) and average wages in Ukraine for 2015, using the Derzhstat SDMX endpoint". The result:

I would appriciate if you can help me with that!

3 replies

aborruso May 3, 2026
Maintainer Author

Hi Nadiia,

great progress — the CLI working perfectly on Windows is already the important part.

Before going further, one thing I'd like to share as a guiding principle: choose tools that serve your ideas, not the other way around. It's easy to fall into the trap of adapting your workflow to fit a specific tool. But the goal here is to query Derzhstat data with a local LLM — the client you use to do that should be a detail, not a constraint.

I'll be honest: I don't use Windows for AI projects, and I'm not familiar with LM Studio to give you reliable advice about it. So I won't pretend I know how to make it work well for this use case.

What I see in your screenshot is the core issue.

The model is trying to call get_statistical_data — that's a tool from your old MCP server, which you disabled. The model is falling back on something it knows from its context, rather than following the skill.

But here's the thing: you don't need any MCP server at all for this workflow. The skill + CLI approach is intentionally minimal. You just need a client that can do two things:

Read a skill — load the SKILL.md content into the model's context
Execute CLI commands — run opensdmx search ..., opensdmx get ..., etc.

No server. No custom tools. Just those two capabilities.

The mismatch with LM Studio

npx skills add is part of an open Agent Skills ecosystem by Vercel Labs that supports 44+ agents. When you ran it, it installed the skill into the config directories of any supported agents detected on your machine. LM Studio is not one of the 44 supported agents — so the skill was never actually loaded for it.

If you want to stay in LM Studio

It looks like LM Studio supports shell execution via the mbagley/local-shell-access plugin — a community plugin that gives the model a shell_exec tool, including on Windows via PowerShell/cmd. If that works, you'd still need to paste the SKILL.md content manually as a system prompt (LM Studio isn't part of the npx skills ecosystem). I can't confirm this path myself since I don't use LM Studio, but it may be worth trying.

A minimal path forward (my recommendation)

opencode is one of the supported agents, it has a native bash tool (so it can run opensdmx directly), and it works on Windows. We strongly recommend using it as a CLI tool in your terminal — not the desktop app. That's where the skill + CLI orchestration works as designed:

npm i -g opencode-ai

or download from opencode.ai/download. Either way, use it as a CLI tool in your terminal — that's where the skill + bash orchestration works as designed, not in the desktop app.

The key point: you can keep Qwen 2.5 7B running in LM Studio as the inference backend. opencode connects to it via:

LOCAL_ENDPOINT=http://localhost:1234/v1

So nothing changes in how you run your model — you just replace LM Studio's chat interface with opencode as the agent layer. Then npx skills add ondata/opensdmx --skill sdmx-explorer will work exactly as intended, and the model will orchestrate the CLI step by step.

Would love to hear if this works on your setup.

— Andrea

virnadiia May 3, 2026

Thank you for these details! Yes, you are totally right - it works perfectly via powershell or cmd directly (but not in Ukrainian). My goal was to create kinda StatGPT that works without identification on a local model (I use Qwen) in plain language with 0 knowledge of coding, just using a chat and asking questions in Ukrainian as I did it with this official StatGPT. I also want to understand how do Derzhstat data communicate with AI, is it discoverable, interpretable, up to date.

I will use your recommendations to make my model work and write you back! Thank you once again for opensdmx and your guidance here!

aborruso May 3, 2026
Maintainer Author

Hi @virnadiia,

reading your last message made me smile — because what you just described as your goal is exactly what you already have, once the right agent is connected to a model.

"a StatGPT that works without identification on a local model, in plain language, with 0 knowledge of coding"

That's the skill + CLI in action. Once the setup is done, you open a terminal, type something like:

«sdmx-explorer Хочу дані про інфляцію в Україні у 2015 році»

...and the agent does everything: discovers the dataset, explores the constraints, downloads the data, interprets the result. No dataflow IDs. No SDMX syntax. No authentication. No coding.

You don't interact with the CLI at all — the model does. You just ask questions in natural language.

The only requirement is an agent that can do two things: read a skill (load the SKILL.md into context) and run CLI commands. opencode is one option, but any tool with those two capabilities works — Claude Code, Cursor, or others. The inference backend stays Qwen in LM Studio, nothing changes there.

The setup is a one-time thing. After that, it's exactly the experience you're looking for.

— Andrea

virnadiia · 2026-05-03T18:51:16Z

virnadiia
May 3, 2026

Yes! It works perfectly! I am so happy to finally receive answers on my questions and not 500, 404 errors))) I have found very specific data on employment at the regional level. It took a lot of time to understand where to search region identifiers, but it successfully delivered the data.

However, I also came across a few limitations along the way, such as outdated data (many datasets have not been updated but it is available on the official website!), metadata only (some data are not fully accessible, only their descriptions), processing issues (some data is not described clearly enough for the agent to process it easily), API restrictions (there are also quite a lot of restrictions on the Derzhstat API side). I have asked a model to analyze problems with Derzhstat and it said following:

"1. Empty responses: Datasets such as DF_POPULATION_BIRTH (birth rate) and DF_ASSETS_EQUITY_LIABILITIES_FINANCIAL_RESULTS_B_A (financial results) return only metadata (~2010 bytes) without any observations.
2. Time-series limitations: Complete data is available only up to 2021; from 2022 onwards, the data is fragmented due to the war.
3. Rigid key structure: The system returns a 400: more than expected N dimension(s) error unless the query contains exactly 6 or 12 dimensions (depending on the dataset).
4. 403 Forbidden: The /constraints endpoint appears to be blocked for open access.
5. Encoding issues: In Windows PowerShell, opensdmx crashes with a UnicodeEncodeError unless the output is redirected to a file.

Conclusion: The API is only suitable for basic statistics (unemployment, salaries). The rest of the datasets are either empty or contain fragmented data after 2022."

I will check later to see if the official StatGPT has better access to data than what is provided via the public API.

Thank you again for your help and for this tool!

2 replies

aborruso May 3, 2026
Maintainer Author

Hi @virnadiia ,
this is wonderful — regional employment data, found and delivered. That's exactly the kind of discovery the tool is built for.

And thank you for taking the time to document the limitations so carefully. Your testing uncovered real bugs in opensdmx that we've now fixed. We've just released v0.6.3 — please upgrade before continuing:

uv tool install --upgrade opensdmx

Let me go through each of the five points the model raised, because the picture is more nuanced than its conclusion suggests.

1. Empty responses

There are actually two separate issues here:

DF_POPULATION_BIRTH — the data exists (460,000+ rows, 2010–2021). The problem was two bugs: first, Derzhstat's server runs out of memory when you query without a date range (it returns an error page instead of data); second, opensdmx crashed on the first decimal observation value (0.1, 0.2…) because Polars was inferring the column as integer. Both fixed in v0.6.3. Use --start-period / --end-period to stay within the server's memory limit.
DF_ASSETS_EQUITY_LIABILITIES_FINANCIAL_RESULTS_B_A — returns HTTP 500 on every query, with or without filters. This is a server-side problem on Derzhstat's end; nothing we can fix.

2. Time-series limitations

The model's characterisation is partially correct but too broad. It depends on the dataset type:

Economic datasets (prices, wages, employment) — data is available through 2024, though post-2022 records carry a note: "data exclude territories temporarily occupied by the Russian Federation and territories where military actions are/were conducted."
Demographic datasets (births, population) — post-2022 data is genuinely absent; 310 rows are marked "Information is not available".

So the gap seems real for demographics, but economic statistics are well covered.

3. Rigid key structure (400 error)

This is not an opensdmx issue. Our CLI uses a format that avoids the key-filter path entirely, so that 400 error never appears. What does happen on Derzhstat is that wildcard dot notation (the standard SDMX way of saying "all values for this dimension") returns a 404 — which is why we apply dimension filters client-side after download. The 400 the model saw came from direct API calls, not from the CLI.

For example, this query works without any key structure issues:

opensdmx get DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE --provider derzhstat --INDICATOR INDEX_CONSUMPRICE --REGION UA00000000000000000 --GOODS_SERVICES_TYPE 0 --start-period 2020 --end-period 2024

The CLI filters dimensions client-side — you can pass as many or as few dimension filters as you want, and you never need to worry about the number of dimensions in the URL.

4. 403 on /constraints

We tested the /constraints endpoint on DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE and DF_POPULATION_BIRTH and got HTTP 200, not 403. The most likely cause of the 403 you saw in earlier tests is a missing User-Agent header — Derzhstat blocks requests that don't look like browser traffic. opensdmx sets User-Agent: Mozilla/5.0 automatically, which is why it works. Direct curl or Python requests calls without that header will be rejected.

You can verify it yourself:

opensdmx constraints DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE --provider derzhstat

              Constraints: DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ dimension_id        ┃ n_values ┃ sample                                      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ BASE_PERIOD         │       11 │ BASE_YEAR_2001, BASE_YEAR_2007,             │
│                     │          │ BASE_YEAR_2010                              │
│ FREQ                │        2 │ A, M                                        │
│ GOODS_SERVICES_TYPE │      117 │ 0, 01, 01_1                                 │
│ INDICATOR           │        3 │ AVG_CONS_PRCS, CORE_INFL, INDEX_CONSUMPRICE │
│ REGION              │       28 │ UA00000000000000000, UA01000000000013043,   │
│                     │          │ UA05000000000010236                         │
└─────────────────────┴──────────┴─────────────────────────────────────────────┘

5. UnicodeEncodeError on Windows PowerShell

Fixed in v0.6.3. The root cause was that PowerShell defaults to cp1252 encoding, which can't represent Cyrillic characters or box-drawing symbols. opensdmx now forces UTF-8 on Windows before any output is written.

The model's conclusion — "only suitable for basic statistics" — is perhaps too pessimistic. Economic and labour data (unemployment, wages, prices, employment by region) seems decent to me — though I'd need to browse the database more broadly before making a real judgement. The real limitation is demographic data post-2022, which reflects the situation on the ground, not an API problem.

After upgrading to v0.6.3, try DF_POPULATION_BIRTH again with a date range:

opensdmx get DF_POPULATION_BIRTH --provider derzhstat --start-period 2015 --end-period 2020 --yes

The --yes flag is needed because this dataset has ~37,000 series — the CLI detects this automatically and stops to warn you before downloading, unless you explicitly confirm with --yes.

Looking forward to hearing what you find next.

— Andrea

aborruso May 4, 2026
Maintainer Author

I have found very specific data on employment at the regional level. It took a lot of time to understand where to search region identifiers, but it successfully delivered the data.

Hi @virnadiia,
there is actually a direct way — no reverse-engineering needed. The constraints command, when given a specific dimension name, returns the full list of codes with their human-readable labels:

opensdmx constraints DF_LABOR_FORCE_A REGION --provider derzhstat

Output:

      DF_LABOR_FORCE_A / REGION (constrained)
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ id                  ┃ name                      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ UA00000000000000000 │ Ukraine                   │
│ UA01000000000013043 │ Avtonomna Respublika Krym │
│ UA05000000000010236 │ Vinnytska                 │
│ UA07000000000024379 │ Volynska                  │
│ UA12000000000090473 │ Dnipropetrovska           │
│ UA14000000000091971 │ Donetska                  │
│ UA18000000000041385 │ Zhytomyrska               │
│ UA21000000000011690 │ Zakarpatska               │
│ UA23000000000064947 │ Zaporizka                 │
│ UA26000000000069363 │ Ivano-Frankivska          │
│ UA32000000000030281 │ Kyivska                   │
│ UA35000000000016081 │ Kirovohradska             │
│ UA44000000000018893 │ Luhanska                  │
│ UA46000000000026241 │ Lvivska                   │
│ UA48000000000039575 │ Mykolaivska               │
│ UA51000000000030770 │ Odeska                    │
│ UA53000000000028050 │ Poltavska                 │
│ UA56000000000066151 │ Rivnenska                 │
│ UA59000000000057109 │ Sumska                    │
│ UA61000000000060328 │ Ternopilska               │
│ UA63000000000041885 │ Kharkivska                │
│ UA65000000000030969 │ Khersonska                │
│ UA68000000000099709 │ Khmelnytska               │
│ UA71000000000010357 │ Cherkaska                 │
│ UA73000000000044923 │ Chernivetska              │
│ UA74000000000025378 │ Chernihivska              │
│ UA80000000000093317 │ Kyiv                      │
│ UA85000000000065278 │ Sevastopol                │
└─────────────────────┴───────────────────────────┘

All 28 KATOTTG codes — national level + 27 regions/cities — each with its name, in one command. The same pattern works for any dataset that has a REGION dimension: just replace the dataflow ID.

If you need the list as JSON (e.g. to pass it to a model or save it for later):

opensdmx constraints DF_LABOR_FORCE_A REGION --provider derzhstat --output json

This is the same mechanism that lets the sdmx-explorer skill translate codes into readable labels before plotting or summarising — it queries constraints <dataset> <dimension> under the hood.

We'll add an explicit note about this in the documentation, because it's not obvious that constraints accepts a dimension name as a second argument. Thank you for surfacing it.

virnadiia · 2026-05-04T17:47:10Z

virnadiia
May 4, 2026

Thank you! I have tried this agent to analyze stat data that is accessible on the open data portal - just provided the link amd wrote a prompt (about business financial statements) and it analyzed (it was not an easy task!) the dataset and provided me top 3 the most successfull companies and collected and saved everything in a file on my C driver! This is magic) I am testing now new prompts using cmd. I have some troubles with launching ondata agent but after the second attempt it works)

I plan to write a letter to the Statistic Service of Ukraine and the Ministry of Digital Transformation regarding access to statdata and how to make it better. If you have suggestions what to add (I want to write about filtering, datasets catalog with search, improved documentation with query examples, about outdated data via API (vs updated data on the website), need of the sections for AI/LM) I would appriciate that very much!

2 replies

aborruso May 5, 2026
Maintainer Author

Hi @virnadiia,

when you talk about the business financial statements, you are not using opensdmx anymore — you are showing how convenient opencode is when combined with a good AI model. Did I get that right?
It is a great tool!

Regarding the ondata agent you had trouble launching — which one are you referring to?

virnadiia May 6, 2026

Hi Andrea, I have used the same terminal and just asked: can you analyze finstatements? and gave it a link on the dataset.

Regarding the ondata agent launch - there was a problem with choosing Qwen as my model. It started to freeze and stuck on C/ path configuration or something like that. I restarted and it worked fine!

In the meantime I had a talk to the State Statistic Service of Ukraine and they are happy to receive recommendations (hope, to implement it too).

Best,
Nadiia

For Nadiia: discover and query Derzhstat SDMX data with a CLI and a local LLM #21

Uh oh!

Uh oh!

aborruso May 1, 2026 Maintainer

Install the CLI

Try it on Derzhstat

Step 1 — Discover the dataset

Step 2 — Inspect the dataflow structure

Step 3 — Explore available filters

Step 4 — Get the data

A note on wages

The real superpower: skill + CLI together

This release is yours

Replies: 3 comments · 7 replies

Uh oh!

virnadiia May 2, 2026

Uh oh!

Uh oh!

aborruso May 3, 2026 Maintainer Author

Uh oh!

virnadiia May 3, 2026

Uh oh!

Uh oh!

aborruso May 3, 2026 Maintainer Author

Uh oh!

virnadiia May 3, 2026

Uh oh!

aborruso May 3, 2026 Maintainer Author

Uh oh!

aborruso May 4, 2026 Maintainer Author

Uh oh!

virnadiia May 4, 2026

Uh oh!

aborruso May 5, 2026 Maintainer Author

Uh oh!

virnadiia May 6, 2026

aborruso
May 1, 2026
Maintainer

Replies: 3 comments 7 replies

virnadiia
May 2, 2026

aborruso May 3, 2026
Maintainer Author

aborruso May 3, 2026
Maintainer Author

virnadiia
May 3, 2026

aborruso May 3, 2026
Maintainer Author

aborruso May 4, 2026
Maintainer Author

virnadiia
May 4, 2026

aborruso May 5, 2026
Maintainer Author