For Nadiia: discover and query Derzhstat SDMX data with a CLI and a local LLM #21
Yes! It works perfectly! I am so happy to finally get answers to my questions instead of 500 and 404 errors))) I found very specific data on employment at the regional level. It took a lot of time to work out where to look up the region identifiers, but the tool delivered the data successfully.
However, I also ran into a few limitations along the way:
- Outdated data: many datasets have not been updated in the API, even though the newer figures are available on the official website.
- Metadata only: some datasets are not fully accessible, only their descriptions.
- Processing issues: some data is not described clearly enough for the agent to process it easily.
- API restrictions: there are quite a lot of restrictions on the Derzhstat API side.
I asked a model to analyze the problems with Derzhstat, and it said the following: "1. Empty responses: Datasets such as DF_POPULATION_BIRTH (birth rate) and DF_ASSETS_EQUITY_LIABILITIES_FINANCIAL_RESULTS_B_A (financial results) return only metadata (~2010 bytes) without any observations. Conclusion: The API is only suitable for basic statistics (unemployment, salaries). The rest of the datasets are either empty or contain fragmented data after 2022."
I will check later whether the official StatGPT has better access to data than what the public API provides. Thank you again for your help and for this tool!


Hi Nadiia,
I follow your work with great interest, and your recent LinkedIn post about building a local MCP server for Derzhstat really hit home for me. You described exactly the problems that motivated us to build opensdmx.
You hit every wall:
DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE) is invisible unless you reverse-engineer StatGPT's steps
We're a small Italian open data association (onData) and we've been building an open source Python CLI, opensdmx, designed to lower exactly these barriers. I'd love for you to try it and tell us what you think.
Install the CLI
(or pip install opensdmx if you prefer pip)
Try it on Derzhstat
Derzhstat is a built-in provider, so no endpoint configuration is needed. Let's start with your inflation question.
Step 1 — Discover the dataset
opensdmx search "price" --provider derzhstat
Output:
DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE, the dataset you had to reverse-engineer from StatGPT, appears right there, no insider knowledge required.
Step 2 — Inspect the dataflow structure
Output:
Now you know the five dimensions and their order. The next step is to see which values are actually available for each one.
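The order of the dimensions matters because, in the standard SDMX REST convention, a data query key lists one value per dimension in that order, separated by dots, with an empty slot meaning "any value" and `+` joining alternatives. Here is a minimal Python sketch of that convention. Only GOODS_SERVICES_TYPE and its code 0 come from this thread; the other dimension names, their order, and the FREQ value are hypothetical placeholders, and `build_sdmx_key` is an illustrative helper, not part of opensdmx. Some providers do not accept this standard form, so treat it as the generic convention rather than what Derzhstat expects.

```python
from typing import Mapping, Sequence, Union

# A filter value is either a single code or several alternative codes.
Value = Union[str, Sequence[str]]


def build_sdmx_key(dimension_order: Sequence[str], filters: Mapping[str, Value]) -> str:
    """Build a dot-separated SDMX key; unfiltered dimensions stay empty (wildcard)."""
    unknown = set(filters) - set(dimension_order)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    parts = []
    for dim in dimension_order:
        value = filters.get(dim, "")
        if isinstance(value, str):
            parts.append(value)
        else:
            # Several codes for one dimension are joined with '+'.
            parts.append("+".join(value))
    return ".".join(parts)


# Hypothetical dimension order: only GOODS_SERVICES_TYPE is taken from the thread.
DIMENSIONS = ["FREQ", "REF_AREA", "GOODS_SERVICES_TYPE", "MEASURE", "UNIT"]

key = build_sdmx_key(DIMENSIONS, {"FREQ": "M", "GOODS_SERVICES_TYPE": "0"})
print(key)  # M..0..
```

Getting the order wrong produces a key that is syntactically valid but filters the wrong dimension, which is one way to end up with an empty response and no explanation.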
Step 3 — Explore available filters
Output:
If you want to see the full list of codes and labels for a specific dimension (for example, to understand what GOODS_SERVICES_TYPE code 0 actually means), add the dimension name to the command:
Output (first rows):
So 0 is the overall CPI, which is what we want. The same pattern works for any dimension.
Step 4 — Get the data
Result (filtered from CSV):
148.7% year-over-year — the exact number StatGPT gave you, retrieved from the same source, with no authentication, no BankID, no account.
A note on wages
You mentioned wages too. I looked — and this is where opensdmx gives you an honest answer rather than a hallucination.
First, find the relevant datasets:
opensdmx search "salary" --provider derzhstat
Then check what's actually inside DF_SALARY_LEVEL_OF_EMPLOYEES:
The constraints don't directly list available years. To check the actual time coverage, you can download a small sample:
The dataset metadata says "annual (once every four years)", and the data confirms it: the SDMX endpoint currently has 2016 and 2020 only — not 2015.
(DF_SALARY_LEVEL_OF_EMPLOYEES is a sample survey on salary levels by sex, age group, education, and economic activity: useful for structural breakdowns, but not a monthly/annual time series.)
The same story applies to DF_EXPENSES_OF_ENTERPRISES_FOR_WORKFORCE_MAINTENANCE (average monthly labour costs per employee, by sector): only 2018 is available.
So the 404s and empty responses you got for wages in 2015 weren't a tooling problem: the data simply isn't published in SDMX for that year. That's useful to know. A clear "not available" is far better than a silent failure or a hallucination.
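Checking time coverage from a downloaded sample can be scripted in a few lines. The sketch below assumes the sample was saved as CSV with a TIME_PERIOD column (the usual SDMX-CSV name, but an assumption here); `time_coverage` is an illustrative helper, and the rows in `sample` are placeholders whose observation values are made up.

```python
import csv
import io


def time_coverage(csv_text: str, time_column: str = "TIME_PERIOD") -> list[str]:
    """Return the sorted distinct periods found in an SDMX-style CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or time_column not in reader.fieldnames:
        raise KeyError(f"column {time_column!r} not found in CSV header")
    # Skip rows where the period is missing or empty.
    return sorted({row[time_column] for row in reader if row[time_column]})


# Placeholder rows: the OBS_VALUE numbers are invented for illustration only.
sample = "TIME_PERIOD,OBS_VALUE\n2016,1.0\n2020,2.0\n2016,3.0\n"

periods = time_coverage(sample)
print(periods)
print("2015 available:", "2015" in periods)
```

An export that contains only a header row yields an empty list, which makes the "metadata but no observations" case explicit instead of silent.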
The real superpower: skill + CLI together
The CLI alone is already useful, but what you described — wanting a local model to navigate statistical data naturally — is where things get interesting.
opensdmx ships an sdmx-explorer skill — a set of instructions that teaches any AI agent how to use the CLI interactively: discover dataflows, explore constraints, retrieve and interpret data, step by step.
Install it in one command:
(Or copy the skills/sdmx-explorer/ folder from the repo directly into your project if you prefer.)
Then tell your local LLM:
I ran this exact prompt on my machine. The agent:
- opensdmx providers: found derzhstat built-in
- opensdmx search "price" --provider derzhstat: found DF_PRICE_CHANGE_CONSUMER_GOODS_SERVICE immediately
No reverse-engineering. No guessing dataflow IDs. No 404s without explanation.
This is the design philosophy behind opensdmx: a CLI built to be orchestrated by AI. Not an MCP (which requires a running server and a capable machine), but a lightweight tool any local model can call, read, and reason about.
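To make the "tool a model can call" idea concrete, here is a rough Python sketch of how an agent harness might wrap the CLI as a subprocess. It uses only the search invocation shown earlier in this thread; `search_command` and `run_cli` are hypothetical helper names, and nothing here reflects how the sdmx-explorer skill is actually implemented.

```python
import shutil
import subprocess


def search_command(query: str, provider: str) -> list[str]:
    """Build the argv for the search invocation shown in this thread."""
    return ["opensdmx", "search", query, "--provider", provider]


def run_cli(argv: list[str], timeout: float = 60.0) -> str:
    """Run a CLI command and return its stdout, failing loudly on errors."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} is not on PATH")
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Surface stderr so the calling model can read and react to the failure.
        raise RuntimeError(result.stderr.strip() or f"exit code {result.returncode}")
    return result.stdout


# Building the command has no side effects; running it requires opensdmx installed:
#   output = run_cli(search_command("price", "derzhstat"))
print(search_command("price", "derzhstat"))
```

Passing the arguments as a list rather than a shell string means a query supplied by the model is never interpreted by a shell, and returning plain text keeps the result easy for a small local model to read.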
This release is yours
Reading your post pushed us to fix two real problems in the CLI: Derzhstat blocks non-browser clients (fixed with a configurable User-Agent), and some providers reject the standard SDMX key-filter format (fixed with data_key_format). We added Derzhstat as a built-in preset.
Release v0.6.0 is dedicated to you. Thank you for writing so honestly about what doesn't work.
Would love to hear what you think — especially if you try it with your Qwen 2.5 7B setup. Does the skill + CLI approach work on your hardware? Are there other Derzhstat datasets you hit walls on?
— Andrea, onData