German Government Domains
An incomplete listing of German government domains, along with the code for the scraper used to build the list.
If you only want a subset of the available data, variants filtered by
Domain Type are provided:
data/domains.federal.csv contains only government agencies (
data/domains.cities.csv contains only cities (
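If you would rather filter the full list yourself, the following is a minimal sketch using Ruby's standard `csv` library. The header names (`Domain Name`, `Domain Type`, `Agency`) and the type values in the sample are assumptions for illustration; check the header row of the actual CSV before relying on them.

```ruby
require "csv"

# Hypothetical sample data; the real header names and values may differ.
SAMPLE = <<~CSV
  Domain Name,Domain Type,Agency
  example-ministry.de,Federal Agency,Example Ministry
  example-city.de,City,Example City
CSV

# Return the rows whose type column equals the given type.
# The column name is an assumption and can be overridden.
def filter_by_type(csv_text, type, column: "Domain Type")
  CSV.parse(csv_text, headers: true).select { |row| row[column] == type }
end

cities = filter_by_type(SAMPLE, "City")
puts cities.map { |row| row["Domain Name"] }
```

To run this against a real file, pass `File.read(path)` as the first argument instead of the sample string.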
There is currently no publicly available directory of all the domain names registered by the German government and its agencies. Such a directory would be useful for people looking to get an aggregate view of government websites and how they are hosted. For example, Ben Balter has been doing some great work analyzing the official set of US
This is by no means an official or a complete list. It is intended to be a first step toward a better understanding of how the government is managing its official sites.
What can I do with it?
- Plug the CSV into 18F/domain-scan to get more data (like HTTPS support) about the domains
- Check the IPv6 reachability
- Test if the sites are reachable even without the
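As a starting point for the IPv6 idea, here is a rough sketch that checks whether each domain publishes an AAAA record, using `Resolv::DNS` from Ruby's standard library. An AAAA record is only a precondition for IPv6 reachability, not proof that the site actually answers over IPv6. The helper names `aaaa_record?` and `ipv6_report` are made up for this example and are not part of this repository.

```ruby
require "resolv"

# True if the resolver returns at least one AAAA (IPv6) record for the domain.
# The resolver is injectable so the check can be exercised without network access.
def aaaa_record?(domain, resolver: Resolv::DNS.new)
  !resolver.getresources(domain, Resolv::DNS::Resource::IN::AAAA).empty?
end

# Map each domain to whether it publishes an AAAA record.
def ipv6_report(domains, resolver: Resolv::DNS.new)
  domains.to_h { |domain| [domain, aaaa_record?(domain, resolver: resolver)] }
end

# Example call (performs real DNS queries, so it is left commented out):
# p ipv6_report(["example.org"])
```

Sharing one resolver across all lookups, as `ipv6_report` does, avoids re-reading the system DNS configuration for every domain.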
How to update
The list is populated by scrapers and static files, which are merged by a Makefile. To run the process yourself, check out this repository and run:
bundle install
make
Once everything has run, you can look into
Scrapers and Sources
bundde-behoerden-scraper.rb: crawls an official government agency list.
wikidata-cities.rb: uses a SPARQL query to get cities with their domains from Wikidata.
data/source/bmf.csv: list from BMF, manually extracted from their digital services page on bundesfinanzministerium.de
data/source/ifg-bka.csv: list from BKA, acquired through a freedom of information request
data/source/ifg-bmas.csv: list from BMAS, acquired through a freedom of information request
data/source/ifg-bmvi.csv: list from BMVI, acquired through a freedom of information request
data/source/ifg-bt.csv: list from the Bundestag, acquired through a freedom of information request
data/source/ifg-bva.csv: list from BVA, acquired through a freedom of information request
data/source/ifg-dwd.csv: list from DWD, acquired through a freedom of information request
data/source/overrides.csv: manually curated list of domains for which the scraper returns a wrong agency name
data/source/reverse-dns.csv: reverse DNS lookup of 220.127.116.11/21; the resulting sites were visited manually and picked by hand
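To illustrate how an overrides file can correct scraped agency names, here is a minimal sketch of a "last source wins" merge keyed by domain. It is not the logic in this repository's Makefile, which is not shown here. The column names `Domain Name` and `Agency` and the helpers `parse_rows` and `merge_sources` are assumptions made for the example.

```ruby
require "csv"

# Parse CSV text with a header row into an array of plain hashes.
def parse_rows(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

# Merge several lists of rows keyed by lowercased domain. Later lists win on
# conflict, so passing the overrides last lets curated names replace scraped ones.
def merge_sources(*sources, key: "Domain Name")
  merged = {}
  sources.each do |rows|
    rows.each { |row| merged[row[key].downcase] = row.to_h }
  end
  merged.values.sort_by { |row| row[key].downcase }
end

# Hypothetical sample data standing in for scraper output and overrides.csv.
scraped = parse_rows(<<~CSV)
  Domain Name,Agency
  Example.de,Wrong Name
  other.example,Other Agency
CSV

overrides = parse_rows(<<~CSV)
  Domain Name,Agency
  example.de,Correct Name
CSV

merged = merge_sources(scraped, overrides)
merged.each { |row| puts "#{row['Domain Name']}: #{row['Agency']}" }
```

Keying on the lowercased domain means that entries differing only in letter case are treated as the same domain, which also removes duplicates across sources.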
I'd love to have some help with this! Please feel free to create an issue or submit a pull request if you notice something that can be better. Specifically, suggesting additional pages we can scrape and domains that are either not found or have incorrect organization names associated with them would be very helpful.
- scrape even more domains
- manual collection