<a href="https://colab.research.google.com/github/knobs-dials/wetsuite-dev/blob/main/examples/datacollect_koop_repos.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## This notebook's goal

Showing you how to address KOOP's repositories directly using the SRU interface, to access BWB, CVDR, and some others,
aiming to get machine-readable data, and possibly human-readable documents.

You do not need any of this just to search, but it is worth reading if you want to specialize a little.

Talking to these repositories directly may give you the most up-to-date version of this data, and may avoid the need for a database to store things in.
If you use something like Colab, you don't even need to install anything on your own computer.

The first goal is to show this as a route of fetching data in an automated way - not to do much with the result just yet.

That said, it's on the low-level technical side, so you need to be comfortable with Python and with working with a new query language,
and it can't hurt to think about bulk requests and perhaps caching.

This may be a little overwhelming, and **you may well prefer to load the bwb or cvdr datasets we provide**.

In [None]:
# For local installs you can install the wetsuite package once.  
# In colab you get a disposable environment each time,  and will have to start with this install each time. 
!pip3 --quiet install -U https://github.com/knobs-dials/wetsuite-dev/archive/refs/heads/main.zip

In [2]:
# imports we'll be using
import pprint, datetime, random

import dateutil.parser

import wetsuite.datasets
import wetsuite.datacollect.koop_repositories
import wetsuite.helpers.koop_parse

## On SRU

SRU ([Search/Retrieval via URL](https://en.wikipedia.org/wiki/Search/Retrieve_via_URL)) was created as a search API built on simple, standard formats, which makes it easier to implement than specialized protocols. You could _almost_ use it without a library.

It has just two operations:
* `explain` - "hello server, describe yourself"
* `searchRetrieve` - "I would like the results for this query please"

Exactly how powerful/flexible any search is depends on the backend you are talking to - it varies how many metadata fields are exposed usefully (explain will tell you that) and which query operations are supported.

We provide a basic class to interact with the repositories we are most interested in.
We'll introduce its functions by example, and programmers may also care to read e.g. `help( wetsuite.datacollect.koop_repositories.BWB() )`.
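
To make the "almost without a library" point concrete, here is a minimal sketch that builds both kinds of request URL by hand. The base URL and the `x-connection` value are copied from the `explain_url` that the BWB repository reports about itself; `version`, `operation`, `query`, `startRecord` and `maximumRecords` are standard SRU 1.2 request parameters. The function name `sru_url` is made up for this illustration and is not part of wetsuite. The sketch only builds URLs - it does not fetch or parse anything.

```python
from urllib.parse import urlencode

# Taken from the 'explain_url' value in the BWB explain output in this notebook
SRU_BASE = 'http://zoekservice.overheid.nl/sru/Search'

def sru_url(operation, connection, query=None, start_record=1, maximum_records=10):
    ' hypothetical helper: build an SRU 1.2 request URL for the explain or searchRetrieve operation '
    params = {'version': '1.2', 'operation': operation, 'x-connection': connection}
    if operation == 'searchRetrieve':
        params['query'] = query                      # a CQL query string
        params['startRecord'] = start_record         # 1-based offset, used for paging
        params['maximumRecords'] = maximum_records   # page size
    return SRU_BASE + '?' + urlencode(params)        # urlencode takes care of escaping spaces, '=', etc.

explain_url = sru_url('explain', 'BWB')
search_url  = sru_url('searchRetrieve', 'BWB', query='dcterms.identifier = BWBR0045754')
print(explain_url)
print(search_url)
```

Fetching such a URL returns XML that you would still have to parse and page through, which is the part the class mentioned above takes care of.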

# Searching KOOP's repositories via their SRU API

KOOP's repositories give access to the data behind wetten.overheid.nl, lokaleregelgeving.overheid.nl, and others.

## Example: Query basics, on the BWB (Basis WettenBestand) 

KOOP's BWB repository can be seen as the data equivalent of https://wetten.overheid.nl

There is a [technical introduction to searching the BWB with SRU](https://www.overheid.nl/sites/default/files/wetten/Gebruikersdocumentatie%20BWB%20-%20Zoeken%20binnen%20het%20basiswettenbestand%20v1.3.1.pdf).
Notably, this route to the BWB does not seem to allow searching the body text ([wetten.overheid.nl](https://wetten.overheid.nl/) does).
As such, this may mainly be useful for [known-item searches](https://en.wikipedia.org/wiki/Known-item_search), date ranges, and such.

In [3]:
sru_bwb = wetsuite.datacollect.koop_repositories.BWB()  # object that knows where to fetch from

In [3]:
pprint.pprint( sru_bwb.explain_parsed() )  # this is a self-description of the API for you to read, mainly useful to figure out the names of indices that you can search in

{'database/numRecs': '129614',
 'description': 'Gemeenschappelijke zoekdienst van overheid.nl voor BWB Online',
 'explain_url': 'http://zoekservice.overheid.nl/sru/Search?&version=1.2&x-connection=BWB&operation=explain',
 'extent': 'Dutch national legislation',
 'host': 'zoekservice.overheid.nl',
 'indices': [('dcterms', 'identifier'),
             ('dcterms', 'modified'),
             ('dcterms', 'type'),
             ('overheid', 'authority'),
             ('overheidbwb', 'rechtsgebied'),
             ('overheidbwb', 'overheidsdomein'),
             ('overheidbwb', 'onderwerpVerdrag'),
             ('overheidbwb', 'titel'),
             ('overheidbwb', 'afkorting'),
             ('overheidbwb', 'wetsfamilie'),
             ('overheidbwb', 'geldigheidsdatum'),
             ('overheidbwb', 'zichtdatum'),
             ('overheidbwb', 'bekendmaking'),
             ('overheidbwb', 'dossiernummer')],
 'port': '80',
 'sets': [('dcterms',
           'http://purl.org/dc/terms/',
           'I

**The query syntax** is Common Query Language.

You can get fairly far copying examples, or guessing based on them. 

For the more technically minded:
- parts of a query are ***`indexname operator term`***, e.g. 
  - `overheidbwb.titel = kip` 
  - `dcterms.modified > 2022-01-01`
  - `body any woning` 
  - use doublequotes when there's a space or one of `<>=/()` in the term, e.g. `dcterms.title = "wet kip"`
- the operators supported vary per field and per server (some servers get fancy, many do not), so unless you stick to the very basics, check the server's `explain` and/or its documentation
  - dates and numbers mostly have: `<`, `<=`, `>`, `>=`, `=`
  - text fields often have most of:
    - `any`:      you can see `body any "foo bar"`  as short for   `body any foo  OR  body any bar`
    - `all`:      you can see `body all "foo bar"`  as short for   `body any foo  AND  body any bar`
    - `==`:      exact match
    - `exact`:    exact match
    - `adj`:      exact phrase search - these words should appear adjacent, in the order specified
    - `=`:        server choice, e.g. for text might be `==` or `adj` if present
- you can combine multiple of those `index operator term` chunks, using AND and OR, and brackets, see e.g. the CVDR example below

**Some details vary per server, and per repository in it**, e.g. 
  - which indexes (mostly named fields) are searchable, and what they are called 
    - you can do an `explain` to find out.
  - how search results actually point to the actual documents that they describe
  - there may be shorthands for index names, e.g. BWB allows 'titel' meaning 'overheidbwb.titel'
  - The more detailed your question, the more repository details you have to figure out. We try to provide helper functions.
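
The quoting rule and the `indexname operator term` shape described above can be sketched as a pair of small string helpers. These are hypothetical names for illustration, not wetsuite functions, and the sketch deliberately does not handle terms that themselves contain a doublequote.

```python
def cql_term(term):
    ' hypothetical helper: add doublequotes when the term has a space or one of <>=/() in it '
    if '"' in term:
        raise ValueError('terms containing a doublequote are not handled in this sketch')
    if term == '' or any(ch in term for ch in ' <>=/()'):
        return '"%s"'%term
    return term

def cql_clause(index, operator, term):
    ' hypothetical helper: one "indexname operator term" chunk '
    return '%s %s %s'%(index, operator, cql_term(term))

clauses = [
    cql_clause('overheidbwb.titel', '=', 'kip'),
    cql_clause('dcterms.modified',  '>', '2022-01-01'),
    cql_clause('dcterms.title',     '=', 'wet kip'),
]
for clause in clauses:
    print(clause)

# combining chunks: bracket each one so that it is clear what belongs together
combined = ' AND '.join( '(%s)'%clause  for clause in clauses[:2] )
print(combined)
```

Whether a particular index or operator is accepted is still up to the repository, so a query that is well-formed by these helpers can still be refused by the server.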

In [4]:
# we will try a few queries, so a 'summarize result items' helps keep our code brief
def print_bwb_results(records):
    ' takes a list of etree objects '
    for i, record in enumerate( records ): 
        print('\n***  Record %d of %d  ***'%(i+1, sru_bwb.numberOfRecords))
        # each record is an ElementTree style object - which is clunky when you just want to pick out a few values to show
        #   we provide functions that parse that into python data structures, in this case: 
        meta = wetsuite.helpers.koop_parse.bwb_searchresult_meta(record)
        pprint.pprint(meta)    


### Known-item search

In [5]:
print_bwb_results( sru_bwb.search_retrieve_many( 'dcterms.identifier = BWBR0045754', up_to=5 ) ) # up to 5: if we happen to have a lot of results, don't fetch and print all


#sru_bwb.search_retrieve_many('dcterms.identifier==BWBR0001840', callback=bwb_callback) # Grondwet

#sru_bwb.search_retrieve_many('dcterms.identifier==BWBR0004825', callback=bwb_callback) # Reglement verkeersregels en verkeerstekens, to see how e.g. images work


***  Record 1 of 4  ***
{'authority': 'Binnenlandse Zaken en Koninkrijksrelaties',
 'created': '2022-05-01',
 'creator': 'Ministerie van Binnenlandse Zaken en Koninkrijksrelaties',
 'geldigheidsperiode_einddatum': '2022-07-31',
 'geldigheidsperiode_startdatum': '2022-05-01',
 'identifier': 'BWBR0045754',
 'language': 'nl',
 'locatie_manifest': 'https://repository.officiele-overheidspublicaties.nl/bwb/BWBR0045754/manifest.xml',
 'locatie_toestand': 'https://repository.officiele-overheidspublicaties.nl/bwb/BWBR0045754/2022-05-01_0/xml/BWBR0045754_2022-05-01_0.xml',
 'locatie_wti': 'https://repository.officiele-overheidspublicaties.nl/bwb/BWBR0045754/BWBR0045754.WTI',
 'modified': '2023-02-01',
 'overheidsdomein': 'Overheid, bestuur en koninkrijk',
 'rechtsgebied': 'Bestuursrecht',
 'title': 'Wet open overheid',
 'toestand': 'http://wetten.overheid.nl/id/BWBR0045754/2022-05-01/0',
 'type': 'wet',
 'zichtperiode_einddatum': '9999-12-31',
 'zichtperiode_startdatum': '2022-05-01'}

***  Rec

It turns out that each version of a law over time gets its own search result.

This also means that, to get the current version, you may want to filter on fields
such as `geldigheidsperiode_einddatum`. (We do something similar in the CVDR example below.)
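
A minimal sketch of such a filter, working on metadata dicts shaped like the parsed search results shown above. The function name `valid_on` is made up for this illustration. The first example dict copies values from the output above; the second is invented to have something that is still valid. Two assumptions: that an open-ended period is written as `9999-12-31` (as seen in `zichtperiode_einddatum` above), and that the end date is inclusive.

```python
import datetime

def valid_on(meta, day):
    ' hypothetical helper: is this version valid on the given date, going by its geldigheidsperiode? '
    start = datetime.date.fromisoformat( meta['geldigheidsperiode_startdatum'] )
    end   = datetime.date.fromisoformat( meta['geldigheidsperiode_einddatum'] )  # 9999-12-31 still fits in a date
    return start <= day <= end   # assumption: the end date itself still counts as valid

example_metas = [
    # values copied from the search result shown above
    {'identifier':'BWBR0045754', 'geldigheidsperiode_startdatum':'2022-05-01', 'geldigheidsperiode_einddatum':'2022-07-31'},
    # invented for illustration: a later, open-ended version
    {'identifier':'BWBR0045754', 'geldigheidsperiode_startdatum':'2022-08-01', 'geldigheidsperiode_einddatum':'9999-12-31'},
]

day = datetime.date(2023, 1, 1)
current = [ meta  for meta in example_metas  if valid_on(meta, day) ]
for meta in current:
    print( meta['identifier'], meta['geldigheidsperiode_startdatum'], meta['geldigheidsperiode_einddatum'] )
```

Filtering after fetching is the simple option. The `overheidbwb.geldigheidsdatum` index listed by `explain` suggests this can also be done in the query, which is not tried here.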

### Title search

In [6]:
print_bwb_results( sru_bwb.search_retrieve_many( 'overheidbwb.titel any textiel', up_to=5 ) )


***  Record 1 of 2  ***
{'authority': 'Volksgezondheid, Welzijn en Sport',
 'created': '2015-07-02',
 'creator': 'Ministerie van Binnenlandse Zaken en Koninkrijksrelaties',
 'geldigheidsperiode_einddatum': '2022-04-13',
 'geldigheidsperiode_startdatum': '2001-04-13',
 'identifier': 'BWBR0012348',
 'language': 'nl',
 'locatie_manifest': 'https://repository.officiele-overheidspublicaties.nl/bwb/BWBR0012348/manifest.xml',
 'locatie_toestand': 'https://repository.officiele-overheidspublicaties.nl/bwb/BWBR0012348/2001-04-13_0/xml/BWBR0012348_2001-04-13_0.xml',
 'locatie_wti': 'https://repository.officiele-overheidspublicaties.nl/bwb/BWBR0012348/BWBR0012348.WTI',
 'modified': '2022-04-15',
 'overheidsdomein': 'Economie en ondernemen',
 'rechtsgebied': 'Ondernemingspraktijk',
 'title': 'Warenwetbesluit formaldehyde in textiel',
 'toestand': 'http://wetten.overheid.nl/id/BWBR0012348/2001-04-13/0',
 'type': 'AMvB',
 'zichtperiode_einddatum': '9999-12-31',
 'zichtperiode_startdatum': '2001-04-1

### Changes this year

At the time of writing, that is 2023.

In [9]:
this_year = str(datetime.date.today().year)
print_bwb_results( sru_bwb.search_retrieve_many( 'dcterms.modified >= %s-01-01'%this_year, up_to=5 ) )


***  Record 1 of 14437  ***
{'authority': 'Veiligheid en Justitie',
 'created': '2015-07-01',
 'creator': 'Ministerie van Binnenlandse Zaken en Koninkrijksrelaties',
 'geldigheidsperiode_einddatum': '2002-06-30',
 'geldigheidsperiode_startdatum': '2002-01-01',
 'identifier': 'BWBR0001827',
 'language': 'nl',
 'locatie_manifest': 'https://repository.officiele-overheidspublicaties.nl/bwb/BWBR0001827/manifest.xml',
 'locatie_toestand': 'https://repository.officiele-overheidspublicaties.nl/bwb/BWBR0001827/2002-01-01_0/xml/BWBR0001827_2002-01-01_0.xml',
 'locatie_wti': 'https://repository.officiele-overheidspublicaties.nl/bwb/BWBR0001827/BWBR0001827.WTI',
 'modified': '2023-02-25',
 'overheidsdomein': 'Rechtspraak',
 'rechtsgebied': 'Burgerlijk procesrecht',
 'title': 'Wetboek van Burgerlijke Rechtsvordering (geldt in geval van '
          'digitaal procederen)',
 'toestand': 'http://wetten.overheid.nl/id/BWBR0001827/2002-01-01/0',
 'type': 'wet',
 'zichtperiode_einddatum': '9999-12-31',
 

## CVDR
The CVDR repository can be seen as the data equivalent of https://lokaleregelgeving.overheid.nl

Checking what we can search:

In [13]:
sru_cvdr = wetsuite.datacollect.koop_repositories.CVDR()

pprint.pprint( sru_cvdr.explain_parsed() ) # seeing which indexes are here. 
# This one has a more complex information model, so you can dig a little deeper to see what you can do with it.

{'database/numRecs': '266775',
 'description': 'Gemeenschappelijke zoekdienst van overheid.nl voor Centrale '
                'Voorziening Decentrale Regelgeving',
 'explain_url': 'http://zoekservice.overheid.nl/sru/Search?&version=1.2&x-connection=cvdr&operation=explain',
 'extent': 'Lokale regelingen of the Dutch government',
 'host': 'zoekservice.overheid.nl',
 'indices': [('dcterms', 'identifier'),
             ('dcterms', 'title'),
             ('dcterms', 'language'),
             ('dcterms', 'creator'),
             ('dcterms', 'modified'),
             ('dcterms', 'isFormatOf'),
             ('dcterms', 'alternative'),
             ('dcterms', 'source'),
             ('dcterms', 'isRatifiedBy'),
             ('dcterms', 'subject'),
             ('dcterms', 'issued'),
             (None, 'workid'),
             (None, 'bronformaat'),
             (None, 'organisatieType'),
             (None, 'sorteerTitel'),
             (None, 'gemeente'),
             (None, 'provincie'),
   

### Damocles
Let's try looking for Amsterdam's policy around the [Wet Damocles](https://nl.wikipedia.org/wiki/Wet_Damocles).

As [the relevant SRU manual](https://data.overheid.nl/sites/default/files/dataset/d0cca537-44ea-48cf-9880-fa21e1a7058f/resources/Handleiding%2BSRU%2B2.0.pdf) mentions in passing, `dt.spatial` refers to where it applies, `dt.creator` refers to who is responsible for creating the document. For this case we assume they are the same. Also, this repository lets us write `creator` instead of `dt.creator`, etc., nice for a bit of readability in these examples.

In [14]:
# we'll be playing with queries, so make 'show results' a minimal amount of typing away
def print_cvdr_results(records):  
    ' takes a list of etree objects '
    print('fetched %d records\n'%len(records))
    for i, record in enumerate( records ):
        print('***  Record %d of %d  ***'%(i+1, sru_cvdr.numberOfRecords))
        meta = wetsuite.helpers.koop_parse.cvdr_meta(record, flatten=True) # flatten smushes down possibly-repeated fields into a single value. Good enough (only) for presentation.
        pprint.pprint( meta )

In [15]:
# See if we can search for amsterdam
print_cvdr_results( sru_cvdr.search_retrieve_many( '(creator any Amsterdam) ', up_to=1 ) ) # show just one of many results, we only check whether it works

fetched 1 records

***  Record 1 of 3848  ***
{'alternatieveIdentifier': '',
 'alternative': 'Verordening op de vastgoedregistratie',
 'betreft': 'nieuwe regeling',
 'creator': 'Amsterdam (overheid:Gemeente)',
 'gedelegeerdeRegelgeving': '<al>Geen</al>',
 'identifier': 'CVDR108223_1',
 'inwerkingtredingDatum': '2008-10-01',
 'isFormatOf': 'Gemeenteblad 2008, afd. 3A, nr. 182/461 ()',
 'isRatifiedBy': 'gemeenteraad (overheid:BestuursorgaanGemeente)',
 'issued': '2008-10-01',
 'kenmerk': 'Gemeenteblad 2008, afd. 1, nr. 461',
 'language': 'nl',
 'modified': '2018-01-30',
 'onderwerp': 'Ruimtelijke ordening, grondbeleid en bouwen',
 'opvolgerVan': '',
 'organisatietype': 'Gemeente',
 'preferred_url': 'https://lokaleregelgeving.overheid.nl/CVDR108223/1',
 'publicatieurl_xhtml': 'https://repository.officiele-overheidspublicaties.nl/cvdr/CVDR108223/1/html/CVDR108223_1.html',
 'publicatieurl_xml': 'https://repository.officiele-overheidspublicaties.nl/cvdr/CVDR108223/1/xml/CVDR108223_1.xml',
 '

In [11]:
# alright, now also require 'damocles' in the body text
print_cvdr_results( sru_cvdr.search_retrieve_many( '(creator any Amsterdam) AND (body any damocles)', up_to=5 ) ) 

fetched 0 records



Nothing. Hm. Maybe it's called 'damoclesbeleid'?

In [12]:
print_cvdr_results( sru_cvdr.search_retrieve_many( '(creator any Amsterdam) AND (body any damocles  OR  body any damoclesbeleid)', up_to=5 ) )

fetched 1 records

***  Record 1 of 1  ***
{'alternatieveIdentifier': '',
 'alternative': 'Verzamelbesluit van de burgemeester van de gemeente Amsterdam '
                'verband houdende met de herindeling van de gemeenten '
                'Amsterdam en Weesp',
 'betreft': 'nieuwe regeling',
 'creator': 'Amsterdam (overheid:Gemeente)',
 'identifier': 'CVDR674918_1',
 'inwerkingtredingDatum': '2022-03-25',
 'isFormatOf': 'gmb-2022-138618 '
               '(https://zoek.officielebekendmakingen.nl/gmb-2022-138618)',
 'isRatifiedBy': 'burgemeester (overheid:BestuursorgaanGemeente)',
 'issued': '2022-03-07',
 'kenmerk': 'Onbekend.',
 'language': 'nl',
 'modified': '2022-03-25',
 'onderwerp': '',
 'opvolgerVan': '',
 'organisatietype': 'Gemeente',
 'preferred_url': 'https://lokaleregelgeving.overheid.nl/CVDR674918/1',
 'publicatieurl_xhtml': 'https://repository.officiele-overheidspublicaties.nl/cvdr/CVDR674918/1/html/CVDR674918_1.html',
 'publicatieurl_xml': 'https://repository.officiele-

Not actually what we want - it's about reorganization and just happens to mention [Damoclesbeleid gemeente Weesp](https://lokaleregelgeving.overheid.nl/CVDR622223/1). 

If it exists, it probably isn't ***called*** damocles. 

Let's widen that to also include things that mention one of `drugs softdrugs harddrugs handelshoeveelheid opiumwet 13b` AND mention one of `sluiting herstelsanctie bestuursdwang`. 

This is a practical consideration: we _will_ get too many results, but what we want should at least be in there,  and filtering out can be easier than continuing to guess. 
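
One thing to watch when combining `OR` and `AND` like this: the CQL specification gives boolean operators equal precedence and reads them left to right, so an unbracketed `A OR B AND C` means `(A OR B) AND C`. How this particular server treats it has not been checked here. Bracketing every combination makes the intent explicit either way. A minimal sketch with made-up helper names, none of which are part of wetsuite:

```python
def any_of(index, words):
    ' hypothetical helper: one bracketed "index any ..." chunk matching any of the given words '
    return '(%s any "%s")'%( index, ' '.join(words) )

def both(a, b):
    ' hypothetical helper: explicitly bracketed AND '
    return '(%s AND %s)'%(a, b)

def either(a, b):
    ' hypothetical helper: explicitly bracketed OR '
    return '(%s OR %s)'%(a, b)

named   = any_of('body', ['damoclesbeleid', 'damocles'])
topic   = any_of('body', ['drugs', 'softdrugs', 'harddrugs', 'handelshoeveelheid', 'opiumwet', '13b'])
measure = any_of('body', ['sluiting', 'herstelsanctie', 'bestuursdwang'])

# "by Amsterdam, and  either called damocles  or  (about drugs and about closing/enforcement)"
query = both( '(creator any "Amsterdam")',  either( named, both(topic, measure) ) )
print( query )
```

The resulting string can be passed to `search_retrieve_many` like any other query. Result counts may differ from an unbracketed version if the server follows the left-to-right reading.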

In [13]:
print_cvdr_results( sru_cvdr.search_retrieve_many( '(creator any "Amsterdam") AND ( (body any "damoclesbeleid damocles") OR (body any "drugs softdrugs harddrugs handelshoeveelheid opiumwet 13b") AND (body any "sluiting herstelsanctie bestuursdwang"))', up_to=5 ) )



fetched 5 records

***  Record 1 of 114  ***
{'alternatieveIdentifier': '',
 'alternative': 'Beleidsregels sluitingen en heropeningen Amsterdam',
 'betreft': 'nieuwe regeling',
 'creator': 'Amsterdam (overheid:Gemeente)',
 'identifier': 'CVDR640125_1',
 'inwerkingtredingDatum': '2020-05-08',
 'isFormatOf': 'gmb-2020-115757 '
               '(https://zoek.officielebekendmakingen.nl/gmb-2020-115757)',
 'isRatifiedBy': 'burgemeester (overheid:BestuursorgaanGemeente)',
 'issued': '2020-04-24',
 'kenmerk': 'Onbekend.',
 'language': 'nl',
 'modified': '2020-05-08',
 'onderwerp': '',
 'opvolgerVan': '',
 'organisatietype': 'Gemeente',
 'preferred_url': 'https://lokaleregelgeving.overheid.nl/CVDR640125/1',
 'publicatieurl_xhtml': 'https://repository.officiele-overheidspublicaties.nl/cvdr/CVDR640125/1/html/CVDR640125_1.html',
 'publicatieurl_xml': 'https://repository.officiele-overheidspublicaties.nl/cvdr/CVDR640125/1/xml/CVDR640125_1.xml',
 'redactioneleToevoeging': '<al>Deze regeling vervangt

There it is, plus a bunch of unrelated and expired entries.  We'll get to the expiry part of that in the next section.

### Damocles per municipality

We have a list of municipalities:

In [17]:
gem = wetsuite.datasets.load('gemeentes')
print( gem.description )


    This is largely the more interesting fields from https://organisaties.overheid.nl/export/Gemeenten.csv
    augmented with RDF-like data like that under https://standaarden.overheid.nl/owms/terms/Leiden_(gemeente)

    
    .data is a list of dicts, one per gemeente (currently 344 of them). Keys in that dict include:
    
    'Namen' - a list of name variants. 
       Usually just the short name, and a longer one with "Gemeente " in front
       Sometimes with alternative names, e.g. ["Den Bosch", "Gemeente 's-Hertogenbosch", "'s-Hertogenbosch"]
       We have used these as "Match one of these" to search for gemeentebeleid per gemeente
   
    Descriptions like 'Aantal inwoners', 'Oppervlakte'
    
    Organisational relations like 
      - 'Bevat plaatsen'
      - 'Overlaps with', mentioning Provinces, Waterschappen
      - 'Service area of' - things like GGD, Police, Social services  (each item is a list because we tend to have a full name and an abbreviation)
      - 'Predecesso

In [18]:
# Showing one random example of gemeente data.
#   in this example we only actually care about 'Namen', though
pprint.pprint( random.choice( gem.data ) )

{'Aantal inwoners': '74298',
 'Bevat plaatsen': ['Blokker', 'Hoorn NH', 'Zwaag'],
 'CBSCode': '0405',
 'Namen': ['Hoorn', 'Gemeente Hoorn'],
 'OWMS URI': 'http://standaarden.overheid.nl/owms/terms/Hoorn_(gemeente)',
 'Oppervlakte': [52, 'km2'],
 'Organisatiecode': 'gm0405',
 'Overlaps with': [['Hoogheemraadschap Hollands Noorderkwartier'],
                   ['Noord-Holland']],
 'Predecessors': [],
 'Raad': [['Fractie Tonnaer', 6],
          ['Hoorn lokaal', 4],
          ['ÉénHoorn', 4],
          ['GroenLinks', 4],
          ['VVD', 3],
          ['D66', 3],
          ['PvdA', 3],
          ['CDA', 2],
          ['Liberaal Hoorn', 2],
          ['Sociaal Hoorn', 2],
          ['De Realistische Partij', 1],
          ['ChristenUnie', 1]],
 'Service area of': [['Afvalbeheer Westfriesland', 'ABWF'],
                     ['Gemeentelijke Gezondheidsdienst Hollands Noorden',
                      'GGD HN',
                      'GGD Hollands Noorden',
                      '1620'],
       

For each municipality, we pick out 'Namen' and put it in a query:

In [20]:
for gemeente_dict in gem.data[65:70]: # this slice includes Den Haag, which has more than one name, to check that the code and search are not tripping over that.      (-35:-30  exposes a current repo bug)

    # we probably want to search in the index called 'creator'
    # when there are multiple names, we accept any of them.
    # doublequotes because there's spaces in some.
    query_gemeente_names = ' OR '.join( '(creator = "%s")'%naam   for naam in gemeente_dict['Namen'] )

    # this is the query we settled on earlier, plus the name requirement
    query = '(%s) AND ( (body any "damoclesbeleid damocles")  OR  (body any "drugs softdrugs harddrugs handelshoeveelheid opiumwet 13b") AND (body any "sluiting herstelsanctie bestuursdwang"))'%( 
        query_gemeente_names
    )

    ## search and fetch only first page, just so that num_records is filled in to report
    cvdr = wetsuite.datacollect.koop_repositories.CVDR()
    cvdr.search_retrieve( query ) 
    print( "\n == %3d  hits for   %s == "%(cvdr.num_records(), ' / '.join(gemeente_dict['Namen'])) )

    ## search and fetch all, summarizing each record as we go  (callback style instead)
    def show_brief( record ): 
        meta = wetsuite.helpers.koop_parse.cvdr_meta( record, flatten=True )
        uit = meta.get('uitwerkingtredingDatum', None)  # ignore things that are expired, because they were probably replaced by something else also in the results  (side note: the expiry data doesn't look 100% correct)

        # old policies are still in here, and we can reasonably assume that ones that expired will probably be replaced by another in the results, so we can just hide them. 
        # Yes, this can also be done in the query
        if uit not in (None,'')  and  (dateutil.parser.parse(uit.split('+')[0]).date() < datetime.date.today()):  # TODO: push newer code that avoids the need for that + nonsense
            pass
        else:
            print( "  %15s  %10s..%-10s  %s"%( meta.get('identifier'), meta.get('inwerkingtredingDatum'),  meta.get('uitwerkingtredingDatum',''),  meta.get('title')) )
            #print('    URL: %s'%meta.get('publicatieurl_xml') )     # 'publicatieurl_xml' points to text in structured XML.  There is also 'publicatieurl_xhtml' (more browser-presentable),  and 'preferred_url' (a link to the page that lokaleregelgeving.overheid.nl would also send you to)
            
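            if False: # If you wanted to extract the text, this would be a (very crude) start.
              # Before enabling this you would need:  import wetsuite.helpers.net   and   from wetsuite.helpers import etree
              #   (assuming wetsuite.helpers.etree is the module that 'etree' refers to here)
              xml_data = wetsuite.helpers.net.download( meta.get('publicatieurl_xml') )
              tree = etree.strip_namespace( etree.fromstring( xml_data ) )
              for al in tree.find('body/regeling/regeling-tekst').iter('al'):  # iter() instead of the deprecated getiterator()
                  print(  ''.join( etree.all_text_fragments(al) )  )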

    cvdr.search_retrieve_many( query, callback=show_brief ) # all results, and show brief summary, mainly just titles


 ==  56  hits for   Den Haag / Gemeente Den Haag / 's-Gravenhage == 
     CVDR645629_1  2020-11-10..            Beleidsregel toezicht bedrijfsmatige activiteiten 2020
     CVDR674619_1  2022-03-24..            Beleidsregel bestuurlijke boete, sluiting en beheerovername op grond van de Woningwet Den Haag 2022
     CVDR690428_1  2023-01-01..            Beleidsregel beoordeling levensgedrag Den Haag 2023
     CVDR11313_53  2022-12-01..            Algemene plaatselijke verordening voor de gemeente Den Haag

 ==  25  hits for   Den Helder / Gemeente Den Helder == 
     CVDR657606_1  2021-05-15..            Beleidsregel van de burgemeester van de gemeente Den Helder, houdende regels over sluiting van lokalen en woningen op grond van artikel 13b Opiumwet (Damoclesbeleid Den Helder 2021)
     CVDR674768_1  2022-03-26..            Beleidsregels van de burgemeester van de gemeente Den Helder, houdende regels omtrent coffeeshops (Beleid coffeeshops Den Helder 2022)
     CVDR627607_1  2019-09-20.

Always think and check, rather than trust automation blindly.

In this case, consider:
- the above search doesn't have a good hit for Den Haag. 
  - They do actually have a policy, but [on their website](https://denhaag.raadsinformatie.nl/modules/13/Overige_bestuurlijke_stukken/113642) rather than in CVDR. There are other cases like this, which you will probably only really find out by hand.
<!-- -->

- [CVDR19959/1](https://lokaleregelgeving.overheid.nl/CVDR19959/1) and [CVDR375267/1](https://lokaleregelgeving.overheid.nl/CVDR375267/1) look to me like the same thing, for Deventer, and both mention they are current. 
  - I can't tell offhand whether that's correct, or they e.g. forgot to mark the older one as ended when the newer one was introduced. There are a handful more cases like these, so it might instead have some practical or legal reason I am not aware of.
<!-- -->

- Municipality mergers mean names change over time, e.g. `Kollumerland en Nieuwkruisland` (a.k.a. `Kollumerland ca.`), `Dongeradeel`, and `Ferwerderadeel` are now `Noardeast-Fryslân`.
  - Presumably they don't re-issue all policy on that day, which probably means most active policy is still under the old name? 
    - TODO: actually look into that - it might be worth putting the previous name in the gemeente dataset
<!-- -->

- Municipality naming conventions may throw you off. Consider e.g.:
  - `Den Haag` is also known as `'s-Gravenhage`, and `Den Bosch` is also known as `'s-Hertogenbosch`
  - abbreviations, e.g. `Nuenen, Gerwen en Nederwetten` may appear as `Nuenen c.a.` - and I would assume also just `Nuenen`
  - somewhat less officially, Frisian towns should be assumed to have two equivalent names. The difference may be subtle (`Dantumadeel` versus `Dantumadiel`) or less so (`Leeuwarden` versus `Ljouwert`)

<!-- -->

- There is a `Bergen` (municipality _and_ town) in Noord-Holland and a `Bergen` (municipality _and_ town) in Limburg. 
  - In [this government list](https://organisaties.overheid.nl/export/Gemeenten.csv) the municipalities are called `Bergen (L)` and `Bergen NH`, but it seems a poor idea to assume that is precisely how they appear everywhere. You should probably assume that searches by name will mix these two, for you to resolve manually (it would be nice if we could search by gemeentecode/organisatiecode, here gm0893 and gm0373 respectively).


### Some other searches

Another sort of search that may be interesting:

In [23]:
### All changes in the last week

import wetsuite.helpers.etree
one_week_ago = datetime.date.today() - datetime.timedelta(days=7) # python date/datetime objects let you do that

fetched_records = sru_cvdr.search_retrieve_many( 'dcterms.modified > %s'%( one_week_ago.strftime('%Y-%m-%d') ), # date as text, yyyy-mm-dd style 
                                         up_to=20  )
for record in fetched_records:
    #print( wetsuite.helpers.etree.tostring(record).decode('u8') )
    meta = wetsuite.helpers.koop_parse.cvdr_meta( record, flatten=True )
    # you can argue over whether for date you want 'modified', 'issued', 'inwerkingtredingDatum', and maybe show 'terugwerkendekrachtDatum'
    print( "  %15s  %10s  %s"%( meta.get('identifier'), meta.get('modified'),  meta.get('title','') ) )

if len(fetched_records) < sru_cvdr.num_records():
    print("NOTE: there were %d matching items, we fetched only %d"%(sru_cvdr.num_records(), len(fetched_records)))

     CVDR172769_3  2023-07-01  Bouwverordening Enschede 2012
     CVDR213613_4  2024-01-01  Algemeen delegatiebesluit
    CVDR296035_12  2023-07-03  Parkeerverordening 2013
     CVDR296525_3  2024-01-01  Bouwverordening gemeente Buren 2013
     CVDR298206_2  2023-09-01  Verordening geldelijke voorzieningen commissieleden
     CVDR332619_3  2024-01-01  Bouwverordening
     CVDR375606_7  2023-05-27  Besluit bedieningstijden bruggen en sluizen 2007
     CVDR376306_2  2023-07-01  Algemeen delegatiebesluit Vlissingen 2015
     CVDR406898_3  2023-05-25  Verordening op de Adviescommissie POP3 Gelderland
     CVDR413094_8  2023-07-01  Verordening leerlingenvervoer gemeente Borsele 2016
    CVDR418922_13  2023-07-06  Algemene Plaatselijke Verordening voor Arnhem
     CVDR452422_2  2024-01-01  Besluit tot wijziging van de Haven- en Kadeverordening 2016
      CVDR51448_2  2024-01-01  Monumentenverordening Gemeente Maasdriel 2010
     CVDR622795_2  2023-06-02  Besluit van het college van burgemees

In [None]:
# unsorted:


# -----------------------------------------------

#sru_cvdr.search_retrieve_many("creator any Delft", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many("creator any Amsterdam", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many("creator any Utrecht", at_a_time=1000, up_to=50000, callback=cvdr_callback)

#sru_cvdr.search_retrieve_many("title any Damocles or title any damoclesbeleid", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many("title any Opiumwet and title any 13b", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many("title any Opiumwet and title any 13", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many("dcterms.source any BWBR0001941", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many("dcterms.source any 13b", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many("dcterms.source any opiumwet", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many("isFormatOf='CVDR640125'", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#  https://repository.officiele-overheidspublicaties.nl/CVDR/CVDR640125/1/xml/CVDR640125_1.xml


#sru_cvdr.search_retrieve_many('dcterms.modified>=2022-06-01', at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many('dcterms.modified>=2022-01-01 and dcterms.modified<=2022-06-01', at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many('dcterms.modified>=2021-01-01 and dcterms.modified<=2021-12-31', up_to=50000, callback=cvdr_callback) 
#sru_cvdr.search_retrieve_many('dcterms.modified>=2013-01-01 and dcterms.modified<=2013-12-31', up_to=50000, callback=cvdr_callback) 
#sru_cvdr.search_retrieve_many('dcterms.modified<=2012-12-31', up_to=50000, callback=cvdr_callback) 

# doesn't seem to let you search for "all versions of"
#sru_cvdr.search_retrieve_many("dcterms.identifier=CVDR272112_2", at_a_time=1000, up_to=50000, callback=cvdr_callback)
#sru_cvdr.search_retrieve_many("dcterms.identifier=CVDR272112", at_a_time=1000, up_to=50000, callback=cvdr_callback)

# ERROR case
#sru_cvdr.search_retrieve_many("dcterms.identifier=CVDR7915_1", at_a_time=1000, up_to=50000, callback=cvdr_callback)

## Other related code

In [24]:
# there are a bunch of helper functions to help you deal with search results (e.g. parsing metadata and identifiers) 
# ...and to some degree the documents.  One or two are used above.    
# TODO: document, explain, demonstrate more

# there are also some more specific tools, like:

# "given a CVDR work id (or specific expression ID implying the work), find all known expression IDs for that work ID"
wetsuite.helpers.koop_parse.cvdr_versions_for_work( 'CVDR165982' ) 
#   will also accept expression IDs, e.g. CVDR165982_1, which it treats as its work ID.
#   Note that this does a search, so will not be fast to do for a large list of them.

['CVDR165982_1', 'CVDR165982_2']
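
The difference between a work ID and an expression ID is just the `_version` suffix, which a few lines of string handling can take apart. The helper names below are made up for this illustration and are not wetsuite functions. The URL pattern is inferred from the `preferred_url` values in the search results above (e.g. `CVDR108223_1` and `https://lokaleregelgeving.overheid.nl/CVDR108223/1`), so treat it as an assumption and prefer the URL from the metadata when you have it.

```python
def split_cvdr_id(identifier):
    ' hypothetical helper:  "CVDR165982_1" -> ("CVDR165982", 1),   "CVDR165982" -> ("CVDR165982", None) '
    work_id, underscore, version = identifier.partition('_')
    return work_id, (int(version) if underscore else None)

def cvdr_web_url(identifier):
    ' hypothetical helper: guess the lokaleregelgeving.overheid.nl page for an expression ID (pattern inferred, see above) '
    work_id, version = split_cvdr_id(identifier)
    if version is None:
        raise ValueError('need an expression ID, e.g. CVDR165982_1, not just a work ID')
    return 'https://lokaleregelgeving.overheid.nl/%s/%d'%(work_id, version)

for expression_id in ['CVDR165982_1', 'CVDR165982_2']:
    print( split_cvdr_id(expression_id), cvdr_web_url(expression_id) )
```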

# Officiele publicaties

There is some more technical detail in https://www.koopoverheid.nl/binaries/koop/documenten/instructies/2021/02/09/handleiding-voor-het-uitvragen-van-de-collectie-officiele-publicaties/Handleiding+SRU2.0+v1.2+28052021.pdf

## PLOOI

https://kia.pleio.nl/file/download/e7fad70c-b2f6-4fd3-ac85-e3b8d0cd9ead/plooi-technische-handreiking.pdf

https://kia.pleio.nl/file/download/a56145c5-89be-4445-8e10-ecbc5458c895/plooi-handreiking-voor-informatie.pdf

In [None]:
sru_plooi = wetsuite.datacollect.koop_repositories.PLOOI( verbose=False )
pprint.pprint( sru_plooi.explain_parsed() ) # seeing which indexes are here. 

In [None]:
def handle_plooi_record(rec):
    print( rec )


#sru_plooi.search_retrieve_many( 'plooi.informatiecategorie any Wob', up_to=5, callback=handle_plooi_record )

sru_plooi.search_retrieve_many( 'dcterms.type = "beslissing op verzoek art. 3 Wob"', up_to=5, callback=handle_plooi_record )


# It seems that individual results are documents that can be part of a larger request, e.g. 
#  https://open.overheid.nl/Details/ronl-0347ff50b0d03b10060fe4bf242431d97d85a3ad/1
#  https://open.overheid.nl/Details/ronl-5719be0c0a840f7c63413cf1c00d8d5eab3177c1/1
# are part of
#  https://open.overheid.nl/Details/ronl-6d82bce1a0afdcbc0d5640b9992bac1631d830c5/1

# The inventaris also refers to the things already public.

