<a href="https://colab.research.google.com/github/scarfboy/wetsuite-dev/blob/main/examples/dataset_gemeentes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip --quiet install https://github.com/scarfboy/wetsuite-dev/archive/refs/heads/main.zip

In [3]:
import wetsuite.datasets
gem = wetsuite.datasets.load('gemeentes')
print( gem.description )


    This is largely the more interesting fields from https://organisaties.overheid.nl/export/Gemeenten.csv
    augmented with RDF-like data like that under https://standaarden.overheid.nl/owms/terms/Leiden_(gemeente)

    
    .data is a list of dicts, one per gemeente (currently 344 of them). Keys in that dict include:
    
    'Namen' - a list of name variants. 
       Usually just the short name, and a longer one with "Gemeente " in front
       Sometimes with alternative names, e.g. ["Den Bosch", "Gemeente 's-Hertogenbosch", "'s-Hertogenbosch"]
       We have used these as "Match one of these" to search for gemeentebeleid per gemeente
   
    Descriptions like 'Aantal inwoners', 'Oppervlakte'
    
    Organisational relations like 
      - 'Bevat plaatsen'
      - 'Overlaps with', mentioning Provinces, Waterschappen
      - 'Service area of' - things like GGD, Police, Social services  (each item is a list because we tend to have a full name and an abbreviation)
      - 'Predecesso

In [36]:
import pprint, random

pprint.pprint( random.choice( gem.data ) )  # show details for one random gemeente

{'Aantal inwoners': '51119',
 'Bevat plaatsen': ['Akkrum',
                    'Aldeboarn',
                    'Bontebok',
                    'De Knipe',
                    'Gersloot',
                    'Haskerdijken',
                    'Heerenveen',
                    'Hoornsterzwaag',
                    'Jubbega',
                    'Katlijk',
                    'Luinjeberd',
                    'Mildam',
                    'Nes Gem Heerenveen',
                    'Nieuwebrug',
                    'Nieuwehorne',
                    'Nieuweschoot',
                    'Oranjewoud',
                    'Oudehorne',
                    'Oudeschoot',
                    'Terband',
                    'Tjalleberd'],
 'CBSCode': '0074',
 'Namen': ['Heerenveen', 'Gemeente Heerenveen'],
 'OWMS URI': 'http://standaarden.overheid.nl/owms/terms/Heerenveen_(gemeente)',
 'Oppervlakte': [140, 'km2'],
 'Organisatiecode': 'gm0074',
 'Overlaps with': [['Fryslân'], ['Wetterskip Fryslân']]
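
Because every record is a plain dict, simple lookups need no special tooling. The sketch below is an illustration only: `sample_data` is a hypothetical stand-in for `gem.data` (the first record copies fields from the output above, the second carries just the name variants quoted in the description), and the helpers `build_name_index` and `inhabitants_per_km2` are our own names, not part of wetsuite. It assumes 'Oppervlakte' is always a `[number, unit]` pair in km2, which we have only seen in this one record.

```python
# Hypothetical stand-in for gem.data: two records in the shape shown above.
sample_data = [
    {'Namen': ['Heerenveen', 'Gemeente Heerenveen'],
     'Aantal inwoners': '51119',
     'Oppervlakte': [140, 'km2'],
     'CBSCode': '0074'},
    {'Namen': ["Den Bosch", "Gemeente 's-Hertogenbosch", "'s-Hertogenbosch"]},
]

def build_name_index(data):
    """Map each lowercased name variant to its gemeente dict."""
    index = {}
    for gemeente_dict in data:
        for naam in gemeente_dict.get('Namen', []):
            index[naam.lower()] = gemeente_dict
    return index

def inhabitants_per_km2(gemeente_dict):
    """Return population density, or None if either field is missing."""
    inwoners = gemeente_dict.get('Aantal inwoners')
    oppervlakte = gemeente_dict.get('Oppervlakte')
    if inwoners is None or oppervlakte is None:
        return None
    amount, _unit = oppervlakte      # e.g. [140, 'km2']; unit assumed to be km2
    return int(inwoners) / amount    # 'Aantal inwoners' is stored as a string

name_index = build_name_index(sample_data)
heerenveen = name_index['gemeente heerenveen']
print(heerenveen['CBSCode'], round(inhabitants_per_km2(heerenveen)))
```

Indexing on every name variant means "Den Bosch" and "'s-Hertogenbosch" resolve to the same record, which is the same idea the search example in the next section relies on.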

## Beleidsregels per gemeente

One reason this not-really-a-dataset exists is simply the names, which let us do things like look for beleidsregels (policy rules) per gemeente.

The example below combines the names with a specific search in the KOOP repositories, one search per municipality (see also the datacollect_koop_repos example for an introduction to the repositories).
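
To make the name-matching part concrete, here is a minimal sketch of how a list of name variants becomes one query clause. The helper name `creator_clause` is ours, not part of wetsuite; the `creator` field and the `body any` syntax are taken from the search used in this notebook, and the shortened body condition is only for illustration. It does not escape double quotes inside names, on the assumption that gemeente names do not contain any.

```python
def creator_clause(names):
    """Join name variants into a clause that matches any one of them."""
    return ' OR '.join('(creator = "%s")' % naam for naam in names)

# Name variants as they appear for Den Haag in this dataset
names = ["Den Haag", "Gemeente Den Haag", "'s-Gravenhage"]

clause = creator_clause(names)
# Shortened body condition, for illustration only
query = '(%s) AND (body any "damoclesbeleid damocles")' % clause
print(query)
```

Wrapping the joined clause in its own parentheses keeps the name alternatives grouped together when further conditions are attached with AND.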

In [33]:
import datetime
import wetsuite.datacollect.koop_repositories
import wetsuite.helpers.etree as etree
import wetsuite.helpers.net


# The odd-looking slice is chosen to include Den Haag, to check that the search does not trip over its other name ('s-Gravenhage)
#  ...and to illustrate that there isn't a good hit for every gemeente: Den Haag does have such a policy, but not in CVDR
for gemeente_dict in gem.data[64:67]:  
    query_gemeente_names = ' OR '.join( '(creator = "%s")'%naam  for naam in gemeente_dict['Namen'] )

    ## Construct a complex-looking query to mean:
    #   (match gemeente by one of its names)  AND  ( mentions 'damocles'   OR   (mentions drugs or the opiumwet  AND  mentions words you would likely see around closing people's homes) )
    # This is a practical trade-off: we _will_ get too many results, but what we want is probably in there, and filtering out can be easier than searching again
    query = '(%s) AND ( (body any "damoclesbeleid damocles")  OR  (body any "drugs softdrugs harddrugs handelshoeveelheid opiumwet 13b") AND (body any "sluiting herstelsanctie bestuursdwang"))'%( 
        query_gemeente_names
    )
    #print( query )


    ## search and fetch only first page, just so that num_records is filled in to report
    cvdr = wetsuite.datacollect.koop_repositories.CVDR()
    cvdr.search_retrieve( query ) 
    print( "\n == %3d  hits for   %s == "%(cvdr.num_records(), ' / '.join(gemeente_dict['Namen'])) )

    ## search and fetch all, summarizing each record as we go
    def show_brief( record ): 
        ''' a brief summary of each search result.   
            Ignore how this code works for now, because we absolutely need to make this easier for you to do. '''
        gzd = record.find('recordData')[0]
        #print( etree.tostring( gzd ).decode('u8') ) # for debug, seeing what's in that record
        owmskern     = etree.kvelements_to_dict( gzd.find('originalData/meta/owmskern') )
        cvdripm      = etree.kvelements_to_dict( gzd.find('originalData/meta/cvdripm')  )
        enrichedData = etree.kvelements_to_dict( gzd.find('enrichedData')               )
        # ignore things that have expired, because they were probably replaced by something else also in the results  (note: the repo's expiry data doesn't look 100% correct)
        uit = cvdripm.get('uitwerkingtredingDatum', None) # if there is no uitwerkingtreding, it still applies. If it is a date before today, it does not.
        if uit is not None  and  (datetime.datetime.strptime(uit,"%Y-%m-%d").date() < datetime.date.today()): 
            pass # expired
        else:
            print( "  %15s  %10s..%-10s  %s"%( owmskern.get('identifier'),  cvdripm.get('inwerkingtredingDatum'),  cvdripm.get('uitwerkingtredingDatum',''),  owmskern.get('title')) )
            #print('    URL: %s'%enrichedData.get('publicatieurl_xml') ) 
            # 'publicatieurl_xml' points to text in structured XML.  There is also 'publicatieurl_xhtml' (more browser-presentable),  and 'preferred_url' (a link to the page that lokaleregelgeving.overheid.nl would also send you to)

    cvdr.search_retrieve_many( query, callback=show_brief ) # all results, and show titles



 ==  30  hits for   Delft / Gemeente Delft == 
     CVDR681430_1  2022-09-21..            Beleidsregel bestuurlijke handhaving artikel 13b Opiumwet, Delft 2022
     CVDR663040_1  2021-10-16..            Coffeeshopbeleid Delft 2021 (met handhavingsarrangement)
     CVDR681180_1  2022-09-13..            Handhavingsarrangement mensenhandel voor Delft
    CVDR185002_20  2022-10-27..            Algemene plaatselijke verordening voor Delft
      CVDR45720_2  2010-10-11..            Horeca Exploitatieverordening voor Delft
     CVDR681261_1  2022-09-15..            Uitvoeringsplan Veiligheid Prioriteiten 2022

 ==  56  hits for   Den Haag / Gemeente Den Haag / 's-Gravenhage == 
     CVDR645629_1  2020-11-10..            Beleidsregel toezicht bedrijfsmatige activiteiten 2020
     CVDR674619_1  2022-03-24..            Beleidsregel bestuurlijke boete, sluiting en beheerovername op grond van de Woningwet Den Haag 2022
     CVDR690428_1  2023-01-01..            Beleidsregel beoordeling levensgedr