# Purpose of this notebook

See what kind of things we can find out from the [Tweede Kamer open data portal](https://opendata.tweedekamer.nl/).

That portal has two interfaces: 
an [Atom-style API called SyncFeed](https://opendata.tweedekamer.nl/documentatie/syncfeed-api), and an [OData API](https://opendata.tweedekamer.nl/documentatie/odata-api).

The Atom API is a little easier to interact with without a whole library; OData is more work to use but can be a little more thorough and flexible.

The returned formats seem to be XML and JSON.

Either way, there is a [relational data model](https://opendata.tweedekamer.nl/documentatie/informatiemodel) that you should be thinking of,
though in this notebook we focus primarily on the dossiers and documents in them, which stays relatively simple.

Because there is a bunch of data referring to other data, much of the below is trying to explore/show
what kind of interesting things might be in there in the first place.

...because we probably don't want to just fetch everything,
we want to show how to figure out how to get and use the parts you need for a specific purpose.

## Atom/SyncFeed API


In [17]:
import collections, pprint

from wetsuite.datacollect     import tweedekamer_nl # contains some basic code dealing with the syncfeed API
from wetsuite.helpers         import etree
from wetsuite.helpers         import strings
from wetsuite.helpers         import notebook

#### Any one resource type

SyncFeed starts by interacting with a URL like:

        https://gegevensmagazijn.tweedekamer.nl/SyncFeed/2.0/Feed?category=Persoon

...which is the first bunch/page of them, and links to successive pages.

To fetch all resources of the mentioned soort/category, and to then perhaps make a single list of it, 
you need something that follows and fetches those links, e.g. our wetsuite.datacollect.tweedekamer_nl.fetch_all
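
As a rough illustration of what "following the links" involves, here is a minimal sketch using only the standard library. It is not the code inside `fetch_all`. It assumes the feed uses the usual Atom convention of a `<link rel="next">` element for the following page; the sample page, the `example.invalid` URLs, and the `skiptoken` parameter are made up for the example.

```python
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'

# A tiny hand-written stand-in for one feed page (NOT real portal output).
SAMPLE_PAGE = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" href="https://example.invalid/Feed?category=Zaal"/>
  <link rel="next" href="https://example.invalid/Feed?category=Zaal&amp;skiptoken=2"/>
  <entry><id>a</id><title>a</title></entry>
  <entry><id>b</id><title>b</title></entry>
</feed>'''

def parse_page(xml_text):
    """Return (entry_elements, next_url_or_None) for one Atom feed page."""
    root = ET.fromstring(xml_text)
    entries = root.findall(ATOM + 'entry')
    next_url = None
    for link in root.findall(ATOM + 'link'):
        if link.get('rel') == 'next':   # assumption: the next page is marked rel="next"
            next_url = link.get('href')
    return entries, next_url

def fetch_all_pages(first_url, fetch):
    """Follow next-links starting at first_url; `fetch` maps a URL to its XML text."""
    entries, url = [], first_url
    while url is not None:
        page_entries, url = parse_page(fetch(url))
        entries.extend(page_entries)
    return entries
```

Passing the fetch function in as a parameter keeps the paging logic testable without network access; in real use it could be any function that downloads a URL and returns its text.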

However, due to the relational nature of the data (objects point at related objects, and you often want to fetch a related constellation of them),
it quickly becomes interesting to interrogate the data with more foresight - we'll get to that later.

Uses vary. 
- There is e.g. [another notebook that extracts who is a member of what party](extras_datacollect_tweede_kamer_parties.ipynb), 
  which is a somewhat manual piecing together of 'Persoon', 'Fractie', 'FractieZetel', and 'FractieZetelPersoon'.

- If you wanted a record of what gets done in an everyday way, you might care about 'Vergadering', 'Verslag', 'Stemming',
  or if you are more interested in documentation, then 'Zaak', 'Document', 'Kamerstukdossier'. 

- Kamerstukdossiers are interesting - and also their relations are a little more complex than they seem at first.

For a list of types with some explanation of their relations, see [the documentation](https://opendata.tweedekamer.nl/documentatie/).

For just a list of the types:

In [None]:
tweedekamer_nl.resource_types

In [4]:
# Let's start with something basic - fetch a list of all Zaal objects

etrees = tweedekamer_nl.fetch_all( 'Zaal' )   # in this case it should take only a handful of seconds, 
#                     though note that e.g. Stemming, Zaak, Document total to hundreds of megabytes
# It returned a list of etree objects,  
# so if our goal were to write that into a single XML file, we would want to merge them
single_tree = tweedekamer_nl.merge_etrees( etrees )  

# For reference, you could see that joined thing
#print( etree.debug_pretty( single_tree ) )
# ...or possibly save that for later use
#xmlstring = etree.tostring( single_tree )                # then save that 
#with open('%s.xml'%soort, 'wb') as xf:   # (technically injection-sensitive, except it's fine as long as we control the values of soort)
#    xf.write( xmlstring )       

# Right now we are probably just interested in getting out data into python dicts so you can handle them more easily
#There is another function that helps us see each entry's details as python dict structures
entry_dicts = tweedekamer_nl.entry_dicts( single_tree )

pprint.pprint( entry_dicts )

[{'category': 'zaal',
  'id': '6e7dfdae-583a-4191-8818-a89a538c469f',
  'naam': 'Z7 - Statenpassage - Petitie',
  'sysCode': '154',
  'title': '6e7dfdae-583a-4191-8818-a89a538c469f',
  'updated': '2019-08-15T14:45:52Z'},
 {'category': 'zaal',
  'id': 'f207b9d5-434e-4cdc-aa1b-7e5a55bc1791',
  'naam': 'Eerste Kamer',
  'sysCode': '101',
  'title': 'f207b9d5-434e-4cdc-aa1b-7e5a55bc1791',
  'updated': '2019-08-15T14:45:52Z'},
 {'category': 'zaal',
  'id': 'be4bcf1c-a5a1-426b-8ca0-e41fb9493846',
  'naam': 'Z6 - Plein 2 - Petitie',
  'sysCode': '142',
  'title': 'be4bcf1c-a5a1-426b-8ca0-e41fb9493846',
  'updated': '2019-08-15T14:45:52Z'},
 {'category': 'zaal',
  'id': '65c9a207-3a13-4213-99d5-b9a2d6157c26',
  'naam': 'Regentenkamer',
  'sysCode': '67',
  'title': '65c9a207-3a13-4213-99d5-b9a2d6157c26',
  'updated': '2019-08-15T14:45:51Z'},
 {'category': 'zaal',
  'id': 'c66f17e7-6b9a-40b5-b722-01600d78cfb8',
  'naam': 'Schrijfkamer',
  'sysCode': '55',
  'title': 'c66f17e7-6b9a-40b5-b722-016

In [9]:
# a little more readably  /  an example of picking things from that dict
for detail_dict in tweedekamer_nl.entry_dicts( single_tree ):   # use the name we imported; a bare `wetsuite` was never imported
    print(f"{detail_dict['naam']:40s}    {detail_dict['updated']}")


Z7 - Statenpassage - Petitie                2019-08-15T14:45:52Z
Eerste Kamer                                2019-08-15T14:45:52Z
Z6 - Plein 2 - Petitie                      2019-08-15T14:45:52Z
Regentenkamer                               2019-08-15T14:45:51Z
Schrijfkamer                                2019-08-15T14:45:51Z
Statenlokaal                                2019-08-15T14:45:51Z
Van Mierlozaal                              2019-08-15T14:45:52Z
Z5 - Schriftelijke Inbreng                  2019-08-15T14:45:52Z
van Someren-Downerzaal                      2019-08-15T14:45:51Z
Rooksalon                                   2019-08-15T14:45:51Z
Fortuynzaal                                 2019-08-15T14:45:52Z
Koffiekamer                                 2019-08-15T14:45:52Z
Extern                                      2019-08-15T14:45:53Z
Z5 - (Nog) geen zaal beschikbaar            2019-08-15T14:45:53Z
Evenementruimte 1                           2019-08-15T14:45:53Z
Koffiekamer              
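
Because other resources refer to each other by `id`, it is often handy to index these dicts by id and to treat `updated` as a real timestamp. A minimal sketch, using three entries copied from the output above; it assumes every `updated` value has exactly this `...Z` format, which may not hold for all resource types.

```python
from datetime import datetime, timezone

# Three entries copied from the Zaal output above
entry_dicts = [
    {'category': 'zaal', 'id': '6e7dfdae-583a-4191-8818-a89a538c469f',
     'naam': 'Z7 - Statenpassage - Petitie', 'sysCode': '154',
     'updated': '2019-08-15T14:45:52Z'},
    {'category': 'zaal', 'id': 'f207b9d5-434e-4cdc-aa1b-7e5a55bc1791',
     'naam': 'Eerste Kamer', 'sysCode': '101',
     'updated': '2019-08-15T14:45:52Z'},
    {'category': 'zaal', 'id': '65c9a207-3a13-4213-99d5-b9a2d6157c26',
     'naam': 'Regentenkamer', 'sysCode': '67',
     'updated': '2019-08-15T14:45:51Z'},
]

def parse_updated(text):
    """Parse a timestamp like '2019-08-15T14:45:52Z' into a timezone-aware datetime."""
    # assumption: whole seconds and a literal trailing 'Z' (UTC)
    return datetime.strptime(text, '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=timezone.utc)

# Look-up table from id to entry, so references from other resources are cheap to resolve
by_id = {d['id']: d for d in entry_dicts}

# Oldest first; sorted() is stable, so equal timestamps keep their original order
oldest_first = sorted(entry_dicts, key=lambda d: parse_updated(d['updated']))

print(by_id['f207b9d5-434e-4cdc-aa1b-7e5a55bc1791']['naam'])
print([d['naam'] for d in oldest_first])
```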

#### Kamerstukdossiers

Now let's focus on the kamerstukdossiers.

(note: half of this code is more for interrogating what's there, you can write much shorter things)

In [10]:
# Fetch all kamerstukdossiers.    
#   May take half a minute or so to fetch all,
#   because that's ~30 fetches amounting to ~6MByte of XML.
#   and so also let's not print it
single_tree = tweedekamer_nl.merge_etrees( tweedekamer_nl.fetch_all( 'Kamerstukdossier' ) )

In [18]:
# Let's see what kind of dossier topics we have in there, 
#    ...so far purely by looking at the text in their title

verbose = 0

ourtypes = collections.defaultdict(list)
for i, entry_node in enumerate( single_tree ):
    edict = tweedekamer_nl.entry_dict_from_node( entry_node )

    ksd = entry_node.find('content/kamerstukdossier')
    if verbose:
        print()
        print(etree.debug_pretty(ksd))
        pprint.pprint(edict)

    # exception case:   if that attribute is there, there are no contents
    if ksd.get('verwijderd', None) == 'true':   
        ourtypes['[verwijderd]'].append( ksd.get('id') )
        #print("SKIP verwijderd (%s)"%ksd.get('id'))
        continue 

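    try:
        titel = ksd.find('titel').text
    except AttributeError:  # find() returned None: there is no titel element
        print( 'ERROR: no title? (%r)'%entry_node )
        display( notebook.etree_visualize_selection(ksd, '*', mark_subtree=True) )
        continue   # without this, titel would be undefined (or stale from the previous entry)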

    if titel is None:
        print( 'ERROR: dossier without titel (%r)'%entry_node )
        display( notebook.etree_visualize_selection(ksd, '*', mark_subtree=True) )
        continue
    titel = titel.strip()


    if strings.contains_any_of(titel, ['EU-voorstel', 'EU voorstel', 'EU-mededeling', 'EU-trendrapport']):
        ourtypes['eu'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['begrotingssta', 'slotwet', 'voorjaarsnota','najaarsnota', 'Financieel jaarverslag'], case_sensitive=False):
        #print( 'BEGROTING %-7s %20s  %s'%(edict.get('nummer'), edict.get('updated'), titel) )
        ourtypes['begroting'].append( titel )   # up here to not accidentally count wijziging in begrotingsstaat as a law
        continue
    elif strings.contains_any_of(titel, ['omzetbelasting'], case_sensitive=False):
        #print( 'BELASTING %-7s %20s  %s'%(edict.get('nummer'), edict.get('updated'), titel) )
        ourtypes['belasting'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['wetsvoorstel', 'voorstel van wet'], case_sensitive=False):
        #print( 'LAW       %-7s %20s  %s'%(ksd.get('nummer'), ksd.get('updated'), titel) )
        ourtypes['wet'].append( titel )
        continue
    elif titel.startswith('Wet '):
        ourtypes['wet'].append( titel ) 
        continue
    
    elif strings.contains_any_of(titel, ['wetswijziging', 'wijziging van wet ', 'wijziging van de wet ', 'aanpassing van de wet',
                                 'Wijziging van de', # followed by a specifically named law   this one is fuzzier than necessary, might be better to regexp-match
                                 ], case_sensitive=False):
        ourtypes['wet'].append( titel )
        continue
    elif strings.contains_all_of(titel, ['wijziging', 'wetboek'], case_sensitive=False):
        ourtypes['wet'].append( titel )
        continue
    elif strings.contains_all_of(titel, ['wijziging', 'wetten'], case_sensitive=False):
        #print( 'LAW       %-7s %20s  %s'%(ksd.get('nummer'), ksd.get('updated'), titel) )
        ourtypes['wet'].append( titel )
        continue
    elif strings.contains_all_of(titel, ['verbeter', 'wetten'], case_sensitive=False):
        ourtypes['wet'].append( titel )
        continue
    elif strings.contains_all_of(titel, ['aanpassing', ' Wet'], case_sensitive=False):
        ourtypes['wet'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['Initiatiefnota','Interpellatie'], case_sensitive=False):
        ourtypes['discussions'].append( titel )
        continue
    elif strings.contains_any_of(titel, ['burgerinitiatief',], case_sensitive=False):
        ourtypes['discussions'].append( titel )
        continue
    elif strings.contains_any_of(titel, ['Herindeling van de gemeenten',]):
        ourtypes['local'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['mbudsman',]):
        ourtypes['ombudsman'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['Evaluatie',]):
        ourtypes['evaluatie'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['Verdrag',]):
        ourtypes['verdrag'].append( titel )
        continue

    else:
        ourtypes['unsorted'].append( titel )
        #print( 'DONTKNOW %-7s %20s  %s'%(edict.get('nummer'), edict.get('updated'), titel) )
        continue
        #if re.search('', titel):

    #sru_openpub.search_retrieve_many( 'w.dossiernummer=%s'%edict.get('nummer'), callback=op_callback )


for typ, title_list in ourtypes.items():
    #if typ=='unsorted': # cases for which the title isn't a strong indication -- fair enough, but print them to see if there's any patterns we're missing
    #    pprint.pprint( title_list )
    print(  '%-5d %s'%( len(title_list), typ )  )
#pprint.pprint( dtypes )

2056  wet
2009  unsorted
1713  begroting
438   verdrag
245   discussions
108   evaluatie
297   eu
24    ombudsman
6     local
17    belasting
18    [verwijderd]
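
The long `elif` chain can also be written as an ordered table of regular expressions, which is one way to act on the "might be better to regexp-match" remark in the code. This is a sketch of that idea with only a few of the rules, not a drop-in replacement: the patterns are approximations of the string tests above and would give somewhat different counts. The last test title is made up for the example.

```python
import re

# Ordered (label, pattern) rules; the first match wins, mirroring the elif chain.
RULES = [
    ('eu',        re.compile(r'EU[- ]voorstel|EU-mededeling|EU-trendrapport')),
    ('begroting', re.compile(r'begrotingssta|slotwet|voorjaarsnota|najaarsnota|financieel jaarverslag', re.I)),
    ('wet',       re.compile(r'wetsvoorstel|voorstel van wet', re.I)),
    ('wet',       re.compile(r'^Wet ')),
    # "Wijziging van ..." followed somewhere by a word ending in wet / wetten / wetboek
    ('wet',       re.compile(r'wijziging van\b.*wet(ten|boek)?\b', re.I)),
]

def classify(titel):
    """Return the label of the first rule matching titel, or 'unsorted'."""
    for label, pattern in RULES:
        if pattern.search(titel):
            return label
    return 'unsorted'

for titel in [
    'Wijziging van enkele belastingwetten en enige andere wetten (Belastingplan 2020)',
    'Een titel zonder herkenbaar patroon',   # made-up title
]:
    print(classify(titel), '-', titel)
```

Keeping the rules in a list makes the precedence explicit (begroting is tested before wet, as in the original) and makes it easy to add or reorder rules while inspecting the 'unsorted' titles.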


## OData interface

The SyncFeed API is perfectly functional, though it leaves you to do interpretation of the relations yourself, 
so let's see if the OData API is any more help.

There is a helpful library out there, [tkapi](https://github.com/openkamer/tkapi) (MIT license). 
This means we don't have to implement it ourselves.

Note that neither API is a very _fast_ interface, 
nor efficient when your goal is sifting through the entire collection of data in ways it was not initially designed for. 
(It also seems to refuse connections at times.)

That is part of why this notebook, and the dataset produced from it, may be useful to some: they just give you the actual list.
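
For a sense of what an OData request looks like underneath a library, here is a small sketch that only builds a query URL and fetches nothing. `$filter` and `$top` are standard OData query options, and the base URL is the one that appears in this portal's resource links; the property name `Nummer` in the example filter is an assumption, and the server may not support every option.

```python
from urllib.parse import urlencode, quote

BASE = 'https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0'

def odata_url(entity, filter_expr=None, top=None):
    """Build an OData query URL for `entity` (only builds the string, no request is made)."""
    params = {}
    if filter_expr is not None:
        params['$filter'] = filter_expr
    if top is not None:
        params['$top'] = str(top)
    # keep '$' readable in the option names; encode spaces as %20 rather than '+'
    query = urlencode(params, quote_via=quote, safe='$')
    return f'{BASE}/{entity}' + (f'?{query}' if query else '')

# 'Nummer' is an assumed property name, used here only for illustration
print(odata_url('Kamerstukdossier', 'Nummer eq 35302', top=5))
```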

In [None]:
# if you haven't already
!pip3 install tkapi

In [19]:
import tkapi, tkapi.document   
from tkapi.document import DocumentSoort
api = tkapi.TKApi()

In [20]:
# Get an idea of what this API even does
#   the document types are function calls, to wit:
list(name   for name in dir(api)   if name.startswith('get_'))

['get_activiteiten',
 'get_agendapunten',
 'get_all_items',
 'get_antwoorden',
 'get_besluiten',
 'get_commissies',
 'get_documenten',
 'get_dossiers',
 'get_fractie_zetels',
 'get_fracties',
 'get_geschenken',
 'get_item',
 'get_items',
 'get_kamervragen',
 'get_personen',
 'get_reizen',
 'get_related',
 'get_stemmingen',
 'get_vergaderingen',
 'get_verslagen',
 'get_verslagen_van_algemeen_overleg',
 'get_zaken']

In [25]:
# Okay, let's try that
#dossiers = api.get_dossiers( )
#len(dossiers)  # a few thousand, which is why get_dossiers took a few seconds to fetch

from tkapi.dossier import Dossier
dossier_filter = Dossier.create_filter()
dossier_filter.filter_nummer('35302')

dossiers = api.get_dossiers( dossier_filter )


In [60]:
# Let's get a basic summary of dossiers.
#    It turns out that dossier nummers are not unique without the toevoeging,

# so let's sort by nummer and then toevoeging, which requires a little key-function work right now
sorted_dossiers = sorted(  dossiers,   key=lambda dossier:str(dossier.nummer)+(dossier.toevoeging or '')  )

i = 0
for dossier in sorted_dossiers:   #[:200]: # show a few hundred, not all

    # you could e.g. figure out other zaken that refer to the same documents
    #zaaknrs = set()
    #for zaak in dossier.zaken:
    #    zaaknrs.add( zaak.nummer ) # zaak.onderwerp)

    nummer_and_toevoeging = ('%s-%s'%(dossier.nummer, dossier.toevoeging or '')).rstrip('-')
    print( f"== Dossier {nummer_and_toevoeging} == {dossier.titel} ==" )
    print( '  ',dossier.url.replace(')','%29') ) # the replace is to make the notebook's url include the final bracket
    for document in sorted(dossier.documenten, key=lambda doc:doc.volgnummer):
        # {str(document.soort).split(".",1)[1][:25]:25s}
        print( f'   DOC {str(document.volgnummer):3s} - {str(document.datum):12s} - {document.onderwerp:100s} - {document.bestand_url:30s}'
              )
    break


# a document object has attributes like
['aanhangselnummer', 'activiteiten', 'actors', 'agendapunten', 'alias', 'begin_date_key', 'bestand_url', 'create_filter', 
'datum', 'dossier_nummers', 'dossiers', 'end_date_key', 'expand_params', 'filter_param', 'get_date_from_datetime_or_none',
'get_date_or_none', 'get_datetime_or_none', 'get_param_expand', 'get_params_default', 'get_property_enum_or_none',
'get_property_or_empty_string', 'get_property_or_none', 'get_resource_url_or_none', 'get_year_or_none', 'gewijzigd_op',
'id', 'nummer', 'onderwerp', 'orderby_param', 'print_json', 'related_item', 'related_items', 'related_items_deep',
'soort', 'titel', 'titel_citeer', 'type', 'url', 'vergaderjaar', 'versies', 'volgnummer', 'zaken']

== Dossier 35302 == Wijziging van enkele belastingwetten en enige andere wetten (Belastingplan 2020) ==
   https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Kamerstukdossier(d1b73eca-a237-451f-817e-1e96a9328a5a%29
   DOC 1   - 2019-09-17   - Koninklijke boodschap                                                                                - https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Document(5c820386-d13d-4615-96d3-bcd5a299e754)/TK.DA.GGM.OData.Resource()
   DOC 2   - 2019-09-17   - Voorstel van wet                                                                                     - https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Document(168dbc2e-8efb-4818-bc4c-41dd4d7cbc77)/TK.DA.GGM.OData.Resource()
   DOC 3   - 2019-09-17   - Memorie van toelichting                                                                              - https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Document(05794c6c-47e8-4598-98e5-a6e78160bd1b)/TK.DA.GGM.OData.Resource()
   DOC 4 
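
One caveat about sorting on `str(nummer) + toevoeging`: strings compare character by character, so '9999' would sort after '35302'. A tuple key avoids that. The sketch below uses a stand-in `FakeDossier` type with illustrative values rather than real tkapi objects, and it assumes `nummer` can be converted with `int()`.

```python
from collections import namedtuple

# Stand-in for tkapi's dossier objects; only the two attributes used here.
FakeDossier = namedtuple('FakeDossier', ['nummer', 'toevoeging'])

def dossier_sort_key(dossier):
    """Sort numerically by nummer, then by toevoeging (a missing one sorts first)."""
    return (int(dossier.nummer), dossier.toevoeging or '')

def dossier_label(dossier):
    """Just the nummer when there is no toevoeging, otherwise 'nummer-toevoeging'."""
    if dossier.toevoeging:
        return f'{dossier.nummer}-{dossier.toevoeging}'
    return str(dossier.nummer)

# Illustrative values only
dossiers = [FakeDossier(35302, None), FakeDossier(21501, '07'),
            FakeDossier(21501, '02'), FakeDossier(9999, None)]

print([dossier_label(d) for d in sorted(dossiers, key=dossier_sort_key)])
```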

Let's say that our interest is more specific:
finding what the Raad van State has to say about proposed laws (wetsvoorstellen).

...and, in the process, also learning what kinds of documents there are in each dossier.
 
There is also the [advice on the Raad van State site](https://www.raadvanstate.nl/adviezen/)
(for a more data-like form, see also our [extras_datacollect_raadvanstate](extras_datacollect_raadvanstate.ipynb)),
but there it is not placed in the context of the law it refers to.
This interface should at least give us the law's name.

In [42]:
# We start by selecting dossiers where there already _is_ RvS advice.
#  - this is a decent filter for wetsvoorstellen
#  - and filters out wetsvoorstellen that don't need this advice (e.g. begroting)
# ...but we are about to find out
# - there are other things that RvS advises on, like finances (see e.g. 36200) 
# - there are law changes that RvS does not advise on (e.g. TODO)

sorted_dossiers = sorted(dossiers,  key=lambda d:d.nummer,  reverse=True )

count = 0
for i, dossier in enumerate( sorted_dossiers ):
    nummer_and_toevoeging = ('%s-%s'%(dossier.nummer, dossier.toevoeging or '')).rstrip('-')

    #if (dossier.nummer%100) == 0: # ignore a few specific special cases for now,   just because they're large to print
    #    continue

    ## In our stated interest:  first see if it has RvS advice
    sorted_docs      = sorted(dossier.documenten,  key=lambda d:d.volgnummer )
    has_raadvanstate = False
    for document in sorted_docs:
        try:
            # these come from an enum, try  list( tkapi.document.DocumentSoort )  to see a list
            if document.soort in (DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE, 
                                  #DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE_EN_NADER_RAPPORT, # seems to be begrotingstuff?  (TODO: check)
                                  DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE_EN_REACTIE_VAN_DE_INITIATIEFNEMERS,
                                ):
                has_raadvanstate = True
        except ValueError: # there's some invalid / non-covered soort values in the data
            pass # ignore
        # we can filter on more, but we may not need to?

    if not has_raadvanstate:
        continue
    # if execution gets here, it's probably interesting to us.
    
    count += 1
    #if len(sorted_docs)>500:
    #    print( "\n\n== Dossier %s == %s =="%( dossier.nummer, dossier.titel) )
    #    print(' LARGE: %d documents'%len(sorted_docs))
    #    print(' %s ({{kamerdossier|%d}}'%(dossier.titel, dossier.nummer))
    #    continue

    print( "\n== %r == Dossier %s == %d docs == %s =="%( dossier.id, nummer_and_toevoeging, len(dossier.documenten), dossier.titel) )
    for document in sorted_docs:
        try:
            if 0: # just to make the summaries a little easier to read
                if document.soort in (DocumentSoort.MOTIE, DocumentSoort.AMENDEMENT, DocumentSoort.BRIEF_REGERING, DocumentSoort.VERSLAG_VAN_EEN_ALGEMEEN_OVERLEG,
                                    DocumentSoort.MEMORIE_VAN_TOELICHTING_INITIATIEFVOORSTEL,
                                    ):
                    continue
        except ValueError:
            print( "soort not known by tkapi")
            continue

        try:
            docsoort = document.soort
        except ValueError: # this seems to be internal inconsistency
            continue

        show_all_docs = False
        if show_all_docs or docsoort in (DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE, 
                                DocumentSoort.ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE_EN_REACTIE_VAN_DE_INITIATIEFNEMERS):

            print( '#%s'%(document.volgnummer, ), document.soort.name)
            #print( 'soort', document.soort.name, '(%s)'%document.soort.value )
            print( '  onderwerp    ', document.onderwerp )     # for wetsvoorstel-dossiers, seems to often be the same as soort plus some detail (who a letter is from, who )
            print( '  citeertitel  ', document.titel_citeer ) # for wetsvoorstel-dossiers, this often seems to name the law. Or a related one, see e.g. 36195
            print( '  titel        ', document.titel )              # for wetsvoorstel-dossiers, this seems to often name the law, plus sometimes some reason
            #print( 'versies', document.versies )
            print( '  url          ', document.bestand_url )
            if 0:        # It may be interesting to know the document is part of multiple dossiers and/or multiple zaken
                print( '  zaken         ', document.zaken )
                #nums = document.dossier_nummers
                #nums.pop(dossier.nummer)
                #if len(nums)>0:
                #  print( "  also in dossiers: %s"%nums )

            print()

    #if i > 1000: # show only a bunch, not all
    #    print("break %d"%i)
    #    break
print( 'Interesting cases: %d'%count )


== '87a7ff74-bf81-4e71-a69e-2439ce536c2c' == Dossier 36346 == 5 docs == Voorstel van wet van het lid Van Houwelingen betreffende het houden van een raadplegend referendum over het Nederlandse lidmaatschap van de Europese Unie (Wet raadplegend referendum Nederlands EU-lidmaatschap) ==
#4 ADVIES_AFDELING_ADVISERING_RAAD_VAN_STATE_EN_REACTIE_VAN_DE_INITIATIEFNEMERS
  onderwerp     Advies Afdeling advisering Raad van State en Reactie van de initiatiefnemer
  citeertitel   Wet raadplegend referendum Nederlands EU-lidmaatschap
  titel         Voorstel van wet van het lid Van Houwelingen betreffende het houden van een raadplegend referendum over het Nederlandse lidmaatschap van de Europese Unie (Wet raadplegend referendum Nederlands EU-lidmaatschap)
  url           https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Document(bd0dbb7d-bbbe-4aa3-8e28-ba55df7756e3)/TK.DA.GGM.OData.Resource()


== '49f64470-6e41-4115-9547-420f5e3b6a3e' == Dossier 36200 == 188 docs == Nota over de toestand van ’

KeyboardInterrupt: 
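
The selection logic in the cell above repeats the same `try`/`except ValueError` around `document.soort` several times. One way to condense it is a small helper that returns `None` for values the enum does not cover. The sketch below uses stand-in classes (`FakeSoort`, `FakeDocument`) with made-up enum values instead of tkapi's real `DocumentSoort` and document objects, so it runs without the library or network access.

```python
import enum

class FakeSoort(enum.Enum):   # stand-in for tkapi.document.DocumentSoort
    ADVIES_RVS = 'Advies Afdeling advisering Raad van State'
    MOTIE = 'Motie'

class FakeDocument:           # stand-in for a tkapi document
    def __init__(self, raw_soort):
        self._raw = raw_soort

    @property
    def soort(self):
        # Enum lookup by value raises ValueError for values missing from the enum
        return FakeSoort(self._raw)

def soort_or_none(document):
    """Return document.soort, or None when the value is not covered by the enum."""
    try:
        return document.soort
    except ValueError:
        return None

def has_soort(documents, wanted):
    """True if any document's soort is in the `wanted` set."""
    return any(soort_or_none(doc) in wanted for doc in documents)

docs = [FakeDocument('Motie'),
        FakeDocument('something the enum does not list'),
        FakeDocument('Advies Afdeling advisering Raad van State')]
print(has_soort(docs, {FakeSoort.ADVIES_RVS}))
```

With real tkapi objects the same two helpers should apply unchanged, as long as an unknown soort still surfaces as a `ValueError` on attribute access, which is what the cell above relies on.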

## Actually making data from that

TODO: decide what