<a href="https://colab.research.google.com/github/knobs-dials/wetsuite-datacollect/blob/main/tweede_kamer_dossiers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Purpose of this notebook

See how we might work towards actually making a dataset from the data at the [Tweede Kamer open data portal](https://opendata.tweedekamer.nl/).

We restrict ourselves to kamerstukdossiers and their documents.

You probably want to read at least the last part of part 1, as it started looking 



## Kamerstukdossiers

Kamerstukken are documents exchanged between government and parliament,
and are organized into dossiers (kamerdossiers, kamerstukdossiers).


### How do numbers work?

The first thing to grasp is how the identifiers for dossiers work, i.e. how they are numbered.


**Five digits for the dossier, plus a number within that dossier?**

Broadly, there is a five-digit numbering that started in 1945 and has been increasing since.

For budgets it jumps to a round number (the next multiple of 50 or so, perhaps so we can recognize them easily?),
skipping some numbers in the process, but that is a minor detail.

Say, [31440](https://zoek.officielebekendmakingen.nl/dossier/31440) has a [document nr 1](https://zoek.officielebekendmakingen.nl/dossier/kst-31440-1.html) and a [document nr 2](https://zoek.officielebekendmakingen.nl/dossier/kst-31440-2.html) (kst-31440-1 and kst-31440-2 if you like identifiers).
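
Under that simple reading, an identifier is just three hyphen-separated parts. A minimal sketch (the function name `split_simple_kst` is made up for illustration, and it deliberately handles only this simple form):

```python
def split_simple_kst(identifier):
    """Split an identifier of the simple form kst-DDDDD-N into (dossier, document number).

    Illustrative only: assumes exactly three hyphen-separated parts.
    """
    prefix, dossier, docnum = identifier.split('-')   # raises ValueError on more or fewer parts
    if prefix != 'kst' or len(dossier) != 5 or not dossier.isdigit():
        raise ValueError('not of the simple kst-DDDDD-N form: %r' % identifier)
    return dossier, docnum

print(split_simple_kst('kst-31440-1'))   # ('31440', '1')
print(split_simple_kst('kst-31440-2'))   # ('31440', '2')
```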

Simple, right?

Weeeell, no.

**Sometimes the year is in there too**

kst-20062007-30961-B

kst-20082009-30536-V
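
The eight digits look like two consecutive years glued together (2006-2007, 2008-2009). A hedged sketch of recognizing such a part: the helper name `split_year_pair` is hypothetical, and the consecutive-years check is an assumption based only on the two examples above.

```python
def split_year_pair(part):
    """Return (start_year, end_year) if part looks like 'YYYYYYYY' with consecutive years, else None."""
    if len(part) == 8 and part.isdigit():
        start, end = int(part[:4]), int(part[4:])
        if end == start + 1:
            return start, end
    return None

for identifier in ('kst-20062007-30961-B', 'kst-20082009-30536-V', 'kst-31440-1'):
    second_part = identifier.split('-')[1]
    print(identifier, split_year_pair(second_part))
```

The length check alone (eight characters versus five) is already enough to separate the two forms shown so far; the consecutive-year test just guards against surprises.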



**numbers do not need to be numbers**
- sometimes there's also letters, e.g. [36268](https://zoek.officielebekendmakingen.nl/dossier/36268) has 1..7, A, and B
- sometimes there's _only_ letters. 
  - Sometimes [double letters](https://zoek.officielebekendmakingen.nl/kst-20082009-22112-CR.html) (note that [22112 is huge, Nieuwe Commissievoorstellen en initiatieven van de lidstaten van de Europese Unie](https://zoek.officielebekendmakingen.nl/dossier/22112))  -- apparently it's A to Z, AA through AZ, BA through BZ, etc.

- there's stuff like [dossier 31544](https://zoek.officielebekendmakingen.nl/dossier/31544) and I'm not yet sure how that fits into the numbering/identifier distinctions
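
If the lettering really does run A to Z and then AA, AB, and so on (which is only how it appears), it is the same scheme as spreadsheet column names. A small sketch to generate it and to get a sortable position for a letter code; both function names are made up here:

```python
import itertools
import string

def letter_codes():
    """Yield A..Z, then AA..AZ, BA..BZ, and so on."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_uppercase, repeat=length):
            yield ''.join(letters)

def letter_code_position(code):
    """1-based position of a letter code in that sequence (A=1, Z=26, AA=27)."""
    position = 0
    for ch in code:
        position = position * 26 + (ord(ch) - ord('A') + 1)
    return position

print(list(itertools.islice(letter_codes(), 24, 30)))   # ['Y', 'Z', 'AA', 'AB', 'AC', 'AD']
print(letter_code_position('CR'))                        # 96
```

The position is useful as a sort key; it says nothing about how many documents actually exist, since letters may be skipped in practice.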

**identifiers should be unique, but numbers don't always show that way**
- e.g. [28965](https://zoek.officielebekendmakingen.nl/dossier/28965) seems to have three documents numbered 10
  - ...the internal identifiers happen to be `kst-28965-10`, `kst-28965-10-b1`, `kst-28965-10-b2`
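
To see which identifiers share a displayed number, one could group on everything before a trailing lowercase-letter-plus-digits part. That suffix pattern is an assumption based on the `-b1`, `-b2` (and elsewhere `-h1`) examples, not a documented rule:

```python
import collections
import re

# assumed suffix shape: a hyphen, one lowercase letter, then digits, at the very end
SUFFIX = re.compile(r'-[a-z][0-9]+$')

def group_by_base(identifiers):
    """Map each identifier with its suffix removed to the list of identifiers sharing it."""
    groups = collections.defaultdict(list)
    for identifier in identifiers:
        groups[SUFFIX.sub('', identifier)].append(identifier)
    return dict(groups)

print(group_by_base(['kst-28965-10', 'kst-28965-10-b1', 'kst-28965-10-b2']))
# {'kst-28965-10': ['kst-28965-10', 'kst-28965-10-b1', 'kst-28965-10-b2']}
```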

**the five digits can be... added to** (as part 1 already discovered)
  - For example
    - there is a dossier 33400. There is also a dossier 33400-I, 33400-III, 33400-A, and more than a dozen more
    - there is dossier 21501 (with 1 document) and
      - 21501-02     Raad Algemene Zaken en Raad Buitenlandse Zaken
      - 21501-03     Begrotingsraad
      - 21501-04     Ontwikkelingsraad
      - 21501-07     Raad voor Economische en Financiële Zaken
      - 21501-08     Milieuraad
      - 21501-20     Europese Raad
      - 21501-27     Sportraad
      - 21501-28     Defensieraad
      - 21501-32     Landbouw- en Visserijraad
      - 21501-33     Raad voor Vervoer, Telecommunicatie en Energie
      - 21501-34     Raad voor Onderwijs, Jeugd, Cultuur en Sport
      - 21501-31     Raad voor de Werkgelegenheid, Sociaal Beleid, Volksgezondheid en Consumentenzaken
      - 21501-30     Raad voor Concurrentievermogen

  - these are distinct dossiers that happen to be strongly related. In the 33400 case, they are all about the budget for 2013
  - this has its own conventions, like:
    - the split between Roman numerals and letters
    - that letters (other than I) may be skipped to keep the lettering consistent with previous years
    - you might call the way II and III are split differently an inconsistency
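
A rough way to tell these kinds of toevoeging apart, as a sketch. The categories and the function name are made up for this example, and the Roman-numeral test is deliberately restricted to I, V and X (optionally followed by one letter, as in IIA) so that letters like C are treated as letters. `I` is inherently ambiguous and is counted as a numeral here.

```python
import re

# assumption: only the symbols I, V and X occur in these Roman numerals
ROMAN = re.compile(r'^[IVX]+[A-Z]?$')    # e.g. I, IIA, XVIII
LETTER = re.compile(r'^[A-Z]$')          # e.g. A, C, J

def toevoeging_kind(toevoeging):
    """Classify a dossier toevoeging as 'digits', 'roman', 'letter' or 'other'."""
    if toevoeging.isdigit():
        return 'digits'
    if ROMAN.match(toevoeging):
        return 'roman'
    if LETTER.match(toevoeging):
        return 'letter'
    return 'other'

for t in ('I', 'IIA', 'XVIII', 'A', 'C', 'J', '02'):
    print('%-6s %s' % (t, toevoeging_kind(t)))
```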

To expand on the 33400 case:
- [33400-I](https://zoek.officielebekendmakingen.nl/dossier/33400-I)        - Vaststelling van de begrotingsstaat van de Koning
- [33400-IIA](https://zoek.officielebekendmakingen.nl/dossier/33400-IIA)     - Staten-generaal
- [33400-IIB](https://zoek.officielebekendmakingen.nl/dossier/33400-IIB)     - Overige Hoge Colleges van Staat en Kabinetten van de Gouverneurs
- [33400-III](https://zoek.officielebekendmakingen.nl/dossier/33400-III)     - Ministerie van Algemene Zaken (IIIA), Kabinet der Koningin (IIIB), Commissie van Toezicht betreffende de Inlichtingen- en Veiligheidsdiensten (IIIC)
- [33400-IV](https://zoek.officielebekendmakingen.nl/dossier/33400-IV)       - Koninkrijksrelaties
- [33400-V](https://zoek.officielebekendmakingen.nl/dossier/33400-V)         - Ministerie van Buitenlandse Zaken
- [33400-VI](https://zoek.officielebekendmakingen.nl/dossier/33400-VI)       - Ministerie van Veiligheid en Justitie
- [33400-VII](https://zoek.officielebekendmakingen.nl/dossier/33400-VII)     - Ministerie van Binnenlandse Zaken en Koninkrijksrelaties
- [33400-VIII](https://zoek.officielebekendmakingen.nl/dossier/33400-VIII)   - Ministerie van Onderwijs, Cultuur en Wetenschap (VIII)
- [33400-IX](https://zoek.officielebekendmakingen.nl/dossier/33400-IX)       - Ministerie van Financiën 
- [33400-X](https://zoek.officielebekendmakingen.nl/dossier/33400-X)         - Ministerie van Defensie (X)
- [33400-XII](https://zoek.officielebekendmakingen.nl/dossier/33400-XII)     - Ministerie van Infrastructuur en Milieu (XII)
- [33400-XIII](https://zoek.officielebekendmakingen.nl/dossier/33400-XIII)   - Ministerie van Economische Zaken, Landbouw en Innovatie 
- [33400-XV](https://zoek.officielebekendmakingen.nl/dossier/33400-XV)       - Ministerie van Sociale Zaken en Werkgelegenheid 
- [33400-XVI](https://zoek.officielebekendmakingen.nl/dossier/33400-XVI)     - Ministerie van Volksgezondheid, Welzijn en Sport (XVI)
- [33400-XVII](https://zoek.officielebekendmakingen.nl/dossier/33400-XVII)   - Buitenlandse Handel en Ontwikkelingssamenwerking (XVII) 
- [33400-XVIII](https://zoek.officielebekendmakingen.nl/dossier/33400-XVIII) - Wonen en Rijksdienst (XVIII) 
- [33400-A](https://zoek.officielebekendmakingen.nl/dossier/33400-A)         - Infrastructuurfonds 
- [33400-B](https://zoek.officielebekendmakingen.nl/dossier/33400-B)         - Gemeentefonds 
- [33400-C](https://zoek.officielebekendmakingen.nl/dossier/33400-C)         - Provinciefonds
- [33400-F](https://zoek.officielebekendmakingen.nl/dossier/33400-F)         - Diergezondheidsfonds 
- [33400-H](https://zoek.officielebekendmakingen.nl/dossier/33400-H)         - BES-fonds 
- [33400-J](https://zoek.officielebekendmakingen.nl/dossier/33400-J)         - Deltafonds

Presented like that, it just seems like good organisation -- though it arguably makes it less clear what
[33400](https://zoek.officielebekendmakingen.nl/dossier/33400) _without_ additions then is. Is it the broad stuff? Miscellaneous? Yes?
 

Further notes
- One minor implication is that when you see `kst-35302-F`, that must mean it's [dossier 35302, nr. F](https://zoek.officielebekendmakingen.nl/kst-35302-F), not dossier 35302-F.
  - ...only because that looks like an identifier - just `35302-F` takes more guesswork to figure out.

- ...and don't assume [I](https://zoek.officielebekendmakingen.nl/kst-20072008-30536-I.html) and [V](https://zoek.officielebekendmakingen.nl/kst-20082009-30536-V) are avoided as numbers

- the unusual toevoeging and the unusual document numbering can of course combine
  - so you get things like `kst-31700-XVI-H-b2` and `kst-20082009-31700-IV-E-h1`
    - in a form like `kst-21501-33-226`, which is an identifier for a specific document, you can figure out which part is which
    - but there are cases, like `21501-33`, that you can't figure out without further knowledge - is that dossier 21501 document number 33, or is that dossier 21501-33?

- Apparently before 1945 the numbering reset per year.
  We mostly don't deal with that because a lot of it isn't digital (yet?)
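
The ambiguity of a bare reference like `21501-33` can be made concrete by listing every reading of it. This sketch (hypothetical function name) only enumerates the candidates; choosing between them needs outside knowledge, such as a set of dossier numbers known to exist:

```python
def possible_readings(reference, known_dossiers=None):
    """List (dossier, document number) readings of a reference like '21501-33'.

    A document number of None means the whole reference names a dossier.
    If known_dossiers (a set of dossier numbers) is given, only readings
    whose dossier is in it are kept.
    """
    parts = reference.split('-')
    readings = []
    for i in range(1, len(parts) + 1):
        dossier = '-'.join(parts[:i])
        docnum = '-'.join(parts[i:]) or None
        readings.append((dossier, docnum))
    if known_dossiers is not None:
        readings = [r for r in readings if r[0] in known_dossiers]
    return readings

print(possible_readings('21501-33'))
# [('21501', '33'), ('21501-33', None)]
```

Since both 21501 and 21501-33 exist as dossiers, even a list of known dossiers does not settle this particular case.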



In [97]:
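def is_all_digits(s):
    " True if s consists only of ASCII digits (note: also True for an empty string) "
    return len( s.strip('0123456789') ) == 0

def has_lowercase_letter(s):
    " True if s contains at least one character that uppercasing would change "
    return ( s != s.upper())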


def parse_bekendmaking_id(s):
    ret = {}
    parts = s.split('-')

    if s.startswith('ah-tk-'):
        ret['type'] = 'ah-tk'
        parts.pop( 0 )
        parts.pop( 0 )
        ret['vergaderjaar'] = parts.pop( 0 )
        #if len(parts)>1:
        #    raise ValueError('ahtkcheck', parts)
        ret['docnum'] = '-'.join(parts)
    elif s.startswith('ah-ek-'):
        ret['type'] = 'ah-ek'
        parts.pop( 0 )
        parts.pop( 0 )
        ret['vergaderjaar'] = parts.pop( 0 )
        #if len(parts)>1:
        #    raise ValueError('ahekcheck', parts)
        ret['docnum'] = '-'.join(parts)
    elif s.startswith('h-ek-'):
        ret['type'] = 'h-ek'
        parts.pop( 0 )
        parts.pop( 0 )
        ret['vergaderjaar'] = parts.pop( 0 )
        #if len(parts)>1:
        #    raise ValueError('ahekcheck', parts)
        ret['docnum'] = '-'.join(parts)
        #print(s, ret)
    elif s.startswith('h-tk-'):
        pass
    elif s.startswith('ag-tk-'):
        pass
    elif s.startswith('kst-'):
        ret = parse_kst_id( s )
    elif parts[0] in ('stcrt', ):
        pass
    elif s.startswith('stb-'):
        pass
    elif s.startswith('trb-'): # includes interesting cases like trb-2009-mei-v1, which is part of trb-2009-mei
        pass
    elif parts[0] in ('ag', 'blg', 'kv', 'nds', 'h', 'ah'):
        doctype = parts.pop(0)   # the leading part is the type; what remains is the number
        if len(parts)==1:
            ret['type'] = doctype
            ret['docnum'] = parts[0]
        elif len(parts)==2:  # ASSUMED for now
            ret['type'] = doctype
            ret['docnum'] = '-'.join(parts) 
        else:
            raise ValueError( 'ERR1', s, parts )
    
    else:
        raise ValueError( 'ERR2', s, parts )

    return ret


def parse_kst_id(s, debug=False):
    ret = {}#{'input':s}
    dossiernum=[]
    parts = s.split('-')

    #if s in 'kst-15510 kst-15445 kst-15750 kst-LXXX-B kst-12025 kst-12023 kst-14448 kst-16753 kst-16805 kst-14672 kst-15656 kst-15608'.split():
    #    ret['_var'] = 'e'
    #    return ret

    if parts[0] == 'kst':
        ret['type'] = parts.pop(0)
    else:
        raise ValueError('Does not start with kst: %r'%s)

    if len(parts[0])==8:
        # this is a solid source, but we only sometimes have it
        # so we might as well breed the expectation you need to parse the metadata for this
        #ret['vergaderjaar'] =  
        parts.pop( 0 )

    if len(parts[0])==5:
        dossiernum.append( parts.pop( 0 ) )
    else:
        # not a five-digit dossier number; give up quietly rather than raise
        return {}
        #raise ValueError("ERR1 Don't know what to do with %r - %r"%(s, parts))

    # in the context of a kst- identifier, we know we are referring to a document so can make some assumptions
    if len(parts) == 0: 
        ret['_var'] = 'e'
        return ret
        #raise ValueError("ERR0 Don't know what to do with %r - %r"%(s, parts))
    
    elif len(parts) == 1: 
        # cases like  kst-32142-[A2E],   kst-20082009-32024-[C]
        # there must be a document number, so this must be it
        ret['docnum'] = parts.pop( 0 )
        ret['_var'] = '1'

    elif len(parts) == 2: 
        # cases like  kst-32123-[I-5],   kst-21501-[33-226] kst-20082009-31700-[IV-D]
        #             kst-32168-[3-b2],     
        if is_all_digits( parts[-1] ):          # must be a singular full document number (?) so the first part must be dossiernum
            dossiernum.append( parts.pop(0) )
            ret['docnum'] = parts.pop(0)
            ret['_var'] = '2a'
        elif has_lowercase_letter( parts[-1] ): # that's the second part of a document number
            ret['docnum'] = '-'.join(parts)
            ret['_var'] = '2b'
        else:                                   # assume last part is just a document number (so it's actually the first case again)
            dossiernum.append( parts.pop(0) )
            ret['docnum'] = parts.pop(0)
            ret['_var'] = '2c'
            #raise ValueError("ERR2 Don't know what to do with %r - %r"%(s, parts))

    elif len(parts) == 3: 
        ret['_var'] = '3'
        # cases like kst-32123-[XIV-A-b1]  
        # TODO: check we can actually assume this is always moredossiernum-docnum-moredocnum
        dossiernum.append( parts.pop(0) )
        ret['documentnum'] = '-'.join(parts)   # TODO: the other branches use the key 'docnum'; unify
        #raise ValueError("ERR3 Don't know what to do with %r - %r"%(s, parts))
    
    else:
        raise ValueError("ERR4 Don't know what to do with %r - %r"%(s, parts))

    ret['dossiernum'] = '-'.join( dossiernum )

    if not debug:
        ret.pop('_var')
    return ret



for test in 'kst-32123-I-5   kst-20082009-32024-C   kst-32142-A2E   kst-26643-144-h1   kst-32123-XIV-A-b1   kst-32168-3-b2 '.split():
    d = parse_kst_id(test)
    print( '%-25s https://zoek.officielebekendmakingen.nl/dossier/%-10s  %s'%(test, d['dossiernum'], d) )

import os
i = 0
for r, ds, fs in os.walk('/data/Docs/_/2009'):
    for fn in fs:
        if '-' in fn and '.' in fn:
            i+=1
            fn = fn.split('.',1)[0]
            #print(fn)
            try:
                parse_bekendmaking_id(fn)
            except Exception as e:
                ffn = os.path.join(r, fn)
                print(ffn, e) 

            #break

i

kst-32123-I-5             https://zoek.officielebekendmakingen.nl/dossier/32123-I     {'type': 'kst', 'docnum': '5', 'dossiernum': '32123-I'}
kst-20082009-32024-C      https://zoek.officielebekendmakingen.nl/dossier/32024       {'type': 'kst', 'docnum': 'C', 'dossiernum': '32024'}
kst-32142-A2E             https://zoek.officielebekendmakingen.nl/dossier/32142       {'type': 'kst', 'docnum': 'A2E', 'dossiernum': '32142'}
kst-26643-144-h1          https://zoek.officielebekendmakingen.nl/dossier/26643       {'type': 'kst', 'docnum': '144-h1', 'dossiernum': '26643'}
kst-32123-XIV-A-b1        https://zoek.officielebekendmakingen.nl/dossier/32123-XIV   {'type': 'kst', 'documentnum': 'A-b1', 'dossiernum': '32123-XIV'}
kst-32168-3-b2            https://zoek.officielebekendmakingen.nl/dossier/32168       {'type': 'kst', 'docnum': '3-b2', 'dossiernum': '32168'}
h-ek-20082009-31 {'type': 'h-ek', 'vergaderjaar': '20082009', 'docnum': '31'}

135044

In [1]:
import collections
import random
import time

import wetsuite.helpers.localdata
import wetsuite.helpers.notebook

The toevoeging belongs to the dossier nummer. What sorts of dossiers are there?

Let's estimate what kinds of dossiers there are,
purely based on their titles.
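
The idea in miniature, as a sketch: an ordered list of (label, keywords) rules where the first match wins. The rules are a small subset of the keywords used in the cell that follows, the titles are made-up examples rather than real dossier titles, and `classify_title` is a hypothetical name. Unlike some rules in the real cell, everything here is matched case-insensitively.

```python
import collections

# order matters: budget keywords come first so that a change to a
# begrotingsstaat is not counted as a law
RULES = [
    ('begroting', ['begrotingssta', 'slotwet', 'voorjaarsnota', 'najaarsnota']),
    ('wet',       ['wetsvoorstel', 'voorstel van wet', 'wijziging van de wet ']),
    ('verdrag',   ['verdrag']),
]

def classify_title(title):
    """Return the label of the first rule with a keyword in the title, else 'unsorted'."""
    lowered = title.lower()
    for label, keywords in RULES:
        if any(keyword in lowered for keyword in keywords):
            return label
    return 'unsorted'

made_up_titles = [
    'Wijziging van de begrotingsstaat van een ministerie',
    'Voorstel van wet tot wijziging van iets',
    'Verdrag inzake iets',
    'Brief over iets anders',
]
counts = collections.Counter(classify_title(t) for t in made_up_titles)
print(counts)
```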

In [None]:
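from wetsuite.helpers import strings   # provides contains_any_of / contains_all_of

# ks_dicts is not defined in this section; it is assumed to be the list of kamerstukdossier dicts loaded earlier
ourtypes = collections.defaultdict(list)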


for kd in ks_dicts:
    content = kd.get('content')
    if content.get('verwijderd','false') == 'true':
        ourtypes['[verwijderd]'].append( content.get('id') )
        continue 

    titel = content.get('titel')
    if titel is None:
        ourtypes['[no titel]'].append( content.get('id') )   # values are lists, so append (as for [verwijderd])
        #display( notebook.etree_visualize_selection(ksd, '*', mark_subtree=True) )
        continue
    
    if strings.contains_any_of(titel, ['EU-voorstel', 'EU voorstel', 'EU-mededeling', 'EU-trendrapport']):
        ourtypes['eu'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['begrotingssta', 'slotwet', 'voorjaarsnota','najaarsnota', 'Financieel jaarverslag'], case_sensitive=False):
        #print( 'BEGROTING %-7s %20s  %s'%(edict.get('nummer'), edict.get('updated'), titel) )
        ourtypes['begroting'].append( titel )   # up here to not accidentally count wijziging in begrotingsstaat as a law
        continue
    elif strings.contains_any_of(titel, ['omzetbelasting'], case_sensitive=False):
        #print( 'BELASTING %-7s %20s  %s'%(edict.get('nummer'), edict.get('updated'), titel) )
        ourtypes['belasting'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['wetsvoorstel', 'voorstel van wet'], case_sensitive=False):
        #print( 'LAW       %-7s %20s  %s'%(ksd.get('nummer'), ksd.get('updated'), titel) )
        ourtypes['wet'].append( titel )
        continue
    elif titel.startswith('Wet '):
        ourtypes['wet'].append( titel ) 
        continue
    
    elif strings.contains_any_of(titel, ['wetswijziging', 'wijziging van wet ', 'wijziging van de wet ', 'aanpassing van de wet',
                                 'Wijziging van de', # followed by a specifically named law   this one is fuzzier than necessary, might be better to regexp-match
                                 ], case_sensitive=False):
        ourtypes['wet'].append( titel )
        continue
    elif strings.contains_all_of(titel, ['wijziging', 'wetboek'], case_sensitive=False):
        ourtypes['wet'].append( titel )
        continue
    elif strings.contains_all_of(titel, ['wijziging', 'wetten'], case_sensitive=False):
        #print( 'LAW       %-7s %20s  %s'%(ksd.get('nummer'), ksd.get('updated'), titel) )
        ourtypes['wet'].append( titel )
        continue
    elif strings.contains_all_of(titel, ['verbeter', 'wetten'], case_sensitive=False):
        ourtypes['wet'].append( titel )
        continue
    elif strings.contains_all_of(titel, ['aanpassing', ' Wet'], case_sensitive=False):
        ourtypes['wet'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['Initiatiefnota','Interpellatie'], case_sensitive=False):
        ourtypes['discussions'].append( titel )
        continue
    elif strings.contains_any_of(titel, ['burgerinitiatief',], case_sensitive=False):
        ourtypes['discussions'].append( titel )
        continue
    elif strings.contains_any_of(titel, ['Herindeling van de gemeenten',]):
        ourtypes['local'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['mbudsman',]):
        ourtypes['ombudsman'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['Evaluatie',]):
        ourtypes['evaluatie'].append( titel )
        continue

    elif strings.contains_any_of(titel, ['Verdrag',]):
        ourtypes['verdrag'].append( titel )
        continue

    else:
        ourtypes['unsorted'].append( titel )
        #print( 'DONTKNOW %-7s %20s  %s'%(edict.get('nummer'), edict.get('updated'), titel) )
        continue
        #if re.search('', titel):

    #sru_openpub.search_retrieve_many( 'w.dossiernummer=%s'%edict.get('nummer'), callback=op_callback )


for typ, title_list in ourtypes.items():
    #if typ=='unsorted': # cases for which the title isn't a strong indication -- fair enough, but print them to see if there's any patterns we're missing
    #    pprint.pprint( title_list )
    print(  '%-5d %s'%( len(title_list), typ )  )
#pprint.pprint( dtypes )

...not a lot.  

The relations turn out to be a little easier to get at in other ways. For example:

# So

In [2]:
# if you haven't already:
# !pip3 install tkapi

import tkapi, tkapi.document   
from tkapi.document import DocumentSoort
api = tkapi.TKApi()

In [3]:
tkapi_docs = wetsuite.helpers.localdata.LocalKV('tkapi_docs.db', key_type=str, value_type=bytes)

In [4]:
all_dossiers = api.get_dossiers()
# If we wanted to download all documents' actual contents...
#  you might prefer to do this per soort, to avoid gigabytes of what you don't want
all_docs = []

for dossier in all_dossiers:        # a few thousand of them, fetching takes a handful of seconds, but...
    all_docs.extend( dossier.documenten ) # fetching the related document metadata will take around fifteen minutes        

In [20]:
print( "Documents to fetch: %d"%len(all_docs))

# fetching all the actual content would probably take hours, also depending on how nice we are being to the servers
count_cached, count_fetched = 0, 0
pb = wetsuite.helpers.notebook.progress_bar( len(all_docs) )
for document in all_docs:
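    # the metadata at document.url is fetched as well (so it ends up in the store), but its result is not used here
    _, _ = wetsuite.helpers.localdata.cached_fetch( tkapi_docs, document.url )
    # the counts below reflect only the actual file at document.bestand_url
    bytestring, came_from_cache = wetsuite.helpers.localdata.cached_fetch( tkapi_docs, document.bestand_url )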
    if came_from_cache:
        count_cached += 1
    else: 
        count_fetched += 1
    pb.description = f"fetched {count_fetched}, cached {count_cached}  ({(100.*count_cached)/(count_cached+count_fetched):.0f}% cached)  "    
    pb.value += 1

Documents to fetch: 181111


  0%|          | 0/181111 [00:00<?, ?it/s]
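
For readers unfamiliar with the pattern: the loop relies on a fetch that first looks in a local store and only goes to the network on a miss, returning both the data and whether it came from the cache. The following is a generic stdlib sketch of that idea, not the wetsuite implementation; `cached_fetch_sketch`, the plain dict store, and the injectable `fetch` function are all stand-ins chosen so the example runs without network access.

```python
import time

def cached_fetch_sketch(store, url, fetch, delay_sec=0.0):
    """Return (data, came_from_cache), fetching and storing only on a cache miss."""
    if url in store:
        return store[url], True
    data = fetch(url)
    store[url] = data
    time.sleep(delay_sec)     # be polite to the server, only after real fetches
    return data, False

calls = []
def fake_fetch(url):
    " stand-in for a real HTTP request "
    calls.append(url)
    return b'content of ' + url.encode('utf8')

store = {}
print(cached_fetch_sketch(store, 'https://example.com/doc1', fake_fetch))   # miss: fetched and stored
print(cached_fetch_sketch(store, 'https://example.com/doc1', fake_fetch))   # hit: served from the store
print(len(calls))                                                           # the fake fetch ran once
```

Because interrupted runs keep what they already stored, a loop over hundreds of thousands of documents can simply be restarted and will skip ahead quickly.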

In [None]:
# for typ, doclist in soorten_by_count:
#     if 'VERSLAG' in typ:
#         print( f'{len(doclist):<10d} {typ}' )
    
#     if typ == 'VERSLAG_VAN_EEN_COMMISSIEDEBAT':
#         count_fetched, count_cached = 0, 0
#         pb = wetsuite.helpers.notebook.progress_bar( len(doclist), description=str(typ) )

#         for doc in doclist:
#             try:
#                 for dossier in doc.dossiers:
#                     toe_s = dossier.toevoeging is not None  and  '-%s'%dossier.toevoeging  or  '   ' # apologies for the syntax-fu
#                     #print( 'Document %s belongs to %5s%-5s  (%s)'%(doc.nummer,  dossier.nummer, toe_s, dossier.titel),  )
#                     #print( '  ', doc.url )
#                     #print( '  ',doc.bestand_url )

#                     bytestring, came_from_cache = wetsuite.helpers.localdata.cached_fetch( tkapi_docs, doc.url)
#             except Exception as e:
#                 print('ERR: '+str(e) )
            
#             #time.sleep(0.1)
#             pb.value += 1

In [18]:
tkapi_docs.summary(True)

{'size_bytes': 12124487680,
 'size_readable': '12.1G',
 'num_items': 362207,
 'avgsize_bytes': 33474}

In [21]:
# Out of interest, what kind of documents are they in the first place?
# 
filetypes = collections.defaultdict(int)
import magic

for url, value in tkapi_docs.items(): #.random_sample(1000):
    if 'Resource()' in url: # the store contains both the fetched metadata and the fetched resource
        descr = magic.from_buffer( value )
        if 'Composite Document File' in descr:
            descr = 'Early MS Office'
        filetypes[ descr ] +=1

In [71]:
filetypes

defaultdict(int,
            {'PDF document, version 1.4': 159982,
             'PDF document, version 1.3': 20498,
             'PDF document, version 1.2': 409,
             'Earlier MSOffice': 49,
             'PDF document, version 1.6': 48,
             'Microsoft Word 2007+': 98,
             'PDF document, version 1.5': 6,
             'PDF document, version 1.7': 6})
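
Descriptions like these are derived from the first bytes of each file. As a rough, stdlib-only illustration of what is being recognized (not how libmagic is implemented, and far less thorough), the three families seen here can be told apart by their well-known signatures; `sniff` is a made-up helper:

```python
import re

# signature of the 'Composite Document File' container used by older MS Office formats
OLE2_SIGNATURE = bytes.fromhex('D0CF11E0A1B11AE1')

def sniff(data):
    """Very rough file type guess from the leading bytes."""
    m = re.match(rb'%PDF-(\d\.\d)', data)
    if m:
        return 'PDF ' + m.group(1).decode('ascii')
    if data.startswith(OLE2_SIGNATURE):
        return 'OLE2 compound file'
    if data.startswith(b'PK\x03\x04'):
        return 'ZIP container (possibly docx)'
    return 'unknown'

print(sniff(b'%PDF-1.4\n%\xe2\xe3\xcf\xd3'))   # PDF 1.4
print(sniff(OLE2_SIGNATURE + b'\x00' * 8))     # OLE2 compound file
print(sniff(b'PK\x03\x04rest'))                # ZIP container (possibly docx)
```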

In [24]:
# Which volgnummer values occur, and how often?
vn = collections.defaultdict(int)
for doc in all_docs:
    vn[ doc.volgnummer ] += 1
sorted( vn.items(), key=lambda x:x[0] )

[(1, 6096),
 (2, 5617),
 (3, 5184),
 (4, 4298),
 (5, 3941),
 (6, 3567),
 (7, 3034),
 (8, 2633),
 (9, 2405),
 (10, 2183),
 (11, 2011),
 (12, 1879),
 (13, 1738),
 (14, 1627),
 (15, 1522),
 (16, 1451),
 (17, 1376),
 (18, 1308),
 (19, 1235),
 (20, 1174),
 (21, 1109),
 (22, 1058),
 (23, 1006),
 (24, 968),
 (25, 926),
 (26, 894),
 (27, 872),
 (28, 842),
 (29, 811),
 (30, 784),
 (31, 762),
 (32, 736),
 (33, 721),
 (34, 712),
 (35, 695),
 (36, 684),
 (37, 673),
 (38, 662),
 (39, 649),
 (40, 639),
 (41, 631),
 (42, 634),
 (43, 625),
 (44, 619),
 (45, 608),
 (46, 601),
 (47, 594),
 (48, 586),
 (49, 583),
 (50, 580),
 (51, 579),
 (52, 573),
 (53, 561),
 (54, 549),
 (55, 543),
 (56, 536),
 (57, 532),
 (58, 532),
 (59, 524),
 (60, 520),
 (61, 511),
 (62, 507),
 (63, 507),
 (64, 504),
 (65, 499),
 (66, 494),
 (67, 493),
 (68, 487),
 (69, 484),
 (70, 477),
 (71, 471),
 (72, 470),
 (73, 461),
 (74, 461),
 (75, 455),
 (76, 455),
 (77, 448),
 (78, 439),
 (79, 435),
 (80, 423),
 (81, 425),
 (82, 423),
 ...

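The volgnummer values shown are all plain integers, which raises the question of where lettered documents end up. I ask because I know of cases like 
    https://zoek.officielebekendmakingen.nl/kst-35302-F
as part of 
    https://zoek.officielebekendmakingen.nl/dossier/35302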


In [35]:
# Let's get that specific dossier:
from tkapi.dossier import Dossier
dossier_filter = Dossier.create_filter()
dossier_filter.filter_nummer('35302')
for dossier in api.get_dossiers( dossier_filter ):
    print('-')
    for document in sorted( dossier.documenten, key=lambda x:x.volgnummer ):
        print(document.volgnummer, end=' ')
# TODO: figure out whether that's the API or tkapi

-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 

..or even more visibly,
    https://zoek.officielebekendmakingen.nl/dossier/34211
just seems to have no documents:

In [43]:
dossier_filter = Dossier.create_filter()
dossier_filter.filter_nummer('34211')
for dossier in api.get_dossiers( dossier_filter ):
    print( len(dossier.documenten) )
    for document in sorted( dossier.documenten, key=lambda x:x.volgnummer ):
        print(document.volgnummer)

0
