# Purpose of this notebook

Getting some value lists out of [rechtspraak.nl](https://www.rechtspraak.nl/),
in part for wordlist-like use, in part for use in search queries.

For example, there is a list of categories at https://data.rechtspraak.nl/Waardelijst/Rechtsgebieden

These strings and codes are things you may want to search for, find in metadata,
or present as a list for context, so we would like them in data form.

## Values for filters

### Rechtsgebieden

In [1]:
import wetsuite.datacollect.rechtspraaknl

rgb = wetsuite.datacollect.rechtspraaknl.parse_rechtsgebieden() # fetches and parses

# rgb is a dict, from identifiers to a list of names, which will e.g. contain 
#     'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht':                   ['Bestuursrecht']
# as well as
#     'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_mededingingsrecht': ['Mededingingsrecht', 'Bestuursrecht']

# Meaning Bestuursrecht is a grouping of this and more, and Mededingingsrecht one of several specific parts of Bestuursrecht


The shape of the data above is enough if you only need to look up such identifiers.
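
A minimal sketch of such lookups, assuming the dict shape described above. The two-entry `rgb_sample` is hand-copied from the comments rather than fetched, and `identifiers_for_name` is a hypothetical helper, not part of wetsuite:

```python
# Hand-copied sample in the same shape as rgb (identifier -> list of names)
rgb_sample = {
    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht': ['Bestuursrecht'],
    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_mededingingsrecht': ['Mededingingsrecht', 'Bestuursrecht'],
}

def identifiers_for_name(rgb, name):
    """Hypothetical helper: identifiers whose most specific name (first in the list) equals name."""
    return [identifier for identifier, namelist in rgb.items() if namelist[0] == name]

# identifier -> most specific name
specific_name = rgb_sample['http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_mededingingsrecht'][0]

# name -> identifier(s)
ids = identifiers_for_name(rgb_sample, 'Bestuursrecht')
```

Matching on the first name only means that a search for 'Bestuursrecht' returns the broad category itself, not every specific part of it.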

If you wanted to present it as grouped / tree-like categories, you would need some slightly awkward code.
Said code might look like:

In [2]:

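# For that purpose, a little restructuring is useful.
# Done in two passes, so that it does not depend on a broad group being listed before its specific parts
rechtsgebieden_groups = {}
for identifier, namelist in rgb.items():
    if len(namelist)==1:    # a broad group on its own
        broad, = namelist
        rechtsgebieden_groups[broad] = {'identifier':identifier, 'name':broad, 'specific':[]}
for identifier, namelist in rgb.items():
    if len(namelist)==2:    # [specific name, name of the broad group it belongs to]
        specific, broad = namelist
        rechtsgebieden_groups[broad]['specific'].append( {'identifier':identifier, 'name':specific} )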



In [3]:
# which you might then use like this:

for groupkey, groupdetails in rechtsgebieden_groups.items():
    print( '%-30s      (%s)'%(groupdetails['name'], groupdetails['identifier']) )
    for subgroup in groupdetails['specific']:
        print( '    %-30s  (%s)'%(subgroup['name'], subgroup['identifier']) )

Bestuursrecht                       (http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht)
    Ambtenarenrecht                 (http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_ambtenarenrecht)
    Belastingrecht                  (http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_belastingrecht)
    Bestuursprocesrecht             (http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_bestuursprocesrecht)
    Bestuursstrafrecht              (http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_bestuursstrafrecht)
    Europees bestuursrecht          (http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_europeesBestuursrecht)
    Mededingingsrecht               (http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_mededingingsrecht)
    Omgevingsrecht                  (http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_omgevingsrecht)
    Socialezekerheidsrecht          (http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_socialezekerheidsrecht)
    Vreemdelingenrecht           

## Fetch names for gerechtscodes used in ECLIs

It seems that here we have to be a little more creative, and scrape it from a webpage like the one mentioned in the code.

Note that the page this uses is not an authoritative list: it previously contained a typo (RBGL instead of RBGEL for Rechtbank Gelderland). It is still pretty good,
and it has extra information that can be practical.
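
For context on where these codes turn up: an ECLI consists of five colon-separated fields (the literal `ECLI`, a country code, the court code, a year, and a number), so the gerechtscode is the third field. A minimal sketch of pulling it out; `gerechtscode_from_ecli` is a hypothetical helper and the example ECLI is made up for illustration:

```python
def gerechtscode_from_ecli(ecli):
    """Hypothetical helper: return the court code field of an ECLI string, uppercased."""
    parts = ecli.strip().split(':')
    # expect exactly: 'ECLI', country, court code, year, number
    if len(parts) != 5 or parts[0].upper() != 'ECLI':
        raise ValueError('does not look like an ECLI: %r' % ecli)
    return parts[2].upper()

code = gerechtscode_from_ecli('ECLI:NL:RBGEL:2019:1234')   # made-up number, for illustration only
```

The resulting code is what you would look up in the scraped list.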

In [4]:
# BeautifulSoup (bs4) is a library that makes it easier to scrape values from HTML with relatively simple code.
import bs4
import wetsuite.helpers.net
bytedata = wetsuite.helpers.net.download('https://www.rechtspraak.nl/Uitspraken/Paginas/Volledige-lijst-Nederlandse-gerechtscodes.aspx')
soup = bs4.BeautifulSoup( bytedata, 'html.parser' )  # naming the parser avoids a warning, and avoids results depending on which parsers are installed

In [5]:
gerechtcode_data = {}  # example item:    'AGAMS': {'abbrev':'AGAMS', 'name':'Ambtenarengerecht Amsterdam', 'extra': ['opgeheven','ambtenarengerecht'],}

# Here we try to imitate the grouping used by the rechtspraak.nl website's search, which knows more than I do
# TODO: have an 'everything else' category
groups = {  # grouping name -> [ list of gerechtcodes in it ]
    'Hoge Raad':['HR', 'PHR'],
    'Raad van State':['RVS'],
    'Centrale Raad van Beroep':['CRVB'],
    'College van Beroep voor het bedrijfsleven':['CBB'],
    'Gerechtshoven':[],
    'Rechtbanken':[],
    'Andere instanties binnen het koningkrijk':[],
}


table = soup.select('table.rnl-rteTable-default')[0]
for tr in table.find_all('tr'):
    curdata = { 'extra':[] }  # extra is a list of tags we add ourselves, intended to look for specific things a little easier
    tds = tr.find_all('td')
    if len(tds)==2: # implicitly ignores the header row, those are th
        td1, td2 = tds

        abbrev = td1.text.strip()   # some cells carry stray whitespace, which would otherwise end up in codes like 'RBZLY '
        curdata['abbrev'] = abbrev

        name = td2.text   # note that there are sometimes elements in here, for tooltips.  TODO: deal with that a little more explicitly, and check the result
        if ' (opgeheven)' in name:
            name = name.replace(' (opgeheven)','').strip()
            curdata['extra'].append( 'opgeheven' )
        curdata['name'] = name

        # group them by their codes     TODO: check these
        if abbrev.startswith('GH'):
            curdata['extra'].append('gerechtshof')
            groups['Gerechtshoven'].append( abbrev )
             
        if abbrev.startswith('RB'):
            curdata['extra'].append( 'rechtbank' )
            groups['Rechtbanken'].append( abbrev )

        if abbrev.startswith('KT'):
            curdata['extra'].append( 'kantongerecht' )
        if abbrev.startswith('AG'):
            curdata['extra'].append( 'ambtenarengerecht' )
        if abbrev.startswith('RVB') or abbrev.startswith('ORB'):
            curdata['extra'].append( 'raadvanberoep' )

        if abbrev.startswith('O'): # Aruba, Curacao, Sint Maarten, Bonaire, Sint Eustatius, Saba 
            curdata['extra'].append( 'eilanden' ) # O... seems to overlap with fairly few of the abovementioned categories, but TODO: look at that
            groups['Andere instanties binnen het koningkrijk'].append( abbrev )

        # sort of special cases, add so that you can more easily look for these
        if abbrev == 'HR':
            curdata['extra'].append( 'hr' ) 
        if abbrev == 'PHR':
            curdata['extra'].append( 'hr' ) 
        if abbrev == 'CBB':
            curdata['extra'].append( 'cbb' ) 
        if abbrev == 'CRVB':
            curdata['extra'].append( 'crvb' ) 
        if abbrev == 'RVS':
            curdata['extra'].append( 'rvs' ) 

        # add some groupings that may be useful
        if 'tucht' in name.lower() or 'raad van discipline' in name.lower():
            curdata['extra'].append( 'tucht' ) 

        if 'notarissen' in name.lower() or 'notariaat' in name.lower(): # TODO: not sure this is a useful grouping
            curdata['extra'].append( 'notar' ) 

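        if abbrev == 'XX': 
            curdata['extra'].append( 'xx' ) 
            name = 'No code or not yet assigned, possibly international' # TODO: see if that's accurate wording.
            curdata['name'] = name   # the name was already stored above, so changing only the local variable would have no effect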

        # Which ones have not been sorted?
        if len(curdata['extra'])==0 or (len(curdata['extra'])==1 and curdata['extra'][0]=='opgeheven'):
            #print(abbrev, name)
            curdata['extra'].append( 'unsorted' ) 
        #   which are currently:
        #     CVBSTUF  College van Beroep Studiefinanciering
        #     DETARCO  Tariefcommissie
        #     TACAKN   Accountantskamer NIVRA
        #     TAHVD    Hof van Discipline
        #     TDIVBC   Veterinair Beroepscollege
        #     TGDKG    Kamer voor Gerechtsdeurwaarders

        gerechtcode_data[abbrev] = curdata

## Review the data we just made

In [6]:
gerechtcode_data

{'AGAMS': {'extra': ['opgeheven', 'ambtenarengerecht'],
  'abbrev': 'AGAMS',
  'name': 'Ambtenarengerecht Amsterdam'},
 'AGARN': {'extra': ['opgeheven', 'ambtenarengerecht'],
  'abbrev': 'AGARN',
  'name': 'Ambtenarengerecht Arnhem'},
 'AGGRO': {'extra': ['opgeheven', 'ambtenarengerecht'],
  'abbrev': 'AGGRO',
  'name': 'Ambtenarengerecht Groningen'},
 'AGHAA': {'extra': ['opgeheven', 'ambtenarengerecht'],
  'abbrev': 'AGHAA',
  'name': 'Ambtenarengerecht Haarlem'},
 'AGROE': {'extra': ['opgeheven', 'ambtenarengerecht'],
  'abbrev': 'AGROE',
  'name': 'Ambtenarengerecht Roermond'},
 'AGROT': {'extra': ['opgeheven', 'ambtenarengerecht'],
  'abbrev': 'AGROT',
  'name': 'Ambtenarengerecht Rotterdam'},
 'AGSGR': {'extra': ['opgeheven', 'ambtenarengerecht'],
  'abbrev': 'AGSGR',
  'name': "Ambtenarengerecht 's-Gravenhage"},
 'AGSHE': {'extra': ['opgeheven', 'ambtenarengerecht'],
  'abbrev': 'AGSHE',
  'name': "Ambtenarengerecht 's-Hertogenbosch"},
 'AGUTR': {'extra': ['opgeheven', 'ambtenar
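
The `extra` tags are there to make selections easier. A minimal sketch of filtering on them, using a hand-made two-entry sample in the shape shown above; `codes_with_tags` is a hypothetical helper, not something wetsuite provides:

```python
# Hand-made sample in the same shape as gerechtcode_data
sample = {
    'AGAMS': {'extra': ['opgeheven', 'ambtenarengerecht'], 'abbrev': 'AGAMS', 'name': 'Ambtenarengerecht Amsterdam'},
    'RBGEL': {'extra': ['rechtbank'],                      'abbrev': 'RBGEL', 'name': 'Rechtbank Gelderland'},
}

def codes_with_tags(data, want=(), avoid=()):
    """Hypothetical helper: sorted codes that have all of the want tags and none of the avoid tags."""
    return sorted(
        abbrev for abbrev, details in data.items()
        if set(want) <= set(details['extra']) and not (set(avoid) & set(details['extra']))
    )

still_existing = codes_with_tags(sample, avoid=['opgeheven'])
old_ambtenarengerechten = codes_with_tags(sample, want=['opgeheven', 'ambtenarengerecht'])
```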

In [8]:
import pprint
with open( 'gerechtcodes.py', 'w' ) as f:
    f.write( 'data = ' + pprint.pformat( gerechtcode_data) )
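
Because `pprint.pformat` writes the dict as a plain Python literal, the file can be imported as a module, or read back without executing it. A minimal sketch of the latter using `ast.literal_eval`; it writes a one-entry sample to a temporary directory (an assumption made for illustration, so that it does not touch the real `gerechtcodes.py`):

```python
import ast
import os
import pprint
import tempfile

sample = {'AGAMS': {'extra': ['opgeheven', 'ambtenarengerecht'],
                    'abbrev': 'AGAMS',
                    'name': 'Ambtenarengerecht Amsterdam'}}

# write in the same 'data = {...}' form as above, but to a temporary location
path = os.path.join(tempfile.mkdtemp(), 'gerechtcodes.py')
with open(path, 'w', encoding='utf-8') as f:
    f.write('data = ' + pprint.pformat(sample))

# read it back: drop the 'data = ' prefix and parse the rest as a literal
with open(path, encoding='utf-8') as f:
    text = f.read()
prefix = 'data = '
if not text.startswith(prefix):
    raise ValueError('unexpected file contents')
loaded = ast.literal_eval(text[len(prefix):])
```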

In [27]:
groups

{'Hoge Raad': ['HR', 'PHR'],
 'Raad van State': ['RVS'],
 'Centrale Raad van Beroep': ['CRVB'],
 'College van Beroep voor het bedrijfsleven': ['CBB'],
 'Gerechtshoven': ['GHAMS',
  'GHARL',
  'GHARN',
  'GHDHA',
  'GHLEE',
  'GHSGR',
  'GHSHE'],
 'Rechtbanken': ['RBALK',
  'RBALM',
  'RBAMS',
  'RBARN',
  'RBASS',
  'RBBRE',
  'RBDHA',
  'RBDOR',
  'RBGEL',
  'RBGRO',
  'RBHAA',
  'RBLEE',
  'RBLIM',
  'RBMAA',
  'RBMID',
  'RBMNE',
  'RBNHO',
  'RBNNE',
  'RBOBR',
  'RBONE',
  'RBOVE',
  'RBROE',
  'RBROT',
  'RBSGR',
  'RBSHE',
  'RBUTR',
  'RBZLY ',
  'RBZUT',
  'RBZWB',
  'RBZWO'],
 'Andere instanties binnen het koningkrijk': ['OCHM',
  'OGAACMB',
  'OGANA',
  'OGEAA',
  'OGEAB',
  'OGEABES',
  'OGEAC',
  'OGEAM',
  'OGEANA',
  'OGHACMB',
  'OGHNAA',
  'OHJNA',
  'ORBAACM',
  'ORBANAA',
  'ORBBACM',
  'ORBBNAA']}
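
For lookups in the other direction, from a gerechtscode to the group it was placed in, the mapping can be inverted. A minimal sketch on a hand-copied subset of the output above (`groups_sample` and `group_of_code` are illustrative names):

```python
# Hand-copied subset of groups (grouping name -> list of gerechtscodes)
groups_sample = {
    'Hoge Raad': ['HR', 'PHR'],
    'Raad van State': ['RVS'],
}

# invert to: gerechtscode -> grouping name
group_of_code = {}
for groupname, codes in groups_sample.items():
    for code in codes:
        group_of_code[code.strip()] = groupname   # strip, in case a code carries stray whitespace
```

Codes that were not placed in any group are simply absent, so `group_of_code.get(code)` returns `None` for them.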