## Purpose of this notebook

This notebook lists the various potential sources of data we have found, including many we chose not to use for datasets (yet?).


## European data

### EUR-Lex


**What**: 
- EU law, case law, documents from the [Official Journal](https://en.wikipedia.org/wiki/Official_Journal_of_the_European_Union) (OJ / OJEU), other public EU documents
- mostly as HTML, PDF 
- often in multiple languages (up to the [(currently) 24 official EU languages](https://european-union.europa.eu/principles-countries-history/languages_en))
  - potential parallel data?
  - which languages may depend on context; none of them are a given



**Accessible via**: Site, API (SPARQL or SOAP)

**Identifiers used**: [CELEX](#celex_notes), [ECLI](#ecli_notes), [ELI](#eli_notes)

**Source and/or responsible parties**: Publications Office of the European Union

**Content license**: TODO: read again, but looks like [CC-BY for contents, PD for metadata](https://eur-lex.europa.eu/content/legal-notice/legal-notice.html)

<!--
https://eur-lex.europa.eu/search.html?scope=EURLEX&text=1308%2F2013&lang=nl&type=quick&qid=1675872839170


where 
: reference is probably something like <tt>[[CELEX]]:32013R1308</tt>
: and language is something like EN or NL ()

EUR-Lex URLs look like
: https://eur-lex.europa.eu/legal-content/''language''/TXT/?uri=''reference''
: https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:32013R1308
: https://eur-lex.europa.eu/legal-content/NL/TXT/PDF/?uri=CELEX:32013R1308


The document structure looks very rendered, but you can pick the overall structure out of it.
-->


There are also [data interfaces](https://eur-lex.europa.eu/content/welcome/data-reuse.html)



See also: 
* https://eur-lex.europa.eu/homepage.html?locale=en


### HUDOC

**What**: European Court of Human Rights (ECHR)'s documents

**Accessible via**: Webpage (though its own search API might be used)

**Identifiers used**: its own? (including application numbers)

**Source and/or responsible parties**: European Court of Human Rights (ECHR)  (VERIFY)

**Content license**: [this page mentions](https://echr.coe.int/Pages/home.aspx?p=disclaimer) "provided the source is acknowledged (ECHR-CEDH) and the reproduction is made for private use or for the purposes of information and education in connection with the Court’s activities [...] and that any such reproduction is free of charge. [...] Users should nevertheless be aware that certain information and texts may be protected under intellectual property law, in particular by copyright." 

<!--
HUDOC has an application number,
e.g.
https://hudoc.echr.coe.int/eng#{%22itemid%22:[%22001-72205%22]}
refers to 
https://hudoc.echr.coe.int/eng#{%22appno%22:[%2231465/96%22]}
-->

See also:
* https://hudoc.echr.coe.int

* http://www.echr.coe.int/Documents/HUDOC_Manual_ENG.PDF

* https://en.wikipedia.org/wiki/European_Convention_on_Human_Rights


### Eurovoc

A controlled vocabulary in the 'please use this form to refer to X' sense, so possibly also a basis for semantics in the linked-data sense?


## Dutch data

### Legal restrictions on data

As [Bestanden en hergebruik](https://www.overheid.nl/help/officiele-bekendmakingen/bestanden-en-hergebruik#OEP005) points out,
[Artikel 11 van de Auteurswet, BWBR0001886](http://wetten.overheid.nl/jci1.3:c:BWBR0001886&hoofdstuk=I&paragraaf=3&artikel=11) says,
**"Er bestaat geen auteursrecht op wetten, besluiten en verordeningen, door de openbare macht uitgevaardigd, noch op rechterlijke uitspraken en administratieve beslissingen"**
(roughly: "No copyright exists on laws, decrees and ordinances issued by public authorities, nor on judicial rulings and administrative decisions").

This seems to mean that the contents of 
laws (wetten),
ordinances (verordeningen),
and court judgments (vonnissen van rechtbanken)
may have an author, but no (copy)rights can be reserved on them.


While implicitly true from that law, some sources note this more explicitly. 
For example, [this entry](https://data.overheid.nl/dataset/basis-wetten-bestand) marks the Basis Wetten Bestand as [CC0](https://en.wikipedia.org/wiki/Creative_Commons_license#Zero_/_public_domain),
which seems purely a reiteration of intent, of something that was already free to use due to the mentioned law.


Other government publications __may__ not be public domain. 

Apparently you can still copy and publish such data - __except__ if those specific copyrights are explicitly mentioned to be reserved (voorbehouden).

<!-- -->

Consider for example the nature of varied [ZBO](https://nl.wikipedia.org/wiki/Zelfstandig_bestuursorgaan)s. 

Most seem to _effectively_ be government, but a few are edge cases, so you probably want to check out what they say about licensing.

For example, 
* [acm.nl](https://www.acm.nl/nl/copyright) seems to reiterate, roughly, 'free to use/publish except where those rights are mentioned to be reserved. Like our logo.' 
* kansspelautoriteit.nl seems to put copyright on its website, but I do believe that applies to the site; the sanctions themselves are decisions that seem to be covered under that Article 11 (VERIFY). 

Ask your local friendly expert about such details.



## Some more technical notes

### Notes on structure and relations


#### Kamerstukken, dossiers

A kamerstuk is a document meant for exchange between parliament and government. (By law, kamerstukken should be public and preferably digital.)

This includes reports, letters, law proposals, and various other things.



Notes:
* It sometimes matters that on overheid.nl the data model splits kamerstukken into a regular Kamerstuk and a Bijlage ('appendix'), so 'Kamerstuk' is used both in the broad sense (either of these) and in the specific sense ('...that is not an appendix').


**Kamerstukdossiers** (a.k.a. dossiers, kamerdossiers) are a collection of Kamerstukken about a topic. 

They have unique identifiers (**Kamerstuknummer**, **dossiernummer**, or **vetnummer**), 
which are roughly sequential but with some special cases like the national budget.


In some cases these seem to be split into sub-dossiers,
see e.g. [this result for dossier 36200](https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Kamerstukdossier?$filter=(Nummer%20eq%2036200%20)) - it seems the Nummer + Toevoeging combination is what actually makes dossiers distinct (VERIFY), but e.g. [searching here](https://zoek.officielebekendmakingen.nl/dossier/36200) does not show that distinction.

<!--
-->

This includes things like
* law proposals and changes (e.g. [dossier 34864](https://zoek.officielebekendmakingen.nl/dossier/34864))

* budgets and other finances (e.g. [dossier 36200](https://zoek.officielebekendmakingen.nl/dossier/36200), [dossier 36100](https://zoek.officielebekendmakingen.nl/dossier/36100), [dossier 34000](https://zoek.officielebekendmakingen.nl/dossier/34000), [dossier 33400](https://zoek.officielebekendmakingen.nl/dossier/33400))
  * split into sub-dossiers?

* a number of wider and longer-term ones, sometimes with hundreds to thousands of documents (e.g. Belastingdienst ([dossier 31066](https://zoek.officielebekendmakingen.nl/dossier/31066)), Jeugdzorg ([dossier 31839](https://zoek.officielebekendmakingen.nl/dossier/31839)), Kinderopvang ([dossier 31322](https://zoek.officielebekendmakingen.nl/dossier/31322)), Air quality ([dossier 30175](https://zoek.officielebekendmakingen.nl/dossier/30175)), Gaswinning ([dossier 33529](https://zoek.officielebekendmakingen.nl/dossier/33529)), Vreemdelingenbeleid ([dossier 19637](https://zoek.officielebekendmakingen.nl/dossier/19637)))
* and/or fairly specific ones, e.g. the reaction to a Queen's Day incident ([dossier 32054](https://zoek.officielebekendmakingen.nl/dossier/32054)), parking ([dossier 31529](https://zoek.officielebekendmakingen.nl/dossier/31529)), some effects of the EHEC bacteria ([dossier 32801](https://zoek.officielebekendmakingen.nl/dossier/32801)), formalities around a [state-owned casino](https://en.wikipedia.org/wiki/Holland_Casino) ([dossier 34576](https://zoek.officielebekendmakingen.nl/dossier/34576)), waiting times in healthcare ([dossier 25170](https://zoek.officielebekendmakingen.nl/dossier/25170)), etc. 


Notes:
* There seems to be no categorization (VERIFY), to e.g. select just law proposals/changes
  * but there are other ways, like the title, and that the first document in those always seems to be a Koninklijke boodschap

* A kamerstuk _can_ be part of multiple dossiers, and the reference to that kamerstuk can be slightly less than obvious.

* e.g. officielebekendmakingen.nl URLs 
  * have fixed structure, like https://zoek.officielebekendmakingen.nl/kst-30175-2.pdf is the second document in that dossier ([dossier 30175](https://zoek.officielebekendmakingen.nl/dossier/30175))
  * if they are new, they may have a temporary number that may change. The older number will redirect

* There are on the order of 150k kamerstukken (VERIFY), sorted into 6000+ dossiers (VERIFY).

* You could link to (replace NUMMER) 
  * metadata like https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Kamerstukdossier?$filter=(Nummer%20eq%20NUMMER)
  * site search like https://zoek.officielebekendmakingen.nl/dossier/NUMMER
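
The URL patterns noted above can be wrapped in a few helpers. This is a minimal sketch: the function names are made up for illustration, and it only covers plain numeric dossier numbers, not the Nummer + Toevoeging combinations mentioned earlier.

```python
from urllib.parse import quote

ODATA_BASE = "https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Kamerstukdossier"
SEARCH_BASE = "https://zoek.officielebekendmakingen.nl"


def dossier_metadata_url(nummer: int) -> str:
    """OData query for Kamerstukdossier entries with this Nummer (hypothetical helper)."""
    filter_expr = f"(Nummer eq {nummer})"
    # Percent-encode the spaces but keep the parentheses, as in the example URL above.
    return f"{ODATA_BASE}?$filter={quote(filter_expr, safe='()')}"


def dossier_search_url(nummer: int) -> str:
    """Site search page listing the documents in a dossier (hypothetical helper)."""
    return f"{SEARCH_BASE}/dossier/{nummer}"


def kamerstuk_pdf_url(dossier: int, volgnummer: int) -> str:
    """PDF of the n-th document in a dossier, following the kst-DOSSIER-N pattern."""
    return f"{SEARCH_BASE}/kst-{dossier}-{volgnummer}.pdf"


print(dossier_metadata_url(36200))
print(dossier_search_url(30175))
print(kamerstuk_pdf_url(30175, 2))
```

Whether a given URL actually resolves still needs checking per dossier, particularly for new documents with temporary numbers.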

See also:
- https://nl.wikipedia.org/wiki/Kamerstuk

- https://opendata.tweedekamer.nl/documentatie/kamerstukdossier

#### Wetsvoorstellen

<!--

The process of creating a law has a handful of requried or just typical steps (including e.g. advice from Raad van State, though not for most budget things), and a final vote,
and if it leads to discussion can become a lot more complex, with amendments, summaries of discussions, motions (voted), reactions, 

Officially involves the king/queen, but they aren't literally present so we often talk about the kabinet (VERIFY) doing this.
That said, these dossier start with a Koninklijke boodschap


These steps are official letters, grouped into a numbered dossier (but this is only one type of dossier (VERIFY)).

That dossier number can be found in a few places. [[#Tweede_Kamer_Open_Data|Tweede Kamer Open Data]] lists the dossiers by number ([https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Kamerstukdossier full list], [https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0/Kamerstukdossier(61cd1984-523d-4e73-aee5-9ced10ad25ce) example]).

I cannot find a mapping to kamerstukken that make up this dossier using this API, though.



Openkamer.org, which shows such a dossier nicely and with more details aggregated ([https://www.openkamer.org/dossier/tijdlijn/34861/ example]),
mentions using zoek.officielebekendmakingen.nl which seems to confirm that. 

The officielebekendmakingen.nl site lets you list dossier contents [https://zoek.officielebekendmakingen.nl/dossier/34861 example]) {{comment|(The SRU interface shows the same [https://repository.overheid.nl/sru?&version=1.2&x-connection=officielepublicaties&operation=searchRetrieve&query=c.product-area%3D%3Dofficielepublicaties and w.dossiernummer%3D34861] and would make more sense for code)}} though I have not found a list of these dossiers.



There are further linkages that could be done. Say, Raad van State's site has the text for the above case, at,
https://www.raadvanstate.nl/adviezen/@64691/w03-17-0224-ii/
which does not seem to be linked to from officielebekendmakingen.nl/opendata.tweedekamer.nl, and which only mentions the dossier number.
This makes it less than obvious to combine.

Similarly, there is a wetgevingskalender entry





In principe geen kamervragen aan gerelateerd - vragen zijn deel van adviesfunctie.


-->

### Identifier notes


#### Juriconnect / jci

**Juriconnect** / jci are references to laws and regulations, that look something like `jci1.31:c:BWBR0012345&g=2005-01-01&artikel=3.1`


The concept could be used more widely (and sometimes is), yet Juriconnect usually (and jci typically) refers to its application to the Basis Wetten Bestand (BWB), centering on BWB's own identifiers (a.k.a. BWB-ID), which look like BWBR0012345.

So far I've mostly seen it used for hyperlinking between laws, and from CVDR to laws.


The structure is 

        jci{version}:{type}:{BWB-nummer}{key-value}*


These links can refer to laws, but also to specific parts of them. Consider:

        jci1.31:c:BWBR0012345&g=2005-01-01&artikel=3.1
means
- jci according to version 1.31 specs
- single consolidation
- refers to artikel 3.1 of the version of [BWBR0012345](https://wetten.overheid.nl/BWBR0012345) that was valid on 2005-01-01   (...but see the date-related notes below)
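
Taking such a reference apart could look roughly like the sketch below. It assumes the `jci{version}:{type}:{BWB-nummer}{key-value}*` structure described here, assumes BWB-IDs are `BWB` plus one letter and seven digits (based only on the examples in these notes), and treats the keys s, e, g and z as date keys and everything else as the locatiestring. `parse_jci` is a made-up name, not part of any library.

```python
import re
from urllib.parse import parse_qsl

# Assumed shape of a jci reference; the BWB-ID pattern is a guess from examples.
JCI_RE = re.compile(
    r"^jci(?P<version>\d+(?:\.\d+)*)"
    r":(?P<type>[cv])"
    r":(?P<bwb_id>BWB[A-Z]\d{7})"
    r"(?P<rest>(?:&.*)?)$"
)

DATE_KEYS = {"g", "z", "s", "e"}  # reserved for date logic


def parse_jci(ref: str) -> dict:
    """Split a jci reference into version, type, BWB-ID, dates and location parts."""
    m = JCI_RE.match(ref)
    if m is None:
        raise ValueError(f"does not look like a jci reference: {ref!r}")
    params = parse_qsl(m.group("rest").lstrip("&"))
    return {
        "version": m.group("version"),
        "type": m.group("type"),
        "bwb_id": m.group("bwb_id"),
        "dates": {k: v for k, v in params if k in DATE_KEYS},
        # Order is kept, since hoofdstuk > paragraaf > artikel is hierarchical.
        "location": [(k, v) for k, v in params if k not in DATE_KEYS],
    }


print(parse_jci("jci1.31:c:BWBR0012345&g=2005-01-01&artikel=3.1"))
```

Note that this only splits the string; it does not validate dates or check the rules for which date keys belong to which type.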

Notes:
- type is either `c` (single consolidation) or `v` (collection of consolidations)

- the current version of juriconnect is 1.3.1: [(1.3.1 documentation PDF)](https://standaarden.overheid.nl/bwb/doc/Juriconnect_Standaard_BWB_1_3_1.pdf), but [see older variants](https://www.juriconnect.nl/implementatie.asp). Most differences relate to the meaning of fields like date.

- The standard makes a point that these references might be referring to any/all versions of a thing,
  so only with these dates can it be an identifier referring to a specific version, and unambiguously to actual text.
  - There are some further date-related nuances, such as that 'wet X, artikel Y' _without_ a geldigheidsdatum/zichtdatum to resolve it is interpreted to mean "all versions of this in which that artikel exists" (VERIFY)
  


**locatiestring** is the part of the key-value part that refers to parts, with keys like `artikel`, `hoofdstuk`, `boek` - seems unrestricted other than that you can't use s, e, g, or z because they are used for date logic (see below)



**Details to the date are a little interesting**

Up to juriconnect 1.2, there was mainly a "get the version valid at this date" parameter based on the geldigheidsperiode.

This is because laws and regulations tend to have an inwerkingtredingsdatum and an uitwerkingtredingsdatum, and the geldigheidsperiode is everything in between.

However, it is possible that the text says it will apply retroactively, 
- in which case the legal text applies _before_ its inwerkingtredingsdatum,
- and the geldigheidsperiode will overlap with another

And note that since we are referring to **consolidations**, instructions to modify a law can be seen as "create a new consolidation, with the exact same geldigheidsperiode".


This means 1.2's references with just a geldigheidsdatum would be ambiguous in these cases. 

Juriconnect 1.3 resolves this via the concepts of zichtdatum and zichtbaarheidsperiode.

To use the documentation's example, consider
- consolidation X1 with a geldigheidsperiode of 1/1/2010 through 31/12/2010
- wijzigingsinstructie Z (in 2011) implicitly creates consolidation X2 - also with geldigheidsperiode 1/1/2010 through 31/12/2010

A zichtdatum of 1/7/2010 would amount to "as best we knew at that time", i.e. X1.


While inaccurate according to the actual model, you could _roughly_ see 
- a query by geldigheidsdatum as "everything we know now" and 
- a query by zichtdatum as "what we knew at that time"
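
To make the X1/X2 example concrete, here is a toy model. It is a deliberate simplification and not the standard's actual data model: the `Consolidation` class, the `select` function, and the specific date of wijzigingsinstructie Z (1 March 2011) are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Consolidation:
    name: str
    valid_from: date                   # start of geldigheidsperiode
    valid_to: date                     # end of geldigheidsperiode (inclusive)
    visible_from: date                 # start of zichtbaarheidsperiode
    visible_to: Optional[date] = None  # None means: still the current view


def select(consolidations, geldigheidsdatum: date, zichtdatum: date):
    """Return the consolidation valid on geldigheidsdatum, as known on zichtdatum."""
    for c in consolidations:
        valid = c.valid_from <= geldigheidsdatum <= c.valid_to
        visible = c.visible_from <= zichtdatum and (
            c.visible_to is None or zichtdatum < c.visible_to
        )
        if valid and visible:
            return c
    return None


z_date = date(2011, 3, 1)  # assumed date of wijzigingsinstructie Z
x1 = Consolidation("X1", date(2010, 1, 1), date(2010, 12, 31), date(2010, 1, 1), z_date)
x2 = Consolidation("X2", date(2010, 1, 1), date(2010, 12, 31), z_date)

# Same geldigheidsdatum, different zichtdatum:
print(select([x1, x2], date(2010, 7, 1), date(2010, 7, 1)).name)  # as known in 2010
print(select([x1, x2], date(2010, 7, 1), date(2012, 1, 1)).name)  # as known in 2012
```

With only a geldigheidsdatum, both X1 and X2 would match; the zichtdatum is what picks one.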

<!--
It looks like wetten.overheid.nl will create links with both g and z set to the inwerkingtreding of the version you are currently viewing (VERIFY)
-->

<!--
For type {{inlinecode|c}}, a jci can specify 
* g meaning we query by geldigheidsdatum
* z meaning we query by zichtdatum

: both default to today, which amounts to 'the currently valid version'
: if you use z, you must also specify g, and z must be no earlier than g

The details are a little different for 

For type {{inlinecode|v}}, (verzameling consisting of 0 or more consolidaties)
* s meaning we query by start date of geldigheid
* e meaning we query by end date of geldigheid
* z meaning we query by zichtdatum

: if you use z, you must also specify g, and z must be no earlier than s
-->




<!--

        - for type=='c' (single consolidation), expected params include
            g  geldigheidsdatum
            z  zichtdatum
        - for type=='v' (collection), expected params include
            s  start of geldigheid
            e  end of geldigheid
            z  zichtdatum
-->

See also
- https://juriconnect.nl/implementatie.asp?subpagina=documentatie
  - currently probably mostly [the 1.31 specs](https://juriconnect.nl/downloadreg.asp?bestand=Juriconnect%5FStandaard%5FBWB%5F1%5F3%5F1%2Epdf&type=pdf)

#### JCDR and CVDR-ID

CVDR-ID (also seen referred to as JCDR?) are identifiers within CVDR.


These include an enumeration/version number.
For example: CVDR186651_6 is the sixth expression/consolidation of CVDR186651 (the work, to use FRBR terms)

Systems seem to treat the expression ID (with version) as document identifiers,
and leave it merely implied that a work id is also a thing,
...though KOOP's SRU interface lets you search by workid.

Surprisingly, a lookup like https://lokaleregelgeving.overheid.nl/CVDR186651 gives not the last but the first version; the last would be 
https://lokaleregelgeving.overheid.nl/CVDR186651/6 
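
Splitting an expression ID into work ID and version, and building the corresponding URL, could be sketched as follows. The function names are made up, and the pattern (digits, optional `_` plus version number) is assumed from the example above rather than taken from a specification.

```python
import re

# Assumed pattern, based on the CVDR186651_6 example.
CVDR_RE = re.compile(r"^CVDR(?P<work>\d+)(?:_(?P<version>\d+))?$")


def split_cvdr_id(cvdr_id: str):
    """Return (work_id, version), where version is None for a bare work ID."""
    m = CVDR_RE.match(cvdr_id)
    if m is None:
        raise ValueError(f"does not look like a CVDR-ID: {cvdr_id!r}")
    version = m.group("version")
    return "CVDR" + m.group("work"), (int(version) if version else None)


def cvdr_url(cvdr_id: str) -> str:
    """URL on lokaleregelgeving.overheid.nl; without a version this is the bare work URL."""
    work_id, version = split_cvdr_id(cvdr_id)
    url = f"https://lokaleregelgeving.overheid.nl/{work_id}"
    return url if version is None else f"{url}/{version}"


print(split_cvdr_id("CVDR186651_6"))
print(cvdr_url("CVDR186651_6"))
```

Given the observation above, the bare work URL should not be relied on to give the latest version.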


See also:
- https://www.forumstandaardisatie.nl/sites/bfs/files/proceedings/MvO%2020190313%20Forum%20Standaardisatie%20LX%20en%20standaarden_public.pdf
- https://www.forumstandaardisatie.nl/open-standaarden/jcdr
- https://standaarden.overheid.nl/cvdr/doc/Juriconnect-Standaard-Decentrale-Regelgeving-1.0.pdf


#### CELEX notes

Parts of the EUR-Lex website use CELEX identifiers with a URN-style prefix, e.g. CELEX:32016R0679,
but in a lot of places you will just see the identifier (like 32016R0679) 
and you will need to assume from context that this is _probably_ a CELEX identifier.

Luckily, the document type being a letter in the middle is a good indication.
Slightly less luckily, there are a lot of those and the pattern to look for is nontrivial.



**The basic form** looks like 32016R0679 (which is [GDPR](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679))
- 3                       **sector**: legislation
- 2016                      **year**: 2016
- R                **document type**: Regulations
- 0679             **document number**
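
As a sketch of the basic form only (none of the optional additions described further on), a regular expression can split an identifier into these four parts. The pattern and the names `parse_celex` and `eurlex_url` are assumptions for illustration; real CELEX numbers have more variants than this covers.

```python
import re
from urllib.parse import quote

# Basic form only: sector, 4-digit year, 1-2 letter document type, 4-digit number.
CELEX_BASIC_RE = re.compile(
    r"^(?:CELEX:)?(?P<sector>[0-9CE])(?P<year>\d{4})"
    r"(?P<doctype>[A-Z]{1,2})(?P<number>\d{4})$"
)


def parse_celex(text: str):
    """Return the parts of a basic-form CELEX identifier, or None if it does not match."""
    m = CELEX_BASIC_RE.match(text.strip())
    if m is None:
        return None
    return {
        "sector": m.group("sector"),
        "year": int(m.group("year")),
        "doctype": m.group("doctype"),
        "number": m.group("number"),
    }


def eurlex_url(celex: str, language: str = "EN") -> str:
    """EUR-Lex page for a CELEX identifier, in the URL style used in these notes."""
    uri = quote("CELEX:" + celex, safe="")
    return f"https://eur-lex.europa.eu/legal-content/{language}/TXT/?uri={uri}"


print(parse_celex("32016R0679"))
print(eurlex_url("32016R0679"))
```

A match here still only means "probably a CELEX identifier"; plenty of other strings have digits with a letter in the middle.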


There are a dozen sectors, 0 through 9 and C and E - depending a _little_ on how you count. Consider that 
* 0 deals with consolidation, which implies it contains no official documents
* 7 are national transposition measures
  - member states can choose the form for transposing EU directives into national law; they then notify the EU, and EUR-Lex publishes metadata (title, date of publication, transposed directive/s, etc.) and _optionally_ the text   (see also [National transposition](https://eur-lex.europa.eu/collection/n-law/mne.html)) 
  - these have the same identifier as their basis, except that
    - the sector would be 7 instead of 3
    - you have an added 3-letter country code and a sequential number (see also [Types of documents in EUR-Lex](https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html))


Document types are one or two letters, and each sector has its own set of document types; the same letters can have a distinct meaning in each sector.




**Optional additions** include

* multiple things on the same day
  * adds a bracketed number
  * e.g. 32012A0424(01)
  * this is not a relation to the basic ID - these seem to be unrelated documents that come from the same source? (VERIFY)

* Corrigenda
  * adds R and a bracketed number
  * e.g. [32009L0164R(01)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32009L0164R(01)) is the first corrigendum to [32009L0164](https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX%3A32009L0164)

* [national transposition](https://eur-lex.europa.eu/collection/n-law/mne.html)
  * e.g. 


* referring to versions by date, as e.g. EUR-Lex does, e.g.
  * 02012L0019-20120724 and 02012L0019-20180704
  * 02016R0679-20160504 (which is a consolidated variant of a specific version of 32016R0679)


I've not yet read up on each of these -- particularly not on how they combine.



Further notes:
* consolidated versions are __not official__, they are there for convenience.
  * Consolidated texts have the same CELEX number as the act they came from, but with sector "0" 

* e.g. the EUR-Lex site may redirect you to another CELEX number, in particular a consolidated version, e.g. 32012L0019 goes to 02012L0019

* e.g. the EUR-Lex site may point out there is a newer version, e.g. [32016R0679](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679) more specifically refers you to [02016R0679-20160504](https://eur-lex.europa.eu/legal-content/EN/AUTO/?uri=CELEX:02016R0679-20160504)

* CELEX documents may also have an ECLI, which tend to look like `ECLI:EU:doctype:year:identifier`, <!--e.g. 61955CJ0008 is ECLI:EU:C:1956:7, 61955CJ0008(01) is ECLI:EU:C:1956:11 --> but you can't predict these from the CELEX alone.

* The national transposition's (sector 7) country code is not the same as that of the national case law (sector 8)
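
The relation between an act and its consolidated versions (sector swapped to 0, optional `-YYYYMMDD` suffix) is simple enough to sketch. Both function names are made up for illustration, and nothing here checks whether a consolidated version actually exists for a given date.

```python
import re
from datetime import date, datetime
from typing import Optional


def consolidated_celex(celex: str, version_date: Optional[date] = None) -> str:
    """Sector-0 counterpart of a CELEX number, optionally with a date suffix."""
    result = "0" + celex[1:]
    if version_date is not None:
        result += "-" + version_date.strftime("%Y%m%d")
    return result


def split_version_date(celex: str):
    """Split '02012L0019-20180704' into ('02012L0019', date(2018, 7, 4))."""
    m = re.match(r"^(?P<base>[^-]+)-(?P<date>\d{8})$", celex)
    if m is None:
        return celex, None
    return m.group("base"), datetime.strptime(m.group("date"), "%Y%m%d").date()


print(consolidated_celex("32012L0019"))
print(consolidated_celex("32016R0679", date(2016, 5, 4)))
print(split_version_date("02012L0019-20180704"))
```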


See also: 
* https://eur-lex.europa.eu/content/tools/eur-lex-celex-infographic-A3.pdf
* https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html
* https://en.wikipedia.org/wiki/Template:CELEX
* https://eur-lex.europa.eu/content/help/eurlex-content/celex-number.html


#### ECLI notes

ECLI consists of `:`-separated...
* `ECLI`
* country code (2 characters)
* court code  (1-7 characters (VERIFY)) (generally a whole bunch of specific ones, and a few special cases for courts of appeal, higher courts.)
* year (4 digits)
* case identifier  (ordinal?)  (seems to be `[A-Za-z0-9.]{1,25}` but countries usually keep it shorter and structured, and may have historical numbering sorted in, etc)
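
The five parts listed above translate fairly directly into a regular expression. This is a sketch under the same uncertainties: the court code length and the allowed characters in each field are the (VERIFY) guesses from the list, not a checked reading of the specification, and `parse_ecli` is a made-up name.

```python
import re

# Field patterns follow the list above; treat them as assumptions.
ECLI_RE = re.compile(
    r"^ECLI"
    r":(?P<country>[A-Z]{2})"
    r":(?P<court>[A-Za-z0-9]{1,7})"
    r":(?P<year>\d{4})"
    r":(?P<case_id>[A-Za-z0-9.]{1,25})$"
)


def parse_ecli(text: str):
    """Return a dict with country, court, year and case_id, or None if no match."""
    m = ECLI_RE.match(text.strip())
    return None if m is None else m.groupdict()


print(parse_ecli("ECLI:EU:C:2011:787"))
print(parse_ecli("ECLI:NL:RBDHA:2013:BZ7059"))
```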



In the case of ECLIs from the Netherlands, that's
* `ECLI`
* `NL`
* one of the court codes listed e.g. at [this page](https://www.rechtspraak.nl/Uitspraken/Paginas/ECLI.aspx) or [here](https://www.rechtspraak.nl/Uitspraken/Paginas/Volledige-lijst-Nederlandse-gerechtscodes.aspx)
* year
* case identifier is
  - since 2013: just numbers, sequentially assigned
  - before 2013 they were often an LJN (two letters, four numbers), and numbering was added for pre-2013 things without an LJN

e.g. 
- `ECLI:NL:GHDHA:2013:4466`
- `ECLI:NL:RVS:2021:525`
- `ECLI:NL:RBDHA:2013:BZ7059`
- `ECLI:NL:TNORARL:2015:37`
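
For Dutch ECLIs, the case identifier itself hints at where it came from. The sketch below labels it as LJN-style (two letters, four digits) or purely numeric; `classify_nl_case_id` is a made-up helper, and a numeric identifier does not by itself tell you whether it is post-2013 or a pre-2013 case that never had an LJN.

```python
import re

LJN_RE = re.compile(r"^[A-Z]{2}[0-9]{4}$")
NUMERIC_RE = re.compile(r"^[0-9]+$")


def classify_nl_case_id(ecli: str) -> str:
    """Return 'ljn', 'numeric' or 'other' for the case identifier of a Dutch ECLI."""
    parts = ecli.split(":")
    if len(parts) != 5 or parts[0] != "ECLI" or parts[1] != "NL":
        raise ValueError(f"not a Dutch ECLI: {ecli!r}")
    case_id = parts[4]
    if LJN_RE.match(case_id):
        return "ljn"
    if NUMERIC_RE.match(case_id):
        return "numeric"
    return "other"


for example in (
    "ECLI:NL:GHDHA:2013:4466",
    "ECLI:NL:RVS:2021:525",
    "ECLI:NL:RBDHA:2013:BZ7059",
    "ECLI:NL:TNORARL:2015:37",
):
    print(example, classify_nl_case_id(example))
```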


Notes:
* During LJN times, Hoge Raad put arrest and conclusion (VERIFY) under the same LJN, which will show up as the same case identifier but with court code [HR and PHR respectively](https://www.rechtspraak.nl/Uitspraken/Paginas/ECLI.aspx)
  * these days, they have separate ECLIs

* Court code XX seems to be used for 
  * uitspraken from organisations other than courts (bezwaarcommissies, klachtencommissies)
  * uitspraken from other countries
    * if these cases are later assigned an ECLI in another country, [the XX will then point to that](https://www.rechtspraak.nl/Uitspraken/Paginas/ECLI.aspx), e.g.  [ECLI:NL:XX:2011:BW6071](https://uitspraken.rechtspraak.nl/#!/details?id=ECLI:NL:XX:2011:BW6071) points to [ECLI:EU:C:2011:787](https://e-justice.europa.eu/ecli/ECLI:EU:C:2011:787)



See also:
* https://e-justice.europa.eu/content_european_case_law_identifier_ecli-175-en.do
* https://eur-lex.europa.eu/content/help/eurlex-content/ecli.html

* https://www.rechtspraak.nl/Uitspraken/Paginas/ECLI.aspx

* https://www.scribbr.nl/leidraad-voor-juridische-auteurs/jurisprudentie/