### Purpose of this notebook

Some of this was just research that went into patterns.py and meta.py,
but some of this is useful context in general.


## Identifiers, unambiguous



### Juriconnect identifier (jci)

[Juriconnect as a whole](https://www.juriconnect.nl/) can be considered a wider platform for communication between a handful of parties.

Juriconnect's most interesting output is probably the juriconnect identifier, jci,
which looks something like `jci1.31:c:BWBR0012345&g=2005-01-01&artikel=3.1`, 
and is primarily used for references to laws (and sometimes to regulations and other things).

Applied to Dutch laws, it works within the **basis wetten bestand** (BWB),
and centers on BWB's own identifiers (a.k.a. BWB-id), which look like BWBR0012345.
JCI also adds the ability to refer to specific parts.

So far we have largely seen it used for hyperlinking between laws, 
occasionally from CVDR to laws, and in a few cases from CVDR to other CVDR entries.


The structure is (where `*` means 'zero or more key-value pairs', each written as `&key=value`):

        jci{version}:{type}:{BWB-nummer}{key-value}*


For example:

        jci1.31:c:BWBR0012345&g=2005-01-01&artikel=3.1
means
- jci according to the version 1.31 specs 
- single consolidation
- refers to artikel 3.1 of the version of [BWBR0012345](https://wetten.overheid.nl/BWBR0012345) that was valid on 2005-01-01   (...but see date related notes below)

Notes:
- type is either `c` (single consolidation) or `v` (collection of consolidations)

- Where a **locatie string** is mentioned, it refers to the part of the key-value fragment
  that is pointing to a more specific part of the whole, in particular keys like `artikel`, `soort`, `hoofdstuk`.
  - There seem to be no restrictions to what names you can use there, other than that you cannot use the already-defined keys, like `s`, `e`, `g`, or `z`.

- the current version of juriconnect is 1.3.1 ([1.3.1 documentation PDF](https://standaarden.overheid.nl/bwb/doc/Juriconnect_Standaard_BWB_1_3_1.pdf)),
  - but [older variants](https://www.juriconnect.nl/implementatie.asp) exist. There are few to no structural changes between versions; most differences relate to the precise interpretation of fields such as dates.

- The standard makes a point that these references might be referring to _any/all versions_ of a thing (here of a BWB-id)
  - so only with these dates is it an identifier referring to a specific document, and unambiguously to actual text.
  - There are some further date-related nuances, such as that 'wet X, artikel Y' _without_ a geldigheidsdatum/zichtdatum to resolve it is interpreted to mean "all versions of this in which that artikel exists" (VERIFY)
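A minimal parsing sketch for the structure above, assuming the key-value pairs are `&`-separated `key=value` items and that BWB-ids are `BWB`, one letter, and seven digits (only `BWBR…` appears in these notes). The names `parse_jci` and `Jci` are made up for illustration; this is not necessarily how patterns.py does it.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# Keys with a defined date meaning; anything else is taken to be part of the locatie string.
DATE_KEYS = {"g", "z", "s", "e"}

_JCI_RE = re.compile(
    r"^jci(?P<version>\d+(?:\.\d+)*)"   # e.g. 1.3 or 1.31
    r":(?P<type>[cv])"                  # c = single consolidation, v = collection
    r":(?P<bwb>BWB[A-Z]\d{7})"          # assumption: 'BWB', one letter, seven digits
    r"(?P<rest>(?:&[^&=]+=[^&]*)*)$"    # zero or more &key=value pairs
)


class Jci(NamedTuple):
    version: str
    type: str
    bwb_id: str
    dates: Dict[str, str]
    locatie: List[Tuple[str, str]]  # kept in order, e.g. hoofdstuk, paragraaf, artikel


def parse_jci(s: str) -> Jci:
    """Split a jci string into version, type, BWB-id, date keys and locatie pairs."""
    m = _JCI_RE.match(s.strip())
    if m is None:
        raise ValueError(f"not a jci: {s!r}")
    dates: Dict[str, str] = {}
    locatie: List[Tuple[str, str]] = []
    # 'rest' is empty or starts with '&', so the first item after splitting is always empty
    for pair in m["rest"].split("&")[1:]:
        key, _, value = pair.partition("=")
        if key in DATE_KEYS:
            dates[key] = value
        else:
            locatie.append((key, value))
    return Jci(m["version"], m["type"], m["bwb"], dates, locatie)


print(parse_jci("jci1.31:c:BWBR0012345&g=2005-01-01&artikel=3.1"))
# Jci(version='1.31', type='c', bwb_id='BWBR0012345', dates={'g': '2005-01-01'}, locatie=[('artikel', '3.1')])
```

Keeping the locatie pairs as an ordered list rather than a dict is a deliberate choice: they seem to go from larger to smaller part, which is useful when displaying or matching them.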

#### The details of JCI dates are a little interesting

Up to juriconnect 1.2, there was mainly a "get the version valid at this date" parameter, based on the geldigheidsperiode.

Laws and regulations tend to have an inwerkingtredingsdatum and an uitwerkingtredingsdatum, and the geldigheidsperiode is everything in between.

<!-- -->

Since we are referring to **consolidations**, instructions to modify a law can be seen as "create a new consolidation, with its own geldigheidsperiode".

However, a legal text may say that it applies retroactively, in which case it applies to cases _before_ its inwerkingtredingsdatum.
Sticking with the above definition, its geldigheidsperiode would then overlap with that of another consolidation.

<!-- -->

This means 1.2's references with just a geldigheidsdatum would be ambiguous in these cases. 
At the very least, we would need a clearly defined way of dealing with such cases.

Juriconnect 1.3 resolves this via the concepts of zichtdatum and zichtbaarheidsperiode.

To use the documentation's example, consider
- consolidation X1 with a geldigheidsperiode of 1/1/2010 through 31/12/2010
- wijzigingsinstructie Z (in 2011) implicitly creates consolidation X2 - also with geldigheidsperiode 1/1/2010 through 31/12/2010

A zichtdatum of 1/7/2010 would amount to "as best we knew _at that time_", i.e. X1.


While the following is inaccurate according to the actual model, you could _roughly_ see 
- query by geldigheidsdatum "everything we know now" and 
- query by zichtdatum "what we knew at that time"
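To make the X1/X2 example concrete, here is a toy model of that rough reading: each consolidation gets a geldigheidsperiode plus a hypothetical `zichtbaar_van` date (when it became known), and lookup picks the most recently known consolidation valid on `g`, among those already known on `z`. This is a simplification rather than the Juriconnect model itself, and the exact 2011 date for X2 is made up.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Consolidation(NamedTuple):
    name: str
    geldig_van: date
    geldig_tot: date
    zichtbaar_van: date  # simplification: the date this consolidation became known


def resolve(consolidations: List[Consolidation], g: date,
            z: Optional[date] = None) -> Optional[Consolidation]:
    """Consolidation valid on g, as known on z (z=None: everything known now)."""
    candidates = [
        c for c in consolidations
        if c.geldig_van <= g <= c.geldig_tot
        and (z is None or c.zichtbaar_van <= z)
    ]
    # If several versions cover g, the most recently known one wins
    return max(candidates, key=lambda c: c.zichtbaar_van, default=None)


X1 = Consolidation("X1", date(2010, 1, 1), date(2010, 12, 31), date(2010, 1, 1))
X2 = Consolidation("X2", date(2010, 1, 1), date(2010, 12, 31), date(2011, 3, 1))  # hypothetical date

print(resolve([X1, X2], g=date(2010, 7, 1), z=date(2010, 7, 1)).name)  # X1: what we knew then
print(resolve([X1, X2], g=date(2010, 7, 1)).name)                      # X2: what we know now
```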

<!--
It looks like wetten.overheid.nl will create links with both g and z set to the inwerkingtreding of the version you are _currently_ viewing (VERIFY)
-->

<!--
For type `boek`, a jci can specify 
* g meaning we query by geldigheidsdatum
* z meaning we query by zichtdatum

: both default to today, which amounts to 'the currently valid version'
: if you use z, you must also specify g, and z must be no earlier than g

The details are a little different for 

For type `c`, (verzameling consisting of 0 or more consolidaties)
* s meaning we query by start date of geldigheid
* e meaning we query by end date of geldigheid
* z meaning we query by zichtdatum

: if you use z, you must also specify g, and z must be no earlier than s
-->
<!--

        - for type=='c' (single consolidation), expected params include
            g  geldigheidsdatum
            z  zichtdatum
        - for type=='v' (collection), expected params include
            s  start of geldigheid
            e  end of geldigheid
            z  zichtdatum
-->

#### See also
- https://juriconnect.nl/implementatie.asp?subpagina=documentatie
  - currently probably mostly [the 1.31 specs](https://juriconnect.nl/downloadreg.asp?bestand=Juriconnect%5FStandaard%5FBWB%5F1%5F3%5F1%2Epdf&type=pdf)

### ECLI notes

ECLI consists of `:`-separated...
* `ECLI`
* country code (2 characters)
* court code (max 7 characters) (defined per country; for a country that is probably dozens of specific codes, plus a few special cases for courts of appeal and higher courts)
* year (4 digits)
* case identifier (max 25 characters), seems to allow `[A-Za-z0-9.]` (case insensitive)
  - countries usually keep it a good deal shorter than that (and _may_ have folded historical numbering into it)
  - you may want to assume a final `.` is part of the sentence, not the identifier
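The structure above translates fairly directly into a regular expression. A sketch, assuming court codes are alphanumeric and treating a trailing `.` as sentence punctuation; `find_eclis` is a hypothetical helper, not an existing API.

```python
import re
from typing import List

# Matched case-insensitively, since ECLI is case-insensitive; court codes assumed alphanumeric.
_ECLI_RE = re.compile(
    r"\bECLI:(?P<country>[A-Z]{2}):(?P<court>[A-Z0-9]{1,7}):"
    r"(?P<year>\d{4}):(?P<case>[A-Z0-9.]{1,25})",
    re.IGNORECASE,
)


def find_eclis(text: str) -> List[str]:
    """Find ECLIs in flowing text, normalized to the conventional uppercase form."""
    found = []
    for m in _ECLI_RE.finditer(text):
        # Assume a final '.' ends the sentence rather than belonging to the identifier
        case = m["case"].rstrip(".")
        if case:
            found.append(f"ECLI:{m['country']}:{m['court']}:{m['year']}:{case}".upper())
    return found


print(find_eclis("Zie ECLI:NL:HR:2005:AR5213. Vgl. ecli:nl:rvs:2021:525"))
# ['ECLI:NL:HR:2005:AR5213', 'ECLI:NL:RVS:2021:525']
```

Because the court code is capped at 7 characters, an overly long code simply fails to match rather than being truncated.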


In the case of ECLIs from the Netherlands, that's
* `ECLI`
* `NL`
* one of the court codes listed e.g. at [this page](https://www.rechtspraak.nl/Uitspraken/Paginas/ECLI.aspx) or [here](https://www.rechtspraak.nl/Uitspraken/Paginas/Volledige-lijst-Nederlandse-gerechtscodes.aspx) (we have our own copy of that as data)
* year
* case identifier is
  - before 2013 they were often an LJN (two letters, four numbers), and numbering was added for pre-2013 things without an LJN
  - since 2013: often just numbers, sequentially assigned, but may still be LJN-like?

e.g. 
- `ECLI:NL:RBDHA:2013:BZ7059`
- `ECLI:NL:GHDHA:2013:4466`
- `ECLI:NL:TNORARL:2015:37`
- `ECLI:NL:RVS:2021:525`


Notes:
* ECLI is technically case-INsensitive, so it could be lowercase or even mixed-case, but it is _very_ conventionally all-uppercase.

* aside from country `NL`, we will also see a bunch of `EU` (and `CE`? Not sure what the difference is exactly), and rare references to other EU countries (VERIFY)

* The Netherlands seems to use the court code XX for
  * uitspraken from organisations other than courts (bezwaarcommissies, klachtencommissies)
  * uitspraken from other countries that do not (yet) have an ECLI of their own
    * if these cases are later assigned an ECLI in another country, [the XX will then point to that](https://www.rechtspraak.nl/Uitspraken/Paginas/ECLI.aspx), e.g. [ECLI:NL:XX:2011:BW6071](https://uitspraken.rechtspraak.nl/#!/details?id=ECLI:NL:XX:2011:BW6071) points to [ECLI:EU:C:2011:787](https://e-justice.europa.eu/ecli/ECLI:EU:C:2011:787)

* During Dutch LJN times, Hoge Raad put arrest and conclusie (VERIFY) under the same LJN.  
  * With ECLIs, they will show up as two ECLIs, using the same case identifier but with different court codes, [HR and PHR respectively](https://www.rechtspraak.nl/Uitspraken/Paginas/ECLI.aspx)
* also note ECLI case IDs may still use LJN-looking sequences

* Errors I've seen
  * misspellings of `ECLI:` (ignoring mistakes made fewer than 5 times, because you get the point; some seem to be OCR errors)
  
               ECLl  328
                ECL  271
               ELCI  245
                CLI  81
                ECI  77
                ECU  62
                ELI  44
              ECLLI  14
                 EU  11
               ECLL  10
               ECLU  10
               ECLJ  7
               EGLI  7
         ECLInummer  7
               EVLI  6
               ECLT  5
              ECDLI  5

  * Spurious country codes
    * for reference, as [the wikipedia page](https://en.wikipedia.org/wiki/European_Case_Law_Identifier#Identifier_construction) mentions, 
      - "The standard uses mostly [ISO 3166-1 alpha-2 codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Decoding_table) 
      - with the exception of the United Kingdom (UK) and Greece (EL).
      - A special code for non-states can be assigned by the European Commission;"

    * one-character country codes (e.g. `C` probably meaning to be `CE`, `N` probably intended to be `NL`)
    * Various others - from one count in Dutch documents:

                   NL  790484 
                   EU  34017 
                   CE  4879      # might be intended to be EU? Unsure.
                   EP  303       # technically that's ISO3166-1 alpha-2 for EPOrg, probably incorrect?
                   HR  170
                   DE  76
                   FR  33
                   EC  25        # ?
                   NK  24        # ?
                   NR  15        # ?
                   NJ  14        # ?
                   BL  13        # ?
                   AT  10
                   LN  9         # ? 
                   ML  7         # ?
                   HL  5         # ?
                   RB  4         # ?
                   ES  3
                   CL  3         # ?
                   NI  3         # ? 
                   BE  3
                   RU  2         # ? 
                   KL  1         # ?
                   N1  1         # ?
                   NF  1         # ? 
                   CZ  1
                   NE  1         # ?
                   CU  1         # ?
                   NH  1         # ?
                   CR  1         # ?
                   XX  1         # ? 
                   NC  1         # ?
                   BK  1         # ?

  * Spurious court codes
    * overly long court codes like `CRVBX7178.B` and `NLORBBNAA`
    * a handful of seemingly incorrect court code typos like `RBDH`, `GHSE`, `CVRB`
                   
  * spaces between the final `:` and the case ID (we _could_ try to be robust to this)

  * note that capitalisation variants are technically valid, just unusual in practice
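If you do want to be robust to some of these, one option is a second, more lenient pattern that accepts the observed prefix misspellings and a single space before the case ID, and then normalizes to the canonical form. A sketch; which variants to accept is a judgment call, and here it is the ones of three or more letters from the counts above (leaving out `EU`, which is also a legitimate country code, and `ECLInummer`). It does not try to repair country or court codes; those would need a lookup against known codes.

```python
import re
from typing import List

# Prefix variants seen in the counts above. Matching is case-insensitive,
# so "ECLL" also covers the (likely OCR) "ECLl".
_PREFIXES = ["ECLI", "ECLL", "ECL", "ELCI", "CLI", "ECI", "ECU", "ELI",
             "ECLLI", "ECLU", "ECLJ", "EGLI", "EVLI", "ECLT", "ECDLI"]

_LENIENT_ECLI_RE = re.compile(
    r"\b(?:" + "|".join(sorted(_PREFIXES, key=len, reverse=True)) + r"):"
    r"(?P<country>[A-Z]{2}):(?P<court>[A-Z0-9]{1,7}):(?P<year>\d{4}):"
    r" ?(?P<case>[A-Z0-9.]{1,25})",   # tolerate one space before the case ID
    re.IGNORECASE,
)


def find_eclis_lenient(text: str) -> List[str]:
    """Find ECLI-like strings, repairing known prefix typos and a stray space."""
    found = []
    for m in _LENIENT_ECLI_RE.finditer(text):
        case = m["case"].rstrip(".")
        if case:
            found.append(f"ECLI:{m['country']}:{m['court']}:{m['year']}:{case}".upper())
    return found


print(find_eclis_lenient("zie ECLl:NL:HR:2005:AR5213 en ELCI:NL:RVS:2021: 525."))
# ['ECLI:NL:HR:2005:AR5213', 'ECLI:NL:RVS:2021:525']
```

The optional space means a truncated ECLI followed by a word would pick up that word as the case ID, so you may want to additionally require a digit in it.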


See also:
* https://e-justice.europa.eu/content_european_case_law_identifier_ecli-175-en.do
* https://eur-lex.europa.eu/content/help/eurlex-content/ecli.html

* https://www.rechtspraak.nl/Uitspraken/Paginas/ECLI.aspx

* https://www.scribbr.nl/leidraad-voor-juridische-auteurs/jurisprudentie/

* https://en.wikipedia.org/wiki/European_Case_Law_Identifier

## Identifiers, moderately unambiguous

### CELEX notes

Parts of the EUR-Lex website use CELEX identifiers with a [URN](https://en.wikipedia.org/wiki/Uniform_Resource_Name)-style prefix, e.g. `CELEX:32016R0679`,
but in a lot of places you will just see the identifier (like `32016R0679`) 
and you will need to assume from context that this is _probably_ a CELEX identifier.

Luckily, the document type being a letter in the middle is a good indication.
(Slightly less luckily, there are a lot of variations, and the pattern to look for is nontrivial.)



**The basic form** looks like 32016R0679 (this is [GDPR](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679))

    3                       sector: legislation
    2016                      year: 2016
    R                document type: Regulations
    0679           document number


There are a dozen **sectors**, 0 through 9 plus C and E, depending a _little_ on how you count. Consider that 
* 0 deals with consolidation, which implies it contains no official documents
* 7 are national transposition measures
  - member states can choose the form for transposing EU directives into national law; they then notify the EU, and EUR-Lex publishes metadata (title, date of publication, transposed directive/s, etc.) and _optionally_ the text (see also [National transposition](https://eur-lex.europa.eu/collection/n-law/mne.html))
  - these have the same identifier as the directive they transpose, except that
    - the sector is 7 instead of 3
    - a 3-letter country code and a sequential number are added (see also [Types of documents in EUR-Lex](https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html))


Document types are one or two letters. Each sector has its own set of document types, and the same letter can have distinct meanings in different sectors.



**Things that can also appear on such identifiers** include:

* a bracketed number, e.g. 32012A0424(01)
  * has no relation to the basic ID; these seem to be otherwise unrelated documents that come from the same source on the same day? (VERIFY)

* Corrigenda
  * adds R and a bracketed number
  * e.g. [32009L0164R(01)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32009L0164R%2801%29) is the first corrigendum to [32009L0164](https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX%3A32009L0164)
  * not to be confused with non-corrigenda bracketed addition (see previous point)

* [national transposition](https://eur-lex.europa.eu/collection/n-law/mne.html)
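  * e.g. `71995L0046NLD_101320` is one of the Dutch national measures listed for directive [31995L0046](https://eur-lex.europa.eu/legal-content/EN/NIM/?uri=CELEX:31995L0046)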
<!--
https://eur-lex.europa.eu/legal-content/HU/ALL/?uri=CELEX%3A32019L0904
https://eur-lex.europa.eu/legal-content/HU/ALL/?uri=CELEX%3A02019L0904-20190612
https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A72019L0904MLT_202206212


Directive: (3, L)
 31995L0046           https://eur-lex.europa.eu/legal-content/EN/NIM/?uri=CELEX:31995L0046

has national measures, listed at https://eur-lex.europa.eu/legal-content/EN/NIM/?uri=CELEX:31995L0046  for example: 

71995L0046NLD_101320  https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=NIM:101320
71995L0046NLD_213105  https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=NIM:213105

71995L0046GBR_101332  https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=NIM:101332
71995L0046GBR_226766  https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=NIM:226766

7*FRA_31672           https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=NIM:31672
(which doesn't seem to be short for 71995L0046FRA_31672 ?)
                      

You could _conceptually_ group those as 71995L0046NLD, 71995L0046GBR, and 71995L0046FRA
-->

* referring to versions by date, as EUR-Lex does, e.g.
  * [02012L0019-20120724](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02012L0019-20120724) and [02012L0019-20180704](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02012L0019-20180704)
  * [02016R0679-20160504](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A02016R0679-20160504) (which is a consolidated variant of a specific version of 32016R0679)


I've not yet read up on each of these -- particularly not on how they combine.
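Given that caveat, the following is only a sketch of how the variants above might be matched, assuming they combine in the order shown: the basic form, then an optional national-transposition suffix, then an optional corrigendum `R` and/or bracketed number, then an optional consolidation date. It has only been checked against the examples in these notes; sectors C and E in particular may look different.

```python
import re
from typing import Dict, Optional

_CELEX_RE = re.compile(
    r"(?P<sector>[0-9CE])"
    r"(?P<year>\d{4})"
    r"(?P<doctype>[A-Z]{1,2})"
    r"(?P<number>\d{4})"
    r"(?:(?P<country>[A-Z]{3})_(?P<measure>\d+))?"   # national transposition, e.g. NLD_101320
    r"(?:(?P<corrigendum>R)?\((?P<seq>\d{2})\))?"     # R(01) for a corrigendum, or a bare (01)
    r"(?:-(?P<consolidated_on>\d{8}))?"                # consolidated version date, YYYYMMDD
)


def parse_celex(celex: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a CELEX number into named parts, or return None if it doesn't fit this sketch."""
    celex = celex.strip()
    if celex.upper().startswith("CELEX:"):   # URN-style prefix, as used on parts of EUR-Lex
        celex = celex[len("CELEX:"):]
    m = _CELEX_RE.fullmatch(celex)
    return m.groupdict() if m else None


for c in ["32016R0679", "32009L0164R(01)", "71995L0046NLD_101320", "02012L0019-20180704"]:
    print(c, parse_celex(c))
```

Returning `None` rather than raising makes it easy to use as a filter over candidate tokens found in text.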



Further notes:
* consolidated versions are __not official__; they are there for convenience.
  * Consolidated texts have the same CELEX number as the act they came from, but with sector "0" 

* e.g. the EUR-Lex site may redirect you to another CELEX number, in particular a consolidated version, e.g. 32012L0019 goes to 02012L0019.

* e.g. the EUR-Lex site may point out there is a newer version, e.g. [32016R0679](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679) more specifically refers you to [02016R0679-20160504](https://eur-lex.europa.eu/legal-content/EN/AUTO/?uri=CELEX:02016R0679-20160504)

* CELEX documents may also have an ECLI, which tend to look like `ECLI:EU:doctype:year:identifier`, <!--e.g. 61955CJ0008 is ECLI:EU:C:1956:7, 61955CJ0008(01) is ECLI:EU:C:1956:11 --> but you can't predict these from the CELEX alone.

* The country code in national transposition numbers (sector 7) is not the same as the one used for national case law (sector 8)
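The sector-0 relation from the notes above can be written down directly. A minimal sketch, assuming you start from a sector-3 number; whether a consolidated version for a given date actually exists is something only EUR-Lex can tell you, so treat the result as a candidate to look up. `consolidated_celex` is a hypothetical helper name.

```python
from datetime import date
from typing import Optional


def consolidated_celex(celex: str, as_of: Optional[date] = None) -> str:
    """Candidate CELEX number of the (unofficial) consolidated text of a sector-3 act."""
    if not celex.startswith("3"):
        raise ValueError(f"expected a sector-3 CELEX number, got {celex!r}")
    base = "0" + celex[1:]              # same identifier, sector 0
    if as_of is None:
        return base
    return f"{base}-{as_of:%Y%m%d}"     # version by date, e.g. 02012L0019-20180704


print(consolidated_celex("32012L0019"))                    # 02012L0019
print(consolidated_celex("32012L0019", date(2018, 7, 4)))  # 02012L0019-20180704
```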


See also: 
* https://eur-lex.europa.eu/content/tools/eur-lex-celex-infographic-A3.pdf
* https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html
* https://en.wikipedia.org/wiki/Template:CELEX
* https://eur-lex.europa.eu/content/help/eurlex-content/celex-number.html   

### CVDR-id and JCDR

CVDR-ids (also seen referred to as JCDR?) are the identifiers used within CVDR.


These include an enumeration/version number.
For example: `CVDR186651/6` is the sixth expression(/consolidation) of CVDR186651 (the work, to use [FRBR terms](https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records))

In practice, systems generally seem to treat the expression ID (with version) as the identifier of a specific document,
and leave it merely implied that they are part of a series described by the work id.

One exception is KOOP's SRU interface, which lets you search by work id.

<!--  -->

CVDR-id is, in essence, quite simple: a conceptual work, and a version of it.

These would fall into the 'unambiguous' category, except that in use
`CVDR186651/6` might be referred to as `CVDR186651_6`, `CVDR186651`, `186651_6`, or even `186651`.
You can detect the last of these with decent certainty if you find it in a metadata field meant for identifiers,
but not if you find that number in flowing text, so context matters.

Also, depending on where these are used (URL, search field, etc.), the underscore might need to be a slash, or the other way around (`CVDR186651_6` ≃ `CVDR186651/6`).

Even the website seems inconsistent with some of this; see [the links used in the version metadata to CVDR186651](https://lokaleregelgeving.overheid.nl/CVDR186651?&show-wti=true).
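A sketch of normalizing these variants to (work id, version) pairs, assuming version numbers are plain integers and that a value without the `CVDR` prefix is only accepted when you already know the field holds a CVDR id. The function name and the `bare_number_ok` flag are made up for illustration.

```python
import re
from typing import Optional, Tuple

_CVDR_RE = re.compile(r"(?:CVDR)?(?P<work>\d+)(?:[/_](?P<version>\d+))?", re.IGNORECASE)


def normalize_cvdr(value: str, bare_number_ok: bool = False) -> Optional[Tuple[str, Optional[int]]]:
    """Normalize CVDR186651/6, CVDR186651_6, CVDR186651, ... to ('CVDR186651', 6) or (..., None).

    Without the 'CVDR' prefix (186651_6, 186651) the value is only accepted when
    bare_number_ok is set, i.e. when the context already says this is a CVDR id.
    """
    value = value.strip()
    m = _CVDR_RE.fullmatch(value)
    if m is None:
        return None
    if not value.upper().startswith("CVDR") and not bare_number_ok:
        return None
    version = int(m["version"]) if m["version"] else None
    return f"CVDR{m['work']}", version


work, version = normalize_cvdr("CVDR186651_6")
print(f"https://lokaleregelgeving.overheid.nl/{work}/{version}")  # slash form, as used in URLs
```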

<!--
Surprisingly, a website lookup like https://lokaleregelgeving.overheid.nl/CVDR186651 gives not the most recent but the first version; the most recent would be 
https://lokaleregelgeving.overheid.nl/CVDR186651/6 
-->


See also:
- https://www.forumstandaardisatie.nl/sites/bfs/files/proceedings/MvO%2020190313%20Forum%20Standaardisatie%20LX%20en%20standaarden_public.pdf
- https://www.forumstandaardisatie.nl/open-standaarden/jcdr
- https://standaarden.overheid.nl/cvdr/doc/Juriconnect-Standaard-Decentrale-Regelgeving-1.0.pdf


### LJN

Landelijk Jurisprudentie Nummer (LJN, and initially ELRO, Elektronisch Loket Rechterlijke Organisatie) 
was an identifier assigned from 1999 to 2013, after which ECLI replaced it. 
ECLI also absorbed LJN, which is relevant in that there is a (complete?) list mapping LJN to ECLI,
and is also why we can now choose to use ECLI over LJN.

(...with some practical differences, like that Hoge Raad cases,
arrest and conclusie would be under the same LJN, but would get separate ECLIs.)


In terms of clearly identifiable identifiers, `LJN AR5213` is clear, and similarly, `LJ-nummer AR 5213` or `LJN AR 5213`.

As to a bare `AR 5213`: only really when the context _says_ it's an LJN, 
but you can make a _decent_ guess if you have (...)

If we don't know that, looking for patterns like 'two letters and some numbers' would have _many_ false positives.
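Given that, a conservative sketch only accepts the two-letters-four-digits form when it is explicitly marked as an LJN, allowing for the space variants above. Mapping the result to an ECLI would then go through the LJN-to-ECLI list mentioned above, which is not included here; `find_ljns` is a hypothetical name.

```python
import re
from typing import List

# Only accept two letters + four digits when explicitly marked as an LJN
_LJN_RE = re.compile(r"\bLJ(?:N|-nummer)\s*:?\s*(?P<letters>[A-Z]{2})\s?(?P<digits>\d{4})\b")


def find_ljns(text: str) -> List[str]:
    """Return LJNs in their unspaced form, e.g. 'AR5213'."""
    return [m["letters"] + m["digits"] for m in _LJN_RE.finditer(text)]


print(find_ljns("zie LJN AR5213 en LJ-nummer AR 5213; vgl. AR 5213"))  # ['AR5213', 'AR5213']
```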





<!--
An identifier for decisions in case law, introduced in 1999 (as ELRO) and replaced by ECLI since 2013 



Two letters, four numbers

technically they should be written without space, but there are enough cases that do have a space?


https://nl.wikipedia.org/wiki/Landelijk_Jurisprudentie_Nummer
-->


### Kamerstukdossier numbering


https://nl.wikipedia.org/wiki/Kamerstuk


### Court cases (other)

Court cases often have an ECLI.
But not necessarily (VERIFY)


Also, courts have their own references.

For example, `Hoge Raad, 18 maart 2005 C03/206 HR` seems equivalent to `ECLI:NL:HR:2005:AR5213` (VERIFY).

Not an identifier per se, but you can find the single thing you are looking for based on it.
And the zaaknummer (`C03/206 HR`) alone should be enough in theory (VERIFY).

Also, korte gedingen seem to have their own internal coding (VERIFY).

## Moderately structured citations


While citations in general (see e.g. [Leidraad voor juridische auteurs](https://www.google.com/search?q=Leidraad+voor+juridische+auteurs+2019+PDF)) are usually too unstructured to get much metadata from, there are some that fit a pattern and often contain a resolvable reference, such as is often the case with kamerstukken. Consider cases like

            Kamerstukken II 2015/16, 34442, nr. 3, p. 7.
            Kamerstukken I 1995/96, 23700, nr. 188b, p. 3.
            Kamerstukken I 2014/15, 33802, C, p. 3.
            Kamerstukken II 1999/2000, 2000/2001, 2001/2002, 26 855.
            Kamerstukken I 2000/2001, 26 855 (250, 250a); 2001/2002, 26 855 (16, 16a, 16b, 16c).

At the same time, these already contain variations that are terse enough that we have to make some mild assumptions
- that fourth case probably means 
  - dossier 26855 (that space is a style choice used in the documents, and in _some_ references)
  - maybe three distinct documents from three years of that dossier - maybe all documents from those three years?
  - according to that leidraad, that should technically be 1999/2000, 2000/01, 2001/02, but that's an expected and easy variation to deal with
- the third case
  - [C apparently works as a number](https://zoek.officielebekendmakingen.nl/kst-33802-C.html). 
  - the citation leaves out 'nr. ' in recognition that it's not, and that's something we'll have to accept 

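- the fourth and fifth need to be expanded, since they use [gapping](https://en.wikipedia.org/wiki/Gapping)-like omission to avoid repetition in these lists
  - in the last, the parenthesized values are presumably document numbers within dossier 26 855 for each session year (nrs. 250 and 250a in 2000/2001; 16, 16a, 16b, and 16c in 2001/2002)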
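For the simple single-document form (the first three examples), a regular expression gets you most of the way. A sketch, assuming dossier numbers are five digits (optionally written like `26 855`) and that the `kst-{dossier}-{nummer}` URL pattern from the `kst-33802-C` link above generalizes. The list forms (the fourth and fifth examples) are deliberately not handled and give `None`.

```python
import re
from typing import Dict, Optional

_KST_RE = re.compile(
    r"Kamerstukken\s+(?P<kamer>I{1,2})\s+"
    r"(?P<jaar>\d{4}/(?:\d{4}|\d{2})),\s*"         # 2015/16 or 1999/2000
    r"(?P<dossier>\d{2}\s?\d{3}),\s*"              # assumption: five digits, maybe '26 855'
    r"(?:nr\.\s*)?(?P<nummer>\d+[a-z]?|[A-Z])"     # '3', '188b', or a letter like 'C'
    r"(?:,\s*p\.\s*(?P<pagina>\d+))?"
)


def parse_kamerstuk(citation: str) -> Optional[Dict[str, str]]:
    m = _KST_RE.match(citation.strip())
    if m is None:
        return None
    result = {k: v for k, v in m.groupdict().items() if v is not None}
    result["dossier"] = result["dossier"].replace(" ", "")   # '26 855' -> '26855'
    # assumption: this URL pattern holds beyond the one example seen
    result["url"] = ("https://zoek.officielebekendmakingen.nl/"
                     f"kst-{result['dossier']}-{result['nummer']}.html")
    return result


print(parse_kamerstuk("Kamerstukken I 2014/15, 33802, C, p. 3."))
```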




## More natural references

There are also textual references like 
- `Artikel 10, tweede lid, aanhef en onder e van de. Wet openbaarheid van bestuur`
- `hoofdstukken 6 tot en met 8 van de Awb`

Such references are fairly brief and clear, in part due to preferences stated in e.g. [Hoofdstuk 3 Aanwijzingen voor de regelgeving](https://wetten.overheid.nl/BWBR0005730/2022-04-01/#Hoofdstuk3_Paragraaf3.3),
which, for example, mentions
* "If, when drafting a provision, a watertight but complicated wording has been found, one should always check whether it can be done more simply. One should also be mindful of leaving out superfluous words. So, for example, not ‘Het bepaalde in het tweede lid van artikel 5 is van toepassing’, but ‘Artikel 5, tweede lid, is van toepassing’."

* how to resolve shortened, relative references (e.g. a different article of a recently mentioned law)

* "Where possible, a reference to a regulation is narrowed down to a reference to articles."

* "If this makes the reference clearer, a reference to an article is narrowed down to a reference to a part of that article."

* "Abbreviations are only used if this is reasonably unavoidable. When they are used, they are included in the begripsbepalingen (definitions)."

Note that there is also contextual brevity that is harder to resolve mechanically; see e.g. [Aanwijzing 3.27 Aanwijzingen voor de regelgeving](https://wetten.overheid.nl/jci1.3:c:BWBR0005730&hoofdstuk=3&paragraaf=3.3&aanwijzing=3.27&z=2022-04-01&g=2022-04-01)


These are guidelines rather than requirements, and in practice references are often somewhat messier.  

Law-to-law references tend to be pretty regular, but local regulations less so. 
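That regularity means a small pattern already catches the `Artikel 10, tweede lid` part of such references (and `art. 96`). A sketch, assuming leden are written as Dutch ordinal words up to tiende; resolving _which_ law the article belongs to is the hard part and is not attempted here, and ranges like `hoofdstukken 6 tot en met 8` are not handled either.

```python
import re
from typing import List, Optional, Tuple

ORDINALS = {
    "eerste": 1, "tweede": 2, "derde": 3, "vierde": 4, "vijfde": 5,
    "zesde": 6, "zevende": 7, "achtste": 8, "negende": 9, "tiende": 10,
}

_ARTIKEL_RE = re.compile(
    r"\bart(?:ikel|\.)\s+(?P<artikel>\d+[a-z]?(?:\.\d+)*)"    # 'Artikel 10', 'art. 96', 'artikel 3.1'
    r"(?:,\s*(?P<lid>" + "|".join(ORDINALS) + r")\s+lid)?",     # optional ', tweede lid'
    re.IGNORECASE,
)


def find_artikel_refs(text: str) -> List[Tuple[str, Optional[int]]]:
    """Return (artikel, lid) pairs; lid is None when no lid is mentioned."""
    refs = []
    for m in _ARTIKEL_RE.finditer(text):
        lid = ORDINALS[m["lid"].lower()] if m["lid"] else None
        refs.append((m["artikel"], lid))
    return refs


print(find_artikel_refs("Artikel 10, tweede lid, aanhef en onder e van de Wet openbaarheid van bestuur"))
# [('10', 2)]
```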

<!-- -->
<!--

https://www.kcbr.nl/beleid-en-regelgeving-ontwikkelen/aanwijzingen-voor-de-regelgeving/hoofdstuk-3-aspecten-van-vormgeving-31-364/ss-33-aanhaling-en-verwijzing

**Regeling zonder citeertitel**
: https://wetten.overheid.nl/BWBR0005730/2022-04-01#Hoofdstuk3_Paragraaf3.3_Artikel3.37
: https://www.kcbr.nl/beleid-en-regelgeving-ontwikkelen/aanwijzingen-voor-de-regelgeving/hoofdstuk-3-aspecten-van-vormgeving/ss-33-aanhaling-en-verwijzing/aanwijzing-337-aanhalen-regelingen-zonder-citeertitel

-->



### On leftover ambiguity

It is one thing to _detect_ that there is such a natural reference present.

...and another thing entirely to find out exactly what it refers to,
knowing you are getting the right historical version (via the date the reference was made),
dealing with deviations like abbreviated names,
names that are non-unique prefixes of others,
typos, and other such issues.

In practice, how far you get in resolving such references is a sliding scale 
of 'how hard do you want to try, and how much fuzziness and how many mistakes will you accept rather than reject?'

If you want more detail on that, and some approaches to resolving references anyway, read things like [this bachelor thesis](https://theses.liacs.nl/pdf/2020-2021-StrijkerJLS.pdf) (and [its code](https://github.com/Strijkerr/BachelorThesis)) and the work it mentions, including _M. van Opijnen et al. (2015) "[Beyond the Experiment: the eXtendable Legal Link eXtractor](https://www.google.com/search?q=Beyond%20the%20Experiment%3A%20the%20eXtendable%20Legal%20Link%20eXtractor)"_


## Names -- that _can_ be quite useful 
### Relatively unambiguous names (with footnotes)

There are also **names** that are _usually_ unambiguous references.

For example, seeing the sequence of words `Wet openbaarheid van bestuur` is fairly certain to refer to [BWBR0005252](https://wetten.overheid.nl/BWBR0005252).

There are also references into parts of these, like `Artikel 10, tweede lid, aanhef en onder e van de. Wet openbaarheid van bestuur`. 

<!-- -->

Similarly, "Wet werk en bijstand" is fairly unambiguous - BWBR0015703.

...except it has been called "Participatiewet" [since 2015](https://wetten.overheid.nl/BWBR0015703/2024-01-01/0/informatie#tab-wijzigingenoverzicht).

...and with the footnote that whenever laws get altered over time, you may wish to know about the laws that altered them too -- here BWBR0024187, BWBR0030997, BWBR0015738, BWBR0020183, and more.

So in a looser sense (or, arguably, the more _precise_ sense of "what text applies legally at a given time"), 
both "Participatiewet" and "Wet werk en bijstand" could be said to refer to a _group_ of specific laws.

The exact view on that group of texts that you need will depend on the scope of your interest.
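One way to put such names (and other aliases) to use is a hand-maintained alias table, matched longest-first so that a name which is a prefix of a longer one doesn't win. A sketch; the table holds only the names and BWB-ids from this section, and a real one would need to be built and curated separately. It maps to a BWB-id, not to a version, so the date questions from the JCI section still apply.

```python
import re
from typing import Dict, List, Tuple

# Illustrative alias table; only the names and BWB-ids mentioned in these notes.
LAW_ALIASES: Dict[str, str] = {
    "Wet openbaarheid van bestuur": "BWBR0005252",
    "Wet werk en bijstand": "BWBR0015703",
    "Participatiewet": "BWBR0015703",   # same BWB-id, name since 2015
}

# Longest names first, so a name that is a prefix of a longer one doesn't win
_LAW_NAME_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(n) for n in sorted(LAW_ALIASES, key=len, reverse=True)) + r")\b"
)


def find_law_names(text: str) -> List[Tuple[str, str]]:
    return [(m.group(0), LAW_ALIASES[m.group(0)]) for m in _LAW_NAME_RE.finditer(text)]


print(find_law_names("De Wet werk en bijstand heet sinds 2015 Participatiewet."))
# [('Wet werk en bijstand', 'BWBR0015703'), ('Participatiewet', 'BWBR0015703')]
```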

<!-- -->

Also, we have some practical names that are less unique than full names. 

Say, jurisprudence might refer to [Stb. 2001, 580](https://zoek.officielebekendmakingen.nl/stb-2001-580.html), which may be _unambiguously_ called "**Wet van 6 december 2001 tot herziening van het procesrecht voor burgerlijke zaken, in het bijzonder de wijze van procederen in eerste aanleg**" ([e.g. LiDO uses that full title](https://linkeddata.overheid.nl/front/portal/spiegel-lijstweergave?id=http%3A%2F%2Flinkeddata.overheid.nl%2Fterms%2Fjurisprudentie%2Fid%2FECLI%3ANL%3APHR%3A2022%3A1043&callback=&dates=&fields=&fq=%7B%21tag%3Dlink_richting%7Dlink_richting%3A%22inkomend%22&facet.field=%7B%21ex%3Dlink_richting%7Dlink_richting&facet.field=%7B%21ex%3Dobj_type%7Dobj_type&facet.field=%7B%21ex%3Dobj_organisatie_groep%7Dobj_organisatie_groep&facet.field=%7B%21ex%3Dobj_organisatie%7Dobj_organisatie&facet.field=%7B%21ex%3Dlink_type%7Dlink_type&facet.field=%7B%21ex%3Dobj_jaar%7Dobj_jaar)), which is actually a modification to Wetboek van Burgerlijke Rechtsvordering.

...yet in a lot of cases this may be referred to with just "herziening procesrecht", which is more convenient and clear enough in context, but outside the context of its legal history, those are just two adjacent words which we may not even recognize as a reference to a specific concept.

Such details put some limits on what can be done. 
Machines built by a legal non-expert cannot be expected to understand much more than the legal non-expert themselves.

Both can be taught, but the point is that **both _have_ to be**.

## Names in context

**Acronyms for laws** are informal: not registered anywhere, and not even declared in the text
the way laws typically declare their citation title in their last article.

Most acronyms are unambiguous because people try to keep it that way,
but there are a fair number of exceptions.

Most of those exceptions are perfectly resolveable in the context of a specific area of interest,
but you need more information to resolve it with confidence.


**Names for court cases** are even less formal.

Anywhere else, "wrongful life" or "Baby Kelly" are just words.
Within the context of court rulings, it seems many would say that is _obviously_ 
 [`ECLI:NL:HR:2005:AR5213`](https://uitspraken.rechtspraak.nl/#!/details?id=ECLI:NL:HR:2005:AR5213),
because we're talking about precedent.

People will find the case that way, 
even if such titles are informal, not registered anywhere, and more convention than anything else.

Similarly, "Rensing/Polak II" is fairly unambiguous, even though it's **not** [`ECLI:NL:HR:2005:AT4537`](https://uitspraken.rechtspraak.nl/#!/details?id=ECLI:NL:HR:2005:AT4537)'s name/title.

## Citation-like

See e.g. [ECLI:NL:HR:2005:AR5213 on rechtspraak.nl](https://uitspraken.rechtspraak.nl/#!/details?id=ECLI:NL:HR:2005:AR5213). 

It mentions 
- `JOL 2005, 162`,
- `NJ 2006, 606`,
- `RvdW 2005, 42`,
- `VR 2005, 47`,
- `JWB 2005/107`,
- `RV 2014/149`, and
- `JA 2005/34`.

Anywhere else, these are just letters and numbers.

In this context, they are likely to be references to articles in specific journals -- citations, not identifiers.

Yet non-legal experts (like the person who is writing this), knowing that, still find it challenging to find what those are referencing.
Without legal background in the Netherlands,
it's not even clear what journals those are abbreviations for, even if you know for certain _that_ they are journal references,
or, arguably, what their role is.
Why are they mentioned here?  Do they have legal importance, are they discussions, something inbetween?

Is that list complete? Could it be? 

Data-wise, things that happen to discuss a case arguably don't belong tacked onto the case itself (on <tt>rechtspraak.nl</tt>),
but more practically, it's probably more useful than not doing it, and where else would it go?

Nor is it easy to find and read those -- those articles are not necessarily public in the "not without paying money" sense.

<!-- -->
Nor, being citations, is there a singular way to write them.   

Consider:
- `NJ` or `Ned. Jur.` or `Nederlandse Jurisprudentie` or `Nederlandse Jurisprudentie (NJ)`
- `NJ 2006, 606` or `Ned. Jur. 2006/606` or `Nederlandse Jurisprudentie (NJ) 2006 afl. 48 page 606` or `HR 18-03-2005, NJ 2006, 606, 42 Baby Kelly Arrest` or `Hoge Raad, 18-03-2005, C03/206HR , NJ 2006/606 (met noot J.B.M. Vranken)`
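A sketch of normalizing at least the shorter variants: rewrite known long forms to their abbreviation, then match `ABBR year, number` or `ABBR year/number`. The alias table is illustrative and only covers the `NJ` variants above; forms like `2006 afl. 48 page 606` are not handled, and the pattern will also happily match capitalized words that are not journals, so a list of known abbreviations would help.

```python
import re
from typing import List, Tuple

# Illustrative: long forms rewritten to their abbreviation before matching
JOURNAL_ALIASES = {
    "Nederlandse Jurisprudentie (NJ)": "NJ",
    "Nederlandse Jurisprudentie": "NJ",
    "Ned. Jur.": "NJ",
}
_ALIAS_RE = re.compile("|".join(re.escape(a) for a in sorted(JOURNAL_ALIASES, key=len, reverse=True)))

# ABBR year, number   or   ABBR year/number
_CITATION_RE = re.compile(
    r"\b(?P<journal>[A-Z][A-Za-z]{1,4})\s+(?P<year>(?:19|20)\d{2})(?:,\s*|/)(?P<number>\d+)\b"
)


def find_journal_citations(text: str) -> List[Tuple[str, int, int]]:
    text = _ALIAS_RE.sub(lambda m: JOURNAL_ALIASES[m.group(0)], text)
    return [(m["journal"], int(m["year"]), int(m["number"])) for m in _CITATION_RE.finditer(text)]


print(find_journal_citations("HR 18-03-2005, NJ 2006, 606, 42 Baby Kelly Arrest"))  # [('NJ', 2006, 606)]
print(find_journal_citations("Ned. Jur. 2006/606"))                                 # [('NJ', 2006, 606)]
```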

<!--
Hoge Raad Civiel en Hoge Raad Straf  
Nederlandse Jurisprudentie           https://www.recht.nl/vakliteratuur/algemeen/artikel/102501/hoge-raad-18-03-2005-c03-206hr/
Rechtspraak van de Week              https://www.bjutijdschriften.nl/tijdschrift/maandbladvermogensrecht/2005/5/MvV_2005_016_005_001
Verkeersrecht Jurisprudentie ANWB    
? Juridisch Wetenschappelijk Bureau
?
Jurisprudentie Aansprakelijkheid
-->

At this point, people in the field will probably be looking for products like 
[Legal Intelligence](https://www.legalintelligence.com/search?q=ECLI:NL:HR:2005:AR5213) 
or 
[InView](https://www.inview.nl/document/id157620050318c03206hradmusp/ecli-nl-hr-2005-ar5213-hr-18-03-2005-nr-c03-206hr?ctx=WKNL_CSL_10000001&tab=uitspraken&pagina=1)
or 
[recht.nl](https://www.recht.nl/rechtspraak/?ecli=ECLI:NL:HR:2005:AR5213) or others.

...but they all have their own view on this.
Which makes sense, because a list of articles typically isn't complete. 

(and these are paid products, so most people will get no view at all)

<!-- -->

Also, legal people like being succinct, 
and like to abbreviate any word or phrase in sight, which can also make references a little harder to parse,
by humans _or_ machines.

For example,
- in _"[...] is vermeldenswaardig  Vzngr. Rb. Leeuwarden 28 augustus 2002, KG 2002, 248,"_
  - "Vzngr. Rb. Leeuwarden" is short for Voorzieningenrechter rechtbank Leeuwarden
  - "Vzngr. Rb. Leeuwarden 28 augustus 2002, KG 2002, 248" is a case, a Kort Geding (injunction), at that court, so _the whole thing_ is a reference.
- _"[...] een eventueel aanvullende ‘arbitrage’ ex art. 43 RO-oud, art. 96 Rv."_ seems to be _two_ references: 
    - "art. 43 RO-oud"   (Wet op de rechterlijke organisatie, apparently not the version current at the time of writing?) 
    - "art. 96 Rv" is almost a concept in itself, but the reference is to [a specific article in the Wetboek van Burgerlijke Rechtsvordering](https://wetten.overheid.nl/BWBR0001827/2023-07-01#BoekEerste_TiteldeelTweede_AfdelingTweede_Artikel96)
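One way to make such abbreviations less opaque is plain token-by-token expansion. The sketch below assumes a small, hypothetical lookup table (`ABBREVIATIONS`, covering only the examples above) and renders `-oud` as "(oud)", marking an older version of the law. It does not try to decide where a reference starts or ends, which is the harder part.

```python
import re

# Hypothetical, deliberately tiny lookup table covering only the examples above.
ABBREVIATIONS = {
    "Vzngr.": "Voorzieningenrechter",
    "Rb.": "rechtbank",
    "art.": "artikel",
    "Rv": "Wetboek van Burgerlijke Rechtsvordering",
    "RO": "Wet op de rechterlijke organisatie",
}

# A whole word, optionally ending in a dot, optionally followed by '-oud' (older version).
TOKEN = re.compile(r"(?P<word>[A-Za-z]+\.?)(?P<old>-oud)?")


def expand(text):
    """Expand known abbreviations, leaving all other text untouched."""

    def replace(m):
        word, old = m["word"], m["old"]
        if word in ABBREVIATIONS:
            full, trailing = ABBREVIATIONS[word], ""
        elif word.endswith(".") and word[:-1] in ABBREVIATIONS:
            # e.g. 'Rv.' at the end of a sentence: expand 'Rv', keep the full stop
            full, trailing = ABBREVIATIONS[word[:-1]], "."
        else:
            return m.group(0)
        return full + (" (oud)" if old else "") + trailing

    return TOKEN.sub(replace, text)


print(expand("art. 43 RO-oud, art. 96 Rv."))
```

Matching whole tokens rather than substrings avoids expanding, say, an `art.` at the end of a longer word. A real table would need far more entries, and some abbreviations are ambiguous without context.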

## Unsorted

### European references and identifiers


Documents within the Official Journal -- and beyond the OJ -- can follow one of various citation styles,
for which you probably want to read 
* [Interinstitutional style guide](https://www.google.com/search?q=EU+Interinstitutional+style+guide+pdf)
* [Harmonising the numbering of EU legal acts](https://eur-lex.europa.eu/content/tools/elaw/OA0614022END.pdf)
* https://libguides.bournemouth.ac.uk/c.php?g=471702&p=3226269



You should expect to see things like:
* 65/1/EEC
* Directive 2006/116/EC 
* Directive 93/98/EEC
* Council directive 1999/2/EC
* Council Regulation (EC) No 2820/98
* COUNCIL DECISION 2010/168/CFSP
* Decision No 284/2010/EU of the European Parliament and of the Council
* Commission Decision (EU) 2015/119
* Commission Delegated Decision (EU) 2015/1602
* Commission Implementing Decision (EU) 2015/103
* OJ L 13, 18.1.1969
* OJ L, 2023/2387, 2.10.2023, ELI: http://data.europa.eu/eli/reg_impl/2023/2387/oj
* OJ C, C/2023/90, 2.10.2023, ELI: http://data.europa.eu/eli/C/2023/90/oj

* Regulation (EU) 2015/1 of the European Parliament and of the Council …
* Directive (EU) 2015/2 of the European Parliament and of the Council …
* Council Decision (EU) 2015/3 …
* Council Decision (CFSP) 2015/4 …
* Commission Delegated Regulation (EU) 2015/5 …
* Commission Implementing Directive (EU) 2015/6 …
* Decision (EU) 2015/7 of the European Parliament …
* Decision (EU, Euratom) 2015/8 of the European Parliament …

* OJ C 291 A, 8.11.1991, p. 1
* 2012/C 325/02  (apparently?)

* (and some early styles, e.g. [Decision 69/13/Euratom](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31969D0013), which seems to establish the publications office)
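Below is a minimal sketch that normalizes the act citations above to (type, year, number) and, from there, to a CELEX number. It rests on two assumptions worth checking against the style guides. First, a `No` before the number signals the older number/year order (as in `Council Regulation (EC) No 2820/98`), and everything else is year/number. Second, CELEX numbers for these acts are `3` + four-digit year + `R`/`L`/`D` (regulation/directive/decision) + a zero-padded four-digit number. That holds for the two CELEX numbers quoted in the ELI notes further down, but surely has exceptions. All names here are hypothetical, and Dutch forms (`nr.`, `Richtlijn`) are not handled.

```python
import re

# Hypothetical pattern for English-language citations like those listed above.
EU_ACT = re.compile(
    r"\b(?P<type>Regulation|Directive|Decision)"
    r"(?:\s+\((?P<domain>[A-Za-z, ]+)\))?"  # optional '(EU)', '(EC)', '(EU, Euratom)', '(CFSP)'
    r"(?P<no>\s+No)?"                       # assumption: 'No' means number/year order
    r"\s+(?P<a>\d{1,4})/(?P<b>\d{1,4})"
    r"(?:/(?P<suffix>[A-Za-z]+))?",          # optional '/EC', '/EEC', '/EU', '/CFSP'
    re.IGNORECASE,
)

# CELEX sector-3 document type letters (assumed to cover these three types).
CELEX_TYPE_LETTER = {"regulation": "R", "directive": "L", "decision": "D"}


def parse_eu_act(citation):
    """Return (type, year, number) for the first act citation found, or None."""
    m = EU_ACT.search(citation)
    if m is None:
        return None
    a, b = int(m["a"]), int(m["b"])
    year, number = (b, a) if m["no"] else (a, b)
    if year < 100:
        year += 1900  # two-digit years only occur in older acts
    return m["type"].lower(), year, number


def to_celex(citation):
    """Guess a CELEX number from an act citation, or return None."""
    parsed = parse_eu_act(citation)
    if parsed is None:
        return None
    act_type, year, number = parsed
    return f"3{year}{CELEX_TYPE_LETTER[act_type]}{number:04d}"


print(to_celex("Directive 65/1/EEC"))
```

For anything that matters, look the CELEX number up on EUR-Lex rather than trusting the guess.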


Translated (into Dutch), e.g.:
- Richtlijn 2006/123/EG
- Richtl. 2006/123
- Rl. (EU) 2006-123
- EEG-richtlijn 2006.123 ([EEG](https://nl.wikipedia.org/wiki/Europese_Economische_Gemeenschap) being the Dutch for EEC, so this form will only show up in older documents)
- Uitvoeringsverordening (EU) 157/2010
- Besluit 2010/168/GBVB
- Verordening (EU) 2019/631
- Besluit (EU, Euratom) 2016/88
- Gedelegeerde Verordening (EU) 2018/625

Notes: 
- Where OJ series are used, _most_ but not all will be `OJ L` (legislation) or `OJ C` (information/notices)
- There is also ELI, though the Netherlands seems not to have implemented it yet (?)
- If you want to find these via pattern matching, be aware of name changes: EEC became EC, which became EU (in Dutch: EEG, EG, EU)
- The [Leidraad voor juridische auteurs](https://www.google.com/search?q=Leidraad+voor+juridische+auteurs+2019+PDF) has more notes
- There are some commonly referenced documents that get their own not-so-formal nickname. Don't count on these aliases always being defined as they should be, in documents _or_ informal citations. Consider cases like:
  - _Rome II_ for _Verordening (EG) 864/2007_
  - _Btw-richtlijn_ for _Richtlijn 2006/112/EG_
  - _Algemene verordening gegevensbescherming (AVG)_ for _Verordening (EU) 2016/679_ (That's the Dutch name for Regulation (EU) 2016/679, the GDPR)
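For the Dutch forms and the nicknames, a crude approach is to translate them to the English forms before matching. A minimal sketch follows. The tables are hypothetical and cover only the examples in this section. `GBVB` is the Dutch abbreviation for CFSP (compare `Besluit 2010/168/GBVB` with `COUNCIL DECISION 2010/168/CFSP`).

```python
import re

# Hypothetical tables, covering only the examples above. Longer names come first,
# so 'Gedelegeerde Verordening' is replaced before plain 'Verordening'.
ACT_TYPES_NL_EN = {
    "Gedelegeerde Verordening": "Delegated Regulation",
    "Uitvoeringsverordening": "Implementing Regulation",
    "Verordening": "Regulation",
    "Richtlijn": "Directive",
    "Besluit": "Decision",
}
DOMAINS_NL_EN = {"EEG": "EEC", "EG": "EC", "GBVB": "CFSP"}
NICKNAMES = {
    "rome ii": "Verordening (EG) 864/2007",
    "btw-richtlijn": "Richtlijn 2006/112/EG",
    "avg": "Verordening (EU) 2016/679",
}


def to_english(citation):
    """Resolve a known nickname, then translate act type and domain codes."""
    citation = NICKNAMES.get(citation.strip().lower(), citation)
    for nl, en in ACT_TYPES_NL_EN.items():
        citation = citation.replace(nl, en)
    # Only swap domain codes that stand as whole tokens, e.g. '/EG' or '(EG)'
    return re.sub(r"\b(EEG|EG|GBVB)\b", lambda m: DOMAINS_NL_EN[m.group(1)], citation)


print(to_english("Richtlijn 2006/123/EG"))
```

This does not insert the English `No` for older regulations (so `Rome II` comes out as `Regulation (EC) 864/2007`), and looser forms like `Rl.` or `EEG-richtlijn` are left as they are.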


<!--
(Directive) [0-9]{2,4}/[0-9]+(/(EC|EEC|EU))?

https://eur-lex.europa.eu/complete-help.html


https://libguides.northampton.ac.uk/oscolaguide/europeanlegislation

https://www.youtube.com/watch?v=Nk5U2Vm3g54

And for reference: https://european-union.europa.eu/institutions-law-budget/law/types-legislation_en
-->


<!--
OJ series

* L  - legislation
  * LI
  * LA?
  * LM
* C  - information and notices
  * CA - 
  * CI -  
  * CE - 1999..2004
* A - historical, before EC was set up; 1952..1958?
* P - historical, after EC was set up; 1958..1967

Where ..A is for annex, ..I for isolated, ..M for special edition

https://eur-lex.europa.eu/content/help/oj/series-and-subseries.html?locale=en
https://eur-lex.europa.eu/content/help/oj/series-and-subseries.html?locale=nl


ELI has a core ontology
https://op.europa.eu/en/web/eu-vocabularies/eli
-->
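The OJ references in the list above come in two shapes. The older one is series + issue number, optionally with a sub-series letter, as in `OJ C 291 A`. In the newer one, an act number takes the place of the issue, as in `OJ L, 2023/2387, 2.10.2023`. A minimal parsing sketch, assuming only those two shapes and a `d.m.yyyy` date; the names are hypothetical:

```python
import re
from datetime import date

#   older: 'OJ L 13, 18.1.1969', 'OJ C 291 A, 8.11.1991, p. 1'
#   newer: 'OJ L, 2023/2387, 2.10.2023', 'OJ C, C/2023/90, 2.10.2023'
OJ_REF = re.compile(
    r"\bOJ\s+(?P<series>[LC])"
    r"(?:\s+(?P<issue>\d+)(?:\s+(?P<sub>[A-Z]))?)?"  # issue number, optional sub-series letter
    r",\s*(?:(?:C/)?(?P<act>\d{4}/\d+),\s*)?"       # newer style: act number instead of issue
    r"(?P<day>\d{1,2})\.(?P<month>\d{1,2})\.(?P<year>\d{4})"
)


def parse_oj(ref):
    """Return series, issue, act number and date for an OJ reference, or None."""
    m = OJ_REF.search(ref)
    if m is None:
        return None
    return {
        "series": m["series"] + (m["sub"] or ""),  # e.g. 'C' + 'A' -> 'CA'
        "issue": int(m["issue"]) if m["issue"] else None,
        "act": m["act"],
        "date": date(int(m["year"]), int(m["month"]), int(m["day"])),
    }


print(parse_oj("OJ C 291 A, 8.11.1991, p. 1"))
```

Page numbers (`p. 1`) are ignored here, and the other series from the notes above (A, P, and the special editions) are not covered.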


### ELI notes

<!-->
You would be forgiven for thinking that an ELI (European Legislation Identifier) would be
* just EU government documents
* just for legislation
* just an identifier

But since its inception it has broadened to
* also be used elsewhere (apparently including the Brazilian Federal Parliament, and a French stock market regulator, which is technically independent)
* also describe things beyond legislation (though the ELI-DL addition, for draft legislation, is strictly speaking not about the legislation itself but about how it came to be)
* also specify embeddable semantic data (and the ontology required for that), which in some parts gets very semantic and abstract.



_In this context_, our scope is the identifier-looking thing that refers to legislation.

Well, a little wider. Remember that the EU makes things like [regulations, directives, decisions, and recommendations](https://usda-eu.org/faq/difference-between-a-regulation-directive-and-decision/), an important distinction in that some apply directly, others must be transposed into national law, some are addressed to specific parties rather than to everyone, and some aren't binding at all (roughly respectively).

Within the context of the EU's Official Journal, the template is something like:

        http://data.europa.eu/eli/{typeOfDocument}/{yearOfAdoption}/{numberOfDocument}/oj

so ELIs look like:

        https://eur-lex.europa.eu/eli/dec/2009/496/oj
        https://eur-lex.europa.eu/eli/dir/1965/1/oj

...which refer to
- the EU DECision we might also refer to as 2009/496/EC, which has CELEX number 32009D0496
- the EU DIRective we might also refer to as 65/1/EEC, which has CELEX number 31965L0001


For a directive, there will typically be a national law transposing it, which,
depending on whether that country has implemented ELI, may have an ELI of its own. Those follow a template like

        /eli/{jurisdiction}/{agent}/{sub-agent}/{year}/{month}/{day}/{type}/{natural identifier}/{level 1…}/{point in time}/{version}/{language}

...and are prefixed with a namespace/server that depends on the jurisdiction.



https://eur-lex.europa.eu/content/help/eurlex-content/eli.html
        
https://eur-lex.europa.eu/eli-register/implementation.html

https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=LEGISSUM%3Ajl0068
-->
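Given the Official Journal ELI template, going from such an ELI to a CELEX number looks mechanical. Here is a sketch that assumes any ELI type starting with `reg`, `dir` or `dec` (including variants like `reg_impl`) maps to the CELEX letters `R`, `L` and `D`, with the number zero-padded to four digits. The two examples above (`dec/2009/496` to 32009D0496, `dir/1965/1` to 31965L0001) come out right, but treat it as a heuristic, not as how EUR-Lex does it. The function names are hypothetical.

```python
import re

# Only the Official Journal form:
#   http://data.europa.eu/eli/{typeOfDocument}/{yearOfAdoption}/{numberOfDocument}/oj
ELI_OJ = re.compile(
    r"https?://(?:data|eur-lex)\.europa\.eu/eli/"
    r"(?P<type>[a-z_]+)/(?P<year>\d{4})/(?P<number>\d+)/oj/?"
)

# Assumption: the first three letters of the ELI type decide the CELEX letter.
CELEX_LETTER = {"reg": "R", "dir": "L", "dec": "D"}


def make_oj_eli(doc_type, year, number):
    """Fill in the Official Journal ELI template."""
    return f"http://data.europa.eu/eli/{doc_type}/{year}/{number}/oj"


def eli_to_celex(eli):
    """Guess a CELEX number from an OJ-style ELI, or return None."""
    m = ELI_OJ.fullmatch(eli.strip())
    if m is None:
        return None
    letter = CELEX_LETTER.get(m["type"][:3])
    if letter is None:
        return None  # e.g. ELIs for OJ C notices
    return f"3{m['year']}{letter}{int(m['number']):04d}"


print(eli_to_celex("https://eur-lex.europa.eu/eli/dec/2009/496/oj"))
```

Going the other way loses information (a CELEX `R` does not tell you whether it was `reg`, `reg_impl` or `reg_del`), which is why only this direction is sketched.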

## See also

* "[Identificatie- standaarden in het juridisch domein](https://www.google.com/search?q=Identificatie-+standaarden+in+het+juridisch+domein)"

## Unsorted



***LiDo*** extracts a number of such references. For example, for
  https://linkeddata.overheid.nl/document/ECLI:NL:PHR:2011:BP5608
it recognizes law references like
- (Art. 81 RO)
- (als bedoeld in art. 22 Rv.)
- art. 166 lid 1 in verbinding met art. 353 lid 1 Rv
- artikel 166 Rv
- artikel 3:303 BW
- artikel 22 Rv

and links them to:

- Wetboek van Burgerlijke Rechtsvordering, Artikel 353
- Wetboek van Burgerlijke Rechtsvordering, Artikel 22
- Wetboek van Burgerlijke Rechtsvordering, Artikel 166
- Wet op de rechterlijke organisatie, Artikel 81
- Burgerlijk Wetboek Boek 3, Artikel 303

and jurisprudence references (LJNs can be looked up to find the corresponding ECLI)
- LJN: AO7817, NJ 2005, 270
- LJN: BO6106
- LJN: ZC2793, NJ 1999, 685
- LJN: AW2089, NJ 2006, 327

- ECLI:NL:HR:2011:BP5608 - Hoge Raad, 27-05-2011 / 09/04566
- ECLI:NL:HR:2011:BO6106 - Hoge Raad, 28-01-2011 / 10/00698
- ECLI:NL:HR:2006:AW2089 - Hoge Raad, 09-06-2006 / C05/082HR
- ECLI:NL:HR:2004:AO7817 - Hoge Raad, 09-07-2004 / C03/079HR
- ECLI:NL:HR:1998:ZC2793 - Hoge Raad, 27-11-1998 / 9016 (C97/081)
- ECLI:CE:ECHR:1986:1017JUD000953281 - Europees Hof voor de Rechten van de Mens, 17-10-1986 CASE OF REES v. THE UNITED KINGDOM 9532/81
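In the examples above, each LJN reappears as the last part of the corresponding ECLI (e.g. `LJN: AO7817` and `ECLI:NL:HR:2004:AO7817`). That suggests a cheap way to tie the two lists together when you already have candidate ECLIs. Below is a sketch under that assumption. It does not replace an actual LJN lookup, since the court and year cannot be read from the LJN itself. The ECLI pattern follows the `ECLI:{country}:{court}:{year}:{ordinal}` shape with length limits as I understand the format (court code up to 7 characters, ordinal up to 25). The names are hypothetical.

```python
import re
from typing import NamedTuple


class Ecli(NamedTuple):
    country: str
    court: str
    year: int
    number: str


ECLI_RE = re.compile(
    r"ECLI:(?P<country>[A-Z]{2}):(?P<court>[A-Za-z0-9]{1,7}):"
    r"(?P<year>\d{4}):(?P<number>[A-Za-z0-9.]{1,25})"
)
# Old-style LJNs as seen above: two letters, four digits
LJN_RE = re.compile(r"\bLJN:?\s*(?P<ljn>[A-Z]{2}\d{4})\b")


def parse_ecli(text):
    """Split a single ECLI into its parts, or return None."""
    m = ECLI_RE.fullmatch(text.strip())
    if m is None:
        return None
    return Ecli(m["country"], m["court"], int(m["year"]), m["number"])


def ljns_to_eclis(text, known_eclis):
    """Map each LJN mentioned in text to a known Dutch ECLI ending in it (or None)."""
    by_number = {}
    for e in known_eclis:
        parsed = parse_ecli(e)
        if parsed is not None and parsed.country == "NL":
            by_number[parsed.number] = e
    return {m["ljn"]: by_number.get(m["ljn"]) for m in LJN_RE.finditer(text)}


print(ljns_to_eclis("LJN: AO7817, NJ 2005, 270", ["ECLI:NL:HR:2004:AO7817"]))
```

An LJN with no matching candidate maps to `None` rather than to a guessed ECLI.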

If you want that as data, consider 
http://linkeddata.overheid.nl/service/get-links?ext-id=ECLI:NL:PHR:2011:BP5608&output=xml
though, as 
https://linkeddata.overheid.nl/front/portal/services 
notes, this service is not part of the public LiDo offering, so you'll need to request an account first.
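The law references in the LiDo example for ECLI:NL:PHR:2011:BP5608 above also show how LiDo labels them. For instance, `artikel 3:303 BW` becomes `Burgerlijk Wetboek Boek 3, Artikel 303`. Below is a rough imitation of that labelling, using a hypothetical three-entry abbreviation table. It is not LiDo's method. Note that it only finds the `353` in `art. 166 lid 1 in verbinding met art. 353 lid 1 Rv`, whereas LiDo picked up both articles.

```python
import re

# Hypothetical lookup table, only covering the abbreviations in the LiDo example.
LAW_NAMES = {
    "Rv": "Wetboek van Burgerlijke Rechtsvordering",
    "RO": "Wet op de rechterlijke organisatie",
    "BW": "Burgerlijk Wetboek",
}

# 'Art. 81 RO', 'artikel 3:303 BW', 'art. 353 lid 1 Rv'
ARTICLE_REF = re.compile(
    r"\b[Aa]rt(?:\.|ikel)\s+(?P<article>\d+(?::\d+)?[a-z]?)"
    r"(?:\s+lid\s+\d+)?"                # optional 'lid N', not kept
    r"\s+(?P<law>[A-Z][A-Za-z]*)\b"     # the law abbreviation must follow directly
)


def lido_style_labels(text):
    """Return 'Law[, Boek N], Artikel M' labels for recognized article references."""
    labels = []
    for m in ARTICLE_REF.finditer(text):
        law = LAW_NAMES.get(m["law"])
        if law is None:
            continue  # unknown abbreviation: skip rather than guess
        article = m["article"]
        if ":" in article:
            # '3:303' style numbering: book 3, article 303
            book, article = article.split(":", 1)
            law = f"{law} Boek {book}"
        labels.append(f"{law}, Artikel {article}")
    return labels


print(lido_style_labels("artikel 3:303 BW"))
```

Catching the first article in "art. X ... in verbinding met art. Y Rv" would need some look-ahead for the law name, which is where the real effort goes.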




