Skip to content
Retrieve documents from EUR-Lex and convert them to usable data
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


eurlex.js is a command line utility to retrieve documents (specifically: regulation drafts) in all supported languages from the EUR-Lex website and convert them into JSON. It is made with node.js and can be installed locally via npm.


eurlex.js can be installed using npm:

npm install -g eurlex

Of course you must have node node.js with npm installed.

eurlex.js works fine with Linux, *BSD and Darwin, but never was tested with Win32.


Once installed you can use eurlex on the command line:

eurlex [options] <EUR-Lex URI>

You get a brief description of all the options with

eurlex --help

If you are curious what it looks like to get and convert something, try:

eurlex -vu -l de,en,fr COM:2012:0011:FIN -o eurlex-com-2012-0011-fin.json


Since the HTML otuput of Eurlex is pretty far from being machine readable, eurlex.js applies a lot of magic to read it anyway. The magic can be fine tuned with setting in a file called profile.json. Here is a stripped and commented version of profile.json:

	"lang": ["en","de","..."],           // array of avalable languages
	"expressions": {                     // regular expressions
		"lang": "...",                   // to match the language of the document 
		"title": "..."                   // to match the title of the document
	"delimiters": {                      // delimiters (they are all regex)
		"en": {                          // for this language 
			"recitals": ["...","..."],   // start and end of recitals
			"articles": ["...","..."],   // start and end of articles
			"chapter": "^CHAPTER ",      // string to match a chapter
			"section": "^SECTION ",      // string to match a section
			"article": "^Article ",      // string to match an article
			"fixes": [                   // before a line is parsed
				["...","..."],           // .replace(/first/, "second")
				["...","..."]            // as many as you need
		"lv": {
			"recitals": ["...","..."],
			"articles": ["...","..."],
			"chapter": [                 // if this is an array
				"^([XVI]+) NODAĻA",      // if matches: chapter
				"^([XVI]+) NODAĻA$",     // if matches: text missing
				"^([XVI]+) NODAĻA (.*)$" // $1 is the literal, $2 is the text
			"section": [                 // same here...
				"^([0-9]+)\\. IEDAĻA", 
				"^([0-9]+)\\. IEDAĻA$", 
				"^([0-9]+)\\. IEDAĻA (.*)$"
			"article": [                 // note! for article[3] 
				"^([0-9]+)\\. pants",    // $1 is the literal, __$3__ is the text
				"^([0-9]+)\\. pants$", 
				"^([0-9]+)(\\.) pants (.*)$"
			"fixes": []                  // fixes indeed can be empty

Limitations & Known issues

  • In Magyar, paragraphs and points partly use the same literal enclosures, which leads to paragraphs will be interpreted as headless points. You should be safe using --unify with another language as first parameter.
  • The translations for Malti are formatted pretty crappy and have redundant fragments. You have to hardly rely on the fixes in your profile.json


eurlex.js is licensed under EUPL

You can’t perform that action at this time.