eulaw

An R package to access data from the eulaw.app API. This package provides an intuitive, easy-to-use R interface for the eulaw.app API. This API provides access to a variety of research-ready databases, including:

  • The Evolution of European Union Law (EvoEU) Database
  • The European Commission Internal Organization (ECIO) Database
  • The European Union Infringement Procedure (EUIP) Database
  • The European Union State Aid (EUSA) Database
  • The European Union Technical Regulations (EUTR) Database
  • The European Union Member States (EUMS) Database

Replication materials and documentation are available in the GitHub repository for each database.

The eulaw.app API, the eulaw package, and all of the databases were created by Joshua C. Fjelstul, Ph.D.

Installation

You can install the latest development version of the eulaw package from GitHub:

# install.packages("devtools")
devtools::install_github("jfjelstul/eulaw")

Citation

If you use data from the eulaw package in a project or paper, please cite the package:

Joshua Fjelstul (2021). eulaw: An R Interface to the eulaw.app API. R package version 0.1.0.9000.

The BibTeX entry for the package is:

@Manual{,
  title = {eulaw: An R Interface to the eulaw.app API},
  author = {Joshua Fjelstul},
  year = {2021},
  note = {R package version 0.1.0.9000},
}

Problems

If you notice an error in the data or a bug in the R package, please report it by opening an issue in the package's GitHub repository.

Example: Getting data on the EU infringement procedure

Suppose we want directed dyad-year data on decisions in infringement cases (i.e., the number of decisions issued by each Commission department against each member state per year). Further, suppose we want data only on letters of formal notice and reasoned opinions under Article 258 of the Treaty on the Functioning of the European Union (TFEU) since 2010. This data is available from the European Union Infringement Procedure (EUIP) Database, which is part of the European Union Compliance Project (EUCP).

We can easily get exactly the data we're looking for right from R in just a few easy steps using the eulaw package, which is an R interface for the eulaw.app API. This API provides access to a variety of research-ready databases, including the EUIP Database.

Looking up databases

First, let's double-check what databases we have to work with. To see the databases that are available via the eulaw.app API, we can use the list_databases() function. This function doesn't have any arguments.

list_databases()
# Requesting data via the eulaw.app API...
# Response received...
# # A tibble: 6 x 2
#   database_id database
#         <int> <chr>   
# 1           1 evoeu   
# 2           2 ecio    
# 3           3 euip    
# 4           4 eusa    
# 5           5 eutr    
# 6           6 eums  

This function requests information via the eulaw.app API and returns a tibble (see the tidyverse) that lists the available databases. We're going to want the euip database. We could also use the describe_databases() function, which additionally gives a description of each database.

Looking up datasets

Next, we need to pick the dataset in the euip database that has directed dyad-year data on decisions in infringement cases. To see the datasets that are available in the euip database, we can use the list_datasets() function. The function takes one argument, database, as given by list_databases().

list_datasets(database = "euip")
# Requesting data via the eulaw.app API...
# Response received...
# # A tibble: 22 x 2
#    dataset_id dataset         
#         <int> <chr>           
#  1          1 cases           
#  2          2 cases_ts        
#  3          3 cases_ts_ct     
#  4          4 cases_csts_ms   
#  5          5 cases_csts_ms_ct
#  6          6 cases_csts_dp   
#  7          7 cases_csts_dp_ct
#  8          8 cases_ddy       
#  9          9 cases_ddy_ct    
# 10         10 cases_net       
# # … with 12 more rows

To see the whole list, we can assign the output to an object, as in datasets <- list_datasets(database = "euip"), and view it using View(datasets). We're looking for decisions_ddy, which contains directed dyad-year data on decisions.

If we don't already know we're looking for the decisions_ddy dataset, we can use the function describe_datasets(), which provides a description of each dataset, to see what's available and find the right one.

out <- describe_datasets(database = "euip")
# Requesting data via the eulaw.app API...
# Response received...
View(out)

Checking the codebook

To double-check that the decisions_ddy dataset has the information we're looking for, we can look at the codebook using the function describe_variables(). This function has two required arguments, database and dataset. It returns a tibble.

out <- describe_variables(
  database = "euip", 
  dataset = "decisions_ddy"
)
# Requesting data via the eulaw.app API...
# Response received...
View(out)

Searching for data

Now that we know what database and dataset we need, and how to access the documentation, we're ready to download the data. We're specifically looking for directed dyad-year data on letters of formal notice and reasoned opinions under Article 258 TFEU, so we don't need to download the entire decisions_ddy dataset, which also includes data on referrals to the Court and decisions under Article 260 TFEU. Instead of downloading the entire dataset, we can filter the data using the API and download just what we're looking for.

We can use the download_data() function to download the data. This function takes two required arguments, database and dataset, and one optional argument, parameters. The parameters argument should be a list that specifies values for API parameters. API parameters correspond to variables in each dataset and let you filter the data. The download_data() function will throw an error if we try to use an invalid API parameter.

Looking up API parameters

We can see the API parameters that are available for the decisions_ddy dataset using the function list_parameters(). This function has two required arguments, database and dataset.

list_parameters(
  database = "euip", 
  dataset = "decisions_ddy"
)
# Requesting data via the eulaw.app API...
# Response received...
# # A tibble: 5 x 2
#   parameter_id parameter        
#          <int> <chr>            
# 1            1 year_min         
# 2            2 year_max         
# 3            3 department_id    
# 4            4 member_state_id  
# 5            5 decision_stage_id

We can see there are 5 API parameters for the decisions_ddy dataset. Generally, each API parameter corresponds to one variable in the dataset. There is an API parameter for all variables ending in _id. The one exception to this rule is the year variable. If a dataset includes a year variable, there are two API parameters, year_min and year_max. This lets you specify a range.
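The naming rule can be sketched in a few lines of base R. The helper below, guess_parameters(), is hypothetical and not part of the eulaw package, and the variable names passed to it are assumed for illustration. It only restates the rule; the output of list_parameters() remains the authoritative list for any dataset.

```r
# Hypothetical helper (not part of eulaw): apply the naming rule to a
# vector of variable names to guess the matching API parameters.
guess_parameters <- function(variables) {
  # Variables ending in "_id" map to a parameter with the same name
  id_parameters <- variables[grepl("_id$", variables)]
  # A year variable maps to a range: year_min and year_max
  year_parameters <- if ("year" %in% variables) {
    c("year_min", "year_max")
  } else {
    character(0)
  }
  c(year_parameters, id_parameters)
}

# Variable names assumed for illustration only
variables <- c(
  "year", "department_id", "department",
  "member_state_id", "member_state",
  "decision_stage_id", "decision_stage"
)
guess_parameters(variables)
```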

Looking up API parameter values

We want to use the decision_stage_id parameter and the year_min parameter, which will let us filter the data by decision stage and year. For the year_min parameter, we just need to specify a year. For the parameter decision_stage_id, we need to know what values to provide in order to get letters of formal notice and reasoned opinions under Article 258 TFEU. We can look up the corresponding variables, decision_stage_id and decision_stage, in the codebook (as above). But we can easily see the unique values of decision_stage_id using the function list_parameter_values(). This function has two required arguments, database and parameter. API parameters often appear in multiple datasets within the same database, and are always coded the same way across datasets, so we don't need to specify which dataset we're interested in.

list_parameter_values(
  database = "euip", 
  parameter = "decision_stage_id"
)
# Requesting data via the eulaw.app API...
# Response received...
# # A tibble: 6 x 2
#   value label                                
#   <int> <chr>                                
# 1     1 Letter of formal notice (Article 258)
# 2     2 Reasoned opinion (Article 258)       
# 3     3 Referral to the Court (Article 258)  
# 4     4 Letter of formal notice (Article 260)
# 5     5 Reasoned opinion (Article 260)       
# 6     6 Referral to the Court (Article 260)  

We can see from the output that letters of formal notice are coded 1 and reasoned opinions are coded 2. When we specify the parameters argument in the download_data() function, we need to provide a list where the name of each element is a valid API parameter. If we want to specify multiple values for a parameter, we can use a vector, as in decision_stage_id = c(1, 2).
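As a minimal sketch, we can build the parameters list and check its names locally before making a request. The check below is not part of the eulaw package (download_data() already throws an error for invalid parameters); it assumes the parameter names shown in the list_parameters() output above and uses only base R.

```r
# Parameter names copied from the list_parameters() output shown earlier
valid_parameters <- c(
  "year_min", "year_max", "department_id",
  "member_state_id", "decision_stage_id"
)

# A named list, with a vector where a parameter takes several values
parameters <- list(
  year_min = 2010,
  decision_stage_id = c(1, 2)
)

# Names that are not valid API parameters (empty if everything is fine)
invalid <- setdiff(names(parameters), valid_parameters)
if (length(invalid) > 0) {
  stop("Invalid API parameter(s): ", paste(invalid, collapse = ", "))
}
```

A misspelled parameter name would be caught here, before any data is requested.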

Downloading data

Now that we know how to use API parameters, we can use download_data() to download just the data we're interested in.

out <- download_data(
  database = "euip",
  dataset = "decisions_ddy",
  parameters = list(
    year_min = 2010,
    decision_stage_id = c(1, 2)
  )
)
# Requesting data via the eulaw.app API...
# Response received...
# Observations requested: 18300
# Downloading 10000 observations every 5 seconds...
# Total estimated time: 0.08 minutes (5 seconds)
# Batch 1 of 2 complete (observations 1 to 10000 of 18300)
# Batch 2 of 2 complete (observations 10001 to 18300 of 18300)
# Your download is complete                         
# 
# If you use this database in a paper or project, please use the following citation:
# 
# Joshua C. Fjelstul (2021). eulaw: An R Interface to the eulaw.app API. R package version 0.1.0.9000. https://github.com/jfjelstul/eulaw
View(out)

The download_data() function downloads the data in batches of 10000 observations. The eulaw.app API has a rate limit, but this function automatically manages the rate limit for us. It will download 1 batch approximately every 5 seconds.
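The batch arithmetic behind the console messages can be worked through in base R. This is only an illustration using the numbers printed above, not the package's internal code, and it assumes the function waits once between consecutive batches.

```r
# Numbers taken from the console output above
n_observations <- 18300
batch_size <- 10000
wait_seconds <- 5

# Number of batches needed to cover all observations
n_batches <- ceiling(n_observations / batch_size)

# First and last observation in each batch
batch_start <- seq(1, n_observations, by = batch_size)
batch_end <- pmin(batch_start + batch_size - 1, n_observations)

# Assumption: one wait between consecutive batches
estimated_seconds <- (n_batches - 1) * wait_seconds

data.frame(batch = seq_len(n_batches), start = batch_start, end = batch_end)
round(estimated_seconds / 60, 2)
```

Under that assumption, 18300 observations take 2 batches and about 5 seconds, which is consistent with the estimate printed by download_data().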

The function prints some useful information to the console while the data downloads. It tells us how many observations we have requested, how many batches it will take to download the data, and approximately how long it will take. It provides an update every time a batch is downloaded and counts down to the next batch. The function returns a tibble that we can manipulate with dplyr and tidyr.

And that's it! We now have a research-ready dataset in our R workspace.