Java library to interface with OpenML
The Java App is used for a number of OpenML components, such as the ARFF parser and the evaluation engine, which depend on the Weka API. It is invoked from the OpenML API through a command-line interface. Typically, a call looks like this:
java -jar webapplication.jar -config "api_key=S3CR3T_AP1_K3Y" -f evaluate_run -r 500
In this case, the command executes the webapplication jar, invokes the function evaluate_run, and passes it run id 500. The -config parameter sets configuration items; here, the api_key is mandatory. Every OpenML user has an api_key, which can be obtained from their OpenML profile page. The function responds by calling the OpenML API to upload the evaluation results to the OpenML database. Note that the PHP website invokes the Java webapplication, which in turn calls the PHP website again, albeit at a different endpoint.
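As a rough illustration of how a -config value such as "api_key=S3CR3T_AP1_K3Y" could be interpreted, the sketch below splits a string of key=value items into a map and rejects input that lacks an api_key. The class name ConfigSketch, the parse method, and the use of ';' to separate multiple items are assumptions made for this example; they are not taken from the webapplication source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the OpenML webapplication.
class ConfigSketch {

    // Parses "key=value" items; the ';' separator is an assumption of this sketch.
    static Map<String, String> parse(String config) {
        Map<String, String> items = new LinkedHashMap<>();
        for (String item : config.split(";")) {
            String trimmed = item.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty segments such as a trailing ';'
            }
            int eq = trimmed.indexOf('=');
            if (eq < 1) {
                throw new IllegalArgumentException("Expected key=value, got: " + trimmed);
            }
            items.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        // The text above states that api_key is mandatory.
        if (!items.containsKey("api_key")) {
            throw new IllegalArgumentException("api_key is mandatory");
        }
        return items;
    }

    public static void main(String[] args) {
        Map<String, String> config = parse("api_key=S3CR3T_AP1_K3Y");
        System.out.println(config.get("api_key"));
    }
}
```

Failing early on a missing api_key mirrors the requirement stated above, so a misconfigured call is rejected before any function is invoked.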
The webapplication does not have direct write access to the database. All communication with the database goes through the OpenML Connector, which communicates with the OpenML API. As a consequence, the webapplication can run on any system; there is no formal need for it to be on the same server as the website code. This is important because it creates modularity, and because not all servers provide a command line interface to PHP scripts.
Another example is the following:
java -jar webapplication.jar -config "api_key=S3CR3T_AP1_K3Y" -f all_wrong -r 81,161 -t 59
This call takes a comma-separated list of run ids (no spaces) and a task id as input, and outputs the test examples of the dataset that all algorithms used in the runs predicted incorrectly (in this case, weka.BayesNet_K2 and weka.SMO, respectively). An error is displayed if any of the runs is inconsistent with the task id.
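The core idea of all_wrong can be sketched as an intersection: a test instance is reported only if every run mispredicted it. The sketch below works on in-memory label arrays. The class AllWrongSketch, its method names, and the toy labels are hypothetical; the real function works with the runs stored on the server rather than with arrays.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the webapplication's implementation.
class AllWrongSketch {

    // Parses a comma-separated list of run ids without spaces, e.g. "81,161".
    static List<Integer> parseRunIds(String csv) {
        List<Integer> ids = new ArrayList<>();
        for (String part : csv.split(",")) {
            ids.add(Integer.parseInt(part));
        }
        return ids;
    }

    // Returns the indices of test instances that every run predicted incorrectly.
    static List<Integer> allWrong(String[] truth, Map<Integer, String[]> predictionsByRun) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < truth.length; i++) {
            boolean everyRunWrong = true;
            for (String[] predictions : predictionsByRun.values()) {
                if (predictions[i].equals(truth[i])) {
                    everyRunWrong = false; // at least one run got this instance right
                    break;
                }
            }
            if (everyRunWrong) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] truth = {"a", "b", "a", "b"};
        List<Integer> runIds = parseRunIds("81,161");
        Map<Integer, String[]> predictions = new LinkedHashMap<>();
        predictions.put(runIds.get(0), new String[] {"a", "a", "b", "b"});
        predictions.put(runIds.get(1), new String[] {"a", "a", "a", "a"});
        System.out.println(allWrong(truth, predictions)); // prints [1]
    }
}
```

In the toy data only the instance at index 1 is mispredicted by both runs, so it is the only one reported.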
The bootstrap class of the webapplication is
org.openml.webapplication.Main
It automatically checks authentication settings (such as api_key) and then determines which function to invoke.
It uses a switch-like if-else construction to dispatch to the various functions. Additional functions can be added to this construction freely, which makes it easy to extend the webapplication.
Parameters are handled using the Apache Commons CommandLineParser class, which makes sure that the passed parameters are available to the program.
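A minimal sketch of this dispatch pattern follows. To stay dependency-free, it reads the -f, -r and -t options with a small hand-written loop instead of Apache Commons CLI. The class DispatchSketch and the strings it returns are hypothetical; only the option names and the function names evaluate_run and all_wrong come from the example calls in this document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the dispatch idea, not the actual Main class.
class DispatchSketch {

    // Collects "-name value" pairs into a map (a stand-in for a real CLI parser).
    static Map<String, String> parseOptions(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (!args[i].startsWith("-")) {
                throw new IllegalArgumentException("Expected an option, got: " + args[i]);
            }
            options.put(args[i].substring(1), args[i + 1]);
        }
        return options;
    }

    // Switch-like if-else chain on the value of -f; a new function gets a new branch.
    static String dispatch(Map<String, String> options) {
        String function = options.get("f");
        if (function == null) {
            return "No function specified";
        } else if (function.equals("evaluate_run")) {
            return "Would evaluate run " + options.get("r");
        } else if (function.equals("all_wrong")) {
            return "Would compare runs " + options.get("r") + " on task " + options.get("t");
        } else {
            return "Unknown function: " + function;
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(parseOptions(args)));
    }
}
```

Keeping option parsing separate from dispatch means that adding a function only touches the if-else chain.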
To make new functionality available to the website, an interface to the function must also be programmed somewhere in the website. The next section gives details.
By design, the REST API is not allowed to communicate with the Java App. All interfaces with the Java webapplication should go through other controllers of the PHP CodeIgniter framework, for example api_splits. Currently, the website features two main APIs, each represented by a controller. Controllers can be found in the folder openml_OS/controllers. Here we see:
- api_new.php, representing the REST API
- api_splits.php, representing an API interfacing to the Java webapplication.
The Java API allows you to connect to OpenML from Java applications.
Stable releases of the Java API are available from Maven Central. Alternatively, you can check out the developer version from GitHub.
Include the jar file in your projects as usual, or install via Maven.
- Create an OpenmlConnector instance with your authentication details. This will create a client with all OpenML functionalities.
OpenmlConnector client = new OpenmlConnector("api_key");
All functions are described in the Java Docs.
To download data, flows, tasks, runs, etc. you need the unique id of that resource. The id is shown on each item's webpage and in the corresponding url. For instance, let's download Data set 1. The following returns a DataSetDescription object that contains all information about that data set.
DataSetDescription data = client.dataGet(1);
You can also search online for the items you need, and click the icon to get all ids that match a search.
To upload data, flows, runs, etc. you need to provide a description of the object. We provide wrapper classes to hold this information, e.g. DataSetDescription, as well as to capture the server response, e.g. UploadDataSet, which always includes the generated id for reference:
DataSetDescription description = new DataSetDescription( "iris", "The famous iris dataset", "arff", "class");
UploadDataSet result = client.dataUpload( description, datasetFile );
int data_id = result.getId();
More details are given in the corresponding functions below. Also see the Java Docs for all possible inputs and return values.
Retrieves the description of a specified data set.
DataSetDescription data = client.dataGet(1);
String name = data.getName();
String version = data.getVersion();
String description = data.getDescription();
String url = data.getUrl();
Retrieves the description of the features of a specified data set.
DataFeature response = client.dataFeatures(1);
DataFeature.Feature[] features = response.getFeatures();
String name = features[0].getName();
String type = features[0].getDataType();
boolean isTarget = features[0].getIs_target();
Retrieves the description of the qualities (meta-features) of a specified data set.
DataQuality response = client.dataQuality(1);
DataQuality.Quality[] qualities = response.getQualities();
String name = qualities[0].getName();
String value = qualities[0].getValue();
For data streams. Retrieves the description of the qualities (meta-features) of a specified portion of a data stream.
DataQuality qualities = client.dataQuality(1,0,10000,null);
Retrieves a list of all data qualities known to OpenML.
DataQualityList response = client.dataQualityList();
String[] qualities = response.getQualities();
Uploads a data set file to OpenML given a description. Throws an exception if the upload fails; see openml.data.upload for error codes.
DataSetDescription dataset = new DataSetDescription( "iris", "The iris dataset", "arff", "class");
UploadDataSet data = client.dataUpload( dataset, new File("data/path"));
int data_id = data.getId();
Registers an existing data set (hosted elsewhere). The description needs to include the url of the data set. Throws an exception if the upload fails; see openml.data.upload for error codes.
DataSetDescription description = new DataSetDescription( "iris", "The iris dataset", "arff", "class");
description.setUrl("http://datarepository.org/mydataset");
UploadDataSet data = client.dataUpload( description );
int data_id = data.getId();
Retrieves the description of the flow/implementation with the given id.
Implementation flow = client.flowGet(100);
String name = flow.getName();
String version = flow.getVersion();
String description = flow.getDescription();
String binary_url = flow.getBinary_url();
String source_url = flow.getSource_url();
Parameter[] parameters = flow.getParameter();
Retrieves an array of the ids of all flows/implementations owned by you.
ImplementationOwned response = client.flowOwned();
Integer[] ids = response.getIds();
Checks whether an implementation with the given name and version is already registered on OpenML.
ImplementationExists check = client.flowExists("weka.j48", "3.7.12");
boolean exists = check.exists();
int flow_id = check.getId();
Removes the flow with the given id (if you are its owner).
ImplementationDelete response = client.openmlImplementationDelete(100);
Uploads implementation files (binary and/or source) to OpenML given a description.
Implementation flow = new Implementation("weka.J48", "3.7.12", "description", "Java", "WEKA 3.7.12");
UploadImplementation response = client.flowUpload( flow, new File("code.jar"), new File("source.zip"));
int flow_id = response.getId();
Retrieves the description of the task with the given id.
Task task = client.taskGet(1);
String task_type = task.getTask_type();
Input[] inputs = task.getInputs();
Output[] outputs = task.getOutputs();
Retrieves all evaluations for the task with the given id.
TaskEvaluations response = client.taskEvaluations(1);
Evaluation[] evaluations = response.getEvaluation();
For data streams. Retrieves all evaluations for the task over the specified window of the stream.
TaskEvaluations response = client.taskEvaluations(1);
Evaluation[] evaluations = response.getEvaluation();
Retrieves the description of the run with the given id.
Run run = client.runGet(1);
int task_id = run.getTask_id();
int flow_id = run.getImplementation_id();
Parameter_setting[] settings = run.getParameter_settings();
EvaluationScore[] scores = run.getOutputEvaluation();
Deletes the run with the given id (if you are its owner).
RunDelete response = client.runDelete(1);
Uploads a run to OpenML, including a description and a set of output files depending on the task type.
Run.Parameter_setting[] parameter_settings = new Run.Parameter_setting[1];
parameter_settings[0] = new Run.Parameter_setting(null, "M", "2");
Run run = new Run("1", null, "100", "setup_string", parameter_settings);
Map<String,File> outputs = new HashMap<String,File>();
outputs.put("predictions", new File("predictions.arff"));
UploadRun response = client.runUpload( run, outputs);
int run_id = response.getRun_id();