OpenML Weka Connector

Package for uploading Weka experiments to OpenML. It works in combination with the OpenML Apiconnector (available on Maven Central; version >= 1.0.14) and Weka (available on Maven Central; version >= 3.9.0).

Downloading datasets from OpenML

The following code example downloads a specific set of OpenML datasets and loads each one into the Weka data format (weka.core.Instances), which can then be used for offline development and experimentation.

public static void downloadData() throws Exception {
  // Fill in the API key (obtainable from your OpenML profile)
  String apikey = "<FILL_IN_OPENML_API_KEY>";
  
  // Instantiate the OpenmlConnector object
  // requires artifact org.openml.apiconnector (version >= 1.0.14) from Maven Central
  OpenmlConnector openml = new OpenmlConnector(apikey);
  
  // Download the OpenML object containing the "OpenML100" benchmark set
  Study s = openml.studyGet("OpenML100", "data");
  
  // Loop over all the datasets
  for (Integer dataId : s.getDataset()) {
    // DataSetDescription is an OpenML object containing meta-information about the dataset
    DataSetDescription dsd = openml.dataGet(dataId);
    
    // Download the raw dataset file from OpenML
    File datasetFile = dsd.getDataset(apikey);
    
    // Parse the file into the Weka format; the reader is closed once parsing is done
    try (FileReader reader = new FileReader(datasetFile)) {
      Instances dataset = new Instances(reader);
      System.out.println("Downloaded " + dsd.getName());
      System.out.println("numObservations = " + dataset.numInstances() + "; numFeatures = " + dataset.numAttributes());
    }
  }
}
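
The counts printed above come from the ARFF file that Weka parses: numAttributes() counts every declared attribute (including the class attribute) and numInstances() counts the data rows. As a rough illustration only, the sketch below derives the same two numbers from the lines of an ARFF file using plain Java. ArffShape is a hypothetical helper, not part of Weka or the OpenML Apiconnector, and it assumes one data row per line; it does not handle every ARFF feature (for example, quoted values that span lines).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Weka or the OpenML Apiconnector.
final class ArffShape {
  final int numAttributes;
  final int numInstances;

  ArffShape(int numAttributes, int numInstances) {
    this.numAttributes = numAttributes;
    this.numInstances = numInstances;
  }

  // Counts @attribute declarations and data rows in the given ARFF lines.
  // Simplifying assumption: every non-empty, non-comment line after @data is one instance.
  static ArffShape count(List<String> lines) {
    int attributes = 0;
    int instances = 0;
    boolean inData = false;
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("%")) {
        continue; // skip blank lines and ARFF comments
      }
      String lower = trimmed.toLowerCase(Locale.ROOT);
      if (inData) {
        instances++;
      } else if (lower.startsWith("@attribute")) {
        attributes++;
      } else if (lower.startsWith("@data")) {
        inData = true;
      }
    }
    return new ArffShape(attributes, instances);
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList(
        "% toy dataset",
        "@relation toy",
        "@attribute length numeric",
        "@attribute class {yes,no}",
        "",
        "@data",
        "1.0,yes",
        "2.5,no",
        "3.0,yes");
    ArffShape shape = count(lines);
    System.out.println("numObservations = " + shape.numInstances
        + "; numFeatures = " + shape.numAttributes);
  }
}
```

For a downloaded file, the lines could be obtained with java.nio.file.Files.readAllLines(datasetFile.toPath()). For real work, weka.core.Instances remains the right tool; this only shows where the numbers come from.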

Uploading Weka experiments

The following code example downloads a specific set of OpenML tasks (the OpenML100 benchmark set) and runs a NaiveBayes classifier on each of them. Every run is uploaded to OpenML, and the ID of the uploaded run is returned.

public static void runTasksAndUpload() throws Exception {
  // Fill in the API key (obtainable from your OpenML profile)
  String apikey = "<FILL_IN_APIKEY>";
  
  // WekaConfig lets us enable or disable various Weka-specific options
  WekaConfig config = new WekaConfig();
  
  // Instantiate the OpenmlConnector object
  // requires artifact org.openml.apiconnector (version >= 1.0.14) from Maven Central
  OpenmlConnector openml = new OpenmlConnector(apikey);
  
  // Download the OpenML object containing the "OpenML100" benchmark set
  Study s = openml.studyGet("OpenML100", "tasks");
  
  // Loop over all the tasks
  for (Integer taskId : s.getTasks()) {
    // Create a Weka classifier to run on the task
    Classifier classifier = new NaiveBayes();
    
    // Execute the task (can take a while, depending on the classifier / dataset combination)
    int runId = RunOpenmlJob.executeTask(openml, config, taskId, classifier);
    
    // After several minutes, the evaluation measures will be available on the server
    System.out.println("Available on " + openml.getApiUrl() + "run/" + runId);
    
    // Download the run from the server:
    Run run = openml.runGet(runId);
  }
}
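
Because the method above declares throws Exception, a single failing task stops the loop before the remaining tasks are run. One possible way around this, sketched below under stated assumptions, is to catch the exception per task and record it. BatchRunner and TaskExecutor are hypothetical names introduced for this illustration and are not part of this package; TaskExecutor stands in for a call such as RunOpenmlJob.executeTask(openml, config, taskId, classifier), and the demo uses a fake executor so the sketch runs without a server.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the OpenML Weka Connector.
final class BatchRunner {

  // Stand-in for the real call that executes one task and returns its run ID.
  interface TaskExecutor {
    int execute(int taskId) throws Exception;
  }

  // Outcome of a batch: run IDs of successful tasks, error messages of failed ones.
  static final class Result {
    final Map<Integer, Integer> runIds = new LinkedHashMap<>();
    final Map<Integer, String> failures = new LinkedHashMap<>();
  }

  // Runs every task; a failure is recorded and the loop continues with the next task.
  static Result runAll(List<Integer> taskIds, TaskExecutor executor) {
    Result result = new Result();
    for (int taskId : taskIds) {
      try {
        result.runIds.put(taskId, executor.execute(taskId));
      } catch (Exception e) {
        result.failures.put(taskId, String.valueOf(e.getMessage()));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Fake executor: task 2 fails, every other task returns taskId * 10 as its run ID.
    TaskExecutor fake = taskId -> {
      if (taskId == 2) {
        throw new IllegalStateException("simulated failure");
      }
      return taskId * 10;
    };
    Result result = runAll(Arrays.asList(1, 2, 3), fake);
    System.out.println("runs: " + result.runIds);
    System.out.println("failed: " + result.failures);
  }
}
```

In real use, the lambda would wrap the executeTask call from the example above, and the failures map could be inspected afterwards to decide which tasks to retry.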

Obtaining experimental results from OpenML

OpenML contains a large number of experiments, freely available to everyone. To obtain and analyse these results, use the OpenML Apiconnector; its GitHub page includes a demonstration.

How to cite

If you found this package useful, please cite: J. N. van Rijn, Massively Collaborative Machine Learning, Leiden University, 2016. If you used OpenML in a scientific publication, please check out the OpenML citation policy.