Latest release of OXPath Project: 1.0.4
The first version, OXPath 1.0, can be found at https://github.com/diadem/OXPath.
The current version supports Linux and OSX platforms.
Meltwater uses OXPath to extract millions of documents from hundreds of thousands of sources daily.
OXPath Project consists of the following modules:
- OXPath Core, implementing the core functionality of the OXPath language.
- WebAPI, implementing an interface to web browsers (only Firefox 47.0.1 is currently supported).
- Util contains functionality required for the project.
- Output Handlers are a set of modules for serialising the result tree of OXPath into different formats (e.g., XML, JSON, CSV, RDB).
- OXPath CLI is a command line interface for OXPath.
- Browser Installer installs a web browser required by OXPath.
The project requires Java 1.7 (or higher).
Linux users need to run Browser Installer, which installs a web browser into the .oxpath directory in their home directory.
Mac users need to install a web browser supported by OXPath (i.e., Firefox 47.0.1) and provide OXPath with a configuration file as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<diadem>
  <webapi>
    <platforms>
      <platform os-type="OSX">
        <home user-home-rel="true">.oxpath</home>
        <browser name="FIREFOX">
          <relpath>firefox_47.0.1</relpath>
          <run-file-path>/Applications/Firefox 47.0.1.app/Contents/MacOS/firefox</run-file-path>
          <display-size-file-relpath>display_size</display-size-file-relpath>
          <download-dir-relpath>download</download-dir-relpath>
        </browser>
      </platform>
    </platforms>
  </webapi>
</diadem>
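Before starting OXPath on a Mac, it can help to confirm that the browser binary named in the configuration actually exists. The following is a minimal sketch and not part of OXPath: the helper names `run_file_path` and `check_browser` are hypothetical, the configuration file path is passed as an argument because its location is not stated here, and the element is extracted with a simple text match rather than a real XML parser.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OXPath) for sanity-checking the configuration file.

# Print the content of the <run-file-path> element in the config file "$1"
# (the last one, if several are present). Uses a plain text match, not an XML parser.
run_file_path() {
  tr '\n' ' ' < "$1" |
    sed -n 's:.*<run-file-path>\([^<]*\)</run-file-path>.*:\1:p'
}

# Succeed if the configured browser binary exists and is executable.
check_browser() {
  browser_path=$(run_file_path "$1")
  if [ -n "$browser_path" ] && [ -x "$browser_path" ]; then
    echo "browser found: $browser_path"
  else
    echo "browser missing or not executable: ${browser_path:-no run-file-path in $1}" >&2
    return 1
  fi
}
```

With these functions loaded, `check_browser "$CONFIG_FILE"` (where `CONFIG_FILE` is a placeholder for wherever the configuration file is stored) reports whether the Firefox binary can be launched from the configured path.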
Installation Into Your Local Repository
Installing OXPath requires Maven 3.
All OXPath Maven artifacts can be installed with either of the following commands:
mvn install (with unit tests) or
mvn install -Dmaven.test.skip=true (without unit tests).
These commands will also create a binary file
oxpath-cli.jar, which you can find in the
The command line interface for OXPath is implemented in the oxpath-cli directory; building it produces the executable oxpath-cli.jar.
Details on running oxpath-cli.jar can be found in oxpath-cli/README.md.
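The build prerequisites stated above (Java 1.7 or higher, Maven 3) can be checked before running mvn install. The sketch below is an illustration only: the function names are hypothetical, the Maven requirement is treated as a minimum version (an assumption), and the parsing assumes the usual output of `java -version` (a line containing `version "..."`) and `mvn -v` (a line containing `Apache Maven`).

```shell
#!/bin/sh
# Hypothetical pre-build check (not part of OXPath).

# Succeed if dotted version "$1" is greater than or equal to "$2".
# Only the leading numeric part is compared, so "1.8.0_292" is read as 1.8.0.
version_at_least() {
  awk -v have="$1" -v need="$2" 'BEGIN {
    sub(/[^0-9.].*/, "", have); sub(/[^0-9.].*/, "", need)
    split(have, h, "."); split(need, n, ".")
    for (i = 1; i <= 3; i++) {
      if (h[i] + 0 > n[i] + 0) exit 0
      if (h[i] + 0 < n[i] + 0) exit 1
    }
    exit 0
  }'
}

# Print the installed Java version, e.g. 1.8.0_292 or 11.0.2 (assumes the usual output format).
java_version() {
  java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Print the installed Maven version, e.g. 3.6.3 (assumes the usual output format).
maven_version() {
  mvn -v 2>/dev/null | sed -n 's/.*Apache Maven \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if both prerequisites are met; Maven 3 is treated as a minimum (assumption).
check_prerequisites() {
  version_at_least "$(java_version)" 1.7 || { echo "Java 1.7 or higher is required" >&2; return 1; }
  version_at_least "$(maven_version)" 3 || { echo "Maven 3 is required" >&2; return 1; }
}
```

A typical use would be `check_prerequisites && mvn install`, so the build only starts when both tools are present in a suitable version.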
OXPath can be integrated into other Maven projects with the following dependency statements:
<dependency>
  <groupId>org.oxpath</groupId>
  <artifactId>oxpath-core</artifactId>
  <version>2.2.1</version>
</dependency>
<dependency>
  <groupId>org.oxpath</groupId>
  <artifactId>webapi</artifactId>
  <version>1.4.1</version>
</dependency>
To select an output handler, which converts the OXPath output tree into a target format, add the corresponding dependency statement. All available output handlers can be found in the directory output-handlers.
An example for the OXPath XML Output Handler:
<dependency>
  <groupId>org.oxpath</groupId>
  <artifactId>oxpath-output-xml</artifactId>
  <version>1.0.1</version>
</dependency>
Documentation and References
- The Javadoc API
- User manual: Fayzrakhmanov et al. "Introduction to OXPath" (2018)
- Paper: Furche et al. "OXPath: A language for scalable data extraction, automation, and crawling on the deep web" (2013)
OXPath Syntax Highlighting
- Andrew Sellers, the University of Oxford
- Giovanni Grasso, the University of Oxford & Meltwater
- Tim Furche, the University of Oxford & Meltwater
- Ruslan Fayzrakhmanov, the University of Oxford & QuantumBlack (a McKinsey company). The main contact person for the open source version (ruslan.fayzrakhmanov AT cs.ox.ac.uk)
- Giorgio Orsi, the University of Oxford & Meltwater
- Christian Schallhart, the University of Oxford
A complete list of authors and contributors is in CONTRIBUTORS.md.
Copyright (C) 2016-2019, OXPath Team.