LinkedPipes ETL

LinkedPipes ETL is an RDF-based, lightweight ETL tool.

Library of components to get you started faster
Sharing of configuration among individual pipelines using templates
RDF configuration of transformation pipelines

Requirements

Linux, Windows, macOS
Docker
Standalone Docker Compose is optional, as modern versions of Docker include the docker compose command

For building locally

Java 17, 18, 20
Git
Optionally Maven
Node.js 18 & npm
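
Before building, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of LP-ETL: the function name missing_tools is hypothetical, and it only checks that the commands exist, not that their versions match the list above.

```shell
#!/usr/bin/env bash
# Print the name of every given tool that is not found on PATH.
# missing_tools is a hypothetical helper, not shipped with LP-ETL.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Maven is listed as optional above, so it is not checked here.
missing="$(missing_tools java git node npm)"
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing
else
  echo "All required build tools found."
fi
```

Version checks (for example java -version or node --version) still have to be done by hand.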

Installation and startup

You can run LP-ETL in Docker, or build it from source.

Docker

To start LP-ETL you can use:

git clone https://github.com/linkedpipes/etl.git
cd etl
docker compose up

This uses pre-built images stored at GitHub Packages. The images are built from the main branch.

Alternatively, you can use a one-liner. For example, to run LP-ETL from the develop branch on http://localhost:9080, you can use the following command:

curl https://raw.githubusercontent.com/linkedpipes/etl/develop/docker-compose.yml | LP_ETL_PORT=9080 LP_VERSION=develop docker-compose -f - up

You may need to run the docker command with sudo or be in the docker group.

Building Docker images

You can build the LP-ETL images yourself. Note that on Windows, there is an issue with BuildKit. See the temporary workaround.

Configuration

Environment variables:

LP_VERSION - Determines the version of the Docker images. The default value is main.
LP_ETL_DOMAIN - The URL of the instance. It is used instead of domain.uri from the configuration.
LP_ETL_PORT - Specifies the port mapping for the frontend; this is where you connect to your instance. It does NOT have to match the port in LP_ETL_DOMAIN, for example when running behind a reverse proxy.

The Docker Compose setup uses several volumes that can be used to access or provide data. See the comments in docker-compose.yml for examples and configuration. You may want to create your own docker-compose.yml for custom configuration.
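
Instead of passing the variables on the command line each time, they can be kept in a .env file, which Docker Compose reads from the project directory for variable substitution. This is a minimal sketch: the port and domain values are illustrative assumptions, and it presumes the repository's docker-compose.yml picks these variables up through substitution, as the one-liner above suggests.

```shell
# Write the three variables described above to a .env file in the
# directory that contains docker-compose.yml.
# The values are examples only; adjust them to your instance.
cat > .env <<'EOF'
LP_VERSION=main
LP_ETL_PORT=9080
LP_ETL_DOMAIN=http://localhost:9080
EOF
```

With this file in place, a plain docker compose up in the same directory should use these values.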

From source on Linux

Installation

$ git clone https://github.com/linkedpipes/etl.git
$ cd etl
$ mvn install

Configuration

Edit the configuration file deploy/configuration.properties as needed, mainly to change the paths to the working, storage, log and library directories.

Startup

$ cd deploy
$ ./executor.sh >> executor.log &
$ ./executor-monitor.sh >> executor-monitor.log &
$ ./storage.sh >> storage.log &
$ ./frontend.sh >> frontend.log &
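
The four startup commands above can be wrapped in a small helper. This is only a sketch: the function name start_components and the .pid files are hypothetical additions, and, unlike the commands above, it also redirects stderr into the log files. It assumes it is called from the deploy directory.

```shell
#!/usr/bin/env bash
# Start the four LP-ETL components in the background, in the same order
# as above, appending output to <name>.log and saving the PID to <name>.pid.
# start_components and the .pid files are illustrative, not part of LP-ETL.
start_components() {
  local name
  for name in executor executor-monitor storage frontend; do
    nohup "./${name}.sh" >> "${name}.log" 2>&1 &
    echo $! > "${name}.pid"
  done
}
```

After defining the function, run cd deploy and then start_components. A component can later be stopped with kill "$(cat executor.pid)", for example.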

Running LP-ETL as a systemd service

See example service files in the deploy/systemd folder.

From source on Windows

Note that it is also possible to use Bash on Ubuntu on Windows or Cygwin and proceed as with Linux.

Installation

git clone https://github.com/linkedpipes/etl.git
cd etl
mvn install

Configuration

Edit the configuration file deploy/configuration.properties as needed, mainly to change the paths to the working, storage, log and library directories.

Startup

In the deploy folder, run

executor.bat
executor-monitor.bat
storage.bat
frontend.bat

Data import

You can copy pipeline and template data from one instance to another directly.

Assume that you have a copy of a data directory ./data-source with pipelines and templates subdirectories. You can obtain such a directory from any running instance, and you can even merge the content of several such directories. To import the data into a new instance, copy the files to the respective directories under ./data-target. Keep in mind that this preserves the IRIs.

Should you need to change the IRIs, use the import and export functionality available in the frontend.
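
The direct file copy described above could be sketched as follows. The function name import_data is hypothetical; the directory names ./data-source and ./data-target are the ones used in the text. Whether the target instance must be stopped or restarted around the copy is not covered here.

```shell
#!/usr/bin/env bash
# Copy the pipelines and templates subdirectories from a source data
# directory into a target data directory. IRIs are preserved because
# the files are copied unchanged.
# import_data is an illustrative helper, not part of LP-ETL.
import_data() {
  local source_dir="$1" target_dir="$2" sub
  for sub in pipelines templates; do
    mkdir -p "${target_dir}/${sub}"
    # The trailing "/." copies the directory contents, hidden files included.
    cp -R "${source_dir}/${sub}/." "${target_dir}/${sub}/"
  done
}
```

Usage: import_data ./data-source ./data-target. Files with the same name in the target are overwritten, so check for collisions before merging several source directories.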

Plugins - Components

The components live in the jars directory. If you need to create your own component, you can copy an existing component and change it.

Update notes

Update note 5: 2019-09-03 breaking changes in the configuration file. Remove /api/v1 from the executor-monitor.webserver.uri, so it looks like: executor-monitor.webserver.uri = http://localhost:8081. You can also remove executor.execution.uriPrefix as the value is derived from domain.uri.

Update note 4: 2019-07-03 we changed the way frontend is run. If you do not use our script to run it, you need to update yours.

Update note 3: When upgrading from develop prior to 2017-02-14, you need to delete {deploy}/jars and {deploy}/osgi.

Update note 2: When upgrading from master prior to 2016-11-04, you need to move your pipelines folder from e.g., /data/lp/etl/pipelines to /data/lp/etl/storage/pipelines, update the configuration.properties file and possibly the update/restart scripts as there is a new component, storage.

Update note 1: When upgrading from master prior to 2016-04-07, you need to delete your old execution data (e.g., in /data/lp/etl/working/data)
