A quick pipeline to import Stack Exchange XML dump data into a relational database.
mvn clean package
Before running the pipeline, execute schema-base.sql against the target
output schema. This creates the tables and the indices required for the
data dump.
Run with Docker, passing the required app.datasource.xxx and spring.datasource.xxx properties as environment variables:
docker run -e APP_DATASOURCE_URL=XXXXX -e ... snimmagadda/stacke-batch-mysql:latest
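The environment variable names in the command above follow from the property names. Assuming the application uses Spring Boot's standard relaxed binding, the documented rule is: replace dots with underscores, remove dashes, and convert to upper case. A minimal sketch of that mapping, where `to_env_name` is a hypothetical helper for illustration and not part of this project:

```shell
#!/bin/sh
# Hypothetical helper (not part of this project): convert a Spring Boot
# property name to its environment-variable form.
# Rule applied: dots -> underscores, dashes removed, letters upper-cased.
to_env_name() {
    printf '%s\n' "$1" \
        | sed -e 's/\./_/g' -e 's/-//g' \
        | tr '[:lower:]' '[:upper:]'
}

to_env_name app.datasource.url                # prints APP_DATASOURCE_URL
to_env_name app.datasource.driver-class-name  # prints APP_DATASOURCE_DRIVERCLASSNAME
```

Each resulting name can then be passed to the container with its own `-e NAME=value` flag.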
To run from source, update the app.datasource.xxx properties to point at
your target database. By default, job and task metadata (metrics) are
written to an in-memory HSQL DB; this can be overridden with the
spring.datasource.xxx properties. Example yaml:
app:
  datasource:
    dialect: org.hibernate.dialect.MySQLDialect
    driver-class-name: org.mariadb.jdbc.Driver
    url: "jdbc:mysql://localhost:3306/stacke"
    username: "root"
    password: "password"
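The spring.datasource.xxx override for job and task metadata can also be supplied through the environment instead of application.yaml. The sketch below assumes the standard Spring Boot `spring.datasource` properties; the `stacke_meta` schema name and the credentials are placeholders chosen for illustration, not values shipped with this project:

```shell
#!/bin/sh
# Assumed values: "stacke_meta" and the credentials below are placeholders
# for illustration only. Adjust them to match your own database.
export SPRING_DATASOURCE_URL="jdbc:mysql://localhost:3306/stacke_meta"
export SPRING_DATASOURCE_USERNAME="root"
export SPRING_DATASOURCE_PASSWORD="password"
# driver-class-name loses its dashes in environment-variable form.
export SPRING_DATASOURCE_DRIVERCLASSNAME="org.mariadb.jdbc.Driver"

echo "metadata datasource: $SPRING_DATASOURCE_URL"
```

With these exported, a run started from the same shell should write metadata to that database rather than the in-memory default. Depending on how the project initializes its metadata tables, they may need to exist in the target database beforehand; that step is not covered here.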
Streamlined import options are a work in progress. For now, application.yaml must be configured manually, and running from source is the simplest way to pass in custom data files. Once the properties are configured, run locally with:
mvn spring-boot:run
mvn test
👤 Sai Nimmagadda
- Website: https://s11a.com
- Github: @snimmagadda1
Contributions, issues and feature requests are welcome!
Feel free to check the issues page.
Copyright © 2020 Sai Nimmagadda.
This project is MIT licensed.
This README was generated with ❤️ by readme-md-generator