Batch pipeline to import Stack Exchange XML data dumps to relational DB

snimmagadda1/stack-exchange-dump-to-mysql

stack-exchange-dump-to-mysql 👋

Version License: MIT

A quick pipeline to import Stack Exchange XML dump data to a relational db

🏠 TODO

Install

mvn clean package

Usage

Before running the pipeline, execute schema-base.sql against the target output schema. This creates the tables and the indices required for the data dump.
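
One way to apply the schema file is with the mysql command-line client. The following is a minimal sketch, not the project's own tooling: it assumes the mysql client is installed, that the target database already exists, and that the host, user, and database name match the example datasource URL further down. The `load_schema` helper and the `DB_*` variables are hypothetical names used only for illustration.

```shell
# Sketch: load schema-base.sql into the target database with the mysql client.
# DB_HOST, DB_USER and DB_NAME are assumed values; adjust them to your setup.
DB_HOST="${DB_HOST:-localhost}"
DB_USER="${DB_USER:-root}"
DB_NAME="${DB_NAME:-stacke}"   # database name taken from the example JDBC URL

# load_schema is a hypothetical helper, not part of this repository.
load_schema() {
  # $1: path to schema-base.sql (its location in your checkout is an assumption)
  schema_file="$1"
  if [ ! -f "$schema_file" ]; then
    echo "schema file not found: $schema_file" >&2
    return 1
  fi
  # -p prompts for the password; the SQL statements are read from stdin.
  mysql -h "$DB_HOST" -u "$DB_USER" -p "$DB_NAME" < "$schema_file"
}
```

After defining the helper, call it with the path to the file, for example `load_schema path/to/schema-base.sql`.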

Run with Docker, passing the required app.datasource.xxx and spring.datasource.xxx properties as environment variables:

docker run -e APP_DATASOURCE_URL=XXXXX -e ... snimmagadda/stacke-batch-mysql:latest 
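
As a fuller illustration of the command above, the sketch below passes the three datasource values from the example yaml as environment variables. Only `APP_DATASOURCE_URL` appears in this README. The other two variable names are assumptions based on the usual Spring Boot convention of upper-casing a property name and replacing dots with underscores, and this is not necessarily the complete set the image requires. `run_pipeline`, the `APP_DB_*` shell variables, and the `db-host` hostname are hypothetical. Note that `localhost` inside a container refers to the container itself, so the URL needs a host that is reachable from the container.

```shell
# Sketch: run the published image with datasource settings passed as env vars.
# db-host is a placeholder for a database host reachable from the container.
APP_DB_URL="${APP_DB_URL:-jdbc:mysql://db-host:3306/stacke}"
APP_DB_USER="${APP_DB_USER:-root}"
APP_DB_PASSWORD="${APP_DB_PASSWORD:-password}"   # default mirrors the example yaml

# run_pipeline is a hypothetical wrapper, not part of this repository.
run_pipeline() {
  # USERNAME and PASSWORD variable names are assumed from the
  # app.datasource.xxx property names; verify them against the project.
  docker run --rm \
    -e APP_DATASOURCE_URL="$APP_DB_URL" \
    -e APP_DATASOURCE_USERNAME="$APP_DB_USER" \
    -e APP_DATASOURCE_PASSWORD="$APP_DB_PASSWORD" \
    snimmagadda/stacke-batch-mysql:latest
}
```

Call `run_pipeline` once the variables are set. Any spring.datasource.xxx overrides would presumably follow the same naming pattern.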

To run from source, update the app.datasource.xxx properties to point at the target database. Job/task metrics metadata is written to an in-memory HSQL DB by default; this can be overridden with the spring.datasource.xxx properties. Example yaml:

app:
  datasource:
    dialect: org.hibernate.dialect.MySQLDialect
    driver-class-name: org.mariadb.jdbc.Driver
    url: "jdbc:mysql://localhost:3306/stacke"
    username: "root"
    password: "password"

More streamlined ways to import are a work in progress. For now, application.yaml must be configured manually, and running from source is the simplest way to pass in custom data files. Once the properties are configured, you can run locally with the following:

mvn spring-boot:run

Run tests

mvn test

Author

👤 Sai Nimmagadda

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page.

📝 License

Copyright © 2020 Sai Nimmagadda.
This project is MIT licensed.


This README was generated with ❤️ by readme-md-generator
