Repository for an Akka microservice that lifts trained Spark ML models into an actor system with HTTP endpoints.
akka-lift-ml helps you with the hard data engineering part once you have found a good solution with your data science team. The service can train your models on a remote Spark instance and serve the results with a small local Spark service. You can access it over HTTP, e.g. with the integrated Swagger UI. To build your own system you need sbt and Scala. Trained models are saved to AWS S3 and referenced in a PostgreSQL database, so you can scale out your instances for load balancing.
- JDK 8 (http://www.oracle.com/technetwork/java/javase/downloads/index.html)
- sbt (http://www.scala-sbt.org/release/docs/Getting-Started/Setup.html)
- Docker for Docker builds (https://www.docker.com/community-edition/)
- AWS account if you want to use a Cognito user pool for authentication (https://aws.amazon.com/de/)
- enough memory for Spark
- Integration of Swagger UI
localhost:8080/v1/swagger/index.html
- Auto-generated Swagger doc from routes as YAML / JSON
localhost:8080/v1/api-docs/swagger.yaml
or localhost:8080/v1/api-docs/swagger.json
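Assuming an instance is running locally on the default port 8080, the documentation endpoints can be composed from one base URL (a sketch; adjust host and port to your setup):

```shell
# Base URL of a locally running instance (assumes the default port 8080)
BASE="http://localhost:8080/v1"

# The Swagger UI lives at $BASE/swagger/index.html
# The generated spec can be fetched while the service runs, e.g.:
#   curl -s "$BASE/api-docs/swagger.json"
echo "$BASE/api-docs/swagger.yaml"
```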
- CRUD repositories via slick-repo
- CORS Support via akka-http-cors
- Authentication with AWS Cognito (JWK) and JWT tokens via nimbusds (in Java)
- Test coverage with ScalaTest and scoverage code coverage report
- Ready for Docker deployment and CloudFormation deployment
- Config file with optional runtime parameters
- In-memory PostgreSQL database for tests
- Flyway database migration
- HikariCP as connection pool
- Logging via Log4j with an XML template
- Collaborative filtering with ALS (Alternating Least Squares), even when the user is not in the rating data
- Easy cleaning of data.
- More Spark MLlib features
- Add more and better tests
- Prepare your data with 3 columns: user, product, rating - a sample can be found in the test resources (retail-raiting.csv)
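A minimal input file in that three-column format can be sketched like this (the user/product IDs and ratings below are illustrative, not taken from the bundled sample):

```shell
# Write a tiny three-column ratings file: header plus a few example rows
cat > /tmp/ratings-sample.csv <<'EOF'
user,product,rating
1,101,5.0
1,102,3.0
2,101,4.0
EOF

# Show the header row
head -n 1 /tmp/ratings-sample.csv
```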
- If you want to train remotely rather than on your local machine, first start your Spark cluster (e.g. a Spark cluster with 1x master & 3x workers via Docker)
- Check out the source code from GitHub
- Start a PostgreSQL database via RDS, Docker, or locally
- Make the related config changes in application.conf or docker.conf
- If you use AWS, make sure the S3 bucket is not in Europe; Spark 2.1 cannot write/read data there
- Create a JAR to serve as the Spark driver
sbt package
- Make sure the path in application.conf is set correctly.
- Run
sbt run
- Go to the Swagger UI (http://localhost:8283/swagger/index.html)
- Send your request to the service
- After successful training you get the result via HTTP GET
- run
sbt docker:publishLocal
to create a Docker container image
For more details and instructions read the wiki.
SQL_URL
- database URL in the scheme jdbc:postgresql://host:port/database-name
SQL_USER
- database user
SQL_PASSWORD
- database password
NIC_IP
- IP address bound to the HTTP service, default is 0.0.0.0
NIC_PORT
- TCP port used for the HTTP service, default is 8080
USER_POOL
- define a Cognito user pool other than the preconfigured one
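The variables can be exported before starting the service, for example (hypothetical values; adjust host, port, and credentials to your environment):

```shell
# Hypothetical connection settings for a local PostgreSQL instance
export SQL_URL="jdbc:postgresql://localhost:5432/liftml"
export SQL_USER="dbuser"
export SQL_PASSWORD="dbpass"

# Bind the HTTP service to all interfaces on the default port
export NIC_IP="0.0.0.0"
export NIC_PORT="8080"

echo "$SQL_URL"
```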
To run application, call:
sbt run
To launch the application in Docker, configure a database Docker instance and run the Docker image generated by sbt.
Generating application docker image and publishing on localhost:
sbt docker:publishLocal
Example of running the generated Docker image:
docker run --name akkaHttp -m 6g -e SQL_USER=dbuser -e SQL_PASSWORD=dbpass -e SQL_URL=jdbcURL -d -p 8283:8283 APPLICATION_IMAGE
APPLICATION_IMAGE
- ID or name of the application Docker image
Look at the --link parameter if the database is also a Docker container.
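The same setup (the service container talking to a database container) can also be sketched as a Docker Compose file; the service names, credentials, and image name below are placeholders, not part of the repository:

```yaml
# docker-compose.yml sketch (all names and credentials are placeholders)
version: "2"
services:
  db:
    image: postgres:9.6
    environment:
      POSTGRES_USER: dbuser
      POSTGRES_PASSWORD: dbpass
  akka-lift-ml:
    image: APPLICATION_IMAGE   # replace with your published image id/name
    mem_limit: 6g
    ports:
      - "8283:8283"
    environment:
      SQL_URL: jdbc:postgresql://db:5432/postgres
      SQL_USER: dbuser
      SQL_PASSWORD: dbpass
    depends_on:
      - db
```

With Compose, the app container reaches the database by its service name (`db`), so an explicit `--link` is not needed.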
To run tests, call:
sbt test
To run all tests with code coverage, call:
sbt clean coverage test
To generate a coverage report after the test run, call:
sbt coverageReport
Tobias Jonas
akka-lift-ml is licensed under Apache License, Version 2.0.
Commercial Support innFactory Cloud & DataEngineering