
Performance and refactor of Spark module #73

Closed
@timrobertson100

Data growth means the backfill jobs now regularly take 9+ hours on 600 cores, which is a concern. The code has also become convoluted and difficult to maintain. The explorations in this proof of concept look promising and should be completed to provide a drop-in replacement for the module.

Some design goals:

  • A drop-in replacement module (targeting Spark 2, but planning for Spark 3)
  • Aim for a significant performance increase using the same or fewer resources
  • Aim to remove Scala code unless necessary
  • Aim for simpler code that is easier to grok