
[query] On Google Cloud, large pipelines that encounter transient errors often fail to restart cleanly. #13356

Closed
@chrisvittal

Description


The VDS combiner is flaky when run with Query-on-Batch on GCP, due to problems reading VCFs with intervals.

Errors observed:

  • BGZ validation errors
  • Unexpected end of input

Both point to a problem in the interface between the FSSeekableInputStream that underpins GoogleFS and the BGZipInputStream that wraps it, at least when more than one seek occurs.
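For context on the first error: BGZF (the blocked gzip format from the SAM/BAM specification) is a sequence of gzip members, each carrying a `BC` extra subfield that records the block's total size minus one. A reader that resumes at the wrong compressed offset after a seek would plausibly fail a header check like the ones sketched here, or run off the end of its buffer. This is an illustrative Python sketch only: `make_bgzf_block` and `check_bgzf_block` are hypothetical helpers, not Hail's BGZipInputStream, and they handle single blocks of at most 64 KiB.

```python
import struct
import zlib


def make_bgzf_block(payload: bytes) -> bytes:
    """Build one BGZF block (hypothetical helper, for illustration)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw DEFLATE; we write the gzip framing ourselves
    cdata = comp.compress(payload) + comp.flush()
    block_size = 12 + 6 + len(cdata) + 8  # header + BC subfield + data + CRC32/ISIZE
    if block_size > 65536:
        raise ValueError("payload too large for a single BGZF block")
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)  # FLG.FEXTRA set, XLEN = 6
    extra = struct.pack("<BBHH", 66, 67, 2, block_size - 1)  # 'B', 'C', SLEN = 2, BSIZE
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + extra + cdata + trailer


def check_bgzf_block(buf: bytes, offset: int = 0) -> int:
    """Validate the BGZF block starting at `offset`; return its total size (BSIZE + 1)."""
    if len(buf) - offset < 12:
        raise ValueError("truncated gzip header")
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from("<BBBBIBBH", buf, offset)
    if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
        raise ValueError("bad gzip magic or missing FEXTRA flag")  # e.g. offset not at a block start
    pos, end = offset + 12, offset + 12 + xlen
    if len(buf) < end:
        raise ValueError("truncated extra field")
    block_size = None
    while pos + 4 <= end:  # scan extra subfields for 'BC'
        si1, si2, slen = struct.unpack_from("<BBH", buf, pos)
        if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= end:
            block_size = struct.unpack_from("<H", buf, pos + 4)[0] + 1
        pos += 4 + slen
    if block_size is None or block_size < 12 + xlen + 8:
        raise ValueError("missing or invalid BC subfield")
    if len(buf) - offset < block_size:
        raise ValueError("truncated block")  # roughly an "unexpected end of input"
    cdata = buf[offset + 12 + xlen : offset + block_size - 8]
    crc, isize = struct.unpack_from("<II", buf, offset + block_size - 8)
    data = zlib.decompress(cdata, -15)
    if len(data) != isize or zlib.crc32(data) != crc:
        raise ValueError("CRC32/ISIZE mismatch")
    return block_size
```

Checking a block at a misaligned offset (for example `offset=1`) fails the magic check, and a buffer cut short before the block ends fails the size check, which is the kind of distinction that may help tell a bad seek target from a short read.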

Unfortunately, the conditions that trigger this are rare, and the errors are even less frequent when our clusters are quieter (at night).
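Because the failure is rare in practice, a deterministic check that performs many seeks and compares every read against known bytes may exercise the multi-seek path more reliably than waiting for busy clusters. This sketch assumes one plausible failure mode, not a diagnosis: a seekable stream that keeps serving a stale read-ahead buffer after a seek. `BufferedSeekableReader` and its `fetch(start, length)` callback (standing in for a ranged read from GCS) are hypothetical names, not Hail's FSSeekableInputStream.

```python
from typing import Callable


class BufferedSeekableReader:
    """Hypothetical buffered, seekable reader over a ranged-read source."""

    def __init__(self, fetch: Callable[[int, int], bytes], size: int, chunk: int = 8):
        self._fetch = fetch      # fetch(start, length) -> bytes, like a ranged GET
        self._size = size
        self._chunk = chunk      # read-ahead size
        self._buf = b""
        self._buf_start = 0      # file offset of self._buf[0]
        self._pos = 0            # logical file position

    def seek(self, pos: int) -> None:
        if not 0 <= pos <= self._size:
            raise ValueError("seek out of range")
        self._pos = pos
        # Keep the buffer only if pos falls inside it; otherwise drop it so the
        # next read refetches at pos instead of returning stale bytes.
        if not self._buf_start <= pos < self._buf_start + len(self._buf):
            self._buf = b""
            self._buf_start = pos

    def tell(self) -> int:
        return self._pos

    def read(self, n: int) -> bytes:
        out = bytearray()
        while n > 0 and self._pos < self._size:
            off = self._pos - self._buf_start
            if off >= len(self._buf):  # buffer exhausted: refill at the current position
                self._buf = self._fetch(self._pos, self._chunk)
                self._buf_start = self._pos
                off = 0
                if not self._buf:
                    break  # source returned nothing; avoid looping forever
            piece = self._buf[off : off + n]
            out += piece
            self._pos += len(piece)
            n -= len(piece)
        return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 4
    reader = BufferedSeekableReader(lambda s, l: data[s : s + l], len(data), chunk=7)
    # Many seeks, forward and backward, inside and outside the buffer.
    for p in list(range(0, len(data) + 1, 13)) + list(range(len(data), -1, -29)):
        reader.seek(p)
        assert reader.read(11) == data[p : p + 11]
    print("all seek/read checks passed")
```

The invariant being checked is that `_buf_start` always describes the bytes in `_buf`; a reader that updated `_pos` on seek without resetting or re-anchoring the buffer would pass single-seek tests and fail only on later seeks, which matches the "more than one seek" pattern described above.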
