Use a hash for reference/alternate_bases when making the merge key #102

@arostamianfar

We currently just concatenate reference/alternate_bases together as the merge key (see here).

This may break Dataflow if reference/alternate_bases are large (larger than the 1 MB limit for the key). We should create a hash of these fields and use that instead, which ensures that our keys do not get arbitrarily large.

As a result, the key would be reference_name:start_position:end_position:<hash of reference_bases:alternate_bases>. Given that we include reference_name/start/end as part of the key (outside the hash), the risk of collision is extremely low. However, we can still add an assert here just to be sure.
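
A minimal sketch of what such a key could look like, hashing reference_bases and alternate_bases as the issue title proposes. The function name `make_merge_key`, its parameters, and the choice of SHA-256 over a JSON serialization are assumptions for illustration, not the project's existing code:

```python
import hashlib
import json


def make_merge_key(reference_name, start, end, reference_bases, alternate_bases):
    """Hypothetical helper: build a merge key whose size does not grow with the bases."""
    # Serialize the bases unambiguously before hashing. JSON is an assumed
    # choice here; a plain ':' join could be ambiguous because some alternate
    # alleles (e.g. breakends) may themselves contain ':'.
    payload = json.dumps([reference_bases, list(alternate_bases)])
    # SHA-256 always yields a 64-character hex digest, however long the input is.
    digest = hashlib.sha256(payload.encode('utf-8')).hexdigest()
    # reference_name, start and end stay in clear text, outside the hash.
    return ':'.join([reference_name, str(start), str(end), digest])


short_key = make_merge_key('chr1', 100, 101, 'A', ['T'])
long_key = make_merge_key('chr1', 100, 101, 'A' * 2000000, ['T'])
# Both keys have the same length even though the second input is about 2 MB.
print(len(short_key), len(long_key))
```

With this shape the key length depends only on reference_name and the two positions, so it stays far below the 1 MB limit.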
