We currently just concatenate reference/alternate_bases together as the merge key (see here).
This may break Dataflow if reference/alternate_bases are long enough that the key exceeds the 1MB key size limit. We should create a hash of these fields and use that instead, which ensures that our keys do not get arbitrarily large.
As a result, the key would be reference_name:start_position:end_position:<hash of reference_bases:alternate_bases>. Because reference_name, start_position, and end_position stay in the key unhashed, a collision would require two different sets of bases at the same position to produce the same hash, so the risk is extremely low. However, we can still add an assert here just to be sure.
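
A minimal sketch of what the hashed key could look like. The function names (`make_merge_key`, `assert_no_collision`), the choice of SHA-256, and the `:` separator inside the hashed string are assumptions for illustration, not the existing code:

```python
import hashlib


def make_merge_key(reference_name, start, end, reference_bases, alternate_bases):
    """Builds a merge key whose length does not grow with the bases.

    Hypothetical helper: the name and the SHA-256 choice are assumptions.
    """
    # Concatenate the reference and alternate bases, then hash the result so
    # the key stays small even when the bases are megabytes long.
    bases = ':'.join([reference_bases] + list(alternate_bases))
    digest = hashlib.sha256(bases.encode('utf-8')).hexdigest()
    return '{}:{}:{}:{}'.format(reference_name, start, end, digest)


def assert_no_collision(seen, key, reference_bases, alternate_bases):
    """Asserts that a key always maps to the same bases (hypothetical check).

    `seen` is a dict from merge key to the bases first recorded for that key.
    """
    bases = (reference_bases, tuple(alternate_bases))
    previous = seen.setdefault(key, bases)
    assert previous == bases, 'Merge key collision for {}'.format(key)
```

With SHA-256 the hash part is always 64 hex characters, so the key length depends only on reference_name and the two positions. Any other fixed-length hash would work the same way.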