allow for infinite allele violations in simulation/validation #17

wsdewitt · 2017-02-06T18:25:05Z

We need validations that are robust to simulation results that violate the infinite alleles assumption. Currently the simulation code is set up to retry if the collapsed tree has repeated alleles, but it would be better if our validation could handle this case directly. Maybe for MRCA we could score that allele using the copy/lineage that gives the largest Hamming distance to the corresponding ancestor in the inferred tree, or maybe take the mean over all copies?
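A minimal sketch of the two scoring options floated above (worst copy vs. mean over copies). It assumes each copy of the repeated allele in the true tree contributes one true ancestor sequence, all aligned to the same length as the inferred ancestor; `score_repeated_allele` and its `how` parameter are hypothetical names, not part of the existing validation code.

```python
from statistics import mean
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def score_repeated_allele(
    inferred_ancestor: str,
    true_ancestors: Sequence[str],
    how: str = "max",
) -> float:
    """Score a repeated allele against every copy/lineage in the true tree.

    Hypothetical helper: `true_ancestors` holds one true ancestor sequence
    per copy of the repeated allele. `how="max"` picks the worst-case copy,
    `how="mean"` averages over all copies.
    """
    if not true_ancestors:
        raise ValueError("need at least one copy of the allele")
    distances = [hamming(inferred_ancestor, t) for t in true_ancestors]
    if how == "max":
        return max(distances)
    if how == "mean":
        return mean(distances)
    raise ValueError(f"unknown aggregation: {how!r}")


print(score_repeated_allele("AAAA", ["AAAT", "ATTT"], how="max"))   # 3
print(score_repeated_allele("AAAA", ["AAAT", "ATTT"], how="mean"))  # 2
```

The max option is the conservative choice (it never makes a violation look better than the worst copy), while the mean smooths over copies and may hide a badly placed lineage.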

matsen · 2017-02-06T21:50:29Z

Great that you're thinking about this.

The infinite-sites case and the repeated-alleles case seem different enough that treating them as comparable might create problems. In any case, I'd want to know how frequently it happened.

So how frequently does it happen? Could we just say that the assumption was violated in x% of the simulations?
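One way to get that x% is sketched below. It assumes each simulated collapsed tree can be summarized as a list of its node genotypes, so an infinite alleles violation shows up as the same genotype appearing at more than one node; `has_repeated_alleles` and `violation_rate` are hypothetical helpers, not existing simulation code.

```python
from collections import Counter
from typing import Iterable, Sequence


def has_repeated_alleles(genotypes: Sequence[str]) -> bool:
    """True if any genotype occurs at more than one node of the tree."""
    return any(count > 1 for count in Counter(genotypes).values())


def violation_rate(simulations: Iterable[Sequence[str]]) -> float:
    """Fraction of simulated trees that violate the infinite alleles assumption."""
    sims = list(simulations)
    if not sims:
        raise ValueError("no simulations given")
    return sum(has_repeated_alleles(g) for g in sims) / len(sims)


sims = [["A", "B", "C"], ["A", "B", "A"], ["C", "C"], ["A"]]
print(f"{100 * violation_rate(sims):.0f}% of simulations violated the assumption")  # 50%
```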

krdav · 2017-02-26T02:19:23Z

Indeed, repeated alleles happen quite often in the simulation. How often depends strongly on the simulation parameters, but I have seen several parameter regimes where 90-99% of all simulations were terminated because of repeated alleles.

krdav · 2017-03-21T21:59:03Z

The conclusion from today's talk was that we are going to deal with this by enumerating all possible trees with this convergently evolved leaf, computing the validation metrics on each of them, and taking the worst tree as the result.
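The exhaustive approach can be sketched as a search over every combination of placements. This assumes each repeated allele has a finite list of candidate placements (for example, which lineage it is attached to) and that a caller-supplied `error` function scores a fully resolved tree, with larger meaning worse; both the encoding and `worst_resolution_exhaustive` are hypothetical.

```python
from itertools import product
from typing import Callable, Hashable, Sequence, Tuple

Resolution = Tuple[Hashable, ...]


def worst_resolution_exhaustive(
    options: Sequence[Sequence[Hashable]],
    error: Callable[[Resolution], float],
) -> Tuple[Resolution, float]:
    """Evaluate every way to resolve the repeats and return the worst one.

    options[i] lists the candidate placements for repeated allele i;
    error(resolution) scores one fully resolved tree (higher = worse).
    The number of evaluations is the product of len(options[i]).
    """
    if not options or any(len(opts) == 0 for opts in options):
        raise ValueError("every repeat needs at least one candidate placement")
    worst_choice, worst_err = None, float("-inf")
    for choice in product(*options):
        err = error(choice)
        if err > worst_err:
            worst_choice, worst_err = choice, err
    return worst_choice, worst_err


print(worst_resolution_exhaustive([[0, 2], [1, 3]], error=sum))  # ((2, 3), 5)
```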

The problem is that the number of trees increases exponentially with the number of repeated alleles. This could be resolved by iterating through the unresolved repeats one at a time, finding the worst (by a given validation metric) way to resolve each one, and then moving on to the next repeat.
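The one-repeat-at-a-time idea can be sketched as a greedy pass. It uses the same hypothetical encoding as an exhaustive search would (a list of candidate placements per repeat, and an `error` function where higher is worse). Starting from the first candidate for every repeat, it fixes each repeat at its worst option given the choices made so far, so it needs only the sum of the option counts in evaluations instead of their product. Being greedy, it may miss the true worst tree when repeats interact.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Resolution = Tuple[Hashable, ...]


def worst_resolution_greedy(
    options: Sequence[Sequence[Hashable]],
    error: Callable[[Resolution], float],
) -> Tuple[Resolution, float]:
    """Resolve repeats one at a time, each at its locally worst placement.

    options[i] lists candidate placements for repeated allele i;
    error(resolution) scores one fully resolved tree (higher = worse).
    Unvisited repeats sit at their first candidate while earlier ones are tried.
    """
    if not options or any(len(opts) == 0 for opts in options):
        raise ValueError("every repeat needs at least one candidate placement")
    choice: List[Hashable] = [opts[0] for opts in options]
    for i, opts in enumerate(options):
        worst_opt, worst_err = opts[0], float("-inf")
        for opt in opts:
            choice[i] = opt
            err = error(tuple(choice))
            if err > worst_err:
                worst_opt, worst_err = opt, err
        choice[i] = worst_opt  # lock in the worst placement, move to next repeat
    final = tuple(choice)
    return final, error(final)


print(worst_resolution_greedy([[0, 2], [1, 3]], error=sum))  # ((2, 3), 5)
```

For an error that is additive across repeats, as in the usage line, the greedy pass finds the same worst tree as full enumeration; for interacting repeats it gives a lower bound on the worst-case error.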

wsdewitt · 2017-08-15T22:16:55Z

We can now simulate repeated genotypes, and the MRCA and COAR validation metrics work (although the RF metric is wonky).

wsdewitt closed this as completed Aug 15, 2017
