Fix uniqueness check in intersect.by_shared_chroms() #581
Closes #574. Thank you @dajana17 for reporting this!
The intended check was to see whether table and other each contain only one chromosome (and that it is the same chromosome in both). However, .is_unique does almost the opposite: it is true when all values in the chromosome column are unique. For example, it triggers for [chr1, chr2, chr3, chr4, chr5], while the intention was to trigger for [chr1, chr1, chr1, chr1, chr1].

This appears to be a very rare edge case, because in real-world data the list of chromosomes in both tables was probably never, or very rarely, unique.

Actually, the list of FASTA contigs (the first table) is always unique, so only the BED file needs to have non-repeating chromosomes for the bug to trigger.
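
To make the difference concrete, here is a minimal sketch using plain pandas Series. The helper names (all_distinct, single_chromosome, same_single_chromosome) are hypothetical and are not taken from the intersect module. The nunique() == 1 comparison is only one plausible way to express the intended check, and not necessarily what this PR implements.

```python
import pandas as pd


def all_distinct(chroms):
    """What Series.is_unique reports: True when no value repeats."""
    return bool(pd.Series(chroms).is_unique)


def single_chromosome(chroms):
    """Hypothetical form of the intended check: exactly one distinct value."""
    return bool(pd.Series(chroms).nunique() == 1)


def same_single_chromosome(table_chroms, other_chroms):
    """Hypothetical: both columns hold one chromosome, and it is the same one."""
    return (
        single_chromosome(table_chroms)
        and single_chromosome(other_chroms)
        and table_chroms[0] == other_chroms[0]
    )


distinct = ["chr1", "chr2", "chr3", "chr4", "chr5"]
repeated = ["chr1", "chr1", "chr1", "chr1", "chr1"]

print(all_distinct(distinct), single_chromosome(distinct))  # True False
print(all_distinct(repeated), single_chromosome(repeated))  # False True
print(same_single_chromosome(repeated, ["chr1", "chr1"]))   # True
print(same_single_chromosome(repeated, ["chr2", "chr2"]))   # False
```

The first two lines of output show why the bug fires on a BED file with non-repeating chromosomes: .is_unique is true for exactly the input where the single-chromosome shortcut should not apply.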