Deal with LINE elements #3
I think I should filter out repeat sequences, perhaps both before and after BLAST. The ECR11 sequence doesn't give many BLAST hits outside of primates, so removing the LINE element from it prior to BLAST may improve the hits in other species.
Okay, I've now (in commit 6418a80) removed repeats from the data pipeline both before BLAST (I BLAST with both the masked and the unmasked sequence to get better hits) and after it. However, I'm going to run into an issue when I get to the final selection calling, since that step assumes that the coordinates in the MPRA data match the human sequence, which they no longer necessarily do.
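A minimal sketch of the "BLAST with both the masked and unmasked sequence" step, assuming repeats such as LINEs arrive soft-masked (lowercase), as in RepeatMasker output or UCSC genome FASTA files. The function names and the `_masked`/`_unmasked` suffixes are hypothetical; this is not the pipeline's actual code, just the idea of emitting one hard-masked and one unmasked query per region.

```python
def hard_mask(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with 'N' so BLAST ignores them.

    Assumes repeats (e.g. LINEs) are soft-masked in lowercase.
    """
    return "".join("N" if base.islower() else base for base in seq)


def blast_queries(name: str, seq: str) -> list[tuple[str, str]]:
    """Return an unmasked and a hard-masked query for the same region."""
    return [
        (f"{name}_unmasked", seq.upper()),
        (f"{name}_masked", hard_mask(seq)),
    ]


def to_fasta(records: list[tuple[str, str]]) -> str:
    """Serialize (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    region = "ACGTacgtacgtACGT"  # hypothetical soft-masked ECR fragment
    print(to_fasta(blast_queries("ECR11", region)))
```

Both queries can then go into a single FASTA file passed to BLAST, and hits from either query are pooled afterwards.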
Alright, with the latest bolus of commits, I think I have everything fixed. I ran into an additional issue where different regions were getting masked in different sequences, so I got around that by propagating the masks after a final alignment. I solved the coordinate issue by reinserting a Homo sapiens sequence under a different name and having all my scripts know not to mask it.
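A rough sketch of the two fixes described above, under stated assumptions: the final alignment is a dict of equal-length gapped strings, `-` is a gap, masked bases are `N` or lowercase, and the reinserted human copy has the hypothetical name `hg_ref`. Any column masked in one sequence is masked in all others except the names in `keep`, and alignment columns are mapped back to positions in the ungapped reference so MPRA coordinates still line up. Names and data layout are illustrative, not taken from the repository.

```python
from typing import Optional

GAP = "-"


def is_masked(base: str) -> bool:
    """Hard-masked ('N'/'n') or soft-masked (lowercase) bases count as masked."""
    return base.upper() == "N" or base.islower()


def propagate_masks(alignment: dict[str, str], keep: set[str]) -> dict[str, str]:
    """Mask every column that is masked in any sequence not listed in `keep`.

    Sequences in `keep` (e.g. the hypothetical unmasked 'hg_ref' copy) are
    neither used to decide masking nor modified.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    length = lengths.pop()
    masked_cols = {
        i
        for i in range(length)
        if any(is_masked(seq[i]) for name, seq in alignment.items() if name not in keep)
    }
    out = {}
    for name, seq in alignment.items():
        if name in keep:
            out[name] = seq
        else:
            # Gaps stay gaps; every other base in a masked column becomes 'N'.
            out[name] = "".join(
                "N" if i in masked_cols and base != GAP else base
                for i, base in enumerate(seq)
            )
    return out


def column_to_ref_coord(ref_aligned: str, start: int = 0) -> list[Optional[int]]:
    """Map each alignment column to a 0-based position in the ungapped reference.

    Columns where the reference has a gap map to None.
    """
    coords: list[Optional[int]] = []
    pos = start
    for base in ref_aligned:
        if base == GAP:
            coords.append(None)
        else:
            coords.append(pos)
            pos += 1
    return coords


if __name__ == "__main__":
    aln = {"hg_ref": "ACGTAC", "human": "ACgtAC", "mouse": "AC-TTC"}
    print(propagate_masks(aln, keep={"hg_ref"}))
    print(column_to_ref_coord("AC-GT"))
```

Keeping the reference copy out of both the mask decision and the masking step is what lets its coordinates stay identical to the original human sequence.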
For ECR11, Justin removes LINE elements from... the hominoid sequences? The reconstruction? The text is unclear, but in any case I need to avoid comparing bases that fall in LINE elements and similar repeats.
Which means that I need to
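One way to "not compare bases in LINE elements" is to skip masked columns when counting differences between two aligned sequences (say, a species and the reconstruction). This is a minimal sketch assuming equal-length gapped strings, `-` for gaps, and `N`/lowercase for masked bases; `compare_unmasked` is a hypothetical helper, not part of the repository.

```python
def is_masked(base: str) -> bool:
    """Hard-masked ('N'/'n') or soft-masked (lowercase) bases count as masked."""
    return base.upper() == "N" or base.islower()


def compare_unmasked(a: str, b: str) -> tuple[int, int]:
    """Return (columns compared, differences), skipping gapped or masked columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        # Ignore any column where either sequence has a gap or a masked base.
        if x == "-" or y == "-" or is_masked(x) or is_masked(y):
            continue
        compared += 1
        if x != y:
            diffs += 1
    return compared, diffs


if __name__ == "__main__":
    print(compare_unmasked("ACGTNNAC", "ACCTGGA-"))
```

Returning the number of compared columns alongside the difference count makes it possible to report a rate that is not deflated by masked regions.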