Deal with LINE elements #3

Closed
petercombs opened this issue Apr 12, 2019 · 3 comments

Comments

@petercombs
Owner

For ECR11, Justin removes LINE elements from... the Homonoidae sequences? The reconstruction? The text is unclear, but at any rate, I need to not compare bases in LINE elements and similar.

Which means that I need to

  1. Identify LINEs and other repetitive elements
  2. Figure out an appropriate time to remove them from my sequences
  3. Do everything after that.
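
A minimal sketch of steps 1 and 2, assuming the sequences arrive soft-masked (repeats in lowercase, the usual convention for RepeatMasker-style output). The helper names `repeat_intervals` and `hard_mask` are hypothetical and are not taken from this repository.

```python
def repeat_intervals(seq):
    """Return half-open (start, end) intervals covering lowercase runs.

    Assumption: lowercase bases mark repeats (soft-masking).
    """
    intervals = []
    start = None
    for i, base in enumerate(seq):
        if base.islower():
            if start is None:
                start = i          # a new repeat run begins here
        elif start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:          # run extends to the end of the sequence
        intervals.append((start, len(seq)))
    return intervals


def hard_mask(seq, mask_char="N"):
    """Replace every soft-masked base with mask_char, preserving length."""
    return "".join(mask_char if base.islower() else base for base in seq)


if __name__ == "__main__":
    example = "ACGTacgtACgt"
    print(repeat_intervals(example))
    print(hard_mask(example))
```

Hard-masking keeps the sequence length unchanged, so coordinates stay comparable until the repeats are actually deleted.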
@petercombs
Owner Author

I think I should filter repeats out of the sequences, perhaps both before and after blast. The ECR11 sequence doesn't give many blast hits outside of primates, so removing the LINE from it prior to blast may improve the hits in other species.

@petercombs
Owner Author

petercombs commented Apr 24, 2019

Okay, I've now (in commit 6418a80) removed repeats from the data pipeline both prior to blasting (I blast with both the masked and unmasked sequence to get better hits) and after. However, I'm going to run into an issue when I get to the final selection calling, since that assumes that the coordinates in the MPRA data match the human sequence, which they no longer necessarily do.
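
To illustrate the coordinate problem: once masked bases are dropped, position *i* in the stripped sequence no longer equals position *i* in the human reference. A minimal sketch, assuming masked bases are marked with `N` and then deleted; `strip_masked` is a hypothetical helper, not the code in the commit.

```python
def strip_masked(seq, mask_char="N"):
    """Drop masked bases and record where each kept base came from.

    Returns (stripped_sequence, index_map), where index_map[j] is the
    original 0-based position of the j-th base in the stripped sequence.
    Caveat: a genuine N in the input is dropped as well.
    """
    kept = [(i, base) for i, base in enumerate(seq) if base != mask_char]
    stripped = "".join(base for _, base in kept)
    index_map = [i for i, _ in kept]
    return stripped, index_map


if __name__ == "__main__":
    stripped, index_map = strip_masked("ACNNGT")
    # Position 2 in the stripped sequence is position 4 in the original.
    print(stripped, index_map)
```

Keeping a map like this (or keeping an unmasked copy of the reference) is one way for downstream steps to translate back to human coordinates.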

@petercombs
Owner Author

Alright, with the latest bolus of commits, I think I have everything fixed. I ran into an additional issue where different regions were getting masked in different sequences, so I got around that by propagating the masks after a final alignment.

The coordinate issue I solved by reinserting a Homo sapiens sequence with a different name, and having all my scripts know not to mask that.
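
A rough sketch of both fixes together, under these assumptions: the alignment is a dict of equal-length strings, masked bases are `N`, gaps are `-`, and the unmasked human copy is listed in `skip`. The name `propagate_masks` and the record name `human_unmasked` are hypothetical, not the names used in the repository.

```python
def propagate_masks(alignment, mask_char="N", gap_char="-", skip=()):
    """Mask a column in every sequence if any sequence is masked there.

    Sequences named in `skip` neither contribute masks nor receive them,
    so an unmasked reference copy keeps its original bases.
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    # Collect every column that is masked in at least one non-skipped sequence.
    masked_cols = {
        i
        for name, seq in alignment.items()
        if name not in skip
        for i, base in enumerate(seq)
        if base == mask_char
    }

    result = {}
    for name, seq in alignment.items():
        if name in skip:
            result[name] = seq  # leave the reference copy untouched
            continue
        result[name] = "".join(
            mask_char if (i in masked_cols and base != gap_char) else base
            for i, base in enumerate(seq)
        )
    return result


if __name__ == "__main__":
    aln = {
        "human": "ACGTAC",
        "chimp": "ACNNAC",
        "mouse": "A-GTAC",
        "human_unmasked": "ACGTAC",
    }
    for name, seq in propagate_masks(aln, skip=("human_unmasked",)).items():
        print(name, seq)
```

Gaps are left as gaps so the alignment columns stay intact, and the skipped copy still carries the original human bases for coordinate lookups.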
