Vcf filtering #31

Merged
merged 4 commits into from Nov 19, 2021


Contributor

@jkbonfield jkbonfield commented Mar 1, 2021

The workflows section was getting long, so I split it into separate documents.

I then added a VCF filtering guide, with images showing true vs false variants plotted against a range of INFO and FORMAT metrics. It's clear some of these metrics work well for filtering variants, although choosing good thresholds is tricky because the values needed are often depth-specific. The main aim is to teach people how to explore filtering on their own data sets (assuming they have a known truth set). Possibly we should improve on this by providing some scripts rather than the one-liners embedded within the tutorial, although the one-liners are at least educational.

For now this is a draft PR as the parameters filtered on are tied to the changes in samtools/bcftools#1363, especially for indel filtering.

I think we can only consider publishing a filtering guide once that bcftools change goes live (not this release, but hopefully the one after).

A preformatted version of this PR (at the time of writing at least) can be seen here:

https://jkbonfield.github.io/www.htslib.org/workflow-filter/
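
As a rough illustration of the exploration described above (counting true and false variants as a threshold on one metric changes), here is a minimal Python sketch. It is not taken from the guide or from bcftools: the function names `parse_calls` and `sweep_qual` are hypothetical, it reads plain-text VCF lines only, it uses QUAL as the single metric, and it assumes the truth set is available as a set of (CHROM, POS, REF, ALT) tuples with each call appearing once.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# A variant is identified here by (CHROM, POS, REF, ALT); this keying is an
# assumption of the sketch, not something the guide prescribes.
VariantKey = Tuple[str, int, str, str]


def parse_calls(vcf_lines: Iterable[str]) -> Iterator[Tuple[VariantKey, float]]:
    """Yield (variant key, QUAL) for each data line of a plain-text VCF."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, qual = fields[0], int(fields[1]), fields[3], fields[4], fields[5]
        if qual == ".":
            continue  # records with missing QUAL are ignored in this sketch
        yield (chrom, pos, ref, alt), float(qual)


def sweep_qual(
    calls: Iterable[Tuple[VariantKey, float]],
    truth: Set[VariantKey],
    thresholds: Iterable[float],
) -> List[Tuple[float, int, int, int]]:
    """Return (threshold, TP, FP, FN) for calls kept with QUAL >= threshold."""
    calls = list(calls)
    rows = []
    for t in thresholds:
        kept = [key for key, qual in calls if qual >= t]
        tp = sum(1 for key in kept if key in truth)  # kept and in the truth set
        fp = len(kept) - tp                          # kept but not in the truth set
        fn = len(truth) - tp                         # truth variants not kept
        rows.append((t, tp, fp, fn))
    return rows
```

Running `sweep_qual` over a range of thresholds gives one row per threshold, which can then be plotted or tabulated per depth to see where false positives drop off faster than true positives.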

@jkbonfield jkbonfield marked this pull request as ready for review October 25, 2021 13:40
Member

@daviesrob daviesrob left a comment


images/150x_DPQ30_lhist.png appears to be empty (and possibly not referenced)? Is it supposed to be there?

Rather than have workflow-cram.md etc, would it be possible to put the workflows into their own directory (so workflow/cram.md), so the root directory doesn't get too crowded?


Additionally `bcftools call` has some options which govern output of variants.

Option                    | Description
Member


Are all these non-breaking spaces necessary?

Contributor Author


Yes or it looks totally rubbish, but I've found a better solution in <img width=150/>.

Contributor Author


I've also removed the DPQ30 images, added the ones I missed (BQBZ), fixed the rogue quote and moved the workflows into a subdirectory. Testing it at the moment. Thanks for the feedback.

The most obvious filter parameter however is the QUAL field.

Bcftools can filter-in or filter-out using options `-i` and `-e`
respectively on the `bcftools view` or `bcftools filter1 commands. For
Member


Missing closing back-tick here.
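
For readers unfamiliar with the include/exclude idea in the quoted passage, the selection logic can be sketched in Python as below. This is only a model of the semantics, not bcftools code: `filter_records` and `qual_of` are hypothetical names, the predicate is applied to QUAL only, and treating a missing QUAL as "does not match" is an assumption of this sketch. On the command line the corresponding forms would be along the lines of `bcftools view -i 'QUAL>=30' in.vcf.gz` and `bcftools view -e 'QUAL<30' in.vcf.gz`, where `in.vcf.gz` is a placeholder file name.

```python
from typing import Callable, Iterable, Iterator, Optional


def qual_of(line: str) -> Optional[float]:
    """Return the QUAL column of a VCF data line, or None if it is missing ('.')."""
    field = line.rstrip("\n").split("\t")[5]
    return None if field == "." else float(field)


def filter_records(
    lines: Iterable[str],
    predicate: Callable[[float], bool],
    mode: str = "include",
) -> Iterator[str]:
    """Keep records matching the predicate ("include") or not matching it ("exclude").

    Header lines are always passed through unchanged.
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        qual = qual_of(line)
        # Assumption of this sketch: a missing QUAL never matches the predicate.
        matched = qual is not None and predicate(qual)
        if matched == (mode == "include"):
            yield line
```

Including on `QUAL >= 30` and excluding on `QUAL < 30` select the same records whenever QUAL is present; under the assumption above they differ only for records where QUAL is missing.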

Each has a table of contents linking to the others for ease of
navigation.  Maybe this isn't needed though as the menus are there also?
- Moved workflows to their own subdirectory
- Added BQBZ plots
- Removed DPQ30 plots
- Fixed table widths in a more elegant fashion
- Fixed missing quote
@jkbonfield
Contributor Author

After a ridiculous amount of trial and error to prevent 0 headings or 2 headings per page, I finally wrestled it into submission and I think this is now ready for review again.

@jkbonfield jkbonfield merged commit fe0f6a6 into samtools:gh-pages Nov 19, 2021