
Output column names #13

Closed
colindaven opened this issue Mar 19, 2022 · 7 comments
Labels: documentation (Improvements or additions to documentation), question (Further information is requested)


@colindaven

Looks very interesting. Did I miss the output column names somewhere?

gi_1070105496_ref_NC_031108.1__Propionibacterium_phage_PFR2,_complete_genome           2   2    2    132   0   0.0000
gi_1070125494_ref_NC_031129.1__Salmonella_phage_SJ46,_complete_genome                  2   2    2    128   0   0.0000
gi_110645916_ref_NC_001401.2__Adeno-associated_virus_-_2,_complete_genome              1   2    2    67    0   0.0000

Thanks

@esteinig
Owner

esteinig commented Mar 19, 2022

Ah, sorry, I haven't documented everything properly yet. You can add -t/--table to get better-formatted output (let me know if it's mangled; I still have to fine-tune that table format).

The columns (after the reference sequence name) are:

  • number of distinct alignment regions
  • number of unique reads aligned (for Illumina data, this assumes R1 and R2 have identical read names)
  • number of alignments
  • base pairs aligned
  • reference sequence length
  • reference sequence coverage
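For anyone scripting against this output, here is a minimal Python sketch that maps one row onto the columns listed above. It assumes whitespace-separated columns and reference names without spaces (true of the example rows). The names `CoverageRow` and `parse_row` and all field names are hypothetical, not part of vircov. Coverage is parsed as a plain float without assuming whether it is a fraction or a percentage.

```python
from typing import NamedTuple


class CoverageRow(NamedTuple):
    # Hypothetical field names for the columns described in the comment.
    reference: str    # reference sequence name
    regions: int      # number of distinct alignment regions
    reads: int        # number of unique reads aligned
    alignments: int   # number of alignments
    bases: int        # base pairs aligned
    ref_length: int   # reference sequence length
    coverage: float   # reference sequence coverage (scale not assumed)
    tags: str         # any trailing text after the coverage column, kept raw


def parse_row(line: str) -> CoverageRow:
    # Split at most 7 times so anything after the coverage column
    # stays together as one string, even if it contains whitespace.
    fields = line.split(None, 7)
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 columns, got {len(fields)}")
    tags = fields[7].strip() if len(fields) == 8 else ""
    return CoverageRow(
        reference=fields[0],
        regions=int(fields[1]),
        reads=int(fields[2]),
        alignments=int(fields[3]),
        bases=int(fields[4]),
        ref_length=int(fields[5]),
        coverage=float(fields[6]),
        tags=tags,
    )
```

Limiting the split to seven cuts keeps any extra trailing text intact as a single string instead of spreading it over extra fields.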

@esteinig
Owner

esteinig commented Mar 19, 2022

You can also add -p/--cov-plot to get some basic (approximate) coverage plots in the terminal, provided a --fasta file is given.

@esteinig
Owner

Also note that filters for minimum query alignment length (50 bp) and minimum mapping quality (30) are currently on by default. You can set --min-len / --min-mapq to zero to deactivate them.
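To make these two default thresholds concrete, here is a minimal Python sketch of per-alignment filtering. It assumes PAF-formatted alignments (the tab-separated format written by minimap2) and takes query alignment length as query end minus query start. vircov's actual implementation may differ. `PafRecord`, `parse_paf_line` and `passes_filters` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PafRecord:
    # Minimal subset of the 12 mandatory PAF columns (1-based column numbers).
    query_name: str   # col 1
    query_start: int  # col 3, 0-based, inclusive
    query_end: int    # col 4, 0-based, exclusive
    target_name: str  # col 6
    mapq: int         # col 12


def parse_paf_line(line: str) -> PafRecord:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError(f"PAF lines need at least 12 columns, got {len(cols)}")
    return PafRecord(
        query_name=cols[0],
        query_start=int(cols[2]),
        query_end=int(cols[3]),
        target_name=cols[5],
        mapq=int(cols[11]),
    )


def passes_filters(rec: PafRecord, min_len: int = 50, min_mapq: int = 30) -> bool:
    # Defaults mirror the thresholds mentioned above (50 bp, MAPQ 30).
    # A threshold of 0 can never reject a record, so 0 disables that filter.
    query_aln_len = rec.query_end - rec.query_start
    return query_aln_len >= min_len and rec.mapq >= min_mapq
```

Because both comparisons use `>=`, a threshold of zero accepts every record, which matches the "set to zero to deactivate" behaviour described in the comment.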

esteinig added the question label Mar 19, 2022
esteinig self-assigned this Mar 19, 2022
esteinig changed the title from "Output cols ?" to "Output column names" Mar 19, 2022
esteinig added the documentation label Mar 20, 2022
@colindaven
Author

OK, thanks for the info. I'll have another go at it and write some scripts.

@esteinig
Owner

esteinig commented Mar 21, 2022

If you are planning to integrate vircov into a pipeline, maybe hold off until the next release, as the input command-line arguments are going to change (due to adding SAM/BAM support). I should have that out in a couple of days!

I forgot: there is also one empty column at the end. -v/--verbose adds whitespace-separated tags to that column, one for each coverage region, in the format <start of region>:<stop of region>:<number of alignments in the region>.

My apologies that it's still changing a bit right now; the interface should stabilize in 0.5.0 and 0.6.0.
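The verbose tag format (<start of region>:<stop of region>:<number of alignments in the region>, whitespace-separated) can be parsed with a few lines of Python. This is a minimal sketch that follows only the format described in the comment. Given the note that the interface was still changing, treat it as illustrative. `Region` and `parse_region_tags` are hypothetical names, and whether the stop coordinate is inclusive is not stated, so the sketch does not compute region lengths.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int       # start of region
    stop: int        # stop of region
    alignments: int  # number of alignments in the region


def parse_region_tags(column: str) -> List[Region]:
    """Parse whitespace-separated '<start>:<stop>:<alignments>' tags."""
    regions = []
    for tag in column.split():
        start, stop, count = tag.split(":")
        regions.append(Region(int(start), int(stop), int(count)))
    return regions
```

An empty tag column (the non-verbose case) simply yields an empty list. A malformed tag raises `ValueError` from the tuple unpacking or `int()` conversion rather than being skipped silently.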

@colindaven
Author

OK, no worries. We're just testing, not writing a formal pipeline, and that's probably the extent of our use at the moment. It's better than other efforts we've seen, but alignment against viruses is definitely an unsolved problem. Coverage is generally a huge issue, as are genome size and specific mappings, as I'm sure you know.

@esteinig
Owner

Yeah, definitely. We are trying to address the same issues and see what sticks (as you can see). This little tool is meant to be more experimental at this stage, rather than a definitive solution to a really hard problem. I saw your nice work on the Wochenende pipeline; very cool, and a fitting name :)


2 participants