Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a skeleton VCF parser which can print variants in VCF output. #46

Merged
merged 4 commits into from
Feb 15, 2021

Conversation

joshuak94
Copy link
Collaborator

@joshuak94 joshuak94 commented Jan 22, 2021

Resolves #9 and resolves #10.

Here I introduce a skeleton VCF parser to address issue #29. It doesn't really do any checking to make sure values are valid, but it does provide a template for information regarding variants to be stored in a VCF format-friendly way.

Some things up for discussion:

  1. Currently, the INFO header is stored in a vector of info_entry objects. In the variants themselves, the respective INFO entries are stored in a map, with the key being the INFO name and the value being the INFO value... I'm not too satisfied with this current implementation (to have a std::map for every variant seems unnecessary).
  2. Constructors for variant_header and variant_record are currently empty, because I figured all of the information is not always known on construction. But maybe it makes sense to change this?
  3. The INFO field can store various types of information (integers, floats, flags, characters, and strings). Currently, everything is being stored as a string as I wasn't sure the best way to do this.

@Irallia Irallia requested review from a team and eaasna and removed request for a team January 25, 2021 09:07
Copy link
Contributor

@eaasna eaasna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nive work!

Copy link
Contributor

@eaasna eaasna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have two questions

@@ -0,0 +1,193 @@
#include <seqan3/io/stream/concept.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the connection to the seqan3 VCF parser? Will you also be using it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! So at the moment, seqan3 doesn't have a VCF parser implemented. So the one I'm implementing here will be used as a sort of framework for the future seqan3 implementation!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation!

src/find_deletions/deletion_finding_and_printing.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@eaasna eaasna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Collaborator

@Irallia Irallia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work!
Could you maybe add some Information to the classes and the head of the file, as I mentioned?
And if you don't have a strong opinion, I would replace that i with an _i. 💅
Thanks again!

Comment on lines 24 to 42
/*! \brief Set the file format for this VCF file header.
*
* \param fileformati The input fileformat.
*/
void set_fileformat(std::string fileformati)
{
fileformat = fileformati;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea, so it doesn't look like a typo:

Suggested change
/*! \brief Set the file format for this VCF file header.
*
* \param fileformati The input fileformat.
*/
void set_fileformat(std::string fileformati)
{
fileformat = fileformati;
}
/*! \brief Set the file format for this VCF file header.
*
* \param fileformat_i The input fileformat.
*/
void set_fileformat(std::string fileformat_i)
{
fileformat = fileformat_i;
}

#include <seqan3/io/stream/concept.hpp>
#include <fstream>
#include <map>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a short description like

/*
 * We're using the The Variant Call Format (VCF) Specification: VCFv4.3 and BCFv2.2
 * https://samtools.github.io/hts-specs/VCFv4.3.pdf
 */

#include <fstream>
#include <map>

class info_entry
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe something like this for the info_enty class:

/*
 * An Info field looks like:
 * ##INFO=<ID=ID,Number=number,Type=type,Description="description",Source="source",Version="version">
 * https://samtools.github.io/hts-specs/VCFv4.3.pdf (1.4.2 Information field format)
 */

Copy link
Collaborator

@Irallia Irallia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGFM, thank you!

@Irallia Irallia merged commit 91e4df3 into seqan:master Feb 15, 2021
@joshuak94 joshuak94 deleted the vcf_parser branch February 17, 2021 11:50
Irallia pushed a commit to Irallia/iGenVar that referenced this pull request May 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] SeqAn3: Develop info fields of the VCF parser. [FEATURE] SeqAn3: Develop a simple VCF parser.
3 participants