Add ID column to SV BED file #10
Comments
Thanks for your suggestion! We generally recommend using the Tier 1 VCF file for this information; the Tier 1 BED describes the regions in which we've made (almost) all the SV calls in the VCF. We don't have an easy way to add annotations to the Tier 2 BED, since many of the variants are complex, but we are working towards new assembly-based benchmarks to describe these, including one focused on medically relevant genes for which we'll post a draft very soon. In the meantime, you could use a whole genome hifiasm/dipcall VCF to get one estimate of the potential SV calls in HG002: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_HG002_medical_genes_benchmark_v0.01.00/GRCh37/hifiasm_v0.11
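Thanks Justin! My assumption was that the BED file was generated from the VCF (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz), but that might not be true based on your reply. The reason I'm working with BED is that we typically work with GRCh38, so lifting over the BED is easy, but lifting over the VCF itself is not.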
I apologize for the confusion - as you suspect, the Tier 1 BED has a very different meaning. NCBI has remapped the VCF to GRCh38 at https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_study/vcf/nstd175.GRCh38.variant_call.vcf.gz (though there likely are some edge cases that did not remap optimally). You could use this in combination with the whole genome hifiasm/dipcall VCF for GRCh38 to estimate potential SV calls in HG002: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_HG002_medical_genes_benchmark_v0.01.00/GRCh38/hifiasm_v0.11
On Wed, Dec 9, 2020 at 6:12 PM Steve Huang wrote:
> Thanks Justin!
> My assumption was that the BED file was generated from the VCF
> ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz
> But that might not be true based on your reply.
> The reason I'm working with BED is that we typically work with GRCh38, so
> lifting over the BED is easy but not the VCF itself.
Thanks for the information, Justin!
Hi,
I constantly make use of the GIAB SV callset and really appreciate the effort of curating all of these.
I do have one feature request:
The SV BED file currently contains only the coordinates. It does not include the type of variant each interval is associated with, or the originating variant ID available from the VCF (in HG19).
An IGV trick that I constantly use is packing the information I want at a glance from the source VCF into the ID (4th) column of the BED file, which IGV displays. This way one doesn't need to click on a VCF record just for a quick look.
I'd appreciate it if the VCF ID records were copied into the BED file.
Thank you!
Steve
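For illustration, the ID-packing trick described above could be sketched roughly as follows. This is a minimal sketch, not how the GIAB files are produced: it assumes each VCF record carries `SVTYPE` and `SVLEN` (and optionally `END`) in INFO, and the `ID|SVTYPE|SVLEN` name layout, the function names, and the record IDs in the usage note are hypothetical choices made for this example.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def vcf_record_to_bed(line):
    """Convert one VCF data line into a 4-column BED line with a packed name."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, _alt, _qual, _filt, info_field = fields[:8]
    info = parse_info(info_field)

    start = int(pos) - 1  # VCF POS is 1-based; BED start is 0-based
    # Assumption: use INFO/END when present, otherwise span the REF allele.
    if "END" in info:
        end = int(info["END"])
    else:
        end = int(pos) + len(ref) - 1
    end = max(end, start + 1)  # keep at least a 1 bp interval

    # Hypothetical name layout: ID|SVTYPE|SVLEN, shown by IGV as the feature name.
    name = "|".join([vid, str(info.get("SVTYPE", ".")), str(info.get("SVLEN", "."))])
    return "\t".join([chrom, str(start), str(end), name])


def vcf_to_bed(lines):
    """Convert an iterable of VCF lines to BED lines, skipping headers and blanks."""
    return [
        vcf_record_to_bed(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]
```

For example, passing the lines of a decompressed VCF to `vcf_to_bed` yields one BED line per record. A made-up deletion record such as `1	1000	example_del	ACGT	A	.	PASS	SVTYPE=DEL;SVLEN=-3;END=1003` would become the interval `1	999	1003` with the name `example_del|DEL|-3`. Because the intervals come straight from the VCF, they would still need to be lifted over to GRCh38 separately.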