This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

verilylifesciences/variant-qc


Disclaimer

This is not an official Verily product.

variant-qc

This repository contains code to perform cohort-level quality control checks on human genomic variants. Cloud technology is used to perform queries in parallel. For prior work, see Cloud-based interactive analytics for terabytes of genomic variants data.

View output from these queries run on public data

Before running the queries yourself, you can see the results on a few public datasets:

Run these queries on your own data

Load data to BigQuery

The queries in this repository assume that the VCFs were loaded to BigQuery using Variant Transforms with the MOVE_TO_CALLS merge strategy enabled.

With the MOVE_TO_CALLS merge strategy, every table created from VCFs shares a core set of columns, and all calls for the exact same variant (identical reference_name, start_position, end_position, reference_bases, and all alternate_bases) are grouped together in a single row.
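To make the grouping concrete, here is a minimal pure-Python sketch of the row layout described above. It is an illustration only, not Variant Transforms' implementation: the function name `group_calls_by_variant`, the flat input record shape, and the `sample` and `genotype` keys are assumptions made for this example.

```python
def group_calls_by_variant(records):
    """Group per-sample records for the exact same variant into one row.

    Illustrative only: mimics the row layout described for MOVE_TO_CALLS,
    where calls sharing (reference_name, start_position, end_position,
    reference_bases, all alternate_bases) end up in a single row.
    """
    merged = {}
    for rec in records:
        # The variant key uses every field named in the text above.
        key = (
            rec["reference_name"],
            rec["start_position"],
            rec["end_position"],
            rec["reference_bases"],
            tuple(rec["alternate_bases"]),
        )
        if key not in merged:
            merged[key] = {
                "reference_name": key[0],
                "start_position": key[1],
                "end_position": key[2],
                "reference_bases": key[3],
                "alternate_bases": list(key[4]),
                "call": [],  # one entry per sample with this exact variant
            }
        merged[key]["call"].append(
            {"name": rec["sample"], "genotype": rec["genotype"]}
        )
    # Dicts preserve insertion order, so rows come out in first-seen order.
    return list(merged.values())


# Hypothetical input: two samples share one variant, a third differs in ALT.
records = [
    {"reference_name": "chr1", "start_position": 100, "end_position": 101,
     "reference_bases": "A", "alternate_bases": ["G"],
     "sample": "sampleA", "genotype": [0, 1]},
    {"reference_name": "chr1", "start_position": 100, "end_position": 101,
     "reference_bases": "A", "alternate_bases": ["G"],
     "sample": "sampleB", "genotype": [1, 1]},
    {"reference_name": "chr1", "start_position": 100, "end_position": 101,
     "reference_bases": "A", "alternate_bases": ["T"],
     "sample": "sampleC", "genotype": [0, 1]},
]
rows = group_calls_by_variant(records)
```

Here `sampleA` and `sampleB` land in the same row, while `sampleC` gets its own row because its alternate_bases differ.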

We recommend loading single-sample VCFs into a "genome call table" and also the multisample VCF into a "multisample-variants table".

If you do not have a multisample VCF, you could:

Predict ancestry

If your sample information does not already include ancestry, you can predict the ancestry for each genome using Genomic ancestry inference with deep learning.

Run the QC overview reports

Run the RMarkdown parameterized reports to get an overview of your data.

Drill down on results

Drill down further on results by creating additional plots and/or performing additional queries. For example, these queries can be run from Jupyter notebooks, and further queries can then be used to explain the results for a particular dataset.
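As a rough sketch of what a drill-down query might look like from a notebook, the snippet below builds a BigQuery Standard SQL string that counts non-reference calls per sample. It is not one of the queries in this repository. The helper name `build_per_sample_count_query` and the table name are hypothetical, and the nested `call` column with `name` and `genotype` fields is an assumption about the table schema that should be checked against your own table.

```python
import re

# project.dataset.table; project IDs may contain hyphens.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_per_sample_count_query(table):
    """Return SQL counting calls with a non-reference allele per sample.

    Assumes (not confirmed by this README) a repeated `call` record with
    `name` and `genotype` fields; adjust to match your schema.
    """
    if not _TABLE_RE.match(table):
        # Refuse anything that is not a plain table path before
        # interpolating it into the SQL text.
        raise ValueError(f"unexpected table name: {table!r}")
    return f"""
SELECT
  c.name AS sample,
  COUNT(*) AS variant_call_count
FROM
  `{table}` AS v,
  UNNEST(v.call) AS c
WHERE
  EXISTS (SELECT 1 FROM UNNEST(c.genotype) AS gt WHERE gt > 0)
GROUP BY
  sample
ORDER BY
  sample
""".strip()


# Hypothetical table name, for illustration only.
sql = build_per_sample_count_query("my-project.my_dataset.multisample_variants")
```

If the google-cloud-bigquery package is installed and credentials are configured, the resulting string could be passed to `bigquery.Client().query(sql)` and the results plotted in the notebook.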

Technologies used

The methods make use of:

Each technology has introductory material that may help you when working with the code in this repository.

About

Quality control methods for human genomic variants.
