Hail Interoperation

Glow can convert a Hail MatrixTable to a Spark DataFrame similar to one created with the native Glow datasources.

Create a Hail cluster

To use the Hail interoperation functions, Hail must be installed on the cluster. On a Databricks cluster, install Hail by setting an environment variable. See the Hail installation documentation to install Hail in other setups.

Convert to a Glow DataFrame

Convert from a Hail MatrixTable to a Glow-compatible DataFrame with the function from_matrix_table.

from glow.hail import functions
df = functions.from_matrix_table(mt, include_sample_ids=True)

By default, the genotypes contain sample IDs. To remove the sample IDs, set the parameter include_sample_ids=False.

Schema mapping

The Glow DataFrame variant fields are derived from the Hail MatrixTable row fields.

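========  ============================  ================================================
Required  Glow DataFrame variant field  Hail MatrixTable row field
========  ============================  ================================================
Yes       contigName                    locus.contig
Yes       start                         locus.position - 1
Yes       end                           info.END or locus.position - 1 + len(alleles[0])
Yes       referenceAllele               alleles[0]
No        alternateAlleles              alleles[1:]
No        names                         [rsid, varid]
No        qual                          qual
No        filters                       filters
No        INFO_<ANY_FIELD>              info.<ANY_FIELD>
========  ============================  ================================================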
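
The coordinate rows of this mapping can be illustrated with a small pure-Python sketch. It is not Glow's implementation: the function name `variant_fields` and the use of a plain dict for `info` are assumptions made for illustration. It shows how the 1-based Hail `locus.position` becomes a 0-based `start`, and how `end` falls back to the reference allele length when `info.END` is absent.

```python
from typing import Any, Dict, List, Optional


def variant_fields(locus_contig: str, locus_position: int, alleles: List[str],
                   info: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: apply the row-field mapping to a single variant."""
    info = info or {}
    # Hail positions are 1-based; the Glow start field is 0-based.
    start = locus_position - 1
    # Prefer info.END when it is set; otherwise span the reference allele.
    if info.get("END") is not None:
        end = info["END"]
    else:
        end = start + len(alleles[0])
    fields = {
        "contigName": locus_contig,
        "start": start,
        "end": end,
        "referenceAllele": alleles[0],
        "alternateAlleles": alleles[1:],
    }
    # Each info.<ANY_FIELD> becomes a flattened INFO_<ANY_FIELD> field.
    for name, value in info.items():
        fields[f"INFO_{name}"] = value
    return fields


# A two-base reference allele at 1-based position 100 covers [99, 101).
row = variant_fields("chr1", 100, ["AT", "A"])
print(row["start"], row["end"])
```

For a symbolic allele with `info` set to `{"END": 500}`, the same helper would return `end` equal to 500 regardless of the reference allele length.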

The Glow DataFrame genotype sample IDs are derived from the Hail MatrixTable column fields.

All of the other Glow DataFrame genotype fields are derived from the Hail MatrixTable entry fields.

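=============================  ============================
Glow DataFrame genotype field  Hail MatrixTable entry field
=============================  ============================
phased                         GT.phased
calls                          GT.alleles
depth                          DP
filters                        FT
genotypeLikelihoods            GL
phredLikelihoods               PL
posteriorProbabilities         GP
conditionalQuality             GQ
haplotypeQualities             HQ
expectedAlleleCounts           EC
mappingQuality                 MQ
alleleDepths                   AD
<ANY_FIELD>                    <ANY_FIELD>
=============================  ============================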
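
As a rough illustration of the entry-field mapping, the sketch below renames the fields of one Hail entry, represented here as a plain dict. This is not how Glow performs the conversion internally: `genotype_fields`, the dict representation of `GT`, and the `sampleId` key used for the sample ID are assumptions for the example.

```python
from typing import Any, Dict, Optional

# Hail entry field -> Glow genotype field, copied from the mapping table.
ENTRY_TO_GENOTYPE = {
    "DP": "depth",
    "FT": "filters",
    "GL": "genotypeLikelihoods",
    "PL": "phredLikelihoods",
    "GP": "posteriorProbabilities",
    "GQ": "conditionalQuality",
    "HQ": "haplotypeQualities",
    "EC": "expectedAlleleCounts",
    "MQ": "mappingQuality",
    "AD": "alleleDepths",
}


def genotype_fields(entry: Dict[str, Any],
                    sample_id: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: rename one entry's fields to Glow genotype names."""
    genotype: Dict[str, Any] = {}
    # The sample ID comes from the column fields, not from the entry itself.
    if sample_id is not None:
        genotype["sampleId"] = sample_id  # assumed Glow field name
    for name, value in entry.items():
        if name == "GT":
            # The call splits into two genotype fields.
            genotype["phased"] = value["phased"]
            genotype["calls"] = value["alleles"]
        else:
            # Fields outside the table keep their original name.
            genotype[ENTRY_TO_GENOTYPE.get(name, name)] = value
    return genotype


entry = {"GT": {"phased": False, "alleles": [0, 1]}, "DP": 12, "XX": 3}
print(genotype_fields(entry, sample_id="S1"))
```

Passing `sample_id=None` mirrors the effect of `include_sample_ids=False`: the resulting genotype carries no sample ID.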