# Table of contents

[1. Import hail, other libraries and data](#1)

[2. Explore MatrixTable, collect field descriptions](#2)

   [2.1 Removing the star alleles](#2.1)
    
[3. Do simple aggregations, filtering and plots](#3)

[4. Work with single sample and single variant](#4)

   [4.1 Single sample](#4.1)

   [4.2 Single variant](#4.2)

[5. Explore ClinVar significance of detected variants](#5)

   [5.1 Are there known pathogenic variants in > 1 sample?](#5.1)

[6. Annotate with CADD (in progress)](#6)

<a id='1'></a> 
## 1. Import hail, other libraries and data

Always run this code to widen the notebook:

In [456]:
from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports
display(HTML("<style>.container { width:100% !important; }</style>"))

In [1]:
import hail as hl
hl.init() 

Running on Apache Spark version 2.4.1
SparkUI available at http://349d1de1bab4:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.26-2dcc3d963867
LOGGING: writing to /hail/hail-20191114-0831-0.2.26-2dcc3d963867.log


In [455]:
from hail.plot import show
from pprint import pprint
hl.plot.output_notebook()

In [197]:
import numpy as np
import pandas as pd

In [10]:
hl.import_vcf('data/annotated.test.vcf', reference_genome='GRCh38').write('data/sample.mt', overwrite=True)

2019-11-12 15:44:48 Hail: INFO: Ordering unsorted dataset with network shuffle
2019-11-12 15:45:18 Hail: INFO: wrote matrix table with 726380 rows and 151 columns in 116 partitions to data/sample.mt


In [356]:
mt = hl.read_matrix_table('data/sample.mt') # mt stands for MatrixTable

<a id='2'></a> 
## 2. Explore MatrixTable, collect field descriptions

In [192]:
header = hl.get_vcf_metadata('data/annotated.test.vcf')

In [381]:
pprint(header)  # important fields: ISEQ_GNOMAD_GENOMES_V3_AF_nfe, ISEQ_AGGREGATED_CLINVAR_SIGNIFICANCE, ANN

{'filter': {'PASS': {'Description': 'All filters passed'}},
 'format': {'AD': {'Description': 'Allelic depths for the ref and alt alleles '
                                  'in the order listed',
                   'Number': '.',
                   'Type': 'Integer'},
            'DP': {'Description': 'Read depth',
                   'Number': '1',
                   'Type': 'Integer'},
            'GQ': {'Description': 'Genotype quality',
                   'Number': '1',
                   'Type': 'Integer'},
            'GT': {'Description': 'Genotype', 'Number': '1', 'Type': 'String'},
            'PGT': {'Description': 'Physical phasing haplotype information, '
                                   'describing how the alternate alleles are '
                                   'phased in relation to one another',
                    'Number': '1',
                    'Type': 'String'},
            'PID': {'Description': 'Physical phasing ID information, where '
                      

In [19]:
mt.describe()  # see what's inside (AF: allele frequency for each ALT allele, in the same order as listed)

----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
----------------------------------------
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'rsid': str
    'qual': float64
    'filters': set<str>
    'info': struct {
        AC: array<int32>, 
        AF: array<float64>, 
        AN: int32, 
        BaseQRankSum: float64, 
        ClippingRankSum: float64, 
        DB: bool, 
        DP: int32, 
        ExcessHet: float64, 
        FS: float64, 
        InbreedingCoeff: float64, 
        MLEAC: array<int32>, 
        MLEAF: array<float64>, 
        MQ: float64, 
        MQRankSum: float64, 
        QD: float64, 
        ReadPosRankSum: float64, 
        SOR: float64, 
        MULTIALLELIC: bool, 
        ALTS: array<str>, 
        ALTS_pos: int32, 
        ALTS_indices: array<str>, 
        ISEQ_GNOMAD_EXOMES_AC: array<int32>, 
        ISEQ_GNOMAD_EXOMES_AN: int32, 
        ISEQ_GNOM

In [4]:
mt.rows().select().show(5)

locus,alleles
locus<GRCh38>,array<str>
chr1:10146,"[""AC"",""*""]"
chr1:10177,"[""A"",""*""]"
chr1:10622,"[""T"",""*""]"
chr1:10623,"[""T"",""*""]"
chr1:30923,"[""G"",""*""]"


<a id='2.1'></a>

### 2.1 Removing the star alleles

These are orphaned star alleles (`*`) and shouldn't be here.



In [363]:
mt = mt.filter_rows(mt.alleles.contains('*'), keep = False)

In [364]:
mt.count_cols()

151

In [365]:
mt.count_rows()

47373
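A plain-Python sketch of what the filter above does, using allele lists copied from the `show()` outputs in this notebook. The `has_star` helper is a hypothetical name for illustration, not part of Hail. The point is the `keep=False` semantics: rows for which the predicate is true are dropped.

```python
# Allele lists copied from the show() outputs in this notebook.
rows = [
    ("chr1:10146", ["AC", "*"]),
    ("chr1:10177", ["A", "*"]),
    ("chr1:69968", ["A", "G"]),
    ("chr1:183189", ["G", "C"]),
]


def has_star(alleles):
    """Hypothetical helper: True if any allele is the star allele '*'."""
    return "*" in alleles


# keep=False drops the rows for which the predicate is True.
kept = [(locus, alleles) for locus, alleles in rows if not has_star(alleles)]
print(kept)
```

Only the two rows without a `*` allele survive, which mirrors how the row count shrinks after `filter_rows`.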

We can use `rows` along with `select` to pull out 5 variants. The `select` method takes either a string referring to a field name in the table or a Hail expression. Here, we leave the arguments blank to keep only the row key fields, `locus` and `alleles`.

In [366]:
mt.rows().select().show(5)

locus,alleles
locus<GRCh38>,array<str>
chr1:69968,"[""A"",""G""]"
chr1:183189,"[""G"",""C""]"
chr1:183238,"[""G"",""C""]"
chr1:183937,"[""G"",""A""]"
chr1:184994,"[""G"",""C""]"


In [367]:
mt.entry.take(5)  # returns a Python list of Structs

[Struct(AD=[7, 0], DP=7, GQ=21, GT=Call(alleles=[0, 0], phased=False), PGT=None, PID=None, PL=[0, 21, 239]),
 Struct(AD=[0, 0], DP=0, GQ=None, GT=None, PGT=None, PID=None, PL=None),
 Struct(AD=[0, 0], DP=0, GQ=None, GT=None, PGT=None, PID=None, PL=None),
 Struct(AD=[0, 0], DP=0, GQ=None, GT=None, PGT=None, PID=None, PL=None),
 Struct(AD=[6, 0], DP=6, GQ=18, GT=Call(alleles=[0, 0], phased=False), PGT=None, PID=None, PL=[0, 18, 173])]

In [368]:
mt.entry.take(200)[130]  # works, but slowly: 200 entries are collected to read one

Struct(AD=[10, 0], DP=10, GQ=30, GT=Call(alleles=[0, 0], phased=False), PGT=None, PID=None, PL=[0, 30, 327])

In [40]:
mt.s.show(5) #samples


s
str
"""S_136"""
"""S_170c"""
"""S_170d"""
"""S_6981"""
"""S_6982"""


In [191]:
mt.s.describe()

--------------------------------------------------------
Type:
        str
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x7f93c162d410>
Index:
    ['column']
--------------------------------------------------------


In [442]:
mt.s.take(5)

['S_136', 'S_170c', 'S_170d', 'S_6981', 'S_6982']

In [369]:
mt.GT.show()

locus,alleles,S_136.GT,S_170c.GT,S_170d.GT,S_6981.GT
locus<GRCh38>,array<str>,call,call,call,call
chr1:69968,"[""A"",""G""]",0/0,,,
chr1:183189,"[""G"",""C""]",0/0,0/0,0/0,0/0
chr1:183238,"[""G"",""C""]",0/0,0/0,0/0,0/0
chr1:183937,"[""G"",""A""]",0/0,0/1,0/1,0/0
chr1:184994,"[""G"",""C""]",0/0,0/0,,0/0
chr1:185336,"[""C"",""T""]",0/0,0/1,0/0,0/0
chr1:185497,"[""G"",""A""]",0/0,0/0,0/1,0/0
chr1:185550,"[""G"",""A""]",0/0,0/0,0/0,0/0
chr1:186338,"[""T"",""G""]",0/1,0/0,0/1,0/1
chr1:186341,"[""T"",""G""]",0/0,0/0,0/0,0/0


In [53]:
mt.info.take(2)[1]  # print an example info struct for a variant

Struct(AC=[2], AF=[0.083], AN=24, BaseQRankSum=-0.27, ClippingRankSum=-0.0, DB=True, DP=8231, ExcessHet=3.9794, FS=0.0, InbreedingCoeff=0.1448, MLEAC=[3], MLEAF=[0.125], MQ=1.84, MQRankSum=-0.0, QD=30.0, ReadPosRankSum=2.37, SOR=1.061, MULTIALLELIC=True, ALTS=['C', '*'], ALTS_pos=10177, ALTS_indices=['2'], ISEQ_GNOMAD_EXOMES_AC=None, ISEQ_GNOMAD_EXOMES_AN=None, ISEQ_GNOMAD_EXOMES_AF=None, ISEQ_GNOMAD_EXOMES_nhomalt=None, ISEQ_GNOMAD_EXOMES_controls_AF=None, ISEQ_GNOMAD_EXOMES_controls_nhomalt=None, ISEQ_GNOMAD_EXOMES_non_neuro_AF=None, ISEQ_GNOMAD_EXOMES_non_neuro_nhomalt=None, ISEQ_GNOMAD_EXOMES_non_cancer_AF=None, ISEQ_GNOMAD_EXOMES_non_cancer_nhomalt=None, ISEQ_GNOMAD_EXOMES_popmax=None, ISEQ_GNOMAD_EXOMES_popmax_AF=None, ISEQ_GNOMAD_EXOMES_nhomalt_popmax=None, ISEQ_GNOMAD_EXOMES_controls_popmax=None, ISEQ_GNOMAD_EXOMES_controls_AF_popmax=None, ISEQ_GNOMAD_EXOMES_controls_nhomalt_popmax=None, ISEQ_GNOMAD_EXOMES_non_neuro_popmax=None, ISEQ_GNOMAD_EXOMES_non_neuro_AF_popmax=None, ISEQ

In [371]:
mt.info.ISEQ_CLINVAR_REVIEW_STATUS.describe() #describe a field

--------------------------------------------------------
Type:
        array<str>
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x7f939ceb6490>
Index:
    ['row']
--------------------------------------------------------


In [372]:
hl.summarize_variants(mt)

Number of alleles,Count
2,47373

Allele type,Count
SNP,41030
Deletion,3351
Insertion,2992

Contig,Count
chr1,4499
chr2,2769
chr3,2436
chr4,1549
chr5,2578
chr6,1900
chr7,2886
chr8,1458
chr9,1812
chr10,1567


<a id='3'></a> 
## 3. Do simple aggregations, filtering and plots

In [447]:
mt.aggregate_entries(hl.agg.counter(mt.info.MULTIALLELIC))  # count occurrences of each value; has trouble with array fields

{False: 6678428, True: 474895}
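`info.MULTIALLELIC` is a row field, so aggregating it over entries counts every variant once per sample. The sketch below checks this in plain Python against numbers already printed in this notebook (151 samples, 47373 rows). It is only arithmetic on the outputs and not a Hail call; for per-variant counts, `aggregate_rows` (used in section 5) would presumably be the more direct route.

```python
n_samples = 151  # from mt.count_cols()
n_rows = 47373   # from mt.count_rows()
entry_counts = {False: 6678428, True: 474895}  # output of aggregate_entries above

# Each row contributes one entry per sample, so divide by the sample count.
row_counts = {value: count // n_samples for value, count in entry_counts.items()}
print(row_counts)

# The per-row counts add up to the number of rows.
assert sum(row_counts.values()) == n_rows
```

So 3145 of the 47373 remaining variants carry the MULTIALLELIC flag.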

Exercise: try to show some variants where ISEQ_GENES_NAMES is defined:

In [448]:
mt.filter_rows(hl.is_defined(mt.info.ISEQ_GENES_NAMES)).info.ISEQ_GENES_NAMES.show(5)  # is_defined() keeps only rows where the field is not missing

locus,alleles,Unnamed: 2_level_0
locus<GRCh38>,array<str>,array<str>
chr1:69968,"[""A"",""G""]","[""OR4F5""]"
chr1:183189,"[""G"",""C""]","[""FO538757.2""]"
chr1:183238,"[""G"",""C""]","[""FO538757.2""]"
chr1:183937,"[""G"",""A""]","[""FO538757.2""]"
chr1:184994,"[""G"",""C""]","[""FO538757.1""]"


In [449]:
mt.aggregate_entries(hl.agg.counter(mt.GT.n_alt_alleles())) #distribution of genotype calls

{0: 6651990, 1: 344890, 2: 84194, None: 72249}

In [450]:
mt.aggregate_entries(hl.agg.fraction(hl.is_defined(mt.GT))) #calculate the call rate directly

0.9898999388116544
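The call rate can also be recovered from the genotype counter two cells above: it is the share of entries whose genotype is not missing. A short plain-Python check, using only numbers printed in this notebook:

```python
# Output of hl.agg.counter(mt.GT.n_alt_alleles()); None marks missing genotypes.
genotype_counts = {0: 6651990, 1: 344890, 2: 84194, None: 72249}

called = sum(count for n_alt, count in genotype_counts.items() if n_alt is not None)
total = sum(genotype_counts.values())

call_rate = called / total
print(call_rate)
```

The total equals 47373 rows times 151 samples, and the ratio agrees with the value returned by `hl.agg.fraction`.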

Annotate the MatrixTable with the call rate per variant:

In [451]:
mt = mt.annotate_rows(call_rate = hl.agg.fraction(hl.is_defined(mt.GT)))

In [452]:
mt.call_rate.describe()

--------------------------------------------------------
Type:
        float64
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x7f935fd37ad0>
Index:
    ['row']
--------------------------------------------------------


In [453]:
mt.call_rate.summarize()

0,1
Non-missing,47373 (100.00%)
Missing,0
Minimum,0.01
Maximum,1.00
Mean,0.99
Std Dev,0.06


In [460]:
p = hl.plot.histogram(mt.call_rate, bins=25, title='Variant Call Rate Histogram', range=(0,1.0), legend='Call Rate')
show(p)  # histogram of per-variant call rates

In [490]:
p = hl.plot.histogram(mt.DP, range=(0,100), bins=30, title='DP Histogram', legend='DP')
show(p)

<a id='4'></a> 
## 4. Work with single sample and single variant

<a id='4.1'></a> 
### 4.1 Single sample

In [461]:
S_136 = mt.filter_cols(mt.s == "S_136")

In [462]:
S_136.GT.describe()

--------------------------------------------------------
Type:
        call
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x7f937795a110>
Index:
    ['column', 'row']
--------------------------------------------------------


In [463]:
S_136.count_cols()

1

In [464]:
S_136.count_rows()

47373

In [465]:
S_136.aggregate_entries(hl.agg.counter(S_136.GT.n_alt_alleles()))

{0: 44094, 1: 2291, 2: 532, None: 456}

In [413]:
S_136.GT.show()

locus,alleles,S_136.GT
locus<GRCh38>,array<str>,call
chr1:10146,"[""AC"",""*""]",0/0
chr1:10177,"[""A"",""*""]",
chr1:10622,"[""T"",""*""]",
chr1:10623,"[""T"",""*""]",
chr1:30923,"[""G"",""*""]",
chr1:54714,"[""TTCTTTCTTTCTTTC"",""*""]",0/0
chr1:54715,"[""TC"",""*""]",0/0
chr1:54718,"[""TTCTTTC"",""*""]",0/0
chr1:54720,"[""CTTTCTTTCTTTCT"",""*""]",0/0
chr1:54724,"[""CT"",""*""]",0/1


In [209]:
S_136.aggregate_entries(hl.agg.counter(S_136.info.MULTIALLELIC))

{False: 44228, True: 682152}

In [212]:
S_136.alleles.is_star()

AttributeError: 'ArrayExpression' object has no attribute 'is_star'

In [210]:
S_136.alleles.describe

<bound method Expression.describe of <ArrayExpression of type array<str>>>

<a id='4.2'></a> 
### 4.2 Single variant

In [181]:
mt.locus.describe()

--------------------------------------------------------
Type:
        locus<GRCh38>
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x7f93c162d410>
Index:
    ['row']
--------------------------------------------------------


In [186]:
hl.Locus.parse('chr1:183189', reference_genome='GRCh38') #get hail locus-type field from chrZ:12345 string

Locus(contig=chr1, position=183189, reference_genome=GRCh38)

In [187]:
chr1_183189 = mt.filter_rows(mt.locus == hl.Locus.parse('chr1:183189', reference_genome='GRCh38'))

In [188]:
chr1_183189.aggregate_entries(hl.agg.counter(chr1_183189.GT.n_alt_alleles()))

2019-11-14 16:46:20 Hail: INFO: reading 1 of 116 data partitions


{0: 132, 1: 19}
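From this counter one can work out the alternate allele frequency of chr1:183189 in the cohort by hand. The sketch assumes diploid calls (the locus is autosomal) and relies on the fact that no genotype is missing here, since the counts sum to all 151 samples.

```python
# n_alt_alleles -> number of samples, copied from the output above.
genotype_counts = {0: 132, 1: 19}

n_samples = sum(genotype_counts.values())
alt_alleles = sum(n_alt * n for n_alt, n in genotype_counts.items())
total_alleles = 2 * n_samples  # assumes diploid calls and no missing genotypes

allele_frequency = alt_alleles / total_alleles
print(round(allele_frequency, 4))
```

That is 19 alternate alleles out of 302, all carried by heterozygous samples.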

In [375]:
chr1_183189.alleles.show()

2019-11-19 08:29:49 Hail: INFO: reading 1 of 116 data partitions


locus,alleles
locus<GRCh38>,array<str>
chr1:183189,"[""G"",""C""]"


In [374]:
chr1_183189.info.ANN.collect()

2019-11-19 08:28:42 Hail: INFO: reading 1 of 116 data partitions


[['C|missense_variant|MODERATE|FO538757.2|ENSG00000279928|transcript|ENST00000624431.1|protein_coding|2/3|c.114G>C|p.Lys38Asn|430/718|114/402|38/133||',
  'C|downstream_gene_variant|MODIFIER|FO538757.1|ENSG00000279457|transcript|ENST00000623083.3|protein_coding||c.*2028C>G|||||1736|',
  'C|downstream_gene_variant|MODIFIER|FO538757.1|ENSG00000279457|transcript|ENST00000623834.3|protein_coding||c.*2028C>G|||||1734|',
  'C|downstream_gene_variant|MODIFIER|FO538757.1|ENSG00000279457|transcript|ENST00000624735.1|protein_coding||c.*1738C>G|||||1738|',
  'C|downstream_gene_variant|MODIFIER|MIR6859-2|ENSG00000273874|transcript|ENST00000612080.1|miRNA||n.*4702C>G|||||4702|']]
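Each ANN entry is a pipe-delimited string. The sketch below splits two of the entries above into named fields. The field names are an assumption: they follow the usual 16-field ANN layout (as written by SnpEff) and were not read from this VCF's header, and `parse_ann` is a hypothetical helper name.

```python
# Field names follow the usual ANN layout; assumed, not taken from this VCF's header.
ANN_FIELDS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(entry):
    """Hypothetical helper: map one pipe-delimited ANN entry to a dict."""
    return dict(zip(ANN_FIELDS, entry.split("|")))


# Two entries copied from the collect() output above.
ann = [
    "C|missense_variant|MODERATE|FO538757.2|ENSG00000279928|transcript|"
    "ENST00000624431.1|protein_coding|2/3|c.114G>C|p.Lys38Asn|430/718|114/402|38/133||",
    "C|downstream_gene_variant|MODIFIER|MIR6859-2|ENSG00000273874|transcript|"
    "ENST00000612080.1|miRNA||n.*4702C>G|||||4702|",
]

parsed = [parse_ann(entry) for entry in ann]
for record in parsed:
    print(record["gene_name"], record["annotation"], record["impact"])
```

Empty positions come back as empty strings, so a missing `hgvs_p` or `rank` is `''` rather than `None`.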

In [385]:
chr1_183189.info.ISEQ_AGGREGATED_CLINVAR_SIGNIFICANCE.collect()

2019-11-19 08:49:37 Hail: INFO: reading 1 of 116 data partitions


[None]

<a id='5'></a> 

## 5. Explore ClinVar significance of detected variants

In [395]:
mt.info.ISEQ_AGGREGATED_CLINVAR_SIGNIFICANCE  # an array<str> field; hl.delimit converts such an array into a string

<ArrayExpression of type array<str>>

In [409]:
mt.aggregate_rows(hl.agg.counter(hl.delimit(mt.info.ISEQ_AGGREGATED_CLINVAR_SIGNIFICANCE))) #yay!

{'likely_benign': 487,
 None: 44158,
 'risk_factor': 8,
 'benign': 363,
 'benign:other': 1,
 'pathogenic/likely_pathogenic': 32,
 'uncertain_significance:other': 1,
 'likely_pathogenic': 28,
 'not_provided': 60,
 'protective': 2,
 'pathogenic:risk_factor': 2,
 'benign/likely_benign': 330,
 'drug_response': 5,
 'conflicting_interpretations_of_pathogenicity': 697,
 'benign/likely_benign:other': 3,
 'uncertain_significance': 1074,
 'other': 5,
 'pathogenic': 114,
 'conflicting_interpretations_of_pathogenicity:risk_factor': 1,
 'affects': 2}

In [414]:
hl.is_defined(mt.info.ISEQ_CLINVAR_DISEASES)

<BooleanExpression of type bool>

In [434]:
mt.aggregate_rows(hl.agg.counter(hl.delimit(mt.info.ISEQ_CLINVAR_DISEASES)))

{'Combined_oxidative_phosphorylation_deficiency^Combined_oxidative_phosphorylation_deficiency_5^OVARIAN_DYSGENESIS_7': 1,
 'Cerebrooculofacioskeletal_syndrome_4^ERCC1-Related_Xeroderma_Pigmentosum^Xeroderma_pigmentosum': 1,
 'White_sponge_nevus_of_cannon': 3,
 'Autoimmune_lymphoproliferative_syndrome^Autoimmune_lymphoproliferative_syndrome_type_2A^Neoplasm_of_stomach^Non-Hodgkin_lymphoma': 1,
 'Cohen_syndrome': 8,
 'Pachyonychia_congenita_2^Pachyonychia_congenita_syndrome^Steatocystoma_multiplex': 1,
 'Otitis_media_susceptibility_to': 4,
 'PLATELET_ABNORMALITIES_WITH_EOSINOPHILIA_AND_IMMUNE-MEDIATED_INFLAMMATORY_DISEASE': 7,
 'Cortical_dysplasia_complex_with_other_brain_malformations_6^Michelin-tire_baby': 2,
 'AMYOTROPHIC_LATERAL_SCLEROSIS_SUSCEPTIBILITY_TO_25^Myoclonus_intractable_neonatal^Spastic_paraplegia_10': 1,
 'Spondyloepimetaphyseal_dysplasia_Genevieve_type': 1,
 'Reduced_antithrombin_III_activity^Venous_thrombosis': 13,
 'Cousin_syndrome': 1,
 'Deafness_autosomal_recessive_1

In [426]:
mt.filter_rows(mt.info.ISEQ_CLINVAR_DISEASES.contains('Schizophrenia'), keep = True).count()

(11, 151)

In [445]:
mt.filter_rows(mt.info.ISEQ_CLINVAR_DISEASES.contains('Autism_17^Autism_spectrum_disorder'), keep = True).rsid.show()

locus,alleles,rsid
locus<GRCh38>,array<str>,str
chr11:70473268,"[""C"",""T""]","""rs140134890"""
chr11:70490374,"[""C"",""T""]","""rs117843717"""
chr11:70698758,"[""G"",""A""]",
chr11:70820627,"[""C"",""T""]",
chr11:70820628,"[""G"",""A""]",
chr11:70952780,"[""A"",""AACTCGCC""]",
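A note on what `contains` matches here. On an array field it tests for an exact element, and the outputs above suggest that one element can hold several disease names joined by `^`. If that reading is right (it is an assumption inferred from the output, not from any documentation of these fields), a search for a single disease name misses variants where that name appears only inside a longer element. The plain-Python sketch below shows the difference; both helper names are hypothetical.

```python
# Elements copied from the counter output above; each inner list stands for one variant.
diseases = [
    ["Autism_17^Autism_spectrum_disorder"],
    ["Cohen_syndrome"],
    ["Reduced_antithrombin_III_activity^Venous_thrombosis"],
]


def exact_match(values, name):
    """Mimics array membership: the whole element must equal `name`."""
    return name in values


def any_disease_match(values, name):
    """Splits each element on '^' and looks for `name` among the parts."""
    return any(name in value.split("^") for value in values)


for values in diseases:
    print(values, exact_match(values, "Venous_thrombosis"),
          any_disease_match(values, "Venous_thrombosis"))
```

In Hail the second behaviour could probably be expressed by combining `ArrayExpression.any` with a string-level test, but that was not run in this notebook.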


In [468]:
a = 'Cowden_syndrome_1^Familial_cancer_of_breast^Glioma_susceptibility_2^Macrocephaly/autism_syndrome^Malignant_tumor_of_prostate^Meningioma_familial^Neoplasm_of_the_breast^Neoplasm_of_the_genitourinary_tract^PTEN_hamartoma_tumor_syndrome'

voi = mt.filter_rows(mt.info.ISEQ_CLINVAR_DISEASES.contains(a), keep = True) # VariantOfInterest

In [470]:
voi.GT.show()

locus,alleles,S_136.GT,S_170c.GT,S_170d.GT,S_6981.GT
locus<GRCh38>,array<str>,call,call,call,call
chr10:87931071,"[""G"",""A""]",0/0,0/0,0/0,0/0


In [471]:
voi.aggregate_entries(hl.agg.counter(voi.GT.n_alt_alleles()))

{0: 150, 1: 1}

<a id='5.1'></a> 


### 5.1 Are there known pathogenic variants in > 1 sample?


In [494]:
mt.filter_rows(mt.info.ISEQ_AGGREGATED_CLINVAR_SIGNIFICANCE.contains('pathogenic'), keep = True).count()

(114, 151)

In [496]:
pathogenic = mt.filter_rows(mt.info.ISEQ_AGGREGATED_CLINVAR_SIGNIFICANCE.contains('pathogenic'), keep = True)

In [499]:
pathogenic.aggregate_rows(hl.agg.counter(hl.delimit(pathogenic.info.AC)))

{'12': 1,
 '8': 1,
 '4': 6,
 '9': 1,
 '5': 4,
 '6': 2,
 '1': 61,
 '2': 21,
 '7': 3,
 '3': 14}
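The counter above can be summarised in plain Python to see how many pathogenic variants have an allele count above 1. Two caveats, stated as assumptions: `info.AC` comes from the VCF INFO column rather than from the genotypes in this MatrixTable, and an allele count of 2 can come from a single homozygous sample, so AC > 1 is only a rough proxy for "more than one sample".

```python
# Output of hl.agg.counter(hl.delimit(pathogenic.info.AC)) above: AC -> number of variants.
ac_counts = {'12': 1, '8': 1, '4': 6, '9': 1, '5': 4,
             '6': 2, '1': 61, '2': 21, '7': 3, '3': 14}

total_variants = sum(ac_counts.values())
multi_allele_variants = sum(n for ac, n in ac_counts.items() if int(ac) > 1)

print(total_variants, multi_allele_variants)
```

Counting carriers per variant directly would need a per-entry aggregation, for example something along the lines of `hl.agg.count_where(mt.GT.is_non_ref())` inside `annotate_rows`; this was not run here.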

In [503]:
pathogenic.filter_rows(pathogenic.info.AC == [12]).rsid.show()

locus,alleles,rsid
locus<GRCh38>,array<str>,str
chr7:142750600,"[""A"",""C""]",


ClinVar record for this variant: https://www.ncbi.nlm.nih.gov/clinvar/RCV000031923/#clinical-assertions

<a id='6'></a> 

## 6. Annotate with CADD (in progress)

In [486]:
db = hl.experimental.DB()

In [487]:
db

<hail.experimental.db.DB at 0x7f938d3b4650>

In [488]:
mt = db.annotate_rows_db(mt, 'CADD')  # fails: problem with the Google SDK in the container (no handler for gs:// paths), to be fixed

FatalError: IOException: No FileSystem for scheme: gs

Java stack trace:
java.io.IOException: No FileSystem for scheme: gs
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at is.hail.io.fs.HadoopFS.is$hail$io$fs$HadoopFS$$_fileSystem(HadoopFS.scala:156)
	at is.hail.io.fs.HadoopFS.isDir(HadoopFS.scala:173)
	at is.hail.variant.RelationalSpec$.readMetadata(MatrixTable.scala:39)
	at is.hail.variant.RelationalSpec$.readReferences(MatrixTable.scala:71)
	at is.hail.variant.ReferenceGenome$.fromHailDataset(ReferenceGenome.scala:586)
	at is.hail.variant.ReferenceGenome.fromHailDataset(ReferenceGenome.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:745)



Hail version: 0.2.26-2dcc3d963867
Error summary: IOException: No FileSystem for scheme: gs