
Spark Saturday Advanced - Hail 0.2 on Databricks

Instructions to Register for Free Databricks Community Edition

Instructions for Creating Hail Resource Libraries

The Hail resources necessary for running this tutorial are located in the resources folder.

(These are based on build Hail-0.2-7a98b6a65d44. For other builds, see Hail's artifact bucket at gs://hail-common/builds.)

After downloading the resources, navigate to the Workspace/Users section of the Databricks UI, and then to your user folder.

[Screenshot: Workspace navigation]

Within your user folder, right-click and select Create Library.

[Screenshot: Create Library dialog]

On the next screen, create a library for hail-all-spark.jar (the name is up to you). Select the option to attach the library automatically to all clusters (or attach it manually later). Repeat the process for the .egg file.
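Once both libraries are attached to a running cluster, a quick check from a notebook cell can confirm that the Python package is available. This is a minimal sketch, assuming the libraries were attached as described above:

# Run in a Databricks notebook cell on a cluster with the Hail .egg attached.
import hail            # raises ImportError if the .egg library is not attached
print(hail.__file__)   # shows where the attached package was installed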

Instructions for Creating a Cluster

Navigate to "Clusters" on the left-hand side of the screen and select "Create Cluster" on the top of the page that appears.

[Screenshot: Clusters page]

Name your cluster and then fill in the Spark Config as shown below.

[Screenshot: Spark Config settings]

spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator is.hail.kryo.HailKryoRegistrator
spark.databricks.delta.preview.enabled true
spark.driver.extraClassPath ./hail-all-spark.jar
spark.executor.extraClassPath ./hail-all-spark.jar

Finally, click Create Cluster.
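After the cluster starts, you can verify from a notebook that the configuration took effect. This is a minimal sketch, assuming the cluster was created with the settings above; spark is the SparkSession that Databricks provides in every notebook:

# Check the Hail-related Spark settings on the running cluster.
print(spark.conf.get("spark.serializer"))        # expect org.apache.spark.serializer.KryoSerializer
print(spark.conf.get("spark.kryo.registrator"))  # expect is.hail.kryo.HailKryoRegistrator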

Importing the Hail Notebook

Please note that this notebook was adapted from https://hail.is/docs/stable/tutorials/01-genome-wide-association-study.html for use on Databricks.
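Once the notebook is imported and attached to the cluster, Hail needs to be initialized against the existing SparkContext before the tutorial cells will run. The following is a minimal sketch, assuming the libraries and Spark config above are in place; the balding_nichols_model call is only a hypothetical smoke test, not part of the tutorial itself:

# Minimal Hail 0.2 initialization on Databricks (run in a notebook cell).
import hail as hl

# Reuse the SparkContext that Databricks already created for the notebook.
hl.init(sc)

# Hypothetical smoke test: generate a small simulated dataset and count it.
mt = hl.balding_nichols_model(n_populations=3, n_samples=10, n_variants=100)
print(mt.count())  # (n_variants, n_samples)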

Useful Links
