- Go to http://databricks.com/try-databricks
- Click on Get Started - Free
- Fill in the requested information and click Sign Up. Use an email address whose inbox you can access.
The Hail resources necessary for running this tutorial are located in the resources folder.
(Based on Hail-0.2-7a98b6a65d44. For other builds, please see the resources in Hail's artifact bucket, located at gs://hail-common/builds.)
After downloading them, navigate to the Workspace/Users section of the notebook, and then to your user folder.
Within your user folder, right-click and select "Create Library."
On the next screen, create a library for hail-all-spark.jar (name it whatever you want). Select the option to attach this library to all clusters automatically (or attach it manually later). Repeat the process for the .egg file.
Navigate to "Clusters" on the left-hand side of the screen and select "Create Cluster" at the top of the page that appears.
Name your cluster, then fill in the Spark Config field as follows:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator is.hail.kryo.HailKryoRegistrator
spark.databricks.delta.preview.enabled true
spark.driver.extraClassPath ./hail-all-spark.jar
spark.executor.extraClassPath ./hail-all-spark.jar
Finally, click "Create Cluster."
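The Spark Config entries above are whitespace-separated key/value pairs, one per line, so a dropped line or a typo is easy to miss when pasting. As a minimal sketch, the following Python parses a pasted config and reports which of the keys listed above are missing. The helper names `parse_spark_config` and `missing_keys` are hypothetical and are not part of Hail, Spark, or Databricks.

```python
# The Spark Config from this tutorial: one "key value" pair per line.
SPARK_CONFIG = """\
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator is.hail.kryo.HailKryoRegistrator
spark.databricks.delta.preview.enabled true
spark.driver.extraClassPath ./hail-all-spark.jar
spark.executor.extraClassPath ./hail-all-spark.jar
"""

# Keys taken from the config listed in this tutorial.
REQUIRED_KEYS = (
    "spark.serializer",
    "spark.kryo.registrator",
    "spark.databricks.delta.preview.enabled",
    "spark.driver.extraClassPath",
    "spark.executor.extraClassPath",
)


def parse_spark_config(text):
    """Parse whitespace-separated 'key value' lines into a dict (hypothetical helper)."""
    config = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace only
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected 'key value', got {line!r}")
        key, value = parts
        config[key] = value
    return config


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the parsed config."""
    return [key for key in required if key not in config]


config = parse_spark_config(SPARK_CONFIG)
print("Missing keys:", missing_keys(config))
```

This only checks that the keys are present. It does not confirm that the jar path resolves on the cluster or that the library was attached.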
Please note that this notebook was adapted from https://hail.is/docs/stable/tutorials/01-genome-wide-association-study.html for use on Databricks.
- Click the "Workspace" button on the left-hand side of your screen.
- Navigate to your user folder as before.
- Right-click and select "Import."
- Select "URL" and paste the URL of the .ipynb file into the input field: https://github.com/mptrepanier/spark-saturday-advanced-hail/blob/master/hail-tutorial-spark-saturday-advanced.ipynb
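The link above is a GitHub "blob" page URL. If an import tool fetches the HTML page instead of the notebook file, the raw-file URL may work better. Whether Databricks needs this is an assumption and has not been verified here. As a sketch, the hypothetical helper `github_blob_to_raw` below converts one form to the other using GitHub's standard raw.githubusercontent.com layout. It assumes the branch name contains no slashes.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url):
    """Convert a github.com blob URL to its raw.githubusercontent.com form.

    Hypothetical helper; assumes the ref (branch or tag) has no '/' in it.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url!r}")
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<ref>/<path to file...>
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url!r}")
    owner, repo, _blob, ref, *file_path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *file_path])


notebook_url = (
    "https://github.com/mptrepanier/spark-saturday-advanced-hail/"
    "blob/master/hail-tutorial-spark-saturday-advanced.ipynb"
)
raw_url = github_blob_to_raw(notebook_url)
print(raw_url)
```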
- Hail, the Spark-based genomic analysis software: https://hail.is
- Spark 2.2.0 documentation: http://spark.apache.org/docs/2.2.0
- Anaconda documentation: https://docs.continuum.io/anaconda/navigator/tutorials/
- Databricks documentation: http://docs.databricks.com