Code and Presentation for PyCon2016
- Follow the instructions on iota to set up the execution environment and to get and process the raw StackExchange data.
Databricks Community Edition
- Sign up for the free Community Edition of Databricks.
- Create a new Spark cluster (Spark 1.6.1, Hadoop 2).
- To import the Jupyter notebooks from the notebook directory, select the Workspace option from the menu on the left, click the dropdown next to your username and select Import.
- Select the URL option, paste the URL of the notebook and click Import. For example, the URL of the Word2Vec notebook is https://github.com/shagunsodhani/PyCon2016/blob/master/notebook/AskUbuntu/Question/Word2Vec.ipynb
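Importing from a URL fetches the raw file contents, so a github.com `.../blob/...` link may need to be converted to its raw.githubusercontent.com form first. A minimal sketch of that conversion, assuming the standard `https://github.com/<user>/<repo>/blob/<branch>/<path>` layout (the helper name is made up for illustration):

```python
def to_raw_github_url(blob_url):
    """Convert a github.com blob URL to its raw.githubusercontent.com form.

    Hypothetical helper, not part of this repo; assumes the standard
    https://github.com/<user>/<repo>/blob/<branch>/<path> layout.
    """
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix):
        raise ValueError("not a github.com URL: %s" % blob_url)
    # Split "<user>/<repo>" from "<branch>/<path>" around the "/blob/" segment.
    user_repo, _, rest = blob_url[len(prefix):].partition("/blob/")
    return "https://raw.githubusercontent.com/%s/%s" % (user_repo, rest)

url = ("https://github.com/shagunsodhani/PyCon2016/blob/master/"
       "notebook/AskUbuntu/Question/Word2Vec.ipynb")
raw_url = to_raw_github_url(url)
print(raw_url)
```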
- Fetch the processed parquet files from here, upload them to Databricks as explained here, and set dbfs_question_data_path to the path where the parquet files are uploaded.
- The processed data (used for the demo) can be downloaded here.
- To get the latest version of the data, follow the instructions on iota to set up the execution environment and process the raw StackExchange data.
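The raw StackExchange dumps are XML files (e.g. Posts.xml) in which each record is a `<row>` element with attributes such as Id, PostTypeId, Title and Body. A minimal standard-library sketch of reading that format, using an inline sample; the field selection here is illustrative and not the exact processing iota performs:

```python
import xml.etree.ElementTree as ET

# Tiny inline sample mimicking the structure of a StackExchange Posts.xml dump.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="How do I install a .deb file?"
       Body="&lt;p&gt;I downloaded a .deb package...&lt;/p&gt;" />
  <row Id="2" PostTypeId="2" Body="&lt;p&gt;Use dpkg -i.&lt;/p&gt;" />
</posts>
"""

def iter_questions(xml_text):
    """Yield (Id, Title) pairs for question rows (PostTypeId == "1")."""
    root = ET.fromstring(xml_text)
    for row in root.iter("row"):
        if row.get("PostTypeId") == "1":
            yield row.get("Id"), row.get("Title")

questions = list(iter_questions(SAMPLE))
print(questions)  # [('1', 'How do I install a .deb file?')]
```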
- This repo contains only the notebooks that will be demoed at PyCon2016.
- For more notebooks related to Stack Exchange data, check out iota.
The image showing the workflow for importing the notebooks was created by Databricks. License: https://creativecommons.org/licenses/by-nc-nd/4.0/