FEAT-#3709: Update Modin tutorials (#3658)
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Signed-off-by: Rehan Durrani <rehan@ponder.io>
RehanSD and dorisjlee committed Jan 24, 2022
1 parent 154697b commit 76707bf
Showing 7 changed files with 494 additions and 342 deletions.
5 changes: 3 additions & 2 deletions examples/tutorial/requirements.txt
@@ -1,5 +1,6 @@
fsspec
s3fs
ray==1.0.0
jupyterlab
git+https://github.com/modin-project/modin
ipywidgets
tqdm
modin[all]
146 changes: 0 additions & 146 deletions examples/tutorial/tutorial_notebooks/cluster/exercise_4.ipynb

This file was deleted.

123 changes: 42 additions & 81 deletions examples/tutorial/tutorial_notebooks/cluster/exercise_5.ipynb
@@ -13,55 +13,33 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 5: Executing on a cluster environment\n",
"# Exercise 5: Setting up cluster environment\n",
"\n",
"**GOAL**: Learn how to connect Modin to a Ray cluster and run pandas queries on a cluster.\n",
"**GOAL**: Learn how to set up a cluster for Modin.\n",
"\n",
"**NOTE**: Exercise 4 must be completed first, this exercise relies on the cluster created in Exercise 4."
"**NOTE**: This exercise has extra requirements. Read instructions carefully before attempting. \n",
"\n",
"**This exercise instructs the user on how to start a 700+ core cluster, and it is not shut down until the end of Exercise 5. Read instructions carefully.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Modin performance scales as the number of nodes and cores increases. In this exercise, we will reproduce the data from the plot below.\n",
"Often in practice we have a need to exceed the capabilities of a single machine. Modin works and performs well in both local mode and in a cluster environment. The key advantage of Modin is that your notebook does not change between local development and cluster execution. Users are not required to think about how many workers exist or how to distribute and partition their data; Modin handles all of this seamlessly and transparently.\n",
"\n",
"![ClusterPerf](../img/modin_cluster_perf.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Don't change this cell!\n",
"import ray\n",
"ray.init(address=\"auto\")\n",
"import modin.pandas as pd\n",
"from modin.config import NPartitions\n",
"if NPartitions.get() != 768:\n",
" print(\"This notebook was designed and tested for an 8 node Ray cluster. \"\n",
" \"Proceed at your own risk!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!du -h big_yellow.csv"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"df = pd.read_csv(\"big_yellow.csv\", quoting=3)"
"![Cluster](../img/modin_cluster.png)\n",
"\n",
"**Extra Requirements for this exercise**\n",
"\n",
"Detailed instructions can be found here: https://docs.ray.io/en/latest/cluster/cloud.html\n",
"\n",
"From command line:\n",
"- `pip install boto3`\n",
"- `aws configure`\n",
"- `ray up modin-cluster.yaml`\n",
"\n",
"Included in this directory is a file named [`modin-cluster.yaml`](https://github.com/modin-project/modin/blob/master/examples/tutorial/tutorial_notebooks/cluster/modin-cluster.yaml). We will use this to start the cluster."
]
},
{
@@ -70,8 +48,7 @@
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"count_result = df.count()"
"# !pip install boto3"
]
},
{
@@ -80,18 +57,18 @@
"metadata": {},
"outputs": [],
"source": [
"# print\n",
"count_result"
"# !aws configure"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"groupby_result = df.groupby(\"passenger_count\").count()"
"## Starting and connecting to the cluster\n",
"\n",
"This example starts 1 head node (m5.24xlarge) and 7 workers (m5.24xlarge), 768 total CPUs.\n",
"\n",
"Cost of this cluster can be found here: https://aws.amazon.com/ec2/pricing/on-demand/."
]
},
{
@@ -100,18 +77,14 @@
"metadata": {},
"outputs": [],
"source": [
"# print\n",
"groupby_result"
"# !ray up modin-cluster.yaml"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"apply_result = df.applymap(str)"
"Connect to the cluster with `ray attach`"
]
},
{
@@ -120,8 +93,7 @@
"metadata": {},
"outputs": [],
"source": [
"# print\n",
"apply_result"
"# !ray attach modin-cluster.yaml"
]
},
{
@@ -130,40 +102,29 @@
"metadata": {},
"outputs": [],
"source": [
"ray.shutdown()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Shutting down the cluster\n",
"# DO NOT CHANGE THIS CODE!\n",
"# Changing this code risks breaking further exercises\n",
"\n",
"**You may have to change the path below**. If this does not work, log in to your \n",
"\n",
"Now that we have finished computation, we can shut down the cluster with `ray down`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!ray down modin-cluster.yaml"
"import time\n",
"time.sleep(600) # We need to give ray enough time to start up all the workers\n",
"import ray\n",
"ray.init(address=\"auto\")\n",
"from modin.config import NPartitions\n",
"assert NPartitions.get() == 768, \"Not all Ray nodes are started up yet\"\n",
"ray.shutdown()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### This ends the cluster exercise"
"### Please move on to Exercise 6"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -177,7 +138,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
"version": "3.9.9"
}
},
"nbformat": 4,
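Note on the readiness cell in `exercise_5.ipynb`: the new code waits a fixed `time.sleep(600)` and then asserts that `NPartitions.get() == 768`. One possible alternative is to poll until the cluster reports the expected capacity and stop early once it does. The sketch below is only an illustration under stated assumptions: `wait_until` and `cluster_has_cpus` are hypothetical helper names that do not appear in this commit, and the check uses the CPU count reported by `ray.cluster_resources()` rather than Modin's `NPartitions`, so it is not a drop-in equivalent of the original assertion.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def cluster_has_cpus(expected_cpus: int) -> bool:
    """Hypothetical helper: True once Ray reports at least `expected_cpus` CPUs.

    Assumes `ray.init(address="auto")` has already been called in this process.
    """
    import ray  # imported lazily so wait_until can be used without Ray installed

    return ray.cluster_resources().get("CPU", 0) >= expected_cpus
```

In the notebook cell, this could be used after `ray.init(address="auto")` as `assert wait_until(lambda: cluster_has_cpus(768)), "Not all Ray nodes are started up yet"`. The 600-second default mirrors the sleep in the diff, so the worst case is the same while a cluster that comes up sooner does not keep the user waiting.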
