## Connect to existing Dask cluster

First, we list the Dask clusters on our Coiled account, both running and pending:

```python
import coiled

clusters = coiled.list_clusters()
clusters
```

```
{'koverholt-289f7354-0': {'id': 4089,
  'status': 'running',
  'account': 'koverholt',
  'dashboard_address': 'http://ec2-18-220-201-179.us-east-2.compute.amazonaws.com:8787',
  'configuration': 62,
  'options': {},
  'address': 'tls://ec2-18-220-201-179.us-east-2.compute.amazonaws.com:8786'},
 'koverholt-56c29a80-4': {'id': 4090,
  'status': 'pending',
  'account': 'koverholt',
  'dashboard_address': '',
  'configuration': 62,
  'options': {},
  'address': ''}}
```

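Now we can connect to and reuse the first cluster whose status is `running`. A `pending` cluster has no scheduler address yet, as the empty `address` field above shows:

```python
# Pick the first cluster that is already running; pending clusters
# do not yet have a scheduler address to connect to.
cluster_name = next(
    name for name, info in clusters.items() if info["status"] == "running"
)
cluster = coiled.Cluster(name=cluster_name)
```

```
Using existing cluster: koverholt-289f7354-0
```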

Let's point the `distributed` client at the Dask cluster on Coiled and print the link to the dashboard:

```python
from dask.distributed import Client

client = Client(cluster)
print("Dashboard:", client.dashboard_link)
```

```
Dashboard: http://ec2-18-220-201-179.us-east-2.compute.amazonaws.com:8787

+---------+--------+-----------+---------+
| Package | client | scheduler | workers |
+---------+--------+-----------+---------+
| numpy   | 1.18.5 | 1.19.5    | 1.19.5  |
+---------+--------+-----------+---------+
```

The table flags a NumPy version mismatch. The local client runs 1.18.5, while the scheduler and workers run 1.19.5. Matching the local environment to the cluster's software environment avoids this warning.
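The mismatch table compares the package versions reported by the client, the scheduler, and the workers. The sketch below shows the idea behind that comparison under stated assumptions. The helper `find_version_mismatches` and its input dictionaries are hypothetical and are not part of the `distributed` API. In a live session, `Client.get_versions` returns the versions each component actually reports. The check flags every package whose version is not identical across all three components:

```python
from typing import Dict, List, Tuple

VersionMap = Dict[str, str]


def find_version_mismatches(
    client: VersionMap, scheduler: VersionMap, workers: VersionMap
) -> List[Tuple[str, str, str, str]]:
    """Hypothetical helper: return (package, client, scheduler, workers) rows that disagree."""
    rows = []
    for package in sorted(set(client) | set(scheduler) | set(workers)):
        versions = (
            client.get(package, "MISSING"),
            scheduler.get(package, "MISSING"),
            workers.get(package, "MISSING"),
        )
        # A package is mismatched if the three components report different versions.
        if len(set(versions)) > 1:
            rows.append((package, *versions))
    return rows


# The numpy versions come from the table above; the dask entry is a made-up matching example.
mismatches = find_version_mismatches(
    client={"numpy": "1.18.5", "dask": "2.30.0"},
    scheduler={"numpy": "1.19.5", "dask": "2.30.0"},
    workers={"numpy": "1.19.5", "dask": "2.30.0"},
)
print(mismatches)  # [('numpy', '1.18.5', '1.19.5', '1.19.5')]
```

Only `numpy` is reported, because the hypothetical `dask` entry matches everywhere.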


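With the client connected, we read the 2019 NYC yellow taxi trip CSV files from the public `nyc-tlc` S3 bucket into a Dask DataFrame. We then persist the data in the workers' memory and compute the average tip amount for each passenger count:

```python
import dask.dataframe as dd

df = dd.read_csv(
    "s3://nyc-tlc/trip data/yellow_tripdata_2019-*.csv",
    # Nullable unsigned integer dtypes tolerate missing values in these columns
    dtype={
        "payment_type": "UInt8",
        "VendorID": "UInt8",
        "passenger_count": "UInt8",
        "RatecodeID": "UInt8",
    },
    storage_options={"anon": True},  # public bucket, no AWS credentials needed
    blocksize="16 MiB",  # split each file into partitions of roughly 16 MiB
).persist()  # keep the loaded partitions in cluster memory for reuse

df.groupby("passenger_count").tip_amount.mean().compute()
```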

```
passenger_count
0    2.122789
1    2.206790
2    2.214306
3    2.137775
4    2.023804
5    2.235441
6    2.221105
7    6.675962
8    7.111625
9    7.377822
Name: tip_amount, dtype: float64
```
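The data is split into many partitions across the workers, so a grouped mean cannot be obtained by simply averaging per-partition means. Partitions hold different numbers of rows for each group. A common approach is to compute a per-group sum and count inside each partition, then add those partial results together and divide once at the end. Dask's groupby aggregations follow this pattern in spirit. The plain-Python sketch below is only an illustration and not Dask's implementation. The partition data and the function names `partial_sums` and `combine_means` are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each hypothetical partition is a list of (passenger_count, tip_amount) rows.
Partition = List[Tuple[int, float]]
Partial = Dict[int, Tuple[float, int]]


def partial_sums(partition: Partition) -> Partial:
    """Per-partition step: compute (sum of tips, row count) for each group."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for key, tip in partition:
        sums[key] += tip
        counts[key] += 1
    return {key: (sums[key], counts[key]) for key in sums}


def combine_means(partials: Iterable[Partial]) -> Dict[int, float]:
    """Combine step: add sums and counts across partitions, then divide once."""
    total_sum: Dict[int, float] = defaultdict(float)
    total_count: Dict[int, int] = defaultdict(int)
    for part in partials:
        for key, (s, c) in part.items():
            total_sum[key] += s
            total_count[key] += c
    return {key: total_sum[key] / total_count[key] for key in sorted(total_sum)}


# Two made-up partitions with uneven group sizes.
partitions = [
    [(1, 2.0), (1, 4.0), (2, 3.0)],
    [(1, 6.0), (2, 1.0), (2, 2.0)],
]
means = combine_means(partial_sums(p) for p in partitions)
print(means)  # {1: 4.0, 2: 2.0}
```

Averaging the per-partition means instead would give 4.5 for group 1 (the mean of 3.0 and 6.0) rather than the correct 4.0, because the first partition holds two of that group's rows and the second only one.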