# Starting a Cluster on Nebari

This notebook demonstrates how we typically start a Dask cluster in the
Nebari cloud environment.

In [None]:
import os
import logging 
try:
    from dask_gateway import Gateway
except ImportError:
    logging.error("Unable to import Dask Gateway. Are you running in a cloud compute environment?")
    raise


## Dask Gateway Options

The cluster scheduler on Nebari makes use of a `Gateway`, which handles the
instantiation of worker clusters and gives us a way to monitor their
progress. Not all clustered systems use a gateway: `KubeCluster` is
one alternative you might find on other cloud platforms, such as `pangeo.chs.usgs.gov`.

In [None]:
gateway = Gateway()
os.environ['DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION'] = "1.0"
_options = gateway.cluster_options()
_options.conda_environment = 'users/users-pangeo'  ##<< the conda environment we use on Nebari
_options.profile = 'Medium Worker'

## AWS Environment Variables
By default, the cluster does not pass the full set of environment variables to
each worker. This default is important to override for the AWS configuration
parameters.

Because individual workers in the cluster do not have access to the standard file
system (where `~/.aws/credentials` lives), they cannot obtain AWS credentials
unless we hand them over as environment variables. We therefore establish the
key variables in the environment and explicitly pass them to the cluster
workers when the cluster is started:

In [None]:
_env_to_add={}
aws_env_vars=['AWS_ACCESS_KEY_ID',
              'AWS_SECRET_ACCESS_KEY',
              'AWS_SESSION_TOKEN',
              'AWS_DEFAULT_REGION',
              'AWS_S3_ENDPOINT']
for _e in aws_env_vars:
    if _e in os.environ:
        _env_to_add[_e] = os.environ[_e]
_options.environment_vars = _env_to_add    
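
The filtering step above can also be written as a small reusable function. The following is a minimal sketch, not part of the original notebook: the name `collect_env` and its `environ` parameter are hypothetical, introduced so the logic can be exercised without touching the real environment.

```python
import os
from typing import Iterable, Mapping, Optional

# Same variable names as the cell above.
AWS_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_DEFAULT_REGION",
    "AWS_S3_ENDPOINT",
)


def collect_env(
    names: Iterable[str] = AWS_ENV_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return the subset of `environ` whose keys appear in `names`.

    `collect_env` is a hypothetical helper; `environ` defaults to
    os.environ and exists only to make the function easy to test.
    """
    if environ is None:
        environ = os.environ
    # Variables that are not set are skipped rather than passed as empty strings.
    return {name: environ[name] for name in names if name in environ}
```

With this helper, the assignment becomes `_options.environment_vars = collect_env()`. The returned dictionary contains secrets, so avoid printing or logging it.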

## Cluster Start

In [None]:
cluster = gateway.new_cluster(_options)          ##<< create the cluster via the Dask gateway
cluster.adapt(minimum=10, maximum=30)             ##<< scale adaptively between 10 and 30 workers
client = cluster.get_client()                     ##<< client connected to the new cluster
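
A running cluster keeps consuming resources until it is shut down, so it helps to release it once the work is done. The sketch below is an assumption about how one might do that and is not part of the original notebook: `release_cluster` is a hypothetical helper that relies only on the `close()` method of the client and the `shutdown()` method of the gateway cluster.

```python
def release_cluster(client, cluster):
    """Close the client, then shut down the cluster.

    `release_cluster` is a hypothetical helper. The `finally` clause
    makes sure the cluster shutdown is attempted even if closing the
    client raises an exception.
    """
    try:
        client.close()
    finally:
        cluster.shutdown()
```

At the end of a session this would be called as `release_cluster(client, cluster)`.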

## Notify
Give the user the link by which they can monitor the cluster workers' progress and status. 

In [None]:
print("The 'cluster' object can be used to adjust cluster behavior, e.g. 'cluster.adapt(minimum=10)'")
print("The 'client' object can be used to interact directly with the cluster, e.g. 'client.submit(func)'")
print(f"The link to view the client dashboard is:\n>  {client.dashboard_link}")