# Getting Started with Pilot-Streaming and Edge on LRZ (Jetstream/TACC WIP)

In the first step, we import all required packages and modules.

Pilot-Streaming can be used to manage the Dask and Kafka environments both in the cloud and on the edge. 



`resource`: URL of the Local Resource Manager. Examples:

* `slurm://localhost`: Submit to local SLURM resource manager, e.g. on master node of Wrangler or Stampede
* `slurm+ssh://login1.wrangler.tacc.utexas.edu`: Submit to Wrangler master node SLURM via SSH (e.g. on node running a job)
* `os://`: OpenStack
* `ec2://`: Amazon EC2

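The scheme of the `resource` URL selects the backend, and the host part (if any) names the machine to contact. As a rough illustration only, and not Pilot-Streaming's own parsing code, the standard library's `urllib.parse` can split the example URLs listed above. The helper name `split_resource_url` is hypothetical.

```python
from urllib.parse import urlparse


def split_resource_url(url):
    """Return (scheme, host) for a resource URL; host is None if absent.

    Hypothetical helper for illustration, not part of Pilot-Streaming.
    """
    parsed = urlparse(url)
    return parsed.scheme, parsed.hostname


for url in ["slurm://localhost",
            "slurm+ssh://login1.wrangler.tacc.utexas.edu",
            "os://",
            "ec2://"]:
    scheme, host = split_resource_url(url)
    print(f"{url!r}: scheme={scheme}, host={host}")
```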

`type`: The `type` attribute specifies the cluster environment. It can be `Spark`, `Dask`, or `Kafka`.


Depending on the resource, other configuration may be necessary. For example, to ensure that the correct subnet is used, the Spark driver can be configured through environment variables such as `SPARK_LOCAL_IP`: `os.environ["SPARK_LOCAL_IP"] = '129.114.58.2'`

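A minimal sketch of setting the variable before a Spark pilot is created. `SPARK_LOCAL_IP` is a standard Spark environment variable; the address is the example value from above and would need to be replaced by an address of the local machine on the desired subnet. The helper name `set_spark_local_ip` is hypothetical, and the sketch assumes the Spark driver is started from this Python process so that it inherits the environment.

```python
import ipaddress
import os


def set_spark_local_ip(address):
    """Validate an IP address string and export it as SPARK_LOCAL_IP.

    Hypothetical helper for illustration, not part of Pilot-Streaming.
    """
    # ipaddress.ip_address raises ValueError for malformed addresses.
    ip = ipaddress.ip_address(address)
    os.environ["SPARK_LOCAL_IP"] = str(ip)
    return str(ip)


set_spark_local_ip("129.114.58.2")
print(os.environ["SPARK_LOCAL_IP"])
```

Validating the string first means a typo in the address fails immediately rather than when the driver tries to bind.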


In [None]:
# Pilot-Streaming
import os, sys
import distributed
import json
import pilot.streaming
import getpass
import socket

# Configure logging
import logging
logging.getLogger().setLevel(logging.DEBUG)
logging.getLogger("stevedore.extension").setLevel(logging.CRITICAL)
logging.getLogger("keystoneauth").setLevel(logging.CRITICAL)
logging.getLogger("urllib3.connectionpool").setLevel(logging.CRITICAL)
logging.getLogger("asyncio").setLevel(logging.CRITICAL)



# Show which installation of pilot.streaming was imported
print(sys.modules['pilot.streaming'])

RESOURCE_URL_HPC="slurm+ssh://login4.stampede2.tacc.utexas.edu"
WORKING_DIRECTORY=os.path.join(os.environ["HOME"], "work")

#RESOURCE_URL_EDGE="ssh://js-17-136.jetstream-cloud.org"
RESOURCE_URL_EDGE="os://cc.lrz.de"
#RESOURCE_URL_EDGE="ssh://localhost"
WORKING_DIRECTORY_EDGE="/home/aluckow"

# 1. Dask on Jetstream (pre-launched VM)


In [None]:
# Load the OpenStack pilot description; the context manager closes the file
with open("config/openstack_description_lrz.json", "r") as f:
    pilot_compute_description = json.load(f)
pilot_compute_description

In [None]:
pilot_compute_description["os_password"] = getpass.getpass()

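For non-interactive runs, one option is to read the password from an environment variable and fall back to the prompt. The sketch below assumes the variable is called `OS_PASSWORD`, the name used by the OpenStack command-line clients; the helper `read_os_password` is hypothetical.

```python
import getpass
import os


def read_os_password(env_var="OS_PASSWORD"):
    """Return the password from env_var, or prompt if it is unset or empty.

    Hypothetical helper for illustration, not part of Pilot-Streaming.
    """
    password = os.environ.get(env_var)
    if password:
        return password
    return getpass.getpass(prompt="OpenStack password: ")


# Usage (prompts only when OS_PASSWORD is not set):
# pilot_compute_description["os_password"] = read_os_password()
```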

In [None]:
os_pilot = pilot.streaming.PilotComputeService.create_pilot(pilot_compute_description)
os_pilot.wait()

In [None]:
os_pilot.get_details()

In [None]:
import distributed
dask_client = distributed.Client(os_pilot.get_details()['master_url'])
# dask_client = distributed.Client()
dask_client.scheduler_info()

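The dictionary returned by `scheduler_info()` can be long. The sketch below condenses it, assuming it contains an `address` entry and a `workers` mapping keyed by worker address, with a per-worker `nthreads` field (called `ncores` in older `distributed` releases). The helper name and the sample data are hypothetical.

```python
def summarize_scheduler_info(info):
    """Condense a scheduler_info()-style dict to address, worker and thread counts."""
    workers = info.get("workers", {})
    # Fall back to the older "ncores" key, then to 0 if neither is present.
    threads = sum(w.get("nthreads", w.get("ncores", 0)) for w in workers.values())
    return {
        "address": info.get("address"),
        "workers": len(workers),
        "threads": threads,
    }


# Hand-written sample data for illustration only, not real cluster output.
sample = {
    "address": "tcp://10.0.0.1:8786",
    "workers": {
        "tcp://10.0.0.2:40001": {"nthreads": 4},
        "tcp://10.0.0.3:40002": {"nthreads": 4},
    },
}
print(summarize_scheduler_info(sample))

# With a live client:
# summarize_scheduler_info(dask_client.scheduler_info())
```
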
In [None]:
dask_client.gather(dask_client.map(lambda a: a*a, range(10)))

In [None]:
dask_client.gather(dask_client.map(lambda a: socket.gethostname(), range(10)))

## Test Edge Dask Behind a Firewall

In [None]:
dask_client = distributed.Client("tcp://138.246.235.6:8786")
dask_client.scheduler_info()

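When the scheduler sits behind a firewall, it can help to check that its TCP port is reachable before creating the client. The following is a minimal sketch using only the standard library; the helper name `scheduler_reachable` and the default timeout are assumptions. A successful TCP connection only shows that the port is open, not that a Dask scheduler is answering on it.

```python
import socket
from urllib.parse import urlparse


def scheduler_reachable(address, timeout=3.0):
    """Return True if a TCP connection to a tcp://host:port address succeeds.

    Hypothetical helper for illustration, not part of Pilot-Streaming or Dask.
    """
    parsed = urlparse(address)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"expected tcp://host:port, got {address!r}")
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Usage with the scheduler address from the cell above:
# scheduler_reachable("tcp://138.246.235.6:8786")
```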

## Stop Cluster

In [None]:
os_pilot.cancel()

# 2. Start Kafka Cluster


In [None]:
RESOURCE_URL="slurm+ssh://login4.stampede2.tacc.utexas.edu"
WORKING_DIRECTORY=os.path.join(os.environ["HOME"], "work")


In [None]:
pilot_compute_description = {
    "resource": RESOURCE_URL,
    "working_directory": WORKING_DIRECTORY,
    "number_of_nodes": 1,
    "cores_per_node": 48,
    "project": "TG-MCB090174",
    "queue": "normal",
    "config_name": "stampede",
    "walltime": 59,
    "type": "kafka"
}

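Pilot descriptions for different frameworks on the same machine differ mainly in `type`, so one way to avoid repetition is to copy a base description and override individual keys. This is a sketch under assumptions: `derive_description` is a hypothetical helper, the working directory is a placeholder, the `type` value is normalized to lowercase as in the Kafka description above, and whether a Dask pilot accepts exactly the same keys is not confirmed here.

```python
import copy

# The three cluster environments named in the `type` description.
ALLOWED_TYPES = {"spark", "dask", "kafka"}


def derive_description(base, **overrides):
    """Return a copy of base with overrides applied, validating `type`.

    Hypothetical helper for illustration, not part of Pilot-Streaming.
    """
    description = copy.deepcopy(base)
    description.update(overrides)
    pilot_type = str(description.get("type", "")).lower()
    if pilot_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported pilot type: {description.get('type')!r}")
    description["type"] = pilot_type
    return description


kafka_description = {
    "resource": "slurm+ssh://login4.stampede2.tacc.utexas.edu",
    "working_directory": "/path/to/work",  # placeholder, not a real path
    "number_of_nodes": 1,
    "cores_per_node": 48,
    "queue": "normal",
    "walltime": 59,
    "type": "kafka",
}

# Same allocation settings, different framework and node count.
dask_description = derive_description(kafka_description, type="Dask", number_of_nodes=2)
print(dask_description["type"], dask_description["number_of_nodes"])
```
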
In [None]:
%%time
kafka_pilot = pilot.streaming.PilotComputeService.create_pilot(pilot_compute_description)
kafka_pilot.wait()


# 3. Start Stream Processing on Kafka/Dask