# Getting Started with Pilot-Streaming on Wrangler

First, import the required packages and modules and add the parent directory to the Python path:

In [1]:
# System Libraries
import sys, os
sys.path.append("..")
import pandas as pd

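## logging: basicConfig installs a console handler; the root logger and py4j
## (Spark's Python-to-JVM bridge) are then limited to errors
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.ERROR)
logging.getLogger("py4j").setLevel(logging.ERROR)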
 

# Pilot-Streaming
import pilot.streaming
sys.modules['pilot.streaming']

<module 'pilot.streaming' from '/home/01131/tg804093/anaconda3/lib/python3.6/site-packages/pilot/streaming.py'>

The Pilot-Compute Description is a simple key/value description of the cluster environment to be started. Alternatively, the cluster can be started with the command-line tool that ships with this package:

     pilot-streaming --resource=slurm://localhost --queue=normal --walltime=59 --number_cores=48 --framework spark 
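Because the description is an ordinary Python dictionary, the three descriptions used below differ only in `type` (and, for Spark, `resource`). As a minimal sketch, a small helper can assemble the dictionary from keyword arguments and catch typos in the framework name. The helper `make_description` is hypothetical and not part of the Pilot-Streaming API; the accepted `type` values are simply the three frameworks used in this notebook, and Pilot-Streaming may support others.

```python
import os

# Hypothetical helper, not part of Pilot-Streaming. The accepted "type" values
# are only the frameworks used in this notebook.
SUPPORTED_TYPES = {"spark", "kafka", "dask"}


def make_description(framework, resource, working_directory, project,
                     number_of_nodes=1, cores_per_node=48,
                     queue="normal", walltime=59):
    """Build a Pilot-Compute Description with the keys used in this notebook."""
    if framework not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported framework: {framework!r}")
    return {
        "resource": resource,
        "working_directory": working_directory,
        "number_of_nodes": number_of_nodes,
        "cores_per_node": cores_per_node,
        "project": project,
        "queue": queue,
        "walltime": walltime,  # same value as --walltime on the command line
        "type": framework,
    }


# Equivalent to the Kafka description used later in this notebook
kafka_description = make_description(
    "kafka",
    "slurm://localhost",
    os.path.join("/work/01131/tg804093/wrangler/", "work"),
    "TG-MCB090174",
)
```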

# 1. Spark

In [2]:
### Spark settings such as SPARK_LOCAL_IP must be set before pyspark is imported and the JVM is started
#os.environ["SPARK_LOCAL_IP"]='129.114.58.2' #must be done before pyspark is loaded
import pyspark
import os

pilot_compute_description = {
    "resource":"slurm+ssh://login1.wrangler.tacc.utexas.edu",
    "working_directory": os.path.join('/work/01131/tg804093/wrangler/', "work"),
    "number_of_nodes": 1,
    "cores_per_node": 48,
    "project": "TG-MCB090174",
    "queue": "normal",
    "walltime": 59,
    "type":"spark"
}

Start the Spark cluster and wait for startup to complete:

In [3]:
%%time

spark_pilot = pilot.streaming.PilotComputeService.create_pilot(pilot_compute_description)
spark_pilot.wait()

DEBUG:pilot-streaming:Pilot-Streaming SLURM: Parsing job description
DEBUG:pilot-streaming:Submit pilot job to: slurm+ssh://login1.wrangler.tacc.utexas.edu
DEBUG:pilot-streaming:Type Job IDps-25490


/tmp/tmp3c5oggdo
Submission of Job Command: ssh login1.wrangler.tacc.utexas.edu sbatch  tmp3c5oggdo
Cleanup: ssh login1.wrangler.tacc.utexas.edu rm tmp3c5oggdo


DEBUG:pilot-streaming:Pilot-Streaming SLURM: SSH run job finished
DEBUG:pilot-streaming:Output - 

To access the system:

1) If not using ssh-keys, please enter your TACC password at the password prompt
2) At the TACC Token prompt, enter your 6-digit code followed by <return>.

---------------------------------------------------------------
          Welcome to the Wrangler Supercomputer                 
---------------------------------------------------------------

No reservation for this job
--> Verifying valid submit host (login1)...OK
--> Verifying valid jobname...OK
--> Enforcing max jobs per user...OK
--> Verifying availability of your home dir (/home/01131/tg804093)...OK
--> Verifying availability of your work dir (/work/01131/tg804093/wrangler)...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (normal)...OK
--> Verifying job request is within current queue limits...OK
--> Checking available allocation (TG-MCB090174)...OK
Submitted batch job 91691


**** Job: 91691 State : Running
Create Spark Context for URL: spark://129.114.58.103:7077
Create Spark Context for URL: spark://129.114.58.103:7077
CPU times: user 13.8 ms, sys: 23.4 ms, total: 37.2 ms
Wall time: 27.7 s


In [4]:
spark_pilot.get_details()

Create Spark Context for URL: spark://129.114.58.103:7077


{'spark_home': '/work/01131/tg804093/wrangler/work/spark-2548d178-efe1-11e8-8522-549f3509766c/spark-2.4.0-bin-hadoop2.7',
 'master_url': 'spark://129.114.58.103:7077',
 'web_ui_url': 'http://129.114.58.103:8080'}
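The dictionary returned by `get_details()` is enough to connect without `get_context()`, for example by passing `master_url` as `master` to `pyspark.SparkContext`. A minimal standard-library sketch for pulling host and port out of these URLs, e.g. to check that the master is reachable from the notebook host; the helper name `host_and_port` is hypothetical.

```python
from urllib.parse import urlsplit

# Output of spark_pilot.get_details() shown above
details = {
    'master_url': 'spark://129.114.58.103:7077',
    'web_ui_url': 'http://129.114.58.103:8080',
}


def host_and_port(url):
    """Split a URL such as spark://host:port into (host, port)."""
    parts = urlsplit(url)
    return parts.hostname, parts.port


master_host, master_port = host_and_port(details['master_url'])
ui_host, ui_port = host_and_port(details['web_ui_url'])
print(master_host, master_port)  # 129.114.58.103 7077
print(ui_port)                   # 8080
```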

In [5]:
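# Alternative to get_context(): create the SparkContext directly from the reported master URL
#sc = pyspark.SparkContext(master=spark_pilot.get_details()['master_url'], appName="test")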

In [6]:
sc = spark_pilot.get_context()

Create Spark Context for URL: spark://129.114.58.103:7077


In [8]:
rdd = sc.parallelize([1,2,3])
rdd.map(lambda a: a*a).collect()

[1, 4, 9]

In [9]:
spark_pilot.cancel()

DEBUG:pilot-streaming:Cancel SLURM job
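Each pilot is a batch job that holds its node(s) until `cancel()` is called or the walltime expires, so it is easy to leave one running when a cell fails. A minimal sketch of a hypothetical context manager, `running_pilot`, that always cancels the pilot on exit, including when `wait()` is interrupted (as happens in the Dask section) or the analysis code raises. It assumes only that the object returned by `create_pilot` has the `wait()` and `cancel()` methods used in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def running_pilot(create_pilot, description):
    """Start a pilot, wait for it, and always cancel it on exit.

    create_pilot is expected to behave like
    pilot.streaming.PilotComputeService.create_pilot and return an object
    with blocking wait() and cancel() methods (an assumption).
    """
    pilot = create_pilot(description)
    try:
        pilot.wait()  # also covered by finally, e.g. on KeyboardInterrupt
        yield pilot
    finally:
        pilot.cancel()


# Usage in this notebook (not executed here):
# with running_pilot(pilot.streaming.PilotComputeService.create_pilot,
#                    pilot_compute_description) as spark_pilot:
#     sc = spark_pilot.get_context()
#     print(sc.parallelize([1, 2, 3]).map(lambda a: a * a).collect())
```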


# 2. Kafka

In [4]:
pilot_compute_description = {
    "resource":"slurm://localhost",
    "working_directory": os.path.join('/work/01131/tg804093/wrangler/', "work"),
    "number_of_nodes": 1,
    "cores_per_node": 48,
    "project": "TG-MCB090174",
    "queue": "normal",
    "walltime": 59,
    "type":"kafka"
}

In [None]:
%%time
kafka_pilot = pilot.streaming.PilotComputeService.create_pilot(pilot_compute_description)
kafka_pilot.wait()

DEBUG:pilot-streaming:Pilot-Streaming SLURM: Parsing job description
DEBUG:pilot-streaming:Submit pilot job to: slurm://localhost
DEBUG:pilot-streaming:Type Job IDps-c97bd


/tmp/tmpt27cwoql
Submission of Job Command: ssh login1.wrangler.tacc.utexas.edu sbatch  tmpt27cwoql
Cleanup: ssh login1.wrangler.tacc.utexas.edu rm tmpt27cwoql


DEBUG:pilot-streaming:Pilot-Streaming SLURM: SSH run job finished
DEBUG:pilot-streaming:Output - 

No reservation for this job
--> Verifying valid submit host (login1)...OK
--> Verifying valid jobname...OK
--> Enforcing max jobs per user...OK
--> Verifying availability of your home dir (/home/01131/tg804093)...OK
--> Verifying availability of your work dir (/work/01131/tg804093/wrangler)...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (normal)...OK
--> Verifying job request is within current queue limits...OK
--> Checking available allocation (TG-MCB090174)...OK
Submitted batch job 91694


**** Job: 91694 State : Running


In [10]:
kafka_pilot.get_details()

{'details': {'broker.id': '0',
  'listeners': 'PLAINTEXT://c251-132:9092',
  'zookeeper.connect': 'c251-132:2181',
  'zookeeper.connection.timeout.ms': '6000'},
 'master_url': 'c251-132:2181'}
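Here `master_url` is the ZooKeeper address (port 2181, identical to `zookeeper.connect`), whereas Kafka clients such as kafka-python's `KafkaProducer(bootstrap_servers=...)` connect to the broker listener on port 9092. A minimal sketch, assuming the `details` layout shown above, for extracting the broker addresses from the comma-separated `listeners` value; the helper name `bootstrap_servers` is hypothetical.

```python
# Output of kafka_pilot.get_details() shown above
details = {'details': {'broker.id': '0',
                       'listeners': 'PLAINTEXT://c251-132:9092',
                       'zookeeper.connect': 'c251-132:2181',
                       'zookeeper.connection.timeout.ms': '6000'},
           'master_url': 'c251-132:2181'}


def bootstrap_servers(details):
    """Return broker host:port strings from the Kafka 'listeners' setting.

    'listeners' is a comma-separated list of NAME://host:port entries.
    """
    servers = []
    for entry in details['details']['listeners'].split(','):
        entry = entry.strip()
        if entry:
            # Drop the listener name prefix, e.g. "PLAINTEXT://"
            servers.append(entry.split('://', 1)[1])
    return servers


print(bootstrap_servers(details))  # ['c251-132:9092']
```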

In [11]:
kafka_pilot.cancel()

# 3. Dask

In [2]:
import distributed

pilot_compute_description = {
    "resource":"slurm://localhost",
    "working_directory": os.path.join('/work/01131/tg804093/wrangler/', "work"),
    "number_of_nodes": 1,
    "cores_per_node": 48,
    "project": "TG-MCB090174",
    "queue": "normal",
    "walltime": 59,
    "type":"dask"
}

In [3]:
%%time
dask_pilot = pilot.streaming.PilotComputeService.create_pilot(pilot_compute_description)
dask_pilot.wait()

DEBUG:pilot-streaming:Pilot-Streaming SLURM: Parsing job description
DEBUG:pilot-streaming:Submit pilot job to: slurm://localhost
DEBUG:pilot-streaming:Type Job IDps-3ed06


/tmp/tmpkz74oax4
Submission of Job Command: ssh login1.wrangler.tacc.utexas.edu sbatch  tmpkz74oax4
Cleanup: ssh login1.wrangler.tacc.utexas.edu rm tmpkz74oax4


DEBUG:pilot-streaming:Pilot-Streaming SLURM: SSH run job finished
DEBUG:pilot-streaming:Output - 

No reservation for this job
--> Verifying valid submit host (login1)...OK
--> Verifying valid jobname...OK
--> Enforcing max jobs per user...OK
--> Verifying availability of your home dir (/home/01131/tg804093)...OK
--> Verifying availability of your work dir (/work/01131/tg804093/wrangler)...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (normal)...OK
--> Verifying job request is within current queue limits...OK
--> Checking available allocation (TG-MCB090174)...OK
Submitted batch job 91058


**** Job: 91058 State : Queue


KeyboardInterrupt: 
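`wait()` appears to block until the batch job is running; here job 91058 was still in state `Queue`, so the cell was interrupted by hand. A minimal sketch of a timeout wrapper, assuming only that the pilot has a blocking `wait()` method; the helper name `wait_with_timeout` is hypothetical. It runs `wait()` in a daemon thread and returns `False` after `timeout` seconds, leaving the job in the queue (call `cancel()` if it is no longer needed).

```python
import threading


def wait_with_timeout(pilot, timeout):
    """Call pilot.wait() in a background thread.

    Returns True if wait() finished within `timeout` seconds, False otherwise.
    An exception raised by wait() before the timeout is re-raised here.
    """
    done = threading.Event()
    error = []

    def _run():
        try:
            pilot.wait()
        except Exception as exc:  # hand errors from wait() back to the caller
            error.append(exc)
        finally:
            done.set()

    threading.Thread(target=_run, daemon=True).start()
    finished = done.wait(timeout)
    if error:
        raise error[0]
    return finished


# Usage in this notebook (not executed here):
# if not wait_with_timeout(dask_pilot, timeout=600):
#     print("Pilot still queued after 10 minutes")
```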

In [7]:
dask_pilot.get_details()

{'master_url': 'tcp://c251-135:8786', 'web_ui_url': 'http://c251-135:8787'}

In [15]:
import distributed
dask_client = distributed.Client(dask_pilot.get_details()['master_url'])
dask_client.scheduler_info()

{'address': 'tcp://129.114.58.135:8786',
 'id': 'Scheduler-363ae53b-1276-4ffc-bdc7-70b1aeb4283a',
 'services': {'bokeh': 8787},
 'type': 'Scheduler',
 'workers': {'tcp://129.114.58.135:41796': {'cpu': 8.0,
   'executing': 0,
   'host': '129.114.58.135',
   'in_flight': 0,
   'in_memory': 10,
   'last-seen': 1515036799.385555,
   'last-task': 1515036750.2663264,
   'local_directory': '/home/01131/tg804093/dask-worker-space/worker-lkiSY_',
   'memory': 103673856,
   'memory_limit': 134778585088,
   'name': 'tcp://129.114.58.135:41796',
   'ncores': 48,
   'num_fds': 24,
   'pid': 44991,
   'read_bytes': 158293.4720896321,
   'ready': 0,
   'services': {'bokeh': 8789, 'nanny': 42225},
   'time': 1515036798.885611,
   'time-delay': 0.00038909912109375,
   'write_bytes': 158293.4720896321}}}
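The `workers` entry of `scheduler_info()` can be reduced to a quick capacity check after the pilot starts. A minimal sketch over a trimmed copy of the output above; the helper name `summarize_workers` is hypothetical. Newer releases of `distributed` report the per-worker core count as `nthreads` rather than `ncores`, so the sketch accepts either key.

```python
# Trimmed copy of dask_client.scheduler_info() from above (only the fields used here)
info = {
    'address': 'tcp://129.114.58.135:8786',
    'workers': {
        'tcp://129.114.58.135:41796': {
            'ncores': 48,
            'memory_limit': 134778585088,  # bytes
        },
    },
}


def summarize_workers(info):
    """Return (number of workers, total cores, total memory limit in GiB)."""
    workers = list(info['workers'].values())
    # Older distributed versions use 'ncores', newer ones 'nthreads'
    total_cores = sum(w.get('nthreads', w.get('ncores', 0)) for w in workers)
    total_mem_gib = sum(w['memory_limit'] for w in workers) / 2**30
    return len(workers), total_cores, total_mem_gib


n_workers, cores, mem_gib = summarize_workers(info)
print(f"{n_workers} worker(s), {cores} cores, {mem_gib:.1f} GiB memory limit")
# 1 worker(s), 48 cores, 125.5 GiB memory limit
```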

In [14]:
dask_client.gather(dask_client.map(lambda a: a*a, range(10)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]