# Ray Serve Implementation of a Simulated Data Governance Server

This notebook starts a simulated data governance server, implemented with Ray Serve, that logs the ids of records observed in a data processing pipeline.
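To make "logging the ids" concrete, here is a minimal in-memory sketch of the kind of state such a service might keep. The class name `InMemoryIdLog` and its methods are hypothetical. They mirror the `/log`, `/ids`, `/count`, `/reset`, and `/up_time` endpoints exercised later in this notebook, but this is not the `DataGovernanceSystem` class imported below, whose implementation is not shown here.

```python
import time


class InMemoryIdLog:
    """Hypothetical stand-in for a governance log (not DataGovernanceSystem)."""

    def __init__(self):
        self._ids = []                   # observed ids, in arrival order
        self._start = time.monotonic()   # reference point for up_time()

    def log(self, record_id):
        """Remember that a record with this id was observed."""
        # Ids are stored as strings, as they would arrive in a query parameter.
        self._ids.append(str(record_id))

    def ids(self):
        """Return a copy of the ids seen so far."""
        return list(self._ids)

    def count(self):
        """Return how many ids have been logged."""
        return len(self._ids)

    def reset(self):
        """Forget all logged ids."""
        self._ids.clear()

    def up_time(self):
        """Return the seconds elapsed since this object was created."""
        return time.monotonic() - self._start
```

In this sketch the state lives in a single Python object; how the real service holds and shares its state is an assumption left open.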

> **Note:** Run all the cells in this notebook before running [Spark-RayServiceUDF.ipynb](Spark-RayServiceUDF.ipynb).

To learn more about Ray:
* [Ray.io](http://ray.io)
* [Ray Serve](https://docs.ray.io/en/master/rayserve/overview.html)

[Dean Wampler](mailto:dean@anyscale.com)

In [1]:
import sys, time, json, requests

In [2]:
from ray import serve
import ray

In [3]:
sys.path.append('..')
from data_governance_system import DataGovernanceSystem, Record
from data_governance_ray_serve import init_service

In [4]:
port = 8100

In [5]:
ray.init()  # Start Ray on this machine. Pass address='auto' to connect to a running cluster instead.

2020-05-18 06:31:55,009	INFO resource_spec.py:212 -- Starting Ray with 4.15 GiB memory available for workers and up to 2.08 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-05-18 06:31:55,335	INFO services.py:1170 -- View the Ray dashboard at localhost:8265


{'node_ip_address': '192.168.1.149',
 'raylet_ip_address': '192.168.1.149',
 'redis_address': '192.168.1.149:29731',
 'object_store_address': '/tmp/ray/session_2020-05-18_06-31-55_001773_75974/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-05-18_06-31-55_001773_75974/sockets/raylet',
 'webui_url': 'localhost:8265',
 'session_dir': '/tmp/ray/session_2020-05-18_06-31-55_001773_75974'}

In [6]:
print(f'Click here to open the Ray Dashboard: http://{ray.get_webui_url()}')

Click here to open the Ray Dashboard: http://localhost:8265


In [7]:
init_service('ray_serve_demo', port)

[2m[36m(pid=76319)[0m 2020-05-18 06:32:07,175	INFO master.py:122 -- Starting router with name 'SERVE_ROUTER_ACTOR'
[2m[36m(pid=76319)[0m 2020-05-18 06:32:07,179	INFO master.py:143 -- Starting HTTP proxy with name 'SERVE_PROXY_ACTOR'
[2m[36m(pid=76319)[0m 2020-05-18 06:32:07,184	INFO master.py:168 -- Starting metric monitor with name 'SERVE_METRIC_MONITOR_ACTOR'
[2m[36m(pid=76319)[0m 2020-05-18 06:32:07,197	INFO master.py:483 -- Registering route /log to endpoint log with methods ['PUT'].
[2m[36m(pid=76320)[0m INFO:     Started server process [76320]
[2m[36m(pid=76320)[0m INFO:     Waiting for application startup.
[2m[36m(pid=76320)[0m INFO:     Application startup complete.
[2m[36m(pid=76319)[0m 2020-05-18 06:32:07,263	INFO master.py:483 -- Registering route /ids to endpoint ids with methods ['GET'].
[2m[36m(pid=76319)[0m 2020-05-18 06:32:07,300	INFO master.py:483 -- Registering route /count to endpoint count with methods ['GET'].
[2m[36m(pid=76319)[0m 202

'ray_serve_demo'

In [8]:
def do_test(port, num_records=10, timeout=1.0):
    """Exercise each endpoint of the service and print the responses."""
    records = [Record(i, f'data for record {i}') for i in range(num_records)] # sample "records"

    address = f'http://127.0.0.1:{port}'
    print(f'Putting {num_records} records... to {address}')
    for record in records:
        response = requests.put(f'{address}/log?id={record.record_id}', timeout=timeout)
        print(f'log response = {response.json()}')

    count = requests.get(f'{address}/count', timeout=timeout)
    print(f'count:  {count.json()}')

    ids = requests.get(f'{address}/ids', timeout=timeout)
    print(f'ids:    {ids.json()}')
    
    up_time = requests.get(f'{address}/up_time', timeout=timeout)
    print(f'uptime: {up_time.json()}')
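
`do_test` builds the `/log` URL by interpolating the id directly into the query string. That works for the integer ids used here, but an id containing spaces or `&` would need percent-encoding. A minimal sketch of one way to do that with the standard library follows; the helper name `log_url` is hypothetical. With `requests`, passing `params={'id': ...}` to `requests.put` should have the same effect.

```python
from urllib.parse import urlencode


def log_url(address, record_id):
    """Build the /log URL with the id encoded as a query parameter."""
    query = urlencode({'id': record_id})  # encodes spaces, '&', and so on
    return f'{address}/log?{query}'


print(log_url('http://127.0.0.1:8100', 7))        # http://127.0.0.1:8100/log?id=7
print(log_url('http://127.0.0.1:8100', 'a b&c'))  # http://127.0.0.1:8100/log?id=a+b%26c
```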

In [9]:
do_test(port)

Putting 10 records... to http://127.0.0.1:8100
log response = {'message': 'sent async log request for 0'}
log response = {'message': 'sent async log request for 1'}
log response = {'message': 'sent async log request for 2'}
log response = {'message': 'sent async log request for 3'}
log response = {'message': 'sent async log request for 4'}
log response = {'message': 'sent async log request for 5'}
log response = {'message': 'sent async log request for 6'}
log response = {'message': 'sent async log request for 7'}
log response = {'message': 'sent async log request for 8'}
log response = {'message': 'sent async log request for 9'}
count:  {'count': 10}
ids:    {'ids': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']}
uptime: {'up_time': 9.21183180809021}
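
The `/log` responses say each request was sent asynchronously, which suggests that a `/count` issued right after the PUTs could lag behind. It did not lag in the run above. If it ever does, polling is one option. The sketch below is an assumption-laden helper, not part of the service: `wait_for_count` and its parameters are hypothetical, and `get_count` is any zero-argument callable, for example `lambda: requests.get(f'{address}/count', timeout=1.0).json()['count']`.

```python
import time


def wait_for_count(get_count, expected, timeout=5.0, interval=0.1):
    """Poll get_count() until it reaches `expected` or `timeout` seconds pass.

    Returns the last observed count, which may be less than `expected`
    if the timeout elapsed first.
    """
    deadline = time.monotonic() + timeout
    count = get_count()
    while count < expected and time.monotonic() < deadline:
        time.sleep(interval)   # back off briefly before asking again
        count = get_count()
    return count
```

Taking a callable rather than a URL keeps the helper independent of the HTTP client, so it can be tried without a running server.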


Reset the service, which removes the logged ids:

In [10]:
requests.put(f'http://127.0.0.1:{port}/reset')

<Response [200]>

In [11]:
count = requests.get(f'http://127.0.0.1:{port}/count')
print(f'count now = {count.json()}')

count now = {'count': 0}
