# Anyscale Services + Canary Rollout Features

Anyscale Services is the part of the Anyscale platform that provides web endpoints for Ray Serve applications. It offers key production features, including:
* High availability (HA)
* Canary rollouts for new service versions
* Extensive monitoring/management
* Support for the entire Ray platform, FastAPI, and applications which go beyond Ray

## Setup

The service versions are implemented in Python using standard Ray Serve APIs:
* `1-hello-chat.py` - a skeleton chat service that generates a trivial, static response
* `2-llm-chat.py` - our real LLM chat service

Each service version has a corresponding YAML file used to deploy that version -- in this case, `1-service.yaml` and `2-service.yaml`.

Extended configuration is possible via these YAML files as well as via CLI parameters, but the present examples are minimal starting points for clarity.

Currently, the service is not deployed

## Initial service rollout

In [None]:
! anyscale service rollout -f 1-service.yaml

In the Anyscale UI (or via logs) we can monitor the initial rollout

> We can launch `load_test.py` in a console to generate a steady stream of requests to our service
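
The contents of `load_test.py` are not shown in this notebook, so the following is only a minimal sketch of what a steady request generator could look like. The `run_load` and `fake_send` names are hypothetical, and the sender is passed in as a callable so the sketch runs without a deployed service.

```python
import time
from collections import Counter
from typing import Callable


def run_load(send: Callable[[], int], n_requests: int, delay_s: float = 0.0) -> Counter:
    """Call send() n_requests times and tally the outcomes.

    send is any zero-argument callable returning an HTTP status code.
    Exceptions are tallied under the key "error" so that one failed
    request does not stop the stream.
    """
    outcomes = Counter()
    for _ in range(n_requests):
        try:
            outcomes[send()] += 1
        except Exception:
            outcomes["error"] += 1
        time.sleep(delay_s)
    return outcomes


# Stand-in sender (hypothetical) so the sketch runs offline.
def fake_send() -> int:
    return 200


print(run_load(fake_send, n_requests=5))
```

Against a live service, the sender could be something like `lambda: requests.post(base_url, headers=headers, json=sample_json, timeout=10).status_code`, once `base_url`, `headers`, and `sample_json` are defined later in this notebook.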

Get a web authentication token from the Anyscale UI and place it in `token.txt`

In [None]:
with open("token.txt", "r") as f:
    # strip the trailing newline so it does not end up in the Authorization header
    token = f.read().strip()

Each Anyscale service has a unique URL -- calls to this URL will be routed automatically during the version changeover

> Get your service URL from the Anyscale UI

In [None]:
base_url = "https://llms-in-prod-gppbq.cld-kvedzwag2qa8i5bj.s.anyscaleuserdata.com/"

We'll set up minimal code to make a request to our service

In [None]:
import requests

# base_url already ends with "/", so strip it before appending the path
path = "/"
full_url = f"{base_url.rstrip('/')}{path}"
headers = {"Authorization": f"Bearer {token}"}

In [None]:
sample_json = '{ "user_input" : "hello" }'

requests.post(
    base_url, headers={"Authorization": "Bearer " + token}, json=sample_json
).json()
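
Note that `sample_json` above is a Python `str`, not a `dict`. The `json=` argument of `requests.post` serializes whatever it is given with a JSON encoder, so a string is sent as a JSON string literal rather than as a JSON object. The sketch below uses only the standard `json` module to show the two wire formats; which one the service expects depends on how `1-hello-chat.py` parses the request body, which is not shown here.

```python
import json

sample_json = '{ "user_input" : "hello" }'  # a str, as in the cell above

# Serializing a str yields a JSON string literal (quoted and escaped).
# This mirrors what json=sample_json puts in the request body.
body_from_str = json.dumps(sample_json)

# Parsing first and then serializing yields a JSON object instead.
body_from_dict = json.dumps(json.loads(sample_json))

print(body_from_str)
print(body_from_dict)

# A receiver has to decode the string form twice to reach the field.
decoded_once = json.loads(body_from_str)
assert isinstance(decoded_once, str)
assert json.loads(decoded_once)["user_input"] == "hello"
```

If the service were written to read a JSON object directly, passing a `dict` to `json=` would be the matching client-side form.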

## Upgrading the service to real LLM chat

To change over to a real LLM-backed chat service, we run a similar CLI command, this time pointing at the second YAML file

Note that, although this demo uses the same configurations for the initial and "real" services, we can upgrade a service to code requiring different hardware and/or different software. The only thing that has to stay the same is the service name.

### Demo of canary rollouts

We'll demonstrate rolling out multiple service versions while monitoring both externally and via Anyscale


The canary rollout feature allows zero-downtime upgrades as a live service transitions from one implementation to a new one
* Additional clusters are automatically provisioned by Anyscale for new service versions
    * service versions do *not* need to share config, dependencies, or even hardware requirements
    * the only thing that stays the same is the (internal) name and external endpoint
* Load is gradually shifted from the old service to the new one by Anyscale load balancers
    * rollout (changeover) schedule can be automatic, customized, or manually controlled
* The status of both the old and new versions is visible and accessible simultaneously in the Anyscale UI
    * Grafana integration shows realtime statistics on the service transition
    * __Rollback__ feature is available if it is necessary to abort the transition and return all traffic to the original service
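
To build intuition for the gradual load shift described above, here is a toy simulation of weighted routing between an old and a new version. It is not how the Anyscale load balancers are implemented; the `route` and `simulate` functions and the percentage schedule are hypothetical and exist only for illustration.

```python
import random
from collections import Counter


def route(canary_percent: int, rng: random.Random) -> str:
    """Pick a version for one request given the canary's traffic share."""
    return "new" if rng.random() * 100 < canary_percent else "old"


def simulate(schedule, requests_per_step: int = 1000, seed: int = 0):
    """Return (percent, old_count, new_count) for each step of the schedule."""
    rng = random.Random(seed)
    results = []
    for percent in schedule:
        counts = Counter(route(percent, rng) for _ in range(requests_per_step))
        results.append((percent, counts["old"], counts["new"]))
    return results


# Hypothetical schedule: the real one is automatic, customized, or manual.
for percent, old, new in simulate([0, 10, 50, 100]):
    print(f"canary {percent:3d}%: old={old:4d} new={new:4d}")
```

In this model, a rollback corresponds to setting the canary share back to 0, which sends every request to the old version again.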

In [None]:
! anyscale service rollout -f 2-service.yaml

At this point, we may want to observe the canary rollout changeover in:
* the Anyscale service UI
* the Grafana time-series chart of all-version traffic

In [None]:
sample_json = '{ "user_input" : "When did Taylor Swift\'s Eras Tour Start?" }'

requests.post(
    base_url, headers={"Authorization": "Bearer " + token}, json=sample_json
).json()

We can further manage the service via the Anyscale UI, Python SDK, or CLI