# Introduction to Ray
---
Welcome!

# 1. Ray Core
Ray Core is the foundation of the entire Ray ecosystem. With simple primitives, it lets every engineer easily build scalable, distributed systems in Python in a cloud-agnostic way.

### Why Ray Core?
- Simple Primitives: flexibly compose distributed applications with tasks, actors, and objects in native Python code.
- Multi-cloud: run the same Ray code on any cloud (AWS, GCP, Azure) or even on-prem.
- Dynamic Scaling: Ray Core can automatically scale (using Ray Autoscaler) up or down to smoothly handle changing compute load.
- Massive Scalability: Ray Core can easily scale to thousands of cores, and is getting more scalable with every release.
- Open Community / Ecosystem: With a vibrant, dedicated community and a rich ecosystem of integrations, fixes and best practices are easy to find.
- Laptop -> Cluster With Ease: With Ray Client, going from laptop to cluster is as easy as changing one variable.

In [None]:
# Approximate pi using random sampling: draw x and y uniformly between 0 and 1.
# If x^2 + y^2 <= 1, the point is inside the quarter circle.
# The fraction of points inside, multiplied by 4, approximates pi.
import ray
from random import random

# Let's start Ray
ray.init()

SAMPLES = 1_000_000
# By adding the `@ray.remote` decorator, a regular Python function
# becomes a Ray remote function.
@ray.remote
def pi4_sample():
    in_count = 0
    for _ in range(SAMPLES):
        x, y = random(), random()
        if x*x + y*y <= 1:
            in_count += 1
    return in_count

# To invoke this remote function, use the `remote` method.
# This immediately returns an object ref (a future) and creates
# a task that will be executed on a worker process. `ray.get` retrieves the result.
future = pi4_sample.remote()
pi = ray.get(future) * 4.0 / SAMPLES
print(f'{pi} is an approximation of pi') 

# Now let's do this 100,000 times.
# With regular Python, this would take roughly 11 hours.
# With Ray on a modern laptop, roughly 2 hours.
# On a 10-node Ray cluster, roughly 10 minutes.
BATCHES = 100000
results = [] 
for _ in range(BATCHES):
    results.append(pi4_sample.remote())
output = ray.get(results)
pi = sum(output) * 4.0 / BATCHES / SAMPLES
print(f'{pi} is a way better approximation of pi') 

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

# 2. Ray AIR
What it is, and how it relates to the other libraries.

## Ray Data
Ray Datasets are the standard way to load and exchange data in Ray libraries and applications. They provide basic distributed data transformations such as map, filter, and repartition, and are compatible with a variety of file formats, data sources, and distributed frameworks.
### Why Ray Data?
- Built for Scale: Run basic data operations such as map, filter, repartition, and shuffle on petabyte-scale data in native Python code.
- Distributed Arrow: With a distributed Arrow backend, it easily works with a variety of file formats, data sources, and distributed frameworks.
- Ray Ecosystem: Load your data once and enjoy a pluggable experience. Once your data is in your Ray cluster with Datasets, leveraging Ray is a breeze.

In [None]:
# Create a Dataset of Python objects.
ds = ray.data.range(10000)
# -> Dataset(num_blocks=200, num_rows=10000, schema=<class 'int'>)

ds.take(5)
# -> [0, 1, 2, 3, 4]

ds.count()
# -> 10000

# Create a Dataset of Arrow records.
ds = ray.data.from_items([{"col1": i, "col2": str(i)} for i in range(10000)])
# -> Dataset(num_blocks=200, num_rows=10000, schema={col1: int64, col2: string})

ds.show(5)
# -> {'col1': 0, 'col2': '0'}
# -> {'col1': 1, 'col2': '1'}
# -> {'col1': 2, 'col2': '2'}
# -> {'col1': 3, 'col2': '3'}
# -> {'col1': 4, 'col2': '4'}

ds.schema()
# -> col1: int64
# -> col2: string

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

## Ray Train
Ray Train is a lightweight library for distributed deep learning that allows you to easily supercharge your distributed PyTorch and TensorFlow training on Ray.
### Why Ray Train?
- Native Multi-GPU Support: Easily scale from single-threaded to multi-GPU training in under 10 lines of code.
- Intuitive API: With a best-in-class API for gradient descent, migrate to production or scale to a large cluster without rewriting code.
- Framework-agnostic: Seamlessly works with best-in-class deep learning frameworks including PyTorch, TensorFlow, Horovod, and many more.

In [None]:
import ray.train as train
from ray.train import Trainer
import torch

def train_func():
    # Setup model.
    model = torch.nn.Linear(1, 1)
    model = train.torch.prepare_model(model)
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    # Setup data.
    inputs = torch.randn(1000, 1)  # named `inputs` to avoid shadowing the built-in `input`
    labels = inputs * 2
    dataset = torch.utils.data.TensorDataset(inputs, labels)
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)
    dataloader = train.torch.prepare_data_loader(dataloader)

    # Train.
    for _ in range(5):
        for X, y in dataloader:
            pred = model(X)
            loss = loss_fn(pred, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    return model.state_dict()

trainer = Trainer(backend="torch", num_workers=4)
trainer.start()
results = trainer.run(train_func)
trainer.shutdown()

print(results)
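
Conceptually, data-parallel training gives each worker a shard of the data, has each worker compute a gradient on its shard, and averages those gradients before every update. The following plain-Python sketch illustrates that idea on the same target as above (labels are inputs times 2). It is not Ray Train's implementation; `shard_gradient`, the toy data, and the learning rate are assumptions made for illustration.

```python
def shard_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


# Toy data with the same relationship as above: label = input * 2.
data = [(k / 10, 2 * k / 10) for k in range(1, 41)]

num_workers = 4
# Each worker gets an equally sized, disjoint shard of the data.
shards = [data[i::num_workers] for i in range(num_workers)]

w = 0.0
lr = 0.05
for _ in range(200):
    # In a real cluster, these gradients are computed in parallel.
    grads = [shard_gradient(w, shard) for shard in shards]
    # Averaging stands in for the gradient synchronization step.
    avg_grad = sum(grads) / num_workers
    w -= lr * avg_grad

print(round(w, 3))
```

Because the shards are equally sized, the averaged gradient equals the full-batch gradient, so the learned weight converges to 2 just as single-process training would.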

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

## Ray RLlib
RLlib is the industry-standard reinforcement learning Python framework built on Ray. Designed for quick iteration and a fast path to production, it includes 25+ of the latest algorithms, all implemented to run at scale and in multi-agent mode.
### Why RLlib?
- Vibrant Community: Reinforcement learning is hard. Easily find code examples and connect with other developers and experts.
- Distributed-first: Iterate quickly without needing to rewrite your code to go to production or scale to a large cluster.
- State-of-the-Art: Choose from the latest and greatest in reinforcement learning algorithms to find the one best suited for your problem. Enjoy multi-agent support in all.
- Support External Simulators: Optimize your policies using an industry- or problem-specific external simulator. Connect simulations to RLlib via its PolicyServer/Client architecture.
- Seriously fast: Experience really fast policy evaluation with lower overhead than most algorithms.
- Tap into Ray ecosystem: Find the perfect set of hyperparameters using Ray Tune. Serve your trained model in a massively parallel way with Ray Serve.

In [None]:
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
 
tune.run(PPOTrainer, config={
   "env": "CartPole-v0",
   "framework": "torch",
   "log_level": "INFO"
})

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

## Ray Tune
Ray Tune is a Python library for fast hyperparameter tuning at scale. Easily distribute your trial runs to quickly find the best hyperparameters.
### Why Ray Tune?
- State of the Art Algorithms: Maximize model performance and minimize training costs by using the latest algorithms such as PBT, HyperBand, ASHA, and more.
- Library Agnostic: Ray Tune supports all the popular machine learning frameworks, including PyTorch, TensorFlow, XGBoost, LightGBM, and Keras--use your favorite!
- Built-In Distributed Mode: With built-in multi-GPU and multi-node support, and seamless fault tolerance, easily parallelize your hyperparameter search jobs.
- Power Up Existing Workflows: Have an existing workflow in another library like HyperOpt or Ax? Integrate Ray Tune to improve performance with minimal code changes.
- 10x Your Productivity: Start using Ray Tune by changing just a couple lines of code. Enjoy simpler code, automatic checkpoints and integrations with tools like MLflow and TensorBoard.
- Hooks into the Ray Ecosystem: Use Ray Tune on its own, or combine it with other Ray libraries such as XGBoost-Ray and RLlib.

In [None]:
from ray import tune
 
def objective(step, alpha, beta):
   return (0.1 + alpha * step / 100)**(-1) + beta * 0.1
 
def training_function(config):
   # Hyperparameters
   alpha, beta = config["alpha"], config["beta"]
   for step in range(10):
       # Iterative training function - can be any arbitrary training procedure.
       intermediate_score = objective(step, alpha, beta)
       # Feed the score back to Tune.
       tune.report(mean_loss=intermediate_score)
 
analysis = tune.run(
   training_function,
   config={
       "alpha": tune.grid_search([0.001, 0.01, 0.1]),
       "beta": tune.choice([1, 2, 3])
   })
 
print("Best config: ", analysis.get_best_config(
   metric="mean_loss", mode="min"))
 
# Get a dataframe for analyzing trial results.
df = analysis.results_df
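
To see what the search above is minimizing, the same `objective` can be evaluated exhaustively in plain Python, without Tune. This is only a sanity-check sketch: it assumes the last reported step (step 9) is the one compared, and it tries every `beta`, whereas `tune.choice` samples one `beta` at random per trial, so an actual run may not visit the best combination.

```python
from itertools import product


def objective(step, alpha, beta):
    return (0.1 + alpha * step / 100) ** (-1) + beta * 0.1


alphas = [0.001, 0.01, 0.1]
betas = [1, 2, 3]
last_step = 9  # final value produced by range(10) in the training function

# Final loss for every (alpha, beta) combination.
final_losses = {
    (alpha, beta): objective(last_step, alpha, beta)
    for alpha, beta in product(alphas, betas)
}

best = min(final_losses, key=final_losses.get)
print(best, round(final_losses[best], 3))
```

The loss decreases as `alpha` grows and increases with `beta`, so the smallest final loss on this grid is at `alpha=0.1`, `beta=1`.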

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

## Ray Serve
Ray Serve lets you serve machine learning models in real-time or batch using a simple Python API. Serve individual models or create composite model pipelines, where you can independently deploy, update, and scale individual components.
### Why Ray Serve?
- Pythonic API: Configure your model serving declaratively in pure Python without needing YAML or JSON configs.
- Low Latency, High Throughput: Horizontally scale across hundreds of processes or machines, while keeping the overhead in single-digit milliseconds.
- Multi-model composition: Easily compose multiple models, mix model serving with business logic, and independently scale components, without complex microservices.
- Framework-agnostic: Use a single tool to serve all types of models -- from PyTorch and TensorFlow to scikit-learn models -- and business logic.
- FastAPI Integration: Scale an existing FastAPI server easily or define an HTTP interface for your model using its simple, elegant API.
- Native GPU support: Using GPUs is as simple as adding one line of Python code. Maximize hardware utilization by sharing CPUs or GPUs between different models.

In [None]:
import requests

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

from ray import serve

@serve.deployment(route_prefix="/iris")
class BoostingModel:
    def __init__(self, model):
        self.model = model
        self.label_list = iris_dataset["target_names"].tolist()

    async def __call__(self, request):
        payload = await request.json()
        print(f"Received request with data {payload}")

        prediction = self.model.predict([payload["vector"]])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}

if __name__ == "__main__":

    # Train model.
    iris_dataset = load_iris()
    model = GradientBoostingClassifier()
    model.fit(iris_dataset["data"], iris_dataset["target"])

    # Deploy model
    serve.run(BoostingModel.bind(model))

    # Query model
    sample_request_input = {"vector": [1.2, 1.0, 1.1, 0.9]}
    response = requests.get("http://localhost:8000/iris", json=sample_request_input)
    print(response.text)
    
    # prints
    # Result:
    # {"result": "versicolor"}

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

## Ray Clusters
Unclear when to mention this elegantly

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

# 3. Ray Ecosystem

## Distributed XGBoost / LightGBM

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

## Integrations
Ray integrates with many popular Python and machine learning libraries and frameworks, letting you scale your existing workloads with minimal code changes.

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

## Community Libraries

### Summary
#### Key Concepts
#### Key API Elements in This Section
#### Next

# Next Steps