cc4add0 Mar 1, 2018

Parsec Provider Integration

This document describes the assumptions underlying Parsec's cloud provider integration, and what APIs are needed to integrate with us.

Terms and Concepts

  • provider - The supplier of hardware in the cloud (you!).
  • instance - The cloud instance provided to Parsec by the provider.
  • user - A user on the Parsec platform who requests provider capacity from our app.
  • region - The machine-readable datacenter location, e.g. us-east-1
  • instance type - A CPU, RAM and GPU combination, e.g. g2.2xlarge
  • instance image - The base image from which new instances can be spawned, e.g. an AMI on AWS.

Agent Requirements

The Parsec agent is the process that we run on the instance. It communicates with our APIs to signal the health of the server, allows users to connect using their Parsec client, and streams the computer to the connected clients using the custom Parsec protocols.

To be able to connect, there are a few important requirements that a provider must fulfill:

  • Support instances with the Windows Server 2016 64-bit base OS (or equivalent).
    • Our user account must have administrator (root) access.
  • Non-virtualized access to graphics card.
    • To guarantee high performance to our users, we read and write to the native graphics card APIs. As such, we do not support multi-tenant VMs that share virtual access to a graphics card.
  • Static, resolvable IP and other networking
    • Parsec has the ability to punch through many layers of NAT, but this does not work for all user network setups. As such, a provider should assign an un-NATed IP to each instance.
    • The IP does not need to stay fixed between separate runs of the instance, so a stopped instance can have its IP reclaimed.
    • Parsec uses a variety of non-standard ports to send data back and forth between servers and clients. As such, providers must ensure this traffic isn't filtered to/from the datacenter.
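As a rough illustration of the un-NATed IP requirement, the sketch below checks whether an address assigned to an instance is globally routable. This is only a heuristic and an assumption on our side: the function name `looks_un_nated` is hypothetical, and a globally routable address does not prove that no filtering happens in between.

```python
import ipaddress


def looks_un_nated(ip: str) -> bool:
    """Heuristic: True when the address is globally routable.

    Private (RFC 1918), link-local and shared (carrier-grade NAT) ranges
    all report False, which usually means the instance sits behind a NAT.
    """
    return ipaddress.ip_address(ip).is_global
```

A provider could run a check like this against the address visible on the instance's network interface during testing.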

Instance Model

These are the important attributes and concepts an instance must support.

  • id
  • status
  • region
  • instance type
  • instance image
  • disk size
  • tags
  • user data

id

Each instance should be identified by a unique string identifier that stays fixed for the lifetime of the instance. Ids are globally unique and are never reclaimed.

It is advisable to keep these reasonably short to start off and not use UUIDs, as it makes it easier to query APIs during testing.

status

We rely on instances to report the following statuses, which we refer to throughout this documentation. Statuses should be descriptive strings, but do not need to be the exact strings we use, as we keep an internal mapping.

  • started/online/on - instance is online and Parsec Agent is expected to be running.
    • A small delay while the OS is starting up is fine; this can still be an online state.
    • However, instances that are updating (such as Windows Update [see elsewhere]) should not be considered online.
  • stopped/offline/off - instance is powered down and costs the user nothing except storage
  • starting - instance is transitioning to online state
  • stopping - instance is transitioning to offline state
  • provisioning - instance is being created for the first time. Will eventually arrive at online state.

[Optional]

  • terminated - an instance that is terminated costs a user nothing and has no storage. It cannot be interacted with and exists only for record-keeping purposes.
  • rebooting - some providers support a specific state for when an instance is restarting. It should be followed by an online state.
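The internal mapping mentioned above could look roughly like the following sketch. The enum, the table and the function name are illustrative assumptions, not Parsec's actual implementation; each provider would get its own table of status strings.

```python
from enum import Enum


class Status(Enum):
    ONLINE = "online"
    OFFLINE = "offline"
    STARTING = "starting"
    STOPPING = "stopping"
    PROVISIONING = "provisioning"
    TERMINATED = "terminated"  # optional status
    REBOOTING = "rebooting"    # optional status


# Hypothetical provider vocabulary mapped onto the statuses listed above.
PROVIDER_STATUS_MAP = {
    "started": Status.ONLINE,
    "online": Status.ONLINE,
    "on": Status.ONLINE,
    "stopped": Status.OFFLINE,
    "offline": Status.OFFLINE,
    "off": Status.OFFLINE,
    "starting": Status.STARTING,
    "stopping": Status.STOPPING,
    "provisioning": Status.PROVISIONING,
    "terminated": Status.TERMINATED,
    "rebooting": Status.REBOOTING,
}


def normalize_status(raw: str) -> Status:
    """Map a provider status string to a canonical status; fail loudly on unknowns."""
    key = raw.strip().lower()
    try:
        return PROVIDER_STATUS_MAP[key]
    except KeyError:
        raise ValueError(f"unmapped provider status: {raw!r}") from None
```

Raising on an unknown string, rather than guessing, keeps a state such as "updating" from being silently treated as online.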

region

The region where the instance is running.

instance type

The instance type for the instance.

instance image

The image the instance was spawned from.

disk size

An instance should support different storage sizes for its volume, which the user chooses on startup. This disk does not have to be resizable. It lives along with the instance and is removed when the instance is terminated.

We do not currently manage the concept of volumes separately from instances, so we do not rely on concepts of attaching/detaching, etc. However, we do require that volumes will be automatically removed along with their instances. Furthermore, storage should be billed separately from the running of an instance.

user data

When our agent starts, we need a well defined way to retrieve data from the instance to allow initial authentication against our API.

We supply the user data on instance create, but it must also be possible to update it for a given instance. Updates can be restricted to stopped instances only.

Our agent expects to retrieve the data from a URL or IP that is only resolvable from the instance, that does not require further authentication, and that is available immediately on instance startup.

GOOD: http://169.254.169.254/latest/user-data (AWS)
GOOD: https://metadata.paperspace.com (Paperspace: This DNS record is only routable by a given instance. NB: https is not a requirement)
BAD: https://public.provider.com/my-instance-id/data/ (We store sensitive data in this field, so only the instance itself should be able to read it)
BAD: https://api.provider.com/get-user-data/my-instance-id/?key=API_KEY (We do not want to store the API key on the instance where it can be compromised by the user)
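To make the retrieval pattern concrete, here is a minimal sketch of how an agent might read user data from an instance-local endpoint. It assumes the AWS-style URL from the first GOOD example; the function name, the timeout and the injectable `opener` parameter (there only to make the function testable) are illustrative, not part of the Parsec agent.

```python
import urllib.request

# Assumption: the AWS-style link-local endpoint from the GOOD example above.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(url: str = USER_DATA_URL, timeout: float = 2.0,
                    opener=urllib.request.urlopen) -> bytes:
    """Fetch the raw user data with a plain GET.

    No API key or other credential is sent: the endpoint is expected to be
    reachable only from the instance itself.
    """
    with opener(url, timeout=timeout) as response:
        return response.read()
```

Note that nothing in the request identifies the instance or authenticates the caller, which is exactly what the two BAD examples get wrong.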

tags

We rely on the ability to tag instances with data, which is later returned to us in the list, status and billing calls, so we can correlate instances from providers with their DB records.

These tags are set at create time and must be returned in the list call. They should be editable.

Tags are different from user data in that we use tags to map instances to users and environments, whereas user data is used to bootstrap authentication for a given user once an instance is started.

Tags should also be returned with billing information for a given instance, so we can correlate how much each instance is costing us.
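The correlation described above can be sketched as follows. The shape of the instance objects (a `tags` dictionary on each one) and the tag key `db_record_id` are assumptions made for illustration; the actual tag names are not specified in this document.

```python
def index_by_tag(instances, tag_key):
    """Split instances into {tag value: instance} and a list of untagged ones.

    `instances` is assumed to be the decoded JSON list from the list call,
    with each instance carrying a "tags" dictionary.
    """
    matched, untagged = {}, []
    for instance in instances:
        value = instance.get("tags", {}).get(tag_key)
        if value is None:
            untagged.append(instance)
        else:
            matched[value] = instance
    return matched, untagged


# Usage with a hypothetical tag key:
listed = [
    {"id": "a1", "status": "online", "tags": {"db_record_id": "42"}},
    {"id": "b2", "status": "offline", "tags": {}},
]
by_record, orphans = index_by_tag(listed, "db_record_id")
```

Instances that come back without the expected tag cannot be matched to a DB record, which is why tags must be returned on every list call.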

Parsec API Calls

Here follows an in-depth look at the API calls we require for a functioning provider integration.

We expect Provider APIs to return JSON, to follow RESTful practices, and to signal errors with both HTTP error codes and well-formed JSON error objects.

Create

Requests the creation of a new instance in a given region. The request should block until the create request has been successfully scheduled with the datacenter and the instance's metadata has been created, but it should return before the instance is fully provisioned, with the instance state set to provisioning. Importantly, the instance id must be allocated by the time the call returns, so we can track its progress using the provider's status API. This call should not be eventually consistent, and should not return a "create pending" request or equivalent. If there is an error, such as a lack of availability, the call must not succeed and must return a well-formed error.

Arguments

  • region
  • instance type
  • instance image
  • disk size
  • tags
  • user data

Returns

{ "instance_id": "some-id",
  "status": "provisioning",
  ...
}
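A client-side view of the create call might look like the sketch below. The snake_case argument names, the `error` key in the error object and both function names are assumptions for illustration; only `instance_id` and `status` come from the example response above.

```python
import json

# Assumed JSON field names for the six create arguments listed above.
REQUIRED_CREATE_ARGS = (
    "region", "instance_type", "instance_image",
    "disk_size", "tags", "user_data",
)


def build_create_request(**args) -> str:
    """Serialize the create arguments, refusing to send an incomplete request."""
    missing = [name for name in REQUIRED_CREATE_ARGS if name not in args]
    if missing:
        raise ValueError(f"missing create arguments: {missing}")
    return json.dumps(args)


def parse_create_response(body: str) -> str:
    """Return the allocated instance id, or raise if the create failed."""
    data = json.loads(body)
    if "error" in data:
        raise RuntimeError(f"create failed: {data['error']}")
    instance_id = data.get("instance_id")
    if not instance_id:
        # The id must already be allocated when the call returns.
        raise ValueError("create response carries no instance_id")
    return instance_id
```

Treating a response without an id as a failure reflects the requirement that the call is not eventually consistent.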

List

Lists all instances under active management. Does not include terminated instances. Importantly, this call must include the up-to-date status and the tags of each instance.

Arguments

None

Returns

[{instance}, ...]

List of instance objects

Status

Gives the detailed status of an instance.

Arguments

  • instance id

Returns

{"id": "some-id", "status": "status", "some-non-standard-field": "something", ...}

An instance object, potentially with more fields than list and create calls to aid with debugging.
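One way a caller might use the status call is to poll it until a freshly created or started instance comes online. The sketch below assumes a `get_status` callable that wraps the status call and returns a status string already mapped to the names used in this document; the function name, timeout and polling interval are illustrative.

```python
import time

# Statuses from which an instance is still expected to reach "online".
TRANSITIONAL = {"provisioning", "starting", "rebooting"}


def wait_until_online(get_status, instance_id, timeout=600.0, interval=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(instance_id) until it reports "online".

    Raises RuntimeError on an unexpected status (e.g. "offline") and
    TimeoutError if the instance is still transitioning at the deadline.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(instance_id)
        if status == "online":
            return status
        if status not in TRANSITIONAL:
            raise RuntimeError(f"instance {instance_id} is {status!r}, not coming online")
        if clock() >= deadline:
            raise TimeoutError(f"instance {instance_id} still {status!r} after {timeout}s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.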

Availability

Returns whether resources are available to create a new instance in a given region. Returning true doesn't guarantee that resources are available, but returning false signals that we should block creates for that region, for example during a demand surge.

Arguments

  • instance type
  • region

Returns

{"available": true/false}
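Because the availability answer is advisory, a caller might use it only to filter which regions are offered for new creates. The following sketch assumes one decoded availability response per region; the function name and the example region names are illustrative.

```python
def regions_open_for_create(availability_by_region):
    """Return the regions whose response explicitly says available: true.

    A missing or malformed field counts as unavailable. A create in an
    open region can still fail and must then return a well-formed error.
    """
    return sorted(
        region
        for region, response in availability_by_region.items()
        if response.get("available") is True
    )
```

Treating anything other than an explicit true as unavailable errs on the side of blocking creates when the answer is unclear.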

Start

Requests the start of an instance. If an instance cannot be started, for example because there is no availability, the call should return an error.

Arguments

  • instance id

Returns

{"success": "ok"} OR {"error": "some error"}

Stop

Requests the stop of an instance.

Arguments

  • instance id

Returns

{"success": "ok"} OR {"error": "some error"}

Terminate

Same as a stop, except it also cleans up all associated data of the instance. Machines should not have to be stopped to be terminated.

Arguments

  • instance id

Returns

{"success": "ok"} OR {"error": "some error"}
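Start, stop and terminate share the same response shape, so a caller could validate all three with one helper. This is a sketch under the assumption that errors arrive as an HTTP error code together with an `{"error": ...}` body; the exception class and function name are hypothetical.

```python
import json


class ProviderError(Exception):
    """Raised when a start, stop or terminate request is rejected."""


def check_action_response(http_status: int, body: str) -> None:
    """Raise ProviderError unless the response is a clean {"success": "ok"}."""
    data = json.loads(body)
    if http_status >= 400 or "error" in data:
        raise ProviderError(data.get("error", f"HTTP {http_status}"))
    if data.get("success") != "ok":
        raise ProviderError(f"unexpected response: {data!r}")
```

Checking both the HTTP code and the JSON body catches providers that report a failure in only one of the two places.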

Billing

Billing doesn't have to be a strict API, but we do need an automatic way to reconcile the provider's costs to us on a per-instance basis and correlate them with users.

Ideally, billing records can be fetched using an API in .csv format, where each record describes details of an instance charge, including the tags described above, as well as a unique identifier to help us de-duplicate records.
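As an illustration of why the unique identifier matters, the sketch below totals charges per instance from a CSV export while skipping records it has already seen. The column names `record_id`, `instance_id` and `amount` are assumptions; a real export would also carry the tags, in whatever columns the provider chooses.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def cost_per_instance(csv_text, seen_ids=None):
    """Sum charges per instance id, ignoring records whose id was already seen.

    Pass the same `seen_ids` set across fetches so that overlapping exports
    do not double-count a charge.
    """
    seen = set() if seen_ids is None else seen_ids
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["record_id"] in seen:
            continue  # duplicate record, already counted
        seen.add(row["record_id"])
        totals[row["instance_id"]] += Decimal(row["amount"])
    return dict(totals)
```

Amounts are parsed as `Decimal` rather than `float` so that summed charges do not pick up rounding errors.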