
A workspace in the Scale system is a location where files are stored (source files, product files, etc). A workspace contains configuration specifying how files are stored into and retrieved from the workspace. Workspaces are configured to use various brokers, which know how to store/retrieve files in different storage systems (e.g. NFS).

Example workspace configuration:

{
   "version": "1.0",
   "broker": {
      "type": "nfs",
      "nfs_path": "host:/my/path"
   }
}

The broker value is a JSON object providing the configuration for this workspace’s broker. The type value indicates that the NFS (Network File System) broker should be used for this workspace. The nfs_path field specifies the NFS host and path that should be mounted in order to access the files. To see all of the options for a workspace’s configuration, please refer to the Workspace Configuration Specification below.

Workspace Configuration Specification Version 1.0

A valid workspace configuration is a JSON document with the following structure:

{
   "version": STRING,
   "broker": {
      "type": STRING
   }
}

version

Type: String
Required: No


Defines the version of the configuration used. This allows updates to be made to the specification while maintaining backwards compatibility by allowing Scale to recognize an older version and convert it to the current version. The default value, if not included, is the latest version (currently 1.0). It is recommended, though not required, that you include the version so that future changes to the specification will still accept your workspace configuration.
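For illustration of the defaulting behavior described above, the sketch below fills in the latest version when the field is absent. The function and constant names are assumptions for this example, not part of Scale's API:

import json

CURRENT_VERSION = "1.0"  # latest specification version at the time of writing

def normalize_workspace_config(raw_json):
    """Parse a workspace configuration and default the version field."""
    config = json.loads(raw_json)
    # If no version is given, the configuration is treated as the latest version.
    config.setdefault("version", CURRENT_VERSION)
    return config

config = normalize_workspace_config('{"broker": {"type": "nfs", "nfs_path": "host:/my/path"}}')
print(config["version"])  # -> "1.0"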

broker

Type: JSON Object
Required: Yes


Defines the broker that the workspace should use for retrieving and storing files.
  • type

    Type: String
    Required: Yes


    Specifies the type of the broker to use. The other fields that configure the broker are based upon the type specified. The valid broker types are:
    • host - A host broker mounts a local directory from the host into the job’s container. Usually this local directory is a shared file system that has been mounted onto the host.
    • nfs - An nfs broker utilizes an NFS (Network File System) for file storage.
    • s3 - An s3 broker utilizes the Amazon Web Services (AWS) Simple Storage Service (S3) for file storage.

    Additional broker fields may be required depending on the type of broker selected. See below for more information on each broker type.
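As a rough sketch of how the type field drives the rest of the configuration, the following hypothetical validation shows the dispatch on broker type and the per-type required fields described in the sections below. The names here are assumptions for illustration, not Scale's internal code:

# Illustrative only: broker class names and this helper are assumptions.
BROKER_TYPES = {"host", "nfs", "s3"}
REQUIRED_FIELDS = {"host": ["host_path"], "nfs": ["nfs_path"], "s3": ["bucket_name"]}

def validate_broker(broker_config):
    broker_type = broker_config.get("type")
    if broker_type not in BROKER_TYPES:
        raise ValueError("Unknown broker type: %r" % broker_type)
    # Each type requires its own additional fields (see the broker sections below).
    missing = [f for f in REQUIRED_FIELDS[broker_type] if f not in broker_config]
    if missing:
        raise ValueError("Missing fields for %s broker: %s" % (broker_type, missing))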

Host Broker

The host broker mounts a local directory from the host into the job’s container. This local directory should be a shared file system that has been mounted onto all hosts in the cluster. All hosts must have the same shared file system mounted at the same location for this broker to work properly.

??? danger "Permissions" The Scale Docker containers run with a UID and GID of 7498. To ensure that permissions are appropriately handled within Docker, make sure that your host’s local directory is owned by a user/group with UID/GID of 7498/7498.

??? danger "Security" There are potential security risks involved with mounting a host directory into a Docker container. It is recommended that you use another broker type if possible.

Example host broker configuration:

{
   "version": "1.0",
   "broker": {
      "type": "host",
      "host_path": "/the/absolute/host/path"
   }
}

The host broker requires one additional field in its configuration:

host_path

Type: String
Required: Yes


Specifies the absolute path of the host’s local directory that should be mounted into a job’s container in order to access the workspace’s files.
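Scale performs this mount itself when it launches a job's container; the sketch below only illustrates what the mount corresponds to, using the Docker Python SDK. The image name and container-side path are placeholders:

import docker

client = docker.from_env()
# Bind-mount the workspace's host_path into the container, read/write.
client.containers.run(
    "alpine",
    "ls /workspace",
    volumes={"/the/absolute/host/path": {"bind": "/workspace", "mode": "rw"}},
    remove=True,
)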

NFS Broker (experimental)

The NFS broker mounts a remote network file system volume into the job’s container.

??? danger "Plugin Required" In order to use Scale’s NFS broker, you must install and run the Netshare Docker plugin. Please see http://netshare.containx.io/ for more information.

??? danger "Permissions" The Scale Docker containers run with a UID and GID of 7498. To ensure that permissions are appropriately handled within Docker, make sure that the directories in your NFS file volume are owned by a user/group with UID/GID of 7498/7498.

Example NFS broker configuration:

{
   "version": "1.0",
   "broker": {
      "type": "nfs",
      "nfs_path": "host:/my/path"
   }
}

The NFS broker requires one additional field in its configuration:

nfs_path

Type: String
Required: Yes


Specifies the remote NFS path to use for storing and retrieving the workspace files. It should be in the format host:/path.
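A quick way to sanity-check the host:/path format is sketched below. This helper is hypothetical and not part of Scale; it only illustrates the expected shape of the value:

import re

# Hypothetical check that an nfs_path value looks like "host:/path".
NFS_PATH_PATTERN = re.compile(r"^[^:/\s]+:/\S*$")

def is_valid_nfs_path(nfs_path):
    return bool(NFS_PATH_PATTERN.match(nfs_path))

print(is_valid_nfs_path("host:/my/path"))   # True
print(is_valid_nfs_path("/just/a/path"))    # False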

S3 Broker (experimental)

The S3 broker references a storage location that exists as an S3 bucket in your AWS account. Please take note of the bucket name, which is typically of the form my_name.domain.com since bucket names must be globally unique (see Bucket Restrictions). The bucket must be configured for read and/or write access through an appropriate IAM (Identity and Access Management) account. Once the IAM account is created and granted permissions to the bucket, there are two ways to handle authentication. IAM roles can be used to automatically grant permissions to the EC2 instance executing the broker operations (see AWS Roles). This method is preferred because no secret keys are required. Alternatively, an access key ID and secret access key can be generated and used with this broker (see AWS Credentials). These tokens allow third-party software to access resources on behalf of the associated account.
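For reference, the two authentication paths described above map directly onto how an S3 client such as boto3 is constructed. This sketch is only an illustration; Scale's own S3 access code is not shown, and the key values and bucket name are placeholders from the example configuration below:

import boto3

# 1) IAM role-based access (preferred): no keys in the configuration, the
#    EC2 instance profile supplies credentials automatically.
s3 = boto3.client("s3", region_name="us-east-1")

# 2) Key-based access: the access_key_id/secret_access_key pair from the
#    workspace's "credentials" object is passed explicitly.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIAIOSFODNN7EXAMPLE",
    aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    region_name="us-east-1",
)

# Either client can then verify access to the workspace bucket.
s3.head_bucket(Bucket="my_bucket.domain.com")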

??? danger "Security" A dedicated IAM account should be used rather than the root AWS account to limit the risk of damage if a leak were to occur and similarly the IAM account should be given the minimum possible permissions needed to work with the bucket. The access tokens should also be changed periodically to further protect against leaks.

??? danger "Security" While this broker is in the experimental phase, the access keys are currently stored in plain text within the Scale database and exposed via the REST interface. A future version will maintain these values using a more appropriate encrypted store service.

Example S3 broker configuration:

{
   "version": "1.0",
   "broker": {
      "type": "s3",
      "bucket_name": "my_bucket.domain.com",
      "credentials": {
         "access_key_id": "AKIAIOSFODNN7EXAMPLE",
         "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      },
      "host_path": "/my_bucket",
      "region_name": "us-east-1"
   }
}

The S3 broker requires the following additional fields in its configuration:

bucket_name

Type: String
Required: Yes


Specifies the globally unique name of a storage bucket within S3. The bucket should be created before attempting to use it here.

credentials

Type: JSON Object
Required: No


Provides the necessary information to access the bucket. This attribute should be omitted when using IAM role-based security. If it is included for key-based security, then both sub-attributes must be included. An IAM account should be created and granted the appropriate permissions to the bucket before attempting to use it here.
  • access_key_id

    Type: String
    Required: No


    A unique identifier for the user account in IAM that will be used as a proxy for read and write operations within Scale.
  • secret_access_key

    Type: String
    Required: No


    A generated token that allows the system to make requests on behalf of the associated IAM account without requiring that account’s actual password.

host_path

Type: String
Required: No


Adds S3 workspace support for locally mounted buckets and partial file read-only access. If a FUSE file system (such as s3fs or goofys) mounts the S3 bucket at the host_path location on all nodes, jobs that use only portions of a file have an alternative to downloading large files in their entirety. The job interface must set partial to true for any input files that should take advantage of host_path. Only read operations are performed through the mount; all write operations use the S3 REST API.
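The benefit of the mount is that a job can read only the byte range it needs instead of fetching the whole object. The sketch below illustrates that partial read; the mount point and file name are placeholders, and Scale's own file handling is not shown:

MOUNTED_BUCKET = "/my_bucket"  # host_path where the FUSE mount exposes the bucket
large_file = MOUNTED_BUCKET + "/products/large_image.tif"  # placeholder file

# Read only a 1 MiB slice starting at a 512 MiB offset, instead of
# downloading the entire object from S3 first.
with open(large_file, "rb") as f:
    f.seek(512 * 1024 * 1024)
    chunk = f.read(1024 * 1024)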

region_name

Type: String
Required: No


Specifies the AWS region where the S3 bucket is located. This is not always required, since environment variables or AWS configuration files may already provide a default region, but explicitly setting it is highly recommended so the bucket’s region is unambiguous.
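As a small illustration of that fallback behavior, the snippet below shows where boto3 would look for a region when none is given explicitly versus when region_name is set; the region value is a placeholder matching the example configuration above:

import boto3

# Without region_name, boto3 falls back to the environment/config default
# (e.g. AWS_DEFAULT_REGION); this may be None if nothing is configured.
default_region = boto3.session.Session().region_name

# With region_name set, the bucket region is stated explicitly.
explicit = boto3.client("s3", region_name="us-east-1")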