Add s3 index.rst and schema.yaml files
PiperOrigin-RevId: 562371510
Change-Id: I6cfcfc3930e7a58e6e51a3f443f3bdf941e496a7
laramiel authored and Copybara-Service committed Sep 3, 2023
1 parent 32a37d9 commit 78b93c9
Showing 3 changed files with 246 additions and 0 deletions.
8 changes: 8 additions & 0 deletions tensorstore/kvstore/s3/BUILD
@@ -5,6 +5,14 @@ package(default_visibility = ["//visibility:public"])

licenses(["notice"])

filegroup(
name = "doc_sources",
srcs = glob([
"**/*.rst",
"**/*.yml",
]),
)

# To enable debug checks, specify:
# bazel build --//tensorstore/kvstore/s3:debug
bool_flag(
77 changes: 77 additions & 0 deletions tensorstore/kvstore/s3/index.rst
@@ -0,0 +1,77 @@
.. _s3-kvstore-driver:

``s3`` Key-Value Store driver
===============================

The ``s3`` driver provides access to S3. Keys directly correspond to HTTP paths.

.. json:schema:: kvstore/s3
.. json:schema:: Context.s3_request_concurrency
.. json:schema:: Context.s3_request_retries
.. json:schema:: Context.experimental_s3_rate_limiter
.. json:schema:: Context.data_copy_concurrency
.. json:schema:: KvStoreUrl/s3
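
A spec for this driver is a JSON object whose members are described by the
``kvstore/s3`` schema. The following is a minimal sketch of assembling such an
object in Python. The helper name ``make_s3_spec`` is hypothetical and not part
of TensorStore; the member names are taken from the schema, and the check on
``host`` mirrors the schema's statement that it may only be specified together
with ``endpoint``.

```python
import json


def make_s3_spec(bucket, path="", *, endpoint=None, host=None,
                 aws_region=None, requester_pays=False, profile=None):
    """Assemble a JSON-compatible spec for the s3 driver.

    Hypothetical helper for illustration only; not a TensorStore API.
    """
    if not bucket:
        raise ValueError("'bucket' is required by the schema")
    if host is not None and endpoint is None:
        raise ValueError("'host' may only be specified together with 'endpoint'")

    spec = {"driver": "s3", "bucket": bucket}
    optional = {
        "path": path or None,
        "endpoint": endpoint,
        "host": host,
        "aws_region": aws_region,
        "profile": profile,
    }
    # Only emit members that were actually set, so defaults stay implicit.
    spec.update({k: v for k, v in optional.items() if v is not None})
    if requester_pays:
        spec["requester_pays"] = True
    return spec


if __name__ == "__main__":
    print(json.dumps(make_s3_spec("my-bucket", "path/to/dataset"), indent=2))
```

Leaving unset members out of the object keeps the spec close to the short forms
shown in the schema examples.
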
.. _s3-authentication:

Authentication
--------------

The ``s3`` driver can access buckets that allow public access without
credentials. Otherwise, AWS credentials are required:

1. Credentials may be obtained from the environment. Set the
:envvar:`AWS_ACCESS_KEY_ID` environment variable, optionally along with
the :envvar:`AWS_SECRET_ACCESS_KEY` environment variable and the
:envvar:`AWS_SESSION_TOKEN` environment variable as they would be
used by the `aws cli <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>`_.

2. Credentials may be obtained from the default user credentials file,
:file:`~/.aws/credentials`, or from the file specified by the
:envvar:`AWS_SHARED_CREDENTIALS_FILE` environment variable. The profile is
taken from the schema, or from the :envvar:`AWS_PROFILE` environment
variable.

3. Acquiring credentials from the EC2 Metadata server is unimplemented.
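
The lookup order above can be sketched in Python as follows. This is an
illustration under stated assumptions, not the driver's implementation: the
function name ``resolve_credentials`` is hypothetical, the credentials file is
assumed to use the INI layout of the AWS CLI, and a profile given in the schema
is assumed to take precedence over :envvar:`AWS_PROFILE`.

```python
import configparser
import os


def resolve_credentials(env=None, credentials_text=None, profile=None):
    """Return a dict of credentials, or None for anonymous access.

    Hypothetical helper; `credentials_text` lets callers supply the file
    contents directly instead of reading from disk.
    """
    env = os.environ if env is None else env

    # 1. Environment variables.
    access_key = env.get("AWS_ACCESS_KEY_ID")
    if access_key:
        return {
            "access_key": access_key,
            "secret_key": env.get("AWS_SECRET_ACCESS_KEY", ""),
            "session_token": env.get("AWS_SESSION_TOKEN", ""),
        }

    # 2. Shared credentials file, selected profile.
    if credentials_text is None:
        path = env.get("AWS_SHARED_CREDENTIALS_FILE") or os.path.expanduser(
            "~/.aws/credentials")
        try:
            with open(path) as f:
                credentials_text = f.read()
        except OSError:
            return None

    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    # Assumption: the schema profile wins over AWS_PROFILE.
    name = profile or env.get("AWS_PROFILE") or "default"
    if not parser.has_section(name):
        return None
    section = parser[name]
    if not section.get("aws_access_key_id"):
        return None
    return {
        "access_key": section.get("aws_access_key_id"),
        "secret_key": section.get("aws_secret_access_key", ""),
        "session_token": section.get("aws_session_token", ""),
    }
```

Returning ``None`` when nothing is found corresponds to the anonymous case,
which only works for buckets that allow public access.
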


.. envvar:: AWS_ACCESS_KEY_ID

Specifies an AWS access key associated with an IAM account.
See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

.. envvar:: AWS_SECRET_ACCESS_KEY

Specifies the secret key associated with the access key.
This is essentially the "password" for the access key.
See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

.. envvar:: AWS_SESSION_TOKEN

Specifies the session token value that is required if you are using temporary
security credentials that you retrieved directly from AWS STS operations.
See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

.. envvar:: AWS_SHARED_CREDENTIALS_FILE

Specifies the location of the file that the AWS CLI uses to store access keys.
The default path is :file:`~/.aws/credentials`.
See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

.. envvar:: AWS_PROFILE

Specifies the name of the AWS CLI profile with the credentials and options to
use. This can be the name of a profile stored in a credentials or config file,
or the value ``default`` to use the default profile.

If defined, this environment variable overrides the behavior of using the
profile named ``[default]`` in the credentials file.
See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>


161 changes: 161 additions & 0 deletions tensorstore/kvstore/s3/schema.yml
@@ -0,0 +1,161 @@
$schema: http://json-schema.org/draft-07/schema#
$id: kvstore/s3
allOf:
- $ref: KvStore
- type: object
properties:
driver:
const: s3
bucket:
type: string
title: AWS S3 Storage bucket.
requester_pays:
type: boolean
title: Permit requester-pays requests.
description: |
This option must be enabled in order for any operations to succeed if the bucket has
Requester Pays enabled and the supplied credentials are not for an owner of the bucket.
default: false
aws_region:
type: string
title: AWS region identifier to use in signatures.
description: |
If `.endpoint` is not specified, the region of the `.bucket` is determined automatically.
endpoint:
type: string
title: S3 server endpoint to use in place of the public Amazon S3 endpoints.
description: |
Must be an http or https URL.
examples:
- "http://localhost:1234"
host:
type: string
title: Override HTTP host header to send in requests.
description: |
May only be specified in conjunction with `.endpoint`, to send a different host than
specified in `.endpoint`. This may be useful for testing with
`localstack <https://localstack.cloud/>`__.
examples:
- "mybucket.s3.af-south-1.localstack.localhost.com"
profile:
type: string
description: |
The name of the profile to use from the AWS credentials file.
s3_request_concurrency:
$ref: ContextResource
description: |-
Specifies or references a previously defined
`Context.s3_request_concurrency`.
s3_request_retries:
$ref: ContextResource
description: |-
Specifies or references a previously defined
`Context.s3_request_retries`.
experimental_s3_rate_limiter:
$ref: ContextResource
description: |-
Specifies or references a previously defined
`Context.experimental_s3_rate_limiter`.
data_copy_concurrency:
$ref: ContextResource
description: |-
Specifies or references a previously defined
`Context.data_copy_concurrency`. It is normally more convenient to
specify a default `~Context.data_copy_concurrency` in the `.context`.
default: data_copy_concurrency
required:
- bucket
definitions:
s3_request_concurrency:
$id: Context.s3_request_concurrency
description: |-
Specifies a limit on the number of concurrent requests to S3.
type: object
properties:
limit:
oneOf:
- type: integer
minimum: 1
- const: "shared"
description: |-
The maximum number of concurrent requests. If the special value of
``"shared"`` is specified, a shared global limit is used. That limit is
specified by the environment variable
:envvar:`TENSORSTORE_S3_REQUEST_CONCURRENCY` and defaults to 32.
default: "shared"
s3_request_retries:
$id: Context.s3_request_retries
description: |
Specifies retry parameters for handling transient network errors.
An exponential delay is added between consecutive retry attempts. The
default values are appropriate for S3.
type: object
properties:
max_retries:
type: integer
minimum: 1
description: |-
Maximum number of attempts in the case of transient errors.
default: 32
initial_delay:
type: string
description: |-
Initial backoff delay for transient errors.
default: "1s"
max_delay:
type: string
description: |-
Maximum backoff delay for transient errors.
default: "32s"
experimental_s3_rate_limiter:
$id: Context.experimental_s3_rate_limiter
description: |-
Experimental rate limiter configuration for S3 reads and writes.
type: object
properties:
read_rate:
type: number
description: |-
The maximum rate of read and/or list calls issued per second.
write_rate:
type: number
description: |-
The maximum rate of write and/or delete calls issued per second.
doubling_time:
type: string
description:
The time interval over which the initial rates scale to 2x. Whether this
setting is useful depends on details of the storage bucket.
default: "0"
url:
$id: KvStoreUrl/s3
allOf:
- $ref: KvStoreUrl
- type: string
title: |
:literal:`s3://` KvStore URL scheme
description: |
AWS S3 key-value stores may be specified using the
:file:`s3://{bucket}/{path}` URL syntax, as supported by `aws s3
<https://docs.aws.amazon.com/cli/latest/reference/s3/>`__.
.. admonition:: Examples
:class: example
.. list-table::
:header-rows: 1
:widths: auto
* - URL representation
- JSON representation
* - ``"s3://my-bucket"``
- .. code-block:: json
{"driver": "s3",
"bucket": "my-bucket"}
* - ``"s3://my-bucket/path/to/dataset"``
- .. code-block:: json
{"driver": "s3",
"bucket": "my-bucket",
"path": "path/to/dataset"}
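
The correspondence between the URL and JSON representations in the examples
above can be sketched in Python. This is a simplified illustration, not
TensorStore's parser: the function name ``s3_url_to_spec`` is hypothetical, and
percent-decoding, query strings, and fragments are deliberately not handled.

```python
from urllib.parse import urlsplit


def s3_url_to_spec(url):
    """Convert an s3://bucket/path URL into a JSON-compatible spec.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URL with a bucket: {url!r}")

    spec = {"driver": "s3", "bucket": parts.netloc}
    # The leading "/" separates bucket from path and is not part of the key.
    path = parts.path.lstrip("/")
    if path:
        spec["path"] = path
    return spec


if __name__ == "__main__":
    for url in ("s3://my-bucket", "s3://my-bucket/path/to/dataset"):
        print(url, "->", s3_url_to_spec(url))
```

The ``path`` member is omitted when the URL names only a bucket, matching the
first row of the table.
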
