Add compression to object store #264

Merged 6 commits on Feb 5, 2024. Changes shown from 1 commit.

`adr/ADR-20.md`: 85 changes (46 additions, 39 deletions)
@@ -12,6 +12,7 @@
|Revision|Date|Author|Info|
|--------|----|------|----|
|1 |2021-11-03|@scottf|Initial design|
|2 |2023-06-14|@Jarema|Add metadata|
|3 |2024-02-05|@Jarema|Add Compression|

## Context

@@ -21,16 +22,17 @@ This document describes a design of a JetStream backed object store. This ADR is

We intend to hit a basic initial feature set as below, with some future-facing goals as indicated:

Current feature list:

- Represent an object store
- Store a large quantity of related bytes in chunks as a single object
- Retrieve all the bytes from a single object
- Store metadata regarding each object
- Store multiple objects in a single store
- Ability to specify chunk size
- Ability to delete an object
- Ability to understand the state of the object store
- Store compression (via stream compression)
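Storing bytes in chunks and retrieving them as a single object can be sketched in Go as below. This is a minimal in-memory illustration under stated assumptions: `chunkObject` and `reassemble` are hypothetical helper names, and the idea that each chunk is then stored as its own stream message is assumed here rather than shown in this diff.

```go
package main

import "bytes"

// chunkObject is a hypothetical helper that splits an object's bytes into
// fixed-size chunks. Assumption: each chunk would then be stored as a
// separate message. The last chunk may be shorter than chunkSize.
func chunkObject(data []byte, chunkSize int) [][]byte {
	if chunkSize <= 0 {
		panic("chunkSize must be positive")
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

// reassemble concatenates the chunks back into the original object bytes.
func reassemble(chunks [][]byte) []byte {
	return bytes.Join(chunks, nil)
}
```

Splitting the 10 bytes `0123456789` with a chunk size of 4 gives three chunks (`0123`, `4567`, `89`); the chunk size is the knob behind "Ability to specify chunk size".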

Possible future features:

@@ -39,10 +41,10 @@ Possible future features
- Archiving (tiered storage)
- Searching/Indexing (tagging)
- Versioning / Revisions
- Overriding digest algorithm
- Capturing Content-Type (mime type)
- Per-chunk Content-Encoding (e.g. gzip)
- Read an individual chunk

## Basic Design

@@ -57,7 +59,7 @@ Possible future features
Protocol Naming Conventions are fully defined in [ADR-6](ADR-6.md).

### Object Store
The object store name or bucket name (`bucket`) will be used to formulate a stream name
and is specified as: `restricted-term` (1 or more of `A-Z, a-z, 0-9, dash, underscore`)
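A client could check a bucket name against the `restricted-term` rule before creating anything. The sketch below assumes a regular expression is an acceptable encoding of the rule; `validBucketName` is a hypothetical helper, and [ADR-6](ADR-6.md) remains the authoritative definition.

```go
package main

import "regexp"

// restrictedTerm encodes `restricted-term` as quoted above:
// one or more of A-Z, a-z, 0-9, dash, underscore.
var restrictedTerm = regexp.MustCompile(`^[A-Za-z0-9_-]+$`)

// validBucketName reports whether name may be used as an object store bucket name.
func validBucketName(name string) bool {
	return restrictedTerm.MatchString(name)
}
```

With this check, `my-bucket_01` is accepted, while an empty name, `my.bucket`, and `my bucket` are rejected.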

### Object Id
@@ -71,9 +73,9 @@ Currently `SHA-256` is the only supported digest. Please use the uppercase form
when specifying the digest as in `SHA-256=IdgP4UYMGt47rgecOqFoLrd24AXukHf5-SVzqQ5Psg8=`.
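The digest string could be produced as below, hashing chunk by chunk the way a client would while streaming an object's chunks. Assumptions: `objectDigest` is a hypothetical name, and the value is URL-safe base64 with padding, inferred from the `-` and trailing `=` in the example digest above.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
)

// objectDigest returns `SHA-256=<value>` for the concatenation of chunks.
// Assumption: <value> is URL-safe base64 with padding.
func objectDigest(chunks [][]byte) string {
	h := sha256.New()
	for _, c := range chunks {
		h.Write(c) // hash.Hash.Write never returns an error
	}
	return "SHA-256=" + base64.URLEncoding.EncodeToString(h.Sum(nil))
}
```

Because the hash is incremental, the result depends only on the bytes, not on how they were split into chunks.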

### Modified Time
Modified time is never stored.
* When putting an object or link into the store, the client should populate the ModTime with the current UTC time before returning it to the user.
* When getting an object, or getting the info of an object or link, the client should populate the ModTime with the message time from the server.

### Default Settings

@@ -98,6 +100,7 @@ type ObjectStoreConfig struct {
```go
	Storage     StorageType // stream storage_type
	Replicas    int         // stream replicas
	Placement   Placement   // stream placement
	Compression bool        // stream compression
}
```

@@ -132,7 +135,8 @@ type ObjectStoreConfig struct {
```json
  "placement": {
    "cluster": "clstr",
    "tags": ["tag1", "tag2"]
  },
  "compression": true
}
```
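Because compression is delegated to the stream, a client only has to translate the boolean into the stream's compression setting when it creates the bucket. This sketch assumes the stream accepts `s2` for compressed and `none` for uncompressed storage; `objectStoreConfig` and `streamCompression` are hypothetical names.

```go
package main

// objectStoreConfig is a hypothetical mirror of ObjectStoreConfig, trimmed
// to the fields used here.
type objectStoreConfig struct {
	Bucket      string
	Compression bool
}

// streamCompression maps the object store flag onto the stream setting.
// Assumption: "s2" enables compression and "none" disables it.
func streamCompression(cfg objectStoreConfig) string {
	if cfg.Compression {
		return "s2"
	}
	return "none"
}
```

Keeping the object store option a plain boolean leaves the choice of algorithm to the stream layer.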

@@ -144,7 +148,7 @@ type ObjectStoreConfig struct {
```go
type ObjectLink struct {
	// Bucket is the name of the other object store.
	Bucket string `json:"bucket"`

	// Name can be used to link to a single object.
	// If empty, this is a link to the whole store, like a directory.
	Name string `json:"name,omitempty"`
@@ -160,7 +164,7 @@ type ObjectMetaOptions struct {
}
```

### ObjectMeta

Object Meta is high level information about an object.

@@ -176,31 +180,31 @@ type ObjectMeta struct {
```go
}
```

### ObjectInfo

Object Info is meta plus instance information.
The fields in ObjectMeta are serialized inline as if they were
direct fields of ObjectInfo.

```go
type ObjectInfo struct {
	ObjectMeta

	Bucket string `json:"bucket"`

	NUID string `json:"nuid"`

	// the total object size in bytes
	Size uint64 `json:"size"`

	ModTime time.Time `json:"mtime"`

	// the total number of chunks
	Chunks uint32 `json:"chunks"`

	// as in http, <digest-algorithm>=<digest-value>
	Digest string `json:"digest,omitempty"`

	Deleted bool `json:"deleted,omitempty"`
}
```

Review thread on the blank line after `ObjectMeta`:

Contributor: not sure these spaces are desired, also below and above

Member Author: Those spaces were removed in this PR, not added (by the markdown linter). I have no idea why you would want them here, but the docs render the same.

Contributor: oh god, need more coffee, sorry, thought you were adding them lol.

Contributor: Right, I see what's happening. You're removing spaces, fine, but I think the entire line should be removed.

Member Author: Yeah, you're right, removed them entirely. It was just the linter cleaning things up. I didn't want to change more than required for this PR, as I will clean up the ADR as a follow-up :).
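The inline serialization of `ObjectMeta` fields falls out of Go struct embedding with `encoding/json`, as this sketch shows. The structs are trimmed-down, hypothetical versions that keep only a few fields; `encodeInfo` is an illustrative helper.

```go
package main

import "encoding/json"

// Hypothetical, trimmed versions of ObjectMeta and ObjectInfo.
type ObjectMeta struct {
	Name        string `json:"name"`
	Description string `json:"description,omitempty"`
}

type ObjectInfo struct {
	ObjectMeta // embedded: its fields are promoted into ObjectInfo's JSON
	Bucket     string `json:"bucket"`
	Size       uint64 `json:"size"`
}

// encodeInfo marshals an ObjectInfo to a JSON string.
func encodeInfo(info ObjectInfo) (string, error) {
	b, err := json.Marshal(info)
	return string(b), err
}
```

Encoding `ObjectInfo{ObjectMeta: ObjectMeta{Name: "report.pdf"}, Bucket: "docs", Size: 42}` yields `{"name":"report.pdf","bucket":"docs","size":42}`, with no nested `ObjectMeta` key.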
@@ -248,37 +252,40 @@ The status of an object
```go
type ObjectStoreStatus interface {
	// Bucket is the name of the bucket
	Bucket() string

	// Description is the description supplied when creating the bucket
	Description() string

	// Bucket-level metadata
	Metadata() map[string]string

	// TTL indicates how long objects are kept in the bucket
	TTL() time.Duration

	// Storage indicates the underlying JetStream storage technology used to store data
	Storage() StorageType

	// Replicas indicates how many storage replicas are kept for the data in the bucket
	Replicas() int

	// Sealed indicates the stream is sealed and cannot be modified in any way
	Sealed() bool

	// Size is the combined size of all data in the bucket including metadata, in bytes
	Size() uint64

	// BackingStore provides details about the underlying storage.
	// Currently the only supported value is `JetStream`
	BackingStore() string

	// IsCompressed indicates if the data is compressed on disk
	IsCompressed() bool
}
```
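One way a client could back `IsCompressed` is by reading the compression setting of the bucket's underlying stream. In this sketch `streamSnapshot` and `objectStoreStatus` are hypothetical types, only two of the interface methods are implemented, and the `none`/`s2` values are an assumption about the stream setting.

```go
package main

// streamSnapshot is a hypothetical view of the backing stream's settings.
type streamSnapshot struct {
	Compression string // assumed values: "none" or "s2"
}

// objectStoreStatus implements part of ObjectStoreStatus on top of it.
type objectStoreStatus struct {
	bucket string
	stream streamSnapshot
}

// Bucket is the name of the bucket.
func (s objectStoreStatus) Bucket() string { return s.bucket }

// IsCompressed reports whether the backing stream compresses data on disk.
func (s objectStoreStatus) IsCompressed() bool {
	return s.stream.Compression != "" && s.stream.Compression != "none"
}
```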

## Functional Interfaces

### ObjectStoreManager

Object Store Manager creates, loads and deletes Object Stores.

@@ -295,7 +302,7 @@ CreateObjectStore(cfg ObjectStoreConfig) -> ObjectStore
```
DeleteObjectStore(bucket string)
```

### ObjectStore

Storing large objects efficiently. APIs are required unless noted as "Optional/Convenience".

@@ -320,7 +327,7 @@ PutFile(file [string/file reference]) -> ObjectInfo
_Notes_

On convenience methods accepting file information only, consider that the reference could have
operating-system-specific path information that is not transferable. One solution would be to only
use the actual file name as the object name and discard any path information.
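The suggestion to keep only the file name can be sketched with Go's `path/filepath` package. `objectNameFromPath` is a hypothetical helper; note that `filepath.Base` splits on the separator of the operating system the client runs on, so a Windows-style path handled on Linux would not be split.

```go
package main

import (
	"errors"
	"path/filepath"
)

// objectNameFromPath keeps only the last element of a file path as the
// object name, discarding directory information.
func objectNameFromPath(path string) (string, error) {
	name := filepath.Base(path)
	// filepath.Base returns "." for an empty path and a lone separator for
	// a path made only of separators; neither is a usable object name.
	if name == "." || name == string(filepath.Separator) {
		return "", errors.New("path does not contain a file name")
	}
	return name, nil
}
```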

**Get**
@@ -347,8 +354,8 @@ GetFile(name string, file string)

**GetInfo**

GetInfo will retrieve the current information for the object.
* Do not return info for deleted objects, except with optional convenience methods.

```
GetInfo(name string) -> ObjectInfo
```

@@ -424,11 +431,11 @@ Status() -> ObjectStoreStatus

### ObjectStore Links

Whether links are necessary is currently under discussion.
Here is the required API as proposed.
Please note that in this version of the API, it is possible that
`obj ObjectInfo` or `bucket ObjectStore` could be stale, meaning their state
has changed since they were read, e.g. the object was deleted after its info was read.

**AddLink**
