A library for collecting and preserving Google Team Drive history.
This library is a work in progress and is subject to breaking changes.
// Prepare a context
ctx := context.Background()
// Prepare a Google Drive API client
driveService, _ := drive.New(oauthHttpClient)
// Determine the ID of the team drive to be used
teamDriveID := resource.ID("TEAMDRIVEID")
// Prepare an in-memory repository
repo := memrepo.New()
// Prepare a collector that will query the drive service
collector := driveapicollector.New(driveService, teamDriveID)
// Create a stream
stream := drivestream.New(repo, teamDriveID, drivestream.WithLogging(os.Stdout))
// Initialize or update the repository
stream.Update(ctx, collector) // Will perform first-run initialization and resume incomplete updates automatically
// Update the database every minute
timer := time.NewTimer(time.Minute)
for {
select {
case <-timer.C:
stream.Update(ctx, collector)
case <-ctx.Done():
return
}
}
The drivestream
library is intended to be accessed through a Stream
type, which wraps a Repository
implementation that the caller provides.
A stream provides two capabilities:
- Collection and persistence of data through calls to
Update()
- Access to collected data through a
Cursor
(this is not yet implemented)
Data collected from a Team Drive is preserved in a repository. The repository implementation is pluggable, but drivestream was designed with key/value storage in mind.
The drivestream project supplies the following implementations of Repository
:
memrepo
: An in-memory repository useful for testing.boltrepo
: A repository backed by a bolt database.
TODO: Add support for badger
.
Data is brought into a drivestream repository through a series of collections. Each collection is assigned a monotonically increasing sequence number. The initial collection is assigned sequence number zero.
During collection a data source is queried to retrieve the following metadata:
- Information about the team drive, such as its name
- The complete set of files within the team drive
- Subsequent changes to the team drive or its contents
A collection is designed to be resumable in case of interruption or service failure. Data is progressively added to a collection as a series of data pages. A collection can be resumed from the last page that was successfully written.
Collection relies upon a Collector
implementation, which is responsible
for querying a particular kind of data source.
The drivestream project supplies the following implementations of Collector
:
driveapicollector
: A collector that queries Google Drive API version 3
Once collected, data is processed and reformulated into a series of commits.
Once finished with its commit processing, a collection moves to moves to a
Finalized
state. Collections in a finalized state cannot be modified.
A commit represents a consistent view of the entire team drive at a point in time. Like collections, each commit is assigned a monotonically increasing sequence number. The initial commit is assigned sequence number zero.
A collection is processed into one or more commits. The first commit includes all files present in the team drive when the first collection was performed. Subsequent commits typically contain only a single change.
Commits are constructed in two phases:
- Version Processing
- Tree Processing
Each phase is resumable in case of interruption or service failure.
Once a commit has completed it moves to a Finalized
state. Commits in a
finalized state cannot be modified.
TODO: Design this process.
TODO: Design this process.
A work-in-progress key/value database schema:
Key Value
---------------------------------------------------------------- ----------------
/schema "{SCHEMA_VERSION}"
/drive/{DRIVE_ID}/collection/{COLLECTION_NUM}/data "{JSON(COLLECTION_DATA)}"
/drive/{DRIVE_ID}/collection/{COLLECTION_NUM}/state/{STATE_NUM} "{JSON(COLLECTION_STATE)}"
/drive/{DRIVE_ID}/collection/{COLLECTION_NUM}/page/{PAGE_NUM} "{JSON(PAGE_DATA)}"
/drive/{DRIVE_ID}/commit/{COMMIT_NUM}/data "{JSON(COMMIT_DATA)}"
/drive/{DRIVE_ID}/commit/{COMMIT_NUM}/state/{STATE_NUM} "{JSON(COMMIT_STATE)}"
/drive/{DRIVE_ID}/commit/{COMMIT_NUM}/file/{FILE_ID} "{VERSION}"
/drive/{DRIVE_ID}/commit/{COMMIT_NUM}/tree/{PARENT_ID}/{CHILD_ID} "{ACTION}"
/drive/{DRIVE_ID}/time/collection/{TIME} "{COLLECTION_NUM}"
/drive/{DRIVE_ID}/time/commit/{TIME} "{COMMIT_NUM}"
/drive/{DRIVE_ID}/version/{VERSION} "{JSON(DRIVE_DATA)}"
/drive/{DRIVE_ID}/view/{COMMIT_NUM} "{VERSION}"
/drive/{DRIVE_ID}/tree/{COMMIT_NUM} "{HASH(FILE_LIST|CHUNK_LIST)}"
/file/{FILE_ID}/version/{VERSION} "{JSON(FILE_DATA)}"
/file/{FILE_ID}/view/{DRIVE_ID}/{COMMIT_NUM} "{VERSION}"
/file/{FILE_ID}/tree/{DRIVE_ID}/{COMMIT_NUM} "{HASH(FILE_LIST|CHUNK_LIST)}"
/file/{FILE_ID}/time/{DRIVE_ID}/{TIME} "{COMMIT_NUM}"
/tree/hash/{HASH(FILE_LIST|CHUNK_LIST)} "{BINARY(FILE_LIST|CHUNK_LIST)}"