GoogleCloudPlatform/storage-sdrs

Google Cloud Storage - Supplementary Data Retention Service (SDRS)

SDRS allows an organization to manage the Time to Live (TTL) of objects in Google Cloud Storage (GCS)
according to retention policies based on the creation time encoded in partition prefixes.

For example, the official age of an object could be encoded in its name as follows:

bucketX/datasetY/{yyyy}/{mm}/{dd}/{hh}/log.txt

In this example, the object's age is defined by the date and hour encoded in the object name, not by
the creation time in the GCS object metadata. An organization can define a TTL for each dataset and
thereby reliably enforce object retention based on the encoded creation time.
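
As an illustration only, the idea can be sketched in Python. The regular expression, the function names, and the assumption that the encoded time is in UTC are hypothetical and are not taken from the SDRS code base.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for names such as bucketX/datasetY/2019/03/07/14/log.txt
PREFIX_RE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/(\d{2})/[^/]+$")


def encoded_creation_time(object_path):
    """Return the creation time encoded in the partition prefix (UTC assumed)."""
    match = PREFIX_RE.search(object_path)
    if match is None:
        raise ValueError("no {yyyy}/{mm}/{dd}/{hh} prefix in " + object_path)
    year, month, day, hour = (int(part) for part in match.groups())
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


def is_expired(object_path, ttl_days, now):
    """True when the encoded creation time is older than the TTL in days."""
    return now - encoded_creation_time(object_path) > timedelta(days=ttl_days)
```

Note that the GCS metadata creation time is never consulted here; only the object name decides the age.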

At the most fundamental level, SDRS enforces object retention by applying policy rules that define the time-to-live (TTL) for datasets in GCS buckets. Note: for scenarios where GCS object retention management can rely solely on the object creation time rather than an encoded prefix, see Object Lifecycle Management.

Releases

See the latest release and other releases for details.

High Level Architecture

SDRS is an open-source GCP GitHub project.

SDRS consists of two main parts:

  1. A server-side service that exposes functionality through a RESTful API; see Server Components.
  2. A sample client that demonstrates interaction with the server-side services; see Client Components.

SDRS is written primarily in Java 8 and Python 3.
Maven is used as the Java build management tool.
Deployment Manager is used as the DevOps Cloud Orchestration tool.

Key GCP Technologies Utilized in SDRS

Managed Instance Groups (MIGs)

Cloud Functions

Cloud Pub/Sub

Cloud Endpoints

Google Stackdriver

Cloud SQL

Cloud Scheduler

Storage Transfer Service (STS)

Cloud Deployment Manager

Getting Started with SDRS

To get started, clone the project from Google Cloud Platform's GitHub site here.
The full source code for both the server and a sample client is included in the project, along with build and deployment instructions.

Local Development/Build Steps

The instructions in this section describe how to quickly get started and deploy SDRS to a DEV GCP environment.

  1. Ensure your local environment compiles and builds using Maven:
    mvn clean install package

  2. Create a Cloud SQL instance.

  3. Run the MySQL DDL or mods to create or update the database schema in the Cloud SQL instance created above. Note: set log_bin_trust_function_creators to true to avoid an error you may otherwise encounter when creating the db trigger.

  4. Create the Pub/Sub infrastructure that SDRS publishes messages to.

  5. Build the SDRS Docker image.

  6. Deploy the SDRS Docker image you just built to a Compute Engine VM.

  7. Run the Docker image on the VM:

    • SSH to the VM instance.
    • Download your service account credential.json, which is used by SDRS.
    • Create your env.txt, which sets application settings (e.g. the database connection).
    • Stop any running container: sudo docker container stop [your_container_id]
    • Run the following to start SDRS:

    docker run --detach -v [credential_json_dir_on_host]:[docker_mount] --name=sdrs --env-file=[your_env_txt] --publish=8080:8080 [your_docker_image]

Note: the application is configured by two key files in the src/main/resources directory:

  1. ApplicationConfiguration file
  2. Hibernate Configuration file

The sample appConfig.xml file contains example settings that can be used for a development deployment. In general, values that are known at compile/build/package time can be set directly in the applicationConfig file. For more details on these settings, see Configurable Values.

However, values that must be injected after the build (during deployment) are represented by tokens that are replaced with the values of environment variables.
See this sample environment file.
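
To make the idea of token replacement concrete, here is a minimal Python sketch. It assumes a `${NAME}` token syntax, and the variable names `DB_HOST` and `DB_NAME` are invented for illustration; the actual token format and variable names used by SDRS are defined in the sample environment file and may differ.

```python
import os
import re

# Assumed token syntax: ${NAME}, with upper-case letters, digits and underscores.
TOKEN_RE = re.compile(r"\$\{([A-Z0-9_]+)\}")


def replace_tokens(text, env=None):
    """Substitute ${NAME} tokens with values from env; fail on missing names."""
    values = os.environ if env is None else env

    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("missing environment variable: " + name)
        return values[name]

    return TOKEN_RE.sub(lookup, text)


# Hypothetical usage with made-up variable names:
# replace_tokens("jdbc:mysql://${DB_HOST}:3306/${DB_NAME}",
#                {"DB_HOST": "10.0.0.5", "DB_NAME": "sdrs"})
```

Failing loudly on a missing variable is a design choice of this sketch; it surfaces an incomplete environment file at startup rather than at the first database call.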

Enterprise Deployment Steps to Google Cloud Platform (GCP)

The instructions in this section serve as an example for deploying SDRS to a full production-like GCP environment. For details, see the main DevOps Deployment README.
In general, deploying SDRS to a production-like environment should occur in the following order:

Deploying the Server Side Components

  1. Cloud SQL Infrastructure Deployment: see the Cloud SQL Deployment README.
  2. MySQL DDL or mods execution: see the MySQL Schema and mods.
  3. Pub/Sub Infrastructure Deployment: see the Server Pub/Sub Deployment README.
  4. MIG Deployment: see the MIG Deployment README.

Deploying the Sample Client Side Components

  1. Cloud Function Deployment (includes client-side Pub/Sub triggers): see the Client Cloud Functions README.
  2. Cloud Scheduler crontab creation by way of the GCP Console UI: see the Cloud Scheduler README.

Server Components

Configuration Service Details

The Configuration Service is a server-side component that exposes a RESTful API for CRUD operations on retention policies.

The Configuration Service is the key touchpoint to SDRS when provisioning or updating retention policies. For more details, see the Configuration Service README.

Execution Service Details

The Execution Service is a server-side component that exposes a RESTful API for managing the execution of retention policy enforcement (i.e. the deletion of objects). The Execution Service can enforce object retention for three specific use cases:

  1. Retention Policies - dataset specific policies provisioned by way of the configuration service
  2. Default/Global Policy - a global dataset rule that serves as a catch-all for datasets not already covered by specific retention policies
  3. On-demand Delete Markers - ad hoc requests to delete specific datasets immediately

For more details, see the Execution Service README.
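
The relationship between the three use cases can be sketched as a simple lookup. This is an illustrative Python sketch, not the service's actual logic: the `Decision` type and `resolve` function are hypothetical, and giving delete markers precedence over dataset policies is an assumption. Only the fallback from dataset-specific policies to the default policy is stated above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Decision:
    source: str               # which rule applied
    ttl_days: Optional[int]   # None means no rule covers the dataset


def resolve(dataset: str,
            delete_markers: Set[str],
            dataset_policies: Dict[str, int],
            default_ttl_days: Optional[int]) -> Decision:
    """Pick the rule that governs a dataset (precedence order is assumed)."""
    if dataset in delete_markers:
        # On-demand delete: modeled here as a TTL of zero days.
        return Decision("delete-marker", 0)
    if dataset in dataset_policies:
        return Decision("dataset-policy", dataset_policies[dataset])
    if default_ttl_days is not None:
        # Catch-all for datasets without a specific policy.
        return Decision("default-policy", default_ttl_days)
    return Decision("none", None)
```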

Validation Service Details

The Validation Service is a server-side component that exposes a RESTful API for running jobs that validate the completion of previously requested enforcement processes. For more details, see the Validation Service README.

Notification Service Details

The Notification Service is a server-side component that broadcasts notifications of SDRS events to interested parties by way of Pub/Sub.

Client Components

Sample Cloud Functions Deployment & Details

The Cloud Functions serve as an example client demonstrating how to interact with the server-side SDRS RESTful API. The code base includes Cloud Functions that invoke the Configuration, Execution, Validation, and Notification services. For more details, see the Client Cloud Functions README.

Sample Cloud Scheduler Details

SDRS has several functional areas that can be scheduled to run on a recurring basis. The scheduler strategy in this sample uses Cloud Scheduler as a decoupled, externally managed crontab service. Cloud Scheduler publishes to a Pub/Sub topic, which triggers a Cloud Function that calls the SDRS Execution and Validation functionality.

For more details, see the Cloud Scheduler README.
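
The last hop of that chain, a Pub/Sub-triggered Cloud Function, might look roughly like the Python sketch below. The `(event, context)` signature and the base64-encoded `event["data"]` field follow the Pub/Sub background function convention; the base URL, the URL paths, and the `action` and `body` message fields are hypothetical placeholders, not the real SDRS API. See the Client Cloud Functions README for the actual functions.

```python
import base64
import json

# Hypothetical base URL; a real deployment would supply its own endpoint.
SDRS_BASE_URL = "https://sdrs.example.com"


def build_sdrs_request(event):
    """Turn a Pub/Sub event into a (url, body) pair for an SDRS call."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    action = payload["action"]  # hypothetical field: "execution" or "validation"
    if action not in ("execution", "validation"):
        raise ValueError("unsupported action: " + action)
    # The path layout below is an assumption made for this sketch.
    return SDRS_BASE_URL + "/" + action, json.dumps(payload.get("body", {}))


def handle_scheduled_message(event, context):
    """Entry point for a Pub/Sub-triggered background Cloud Function."""
    url, body = build_sdrs_request(event)
    # A real function would POST `body` to `url`; this sketch only logs it.
    print("would POST", body, "to", url)
    return url
```

Keeping the request construction in a separate function makes it possible to test the message handling locally without deploying the function or reaching a server.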

Contributing

See the contributing instructions to get started.

License

All solutions within this repository are provided under the Apache 2.0 license. Please see the LICENSE file for more detailed terms and conditions.
