Feature / AWS basic storage (#213)
* Simplify Gradle functions for managing build-time plugins

* Move core services onto new plugin build system

* Turn off default plugins flag

* Move deploy-metadb and -lib-db onto new plugin handling

* Set plugin dependencies (pre work on aws-storage)

* Switch to AWS SDK V2

* Stub AWS storage plugin

* Add AWS storage plugin to build config, disable AWS config for now (on AWS V1 API)

* Use data context instead of exec context for file reader / writer method in IFileStorage

* Stub IFileStorage methods in S3 object storage

* Add integration testing (stub) in CI for S3 storage

* Add integration tests for data round trip

* Set up integration testing workflow for S3 testing

* Do not include logging in storage integration config

* Use different H2 DB path to let deploy-metadb work in integration

* Let platform test setup create the metadb schema

* Let platform test setup create the metadb schema

* Integration tests for tenant separation test

* Integration tests for file round trip and operations

* Integration tests for data operations

* Rename integration test config files for int-metadb tests

* Rename stability test

* Expose storage tests as a test suite that can be used across implementations

* Set up storage operations test for AWS storage

* Move storage errors class to the main common storage package

* Move exception class mapping for local storage errors into a specialized class

* Run storage test suite for S3 storage impl

* Move storage operation constants into IFileStorage (share across impls)

* Move generic handling of known exceptions in storage error handler

* Fix a few storage ops tests to allow running with one storage instance for the whole suite

* Allow more settings to be defined in PlatformTest

* Trace logging in core codecs

* Explicit start and stop for storage, pass in service ELG

* Change default number of service threads in data service

* Add Netty NIO options lib to AWS storage plugin

* Update data plugin test suite for S3

* S3 storage plugin rough work

* Implement S3 storage in the runtime
(this is temporary, we should switch to arrow file system + fsspec)

* Fix handling of leading slash in S3 object storage root path

* Update CI configuration (S3 testing not available in CI yet)

* Packaging for AWS plugin

* Compliance fixes

* Compliance fixes

* Compliance fixes

* Filter out plugin JARs that are already included as part of the TRAC platform

* Allow MIT-0 license

* Quick documentation on AWS S3 storage

* Handle plugin load failures

* Supply unit test config for slow unit (integration) tests

* Handle plugin load failures
martin-traverse committed Dec 5, 2022
1 parent 84dc604 commit 05bd5d3
Showing 71 changed files with 3,263 additions and 599 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
104 changes: 104 additions & 0 deletions .github/config/int-storage-s3.yaml
@@ -0,0 +1,104 @@
# Copyright 2022 Accenture Global Solutions Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

config:
  secret.type: PKCS12
  secret.url: secrets.p12


platformInfo:
  environment: TEST_ENVIRONMENT
  production: false
  deploymentInfo:
    region: UK


authentication:
  jwtIssuer: http://localhost/
  jwtExpiry: 7200


# Stick with H2 database for storage integration testing
metadata:
  format: PROTO
  database:
    protocol: JDBC
    properties:
      dialect: H2
      jdbcUrl: ${TRAC_DIR}/trac.meta
      h2.user: trac
      h2.pass: trac
      h2.schema: public
      pool.size: 10
      pool.overflow: 5


storage:

  defaultBucket: STORAGE_INTEGRATION
  defaultFormat: ARROW_FILE

  buckets:

    STORAGE_INTEGRATION:
      protocol: S3
      properties:
        region: ${TRAC_AWS_REGION}
        bucket: ${TRAC_AWS_BUCKET}
        path: int-storage-s3
        accessKeyId: ${TRAC_AWS_ACCESS_KEY_ID}
        secretAccessKey: ${TRAC_AWS_SECRET_ACCESS_KEY}


repositories:
  UNIT_TEST_REPO:
    protocol: git
    properties:
      repoUrl: ${CURRENT_GIT_ORIGIN}


executor:
  protocol: LOCAL
  properties:
    venvPath: ${TRAC_EXEC_DIR}/venv


instances:

  meta:
    - scheme: http
      host: localhost
      port: 8081

  data:
    - scheme: http
      host: localhost
      port: 8082

  orch:
    - scheme: http
      host: localhost
      port: 8083


services:

  meta:
    port: 8081

  data:
    port: 8082

  orch:
    port: 8083
76 changes: 72 additions & 4 deletions .github/workflows/integration.yml
@@ -133,7 +133,7 @@ jobs:
DB_PORT: 3306,
DB_OPTIONS: '--health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=3',
BUILD_sql_mysql: true,
- TRAC_CONFIG_FILE: '.github/config/trac-int-mysql.yaml',
+ TRAC_CONFIG_FILE: '.github/config/int-metadb-mysql.yaml',
TRAC_SECRET_KEY: wDeq3x-NjaLL7,
MYSQL_DATABASE: trac,
MYSQL_USER: trac_admin,
@@ -145,7 +145,7 @@ jobs:
DB_PORT: 3306,
DB_OPTIONS: '--health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=3',
BUILD_sql_mariadb: true,
- TRAC_CONFIG_FILE: '.github/config/trac-int-mariadb.yaml',
+ TRAC_CONFIG_FILE: '.github/config/int-metadb-mariadb.yaml',
TRAC_SECRET_KEY: uYhnKwq8+esS,
MYSQL_DATABASE: trac,
MYSQL_USER: trac_admin,
@@ -157,7 +157,7 @@ jobs:
DB_PORT: 5432,
DB_OPTIONS: '--health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5',
BUILD_sql_postgresql: true,
- TRAC_CONFIG_FILE: '.github/config/trac-int-postgresql.yaml',
+ TRAC_CONFIG_FILE: '.github/config/int-metadb-postgresql.yaml',
TRAC_SECRET_KEY: hjXks83bX=wxMr,
POSTGRES_DB: trac,
POSTGRES_USER: trac_admin,
@@ -168,7 +168,7 @@ jobs:
DB_PORT: 1433,
DB_OPTIONS: '-e "NO_DB_OPTIONS=not_used"', # docker run -e flag sets an env variable, passing '' causes errors
BUILD_sql_sqlserver: true,
- TRAC_CONFIG_FILE: '.github/config/trac-int-sqlserver.yaml',
+ TRAC_CONFIG_FILE: '.github/config/int-metadb-sqlserver.yaml',
TRAC_SECRET_KEY: unHkj>weN2jSl,
MSSQL_PID: Developer,
ACCEPT_EULA: Y,
@@ -233,3 +233,71 @@ jobs:
          name: junit-test-results
          path: build/modules/*/reports/**
          retention-days: 7


  int-storage:

    # Storage integration tests not available in CI yet
    # (test cases must be run manually until CI resources are available)
    if: ${{ false }}

    runs-on: ubuntu-latest
    timeout-minutes: 5

    strategy:

      # Try to finish all jobs - it can be helpful to see if some succeed and others fail
      fail-fast: false

      matrix:

        storage:

          - { PROTOCOL: S3,
              BUILD_aws_storage: true,
              TRAC_CONFIG_FILE: '.github/config/int-storage-s3.yaml',
              TRAC_SECRET_KEY: storage_s3_secrets,
              S3_BUCKET: not-configured }


    env: ${{ matrix.storage }}

    steps:

      # fetch-depth = 0 is needed to get tags for version info
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Set up Java
        uses: actions/setup-java@v3
        with:
          distribution: ${{ env.JAVA_DISTRIBUTION }}
          java-version: ${{ env.JAVA_VERSION }}
          cache: gradle

      - name: Build
        run: ./gradlew testClasses

      # Auth tool will also create the secrets file if it doesn't exist
      - name: Prepare Auth Keys
        run: |
          ./gradlew auth-tool:run --args="\
            --config ${{ env.TRAC_CONFIG_FILE }} \
            --secret-key ${{ env.TRAC_SECRET_KEY }} \
            --task create_root_auth_key EC 256"

      # No need to prepare DB, it is done by the platform test setup

      - name: Integration Tests
        run: ./gradlew integration -DintegrationTags="int-storage"

      # If the tests fail, make the output available for download
      - name: Store failed test results
        uses: actions/upload-artifact@v3
        if: failure()
        with:
          name: junit-test-results
          path: build/modules/*/reports/**
          retention-days: 7
31 changes: 31 additions & 0 deletions .github/workflows/packaging.yml
@@ -90,6 +90,27 @@ jobs:
          cd build/dist
          zip -r tracdap-sandbox-${VERSION}.zip tracdap-sandbox-${VERSION}

      # The dist for each plugin includes all its dependency JARs
      # We filter out JARs that are already included as part of the TRAC platform
      # This avoids putting the same JAR on the classpath twice when a plugin is installed
      - name: Assemble plugins package
        run: |
          VERSION=`dev/version.sh`
          mkdir -p build/dist/tracdap-plugins-${VERSION}
          for PLUGIN in build/plugins/*/install/*; do
            PLUGIN_NAME=`basename ${PLUGIN}`
            mkdir build/dist/tracdap-plugins-${VERSION}/${PLUGIN_NAME}
            cp ${PLUGIN}/*.jar build/dist/tracdap-plugins-${VERSION}/${PLUGIN_NAME}/
            for JAR in ${PLUGIN}/lib/*.jar; do
              JAR_NAME=`basename ${JAR}`
              if [ ! -f build/dist/tracdap-sandbox-${VERSION}/lib/${JAR_NAME} ]; then
                cp ${JAR} build/dist/tracdap-plugins-${VERSION}/${PLUGIN_NAME}/
              fi
            done
          done
          cd build/dist
          tar -czvf tracdap-plugins-${VERSION}.tgz tracdap-plugins-${VERSION}/*

      - name: Save packages
        uses: actions/upload-artifact@v3
        with:
@@ -291,6 +312,16 @@ jobs:
          asset_name: tracdap-sandbox-${{ steps.tracdap-version.outputs.tracdap_version }}.zip
          asset_content_type: application/zip

      - name: Publish plugins package
        uses: actions/upload-release-asset@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          upload_url: ${{ github.event.release.upload_url }}
          asset_path: tracdap-plugins-${{ steps.tracdap-version.outputs.tracdap_version }}.tgz
          asset_name: tracdap-plugins-${{ steps.tracdap-version.outputs.tracdap_version }}.tgz
          asset_content_type: application/gzip

      - name: Publish API packages
        uses: actions/upload-release-asset@v1
        env:
13 changes: 7 additions & 6 deletions dev/compliance/owasp-false-positives.xml
@@ -72,18 +72,19 @@
<vulnerabilityName>CVE-2022-38752</vulnerabilityName>
<vulnerabilityName>CVE-2022-38751</vulnerabilityName>
<vulnerabilityName>CVE-2022-41854</vulnerabilityName>
+ <vulnerabilityName>CVE-2022-1471</vulnerabilityName>
</suppress>

- <!-- This error is fixed in Jackson version 2.14-rc1 -->
- <!-- MAT: I'm not comfortable putting the dependency on an RC version -->
- <!-- As soon as 2.14 is released, we can upgrade and remove this exception -->
+ <!-- More spurious errors being reported in SonaType -->
+ <!-- This is not a real vulnerability, it should be contested / closed -->
+ <!-- https://nvd.nist.gov/vuln/detail/CVE-2022-45868 -->

<suppress>
<notes><![CDATA[
- file name: jackson-databind-2.13.4.jar
+ file name: h2-2.1.214.jar
]]></notes>
- <packageUrl regex="true">^pkg:maven/com\.fasterxml\.jackson\.core/jackson\-databind@.*$</packageUrl>
- <vulnerabilityName>CVE-2022-42003</vulnerabilityName>
+ <packageUrl regex="true">^pkg:maven/com\.h2database/h2@.*$</packageUrl>
+ <cve>CVE-2022-45868</cve>
</suppress>

</suppressions>
3 changes: 3 additions & 0 deletions dev/compliance/permitted-licenses.json
@@ -6,6 +6,9 @@
{
"moduleLicense": "MIT License"
},
{
"moduleLicense": "MIT-0"
},
{
"moduleLicense": "The 3-Clause BSD License"
},
8 changes: 4 additions & 4 deletions doc/deployment/authentication.rst
@@ -76,8 +76,8 @@ Providers

You need to configure one provider in the authentication section of the gateway config file.

- **Guest Provider**
- ^^^^^^^^^^^^^^^^^^
+ Guest Provider
+ ^^^^^^^^^^^^^^

The guest provider logs everyone in as guest, without prompting for credentials.
This is the default provider set up in the sandbox example configuration.
@@ -94,8 +94,8 @@ The user ID and name can be set as properties of the provider.
userName: Guest User

- **Basic Provider**
- ^^^^^^^^^^^^^^^^^^
+ Basic Provider
+ ^^^^^^^^^^^^^^

The basic provider uses HTTP basic authentication, which typically causes the browser
authentication window to appear when users try to access pages in a browser. To use
1 change: 1 addition & 0 deletions doc/deployment/index.rst
@@ -16,3 +16,4 @@ Deployment
platform
metadata_store
authentication
storage
72 changes: 72 additions & 0 deletions doc/deployment/storage.rst
@@ -0,0 +1,72 @@

Storage Configuration
=====================


Local Storage
-------------

Local storage is available in the base platform and does not require installing any plugins.
For instructions on setting up local storage, see the
:doc:`sandbox quick start guide <sandbox>`.

AWS S3 Storage
--------------

You will need to set up an S3 bucket and an IAM user with permission to access that bucket.
The following bucket policy grants the required permissions; substitute your own account ID, IAM user and bucket name.

.. code-block:: json

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "ListObjectsInBucket",
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<aws_account_id>:user/<iam_user>"
          },
          "Action": "s3:ListBucket",
          "Resource": "arn:aws:s3:::<bucket_name>"
        },
        {
          "Sid": "AllObjectActions",
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<aws_account_id>:user/<iam_user>"
          },
          "Action": [
            "s3:*Object",
            "s3:*ObjectAttributes"
          ],
          "Resource": "arn:aws:s3:::<bucket_name>/*"
        }
      ]
    }

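
The placeholders in the policy above can also be filled in programmatically. The following is a minimal Python sketch and is not part of TRAC: the function name ``build_bucket_policy`` and its parameters are hypothetical, the example values are made up, and the statements simply mirror the policy shown above.

```python
import json


def build_bucket_policy(aws_account_id: str, iam_user: str, bucket_name: str) -> dict:
    """Return the example bucket policy with its placeholders filled in (hypothetical helper)."""
    principal = {"AWS": f"arn:aws:iam::{aws_account_id}:user/{iam_user}"}
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Sid": "ListObjectsInBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # Object actions apply to every key in the bucket
                "Sid": "AllObjectActions",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:*Object", "s3:*ObjectAttributes"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Made-up values, for illustration only
    policy = build_bucket_policy("123456789012", "trac-user", "my-trac-bucket")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the bucket policy editor once the values have been checked against your own AWS account.
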
To install the AWS storage plugin, download the plugins package from the latest release on the
`release page <https://github.com/finos/tracdap/releases>`_. Inside the plugins package you
will find a folder for the AWS storage plugin. Copy the contents of this folder into the *plugins*
folder of your TRAC D.A.P. installation.
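
The copy step can be scripted. The sketch below is one possible way to do it in Python and makes several assumptions: the helper name ``install_plugin`` is hypothetical, the plugin folder name and installation path are supplied by the caller because they are not specified here, and the plugins package is assumed to be already unpacked.

```python
import shutil
from pathlib import Path
from typing import List


def install_plugin(plugins_package: Path, plugin_name: str, trac_home: Path) -> List[str]:
    """Copy every file from one plugin folder into <trac_home>/plugins (hypothetical helper)."""
    source = plugins_package / plugin_name
    if not source.is_dir():
        raise FileNotFoundError(f"Plugin folder not found: {source}")

    target = trac_home / "plugins"
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            # copy2 preserves timestamps, which helps when comparing installs
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied
```

The function returns the names of the copied files so the result can be checked or logged; the paths passed in are placeholders for your own environment.
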

You will then be able to configure an S3 storage instance in your platform configuration. The region,
bucket name and access key properties are required.

The *path* property is optional. If specified, it will be used as a prefix for all objects stored in the bucket.
TRAC follows the convention of using path-like object keys, so forward slashes can be used in the path prefix if desired.

.. code-block:: yaml

    storage:
      buckets:
        TEST_PLUGIN:
          protocol: S3
          properties:
            region: <aws_region>
            bucket: <aws_bucket_name>
            path: <storage_prefix>
            accessKeyId: <aws_access_key_id>
            secretAccessKey: <aws_secret_access_key>
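
How an optional *path* prefix combines with path-like object keys can be sketched in a few lines of Python. This illustrates the convention only and is not TRAC's implementation: the function name ``object_key`` is hypothetical, and the way leading, trailing and repeated slashes are normalized is an assumption.

```python
from typing import Optional


def object_key(prefix: Optional[str], storage_path: str) -> str:
    """Join an optional bucket prefix and a storage path into one S3-style key (illustrative)."""
    parts = []
    if prefix:
        # Drop empty segments so "/a/b/" and "a/b" give the same prefix
        parts.extend(segment for segment in prefix.split("/") if segment)
    parts.extend(segment for segment in storage_path.split("/") if segment)
    # S3 keys have no leading slash; segments are separated by "/"
    return "/".join(parts)


if __name__ == "__main__":
    print(object_key("int-storage-s3", "tenant/dataset/part-0.arrow"))
```

With no prefix configured, the key is just the normalized storage path, so objects land at the top level of the bucket.
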