cadc-11115 new caom2-meta-sync module #247

Merged
merged 15 commits into from
Jun 15, 2023
5 changes: 4 additions & 1 deletion .github/workflows/gradle.yml
@@ -56,4 +56,7 @@ jobs:
run: cd caom2-artifact-discover && ../gradlew --info clean build javadoc checkstyleMain

- name: build and test caom2-artifact-download
run: cd caom2-artifact-download && ../gradlew --info clean build javadoc checkstyleMain
run: cd caom2-artifact-download && ../gradlew --info clean build javadoc checkstyleMain

- name: build and test caom2-meta-sync
run: cd caom2-meta-sync && ../gradlew --info clean build javadoc checkstyleMain
5 changes: 5 additions & 0 deletions caom2-meta-sync/Dockerfile
@@ -0,0 +1,5 @@
FROM images.opencadc.org/library/cadc-java:1

ADD build/distributions/caom2-meta-sync.tar /

CMD ["/caom2-meta-sync/bin/caom2-meta-sync"]
124 changes: 124 additions & 0 deletions caom2-meta-sync/README.md
@@ -0,0 +1,124 @@
# CAOM2 Meta Sync process

A process that syncs Observations from a CAOM2 repository service
to a CAOM2 database. The process runs continuously, exiting only
when source queries return no results.

## configuration

See the [cadc-java](https://github.com/opencadc/docker-base/tree/master/cadc-java)
image docs for general config requirements.

Runtime configuration must be made available via the `/config` directory.


### caom2-meta-sync.properties
```
# log level
org.opencadc.caom2.metasync.logging={info|debug}

# Destination caom2 database settings
org.opencadc.caom2.metasync.destination.db.schema={schema}
org.opencadc.caom2.metasync.destination.db.username={dbuser}
org.opencadc.caom2.metasync.destination.db.password={dbpassword}
org.opencadc.caom2.metasync.destination.db.url=jdbc:postgresql://{server}/{database}

# Source repository service
org.opencadc.caom2.metasync.source.repoService={uri}

# Source caom2 database settings
org.opencadc.caom2.metasync.source.db.schema={schema}
org.opencadc.caom2.metasync.source.db.username={dbuser}
org.opencadc.caom2.metasync.source.db.password={dbpassword}
org.opencadc.caom2.metasync.source.db.url=jdbc:postgresql://{server}/{database}

# The collection to sync
org.opencadc.caom2.metasync.collection={collection name}

# Base for generating Plane publisherID values
org.opencadc.caom2.metasync.basePublisherID={uri}

# Number of threads used to read from the source service
org.opencadc.caom2.metasync.threads={integer}

# Number of observations to sync per batch
org.opencadc.caom2.metasync.batchSize={integer}

# Whether to process each collection only once or continuously
org.opencadc.caom2.metasync.runContinuously={true|false}

# Max sleep time in seconds when running continuously
org.opencadc.caom2.metasync.maxSleep={integer}

# Do logging but do not sync collections
org.opencadc.caom2.metasync.dryrun={true|false}

```
The source can be either a repository service, or a caom2 database. Only one of
`org.opencadc.caom2.metasync.source.repoService` or
`org.opencadc.caom2.metasync.source.db.*` can be configured.
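The either/or rule for the source can be checked when the properties file is loaded. The following is a minimal sketch, not the module's actual code: the class name `SourceConfigCheck`, the method `sourceType`, and the choice to reject a configuration with no source at all are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical helper: illustrates the "only one source" rule from the README.
final class SourceConfigCheck {
    static final String PREFIX = "org.opencadc.caom2.metasync.source.";

    /** Returns "repoService" or "db"; throws if both or neither are configured. */
    static String sourceType(Properties props) {
        boolean hasRepo = props.getProperty(PREFIX + "repoService") != null;
        // Any key under source.db.* counts as a database source being configured.
        boolean hasDb = props.stringPropertyNames().stream()
                .anyMatch(key -> key.startsWith(PREFIX + "db."));
        if (hasRepo == hasDb) {
            // Assumption: a missing source is treated as an error, like a double source.
            throw new IllegalStateException(
                    "configure exactly one of " + PREFIX + "repoService or " + PREFIX + "db.*");
        }
        return hasRepo ? "repoService" : "db";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Example value taken from the README text.
        props.setProperty(PREFIX + "repoService", "ivo://cadc.nrc.ca/ams");
        System.out.println("source type: " + sourceType(props));
    }
}
```

Failing fast at startup keeps a misconfigured container from running with an ambiguous source.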

`org.opencadc.caom2.metasync.source.repoService` is the resource identifier for
a registered caom2 repository service (e.g. `ivo://cadc.nrc.ca/ams`).

`org.opencadc.caom2.metasync.source.db.*` are the connection settings for a source caom2 database.

`org.opencadc.caom2.metasync.collection` is the collection name used to query
the source for Observations to sync.

`org.opencadc.caom2.metasync.basePublisherID` is the base for generating Plane
publisherID values. The base is a URI of the form `ivo://<authority>[/<path>]`, and
generated publisherID values have the form `<basePublisherID>/<collection>?<observationID>/<productID>`.
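To make the publisherID pattern concrete, here is a minimal sketch that assembles a value from its parts. It is an illustration only, not the code used by caom2-meta-sync: the class `PublisherIdSketch`, its method, and the sample authority `example.org` and IDs are all hypothetical, and the sketch assumes the IDs contain only characters that are legal in a URI.

```java
import java.net.URI;

// Hypothetical helper: builds <basePublisherID>/<collection>?<observationID>/<productID>.
final class PublisherIdSketch {
    static URI publisherID(URI basePublisherID, String collection,
                           String observationID, String productID) {
        String base = basePublisherID.toASCIIString();
        // Tolerate a trailing slash on the configured base so the result has a single separator.
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        // URI.create throws IllegalArgumentException if the IDs contain illegal characters.
        return URI.create(base + "/" + collection + "?" + observationID + "/" + productID);
    }

    public static void main(String[] args) {
        URI base = URI.create("ivo://example.org"); // made-up authority
        System.out.println(publisherID(base, "TEST", "obs1", "prod1"));
    }
}
```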

`org.opencadc.caom2.metasync.threads` is the number of threads used to
read observations from the source repository service.

`org.opencadc.caom2.metasync.batchSize` is the number of Observations
processed as a single batch. It's a limit on the maximum number of
Observations returned from a repository service query.

`org.opencadc.caom2.metasync.runContinuously` when true, the application continuously
loops through and syncs each collection, pausing between runs. When false, each
collection is synced once and the application exits.

`org.opencadc.caom2.metasync.maxSleep={integer}` is the maximum sleep time
in seconds between runs when `org.opencadc.caom2.metasync.runContinuously=true`.
The sleep time starts at 60 seconds, doubling each time when no data is found to sync,
until maxSleep is reached. The sleep time will reset to 60 seconds once data is found to sync.
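The sleep schedule described above can be sketched as a small state holder. This is an illustration under stated assumptions, not the module's implementation: the class `SleepBackoff` and its `next` method are hypothetical, and the sketch assumes the first idle run sleeps for the initial 60 seconds before any doubling takes effect.

```java
// Hypothetical sketch of the doubling sleep schedule with a maxSleep cap.
final class SleepBackoff {
    static final long INITIAL_SLEEP_SECONDS = 60L; // starting value stated in the README

    private final long maxSleepSeconds; // value of org.opencadc.caom2.metasync.maxSleep
    private long current = INITIAL_SLEEP_SECONDS;

    SleepBackoff(long maxSleepSeconds) {
        this.maxSleepSeconds = maxSleepSeconds;
    }

    /** Returns the number of seconds to sleep after a run. */
    long next(boolean foundData) {
        if (foundData) {
            current = INITIAL_SLEEP_SECONDS; // reset once data is found to sync
        }
        long sleep = Math.min(current, maxSleepSeconds);
        if (!foundData) {
            // Double for the next idle run, never exceeding maxSleep.
            current = Math.min(current * 2, maxSleepSeconds);
        }
        return sleep;
    }

    public static void main(String[] args) {
        SleepBackoff backoff = new SleepBackoff(300L); // hypothetical maxSleep of 300 seconds
        boolean[] runs = {false, false, false, false, true, false};
        for (boolean foundData : runs) {
            System.out.println("foundData=" + foundData + " sleep=" + backoff.next(foundData));
        }
    }
}
```

With a hypothetical maxSleep of 300, four idle runs in a row yield sleeps of 60, 120, 240 and 300 seconds, and a run that finds data brings the value back to 60.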

`org.opencadc.caom2.metasync.dryrun={true|false}` when true, the application
only logs and does not sync collections. When false, it syncs
the collections.

### cadcproxy.pem
An optional certificate in `/config` is used to authenticate https calls
to other services if challenged for a client certificate.
If `cadcproxy.pem` is not present, queries to the repository service
are made anonymously.


## building it
```
gradle clean build
docker build -t caom2-meta-sync -f Dockerfile .
```

## checking it
```
docker run -it caom2-meta-sync:latest /bin/bash
```

## running it
```
docker run --user opencadc:opencadc -v /path/to/external/config:/config:ro --name caom2-meta-sync caom2-meta-sync:latest
```

## apply version tags
```bash
. VERSION && echo "tags: $TAGS"
for t in $TAGS; do
    docker image tag caom2-meta-sync:latest caom2-meta-sync:$t
done
unset TAGS
docker image list caom2-meta-sync
```
4 changes: 4 additions & 0 deletions caom2-meta-sync/VERSION
@@ -0,0 +1,4 @@
## deployable containers have a semantic and build tag
# semantic version tag: major.minor[.patch]
# build version tag: timestamp
TAGS="0.1-$(date --utc +"%Y%m%dT%H%M%S")"
37 changes: 37 additions & 0 deletions caom2-meta-sync/build.gradle
@@ -0,0 +1,37 @@
plugins {
id 'java'
id 'maven'
id 'application'
id 'checkstyle'
}

repositories {
mavenCentral()
mavenLocal()
}

sourceCompatibility = 1.8

group = 'org.opencadc'

description = 'OpenCADC CAOM Metadata Sync application'
def git_url = 'https://github.com/opencadc/caom2db'

mainClassName = 'org.opencadc.caom2.metasync.Main'

dependencies {
implementation 'org.opencadc:cadc-util:[1.6,2.0)'
implementation 'org.opencadc:caom2:[2.4.4,2.5)'
implementation 'org.opencadc:caom2persistence:[2.4.14,2.5)'
implementation 'org.opencadc:caom2-repo:[1.4,1.5)'

// needed to run plane metadata compute plugin (--compute)
implementation 'org.opencadc:caom2-compute:[2.4.6,2.5)'

// needed to run access-control regen plugin (--generate-ac)
implementation 'org.opencadc:caom2-access-control:[2.4,2.5)'

runtimeOnly 'org.postgresql:postgresql:[42.2,43.0)'
}

apply from: '../opencadc.gradle'