Skip to content

Oozie coordinator job to create a DwC-A Checklist of all names that GBIF has type specimens for

Notifications You must be signed in to change notification settings

gbif/type-specimen-checklist

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

type-specimen-checklist

This project covers the Java classes and Oozie workflow and coordinator job needed to create a Darwin Core Checklist Archive for all distinct names that GBIF has type specimens for on a regular basis. Only names that are parsable with the GBIF Name Parser are included to exclude problematic names.

How to run the Oozie workflow once

To run the workflow use the script install-workflow ./install-workflow.sh dev gitOAuthToken which requires 2 command line parameters:

The configuration file used by this workflow requires the following settings:

# hdfs
namenode=hdfs://ha-nn
jobtracker=prodmaster3-vh.gbif.org:8032
# oozie
oozie.url=http://oozie.gbif.org:11000/oozie
oozie.use.system.libpath=true
oozie.libpath=hdfs://ha-nn/type-specimen-dwca-builder-uat/lib/
oozie.launcher.mapreduce.task.classpath.user.precedence=true
user.name=yarn
environment=uat
# in days
coordFrequency=7

# hive
hiveDB=prod_b
hiveMetastore=thrift://prodmaster1-vh.gbif.org:9083

# dwca
dwca.file=/occurrence-download/uat-downloads/gbif-type-specimen-checklist.zip

Installing the Oozie coordinator job

The same workflow can be executed as an Oozie coordinator job running on a regular basis (see coordFrequency config property) using the same configs as for the workflow above. To install the coordinator job, removing any previous coordinator job, use the script install-coordinator.sh ./install-coordinator.sh dev gitOAuthToken which again requires the 2 command line parameters described above.

About

Oozie coordinator job to create a DwC-A Checklist of all names that GBIF has type specimens for

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published