[GOBBLIN-25] Gobblin data-management run script and example configuration #1873

Merged
asfgit merged 1 commit into apache:master from fermich:gobblin-data-retention-run on Jul 29, 2017

@fermich fermich commented May 17, 2017

This pull request adds a run script for the DatasetCleanerJob class, which the gobblin-data-management module uses to run the retention process. It also adds an example configuration to complement the retention module docs.

Contributor

@ibuenros ibuenros left a comment

Hi @fermich, thanks for setting up this script and example!

As for the script, I think it would be more appropriate to set up a CliApplication for this (see DecryptCli for an example), which would also avoid the need for the Azkaban stuff.

# limitations under the License.
#

/tags/retention/hive
Contributor

You could put this in /hive/includes.conf instead.

@fermich
Author

fermich commented May 19, 2017

@ibuenros CliApplication looks nice. I have prepared an implementation of a CLI runner, but it fails when getting the cluster configuration. Should I submit the app the way the hadoop jar command does, or does CliApplication require a special environment to run?
Link to source: https://github.com/fermich/gobblin/blob/dataset-cleaner-cli/gobblin-data-management/src/main/java/gobblin/runtime/retention/DatasetCleanerCli.java

@ibuenros
Contributor

@fermich you need to make sure that the environment variable HADOOP_HOME is defined. You can check the classpath Gobblin will use by running bin/gobblin classpath
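
For readers following along, the check described above might be sketched as below. This is only an illustration, assuming a POSIX shell and that commands are run from the root of a Gobblin distribution; the helper name `require_hadoop_home` and the path `/opt/hadoop` are hypothetical and do not come from the PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gobblin): fail early when HADOOP_HOME
# is missing or does not point to a directory.
require_hadoop_home() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME is not a directory: $HADOOP_HOME" >&2
    return 1
  fi
  return 0
}

# Usage, from the Gobblin distribution root (the path is an assumed example):
#   export HADOOP_HOME=/opt/hadoop
#   require_hadoop_home && bin/gobblin classpath
```

If the helper succeeds, `bin/gobblin classpath` (the command mentioned in the comment above) prints the classpath Gobblin will use, which makes it possible to confirm that the Hadoop jars and configuration are being picked up.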

@fermich
Author

fermich commented May 22, 2017

@ibuenros After setting HADOOP_HOME it works like a charm! Thank you! To sum up, there are two additional changes:

  • DatasetCleaner starts via CliApplication
  • includes.conf is moved one directory up

@abti
Member

abti commented Jul 27, 2017

Issue: https://issues.apache.org/jira/browse/GOBBLIN-25

Please update your PR title with the following prefix: [GOBBLIN-25]

@fermich fermich changed the title from "Gobblin data-management run script and example configuration" to "[GOBBLIN-25] Gobblin data-management run script and example configuration" on Jul 28, 2017
@fermich
Author

fermich commented Jul 28, 2017

@abti done

@abti
Member

abti commented Jul 29, 2017

Thanks.

+1 LGTM

@asfgit asfgit merged commit 5a896d2 into apache:master Jul 29, 2017
4 participants