Configuration files

dsarmientos edited this page Sep 1, 2012 · 3 revisions

If they exist, Dumbo reads the configuration files /etc/dumbo.conf and ~/.dumborc. When the same setting appears in both, the value in ~/.dumborc overrides the one in /etc/dumbo.conf. Both are INI files that can contain the following sections:

  • [common]: default options for all commands
  • [start]: default options for the dumbo start command
  • [cat]: default options for the dumbo cat command
  • [ls]: default options for the dumbo ls command
  • [rm]: default options for the dumbo rm command
  • [get]: default options for the dumbo get command
  • [put]: default options for the dumbo put command
  • [unix]: default options for programs that run locally using UNIX pipes
  • [streaming]: default options for programs that run on Hadoop Streaming
  • [hadoops]: abbreviations for values of -hadoop
  • [eggs]: abbreviations for values of -libegg
  • [jars]: abbreviations for values of -libjar
  • [inputformats]: abbreviations for values of -inputformat
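The override rule between the two files can be illustrated with Python's standard configparser module. This is a sketch of the semantics described above, not Dumbo's own code. The function name load_dumbo_config is hypothetical. Interpolation is switched off only so that values such as %(prog)s are kept as literal text in the sketch.

```python
import os
from configparser import ConfigParser


# Hypothetical helper (not part of Dumbo's API): read the system-wide file
# first and the per-user file second.
def load_dumbo_config(paths=None):
    if paths is None:
        paths = ["/etc/dumbo.conf", os.path.expanduser("~/.dumborc")]
    # interpolation=None keeps values such as %(prog)s as literal text.
    parser = ConfigParser(interpolation=None)
    # read() silently skips files that do not exist; a value read later
    # replaces an earlier value for the same section and key.
    parser.read(paths)
    return parser


# Demonstrate the override rule with in-memory strings instead of files.
demo = ConfigParser(interpolation=None)
demo.read_string("[common]\npython: python2.5\nlibjar_1: core\n")  # like /etc/dumbo.conf
demo.read_string("[common]\npython: python2.6\n")                   # like ~/.dumborc
print(demo.get("common", "python"))    # python2.6 (the per-user file wins)
print(demo.get("common", "libjar_1"))  # core (only set system-wide)
```

Reading the files in this order gives ~/.dumborc the last word, while settings that appear only in /etc/dumbo.conf remain visible.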

Here is an example:

[common]
python: python2.5
libegg_1: utils
libegg_2: parse
libjar_1: core
libjar_2: charts

[streaming]
name: %(prog)s-%(output)s

[hadoops]
herd1: /usr/local/hadoop-herd1
herd2: /usr/local/hadoop-herd2

[jars]
core: /usr/local/dumbo/jars/core.jar
charts: /usr/local/dumbo/jars/charts.jar

[eggs]
utils: /usr/local/dumbo/eggs/utils-0.2-py2.5.egg
parse: /usr/local/dumbo/eggs/parse-0.1-py2.5.egg

[inputformats]
customfile: com.mycompany.hadoop.mapred.CustomFileInputFormat
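The abbreviation sections map a short name to the full value it stands for. The sketch below loads part of the example above and expands names such as herd1 or core. The helper expand is hypothetical, and the assumption that a value without an entry is passed through unchanged should be checked against Dumbo's source.

```python
from configparser import ConfigParser

# A subset of the example configuration above.
EXAMPLE = """
[hadoops]
herd1: /usr/local/hadoop-herd1
herd2: /usr/local/hadoop-herd2

[jars]
core: /usr/local/dumbo/jars/core.jar
charts: /usr/local/dumbo/jars/charts.jar

[inputformats]
customfile: com.mycompany.hadoop.mapred.CustomFileInputFormat
"""


# Hypothetical helper: look a value up in an abbreviation section such as
# [hadoops] or [jars]. Values without an entry are returned unchanged
# (an assumption about how unlisted values are treated).
def expand(parser, section, value):
    if parser.has_option(section, value):
        return parser.get(section, value)
    return value


config = ConfigParser(interpolation=None)
config.read_string(EXAMPLE)
print(expand(config, "hadoops", "herd1"))        # /usr/local/hadoop-herd1
print(expand(config, "jars", "core"))            # /usr/local/dumbo/jars/core.jar
print(expand(config, "jars", "/tmp/other.jar"))  # /tmp/other.jar
```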

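As this example illustrates, an option that has to be given more than once (such as libegg or libjar) cannot simply be repeated, because a key can hold only one value per section. Appending _<index> to the key gives each occurrence a distinct name.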
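Indexed keys can be mapped back to repeated command-line options as in the sketch below. The helper section_to_args is hypothetical. It assumes that a trailing _<digits> is the index and that options keep the order in which they appear in the file. Dumbo's own parsing may differ in detail.

```python
import re
from configparser import ConfigParser

EXAMPLE = """
[common]
python: python2.5
libegg_1: utils
libegg_2: parse
libjar_1: core
libjar_2: charts
"""

# Matches keys such as "libegg_1" and captures the part before "_<index>".
INDEXED_KEY = re.compile(r"^(.+)_(\d+)$")


# Hypothetical helper: turn one section into a flat list of command-line
# arguments, so that "libegg_1" and "libegg_2" both become -libegg.
def section_to_args(parser, section):
    args = []
    for key, value in parser.items(section):  # file order is preserved
        match = INDEXED_KEY.match(key)
        option = match.group(1) if match else key
        args.extend(["-" + option, value])
    return args


config = ConfigParser(interpolation=None)
config.read_string(EXAMPLE)
print(section_to_args(config, "common"))
# ['-python', 'python2.5', '-libegg', 'utils', '-libegg', 'parse',
#  '-libjar', 'core', '-libjar', 'charts']
```

Matching only a numeric suffix, rather than splitting at the first underscore, leaves option names that themselves contain an underscore intact.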