
#How to Launch H2O-Dev from the Command Line

You can use Terminal (OS X) or the Command Prompt (Windows) to launch H2O-Dev. When you launch from the command line, you can include additional instructions to H2O-Dev, such as how many nodes to launch, how much memory to allocate for each node, assign names to the nodes in the cloud, and more.

There are two different argument types:

  • JVM arguments
  • H2O arguments

The arguments use the following format: `java <JVM Options> -jar h2o.jar <H2O Options>`.

##JVM Options

  • `-version`: Display Java version info.
  • `-Xmx<Heap Size>`: Set the total heap size for an H2O node. By default, this option is set to 1 GB (`-Xmx1g`). When launching nodes, we recommend allocating a total of four times the size of your data.

Note: Do not try to launch H2O with more memory than you have available.

##H2O Options

  • `-h` or `-help`: Display this information in the command line output.
  • `-name <H2O-DevCloudName>`: Assign a name to the H2O instance in the cloud (where `<H2O-DevCloudName>` is the name of the cloud). Nodes with the same cloud name will form an H2O cloud (also known as an H2O cluster).
  • `-flatfile <FileName>`: Specify a flatfile of IP addresses for faster cloud formation (where `<FileName>` is the name of the flatfile).
  • -ip <IPnodeAddress>: Specify an IP address other than the default localhost for the node to use (where <IPnodeAddress> is the IP address).
  • -port <#>: Specify a port number other than the default 54321 for the node to use (where <#> is the port number).
  • `-network <IPv4NetworkSpecification1>[,<IPv4NetworkSpecification2> ...]`: Specify a range (where applicable) of IP addresses (where `<IPv4NetworkSpecification1>` represents the first interface, `<IPv4NetworkSpecification2>` represents the second, and so on). The IP address discovery code binds to the first interface that matches one of the networks in the comma-separated list. For example, `10.1.2.0/24` supports 256 possible addresses.
  • -ice_root <fileSystemPath>: Specify a directory for H2O to spill temporary data to disk (where <fileSystemPath> is the file path).
  • -flow_dir <server-side or HDFS directory>: Specify a directory for saved flows. The default is /Users/h2o-<H2OUserName>/h2oflows (where <H2OUserName> is your user name).
  • `-nthreads <#ofThreads>`: Specify the maximum number of threads in the low-priority batch work queue (where `<#ofThreads>` is the number of threads). The default is 99.
  • -client: Launch H2O node in client mode. This is used mostly for running Sparkling Water.
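The JVM and H2O options above can be combined in a single launch command. The sketch below simply assembles one from placeholder values, so you can see where each option goes; the heap size, cloud name, and port shown are illustrative, not recommendations:

```shell
# Assemble a launch command from the options described above.
# All values are placeholders for illustration.
HEAP="-Xmx4g"          # JVM heap option (goes before -jar)
CLOUD_NAME="MyCloud"   # H2O option (goes after h2o.jar)
PORT=54321             # default H2O port
NTHREADS=99            # default thread count

CMD="java $HEAP -jar h2o.jar -name $CLOUD_NAME -port $PORT -nthreads $NTHREADS"
echo "$CMD"
```

Note that JVM options must precede `-jar h2o.jar`, while H2O options follow it; mixing the order will cause the JVM or H2O to reject the unknown flags.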

##Cloud Formation Behavior

New H2O nodes join to form a cloud during launch. After a job has started on the cloud, new nodes are prevented from joining.

  • To start an H2O node with 4GB of memory and a default cloud name: `java -Xmx4g -jar h2o.jar`

  • To start an H2O node with 6GB of memory and a specific cloud name: `java -Xmx6g -jar h2o.jar -name MyCloud`

  • To start an H2O cloud with three 2GB nodes using the default cloud name: `java -Xmx2g -jar h2o.jar & java -Xmx2g -jar h2o.jar & java -Xmx2g -jar h2o.jar &`

Wait for the `INFO: Registered: # schemas in: #mS` output (the numbers will vary) before entering the above command again to add another node.
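Waiting for that registration line can be scripted. The sketch below blocks until a node's log contains the schema-registration message; the log path, log format, and demo values are illustrative assumptions, not guaranteed H2O behavior:

```shell
# Block until the given log file contains the schema-registration line,
# signalling that it is safe to launch the next node.
wait_for_registration() {
  until grep -q "Registered: .* schemas in" "$1" 2>/dev/null; do
    sleep 1
  done
}

# Demo against a fabricated log line (a real run would point at the
# node's actual log file):
log=$(mktemp)
echo "04-20 16:14:07.123 INFO: Registered: 123 schemas in: 42mS" > "$log"
wait_for_registration "$log"
msg="node registered"
echo "$msg"
rm -f "$log"
```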

##Flatfile Configuration

If you are configuring many nodes, it is faster and easier to use the -flatfile option, rather than -ip and -port.

To configure H2O-Dev on a multi-node cluster:

  1. Locate a set of hosts.
  2. Download the appropriate version of H2O-Dev for your environment.
  3. Verify that the same h2o.jar file is available on all hosts.
  4. Create a flatfile (a plain text file with the IP and port numbers of the hosts). Use one entry per line. For example:
    
    192.168.1.163:54321
    192.168.1.164:54321   
    
    
  5. Copy the flatfile.txt to each node in the cluster.
  6. Use the -Xmx option to specify the amount of memory for each node. The cluster's memory capacity is the sum of all H2O nodes in the cluster.

For example, if you create a cluster with four 20g nodes (by specifying -Xmx20g four times), H2O will have a total of 80 gigs of memory available.

For best performance, we recommend sizing your cluster to be about four times the size of your data. To avoid swapping, the -Xmx allocation must not exceed the physical memory on any node. Allocating the same amount of memory for all nodes is strongly recommended, as H2O-Dev works best with symmetric nodes.
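Steps 4 and 5 above (creating the flatfile and distributing it) can be sketched in a few lines of shell. The node IPs below are the sample values from the example, and the `scp` loop in the comment is an illustrative assumption about how you might copy the file:

```shell
# Build flatfile.txt from a list of node IPs, one ip:port entry per line.
# The IPs are the sample values from the example above.
PORT=54321
NODES="192.168.1.163 192.168.1.164"

: > flatfile.txt                       # truncate/create the file
for ip in $NODES; do
  echo "${ip}:${PORT}" >> flatfile.txt
done

cat flatfile.txt

# Distributing it to each node might look like (hypothetical user/paths):
#   for ip in $NODES; do scp flatfile.txt user@"$ip":~/ ; done
```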

Note: the optional `-ip` and `-port` options specify the IP address and port for each node to use. The `-ip` option is especially helpful for hosts with multiple network interfaces.

```
java -Xmx20g -jar h2o.jar -flatfile flatfile.txt -port 54321
```

The output will resemble the following:

```
04-20 16:14:00.253 192.168.1.70:54321    2754   main      INFO:   1. Open a terminal and run 'ssh -L 55555:localhost:54321 H2O-DevUser@###.###.#.##'
04-20 16:14:00.253 192.168.1.70:54321    2754   main      INFO:   2. Point your browser to http://localhost:55555
04-20 16:14:00.437 192.168.1.70:54321    2754   main      INFO: Log dir: '/tmp/h2o-H2O-DevUser/h2ologs'
04-20 16:14:00.437 192.168.1.70:54321    2754   main      INFO: Cur dir: '/Users/H2O-DevUser/h2o-dev'
04-20 16:14:00.459 192.168.1.70:54321    2754   main      INFO: HDFS subsystem successfully initialized
04-20 16:14:00.460 192.168.1.70:54321    2754   main      INFO: S3 subsystem successfully initialized
04-20 16:14:00.460 192.168.1.70:54321    2754   main      INFO: Flow dir: '/Users/H2O-DevUser/h2oflows'
04-20 16:14:00.475 192.168.1.70:54321    2754   main      INFO: Cloud of size 1 formed [/192.168.1.70:54321]
```

As you add more nodes to your cluster, the output is updated: `INFO WATER: Cloud of size 2 formed [/...]...`

Access the H2O-Dev web UI (Flow) with your browser by pointing it to the HTTP address specified in the output (`Listening for HTTP and REST traffic on ...`).

To check whether the cloud is available, point to the URL `http://<ip>:<port>/Cloud.json` (an example of the JSON response is provided below). Wait for `cloud_size` to reach the expected value and the `consensus` field to be `true`:

```
{
...
"cloud_size": 2,
"consensus": true,
...
}
```
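Checking the two fields can be scripted as well. A real run would fetch the endpoint with something like `curl -s "http://<ip>:<port>/Cloud.json"`; since that requires a running cluster, the sketch below extracts the fields from a saved sample response instead, using `sed` rather than a JSON parser for portability:

```shell
# Extract cloud_size and consensus from a Cloud.json-style response and
# decide whether the cloud is ready. The response here is a canned sample
# matching the example above; a real run would fetch it with curl.
response='{"cloud_size": 2, "consensus": true}'

cloud_size=$(echo "$response" | sed -n 's/.*"cloud_size": *\([0-9]*\).*/\1/p')
consensus=$(echo "$response" | sed -n 's/.*"consensus": *\([a-z]*\).*/\1/p')

if [ "$cloud_size" -eq 2 ] && [ "$consensus" = "true" ]; then
  echo "cloud ready"
else
  echo "still forming"
fi
```

For production use, a proper JSON tool (such as `jq`, if installed) would be more robust than `sed` pattern matching.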