Puppet module to deploy Cloudera Manager and Cloudera's Distribution, including Apache Hadoop (CDH).

#Cloudera Manager

####Table of Contents

  1. Overview
  2. Module Description - What the module does and why it is useful
  3. Setup - The basics of getting started with this module
  4. Usage - Configuration options and additional functionality
  5. Reference - An under-the-hood peek at what the module is doing and how
  6. Limitations - OS compatibility, etc.
  7. Development - Guide for contributing to the module

##Overview

This Puppet module manages the installation and configuration of Cloudera Manager, a management application for Apache Hadoop, on the operating systems officially supported by Cloudera.

##Module Description

This module manages the installation of Cloudera Manager, a management application for Apache Hadoop. It follows the standards written in the Cloudera Manager Installation Guide "Installation Path B - Installation Using Your Own Method". By default, this module assumes that parcels will be used to deploy Cloudera's Distribution of Apache Hadoop (CDH) and related software. If parcels are not desired, this module can also manage the installation of CDH including HDFS & MapReduce, Impala, Sentry, Search, Spark, HBase, and LZO compression. The module can also configure TLS security of the Cloudera Manager communications channels, and set up Cloudera Manager to use an alternative to the embedded database.

This module is certified on Cloudera 5.

##Setup

###What this module affects

  • Installs the Cloudera software repository for CM.
  • Installs Oracle Java Development Kit (JDK) 7.
  • Optionally installs the Oracle Java Cryptography Extensions.
  • Installs the CM agent.
  • Configures the CM agent to talk to a CM server.
  • Starts the CM agent.
  • Sets the kernel vm.swappiness to 0.
  • Disables the kernel transparent hugepage compaction.
  • Separately installs the CM server and database connectivity (by default to the embedded database server).
  • Separately starts the CM server.
  • Optionally installs the Cloudera software repository for CDH.
  • Optionally installs most components of CDH 5 including HBase, Impala, Search, and Spark.
  • Optionally installs GPL Extras (LZO).

###Requirements

Please read through the Cloudera Manager Requirements document to learn which operating systems, databases, and browsers Cloudera Manager supports. Pay close attention to the Resource Requirements and Networking and Security Requirements sections. Some requirements cannot easily be configured by this module (e.g. Security-Enhanced Linux (SELinux) must not block Cloudera Manager), and you must ensure that they are met on your platform.

###Beginning with this module

Most nodes that will be a part of a Hadoop cluster will use this declaration.

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
}

The node that will be the CM server (i.e. smhost.localdomain) will use this declaration. Include it on only one node in your environment. By default it will install the embedded PostgreSQL database on the same node. With the correct parameters, it can instead connect to local or remote MySQL, PostgreSQL, or Oracle RDBMS databases.

class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
}
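
As a rough sketch of how the two declarations above might sit together in a site manifest: the node names and the use of a default node block are illustrative assumptions, not something the module requires.

```puppet
# site.pp (illustrative layout; hostnames are placeholders)

# The single node that runs the CM server alongside its agent.
node 'smhost.localdomain' {
  class { '::cloudera':
    cm_server_host   => 'smhost.localdomain',
    install_cmserver => true,
  }
}

# Every other cluster member runs only the CM agent.
node default {
  class { '::cloudera':
    cm_server_host => 'smhost.localdomain',
  }
}
```

Keeping install_cmserver confined to one explicitly named node makes it harder to enable the server on a second host by accident.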

###Upgrading

####Deprecation Warning

  • The default for use_parcels will switch to true before the 1.0.0 release.

This:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
}

would become this:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
}
  • The puppetlabs/mysql dependency will update to version 2 before the 1.0.0 release. Make sure to review its changelog in the case of an upgrade.

  • The class ::cloudera::repo will be renamed to ::cloudera::cdh::repo and the Impala repository will be split out into ::cloudera::impala::repo before the 1.0.0 release.

This:

class { '::cloudera::repo':
  cdh_version => '4.1',
  cm_version  => '4.1',
}

would become this:

class { '::cloudera::cdh::repo':
  version => '4.1',
}
class { '::cloudera::impala::repo':
  version => '4.1',
}
  • The class parameters and variables yumserver and yumpath have been renamed to reposerver and repopath respectively for the 2.0.0 release. This makes the name more generic as it applies to APT and Zypprepo as well as YUM package repositories.

This:

class { 'cloudera':
  cm_yumserver => 'http://packageserver.localdomain',
  cm_yumpath   => '/gplextras/',
}

would become this:

class { 'cloudera':
  cm_reposerver => 'http://packageserver.localdomain',
  cm_repopath   => '/gplextras/',
}
  • The use_gplextras parameter has been renamed to install_lzo for the 2.0.0 release.

This:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_gplextras  => true,
}

would become this:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  install_lzo    => true,
}

##Usage

All interaction with the cloudera module can be done through the main cloudera class. This means you can simply toggle the options in ::cloudera to have full functionality of the module.

###TLS Security

Level 1: Configuring TLS Encryption only for Cloudera Manager

Level 2: Configuring TLS Authentication of Server to Agents

Level 3: Configuring TLS Authentication of Agents to Server

This module's deployment of TLS provides both level 1 and level 2 configuration (encryption, and authentication of the server to the agents). Level 3 is not presently implemented. You will need to provide a TLS certificate and the signing certificate authority for the CM server. See the File resources in the examples below for where the files need to be deployed.

There are some settings inside CM that can only be configured manually. See the Level 1 instructions for the details of what to change in the WebUI and use the below values:

| Setting | Value |
|---|---|
| Use TLS Encryption for Agents | (check) |
| Path to TLS Keystore File | /etc/cloudera-scm-server/keystore |
| Keystore Password | The value of server_keypw in Class['::cloudera::cm5::server']. |
| Use TLS Encryption for Admin Console | (check) |

The node that will be the CM agent may use this declaration:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_tls        => true,
  install_jce    => true,
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }

The node that will be the CM agent+server may use this declaration:

class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  use_tls          => true,
  install_jce      => true,
  server_keypw     => 'myPassWord',
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }
file { '/etc/pki/tls/certs/cloudera_manager-ca.crt': }
file { "/etc/pki/tls/certs/${::fqdn}-cloudera_manager.crt": }
file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key": }
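
The File resources above are shown without attributes because the module does not deploy the certificates itself. A minimal sketch of how two of them might be filled in follows. It assumes the PEM files are served from a hypothetical `profile` module, and the ownership and modes shown are assumptions to adjust to your own security policy.

```puppet
# Assumption: certificates are served from a hypothetical 'profile' module;
# substitute whatever distribution mechanism you actually use.
file { '/etc/pki/tls/certs/cloudera_manager.crt':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/profile/cloudera/cloudera_manager.crt',
}

# Private key for the CM server; the restrictive mode is an assumption.
file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key":
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0600',
  source => "puppet:///modules/profile/cloudera/${::fqdn}-cloudera_manager.key",
}
```

The remaining certificate files would follow the same pattern.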

###External Database

If you decide not to use the embedded database, the Cloudera Manager server database configuration can be completed by configuring this module to call the scm_prepare_database.sh script. The external database must be configured and ready for connection with the supplied credentials via some method outside of this module.

class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  db_type          => 'postgresql',
  db_host          => 'dbhost.localdomain',
  db_port          => '5432',
  db_user          => 'root',
  db_pass          => 'SeCrEt',
}
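
For comparison, a MySQL variant might look like the sketch below. It uses only parameters listed in the Reference section; the hostnames and passwords are placeholders, and it assumes the database server already accepts connections with the administrative credentials.

```puppet
class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  db_type          => 'mysql',
  db_host          => 'dbhost.localdomain',
  db_port          => '3306',
  # Administrative account on the database server.
  db_user          => 'root',
  db_pass          => 'SeCrEt',
  # Database and account that Cloudera Manager itself will use.
  database_name    => 'scm',
  username         => 'scm',
  password         => 'ChangeMe',
}
```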

###Parcels

Parcel is an alternative binary distribution format supported by Cloudera Manager 4.5+ that simplifies distribution of CDH and other Cloudera products. By default, this module assumes software deployment of CDH via parcel. To allow Cloudera Manager to install CDH via RPMs (or DEBs) instead of parcels, just set use_parcels => false.

Nodes that will be cluster members will use this declaration:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
}

For more advanced use cases, nodes that will be gateways may use this declaration to install extra parts of CDH:

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
}
class { '::cloudera::cdh5::mahout': }
class { '::cloudera::cdh5::kite': }
# Install Oozie WebUI support (optional):
class { '::cloudera::cdh5::oozie::ext': }
# Install MySQL support (optional):
class { '::cloudera::cdh5::hue::mysql': }
class { '::cloudera::cdh5::oozie::mysql': }

For more advanced use cases, the node that will be just the CM server may use this declaration: (This will skip installation of the CDH software as it is not required.)

class { '::cloudera::cm5::repo': } ->
class { '::cloudera::java5': } ->
class { '::cloudera::java5::jce': } ->
class { '::cloudera::cm5': } ->
class { '::cloudera::cm5::server': }

###LZO Compression

Hadoop-specific LZO compression libraries are available in the Cloudera GPL Extras repository. To deploy the Hadoop-specific and also the native libraries on a non-parcel system just add install_lzo => true to the class declaration. Additional configuration in Cloudera Manager will be required to activate the functionality (ignore the mention of parcels in the link to the documentation).

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
  install_lzo    => true,
}

To deploy the native LZO compression libraries on a parcel system just add install_lzo => true to the class declaration. Additional configuration in Cloudera Manager will be required to activate the functionality.

class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => true,
  install_lzo    => true,
}

##Reference

###Classes

####Public Classes

  • cloudera: Installs and configures Cloudera Manager. Includes most other classes.

####Private Classes

  • cloudera::java5: Installs the Oracle Java Development Kit (JDK) from the Cloudera Manager repository.
  • cloudera::java5::jce: Installs the Oracle Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files.
  • cloudera::cm5
  • cloudera::cm5::repo
  • cloudera::cm5::server
  • cloudera::cdh5
  • cloudera::cdh5::repo
  • cloudera::gplextras5
  • cloudera::gplextras5::repo
  • cloudera::java: Installs the Oracle Java Development Kit (JDK) from the Cloudera Manager repository.
  • cloudera::java::jce: Installs the Oracle Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files.
  • cloudera::cm
  • cloudera::cm::repo
  • cloudera::cm::server
  • cloudera::cdh
  • cloudera::cdh::repo
  • cloudera::gplextras
  • cloudera::gplextras::repo
  • cloudera::impala
  • cloudera::impala::repo
  • cloudera::search
  • cloudera::search::repo
  • cloudera::lzo

###Parameters

The following parameters are available in the cloudera module:

####ensure

Whether the software should be present or absent. Default: present

####autoupgrade

Upgrade package automatically, if there is a newer version. Default: false

####service_ensure

Whether the service should be running or stopped. Default: running

####service_enable

Start service at boot. Default: true

####cdh_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

####cdh_repopath

The path to add to the $cdh_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

####cdh_version

The version of Cloudera's Distribution, including Apache Hadoop to install. Default: 5

####cm_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

####cm_repopath

The path to add to the $cm_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

####cm_version

The version of Cloudera Manager to install. Default: 5

####cm5_repopath

The path to add to the $cm_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

####ci_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

####ci_repopath

The path to add to the $ci_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

####ci_version

The version of Cloudera Impala to install. Default: 1

####cs_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

####cs_repopath

The path to add to the $cs_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

####cs_version

The version of Cloudera Search to install. Default: 1

####cg_reposerver

URI of the YUM server. Default: http://archive.cloudera.com

####cg_repopath

The path to add to the $cg_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific

####cg_version

The version of Cloudera GPL Extras to install. Default: 5

####cm_server_host

Hostname of the Cloudera Manager server. Default: localhost

####cm_server_port

Port on which the Cloudera Manager server is listening. Default: 7182

####use_tls

Whether to enable TLS on the Cloudera Manager server and agent. Default: false

####verify_cert_file

The file holding the public key of the Cloudera Manager server as well as the chain of signing certificate authorities. PEM format. Default: /etc/pki/tls/certs/cloudera_manager.crt or /etc/ssl/certs/cloudera_manager.crt

####use_parcels

Whether to install CDH software via parcels or packages. Default: true

####install_lzo

Whether to install the native LZO compression library packages. If use_parcels is false, then also install the Hadoop-specific LZO compression library packages. You must configure and deploy the GPLextras parcel repository if use_parcels is true. Default: false

####install_java

Whether to install the Cloudera supplied Oracle Java Development Kit. If this is set to false, then an Oracle JDK will have to be installed prior to applying this module. Default: true
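
A hedged sketch of the install_java => false case: the `site_jdk` class and the `my-oracle-jdk` package name are hypothetical stand-ins for however your site installs an Oracle JDK. The chaining arrow simply orders that class before this module.

```puppet
# Hypothetical class that installs an Oracle JDK by some site-specific means.
class site_jdk {
  # Placeholder package name; replace with your real JDK package.
  package { 'my-oracle-jdk':
    ensure => installed,
  }
}

# Apply the JDK first, then the Cloudera module with its own JDK disabled.
class { '::site_jdk': } ->
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  install_java   => false,
}
```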

####install_jce

Whether to install the Oracle Java Cryptography Extension unlimited strength jurisdiction policy files. This requires manual download of the zip file. See files/README_JCE.md for download instructions. Default: false

####install_cmserver

Whether to install the Cloudera Manager Server. This should only be set to true on one host in your environment. Default: false

####database_name

Name of the database to use for Cloudera Manager. Default: scm

####username

Name of the user to use to connect to database_name. Default: scm

####password

Password to use to connect to database_name. Default: scm

####db_host

Host to connect to for database_name. Default: localhost

####db_port

Port on db_host to connect to for database_name. Default: 3306

####db_user

Administrative database user on db_host. Default: root

####db_pass

Password of the administrative database user db_user. Default:

####db_type

Which type of database to use for Cloudera Manager. Valid options are embedded, mysql, oracle, or postgresql. Default: embedded

####server_ca_file

The file holding the PEM public key of the Cloudera Manager server certificate authority. Default: /etc/pki/tls/certs/cloudera_manager-ca.crt or /etc/ssl/certs/cloudera_manager-ca.crt

####server_cert_file

The file holding the PEM public key of the Cloudera Manager server. Default: /etc/pki/tls/certs/${::fqdn}-cloudera_manager.crt or /etc/ssl/certs/${::fqdn}-cloudera_manager.crt

####server_key_file

The file holding the PEM private key of the Cloudera Manager server. Default: /etc/pki/tls/private/${::fqdn}-cloudera_manager.key or /etc/ssl/private/${::fqdn}-cloudera_manager.key

####server_chain_file

The file holding the PEM public key(s) of the Cloudera Manager server intermediary certificate authority. Default: none

####server_keypw

The password used to protect the keystore. Default: none

####proxy

The URL to the proxy server for the YUM repositories. Default: absent

####proxy_username

The username for the YUM proxy. Default: absent

####proxy_password

The password for the YUM proxy. Default: absent
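
As an illustration, the three proxy parameters might be combined as follows when the repositories are only reachable through an authenticating proxy. The proxy URL and credentials are placeholders.

```puppet
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  # Placeholder proxy endpoint and credentials.
  proxy          => 'http://proxy.localdomain:3128',
  proxy_username => 'repouser',
  proxy_password => 'SeCrEt',
}
```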

####parcel_dir

The directory where parcels are downloaded and distributed. Default: /opt/cloudera/parcels

##Limitations

###OS Support:

See Cloudera's officially supported operating systems for CM4 and for CM5.

  • RedHat family - tested on CentOS 5.9, CentOS 6.4
  • SuSE family - tested on SLES 11SP3
  • Debian family - tested on Debian 6.0.7, Debian 7.0, Ubuntu 10.04.4 LTS, and Ubuntu 12.04.2 LTS

###Software Support:

  • Cloudera Manager - tested with 4.1.2, 4.8.0, and 5.0.0beta2
  • CDH - tested with 4.1.2, 4.5.0, and 5.0.0beta2
  • Cloudera Impala - tested with 1.0 and 1.2.3
  • Cloudera Search - tested with 1.1.0
  • Cloudera GPL Extras - tested with 4.3.0 and 5.0.0

###Notes:

  • Supports Top Scope variables (i.e. via Dashboard) and Parameterized Classes.
  • Based on the Cloudera Manager 5.0.0 Beta 2 Installation Guide
  • TLS certificates must be in PEM format and are not deployed by this module.
  • When using parcels, the CDH software is not deployed by Puppet. Puppet will only install the Cloudera Manager server/agent. You must then configure Cloudera Manager to deploy the parcels.
  • When installing packages and not parcels on SLES, SP2 is required as the hadoop-2.0.0+1518-1.cdh4.5.0.p0.24.sles11.x86_64 package requires netcat-openbsd which is not available on SLES 11SP1.
  • Osfamily RedHat 5 requires the EPEL YUM repository when installing LZO support.
  • This module does not support upgrading from CDH4 to CDH5 packages, including Impala, Search, and GPL Extras.

###Issues:

  • Need external module support for the Oracle Instant Client JDBC.
  • When using an external PostgreSQL server that is on the same host as the CM server, PostgreSQL must be configured to accept connections with md5 password authentication.
  • Osfamily RedHat 5 requires Python 2.6 from the EPEL YUM repository when installing the Hue service.
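
For the PostgreSQL issue above, one possible approach is sketched here. It assumes the local server is managed with the separate puppetlabs/postgresql module, which this module does not itself use for that purpose, and that `postgresql::server` is already declared elsewhere. The rule title, address, and ordering are illustrative.

```puppet
# Assumption: the puppetlabs/postgresql module manages the local server
# and the postgresql::server class is declared elsewhere in the catalog.
postgresql::server::pg_hba_rule { 'cloudera manager md5 over loopback':
  description => 'Allow md5 password authentication from the local CM server',
  type        => 'host',
  database    => 'all',
  user        => 'all',
  address     => '127.0.0.1/32',
  auth_method => 'md5',
  order       => '001',
}
```

Narrowing database and user to the values of database_name and username would be a tighter alternative once the initial setup has run.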

###TODO:

See TODO.md for more items.

##Development

Please see DEVELOP.md for information on how to contribute.

Copyright (C) 2013 Mike Arnold mike@razorsedge.org

Licensed under the Apache License, Version 2.0.

razorsedge/puppet-cloudera on GitHub

razorsedge/cloudera on Puppet Forge