Multi-tenant Hadoop cluster deployment with HA and Kerberos in 1 command

Ryba bootstraps and manages a fully secured Hadoop cluster with one command. It is an open-source software (OSS) project released under the new BSD license, originally developed for one of the world's largest utility companies. It is used every day to manage the cluster and keep every component up to date.

Ryba is our answer to the DevOps integration needs of product delivery and quality testing. It provides the flexibility to answer the demands of your internal information technology (IT) operations team. It is written in JavaScript and CoffeeScript to facilitate and accelerate feature development and maintenance releases. The language encourages self-documented code; see for yourself in the source code deploying two NameNodes configured with HA.

Install Ryba locally or on a remote server and you are ready to go. It uses SSH to connect to each server of your cluster, where it installs and checks all the components you wish. You don't need to prepare your cluster nodes: a minimal installation of RHEL or CentOS with a root user, or a user with sudo access, is enough.

Ryba motivations

  • Secured communication with SSH
  • No database; full distribution across multiple servers relying on Git
  • No agent or pre-installation required on your cluster nodes
  • Version control of all your configuration and modifications with Git and NPM, the Node.js package manager
  • Command-based, to integrate with your Business Continuity Plan (BCP) and existing scripts
  • For developers, as simple as learning Node.js, not a new framework
  • Self-documented code written in Literate CoffeeScript
  • Idempotent and executable on a running cluster without any negative impact

Ryba features

  • Bootstrap the nodes from a fresh install
  • Configure proxy environment if needed
  • Optionally create a BIND server (useful in a Vagrant development environment)
  • Install OpenLDAP and Kerberos and/or integrate with your existing infrastructure
  • Deploy the latest Hortonworks Data Platform (HDP)
  • Set up High Availability for HDFS
  • Integrate Kerberos with cross realm support
  • Set IPTables rules and startup scripts
  • Check the running components
  • Provide convenient utilities such as global start/stop/status commands, distributed shell execution, ...

Installation

Node.js

First download Node.js. You might need to adjust the name of the Node.js archive depending on the version you choose to install. Also, replace the path "/usr/local/node" with another location (e.g. "~/node") if you don't have permission to write inside "/usr/local".

# Download the Node.js package
wget --no-check-certificate https://nodejs.org/download/release/v6.2.2/node-v6.2.2-linux-x64.tar.gz
# Extract the Node.js package
tar xzf node-v6.2.2-linux-x64.tar.gz
# Move Node.js into its final destination
sudo mv node-v6.2.2-linux-x64 /usr/local/node
# Add path to Node.js binary
echo 'export PATH=/usr/local/node/bin:$PATH' >> ~/.bashrc
# Source the updated profile
. ~/.bashrc
# Check if node is installed
node -v
# Clean up the downloaded archive
rm -rf node-v6.2.2-linux-x64.tar.gz

If you are behind a proxy, configure the Node.js Package Manager (NPM) with the following commands:

npm config set proxy http://proxy.company.com:8080
npm config set https-proxy http://proxy.company.com:8080

Ryba

From the project directory, run npm install to download the project dependencies.
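For example (the repository URL below is an assumption, derived from the ryba-io GitHub organization already referenced in this document):

```shell
# Clone the project (URL assumed from the ryba-io GitHub organization)
git clone https://github.com/ryba-io/ryba.git
cd ryba
# Download the dependencies declared in package.json
npm install
```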

Security

  • Authentication Ryba configures every component to work with Kerberos when possible. All the components listed above (except Elasticsearch, MongoDB and Nagios in the community version) support Kerberos.

  • Authorization Since Ryba supports Apache Ranger, you can easily manage Access Control Lists from Ranger Admin. Apache Ranger provides ACL administration for the main Big Data components under the Apache project.

  • Encryption Ryba configures TLS/SSL encryption for every service. You can generate certificates (see an example at https://github.com/ryba-io/ryba-env-metal) or provide your own, and Ryba will upload them to the nodes and configure the components.

At the end of the Ryba installation, you have a fully Kerberized cluster with SSL encryption enabled.
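If you don't have an internal certificate authority, a self-signed key and certificate can be generated with openssl as a minimal sketch (the hostname and file paths are examples, not values expected by Ryba):

```shell
# Generate a 2048-bit private key and a self-signed certificate
# valid for 365 days. The CN and paths are placeholder examples.
mkdir -p certs
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout certs/node1.key \
  -out certs/node1.crt \
  -days 365 \
  -subj "/CN=node1.hadoop.company.com"
# Inspect the generated certificate
openssl x509 -in certs/node1.crt -noout -subject -dates
```

Self-signed certificates are fine for development; for production you would typically provide certificates signed by your company's CA instead.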

High Availability

Ryba configures every service with High Availability when the service supports it, according to the layout of the cluster. Just define where you want each service to be installed, and Ryba performs every step: installing, starting and checking.

Check

Ryba has a check command which runs against components to verify that they are correctly configured and running. A check can be a port binding verification (for example port 50470 for the Hadoop HDFS NameNode) or a complete functional test, like launching MapReduce jobs on YARN.
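To illustrate the port-binding style of check (this is a hand-written sketch, not Ryba's actual implementation), a simple TCP probe can be written with bash's /dev/tcp redirection:

```shell
# Hypothetical helper: return 0 if a TCP port accepts connections.
check_port() {
  local host=$1 port=$2
  # bash treats /dev/tcp/host/port as a TCP connection attempt
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Example: probe the HDFS NameNode HTTPS port mentioned above
# (the hostname is a placeholder)
if check_port namenode1.company.com 50470; then
  echo "NameNode HTTPS port is listening"
else
  echo "NameNode HTTPS port is not reachable"
fi
```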

Contributors