Setup scripts for Hadoop 2 and later with Kerberos security enabled



This is intended to get a secure Hadoop installed in a purely-local
configuration on your computer for development purposes. It allows you
to work with secure Hadoop even when your computer is not connected to
the Internet.  It requires that you install a virtual machine running
Linux to provide a DNS server (named) and a Kerberos Key Distribution
Center. The host computer provides the virtualization environment and also
serves as an NTP (Network Time Protocol) server; this lets the guest VM
resync to the correct, host-determined time and date when you wake your
computer from sleep.

Quick Start

Below, I use "(host)" and "(guest)" to clarify which action should be
done on your host computer or your guest VM, respectively.

1. Set up a VM with Linux with a host-only network (host)

VMware, VirtualBox, or another virtualization product should
work. Set up the VM so that the guest OS has access to two
networks. One is a bridged network that allows the guest
OS to connect to the Internet so that it can install packages and
updates. The other is a host-only network on a private address range.

2. Set up your hadoop-runtime dir (host)

  git clone git://
  cd hadoop-common
  mvn package -Pdist
  cd ..
  ls hadoop-common/hadoop-dist/target/hadoop-*

This "ls" should return a directory whose name begins with "hadoop-". Take
the rest of that directory name as the suffix and use it here:

  ln -s hadoop-common/hadoop-dist/target/hadoop-$SUFFIX hadoop-runtime
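
The suffix lookup can also be scripted. Here is a sketch (the `find_suffix` helper is hypothetical, not part of these scripts), assuming the build leaves exactly one `hadoop-*` directory under `hadoop-dist/target`:

```shell
# Hypothetical helper: print the part of the single hadoop-* directory
# name that follows the "hadoop-" prefix, e.g. "2.0.3-alpha".
find_suffix() {
  local dir
  dir=$(find "$1" -maxdepth 1 -type d -name 'hadoop-*' | head -n 1)
  [ -n "$dir" ] || return 1
  printf '%s\n' "${dir##*/hadoop-}"
}

# e.g.:
# ln -s hadoop-common/hadoop-dist/target/hadoop-$(find_suffix hadoop-common/hadoop-dist/target) hadoop-runtime
```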

3. Create Kerberos Principals (guest)

  Git clone hadoop-conf on your guest VM.

  Within your top-level hadoop-conf dir:

  make principals

You may also do:

   make principals OPT1=VAL1 .. OPTn=VALn

where OPT=VAL pairs look like these examples (defaults shown):

   MASTER=`hostname -f`

4. On host computer, make sure ntpd is running (host)

  Eugenes-MacBook-Pro:hadoop-conf ekoontz$ ps -ef | grep ntpd
    0 81521     1   0  3:05PM ??         0:01.03 /usr/sbin/ntpd -c /private/etc/ntp-restrict.conf -n -g -p /var/run/ -f /var/db/ntp.drift

The guest VM should be able to connect to the host computer's ntpd so
that the guest can keep its time in sync with the host, which is
important for Kerberos tickets to work correctly. See also step 7 below for ntp configuration on the guest VM.
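
To see why this matters: Kerberos rejects requests when clocks differ by more than the configured clock-skew tolerance (300 seconds by default). A sketch of checking an `ntpdate -q` offset against that limit (the `offset_ok` helper and the sample output line are illustrative, not part of these scripts):

```shell
# Hypothetical helper: given one line of `ntpdate -q` output, extract the
# clock offset (in seconds) and succeed only if its magnitude is within
# Kerberos's default 300-second clock-skew tolerance.
offset_ok() {
  local off
  off=$(printf '%s\n' "$1" | sed -n 's/.*offset \(-\{0,1\}[0-9.]*\).*/\1/p')
  [ -n "$off" ] || return 2
  awk -v o="$off" 'BEGIN { if (o < 0) o = -o; exit (o < 300) ? 0 : 1 }'
}

# e.g.: offset_ok "server 172.16.175.1, stratum 2, offset 0.012, delay 0.025"
```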

5. On guest VM, install and start bind, krb5kdc, and ntpdate (guest)

  [root@centos1]# yum -y install krb5-server bind ntpdate

6. Configure named on guest VM (guest)

Add the following at the bottom of /etc/named.conf. The second zone is the
reverse-lookup zone for your host-only network; 172.16.175.x is used as the
example network below, so its reversed name is 175.16.172.in-addr.arpa:

  zone "local" {
       type master;
       file "local.db";
  };
  zone "175.16.172.in-addr.arpa" {
       type master;
       file "local.ptr.db";
  };

Add the following as the file /var/named/local.db:
*note that I use eugenes-macbook-pro and centos as the hostnames of the host computer and the guest VM, respectively, on the host-only network, with example addresses 172.16.175.1 and 172.16.175.3; substitute your own values*.

  $TTL 1H
  @    SOA LOCALHOST. foobar (1 1h 15m 30d 2h)
  eugenes-macbook-pro  A  172.16.175.1
  centos               A  172.16.175.3

Add the following as the file /var/named/local.ptr.db:
*175.16.172 should be substituted with your host-only network's most-significant bytes, reversed. So if your host-only network were on 1.2.3.x, you'd use 3.2.1 below*

  $TTL 1H
  @    SOA LOCALHOST. foobar (1 1h 15m 30d 2h)
  1			IN PTR	eugenes-macbook-pro.local.
  3    			IN PTR	centos.local.
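
The "reversed" zone name follows the standard in-addr.arpa convention. The derivation for a /24 network can be sketched as follows (the `reverse_zone` helper is illustrative only):

```shell
# Hypothetical helper: turn the first three octets of a /24 host-only
# network into its reverse-lookup zone name,
# e.g. 172.16.175 -> 175.16.172.in-addr.arpa
reverse_zone() {
  local a b c
  IFS=. read -r a b c <<EOF
$1
EOF
  printf '%s.%s.%s.in-addr.arpa\n' "$c" "$b" "$a"
}

# e.g.: reverse_zone 172.16.175
```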

7. On guest VM, make sure ntpdate is set to use the IP of the host computer on the host-only network (guest)

  [ekoontz@centos1 ~]$ cat /etc/ntp.conf | grep server


8. Start services on guest VM (guest)

Note that the bind package's init service is called "named", and that
chkconfig only enables a service at boot time, so each service is also
started here:

  [root@centos1]# chkconfig krb5kdc on && service krb5kdc start
  [root@centos1]# chkconfig named on && service named start
  [root@centos1]# chkconfig ntpdate on && service ntpdate start

9. On guest VM, make sure bind, krb5kdc are running (guest)

   [ekoontz@centos1 ~]$ ps -ef | grep krb5kdc
root      2477     1  0 11:39 ?        00:00:00 /usr/sbin/krb5kdc -P /var/run/

  [ekoontz@centos1 ~]$ ps -ef | grep named

(Note that "ps -ef | grep bind" is not a reliable check here: on a stock
CentOS guest it matches unrelated processes such as libvirt's dnsmasq, whose
command line contains "--bind-interfaces", rather than the named daemon.)
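
A more precise liveness check can be sketched as follows; it matches the daemon's command name exactly, so a loose pattern like "bind" cannot accidentally match another process's arguments (e.g. dnsmasq's "--bind-interfaces" flag). The helper name is hypothetical:

```shell
# Hypothetical helper: read `ps -e -o comm=` output on stdin and report
# whether a daemon with exactly that command name is present.
daemon_running() {
  grep -qx "$1"
}

# e.g.: ps -e -o comm= | daemon_running krb5kdc && echo "krb5kdc is up"
```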

10. Edit DNS_SERVERS: set it to the IP of the VM on the host-only network (host)

  export DNS_SERVERS=172.16.175.3

11. Edit $HOME/hadoop-conf/krb5.conf: set kdc and admin_server to IP of VM on host-only network (host)

    #change to ip or hostname of your kdc
    kdc =
    #change to ip or hostname of your admin_server
    admin_server =
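
In context, the edited stanza might look like the following (the realm name EXAMPLE.COM is a placeholder, and 172.16.175.3 is the example VM address used earlier; substitute your own values):

```
[realms]
 EXAMPLE.COM = {
  kdc = 172.16.175.3
  admin_server = 172.16.175.3
 }
```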

12. Install Hadoop configuration in hadoop-runtime (host)

  make install

13. Sync guest clock (guest)

  [root@centos1]# service ntpdate restart; date

You will need to do this every time your host computer is put to
sleep. Your host computer will sync its time automatically using its
built-in battery-powered clock, but the guest VM must be synced
manually via the above step. (Perhaps there's a way to have it
automatically sync periodically but I am not aware of it).
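
One way to get the periodic sync the previous paragraph wonders about is a cron entry on the guest. A sketch (172.16.175.1 is assumed to be the host's address on the host-only network; substitute your own):

```shell
# Sketch: a cron.d entry that re-syncs the guest clock from the host's
# ntpd every 10 minutes. Written to /tmp here so it can be inspected
# before being installed as /etc/cron.d/ntpdate-host.
cat > /tmp/ntpdate-host.cron <<'EOF'
*/10 * * * * root /usr/sbin/ntpdate 172.16.175.1
EOF
```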

14. Start Hadoop (host)

(You will eventually want to run this in a screen session after
doing it a few times).

  make start

15. Test (host)

Run a simple Map-Reduce job through the Hadoop runtime.

  make test