Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Setup scripts for Hadoop 2 and later with Kerberos security enabled
XML Shell Ruby
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
templates
Makefile install zoo.cfg
README
container-log4j.properties add container-log4j.properties to conf files to be installed
eugene.yaml
example.yaml
hadoop-conf.sh
hadoop-env.sh
hadoop-metrics.properties
hadoop-metrics2.properties
hadoop-policy.xml hadoop-policy.xml needed with hadoop-2.0.3.
hdfs-test.sh
hostnames.mk
httpfs-env.sh
httpfs-log4j.properties
httpfs-signature.secret
httpfs-site.xml
krb5.conf
log4j.properties add workaround for HDFS-4302
nn-web.xml
principals.sh
rewrite-config.xsl
slaves
ssl-client.xml.example
ssl-server.xml.example
stop.sh
terms.rb
user-keytab.sh
yarn-env.sh fix path to hadoop-conf.sh
yarn-site.xml
zoo.cfg

README

Introduction

This is intended to get a secure 2.0-era Hadoop installed, from
source, in a purely-local configuration on your computer for
development purposes. It uses a private host-only network to run the
Hadoop network on. This allows you to work with secure Hadoop even when
your computer is not connected to the Internet.  It requires that you
install a virtual machine running Linux to provide a DNS server
(named) and a Kerberos Key Distribution Center (KDC). The host computer
provides the virtualization environment, runs a NTP (network time
protocol) server. The latter service allows your VM to sync itself to
the correct (host-determined time and date) when you awaken your
computer from sleep. Finally, the host computer also runs the actual
Hadoop daemons.

For more information on compiling and running Hadoop, please also see:

http://ekoontz.github.com/hadoop/2012/02/27/hadoop-from-source/

Quick Start

Below, I use "(host)" and "(guest)" at the top of each step to clarify
which action should be done on your host computer or your guest VM,
respectively.

1. Set up a VM with Linux with a host-only network. (host)

VMWare, Virtual Box, or another virtualization product should
work. You should set up the VM so that the guest OS has access to two
networks. One will be a bridged network that allows the guest
OS to connect to the Internet so that it can install packages and
updates and access remote git software repositories.

The other is a host-only network on a private range
(e.g. 175.16.172.0/8). This will be the network that we will run
Hadoop on.  All configuration below will be done with reference only
to this latter network.

2. Set up your hadoop-runtime dir (host)

  cd 
  git clone git://git.apache.org/hadoop-common.git
  cd hadoop-common
  mvn clean package -DskipTests -Pdist
  cd
  ls hadoop-common/hadoop-dist/target/hadoop-*

There should be a directory whose name begins with hadoop that's returned 
by this "ls". Find this suffix and use it below here:

  ln -s hadoop-common/hadoop-dist/target/hadoop-$SUFFIX hadoop-runtime

3. Create hadoop-conf directory if you haven't already:

  git clone https://github.com/ekoontz/hadoop-conf.git


4. On host computer, make sure ntpd is running (host)

  Eugenes-MacBook-Pro:hadoop-conf ekoontz$ ps -ef | grep ntpd
    0 81521     1   0  3:05PM ??         0:01.03 /usr/sbin/ntpd -c /private/etc/ntp-restrict.conf -n -g -p /var/run/ntpd.pid -f /var/db/ntp.drift

The guest VM should be able to connect to the host computer's ntpd so
that the guest can keep its time in sync with the host, which is
important for Kerberos tickets to work correctly. See also step 7
below for ntp configuration on the guest VM.

5. Install krb5kdc, bind and ntpdate (guest)

 [root@centos1 ekoontz]# cat /etc/ntp.conf | grep ^server
server 172.16.175.1
  [root@centos1]# yum -y install krb5-server bind ntpdate
  [root@centos1]# chkconfig ntpd off
  [root@centos1]# chkconfig ntpdate on

We will use the host computer's ntp daemon to set the guest's date, so
we can shut down ntpd on the guest and start ntpdate (client-only).

You may also prefer to run the ntpd server on the guest, in which case
your configuration may differ.

6. Configure bind on guest VM (guest)

Add the following at the bottom of /etc/named.conf:

  zone "local" {
       type master;
       file "local.db";
  };
  zone "175.16.172.in-addr.arpa" {
       type master;
       file "local.ptr.db";
  };

Add the following as the file /var/named/local.db:
*note that I use 172.16.175.1 and 172.16.175.3 as the hostnames of the host computer and the guest VM, respectively, on the host-only network; substitute your own values*.

  $TTL 1H
  @    SOA LOCALHOST. foobar (1 1h 15m 30d 2h)
       NS LOCALHOST.
  eugenes-macbook-pro  A	172.16.175.1
  centos	       A	172.16.175.3


Add the following as the file /var/named/local.ptr.db:
*175.16.172 should be substituted for your host-only network most-significant bytes, reversed. So if your host-only network was on 1.2.3.x, you'd use 3.2.1.in-addr.arpa below*

  $TTL 1H
  $ORIGIN 175.16.172.in-addr.arpa.
  @    SOA LOCALHOST. foobar (1 1h 15m 30d 2h)
       NS LOCALHOST.
  1			IN PTR	eugenes-macbook-pro.local.
  3    			IN PTR	centos.local.

7. On guest VM, make sure ntpdate is set to use IP of Host computer on host-only network (guest)
I recommend that you use *only* this server as the DNS server to simplify diagnostics.

  [ekoontz@centos1 ~]$ cat /etc/ntp.conf | grep server

  server 172.16.175.1

8. Start services on guest VM (guest)

  [root@centos1]# chkconfig krb5kdc on
  [root@centos1]# chkconfig named on
  [root@centos1]# chkconfig ntpdate on

9. On guest VM, make sure bind and krb5kdc are running (guest)

   [ekoontz@centos1 ~]$ ps -ef | grep krb5kdc
root      2477     1  0 11:39 ?        00:00:00 /usr/sbin/krb5kdc -P /var/run/krb5kdc.pid

  [ekoontz@centos1 ~]$ ps -ef | grep bind
  nobody    2815     1  0 11:39 ?        00:00:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override

10. Test guest-run named from host (host)

Both of the hostnames you added in step 6 should be resolvable:

   $ dig @172.16.175.3 centos.local +short
   172.16.175.3

   $ dig @172.16.175.3 eugenes-macbook-pro.local +short
   172.16.175.1

And reverse-resolvable:

   $ dig @172.16.175.3 -x 172.168.175.3 +short
   centos.local.

   $ dig @172.16.175.3 -x 172.168.175.1 +short
   eugenes-macbook.pro.local.

11. Setup

  make install start

12. Test

  make test


13. Diagnostics

  make report

Notes:

If you do:
  
  make login

and get something like:

  kinit: krb5_get_init_creds: time skew (41863) larger than max (300)

, it is likely that your KDC's time is behind the Master's time. 

Check using:

  make report

and look for MASTER DATE: and DNS_SERVER DATE:. (Note that the report
shows DNS_SERVER, but if you are using this setup, the KDC should be
running on the same machine as the DNS_SERVER).

If the two DATEs are not in sync, the master's time will be the more
accurate since it is running directly on your computer, not on a VM.

To fix the date skew, run:

   make sync

Then try "make report" to verify that the MASTER DATE: and DNS_SERVER
DATE: are the same (a second or two difference is ok). Now "make login"
should work also.

If the DNS_SERVER DATE is still out of sync, read below.

Syncing guest clock (guest vm)

  [root@centos1]# sudo service ntpdate restart; date

You will need to do this every time your host computer is put to
sleep. Your host computer will sync its time automatically using its
built-in battery-powered clock (and using an external NTP server if
configured to do so and if it's connected to the Internet), but the
guest VM must be synced manually via the above step. (Perhaps there's
a way to have it automatically sync periodically but I am not aware of
it).

If you get an error message from ntpdate such as "no server suitable
for synchronization found", try running with -d for more information:

   [root@centos1]# sudo ntpdate -d 172.16.175.1

Where "172.16.175.1" is the host-only IP of your host machine. If your
debugging output shows "172.16.175.1: Server dropped: strata too
high", then your host OS's ntpd server is probably not synced with its
own external time source. Try restarting ntpd on your host OS:

   $ sudo killall ntpd

Wait a few seconds and make sure that ntpd is running on the host OS, 
then try again on the guest OS to do:

   [root@centos1]# sudo ntpdate -d 172.16.175.1

Something went wrong with that request. Please try again.