Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: master
Fetching contributors…

Cannot retrieve contributors at this time

229 lines (152 sloc) 7.716 kb
Introduction
This is intended to get a secure 2.0-era Hadoop installed, from
source, in a purely-local configuration on your computer for
development purposes. It uses a private host-only network to run the
Hadoop network on. This allows you to work with secure Hadoop even when
your computer is not connected to the Internet. It requires that you
install a virtual machine running Linux to provide a DNS server
(named) and a Kerberos Key Distribution Center (KDC). The host computer
provides the virtualization environment, runs a NTP (network time
protocol) server. The latter service allows your VM to sync itself to
the correct (host-determined time and date) when you awaken your
computer from sleep. Finally, the host computer also runs the actual
Hadoop daemons.
For more information on compiling and running Hadoop, please also see:
http://ekoontz.github.com/hadoop/2012/02/27/hadoop-from-source/
Quick Start
Below, I use "(host)" and "(guest)" at the top of each step to clarify
which action should be done on your host computer or your guest VM,
respectively.
1. Set up a VM with Linux with a host-only network. (host)
VMWare, Virtual Box, or another virtualization product should
work. You should set up the VM so that the guest OS has access to two
networks. One will be a bridged network that allows the guest
OS to connect to the Internet so that it can install packages and
updates and access remote git software repositories.
The other is a host-only network on a private range
(e.g. 175.16.172.0/8). This will be the network that we will run
Hadoop on. All configuration below will be done with reference only
to this latter network.
2. Set up your hadoop-runtime dir (host)
cd
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
mvn clean package -DskipTests -Pdist
cd
ls hadoop-common/hadoop-dist/target/hadoop-*
There should be a directory whose name begins with hadoop that's returned
by this "ls". Find this suffix and use it below here:
ln -s hadoop-common/hadoop-dist/target/hadoop-$SUFFIX hadoop-runtime
3. Create hadoop-conf directory if you haven't already:
git clone https://github.com/ekoontz/hadoop-conf.git
4. On host computer, make sure ntpd is running (host)
Eugenes-MacBook-Pro:hadoop-conf ekoontz$ ps -ef | grep ntpd
0 81521 1 0 3:05PM ?? 0:01.03 /usr/sbin/ntpd -c /private/etc/ntp-restrict.conf -n -g -p /var/run/ntpd.pid -f /var/db/ntp.drift
The guest VM should be able to connect to the host computer's ntpd so
that the guest can keep its time in sync with the host, which is
important for Kerberos tickets to work correctly. See also step 7
below for ntp configuration on the guest VM.
5. Install krb5kdc, bind and ntpdate (guest)
[root@centos1 ekoontz]# cat /etc/ntp.conf | grep ^server
server 172.16.175.1
[root@centos1]# yum -y install krb5-server bind ntpdate
[root@centos1]# chkconfig ntpd off
[root@centos1]# chkconfig ntpdate on
We will use the host computer's ntp daemon to set the guest's date, so
we can shut down ntpd on the guest and start ntpdate (client-only).
You may also prefer to run the ntpd server on the guest, in which case
your configuration may differ.
6. Configure bind on guest VM (guest)
Add the following at the bottom of /etc/named.conf:
zone "local" {
type master;
file "local.db";
};
zone "175.16.172.in-addr.arpa" {
type master;
file "local.ptr.db";
};
Add the following as the file /var/named/local.db:
*note that I use 172.16.175.1 and 172.16.175.3 as the hostnames of the host computer and the guest VM, respectively, on the host-only network; substitute your own values*.
$TTL 1H
@ SOA LOCALHOST. foobar (1 1h 15m 30d 2h)
NS LOCALHOST.
eugenes-macbook-pro A 172.16.175.1
centos A 172.16.175.3
Add the following as the file /var/named/local.ptr.db:
*175.16.172 should be substituted for your host-only network most-significant bytes, reversed. So if your host-only network was on 1.2.3.x, you'd use 3.2.1.in-addr.arpa below*
$TTL 1H
$ORIGIN 175.16.172.in-addr.arpa.
@ SOA LOCALHOST. foobar (1 1h 15m 30d 2h)
NS LOCALHOST.
1 IN PTR eugenes-macbook-pro.local.
3 IN PTR centos.local.
7. On guest VM, make sure ntpdate is set to use IP of Host computer on host-only network (guest)
I recommend that you use *only* this server as the DNS server to simplify diagnostics.
[ekoontz@centos1 ~]$ cat /etc/ntp.conf | grep server
server 172.16.175.1
8. Start services on guest VM (guest)
[root@centos1]# chkconfig krb5kdc on
[root@centos1]# chkconfig named on
[root@centos1]# chkconfig ntpdate on
9. On guest VM, make sure bind and krb5kdc are running (guest)
[ekoontz@centos1 ~]$ ps -ef | grep krb5kdc
root 2477 1 0 11:39 ? 00:00:00 /usr/sbin/krb5kdc -P /var/run/krb5kdc.pid
[ekoontz@centos1 ~]$ ps -ef | grep bind
nobody 2815 1 0 11:39 ? 00:00:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override
10. Test guest-run named from host (host)
Both of the hostnames you added in step 6 should be resolvable:
$ dig @172.16.175.3 centos.local +short
172.16.175.3
$ dig @172.16.175.3 eugenes-macbook-pro.local +short
172.16.175.1
And reverse-resolvable:
$ dig @172.16.175.3 -x 172.168.175.3 +short
centos.local.
$ dig @172.16.175.3 -x 172.168.175.1 +short
eugenes-macbook.pro.local.
11. Setup
make install start
12. Test
make test
13. Diagnostics
make report
Notes:
If you do:
make login
and get something like:
kinit: krb5_get_init_creds: time skew (41863) larger than max (300)
, it is likely that your KDC's time is behind the Master's time.
Check using:
make report
and look for MASTER DATE: and DNS_SERVER DATE:. (Note that the report
shows DNS_SERVER, but if you are using this setup, the KDC should be
running on the same machine as the DNS_SERVER).
If the two DATEs are not in sync, the master's time will be the more
accurate since it is running directly on your computer, not on a VM.
To fix the date skew, run:
make sync
Then try "make report" to verify that the MASTER DATE: and DNS_SERVER
DATE: are the same (a second or two difference is ok). Now "make login"
should work also.
If the DNS_SERVER DATE is still out of sync, read below.
Syncing guest clock (guest vm)
[root@centos1]# sudo service ntpdate restart; date
You will need to do this every time your host computer is put to
sleep. Your host computer will sync its time automatically using its
built-in battery-powered clock (and using an external NTP server if
configured to do so and if it's connected to the Internet), but the
guest VM must be synced manually via the above step. (Perhaps there's
a way to have it automatically sync periodically but I am not aware of
it).
If you get an error message from ntpdate such as "no server suitable
for synchronization found", try running with -d for more information:
[root@centos1]# sudo ntpdate -d 172.16.175.1
Where "172.16.175.1" is the host-only IP of your host machine. If your
debugging output shows "172.16.175.1: Server dropped: strata too
high", then your host OS's ntpd server is probably not synced with its
own external time source. Try restarting ntpd on your host OS:
$ sudo killall ntpd
Wait a few seconds and make sure that ntpd is running on the host OS,
then try again on the guest OS to do:
[root@centos1]# sudo ntpdate -d 172.16.175.1
Jump to Line
Something went wrong with that request. Please try again.