
base fork: facebookarchive/hadoop-20
base: e162b0dbe5
...
head fork: facebookarchive/hadoop-20
compare: c50b7e148d
Commits on Jul 07, 2012
rvadali Prevent unnecessary DNS lookups
Summary:
1. Do not use InetSocketAddress as the key for the cache: the constructor does DNS lookups.
   Just use a string of the form host:port as the key (see the sketch after this list).
2. CoronaJobTracker.getTaskTrackerClient should call
   TrackerClientCache.getClient, not TrackerClientCache.createClient.
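
A minimal sketch of the keying idea, with a hypothetical generic cache (the real TrackerClientCache is Corona-specific; the client type and createClient body here are illustrative):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: key the cache by a plain "host:port" string so that no
    // InetSocketAddress (and hence no DNS lookup) is created just to
    // probe the cache.
    public class TrackerClientCacheSketch<C> {
        private final Map<String, C> clients = new HashMap<String, C>();

        // Callers (e.g. getTaskTrackerClient) must go through this method,
        // not createClient, or the cache is silently bypassed.
        public synchronized C getClient(String host, int port) {
            String key = host + ":" + port; // cheap string key, no resolution
            C client = clients.get(key);
            if (client == null) {
                client = createClient(host, port); // connect only on a miss
                clients.put(key, client);
            }
            return client;
        }

        // The only place a resolved address (a DNS lookup) is needed.
        protected C createClient(String host, int port) {
            throw new UnsupportedOperationException("illustrative stub");
        }
    }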

Test Plan:
unit tests

Revert Plan:

Reviewers: dms, aching, pyang

Reviewed By: dms

Task ID: 1148515
e3c2ab5
dikang [raid] make the DistributedRaidFileSystem rename the parity file together with the source file.

Summary: Codec should be initialized correctly.

Reviewers: weiyan, hkuang

Reviewed By: hkuang
493b58b
pritam Fix a bug in safe block counting for RBWs.
Summary:
There was a bug in the safe block count while processing RBWs.
If an RBW block was reported twice by the same datanode while the NN was
in safemode, the NN would incorrectly increment the safe block count
twice.
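
A sketch of one way to make the count idempotent per datanode (hypothetical names, not the actual NameNode code):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Sketch: remember which datanodes reported each block, so a repeated
    // RBW report from the same node cannot bump the safe count twice.
    public class SafeBlockTracker {
        private final Map<Long, Set<String>> reportedBy =
            new HashMap<Long, Set<String>>();
        private long safeBlockCount = 0;

        public synchronized void blockReported(long blockId, String datanodeId,
                                               int safeReplicaThreshold) {
            Set<String> nodes = reportedBy.get(blockId);
            if (nodes == null) {
                nodes = new HashSet<String>();
                reportedBy.put(blockId, nodes);
            }
            // add() is false for a duplicate report from the same datanode,
            // so the count moves only on the first report per node.
            if (nodes.add(datanodeId) && nodes.size() == safeReplicaThreshold) {
                safeBlockCount++;
            }
        }

        public synchronized long getSafeBlockCount() {
            return safeBlockCount;
        }
    }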

Test Plan:
1) Unit test added
2) All unit tests.

Revert Plan:

Reviewers: hkuang, tomasz, weiyan

Reviewed By: hkuang

Task ID: 1144182
840cdde
dragos DFSOutputStream.closeThreads now throws InterruptedIOException
Summary:
Changed the DFSOutputStream.closeThreads method to throw an
InterruptedIOException instead of an IOException.

Test Plan: By running "ant test-hdfs".

Reviewers: sdong, hkuang, pkhemani

Reviewed By: pkhemani

CC: pkhemani

Task ID: 1109322
10ebe79
dragos Adding an ODS counter for FSCK operations
Summary:
Added in NamenodeMetrics a new TimeVaryingLongMetric
for each fsck() method call.

Test Plan:
Running "ant test-dfs" (still in progress).
Will update this section after more testing is done.

Reviewers: sdong, hkuang

Reviewed By: sdong

Task ID: 1152886
7b0c8e9
sdong DFSClient doesn't retry block recovery on "block already committed exception"

Summary: Block recovery will never succeed if a block is already committed. It's just a waste of time for clients to retry. By passing this information from namenodes all the way to clients, clients can avoid some retries.
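
A sketch of the client-side consequence, assuming a dedicated exception type (the names here are hypothetical):

    import java.io.IOException;

    // Sketch: the retry loop gives up immediately when the namenode reports
    // the block as already committed, instead of burning the retry budget.
    public class RecoveryRetrySketch {
        /** Hypothetical callback performing one recovery attempt. */
        public interface BlockRecoverer {
            void recoverBlock() throws IOException;
        }

        /** Hypothetical marker exception propagated from the namenode. */
        public static class BlockAlreadyCommittedException extends IOException {
            public BlockAlreadyCommittedException(String msg) { super(msg); }
        }

        public static void recoverWithRetries(BlockRecoverer recoverer,
                                              int maxRetries) throws IOException {
            for (int attempt = 1; attempt <= maxRetries; attempt++) {
                try {
                    recoverer.recoverBlock();
                    return; // success
                } catch (BlockAlreadyCommittedException e) {
                    throw e; // can never succeed; do not retry
                } catch (IOException e) {
                    if (attempt == maxRetries) throw e; // transient: retry
                }
            }
        }
    }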

Test Plan: ant test

Reviewers: hkuang, pkhemani

Reviewed By: hkuang

Task ID: 1135383
f71ae4f
hkuang Add a raidshell command to raid a list of paths
Summary:
Add a command in this format:
-distRaid <raid_policy_name> <path1> ... <pathn>

that submits a map/reduce job to raid the given list of paths. A path can be a file or a directory. If it is a directory, only its child files will be raided.

Test Plan: Add a unit test in TestRaidShell

Reviewers: weiyan, dikang

Reviewed By: weiyan
b5bac3b
sdong HDFS: recounting INode count and disk space before exiting safe mode
Summary:
The Standby Namenode's disk space quota is not accurate, as it misses the addAdditionalBlock operations applied on the primary. So the numbers need to be recounted before it becomes primary.

One thing I'm not sure about is the cost of the method. Shall we complicate the code a little to avoid one unnecessary recounting? Or we can do it as a follow-up.

Test Plan: Deploy in cluster.

Reviewers: hkuang, tomasz

Reviewed By: hkuang

Task ID: 1127167
88b31ba
pritam Fix FastCopy bug regarding aligning datanodes for locality.
Summary:
FastCopy had a bug where it sorted both datanode lists for
locality. This would cause problems in cases where we had
src = {b, a, c} and dst = {b, c, d}. The final two lists would be:

a, b, c and
b, c, d

instead of something like

b, c, a
b, c, d

So the copies would incorrectly be performed from a->b, b->c and c->d.
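
A sketch of the alignment idea (a hypothetical helper, not the actual FastCopy code): keep the nodes shared by both lists in matching positions and append the rest, rather than sorting both lists.

    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    public class ReplicaAlignerSketch {
        // Reorder src so shared nodes come first, in dst's order, lining up
        // src[i] == dst[i] where possible so those copies are node-local.
        public static List<String> alignSources(List<String> src,
                                                List<String> dst) {
            Set<String> remaining = new LinkedHashSet<String>(src);
            List<String> aligned = new ArrayList<String>();
            for (String node : dst) {
                if (remaining.remove(node)) {
                    aligned.add(node); // shared node, keeps dst's order
                }
            }
            aligned.addAll(remaining); // source-only nodes go last
            return aligned;
        }
    }

With src = {b, a, c} and dst = {b, c, d}, this yields {b, c, a}, so the copies become b->b, c->c and a->d: the first two are local.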

Test Plan:
Unit tests added.

Revert Plan:

Reviewers: hkuang, sdong

Reviewed By: hkuang
0c1cbff
rvadali Attempts for a task should go to different threads.
Summary:
The assignment of task attempt -> worker thread in CoronaTaskLauncher
should be such that different attempts of the task go to different
threads.
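
A sketch of the assignment rule (hypothetical names; the real CoronaTaskLauncher uses its own id types):

    // Folding the attempt number into the thread choice moves each retry of
    // a task to a different worker thread than the previous attempt, so a
    // retry is not queued behind a stuck earlier attempt.
    public class LauncherThreadChooserSketch {
        private final int numThreads;

        public LauncherThreadChooserSketch(int numThreads) {
            this.numThreads = numThreads;
        }

        public int threadFor(int taskId, int attemptNumber) {
            return Math.floorMod(taskId + attemptNumber, numThreads);
        }
    }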

Test Plan:
unit tests

Revert Plan:

Reviewers: dms, aching, pyang

Reviewed By: dms

Task ID: 1148515
8754f5c
rvadali Prevent CM -> Job Tracker timeouts.
Summary:
The cluster manager has a timeout on getting a response from the job
tracker on thrift calls. If the job tracker is doing something time
intensive and is not able to complete the thrift call, it can be timed
out.

This patch tries to prevent this situation from occurring. We add a
SessionDriverService.Iface implementation that simply queues up the
calls from the Cluster Manager during the thrift call. This will
guarantee quick response times to the cluster manager. There is a
separate thread to take the calls from the queue and execute them
in-order.

Keeping the calls in the same queue is important since that will reflect
the sequence of calls that the Cluster Manager makes.
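
A sketch of the queue-and-drain shape (a hypothetical handler; the real class implements SessionDriverService.Iface):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch: the thrift-facing method only enqueues work and returns, so
    // the Cluster Manager's call never blocks on slow processing; a single
    // drainer thread applies the calls in arrival order.
    public class QueuedCallHandlerSketch {
        private final BlockingQueue<Runnable> calls =
            new LinkedBlockingQueue<Runnable>();

        public QueuedCallHandlerSketch() {
            Thread drainer = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            calls.take().run(); // one queue preserves CM order
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "CM-call-drainer");
            drainer.setDaemon(true);
            drainer.start();
        }

        // Example thrift-facing method: returns as soon as the call is queued.
        public void grantResource(final String resource) {
            calls.add(new Runnable() {
                public void run() {
                    // ... the actual, possibly slow, processing ...
                }
            });
        }
    }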

Test Plan:
unit tests

Revert Plan:

Reviewers: dms, aching, pyang

Reviewed By: dms
db64d91
aching NioServerSocketChannelFactory needs the maximum number of worker threads.

Summary: The worker thread factory allowed enough maximum threads, but netty defaults to number of cores * 2 workers unless a maximum is passed explicitly. This is likely not adequate for our clusters.

Test Plan: Passed all corona unit tests.

Reviewers: dms, rvadali, pyang

Reviewed By: rvadali

CC: mattwkelly

Task ID: 1152668
bc5947b
pritam Make FastCopy work with Avatar.
Summary:
FastCopy was broken even with regular Avatar (no failovers) since it
relied on the uri from the filesystem interface to determine the
namenode address. In the case of Avatar this uri was the logical name
and not the real physical address of the machine. I've changed FastCopy
so that it works with AvatarNodes before and after a failover. FastCopy
still doesn't work during a failover, but I feel this is not a necessity
right now, since FastCopy is used as part of MR jobs, which can retry
failed jobs during a failover and then correctly run FastCopy again
after the failover is done.

Test Plan:
1) TestFastCopy*
2) TestAvatarFastCopy

Revert Plan:

Reviewers: hkuang, sdong

Reviewed By: hkuang

Task ID: 784627
62cc922
tomasz Change exception type from Runtime to IOException in Standby
Summary:
It looks like in testDoubleFailover, rollFSImage sometimes succeeds,
but since we are performing a failover, the responder might already be
down. It's safer to throw IOException in the diff'ed line, since this
will not result in the standby being shut down.

Test Plan: ant test TestAvatarTxIds

Reviewers: hkuang, pritam

Reviewed By: pritam
0ba84de
hkuang Allow -distRaid command to raid all files in the subtree
Summary:
- distRaid raids all files in the subtree rooted at the given directory
- add a test case for this
- validate input paths

Test Plan: TestRaidShell

Reviewers: weiyan

Reviewed By: weiyan
f954128
rvadali For tests, reduce netty threadpool size
Summary: Use 10 threads per netty server. Otherwise we get OOMs.

Test Plan:
unit tests

Revert Plan:

Reviewers: dikang

Reviewed By: dikang
34d80ef
rvadali Fix build and use static constant
Summary: Fix build and use static constant

Test Plan:
ran MiniCoronaRunJob

Revert Plan:

Reviewers: dikang

Reviewed By: dikang
cc6a14d
rvadali Fix bug in detecting dead job tracker.
Summary:
The job tracker reporter thread in the corona task tracker should
periodically attempt a heartbeat to the corona job tracker to detect
a dead job tracker. The job tracker reporter has logic to heartbeat
every 3 minutes even if there are no tasks running for that job, but
there is a bug in that logic which is fixed by this diff (see the
sketch after the list below).

Also:
1. Set a name for the job tracker reporter thread. This will help in
   debugging.
2. Prefix log messages from the job tracker reporter with the reporter name.
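
A sketch of the reporter loop (hypothetical names; the real reporter lives in the corona task tracker):

    import java.io.IOException;

    // Sketch: heartbeat when tasks are running, but also fall back to a
    // periodic heartbeat when idle; the idle branch is what detects a dead
    // job tracker for task-less jobs.
    public class JobTrackerReporterSketch implements Runnable {
        private static final long IDLE_HEARTBEAT_INTERVAL_MS = 3 * 60 * 1000L;

        /** Hypothetical minimal view of the job tracker connection. */
        public interface HeartbeatTarget {
            boolean hasRunningTasks();
            void heartbeat() throws IOException;
        }

        private final HeartbeatTarget jobTracker;
        private final String name; // names the thread and prefixes log lines
        private long lastHeartbeat = 0;

        public JobTrackerReporterSketch(HeartbeatTarget jobTracker, String name) {
            this.jobTracker = jobTracker;
            this.name = name;
        }

        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    long now = System.currentTimeMillis();
                    if (jobTracker.hasRunningTasks()
                            || now - lastHeartbeat >= IDLE_HEARTBEAT_INTERVAL_MS) {
                        try {
                            jobTracker.heartbeat();
                            lastHeartbeat = now;
                        } catch (IOException e) {
                            System.err.println(name + ": job tracker appears dead: " + e);
                            return;
                        }
                    }
                    Thread.sleep(1000);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }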

Test Plan:
unit tests

Revert Plan:

Reviewers: dms, aching, pyang

Reviewed By: dms
f01e2af
rvadali Make bin/hadoop/job -killJob work
Summary:
Make CoronaJobTracker.killJob() handle job IDs that belong to some other
job. We can then use CoronaClient to kill the session.
While testing this I found that we always get a session ID when we
create a CoronaJobTracker object, even if all we do is kill another job.
So in this change, I also changed the logic so that getNewJobID() does
the work of getting a new session ID.

Test Plan:
ran unit tests, tested bin/hadoop/job -killJob on a sleep job

Revert Plan:

Reviewers: dms, aching, pyang

Reviewed By: dms

Task ID: 1140268
577b96a
pritam Remove synchronization from getListenerAddress.
Summary:
The primary avatar had a bug where it would enter a deadlock
since Server.waitForHandlers() is synchronized and would take the lock
and wait for all handlers and then a RPC handler could call
getListenerAddress() and wait for the lock on the Server object. Looks
like getListenerAddress() should not be synchronized since it does not
modify any state.
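
A simplified sketch of the deadlock shape and the fix (the real class is Hadoop's IPC Server):

    import java.net.InetSocketAddress;
    import java.util.List;

    public class ServerSketch {
        private final InetSocketAddress listenerAddress; // fixed at startup
        private final List<Thread> handlerThreads;

        public ServerSketch(InetSocketAddress addr, List<Thread> handlers) {
            this.listenerAddress = addr;
            this.handlerThreads = handlers;
        }

        // The fix: no 'synchronized' here, since only immutable state is
        // read. When this was synchronized, a handler calling it while
        // waitForHandlers() held the monitor could never proceed, and
        // waitForHandlers() in turn never saw that handler exit: deadlock.
        public InetSocketAddress getListenerAddress() {
            return listenerAddress;
        }

        // Holds the monitor for the entire wait, as in the reported bug.
        public synchronized void waitForHandlers() throws InterruptedException {
            for (Thread handler : handlerThreads) {
                handler.join();
            }
        }
    }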

Test Plan: 1) All unit tests.

Reviewers: hkuang, weiyan, tomasz

Reviewed By: hkuang

Task ID: 1155907
a2c6a8b
pritam Testcase to reproduce Standby Failure
Test Plan: Adding a unit test.

Reviewers: hkuang

Reviewed By: hkuang

Task ID: 1136885
7ea9178
pritam Option to force failover for Avatar.
Summary:
Sometimes the primary Avatar does not shut down cleanly and we
need an option to force a failover without any transaction id checking.
This is for the case where an operator has inspected the edit files on
the filer and the local disk and can then force a failover without any
checking.

Test Plan:
1) Unit tests added.
2) All avatar unit tests.

Revert Plan:

Reviewers: hkuang, tomasz, weiyan

Reviewed By: hkuang
af1226b
dikang [dfsclient] DFSClient will determine the retry count according to the block's replication factor.

Summary:
So, for Reed-Solomon raided files with replication factor 1, we will
only retry once when choosing a datanode. This will speed up Raid read reconstruction.
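
A sketch of the policy (hypothetical names and default; the real values live in DFSClient):

    public class DatanodeRetryPolicySketch {
        private static final int DEFAULT_RETRIES = 3; // hypothetical default

        // With replication 1 there is no alternative datanode to try, so a
        // single retry suffices before falling back to raid reconstruction.
        public static int retriesFor(short replication) {
            return Math.min(DEFAULT_RETRIES, Math.max(1, (int) replication));
        }
    }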

Test Plan: test locally.

Reviewers: hkuang, weiyan, sdong

Reviewed By: hkuang
8c79547
tomasz Add checkpoint status to webui
Summary:
This patch adds checkpoint status at the standby to dfsclusterhealth web
ui, as well as the ingest status. We might want to clean the
primary/standby specific information and put it on the corresponding
dfshealth site.

Test Plan: manual

Reviewers: hkuang, weiyan, pritam

Reviewed By: weiyan
9ea5270
gauravmenghani Putting ClusterManager into Safe Mode
Summary:
For now, we are able to set and unset Safe Mode. We can issue the command through coronaadmin to set Safe Mode, which in turn asks the ClusterManager to go into Safe Mode. The ClusterManager throws SafeModeException on any new requests while it is in Safe Mode, and communicates to the ProxyJobTracker that it is going into Safe Mode.

CoronaJobTrackers learn that the ClusterManager is in Safe Mode when they contact it. If they are not able to connect to the ClusterManager while it is down for an upgrade, they ask the ProxyJobTracker whether the ClusterManager has gone down for an upgrade. While in Safe Mode, the ClusterManager does not time out any of the sessions or nodes.
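
A sketch of the gatekeeping shape (hypothetical names, not the actual Corona ClusterManager API):

    import java.io.IOException;

    public class ClusterManagerSketch {
        private volatile boolean safeMode = false;

        /** Hypothetical stand-in for Corona's SafeModeException. */
        public static class SafeModeException extends IOException {
            public SafeModeException() { super("ClusterManager is in Safe Mode"); }
        }

        // Flipped by coronaadmin -setSafeMode / -unsetSafeMode; the real
        // change also tells the ProxyJobTracker about the transition.
        public void setSafeMode(boolean on) {
            safeMode = on;
        }

        // Every externally driven request checks the flag first; callers
        // catch the exception and wait for the CM to leave Safe Mode.
        public void requestResource(String sessionHandle) throws IOException {
            checkSafeMode();
            // ... normal resource granting ...
        }

        private void checkSafeMode() throws SafeModeException {
            if (safeMode) {
                throw new SafeModeException();
            }
        }
    }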

Test Plan:
Turn on the Safe Mode by:
hadoop coronaadmin -setSafeMode

Turn off the Safe Mode by:
hadoop coronaadmin -unsetSafeMode

1. If you turn on Safe Mode while a job is running, it will pause and wait for the CM to come back up in the normal state. If you turn off Safe Mode without
bringing down the ClusterManager, execution will resume normally. Otherwise, the job currently fails because an InvalidSessionHandle exception is thrown; this will be
resolved once we persist the state of the ClusterManager and restore it when it comes back up.

2. If a job is started after the Safe Mode is turned on, it will wait for the CM to come back up in the normal state.

Reviewers: dms, aching, rvadali

Reviewed By: dms

Task ID: 1112019
f6505cf
weiyan [Raid Dir-Raid] Web UI changes for directory raid
Summary:
For directory raid, we may be more interested in the number of RAIDED,
NOT_RAIDED_TOO_NEW, NOT_RAIDED_TOO_SMALL, NOT_RAIDED_BUT_SHOULD directories
than that of files.

Change the webui to adopt the codec design, and extend the counters to include
directory counts.

The counters will be updated by the StatisticsCollector, which is another task.

Test Plan: deploy

Reviewers: dikang, hkuang

Reviewed By: hkuang

Task ID: 1078275
a91ff6f
weiyan Fix non-compiling code 2d2f94f
weiyan [Raid Dir-Raid] PlacementMonitor changes
Summary:
When PurgeMonitor scans the parity files, it will get the source file/directory
for each parity file and then compute the histogram of the blocks for them.
Change PlacementMonitor to get the block information for directories and
refactor BlockAndDatanodeResolver to work with multiple files.

Test Plan: Add a unit test testDirectoryPlacementMonitor

Reviewers: dikang, hkuang

Reviewed By: hkuang

Task ID: 1078264
75a164f
weiyan [Raid Dir-Raid] More tests in TestRaidNode and fix TriggerMonitor
Summary:
In the TriggerMonitor, when we select raid candidates based on the file list,
we should verify that parity files exist when the source files' replication
already matches the target replication.

Change TestRaidNode to test the situation where we have directory raid and
file raid policies at the same time.

Test Plan: ant test

Reviewers: dikang, hkuang

Reviewed By: hkuang

Task ID: 1078000
d00755a
dikang [raid] the DistributedRaidFileSystem should deal with multiple codecs in one directory and with har directories.

Summary:
If a directory contains files raided using both XOR and RS, we should
move all the parity files together with the source directory, as the
delete and undelete methods do (see the sketch below).

For the har case, we will throw an IOException.
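
A sketch of the rename idea, under assumed parity-path conventions (e.g. XOR parity under /raid, RS parity under /raidrs; the real code gets these from Codec):

    import java.io.IOException;
    import java.util.List;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ParityRenameSketch {
        // Rename the source, then rename each codec's parity counterpart so
        // parity stays consistent with the data, as delete/undelete do.
        public static boolean renameWithParity(FileSystem fs, Path src, Path dst,
                                               List<String> parityPrefixes)
                throws IOException {
            if (!fs.rename(src, dst)) {
                return false;
            }
            for (String prefix : parityPrefixes) {
                Path paritySrc = new Path(prefix + src.toUri().getPath());
                Path parityDst = new Path(prefix + dst.toUri().getPath());
                if (fs.exists(paritySrc)) {
                    fs.mkdirs(parityDst.getParent());
                    fs.rename(paritySrc, parityDst);
                }
            }
            return true;
        }
    }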

Test Plan:
ant test, run locally on dev server.

Will come up with a unit test for the har case.

Reviewers: weiyan, hkuang

Reviewed By: weiyan
a57e68a
hkuang Make sure to bring down AvatarDataNode when DataXceiverServer is down
Summary:
When DataXceiverServer goes down, it sets DataNode#shouldRun to false.
This diff makes sure that ServicePair threads go down as well so DataNode will go down.
I also removed the variable "shutdown" and instead use shouldRun consistently everywhere.
Added a unit test that injects a RuntimeException into the DataXceiverServer thread and makes sure that the datanode goes down.

Test Plan: TestAvatarCleanShutdown

Reviewers: tomasz

Reviewed By: tomasz

Task ID: 1114372
373374b
rvadali Map tasks do not need to connect to corona JT directly.
Summary:
The purpose of DirectTaskUmbilicalProtocol is to enable reducers to get
task completion events directly from the corona job tracker.
The mappers do not need to go through this. If they do, a large number
of concurrently running mappers would start connecting to the corona JT and
could get connection errors, because that would overwhelm the corona JT.

Test Plan: deploy

Reviewers: gauravmenghani, aching, pyang, dms

Task ID: 1166249
e8cdcfb
pritam Fix TestAvatarFailover timeouts.
Summary:
TestAvatarFailover had an issue in double failovers: when we restart
the standby, it might back off full block reports, and due to a large
block report delay the blocks wouldn't be reported in time, so the
standby would not be able to fail over. Reducing the full block report
delay for the double failover tests.

Test Plan: Run it

Reviewers: tomasz, hkuang

Reviewed By: hkuang
2fa229a
dikang [Raid DFSClient] fix the DFSClient retries unit test.
Summary:
Since we changed the retries in DFSClient, we need to change the unit
tests too.

Test Plan: unit test

Reviewers: hkuang, sdong, weiyan, pritam

Reviewed By: pritam
c50b7e1