Hadoop Change Log
Release 0.20.1 - Unreleased
INCOMPATIBLE CHANGES
HADOOP-5726. Remove pre-emption from capacity scheduler code base.
(Rahul Kumar Singh via yhemanth)
HADOOP-5881. Simplify memory monitoring and scheduling related
configuration. (Vinod Kumar Vavilapalli via yhemanth)
NEW FEATURES
HADOOP-6080. Introduce -skipTrash option to rm and rmr.
(Jakob Homan via shv)
HADOOP-3315. Add a new, binary file format, TFile. (Hong Tang via cdouglas)
IMPROVEMENTS
HADOOP-5711. Change Namenode file close log to info. (szetszwo)
HADOOP-5736. Update the capacity scheduler documentation for features
like memory based scheduling, job initialization and removal of pre-emption.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4674. Fix fs help messages for -test, -text, -tail, -stat
and -touchz options. (Ravi Phulari via szetszwo)
HADOOP-4372. Improves the way history filenames are obtained and manipulated.
(Amar Kamat via ddas)
HADOOP-5897. Add name-node metrics to capture java heap usage.
(Suresh Srinivas via shv)
HDFS-438. Improve help message for space quota command. (Raghu Angadi)
MAPREDUCE-767. Remove the dependence on the CLI 2.0 snapshot.
(Amar Kamat via ddas)
HDFS-1111. Introduce getCorruptFileBlocks() for fsck. (Sriram Rao via shv)
OPTIMIZATIONS
BUG FIXES
HADOOP-5691. Makes org.apache.hadoop.mapreduce.Reducer concrete class
instead of abstract. (Amareshwari Sriramadasu via sharad)
HADOOP-5646. Fixes a problem in TestQueueCapacities.
(Vinod Kumar Vavilapalli via ddas)
HADOOP-5655. TestMRServerPorts fails on java.net.BindException. (Devaraj
Das via hairong)
HADOOP-5654. TestReplicationPolicy.<init> fails on java.net.BindException.
(hairong)
HADOOP-5688. Fix HftpFileSystem checksum path construction. (Tsz Wo
(Nicholas) Sze via cdouglas)
HADOOP-5213. Fix Null pointer exception caused when bzip2compression
was used and user closed a output stream without writing any data.
(Zheng Shao via dhruba)
HADOOP-5718. Remove the check for the default queue in capacity scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5719. Remove jobs that failed initialization from the waiting queue
in the capacity scheduler. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4744. Attaching another fix to the jetty port issue. The TaskTracker
kills itself if it ever discovers that the port to which jetty is actually
bound is invalid (-1). (ddas)
HADOOP-5349. Fixes a problem in LocalDirAllocator to check for the return
path value that is returned for the case where the file we want to write
is of an unknown size. (Vinod Kumar Vavilapalli via ddas)
HADOOP-5636. Prevents a job from going to RUNNING state after it has been
KILLED (this used to happen when the SetupTask would come back with a
success after the job has been killed). (Amar Kamat via ddas)
HADOOP-5641. Fix a NullPointerException in capacity scheduler's memory
based scheduling code when jobs get retired. (yhemanth)
HADOOP-5828. Use absolute path for mapred.local.dir of JobTracker in
MiniMRCluster. (yhemanth)
HADOOP-4981. Fix capacity scheduler to schedule speculative tasks
correctly in the presence of High RAM jobs.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5210. Solves a problem in the progress report of the reduce task.
(Ravi Gummadi via ddas)
HADOOP-5850. Fixes a problem to do with not being able to run jobs with
0 maps/reduces. (Vinod K V via ddas)
HADOOP-5728. Fixed FSEditLog.printStatistics IndexOutOfBoundsException.
(Wang Xu via johan)
HADOOP-4626. Correct the API links in hdfs forrest doc so that they
point to the same version of hadoop. (szetszwo)
HADOOP-5883. Fixed tasktracker memory monitoring to account for
momentary spurts in memory usage due to java's fork() model.
(yhemanth)
HADOOP-5539. Fixes a problem to do with not preserving intermediate
output compression for merged data.
(Jothi Padmanabhan and Billy Pearson via ddas)
HADOOP-5932. Fixes a problem in capacity scheduler in computing
available memory on a tasktracker.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5648. Fixes a build issue in not being able to generate gridmix.jar
in hadoop binary tarball. (Giridharan Kesavan via gkesavan)
HADOOP-5908. Fixes a problem to do with ArithmeticException in the
JobTracker when there are jobs with 0 maps. (Amar Kamat via ddas)
HADOOP-5924. Fixes a corner case problem to do with job recovery with
empty history files. Also, after a JT restart, sends KillTaskAction to
tasks that report back but the corresponding job hasn't been initialized
yet. (Amar Kamat via ddas)
HADOOP-5882. Fixes a reducer progress update problem for new mapreduce
api. (Amareshwari Sriramadasu via sharad)
HADOOP-5746. Fixes a corner case problem in Streaming, where if an exception
happens in MROutputThread after the last call to the map/reduce method, the
exception goes undetected. (Amar Kamat via ddas)
HADOOP-5884. Fixes accounting in capacity scheduler so that high RAM jobs
take more slots. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5937. Correct a safemode message in FSNamesystem. (Ravi Phulari
via szetszwo)
HADOOP-5869. Fix bug in assignment of setup / cleanup task that was
causing TestQueueCapacities to fail.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5921. Fixes a problem in the JobTracker where it sometimes never used
to come up due to a system file creation on JobTracker's system-dir failing.
This problem would sometimes show up only when the FS for the system-dir
(usually HDFS) is started at nearly the same time as the JobTracker.
(Amar Kamat via ddas)
HADOOP-5920. Fixes a testcase failure for TestJobHistory.
(Amar Kamat via ddas)
HDFS-26. Better error message to users when commands fail because of
lack of quota. Allow quota to be set even if the limit is lower than
current consumption. (Boris Shkolnik via rangadi)
MAPREDUCE-2. Fixes a bug in KeyFieldBasedPartitioner in handling empty
keys. (Amar Kamat via sharad)
MAPREDUCE-130. Delete the jobconf copy from the log directory of the
JobTracker when the job is retired. (Amar Kamat via sharad)
MAPREDUCE-657. Fix hardcoded filesystem problem in CompletedJobStatusStore.
(Amar Kamat via sharad)
MAPREDUCE-179. Update progress in new RecordReaders. (cdouglas)
MAPREDUCE-124. Fix a bug in failure handling of abort task of
OutputCommitter. (Amareshwari Sriramadasu via sharad)
HADOOP-6139. Fix the FsShell help messages for rm and rmr. (Jakob Homan
via szetszwo)
HADOOP-6141. Fix a few bugs in 0.20 test-patch.sh. (Hong Tang via
szetszwo)
HADOOP-6145. Fix FsShell rm/rmr error messages when there is a FNFE.
(Jakob Homan via szetszwo)
MAPREDUCE-565. Fix partitioner to work with new API. (Owen O'Malley via
cdouglas)
MAPREDUCE-465. Fix a bug in MultithreadedMapRunner. (Amareshwari
Sriramadasu via sharad)
MAPREDUCE-18. Puts some checks to detect cases where jetty serves up
incorrect output during shuffle. (Ravi Gummadi via ddas)
MAPREDUCE-735. Fixes a problem in the KeyFieldHelper to do with
the end index for some inputs (Amar Kamat via ddas)
HADOOP-6150. Users should be able to instantiate comparator using TFile
API. (Hong Tang via rangadi)
MAPREDUCE-383. Fix a bug in Pipes combiner due to bytes count not
getting reset after the spill. (Christian Kunz via sharad)
MAPREDUCE-40. Keep memory management backwards compatible for job
configuration parameters and limits. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-796. Fixes a ClassCastException in an exception log in
MultithreadedMapRunner. (Amar Kamat via ddas)
HDFS-525. The SimpleDateFormat object in ListPathsServlet is not thread
safe. (Suresh Srinivas via szetszwo)
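A common way to avoid this kind of race is to give each thread its own formatter. The sketch below is illustrative only and is not the ListPathsServlet code: the class name, the date pattern, and the helper method are all assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of Hadoop: one SimpleDateFormat per thread.
class PerThreadDateFormat {
  // SimpleDateFormat keeps mutable state while formatting and parsing, so a
  // single shared instance must not be used by several threads at once.
  private static final ThreadLocal<SimpleDateFormat> FORMAT =
      new ThreadLocal<SimpleDateFormat>() {
        @Override
        protected SimpleDateFormat initialValue() {
          // The pattern and time zone are assumptions chosen for the example.
          SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
          df.setTimeZone(TimeZone.getTimeZone("UTC"));
          return df;
        }
      };

  // Formats with the calling thread's private formatter instance.
  static String format(long millis) {
    return FORMAT.get().format(new Date(millis));
  }

  public static void main(String[] args) {
    System.out.println(format(0L));
  }
}
```

Each servlet thread then calls format() without any locking, at the cost of one formatter object per thread.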
MAPREDUCE-838. Fixes a problem in the way commit of task outputs
happens. The bug was that even if commit failed, the task would
be declared as successful. (Amareshwari Sriramadasu via ddas)
MAPREDUCE-805. Fixes some deadlocks in the JobTracker due to the fact
the JobTracker lock hierarchy wasn't maintained in some JobInProgress
method calls. (Amar Kamat via ddas)
HDFS-167. Fix a bug in DFSClient that caused infinite retries on write.
(Bill Zeller via szetszwo)
HDFS-527. Remove unnecessary DFSClient constructors. (szetszwo)
MAPREDUCE-832. Reduce number of warning messages printed when
deprecated memory variables are used. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-745. Fixes a testcase problem to do with generation of JobTracker
IDs. (Amar Kamat via ddas)
MAPREDUCE-834. Enables memory management on tasktrackers when old
memory management parameters are used in configuration.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-818. Fixes Counters#getGroup API. (Amareshwari Sriramadasu
via sharad)
MAPREDUCE-807. Handles the AccessControlException during the deletion of
mapred.system.dir in the JobTracker. The JobTracker will bail out if it
encounters such an exception. (Amar Kamat via ddas)
HADOOP-6213. Remove commons dependency on commons-cli2. (Amar Kamat via
sharad)
MAPREDUCE-430. Fix a bug related to task getting stuck in case of
OOM error. (Amar Kamat via ddas)
Release 0.20.0 - 2009-04-15
INCOMPATIBLE CHANGES
HADOOP-4210. Fix findbugs warnings for equals implementations of mapred ID
classes. Removed public, static ID::read and ID::forName; made ID an
abstract class. (Suresh Srinivas via cdouglas)
HADOOP-4253. Fix various warnings generated by findbugs.
Following deprecated methods in RawLocalFileSystem are removed:
public String getName()
public void lock(Path p, boolean shared)
public void release(Path p)
(Suresh Srinivas via johan)
HADOOP-4618. Move http server from FSNamesystem into NameNode.
FSNamesystem.getNameNodeInfoPort() is removed.
FSNamesystem.getDFSNameNodeMachine() and FSNamesystem.getDFSNameNodePort()
replaced by FSNamesystem.getDFSNameNodeAddress().
NameNode(bindAddress, conf) is removed.
(shv)
HADOOP-4567. GetFileBlockLocations returns the NetworkTopology
information of the machines where the blocks reside. (dhruba)
HADOOP-4435. The JobTracker WebUI displays the amount of heap memory
in use. (dhruba)
HADOOP-4628. Move Hive into a standalone subproject. (omalley)
HADOOP-4188. Removes task's dependency on concrete filesystems.
(Sharad Agarwal via ddas)
HADOOP-1650. Upgrade to Jetty 6. (cdouglas)
HADOOP-3986. Remove static Configuration from JobClient. (Amareshwari
Sriramadasu via cdouglas)
JobClient::setCommandLineConfig is removed
JobClient::getCommandLineConfig is removed
JobShell, TestJobShell classes are removed
HADOOP-4422. S3 file systems should not create bucket.
(David Phillips via tomwhite)
HADOOP-4035. Support memory based scheduling in capacity scheduler.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-3497. Fix bug in overly restrictive file globbing with a
PathFilter. (tomwhite)
HADOOP-4445. Replace running task counts with running task
percentage in capacity scheduler UI. (Sreekanth Ramakrishnan via
yhemanth)
HADOOP-4631. Splits the configuration into three parts - one for core,
one for mapred and the last one for HDFS. (Sharad Agarwal via cdouglas)
HADOOP-3344. Fix libhdfs build to use autoconf and build the same
architecture (32 vs 64 bit) of the JVM running Ant. The libraries for
pipes, utils, and libhdfs are now all in c++/<os_osarch_jvmdatamodel>/lib.
(Giridharan Kesavan via nigel)
HADOOP-4874. Remove LZO codec because of licensing issues. (omalley)
HADOOP-4970. The full path name of a file is preserved inside Trash.
(Prasad Chakka via dhruba)
HADOOP-4103. NameNode keeps a count of missing blocks. It warns on
WebUI if there are such blocks. '-report' and '-metaSave' have extra
info to track such blocks. (Raghu Angadi)
HADOOP-4783. Change permissions on history files on the jobtracker
to be only group readable instead of world readable.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-5531. Removed Chukwa from Hadoop 0.20.0. (nigel)
NEW FEATURES
HADOOP-4575. Add a proxy service for relaying HsftpFileSystem requests.
Includes client authentication via user certificates and config-based
access control. (Kan Zhang via cdouglas)
HADOOP-4661. Add DistCh, a new tool for distributed ch{mod,own,grp}.
(szetszwo)
HADOOP-4709. Add several new features and bug fixes to Chukwa.
Added Hadoop Infrastructure Care Center (UI for visualizing data collected
by Chukwa)
Added FileAdaptor for streaming a small file in one chunk
Added compression to archive and demux output
Added unit tests and validation for agent, collector, and demux map
reduce job
Added database loader for loading demux output (sequence file) to jdbc
connected database
Added algorithm to distribute collector load more evenly
(Jerome Boulon, Eric Yang, Andy Konwinski, Ariel Rabkin via cdouglas)
HADOOP-4179. Add Vaidya tool to analyze map/reduce job logs for performance
problems. (Suhas Gogate via omalley)
HADOOP-4029. Add NameNode storage information to the dfshealth page and
move DataNode information to a separated page. (Boris Shkolnik via
szetszwo)
HADOOP-4348. Add service-level authorization for Hadoop. (acmurthy)
HADOOP-4826. Introduce admin command saveNamespace. (shv)
HADOOP-3063. BloomMapFile - fail-fast version of MapFile for sparsely
populated key space (Andrzej Bialecki via stack)
HADOOP-1230. Add new map/reduce API and deprecate the old one. Generally,
the old code should work without problem. The new api is in
org.apache.hadoop.mapreduce and the old classes in org.apache.hadoop.mapred
are deprecated. Differences in the new API:
1. All of the methods take Context objects that allow us to add new
methods without breaking compatibility.
2. Mapper and Reducer now have a "run" method that is called once and
contains the control loop for the task, which lets applications
replace it.
3. Mapper and Reducer by default are Identity Mapper and Reducer.
4. The FileOutputFormats use part-r-00000 for the output of reduce 0 and
part-m-00000 for the output of map 0.
5. The reduce grouping comparator now uses the raw compare instead of
object compare.
6. The number of maps in FileInputFormat is controlled by min and max
split size rather than min size and the desired number of maps.
(omalley)
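Points 1 to 3 of the HADOOP-1230 entry can be pictured with a small sketch. The classes below are simplified stand-ins written for illustration; they are not the org.apache.hadoop.mapreduce classes, and they leave out keys, generics, and all I/O.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for illustration only; NOT the Hadoop API.
class RunLoopSketch {

  // Stand-in for a Context: supplies input records and collects output.
  static class Context {
    private final Iterator<String> input;
    final List<String> output = new ArrayList<String>();
    private String current;

    Context(List<String> records) {
      this.input = records.iterator();
    }

    boolean nextValue() {
      if (!input.hasNext()) {
        return false;
      }
      current = input.next();
      return true;
    }

    String getCurrentValue() {
      return current;
    }

    void write(String value) {
      output.add(value);
    }
  }

  // Stand-in for a Mapper whose default map() is the identity function.
  static class Mapper {
    protected void setup(Context context) {}

    protected void map(String value, Context context) {
      context.write(value);
    }

    protected void cleanup(Context context) {}

    // Called once per task; an application may override it to replace the loop.
    public void run(Context context) {
      setup(context);
      while (context.nextValue()) {
        map(context.getCurrentValue(), context);
      }
      cleanup(context);
    }
  }

  // A subclass overrides only map(); run() and Context stay unchanged.
  static class UpperCaseMapper extends Mapper {
    @Override
    protected void map(String value, Context context) {
      context.write(value.toUpperCase());
    }
  }

  static List<String> runMapper(Mapper mapper, List<String> records) {
    Context context = new Context(records);
    mapper.run(context);
    return context.output;
  }

  public static void main(String[] args) {
    List<String> records = Arrays.asList("a", "b");
    System.out.println(runMapper(new Mapper(), records));          // [a, b]
    System.out.println(runMapper(new UpperCaseMapper(), records)); // [A, B]
  }
}
```

Because every callback receives the Context object, new accessors can be added to it later without changing the signature of map() or run().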
HADOOP-3305. Use Ivy to manage dependencies. (Giridharan Kesavan
and Steve Loughran via cutting)
IMPROVEMENTS
HADOOP-4565. Added CombineFileInputFormat to use data locality information
to create splits. (dhruba via zshao)
HADOOP-4749. Added a new counter REDUCE_INPUT_BYTES. (Yongqiang He via
zshao)
HADOOP-4234. Fix KFS "glue" layer to allow applications to interface
with multiple KFS metaservers. (Sriram Rao via lohit)
HADOOP-4245. Update to latest version of KFS "glue" library jar.
(Sriram Rao via lohit)
HADOOP-4244. Change test-patch.sh to check Eclipse classpath whether or not
it is run by Hudson. (szetszwo)
HADOOP-3180. Add name of missing class to WritableName.getClass
IOException. (Pete Wyckoff via omalley)
HADOOP-4178. Make the capacity scheduler's default values configurable.
(Sreekanth Ramakrishnan via omalley)
HADOOP-4262. Generate better error message when client exception has null
message. (stevel via omalley)
HADOOP-4226. Refactor and document LineReader to make it more readily
understandable. (Yuri Pradkin via cdouglas)
HADOOP-4238. When listing jobs, if scheduling information isn't available
print NA instead of empty output. (Sreekanth Ramakrishnan via johan)
HADOOP-4284. Support filters that apply to all requests, or global filters,
to HttpServer. (Kan Zhang via cdouglas)
HADOOP-4276. Improve the hashing functions and deserialization of the
mapred ID classes. (omalley)
HADOOP-4485. Add a compile-native ant task, as a shorthand. (enis)
HADOOP-4454. Allow # comments in slaves file. (Rama Ramasamy via omalley)
HADOOP-3461. Remove hdfs.StringBytesWritable. (szetszwo)
HADOOP-4437. Use Halton sequence instead of java.util.Random in
PiEstimator. (szetszwo)
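For readers unfamiliar with the Halton sequence, a minimal sketch follows. It is not the PiEstimator source: the class and method names are invented, and both the choice of bases 2 and 3 and the quarter-circle test are assumptions made for the example.

```java
// Hypothetical sketch of a 2-D Halton sequence used for a pi estimate.
class HaltonSketch {

  // Radical inverse: mirror the base-b digits of index around the radix point.
  static double radicalInverse(long index, int base) {
    double result = 0.0;
    double fraction = 1.0 / base;
    while (index > 0) {
      result += (index % base) * fraction;
      index /= base;
      fraction /= base;
    }
    return result;
  }

  // Estimates pi from the share of points that land inside the quarter of the
  // unit circle lying in the unit square.
  static double estimatePi(int samples) {
    long inside = 0;
    for (int i = 1; i <= samples; i++) {
      double x = radicalInverse(i, 2); // assumed base for the x coordinate
      double y = radicalInverse(i, 3); // assumed base for the y coordinate
      if (x * x + y * y <= 1.0) {
        inside++;
      }
    }
    return 4.0 * inside / samples;
  }

  public static void main(String[] args) {
    System.out.println(estimatePi(10000));
  }
}
```

Unlike java.util.Random, the sequence is deterministic and spreads points evenly over the square, so repeated runs give the same estimate.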
HADOOP-4572. Change INode and its sub-classes to package private.
(szetszwo)
HADOOP-4187. Does a runtime lookup for JobConf/JobConfigurable, and if
found, invokes the appropriate configure method. (Sharad Agarwal via ddas)
HADOOP-4453. Improve ssl configuration and handling in HsftpFileSystem,
particularly when used with DistCp. (Kan Zhang via cdouglas)
HADOOP-4583. Several code optimizations in HDFS. (Suresh Srinivas via
szetszwo)
HADOOP-3923. Remove org.apache.hadoop.mapred.StatusHttpServer. (szetszwo)
HADOOP-4622. Explicitly specify interpreter for non-native
pipes binaries. (Fredrik Hedberg via johan)
HADOOP-4505. Add a unit test to test faulty setup task and cleanup
task killing the job. (Amareshwari Sriramadasu via johan)
HADOOP-4608. Don't print a stack trace when the example driver gets an
unknown program to run. (Edward Yoon via omalley)
HADOOP-4645. Package HdfsProxy contrib project without the extra level
of directories. (Kan Zhang via omalley)
HADOOP-4126. Allow access to HDFS web UI on EC2 (tomwhite via omalley)
HADOOP-4612. Removes RunJar's dependency on JobClient.
(Sharad Agarwal via ddas)
HADOOP-4185. Adds setVerifyChecksum() method to FileSystem.
(Sharad Agarwal via ddas)
HADOOP-4523. Prevent too many tasks scheduled on a node from bringing
it down by monitoring for cumulative memory usage across tasks.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4640. Adds an input format that can split lzo compressed
text files. (johan)
HADOOP-4666. Launch reduces only after a few maps have run in the
Fair Scheduler. (Matei Zaharia via johan)
HADOOP-4339. Remove redundant calls from FileSystem/FsShell when
generating/processing ContentSummary. (David Phillips via cdouglas)
HADOOP-2774. Add counters tracking records spilled to disk in MapTask and
ReduceTask. (Ravi Gummadi via cdouglas)
HADOOP-4513. Initialize jobs asynchronously in the capacity scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4649. Improve abstraction for spill indices. (cdouglas)
HADOOP-3770. Add gridmix2, an iteration on the gridmix benchmark. (Runping
Qi via cdouglas)
HADOOP-4708. Add support for dfsadmin commands in TestCLI. (Boris Shkolnik
via cdouglas)
HADOOP-4758. Add a splitter for metrics contexts to support more than one
type of collector. (cdouglas)
HADOOP-4722. Add tests for dfsadmin quota error messages. (Boris Shkolnik
via cdouglas)
HADOOP-4690. fuse-dfs - create source file/function + utils + config +
main source files. (pete wyckoff via mahadev)
HADOOP-3750. Fix and enforce module dependencies. (Sharad Agarwal via
tomwhite)
HADOOP-4747. Speed up FsShell::ls by removing redundant calls to the
filesystem. (David Phillips via cdouglas)
HADOOP-4305. Improves the blacklisting strategy, whereby, tasktrackers
that are blacklisted are not given tasks to run from other jobs, subject
to the following conditions (all must be met):
1) The TaskTracker has been blacklisted by at least 4 jobs (configurable)
2) The TaskTracker has been blacklisted 50% more often than
the average (configurable)
3) The cluster has less than 50% trackers blacklisted
Once in 24 hours, a TaskTracker blacklisted for all jobs is given a chance.
Restarting the TaskTracker moves it out of the blacklist.
(Amareshwari Sriramadasu via ddas)
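The three conditions in the HADOOP-4305 entry can be read as a single predicate. The sketch below is hypothetical and is not JobTracker code: all names are invented, the thresholds are the defaults quoted in the entry, and treating "50% more" as "strictly greater than 1.5 times the average" is an assumption.

```java
// Hypothetical sketch of the cross-job blacklisting test; not JobTracker code.
class BlacklistSketch {
  static final int MIN_BLACKLISTING_JOBS = 4;     // condition 1 (configurable)
  static final double ABOVE_AVERAGE_FACTOR = 1.5; // condition 2 (configurable)
  static final double MAX_CLUSTER_FRACTION = 0.5; // condition 3

  // Returns true only when all three conditions hold.
  static boolean shouldBlacklistAcrossJobs(int trackerBlacklistCount,
                                           double averageBlacklistCount,
                                           int blacklistedTrackers,
                                           int totalTrackers) {
    boolean enoughJobs = trackerBlacklistCount >= MIN_BLACKLISTING_JOBS;
    // Assumption: strictly more than 1.5 times the cluster average.
    boolean wellAboveAverage =
        trackerBlacklistCount > ABOVE_AVERAGE_FACTOR * averageBlacklistCount;
    boolean clusterMostlyHealthy =
        blacklistedTrackers < MAX_CLUSTER_FRACTION * totalTrackers;
    return enoughJobs && wellAboveAverage && clusterMostlyHealthy;
  }

  public static void main(String[] args) {
    // 6 blacklistings, average 2.0, 10 of 100 trackers already blacklisted.
    System.out.println(shouldBlacklistAcrossJobs(6, 2.0, 10, 100)); // true
  }
}
```

The third condition acts as a safety valve: when half the cluster is already blacklisted, the problem is more likely with the jobs than with the trackers.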
HADOOP-4688. Modify the MiniMRDFSSort unit test to spill multiple times,
exercising the map-side merge code. (cdouglas)
HADOOP-4737. Adds the KILLED notification when jobs get killed.
(Amareshwari Sriramadasu via ddas)
HADOOP-4728. Add a test exercising different namenode configurations.
(Boris Shkolnik via cdouglas)
HADOOP-4807. Adds JobClient commands to get the active/blacklisted tracker
names. Also adds commands to display running/completed task attempt IDs.
(ddas)
HADOOP-4699. Remove checksum validation from map output servlet. (cdouglas)
HADOOP-4838. Added a registry to automate metrics and mbeans management.
(Sanjay Radia via acmurthy)
HADOOP-3136. Fixed the default scheduler to assign multiple tasks to each
tasktracker per heartbeat, when feasible. To ensure locality isn't hurt
too badly, the scheduler will not assign more than one off-switch task per
heartbeat. The heartbeat interval is also halved since the task-tracker is
fixed to no longer send out heartbeats on each task completion. A
slow-start for scheduling reduces is introduced to ensure that reduces
aren't started until a sufficient number of maps are done, else reduces of
jobs whose maps aren't scheduled might swamp the cluster.
Configuration changes to mapred-default.xml:
add mapred.reduce.slowstart.completed.maps
(acmurthy)
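The slow-start rule in the HADOOP-3136 entry can be sketched as a simple gate. This is a hypothetical illustration, not scheduler code: the names are invented, and it assumes that mapred.reduce.slowstart.completed.maps is read as the fraction of a job's maps that must finish before its reduces are scheduled.

```java
// Hypothetical sketch of a reduce slow-start gate; not Hadoop scheduler code.
class SlowStartSketch {

  // slowStartFraction stands in for mapred.reduce.slowstart.completed.maps,
  // assumed here to be a value between 0.0 and 1.0.
  static boolean canScheduleReduces(int finishedMaps, int totalMaps,
                                    double slowStartFraction) {
    // Round up so that a partially reached threshold does not count.
    int requiredMaps = (int) Math.ceil(slowStartFraction * totalMaps);
    return finishedMaps >= requiredMaps;
  }

  public static void main(String[] args) {
    System.out.println(canScheduleReduces(2, 10, 0.25)); // false
    System.out.println(canScheduleReduces(3, 10, 0.25)); // true
  }
}
```

With a fraction of 0.0 the gate is always open, which corresponds to starting reduces immediately.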
HADOOP-4545. Add example and test case of secondary sort for the reduce.
(omalley)
HADOOP-4753. Refactor gridmix2 to reduce code duplication. (cdouglas)
HADOOP-4909. Fix Javadoc and make some of the API more consistent in their
use of the JobContext instead of Configuration. (omalley)
HADOOP-4830. Add end-to-end test cases for testing queue capacities.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4980. Improve code layout of capacity scheduler to make it
easier to fix some blocker bugs. (Vivek Ratan via yhemanth)
HADOOP-4916. Make user/location of Chukwa installation configurable by an
external properties file. (Eric Yang via cdouglas)
HADOOP-4950. Make the CompressorStream, DecompressorStream,
BlockCompressorStream, and BlockDecompressorStream public to facilitate
non-Hadoop codecs. (omalley)
HADOOP-4843. Collect job history and configuration in Chukwa. (Eric Yang
via cdouglas)
HADOOP-5030. Build Chukwa RPM to install into configured directory. (Eric
Yang via cdouglas)
HADOOP-4828. Updates documents to do with configuration (HADOOP-4631).
(Sharad Agarwal via ddas)
HADOOP-4939. Adds a test that would inject random failures for tasks in
large jobs and would also inject TaskTracker failures. (ddas)
HADOOP-4920. Stop storing Forrest output in Subversion. (cutting)
HADOOP-4944. A configuration file can include other configuration
files. (Rama Ramasamy via dhruba)
HADOOP-4804. Provide Forrest documentation for the Fair Scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5248. A testcase that checks for the existence of job directory
after the job completes. Fails if it exists. (ddas)
HADOOP-4664. Introduces multiple job initialization threads, where the
number of threads are configurable via mapred.jobinit.threads.
(Matei Zaharia and Jothi Padmanabhan via ddas)
HADOOP-4191. Adds a testcase for JobHistory. (Ravi Gummadi via ddas)
HADOOP-5466. Change documentation CSS style for headers and code. (Corinne
Chandel via szetszwo)
HADOOP-5275. Add ivy directory and files to built tar.
(Giridharan Kesavan via nigel)
HADOOP-5468. Add sub-menus to forrest documentation and make some minor
edits. (Corinne Chandel via szetszwo)
HADOOP-5437. Fix TestMiniMRDFSSort to properly test jvm-reuse. (omalley)
HADOOP-5521. Removes dependency of TestJobInProgress on RESTART_COUNT
JobHistory tag. (Ravi Gummadi via ddas)
HADOOP-5714. Add a metric for NameNode getFileInfo operation. (Jakob Homan
via szetszwo)
OPTIMIZATIONS
HADOOP-3293. Fixes FileInputFormat to provide locations for splits
based on the rack/host that has the most number of bytes.
(Jothi Padmanabhan via ddas)
HADOOP-4683. Fixes Reduce shuffle scheduler to invoke
getMapCompletionEvents in a separate thread. (Jothi Padmanabhan
via ddas)
BUG FIXES
HADOOP-5379. CBZip2InputStream to throw IOException on data crc error.
(Rodrigo Schmidt via zshao)
HADOOP-5326. Fixes CBZip2OutputStream data corruption problem.
(Rodrigo Schmidt via zshao)
HADOOP-4204. Fix findbugs warnings related to unused variables, naive
Number subclass instantiation, Map iteration, and badly scoped inner
classes. (Suresh Srinivas via cdouglas)
HADOOP-4207. Update derby jar file to release 10.4.2.
(Prasad Chakka via dhruba)
HADOOP-4325. SocketInputStream.read() should return -1 in case EOF.
(Raghu Angadi)
HADOOP-4408. FsAction functions need not create new objects. (cdouglas)
HADOOP-4440. TestJobInProgressListener tests for jobs killed in queued
state (Amar Kamat via ddas)
HADOOP-4346. Implement blocking connect so that Hadoop is not affected
by selector problem with JDK default implementation. (Raghu Angadi)
HADOOP-4388. If there are invalid blocks in the transfer list, Datanode
should handle them and keep transferring the remaining blocks. (Suresh
Srinivas via szetszwo)
HADOOP-4587. Fix a typo in Mapper javadoc. (Koji Noguchi via szetszwo)
HADOOP-4530. In fsck, HttpServletResponse sendError fails with
IllegalStateException. (hairong)
HADOOP-4377. Fix a race condition in directory creation in
NativeS3FileSystem. (David Phillips via cdouglas)
HADOOP-4621. Fix javadoc warnings caused by duplicate jars. (Kan Zhang via
cdouglas)
HADOOP-4566. Deploy new hive code to support more types.
(Zheng Shao via dhruba)
HADOOP-4571. Add chukwa conf files to svn:ignore list. (Eric Yang via
szetszwo)
HADOOP-4589. Correct PiEstimator output messages and improve the code
readability. (szetszwo)
HADOOP-4650. Correct a mismatch between the default value of
local.cache.size in the config and the source. (Jeff Hammerbacher via
cdouglas)
HADOOP-4606. Fix cygpath error if the log directory does not exist.
(szetszwo via omalley)
HADOOP-4141. Fix bug in ScriptBasedMapping causing potential infinite
loop on misconfigured hadoop-site. (Aaron Kimball via tomwhite)
HADOOP-4691. Correct a link in the javadoc of IndexedSortable. (szetszwo)
HADOOP-4598. '-setrep' command skips under-replicated blocks. (hairong)
HADOOP-4429. Set defaults for user, group in UnixUserGroupInformation so
login fails more predictably when misconfigured. (Alex Loddengaard via
cdouglas)
HADOOP-4676. Fix broken URL in blacklisted tasktrackers page. (Amareshwari
Sriramadasu via cdouglas)
HADOOP-3422. Ganglia counter metrics are all reported with the metric
name "value", so the counter values can not be seen. (Jason Attributor
and Brian Bockelman via stack)
HADOOP-4704. Fix javadoc typos "the the". (szetszwo)
HADOOP-4677. Fix semantics of FileSystem::getBlockLocations to return
meaningful values. (Hong Tang via cdouglas)
HADOOP-4669. Use correct operator when evaluating whether access time is
enabled (Dhruba Borthakur via cdouglas)
HADOOP-4732. Pass connection and read timeouts in the correct order when
setting up fetch in reduce. (Amareshwari Sriramadasu via cdouglas)
HADOOP-4558. Fix capacity reclamation in capacity scheduler.
(Amar Kamat via yhemanth)
HADOOP-4770. Fix rungridmix_2 script to work with RunJar. (cdouglas)
HADOOP-4738. When using git, the saveVersion script will use only the
commit hash for the version and not the message, which requires escaping.
(cdouglas)
HADOOP-4576. Show pending job count instead of task count in the UI per
queue in capacity scheduler. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4623. Maintain running tasks even if speculative execution is off.
(Amar Kamat via yhemanth)
HADOOP-4786. Fix broken compilation error in
TestTrackerBlacklistAcrossJobs. (yhemanth)
HADOOP-4785. Fixes the JobTracker heartbeat to not make two calls to
System.currentTimeMillis(). (Amareshwari Sriramadasu via ddas)
HADOOP-4792. Add generated Chukwa configuration files to version control
ignore lists. (cdouglas)
HADOOP-4796. Fix Chukwa test configuration, remove unused components. (Eric
Yang via cdouglas)
HADOOP-4708. Add binaries missed in the initial checkin for Chukwa. (Eric
Yang via cdouglas)
HADOOP-4805. Remove black list collector from Chukwa Agent HTTP Sender.
(Eric Yang via cdouglas)
HADOOP-4837. Move HADOOP_CONF_DIR configuration to chukwa-env.sh (Jerome
Boulon via cdouglas)
HADOOP-4825. Use ps instead of jps for querying process status in Chukwa.
(Eric Yang via cdouglas)
HADOOP-4844. Fixed javadoc for
org.apache.hadoop.fs.permission.AccessControlException to document that
it's deprecated in favour of
org.apache.hadoop.security.AccessControlException. (acmurthy)
HADOOP-4706. Close the underlying output stream in
IFileOutputStream::close. (Jothi Padmanabhan via cdouglas)
HADOOP-4855. Fixed command-specific help messages for refreshServiceAcl in
DFSAdmin and MRAdmin. (acmurthy)
HADOOP-4820. Remove unused method FSNamesystem::deleteInSafeMode. (Suresh
Srinivas via cdouglas)
HADOOP-4698. Lower io.sort.mb to 10 in the tests and raise the junit memory
limit to 512m from 256m. (Nigel Daley via cdouglas)
HADOOP-4860. Split TestFileTailingAdapters into three separate tests to
avoid contention. (Eric Yang via cdouglas)
HADOOP-3921. Fixed clover (code coverage) target to work with JDK 6.
(tomwhite via nigel)
HADOOP-4845. Modify the reduce input byte counter to record only the
compressed size and add a human-readable label. (Yongqiang He via cdouglas)
HADOOP-4458. Add a test creating symlinks in the working directory.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-4879. Fix org.apache.hadoop.mapred.Counters to correctly define
Object.equals rather than depend on contentEquals api. (omalley via
acmurthy)
HADOOP-4791. Fix rpm build process for Chukwa. (Eric Yang via cdouglas)
HADOOP-4771. Correct initialization of the file count for directories
with quotas. (Ruyue Ma via shv)
HADOOP-4878. Fix eclipse plugin classpath file to point to ivy's resolved
lib directory and added the same to test-patch.sh. (Giridharan Kesavan via
acmurthy)
HADOOP-4774. Fix default values of some capacity scheduler configuration
items which would otherwise not work on a fresh checkout.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4876. Fix capacity scheduler reclamation by updating count of
pending tasks correctly. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4849. Documentation for Service Level Authorization implemented in
HADOOP-4348. (acmurthy)
HADOOP-4827. Replace Consolidator with Aggregator macros in Chukwa (Eric
Yang via cdouglas)
HADOOP-4894. Correctly parse ps output in Chukwa jettyCollector.sh. (Ari
Rabkin via cdouglas)
HADOOP-4892. Close fds out of Chukwa ExecPlugin. (Ari Rabkin via cdouglas)
HADOOP-4889. Fix permissions in RPM packaging. (Eric Yang via cdouglas)
HADOOP-4869. Fixes the TT-JT heartbeat to have an explicit flag for
restart apart from the initialContact flag that there was earlier.
(Amareshwari Sriramadasu via ddas)
HADOOP-4716. Fixes ReduceTask.java to clear out the mapping between
hosts and MapOutputLocation upon a JT restart (Amar Kamat via ddas)
HADOOP-4880. Removes an unnecessary testcase from TestJobTrackerRestart.
(Amar Kamat via ddas)
HADOOP-4924. Fixes a race condition in TaskTracker re-init. (ddas)
HADOOP-4854. Read reclaim capacity interval from capacity scheduler
configuration. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4896. HDFS Fsck does not load HDFS configuration. (Raghu Angadi)
HADOOP-4956. Creates TaskStatus for failed tasks with an empty Counters
object instead of null. (ddas)
HADOOP-4979. Fix capacity scheduler to block cluster for failed high
RAM requirements across task types. (Vivek Ratan via yhemanth)
HADOOP-4949. Fix native compilation. (Chris Douglas via acmurthy)
HADOOP-4787. Fixes the testcase TestTrackerBlacklistAcrossJobs which was
earlier failing randomly. (Amareshwari Sriramadasu via ddas)
HADOOP-4914. Add description fields to Chukwa init.d scripts (Eric Yang via
cdouglas)
HADOOP-4884. Make tool tip date format match standard HICC format. (Eric
Yang via cdouglas)
HADOOP-4925. Make Chukwa sender properties configurable. (Ari Rabkin via
cdouglas)
HADOOP-4947. Make Chukwa command parsing more forgiving of whitespace. (Ari
Rabkin via cdouglas)
HADOOP-5026. Make chukwa/bin scripts executable in repository. (Andy
Konwinski via cdouglas)
HADOOP-4977. Fix a deadlock between the reclaimCapacity and assignTasks
in capacity scheduler. (Vivek Ratan via yhemanth)
HADOOP-4988. Fix reclaim capacity to work even when there are queues with
no capacity. (Vivek Ratan via yhemanth)
HADOOP-5065. Remove generic parameters from argument to
setIn/OutputFormatClass so that it works with SequenceIn/OutputFormat.
(cdouglas via omalley)
HADOOP-4818. Pass user config to instrumentation API. (Eric Yang via
cdouglas)
HADOOP-4993. Fix Chukwa agent configuration and startup to make it both
more modular and testable. (Ari Rabkin via cdouglas)
HADOOP-5048. Fix capacity scheduler to correctly cleanup jobs that are
killed after initialization, but before running.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4671. Mark loop control variables shared between threads as
volatile. (cdouglas)
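A minimal sketch of the pattern behind HADOOP-4671, assuming a worker loop controlled by a flag that another thread clears. The class and method names are hypothetical and are not taken from the Hadoop source.

```java
// Hypothetical worker; illustrates a loop control variable shared between threads.
class StoppableWorker implements Runnable {
    // volatile makes the write in stop() visible to the thread spinning in run().
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            Thread.yield(); // stand-in for real work
        }
    }

    // Called from a different thread to end the loop.
    public void stop() {
        running = false;
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableWorker worker = new StoppableWorker();
        Thread t = new Thread(worker);
        t.start();
        worker.stop();
        t.join(1000);
        System.out.println("worker stopped: " + !t.isAlive());
    }
}
```

Without volatile, the Java memory model does not guarantee that the looping thread ever observes the updated flag.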
HADOOP-5079. HashFunction inadvertently destroys some randomness
(Jonathan Ellis via stack)
HADOOP-4999. A failure to write to FsEditsLog results in
IndexOutOfBounds exception. (Boris Shkolnik via rangadi)
HADOOP-5139. Catch IllegalArgumentException during metrics registration
in RPC. (Hairong Kuang via szetszwo)
HADOOP-5085. Copying a file to local with Crc throws an exception.
(hairong)
HADOOP-4759. Removes temporary output directory for failed and
killed tasks by launching special CLEANUP tasks for the same.
(Amareshwari Sriramadasu via ddas)
HADOOP-5211. Fix check for job completion in TestSetupAndCleanupFailure.
(enis)
HADOOP-5254. The Configuration class should be able to work with XML
parsers that do not support xmlinclude. (Steve Loughran via dhruba)
HADOOP-4692. Namenode in infinite loop for replicating/deleting corrupt
blocks. (hairong)
HADOOP-5255. Fix use of Math.abs to avoid overflow. (Jonathan Ellis via
cdouglas)
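The overflow in question can be sketched as follows: Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE, which is still negative, so using it as an index or modulus operand can fail. The helper below is a hypothetical illustration of one common remedy (masking the sign bit); it is not necessarily what the HADOOP-5255 patch does.

```java
// Hypothetical helper; not taken from the Hadoop source.
class NonNegativeBucket {
    // Clears the sign bit so the result is always in [0, numBuckets).
    static int bucket(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Math.abs cannot represent +2^31 as an int, so the value stays negative.
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0);
        System.out.println(bucket(Integer.MIN_VALUE, 10));
    }
}
```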
HADOOP-5269. Fixes a problem to do with tasktracker holding on to
FAILED_UNCLEAN or KILLED_UNCLEAN tasks forever. (Amareshwari Sriramadasu
via ddas)
HADOOP-5214. Fixes a ConcurrentModificationException while the Fairshare
Scheduler accesses the tasktrackers stored by the JobTracker.
(Rahul Kumar Singh via yhemanth)
HADOOP-5233. Addresses the three issues - Race condition in updating
status, NPE in TaskTracker task localization when the conf file is missing
(HADOOP-5234) and NPE in handling KillTaskAction of a cleanup task
(HADOOP-5235). (Amareshwari Sriramadasu via ddas)
HADOOP-5247. Introduces a broadcast of KillJobAction to all trackers when
a job finishes. This fixes a bunch of problems to do with NPE when a
completed job is not in memory and a tasktracker comes to the jobtracker
with a status report of a task belonging to that job. (Amar Kamat via ddas)
HADOOP-5282. Fixed job history logs for task attempts that are
failed by the JobTracker, say due to lost task trackers. (Amar
Kamat via yhemanth)
HADOOP-4963. Fixes a logging problem to do with getting the location of
map output file. (Amareshwari Sriramadasu via ddas)
HADOOP-5292. Fix NPE in KFS::getBlockLocations. (Sriram Rao via lohit)
HADOOP-5241. Fixes a bug in disk-space resource estimation. Makes
the estimation formula linear where blowUp =
Total-Output/Total-Input. (Sharad Agarwal via ddas)
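A rough sketch of the linear estimate described in HADOOP-5241, assuming the estimated output of pending input is the input size scaled by blowUp. The method name, the parameters, and the 1:1 fallback when no input has been seen are assumptions for illustration, not the actual Hadoop code.

```java
// Hypothetical estimator; only the blowUp ratio comes from the change log entry.
class OutputSizeEstimator {
    static long estimateOutputSize(long totalInput, long totalOutput, long pendingInput) {
        if (totalInput <= 0) {
            // Assumption: with no completed input yet, fall back to a 1:1 ratio.
            return pendingInput;
        }
        // blowUp = Total-Output / Total-Input
        double blowUp = (double) totalOutput / totalInput;
        return (long) Math.ceil(pendingInput * blowUp);
    }

    public static void main(String[] args) {
        System.out.println(estimateOutputSize(100, 200, 50));
    }
}
```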
HADOOP-5142. Fix MapWritable#putAll to store key/value classes.
(Doğacan Güney via enis)
HADOOP-4744. Workaround for jetty6 returning -1 when getLocalPort
is invoked on the connector. The workaround patch retries a few
times before failing. (Jothi Padmanabhan via yhemanth)
HADOOP-5280. Adds a check to prevent a task state transition from
FAILED to any of UNASSIGNED, RUNNING, COMMIT_PENDING or
SUCCEEDED. (ddas)
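The check described in HADOOP-5280 could look roughly like the guard below. The enum and method are hypothetical stand-ins; only the state names listed in the entry come from the change log.

```java
import java.util.EnumSet;

// Hypothetical guard; not the actual TaskStatus code.
class TaskStateGuard {
    enum State { UNASSIGNED, RUNNING, COMMIT_PENDING, SUCCEEDED, FAILED }

    // States a FAILED task must never move to.
    private static final EnumSet<State> FORBIDDEN_AFTER_FAILED =
        EnumSet.of(State.UNASSIGNED, State.RUNNING, State.COMMIT_PENDING, State.SUCCEEDED);

    static boolean isAllowed(State from, State to) {
        return !(from == State.FAILED && FORBIDDEN_AFTER_FAILED.contains(to));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(State.FAILED, State.RUNNING));
        System.out.println(isAllowed(State.RUNNING, State.SUCCEEDED));
    }
}
```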
HADOOP-5272. Fixes a problem to do with detecting whether an
attempt is the first attempt of a Task. This affects JobTracker
restart. (Amar Kamat via ddas)
HADOOP-5306. Fixes a problem to do with logging/parsing the http port of a
lost tracker. Affects JobTracker restart. (Amar Kamat via ddas)
HADOOP-5111. Fix Job::set* methods to work with generics. (cdouglas)
HADOOP-5274. Fix gridmix2 dependency on wordcount example. (cdouglas)
HADOOP-5145. Balancer sometimes runs out of memory after running
days or weeks. (hairong)
HADOOP-5338. Fix jobtracker restart to clear task completion
events cached by tasktrackers forcing them to fetch all events
afresh, thus avoiding missed task completion events on the
tasktrackers. (Amar Kamat via yhemanth)
HADOOP-4695. Change TestGlobalFilter so that it allows a web page to be
filtered more than once for a single access. (Kan Zhang via szetszwo)
HADOOP-5298. Change TestServletFilter so that it allows a web page to be
filtered more than once for a single access. (szetszwo)
HADOOP-5432. Disable ssl during unit tests in hdfsproxy, as it is unused
and causes failures. (cdouglas)
HADOOP-5416. Correct the shell command "fs -test" forrest doc description.
(Ravi Phulari via szetszwo)
HADOOP-5327. Fixed job tracker to remove files from system directory on
ACL check failures and also check ACLs on restart.
(Amar Kamat via yhemanth)
HADOOP-5395. Change the exception message when a job is submitted to an
invalid queue. (Rahul Kumar Singh via yhemanth)
HADOOP-5276. Fixes a problem to do with updating the start time of
a task when the tracker that ran the task is lost. (Amar Kamat via
ddas)
HADOOP-5278. Fixes a problem to do with logging the finish time of
a task during recovery (after a JobTracker restart). (Amar Kamat
via ddas)
HADOOP-5490. Fixes a synchronization problem in the
EagerTaskInitializationListener class. (Jothi Padmanabhan via
ddas)
HADOOP-5493. The shuffle copier threads return the codecs back to
the pool when the shuffle completes. (Jothi Padmanabhan via ddas)
HADOOP-5505. Fix JspHelper initialization in the context of
MiniDFSCluster. (Raghu Angadi)
HADOOP-5414. Fixes IO exception while executing hadoop fs -touchz
fileName by making sure that lease renewal thread exits before dfs
client exits. (hairong)
HADOOP-5103. FileInputFormat now reuses the clusterMap network
topology object and that brings down the log messages in the
JobClient to do with NetworkTopology.add significantly. (Jothi
Padmanabhan via ddas)
HADOOP-5483. Fixes a problem in the Directory Cleanup Thread due to which
TestMiniMRWithDFS sometimes used to fail. (ddas)
HADOOP-5281. Prevent sharing incompatible ZlibCompressor instances between
GzipCodec and DefaultCodec. (cdouglas)
HADOOP-5463. Balancer throws "Not a host:port pair" unless port is
specified in fs.default.name. (Stuart White via hairong)
HADOOP-5514. Fix JobTracker metrics and add metrics for waiting, failed
tasks. (cdouglas)
HADOOP-5516. Fix NullPointerException in TaskMemoryManagerThread
that comes when monitored processes disappear when the thread is
running. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5382. Support combiners in the new context object API. (omalley)
HADOOP-5471. Fixes a problem to do with updating the log.index file in the
case where a cleanup task is run. (Amareshwari Sriramadasu via ddas)
HADOOP-5534. Fixed a deadlock in Fair scheduler's servlet.
(Rahul Kumar Singh via yhemanth)
HADOOP-5328. Fixes a problem in the renaming of job history files during
job recovery. (Amar Kamat via ddas)
HADOOP-5417. Don't ignore InterruptedExceptions that happen when calling
into rpc. (omalley)
HADOOP-5320. Add a close() in TestMapReduceLocal. (Jothi Padmanabhan
via szetszwo)
HADOOP-5520. Fix a typo in disk quota help message. (Ravi Phulari
via szetszwo)
HADOOP-5519. Remove claims from mapred-default.xml that prime numbers
of tasks are helpful. (Owen O'Malley via szetszwo)
HADOOP-5484. TestRecoveryManager fails with FileAlreadyExistsException.
(Amar Kamat via hairong)
HADOOP-5564. Limit the JVM heap size in the java command for initializing
JAVA_PLATFORM. (Suresh Srinivas via szetszwo)
HADOOP-5565. Add API for failing/finalized jobs to the JT metrics
instrumentation. (Jerome Boulon via cdouglas)
HADOOP-5390. Remove duplicate jars from tarball, src from binary tarball
added by hdfsproxy. (Zhiyong Zhang via cdouglas)
HADOOP-5066. Building binary tarball should not build docs/javadocs, copy
src, or run jdiff. (Giridharan Kesavan via cdouglas)
HADOOP-5459. Fix undetected CRC errors where intermediate output is closed
before it has been completely consumed. (cdouglas)
HADOOP-5571. Remove widening primitive conversion in TupleWritable mask
manipulation. (Jingkei Ly via cdouglas)
HADOOP-5588. Remove an unnecessary call to listStatus(..) in
FileSystem.globStatusInternal(..). (Hairong Kuang via szetszwo)
HADOOP-5473. Solves a race condition in killing a task - the state is KILLED
if there is a user request pending to kill the task and the TT reported
the state as SUCCESS. (Amareshwari Sriramadasu via ddas)
HADOOP-5576. Fix LocalRunner to work with the new context object API in
mapreduce. (Tom White via omalley)
HADOOP-4374. Installs a shutdown hook in the Task JVM so that log.index is
updated before the JVM exits. Also makes the update to log.index atomic.
(Ravi Gummadi via ddas)
HADOOP-5577. Add a verbose flag to mapreduce.Job.waitForCompletion to get
the running job's information printed to the user's stdout as it runs.
(omalley)
HADOOP-5607. Fix NPE in TestCapacityScheduler. (cdouglas)
HADOOP-5605. All the replicas incorrectly got marked as corrupt. (hairong)
HADOOP-5337. JobTracker, upon restart, now waits for the TaskTrackers to
join back before scheduling new tasks. This fixes race conditions associated
with greedy scheduling as was the case earlier. (Amar Kamat via ddas)
HADOOP-5227. Fix distcp so -update and -delete can be meaningfully
combined. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5305. Increase number of files and print debug messages in
TestCopyFiles. (szetszwo)
HADOOP-5548. Add synchronization for JobTracker methods in RecoveryManager.
(Amareshwari Sriramadasu via sharad)
HADOOP-3810. NameNode seems unstable on a cluster with little space left.
(hairong)
HADOOP-5068. Fix NPE in TestCapacityScheduler. (Vinod Kumar Vavilapalli
via szetszwo)
HADOOP-5585. Clear FileSystem statistics between tasks when jvm-reuse
is enabled. (omalley)
HADOOP-5394. JobTracker might schedule 2 attempts of the same task
with the same attempt id across restarts. (Amar Kamat via sharad)
HADOOP-5645. After HADOOP-4920 we need a place to checkin
releasenotes.html. (nigel)
Release 0.19.2 - Unreleased
BUG FIXES
HADOOP-5154. Fixes a deadlock in the fairshare scheduler.
(Matei Zaharia via yhemanth)
HADOOP-5146. Fixes a race condition that causes LocalDirAllocator to miss
files. (Devaraj Das via yhemanth)
HADOOP-4638. Fixes job recovery to not crash the job tracker for problems
with a single job file. (Amar Kamat via yhemanth)
HADOOP-5384. Fix a problem that DataNodeCluster creates blocks with
generationStamp == 1. (szetszwo)
HADOOP-5376. Fixes the code handling lost tasktrackers to set the task state
to KILLED_UNCLEAN only for relevant type of tasks.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-5285. Fixes the issues - (1) obtainTaskCleanupTask checks whether job is
inited before trying to lock the JobInProgress (2) Moves the CleanupQueue class
outside the TaskTracker and makes it a generic class that is used by the
JobTracker also for deleting the paths on the job's output fs. (3) Moves the
references to completedJobStore outside the block where the JobTracker is locked.
(ddas)
HADOOP-5392. Fixes a problem to do with JT crashing during recovery when
the job files are garbled. (Amar Kamat via ddas)
HADOOP-5332. Appending to files is not allowed (by default) unless
dfs.support.append is set to true. (dhruba)
HADOOP-5333. libhdfs supports appending to files. (dhruba)
HADOOP-3998. Fix dfsclient exception when JVM is shutdown. (dhruba)
HADOOP-5440. Fixes a problem to do with removing a taskId from the list
of taskIds that the TaskTracker's TaskMemoryManager manages.
(Amareshwari Sriramadasu via ddas)
HADOOP-5446. Restore TaskTracker metrics. (cdouglas)
HADOOP-5449. Fixes the history cleaner thread.
(Amareshwari Sriramadasu via ddas)
HADOOP-5479. NameNode should not send empty block replication request to
DataNode. (hairong)
HADOOP-5259. Job with output hdfs:/user/<username>/outputpath (no
authority) fails with Wrong FS. (Doug Cutting via hairong)
HADOOP-5522. Documents the setup/cleanup tasks in the mapred tutorial.
(Amareshwari Sriramadasu via ddas)
HADOOP-5549. ReplicationMonitor should schedule both replication and
deletion work in one iteration. (hairong)
HADOOP-5554. DataNodeCluster and CreateEditsLog should create blocks with
the same generation stamp value. (hairong via szetszwo)
HADOOP-5231. Clones the TaskStatus before passing it to the JobInProgress.
(Amareshwari Sriramadasu via ddas)
HADOOP-4719. Fix documentation of 'ls' format for FsShell. (Ravi Phulari
via cdouglas)
HADOOP-5374. Fixes a NPE problem in getTasksToSave method.
(Amareshwari Sriramadasu via ddas)
HADOOP-4780. Cache the size of directories in DistributedCache, avoiding
long delays in recalculating it. (He Yongqiang via cdouglas)
HADOOP-5551. Prevent directory destruction on file create.
(Brian Bockelman via shv)
HADOOP-5671. Fix FNF exceptions when copying from old versions of
HftpFileSystem. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5579. Set errno correctly in libhdfs for permission, quota, and FNF
conditions. (Brian Bockelman via cdouglas)
HADOOP-5816. Fixes a problem in the KeyFieldBasedComparator to do with
ArrayIndexOutOfBounds exception. (He Yongqiang via ddas)
HADOOP-5951. Add Apache license header to StorageInfo.java. (Suresh
Srinivas via szetszwo)
Release 0.19.1 - 2009-02-23
IMPROVEMENTS
HADOOP-4739. Fix spelling and grammar, improve phrasing of some sections in
mapred tutorial. (Vivek Ratan via cdouglas)
HADOOP-3894. DFSClient logging improvements. (Steve Loughran via shv)
HADOOP-5126. Remove empty file BlocksWithLocations.java (shv)
HADOOP-5127. Remove public methods in FSDirectory. (Jakob Homan via shv)
BUG FIXES
HADOOP-4697. Fix getBlockLocations in KosmosFileSystem to handle multiple
blocks correctly. (Sriram Rao via cdouglas)
HADOOP-4420. Add null checks for job, caused by invalid job IDs.
(Aaron Kimball via tomwhite)
HADOOP-4632. Fix TestJobHistoryVersion to use test.build.dir instead of the
current working directory for scratch space. (Amar Kamat via cdouglas)
HADOOP-4508. Fix FSDataOutputStream.getPos() for append. (dhruba via
szetszwo)
HADOOP-4727. Fix a group checking bug in fill_stat_structure(...) in
fuse-dfs. (Brian Bockelman via szetszwo)
HADOOP-4836. Correct typos in mapred related documentation. (Jordà Polo
via szetszwo)
HADOOP-4821. Usage descriptions in the Quotas guide documentation are
incorrect. (Boris Shkolnik via hairong)
HADOOP-4847. Moves the loading of OutputCommitter to the Task.
(Amareshwari Sriramadasu via ddas)
HADOOP-4966. Marks completed setup tasks for removal.
(Amareshwari Sriramadasu via ddas)
HADOOP-4982. TestFsck should run in Eclipse. (shv)
HADOOP-5008. TestReplication#testPendingReplicationRetry leaves an opened
fd unclosed. (hairong)
HADOOP-4906. Fix TaskTracker OOM by keeping a shallow copy of JobConf in
TaskTracker.TaskInProgress. (Sharad Agarwal via acmurthy)
HADOOP-4918. Fix bzip2 compression to work with Sequence Files.
(Zheng Shao via dhruba).
HADOOP-4965. TestFileAppend3 should close FileSystem. (shv)
HADOOP-4967. Fixes a race condition in the JvmManager to do with killing
tasks. (ddas)
HADOOP-5009. DataNode#shutdown sometimes leaves data block scanner
verification log unclosed. (hairong)
HADOOP-5086. Use the appropriate FileSystem for trash URIs. (cdouglas)
HADOOP-4955. Make DBOutputFormat use column names from setOutput().
(Kevin Peterson via enis)
HADOOP-4862. Minor : HADOOP-3678 did not remove all the cases of
spurious IOExceptions logged by DataNode. (Raghu Angadi)
HADOOP-5034. NameNode should send both replication and deletion requests
to DataNode in one reply to a heartbeat. (hairong)
HADOOP-5156. TestHeartbeatHandling uses MiniDFSCluster.getNamesystem()
which does not exist in branch 0.19 and 0.20. (hairong)
HADOOP-5161. Accepted sockets do not get placed in
DataXceiverServer#childSockets. (hairong)
HADOOP-5193. Correct calculation of edits modification time. (shv)
HADOOP-4494. Allow libhdfs to append to files.
(Pete Wyckoff via dhruba)
HADOOP-5166. Fix JobTracker restart to work when ACLs are configured
for the JobTracker. (Amar Kamat via yhemanth).
HADOOP-5067. Fixes TaskInProgress.java to keep track of count of failed and
killed tasks correctly. (Amareshwari Sriramadasu via ddas)
HADOOP-4760. HDFS streams should not throw exceptions when closed twice.
(enis)
Release 0.19.0 - 2008-11-18
INCOMPATIBLE CHANGES
HADOOP-3595. Remove deprecated methods for mapred.combine.once
functionality, which was necessary to provide backwards-compatible
combiner semantics for 0.18. (cdouglas via omalley)
HADOOP-3667. Remove the following deprecated methods from JobConf:
addInputPath(Path)
getInputPaths()
getMapOutputCompressionType()
getOutputPath()
getSystemDir()
setInputPath(Path)
setMapOutputCompressionType(CompressionType style)
setOutputPath(Path)
(Amareshwari Sriramadasu via omalley)
HADOOP-3652. Remove deprecated class OutputFormatBase.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-2885. Break the hadoop.dfs package into separate packages under
hadoop.hdfs that reflect whether they are client, server, protocol,
etc. DistributedFileSystem and DFSClient have moved and are now
considered package private. (Sanjay Radia via omalley)
HADOOP-2325. Require Java 6. (cutting)
HADOOP-372. Add support for multiple input paths with a different
InputFormat and Mapper for each path. (Chris Smith via tomwhite)
HADOOP-1700. Support appending to file in HDFS. (dhruba)
HADOOP-3792. Make FsShell -test consistent with unix semantics, returning
zero for true and non-zero for false. (Ben Slusky via cdouglas)
HADOOP-3664. Remove the deprecated method InputFormat.validateInput,
which is no longer needed. (tomwhite via omalley)
HADOOP-3549. Give more meaningful errno's in libhdfs. In particular,
EACCES is returned for permission problems. (Ben Slusky via omalley)
HADOOP-4036. ResourceStatus was added to TaskTrackerStatus by HADOOP-3759,
so increment the InterTrackerProtocol version. (Hemanth Yamijala via
omalley)
HADOOP-3150. Moves task promotion to tasks. Defines a new interface for
committing output files. Moves job setup to jobclient, and moves jobcleanup
to a separate task. (Amareshwari Sriramadasu via ddas)
HADOOP-3446. Keep map outputs in memory during the reduce. Remove
fs.inmemory.size.mb and replace with properties defining in memory map
output retention during the shuffle and reduce relative to maximum heap
usage. (cdouglas)
HADOOP-3245. Adds the feature for supporting JobTracker restart. Running
jobs can be recovered from the history file. The history file format has
been modified to support recovery. The task attempt ID now has the
JobTracker start time to distinguish attempts of the same TIP across
restarts. (Amar Ramesh Kamat via ddas)
HADOOP-4007. Remove DFSFileInfo - FileStatus is sufficient.
(Sanjay Radia via hairong)
HADOOP-3722. Fixed Hadoop Streaming and Hadoop Pipes to use the Tool
interface and GenericOptionsParser. (Enis Soztutar via acmurthy)
HADOOP-2816. Cluster summary at name node web reports the space
utilization as:
Configured Capacity: capacity of all the data directories - Reserved space
Present Capacity: Space available for dfs, i.e. remaining+used space
DFS Used%: DFS used space/Present Capacity
(Suresh Srinivas via hairong)
HADOOP-3938. Disk space quotas for HDFS. This is similar to namespace
quotas in 0.18. (rangadi)
HADOOP-4293. Make Configuration Writable and remove unreleased
WritableJobConf. Configuration.write is renamed to writeXml. (omalley)
HADOOP-4281. Change dfsadmin to report available disk space in a format
consistent with the web interface as defined in HADOOP-2816. (Suresh
Srinivas via cdouglas)
HADOOP-4430. Further change the cluster summary at name node web that was
changed in HADOOP-2816:
Non DFS Used - This indicates the disk space taken by non DFS file from
the Configured capacity
DFS Used % - DFS Used % of Configured Capacity
DFS Remaining % - Remaining % Configured Capacity available for DFS use
DFS command line report reflects the same change. Config parameter
dfs.datanode.du.pct is no longer used and is removed from the
hadoop-default.xml. (Suresh Srinivas via hairong)
HADOOP-4116. Balancer should provide better resource management. (hairong)
HADOOP-4599. BlocksMap and BlockInfo made package private. (shv)
NEW FEATURES
HADOOP-3341. Allow streaming jobs to specify the field separator for map
and reduce input and output. The new configuration values are:
stream.map.input.field.separator
stream.map.output.field.separator
stream.reduce.input.field.separator
stream.reduce.output.field.separator
All of them default to "\t". (Zheng Shao via omalley)
HADOOP-3479. Defines the configuration file for the resource manager in
Hadoop. You can configure various parameters related to scheduling, such
as queues and queue properties here. The properties for a queue follow a
naming convention, such as hadoop.rm.queue.queue-name.property-name.
(Hemanth Yamijala via ddas)
HADOOP-3149. Adds a way in which map/reduce tasks can create multiple
outputs. (Alejandro Abdelnur via ddas)
HADOOP-3714. Add a new contrib, bash-tab-completion, which enables
bash tab completion for the bin/hadoop script. See the README file
in the contrib directory for the installation. (Chris Smith via enis)
HADOOP-3730. Adds a new JobConf constructor that disables loading
default configurations. (Alejandro Abdelnur via ddas)
HADOOP-3772. Add a new Hadoop Instrumentation api for the JobTracker and
the TaskTracker, refactor Hadoop Metrics as an implementation of the api.
(Ari Rabkin via acmurthy)
HADOOP-2302. Provides a comparator for numerical sorting of key fields.
(ddas)
HADOOP-153. Provides a way to skip bad records. (Sharad Agarwal via ddas)
HADOOP-657. Free disk space should be modelled and used by the scheduler
to make scheduling decisions. (Ari Rabkin via omalley)
HADOOP-3719. Initial checkin of Chukwa, which is a data collection and
analysis framework. (Jerome Boulon, Andy Konwinski, Ari Rabkin,
and Eric Yang)
HADOOP-3873. Add -filelimit and -sizelimit options to distcp to cap the
number of files/bytes copied in a particular run to support incremental
updates and mirroring. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3585. FailMon package for hardware failure monitoring and
analysis of anomalies. (Ioannis Koltsidas via dhruba)
HADOOP-1480. Add counters to the C++ Pipes API. (acmurthy via omalley)
HADOOP-3854. Add support for pluggable servlet filters in the HttpServers.
(Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3759. Provides ability to run memory intensive jobs without
affecting other running tasks on the nodes. (Hemanth Yamijala via ddas)
HADOOP-3746. Add a fair share scheduler. (Matei Zaharia via omalley)
HADOOP-3754. Add a thrift interface to access HDFS. (dhruba via omalley)
HADOOP-3828. Provides a way to write skipped records to DFS.
(Sharad Agarwal via ddas)
HADOOP-3948. Separate name-node edits and fsimage directories.
(Lohit Vijayarenu via shv)
HADOOP-3939. Add an option to DistCp to delete files at the destination
not present at the source. (Tsz Wo (Nicholas) Sze via cdouglas)
HADOOP-3601. Add a new contrib module for Hive, which is a sql-like
query processing tool that uses map/reduce. (Ashish Thusoo via omalley)
HADOOP-3866. Added sort and multi-job updates in the JobTracker web ui.
(Craig Weisenfluh via omalley)
HADOOP-3698. Add access control to control who is allowed to submit or
modify jobs in the JobTracker. (Hemanth Yamijala via omalley)
HADOOP-1869. Support access times for HDFS files. (dhruba)
HADOOP-3941. Extend FileSystem API to return file-checksums.
(szetszwo)
HADOOP-3581. Prevents memory intensive user tasks from taking down
nodes. (Vinod K V via ddas)
HADOOP-3970. Provides a way to recover counters written to JobHistory.
(Amar Kamat via ddas)
HADOOP-3702. Adds ChainMapper and ChainReducer classes that allow composing
chains of Maps and Reduces in a single Map/Reduce job, something like
MAP+ / REDUCE MAP*. (Alejandro Abdelnur via ddas)
HADOOP-3445. Add capacity scheduler that provides guaranteed capacities to
queues as a percentage of the cluster. (Vivek Ratan via omalley)
HADOOP-3992. Add a synthetic load generation facility to the test
directory. (hairong via szetszwo)
HADOOP-3981. Implement a distributed file checksum algorithm in HDFS
and change DistCp to use file checksum for comparing src and dst files
(szetszwo)
HADOOP-3829. Narrow down skipped records based on user acceptable value.
(Sharad Agarwal via ddas)
HADOOP-3930. Add common interfaces for the pluggable schedulers and the
cli & gui clients. (Sreekanth Ramakrishnan via omalley)
HADOOP-4176. Implement getFileChecksum(Path) in HftpFileSystem. (szetszwo)
HADOOP-249. Reuse JVMs across Map-Reduce Tasks.
Configuration changes to hadoop-default.xml:
add mapred.job.reuse.jvm.num.tasks
(Devaraj Das via acmurthy)
HADOOP-4070. Provide a mechanism in Hive for registering UDFs from the
query language. (tomwhite)
HADOOP-2536. Implement a JDBC based database input and output formats to
allow Map-Reduce applications to work with databases. (Fredrik Hedberg and
Enis Soztutar via acmurthy)
HADOOP-3019. A new library to support total order partitions.
(cdouglas via omalley)
HADOOP-3924. Added a 'KILLED' job status. (Subramaniam Krishnan via
acmurthy)
IMPROVEMENTS
HADOOP-4205. hive: metastore and ql to use the refactored SerDe library.
(zshao)
HADOOP-4106. libhdfs: add time, permission and user attribute support
(part 2). (Pete Wyckoff through zshao)
HADOOP-4104. libhdfs: add time, permission and user attribute support.
(Pete Wyckoff through zshao)
HADOOP-3908. libhdfs: better error message if libhdfs.so doesn't exist.
(Pete Wyckoff through zshao)
HADOOP-3732. Delay initialization of datanode block verification till
the verification thread is started. (rangadi)
HADOOP-1627. Various small improvements to 'dfsadmin -report' output.
(rangadi)
HADOOP-3577. Tools to inject blocks into name node and simulated
data nodes for testing. (Sanjay Radia via hairong)
HADOOP-2664. Add a lzop compatible codec, so that files compressed by lzop
may be processed by map/reduce. (cdouglas via omalley)
HADOOP-3655. Add additional ant properties to control junit. (Steve
Loughran via omalley)
HADOOP-3543. Update the copyright year to 2008. (cdouglas via omalley)
HADOOP-3587. Add a unit test for the contrib/data_join framework.
(cdouglas)
HADOOP-3402. Add terasort example program (omalley)
HADOOP-3660. Add replication factor for injecting blocks in simulated
datanodes. (Sanjay Radia via cdouglas)
HADOOP-3684. Add a cloning function to the contrib/data_join framework
permitting users to define a more efficient method for cloning values from
the reduce than serialization/deserialization. (Runping Qi via cdouglas)
HADOOP-3478. Improves the handling of map output fetching. Now the
randomization is by the hosts (and not the map outputs themselves).
(Jothi Padmanabhan via ddas)
HADOOP-3617. Removed redundant checks of accounting space in MapTask and
makes the spill thread persistent so as to avoid creating a new one for
each spill. (Chris Douglas via acmurthy)
HADOOP-3412. Factor the scheduler out of the JobTracker and make
it pluggable. (Tom White and Brice Arnould via omalley)
HADOOP-3756. Minor. Remove unused dfs.client.buffer.dir from
hadoop-default.xml. (rangadi)
HADOOP-3747. Adds counter support for MultipleOutputs.
(Alejandro Abdelnur via ddas)
HADOOP-3169. LeaseChecker daemon should not be started in DFSClient
constructor. (TszWo (Nicholas), SZE via hairong)
HADOOP-3824. Move base functionality of StatusHttpServer to a core
package. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3646. Add a bzip2 compatible codec, so bzip compressed data
may be processed by map/reduce. (Abdul Qadeer via cdouglas)
HADOOP-3861. MapFile.Reader and Writer should implement Closeable.
(tomwhite via omalley)
HADOOP-3791. Introduce generics into ReflectionUtils. (Chris Smith via
cdouglas)
HADOOP-3694. Improve unit test performance by changing
MiniDFSCluster to listen only on 127.0.0.1. (cutting)
HADOOP-3620. Namenode should synchronously resolve a datanode's network
location when the datanode registers. (hairong)
HADOOP-3860. NNThroughputBenchmark is extended with rename and delete
benchmarks. (shv)
HADOOP-3892. Include unix group name in JobConf. (Matei Zaharia via johan)
HADOOP-3875. Change the time period between heartbeats to be relative to
the end of the heartbeat rpc, rather than the start. This causes better
behavior if the JobTracker is overloaded. (acmurthy via omalley)
HADOOP-3853. Move multiple input format (HADOOP-372) extension to
library package. (tomwhite via johan)
HADOOP-9. Use roulette scheduling for temporary space when the size
is not known. (Ari Rabkin via omalley)
HADOOP-3202. Use recursive delete rather than FileUtil.fullyDelete.
(Amareshwari Sriramadasu via omalley)
HADOOP-3368. Remove common-logging.properties from conf. (Steve Loughran
via omalley)
HADOOP-3851. Fix spelling mistake in FSNamesystemMetrics. (Steve Loughran
via omalley)
HADOOP-3780. Remove asynchronous resolution of network topology in the
JobTracker (Amar Kamat via omalley)
HADOOP-3852. Add ShellCommandExecutor.toString method to make nicer
error messages. (Steve Loughran via omalley)
HADOOP-3844. Include message of local exception in RPC client failures.
(Steve Loughran via omalley)
HADOOP-3935. Split out inner classes from DataNode.java. (johan)
HADOOP-3905. Create generic interfaces for edit log streams. (shv)
HADOOP-3062. Add metrics to DataNode and TaskTracker to record network
traffic for HDFS reads/writes and MR shuffling. (cdouglas)
HADOOP-3742. Remove HDFS from public java doc and add javadoc-dev for
generative javadoc for developers. (Sanjay Radia via omalley)
HADOOP-3944. Improve documentation for public TupleWritable class in
join package. (Chris Douglas via enis)
HADOOP-2330. Preallocate HDFS transaction log to improve performance.
(dhruba and hairong)
HADOOP-3965. Convert DataBlockScanner into a package private class. (shv)
HADOOP-3488. Prevent hadoop-daemon from rsync'ing log files (Stefan
Groshupf and Craig Macdonald via omalley)
HADOOP-3342. Change the kill task actions to require http post instead of
get to prevent accidental crawls from triggering it. (enis via omalley)
HADOOP-3937. Limit the job name in the job history filename to 50
characters. (Matei Zaharia via omalley)
HADOOP-3943. Remove unnecessary synchronization in
NetworkTopology.pseudoSortByDistance. (hairong via omalley)
HADOOP-3498. File globbing alternation should be able to span path
components. (tomwhite)
HADOOP-3361. Implement renames for NativeS3FileSystem.
(Albert Chern via tomwhite)
HADOOP-3605. Make EC2 scripts show an error message if AWS_ACCOUNT_ID is
unset. (Al Hoang via tomwhite)
HADOOP-4147. Remove unused class JobWithTaskContext from class
JobInProgress. (Amareshwari Sriramadasu via johan)
HADOOP-4151. Add a byte-comparable interface that both Text and
BytesWritable implement. (cdouglas via omalley)
HADOOP-4174. Move fs image/edit log methods from ClientProtocol to
NamenodeProtocol. (shv via szetszwo)
HADOOP-4181. Include a .gitignore and saveVersion.sh change to support
developing under git. (omalley)
HADOOP-4186. Factor LineReader out of LineRecordReader. (tomwhite via
omalley)
HADOOP-4184. Break the module dependencies between core, hdfs, and
mapred. (tomwhite via omalley)
HADOOP-4075. test-patch.sh now spits out ant commands that it runs.
(Ramya R via nigel)
HADOOP-4117. Improve configurability of Hadoop EC2 instances.
(tomwhite)
HADOOP-2411. Add support for larger CPU EC2 instance types.
(Chris K Wensel via tomwhite)
HADOOP-4083. Changed the configuration attribute queue.name to
mapred.job.queue.name. (Hemanth Yamijala via acmurthy)
HADOOP-4194. Added the JobConf and JobID to job-related methods in
JobTrackerInstrumentation for better metrics. (Mac Yang via acmurthy)
HADOOP-3975. Change test-patch script to report the working dir
modifications preventing the suite from being run. (Ramya R via cdouglas)
HADOOP-4124. Added a command-line switch to allow users to set job
priorities, also allow it to be manipulated via the web-ui. (Hemanth
Yamijala via acmurthy)
HADOOP-2165. Augmented JobHistory to include the URIs to the tasks'
userlogs. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4062. Remove the synchronization on the output stream when a
connection is closed and also remove an undesirable exception when
a client is stopped while there is no pending RPC request. (hairong)
HADOOP-4227. Remove the deprecated class org.apache.hadoop.fs.ShellCommand.
(szetszwo)
HADOOP-4006. Clean up FSConstants and move some of the constants to
better places. (Sanjay Radia via rangadi)
HADOOP-4279. Trace the seeds of random sequences in append unit tests to
make intermittent failures reproducible. (szetszwo via cdouglas)
HADOOP-4209. Remove the change to the format of task attempt id by
incrementing the task attempt numbers by 1000 when the job restarts.
(Amar Kamat via omalley)
HADOOP-4301. Adds forrest doc for the skip bad records feature.
(Sharad Agarwal via ddas)
HADOOP-4354. Separate TestDatanodeDeath.testDatanodeDeath() into 4 tests.
(szetszwo)
HADOOP-3790. Add more unit tests for testing HDFS file append. (szetszwo)
HADOOP-4321. Include documentation for the capacity scheduler. (Hemanth
Yamijala via omalley)
HADOOP-4424. Change menu layout for Hadoop documentation (Boris Shkolnik
via cdouglas).
HADOOP-4438. Update forrest documentation to include missing FsShell
commands. (Suresh Srinivas via cdouglas)
HADOOP-4105. Add forrest documentation for libhdfs.
(Pete Wyckoff via cutting)
HADOOP-4510. Make getTaskOutputPath public. (Chris Wensel via omalley)
OPTIMIZATIONS
HADOOP-3556. Removed lock contention in MD5Hash by replacing the
singleton MessageDigester with an instance per Thread using
ThreadLocal. (Iván de Prado via omalley)
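The per-thread digester idea in HADOOP-3556 can be sketched with the JDK alone. This is a minimal illustration, not the MD5Hash code itself; the class name PerThreadMd5 and its two methods are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not Hadoop's MD5Hash: one MessageDigest per thread.
final class PerThreadMd5 {
    // Each thread lazily creates its own digester, so no lock is shared.
    private static final ThreadLocal<MessageDigest> DIGESTER =
        ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 not available", e);
            }
        });

    // Hash the given bytes with the calling thread's digester.
    static byte[] digest(byte[] data) {
        MessageDigest md = DIGESTER.get();
        md.reset(); // discard any state left by an earlier call on this thread
        return md.digest(data);
    }

    // Render a digest as lower-case hexadecimal.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Because MessageDigest instances are not thread-safe, a shared singleton has to be guarded by a lock; giving every thread its own instance removes that lock at the cost of one digester object per thread.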
HADOOP-3328. When client is writing data to DFS, only the last
datanode in the pipeline needs to verify the checksum. Saves around
30% CPU on intermediate datanodes. (rangadi)
HADOOP-3863. Use a thread-local string encoder rather than a static one
that is protected by a lock. (acmurthy via omalley)
HADOOP-3864. Prevent the JobTracker from locking up when a job is being
initialized. (acmurthy via omalley)
HADOOP-3816. Faster directory listing in KFS. (Sriram Rao via omalley)
HADOOP-2130. Pipes submit job should have both blocking and non-blocking
versions. (acmurthy via omalley)
HADOOP-3769. Make the SampleMapper and SampleReducer from
GenericMRLoadGenerator public, so they can be used in other contexts.
(Lingyun Yang via omalley)
HADOOP-3514. Inline the CRCs in intermediate files as opposed to reading
it from a different .crc file. (Jothi Padmanabhan via ddas)
HADOOP-3638. Caches the iFile index files in memory to reduce seeks
(Jothi Padmanabhan via ddas)
HADOOP-4225. FSEditLog.logOpenFile() should persist accessTime
rather than modificationTime. (shv)
HADOOP-4380. Made several new classes (Child, JVMId,
JobTrackerInstrumentation, QueueManager, ResourceEstimator,
TaskTrackerInstrumentation, and TaskTrackerMetricsInst) in
org.apache.hadoop.mapred package private instead of public. (omalley)
BUG FIXES
HADOOP-3563. Refactor the distributed upgrade code so that it is
easier to identify datanode and namenode related code. (dhruba)
HADOOP-3640. Fix the read method in the NativeS3InputStream. (tomwhite via
omalley)
HADOOP-3711. Fixes the Streaming input parsing to properly find the
separator. (Amareshwari Sriramadasu via ddas)
HADOOP-3725. Prevent TestMiniMRMapDebugScript from swallowing exceptions.
(Steve Loughran via cdouglas)
HADOOP-3726. Throw exceptions from TestCLI setup and teardown instead of
swallowing them. (Steve Loughran via cdouglas)
HADOOP-3721. Refactor CompositeRecordReader and related mapred.join classes
to make them clearer. (cdouglas)
HADOOP-3720. Re-read the config file when dfsadmin -refreshNodes is invoked
so dfs.hosts and dfs.hosts.exclude are observed. (lohit vijayarenu via
cdouglas)
HADOOP-3485. Allow writing to files over fuse.
(Pete Wyckoff via dhruba)
HADOOP-3723. The flags to the libhdfs.create call can be treated as
a bitmask. (Pete Wyckoff via dhruba)
HADOOP-3643. Filter out completed tasks when asking for running tasks in
the JobTracker web/ui. (Amar Kamat via omalley)
HADOOP-3777. Ensure that Lzo compressors/decompressors correctly handle the
case where native libraries aren't available. (Chris Douglas via acmurthy)
HADOOP-3728. Fix SleepJob so that it doesn't depend on temporary files,
this ensures we can now run more than one instance of SleepJob
simultaneously. (Chris Douglas via acmurthy)
HADOOP-3795. Fix saving image files on Namenode with different checkpoint
stamps. (Lohit Vijayarenu via mahadev)
HADOOP-3624. Improving createeditslog to create tree directory structure.
(Lohit Vijayarenu via mahadev)
HADOOP-3778. DFSInputStream.seek() did not retry in case of some errors.
(LN via rangadi)
HADOOP-3661. The handling of moving files deleted through fuse-dfs to
Trash is made similar to the behaviour from dfs shell.
(Pete Wyckoff via dhruba)
HADOOP-3819. Unset LANG and LC_CTYPE in saveVersion.sh to make it
compatible with non-English locales. (Rong-En Fan via cdouglas)
HADOOP-3848. Cache calls to getSystemDir in the TaskTracker instead of
calling it for each task start. (acmurthy via omalley)
HADOOP-3131. Fix reduce progress reporting for compressed intermediate
data. (Matei Zaharia via acmurthy)
HADOOP-3796. fuse-dfs configuration is implemented as file system
mount options. (Pete Wyckoff via dhruba)
HADOOP-3836. Fix TestMultipleOutputs to correctly clean up. (Alejandro
Abdelnur via acmurthy)
HADOOP-3805. Improve fuse-dfs write performance.
(Pete Wyckoff via zshao)
HADOOP-3846. Fix unit test CreateEditsLog to generate paths correctly.
(Lohit Vijayarenu via cdouglas)
HADOOP-3904. Fix unit tests using the old dfs package name.
(Tsz Wo (Nicholas), SZE via johan)
HADOOP-3319. Fix some HOD error messages to go to stderr instead of
stdout. (Vinod Kumar Vavilapalli via omalley)
HADOOP-3907. Move INodeDirectoryWithQuota to its own .java file.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3919. Fix attribute name in hadoop-default for
mapred.jobtracker.instrumentation. (Ari Rabkin via omalley)
HADOOP-3903. Change the package name for the servlets to be hdfs instead of
dfs. (Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3773. Change Pipes to set the default map output key and value
types correctly. (Koji Noguchi via omalley)
HADOOP-3952. Fix compilation error in TestDataJoin referencing dfs package.
(omalley)
HADOOP-3951. Fix package name for FSNamesystem logs and modify other
hard-coded Logs to use the class name. (cdouglas)
HADOOP-3889. Improve error reporting from HftpFileSystem, handling in
DistCp. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3946. Fix TestMapRed after hadoop-3664. (tomwhite via omalley)
HADOOP-3949. Remove duplicate jars from Chukwa. (Jerome Boulon via omalley)
HADOOP-3933. DataNode sometimes sends up to io.bytes.per.checksum bytes
more than required to client. (Ning Li via rangadi)
HADOOP-3962. Shell command "fs -count" should support paths with different
file systems. (Tsz Wo (Nicholas), SZE via mahadev)
HADOOP-3957. Fix javac warnings in DistCp and TestCopyFiles. (Tsz Wo
(Nicholas), SZE via cdouglas)
HADOOP-3958. Fix TestMapRed to check the success of test-job. (omalley via
acmurthy)
HADOOP-3985. Fix TestHDFSServerPorts to use random ports. (Hairong Kuang
via omalley)
HADOOP-3964. Fix javadoc warnings introduced by FailMon. (dhruba)
HADOOP-3785. Fix FileSystem cache to be case-insensitive for scheme and
authority. (Bill de hOra via cdouglas)
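A case-insensitive cache key of the kind HADOOP-3785 describes can be sketched as follows. This is an illustration under stated assumptions, not the FileSystem cache implementation; the class FsCacheKey is hypothetical, and it lower-cases the whole authority, including any user-info part, which is a simplification.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

// Hypothetical cache key, not Hadoop's own: compares scheme and authority
// without regard to case.
final class FsCacheKey {
    private final String scheme;
    private final String authority;

    FsCacheKey(URI uri) {
        // Normalize once at construction so equals() and hashCode() agree.
        this.scheme = uri.getScheme() == null
            ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        this.authority = uri.getAuthority() == null
            ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof FsCacheKey)) {
            return false;
        }
        FsCacheKey other = (FsCacheKey) o;
        return scheme.equals(other.scheme) && authority.equals(other.authority);
    }

    @Override
    public int hashCode() {
        return Objects.hash(scheme, authority);
    }
}
```

With a key like this, hdfs://namenode:8020 and HDFS://NameNode:8020 map to the same cache entry, while the path component plays no part in the lookup.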
HADOOP-3506. Fix a rare NPE caused by error handling in S3. (Tom White via
cdouglas)
HADOOP-3705. Fix mapred.join parser to accept InputFormats named with
underscore and static, inner classes. (cdouglas)
HADOOP-4023. Fix javadoc warnings introduced when the HDFS javadoc was
made private. (omalley)
HADOOP-4030. Remove lzop from the default list of codecs. (Arun Murthy via
cdouglas)
HADOOP-3961. Fix task disk space requirement estimates for virtual
input jobs. Delays limiting task placement until after 10% of the maps
have finished. (Ari Rabkin via omalley)
HADOOP-2168. Fix problem with C++ record reader's progress not being
reported to framework. (acmurthy via omalley)
HADOOP-3966. Copy findbugs generated output files to PATCH_DIR while
running test-patch. (Ramya R via lohit)
HADOOP-4037. Fix the eclipse plugin for versions of kfs and log4j. (nigel
via omalley)
HADOOP-3950. Cause the Mini MR cluster to wait for task trackers to
register before continuing. (enis via omalley)
HADOOP-3910. Remove unused ClusterTestDFSNamespaceLogging and
ClusterTestDFS. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3954. Disable record skipping by default. (Sharad Agarwal via
cdouglas)
HADOOP-4050. Fix TestFairScheduler to use absolute paths for the work
directory. (Matei Zaharia via omalley)
HADOOP-4069. Keep temporary test files from TestKosmosFileSystem under
test.build.data instead of /tmp. (lohit via omalley)
HADOOP-4078. Create test files for TestKosmosFileSystem in separate
directory under test.build.data. (lohit)
HADOOP-3968. Fix getFileBlockLocations calls to use FileStatus instead
of Path reflecting the new API. (Pete Wyckoff via lohit)
HADOOP-3963. libhdfs does not exit on its own, instead it returns error
to the caller and behaves as a true library. (Pete Wyckoff via dhruba)
HADOOP-4100. Removes the cleanupTask scheduling from the Scheduler
implementations and moves it to the JobTracker.
(Amareshwari Sriramadasu via ddas)
HADOOP-4097. Make hive work well with speculative execution turned on.
(Joydeep Sen Sarma via dhruba)
HADOOP-4113. Changes to libhdfs to not exit on its own, rather return
an error code to the caller. (Pete Wyckoff via dhruba)
HADOOP-4054. Remove duplicate lease removal during edit log loading.
(hairong)
HADOOP-4071. FSNamesystem.isReplicationInProgress should add an
underReplicated block to the neededReplication queue using method
"add" not "update". (hairong)
HADOOP-4154. Fix type warnings in WritableUtils. (szetszwo via omalley)
HADOOP-4133. Log files generated by Hive should reside in the
build directory. (Prasad Chakka via dhruba)
HADOOP-4094. Hive now has hive-default.xml and hive-site.xml similar
to core hadoop. (Prasad Chakka via dhruba)
HADOOP-4112. Handles cleanupTask in JobHistory
(Amareshwari Sriramadasu via ddas)
HADOOP-3831. Very slow reading clients sometimes failed while reading.
(rangadi)
HADOOP-4155. Use JobTracker's start time while initializing JobHistory's
JobTracker Unique String. (lohit)
HADOOP-4099. Fix null pointer when using HFTP from an 0.18 server.
(dhruba via omalley)
HADOOP-3570. Includes user specified libjar files in the client side
classpath path. (Sharad Agarwal via ddas)
HADOOP-4129. Changed memory limits of TaskTracker and Tasks to be in
KiloBytes rather than bytes. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4139. Optimize Hive multi group-by.
(Namit Jain via dhruba)
HADOOP-3911. Add a check to fsck options to make sure -files is not
the first option to resolve conflicts with GenericOptionsParser
(lohit)
HADOOP-3623. Refactor LeaseManager. (szetszwo)
HADOOP-4125. Handles Reduce cleanup tip on the web ui.
(Amareshwari Sriramadasu via ddas)
HADOOP-4087. Hive Metastore API for php and python clients.
(Prasad Chakka via dhruba)
HADOOP-4197. Update DATA_TRANSFER_VERSION for HADOOP-3981. (szetszwo)
HADOOP-4138. Refactor the Hive SerDe library to better structure
the interfaces to the serializer and de-serializer.
(Zheng Shao via dhruba)
HADOOP-4195. Close compressor before returning to codec pool.
(acmurthy via omalley)
HADOOP-2403. Escapes some special characters before logging to
history files. (Amareshwari Sriramadasu via ddas)
HADOOP-4200. Fix a bug in the test-patch.sh script.
(Ramya R via nigel)
HADOOP-4084. Add explain plan capabilities to Hive Query Language.
(Ashish Thusoo via dhruba)
HADOOP-4121. Preserve cause for exception if the initialization of
HistoryViewer for JobHistory fails. (Amareshwari Sri Ramadasu via
acmurthy)
HADOOP-4213. Fixes NPE in TestLimitTasksPerJobTaskScheduler.
(Sreekanth Ramakrishnan via ddas)
HADOOP-4077. Setting access and modification time for a file
requires write permissions on the file. (dhruba)
HADOOP-3592. Fix a couple of possible file leaks in FileUtil
(Bill de hOra via rangadi)
HADOOP-4120. Hive interactive shell records the time taken by a
query. (Raghotham Murthy via dhruba)
HADOOP-4090. The hive scripts pick up hadoop from HADOOP_HOME
and then the path. (Raghotham Murthy via dhruba)
HADOOP-4242. Remove extra ";" in FSDirectory that blocks compilation
in some IDEs. (szetszwo via omalley)
HADOOP-4249. Fix eclipse path to include the hsqldb.jar. (szetszwo via
omalley)
HADOOP-4247. Move InputSampler into org.apache.hadoop.mapred.lib, so that
examples.jar doesn't depend on tools.jar. (omalley)
HADOOP-4269. Fix the deprecation of LineReader by extending the new class
into the old name and deprecating it. Also update the tests to test the
new class. (cdouglas via omalley)
HADOOP-4280. Fix conversions between seconds in C and milliseconds in
Java for access times for files. (Pete Wyckoff via rangadi)
HADOOP-4254. -setSpaceQuota command does not convert "TB" extension to
terabytes properly. Implementation now uses StringUtils for parsing this.
(Raghu Angadi)
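The suffix conversion behind HADOOP-4254 can be illustrated with a small stand-alone parser. This is a sketch, not the StringUtils code the fix relies on; the class SizeSuffix is hypothetical, and it assumes binary multiples (1 TB = 2^40 bytes).

```java
import java.util.Locale;

// Hypothetical parser for sizes such as "512", "10g" or "1TB".
// Assumes binary multiples: k = 2^10, m = 2^20, g = 2^30, t = 2^40.
final class SizeSuffix {
    static long parse(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        // Accept an optional trailing "b", so "1tb" and "1t" are equivalent.
        if (s.endsWith("b")) {
            s = s.substring(0, s.length() - 1);
        }
        if (s.isEmpty()) {
            throw new NumberFormatException("empty size: " + text);
        }
        int shift;
        switch (s.charAt(s.length() - 1)) {
            case 'k': shift = 10; break;
            case 'm': shift = 20; break;
            case 'g': shift = 30; break;
            case 't': shift = 40; break;
            default:  shift = 0;  break;
        }
        String digits = shift == 0 ? s : s.substring(0, s.length() - 1);
        long value = Long.parseLong(digits.trim());
        // multiplyExact throws ArithmeticException instead of overflowing silently.
        return Math.multiplyExact(value, 1L << shift);
    }
}
```

Under these assumptions, SizeSuffix.parse("1TB") yields 1099511627776 and an input without a suffix is returned unchanged.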
HADOOP-4259. Findbugs should run over tools.jar also. (cdouglas via
omalley)
HADOOP-4275. Move public method isJobValidName from JobID to a private
method in JobTracker. (omalley)
HADOOP-4173. Fix failures in TestProcfsBasedProcessTree and
TestTaskTrackerMemoryManager tests. ProcfsBasedProcessTree and
memory management in TaskTracker are disabled on Windows.
(Vinod K V via rangadi)
HADOOP-4189. Fixes the history blocksize & intertracker protocol version
issues introduced as part of HADOOP-3245. (Amar Kamat via ddas)
HADOOP-4190. Fixes the backward compatibility issue with Job History
introduced by HADOOP-3245 and HADOOP-2403. (Amar Kamat via ddas)
HADOOP-4237. Fixes the TestStreamingBadRecords.testNarrowDown testcase.
(Sharad Agarwal via ddas)
HADOOP-4274. Capacity scheduler accidentally modifies the underlying
data structures when browsing the job lists. (Hemanth Yamijala via omalley)
HADOOP-4309. Fix eclipse-plugin compilation. (cdouglas)
HADOOP-4232. Fix race condition in JVM reuse when multiple slots become
free. (ddas via acmurthy)
HADOOP-4302. Fix a race condition in TestReduceFetch that can yield false
negatives. (cdouglas)
HADOOP-3942. Update distcp documentation to include features introduced in
HADOOP-3873, HADOOP-3939. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-4319. fuse-dfs dfs_read function returns as many bytes as it is
told to read unless end-of-file is reached. (Pete Wyckoff via dhruba)
HADOOP-4246. Ensure we have the correct lower bound on the number of
retries for fetching map-outputs; also fixed the case where the reducer
automatically kills itself when too many unique map-outputs could not be
fetched for small jobs. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-4163. Report FSErrors from map output fetch threads instead of
merely logging them. (Sharad Agarwal via cdouglas)
HADOOP-4261. Adds a setup task for jobs. This is required so that we
don't setup jobs that haven't been inited yet (since init could lead
to job failure). Only after the init has successfully happened do we
launch the setupJob task. (Amareshwari Sriramadasu via ddas)
HADOOP-4256. Removes Completed and Failed Job tables from
jobqueue_details.jsp. (Sreekanth Ramakrishnan via ddas)
HADOOP-4267. Occasional exceptions during HSQLDB shutdown are logged
but not rethrown. (enis)
HADOOP-4018. The number of tasks for a single job cannot exceed a
pre-configured maximum value. (dhruba)
HADOOP-4288. Fixes a NPE problem in CapacityScheduler.
(Amar Kamat via ddas)
HADOOP-4014. Create hard links with 'fsutil hardlink' on Windows. (shv)
HADOOP-4393. Merged org.apache.hadoop.fs.permission.AccessControlException
and org.apache.hadoop.security.AccessControlIOException into a single
class hadoop.security.AccessControlException. (omalley via acmurthy)
HADOOP-4287. Fixes an issue to do with maintaining counts of running/pending
maps/reduces. (Sreekanth Ramakrishnan via ddas)
HADOOP-4361. Makes sure that jobs killed from command line are killed
fast (i.e., there is a slot to run the cleanup task soon).
(Amareshwari Sriramadasu via ddas)
HADOOP-4400. Add "hdfs://" to fs.default.name on quickstart.html.
(Jeff Hammerbacher via omalley)
HADOOP-4378. Fix TestJobQueueInformation to use SleepJob rather than
WordCount via TestMiniMRWithDFS. (Sreekanth Ramakrishnan via acmurthy)
HADOOP-4376. Fix formatting in hadoop-default.xml for
hadoop.http.filter.initializers. (Enis Soztutar via acmurthy)
HADOOP-4410. Adds an extra arg to the API FileUtil.makeShellPath to
determine whether to canonicalize file paths or not.
(Amareshwari Sriramadasu via ddas)
HADOOP-4236. Ensure un-initialized jobs are killed correctly on
user-demand. (Sharad Agarwal via acmurthy)
HADOOP-4373. Fix calculation of Guaranteed Capacity for the
capacity-scheduler. (Hemanth Yamijala via acmurthy)
HADOOP-4053. Schedulers must be notified when jobs complete. (Amar Kamat via omalley)
HADOOP-4335. Fix FsShell -ls for filesystems without owners/groups. (David
Phillips via cdouglas)
HADOOP-4426. TestCapacityScheduler broke due to the two commits HADOOP-4053
and HADOOP-4373. This patch fixes that. (Hemanth Yamijala via ddas)
HADOOP-4418. Updates documentation in forrest for Mapred, streaming and pipes.
(Amareshwari Sriramadasu via ddas)
HADOOP-3155. Ensure that there is only one thread fetching
TaskCompletionEvents on TaskTracker re-init. (Dhruba Borthakur via
acmurthy)
HADOOP-4425. Fix EditLogInputStream to overload the bulk read method.
(cdouglas)
HADOOP-4427. Adds the new queue/job commands to the manual.
(Sreekanth Ramakrishnan via ddas)
HADOOP-4278. Increase debug logging for unit test TestDatanodeDeath.
Fix the case when primary is dead. (dhruba via szetszwo)
HADOOP-4423. Keep block length when the block recovery is triggered by
append. (szetszwo)
HADOOP-4449. Fix dfsadmin usage. (Raghu Angadi via cdouglas)
HADOOP-4455. Added TestSerDe so that unit tests can run successfully.
(Ashish Thusoo via dhruba)
HADOOP-4457. Fixes an input split logging problem introduced by
HADOOP-3245. (Amareshwari Sriramadasu via ddas)
HADOOP-4464. Separate out TestFileCreationClient from TestFileCreation.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-4404. saveFSImage() removes files from a storage directory that do
not correspond to its type. (shv)
HADOOP-4149. Fix handling of updates to the job priority, by changing the
list of jobs to be keyed by the priority, submit time, and job tracker id.
(Amar Kamat via omalley)
HADOOP-4296. Fix job client failures by not retiring a job as soon as it
is finished. (dhruba)
HADOOP-4439. Remove configuration variables that aren't usable yet, in
particular mapred.tasktracker.tasks.maxmemory and mapred.task.max.memory.
(Hemanth Yamijala via omalley)
HADOOP-4230. Fix for serde2 interface, limit operator, select * operator,
UDF trim functions and sampling. (Ashish Thusoo via dhruba)
HADOOP-4358. No need to truncate access time in INode. Also fixes NPE
in CreateEditsLog. (Raghu Angadi)
HADOOP-4387. TestHDFSFileSystemContract fails on windows nightly builds.
(Raghu Angadi)
HADOOP-4466. Ensure that SequenceFileOutputFormat isn't tied to Writables
and can be used with other Serialization frameworks. (Chris Wensel via
acmurthy)
HADOOP-4525. Fix ipc.server.ipcnodelay originally missed in HADOOP-2232.
(cdouglas via Clint Morgan)
HADOOP-4498. Ensure that JobHistory correctly escapes the job name so that
regex patterns work. (Chris Wensel via acmurthy)
HADOOP-4446. Modify guaranteed capacity labels in capacity scheduler's UI
to reflect the information being displayed. (Sreekanth Ramakrishnan via
yhemanth)
HADOOP-4282. Some user facing URLs are not filtered by user filters.
(szetszwo)
HADOOP-4595. Fixes two race conditions - one to do with updating free slot count,
and another to do with starting the MapEventsFetcher thread. (ddas)
HADOOP-4552. Fix a deadlock in RPC server. (Raghu Angadi)
HADOOP-4471. Sort running jobs by priority in the capacity scheduler.
(Amar Kamat via yhemanth)
HADOOP-4500. Fix MultiFileSplit to get the FileSystem from the relevant
path rather than the JobClient. (Joydeep Sen Sarma via cdouglas)
Release 0.18.4 - Unreleased
BUG FIXES
HADOOP-5114. Remove timeout for accept() in DataNode. This makes accept()
fail in JDK on Windows and causes many tests to fail. (Raghu Angadi)
HADOOP-5192. Block receiver should not remove a block that's created or
being written by other threads. (hairong)
HADOOP-5134. FSNamesystem#commitBlockSynchronization adds under-construction
block locations to blocksMap. (Dhruba Borthakur via hairong)
HADOOP-5412. Simulated DataNode should not write to a block that's being
written by another thread. (hairong)
HADOOP-5465. Fix the problem of blocks remaining under-replicated by
providing synchronized modification to the counter xmitsInProgress in
DataNode. (hairong)
HADOOP-5557. Fixes some minor problems in TestOverReplicatedBlocks.
(szetszwo)
HADOOP-5644. Namenode is stuck in safe mode. (Suresh Srinivas via hairong)
HADOOP-6017. Lease Manager in NameNode does not handle certain characters
in filenames. This results in fatal errors in Secondary NameNode and while
restarting NameNode. (Tsz Wo (Nicholas), SZE via rangadi)
Release 0.18.3 - 2009-01-27
IMPROVEMENTS
HADOOP-4150. Include librecordio in hadoop releases. (Giridharan Kesavan
via acmurthy)
HADOOP-4668. Improve documentation for setCombinerClass to clarify the
restrictions on combiners. (omalley)
BUG FIXES
HADOOP-4499. DFSClient should invoke checksumOk only once. (Raghu Angadi)
HADOOP-4597. Calculate mis-replicated blocks when safe-mode is turned
off manually. (shv)
HADOOP-3121. lsr should keep listing the remaining items but not
terminate if there is any IOException. (szetszwo)
HADOOP-4610. Always calculate mis-replicated blocks when safe-mode is
turned off. (shv)
HADOOP-3883. Limit namenode to assign at most one generation stamp for
a particular block within a short period. (szetszwo)
HADOOP-4556. Block went missing. (hairong)
HADOOP-4643. NameNode should exclude excessive replicas when counting
live replicas for a block. (hairong)
HADOOP-4703. Should not wait for proxy forever in lease recovering.
(szetszwo)
HADOOP-4647. NamenodeFsck should close the DFSClient it has created.
(szetszwo)
HADOOP-4616. Fuse-dfs can handle bad values from FileSystem.read call.
(Pete Wyckoff via dhruba)
HADOOP-4061. Throttle Datanode decommission monitoring in Namenode.
(szetszwo)
HADOOP-4659. Root cause of connection failure is being lost to code that
uses it for delaying startup. (Steve Loughran and Hairong via hairong)
HADOOP-4614. Lazily open segments when merging map spills to avoid using
too many file descriptors. (Yuri Pradkin via cdouglas)
HADOOP-4257. The DFS client should pick only one datanode as the candidate
to initiate lease recovery. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-4713. Fix librecordio to handle records larger than 64k. (Christian
Kunz via cdouglas)
HADOOP-4635. Fix a memory leak in fuse dfs. (Pete Wyckoff via mahadev)
HADOOP-4714. Report status between merges and make the number of records
between progress reports configurable. (Jothi Padmanabhan via cdouglas)
HADOOP-4726. Fix documentation typos "the the". (Edward J. Yoon via
szetszwo)
HADOOP-4679. Datanode prints tons of log messages: waiting for threadgroup
to exit, active threads is XX. (hairong)
HADOOP-4746. Job output directory should be normalized. (hairong)
HADOOP-4717. Removal of default port# in NameNode.getUri() causes a
map/reduce job to fail to promote temporary output. (hairong)
HADOOP-4778. Check for zero size block meta file when updating a block.
(szetszwo)
HADOOP-4742. Replica gets deleted by mistake. (Wang Xu via hairong)
HADOOP-4702. Failed block replication leaves an incomplete block in
receiver's tmp data directory. (hairong)
HADOOP-4613. Fix block browsing on Web UI. (Johan Oskarsson via shv)
HADOOP-4806. HDFS rename should not use src path as a regular expression.
(szetszwo)
HADOOP-4795. Prevent lease monitor getting into an infinite loop when
leases and the namespace tree do not match. (szetszwo)
HADOOP-4620. Fixes Streaming to correctly handle map/reduce jobs with empty
input/output. (Ravi Gummadi via ddas)
HADOOP-4857. Fixes TestUlimit to have exactly 1 map in the jobs spawned.
(Ravi Gummadi via ddas)
HADOOP-4810. Data lost at cluster startup time. (hairong)
HADOOP-4797. Improve how RPC server reads and writes large buffers. Avoids
soft-leak of direct buffers and excess copies in NIO layer. (Raghu Angadi)
HADOOP-4840. TestNodeCount sometimes fails with NullPointerException.
(hairong)
HADOOP-4904. Fix deadlock while leaving safe mode. (shv)
HADOOP-1980. 'dfsadmin -safemode enter' should prevent the namenode from
leaving safemode automatically. (shv)
HADOOP-4951. Lease monitor should acquire the LeaseManager lock but not the
Monitor lock. (szetszwo)
HADOOP-4935. processMisReplicatedBlocks() should not clear
excessReplicateMap. (shv)
HADOOP-4961. Fix ConcurrentModificationException in lease recovery
of empty files. (shv)
HADOOP-4971. A long (unexpected) delay at datanodes could cause subsequent
block reports from many datanodes to arrive at the same time. (Raghu Angadi)
HADOOP-4910. NameNode should exclude replicas when choosing excessive
replicas to delete to avoid data loss. (hairong)
HADOOP-4983. Fixes a problem in updating Counters in the status reporting.
(Amareshwari Sriramadasu via ddas)
Release 0.18.2 - 2008-11-03
BUG FIXES
HADOOP-3614. Fix a bug that Datanode may use an old GenerationStamp to get
meta file. (szetszwo)
HADOOP-4314. Simulated datanodes should not include blocks that are still
being written in their block report. (Raghu Angadi)
HADOOP-4228. dfs datanode metrics, bytes_read and bytes_written, overflow
due to incorrect type used. (hairong)
HADOOP-4395. The FSEditLog loading is incorrect for the case OP_SET_OWNER.
(szetszwo)
HADOOP-4351. FSNamesystem.getBlockLocationsInternal throws
ArrayIndexOutOfBoundsException. (hairong)
HADOOP-4403. Make TestLeaseRecovery and TestFileCreation more robust.
(szetszwo)
HADOOP-4292. Do not support append() for LocalFileSystem. (hairong)
HADOOP-4399. Make fuse-dfs multi-thread access safe.
(Pete Wyckoff via dhruba)
HADOOP-4369. Use setMetric(...) instead of incrMetric(...) for metrics
averages. (Brian Bockelman via szetszwo)
HADOOP-4469. Rename and add the ant task jar file to the tar file. (nigel)
HADOOP-3914. DFSClient sends Checksum Ok only once for a block.
(Christian Kunz via hairong)
HADOOP-4467. SerializationFactory now uses the current context ClassLoader
allowing for user supplied Serialization instances. (Chris Wensel via
acmurthy)
HADOOP-4517. Release FSDataset lock before joining ongoing create threads.
(szetszwo)
HADOOP-4526. fsck failing with NullPointerException. (hairong)
HADOOP-4483. Honor the max parameter in DatanodeDescriptor.getBlockArray(..)
(Ahad Rana and Hairong Kuang via szetszwo)
HADOOP-4340. Correctly set the exit code from JobShell.main so that the
'hadoop jar' command returns the right code to the user. (acmurthy)
NEW FEATURES
HADOOP-2421. Add jdiff output to documentation, listing all API
changes from the prior release. (cutting)
Release 0.18.1 - 2008-09-17
IMPROVEMENTS
HADOOP-3934. Upgrade log4j to 1.2.15. (omalley)
BUG FIXES
HADOOP-3995. In case of quota failure on HDFS, rename does not restore
source filename. (rangadi)
HADOOP-3821. Prevent SequenceFile and IFile from duplicating codecs in
CodecPool when closed more than once. (Arun Murthy via cdouglas)
HADOOP-4040. Remove coded default of the IPC idle connection timeout
from the TaskTracker, which was causing HDFS client connections to not be
collected. (ddas via omalley)
HADOOP-4046. Made WritableComparable's constructor protected instead of
private to re-enable class derivation. (cdouglas via omalley)
HADOOP-3940. Fix in-memory merge condition to wait when there are no map
outputs or when the final map outputs are being fetched without contention.
(cdouglas)
Release 0.18.0 - 2008-08-19
INCOMPATIBLE CHANGES
HADOOP-2703. The default options to fsck skip checking files
that are being written to. The output of fsck is incompatible
with previous release. (lohit vijayarenu via dhruba)
HADOOP-2865. FsShell.ls() printout format changed to print file names
in the end of the line. (Edward J. Yoon via shv)
HADOOP-3283. The Datanode has a RPC server. It currently supports
two RPCs: the first RPC retrieves the metadata about a block and the
second RPC sets the generation stamp of an existing block.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2797. Code related to upgrading to 0.14 (Block CRCs) is
removed. As a result, upgrade to 0.18 or later from 0.13 or earlier
is not supported. If upgrading from 0.13 or earlier is required,
please upgrade to an intermediate version (0.14-0.17) and then
to this version. (rangadi)
HADOOP-544. This issue introduces new classes JobID, TaskID and
TaskAttemptID, which should be used instead of their string counterparts.
Functions in JobClient, TaskReport, RunningJob, jobcontrol.Job and
TaskCompletionEvent that use string arguments are deprecated in favor
of the corresponding ones that use ID objects. Applications can use
xxxID.toString() and xxxID.forName() methods to convert/restore objects
to/from strings. (Enis Soztutar via ddas)
HADOOP-2188. RPC client sends a ping rather than throw timeouts.
RPC server does not throw away old RPCs. If clients and the server are on
different versions, they are not able to function well. In addition,
the property ipc.client.timeout is removed from the default hadoop
configuration. It also removes metrics RpcOpsDiscardedOPsNum. (hairong)
HADOOP-2181. This issue adds logging for input splits in Jobtracker log
and jobHistory log. Also adds web UI for viewing input splits in job UI
and history UI. (Amareshwari Sriramadasu via ddas)
HADOOP-3226. Run combiners multiple times over map outputs as they
are merged in both the map and the reduce tasks. (cdouglas via omalley)
HADOOP-3329. DatanodeDescriptor objects should not be stored in the
fsimage. (dhruba)
HADOOP-2656. The Block object has a generation stamp inside it.
Existing blocks get a generation stamp of 0. This is needed to support
appends. (dhruba)
HADOOP-3390. Removed deprecated ClientProtocol.abandonFileInProgress().
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3405. Made some map/reduce internal classes non-public:
MapTaskStatus, ReduceTaskStatus, JobSubmissionProtocol,
CompletedJobStatusStore. (enis via omalley)
HADOOP-3265. Removed deprecated API getFileCacheHints().
(Lohit Vijayarenu via rangadi)
HADOOP-3310. The namenode instructs the primary datanode to do lease
recovery. The block gets a new generation stamp.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2909. Improve IPC idle connection management. Property
ipc.client.maxidletime is removed from the default configuration,
instead it is defined as twice the value of ipc.client.connection.maxidletime.
A connection with outstanding requests won't be treated as idle.
(hairong)
HADOOP-3459. Change in the output format of dfs -ls to more closely match
/bin/ls. New format is: perm repl owner group size date name
(Mukund Madhugiri via omalley)
HADOOP-3113. An fsync invoked on a HDFS file really really
persists data! The datanode moves blocks in the tmp directory to
the real block directory on a datanode-restart. (dhruba)
HADOOP-3452. Change fsck to return non-zero status for a corrupt
FileSystem. (lohit vijayarenu via cdouglas)
HADOOP-3193. Include the address of the client that found the corrupted
block in the log. Also include a CorruptedBlocks metric to track the size
of the corrupted block map. (cdouglas)
HADOOP-3512. Separate out the tools into a tools jar. (omalley)
HADOOP-3598. Ensure that temporary task-output directories are not created
if they are not necessary e.g. for Maps with no side-effect files.
(acmurthy)
HADOOP-3665. Modify WritableComparator so that it only creates instances
of the keytype if the type does not define a WritableComparator. Calling
the superclass compare will throw a NullPointerException. Also define
a RawComparator for NullWritable and permit it to be written as a key
to SequenceFiles. (cdouglas)
HADOOP-3673. Avoid deadlock caused by DataNode RPC recoverBlock().
(Tsz Wo (Nicholas), SZE via rangadi)
NEW FEATURES
HADOOP-3074. Provides a UrlStreamHandler for DFS and other FS,
relying on FileSystem (taton)
HADOOP-2585. Name-node imports namespace data from a recent checkpoint
accessible via a NFS mount. (shv)
HADOOP-3061. Writable types for doubles and bytes. (Andrzej
Bialecki via omalley)
HADOOP-2857. Allow libhdfs to set jvm options. (Craig Macdonald
via omalley)
HADOOP-3317. Add default port for HDFS namenode. The port in
"hdfs:" URIs now defaults to 8020, so that one may simply use URIs
of the form "hdfs://example.com/dir/file". (cutting)
HADOOP-2019. Adds support for .tar, .tgz and .tar.gz files in
DistributedCache (Amareshwari Sriramadasu via ddas)
HADOOP-3058. Add FSNamesystem status metrics.
(Lohit Vijayarenu via rangadi)
HADOOP-1915. Allow users to specify counters via strings instead
of enumerations. (tomwhite via omalley)
HADOOP-2065. Delay invalidating corrupt replicas of a block until it
is removed from the under-replicated state. If all replicas are found to
be corrupt, retain all copies and mark the block as corrupt.
(Lohit Vijayarenu via rangadi)
HADOOP-3221. Adds org.apache.hadoop.mapred.lib.NLineInputFormat, which
splits files into splits each of N lines. N can be specified by
configuration property "mapred.line.input.format.linespermap", which
defaults to 1. (Amareshwari Sriramadasu via ddas)
HADOOP-3336. Direct a subset of annotated FSNamesystem calls for audit
logging. (cdouglas)
HADOOP-3400. A new API FileSystem.deleteOnExit() that facilitates
handling of temporary files in HDFS. (dhruba)
HADOOP-4. Add fuse-dfs to contrib, permitting one to mount an
HDFS filesystem on systems that support FUSE, e.g., Linux.
(Pete Wyckoff via cutting)
HADOOP-3246. Add FTPFileSystem. (Ankur Goel via cutting)
HADOOP-3250. Extend FileSystem API to allow appending to files.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3177. Implement Syncable interface for FileSystem.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-1328. Implement user counters in streaming. (tomwhite via
omalley)
HADOOP-3187. Quotas for namespace management. (Hairong Kuang via ddas)
HADOOP-3307. Support for Archives in Hadoop. (Mahadev Konar via ddas)
HADOOP-3460. Add SequenceFileAsBinaryOutputFormat to permit direct
writes of serialized data. (Koji Noguchi via cdouglas)
HADOOP-3230. Add ability to get counter values from command
line. (tomwhite via omalley)
HADOOP-930. Add support for native S3 files. (tomwhite via cutting)
HADOOP-3502. Quota API needs documentation in Forrest. (hairong)
HADOOP-3413. Allow SequenceFile.Reader to use serialization
framework. (tomwhite via omalley)
HADOOP-3541. Import of the namespace from a checkpoint documented
in hadoop user guide. (shv)
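The default-port behavior described in HADOOP-3317 above can be sketched as follows. This is only an illustration built on java.net.URI, not the HDFS client code; the class name HdfsDefaultPort and the method effectivePort are hypothetical, and only the value 8020 comes from the entry itself.

```java
// Minimal sketch, assuming the port is resolved from a java.net.URI.
// HdfsDefaultPort and effectivePort are hypothetical names for illustration.
final class HdfsDefaultPort {
    // 8020 is the default stated in the HADOOP-3317 entry.
    static final int DEFAULT_PORT = 8020;

    // Returns the explicit port of the URI, or the default when none is given.
    static int effectivePort(java.net.URI uri) {
        int port = uri.getPort(); // URI.getPort() returns -1 when no port is present
        return port == -1 ? DEFAULT_PORT : port;
    }

    public static void main(String[] args) {
        System.out.println(effectivePort(java.net.URI.create("hdfs://example.com/dir/file")));
        System.out.println(effectivePort(java.net.URI.create("hdfs://example.com:9000/dir/file")));
    }
}
```

With this rule, a URI without a port and the same URI with ":8020" resolve to the same namenode address.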
IMPROVEMENTS
HADOOP-3677. Simplify generation stamp upgrade by making it a
local upgrade on datanodes. Deleted distributed upgrade.
(rangadi)
HADOOP-2928. Remove deprecated FileSystem.getContentLength().
(Lohit Vijayarenu via rangadi)
HADOOP-3130. Make the connect timeout smaller for getFile.
(Amar Ramesh Kamat via ddas)
HADOOP-3160. Remove deprecated exists() from ClientProtocol and
FSNamesystem (Lohit Vijayarenu via rangadi)
HADOOP-2910. Throttle IPC Clients during bursts of requests or
server slowdown. Clients retry connection for up to 15 minutes
when socket connection times out. (hairong)
HADOOP-3295. Allow TextOutputFormat to use configurable separators.
(Zheng Shao via cdouglas)
HADOOP-3308. Improve QuickSort by excluding values equal to the pivot from the
partition. (cdouglas)
HADOOP-2461. Trim property names in configuration.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2799. Deprecate o.a.h.io.Closeable in favor of java.io.Closeable.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3345. Enhance the hudson-test-patch target to cleanup messages,
fix minor defects, and add eclipse plugin and python unit tests. (nigel)
HADOOP-3144. Improve robustness of LineRecordReader by defining a maximum
line length (mapred.linerecordreader.maxlength), thereby avoiding reading
too far into the following split. (Zheng Shao via cdouglas)
HADOOP-3334. Move lease handling from FSNamesystem into a separate class.
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3332. Reduces the amount of logging in Reducer's shuffle phase.
(Devaraj Das)
HADOOP-3355. Enhances Configuration class to accept hex numbers for getInt
and getLong. (Amareshwari Sriramadasu via ddas)
HADOOP-3350. Add an argument to distcp to permit the user to limit the
number of maps. (cdouglas)
HADOOP-3013. Add corrupt block reporting to fsck.
(lohit vijayarenu via cdouglas)
HADOOP-3377. Remove TaskRunner::replaceAll and replace with equivalent
String::replace. (Brice Arnould via cdouglas)
HADOOP-3398. Minor improvement to a utility function that participates
in backoff calculation. (cdouglas)
HADOOP-3381. Clear references when directories are deleted so that
the effect of memory leaks is not multiplied. (rangadi)
HADOOP-2867. Adds the task's CWD to its LD_LIBRARY_PATH.
(Amareshwari Sriramadasu via ddas)
HADOOP-3232. DU class runs the 'du' command in a separate thread so
that it does not block user. DataNode misses heartbeats in large
nodes otherwise. (Johan Oskarsson via rangadi)
HADOOP-3035. During block transfers between datanodes, the receiving
datanode can now report corrupt replicas received from the source node to
the namenode. (Lohit Vijayarenu via rangadi)
HADOOP-3434. Retain the cause of the bind failure in Server::bind.
(Steve Loughran via cdouglas)
HADOOP-3429. Increases the size of the buffers used for the communication
for Streaming jobs. (Amareshwari Sriramadasu via ddas)
HADOOP-3486. Change default for initial block report to 0 seconds
and document it. (Sanjay Radia via omalley)
HADOOP-3448. Improve the text in the assertion making sure the
layout versions are consistent in the data node. (Steve Loughran
via omalley)
HADOOP-2095. Improve the Map-Reduce shuffle/merge by cutting down
buffer-copies; changed intermediate sort/merge to use the new IFile format
rather than SequenceFiles and compression of map-outputs is now
implemented by compressing the entire file rather than SequenceFile
compression. Shuffle also has been changed to use a simple byte-buffer
manager rather than the InMemoryFileSystem.
Configuration changes to hadoop-default.xml:
deprecated mapred.map.output.compression.type
(acmurthy)
HADOOP-236. JobTracker now refuses connection from a task tracker with a
different version number. (Sharad Agarwal via ddas)
HADOOP-3427. Improves the shuffle scheduler. It now waits for notifications
from shuffle threads when it has scheduled enough, before scheduling more.
(ddas)
HADOOP-2393. Moves the handling of dir deletions in the tasktracker to
a separate thread. (Amareshwari Sriramadasu via ddas)
HADOOP-3501. Deprecate InMemoryFileSystem. (cutting via omalley)
HADOOP-3366. Stall the shuffle while in-memory merge is in progress.
(acmurthy)
HADOOP-2916. Refactor src structure, but leave package structure alone.
(Raghu Angadi via mukund)
HADOOP-3492. Add forrest documentation for user archives.
(Mahadev Konar via hairong)
HADOOP-3467. Improve documentation for FileSystem::deleteOnExit.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3379. Documents stream.non.zero.exit.status.is.failure for Streaming.
(Amareshwari Sriramadasu via ddas)
HADOOP-3096. Improves documentation about the Task Execution Environment in
the Map-Reduce tutorial. (Amareshwari Sriramadasu via ddas)
HADOOP-2984. Add forrest documentation for DistCp. (cdouglas)
HADOOP-3406. Add forrest documentation for Profiling.
(Amareshwari Sriramadasu via ddas)
HADOOP-2762. Add forrest documentation for controls of memory limits on
hadoop daemons and Map-Reduce tasks. (Amareshwari Sriramadasu via ddas)
HADOOP-3535. Fix documentation and name of IOUtils.close to
reflect that it should only be used in cleanup contexts. (omalley)
HADOOP-3593. Updates the mapred tutorial. (ddas)
HADOOP-3547. Documents the way in which native libraries can be distributed
via the DistributedCache. (Amareshwari Sriramadasu via ddas)
HADOOP-3606. Updates the Streaming doc. (Amareshwari Sriramadasu via ddas)
HADOOP-3532. Add jdiff reports to the build scripts. (omalley)
HADOOP-3100. Develop tests to test the DFS command line interface. (mukund)
HADOOP-3688. Fix up HDFS docs. (Robert Chansler via hairong)
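As an illustration of HADOOP-3355 above, the following sketch shows one way an integer setting could accept both decimal and hex text. It is not the Configuration class's code; the class HexIntParsing, the method parseIntValue, and the "0x"/"0X" prefix convention are assumptions made for the example.

```java
// Minimal sketch, assuming hex values are marked with a "0x" or "0X" prefix.
// HexIntParsing and parseIntValue are hypothetical names, not Hadoop API.
final class HexIntParsing {
    static int parseIntValue(String value) {
        boolean negative = value.startsWith("-");
        String digits = negative ? value.substring(1) : value;
        if (digits.startsWith("0x") || digits.startsWith("0X")) {
            // Parse the part after the prefix in base 16.
            int magnitude = Integer.parseInt(digits.substring(2), 16);
            return negative ? -magnitude : magnitude;
        }
        // No hex prefix: fall back to ordinary decimal parsing.
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        System.out.println(parseIntValue("0x1F"));
        System.out.println(parseIntValue("42"));
    }
}
```

A getLong-style variant would follow the same shape with Long.parseLong.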
OPTIMIZATIONS
HADOOP-3274. The default constructor of BytesWritable creates empty
byte array. (Tsz Wo (Nicholas), SZE via shv)
HADOOP-3272. Remove redundant copy of Block object in BlocksMap.
(Lohit Vijayarenu via shv)
HADOOP-3164. Reduce DataNode CPU usage by using FileChannel.transferTo().
On Linux DataNode takes 5 times less CPU while serving data. Results may
vary on other platforms. (rangadi)
HADOOP-3248. Optimization of saveFSImage. (Dhruba via shv)
HADOOP-3297. Fetch more task completion events from the job
tracker and task tracker. (ddas via omalley)
HADOOP-3364. Faster image and log edits loading. (shv)
HADOOP-3369. Fast block processing during name-node startup. (shv)
HADOOP-1702. Reduce buffer copies when data is written to DFS.
DataNodes take 30% less CPU while writing data. (rangadi)
HADOOP-3095. Speed up split generation in the FileInputSplit,
especially for non-HDFS file systems. Deprecates
InputFormat.validateInput. (tomwhite via omalley)
HADOOP-3552. Add forrest documentation for Hadoop commands.
(Sharad Agarwal via cdouglas)
BUG FIXES
HADOOP-2905. 'fsck -move' triggers NPE in NameNode.
(Lohit Vijayarenu via rangadi)
Increment ClientProtocol.versionID missed by HADOOP-2585. (shv)
HADOOP-3254. Restructure internal namenode methods that process
heartbeats to use well-defined BlockCommand object(s) instead of
using the base java Object. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3176. Change lease record when a open-for-write-file
gets renamed. (dhruba)
HADOOP-3269. Fix a case when namenode fails to restart
while processing a lease record. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3282. Port issues in TestCheckpoint resolved. (shv)
HADOOP-3268. file:// URLs issue in TestUrlStreamHandler under Windows.
(taton)
HADOOP-3127. Deleting files in trash should really remove them.
(Brice Arnould via omalley)
HADOOP-3300. Fix locking of explicit locks in NetworkTopology.
(tomwhite via omalley)
HADOOP-3270. Constant DatanodeCommands are stored in static final
immutable variables for better code clarity.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2793. Fix broken links for worst performing shuffle tasks in
the job history page. (Amareshwari Sriramadasu via ddas)
HADOOP-3313. Avoid unnecessary calls to System.currentTimeMillis
in RPC::Invoker. (cdouglas)
HADOOP-3318. Recognize "Darwin" as an alias for "Mac OS X" to
support Soylatte. (Sam Pullara via omalley)
HADOOP-3301. Fix misleading error message when S3 URI hostname
contains an underscore. (tomwhite via omalley)
HADOOP-3338. Fix Eclipse plugin to compile after HADOOP-544 was
committed. Updated all references to use the new JobID representation.
(taton via nigel)
HADOOP-3337. Loading FSEditLog was broken by HADOOP-3283 since it
changed Writable serialization of DatanodeInfo. This patch handles it.
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3101. Prevent JobClient from throwing an exception when printing
usage. (Edward J. Yoon via cdouglas)
HADOOP-3119. Update javadoc for Text::getBytes to better describe its
behavior. (Tim Nelson via cdouglas)
HADOOP-2294. Fix documentation in libhdfs to refer to the correct free
function. (Craig Macdonald via cdouglas)
HADOOP-3335. Prevent the libhdfs build from deleting the wrong
files on make clean. (cutting via omalley)
HADOOP-2930. Make {start,stop}-balancer.sh work even if hadoop-daemon.sh
is not in the PATH. (Spiros Papadimitriou via hairong)
HADOOP-3085. Catch Exception in metrics util classes to ensure that
misconfigured metrics don't prevent others from updating. (cdouglas)
HADOOP-3299. CompositeInputFormat should configure the sub-input
formats. (cdouglas via omalley)
HADOOP-3309. Lower io.sort.mb and fs.inmemory.size.mb for MiniMRDFSSort
unit test so it passes on Windows. (lohit vijayarenu via cdouglas)
HADOOP-3348. TestUrlStreamHandler should set URLStreamFactory after
DataNodes are initialized. (Lohit Vijayarenu via rangadi)
HADOOP-3371. Ignore InstanceAlreadyExistsException from
MBeanUtil::registerMBean. (lohit vijayarenu via cdouglas)
HADOOP-3349. A file rename was incorrectly changing the name inside a
lease record. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3365. Removes an unnecessary copy of the key from SegmentDescriptor
to MergeQueue. (Devaraj Das)
HADOOP-3388. Fix for TestDatanodeBlockScanner to handle blocks with
generation stamps in them. (dhruba)
HADOOP-3203. Fixes TaskTracker::localizeJob to pass correct file sizes
for the jarfile and the jobfile. (Amareshwari Sriramadasu via ddas)
HADOOP-3391. Fix a findbugs warning introduced by HADOOP-3248 (rangadi)
HADOOP-3393. Fix datanode shutdown to call DataBlockScanner::shutdown and
close its log, even if the scanner thread is not running. (lohit vijayarenu
via cdouglas)
HADOOP-3399. A debug message was logged at info level. (rangadi)
HADOOP-3396. TestDatanodeBlockScanner occasionally fails.
(Lohit Vijayarenu via rangadi)
HADOOP-3339. Some of the failures on the 3rd datanode in the DFS write pipeline
are not detected properly. This could lead to hard failure of the client's
write operation. (rangadi)
HADOOP-3409. Namenode should save the root inode into fsimage. (hairong)
HADOOP-3296. Fix task cache to work for more than two levels in the cache
hierarchy. This also adds a new counter to track cache hits at levels
greater than two. (Amar Kamat via cdouglas)
HADOOP-3375. Lease paths were sometimes not removed from
LeaseManager.sortedLeasesByPath. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3424. Values returned by getPartition should be checked to
make sure they are in the range 0 to #reduces - 1 (cdouglas via
omalley)
HADOOP-3408. Change FSNamesystem to send its metrics as integers to
accommodate collectors that don't support long values. (lohit vijayarenu
via cdouglas)
HADOOP-3403. Fixes a problem in the JobTracker to do with handling of lost
tasktrackers. (Arun Murthy via ddas)
HADOOP-1318. Completed maps are not failed if the number of reducers is
zero. (Amareshwari Sriramadasu via ddas)
HADOOP-3351. Fixes the history viewer tool to not do huge StringBuffer
allocations. (Amareshwari Sriramadasu via ddas)
HADOOP-3419. Fixes TestFsck to wait for updates to happen before
checking results to make the test more reliable. (Lohit Vijaya
Renu via omalley)
HADOOP-3259. Makes failure to read system properties due to a
security manager non-fatal. (Edward Yoon via omalley)
HADOOP-3451. Update libhdfs to use FileSystem::getFileBlockLocations
instead of removed getFileCacheHints. (lohit vijayarenu via cdouglas)
HADOOP-3401. Update FileBench to set the new
"mapred.work.output.dir" property to work post-3041. (cdouglas via omalley)
HADOOP-2669. DFSClient locks pendingCreates appropriately. (dhruba)
HADOOP-3410. Fix KFS implementation to return correct file
modification time. (Sriram Rao via cutting)
HADOOP-3340. Fix DFS metrics for BlocksReplicated, HeartbeatsNum, and
BlockReportsAverageTime. (lohit vijayarenu via cdouglas)
HADOOP-3435. Remove the assumption in the scripts that bash is at
/bin/bash and fix the test patch to require bash instead of sh.
(Brice Arnould via omalley)
HADOOP-3471. Fix spurious errors from TestIndexedSort and add additional
logging to let failures be reproducible. (cdouglas)
HADOOP-3443. Avoid copying map output across partitions when renaming a
single spill. (omalley via cdouglas)
HADOOP-3454. Fix Text::find to search only valid byte ranges. (Chad Whipkey
via cdouglas)
HADOOP-3417. Removes the static configuration variable,
commandLineConfig from JobClient. Moves the cli parsing from
JobShell to GenericOptionsParser. Thus removes the class
org.apache.hadoop.mapred.JobShell. (Amareshwari Sriramadasu via
ddas)
HADOOP-2132. Only RUNNING/PREP jobs can be killed. (Jothi Padmanabhan
via ddas)
HADOOP-3476. Code cleanup in fuse-dfs.
(Peter Wyckoff via dhruba)
HADOOP-2427. Ensure that the cwd of completed tasks is cleaned-up
correctly on task-completion. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-2565. Remove DFSPath cache of FileStatus.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3326. Cleanup the local-fs and in-memory merge in the ReduceTask by
spawning only one thread each for the on-disk and in-memory merge.
(Sharad Agarwal via acmurthy)
HADOOP-3493. Fix TestStreamingFailure to use FileUtil.fullyDelete to
ensure correct cleanup. (Lohit Vijayarenu via acmurthy)
HADOOP-3455. Fix NPE in ipc.Client in case of connection failure and
improve its synchronization. (hairong)
HADOOP-3240. Fix a testcase to not create files in the current directory.
Instead the file is created in the test directory (Mahadev Konar via ddas)
HADOOP-3496. Fix failure in TestHarFileSystem.testArchives due to change
in HADOOP-3095. (tomwhite)
HADOOP-3135. Get the system directory from the JobTracker instead of from
the conf. (Subramaniam Krishnan via ddas)
HADOOP-3503. Fix a race condition when client and namenode start
simultaneous recovery of the same block. (dhruba & Tsz Wo
(Nicholas), SZE)
HADOOP-3440. Fixes DistributedCache to not create symlinks for paths which
don't have fragments even when createSymLink is true.
(Abhijit Bagri via ddas)
HADOOP-3463. Hadoop-daemons script should cd to $HADOOP_HOME. (omalley)
HADOOP-3489. Fix NPE in SafeModeMonitor. (Lohit Vijayarenu via shv)
HADOOP-3509. Fix NPE in FSNamesystem.close. (Tsz Wo (Nicholas), SZE via
shv)
HADOOP-3491. Name-node shutdown causes InterruptedException in
ResolutionMonitor. (Lohit Vijayarenu via shv)
HADOOP-3511. Fixes namenode image to not set the root's quota to an
invalid value when the quota was not saved in the image. (hairong)
HADOOP-3516. Ensure the JobClient in HadoopArchives is initialized
with a configuration. (Subramaniam Krishnan via omalley)
HADOOP-3513. Improve NNThroughputBenchmark log messages. (shv)
HADOOP-3519. Fix NPE in DFS FileSystem rename. (hairong via tomwhite)
HADOOP-3528. Metrics FilesCreated and files_deleted metrics
do not match. (Lohit via Mahadev)
HADOOP-3418. When a directory is deleted, any leases that point to files
in the subdirectory are removed. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3542. Disables the creation of _logs directory for the archives
directory. (Mahadev Konar via ddas)
HADOOP-3544. Fixes a documentation issue for hadoop archives.
(Mahadev Konar via ddas)
HADOOP-3517. Fixes a problem in the reducer due to which the last InMemory
merge may be missed. (Arun Murthy via ddas)
HADOOP-3548. Fixes build.xml to copy all *.jar files to the dist.
(Owen O'Malley via ddas)
HADOOP-3363. Fix unformatted storage detection in FSImage. (shv)
HADOOP-3560. Fixes a problem to do with split creation in archives.
(Mahadev Konar via ddas)
HADOOP-3545. Fixes an overflow problem in archives.
(Mahadev Konar via ddas)
HADOOP-3561. Prevent the trash from deleting its parent directories.
(cdouglas)
HADOOP-3575. Fix the clover ant target after package refactoring.
(Nigel Daley via cdouglas)
HADOOP-3539. Fix the tool path in the bin/hadoop script under
cygwin. (Tsz Wo (Nicholas), Sze via omalley)
HADOOP-3520. TestDFSUpgradeFromImage triggers a race condition in the
Upgrade Manager. Fixed. (dhruba)
HADOOP-3586. Provide deprecated, backwards compatible semantics for the
combiner to be run once and only once on each record. (cdouglas)
HADOOP-3533. Add deprecated methods to provide API compatibility
between 0.18 and 0.17. Remove the deprecated methods in trunk. (omalley)
HADOOP-3580. Fixes a problem to do with specifying a har as an input to
a job. (Mahadev Konar via ddas)
HADOOP-3333. Don't assign a task to a tasktracker that it failed to
execute earlier (used to happen in the case of lost tasktrackers where
the tasktracker would reinitialize and bind to a different port).
(Jothi Padmanabhan and Arun Murthy via ddas)
HADOOP-3534. Log IOExceptions that happen in closing the name
system when the NameNode shuts down. (Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3546. TaskTracker re-initialization gets stuck in cleaning up.
(Amareshwari Sriramadasu via ddas)
HADOOP-3576. Fix NullPointerException when renaming a directory
to its subdirectory. (Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3320. Fix NullPointerException in NetworkTopology.getDistance().
(hairong)
HADOOP-3569. KFS input stream read() now correctly reads 1 byte
instead of 4. (Sriram Rao via omalley)
HADOOP-3599. Fix JobConf::setCombineOnceOnly to modify the instance rather
than a parameter. (Owen O'Malley via cdouglas)
HADOOP-3590. Null pointer exception in JobTracker when the task tracker is
not yet resolved. (Amar Ramesh Kamat via ddas)
HADOOP-3603. Fix MapOutputCollector to spill when io.sort.spill.percent is
1.0 and to detect spills when emitted records write no data. (cdouglas)
HADOOP-3615. Set DatanodeProtocol.versionID to the correct value.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3559. Fix the libhdfs test script and config to work with the
current semantics. (lohit vijayarenu via cdouglas)
HADOOP-3480. Need to update Eclipse template to reflect current trunk.
(Brice Arnould via tomwhite)
HADOOP-3588. Fixed usability issues with archives. (mahadev)
HADOOP-3635. Uncaught exception in DataBlockScanner.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3639. Exception when closing DFSClient while multiple files are
open. (Benjamin Gufler via hairong)
HADOOP-3572. SetQuotas usage interface has some minor bugs. (hairong)
HADOOP-3649. Fix bug in removing blocks from the corrupted block map.
(Lohit Vijayarenu via shv)
HADOOP-3604. Work around a JVM synchronization problem observed while
retrieving the address of direct buffers from compression code by obtaining
a lock during this call. (Arun C Murthy via cdouglas)
HADOOP-3683. Fix dfs metrics to count file listings rather than files
listed. (lohit vijayarenu via cdouglas)
HADOOP-3597. Fix SortValidator to use filesystems other than the default as
input. Validation job still runs on default fs.
(Jothi Padmanabhan via cdouglas)
HADOOP-3693. Fix archives, distcp and native library documentation to
conform to style guidelines. (Amareshwari Sriramadasu via cdouglas)
HADOOP-3653. Fix test-patch target to properly account for Eclipse
classpath jars. (Brice Arnould via nigel)
HADOOP-3692. Fix documentation for Cluster setup and Quick start guides.
(Amareshwari Sriramadasu via ddas)
HADOOP-3691. Fix streaming and tutorial docs. (Jothi Padmanabhan via ddas)
HADOOP-3630. Fix NullPointerException in CompositeRecordReader from empty
sources (cdouglas)
HADOOP-3706. Fix a ClassLoader issue in the mapred.join Parser that
prevents it from loading user-specified InputFormats.
(Jingkei Ly via cdouglas)
HADOOP-3718. Fix KFSOutputStream::write(int) to output a byte instead of
an int, per the OutputStream contract. (Sriram Rao via cdouglas)
HADOOP-3647. Add debug logs to help track down a very occasional,
hard-to-reproduce bug in shuffle/merge on the reducer. (acmurthy)
HADOOP-3716. Prevent listStatus in KosmosFileSystem from returning
null for valid, empty directories. (Sriram Rao via cdouglas)
HADOOP-3752. Fix audit logging to record rename events. (cdouglas)
HADOOP-3737. Fix CompressedWritable to call Deflater::end to release
compressor memory. (Grant Glouser via cdouglas)
HADOOP-3670. Fixes JobTracker to clear out split bytes when no longer
required. (Amareshwari Sriramadasu via ddas)
HADOOP-3755. Update gridmix to work with HOD 0.4 (Runping Qi via cdouglas)
HADOOP-3743. Fix -libjars, -files, -archives options to work even if
user code does not implement tools. (Amareshwari Sriramadasu via mahadev)
HADOOP-3774. Fix typos in shell output. (Tsz Wo (Nicholas), SZE via
cdouglas)
HADOOP-3762. Fixed FileSystem cache to work with the default port. (cutting
via omalley)
HADOOP-3798. Fix tests compilation. (Mukund Madhugiri via omalley)
HADOOP-3794. Return modification time instead of zero for KosmosFileSystem.
(Sriram Rao via cdouglas)
HADOOP-3806. Remove debug statement to stdout from QuickSort. (cdouglas)
HADOOP-3776. Fix NPE at NameNode when datanode reports a block after it is
deleted at NameNode. (rangadi)
HADOOP-3537. Disallow adding a datanode to a network topology when its
network location is not resolved. (hairong)
HADOOP-3571. Fix bug in block removal used in lease recovery. (shv)
HADOOP-3645. MetricsTimeVaryingRate returns wrong value for
metric_avg_time. (Lohit Vijayarenu via hairong)
HADOOP-3521. Reverted the missing cast to float for sending Counters' values
to Hadoop metrics which was removed by HADOOP-544. (acmurthy)
HADOOP-3820. Fixes two problems in the gridmix-env - a syntax error, and a
wrong definition of USE_REAL_DATASET by default. (Arun Murthy via ddas)
HADOOP-3724. Fixes two problems related to storing and recovering lease
in the fsimage. (dhruba)
HADOOP-3827. Fixed compression of empty map-outputs. (acmurthy)
HADOOP-3865. Remove reference to FSNamesystem from metrics preventing
garbage collection. (Lohit Vijayarenu via cdouglas)
HADOOP-3884. Fix so that Eclipse plugin builds against recent
Eclipse releases. (cutting)
HADOOP-3837. Streaming jobs report progress status. (dhruba)
HADOOP-3897. Fix a NPE in secondary namenode. (Lohit Vijayarenu via
cdouglas)
HADOOP-3901. Fix bin/hadoop to correctly set classpath under cygwin.
(Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3947. Fix a problem in tasktracker reinitialization.
(Amareshwari Sriramadasu via ddas)
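The range check described in HADOOP-3424 above amounts to validating a partition index before it is used. The sketch below is an assumption-laden illustration rather than the MapTask code: the class PartitionCheck, the method checkPartition, and the choice of IllegalArgumentException are all hypothetical.

```java
// Minimal sketch of a partition range check; not the actual Hadoop implementation.
// PartitionCheck and checkPartition are hypothetical names for illustration.
final class PartitionCheck {
    // Returns the partition unchanged if it lies in [0, numReduces - 1],
    // otherwise rejects it with an exception (exception type is an assumption).
    static int checkPartition(int partition, int numReduces) {
        if (partition < 0 || partition >= numReduces) {
            throw new IllegalArgumentException(
                "Illegal partition " + partition + " for " + numReduces + " reduces");
        }
        return partition;
    }

    public static void main(String[] args) {
        System.out.println(checkPartition(2, 3));
    }
}
```

Failing fast here keeps a faulty partitioner from silently writing records to a reduce that does not exist.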
Release 0.17.3 - Unreleased
IMPROVEMENTS
HADOOP-4164. Chinese translation of the documentation. (Xuebing Yan via
omalley)
BUG FIXES
HADOOP-4277. Checksum verification was mistakenly disabled for
LocalFileSystem. (Raghu Angadi)
HADOOP-4271. Checksum input stream can sometimes return invalid
data to the user. (Ning Li via rangadi)
HADOOP-4318. DistCp should use absolute paths for cleanup. (szetszwo)
HADOOP-4326. ChecksumFileSystem does not override create(...) correctly.
(szetszwo)
Release 0.17.2 - 2008-08-11
BUG FIXES
HADOOP-3678. Avoid spurious exceptions logged at DataNode when clients
read from DFS. (rangadi)
HADOOP-3707. NameNode keeps a count of number of blocks scheduled
to be written to a datanode and uses it to avoid allocating more
blocks than a datanode can hold. (rangadi)
HADOOP-3760. Fix a bug with HDFS file close() mistakenly introduced
by HADOOP-3681. (Lohit Vijayarenu via rangadi)
HADOOP-3681. DFSClient can get into an infinite loop while closing
a file if there are some errors. (Lohit Vijayarenu via rangadi)
HADOOP-3002. Hold off block removal while in safe mode. (shv)
HADOOP-3685. Unbalanced replication target. (hairong)
HADOOP-3758. Shutdown datanode on version mismatch instead of retrying
continuously, preventing excessive logging at the namenode.
(lohit vijayarenu via cdouglas)
HADOOP-3633. Correct exception handling in DataXceiveServer, and throttle
the number of xceiver threads in a data-node. (shv)
HADOOP-3370. Ensure that the TaskTracker.runningJobs data-structure is
correctly cleaned-up on task completion. (Zheng Shao via acmurthy)
HADOOP-3813. Fix task-output clean-up on HDFS to use the recursive
FileSystem.delete rather than the FileUtil.fullyDelete. (Amareshwari
Sri Ramadasu via acmurthy)
HADOOP-3859. Allow the maximum number of xceivers in the data node to
be configurable. (Johan Oskarsson via omalley)
HADOOP-3931. Fix corner case in the map-side sort that causes some values
to be counted as too large and cause premature spills to disk. Some values
will also bypass the combiner incorrectly. (cdouglas via omalley)
Release 0.17.1 - 2008-06-23
INCOMPATIBLE CHANGES
HADOOP-3565. Fix the Java serialization, which is not enabled by
default, to clear the state of the serializer between objects.
(tomwhite via omalley)
IMPROVEMENTS
HADOOP-3522. Improve documentation on reduce pointing out that
input keys and values will be reused. (omalley)
HADOOP-3487. Balancer uses thread pools for managing its threads;
therefore provides better resource management. (hairong)
BUG FIXES
HADOOP-2159. Namenode stuck in safemode. The counter blockSafe should
not be decremented for invalid blocks. (hairong)
HADOOP-3472. MapFile.Reader getClosest() function returns incorrect results
when before is true. (Todd Lipcon via Stack)
HADOOP-3442. Limit recursion depth on the stack for QuickSort to prevent
StackOverflowErrors. To avoid O(n*n) cases, when partitioning depth exceeds
a multiple of log(n), change to HeapSort. (cdouglas)
HADOOP-3477. Fix build to not package contrib/*/bin twice in
distributions. (Adam Heath via cutting)
HADOOP-3475. Fix MapTask to correctly size the accounting allocation of
io.sort.mb. (cdouglas)
HADOOP-3550. Fix the serialization data structures in MapTask where the
value lengths are incorrectly calculated. (cdouglas)
HADOOP-3526. Fix contrib/data_join framework by cloning values retained
in the reduce. (Spyros Blanas via cdouglas)
HADOOP-1979. Speed up fsck by adding a buffered stream. (Lohit
Vijaya Renu via omalley)
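The approach described in HADOOP-3442 above (bound the QuickSort partitioning depth by a multiple of log(n), then switch to HeapSort) can be sketched on a plain int array. This is not the o.a.h.util.QuickSort code: the class DepthLimitedSort, the factor of 2, and the last-element pivot are assumptions chosen to keep the example short.

```java
// Minimal sketch of depth-limited quicksort with a heapsort fallback.
// DepthLimitedSort is a hypothetical name; the multiplier 2 is an assumed constant.
final class DepthLimitedSort {
    static void sort(int[] a) {
        int n = a.length;
        // Depth budget: 2 * floor(log2(n)).
        int maxDepth = n < 2 ? 0 : 2 * (31 - Integer.numberOfLeadingZeros(n));
        sort(a, 0, n, maxDepth);
    }

    // Sorts the half-open range a[lo, hi).
    private static void sort(int[] a, int lo, int hi, int depth) {
        while (hi - lo > 1) {
            if (depth-- == 0) {
                heapSort(a, lo, hi); // budget exhausted: O(n log n) worst case
                return;
            }
            int p = partition(a, lo, hi);
            sort(a, lo, p, depth); // recurse on the left part
            lo = p + 1;            // iterate on the right part
        }
    }

    // Lomuto partition with the last element as pivot; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi - 1];
        int i = lo;
        for (int j = lo; j < hi - 1; j++) {
            if (a[j] < pivot) {
                swap(a, i++, j);
            }
        }
        swap(a, i, hi - 1);
        return i;
    }

    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo;
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(a, lo, i, n); // build a max-heap over a[lo, hi)
        }
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end); // move the current maximum to its final slot
            siftDown(a, lo, 0, end);
        }
    }

    // Restores the max-heap property for the heap of size n rooted at offset lo.
    private static void siftDown(int[] a, int lo, int i, int n) {
        while (true) {
            int child = 2 * i + 1;
            if (child >= n) return;
            if (child + 1 < n && a[lo + child + 1] > a[lo + child]) child++;
            if (a[lo + i] >= a[lo + child]) return;
            swap(a, lo + i, lo + child);
            i = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Already-sorted input is the bad case for this pivot choice; there the depth budget runs out quickly and the remaining range is handed to the heapsort, so the stack stays shallow.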
Release 0.17.0 - 2008-05-18
INCOMPATIBLE CHANGES
HADOOP-2786. Move hbase out of hadoop core.
HADOOP-2345. New HDFS transactions to support appending
to files. Disk layout version changed from -11 to -12. (dhruba)
HADOOP-2192. Error messages from "dfs mv" command improved.
(Mahadev Konar via dhruba)
HADOOP-1902. "dfs du" command without any arguments operates on the
current working directory. (Mahadev Konar via dhruba)
HADOOP-2873. Fixed bad disk format introduced by HADOOP-2345.
Disk layout version changed from -12 to -13. See changelist 630992
(dhruba)
HADOOP-1985. This addresses rack-awareness for Map tasks and for
HDFS in a uniform way. (ddas)
HADOOP-1986. Add support for a general serialization mechanism for
Map Reduce. (tomwhite)
HADOOP-771. FileSystem.delete() takes an explicit parameter that
specifies whether a recursive delete is intended.
(Mahadev Konar via dhruba)
HADOOP-2470. Remove getContentLength(String), open(String, long, long)
and isDir(String) from ClientProtocol. ClientProtocol version changed
from 26 to 27. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-2822. Remove deprecated code for classes InputFormatBase and
PhasedFileSystem. (Amareshwari Sriramadasu via enis)
HADOOP-2116. Changes the layout of the task execution directory.
(Amareshwari Sriramadasu via ddas)
HADOOP-2828. The following deprecated methods in Configuration.java
have been removed
getObject(String name)
setObject(String name, Object value)
get(String name, Object defaultValue)
set(String name, Object value)
Iterator entries()
(Amareshwari Sriramadasu via ddas)
HADOOP-2824. Removes one deprecated constructor from MiniMRCluster.
(Amareshwari Sriramadasu via ddas)
HADOOP-2823. Removes deprecated methods getColumn(), getLine() from
org.apache.hadoop.record.compiler.generated.SimpleCharStream.
(Amareshwari Sriramadasu via ddas)
HADOOP-3060. Removes one unused constructor argument from MiniMRCluster.
(Amareshwari Sriramadasu via ddas)
HADOOP-2854. Remove deprecated o.a.h.ipc.Server::getUserInfo().
(lohit vijayarenu via cdouglas)
HADOOP-2563. Remove deprecated FileSystem::listPaths.
(lohit vijayarenu via cdouglas)
HADOOP-2818. Remove deprecated methods in Counters.
(Amareshwari Sriramadasu via tomwhite)
HADOOP-2831. Remove deprecated o.a.h.dfs.INode::getAbsoluteName()
(lohit vijayarenu via cdouglas)
HADOOP-2839. Remove deprecated FileSystem::globPaths.
(lohit vijayarenu via cdouglas)
HADOOP-2634. Deprecate ClientProtocol::exists.
(lohit vijayarenu via cdouglas)
HADOOP-2410. Make EC2 cluster nodes more independent of each other.
Multiple concurrent EC2 clusters are now supported, and nodes may be
added to a cluster on the fly with new nodes starting in the same EC2
availability zone as the cluster. Ganglia monitoring and large
instance sizes have also been added. (Chris K Wensel via tomwhite)
HADOOP-2826. Deprecated FileSplit.getFile(), LineRecordReader.readLine().
(Amareshwari Sriramadasu via ddas)
HADOOP-3239. getFileInfo() returns null for non-existing files instead
of throwing FileNotFoundException. (Lohit Vijayarenu via shv)
HADOOP-3266. Removed HOD changes from CHANGES.txt, as they are now inside
src/contrib/hod (Hemanth Yamijala via ddas)
HADOOP-3280. Separate the configuration of the virtual memory size
(mapred.child.ulimit) from the jvm heap size, so that 64 bit
streaming applications are supported even when running with 32 bit
jvms. (acmurthy via omalley)
NEW FEATURES
HADOOP-1398. Add HBase in-memory block cache. (tomwhite)
HADOOP-2178. Job History on DFS. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2063. A new parameter to dfs -get command to fetch a file
even if it is corrupted. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2219. A new command "dfs -count" that counts the number of
files and directories. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2906. Add an OutputFormat capable of using keys, values, and
config params to map records to different output files.
(Runping Qi via cdouglas)
HADOOP-2346. Utilities to support timeout while writing to sockets.
DFSClient and DataNode sockets have 10min write timeout. (rangadi)
HADOOP-2951. Add a contrib module that provides a utility to
build or update Lucene indexes using Map/Reduce. (Ning Li via cutting)
HADOOP-1622. Allow multiple jar files for map reduce.
(Mahadev Konar via dhruba)
HADOOP-2055. Allows users to set PathFilter on the FileInputFormat.
(Alejandro Abdelnur via ddas)
HADOOP-2551. More environment variables like HADOOP_NAMENODE_OPTS
for better control of HADOOP_OPTS for each component. (rangadi)
HADOOP-3001. Add job counters that measure the number of bytes
read and written to HDFS, S3, KFS, and local file systems. (omalley)
HADOOP-3048. A new Interface and a default implementation to convert
and restore serializations of objects to/from strings. (enis)
IMPROVEMENTS
HADOOP-2655. Copy on write for data and metadata files in the
presence of snapshots. Needed for supporting appends to HDFS
files. (dhruba)
HADOOP-1967. When a Path specifies the same scheme as the default
FileSystem but no authority, the default FileSystem's authority is
used. Also add warnings for old-format FileSystem names, accessor
methods for fs.default.name, and check for null authority in HDFS.
(cutting)
HADOOP-2895. Let the profiling string be configurable.
(Martin Traverso via cdouglas)
HADOOP-910. Enables Reduces to do merges for the on-disk map output files
in parallel with their copying. (Amar Kamat via ddas)
HADOOP-730. Use rename rather than copy for local renames. (cdouglas)
HADOOP-2810. Updated the Hadoop Core logo. (nigel)
HADOOP-2057. Streaming should optionally treat a non-zero exit status
of a child process as a failed task. (Rick Cox via tomwhite)
HADOOP-2765. Enables specifying ulimits for streaming/pipes tasks (ddas)
HADOOP-2888. Make gridmix scripts more readily configurable and amenable
to automated execution. (Mukund Madhugiri via cdouglas)
HADOOP-2908. A document that describes the DFS Shell command.
(Mahadev Konar via dhruba)
HADOOP-2981. Update README.txt to reflect the upcoming use of
cryptography. (omalley)
HADOOP-2804. Add support to publish CHANGES.txt as HTML when running
the Ant 'docs' target. (nigel)
HADOOP-2559. Change DFS block placement to allocate the first replica
locally, the second off-rack, and the third intra-rack from the
second. (lohit vijayarenu via cdouglas)
HADOOP-2939. Make the automated patch testing process an executable
Ant target, test-patch. (nigel)
HADOOP-2239. Add HsftpFileSystem to permit transferring files over ssl.
(cdouglas)
HADOOP-2886. Track individual RPC metrics.
(girish vaitheeswaran via dhruba)
HADOOP-2373. Improvement in safe-mode reporting. (shv)
HADOOP-3091. Modify FsShell command -put to accept multiple sources.
(Lohit Vijaya Renu via cdouglas)
HADOOP-3092. Show counter values from job -status command.
(Tom White via ddas)
HADOOP-1228. Ant task to generate Eclipse project files. (tomwhite)
HADOOP-3093. Adds Configuration.getStrings(name, default-value) and
the corresponding setStrings. (Amareshwari Sriramadasu via ddas)
HADOOP-3106. Adds documentation in forrest for debugging.
(Amareshwari Sriramadasu via ddas)
HADOOP-3099. Add an option to distcp to preserve user, group, and
permission information. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-2841. Unwrap AccessControlException and FileNotFoundException
from RemoteException for DFSClient. (shv)
HADOOP-3152. Make index interval configurable when using
MapFileOutputFormat for map-reduce job. (Rong-En Fan via cutting)
HADOOP-3143. Decrease number of slaves from 4 to 3 in TestMiniMRDFSSort,
as Hudson generates false negatives under the current load.
(Nigel Daley via cdouglas)
HADOOP-3174. Illustrative example for MultipleFileInputFormat. (Enis
Soztutar via acmurthy)
HADOOP-2993. Clarify the usage of JAVA_HOME in the Quick Start guide.
(acmurthy via nigel)
HADOOP-3124. Make DataNode socket write timeout configurable. (rangadi)
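The placement order described in HADOOP-2559 above (first replica local, second off-rack, third on the same rack as the second) can be sketched as follows. This is an illustrative sketch only, not the namenode's implementation: the function name place_replicas, the topology dictionary, and the rack and datanode names are all hypothetical, and none of the fallbacks a real cluster needs (a single rack, a rack with only one node) are handled.

```python
import random


def place_replicas(writer, topology, rng=None):
    """Pick three datanodes: local, off-rack, then same rack as the second.

    topology maps a rack name to the list of datanode names on that rack
    (a hypothetical structure used only for this sketch).
    """
    rng = rng or random.Random()
    # Invert the topology so each node can be mapped back to its rack.
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

    # First replica: the node the writer runs on.
    first = writer

    # Second replica: any node on a rack other than the writer's rack.
    off_rack = [n for n in rack_of if rack_of[n] != rack_of[first]]
    second = rng.choice(off_rack)

    # Third replica: a different node on the second replica's rack.
    same_rack = [n for n in topology[rack_of[second]] if n != second]
    third = rng.choice(same_rack)

    return [first, second, third]
```

With this order, one replica stays close to the writer while the other two share a single remote rack, so only one copy of the block has to cross between racks during the write.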
OPTIMIZATIONS
HADOOP-2790. Fixed inefficient method hasSpeculativeTask by removing
repetitive calls to get the current time and late checking to see if
we want speculation on at all. (omalley)
HADOOP-2758. Reduce buffer copies in DataNode when data is read from
HDFS, without negatively affecting read throughput. (rangadi)
HADOOP-2399. Input key and value to combiner and reducer is reused.
(Owen O'Malley via ddas).
HADOOP-2423. Code optimization in FSNamesystem.mkdirs.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2606. ReplicationMonitor selects data-nodes to replicate directly
from needed replication blocks instead of looking up the blocks for
each live data-node. (shv)
HADOOP-2148. Eliminate redundant data-node blockMap lookups. (shv)
HADOOP-2027. Return the number of bytes in each block in a file
via a single rpc to the namenode to speed up job planning.
(Lohit Vijaya Renu via omalley)
HADOOP-2902. Replace uses of "fs.default.name" with calls to the
accessor methods added in HADOOP-1967. (cutting)
HADOOP-2119. Optimize scheduling of jobs with large numbers of
tasks by replacing static arrays with lists of runnable tasks.
(Amar Kamat via omalley)
HADOOP-2919. Reduce the number of memory copies done during the
map output sorting. Also adds two config variables:
io.sort.spill.percent - the percentage of io.sort.mb that should
cause a spill (default 80%)
io.sort.record.percent - the percent of io.sort.mb that should
hold key/value indexes (default 5%)
(cdouglas via omalley)
HADOOP-3140. Doesn't add a task in the commit queue if the task hadn't
generated any output. (Amar Kamat via ddas)
HADOOP-3168. Reduce the amount of logging in streaming to an
exponentially increasing number of records (up to 10,000
records/log). (Zheng Shao via omalley)
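As a rough illustration of the two settings introduced by HADOOP-2919 above, the arithmetic below applies the stated defaults (80% and 5%) to an example io.sort.mb value. The function name and the 100 MB figure are assumptions made for the example; the sketch only restates the percentages from the entry and does not model how the map-side sort buffer is actually laid out.

```python
def sort_buffer_sizes(io_sort_mb, spill_percent=80, record_percent=5):
    """Translate the percentage settings into byte counts.

    Percentages are whole numbers so the arithmetic stays in integers.
    """
    total_bytes = io_sort_mb * 1024 * 1024
    return {
        # Portion of io.sort.mb set aside for key/value indexes.
        "index_bytes": total_bytes * record_percent // 100,
        # Fill level of io.sort.mb at which a spill is triggered.
        "spill_bytes": total_bytes * spill_percent // 100,
    }


# Example value only: a 100 MB sort buffer.
sizes = sort_buffer_sizes(100)
```

For a 100 MB buffer this works out to 5 MB for indexes and a spill once 80 MB is in use.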
BUG FIXES
HADOOP-2195. '-mkdir' behaviour is now closer to Linux shell in case of
errors. (Mahadev Konar via rangadi)
HADOOP-2190. Bring the behaviour of '-ls' and '-du' closer to Linux shell
commands in case of errors. (Mahadev Konar via rangadi)
HADOOP-2193. 'fs -rm' and 'fs -rmr' show error message when the target
file does not exist. (Mahadev Konar via rangadi)
HADOOP-2738. Text is not subclassable because set(Text) and compareTo(Object)
access the other instance's private members directly. (jimk)
HADOOP-2779. Remove the references to HBase in the build.xml. (omalley)
HADOOP-2194. dfs cat on a non-existent file throws FileNotFoundException.
(Mahadev Konar via dhruba)
HADOOP-2767. Fix for NetworkTopology erroneously skipping the last leaf
node on a rack. (Hairong Kuang and Mark Butler via dhruba)
HADOOP-1593. FsShell works with paths in non-default FileSystem.
(Mahadev Konar via dhruba)
HADOOP-2191. du and dus command on non-existent directory gives
appropriate error message. (Mahadev Konar via dhruba)
HADOOP-2832. Remove tabs from code of DFSClient for better
indentation. (dhruba)
HADOOP-2844. distcp closes file handles for sequence files.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2727. Fix links in Web UI of the hadoop daemons and some docs
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2871. Fixes a problem to do with file: URI in the JobHistory init.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2800. Deprecate SetFile.Writer constructor not the whole class.
(Johan Oskarsson via tomwhite)
HADOOP-2891. DFSClient.close() closes all open files. (dhruba)
HADOOP-2845. Fix dfsadmin disk utilization report on Solaris.
(Martin Traverso via tomwhite)
HADOOP-2912. MiniDFSCluster restart should wait for namenode to exit
safemode. This was causing TestFsck to fail. (Mahadev Konar via dhruba)
HADOOP-2820. The following classes in streaming are removed:
StreamLineRecordReader StreamOutputFormat StreamSequenceRecordReader.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2819. The following methods in JobConf are removed:
getInputKeyClass() setInputKeyClass getInputValueClass()
setInputValueClass(Class theClass) setSpeculativeExecution
getSpeculativeExecution() (Amareshwari Sri Ramadasu via ddas)
HADOOP-2817. Removes deprecated mapred.tasktracker.tasks.maximum and
ClusterStatus.getMaxTasks(). (Amareshwari Sri Ramadasu via ddas)
HADOOP-2821. Removes deprecated ShellUtil and ToolBase classes from
the util package. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2934. The namenode was encountering an NPE while loading
leases from the fsimage. Fixed. (dhruba)
HADOOP-2938. Some fs commands did not glob paths.
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-2943. Compression of intermediate map output causes failures
in the merge. (cdouglas)
HADOOP-2870. DataNode and NameNode close all connections while
shutting down. (Hairong Kuang via dhruba)
HADOOP-2973. Fix TestLocalDFS for Windows platform.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2971. select multiple times if it returns early in
SocketIOWithTimeout. (rangadi)
HADOOP-2955. Fix TestCrcCorruption test failures caused by HADOOP-2758
(rangadi)
HADOOP-2657. A flush call on the DFSOutputStream flushes the last
partial CRC chunk too. (dhruba)
HADOOP-2974. IPC unit tests used "0.0.0.0" to connect to server, which
is not always supported. (rangadi)
HADOOP-2996. Fixes uses of StringBuffer in StreamUtils class.
(Dave Brosius via ddas)
HADOOP-2995. Fixes StreamBaseRecordReader's getProgress to return a
floating point number. (Dave Brosius via ddas)
HADOOP-2972. Fix for an NPE in FSDataset.invalidate.
(Mahadev Konar via dhruba)
HADOOP-2994. Code cleanup for DFSClient: remove redundant
conversions from string to string. (Dave Brosius via dhruba)
HADOOP-3009. TestFileCreation sometimes fails because restarting
minidfscluster sometimes creates datanodes with ports that are
different from their original instance. (dhruba)
HADOOP-2992. Distributed Upgrade framework works correctly with
more than one upgrade object. (Konstantin Shvachko via dhruba)
HADOOP-2679. Fix a typo in libhdfs. (Jason via dhruba)
HADOOP-2976. When a lease expires, the Namenode ensures that
blocks of the file are adequately replicated. (dhruba)
HADOOP-2901. Fixes the creation of info servers in the JobClient
and JobTracker. Removes the creation from JobClient and removes
additional info server from the JobTracker. Also adds the command
line utility to view the history files (HADOOP-2896), and fixes
bugs in JSPs to do with analysis - HADOOP-2742, HADOOP-2792.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2890. If different datanodes report the same block but
with different sizes to the namenode, the namenode picks the
replica(s) with the largest size as the only valid replica(s). (dhruba)
HADOOP-2825. Deprecated MapOutputLocation.getFile() is removed.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2806. Fixes a streaming document.
(Amareshwari Sriramadasu via ddas)
HADOOP-3008. SocketIOWithTimeout throws InterruptedIOException if the
thread is interrupted while it is waiting. (rangadi)
HADOOP-3006. Fix wrong packet size reported by DataNode when a block
is being replicated. (rangadi)
HADOOP-3029. Datanode prints log message "firstbadlink" only if
it detects a bad connection to another datanode in the pipeline. (dhruba)
HADOOP-3030. Release reserved space for file in InMemoryFileSystem if
checksum reservation fails. (Devaraj Das via cdouglas)
HADOOP-3036. Fix findbugs warnings in UpgradeUtilities. (Konstantin
Shvachko via cdouglas)
HADOOP-3025. ChecksumFileSystem supports the delete method with
the recursive flag. (Mahadev Konar via dhruba)
HADOOP-3012. dfs -mv file to user home directory throws exception if
the user home directory does not exist. (Mahadev Konar via dhruba)
HADOOP-3066. Should not require superuser privilege to query if hdfs is in
safe mode (jimk)
HADOOP-3040. If the input line starts with the separator char, the key
is set as empty. (Amareshwari Sriramadasu via ddas)
HADOOP-3080. Removes flush calls from JobHistory.
(Amareshwari Sriramadasu via ddas)
HADOOP-3086. Adds the testcase missed during commit of hadoop-3040.
(Amareshwari Sriramadasu via ddas)
HADOOP-3046. Fix the raw comparators for Text and BytesWritables
to use the provided length rather than recompute it. (omalley)
HADOOP-3094. Fix BytesWritable.toString to avoid extending the sign bit
(Owen O'Malley via cdouglas)
HADOOP-3067. DFSInputStream's position read does not close the sockets.
(rangadi)
HADOOP-3073. close() on SocketInputStream or SocketOutputStream should
close the underlying channel. (rangadi)
HADOOP-3087. Fixes a problem to do with refreshing of loadHistory.jsp.
(Amareshwari Sriramadasu via ddas)
HADOOP-3065. Better logging message if the rack location of a datanode
cannot be determined. (Devaraj Das via dhruba)
HADOOP-3064. Commas in a file path should not be treated as delimiters.
(Hairong Kuang via shv)
HADOOP-2997. Adds test for non-writable serializer. Also fixes a problem
introduced by HADOOP-2399. (Tom White via ddas)
HADOOP-3114. Fix TestDFSShell on Windows. (Lohit Vijaya Renu via cdouglas)
HADOOP-3118. Fix Namenode NPE while loading fsimage after a cluster
upgrade from older disk format. (dhruba)
HADOOP-3161. Fix FileUtil.HardLink.getLinkCount on Mac OS. (nigel
via omalley)
HADOOP-2927. Fix TestDU to accurately calculate the expected file size.
(shv via nigel)
HADOOP-3123. Fix the native library build scripts to work on Solaris.
(tomwhite via omalley)
HADOOP-3089. Streaming should accept stderr from task before
first key arrives. (Rick Cox via tomwhite)
HADOOP-3146. A DFSOutputStream.flush method is renamed as
DFSOutputStream.fsync. (dhruba)
HADOOP-3165. -put/-copyFromLocal did not treat input file "-" as stdin.
(Lohit Vijayarenu via rangadi)
HADOOP-3041. Deprecate JobConf.setOutputPath and JobConf.getOutputPath.
Deprecate OutputFormatBase. Add FileOutputFormat. Existing output formats
extending OutputFormatBase, now extend FileOutputFormat. Add the following
APIs in FileOutputFormat: setOutputPath, getOutputPath, getWorkOutputPath.
(Amareshwari Sriramadasu via nigel)
HADOOP-3083. The fsimage does not store leases. This would have to be
reworked in the next release to support appends. (dhruba)
HADOOP-3166. Fix an ArrayIndexOutOfBoundsException in the spill thread
and make exception handling more promiscuous to catch this condition.
(cdouglas)
HADOOP-3050. DataNode sends one and only one block report after
it registers with the namenode. (Hairong Kuang)
HADOOP-3044. NNBench sets the right configuration for the mapper.
(Hairong Kuang)
HADOOP-3178. Fix GridMix scripts for small and medium jobs
to handle input paths differently. (Mukund Madhugiri via nigel)
HADOOP-1911. Fix an infinite loop in DFSClient when all replicas of a
block are bad (cdouglas)
HADOOP-3157. Fix path handling in DistributedCache and TestMiniMRLocalFS.
(Doug Cutting via rangadi)
HADOOP-3018. Fix the eclipse plug-in contrib wrt removed deprecated
methods (taton)
HADOOP-3183. Fix TestJobShell to use 'ls' instead of java.io.File::exists
since cygwin symlinks are unsupported.
(Mahadev Konar via cdouglas)
HADOOP-3175. Fix FsShell.CommandFormat to handle "-" in arguments.
(Edward J. Yoon via rangadi)
HADOOP-3220. Safemode message corrected. (shv)
HADOOP-3208. Fix WritableDeserializer to set the Configuration on
deserialized Writables. (Enis Soztutar via cdouglas)
HADOOP-3224. 'dfs -du /dir' does not return correct size.
(Lohit Vijayarenu via rangadi)
HADOOP-3223. Fix typo in help message for -chmod. (rangadi)
HADOOP-1373. checkPath() should ignore case when it compares authority.
(Edward J. Yoon via rangadi)
HADOOP-3204. Fixes a problem to do with ReduceTask's LocalFSMerger not
catching Throwable. (Amar Ramesh Kamat via ddas)
HADOOP-3229. Report progress when collecting records from the mapper and
the combiner. (Doug Cutting via cdouglas)
HADOOP-3225. Unwrapping methods of RemoteException should initialize
detailMessage field. (Mahadev Konar, shv, cdouglas)
HADOOP-3247. Fix gridmix scripts to use the correct globbing syntax and
change maxentToSameCluster to run the correct number of jobs.
(Runping Qi via cdouglas)
HADOOP-3242. Fix the RecordReader of SequenceFileAsBinaryInputFormat to
correctly read from the start of the split and not the beginning of the
file. (cdouglas via acmurthy)
HADOOP-3256. Encodes the job name used in the filename for history files.
(Arun Murthy via ddas)
HADOOP-3162. Ensure that comma-separated input paths are treated correctly
as multiple input paths. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-3263. Ensure that the job-history log file always follows the
pattern of hostname_timestamp_jobid_username_jobname even if username
and/or jobname are not specified. This helps to avoid wrong assumptions
made about the job-history log filename in jobhistory.jsp. (acmurthy)
HADOOP-3251. Fixes getFilesystemName in JobTracker and LocalJobRunner to
use FileSystem.getUri instead of FileSystem.getName. (Arun Murthy via ddas)
HADOOP-3237. Fixes TestDFSShell.testErrOutPut on Windows platform.
(Mahadev Konar via ddas)
HADOOP-3279. TaskTracker checks for SUCCEEDED task status in addition to
COMMIT_PENDING status when it fails maps due to lost map.
(Devaraj Das)
HADOOP-3286. Prevent collisions in gridmix output dirs by increasing the
granularity of the timestamp. (Runping Qi via cdouglas)
HADOOP-3285. Fix input split locality when the splits align to
fs blocks. (omalley)
HADOOP-3372. Fix heap management in streaming tests. (Arun Murthy via
cdouglas)
HADOOP-3031. Fix javac warnings in test classes. (cdouglas)
HADOOP-3382. Fix memory leak when files are not cleanly closed (rangadi)
HADOOP-3322. Fix to push MetricsRecord for rpc metrics. (Eric Yang via
mukund)
Release 0.16.4 - 2008-05-05
BUG FIXES
HADOOP-3138. DFS mkdirs() should not throw an exception if the directory
already exists. (rangadi via mukund)
HADOOP-3294. Fix distcp to check the destination length and retry the copy
if it doesn't match the src length. (Tsz Wo (Nicholas), SZE via mukund)
HADOOP-3186. Fix incorrect permission checking for mv and renameTo
in HDFS. (Tsz Wo (Nicholas), SZE via mukund)
Release 0.16.3 - 2008-04-16
BUG FIXES
HADOOP-3010. Fix ConcurrentModificationException in ipc.Server.Responder.
(rangadi)
HADOOP-3154. Catch all Throwables from the SpillThread in MapTask, rather
than IOExceptions only. (ddas via cdouglas)
HADOOP-3159. Avoid file system cache being overwritten whenever
configuration is modified. (Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3139. Remove the consistency check for the FileSystem cache in
closeAll() that causes spurious warnings and a deadlock.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3195. Fix TestFileSystem to be deterministic.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3069. Primary name-node should not truncate image when transferring
it from the secondary. (shv)
HADOOP-3182. Change permissions of the job-submission directory to 777
from 733 to ensure sharing of HOD clusters works correctly. (Tsz Wo
(Nicholas), Sze and Amareshwari Sri Ramadasu via acmurthy)
Release 0.16.2 - 2008-04-02
BUG FIXES
HADOOP-3011. Prohibit distcp from overwriting directories on the
destination filesystem with files. (cdouglas)
HADOOP-3033. The BlockReceiver thread in the datanode writes data to
the block file, changes file position (if needed) and flushes all by
itself. The PacketResponder thread does not flush block file. (dhruba)
HADOOP-2978. Fixes the JobHistory log format for counters.
(Runping Qi via ddas)
HADOOP-2985. Fixes LocalJobRunner to tolerate null job output path.
Also makes the _temporary a constant in MRConstants.java.
(Amareshwari Sriramadasu via ddas)
HADOOP-3003. FileSystem cache key is updated after a
FileSystem object is created. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3042. Updates the Javadoc in JobConf.getOutputPath to reflect
the actual temporary path. (Amareshwari Sriramadasu via ddas)
HADOOP-3007. Tolerate mirror failures while DataNode is replicating
blocks as it used to before. (rangadi)
HADOOP-2944. Fixes a "Run on Hadoop" wizard NPE when creating a
Location from the wizard. (taton)
HADOOP-3049. Fixes a problem in MultiThreadedMapRunner to do with
catching RuntimeExceptions. (Alejandro Abdelnur via ddas)
HADOOP-3039. Fixes a problem to do with exceptions in tasks not
killing jobs. (Amareshwari Sriramadasu via ddas)
HADOOP-3027. Fixes a problem to do with adding a shutdown hook in
FileSystem. (Amareshwari Sriramadasu via ddas)
HADOOP-3056. Fix distcp when the target is an empty directory by
making sure the directory is created first. (cdouglas and acmurthy
via omalley)
HADOOP-3070. Protect the trash emptier thread from null pointer
exceptions. (Koji Noguchi via omalley)
HADOOP-3084. Fix HftpFileSystem to work for zero-length files.
(cdouglas)
HADOOP-3107. Fix NPE when fsck invokes getListings. (dhruba)
HADOOP-3104. Limit MultithreadedMapRunner to have a fixed length queue
between the RecordReader and the map threads. (Alejandro Abdelnur via
omalley)
HADOOP-2833. Do not use "Dr. Who" as the default user in JobClient.
A valid user name is required. (Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3128. Throw RemoteException in setPermissions and setOwner of
DistributedFileSystem. (shv via nigel)
Release 0.16.1 - 2008-03-13
INCOMPATIBLE CHANGES
HADOOP-2869. Deprecate SequenceFile.setCompressionType in favor of
SequenceFile.createWriter, SequenceFileOutputFormat.setCompressionType,
and JobConf.setMapOutputCompressionType. (Arun C Murthy via cdouglas)
Configuration changes to hadoop-default.xml:
deprecated io.seqfile.compression.type
IMPROVEMENTS
HADOOP-2371. User guide for file permissions in HDFS.
(Robert Chansler via rangadi)
HADOOP-3098. Allow more characters in user and group names while
using -chown and -chgrp commands. (rangadi)
BUG FIXES
HADOOP-2789. Race condition in IPC Server Responder that could close
connections early. (Raghu Angadi)
HADOOP-2785. minor. Fix a typo in Datanode block verification
(Raghu Angadi)
HADOOP-2788. minor. Fix help message for chgrp shell command (Raghu Angadi).
HADOOP-1188. fstime file is updated when a storage directory containing
namespace image becomes inaccessible. (shv)
HADOOP-2787. An application can set a configuration variable named
dfs.umask to set the umask that is used by DFS.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2780. The default socket buffer size for DataNodes is 128K.
(dhruba)
HADOOP-2716. Superuser privileges for the Balancer.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2754. Filter out .crc files from local file system listing.
(Hairong Kuang via shv)
HADOOP-2733. Fix compiler warnings in test code.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-2725. Modify distcp to avoid leaving partially copied files at
the destination after encountering an error. (Tsz Wo (Nicholas), SZE
via cdouglas)
HADOOP-2391. Cleanup job output directory before declaring a job as
SUCCESSFUL. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2808. Minor fix to FileUtil::copy to mind the overwrite
formal. (cdouglas)
HADOOP-2683. Moving UGI out of the RPC Server.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2814. Fix for NPE in datanode in unit test TestDataTransferProtocol.
(Raghu Angadi via dhruba)
HADOOP-2811. Dump of counters in job history does not add comma between
groups. (runping via omalley)
HADOOP-2735. Enables setting TMPDIR for tasks.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2843. Fix protections on map-side join classes to enable derivation.
(cdouglas via omalley)
HADOOP-2840. Fix gridmix scripts to correctly invoke the java sort through
the proper jar. (Mukund Madhugiri via cdouglas)
HADOOP-2769. TestNNThroughputBenchmark should not use a fixed port for
the namenode http port. (omalley)
HADOOP-2852. Update gridmix benchmark to avoid an artificially long tail.
(cdouglas)
HADOOP-2894. Fix a problem to do with tasktrackers failing to connect to
JobTracker upon reinitialization. (Owen O'Malley via ddas).
HADOOP-2903. Fix exception generated by Metrics while using pushMetric().
(girish vaitheeswaran via dhruba)
HADOOP-2904. Fix to RPC metrics to log the correct host name.
(girish vaitheeswaran via dhruba)
HADOOP-2918. Improve error logging so that dfs writes failure with
"No lease on file" can be diagnosed. (dhruba)
HADOOP-2923. Add SequenceFileAsBinaryInputFormat, which was
missed in the commit for HADOOP-2603. (cdouglas via omalley)
HADOOP-2931. IOException thrown by DFSOutputStream had wrong stack
trace in some cases. (Michael Bieniosek via rangadi)
HADOOP-2883. Write failures and data corruptions on HDFS files.
The write timeout is back to what it was on 0.15 release. Also, the
datanode flushes the block file buffered output stream before
sending a positive ack for the packet back to the client. (dhruba)
HADOOP-2756. NPE in DFSClient while closing DFSOutputStreams
under load. (rangadi)
HADOOP-2958. Fixed FileBench which broke due to HADOOP-2391 which performs
a check for existence of the output directory and a trivial bug in
GenericMRLoadGenerator where min/max word lengths were identical since
they were looking at the same config variables (Chris Douglas via
acmurthy)
HADOOP-2915. Fixed FileSystem.CACHE so that a username is included
in the cache key. (Tsz Wo (Nicholas), SZE via nigel)
HADOOP-2813. TestDU unit test uses its own directory to run its
sequence of tests. (Mahadev Konar via dhruba)
Release 0.16.0 - 2008-02-07
INCOMPATIBLE CHANGES
HADOOP-1245. Use the mapred.tasktracker.tasks.maximum value
configured on each tasktracker when allocating tasks, instead of
the value configured on the jobtracker. InterTrackerProtocol
version changed from 5 to 6. (Michael Bieniosek via omalley)
HADOOP-1843. Removed code from Configuration and JobConf deprecated by
HADOOP-785 and a minor fix to Configuration.toString. Specifically the
important change is that mapred-default.xml is no longer supported and
Configuration no longer supports the notion of default/final resources.
(acmurthy)
HADOOP-1302. Remove deprecated abacus code from the contrib directory.
This also fixes a configuration bug in AggregateWordCount, so that the
job now works. (enis)
HADOOP-2288. Enhance FileSystem API to support access control.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2184. RPC Support for user permissions and authentication.
(Raghu Angadi via dhruba)
HADOOP-2185. RPC Server uses any available port if the specified
port is zero. Otherwise it uses the specified port. Also combines
the configuration attributes for the servers' bind address and
port from "x.x.x.x" and "y" to "x.x.x.x:y".
Deprecated configuration variables:
dfs.info.bindAddress
dfs.info.port
dfs.datanode.bindAddress
dfs.datanode.port
dfs.datanode.info.bindAddress
dfs.datanode.info.port
dfs.secondary.info.bindAddress
dfs.secondary.info.port
mapred.job.tracker.info.bindAddress
mapred.job.tracker.info.port
mapred.task.tracker.report.bindAddress
tasktracker.http.bindAddress
tasktracker.http.port
New configuration variables (post HADOOP-2404):
dfs.secondary.http.address
dfs.datanode.address
dfs.datanode.http.address
dfs.http.address
mapred.job.tracker.http.address
mapred.task.tracker.report.address
mapred.task.tracker.http.address
(Konstantin Shvachko via dhruba)
HADOOP-2401. Only the current leaseholder can abandon a block for
a HDFS file. ClientProtocol version changed from 20 to 21.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2381. Support permission information in FileStatus. Client
Protocol version changed from 21 to 22. (Raghu Angadi via dhruba)
HADOOP-2110. Block report processing creates fewer transient objects.
Datanode Protocol version changed from 10 to 11.
(Sanjay Radia via dhruba)
HADOOP-2567. Add FileSystem#getHomeDirectory(), which returns the
user's home directory in a FileSystem as a fully-qualified path.
FileSystem#getWorkingDirectory() is also changed to return a
fully-qualified path, which can break applications that attempt
to, e.g., pass LocalFileSystem#getWorkingDir().toString() directly
to java.io methods that accept file names. (cutting)
HADOOP-2514. Change trash feature to maintain a per-user trash
directory, named ".Trash" in the user's home directory. The
"fs.trash.root" parameter is no longer used. Full source paths
are also no longer reproduced within the trash.
HADOOP-2012. Periodic data verification on Datanodes.
(Raghu Angadi via dhruba)
HADOOP-1707. The DFSClient does not use a local disk file to cache
writes to a HDFS file. Changed Data Transfer Version from 7 to 8.
(dhruba)
HADOOP-2652. Fix permission issues for HftpFileSystem. This is an
incompatible change since distcp may not be able to copy files
from cluster A (compiled with this patch) to cluster B (compiled
with previous versions). (Tsz Wo (Nicholas), SZE via dhruba)
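The HADOOP-2185 entry above folds each separate bind address and port setting into a single "x.x.x.x:y" value. A minimal sketch of that conversion is shown below. The helper names are hypothetical, the address and port values are made-up examples, and pairing dfs.datanode.bindAddress and dfs.datanode.port with dfs.datanode.address is an assumption based on the names in the two lists; this is not Hadoop's own migration code.

```python
def merge_bind_address(bind_address, port):
    """Combine a bind address and a port into the 'x.x.x.x:y' form."""
    return f"{bind_address}:{int(port)}"


def split_address(address):
    """Split an 'x.x.x.x:y' value back into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


# Hypothetical old-style settings; the values are examples only.
old_conf = {
    "dfs.datanode.bindAddress": "0.0.0.0",
    "dfs.datanode.port": "50010",
}

# Assumed new-style equivalent of the two settings above.
new_conf = {
    "dfs.datanode.address": merge_bind_address(
        old_conf["dfs.datanode.bindAddress"], old_conf["dfs.datanode.port"]
    ),
}
```

As the entry notes, a port of zero asks the RPC server to use any available port, so a value such as "0.0.0.0:0" is meaningful in the combined form.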
NEW FEATURES
HADOOP-1857. Ability to run a script when a task fails to capture stack
traces. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2299. Definition of a login interface. A simple implementation for
Unix users and groups. (Hairong Kuang via dhruba)
HADOOP-1652. A utility to balance data among datanodes in a HDFS cluster.
(Hairong Kuang via dhruba)
HADOOP-2085. A library to support map-side joins of consistently
partitioned and sorted data sets. (Chris Douglas via omalley)
HADOOP-2336. Shell commands to modify file permissions. (rangadi)
HADOOP-1298. Implement file permissions for HDFS.
(Tsz Wo (Nicholas) & taton via cutting)
HADOOP-2447. HDFS can be configured to limit the total number of
objects (inodes and blocks) in the file system. (dhruba)
HADOOP-2487. Added an option to get statuses for all submitted/run jobs.
This information can be used to develop tools for analysing jobs.
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-1873. Implement user permissions for Map/Reduce framework.
(Hairong Kuang via shv)
HADOOP-2532. Add to MapFile a getClosest method that returns the key
that comes just before if the key is not present. (stack via tomwhite)
HADOOP-1883. Add versioning to Record I/O. (Vivek Ratan via ddas)
HADOOP-2603. Add SequenceFileAsBinaryInputFormat, which reads
sequence files as BytesWritable/BytesWritable regardless of the
key and value types used to write the file. (cdouglas via omalley)
HADOOP-2367. Add ability to profile a subset of map/reduce tasks and fetch
the result to the local filesystem of the submitting application. Also
includes a general IntegerRanges extension to Configuration for setting
positive, ranged parameters. (Owen O'Malley via cdouglas)
IMPROVEMENTS
HADOOP-2045. Change committer list on website to a table, so that
folks can list their organization, timezone, etc. (cutting)
HADOOP-2058. Facilitate creating new datanodes dynamically in
MiniDFSCluster. (Hairong Kuang via dhruba)
HADOOP-1855. fsck verifies block placement policies and reports
violations. (Konstantin Shvachko via dhruba)
HADOOP-1604. A system administrator can finalize namenode upgrades
without running the cluster. (Konstantin Shvachko via dhruba)
HADOOP-1839. Link-ify the Pending/Running/Complete/Killed grid in
jobdetails.jsp to help quickly narrow down and see categorized TIPs'
details via jobtasks.jsp. (Amar Kamat via acmurthy)
HADOOP-1210. Log counters in job history. (Owen O'Malley via ddas)
HADOOP-1912. Datanode has two new commands COPY and REPLACE. These are
needed for supporting data rebalance. (Hairong Kuang via dhruba)
HADOOP-2086. This patch adds the ability to add dependencies to a job
(run via JobControl) after construction. (Adrian Woodhead via ddas)
HADOOP-1185. Support changing the logging level of a server without
restarting the server. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2134. Remove developer-centric requirements from overview.html and
keep it end-user focussed, specifically sections related to subversion and
building Hadoop. (Jim Kellerman via acmurthy)
HADOOP-1989. Support simulated DataNodes. This helps creating large virtual
clusters for testing purposes. (Sanjay Radia via dhruba)
HADOOP-1274. Support different number of mappers and reducers per
TaskTracker to allow administrators to better configure and utilize
heterogeneous clusters.
Configuration changes to hadoop-default.xml:
add mapred.tasktracker.map.tasks.maximum (default value of 2)
add mapred.tasktracker.reduce.tasks.maximum (default value of 2)
remove mapred.tasktracker.tasks.maximum (deprecated for 0.16.0)
(Amareshwari Sri Ramadasu via acmurthy)
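Illustrative note for HADOOP-1274: the two new properties and their default value of 2 can be read independently, as sketched below. java.util.Properties is used here only as a stand-in for the Hadoop configuration object, and the class SlotConfigSketch is hypothetical.

```java
import java.util.Properties;

final class SlotConfigSketch {
    static final String MAP_KEY = "mapred.tasktracker.map.tasks.maximum";
    static final String REDUCE_KEY = "mapred.tasktracker.reduce.tasks.maximum";

    // Maximum concurrent map tasks per TaskTracker (default 2).
    static int mapSlots(Properties conf) {
        return Integer.parseInt(conf.getProperty(MAP_KEY, "2"));
    }

    // Maximum concurrent reduce tasks per TaskTracker (default 2).
    static int reduceSlots(Properties conf) {
        return Integer.parseInt(conf.getProperty(REDUCE_KEY, "2"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // A node with more cores might raise only the map-side limit.
        conf.setProperty(MAP_KEY, "4");
        System.out.println(mapSlots(conf) + " map, " + reduceSlots(conf) + " reduce");
    }
}
```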
HADOOP-2104. Adds a description to the ant targets. This makes the
output of "ant -projecthelp" sensible. (Chris Douglas via ddas)
HADOOP-2127. Added a pipes sort example to benchmark trivial pipes
application versus trivial java application. (omalley via acmurthy)
HADOOP-2113. A new shell command "dfs -text" to view the contents of
a gzipped file or SequenceFile. (Chris Douglas via dhruba)
HADOOP-2207. Add a "package" target for contrib modules that
permits each to determine what files are copied into release
builds. (stack via cutting)
HADOOP-1984. Makes the backoff for failed fetches exponential.
Earlier, it was a random backoff from an interval.
(Amar Kamat via ddas)
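Illustrative note for HADOOP-1984: an exponential backoff doubles the wait after each failed fetch instead of drawing it at random from a fixed interval. The sketch below is not the patch itself. The class FetchBackoffSketch, the base delay, and the upper cap are assumptions, since this entry does not state the actual values.

```java
final class FetchBackoffSketch {
    // Delay before the next retry: baseMillis doubled once per previous
    // failure, never exceeding maxMillis (both values are hypothetical).
    static long delayMillis(int failures, long baseMillis, long maxMillis) {
        if (failures < 0 || baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = baseMillis;
        for (int i = 0; i < failures && delay < maxMillis; i++) {
            // Compare against maxMillis / 2 so that doubling cannot overflow.
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return delay;
    }

    public static void main(String[] args) {
        for (int failures = 0; failures <= 7; failures++) {
            System.out.println(failures + " failures: "
                + delayMillis(failures, 1000L, 60000L) + " ms");
        }
    }
}
```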
HADOOP-1327. Include website documentation for streaming. (Rob Weltman
via omalley)
HADOOP-2000. Rewrite NNBench to measure namenode performance accurately.
It now uses the map-reduce framework for load generation.
(Mukund Madhugiri via dhruba)
HADOOP-2248. Speeds up the framework w.r.t. Counters. Also has API
updates to the Counters part. (Owen O'Malley via ddas)
HADOOP-2326. The initial block report at Datanode startup time has
a random backoff period. (Sanjay Radia via dhruba)
HADOOP-2432. HDFS includes the name of the file while throwing
"File does not exist" exception. (Jim Kellerman via dhruba)
HADOOP-2457. Added a 'forrest.home' property to the 'docs' target in
build.xml. (acmurthy)
HADOOP-2149. A new benchmark for three name-node operations: file create,
open, and block report, to evaluate the name-node performance
for optimizations or new features. (Konstantin Shvachko via shv)
HADOOP-2466. Change FileInputFormat.computeSplitSize to a protected
non-static method to allow sub-classes to provide alternate
implementations. (Alejandro Abdelnur via acmurthy)
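Illustrative note for HADOOP-2466: a protected, non-static method can be overridden by a sub-class, which a static method cannot. The sketch below uses hypothetical stand-in classes rather than FileInputFormat, and the default formula in the base class is an assumption made for illustration.

```java
class SplitSizeBaseSketch {
    // Overridable hook; the default formula here is assumed, not quoted.
    protected long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Framework-side caller: dispatches to whichever implementation
    // the concrete sub-class provides.
    long planSplitSize(long goalSize, long minSize, long blockSize) {
        return computeSplitSize(goalSize, minSize, blockSize);
    }
}

class WholeBlockSplitSketch extends SplitSizeBaseSketch {
    // Alternate policy: always use one split per block.
    @Override
    protected long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return blockSize;
    }
}
```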
HADOOP-2425. Change TextOutputFormat to handle Text specifically for better
performance. Make NullWritable implement Comparable. Make TextOutputFormat
treat NullWritable like null. (omalley)
HADOOP-1719. Improves the utilization of shuffle copier threads.
(Amar Kamat via ddas)
HADOOP-2390. Added documentation for user-controls for intermediate
map-outputs & final job-outputs and native-hadoop libraries. (acmurthy)
HADOOP-1660. Add the cwd of the map/reduce task to the java.library.path
of the child-jvm to support loading of native libraries distributed via
the DistributedCache. (acmurthy)
HADOOP-2285. Speeds up TextInputFormat. Also includes updates to the
Text API. (Owen O'Malley via cdouglas)
HADOOP-2233. Adds a generic load generator for modeling MR jobs. (cdouglas)
HADOOP-2369. Adds a set of scripts for simulating a mix of user map/reduce
workloads. (Runping Qi via cdouglas)
HADOOP-2547. Removes use of a 'magic number' in build.xml.
(Hrishikesh via nigel)
HADOOP-2268. Fix org.apache.hadoop.mapred.jobcontrol classes to use the
List/Map interfaces rather than concrete ArrayList/HashMap classes
internally. (Adrian Woodhead via acmurthy)
HADOOP-2406. Add a benchmark for measuring read/write performance through
the InputFormat interface, particularly with compression. (cdouglas)
HADOOP-2131. Allow finer-grained control over speculative-execution. Now
users can set it for maps and reduces independently.
Configuration changes to hadoop-default.xml:
deprecated mapred.speculative.execution
add mapred.map.tasks.speculative.execution
add mapred.reduce.tasks.speculative.execution
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-1965. Interleave sort/spill in the map-task along with calls to the
Mapper.map method. This is done by splitting the 'io.sort.mb' buffer into
two and using one half for collecting map-outputs and the other half for
sort/spill. (Amar Kamat via acmurthy)
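Illustrative note for HADOOP-1965: the two-half scheme can be sketched as a double buffer in which one half collects records while a background thread sorts and spills the other. This is a simplified sketch, not the map-task code. The class DoubleBufferSketch is hypothetical, capacity is counted in records rather than megabytes, and a "spill" is modelled as appending a sorted list to memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class DoubleBufferSketch {
    private final int halfCapacity;                 // records per half
    private List<Integer> collecting = new ArrayList<>();
    private final List<List<Integer>> spills = new ArrayList<>();
    private final ExecutorService spiller = Executors.newSingleThreadExecutor();
    private Future<?> pendingSpill;                 // sort/spill of the other half

    DoubleBufferSketch(int halfCapacity) {
        if (halfCapacity <= 0) {
            throw new IllegalArgumentException("halfCapacity must be positive");
        }
        this.halfCapacity = halfCapacity;
    }

    // Called once per map output record; returns quickly unless both
    // halves are busy.
    void collect(int record) {
        collecting.add(record);
        if (collecting.size() == halfCapacity) {
            startSpill();
        }
    }

    // Hands the full half to the spill thread and starts a fresh half.
    private void startSpill() {
        awaitSpill();                               // at most one half in flight
        final List<Integer> full = collecting;
        collecting = new ArrayList<>();
        pendingSpill = spiller.submit(() -> {
            Collections.sort(full);
            spills.add(full);
        });
    }

    private void awaitSpill() {
        if (pendingSpill == null) {
            return;
        }
        try {
            pendingSpill.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("spill failed", e);
        }
    }

    // Flushes any remaining records and returns the sorted spills in order.
    List<List<Integer>> close() {
        if (!collecting.isEmpty()) {
            startSpill();
        }
        awaitSpill();
        spiller.shutdown();
        return spills;
    }
}
```

The collecting side only blocks when it fills its half before the previous spill has finished, which is the overlap the entry describes.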
HADOOP-2464. Unit tests for chmod, chown, and chgrp using DFS.
(Raghu Angadi)
HADOOP-1876. Persist statuses of completed jobs in HDFS so that the
JobClient can query and get information about decommissioned jobs and also
across JobTracker restarts.
Configuration changes to hadoop-default.xml:
add mapred.job.tracker.persist.jobstatus.active (default value of false)
add mapred.job.tracker.persist.jobstatus.hours (default value of 0)
add mapred.job.tracker.persist.jobstatus.dir (default value of
/jobtracker/jobsInfo)
(Alejandro Abdelnur via acmurthy)
HADOOP-2077. Added version and build information to STARTUP_MSG for all
hadoop daemons to aid error-reporting, debugging etc. (acmurthy)
HADOOP-2398. Additional instrumentation for NameNode and RPC server.
Add support for accessing instrumentation statistics via JMX.
(Sanjay Radia via dhruba)
HADOOP-2449. A return of the non-MR version of NNBench.
(Sanjay Radia via shv)
HADOOP-1989. Remove 'datanodecluster' command from bin/hadoop.
(Sanjay Radia via shv)
HADOOP-1742. Improve JavaDoc documentation for ClientProtocol, DFSClient,
and FSNamesystem. (Konstantin Shvachko)
HADOOP-2298. Add Ant target for a binary-only distribution.
(Hrishikesh via nigel)
HADOOP-2509. Add Ant target for Rat report (Apache license header
reports). (Hrishikesh via nigel)
HADOOP-2469. WritableUtils.clone should take a Configuration
instead of a JobConf. (stack via omalley)
HADOOP-2659. Introduce superuser permissions for admin operations.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2596. Added a SequenceFile.createWriter api which allows the user
to specify the blocksize, replication factor and the buffersize to be
used for the underlying HDFS file. (Alejandro Abdelnur via acmurthy)
HADOOP-2431. Test HDFS File Permissions. (Hairong Kuang via shv)
HADOOP-2232. Add an option to disable Nagle's algorithm in the IPC stack.
(Clint Morgan via cdouglas)
HADOOP-2342. Created a micro-benchmark for measuring
local-file versus hdfs reads. (Owen O'Malley via nigel)
HADOOP-2529. First version of HDFS User Guide. (Raghu Angadi)
HADOOP-2690. Add jar-test target to build.xml, separating compilation
and packaging of the test classes. (Enis Soztutar via cdouglas)
OPTIMIZATIONS
HADOOP-1898. Release the lock protecting the timestamp of the last stack
dump while the dump is happening. (Amareshwari Sri Ramadasu via omalley)
HADOOP-1900. Makes the heartbeat and task event queries interval
dependent on the cluster size. (Amareshwari Sri Ramadasu via ddas)
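Illustrative note for HADOOP-1900: one way to make an interval depend on cluster size is to grow it in steps as trackers are added, with a fixed floor. The sketch below is not the formula used by the patch. The class HeartbeatIntervalSketch and all of its parameters (nodes per step, step length, minimum) are assumptions.

```java
final class HeartbeatIntervalSketch {
    // Interval grows by stepMillis for every nodesPerStep trackers
    // (rounded up), but never drops below minMillis.
    static long intervalMillis(int clusterSize, int nodesPerStep,
                               long stepMillis, long minMillis) {
        if (clusterSize < 0 || nodesPerStep <= 0 || stepMillis <= 0 || minMillis <= 0) {
            throw new IllegalArgumentException("invalid interval parameters");
        }
        // Ceiling division, done in long arithmetic to avoid int overflow.
        long steps = ((long) clusterSize + nodesPerStep - 1) / nodesPerStep;
        return Math.max(minMillis, steps * stepMillis);
    }

    public static void main(String[] args) {
        System.out.println(intervalMillis(10, 50, 1000L, 5000L));
        System.out.println(intervalMillis(1000, 50, 1000L, 5000L));
    }
}
```

Scaling the interval this way keeps the total heartbeat rate seen by the master roughly bounded as the cluster grows.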
HADOOP-2208. Counter update frequency (from TaskTracker to JobTracker) is
capped at 1 minute. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2284. Reduce the number of progress updates during the sorting in
the map task. (Amar Kamat via ddas)
BUG FIXES
HADOOP-2583. Fixes a bug in the Eclipse plug-in UI to edit locations.
Plug-in version is now synchronized with Hadoop version.
HADOOP-2100. Remove faulty check for existence of $HADOOP_PID_DIR and let
'mkdir -p' check & create it. (Michael Bieniosek via acmurthy)
HADOOP-1642. Ensure jobids generated by LocalJobRunner are unique to
avoid collisions and hence job-failures. (Doug Cutting via acmurthy)
HADOOP-2096. Close open file-descriptors held by streams while localizing
job.xml in the JobTracker and while displaying it on the webui in
jobconf.jsp. (Amar Kamat via acmurthy)