This patch explicitly sets the ld option '--no-as-needed'. In Ubuntu 11.10, the default behavior of ld was changed to '--as-needed', which breaks the src/native/configure script and its detection of the native liblzo2 library. More information is available at: https://github.com/kevinweil/hadoop-lzo/issues/33
Previously, this would instantiate a new Configuration object on every call, which involved re-reading and parsing the configuration XML files to load the defaults. This was very slow. The new version caches a default Configuration object statically and uses that one in this circumstance.
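A minimal sketch of the caching pattern described above, not the actual patch. The `Config` class is a hypothetical stand-in for the real Configuration object, and the `loads` counter exists only to show that the expensive construction runs once.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedDefaults {
    /** Hypothetical stand-in for a configuration object that is expensive to build. */
    static final class Config {
        static int loads = 0; // counts how many times the defaults were "parsed"
        final Map<String, String> values = new HashMap<>();

        Config() {
            loads++; // stands in for re-reading and parsing the XML defaults
            values.put("example.key", "example-default"); // hypothetical default
        }
    }

    // Built lazily on first use, then shared by every later call.
    private static Config defaultConfig;

    static synchronized Config getDefaultConfig() {
        if (defaultConfig == null) {
            defaultConfig = new Config();
        }
        return defaultConfig;
    }

    public static void main(String[] args) {
        Config a = getDefaultConfig();
        Config b = getDefaultConfig();
        System.out.println("same instance: " + (a == b));
        System.out.println("loads: " + Config.loads);
    }
}
```

The trade-off of a shared static default is that callers must treat it as read-only, since any mutation would be visible to every other caller.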
…h cleans up the code nicely.
…asses. These classes are more appropriate than DeprecatedLzoTextInputFormat / DeprecatedLzoLineRecordReader for use with the hadoop-streaming jar, since they have the same behavior as the default streaming input format:

- input is broken into lines using any of '\n', '\r', or '\r\n'
- line contents up to the first '\t' character are treated as the key
- the rest of the line is treated as the value

In contrast, the DeprecatedLzoTextInputFormat treats the file offset as the key and the entire line as the value. This resulted in weird behavior when using the DeprecatedLzoTextInputFormat with a streaming MR job. For example, when using -mapper 'cat' and no reducer (which should produce an output file that's identical to the input file), this input:

    key1 value1
    key2 value2
    key3 value3

produced this output:

    0 key1 value1
    95 key2 value2
    95 key3 value3

which is clearly wrong. Using LzoStreamingInputFormat produces the expected output (same as input).
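The line and key/value rules described above can be sketched as follows. This is an illustration, not the code from the patch: `StreamingLineSplit`, `splitLines`, and `splitKeyValue` are hypothetical names, and treating a line with no tab as a key with an empty value is an assumption.

```java
public class StreamingLineSplit {
    /** Breaks input into lines on any of "\r\n", "\r", or "\n". */
    static String[] splitLines(String input) {
        // "\r\n" is listed first so it is consumed as a single terminator.
        return input.split("\r\n|\r|\n");
    }

    /** Returns {key, value}: key is the text before the first tab, value is the rest. */
    static String[] splitKeyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // Assumption: with no tab, the whole line is the key and the value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        for (String line : splitLines("key1\tvalue1\nkey2\tvalue2\r\nkey3\tvalue3\r")) {
            String[] kv = splitKeyValue(line);
            // Re-joining key and value with a tab reproduces the input line,
            // which is what a 'cat' mapper with no reducer should emit.
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```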
…efault is true). The option is to be used with the DeprecatedLzoTextInputFormat and LzoTextInputFormat input format classes. When true, it causes all files that don't end in ".lzo" to be silently dropped from the input set. When false, it will keep files that don't end in ".lzo", and will process them with TextInputFormat (however, files that end in ".lzo.index" will still be ignored). This makes it possible to process a mix of LZO and non-LZO files with a single MR job, which in turn makes it much easier to perform an online upgrade to LZO compression in a production system without incurring downtime. It also makes it possible to reprocess ranges of log files that span the pre-LZO / post-LZO boundary in a single MR job.

2) Added unit test for the above feature to TestLzoTextInputFormat.

3) Added a public LzopCodec.DEFAULT_LZO_EXTENSION constant.
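The file-selection rule described above can be sketched as a simple filter over file names. This is an illustration under assumptions, not the patch itself: `LzoInputFilter`, `selectInputs`, and the `ignoreNonLzo` parameter are hypothetical names standing in for the real option, and the sketch works on plain strings rather than Hadoop path objects.

```java
import java.util.ArrayList;
import java.util.List;

public class LzoInputFilter {
    static final String LZO_EXTENSION = ".lzo";          // hypothetical local constant
    static final String LZO_INDEX_SUFFIX = ".lzo.index"; // index files are never input

    /** Returns the file names that would remain in the input set. */
    static List<String> selectInputs(List<String> names, boolean ignoreNonLzo) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (name.endsWith(LZO_INDEX_SUFFIX)) {
                continue; // ignored regardless of the option
            }
            if (name.endsWith(LZO_EXTENSION) || !ignoreNonLzo) {
                kept.add(name); // non-LZO files survive only when the option is false
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = List.of("a.lzo", "a.lzo.index", "b.txt");
        System.out.println(selectInputs(names, true));
        System.out.println(selectInputs(names, false));
    }
}
```

In the mixed case, a caller would still need to decide per file which reader to use; the sketch covers only which files are kept.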
Updated outputformat tests to verify the index.