…ncrawl source tree (2) added query project into commoncrawl source tree (3) major refactoring of query project (4) bulk scan implementation (5) integration of parallel query functionality (6) bulk query support in cacheFE server (7) fix improper flush bug in Indexer code
…fset information; added HBase BoundedRangeFileInputStream to utils; modified FlexBuffer to derive from BinaryComparableWithOffset; modified SimHash code to produce a simhash from a byte stream instead of a char stream; extended TFileReader to have a ValueReader object, to allow for partial deserialization of thrift objects; modified TFileThriftObjectWriter to take the replication factor as a constructor parameter; added TFileUtils to allow for introspection of TFile metadata; modified TextBytes to derive from BinaryComparableWithOffset; modified URLUtils to strip the www prefix by default during canonicalization.
… parsing in NIOHttpConnection.
…ing from a single channel.
…ssing offset bug in TextBytes. 3. Add RawComparator, RawComparable support to FlexBuffer.
…r. Also disable native compilation in build.xml by default.
…that data structure construction calls can be chained in a builder-like pattern. 2. Made ByteBufferInputStream derive from FSInputStream so that it becomes seekable (inherently supported by ByteBuffer anyhow). 3. Made MMapFileInputStream derive from FSInputStream and made MMapFile.newInputStream return an FSDataInputStream, so that the mmap'd file stream can be interchanged with an HDFS stream in certain parts of the codebase. 4. Reverted some changes in the RiceCoder for now, and also added a constructor for RiceCodeReader that takes an FSDataInputStream, thus enabling a MMapFileStream to be used to initialize a Reader object. This eliminates a buffer copy from an FSDataInputStream to a ByteBuffer (the other type allowed in the Reader's constructor). 5. Added a hasMoreData method to TFileReader to enable a check, after a call to 'next', for whether the EOF condition has been hit. This is possible because the last call to TFile.Scanner's next puts it in an EOF state, which can be checked via its atEnd method.
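The hasMoreData check described in item 5 can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchScanner and SketchReader are hypothetical stand-ins, not the real TFile.Scanner or TFileReader, and the only behavior borrowed from the message above is that the last 'next' leaves the scanner in an EOF state that atEnd reports.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical scanner: either positioned on an entry or in an EOF state.
final class SketchScanner {
    private final List<String> entries;
    private int position = 0;

    SketchScanner(List<String> entries) {
        this.entries = entries;
    }

    // True once the scanner has moved past the last entry.
    boolean atEnd() {
        return position >= entries.size();
    }

    String current() {
        if (atEnd()) {
            throw new NoSuchElementException("scanner is at EOF");
        }
        return entries.get(position);
    }

    void advance() {
        if (!atEnd()) {
            position++;
        }
    }
}

// Hypothetical reader wrapping the scanner.
final class SketchReader {
    private final SketchScanner scanner;

    SketchReader(SketchScanner scanner) {
        this.scanner = scanner;
    }

    // Returns the current entry and advances. After the final entry is
    // returned, the scanner is left in its EOF state.
    String next() {
        String value = scanner.current();
        scanner.advance();
        return value;
    }

    // Can be called after next(): simply reflects the scanner's EOF state.
    boolean hasMoreData() {
        return !scanner.atEnd();
    }
}
```

A caller would loop with `while (reader.hasMoreData()) { reader.next(); }`, or call hasMoreData right after a next to find out whether that entry was the last one.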
…ix of FlexBuffer's clone method. The original version was cloning data members (via Object.clone) and then copying the src contents into the new object. Unfortunately, the data member clone ended up reusing the source's storage buffer within the context of the new object (BAD!). Clone is a deep clone, so the new object needs to allocate its own storage! TODO: Integrate unit tests for FlexBuffer and TextBytes from CC private codebase to catch nasty bugs like these!
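The class of bug described above can be illustrated with a minimal sketch. SketchBuffer is a hypothetical stand-in and does not reproduce the real FlexBuffer fields or API. The sketch only contrasts a clone that keeps the array reference copied by Object.clone with one that allocates its own storage.

```java
import java.util.Arrays;

// Hypothetical buffer class used only to illustrate shallow vs. deep clone.
final class SketchBuffer implements Cloneable {
    private byte[] data;
    private int count;

    SketchBuffer(byte[] src) {
        this.data = Arrays.copyOf(src, src.length);
        this.count = src.length;
    }

    void set(int index, byte value) {
        data[index] = value;
    }

    byte get(int index) {
        return data[index];
    }

    int size() {
        return count;
    }

    // Buggy variant: Object.clone copies the array *reference*, so the
    // clone and the source end up sharing one storage buffer.
    SketchBuffer shallowClone() {
        try {
            return (SketchBuffer) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }

    // Deep clone: the new object allocates its own storage and copies
    // the valid bytes into it.
    @Override
    public SketchBuffer clone() {
        try {
            SketchBuffer copy = (SketchBuffer) super.clone();
            copy.data = Arrays.copyOf(data, count);
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}
```

A unit test of the kind the TODO asks for would mutate the source after cloning and assert that the clone's contents are unchanged.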
1. Integration of RiceCoder from CC private src. 2. Some Memory Mapped IO helper code (MMapUtils). 3. Better shared / copy-on-write semantics for TextBytes and FlexBuffer. 4. Changes to various classes to reflect changes in TextBytes and FlexBuffer APIs. 5. RPC Compiler / Code Generator modifications to accommodate new TextBytes / FlexBuffer API. 6. TFile related helper utilities. 7. Added Type Parameter to RPCStruct base class.