branch: master
Commits on Nov 21, 2011
  1. @ahadrana
  2. @ahadrana
Commits on Nov 17, 2011
  1. Merge pull request #1 from matpalm/emr

    commoncrawl authored
    slight generalisation so we can build on elastic mapreduce
  2. @matpalm
Commits on Nov 16, 2011
  1. @ahadrana
  2. @ahadrana
  3. @ahadrana

    Added introductory README

    ahadrana authored
  4. @ahadrana
Commits on Nov 14, 2011
  1. Fix directory tree.

    Ahad Rana authored
  2. Minor modification to GoogleURL interface.

    Ahad Rana authored
  3. Fixed bug in InputStream implementation.

    Ahad Rana authored
  4. Includes, among other things, (1) added mergeutils project into commoncrawl source tree (2) added query project into commoncrawl source tree (3) major refactoring of query project (4) bulk scan implementation (5) integration of parallel query functionality (6) bulk query support in cacheFE server (7) fix improper flush bug in Indexer code

    Ahad Rana authored Ahad Rana committed
  5. Emergency commit to fix Indexer Bug.

    Ahad Rana authored Ahad Rana committed
  6. Resolve project dependencies and make it build via build.xml file.

    Ahad Rana authored Ahad Rana committed
  7. Removed redundant src directory under cc_src

    Ahad Rana authored Ahad Rana committed
  8. added BinaryComparableWithOffset to deal with comparables that need offset information

    Ahad Rana authored Ahad Rana committed
    added HBase BoundedRangeFileInputStream to utils
    modified FlexBuffer to derive from BinaryComparableWithOffset
    modified SimHash code to produce simhash from byte stream instead of char stream
    extended TFileReader to have a ValueReader object, to allow for partial deserialization of thrift objects
    modified TFileThriftObjectWriter to take replication factor as a parameter in constructor
    added TFileUtils to allow for introspection of TFile metadata
    modified TextBytes to derive from BinaryComparableWithOffset
    modified URLUtils to strip www prefix by default during canonicalization
  9. Added a new way to retrieve Value data from a TFileThriftObjectReader.

    Ahad Rana authored Ahad Rana committed
  10. Added support for reading/writing Thrift objects via TFile.

    Ahad Rana authored Ahad Rana committed
  11. Added basic Tuple support.

    Ahad Rana authored Ahad Rana committed
  12. Modifications necessary to support proper UTF-8 compliant Http Header parsing in NIOHttpConnection.

    Ahad Rana authored Ahad Rana committed
  13. Made RPCFrame thread safe to handle Multi-Threaded Actors reading/writing from a single channel.

    Ahad Rana authored Ahad Rana committed
  14. Added byte offset compatible API to TextBytes.

    Ahad Rana authored Ahad Rana committed
  15. Added support to extract substream from BufferListInputStream.

    Ahad Rana authored Ahad Rana committed
  16. Added more synchronization to public API calls.

    Ahad Rana authored Ahad Rana committed
  17. 1. Added server config file support to CommonCrawlServer. 2. Fixed missing offset bug in TextBytes. 3. Add RawComparator,RawComparable support to FlexBuffer.

    Ahad Rana authored Ahad Rana committed
  18. Some modifications to MMapUtils and corresponding changes to RiceCoder. Also disable native compilation in build.xml by default.

    Ahad Rana authored Ahad Rana committed
  19. 1. Modified compiler to have generated setXXX methods return this so that data structure construction calls can be chained in a builder like pattern.

    Ahad Rana authored Ahad Rana committed
    
    2. Made ByteBufferInputStream derive from FSInputStream so that it
    becomes seekable (inherently supported by ByteBuffer anyhow).
    
    3. Made MMapFileInputStream derive from FSInputStream and made
    MMapFile.newInputStream return an FSDataInputStream, so that the
    mmap'd file stream can be interchanged with a HDFS stream in
    certain parts of the codebase.
    
    4. Reverted some changes in the RiceCoder for now, and also
    added a constructor for RiceCodeReader that takes an
    FSDataInputStream, thus enabling a MMapFileStream to be used
    to initialize a Reader object. This eliminated a buffer copy
    from an FSDataInputStream to ByteBuffer (the other type allowed
    in the Reader's constructor).
    
    5. Added a hasMoreData method to TFileReader to enable an
    after 'next' check to see if EOF condition has been hit. This
    is possible because the last call to TFile.Scanner's next sets
    it in an EOF state which can be checked via its atEnd method.
  20. More modifications to the TextBytes and FlexBuffer API. Plus, a bug fix of FlexBuffer's clone method.

    Ahad Rana authored Ahad Rana committed
    The original version was cloning data members (via Object.clone) and then copying the src contents into the new object. Unfortunately, the data member clone ended up reusing the source's storage buffer within the context of the new object (BAD!). Clone is a deep clone, so the new object needs to allocate its own storage!
    
    TODO: Integrate unit tests for FlexBuffer and TextBytes from CC private
    codebase to catch nasty bugs like these!
  21. A combined checkin that includes:

    Ahad Rana authored Ahad Rana committed
    1. Integration of RiceCoder from CC private src.
    2. Some Memory Mapped IO helper code (MMapUtils)
    3. Better shared / copy on write semantics for TextBytes and FlexBuffer
    4. Changes to various classes to reflect changes in TextBytes and FlexBuffer
       APIs.
    5. RPC Compiler / Code Generator modifications to accommodate new TextBytes
       /FlexBuffer API.
    6. TFile related helper utilities.
    7. Added Type Parameter to RPCStruct base class.
  22. More NodeJS related headaches.

    Ahad Rana authored Ahad Rana committed
  23. Remove NodeJS dependency yet again :-(

    Ahad Rana authored Ahad Rana committed
  24. More bug fixes.

    Ahad Rana authored Ahad Rana committed
  25. WebServer related fixes.

    Ahad Rana authored Ahad Rana committed
  26. Some formatting changes.

    Ahad Rana authored Ahad Rana committed
  27. Merged in Server and URLUtils components

    Ahad Rana authored Ahad Rana committed
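
The FlexBuffer clone fix described in commit 20 above is a common Java pitfall: Object.clone copies fields by reference, so a cloned buffer can end up sharing its backing array with the source. The sketch below illustrates the difference under stated assumptions. ByteBuf, shallowClone and CloneDemo are hypothetical names invented for this example; this is not the actual FlexBuffer code, only a minimal illustration of shallow versus deep cloning.

```java
import java.util.Arrays;

// Hypothetical stand-in for a byte buffer class; NOT the real FlexBuffer.
final class ByteBuf implements Cloneable {
    private byte[] storage;
    private int count;

    ByteBuf(byte[] initial) {
        // Take a private copy so callers cannot alias our storage.
        storage = Arrays.copyOf(initial, initial.length);
        count = initial.length;
    }

    void set(int index, byte value) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        storage[index] = value;
    }

    byte get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return storage[index];
    }

    // Shallow copy: Object.clone duplicates the object but copies the
    // 'storage' reference, so both objects share one backing array.
    ByteBuf shallowClone() {
        try {
            return (ByteBuf) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }

    // Deep copy: start from the shallow copy, then give the new object
    // its own freshly allocated backing array.
    @Override
    public ByteBuf clone() {
        ByteBuf copy = shallowClone();
        copy.storage = Arrays.copyOf(storage, storage.length);
        return copy;
    }
}

class CloneDemo {
    public static void main(String[] args) {
        ByteBuf src = new ByteBuf(new byte[] {1, 2, 3});
        ByteBuf shallow = src.shallowClone();
        ByteBuf deep = src.clone();

        src.set(0, (byte) 9); // mutate the source after cloning

        System.out.println(shallow.get(0)); // prints 9: shares src's array
        System.out.println(deep.get(0));    // prints 1: has its own array
    }
}
```

A unit test that mutates the source after cloning and then checks the clone, as in main above, is one way to catch this class of bug early, which is in the spirit of the TODO in that commit message.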