.NET-based web crawler
A simple and very efficient multithreaded web crawler with pipeline-based processing, written in C#. It includes HTML, text, PDF, and IFilter document processors, plus language detection (Google). Pipeline steps that extract, use, or alter information are easy to add.
A total rewrite of the 2010 NCrawler using more modern programming practices. Now at v4.
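
To give a rough feel for the pipeline model described above, here is a minimal, self-contained sketch of how steps that extract and alter information could be chained. Every name in it (`CrawledPage`, `IPipelineStep`, `TrimTextStep`, `WordCountStep`, `PipelineDemo`) is hypothetical and chosen for illustration; this is not the library's actual API, so check the source for the real interfaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; NOT the crawler's real API.
public sealed class CrawledPage
{
    public CrawledPage(Uri uri, string text)
    {
        Uri = uri;
        Text = text;
    }

    public Uri Uri { get; }
    public string Text { get; set; }

    // Bag of values that later steps (or the caller) can read.
    public Dictionary<string, object> Properties { get; } =
        new Dictionary<string, object>();
}

// Assumed contract: each step receives the page and may read or modify it.
public interface IPipelineStep
{
    void Process(CrawledPage page);
}

// A step that alters information: strips surrounding whitespace.
public sealed class TrimTextStep : IPipelineStep
{
    public void Process(CrawledPage page) => page.Text = page.Text.Trim();
}

// A step that extracts information: stores a whitespace-delimited word count.
public sealed class WordCountStep : IPipelineStep
{
    public void Process(CrawledPage page)
    {
        // A null separator array makes Split use whitespace as the delimiter.
        string[] words = page.Text.Split(
            (char[])null, StringSplitOptions.RemoveEmptyEntries);
        page.Properties["WordCount"] = words.Length;
    }
}

public static class PipelineDemo
{
    // Runs the steps in order over one page and returns the extracted count.
    public static int Run(string text)
    {
        var steps = new List<IPipelineStep>
        {
            new TrimTextStep(),
            new WordCountStep(),
        };
        var page = new CrawledPage(new Uri("https://example.com/"), text);

        foreach (IPipelineStep step in steps)
        {
            step.Process(page);
        }

        return (int)page.Properties["WordCount"];
    }
}
```

In this sketch, adding behavior means writing another small class that implements the step interface and appending it to the list; the order of the list is the order of processing.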