A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. Binaries are available for download from gigablast dot com. See the README.md file at the bottom of this page for instructions.
C++ C HTML Python Makefile PHP
antiword-dir Initial file population. Aug 2, 2013
diffbot-widget widget updates Apr 21, 2014
doxygen put in place doxygen stuffs May 15, 2015
html updated dmoz docs Jan 23, 2016
openssl we already include our own 32-bit Sep 16, 2013
script Increase time to mark item as stale in warc injector. Nov 2, 2015
ucdata Initial file population. Aug 2, 2013
.gitignore added Codeblocks project file Oct 31, 2014
Abbreviations.cpp now it compiles with -m32 Nov 10, 2014
Abbreviations.h replace long long with int64_t Oct 30, 2014
Accessdb.cpp good checkpoint. quite a few fixes. Nov 18, 2014
Accessdb.h now it compiles with -m32 Nov 10, 2014
Address.cpp fixed langid based query stop words. Mar 8, 2015
Address.h text replacements for bad int32_t substitutions Nov 18, 2014
Ads.cpp text replacements for bad int32_t substitutions Nov 18, 2014
Ads.h now it compiles with -m32 Nov 10, 2014
AdultBit.cpp now it compiles with -m32 Nov 10, 2014
AdultBit.h now it compiles with -m32 Nov 10, 2014
AutoBan.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
AutoBan.h now it compiles with -m32 Nov 10, 2014
BigFile.cpp added FIXBUG code to fix seg fault from Dec 8, 2015
BigFile.h all files made are now group writable. Sep 21, 2015
Bits.cpp text replacements for bad int32_t substitutions Nov 18, 2014
Bits.h now it compiles with -m32 Nov 10, 2014
Blaster.cpp bring back max mem control into master controls. Aug 14, 2015
Blaster.h now it compiles with -m32 Nov 10, 2014
Cachedb.cpp fix compiler warnings Sep 10, 2015
Cachedb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
CatRec.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
CatRec.h good checkpoint. quite a few fixes. Nov 18, 2014
Catdb.cpp do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Catdb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Categories.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Categories.h now it compiles with -m32 Nov 10, 2014
Clusterdb.cpp fix compiler warnings Sep 10, 2015
Clusterdb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Collectiondb.cpp fix the source of lots of corruption in spiderdb and titledb. Mar 15, 2016
Collectiondb.h bring back max doc len parms. Feb 8, 2016
Conf.cpp fix permissions bug when creating directories, Oct 7, 2015
Conf.h fix the source of lots of corruption in spiderdb and titledb. Mar 15, 2016
CountryCode.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
CountryCode.h now it compiles with -m32 Nov 10, 2014
DailyMerge.cpp fix so we can generate posdb map for Nov 1, 2015
DailyMerge.h move CollectionRec stuff into Collectiondb files Dec 10, 2013
DataFeed.cpp now it compiles with -m32 Nov 10, 2014
DataFeed.h use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Datedb.cpp do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Datedb.h use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Dates.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Dates.h now it compiles with -m32 Nov 10, 2014
Diff.cpp good checkpoint. quite a few fixes. Nov 18, 2014
Diff.h good checkpoint. quite a few fixes. Nov 18, 2014
Dir.cpp try to fix core dumps. not sure how Aug 22, 2015
Dir.h replace long long with int64_t Oct 30, 2014
DiskPageCache.cpp re-disbale page cache. wtf? Sep 10, 2015
DiskPageCache.h the new disk page cache. temporarily disabled. Aug 14, 2015
Dns.cpp More fixes to prevent spider traffic from hitting hosts with nospider Nov 13, 2015
Dns.h now it compiles with -m32 Nov 10, 2014
DnsProtocol.h use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Domains.cpp now it compiles with -m32 Nov 10, 2014
Domains.h now it compiles with -m32 Nov 10, 2014
Entities.cpp good checkpoint. quite a few fixes. Nov 18, 2014
Entities.h now it compiles with -m32 Nov 10, 2014
Errno.cpp added 4 more diffbot errors so hopefully Jan 12, 2016
Errno.h added 4 more diffbot errors so hopefully Jan 12, 2016
Events.h text replacements for bad int32_t substitutions Nov 18, 2014
Facebook.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Facebook.h now it compiles with -m32 Nov 10, 2014
File.cpp use ./cleanexit file to ensure gb doesn't restart Mar 16, 2016
File.h use ./cleanexit file to ensure gb doesn't restart Mar 16, 2016
Flags.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Flags.h now it compiles with -m32 Nov 10, 2014
GeoIP.c Initial file population. Aug 2, 2013
GeoIP.h use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
GeoIPCity.c Initial file population. Aug 2, 2013
GeoIPCity.h Initial file population. Aug 2, 2013
GeoIP_internal.h Initial file population. Aug 2, 2013
HashTable.cpp now it compiles with -m32 Nov 10, 2014
HashTable.h now it compiles with -m32 Nov 10, 2014
HashTableT.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
HashTableT.h now it compiles with -m32 Nov 10, 2014
HashTableX.cpp fix file/dir creation permissions bugs Sep 21, 2015
HashTableX.h quite a few bug fixes. Jul 2, 2015
Highlight.cpp fix pesky memory leak finally Jul 13, 2015
Highlight.h allow up to 3000 query terms. really we can allow Jul 11, 2015
Hostdb.cpp fix getLeastLoadedInShard() to only return Nov 16, 2015
Hostdb.h Fix host selection for downloading when nospider directives are present. Nov 30, 2015
HttpMime.cpp fix gap.com redirects that require us Feb 9, 2016
HttpMime.h fix gap.com redirects that require us Feb 9, 2016
HttpRequest.cpp added httprequest debug line Mar 21, 2016
HttpRequest.h added support for supplying basic proxy authorization Feb 2, 2015
HttpServer.cpp fix gap.com redirects that require us Feb 9, 2016
HttpServer.h use http/1.0 since we dont support chunked transfer encoding Feb 9, 2016
Images.cpp fix file/dir creation permissions bugs Sep 21, 2015
Images.h now it compiles with -m32 Nov 10, 2014
IndexList.cpp cleanup all warning when not using -m32 Nov 12, 2014
IndexList.h now it compiles with -m32 Nov 10, 2014
IndexReadInfo.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
IndexReadInfo.h now it compiles with -m32 Nov 10, 2014
IndexTable.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
IndexTable.h text replacements for bad int32_t substitutions Nov 18, 2014
IndexTable2.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
IndexTable2.h use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Indexdb.cpp fix compiler warnings Sep 10, 2015
Indexdb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Iso8859.cpp Initial file population. Aug 2, 2013
Iso8859.h Initial file population. Aug 2, 2013
Json.cpp Add gbcapturedate to individual doc's metadata when injecting warcs. Oct 4, 2015
Json.h Add gbcapturedate to individual doc's metadata when injecting warcs. Oct 4, 2015
LICENSE license fix Jun 16, 2014
Lang.cpp now it compiles with -m32 Nov 10, 2014
Lang.h now it compiles with -m32 Nov 10, 2014
LangList.cpp now it compiles with -m32 Nov 10, 2014
LangList.h text replacements for bad int32_t substitutions Nov 18, 2014
Language.cpp fix file/dir creation permissions bugs Sep 21, 2015
Language.h now it compiles with -m32 Nov 10, 2014
LanguageIdentifier.cpp Add gbcapturedate to individual doc's metadata when injecting warcs. Oct 4, 2015
LanguageIdentifier.h now it compiles with -m32 Nov 10, 2014
LanguagePages.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
LanguagePages.h now it compiles with -m32 Nov 10, 2014
Linkdb.cpp fix to allow us to gather ip-only url outlinks again Mar 14, 2016
Linkdb.h fix bug of losing the line waiter header Nov 20, 2015
LinkedList.h now it compiles with -m32 Nov 10, 2014
Log.cpp fix file/dir creation permissions bugs Sep 21, 2015
Log.h make new logfile when current logfile hits 1GB. Jan 5, 2015
Loop.cpp added some more quickpolls. Dec 4, 2015
Loop.h Fix load balance of msg22s to use the udp slots in pinginfo. Nov 3, 2015
Make.depend force gb to recompile version every time Sep 19, 2014
Makefile makefile optimizations Mar 14, 2016
Matches.cpp Don't try to match implicit non-required phrases when verifying doc Jan 8, 2016
Matches.h Fix anomalous link text detector to take into consideration the total Nov 20, 2015
Mem.cpp Fix: possible double free Feb 5, 2016
Mem.h fixes for umsg00 electric fence. Aug 24, 2015
MemPool.cpp now it compiles with -m32 Nov 10, 2014
MemPool.h now it compiles with -m32 Nov 10, 2014
MemPoolTree.cpp good checkpoint. quite a few fixes. Nov 18, 2014
MemPoolTree.h now it compiles with -m32 Nov 10, 2014
MetaContainer.cpp now it compiles with -m32 Nov 10, 2014
MetaContainer.h now it compiles with -m32 Nov 10, 2014
Mime.cpp now it compiles with -m32 Nov 10, 2014
Mime.h now it compiles with -m32 Nov 10, 2014
Monitordb.cpp do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Monitordb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Msg0.cpp prevent core when injecting when not in sync with host #0 Apr 28, 2015
Msg0.h try to handle those quick tagdb lookups first. Jan 30, 2015
Msg1.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Msg1.h text replacements for bad int32_t substitutions Nov 18, 2014
Msg13.cpp fix gap.com redirects that require us Feb 9, 2016
Msg13.h in the sockets table page, Aug 25, 2015
Msg17.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Msg17.h now it compiles with -m32 Nov 10, 2014
Msg1f.cpp fix file/dir creation permissions bugs Sep 21, 2015
Msg1f.h now it compiles with -m32 Nov 10, 2014
Msg2.cpp fix pesky memory leak finally Jul 13, 2015
Msg2.h allow up to 3000 query terms. really we can allow Jul 11, 2015
Msg20.cpp Merge branch 'testing' into diffbot-testing Dec 10, 2015
Msg20.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Msg22.cpp if old title rec was corrupted we would get a random docid Mar 16, 2016
Msg22.h now it compiles with -m32 Nov 10, 2014
Msg24.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Msg28.cpp now it compiles with -m32 Nov 10, 2014
Msg28.h now it compiles with -m32 Nov 10, 2014
Msg2a.cpp working with -m32 for basic testing. Nov 12, 2014
Msg2a.h now it compiles with -m32 Nov 10, 2014
Msg2b.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Msg2b.h now it compiles with -m32 Nov 10, 2014
Msg3.cpp added some more quickpolls. Dec 4, 2015
Msg3.h added cache validation logic Sep 10, 2015
Msg30.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Msg30.h now it compiles with -m32 Nov 10, 2014
Msg35.cpp working with -m32 for basic testing. Nov 12, 2014
Msg35.h now it compiles with -m32 Nov 10, 2014
Msg36.cpp text replacements for bad int32_t substitutions Nov 18, 2014
Msg36.h now it compiles with -m32 Nov 10, 2014
Msg37.cpp now it compiles with -m32 Nov 10, 2014
Msg37.h now it compiles with -m32 Nov 10, 2014
Msg39.cpp fix cores on gi #0 Sep 25, 2015
Msg39.h fix some mem leaks from allowing really big queries. Jul 14, 2015
Msg3a.cpp do not report edocunchanged for bulk jobs ever. Jan 30, 2016
Msg3a.h allow up to 3000 query terms. really we can allow Jul 11, 2015
Msg3e.cpp now it compiles with -m32 Nov 10, 2014
Msg3e.h now it compiles with -m32 Nov 10, 2014
Msg4.cpp thanks for the bug fix, ivan! Feb 9, 2016
Msg4.h now it compiles with -m32 Nov 10, 2014
Msg40.cpp fix core from a federated query and null msg20 Feb 18, 2016
Msg40.h Fix double call of gotSummary when computing facets in msg40. Fixes Oct 20, 2015
Msg40Cache.cpp now it compiles with -m32 Nov 10, 2014
Msg40Cache.h Initial file population. Aug 2, 2013
Msg42.cpp text replacements for bad int32_t substitutions Nov 18, 2014
Msg42.h now it compiles with -m32 Nov 10, 2014
Msg5.cpp do not hit file cache when merging files on disk. Sep 11, 2015
Msg5.h now it compiles with -m32 Nov 10, 2014
Msg51.cpp fix core Nov 27, 2014
Msg51.h now it compiles with -m32 Nov 10, 2014
Msg6b.cpp now it compiles with -m32 Nov 10, 2014
Msg6b.h now it compiles with -m32 Nov 10, 2014
Msg8b.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Msg8b.h now it compiles with -m32 Nov 10, 2014
Msg9b.cpp now it compiles with -m32 Nov 10, 2014
Msg9b.h now it compiles with -m32 Nov 10, 2014
MsgC.cpp fix compiler bug Sep 13, 2015
MsgC.h now it compiles with -m32 Nov 10, 2014
Msgaa.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Msgaa.h now it compiles with -m32 Nov 10, 2014
Msge0.cpp good checkpoint. quite a few fixes. Nov 18, 2014
Msge0.h now it compiles with -m32 Nov 10, 2014
Msge1.cpp loop.cpp cleanups. Feb 13, 2015
Msge1.h now it compiles with -m32 Nov 10, 2014
Multicast.cpp Fix load balance of msg22s to use the udp slots in pinginfo. Nov 3, 2015
Multicast.h now it compiles with -m32 Nov 10, 2014
OldDiskPageCache.cpp bring back max mem control into master controls. Aug 14, 2015
OldDiskPageCache.h undo #define thing Aug 14, 2015
PageAddColl.cpp text replacements for bad int32_t substitutions Nov 18, 2014
PageAddUrl.cpp do not consider .gz a 'media' url extension any more May 2, 2015
PageBasic.cpp fix core from adding a lot of sites Mar 8, 2015
PageCatdb.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
PageCrawlBot.cpp fix misspelling Mar 28, 2016
PageCrawlBot.h more api updates Jul 13, 2014
PageDirectory.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
PageEvents.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
PageGet.cpp added trivial link on cached page to gb root page Jan 3, 2016
PageHosts.cpp change try agains recvd to try agains sent Dec 24, 2015
PageIndexdb.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
PageInject.cpp Merge branch 'ia' into testing Nov 9, 2015
PageInject.h show inject requests in the spider queue table now Sep 11, 2015
PageLogView.cpp More testing on nospider, noquery. Aug 31, 2015
PageNetTest.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
PageNetTest.h now it compiles with -m32 Nov 10, 2014
PageOverview.cpp text replacements for bad int32_t substitutions Nov 18, 2014
PageParser.cpp quite a few bug fixes from adding the new query Dec 12, 2014
PageParser.h now it compiles with -m32 Nov 10, 2014
PagePerf.cpp Fixes to injector script. Aug 14, 2015
PageReindex.cpp make query reindex (not query delete) distribute May 7, 2015
PageReindex.h now it compiles with -m32 Nov 10, 2014
PageResults.cpp hack on parentUrlDocId to the json object dump Mar 28, 2016
PageResults.h some debug statement to track down the socket snafu on host 0 Sep 11, 2015
PageRoot.cpp fix add url on root page to set collnum properly. Apr 6, 2016
PageSockets.cpp fix bug of losing the line waiter header Nov 20, 2015
PageSpam.cpp now it compiles with -m32 Nov 10, 2014
PageStats.cpp fix empty winner tree bug. Oct 2, 2015
PageStatsdb.cpp Warc pipe fixes. Fix arcs not processing https. Fix nulls being left Oct 12, 2015
PageSubmit.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
PageThesaurus.cpp now it compiles with -m32 Nov 10, 2014
PageThreads.cpp undo some possible averse changes Sep 4, 2015
PageTitledb.cpp Merge branch 'diffbot-testing' into diffbot-matt Nov 21, 2014
PageTurk.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
PageTurk.h now it compiles with -m32 Nov 10, 2014
Pages.cpp turn off profiler automatically after 60 seconds. Sep 10, 2015
Pages.h return ENOPERM on certain pages if not Jan 29, 2015
Parms.cpp hide the verify disk writes parm, seems to be causing Nov 4, 2016
Parms.h move 2nd occurence of same collnum_t collection id Aug 19, 2015
Phrases.cpp now it compiles with -m32 Nov 10, 2014
Phrases.h now it compiles with -m32 Nov 10, 2014
PingServer.cpp fix core from sending a url alert, then customer deleting Sep 8, 2015
PingServer.h fix core from sending a url alert, then customer deleting Sep 8, 2015
Placedb.cpp do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Placedb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Pops.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Pops.h now it compiles with -m32 Nov 10, 2014
Pos.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Pos.h now it compiles with -m32 Nov 10, 2014
Posdb.cpp fix core in posdbtable from docid of 0. Feb 10, 2016
Posdb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
PostQueryRerank.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
PostQueryRerank.h get mike's super long query working Jul 13, 2015
Process.cpp use ./cleanexit file to ensure gb doesn't restart Mar 16, 2016
Process.h more fixes for new spider updates Feb 12, 2015
Profiler.cpp Merge branch 'diffbot-testing' into testing Nov 9, 2015
Profiler.h turn off profiler automatically after 60 seconds. Sep 10, 2015
Proxy.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Proxy.h now it compiles with -m32 Nov 10, 2014
Punycode.cpp Start to detect non-asci urls and encode them to ascii. Sep 12, 2015
Punycode.h Start to detect non-asci urls and encode them to ascii. Sep 12, 2015
QAClient.cpp good checkpoint. quite a few fixes. Nov 18, 2014
QAClient.h now it compiles with -m32 Nov 10, 2014
Query.cpp Filter link text anomalies at query time. Nov 19, 2015
Query.h fix more cores from the dynamic query size changes. Jul 18, 2015
README.md update README.md Mar 20, 2015
Rdb.cpp fix more data corruption bugs. hopefully Mar 21, 2016
Rdb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
RdbBase.cpp fix urgent merge mode bug some more? Nov 24, 2015
RdbBase.h fix bug of dumping too many files to disk and not Nov 17, 2015
RdbBuckets.cpp fix file/dir creation permissions bugs Sep 21, 2015
RdbBuckets.h added RdbBuckets::cleanBuckets() corresponding to Mar 22, 2015
RdbCache.cpp Merge branch 'diffbot-testing' into ia Oct 10, 2015
RdbCache.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
RdbDump.cpp fix dump core when collection deleted while dumping Mar 18, 2016
RdbDump.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
RdbList.cpp show docids of corrupted title recs found. Mar 16, 2016
RdbList.h fix churn bug in winnerlistcache in spider.cpp Oct 2, 2015
RdbMap.cpp fix so we can generate posdb map for Nov 1, 2015
RdbMap.h fix so we can generate posdb map for Nov 1, 2015
RdbMem.cpp fix dump core when collection deleted while dumping Mar 18, 2016
RdbMem.h after dump completes scan tree to ensure all nodes Mar 17, 2016
RdbMerge.cpp fix core when exiting while merging Oct 24, 2015
RdbMerge.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
RdbScan.cpp remove unnecessary line Sep 14, 2015
RdbScan.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
RdbTree.cpp fix dump core when collection deleted while dumping Mar 18, 2016
RdbTree.h fixed bad deletenode call causing dups in Feb 13, 2015
Rebalance.cpp fix save prevention when coring in malloc/free. Aug 23, 2015
Rebalance.h now it compiles with -m32 Nov 10, 2014
Repair.cpp clean out rebuild trees/buckets too Mar 22, 2015
Repair.h Merge branch 'diffbot-testing' into diffbot-matt Nov 21, 2014
RequestTable.cpp cleanup all warning when not using -m32 Nov 12, 2014
RequestTable.h cleanup all warning when not using -m32 Nov 12, 2014
Revdb.cpp text replacements for bad int32_t substitutions Nov 18, 2014
Revdb.h now it compiles with -m32 Nov 10, 2014
S99gb added S99gb for loading at boot. Jun 23, 2014
SafeBuf.cpp Merge branch 'ia' into testing Nov 9, 2015
SafeBuf.h Merge branch 'ia' into testing Nov 9, 2015
SafeList.h use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Sanity.h Merge branch 'diffbot-testing' into diffbot-matt Nov 21, 2014
Scores.cpp now it compiles with -m32 Nov 10, 2014
Scores.h now it compiles with -m32 Nov 10, 2014
Scraper.cpp now it compiles with -m32 Nov 10, 2014
Scraper.h now it compiles with -m32 Nov 10, 2014
SearchInput.cpp fix more cores from the dynamic query size changes. Jul 18, 2015
SearchInput.h added support for &nf=50 to limit to top 50 facets. Jan 29, 2015
Sections.cpp do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Sections.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
SiteGetter.cpp fix infinite loop bug from EBADRBDID Jul 31, 2015
SiteGetter.h now it compiles with -m32 Nov 10, 2014
Speller.cpp fix file/dir creation permissions bugs Sep 21, 2015
Speller.h now it compiles with -m32 Nov 10, 2014
Spider.cpp hash bang fix. Mar 20, 2016
Spider.h improve spider performance when we have lots of collections. Nov 2, 2015
SpiderProxy.cpp fix some diffbot crawls. Dec 23, 2015
SpiderProxy.h spider proxy fixes for negative ports Oct 21, 2015
Stats.cpp allow up to 3000 query terms. really we can allow Jul 11, 2015
Stats.h allow up to 3000 query terms. really we can allow Jul 11, 2015
Statsdb.cpp Fix repeating label. Sep 24, 2015
Statsdb.h fix signed/unsigned bug Dec 10, 2014
StopWords.cpp fixed langid based query stop words. Mar 8, 2015
StopWords.h fixed langid based query stop words. Mar 8, 2015
Strings.cpp now it compiles with -m32 Nov 10, 2014
Strings.h now it compiles with -m32 Nov 10, 2014
Summary.cpp fix add url on root page to set collnum properly. Apr 6, 2016
Summary.h nomenclature changes Jul 14, 2015
Syncdb.cpp try to fix core dumps. not sure how Aug 22, 2015
Syncdb.h now it compiles with -m32 Nov 10, 2014
Synonyms.cpp a lot of bug fixes thanks to isj. Mar 29, 2016
Synonyms.h nomenclature change Dec 4, 2014
Tagdb.cpp try to fix a couple more core dumps. Feb 19, 2016
Tagdb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
TcpServer.cpp added support for TLS SNI (Server name identification) Dec 23, 2015
TcpServer.h prevent double ./gb start calls from messing Aug 31, 2015
TcpSocket.h added proper write callback registration into Feb 16, 2015
Test.cpp now we add the spider status docs as json documents. Mar 19, 2015
Test.h now it compiles with -m32 Nov 10, 2014
Tfndb.cpp now it compiles with -m32 Nov 10, 2014
Tfndb.h now it compiles with -m32 Nov 10, 2014
Thesaurus.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Thesaurus.h now it compiles with -m32 Nov 10, 2014
Threads.cpp join with threads when exiting -- to no avail Dec 17, 2015
Threads.h try to fix exiting w/ pthreads some more (part 2) Dec 17, 2015
Timedb.cpp text replacements for bad int32_t substitutions Nov 18, 2014
Timedb.h text replacements for bad int32_t substitutions Nov 18, 2014
Timer.h now it compiles with -m32 Nov 10, 2014
Title.cpp a lot of bug fixes thanks to isj. Mar 29, 2016
Title.h now it compiles with -m32 Nov 10, 2014
Titledb.cpp do not store cblock, etc. tags into tagdb to save Sep 10, 2015
Titledb.h do not store cblock, etc. tags into tagdb to save Sep 10, 2015
TopTree.cpp fixed bad deletenode call causing dups in Feb 13, 2015
TopTree.h fix cores in top tree with last commit. this one Dec 8, 2014
TuringTest.cpp now it compiles with -m32 Nov 10, 2014
TuringTest.h now it compiles with -m32 Nov 10, 2014
Turkdb.cpp now it compiles with -m32 Nov 10, 2014
UCNormalizer.cpp now it compiles with -m32 Nov 10, 2014
UCNormalizer.h now it compiles with -m32 Nov 10, 2014
UCPropTable.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
UCPropTable.h now it compiles with -m32 Nov 10, 2014
UCWordIterator.cpp now it compiles with -m32 Nov 10, 2014
UCWordIterator.h now it compiles with -m32 Nov 10, 2014
UdpProtocol.h no limit to tagdb lookups even if niceness 1 Jan 30, 2015
UdpServer.cpp Merge branch 'ia-zak' of https://github.com/gigablast/open-source-sea… Sep 11, 2015
UdpServer.h Add logic to limit number of msg7s to 100 per hosts, then we drop the Sep 4, 2015
UdpSlot.cpp change try agains recvd to try agains sent Dec 24, 2015
UdpSlot.h allow more docids to be downloaded/served in search results. Mar 22, 2016
Unicode.cpp a lot of bug fixes thanks to isj. Mar 29, 2016
Unicode.h Optimize UTF-8 handling in getUtf8CharSize() by using logic instead o… Sep 7, 2015
UnicodeProperties.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
UnicodeProperties.h now it compiles with -m32 Nov 10, 2014
Url.cpp Merge branch 'testing' of https://github.com/gigablast/open-source-se… Mar 29, 2016
Url.h Show utf8 url in page results. Sep 21, 2015
Users.cpp fix save prevention when coring in malloc/free. Aug 23, 2015
Users.h now it compiles with -m32 Nov 10, 2014
ValidPointer.cpp Initial file population. Aug 2, 2013
ValidPointer.h Initial file population. Aug 2, 2013
Vector.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Vector.h now it compiles with -m32 Nov 10, 2014
Version.cpp now it compiles with -m32 Nov 10, 2014
Version.h now it compiles with -m32 Nov 10, 2014
Weights.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Weights.h now it compiles with -m32 Nov 10, 2014
Wiki.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Wiki.h now it compiles with -m32 Nov 10, 2014
Wiktionary.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
Wiktionary.h now it compiles with -m32 Nov 10, 2014
Words.cpp quite a few bug fixes. Jul 2, 2015
Words.h query stop words now based on selected langid. Mar 8, 2015
Xml.cpp fix </script> tag detection stuff again. Aug 31, 2015
Xml.h fix links parser so it harvests outlinks from rss feeds' Mar 13, 2015
XmlDoc.cpp a lot of bug fixes thanks to isj. Mar 29, 2016
XmlDoc.h a lot of bug fixes thanks to isj. Mar 29, 2016
XmlNode.cpp sitemap.xml support for harvesting loc urls. Mar 17, 2015
XmlNode.h sitemap.xml support for harvesting loc urls. Mar 17, 2015
addtest.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
animate.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
antiword fix ulimit and antiword bugs Jun 18, 2014
badcattable.dat Initial file population. Aug 2, 2013
blaster2.cpp fix right Oct 8, 2015
bmptopnm Initial file population. Aug 2, 2013
camsort.cpp now it compiles with -m32 Nov 10, 2014
catcountry.dat Initial file population. Aug 2, 2013
character-sets Initial file population. Aug 2, 2013
check_unicode.cpp Initial file population. Aug 2, 2013
control.deb package bldg updates Jun 17, 2014
convert.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
copyright.head package bldg updates Jun 17, 2014
copyright.tail package bldg updates Jun 17, 2014
create_ucd_tables.cpp now it compiles with -m32 Nov 10, 2014
dlstubs.c Initial file population. Aug 2, 2013
dmozparse.cpp fix make dmozparse Sep 13, 2015
dnstest.cpp now it compiles with -m32 Nov 10, 2014
dumpcore.cpp Initial file population. Aug 2, 2013
errnotest.cpp errnotest.cpp fix Aug 24, 2015
fastIndexTable.cpp good checkpoint. quite a few fixes. Nov 18, 2014
fctypes.cpp fix to shut up app checker. Nov 4, 2016
fctypes.h Merge branch 'ia' into testing Oct 12, 2015
filterquerylogs.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
gb-1.0.spec make it so we don't need --nodeps with May 26, 2014
gb-include.h replace memcpy_ass with bcopy Jan 14, 2015
gb.deb.rules if netpbm pkg already installed use it. Jul 6, 2014
gb.pem so we have spider https sites add Oct 13, 2013
gbfilter.cpp fix file/dir creation permissions bugs Sep 21, 2015
gbtitletest.cpp now it compiles with -m32 Nov 10, 2014
geneaology.cpp now it compiles with -m32 Nov 10, 2014
generateSuperMergeCode.cpp now it compiles with -m32 Nov 10, 2014
geo_ip_table.cpp Initial file population. Aug 2, 2013
geo_ip_table.h Initial file population. Aug 2, 2013
getsample.cpp now it compiles with -m32 Nov 10, 2014
giftopnm Initial file population. Aug 2, 2013
gigablast.cbp added Codeblocks project file Oct 31, 2014
gigablast.layout added Codeblocks project file Oct 31, 2014
hash.cpp fix more possible unicode errors Jul 19, 2015
hash.h fix more possible unicode errors Jul 19, 2015
hashtest.cpp now it compiles with -m32 Nov 10, 2014
hashtest2.cpp now it compiles with -m32 Nov 10, 2014
hashtest3.cpp now it compiles with -m32 Nov 10, 2014
hosts.cpp now it compiles with -m32 Nov 10, 2014
iana_charset.cpp now it compiles with -m32 Nov 10, 2014
iana_charset.h now it compiles with -m32 Nov 10, 2014
iconv.h good checkpoint. quite a few fixes. Nov 18, 2014
init.gb.conf minor make install changes May 23, 2014
injectme3 added injectme3 file and documentation into compare.html Aug 17, 2013
injectmedemo fix sections.cpp to not set root title section Dec 12, 2014
injector.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
iostream.h good checkpoint. quite a few fixes. Nov 18, 2014
ip.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
ip.h now it compiles with -m32 Nov 10, 2014
ipconfig.cpp now it compiles with -m32 Nov 10, 2014
jointest.cpp Initial file population. Aug 2, 2013
jpegtopnm Initial file population. Aug 2, 2013
keepalive.cpp Initial file population. Aug 2, 2013
libc.a Initial file population. Aug 2, 2013
libcrypto.a turn off hearbeats when compiling openssl libs Apr 22, 2014
libgcc.a Initial file population. Aug 2, 2013
libiconv.a Initial file population. Aug 2, 2013
libiconv.la Initial file population. Aug 2, 2013
libiconv64.a added 64 bit libiconv64.a Nov 15, 2014
libjpeg.so.62 thumbnail generation support back in. Apr 24, 2014
libm.a Initial file population. Aug 2, 2013
libnetpbm.so.10 thumbnail generation support back in. Apr 24, 2014
libpng12.so.0 thumbnail generation support back in. Apr 24, 2014
libpthread.a Initial file population. Aug 2, 2013
libssl.a turn off hearbeats when compiling openssl libs Apr 22, 2014
libstdc++.a Initial file population. Aug 2, 2013
libtiff.so.4 thumbnail generation support back in. Apr 24, 2014
libz.a Initial file population. Aug 2, 2013
libz.so.1 thumbnail generation support back in. Apr 24, 2014
libz64.a add libz64.a Nov 17, 2014
linkspam.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
linkspam.h now it compiles with -m32 Nov 10, 2014
looptest.cpp now it compiles with -m32 Nov 10, 2014
main.cpp update ./gb -h desc for ./gb inject. Apr 6, 2016
malloc.c Initial file population. Aug 2, 2013
matches2.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
matches2.h now it compiles with -m32 Nov 10, 2014
membustest.cpp now it compiles with -m32 Nov 10, 2014
memtest.cpp now it compiles with -m32 Nov 10, 2014
mergetest.cpp now it compiles with -m32 Nov 10, 2014
mixfile.cpp now it compiles with -m32 Nov 10, 2014
mmseg.h now it compiles with -m32 Nov 10, 2014
monitor.cpp now it compiles with -m32 Nov 10, 2014
mysynonyms.txt mysyn fixes Apr 22, 2015
numwords.cpp now it compiles with -m32 Nov 10, 2014
parse_iana_charsets.pl move CollectionRec stuff into Collectiondb files Dec 10, 2013
pdftohtml fix rdbcache init core Dec 1, 2014
pngtopnm Initial file population. Aug 2, 2013
pnmscale Initial file population. Aug 2, 2013
porter.cpp now it compiles with -m32 Nov 10, 2014
postalCodes.txt Initial file population. Aug 2, 2013
ppmtojpeg Initial file population. Aug 2, 2013
pstotext Initial file population. Aug 2, 2013
qa.cpp complete merge of ia code into testing. Nov 9, 2015
quarantine.cpp now it compiles with -m32 Nov 10, 2014
rdbtest.cpp now it compiles with -m32 Nov 10, 2014
rdbtest2.cpp now it compiles with -m32 Nov 10, 2014
readRec.cpp now it compiles with -m32 Nov 10, 2014
reindex2.cpp now it compiles with -m32 Nov 10, 2014
rescue.cpp now it compiles with -m32 Nov 10, 2014
rmbots.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
seektest.cpp now it compiles with -m32 Nov 10, 2014
seo.h now it compiles with -m32 Nov 10, 2014
sitelinks.txt fixed missing sites in sitelinks.txt Mar 6, 2015
sleepandlog.cpp now it compiles with -m32 Nov 10, 2014
sort.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
sort.h use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
streambuf.h good checkpoint. quite a few fixes. Nov 18, 2014
superMergeTest.cpp now it compiles with -m32 Nov 10, 2014
supported_charsets.cpp Initial file population. Aug 2, 2013
supported_charsets.txt Initial file population. Aug 2, 2013
test2.cpp now it compiles with -m32 Nov 10, 2014
test_convert.cpp now it compiles with -m32 Nov 10, 2014
test_hash.cpp replace long long with int64_t Oct 30, 2014
test_norm.cpp now it compiles with -m32 Nov 10, 2014
test_parser.cpp now it compiles with -m32 Nov 10, 2014
test_parser2.cpp now it compiles with -m32 Nov 10, 2014
test_unicode.cpp now it compiles with -m32 Nov 10, 2014
testfloats.cpp now it compiles with -m32 Nov 10, 2014
threadtest.cpp now it compiles with -m32 Nov 10, 2014
thunder.cpp now it compiles with -m32 Nov 10, 2014
tifftopnm Initial file population. Aug 2, 2013
treetest.cpp now it compiles with -m32 Nov 10, 2014
types.h fix keysize==8 bug in keycmp Mar 28, 2016
udptest.cpp now it compiles with -m32 Nov 10, 2014
unifiedDict.txt Initial file population. Aug 2, 2013
uniq2.cpp now it compiles with -m32 Nov 10, 2014
urlinfo.cpp use gbmemcpy not memcpy so we can get profiler working again Jan 13, 2015
wikititles.txt.part1 Initial file population. Aug 2, 2013
wikititles.txt.part2 Initial file population. Aug 2, 2013
wiktionary-buf.txt when user searches for a word without the Jun 1, 2014
wiktionary-lang.txt when user searches for a word without the Jun 1, 2014
wiktionary-syns.dat when user searches for a word without the Jun 1, 2014
zconf.h updated to a new libz64.a. updated zconf.h and Nov 17, 2014
zlib.h updated to a new libz64.a. updated zconf.h and Nov 17, 2014

README.md

open-source-search-engine

An open source web and enterprise search engine and spider/crawler, as seen at http://www.gigablast.com/.

RUNNING GIGABLAST

See html/faq.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/faq.html

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me at mattdwells@hotmail.com for feature requests or general help. I will work for free for good use cases.