Developing on the four main daemons (Searching, Crawling, Documents and AD)

runarbu edited this page Dec 16, 2014 · 9 revisions

Stub article: This article is a work in progress. You can help Searchdaimon by expanding it with information you know.

The ES has four main daemons that handle different tasks. They are all located in the bin folder, and are commonly run from the command line when doing development.

  1. searchdbb - Does the actual searching. Receives the queries to search for from cgi-bin/dispatcher_allbb.
  2. crawlManager2 - Is told what to crawl and adds the data to the index. Also handles lookups of security information in third-party systems.
  3. boitho-bbdn - Gets crawled documents from crawlManager2, extracts the text, makes thumbnails and adds them to the repository. Another program is later told to index the repository.
  4. boithoad - Dedicated daemon for talking to Active Directory. The code will become a crawlManager2 plugin some day.

Running the daemons manually from the command line

When developing on the ES core you may want to run one or more of the main daemons from the command line to see what debug information they produce. Running from the command line is also necessary when using debugging tools like GDB and Valgrind.
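For example, a daemon can be started under GDB like this (a sketch; searchdbb and its -f/-s flags are used as an illustration, and are described in the sections below):

```shell
# Start searchdbb under GDB with console logging enabled.
# -f (fast startup) and -s (single process) make interactive
# debugging easier, since all work happens in one process.
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 gdb --args bin/searchdbb -f -s
```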

Most of the debug information is output through a logging library. It can either be discarded (the default), redirected to a file or printed on the console. To get the debug info printed on the console you have to set the environment variable BBLOGGER_APPENDERS to 1 and set the severity level with BBLOGGER_SEVERITY.

Stop the existing instance from the admin panel

Before you can run a daemon from the command line the already running instance must be stopped. Access the web admin panel, and under "Manage services" stop the ones you are planning to run manually.

From where

You must log in to the ES with ssh, su to the boitho user and enter the /home/boitho/boithoTools folder before you can run any of the daemons.

Typically you log in as root and run:

su - boitho
cd boithoTools
Example running searchdbb:

env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/searchdbb

The search kernel - searchdbb

Do the actual searching. Get the queries to search for from cgi-bin/dispatcher_allbb.

Running

env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/searchdbb

The searchdbb daemon also has some command line options you should enable when running it from the command line:

  • -f Fast startup. Skips thesaurus rebuild and index pre-caching
  • -s Single process. Does not create a new process for each query nor use multiple threads

Example:

env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/searchdbb -f -s

Now you can go to the search interface and search for something. Observe how the searchdbb daemon outputs debug info to the ssh console.

Valgrind

Sometimes you may want to run Valgrind to look for memory leaks and errors.

Example:

env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 valgrind --leak-check=full --max-stackframe=5247212 bin/searchdbb -f -s

Command line arguments

The search kernel supports some command line arguments that may be useful when running it manually.

Argument     Description
-m number    Max. The search kernel will exit after max number of queries. Normally -m is used with -s or -t to run a certain number of queries and then exit so Valgrind can do a full memory leak check
-l           Log. Logs messages to the logs/searchd.log file
-o           Preopen. Opens the index files before starting
-b file      Brank file. Loads static rank information from a brank file
-s           Single. Does not fork for new connections nor use multiple threads
-t           Single with threads. Does not fork for new connections, but will use multiple threads
-f           Fast startup. Skips time consuming tasks at startup so the daemon can start answering queries faster. Often used when running searchdbb from the command line and you don't want to wait for spelling data etc.
-c           No cache. Do not cache indexes
-A number    Set appenders
-L number    Set log severity
-S number    Set spelling min freq
-a seconds   Alarm. Sets how long a query can run for. Default is 60 seconds. When the time is up searchdbb will receive an alarm and exit the process that was running the query
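Putting the -m flag together with the Valgrind example above, a bounded leak check might look like this (a sketch; the query count of 10 is an arbitrary example):

```shell
# Answer 10 queries in a single process, then exit cleanly so
# Valgrind can produce a full leak report on shutdown.
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 \
  valgrind --leak-check=full --max-stackframe=5247212 \
  bin/searchdbb -f -s -m 10
```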

Crawler and user systems - crawlManager2

Is told what to crawl and adds the data to the index. Also handles lookups of security information in third-party systems.

Running

env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/crawlManager2

Command line arguments

The crawler manager supports some command line arguments that may be useful when running it manually.

Argument     Description
-m number    Max. Exits after max number of queries. Normally -m is used with -s to run a certain number of crawls and then exit so Valgrind can do a full memory leak check
-s           Single. Does not fork for new connections nor use multiple threads
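As with searchdbb, -m and -s can be combined with Valgrind for a bounded leak check (a sketch; the crawl count of 5 is an arbitrary example):

```shell
# Handle 5 crawls in a single process, then exit so Valgrind
# can do a full leak check.
env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 \
  valgrind --leak-check=full bin/crawlManager2 -s -m 5
```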

Document manager - boitho-bbdn

Gets crawled documents from crawlManager2, extracts the text, makes thumbnails and adds them to the repository. Another program is later told to index the repository.

Plugins

The document manager uses plugins to extract the text from the files it gets. Please see the main article for more information about them: Plugin: File filter

Running from the command line

env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/boitho-bbdn

Command line arguments

The document manager supports some command line arguments that may be useful when running it manually.

Argument     Description
-m number    Max. Exits after max number of documents. Normally -m is used with -s to run a certain number of documents and then exit so Valgrind can do a full memory leak check
-s           Single. Does not fork for new connections nor use multiple threads

Active Directory integration - boithoad

Dedicated daemon for talking to Active Directory. The code will become a crawlManager2 plugin some day.

Running

env BBLOGGER_APPENDERS=1 BBLOGGER_SEVERITY=5 bin/boithoad