
nemani/ahmia-index


Ahmia index

The Ahmia search engine uses Elasticsearch to index content.

Installation

Install Elasticsearch from the official package repository, following the official installation guide.

Configuration

The default configuration is enough to run the index in development mode. Here is a suggested configuration for a more robust (production) setup:

/etc/security/limits.conf

elasticsearch - nofile unlimited
elasticsearch - memlock unlimited

/etc/default/elasticsearch

On CentOS/RHEL the file is /etc/sysconfig/elasticsearch.

ES_HEAP_SIZE=2g # About half of the RAM; leave the other half to the OS file cache used by Lucene
MAX_OPEN_FILES=1065535
MAX_LOCKED_MEMORY=unlimited
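As a rough sketch of the sizing rule in the ES_HEAP_SIZE comment, the hypothetical helper below derives a heap value from the machine's total RAM. The 31 GB cap (so the JVM can keep using compressed object pointers) and the 1 GB floor are assumptions taken from general Elasticsearch heap guidance, not from this repository.

```python
def suggest_heap_size(total_ram_bytes: int) -> str:
    """Hypothetical helper: suggest an ES_HEAP_SIZE value such as '2g'.

    Assumptions: half of RAM goes to the heap, capped at 31 GB so the JVM
    keeps compressed object pointers, and never less than 1 GB.
    """
    gib = 1024 ** 3
    half_gb = total_ram_bytes // 2 // gib  # half of RAM, rounded down to whole GB
    heap_gb = max(1, min(half_gb, 31))     # clamp into the 1..31 GB range
    return f"{heap_gb}g"
```

For a machine with 4 GiB of RAM, `suggest_heap_size(4 * 1024**3)` returns `"2g"`, the example value used above.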

/etc/elasticsearch/elasticsearch.yml

bootstrap.mlockall: true
script.engine.groovy.inline.update: on
script.engine.groovy.inline.aggs: on

Start the service

# systemctl start elasticsearch
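Once the service is running, you can check that the memory-lock and open-file settings actually took effect. The sketch below assumes a node on localhost:9200 without authentication. It reads `mlockall` from the nodes info API (`/_nodes/process`) and `max_file_descriptors` from the nodes stats API (`/_nodes/stats/process`); which response carries which field differs between Elasticsearch versions, so the parser merges both. The function names and the 65535 threshold are illustrative assumptions.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def get_json(path):
    """GET a path from Elasticsearch and decode the JSON response body."""
    with urllib.request.urlopen(ES_URL + path) as resp:
        return json.load(resp)


def process_settings(*responses):
    """Collect mlockall / max_file_descriptors per node from one or more
    nodes-info or nodes-stats responses."""
    result = {}
    for body in responses:
        for node_id, node in body.get("nodes", {}).items():
            proc = node.get("process", {})
            entry = result.setdefault(node_id, {})
            for key in ("mlockall", "max_file_descriptors"):
                if key in proc:
                    entry[key] = proc[key]
    return result


def check(settings, min_fds=65535):
    """Return a list of problems; an empty list means every check passed.
    min_fds is an assumed minimum, not a value from this repository."""
    problems = []
    for node_id, entry in settings.items():
        if entry.get("mlockall") is not True:
            problems.append(f"{node_id}: memory is not locked (mlockall)")
        fds = entry.get("max_file_descriptors")
        if fds is not None and fds < min_fds:
            problems.append(f"{node_id}: max_file_descriptors={fds} < {min_fds}")
    return problems
```

For example, `check(process_settings(get_json("/_nodes/process"), get_json("/_nodes/stats/process")))` returns an empty list when both settings are in effect.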

Initialize mappings

The first time you run the index, create the monthly indexes with the mappings from mappings.json:

$ curl -XPUT -i "localhost:9200/crawl-2017-10/" -H 'Content-Type: application/json' -d "@./mappings.json"
$ curl -XPUT -i "localhost:9200/crawl-2017-11/" -H 'Content-Type: application/json' -d "@./mappings.json"
$ curl -XPUT -i "localhost:9200/crawl-2017-12/" -H 'Content-Type: application/json' -d "@./mappings.json"

Alternatively, run the setup script:

$ bash setup_index.sh
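If you prefer to script this step in Python, here is a hypothetical sketch of the same requests (it is not the contents of setup_index.sh). `monthly_index_names` and `create_index` are illustrative names; the sketch assumes a local node on localhost:9200 and reads mappings.json from the current directory, like the curl commands.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def monthly_index_names(start, end, prefix="crawl"):
    """Return names like 'crawl-2017-10' for every month from start to end,
    inclusive. start and end are (year, month) tuples."""
    year, month = start
    names = []
    while (year, month) <= tuple(end):
        names.append(f"{prefix}-{year:04d}-{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return names


def create_index(name, mappings_path="./mappings.json"):
    """PUT one index with the body from mappings.json (same as the curl calls).

    Returns the HTTP status code; raises urllib.error.HTTPError if
    Elasticsearch rejects the request, e.g. because the index already exists.
    """
    with open(mappings_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        f"{ES_URL}/{name}/",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, looping `create_index(name)` over `monthly_index_names((2017, 10), (2017, 12))` issues the three PUT requests listed above. Because an existing index makes the request fail, run it only for months that do not exist yet.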

Keep the crawl-latest alias pointed at the latest monthly indexes:

$ python3 point_to_indexes.py
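Moving an alias is done with the Elasticsearch `_aliases` API, which applies a list of `add`/`remove` actions atomically. The sketch below shows one way to compute those actions; it is an assumption about what point_to_indexes.py does, not its actual code, and all function names are hypothetical. How many recent months the alias should cover is not stated here, so `count` is left to the caller.

```python
import json
import re
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication
ALIAS = "crawl-latest"
MONTHLY = re.compile(r"crawl-\d{4}-\d{2}")  # matches crawl-YYYY-MM index names


def latest_indexes(names, count):
    """Return the `count` newest monthly indexes. crawl-YYYY-MM names sort
    chronologically as plain strings, so no date parsing is needed."""
    monthly = sorted(n for n in names if MONTHLY.fullmatch(n))
    return monthly[-count:] if count > 0 else []


def alias_actions(current, wanted, alias=ALIAS):
    """Build _aliases actions that move `alias` from the indexes in `current`
    to those in `wanted`; indexes present in both are left untouched."""
    current, wanted = set(current), set(wanted)
    actions = [{"remove": {"index": i, "alias": alias}} for i in sorted(current - wanted)]
    actions += [{"add": {"index": i, "alias": alias}} for i in sorted(wanted - current)]
    return actions


def apply_actions(actions):
    """POST all actions to /_aliases in a single request."""
    req = urllib.request.Request(
        f"{ES_URL}/_aliases",
        data=json.dumps({"actions": actions}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The indexes currently holding the alias can be read with `GET /_alias/crawl-latest`, whose response is keyed by index name.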

Filter out abusive sites:

$ bash call_filtering.sh
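The contents of call_filtering.sh are not shown here, so the following is only a hypothetical sketch of one common approach: read a blocklist of onion domains and build a `terms` query that matches every document from those sites. The blocklist format and the `domain` field name are assumptions; for a `terms` query to match, the field must be indexed for exact matching (keyword / not_analyzed).

```python
def parse_blocklist(text):
    """Parse a blocklist with one domain per line. Blank lines and '#'
    comments are ignored; the result is lower-cased, sorted and de-duplicated."""
    domains = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()  # drop comments and whitespace
        if entry:
            domains.add(entry)
    return sorted(domains)


def filter_query(domains, field="domain"):
    """Query body matching every document whose `field` equals one of `domains`.
    `field` is a hypothetical name; use whatever field stores the site's domain."""
    return {"query": {"terms": {field: list(domains)}}}
```

Depending on what the filtering step is meant to do, such a body could be sent to `_delete_by_query` (Elasticsearch 5.0 and later) or used to find documents to flag.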

Crontab

# Every day at 09:50
50 09 * * * cd /usr/local/home/juha/ahmia-index && bash call_filtering.sh > ./filter.log 2>&1
# Once a month, on the 16th at 04:10
10 04 16 * * cd /usr/local/home/juha/ahmia-index && ./venv3/bin/python point_to_indexes.py > ./change_alias.log 2>&1
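To double-check when crontab entries like these fire, here is an illustrative sketch of a simplified cron matcher (hypothetical helpers, not part of the repository). It only understands `*` and plain numbers such as `09`, which is all the two entries use; ranges, lists and steps are not supported.

```python
from datetime import datetime


def field_matches(spec, value):
    """Match one cron field that is either '*' or a plain number (e.g. '09')."""
    return spec == "*" or int(spec) == value


def cron_matches(expr, when):
    """Simplified check of a five-field cron expression:
    minute hour day-of-month month day-of-week.

    Note: real cron ORs day-of-month and day-of-week when both are
    restricted; this sketch ANDs all fields, which is equivalent here
    because the entries above restrict at most one of them.
    """
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron counts Sunday as 0, Python counts Monday as 0
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, cron_dow))
```

For example, `cron_matches("10 04 16 * *", datetime(2017, 11, 16, 4, 10))` is True, while the same time on the 17th is not.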
