nodeca.search

Nodeca search app.

See main repo for install instructions: https://github.com/nodeca/nodeca

Stopwords dump memo

prepare

  1. Rebuild the index.
  2. Run OPTIMIZE INDEX forum_posts if the index is not optimized yet.

dump

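cd <root>/sphinx_data/node<X>
# Dump the index dictionary as CSV (keyword,docs,hits,offset).
indextool --dumpdict tables/forum_posts.<XX>.spi -c searchd.conf > dump.txt
# Drop everything up to and including the CSV header, sort by document count (column 2, descending),
# keep the top 1000 rows, then drop rows whose keyword starts with byte 0x02.
sed -e '0,/keyword,docs,hits,offset/d' dump.txt | sort -t"," -k 2 -g -r | head -n 1000 | sed -e '/^\x02/d' > dump_top.txt
# Keep only the keyword column.
sed 's/,.*//' dump_top.txt > stopwords.txt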

Then edit stopwords.txt by hand and remove any words that should not be treated as stopwords. Use dump_top.txt to check the document count for each word (second column).
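
The dump commands above can be tried on a small sample before running them against a real index. The following is a minimal sketch, assuming GNU sed (the `0,/re/` address and `\x02` escape are GNU extensions). The sample rows are invented for illustration and are not real indextool output, and `head -n 3` stands in for `head -n 1000`:

```shell
#!/bin/sh
# Sketch only: the rows below are made up, not real indextool output.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# A made-up dictionary dump: a preamble line, the CSV header, then keyword rows.
printf '%s\n' \
  'dictionary dump (sample preamble)' \
  'keyword,docs,hits,offset' \
  'rare,3,4,100' \
  'the,900,5000,200' \
  'and,700,3000,300' \
  'forum,50,80,400' > dump.txt
# One extra row whose keyword starts with byte 0x02, to exercise the last filter.
printf '\002magic,950,950,500\n' >> dump.txt

# Same pipeline as in the memo, with a smaller row limit.
sed -e '0,/keyword,docs,hits,offset/d' dump.txt \
  | sort -t"," -k 2 -g -r \
  | head -n 3 \
  | sed -e '/^\x02/d' > dump_top.txt
sed 's/,.*//' dump_top.txt > stopwords.txt

cat stopwords.txt
```

Note that the 0x02 filter runs after `head`, so the final list can contain fewer rows than the limit passed to `head`.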
