The source for Nutrimatic

Build from source (the easy scripted way)

(If this doesn't work for you, see the manual steps below.)

You'll need a working C++ build system, plus the libraries needed to build Python itself; for Debian/Ubuntu:

```
sudo apt install \
    build-essential curl \
    libbz2-dev libffi-dev liblzma-dev libreadline-dev \
    libsqlite3-dev libssl-dev libxml2-dev libxmlsec1-dev \
    tk-dev xz-utils zlib1g-dev
```

Install mise-en-place as a tool installer: curl https://mise.run | sh (or see mise's other install methods)
Run ./dev_setup.py, which will install various dependencies locally
Then run conan build ., which will leave binaries in build/
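Once the build finishes, it can help to confirm that the three tools used later in this README (make-index, merge-indexes and find-expr) actually landed in build/. This is a small sketch; the function name check_nutrimatic_binaries is hypothetical and not part of the repository.

```shell
# Hypothetical helper: report any Nutrimatic tool used in this README that is
# missing (or not executable) in the given build directory (default: build).
check_nutrimatic_binaries() {
  local dir=${1:-build}
  local missing=0 tool
  for tool in make-index merge-indexes find-expr; do
    if [ ! -x "$dir/$tool" ]; then
      echo "missing: $dir/$tool" >&2
      missing=1
    fi
  done
  # Exit status 0 only if all three tools are present and executable.
  return "$missing"
}
```

For example, check_nutrimatic_binaries build && echo "build looks complete".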

Build from source (the hard manual way)

(The scripted path above is easier! But maybe that's too magical, or you don't like mise...)

As above, you'll need C++ build tools
Use Python 3.10 (this avoids a wikiextractor bug that is exposed by a change in Python 3.11)
You probably want to set up a Python venv
Install Conan, CMake, etc: pip install -r dev_requirements.txt
Configure Conan to build on your machine (if you haven't already)
```
conan profile detect
conan profile path default  # note the path this outputs
```
Edit the file printed by conan profile path default and set compiler.cppstd=17 (or gnu17)
Install C++ dependencies: conan install . --build=missing
Then run conan build ., which will leave binaries in build/
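The profile edit in the manual steps above can also be scripted. The sketch below assumes the profile already contains a compiler.cppstd= line (which conan profile detect usually writes); if it does not, the function refuses to guess where the setting belongs and asks you to edit the file by hand. The function name set_conan_cppstd is hypothetical.

```shell
# Hypothetical helper: set compiler.cppstd in an existing Conan profile file.
# Only replaces an existing compiler.cppstd line; fails if there is none.
set_conan_cppstd() {
  local profile=$1 std=${2:-17}
  if ! grep -q '^compiler\.cppstd=' "$profile"; then
    echo "no compiler.cppstd line in $profile; edit it by hand" >&2
    return 1
  fi
  # -i.bak works with both GNU and BSD sed; remove the backup afterwards.
  sed -i.bak "s/^compiler\.cppstd=.*/compiler.cppstd=$std/" "$profile" &&
    rm -f "$profile.bak"
}
```

For example, set_conan_cppstd "$(conan profile path default)" 17, or pass gnu17 as the second argument.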

To build an index

To actually use Nutrimatic, you will need to build an index from Wikipedia.

Download the latest Wikipedia database dump (this is a ~20GB file!):
```
wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
```
(You can also look for a mirror closer to you.)
Extract the text from the articles using Wikipedia Extractor (this generates ~12GB, and takes hours!):
```
pip install wikiextractor  # installs into the local virtualenv
wikiextractor enwiki-latest-pages-articles.xml.bz2
```
(There are probably better extractors these days!)

This will write many files named text/??/wiki_??.
Index the text (this generates ~100GB of data, and also takes hours!):
```
find text -type f | xargs cat | build/make-index wikipedia
```
This will write many files named wikipedia.?????.index. (You can break this up by running make-index with different chunks of input data, replacing "wikipedia" with unique names each time.)
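One way to do that chunking is to run make-index once per top-level text/?? directory written by wikiextractor, naming each run after its directory (wikipedia-AA, wikipedia-AB, ...). This is a sketch under that assumption; the function name index_in_chunks and the MAKE_INDEX override are hypothetical. Because the output files will then be named like wikipedia-AA.?????.index, the merge globs that follow (wikipedia.????$x.index) would need adapting, for example to wikipedia-*.????$x.index.

```shell
# Hypothetical helper: run make-index once per top-level directory under
# TEXTDIR (default: text), using a unique name per run (PREFIX-<dirname>).
# MAKE_INDEX defaults to build/make-index.
index_in_chunks() {
  local textdir=${1:-text} prefix=${2:-wikipedia}
  local make_index=${MAKE_INDEX:-build/make-index}
  local dir name
  for dir in "$textdir"/*/; do
    [ -d "$dir" ] || continue          # skip if the glob matched nothing
    name="$prefix-$(basename "$dir")"  # e.g. wikipedia-AA
    echo "indexing $dir as $name" >&2
    # -print0/-0 keeps odd file names intact on the way to cat.
    find "$dir" -type f -print0 | xargs -0 cat | "$make_index" "$name" || return 1
  done
}
```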
Merge the indexes; I normally do this in two stages:
```
for x in 0 1 2 3 4 5 6 7 8 9
do build/merge-indexes 2 wikipedia.????$x.index wiki-merged.$x.index
done
```
followed by
```
build/merge-indexes 5 wiki-merged.*.index wiki-merged.index
```
There's nothing magical about this 10-batch approach; you can use any strategy you like. The 2 and 5 arguments are phrase-frequency cutoffs: how many times a string must occur to be included.
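To make "frequency cutoff" concrete, the classic sort | uniq -c pipeline can keep only the lines of a stream that occur at least N times, with each line standing in for a phrase. This is an illustration of the idea only, not how merge-indexes is implemented; the function name phrases_at_least is hypothetical.

```shell
# Illustration only: count identical lines and print those that occur at
# least CUTOFF times (each line stands in for a phrase).
phrases_at_least() {
  local cutoff=$1
  sort | uniq -c |
    awk -v min="$cutoff" '$1 + 0 >= min + 0 { sub(/^ *[0-9]+ /, ""); print }'
}
```

For example, printf 'the cat\nthe dog\nthe cat\n' | phrases_at_least 2 prints only "the cat".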

Enjoy your new index! For example, this searches for anagrams of the letters aciimnrttu:

```
build/find-expr wiki-merged.index '<aciimnrttu>'
```

Serving the web interface

If you want to run a nutrimatic.org-style interface, point a web server at the web_static/ directory. For root requests, have it launch cgi_scripts/cgi-search.py with $NUTRIMATIC_FIND_EXPR set to the path of the find-expr binary and $NUTRIMATIC_INDEX set to the index you built.

(You might want to use install_to_dir.py which will copy executables, CGI scripts, and static content to the directory of your choice.)

For example, you could adapt this nginx config:

```
location /my-nutrimatic/ {
  # Serve static files (change /home/me/nutrimatic_install to your install dir)
  alias /home/me/nutrimatic_install/web_static/;

  # For root requests, run the CGI script
  location = /my-nutrimatic/ {
    fastcgi_pass unix:/var/run/fcgiwrap.socket;
    fastcgi_buffering off;  # send results as soon as we find them
    include /etc/nginx/fastcgi_params;
    gzip off;  # gzip compression also causes buffering

    # (change /home/me/nutrimatic_install to your install dir)
    fastcgi_param SCRIPT_FILENAME /home/me/nutrimatic_install/cgi_scripts/cgi-search.py;
    fastcgi_param NUTRIMATIC_FIND_EXPR /home/me/nutrimatic_install/bin/find-expr;

    # (change to wherever you put your index file)
    fastcgi_param NUTRIMATIC_INDEX /home/me/nutrimatic_install/wiki-merged.index;
  }
}
```
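Before reloading the web server, it may be worth checking that the two paths you plan to pass as NUTRIMATIC_FIND_EXPR and NUTRIMATIC_INDEX are usable. This sketch only checks permissions for whichever user runs it, so ideally run it as the user your CGI wrapper runs as; the function name check_nutrimatic_cgi_env is hypothetical.

```shell
# Hypothetical helper: verify the two environment variables the CGI script
# expects. find-expr must be executable; the index must be a readable file.
check_nutrimatic_cgi_env() {
  local ok=0
  if [ ! -x "${NUTRIMATIC_FIND_EXPR:-}" ]; then
    echo "NUTRIMATIC_FIND_EXPR is not executable: ${NUTRIMATIC_FIND_EXPR:-<unset>}" >&2
    ok=1
  fi
  if [ ! -f "${NUTRIMATIC_INDEX:-}" ] || [ ! -r "${NUTRIMATIC_INDEX:-}" ]; then
    echo "NUTRIMATIC_INDEX is not a readable file: ${NUTRIMATIC_INDEX:-<unset>}" >&2
    ok=1
  fi
  return "$ok"
}
```

For example, prefix the call with the same two paths used in the fastcgi_param lines, as NUTRIMATIC_FIND_EXPR=... NUTRIMATIC_INDEX=... check_nutrimatic_cgi_env.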

Reproducing public versions

If you want to reproduce historical results from the website, you need to build an index from the corresponding Wikipedia data dump using compatible index building and searching logic:

nutrimatic.org/2016 (Dec 2016 - Feb 2024): historical code with enwiki-20161101 (discussion)
nutrimatic.org/2024 (Feb 2024 - current): main branch with enwiki-20231201
