Installation Guide

Duong, Dinh-Phuong edited this page Jul 6, 2021 · 19 revisions

Frontend

To start the frontend:

  • Download and install Node.js.

  • Open a terminal and install the Angular CLI globally: npm install -g @angular/cli

  • Navigate into the app folder and install the packages: npm ci

  • For development mode, run npm run start:de from the artbrowser/app folder. The app will then be available in a browser at localhost:4200/de. Any other language supported by openartbrowser can be used instead (e.g. npm run start:en).

  • For deployment, run npm run build-locale on the server and copy the files to the target directory.

Frontend configuration:

  • The default Elasticsearch URL is 'https://openartbrowser.org/api/{lang}/_search'.

  • To point the frontend at another Elasticsearch server, change the elasticEnvironment variable in app/src/environments/environment.ts.

  • To use the local Elasticsearch Docker container, run: npm run start_docker
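The {lang} placeholder in the default URL is replaced by a language code. The following Python sketch only illustrates that substitution; the function name search_url is hypothetical and not part of the frontend code, which does this in TypeScript.

```python
# Sketch: fill the {lang} placeholder of the default search URL.
# search_url is a hypothetical helper, not a function from the repository.
DEFAULT_SEARCH_URL = "https://openartbrowser.org/api/{lang}/_search"


def search_url(lang: str, template: str = DEFAULT_SEARCH_URL) -> str:
    """Return the search endpoint for one language code such as 'de' or 'en'."""
    if not lang.isalpha():
        raise ValueError(f"unexpected language code: {lang!r}")
    return template.format(lang=lang)


if __name__ == "__main__":
    print(search_url("de"))
```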

Elasticsearch Docker Container

The repository provides a Dockerfile for building and running a local Elasticsearch instance inside a Docker container:

  • Install Docker.
  • Build and run the image via the provided helper script: etl/docker_elastic.sh or etl/docker_elastic.bat
  • The script kills any old instances of the Docker container.

More information about the Dockerfile can be found here.

Elasticsearch Server

To install the Elasticsearch server pick the correct installation method for your operating system here: Installing Elasticsearch.

On the server, Elasticsearch was installed with the Advanced Package Tool (apt). The currently installed version is 7.13.2.

Configuration

The Elasticsearch directory layout can be viewed here: Elasticsearch directory layout

To configure the server, edit the elasticsearch.yml file, which is located in /etc/elasticsearch (or elsewhere, depending on your OS). The elasticsearch.yml file used for openartbrowser can be found in the repository under /openartbrowser/etl/upload_to_elasticsearch. To enable the snapshot feature, which is used for index swapping, you need to provide a backup directory. On the server we use /var/lib/elasticsearch/backup. The location depends on the operating system, but the backup directory must be set in order to run the elasticsearch_helper.py script.
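In Elasticsearch, a filesystem backup directory is allowed through the path.repo setting in elasticsearch.yml and then registered as a snapshot repository of type fs through the REST API. The sketch below shows both pieces in Python. It is not taken from elasticsearch_helper.py, which may do this differently; the repository name openartbrowser_backup and both function names are hypothetical.

```python
import json
import urllib.request

BACKUP_DIR = "/var/lib/elasticsearch/backup"  # directory named in this guide
REPO_NAME = "openartbrowser_backup"           # hypothetical repository name


def path_repo_line(backup_dir: str = BACKUP_DIR) -> str:
    """Line for elasticsearch.yml that allows snapshots under backup_dir."""
    return f'path.repo: ["{backup_dir}"]'


def register_repo_request(host: str = "http://localhost:9200",
                          repo: str = REPO_NAME,
                          backup_dir: str = BACKUP_DIR) -> urllib.request.Request:
    """Build (but do not send) the PUT request that registers an fs repository."""
    body = json.dumps({"type": "fs", "settings": {"location": backup_dir}})
    return urllib.request.Request(
        url=f"{host}/_snapshot/{repo}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    print(path_repo_line())
    req = register_repo_request()
    print(req.get_method(), req.full_url)
    # To send it against a running server, uncomment:
    # urllib.request.urlopen(req)
```

The request is only built here, not sent, so the sketch can be run without a server. Elasticsearch must be restarted after path.repo is changed.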

After all changes have been written to elasticsearch.yml, you need to restart the Elasticsearch server, which can be done with the following commands:

Stop the Elasticsearch server

sudo systemctl stop elasticsearch.service

Start the Elasticsearch server

sudo systemctl start elasticsearch.service

Further information about this can be found here: Starting Elasticsearch

RAM configuration

By default, Elasticsearch uses half of the system memory as heap. This is too much for our setup, so the RAM usage on both staging and production is limited. Currently the servers are configured to use a 4 GB minimum and an 8 GB maximum heap. This configuration is stored in /etc/elasticsearch/jvm.options.d/jvm.options. It follows the advanced configuration guide for setting the JVM heap size.
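The JVM heap limits are expressed with the standard -Xms (minimum) and -Xmx (maximum) flags. As a small illustration, the Python sketch below renders the two lines that would correspond to the 4 GB and 8 GB limits described above. That the server file contains exactly these two lines is an assumption, and heap_options is a hypothetical helper.

```python
def heap_options(min_gb: int, max_gb: int) -> str:
    """Render JVM heap flags in the format used by jvm.options files."""
    if not 0 < min_gb <= max_gb:
        raise ValueError("expected 0 < min_gb <= max_gb")
    return f"-Xms{min_gb}g\n-Xmx{max_gb}g\n"


if __name__ == "__main__":
    # Limits described in this guide: 4 GB minimum, 8 GB maximum.
    print(heap_options(4, 8), end="")
```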

Upgrading Elasticsearch

The Elasticsearch clusters can be upgraded with a full cluster restart upgrade. To check the installed version, run the following on the server: curl -X GET "http://localhost:9200"
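The root endpoint answers with a JSON document whose version.number field holds the installed version. A minimal Python sketch of the same check, for use in a script before and after an upgrade, could look like this. The function names are hypothetical, and the sample response is shortened to the fields used here.

```python
import json
import urllib.request


def parse_version(body: str) -> str:
    """Extract the version number from the JSON returned by the root endpoint."""
    return json.loads(body)["version"]["number"]


def fetch_version(host: str = "http://localhost:9200", timeout: float = 5.0) -> str:
    """Ask a running Elasticsearch server for its version."""
    with urllib.request.urlopen(host, timeout=timeout) as response:
        return parse_version(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Shortened sample of a root endpoint response, so no server is needed.
    sample = '{"name": "node-1", "version": {"number": "7.13.2"}}'
    print(parse_version(sample))
```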

Nginx Server

The Nginx server forwards every search query to the Elasticsearch server (Nginx is the reverse proxy for Elasticsearch). The reason for this is that the Elasticsearch server must not be accessible from the outside via its REST interface. If it were, anyone could delete indices, documents, snapshots and so on.

The configuration for this can be found in /etc/nginx/sites-enabled/default.

With the new multilanguage feature, the endpoints have changed and are no longer the same as in the picture above, but the concept stays the same.
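The idea behind the proxy can be illustrated as an allow-list: only search requests are passed on, everything else is rejected. The actual rules live in the Nginx configuration and are not reproduced here. The Python sketch below is purely illustrative. The path pattern is derived from the default frontend URL, while the two-letter language code and the allowed methods are assumptions.

```python
import re

# Assumed from the default frontend URL 'https://openartbrowser.org/api/{lang}/_search';
# the two-letter language code is an assumption.
SEARCH_PATH = re.compile(r"/api/[a-z]{2}/_search")
ALLOWED_METHODS = {"GET", "POST"}  # assumption: searches are sent as GET or POST


def should_forward(method: str, path: str) -> bool:
    """Return True only for requests that look like search queries."""
    return method.upper() in ALLOWED_METHODS and SEARCH_PATH.fullmatch(path) is not None


if __name__ == "__main__":
    print(should_forward("POST", "/api/de/_search"))    # a search query
    print(should_forward("DELETE", "/api/de"))           # an attempt to delete an index
```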

ETL process installation

Execution requirements

The scripts that extract data from Wikidata for the openartbrowser are written in Python. To execute them, the following programs are required:

Installation on ubuntu (with apt):

  • First add the Personal Package Archive (PPA) for nodejs with curl
  • Install nodejs with apt-get
    • sudo apt-get install nodejs

The versions are recommendations; older versions may work.

Once Python is installed, the dependencies for the openartbrowser code can be installed.

To install the dependencies, run the install_etl.sh script.

./install_etl.sh

or on Windows

./install_etl.bat

Python requirements

To install all required Python packages, execute the following command in the script directory. If you run install_etl.sh, this is done by the script.

pip3 install -r requirements.txt

Configure data extraction

Pywikibot configuration

To execute the art_ontology_crawler.py script, you first have to configure the pywikibot installation. The repository provides /openartbrowser/etl/user-config.py, which configures the pywikibot user. This file must exist for the script to run. The script always uses the user-config.py from the directory in which art_ontology_crawler.py is executed, so always execute it from the /etl directory.

python3 data_extraction/art_ontology_crawler.py

There are several ways to configure pywikibot: https://www.mediawiki.org/wiki/Manual:Pywikibot/user-config.py#Location .

If you want to use pywikibot with a MediaWiki account, you can follow the Wikidata tutorial at the following link: https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Setting_up_Shop .
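Before starting the crawler, it can help to check which user-config.py would be picked up. The Python sketch below is a simplified model of the lookup: it checks the directory in the PYWIKIBOT_DIR environment variable (if set) and then the current directory. The real pywikibot lookup has further fallbacks, and find_user_config is a hypothetical helper, not part of pywikibot or the repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_user_config(cwd: Path, env: Mapping[str, str]) -> Optional[Path]:
    """Return the first user-config.py found in PYWIKIBOT_DIR or in cwd."""
    candidates = []
    if env.get("PYWIKIBOT_DIR"):
        candidates.append(Path(env["PYWIKIBOT_DIR"]))
    candidates.append(cwd)
    for directory in candidates:
        config = directory / "user-config.py"
        if config.is_file():
            return config
    return None


if __name__ == "__main__":
    found = find_user_config(Path.cwd(), os.environ)
    if found:
        print(f"using {found}")
    else:
        print("user-config.py not found; run the script from the etl directory")
```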

Set environment variables for importing python modules

The Python scripts import the project's own modules to avoid code duplication. For these imports to work, an environment variable has to be set. On Unix-based systems, the PYTHONPATH variable has to include the openartbrowser/etl directory in your current shell session.

export PYTHONPATH="${PYTHONPATH}:openartbrowser/etl"

How environment variables are set on Unix depends on the shell you use, so look up the syntax for your shell if the above doesn't work for you. The openartbrowser/etl/scripts/install_etl.sh script contains examples of how to set up the environment variable for the bash shell.

You may also set the PYWIKIBOT_DIR variable to the openartbrowser/etl directory in order to execute the script from a directory other than openartbrowser/etl, but this is optional and not used on the server anyway.
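To verify that the variable is set as intended, the PYTHONPATH entries can be compared against the etl directory. The following Python sketch does this. on_pythonpath is a hypothetical helper, and the directory name is the one used in this guide. Note that a relative entry such as openartbrowser/etl is resolved against the current working directory, so an absolute path is the more robust choice.

```python
import os
from pathlib import Path


def on_pythonpath(directory: str, pythonpath: str) -> bool:
    """Return True if directory is one of the entries in a PYTHONPATH string."""
    target = Path(directory).resolve()
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(Path(entry).resolve() == target for entry in entries)


if __name__ == "__main__":
    etl_dir = "openartbrowser/etl"  # directory named in this guide
    if on_pythonpath(etl_dir, os.environ.get("PYTHONPATH", "")):
        print("PYTHONPATH contains the etl directory")
    else:
        print("PYTHONPATH does not contain the etl directory")
```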

On Windows, the system environment variable (not the user variable) can be set via the GUI or via the terminal. Run the following from the openartbrowser/etl directory, since %CD% expands to the current directory:

setx PYTHONPATH "%PYTHONPATH%;%CD%" /M

The PYWIKIBOT_DIR variable can be set in the same way, but this is optional.

setx PYWIKIBOT_DIR "%CD%" /M

If the above doesn't work for you, please use the GUI instead.