developers-italia-backend


Backend & crawler for the OSS catalog of Developers Italia

The crawler finds and retrieves all the publiccode.yml files from the organizations registered on GitHub/Bitbucket/GitLab that are listed in the whitelists. It then generates YAML files that the Jekyll build chain later uses to generate the static pages of developers.italia.it.

Components

Dependencies

Set-up

Stack

  1. rename .env.example to .env and fill the variables with your values

    • default Elasticsearch user and password are elastic:elastic
    • default Kibana user and password are kibana:kibana
  2. rename elasticsearch/config/searchguard/sg_internal_users.yml.example to elasticsearch/config/searchguard/sg_internal_users.yml and insert the correct passwords

    Hashed passwords can be generated with:

    docker exec -t -i developers-italia-backend_elasticsearch elasticsearch/plugins/search-guard-6/tools/hash.sh -p <password>
  3. insert the kibana password in kibana/config/kibana.yml

  4. configure the nginx proxy for the elasticsearch host with the following directives:

    limit_req_zone $binary_remote_addr zone=elasticsearch_limit:10m rate=10r/s;
    
    server {
        ...
        location / {
            limit_req zone=elasticsearch_limit burst=20 nodelay;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_pass http://localhost:9200;
            proxy_ssl_session_reuse off;
            proxy_cache_bypass $http_upgrade;
            proxy_redirect off;
        }
    }
    
  5. you might need to run sysctl -w vm.max_map_count=262144 and make the setting permanent in /etc/sysctl.conf so that Elasticsearch can start, as documented here

  6. start the Docker stack: make up
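Step 5 asks for the vm.max_map_count setting to be made permanent. A minimal sketch of one way to do that without adding the line twice is shown below; the function name persist_max_map_count and its optional path argument are illustrative and not part of this repository.

```shell
#!/bin/sh
# persist_max_map_count [FILE]
# Append vm.max_map_count=262144 to FILE (default: /etc/sysctl.conf)
# unless a vm.max_map_count line is already present there.
# Hypothetical helper, not shipped with this repository.
persist_max_map_count() {
    conf="${1:-/etc/sysctl.conf}"
    if grep -q '^vm.max_map_count' "$conf" 2>/dev/null; then
        echo "vm.max_map_count already present in $conf"
    else
        echo 'vm.max_map_count=262144' >> "$conf"
        echo "added vm.max_map_count=262144 to $conf"
    fi
}

# Typical use, as root:
#   sysctl -w vm.max_map_count=262144   # apply immediately
#   persist_max_map_count               # keep the value across reboots
```

If a different value is already configured in the file, the helper leaves it untouched, so check the existing line by hand in that case.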

Crawler

  1. cd crawler
  2. Fill your domains.yml file with configuration values (like specific host basic auth tokens)
  3. Rename config.toml.example to config.toml and fill the variables
  4. build the crawler binary: make
  5. start the crawler: bin/crawler crawl whitelist/*.yml
  6. configure in crontab as desired
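For step 6, one possible approach is a small wrapper that cron can call. The sketch below assumes the crawler was built in a checkout at /opt/crawler; that path, the run_crawl name, the crawl.log file, and the schedule are all placeholders to adapt.

```shell
#!/bin/sh
# run_crawl DIR
# Run one crawl from the crawler directory DIR and append the output
# to DIR/crawl.log. Hypothetical wrapper, not shipped with this repository.
run_crawl() {
    (
        cd "$1" || exit 1
        # Same command as step 5; the glob expands relative to DIR.
        bin/crawler crawl whitelist/*.yml >> crawl.log 2>&1
    )
}

# Example crontab entry (daily at 03:00), assuming this file is saved as
# /opt/crawler/run_crawl.sh and its last line is: run_crawl /opt/crawler
#   0 3 * * * /bin/sh /opt/crawler/run_crawl.sh
```

The subshell keeps the directory change local to the function, so the wrapper can be sourced from other scripts without side effects.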

Tools

  • bin/crawler updateipa downloads IPA data and writes it into Elasticsearch
  • bin/crawler download-whitelist downloads orgs and repos from the onboarding portal and writes them to a whitelist file

Troubleshooting

  • The Docker logs suggest that the Elasticsearch container needs more virtual memory, and the output is stuck at Stalling for Elasticsearch....

    Increase container virtual memory: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode

  • When running make build to build the crawler image, a fatal memory error occurs: "fatal error: out of memory"

    You probably need to increase the memory available to the Docker machine: docker-machine stop && VBoxManage modifyvm default --cpus 2 && VBoxManage modifyvm default --memory 2048 && docker-machine start

Development

In order to access Elasticsearch with write permissions from the outside, you can forward the 9200 port via SSH using ssh -L9200:localhost:9200 and configure ELASTIC_URL = "http://localhost:9200/" in your local config.toml.
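As a rough sketch, the tunnel can be opened and then checked before starting the crawler. The SSH target user@backend-host and the check_elastic helper are placeholders and not part of this repository; curl is assumed to be installed.

```shell
#!/bin/sh
# Forward local port 9200 to Elasticsearch on the server.
# user@backend-host is a placeholder for your own SSH target.
#   ssh -N -L 9200:localhost:9200 user@backend-host &

# check_elastic [URL]
# Succeed if an HTTP request to URL (default: http://localhost:9200/)
# returns a non-error status. Hypothetical helper for illustration.
check_elastic() {
    url="${1:-http://localhost:9200/}"
    if curl -fsS -o /dev/null "$url"; then
        echo "Elasticsearch reachable at $url"
    else
        echo "Elasticsearch not reachable at $url" >&2
        return 1
    fi
}
```

If authentication is enabled on the Elasticsearch host, an anonymous request will probably be rejected and the check will fail; in that case, pass credentials to curl with its -u option.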

See also

Authors

Developers Italia is a project by AgID and the Italian Digital Team, which developed the crawler and maintains this repository.
