Crawls a given website and extracts JSON-LD and Microdata from it. The extracted information is stored in a JSON file and can optionally be pushed to a local Elasticsearch service.
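
The JSON-LD the crawler looks for is embedded in pages inside `<script type="application/ld+json">` tags. The sketch below is an illustration of that extraction idea only, not the crawler's actual code; the sample page and its contents are made up:

```shell
# Fabricated sample page containing a JSON-LD block:
cat > page.html <<'EOF'
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset", "name": "demo"}
</script>
</head><body></body></html>
EOF

# Pull out and parse each JSON-LD block with the Python standard library:
python3 - <<'EOF'
import json, re

html = open("page.html").read()
# Capture the contents of every application/ld+json script tag
for block in re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.S):
    data = json.loads(block)
    print(data["@type"], data["name"])
EOF
# prints: Dataset demo
```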

How to use it:

Usage examples:

./bioschemas-gocrawlit_mac_64 -p -u ""
./bioschemas-gocrawlit_mac_64 -q -u
./bioschemas-gocrawlit_mac_64 -u

A folder named "bioschemas_gocrawlit_cache" will be created in the current working directory; it caches crawled pages so they are not downloaded more than once. It is safe to delete this folder.


Scraped data will be stored in a JSON file named <website_host>_schema.json in the current program folder.
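
Since the output is plain JSON, standard tools can validate and inspect it. A sketch, assuming a crawl of example.org produced example.org_schema.json (both the host and the file contents here are fabricated for illustration):

```shell
# Fabricated sample standing in for a real <website_host>_schema.json:
echo '{"@context": "https://schema.org", "@type": "Dataset", "name": "demo"}' > example.org_schema.json

# Validate and pretty-print it with the Python standard library:
python3 -m json.tool example.org_schema.json
```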

Available commands

  • -p: Stay on the current path. Use this when you want to crawl only the pages under the start URL rather than the whole website.
  • -m: Maximum recursion depth for visited URLs. Defaults to unlimited. (The crawler does not revisit URLs.)
  • -e: Adds crawled data to an Elasticsearch (v6) service.
  • -u: URL of the page where crawling starts.
  • -q: Remove the query section from found link URLs.
  • --query: Use together with -q so the crawler follows only links that contain the query word provided, e.g., ./bioschemas-gocrawlit_mac_64 -u -q --query page
  • -h: Print help and exit.
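
To illustrate what -q means by "remove the query section": given a link such as https://example.org/search?page=2, only the part before the ? is kept. A minimal sketch of that normalization using Python's urllib, not the crawler's own code (the URL is a made-up example):

```shell
# Strip the query (and fragment) from a URL, as -q presumably does:
python3 -c 'from urllib.parse import urlsplit, urlunsplit
u = urlsplit("https://example.org/search?page=2&sort=name")
print(urlunsplit((u.scheme, u.netloc, u.path, "", "")))'
# prints: https://example.org/search
```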

Building binaries

To create a binary for your current OS, use:

make build

To create binaries for Windows, macOS, and Linux, use:

make build-all

The binaries will be placed under the build/ directory.
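
For reference, a cross-compilation target of this kind typically relies on the Go toolchain's GOOS/GOARCH variables. The fragment below is a sketch of what such a Makefile target might look like, not this repository's actual Makefile; the binary names simply mirror the ones used in the examples above:

```make
# Hypothetical Makefile fragment; assumes the standard Go toolchain.
build-all:
	mkdir -p build
	GOOS=linux GOARCH=amd64 go build -o build/bioschemas-gocrawlit_linux_64
	GOOS=darwin GOARCH=amd64 go build -o build/bioschemas-gocrawlit_mac_64
	GOOS=windows GOARCH=amd64 go build -o build/bioschemas-gocrawlit_win_64.exe
```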

Elasticsearch quick setup with Docker

Steps for starting dockerized Elasticsearch and Kibana locally. This requires Docker.

Create a custom network for your elastic-stack:

docker network create elastic-stack

Pull and run an elasticsearch image:

docker run -it --network=elastic-stack -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name elasticsearch

Avoid changing the container name, since the Kibana Docker image points to http://elasticsearch:9200 by default.

Pull and run a Kibana image:

docker run --network=elastic-stack --rm -it -p 5601:5601 --name kibana

Remember that the --rm flag will delete the container once it is stopped.
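
The two docker run commands above can also be captured in a Compose file so both services start together on the shared network. The fragment below is a sketch only; the image names and 6.x tags are assumptions, not taken from this README:

```yaml
# Hypothetical docker-compose.yml mirroring the commands above.
# Image tags are an assumption; use the 6.x versions you actually need.
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.23
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - elastic-stack
  kibana:
    image: docker.elastic.co/kibana/kibana:6.8.23
    ports:
      - "5601:5601"
    networks:
      - elastic-stack
networks:
  elastic-stack:
```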


  • Crawl website
  • URL by command line parameters
  • JSON-LD extraction
  • Microdata extraction
  • Better file output
  • Sitemap.xml crawl option
  • Pagination option
  • Connecting to a flexible storage
  • RDFa extraction support
  • Writing the file as it scrapes