eea/eea.checklinks

eea.docker.checklinks

Simple dockerised Python application that takes a list of page URLs, extracts the links from each page, and checks every link's HTTP status code by sending a HEAD request to the host.

How to use it

It requires Docker Engine to be installed.

  1. git clone this repo
  2. cd <reponame>
  3. docker build -t link-checker .
  4. docker run -it --rm -v path-to-data-folder:/checklinks/app/data:z -e "EXCLUDE_LINKS=.europa.eu" --name my-running-linkchecker link-checker

"path-to-data-folder" is the path to a folder on your host that contains a file named urls-to-analyze.txt. The file must contain one page URL per line.
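As an illustration of the input format, here is a minimal Python sketch of reading such a file. The function name `read_urls` is hypothetical, and skipping blank lines is an assumption; the tool's actual parsing may differ.

```python
from pathlib import Path


def read_urls(path):
    """Return the page URLs listed in a text file, one per line.

    Assumption: surrounding whitespace is stripped and blank lines are ignored.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Inside the container the file would be expected at /checklinks/app/data/urls-to-analyze.txt, given the volume mount shown in step 4.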

The tool scans the HTML of each page and extracts the links it contains.
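The extraction step could look roughly like the sketch below, which uses only the Python standard library. The names `LinkExtractor` and `extract_links` are hypothetical, and limiting the scan to the `href` attribute of `<a>` tags is an assumption; the tool itself may use a different parser or collect other attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link URLs found in the given HTML string, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Resolving relative links against the page URL matters here, because a later HEAD request needs an absolute URL with a host.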

If the optional environment variable EXCLUDE_LINKS is passed, URLs containing that string will be skipped. This is useful if you want to extract and check only the external links of your site: in that case, pass EXCLUDE_LINKS=yourdomain.com.
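A minimal sketch of this filter, assuming a plain substring match as described above. The function name `filter_links` is hypothetical and not taken from the tool's source.

```python
import os


def filter_links(links, exclude=None):
    """Drop every link that contains the `exclude` substring.

    If `exclude` is empty or None, all links are kept.
    """
    if not exclude:
        return list(links)
    return [link for link in links if exclude not in link]


# Read the optional setting the same way the container receives it.
exclude = os.environ.get("EXCLUDE_LINKS")
```

With EXCLUDE_LINKS=.europa.eu, a link such as https://www.eea.europa.eu/a would be skipped, while https://example.org/b would still be checked.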

At the end, the tool reports the status code of each link (200, 301, 404, etc.).
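The status check can be sketched with the standard library's `http.client`, which sends a HEAD request and does not follow redirects, so a 301 is reported as 301. The function name `head_status`, the default timeout, and returning `None` on connection errors are all assumptions made for illustration; they do not describe the tool's actual implementation.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10.0):
    """Return the HTTP status code of a HEAD request, or None on failure."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_cls = http.client.HTTPConnection
    else:
        return None  # Not an HTTP(S) link (mailto:, ftp:, ...).
    if not parts.hostname:
        return None

    # Rebuild the request target from the path and query string.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

A HEAD request retrieves only the response headers, which keeps the check cheap. Some servers answer HEAD differently from GET, so a reported error code may occasionally need a manual check.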
