How to use the LinkScannerCommand

Mirko Pagliai edited this page May 8, 2020 · 4 revisions

LinkScannerCommand is a command for scanning links.

For help on the command, run it with the option --help:

$ bin/cake link_scanner --help
Performs a complete scan

Usage:
cake link_scanner [options]

Options:

--export, -e                   Export results. The filename will be
                               generated automatically
--export-only-bad-results      Only negative results will be exported
                               (status code 400 or 500). This allows you
                               to save space for exported files
--export-with-filename         Export results. You must pass a relative
                               or absolute path
--follow-redirects             Follows redirect
--force, -f                    Force mode: removes the lock file and
                               does not ask questions
--full-base-url                Full base url. By default, the
                               `App.fullBaseUrl` value will be used
--help, -h                     Display this help.
--max-depth, -d                Maximum depth of the scan. Default: 0
--no-cache                     Disables the cache
--no-external-links            Disable the scanning of external links
--quiet, -q                    Enable quiet output.
--timeout, -t                  Timeout in seconds for GET requests.
                               Default: 30
--verbose, -v                  Enable verbose output

The following command runs an example scan:

$ bin/cake link_scanner -d 2 --no-cache -e -v -f -t 3 --full-base-url http://google.com

Explanation of options:

  • the scan will have a maximum depth of 2 levels (-d 2);
  • the scan will not use the cache (--no-cache);
  • the results will be exported to a file, with an automatically generated filename (-e);
  • verbose output is enabled (-v);
  • force mode is used: the lock file is removed and no questions are asked (-f);
  • each GET request has a timeout of 3 seconds (-t 3);
  • the scan will start from http://google.com (--full-base-url http://google.com).
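
The same options can be combined in other ways. As a minimal sketch, the helper below prints the command line for a quiet scan of internal links only, exporting just the negative results. The function name `scan_command` and the URL `http://localhost` are hypothetical and not part of the plugin. Passing `-e` together with `--export-only-bad-results` is an assumption, since the help text does not say whether the latter triggers an export on its own.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): prints a link_scanner
# command line built only from options listed in the --help output.
scan_command() {
    base_url=$1
    depth=${2:-0}   # the help output gives 0 as the default maximum depth
    # -e plus --export-only-bad-results is an assumption, see the text above
    printf 'bin/cake link_scanner -d %s -e --export-only-bad-results --no-external-links -q --full-base-url %s\n' \
        "$depth" "$base_url"
}

# Print the command for a scan of http://localhost, two levels deep
scan_command http://localhost 2
```

Printing the command instead of running it makes it easy to review the options first, for example before adding the command to a scheduled job.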

Output:

-------------------------------------------------------------------------------  
Scan started for http://google.com at 2019-03-01 16:05:49  
-------------------------------------------------------------------------------  
The cache is disabled  
Force mode is enabled  
Scanning of external links is enabled  
Redirects will not be followed  
Maximum depth of the scan: 2  
Timeout in seconds for GET requests: 30  
-------------------------------------------------------------------------------  
Checking http://google.com ...OK  
Link found: http://google.it/imghp?hl=it&tab=wi  
Checking http://google.it/imghp?hl=it&tab=wi ...OK  
Link found: http://maps.google.it/maps?hl=it&tab=wl  
Checking http://maps.google.it/maps?hl=it&tab=wl ...OK  
Link found: https://play.google.com/?hl=it&tab=w8  
Checking https://play.google.com/?hl=it&tab=w8 ...OK  
Link found: http://youtube.com/?gl=IT&tab=w1  
Checking http://youtube.com/?gl=IT&tab=w1 ...301  
Link found: http://news.google.it/nwshp?hl=it&tab=wn  
Checking http://news.google.it/nwshp?hl=it&tab=wn ...301  
Link found: https://mail.google.com/mail/?tab=wm  
Checking https://mail.google.com/mail/?tab=wm ...OK  
Link found: https://drive.google.com/?tab=wo  
Checking https://drive.google.com/?tab=wo ...OK  
Link found: https://google.it/intl/it/about/products?tab=wh  
Checking https://google.it/intl/it/about/products?tab=wh ...302  
Link found: http://google.it/history/optout?hl=it  
Checking http://google.it/history/optout?hl=it ...302  
Link found: http://google.com/preferences?hl=it  
Checking http://google.com/preferences?hl=it ...301  
Link found: https://accounts.google.com/ServiceLogin?hl=it&passive=true&continue=http://www.google.com/  
Checking https://accounts.google.com/ServiceLogin?hl=it&passive=true&continue=http://www.google.com/ ...OK  
Link found: http://google.com/advanced_search?hl=it&authuser=0  
Checking http://google.com/advanced_search?hl=it&authuser=0 ...301  
Link found: http://google.com/language_tools?hl=it&authuser=0  
Checking http://google.com/language_tools?hl=it&authuser=0 ...301  
Link found: http://google.com/intl/it/ads/  
Checking http://google.com/intl/it/ads/ ...301  
Link found: http://google.it/intl/it/services/  
Checking http://google.it/intl/it/services/ ...OK  
Link found: http://google.com/intl/it/about.html  
Checking http://google.com/intl/it/about.html ...301  
Link found: http://google.com/setprefdomain?prefdom=IT&prev=http://www.google.it/&sig=K_uHUsA6Be7Q8qMIY4byVHjtH5f00%3D  
Checking http://google.com/setprefdomain?prefdom=IT&prev=http://www.google.it/&sig=K_uHUsA6Be7Q8qMIY4byVHjtH5f00%3D ...404  
Link found: http://google.com/intl/it/policies/privacy/  
Checking http://google.com/intl/it/policies/privacy/ ...OK  
Link found: http://google.com/intl/it/policies/terms/  
Checking http://google.com/intl/it/policies/terms/ ...OK  
Link found: http://google.com/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png  
Checking http://google.com/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png ...404  
-------------------------------------------------------------------------------  
Scan completed at 2019-03-01 16:06:01  
Elapsed time: 12 seconds  
Total scanned links: 21  
-------------------------------------------------------------------------------  
Results have been exported to /home/mirko/Server/mirkopagliai/tmp/results_google.com_1551456349
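
A console log like the one above can be summarised by result. The sketch below assumes the output was saved to a file (the name `scan.log` is hypothetical) and that every checked link is printed as `Checking <url> ...<result>`, as in the sample. The function `summarise_scan` is not part of the plugin, and the log format may differ between versions or verbosity levels.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): counts the results in a
# saved link_scanner console log, one line per distinct result.
summarise_scan() {
    awk '/^Checking / {
        result = $NF                 # last field, e.g. "...OK" or "...404"
        sub(/^\.\.\./, "", result)   # strip the three leading dots
        count[result]++
    }
    END { for (r in count) print r, count[r] }' "$1" | sort
}

# Usage, assuming the console output was redirected to scan.log:
#   summarise_scan scan.log
```

Each output line holds a result (`OK` or a status code) followed by the number of links that returned it, which makes it quick to see how many links need attention.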