Marill -- Automated site testing utility.
Marill is an automated site testing utility, which is meant to make administrators lives easier by taking much of the leg-work out of testing. It's intended to be lightweight, flexible, and easy to use, while still being very powerful.
Table of Contents
- How does it work?
- Getting started
- Frequently Asked Questions
- Will it cause high load?
- How long does Marill take to crawl sites (e.g. 1,000 sites on a server)?
- Is it better to run Marill from inside of the server, or from a remote location?
- Can I give Marill a custom IP address for which to crawl a site (before it goes live and DNS is updated)?
- Can I give Marill a custom port for which to crawl a site?
- Can Marill crawl sub-domains and sub-folders?
Often times during server administration, migrations, and large server changes, things can and will go wrong. Servers are complex systems with many working parts, and with that comes a lot of breakage.
Creating an automated site testing utility, like Marill, allows:
- Less human interaction to test sites.
- Integration and flexibility to be built into other systems.
- Clients to be at ease; you know they won't test all of their sites.
- Administrators or developers to hate you less.
- You to be witty and say "time is money!".
Disclaimer: Marill is still in early development, and this list is subject to change drastically. (code, libraries, tools, cli-args, etc)
- Cross platform: Marill can compile across many platforms. linux, netbsd, openbsd, freebsd, and more. (Windows too, possibly!)
- Configurable output. Only output what you need.
- Many cli flags to configure input, output, what is tested, what isn't, etc.
- Ability to test cPanel based servers, Apache, Nginx (coming soon!), and
others! (any can be scanned with
- Flexible testing system. You can even write your own tests! Load them from a URL in JSON format, or from a directory! (see marill/tests)
There are a few limitations with Marill, due to how the utility was developed. Marill was meant to be lightweight, and portable. This means it cannot work exactly like a normal browser. Below are a few examples:
- Marill currently isn't able to take a screenshot of the site. However, there many other external resources for this. (usecase: pixel-by-pixel comparison -- easily tell if CSS is broken)
- Marill cannot and will not load certain resources. E.g. videos, iframes, embeds, ftp links, etc. This would make crawling the site very complex and convoluted. (however, embed plugins within things like wordpress could possibly be caught, if a test were to be written to search for bad tags)
- Marill cannot search through webserver, PHP, or other misc. logs to determine what the issue may be. This will likely never change, because adding this functionality would make the utility fragile and clunky. If there is an error, you should be able to find out what is causing it.
How does it work?
The general idea is that you place Marill on the server you would like to
test. Marill by default will then figure out the list of domains that server
is hosting. Marill will then begin to act much like a browser, crawling each
--assets is used).
It will then pass each resource it fetches through the list of builtin, or
external tests. Each domain is given a starting score of 10, and each test
has a pre-defined weight. If the test matches, that score is applied to the
main score. If the score falls below the minimum configured score, it is
Here are a few examples of tests that are useful:
- Visual PHP errors on the page. For example:
Warning: Invalid argument supplied for function() in /path/to/some/file.php
- Invalid status codes. For example,
Internal Service Error,
- Blank pages generated by PHP (common if PHP has
- MySQL or PostgreSQL related errors.
- cPanel "Sorry!" related pages (common if the incorrect IP is configured for example).
Check out the releases page for prebuilt versions. Below are example commands of how you would install the utility. Some of the more popular OS/distro steps are provided below, but there are more released versions on the releases page previously mentioned.
$ wget https://liam.sh/ghr/marill_0.1.1_linux_amd64.deb $ dpkg -i marill_0.1.1_linux_amd64.deb
$ yum localinstall https://liam.sh/ghr/marill_0.1.1_linux_amd64.rpm
Some older CentOS versions may require (if you get
Cannot open: <url>. Skipping.):
$ wget https://liam.sh/ghr/marill_0.1.1_linux_amd64.rpm $ yum localinstall marill_0.1.1_linux_amd64.rpm
$ wget https://liam.sh/ghr/marill_0.1.1_linux_amd64.tar.gz $ tar -C /usr/bin/ -xzvf marill_0.1.1_linux_amd64.tar.gz marill $ chmod +x /usr/bin/marill
Build From Source
Dependencies (to build from source only):
- Go (1.9 or greater, though latest
preferred). Ensure your
# you can "git clone" the repo too, just make sure it's following this directory # structure. $ go get -d -u github.com/lrstanley/marill $ cd $GOPATH/src/github.com/lrstanley/marill # this will show you all of the available options (to fetch dependencies, # run in debug mode, etc.) $ make help $ make $ ./marill --help
This is very likely to change quite a bit until we're out of beta. Please use wisely.
$ marill --help NAME: marill - Automated website testing utility USAGE: marill [global options] command [command options] [arguments...] VERSION: git revision XXXXXX AUTHOR(S): Liam Stanley <firstname.lastname@example.org> COMMANDS: scan [DEFAULT] Start scan for all domains on server urls, domains Print the list of urls as if they were going to be scanned tests Print the list of tests that are loaded and would be used help, h Shows a list of commands or help for one command GLOBAL OPTIONS: -d, --debug Print debugging information to stdout -q, --quiet Do not print regular stdout messages --no-color Do not print with color --no-banner Do not print the colorful banner --show-warnings Show a warning if one or more test failed, even if it didn't drop below min-score --exit-on-fail Send exit code 1 if any domains fail tests --log FILE Log information to FILE --debug-log FILE Log debugging information to FILE --result-file FILE Dump result template into FILE (will overwrite!) --no-updates Don't check to see if there are updates --threads n Use n threads to fetch data (0 defaults to server cores/2) (default: 0) --delay DURATION Delay DURATION before each resource is crawled (e.g. 5s, 1m, 100ms) (default: 0s) --http-timeout DURATION DURATION before an http request is timed out (e.g. 5s, 10s, 1m) (default: 10s) --domains DOMAIN:IP ... Manually specify list of domains to scan in form: DOMAIN:IP ..., or DOMAIN:IP:PORT --min-score value Minimum score for domain (default: 8) -a, --assets Crawl assets (css/js/images) for each page --ignore-success Only print results if they are considered failed --allow-insecure Don't check to see if an SSL certificate is valid --tmpl value Golang text/template string template for use with formatting scan output --json PATH Optional PATH to output json results to --json-pretty Used with [--json], pretty-prints the output json --ignore-http Ignore http-based URLs during domain search --ignore-https Ignore https-based URLs during domain search --ignore-remote Ignore all resources that resolve to a remote IP (use with --assets) --ignore-domains GLOB Ignore URLS during domain search that match GLOB, pipe separated list --match-domains GLOB Allow URLS during domain search that match GLOB, pipe separated list --ignore-test GLOB Ignore tests that match GLOB, pipe separated list --match-test GLOB Allow tests that match GLOB, pipe separated list --tests-url URL Import tests from a specified URL --tests-path PATH Import tests from a specified file-system PATH --ignore-std-tests Ignores all built-in tests (useful with --tests-url) --pass-text GLOB Give sites a +10 score if body matches GLOB --fail-text GLOB Give sites a -10 score if body matches GLOB --help, -h show help --version, -v print the version COPYRIGHT: (c) 2016 Liam Stanley
Getting started with Marill should be fairly easy. Since Marill is a single binary, there are no dependencies that are needed for the utility to run.
Head to this page and download the top item in the list. For example, using the latest version:
$ wget -q -O- https://release.liam.sh/marill/latest.tar.gz | tar -zx -C /root/tmp/
You should now see a file named marill in
/root/tmp/. Feel free to look
over the current flags and arguments:
$ /root/tmp/marill --help
The main arguments that may be useful are:
--debug: This will enable debugging. It doesn't provide a whole lot more information, but can help if something isn't working.
--delay: Utilize this if the load caused by the crawling is too high. E.g.
--threads: This is the amount of parallel scans that will run at a single time. By default it will be 1/2 the amount of cores on the server.
--match-domains: utilize these to skip or only scan certain domains during the crawl. E.g.
So, for example, to start off with:
$ /root/tmp/marill -a
cPanel/Apache based servers
Marill has out of the box support for cPanel based servers (though things like
For Apache, Marill will find the current running httpd instance, and run
<binary> -S, which pulls information about all virtual host entries. Note
that this isn't supported on all Apache versions (see
here for more information).
Alternatives (Nginx, Caddy, etc)
If your web server does not match the above description, you can utilize the manual domain list flag of Marill. The current syntax for this is as follows:
$ marill --domains "<items>"
Replace <items> with one of the following list of inputs:
DOMAIN can be any of one of the following examples:
So, to put it all together, you can do something like:
$ marill a --domains "somedomain.com:443 domain.com:1234 example.com:123.456.7.89:80 https://domain.com/"
Things to note/Troubleshooting
- If there are any problems or bugs, PLEASE LET ME KNOW! You can submit bugs if you have a Github account here or here if you do not
Will it cause high load?
- The general target at which this was written for are servers under
maintenance, or being ran on a new server that is being migrated to.
That being said, Marill does run scans in parallel. It will run scans
in parallel in the amount of cores divided by 2. (8 core server, 4
concurrent crawls, 2 core server, 1 crawl at a time). If you see Marill
still causing too much load, you can utilize
- The general target at which this was written for are servers under maintenance, or being ran on a new server that is being migrated to. That being said, Marill does run scans in parallel. It will run scans in parallel in the amount of cores divided by 2. (8 core server, 4 concurrent crawls, 2 core server, 1 crawl at a time). If you see Marill still causing too much load, you can utilize
How long does Marill take to crawl sites (e.g. 1,000 sites on a server)?
- Given a cPanel server, is must be noted that along with the input
(default http) version of a domain, the https version of the site will
be scanned as well if cPanel has a certificate for it. Furthermore, it
will also attempt to crawl www.domain.com, not just domain.com. As for
other webservers, it all depends on the input. Please note that
-a), that Marill will take longer. This is because this fetches all resources for each site being crawled. If you would like Marill to crawl faster, don't use
- Generally speaking, crawling without
-ais fairly fast.
- Given a cPanel server, is must be noted that along with the input (default http) version of a domain, the https version of the site will be scanned as well if cPanel has a certificate for it. Furthermore, it will also attempt to crawl www.domain.com, not just domain.com. As for other webservers, it all depends on the input. Please note that using
Is it better to run Marill from inside of the server, or from a remote location?
- Running remotely ensures there are no ip or firewall related issues, however in the same sense if you are crawling quite a few sites, many servers may assume due to the high connection count, that your connections are malicious.
- If ran from inside of the server, Marill can scan and determine what the server is hosting (by checking Apache, cPanel, etc).
Can I give Marill a custom IP address for which to crawl a site (beforeit goes live and DNS is updated)?
- Yes! For example, rather than
--domains "domain.com domain2.com", you can do something like:
$ marill --domains "domain.com:126.96.36.199 domain2.com:188.8.131.52"
- Also note that you can run scans on alternate ports:
$ marill --domains "domain.com:184.108.40.206:8080 domain.com:9000"
- Yes! For example, rather than
Can I give Marill a custom port for which to crawl a site?
- Yes! see FAQ #4
Can Marill crawl sub-domains and sub-folders?
- Yes! You can pass any url into
--domainsas necessary. For example:
$ marill --domains "https://domain.com/sub/folder/some-page"
- or with a custom ip as well:
$ marill --domains "https://domain.com/sub/folder/some-page:220.127.116.11"
- Yes! You can pass any url into
Please review the CONTRIBUTING doc for submitting issues/a guide on submitting pull requests and helping out.
LICENSE: The MIT License (MIT) Copyright (c) 2016 Liam Stanley <email@example.com> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.