A simple script to collect slurp dumps for loadtest, #1473
with the help of PhantomJS
Have you been able to run a load-test using the output of this?
<VirtualHost localhost:8080>
# Turn on mod_pagespeed. To completely disable mod_pagespeed, you
# can set this to "off".
Probably can remove this and the other boilerplate comments from this config?
Done
echo "Usage: loadtest_collect/loadtest_collect_corpus.sh pages.txt out.tar.bz2"
}
set -u # exit the script if any variable is uninitialized
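A small illustration of what the set -u line buys, as a sketch: expanding a variable that was never set becomes a fatal error instead of silently producing an empty string, while the ${VAR:-default} form is still allowed. The function and variable names below are hypothetical and exist only for the demo; the subshell keeps the failure from killing the calling shell.

```shell
#!/bin/bash
# Demo: under `set -u`, expanding an unset variable aborts the (sub)shell.
# NOT_SET_ANYWHERE is a hypothetical name assumed to be unset.
demo_set_u() {
  # The parentheses run this in a subshell, so only the subshell exits.
  ( set -u; echo "value: $NOT_SET_ANYWHERE" ) 2>/dev/null
}

# Supplying a default with ${VAR:-...} is permitted even under `set -u`.
demo_default() {
  ( set -u; echo "value: ${NOT_SET_ANYWHERE:-<none>}" )
}
```

demo_set_u returns a non-zero status and prints nothing to stdout, because the expansion fails before echo ever runs.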
WDYT about an option to scrape the Alexa 500? Something like this is a start, but we need to do a bit more sedding to strip out the site name properly:
wget -q -O - www.alexa.com/topsites/global:0 | grep 'href="/siteinfo/'
The '0' in global:0 can be paged up to 19, giving you the Alexa 500.
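One way the extra sedding could look, as a rough sketch only. It assumes each listing entry contains a link of the form href="/siteinfo/<site>" (taken from the grep above) and that pages 0 through 19 each hold 25 sites; extract_sites and scrape_alexa_500 are hypothetical names. The parsing is kept in its own function so it can be checked against saved HTML without hitting the network.

```shell
#!/bin/bash
# Read topsites HTML on stdin and print one site name per line.
# ASSUMPTION: entries look like href="/siteinfo/<site>". Any markup change
# on the page breaks this, which is the brittleness discussed in the thread.
extract_sites() {
  grep -o 'href="/siteinfo/[^"]*"' \
    | sed -e 's|^href="/siteinfo/||' -e 's|"$||'
}

# Walk pages 0..19 of the global listing (20 pages x 25 sites = 500).
scrape_alexa_500() {
  local page
  for page in {0..19}; do
    wget -q -O - "www.alexa.com/topsites/global:${page}" | extract_sites
  done
}
```

Usage would be something like scrape_alexa_500 > pages.txt, producing a one-URL-per-line file. If the page links each site more than once, piping through awk '!seen[$0]++' would drop duplicates while keeping rank order.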
Seems brittle...
Agreed, and not required for this CL. I still think it'd be useful as a follow-up even if a "this is brittle" comment was put on it.
Can you make this install phantomjs if we don't have it yet?
Having some difficulty commenting on other things directly, so let's just do it manually:
Yep
For what platforms? Should it go through the fancy install script things?
Everything else with the dev environment assumes Ubuntu 14, so it's fine with me if you just assume that as well. At some point we may want to make the scripts under
- Clean up redundant comments in config
- Install phantomjs if needed
phantomjs install, config comments cleanup pushed.
LGTM
# some websites with the help of phantomjs.
function usage {
echo "Usage: loadtest_collect/loadtest_collect_corpus.sh pages.txt out.tar.bz2"
Add a comment that pages.txt is a file of URLs, one per line?
Done
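For reference, a sketch of how the documented usage and the argument check might fit together with set -u. The extra usage lines, check_args, and read_pages are hypothetical; the rule that blank lines and lines starting with '#' are skipped is an assumption, not something the script in this PR is known to do. Testing $# rather than "$1" keeps set -u from aborting before the usage message is printed.

```shell
#!/bin/bash
set -u

function usage {
  echo "Usage: loadtest_collect/loadtest_collect_corpus.sh pages.txt out.tar.bz2"
  echo "  pages.txt    plain-text file of URLs, one per line"
  echo "  out.tar.bz2  output archive"
}

# Hypothetical argument check: exactly two arguments are required.
check_args() {
  # Test $# instead of expanding $1/$2, which would trip `set -u` when missing.
  if [ $# -ne 2 ]; then
    usage >&2
    return 1
  fi
}

# Hypothetical reader: print each URL in the pages file, one per line.
# ASSUMPTION: blank lines and '#' comment lines are ignored.
read_pages() {
  local url=""
  # `|| [ -n "$url" ]` also handles a final line with no trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      '' | '#'*) continue ;;
    esac
    echo "$url"
  done < "$1"
}
```

In the script itself this would be used as check_args "$@" || exit 1, followed by something like read_pages "$1" feeding the per-page loop.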
fi
if [ ! $(which phantomjs) ]; then
sudo apt-get install phantomjs
What about echoing what you are doing first? If you run this on CentOS, I think it'll just fail with "apt-get: command not found", and the user won't know it's only trying to install phantomjs unless they debug the script. With the echo, a CentOS user could at least install it manually.
It's not quite that bad since you would get something like "sudo: apt-getttt: command not found"...
... but there is still the problem that sudo prompts for a password with no explanation, so I've added an informational message (or rather, I will have added one once I push this; it isn't pushed at the time of this comment).
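A sketch of the "say why before sudo prompts" idea, assuming an apt-based system as the thread agreed (Ubuntu 14). ensure_installed is a hypothetical helper, not the code that was pushed; it uses command -v, the POSIX way to test whether a command is on PATH, and bails out with a hint when apt-get itself is missing, which covers the CentOS case raised above. The -y flag is a choice made here so the install does not stop for confirmation.

```shell
#!/bin/bash
# Hypothetical helper: install package $2 via apt-get only if command $1 is missing.
ensure_installed() {
  local cmd="$1" pkg="$2"
  if command -v "$cmd" >/dev/null 2>&1; then
    return 0  # already installed; nothing to do, nothing printed
  fi
  if ! command -v apt-get >/dev/null 2>&1; then
    # Non-apt systems (e.g. CentOS): tell the user what to do instead.
    echo "$cmd not found and apt-get is unavailable; please install '$pkg' manually" >&2
    return 1
  fi
  # Explain the upcoming sudo password prompt before it appears.
  echo "$cmd not found; installing '$pkg' with sudo apt-get (you may be asked for a password)" >&2
  sudo apt-get install -y "$pkg"
}
```

In the collection script this would read ensure_installed phantomjs phantomjs || exit 1.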
(I am not a fan of the pattern of "echo foo" on one line followed by "foo" on the next, since it duplicates the 'foo'.)
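If the echo-then-run pattern is ever wanted without repeating the command text, a small wrapper can print its arguments and then execute them. This is a sketch; run_logged is a hypothetical name. The log goes to stderr so stdout stays reserved for the command's own output, and the wrapper returns the command's exit status. Note that "$*" flattens quoting in the log line, so run_logged echo "a b" is logged as + echo a b.

```shell
#!/bin/bash
# Hypothetical helper: log a command to stderr, then run it exactly as given.
run_logged() {
  echo "+ $*" >&2
  "$@"  # the function's exit status is the command's exit status
}
```

Example: run_logged sudo apt-get install phantomjs. Bash's set -x gives a similar trace, but for every command in the script rather than just the ones you choose.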
(PTAL at the tiny doc changes)
LGTM; ready to go in
(Somewhat derived from the siege helper scripts)