This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

A simple script to collect slurp dumps for loadtest, #1473

Merged
merged 3 commits into from Jan 20, 2017


morlovich
Contributor

with the help of PhantomJS

(Somewhat derived from the siege helper scripts)

Contributor

@jmarantz jmarantz left a comment

Have you been able to run a load-test using the output of this?


<VirtualHost localhost:8080>
# Turn on mod_pagespeed. To completely disable mod_pagespeed, you
# can set this to "off".
Contributor

probably can remove this and other boilerplate comments from this config?

Contributor Author

Done

echo "Usage: loadtest_collect/loadtest_collect_corpus.sh pages.txt out.tar.bz2"
}

set -u # exit the script if any variable is uninitialized
Contributor

WDYT about an option to scrape the Alexa 500? Something like this is a start, but we need to do a bit more sedding to strip out the site name properly:

wget -q -O - www.alexa.com/topsites/global:0 | grep 'href="/siteinfo/'

The '0' in global:0 can be paged up to 19, giving you the alexa 500.

Contributor Author

Seems brittle...

Contributor

Agreed, and not required for this CL. I still think it'd be useful as a follow-up even if a "this is brittle" comment was put on it.
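
For illustration, the extra sed step mentioned in this thread might look like the sketch below. It is not part of the merged change. The helper name `extract_sites` is hypothetical, and the link shape `href="/siteinfo/<site>"` is assumed only from the grep pattern quoted above, so it is exactly as brittle as the thread suggests.

```shell
# Hypothetical helper: read HTML on stdin and print one site name per line.
# Assumes links of the form href="/siteinfo/<site>"; that markup is an
# assumption taken from the grep pattern above, not a verified page format.
extract_sites() {
  grep -o 'href="/siteinfo/[^"]*"' |
    sed -e 's|^href="/siteinfo/||' -e 's|"$||' |
    awk '!seen[$0]++'   # drop duplicates while keeping the original order
}

# Possible use against a saved copy of one listing page (file name assumed):
# extract_sites < topsites_page0.html | sed 's|^|http://|' >> pages.txt
```

The `awk` step deduplicates without sorting, so the ranking order of the input is preserved.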

Contributor

@jeffkaufman jeffkaufman left a comment

Can you make this install phantomjs if we don't have it yet?

@morlovich
Contributor Author

Having some difficulty commenting on other things directly, so let's just do it manually:

> Have you been able to run a load-test using the output of this?

Yep

> Can you make this install phantomjs if we don't have it yet?

For what platforms? Should it go through the fancy install script things?

@jeffkaufman
Contributor

> For what platforms? Should it go through the fancy install script things?

Everything else with the dev environment assumes ubuntu 14, so it's fine with me if you just assume that as well. At some point we may want to make the scripts under devel/ support other distros, but for now I think people should just plan on using a VM.

- Cleanup redundant comments in config
- Install phantomjs if needed
@morlovich
Contributor Author

phantomjs install, config comments cleanup pushed.

Contributor

@jeffkaufman jeffkaufman left a comment

LGTM

# some websites with the help of phantomjs.

function usage {
echo "Usage: loadtest_collect/loadtest_collect_corpus.sh pages.txt out.tar.bz2"
Contributor

add comment that pages.txt is a file of URLs, one per line?

Contributor Author

Done
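
As a rough illustration of the input format discussed here (pages.txt as a file of URLs, one per line), a reader for such a file could be sketched as follows. The function name `read_urls` is hypothetical, and skipping blank lines and `#` comments is an assumption of this sketch, not something the merged script is known to do.

```shell
# Hypothetical helper: print the URLs listed in the file given as $1,
# one per line. Blank lines and lines starting with '#' are skipped
# (an assumption of this sketch).
read_urls() {
  # The '|| [ -n "$url" ]' keeps a final line that lacks a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      ''|'#'*) continue ;;
    esac
    printf '%s\n' "$url"
  done < "$1"
}

# Possible use:
# read_urls pages.txt | while IFS= read -r page; do echo "would fetch $page"; done
```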

fi

if [ ! $(which phantomjs) ]; then
sudo apt-get install phantomjs
Contributor

How about echoing what you are doing first? If you run this from CentOS I think it'll just fail saying "apt-get command not found", and the user won't know it's just trying to install phantom without debugging the script. With the echo at least the CentOS user could just install it manually.

Contributor Author

It's not quite that bad since you would get something like "sudo: apt-getttt: command not found"...
... but there is still a problem in that it just has sudo asking for a password for no good reason, so I've added an informational message. (or rather will have added once I push this, not at time of this comment)

(I am not a fan of the echo foo \n foo pattern since it duplicates the 'foo')
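
One way to get the informational message without writing the install command twice is sketched below. This is not the code that was merged: the helper name `ensure_installed` is hypothetical, and it simply prints whatever install command it is handed before running it.

```shell
# Hypothetical helper: if the tool named by $1 is missing, announce the
# install command (the remaining arguments) and then run it. The command
# is written once by the caller and reused for both the message and the run.
ensure_installed() {
  tool=$1
  shift
  if command -v "$tool" >/dev/null 2>&1; then
    return 0   # already available, nothing to do
  fi
  echo "$tool not found; trying to install it with: $*" >&2
  "$@"
}

# Possible use on the Ubuntu setup that the dev environment assumes:
# ensure_installed phantomjs sudo apt-get install phantomjs
```

On a system without apt-get, the message printed before the failure tells the user which package to install by hand, and it explains any sudo password prompt.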

@morlovich
Contributor Author

(PTAL at the tiny doc changes)

@jeffkaufman
Contributor

> (PTAL at the tiny doc changes)

LGTM; ready to go in

@morlovich morlovich merged commit 478fb85 into master Jan 20, 2017
@morlovich morlovich deleted the morlovich-loadtest-collect-corpus branch January 20, 2017 21:17