
New config & more thorough parser

1 parent 68af508 commit 2ef04a79c9666d6af5ce4253df9bdbc91450ca15 @jamesgpearce committed Dec 23, 2011
Showing with 162 additions and 53 deletions.
  1. +1 −0 .gitignore
  2. +68 −22 README.md
  3. +85 −31 confess.js
  4. +8 −0 config.json
.gitignore
@@ -0,0 +1 @@
+phantomjs
README.md
@@ -1,7 +1,7 @@
# confess.js
-A small script library that uses [PhantomJS 1.2](http://www.phantomjs.org/) to
-headlessly analyze web pages.
+A small script library that uses [PhantomJS 1.2](http://www.phantomjs.org/) (or
+later) to headlessly analyze web pages.
One useful application of this is to enumerate a web app's resources for the
purposes of creating a cache manifest file to make your apps run offline. So
@@ -17,17 +17,27 @@ For example...
# This manifest was created by confess.js, http://github.com/jamesgpearce/confess
#
- # Time: Fri Sep 02 2011 23:25:49 GMT-0700 (PDT)
- # Requested URL: http://functionsource.com
+ # Time: Fri Dec 23 2011 13:12:32 GMT-0800 (PST)
# Retrieved URL: http://functionsource.com/
- # User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) PhantomJS/1.2.0 Safari/533.3
+ # User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.4.0 ...
+ #
+ # Config:
+ # task: manifest
+ # userAgent: default
+ # wait: 0
+ # consolePrefix: #
+ # cacheFilter: .*
+ # networkFilter: null
+ # url: http://functionsource.com
+ # configFile: config.json
CACHE:
/images/icons/netflix.png
/javascripts/lib/legacy.js
/stylesheets/light.css
/stylesheets/screen.css
/stylesheets/syntax.css
+ http://functionscopedev.files.wordpress.com/2011/12/dabblet.png
http://functionsource.com/images/avatars/ben.png
http://functionsource.com/images/avatars/dion.png
http://functionsource.com/images/avatars/kevin.png
@@ -40,36 +50,42 @@ For example...
http://functionsource.com/images/icons/podcast.png
http://functionsource.com/images/icons/rss.png
http://functionsource.com/images/icons/twitter.png
- http://use.typekit.com/k/tqz3zpc-b.css
+ http://use.typekit.com/k/tqz3zpc-b.css?3bb2a6e53c9684f...
http://use.typekit.com/tqz3zpc.js
http://www.google-analytics.com/ga.js
NETWORK:
*
-You can also set the user-agent header of the request made by PhantomJS to
-request the page, in case you're serving mobile apps off similar entry-point
-URLs to your desktop content. It's the optional second parameter.
+Using the local config.json file, you can also change how confess.js runs.
+
+For example, you can set the user-agent header of the request made by PhantomJS
+to request the page, in case you're serving mobile apps off similar entry-point
+URLs to your desktop content.
+
+Similarly, you can use filters to indicate which files should be included or
+excluded from the generated CACHE list.
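The example manifest above now echoes the merged configuration in a `# Config:` comment block. The following is a minimal sketch of how such a block can be built from a config object, modeled on the loop in the manifest task's `post` function. `configHeader` is a hypothetical helper that returns the lines as an array instead of logging them directly.

```javascript
// Hypothetical helper modeled on the "# Config:" loop in confess.js.
// It returns the header lines so they can be inspected; the real script
// writes each line with console.log.
function configHeader(config) {
  var lines = ['#', '# Config:'];
  for (var key in config) {
    // String concatenation turns null into "null" and 0 into "0".
    lines.push('# ' + key + ': ' + config[key]);
  }
  return lines;
}

configHeader({task: 'manifest', wait: 0, networkFilter: null}).forEach(function (line) {
  console.log(line);
});
```

Because every line starts with `#`, the echoed settings are treated as comments by anything that parses the manifest.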
## Installation & usage
-The one and only dependency is [PhantomJS 1.2](http://www.phantomjs.org/).
-Install this, and ensure it's all good by trying out some of its example
-scripts.
+The one and only dependency is [PhantomJS](http://www.phantomjs.org/), version
+1.2 or later. Install this, and ensure it's all good by trying out some of its
+example scripts.
Then, assuming <code>phantomjs</code> is on your path, and from the directory
-containing <code>confess.js</code>, run the tasks with:
+containing <code>confess.js</code> and <code>config.json</code>, run the tasks
+with:
- > phantomjs confess.js URL [UA [TASK]]
+ > phantomjs confess.js URL [CONFIG]
-Where <code>URL</code> is mandatory, and points to the app you're analyzing.
-<code>UA</code> is the user-agent you'd like to use, and which defaults to
-PhantomJS' WebKit string. <code>TASK</code> is the type of analysis you'd like
-confess.js to perform, but right now it can only be <code>'manifest'</code>, the
-default.
+Where <code>URL</code> is mandatory, and points to the page or app you're
+analyzing. <code>CONFIG</code> is the location of an alternative configuration
+file, if you don't want to use the default <code>config.json</code>.
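The `URL [CONFIG]` contract is enforced by `utils.processArgs`. The following is a sketch of that logic with one assumption: the raw arguments are passed in as an array, whereas PhantomJS 1.x supplies them as `phantom.args`.

```javascript
// Sketch of confess.js's positional argument handling (utils.processArgs).
// Assumption: args is a plain array standing in for phantom.args.
function processArgs(config, contract, args) {
  var ok = true;
  contract.forEach(function (argument, a) {
    if (a < args.length) {
      // A value was given at this position on the command line.
      config[argument.name] = args[a];
    } else if (argument.req) {
      // A required argument is missing: report it and remember the failure.
      console.log('"' + argument.name + '" argument is required. This ' + argument.desc + '.');
      ok = false;
    } else {
      // An optional argument is missing: fall back to its default.
      config[argument.name] = argument.def;
    }
  });
  return ok;
}

var contract = [
  {name: 'url', def: 'http://google.com', req: true, desc: 'the URL of the app to cache'},
  {name: 'configFile', def: 'config.json', req: false, desc: 'a local configuration file of further confess settings'}
];

var cliConfig = {};
if (processArgs(cliConfig, contract, ['http://functionsource.com'])) {
  console.log(cliConfig); // { url: 'http://functionsource.com', configFile: 'config.json' }
}
```

The `ok` flag matters. The previous version returned `false` from inside the `forEach` callback, but that only ended the callback. It did not return from `processArgs`.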
-This loads the page, then searches the DOM (and the CSS) for references to any
-external resources that the app needs.
+This loads the page, then searches the DOM and the CSSOM (and then the results
+of applying the latter to the former) for references to any external resources
+that the app needs.
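The diff does not show the body of `tallyResource`, which collects the references found in the DOM and CSSOM. The manifest task iterates the result with `for...in`, which suggests an object keyed by URL, so each resource is listed once. The following is a hypothetical stand-in written under that assumption. The occurrence count is also an assumption.

```javascript
// Hypothetical tallyResource: the real body is elided from this diff.
// Assumption: resources is a plain object keyed by URL, so repeated
// references collapse into one key.
var resources = {};

function tallyResource(url) {
  // getAttribute() returns null for a missing attribute; skip it and "".
  if (!url) {
    return;
  }
  resources[url] = (resources[url] || 0) + 1;
}

['/stylesheets/screen.css', null, '/stylesheets/screen.css', '', '/images/icons/rss.png']
  .forEach(function (url) { tallyResource(url); });

for (var url in resources) {
  console.log(url); // each URL once, however often it was referenced
}
```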
The results go to stdout, but of course you can pipe it to a file. If you want
to create a cache manifest for an app, this might be called something like
@@ -88,4 +104,34 @@ manifest in the <code>html</code> element:
content type of <code>text/cache-manifest</code>.)
To check the resulting manifest's syntax, you might like to use Frederic
-Hemberger's great [cache manifest validator](http://manifest-validator.com/).
+Hemberger's great [cache manifest validator](http://manifest-validator.com/).
+
+## Configuration
+
+The following is the default <code>config.json</code> file. You can of course
+alter any of its values, or create a new config file of your own.
+
+ {
+ "task": "manifest",
+ "userAgent": "default",
+ "wait": 0,
+ "consolePrefix": "#",
+ "cacheFilter": ".*",
+ "networkFilter": null
+ }
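Values from this file are combined with the command-line arguments by `utils.mergeConfig`. The file provides the base, and the command-line values (`url`, `configFile`) are copied over it. The following sketch has one assumption: it takes the file's text as a string instead of reading it with PhantomJS's `fs` module, which the real function does after falling back to `config.json` if the named file does not exist.

```javascript
// Sketch of the precedence rule in utils.mergeConfig: file values first,
// then command-line values overwrite them key by key.
// Assumption: fileText is passed in directly rather than read via fs.read().
function mergeConfig(cliConfig, fileText) {
  var result = JSON.parse(fileText);
  for (var key in cliConfig) {
    result[key] = cliConfig[key];
  }
  return result;
}

var fileText = JSON.stringify({
  task: 'manifest', userAgent: 'default', wait: 0,
  consolePrefix: '#', cacheFilter: '.*', networkFilter: null
});

var merged = mergeConfig({url: 'http://functionsource.com', configFile: 'config.json'}, fileText);
console.log(Object.keys(merged).join(', '));
```

The file keys keep their order and the command-line keys come last. This is the same order as the `# Config:` lines in the example manifest.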
+
+The properties are defined as follows:
+
+ * <code>task</code> - the type of task you want confess.js to perform. "manifest" is the only supported value.
+
+ * <code>userAgent</code> - the user-agent to make the request as, or "default" to use PhantomJS' usual user-agent string.
+
+ * <code>wait</code> - the number of milliseconds to wait after the document has loaded before parsing for resources. This might be useful if you know that a deferred script might be making relevant additions to the DOM.
+
+ * <code>consolePrefix</code> - if set, confess.js will output the *browser's* console to the standard output. Useful for detecting if there are also any issues with the app's execution itself.
+
+ * <code>cacheFilter</code> - a regex to indicate which files to include in the <code>CACHE</code> block of the manifest. If set to <code>null</code>, none will. As a better example, "<code>\\.png$</code>" will indicate that only PNG files should be cached. (Note the double escaping: once for the regex, and once for the JSON.)
+
+ * <code>networkFilter</code> - a regex to indicate which files *not* to include in the <code>CACHE</code> block of the manifest, and which a browser will request from the network. If set to <code>null</code>, none will. Note that matching files will *not* be explicitly listed in the <code>NETWORK</code> block of the manifest, since there is always a catch-all <code>*</code> wildcard added.
+
+
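The two filters work together. A URL is listed under `CACHE` only if it matches `cacheFilter` and does not match `networkFilter`. A `null` filter is replaced by the pattern `(?!a)a`, which can never match. The following sketch shows that rule. `makeCacheTest` is a hypothetical wrapper around the two `RegExp` objects that the manifest task builds.

```javascript
// Sketch of the CACHE-inclusion rule from the manifest task.
// "(?!a)a" never matches: it asserts the next char is not "a", then requires "a".
var NEVER_MATCH = '(?!a)a';

// Hypothetical helper: builds the two regexes once and returns a predicate.
function makeCacheTest(config) {
  var cacheRegex = new RegExp(config.cacheFilter || NEVER_MATCH);
  var networkRegex = new RegExp(config.networkFilter || NEVER_MATCH);
  return function (url) {
    // No "g" flag, so test() carries no lastIndex state between calls.
    return cacheRegex.test(url) && !networkRegex.test(url);
  };
}

var isCached = makeCacheTest({cacheFilter: '.*', networkFilter: 'google-analytics\\.com'});
['/images/icons/netflix.png', '/stylesheets/screen.css', 'http://www.google-analytics.com/ga.js']
  .forEach(function (url) { console.log(url + ' -> ' + isCached(url)); });
```

With a `null` `cacheFilter`, the predicate is false for every URL, so the `CACHE` block stays empty and everything falls through to the `NETWORK` wildcard.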
confess.js
@@ -1,55 +1,70 @@
+var fs = require('fs');
var confess = {
run: function () {
- this.settings = {};
- if (!this.utils.processArgs(this.settings, [
+ var cliConfig = {};
+ if (!this.utils.processArgs(cliConfig, [
{name:'url', def:"http://google.com", req:true, desc:"the URL of the app to cache"},
- {name:'ua', def:"[default]", req:false, desc:"the user-agent used to request the app"},
- {name:'task', def:'manifest', req:false, desc:"the task to be performed (currently only 'manifest')"}
+ {name:'configFile', def:"config.json", req:false, desc:"a local configuration file of further confess settings"},
])) {
phantom.exit();
return;
}
+ this.config = this.utils.mergeConfig(cliConfig, cliConfig.configFile);
- var task = this[this.settings.task];
+ var task = this[this.config.task];
- this.utils.load(this.settings.url, this.settings.ua,
+ this.utils.load(
+ this.config,
task.pre,
task.post,
this
);
},
manifest: {
- pre: function (page) { },
- post: function (page, status) {
+ pre: function (page, config) { },
+ post: function (page, status, config) {
if (status!='success') {
console.log('# FAILED TO LOAD');
return;
}
+
+ var key, url,
+ neverMatch = "(?!a)a",
+ cacheRegex = new RegExp(config.cacheFilter || neverMatch),
+ networkRegex = new RegExp(config.networkFilter || neverMatch);
+
console.log('CACHE MANIFEST\n');
console.log('# This manifest was created by confess.js, http://github.com/jamesgpearce/confess');
console.log('#');
- console.log('# Time: ' + new Date());
- console.log('# Requested URL: ' + this.settings.url);
+ console.log('# Time: ' + new Date());
console.log('# Retrieved URL: ' + this.getFinalUrl(page));
- console.log('# User-agent: ' + page.settings.userAgent);
+ console.log('# User-agent: ' + page.settings.userAgent);
+ console.log('#');
+ console.log('# Config:');
+ for (key in config) {
+ console.log('# ' + key + ': ' + config[key]);
+ }
console.log('\nCACHE:');
for (url in this.getResourceUrls(page)) {
- console.log(url);
+ if (cacheRegex.test(url) && !networkRegex.test(url)) {
+ console.log(url);
+ }
};
console.log('\nNETWORK:\n*');
}
},
- getFinalUrl: function (page) {
+ getFinalUrl: function (page, config) {
return page.evaluate(function () {
return document.location.toString();
});
},
- getResourceUrls: function (page) {
+ getResourceUrls: function (page, status, config) {
+
return page.evaluate(function () {
var
// resources referenced in DOM
@@ -83,15 +98,18 @@ var confess = {
elements, elementsLength, e,
stylesheets, stylesheetsLength, s,
rules, rulesLength, r,
+ computed, computedLength, c,
value;
+ // attributes in DOM
selectors.forEach(function (selectorPair) {
elements = document.querySelectorAll(selectorPair[0]);
for (e = 0, elementsLength = elements.length; e < elementsLength; e++) {
tallyResource(elements[e].getAttribute(selectorPair[1]));
};
});
+ // URLs in stylesheets
stylesheets = document.styleSheets;
for (s = 0, stylesheetsLength = stylesheets.length; s < stylesheetsLength; s++) {
rules = stylesheets[s].rules;
@@ -107,52 +125,88 @@ var confess = {
};
};
+ // URLs in styles on DOM
+ elements = document.querySelectorAll('*');
+ for (e = 0, elementsLength = elements.length; e < elementsLength; e++) {
+ computed = elements[e].ownerDocument.defaultView.getComputedStyle(elements[e], '');
+ for (c = 0, computedLength = computed.length; c < computedLength; c++) {
+ value = computed.getPropertyCSSValue(computed[c]);
+ if (value && value.primitiveType == CSSPrimitiveValue.CSS_URI) {
+ tallyResource(value.getStringValue());
+ }
+ }
+ };
+
return resources;
});
},
-
utils: {
- load: function (url, ua, pre, post, scope) {
+ load: function (config, pre, post, scope) {
var page = new WebPage();
- page.onConsoleMessage = function (msg, line, src) {
- //console.log(msg + ' (' + src + ', #' + line + ')');
+ if (config.consolePrefix) {
+ page.onConsoleMessage = function (msg, line, src) {
+ console.log(config.consolePrefix + ' ' + msg + ' (' + src + ', #' + line + ')');
+ }
}
page.onLoadStarted = function () {
- pre.call(scope, page);
+ pre.call(scope, page, config);
};
page.onLoadFinished = function (status) {
- post.call(scope, page, status);
- phantom.exit();
+ if (config.wait) {
+ setTimeout(
+ function () {
+ post.call(scope, page, status, config);
+ phantom.exit();
+ },
+ config.wait
+ );
+ } else {
+ post.call(scope, page, status, config);
+ phantom.exit();
+ }
};
- if (ua != "[default]") {
- page.settings.userAgent = ua;
+ if (config.userAgent && config.userAgent != "default") {
+ page.settings.userAgent = config.userAgent;
}
- page.open(url);
+ page.open(config.url);
},
- processArgs: function (settings, contract) {
+ processArgs: function (config, contract) {
var a = 0;
+ var ok = true;
contract.forEach(function(argument) {
if (a < phantom.args.length) {
- settings[argument.name] = phantom.args[a];
+ config[argument.name] = phantom.args[a];
} else {
if (argument.req) {
console.log('"' + argument.name + '" argument is required. This ' + argument.desc + '.');
- return false;
+ ok = false;
+ } else {
+ config[argument.name] = argument.def;
}
- settings[argument.name] = (typeof argument.def==='function') ? argument.def.call(settings) : argument.def;
}
a++;
- return true;
});
- return (a > phantom.args.length);
+ return ok;
+ },
+
+ mergeConfig: function (config, configFile) {
+ if (!fs.exists(configFile)) {
+ configFile = "config.json";
+ }
+ var result = JSON.parse(fs.read(configFile)),
+ key;
+ for (key in config) {
+ result[key] = config[key];
+ }
+ return result;
}
}
}
-confess.run();
+confess.run();
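The new `wait` option changes when the post-load step runs inside `utils.load`. If `config.wait` is non-zero, the `post` callback and `phantom.exit()` are deferred by that many milliseconds. Otherwise they run immediately. The following is a Node sketch of that branching. `onLoadFinished` and `finish` are hypothetical names, with `finish` standing in for the `post.call(...)` / `phantom.exit()` pair.

```javascript
// Sketch of the wait handling added to utils.load.
// Assumption: finish() represents post.call(scope, page, status, config)
// followed by phantom.exit() in the real script.
function onLoadFinished(config, finish) {
  if (config.wait) {
    // Give deferred scripts time to modify the DOM before parsing.
    setTimeout(finish, config.wait);
  } else {
    finish();
  }
}

var order = [];
onLoadFinished({wait: 0}, function () { order.push('immediate'); });
onLoadFinished({wait: 50}, function () { order.push('deferred'); });
order.push('after scheduling');
// At this point order is ['immediate', 'after scheduling'];
// 'deferred' is appended roughly 50 ms later.
```

Because `0` is falsy, the default `"wait": 0` takes the synchronous path.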
config.json
@@ -0,0 +1,8 @@
+{
+ "task": "manifest",
+ "userAgent": "default",
+ "wait": 0,
+ "consolePrefix": "#",
+ "cacheFilter": "\\.png$",
+ "networkFilter": null
+}
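This commit's `config.json` sets `cacheFilter` to `"\\.png$"`, which shows the double escaping the README describes. JSON decoding removes one backslash, and the regex receives `\.png$`, which matches a literal dot. The following is a small demonstration. `String.raw` is used only so the JavaScript source shows the file text exactly as written.

```javascript
// The config text as it appears in config.json; String.raw keeps both backslashes.
var configText = String.raw`{"cacheFilter": "\\.png$", "networkFilter": null}`;
var config = JSON.parse(configText);

// JSON decoding has removed one level of escaping.
console.log(config.cacheFilter); // \.png$

var cacheRegex = new RegExp(config.cacheFilter);
console.log(cacheRegex.test('http://functionsource.com/images/avatars/ben.png')); // true
console.log(cacheRegex.test('/stylesheets/screen.css'));                          // false
console.log(cacheRegex.test('/images/benXpng'));                                  // false: "." is literal

// With single escaping, "\." is not a valid JSON escape, so parsing fails.
var singleEscaped = String.raw`{"cacheFilter": "\.png$"}`;
try {
  JSON.parse(singleEscaped);
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```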
