This repository has been archived by the owner on Feb 7, 2022. It is now read-only.

New config & more thorough parser
jamesgpearce committed Dec 23, 2011
1 parent 68af508 commit 2ef04a7
Showing 4 changed files with 162 additions and 53 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
+phantomjs
90 changes: 68 additions & 22 deletions README.md
@@ -1,7 +1,7 @@
 # confess.js

-A small script library that uses [PhantomJS 1.2](http://www.phantomjs.org/) to
-headlessly analyze web pages.
+A small script library that uses [PhantomJS 1.2](http://www.phantomjs.org/) (or
+later) to headlessly analyze web pages.

 One useful application of this is to enumerate a web app's resources for the
 purposes of creating a cache manifest file to make your apps run offline. So
@@ -17,17 +17,27 @@ For example...

 # This manifest was created by confess.js, http://github.com/jamesgpearce/confess
 #
-# Time: Fri Sep 02 2011 23:25:49 GMT-0700 (PDT)
-# Requested URL: http://functionsource.com
+# Time: Fri Dec 23 2011 13:12:32 GMT-0800 (PST)
 # Retrieved URL: http://functionsource.com/
-# User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) PhantomJS/1.2.0 Safari/533.3
+# User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.4.0 ...
+#
+# Config:
+# task: manifest
+# userAgent: default
+# wait: 0
+# consolePrefix: #
+# cacheFilter: .*
+# networkFilter: null
+# url: http://functionsource.com
+# configFile: config.json

 CACHE:
 /images/icons/netflix.png
 /javascripts/lib/legacy.js
 /stylesheets/light.css
 /stylesheets/screen.css
 /stylesheets/syntax.css
+http://functionscopedev.files.wordpress.com/2011/12/dabblet.png
 http://functionsource.com/images/avatars/ben.png
 http://functionsource.com/images/avatars/dion.png
 http://functionsource.com/images/avatars/kevin.png
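The new `# Config:` comment lines in this sample output are produced by a `for...in` loop over the merged configuration inside `manifest.post`. As a minimal sketch, assuming a plain object shaped like the configuration in the sample (the helper name `formatConfigComments` is hypothetical), the same lines can be rebuilt like this, which also shows why `networkFilter` prints as the text `null` and why `url` and `configFile` come last:

```javascript
// Minimal sketch (not confess.js itself): rebuild the "# Config:" comment
// block the way manifest.post does, by concatenating each key and value.
// formatConfigComments is a hypothetical helper name.
function formatConfigComments(config) {
    var lines = ['#', '# Config:'];
    // In modern engines, for...in visits string keys in insertion order, so
    // keys read from config.json come first and keys copied in afterwards
    // from the command line (url, configFile) come last.
    for (var key in config) {
        // String concatenation turns a null value into the text "null".
        lines.push('# ' + key + ': ' + config[key]);
    }
    return lines;
}

// Example: a merged config shaped like the one in the sample output.
var sample = {
    task: 'manifest',
    userAgent: 'default',
    wait: 0,
    consolePrefix: '#',
    cacheFilter: '.*',
    networkFilter: null,
    url: 'http://functionsource.com',
    configFile: 'config.json'
};
console.log(formatConfigComments(sample).join('\n'));
```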
@@ -40,36 +50,42 @@ For example...
 http://functionsource.com/images/icons/podcast.png
 http://functionsource.com/images/icons/rss.png
 http://functionsource.com/images/icons/twitter.png
-http://use.typekit.com/k/tqz3zpc-b.css
+http://use.typekit.com/k/tqz3zpc-b.css?3bb2a6e53c9684f...
 http://use.typekit.com/tqz3zpc.js
 http://www.google-analytics.com/ga.js

 NETWORK:
 *

-You can also set the user-agent header of the request made by PhantomJS to
-request the page, in case you're serving mobile apps off similar entry-point
-URLs to your desktop content. It's the optional second parameter.
+Using the local config.json file, you can also affect the behavior of the way in
+which confess.js runs.
+
+For example, you can set the user-agent header of the request made by PhantomJS
+to request the page, in case you're serving mobile apps off similar entry-point
+URLs to your desktop content.
+
+Similarly, you can use filters to indicate which files should be included or
+excluded from the generated CACHE list.

 ## Installation & usage

-The one and only dependency is [PhantomJS 1.2](http://www.phantomjs.org/).
-Install this, and ensure it's all good by trying out some of its example
-scripts.
+The one and only dependency is [PhantomJS](http://www.phantomjs.org/), version
+1.2 or later. Install this, and ensure it's all good by trying out some of its
+example scripts.

 Then, assuming <code>phantomjs</code> is on your path, and from the directory
-containing <code>confess.js</code>, run the tasks with:
+containing <code>confess.js</code> and <code>config.json</code>, run the tasks
+with:

-> phantomjs confess.js URL [UA [TASK]]
+> phantomjs confess.js URL [CONFIG]

-Where <code>URL</code> is mandatory, and points to the app you're analyzing.
-<code>UA</code> is the user-agent you'd like to use, and which defaults to
-PhantomJS' WebKit string. <code>TASK</code> is the type of analysis you'd like
-confess.js to perform, but right now it can only be <code>'manifest'</code>, the
-default.
+Where <code>URL</code> is mandatory, and points to the page or app you're
+analyzing. <code>CONFIG</code> is the location of an alternative configuration
+file, if you don't want to use the default <code>config.json</code>.

-This loads the page, then searches the DOM (and the CSS) for references to any
-external resources that the app needs.
+This loads the page, then searches the DOM and the CSSOM (and then the results
+of applying the latter to the former) for references to any external resources
+that the app needs.

 The results go to stdout, but of course you can pipe it to a file. If you want
 to create a cache manifest for an app, this might be called something like
@@ -88,4 +104,34 @@ manifest in the <code>html</code> element:
 content type of <code>text/cache-manifest</code>.)

 To check the resulting manifest's syntax, you might like to use Frederic
 Hemberger's great [cache manifest validator](http://manifest-validator.com/).
+
+## Configuration
+
+The following is the default <code>config.json</code> file, but you can of
+course alter any of the values in this file, or create a new config file of your own.
+
+{
+  "task": "manifest",
+  "userAgent": "default",
+  "wait": 0,
+  "consolePrefix": "#",
+  "cacheFilter": ".*",
+  "networkFilter": null
+}
+
+The properties are defined as follows:
+
+* <code>task</code> - the type of task you want confess.js to perform. "manifest" is the only supported value.
+
+* <code>userAgent</code> - the user-agent to make the request as, or "default" to use Phantom's usual user-agent string.
+
+* <code>wait</code> - the number of milliseconds to wait after the document has loaded before parsing for resources. This might be useful if you know that a deferred script might be making relevant additions to the DOM.
+
+* <code>consolePrefix</code> - if set, confess.js will output the *browser's* console to the standard output. Useful for detecting if there are also any issues with the app's execution itself.
+
+* <code>cacheFilter</code> - a regex to indicate which files to include in the <code>CACHE</code> block of the manifest. If set to <code>null</code>, none will. For example, "<code>\\.png$</code>" will indicate that only PNG files should be cached. (Note the double escaping: once for the regex, and once for the JSON.)
+
+* <code>networkFilter</code> - a regex to indicate which files *not* to include in the <code>CACHE</code> block of the manifest, and which a browser will request from the network. If set to <code>null</code>, none will. Note that matching files will *not* be explicitly listed in the <code>NETWORK</code> block of the manifest, since there is always a catch-all <code>*</code> wildcard added.
+
+
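The two filter properties combine in `manifest.post`: a URL is listed under `CACHE:` only if it matches `cacheFilter` and does not match `networkFilter`, and a `null` (or otherwise empty) filter is replaced by the never-matching pattern `"(?!a)a"`. The sketch below pulls that test out into a pure function so it can be tried without PhantomJS; the helper name `selectCacheUrls` and the sample URL list are assumptions for illustration (confess.js gathers the URLs from the loaded page):

```javascript
// Minimal sketch of the CACHE-block filtering in confess.js's manifest.post.
// selectCacheUrls is a hypothetical helper; confess.js does this inline
// while iterating over the resources found in the page.
function selectCacheUrls(urls, config) {
    // "(?!a)a" can never match: it requires an "a" that is not an "a".
    var neverMatch = '(?!a)a';
    var cacheRegex = new RegExp(config.cacheFilter || neverMatch);
    var networkRegex = new RegExp(config.networkFilter || neverMatch);
    return urls.filter(function (url) {
        return cacheRegex.test(url) && !networkRegex.test(url);
    });
}

var found = [
    '/images/icons/netflix.png',
    '/stylesheets/screen.css',
    'http://www.google-analytics.com/ga.js'
];

// Only PNG files are cached.
console.log(selectCacheUrls(found, { cacheFilter: '\\.png$', networkFilter: null }));
// Everything except the analytics script is cached.
console.log(selectCacheUrls(found, { cacheFilter: '.*', networkFilter: 'google-analytics' }));
```

The regexes are built without the `g` flag, so repeated `test()` calls do not carry state between URLs through `lastIndex`.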
116 changes: 85 additions & 31 deletions confess.js
@@ -1,55 +1,70 @@
+var fs = require('fs');
 var confess = {

     run: function () {

-        this.settings = {};
-        if (!this.utils.processArgs(this.settings, [
+        var cliConfig = {};
+        if (!this.utils.processArgs(cliConfig, [
             {name:'url', def:"http://google.com", req:true, desc:"the URL of the app to cache"},
-            {name:'ua', def:"[default]", req:false, desc:"the user-agent used to request the app"},
-            {name:'task', def:'manifest', req:false, desc:"the task to be performed (currently only 'manifest')"}
+            {name:'configFile', def:"config.json", req:false, desc:"a local configuration file of further confess settings"},
         ])) {
             phantom.exit();
             return;
         }
+        this.config = this.utils.mergeConfig(cliConfig, cliConfig.configFile);

-        var task = this[this.settings.task];
+        var task = this[this.config.task];

-        this.utils.load(this.settings.url, this.settings.ua,
+        this.utils.load(
+            this.config,
             task.pre,
             task.post,
             this
         );
     },

     manifest: {
-        pre: function (page) { },
-        post: function (page, status) {
+        pre: function (page, config) { },
+        post: function (page, status, config) {
             if (status!='success') {
                 console.log('# FAILED TO LOAD');
                 return;
             }
+
+            var key, url,
+                neverMatch = "(?!a)a",
+                cacheRegex = new RegExp(config.cacheFilter || neverMatch),
+                networkRegex = new RegExp(config.networkFilter || neverMatch);
+
             console.log('CACHE MANIFEST\n');
             console.log('# This manifest was created by confess.js, http://github.com/jamesgpearce/confess');
             console.log('#');
             console.log('# Time: ' + new Date());
-            console.log('# Requested URL: ' + this.settings.url);
             console.log('# Retrieved URL: ' + this.getFinalUrl(page));
             console.log('# User-agent: ' + page.settings.userAgent);
+            console.log('#');
+            console.log('# Config:');
+            for (key in config) {
+                console.log('# ' + key + ': ' + config[key]);
+            }
             console.log('\nCACHE:');
             for (url in this.getResourceUrls(page)) {
-                console.log(url);
+                if (cacheRegex.test(url) && !networkRegex.test(url)) {
+                    console.log(url);
+                }
             };
             console.log('\nNETWORK:\n*');
         }
     },

-    getFinalUrl: function (page) {
+    getFinalUrl: function (page, config) {
         return page.evaluate(function () {
             return document.location.toString();
         });
     },

-    getResourceUrls: function (page) {
+    getResourceUrls: function (page, status, config) {
+
         return page.evaluate(function () {
             var
                 // resources referenced in DOM
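In `run()`, the command-line arguments are matched by position against a small contract: `url` is required and `configFile` defaults to `"config.json"`. That contract is consumed by `utils.processArgs`, which this commit changes from `return false` inside the `forEach` callback to an `ok` flag; a `return` inside a `forEach` callback only leaves that callback, not the enclosing function. The sketch below is a Node-runnable approximation, assuming the argument list is passed in explicitly instead of being read from `phantom.args`; the name `matchArgs` is hypothetical, and it uses the `forEach` index in place of the manual counter `a`:

```javascript
// Minimal sketch of positional argument matching, modelled on
// confess.js's utils.processArgs after this commit. matchArgs is a
// hypothetical name, and `args` stands in for phantom.args.
function matchArgs(config, contract, args) {
    var ok = true;
    // The forEach index plays the role of confess.js's counter `a`.
    contract.forEach(function (argument, a) {
        if (a < args.length) {
            config[argument.name] = args[a];
        } else if (argument.req) {
            console.log('"' + argument.name + '" argument is required. This ' + argument.desc + '.');
            // A bare `return false` here would only exit this callback,
            // so the failure is recorded in a flag instead.
            ok = false;
        } else {
            config[argument.name] = argument.def;
        }
    });
    return ok;
}

var contract = [
    { name: 'url', def: 'http://google.com', req: true, desc: 'the URL of the app to cache' },
    { name: 'configFile', def: 'config.json', req: false, desc: 'a local configuration file of further confess settings' }
];

var cli = {};
console.log(matchArgs(cli, contract, ['http://functionsource.com']), cli);
```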
@@ -83,15 +98,18 @@ var confess = {
                 elements, elementsLength, e,
                 stylesheets, stylesheetsLength, s,
                 rules, rulesLength, r,
+                computed, computedLength, c,
                 value;

+            // attributes in DOM
             selectors.forEach(function (selectorPair) {
                 elements = document.querySelectorAll(selectorPair[0]);
                 for (e = 0, elementsLength = elements.length; e < elementsLength; e++) {
                     tallyResource(elements[e].getAttribute(selectorPair[1]));
                 };
             });

+            // URLs in stylesheets
             stylesheets = document.styleSheets;
             for (s = 0, stylesheetsLength = stylesheets.length; s < stylesheetsLength; s++) {
                 rules = stylesheets[s].rules;
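confess.js finds stylesheet URLs through the CSSOM inside the page (the rules in `document.styleSheets`, and after this commit also computed styles via `getPropertyCSSValue` and `CSSPrimitiveValue.CSS_URI`), and those APIs only exist in a browser context. For experimenting outside PhantomJS, the sketch below swaps in a different and much cruder technique: a regex scan of raw CSS text for `url(...)` references. It is not what confess.js does; the helper name `extractCssUrls` is hypothetical, and it ignores comments and escaped characters and does not resolve relative URLs:

```javascript
// Rough sketch, NOT how confess.js works: confess.js reads URLs from the
// CSSOM inside the page, whereas this scans raw CSS text with a regex.
// extractCssUrls is a hypothetical helper. It ignores CSS comments and
// escapes, and does not resolve relative URLs against the stylesheet.
function extractCssUrls(cssText) {
    // url( optional-quote  body  same-quote )
    var pattern = /url\(\s*(['"]?)([^'"()\s]+)\1\s*\)/g;
    var seen = {};
    var match;
    while ((match = pattern.exec(cssText)) !== null) {
        // Record each URL once, keyed by the URL itself.
        seen[match[2]] = true;
    }
    return Object.keys(seen);
}

var css = 'body { background: url("/images/bg.png"); }\n' +
          '.logo { background-image: url(/images/icons/rss.png); }\n' +
          '.again { background: url(\'/images/bg.png\') no-repeat; }';
console.log(extractCssUrls(css));
```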
@@ -107,52 +125,88 @@ var confess = {
                 };
             };

+            // URLs in styles on DOM
+            elements = document.querySelectorAll('*');
+            for (e = 0, elementsLength = elements.length; e < elementsLength; e++) {
+                computed = elements[e].ownerDocument.defaultView.getComputedStyle(elements[e], '');
+                for (c = 0, computedLength = computed.length; c < computedLength; c++) {
+                    value = computed.getPropertyCSSValue(computed[c]);
+                    if (value && value.primitiveType == CSSPrimitiveValue.CSS_URI) {
+                        tallyResource(value.getStringValue());
+                    }
+                }
+            };
+
             return resources;
         });
     },


-
     utils: {

-        load: function (url, ua, pre, post, scope) {
+        load: function (config, pre, post, scope) {
             var page = new WebPage();
-            page.onConsoleMessage = function (msg, line, src) {
-                //console.log(msg + ' (' + src + ', #' + line + ')');
+            if (config.consolePrefix) {
+                page.onConsoleMessage = function (msg, line, src) {
+                    console.log(config.consolePrefix + ' ' + msg + ' (' + src + ', #' + line + ')');
+                }
             }
             page.onLoadStarted = function () {
-                pre.call(scope, page);
+                pre.call(scope, page, config);
             };
             page.onLoadFinished = function (status) {
-                post.call(scope, page, status);
-                phantom.exit();
+                if (config.wait) {
+                    setTimeout(
+                        function () {
+                            post.call(scope, page, status, config);
+                            phantom.exit();
+                        },
+                        config.wait
+                    );
+                } else {
+                    post.call(scope, page, status, config);
+                    phantom.exit();
+                }
             };
-            if (ua != "[default]") {
-                page.settings.userAgent = ua;
+            if (config.userAgent && config.userAgent != "default") {
+                page.settings.userAgent = config.userAgent;
             }
-            page.open(url);
+            page.open(config.url);
         },

-        processArgs: function (settings, contract) {
+        processArgs: function (config, contract) {
             var a = 0;
+            var ok = true;
             contract.forEach(function(argument) {
                 if (a < phantom.args.length) {
-                    settings[argument.name] = phantom.args[a];
+                    config[argument.name] = phantom.args[a];
                 } else {
                     if (argument.req) {
                         console.log('"' + argument.name + '" argument is required. This ' + argument.desc + '.');
-                        return false;
+                        ok = false;
+                    } else {
+                        config[argument.name] = argument.def;
                     }
-                    settings[argument.name] = (typeof argument.def==='function') ? argument.def.call(settings) : argument.def;
                 }
                 a++;
-                return true;
             });
-            return (a > phantom.args.length);
+            return ok;
+        },
+
+        mergeConfig: function (config, configFile) {
+            if (!fs.exists(configFile)) {
+                configFile = "config.json";
+            }
+            var result = JSON.parse(fs.read(configFile)),
+                key;
+            for (key in config) {
+                result[key] = config[key];
+            }
+            return result;
         }

     }

 }

 confess.run();
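After this commit, `utils.load` no longer runs the post-load task as soon as `onLoadFinished` fires: when `config.wait` is non-zero it defers the task (and the call to `phantom.exit()`) with `setTimeout`, giving deferred scripts a chance to change the DOM first. The sketch below reproduces just that control flow so it can run in Node; the page object and `phantom.exit` are replaced by plain callbacks, and `finishLoad` and its parameters are hypothetical names:

```javascript
// Minimal sketch of the onLoadFinished logic in confess.js's utils.load,
// with the PhantomJS pieces (the page object, phantom.exit) swapped for
// plain callbacks. finishLoad and its parameters are hypothetical.
function finishLoad(config, status, post, exit) {
    function run() {
        post(status, config);
        exit();
    }
    if (config.wait) {
        // Give deferred scripts time to change the DOM before parsing it.
        setTimeout(run, config.wait);
    } else {
        // A wait of 0 (the default) or a missing value: run immediately.
        run();
    }
}

var start = Date.now();
finishLoad(
    { wait: 50 },
    'success',
    function (status) {
        console.log('post ran with status ' + status + ' after ~' + (Date.now() - start) + 'ms');
    },
    function () {
        console.log('exit');
    }
);
```

As in confess.js, exiting happens inside the same callback as the task, so the process cannot quit before the delayed parse has run.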
8 changes: 8 additions & 0 deletions config.json
@@ -0,0 +1,8 @@
+{
+    "task": "manifest",
+    "userAgent": "default",
+    "wait": 0,
+    "consolePrefix": "#",
+    "cacheFilter": "\\.png$",
+    "networkFilter": null
+}
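`utils.mergeConfig` parses this file and then copies the command-line values over it, so `url` and `configFile` are appended after the file's keys and any command-line value wins on a clash. JSON decoding also explains the double escaping: the file text `"\\.png$"` becomes the regex source `\.png$`. Note that this config.json sets `cacheFilter` to `"\\.png$"`, whereas the README's copy of the default file shows `".*"`. The sketch below assumes the file contents are already available as a string (confess.js reads them with PhantomJS's `fs.read` and falls back to `config.json` when `fs.exists` fails, which is omitted here); the name `mergeConfigText` is hypothetical:

```javascript
// Minimal sketch of confess.js's utils.mergeConfig, taking the file
// contents as a string instead of reading them with PhantomJS's fs
// module. mergeConfigText is a hypothetical name.
function mergeConfigText(cliConfig, fileText) {
    var result = JSON.parse(fileText);
    // Command-line values are copied last, so they override file values.
    for (var key in cliConfig) {
        result[key] = cliConfig[key];
    }
    return result;
}

// The config.json from this commit, as raw file text. String.raw keeps
// the two backslashes exactly as they appear in the file.
var fileText = String.raw`{
    "task": "manifest",
    "userAgent": "default",
    "wait": 0,
    "consolePrefix": "#",
    "cacheFilter": "\\.png$",
    "networkFilter": null
}`;

var config = mergeConfigText(
    { url: 'http://functionsource.com', configFile: 'config.json' },
    fileText
);

// JSON decoding leaves a single backslash, i.e. the regex \.png$
console.log(config.cacheFilter);
console.log(new RegExp(config.cacheFilter).test('/images/icons/rss.png'));
console.log(new RegExp(config.cacheFilter).test('/stylesheets/screen.css'));
```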
