GM_safeHTMLParser

supahgreg edited this page · 7 revisions

Description

This function safely parses a string of HTML and returns an XMLDocument. It cleans the provided HTML by removing elements such as <script>, <style>, <head>, <body>, <title> and <iframe>, and also removes all JavaScript (including element attributes containing JavaScript).

Arguments

String HTMLString

A string of HTML.

String BaseURL

Optional. If specified (and valid), this value is used as the base for resolving partial URLs (e.g. /images/foo.png). If omitted, content that uses partial URLs might be excluded from the returned XMLDocument.
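
To illustrate what resolving a partial URL against a base means, here is a minimal sketch using the standard URL constructor. It does not call GM_safeHTMLParser and makes no claim about how that function is implemented; the helper name resolveAgainstBase and the base URL http://erikvold.com/blog/ are assumptions chosen for illustration.

```javascript
// Hypothetical helper (not part of the GM_* API): resolves a partial URL
// against a base URL with the standard URL constructor.
function resolveAgainstBase(partialUrl, baseUrl) {
  try {
    return new URL(partialUrl, baseUrl).href;
  } catch (e) {
    // Without a valid base, a partial URL cannot be resolved.
    return null;
  }
}

// Assumed base URL, for illustration only.
const withBase = resolveAgainstBase("/images/foo.png", "http://erikvold.com/blog/");
// No base URL supplied.
const withoutBase = resolveAgainstBase("/images/foo.png", undefined);

console.log(withBase);
console.log(withoutBase);
```

With a base, the partial URL becomes absolute; without one, nothing usable can be produced, which is consistent with such content possibly being dropped when BaseURL is omitted.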

Returns

XMLDocument xmlDoc

An XML document (in the XHTML namespace) representing the parsed HTML.

Note: Certain uses of the returned XML document (e.g. as a context node for an XPath query) will require the use of a namespace resolver. An example of this situation has been provided below.
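
As a rough sketch of what a namespace resolver looks like outside of GM_xpath, the following uses the standard DOM evaluate method. The names XHTML_NS, xhtmlResolver and firstMatch are hypothetical, the prefix "x" is an arbitrary choice, and doc is assumed to be a document like the one GM_safeHTMLParser returns.

```javascript
// The XHTML namespace URI of the returned document.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Namespace resolver: maps the arbitrary prefix "x" to the XHTML namespace.
function xhtmlResolver(prefix) {
  return prefix === "x" ? XHTML_NS : null;
}

// Hypothetical helper: returns the first node matching `path`, or null.
// `doc` is assumed to expose the standard DOM `evaluate` method.
function firstMatch(doc, path) {
  // Same numeric value as XPathResult.FIRST_ORDERED_NODE_TYPE.
  const FIRST_ORDERED_NODE_TYPE = 9;
  const result = doc.evaluate(path, doc, xhtmlResolver, FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}
```

A call such as firstMatch(doc, "//x:html") would then look up the root element. An unprefixed path such as "//html" would not match, because unprefixed names in XPath 1.0 refer to elements in no namespace.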

Example

// GET erikvold.com
GM_xmlhttpRequest({
  method: "GET",
  url: "http://erikvold.com/",
  onload: function(response) {
    // Parse the response into an XML document
    var doc = GM_safeHTMLParser(response.responseText);
    // Look up an element by ID and, if it exists, display its content in an alert
    var card = doc.getElementById("hcard-Erik-Vergobbi-Vold");
    if (card) {
      alert(card.innerHTML);
    }
  }
});
// GET google.com
GM_xmlhttpRequest({
  method: "GET",
  url: "http://google.com/",
  onload: function(response) {
    // Parse the response into an XML document
    var doc = GM_safeHTMLParser(response.responseText);
    // Get the HTML element using `GM_xpath`; the "x" prefix is mapped to the XHTML namespace by the resolver
    var htmlEle = GM_xpath({
      path: "//x:html",
      node: doc,
      resolver: "http://www.w3.org/1999/xhtml"
    });
    // Log the element's String representation
    if (htmlEle) {
      GM_log("The HTML element was found! " + htmlEle);
    } else {
      GM_log("The HTML element was not found...");
    }
  }
});

Related Pages

Manual: API
