iterationlabs edited this page Sep 13, 2010 · 2 revisions

A simple script, or “parselet”, looks like this:

{
  "title": "h1",
  "links(a)": [
    {
      "text": ".",
      "href": "@href"
    }
  ]
}

Running a parselet returns JSON or XML output with the same structure as the parselet itself. Applying this parselet to http://www.yelp.com/biz/amnesia-san-francisco yields either:

{
  "title": "Amnesia",
  "links": [
    {
      "href": "\/",
      "text": "Yelp"
    },
    {
      "href": "\/",
      "text": "Welcome"
    },
    {
      "href": "\/signup?return_url=%2Fuser_details",
      "text": " About Me"
    },
    .....
  ]
}

or equivalently:

<parsley:root>
  <title>Amnesia</title>
  <links>
    <parsley:group>
      <href>/</href>
      <text>Yelp</text>
    </parsley:group>
    <parsley:group>
      <href>/</href>
      <text>Welcome</text>
    </parsley:group>
    <parsley:group>
      <href>/signup?return_url=%2Fuser_details</href>
      <text> About Me</text>
    </parsley:group>
    .....
  </links>
</parsley:root>
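The correspondence between the two outputs can be sketched in a few lines of Python. This is only an illustration of the mapping visible in the examples above (object keys become elements, each array item is wrapped in `parsley:group`, everything sits under `parsley:root`); it is not parsley's own serializer, and the function names `to_parsley_xml` and `render` are made up for this sketch. Note that `\/` in the JSON output is simply an escaped `/`.

```python
from xml.sax.saxutils import escape


def to_parsley_xml(value, indent=1):
    """Return XML lines for parselet-style output (dicts, lists of dicts, strings)."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            if isinstance(child, (dict, list)):
                # Nested structure: open the element, recurse, close it.
                lines.append(f"{pad}<{key}>")
                lines.extend(to_parsley_xml(child, indent + 1))
                lines.append(f"{pad}</{key}>")
            else:
                # Scalar value: element with escaped text content.
                lines.append(f"{pad}<{key}>{escape(str(child))}</{key}>")
    elif isinstance(value, list):
        for item in value:
            # Each array item is wrapped in a group element, as in the sample output.
            lines.append(f"{pad}<parsley:group>")
            lines.extend(to_parsley_xml(item, indent + 1))
            lines.append(f"{pad}</parsley:group>")
    else:
        # Only the shapes that appear in the sample output are handled.
        raise TypeError(f"unsupported value in this sketch: {value!r}")
    return lines


def render(result):
    """Wrap the rendered lines in the root element."""
    return "\n".join(["<parsley:root>", *to_parsley_xml(result), "</parsley:root>"])


result = {
    "title": "Amnesia",
    "links": [
        {"href": "/", "text": "Yelp"},
        {"href": "/", "text": "Welcome"},
    ],
}
xml_text = render(result)
print(xml_text)
```

The printed text follows the same indentation and element layout as the XML sample above, truncated to the first two links.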

This parselet could also have been expressed as:

{
  "title": "h1",
  "links": [
    {
      "text": "a",
      "href": "a @href"
    }
  ]
}

The “a” in links(a) is a “key selector”: an explicit grouping (with scope) for the array. Each array item is evaluated relative to one node matched by the key selector, which is why "." and "@href" in the first parselet refer to the current link. You can use any XPath 1.0 or CSS3 expression as a value or a key selector; parsley tries to detect which of the two you are using. You can also use CSS selectors inside XPath functions: substring-after(h1>a, ':') is a valid expression.
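To make the scoping idea concrete, here is a minimal Python sketch that mimics explicit grouping on a tiny, hand-written XHTML fragment using only the standard library. It is an assumption-laden toy, not parsley's implementation or API: the names `split_key`, `evaluate` and `apply_parselet` are hypothetical, selectors are limited to bare tag names, and only the expressions ".", "@attr" and a tag name are understood.

```python
import re
import xml.etree.ElementTree as ET

# "links(a)" -> name "links", key selector "a"; "title" -> name "title", no selector.
KEY_RE = re.compile(r"^(?P<name>[^()]+)(?:\((?P<selector>.*)\))?$")


def split_key(key):
    """Split a parselet key into (name, key_selector_or_None)."""
    match = KEY_RE.match(key)
    return match.group("name"), match.group("selector")


def evaluate(expr, node):
    """Evaluate a tiny subset of expressions relative to `node`."""
    if expr == ".":
        # Text content of the current (scoped) node.
        return "".join(node.itertext())
    if expr.startswith("@"):
        # Attribute of the current (scoped) node.
        return node.get(expr[1:])
    # Otherwise treat the expression as a bare tag name (toy stand-in for a CSS selector).
    found = node.find(f".//{expr}")
    return None if found is None else "".join(found.itertext())


def apply_parselet(parselet, node):
    """Apply a parselet-like dict to an element tree node."""
    out = {}
    for key, value in parselet.items():
        name, selector = split_key(key)
        if isinstance(value, list):
            if selector is None:
                raise ValueError("this sketch only supports arrays with a key selector")
            # Explicit grouping: one array item per node matched by the key selector,
            # with the inner expressions evaluated in that node's scope.
            out[name] = [apply_parselet(value[0], scope) for scope in node.iter(selector)]
        else:
            out[name] = evaluate(value, node)
    return out


# Hypothetical miniature page, not the real Yelp markup.
page = ET.fromstring(
    "<html><body><h1>Amnesia</h1>"
    '<a href="/">Yelp</a><a href="/signup">About Me</a>'
    "</body></html>"
)
parselet = {"title": "h1", "links(a)": [{"text": ".", "href": "@href"}]}
result = apply_parselet(parselet, page)
print(result)
```

Because each `a` element becomes the scope for one array item, "." and "@href" never need to mention `a` themselves; that is the practical effect of the key selector.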