Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
PHP proxy that converts webpages to json objects, filtered by jquery-like selectors.
PHP
Branch: master
Failed to load latest commit information.
LICENSE minor
README.md Minor / typo
index.php minor
phpQuery-onefile.php initial commit
post.html clean up

README.md

json-anything

PHP proxy that converts webpages to json objects, filtered by jquery-like selectors (Built on top of PHPQuery).

Takes parameters from GET or POST.

Think of it as a 'shotgun api'. Because many sites still don't provide apis.

MIT LICENSE.

Parameters:

debug:

This is a debug flag.

If set, it will var_dump the results for improved readibility. If not, a json-encoded object is returned.

url:

The url to go scrape and make JSON from.

For complex urls already using GET (ie. some search engines) you can POST url instead.

EXAMPLE

?url=http://www.bom.gov.au/

sel:

Comma separated list of selectors, in a jQuery fashion

PHP can't parse # from a url as it's never sent to the server ('url fragments').

Because of this, use % instead of #.

You can, but are not forced to, use two underscores (__) for a space (so it doesn't look like %20)

Get attributes by using a '|'

By default it will return 'text()'

a        ---> will return all text of the A tags
a|html   ---> will return all html contents of the A tags
a|href   ---> will return all href attribute of the A tags
img|src  ---> will return all src attribute of the img tags

EXAMPLES

span.hey ---> will return all spans with class hey
span%hey ---> will return all spans with id hey (I know)
div%pad table:first a ---> div#pad table:first a

Complex Example using __ instead of space:

Within #results, will return the 4th column, 2nd span, b's html

%results__td:nth-child(4)__span:nth-child(2)__b|html

Real world examples

Latest currencies $ from Reuters

index.php?url=http://uk.reuters.com/business/currencies&sel=%currPairs__td:first-child__a,%currPairs__td:first-child__a|href,%currPairs__td:nth- child(2)

{
  "url":"http://uk.reuters.com/business/currencies",
  "sel":"#currPairs td:first-child a,#currPairs td:first-child a|href,#currPairs td:nth-child(2)",
  "results":[
    [
      "GBP/USD",
      "/business/currencies/quote?srcAmt=1&srcCurr=GBP&destAmt=&destCurr=USD",
      "1.6299"
    ],
    [
      "GBP/EUR",
      "/business/currencies/quote?srcAmt=1&srcCurr=GBP&destAmt=&destCurr=EUR",
      "1.1348"
    ],

    ...
  ]
}

Hacker News

There is an unofficial api but anyway, this works too.

http://localhost/json-anything/?url=http://news.ycombinator.com/&sel=.title__a:first-child,.title__a:first-child|href

{
  "url":"http://news.ycombinator.com/",
  "sel":".title a:first-child,.title a:first-child|href",
  "results":[
    [
      "Octopress - A blogging framework for hackers",
      "http://octopress.org/"
    ],
    [
      "NoSQL is What?",
      "http://blog.zawodny.com/2011/07/23/nosql-is-what/"
    ],
    [
      "Radio Shack to start stocking Arduino, Other Goodies",
      "http://blog.radioshack.com/2011/07/21/top-ten-diy-suggestions-from-you/"
    ],

    ...
  ]
}

Australia Bureau of Meteorology (bom)

The bom is known for not having decent a api.

Take a look at its home page http://www.bom.gov.au/ ... We're going to grab the Forecast as presented for the major cities.

http://localhost/json-anything/?url=http://www.bom.gov.au/&sel=%pad__table:first__a,%pad__table:first__.max,%pad__table:first__td:last-child

url=http://www.bom.gov.au/
%pad__table:first__a, (grab the #pad then the first table's A tags text)
%pad__table:first__.max, (the maximas, has class .max)
%pad__table:first__td:last-child (grab the last TD of the first table)

{
  "url":"http://www.bom.gov.au/",
  "sel":"#pad table:first a,#pad table:first .max,#pad table:first td:last-child",
  "results":[
    [
      "Sydney",
      "17\u00b0",
      "Possible shower clearing."
    ],
    [
      "Melbourne",
      "14\u00b0",
      "Shower or two."
    ],
    [
      "Brisbane",
      "23\u00b0",
      "Fine."
    ],

    ...
  ]
}

(\u00b0 is the unicode for the DEGREE sign)

Feel free to improve

Not much of this is perfect, it comes from flaws in its quirky design, also the PHP could be improved.

If you have some PHP experience in your legs and want to improve the code and/or add features, I'd gladly take changes (fork away!).

Something went wrong with that request. Please try again.