Skip to content
DOM Traversing and Scraping using GraphQL
Branch: master
Clone or download
Latest commit f832c91 Nov 20, 2017
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
examples Updated to Graphene 2.0. Updated version to 1.0.0 🎉 Nov 20, 2017
gdom Updated to Graphene 2.0. Updated version to 1.0.0 🎉 Nov 20, 2017
.gitignore First working version of GDOM 😄 Feb 25, 2016
LICENSE First working version of GDOM 😄 Feb 25, 2016
Procfile Use gunicorn as WSGI server in sample_app Feb 25, 2016
README.md
README.rst Improved HackerNews example. Use it by default Feb 26, 2016
requirements.txt Updated to Graphene 2.0. Updated version to 1.0.0 🎉 Nov 20, 2017
sample_app.py Simplified sample_app Feb 25, 2016
setup.py Updated download url Nov 20, 2017

README.md

GDOM

GDOM is the next generation of web-parsing, powered by GraphQL syntax and the Graphene framework.

Install it typing in your console:

pip install gdom

DEMO: Try GDOM online

Usage

You can either do gdom --test to start a test server for testing queries or

gdom QUERY_FILE

This command will write in the standard output (or other output if specified via --output) the resulting JSON.

Your QUERY_FILE could look similar to this:

{
  page(url:"http://news.ycombinator.com") {
    items: query(selector:"tr.athing") {
      rank: text(selector:"td span.rank")
      title: text(selector:"td.title a")
      sitebit: text(selector:"span.comhead a")
      url: attr(selector:"td.title a", name:"href")
      attrs: next {
         score: text(selector:"span.score")
         user: text(selector:"a:eq(0)")
         comments: text(selector:"a:eq(2)")
      }
    }
  }
}

Advanced usage

If you want to generalize your gdom query to any page, just rewrite your query file adding the $page var. So should look to something like this:

query ($page: String) {
  page(url:$page) {
    # ...
  }
}

And then, query it like:

gdom QUERY_FILE http://news.ycombinator.com
You can’t perform that action at this time.