
PhantomJS script

Sigit Dewanto edited this page Jun 14, 2017 · 4 revisions

Installation

  1. Install PhantomJS
  2. Install NodeJS and NPM
  3. Clone the Webdext repository: git clone git@github.com:seagatesoft/webdext.git
  4. Enter the Webdext directory and run npm install
  5. Run gulp build-phantom; the required files are built into the build directory
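The installation steps can be scripted roughly as follows. This is a minimal sketch and is not shipped with Webdext: the function names check_tools and build_webdext are hypothetical, steps 1 and 2 are only checked (not performed), and it assumes gulp is callable as gulp once the dependencies are installed. If gulp is only installed locally, you may need to invoke it differently.

```shell
#!/usr/bin/env bash
# Minimal sketch of installation steps 1-5 (not part of Webdext).
# Assumption: "gulp" is callable on PATH after "npm install".

# Print every command that is not on PATH; succeed only if none is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Steps 1-2 are verified here, not performed.
# Steps 3-5: clone, install npm dependencies, build the PhantomJS scripts.
# The body runs in a subshell so "cd" does not change the caller's directory,
# and the && chain stops at the first step that fails.
build_webdext() (
  check_tools phantomjs node npm git || exit 1
  git clone git@github.com:seagatesoft/webdext.git &&
    cd webdext &&
    npm install &&
    gulp build-phantom
)
```

Source the file, then run build_webdext from the directory that should contain the checkout; the built scripts should end up in webdext/build.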

Usage

Intelligent extraction

phantomjs intellextract.js <page_url> <output_path>

  • page_url: URL of the web page containing a list of data records
  • output_path: Path where the extraction result is stored (JSON format)
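To call intellextract.js from other scripts, a small wrapper that checks its arguments can help. This is a minimal sketch, assuming the built script sits at build/intellextract.js; WEBDEXT_BUILD_DIR (to override that location) and DRY_RUN=1 (to print the command instead of running it) are hypothetical switches. Because this page does not document the script's exit codes, the sketch also treats a missing or empty output file as a failure.

```shell
#!/usr/bin/env bash
# Minimal sketch of a checked call to intellextract.js.
# Hypothetical: WEBDEXT_BUILD_DIR (default "build") and DRY_RUN=1, which
# prints the command instead of running PhantomJS.

WEBDEXT_BUILD_DIR="${WEBDEXT_BUILD_DIR:-build}"

# usage: intellextract <page_url> <output_path>
intellextract() {
  if [ "$#" -ne 2 ]; then
    echo "usage: intellextract <page_url> <output_path>" >&2
    return 2
  fi
  local page_url="$1" output_path="$2"
  local script="$WEBDEXT_BUILD_DIR/intellextract.js"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "phantomjs $script $page_url $output_path"
    return 0
  fi
  phantomjs "$script" "$page_url" "$output_path" || return 1
  # Exit codes are not documented, so also require a non-empty result file.
  if [ ! -s "$output_path" ]; then
    echo "no result written to $output_path" >&2
    return 1
  fi
}
```

For example, DRY_RUN=1 intellextract https://example.com/list result.json prints phantomjs build/intellextract.js https://example.com/list result.json without contacting the page.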

Extraction using existing extractor (wrapper)

phantomjs wrapperextract.js <wrapper_path> <page_url> <output_path>

  • wrapper_path: Path to the wrapper file. You can create one using the Chrome extension
  • page_url: URL of the web page containing a list of data records
  • output_path: Path where the extraction result is stored (JSON format)
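Since a wrapper is stored in a file, it can presumably be reused on several pages that share the layout it was created for. The sketch below assumes that and loops over page URLs, writing one result file per page (page-1.json, page-2.json, and so on). The function name wrapperextract_batch, the build/ location, the output file naming, and the DRY_RUN switch are all hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: apply one wrapper to several pages, one JSON file per page.
# Hypothetical: WEBDEXT_BUILD_DIR (default "build"), the page-N.json naming,
# and DRY_RUN=1, which prints the commands instead of running PhantomJS.

WEBDEXT_BUILD_DIR="${WEBDEXT_BUILD_DIR:-build}"

# usage: wrapperextract_batch <wrapper_path> <output_dir> <page_url>...
wrapperextract_batch() {
  if [ "$#" -lt 3 ]; then
    echo "usage: wrapperextract_batch <wrapper_path> <output_dir> <page_url>..." >&2
    return 2
  fi
  local wrapper_path="$1" output_dir="$2"
  shift 2
  if [ ! -f "$wrapper_path" ]; then
    echo "wrapper file not found: $wrapper_path" >&2
    return 1
  fi
  mkdir -p "$output_dir" || return 1
  local script="$WEBDEXT_BUILD_DIR/wrapperextract.js"
  local i=0 failed=0 page_url output_path
  for page_url in "$@"; do
    i=$((i + 1))
    output_path="$output_dir/page-$i.json"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "phantomjs $script $wrapper_path $page_url $output_path"
      continue
    fi
    # Count a page as failed if PhantomJS fails or writes no output.
    if ! phantomjs "$script" "$wrapper_path" "$page_url" "$output_path" ||
        [ ! -s "$output_path" ]; then
      echo "extraction failed: $page_url" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every page produced a result.
  [ "$failed" -eq 0 ]
}
```

Failed pages are reported on stderr and the loop continues with the remaining URLs, so one unreachable page does not stop the whole batch.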