PhantomCurl

Python wrapper around the PhantomJS headless browser for advanced web scraping. In addition to the page content, it can record all requests and responses made by the web page and collect the content of all its iframes.

If used as a command line tool, it returns data in JSON format.

Installation

PhantomCurl is a wrapper around PhantomJS, so first install PhantomJS from the project's page.

The PhantomJS binary should be visible system-wide:

which phantomjs

If the binary is not visible system-wide, you should set the environment variable PHANTOMJS_BIN to point to the PhantomJS binary.
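For example (the path below is only an illustration; point it at your own PhantomJS binary):

export PHANTOMJS_BIN=/usr/local/bin/phantomjs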

Now, build and install the Python egg:

make && make install

Command line tool

You can use the script as a command line tool with:

python -mphantomcurl --help

The tool prints its results in JSON format.
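For example, to fetch a page and save the JSON output (the positional URL argument is an assumption here; run --help to confirm the actual options):

python -mphantomcurl http://example.com > result.json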

Returned values

fetch() returns a dictionary with the following fields (a short usage sketch follows the list):

url             - URL fed to the fetch function
requests        - all requests captured
responses       - all responses captured
content         - content of the web page
timestamps      - [start, end], in seconds
version         - version of the JS script
command_line    - command line arguments passed to the JS script
frames          - iframes found on the page; `frames` can contain other frames recursively
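A minimal usage sketch in Python. The import path and constructor are assumptions (check the package's own code for the exact entry point); only fetch() and the fields listed above are documented here:

# Minimal sketch -- the import path and constructor are assumptions;
# only fetch() and its returned fields are documented above.
from phantomcurl import PhantomCurl  # hypothetical entry point

pc = PhantomCurl()
result = pc.fetch('http://example.com')

print(result['url'])             # URL fed to fetch()
print(len(result['requests']))   # number of captured requests
print(len(result['responses']))  # number of captured responses
print(result['timestamps'])      # [start, end], in seconds
print(result['content'][:200])   # first 200 characters of the page content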

IFrames inspection

The script allows deep iframe inspection (the -f option). For each iframe it reports its src, id, and content. It then checks each frame for nested iframes and reports them recursively.
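A small sketch of walking the nested frames structure, using the result dictionary from the usage sketch above. It assumes each frame entry is a dict with src and a nested frames list, as described above; verify the exact keys against the JSON output:

# Recursively print the src of every iframe found on the page.
def walk_frames(frames, depth=0):
    for frame in frames or []:
        print('  ' * depth + (frame.get('src') or '<no src>'))
        walk_frames(frame.get('frames'), depth + 1)

walk_frames(result.get('frames'))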
