Python Diffbot API Client

Preface

Identify and extract the important parts of any web page in Python! This client currently supports calls to Diffbot's Automatic APIs and Crawlbot.

Installation

To install, activate a new virtual environment and run the following command:

$ pip install -r requirements.txt

Configuration

To run the example, you must first configure a working API token in config.py:

$ cp config.py.example config.py; vim config.py;

Then replace the string "SOME_TOKEN" with your API token. Finally, to run the example:

$ python example.py

Usage

Article API

An example call to the Article API:

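# Assumes client.py from this repository is on the import path
from client import DiffbotClient

diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://shichuan.github.io/javascript-patterns/"
api = "article"
response = diffbot.request(url, token, api, version=version)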
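
For orientation, here is a rough sketch of how a call like this could map onto an HTTP request URL. It assumes Diffbot's endpoint pattern of `api.diffbot.com/v<version>/<api>` with `token` and `url` query parameters. The helper name `build_request_url` is hypothetical and is not part of this client, whose internals may differ.

```python
from urllib.parse import urlencode


def build_request_url(url, token, api, version=2):
    """Build a Diffbot-style request URL (illustrative helper, not part of client.py)."""
    # Assumed endpoint layout: http://api.diffbot.com/v<version>/<api>
    base = "http://api.diffbot.com/v{}/{}".format(version, api)
    # urlencode percent-encodes the target URL so it survives as a query value
    query = urlencode({"token": token, "url": url})
    return "{}?{}".format(base, query)


request_url = build_request_url(
    "http://shichuan.github.io/javascript-patterns/", "SOME_TOKEN", "article"
)
print(request_url)
```

The target page URL has to be percent-encoded because it travels as a query parameter value inside another URL.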

Product API

An example call to the Product API:

diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://www.overstock.com/Home-Garden/iRobot-650-Roomba-Vacuuming-Robot/7886009/product.html"
api = "product"
response = diffbot.request(url, token, api, version=version)

Image API

An example call to the Image API:

diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://www.google.com/"
api = "image"
response = diffbot.request(url, token, api, version=version)

Analyze API

An example call to the Analyze API:

diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://www.twitter.com/"
api = "analyze"
response = diffbot.request(url, token, api, version=version)
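
The calls above all return a response that is worth checking before use. The sketch below assumes that `request` returns the decoded JSON as a dict and that failed calls carry `errorCode` and `error` keys. Both points are assumptions to verify against the client and the Diffbot documentation. The helper `check_response` and the sample payloads are made up for illustration.

```python
def check_response(response):
    """Return the response dict, or raise if it looks like an error payload."""
    # Assumption: failed calls carry "errorCode" and "error" keys in the JSON body
    if "errorCode" in response:
        raise RuntimeError(
            "Diffbot error {}: {}".format(response["errorCode"], response.get("error"))
        )
    return response


# Made-up payloads for illustration only, not real API output
ok = check_response({"type": "article", "title": "Example"})

try:
    check_response({"errorCode": 401, "error": "Not authorized API token."})
    failed = False
except RuntimeError as exc:
    failed = True
    message = str(exc)
```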

Crawlbot API

To start a new crawl, specify a crawl name, seed URLs, and the API via which URLs should be processed. An example call to the Crawlbot API:

token = "SOME_TOKEN"
name = "sampleCrawlName"
seeds = "http://www.twitter.com/"
api = "analyze"
sampleCrawl = DiffbotCrawl(token, name, seeds=seeds, api=api)

Omit "seeds" and "api" to load an existing crawl, or create a crawl as a placeholder.
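
The example above passes a single seed URL as a string. If a crawl needs several seeds, one possible approach is to join them into a single whitespace-separated string, which is how the Crawlbot API is understood to accept multiple seeds. Treat that as an assumption to confirm. The helper `join_seeds` is hypothetical and not part of this client.

```python
def join_seeds(seed_urls):
    """Combine seed URLs into one whitespace-separated string (illustrative helper)."""
    # Strip stray whitespace and drop empty entries before joining
    cleaned = [u.strip() for u in seed_urls if u and u.strip()]
    return " ".join(cleaned)


seeds = join_seeds(["http://www.twitter.com/", " http://www.diffbot.com/ ", ""])
print(seeds)
```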

To check the status of a crawl:

sampleCrawl.status()
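
Since a crawl runs for a while, the status is usually checked repeatedly. Below is a minimal polling sketch. The shape of the value returned by `status()` is not documented here, so the example uses a stand-in class and a hypothetical `done` field. The completion check should be adapted to whatever the real call returns.

```python
import time


def wait_for(check, interval=30.0, max_checks=20, sleep=time.sleep):
    """Call check() until it returns True or max_checks is reached."""
    for attempt in range(max_checks):
        if check():
            return True
        # Pause between checks, but not after the final one
        if attempt < max_checks - 1:
            sleep(interval)
    return False


class FakeCrawl:
    """Stand-in for a real crawl: reports completion on the third status check."""

    def __init__(self):
        self.calls = 0

    def status(self):
        self.calls += 1
        return {"done": self.calls >= 3}  # "done" is a hypothetical field name


crawl = FakeCrawl()
finished = wait_for(
    lambda: crawl.status()["done"], interval=0, max_checks=5, sleep=lambda s: None
)
```

Capping the number of checks keeps a stalled crawl from blocking the caller forever.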

To update a crawl:

maxToCrawl = 100
upp = "diffbot"
sampleCrawl.update(maxToCrawl=maxToCrawl, urlProcessPattern=upp)

To delete or restart a crawl:

sampleCrawl.delete()
sampleCrawl.restart()

To download crawl data:

sampleCrawl.download() # returns JSON by default
sampleCrawl.download(data_format="csv")
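
Downloaded data can then be written to disk. The return types of `download()` are not stated here, so this sketch handles three cases: bytes, text (as CSV output might be), and already-parsed JSON. The helper `save_crawl_data` and the sample record are hypothetical.

```python
import json
import os
import tempfile


def save_crawl_data(data, path):
    """Write crawl data to path: bytes and text as-is, anything else as JSON."""
    if isinstance(data, bytes):
        with open(path, "wb") as fh:
            fh.write(data)
    elif isinstance(data, str):
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(data)
    else:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)


# Made-up record for illustration only
out_dir = tempfile.mkdtemp()
json_path = os.path.join(out_dir, "crawl.json")
save_crawl_data([{"title": "Example"}], json_path)

with open(json_path, encoding="utf-8") as fh:
    loaded = json.load(fh)
```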

To pass additional arguments to a crawl:

sampleCrawl = DiffbotCrawl(token, name, seeds, api, maxToCrawl=100, maxToProcess=50, notifyEmail="support@diffbot.com")

Testing

First install the test requirements with the following command:

$ pip install -r test_requirements.txt

The current unit tests mock the API calls and return data from fixtures on the filesystem. From the project directory, run:

$ nosetests
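
To illustrate the mocking approach, here is a self-contained sketch using `unittest.mock` from the standard library. It does not reproduce the project's actual tests. `SketchClient`, its `_fetch_json` method, and the fixture data are all hypothetical stand-ins.

```python
import unittest
from unittest import mock


class SketchClient:
    """Hypothetical client: request() delegates network access to _fetch_json()."""

    def _fetch_json(self, endpoint, params):
        raise RuntimeError("network access is not expected in unit tests")

    def request(self, url, token, api, version=2):
        endpoint = "http://api.diffbot.com/v{}/{}".format(version, api)
        return self._fetch_json(endpoint, {"token": token, "url": url})


class ArticleTest(unittest.TestCase):
    def test_request_returns_fixture(self):
        fixture = {"type": "article", "title": "Example"}  # made-up fixture data
        client = SketchClient()
        # Replace the network layer so the test never leaves the process
        with mock.patch.object(client, "_fetch_json", return_value=fixture) as fake:
            response = client.request("http://example.com/", "SOME_TOKEN", "article")
        self.assertEqual(response["title"], "Example")
        fake.assert_called_once_with(
            "http://api.diffbot.com/v2/article",
            {"token": "SOME_TOKEN", "url": "http://example.com/"},
        )


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArticleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Patching only the network layer leaves the request-building logic under test while keeping the run offline and repeatable.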
