
wappalyzer-python -- UNMAINTAINED

Python wrapper for Wappalyzer (a utility that uncovers the technologies used on websites)

Warning: this package is not maintained anymore.

Scrapinghub and Javier Casas, the original author, have no plans to support wappalyzer-python in the foreseeable future (this includes fixing bugs and supporting upgraded dependencies such as PyV8).

If you are interested in continuing the work, please get in touch via opensource@scrapinghub.com so that we can discuss transferring ownership of this repository.

How to use it

>>> from wappalyzer import Wappalyzer
>>> w = Wappalyzer()

>>> w.analyze('http://wikipedia.org')
{u'Apache': {u'confidence': 100, u'version': u'', u'categories': [u'web-servers']},
u'Varnish': {u'confidence': 100, u'version': u'', u'categories': [u'cache-tools']}}

>>> w.analyze('http://tripadvisor.com')
{u'Apache': {u'confidence': 100, u'version': u'', u'categories': [u'web-servers']},
u'Google Analytics': {u'confidence': 100, u'version': u'', u'categories': [u'analytics']},
u'comScore': {u'confidence': 100, u'version': u'', u'categories': [u'analytics']}}

>>> w.analyze('http://facebook.com')
{u'reCAPTCHA': {u'confidence': 100, u'version': u'', u'categories': [u'captchas']}}
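
The result of `analyze` is a plain dict keyed by application name, so it can be post-processed with ordinary Python. The following is a minimal sketch, assuming the result shape shown above. `group_by_category` is a hypothetical helper and not part of wappalyzer-python; the sample dict is copied from the tripadvisor.com output above rather than produced by a live call.

```python
from collections import defaultdict


def group_by_category(result):
    """Map each category name to a sorted list of detected app names.

    `result` is assumed to have the shape returned by w.analyze() above:
    {app_name: {'confidence': int, 'version': str, 'categories': [str, ...]}}
    """
    grouped = defaultdict(list)
    for app_name, info in result.items():
        for category in info.get('categories', []):
            grouped[category].append(app_name)
    return {category: sorted(names) for category, names in grouped.items()}


# Sample copied from the tripadvisor.com output above (not a live call).
sample = {
    'Apache': {'confidence': 100, 'version': '', 'categories': ['web-servers']},
    'Google Analytics': {'confidence': 100, 'version': '', 'categories': ['analytics']},
    'comScore': {'confidence': 100, 'version': '', 'categories': ['analytics']},
}

by_category = group_by_category(sample)
print(by_category['analytics'])
```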

You can specify the User-Agent to use:

>>> w.analyze('http://www.google.com', user_agent='your_user_agent')

Or analyze pages you have already downloaded (in this case you also need the URL and the response headers):

>>> w.analyze_from_data(url=the_url, html=the_html, headers=the_response_headers)
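
This can be useful for pages saved by a crawler, since no second request is needed. The following is a minimal sketch, assuming the headers are passed as a plain dict of header names to values (the expected type is not stated here). `analyze_saved_page` and the layout of the `page` record are hypothetical; the analyzer is passed in as an argument, so the helper only relies on the `analyze_from_data` keyword arguments shown above.

```python
def analyze_saved_page(analyzer, page):
    """Run analyze_from_data on a stored page record.

    `page` is a hypothetical dict with 'url', 'html' and 'headers' keys,
    for example as saved by your own crawler. All three are required,
    matching the keyword arguments shown above.
    """
    missing = [key for key in ('url', 'html', 'headers') if key not in page]
    if missing:
        raise ValueError('page record is missing: %s' % ', '.join(missing))
    return analyzer.analyze_from_data(
        url=page['url'],
        html=page['html'],
        headers=page['headers'],
    )
```

With a `Wappalyzer` instance `w`, the call would be `analyze_saved_page(w, page)`.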

Apps and Categories are available as dict objects:

>>> w.apps
{u'Google Wallet': {u'website': u'wallet.google.com', u'cats': [41], u'script': [u'checkout\\.google\\.com',
u'wallet\\.google\\.com']}, u'Lockerz Share': ...}

>>> w.categories
{u'42': u'tag-managers', u'48': u'network-storage', u'43': u'paywalls', u'49': u'feed-readers', u'24':
u'rich-text-editors', u'25': u'javascript-graphics', u'26': u'mobile-frameworks', ...}
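
In the output above, the `cats` values in `w.apps` are integers while the keys of `w.categories` are strings, so looking up a category name needs a conversion. The following is a minimal sketch, assuming that shape. `category_names` is a hypothetical helper, and the sample app entry is made up for illustration; only the category ids and names come from the output above.

```python
def category_names(app, categories):
    """Return the category names for one entry of w.apps.

    `app['cats']` holds integer ids, while `categories` is keyed by the
    same ids as strings, so each id is converted before the lookup.
    Ids missing from `categories` are skipped.
    """
    names = []
    for cat_id in app.get('cats', []):
        name = categories.get(str(cat_id))
        if name is not None:
            names.append(name)
    return names


# Category ids and names taken from the w.categories excerpt above.
categories = {'42': 'tag-managers', '48': 'network-storage', '43': 'paywalls'}
# Hypothetical app entry, for illustration only.
example_app = {'website': 'example.com', 'cats': [42, 43]}

print(category_names(example_app, categories))
```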

The data can also be updated to the latest version available from the Wappalyzer GitHub repo:

>>> from wappalyzer import updater
>>> updater.update_all()

By default, app icons are saved to the data/icons folder. If you need them somewhere else, you can specify the destination folder:

>>> from wappalyzer import updater
>>> updater.update_all(icons_folder='your_icons_folder')

Or update them individually:

>>> updater.update_icons(icons_folder='your_icons_folder')

Requirements

Note for macOS users: if you have problems installing PyV8, you can use PyV8-OS-X:

pip install -e git+https://github.com/brokenseal/PyV8-OS-X#egg=pyv8

Install

Using setup:

python setup.py install

Using PyPI:

pip install wappalyzer-python
