Skip to content

An analyzer for web page content powered by Python

License

Notifications You must be signed in to change notification settings

mervinkid/wepana

Repository files navigation

wepana: A Web Page Analyzer

Python License MIT Release

 __      _____ _ __   __ _ _ __   __ _
 \ \ /\ / / _ \ '_ \ / _` | '_ \ / _` |
  \ V  V /  __/ |_) | (_| | | | | (_| |
   \_/\_/ \___| .__/ \__,_|_| |_|\__,_|
              |_|

Wepana is an analyzer for web page content powered by Python.
It compatible with both python2 and python3.

Dependencies

No any third part dependencies.
Wepana can auto detect the major version of python runtime and use the build in library for feature implementation.

Features

  • Auto load content from url.
  • Load content from file.
  • Load content from string value.
  • Get image urls.
  • Get html link src target urls.
  • Get meta information.
  • Get keyword information.

Usage

Installation

pip install wepana

Sample

# -*- coding: utf-8 -*-
#!/usr/bin/env python
from wepana import WebPageAnalyzer

def foo():
    # load with init
    analyzer = WebPageAnalyzer(url='http://github.com')
    
    # load after init
    analyzer.connect('http://github.com')
    
    # load from file
    analyzer.read_file('/path/to/the/file.html')
    
    # load form text
    analyzer.read_text('text content')
    
    # check status
    if not analyzer.read():
        print('wepana analyzer is not ready.')
        return

    # get title
    analyzer.get_title()
    
    # get keywords
    analyzer.get_keywords()
    
    # get images
    analyzer.get_images()
    
    # get likes
    analyzer.get_links()
    
    # reset analyzer
    analyzer.reset()

if __name__ == '__main__':
    foo()

Contributing

  1. Fork it.
  2. Create your feature branch. ($ git checkout feature/my-feature-branch)
  3. Commit your changes. ($ git commit -am 'What feature I just added.')
  4. Push to the branch. ($ git push origin feature/my-feature-branch)
  5. Create a new Pull Request

Authors

@Mervin

License

The MIT License (MIT). For detail see LICENSE.

About

An analyzer for web page content powered by Python

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages