pensnarik/parselab

parselab

This package contains classes that help to write parsers in Python.

Installation

pip3 install parselab

Usage

To use parselab, create a class derived from BasicParser.

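import os

from parselab.cache import FileCache
from parselab.network import NetworkManager
from parselab.parsing import BasicParser

# NOTE: `db` is not defined in this snippet; import your database helper before using it.

class MyParser(BasicParser):

    def __init__(self):
        # Cached pages are stored under the directory given by $CACHE_PATH
        self.cache = FileCache(namespace='my-parser', path=os.environ.get('CACHE_PATH'))
        self.net = NetworkManager()
        # $PARSINGDB must be set; os.environ[...] raises KeyError otherwise
        db.connect(os.environ['PARSINGDB'])
        db.setup_project('my-project')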

After that, you can download pages with the BasicParser.get_page() method:

class MyParser(BasicParser):
    ...

    def run(self):
        page = self.get_page('https://google.com')

BasicParser uses the network manager created in the __init__ method and saves every downloaded page into the directory specified by your $CACHE_PATH environment variable. The next time you call get_page(), it returns the requested page from the cache if available.
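The cache-first behaviour described above can be illustrated with a small standalone sketch. This is not parselab's implementation: SimpleFileCache, the module-level get_page() function, and fake_download() are hypothetical names invented for this example, and the network call is replaced by a stub so the snippet runs offline.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Optional


class SimpleFileCache:
    """Hypothetical stand-in for a file cache; not parselab's FileCache."""

    def __init__(self, namespace: str, path: str) -> None:
        # Each namespace gets its own subdirectory under the cache path.
        self.root = Path(path) / namespace
        self.root.mkdir(parents=True, exist_ok=True)

    def _file_for(self, url: str) -> Path:
        # Hash the URL so that any URL maps to a safe file name.
        return self.root / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> Optional[str]:
        target = self._file_for(url)
        return target.read_text(encoding="utf-8") if target.exists() else None

    def put(self, url: str, content: str) -> None:
        self._file_for(url).write_text(content, encoding="utf-8")


def get_page(url: str, cache: SimpleFileCache, download: Callable[[str], str]) -> str:
    """Return the page from the cache if present, otherwise download and store it."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    content = download(url)
    cache.put(url, content)
    return content


calls = []


def fake_download(url: str) -> str:
    # Stub downloader that records each call instead of touching the network.
    calls.append(url)
    return f"<html>{url}</html>"


cache = SimpleFileCache("my-parser", tempfile.mkdtemp())
first = get_page("https://example.com", cache, fake_download)
second = get_page("https://example.com", cache, fake_download)  # served from the cache
print(first == second, len(calls))
```

Passing the downloader in as a function keeps the caching logic testable without network access; in a real parser that role would be played by the network manager.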

About

Python modules to help those who parse
