
fagci/parsee


Parsee

Sweet, tiny Python site parser.

[new] Now with Cloudflare bypass

A bit in Russian

Language syntax

<selector> [a@ <selector>] [% <python_code>]

Note: python_code is applied relative to the last matched tag. Start it with . (dot) to access an attribute or call a method.
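To make the grammar concrete, here is a minimal sketch of how a query string could be split into its parts and how the `%` code could be applied to a tag. `Query`, `split_query` and `apply_code` are hypothetical names, not parsee's API. The sketch assumes that `%` starts the python code, that a trailing `@` marks a selector whose links are followed, and that neither character appears inside a selector; running the snippet with `eval()` is one plausible approach, not necessarily what parsee does.

```python
from typing import NamedTuple


class Query(NamedTuple):
    follow: list[str]  # selectors whose matched links are followed, in order
    select: str        # final selector ('' if the query ends with '@')
    code: str          # python code applied to each matched tag ('' if none)


def split_query(query: str) -> Query:
    """Split a parsee-style query into its parts (hypothetical helper).

    Assumes '%' starts the python code and '@' ends a link-following
    selector, and that neither character appears inside a selector.
    """
    selectors, _, code = query.partition('%')
    *follow, select = selectors.split('@')
    return Query([s.strip() for s in follow], select.strip(), code.strip())


def apply_code(tag, code: str):
    """Apply the python part relative to `tag`, e.g. '.text' -> tag.text.

    eval() runs arbitrary code: only use it with trusted queries.
    """
    if not code:
        return tag
    # Builtins are disabled; attribute access and method calls still work.
    return eval('tag' + code, {'__builtins__': {}}, {'tag': tag})
```

For example, `split_query('a@a@title%.text')` gives `Query(follow=['a', 'a'], select='title', code='.text')`, and `apply_code({'href': '/x'}, '.get("href")')` returns `'/x'`.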

Requirements

Make sure the system libraries required by lxml are installed:

[!] On Termux, run the command without sudo.

sudo apt-get install libxml2 libxslt

Before use, install the Python requirements:

pip3 install -r requirements.txt

Examples

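Crawl the links on the first page and, on each linked page, print the paragraph that directly follows an h3 heading containing the given text

for page in parser / '.links a@':  # follow each link matched by '.links a'
    for p in page / 'h3:-soup-contains("Some title")+p':  # <p> right after an <h3> containing "Some title"
        print(p.text)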
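The `/` operator in the example above reads naturally if the parser object overloads `__truediv__`. The sketch below is a hypothetical, offline model of that pattern: `Page`, `open_page`, `crawl` and the toy `SITE` mapping are invented for illustration and are not parsee's classes. A real implementation would download each page and run the CSS selector on its HTML instead of looking matches up in a dictionary.

```python
from typing import Callable, Dict, Iterator, List


class Page:
    """Hypothetical stand-in for parsee's parser object (not its real class).

    `select(css)` returns the items a CSS selector matches on this page;
    `fetch(href)` returns the Page a link points to. Both are injected so
    the sketch runs offline.
    """

    def __init__(self, select: Callable[[str], list], fetch: Callable[[str], "Page"]):
        self.select = select
        self.fetch = fetch

    def __truediv__(self, query: str) -> Iterator:
        if query.endswith('@'):
            # Trailing '@': follow every link the selector matches.
            for href in self.select(query[:-1].strip()):
                yield self.fetch(href)
        else:
            # Plain selector: yield the matched items themselves.
            yield from self.select(query)


# Toy site: page URL -> {selector: matches}; links are plain href strings.
SITE: Dict[str, Dict[str, list]] = {
    '/': {'.links a': ['/a', '/b']},
    '/a': {'h3+p': ['Paragraph A']},
    '/b': {'h3+p': ['Paragraph B']},
}


def open_page(url: str) -> Page:
    # Each Page looks up its own entry in SITE; unknown selectors match nothing.
    return Page(lambda css: SITE[url].get(css, []), open_page)


def crawl() -> List[str]:
    found = []
    for page in open_page('/') / '.links a@':
        for p in page / 'h3+p':
            found.append(p)
    return found
```

Because `__truediv__` is a generator function here, the crawl is lazy: a linked page is fetched only when the outer loop reaches it, and `crawl()` returns `['Paragraph A', 'Paragraph B']`.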

Get titles of subpages

./parser.py http://site.org 'a@a@title%.text'

Get links on the home page

./parser.py http://site.org 'a%.get("href")'
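For comparison, the `a%.get("href")` query corresponds roughly to the following stdlib-only sketch, which needs neither lxml nor network access: it collects the `href` of every `<a>` tag in an HTML string. `LinkCollector` and `get_links` are hypothetical names for illustration; downloading the page (for example with `urllib.request.urlopen`) is left out, and this sketch skips `<a>` tags that have no `href`.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:  # skip <a> tags without an href value
                self.links.append(href)


def get_links(html: str) -> List[str]:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

For example, `get_links('<a href="/one">1</a><A HREF="/two">2</A>')` returns `['/one', '/two']`.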
