Parsee

A sweet, tiny site parser written in Python.

[new] Now with Cloudflare bypass

Lang syntax

<selector> [a@ <selector>] [% <python_code>]

Note: python_code is evaluated relative to the last matched tag. Start it with . (dot) to read an attribute or call a method on that tag.
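To make the syntax above more concrete, here is a minimal sketch of how such an expression could be split into its parts. It is not the code from lang.py: the function name `split_expression` is hypothetical, and the reading that `@` separates selector steps (with a trailing `@` meaning "follow the matched links") is an assumption based on the examples in this README. It also assumes that neither `@` nor `%` appears inside the selectors themselves.

```python
def split_expression(expr):
    """Split '<selector>[@<selector>...][%<python_code>]' into its parts.

    Hypothetical helper for illustration only, not the project's parser.
    Returns (selector_steps, follow_last, python_code).
    """
    # Everything after the first '%' is treated as Python code.
    selector_part, sep, code = expr.partition('%')

    # Assumption: '@' separates selector steps.
    steps = [step.strip() for step in selector_part.split('@')]

    # A trailing '@' (as in '.links a@') leaves an empty last step;
    # assume it means "follow the matched links and yield the pages".
    follow_last = steps[-1] == ''
    if follow_last:
        steps.pop()

    return steps, follow_last, (code.strip() if sep else None)


print(split_expression('a@a@title%.text'))
print(split_expression('.links a@'))
```

Under these assumptions, the first call separates three selector steps from the code `.text`, and the second reports a single step whose links should be followed.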

Requirements

Make sure libxml2 and libxslt are installed (lxml needs them):

[!] On Termux, run the command without sudo

sudo apt-get install libxml2 libxslt

Before first use, install the Python requirements:

pip3 install -r requirements.txt

Examples

Crawl the links on the first page and, on each linked page, get the paragraph that follows a heading containing the given text

for page in parser / '.links a@':
    for p in page / 'h3:-soup-contains("Some title")+p':
        print(p.text)

Get titles of subpages

./parser.py http://site.org 'a@a@title%.text'

Get links on the home page

./parser.py http://site.org 'a%.get("href")'
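The `%.get("href")` and `%.text` suffixes in the commands above are Python code applied to each matched tag. The sketch below shows one way this could work, by prefixing the code with a variable name and evaluating it. It is an illustration under assumptions, not the project's implementation: `FakeTag` and `apply_code` are hypothetical names, and `FakeTag` only stands in for a real parsed tag.

```python
class FakeTag:
    """Hypothetical stand-in for a parsed HTML tag (not the project's class)."""

    def __init__(self, text, attrs):
        self.text = text
        self.attrs = attrs

    def get(self, name, default=None):
        # Return the attribute value, or `default` if it is missing.
        return self.attrs.get(name, default)


def apply_code(tag, code):
    """Evaluate `code` (for example '.text') as an expression on `tag`."""
    # 'tag' + '.text' becomes 'tag.text'. Builtins are hidden here, but
    # eval is still unsafe for untrusted input.
    return eval('tag' + code, {'__builtins__': {}}, {'tag': tag})


links = [FakeTag('Home', {'href': '/'}), FakeTag('Docs', {'href': '/docs'})]

hrefs = [apply_code(tag, '.get("href")') for tag in links]
titles = [apply_code(tag, '.text') for tag in links]
print(hrefs)
print(titles)
```

Because the code is attached directly to the tag, the leading dot is what turns `.text` or `.get("href")` into an attribute access or a method call.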

fagci/parsee
