Parsel

Parsel is a BSD-licensed Python library to extract data from HTML, JSON, and XML documents.

It supports:

- CSS and XPath expressions for HTML and XML documents
- JMESPath expressions for JSON documents
- Regular expressions

Find the Parsel online documentation at https://parsel.readthedocs.org.

>>> from parsel import Selector
>>> text = """
        <html>
            <body>
                <h1>Hello, Parsel!</h1>
                <ul>
                    <li><a href="http://example.com">Link 1</a></li>
                    <li><a href="http://scrapy.org">Link 2</a></li>
                </ul>
                <script type="application/json">{"a": ["b", "c"]}</script>
            </body>
        </html>"""
>>> selector = Selector(text=text)
>>> selector.css('h1::text').get()
'Hello, Parsel!'
>>> selector.xpath('//h1/text()').re(r'\w+')
['Hello', 'Parsel']
>>> for li in selector.css('ul > li'):
...     print(li.xpath('.//@href').get())
http://example.com
http://scrapy.org
>>> selector.css('script::text').jmespath("a").get()
'b'
>>> selector.css('script::text').jmespath("a").getall()
['b', 'c']

