BeautifulSoup interface for lxml
- FAST search in tree
- FAST serialize to str
- BeautifulSoup4 interface to interact with object:
  - Search: find, find_all, find_next, find_next_sibling
  - Text: get_text, string
  - Tag: name, get, clear, __getitem__, __str__, __repr__, append, new_tag, extract, replace_with
pip install fast-soup==1.1.0
from fast_soup import FastSoup
content = ... # read some html content
soup = FastSoup(content)
# interact like BS4 object
result = soup.find('a', id='my_link')
# interact like lxml object
el = result.unwrap()
Q: BS4 already implements an lxml parser. Why should I use FastSoup?
A: Yes, BS4 implements the parser, but it only builds the tree. All further interactions, such as searching and serialization, proceed at "Python speed". FastSoup uses lxml internally and guarantees "C speed".
Q: How does the FastSoup speedup work?
A: FastSoup just builds XPath expressions and executes them. An LRU cache is used to avoid rebuilding them.
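The idea in the answer above can be sketched roughly as follows. This is not FastSoup's actual code: build_xpath and find_all are hypothetical names, and the standard library's xml.etree.ElementTree stands in for lxml so the sketch runs without extra dependencies. It assumes attribute values contain no quote characters.

```python
from functools import lru_cache
import xml.etree.ElementTree as ET


@lru_cache(maxsize=128)
def build_xpath(name, attrs):
    """Build an XPath string from a tag name and a tuple of (attr, value) pairs.

    Hypothetical helper: the result is cached, so a repeated query
    does not rebuild the expression.
    """
    predicates = "".join(f"[@{key}='{value}']" for key, value in attrs)
    return f".//{name or '*'}{predicates}"


def find_all(root, name=None, **attrs):
    # Sort the attributes so equivalent queries share one cache entry.
    xpath = build_xpath(name, tuple(sorted(attrs.items())))
    return root.findall(xpath)


root = ET.fromstring(
    "<div><a id='my_link' href='/x'>first</a><a id='other'>second</a></div>"
)
links = find_all(root, "a", id="my_link")
find_all(root, "a", id="my_link")  # same query: the XPath comes from the cache
cache_stats = build_xpath.cache_info()
```

After the two identical calls, cache_stats reports one miss (the first build) and one hit (the reuse). With lxml the same string could be passed to the element's xpath method instead of findall.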
Q: Why don't you support the whole interface? Will that come soon?
A: I wrote the functions that speed up parsing in my own projects. Just create an issue or pull request and I think we will find a solution ;)
You can get the power of BeautifulSoup by wrapping your lxml objects, e.g.:
import io
import lxml.etree
from fast_soup import Tag
content = ... # some bytes ready to parse
context = lxml.etree.iterparse(
    io.BytesIO(content), ...
)
for event, elem in context:
    # wrap each lxml element to get the BS4-style interface
    tag = Tag(elem)
    tag_text = tag.get_text()
    tag_attr = tag['attribute']