First of all, thanks for a really good and extremely fast HTML parser for Python.
I came across a problem with this file: bad_html.txt
(It was probably corrupted while being dumped, but that doesn't matter here.)
I find the meta tags in it with the HTMLParser.tags() function and then try to access the attributes of the nodes it returns. On the 4th node, Python crashes with a segmentation fault.
Code:
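from selectolax.parser import HTMLParser

path = 'bad_html.txt'
with open(path, 'r') as f:
    html = f.read()
parser = HTMLParser(html)
for tag in parser.tags('meta'):
    print(tag.attributes)  # segfaults on the 4th <meta> node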
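To pinpoint which node triggers the crash, it can help to print each node's index (flushed immediately) before touching its attributes, and to run the script with `python -X faulthandler`, which makes Python dump a traceback when the process receives SIGSEGV. A minimal sketch; `dump_attributes` is a hypothetical helper, not part of selectolax:

```python
import sys
from typing import Any, Iterable, List


def dump_attributes(nodes: Iterable[Any], out=None) -> List[dict]:
    """Print each node's index before reading its attributes.

    The index is flushed before the (possibly crashing) access, so the
    last "node N" line written before a segfault identifies the bad node.
    """
    if out is None:
        out = sys.stdout
    collected = []
    for index, node in enumerate(nodes):
        print("node", index, file=out, flush=True)
        # In the report, this is the access that segfaulted on the 4th <meta> node.
        attrs = node.attributes
        print(attrs, file=out, flush=True)
        collected.append(attrs)
    return collected


# Usage (assumes selectolax is installed and `html` holds the file contents):
#   from selectolax.parser import HTMLParser
#   dump_attributes(HTMLParser(html).tags('meta'))
```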
What makes this worse is that a SIGSEGV cannot be properly handled from Python code.
I hope you will be able to fix this "fatal error".
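Since a segfault kills the interpreter outright, one caller-side workaround is to run the parsing in a separate Python process and check how that process exited. A minimal sketch using only the standard library, assuming a POSIX system (there, a child killed by SIGSEGV reports a negative return code equal to the signal number); `run_isolated` and `crashed_with_segfault` are hypothetical helper names:

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter so a segfault cannot kill the caller.

    `-X faulthandler` makes the child print a Python traceback to its stderr
    when it receives SIGSEGV; that stderr is captured here.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )


def crashed_with_segfault(result: subprocess.CompletedProcess) -> bool:
    """POSIX assumption: a child killed by signal N has returncode == -N."""
    return result.returncode == -signal.SIGSEGV
```

For example, the reproduction script could be passed as the `code` string; if `crashed_with_segfault(result)` is true, `result.stderr` holds the traceback printed by faulthandler, while the calling process keeps running.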
My environment:
selectolax 0.1.9
Python 3.5.2
Linux 4.15.0-45-generic (Ubuntu 16.04)
UPDATE
I may have found the weak spot, because this:
from selectolax.parser import HTMLParser

with open(path, 'r') as f:
    html = f.read()
parser = HTMLParser(html)
parser = HTMLParser(parser.html)  # re-parse the serialized HTML
for tag in parser.tags('meta'):
    print(tag.attributes)
works! A " character occurs inside a meta tag, and that seems to cause the problem. It may help to compare the original HTML text, parser.html, and HTMLParser(parser.html).html.
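One way to make that comparison concrete is a unified diff of the three strings. A minimal sketch using the standard library's `difflib`; `html_diff` is a hypothetical helper, and it puts a line break after every `>` so each tag lands on its own line, which is an assumption meant to make single-line HTML diffable:

```python
import difflib
from typing import List


def html_diff(a: str, b: str, a_name: str = "a", b_name: str = "b") -> List[str]:
    """Return a unified diff of two HTML strings, one tag per line."""
    # Break after each '>' so a changed <meta> tag shows up as its own line.
    a_lines = a.replace(">", ">\n").splitlines()
    b_lines = b.replace(">", ">\n").splitlines()
    return list(
        difflib.unified_diff(a_lines, b_lines, fromfile=a_name, tofile=b_name, lineterm="")
    )


# Usage (assumes selectolax is installed and `html` holds the original text):
#   first = HTMLParser(html).html
#   second = HTMLParser(first).html
#   print("\n".join(html_diff(html, first, "original", "parser.html")))
#   print("\n".join(html_diff(first, second, "parser.html", "reparsed")))
```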