Segmentation fault (core dumped) when calling "attributes" of a node #9

aleasims · 2019-02-12T11:45:01Z

First of all, thanks for a really good and extremely fast HTML parser for Python.
I came across a problem with this file:
bad_html.txt
(It was probably corrupted while being dumped, but that doesn't matter here.)

I find the meta tags in it with the HTMLParser.tags() function and then try to access the attributes of the nodes it returns. On the fourth node, Python crashes with a segfault.
Code:

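from selectolax.parser import HTMLParser

with open(path, 'r') as f:  # path points to the attached bad_html.txt
    html = f.read()
parser = HTMLParser(html)
for tag in parser.tags('meta'):
    print(tag.attributes)  # crashes on the fourth meta tag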

Result:

{'charset': 'utf-8'}
{'content': 'IE=edge', 'http-equiv': 'X-UA-Compatible'}
{'content': 'ÑÐ½Ð¸ÐºÐ°, ÑÐ¾Ð·ÐµÑÐºÐ¸ Ð¸ Ð²ÑÐºÐ»ÑÑÐ°ÑÐµÐ»Ð¸ ÑÐ½Ð°Ð¹Ð´ÐµÑ, Ð²ÑÐºÐ»ÑÑÐ°ÑÐµÐ»Ð¸ schneider electric, ÑÐ¾Ð·ÐµÑÐºÐ¸ ÑÐ½Ð°Ð¹Ð´ÐµÑ, ÑÐ¾Ð·ÐµÑÐºÐ¸ schneider, schneider electric ÑÐ¾Ð·ÐµÑÐºÐ¸ Ð¸ Ð²ÑÐºÐ»ÑÑÐ°ÑÐµÐ»Ð¸, Ð²ÑÐºÐ»ÑÑÐ°ÑÐµÐ»Ð¸ ÑÐ½Ð°Ð¹Ð´ÐµÑ, unica ÑÐ¾Ð·ÐµÑÐºÐ¸, unica schneider, Ð²ÑÐºÐ»ÑÑÐ°ÑÐµÐ»Ð¸ unica, schneider electric unica, ÑÐ°Ð¼ÐºÐ¸ unica, Ð\xa0Ð¾Ð·ÐµÑÐºÐ¸ schneider electric, ÑÐ¾Ð·ÐµÑÐºÐ¸ ÑÐ½Ð°Ð¹Ð´ÐµÑ ÑÐ»ÐµÐºÑÑÐ¸Ðº', 'name': 'keywords'}
Segmentation fault (core dumped)

The fact that a SIGSEGV cannot be handled properly from within Python makes this even worse.
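One common workaround, sketched here as an assumption rather than anything selectolax provides, is to run the risky parsing in a child interpreter so that a crash kills only the child. The helper names `run_isolated` and `crashed_with_segfault` are hypothetical. The `-X faulthandler` option is part of CPython and prints a traceback to stderr when the child receives a fatal signal.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=30):
    """Run a Python snippet in a child interpreter.

    Returns (returncode, stdout, stderr). A crash in the child
    does not take the calling process down with it.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def crashed_with_segfault(returncode):
    # On POSIX, a return code of -N means the child was killed by signal N.
    return returncode == -signal.SIGSEGV
```

The snippet passed to `run_isolated` would contain the parsing code from above; the parent can then check `crashed_with_segfault(returncode)` and skip or log the offending file instead of dying. The return-code convention applies to POSIX systems such as the Linux setup described here.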

I hope you will be able to fix this fatal error.

My environment:

selectolax 0.1.9
Python 3.5.2
Linux 4.15.0-45-generic (Ubuntu 16.04)

UPDATE
I may have found the weak spot, because this:

html = open(path, 'r').read()
parser = HTMLParser(html)
parser = HTMLParser(parser.html)  # re-parse the serialized output of the first parse
for tag in parser.tags('meta'):
    print(tag.attributes)

works! A " character occurs inside a meta tag, and that causes the problem. It may help to compare the original HTML text, parser.html and HTMLParser(parser.html).html.
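The comparison suggested above could be sketched with the standard library's `difflib`. The helper name `html_diff` is hypothetical, and the function works on plain strings, so it makes no assumptions about the parser itself.

```python
import difflib


def html_diff(before, after, before_name="original", after_name="reparsed"):
    """Return a unified diff between two HTML strings as a list of lines."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=before_name,
            tofile=after_name,
            lineterm="",  # inputs have no trailing newlines after splitlines()
        )
    )
```

With the variables from the snippet above, one would call `html_diff(html, parser.html)` and then `html_diff(parser.html, HTMLParser(parser.html).html)`, printing each returned line to see where the stray quote changes the serialized markup.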


rushter · 2019-02-13T21:16:37Z

I've fixed your problem.

Please don't close this issue. I will investigate it further later, because I'm not sure whether this is expected behavior from the Modest engine.

aleasims · 2019-02-18T09:46:15Z

Great, it seems to work fine now! Thank you very much for quickly solving the issue.

rushter added a commit that referenced this issue Feb 13, 2019

Fix for #9

e1c83ae

rushter closed this as completed Apr 3, 2019

phoerious mentioned this issue Jul 9, 2021

Node.attrs causes segfault and has buggy items() method #39

Closed
