Help: is soupsieve case-insensitive? #95
Comments
Might be a duplicate of #87 |
@dimaqq, this is not a duplicate of #87. #87 was specifically fixing a bug where, in an HTML document, the attribute value of

As for your specific issue, you are using XML. XML is a case-sensitive language. If I have

Soup Sieve handles case sensitivity differently for XML and HTML because the document type requires it. HTML tags and attribute names are treated case-insensitively, while in XML they are treated case-sensitively. These differences are specifically documented here: https://facelessuser.github.io/soupsieve/api/#api.

Here is the thing. If I make tag recognition case insensitive in XML, how do I select just

I personally view the behavior of the old select method as an oversight for XML because it doesn't respect the document's rules. If people really wanted it, I could add a flag to force case insensitivity in XML documents, but that feels counterintuitive for XML. If it were strongly desired, though, I might consider it. |
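The document-type rule described above can be illustrated without Soup Sieve at all. The following is a minimal sketch using only the Python standard library, not Soup Sieve's own implementation: `xml.etree.ElementTree` matches tag names exactly, while `html.parser` reports tag names lowercased. The helper names `xml_ids`, `TagCollector`, and `html_tag_names` are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Three elements whose names differ only by case.
XML_DOC = '<Tag id="1"><tag id="2"></tag><TAG id="3"></TAG></Tag>'


def xml_ids(name):
    """Return the ids of XML elements whose tag name equals `name` exactly."""
    root = ET.fromstring(XML_DOC)
    # Element.iter() starts at the root and compares names case-sensitively.
    return [el.get("id") for el in root.iter(name)]


class TagCollector(HTMLParser):
    """Record start-tag names as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases the tag name before calling this hook.
        self.names.append(tag)


def html_tag_names(markup):
    """Return the start-tag names seen while parsing `markup` as HTML."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


print(xml_ids("tag"), xml_ids("Tag"), xml_ids("TAG"))
print(html_tag_names("<DIV><Span></Span></DIV>"))
```

In the XML case each spelling selects a different element, which is why a case-insensitive match would make it impossible to address just one of them. In the HTML case the original casing is already gone by the time a selector would see the tag.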
Hi @facelessuser, and thank you so much for the detailed explanation. I've also dug into the soupsieve source code, and the XML vs HTML switches are clearly visible. Perhaps I didn't describe my problem clearly:
There are two problems with this:
I'll set up an MRE repo and post a link in this thread. |
A simple reproduction example will definitely help. HTML should be case insensitive, meaning

It turns out I did not have a test explicitly testing tag case, but locally I just added this, which passed:

    def test_tag_xml(self):
        """Test tag for XML."""

        self.assert_selector(
            """
            <Tag id="1">
            <tag id="2"></tag>
            <TAG id="3"></TAG>
            </Tag>
            """,
            "tag",
            ["2"],
            flags=util.XML
        )

        self.assert_selector(
            """
            <Tag id="1">
            <tag id="2"></tag>
            <TAG id="3"></TAG>
            </Tag>
            """,
            "Tag",
            ["1"],
            flags=util.XML
        )

        self.assert_selector(
            """
            <Tag id="1">
            <tag id="2"></tag>
            <TAG id="3"></TAG>
            </Tag>
            """,
            "TAG",
            ["3"],
            flags=util.XML
        )

When you provide the simple reproduction, please also include which version of soupsieve you are using, as well as which version of BeautifulSoup. |
Oh, and do post whether you have |
@dimaqq, I was testing something that wasn't on tip. There was indeed a regression. And since I didn't have a test in place to catch it, I wasn't aware. I have a fix coming. |
MRE is at https://github.com/dimaqq/mre-bs4-soupsieve-xml-case the last test, |
Thanks, I'll put in appropriate tests this time to make sure I don't break this again in the future. |
@dimaqq, thanks for the MRE. #96 will fix case-related issues. I've made sure that all XML documents will use case sensitivity for attribute values and tag names. There are tests to prevent future breakage. I've also ensured that CSS-defined prefixes are always treated with case sensitivity, even in HTML5, because per the spec they are always case sensitive. There was no test for this either, but now there is. I'm hoping this fixes all case-related issues 🤞. |
1.7.3 has been released. Hopefully that gets you where you need to be. |
Yes, it does, thank you so much! |
Before, BeautifulSoup accepted (and I think required) case-sensitive tag names in selectors.
Now that BeautifulSoup uses soupsieve, it seems that only lower-case selectors are supported.
I'm really not sure why or if I can change this behaviour.