CSQuery Exception when ID of HTML Element contains space #5
CsQuery throws an exception when the ID of an HTML element contains a space. Ex.
I just created this test and it passes:
Can you show me some specific failing code or point me to a URL that won't parse?
There's no reason that an ID couldn't have a space -- the ID attribute is treated the same as any other; anything inside the quotes should become its value.
However, it is not possible to make that selector work as you'd like, because "img test" is a valid selector that looks for an element "test" that is a descendant of elements "img". That is, because the space already has a specific meaning inside selectors (search all the descendants of the selection so far), it doesn't make sense to try to anticipate this situation. In theory I could check for tag selectors that aren't valid tag names and treat them differently, but that could create a host of unexpected situations, since using invalid (or "custom", as we like to say) tag names is common.
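To make the ambiguity concrete, here is a minimal Python sketch. It is not CsQuery code: `split_descendants` and `id_attribute_selector` are hypothetical helpers written only for illustration. The first shows how a whitespace-separated selector is read as a chain of descendant steps. The second builds a standard CSS attribute selector with a quoted value, which treats the space as literal text. Whether a given CsQuery version accepts that form is an assumption to verify.

```python
def split_descendants(selector: str) -> list[str]:
    """Split a selector on whitespace, the way descendant steps are read."""
    return selector.split()


def id_attribute_selector(element_id: str) -> str:
    """Build [id="..."] with backslashes and double quotes escaped."""
    escaped = element_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'[id="{escaped}"]'


# "img test" is read as two steps: <test> elements somewhere inside <img>.
print(split_descendants("img test"))      # ['img', 'test']

# Quoting the value keeps the space as part of the ID being matched.
print(id_attribute_selector("img test"))  # [id="img test"]
```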
There are actually valid IDs that can't be selected either; for example, the period (dot) is a legal part of an ID name, but it also has a different meaning in a selector, which is to select class names. e.g.
Thanks for getting back with me. Here is the html that will not parse:
When I run the following code:
Dim csq As CsQuery.CQ = CsQuery.CQ.Create(HTMLElement)
Where HTMLElement equals the HTML above, I get the following exception:
System.Exception : Unexpected character found at position 9: "
at CsQuery.Engine.SelectorParser.Parse(String selector)
at CsQuery.Engine.SelectorChain..ctor(String selector)
at CsQuery.Implementation.DomRoot.GetElementById(String id)
at CsQuery.Implementation.NodeList.Add(IDomObject item)
at CsQuery.CQ.Load(Char html)
at CsQuery.CQ.Load(String html)
Let me know if there is anything else I can do to help track down the
On Thu, Jun 7, 2012 at 5:35 AM, James Treworgy <
Confirmed bug. As it turns out this is not actually the parser - what happens is that when you add a new element to the DOM, it verifies that its ID is unique, and removes it if not. It's breaking when trying to select the invalid ID using
I think this logic is flawed: even though it's designed to prevent invalid HTML, it would result in an incorrect representation of what it was fed. Since browsers allow it (even though it's invalid), CsQuery should too. Shouldn't take long to fix.
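One way to see why the failure is avoidable: an ID lookup does not need to go through a selector string at all. The Python sketch below is an illustration under that assumption, not the actual fix in CsQuery. `Element` and `find_by_id` are hypothetical names. It compares the raw attribute value directly, so spaces, dots, and quotes in an ID cannot trip a selector parser.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)


def find_by_id(elements, element_id):
    """Return the first element whose id equals element_id exactly, else None."""
    for el in elements:
        if el.attrs.get("id") == element_id:
            return el
    return None


elements = [
    Element("img", {"id": "img test"}),  # invalid per the spec, but browsers accept it
    Element("div", {"id": "a.b"}),       # legal ID that clashes with class syntax
]

print(find_by_id(elements, "img test").tag)  # img
print(find_by_id(elements, "a.b").tag)       # div
print(find_by_id(elements, 'no"such'))       # None
```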
added a commit
Jun 7, 2012
Fixed - and the element can be selected by ID now using
It will still remove the ID attribute if you try to add a duplicate ID to an existing DOM. From a quick Google search on the subject, it looks like jQuery does not do this automatically. It seems like desirable behavior to me, so I am going to leave it this way.
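The duplicate-ID policy described here can be sketched roughly as follows. This is a toy Python model, not CsQuery's implementation: the `Document` class and its `add` method are hypothetical. The first element to claim an ID keeps it, and a later element with the same ID is still added but loses its `id` attribute.

```python
class Document:
    """Toy DOM that keeps IDs unique by stripping duplicates on insert."""

    def __init__(self):
        self.elements = []
        self._ids = {}  # raw ID string -> first element that claimed it

    def add(self, tag, attrs):
        attrs = dict(attrs)  # copy so the caller's dict is not modified
        element_id = attrs.get("id")
        duplicate = element_id is not None and element_id in self._ids
        if duplicate:
            del attrs["id"]  # keep the element, drop the clashing ID
        element = {"tag": tag, "attrs": attrs}
        if element_id is not None and not duplicate:
            self._ids[element_id] = element
        self.elements.append(element)
        return element


doc = Document()
first = doc.add("img", {"id": "img test"})
second = doc.add("div", {"id": "img test"})

print(first["attrs"])   # {'id': 'img test'}
print(second["attrs"])  # {}
```

The trade-off is the one raised earlier in the thread: the resulting DOM stays valid, but it no longer mirrors exactly what was added.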
Thanks James. I tested the latest version and it works. I like the idea
On Thu, Jun 7, 2012 at 10:32 AM, James Treworgy <