Introduce datatypes for CssSelector and XPath #1672
Comments
At a technical level this makes sound sense, but to make it work we need to have really good documentation, examples, and probably a place where people can test and get a sense of what it does by playing - and a big warning if they are doing it wrong. |
@chaals thanks. any sense for the versioning aspect? in terms of making a viable validator/checker, there's the "does this look like the right kind of formatted string" aspect, ... but then there's the "which bits of some HTML document does it match" side too. I am not an expert but I guess there are xpath expressions that would match different bits of doc depending on the assumed xpath version? |
I'm not an xpath expert either... I'll ask one if I find one. But my rough sense is that we should leave out the version thing unless there is a screaming need for it. As far as I understand, the versions are generally not going to result in a particular xpath pointing to a different part of the same document just because it uses a different version. |
I can't think of XPath constructs that would match different things depending on the XPath version in use. If you use constructs from an unsupported version, most likely you will just get an error. This is also the behaviour you are likely to get from CSS Selectors. I don't think specifying the version is really useful. For the use case at hand, v1 should be more than enough. It's also the only version you're likely to find in a browser or in JS. There are some considerations applying to usage of XPath in HTML that might be good to link to. |
Thanks @chaals @darobin - yeah I was leaning towards implying latest/v3 but not creating types for all 3. But point taken re v1 and JS. Ping @tmarshbing @scor @nicolastorzec @rvguha @vholland @tilid Any objection to my going ahead and sketching this out within the context of pending.schema.org? I feel it could give us a useful primitive for making stronger links between schema.org data and the browser environment / non-schema.org web content. |
sketch away :) |
I would suggest implying (or even specifying) v1. Switching to a higher version later if needed will be painless, which is not true of the reverse. |
Sketch away. |
Ok, I'll make a pass at this. cheers... |
This is still in Pending. The nature of the (fairly well adopted) Speakable specification is such that these terms are only used to define the vocabulary deployed for SpeakableSpecification, and won't themselves be appearing in actual markup. Since speakable stuff went into the core, I think these ought to accompany it. Any objections? /cc @RichardWallis |
@danbri |
This issue is being tagged as Stale due to inactivity. |
I've tried using this datatype for the xpath, but it's showing errors in the validator which weren't there before, e.g. xpath | /html/body/div[1]/section[4]/div[1]/p[1] (No matches found for expression /html/body/div[1]/section[4]/div[1]/p[1].) This is the type of structured data I'm using, for example: "mainContentOfPage": [ I've also tried adding XPathType like this: "mainContentOfPage": [ and I get this error: /html/body/div[1]/section[4]/div[1]/p[1] (The property xpath is not recognised by the schema (e.g. schema.org) for an object of type XPathType.) I've also tried adding it like this: "xpath": { I've also tried using "text" rather than "xpath", but that doesn't work either. What am I doing wrong here, please? Thanks |
For JSON-LD, perhaps a value object makes sense: {
"@context": "https://schema.org/",
"@type": "WebPage",
"mainContentOfPage": [
{
"@type": "Table",
"name": "Rescue Add On",
"xpath": {
"@value": "/html/body/div[1]/section[4]/div[1]/p[1]",
"@type": "XPathType"
},
"sameAs": "https://www.wikidata.org/wiki/Q337810"
}
]
} But I don't know whether this is at all easier for software to consume than plain |
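On the question of how easy each form is for software to consume, here is a minimal Python sketch of a consumer that accepts both the plain string and the value-object form of `xpath`. The helper name `xpath_strings` is hypothetical and not part of any schema.org tooling; it assumes the JSON-LD has already been parsed into plain dictionaries and has not been expanded or compacted by a JSON-LD processor.

```python
def xpath_strings(node):
    """Return the XPath strings of a JSON-LD node, whichever form they use.

    Handles a plain string, a value object ({"@value": ..., "@type": ...}),
    or a list mixing both. Anything else is ignored.
    """
    raw = node.get("xpath", [])
    values = raw if isinstance(raw, list) else [raw]
    result = []
    for value in values:
        if isinstance(value, dict):
            # Value object: the expression itself sits under "@value".
            value = value.get("@value")
        if isinstance(value, str):
            result.append(value)
    return result


plain = {"@type": "Table", "xpath": "/html/body/div[1]/section[4]/div[1]/p[1]"}
typed = {
    "@type": "Table",
    "xpath": {
        "@value": "/html/body/div[1]/section[4]/div[1]/p[1]",
        "@type": "XPathType",
    },
}

# Both forms yield the same list of expressions.
print(xpath_strings(plain) == xpath_strings(typed))
```

Under these assumptions the value object costs a consumer one extra dictionary lookup, so the choice between the two forms is mostly about what validators accept.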
Thank you very much @KalleOlaviNiemitalo, I've tested that and it's still throwing an error on the schema.org validator, although the value object does make sense from a logical perspective. I've also checked that the xpath exists on the page (which it does), so that isn't the problem |
I tried copying the following to https://validator.schema.org/: <html>
<head>
<title>Demo page</title>
<script type="application/ld+json">
{
"@context": "https://schema.org/",
"@type": "WebPage",
"mainContentOfPage": [
{
"@type": "Table",
"name": "Rescue Add On",
"xpath": {
"@value": "/html/body/div[1]/section[4]/div[1]/p[1]",
"@type": "XPathType"
},
"sameAs": "https://www.wikidata.org/wiki/Q337810"
}
]
}
</script>
</head>
<body>
<div>
<section></section>
<section></section>
<section>
<div>
<p>Not referenced in JSON-LD.</p>
</div>
</section>
<section>
<div>
<p>The XPath refers to this.</p>
</div>
</section>
<section>
<div>
<p>This is not referenced, either.</p>
</div>
</section>
</div>
</body>
</html> The validator did not report any errors or warnings. It displayed these results:
It is strange to me that |
I can see that it works @KalleOlaviNiemitalo which is great, however it doesn't seem to work on live sites when you check the URL. I've triple checked that the XPaths exist (I realised after I'd posted that I had copied the wrong xpath in there; it should have been the table one, as below, since I was working with several broken ones at the same time, sorry about that): { However this still throws an error in the Schema.org validator for both speakable and table (I've removed customer data for privacy reasons but this was the live site): Is there another way to do this, as prior to XPathType being added as a new type, this worked fine? I really appreciate your help. |
https://validator.schema.org/ apparently doesn't like it if the <html>
<head>
<title>Demo page</title>
<script type="application/ld+json">
{
"@context": "https://schema.org/",
"@type": "WebPage",
"hasPart": [
{
"@type": "WebPageElement",
"xpath": "/html/body/footer[1]",
"description": "That xpath deliberately does not match anything."
},
{
"@type": "WPAdBlock",
"xpath": "/html/body/p[1]",
"description": "Validator is happy with this."
},
{
"@type": "WPAdBlock",
"xpath": "/html/body/p[2]",
"description": "Validator complains 'No matches found' even though the element is there."
},
{
"@type": "WPAdBlock",
"xpath": "/html/body/table[1]",
"description": "Validator complains 'No matches found' even though the element is there."
},
{
"@type": "WPAdBlock",
"xpath": "/html/body/table[1]//*",
"description": "The validator considers each text node a separate value."
}
]
}
</script>
</head>
<body>
<p>Buy our product!</p>
<p><strong>Now you can have a second one for free.</strong></p>
<table>
<tbody>
<tr><td>Would you like a bulk discount?</td></tr>
<tr><td>Call our representative for more information.</td></tr>
</tbody>
</table>
</body>
</html> |
Thank you so much @KalleOlaviNiemitalo , that's fixed it! It removes the error if you paste the first table row e.g "/html/body/div[1]/div[4]/div/div/div[2]/div/div[20]/div/table/thead/tr/td[1]" which actually contains text. Whether or not this will work for Google to understand the full table is yet to be seen, but hopefully it can. I really appreciate your help, you are fantastic! |
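The pattern reported here can be checked locally. The Python sketch below evaluates the demo XPaths against a trimmed copy of the demo page and reports whether each matched element has a text node as a direct child. It rests on an assumption inferred from this thread, not on any documented validator behaviour: that the "No matches found" message appears when the matched element has no direct text. The helpers `select` and `has_direct_text` are hypothetical, and `xml.etree.ElementTree` implements only a subset of XPath 1.0, so the leading `/html` step is rewritten into a path relative to the root element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the demo page (the JSON-LD script is omitted).
DEMO_PAGE = """<html>
<head><title>Demo page</title></head>
<body>
<p>Buy our product!</p>
<p><strong>Now you can have a second one for free.</strong></p>
<table>
<tbody>
<tr><td>Would you like a bulk discount?</td></tr>
<tr><td>Call our representative for more information.</td></tr>
</tbody>
</table>
</body>
</html>"""


def select(root, xpath):
    """Evaluate a simple absolute path such as /html/body/p[1] against root."""
    prefix = "/" + root.tag
    rest = xpath[len(prefix):]
    if not xpath.startswith(prefix) or (rest and not rest.startswith("/")):
        return []
    # ElementTree rejects absolute paths on elements, so make it relative.
    return root.findall("." + rest)


def has_direct_text(elem):
    """True if elem has non-whitespace text as a direct child node."""
    if (elem.text or "").strip():
        return True
    # Text that follows a child element is stored as that child's tail.
    return any((child.tail or "").strip() for child in elem)


root = ET.fromstring(DEMO_PAGE)
for xpath in (
    "/html/body/footer[1]",
    "/html/body/p[1]",
    "/html/body/p[2]",
    "/html/body/table[1]",
):
    matches = select(root, xpath)
    print(xpath, len(matches), [has_direct_text(e) for e in matches])
```

If the assumption holds, the paths that match an element without direct text (`p[2]`, whose text sits inside `strong`, and `table[1]`) are the ones the validator rejects, which is consistent with a text-bearing table cell being accepted.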
Interestingly, it appears to be a different story for "speakable" in that you need to reference the tag which contains the text rather than the text itself, e.g /html/body/div[1]/div[4]/div/div/div[2]/div/p[2] |
For the example above, I had originally tried |
When viewing the The second example has an error regarding the |
Suggestion building on (experience implementing) the xpath/css mechanism from SpeakableSpecification, i.e. #1389
Context: http://pending.schema.org/SpeakableSpecification was added in v3.2 (including 'xpath' and 'cssSelector' properties which expect 'Text' values), in "pending review" area of schema.org.
Proposal:
Motivation: to allow applications to offer more accurate validation, error checking, and automated coercion to other representations of these datatypes. Also to help decouple the generic aspects of the SpeakableSpecification proposal from its Text-to-Speech specifics.
Currently http://pending.schema.org/xpath expects a value of type Text; if these datatypes went through, it could expect XPathType instead (and the schemas would declare this to be a specialization of Text). Similarly for cssSelector.
Potentially there could be subtypes tied to versions of XPath and CSS. My understanding is that XPath is more explicitly versioned than CSS, so perhaps explicit XPath versions would be more necessary? e.g. https://www.w3.org/TR/xpath-30/#nt-bnf
/cc @chaals