@@ -37,6 +37,9 @@ const canonicalUrlRules = buildRuleset('url', [
  ['link[rel="canonical"]', node => node.element.href],
]);

const keywordsRules = buildRuleset('keywords', [
  ['meta[name="keywords"]', node => node.element.content],
]);
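The keywordsRules entry above returns the raw content string of <meta name="keywords">, which the HTML spec defines as a set of comma-separated tokens. If we want an array of keywords rather than one string, a post-processing step along these lines could work. splitKeywords is a hypothetical helper, not something the parser or Fathom provides.

```javascript
// Hypothetical helper: turn the comma-separated <meta name="keywords">
// content string into an array of trimmed, non-empty keywords.
function splitKeywords(content) {
  if (typeof content !== 'string') {
    return []; // no keywords tag on the page
  }
  return content
    .split(',')
    .map(keyword => keyword.trim())
    .filter(keyword => keyword.length > 0); // drop empty entries like ",,"
}

// splitKeywords("gear, headlamps,, LED Lights ") -> ["gear", "headlamps", "LED Lights"]
```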
UPDATE: Filed as #47
Not sure if we want to add support for article:tag, which I've seen a few times in "the wild":
Example: https://www.engadget.com/2016/08/19/the-best-headlamps/
Source:
<meta property="og:url" content="https://www.engadget.com/2016/08/19/the-best-headlamps/">
<meta property="og:title" content="The best headlamps">
<meta property="og:description" content="Go for the Black Diamond Spot.">
<meta property="og:image" content="https://s.aolcdn.com/dims5/amp:7a9ea64b5117cd0b2d3e3df595c52b67aa6a6709/t:1200,630/q:80/?url=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2F7697ed6dc5ea00ddff3537c34c17dde3%2F204221484%2F01-headlamps-2000.jpg">
<meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<meta property="og:type" content="article">
<meta property="article:tag" content="BlackDiamond">
<meta property="article:tag" content="BlackDiamondRevolt">
<meta property="article:tag" content="BlackDiamondSpot">
<meta property="article:tag" content="CoastFL75">
<meta property="article:tag" content="gadgetry">
<meta property="article:tag" content="gadgets">
<meta property="article:tag" content="gear">
<meta property="article:tag" content="headlamp">
<meta property="article:tag" content="headlamps">
<meta property="article:tag" content="LED Lights">
<meta property="article:tag" content="ONeill">
<meta property="article:tag" content="partner">
<meta property="article:tag" content="Shining Buddy">
<meta property="article:tag" content="syndicated">
<meta property="article:tag" content="The Revolt">
<meta property="article:tag" content="thewirecutter">
<meta property="article:tag" content="Vitchelo">
<meta property="article:tag" content="VitcheloV800">
<meta property="article:tag" content="wirecutter">
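Since article:tag is repeated once per tag, supporting it means collecting every match rather than the first. A minimal sketch using plain DOM calls, assuming `doc` is anything with a DOM-style querySelectorAll (window.document in a browser); collectMetaValues is a hypothetical helper, not part of our ruleset API.

```javascript
// Hypothetical helper: gather the content of every <meta> matching `selector`,
// in document order, skipping empty values.
function collectMetaValues(doc, selector) {
  return Array.from(doc.querySelectorAll(selector), element => element.content)
    .filter(value => typeof value === 'string' && value.trim().length > 0)
    .map(value => value.trim());
}

// collectMetaValues(document, 'meta[property="article:tag"]')
// Against the markup quoted above, this returns the 19 article:tag values in document order.
```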
Interestingly, I can't even see a <meta name="keywords" /> on that page...
Also, it looks like they repeat the same values in swiftype tags:
<meta class="swiftype" name="tags" data-type="string" content="BlackDiamond">
<meta class="swiftype" name="tags" data-type="string" content="BlackDiamondRevolt">
<meta class="swiftype" name="tags" data-type="string" content="BlackDiamondSpot">
<meta class="swiftype" name="tags" data-type="string" content="CoastFL75">
<meta class="swiftype" name="tags" data-type="string" content="gadgetry">
<meta class="swiftype" name="tags" data-type="string" content="gadgets">
<meta class="swiftype" name="tags" data-type="string" content="gear">
<meta class="swiftype" name="tags" data-type="string" content="headlamp">
<meta class="swiftype" name="tags" data-type="string" content="headlamps">
<meta class="swiftype" name="tags" data-type="string" content="LED Lights">
<meta class="swiftype" name="tags" data-type="string" content="ONeill">
<meta class="swiftype" name="tags" data-type="string" content="partner">
<meta class="swiftype" name="tags" data-type="string" content="Shining Buddy">
<meta class="swiftype" name="tags" data-type="string" content="syndicated">
<meta class="swiftype" name="tags" data-type="string" content="The Revolt">
<meta class="swiftype" name="tags" data-type="string" content="thewirecutter">
<meta class="swiftype" name="tags" data-type="string" content="Vitchelo">
<meta class="swiftype" name="tags" data-type="string" content="VitcheloV800">
<meta class="swiftype" name="tags" data-type="string" content="wirecutter">
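Because the same tags show up under both article:tag and swiftype, reading more than one source would produce duplicates. A sketch of merging sources while keeping first-seen order; the selector list and mergeUniqueTags are hypothetical, and `doc` is assumed to expose a DOM-style querySelectorAll.

```javascript
// Hypothetical list of tag sources, checked in priority order.
const TAG_SELECTORS = [
  'meta[property="article:tag"]',
  'meta.swiftype[name="tags"]',
];

// Merge values from every selector, dropping exact duplicates but
// preserving the order in which each value was first seen.
function mergeUniqueTags(doc, selectors = TAG_SELECTORS) {
  const seen = new Set();
  const merged = [];
  for (const selector of selectors) {
    for (const element of doc.querySelectorAll(selector)) {
      const value = (element.content || '').trim();
      if (value && !seen.has(value)) {
        seen.add(value);
        merged.push(value);
      }
    }
  }
  return merged;
}
```

Dedupe here is an exact string match, so near-duplicates such as "headlamp" and "headlamps" both survive; case-folding or stemming would be a separate decision.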
And again for AMP, using ld+json:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"url": "https://www.engadget.com/2016/08/19/the-best-headlamps/",
"author": "The Wirecutter",
"headline": "The best headlamps",
"datePublished": "2016-08-19 12:23:00.000000",
...
"articleBody": "...",
"articleSection": "Gear",
"keywords": ["BlackDiamond","BlackDiamondRevolt","BlackDiamondSpot","CoastFL75","gadgetry","gadgets","gear","headlamp","headlamps","LED Lights","ONeill","partner","Shining Buddy","syndicated","The Revolt","thewirecutter","Vitchelo","VitcheloV800","wirecutter"],
...
"dateModified": "2016-08-19 12:39:44.000000"
}
</script>
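If we do pick up the ld+json variant later, the keywords would come from parsing the script body rather than a selector match. A sketch, assuming a DOM-style querySelectorAll on `doc`; extractJsonLdKeywords is hypothetical. schema.org describes keywords as typically comma-delimited text, while this page emits an array, so the sketch accepts both.

```javascript
// Hypothetical helper: collect schema.org `keywords` from all JSON-LD blocks.
function extractJsonLdKeywords(doc) {
  const keywords = [];
  for (const script of doc.querySelectorAll('script[type="application/ld+json"]')) {
    let data;
    try {
      data = JSON.parse(script.textContent);
    } catch (err) {
      continue; // skip malformed JSON-LD instead of failing the whole page
    }
    // A JSON-LD block may hold a single object or an array of objects.
    const items = Array.isArray(data) ? data : [data];
    for (const item of items) {
      if (!item || item.keywords == null) continue;
      // Accept either an array of keywords or one comma-delimited string.
      const raw = Array.isArray(item.keywords)
        ? item.keywords
        : String(item.keywords).split(',');
      for (const keyword of raw) {
        const value = String(keyword).trim();
        if (value) keywords.push(value);
      }
    }
  }
  return keywords;
}
```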
Not sure if we want to add the latter two right now, or leave them until we get to the AMP and swiftype implementation bugs.
But it also brings up a semi-related question I keep forgetting to ask. Given that OpenGraph, swiftype, and others can have multiple tags that match a ruleset, does Fathom (or our parser) convert those to an array, or does it just pluck the first matching value (giving us one keyword instead of an array of keywords)?
For example, will it work for tags like this:
<meta class="swiftype" name="tags" data-type="string" content="BlackDiamond">
<meta class="swiftype" name="tags" data-type="string" content="BlackDiamondRevolt">
<meta class="swiftype" name="tags" data-type="string" content="BlackDiamondSpot">
<meta class="swiftype" name="tags" data-type="string" content="CoastFL75">
...
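For reference, here are the two behaviours in plain DOM terms. This does not say which one Fathom does; firstMatch and allMatches are hypothetical helpers, and `doc` is assumed to expose querySelector/querySelectorAll.

```javascript
const SWIFTYPE_TAGS = 'meta.swiftype[name="tags"]';

// "Pluck the first value": querySelector returns only the first match, or null.
function firstMatch(doc, selector) {
  const element = doc.querySelector(selector);
  return element ? element.content : null;
}

// "Convert to an array": querySelectorAll returns every match in document order.
function allMatches(doc, selector) {
  return Array.from(doc.querySelectorAll(selector), element => element.content);
}

// On the tags above: firstMatch(document, SWIFTYPE_TAGS) gives "BlackDiamond",
// while allMatches(document, SWIFTYPE_TAGS) gives every tag value.
```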