Can't fetch schema.org data if it contains "@graph" #11

Closed
mercxry opened this issue Jan 5, 2023 · 1 comment · Fixed by #12

mercxry commented Jan 5, 2023

Hello, some websites use a slightly different JSON-LD layout for their schema.org data: all of the objects are nested under a top-level "@graph" key. The crate can't fetch them because it expects them to be in the root object.

I can open a PR to fix this, let me know if you have any doubts or concerns!


Example website: https://www.vice.com/en/article/y3p9jx/nyc-bans-students-and-teachers-from-using-chatgpt

{
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "BreadcrumbList",
        "itemListElement": [
          {
            "@type": "ListItem",
            "position": 1,
            "name": "Home",
            "item": "https://www.vice.com/en"
          },
          {
            "@type": "ListItem",
            "position": 2,
            "name": "Tech",
            "item": "https://www.vice.com/en/section/tech"
          }
        ]
      },
      {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "mainEntityOfPage": {
          "@type": "WebPage",
          "@id": "https://www.vice.com/en/article/y3p9jx/nyc-bans-students-and-teachers-from-using-chatgpt"
        },
        "headline": "NYC Bans Students and Teachers from Using ChatGPT",
        "image": [
          "https://video-images.vice.com/articles/63b5aa421a9f6b858be7e769/lede/1672850154200-gettyimages-1240129495.jpeg?crop=1xw:0.843xh;0xw,0xh&resize=1200:*"
        ],
        "datePublished": "2023-01-04T16:37:42.109Z",
        "dateModified": "2023-01-04T16:37:42.109Z",
        "author": {
          "@type": "Person",
          "name": "Samantha Cole"
        },
        "publisher": {
          "@type": "Organization",
          "name": "VICE",
          "logo": {
            "@type": "ImageObject",
            "url": "https://vice-web-statics-cdn.vice.com/images/vice-og.png"
          }
        }
      },
      {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "name": "NYC Bans Students and Teachers from Using ChatGPT",
        "itemListElement": []
      },
      []
    ]
}

The resulting HTML object:

HTML {
    title: Some(
        "NYC Bans Students and Teachers from Using ChatGPT",
    ),
    description: Some(
        "The machine learning chatbot is inaccessible on school networks and devices, due to \"concerns about negative impacts on student learning,\" a spokesperson said.",
    ),
    url: Some(
        "https://www.vice.com/en/article/y3p9jx/nyc-bans-students-and-teachers-from-using-chatgpt",
    ),
    ...
    schema_org: [],
}

In this case the schema_org field is empty, but I expected it to contain the objects under @graph.
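
For reference, here is a rough sketch of the kind of flattening I have in mind. It is not the crate's actual code: the `Json` enum is a hypothetical stand-in for a parsed JSON value (a real fix would more likely work on `serde_json::Value`), and `collect_nodes`, `type_of`, `obj`, `s` and `example_types` are names made up for illustration. The idea is to descend into "@graph" when it is present instead of only inspecting the root object.

```rust
use std::collections::BTreeMap;

/// Hypothetical, dependency-free stand-in for a parsed JSON value.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Collect every typed node of a JSON-LD document into `out`.
/// Arrays are walked item by item; an object carrying "@graph" is
/// replaced by the nodes inside that graph; any other object is kept
/// only if it has an "@type" key.
fn collect_nodes(doc: &Json, out: &mut Vec<Json>) {
    match doc {
        Json::Array(items) => {
            for item in items {
                collect_nodes(item, out);
            }
        }
        Json::Object(map) => {
            if let Some(graph) = map.get("@graph") {
                collect_nodes(graph, out);
            } else if map.contains_key("@type") {
                out.push(doc.clone());
            }
        }
        // Bare strings carry no schema.org node.
        _ => {}
    }
}

/// Return the "@type" of a node when it is a plain string.
fn type_of(node: &Json) -> Option<&str> {
    match node {
        Json::Object(map) => match map.get("@type") {
            Some(Json::Str(t)) => Some(t.as_str()),
            _ => None,
        },
        _ => None,
    }
}

// Small helpers to build test values by hand.
fn obj(pairs: &[(&str, Json)]) -> Json {
    Json::Object(pairs.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}

fn s(v: &str) -> Json {
    Json::Str(v.to_string())
}

/// Trimmed-down version of the example document from this issue,
/// including the stray empty array at the end of "@graph".
fn example_types() -> Vec<String> {
    let doc = obj(&[
        ("@context", s("https://schema.org")),
        (
            "@graph",
            Json::Array(vec![
                obj(&[("@type", s("BreadcrumbList"))]),
                obj(&[("@type", s("NewsArticle"))]),
                obj(&[("@type", s("ItemList"))]),
                Json::Array(vec![]),
            ]),
        ),
    ]);
    let mut nodes = Vec::new();
    collect_nodes(&doc, &mut nodes);
    nodes.iter().filter_map(|n| type_of(n)).map(String::from).collect()
}

fn main() {
    println!("{:?}", example_types());
}
```

With something like this, a document that keeps its objects at the root is handled as before, while the three objects under "@graph" above would each end up as their own entry. Note that in this sketch an object with both "@graph" and "@type" only contributes its graph contents, which may or may not be the behavior the crate wants.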

orottier commented Jan 5, 2023

Hi @mercxry, thanks for flagging. A pull request would be great!
