Invalid HTML on "schema.org" schema pages. #375

Closed
John-Nagle opened this issue Mar 7, 2015 · 6 comments
Labels:
  • site tools + python code: Infrastructural issues around schema.org site. Most can ignore this!
  • type:bug: A mistake or malfunction whose remedy should be straightforward technical work

Comments

@John-Nagle

"http://schema.org/Thing" and its descendants are no longer valid HTML. This seems to have been broken in Q1-Q2 2014, based on archived versions in the Internet Archive. In particular, the list of "More Specific Types" uses "li" items which are no longer within a "ul". (There's also an extra "/table" and a bogus "/span" tag in there.) This makes it harder to machine-process that data.

I assume those pages are automatically generated from "https://github.com/schemaorg/schemaorg/blob/master/data/schema.rdfa", which was new in 2014, so this is probably just a bad template.

W3C validation report: http://validator.w3.org/check?uri=http%3A%2F%2Fschema.org%2FThing&charset=%28detect+automatically%29&doctype=Inline&group=0
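
A rough sketch of how the stray "li" problem described above could be checked mechanically, using only Python's standard-library html.parser. The names OrphanLiFinder and find_orphan_li are hypothetical, and the check is deliberately loose: it treats an "li" as acceptable whenever any "ul", "ol" or "menu" is still open, so it is not a substitute for the W3C validator.

```python
from html.parser import HTMLParser

# Elements that may legitimately contain <li> children.
LIST_CONTAINERS = {"ul", "ol", "menu"}


class OrphanLiFinder(HTMLParser):
    """Records the position of every <li> opened outside any list container."""

    def __init__(self):
        super().__init__()
        self.open_lists = 0  # number of currently open ul/ol/menu elements
        self.orphans = []    # (line, column) of each stray <li>

    def handle_starttag(self, tag, attrs):
        if tag in LIST_CONTAINERS:
            self.open_lists += 1
        elif tag == "li" and self.open_lists == 0:
            self.orphans.append(self.getpos())

    def handle_endtag(self, tag):
        if tag in LIST_CONTAINERS and self.open_lists > 0:
            self.open_lists -= 1


def find_orphan_li(html_text):
    """Return (line, column) positions of <li> tags with no open list ancestor."""
    finder = OrphanLiFinder()
    finder.feed(html_text)
    finder.close()
    return finder.orphans
```

Feeding it the saved source of a term page would list the positions of the "li" items that lost their "ul"; an empty list means this particular problem is not present.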

@danbri
Contributor

danbri commented Mar 13, 2015

Thanks, yes we moved to a new codebase early last year. We ought to fix this, indeed...

danbri added the "type:bug" and "site tools + python code" labels Mar 13, 2015
danbri self-assigned this Mar 13, 2015
danbri added this to the sdo-gozer release milestone Mar 13, 2015

@John-Nagle
Author

Thanks. The RDFa version of the same data, "http://schema.org/docs/schema_org_rdfa.html" (which comes from "https://github.com/schemaorg/schemaorg/blob/master/data/schema.rdfa") could also use some work. It's OK up until line 4536; then it gets weird. There are "href" attributes on "span" tags, starting at "action The movement the muscle generates." It looks like some schema of medical information in a similar, but not quite compatible, format was pasted in there. There's similar cut and paste trouble near "series" and "wikidoc". Some of this won't even parse properly as HTML5 in a browser.

Validator:
http://validator.w3.org/nu/?doc=http%3A%2F%2Fschema.org%2Fdocs%2Fschema_org_rdfa.html
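
A small sketch, again with Python's standard-library html.parser, of how the "href" attributes on "span" tags mentioned above could be located. SpanHrefFinder and find_span_hrefs are hypothetical names, and the check only reports where such attributes occur; whether each one is actually an error is left to the validator.

```python
from html.parser import HTMLParser


class SpanHrefFinder(HTMLParser):
    """Collects (line, href value) for every <span> that carries an href attribute."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        for name, value in attrs:
            if name == "href":
                # getpos() returns (line, column); only the line is kept here.
                self.hits.append((self.getpos()[0], value))


def find_span_hrefs(html_text):
    """Return a list of (line, href) pairs for <span href=...> occurrences."""
    finder = SpanHrefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits
```

Running find_span_hrefs over the text of schema_org_rdfa.html would give the line numbers to inspect by hand.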

@danbri
Contributor

danbri commented Mar 13, 2015

Thanks. Don't spend too much time on the schema.rdfa file's compatibility - in its current form it is an implementation detail. It turned out not to be ideal to use RDFa for this, and we are looking into migration to JSON-LD anyway. But I need to integrate and test a decent parser for that first...

@timbl

timbl commented Mar 13, 2015

Could you folks please make Turtle an option?

  • It is simpler than RDFa or JSON-LD.
  • It is a native graph language, not a tree language like JSON or XML.
  • It is the one required common language in the Linked Data Platform.
  • It can be read by old libraries.

Tim

@chaals
Contributor

chaals commented Mar 13, 2015

On 13.03.2015, 22:53, "Tim Berners-Lee" wrote:
"Could you folks please make Turtle an option."

You mean getting a Turtle version of information we have, or reading Turtle from pages (how)?

cheers

--
Charles McCathie Nevile - web standards - CTO Office, Yandex
chaals@yandex-team.ru - Find more at http://yandex.com

@John-Nagle
Author

I have no strong preference on format. Please pick something parseable and implement it. Thank you.

danbri modified the milestones: 2015 Q2, sdo-gozer release May 12, 2015
danbri added a commit that referenced this issue Jul 24, 2015
Fixed broken html for terms pages (#375)

5 participants