Invalid HTML on "schema.org" schema pages. #375

Open
John-Nagle opened this Issue Mar 7, 2015 · 6 comments

@John-Nagle

"http://schema.org/Thing" and its descendants are no longer valid HTML. This seems to have been broken in Q1-Q2 2014, based on archived versions in the Internet Archive. In particular, the list of "More Specific Types" uses "li" items which are no longer within a "ul". (There's also an extra "/table" and a bogus "/span" tag in there.) This makes it harder to machine-process that data.

I assume those pages are automatically generated from "https://github.com/schemaorg/schemaorg/blob/master/data/schema.rdfa", which was new in 2014, so this is probably just a bad template.

W3C validation report: http://validator.w3.org/check?uri=http%3A%2F%2Fschema.org%2FThing&charset=%28detect+automatically%29&doctype=Inline&group=0
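For anyone who wants to reproduce this locally without the W3C service, here is a rough sketch using only Python's standard-library `html.parser`. It flags `li` start tags that have no open `ul`/`ol`/`menu` ancestor and end tags with no matching open tag, which are the two kinds of problem described above. The class name `StructureChecker`, the `check` helper, and the sample markup are made up for illustration; this is a heuristic, not a substitute for a real validator.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StructureChecker(HTMLParser):
    """Hypothetical helper: records stray end tags and <li> outside a list."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open (non-void) elements
        self.problems = []   # (line number, message) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "li" and not any(t in ("ul", "ol", "menu") for t in self.stack):
            line, _ = self.getpos()
            self.problems.append((line, "<li> outside a list"))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in self.stack:
            # Pop up to and including the matching open tag; elements with
            # omitted end tags (e.g. <li>, <p>) are closed implicitly.
            while self.stack.pop() != tag:
                pass
        else:
            line, _ = self.getpos()
            self.problems.append((line, f"stray </{tag}>"))


def check(html):
    checker = StructureChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    # Made-up fragment imitating the reported breakage.
    sample = ("<table><tr><td>x</td></tr></table>\n"
              "<li>CreativeWork</li>\n"
              "<li>Event</li>\n"
              "</span></table>")
    for line, message in check(sample):
        print(line, message)
```

To run it against the live page, the HTML would first have to be fetched (for example with `urllib.request`) and passed to `check` as a string.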

@danbri
Contributor
danbri commented Mar 13, 2015

Thanks, yes we moved to a new codebase early last year. We ought to fix this, indeed...

@danbri danbri self-assigned this Mar 13, 2015
@danbri danbri added this to the sdo-gozer release milestone Mar 13, 2015
@John-Nagle

Thanks. The RDFa version of the same data, "http://schema.org/docs/schema_org_rdfa.html" (which comes from "https://github.com/schemaorg/schemaorg/blob/master/data/schema.rdfa") could also use some work. It's OK up until line 4536; then it gets weird. There are "href" attributes on "span" tags, starting at "action The movement the muscle generates." It looks like some schema of medical information in a similar, but not quite compatible, format was pasted in there. There's similar cut and paste trouble near "series" and "wikidoc". Some of this won't even parse properly as HTML5 in a browser.

Validator:
http://validator.w3.org/nu/?doc=http%3A%2F%2Fschema.org%2Fdocs%2Fschema_org_rdfa.html
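A quick way to locate the suspicious spots is to list every `span` start tag that carries an `href` attribute. The sketch below does that with the standard-library `html.parser`; the class name `SpanHrefFinder`, the `find_span_hrefs` helper, and the sample line are hypothetical, and the sample is not copied from the real file.

```python
from html.parser import HTMLParser


class SpanHrefFinder(HTMLParser):
    """Hypothetical helper: collects <span> start tags that have an href."""

    def __init__(self):
        super().__init__()
        self.hits = []  # (line number, href value) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        if "href" in attributes:
            line, _ = self.getpos()
            self.hits.append((line, attributes["href"]))


def find_span_hrefs(html):
    finder = SpanHrefFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


if __name__ == "__main__":
    # Made-up fragment; the property and URL are illustrative only.
    sample = ('<span property="rdfs:label">action</span>\n'
              '<span href="http://schema.org/Muscle">Muscle</span>')
    for line, href in find_span_hrefs(sample):
        print(line, href)
```

Running this over a local copy of the file should show where the pasted-in sections begin and end.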

@danbri
Contributor
danbri commented Mar 13, 2015

Thanks. Don't spend too much time on the schema.rdfa file's compatibility - in its current form it is an implementation detail. It turned out not to be ideal to use RDFa for this, and we are looking into migration to JSON-LD anyway. But I need to integrate and test a decent parser for that first...

@timbl
timbl commented Mar 13, 2015

Could you folks please make Turtle an option?

  • It is simpler than RDFa or JSON-LD.
  • It is a native graph language, not a tree language like JSON or XML.
  • It is the one required common language in the Linked Data Platform.
  • It can be read by old libraries.
    Tim
@chaals
Contributor
chaals commented Mar 13, 2015

You mean getting a Turtle version of information we have, or reading Turtle from pages (how)?

cheers

Charles McCathie Nevile - web standards - CTO Office, Yandex

@John-Nagle

I have no strong preference on format. Please pick something parseable and implement it. Thank you.

@danbri danbri modified the milestone: 2015 Q2, sdo-gozer release May 12, 2015