Invalid HTML on "schema.org" schema pages. #375

Closed
John-Nagle opened this issue Mar 7, 2015 · 6 comments

Comments

@John-Nagle John-Nagle commented Mar 7, 2015

"http://schema.org/Thing" and its descendants are no longer valid HTML. This seems to have been broken in Q1-Q2 2014, based on archived versions in the Internet Archive. In particular, the list of "More Specific Types" uses "li" items which are no longer within a "ul". (There's also an extra "/table" and a bogus "/span" tag in there.) This makes it harder to machine-process that data.

I assume those pages are automatically generated from "https://github.com/schemaorg/schemaorg/blob/master/data/schema.rdfa", which was new in 2014, so this is probably just a bad template.

W3C validation report: http://validator.w3.org/check?uri=http%3A%2F%2Fschema.org%2FThing&charset=%28detect+automatically%29&doctype=Inline&group=0
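
The structural problems described above (an "li" outside any list, and closing tags with nothing to close) can be caught with a small script before a template ships. This is only a sketch using Python's standard `html.parser`; the class name `StructureChecker`, the function `check_html`, and the sample markup are made up for illustration. It is far more lenient than the W3C validator and is not the schema.org build tooling.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}
# Elements that may directly contain <li>.
LIST_PARENTS = {"ul", "ol", "menu"}


class StructureChecker(HTMLParser):
    """Reports <li> elements outside a list and end tags that match nothing."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, outermost first
        self.problems = []  # (line number, message) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if tag == "li":
            # HTML lets </li> be omitted, so a new <li> closes an open sibling.
            if self.stack and self.stack[-1] == "li":
                self.stack.pop()
            if not self.stack or self.stack[-1] not in LIST_PARENTS:
                self.problems.append((self.getpos()[0], "<li> outside a list"))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag in self.stack:
            # Treat anything opened after the matching tag as implicitly closed.
            while self.stack.pop() != tag:
                pass
        else:
            self.problems.append((self.getpos()[0], f"stray </{tag}>"))


def check_html(markup):
    """Return a list of (line, message) problems found in the markup."""
    checker = StructureChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    # Hypothetical fragment imitating the reported breakage.
    sample = ("<table><tr><td>x</td></tr></table></table>"
              "<li>Action</li><li>Place</li></span>")
    for line, message in check_html(sample):
        print(line, message)
```

Well-formed input such as `<ul><li>a<li>b</ul>` produces no reports, because omitted `</li>` tags are treated as implicitly closed.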

Contributor

@danbri danbri commented Mar 13, 2015

Thanks, yes we moved to a new codebase early last year. We ought to fix this, indeed...

@danbri danbri self-assigned this Mar 13, 2015
@danbri danbri added this to the sdo-gozer release milestone Mar 13, 2015
Author

@John-Nagle John-Nagle commented Mar 13, 2015

Thanks. The RDFa version of the same data, "http://schema.org/docs/schema_org_rdfa.html" (which comes from "https://github.com/schemaorg/schemaorg/blob/master/data/schema.rdfa") could also use some work. It's OK up until line 4536; then it gets weird. There are "href" attributes on "span" tags, starting at "action The movement the muscle generates." It looks like some schema of medical information in a similar, but not quite compatible, format was pasted in there. There's similar cut and paste trouble near "series" and "wikidoc". Some of this won't even parse properly as HTML5 in a browser.

Validator:
http://validator.w3.org/nu/?doc=http%3A%2F%2Fschema.org%2Fdocs%2Fschema_org_rdfa.html
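
For the "href" on "span" problem specifically, a quick scan can list the offending tags without running a full validator. The sketch below uses Python's standard `html.parser`; the names `SpanHrefFinder` and `find_span_hrefs` and the sample markup are invented for illustration, and it checks only this one pattern.

```python
from html.parser import HTMLParser


class SpanHrefFinder(HTMLParser):
    """Records every <span> start tag that carries an href attribute."""

    def __init__(self):
        super().__init__()
        self.hits = []  # (line number, href value) pairs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        for name, value in attrs:
            if tag == "span" and name == "href":
                self.hits.append((self.getpos()[0], value))


def find_span_hrefs(markup):
    """Return (line, href) for each <span href=...> found in the markup."""
    finder = SpanHrefFinder()
    finder.feed(markup)
    finder.close()
    return finder.hits


if __name__ == "__main__":
    # Hypothetical fragment; the real file would be read from disk instead.
    sample = (
        '<div typeof="rdf:Property">\n'
        '<span property="rdfs:label" href="http://schema.org/action">action</span>\n'
        '<a href="http://schema.org/Muscle">Muscle</a>\n'
        '</div>'
    )
    for line, href in find_span_hrefs(sample):
        print(f"line {line}: span with href={href}")
```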

Contributor

@danbri danbri commented Mar 13, 2015

Thanks. Don't spend too much time on the schema.rdfa file's compatibility - in its current form it is an implementation detail. It turned out not to be ideal to use RDFa for this, and we are looking into migration to JSON-LD anyway. But I need to integrate and test a decent parser for that first...

@timbl timbl commented Mar 13, 2015

Could you folks please make Turtle an option?

  • It is simpler than RDFa or JSON-LD.
  • It is a native graph language, not a tree language like JSON or XML.
  • It is the one required common language in the Linked Data Platform.
  • It can be read by old libraries.

Tim
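
To illustrate the "graph, not tree" point, here is a rough sketch of writing a few schema-style statements as Turtle by hand, with no RDF library. The function name `to_turtle` and the three sample triples are assumptions made for the example, not the actual schema.org data or tooling. Terms are assumed to be already written as prefixed names, so no escaping or IRI handling is attempted.

```python
from collections import defaultdict


def to_turtle(prefixes, triples):
    """Serialize (subject, predicate, object) strings as simple Turtle.

    Terms must already be valid Turtle tokens, e.g. "schema:Thing" or "a".
    """
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")

    # Group statements by subject so they can share one "s p o ; p o ." block.
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))

    for subject, pairs in by_subject.items():
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        lines.append(f"{subject} {body} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative statements only.
    prefixes = {
        "schema": "http://schema.org/",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    }
    triples = [
        ("schema:Thing", "a", "rdfs:Class"),
        ("schema:Person", "a", "rdfs:Class"),
        ("schema:Person", "rdfs:subClassOf", "schema:Thing"),
    ]
    print(to_turtle(prefixes, triples))
```

Each statement is a flat subject, predicate, object triple, so nothing needs to be nested inside anything else, which is the contrast with tree-shaped formats being drawn in the list above.
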
Contributor

@chaals chaals commented Mar 13, 2015

13.03.2015, 22:53, "Tim Berners-Lee":
> Could you folks please make Turtle an option.

You mean getting a Turtle version of the information we have, or reading Turtle from pages (how)?

cheers

--
Charles McCathie Nevile - web standards - CTO Office, Yandex
chaals@yandex-team.ru - Find more at http://yandex.com

Author

@John-Nagle John-Nagle commented Mar 27, 2015

I have no strong preference on format. Please pick something parseable and implement it. Thank you.

@danbri danbri modified the milestones: 2015 Q2, sdo-gozer release May 12, 2015
danbri added a commit that referenced this issue Jul 24, 2015
Fixed broken html for terms pages (#375)