Questions about grey-areas in the specification #3

Open
marienfressinaud opened this issue Apr 28, 2022 · 7 comments

Comments

@marienfressinaud

marienfressinaud commented Apr 28, 2022

Hi @scripting, I'm working on my PHP lib to parse OPML (lib_opml) and I have (at least) one question. The specification states:

A <head> contains zero or more optional elements, described below.

So I understand that the head elements are strictly limited to the 13 elements listed in the spec (e.g. title, dateCreated, etc.), but the states.opml example contains some elements that aren't defined (e.g. description, ownerTwitterScreenName, etc.).

Is it an error in the example or should we clarify the specification?

Edit: the spec also states in "Extending OPML":

An OPML file may contain elements and attributes not described on this page, only if those elements are defined in a namespace, as specified by the W3C.
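To make the question concrete, here is a minimal Python sketch (not lib_opml, which is PHP) of how a parser might sort the children of `<head>` into three buckets: the 13 elements listed in the OPML 2.0 spec, namespaced extensions (which "Extending OPML" allows), and undocumented non-namespaced elements like the ones in states.opml. The helper `classify_head`, the bucket names, and the `http://example.com/ns` namespace are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The 13 <head> elements listed in the OPML 2.0 specification.
SPEC_HEAD_ELEMENTS = {
    "title", "dateCreated", "dateModified", "ownerName", "ownerEmail",
    "ownerId", "docs", "expansionState", "vertScrollState",
    "windowTop", "windowLeft", "windowBottom", "windowRight",
}


def classify_head(opml_text):
    """Hypothetical helper: split <head> children into three buckets."""
    root = ET.fromstring(opml_text)
    head = root.find("head")
    result = {"spec": [], "namespaced": [], "undocumented": []}
    if head is None:
        return result
    for child in head:
        if child.tag.startswith("{"):
            # ElementTree reports namespaced tags as "{uri}localname".
            result["namespaced"].append(child.tag)
        elif child.tag in SPEC_HEAD_ELEMENTS:
            result["spec"].append(child.tag)
        else:
            # Not in the spec list and not in a namespace (e.g. <description>).
            result["undocumented"].append(child.tag)
    return result


sample = """<opml version="2.0" xmlns:ex="http://example.com/ns">
  <head>
    <title>states.opml</title>
    <description>Used in practice, not in the spec list</description>
    <ex:custom>A namespaced extension</ex:custom>
  </head>
  <body><outline text="United States"/></body>
</opml>"""

result = classify_head(sample)
print(result["undocumented"])  # ['description']
```

Whether the "undocumented" bucket should be an error, a warning, or silently accepted is exactly the open question here; the sketch only makes the distinction visible.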

@scripting
Owner

Hi @marienfressinaud --

It's a good point. It's been a long time since the spec was written and we've been adding elements to the head section pretty much at will, without documenting them.

We might do an addendum to the spec at some point to list them all.

A lot of them are mentioned in the blog docs for Drummer, my latest outliner, whose native file format is OPML:

http://drummer.scripting.com/

The blogging docs are linked to from the Docs menu.

@scripting
Owner

BTW, it's clear the docs are going to have to mention this. Thanks for calling out the question.

@marienfressinaud
Author

I have a few other comments (e.g. what to do with attributes that don't match the expected format, such as date attributes). Do you want me to add them to this ticket, or should I open a ticket for each comment? (It probably won't be in the next few days since I'm pretty busy.)

@scripting
Owner

Just add them to this thread.

@marienfressinaud
Author

Sorry for the delay. Here's the list of questions/comments I had while reading the specification.

  • what should the default behaviour of parsers be if a file is not valid OPML? Some errors I could think of:
    • missing (or more than 1) head or body elements
    • missing or invalid version
    • unknown head elements
    • body containing non-outline elements (or empty body)
    • type errors (e.g. datetime, integer, boolean, email or HTTP URL elements/attributes not being of the correct type)
    • missing attributes (text, xmlUrl for type=rss, url for type=link or type=include)
  • could the number elements be floats, or are they only integers?
  • are isComment and isBreakpoint case-sensitive?
  • category is not clear to me
    • the mentioned "RSS 2.0 category" document doesn't define the element the same way (no mention of comma-separated strings)
    • is the attribute category="Science / Communication" (i.e. not starting with a /) a "tag"?
  • "HTTP URLs" are never defined in the spec (add a reference to RFC 9110?)
  • date-times must conform to RFC 822, with an exception for years: 4-digit years are accepted. This is defined by RFC 1123, so it could be mentioned
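Several of the structural cases in the list above can be checked mechanically. The following Python sketch collects problems into a list instead of failing on the first one, so a caller can decide how strict to be. It assumes that only the published versions 1.0 and 2.0 are valid and that `text` is required on every outline, with `xmlUrl` for `type="rss"` and `url` for `type="link"`/`type="include"`. The function name `find_problems` and the message wording are hypothetical.

```python
import xml.etree.ElementTree as ET

# Extra attribute required for particular outline types (text is always required).
REQUIRED_BY_TYPE = {"rss": "xmlUrl", "link": "url", "include": "url"}


def find_problems(opml_text):
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    root = ET.fromstring(opml_text)  # raises ET.ParseError if not well-formed XML
    if root.tag != "opml":
        problems.append("root element is not <opml>")
    # Assumption: only the published versions 1.0 and 2.0 are accepted.
    if root.get("version") not in ("1.0", "2.0"):
        problems.append(f"missing or invalid version: {root.get('version')!r}")
    for name in ("head", "body"):
        count = len(root.findall(name))
        if count != 1:
            problems.append(f"expected exactly one <{name}>, found {count}")
    body = root.find("body")
    if body is not None:
        if len(body) == 0:
            problems.append("<body> is empty")
        for child in body:
            if child.tag != "outline":
                problems.append(f"non-outline element in <body>: <{child.tag}>")
    for outline in root.iter("outline"):
        if "text" not in outline.attrib:
            problems.append("outline without a text attribute")
        required = REQUIRED_BY_TYPE.get(outline.get("type"))
        if required and required not in outline.attrib:
            problems.append(f"type={outline.get('type')} outline without {required}")
    return problems


print(find_problems('<opml><head/><body><outline type="link"/></body></opml>'))
```

A lenient parser could log these and carry on, while a strict one could raise as soon as the list is non-empty.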

About the errors, my approach in lib_opml is to offer a way to lower the strictness of the parser (default is not strict) and of the renderer (default is strict). When strictness is disabled, lib_opml ignores most of the errors and populates its structure with the raw data. I think it's the best approach for a parser, since a lot of the OPML files generated by aggregators out there are invalid ;)
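The strict/lenient split described here can be sketched in Python for one of the type checks, date-time attributes. This is not lib_opml's API (lib_opml is PHP); `parse_datetime` and `OpmlTypeError` are hypothetical names. It relies on the standard library's `email.utils.parsedate_to_datetime`, which parses RFC 2822 dates (RFC 2822 succeeds RFC 822) and accepts both two- and four-digit years. In lenient mode a bad value is returned as the raw string, mirroring the "populate with raw data" behaviour.

```python
from email.utils import parsedate_to_datetime


class OpmlTypeError(ValueError):
    """Hypothetical error raised for badly typed values in strict mode."""


def parse_datetime(value, strict=False):
    """Parse an RFC 822-style date-time such as "Mon, 12 Sep 2005 18:17:00 GMT".

    Strict mode raises OpmlTypeError on a bad value; lenient mode returns
    the raw string unchanged so no data is lost.
    """
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Python < 3.10 raises TypeError for unparseable input; 3.10+ raises ValueError.
        if strict:
            raise OpmlTypeError(f"invalid date-time: {value!r}") from None
        return value


print(parse_datetime("Mon, 12 Sep 05 18:17:00 GMT").year)  # 2005
print(parse_datetime("last Tuesday"))  # last Tuesday
```

Returning the raw string keeps lenient parsing lossless; a renderer running in strict mode would then reject such a value when writing the file back out.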

marienfressinaud changed the title from "Are head elements strictly limited to the spec list?" to "Questions about grey-areas in the specification" on Jul 24, 2022
@mincerafter42

Must the <head> element come before the <body> element? I've only seen them in that order, but the spec doesn't say so.

@scripting
Owner

@mincerafter42 -- OPML is XML and XML says nothing about the order of elements, so you could put the body element before the head element, but why would you?

@marienfressinaud -- I'll answer some of your questions...

  1. The basic rule is ignore what you don't understand. Other than that I don't know what people should do if there are missing attributes or more than one head element. If you can't make sense of something, don't try.

  2. I don't know what numbers elements you're referring to. But if the spec calls for a number, it's an integer, not a floating point value.

  3. XML is case-sensitive and OPML is XML so it is also case-sensitive.
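These answers translate fairly directly into parser behaviour. The Python sketch below is one possible reading, not a prescribed API: values the parser does not understand come back as `None` and are dropped (rule 1), the numeric head elements are read as integers only (rule 2), and `isComment` is matched case-sensitively against "true"/"false" (rule 3). It also looks up `<head>` and `<body>` by name with `find()`, so it works whichever order they appear in. The helper names and the choice of which head elements count as numbers are assumptions.

```python
import xml.etree.ElementTree as ET

# Head elements the spec describes as numbers (treated as integer-only here).
INTEGER_HEAD_ELEMENTS = {
    "vertScrollState", "windowTop", "windowLeft", "windowBottom", "windowRight",
}


def parse_boolean(value):
    """Case-sensitive: only the exact strings "true" and "false" are understood."""
    if value == "true":
        return True
    if value == "false":
        return False
    return None  # e.g. "True" or "yes": not understood, so ignored


def parse_integer(value):
    """Integers only: "3.5" or "1e3" are not understood and yield None."""
    try:
        return int(value)
    except ValueError:
        return None


def read_document(opml_text):
    root = ET.fromstring(opml_text)
    # find() matches by tag name, so the order of <head> and <body> does not matter.
    head = root.find("head")
    body = root.find("body")
    numbers = {}
    if head is not None:
        for child in head:
            if child.tag in INTEGER_HEAD_ELEMENTS:
                number = parse_integer(child.text or "")
                if number is not None:  # ignore what we don't understand
                    numbers[child.tag] = number
    comments = []
    if body is not None:
        for outline in body.iter("outline"):
            # A missing isComment attribute means false.
            comments.append(parse_boolean(outline.get("isComment", "false")))
    return numbers, comments


sample = """<opml version="2.0">
  <body>
    <outline text="a" isComment="true"/>
    <outline text="b" isComment="True"/>
  </body>
  <head><windowTop>61</windowTop><windowLeft>3.5</windowLeft></head>
</opml>"""

print(read_document(sample))  # ({'windowTop': 61}, [True, None])
```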
