export/manage as XML #8

Closed
dret opened this issue Sep 11, 2013 · 6 comments

Comments

@dret

dret commented Sep 11, 2013

being an XML dinosaur, i'd love to get all of this as XML. but then again, there probably is not enough regularity in the current markdown, so maybe that's too much to ask for? in whichever way it's implemented, wouldn't it be great to have a machine-readable version of all of this information? it would be simple to generate markdown from it, and much simpler to also generate other formats: whatever people want. and out of curiosity: how are the emacs and js versions generated currently?

@andreineculau
Member

Generating XML should be fine and rather easy; generation is a low-level hack atm (look in the dev branch). I guess you can either go md -> xml or json -> xml, but I'd prefer the former since md is the source.

PS: I went for having the markdown as the primary source: it will be humans making additions, and I don't plan on adding more information than the "title, description, link" rule of thumb, so the markdown will be rather easy to parse with brute-force regexps.
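A rough sketch of the md -> xml route mentioned above, in Python. The row layout (`| title | description | link |`), the element names `entries`/`entry`, and the function name are assumptions made for illustration; they are not taken from the repository, and header or separator rows would need extra handling.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical row layout: | title | description | link |
ROW = re.compile(
    r"^\|\s*(?P<title>[^|]*?)\s*\|"
    r"\s*(?P<description>[^|]*?)\s*\|"
    r"\s*(?P<link>[^|]*?)\s*\|\s*$"
)

def markdown_rows_to_xml(markdown: str) -> str:
    """Turn three-column pipe rows into a small XML document (as a string)."""
    root = ET.Element("entries")  # element names are illustrative only
    for line in markdown.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # anything that is not a three-column row is ignored
        entry = ET.SubElement(root, "entry")
        for field in ("title", "description", "link"):
            ET.SubElement(entry, field).text = match.group(field)
    return ET.tostring(root, encoding="unicode")
```

Anything that does not look like a three-column row is silently skipped, which is exactly the kind of behaviour that makes this approach easy to write and easy to break.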

@andreineculau
Member

I refactored the master and dev branches a bit to better reflect the intention.

@dret
Author

dret commented Sep 12, 2013

"let people do what they want and some regexes will parse that into robust structures" is among the more famous last words before something went down in flames. of course entirely your decision, but i think i'd rather stay away from writing regexes that probably break every now and then.
over at https://github.com/dret/HTML5-overview i have decided to go the opposite route and start from XML and drive MD from that (still need to work on that... :-), but of course that's also because i am an XML guy and have no issues with editing XML, which is something that maybe many people just don't want to do.
anyway, great initiative, and good luck!

dret closed this as completed Sep 12, 2013

@andreineculau
Member

Shame on me for expecting a boring reference to "Now you have two problems" :)

FWIW I have obviously started with the same reasoning locally (YML actually, not JSON; no visible commits) but I quickly switched to this "primitive" alternative. Just to lay down some thoughts leading to this outcome:

  1. a project switching from structured data to MD&regexes -> never (say never). It just feels stupid. But if I ever sense that the current setup is creating grief, be sure I will switch to structured data. Not sure if I will go through the trouble of trying a Markdown2AST parser first.
  2. I wanted to make use of github's MD rendering
  3. I wanted the (github's rendered) MD to always be the-latest-version because that's what people will read
  4. it's ok for the structured data to be out-of-date because it is for machines and they will be targeting a tag/hash, so they'll be out-of-date anyways.

Nice project you have as well, and
repeat after me: This data is "beautifully, unapologetically XML" :)

@dret
Author

dret commented Sep 13, 2013

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 would be the most appropriate reference here. i'll be happily managing my XML over at https://github.com/dret/HTML5-overview and contribute to HTTP in cthulhu markup ;-)
if you're still mildly interested: for HTML5, the XML is the master, but refreshing really is nothing more than running the xml2ms.xslt XSLT, which takes around 0.1 sec on my machine. done, all MDs refreshed, and no brittle regex magic required for anything. and i think it's more the other way around: if you provide an easily consumable starting point, you might find others (such as myself) using it to do interesting things. if you don't, these things are simply less likely to happen. so waiting for them to happen and then making the switch is kind of backwards.
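for readers who want to see the "XML is the master, MD is generated" direction in miniature, here is a sketch in Python. it is a stand-in for the XSLT mentioned above, not a copy of it: the element names (`entries`, `entry`, `title`, `description`, `link`) and the table layout are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

FIELDS = ("title", "description", "link")  # hypothetical element names

def xml_to_markdown(xml_text: str) -> str:
    """Render <entry> elements as rows of a pipe-delimited markdown table."""
    root = ET.fromstring(xml_text)
    lines = ["| title | description | link |", "| --- | --- | --- |"]
    for entry in root.iter("entry"):
        # Missing or empty child elements become empty cells.
        cells = [(entry.findtext(name) or "").strip() for name in FIELDS]
        # Escape literal pipes so they cannot break the table layout.
        cells = [cell.replace("|", "\\|") for cell in cells]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"
```

regenerating the markdown is then a single function call over the master file, with no parsing of hand-edited text involved.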

@andreineculau
Member

FWIW

> the most appropriate reference here

I'm not using regex to parse HTML. I'm using regex to parse some very simple MD (specifically rows only, meaning columns are pipe-delimited tokens).

> if you provide an easily consumable starting point

But I do - it's not MD, it's JSON atm. That's what is intended as a published package. I don't expect anyone else to consume MD.
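As an illustration of the rows-only parsing and JSON output described above, a minimal Python sketch could look like this. It is not the project's actual generator: the function name, the use of the first row as keys, and the sample data are all assumptions.

```python
import json
import re

# Matches a table separator row such as | --- | :---: | ---: |
SEPARATOR = re.compile(r"^\|(\s*:?-+:?\s*\|)+\s*$")
# Splits a row body on pipes, swallowing the whitespace around them.
CELL_SPLIT = re.compile(r"\s*\|\s*")

def table_to_records(markdown: str) -> list:
    """Parse pipe-delimited rows; the first row supplies the keys."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # rows only: everything else is ignored
        if SEPARATOR.match(line):
            continue  # skip the |---|---| line under the header
        rows.append(CELL_SPLIT.split(line.strip("|").strip()))
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]

sample = (
    "| title | description | link |\n"
    "| --- | --- | --- |\n"
    "| 200 | OK | http://example.org/200 |\n"
)
print(json.dumps(table_to_records(sample), indent=2))
```

The sketch does not handle escaped pipes inside cells, so it only holds up while the rows stay as simple as described.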
