Preserve as much of the original structure as possible #47

gitonthescene · 2020-05-29T03:05:18Z

Hello there,

Thanks again for such an awesome project. It would be great to have an orga-stringify utility to fit more completely into the unified ecosystem and open ourselves up to using more transform tools. Then we could parse org files to an AST, transform them and then re-render the org. Ideally minimal transformations would re-render something pretty close to the original. To do that, we'd need to preserve as much of the original structure as possible.

I propose something like these changes. I'm after the effect more than the approach so I'm happy to discuss/modify/whatever. If you'd like me to make this a pull request, please let me know.

My thinking is that the extra structure in the AST can always be stripped when not needed. For instance, you could filter out whitespace/keyword nodes as well as trim() inner text if desired. But having it in the AST allows us to (nearly) faithfully re-render the original org file.
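To make the trade-off concrete, here is a minimal sketch of the idea, assuming a toy tokenizer rather than orga's real one. The names `tokenize`, `render`, and `stripWhitespace` and the token shapes are hypothetical, for illustration only. It shows that when whitespace stays in the token list, re-rendering is plain concatenation, and that the detail can still be filtered out afterwards by anyone who doesn't need it.

```javascript
// Toy illustration only: not orga's tokenizer or node shapes.
// Split text into alternating "whitespace" and "text" tokens, dropping nothing.
function tokenize(source) {
  const tokens = [];
  const pattern = /(\s+)|(\S+)/g;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    tokens.push({
      type: match[1] !== undefined ? "whitespace" : "text",
      value: match[0],
    });
  }
  return tokens;
}

// Re-rendering is concatenation, because every character lives in some token.
function render(tokens) {
  return tokens.map((token) => token.value).join("");
}

// Consumers who don't care about layout can strip the extra detail afterwards.
function stripWhitespace(tokens) {
  return tokens.filter((token) => token.type !== "whitespace");
}

const source = "* Heading  \n\n  some   text\n";
const full = tokenize(source);

console.log(render(full) === source); // true
console.log(render(stripWhitespace(full)) === source); // false
```

The second check is the point: once the whitespace is gone, the original layout cannot be recovered from the tokens alone.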

In there is a separate commit with the changes to the snapped files if you just want to see the effect on the AST. I think in a couple of cases it even renders a bit more accurately.

Again, more than happy to discuss.

Thanks again,
-Doug

P.S. I have a prototype for orga-stringify as well which I'll add to my fork as soon as I figure out how lerna works.

gitonthescene · 2020-05-29T06:02:36Z

I've now incorporated orga-stringify into my fork. It's just pure JavaScript currently. But when you run the following code on this sample org file, the output differs from the original only by a single trailing newline.

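// Parse an org file with orga-unified, then render it back to org text.
const unified = require("unified");
const vfile = require("to-vfile");
const parse = require("orga-unified");
const render = require("orga-stringify");

// toJSON: false: render org text rather than the JSON form of the tree.
const processor = unified().use(parse).use(render, { toJSON: false });

function main() {
  processor
    .process(vfile.readSync("/sample/orgfile.txt"))
    .then(
      (file) => {
        // The processed vfile stringifies to the re-rendered org text.
        process.stdout.write(String(file));
      },
      (err) => {
        // Report failures on stderr so they don't mix with the rendered output.
        console.error(String(err));
      }
    );
}

main();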

It optionally just spits out the JSON version of the tree using your getCircularReplacer() function.

gitonthescene · 2020-06-01T03:02:43Z

The head version of my fork now handles the trailing newline. Moreover, it completely reproduces all of the test examples but three. It renumbers two list examples where the numbers are out of order and it reformats a raggedly entered table into a more rectangular one.

gitonthescene · 2020-06-08T08:47:09Z

Hey there,

Not that it needs to be a goal to have these line up, but for curiosity's sake I wrote the following tiny elisp function to have a look at what the Emacs internal syntax tree looks like for a given org buffer:

(defun grab-org-nodes (node)
  (list (if (listp node) (car node)) (-map 'grab-org-nodes (om-get-children node))))

You need to package-install both dash.el and om.el to run it. It's just a general outline of the tree. Non-node types show up as nil.

Regards,
-Doug

gitonthescene · 2020-06-12T07:32:04Z

Also, to align with the unified structure maybe orga-unified should be called orga-parse sort of like remark-parse and there can be another package with a frozen parser like remark. Or maybe just make orga-unified have the processor.

gitonthescene · 2020-07-04T03:55:03Z

It would be great to get a reply here. The more full featured the tools are the more likely they are to be used.

boj · 2020-07-27T23:48:07Z

@gitonthescene I've been perusing this project and wanted to say that this all seems to be on the right track. The ability to convert to<->from the source material without altering it would be a great use case for the toy I have in mind.

gitonthescene · 2020-07-27T23:52:30Z

Thanks. You're welcome to play with my fork. I'm happy to answer any questions you might have.

xiaoxinghu · 2020-08-10T03:34:54Z

@gitonthescene orga-stringify looks amazing. I was busy working on v2; part of the reason is that a strongly typed codebase makes it much easier to collaborate and keep a set of conventions. Can you have a look at the current master and see if you can adopt the new style? Also, with v2 we now have Position in nodes. It's extra information that might be useful for faithfully re-rendering the org-mode text. I'd like to help with any issues.

xiaoxinghu · 2020-08-10T03:43:32Z

I'd like your opinion here. We now have the ability to tokenize everything, including whitespace. Do you think it's a good idea to include all tokens in the AST? I was worried that it would be too verbose, which is why I currently skip all whitespace. We can easily change it now. We do have the newline token, though it's not included in the final syntax tree. What are your thoughts?

gitonthescene · 2020-08-10T04:19:44Z

Hey, thanks for getting back. I think it makes sense to put in all the tokens until they become a performance problem, and even then make the level of detail optional. The reason I say this is that some people may want the full detail to "edit" the tree and then stringify it. That was my use case. The only potential problem I see from the extra detail is performance in processing, but as Knuth says, "premature optimization is the root of all evil". You can always transform a detailed tree into a less detailed tree, but you can't go the other way around. It might even be worth providing a transformer or two which strips whitespace or whatever, just to demonstrate. I'm happy to contribute code.
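A demonstration transformer of that kind could look roughly like the sketch below. It assumes whitespace appears in the tree as nodes with `type: "whitespace"`, which is a guess and not necessarily what orga emits, and `orgaStripWhitespace` is a hypothetical name. The outer function is shaped like a unified plugin (a function returning a tree transformer), but the example runs standalone on a hand-built tree.

```javascript
// Hypothetical node type name; the real tree may label whitespace differently.
const WHITESPACE = "whitespace";

// Return a copy of `node` without whitespace descendants (input is not mutated).
function stripWhitespaceNodes(node) {
  if (!Array.isArray(node.children)) {
    return { ...node };
  }
  return {
    ...node,
    children: node.children
      .filter((child) => child.type !== WHITESPACE)
      .map(stripWhitespaceNodes),
  };
}

// Shaped like a unified plugin: a function that returns a tree transformer.
function orgaStripWhitespace() {
  return (tree) => stripWhitespaceNodes(tree);
}

const detailed = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "hello" },
        { type: "whitespace", value: "  " },
        { type: "text", value: "world" },
      ],
    },
    { type: "whitespace", value: "\n\n" },
  ],
};

const slim = orgaStripWhitespace()(detailed);
console.log(JSON.stringify(slim));
```

Returning a new tree instead of mutating the input keeps the detailed version available for anyone who still wants to stringify it.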

I'll have a look at the master and try to rework orga-stringify. As I said in one of these issues, I was more after the effect than insisting on an approach. I'm a big believer in programming "for effect" (i.e. to an API) since you can always revisit the code later. Plus shipping results helps keep users interested.

Thanks again,
-Doug

P.S. Since most use cases of this happen at build time, I'd bet most people aren't that performance sensitive.

xiaoxinghu · 2020-08-10T04:51:11Z

Quoting @gitonthescene: "Also, to align with the unified structure maybe orga-unified should be called orga-parse sort of like remark-parse and there can be another package with a frozen parser like remark. Or maybe just make orga-unified have the processor."

My intention is for orga to be standalone. Even though it is heavily modelled after remark, the orga package itself is self-contained. As for the naming of the packages: remark is a unified processor, but orga is not; it's basically a function that parses a string into a syntax tree. So I am thinking of renaming orga-unified to orga-unified-parse, because we are going to add more plugins to the ecosystem, like orga-unified-toc etc., just to give a hint that these packages should be used within the unifiedjs ecosystem. They are just wrappers around packages like oast-to-hast, which is standalone (the only "dependency" is the HAST definition, which is more of a standard convention than a dependency). orga-unified-toc should be a thin wrapper around oast-toc, just like remark-toc is to mdast-util-toc. What do you think?
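As a rough sketch of what "thin wrapper" means here: the plugin below does nothing except register a standalone parse function on the processor. `parseOrg` is a stand-in written for this example, not orga's actual export, and `orgaUnifiedParse` is a hypothetical name. Assigning `this.Parser` follows the parser-plugin convention of the unified releases current at the time of this thread; newer major versions may spell it differently. The example avoids depending on unified by passing a bare object as `this`.

```javascript
// Stand-in for a standalone parser such as orga's; the real function and tree differ.
function parseOrg(text) {
  return {
    type: "document",
    children: text
      .split("\n")
      .filter((line) => line.length > 0)
      .map((line) => ({ type: "line", value: line })),
  };
}

// Thin unified wrapper: all it does is register the standalone parser.
// unified calls plugins with the processor as `this`.
function orgaUnifiedParse() {
  this.Parser = (doc) => parseOrg(String(doc));
}

// Exercise the wrapper without unified by supplying a bare object as `this`.
const fakeProcessor = {};
orgaUnifiedParse.call(fakeProcessor);
const tree = fakeProcessor.Parser("* Heading\nBody text\n");
console.log(tree.children.length); // 2
```

All parsing logic stays in the standalone function, so it remains usable without unified, and the wrapper package stays a few lines long.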

Take a look at PR #62

gitonthescene · 2020-08-11T00:16:31Z

If you mean you want to keep unified wrappers separate from a core orga library, I think that that makes sense. One of the things I like about the unified setup is that it tends to be made up of a lot of small packages so that you only have to pull in what you need, sort of like the UNIX philosophy. If that's the plan, then having a consistent naming for the unified wrappers also makes sense and I think your suggestions sound good. (FWIW, I wasn't really sure what names to use when I made the suggestion above.) FWIW, @wooorm seems like a really helpful guy.

I do kind of like reorg- as a prefix, though. If nothing else, it's less typing.

tconfrey · 2021-03-30T20:34:10Z

@gitonthescene @xiaoxinghu did anything ever come of this?

For my application I'm only concerned about the header, paragraph text and link elements. I was originally dropping any other elements and handling writing out the header/para/links in an application-specific manner. Most recently I've updated to V2 and am now using the position attributes to save the original text and mirror it back out.
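The position-based approach can be sketched as follows. This assumes each node carries a unist-style `position` whose `start` and `end` points include a character `offset`; whether orga v2 fills in `offset` (it is optional in unist) is an assumption here, and `sourceOf` is a hypothetical helper name. The node is hand-built for illustration.

```javascript
// Return the exact source text a node covers, or null when offsets are missing.
function sourceOf(node, source) {
  const start = node.position?.start?.offset;
  const end = node.position?.end?.offset;
  if (typeof start !== "number" || typeof end !== "number") {
    return null;
  }
  return source.slice(start, end);
}

const source = "* Links\nSee [[https://orgmode.org][Org]] for details.\n";

// Hand-built node for illustration; a real parser would supply these positions.
const linkText = "[[https://orgmode.org][Org]]";
const linkStart = source.indexOf(linkText);
const link = {
  type: "link",
  position: {
    start: { offset: linkStart },
    end: { offset: linkStart + linkText.length },
  },
};

console.log(sourceOf(link, source)); // [[https://orgmode.org][Org]]
console.log(sourceOf({ type: "text" }, source)); // null
```

If only line and column are available, the same idea works after converting them to offsets against the original text.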

gitonthescene mentioned this issue May 29, 2020

Capture notes should have a preview mode MobileOrg/mobileorg#264

Open

This was referenced Jun 6, 2020

newline breaks italic markup #34

Open

Table formula tags are not binded to the table node #38

Open

No inline markup in blocks #16

Open

Code block indentation #45

Closed

gitonthescene mentioned this issue Jul 4, 2020

Why no DEADLINE: and SCHEDULED: ? #55

Closed

xiaoxinghu added the discussion Ideas label Aug 10, 2020
