Document what the text post-processor is for #41

cburgmer · 2012-04-17T21:34:05Z

Looking at the post-processor in text.py, I don't fully understand its purpose. Is it designed to produce nice, human-readable output (but then why are tags with attributes preserved?), or just to strip off wiki markup?

I am looking for a transformation that yields plain text only, mostly for processing with NLTK later on. Something like ** for bold text would be acceptable.

erikrose · 2012-04-17T21:50:27Z

Thanks for giving the library a spin! The text renderer is meant to output a human-readable textual representation. If it's spitting out tags and attrs, then that's a bug, and I'd be happy to take patches against it.

If you want to customize the output, you can use raw.py instead, which gives you a raw AST to play with.
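As a rough illustration of the raw-AST route for NLTK-style plain text, here is a minimal sketch that walks a tree and flattens it, marking bold as `**...**`. The `Node` class and its `kind` values ("text", "bold", "paragraph") are hypothetical stand-ins. The actual AST produced via raw.py may be shaped differently, so the traversal would need adapting to it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node shape; the real AST from raw.py may differ.
    kind: str                # e.g. "text", "bold", "paragraph", "document"
    value: str = ""          # literal text, used only by "text" nodes
    children: List["Node"] = field(default_factory=list)


def to_text(node: Node) -> str:
    """Flatten a (hypothetical) wiki AST into plain text, marking bold as **...**."""
    if node.kind == "text":
        return node.value
    inner = "".join(to_text(child) for child in node.children)
    if node.kind == "bold":
        return f"**{inner}**"
    if node.kind == "paragraph":
        return inner + "\n\n"
    # Any other node kind: keep its text content and drop the markup.
    return inner


doc = Node("document", children=[
    Node("paragraph", children=[
        Node("text", value="Hello "),
        Node("bold", children=[Node("text", value="world")]),
    ]),
])
print(repr(to_text(doc)))
```

Unknown node kinds fall through to their children's text, which suits a "text only" goal: anything the walker doesn't recognize loses its markup but keeps its words.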

Incidentally, at some unspecified point in the future, I'm going to finish Parsimonious (https://github.com/erikrose/parsimonious/) and port the MW grammar to that, at which time I'll start ignoring this.

peter17 · 2012-04-26T15:38:35Z

Yes, the text post-processor is designed to produce nice and human readable output.

As for tags: the HTML post-processor has two kinds of tags, allowed and disallowed. By default, all tags are disallowed, and disallowed tags are treated as "normal" text; that's why "" is rendered as "": by default, it is not a tag. "Allowed" tags are interpreted when they are implemented (like <p>, <br/>...), in which case they no longer appear in the output.

In the text post-processor, you currently can't define which tags are allowed or disallowed. They are all treated as text, except <p> and <br />, which are interpreted as a new paragraph and a line break respectively.
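The allowed/disallowed behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not the post-processor's actual code: the `render_tag` function, its parameters, and the `ALLOWED_TAGS` mapping are invented for the example. Allowed tags turn into whitespace, while every other tag, attributes included, is echoed back verbatim as ordinary text. That verbatim echo is why a tag with attributes can show up in the text output.

```python
# Hypothetical sketch of tag handling in a plain-text renderer;
# not the library's real implementation.
ALLOWED_TAGS = {
    "p": "\n\n",   # new paragraph
    "br": "\n",    # line break
}


def render_tag(name: str, attrs: str = "", closing: bool = False,
               self_closing: bool = False) -> str:
    """Render a single tag for plain-text output."""
    key = name.lower()
    if key in ALLOWED_TAGS:
        # Allowed tags are interpreted: emit whitespace once, on the
        # opening (or self-closing) tag, and nothing for the closing tag.
        return "" if closing else ALLOWED_TAGS[key]
    # Disallowed tags are treated as ordinary text and reproduced verbatim.
    start = "/" if closing else ""
    attr_part = f" {attrs}" if attrs else ""
    end = " /" if self_closing else ""
    return f"<{start}{name}{attr_part}{end}>"


print(repr(render_tag("br", self_closing=True)))
print(render_tag("span", 'class="x"'))
```

Making the allowed set a plain dictionary keeps the policy configurable, which is the kind of hook the text post-processor currently lacks.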

I think the text renderer could produce better output. I spent some time looking at how the HTML renderer could be adapted for this purpose. It would take quite a while and I don't have the time right now, but please feel free to propose improvements if you want to.

peter17 · 2012-04-26T21:46:44Z

Finally, I felt inspired. I proposed a first version of a new text post-processor based on the HTML one. Please feel free to test it and propose improvements.
