Wordpress importer doesn't translate [code] #1186

Closed
asmeurer opened this issue Mar 30, 2014 · 19 comments

@asmeurer
Contributor

In my WordPress blog, I made good use of its code blocks, which are delimited like

[code]
code here
[/code]

or, to use a language

[code language="py"]
python_code_here
[/code]

(I think language can also be spelled just lang).

If the WordPress importer uses Markdown, it could just translate [code] to a fence of three backticks, `[code lang='x']` to the same fence followed by x, and enable fenced code blocks.

@Kwpolska
Member

This is a wordpress.com-only thing, documented here: http://en.support.wordpress.com/code/posting-source-code/

@Kwpolska added this to the v7.0.0 milestone Mar 30, 2014

@asmeurer
Contributor Author

I just did a simple regex replace of \[/?code( lang(uage)?="(.*)")?\] with ```\3. It would have been nice if the importer did this for me, though.
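
For illustration, the replacement described above can be sketched in Python. This is not the importer's code; the names `FENCE` and `code_tags_to_fences` are made up for this example, and the pattern is the one quoted in the comment. A replacement function is used so that a missing language group turns into an empty string.

```python
import re

# The pattern from the comment above, unchanged.
# Group 3 captures the language name when one is given.
CODE_TAG = re.compile(r'\[/?code( lang(uage)?="(.*)")?\]')

# Three backticks, built programmatically (hypothetical helper constant).
FENCE = "`" * 3

def code_tags_to_fences(text):
    """Turn [code], [code lang="x"] and [/code] tags into Markdown fences."""
    def to_fence(match):
        # group(3) is None for a bare [code] or [/code] tag.
        return FENCE + (match.group(3) or "")
    return CODE_TAG.sub(to_fence, text)

post = '[code language="py"]\nprint("hi")\n[/code]'
print(code_tags_to_fences(post))
```

Because `.` does not match newlines by default, the greedy `(.*)` stays within the tag's own line, but it can still over-match if a line contains two tags.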

@asmeurer
Contributor Author

One issue with it was that WordPress replaced things like > with &gt;. So the conversion probably needs to be smarter than this, perhaps using a <pre> environment that calls the syntax highlighter directly.

@Kwpolska
Member

We’d need a smarter parser, because:

  • lang(uage) is not required, it can be just [code]
  • there are other meta fields we do not support
  • &lt; &gt; &amp; replacements would break it

This requires some work and time, though. Someone might take care of this, one day…

@asmeurer
Contributor Author

> lang(uage) is not required, it can be just [code]

That is handled by my regular expression.

The other problem I found is that there is a blank line between every pair of lines in [code] blocks.

@asmeurer
Contributor Author

There is already this https://github.com/getnikola/nikola/blob/master/nikola/plugins/command/import_wordpress.py#L352, but I don't understand what it is doing. What does the ~~~~~~~~~~~ do?

@Kwpolska
Member

probably some other form of fenced code blocks in python-markdown. (it’s also seemingly for another code-block extension. wordpress is a fucking mess.)

@asmeurer
Contributor Author

I am willing to take a shot at fixing this, since I have to do it anyway for my imported blog.

@Kwpolska
Member

Understood. Go on, and have fun. (make sure to read the documentation for the wordpress feature, linked above.)

@asmeurer
Contributor Author

I'll probably end up leaving the configuration parameters unimplemented, as I didn't use them.

@Kwpolska
Member

s/unimplemented/ignored/

s/I didn't/Nikola doesn’t/

@asmeurer
Contributor Author

The extra newlines only appear in my oldest blog posts, so that was probably a WordPress issue that has since been fixed. There's no reliable way to detect the issue programmatically (unless we know the date when the behavior changed).
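
A heuristic is still possible, with the caveat that it is only a guess about the affected posts. The sketch below assumes the damage always looks like "every second line of the block is empty"; the function name is hypothetical, and a block that legitimately alternates code and blank lines would be altered too.

```python
def drop_alternating_blank_lines(block):
    """Remove every second line of a code block, but only if all of them are blank."""
    lines = block.split("\n")
    every_second = lines[1::2]
    # Act only when there is something to remove and it is all whitespace.
    if every_second and all(line.strip() == "" for line in every_second):
        return "\n".join(lines[0::2])
    return block

print(drop_alternating_blank_lines("a = 1\n\nb = 2\n\nprint(a + b)"))
```

Blocks that do not match the pattern are returned unchanged, so the output still needs a manual review.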

@asmeurer
Contributor Author

OK, so I guess the only question I have is, what is the best way to convert things like &lt; back to <?

@asmeurer
Contributor Author

> s/unimplemented/ignored/

I was actually planning on unimplemented, but I guess I can do ignored too. I just need to make my regex a little more general.

@Kwpolska
Member

Just happily ignore the newline issue, as users are meant to review the output anyways.

For converting, you’d need to know that you are in a [code] block and do replacements there. But that is not easy, not without some hardcore parsing. (Though I can think of a regexp cheat that goes like [code(.*?)](.*?)&lt;(.*?)[/code] → [code\1]\2<\3[/code]; this might very well fail, though, especially in corner cases like ]<[ or ]aa<[ or ]<aa[.)
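
One way to restrict the replacements to [code] blocks without a full parser is a regex with a replacement callback. The following is a minimal sketch, not what the importer does: the pattern and the function name are assumptions, nested or unclosed tags are not handled, and `html.unescape` (Python 3.4+) is used for the decoding step.

```python
import html
import re

# Hypothetical pattern: opening tag with any attributes, body, closing tag.
# DOTALL lets the body span several lines; .*? keeps each match to one block.
CODE_BLOCK = re.compile(r'(\[code[^\]]*\])(.*?)(\[/code\])', re.DOTALL)

def unescape_code_blocks(text):
    """Decode HTML entities inside [code] blocks and nowhere else."""
    def fix(match):
        opening, body, closing = match.groups()
        return opening + html.unescape(body) + closing
    return CODE_BLOCK.sub(fix, text)

sample = 'a &lt; b\n[code]\nif a &lt; b &amp;&amp; c &gt; d:\n[/code]'
print(unescape_code_blocks(sample))
```

The text before the block keeps its `&lt;`, while the body of the block is decoded in a single pass per match.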

@asmeurer
Contributor Author

I mean, is there a &whatever; to the appropriate actual character mapping somewhere (standard library, external library, already in Nikola, ...)?

@Kwpolska
Member

There are three you need to take care of:

&amp; = &
&lt;  = <
&gt;  = >
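
If these three are replaced by hand, the order matters: decoding `&amp;` first would turn an escaped `&amp;lt;` into `&lt;` and then into `<`. A small sketch, with a made-up function name:

```python
def unescape_basic(text):
    """Decode only &lt;, &gt; and &amp;, handling &amp; last."""
    # Decoding &amp; last means "&amp;lt;" ends up as the literal text "&lt;"
    # instead of being decoded twice.
    return (text.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&"))

print(unescape_basic("x &lt; y &amp;&amp; y &gt; z"))
print(unescape_basic("&amp;lt;"))
```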

@asmeurer
Contributor Author

Ah, the rest are just literal? I found http://www.ascii.cl/htmlcodes.htm. I think I also need &quot;.
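
For reference, the Python 3 standard library ships such a mapping. The snippet below assumes Python 3.4 or later: `html.entities.name2codepoint` maps entity names to code points, and `html.unescape` decodes named and numeric references, including `&quot;`.

```python
import html
from html.entities import name2codepoint

# name2codepoint is keyed by the entity name without the leading & and trailing ;
for name in ("amp", "lt", "gt", "quot"):
    print(name, "->", chr(name2codepoint[name]))

# html.unescape applies the full table in one call.
decoded = html.unescape("print(&quot;a &lt; b&quot;)")
print(decoded)
```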

@asmeurer
Contributor Author

#1187

@Kwpolska modified the milestones: v7.1.0, v7.0.0 May 16, 2014
@Kwpolska modified the milestones: v7.2.0, v7.3.0 Nov 2, 2014
@ralsina modified the milestones: v8.0.0, v7.3.0 Jan 13, 2015
@ralsina modified the milestones: v8.0.0, 7.5.0 May 20, 2015