Problem with links from docx/odt to markdown #2689

Closed
Pullusb opened this issue Jan 30, 2016 · 4 comments

Comments

Pullusb commented Jan 30, 2016

linkmarkdown.docx
Converting this document to markdown puts a "\n" with an empty text link before each link, and splits the link unevenly when the link text is a sentence.
The link text is also made italic, which I don't want.
Examples:

the dvorak keyboard (where the link is 'dvorak')
the[
](link)[*dvorak*](link)

the movie "le nom des gens". (where the link is ' "le nom des gens" ' )
The movie[
](link)[*"*](link)[*L*](link)[*enom des gens"*](link)

Note: this docx file was downloaded from a google doc.

Collaborator

jkr commented Jan 31, 2016

Weird -- they're actually different links in the docx xml file. I can collapse them, but that would only work if they go to the same place. I guess adjacent links to the same place should be collapsed in general. I wonder if this is something about how Word handles French? Anyway, thanks for reporting this -- I'll let you know as soon as I figure out the best way to proceed.
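
The collapsing idea described above can be illustrated with a minimal sketch. It is written in Python rather than pandoc's Haskell, it is not the fix that was eventually committed, and the `Text`, `Link`, and `collapse_links` names are hypothetical stand-ins for a simplified inline model. Emphasis is ignored. The sketch merges adjacent links only when they share a target, then moves leading and trailing whitespace out of the link label.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    """Plain text run (hypothetical simplified inline type)."""
    content: str


@dataclass
class Link:
    """Hyperlink with a plain-text label (hypothetical simplified inline type)."""
    label: str
    target: str


Inline = Union[Text, Link]


def collapse_links(inlines: List[Inline]) -> List[Inline]:
    # Pass 1: merge each run of adjacent links that point to the same target.
    merged: List[Inline] = []
    for item in inlines:
        prev = merged[-1] if merged else None
        if isinstance(item, Link) and isinstance(prev, Link) and prev.target == item.target:
            merged[-1] = Link(prev.label + item.label, prev.target)
        else:
            merged.append(item)

    # Pass 2: hoist surrounding whitespace out of each link label.
    result: List[Inline] = []
    for item in merged:
        if not isinstance(item, Link):
            result.append(item)
            continue
        label = item.label
        core = label.strip()
        if not core:
            # A whitespace-only link carries no visible anchor text.
            if label:
                result.append(Text(label))
            continue
        lead = label[: len(label) - len(label.lstrip())]
        trail = label[len(label.rstrip()):]
        if lead:
            result.append(Text(lead))
        result.append(Link(core, item.target))
        if trail:
            result.append(Text(trail))
    return result


# Fragments shaped like the "dvorak" example from the report.
fragments = [Text("the"), Link("\n", "link"), Link("dvorak", "link"), Text(" keyboard")]
collapsed = collapse_links(fragments)
```

Links with different targets are left untouched, which matches the caveat that collapsing only works if the fragments go to the same place.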

Author

Pullusb commented Jan 31, 2016

Actually, when I write links directly in a Word document or a Google Doc, the file is clean and the links are processed correctly by pandoc.
I know why the file is faulty.
It was originally a copy/paste from a text whose links had already been generated from markdown (on my Ghost blog). It seems that copying this document creates multiple links where there is supposed to be only one. Maybe the markdown/text conversion of the Ghost platform isn't clean...

Collaborator

jkr commented Jan 31, 2016

Well, the docx reader should be robust against these things anyway, since documents come from all sorts of sources. I have a fix almost ready to push -- just fixing up a couple of function names. Is it okay if I use part of the file above as a test case?

Author

Pullusb commented Jan 31, 2016

No problem, feel free to use it.

jkr closed this as completed in 2ee7752 on Feb 2, 2016
c-forster pushed a commit to c-forster/pandoc that referenced this issue Mar 4, 2016
We want to make sure that links have their spaces removed, and are
appropriately smushed together.

This closes jgm#2689