Feature request: copy as HTML #811

Closed
0xabu opened this issue Oct 16, 2018 · 18 comments

Comments


0xabu commented Oct 16, 2018

mintty supports copying the selection as rich text format, however I've noticed that some "modern" Windows apps don't seem to support rich text and fall back to pasting plain text -- this includes the Mail app, and the OneNote app, for example. A workaround I've found for this is to copy from mintty, paste into Word, which imports/renders the rich text, then paste into the target application. I'm guessing (but haven't verified) that Word is converting to HTML here; if so, it would be nice if mintty natively supported copying the HTML.

Owner

mintty commented Oct 16, 2018

Mintty already supports exporting as HTML (HTML Screen Dump in the extended context menu); however, the output goes to a file rather than the clipboard.
To copy HTML to the clipboard, a suitable format needs to be selected. HTML is not among the Standard Clipboard Formats, but there is also an HTML Clipboard Format; why it's separately described is a bit obscure however.
Before I even consider trying that, I'd need an application that would in fact paste HTML, for testing.

Author

0xabu commented Oct 16, 2018

Either of the built-in Windows 10 apps I mentioned (Mail, OneNote) would be good candidates.

I just did some tests with the debugger to try to dig into how this works. mintty calls user32!SetClipboardData with three formats: 0xd (Unicode text), 1 (plain text), and 0xc09f (RTF). Word retrieves the RTF format data on paste from mintty. When I copy from Word, it calls SetClipboardData for a whole plethora of formats:

  1. 0xc009 DataObject
  2. 0xc00e Object Descriptor
  3. 0xc09f Rich Text Format
  4. 0xc0bb HTML Format
  5. 1 Plain text
  6. 0xd Unicode
  7. 0xe Enhanced metafile handle
  8. 3 Metafile picture
  9. 0xc00b Embed Source
  10. 0xc004 Native
  11. 0xc003 OwnerLink
  12. 0xc00d Link Source
  13. 0xc00f Link Source Descriptor
  14. 0xc002 ObjectLink
  15. 0xc013 Ole Private Data

However, in the target apps, I don't see any hits on GetClipboardData -- it seems the clipboard API in UWP is entirely different. So I can't confirm they are using HTML, but it seems the most likely explanation.

Owner

mintty commented Oct 16, 2018

> 0xc0bb HTML Format

Where do you have this definition from? I see no defined value for CF_HTML in the docs or in the Windows SDK (WinUser.h does not mention CF_HTML at all). Also given that the description of the HTML Clipboard Format is utterly obscure and examples are broken, I don't see a promising way to implement this.

Author

0xabu commented Oct 16, 2018

That's just the string returned by GetClipboardFormatNameA(0xc0bb), which I called inside the debugger, to figure out what formats Word was emitting. I don't know if that value is global or unique to my system -- from a quick read of the docs it looks like you are supposed to call something like RegisterClipboardFormat("HTML Format"), and it returns the ID.

Here's an example that came up in a quick search:
https://support.microsoft.com/en-us/help/274308/how-to-add-html-code-to-the-clipboard-by-using-visual-c
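
As a rough illustration of the lookup described above, the ID can be obtained at runtime by registering the name rather than hard-coding a value such as 0xc0bb. This is a minimal sketch using Python's `ctypes`; the helper name `html_format_id` is hypothetical, and the code simply returns `None` when not running on native Windows.

```python
import ctypes
import sys


def html_format_id():
    """Return the clipboard format ID for "HTML Format", or None off Windows."""
    if sys.platform != "win32":
        return None  # user32.dll is only reachable on native Windows Python
    user32 = ctypes.windll.user32
    user32.RegisterClipboardFormatW.argtypes = [ctypes.c_wchar_p]
    user32.RegisterClipboardFormatW.restype = ctypes.c_uint
    # Registering a name that already exists returns the existing ID,
    # so calling this repeatedly is harmless. A return of 0 means failure.
    fmt = user32.RegisterClipboardFormatW("HTML Format")
    return fmt or None


if __name__ == "__main__":
    fmt = html_format_id()
    print("not on Windows" if fmt is None else hex(fmt))
```

Registered formats fall in the range 0xC000 to 0xFFFF, which is consistent with the 0xc0bb value seen in the debugger, but the exact number should not be relied on across systems or sessions.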

Author

0xabu commented Oct 16, 2018

Another old example: https://blogs.msdn.microsoft.com/jmstall/2007/01/21/copying-html-on-the-clipboard/

It looks like the format is quite complex, but if we're lucky just pasting a prepared header on the front of standard HTML may be good enough.

Author

0xabu commented Oct 16, 2018

What I wrote above about pasting a prepared header is bogus -- there are byte offsets in the header, but if you don't have context it's not terribly complex. In any case, this is the best summary of the format, sample code, and pointers to other references on the topic that I found: https://theartofdev.com/2014/06/12/setting-htmltext-to-clipboard-revisited/
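
To make the byte-offset point concrete, here is a minimal sketch of how such a payload might be assembled. The names `HEADER_TEMPLATE`, `PREFIX`, `SUFFIX`, and `build_cf_html` are illustrative, not taken from mintty or the linked article, and the sketch assumes UTF-8 output with no optional context or selection fields.

```python
HEADER_TEMPLATE = (
    "Version:0.9\r\n"
    "StartHTML:{:09d}\r\n"
    "EndHTML:{:09d}\r\n"
    "StartFragment:{:09d}\r\n"
    "EndFragment:{:09d}\r\n"
)
PREFIX = "<html><body>\r\n<!--StartFragment-->"
SUFFIX = "<!--EndFragment-->\r\n</body></html>"


def build_cf_html(fragment: str) -> bytes:
    """Wrap an HTML fragment in an "HTML Format" clipboard payload (UTF-8)."""
    # Offsets are zero-padded to a fixed width, so the header length is
    # known before the real offset values are.
    header_len = len(HEADER_TEMPLATE.format(0, 0, 0, 0).encode("utf-8"))
    prefix = PREFIX.encode("utf-8")
    body = fragment.encode("utf-8")
    suffix = SUFFIX.encode("utf-8")

    # All offsets count bytes from the start of the payload.
    start_html = header_len
    start_fragment = start_html + len(prefix)
    end_fragment = start_fragment + len(body)
    end_html = end_fragment + len(suffix)

    header = HEADER_TEMPLATE.format(
        start_html, end_html, start_fragment, end_fragment
    )
    return header.encode("utf-8") + prefix + body + suffix


if __name__ == "__main__":
    print(build_cf_html("<b>hello</b>").decode("utf-8"))
```

Because the offsets count bytes rather than characters, the fragment is encoded before its length is measured; non-ASCII text would otherwise throw the numbers off.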

Owner

mintty commented Oct 16, 2018

I had seen the latter page already but its examples are broken, too. In Figure 2, the offset values don't make any sense.

Owner

mintty commented Oct 16, 2018

Anyway, from these hints I guess I can construct something. A browser apparently copies this format, too, so it can be read to check the format. For the desired direction however, copying to the clipboard, I need an application that can paste it, as requested, for testing. Not something speculative but something I can use, please.

Author

0xabu commented Oct 16, 2018

I think the problem with that example is that something has appended extra trailing whitespace in the version on the website. If I trim the trailing whitespace, at least the "StartHTML" field makes sense:

$ cat test.txt
Version:0.9
StartHTML:000000149
EndHTML:000000329
StartFragment:000000266
EndFragment:000000298
StartSelection:000000266
EndSelection:000000298%
$ wc -c test.txt
149 test.txt

I believe the EndHTML is also correct if you account for mangling of the UTF8.
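
The byte count above can be reproduced without the file. The following is a small sketch, assuming the header lines are separated by CRLF and carry no trailing newline (both are assumptions about `test.txt`; the helper names are illustrative).

```python
HEADER_LINES = [
    "Version:0.9",
    "StartHTML:000000149",
    "EndHTML:000000329",
    "StartFragment:000000266",
    "EndFragment:000000298",
    "StartSelection:000000266",
    "EndSelection:000000298",
]


def header_size(lines, newline):
    """Byte length of the header lines joined by `newline`, no trailing newline."""
    return len(newline.join(lines).encode("ascii"))


def declared_offset(lines, key):
    """Integer value of a `key:value` header field."""
    for line in lines:
        name, _, value = line.partition(":")
        if name == key:
            return int(value)
    raise KeyError(key)


if __name__ == "__main__":
    for label, newline in (("LF", "\n"), ("CRLF", "\r\n")):
        print(label, header_size(HEADER_LINES, newline))
    print("StartHTML", declared_offset(HEADER_LINES, "StartHTML"))
```

With CRLF separators the seven lines come to 149 bytes, matching both `wc -c` and the declared StartHTML; with bare LF they come to 143, so the line-ending convention matters when computing the offsets.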

Do you have access to Windows 10? As I said, the Mail and OneNote apps that come with it both paste this format. I'm checking whether there's a good (non-UWP) app that you can use.

Owner

mintty commented Oct 16, 2018

> StartHTML

Yes, but StartFragment and StartSelection make no sense at all. Also there is no clear explanation about these two things which are the same in all examples.

Author

0xabu commented Oct 16, 2018

Here's an open source Win32 app that supports HTML paste: http://openlivewriter.org/

https://github.com/OpenLiveWriter/OpenLiveWriter/blob/master/src/managed/OpenLiveWriter.CoreServices/DataObject/HTMLDataObject.cs seems to be the main parser.

The downside is that you need to create a throwaway account on a blog site (if you don't already have one) to use it.

Author

0xabu commented Oct 16, 2018

Curiously they emit using header version 1.0 that Raymond Chen's blog explicitly says is broken :)

Owner

mintty commented Oct 17, 2018

Test case: copy some HTML from a browser (Firefox):
I can read clipboard format "text/html" from the clipboard as wide characters (UTF-16), but the content is plain HTML, with no preceding description tags.
I can also read clipboard format "HTML Format" as single-byte characters (actual encoding to be checked), and I get an initial line "Version:0.9" but no further tags, then the plain text, no HTML tags.
Hmm.
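
When inspecting what another application placed under "HTML Format", it may help to decode the header fields and slice the payload by the declared offsets instead of eyeballing the raw text. This is a hedged sketch: `parse_cf_html` is a hypothetical helper, it assumes the raw bytes have already been read from the clipboard by other means, and it returns `None` for the fragment when the offsets are missing, as in the truncated data described above.

```python
def parse_cf_html(payload: bytes):
    """Split an "HTML Format" payload into (header fields, fragment bytes)."""
    fields = {}
    for raw in payload.splitlines():
        name, sep, value = raw.partition(b":")
        if not sep or raw.lstrip().startswith(b"<"):
            break  # reached the HTML itself, or a line that is not key:value
        key = name.decode("ascii", errors="replace")
        fields[key] = value.decode("ascii", errors="replace").strip()

    try:
        start = int(fields["StartFragment"])
        end = int(fields["EndFragment"])
    except (KeyError, ValueError):
        return fields, None  # header incomplete: no fragment can be located

    if not 0 <= start <= end <= len(payload):
        return fields, None  # offsets point outside the data
    # Offsets are byte positions measured from the start of the payload.
    return fields, payload[start:end]


if __name__ == "__main__":
    fields, fragment = parse_cf_html(b"Version:0.9\r\nplain text only")
    print(fields, fragment)
```

A result with only a `Version` field and no fragment would suggest that either the source wrote an incomplete header or the data was cut short on the way in, which narrows down where to look next.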

Author

0xabu commented Oct 17, 2018

That's odd, because at least from the look of this code they do write the full header. You might also try Chrome (ref) or Edge.

mintty added a commit that referenced this issue Oct 22, 2018
Owner

mintty commented Nov 10, 2018

Released 2.9.4; note that HTML does not include Sixel graphics (yet).

mintty closed this as completed Nov 10, 2018
Author

0xabu commented Nov 13, 2018

Thanks! Sorry I missed this when you first made the commit. I've tested and have some compat-related improvements to the HTML output. I'll clean them up and send you a PR to review.

Owner

mintty commented Dec 5, 2018

Released 2.9.5. HTML copy not enabled by default, but added to Options menu.
Also various HTML copy options available in extended context menu (Ctrl+right-click).

Author

0xabu commented Feb 26, 2019

> Thanks! Sorry I missed this when you first made the commit. I've tested and have some compat-related improvements to the HTML output. I'll clean them up and send you a PR to review.

I finally got around to doing this, but it turns out the fixes you already made for "tools like PowerPoint" in c2f48e0 also subsume my changes. Thanks again!
