Pasting content/text copied from iOS Notes app has formatting issues. #128

Closed
JohnKuan opened this issue Aug 22, 2023 · 15 comments · Fixed by #180

Comments

@JohnKuan
Contributor

JohnKuan commented Aug 22, 2023

Hi @stevengharris

I am testing a common workflow on iOS in which a user copies simple text from the default iOS Notes app and pastes it. The result is blocks of \n around the text.

From my investigation, the text copied directly from the Notes app is HTML in the format below.

"
<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">\n <html>\n <head>\n
    <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n
    <meta http-equiv=\"Content-Style-Type\" content=\"text/css\">\n <title></title>\n
    <meta name=\"Generator\" content=\"Cocoa HTML Writer\">\n <style type=\"text/css\">
      \np.p1 {
        margin: 0.0px 0.0px 3.0px 0.0px;
        font: 28.0px \'.AppleSystemUIFont\'
      }

      \nspan.s1 {
        font-family: \'UICTFontTextStyleBody\';
        font-weight: bold;
        font-style: normal;
        font-size: 28.00px
      }

      \n
    </style>\n
  </head>\n <body>\n <p class=\"p1\">
      <span class=\"s1\">New note <span class=\"Apple-converted-space\">&nbsp;</span>
      </span>
    </p>\n </body>\n </html>\n
"

The resulting view will look like this

[Screenshot: IMG_0003]

Could this be an issue with the pasteboard adding \n to the copied text?


Related issue:

This time I am appending the pasted string at the top of existing content.
The pasted string does not show up upon Paste.

In another experiment, I unescaped the text above before pasting, and the text does show up on the screen. But it also has the big block of spaces above it.

[Screenshot: Simulator Screenshot - iPhone 14 Pro Max - 2023-08-22 at 16 48 46]

@stevengharris
Owner

Info pasted from Notes is just text with newlines (as opposed to HTML), and the newlines should come in to the MarkupEditor as <p><br></p>. The logic for cleaning up the newlines definitely has some bugs in it. I am working on it and will add some tests to make sure it's covered properly. Thanks.

@stevengharris
Owner

This has been made more complicated because I am working on MacOS 14 Sonoma, and some behavior on the JavaScript side that I was implicitly relying on has changed. For example, it seems that range.extractContents() leaves the selection in the document in a slightly different state, and I have code that was relying on that state, particularly in handling Enter across selections. That isn't directly related to this bug, but it's preventing me from passing tests cleanly. So, I pulled on that thread until I found its end, and all my tests pass again. Unfortunately, there is a regression in one of the GitHub Actions Ventura-based tests that I have to sort out in a way that passes both on Ventura and Sonoma.

I'm going to open up a Discussion on supporting RTF-based pasting. I found that the Notes application paste buffer does not contain a public.html type, only a public.rtf type. It doesn't look difficult (haha) to go from RTF->AnnotatedString and then from AnnotatedString->HTML which could then be "cleaned up" like other HTML and pasted into the MarkupEditor. However, I'm not sure how much would be lost in that flow, and I don't want to deal with it right now. Still, it's annoying to lose bold/italic/underline and lists from content grabbed from Notes and probably other places.

@stevengharris
Owner

stevengharris commented Sep 2, 2023

This pasting-from-Notes issue should be fixed with #133. I added new tests into testPasteTextPreprocessing and testPasteHtmlPreprocessing to cover it.

It does raise another issue about embedded newlines in pasted text. These are currently kept in-place during paste, unless they occur at the beginning or end of the pasted text, in which case they result in empty paragraph elements (<p><br></p>). So, for example, pasting "Hello<br>world" into an empty paragraph (<p><br></p>) results in <p>Hello<br>world</p> and displays properly, except that the line spacing is compressed because it's not two paragraphs. I think this is correct and what people would expect. Unfortunately, there is no way to put a <br> into text yourself while editing, which is inconsistent. (FWIW, it's pretty easy to do that in Markdown using a trailing \). I'm going to open another issue on this topic.

@JohnKuan
Contributor Author

Hi @stevengharris,

I am checking on this issue again as the fix does not seem to work with iOS Notes. I have tested this on a real device.

I saw you have added test code for text copied from the MacOS Notes app. I tried copying from MacOS Notes and pasting into an iOS simulator, and it does seem to work fine. However, when the same note is copied from the iOS Notes app and pasted in exactly the same way, it does not work.

I looked a little into the UIPasteboard logic.

For text copied on iOS, there are 6 pasteboard types:

▿ 6 elements

  • 0 : "com.apple.notes.richtext"
  • 1 : "iOS rich content paste pasteboard type"
  • 2 : "com.apple.flat-rtfd"
  • 3 : "public.html"
  • 4 : "com.apple.webarchive"
  • 5 : "public.utf8-plain-text"

One of these is 'public.html', which causes your pasteableType() logic to treat the content as HTML.

For text copied on MacOS, however, there are 4 pasteboard types:

▿ 4 elements

  • 0 : "com.apple.notes.richtext"
  • 1 : "public.rtf"
  • 2 : "public.utf8-plain-text"
  • 3 : "public.utf16-external-plain-text"

I think this difference is the first reason why there are two different behaviors for the app on MacOS and iOS.

The second issue, I suspect, is in this logic:

if let data = pasteboard.data(forPasteboardType: "public.html") {
    pasteHtml(String(data: data, encoding: .utf8))
}

This is the text on my notes app on iPhone.

[Screenshot: IMG_82A189286441-1]

This is the string from pasteboard.string
Hello there\nObi-wan\n

This is the html string that is derived from String(data: data, encoding: .utf8)
<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">\n<html>\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n<meta http-equiv=\"Content-Style-Type\" content=\"text/css\">\n<title></title>\n<meta name=\"Generator\" content=\"Cocoa HTML Writer\">\n<style type=\"text/css\">\np.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 15.0px \'.AppleSystemUIFont\'}\np.p2 {margin: 0.0px 0.0px 0.0px 36.0px; font: 15.0px \'.AppleSystemUIFont\'}\nspan.s1 {font-family: \'UICTFontTextStyleBody\'; font-weight: normal; font-style: normal; font-size: 15.00px}\nspan.s2 {font-family: \'UICTFontTextStyleItalicBody\'; font-weight: normal; font-style: italic; font-size: 15.00px}\n</style>\n</head>\n<body>\n<p class=\"p1\"><span class=\"s1\">Hello there</span></p>\n<p class=\"p2\"><span class=\"s2\">Obi-wan</span></p>\n</body>\n</html>\n

Based on my understanding of the pasteHtml() function, the HTML string passed in gets escaped one more time. This makes the HTML string double-escaped.

public func pasteHtml(_ html: String?, handler: (()->Void)? = nil) {
    guard let html = html, !pastedAsync else { return }
    pastedAsync = true
    evaluateJavaScript("MU.pasteHTML('\(html.escaped)')") { result, error in
        self.pastedAsync = false
        handler?()
    }
}

After html.escaped, the string is

<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">\\n<html>\\n<head>\\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\\n<meta http-equiv=\"Content-Style-Type\" content=\"text/css\">\\n<title></title>\\n<meta name=\"Generator\" content=\"Cocoa HTML Writer\">\\n<style type=\"text/css\">\\np.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 15.0px \\\'.AppleSystemUIFont\\\'}\\np.p2 {margin: 0.0px 0.0px 0.0px 36.0px; font: 15.0px \\\'.AppleSystemUIFont\\\'}\\nspan.s1 {font-family: \\\'UICTFontTextStyleBody\\\'; font-weight: normal; font-style: normal; font-size: 15.00px}\\nspan.s2 {font-family: \\\'UICTFontTextStyleItalicBody\\\'; font-weight: normal; font-style: italic; font-size: 15.00px}\\n</style>\\n</head>\\n<body>\\n<p class=\"p1\"><span class=\"s1\">Hello there</span></p>\\n<p class=\"p2\"><span class=\"s2\">Obi-wan</span></p>\\n</body>\\n</html>\\n

Could this be the issue?

Side note: I also noticed that the result of the escaped method above seems to differ from that of other online escape tools.
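
One way to check the double-escaping hypothesis is to separate what the escaping step does from how the debugger displays strings. The sketch below assumes a hypothetical helper, escapedForJavaScript, which is not the library's own escaped implementation; it only escapes backslashes, single quotes, and line breaks, which matches what the dumps above appear to show. If the first dump is a debugger rendering (where a real newline already displays as \n), then the \\n in the second dump may be a single level of escaping displayed with the debugger's own extra backslash, rather than a true double escape.

```swift
import Foundation

/// Hypothetical helper for illustration only (not MarkupEditor's `escaped`).
/// Escapes a string so it can be embedded in a single-quoted JavaScript literal.
func escapedForJavaScript(_ input: String) -> String {
    var output = ""
    // Iterate over unicode scalars so that "\r\n" is handled as two scalars
    // instead of one combined Character.
    for scalar in input.unicodeScalars {
        switch scalar {
        case "\\": output += "\\\\"   // backslash -> two backslashes
        case "'":  output += "\\'"    // single quote -> backslash + quote
        case "\n": output += "\\n"    // real newline -> backslash + "n"
        case "\r": output += "\\r"    // carriage return -> backslash + "r"
        default:   output.unicodeScalars.append(scalar)
        }
    }
    return output
}

let pastedText = "Hello there\nObi-wan"            // contains one real newline
let escapedText = escapedForJavaScript(pastedText)  // contains backslash + "n"

// Plain printing shows the actual characters in the string.
print(escapedText)
// The debug form adds its own layer of escaping on top, which is what
// debugger output typically shows.
print(String(reflecting: escapedText))
```

Comparing the character counts of the string before and after escaping (one extra character per newline or single quote) is a quick way to confirm that only one level of escaping was applied.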

@JohnKuan
Contributor Author

From what I gather,

The public.html item that the iOS Notes app puts on the pasteboard contains a full HTML document. I think this could be the issue as well. I would have assumed that only the body tag (and perhaps the style tag) would be copied.

▿ 1 element
  ▿ 0 : 6 elements
    ▿ 0 : 2 elements
      - key : "public.utf8-plain-text"
      - value : Hello there
Obi-wan
    ▿ 1 : 2 elements
      - key : "com.apple.notes.richtext"
      - value : <OS_dispatch_data: data[0x2809a8840] = { leaf, size = 1118, buf = 0x1201d0000 }>
    ▿ 2 : 2 elements
      - key : "com.apple.webarchive"
      - value : <OS_dispatch_data: data[0x2807d0900] = { leaf, size = 1086, buf = 0x1201dc000 }>
    ▿ 3 : 2 elements
      - key : "com.apple.flat-rtfd"
      - value : <OS_dispatch_data: data[0x2807d0f80] = { leaf, size = 614, buf = 0x1201d8000 }>
    ▿ 4 : 2 elements
      - key : "iOS rich content paste pasteboard type"
      - value : <>
    ▿ 5 : 2 elements
      - key : "public.html"
      - value : <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<title></title>
<meta name="Generator" content="Cocoa HTML Writer">
<style type="text/css">
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 15.0px '.AppleSystemUIFont'}
p.p2 {margin: 0.0px 0.0px 0.0px 36.0px; font: 15.0px '.AppleSystemUIFont'}
span.s1 {font-family: 'UICTFontTextStyleBody'; font-weight: normal; font-style: normal; font-size: 15.00px}
span.s2 {font-family: 'UICTFontTextStyleItalicBody'; font-weight: normal; font-style: italic; font-size: 15.00px}
</style>
</head>
<body>
<p class="p1"><span class="s1">Hello there</span></p>
<p class="p2"><span class="s2">Obi-wan</span></p>
</body>
</html>

@JohnKuan
Contributor Author

JohnKuan commented Dec 22, 2023

I tried to insert a test case with the full HTML document, including the DOCTYPE, into testPasteHtmlPreprocessing() and it gave

<p><br></p><p><br></p>\n\n<title></title>\n\n<style type=\"text/css\">\np.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 15.0px \'.AppleSystemUIFont\'}\np.p2 {margin: 0.0px 0.0px 0.0px 36.0px; font: 15.0px \'.AppleSystemUIFont\'}\nspan.s1 {font-family: \'UICTFontTextStyleBody\'; font-weight: normal; font-style: normal; font-size: 15.00px}\nspan.s2 {font-family: \'UICTFontTextStyleItalicBody\'; font-weight: normal; font-style: italic; font-size: 15.00px}\n</style>\n\n\n<p>Hello there</p>\n<p>Obi-wan</p>\n\n

It seems to me that the <p><br></p> is being added as a replacement to one or more of the removed tags (html, head, meta), as they are placed just before the <title></title> tag. CMIIW.

@stevengharris
Owner

Thanks for all the detail. I will look into it. It's a kind of weird and no doubt brittle guessing game on how to deal with the pasteboard types. Basically, I thought the logic should be: if an image is present, use it; else if html is present, use it; else, use text if present. I suppose I could use the com.apple.notes.richtext (or public.rtf) if it's present without public.html, and then do some kind of rtf-to-html conversion, but I don't know how well that will translate and presumably it could still be a bit different between iOS and Mac Notes unless I always prefer the rtf over html. I will have to play with it a bit. Let me know if you have an opinion on the best way to go.
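
The priority order described above (image, then HTML, then text, with RTF as a possible fallback when HTML is absent) can be sketched as a pure function over the pasteboard type identifiers. This is only an illustration under assumptions: PasteKind and preferredPasteKind are hypothetical names, not the library's pasteableType() logic, and the image identifiers are assumed since the thread does not list any.

```swift
import Foundation

/// Hypothetical paste categories for this sketch; not types from the MarkupEditor.
enum PasteKind: Equatable {
    case image, html, rtf, text
}

/// Picks a paste category from a list of pasteboard type identifiers,
/// preferring image, then HTML, then RTF, then plain text.
func preferredPasteKind(for types: [String]) -> PasteKind? {
    let available = Set(types)
    // Assumed image identifiers; the thread does not name any.
    let imageTypes: Set<String> = ["public.png", "public.jpeg", "public.tiff"]
    // Plain-text identifiers as reported in the pasteboard dumps.
    let textTypes: Set<String> = ["public.utf8-plain-text",
                                  "public.utf16-external-plain-text"]

    if !available.isDisjoint(with: imageTypes) { return .image }
    if available.contains("public.html") { return .html }
    if available.contains("public.rtf") { return .rtf }
    if !available.isDisjoint(with: textTypes) { return .text }
    return nil
}

// The two type lists reported for iOS Notes and MacOS Notes.
let iosNotesTypes = ["com.apple.notes.richtext",
                     "iOS rich content paste pasteboard type",
                     "com.apple.flat-rtfd",
                     "public.html",
                     "com.apple.webarchive",
                     "public.utf8-plain-text"]
let macNotesTypes = ["com.apple.notes.richtext",
                     "public.rtf",
                     "public.utf8-plain-text",
                     "public.utf16-external-plain-text"]

if let kind = preferredPasteKind(for: iosNotesTypes) { print("iOS Notes:", kind) }
if let kind = preferredPasteKind(for: macNotesTypes) { print("MacOS Notes:", kind) }
```

With this ordering the same note takes different paths on the two platforms (HTML on iOS, RTF on MacOS), which is the asymmetry discussed above. Always preferring RTF over HTML would make the path the same on both, at the cost of depending on the RTF-to-HTML conversion.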

@JohnKuan
Contributor Author

I am not sure that converting RTF to HTML will resolve this issue, since it still involves cleaning up the copied text, just in a different format.

Perhaps there could be an additional cleanup method that removes these tags properly, without causing the addition of <p><br></p>

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Style-Type" content="text/css">
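
A minimal sketch of such a cleanup step, assuming it runs on the Swift side before the string is handed to pasteHtml(): it keeps only what sits between <body ...> and </body> and drops the DOCTYPE, html, head, meta, and style wrapper. The function name bodyContents is hypothetical, and the plain string search is only for illustration; a real HTML parser would be more robust against unusual markup.

```swift
import Foundation

/// Hypothetical helper: returns the inner contents of the <body> element,
/// or the input unchanged when no complete <body>...</body> pair is found.
func bodyContents(of html: String) -> String {
    guard
        // Start of the opening tag, e.g. "<body" or "<BODY class=...".
        let openStart = html.range(of: "<body", options: .caseInsensitive),
        // End of the opening tag: the first ">" after "<body".
        let openEnd = html.range(of: ">", range: openStart.upperBound..<html.endIndex),
        // The last closing tag in the document.
        let close = html.range(of: "</body>", options: [.caseInsensitive, .backwards]),
        openEnd.upperBound <= close.lowerBound
    else {
        return html   // Already a fragment, so leave it as-is.
    }
    return String(html[openEnd.upperBound..<close.lowerBound])
        .trimmingCharacters(in: .whitespacesAndNewlines)
}

let notesDocument = """
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title></title>
</head>
<body>
<p class="p1"><span class="s1">Hello there</span></p>
<p class="p2"><span class="s2">Obi-wan</span></p>
</body>
</html>
"""

print(bodyContents(of: notesDocument))
```

Because the head is dropped entirely, the p.p1 and span.s1 style rules are lost as well, so the class attributes on the remaining elements no longer carry any formatting.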

I noticed that your test cases handle the simple paragraph <p>some_text</p> examples. How about the case of <p>some_text_1</p><p>some_text_2</p>? I tried that and it only pastes the first paragraph, ignoring the second (and any subsequent) paragraphs. Is this expected behavior?

@JohnKuan
Contributor Author

JohnKuan commented Jan 2, 2024

Hi @stevengharris,

I was considering the possibilities over the holidays.

RTF-based copy and paste
For RTF copied text, it might be worthwhile to handle it as a separate case in pastableTypes. It should try to preserve as much formatting as possible while converting to valid HTML.

HTML-based copy and paste
In most cases, the pasteboard will carry the HTML structure of whatever was selected (from websites or other HTML sources). However, what we need to retain is the body of the HTML. For that we could consider using the SwiftSoup library to extract the HTML body. One thing that could be improved with this method is retaining as much formatting as possible, since extracting only the body excludes the <style> rules in the header.

Otherwise text-based copy and paste
If all else fails, we might need to weigh the possibility of offering a text-only paste option, for a simpler paste experience. IMO in most cases, the accuracy of pasted text is the highest priority, followed by preserving the formatting. We can recreate the formatting with the methods you have provided in this library.

Cheers & Happy New Year!

@stevengharris stevengharris reopened this Jan 2, 2024
@stevengharris
Owner

Thanks again for looking into it further. I had suggested the RTF conversion in cases where HTML was missing in #128 (comment) mainly because in #128 (comment) you showed that iOS Notes was providing RTF+HTML while MacOS Notes was only providing RTF without the HTML. Both cases showed up with text, so that would always be there for a fallback (or if the user chooses "paste without formatting"). This way you would get some reasonably formatted text on both iOS and Mac, although there might be differences depending on what I use for the conversion. I am using SwiftSoup elsewhere and could use that here, too.

I will be checking out the header issues and the p-followed-by-p cases you mention above next. I am hoping also maybe to come up with a kind of Rosetta stone of a Notes content case that I can then bake into a paste test. I envision something with various formatting, lists, checkboxes (won't be good on the MarkupEditor side, I'm sure, per #120), and images.

@JohnKuan
Contributor Author

JohnKuan commented Jan 4, 2024

Thanks for considering my inputs.

May I know if there is a foreseeable timeline in which I can revisit this issue?

Currently, this library is being used in production at my company. I am looking into this issue because stakeholders at my company are raising it as a bug, and I am wondering how to solve it without going into the specifics of the JavaScript code.

@stevengharris
Owner

I am just about done and will push something tomorrow even if it's just on a branch. It looks very good but is causing a few issues with the tests. I did add the rtf as a supported paste type.

@stevengharris
Owner

I think this and issue #179 are fixed on the https://github.com/stevengharris/MarkupEditor/tree/pasteFromNotes branch, with PR #180. Unfortunately, the RedoTests.testRedoPasteHtml test fails, so I don't want to merge it yet. I am 99% certain it is some kind of weird test anomaly that only occurs because of the way the test simulates selection compared to real life, since I've seen similar things in the past. I'm going to be traveling for a few days and won't have time to revisit the test failure until the middle of next week. I'm guessing you can use this with confidence as-is.

@stevengharris
Owner

Turns out my "99% certain" was 100% wrong, because the test was flagging a real issue. I also found that the undo/redo for pasting into an empty document (#179) was not working correctly. I also added tests for the pasting-into-an-empty-document case you identified in #179.

@JohnKuan
Contributor Author

Hi @stevengharris,

I retracted my previous reply. It is working now. I just have to point to your latest markup.js code changes.
